Executing SQL Statements on Postgres DB

Hey guys,

I am thinking about executing a SQL query on my postgres DB that is part of my Baserow app to remove duplicates according to some non-trivial logic as it would be very inefficient to implement using the Baserow API.

Is this a good idea or will Baserow throw errors? I will only remove rows - no removing or renaming of columns.

Please let me know if there is anything I need to keep in mind.

Best
Daniel

UPDATE:
Let me explain my situation in a bit more detail:

My table looks vaguely like this:
id | email | name | color | size | created at | Link Col

Unfortunately, I have a few duplicates that I want to clean out:

id | email | name | color | size | created at | Link Col
2 | peter@me.com | Peter | Blue | L | 2023-07-02 | null
1 | peter@me.com | Peter | Red | XL | 2023-05-27 | some_row_id

I would like to coalesce and string_agg some of the values but retain all columns while prioritizing earlier data as such:
id | email | name | color | size | created at | Link Col
2 | peter@me.com | Peter | Blue, Red | L | 2023-07-02 | some_row_id

Plan A would be to execute a SQL script that updates the table completely in one go. Based on what Alex wrote, the biggest uncertainty would probably be how to handle the link column.

Plan B would be to execute a select query that identifies rows to update as well as their new values and a second select query that identifies which row to delete and than iterate over both lists of items and call the baserow api.

Based on what @Alex wrote I should probably go for plan B here.

Hey @Daniel,

Executing raw SQL in the DB is considered very risky. We can’t say for certain that you won’t destroy something since it heavily depends on the query you will execute.

If your data is important to you I would recommend to make a backup of the database first before executing any query directly.

That being said, it is absolutely possible to do this without destroying your application.

Would you mind giving us more detail on how your table is structured and the query you intend to execute? How safe it is may depend on if you have any relationship to other tables and some other factors as well.

Hey Alex,

Thanks for getting back to me so quickly. I updated my question to add more details.

Cheers
Daniel

Thanks for providing the additional details!

Plan B sounds much safer, running a select query to identify ids should have no impact on the data directly while calling the endpoint with the ID makes sure that every side affect a deletion might have is taken into account.

If it’s not too much work for you I would 100% agree with your assessment of plan B.

Once again a gentle reminder to backup your data just in case something goes wrong :slight_smile:

Thanks for the support @Alex. Will definitely back my data up prior to this!

Have an awesome day!

You too, and best of luck for the execution! :slightly_smiling_face: