Hey guys,
I am thinking about executing a SQL query directly on the Postgres DB that backs my Baserow app, to remove duplicates according to some non-trivial logic, since implementing that logic through the Baserow API would be very inefficient.
Is this a good idea, or will Baserow throw errors? I will only remove rows - no deleting or renaming of columns.
Please let me know if there is anything I need to keep in mind.
Best
Daniel
UPDATE:
Let me explain my situation in a bit more detail:
My table looks roughly like this:
id | email | name | color | size | created at | Link Col
Unfortunately, I have a few duplicates that I want to clean out:
| id | email | name | color | size | created at | Link Col |
| --- | --- | --- | --- | --- | --- | --- |
| 2 | peter@me.com | Peter | Blue | L | 2023-07-02 | null |
| 1 | peter@me.com | Peter | Red | XL | 2023-05-27 | some_row_id |
I would like to coalesce and string_agg some of the values while retaining all columns, prioritizing the earlier data, like this:
| id | email | name | color | size | created at | Link Col |
| --- | --- | --- | --- | --- | --- | --- |
| 2 | peter@me.com | Peter | Blue, Red | L | 2023-07-02 | some_row_id |
Plan A would be to execute a SQL script that updates the table completely in one go. Based on what @Alex wrote, the biggest uncertainty would probably be how to handle the link column.
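To make that concrete, here is a rough sketch of the Plan A script. All names are placeholders: `my_table`, `email`, `color`, `created_at` and `link_col` stand in for the real identifiers (as far as I understand, Baserow actually stores user tables as `database_table_<table_id>` with `field_<field_id>` columns, and a link field lives in a separate many-to-many join table, so treating `link_col` as a plain column is a simplification). It assumes `email` is the dedup key and orders by `created_at` descending so the result matches the example above.

```sql
BEGIN;

-- Merge each group of duplicates into its surviving (newest) row:
-- colors are concatenated, the link value is the first non-null one.
WITH merged AS (
    SELECT
        (array_agg(id ORDER BY created_at DESC))[1]      AS keep_id,
        string_agg(color, ', ' ORDER BY created_at DESC) AS new_color,
        (array_agg(link_col ORDER BY created_at DESC)
            FILTER (WHERE link_col IS NOT NULL))[1]      AS new_link
    FROM my_table
    GROUP BY email
    HAVING count(*) > 1
)
UPDATE my_table t
SET color    = m.new_color,
    link_col = m.new_link
FROM merged m
WHERE t.id = m.keep_id;

-- Drop every duplicate except the surviving row per email.
DELETE FROM my_table
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               row_number() OVER (PARTITION BY email
                                  ORDER BY created_at DESC) AS rn
        FROM my_table
    ) ranked
    WHERE rn > 1
);

COMMIT;
```

Wrapping it in a transaction would at least let me sanity-check with a SELECT before committing, or roll back.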
Plan B would be to execute a select query that identifies the rows to update, together with their new values, plus a second select query that identifies which rows to delete, and then iterate over both result sets and call the Baserow API.
Based on what @Alex wrote, I should probably go for Plan B here.
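For completeness, the two read-only queries for Plan B could look like this (same placeholder names and assumptions as in the Plan A sketch):

```sql
-- Query 1: surviving rows plus the merged values to write back via the API.
SELECT
    (array_agg(id ORDER BY created_at DESC))[1]      AS row_id,
    string_agg(color, ', ' ORDER BY created_at DESC) AS new_color,
    (array_agg(link_col ORDER BY created_at DESC)
        FILTER (WHERE link_col IS NOT NULL))[1]      AS new_link
FROM my_table
GROUP BY email
HAVING count(*) > 1;

-- Query 2: the duplicate rows to remove.
SELECT id
FROM (
    SELECT id,
           row_number() OVER (PARTITION BY email
                              ORDER BY created_at DESC) AS rn
    FROM my_table
) ranked
WHERE rn > 1;
```

I would then loop over the first result set calling the row update endpoint (PATCH /api/database/rows/table/{table_id}/{row_id}/) and over the second calling the row delete endpoint (DELETE on the same path). Since all writes would go through Baserow itself, the link field, trash and webhooks should behave normally.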