What is the recommended way to insert a new column & data into a table with 20k+ records

I have a customer who needs to add a new column to a relatively large table with over 20,000 rows.

The table has numerous linked records, so this is more complex than exporting and reimporting the table, as the references would be lost.

Is there any workaround or recommendation for solving this issue?

Environment: Hosted
New column data type: String

Many thanks,
Joel

Is the goal to add a new column with some predefined data for each record, rather than just a blank new one?

Yes, all of the records will have a unique value.

If there weren’t so many records, I would write a script to update the records via the API, but I assume this would go against the TOS as I am on the shared hosting plan.

Additionally, I would like to make the new column the primary column on the table and turn the current primary column into a ‘normal’ column.

Thank you for the swift response.

Right now the easiest way to do this would be to use our batch update API endpoint and a script. It’s not against the TOS to use our API in a sensible way; I’d suggest adding a couple of seconds of sleep between batch update calls to be safe, and go for it.
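For reference, here is a minimal sketch of such a script in Python. The instance URL, token, table ID, 200-row batch limit, and the `New column` field name are all assumptions to adapt; check the API docs for your own table before running it:

```python
import time

import requests

API_URL = "https://api.baserow.io"  # assumption: replace with your instance URL
TOKEN = "YOUR_DATABASE_TOKEN"       # assumption: a database token with update rights
TABLE_ID = 123                      # assumption: your table's ID
BATCH_SIZE = 200                    # assumption: rows allowed per batch request


def batch_update(items):
    """Send one batch of row updates; each item needs 'id' plus the fields to set."""
    response = requests.patch(
        f"{API_URL}/api/database/rows/table/{TABLE_ID}/batch/?user_field_names=true",
        headers={"Authorization": f"Token {TOKEN}"},
        json={"items": items},
    )
    response.raise_for_status()


# new_values maps existing row IDs to the unique string for the new column,
# e.g. loaded from a CSV prepared beforehand (hypothetical example data).
new_values = {1: "A-0001", 2: "A-0002"}  # ... ~20k entries in practice

items = [{"id": row_id, "New column": value} for row_id, value in new_values.items()]
for start in range(0, len(items), BATCH_SIZE):
    batch_update(items[start:start + BATCH_SIZE])
    time.sleep(2)  # a couple of seconds between calls, as suggested above
```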

Otherwise, your only manual option is to create a blank new column and then copy and paste the values in 200 rows at a time, which is not ideal.

We have a prioritized issue to add the ability for the importer to also do an UPSERT based on some key column, partially tracked here: Introduce endpoint to upsert a row in a table (#1395) · Issues · Baserow / baserow · GitLab, so hopefully this will become much easier in the near future. As an alternative, we are also going to add support for pasting in more than 200 values at once.

Separately, on making this new field the primary field: it isn’t yet possible to swap which field is the primary one. You can see the issue tracking this here: Allow changing the primary field (#1301) · Issues · Baserow / baserow · GitLab.

One way I’d suggest to get around this is to:

  1. Duplicate/snapshot the table to make a backup (please note the IDs of the table/fields/views will all be different in the new duplicate/restored snapshot).
  2. Make a new formula column.
  3. Use field('my primary field') in the formula so the new column copies all of the primary field’s data.
  4. Convert the formula column to text to persist the copied data.
  5. Now you can batch update the primary field with your script (see the sketch below).
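If you’d rather script steps 2–4 as well, something along these lines should work. The field endpoints, the `formula` and `text` types, and the `Primary backup` name are assumptions to verify against the API docs; note that field endpoints are authenticated with a user JWT rather than a database token:

```python
import requests

API_URL = "https://api.baserow.io"  # assumption: replace with your instance URL
JWT = "YOUR_JWT_TOKEN"              # assumption: field endpoints need a user JWT
TABLE_ID = 123                      # assumption: your table's ID
HEADERS = {"Authorization": f"JWT {JWT}"}

# Steps 2-3: create a formula field that mirrors the current primary field.
created = requests.post(
    f"{API_URL}/api/database/fields/table/{TABLE_ID}/",
    headers=HEADERS,
    json={
        "name": "Primary backup",  # hypothetical name for the copy
        "type": "formula",
        "formula": "field('my primary field')",
    },
)
created.raise_for_status()
field_id = created.json()["id"]

# Step 4: convert the formula field to plain text so the copied values persist.
converted = requests.patch(
    f"{API_URL}/api/database/fields/{field_id}/",
    headers=HEADERS,
    json={"type": "text"},
)
converted.raise_for_status()
```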

Thank you Nigel, I will try this 😄