Discussion: how do you sync an external database with baserow?

lucw · September 1, 2023, 2:24am

In my use case, Baserow makes a very good front-end for databases that may not have a GUI, or may have a very primitive GUI. For a lot of CRM use cases, being able to just have a list of users along with some metrics or some tags is very valuable.

But i’m finding that the syncing code is always annoying to write. I end up always writing the same logic:

retrieve all records from baserow table
look at the authoritative data source and determine what needs to be done:
- If the record doesn’t exist on Baserow, then it’s an Add
- If the record exists both on Baserow and local data source, then update
- If the record exists on Baserow, but not on local, then delete from baserow.
then perform the adds, updates, and deletes in separate loops, taking care of doing pagination well.

It would be smoother if there was somehow a way to just tell Baserow: here are the authoritative records, and the unique key is “field”. Do what you have to do to replicate this on the baserow side.

Anyone came across this use case ? How did you solve it ? It’s not hard by any means, just that it’s a pattern that’s likely to be reimplemented many times over.

nigel · September 1, 2023, 8:19am

We definately want to add upserting both via the API and also via our file importer at some point. We have one issue tracking part of the API side here: Introduce endpoint to upsert a row in a table (#1395) · Issues · Baserow / baserow · GitLab .

And so yeh i’d imagine as you suggest you say “this field/fields are the key fields” which are used for updating/inserting. With deletes i guess there could be an extra parameter saying “delete any rows which didn’t get updated or inserted”, which also sounds similar to our desire to add table truncation.

nigel · September 1, 2023, 8:23am

One issue that comes up when we’ve discussed bulk deleting/updating of rows is what to do with webhooks.

For example, say someone has a 10+ million row table, and they do some sort of upserting import file that updates all 10+ million rows, should Baserow attempt to call any configured webhooks on that table with information about all 10+ million rows and what changed? Should we instead introduce new websocket events that just say “hey some bulk change happened” but don’t include specifics on what?

Right now, if a user imports rows using the file importer, we don’t send any webhook events due to worries around performance/DDOSing the recieving server/causing extreme load on Baserow etc. So for a future “upserting file importer” we could just, also not send any webhook events and thats a limitation of our webhooks. Then for an upserting API endpoint we can restrict it to 200 upserted rows at a time and for those we can safely send webhooks.

lucw · September 1, 2023, 8:43am

I’m perfectly happy to disable webhooks when doing batch operations via the API.

Let me illustrate what I believe is a common CRM use case, at least that’s the way I think about it:
I have a user list where the authoritative source of data lives in my own local database. Users can register for an account on my SaaS and they appear in my own local database. I periodically refresh Baserow using the batch API as baserow is essentially my front-end for that user list. That’s one way of the 2 way communication.

Occasionally, I need to perform some one-off manual operations on that user list from Baserow, and I would like to reflect that in my own internal database. Maybe I need to extend their free trial, or enable a feature for them, or something like that. And when I do that, I expect webhooks to fire because my local database needs to be notified. That’s the other direction.

I only need to update in one direction. Either from my local database to baserow using batch API, or the other way with webhooks. I can’t imagine a use case where both need to happen simultaneously.

If deleting rows is enabled, you may face a difficulty with that upsert API, if it uses pagination. How will baserow know that the batch of updates is done ? I’m still very interested in having that feature. A lot of what i’ve been doing with Baserow involves replicating a list of records from the source to Baserow.

Unrelated, but let me use this message to notify you (I can’t reach you over email, maybe spam filter): I finished my baserow plugin tutorial: https://github.com/Language-Tools/baserow-translate-plugin/blob/master/tutorial/README.md , it’s now in the hands of Olga and Juliet to perhaps publish it on the baserow blog.

360Creators · February 21, 2024, 7:57am

I’m very much interested in this as well!

I’ve been looking for bidirectional sync tools/methods, preferably opensource. I found Prisma to be interesting.

Prisma is an open-source ORM that makes it fun and safe to work with a database like MySQL, Postgres, SQL Server, or MongoDB. It generates a type-safe client library to provide an awesome DX in any Node.js or JavaScript project.

That got me into ORM, standing for Object-Relational Mapping.

Still don’t have any answers if it really is as good as I imagine, I’m simply being curious and exploring options.

eshnil · June 7, 2024, 8:06am

It would be great if Baserow supported connecting to an external database like NocoDB does: Connect to a Data source | NocoDB

olgatrykush · June 7, 2024, 8:17am

Hey @eshnil, I think we want to introduce this feature in the future. Let me check with the team to see if we already have an issue for that.

glarrain-cdd · June 14, 2024, 5:30pm

Please keep us posted

glarrain-cdd · June 17, 2024, 9:36pm

What about having a different webhook for bulk operations, where it is invoked just once for the whole operation? The payload could include the list of affected rows although the feature would still be useful without that.
For example, upon receiving the webhook call, the external system could query rows by a “last modified at” field.

olgatrykush · June 26, 2024, 2:38pm

Hey @glarrain-cdd @eshnil, I’ve checked with the team and we plan to add a data sync feature which will allow connection to external data sources and synchronization with Baserow: API data sync (#2278) · Issues · Baserow / baserow · GitLab. It won’t be implemented in the same manner as NocoDB, but it should address the use case that you, @glarrain-cdd, described above. The issue doesn’t contain any details yet, but you can follow it for the future updates.

lucw · June 26, 2024, 10:03pm

Thanks olga, I’ve written my use case on that ticket.

olgatrykush · June 27, 2024, 1:15pm

Perfect, @lucw. The development team will keep your input in mind while implementing the feature. Thank you.