Discussion: how do you sync an external database with baserow?

In my use case, Baserow makes a very good front-end for databases that may not have a GUI, or may have a very primitive GUI. For a lot of CRM use cases, being able to just have a list of users along with some metrics or some tags is very valuable.

But I’m finding that the syncing code is always annoying to write. I end up writing the same logic every time:

  1. Retrieve all records from the Baserow table.
  2. Look at the authoritative data source and determine what needs to be done:
    • If the record doesn’t exist in Baserow, it’s an add.
    • If the record exists both in Baserow and in the local data source, it’s an update.
    • If the record exists in Baserow but not in the local source, delete it from Baserow.
  3. Perform the adds, updates, and deletes in separate loops, taking care to handle pagination properly.
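The three-way diff described in the steps above can be sketched in a few lines of Python. This is just an illustration of the pattern, not Baserow client code; the `"Email"` key field and the record shapes are made up:

```python
def diff_records(source, baserow_rows, key):
    """Partition records into adds, updates, and deletes by comparing an
    authoritative source list against the rows currently in a Baserow table.

    `source` and `baserow_rows` are lists of dicts; `key` is the name of the
    field treated as the unique key. Baserow rows may carry extra fields
    (e.g. "id"), so updates are detected by comparing only the source fields.
    """
    src = {r[key]: r for r in source}
    dst = {r[key]: r for r in baserow_rows}

    # Present in the source but not in Baserow: add.
    adds = [r for k, r in src.items() if k not in dst]
    # Present in both, but some source field differs: update.
    updates = [
        r for k, r in src.items()
        if k in dst and any(dst[k].get(f) != v for f, v in r.items())
    ]
    # Present in Baserow but gone from the source: delete.
    deletes = [r for k, r in dst.items() if k not in src]
    return adds, updates, deletes
```

The actual add/update/delete calls (and their pagination) still have to be written against the batch API; only the classification step is shown here.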

It would be smoother if there were a way to just tell Baserow: here are the authoritative records, and the unique key is “field”. Do whatever you have to do to replicate this on the Baserow side.

Has anyone come across this use case? How did you solve it? It’s not hard by any means; it’s just a pattern that’s likely to be reimplemented many times over.

We definitely want to add upserting, both via the API and via our file importer, at some point. We have one issue tracking part of the API side here: Introduce endpoint to upsert a row in a table (#1395) · Issues · Baserow / baserow · GitLab.

And so yes, as you suggest, I’d imagine you’d say “this field/these fields are the key fields”, which would be used for updating/inserting. With deletes, I guess there could be an extra parameter saying “delete any rows which didn’t get updated or inserted”, which also sounds similar to our desire to add table truncation.
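To make that concrete, a request body for such an endpoint might look like the sketch below. To be clear, this is purely hypothetical: none of these parameter names exist in the Baserow API today, and the real design is what issue #1395 will decide.

```python
# Hypothetical batch-upsert request body. "key_fields", "delete_missing",
# and the overall shape are invented for illustration only; they are NOT
# real Baserow API parameters.
payload = {
    "key_fields": ["Email"],   # field(s) used to match existing rows
    "delete_missing": True,    # delete any rows which didn't get updated or inserted
    "rows": [
        {"Email": "alice@example.com", "Plan": "pro"},
        {"Email": "bob@example.com", "Plan": "free"},
    ],
}
```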


One issue that comes up when we’ve discussed bulk deleting/updating of rows is what to do with webhooks.

For example, say someone has a 10+ million row table, and they do some sort of upserting import file that updates all 10+ million rows, should Baserow attempt to call any configured webhooks on that table with information about all 10+ million rows and what changed? Should we instead introduce new websocket events that just say “hey some bulk change happened” but don’t include specifics on what?

Right now, if a user imports rows using the file importer, we don’t send any webhook events, due to worries about performance, DDoSing the receiving server, causing extreme load on Baserow, etc. So for a future “upserting file importer” we could simply also not send any webhook events, and that’s a limitation of our webhooks. Then for an upserting API endpoint we could restrict it to 200 upserted rows at a time, and for those we can safely send webhooks.
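On the client side, a 200-row-per-request cap would just mean batching the upload. A minimal sketch of that chunking, with no real API calls (the limit of 200 is the figure mentioned above, not a documented value):

```python
def chunk(rows, size=200):
    """Split rows into batches no larger than `size`, so each batch could be
    sent as one request to a (hypothetical) capped upsert endpoint."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]

# 450 rows would become three requests: 200 + 200 + 50.
batches = chunk(list(range(450)))
```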

I’m perfectly happy to disable webhooks when doing batch operations via the API.

Let me illustrate what I believe is a common CRM use case, at least that’s the way I think about it:
I have a user list where the authoritative source of data lives in my own local database. Users register for an account on my SaaS and appear in my local database. I periodically refresh Baserow using the batch API, as Baserow is essentially my front-end for that user list. That’s one direction of the two-way communication.

Occasionally, I need to perform some one-off manual operations on that user list from Baserow, and I would like to reflect that in my own internal database. Maybe I need to extend their free trial, or enable a feature for them, or something like that. And when I do that, I expect webhooks to fire because my local database needs to be notified. That’s the other direction.

I only ever need to update in one direction at a time: either from my local database to Baserow using the batch API, or the other way via webhooks. I can’t imagine a use case where both need to happen simultaneously.

If deleting rows is enabled, you may face a difficulty with that upsert API if it uses pagination: how will Baserow know that the batch of updates is done? I’m still very interested in having that feature. A lot of what I’ve been doing with Baserow involves replicating a list of records from the source into Baserow.

Unrelated, but let me use this message to notify you (I can’t reach you over email, maybe a spam filter): I finished my Baserow plugin tutorial: https://github.com/Language-Tools/baserow-translate-plugin/blob/master/tutorial/README.md . It’s now in the hands of Olga and Juliet to perhaps publish on the Baserow blog.

I’m very much interested in this as well!

I’ve been looking for bidirectional sync tools/methods, preferably open source. I found Prisma to be interesting:

Prisma is an open-source ORM that makes it fun and safe to work with a database like MySQL, Postgres, SQL Server, or MongoDB. It generates a type-safe client library to provide an awesome DX in any Node.js or JavaScript project.

That got me looking into ORMs (Object-Relational Mapping).

Still don’t have any answers if it really is as good as I imagine, I’m simply being curious and exploring options.

It would be great if Baserow supported connecting to an external database like NocoDB does: Connect to a Data source | NocoDB

Hey @eshnil, I think we want to introduce this feature in the future. Let me check with the team to see if we already have an issue for that. :slightly_smiling_face:

Please keep us posted

What about having a different webhook for bulk operations, one that is invoked just once for the whole operation? The payload could include the list of affected rows, although the feature would still be useful without that.
For example, upon receiving the webhook call, the external system could query rows by a “last modified at” field.
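The external system’s side of that idea could look something like the sketch below. It assumes rows carry a `last_modified` ISO-8601 timestamp field; both that field name and the record shapes are assumptions for illustration, not part of any existing Baserow payload:

```python
from datetime import datetime, timezone

def rows_changed_since(rows, cutoff):
    """Return the rows whose (assumed) 'last_modified' ISO timestamp is
    later than `cutoff` -- what an external system might fetch after
    receiving a single "bulk change happened" webhook."""
    return [
        r for r in rows
        if datetime.fromisoformat(r["last_modified"]) > cutoff
    ]
```

The single bulk webhook then only needs to say “something changed”; the receiver pulls the specifics itself, which sidesteps the 10-million-row payload problem discussed earlier.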