Large number of row updates, how to notify front-end

lucw · August 8, 2022, 2:35pm

My Baserow addon adds language translation (and other language-related functionality) to Baserow (background: Anonymous API access, or universal token).

When a user adds a new “translation” field to an existing, populated table, a large table means a large number of row updates. I use a Celery task for this, and it’s working as expected. However, I’m running into issues (mostly front-end, I believe) when very frequent updates involve a large number of rows.

I experimented with processing small batches. So let’s say there are 2,000 rows in the table, and a user adds a “translated” field. In the Celery worker, I need to compute a translation for those 2,000 rows and populate the translated value. The backend handles this fine (I break up the 2,000 rows into smaller batches), but the problems start appearing when I send a “rows_updated” signal with a large number of rows (more than 100). This seems to confuse the front-end, and I sometimes end up with a blank grid, which requires an F5 to clear up.
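For illustration, the batching step described above could look roughly like this. It is only a minimal sketch: `iter_batches` is a hypothetical helper name, not part of Baserow or Celery, and the batch size of 100 is just the threshold mentioned in this post.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(rows: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `rows`, each at most `batch_size` long.

    Hypothetical helper for illustration; not a Baserow or Celery API.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        # The last slice may be shorter than batch_size.
        yield rows[start:start + batch_size]


# Example: 2,000 row ids split into batches of 100.
row_ids = list(range(2000))
batches = list(iter_batches(row_ids, 100))
```

Each batch can then be translated and saved on its own inside the worker, independently of how (or how often) the front-end is notified.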

I thought using “rows_updated” as opposed to the row-by-row “row_updated” signal would take care of performance issues, but I see that the front-end handling of “rows_updated” still has some nested loops, so maybe it’s just not designed to deal with a large number of rows.

I see in the blame view of backend/src/baserow/contrib/database/ws/table/signals.py (develop branch, Bram Wiepjes / baserow on GitLab) that a “force_table_refresh” parameter was added to the “table_updated” signal, so I could potentially use it to signal an update to a large number of rows.

Ideally, here’s what I’d like to see happen:

1. I do a large number of row updates on the backend, inside my Celery task.
2. I send a signal to the front-end over the websocket channel.
3. The front-end lazily reloads the table based on visible rows.

Would using the “table_updated” signal give me this? Or will the front-end proactively reload everything?

I’m open to any suggestions anyone may have, thank you in advance!

petrs · August 10, 2022, 3:39pm

Hi @lucw, yes, the front-end is not optimized for this yet, so you could write your own optimized handling of the rows_updated signal. You could possibly even contribute the changes back to Baserow if your implementation is correct and general enough.

As you wrote, you could always just refresh the table entirely after all changes are in, even though this could be a disruptive event for users.

lucw · August 11, 2022, 1:40pm

@petrs I’m honored that you believe I’m capable of optimizing the front-end handling of rows_updated. I did actually work with web grids at some point, so I may look into it, without guarantees of course!
Edit: using table_updated with force_table_refresh=True works beautifully when doing a massive number of updates. And it’s fast. I’m not sure how much I’ll be able to improve on 1.11 (in 1.10 it did feel buggy and my grid blanked out, but not anymore).
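
For anyone landing here later, the overall pattern (save in batches, then notify the front-end once) can be sketched as below. This is a sketch under assumptions, not the actual addon code: `translate`, `save_batch` and `notify_refresh` are hypothetical callables injected by the caller. In a real Baserow plugin, `notify_refresh` would presumably wrap whatever call sends the table_updated signal with force_table_refresh=True; the exact call is not shown in this thread.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def translate_rows_then_refresh(
    rows: List[Row],
    translate: Callable[[str], str],
    save_batch: Callable[[List[Row]], None],
    notify_refresh: Callable[[], None],
    batch_size: int = 100,
) -> int:
    """Translate and save rows in batches, then notify the front-end once.

    All callables are hypothetical stand-ins supplied by the caller.
    Returns the number of rows processed.
    """
    processed = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        updates = [
            {"id": row["id"], "translated": translate(str(row["text"]))}
            for row in batch
        ]
        save_batch(updates)  # persist this batch; no per-batch websocket event
        processed += len(updates)
    if processed:
        # A single "refresh the table" notification instead of many
        # rows_updated events.
        notify_refresh()
    return processed


# Usage with in-memory stand-ins for the database and the websocket signal.
saved: List[List[Row]] = []
refreshes: List[str] = []
demo_rows: List[Row] = [{"id": i, "text": "hello"} for i in range(250)]
count = translate_rows_then_refresh(
    demo_rows, str.upper, saved.append, lambda: refreshes.append("refresh")
)
```

As petrs noted above, a full table refresh can be disruptive for users, so sending it once at the end of the task rather than after every batch seems like the safer default.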