Thanks for the continued focus on performance, @bram, it’s very appreciated.
As far as Baserow + AI goes, the most obvious use case in my opinion is to run chatbot prompts based on field data. Here’s how it would work:
- Field A contains a paragraph of text.
- Field B could be added as a derived field, meant to contain the result of a chatbot API call on the prompt “${prompt} ${field_data}”.
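As a concrete illustration, the “${prompt} ${field_data}” template above maps directly onto Python’s `string.Template`. The names here are purely illustrative, not Baserow API:

```python
from string import Template

# The "${prompt} ${field_data}" template from the example above,
# expanded with Python's string.Template. Purely illustrative.
CELL_TEMPLATE = Template("${prompt} ${field_data}")

def build_prompt(prompt: str, field_data: str) -> str:
    """Interpolate the user's prompt with the source field's text."""
    return CELL_TEMPLATE.substitute(prompt=prompt, field_data=field_data)

print(build_prompt("Summarize in one sentence:", "Field A's paragraph of text."))
```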
So what can the Baserow team do to help facilitate this kind of workflow? It might be tempting to hardcode calls to certain APIs such as OpenAI and support them directly out of the box in Baserow (you’d have an OpenAI ChatGPT field type, for example, and the user would supply their own API key). And it might still be the right thing to do from a marketing and exposure point of view for Baserow (it might even be the perfect use case for what I’m describing below).
But what I would really like as a developer is an actual framework for running derived field calculations in background threads, because in my project the derived fields call other APIs, such as translation and dictionary lookup, which take 300ms to 2000ms to respond.
Here’s what I’d love to see (and I’m 100% open to doing some of the work myself, or to financially sponsoring some of the development with my meager funds):
- the FieldType base class would have support for “long-running computations”
- on row update, the derived field would be filled in at a later time (by one of the Celery workers), to avoid blocking the HTTP PATCH call for long periods of time (particularly useful if you’re running a GPT-4 query).
- very importantly, there would be support for adding a new field to an already populated table. Let’s say I have 100 rows of data and I add an OpenAI GPT field: it has to run on all 100 rows. A Celery task should be started which periodically updates the table data.
- care should be taken when adding such a derived field to a massive table. Let’s say we have 10k rows: do we want to let the user casually add a GPT-4 field? First, it’ll block one of the Celery workers for a long time; second, it’ll quickly exhaust the user’s OpenAI credits without them necessarily knowing about it.
- there should be a “retry” button in case one of the queries fails; or, if the user is not satisfied with the output, they could quickly retry and get a different result.
- as the cherry on top, there could be a visual progress indicator telling the user that background tasks are running.
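To make the wishlist concrete, here is a rough sketch of what the deferred-update flow could look like. None of these names exist in Baserow today; they are assumptions for illustration, and a real implementation would hand the work to a Celery worker rather than this thread-pool stand-in:

```python
# Hypothetical sketch of the proposed flow -- all names are illustrative,
# not real Baserow API. A real implementation would dispatch to a Celery
# worker instead of this in-process thread pool.
from concurrent.futures import ThreadPoolExecutor

class DerivedFieldType:
    long_running = True  # marker the framework could check before deferring

    def compute(self, source_value: str) -> str:
        # stand-in for a slow chatbot / translation API call
        return source_value.upper()

executor = ThreadPoolExecutor(max_workers=2)

def on_row_update(field, row):
    """Return immediately; fill the derived cell when compute() finishes."""
    future = executor.submit(field.compute, row["text"])
    future.add_done_callback(lambda f: row.update(derived=f.result()))
    return row  # the PATCH response does not wait for the slow call

row = {"text": "hello"}
on_row_update(DerivedFieldType(), row)
executor.shutdown(wait=True)  # in Baserow, a worker draining its queue
print(row)
```

The key point is the `long_running` marker plus the callback: the HTTP handler returns right away, and the cell is written back (and the frontend notified) whenever the slow call completes.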
I’ve tried to tackle these issues in my plugin. @nigel has been extremely helpful in pointing me in the right direction and writing some of the code. I’ve rolled out some solutions which work for me, but I suspect that sooner or later other people will want the same functionality for their own derived fields.
Here’s how I do it:
My field type, with the call to update all rows:
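(The original snippet isn’t shown above; the following is a minimal sketch of the general idea with hypothetical names, not the plugin’s actual code: the field type’s creation hook enqueues a background job instead of computing inline, so adding the field to a populated table returns immediately.)

```python
# Hypothetical sketch -- not the plugin's actual code.
task_queue = []  # stand-in for the Celery broker

def enqueue(task_name, **kwargs):
    """Record a job for a background worker to pick up later."""
    task_queue.append((task_name, kwargs))

class ChatbotFieldType:
    type = "chatbot"  # illustrative identifier

    def after_create(self, field_id, table_name):
        # the heavy lifting happens later, in a worker
        enqueue("update_all_rows", field_id=field_id, table=table_name)

ChatbotFieldType().after_create(42, "articles")
print(task_queue)
```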
The corresponding Celery task:
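(Again, the original task isn’t reproduced here. As a hedged sketch in plain Python, with the Celery parts noted in comments: a task that processes one chunk per invocation and then schedules the next chunk avoids holding a worker for one giant query.)

```python
# Hypothetical sketch of a chunked background task. In a real plugin this
# function would be decorated with Celery's @shared_task and re-enqueue
# itself with .delay() instead of recursing directly.
CHUNK_SIZE = 50

def update_table_chunk(rows, compute, offset=0):
    """Fill the derived cell for one chunk, then schedule the next chunk."""
    for row in rows[offset:offset + CHUNK_SIZE]:
        row["derived"] = compute(row["text"])
    if offset + CHUNK_SIZE < len(rows):
        # with Celery: update_table_chunk.delay(table_id, offset + CHUNK_SIZE)
        update_table_chunk(rows, compute, offset + CHUNK_SIZE)

rows = [{"text": f"row {i}"} for i in range(120)]
update_table_chunk(rows, str.upper)
```

Chunking also gives a natural hook for the progress indicator mentioned above: each completed chunk can report its offset to the frontend.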
I don’t have an answer for the following questions:
- what happens when a large number of users add derived fields on massive tables, clogging up the Celery queue?
- how to warn the user that adding derived fields on a massive table will exhaust their credits.
- how to retry (for example, in case of a timeout).
I suspect having support for these long-running calculations will open up a ton of other use cases. For example, at work (financial industry) we have a spreadsheet-like app to do pricing on derivative products. We could replace it with a custom Baserow instance that calls the pricing code once all pricing parameters are configured. Baserow would be a much, much nicer UI than what we have right now.