AI/LLM field type coming in 1.24 - I need to know EVERYTHING NOW!

I’m super excited about this:
https://www.reddit.com/r/baserow/comments/1bsyj9f/ai_within_baserow/

I’m going to assume this teaser hints at a new LLM/AI/Chatbot field type. If I’m wrong, then the rest of this post won’t make much sense.

I actually suggested it should be done that way some time ago here on this forum. So this is a hugely welcome development.

Only question I have: so far, all of Baserow’s derived field types (formulas) execute instantly. As far as I know, this is the first time Baserow will ship with an “asynchronous” field type.

How is this handled technically?

  • Is the LLM query done synchronously as part of the PATCH / POST request that updates the row? Possible, but not user-friendly due to the large lag.
  • Or is there an asynchronous callback, with a “processing” indicator on the field that is replaced by the LLM result when available?
  • How does back-population work when adding the field type to a large table?
  • Is there a way to “retry” in case of a timeout?
  • Does this pave the way for more field types derived from REST APIs?

In fact, you could already implement this in a Baserow plugin (described here: How to create a Baserow plugin for language translation with ChatGPT // Baserow), but Baserow has had limited support for fields that operate on asynchronous resources (and resources that can time out or fail in a transient way).

If this new LLM/AI field means that Baserow will now natively support field types derived from asynchronous REST APIs, then it will greatly benefit my own project, so I’m very excited about this.

@bram could you please check this out? :slightly_smiling_face:

Yes @bram, you’re maintaining the suspense here! :laughing:

Hey @lucw, thank you so much for your interest! I remember the topic you opened in the community about “slow” rendering of cells. We realize that it can take a couple of seconds to get a response from an LLM, and that it’s not feasible to generate 100k cell values if you have 100k rows when you create the field. We’re therefore implementing a more manual approach, which might not be what you’re looking for.

Take a look at this secret sneak peek. :wink:

I think this video will clear things up, but I still want to answer your questions:

Is the LLM query done synchronously as part of the PATCH / POST request that updates the row? Possible, but not user-friendly due to the large lag.

It’s not done synchronously. The user must manually agree to generate the value for the row, which they do by clicking the generate button. This is because there can be a financial, per-token cost when prompting the LLM.
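To make the cost concern concrete, here’s a back-of-the-envelope sketch (illustrative only, not Baserow code; the token price is an assumed example rate):

```python
# Back-of-the-envelope sketch (not Baserow code): estimate what mass
# generation would cost before allowing it. The price below is an assumed
# example rate, not a real OpenAI price.
import tiktoken

ASSUMED_PRICE_PER_1K_INPUT_TOKENS = 0.0005  # USD, illustrative only

def estimate_backfill_cost(prompt: str, row_count: int) -> float:
    """Rough input-token cost of running `prompt` once per row."""
    encoding = tiktoken.get_encoding("cl100k_base")
    tokens_per_row = len(encoding.encode(prompt))
    return row_count * tokens_per_row / 1000 * ASSUMED_PRICE_PER_1K_INPUT_TOKENS

# At this rate, a ~200-token prompt over 100k rows is already ~$10 in
# input tokens alone, before counting any output tokens.
```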

Or is there an asynchronous callback, with a “processing” indicator on the field that is replaced by the LLM result when available?

Running the prompt is done in an async background task (in Celery), and the cell will be in a loading state while this is happening. When it’s finished, the cell will be updated through a WebSocket event.
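In simplified form, the pattern looks roughly like this. This is a sketch of the general idea, not the actual Baserow code, and `call_llm`, `save_cell_value`, and `broadcast_cell_update` are hypothetical helpers:

```python
# Sketch of the pattern: run the prompt in a Celery task, then notify
# subscribed clients over a WebSocket. Not Baserow's actual implementation.
from celery import shared_task

def call_llm(prompt: str) -> str: ...              # hypothetical slow external call
def save_cell_value(row_id, field_id, value): ...  # hypothetical ORM persist
def broadcast_cell_update(table_id, row_id, field_id, **payload): ...  # hypothetical WebSocket push

@shared_task(bind=True)
def generate_ai_cell_value(self, table_id, row_id, field_id, prompt):
    # The frontend already shows the cell in a loading state at this point.
    try:
        value = call_llm(prompt)  # can take several seconds
    except Exception as exc:
        # Surface the failure so the frontend can offer the re-generate button.
        broadcast_cell_update(table_id, row_id, field_id, error=str(exc))
        raise
    save_cell_value(row_id, field_id, value)
    broadcast_cell_update(table_id, row_id, field_id, value=value)
```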

How does back-population work when adding the field type to a large table?

For now, the user needs to manually confirm that each row must be generated. In the future, we might make it possible to generate all the cell values immediately, but we would need to communicate very clearly that it’s going to massively prompt the LLM.
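Purely as a speculative sketch (nothing is decided here), opt-in back-population could be an explicit confirmation gate that fans out one task per row, reusing the task sketched above:

```python
# Speculative sketch of opt-in back-population: refuse to run without an
# explicit confirmation, then dispatch one Celery task per row.
def backfill_ai_field(table_id, field_id, row_ids, prompt, user_confirmed=False):
    if not user_confirmed:
        raise PermissionError("Mass generation requires explicit confirmation")
    for row_id in row_ids:
        generate_ai_cell_value.delay(table_id, row_id, field_id, prompt)
```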

Is there a way to “retry” in case of a timeout?

Yes, the user can always hit the generate or re-generate button.
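If we ever layer automatic retries on top of that manual button, Celery already has the building blocks. A sketch of one possible approach (an assumption, not a commitment):

```python
# Sketch: automatic retries for transient failures using Celery's built-in
# retry options. The manual re-generate button remains the fallback once
# max_retries is exhausted.
from celery import shared_task

def call_llm(prompt: str) -> str: ...  # hypothetical helper from the earlier sketch

@shared_task(bind=True,
             autoretry_for=(TimeoutError,),  # only retry transient failures
             retry_backoff=True,             # exponential backoff: 1s, 2s, 4s, ...
             max_retries=3)
def generate_with_retry(self, prompt: str) -> str:
    # If call_llm raises TimeoutError, Celery re-queues this task automatically.
    return call_llm(prompt)
```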

Does this pave the way for more field types derived from REST APIs?

It definitely can. The AI field can serve as inspiration, and maybe we’ll build a fully reusable system for these kinds of “slow” fields.
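To sketch what such a reusable system could look like (purely speculative, with hypothetical names): each REST-backed field type would implement only how to fetch its value, while the loading state, Celery dispatch, and WebSocket updates stay shared:

```python
# Speculative sketch of a reusable "slow field" abstraction.
from abc import ABC, abstractmethod

def call_llm(prompt: str) -> str: ...            # hypothetical helper
def call_translation_api(text: str) -> str: ...  # hypothetical helper

class AsyncFieldType(ABC):
    """Field type whose values come from a slow, fallible external resource."""

    @abstractmethod
    def fetch_value(self, row: dict) -> str:
        """Call the external API; may take seconds and may raise."""

class AIFieldType(AsyncFieldType):
    def __init__(self, prompt_template: str):
        self.prompt_template = prompt_template

    def fetch_value(self, row: dict) -> str:
        return call_llm(self.prompt_template.format(**row))

class TranslationFieldType(AsyncFieldType):
    def fetch_value(self, row: dict) -> str:
        return call_translation_api(row["text"])
```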

The preliminary code of the AI field can be found here: AI field (!2108) · Merge requests · Baserow / baserow · GitLab, in case you’re interested.

I hope this answers your questions!

I completely agree the manual approach is the right first step, particularly since, for LLMs, tweaking the prompt will be critical to obtaining a good result. The user will probably tweak the prompt a dozen times, test it on a single field, and then, when they’re happy, mass-generate.
Automatic mass-generation would probably be more frustrating than anything.

Having this “generate / regenerate” button and the associated channel (as well as error handling) was something I needed for my own project (Words) and was thinking of implementing myself, so many thanks for implementing this!

Just curious: how are you handling the billing part? Obviously baserow.io free users won’t be allowed to rack up huge OpenAI bills!

The billing on baserow.io will be usage-based. We’re not yet sure how to approach it, so in the initial version we’re going to allow setting an OpenAI API key via the web-frontend interface. This forces the billing to go through OpenAI instead.
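In practice that means building the client from a key the user saved, instead of a server-wide one. A minimal sketch using the official openai Python package (the key-lookup helper is hypothetical):

```python
# Sketch of "bring your own key" billing: requests made with a workspace's
# own key are billed to that key owner's OpenAI account.
from openai import OpenAI

def get_workspace_openai_key(workspace_id: int) -> str:
    # Hypothetical lookup of the key the workspace admin saved in the UI.
    return "sk-..."

def client_for_workspace(workspace_id: int) -> OpenAI:
    return OpenAI(api_key=get_workspace_openai_key(workspace_id))

completion = client_for_workspace(42).chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[{"role": "user", "content": "Summarize this row: ..."}],
)
print(completion.choices[0].message.content)
```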

OK, this all sounds good. Will the self-hosted version also let the user set an OpenAI API key?

Just tested it on baserow.io and it works beautifully! Congratulations on the 1.24 release!!