Automating AI-generated field population in Baserow

Hello Baserow community,

I’m excited about the new AI row capability in Baserow. I have a specific use case that requires automation, and I’d like to know if it’s possible to achieve this:

  1. Can the AI-generated fields be triggered automatically or through an API, rather than manually clicking the ‘generate’ button?
  2. If I have a row with multiple AI-generate buttons (say, 5 different AI-generated fields), is there a way to populate all these fields at once without having to click each button individually?
  3. Are there any existing methods or best practices for bulk-generating AI content across multiple rows and fields?

Any insights or documentation on automating this process would be greatly appreciated. Thanks in advance for your help!

Hi @pdhym,

Can the AI-generated fields be triggered automatically or through an API, rather than manually clicking the ‘generate’ button?

Yes, via the generate_table_ai_field_value endpoint.
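A minimal sketch of preparing such a call from a script, using only the Python standard library. The URL path `/api/database/fields/{field_id}/generate-ai-field-value/`, the `JWT` authorization scheme, and the `row_ids` body are assumptions to check against the generate_table_ai_field_value entry in your instance's API docs. The function only builds the request, so you can inspect it before spending any tokens.

```python
import json
import urllib.request


def build_generate_request(
    base_url: str, field_id: int, row_ids: list[int], jwt: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request asking Baserow to generate AI values.

    ASSUMPTION: the path and auth scheme below are illustrative; verify them
    against the generate_table_ai_field_value entry in your API docs.
    """
    url = f"{base_url.rstrip('/')}/api/database/fields/{field_id}/generate-ai-field-value/"
    body = json.dumps({"row_ids": row_ids}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"JWT {jwt}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_generate_request("https://api.baserow.io", 123, [1, 2, 3], "<your-jwt>")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```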

If I have a row with multiple AI-generate buttons (say, 5 different AI-generated fields), is there a way to populate all these fields at once without having to click each button individually?

Currently no.

Are there any existing methods or best practices for bulk-generating AI content across multiple rows and fields?

Currently no, but you can track or comment on the issue for that here: Allow regenerating all AI field cell values at once (#2492) · Issues · Baserow / baserow · GitLab.

Given the recent update in Baserow v2 regarding batch AI field generation, are there any updates to the generate_table_ai_field_value endpoint in the API?

Hi @dev-rd, multiple rows should be regenerated at once based on the row_ids attribute in the request body. The endpoint is currently under-documented; we will fix that here: Missing OpenAPI spec for request of generate_table_ai_field_value endpoint · Issue #4339 · baserow/baserow · GitHub

Hey @dev-rd,

the spec for that endpoint was actually wrong. We’re going to fix it.

Anyway, the new feature directly uses https://api.baserow.io/api/redoc/#tag/Jobs/operation/create_job with generate_ai_values as the type.
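A sketch of starting the same kind of job yourself and waiting for it, under several assumptions: `POST /api/jobs/` as the create_job path, a body containing `type`, `field_id`, and optionally `row_ids`, and `finished` / `failed` / `cancelled` as the terminal job states. The `field_id` key and the state names in particular are guesses, so compare them with the redoc entry and a real job response first. `fetch_job` is a hypothetical callable you would back with a request to the job detail endpoint.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# ASSUMPTION: job states that mean "no longer running"; check a real response.
TERMINAL_STATES = {"finished", "failed", "cancelled"}


def build_ai_job_request(
    base_url: str, jwt: str, field_id: int, row_ids: Optional[list[int]] = None
) -> urllib.request.Request:
    """Build a create_job request of type generate_ai_values (not sent here)."""
    # ASSUMPTION: the "field_id" key name; confirm it in the create_job spec.
    payload: dict = {"type": "generate_ai_values", "field_id": field_id}
    if row_ids is not None:
        payload["row_ids"] = row_ids
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/jobs/",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"JWT {jwt}", "Content-Type": "application/json"},
    )


def wait_for_job(
    fetch_job: Callable[[int], dict],
    job_id: int,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches a terminal state and return its final payload.

    Inspecting the returned "state" lets a script distinguish a job that
    finished from one that failed or was stopped.
    """
    while True:
        job = fetch_job(job_id)
        if job.get("state") in TERMINAL_STATES:
            return job
        sleep(interval_s)
```

Injecting `fetch_job` and `sleep` keeps the polling logic independent of any HTTP client, which also makes it easy to drive from a tool like N8N or a test.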

Thanks again for this!

However, I need some more clarification regarding the generate_table_ai_field_value endpoint and the BASEROW_AI_FIELD_MAX_CONCURRENT_GENERATIONS environment variable, and how the two interact.

The explanation in the docs for BASEROW_AI_FIELD_MAX_CONCURRENT_GENERATIONS says:

> If AI field values are recalculated in a large number (i.e. recalculating whole table, empty rows, or a selection of rows), this controls the number of concurrent requests issued to AI model to generate values.

  1. What exactly constitutes a “large number” for `BASEROW_AI_FIELD_MAX_CONCURRENT_GENERATIONS`?
    Does each generation of more than one row count against that limit?

  2. What is the expected behavior when the limit is exceeded? (Error message? Queuing?)

    On one test instance I set BASEROW_AI_FIELD_MAX_CONCURRENT_GENERATIONS=“1”, and this does not prevent concurrent generations (at least it does not prevent me from starting new generations beyond the limit via the “generate all AI values” modal, unless I have misunderstood what this limit is for).

    On another instance:
    I encounter an API key error when the user approaches or exceeds the limit (set to 12 in this example), although I admit this may be coincidental; I only tested it briefly with the old “select → right click → generate” method.
    This does not look like a genuine API rate limit: the AI API was working fine during testing, and usage was well below its rate limits.
    Let me know if this warrants a separate bug report.

  3. If I want to trigger the generate_table_ai_field_value endpoint, for example via N8N, what error message should I expect (or what should happen) when I try to generate more batches than BASEROW_AI_FIELD_MAX_CONCURRENT_GENERATIONS allows?
    (I’m asking to avoid spending tokens unnecessarily :slight_smile:)

  4. What is the recommended maximum value for BASEROW_AI_FIELD_MAX_CONCURRENT_GENERATIONS?
    Or rather:
    What does (or should) it depend on (available CPU threads? other environment variables?)
    If specs matter, assume 16 CPU threads and 32+ GB of RAM.

  5. In the API redoc spec, each “batch of selected rows” (the number of allowed items in the row_ids array) seems to be tied to, and limited by, the BATCH_ROWS_SIZE_LIMIT value.
    How does that play out with respect to the following?

    Firstly, for views exceeding the default of 200 items (or whatever the user sets there): does Baserow internally use this endpoint and split the generation into smaller batches accordingly? If so, do these batches count towards the limit for maximum concurrent generations?

    Secondly, how does that affect other types of AI field generation, both in terms of the concurrency limit and any queue handling (if applicable)? This may in fact be a more general question about how AI generation batches are currently handled, and how that translates to performance and the use of workers (Celery, Gunicorn, etc.).

  6. In the redoc documentation, this part is only specified under the 202 response:

    • row_ids (array of integers): The IDs of the rows to generate AI values for. If not provided, all rows in the view or table will be processed.
    • view_id (integer): The ID of the view to generate AI values for. If not provided, the entire table will be processed.
    • only_empty (boolean): Whether to only generate AI values for rows where the field is empty.

  • 6a) Does BATCH_ROWS_SIZE_LIMIT apply to custom Baserow API calls where a view ID is specified?
  • 6b) If I want to rely only on `row_ids` (without filtering by view), should I omit `view_id` entirely, or is there a specific value (e.g., `null` or `0`) I should provide?
    (specifically regarding: “The ID of the view to generate AI values for. If not provided, the entire table will be processed.”)

1. It’s not a limit; it represents the number of concurrent threads used to generate values. It defaults to 5, meaning Baserow can make up to 5 concurrent requests to your LLM to generate values.

Since it’s not a limit, most of the other questions should be clearer now.

6a) Yes.
6b) row_ids and view_id are mutually exclusive. If row_ids is provided, those IDs are used; otherwise, if view_id is provided, all the rows in that view are used.
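A client building request bodies could enforce these rules itself. A minimal sketch, assuming the default BATCH_ROWS_SIZE_LIMIT of 200 (use your instance’s value) and that omitting both `row_ids` and `view_id`, rather than sending `null` or `0`, is the way to target the whole table; `build_scope_bodies` is a hypothetical helper name.

```python
from typing import Iterator, Optional

# ASSUMPTION: 200 is the default BATCH_ROWS_SIZE_LIMIT; use your instance's value.
BATCH_ROWS_SIZE_LIMIT = 200


def build_scope_bodies(
    row_ids: Optional[list[int]] = None,
    view_id: Optional[int] = None,
    only_empty: bool = False,
) -> Iterator[dict]:
    """Yield request bodies for one AI generation, following the rules above.

    - row_ids and view_id are mutually exclusive, so passing both is an error.
    - Explicit row_ids are split into chunks of at most BATCH_ROWS_SIZE_LIMIT.
    - With neither, both keys are omitted (assumed to mean the whole table).
    """
    if row_ids is not None and view_id is not None:
        raise ValueError("row_ids and view_id are mutually exclusive")
    if row_ids is None:
        body: dict = {"only_empty": only_empty}
        if view_id is not None:
            body["view_id"] = view_id
        yield body
        return
    for start in range(0, len(row_ids), BATCH_ROWS_SIZE_LIMIT):
        yield {
            "row_ids": row_ids[start : start + BATCH_ROWS_SIZE_LIMIT],
            "only_empty": only_empty,
        }
```

For example, 450 explicit row IDs produce three bodies holding 200, 200, and 50 IDs, each of which would be sent as a separate request.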

@davide Your previous reply clarified a lot, but we’ve run into a separate issue: there seems to be a limit on how many column generations can be running at the same time. It does not seem to apply to manually selected batches, so am I missing some setting?

Is there a way to increase this limitation?

It could come at the expense of generation speed or per-job concurrency.
The point is that, ideally, someone could schedule the generation of, say, 10-20 columns and come back a few hours later to find all the values nicely generated (or leave the generation running overnight).

Secondly, since these seem to run as “jobs”, the BASEROW_JOB_SOFT_TIME_LIMIT limit applies.
Are there any Baserow-specific considerations against setting it to something like 12 hours (other than the risk of a runaway job running for 12 hours)?

Also, if generation takes longer than that limit allows, the UI reports completion even though the job actually failed (or simply stopped).

I’d appreciate your expertise on this.
Thanks!

At the moment, there is a limit of 3 concurrent generations per user, if the generation is on an entire table or a view.

This does not seem to apply to manually selected batches.

Correct, mainly for backward compatibility reasons.

Is there a way to increase this limitation?

This isn’t possible yet. Currently, all jobs are scheduled immediately and run as soon as workers are available; there’s no queuing mechanism. Per-user limits only cap how many resources a single user can consume at once, not when jobs execute. We’d need to add proper job queuing to support this.
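Until server-side queuing exists, one possible workaround is to queue on the client: keep at most 3 table- or view-wide generation jobs running and start the next column only when one finishes. A minimal sketch, where `start_job` and `job_is_running` are hypothetical callables you would back with the create_job endpoint and a job status check, and the limit of 3 is the per-user value stated above.

```python
import time
from collections import deque
from typing import Callable, Iterable

PER_USER_JOB_LIMIT = 3  # per-user limit for table/view-wide generations, per the reply above


def run_columns_in_waves(
    field_ids: Iterable[int],
    start_job: Callable[[int], int],
    job_is_running: Callable[[int], bool],
    poll_interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[int]:
    """Start one generation job per field, never exceeding PER_USER_JOB_LIMIT at once.

    Returns the job IDs in the order they were started.
    """
    pending = deque(field_ids)
    running: list[int] = []
    started: list[int] = []
    while pending or running:
        # Drop jobs that are no longer running (finished, failed or stopped).
        running = [job_id for job_id in running if job_is_running(job_id)]
        # Fill free slots with the next fields in the queue.
        while pending and len(running) < PER_USER_JOB_LIMIT:
            job_id = start_job(pending.popleft())
            running.append(job_id)
            started.append(job_id)
        if running:
            sleep(poll_interval_s)
    return started
```

Left running overnight, a loop like this would work through 10-20 columns three at a time, at the cost of holding a client process open for the whole duration.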

Secondly, since these seem to run as “jobs”, the BASEROW_JOB_SOFT_TIME_LIMIT limit applies.
Are there any Baserow-specific considerations against setting it to something like 12 hours?

The main consideration is stuck jobs. If a background worker is killed and fails to update the database, the job remains in “running” state until the timeout expires. During that window, you can’t start another job of the same type (e.g., CSV export), so with a 12-hour timeout, that job type could be blocked for your user for 12 hours.

That said, the timeout isn’t workload-aware, so you want enough margin for your longest legitimate jobs to complete. It’s a tradeoff: shorter timeouts mean faster recovery from stuck jobs, but risk terminating long-running work.
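To pick a timeout with that margin in mind, a rough back-of-the-envelope estimate can help. This sketch assumes generation time is dominated by model latency, that values are produced in waves of the configured number of concurrent requests, and that the soft time limit is expressed in seconds. The 10,000 rows, 3-second latency and 1.5x margin are placeholders, not measurements; replace them with figures from your own jobs.

```python
import math


def estimate_job_seconds(rows: int, avg_latency_s: float, concurrency: int) -> float:
    """Rough job duration: rows are processed in waves of `concurrency` requests."""
    waves = math.ceil(rows / concurrency)
    return waves * avg_latency_s


def suggested_soft_time_limit(
    rows: int, avg_latency_s: float, concurrency: int, margin: float = 1.5
) -> int:
    """Estimated duration plus a safety margin, rounded up to whole seconds."""
    return math.ceil(estimate_job_seconds(rows, avg_latency_s, concurrency) * margin)


if __name__ == "__main__":
    # Placeholder workload: 10,000 rows, ~3 s per model call, 5 concurrent calls.
    secs = estimate_job_seconds(10_000, 3.0, 5)  # 2000 waves x 3 s = 6000 s
    print(f"{secs:.0f} s, about {secs / 3600:.2f} h")
    print("suggested limit:", suggested_soft_time_limit(10_000, 3.0, 5), "s")
```

Because the timeout applies per job, the estimate should be made for the largest single column-generation job you expect, not for the whole overnight batch.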

We have some improvements on this in our backlog, but I can’t say now exactly when they will be implemented.