How do I optimize my self-hosted install? Having issues with large API requests timing out or crashing the server

Hello Baserow Community!

Thank you for taking the time to read my post. I recently set up a self-hosted instance of Baserow and have been absolutely blown away by its power and intuitiveness, so much so that I have spent the last several days migrating my existing setup (which consists of a few locally-hosted MongoDB databases, a ton of CSV files, and an ungodly number of Google Sheets and scripts).

The thousand-foot overview of my structure is as follows: three tables in one database. Table one has about 50,000 rows and is updated via PATCH requests through the API. Table two has about 5,000 rows and is updated the same way. Tables one and two have about a dozen formulas in total, nothing too complicated, just some if/then statements and a few ‘greater’ and ‘sum’ formulas. Table three looks up values from tables one and two and runs about half a dozen formulas to compare the data pulled from the first two tables. That data is then retrieved through a GET request via the API, and notes are added to text fields within table three using PATCH requests via the API. In due course the plan is to add thousands of rows to each table, but I am running into issues with resource management.
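To make the workflow concrete, here is a minimal sketch of the read/compare/write loop (the URL, token, table ID, and the ‘Name’/‘Notes’ field names are placeholders, and the batch endpoint and parameters are my reading of the API docs, so please double-check them):

```python
import requests

BASE_URL = "https://baserow.example.com"   # placeholder for my instance URL
TOKEN = "YOUR_DATABASE_TOKEN"              # placeholder database token
TABLE_THREE_ID = 3                         # placeholder table ID
HEADERS = {"Authorization": f"Token {TOKEN}", "Content-Type": "application/json"}

# Pull a page of rows from table three with human-readable field names.
resp = requests.get(
    f"{BASE_URL}/api/database/rows/table/{TABLE_THREE_ID}/",
    headers=HEADERS,
    params={"user_field_names": "true", "size": 200, "page": 1},
)
resp.raise_for_status()
rows = resp.json()["results"]

# Build note updates from the comparison results and write them back in one batch.
items = [{"id": row["id"], "Notes": f"checked {row['Name']}"} for row in rows]
resp = requests.patch(
    f"{BASE_URL}/api/database/rows/table/{TABLE_THREE_ID}/batch/",
    headers=HEADERS,
    params={"user_field_names": "true"},
    json={"items": items},
)
resp.raise_for_status()
```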

All of this is hosted on a Hetzner VPS with 16 vCPUs, 32 GB of RAM, and a 160 GB SSD. I am running Ubuntu with Docker and EasyPanel to host Baserow, Redis, and PostgreSQL.

These are the custom variables I currently have for Baserow:
BASEROW_AMOUNT_OF_GUNICORN_WORKERS=16
BASEROW_AMOUNT_OF_WORKERS=16
BATCH_ROWS_SIZE_LIMIT=9999
BASEROW_ROW_PAGE_SIZE_LIMIT=9999

THE PROBLEM: I have no problem making GET requests that pull 9,999 records from any of the three tables, but when I make PATCH requests, as few as 200 records can cause requests to tables one and two to time out, and as few as 50 records can cause requests to table three to either time out or outright crash the entire server. Yesterday I had to redeploy the whole server from a backup because my PostgreSQL database hit a fatal checkpoint error after such a crash.

When a request times out, I see that PostgreSQL is frequently using 100% of the CPU capacity and sometimes as much as 20GB of RAM, at least according to EasyPanel.
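For diagnosis, a quick way to see what PostgreSQL is actually doing while a request stalls would be something like this (a rough sketch; the connection details are placeholders for the EasyPanel PostgreSQL container, and it assumes psycopg2 is installed):

```python
# Minimal sketch: list non-idle PostgreSQL backends and their current queries.
import psycopg2

conn = psycopg2.connect(
    host="localhost", port=5432,           # placeholder connection details
    dbname="baserow", user="baserow", password="secret",
)
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT pid,
               now() - query_start AS runtime,
               state,
               wait_event_type,
               left(query, 120) AS query
        FROM pg_stat_activity
        WHERE state <> 'idle'
        ORDER BY runtime DESC NULLS LAST;
        """
    )
    for row in cur.fetchall():
        print(row)
conn.close()
```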

Patching records one by one does not cause a crash, but it slows things down considerably, so I don’t know whether that’s the most effective way to accomplish my goal.
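A possible middle ground would be splitting each batch into smaller chunks with a short pause between them, roughly like this (a sketch only; the chunk size and pause are arbitrary, and the batch endpoint is the same one used in the sketch above):

```python
import time
import requests

BASE_URL = "https://baserow.example.com"   # placeholder
TOKEN = "YOUR_DATABASE_TOKEN"              # placeholder
HEADERS = {"Authorization": f"Token {TOKEN}", "Content-Type": "application/json"}

def patch_in_chunks(table_id, items, chunk_size=50, pause=1.0):
    """Send batch PATCH updates in smaller chunks instead of one large request."""
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]
        resp = requests.patch(
            f"{BASE_URL}/api/database/rows/table/{table_id}/batch/",
            headers=HEADERS,
            params={"user_field_names": "true"},
            json={"items": chunk},
        )
        resp.raise_for_status()
        time.sleep(pause)  # give PostgreSQL room to finish formula/dependency updates
```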

MY ASK:
What can I do to optimize my self-hosted installation to improve performance through the API? Is the issue a general lack of resources (i.e., do I need a better VPS or a dedicated server), or are there configuration changes I can make?

Thank you so much!

Hello @andy,

Thanks a lot for your feedback; it is really appreciated. :blush:

It’s a bit difficult to say how to optimize Baserow in general. We’re already trying to optimize queries as much as possible. Still, if a row has many dependent rows, a single update can result in a lot of queries to update every dependent row, and how expensive that is depends on how the tables are structured.

One piece of advice could be to take a look at this document: Working with metrics and logs as a developer // Baserow

It’s technical, but if you add better metrics and logs to your instance, it’ll be easier to understand what is causing these performance issues.

Another option is to send us the structure of your database and any other information worth sharing so we can reproduce your specific scenario. We’ll try to reproduce your case with some random data.