API Webhooks NOT WORKING!

Please fill in the questionnaire below.

Technical Help Questionnaire

Have you read and followed the instructions at: *READ ME FIRST* Technical Help FAQs - #2 by nigel ?

Answer: YES

Hi, I am experiencing problems with the webhook service.

I am creating new rows in my table, but a webhook is NOT being generated!!!

Is the service down???

Hi @Thorby, we’re sorry to hear you’re having webhook issues. Could you please let us know whether you’re using the hosted or self-hosted version of Baserow, and which version number you’re running if you’re self-hosting?

I’m experiencing the same issue using the SaaS version. This morning, the webhooks were working but with a delay of over 10 minutes. Now, it’s been more than an hour, and they’re not triggering at all.

I’m using the hosted version.

Hey @Thorby, @Ivan, I’m currently looking into why this is happening.

It looks like we had an unusually high number of row update API requests from a single user since 10:00 UTC this morning. This flooded the background workers with search index update tasks, which run in the same worker as the webhooks. I’ve rate limited this user to prevent it from happening again, and I’ve started more background workers to clear the huge backlog of tasks.

Once that’s done, your webhooks should start working as expected. Please also note that there is a webhook queue. There can only be one concurrent webhook call. If another one is triggered before the previous one has finished, it’s added to a queue. That queue holds a maximum of 5000 webhook calls. If it exceeds that number, they’re dropped.
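As a rough mental model of the queue behaviour described above, here is a minimal Python sketch. It is not Baserow's actual implementation: the class and method names are hypothetical, and it assumes that the newest call is the one discarded once the queue is full, which this thread does not state.

```python
from collections import deque

MAX_QUEUE_SIZE = 5000  # the limit quoted in this thread


class WebhookQueue:
    """Toy model: one call in flight, the rest wait in a bounded queue."""

    def __init__(self, max_size=MAX_QUEUE_SIZE):
        self.max_size = max_size
        self.in_flight = None   # the single call currently being delivered
        self.pending = deque()  # calls waiting for their turn
        self.dropped = 0        # how many calls were discarded

    def submit(self, call):
        """Return 'sent', 'queued' or 'dropped' for a new webhook call."""
        if self.in_flight is None:
            self.in_flight = call
            return "sent"
        if len(self.pending) >= self.max_size:
            # ASSUMPTION: the newest call is the one that gets discarded.
            self.dropped += 1
            return "dropped"
        self.pending.append(call)
        return "queued"

    def finish(self):
        """Mark the in-flight call as done and start the next one, if any."""
        self.in_flight = self.pending.popleft() if self.pending else None
        return self.in_flight
```

With `max_size=2`, for example, the first call is sent, the next two are queued, and a fourth call submitted before `finish()` is counted as dropped.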

Because the background workers were still processing tasks within 10 minutes, this didn’t trigger any notifications on our side.

Hey @Thorby, @Ivan, I can confirm that the background worker queue is now empty, and is picking tasks up immediately again. The webhooks work fast for me now. Can you confirm that it’s working as expected for you now?

Thank you so much for reporting this.

Hi Bram,

Many thanks for the update, it would appear all is working again.

However, this is very concerning as it seems this issue can and will recur regularly.

It’s also very concerning that “they are dropped” when the webhook queue exceeds 5000. Please can you point us to the documentation that details what is dropped and when (e.g. old webhooks dropped/new webhooks not queued)? Hopefully the same documentation will detail how we can recover/reprocess “dropped webhooks”. This is critical information, especially when we use Baserow webhooks to power/manage 3rd party systems.

Note: Without some way to recover/reprocess “dropped webhooks”, I would have to periodically examine every Baserow record, on a number of tables, via the API, and then, if necessary, “sync” all records with 3rd party systems, also via the API. This in itself could trigger a “high usage” outage, similar to the one experienced yesterday!!
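For illustration, the periodic check described in the note could look roughly like the Python sketch below. Everything here is hypothetical: `fetch_page` and `push_row` stand in for whatever API clients are in use, rows are compared as plain values, and deleted rows are not handled. The pause between pages is there to keep the check itself from becoming a usage spike.

```python
import time


def rows_needing_sync(baserow_rows, synced_rows):
    """Return rows whose content differs from what the third party last saw.

    Both arguments map row id -> row content (any comparable value).
    """
    return {
        row_id: row
        for row_id, row in baserow_rows.items()
        if synced_rows.get(row_id) != row
    }


def reconcile(fetch_page, push_row, synced_rows, pause_seconds=1.0):
    """Walk the table page by page, pushing only rows that changed.

    fetch_page(page) -> dict of rows for that page, empty dict when done
    (hypothetical callable wrapping your API client).
    push_row(row_id, row) -> sends one row to the third-party system.
    Returns the number of rows pushed.
    """
    page, pushed = 1, 0
    while True:
        rows = fetch_page(page)
        if not rows:
            return pushed
        for row_id, row in rows_needing_sync(rows, synced_rows).items():
            push_row(row_id, row)
            synced_rows[row_id] = row  # remember what was last sent
            pushed += 1
        page += 1
        time.sleep(pause_seconds)  # pace requests to avoid a usage spike
```

Only rows that actually differ are pushed, so a run after a short outage touches far fewer third-party records than a full resync would.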

Yes! It’s working perfectly again. Thank you!

Hi Bram,

It looks like we’ve run into an issue again.

I made changes to my table at 10:13 and 10:31. It is currently 10:35 and the webhook has not been generated.

My webhook is simply configured:
[screenshot: webhook configuration]

Hey @Thorby, I’m really sorry about that. It looks like we had another spike in API requests from another user. I’ve just taken some steps to fix it, so it should work as expected now. We’re working on some monitoring improvements on our side, so that we get notified when this happens.

Thanks Bram,

Please see my previous post about the issue recurring.

Thanks again.

Hey @Thorby, I saw your message about the problem recurring. Have you tried again in the last couple of minutes? We just fixed another problem related to it.

Hi Bram,

Yes I have been testing and all seems OK for now.

Does this mean the error will keep repeating until Baserow notices high usage from a user and manually adjusts their constraints?

Also, please can you point me in the direction of the documentation that describes the technical failovers/constraints/features of webhooks (e.g. dropped webhooks, queued webhooks, batched webhooks, etc.) so we can code our applications accordingly?

Many thanks.

Hey @Thorby, glad to hear that it’s okay now.

This is a difficult problem to avoid, to be honest. One solution would be to further limit the number of API requests each user can make, but that would prevent users from temporarily peaking in API requests, and that is also not something we want.

I’ve just introduced new queue size health checks here: Celery queue size health check endpoints (!2940) · Merge requests · Baserow / baserow · GitLab. Once that code is merged and deployed, we can add a new service to our status page https://status.baserow.org/, allowing us (and you) to closely monitor if too many tasks are being queued.

We’re also going to introduce auto-scaling of our background task workers based on queue size. If the queue size increases, we will then automatically spin up more servers to handle the load accordingly, so that other users like you are not affected by a temporary increase.
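To illustrate the idea of scaling on queue size, here is a minimal Python sketch of such a decision rule. It is not our actual configuration: the function name, the tasks-per-worker ratio and the worker bounds are made-up values chosen only for the example.

```python
import math


def desired_workers(queue_size, tasks_per_worker=100, min_workers=2, max_workers=20):
    """Pick a worker count proportional to the backlog, within fixed bounds.

    All default values are illustrative assumptions, not real settings.
    """
    needed = math.ceil(queue_size / tasks_per_worker)
    return max(min_workers, min(max_workers, needed))
```

With these example numbers, an empty queue keeps the minimum of 2 workers, a backlog of 1000 tasks asks for 10, and a very large backlog is capped at 20.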

A combination of both should allow us to be notified early when it happens, and allow everyone to keep using Baserow as usual.