AWS ECS Installation Issues

Please fill in the questionnaire below.

Technical Help Questionnaire

Have you read and followed the instructions at: *READ ME FIRST* Technical Help FAQs - #2 by nigel ?

Answer: Yes

How have you self-hosted Baserow?

AWS ECS

What are the specs of the service or server you are using to host Baserow?

I’ve deployed a task with the recommended settings:

4 vCPU, 8 GB RAM

Which version of Baserow are you using?

baserow/baserow:1.23.2
All in One

How have you configured your self-hosted installation?

I’ve used a guide from: Install on AWS // Baserow

What commands, if any, did you use to start your Baserow server?

Nothing

Describe the problem

I’ve created an AWS ECS cluster; however, my service and task are always failing.
What has been done:
1 - ALB with 443 and 80
2 - Target Group to ECS through port 80
3 - Environment variables configured
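
To see why the ALB keeps marking the task unhealthy, the target health state and failure reason can be queried with the AWS CLI. This is a sketch: the target group ARN below is a placeholder, not the real one from this setup.

```shell
# Placeholder ARN; prints each target's health state and the reason/description
# the ALB recorded for the last failed health check.
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:eu-central-1:123456789012:targetgroup/baserow-tg/0123456789abcdef \
  --query 'TargetHealthDescriptions[].[Target.Id,TargetHealth.State,TargetHealth.Reason,TargetHealth.Description]' \
  --output table
```

The `Reason` field (for example `Target.ResponseCodeMismatch` vs `Target.Timeout`) distinguishes a wrong health-check path or status code from a network/security-group problem.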


Here is the JSON definition of the task

{
   "taskDefinitionArn": "arn:aws:ecs:eu-central-1:AWSACCOUNTNUMBERHIDDEN:task-definition/MANUAL:1",
   "containerDefinitions": [
      {
         "name": "BASEROW-MANUAL",
         "image": "baserow/baserow:1.23.2",
         "cpu": 4096,
         "memory": 8192,
         "portMappings": [
            {
               "containerPort": 443,
               "hostPort": 443,
               "protocol": "tcp"
            },
            {
               "containerPort": 80,
               "hostPort": 80,
               "protocol": "tcp"
            }
         ],
         "essential": true,
         "environment": [
            {
               "name": "DISABLE_VOLUME_CHECK",
               "value": "yes"
            },
            {
               "name": "BASEROW_PUBLIC_URL",
               "value": "https://--.com" // hidden, but a valid FQDN
            },
            {
               "name": "DOWNLOAD_FILE_VIA_XHR",
               "value": "1"
            },
            {
               "name": "BASEROW_IMAGE_TYPE",
               "value": "all-in-one"
            },
            {
               "name": "REDIS_URL",
               "value": "rediss://nde-baserow-poc-ech.0us1dt.0001.euc1.cache.amazonaws.com:6379/0"
            },
            {
               "name": "AWS_STORAGE_BUCKET_NAME",
               "value": "nde-baserow-poc-s3b"
            }
         ],
         "mountPoints": [],
         "volumesFrom": [],
         "secrets": [
            {
               "name": "DATABASE_HOST",
               "valueFrom": "arn:aws:secretsmanager:eu-central-1:AWSACCOUNTNUMBERHIDDEN:secret:CDKBASEROWPOCNDEBASEROWPOCR-x7gNxVSMqygW-WSTQ55:host::"
            },
            {
               "name": "DATABASE_NAME",
               "valueFrom": "arn:aws:secretsmanager:eu-central-1:AWSACCOUNTNUMBERHIDDEN:secret:CDKBASEROWPOCNDEBASEROWPOCR-x7gNxVSMqygW-WSTQ55:dbname::"
            },
            {
               "name": "DATABASE_PASSWORD",
               "valueFrom": "arn:aws:secretsmanager:eu-central-1:AWSACCOUNTNUMBERHIDDEN:secret:CDKBASEROWPOCNDEBASEROWPOCR-x7gNxVSMqygW-WSTQ55:password::"
            },
            {
               "name": "DATABASE_PORT",
               "valueFrom": "arn:aws:secretsmanager:eu-central-1:AWSACCOUNTNUMBERHIDDEN:secret:CDKBASEROWPOCNDEBASEROWPOCR-x7gNxVSMqygW-WSTQ55:port::"
            },
            {
               "name": "DATABASE_USER",
               "valueFrom": "arn:aws:secretsmanager:eu-central-1:AWSACCOUNTNUMBERHIDDEN:secret:CDKBASEROWPOCNDEBASEROWPOCR-x7gNxVSMqygW-WSTQ55:username::"
            }
         ],
         "dockerLabels": {},
         "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
               "awslogs-group": "baserow",
               "awslogs-region": "eu-central-1",
               "awslogs-stream-prefix": "logs"
            }
         },
         "systemControls": []
      }
   ],
   "family": "MANUAL",
   "taskRoleArn": "arn:aws:iam::AWSACCOUNTNUMBERHIDDEN:role/nde-baserow-poc-iam",
   "executionRoleArn": "arn:aws:iam::AWSACCOUNTNUMBERHIDDEN:role/CDK-BASEROW-POC-NDEBASEROWPOCECSTSKExecutionRole8EE-UtdybVkoMi8p",
   "networkMode": "awsvpc",
   "revision": 1,
   "volumes": [],
   "status": "ACTIVE",
   "requiresAttributes": [
      {
         "name": "com.amazonaws.ecs.capability.logging-driver.awslogs"
      },
      {
         "name": "ecs.capability.execution-role-awslogs"
      },
      {
         "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"
      },
      {
         "name": "ecs.capability.secrets.asm.environment-variables"
      },
      {
         "name": "com.amazonaws.ecs.capability.task-iam-role"
      },
      {
         "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
      },
      {
         "name": "ecs.capability.task-eni"
      }
   ],
   "placementConstraints": [],
   "compatibilities": [
      "EC2",
      "FARGATE"
   ],
   "requiresCompatibilities": [
      "FARGATE"
   ],
   "cpu": "4096",
   "memory": "8192",
   "registeredAt": "2024-04-03T12:43:36.659Z",
   "registeredBy": "arn:aws:sts::AWSACCOUNTNUMBERHIDDEN:assumed-role/AWSReservedSSO_NDE-AWSPractice-Admin_318fbfc360223a3b/P3250064",
   "tags": []
}

Here are the logs from CloudWatch. I don’t really understand what the root cause is; the only thing I can see is Caddy and port 3000.

2024-04-03 13:18:53,836 INFO reaped unknown pid 206 (exit status 0)
[CELERY_WORKER][2024-04-03 13:18:53]
[CELERY_WORKER][2024-04-03 13:18:55] worker: Warm shutdown (MainProcess)
2024-04-03 13:18:55,831 INFO stopped: celeryworker (exit status 0)
2024-04-03 13:18:55,831 INFO reaped unknown pid 216 (exit status 0)
2024-04-03 13:18:56,832 INFO stopped: backend (terminated by SIGTERM)
[CADDY][2024-04-03 13:18:56] {"level":"info","ts":1712150251.0400605,"logger":"tls","msg":"finished cleaning storage units"}
[CADDY][2024-04-03 13:18:56] {"level":"info","ts":1712150336.8329391,"msg":"shutting down apps, then terminating","signal":"SIGTERM"}
[CADDY][2024-04-03 13:18:56] {"level":"warn","ts":1712150336.8329701,"msg":"exiting; byeee!! 👋","signal":"SIGTERM"}
[CADDY][2024-04-03 13:18:56] {"level":"info","ts":1712150336.8330188,"logger":"http","msg":"servers shutting down with eternal grace period"}
[CADDY][2024-04-03 13:18:56] {"level":"info","ts":1712150336.8332982,"logger":"admin","msg":"stopped previous server","address":"localhost:2019"}
[CADDY][2024-04-03 13:18:56] {"level":"info","ts":1712150336.8333192,"msg":"shutdown complete","signal":"SIGTERM","exit_code":0}
2024-04-03 13:18:56,834 INFO stopped: caddy (exit status 0)
2024-04-03 13:18:57,836 INFO stopped: baserow-watcher (terminated by SIGTERM)
2024-04-03 13:18:57,836 INFO reaped unknown pid 202 (exit status 0)
2024-04-03 13:18:57,836 INFO waiting for processes to die
2024-04-03 13:18:57,837 INFO stopped: processes (terminated by SIGTERM)

Hi @Unity, would you be able to share more logs? In the logs you shared, I can already see the exit codes, but it would be useful to see the output that comes before them. That might help in identifying the problem.


Hi @bram, thank you for your response.

I’ve managed to create a “Baserow” in ECS; however, it is failing on the ALB side once the target becomes “unhealthy”.

Here are the logs, filtered by “404”:

[EXPORT_WORKER][2024-04-05 08:30:00] [2024-04-05 08:30:00,018: INFO/MainProcess] Task baserow.contrib.database.fields.tasks.run_periodic_fields_updates[1175c53f-86f2-404c-a9b2-0bacbeb2bc72] received
[EXPORT_WORKER][2024-04-05 08:30:00] [2024-04-05 08:30:00,066: ERROR/ForkPoolWorker-1] Task baserow.contrib.database.fields.tasks.run_periodic_fields_updates[1175c53f-86f2-404c-a9b2-0bacbeb2bc72] raised unexpected: ProgrammingError('relation "core_workspace" does not exist\nLINE 1: ...ts_taken_updated_at", "core_workspace"."now" FROM "core_work...\n ^\n')
[EXPORT_WORKER][2024-04-05 08:32:32] [2024-04-05 08:32:32,748: WARNING/ForkPoolWorker-1]
[BACKEND][2024-04-05 08:32:36] 127.0.0.1:40602 - "GET /api/builder/domains/published/by_name/10.0.148.39/ HTTP/1.1" 404
[EXPORT_WORKER][2024-04-05 08:32:39] [2024-04-05 08:32:39,552: WARNING/ForkPoolWorker-1]
[BACKEND][2024-04-05 08:32:41] 127.0.0.1:50476 - "GET /api/builder/domains/published/by_name/10.0.148.39/ HTTP/1.1" 404
[EXPORT_WORKER][2024-04-05 08:32:42] [2024-04-05 08:32:42,901: WARNING/ForkPoolWorker-1]
[BACKEND][2024-04-05 08:32:44] 127.0.0.1:50490 - "GET /api/builder/domains/published/by_name/10.0.148.39/ HTTP/1.1" 404

Also, here are the logs filtered by “error”:

[CADDY][2024-04-05 08:29:11] {"level":"warn","ts":1712305751.3467083,"logger":"tls","msg":"unable to get instance ID; storage clean stamps will be incomplete","error":"open /baserow/data/caddy/data/caddy/instance.uuid: no such file or directory"}
[BEAT_WORKER][2024-04-05 08:29:26] Sleeping for 15 before starting beat to prevent startup errors.
[BACKEND][2024-04-05 08:29:51] Applying auth.0007_alter_validators_add_error_messages... OK
[EXPORT_WORKER][2024-04-05 08:30:00] psycopg2.errors.UndefinedTable: relation "core_workspace" does not exist
[EXPORT_WORKER][2024-04-05 08:30:00] psycopg2.errors.UndefinedTable: relation "core_notificationrecipient" does not exist

Observations:

When I go to my public DNS name, it redirects me to the home page, where I can register myself and open the dashboard. There I can see that all health statuses are OK; however, the ALB still reports the target as unhealthy.

Grace period extended to 1800 seconds (30 minutes)
Ports 80, 443, 3000, and 8000 are open, so there is no security group issue (tested with other images)

I’ve used Cloudron to install Baserow on my AWS EC2 instance and it made the whole process significantly simpler. Maybe it helps?

Cloudron definitely makes things much simpler!

@Unity, if I understand you correctly, the Baserow instance is working as expected, but it’s becoming unhealthy and the load balancer doesn’t like that. I don’t have that much experience with AWS, but how is the health check being performed there? Is that something that the load balancer does?

It’s also a bit strange that you’re getting errors relation "core_workspace" does not exist. This indicates that either the database migrations haven’t run, which is definitely going to cause problems, or that a bigger problem is at play here.
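
If migrations are the issue, one way to verify is from inside the running container. This is a sketch assuming ECS Exec is enabled on the service; the cluster name and task ID are placeholders, and the exact path of the `baserow.sh` wrapper may differ between image versions.

```shell
# Open an interactive shell in the running all-in-one container
# (placeholders for cluster and task; requires ECS Exec to be enabled).
aws ecs execute-command \
  --cluster baserow-cluster \
  --task 0123456789abcdef \
  --container BASEROW-MANUAL \
  --interactive \
  --command "/bin/bash"

# Inside the container, the all-in-one image wraps Django management commands;
# showmigrations lists which migrations have and have not been applied.
./baserow.sh backend-cmd manage showmigrations
```

If many migrations show as unapplied, the task is probably being killed before the initial migration run finishes, which would also explain the `core_workspace` errors.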


Hi @bram

The issue has been resolved-ish by tricking the ALB Target Group: I changed the expected status code from 200 to the range 200-499, which is not the correct approach for production. Can you please clarify on which port the health check should respond? I’ve tried 80, 443, 3000, and 8000, and all give a 404 (Not Found). I’ve also observed this error:


[CADDY][2024-04-05 17:15:40] {"level":"warn","ts":1712337340.9474046,"logger":"tls","msg":"unable to get instance ID; storage clean stamps will be incomplete","error":"open /baserow/data/caddy/data/caddy/instance.uuid: no such file or directory"}

And

[BACKEND][2024-04-05 17:15:59] 127.0.0.1:58276 - "GET /api/builder/domains/published/by_name/10.0.131.4/ HTTP/1.1" 404

I have no clue why /api/builder/domains/published/by_name/ is being requested with the outside IP instead of an internal one (127.0.0.1, localhost, or 0.0.0.0).

And this type of 404 error appears everywhere.

All other components (except SMTP) have been configured: Redis, PostgreSQL, S3.
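
For reference, the matcher workaround described above corresponds to this AWS CLI call (the target group ARN is a placeholder; for production you would keep a single `HttpCode=200` matcher pointed at a path that actually returns 200):

```shell
# Placeholder ARN; widens the accepted health-check status codes.
# This is the workaround, not a proper fix: it makes 404s count as "healthy".
aws elbv2 modify-target-group \
  --target-group-arn arn:aws:elasticloadbalancing:eu-central-1:123456789012:targetgroup/baserow-tg/0123456789abcdef \
  --matcher HttpCode=200-499
```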

The problem seems to be related to your ALB trying to reach your container/server under a different hostname. The backend of Baserow accepts requests with a hostname equal to the hostname of BASEROW_PUBLIC_URL, but if your load balancer uses the internal IP address of your server, then the request will be rejected because that’s not an allowed address.

Would it be possible to configure a different hostname for the health check with the ALB?
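
If the ALB has to keep health-checking by IP (ALB health checks send the target's IP as the Host header), one option worth trying, assuming your Baserow version supports the documented BASEROW_EXTRA_ALLOWED_HOSTS variable, is to add that IP to the task definition's environment. Note the IP is just an example and will change whenever the task gets a new ENI in awsvpc mode, so this is fragile:

```json
{
   "name": "BASEROW_EXTRA_ALLOWED_HOSTS",
   "value": "10.0.148.39"
}
```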

The request to /api/builder/domains/published/by_name/ is related to Caddy checking whether it should automatically obtain an SSL certificate for that domain. This is because the application builder built into Baserow can publish applications to domains. It’s expected that this happens, and that it fails with a 404 when the load balancer makes a request with an unknown hostname.