Importing large databases from AirTable

Hello, all. I’m using Baserow 1.22.2 on a DO droplet with 128 GB RAM and a 1 TB disk. I’m trying to move some large Airtable bases to Baserow, but the imports fail with the following exception in the logs:

[2024-01-18 15:23:23,043: ERROR/ForkPoolWorker-14] Task baserow.core.jobs.tasks.run_async_job[498618e7-1701-48d6-9fd0-32b1c5cd363a] raised unexpected: LargeZipFile('Central directory offset would require ZIP64 extensions')
Traceback (most recent call last):
  File "/app/code/backend/src/baserow/contrib/database/airtable/handler.py", line 330, in download_files_as_zip
    files_zip.writestr(file_name, response.content)
  File "/usr/lib/python3.10/zipfile.py", line 1816, in writestr
    with self.open(zinfo, mode='w') as dest:
  File "/usr/lib/python3.10/zipfile.py", line 1519, in open
    return self._open_to_write(zinfo, force_zip64=force_zip64)
  File "/usr/lib/python3.10/zipfile.py", line 1611, in _open_to_write
    self._writecheck(zinfo)
  File "/usr/lib/python3.10/zipfile.py", line 1726, in _writecheck
    raise LargeZipFile(requires_zip64 +
zipfile.LargeZipFile: Zipfile size would require ZIP64 extensions

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/code/env/lib/python3.10/site-packages/celery/app/trace.py", line 451, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/app/code/env/lib/python3.10/site-packages/celery/app/trace.py", line 734, in __protected_call__
    return self.run(*args, **kwargs)
  File "/app/code/backend/src/baserow/core/jobs/tasks.py", line 34, in run_async_job
    JobHandler().run(job)
  File "/app/code/backend/src/baserow/core/jobs/handler.py", line 59, in run
    return job_type.run(job, progress)
  File "/app/code/backend/src/baserow/contrib/database/airtable/job_types.py", line 114, in run
    database = action_type_registry.get(
  File "/app/code/backend/src/baserow/contrib/database/airtable/actions.py", line 56, in do
    database = AirtableHandler.import_from_airtable_to_workspace(
  File "/app/code/backend/src/baserow/contrib/database/airtable/handler.py", line 609, in import_from_airtable_to_workspace
    baserow_database_export, files_buffer = cls.to_baserow_database_export(
  File "/app/code/backend/src/baserow/contrib/database/airtable/handler.py", line 537, in to_baserow_database_export
    user_files_zip = cls.download_files_as_zip(
  File "/app/code/backend/src/baserow/contrib/database/airtable/handler.py", line 327, in download_files_as_zip
    with ZipFile(files_buffer, "a", ZIP_DEFLATED, False) as files_zip:
  File "/usr/lib/python3.10/zipfile.py", line 1312, in __exit__
    self.close()
  File "/usr/lib/python3.10/zipfile.py", line 1839, in close
    self._write_end_record()
  File "/usr/lib/python3.10/zipfile.py", line 1914, in _write_end_record
    raise LargeZipFile(requires_zip64 +
zipfile.LargeZipFile: Central directory offset would require ZIP64 extensions

I see the following code in ./backend/src/baserow/contrib/database/airtable/handler.py

6ccf1ef7f2 (Bram Wiepjes     2022-02-16 14:56:21 +0000 327)         with ZipFile(files_buffer, "a", ZIP_DEFLATED, False) as files_zip:

That last ‘False’ seems to explicitly disable ZIP64 extensions, which are necessary once the archive grows beyond roughly 2 GB. I changed it to ‘True’ and my imports now appear to succeed, but I’m not sure whether there’s a better approach or whether this is just a bug in Baserow. If it is a bug, is there a way to get the fix upstreamed so I don’t have to run a custom copy?
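For reference, this is roughly what my local change looks like, a minimal sketch of the call in download_files_as_zip using the variable names from the traceback above. The fourth positional argument to ZipFile is allowZip64, so spelling it out as a keyword makes the intent clearer:

    from zipfile import ZIP_DEFLATED, ZipFile

    # allowZip64=True lets the archive grow past zipfile's ~2 GB ZIP64
    # threshold instead of raising LargeZipFile when the archive is closed.
    with ZipFile(files_buffer, "a", ZIP_DEFLATED, allowZip64=True) as files_zip:
        files_zip.writestr(file_name, response.content)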

Thank you!

Hi @HappyFriday!

I personally don’t see any reason why bigger ZIP files shouldn’t be allowed, certainly for self-hosters (our SaaS offering can always apply its own limits).

Do you want to contribute this change and open an MR?