SQLite error on publish

jowhite · May 1, 2025, 6:43am

We’re trying to do our initial publish on a stratum0 of about 1.1TB over 9824776 inodes. We’re getting past the initial scan and upload as well as the automatic nested catalog creation but we hit an SQLite error immediately after that. I’m not exactly sure which sqlite db this is indicating issues with.

Automatic creation of nested catalog in '/rocky-8.x86_64/manual/modules/tools/matlab/r2024b/sys'
Failed to create common properties table
SQLite said: 'unable to open database file'
cvmfs_swissknife: /home/sftnight/jenkins/workspace/CvmfsFullBuildDocker/CVMFS_BUILD_ARCH/docker-x86_64/CVMFS_BUILD_PLATFORM/cc8/build/BUILD/cvmfs-2.12.7/cvmfs/catalog_mgr_rw.cc:776: void catalog::WritableCatalogManager::CreateNestedCatalog(const string&): Assertion `NULL != new_catalog_db' failed.
/bin/cvmfs_server: line 4128: 21490 Killed                  $user_shell "$sync_command"
Synchronization failed

Executed Command:
cvmfs_swissknife sync -u /cvmfs/[redacted] -s /var/spool/cvmfs/[redacted]/scratch/current -c /var/spool/cvmfs/[redacted]/rdonly -t /var/spool/cvmfs/[redacted]/tmp -b b3cb98405af072157f638159772afd259eef7ac9-shake128 -r S3,/var/spool/cvmfs/[redacted]/tmp,[redacted]@/etc/cvmfs/s3.conf -w https://[redacted]/[redacted]/[redacted] -o /var/spool/cvmfs/[redacted]/tmp/manifest -e shake128 -Z none -N [redacted] -K /etc/cvmfs/keys/[redacted].pub -L -D generic-2025-05-01T01:07:12Z -f overlayfs -p -l 4194304 -a 8388608 -h 16777216 -A -i -U 10240

dwd · May 1, 2025, 4:39pm

I expect that’s a bigger single publish than anybody has attempted so far. Could you break it up into smaller publish operations?

Dave

jowhite · May 1, 2025, 4:55pm

So that’s an ongoing question I’ve had, I’ve been playing with configurations (VM sizing of stratum0, s3 vs posix backend, s3 implementations, etc). I hit a scaling limit in different places with different errors each time. I have, indeed, tried piecemeal ingest for my initial but that scares me about the stability/maintainability of the tool as a whole so I’ve been trying all sorts of things to get the initial to just work.

Is there a generalized sizing guide anywhere? I’ve seen the docs on “if you’re hosting datasets, do x with garbage collection, y with grafting” and so on but none that really detail “you’re going to want x amount of memory per object” or “injest generally can handle y inodes”

dwd · May 1, 2025, 6:14pm

Not that I’m aware of, sorry. We certainly have stratum0s that manage repositories of that size, but they have been built up over time. I think it’s probably not a good idea to publish more than a few hundred gigabytes at a time.

In my experience the best performance for stratum0s is with a physical machine with locally attached disk. I know CERN does it other ways but it seems to me they end up needing much more resources in the end.

Our one 64-core 96G ram stratum 0 server at Fermilab hosts 37 repositories with currently 7.7T compressed & deduplicated cvmfs files and 1.1G inodes. Publishes use temporary space before compression & deduplication in /var/spool/cvmfs and that is sized at 720G. We’ve never had anybody complain about poor performance.

Dave