How to run GC when a filesystem has already filled up

rptaylor · February 28, 2024, 10:40pm

Hello,

We have another system where the /srv/cvmfs filesystem filled up, due to "CVMFS_AUTO_TAG_TIMESPAN ignored by gateway" (Issue #3017, cvmfs/cvmfs on GitHub). We really need a fix for this, especially considering that CVMFS may not safely handle a filesystem filling up: see "data corruption due to full filesystem, and snapshot fails with 'unexpected HTTP error code 200'" (Issue #3268, cvmfs/cvmfs on GitHub).

In this situation the system cannot run GC:

Starting soft-dev.computecanada.ca at Wed Feb 28 11:37:38 PST 2024
Running Garbage Collection 
failed to load repository manifest (6 - manifest signature is invalid)
Fail (6)!
ERROR from cvmfs_server gc!
Finished soft-dev.computecanada.ca at Wed Feb 28 11:37:38 PST 2024

The signature expired because snapshots had been failing for some time after the filesystem filled up. So in effect we need to reduce storage usage but cannot, because the storage is completely full. There is a lot of stale data that GC would remove if we could run it.

Aside from getting more storage, is there any workaround that would enable running GC in this state, or any other way to recover (short of deleting and replicating the repository again)?

Thanks!

dwd · February 29, 2024, 2:56pm

It’s going to have to have at least a little free space from somewhere. You’ll have to find something to delete, or maybe symlink to another filesystem temporarily. Then hopefully you can copy corrupted file(s) from a stratum 1, or do transaction/publish or possibly resign/resign -p.
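
Before retrying a snapshot or GC, it may help to confirm how much free space the filesystem really has. A minimal sketch using only the Python standard library: the `/srv/cvmfs` path comes from the original post, while the function names and the 5 GiB default threshold are illustrative assumptions, not a CVMFS requirement.

```python
import shutil

GIB = 1024 ** 3


def has_headroom(free_bytes, min_free_bytes):
    """Return True if at least min_free_bytes are available."""
    return free_bytes >= min_free_bytes


def report_free_space(path, min_free_bytes=5 * GIB):
    """Print free space on the filesystem holding `path`; return True if above the threshold."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    ok = has_headroom(usage.free, min_free_bytes)
    status = "ok" if ok else "below threshold"
    print(f"{path}: {usage.free / GIB:.1f} GiB free "
          f"of {usage.total / GIB:.1f} GiB ({status})")
    return ok


# Example (path taken from the post; the threshold is an arbitrary assumption):
# report_free_space("/srv/cvmfs", min_free_bytes=5 * GIB)
```

How much headroom a snapshot or GC run actually needs depends on the repository, so the threshold is something to tune rather than a known value.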

rptaylor · February 29, 2024, 8:51pm

Somehow the situation got mostly resolved. The day before, I had expanded the volume to use all the remaining storage. I ran GC then, but it still didn’t work. However, it was enough for a snapshot to begin. I thought the snapshot had failed and again consumed all remaining disk space, but it must have eventually succeeded with just barely enough space, allowing the subsequent automated GC to run.

However, after the GC finished, the storage usage was still unexpectedly high. I found 113 GB in the data/txn directory, mostly leftover cvmfs.***** files from years ago, which must be junk.
It would be nice if GC would also clean that up. Perhaps https://its.cern.ch/jira/browse/CVM-1462 could still take care of that. Are JIRA issues still looked at?

Thanks.
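
To see how much of a data/txn directory is old leftovers like those described above, a read-only scan along these lines could be used. This is only a sketch: it is not part of cvmfs_server, the function names and the 30-day age threshold are hypothetical, and the example path assumes the repository's data/txn directory lives under /srv/cvmfs. It only reports files and deletes nothing. Any actual cleanup would presumably need to happen while no snapshot or publish is running.

```python
import os
import time
from pathlib import Path


def find_stale_files(directory, min_age_days, now=None):
    """Return sorted (path, size_bytes) pairs for regular files
    in `directory` not modified within the last `min_age_days` days."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400
    stale = []
    for entry in os.scandir(directory):
        # Skip subdirectories and symlinks; only plain files are considered.
        if not entry.is_file(follow_symlinks=False):
            continue
        st = entry.stat(follow_symlinks=False)
        if st.st_mtime < cutoff:
            stale.append((Path(entry.path), st.st_size))
    return sorted(stale)


def summarize(stale):
    """Return (file_count, total_bytes) for the output of find_stale_files."""
    return len(stale), sum(size for _, size in stale)


# Example (hypothetical path and threshold):
# stale = find_stale_files("/srv/cvmfs/<repository>/data/txn", min_age_days=30)
# count, total = summarize(stale)
# print(f"{count} stale files, {total / 1024**3:.1f} GiB")
```

Reporting first and deleting by hand afterwards keeps the scan safe to run on a system that is already in a fragile state.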

vavolkl · March 1, 2024, 8:18am

Regarding JIRA issues, we do still look at them, but if there’s something we should prioritise, feel free to open a GitHub issue pointing to it. I just opened "Automatically clean out data/txn files after failures" (Issue #3523, cvmfs/cvmfs on GitHub) and will look into it.