Reclaiming space

Hello,

since I installed cernvm-fs at our institution to handle how we distribute software to our workstations and laptops, things have gone so smoothly that now I’m very rusty with its administration… :slight_smile: But I think it is about time I started to worry about the disk space taken by the repositories on both the Stratum-0 and the Stratum-1s.

First, I would like to see how much space is used at present. Is there any cvmfs utility to report some stats on this, or is it down to using the system tools to check the space in the relevant directories?

Second, when I distribute a new version of the repository I always do it like: cvmfs_server publish -a "$tag" -m "$message" sie_u22.iac.es, but now we have 131 named snapshots and I would like to do some cleaning up. I know I can do cvmfs_server tag -r, but this would be a bit painful to do one by one. Is there a way to just say “keep the latest 50 tags”? And once I do that on the Stratum-0 machine, to reclaim space on the Stratum-1s, do I have to do anything manually or would deletion happen automatically?

In case it is relevant, our configuration file reads:

# Created by cvmfs_server.

CVMFS_AUTO_TAG_TIMESPAN="2 weeks ago"
CVMFS_HASH_ALGORITHM=shake128
CVMFS_AUTOCATALOGS=true
CVMFS_ENFORCE_LIMITS=true
CVMFS_FORCE_REMOUNT_WARNING=false
CVMFS_FILE_MBYTE_LIMIT=2000

CVMFS_CREATOR_VERSION=143
CVMFS_REPOSITORY_NAME=sie_u22.iac.es
CVMFS_REPOSITORY_TYPE=stratum0
CVMFS_USER=root
CVMFS_UNION_DIR=/cvmfs/sie_u22.iac.es
CVMFS_SPOOL_DIR=/var/spool/cvmfs/sie_u22.iac.es
CVMFS_STRATUM0=http://localhost/cvmfs/sie_u22.iac.es
CVMFS_UPSTREAM_STORAGE=local,/srv/cvmfs/sie_u22.iac.es/data/txn,/srv/cvmfs/sie_u22.iac.es
CVMFS_USE_FILE_CHUNKING=true
CVMFS_MIN_CHUNK_SIZE=4194304
CVMFS_AVG_CHUNK_SIZE=8388608
CVMFS_MAX_CHUNK_SIZE=16777216
CVMFS_UNION_FS_TYPE=overlayfs
CVMFS_COMPRESSION_ALGORITHM=default
CVMFS_EXTERNAL_DATA=false
CVMFS_AUTO_TAG=true
CVMFS_GARBAGE_COLLECTION=false
CVMFS_AUTO_REPAIR_MOUNTPOINT=true
CVMFS_ASYNC_SCRATCH_CLEANUP=true
CVMFS_PRINT_STATISTICS=false
CVMFS_UPLOAD_STATS_DB=false
CVMFS_UPLOAD_STATS_PLOTS=false
CVMFS_IGNORE_XDIR_HARDLINKS=true

Hi Angel,

Good to hear that operations have been smooth!

First, I would like to see how much space is used at present. Is there any cvmfs utility to report some stats on this, or is it down to using the system tools to check the space in the relevant directories?

No, there’s no cvmfs utility to check the disk space. All data that makes up the repository resides in a single location (defined by CVMFS_UPSTREAM_STORAGE, usually /srv/cvmfs/ if it’s on a local disk, or an S3 bucket), so it should be easy to check with system tools: either du -h -d 1 /srv/cvmfs/sie_u22.iac.es/ or s3cmd du <bucket>/
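The same check can be wrapped in a small helper to run on the Stratum-0 and on each Stratum-1. This is only a sketch: the function name repo_usage is made up for illustration, and it assumes the repositories live on a local disk under one storage root (defaulting to /srv/cvmfs as mentioned above).

```shell
#!/bin/sh
# Hypothetical helper: print the on-disk size in KiB of every directory
# directly under a storage root. The default root is an assumption taken
# from the local-disk layout described above.
repo_usage() {
    storage_root="${1:-/srv/cvmfs}"
    for repo in "$storage_root"/*/; do
        # Skip the unexpanded pattern when the root is empty or missing.
        [ -d "$repo" ] || continue
        du -sk "$repo"
    done
}

# Example (not executed here):
#   repo_usage /srv/cvmfs
```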

Is there a way to just say “keep the latest 50 tags”?

Not really. That exists only for auto-generated tags - if you set CVMFS_AUTO_TAG=true it will generate tags like generic-2025-08-27T12:44:46Z for every publication. You can then set CVMFS_AUTO_TAG_TIMESPAN="1 month ago" to clean up autotags older than a month (when doing new publications). For named snapshots, it should be possible to parse the output of cvmfs_server tag -l and do similar deletions in a script. The autotag cleanup essentially does the same thing. Let me open an issue to add an option to the server tools eventually.

And once I do that on the Stratum-0 machine, to reclaim space on the Stratum-1s, do I have to do anything manually or would deletion happen automatically?

Nothing is deleted from the cvmfs repository backend stores by default, even when you delete files in a publication. You have to run the garbage collection (cvmfs_server gc) on both the Stratum-0 and the Stratum-1s (it’s usually done in a cron job). I see you have gc disabled:

CVMFS_GARBAGE_COLLECTION=false

so first you need to change that to true, and then do one (empty) publication for the setting to take effect.

Cheers,
Valentin

It’s not difficult to make your own script to delete tags. You can see all existing tags with

cvmfs_server tag -xl sie_u22.iac.es | awk '{print $1}'

Dave

I was actually doing this a few minutes ago, and my one liner to remove the oldest 50 named snapshots ended up being a bit ugly but it did the job :slight_smile:

for tag in $(sudo cvmfs_server tag -lx sie_u22.iac.es | tac | head -n 50 | awk '{print $1}'); do sudo cvmfs_server tag -f -r "$tag" sie_u22.iac.es; done
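For the original “keep the latest 50 tags” wording, a variant could select everything beyond the newest N instead of a fixed count of old tags. This is a sketch under assumptions: tags_to_prune is a made-up helper, the listing is assumed to be newest first (as the use of tac above suggests), and trunk and trunk-previous are assumed to be reserved tags that should be left alone.

```shell
#!/bin/sh
# Hypothetical helper: read tag names on stdin (one per line, assumed newest
# first) and print every name after the first $1, i.e. the removal candidates.
tags_to_prune() {
    keep="$1"
    # Drop the tags assumed to be reserved, then skip the first $keep lines.
    grep -v -x -e 'trunk' -e 'trunk-previous' | tail -n "+$((keep + 1))"
}

# Example (not executed here): keep the 50 newest named snapshots.
#   sudo cvmfs_server tag -lx sie_u22.iac.es | awk '{print $1}' | tags_to_prune 50 |
#       while read -r tag; do sudo cvmfs_server tag -f -r "$tag" sie_u22.iac.es; done
```

Printing the output of tags_to_prune first, before piping it into the removal loop, gives a cheap dry run.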

Thanks a lot Valentin. That was really useful. I cleaned up the named snapshots list, and for the moment I think I can probably live without garbage collection: there are very few deletions in the repository, which mostly just gets new packages added, so I don’t think I would gain much from it. The lazy in me tells me to postpone this for the time being…