Cvmfs_shrinkwrap caching

Hello,

I have a workflow in which I am downloading a large volume of files from a CVMFS repository using the cvmfs_shrinkwrap utility, then storing those files in a squashfs image.

On my machine with limited disk space, I find I am running out of space on the device although I should have plenty of space to support the files that I intend to download. It appears that all of the files are being duplicated in various caches that I don’t believe I need.

I tried to set the cache quota limit with CVMFS_QUOTA_LIMIT, but I get the following error:

LibCvmfs version 2.8, revision 30
Initialization failed: Failure: libcvmfs supports only unmanaged exclusive cache or alien cache. (2)

Is there no way to use the quota limit with cvmfs_shrinkwrap?

If I don’t need a proper cache, how would you recommend I handle this situation to keep disk space utilization to a minimum?

Any help is greatly appreciated.

Here are more details of my configuration.

  • Repository configuration file:
CVMFS_REPOSITORIES=[Repository Name]
CVMFS_REPOSITORY_NAME=[Repository Name]
CVMFS_CLAIM_OWNERSHIP=yes
CVMFS_CACHE_BASE=/shrinkwrap_cvmfs/src-cache
CVMFS_SERVER_URL="http://cvmfs-s1goc.opensciencegrid.org:8001/cvmfs/@fqrn@"
CVMFS_HTTP_PROXY=DIRECT
CVMFS_KEYS_DIR=/etc/cvmfs/keys/opensciencegrid.org
CVMFS_CATALOG_WATERMARK=64
CVMFS_MAX_RETRIES=3
CVMFS_TIMEOUT=60
CVMFS_TIMEOUT_DIRECT=60
  • cvmfs and cvmfs-shrinkwrap were installed with
sudo yum install -y https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest.noarch.rpm \
    && sudo yum install -y cvmfs.x86_64 cvmfs-shrinkwrap.x86_64

The cvmfs_shrinkwrap utility does fill an unbounded client cache, and unfortunately there is currently no quick way to change this behavior. It should not use more space than that, though, so your total space consumption should be at most about twice the size of the content you need. In the destination directory, the shrinkwrap utility uses hard links to deduplicate files.

You can try to create the final directory tree piecewise from several subdirectories, cleaning the client cache between runs. At that point, though, it may be simpler to find more disk space.

Cheers,
Jakob
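
For readers who want to try the piecewise approach, here is a rough sketch rather than a tested recipe. It assumes one spec file per subdirectory tree and uses the cvmfs_shrinkwrap options as they appear in the CernVM-FS documentation (-r, -f, -t, -d); check `cvmfs_shrinkwrap --help` for your version. The function name shrinkwrap_piecewise and the SHRINKWRAP_BIN variable are made up for illustration. It only removes regular files from the cache and leaves the directory layout alone, on the assumption that this is the safer choice.

```shell
#!/usr/bin/env bash
# Sketch: export a repository piece by piece and empty the client cache
# between runs. shrinkwrap_piecewise and SHRINKWRAP_BIN are hypothetical names.

SHRINKWRAP_BIN="${SHRINKWRAP_BIN:-cvmfs_shrinkwrap}"

# Usage: shrinkwrap_piecewise REPO CONFIG DEST_BASE CACHE_DIR SPEC_FILE...
shrinkwrap_piecewise() {
    local repo="$1" config="$2" dest="$3" cache="$4"
    shift 4
    local spec
    for spec in "$@"; do
        # One run per spec file; each spec file is assumed to cover one
        # subdirectory tree of the repository.
        "$SHRINKWRAP_BIN" -r "$repo" -f "$config" -t "$spec" -d "$dest" || return 1
        # No shrinkwrap process is running at this point, so nothing in the
        # cache is in use. Remove regular files only and keep the directories.
        if [ -n "$cache" ] && [ -d "$cache" ]; then
            find "$cache" -type f -delete
        fi
    done
}
```

A call might look like `shrinkwrap_piecewise myrepo.example.org myrepo.config /export/cvmfs /shrinkwrap_cvmfs/src-cache part1.spec part2.spec`, where every argument except the cache path from the configuration above is a placeholder.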

Thanks Jakob.

It may be possible for me to get more disk space to work around the issue.

Do you think it would be possible and relatively easy to manually clear the cache of a single process? I’ve tried a naive approach of running a daemon to blindly clear the cache every minute, but it looks like I might be deleting files before CVMFS is done with them. Is there a clever way to check which files are ready to be removed or somehow force it to not use the cache?

Deleting files from the cache at random should, in fact, work. CVMFS keeps open the file descriptors that it needs, so the worst that can happen is that you unlink an open file and its space is not released until CVMFS closes it.
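
To illustrate the unlink behavior described here, the sketch below has two parts. The first shows the general POSIX semantics: a file that is removed while a descriptor is open stays readable through that descriptor. The second is a hypothetical helper, prune_cache, that a periodic cleanup job could call. The age filter (only files not modified for a few minutes) is an assumption added for caution and is not something the thread prescribes; `-mmin` and `-delete` are find extensions available in GNU and BSD find.

```shell
#!/usr/bin/env bash
# Part 1: removing a file that is still open. The name disappears, but the
# data remains readable through the open descriptor until it is closed.
demo_unlink_open_file() {
    local f
    f="$(mktemp)"
    echo "still readable" > "$f"
    exec 3< "$f"    # hold a read descriptor, standing in for a process using the file
    rm -f "$f"      # unlink: disk space is freed only once descriptor 3 is closed
    cat <&3         # prints the file content
    exec 3<&-       # close the descriptor; now the space is released
}

# Part 2 (hypothetical helper): delete cache files older than MIN_AGE minutes.
# Usage: prune_cache CACHE_DIR [MIN_AGE_MINUTES]
prune_cache() {
    local cache="$1" min_age="${2:-5}"
    # Refuse to run on an empty or missing path.
    [ -n "$cache" ] && [ -d "$cache" ] || return 1
    find "$cache" -type f -mmin "+$min_age" -delete
}
```

Something like `prune_cache /shrinkwrap_cvmfs/src-cache 5` could then run from a loop or cron job alongside the shrinkwrap process. Whether five minutes is a sensible threshold depends on the workload and is only a guess.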