I have a repo (http://cvmfs0-psu0.galaxyproject.org/cvmfs/data.galaxyproject.org) to which I've just uploaded a few fresh TB of data. I then noticed that some of the files were missing global read permissions, but an in-transaction chmod is taking days: some of the affected files are > 1 TB, and it appears that the entire file has to be pulled into the publisher cache in order to perform the chmod (pretty surprising, and how does this work when the file size is >> the cache quota?).¹
I started investigating catalog hacking to fix this. I located the correct catalog and am able to adjust the mode, but doing so changes the catalog hash, so, if I understand everything correctly, I'd also need to hack every parent catalog up to the root, modify .cvmfspublished, and re-sign it.
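For context, the edit itself is trivial, which is what makes the rest of the chain so frustrating. CVMFS catalogs are SQLite files whose `catalog` table stores a POSIX `mode` per entry, so the in-catalog chmod I was attempting amounts to something like the sketch below (run here against a mock table with a cut-down schema; column names are from my reading of the cvmfs source, and this is illustrative, not a supported workflow, since editing a catalog in place invalidates its content hash):

```python
import sqlite3

# Mock of the relevant slice of a CVMFS catalog: real catalogs key entries
# by an MD5 path hash and carry many more columns; here only `mode` matters.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE catalog (md5path_1 INTEGER, md5path_2 INTEGER, "
            "name TEXT, mode INTEGER)")
# A file missing global read: 0o100640 = regular file, rw-r-----
con.execute("INSERT INTO catalog VALUES (1, 2, 'bigfile.dat', ?)",
            (0o100640,))

# The chmod: grant o+r on every entry that lacks it, without ever touching
# the (possibly multi-TB) data chunks the entry points at.
con.execute("UPDATE catalog SET mode = mode | ? WHERE mode & ? = 0",
            (0o004, 0o004))

mode, = con.execute(
    "SELECT mode FROM catalog WHERE name = 'bigfile.dat'").fetchone()
print(oct(mode))  # 0o100644
```

The catch, as noted above, is everything downstream of that UPDATE: the modified catalog hashes differently, so every ancestor catalog's reference to it breaks, up through .cvmfspublished and the signature.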
I am sure this is a bad idea and definitely the wrong way to go about it, but I don’t see a cvmfs_swissknife command to do this more properly. Ultimately I’ll probably just let the chmod run to completion, but it’d be nice to know if there’s an alternative.
¹ As to how we'll actually be able to use TB-order files on clients when reading is this slow, I haven't been able to verify yet, but I think it won't be so problematic: these giant files are indexed, and the tools that read them issue byte-range requests, which CVMFS serves quite well.
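The access pattern the footnote is counting on is ordinary HTTP range reads: the reader asks for a small window of a huge file and never pulls the whole thing. A minimal sketch, with a stand-in local server that honors Range headers (with CVMFS the client library does the equivalent against the repository's chunked data; the server, data, and URL here are all hypothetical):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for a multi-TB indexed file.
DATA = bytes(range(256)) * 16

class RangeHandler(BaseHTTPRequestHandler):
    """Toy handler that serves exactly the requested byte range."""

    def do_GET(self):
        start, end = self.headers["Range"].removeprefix("bytes=").split("-")
        body = DATA[int(start):int(end) + 1]
        self.send_response(206)  # Partial Content
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

srv = HTTPServer(("127.0.0.1", 0), RangeHandler)
threading.Thread(target=srv.serve_forever, daemon=True).start()

# The reader fetches 8 bytes at offset 1024 rather than the whole file.
req = urllib.request.Request(
    f"http://127.0.0.1:{srv.server_port}/bigfile",
    headers={"Range": "bytes=1024-1031"})
chunk = urllib.request.urlopen(req).read()
print(len(chunk))  # 8
srv.shutdown()
```

The cost on the client then scales with the bytes actually read, not the file size, which is why indexed access should dodge the whole-file problem the chmod is hitting on the publisher.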
This will only do a chown, correct? I need to do a chmod.
EDIT: saw your edit. I did look at that before to see if I could adapt it to perform a chmod, but under the hood it's just calling cvmfs_swissknife migrate with the corresponding uid and gid map options, so I'd have to tear apart cvmfs_swissknife migrate itself to implement this.
Probably a lot will depend on the access patterns. If many jobs are reading the same blocks of the files, they don't read too many GB each, and especially if the reading is spread out over the life of the jobs, it could be OK. If jobs are reading different data, or there's only partial sharing, that could be a big problem. In the partial-sharing case, the CVMFS external server feature might be the way to go, backing the caches with high-speed file servers the way the OSG Data Federation does.