Perform a catalog chmod?

I have a repo (http://cvmfs0-psu0.galaxyproject.org/cvmfs/data.galaxyproject.org) to which I've just uploaded a few fresh TB of data. I then noticed that some of the files were missing global read permissions, but an in-transaction chmod is taking days because some of the affected files are > 1 TB and it appears that the entire file has to be pulled into the publisher cache in order to perform the chmod (pretty surprising; how does this even work when the file size is >> the cache quota?).¹
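For concreteness, the in-transaction route I'm describing is just the standard flow below (the dataset path is illustrative):

```
# Open a transaction, fix the mode bits in the union mount, and publish.
# It's the chmod step that triggers copy-up of each affected file into the
# publisher cache, which is what's taking days for the >1 TB files.
cvmfs_server transaction data.galaxyproject.org
chmod -R o+rX /cvmfs/data.galaxyproject.org/some/dataset   # hypothetical path
cvmfs_server publish data.galaxyproject.org
```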

I started investigating catalog hacking to fix this. I located the correct catalog and am able to adjust the mode directly, but this changes the catalog hash, so I'd also need to hack every parent catalog up to the root, modify .cvmfspublished, and re-sign it, if I understand everything correctly.
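For the record, here's roughly what that manual route looks like (a sketch only; the table and column names are what I found in the catalog schema, the filename is made up, and I haven't attempted the re-signing steps):

```
# Assumes the nested catalog has already been fetched and decompressed to
# catalog.db. Directory entries live in the 'catalog' table; 'mode' holds
# the POSIX mode bits, so OR-ing in 4 sets o+r on the matching entry.
sqlite3 catalog.db "UPDATE catalog SET mode = mode | 4 WHERE name = 'bigfile.bam';"

# The hard part: the edited catalog now has a new content hash, so the hash
# recorded for it in the parent catalog (nested_catalogs table) must be
# updated, and so on up to the root catalog; then .cvmfspublished must be
# rewritten with the new root catalog hash and re-signed. This chain is
# exactly what makes the manual approach so fragile.
```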

I am sure this is a bad idea and definitely the wrong way to go about it, but I don’t see a cvmfs_swissknife command to do this more properly. Ultimately I’ll probably just let the chmod run to completion, but it’d be nice to know if there’s an alternative.

¹ As for how we'll actually be able to use TB-order files on clients when they're this slow to read: I haven't been able to verify yet, but I think it won't be so problematic, since these giant files are indexed and the tools that read them issue byte-range requests, which CVMFS handles quite well.

Check out the cvmfs_server catalog-chown command.

Oh, that's of course a little different from what you're asking for, but it should give some clues about how to go about a catalog chmod.
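For reference, usage is along these lines (the flags and map-file format are as I remember them from the docs, so verify against your version; the target ids are made up):

```
# uid.map / gid.map: one "<old id> <new id>" rule per line; '*' matches any id.
printf '* 1000\n' > uid.map
printf '* 1000\n' > gid.map
cvmfs_server catalog-chown -u uid.map -g gid.map data.galaxyproject.org
```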

This will only do a chown, correct? I need to do a chmod.

EDIT: saw your edit. I did look at it before to see whether I could adapt it to perform a chmod, but under the hood it's just calling cvmfs_swissknife migrate with the corresponding uid and gid map options, so I'd have to tear into cvmfs_swissknife migrate itself to implement this.

Probably a lot will depend on the access patterns. If many jobs read the same blocks of the files and don't read too many GB each, and especially if the reading is spread out over the life of the jobs, it could be OK. If jobs read different data, or share only some of it, that could be a big problem. In that case the CVMFS external data feature might be the way to go, using caches backed by high-speed file servers the way the OSG Data Federation does it.
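For anyone exploring that route, my understanding is that it combines marking the data as external on the publisher with pointing clients at the file servers; a sketch under those assumptions (parameter names as I recall them from the docs, hostnames invented):

```
# Server side (/etc/cvmfs/repositories.d/data.galaxyproject.org/server.conf):
# files published while this is set are catalogued in CVMFS but the payloads
# are served from elsewhere.
CVMFS_EXTERNAL_DATA=true

# Client side (e.g. /etc/cvmfs/config.d/data.galaxyproject.org.local):
# semicolon-separated list of servers to fetch the external payloads from.
CVMFS_EXTERNAL_URL="http://fs1.example.org/data;http://fs2.example.org/data"
```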

Thanks for the pointers about read performance; this will be helpful once we're testing.

I see. Maybe that wouldn't be so bad, though. Adding options to the cvmfs_swissknife migrate command to do chmod-like things might be the easiest way to go.

I'll have a look, although implementing it at that level will probably take me longer than just letting the chmod run. But it's good to know I'm not missing some way to do this with the existing tools.

I'll take a look at whether we can add this functionality; it seems like it should be possible with the CVMFS tooling. Just wanted to point out that as of 2.11, there is also a workaround on the client side by setting CVMFS_WORLD_READABLE=yes (see Add CVMFS_WORLD_READABLE (#3012) · Pull Request #3115 · cvmfs/cvmfs · GitHub).
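That is, on each client, something like the following (the exact config file location is illustrative):

```
# /etc/cvmfs/config.d/data.galaxyproject.org.local (client side, cvmfs >= 2.11)
# Present all files as world-readable regardless of the mode bits in the catalog.
CVMFS_WORLD_READABLE=yes
```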
