I have a rather large repository, divided into directories by micro-architecture and compiler. One of the directories is larger than the others, so I decided to use the “Automatic Management of Nested Catalogs” feature. However, that leads to messages like the following:
Couldn't create a new nested catalog in any subdirectory of '/2022.1/apps/linux-centos7-x86_64/gcc-9.4.0' even though currently it is overflowed
Looking at the output of
├─ 295093 /2022.1/apps/linux-centos7-x86_64/gcc-9.4.0
How big of a problem is the above? Is there any way that I can resolve it without having to resort to splitting that directory up further?
I think that an occasional nearly 300K-entry catalog, as long as it isn’t the root catalog, is probably not going to be very harmful and you can ignore it.
Would it help if I bump CVMFS_AUTOCATALOGS_MAX_WEIGHT to 200000? As far as I can tell, it defaults to 100000. I am thinking that if the deeper-level catalogs are larger, then the upper-level catalog would be smaller. If I change that setting, how do I force a regeneration of the catalogs?
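For reference, a minimal sketch of what that change might look like, assuming the setting goes in the repository's server configuration (typically /etc/cvmfs/repositories.d/<repo>/server.conf, which uses shell-style KEY=value lines). The value 200000 is simply the one proposed above, not a recommendation, and whether it actually shrinks the parent catalog is untested here.

```shell
# Hypothetical fragment of /etc/cvmfs/repositories.d/<repo>/server.conf
# Enable automatic creation of nested catalogs on publish.
CVMFS_AUTOCATALOGS=true
# Entry count above which a catalog is considered overflowed
# (assumption: raised from the 100000 default mentioned above).
CVMFS_AUTOCATALOGS_MAX_WEIGHT=200000
```

The new value would only be picked up by a later transaction/publish cycle.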
I don’t know much about the autocatalogs. I don’t trust them and so avoid them.
You probably don’t have any catalogs under the gcc-9.4.0 directory, do you?
The way to regenerate the catalogs is to do a transaction/publish cycle.
Okay; good to know they are not trusted. I do have catalogs under that directory. Here is a snippet:
├─ 91902 /2022.1/apps/linux-centos7-x86_64
│ ├─ 295093 /2022.1/apps/linux-centos7-x86_64/gcc-9.4.0
│ │ ├─ 1170 /2022.1/apps/linux-centos7-x86_64/gcc-9.4.0/wxwidgets-3.0.2-jb5g72t
│ │ ├─ 1780 /2022.1/apps/linux-centos7-x86_64/gcc-9.4.0/tk-8.6.11-eivwzz4
│ │ ├─ 15313 /2022.1/apps/linux-centos7-x86_64/gcc-9.4.0/texlive-live-kzx3ktv
│ │ ├─ 91587 /2022.1/apps/linux-centos7-x86_64/gcc-9.4.0/texlive-20210325-tutllta
│ │ │ ├─ 54243 /2022.1/apps/linux-centos7-x86_64/gcc-9.4.0/texlive-20210325-tutllta/texmf-dist/fonts
│ │ │ │ ├─ 72461 /2022.1/apps/linux-centos7-x86_64/gcc-9.4.0/texlive-20210325-tutllta/texmf-dist/fonts/tfm
│ │ ├─ 3740 /2022.1/apps/linux-centos7-x86_64/gcc-9.4.0/tcl-8.6.11-wsizrjd
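To pick the oversized catalogs out of a longer listing like this one, a small filter can help. This is only a sketch, assuming the listing has been saved to a file, that each line ends with the entry count followed by the catalog path, and that paths contain no spaces. `large_catalogs` is a hypothetical helper name, not a CVMFS tool.

```shell
# Print "<entries> <path>" for every catalog whose entry count exceeds a limit.
# Usage: large_catalogs LIMIT < saved_listing.txt
large_catalogs() {
    awk -v limit="$1" '
        # The last field is the path and the one before it is the entry
        # count; the leading tree-drawing characters are ignored.
        NF >= 2 && $(NF-1) ~ /^[0-9]+$/ && $(NF-1) + 0 > limit + 0 {
            print $(NF-1), $NF
        }'
}
```

For example, `large_catalogs 100000 < saved_listing.txt` would list only the catalogs above the default maximum weight.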
It seemed the only way to regenerate them was to delete the .cvmfsautocatalog files and then run a transaction/publish cycle. I thought there might be an easier way to do that.
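The procedure I ended up with can be sketched roughly as follows. This assumes the repository is mounted under /cvmfs/<repo> on the release manager machine; the function names and the `repo` and `subdir` arguments are placeholders for illustration, and I have not confirmed that removing the markers is the intended way to force regeneration.

```shell
# Hypothetical helper: delete every .cvmfsautocatalog marker under a directory,
# printing each path as it is removed. Manual .cvmfscatalog files are untouched.
remove_autocatalog_markers() {
    find "$1" -type f -name .cvmfsautocatalog -print -delete
}

# Sketch of the full cycle: open a transaction, drop the markers, publish.
# Usage: regenerate_autocatalogs REPO SUBDIR
regenerate_autocatalogs() {
    repo="$1"
    subdir="$2"
    cvmfs_server transaction "$repo" || return 1
    if ! remove_autocatalog_markers "/cvmfs/$repo/$subdir"; then
        # Roll back rather than publish a half-cleaned tree.
        cvmfs_server abort -f "$repo"
        return 1
    fi
    cvmfs_server publish "$repo"
}
```

Running `remove_autocatalog_markers` on a scratch copy first is a cheap way to see which markers would be deleted.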
The main reason I don’t trust it is that I don’t see how it can make good choices about where to put the catalogs without having any understanding about which pieces might be used together. If every case is going to load all the subcatalogs, there’s not much efficiency gained by splitting it up.
There is another tool that is helpful for determining which .cvmfsdirtab patterns to include. I often use the “catdirusage” tool from the cvmfs-contrib/python-cvmfsutils repository on GitHub, which tells you how many files are in the current catalog in each subdirectory under a supplied path, sorted in increasing order. Example usage for the root directory on grid.cern.ch:
$ catdirusage http://cvmfs-stratum-one.cern.ch/cvmfs/grid.cern.ch /