Failed to upload /var/spool/cvmfs/my.repo.org/tmp/snapshotting

Hi,

I am investigating how to move all Stratum-1 data to a Network File System (CephFS).
When all data is on CephFS, everything works fine:

[root@cvmfs-stratum1-01 ~]# ls -ltr /srv/cvmfs
lrwxrwxrwx 1 root root 31 Nov 29 15:35 /srv/cvmfs -> /mnt/cvmfs/stratum-1/srv/cvmfs/
[root@cvmfs-stratum1-01 ~]# ls -ltr /var/spool/cvmfs
lrwxrwxrwx 1 root root 36 Nov 29 15:35 /var/spool/cvmfs -> /mnt/cvmfs/stratum-1/var/spool/cvmfs

However, there is a problem with this architecture.
The transactions happening in .../data/txn/ impose a heavy load on the file system.
Because of that, I am trying to move that directory to the local host:

[root@cvmfs-stratum1-01 ~]# ls -ltr /mnt/cvmfs/stratum-1/srv/cvmfs/atlas.cern.ch/data/txn
lrwxrwxrwx 1 root root 33 Apr 10 11:49 /mnt/cvmfs/stratum-1/srv/cvmfs/atlas.cern.ch/data/txn -> /srv/cvmfs.txn/atlas.cern.ch/txn/

where the directory /srv/cvmfs.txn/ is on local disk. It has full permissions, all the way down:

[root@cvmfs-stratum1-01 ~]# ls -ltr /srv/cvmfs.txn
drwxrwxrwx 3 root root 17 Apr 10 11:06 atlas.cern.ch

[root@cvmfs-stratum1-01 ~]# ls -ltr /srv/cvmfs.txn/atlas.cern.ch
drwxrwxrwx 2 root root 150 Apr 10 13:18 txn

However, snapshots fail:

[root@cvmfs-stratum1-01 ~]# cvmfs_server snapshot atlas.cern.ch
failed to upload /var/spool/cvmfs/atlas.cern.ch/tmp/snapshotting

That is odd, since the command was able to create that file (it didn’t exist before):

[root@cvmfs-stratum1-01 ~]# ls -ltr /srv/cvmfs.txn/atlas.cern.ch/txn/
-rw-r--r-- 1 root root 29 Apr 10 13:29 snapshotting

What is going on? Any tip or advice?

Cheers,
Jose

Hmm. Removing the links seems to work…

Removing which symlink makes it work? The one under /var/spool? I would replace that symlink with one pointing to /srv rather than making it go out to CephFS first.

By default, the tmp link under /var/spool/cvmfs/<repo> points to /srv/, which I have on CephFS:

[root@cvmfs-stratum1-02 ~]# ls -ltr /var/spool/cvmfs/atlas.cern.ch/
lrwxrwxrwx 1 root root 33 May 26  2023 tmp -> /srv/cvmfs/atlas.cern.ch/data/txn

[root@cvmfs-stratum1-02 ~]# ls -ltr /srv/
lrwxrwxrwx 1 root root 31 Nov  1 15:49 cvmfs -> /mnt/cvmfs/stratum-1/srv/cvmfs/

So my last test was to remove that tmp link and create a directory in its place:

[root@cvmfs-stratum1-01 ~]# ls -ltr /var/spool/cvmfs/atlas.cern.ch/
drwxrwxrwx 2 root root 4096 Apr 11 00:14 tmp

CVMFS works, but the load on CephFS is still very high.

Right, that also prevents putting those temporary files on CephFS, but puts them in /var instead of /srv/cvmfs.txn/. If those are different filesystems, putting a symlink there instead of a directory should also work.
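
To check whether two paths really are on different filesystems, a minimal sketch follows. It compares the mount points reported by `df -P`, which resolves symlinks, so a symlinked path is reported under the filesystem of its target. The helper names `mount_point_of` and `same_mount` are made up for illustration, and the sketch assumes mount points contain no spaces.

```shell
#!/bin/sh
# Print the mount point that holds a path, taken from the last column of `df -P`.
# Assumption: mount points contain no spaces.
mount_point_of() {
    df -P "$1" | awk 'NR == 2 { print $NF }'
}

# Succeed if both paths are reported under the same mount point.
same_mount() {
    [ "$(mount_point_of "$1")" = "$(mount_point_of "$2")" ]
}
```

For example, `same_mount /var/spool/cvmfs /srv/cvmfs.txn && echo same || echo different` would show whether the spool area and the local txn area share a filesystem.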

I would expect a stratum 1 to put a high load on any filesystem that holds its data, due to its huge number of small files, frequent accesses, and operations that scan large numbers of files such as gc. I’m not sure that there’s much to be done about that.

The CephFS admin claims that the highest I/O load is on the txn/ directories.
That’s why I am looking for ways to put that directory on the local host, outside CephFS. However, as reported, if I make it a link, I always get the same error, whatever the exact target path:

[root@cvmfs-stratum1-02 ~]# ls -l /mnt/cvmfs/stratum-1/srv/cvmfs/hermes.desy.de/data/ | grep txn
lrwxrwxrwx 1 root root 36 Apr 16 10:03 txn -> /var/spool/cvmfs/hermes.desy.de/txn/

[root@cvmfs-stratum1-02 ~]# ls -ltr /var/spool/cvmfs/hermes.desy.de/
-rw-r--r-- 1 root root   40 Apr 16 09:59 reflog.chksum
-rw-r--r-- 1 root root 7168 Apr 16 09:59 stats.db
lrwxrwxrwx 1 root root   35 Apr 16 10:08 tmp -> /mnt/cvmfs/stratum-1/srv/cvmfs/hermes.desy.de/data/txn/
drwxr-xr-x 2 root root   26 Apr 16 10:09 txn

[root@cvmfs-stratum1-02 ~]# cvmfs_server snapshot hermes.desy.de
failed to upload /var/spool/cvmfs/hermes.desy.de/tmp/snapshotting

You have a circular link there. Those two links point to each other. Do

# rm -f /var/spool/cvmfs/hermes.desy.de/tmp
# ln -s /srv/cvmfs.txn/hermes.desy.de/txn /var/spool/cvmfs/hermes.desy.de/tmp

I don’t think that /mnt/cvmfs/stratum-1/srv/cvmfs/hermes.desy.de/data/txn will get used if no symlink is pointing to it, so that shouldn’t matter.

No, they are not circular. Note that I created a dedicated local txn/ directory.
They look like this:

/var/spool/…/tmp → /mnt/…/srv/…/data/txn/
/mnt/…/srv/…/data/txn → /var/spool/…/txn/

So, in the end, txn/ is on the local host. And that is when CVMFS fails.
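
A chain like the one above can be checked hop by hop rather than by eye. The following is a minimal sketch, not a CVMFS tool: `trace_links` is a hypothetical helper that follows a symlink one `readlink` at a time and gives up after a fixed number of hops, so a genuinely circular pair of links is reported instead of hanging. It only inspects the final path component at each hop; symlinks in parent directories (such as /srv/cvmfs itself) are not expanded.

```shell
#!/bin/sh
# Print each hop of a symlink chain; stop after max_hops so a loop cannot hang.
trace_links() {
    path=$1
    max_hops=${2:-20}
    hops=0
    while :; do
        # A trailing slash would make the -L test follow the link, so strip it.
        case $path in
            /) ;;
            */) path=${path%/} ;;
        esac
        [ -L "$path" ] || break
        if [ "$hops" -ge "$max_hops" ]; then
            echo "gave up after $max_hops hops (possible loop)"
            return 1
        fi
        target=$(readlink "$path")
        echo "$path -> $target"
        # Relative targets are resolved against the directory of the link.
        case $target in
            /*) path=$target ;;
            *)  path=$(dirname "$path")/$target ;;
        esac
        hops=$((hops + 1))
    done
    echo "ends at: $path"
}
```

Running `trace_links /var/spool/cvmfs/hermes.desy.de/tmp` would list every link in the chain and the path it finally lands on.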

Oh. If you want it to be in /var/spool anyway you might as well just delete the tmp link, bypassing Ceph. If you want it to be in /srv/cvmfs.txn then make a link going directly there at tmp.