cvmfs_server mkfs from a systemd unit

I’m setting up a containerized publisher for my repository. The container starts systemd as PID 1, and in the Dockerfile I enable a unit that calls a shell script which creates the repository for the publisher:

cvmfs_server mkfs -w https://my.cvmfs.s3.endpoint/cvmfs -s /s3_backend_config -u gw,/srv/cvmfs/my.repo/data/txn,https://my.cvmfs.gateway/api/v1 -o root my.repo

The script is then launched by systemd at container startup, and it executes with no error. But then I get:

# cvmfs_server list
my.repo (stratum0 / gw - unhealthy)  
# ls /cvmfs/my.repo
ls: reading directory /cvmfs/my.repo: Transport endpoint is not connected

If I disable the unit and launch the script manually from the container shell after boot, everything works. So I think it’s an issue with cvmfs_server mkfs being run from a systemd unit, probably related to FUSE, since “Transport endpoint is not connected” looks like a FUSE error message. But I can’t figure out what might be wrong.

Here is how I launch the container:

docker run -it --rm --name tester --tmpfs /tmp --tmpfs /run --device /dev/fuse \
  --cap-add SYS_ADMIN --security-opt apparmor:unconfined \
  -v /sys/fs/cgroup:/sys/fs/cgroup:rw -v cvmfs_temp:/var/spool/cvmfs \
  publisher:latest

and the systemd unit that launches the script:

[Unit]
Description=Execute entrypoint.sh

[Service]
Type=oneshot
User=root
ExecStart=/bin/bash -c /entrypoint.sh

[Install]
WantedBy=multi-user.target

Thanks in advance for any help.

I forgot to mention one detail: there are some messages that appear in the unit status but not when I launch the script manually:

Sep 06 15:29:17 152c2096ad05 cvmfs2[164]: (my.repo) looks like cvmfs has been crashed previously
Sep 06 15:29:17 152c2096ad05 cvmfs2[164]: (my.repo) re-building cache database
Sep 06 15:29:17 152c2096ad05 cvmfs2[164]: (my.repo) CernVM-FS: linking /var/spool/cvmfs/my.repo/rdonly to repository my.repo

Nevertheless, the script keeps running within the unit and cvmfs_server mkfs exits successfully.

The “Transport endpoint is not connected” error indicates a FUSE crash. We would need to modify the /usr/bin/cvmfs_server script to get more information. You can try to edit that file, find the create_spool_area_for_new_repository() function, and add the following line just before the last chown command:

echo CVMFS_DEBUGLOG=/tmp/cvmfs.log > $spool_dir/client.local

We should then find more information in /tmp/cvmfs.log about why the FUSE module crashes.
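
As a rough sketch of how to check this (assuming the standard install path and the spool directory from your setup), something like:

# locate the function inside the server script
grep -n "create_spool_area_for_new_repository" /usr/bin/cvmfs_server

# after adding the echo line and re-running the unit, confirm the setting landed in the client config
cat /var/spool/cvmfs/my.repo/client.local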

This is how I modified the code:

  if [ x"$CVMFS_UNION_FS_TYPE" = x"overlayfs" ]; then
    mkdir -p $ofs_workdir || return 2
  fi
  echo CVMFS_DEBUGLOG=/tmp/cvmfs.log > $spool_dir/client.local
  chown -R $CVMFS_USER /cvmfs/$name/ $spool_dir/

and then started the unit, but I got no log file in /tmp, even though /var/spool/cvmfs/my.repo/client.local does contain the CVMFS_DEBUGLOG line.

I found something interesting. If I modify the script to run cvmfs_server list and ls /cvmfs/my.repo/ after cvmfs_server mkfs, the unit status shows the expected output: a healthy repo and the correct content in the mounted folder. So it seems that FUSE works fine while the unit is executing, but stops working once the unit finishes.

Edit: if I add a sleep infinity at the end of the script to keep the unit running, everything works. I guess this can be a reasonable workaround, at least for my use case.
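
For reference, the end of my entrypoint.sh now looks roughly like this (same mkfs options as in the first post; treat it as a sketch rather than the full script):

#!/bin/bash
# create the repository for the publisher (same options as above)
cvmfs_server mkfs -w https://my.cvmfs.s3.endpoint/cvmfs \
  -s /s3_backend_config \
  -u gw,/srv/cvmfs/my.repo/data/txn,https://my.cvmfs.gateway/api/v1 \
  -o root my.repo

# both of these report a healthy repo while the unit is still running
cvmfs_server list
ls /cvmfs/my.repo/

# workaround: keep the unit running so the mounted repo stays accessible
sleep infinity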

I think I solved the problem. The error is due to the fact that my unit is a oneshot service: when the script finishes, the unit terminates and kills all the child processes spawned by the script, including cvmfs2. Changing the service type to forking keeps the unit running and cvmfs2 alive, so the mounted repo remains accessible.
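
For anyone running into the same issue, the unit now looks like this; only the Type= line changed compared to the version in the first post:

[Unit]
Description=Execute entrypoint.sh

[Service]
# with Type=forking the unit stays active after the script exits, so the
# processes it spawned (cvmfs2 in particular) are not killed and the FUSE
# mount survives
Type=forking
User=root
ExecStart=/bin/bash -c /entrypoint.sh

[Install]
WantedBy=multi-user.target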