cvmfs_config setup hangs

Hi all,

My /etc/cvmfs/default.local:

CVMFS_USE_GEOAPI=yes
CVMFS_HTTP_PROXY="http://ca-proxy.cern.ch:3128"

# It is advised to change these configs in the cache section of the helm values
# and leave them unchanged here, so they auto-generate.
CVMFS_QUOTA_LIMIT=1000
CVMFS_CACHE_BASE=/cvmfs-localcache

I’ve run cvmfs_config setup, which in turn calls cvmfs2 -o parse '' /, and that is where the script gets stuck. I tried running cvmfs2 -o parse '' / -d manually to see why it hangs, but even with the -d flag it doesn’t give any helpful output at the moment:

# cvmfs2 -o parse '' / -d
Debug: using library /usr/lib64/libcvmfs_fuse_stub.so
(no further progress beyond this line)

It seems to use quite a lot of CPU though:

# ps aux | grep cvmfs2
root        6673  0.0  0.1  51384  5196 ?        S    15:02   0:00 cvmfs2 -o parse  /
root       36497 93.8  0.0  51384   664 ?        R    15:18   4:29 cvmfs2 -o parse  /

Is there any way to get more debug output to troubleshoot this? Or any ideas what’s going on?

  • Kernel: 5.16.13-200.fc35.x86_64
  • Host OS: CentOS Linux 7 (Core)
  • CVMFS version: 2.9.4 (installed from the CentOS packages)
  • I’m running this inside the CERN network; http://ca-proxy.cern.ch is reachable

Thank you!

Cheers,
Robert

Hi Robert,

Do you have any other configuration under /etc/cvmfs/...?
Is this running in a container, on bare metal, or in a VM? If it is a container, how is the configuration mounted into /etc/cvmfs/...? ConfigMaps?

Hi Enrico,

This is running inside a container (see Dockerfile here, and Kubernetes DaemonSet manifest here), and yes, the configuration is mounted as a ConfigMap. I also tried removing the ConfigMap mounts and running a vanilla CVMFS client with the defaults from the CentOS package, but the result is the same as in my original post.

What is interesting, though, is that the command eventually finishes; it just takes around 45 minutes to do so (in both cases, with and without the ConfigMaps mounted under /etc/cvmfs):

I0113 15:02:31.692753    6639 driver.go:227] Exec-ID 2: Running command env=[] prog=/usr/bin/cvmfs_config args=[cvmfs_config setup]
I0113 15:48:51.470040    6639 driver.go:227] Exec-ID 2: Process exited: exit status 0

All this while the cvmfs2 -o parse process consumes around 95% CPU.

Running strace on the process forked by cvmfs2 reveals that something is looping indefinitely:

strace -f cvmfs2 -o parse '' / -d

...
statfs("/sys/fs/selinux", {f_type=SELINUX_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0, 0]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NOEXEC|ST_RELATIME}) = 0
statfs("/sys/fs/selinux", {f_type=SELINUX_MAGIC, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={val=[0, 0]}, f_namelen=255, f_frsize=4096, f_flags=ST_VALID|ST_NOSUID|ST_NOEXEC|ST_RELATIME}) = 0
stat("/sys/fs/selinux", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
access("/etc/selinux/config", F_OK)     = 0
open("/etc/pki/tls/legacy-settings", O_RDONLY) = -1 ENOENT (No such file or directory)
access("/etc/system-fips", F_OK)        = -1 ENOENT (No such file or directory)
munmap(0x7f26f18b4000, 13599)           = 0
fstat(1, {st_mode=S_IFREG|0644, st_size=15598, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f26f18b7000
write(1, "Debug: using library /usr/lib64/"..., 54Debug: using library /usr/lib64/libcvmfs_fuse_stub.so
) = 54
umask(007)                              = 022
rt_sigprocmask(SIG_BLOCK, [USR1], NULL, 8) = 0
open("/etc/cvmfs/default.conf", O_RDONLY) = 3
pipe([4, 5])                            = 0
pipe([6, 7])                            = 0
pipe([8, 9])                            = 0
pipe([10, 11])                          = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f26f18afa10) = 3484648
wait4(3484648, strace: Process 3484648 attached
 <unfinished ...>
[pid 3484648] set_robust_list(0x7f26f18afa20, 24) = 0
[pid 3484648] dup2(4, 0)                = 0
[pid 3484648] dup2(7, 1)                = 1
[pid 3484648] dup2(9, 2)                = 2
[pid 3484648] getrlimit(RLIMIT_NOFILE, {rlim_cur=1073741816, rlim_max=1073741816}) = 0
[pid 3484648] close(3)                  = 0
[pid 3484648] close(4)                  = 0
[pid 3484648] close(5)                  = 0
[pid 3484648] close(6)                  = 0
[pid 3484648] close(7)                  = 0
[pid 3484648] close(8)                  = 0
[pid 3484648] close(9)                  = 0
[pid 3484648] close(10)                 = 0
[pid 3484648] close(12)                 = -1 EBADF (Bad file descriptor)
[pid 3484648] close(13)                 = -1 EBADF (Bad file descriptor)
[pid 3484648] close(14)                 = -1 EBADF (Bad file descriptor)
[pid 3484648] close(15)                 = -1 EBADF (Bad file descriptor)
[pid 3484648] close(16)                 = -1 EBADF (Bad file descriptor)
[pid 3484648] close(17)                 = -1 EBADF (Bad file descriptor)
[pid 3484648] close(18)                 = -1 EBADF (Bad file descriptor)
[pid 3484648] close(19)                 = -1 EBADF (Bad file descriptor)
[pid 3484648] close(20)                 = -1 EBADF (Bad file descriptor)
... close(fd) = -1 EBADF (Bad file descriptor) is spammed indefinitely ...

This is possibly (also) an environment issue: in an older environment (see below), it runs just fine.

I’ve tried to debug this a bit and was able to track it down to this:

In the newer environment, max_fd evaluates to 1073741816; in the older one it is 1048576. Issuing that many close() syscalls slows things down dramatically: as a rough back-of-the-envelope check, ~1.07 × 10^9 close() calls over the observed ~45 minutes (~2700 s) works out to about 2.5 µs per call, which is plausible per-syscall overhead. A minimal sketch of the pattern is shown after the environment details below.

  • Old env:
    • Host OS: Fedora CoreOS 35.20220227.2.1 (5.16.13-200.fc35.x86_64)
    • containerd: v1.4.13 9cc61520f4cd876b86e77edfeb88fbcd536d1f9d
    • podman: 3.4.4
    • ulimit -n: 1048576
  • New env:
    • Host OS: Fedora CoreOS 36.20221007.dev.3 (5.19.9-200.fc36.x86_64)
    • containerd: v1.6.9 1c90a442489720eec95342e1789ee8a5e1b9536f
    • podman: 4.2.0
    • ulimit -n: 1073741816
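
For illustration, here is a minimal C sketch of the close-all-descriptors pattern that the strace output points to. This is my own reconstruction, not the actual CVMFS source; the point is only that the loop bound comes from getrlimit(RLIMIT_NOFILE), so a huge limit means roughly 10^9 syscalls:

#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

int main(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return 1;

    /* One close() per possible descriptor, almost all of which fail
     * with EBADF, exactly as in the strace output above. With
     * rlim_cur == 1024 this is effectively instant; with
     * rlim_cur == 1073741816 it takes tens of minutes. */
    for (rlim_t fd = 3; fd < rl.rlim_cur; fd++)
        close((int)fd);

    printf("attempted to close every fd below %llu\n",
           (unsigned long long)rl.rlim_cur);
    return 0;
}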

I can open an issue on GitHub if you prefer?

Cheers,
Robert

I am unable to fully reproduce the issue, but I think you are quite right about the correlation with the open-file-descriptor limit:

ulimit -n 1024
time cvmfs2 -o parse '' / -d
	real	0m0.026s
	user	0m0.009s
	sys	0m0.005s

ulimit -n 1048576
time cvmfs2 -o parse '' / -d
	real	0m3.385s
	user	0m1.912s
	sys	0m1.458s

Let me raise this with the developers.
The limit of 1073741816 in your new environment seems very high, though.
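
For reference, the usual mitigation for this pattern on recent kernels is close_range(2). Below is a hedged sketch, assuming Linux >= 5.9 and glibc >= 2.34, and not necessarily the fix the developers will adopt:

#define _GNU_SOURCE
#include <unistd.h>

/* Close every inherited descriptor from 3 upward in a single syscall,
 * independent of how high RLIMIT_NOFILE is set. */
int close_inherited_fds(void) {
    return close_range(3, ~0U, 0);
}

On kernels without close_range(2), iterating over the entries in /proc/self/fd instead of looping up to the rlimit bound avoids the same blow-up.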

Many thanks for the investigation! Now tracked here: High limit of open file descriptors slows down `exec` wrapper · Issue #3158 · cvmfs/cvmfs · GitHub