Querying mount hangs session for sbn.osgstorage.org

Hi all,

We currently have an intermittent issue where querying the sbn.osgstorage.org repository hangs the session on the worker node. For example, running df -h hangs the session. Running strace against the command produces the following:

stat("/run/docker/netns/d826fdf2cc20", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
stat("/pool/docker/overlay2/87217b60e0e5498555a601417fd55e8f5d7ddac923fc11f90cbb90339c317024/merged", {st_mode=S_IFDIR|0755, st_size=51, ...}) = 0
stat("/run/docker/netns/f56859ed682d", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
stat("/pool/docker/overlay2/454e26d1cb51d64a86744f7d3c52811b4daa2f76ab1f09cfe8be482a6a19b222/merged", {st_mode=S_IFDIR|0755, st_size=51, ...}) = 0
stat("/run/docker/netns/8a052834add1", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
stat("/cvmfs/sbn.osgstorage.org",
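
The trace stops at the stat() on the mount point, so that single call can be probed with a time limit instead of running a full df -h. What follows is a minimal sketch rather than a tested tool: probe_mount and its 10-second default are hypothetical names and values chosen for illustration, and the stat runs in a child Python process so that a hang blocks the child rather than the login session.

```python
import subprocess
import sys


def probe_mount(path, timeout_s=10.0):
    """Return "ok", "error", or "hung" for a stat() on path, run in a child process."""
    # Run the stat in a separate interpreter so a hang blocks the child, not us.
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        returncode = child.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Send SIGKILL but do not wait for the child afterwards:
        # a process blocked inside the kernel may not exit promptly, or at all.
        child.kill()
        return "hung"
    return "ok" if returncode == 0 else "error"


if __name__ == "__main__":
    print(probe_mount("/cvmfs/sbn.osgstorage.org"))
```

A "hung" result only says the stat did not return within the limit; it does not say why. The child may linger after the kill if it is stuck in the kernel.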

In the cvmfs logs I find the following:

Jul 12 17:34:23 lcg2426 cvmfs2[2224726]: (sbn.osgstorage.org) switching proxy from http://130.246.183.248:3128 to http://130.246.183.211:3128 (geosort)
Jul 12 17:34:23 lcg2426 cvmfs2[2224726]: (sbn.osgstorage.org) switching proxy from (none) to http://130.246.183.211:3128 (cloned)
Jul 12 17:34:23 lcg2426 cvmfs2[2224726]: (sbn.osgstorage.org) switching proxy from http://130.246.183.211:3128 to http://130.246.183.247:3128 (failed proxy)
Jul 12 17:34:23 lcg2426 cvmfs2[2224726]: (sbn.osgstorage.org) switching proxy from http://130.246.183.247:3128 to http://130.246.183.244:3128 (failed proxy)
Jul 12 17:34:23 lcg2426 cvmfs2[2224726]: (sbn.osgstorage.org) switching proxy from http://130.246.183.244:3128 to http://130.246.183.248:3128 (failed proxy)
Jul 12 17:34:23 lcg2426 cvmfs2[2224726]: (sbn.osgstorage.org) switching proxy from http://130.246.183.248:3128 to http://130.246.183.246:3128 (failed proxy)
Jul 12 17:34:23 lcg2426 cvmfs2[2224726]: (sbn.osgstorage.org) switching proxy from http://130.246.183.246:3128 to http://128.142.248.156:3126 (failed proxy)
Jul 12 17:34:23 lcg2426 cvmfs2[2224726]: (sbn.osgstorage.org) geographic order of servers retrieved from cvmfs-s1.hpc.swin.edu.au
Jul 12 17:34:23 lcg2426 cvmfs2[2224726]: (sbn.osgstorage.org) switching proxy from http://130.246.183.211:3128 to DIRECT (set proxies)
Jul 12 17:34:23 lcg2426 cvmfs2[2224726]: (sbn.osgstorage.org) CernVM-FS: linking /cvmfs/sbn.osgstorage.org to repository sbn.osgstorage.org
Jul 12 17:42:24 lcg2426 cvmfs2[2225111]: (sbn.osgstorage.org) switching proxy from http://128.142.248.156:3126 to http://130.246.183.247:3128 (reset proxy group)
Jul 12 17:46:56 lcg2426 cvmfs2[2225111]: (sbn.osgstorage.org) switched to catalog revision 149856
Jul 12 17:50:51 lcg2426 cvmfs2[2225111]: (sbn.osgstorage.org) switching host from http://cf-ac-uk-cache.nationalresearchplatform.org:8000/ to http://fiona-r-uva.vlan7.uvalight.net:8000/ (host returned HTTP error)
Jul 12 18:07:15 lcg2426 cvmfs2[2225111]: (sbn.osgstorage.org) reloading Fuse module

We are currently running cvmfs-2.10.1-1.el8.x86_64 with kernel-lt-5.4.244-1.el8.elrepo.x86_64. Any help in debugging this issue is greatly appreciated.

Many thanks,

Tom

I recommend working this through a GitHub issue instead of the forum. Attach a cvmfs_config bugreport tarball, and describe anything else about the environment, such as the host operating system and any containers you’re running inside. If the bugreport command gets stuck, use ps to find out where it is stuck and kill the stuck command (or its parent process if kill -9 doesn’t work) so that the bugreport command can finish and collect as much information as possible.
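
As a complement to ps, the process states can be read straight from /proc. Processes blocked on a stuck mount often show up in state D (uninterruptible sleep), though that is not guaranteed, so treat this as a rough first filter. This is a sketch assuming a Linux /proc; parse_stat_line and uninterruptible_processes are hypothetical helper names, not part of cvmfs_config.

```python
import os


def parse_stat_line(line):
    """Split a /proc/<pid>/stat line into (pid, command name, state letter)."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split around the first '(' and the last ')'.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        stat_path = os.path.join(proc_root, entry, "stat")
        try:
            with open(stat_path, errors="replace") as handle:
                pid, comm, state = parse_stat_line(handle.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

Any PID it prints is a candidate for the kill step described above, starting with the child and moving to the parent only if needed.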

Guessing from the strace and the kernel version, this is likely to be a very difficult environment to reproduce. The errors at 17:34:23 with the proxy are concerning, but since they happened about half an hour before the last log entry, they’re unlikely to be the cause of the hanging.
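
To put a number on that gap, the two timestamps can be subtracted directly. A minimal sketch using two lines copied from the log in the original post; log_time is a hypothetical helper, and the year is an assumption because syslog-style stamps omit it (any fixed year gives the same result for a same-day gap).

```python
from datetime import datetime

# Assumption: the log lines carry no year, so a fixed placeholder is used.
ASSUMED_YEAR = 2000


def log_time(line):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog-style line."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S")


failed_proxy = (
    "Jul 12 17:34:23 lcg2426 cvmfs2[2224726]: (sbn.osgstorage.org) switching proxy "
    "from http://130.246.183.246:3128 to http://128.142.248.156:3126 (failed proxy)"
)
last_entry = (
    "Jul 12 18:07:15 lcg2426 cvmfs2[2225111]: (sbn.osgstorage.org) reloading Fuse module"
)

gap = log_time(last_entry) - log_time(failed_proxy)
print(gap)  # 0:32:52
```

That is roughly 33 minutes between the proxy failures and the final log entry.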

Thank you! I’ll put a GitHub issue together with the bugreport and present my findings there. It is a difficult issue to debug, and the bug occurs only sporadically. Needless to say, I will supply the requested information. Many thanks, Tom
