Problems with sbn.osgstorage.org again

Hi,

The problem reported in the earlier thread "Querying mount hangs session for sbn.osgstorage.org" is happening again.
In that thread, it was suggested that we send the output of `cvmfs_config bugreport`. That is not possible: when the bugreport gets to collecting the `df` data, it hangs forever:

[root@host ~]# cvmfs_config bugreport
Gathering /etc/cvmfs
Gathering files in quarantaine
Gathering stack traces
Gathering uname -a
Gathering cat /etc/issue
Gathering hostname -f
Gathering ifconfig -a
Gathering cvmfs2 --version
Gathering ls -lR /var/run/cvmfs
Gathering grep cvmfs /var/log/messages
Gathering grep cvmfs /var/log/syslog
Gathering find /usr/lib /usr/lib64 /lib /lib64 -name libfuse*
Gathering journalctl -alm /usr/bin/cvmfs2
Gathering journalctl -alm /usr/libexec/cvmfs/cache/cvmfs_cache_ram
Gathering eval find /pool/cache/cvmfs2/ -maxdepth 1 -exec ls -lah \{\} \;
Gathering cvmfs_config probe
Gathering mount
Gathering df -h

Is it relevant that this repo seems to be a stash-cache one? Could the problem be in one of the stash-cache servers?

Extra info:

[root@host ~]# uname -a
Linux lcg2400.gridpp.rl.ac.uk 5.4.251-1.el8.elrepo.x86_64 #1 SMP Thu Jul 27 18:47:58 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux

[root@host ~]# rpm -qa | grep cvmfs
cvmfs-config-egi-2.6-1.7.obs.el8.noarch
tier1-cvmfs-fixer-1.0-1.noarch
cvmfs-x509-helper-2.2-2.29.obs.el8.x86_64
cvmfs-2.10.1-1.el8.x86_64

[root@host ~]# cvmfs_config showconfig sbn.osgstorage.org
CVMFS_REPOSITORY_NAME=sbn.osgstorage.org
CERNVM_GRID_UI_VERSION=
CVMFS_ALIEN_CACHE=
CVMFS_ALT_ROOT_PATH=
CVMFS_AUTHZ_HELPER=
CVMFS_AUTHZ_SEARCH_PATH=
CVMFS_AUTO_UPDATE=
CVMFS_BACKOFF_INIT=2    # from /etc/cvmfs/default.conf
CVMFS_BACKOFF_MAX=10    # from /etc/cvmfs/default.conf
CVMFS_BASE_ENV=1    # from /etc/cvmfs/default.conf
CVMFS_CACHE_BASE=/pool/cache/cvmfs2//osgstorage    # from /cvmfs/config-egi.egi.eu/etc/cvmfs/domain.d/osgstorage.org.conf
CVMFS_CACHE_DIR=/pool/cache/cvmfs2//osgstorage/shared
CVMFS_CACHE_PRIMARY=
CVMFS_CHECK_PERMISSIONS=yes    # from /etc/cvmfs/default.conf
CVMFS_CLAIM_OWNERSHIP=yes    # from /etc/cvmfs/default.conf
CVMFS_CLIENT_PROFILE=    # from /etc/cvmfs/default.conf
CVMFS_CONFIG_REPO_REQUIRED=yes    # from /etc/cvmfs/default.d/60-egi.conf
CVMFS_CONFIG_REPOSITORY=config-egi.egi.eu    # from /etc/cvmfs/default.d/60-egi.conf
CVMFS_DEBUGLOG=
CVMFS_DEFAULT_DOMAIN=
CVMFS_DNS_RETRIES=
CVMFS_DNS_TIMEOUT=
CVMFS_EXTERNAL_FALLBACK_PROXY=
CVMFS_EXTERNAL_HTTP_PROXY=
CVMFS_EXTERNAL_MAX_SERVERS=4    # from /cvmfs/config-egi.egi.eu/etc/cvmfs/domain.d/osgstorage.org.conf
CVMFS_EXTERNAL_SERVER_URL=
CVMFS_EXTERNAL_TIMEOUT=
CVMFS_EXTERNAL_TIMEOUT_DIRECT=
CVMFS_EXTERNAL_URL='http://stashcache.t2.ucsd.edu:8000/;http://osg-kansas-city-stashcache.nrp.internet2.edu:8000/;http://osg-new-york-stashcache.nrp.internet2.edu:8000/;http://daejeon-kreonet-net.nationalresearchplatform.org:8000/;http://cf-ac-uk-cache.nationalresearchplatform.org:8000/;http://osg-stash-sfu-computecanada-ca.nationalresearchplatform.org:8000/;http://fiona-r-uva.vlan7.uvalight.net:8000/;http://xcachevirgo.pic.es:8000/;http://stashcache.edi.scotgrid.ac.uk:8000/'    # from /cvmfs/config-egi.egi.eu/etc/cvmfs/domain.d/osgstorage.org.conf
CVMFS_FALLBACK_PROXY='http://cvmfsbproxy.cern.ch:3126;http://cvmfsbproxy.fnal.gov:3126'    # from /cvmfs/config-egi.egi.eu/etc/cvmfs/domain.d/osgstorage.org.conf
CVMFS_FOLLOW_REDIRECTS=yes    # from /cvmfs/config-egi.egi.eu/etc/cvmfs/domain.d/osgstorage.org.conf
CVMFS_HIDE_MAGIC_XATTRS=
CVMFS_HOST_RESET_AFTER=1800    # from /etc/cvmfs/default.conf
CVMFS_HTTP_PROXY=http://cvmfs-squid.gridpp.rl.ac.uk:3128    # from /cvmfs/config-egi.egi.eu/etc/cvmfs/domain.d/osgstorage.org.conf
CVMFS_IGNORE_SIGNATURE=
CVMFS_INITIAL_GENERATION=
CVMFS_IPFAMILY_PREFER=
CVMFS_KCACHE_TIMEOUT=
CVMFS_KEYS_DIR=/cvmfs/config-egi.egi.eu/etc/cvmfs/keys/osgstorage.org    # from /cvmfs/config-egi.egi.eu/etc/cvmfs/domain.d/osgstorage.org.conf
CVMFS_LOW_SPEED_LIMIT=512    # from /cvmfs/config-egi.egi.eu/etc/cvmfs/domain.d/osgstorage.org.conf
CVMFS_MAGIC_XATTRS_VISIBILITY=rootonly    # from /cvmfs/config-egi.egi.eu/etc/cvmfs/default.conf
CVMFS_MAX_IPADDR_PER_PROXY=
CVMFS_MAX_RETRIES=1    # from /etc/cvmfs/default.conf
CVMFS_MAX_TTL=
CVMFS_MEMCACHE_SIZE=64    # from /etc/cvmfs/default.local
CVMFS_MOUNT_DIR=/cvmfs    # from /etc/cvmfs/default.conf
CVMFS_MOUNT_RW=
CVMFS_NFILES=262144    # from /etc/cvmfs/default.local
CVMFS_NFS_SHARED=
CVMFS_NFS_SOURCE=
CVMFS_OOM_SCORE_ADJ=
CVMFS_PAC_URLS='http://grid-wpad/wpad.dat;http://wpad/wpad.dat;http://cernvm-wpad.cern.ch/wpad.dat;http://cernvm-wpad.fnal.gov/wpad.dat'    # from /cvmfs/config-egi.egi.eu/etc/cvmfs/default.conf
CVMFS_PROXY_RESET_AFTER=300    # from /etc/cvmfs/default.conf
CVMFS_PROXY_TEMPLATE=
CVMFS_PUBLIC_KEY=
CVMFS_QUOTA_LIMIT=1000    # from /cvmfs/config-egi.egi.eu/etc/cvmfs/domain.d/osgstorage.org.conf
CVMFS_RELOAD_SOCKETS=/var/run/cvmfs    # from /etc/cvmfs/default.conf
CVMFS_REPOSITORIES=
CVMFS_REPOSITORY_DATE=
CVMFS_REPOSITORY_TAG=
CVMFS_ROOT_HASH=
CVMFS_SEND_INFO_HEADER=yes    # from /etc/cvmfs/default.d/60-egi.conf
CVMFS_SERVER_CACHE_MODE=
CVMFS_SERVER_URL='http://cvmfs-egi.gridpp.rl.ac.uk:8000/cvmfs/sbn.osgstorage.org;http://klei.nikhef.nl:8000/cvmfs/sbn.osgstorage.org;http://cvmfs-s1fnal.opensciencegrid.org:8000/cvmfs/sbn.osgstorage.org;http://cvmfs-s1goc.opensciencegrid.org:8000/cvmfs/sbn.osgstorage.org;http://cvmfs-s1bnl.opensciencegrid.org:8000/cvmfs/sbn.osgstorage.org;http://cvmfsrep.grid.sinica.edu.tw:8000/cvmfs/sbn.osgstorage.org;http://cvmfs-stratum-one.ihep.ac.cn:8000/cvmfs/sbn.osgstorage.org;http://cvmfs-s1.hpc.swin.edu.au:8000/cvmfs/sbn.osgstorage.org'    # from /cvmfs/config-egi.egi.eu/etc/cvmfs/domain.d/osgstorage.org.conf
CVMFS_SHARED_CACHE=yes    # from /etc/cvmfs/default.conf
CVMFS_STRICT_MOUNT=no    # from /etc/cvmfs/default.conf
CVMFS_SYSLOG_FACILITY=
CVMFS_SYSLOG_LEVEL=
CVMFS_SYSTEMD_NOKILL=
CVMFS_TIMEOUT=5    # from /etc/cvmfs/default.conf
CVMFS_TIMEOUT_DIRECT=10    # from /etc/cvmfs/default.conf
CVMFS_TRACEFILE=
CVMFS_TRUSTED_CERTS=
CVMFS_USE_CDN=    # from /cvmfs/config-egi.egi.eu/etc/cvmfs/domain.d/osgstorage.org.conf
CVMFS_USE_GEOAPI=yes    # from /etc/cvmfs/default.d/60-egi.conf
CVMFS_USER=cvmfs    # from /etc/cvmfs/default.conf
CVMFS_WORKSPACE=

If a bugreport operation hangs, I usually figure out which process is hanging and either `kill -9` that process (if that works) or kill its parent process, so that the bugreport can proceed and collect as much information as possible.
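A minimal sketch of how to find the hanging process, assuming a Linux host with the procps-ng `ps` shipped on EL8. Processes blocked on I/O usually sit in uninterruptible sleep (state `D`), and a process in that state may not react even to `kill -9`, which matches what was seen on the workers. The helper names `filter_d_state` and `list_stuck_procs` are hypothetical; the first one only filters text, so it can be checked without a hung host.

```shell
# filter_d_state (hypothetical helper): read lines of
#   PID PPID STAT ELAPSED_SECONDS COMMAND...
# on stdin and print only those whose STAT starts with "D"
# (uninterruptible sleep, typically blocked on I/O).
filter_d_state() {
  awk '$3 ~ /^D/ {
    cmd = $5
    for (i = 6; i <= NF; i++) cmd = cmd " " $i
    printf "pid=%s ppid=%s elapsed=%ss cmd=%s\n", $1, $2, $4, cmd
  }'
}

# list_stuck_procs (hypothetical helper): feed it the live process table.
# The trailing "=" on each column suppresses the ps header line.
list_stuck_procs() {
  ps -eo pid=,ppid=,stat=,etimes=,args= | filter_d_state
}
```

While `cvmfs_config bugreport` is stuck on `df -h`, running `list_stuck_procs` should list the `df` process; its `ppid` value is the parent process that could be killed so the bugreport moves on to the next item.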

Dave

When the client node gets into this state, not even the kill command works. The only solution seems to be rebooting the host. That’s why we would like to know why this is happening and how to fix or at least prevent it.

Here is a more detailed explanation from the Batch Farm sys admin:

[…] this caused the CVMFS client on the workers to hang. Once that repo goes into a hung state, it adversely affects other containers, with effects ranging from being unable to run du (as some SAM tests do) to locking up all I/O on the host. Symptoms vary, but the short version is that the worker nodes' disks become inoperable.

[…] this issue can be identified because xfs_info times out […]

  • The sync between Condor and Docker is broken (Condor is unable to read the Docker socket)
  • The jobs inside the pilot can’t be cleared down because the scratch directory hangs on I/O.
    […]

Most of these come back to the fact that the disk can no longer be read from or written to.
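Since the hang shows up as `xfs_info` never returning, a node health check could probe for it. A plain `timeout xfs_info ...` is risky here: GNU `timeout` waits for its child to exit after signalling it, so it can block too if the child is stuck in uninterruptible I/O. The sketch below, a hypothetical helper `probe_nonblocking`, instead starts the probe in the background and polls a marker file, never waiting on the child. The mount point `/pool` in the example is an assumption, taken from `CVMFS_CACHE_BASE=/pool/cache/cvmfs2/...`.

```shell
# probe_nonblocking (hypothetical helper): run a command in the background
# and report whether it finished within LIMIT seconds, without waiting on it.
# A process stuck in uninterruptible I/O cannot be reaped, so the caller
# must not block on it the way `wait` or `timeout` would.
# Returns 0 if the command finished (any exit status), 1 if it is still
# running after LIMIT seconds, 2 if the marker file cannot be created.
probe_nonblocking() {
  local limit=$1
  shift
  local marker
  marker=$(mktemp) || return 2
  # Detached subshell: records the command's exit status once it completes.
  ( rc=0; "$@" </dev/null >/dev/null 2>&1 || rc=$?; echo "$rc" > "$marker" ) \
    </dev/null >/dev/null 2>&1 &
  local waited=0
  while [ "$waited" -lt "$limit" ]; do
    if [ -s "$marker" ]; then
      rm -f "$marker"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  if [ -s "$marker" ]; then
    rm -f "$marker"
    return 0
  fi
  # Leave the marker in place: the stuck subshell may still write to it.
  echo "still running after ${limit}s: $*" >&2
  return 1
}

# Example (assumption: /pool is the local XFS filesystem holding the cache):
# probe_nonblocking 30 xfs_info /pool || echo "node looks hung; drain it"
```

`mktemp` creates the marker under `$TMPDIR` (default `/tmp`); if `/tmp` lives on the affected disk, setting `TMPDIR=/dev/shm` for the check keeps the probe itself off that disk.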

Do we need any extra RPMs on the clients, such as plugins?
This is what we currently have:

[root@host ~]# rpm -qa | grep cvmfs
cvmfs-config-egi-2.6-1.7.obs.el8.noarch
tier1-cvmfs-fixer-1.0-1.noarch
cvmfs-x509-helper-2.2-2.29.obs.el8.x86_64
cvmfs-2.10.1-1.el8.x86_64

When the client node gets into this state, not even the kill command works. The only solution seems to be rebooting the host. That’s why we would like to know why this is happening and how to fix or at least prevent it.

I understand. But to fix the problem, a bugreport with as much information as possible, collected before the reboot, is very helpful. As I suggested, the system admin should be able to kill the parent process of the hung command so that the bugreport command can finish.

Do we need any extra RPMs on the clients, such as plugins?

No, you have all that’s needed.

Hi, I have a bug report from a host that exhibited the issue; however, the host has since been rebooted. Would it be helpful to send this bugreport even though the host is now functional? If nothing else, I’m hoping it may give some insight into the host’s configuration.

It might be helpful. The entries from /var/log/messages might say something. More likely, the information won’t be conclusive, but it’s worth a look.
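When going through the `/var/log/messages` copy in a bugreport collected after the fact, a quick filter like the sketch below may help surface the relevant lines. It looks for lines mentioning cvmfs and for the kernel's hung-task warning ("INFO: task ... blocked for more than N seconds"). That warning is only logged if the hung-task detector is enabled, i.e. the `kernel.hung_task_timeout_secs` sysctl is non-zero. The helper name `scan_hang_messages` is hypothetical.

```shell
# scan_hang_messages (hypothetical helper): print lines, with line numbers,
# that mention cvmfs or carry the kernel hung-task warning. Reads the files
# given as arguments, or stdin if none are given.
scan_hang_messages() {
  grep -n -i -E 'cvmfs|blocked for more than [0-9]+ seconds' "$@"
}

# Example, on the copy of the log included in the bugreport:
# scan_hang_messages /var/log/messages
```

A hung-task warning naming `df`, `du` or `xfs_info` around the time the node stopped responding would support the "disk I/O locked up" picture described above; its absence does not rule it out.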

Hi, thank you for the reply. To help centralise the discussion, I have created a GitHub issue with the relevant information: CVMFS client hangs on repo and `df -u` command (Issue #3383, cvmfs/cvmfs on GitHub).
