CVMFS unusable from cold cache

Hello,

I’m evaluating the Compute Canada software stack via CVMFS, and my first proof-of-concept test has run into constant Input/output errors when trying to load and run many different modules, from R to Python to applications in the repository root such as curl.

I am currently using a single machine and no forward Squid proxy, so I expected performance to be poor, but I did not expect applications to fail outright. Once I’m able to get applications working, I plan to deploy a Squid proxy before adding more than a single client.

I am on Ubuntu 24.04 and following Accessing CVMFS - Alliance Doc. If there’s a more appropriate avenue for me to ask questions about the Compute Canada stack, I’d appreciate direction towards it.
My current setup is:

/etc/cvmfs/default.local

CVMFS_REPOSITORIES=soft.computecanada.ca
CVMFS_CLIENT_PROFILE=single
CVMFS_HTTP_PROXY=DIRECT
CVMFS_USE_GEOAPI=yes
CVMFS_DEBUGLOG=/tmp/cvmfs.log

cvmfs_config chksetup

Warning: debug mode is on for soft.computecanada.ca
Warning: failed to use Geo-API with cvmfs-s1.computecanada.net

The first time I try to run all sorts of commands, from curl to htop to python3 -m venv env, I see failures like:

curl: error while loading shared libraries: libzstd.so.1: cannot open shared object file: Input/output error

for many different libraries.

cvmfs_config probe

Probing /cvmfs/soft.computecanada.ca... OK

The debug log holds many entries like this; I can provide the entire log if helpful:

(cvmfs) CernVM-FS: using public key(s) /cvmfs/cvmfs-config.cern.ch/etc/cvmfs/keys/computecanada.ca/soft-shared.computecanada.ca.pub    [04-14-2025 19:28:16 UTC]
(download) (manager 'standard') first fallback proxy group 1    [04-14-2025 19:28:16 UTC]
(download) (manager 'standard') full proxy list DIRECT    [04-14-2025 19:28:16 UTC]
(download) (manager 'standard') resolving 1 proxy addresses    [04-14-2025 19:28:16 UTC]
(dns) empty hostname    [04-14-2025 19:28:16 UTC]
(download) (manager 'standard') installed 1 proxies in 1 load-balance groups    [04-14-2025 19:28:16 UTC]
(download) (manager 'standard') switching proxy from (none) to DIRECT. Reason: set random start proxy from the first proxy group [Current host: http://NULL]    [04-14-2025 19:28:16 UTC]
(download) (manager 'external') switching proxy from (none) to DIRECT. Reason: cloned [Current host: http://NULL]    [04-14-2025 19:28:16 UTC]
(download) (manager 'external') first fallback proxy group 1    [04-14-2025 19:28:16 UTC]
(download) (manager 'external') full proxy list DIRECT    [04-14-2025 19:28:16 UTC]
(download) (manager 'external') resolving 1 proxy addresses    [04-14-2025 19:28:16 UTC]
(dns) empty hostname    [04-14-2025 19:28:16 UTC]
(download) (manager 'external') installed 1 proxies in 1 load-balance groups    [04-14-2025 19:28:16 UTC]
(cvmfs) DNS roaming is disabled for this repository.    [04-14-2025 19:28:16 UTC]
(catalog) constructing client catalog manager    [04-14-2025 19:28:16 UTC]
(catalog) Initialize catalog    [04-14-2025 19:28:16 UTC]
(cache) Unable to read local checksum 0000000000000000000000000000000000000000T0R18446744073709551615    [04-14-2025 19:28:16 UTC]
(download) (manager standard - id 0) reading from host 0    [04-14-2025 19:28:16 UTC]
(download) (id 0) escaped http://NULL/.cvmfspublished to http://NULL/.cvmfspublished    [04-14-2025 19:28:16 UTC]
(curl) (id 0) {info} Could not resolve host: NULL    [04-14-2025 19:28:16 UTC]
(download) (manager 'standard' - id 0) Verify downloaded url /.cvmfspublished, proxy DIRECT (curl error 6)    [04-14-2025 19:28:16 UTC]
(download) (manager 'standard' - id 0) download failed (error 4 - failed to resolve host address)    [04-14-2025 19:28:16 UTC]
(cvmfs) failed to download repository manifest (4 - failed to resolve host address)    [04-14-2025 19:28:16 UTC]
(cache) Failed fetch manifest from server: manifest too old or server unreachable (1 - failed to download)    [04-14-2025 19:28:16 UTC]
(cache) No valid root catalog found!    [04-14-2025 19:28:16 UTC]
(catalog) failed to retrieve valid root catalog ''    [04-14-2025 19:28:16 UTC]
(catalog) failed to initialize root catalog    [04-14-2025 19:28:16 UTC]

Hi!

Your CVMFS configuration is incomplete; in particular, it is missing the server URL that provides soft.computecanada.ca:

(download) (id 0) escaped http://NULL/.cvmfspublished

Can you post the output of cvmfs_config showconfig?

Thanks for the assistance! Even with my broken configuration, commands did eventually work after retrying them and running cvmfs_config reload, just slowly and inconsistently.

Removing CVMFS_HTTP_PROXY=DIRECT and falling back on the default value of CVMFS_HTTP_PROXY seems to have helped, but I’m still seeing Input/output errors as I test more applications.

Here is my whole cvmfs_config showconfig:

Running /usr/bin/cvmfs_config soft.computecanada.ca:
CVMFS_REPOSITORY_NAME=soft.computecanada.ca
CERNVM_GRID_UI_VERSION=
CVMFS_ALIEN_CACHE=
CVMFS_ALT_ROOT_PATH=
CVMFS_AUTHZ_HELPER=
CVMFS_AUTHZ_SEARCH_PATH=
CVMFS_AUTO_UPDATE=
CVMFS_BACKOFF_INIT=2    # from /etc/cvmfs/default.conf
CVMFS_BACKOFF_MAX=10    # from /etc/cvmfs/default.conf
CVMFS_BASE_ENV=1    # from /etc/cvmfs/default.conf
CVMFS_CACHE_BASE=/var/lib/cvmfs    # from /etc/cvmfs/default.conf
CVMFS_CACHE_DIR=/var/lib/cvmfs/shared
CVMFS_CACHE_PRIMARY=
CVMFS_CACHE_REFCOUNT=true    # from /cvmfs/cvmfs-config.cern.ch/etc/cvmfs/default.conf
CVMFS_CHECK_PERMISSIONS=yes    # from /etc/cvmfs/default.conf
CVMFS_CLAIM_OWNERSHIP=yes    # from /etc/cvmfs/default.conf
CVMFS_CLIENT_PROFILE=single    # from /etc/cvmfs/default.local
CVMFS_CONFIG_REPOSITORY=cvmfs-config.cern.ch    # from /etc/cvmfs/default.d/50-cern-debian.conf
CVMFS_CONFIG_REPO_DEFAULT_ENV=1    # from /cvmfs/cvmfs-config.cern.ch/etc/cvmfs/default.conf
CVMFS_CONFIG_REPO_REQUIRED=
CVMFS_DEBUGLOG=/tmp/cvmfs.log    # from /etc/cvmfs/default.local
CVMFS_DEFAULT_DOMAIN=cern.ch    # from /etc/cvmfs/default.d/50-cern-debian.conf
CVMFS_DNS_RETRIES=
CVMFS_DNS_TIMEOUT=
CVMFS_EXTERNAL_FALLBACK_PROXY=
CVMFS_EXTERNAL_HTTP_PROXY=
CVMFS_EXTERNAL_SERVER_URL=
CVMFS_EXTERNAL_TIMEOUT=
CVMFS_EXTERNAL_TIMEOUT_DIRECT=
CVMFS_FALLBACK_PROXY=    # from /cvmfs/cvmfs-config.cern.ch/etc/cvmfs/domain.d/computecanada.ca.conf
CVMFS_FOLLOW_REDIRECTS=
CVMFS_HIDE_MAGIC_XATTRS=yes    # from /cvmfs/cvmfs-config.cern.ch/etc/cvmfs/default.conf
CVMFS_HOST_RESET_AFTER=1800    # from /etc/cvmfs/default.conf
CVMFS_HTTP_PROXY='auto;DIRECT'    # from /cvmfs/cvmfs-config.cern.ch/etc/cvmfs/domain.d/computecanada.ca.conf
CVMFS_IGNORE_SIGNATURE=
CVMFS_INITIAL_GENERATION=
CVMFS_IPFAMILY_PREFER=
CVMFS_KCACHE_TIMEOUT=
CVMFS_KEYS_DIR=/cvmfs/cvmfs-config.cern.ch/etc/cvmfs/keys/computecanada.ca    # from /cvmfs/cvmfs-config.cern.ch/etc/cvmfs/domain.d/computecanada.ca.conf
CVMFS_LOW_SPEED_LIMIT=1024    # from /etc/cvmfs/default.conf
CVMFS_MAGIC_XATTRS_VISIBILITY=rootonly    # from /cvmfs/cvmfs-config.cern.ch/etc/cvmfs/default.conf
CVMFS_MAX_IPADDR_PER_PROXY=
CVMFS_MAX_RETRIES=1    # from /etc/cvmfs/default.conf
CVMFS_MAX_TTL=
CVMFS_MEMCACHE_SIZE=
CVMFS_MOUNT_DIR=/cvmfs    # from /etc/cvmfs/default.conf
CVMFS_MOUNT_RW=
CVMFS_NFILES=131072    # from /etc/cvmfs/default.conf
CVMFS_NFS_SHARED=
CVMFS_NFS_SOURCE=
CVMFS_OOM_SCORE_ADJ=
CVMFS_PAC_URLS='http://grid-wpad/wpad.dat;http://wpad/wpad.dat;http://cernvm-wpad.cern.ch/wpad.dat;http://cernvm-wpad.fnal.gov/wpad.dat'    # from /cvmfs/cvmfs-config.cern.ch/etc/cvmfs/default.conf
CVMFS_PROXY_RESET_AFTER=300    # from /etc/cvmfs/default.conf
CVMFS_PROXY_TEMPLATE=
CVMFS_PUBLIC_KEY=
CVMFS_QUOTA_LIMIT=4000    # from /etc/cvmfs/default.conf
CVMFS_RELOAD_SOCKETS=/var/run/cvmfs    # from /etc/cvmfs/default.conf
CVMFS_REPOSITORIES=soft.computecanada.ca    # from /etc/cvmfs/default.local
CVMFS_REPOSITORY_DATE=
CVMFS_REPOSITORY_TAG=
CVMFS_ROOT_HASH=
CVMFS_SEND_INFO_HEADER=yes    # from /cvmfs/cvmfs-config.cern.ch/etc/cvmfs/domain.d/computecanada.ca.conf
CVMFS_SERVER_CACHE_MODE=
CVMFS_SERVER_URL=http://cvmfs-s1.computecanada.net/cvmfs/soft.computecanada.ca    # from /cvmfs/cvmfs-config.cern.ch/etc/cvmfs/domain.d/computecanada.ca.conf
CVMFS_SHARED_CACHE=yes    # from /etc/cvmfs/default.conf
CVMFS_STRICT_MOUNT=no    # from /etc/cvmfs/default.conf
CVMFS_SYSLOG_FACILITY=
CVMFS_SYSLOG_LEVEL=
CVMFS_SYSTEMD_NOKILL=
CVMFS_TIMEOUT=5    # from /etc/cvmfs/default.conf
CVMFS_TIMEOUT_DIRECT=10    # from /etc/cvmfs/default.conf
CVMFS_TRACEFILE=
CVMFS_TRUSTED_CERTS=
CVMFS_USER=cvmfs    # from /etc/cvmfs/default.conf
CVMFS_USE_CDN=
CVMFS_USE_GEOAPI=yes    # from /cvmfs/cvmfs-config.cern.ch/etc/cvmfs/domain.d/computecanada.ca.conf
CVMFS_WORKSPACE=

Hello

Could you please provide more debug logs? In particular each one of them:

  • when it is working
  • when it is working but took a long time
  • when it is failing

I think you have 2 issues.

  1. Your server and proxy configuration is suboptimal, but that should not be the cause of the failures.
    You normally set either CVMFS_CLIENT_PROFILE or CVMFS_HTTP_PROXY; only in rare cases both. If you use DIRECT you cannot benefit from the Geo-API, because with DIRECT there is only one server for Compute Canada.

  2. I think your main issue is with the cache. Could you let us know which kind of errors you find? Is it EIO (01)? If so, you need to increase CVMFS_QUOTA_LIMIT to something larger than the default CVMFS_QUOTA_LIMIT=4000 (= 4 GB).

My suggestion for your minimal config in /etc/cvmfs/default.local:

CVMFS_CLIENT_PROFILE=single
CVMFS_QUOTA_LIMIT=10000
CVMFS_DEBUGLOG=/tmp/cvmfs-@fqrn@.log  # a separate log per repository; @fqrn@ expands to the repository name

Let me know if that helps

Cheers
Laura

PS. CVMFS_REPOSITORIES is not needed. It is only useful if you want to auto-run the cvmfs_config commands on the listed repositories, or if you want to enforce that users mount only those repositories, via CVMFS_STRICT_MOUNT=on.

Also make sure there’s enough disk space to hold CVMFS_QUOTA_LIMIT worth of cache.
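As a quick sanity check (a sketch, not an official CVMFS tool; the path and quota value are just the defaults discussed in this thread), you can compare the quota against the free space on the cache filesystem:

```shell
# Compare CVMFS_QUOTA_LIMIT (megabytes) with free space on the cache volume.
# /var/lib/cvmfs is the default CVMFS_CACHE_BASE; adjust if you changed it.
QUOTA_MB=10000
CACHE_DIR=/var/lib/cvmfs
[ -d "$CACHE_DIR" ] || CACHE_DIR=/   # fall back so the check still runs
FREE_MB=$(df -Pm "$CACHE_DIR" | awk 'NR==2 {print $4}')
echo "quota: ${QUOTA_MB} MB, free: ${FREE_MB} MB"
```

Leave some headroom beyond the quota, since the client needs additional space outside the managed cache.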

It doesn’t really make sense that you’d have better success leaving CVMFS_HTTP_PROXY unset than setting it to DIRECT, because the default setting is auto;DIRECT, which means it will first try to read http://cernvm-wpad.cern.ch/wpad.dat, and that probably does not return anything helpful. It doesn’t know that computecanada.net is Cloudflare, so it returns NONE. In fact, that’s probably where that http://NULL is coming from.

Dave

Hi,

Our configuration is intended to work out of the box and make things simple and easy for end users.
There is a snippet of config here: config-repo/etc/cvmfs/domain.d/computecanada.ca.conf at master · cvmfs-contrib/config-repo · GitHub
which should result in using cvmfs-s1.computecanada.net in situations like this.
That Cloudflare load balancer is backed by 4 origin servers, with dynamic steering to detect the closest one, and zero-downtime failover between them if one fails in the short window between an HTTP request being made and the automated health checks detecting the failure. We haven’t had any known or reported issues with that configuration.

But again, end users should not have to worry about these details.

@dwd it sounds like the issue is caused by WPAD? I didn’t know that was activated by default. Why is it returning NONE and potentially breaking the connection by setting http://NULL?

Thanks.

Currently if you do curl http://cernvm-wpad.cern.ch/wpad.dat from an IP address not associated with a proxy you will get this result:

// no squid found matching the remote ip address
function FindProxyForURL(url, host) {
    if (shExpMatch(url, "*.openhtc.io*")) {
        return "DIRECT";
    }
    return "NONE";
}

So it’s only recognizing openhtc.io as a destination to use for DIRECT. The idea is that we don’t want clients connecting directly to stratum 1s, but Cloudflare is OK in limited numbers. The WLCG WPAD service hasn’t yet been configured to recognize computecanada.net as an acceptable DIRECT destination; I will make that configuration change today.

Note that cernvm-wpad.{cern.ch|fnal.gov} are also configured to accept only a limited number of requests per time period from a single GeoIP “organization” before they start redirecting to failover proxies. That is to prevent abuse of Cloudflare by an organization that should be supplying its own Squid.

In addition, I’m pretty sure you and I must have discussed this before, but for the record and in case others come across this post: using multiple stratum 1s behind a single alias has the potential to confuse the CVMFS client and is not recommended. That’s because stratum 1 updates are not synchronized, and when the client reads a catalog from one stratum 1 it assumes that all files associated with that update are present on that stratum 1. If the alias switches stratum 1 servers without the CVMFS client’s knowledge, the client could get an error when attempting to read a file that is in the catalog it sees but not yet present on the other stratum 1. I’m not sure whether that returns an immediate fatal error to the user, but at minimum the client will consider that stratum 1 broken and stop using it.

Instead, I recommend that Compute Canada and anyone else that wants to set up their own Cloudflare alias do what we do with openhtc.io, and assign a separate alias to each stratum 1. In addition, the stratum 1 geo api recognizes when it is given a request from Cloudflare and tries to look up each stratum 1 name with an ip. prefix to find out the real IP address of the stratum 1s in order to do the geo sorting. This is easily done in Cloudflare by creating another DNS entry that is not proxied.
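For illustration only (all host names here are hypothetical), a per-alias setup like the one described would give the client the full list of stratum 1s in its domain configuration, e.g.:

```shell
# One CDN alias per stratum 1, so the CVMFS client itself chooses and
# fails over between servers instead of a load balancer doing it silently.
CVMFS_SERVER_URL="http://s1-east.example.org/cvmfs/@fqrn@;http://s1-west.example.org/cvmfs/@fqrn@"
# For each alias, an additional non-proxied DNS record with an "ip." prefix
# (e.g. ip.s1-east.example.org) exposes the origin's real address so the
# Geo-API can still sort servers by distance.
```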

Dave

Thanks for fixing the WPAD config. So it injects http://NULL for unknown servers if DIRECT is used? People are using lots of servers in lots of domains, so it doesn’t seem feasible to keep track of all of them in the WPAD config. Is there some other conditional that enables the use of WPAD, like CVMFS_CLIENT_PROFILE=single or CVMFS_USE_CDN=yes?

Yes, it’s important to be careful about how the Cloudflare mechanisms interact with the CVMFS ones. We’re using additional features of Cloudflare: the dynamic steering ensures that the closest origin is always used, so there is generally no switching between servers, and having all our origins behind one load-balancer address makes the Cloudflare caching layer more efficient. This is typical practice with commercial CDNs, although it looks a bit different from the usual CVMFS way. The Geo-API doesn’t work well on the Canadian research network anyway. In any case, I agree the setup you described makes sense when using only the free features of Cloudflare.

@pettyalex Is it working now?

In the future you can contact Compute Canada (DRAC) support directly via https://docs.alliancecan.ca/wiki/Technical_support, although in this case it was beneficial to discuss it here, as I was not aware of the external WPAD mechanism that seems to have caused the issue (though it was not until 4 months later that I noticed this thread and the issue was fixed).

Thanks!

So it injects http://NULL for unknown servers if DIRECT is used?

No, it injects that if auto is used, there’s no squid found, and the servers are not using Cloudflare. If DIRECT is used, WPAD will not be contacted.

Is there some other conditional that enables the use of WPAD, like CVMFS_CLIENT_PROFILE=single or CVMFS_USE_CDN=yes?

WPAD is used by default if CVMFS_CLIENT_PROFILE=single and CVMFS_HTTP_PROXY is unset, or any time somebody includes auto in CVMFS_HTTP_PROXY. CVMFS_USE_CDN=yes does not affect the setting of CVMFS_HTTP_PROXY.
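Put as concrete settings (a sketch; the Squid URL is a placeholder), those rules translate to:

```shell
# /etc/cvmfs/default.local -- illustrative proxy settings
CVMFS_HTTP_PROXY=DIRECT                # never consults WPAD; connects straight out
#CVMFS_HTTP_PROXY='auto;DIRECT'        # consults wpad.dat first, DIRECT as fallback
#CVMFS_HTTP_PROXY='http://squid.example.org:3128'   # explicit proxy; no WPAD
# With CVMFS_CLIENT_PROFILE=single and CVMFS_HTTP_PROXY unset, the client
# behaves as if 'auto;DIRECT' were set, i.e. WPAD is consulted.
```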

Hmm, I see that in the default config-repo, unlike the egi and osg config repos, CVMFS_USE_CDN is not automatically set to yes when CVMFS_HTTP_PROXY=DIRECT or auto;DIRECT, or even when CVMFS_CLIENT_PROFILE=single. In effect it is set that way for the computecanada.ca domain (because that domain uses the CDN in those cases anyway), but not for everyone else. I think that is a problem, especially with auto;DIRECT, because the WPAD would then inject http://NULL. That should probably be fixed.

Ah I see, thanks.

Hmm, I have noted that in “CVMFS_USE_CDN is not set in default common.conf” · Issue #320 · cvmfs-contrib/config-repo · GitHub

I’ll check tomorrow in a fresh VM and see if things are working, thanks so much for investigating this!

I evaluated CVMFS a few months ago when I originally made this post and decided not to use it at the time, but I was just about to re-evaluate it and contact the Compute Canada team, so this is great timing to find out that something actually was wrong.
