Stratum 1 snapshot fails due to timeout fetching .cvmfs_master_replica

One of my Stratum 1s seems to have been unable to perform a snapshot since mid-September because it fails to fetch .cvmfs_master_replica:

[singularity@cvmfs1-iu0 ~]$ CVMFS_SERVER_DEBUG=3 cvmfs_server snapshot singularity.galaxyproject.org
WARNING: cannot read /etc/cvmfs/server.local
Directory '/var/lib/cvmfs-server/geo' doesn't exist or is not writable by singularity
CernVM-FS: replicating from http://cvmfs-stratum0.galaxyproject.eu/cvmfs/singularity.galaxyproject.org
CernVM-FS: using public key(s) /etc/cvmfs/keys/galaxyproject.org/singularity.galaxyproject.org.pub
(download) download I/O thread started    [10-28-2022 21:19:20 UTC]
(download) escaped http://cvmfs-stratum0.galaxyproject.eu/cvmfs/singularity.galaxyproject.org/.cvmfs_master_replica to http://cvmfs-stratum0.galaxyproject.eu/cvmfs/singularity.galaxyproject.org/.cvmfs_master_replica    [10-28-2022 21:19:20 UTC]
(download) Verify downloaded url http://cvmfs-stratum0.galaxyproject.eu/cvmfs/singularity.galaxyproject.org/.cvmfs_master_replica, proxy DIRECT (curl error 28)    [10-28-2022 21:19:30 UTC]
(download) Trying again on same curl handle, same url: 1, error code 15    [10-28-2022 21:19:30 UTC]
(download) backing off for 354 ms    [10-28-2022 21:19:30 UTC]
(download) Verify downloaded url http://cvmfs-stratum0.galaxyproject.eu/cvmfs/singularity.galaxyproject.org/.cvmfs_master_replica, proxy DIRECT (curl error 28)    [10-28-2022 21:19:40 UTC]
(download) Trying again on same curl handle, same url: 1, error code 15    [10-28-2022 21:19:40 UTC]
(download) backing off for 708 ms    [10-28-2022 21:19:40 UTC]
(download) Verify downloaded url http://cvmfs-stratum0.galaxyproject.eu/cvmfs/singularity.galaxyproject.org/.cvmfs_master_replica, proxy DIRECT (curl error 28)    [10-28-2022 21:19:51 UTC]
(download) Trying again on same curl handle, same url: 1, error code 15    [10-28-2022 21:19:51 UTC]
(download) backing off for 1416 ms    [10-28-2022 21:19:51 UTC]
(download) Verify downloaded url http://cvmfs-stratum0.galaxyproject.eu/cvmfs/singularity.galaxyproject.org/.cvmfs_master_replica, proxy DIRECT (curl error 28)    [10-28-2022 21:20:03 UTC]
(download) download failed (error 15 - host serving data too slowly)    [10-28-2022 21:20:03 UTC]
Failed to contact stratum 0 server (15 - host serving data too slowly)
(download) download I/O thread terminated    [10-28-2022 21:20:03 UTC]

There are no requests for that file in the Apache log on the stratum 0, although there are two other requests from this stratum 1’s address:

149.165.159.221 - - [28/Oct/2022:23:19:20 +0200] "GET /cvmfs/singularity.galaxyproject.org/.cvmfspublished HTTP/1.1" 200 618 "-" "cvmfs Fuse 2.9.4"
149.165.159.221 - - [28/Oct/2022:23:19:20 +0200] "HEAD /cvmfs/singularity.galaxyproject.org/data/e2/ab48b0984729d99951cb62c4312f501b3ddc6bX HTTP/1.1" 200 - "-" "cvmfs Fuse 2.9.4"
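To double-check that the stratum 0 never sees the request, a small filter over the Apache access log can list every path requested from the stratum 1’s address (149.165.159.221 here). This is a minimal Python sketch assuming Apache’s default "combined" log format; the regex and the function name are my own, and the sample lines are the two quoted above. Note that the Apache timestamps are in +0200, so 23:19:20 is the same moment as 21:19:20 UTC in the cvmfs debug log.

```python
import re

# Apache "combined" format: client ident user [time] "METHOD path proto" status size ...
LOG_RE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def requests_from(lines, client_ip):
    """Return (method, path, status) for every request logged from client_ip."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group("client") == client_ip:
            hits.append((m.group("method"), m.group("path"), int(m.group("status"))))
    return hits

# The two lines quoted above from the stratum 0's access log
sample = [
    '149.165.159.221 - - [28/Oct/2022:23:19:20 +0200] "GET /cvmfs/singularity.galaxyproject.org/.cvmfspublished HTTP/1.1" 200 618 "-" "cvmfs Fuse 2.9.4"',
    '149.165.159.221 - - [28/Oct/2022:23:19:20 +0200] "HEAD /cvmfs/singularity.galaxyproject.org/data/e2/ab48b0984729d99951cb62c4312f501b3ddc6bX HTTP/1.1" 200 - "-" "cvmfs Fuse 2.9.4"',
]

hits = requests_from(sample, "149.165.159.221")
replica_requested = any(path.endswith("/.cvmfs_master_replica") for _, path, _ in hits)
for method, path, status in hits:
    print(method, status, path)
print("master replica requested:", replica_requested)
```

On the stratum 0 you would feed it the lines of the real access log instead (on Rocky 8 usually /var/log/httpd/access_log, though a vhost may log elsewhere).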

The file fetches fine with curl, however:

[singularity@cvmfs1-iu0 ~]$ curl -v http://cvmfs-stratum0.galaxyproject.eu/cvmfs/singularity.galaxyproject.org/.cvmfs_master_replica
*   Trying 132.230.223.20...
* TCP_NODELAY set
* Connected to cvmfs-stratum0.galaxyproject.eu (132.230.223.20) port 80 (#0)
> GET /cvmfs/singularity.galaxyproject.org/.cvmfs_master_replica HTTP/1.1
> Host: cvmfs-stratum0.galaxyproject.eu
> User-Agent: curl/7.61.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Date: Fri, 28 Oct 2022 21:20:11 GMT
< Server: Apache/2.4.37 (rocky)
< Accept-Ranges: bytes
< Content-Length: 0
< Cache-Control: max-age=61
< Expires: Fri, 28 Oct 2022 21:21:12 GMT
< Content-Type: application/x-cvmfs
< 
* Connection #0 to host cvmfs-stratum0.galaxyproject.eu left intact

That request is logged on the stratum 0 side:

149.165.159.221 - - [28/Oct/2022:23:20:11 +0200] "GET /cvmfs/singularity.galaxyproject.org/.cvmfs_master_replica HTTP/1.1" 200 - "-" "curl/7.61.1"

And other stratum 1s have no problem:

[singularity@cvmfs1-psu0 ~]$ CVMFS_SERVER_DEBUG=3 cvmfs_server snapshot singularity.galaxyproject.org
WARNING: cannot read /etc/cvmfs/server.local
Directory '/var/lib/cvmfs-server/geo' doesn't exist or is not writable by singularity
CernVM-FS: replicating from http://cvmfs-stratum0.galaxyproject.eu/cvmfs/singularity.galaxyproject.org
CernVM-FS: using public key(s) /etc/cvmfs/keys/singularity.galaxyproject.org.pub
(download) download I/O thread started    [10-28-2022 21:14:22 UTC]
(download) escaped http://cvmfs-stratum0.galaxyproject.eu/cvmfs/singularity.galaxyproject.org/.cvmfs_master_replica to http://cvmfs-stratum0.galaxyproject.eu/cvmfs/singularity.galaxyproject.org/.cvmfs_master_replica    [10-28-2022 21:14:22 UTC]
(download) Verify downloaded url http://cvmfs-stratum0.galaxyproject.eu/cvmfs/singularity.galaxyproject.org/.cvmfs_master_replica, proxy DIRECT (curl error 0)    [10-28-2022 21:14:23 UTC]

The stratum 1 in question is running fully updated Rocky 8, and the CVMFS package version is cvmfs-server-2.9.4-1.el8.x86_64.

Curl error code 15 says “An internal failure to lookup the host used for the new connection.” On the other hand, when I try to read it, I get Permission denied:

$ curl -L http://cvmfs-stratum0.galaxyproject.eu/cvmfs/singularity.galaxyproject.org
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access this resource.</p>
</body></html>

So maybe there’s an access-control issue on the stratum 0, perhaps a .htaccess file. The fact that it works with curl from the same machine, though, is hard to explain. Maybe it’s an IPv6 issue, or maybe curl and cvmfs look up host names differently.

Dave

Is that 15 maybe the cvmfs error code? The curl error is also reported as 28 (timeout).

Posting a second reply with the rest since apparently I can only put 2 links in a comment as new user.
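The two numbers can be told apart mechanically: every "(curl error N)" annotation in the failing log reports 28 (CURLE_OPERATION_TIMEDOUT), while 15 only appears in the retry and final-failure messages, where the log itself spells it out as "host serving data too slowly". Curl’s own code 15 is CURLE_FTP_CANT_GET_HOST, an FTP-specific error, which is why its description doesn’t fit here. Below is a minimal Python sketch that separates them; the regexes are written against the log lines above, not against any documented cvmfs log format.

```python
import re

CURL_RE = re.compile(r"\(curl error (\d+)\)")                      # curl's own result code
RETRY_RE = re.compile(r"error code (\d+)")                          # cvmfs code on retries
FAIL_RE = re.compile(r"download failed \(error (\d+) - ([^)]*)\)")  # cvmfs code + its text

def error_codes(log_lines):
    """Collect curl error numbers and cvmfs download error numbers separately."""
    curl_errors, cvmfs_errors, reason = [], [], None
    for line in log_lines:
        if m := CURL_RE.search(line):
            curl_errors.append(int(m.group(1)))
        elif m := RETRY_RE.search(line):
            cvmfs_errors.append(int(m.group(1)))
        elif m := FAIL_RE.search(line):
            cvmfs_errors.append(int(m.group(1)))
            reason = m.group(2)
    return curl_errors, cvmfs_errors, reason

# Excerpt of the failing snapshot's debug output (the error-bearing lines only)
LOG = [
    "(download) Verify downloaded url http://cvmfs-stratum0.galaxyproject.eu/cvmfs/singularity.galaxyproject.org/.cvmfs_master_replica, proxy DIRECT (curl error 28)    [10-28-2022 21:19:30 UTC]",
    "(download) Trying again on same curl handle, same url: 1, error code 15    [10-28-2022 21:19:30 UTC]",
    "(download) Verify downloaded url http://cvmfs-stratum0.galaxyproject.eu/cvmfs/singularity.galaxyproject.org/.cvmfs_master_replica, proxy DIRECT (curl error 28)    [10-28-2022 21:19:40 UTC]",
    "(download) Trying again on same curl handle, same url: 1, error code 15    [10-28-2022 21:19:40 UTC]",
    "(download) Verify downloaded url http://cvmfs-stratum0.galaxyproject.eu/cvmfs/singularity.galaxyproject.org/.cvmfs_master_replica, proxy DIRECT (curl error 28)    [10-28-2022 21:19:51 UTC]",
    "(download) Trying again on same curl handle, same url: 1, error code 15    [10-28-2022 21:19:51 UTC]",
    "(download) Verify downloaded url http://cvmfs-stratum0.galaxyproject.eu/cvmfs/singularity.galaxyproject.org/.cvmfs_master_replica, proxy DIRECT (curl error 28)    [10-28-2022 21:20:03 UTC]",
    "(download) download failed (error 15 - host serving data too slowly)    [10-28-2022 21:20:03 UTC]",
]

curl_errors, cvmfs_errors, reason = error_codes(LOG)
print(curl_errors)   # [28, 28, 28, 28]
print(cvmfs_errors)  # [15, 15, 15, 15]
print(reason)        # host serving data too slowly
```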

Your 403 appears to be for http://cvmfs-stratum0.galaxyproject.eu/cvmfs/singularity.galaxyproject.org/, which is correct afaict since there’s no index or autoindex for cvmfs. Are you able to fetch the .cvmfs_master_replica at http://cvmfs-stratum0.galaxyproject.eu/cvmfs/singularity.galaxyproject.org/.cvmfs_master_replica ?

Thanks,
–nate

It looks like the curl timeout is hardcoded to 10 seconds here, and I can verify that each block like this:

(download) Verify downloaded url http://cvmfs-stratum0.galaxyproject.eu/cvmfs/singularity.galaxyproject.org/.cvmfs_master_replica, proxy DIRECT (curl error 28)    [10-28-2022 21:19:30 UTC]
(download) Trying again on same curl handle, same url: 1, error code 15    [10-28-2022 21:19:30 UTC]
(download) backing off for 354 ms    [10-28-2022 21:19:30 UTC]

is printed after a 10-second pause. That said, the request doesn’t take anywhere near 10 seconds with command-line curl, so I don’t know why it’s reaching that timeout.
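The 10-second timeout also accounts for the whole run. Assuming four attempts that each time out after 10 s, separated by backoffs that double from 354 ms (both read off the debug log, not taken from the cvmfs source), the snapshot should give up roughly 42.5 s after it starts. A quick Python check against the log’s start and end timestamps:

```python
from datetime import datetime

# Values read off the debug log above (assumptions, not taken from the cvmfs source)
TIMEOUT_S = 10.0          # each attempt ends with curl error 28 after ~10 s
FIRST_BACKOFF_MS = 354    # first "backing off" value; later ones double
ATTEMPTS = 4              # initial attempt plus three retries

def expected_duration(attempts, timeout_s, first_backoff_ms):
    """Wall time if every attempt times out and the backoff doubles between attempts."""
    backoffs_ms = [first_backoff_ms * 2 ** i for i in range(attempts - 1)]
    return attempts * timeout_s + sum(backoffs_ms) / 1000.0, backoffs_ms

total_s, backoffs = expected_duration(ATTEMPTS, TIMEOUT_S, FIRST_BACKOFF_MS)

# Timestamps of the "I/O thread started" and "download failed" lines
fmt = "%m-%d-%Y %H:%M:%S"
observed_s = (datetime.strptime("10-28-2022 21:20:03", fmt)
              - datetime.strptime("10-28-2022 21:19:20", fmt)).total_seconds()

print(backoffs)           # [354, 708, 1416]
print(round(total_s, 3))  # 42.478
print(observed_s)         # 43.0
```

The remaining half second is within the one-second resolution of the log timestamps, so the whole failure is consistent with four back-to-back timeouts and nothing else.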

> Is that 15 maybe the cvmfs error code?

Oh, you’re probably right.

> Your 403 appears to be for http://cvmfs-stratum0.galaxyproject.eu/cvmfs/singularity.galaxyproject.org/, which is correct afaict since there’s no index or autoindex for cvmfs. Are you able to fetch the .cvmfs_master_replica at http://cvmfs-stratum0.galaxyproject.eu/cvmfs/singularity.galaxyproject.org/.cvmfs_master_replica ?

Right again! So my attempted help wasn’t very helpful; sorry about that.

I don’t have any other ideas short of trying tcpdump and/or strace.

Dave

Ah, thanks, your suggestions led me to the right place! It wasn’t even generating network traffic for the .cvmfs_master_replica requests.

The first resolver in the stratum 1’s /etc/resolv.conf is not responding for some reason. The second one is, but it appears that the remaining resolvers are never tried, perhaps because the request’s short timeout expires before the query to the first resolver gives up?
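One way to confirm which resolver is dead, independently of cvmfs, curl, or the system resolver library, is to send a single DNS query straight to each nameserver listed in /etc/resolv.conf and see which ones answer. Below is a minimal Python sketch that hand-builds an A-record query over UDP port 53. The function names, the 2-second per-server timeout, and the lack of TCP fallback or answer parsing are my own simplifications: it only checks that a reply carrying the matching query ID arrives, not that the answer is correct.

```python
import random
import socket
import struct
import time

def nameservers(resolv_conf_text):
    """Return the nameserver addresses from resolv.conf text, in file order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers

def build_query(hostname, query_id):
    """Build a minimal DNS query packet asking for the A record of hostname."""
    # Header: ID, flags (only RD, "recursion desired", set), 1 question, 0 other records
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    labels = hostname.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    return header + qname + b"\x00" + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN

def probe(server, hostname, timeout_s=2.0):
    """Send one query to one server; return the round-trip time in seconds, or None."""
    query_id = random.randint(0, 0xFFFF)
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        start = time.monotonic()
        try:
            sock.sendto(build_query(hostname, query_id), (server, 53))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # timeouts and other socket errors both count as "no answer"
            return None
        elapsed = time.monotonic() - start
    # A reply only counts if it echoes our query ID
    if len(reply) < 2 or struct.unpack("!H", reply[:2])[0] != query_id:
        return None
    return elapsed

def probe_all(hostname="cvmfs-stratum0.galaxyproject.eu", path="/etc/resolv.conf"):
    """Query every nameserver from resolv.conf, in order, and print whether it answered."""
    with open(path) as f:
        servers = nameservers(f.read())
    for ns in servers:
        rtt = probe(ns, hostname)
        print(ns, "no answer" if rtt is None else f"answered in {rtt * 1000:.0f} ms")
```

Running probe_all() on the affected stratum 1 should report the first nameserver as giving no answer and the second as answering; `dig @<server> cvmfs-stratum0.galaxyproject.eu` is the usual command-line equivalent. Note that this only confirms the dead resolver; it says nothing about why cvmfs doesn’t fall back to the second one.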

It’s curious that it makes two successful requests to the same server beforehand (for .cvmfspublished and for something under data/, perhaps a catalog?) and then fails to resolve that server in the next step.

The item from the data directory is most likely the certificate that was used to sign the repository manifest. In that case, the URL should end with an X, as the one in the Apache log does.
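Objects under data/ are content-addressed: judging by the URL in the Apache log, the first two hex digits of the hash form the subdirectory, and a trailing capital letter marks the object type. Below is a minimal Python sketch that splits such a URL, assuming that layout and the plain 40-hex-digit hashes seen here. Only the X = certificate mapping comes from this thread; treating C as a catalog is my understanding of cvmfs and worth checking against its documentation.

```python
import re

# Assumed layout, matching the URL in the Apache log above:
#   .../data/<first 2 hex digits>/<remaining hex digits><optional 1-letter suffix>
OBJECT_RE = re.compile(r"/data/([0-9a-f]{2})/([0-9a-f]+)([A-Z]?)$")

# Only "X" is confirmed in this thread; "C" (catalog) is an assumption to verify.
SUFFIXES = {"": "regular file content", "C": "file catalog", "X": "certificate"}

def classify_object(url):
    """Split a CVMFS data object URL into (hex digest, object kind)."""
    m = OBJECT_RE.search(url)
    if m is None:
        raise ValueError(f"not a data object URL: {url}")
    digest = m.group(1) + m.group(2)
    return digest, SUFFIXES.get(m.group(3), f"unknown suffix {m.group(3)!r}")

digest, kind = classify_object(
    "/cvmfs/singularity.galaxyproject.org/data/e2/ab48b0984729d99951cb62c4312f501b3ddc6bX"
)
print(digest)  # e2ab48b0984729d99951cb62c4312f501b3ddc6b
print(kind)    # certificate
```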