Tags and GC policies when using a remote publisher

Dear all

If I create a repo on a stratum 0 and configure it with policies for tags and garbage collection (see also the thread “Tags, rollbacks, and garbage collection”), I would expect these policies to be enforced also when publishing through the gateway/publisher setup.

As far as I can see, this is not happening, at least in my setup.

All the details are below [*].
Any help in understanding what I am doing wrong would be really appreciated.

Thanks, Massimo

[*]

I created a repo on a remote publisher:

sudo cvmfs_server mkfs -w https://rgw.cloud.infn.it:443/cvmfs/repo01.infn.it \
-u gw,/srv/cvmfs/repo01.infn.it/data/txn,http://cvmfs.wp6.cloud.infn.it:4929/api/v1 \
-k /tmp/repo01_keys -o `whoami` repo01.infn.it

This repo has old tags that are supposed to be cleaned up at the next transaction:

[almalinux@sgaravat-publisher ~]$ sudo cvmfs_server tag -l repo01.infn.it 
Name                             │ Revision │ Timestamp            │ Branch │ Description
─────────────────────────────────┼──────────┼──────────────────────┼────────┼─────────────
generic-2024-09-02T15:12:17.759Z │     2383 │ 2 Sep 2024 17:12:17  │        │
generic-2024-09-02T15:12:30.506Z │     2384 │ 2 Sep 2024 17:12:30  │        │
generic-2024-09-02T15:12:39.913Z │     2385 │ 2 Sep 2024 17:12:40  │        │
generic-2024-09-02T15:12:50.649Z │     2386 │ 2 Sep 2024 17:12:50  │        │
generic-2024-09-02T15:13:04.084Z │     2387 │ 2 Sep 2024 17:13:04  │        │
generic-2024-09-03T08:01:25.723Z │     2388 │ 3 Sep 2024 10:01:25  │        │
generic-2024-09-03T08:01:49.858Z │     2389 │ 3 Sep 2024 10:01:50  │        │
generic-2024-09-03T08:02:15.283Z │     2390 │ 3 Sep 2024 10:02:15  │        │
generic-2024-09-03T08:05:59.488Z │     2391 │ 3 Sep 2024 10:05:59  │        │
generic-2024-09-03T08:06:12.850Z │     2392 │ 3 Sep 2024 10:06:13  │        │
generic-2024-09-11T12:50:20.408Z │     2393 │ 11 Sep 2024 14:50:20 │        │
generic-2024-09-11T12:57:41.505Z │     2394 │ 11 Sep 2024 14:57:41 │        │
etc etc
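
As a rough way to see which of the auto tags listed above would be past a given age threshold, the timestamp embedded in each tag name can be compared against a cutoff. This is only a sketch: the function name `expired_auto_tags` and the cutoff value are hypothetical, it relies on the `generic-<ISO timestamp>` naming visible in the listing, and it does not look at what CVMFS itself records for each tag.

```shell
#!/usr/bin/env bash
# Sketch: print auto tags whose name-embedded UTC timestamp is older than a cutoff.
# Reads `cvmfs_server tag -l` style lines on stdin (tag name in the first column).
# The helper name and the cutoff format are assumptions made for illustration.
expired_auto_tags() {
  local cutoff="$1"   # UTC, seconds precision, e.g. 2024-09-10T00:00:00
  LC_ALL=C awk -v cutoff="$cutoff" '
    $1 ~ /^generic-[0-9]/ {
      # "generic-" is 8 characters, so the timestamp starts at position 9;
      # keep 19 characters (YYYY-MM-DDTHH:MM:SS) and drop fractions and "Z".
      ts = substr($1, 9, 19)
      # Same-format ISO timestamps sort correctly as plain strings.
      if ((ts "") < (cutoff "")) print $1
    }'
}
```

For example, `sudo cvmfs_server tag -l repo01.infn.it | expired_auto_tags 2024-09-10T00:00:00` would, assuming the piped output keeps the layout shown above, print the tags from 2 and 3 September among those listed.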

When I do a transaction/publish on the remote publisher, the old tags are not cleaned up:

[almalinux@sgaravat-publisher ~]$ sudo cvmfs_server transaction repo01.infn.it
Gateway reply: ok
[almalinux@sgaravat-publisher ~]$ sudo cvmfs_server publish repo01.infn.it
Using auto tag 'generic-2024-10-10T11:26:00Z'
Processing changes...
Waiting for upload of files before committing...
Committing file catalogs...
Wait for all uploads to finish
Exporting repository manifest
Statistics stored at: /var/spool/cvmfs/repo01.infn.it/stats.db
Changes submitted to repository gateway
[almalinux@sgaravat-publisher ~]$

If I do the transaction on the stratum-0, the cleaning works as expected:

[root@cvmfs-s0-s3cloudveneto almalinux]# systemctl stop cvmfs-gateway.service

[root@cvmfs-s0-s3cloudveneto almalinux]# cvmfs_server publish repo01.infn.it
Using auto tag 'generic-2024-10-10T11:28:39Z'
Processing changes...
Waiting for upload of files before committing...
Committing file catalogs...
Wait for all uploads to finish
Exporting repository manifest
Statistics stored at: /var/spool/cvmfs/repo01.infn.it/stats.db
Removing outdated automatically generated tags for repo01.infn.it...
deleting 'generic-2024-09-03T08:06:12.850Z' (110a9e5770a3079d0f1b2ea3037cc1b227d80dae)
deleting 'generic-2024-09-03T08:05:59.488Z' (9bde102e514f7c7caefaf2b2fc2872ece6078bbf)
deleting 'generic-2024-09-03T08:02:15.283Z' (b38eb40b6bba5038401e373d8cdd85cc0e86598c)
deleting 'generic-2024-09-03T08:01:49.858Z' (e5d304cde6550cc939bd66da52ce7c89ca089600)
deleting 'generic-2024-09-03T08:01:25.723Z' (8dbed00b4c5a3e46470d1bb568de761b447d0b30)
deleting 'generic-2024-09-02T15:13:04.084Z' (9d23308bb86ea02da1bc92e7f4703652af32620b)
deleting 'generic-2024-09-02T15:12:50.649Z' (a8e8a559bb58584269ec5c2ffcccb2fcfe21ae56)
deleting 'generic-2024-09-02T15:12:39.913Z' (a4aab05e8b3252099acbd59a17221b172f3763b7)
deleting 'generic-2024-09-02T15:12:30.506Z' (cac3bff09978b70e2e5799c373fc524c3a575bb2)
deleting 'generic-2024-09-02T15:12:17.759Z' (16839d9f50165f3acb7014868780b70857a4cc55)
Tagging repo01.infn.it
Flushing file system buffers
Signing new manifest
Running automatic garbage collection
  --> marking unreferenced objects [Thu, 10 Oct 2024 11:28:48 GMT]
  --> sweeping unreferenced objects [Thu, 10 Oct 2024 11:28:49 GMT]
      - 10%    475 / 4748 unreferenced revisions removed [Thu, 10 Oct 2024 11:29:05 GMT]
      - 20%    950 / 4748 unreferenced revisions removed [Thu, 10 Oct 2024 11:29:20 GMT]
      - 30%    1425 / 4748 unreferenced revisions removed [Thu, 10 Oct 2024 11:29:37 GMT]
      - 40%    1900 / 4748 unreferenced revisions removed [Thu, 10 Oct 2024 11:29:54 GMT]
      - 50%    2375 / 4748 unreferenced revisions removed [Thu, 10 Oct 2024 11:30:13 GMT]
      - 60%    2850 / 4748 unreferenced revisions removed [Thu, 10 Oct 2024 11:30:33 GMT]
      - 70%    3325 / 4748 unreferenced revisions removed [Thu, 10 Oct 2024 11:30:51 GMT]
      - 80%    3800 / 4748 unreferenced revisions removed [Thu, 10 Oct 2024 11:31:10 GMT]
      - 90%    4275 / 4748 unreferenced revisions removed [Thu, 10 Oct 2024 11:31:26 GMT]
  --> done garbage collecting [Thu, 10 Oct 2024 11:31:48 GMT]
Statistics stored at: /var/spool/cvmfs/repo01.infn.it/stats.db
Remounting newly created repository revision
[root@cvmfs-s0-s3cloudveneto almalinux]#

That is a long-standing issue: "CVMFS_AUTO_TAG_TIMESPAN ignored by gateway" (cvmfs/cvmfs issue #3017).
It would be good to have a fix, especially since this adds to the risk of the filesystem filling up if one doesn’t remember to do the manual workaround occasionally (thus apparently also increasing the risk of potential data corruption; see "data corruption due to full filesystem, and snapshot fails with "unexpected HTTP error code 200"", cvmfs/cvmfs issue #3268).

OK, understood.
I’d also like a fix.
For the time being we will live with the workaround (a transaction/publish on the stratum 0 from time to time).

Thanks, Massimo
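
The workaround described above (stop the gateway, run a transaction/publish directly on the stratum 0) could be wrapped in a small script along these lines. It is a sketch under assumptions: the function names, the `DRY_RUN` switch and the restart of the gateway at the end are not from the posts above, and it defaults to only printing the commands it would run. For a real run it would need to be executed as root on the stratum 0.

```shell
#!/usr/bin/env bash
# Sketch of the periodic workaround: publish an (empty) transaction locally on
# the stratum 0 so that auto-tag cleanup and automatic GC get a chance to run.
set -euo pipefail

REPO="${REPO:-repo01.infn.it}"                       # repository from this thread
GATEWAY_UNIT="${GATEWAY_UNIT:-cvmfs-gateway.service}"
DRY_RUN="${DRY_RUN:-1}"                              # 1 = only print the commands

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

local_publish_cleanup() {
  local status=0
  # The gateway is stopped first, as in the session shown earlier.
  run systemctl stop "$GATEWAY_UNIT" || return 1
  # If the publish fails, the transaction may remain open and need manual attention.
  { run cvmfs_server transaction "$REPO" && run cvmfs_server publish "$REPO"; } || status=$?
  # Assumption: the gateway should be brought back even if the publish failed.
  run systemctl start "$GATEWAY_UNIT" || status=$?
  return "$status"
}

local_publish_cleanup
```

Running it with `DRY_RUN=0` executes the commands for real; since remote publishers cannot reach the gateway while it is stopped, a quiet time slot is probably the safer choice.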

I was about to write a post about exactly the same issue when I found that it had already been reported.

I also hope that a fix will become available.

I also suggest mentioning it in the documentation.
Currently the documentation says:

On every publish, automatically generated tags older than the defined threshold are removed.

I think it would help to mention that, for this cleanup to happen, the transaction has to be done directly on the stratum 0 and not on a publisher.
We also discovered this behaviour only by coincidence; before that, we were convinced that everything was working normally.
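
For readers landing here, the tag and garbage collection policies discussed in this thread are normally set in the repository's `server.conf` on the stratum 0. The excerpt below is a hypothetical illustration only: the values are made up and are not the ones used by the posters above, and, as reported in this thread, these policies are enforced on a local publish on the stratum 0 but not when publishing through the gateway.

```shell
# Hypothetical excerpt of /etc/cvmfs/repositories.d/repo01.infn.it/server.conf
# Values are illustrative, not taken from the setup described in this thread.
CVMFS_AUTO_TAG=true                     # create a "generic-<timestamp>" tag on publish
CVMFS_AUTO_TAG_TIMESPAN="2 weeks ago"   # auto tags older than this are candidates for removal
CVMFS_GARBAGE_COLLECTION=true           # repository is garbage-collectable
CVMFS_AUTO_GC=true                      # run garbage collection automatically
```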