Cvmfs check during transaction

paul.zakharov · October 4, 2023, 2:32pm

We have two publisher nodes connected to a stratum 0 server (in gateway mode).

I’d like to run
cvmfs_server check <repo>
for some of our repos.

I’d like to know whether cvmfs_server check can be launched during a transaction, or whether a transaction can be opened while cvmfs_server check is running.
If it should be avoided, is it enough to stop cvmfs-gateway.service to be safe?

And a small additional question: is it safe to run cvmfs_server check -r directly to repair the problem automatically?

dwd · October 5, 2023, 9:34pm

I’d like to know whether cvmfs_server check can be launched during a transaction, or whether a transaction can be opened while cvmfs_server check is running.

My experience with cvmfs_server check is mostly on stratum 1s, but I believe it can be run simultaneously with other operations without a problem.

is it safe to run cvmfs_server check -r directly to repair the problem automatically?

Note that the -r option only repairs reflog problems. A reflog problem generally stops other operations, and it is safe to use the -r option to repair the reflog.

paul.zakharov · October 6, 2023, 8:30am

Thank you for the reply.

My experience with cvmfs_server check is mostly on stratum 1s, but I believe it can be run simultaneously with other operations without a problem.

I have a repo on stratum 0 that is reported as unhealthy. I thought that “cvmfs_server check” was the right way to fix this “unhealthy” status.
Is there some other method?

dwd · October 9, 2023, 9:58pm

That should at least tell you what’s wrong so that you can hopefully fix it yourself.

jakob · October 10, 2023, 9:50pm

Usually, cvmfs_server list should already give a reason for the unhealthy state. Can you share the full output / line for this repository? It may be an issue where the repository check does not help, e.g. Apache not running, repository not mounted, or an expired whitelist.
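
As a rough illustration of the last of those causes, here is a minimal sketch that tests whether a whitelist has expired. It assumes the whitelist stores its expiry on a line of the form E followed by a 14-digit UTC timestamp (YYYYMMDDhhmmss); the helper name whitelist_expired and its calling convention are made up for this example and are not part of cvmfs_server.

```shell
# Hypothetical helper: exit 0 if the whitelist read from stdin has expired.
# $1 is the current UTC time as YYYYMMDDhhmmss.
# Assumption: the expiry is stored on a line "E<YYYYMMDDhhmmss>".
whitelist_expired() {
  now=$1
  # Take the first line that looks like an expiry record; LC_ALL=C keeps
  # sed from tripping over any non-text bytes in the file.
  expiry=$(LC_ALL=C sed -n 's/^E\([0-9]\{14\}\).*/\1/p' | head -n 1)
  # No expiry line found: report "unknown" with exit status 2.
  [ -n "$expiry" ] || return 2
  # Both values have the same fixed width, so a numeric comparison orders them in time.
  [ "$expiry" -lt "$now" ]
}
```

It could be called as whitelist_expired "$(date -u +%Y%m%d%H%M%S)" < /srv/cvmfs/<repo>/.cvmfswhitelist, where the path is an assumption about a default local-storage layout and may differ on your server.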

paul.zakharov · October 11, 2023, 9:41am

Thank you very much.

cvmfs_server list doesn’t give me any useful information (at least from my point of view).

Here is the output:

~# cvmfs_server list
euclid-dev.in2p3.fr (stratum0 / local - unhealthy)  
euclid-test.in2p3.fr (stratum0 / local - unhealthy)  
euclid.in2p3.fr (stratum0 / local - unhealthy)

If I run cvmfs_server info on one of the repos, it doesn’t give much more information. The whitelist and everything else seem to be OK.

In general there are no transactions directly on this stratum0 server.
We have two publishers connected to the gateway running on this server.

It should be noted that this unhealthy status doesn’t prevent normal operation, at least from a global point of view. We are still able to open transactions and publish from the publishers.
Another point: this unhealthy status comes back regularly.
It was repaired some time ago (probably using “cvmfs_server check”, but I’m not 100% sure, hence my original question).
But this unhealthy status came back again without any obvious reason.

For the moment I have two goals:

learn how to repair the unhealthy status (cvmfs_server check, or something else?)
understand how to prevent it or repair it automatically (using crontab? but how will it behave if a transaction/publish happens during the check?)

jakob · October 11, 2023, 9:37pm

So if I understand correctly, the unhealthy state appears only on the gateway, not on the publishers?

This is most likely a presentation error. The health check done on the gateway does not take into account that remote publishers change the repository. So then it is surprised that the gateway has an outdated repository version mounted and declares it as “unhealthy”. This does not have any further consequences though.

We’ll fix the cvmfs_server list command.

paul.zakharov · October 12, 2023, 3:14pm

In fact both show the unhealthy status: the gateway (stratum0) and the publishers.
Here is the output from one of the publishers:

~]# cvmfs_server list 
euclid-dev.in2p3.fr (stratum0 / gw - unhealthy)  
euclid.in2p3.fr (stratum0 / gw - unhealthy)

But I think your analysis is probably right.

paul.zakharov · October 20, 2023, 2:17pm

A small update on this issue.

I ran cvmfs_server check euclid.in2p3.fr. It worked well and euclid.in2p3.fr became healthy.
But if I do a transaction/publish right after the check, euclid.in2p3.fr becomes unhealthy again.

dwd · October 20, 2023, 3:51pm

Please post any relevant messages that the check command printed for the repair it did, and the complete output of the following publish command.

paul.zakharov · October 23, 2023, 2:08pm

Thank you, dwd.
Here are the outputs.
I have the euclid.in2p3.fr repo marked as unhealthy on my Stratum0.

So on Stratum0 I run the check:

stratum0 ~$ cvmfs_server check euclid.in2p3.fr
euclid.in2p3.fr is not based on the newest published revision
Note: Trying to umount /cvmfs/euclid.in2p3.fr... success
Note: Trying to umount /var/spool/cvmfs/euclid.in2p3.fr/rdonly... success
Note: Trying to mount /var/spool/cvmfs/euclid.in2p3.fr/rdonly... success
Note: Trying to mount /cvmfs/euclid.in2p3.fr... success
Verifying integrity of euclid.in2p3.fr...
Inspecting log of references
Inspecting tag database
[inspecting catalog] 5f12500053af5c896b61458775d976bc27d579f2 at /
[inspecting catalog] 0c5b6f93d1143d1f8eff0e5459b8aaf26a591a95 at /CentOS7/PipelineRunner-1.0.4
[skip a lot of inspecting catalog]
no problems found

I skipped many “inspecting catalog” lines as there was nothing interesting in them.
euclid.in2p3.fr became healthy.
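
For reference, the two lines of that output that matter here can be pulled out of a saved log with a small filter. This is only a sketch: check_summary is a hypothetical helper, and the two message strings are copied from the output above rather than taken from any documented format.

```shell
# Hypothetical helper: summarise a saved `cvmfs_server check` log read from stdin.
# Prints "stale-mount" if the check reported an outdated mounted revision,
# then "clean" if it ended with "no problems found", otherwise "needs-review".
check_summary() {
  awk '
    /is not based on the newest published revision/ { stale = 1 }
    /^no problems found/                            { clean = 1 }
    END {
      if (stale) print "stale-mount"
      print (clean ? "clean" : "needs-review")
    }'
}
```

For the log above it would print stale-mount followed by clean, i.e. the only thing the check did was remount the repository on the newest revision.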

Then on publisher1 I open a transaction and publish it:

publisher1 ~]$ cvmfs_server transaction euclid.in2p3.fr
Gateway reply: ok
publisher1 ~]$ cvmfs_server publish euclid.in2p3.fr
Using auto tag 'generic-2023-10-23T13:55:57Z'
Processing changes...
Waiting for upload of files before committing...
Committing file catalogs...
Wait for all uploads to finish
Exporting repository manifest
Statistics stored at: /var/spool/cvmfs/euclid.in2p3.fr/stats.db
Changes submitted to repository gateway

Right after publishing, I run cvmfs_server list on Stratum0 and euclid.in2p3.fr has become unhealthy again.

I also ran the transaction and publish with CVMFS_SERVER_DEBUG=3. I could share the results, but they are huge and I don’t see anything relevant in them.

dwd · October 24, 2023, 6:07pm

I don’t know much about the gateway, but since there were no errors on the publisher I would assume the problem is on the gateway and that there should be a log to look in there.

rptaylor · November 1, 2023, 6:20pm

That is what Jakob explained. “Unhealthy” in that situation just means out of date. An update was published from a different node that the current node was not aware of yet. Most commands (most importantly, starting a new transaction) automatically remount the repository before proceeding, thereby updating it to the latest revision. You can also just do cvmfs_server mount -a to bring all the repos up to date and make them healthy if one was out of date.
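
A minimal sketch of how to spot the affected repositories first, assuming the cvmfs_server list line format seen earlier in this thread (the helper name unhealthy_repos is hypothetical):

```shell
# Hypothetical helper: print the names of repositories that
# `cvmfs_server list` marks as unhealthy.
# Reads `cvmfs_server list` output on stdin. Assumption: each line looks like
#   <repo> (stratum0 / local - unhealthy)
# possibly with trailing whitespace, as in the output posted above.
unhealthy_repos() {
  awk '/unhealthy\)[[:space:]]*$/ { print $1 }'
}
```

On a gateway or publisher this could be used as cvmfs_server list | unhealthy_repos to see which repos are flagged, followed by cvmfs_server mount -a to remount them.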

rptaylor · November 1, 2023, 6:43pm

@jakob I have a similar question. We have a gateway (s0) and a remote publisher. The gateway has automated server checks via cron (with -i, so they take a while).

As I understand, running a check on the s0 does not start a transaction, so it should be safe to have that concurrent with remote publisher activity. However when a publisher started a transaction while the check was running, aside from being very slow due to the check IO, the publisher node showed that the repo is unhealthy during the transaction:

soft-dev.computecanada.ca (stratum0 / gw - in transaction - unhealthy)

Is it safe to start a remote transaction during a check? Why is it reported unhealthy, and will it eventually succeed?
Thanks!
