After some days of inactivity, a new transaction sometimes removes files from previous transactions

About a year ago we had an incident on our CVMFS publishing system. We didn’t understand it well, so we put a workaround in place in our CI system and didn’t ask about it here.
This year (about two weeks ago) the same incident happened again, so we would like to ask now.

I’m simplifying a little, but here is the issue.
In both cases, after a period of inactivity (about 2-3 days), we published a new transaction, and this new transaction removed files from previous transactions. Normally this new transaction should not touch any files from the previous transactions: they belong to two different parts of the project and have no files in common.

Here is the example from 2023: the list of transactions, with my comments:

(transaction0) S_Pipeline_develop_2023-10-19T01:32:08.938Z : untouched

(transaction1) N_I_Pipelines_2.6.0_2023-10-19T07:39:13.965Z : files were removed by transaction6
(transaction2) S_I_Pipelines_1.1.1_2023-10-20T07:36:27.356Z : files were removed by transaction6
(transaction3) S_I_Pipelines_1.1.2_2023-10-20T09:33:47.396Z : files were removed by transaction6
(transaction4) V_T_13.0.21_2023-10-20T10:57:31.291Z : files were removed by transaction6
(transaction5) V_I_Pipelines_13.0.21_2023-10-20T11:44:35.706Z : files were removed by transaction6

(transaction6) P_N_1.2.3_2023-10-23T06:06:54.518Z : removed the files of the 5 previous transactions

So when we published transaction6 on 23.10.2023, it removed all the files from the 5 previous transactions.
The diff of transaction6 shows that it removes these files, which it definitely should not do.
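Since each CI job only writes to its own part of the repository, one possible safeguard is to inspect the list of changes and refuse to publish (or at least raise an alert) when it removes anything outside the job’s own directory. The sketch below is a minimal illustration under stated assumptions: it takes the changes as already-parsed `(change_type, path)` pairs, and both that input format and the function name `unexpected_removals` are hypothetical, not part of `cvmfs_server`. Obtaining and parsing the actual diff is left open.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple

# Hypothetical input format: (change_type, path), e.g. ("removed", "/projA/v2/x").
Change = Tuple[str, str]


def unexpected_removals(changes: Iterable[Change], own_prefix: str) -> List[str]:
    """Return removed paths that lie outside the subtree this job may modify.

    own_prefix is the directory owned by the CI job (hypothetical layout).
    """
    prefix = PurePosixPath(own_prefix)
    bad = []
    for change_type, path in changes:
        if change_type != "removed":
            continue
        p = PurePosixPath(path)
        # Compare path components, not strings, so "/a/bc" is not treated
        # as being inside "/a/b".
        if p != prefix and prefix not in p.parents:
            bad.append(path)
    return bad


if __name__ == "__main__":
    # Hypothetical changes of a transaction that should only touch /projB/v1.
    pending = [
        ("added", "/projB/v1/bin/tool"),
        ("removed", "/projB/v1/old.txt"),
        ("removed", "/projA/v2/lib/x.so"),
    ]
    problems = unexpected_removals(pending, "/projB/v1")
    if problems:
        print("refusing to publish, unexpected removals:", problems)
```

Comparing path components through `PurePosixPath.parents` avoids the classic prefix bug of a plain `str.startswith` check.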

We think it could be linked to the automatic unmounting on the publishers.
Our suspicion is that:

  • for some reason, after some days of inactivity, our /cvmfs/euclid-dev.in2p3.fr directory was empty;
  • when we opened the new transaction, for some unknown reason it was remounted not at the latest state but at an earlier one (probably the state after transaction0);
  • so the files from transactions 1-5 were not in the mounted version of the repository, and when transaction6 was published, all these files were removed.
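The suspected mechanism can be illustrated with a toy model: if a transaction starts from an outdated snapshot, then every file added since that snapshot is missing from the state it publishes, so those files show up as removed. The sketch below models repository states as plain sets of paths; it illustrates the hypothesis only and does not reflect how CVMFS computes its diff internally. The function name `publish` and the file paths are hypothetical.

```python
from typing import Dict, Set


def publish(latest: Set[str], base: Set[str], added: Set[str]) -> Dict[str, Set[str]]:
    """Compare the state a transaction publishes against the latest published state.

    latest: files in the most recent published revision
    base:   files the transaction actually started from (older than `latest`
            if the remount picked up an earlier revision, per the hypothesis)
    added:  files written by the new transaction
    """
    new_state = base | added
    return {
        "added": new_state - latest,
        "removed": latest - new_state,
    }


if __name__ == "__main__":
    # Hypothetical file sets loosely mirroring the 2023 example.
    after_t0 = {"/S_Pipeline/develop/a"}
    after_t5 = after_t0 | {"/N_I_Pipelines/2.6.0/b", "/V_T/13.0.21/c"}
    t6_files = {"/P_N/1.2.3/d"}

    print(publish(after_t5, after_t5, t6_files))  # correct base: nothing removed
    print(publish(after_t5, after_t0, t6_files))  # stale base: t1-t5 files removed
```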

Do you think our idea is correct? Have you seen something similar before? Do you have any suggestions on what we could check or fix?

I understand that this issue is strange (and fortunately very rare). I’m not sure my explanation is clear; if it isn’t, I can try to rewrite it.

Thanks for the detailed report. Indeed, we’ve seen something like this before, and you should definitely update to 2.12.7. The problem and the fix are described in Pull Request #3771 on cvmfs/cvmfs (GitHub): “Fix for `cvmfs_server transaction -t`: update repository state after waiting for lease”, by vvolkl. We’ve only seen this for transactions that wait for a lease using the timeout option -t, but I think your bug could be similar. It will still take some debugging, though.

Do you happen to know which transaction ran on which publisher?

I think updating is the first step here. You mentioned you had some questions about the upgrade; do let me know if I can help.

Thank you very much for your help.

We don’t use the “-t” option to open transactions. Do you think we could still be affected by this bug, and that upgrading to 2.12.7 could help?

About upgrading to 2.12.7: we discussed it internally and identified two points.
The first one turns out not to be a problem. We still have some CVMFS publishers running CentOS that we need to keep for legacy reasons, but I checked and the cvmfs 2.12.7 package is available on these servers (via the cernvm repository), so this should be fine.

The second point may be more complicated.
We have a lot of clients and we are not sure it would be easy to upgrade all of them to 2.12.7.
Is cvmfs 2.12.7 compatible with older clients?
If our Stratum1 is migrated to 2.12.7, can clients running version 2.11.0 still use it?

We also have some “third-party” Stratum1 servers (not really third-party, but servers installed by other teams). Could these servers stay on 2.11.0 if we migrate our Stratum0 to 2.12.7?

We don’t use the “-t” option to open transactions. Do you think we could still be affected by this bug, and that upgrading to 2.12.7 could help?

That’s a very good question. The problem sounds very similar to the bug fixed in 2.12.7, but I’ll have to double-check the code paths taken when the transaction does not use a timeout. Could I still ask you to send me a cvmfs_config bugreport tarball from one of the publishers that seemed empty?

In general, the update helps a lot with debugging, since logging has been improved and certain bugs are eliminated automatically.

If our Stratum1 is migrated to 2.12.7, can clients running version 2.11.0 still use it?

Absolutely! It’s usually very safe to update to any production version (although we recommend a staged rollout). CVMFS is pretty good in terms of compatibility: even very old clients can read repositories written with the latest version, and vice versa. There are a few exceptions due to bugs (for example, between 2.9 and 2.10 the gateway and publishers needed to be updated in lockstep), but none apply in this case.

We also have some “third-party” Stratum1 servers (not really third-party, but servers installed by other teams). Could these servers stay on 2.11.0 if we migrate our Stratum0 to 2.12.7?

Also no problem.

Thank you very much.

I’ll definitely try to produce the cvmfs_config bugreport, but so far I haven’t been able to reproduce this behavior.
I may try to provoke a period of inactivity by putting the PreProd CI system into maintenance: even PreProd runs some automatic builds, which may explain why we could not reproduce the problem. I’ll keep you posted.
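While trying to reproduce the problem, one cheap check is to record, right before each transaction, the revision the publisher sees locally and compare it with the revision in the repository manifest (.cvmfspublished) on the Stratum0. In that manifest, each header line starts with a one-letter key, the S line holds the revision number, and the signature follows a line containing only --. The sketch below only parses the manifest bytes and compares numbers; the function names are hypothetical, and how the local revision is obtained is an assumption left open (a client mount exposes a revision extended attribute, but whether that reflects a publisher’s mount would need checking).

```python
from typing import Dict


def parse_manifest(raw: bytes) -> Dict[str, str]:
    """Parse the key/value header of a .cvmfspublished manifest.

    Each header line is a one-letter key followed by its value; the header
    ends at a line containing only "--", after which the signature follows.
    """
    fields: Dict[str, str] = {}
    for line in raw.split(b"\n"):
        if line == b"--":
            break  # stop before the (possibly binary) signature block
        if line:
            text = line.decode("ascii", errors="replace")
            fields[text[0]] = text[1:]
    return fields


def check_revision(raw_manifest: bytes, local_revision: int) -> None:
    """Raise if the locally seen revision lags behind the published one."""
    published = int(parse_manifest(raw_manifest)["S"])
    if local_revision < published:
        raise RuntimeError(
            f"stale mount: local revision {local_revision} < published {published}"
        )


if __name__ == "__main__":
    # Hypothetical manifest header; values are made up for illustration.
    manifest = b"Cabc123\nS42\nNeuclid-dev.in2p3.fr\n--\n" + b"\x00\xffsignature"
    check_revision(manifest, 42)  # up to date: no error
    try:
        check_revision(manifest, 40)
    except RuntimeError as err:
        print(err)
```

Running such a check in CI just before opening the transaction would show whether the publisher was looking at an older revision after a period of inactivity.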

As for 2.12.7, the compatibility with 2.11.0 is very good news, so I will immediately create a ticket in our internal system to plan the upgrade.