Publisher stuck in unabortable transaction

I need help recovering a publisher.

I was attempting to publish some changes (cvmfs_server publish), and received a warning about open files. I control-c-ed out of the publish (perhaps mistakenly) to find and close open files, and after closing all open files I found I got into a predicament I cannot seem to get out of.

cvmfs_server publish is failing with an invalid lease error
(dev-2:~# cvmfs_server publish
Using auto tag ‘generic-2023-07-28T22:01:29Z’
Processing changes…

Waiting for upload of files before committing…
Committing file catalogs…
Wait for all uploads to finish
SessionContext::DoUpload - error reply: {"reason":"invalid lease","status":"error"}
terminate called after throwing an instance of 'ECvmfsException'
what(): PANIC: /home/sftnight/jenkins/workspace/CvmfsFullBuildDocker/CVMFS_BUILD_ARCH/docker-x86_64/CVMFS_BUILD_PLATFORM/cc8/build/BUILD/cvmfs-2.10.1/cvmfs/session_context.cc : 416
SessionContext: could not submit payload. Aborting.
/bin/cvmfs_server: line 4105: 856489 Killed $user_shell "$sync_command"
Synchronization failed

Executed Command:
cvmfs_swissknife sync -u /cvmfs/hpcsw.umd.edu -s /var/spool/cvmfs/hpcsw.umd.edu/scratch/current -c /var/spool/cvmfs/hpcsw.umd.edu/rdonly -t /var/spool/cvmfs/hpcsw.umd.edu/tmp -b e13106755c57a71a0d01f502c23e6de4b3dc00dc -r gw,/srv/cvmfs/hpcsw.umd.edu/data/txn,http://fs-1.zaratan.umd.edu:4929/api/v1 -w http://fs-1.zaratan.umd.edu/cvmfs/hpcsw.umd.edu -o /var/spool/cvmfs/hpcsw.umd.edu/tmp/manifest -e sha1 -Z default -C /etc/cvmfs/repositories.d/hpcsw.umd.edu/trusted_certs -N hpcsw.umd.edu -K /etc/cvmfs/keys/hpcsw.umd.edu.pub -L -D generic-2023-07-28T22:01:29Z -H /etc/cvmfs/keys/hpcsw.umd.edu.gw -P /var/spool/cvmfs/hpcsw.umd.edu/session_token -f overlayfs -p -l 4194304 -a 8388608 -h 16777216 -i
)

cvmfs_server abort again fails due to an invalid lease.
(dev-2:~# cvmfs_server abort
You are about to DISCARD ALL CHANGES OF THE CURRENT TRANSACTION for hpcsw.umd.edu! Are you sure (y/N)? y
Error from gateway: ‘invalid lease’
gateway doesn’t recognize the lease or cannot drop it), even with -f flag
(dev-2:~# cvmfs_server abort -f
Error from gateway: ‘invalid lease’
force abort, continue despite error while trying to drop lease, removing session token. Error: gateway doesn’t recognize the lease or cannot drop it
hpcsw.umd.edu is not based on the newest published revision
umount: /cvmfs/hpcsw.umd.edu: target is busy.
Trying to unmount /cvmfs/hpcsw.umd.edu… fail
Trying to unmount /cvmfs/hpcsw.umd.edu… fail
)

cvmfs_server check won’t remount the filesystem because the repository is in a transaction, and just advises to abort
(dev-2:~# cvmfs_server check
hpcsw.umd.edu is not based on the newest published revision
Repository hpcsw.umd.edu is in a transaction and cannot be repaired.
→ Run cvmfs_server abort hpcsw.umd.edu to revert and repair.)

Any suggestions on how to recover from this?
Thanks in advance.

OK, I guess the resolution suggested in the thread “Not able to abort a transaction after a cvms_receiver crash” worked in this case too.

Waiting a while for the lease to expire and then retrying the abort appears to have worked.
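
For anyone who wants to script the wait, here is a minimal sketch of that retry loop. The function name retry_abort and the MAX_TRIES and SLEEP_SECS values are made up for illustration, and it assumes cvmfs_server abort exits non-zero while the gateway still rejects the lease, which nothing in this thread confirms. Note that abort -f skips the confirmation prompt and discards all changes in the open transaction.

```shell
#!/bin/sh
# Sketch only: retry the abort until it stops failing.
# REPO, MAX_TRIES and SLEEP_SECS are illustrative defaults, not recommended values.
REPO="${REPO:-hpcsw.umd.edu}"
MAX_TRIES="${MAX_TRIES:-10}"
SLEEP_SECS="${SLEEP_SECS:-60}"

retry_abort() {
    tries=0
    while [ "$tries" -lt "$MAX_TRIES" ]; do
        # -f aborts without asking; ALL changes of the transaction are discarded.
        # Assumption: a non-zero exit status means the abort did not go through.
        if cvmfs_server abort -f "$REPO"; then
            echo "abort succeeded after $tries retries"
            return 0
        fi
        tries=$((tries + 1))
        # Give the stale lease time to expire on the gateway before trying again.
        sleep "$SLEEP_SECS"
    done
    echo "abort still failing after $MAX_TRIES attempts" >&2
    return 1
}
```

Calling retry_abort on the publisher then keeps trying until the abort goes through or the attempt limit is reached.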

I was not sure if it would expire because the URL http://SERVER:PORT/api/v1/leases was not showing any active leases.
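
For reference, a small sketch of querying that URL from the command line. GATEWAY_HOST and GATEWAY_PORT are placeholders and the helper names leases_url and show_leases are hypothetical. As happened here, an empty listing does not necessarily mean the publisher's stale lease is already gone.

```shell
#!/bin/sh
# Sketch only: print whatever the gateway returns at the lease listing URL.
# GATEWAY_HOST and GATEWAY_PORT are placeholders; substitute your own values.
GATEWAY_HOST="${GATEWAY_HOST:-SERVER}"
GATEWAY_PORT="${GATEWAY_PORT:-PORT}"

leases_url() {
    # Build the URL in the form quoted in this thread.
    printf 'http://%s:%s/api/v1/leases\n' "$GATEWAY_HOST" "$GATEWAY_PORT"
}

show_leases() {
    # -s: no progress meter, -S: still print errors, -f: fail on HTTP errors
    curl -sSf "$(leases_url)"
}
```

With the two variables set for your gateway, show_leases prints the raw reply so you can see which leases, if any, it reports.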

Hopefully this will prove useful to someone else.

control-c-ed out of the publish

Yeah, I think that could lead to problems. We had quite a headache one time when an analyst canceled an abort operation with Ctrl-C, so the transaction failed to abort properly. I think if you interrupt an operation partway through, you can be left in an unclean state that may require manual cleanup.

Just as a side note: in the upcoming release abort -f will become more resilient.
(though sadly it will still not be possible for the publisher to delete an “invalid” lease on the gateway)
