.cvmfswhitelist expiry monitoring with S3


We need automated monitoring (e.g. a cron job) on a stratum 0 that raises an alert, triggered by a syslog message written with e.g. /usr/bin/logger, if any .cvmfswhitelist files have not been re-signed recently.

For conventional storage we have a simple solution:

00 10 * * * cvmfs /usr/bin/find /srv/cvmfs/*/.cvmfswhitelist -mtime +12 -exec /usr/bin/logger -t cvmfs-whitelist-check "CVMFS whitelist {} is about to expire" \;

This cron job checks all repositories for any .cvmfswhitelist files that are older than 12 days and avoids needing to do any date comparisons.

On S3 it is not so simple, but I think I have an option worked out:

aws --endpoint-url=https://s3.storage s3api list-objects-v2 --bucket cvmfs-s3-test.dev.computecanada.ca --prefix "s3-test.dev.computecanada.ca/.cvmfswhitelist" --query "Contents[?LastModified > '`date --date '-2 weeks' +%F`']" | jq -e '.[0]' &> /dev/null || logger -t cvmfs-whitelist-check "CVMFS whitelist for s3-test.dev.computecanada.ca is about to expire"

The s3api query returns either an empty list or a list with one item, depending on whether the file modification date is within the last 2 weeks. Then jq -e tries to access the first item in the list and exits with 0 or 1 depending on whether it exists. Unfortunately I think it is only possible to check each repository individually, so I would need to put it in a for loop, and the cron job would need to be provided with a list of all repositories. That list would have to be updated whenever new repos are added, which is a disadvantage.
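For illustration, the loop could look roughly like this. The repository names are placeholders, and the actual aws call (the same pipeline as above) is left commented out so the skeleton just prints what it would check:

```shell
#!/bin/bash
# Sketch of a per-repository whitelist check; the repos list is a placeholder
# and would need to be kept in sync with the real set of repositories.
repos="repo1.example.org repo2.example.org"
cutoff=$(date --date '-2 weeks' +%F)
for repo in $repos; do
  echo "checking $repo for .cvmfswhitelist older than $cutoff"
  # aws --endpoint-url=https://s3.storage s3api list-objects-v2 \
  #     --bucket "cvmfs-$repo" --prefix "$repo/.cvmfswhitelist" \
  #     --query "Contents[?LastModified > '$cutoff']" \
  #   | jq -e '.[0]' &> /dev/null \
  #   || logger -t cvmfs-whitelist-check "CVMFS whitelist for $repo is about to expire"
done
```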

The only other option I could think of is to curl the .cvmfswhitelist files in the repos and pipe to head -n 1 (which I infer gives the signing date). It is nice not to require S3 credentials, but doing the date comparison in bash doesn’t seem as clean. Not sure which way would be most robust.
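The bash date math isn’t too bad, though. A sketch, assuming (as inferred above) that the first line of .cvmfswhitelist is the signing timestamp in YYYYMMDDHHMMSS form (requires GNU date; the URL in the usage comment is a placeholder):

```shell
# to_epoch: convert a YYYYMMDDHHMMSS whitelist timestamp to seconds since
# the epoch (GNU date syntax)
to_epoch() {
  local s=$1
  date -u -d "${s:0:8} ${s:8:2}:${s:10:2}:${s:12:2}" +%s
}

# days_between A B: whole days elapsed from timestamp A to timestamp B
days_between() {
  echo $(( ($(to_epoch "$2") - $(to_epoch "$1")) / 86400 ))
}

# Usage sketch (URL is a placeholder):
#   signed=$(curl -s http://host/repo.example.org/.cvmfswhitelist | head -n 1)
#   age=$(days_between "$signed" "$(date -u +%Y%m%d%H%M%S)")
#   [ "$age" -gt 12 ] && logger -t cvmfs-whitelist-check "whitelist is about to expire"
```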

Looking further, rclone ls with the --max-age option might be a bit simpler than s3api.
(update, doesn’t work: “Can’t limit to single files when using filters”)

Does anyone have suggestions, improvements or other solutions for this?


Maybe servermon with an external URL (?), but

My suggestion would be to look at the whitelist itself because the age of the file is only an indication of the actual signing date. Perhaps @ebocchi has more.

Hi @rptaylor,
I agree with Jakob: Checking the age of the file is not necessarily reliable (you may need to re-push old whitelists, and this would deceive your check).
What we do is to GET the whitelist from S3, parse it, and use the line starting with E, which encodes the expiry date of the whitelist. Then, yes, we do a date comparison and trigger a collectd-based alarm if the whitelist is going to expire in less than a given number of days.
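In shell terms, the parsing step might look roughly like this (a sketch only; it assumes the E line carries the expiry as YYYYMMDDHHMMSS, and the URL in the usage comment is a placeholder):

```shell
# whitelist_expiry: read a whitelist on stdin and print the expiry timestamp
# from the line starting with 'E' (assumed to be E followed by YYYYMMDDHHMMSS);
# -a treats the trailing binary signature block as text
whitelist_expiry() {
  grep -a -m1 '^E' | cut -c2-
}

# Usage sketch (URL is a placeholder):
#   expiry=$(curl -s http://host/repo.example.org/.cvmfswhitelist | whitelist_expiry)
#   then compare against "date -u +%Y%m%d%H%M%S" and alarm if the remaining
#   time is below the chosen number of days
```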

Maybe servermon with an external URL (?), but

But what? That would be a good place to put it to benefit lots of sites instead of a single site. A pull request would be welcomed.

Do you have a script (python, bash?) for that, Enrico?

Yes, but the repo is private. Let me mail it to you.

cvmfs-servermon-1.19, which is now in the cvmfs-contrib-testing repo, flags a WARNING when a .cvmfswhitelist is less than 48 hours (by default) from expiring, and marks it CRITICAL if it has expired. I plan to wait until after the holidays to promote it to cvmfs-contrib, but feel free to test it before then.

The new version also adds a WARNING when geodb databases are more than 30 days old, and another WARNING for repositories that fail cvmfs_server check -a.


Great, that will be very useful, thanks Dave!
We already have automated monitoring based on servermon so this will fit in nicely.

I tried it on an EL8 dev server, but there were errors:

{"CRITICAL": {"whitelist": [{"msg": "startswith first arg must be bytes or a "
                                    "tuple of bytes, not str",
                             "repo": "containers.dev.computecanada.ca"},
                            {"msg": "startswith first arg must be bytes or a "
                                    "tuple of bytes, not str",
                             "repo": "cvmfs-config.dev.computecanada.ca"},
                            {"msg": "startswith first arg must be bytes or a "
                                    "tuple of bytes, not str",
                             "repo": "data.dev.computecanada.ca"},
                            {"msg": "startswith first arg must be bytes or a "
                                    "tuple of bytes, not str",
                             "repo": "restricted.dev.computecanada.ca"},
                            {"msg": "startswith first arg must be bytes or a "
                                    "tuple of bytes, not str",
                             "repo": "soft-dev.computecanada.ca"},
                            {"msg": "startswith first arg must be bytes or a "
                                    "tuple of bytes, not str",
                             "repo": "test.dev.computecanada.ca"}]},

Also it looks like only one repository (which happens to be the most recently created one) reports the geodb status, I guess because there is only one geodb?

Thanks for the testing, Ryan. I believe I have fixed the startswith problem in version 1.20 now also in cvmfs-contrib-testing, although I don’t currently have an el8 test web service so I decided to rely on you to test it there. Please try it and let me know if it fixes the problem.

Yes the geo test is only done on one repo because there’s only one geodb.

Seems to work with the new version.


I tried the servermon external URL feature, but it seems to rely on the existence of /cvmfs/info/v1/repositories.json. So it seems this would not work with S3 storage, because in that case only the repository directory exists; there is no /cvmfs/info.

Yes it does depend on repositories.json. I guess you could create it by hand or have some other mechanism for generating it.
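For reference, a hand-made repositories.json might look roughly like this (structure assumed from what cvmfs_server generates on a conventional stratum 0; the repository name and URL are placeholders):

```json
{
  "schema": 1,
  "repositories": [
    {"name": "repo1.example.org", "url": "/cvmfs/repo1.example.org"}
  ],
  "replicas": []
}
```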

Or, you could create a little stratum 1 for monitoring purposes. There is a feature for marking a stratum 1 repository as “pass-through” to an S3 for monitoring purposes. See CVM-1845 and let us know if that’s helpful for you.

I ended up making a python script:


import sys
import requests
import datetime
import syslog
from urllib.parse import urlparse

# Explicitly set the program name (avoids a leading slash showing up)
syslog.openlog('cvmfs-whitelist-check')

try:
  url = sys.argv[1]
except Exception as e:
  syslog.syslog(syslog.LOG_ERR, f"CVMFS whitelist check failed with undefined URL, {type(e).__name__}: {e}")
  raise e

try:
  parsed_url = urlparse(url)
  assert all([parsed_url.scheme, parsed_url.netloc]), f"URL {url} is not valid"
  r = requests.get(url)
  assert r.status_code == 200, "HTTP request did not succeed"

  i = r.iter_lines()
  signature_line = next(i)
  expiry_line = next(i)
  expiry_line = expiry_line.decode('utf-8')
  # Strip off leading 'E'
  if expiry_line[0] == 'E':
    expiry_date = expiry_line[1:]
  else:
    raise Exception("Expiry date is not in expected format.")

  expiry_date = datetime.datetime.strptime(expiry_date, '%Y%m%d%H%M%S')
  syslog.syslog(syslog.LOG_INFO, f'Signature in { url } expires on {expiry_date}')
  remaining = expiry_date - datetime.datetime.utcnow()
  if remaining.days < 12:
    syslog.syslog(syslog.LOG_ERR, f"CVMFS whitelist { url } is about to expire in { remaining.days } days")

except Exception as e:
  syslog.syslog(syslog.LOG_ERR, f"CVMFS whitelist check failed for { url }, {type(e).__name__}: {e}")
  raise e

Doesn’t that miss out on all the other things that cvmfs-servermon is monitoring for?

Yes, it would be nice to have both. However, everything is defined in Ansible and automatically managed (including creation of a new repository), and we try to minimize the need for manual operations as much as possible. Also, I don’t want to deal with Ansible Vault, which would be needed to encrypt the S3 credentials in order to upload a file to S3 as part of the Ansible play, and which would add a lot of complexity in the context of how we run Ansible.
I suppose we could make something like repositories.json on the local server, describing the remote repository, if there were a way to make that work. Or it would be nice if we could define a server alias, and all the repos hosted on it, in api.conf.

My intention with the python script was to generalize/replace the find cron job I mentioned before, which we use as another avenue of monitoring/alerting in addition to cvmfs-servermon.