Error checking
Sat Apr 7 16:12:40 UTC 2012
At a previous workplace, an HP-UX machine running Sendmail suddenly started beeping regularly, at intervals of a few minutes. A closer look showed that the X server was no longer starting up properly.
Yes, I know, an X server is not supposed to run on a mail server. But that is not the problem I am talking about here.
Further examination showed that the X server was failing to start because of wrong permissions in /etc: everything there was suddenly owned by the group mail and had 664 permissions. How could this have happened?
The culprit was easily found: a cronjob along the lines of
13 * * * * cd /var/spool/mail; find . -type f -exec chgrp mail {} \; -exec chmod 664 {} \;
At some point, the remote mount of /var/spool/mail was down, making the cd command fail. However, the cron job simply carried on and ran that nasty find command all over the root file system.
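The failure mode is easy to reproduce harmlessly. The following is a minimal sketch, assuming a hypothetical path /nonexistent/spool in place of the unavailable mount and a plain pwd in place of the find command:

```shell
#!/bin/sh
# /nonexistent/spool is a hypothetical stand-in for the unavailable mount.
before=$(pwd)

# Same shape as the cron job: commands separated by ';' run regardless
# of whether the previous command succeeded.
cd /nonexistent/spool 2>/dev/null; after=$(pwd)

echo "before cd:       $before"
echo "after failed cd: $after"
```

Both lines print the same directory: the failed cd leaves the shell where it was, and whatever follows the semicolon runs there. In the incident above, that directory was evidently /.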
We were lucky that, because of the same outage, the server could not reach the user home directories either...
The Solution
First of all: don't use such weird hacks to work around a setup problem in, for example, Sendmail. These hacks never help. In this case, the hack was only necessary because of a bug in the procmail setup. Typically, procmail gains root privileges through the setuid bit, then switches to the target user's context in order to run commands from that user's .procmailrc file. For some reason, this mechanism was not working on the system, and the previous admins thought it a good idea to install such a cron job to work around the problem. As a side effect, any user had full access to any other user's mail, simply by putting the right commands in their .procmailrc.
This specific problem, however, stemmed from carelessness when writing shell scripts. One should never run such a critical command on anything it is not meant for. In this case, that means one should either have passed an explicit path to find:
13 * * * * find /var/spool/mail -type f -exec chgrp mail {} \; -exec chmod 664 {} \;
Or one should check the status of the cd command and abort on failure:
13 * * * * cd /var/spool/mail && find . -type f -exec chgrp mail {} \; -exec chmod 664 {} \;
Or one could get into the habit of using set -e, which makes POSIX shells abort automatically on errors:
13 * * * * set -e; cd /var/spool/mail; find . -type f -exec chgrp mail {} \; -exec chmod 664 {} \;
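The effect of the last two variants can be checked without touching any real files. This is only a sketch: /nonexistent/spool is a hypothetical stand-in for the unavailable mount, and echo stands in for the dangerous find command. Each variant runs in its own subshell so that the results can be compared side by side:

```shell
#!/bin/sh
# /nonexistent/spool is a hypothetical stand-in for the unavailable mount;
# 'echo' takes the place of the find command.

# '&&' runs the right-hand side only if cd succeeded.
out_and=$(cd /nonexistent/spool 2>/dev/null && echo "find would run")

# 'set -e' makes the subshell exit at the first failing command.
out_sete=$(set -e; cd /nonexistent/spool 2>/dev/null; echo "find would run")

# The unchecked original for comparison: ';' ignores the failure.
out_plain=$(cd /nonexistent/spool 2>/dev/null; echo "find would run")

echo "with &&:     '$out_and'"
echo "with set -e: '$out_sete'"
echo "unchecked:   '$out_plain'"
```

Only the unchecked variant gets as far as the echo; the other two produce no output because they stop at the failed cd.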
Any of these measures would have prevented the catastrophic failure that in the end led to the system being reinstalled in order to restore the permissions to normal.
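If such a job ever grows beyond a one-liner, the same checks could move into a small script. The sketch below is not taken from the original setup: fix_spool_perms is a hypothetical helper that takes the directory and the group as arguments, so that nothing depends on the current working directory.

```shell
#!/bin/sh
# fix_spool_perms is a hypothetical helper, not part of the original setup.
# Usage: fix_spool_perms DIRECTORY GROUP
fix_spool_perms() {
    dir=$1
    group=$2

    # Bail out unless the target really is a directory we can enter.
    if [ ! -d "$dir" ] || [ ! -x "$dir" ]; then
        echo "fix_spool_perms: $dir is not accessible, aborting" >&2
        return 1
    fi

    # Explicit path: find never falls back to the current directory.
    find "$dir" -type f -exec chgrp "$group" {} \; -exec chmod 664 {} \;
}
```

Called as fix_spool_perms /var/spool/mail mail, it would do the same work as the cron job above, but return a non-zero status without touching anything when the directory is not accessible.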
Lesson learned: check for error conditions, and bail out to prevent worse things from happening.