setrlimit not raising a signal with SIGXCPU

I ran into a unusual linux bug of late using RLIMIT_CPU to kill zombie processes that ran for longer than 15 seconds. The basic code I was using was:

struct rlimit rl;
memset(&rl, 0, sizeof(rl));
/* Set a CPU limit of 15 second. */
rl.rlim_cur = 15;
setrlimit(RLIMIT_CPU, &rl);
/* CPU Time exceeded */
signal(SIGXCPU, catchSignal);

This worked fine when I original wrote it a few years ago, but I noticed of late that I was spawning a large number of zombie processes that were not being trapped by SIGXCPU. After tearing my hair out all day trying to work out why, I noticed that this was only occurring on my CentOS 6.5 development box (Linux version 2.6.32). It turns out that there was a bug in the implementation of setrlimit before 2.6.17 that led a RLIMIT_CPU limit of 0 to be wrongly treated as “no limit” (like RLIM_INFINITY). Since 2.6.17 this is now treated as a limit of 1 second. Since I was setting rlim_max to 0 via memset meant that I was now effectively setting rlim_max to less than rlim_cur.

The solution is really simple – just ensure that you set rlim_max (hard limit) as well as rlim_cur (soft limit).

struct rlimit rl;
/* Set a CPU soft limit of 15 second and hard limit of 20 seconds */
rl.rlim_cur = 15;
rl.rlim_max = 20;
setrlimit(RLIMIT_CPU, &rl);
/* CPU Time exceeded */
signal(SIGXCPU, catchSignal);

Leave a Reply