I recently ran into an unusual Linux bug while using RLIMIT_CPU to kill zombie processes that ran for longer than 15 seconds. The basic code I was using was:
    struct rlimit rl;
    memset(&rl, 0, sizeof(rl));   /* note: this also zeroes rl.rlim_max */

    /* Set a CPU limit of 15 seconds. */
    rl.rlim_cur = 15;
    setrlimit(RLIMIT_CPU, &rl);

    /* CPU time exceeded */
    signal(SIGXCPU, catchSignal);
This worked fine when I originally wrote it a few years ago, but I recently noticed that I was spawning a large number of zombie processes that were never receiving SIGXCPU. After tearing my hair out all day trying to work out why, I noticed that this was only occurring on my CentOS 6.5 development box (Linux version 2.6.32). It turns out that there was a bug in the implementation of setrlimit before 2.6.17 that caused an RLIMIT_CPU limit of 0 to be wrongly treated as "no limit" (like RLIM_INFINITY). Since 2.6.17, a limit of 0 is treated as a limit of 1 second. Because memset had zeroed rlim_max, I was now effectively setting rlim_max (the hard limit) lower than rlim_cur (the soft limit). setrlimit rejects a soft limit above the hard limit with EINVAL, and since I never checked the return value, the call would have failed silently and left the process with no CPU limit at all.
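Checking the return value of setrlimit would have made the failure obvious much sooner. Here is a minimal sketch that repeats the original mistake and reports what the kernel says about it. It assumes a Linux system, and the helper name try_zeroed_limit is my own invention for illustration, not part of any API:

```c
#include <errno.h>
#include <string.h>
#include <sys/resource.h>

/* Hypothetical helper: repeats the original mistake and returns the errno
 * reported by setrlimit (0 if the call unexpectedly succeeds). */
int try_zeroed_limit(void)
{
    struct rlimit rl;

    memset(&rl, 0, sizeof(rl));   /* rlim_cur = 0 and rlim_max = 0 */
    rl.rlim_cur = 15;             /* soft limit now exceeds the hard limit */

    errno = 0;
    if (setrlimit(RLIMIT_CPU, &rl) == -1)
        return errno;             /* EINVAL when rlim_cur > rlim_max */
    return 0;
}
```

Calling try_zeroed_limit() and printing strerror() of a non-zero result should show "Invalid argument" on an affected system, which points straight at the bad rlimit structure.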
The solution is really simple – just ensure that you set rlim_max (hard limit) as well as rlim_cur (soft limit).
    struct rlimit rl;

    /* Set a CPU soft limit of 15 seconds and a hard limit of 20 seconds. */
    rl.rlim_cur = 15;
    rl.rlim_max = 20;
    setrlimit(RLIMIT_CPU, &rl);

    /* CPU time exceeded */
    signal(SIGXCPU, catchSignal);