Recently I helped troubleshoot a Red Hat Cluster issue.
When we ran 'service cman start', it returned 'FAILED', and in /var/log/messages I found an error similar to the following:
kernel:oddjobd : segfault at 8 ip 00000034d1e0b393 sp 00007fff83540f80 error 4 in libc-2.12.so[7fcfc7726000+18a000]
Note: I could not easily access the customer's system, so the message above is not the exact one but a similar example.
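The fields of that kernel line say more than just "libc". Below is a minimal Python sketch that decodes them. It assumes the x86/x86-64 page-fault error-code layout, and the helper names (`parse_segfault`, `decode_error_code`) are hypothetical. Because the message above is only a similar example, the decoded values illustrate the method rather than the customer's real crash. For "error 4", bit 2 is set and bits 0 and 1 are clear, which means a user-mode read of a page that was not mapped. A fault address of 8 points at a NULL pointer plus a small offset.

```python
import re
from typing import Optional

# Fields of an x86 kernel segfault line, e.g.
# "oddjobd[1234]: segfault at 8 ip ... sp ... error 4 in libc-2.12.so[base+size]"
# The "[pid]" part is optional because the sample in this post omits it.
SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\s\[:]+)\s*(?:\[\d+\])?\s*:\s*segfault at (?P<addr>[0-9a-f]+)"
    r" ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_error_code(code: int) -> dict:
    """Decode the number after 'error' (assumption: x86 page-fault error-code bits)."""
    return {
        "protection_violation": bool(code & 0x1),  # 0 means the page was not present
        "write": bool(code & 0x2),                 # 0 means a read access
        "user_mode": bool(code & 0x4),             # 0 means the fault happened in kernel mode
        "instruction_fetch": bool(code & 0x10),    # set when the CPU was fetching code
    }


def parse_segfault(line: str) -> Optional[dict]:
    """Return the decoded fields of a segfault log line, or None if it is not one."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    fault_addr = int(m.group("addr"), 16)
    return {
        "comm": m.group("comm"),
        "fault_addr": fault_addr,
        # An address inside the first page usually means a NULL pointer plus a small offset.
        "near_null": fault_addr < 4096,
        "ip": int(m.group("ip"), 16),
        "error": decode_error_code(int(m.group("err"), 16)),
        "object": m.group("obj"),  # None if the line has no "in <file>[...]" part
    }


if __name__ == "__main__":
    sample = ("kernel:oddjobd : segfault at 8 ip 00000034d1e0b393 sp 00007fff83540f80 "
              "error 4 in libc-2.12.so[7fcfc7726000+18a000]")
    print(parse_segfault(sample))
```

For the sample line, the result points to a near-NULL read inside libc made by oddjobd, not a permissions problem. That kind of fault fits "a library returned something unexpected" better than "the cluster configuration is wrong".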
At first I found that HA-LVM was not configured correctly, so I suspected that was why cman failed to start. We scheduled a downtime and I fixed the LVM issue, but we still got the same error and could not start the cluster.
Since the error message mentioned libc, I guessed the problem was related to the shared library files. I tried strace to get more information about 'service cman start', but it did not turn up anything useful. When I searched the Red Hat website for the keywords 'error 4 in libc-2.12.so', I found a similar issue: Why yum segfaults in Red Hat Enterprise Linux 6?
I ran 'ls -lrt /etc/ld.so.conf*' and found that a new zlib library config file had been added in April. The library cache had last been regenerated on Oct 8, the day the cluster started failing. I also found that oddjobd is an executable, so I ran 'ldd /usr/sbin/oddjobd' to check which libraries it loads. oddjobd was linked against the new zlib file, something like '/usr/local/lib/libz.so.1', instead of '/lib64/libz.so.1'.
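Reading ldd output by eye works for one binary. To check several cluster daemons at once, a small script can flag libraries that resolve outside the usual system directories. Below is a minimal sketch under a few assumptions:
- It needs Python 3.7+. RHEL 6 itself ships Python 2.6, so you can instead save the ldd output and parse it on another machine.
- The `SYSTEM_DIRS` list is an assumption for a 64-bit host.
- The script name `check_ldd.py` and its function names are hypothetical.

```python
import subprocess
import sys

# Assumption: the "system" library directories on a 64-bit RHEL host.
SYSTEM_DIRS = ("/lib64/", "/usr/lib64/", "/lib/", "/usr/lib/")


def parse_ldd(output: str) -> dict:
    """Map each library name in ldd output to its resolved path (None if there is no path)."""
    resolved = {}
    for line in output.splitlines():
        line = line.strip()
        if "=>" not in line:
            continue  # e.g. the dynamic loader line "/lib64/ld-linux-x86-64.so.2 (0x...)"
        name, target = (part.strip() for part in line.split("=>", 1))
        first = target.split()[0] if target else ""
        # "not found" and the vDSO entry ("=>  (0x...)") have no filesystem path.
        resolved[name] = first if first.startswith("/") else None
    return resolved


def non_system_libraries(output: str, system_dirs=SYSTEM_DIRS) -> dict:
    """Return only the libraries that resolve outside the system directories."""
    return {
        name: path
        for name, path in parse_ldd(output).items()
        if path is not None and not path.startswith(system_dirs)
    }


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 check_ldd.py /usr/sbin/oddjobd [more binaries...]
    for binary in sys.argv[1:]:
        out = subprocess.run(["ldd", binary], capture_output=True, text=True, check=True).stdout
        for name, path in non_system_libraries(out).items():
            print(f"{binary}: {name} -> {path}")
```

In this case, a line such as "libz.so.1 -> /usr/local/lib/libz.so.1" would be the warning sign. A binary that resolves only to /lib64 produces no output.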
I moved the zlib library config file out of /etc/ld.so.conf.d, ran ldconfig to regenerate the library cache, and then ran 'ldd /usr/sbin/oddjobd' to confirm that the default zlib library was linked again.
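Before moving a file out of /etc/ld.so.conf.d, it helps to see which config file adds the unexpected directory. Below is a minimal Python 3 sketch under a few assumptions:
- Each config file lists one directory per line.
- `include` lines are skipped.
- The `SYSTEM_DIRS` tuple and the function names are hypothetical.

A flagged file is only a candidate to review, because many packages legitimately add their own library directories.

```python
import glob
import os
import sys

# Assumption: directories treated as standard system library locations.
SYSTEM_DIRS = ("/lib", "/lib64", "/usr/lib", "/usr/lib64")


def conf_entries(conf_dir: str = "/etc/ld.so.conf.d") -> dict:
    """Map each *.conf file in conf_dir to the library directories it lists."""
    entries = {}
    for path in sorted(glob.glob(os.path.join(conf_dir, "*.conf"))):
        dirs = []
        with open(path) as fh:
            for raw in fh:
                line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
                if line and not line.startswith("include "):
                    dirs.append(line.rstrip("/"))
        entries[os.path.basename(path)] = dirs
    return entries


def suspicious_confs(conf_dir: str = "/etc/ld.so.conf.d") -> dict:
    """Return only the files that add directories outside SYSTEM_DIRS."""
    return {
        name: [d for d in dirs if d not in SYSTEM_DIRS]
        for name, dirs in conf_entries(conf_dir).items()
        if any(d not in SYSTEM_DIRS for d in dirs)
    }


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/etc/ld.so.conf.d"
    for name, dirs in suspicious_confs(target).items():
        print(f"{name}: {', '.join(dirs)}")
```

The script only identifies candidates. The fix itself stays manual: move the file aside, run ldconfig, and recheck the binary with ldd.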
Finally, I ran 'service cman start' and the error no longer appeared.
In summary, the bad zlib library config file was introduced in April, but it only took effect when the server was rebooted on Oct 8, because the library cache is regenerated automatically during reboot. After that reboot, the customer could no longer start the cluster.
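This timeline suggests a cheap proactive check. A config file that is newer than /etc/ld.so.cache is a change that is not active yet. It will take effect the next time ldconfig runs, for example at the reboot described above. Below is a minimal Python 3 sketch under a few assumptions:
- It uses the default paths.
- The function name `pending_conf_changes` is hypothetical.
- It only compares modification times, so it cannot detect deleted config files.
- It stops reporting a change as soon as anything rebuilds the cache.

```python
import glob
import os


def pending_conf_changes(cache: str = "/etc/ld.so.cache",
                         conf_glob: str = "/etc/ld.so.conf.d/*.conf",
                         main_conf: str = "/etc/ld.so.conf") -> list:
    """List config files modified after the library cache was last written (mtime comparison only)."""
    cache_mtime = os.path.getmtime(cache)
    candidates = [main_conf] + glob.glob(conf_glob)
    return sorted(
        path for path in candidates
        if os.path.exists(path) and os.path.getmtime(path) > cache_mtime
    )


if __name__ == "__main__":
    # Skip quietly on hosts without a glibc library cache.
    if os.path.exists("/etc/ld.so.cache"):
        for path in pending_conf_changes():
            print("changed since the cache was last built:", path)
```

Run before a planned reboot, a non-empty result means the next ldconfig will change which libraries get loaded. That is the moment to check binaries such as oddjobd with ldd.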