Run BreakIn

It can be difficult to tell whether a memory error is caused by hardware or by software. To help determine this, we suggest running the ACT Breakin utility to rule out software-related errors.

Run memtest86+

memtest86+ is a free utility that tests writing to and reading from the system's RAM. If your system does not already have memtest86+ as a boot option, you can add it in CentOS by doing the following:

$ yum install memtest86+
$ memtest-setup

This installs memtest86+ and runs the initial setup that adds it to the boot options in GRUB. When you are ready to run the test, reboot the machine and select the Memtest86+ entry from the GRUB boot menu.

Check system logs

Memory-related errors can show up in several places. The following logs are good starting points when scanning for them:

$ less /var/log/messages
$ cat /var/log/mcelog
$ dmesg
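These logs can be long, so it may help to filter them for likely memory-error messages first. The following is a minimal sketch: scan_mem_errors is a hypothetical helper name, and the keyword list is an assumption that will not catch every message format.

```shell
# Hypothetical helper: print only the lines that mention common
# memory-error keywords. The keyword list is an assumption and is
# not exhaustive; adjust it for your hardware and kernel.
scan_mem_errors() {
    # With file arguments grep reads those files; with none it reads stdin.
    grep -i -E 'edac|mce|machine check|ecc|memory error' "$@"
}

# Usage (run as root if the logs are not readable by your user):
#   scan_mem_errors /var/log/messages
#   dmesg | scan_mem_errors
```

A broad filter like this can also match unrelated lines, so treat the result as a starting point and read the surrounding log entries before drawing conclusions.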

If your DIMMs support ECC, the edac-util program can read information from the EDAC (Error Detection and Correction) drivers in the kernel, using the files these drivers export to record corrected and uncorrected errors. This can also help narrow down which DIMM the errors are coming from.

$ edac-util -v
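If edac-util is not installed, the per-controller totals can usually be read straight from the files the EDAC drivers export in sysfs. The sketch below assumes the standard /sys/devices/system/edac/mc layout; edac_counts is a hypothetical helper name, and its optional argument exists only so the loop can be pointed at a different directory.

```shell
# Hypothetical helper: print corrected (ce_count) and uncorrected
# (ue_count) error totals for each EDAC memory controller.
# Assumes the standard sysfs layout; pass another directory to override.
edac_counts() {
    root="${1:-/sys/devices/system/edac/mc}"
    for mc in "$root"/mc*; do
        # Skip quietly when no controller is present (no EDAC driver loaded).
        [ -r "$mc/ce_count" ] || continue
        echo "$(basename "$mc"): corrected=$(cat "$mc/ce_count") uncorrected=$(cat "$mc/ue_count")"
    done
}

edac_counts
```

If nothing is printed, no EDAC driver is loaded for the memory controller, which can mean the platform lacks ECC support or that the matching kernel module is missing.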

If you are unsure how to interpret any of the output from the utilities above, send it to support@advancedclustering.com and we will gladly look it over for you.