Checking and Clearing InfiniBand Errors

An easy way to check for errors across your entire cluster IB network is to run the command ‘ibcheckerrors’. If you need to install it, run ‘dnf install infiniband-diags-compat’. The infiniband-diags package also provides the command ‘ibqueryerrors’, which replaces the deprecated ‘ibcheckerrors’ command.

This prints any errors found, which can range from a port being down (even one that was just unplugged temporarily) to transmission errors. After troubleshooting the errors you find, you can clear the error counters with the command ‘ibclearerrors’. That command is deprecated as well; per Red Hat, use ‘ibqueryerrors -k’ instead of ‘ibclearerrors’ and ‘ibqueryerrors -K’ instead of ‘ibclearcounters’.
(Note: most IB errors can be resolved by reseating the cables on both ends for any ports that showed errors.)
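
The check-then-clear workflow above can be sketched as a small shell helper. This is a minimal sketch, not a definitive script: the function name pick_ib_error_tool is hypothetical, and it assumes the tools, when installed, are on your PATH. It prefers ‘ibqueryerrors’ and falls back to the deprecated ‘ibcheckerrors’ only if the newer tool is missing.

```shell
# pick_ib_error_tool is a hypothetical helper name, not part of infiniband-diags.
# It prints the name of the error-reporting tool that is available on PATH.
pick_ib_error_tool() {
    if command -v ibqueryerrors >/dev/null 2>&1; then
        echo ibqueryerrors
    elif command -v ibcheckerrors >/dev/null 2>&1; then
        # Deprecated tool, used only as a fallback.
        echo ibcheckerrors
    else
        echo "no IB diagnostics found; try: dnf install infiniband-diags" >&2
        return 1
    fi
}

# Typical use on a node attached to the fabric:
#   tool=$(pick_ib_error_tool) && "$tool"   # report errors
#   ibqueryerrors -k                        # replaces ibclearerrors
#   ibqueryerrors -K                        # replaces ibclearcounters
```

Run the report first, fix or reseat what it points at, and only then clear the counters, so that a later run shows only new errors.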
 
The output of ‘ibcheckerrors’ can be confusing when you’re trying to determine which physical ports the errors are happening on. A good way to match the ‘lid’ reported with an error to a physical port is to use the tool ‘ibnetdiscover’, which prints the layout of your entire active IB network.
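
One way to narrow the ‘ibnetdiscover’ output down to a single ‘lid’ is to filter it with grep. The sketch below assumes that topology lines carry the text "lid N", as they typically do in ‘ibnetdiscover’ output. The helper name lines_for_lid is hypothetical.

```shell
# lines_for_lid is a hypothetical helper, not part of infiniband-diags.
# It reads topology text on stdin and keeps only the lines that mention
# the given lid as a whole word, so that lid 4 does not also match lid 14.
lines_for_lid() {
    grep -w -- "lid $1"
}

# Typical use on a node attached to the fabric:
#   ibnetdiscover | lines_for_lid 4
```

The matching lines name the node and port, which you can then trace to the physical cable.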
