
Repairing a corrupted SGE database

Note: Before running these steps, confirm why sgemaster is failing to start. The qmaster log, located at /act/sge/default/spool/qmaster/messages, should show some indication of database corruption. A typical corruption error looks like this:

CODE
03/07/2015 17:34:07| main|head|E|couldn't open berkeley database "sge": (22) Invalid argument
03/07/2015 17:34:07| main|head|E|startup of rule "default rule" in context "berkeleydb spooling" failed
03/07/2015 17:34:07| main|head|C|setup failed

or

CODE
03/12/2015 13:07:08| main|head|E|couldn't open database environment for server "local spooling", directory "/act/sge/default/spool/spooldb": (-30974) DB_RUNRECOVERY: Fatal error, run database recovery
03/12/2015 13:07:08| main|head|E|startup of rule "default rule" in context "berkeleydb spooling" failed
03/12/2015 13:07:08| main|head|C|setup failed
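
A quick way to check for these signatures before attempting any repair is to search the qmaster messages file. The sketch below is only an illustration: the function names `spool_errors` and `corrupt_db_names` are hypothetical, and the patterns are taken from the two example errors above, so other corruption messages may not match.

```shell
#!/bin/sh
# Hypothetical helpers; the patterns come from the example errors above.

# Print log lines that look like Berkeley DB spooling corruption.
# Usage: spool_errors [LOGFILE]
spool_errors() {
    grep -E 'open berkeley database|DB_RUNRECOVERY' \
        "${1:-/act/sge/default/spool/qmaster/messages}"
}

# Print the unique database names quoted in "couldn't open berkeley
# database" errors, one per line.
# Usage: corrupt_db_names [LOGFILE]
corrupt_db_names() {
    spool_errors "$@" |
        sed -n 's/.*open berkeley database "\([^"]*\)".*/\1/p' |
        sort -u
}
```

With no argument, both functions read the log path given in this article. If `spool_errors` prints nothing, the startup failure probably has a different cause. A DB_RUNRECOVERY line names the database environment rather than a single database, so `corrupt_db_names` prints nothing for it.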

If your filesystem fills up or the system crashes at the wrong time, your SGE database may become corrupted. Take note of the database named in the error. In the first example above, "sge" is the corrupted database. The following steps can usually repair the "sge" database so that SGE runs properly again. The same steps also work for the "sge_job" database; substitute that name for "sge" throughout.

CODE
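cd "$SGE_ROOT/default/spool"
# Keep a full copy of the spool database before changing anything
cp -a spooldb spooldb.bak
cd spooldb
# Check the database; a corrupted database is expected to report errors here
db_verify sge
# Run Berkeley DB recovery on the environment in the current directory
db_recover
# Dump the database contents to a text file
db_dump -f sge.out sge
# Move the damaged database aside and rebuild it from the dump
mv sge sge.old
db_load -f sge.out sge
# Confirm that the rebuilt database now verifies cleanly
db_verify sge
# Restore ownership to the SGE admin user and its login group
chown -R sgeadmin: "$SGE_ROOT/default/spool"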
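
Because the same sequence applies to both "sge" and "sge_job", the steps can be wrapped in a function that takes the database name as a parameter. This is a minimal sketch under stated assumptions, not an official SGE tool: the function name `repair_spool_db` is hypothetical, and the guard against an existing spooldb.bak is an addition to the manual steps.

```shell
#!/bin/sh
# Hypothetical wrapper around the manual steps above (not part of SGE).
# Usage: repair_spool_db DBNAME [SPOOL_DIR]
#   DBNAME     database file to rebuild: "sge" or "sge_job"
#   SPOOL_DIR  defaults to $SGE_ROOT/default/spool
repair_spool_db() (
    # The ( ... ) body runs in a subshell, so cd does not affect the caller.
    db="${1:-}"
    [ -n "$db" ] || { echo "usage: repair_spool_db DBNAME [SPOOL_DIR]" >&2; exit 2; }
    spool="${2:-${SGE_ROOT:?SGE_ROOT is not set}/default/spool}"

    cd "$spool" || exit 1
    # Added guard (assumption): refuse to nest inside an earlier backup copy.
    [ ! -e spooldb.bak ] || { echo "spooldb.bak already exists" >&2; exit 1; }

    cp -a spooldb spooldb.bak &&
    cd spooldb || exit 1

    # Diagnostic only: a corrupted database is expected to fail here.
    db_verify "$db" || echo "db_verify reported problems in $db" >&2

    # Each step runs only if the previous one succeeded.
    db_recover &&
    db_dump -f "$db.out" "$db" &&
    mv "$db" "$db.old" &&
    db_load -f "$db.out" "$db" &&
    db_verify "$db" &&
    chown -R sgeadmin: "$spool"
)
```

Calling `repair_spool_db sge_job` would then repeat the procedure for the job database. Chaining the steps with `&&` stops the sequence at the first failure, so a failed dump never leads to the original file being moved aside.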

If the steps above do not work, this alternative method may work instead.

CODE
cd /act/sge
./sge_inst -bak

This starts an interactive backup script. Accept the default answers. Optionally, choosing not to use tar/gzip makes the backup easier to inspect. The settings are saved to /act/sge/backup. To fix the database corruption, restore this backup with the following command.

CODE
./sge_inst -rst

This starts another interactive script, this time to restore from the backup. Answer each question; the default answers should be correct. sgemaster should then start normally.
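
One way to confirm a clean start is to look only at the log entries written after the restart. The sketch below assumes the pipe-separated log format shown in the example errors, where the fourth field is the severity (E for error, C for critical). The function name `new_log_errors` is hypothetical, and the sgemaster startup script path in the usage comment is an assumption that may differ on your installation.

```shell
#!/bin/sh
# Hypothetical helper: print error (E) and critical (C) entries that were
# appended to LOGFILE after line number START.
# Usage: new_log_errors LOGFILE START
new_log_errors() {
    # tail -n +N starts output at line N, so skip the first START lines.
    tail -n "+$(( $2 + 1 ))" "$1" | awk -F'|' '$4 == "E" || $4 == "C"'
}

# Example use (commented out; the startup script path is an assumption):
#   log=/act/sge/default/spool/qmaster/messages
#   before=$(wc -l < "$log")
#   "$SGE_ROOT/default/common/sgemaster" start
#   new_log_errors "$log" "$before"
```

If the function prints nothing after sgemaster has been started, no new error or critical entries were logged during startup.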
