VMware Log Insight - Upgrade from 8.1.0 to 8.1.1 and corrupted RPM db
The initial idea of this post was to do a quick walkthrough of the VMware Log Insight upgrade from 8.1.0 to 8.1.1. However, the upgrade went sideways, and I ended up troubleshooting and fixing an issue with the appliance's RPM database.
After some digging, it seems the issue could happen on any CentOS, RHEL, or SUSE based appliance, since it is caused by a corrupted RPM package management database.
Starting the Upgrade
The VMware Log Insight upgrade process is pretty straightforward:
- Log in to the UI as a user with Admin privileges
- Go to Administration -> Cluster
- Click Upgrade Cluster
- Select the desired .pak file and wait…
OOOPPPPSSSS… Something went wrong
Once the upgrade progress bar had filled up completely, just when I expected to start playing around with the new VMware Log Insight version, I was rewarded with this error.
The error is quite descriptive, if a bit overwhelming. However, when we look into it more closely, there are some hints that point the troubleshooting in the right direction.
Failed to upgrade: Failed to read installed version:
error: rpmdb: BDB0113 Thread/process 4858/139939578443968 failed: BDB1507 Thread died in Berkeley DB library
error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db5 - (-30973)
error: cannot open Packages database in /var/lib/rpm
error: rpmdb: BDB0113 Thread/process 4858/139939578443968 failed: BDB1507 Thread died in Berkeley DB library
error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db5 - (-30973)
error: cannot open Packages database in /var/lib/rpm
The error points to a corruption in the RPM database (/var/lib/rpm) that is preventing the upgrade from finishing successfully: a process died inside the Berkeley DB library (BDB1507), and rpm now refuses to open the Packages database until recovery is run (DB_RUNRECOVERY).
I did not dig into the root cause of how it got into this state, though I cannot say this is the most production-ready environment. Either way, let's fix it, since the plan is to upgrade VMware Log Insight from 8.1.0 to 8.1.1.
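To confirm the diagnosis from the appliance console before changing anything, a quick non-destructive check can query the RPM database and look for the same Berkeley DB messages. This is a minimal bash sketch, assuming root access on the appliance; the function name rpmdb_needs_recovery is hypothetical, and it only relies on rpm's standard --dbpath and -qa options.

```shell
#!/usr/bin/env bash
# Sketch: non-destructive check of the local RPM database.
# Assumes root on the appliance; rpmdb_needs_recovery is a hypothetical name.

# Returns 0 when the database needs recovery, 1 when it reads cleanly.
rpmdb_needs_recovery() {
    local dbpath="${1:-/var/lib/rpm}"   # default RPM database location
    local errors status=0

    # List every installed package, discarding stdout and keeping only
    # stderr, which is where rpm reports Berkeley DB problems.
    errors=$(rpm --dbpath "$dbpath" -qa 2>&1 >/dev/null) || status=$?

    # Flag a failed query, or the DB_RUNRECOVERY / Packages errors seen
    # in the failed upgrade message.
    if (( status != 0 )) || grep -qE 'DB_RUNRECOVERY|cannot open Packages' <<<"$errors"; then
        printf '%s\n' "$errors" >&2
        return 0
    fi
    return 1
}
```

After sourcing it, running "if rpmdb_needs_recovery; then echo 'RPM database needs recovery'; fi" prints the captured rpm errors and the message only when the database is in the state shown above.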
Solution
So we need to recover the RPM database using the following steps.
- Take a snapshot of the VM, just to have a quick rollback if needed
- Log in to the VMware Log Insight console as the root user
- Back up the Berkeley DB environment files (__db.*) in /var/lib/rpm before we start
    mkdir /var/lib/rpm/backup
    cp -a /var/lib/rpm/__db.* /var/lib/rpm/backup/
- Remove those environment files, which hold the stale locks, then run a quick query so rpm recreates them
    rm -f /var/lib/rpm/__db.*
    rpm --quiet -qa
- Rebuild the RPM database and clear the yum cache
    rpm --rebuilddb
    yum clean all
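The steps above can be wrapped into a single bash function so they always run in the same order. This is a minimal sketch, not an official VMware procedure; it assumes root on the appliance and a VM snapshot already taken, and it passes rpm's --dbpath option to make the database location explicit. The function name recover_rpmdb is hypothetical, and the glob check is an addition that skips the copy and delete when no __db.* files exist.

```shell
#!/usr/bin/env bash
# Sketch of the recovery steps as one function (not an official VMware
# procedure). Assumes root on the appliance and a VM snapshot already taken.
# recover_rpmdb is a hypothetical name.

recover_rpmdb() {
    local dbpath="${1:-/var/lib/rpm}"   # default RPM database location
    local backup="$dbpath/backup"       # same backup folder as in the post
    local files=( "$dbpath"/__db.* )    # Berkeley DB environment files

    # Skip copy/delete if the glob matched nothing (pattern left literal).
    if [[ -e "${files[0]}" ]]; then
        # 1. Back up the environment files before touching them.
        mkdir -p "$backup" || return 1
        cp -a "${files[@]}" "$backup"/ || return 1
        # 2. Remove them to drop stale locks left by the dead process.
        rm -f "${files[@]}" || return 1
    fi

    # 3. Quick query so rpm recreates fresh environment files.
    rpm --dbpath "$dbpath" --quiet -qa || return 1
    # 4. Rebuild the database indices from the installed package headers.
    rpm --dbpath "$dbpath" --rebuilddb || return 1
    # 5. Clear yum's cached metadata, as in the original steps.
    yum clean all
}
```

Each step returns early on failure, so a failed backup never leads to the environment files being deleted.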
And we are ready to try to upgrade our VMware Log Insight again.
Upgrade - TAKE 2
- We go back to the VMware Log Insight UI.
- And we wait…
- Wait… this looks better now…
- We click Accept after going through the EULA, and we kick off the upgrade process.
- And after waiting for a while, the upgrade completes successfully.
To tidy up, we can delete the VM snapshot taken before we started, as well as the /var/lib/rpm/backup folder we created.
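The snapshot is removed from the vSphere side and is not covered here. For the backup folder, a cautious option is to delete it only once rpm reads the database cleanly. This is a minimal bash sketch, assuming root on the appliance and the backup location used in the steps above; cleanup_rpmdb_backup is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: remove the rpm backup folder only once the database reads cleanly.
# Assumes root on the appliance; cleanup_rpmdb_backup is a hypothetical name.

cleanup_rpmdb_backup() {
    local dbpath="${1:-/var/lib/rpm}"
    local errors

    # Any failure or stderr output from a full query means: keep the backup.
    if ! errors=$(rpm --dbpath "$dbpath" -qa 2>&1 >/dev/null) || [[ -n "$errors" ]]; then
        echo "rpm database still reports problems; keeping $dbpath/backup" >&2
        return 1
    fi

    rm -rf -- "$dbpath/backup"
}
```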
Conclusion
In this case we are upgrading a single-node VMware Log Insight deployment. I suspect the process would be similar for a clustered deployment, with the same steps applied on each of the nodes, since the cluster upgrade upgrades all the nodes.
While searching for a solution, I found a similar issue documented for VMware AppDefense appliances, and most (if not all) of the steps above were taken from it. So the issue could potentially affect any appliance built on a CentOS, RHEL, or SUSE platform.
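If the same fix does apply per node in a cluster, it may help to find which nodes are affected before starting the cluster upgrade. This is a minimal bash sketch under stated assumptions: key-based root SSH access to every node, and hypothetical node names and function name (nodes_needing_recovery). Unreachable nodes are reported too, since ssh then fails with an error.

```shell
#!/usr/bin/env bash
# Sketch: report which cluster nodes have an unreadable RPM database before
# starting the cluster upgrade. Assumes key-based root SSH to every node;
# the function name and node names are hypothetical.

nodes_needing_recovery() {
    local node errors
    for node in "$@"; do
        # -n: don't let ssh read the loop's stdin; LogLevel=ERROR hides
        # ssh's own warnings so only real errors land in $errors.
        if ! errors=$(ssh -n -o LogLevel=ERROR "root@$node" 'rpm -qa' 2>&1 >/dev/null) ||
           [[ -n "$errors" ]]; then
            echo "$node"
        fi
    done
}
```

For example, "nodes_needing_recovery li-node1 li-node2 li-node3" (hypothetical hostnames) prints one line per node that needs the recovery steps.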