VMware Log Insight - Upgrade from 8.1.0 to 8.1.1 and corrupted RPM db

The initial idea of this post was to do a quick walkthrough of the upgrade of VMware Log Insight from 8.1.0 to 8.1.1. However, the upgrade went sideways and I ended up troubleshooting and fixing an issue with the RPM database of the appliance.

After some digging, it seems that the issue could happen on any CentOS, RHEL, or SUSE based appliance, since it is caused by a corrupted RPM package management database.

Starting the Upgrade

The VMware Log Insight upgrade process is pretty straightforward:

  • Log in to the UI with a user that has Admin privileges
  • Go to Administration -> Cluster
  • Click Upgrade Cluster
  • Select the desired .pak file and wait…

OOOPPPPSSSS…. Something went wrong

Once the upgrade progress bar filled up completely, and just when I expected to start playing around with the new VMware Log Insight version, I was greeted with this error.

The error is pretty descriptive, if a bit overwhelming at first. However, once we start looking into it, there are some hints that give direction for the troubleshooting.

    Failed to upgrade: Failed to read installed version:
    error: rpmdb: BDB0113 Thread/process 4858/139939578443968 failed: BDB1507 Thread died in Berkeley DB library
    error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
    error: cannot open Packages index using db5 - (-30973)
    error: cannot open Packages database in /var/lib/rpm
    error: rpmdb: BDB0113 Thread/process 4858/139939578443968 failed: BDB1507 Thread died in Berkeley DB library
    error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
    error: cannot open Packages index using db5 - (-30973)
    error: cannot open Packages database in /var/lib/rpm

The error points to a corrupted RPM database, which is preventing the upgrade from finishing successfully.
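
To confirm that diagnosis from the console before touching anything, a minimal sketch like the one below can help. It assumes the corruption always shows up with the same DB_RUNRECOVERY or "cannot open Packages database" text as in the error above; the function name rpmdb_needs_recovery is made up for this example.

```shell
#!/bin/sh
# Return 0 when the given text carries the Berkeley DB recovery signature
# seen in the upgrade error, 1 otherwise.
# (rpmdb_needs_recovery is a hypothetical helper name.)
rpmdb_needs_recovery() {
    printf '%s\n' "$1" | grep -q -e 'DB_RUNRECOVERY' -e 'cannot open Packages database'
}

# Capture only stderr from a harmless query; a healthy database should not
# print Berkeley DB errors. Guarded so this does nothing where rpm is absent.
if command -v rpm >/dev/null 2>&1; then
    rpm_errors=$(rpm -q rpm 2>&1 >/dev/null)
    if rpmdb_needs_recovery "$rpm_errors"; then
        echo "RPM database looks corrupted, recovery needed"
    else
        echo "No Berkeley DB recovery errors reported"
    fi
fi
```

If the first message is printed, the appliance is most likely in the same state as mine was.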

I did not dig into the root cause of how the database got into this state, and I cannot say that this is the most production-ready environment. But let's fix it, since the plan is to upgrade VMware Log Insight from 8.1.0 to 8.1.1.

Solution

So we need to recover the RPM database using the following steps.

  1. Take a snapshot of the VM to have a quick rollback if needed
  2. Log in to the VMware Log Insight console as the root user
  3. Back up the __db.* files in /var/lib/rpm before we start

    mkdir /var/lib/rpm/backup
    cp -a /var/lib/rpm/__db.* /var/lib/rpm/backup/
    

  4. Remove the existing database files to avoid stale locks

    rm -f /var/lib/rpm/__db.*
    rpm --quiet -qa
    

  5. Rebuild the RPM database and clean the yum cache

    rpm --rebuilddb
    yum clean all
    

And we are ready to try to upgrade our VMware Log Insight again.
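
For anyone who prefers to run steps 3 to 5 in one go, they can be wrapped in a small script. This is only a sketch of the same commands, not something I ran during the upgrade: the function names and the --run switch are made up for this example, and the directory is a parameter so the backup and remove logic can be tried on a scratch folder first.

```shell
#!/bin/sh
# Step 3: copy the __db.* files into a backup folder inside the same directory.
backup_db_env() {
    dir="$1"
    mkdir -p "$dir/backup" || return 1
    for f in "$dir"/__db.*; do
        [ -e "$f" ] || continue          # the glob matched nothing
        cp -a "$f" "$dir/backup/" || return 1
    done
}

# Step 4: remove the __db.* files, but only if every one of them has a backup copy.
remove_db_env() {
    dir="$1"
    for f in "$dir"/__db.*; do
        [ -e "$f" ] || continue
        [ -e "$dir/backup/${f##*/}" ] || { echo "no backup for $f" >&2; return 1; }
    done
    rm -f "$dir"/__db.*
}

# Step 5: rebuild the RPM database and clean the yum cache.
rebuild_db() {
    rpm --rebuilddb && yum clean all
}

# Nothing touches /var/lib/rpm unless the script is called with --run
# (a hypothetical switch used only in this sketch).
if [ "${1:-}" = "--run" ]; then
    backup_db_env /var/lib/rpm || exit 1
    remove_db_env /var/lib/rpm || exit 1
    rpm --quiet -qa
    rebuild_db
fi
```

The one difference from the manual steps is that remove_db_env refuses to delete anything when a backup copy is missing, which guards against running step 4 before step 3.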

Upgrade - TAKE 2

  • We go back to the VMware Log Insight UI.
  • And we wait…
  • Wait… this looks better now…
  • We click Accept after going through the EULA, and we kick off the upgrade process.
  • And after waiting for a while, the upgrade completes successfully.

    To tidy up, we can delete the VM snapshot that was taken before we started and the backup folder that we made.

Conclusion

In this case we upgraded a single-node VMware Log Insight. However, I suspect the process would be similar for a clustered deployment, repeating the same steps on each node, since a cluster upgrade upgrades all the nodes.
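
For a cluster, it would make sense to check every node before starting the upgrade. The sketch below is untested on a real cluster and only prints the commands, it does not run them. It assumes SSH access as root is enabled on each node, and the node names are hypothetical placeholders.

```shell
#!/bin/sh
# Print, for each node, a command that counts DB_RUNRECOVERY lines in the
# output of an rpm query. Nothing is executed remotely; review the printed
# commands and run them by hand.
print_node_checks() {
    for node in "$@"; do
        printf 'ssh root@%s "rpm -qa 2>&1 | grep -c DB_RUNRECOVERY"\n' "$node"
    done
}

# Hypothetical node names, replace with the real cluster members.
print_node_checks li-node-1.example.com li-node-2.example.com li-node-3.example.com
```

Any node that reports a count above zero would need the recovery steps before the cluster upgrade is started.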

While searching for a solution, I found a similar issue documented for VMware AppDefense appliances, and some of the steps, or probably all of them, were taken from it. So the issue can potentially affect any appliance built on a CentOS, RHEL, or SUSE platform.