I detected autovacuum operations that were excessively using CPU in PostgreSQL. Since I had never seen this in databases like Oracle, MySQL, and MSSQL that I've used in the field, I looked into it and found it interesting, so I organized it.
Before learning about Dead Tuple and Vacuum/autovacuum, let me first cover MVCC for easier delivery.
MVCC (Multi-Version Concurrency Control)
MVCC translates to Multi-Version Concurrency Control.
For example, think of Transactions. Transactions can group work into units and Undo/Redo changes before Commit. They implement MVCC to ensure integrity for read/write operations.
An interesting point is that each database implements features slightly differently.
Summary of Main Differences
DBMS | MVCC Basic Operation | Garbage Collection | Performance |
---|---|---|---|
PostgreSQL | Creates new versions leaving Dead Tuples | Autovacuum | Excellent |
MySQL (InnoDB) | Manages previous versions in Undo Log | Purge Thread | Excellent |
Oracle | Manages versions in Undo Segment | SMON | Very Excellent |
SQL Server | Provides similar method in Snapshot Isolation | Automatic version management | Flexible |
As such, MVCC is implemented similarly across different DBs using different methods and terminology.
Dead Tuples
PostgreSQL's Dead Tuples are data that accumulates during updates/deletions.
Unlike databases like MySQL that use MVCC + Lock functionality, it reduces conflicts and improves performance when transactions occur simultaneously. Since modification work is possible even while reading data, it's a structure optimized for read performance.
So when is this accumulated data removed?
Autovacuum
Autovacuum exists to clean up this accumulated data. When Dead Tuples accumulate to a threshold, it executes Garbage Collection for optimization, and this is called Autovacuum.
The Autovacuum daemon runs and automatically executes when thresholds are reached. There's also manual vacuum execution available.
Conclusion
While learning about autovacuum, I found the MVCC implementation and differences across different DBs interesting, so I briefly organized it.