Skip to content
Kendrick's Website Kendrick's GitHub Kendrick's Youtube Kendrick's Travel blog

PostgreSQL's Dead Tuple and Vacuum

2 min read
Cover

I detected autovacuum operations that were excessively using CPU in PostgreSQL. Since I had never seen this in databases like Oracle, MySQL, and MSSQL that I've used in the field, I looked into it and found it interesting, so I organized it.

Before learning about Dead Tuple and Vacuum/autovacuum, let me first cover MVCC for easier delivery.

MVCC (Multi-Version Concurrency Control)

MVCC translates to Multi-Version Concurrency Control.

For example, think of Transactions. Transactions can group work into units and Undo/Redo changes before Commit. They implement MVCC to ensure integrity for read/write operations.

An interesting point is that each database implements features slightly differently.

Summary of Main Differences

DBMSMVCC Basic OperationGarbage CollectionPerformance
PostgreSQLCreates new versions leaving Dead TuplesAutovacuumExcellent
MySQL (InnoDB)Manages previous versions in Undo LogPurge ThreadExcellent
OracleManages versions in Undo SegmentSMONVery Excellent
SQL ServerProvides similar method in Snapshot IsolationAutomatic version managementFlexible

As such, MVCC is implemented similarly across different DBs using different methods and terminology.

Dead Tuples

Dead tuples imagePostgreSQL's Dead Tuples are data that accumulates during updates/deletions.

Unlike databases like MySQL that use MVCC + Lock functionality, it reduces conflicts and improves performance when transactions occur simultaneously. Since modification work is possible even while reading data, it's a structure optimized for read performance.

So when is this accumulated data removed?

Autovacuum

Autovacuum imageAutovacuum exists to clean up this accumulated data. When Dead Tuples accumulate to a threshold, it executes Garbage Collection for optimization, and this is called Autovacuum.

The Autovacuum daemon runs and automatically executes when thresholds are reached. There's also manual vacuum execution available.

Conclusion

While learning about autovacuum, I found the MVCC implementation and differences across different DBs interesting, so I briefly organized it.

References