Misconfigured PostgreSQL Vacuum

The Common Problem: Misconfigured PostgreSQL Vacuum
A Vacuum Riddle:
-What sucks but doesn’t suck?
-A vacuum!
(Except for the Postgres vacuum, which does—occasionally—suck.)
This playful riddle highlights the irony of PostgreSQL’s VACUUM process: it’s designed to clean up dead rows (much like a vacuum cleaner), but if misconfigured, it can drain performance from your system.
It’s a topic that’s been discussed time and time again, yet many PostgreSQL databases continue to suffer from poorly configured vacuum processes. If you’ve ever had to troubleshoot a database performance issue, you’ve likely encountered this frustration.
- Symptom: Slow queries, high I/O usage, or ballooning database size.
- Cause: Inefficient or completely disabled VACUUM routines leaving behind dead tuples.
- Effect: Increased table bloat, slower performance, and more headaches for DBAs.
As a service to the public (and to help you get home earlier to your family—or your PlayStation), here are some basic rules for configuring your vacuum process properly.
Rule #1: Never Disable the Vacuum Process
The VACUUM process does exactly what its name implies—it vacuums up old data that’s no longer needed. Let’s break it down further:
Deletes Leave “Dead Rows”
DELETE FROM mytable WHERE id=1;
When you delete a row from a table, the row is not immediately removed from the table’s data block. Instead, it’s marked as deleted, leaving a “dead row” that will be cleaned up later by the vacuum process.
Updates Also Leave “Dead Rows”
UPDATE mytable SET salary=salary-100 WHERE name='Robert';
In this case, Robert’s old salary remains in the table, marked as dead but still occupying space until VACUUM cleans it up. If Robert’s salary gets updated 1,000 times and VACUUM is disabled, there will be 1,000 dead rows, one for each update, but only one live row.
All these dead rows lead to table bloat, meaning your tables and indexes will consume more space. Performance suffers because PostgreSQL must sift through unnecessary dead data to find the live rows.
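You can watch the cleanup happen yourself: a manual run with VERBOSE output reports how many dead row versions were removed from the example table.
VACUUM VERBOSE mytable;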
Moral of the story? Never, ever disable the vacuum process. It’s critical for maintaining your database’s health and performance.
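If you suspect autovacuum has been switched off, either globally or for individual tables, a quick check against the standard catalogs will confirm it:
-- Should return 'on'
SHOW autovacuum;

-- Tables carrying a per-table autovacuum override worth reviewing
SELECT relname, reloptions
FROM pg_class
WHERE array_to_string(reloptions, ',') LIKE '%autovacuum%';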
Rule #2: Monitor the VACUUM Statistics
To ensure your vacuum process is keeping up with deletes and updates, it’s crucial to monitor the number of dead rows. One of the best ways to do this is by querying the pg_stat_user_tables view.
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
- Live Tuples (n_live_tup): Represents rows currently considered “live.”
- Dead Tuples (n_dead_tup): Rows that have been updated or deleted but not yet cleaned up by vacuum.
A quick rule of thumb to detect problems:
- If n_dead_tup > 0.2 * n_live_tup, consider investigating.
- If n_dead_tup > n_live_tup, you definitely have a problem.
High numbers of dead tuples often indicate that VACUUM isn’t keeping up. This leads to table bloat, slows down queries, and increases storage needs. By regularly checking these stats, you can spot issues early and adjust your vacuum configuration before performance suffers.
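To turn that rule of thumb into a repeatable check, a query along these lines (built on the same view, with the 20% threshold as an assumption you can tune) lists the tables that deserve a closer look:
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / nullif(n_live_tup, 0), 1) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 0.2 * n_live_tup
ORDER BY n_dead_tup DESC;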
Rule #3: Enable log_autovacuum_min_duration
Long-running vacuum operations can clog your system if they go unnoticed. Setting the log_autovacuum_min_duration parameter helps you spot autovacuum runs that exceed a certain duration. For example, to log every run that takes longer than 100 milliseconds:
ALTER SYSTEM SET log_autovacuum_min_duration = 100;
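ALTER SYSTEM only writes the new value to postgresql.auto.conf, so reload the configuration to make it take effect (or restart the server):
SELECT pg_reload_conf();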
Once the setting is active, any autovacuum run that takes 100 ms or longer will be written to the server log. You can then scan the log with a command like:
grep vacuum postgresql-Sun.log
This helps you quickly locate tables that trigger excessive vacuum activity or extended vacuum times. From there, you might need to tweak autovacuum settings, such as autovacuum_vacuum_cost_delay or autovacuum_vacuum_threshold, to match your workload more effectively.
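For example, a hot table could be given a per-table override along these lines (the values are illustrative assumptions, not recommendations; tune them against your own workload):
ALTER TABLE mytable SET (
    autovacuum_vacuum_scale_factor = 0.05, -- trigger vacuum once ~5% of rows are dead (default 0.2)
    autovacuum_vacuum_threshold = 1000,    -- plus a fixed floor of 1,000 dead rows (default 50)
    autovacuum_vacuum_cost_delay = 2       -- milliseconds to sleep when the cost limit is hit
);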
Rule #4: Avoid Overloading the Vacuum Process
Large, bulk deletes can overload the vacuum process. Suppose you need to purge old data with a massive delete such as:
DELETE FROM mytable WHERE created_at < '2023-01-01';
Instead, consider partitioning large tables by date or other criteria. Then you can simply drop entire partitions rather than deleting millions of rows. For instance:
-- Detach the old partition if you want to keep the data around elsewhere
ALTER TABLE mytable DETACH PARTITION mytable_2022;
-- Or simply drop it if you no longer need the data
DROP TABLE mytable_2022;
Dropping a partition removes the underlying data file without requiring a full vacuum of those millions of rows, significantly reducing vacuum overhead and improving overall performance.
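If mytable isn’t partitioned yet, a minimal sketch of such a layout could look like this (the schema and partition names are illustrative assumptions):
CREATE TABLE mytable (
    id bigint NOT NULL,
    created_at timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE mytable_2022 PARTITION OF mytable
    FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');

CREATE TABLE mytable_2023 PARTITION OF mytable
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
With this layout, retiring the 2022 data becomes a quick metadata operation instead of millions of row deletions that the vacuum process would later have to clean up.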
Conclusion
You’ve now seen how to keep a close eye on your vacuum process and diagnose potential issues by monitoring dead rows and enabling autovacuum logging. By catching long-running vacuums early and avoiding massive deletes through partitioning, you can significantly improve performance and reduce downtime.