The most surprising thing about pg_cron is that it lets you forget you’re running a scheduler at all, because it lives inside your PostgreSQL database.
Imagine you have a routine cleanup task, like purging old user data or updating aggregated statistics. Traditionally, you’d reach for cron on your server, or a dedicated job scheduler like Airflow or Rundeck. You’d write a shell script, configure the cron entry, and then have to manage that script and its environment separately from your database.
pg_cron flips this. It’s a PostgreSQL extension. You install it, and then you create jobs directly within your database by calling SQL functions such as cron.schedule.
Here’s a quick look at it in action. First, you need to install the extension. This usually means installing a packaged build (or compiling from source), adding pg_cron to shared_preload_libraries, restarting PostgreSQL, and then running CREATE EXTENSION pg_cron; in the database pg_cron is configured to use (the cron.database_name setting, postgres by default). Once installed, you can add a job like this:
SELECT cron.schedule('0 0 * * *',
  $$DELETE FROM logs WHERE log_date < NOW() - INTERVAL '7 days'$$);
This call tells pg_cron to execute the DELETE command every day at midnight (GMT, unless the server is configured otherwise) and returns the ID of the new job. The schedule uses standard cron syntax. The command is just a plain SQL statement that will be executed by the PostgreSQL server itself.
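pg_cron's SQL functions also let you name a job, which makes it easier to find and remove later. The following is a minimal sketch, assuming a pg_cron release recent enough to support named jobs; the job name purge-old-logs and the logs table are illustrative, not part of any real schema.

```sql
-- Schedule a named job ('purge-old-logs' is a hypothetical name).
SELECT cron.schedule(
  'purge-old-logs',
  '0 0 * * *',
  $$DELETE FROM logs WHERE log_date < now() - interval '7 days'$$
);

-- List the jobs pg_cron currently knows about.
SELECT jobid, jobname, schedule, command, active
FROM cron.job;

-- Remove the job by name when it is no longer needed.
SELECT cron.unschedule('purge-old-logs');
```

Naming jobs is a small convenience, but it means cleanup scripts do not have to remember numeric job IDs.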
Let’s break down what’s happening here. pg_cron runs as a background worker process within your PostgreSQL instance. It keeps track of the jobs defined in the cron.job table. When it finds a job whose schedule matches the current time, it executes the associated command. By default it does this by opening a new connection to the target database, so each run gets its own session and transaction, separate from the scheduler process; pg_cron can alternatively be configured to run jobs in background workers.
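To see how the scheduler is configured on a given server, you can inspect its settings. This is a small sketch, assuming pg_cron is already loaded through shared_preload_libraries; the exact set of cron.* settings varies between versions.

```sql
-- Show every pg_cron setting, including which database holds the job table.
SELECT name, setting
FROM pg_settings
WHERE name LIKE 'cron.%'
ORDER BY name;
```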
You have fine-grained control over your jobs:
schedule: The cron-style string defining when the job runs.
command: The SQL statement to execute. This can be a simple UPDATE, a complex SELECT that triggers other actions, or even a call to a stored procedure.
nodename and nodeport: The host and port pg_cron connects to when it runs the job. These default to the local server.
database: The specific database within your PostgreSQL instance where the command should run.
username: The role the command runs as.
active: A boolean to enable or disable a job without deleting it.
jobname: A descriptive name for the job.
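These fields can be inspected and changed after a job exists. The sketch below assumes a pg_cron version that provides cron.alter_job, and uses a hypothetical job named purge-old-logs.

```sql
-- Look up the job ('purge-old-logs' is a hypothetical example name).
SELECT jobid, schedule, database, username, active
FROM cron.job
WHERE jobname = 'purge-old-logs';

-- Pause the job without deleting it.
SELECT cron.alter_job(
  job_id := (SELECT jobid FROM cron.job WHERE jobname = 'purge-old-logs'),
  active := false
);

-- Later: re-enable it and move it to 02:30.
SELECT cron.alter_job(
  job_id   := (SELECT jobid FROM cron.job WHERE jobname = 'purge-old-logs'),
  schedule := '30 2 * * *',
  active   := true
);
```

Arguments left out of cron.alter_job keep their current values, so only the fields you name are changed.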
Consider a scenario where you need to run a VACUUM FULL operation on a specific table. Instead of scheduling it externally, you can do this:
SELECT cron.schedule_in_database('vacuum_large_table', '0 3 * * 1',
  'VACUUM FULL my_large_table;', 'production_db');
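This job runs every Monday at 3 AM in the production_db database, performing a potentially resource-intensive VACUUM FULL. Because it runs inside PostgreSQL, there is no external script or credential file to maintain. Keep in mind that VACUUM FULL takes an ACCESS EXCLUSIVE lock and blocks reads and writes on the table while it rewrites it, so pick a quiet window.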
The one thing that trips people up is how pg_cron handles errors and output. Unlike OS cron, pg_cron doesn’t mail command output or write it to a separate job log. Instead, it reports job activity in the PostgreSQL server logs. If you’re used to seeing cron job output in /var/log/syslog or a dedicated job log file, you’ll need to adjust your monitoring to check your PostgreSQL logs (postgresql.log or similar, depending on your log_directory configuration). You can also query the cron.job_run_details table to see the history of job executions, including success or failure and any returned message or error text.
Understanding the cron.job_run_details table is key to debugging. It stores the jobid, a runid, the command that ran, start_time and end_time timestamps, and importantly, status ('succeeded' or 'failed' once a run has finished) and return_message. You can query it like this to find recent failures:
SELECT * FROM cron.job_run_details WHERE status = 'failed' ORDER BY start_time DESC LIMIT 10;
This gives you immediate insight into what went wrong with a specific job execution.
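For routine monitoring, it can help to summarize failures per job and to prune old history. The following is a sketch under a few assumptions: the one-day window, the 7-day retention, and the job name purge-job-run-details are illustrative choices, and the join relies on cron.job and cron.job_run_details sharing the jobid column.

```sql
-- Count failures per job over the last day, with the time of the latest one.
SELECT j.jobname,
       count(*)          AS failures,
       max(d.start_time) AS last_failure
FROM cron.job_run_details AS d
JOIN cron.job AS j USING (jobid)
WHERE d.status = 'failed'
  AND d.start_time > now() - interval '1 day'
GROUP BY j.jobname
ORDER BY failures DESC;

-- Keep the history table from growing without bound
-- (job name and retention period are illustrative).
SELECT cron.schedule(
  'purge-job-run-details',
  '0 12 * * *',
  $$DELETE FROM cron.job_run_details WHERE end_time < now() - interval '7 days'$$
);
```

Scheduling the cleanup through pg_cron itself keeps the maintenance in the same place as the jobs it tracks.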
The next logical step after mastering simple SQL jobs is to explore how pg_cron integrates with more complex PL/pgSQL functions for sophisticated task orchestration.