Question:
There is a table like this:
create table email_stats (
id bigserial primary key,
mailing_list_id int not null,
sended_date timestamp not null,
emails_send bigint not null default 0,
emails_clicks bigint not null default 0
);
Now we need to add a new field to it. So the task is simple,
alter table email_stats
add column emails_paid_clicks bigint not null default 0;
The only problem is that the table is already several tens of gigabytes in size, and this alter table blocks all writes to the table for a long time.
How can the field be added without downtime?
PS: Strangely enough, I could not find such a common question asked here before.
Answer:
If you are a happy user of PostgreSQL 11 or newer (though you shouldn't install beta versions in production projects), all the tricky tricks are now hidden inside, and it is enough to run the alter table directly:
set statement_timeout to '1s';
alter table email_stats
add column emails_paid_clicks bigint not null default 0;
The patch accepted into PostgreSQL 11 makes it unnecessary to rewrite the entire table when adding a new column with a default value. The only remaining problem with long table locks is that the alter table may have to wait for an exclusive lock because some other transactions are holding it. That is why it makes sense to set a small statement_timeout and simply retry the alter table until it succeeds.
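If you want to double-check that nothing got rewritten, PostgreSQL 11 added the atthasmissing and attmissingval columns to pg_attribute; a quick catalog query (my sketch, not part of the original answer) shows the default stored in the catalog instead of in every row:
-- sketch: on PostgreSQL 11+ the default is kept in the catalog;
-- atthasmissing = true means the table itself was not rewritten
select atthasmissing, attmissingval
from pg_attribute
where attrelid = 'email_stats'::regclass
  and attname = 'emails_paid_clicks';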
If your PostgreSQL version is below 11, things are not so simple. In this query
alter table email_stats
add column emails_paid_clicks bigint not null default 0;
it is the default 0 that causes the long table lock. PostgreSQL takes an exclusive lock on the table and walks through every row at the data level to write the new value, releasing the lock only when it is done. If the default were NULL, PostgreSQL would only update the table metadata in the system catalog, which is very fast, but that is not what we need.
So the desired change can still be made, but it will look nothing like the single statement above and will take several steps.
Let's start making changes
First we add the new column with NULL as its default – that is fast, it only changes the table's metadata. Then we set the desired default value – it will apply to rows inserted from that point on.
begin;
set local statement_timeout = '1s';
alter table email_stats
add column emails_paid_clicks bigint;
alter table email_stats
alter column emails_paid_clicks set default 0;
commit;
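As a quick sanity check (a sketch with illustrative values, not part of the original answer): a row inserted after this transaction picks up the new default, while the existing rows still contain NULL.
insert into email_stats (mailing_list_id, sended_date)
values (1, now());

select count(*) filter (where emails_paid_clicks is null) as old_rows_still_null,
       count(*) filter (where emails_paid_clicks = 0)     as new_rows_with_default
from email_stats;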
It deserves a separate explanation why I set statement_timeout inside the transaction. This setting limits the maximum execution time of a query. This alter table still needs a lock on the table, even if only for a short time, and there is a hidden rake here: what if the alter table cannot take the lock because of some other transaction in progress? For example, a simple insert in another transaction will prevent it from acquiring the lock needed to change the structure. Meanwhile, the waiting alter table will already be blocking all subsequent writes to this table. A short statement_timeout will quickly kill such an alter table and the service will keep running. The attempt to add the column can then be painlessly repeated a little later until it eventually succeeds.
There is also an interesting rake: an explicit default null in an alter table can in some cases be treated as a NULL of a non-equivalent data type, and the database will go back to rewriting the table. To avoid this rare rake, it is better not to specify default null explicitly and to rely on the default implied by add column itself: NULL is assumed there anyway.
The column has been added, now get the default into the data
This is a long process; we need queries like
update email_stats
set emails_paid_clicks = default
where emails_paid_clicks is null
and /* the next chunk of data */
to write the desired value into the existing table data. The data has to be updated in chunks (which is what the open condition in the query above is for), with pauses between updates, while keeping an eye on replication lag (if you have replicas) and on the autovacuum processes. There are several approaches to the update itself; the simpler one is to update by the primary key or any other unique key. Take any programming or scripting language of your choice and do something like this:
$current_id = (select min(id) from email_stats)
$maxid = (select max(id) from email_stats)
while ($current_id <= $maxid) {
    $batch_to = $current_id + 10000 // maximum batch size for a single update
    update email_stats
        set emails_paid_clicks = default
        where emails_paid_clicks is null
            and id >= $current_id and id < $batch_to
    $current_id = $batch_to
    sleep(5) // pause between updates so the script disturbs the service less
}
While such a script is running, you can occasionally run vacuum email_stats so that the table does not grow too much in size, especially if autovacuum is not configured aggressively enough and does not manage to clean up after the script.
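To decide when such a manual vacuum is worth it, you can watch the dead tuple counters in pg_stat_user_tables, something like this (a sketch):
select n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
from pg_stat_user_tables
where relname = 'email_stats';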
The batch size and the pause between updates should be chosen based on the load profile of your particular service: small batches and long pauses will not bother anyone, but the script will simply take a very long time to run.
The query example is not the most efficient one: its execution time will vary a lot because of gaps in the id sequence and because the rows will most likely sit on different pages, but it is simple, and the maximum size of a single update is easy to adjust.
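If the id gaps become a real problem, one possible variant (my sketch, not from the original answer) is to let each batch find the remaining NULL rows itself and repeat the statement until it updates 0 rows; note that without an index on the new column each batch has to search for the NULLs again:
update email_stats
set emails_paid_clicks = default
where id in (
    select id
    from email_stats
    where emails_paid_clicks is null
    order by id
    limit 10000
);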
Among the pitfalls at this stage: it becomes much easier to catch a deadlock while competing with the application over row updates, if the application wants to update several rows from the same batch but in a different order. In the connection running the update script you can set deadlock_timeout = 100ms; then, when a deadlock does happen, it is usually our script that gets killed rather than a useful application transaction.
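In SQL this is just a session-level setting, executed once at the start of the script's connection:
-- run once in the session that executes the batched updates
set deadlock_timeout = '100ms';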
Set not null
By now there should be no NULL values left in the added column, so we can set not null.
alter table email_stats alter column emails_paid_clicks set not null;
Unfortunately, this query takes a write lock, but its execution time is much shorter than rewriting the entire table to fill in the default value.
If you need the write lock to be minimal, you will have to give up the native not null property, but you can add a check constraint with the same meaning. First add the constraint marked not valid (again in a transaction with a short statement_timeout):
begin;
set local statement_timeout = '1s';
alter table email_stats
add constraint emails_paid_clicks_not_null
check (emails_paid_clicks is not null) not valid;
commit;
Then, in another transaction without a statement_timeout:
alter table email_stats validate constraint emails_paid_clicks_not_null;
Validating the constraint does not block writes.
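If you want to confirm that the validation really went through, the convalidated flag in pg_constraint should now be true (a small sketch):
select convalidated
from pg_constraint
where conname = 'emails_paid_clicks_not_null'
  and conrelid = 'email_stats'::regclass;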
For PostgreSQL 11 and newer this whole guide has become much shorter, but for completeness I will add a little more – it was not for nothing that I wrote the patch. Starting with PostgreSQL 12, alter table ... set not null can skip validating the data in the table if the existing constraints are enough to believe that the column contains no NULLs. Yes, I am talking about exactly the check constraint I just described how to create without a long lock. Once the constraint has passed validate constraint, you can call set not null and drop the no longer needed check constraint. This will still require a table lock, but now a short one.
begin;
set local statement_timeout = '1s';
alter table email_stats alter column emails_paid_clicks set not null;
alter table email_stats
drop constraint emails_paid_clicks_not_null;
commit;
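A final catalog check (optional sketch): the column should now be marked attnotnull.
select attnotnull
from pg_attribute
where attrelid = 'email_stats'::regclass
  and attname = 'emails_paid_clicks';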
Done
That's it, the column has been added.
By the way, it is worth considering whether NULL would be a better default here. Such a column is not only much easier to add, it is also stored far more compactly: a NULL value is a single bit in the row header's null bitmap, while a bigint value of 0 already occupies 8 bytes.
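A small illustration of that size difference (a sketch; it measures standalone composite values rather than real table rows, so treat the absolute numbers loosely):
select pg_column_size(row(0::bigint))    as with_zero,
       pg_column_size(row(null::bigint)) as with_null;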