Yes, fillfactor again. I have spent many hours reading and I still can't decide what's best for each case. I don't understand when and how fragmentation happens. I'm migrating a database from MS SQL Server to PostgreSQL 9.2.
Case 1
10-50 inserts per minute into a table with a sequential (serial) PK; 20-50 reads per hour.
CREATE TABLE dev_transactions (
transaction_id serial NOT NULL,
transaction_type smallint NOT NULL,
moment timestamp without time zone NOT NULL,
gateway integer NOT NULL,
device integer NOT NULL,
controler smallint NOT NULL,
token integer,
et_mode character(1),
status smallint NOT NULL,
CONSTRAINT pk_dev_transactions PRIMARY KEY (transaction_id)
);
Case 2
Similar structure, index on a serial PK; writes in blocks (one shot) of ~50,000 rows every 2 months; reads 10-50 per minute.
Does a 50% fillfactor mean that each insert generates a new page and moves 50% of existing rows to a newly generated page?
Does a 50% fillfactor mean free space is left between physical rows in new data pages?
Is a new page generated only when there is no free space left in existing pages?
As you can see, I'm very confused; I would appreciate some help, or maybe a good link to read about PostgreSQL and index fillfactor.
FILLFACTOR

With only INSERT and SELECT you should use a FILLFACTOR of 100 for tables (which is the default anyway). There is no point in leaving wiggle room per data page if you are not going to "wiggle" with UPDATEs.
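For reference, this is how the storage parameter is set explicitly (a sketch; my_table is just a placeholder name, and a change via ALTER TABLE only affects pages written afterwards, until VACUUM FULL or CLUSTER rewrites the table):

-- set at creation time (100 is the default for tables anyway):
CREATE TABLE my_table (
    my_table_id serial PRIMARY KEY,
    payload     text
) WITH (fillfactor = 100);

-- or set it explicitly on the existing table:
ALTER TABLE dev_transactions SET (fillfactor = 100);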
The mechanism behind FILLFACTOR is simple. INSERTs only fill data pages (usually 8 kB blocks) up to the percentage declared by the FILLFACTOR setting. Also, whenever you run VACUUM FULL or CLUSTER on the table, the same wiggle room per block is re-established. Ideally, this allows UPDATE to store new row versions in the same data page, which can provide a substantial performance boost when dealing with lots of UPDATEs. It is also beneficial in combination with H.O.T. updates.

Indexes need more wiggle room by design. They have to store new entries at the right position in leaf pages. Once a page is full, a relatively costly "page split" is needed. So indexes tend to bloat more than tables. The default FILLFACTOR for a (default) B-Tree index is 90 (it varies per index type), and wiggle room makes sense for just INSERTs, too. The best strategy heavily depends on write patterns.

Example: If new inserts have steadily growing values (the typical case for a serial or timestamp column), then there are basically no page splits, and you might go with FILLFACTOR = 100 (or a bit lower to allow for some noise).

For a random distribution of new values, you might go below the default 90 ...
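A sketch of the syntax for indexes (the index on moment is hypothetical, just to show the WITH clause; pk_dev_transactions is the PK index created by the named constraint):

-- explicit fillfactor for a new index on a steadily growing column:
CREATE INDEX dev_transactions_moment_idx
    ON dev_transactions (moment)
    WITH (fillfactor = 100);

-- or adjust an existing index; takes full effect after a REINDEX:
ALTER INDEX pk_dev_transactions SET (fillfactor = 100);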
Basic source of information: the manual for CREATE TABLE and CREATE INDEX.

Other optimization
But you can do something else - since you seem to be a sucker for optimization ... :)
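One possible column order (a sketch; the exact order is a judgment call, the idea is to group columns by alignment requirement and keep the nullable columns at the end):

CREATE TABLE dev_transactions (
    transaction_id   serial NOT NULL,                       -- 4 bytes
    gateway          integer NOT NULL,                      -- 4 bytes
    device           integer NOT NULL,                      -- 4 bytes
    token            integer,                               -- 4 bytes
    moment           timestamp without time zone NOT NULL,  -- 8 bytes
    transaction_type smallint NOT NULL,                     -- 2 bytes
    controler        smallint NOT NULL,                     -- 2 bytes
    status           smallint NOT NULL,                     -- 2 bytes
    et_mode          character(1),                          -- variable length
    CONSTRAINT pk_dev_transactions PRIMARY KEY (transaction_id)
);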
This optimizes your table with regard to data alignment and avoids padding for a typical 64-bit server. It saves a few bytes, probably just 8 bytes per row on average - you typically can't squeeze out much with "column tetris":
Keep NOT NULL columns at the start of the table for a very small performance bonus.
Your table has 9 columns. The initial ("cost-free") 1-byte NULL bitmap covers 8 columns. The 9th column triggers an additional 8 bytes for the extended NULL bitmap - if there are any NULL values in the row.
If you make et_mode and token NOT NULL, all columns are NOT NULL and there is no NULL bitmap, freeing up 8 bytes per row (see the sketch below).

This even works per row if some columns can be NULL. If all fields of the same row have values, there is no NULL bitmap for that row. In your special case, this leads to the paradox that filling in values for et_mode and token can make your storage size smaller, or at least stay the same.
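A sketch of the NOT NULL change (it only succeeds once every existing row has a value in both columns):

ALTER TABLE dev_transactions
    ALTER COLUMN et_mode SET NOT NULL,
    ALTER COLUMN token   SET NOT NULL;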
Basic source of information: the manual on Database Physical Storage.
Compare the size of rows (filled with values) with your original table to get definitive proof:
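For example (one possible query; pg_column_size() on the whole-row reference t reports the size of each row):

SELECT pg_column_size(t) AS row_bytes
FROM   dev_transactions t;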
(Plus maybe padding between rows, as the next row starts at a multiple of 8 bytes.)