PostgreSQL Database Log Transaction


I'm a final-year college student working on my thesis. Its title is "Index Suggestion based on Log Analysis": the project will analyze the PostgreSQL transaction log to produce index recommendations for the database under test.

The research will develop an index recommender tool by analyzing the attributes that are frequently accessed (via SELECT statements).

But I'm finding it hard to locate the PostgreSQL log file. My question is: where can I find a PostgreSQL transaction log dataset? Or, failing that, a transaction log dataset from another database?


There are 3 answers

Laurenz Albe

You are mixing up the transaction log (WAL) and the regular text log file.

The latter does contain statements (if the server is configured to log them), while the transaction log doesn't contain statements at all, just binary information about what changed in which block.

You won't be able to recommend an index just from looking at the query; I can't do that either.

I have a suggestion for you: if you want to write a tool that suggests indexes, it should take the output of EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) SELECT /* your query */ as input.
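For illustration, a plan that such a tool could consume might be generated like this (the table and column names here are made up):

EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM orders WHERE customer_id = 42;

A "Seq Scan" plan node in the JSON output, with a large number of rows removed by a filter on customer_id, would be a hint that an index on that column is worth considering.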

Moreover, the tool will have to be connected to the database to query table and index metadata (and perhaps statistics). That makes you dependent on the database version, because the catalogs can change between versions (and they do; see partitioned tables), but that won't concern you so much in a thesis paper.
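As a rough sketch of the kind of metadata lookup that implies (the system views below are standard, but the table name is again made up):

-- Existing indexes on a table
SELECT indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public' AND tablename = 'orders';

-- Per-column statistics that the planner itself uses
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE schemaname = 'public' AND tablename = 'orders';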

The task is still not simple (query optimization is essentially an AI problem), but then you at least have a chance.

Super Kai - Kazuya Ito

You need to run the query below and then reload the configuration or restart PostgreSQL to enable statement logging persistently. Note that a parameter set with ALTER SYSTEM SET is written to postgresql.auto.conf rather than postgresql.conf:

ALTER SYSTEM SET log_statement = 'all';
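Since the original question was also about locating the log file at all: once logging is enabled, you can ask the server where it is writing (pg_current_logfile() requires PostgreSQL 10 or later and the logging collector):

SHOW log_destination;
SHOW logging_collector;
SELECT pg_current_logfile(); -- NULL unless logging_collector is on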

And to disable logging persistently again, run either of the queries below and then reload or restart:

ALTER SYSTEM RESET log_statement;

Or:

ALTER SYSTEM SET log_statement = 'none';

You can also enable logging persistently through log_min_duration_statement; a threshold of 0 logs every statement together with its execution time. Run the query below, then reload or restart:

ALTER SYSTEM SET log_min_duration_statement = 0;

And to disable that again persistently, run either of the queries below, then reload or restart:

ALTER SYSTEM RESET log_min_duration_statement;

Or:

ALTER SYSTEM SET log_min_duration_statement = -1;
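For both parameters a configuration reload is sufficient to apply an ALTER SYSTEM change (a full restart also works), and you can verify what is in effect afterwards:

SELECT pg_reload_conf(); -- apply changes from postgresql.auto.conf
SHOW log_statement;
SHOW log_min_duration_statement;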

See my other answer for more details on how to enable and disable query logging in PostgreSQL.

saolof

A bit late to the party here, but the thing you would probably want in practice is pg_stat_statements. Use it to list the queries with the highest total_exec_time, and look at their query plans. Then you would consider adding indexes that would speed up joins or scans in those queries.
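A minimal sketch of that first step (pg_stat_statements must be preloaded via shared_preload_libraries before it can collect anything, and the total_exec_time / mean_exec_time column names assume PostgreSQL 13 or later; older releases call them total_time / mean_time):

-- One-time setup, after adding the library to shared_preload_libraries and restarting
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- The ten most expensive statements by cumulative execution time
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;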

This should be possible to automate to some extent. Similarly, recommending indexes to drop should be possible using index usage statistics. Personally, I'd love to have a tool that makes this kind of suggestion automatically; it would be a great example of profile-guided optimization.
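For the drop side, something along these lines would surface never-used indexes (treat it as a starting point: statistics resets and read traffic on replicas can make an index look unused when it isn't):

-- Indexes never scanned since statistics were last reset, largest first
SELECT schemaname, relname, indexrelname,
       pg_relation_size(indexrelid) AS index_size_bytes
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;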