in short: we are facing a problem that full table scan is performed on remote Oracle DB instead of using the index.
Setup:
Postgres 12.3 in an enriched docker container with oracle base client, connecting to a remote Oracle DB in version 19c. The accessed table has 2M entries. The installed oracle_fdw is in version 2.30.
Question:
It seems that a select on a foreign table is not using the index of the foreign table. We want to select data in the foreign table depending on data in local tables. We tried different methods like joining or subselects, but the index on the foreign table is not used. We tried to generate immutable data with a function and this actually worked. For this single id the statement returns in 12ms and the explain plan shows that the index is used.
CREATE FUNCTION f_single()
RETURNS text LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT id FROM local_table';
SELECT r.* FROM remote_table r WHERE r.id IN (SELECT f_ single ());
"Insert on another_local_table (cost=10000.00..10010.00 rows=1 width=5981) (actual time=11.855..11.855 rows=0 loops=1)"
" -> Foreign Scan on remote_table r (cost=10000.00..10010.00 rows=1 width=5981) (actual time=11.095..11.793 rows=1 loops=1)"
" Output: r.id"
" Oracle query: SELECT /*fcb71071ce9258eac9244f42c3067c30*/ r3."ID"FROM " REMOTE_TABLE " r3 WHERE (r3."ID" = '2351923')"
" Oracle plan: SELECT STATEMENT"
" Oracle plan: TABLE ACCESS BY INDEX ROWID REMOTE_TABLE "
" Oracle plan: INDEX UNIQUE SCAN PK_REMOTE_TABLE (condition "R3"."ID"='2351923')"
"Planning Time: 5.128 ms"
"Execution Time: 11.998 ms"
But it acutally is not working, if we return multiple rows with the function as seen here:
CREATE FUNCTION f_multi()
RETURNS setof text LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT id FROM local_table';
SELECT r.* FROM remote_table r WHERE r.id IN (SELECT f_ multi ());
"Insert on another_local_table (cost=10022.26..20451397.84 rows=1000 width=5981) (actual time=264112.346..264112.346 rows=0 loops=1)"
" -> Hash Join (cost=10022.26..20451397.84 rows=1000 width=5981) (actual time=17482.841..264112.267 rows=1 loops=1)"
" Output: r.id "
" Inner Unique: true"
" Hash Cond: ((r.id)::text = (f_multi()))"
" -> Foreign Scan on remote_table r (cost=10000.00..20446000.00 rows=2043600 width=5981) (actual time=319.042..263161.299 rows=1981851 loops=1)"
" Output: r.id"
" Oracle query: SELECT /*ceeb047d793530c693667f5f6fada4d8*/ r3."ID FROM " REMOTE_TABLE" r3"
" Oracle plan: SELECT STATEMENT"
" Oracle plan: TABLE ACCESS FULL REMOTE_TABLE "
" -> Hash (cost=19.77..19.77 rows=200 width=32) (actual time=419.881..419.881 rows=1 loops=1)"
" Output: (f_multi())"
" Buckets: 1024 Batches: 1 Memory Usage: 9kB"
" -> HashAggregate (cost=17.77..19.77 rows=200 width=32) (actual time=419.878..419.878 rows=1 loops=1)"
" Output: (f_multi())"
" Group Key: f_multi()"
" -> ProjectSet (cost=0.00..5.27 rows=1000 width=32) (actual time=419.867..419.870 rows=1 loops=1)"
" Output: f_multi()"
" -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=419.744..419.745 rows=1 loops=1)"
"Planning Time: 4.804 ms"
"JIT:"
" Functions: 11"
" Options: Inlining true, Optimization true, Expressions true, Deforming true"
" Timing: Generation 1.896 ms, Inlining 3.663 ms, Optimization 82.373 ms, Emission 333.437 ms, Total 421.368 ms"
"Execution Time: 264114.529 ms"
In this case, the statements takes about 4 min to return even for one row. The explain plan shows that a Full Table Scan is performed.
Why is the index not used? What can we do to force the usage of the index?
If more information about the setup or the tables is needed, we will update the question.
We tracked down the needs to basically this (the WHERE statement seems not to be forced down to Oracle):
SELECT r.* FROM remote_table r
INNER JOIN local_table l
ON l.id = r.id;
Any help is appreciated. Thank you!
That is because the PostgreSQL optimizer transforms the query into a join, and oracle_fdw does not support parameterized paths, which could make the foreign scan on the inner side of a nested loop join efficient.
In the first case, the optimizer knows that the
IN
list must contain a single value and transforms the query to a simple equality condition.You might have more success with two queries: one that queries the result of
f_multi()
, and a second one with a constantIN
list constructed from the results of the first query.