Is there any way to find the source of table in vertica?

46 views Asked by At

We've got vertica server, many data sources (hadoop hive, postgres, some inner airflow dag pipelines), from which data come. There are several tables, which data sources are unknown. Responsible people have disappeared, no info in colfuence, jira, etc. Tables have fresh data, regularaly refreshing. Is there any way to trace process, some data to find linked server?

Actually dont know where to start. I am using pycharm to work with vertica.

1

There are 1 answers

0
marcothesane On

Try a query against both the query_requests and load_streams system tables:

If your table is poc.tgt and you suspect it's filled with an INSERT, go:

SELECT 
  session_id
, user
, REGEXP_SUBSTR(request, 'INSERT\s+INTO\s+poc.tgt', 1, 1, 'i') AS reqstart
FROM query_requests
WHERE regexp_ilike(request, 'INSERT\s+INTO\spoc.tgt')
LIMIT 64; -- the first 64 found should give you enough info

if your table's name is with_array and you suspect it's populated by a COPY, go:

SELECT 
  session_id
, user
, REGEXP_SUBSTR(request, 'COPY\swith_array') AS reqstart
FROM query_requests
WHERE regexp_ilike(request, 'COPY\swith_array')
LIMIT 64

Look at the other columns in query_requests and load_streams to see if you want to add other columns to your report - to find the populating process.

Also - join the user_sessions table with query_requests using session_id - and see if columns from user_sessions shed some more light on the matter.