AWS RDS PostgreSQL - copying from/to csv files on EC2 instance

682 views Asked by At

I've run into problem that I can't fix for a few days.

The thing is - I have following architecture:

  • Two EC2 instances which are nodes running Trifacta application (some kind of application for data scientists),
  • AWS RDS PostgreSQL instance.

Since the newest version this Trifacta application is using new schema in database which performs some database migrations at the start of application. During the startup some tables are copied into *.csv files and then copied back into tables from this *csv files. It's all okay when it's run on local database because superuser role in postgresql allows for such actions. When it comes to performing it on AWS RDS PostgreSQL instance it falls in following errors:

Error running query COPY (select "id" from workspaces) TO '/tmp/workspaces.csv' DELIMITER ',' CSV HEADER; error: must be superuser to COPY to or from a file
    at Connection.parseE (/opt/trifacta/migration-framework/node_modules/pg/lib/connection.js:614:13)
    at Connection.parseMessage (/opt/trifacta/migration-framework/node_modules/pg/lib/connection.js:413:19)
    at Socket.<anonymous> (/opt/trifacta/migration-framework/node_modules/pg/lib/connection.js:129:22)
    at Socket.emit (events.js:315:20)
    at addChunk (_stream_readable.js:295:12)
    at readableAddChunk (_stream_readable.js:271:9)
    at Socket.Readable.push (_stream_readable.js:212:10)
    at TCP.onStreamRead (internal/stream_base_commons.js:186:23) {
  length: 178,
  severity: 'ERROR',
  code: '42501',
  detail: undefined,
  hint: "Anyone can COPY to stdout or from stdin. psql's \\copy command also works for anyone.",
  position: undefined,
  internalPosition: undefined,
  internalQuery: undefined,
  where: undefined,
  schema: undefined,
  table: undefined,
  column: undefined,
  dataType: undefined,
  constraint: undefined,
  file: 'copy.c',
  line: '905',
  routine: 'DoCopy'
} 

It's just first one, there are a lot of them. I made a research on that and figured why it's happening. AWS is using rds_superuser role instead of superuser and privilleges of this role aren't sufficient for copying from/to local filesystem. From psql console it can be done with using \copy instead of copy but in my case it isn't any helpful because the way Trifacta does it is executing SQL queries from their *.js files and as far as I know it isn't possible to run \copy query from anywhere else than psql CLI.

With a suggestion of @IMSoP I am adding the code of Trifacta *.js file where the actions are performed:

ConnectUtils.copyQuery = function(query, connection, options = {}) {
  ensure.notNull(connection.base.DriverName, 'connection driver name');
  ensure.notNull(options.tableName, 'table name');
  const table = options.tableName;
  const filePath = ConnectUtils.getOutputFilePath(table, options);
  if (connection.base.DriverName === DATABASE_JS_TYPE[MYSQL]) {
    return `${query} INTO OUTFILE \'${filePath}\' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n'`;
  } else if (connection.base.DriverName === DATABASE_JS_TYPE[POSTGRES]) {
    return `COPY (${query}) TO '${filePath}' DELIMITER ',' CSV HEADER;`;
  } else if (connection.base.DriverName === DATABASE_JS_TYPE[SQLITE]) {
    return query;
  }
  return;
};

ConnectUtils.loadQuery = function(connection, options = {}) {
  ensure.notNull(connection.base.DriverName, 'connection driver name');
  ensure.notNull(connection.base.Database, 'connection database');
  ensure.notNull(options.tableName, 'table name');
  const table = options.tableName;
  const filePath = ConnectUtils.getOutputFilePath(table, options);
  if (connection.base.DriverName === DATABASE_JS_TYPE[MYSQL]) {
    return `LOAD DATA INFILE \'${filePath}\' INTO TABLE ${
      connection.base.Database
    }.${table} FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\n' IGNORE 1 ROWS;`;
  } else if (connection.base.DriverName === DATABASE_JS_TYPE[POSTGRES]) {
    return `COPY ${table} FROM '${filePath}' DELIMITER ',' CSV HEADER;`;
  }
  return;
};

${filePath} is path on EC2 instance and ${table} are the tables on AWS RDS EC2 instance. From your answers before editing my question I assume there is no way to workaround this as this script is trying to reach ${filePath} as a path on AWS RDS instance. Right?

Thanks for reading.

0

There are 0 answers