Stripping out newline characters from input JSON string when using COPY Command in Postgres

109 views Asked by At

I have a set of code that is importing a JSON file provided into a temp table as follows:

    DROP TABLE IF EXISTS tmp;
    CREATE TEMP  table tmp (c JSONB );
    
    sqltxt:='COPY tmp from '''||filepath||''' with (FORMAT TEXT, DELIMITER ''~'')';

Now in order to process this file sucessfully , I have to manually strip out the newline characters in the incoming file.

I would like to do this in my postgres code by using a function know as regexp_replace

I am struggling to do this

This is my attempt at this :

    DROP TABLE IF EXISTS tmp;
    CREATE TEMP  table tmp (c JSONB );

    -- Populate temp table with incoming JSON
    sqltxt:='COPY translate(tmp, E''\n,'', '''') from '''||filepath||''' with (FORMAT TEXT,     DELIMITER ''~'')';
    EXECUTE sqltxt;

this produceds the following code which when run produces the error:


COPY translate(tmp, E'\n,', '') from 'C:\ChrisDev\Readings\14.json' with (FORMAT TEXT, DELIMITER '~')

[42601] ERROR: syntax error at or near "E'\n,'"

How can I run this code to strip out all newline characters fr

4

There are 4 answers

0
ChrisAsi71 On BEST ANSWER

I have found a solution for this through running a docker container

So I had to do the following:

  1. Run up a Powershell command as Administrator.

  2. Change folder to where the JSON files are stored eg : C:\ChrisDev\Readings

  3. Run the following command to mount the folder into the Docker Container: docker run -it -v .:/files/ ubuntu

  4. Now run the command to strip out Newline characters on the file

    sed -i -z -e 's/\r\n/ /g' '/files/926371_20230120.json'

Now here what im doing is /\r\n/ is replacing all occurances of Carriage Return (\r) and Line Feed (\n) with an empty space.

Once this is run, then i can run my JSON file through my program with not issues.

3
Omar Faruque On

-- Create a temporary table DROP TABLE IF EXISTS tmp; CREATE TEMP TABLE tmp (c JSONB);

-- Populate the temporary table with incoming JSON, replacing newline characters sqltxt := 'COPY translate(tmp, c) FROM ''' || filepath || ''' WITH (FORMAT TEXT)'; EXECUTE sqltxt;

-- Replace newline characters in the JSONB column using regexp_replace UPDATE tmp SET c = regexp_replace(c::TEXT, E'\n', '', 'g')::JSONB;

-- Now, your temporary table (tmp) contains JSON data with newline characters removed

2
Laurenz Albe On

You cannot do that in PostgreSQL. The only way to do something like that would be to call a program that modifies the file before you feed it to COPY:

COPY tmp FROM PROGRAM $$sed -z -e 's/\n/ /g' '/path/to/file'$$;

That example is for Linux. On other operating systems, you may need to call different programs.

0
Omar Faruque On

The error message indicates that the sed command failed to execute properly. The exit code 1 suggests that there might be an issue with the command syntax or the input file itself.

Here are some potential causes for this error:

  1. Invalid Path:

    Ensure that the path to the file C:\ChrisDev\Readings\14.json is correct and accessible by the sed program.

  2. Syntax Error:

    Double-check the syntax of the sed command for any typos or missing characters. The syntax you provided seems correct, but it's always good to double-check.

  3. Invalid Input File:

    Verify that the file 14.json is a valid JSON file and doesn't contain any unexpected characters or formatting that might cause issues with the sed command.