read csv file where value contain comma in AWS athena

Question

read csv file where value contain comma in AWS athena

6.3k views Asked by Sumon At 06 September 2017 at 15:33

Hi Currently I have created a table schema in AWS Athena as follow

CREATE EXTERNAL TABLE IF NOT EXISTS axlargetable.AEGIntJnlActivityLogStaging (
  `clientcomputername` string,
  `intjnltblrecid` bigint,
  `processingstate` string,
  `sessionid` int,
  `sessionlogindatetime` string,
  `sessionlogindatetimetzid` bigint,
  `recidoriginal` bigint,
  `modifieddatetime` string,
  `modifiedby` string,
  `createddatetime` string,
  `createdby` string,
  `dataareaid` string,
  `recversion` int,
  `partition` bigint,
  `recid` bigint 
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   'separatorChar' = ',',
   'quoteChar' = '\"',
   'escapeChar' = '\\'
   ) 
LOCATION 's3://ax-large-table/AEGIntJnlActivityLogStaging/'
TBLPROPERTIES ('has_encrypted_data'='false');

But one of the filed (processingstate) value contain comma as "Europe, Middle East, & Africa" which displace columns order.

So what would be the best way to read this file. Thanks

Original Q&A

There are 3 answers

**Cherry** · Answer 1 · 2018-01-16T11:37:20+00:00

As workaround - look at aws glue project.

Instead of creating table via CREATE EXTERNAL TABLE:

invoke get-table for your table
Then make json for create-table
Merge the following StorageDescriptor part:

{ "StorageDescriptor": { "SerdeInfo": { "SerializationLibrary": "org.apache.hadoop.hive.serde2.OpenCSVSerde" ... } ... }
perform create via aws cli. You will get this table in aws glue and athena be able to select correct columns.

Notes

If your table already defined OpenCSVSerde - they may be fixed this issue and you can simple recreate this table.
I do not have much knoledge about athena, but in aws glue you can delete or create table without any data loss
Before adding this table via create-table you have to check first how glue or/and athena hadles table duplicates

**Lior Alon** · Answer 2 · 2018-11-07T09:50:10+00:00

Lior Alon On 07 November 2018 at 09:50

When I removed this part

WITH SERDEPROPERTIES (
'separatorChar' = ',',
'quoteChar' = '\"',
'escapeChar' = '\\'
)

I was able to read quoted text with commas in it

**sydadder** · Answer 3 · 2021-04-28T23:00:14+00:00

This is a common messy CSV file situation where certain values contain commas. The solution in Athena for this is to use SERDEPROPERTIES as described in the AWS doc https://docs.aws.amazon.com/athena/latest/ug/csv-serde.html [the url may change so just search for 'OpenCSVSerDe for Processing']

Following is a basic create table example provided. Based on your data you would have to ensure that the data type is specified correctly (eg string)

CREATE EXTERNAL TABLE test1 ( f1 string, s2 string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ("separatorChar" = ",", "escapeChar" = "\") LOCATION 's3://user-test-region/dataset/test1/'

TechQA.

read csv file where value contain comma in AWS athena

There are 3 answers

Related Questions in AMAZON-WEB-SERVICES

Related Questions in AMAZON-S3

Related Questions in AMAZON-ATHENA

Popular Questions

Trending Questions