regex for access log in hive serde with newline

Question

regex for access log in hive serde with newline

519 views Asked by Kuem Hyang Park At 04 February 2020 at 01:33

With aws athena services, I try to import csv file including new line data

Importing data uses hive serde format.

If data is like this, (each data is enclosed in double quotes. "")

"DataA"|"DataB"|"DataC"
"Data1"|"Data2
with new line"|"Data3"
"Data가"|"Data2나"|"Data나"

then how to write regular expressions to below table DDL?

CREATE EXTERNAL TABLE ssdm_schema.ABCTable_regex (  
  Data_A VARCHAR(100)  
, Data_B VARCHAR(100)  
, Data_C VARCHAR(100)  
) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'  
WITH SERDEPROPERTIES (  
"input.regex" = '?????????'  
)

I'm asking to this question referring to the following answer.

How to handle embed line breaks in AWS Athena

Thank you

Original Q&A

There are 1 answers

**Freiheit** · Answer 1 · 2020-02-04T02:09:45+00:00

Solved it. https://regex101.com/r/bYF1Zm/3

"([\w\s\n\r\X]+?)"\|"([\X\w\s\n\r]+?)"\|"([\X\w\s\n\r]+?)" with the global and unicode flags set.

There were three things making this tricky:

The line break in the middle of the data on line 2
The white space in the middle of the data on line 2
The unicode

This regex can probably be more succinct because the matching pattern repeats.

TechQA.

regex for access log in hive serde with newline

There are 1 answers

Related Questions in REGEX

Related Questions in HIVE

Related Questions in NEWLINE

Related Questions in HIVE-SERDE

Related Questions in REGEXSERDE

Popular Questions

Popular Tags

Trending Questions