I need to validate the file with respect to the data types. I have a file with below data,
data.csv
Col1 | Col2 | Col3 | Col4
100 | XYZ | 200 | 2020-07-11
200 | XYZ | 500 | 2020-07-10
300 | XYZ | 700 | 2020-07-09
I have another file having the configurations,
Config_file.txt
Columns = Col1|Col2|Col3|Col4
Data_type = numeric|string|numeric|date
Delimiter = |
I have to compare the configuration file and data file and return a result.
For example:
In configuration file data_type of Col1 is numeric. In case if i get any string value in Col1 in data file, the script should return Datatype Mismatch Found in Col1
. I have tried with awk
, if its one line item its easy to get it done by defining the position of the columns. But am not sure how to loop entire file column by column ad check the data.
I have also tried providing the patterns and achieve this. But am unable to validate complete file. Any suggestion would be helpful.
awk -F "|" '$1 ~ "^[+-]?[0-9]+([.][0-9]+)?$" && $4 ~ "^[+-]?[0-9]+([.][0-9]+)?$" && length($5) == 10 {print}' data.csv
The goal is to compare the data file (data.csv) and Data_Type in config file(Config_file.txt) for each column and check if any column is having datatype mismatch.
For example, consider below data
Col1 | Col2 | Col3 | Col4
100 | XYZ | 200 | 2020-07-11
ABC | XYZ | 500 | 2020-07-10 -- This is incorrect data because Col1 is having string value `ABC`, in config file, the data type is numeric
300 | XYZ | 700 | 2020-07-09
300 | XYZ | 700 | 2020-07-09
300 | XYZ | XYZ | 2020-07-09 -- Incorrect Data
300 | 300 | 700 | 2020-07-09
300 | XYZ | 700 | XYX -- Incorrect Data
The data type provided in config table is as below,
Columns = Col1|Col2|Col3|Col4
Data_type = numeric|string|numeric|date
The script should echo the result as Data Type Mismatch Found in Col1
Here is a skeleton solution in GNU awk. In lack of sample output I improvised:
Sample output: