I am trying to import a COVID-19 dataset into MySQL 8.0 manually via MySQL Workbench, using CREATE TABLE followed by LOAD DATA INFILE.
In columns such as hospital patients per million, the values vary in the number of digits (up to 8) and the number of decimal places (up to 3).
Other columns, such as new cases, contain negative figures (when the number of new cases falls compared to previous records).
When I run the CREATE TABLE query below, the table is created with all columns. But when I then import the CSV file with LOAD DATA INFILE, the columns with data type "decimal" keep giving me errors such as Error 1366: Incorrect decimal value at row 1.
So I changed them to "float", which gave me Error 1265: Data truncated at row 1. I have now changed them to "double", as another Stack Overflow user suggested, and I still get Error 1265: Data truncated at row 1.
Here is the CREATE TABLE query:
CREATE TABLE covid_deaths (
  iso_code text,
  continent text,
  location text,
  date date,
  population bigint,
  total_cases int,
  new_cases int,
  new_cases_smoothed double,
  total_deaths int,
  new_deaths int,
  new_deaths_smoothed double,
  total_cases_per_million double,
  new_cases_per_million double,
  new_cases_smoothed_per_million double,
  total_deaths_per_million double,
  new_deaths_per_million double,
  new_deaths_smoothed_per_million double,
  reproduction_rate double,
  icu_patients int,
  icu_patients_per_million double,
  hosp_patients int,
  hosp_patients_per_million double,
  weekly_icu_admissions double,
  weekly_icu_admissions_per_million double,
  weekly_hosp_admissions double,
  weekly_hosp_admissions_per_million double
);
Here is the LOAD DATA INFILE query:
LOAD DATA INFILE 'CovidDeaths.csv'
INTO TABLE covid_deaths
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
First 6 columns and first 3 rows (including the header) of the CSV (the full file has 26 columns and 85,171 rows):
iso_code, continent, location, date, population, total_cases
AFG, Asia, Afghanistan, 2020-02-24, 38928341, 1
AFG, Asia, Afghanistan, 2020-02-25, 38928341, 1
Additional info:
I tried importing all the "double" columns above with the data type "varchar" instead, and it worked. However, when I ORDER BY one of those columns DESC, it returns e.g. 99.4 before 887.2, because the strings are compared character by character ('9' sorts after '8') rather than as numbers (887.2 is bigger than 99.4).
So I need a numeric data type rather than varchar in order to analyze the data (a cast in ORDER BY, sketched below, works around the ordering but not the underlying problem). But I am mentioning it because, with "varchar", all 85,171 rows imported perfectly and I was able to query the data as well.
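For reference, this is the kind of workaround I mean (a sketch only; hosp_patients_per_million is just an example column, and DECIMAL(12,3) is an assumed precision based on the digit counts above):

-- Plain ORDER BY compares the varchar values as strings: '99.4' sorts above '887.2'
SELECT hosp_patients_per_million
FROM covid_deaths
ORDER BY hosp_patients_per_million DESC;

-- Casting forces a numeric comparison, so 887.2 sorts above 99.4
SELECT hosp_patients_per_million
FROM covid_deaths
ORDER BY CAST(hosp_patients_per_million AS DECIMAL(12,3)) DESC;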
My CSV file is already in the working directory.
Using the Table Data Import Wizard finishes with 0 rows affected (i.e. it creates the table with the column titles but imports no data). This is why I am using the CREATE TABLE then LOAD DATA INFILE route instead.
My data had a header row (column titles), so I thought this could also be a problem. I included IGNORE 1 ROWS but it didn't help. I also removed the header row from the CSV file, leaving only the data. This didn't help either.
secure-file-priv is turned off and I have privileges to edit my.ini if required.
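(By "turned off" I mean the restriction is disabled with an empty secure_file_priv value in my.ini under the [mysqld] section, roughly as below; the server needs a restart after changing it:)

[mysqld]
secure_file_priv=""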
I am new to Stack Overflow and I have done some research, but I apologize if my question is still not up to standard. Please let me know if any additional information is required. Thank you and have a pleasant day.
My first observation is that the published table does not contain all the columns in the file (at least in today's version), and it needs to. Second, the file providers don't tell you which datatypes to use, which is a bit naughty. I would download the xlsx file and examine it to determine appropriate datatypes.
If you don't want all the columns in the data, you can discard them by pushing them to user-defined variables. This is also a useful way of finding and/or transforming columns as they are read; see the Input Preprocessing section of https://dev.mysql.com/doc/refman/8.0/en/load-data.html.
The following loaded the columns I asked for.
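A minimal sketch of the pattern against the covid_deaths table from the question (assuming the first 26 fields of the file appear in the same order as the table's columns; fields you don't want go to a throwaway user variable instead of a column name):

LOAD DATA INFILE 'CovidDeaths.csv'
INTO TABLE covid_deaths
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'   -- use '\r\n' if the file has Windows line endings
IGNORE 1 LINES
(iso_code, continent, location, date, population,
 total_cases, new_cases, new_cases_smoothed,
 total_deaths, new_deaths, new_deaths_smoothed,
 total_cases_per_million, new_cases_per_million, new_cases_smoothed_per_million,
 total_deaths_per_million, new_deaths_per_million, new_deaths_smoothed_per_million,
 reproduction_rate,
 icu_patients, icu_patients_per_million,
 hosp_patients, hosp_patients_per_million,
 weekly_icu_admissions, weekly_icu_admissions_per_million,
 weekly_hosp_admissions, weekly_hosp_admissions_per_million);
-- If your copy of the file has extra trailing fields, append one throwaway
-- user variable (e.g. @dummy) per unwanted field to the column list above;
-- anything read into a variable and not used in a SET clause is discarded.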
EDIT: example with empty or null values.
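A sketch of the input-preprocessing approach from the manual page linked above, assuming that blank cells in the numeric fields are what trigger errors 1366/1265: read those fields into user variables and convert empty strings to NULL in the SET clause.

LOAD DATA INFILE 'CovidDeaths.csv'
INTO TABLE covid_deaths
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'   -- again, '\r\n' if the file is CRLF-terminated
IGNORE 1 LINES
(iso_code, continent, location, date, @population,
 @total_cases, @new_cases, @new_cases_smoothed,
 @total_deaths, @new_deaths, @new_deaths_smoothed,
 @total_cases_per_million, @new_cases_per_million, @new_cases_smoothed_per_million,
 @total_deaths_per_million, @new_deaths_per_million, @new_deaths_smoothed_per_million,
 @reproduction_rate,
 @icu_patients, @icu_patients_per_million,
 @hosp_patients, @hosp_patients_per_million,
 @weekly_icu_admissions, @weekly_icu_admissions_per_million,
 @weekly_hosp_admissions, @weekly_hosp_admissions_per_million)
SET population                         = NULLIF(@population, ''),
    total_cases                        = NULLIF(@total_cases, ''),
    new_cases                          = NULLIF(@new_cases, ''),
    new_cases_smoothed                 = NULLIF(@new_cases_smoothed, ''),
    total_deaths                       = NULLIF(@total_deaths, ''),
    new_deaths                         = NULLIF(@new_deaths, ''),
    new_deaths_smoothed                = NULLIF(@new_deaths_smoothed, ''),
    total_cases_per_million            = NULLIF(@total_cases_per_million, ''),
    new_cases_per_million              = NULLIF(@new_cases_per_million, ''),
    new_cases_smoothed_per_million     = NULLIF(@new_cases_smoothed_per_million, ''),
    total_deaths_per_million           = NULLIF(@total_deaths_per_million, ''),
    new_deaths_per_million             = NULLIF(@new_deaths_per_million, ''),
    new_deaths_smoothed_per_million    = NULLIF(@new_deaths_smoothed_per_million, ''),
    reproduction_rate                  = NULLIF(@reproduction_rate, ''),
    icu_patients                       = NULLIF(@icu_patients, ''),
    icu_patients_per_million           = NULLIF(@icu_patients_per_million, ''),
    hosp_patients                      = NULLIF(@hosp_patients, ''),
    hosp_patients_per_million          = NULLIF(@hosp_patients_per_million, ''),
    weekly_icu_admissions              = NULLIF(@weekly_icu_admissions, ''),
    weekly_icu_admissions_per_million  = NULLIF(@weekly_icu_admissions_per_million, ''),
    weekly_hosp_admissions             = NULLIF(@weekly_hosp_admissions, ''),
    weekly_hosp_admissions_per_million = NULLIF(@weekly_hosp_admissions_per_million, '');

NULLIF(x, '') returns NULL when the field is an empty string, so blank cells load as NULL instead of failing the numeric conversion.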