I got this "incomplete final line found by readTableHeader" warning message when using read.delim() to read in a tab-delimited text file. There are Traditional Chinese characters in the header and content, so I am already using an alternative encoding, like this:
kg = read.delim("KG_EDB_20150505.csv",fileEncoding="UTF-16LE")
Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote, :
incomplete final line found by readTableHeader on 'KG_EDB_20150505.csv'
I have read other posts with similar issues, e.g.:
'Incomplete final line' warning when trying to read a .csv file into R
In read.table(): incomplete final line found by readTableHeader
But unfortunately the suggested solutions in these posts cannot solve the problem.
A summary of what I tried:
1. Pressing ENTER at the last line of the text file: same error.
2. Trimming the text file down to the header + 1 single line of data, then making sure there is a new line (ENTER) between the header line and the content: same error.
3. Trimming the text file until only the header is left, then copying and pasting the header onto the next line to pretend it is a line of data, and adding a new line (ENTER) after the fake line of data: WORKS! The Chinese is all garbage, but I do not need it anyway.
4. Removing the trailing new line (ENTER) in #3: same error, but it can read 1 line of fake data into the data.frame.
5. Opening in Excel directly: works, but not the workflow I want.
So what gives?
Is there a way I can read in such file?
or
Is there a way to massage the file (preferably in R) and then read it in?
The file is here:
https://dl.dropboxusercontent.com/u/5860015/KG_EDB_20150505.csv
It was from a government webpage here:
http://www1.map.gov.hk/gih3/view/index.jsp
(Map Tools > Data Download > Kindergarten-cum-child Care Centres)
Many thanks in advance!
Update:
By a stroke of luck, I isolated an offending character inside the text file, namely this Chinese character "稚". It may not be the only one, but if I add it to the file in #3, I get the same error again. I do not know what is special about this character, and I do not need any of the Chinese info in the text file anyway.
So now there are more questions:
- Is there a way to skip reading this offending character?
or
- Is there a way in R to replace this offending character in the file, before reading in the text file?
It's full of Chinese characters (every other field in fact).
First line:
"ENGLISH CATEGORY" "中文類別" "ENGLISH NAME" "中文名稱" "ENGLISH ADDRESS" "中文地址" "LONGITUDE" "經度" "LATITUDE" "緯度" "EASTING" "坐標東" "NORTHING" "坐標北" "STUDENTS GENDER" "就讀學生性別" "SESSION" "學校授課時間" "DISTRICT" "分區" "FINANCE TYPE" "資助種類" "SCHOOL LEVEL" "學校類型" "OPENING HOURS" "開放時間" "TELEPHONE" "聯絡電話" "FAX NUMBER" "傳真號碼" "EMAIL ADDRESS" "電郵地址" "WEBSITE" "網頁" "RELIGION" "宗教"
And my editor thinks it is UTF-16 and that it is "Little Endian".
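One way to double-check the editor's guess from within R is to look at the first two bytes of the file: a UTF-16 little-endian file that carries a byte order mark starts with FF FE. This is only a rough sketch; guess_utf16_bom is a hypothetical helper, and the temp file below is a stand-in for the real download, which may or may not have a BOM.

```r
# Hypothetical helper: report the UTF-16 flavour suggested by a byte order mark.
guess_utf16_bom <- function(path) {
  bom <- readBin(path, what = "raw", n = 2L)
  if (identical(bom, as.raw(c(0xff, 0xfe)))) {
    "UTF-16LE"      # a UTF-32LE BOM begins the same way, so this is a hint only
  } else if (identical(bom, as.raw(c(0xfe, 0xff)))) {
    "UTF-16BE"
  } else {
    NA_character_   # no UTF-16 BOM; the encoding has to be guessed another way
  }
}

# Stand-in file (assumption): a BOM, then "A" and a tab encoded as UTF-16LE.
demo <- tempfile(fileext = ".csv")
writeBin(as.raw(c(0xff, 0xfe, 0x41, 0x00, 0x09, 0x00)), demo)
guess_utf16_bom(demo)
```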
Unless you are set up with the right fonts and understand the ins and outs of encodings, it is much easier to use an external editor, especially since you say you do not want the info that is in the Chinese fields. I succeeded with the TextWrangler editor from Bare Bones Software. It's the free version of their more fully featured editor, but it can remove non-ASCII characters and save the result as a UTF-8 encoded file.
The fields that had Chinese in the header are all now blank. Despite the extension, it's NOT a csv file: there are no commas. If I were doing it again for myself I'd use
stringsAsFactors = FALSE
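The same clean-up (drop everything that is not ASCII, then read what is left as tab-delimited text) can also be sketched in R instead of an external editor. This is a minimal sketch, not a tested recipe for the actual download: read_ascii_only is a hypothetical helper, the two sample lines are made up to mimic the file's layout, and it assumes the file really is UTF-16LE.

```r
# Hypothetical helper: decode a UTF-16LE file, drop every non-ASCII character,
# and parse the remainder as tab-delimited text.
read_ascii_only <- function(path, from = "UTF-16LE") {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  utf8  <- iconv(list(bytes), from = from, to = "UTF-8")
  # sub = "" deletes anything that has no ASCII equivalent (including a BOM)
  ascii <- iconv(utf8, from = "UTF-8", to = "ASCII", sub = "")
  read.delim(text = ascii, stringsAsFactors = FALSE)
}

# Made-up stand-in for the real file: a header and one row, with Chinese
# text in the second field, written out as UTF-16LE with a leading BOM.
sample_lines <- c(
  "\"ENGLISH NAME\"\t\"\u4e2d\u6587\u540d\u7a31\"\tLONGITUDE",
  "\"ABC KINDERGARTEN\"\t\"\u5e7c\u7a1a\u5712\"\t114.17"
)
payload <- iconv(paste0(sample_lines, "\n", collapse = ""),
                 from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
demo <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xff, 0xfe)), payload), demo)

kg <- read_ascii_only(demo)
str(kg)
```

As with the editor approach, the columns that held only Chinese text come back empty, so they can simply be dropped afterwards.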
It's also possible to input the file with the correct encoding. This works on the original file with no editing at all: