How do I import a Unicode CSV file?

I have a Unicode CSV file:

LabelName,Label1,Label2,SpeciesLabel,Group,Subgroup,Species
التسمية 1,Group 1,Subgroup 1,Species 1,1,1,1
التسمية 2,Group 1,Subgroup 1,Species 1,1,1,1
التسمية 3,Group 1,Subgroup 1,Species 1,1,1,1

I want to read it into R, so I tried this command:

Data = read.csv("Data.csv", encoding="UTF-8", fileEncoding = "UTF-8")

But I got this error:

Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
  empty beginning of file
In addition: Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  invalid input found on input connection 'Data.csv'
2: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  line 1 appears to contain embedded nulls
3: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  incomplete final line found by readTableHeader on 'Data.csv'
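The "embedded nulls" and "invalid input" warnings most likely mean the file is not UTF-8 at all but UTF-16, the format Excel and other Windows programs often use when saving "Unicode" text, where every ASCII character is stored with an extra 0x00 byte. A minimal sketch, assuming R >= 3.0.0 running in a UTF-8 locale: it writes a hypothetical UTF-16LE sample file to a temporary path, looks at the first bytes with readBin() to spot a byte order mark (BOM), and then reads the file with fileEncoding = "UTF-16LE". The helper guess_bom() and the sample file are illustrative assumptions, not part of the original post.

```r
# Sketch: detect a byte order mark (BOM) and read a UTF-16LE CSV.
# Assumes R >= 3.0.0 in a UTF-8 locale; the sample file is hypothetical.

label <- "\u0627\u0644\u062a\u0633\u0645\u064a\u0629"  # the Arabic word in LabelName
lines <- c("LabelName,Label1,Label2,SpeciesLabel,Group,Subgroup,Species",
           paste0(label, " ", 1:3, ",Group 1,Subgroup 1,Species 1,1,1,1"))
csv_text <- paste0(paste(lines, collapse = "\n"), "\n")

# Encode as UTF-16LE and prepend the BOM bytes 0xFF 0xFE, as many Windows
# programs do; each ASCII character now carries a 0x00 byte ("embedded nulls").
path <- tempfile(fileext = ".csv")
utf16 <- iconv(csv_text, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xFF, 0xFE)), utf16), path)

# Hypothetical helper: guess the encoding from the first bytes of the file.
guess_bom <- function(file) {
  b <- readBin(file, what = "raw", n = 3)
  if (length(b) >= 2 && b[1] == as.raw(0xFF) && b[2] == as.raw(0xFE)) {
    return("UTF-16LE")
  }
  if (length(b) >= 3 && all(b == as.raw(c(0xEF, 0xBB, 0xBF)))) {
    return("UTF-8-BOM")
  }
  "UTF-8"  # no BOM recognised; fall back to plain UTF-8
}

enc <- guess_bom(path)
# For "UTF-16LE" connections R removes a leading BOM itself (see ?file).
Data <- read.csv(path, fileEncoding = enc, stringsAsFactors = FALSE)
str(Data)
```

If the bytes start with 0xFF 0xFE, passing fileEncoding = "UTF-16LE" (or "UCS-2LE") instead of "UTF-8" is usually enough; the encoding = "UTF-8" argument only marks strings and does not re-encode the file.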

How can I read a Unicode CSV file (containing Arabic letters) in R?

Thanks!

Answer by Artem

You can read the file with readLines (using warn = FALSE), then pass the resulting lines to read.csv through its text argument:

arabic <- readLines("arabic.csv", warn = FALSE, encoding = "UTF-8")
Data = read.csv(text = arabic)
str(Data)

Output:

'data.frame':   3 obs. of  7 variables:
 $ X.U.FEFF.LabelName: Factor w/ 3 levels "التسمية 1","التسمية 2",..: 1 2 3
 $ Label1            : Factor w/ 1 level "Group 1": 1 1 1
 $ Label2            : Factor w/ 1 level "Subgroup 1": 1 1 1
 $ SpeciesLabel      : Factor w/ 1 level "Species 1": 1 1 1
 $ Group             : int  1 1 1
 $ Subgroup          : int  1 1 1
 $ Species           : int  1 1 1
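The X.U.FEFF. prefix on the first column name comes from the byte order mark (U+FEFF) at the start of the file: readLines() keeps that character, and read.csv()'s name checking then mangles it into X.U.FEFF.. A minimal sketch of two ways to drop it, assuming R >= 3.0.0 (which accepts fileEncoding = "UTF-8-BOM" for reading) in a UTF-8 locale; the sample file written to a temporary path is hypothetical.

```r
# Sketch: read a UTF-8 CSV that starts with a byte order mark (BOM).
# Assumes R >= 3.0.0 in a UTF-8 locale; the sample file is hypothetical.

label <- "\u0627\u0644\u062a\u0633\u0645\u064a\u0629"  # the Arabic word in LabelName
lines <- c("LabelName,Label1,Label2,SpeciesLabel,Group,Subgroup,Species",
           paste0(label, " ", 1:3, ",Group 1,Subgroup 1,Species 1,1,1,1"))
csv_text <- enc2utf8(paste0(paste(lines, collapse = "\n"), "\n"))

# Write the UTF-8 BOM (0xEF 0xBB 0xBF) followed by the UTF-8 bytes.
path <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw(csv_text)), path)

# Option 1: let the connection remove the BOM while reading.
Data <- read.csv(path, fileEncoding = "UTF-8-BOM", stringsAsFactors = FALSE)

# Option 2: keep the readLines() approach and strip the BOM by hand.
raw_lines <- readLines(path, warn = FALSE, encoding = "UTF-8")
raw_lines[1] <- sub("^\ufeff", "", raw_lines[1])
Data2 <- read.csv(text = raw_lines, stringsAsFactors = FALSE)

# In both cases the first column name no longer carries the X.U.FEFF. prefix.
names(Data)[1]
names(Data2)[1]
```

Option 1 is usually the simpler choice; Option 2 fits the readLines() approach in the answer above.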