Goal:
Upload / post a CSV file with UTF-8 characters to an MVC action, read the data, and store it in a database table.
Problem:
Only the plain text characters make it through. UTF-8 "special" characters like á do not come through correctly; both in code and in the database they render as this character => �.
More:
I'm convinced that this isn't a problem with my C# code, although I've included the important parts below.
I thought the problem was that the uploaded file is sent as plain text (the "text/plain" MIME type), but I was able to change that by changing the file extension to .html, and it made no difference.
Summary:
How do you get a form with an enctype attribute set to "multipart/form-data" to correctly interpret UTF-8 characters in a posted file?
Research:
From my research this appears to be a common problem without a common and clear solution.
I've also found more solutions for Java and PHP than for .NET.
The csvFile variable is of type HttpPostedFileBase.
This is the MVC action signature:
[HttpPost]
public ActionResult LoadFromCsv(HttpPostedFileBase csvFile)
Things I've tried:
1)
using (Stream inputStream = csvFile.InputStream)
{
    // ReadFully is a helper (not shown) that copies the whole stream into a byte array.
    byte[] bytes = ReadFully(inputStream);
    string bytesConverted = new UTF8Encoding().GetString(bytes);
}
2)
using (Stream inputStream = csvFile.InputStream)
{
    using (StreamReader readStream = new StreamReader(inputStream, Encoding.UTF8, true))
    {
        while (!readStream.EndOfStream)
        {
            string csvLine = readStream.ReadLine();
            // string csvLine = new UTF8Encoding().GetString(new UTF8Encoding().GetBytes(readStream.ReadLine())); // stupid... this can not be the way!
        }
    }
}
3)
<form method="post" enctype="multipart/form-data" accept-charset="UTF-8">
4)
<input type="file" id="csvFile" name="csvFile" accept="UTF-8" />
<input type="file" id="csvFile" name="csvFile" accept="text/html" />
5)
When the file has a .txt extension, the ContentType property of the HttpPostedFileBase is "text/plain"
When I change the file extension from .txt to .csv the ContentType property of the HttpPostedFileBase is "application/vnd.ms-excel"
When I change the file extension to .html, the ContentType property of the HttpPostedFileBase is "text/html" - I thought this was going to be a winner, but it wasn't.
In my soul I have to believe there is an easy solution to this problem. It surprises me that I haven't been able to figure this one out on my own; uploading UTF-8 characters in a file is a common task! Why am I failing here?!?!
Perhaps I have to adjust mime types in IIS for the website?
Perhaps I need different DOCTYPE / html tag / meta tags?
@Gabe -
Here is what my post looks like in Fiddler. This is really interesting because the � is plain as day, right there in the posted value.
POST http://localhost/AwesomeGeography/GeoBytesCities/LoadFromCsv?adsf HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Referer: http://localhost/AwesomeGeography/GeoBytesCities/LoadFromCsv?adsf
Content-Type: multipart/form-data; boundary=---------------------------199122566726299
Content-Length: 354

-----------------------------199122566726299
Content-Disposition: form-data; name="csvFile"; filename="cities_test.html"
Content-Type: text/html

"CityId","CountryID","RegionID","City","Latitude","Longitude","TimeZone","DmaId","Code"
3344,10,1063,"Luj�n de Cuyo","-33.05","-68.867","-03:00",0,"LDCU"
-----------------------------199122566726299--
Based on the information given, I would guess that the problem is with the file encoding itself - not with your code.
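A quick way to see what a wrongly encoded file does to a UTF-8 decoder is to compare the bytes. The console sketch below is only an illustration, not part of any upload code. It assumes the file was written in a single-byte encoding such as Windows-1252, where á is the single byte 0xE1, and it uses the city name from your post as sample data.

```csharp
using System;
using System.Text;

class EncodingMismatchDemo
{
    static void Main()
    {
        // "Luján" as a single-byte (Windows-1252) file would store it: á is the byte 0xE1.
        byte[] singleByteFile = { 0x4C, 0x75, 0x6A, 0xE1, 0x6E };

        // The same text as a UTF-8 file would store it: á becomes the two bytes 0xC3 0xA1.
        byte[] utf8File = Encoding.UTF8.GetBytes("Luj\u00E1n");
        Console.WriteLine(BitConverter.ToString(utf8File)); // prints 4C-75-6A-C3-A1-6E

        // 0xE1 followed by 'n' is not a valid UTF-8 sequence, so the decoder
        // substitutes the replacement character U+FFFD (shown as �).
        string wrong = Encoding.UTF8.GetString(singleByteFile);
        Console.WriteLine(wrong.IndexOf('\uFFFD') >= 0); // prints True

        // Genuine UTF-8 bytes round-trip without loss.
        string right = Encoding.UTF8.GetString(utf8File);
        Console.WriteLine(right == "Luj\u00E1n"); // prints True
    }
}
```

Once the replacement character has been substituted, the original byte is gone, so no amount of re-encoding the resulting string on the server can bring the á back.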
I ran a simple test to demonstrate this:
I exported a simple CSV file containing special characters from Excel.
Then, I uploaded it through the following form and action method.
Form
Action method
I had the same problem as you in this case - the special characters were replaced with �.
I opened the file in Notepad and the special characters were displayed correctly there, so it seemed that it couldn't be a file problem, but when I opened the "Save As" dialog, the selected encoding was "ANSI". I switched it to UTF-8 and saved it, ran it through the uploader, and it all worked fine.
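If you can't control how the files are saved, the server can detect the mismatch instead of silently producing �. The following is a minimal sketch under some assumptions: DecodeUtf8OrAnsi is a hypothetical helper name, "ANSI" is taken to mean code page 1252 (it depends on the Windows locale of whoever saved the file), and the code targets the .NET Framework, where Encoding.GetEncoding(1252) is available out of the box. On .NET Core or later you would first have to register CodePagesEncodingProvider.Instance with Encoding.RegisterProvider.

```csharp
using System;
using System.Text;

static class CsvDecoding
{
    // Hypothetical helper: decode as UTF-8 when the bytes are valid UTF-8,
    // otherwise assume the file was saved as "ANSI" (code page 1252).
    public static string DecodeUtf8OrAnsi(byte[] bytes)
    {
        // throwOnInvalidBytes: true makes the decoder throw instead of emitting U+FFFD.
        var strictUtf8 = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);
        try
        {
            // GetString does not strip a byte order mark, so remove it if present.
            return strictUtf8.GetString(bytes).TrimStart('\uFEFF');
        }
        catch (DecoderFallbackException)
        {
            // Assumption: the only other encoding we expect is Windows-1252.
            return Encoding.GetEncoding(1252).GetString(bytes);
        }
    }

    static void Main()
    {
        // "Luján" saved as ANSI: á is the single byte 0xE1.
        byte[] ansiBytes = { 0x4C, 0x75, 0x6A, 0xE1, 0x6E };
        Console.WriteLine(DecodeUtf8OrAnsi(ansiBytes) == "Luj\u00E1n");
    }
}
```

In the action you would copy csvFile.InputStream into a byte array first and pass that array to the helper. The fallback is a guess by nature: bytes that fail UTF-8 validation could come from any legacy code page, so asking users to save as UTF-8 remains the more reliable fix.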