I am using iTextSharp to read a PDF file. I can read the English text, but for Chinese text I get question marks. How can I read Chinese characters using iTextSharp?
coverNoteFilePath = @"D:\Temp\cc8a12e6-399a-4146-81ac-e49eb67e7e1b\CoverNote.pdf";
try
{
    PdfReader reader = new PdfReader(coverNoteFilePath);
    for (int page = 1; page <= reader.NumberOfPages; page++)
    {
        ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();
        String s = PdfTextExtractor.GetTextFromPage(reader, page, its);
        s = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s)));
        coverNoteContent = coverNoteContent + s;
    }
    reader.Close();
    Response.Write(coverNoteContent);
}
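One possible source of the question marks, assuming the extraction itself returns the correct text: GetTextFromPage already returns a .NET string (UTF-16), so the Encoding.Default round trip in the code above cannot add information and may lose it. Any character the chosen encoding cannot represent is replaced with '?' when the string is converted to bytes. The minimal sketch below illustrates that effect only. It uses Encoding.ASCII as a stand-in for a single-byte default code page (an assumption, since Encoding.Default depends on the machine), and the class name is hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical demo class; not part of iTextSharp.
class EncodingRoundTripDemo
{
    static void Main()
    {
        // Stand-in for text returned by PdfTextExtractor.GetTextFromPage.
        string extracted = "中文 text";

        // Assumption: Encoding.ASCII plays the role of a single-byte default
        // code page that has no mapping for Chinese characters.
        byte[] bytes = Encoding.ASCII.GetBytes(extracted);

        // The two Chinese characters were already replaced by '?' (0x3F)
        // in the byte array, so decoding cannot bring them back.
        string roundTripped = Encoding.ASCII.GetString(bytes);

        Console.WriteLine(roundTripped); // prints "?? text"
    }
}
```

If this is what happens in the question's code, dropping the conversion line and using the extracted string directly would be the first thing to try.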
Try replacing ASCIIEncoding with one of the other encoding classes (UTF8Encoding, for example). I imagine PDF documents know which encoding they use, so you might be able to find the correct one in the PdfReader object. Worth checking.
From the MSDN: