How to read a Spanish encoded file and store it character by character?


I am having trouble reading a file and storing it in memory because it is written in Spanish; I think it could be an encoding problem. I would like to know a way to print or store each of the characters separately. I have already tried many things, but the closest approach I have found is the function std::wstring readFile(const char* filename), as shown in the code:

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

std::wstring readFile(const char* filename) // Read a file using wifstream.
{
    std::wifstream wif(filename);

    std::wstringstream wss;

    wss << wif.rdbuf();
    return wss.str();
}

int main()
{
    std::wstring fileContent = readFile("read.txt"); //Read file to wstring.

    std::wcout << fileContent ; //Print the wstring. This works fine.
    std::cout << " " << std::endl;//Give spacing.

    wchar_t a; //create variable wchar_t.
    int fs = fileContent.size();
    std::cout << "Number of chars: " << fs; //Check content size.

    for (int i = 0; i < fs; i++){ //I want to print each letter.

        a = fileContent.at(i);  //Assign to "a" content of specified index.

        std::wcout << " " << a ; //Print character stored in variable a.
    }
}

There seems to be a problem when storing or printing the value of fileContent.at(i) or fileContent[i] through the wchar_t variable a. Do you know what could be improved in the code, or could you give me some guidance on how to solve this problem?

I am using Macintosh and Linux, if it helps to know. Thanks!


There is 1 answer

Answer by Remy Lebeau (best answer)

You are using std::wifstream, which returns Unicode characters as wchar_t (UTF-16 or UTF-32, depending on the platform), but you are not telling std::wifstream what the encoding of the source file is, so it cannot decode the file's bytes into Unicode correctly. You need to imbue() a locale that matches the file's encoding (for example, a Spanish locale) into the std::wifstream before you start reading the file data.
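As a rough sketch of what that could look like (not a definitive implementation): the helper name readFileWithLocale and its locale parameter are made up for illustration and do not appear in the question. The locale is imbued before the file is opened, so it applies from the first byte read.

```cpp
#include <fstream>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper (not from the original post): reads a whole file into a
// std::wstring, decoding its bytes with the codecvt facet of the given locale.
std::wstring readFileWithLocale(const char* filename, const std::locale& loc)
{
    std::wifstream wif;
    wif.imbue(loc);      // set the locale before any data is read
    wif.open(filename);
    if (!wif)
        return std::wstring(); // file could not be opened: return empty text

    std::wstringstream wss;
    wss << wif.rdbuf();  // copy the decoded wide characters
    return wss.str();
}
```

For a UTF-8 file you would call something like readFileWithLocale("read.txt", std::locale("es_ES.UTF-8")). That locale name is an assumption: it must be installed on your system (locale -a lists the available ones on Linux and macOS), and the std::locale constructor throws std::runtime_error if it is not. Also note that std::wcout needs a suitable locale of its own to print non-ASCII characters, e.g. by calling std::locale::global() with the same locale at the start of main().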