Qt's QDir: File Names Dropping Non-ASCII Characters


I am having an issue with QDir dropping non-ASCII characters from my file names.

I have files with names like testingöäüß.txt or exampleΦ.shp, and when I use Qt classes like QDir and QFile, they simply show up as testing.txt and example.shp. There seems to be no way to tell those classes which encoding to use. I'm trying QDirIterator and the QDir function entryInfoList:

   QDir someDir("/home/blah");  //contains testingöäüß.txt

   QDirIterator dirIter(someDir.absolutePath(), QDir::NoDotAndDotDot | QDir::Dirs | QDir::Files);
   while(dirIter.hasNext())
   {
      QString fileName1 = QFile::decodeName(dirIter.next().toUtf8());
      std::cout << "QDirIterator Name " << fileName1.toStdString().c_str() << std::endl;
   }

   QFileInfoList fileInfoList = someDir.entryInfoList(QDir::NoDotAndDotDot | QDir::Dirs | QDir::Files);
   foreach(QFileInfo fileInfo, fileInfoList)
   {
      QString fileName1 = QFile::decodeName(fileInfo.fileName().toUtf8());
      std::cout << "entryInfoList Name " << fileName1.toStdString().c_str() << std::endl;

      QString fileName2 = QFile::decodeName(fileInfo.absoluteFilePath().toUtf8());
      std::cout << "entryInfoList Name2 " << fileName2.toStdString().c_str() << std::endl;

      QString fileName3 = QString::fromUtf8(fileInfo.absoluteFilePath().toStdString().c_str());
      std::cout << "entryInfoList Name3 " << fileName3.toStdString().c_str() << std::endl;
   }

Every one of those prints lacks the non-ASCII characters. It seems that as soon as I grab the file names to loop over them, they are ASCII-only. Does anyone have any ideas on this? Or can Qt simply not handle this? Thanks!


There are 2 answers

Answer by Felix

Qt can handle file names with special characters. You are losing them somewhere in those string conversions, which are completely unnecessary. Try it this way:

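#include <QDebug>
#include <QDir>
#include <QFileInfo>
#include <iostream>
//...
QFileInfoList fileInfoList = someDir.entryInfoList(QDir::NoDotAndDotDot | QDir::Dirs | QDir::Files);
foreach(const QFileInfo &fileInfo, fileInfoList)
{
    qDebug() << fileInfo.fileName();  // QDebug prints a QString directly, no manual conversion
    // std::cout cannot print a std::wstring, so use the wide stream std::wcout.
    // wchar_t is 16-bit on Windows and 32-bit on Linux; std::wcout only shows
    // non-ASCII text if the C locale has been set to match the console.
    std::wcout << fileInfo.fileName().toStdWString() << std::endl;
}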

If you still don't see them, it's because your console is not configured to display them. Try writing them to a file or displaying them in a GUI, or simply try to open a file with that name: it will work.
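To check, independently of both Qt and the console, that the names are intact on disk, one option is to list the directory with the C++17 standard library and write the raw bytes to a file. This is a minimal sketch, not part of Felix's answer; it assumes a POSIX system, where `std::filesystem::path::string()` returns the file name bytes unchanged, and the helper names `listRawNames` and `dumpNames` are hypothetical.

```cpp
#include <algorithm>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: collects the file names in `dir` as the OS returns them.
// On POSIX, path::string() is the native byte string, so no locale or codec
// conversion happens here (unlike QFile::decodeName / QString).
std::vector<std::string> listRawNames(const fs::path& dir)
{
    std::vector<std::string> names;
    for (const fs::directory_entry& entry : fs::directory_iterator(dir)) {
        names.push_back(entry.path().filename().string());
    }
    std::sort(names.begin(), names.end());
    return names;
}

// Hypothetical helper: writes one name per line, byte for byte, so the result
// can be inspected in a UTF-8 aware editor instead of the console.
void dumpNames(const std::vector<std::string>& names, const fs::path& out)
{
    std::ofstream file(out, std::ios::binary);
    for (const std::string& name : names) {
        file << name << '\n';
    }
}
```

If the dumped file shows `testingöäüß.txt` while Qt reports `testing.txt`, the bytes on disk are fine and the loss happens when Qt decodes them.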

Answer by ScottG

I know this is an old question, but I just ran into the same problem. The exact same Qt code worked fine on my development VM, but when I transferred it to an embedded Linux system (also x86, so literally the same executable), my directory names silently had their non-ASCII characters dropped.

It turned out that QTextCodec::codecForLocale() was UTF-8 on my dev VM, but on the embedded box it was System. If I manually set the locale codec to UTF-8 before doing any filesystem operations (by calling QTextCodec::setCodecForLocale(QTextCodec::codecForName("UTF-8"))), everything worked fine.

So why was this happening in the first place? My suspicion is that in the process of slimming down the embedded system's root filesystem I might have accidentally deleted some locale-related files that Qt was using to try to auto-detect the locale. When it couldn't determine it was on UTF-8, it fell back to System, which for whatever reason is broken (maybe for the same reason it couldn't detect UTF-8 in the first place).
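One way to test this suspicion outside Qt is to ask the C library which locale and character set it resolves from the environment (`LC_ALL`, `LC_CTYPE`, `LANG`). This is a sketch under an assumption the answer does not confirm: that Qt's locale detection on Unix depends on the same C library locale data. It needs a POSIX system for `nl_langinfo`, and the names `LocaleReport` and `checkLocale` are hypothetical.

```cpp
#include <clocale>
#include <langinfo.h>
#include <string>

// Hypothetical result type for the check below.
struct LocaleReport {
    bool environmentLocaleLoaded;  // false if setlocale(LC_ALL, "") failed
    std::string codeset;           // e.g. "UTF-8", or "ANSI_X3.4-1968" (ASCII) on glibc's "C" locale
};

// Adopts the locale named by the environment. If that locale's files are not
// installed, setlocale returns nullptr and the process stays in the previous
// locale ("C" at startup), whose codeset is plain ASCII.
LocaleReport checkLocale()
{
    LocaleReport report{};
    report.environmentLocaleLoaded = std::setlocale(LC_ALL, "") != nullptr;
    report.codeset = nl_langinfo(CODESET);
    return report;
}
```

If `environmentLocaleLoaded` is false, or the codeset is ASCII even though `LANG` names a UTF-8 locale, the locale data for that locale is probably missing; on glibc systems, `locale -a` lists the locales that are actually installed.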

I still need to fix whatever is preventing the auto-detection, but in the short term, manually setting the locale codec to UTF-8 should work if you are experiencing the same issue.

Note that this has nothing to do with whether the console can display UTF-8, nor with manual conversion between UTF-16 and UTF-8! So Felix's answer to this question is not correct, at least for this particular issue. To take the console out of the equation completely, I also printed the number of UTF-16 characters in each string: every non-ASCII character made the path and file name strings returned by QDir::entryInfoList one UTF-16 character shorter. The dead giveaway is that the characters were simply stripped out, not replaced with garbage or question marks.
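The length check works because `QString::length()` counts UTF-16 code units. The sketch below is plain C++ so it runs without Qt; it computes how many UTF-16 code units a correctly decoded UTF-8 name should have, which can then be compared with the length of the `QString` that Qt returns. The function name `utf16Length` is hypothetical, and the input is assumed to be valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Number of UTF-16 code units needed to represent a valid UTF-8 string,
// i.e. what QString::length() should report after a correct UTF-8 decode.
std::size_t utf16Length(const std::string& utf8)
{
    std::size_t units = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) == 0x80) {
            continue;  // continuation byte: belongs to the previous character
        }
        // Lead bytes 0xF0 and above start 4-byte sequences (code points above
        // U+FFFF), which need a surrogate pair, i.e. two UTF-16 code units.
        units += (byte >= 0xF0) ? 2 : 1;
    }
    return units;
}
```

For `testingöäüß.txt` the expected length is 15, while `testing.txt` has length 11: exactly one code unit lost per non-ASCII character, which matches the stripping described above rather than a display problem.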