Python gcld3 installation on Windows 11


In the midst of writing a Python script I came across a problem I still cannot solve: accurate language detection of Latin, specifically medical Latin terms. I understand that Google created a Python module for language detection called gcld3, but I cannot get it to install correctly...

OS --> Microsoft Windows 11 Home, version 10.0.22621 N/A Build 22621

Python --> 3.12.0

To fix my main problem of Latin language detection I tried several other libraries. However, none of them were precise enough to correctly detect the medical Latin terms. So I needed a more robust solution, and using Google's module seemed like a fair attempt to solve the problem.

It is day 2 now and I still don't understand how and why the gcld3 module fails to install...

First of all, I needed to install something called a "protobuf compiler", which I am pretty sure is the protoc.exe that can be found in the protoc.zip archive here.

After I downloaded protoc.exe and added its path to my environment variables, I ran pip install gcld3 again, but this error appeared:

fatal error C1083: Cannot open include file: 'google/protobuf/port_def.inc': No such file or directory

I searched around online and it seems to be a bug? Despite that, I tried to fix it...

First of all, I installed Protocol Buffers using CMake. A quick disclaimer: I have no idea what protocol buffers are, how they work, or how they are structured; I just know they are somehow connected with gcld3. I followed these instructions. I have no idea if I was supposed to do that, but I installed it anyway. After the installation I found myself with a pretty large protobuf folder with a lot of subfolders for various programming languages.

I tried running pip install gcld3 again and nothing, the same error as before...

After that I looked for the port_def.inc file and found it in the src subdirectory of the protobuf directory I had installed with CMake (more specifically, in src/google/protobuf/). So I decided to move all the contents of the src folder to the Python/include folder.

That seemed to fix the issue, but now it said that the absl folder was missing. So I yet again looked for it in my CMake installation of protobuf and moved it to Python/include. After that, yet another error popped up, saying that it cannot open protobuf.lib:

LINK : fatal error LNK1181: cannot open input file 'protobuf.lib'

That is where I gave up...

Can anyone please explain to me, step by step, how I am supposed to install this module? And, more importantly, whether it is a good solution for medical Latin detection? I will be extremely grateful for every answer provided!


There are 2 answers

Ivan Campelo

I understand that you are still encountering the "Cannot open include file: 'google/protobuf/port_def.inc'" error when trying to install gcld3, even after installing protobuf. This error typically occurs when the installation process is unable to locate the necessary protobuf headers.

To address this issue, you can try the following steps:

  1. Verify Protobuf Installation: Ensure that Protobuf (protobuf compiler) is correctly installed and available in your PATH. You mentioned that you downloaded protoc.exe, but please make sure it's in a directory that is included in your system's PATH environment variable. You can check this by running the following command in your command terminal:

    protoc --version

If Protobuf is installed and in your PATH, it should display the version number. If it doesn't, you might need to add the directory containing protoc.exe to your PATH or reinstall Protobuf.

  2. Clear Cache and Reinstall Dependencies: Sometimes, issues like this can be caused by cached dependencies. You can try clearing the pip cache and then reinstalling the dependencies and gcld3:

    pip cache purge
    pip install cython
    pip install git+https://github.com/abseil/abseil-py
    pip install protobuf
    pip install gcld3

  3. Verify Python Version: Check which interpreter is active:

    python --version

If it's not Python 3.12.0, make sure to activate or use that specific Python version.

  4. Check Compiler and IDE: Ensure that you are using a compatible C/C++ compiler and IDE (if applicable) that matches your Python version and architecture (e.g., 32-bit or 64-bit). Make sure that your IDE and build environment are configured correctly.

  5. Verify System Environment: Ensure that your system environment variables (such as PATH) are correctly configured and that there are no conflicting versions of Protobuf or other libraries that may interfere with the installation.

  6. Consider Virtual Environments: If you're still facing issues, consider using a virtual environment for your project. Virtual environments can help isolate dependencies and prevent conflicts with other Python packages.
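Several of the checks above (protoc on PATH, the Python version and architecture, whether a virtual environment is active) can be collected in one go. The following is only a minimal sketch using the standard library; the function names are made up for illustration, and it reports facts without fixing anything.

```python
import shutil
import subprocess
import sys


def find_protoc():
    """Return the full path of protoc on PATH, or None if it is not found."""
    return shutil.which("protoc")


def protoc_version(protoc_path):
    """Run `protoc --version` and return whatever it prints to stdout."""
    result = subprocess.run(
        [protoc_path, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip()


def environment_report():
    """Collect the facts the checklist above asks you to verify."""
    protoc = find_protoc()
    return {
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        # A 64-bit interpreter needs 64-bit protobuf libraries, and vice versa.
        "python_is_64bit": sys.maxsize > 2**32,
        # In a virtual environment, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "protoc_path": protoc,
        "protoc_version": protoc_version(protoc) if protoc else None,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If protoc_path comes back as None, the directory containing protoc.exe is not on the PATH seen by the interpreter that pip runs under.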

If you continue to experience issues after trying these steps, it's possible that there may be specific compatibility problems with the versions of the libraries you are using. In such cases, you may need to wait for updates or seek help from the library's maintainers or community for troubleshooting.

Additionally, please note that using gcld3 for accurate detection of Latin medical terms may not be the most suitable approach, as it is a general language detection library and may not be optimized for specialized tasks like medical terminology. You may want to explore other natural language processing (NLP) techniques or models tailored to medical text analysis if precise detection is critical.

If it still doesn't work, you could try a different library. Some libraries become obsolete, and gcld3 may be one of them. Others that may solve your problem include:

  1. LangDetect: A Python library for language detection that is similar to gcld3. It allows you to detect the predominant language in a text.

  2. TextBlob: A Python library that provides natural language processing (NLP) functionality, including language detection.

  3. NLTK (Natural Language Toolkit): A widely used Python library for natural language processing that includes features for language detection.

  4. spaCy: Another natural language processing Python library with many NLP features; language detection is available through third-party extensions rather than the core library.

  5. fastText: A Facebook-developed library for natural language processing that includes pre-trained models for language detection.

  6. Polyglot: A Python library for NLP that supports language detection, among other features.
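Since general-purpose detectors may struggle with short medical Latin terms, one option is a small detector trained on your own vocabulary. The sketch below is not how any of the libraries above work internally; it is a plain character-trigram comparison using only the standard library, and the two training strings are hypothetical placeholders that you would replace with real term lists.

```python
from collections import Counter
from math import sqrt


def trigram_profile(text):
    """Count overlapping character trigrams in lowercased, space-padded text."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine_similarity(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profiles(samples):
    """Build one trigram profile per language label."""
    return {label: trigram_profile(text) for label, text in samples.items()}


def detect(text, profiles):
    """Return the best-matching label and the score for every label."""
    query = trigram_profile(text)
    scores = {
        label: cosine_similarity(query, profile)
        for label, profile in profiles.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical, tiny training samples, for illustration only.
SAMPLES = {
    "latin": "musculus biceps brachii arteria carotis interna "
             "nervus vagus fractura ossis femoris",
    "english": "the patient has a broken thigh bone and pain "
               "in the upper arm muscle",
}
PROFILES = build_profiles(SAMPLES)

label, scores = detect("nervus ulnaris", PROFILES)
print(label, scores)
```

With samples this small the scores mean very little; the point is only that a domain-specific profile can be built without compiling anything.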

babay
  1. If the compiler can't locate protoc.exe, find it and add its folder to PATH. If you don't have protoc.exe, you can install it with vcpkg. For example, mine is located at I:\dev\vcpkg\installed\x64-windows\tools\protobuf.

  2. If the compiler can't locate google/protobuf/port_def.inc, set the environment variable LIBPATH to the folder where google/protobuf/port_def.inc is located. For example,

    set LIBPATH=I:\dev\vcpkg\installed\x64-windows\lib

  3. If the compiler can't locate protobuf.lib... it's weird. There is almost no info on protobuf.lib on the internet, but there IS info on libprotobuf.lib, and it looks like that's the file we need. Find it, copy it to protobuf.lib, and set the environment variable LIB to the folder where it's located. For example,

    set LIB=I:\dev\vcpkg\installed\x64-windows\lib

  4. CLD3 requires protobuf>=3.20.0

  5. Of course, all the files should come from the same protobuf installation.

  6. It looks like a bug in the gcld3 Windows build script.

and... it still doesn't work :((
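To narrow down which of the steps above is failing, it can help to confirm that the header and the library really exist under a single protobuf installation. This is a rough sketch that assumes a vcpkg-style layout, with an include folder and a lib folder under the same prefix; the function name and the layout are assumptions, not something the gcld3 build script itself checks.

```python
from pathlib import Path

# Relative locations assumed for a vcpkg-style prefix (an assumption,
# adjust them if your protobuf installation is laid out differently).
REQUIRED_FILES = {
    "header": Path("include") / "google" / "protobuf" / "port_def.inc",
    "library": Path("lib") / "libprotobuf.lib",
}


def check_protobuf_prefix(prefix):
    """Report which of the required files exist under the given prefix."""
    root = Path(prefix)
    return {name: (root / rel).is_file() for name, rel in REQUIRED_FILES.items()}


if __name__ == "__main__":
    # Example prefix taken from the paths mentioned in this answer.
    prefix = r"I:\dev\vcpkg\installed\x64-windows"
    for name, found in check_protobuf_prefix(prefix).items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If either entry is reported as missing, the environment variables are probably pointing at a different installation than the one that contains the file.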