LangChain UnstructuredURLLoader reports "libmagic is unavailable"

I'm trying to use UnstructuredURLLoader, but I get a "libmagic is unavailable" warning and neither URL loads.

What I have tried:

  • Installed langchain.
  • Installed unstructured, libmagic, python-magic, and python-magic-bin.
  • Installed python-magic-bin==0.4.13.
  • Installed python_magic-0.4.13-py2.py3-none-any.whl (I also tried other versions). I am on an AMD64 Windows machine.
  • Uninstalled and reinstalled the packages.
  • Searched Google, ChatGPT, and similar questions on Stack Overflow for answers.

Code:

from langchain.document_loaders import UnstructuredURLLoader
loader = UnstructuredURLLoader(
    urls = [
        "https://www.moneycontrol.com/news/business/banks/hdfc-bank-re-appoints-sanmoy-chakrabarti-as-chief-risk-officer-11259771.html",
        "https://www.moneycontrol.com/news/business/markets/market-corrects-post-rbi-ups-inflation-forecast-icrr-bet-on-these-top-10-rate-sensitive-stocks-ideas-11142611.html"
    ]
)
data = loader.load()
len(data)

Error:

libmagic is unavailable but assists in filetype detection on file-like objects. Please consider installing libmagic for better results.
Error fetching or processing https://www.moneycontrol.com/news/business/banks/hdfc-bank-re-appoints-sanmoy-chakrabarti-as-chief-risk-officer-11259771.html, exception: Invalid file. The FileType.UNK file type is not supported in partition.
libmagic is unavailable but assists in filetype detection on file-like objects. Please consider installing libmagic for better results.
Error fetching or processing https://www.moneycontrol.com/news/business/markets/market-corrects-post-rbi-ups-inflation-forecast-icrr-bet-on-these-top-10-rate-sensitive-stocks-ideas-11142611.html, exception: Invalid file. The FileType.UNK file type is not supported in partition.
3

Answer by Gregory Morris (best answer)

Resolution: the folder inside the venv that contains libmagic.dll has to be added to the PATH system environment variable.

In my case: D:\ds_projects\code-basic-LLM-finance-domain\.venv\Lib\site-packages\magic\libmagic

For others, it will likely be: your_path\.venv\Lib\site-packages\magic\libmagic
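Setting PATH through the Windows system settings is the permanent fix described above. As a per-process alternative, the sketch below prepends that folder to PATH from Python itself. It assumes the python-magic-bin layout described above (site-packages\magic\libmagic) and that the installed python-magic locates the DLL with ctypes.util.find_library, which on Windows searches the directories listed in PATH. It also assumes unstructured checks for libmagic when its modules are first imported, so call the helper at the very top of the script, before importing the LangChain loaders. The helper name add_libmagic_to_path is hypothetical.

```python
import importlib.util
import os
from pathlib import Path
from typing import Optional


def add_libmagic_to_path() -> Optional[Path]:
    """Prepend the libmagic folder bundled next to the `magic` package to PATH.

    Assumes the DLLs live in <site-packages>/magic/libmagic (python-magic-bin layout).
    Returns the folder that was added, or None if it could not be found.
    """
    # find_spec locates a top-level package without importing (executing) it.
    spec = importlib.util.find_spec("magic")
    if spec is None or spec.origin is None:
        return None
    dll_dir = Path(spec.origin).parent / "libmagic"
    if not dll_dir.is_dir():
        return None
    # Only affects this Python process and any child processes it starts.
    os.environ["PATH"] = str(dll_dir) + os.pathsep + os.environ.get("PATH", "")
    return dll_dir


if __name__ == "__main__":
    added = add_libmagic_to_path()
    print(f"Added to PATH: {added}" if added else "magic\\libmagic folder not found")
```

Changing PATH this way does not persist after the script exits, so the system-variable approach is still the simpler option for notebooks and IDEs that start many interpreters.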

Answer by Vijayaraghavan Sundararaman

If you are using a Mac, you can install libmagic with Homebrew:

brew install libmagic
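Whichever route you take (Homebrew on macOS, python-magic-bin on Windows), a quick way to check that python-magic can actually load libmagic is to ask it to identify a small in-memory HTML snippet. This is a minimal sketch that assumes the python-magic package is installed; the helper name libmagic_mime and the sample bytes are hypothetical. A None result roughly corresponds to the situation in which unstructured reports libmagic as unavailable.

```python
from typing import Optional

# Hypothetical sample input: a tiny HTML document held in memory.
SAMPLE_HTML = b"<!DOCTYPE html><html><head><title>t</title></head><body>hi</body></html>"


def libmagic_mime(sample: bytes = SAMPLE_HTML) -> Optional[str]:
    """Return the MIME type libmagic reports for `sample`, or None if libmagic can't be used."""
    try:
        # python-magic raises ImportError when it cannot find the libmagic library;
        # a broken DLL can also surface as OSError from ctypes.
        import magic
    except (ImportError, OSError):
        return None
    try:
        return magic.from_buffer(sample, mime=True)
    except Exception:  # e.g. a mismatched or corrupt libmagic install
        return None


if __name__ == "__main__":
    mime = libmagic_mime()
    print(f"libmagic OK, detected {mime}" if mime else "libmagic could not be loaded")
```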

Answer by Sujit Roy

There are library-version compatibility issues between UnstructuredURLLoader and libmagic. You can use SeleniumURLLoader() instead of UnstructuredURLLoader(). For your code above, the change looks like this:

# SeleniumURLLoader needs the selenium package and a browser (Chrome by default).
from langchain.document_loaders import SeleniumURLLoader
loader = SeleniumURLLoader(
    urls = [
        "https://www.moneycontrol.com/news/business/banks/hdfc-bank-re-appoints-sanmoy-chakrabarti-as-chief-risk-officer-11259771.html",
        "https://www.moneycontrol.com/news/business/markets/market-corrects-post-rbi-ups-inflation-forecast-icrr-bet-on-these-top-10-rate-sensitive-stocks-ideas-11142611.html"
    ]
)

data = loader.load()
len(data)