I am trying to use UnstructuredURLLoader
but I get a 'libmagic is unavailable' error.
I have tried the following:
- Installed langchain
- Installed unstructured, libmagic, python-magic, and python-magic-bin
- Installed python-magic-bin==0.4.13
- Installed python_magic-0.4.13-py2.py3-none-any.whl (I also tried other versions). I am on an AMD64 Windows machine.
- Uninstalled and reinstalled the packages above.
- Searched Google, ChatGPT, and similar issues on Stack Overflow for answers.
Code:
from langchain.document_loaders import UnstructuredURLLoader
loader = UnstructuredURLLoader(
    urls=[
        "https://www.moneycontrol.com/news/business/banks/hdfc-bank-re-appoints-sanmoy-chakrabarti-as-chief-risk-officer-11259771.html",
        "https://www.moneycontrol.com/news/business/markets/market-corrects-post-rbi-ups-inflation-forecast-icrr-bet-on-these-top-10-rate-sensitive-stocks-ideas-11142611.html"
    ]
)
data = loader.load()
len(data)
Error:
libmagic is unavailable but assists in filetype detection on file-like objects. Please consider installing libmagic for better results.
Error fetching or processing https://www.moneycontrol.com/news/business/banks/hdfc-bank-re-appoints-sanmoy-chakrabarti-as-chief-risk-officer-11259771.html, exception: Invalid file. The FileType.UNK file type is not supported in partition.
libmagic is unavailable but assists in filetype detection on file-like objects. Please consider installing libmagic for better results.
Error fetching or processing https://www.moneycontrol.com/news/business/markets/market-corrects-post-rbi-ups-inflation-forecast-icrr-bet-on-these-top-10-rate-sensitive-stocks-ideas-11142611.html, exception: Invalid file. The FileType.UNK file type is not supported in partition.
Resolution: The folder that contains libmagic.dll inside the venv has to be added to the system environment variables (PATH).
In my instance: D:\ds_projects\code-basic-LLM-finance-domain.venv\Lib\site-packages\magic\libmagic
For others, it will likely be: your_path\.venv\Lib\site-packages\magic\libmagic
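The same idea can probably be applied per process instead of system-wide. Below is a minimal sketch, not a confirmed fix: it assumes the DLL is located through PATH when magic is first imported, and that the venv uses the Lib\site-packages\magic\libmagic layout from the resolution above. The helper names libmagic_dir and add_to_path are made up for this example.

```python
import os
import sys
from pathlib import Path


def libmagic_dir(prefix=sys.prefix):
    """Return the assumed libmagic folder under a venv prefix.

    The Lib/site-packages/magic/libmagic layout is taken from the
    resolution above; it may differ for other installs.
    """
    return Path(prefix) / "Lib" / "site-packages" / "magic" / "libmagic"


def add_to_path(directory, env=os.environ):
    """Prepend directory to PATH in env unless it is already listed."""
    directory = str(directory)
    current = env.get("PATH", "")
    parts = current.split(os.pathsep) if current else []
    if directory not in parts:
        env["PATH"] = os.pathsep.join([directory] + parts)
    return env["PATH"]


# Hypothetical usage: run this before anything imports magic or unstructured.
# add_to_path(libmagic_dir())
# from langchain.document_loaders import UnstructuredURLLoader
```

This only changes PATH for the current Python process, so it has to run at the top of the script or notebook. Setting the system variable as described in the resolution remains the persistent option.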