I'm new to NLP. I am looking for recommendations for an Annotation tool to create a labeled NER dataset from raw texts.
In detail:
I'm trying to create a labeled dataset for specific types of entities in order to develop my own NER project (rule-based at first). I assumed there would be some friendly frameworks that let you create tagging projects, tag text data, build a labeled dataset, and even share projects so several people can work on the same one, but I'm struggling to find one (I admit "friendly" and "intuitive" are subjective, yet this is my experience).
So far I've tried several frameworks:
- I tried LightTag. It makes the tagging itself fast and easy (i.e. marking words and giving them labels), but the overall process of creating a useful dataset is not as intuitive as I expected (i.e. uploading the text files, splitting them into separate tagging objects, saving the tags, etc.).
- I've installed and tried Label Studio and found it less mature than LightTag (I don't mean to judge here :)).
- I've also read about Prodigy, the paid annotation tool from the makers of spaCy. I would consider purchasing it, but their website only offers a live demo of the tagging phase, so I can't assess whether the product is superior to the other two above.
Even on Stack Overflow, the latest question I found on this topic is over 5 years old.
Do you have any recommendations for a tool to create a labeled NER dataset from raw text?
⚠️ Disclaimer
I am one of the founders of Labellerr and LabelGPT, so take my answer with a pinch of salt. Because we have a reputation in this space, I will make sure to give you my honest perspective.
Finding the right tool for labeling data in AI projects, especially for named entity recognition (NER), can be tricky. I started building Labellerr because, while leading AI teams, I faced the same problems it aims to solve.
The thing is, AI projects today deal with lots of different challenges. When it comes to choosing a labeling tool, speed, accuracy, and cost are big concerns for AI teams. But one tool doesn't fit all industries. For example, labeling a conversation between a doctor and a patient is different from sorting out insurance claims in healthcare. And it gets even more specific based on where and what kind of medical field you're in.
Data labeling tools have come a long way in the last few years, especially as AI technology has grown. They try to make labeling faster by automating some parts, but how much they can automate really depends on past experience with similar data.
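To make "automating some parts" concrete for the rule-based NER case in the question, here is a minimal sketch of pre-annotation: simple rules suggest entity spans, and a human then only corrects them instead of tagging from scratch. It is not how Labellerr or any particular tool works internally. The `GAZETTEER` contents, the label names, and the `pre_annotate` function are all hypothetical and only for illustration.

```python
import re

# Hypothetical gazetteer (label -> known terms); replace with your own entity types.
GAZETTEER = {
    "DRUG": ["aspirin", "ibuprofen"],
    "SYMPTOM": ["headache", "fever"],
}

def pre_annotate(text, gazetteer=GAZETTEER):
    """Return sorted (start, end, label) character spans for every gazetteer hit."""
    spans = []
    for label, terms in gazetteer.items():
        for term in terms:
            # Whole-word, case-insensitive match of the literal term.
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                spans.append((m.start(), m.end(), label))
    return sorted(spans)

text = "Patient reports a headache and took aspirin."
suggestions = pre_annotate(text)
for start, end, label in suggestions:
    print(label, repr(text[start:end]), start, end)
```

Most annotation tools can import suggestions shaped roughly like this (character offsets plus a label), though the exact import format differs per tool, so check the documentation of whichever one you pick.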
The tools need to be flexible because industries and data types are so different. Labellerr, the tool I helped create, was made to be customized for different needs. It's all about making it easier for AI teams to label data the way they need to, considering the unique challenges of their specific projects. It is rarely the case that we onboard new users and keep scaling without needing to customize some part of it, whether that is the look and feel or a button here and there that might save time.
The search for the right tool is about finding something that can adapt to different needs, fits the budget, and can keep up with the ever-changing world of AI.
I hope you have found a tool by now. Still, the only real solution is to try them all and see which one is the easiest fit for your use case.
Hope this makes sense!