Please suggest a downloadable English corpus that contains informal, playful words such as 'gonna', 'LOL' and 'wanna'
Is there a downloadable corpus (dictionary/lexicon) of informal, playful English words such as 'gonna', 'LOL' and 'wanna'?
Asked by AudioBubble
There are 2 answers
Answer by clemtoy
I don't know of such a lexicon, but as an alternative you can try this:
- Get the vocabulary V1 of a Twitter corpus, or of another web or chat corpus.
- Get the vocabulary V2 of a literary corpus.
The lexicon you want might be V1 \ V2, i.e. all the words of V1 that are not in V2.
In Python, NLTK provides suitable corpora (see nltk.corpus.webtext). Moreover, as @mbatchkarov said in the comments, Twitter is full of informal language.
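The set difference above can be sketched in a few lines of plain Python. This is only a rough illustration: the helper names `vocabulary` and `informal_lexicon`, the `min_count` threshold, and the two toy token lists are assumptions made for the example, not part of any library.

```python
import re
from collections import Counter


def vocabulary(tokens, min_count=1):
    """Lowercased word types that occur at least min_count times.

    Tokens that are not purely letters/apostrophes (punctuation,
    numbers, URLs) are skipped. This filter is an assumption.
    """
    counts = Counter(
        t.lower() for t in tokens if re.fullmatch(r"[A-Za-z']+", t)
    )
    return {word for word, count in counts.items() if count >= min_count}


def informal_lexicon(informal_tokens, formal_tokens, min_count=1):
    """Return V1 \\ V2: words of the informal corpus absent from the formal one."""
    v1 = vocabulary(informal_tokens, min_count)  # web/chat vocabulary
    v2 = vocabulary(formal_tokens)               # literary vocabulary
    return v1 - v2


# Toy stand-ins for real corpora (hypothetical data).
chat = "lol I'm gonna go now , wanna come ? lol".split()
book = "I am going to go now . Do you want to come ?".split()

lexicon = sorted(informal_lexicon(chat, book))
print(lexicon)  # ['gonna', "i'm", 'lol', 'wanna']
```

With NLTK installed and the data fetched via `nltk.download("webtext")` and `nltk.download("gutenberg")`, the token lists could instead come from `nltk.corpus.webtext.words()` and `nltk.corpus.gutenberg.words()`; using Gutenberg as the literary corpus is my assumption. On real data, raising `min_count` should help filter out one-off typos, and the result will still need manual cleaning, since names and rare formal words also end up in V1 \ V2.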
Use 'NetLingo'. They have rich content :)