I have used gensim.utils.simple_preprocess(str(sentence)) to create a dictionary of words that I want to use for topic modelling. However, this also filters out important numbers (house resolutions, bill numbers, etc.) that I really need. How can I overcome this? Possibly by replacing digits with their word form. How do I go about it, though?
How do I retain numbers while preprocessing data using Gensim in Python?
659 views Asked by piñatabreaker At
You don't have to use simple_preprocess(). It doesn't do much, it isn't very configurable or sophisticated, and the other Gensim algorithms typically just need lists of tokens.

So, choose your own tokenization, which in some cases, depending on your source data, could be as simple as a .split() on whitespace.

If you want to look at what simple_preprocess() does, as a model, you can view its Python source at: https://github.com/RaRe-Technologies/gensim/blob/351456b4f7d597e5a4522e71acedf785b2128ca1/gensim/utils.py#L288
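As a rough illustration of the "choose your own tokenization" advice, here is a minimal sketch of a tokenizer that keeps numbers. The function name tokenize_keep_numbers, the regex, and the sample sentences are all hypothetical, not part of Gensim. The min_len and max_len defaults are chosen to resemble those of simple_preprocess(), with the assumption that purely numeric tokens should be kept whatever their length.

```python
import re

# Hypothetical pattern: any run of word characters (letters, digits, underscore).
# Unlike simple_preprocess(), this does not discard digits.
TOKEN_RE = re.compile(r"\w+")


def tokenize_keep_numbers(text, min_len=2, max_len=15):
    """Lowercase the text and split it into tokens, retaining numbers.

    Purely numeric tokens are always kept (an assumption made here so that
    short bill or resolution numbers such as "7" survive); all other tokens
    must have a length between min_len and max_len.
    """
    tokens = TOKEN_RE.findall(str(text).lower())
    return [
        tok for tok in tokens
        if tok.isdigit() or min_len <= len(tok) <= max_len
    ]


# Made-up sample documents, for illustration only.
docs = [
    "House Resolution 1234 amends Bill No. 56.",
    "The committee passed H.R. 7 today.",
]
tokenized_docs = [tokenize_keep_numbers(doc) for doc in docs]
```

The resulting list of token lists can then be used wherever Gensim expects tokenized documents, for example as the input to gensim.corpora.Dictionary. Note that this sketch splits "H.R." into single letters and drops them; if identifiers like that matter in your data, the pattern would need adjusting.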