Is there a command in Scala to ignore all kinds of numbers, something like `IgnoreNumbers() ~>`?
I'm a Scala newbie and, in fact, I only need to use this one script in the language.
Thanks a lot for any help!
It's for the tokenizer in this example: http://nlp.stanford.edu/software/tmt/tmt-0.4/examples/example-1-dataset.scala
```scala
val tokenizer = {
  SimpleEnglishTokenizer() ~>     // Remove punctuation
  CaseFolder() ~>                 // Lowercase everything
  WordsAndNumbersOnlyFilter() ~>  // Ignore non-words and non-numbers
  MinimumLengthFilter(3)          // Take terms with >=3 characters
}
```
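To see why swapping one stage is enough, it helps to think of `~>` as left-to-right composition: each stage takes the token stream from the previous one and returns a new one. The sketch below models that with plain Scala functions and `andThen`. It is a minimal stand-in under that assumption, not TMT's implementation: the stage names and the regexes (`\p{L}` for letters, `\p{N}` for digits) are hypothetical approximations of what the real classes do.

```scala
object PipelineSketch {
  // Hypothetical stand-in: a stage maps a token sequence to a new token sequence.
  type Stage = Seq[String] => Seq[String]

  // Rough approximations of the example's stages (NOT the TMT classes).
  // Split on runs of anything that is neither a letter nor a digit.
  val simpleTokenize: String => Seq[String] =
    text => text.split("[^\\p{L}\\p{N}]+").toList.filter(_.nonEmpty)
  val caseFold: Stage = _.map(_.toLowerCase)
  // Keep tokens that are all letters or all digits.
  val wordsAndNumbersOnly: Stage =
    _.filter(t => t.matches("\\p{L}+") || t.matches("\\p{N}+"))
  def minimumLength(n: Int): Stage = _.filter(_.length >= n)

  // `andThen` plays the role of `~>`: stages run left to right.
  val tokenizer: String => Seq[String] =
    simpleTokenize andThen caseFold andThen wordsAndNumbersOnly andThen minimumLength(3)

  def main(args: Array[String]): Unit =
    println(tokenizer("The 2 cats ate 1000 fish in 2011!"))
}
```

With this chain, "1000" and "2011" survive because the number branch of `wordsAndNumbersOnly` accepts them, which is exactly the behaviour the question wants to get rid of.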
I've never used ScalaNLP, but it looks trivial to modify `WordsAndNumbersOnlyFilter` (or, better, to create a new type based on it) by simply removing the `Number` usage, and then to use that new filter in place of `WordsAndNumbersOnlyFilter` in the tokenizer above.
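Here is a rough sketch of the filtering logic such a new type would need, again in plain Scala rather than the TMT API (whose class signatures I haven't checked). `wordsOnly` is a hypothetical "WordsOnlyFilter": the same test as before with the number branch removed. `dropNumbers` is an alternative, also hypothetical, that keeps every token except ones that look like numbers; the `NumberLike` regex is an assumption about what counts as a number (optional sign, digits, `.` and `,` separators).

```scala
object WordsOnlySketch {
  type Stage = Seq[String] => Seq[String]

  // Hypothetical "WordsOnlyFilter": like WordsAndNumbersOnlyFilter,
  // but with the number check removed, so only all-letter tokens survive.
  val wordsOnly: Stage = _.filter(_.matches("\\p{L}+"))

  // Alternative: drop only tokens that look like numbers
  // (e.g. "1000", "3.14", "-7", "1,000"); mixed tokens like "3rd" are kept.
  private val NumberLike = """[+-]?\d[\d.,]*""".r
  val dropNumbers: Stage = _.filterNot(t => NumberLike.pattern.matcher(t).matches())

  def main(args: Array[String]): Unit = {
    val tokens = List("cats", "1000", "3.14", "-7", "1,000", "3rd", "ate")
    println(wordsOnly(tokens))
    println(dropNumbers(tokens))
  }
}
```

`wordsOnly` is the closer match to "removing the `Number` usage"; `dropNumbers` is gentler if tokens such as "3rd" should be kept.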