I am working on a program in which I want to filter out some words, in the style of NLTK stopword removal, as follows:
import re

def phrasefilter(phrase):
    # Normalize greetings, then lowercase and strip punctuation
    phrase = phrase.replace('hi', 'hello')
    phrase = phrase.replace('hey', 'hello')
    phrase = re.sub(r'[^A-Za-z0-9\s]+', '', phrase.lower())
    noise_words_set = ['of', 'the', 'at', 'for', 'in', 'and', 'is', 'from', 'are', 'our', 'it', 'its', 'was', 'when', 'how', 'what', 'like', 'whats', 'now', 'panic', 'very']
    # Keep only the words that are not in the noise word list
    return ' '.join(w for w in phrase.split() if w.lower() not in noise_words_set)
Is there a way of doing this in the web2py DAL?
db.define_table( words,
Field(words1, REQUIRES IS_NOT_NULL(), REQUIRES....
I want to put it in the REQUIRES constraints, for example as REQUIRES IS_NOT_IN_NOISE_WORDS_SET(). Is this possible? I am working with user input (strings saved to the db), and I would like the stopwords I have chosen to be deleted automatically, instead of using the snippet shown above.
You have several options. First, you can create a custom validator that simply acts as a filter. A validator takes a value and returns a tuple including the (possibly transformed) value and either None or an error message (in this case, we want to return None as the second element of the tuple, given that we are only transforming the value but not checking for errors).

Note, the IS_NOT_EMPTY validator comes after the filtering to ensure the post-filtered input is not empty.

Another option would be to do the filtering via the filter_in attribute of the field:
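As a rough sketch of the two options described so far, the filter can be written once and then attached either as a validator or through filter_in. The names remove_noise_words, filter_noise_words, define_tables and the second table name words2 are made up for illustration, the cleaning logic is a reduced version of the function in the question, and db, Field and IS_NOT_EMPTY are passed in as parameters only so the snippet runs outside web2py (in a model file they are already in scope):

```python
import re

# Stop words copied from the question; adjust as needed.
NOISE_WORDS = {'of', 'the', 'at', 'for', 'in', 'and', 'is', 'from', 'are',
               'our', 'it', 'its', 'was', 'when', 'how', 'what', 'like',
               'whats', 'now', 'panic', 'very'}


def remove_noise_words(phrase):
    """Lowercase the phrase, strip punctuation, and drop noise words."""
    if not phrase:
        return phrase  # leave None or '' untouched
    phrase = re.sub(r'[^a-z0-9\s]+', '', phrase.lower())
    return ' '.join(w for w in phrase.split() if w not in NOISE_WORDS)


def filter_noise_words(value):
    """Validator-style callable: returns (transformed value, error).

    The error is always None because this only transforms the value.
    """
    return (remove_noise_words(value), None)


def define_tables(db, Field, IS_NOT_EMPTY):
    # Option 1: filter as a validator. IS_NOT_EMPTY comes second,
    # so it checks the post-filtered value.
    db.define_table(
        'words',
        Field('words1', requires=[filter_noise_words, IS_NOT_EMPTY()]))

    # Option 2: filter via filter_in. Note that filter_in takes a plain
    # function returning the value, not a (value, error) tuple.
    db.define_table(
        'words2',  # hypothetical second table, only for comparison
        Field('words1', requires=IS_NOT_EMPTY(),
              filter_in=remove_noise_words))
```

In a real app you would pick one of the two definitions rather than both.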
The advantage of using filter_in is that it applies to all inserts and updates (made via the DAL), whereas a validator would only be applied when using SQLFORM (or when explicitly calling the special .validate_and_insert and .validate_and_update methods). The disadvantage of filter_in is that the filter is applied after any validators, so IS_NOT_EMPTY would run on the pre-filtered input.

Finally, rather than filtering the input before storing it, you might consider storing the original input and then either storing the filtered input in a separate computed field or using a virtual field.
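The last option (keeping the original input and deriving the filtered text) might look roughly like the sketch below. The field names words1_filtered and words1_virtual, and the helpers remove_noise_words and define_words_table, are hypothetical, and the noise word list is shortened. As before, db and Field are parameters only so the snippet runs outside web2py:

```python
import re

# Shortened noise word list, for illustration only.
NOISE_WORDS = {'of', 'the', 'at', 'for', 'in', 'and', 'is'}


def remove_noise_words(phrase):
    """Lowercase the phrase, strip punctuation, and drop noise words."""
    if not phrase:
        return phrase
    phrase = re.sub(r'[^a-z0-9\s]+', '', phrase.lower())
    return ' '.join(w for w in phrase.split() if w not in NOISE_WORDS)


def define_words_table(db, Field):
    db.define_table(
        'words',
        # The original user input is stored unchanged.
        Field('words1'),
        # Computed field: the filtered text is stored in its own column,
        # calculated from the values being written.
        Field('words1_filtered',
              compute=lambda row: remove_noise_words(row['words1'])),
        # Virtual field: nothing extra is stored; the filtered text is
        # calculated when records are read.
        Field.Virtual('words1_virtual',
                      lambda row: remove_noise_words(row.words.words1)))
```

You would normally choose either the computed field or the virtual field. The computed field costs storage but can be queried, while the virtual field exists only on the selected rows.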