Alright, so I am making a Discord bot, and I'm trying to set parameters for words that aren't allowed to be said in the server (You know the kind, slurs and the like). So, I put a slur (I'll just use the word "dog" as an example), but it was only lowercase. So basically, it's just "dog", and if someone says "dog" in chat, then their message will be deleted, and the bot sends them a message through DMs. But, if someone were to say "Dog", then they wouldn't get their message deleted. What should I add to the code to make sure that all variations of the slur get picked up?

I'm incredibly new to any form of coding, and I have gotten a ton of help from my friends to make this bot, so I really have no clue about what I'm doing.

(if you want to see the code, here it is. I replaced all the slurs with words, but I think you get the gist):

   "bannedWords":[
      "apple",
      "dog",
      "bird",
      "cat"
   ],
   "code":""
}

I expect the words "dog", "Dog", "DOg", "DOG", "dOG", "doG", "DoG", "dOg", etc. to be identified instead of just "dog".

1 Answer

Nick LeBlanc On

This is actually a very complex question.
The simplest approach is to keep an array of banned words in lowercase and compare it against the message after converting the whole message to lowercase, using something like .toLowerCase() in JavaScript (other languages have equivalents, such as .ToLower() in C#).
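As a rough illustration of the lowercase comparison described above, here is a minimal sketch. The function name containsBannedWord is made up for this example, and the list is copied from the question's placeholder words; how the bot actually loads its config is not shown in the question, so that part is assumed.

```javascript
// Placeholder list taken from the question; assumed to be loaded from the
// bot's config under "bannedWords".
const bannedWords = ["apple", "dog", "bird", "cat"];

// Hypothetical helper: returns true if any whitespace-separated word of the
// message, once lowercased, appears on the banned list.
function containsBannedWord(message) {
  const words = message.toLowerCase().split(/\s+/);
  return words.some((word) => bannedWords.includes(word));
}

console.log(containsBannedWord("I saw a DoG today")); // true
console.log(containsBannedWord("hot dogma"));         // false
```

Splitting on whitespace keeps the sketch short, but it means a word followed directly by punctuation (for example "dog!") would slip through, so a real bot would probably want to strip punctuation first.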
But that wouldn't stop users from trying to bypass your list by substituting similar-looking characters, including digits and non-ASCII letters, as in dög, d0g or even døg. Solving that is the complex part of the question.
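One possible way to reduce that kind of bypass, which is not part of the approach above but goes well with it, is to normalize the message before comparing. The sketch below strips accents using Unicode normalization and applies a small hand-written substitution table. The names normalize and lookAlikes are hypothetical, and the table is only a tiny sample, not a complete list.

```javascript
// Hypothetical look-alike table; a real one would need many more entries.
const lookAlikes = { "0": "o", "ø": "o", "1": "i", "3": "e", "@": "a", "$": "s" };

// Lowercase the text, strip accents, and map look-alike characters to letters.
function normalize(text) {
  return text
    .toLowerCase()
    .normalize("NFD")                 // split "ö" into "o" + a combining mark
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
    .replace(/[0ø13@$]/g, (ch) => lookAlikes[ch]);
}

console.log(normalize("dÖg")); // "dog"
console.log(normalize("D0G")); // "dog"
console.log(normalize("døg")); // "dog"
```

The normalized text is only meant for comparison against the banned list. Since digits get rewritten too, it should not replace the message that is actually shown or logged.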
One option is to use regex wildcards, such as the . metacharacter, which matches any single character.

/d.g/
will match dog, dög, d0g, døg and so on, but also harmless words such as dig or dug.
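To make the regex idea a bit more concrete, here is a small sketch that builds one case-insensitive pattern from the whole list. The names escapeRegex and bannedPattern are invented for this example, and the list is again the placeholder one from the question.

```javascript
// Placeholder list taken from the question.
const bannedWords = ["apple", "dog", "bird", "cat"];

// Escape regex metacharacters so list entries are matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// \b anchors on word boundaries so "dogma" is not flagged;
// the "i" flag makes the match case-insensitive.
const bannedPattern = new RegExp(
  "\\b(" + bannedWords.map(escapeRegex).join("|") + ")\\b",
  "i"
);

console.log(bannedPattern.test("Bad DOG!")); // true
console.log(bannedPattern.test("dogma"));    // false

// The wildcard version: "." stands in for any single character.
console.log(/d.g/i.test("DØG")); // true
console.log(/d.g/i.test("dig")); // true (a false positive)
```

The "i" flag alone covers every capitalization of each word, so the wildcard is only needed for the character-substitution cases.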

But that would be very impractical and time-consuming, not to mention the computational cost if your banned list is long. There are several ways to accomplish this. The easiest ones, like those described above, would suit a friends-only server like the one you described just fine. But in situations where detecting those words is critical, stemming algorithms, fuzzy matching, regular expressions and machine learning, as described in this article, are all valid options, and they work best when combined with each other efficiently.
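To give a feel for what fuzzy matching means here, below is a minimal sketch based on edit distance (Levenshtein distance), which is only one of several ways to do it. The helper isCloseToBanned and its maxDistance threshold are assumptions made for this example and are not taken from the article.

```javascript
// Classic dynamic-programming edit distance between two strings.
function editDistance(a, b) {
  // prev[j] is the distance between the first i-1 chars of a and the first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: flag a word that is within maxDistance edits of a banned word.
function isCloseToBanned(word, bannedWords, maxDistance = 1) {
  const lower = word.toLowerCase();
  return bannedWords.some((banned) => editDistance(lower, banned) <= maxDistance);
}

console.log(isCloseToBanned("D0g", ["dog"]));   // true
console.log(isCloseToBanned("doggo", ["dog"])); // false
```

With short words a threshold of 1 will also flag innocent words (for example "dig" against "dog"), so the threshold would need tuning, or this check would need to be combined with the other techniques.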