Remove all characters except letters and numbers from a C# "Unicode" string


Removing non-alphanumeric characters from a string is simple. For example:

StringBuilder sb = new StringBuilder();
foreach(var c in s)
{
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'))
        sb.Append(c);
}
return sb.ToString();

This method only handles ASCII characters.

Is there a way to remove all non-alphanumeric characters from "Unicode" text?

There are 3 answers

0
Tanveer Badar On

You can use char.IsLetterOrDigit() for that.

0
fubo On
string result = string.Concat(s.Where(char.IsLetterOrDigit));
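
For anyone who wants to try this, here is a minimal sketch that wraps the one-liner in a method and adds a variant for text outside the Basic Multilingual Plane. The class and method names are hypothetical, and the second method assumes .NET Core 3.0 or later, where System.Text.Rune is available. char.IsLetterOrDigit looks at one UTF-16 code unit at a time, so letters stored as surrogate pairs are dropped by the first method; the Rune-based variant checks whole Unicode scalar values instead.

```csharp
using System;
using System.Linq;
using System.Text;

public static class AlphanumericFilter
{
    // Hypothetical helper: keeps every UTF-16 code unit that
    // char.IsLetterOrDigit accepts (Unicode letters and decimal digits).
    public static string KeepLettersAndDigits(string s) =>
        string.Concat(s.Where(char.IsLetterOrDigit));

    // Hypothetical helper: same filter, applied per Unicode scalar value,
    // so letters encoded as surrogate pairs are kept as well.
    // Assumes .NET Core 3.0+ (System.Text.Rune, string.EnumerateRunes).
    public static string KeepLettersAndDigitsByRune(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (Rune r in s.EnumerateRunes())
        {
            if (Rune.IsLetterOrDigit(r))
                sb.Append(r.ToString());
        }
        return sb.ToString();
    }
}
```

Both methods treat combining marks as non-letters, so if the input may contain decomposed accents it is probably worth calling s.Normalize(NormalizationForm.FormC) first.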
0
Dmitry Bychenko On

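A regular expression is an alternative: we replace every run of unwanted characters with an empty string. Here we use the \W+ pattern, which matches one or more non-word characters. Note that in .NET the word class \w also includes the underscore and other connector punctuation, so those characters are kept: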

string result = Regex.Replace(s, @"\W+", "");
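
If underscores should go too, one option is to name the wanted Unicode categories explicitly instead of relying on \W. The sketch below is only an illustration with hypothetical class and method names; it keeps letters (\p{L}) and decimal digits (\p{Nd}) and removes everything else.

```csharp
using System.Text.RegularExpressions;

public static class RegexAlphanumericFilter
{
    // Hypothetical helper: removes every run of characters that are
    // neither Unicode letters (L) nor decimal digits (Nd).
    public static string KeepLettersAndDigits(string s) =>
        Regex.Replace(s, @"[^\p{L}\p{Nd}]+", "");
}
```

With this pattern "snake_case 42" becomes "snakecase42", whereas \W+ would leave the underscore in place.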