How to detect if a string contains Hindi (Devanagari) text, with character and word counts

Question


Asked by A_N on 30 December 2024 at 13:19 · 15.6k views

Below is an example string:

$string = "abcde वायरस abcde"

I need to check whether this string contains any Hindi (Devanagari) text and, if so, count its characters and words. I suspect a regex with a Unicode character class could work (http://www.regular-expressions.info/unicode.html), but I can't figure out the correct pattern.


There are 2 answers

**Sandeep Dixit** · 28 June 2022 at 05:58

It should be a range; listing every character individually is not necessary. The following pattern detects a Devanagari word (a run of one or more code points from U+0900 to U+097F):

[\u0900-\u097F]+
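A minimal JavaScript sketch of how this range can answer both parts of the question. `countDevanagari` is a hypothetical helper name. It assumes that a "character" is one code point in U+0900 to U+097F and a "word" is a maximal run of such code points, so vowel signs such as ा count as separate characters, the danda (।, U+0964) and Devanagari digits count too, and a Devanagari run glued to Latin letters still counts as a word.

```javascript
// Hypothetical helper: counts Devanagari code points and runs in a string.
// Assumption: "character" = one code point in U+0900..U+097F,
//             "word" = a maximal run of such code points.
function countDevanagari(str) {
  const charMatches = str.match(/[\u0900-\u097F]/g) || [];  // each Devanagari code point
  const wordMatches = str.match(/[\u0900-\u097F]+/g) || []; // each run of them
  return {
    containsHindi: charMatches.length > 0,
    characters: charMatches.length,
    words: wordMatches.length,
  };
}

console.log(countDevanagari("abcde वायरस abcde"));
// { containsHindi: true, characters: 5, words: 1 }
```

"वायरस" is five code points (व, ा, य, र, स), which is why `characters` is 5 rather than the four letters a reader sees.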

**ssc-hrep3** · Accepted Answer · 2017-01-06T21:58:54+00:00

To find out whether a string contains a Hindi (Devanagari) character, you need the full set of Hindi characters. According to this website, they are the code points from 0x0900 to 0x097F (decimal 2304 to 2431), i.e. the Unicode Devanagari block.
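The same interval can also be checked without a regex by comparing code points directly. This is a sketch with a hypothetical `containsDevanagari` helper, not code from the answer; it confirms that 0x0900 and 0x097F are decimal 2304 and 2431 and tests each code point of the string against that interval.

```javascript
// Bounds of the Unicode Devanagari block (hex, with decimal in comments).
const DEVANAGARI_START = 0x0900; // 2304
const DEVANAGARI_END = 0x097f;   // 2431

// Hypothetical helper: true if any code point of `str` lies in the block.
function containsDevanagari(str) {
  for (const ch of str) {            // for...of iterates by code point
    const cp = ch.codePointAt(0);
    if (cp >= DEVANAGARI_START && cp <= DEVANAGARI_END) {
      return true;
    }
  }
  return false;
}

console.log(DEVANAGARI_START, DEVANAGARI_END);        // 2304 2431
console.log(containsDevanagari("abcde वायरस abcde")); // true
console.log(containsDevanagari("abcde abcde"));       // false
```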

The regular expression must match if any of those characters appears in the string, so you can use a pattern (actually a character class) that lists all of them:

[\u0900\u0901\u0902 ... \u097D\u097E\u097F]

Because writing this list out by hand is cumbersome, you can generate it by iterating over the code points from 2304 to 2431 in decimal (0x0900 to 0x097F in hexadecimal).
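A sketch of generating the list by iterating over the decimal values 2304 to 2431. As a variation on the answer's approach, it inserts the characters themselves via `String.fromCharCode` instead of `\uXXXX` escapes (none of them are special inside a character class). It then checks that the generated class matches exactly the same BMP code points as the shorter range `[\u0900-\u097F]`, i.e. the two forms are interchangeable.

```javascript
// Build the explicit character class from decimal code points 2304..2431.
let listed = "";
for (let cp = 2304; cp <= 2431; cp++) {
  listed += String.fromCharCode(cp); // literal character, no escaping needed
}
const explicitClass = new RegExp("[" + listed + "]");
const rangeClass = /[\u0900-\u097F]/;

// Compare both classes on every BMP code unit (0x0000..0xFFFF).
let mismatches = 0;
let matchedByExplicit = 0;
for (let cp = 0; cp <= 0xffff; cp++) {
  const ch = String.fromCharCode(cp);
  const inExplicit = explicitClass.test(ch);
  if (inExplicit) matchedByExplicit++;
  if (inExplicit !== rangeClass.test(ch)) mismatches++;
}

console.log(listed.length, matchedByExplicit, mismatches); // 128 128 0
```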

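To count the words made up entirely of Hindi characters, you can use the following pattern. A word must be preceded by whitespace (\s) or the start of the string (^) and followed by whitespace or the end of the string ($). The trailing check is written as a lookahead, (?=\s|$), so the space after one Hindi word is not consumed and can still start the next match; with a consuming (?:\s|$), the second of two adjacent Hindi words would be missed. The global flag (/g) matches every occurrence:

/(?:^|\s)[\u0900\u0901\u0902 ... \u097D\u097E\u097F]+(?=\s|$)/g

Here is a runnable implementation in JavaScript:

// Build the escapes "\u0900" ... "\u097f" for the 128 code points of the Devanagari block.
var numberOfHindiCharacters = 128;
var unicodeShift = 0x0900;
var hindiAlphabet = [];
for (var i = 0; i < numberOfHindiCharacters; i++) {
  // Pad to 4 hex digits so every entry is a valid \uXXXX escape.
  hindiAlphabet.push("\\u" + (unicodeShift + i).toString(16).padStart(4, "0"));
}

// Word = Devanagari-only run between start/whitespace and whitespace/end.
// The lookahead (?=\s|$) leaves the trailing space for the next match.
var regex = new RegExp("(?:^|\\s)[" + hindiAlphabet.join("") + "]+(?=\\s|$)", "g");
var string1 = "abcde वायरस abcde";
var string2 = "abcde abcde";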

[ string1.match(regex), string2.match(regex) ].forEach(function(match) {
  if(match) {
    console.log("String contains " + match.length + " words with Hindi characters only.");
  } else {
    console.log("String does NOT contain any words with Hindi characters only.");
  }
});
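The answer above counts words only, while the question also asks for a character count. A sketch of two ways to count, assuming Node.js 16+ or a modern browser for `Intl.Segmenter`: code points in U+0900 to U+097F, and user-perceived characters (extended grapheme clusters) that contain at least one Devanagari code point. The two differ because a vowel sign such as ा (U+093E) is its own code point but is rendered together with the preceding consonant. `countHindiCharacters` is a hypothetical name.

```javascript
// Hypothetical helper: two notions of "Hindi character count".
function countHindiCharacters(str) {
  // 1) Code points in the Devanagari block (vowel signs and viramas count separately).
  const codePoints = (str.match(/[\u0900-\u097F]/g) || []).length;

  // 2) Grapheme clusters (what a reader sees as one letter) containing Devanagari.
  const segmenter = new Intl.Segmenter("hi", { granularity: "grapheme" });
  let graphemes = 0;
  for (const { segment } of segmenter.segment(str)) {
    if (/[\u0900-\u097F]/.test(segment)) graphemes++;
  }
  return { codePoints, graphemes };
}

console.log(countHindiCharacters("abcde वायरस abcde")); // { codePoints: 5, graphemes: 4 }
```

Which count is "right" depends on the use case: code points suit storage or length limits, grapheme clusters suit what a person would count on screen.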

TechQA.
