With this code, when I enter 漢字 into the input element (type text, name text) and press the submit button, it shows mb_strlen : 16 and strlen : 16
<?php
include("connect.php");
if(isset($_POST["submit"]))
{
$string = mysqli_real_escape_string($db_mysqli,$_POST['text']);
//$string = "漢字";
echo $string."<BR>";
echo "mb_strlen : ".mb_strlen($string, 'utf-8')."<BR>";
echo "strlen : ".strlen($string)."<BR>";
if(strlen($string) != mb_strlen($string, 'utf-8'))
{
echo "Please enter English words only:(";
}
else
{
echo "OK, English Detected!";
}
}
?>
<form method="post" ENCTYPE = "multipart/form-data">
<input type="text" name="text">
<input type="submit" name="submit" value="OK" id="button-blue" style=" float: none; ">
</form>
But when I use this code, it shows mb_strlen : 2 and strlen : 6
I want to know why the value from the code above is incorrect and how to fix it.
<?php
$string = "漢字";
echo $string."<BR>";
echo "mb_strlen : ".mb_strlen($string, 'utf-8')."<BR>";
echo "strlen : ".strlen($string)."<BR>";
if(strlen($string) != mb_strlen($string, 'utf-8'))
{
echo "Please enter English words only:(";
}
else
{
echo "OK, English Detected!";
}
?>
There are likely some gotchas with this answer, which may require later revision, but instead of comparing strlen with mb_strlen we can use a regex to check whether the input string contains non-Latin characters. Code:
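A minimal sketch of such a check, assuming preg_match_all with a negated U+0000 to U+007F range and the u flag. The helper name findNonAsciiChars is hypothetical and used only for illustration:

```php
<?php
// Hypothetical helper: returns every character outside U+0000-U+007F.
function findNonAsciiChars(string $input): array
{
    // [^\x00-\x7F] matches any code point above the ASCII range;
    // the u flag makes PCRE treat the pattern and subject as UTF-8.
    $found = preg_match_all('/[^\x00-\x7F]/u', $input, $matches);

    // preg_match_all returns false on error (e.g. invalid UTF-8 input).
    if ($found === false) {
        return [];
    }
    return $matches[0] ?? [];
}

print_r(findNonAsciiChars('漢字'));
print_r(findNonAsciiChars('This is a Latin string jasDLFKL@##$&()@!!!'));
```

An empty array means no character outside the range was found, so a caller could reject the input whenever the array is non-empty.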
Results:
If I test with
This is a Latin string jasDLFKL@##$&()@!!!
I get an empty array back. I don't believe this is a foolproof solution, but it is a good first step. Please note that the Latin character range assumed here is U+0000–U+007F, which is the Unicode Basic Latin block (plain ASCII), so accented Latin letters such as é fall outside it. This Regex Tutorial Page goes into detail about Unicode. Also note that my pattern has a u flag, for Unicode, which is necessary to include.
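As for why the first script reports 16 for both functions, one plausible explanation is that the page was not served as UTF-8, so the browser submitted each character as a numeric character reference such as &#28450;. This is an assumption, since the question shows neither the page encoding nor the raw POST data. Each reference is 8 ASCII characters, which would give 16 for two characters, and the browser would still render the echoed value as 漢字. The sketch below reproduces the numbers; the $submitted value is assumed, not taken from the question:

```php
<?php
// Assumed raw POST value for 漢字 (U+6F22 = 28450, U+5B57 = 23383).
$submitted = '&#28450;&#23383;';

// Both functions see 16 single-byte ASCII characters.
echo strlen($submitted), "\n";              // 16
echo mb_strlen($submitted, 'UTF-8'), "\n";  // 16

// Decoding the references yields the real UTF-8 characters.
$decoded = html_entity_decode($submitted, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo mb_strlen($decoded, 'UTF-8'), "\n";    // 2
echo strlen($decoded), "\n";                // 6
```

If that is the cause, serving the page as UTF-8 (for example with a meta charset="utf-8" tag, or accept-charset="UTF-8" on the form) should make the posted value arrive as real UTF-8, so the lengths would match the hard-coded case.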