I have a string like this: Hello? My name is Ben! @ My age is 32.
I want to turn it into an array in which every word, space and punctuation mark is a separate element. For example, if I did var_dump($sentence),
the array should look like this:
array(22) {
[0]=>
string(5) "Hello"
[1]=>
string(1) "?"
[2]=>
string(1) " "
[3]=>
string(2) "My"
[4]=>
string(1) " "
[5]=>
string(4) "name"
[6]=>
string(1) " "
[7]=>
string(2) "is"
[8]=>
string(1) " "
[9]=>
string(3) "Ben"
[10]=>
string(1) "!"
[11]=>
string(1) " "
[12]=>
string(1) "@"
etc...
The only code I've found which comes close to this is:
$sentence = preg_split("/(?<=\w)\b\s*/", 'Hello? My name is Ben! @ My age is 32.');
echo '<pre>';
var_dump($sentence);
echo '</pre>';
which outputs:
array(10) {
[0]=>
string(5) "Hello"
[1]=>
string(4) "? My"
[2]=>
string(4) "name"
[3]=>
string(2) "is"
[4]=>
string(3) "Ben"
[5]=>
string(6) "! @ My"
[6]=>
string(3) "age"
[7]=>
string(2) "is"
[8]=>
string(2) "32"
[9]=>
string(1) "."
}
How do I change this so that the spaces and punctuation are separated in the array?
No need for lookarounds: just make preg_split capture the delimiters as well (with the PREG_SPLIT_DELIM_CAPTURE option). With the delimiter wrapped in a capturing group, each \W (non-word) symbol is captured separately (as a delimiter), but all \w symbols are gathered into sequences (as the parts of the string separated by \W).
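A minimal sketch of that approach, using the sentence from the question. The variable names $str and $parts are only illustrative, and combining PREG_SPLIT_DELIM_CAPTURE with PREG_SPLIT_NO_EMPTY is an assumption on my part: without the second flag, two adjacent delimiters (such as "?" followed by a space) would leave an empty string between them.

```php
<?php
// Illustrative names; the input is the sentence from the question.
$str = 'Hello? My name is Ben! @ My age is 32.';

// (\W) matches one non-word character at a time and captures it, so
// PREG_SPLIT_DELIM_CAPTURE returns each delimiter as its own element.
// PREG_SPLIT_NO_EMPTY (an assumption, see above) drops the empty strings
// that appear between two adjacent delimiters and at the end of the input.
$parts = preg_split('/(\W)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

echo '<pre>';
var_dump($parts);
echo '</pre>';
```

For this input the result has 22 elements, starting with "Hello", "?", " ", "My", and joining them back with implode('', $parts) gives the original string. Note that \w also matches the underscore, so something like my_name stays in one piece.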