Extracting an e-mail address from an HTML structure using PHP


I am trying to modify a PHP file (part of the Joomla extension Community Builder 1.9.1; the file is \components\com_comprofiler\plugin\templates\default\default.php) in order to extract the e-mail address from a variable.

For the sake of description, let's say this variable is $html. To make sure it is the variable that holds the e-mail address I'm targeting, I insert the following into the file:

<pre><?php print_r($html) ?></pre>

Its output is the e-mail address as a mailto link, and the corresponding HTML is something like:

<span id="cbMa47822" class="cbMailRepl"><a href="mailto:[email protected]">[email protected]</a></span>

So I guess I can use:

<?php $html_array = explode("\"",$html);echo $html_array[5]; ?>

to get 'mailto:[email protected]', but instead it only returns a notice:

Undefined offset: 5

So I ran print_r($html_array), and it returns something like:

Array
(
    [0] =>  cbMa14768
    [2] =>  class=
    [3] => cbMailRepl
    [4] => >... 
)

It looks as if the <a> tag part of the HTML output has been replaced by "...", just as Chrome's developer tools HTML inspector shows an element before you expand it:

<span id="cbMa47822" class="cbMailRepl">...</span>
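For reference, a minimal sketch of the explode() approach above, run against both versions of the markup (the address is the placeholder shown on this page). With the full markup, splitting on double quotes does put "mailto:..." at index 5; with the collapsed markup there are only five pieces (indices 0 to 4), which matches the print_r() output above. The helper name mailtoFromExplode() is hypothetical, and isset() is used so that a shorter string yields null instead of the "Undefined offset" notice:

```php
<?php
// Hypothetical helper: split the markup on double quotes and return the
// piece at index 5, which holds "mailto:..." for markup shaped like the sample.
function mailtoFromExplode($html)
{
    $parts = explode('"', $html);
    // isset() avoids the "Undefined offset: 5" notice when there are fewer pieces.
    return isset($parts[5]) ? $parts[5] : null;
}

// Full markup, as in the sample HTML from the question.
$full = '<span id="cbMa47822" class="cbMailRepl"><a href="mailto:[email protected]">[email protected]</a></span>';
// Collapsed markup, as seen in the inspector: explode() yields only 5 pieces.
$short = '<span id="cbMa47822" class="cbMailRepl">...</span>';

$fromFull = mailtoFromExplode($full);
$fromShort = mailtoFromExplode($short);

echo $fromFull === null ? 'no piece at index 5' : $fromFull, "\n";   // mailto:[email protected]
echo $fromShort === null ? 'no piece at index 5' : $fromShort, "\n"; // no piece at index 5
```

If the second case is what happens on the live page, the string held by the variable is the collapsed form, not the one the browser finally displays.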

I looked deeper into the PHP code, trying to find out how this $html is constructed, but it is beyond my understanding.

For learning purposes, my questions are:

  1. Why is there no [1] in the result of print_r($html_array)?

  2. How can I inspect a variable's value more exactly, meaning as raw text with no HTML rendering? For example, if the value is "<a href="http://foo.com">foo</a>", it should display the HTML as is rather than as a link (when I use print_r, the browser shows a link).

  3. And most importantly, based on the information above, can you give me any hints on how to extract the e-mail address from a variable like this?

Finally, for anyone willing to take a deeper look: the variable I am talking about is $this->tableContent[$userIdx][1][6]->value in \components\com_comprofiler\plugin\templates\default\default.php. It wasn't output in the original code, but I did some tests and confirmed that it contains the e-mail address. I inserted the following code between lines 450 and 451:

<?php $html_array = explode("\"",$this->tableContent[$userIdx][1][6]->value);echo $html_array[5]; ?>

There are 2 answers

Gaurav Singh On
  1. To avoid the output being rendered as a link, escape the HTML before printing it (e.g. with htmlspecialchars()).
  2. You can use a regular expression to check whether the string contains something matching an e-mail address pattern, and print the match.
  3. PHP has a vast set of built-in functions, even for unusual tasks, so it is worth searching the manual for them.
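The first two suggestions can be sketched as follows. htmlspecialchars() converts <, > and quotes into entities, so the browser shows the markup as text instead of rendering a link; preg_match() with a deliberately loose pattern picks out the first substring that looks like an address. The pattern is an assumption and not a full RFC 5322 validator, the helper firstEmail() is hypothetical, and the address someone@example.com is a made-up sample because the real one is hidden on this page:

```php
<?php
// Hypothetical sample markup; the real address in the question is hidden.
$html = '<span id="cbMa47822" class="cbMailRepl"><a href="mailto:someone@example.com">someone@example.com</a></span>';

// 1. Show the raw markup as text instead of letting the browser render a link.
echo '<pre>', htmlspecialchars($html, ENT_QUOTES, 'UTF-8'), '</pre>', "\n";

// 2. Return the first substring that looks like an e-mail address, or null.
//    Deliberately loose pattern; it does not validate every legal address.
function firstEmail($text)
{
    if (preg_match('/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/', $text, $m)) {
        return $m[0];
    }
    return null;
}

echo firstEmail($html), "\n"; // someone@example.com
```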
Giacomo1968 On

To extract an e-mail address from an HTML structure as you describe, just use a regular expression with preg_match():

$html = '<span id="cbMa47822" class="cbMailRepl"><a href="mailto:[email protected]">[email protected]</a></span>';

preg_match("/mailto:(.*)\">/is", $html, $matches);

echo '<pre>';
print_r($matches);
echo '</pre>';

The output would be:

Array
(
    [0] => mailto:[email protected]">
    [1] => [email protected]
)

So to access that e-mail address, just do this:

echo $matches[1];

The output would be:

[email protected]
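One caveat: (.*) is greedy, so when the string contains more than one link it runs to the last "> in the string rather than the first. A minimal variant, assuming the address itself never contains a double quote, uses the negated class [^"]+ and preg_match_all() to collect every mailto: target; the two addresses are hypothetical samples:

```php
<?php
// Two hypothetical links, to show how the greedy (.*) can overreach.
$html = '<a href="mailto:first@example.com">first</a>, <a href="mailto:second@example.org">second</a>';

// Pattern from the answer above: (.*) backtracks only to the LAST '">'.
preg_match('/mailto:(.*)">/is', $html, $greedy);

// [^"]+ stops at the first double quote, so each capture is a single address.
preg_match_all('/mailto:([^"]+)"/i', $html, $all);

echo $greedy[1], "\n"; // first@example.com">first</a>, <a href="mailto:second@example.org
print_r($all[1]);      // lists first@example.com and second@example.org
```

With only one link in the string, as in the question, both patterns capture the same address.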