Extracting e-mail address from a html structure using PHP

I am trying to modify a PHP file (it is part of the Joomla extension Community Builder 1.9.1, and the file is \components\com_comprofiler\plugin\templates\default\default.php) in order to extract the e-mail address from a variable.
For description’s sake, let’s say this variable is $html. To make sure this variable is the right one containing the e-mail address that I'm targeting, I insert:
<pre><?php print_r($html) ?></pre>
into the file. Its output is the e-mail address as a mailto link, and the corresponding HTML is something like:
<span id="cbMa47822" class="cbMailRepl"><a href="mailto:myemail#yahoo.com">myemail#yahoo.com</a></span>
So I guess I can use:
<?php $html_array = explode("\"",$html);echo $html_array[5]; ?>
to get 'mailto:myemail#yahoo.com'. But it actually only returns this notice:
undefined offset:5
So I ran print_r($html_array), and it returns something like:
Array
(
[0] => cbMa14768
[2] => class=
[3] => cbMailRepl
[4] => >...
)
It looks like the <a> tag part of the html output is replaced by "...", like what you see in Chrome’s developer tool html inspector, where before you expand it, the HTML looks like:
<span id="cbMa47822" class="cbMailRepl">...</span>
I looked deeper into the PHP code, trying to find out how this $html is constructed, but it is totally beyond my understanding.
For learning purposes, my questions are:
Why is there no [1] in the result of print_r($html_array)?
How do I inspect a variable’s value more precisely, meaning completely without HTML rendering? For example, if the value is "foo", it should display the HTML as-is, not a link (when I use print_r, it outputs a link).
And most importantly, based on the information given above, can you give me any hints on how I can extract the e-mail address from a variable like this?
Finally, for those who are willing to take a deeper look into this, the variable I am talking about is $this->tableContent[$userIdx][1][6]->value in \components\com_comprofiler\plugin\templates\default\default.php. It wasn't originally in the code, but I did some tests and confirmed that it contains the e-mail address. I inserted the following code between lines 450 and 451:
<?php $html_array = explode("\"",$this->tableContent[$userIdx][1][6]->value);echo $html_array[5]; ?>

To extract an e-mail address from an HTML structure as you describe, just use a regex with preg_match:
$html = '<span id="cbMa47822" class="cbMailRepl"><a href="mailto:myemail#yahoo.com">myemail#yahoo.com</a></span>';
// The lazy (.*?) stops at the first "> after "mailto:"
preg_match("/mailto:(.*?)\">/is", $html, $matches);
echo '<pre>';
print_r($matches);
echo '</pre>';
The output would be:
Array
(
[0] => mailto:myemail#yahoo.com">
[1] => myemail#yahoo.com
)
So to access that e-mail address, just do this:
echo $matches[1];
The output would be:
myemail#yahoo.com

To avoid rendering links, escape the output (for example with htmlspecialchars()) before printing it.
You can use a regular expression to check whether the given string matches an e-mail address pattern, and then print it.
PHP has a vast set of built-in functions that can handle even the weirdest tasks, so search the manual for them.
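A minimal sketch of that suggestion follows. The helper name find_email, the sample markup and the address someone@example.com are hypothetical. The pattern is a deliberately loose approximation, not a full address validator. htmlspecialchars() escapes the result so the browser shows plain text instead of a link.

```php
<?php
// Hypothetical helper: return the first e-mail-like substring in $text, or null.
// The pattern is a loose approximation, not a complete RFC 5322 validator.
function find_email(string $text): ?string
{
    if (preg_match('/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/', $text, $m)) {
        return $m[0];
    }
    return null;
}

// Hypothetical input shaped like the Community Builder markup.
$html = '<span class="cbMailRepl"><a href="mailto:someone@example.com">someone@example.com</a></span>';

$email = find_email($html);

// htmlspecialchars() escapes <, >, & and quotes, so the value is shown
// literally instead of being rendered as HTML.
echo htmlspecialchars((string) $email), "\n";
```

Because the character class excludes ":", the "mailto:" prefix is not part of the match.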

Related

Does X-Cart strip tags from posted data?

I send data from a form where the textarea contains HTML tags. On the PHP side I do not see them when using:
echo "<pre>";
print_r( $_POST );
echo "</pre>";
exit();
I get:
Where have the paragraph tags gone?
In the source code they are clearly gone:
<pre>Array
(
[mode] => save_product
[id] => 1
[title] => Banana Shake
[categoryid] => 1
[serving] => 34.50
[orderby] => 10
[intro] => Intro
[instructions] => Empty contents of packet into a shaker or blender, add 200-240ml of cold water and shake/mix until fully dissolved.
Consume within 10 minutes for full nutritional benefit.
...</pre>
EDIT
I am using X-Cart's engine to manipulate data, so it could be that X-Cart strips those tags.

The solution was to set trusted variables at beginning of the script this way:
define('USE_TRUSTED_POST_VARIABLES', 1);
$trusted_post_variables = array('intro', 'instructions');
That way X-Cart won't strip any tags.
Thanks for the help and sorry for the confusion.

EDIT: this answer is written on the presumption that you are not using a framework or other method that strips HTML tags from your post.
Your paragraph tags are still there. Since you're printing them in the browser, the browser is interpreting them as real <p> tags. If you were to look at the page's source code, you'd see the tags. (Google "<your browser name> view page source" for instructions on how to do this.)
You could also use htmlentities($_POST['instructions']) or htmlspecialchars($_POST['instructions']) to convert the HTML tags to entities, which will cause them to be printed literally in the browser.
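For a whole debug dump, print_r() can return its output as a string when you pass true as the second argument. That string can then be escaped before echoing. Here is a minimal sketch. $_POST is filled with a hypothetical array so the snippet also runs from the command line.

```php
<?php
// Simulated form data for illustration; in a real request PHP fills $_POST.
$_POST = [
    'title'        => 'Banana Shake',
    'instructions' => '<p>Empty contents of packet into a shaker.</p>',
];

// print_r(..., true) returns the dump as a string instead of printing it.
$dump = print_r($_POST, true);

// Escape it so the <p> tags appear as literal text rather than being rendered.
echo '<pre>', htmlspecialchars($dump, ENT_QUOTES, 'UTF-8'), '</pre>', "\n";
```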
htmlentities()
htmlspecialchars()

PHP does not parse array input name

Let's say that I have the following simple input text box:
<input type="text" name="Details[0]->Name" value="" />
Now the problem is that PHP parses the input name as an array and then ignores the rest of the name after the closing square bracket. So in print_r, it becomes:
Details => Array{
[0] => "Input"
}
What can I do to work around this? Is there an unparsed version of $_REQUEST?
N.B.: If you noticed, yes, I am trying to build an automatic input-to-class mapper like the one in ASP.NET MVC.
Edit:
The additional solution requirement is that I can read raw array input from either GET, POST or multipart form requests.

For POST:
echo urldecode ( file_get_contents('php://input'));
For GET:
echo urldecode ($_SERVER['QUERY_STRING']);
Both the above give the output Details[0]->Name=testval
As for enctype='multipart/form-data', the unparsed data is not available in php. However, there is a solution of sorts given by this SO question: Get raw post data
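Once you have the raw string, you can split it yourself so that names like Details[0]->Name survive intact. This is a minimal sketch under a few assumptions:

- The input is application/x-www-form-urlencoded (a query string or a non-multipart POST body).
- The function name parse_raw_pairs is hypothetical.
- Repeated keys simply overwrite earlier ones.

```php
<?php
// Hypothetical helper: split a urlencoded string into name => value pairs
// without PHP's automatic bracket-to-array conversion.
function parse_raw_pairs(string $raw): array
{
    $pairs = [];
    if ($raw === '') {
        return $pairs;
    }
    foreach (explode('&', $raw) as $chunk) {
        // Split on the first '=' only; a missing '=' means an empty value.
        $parts = explode('=', $chunk, 2);
        $name  = urldecode($parts[0]);
        $value = isset($parts[1]) ? urldecode($parts[1]) : '';
        $pairs[$name] = $value; // later duplicates overwrite earlier ones
    }
    return $pairs;
}

// Usage: pass file_get_contents('php://input') for POST bodies,
// or $_SERVER['QUERY_STRING'] for GET requests.
$example = 'Details%5B0%5D-%3EName=testval';
print_r(parse_raw_pairs($example));
```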

Just replace Input with some other name like Inputcust and make a new array
Details => Array{
[0] => "Inputcust"
}
and wherever you want to use it again, replace Inputcust with Input.

PHP Replace tags / placeholders / markers in text string with dynamic values

Basically, what I want to achieve is dynamically replace {SOME_TAG} with "Text".
My idea was to read all tags like {SOME_TAG}, put them into array.
Then convert array keys into variables like $some_tag, and put them into array.
So, this is how far I got:
//Some code goes here
$some_tag = "Is defined somewhere else.";
$different_tag = 1 + $something;
// Some text {SOME_TAG} appears in a different file, whose contents have been read earlier.
//Some code goes here
preg_match_all('/{\w+}/', $strings, $search);
$search = str_replace(str_split('{}'), "", $search[0]);
$search = array_change_key_case(array_flip($search), CASE_LOWER);
// ...some code missing here, which I can't figure out.
// The replace array should look something like this:
$replace = array($some_tag, $different_tag);
//Then comes replacing code and output blah blah blah..
How to make array $replace contain variables dynamically depending on $search array?

Why not something along the lines of:
<?php
$replace = array(
'{TAG_1}' => 'hello',
'{TAG_2}' => 'world',
'{TAG_3}' => '!'
);
$myString = '{TAG_1} {TAG_2}{TAG_3}{TAG_3}';
echo str_replace(array_keys($replace), array_values($replace), $myString);

If I understand correctly:
You're working on trying to create a customizable document, using {TAGS} in order to represent replaceable areas that can be filled in with dynamic information. At some point in time while replacing the {TAGS} with the dynamic information, you want the dynamic information to be stored in automatically generated basic variable names, as $tags.
I'm not sure why you want to convert these tags to basic variables instead of using them entirely as array keys. I would like to point out that this represents a security or functionality hole - what happens if someone puts {REPLACE} in as a tag in your document? Your replace array would get overwritten with dynamic data, and your whole program would fall apart. Either that, or the whole replace array would get dumped in for {REPLACE}, making for a very messy document with perhaps data you don't WANT them to have in it. Perhaps you have this dealt with - I don't have all the context here - but I thought I'd point out the risk factor.
As for a better solution, unless there's some specific need that you're addressing by going through $tags instead of using the $replace array directly, I like @Emissary's answer.
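Following that advice, the values can live in an array keyed by tag name. preg_replace_callback() can then look up each tag as it is found, so no variable variables are needed. This is a minimal sketch: the tag names and values are hypothetical, and unknown tags are deliberately left untouched.

```php
<?php
// Hypothetical values, keyed by lower-case tag name.
$values = [
    'some_tag'      => 'Is defined somewhere else.',
    'different_tag' => 1 + 1,
];

$text = 'Some text {SOME_TAG} and {DIFFERENT_TAG}, but {UNKNOWN} stays.';

$result = preg_replace_callback(
    '/\{(\w+)\}/',
    function (array $m) use ($values): string {
        $key = strtolower($m[1]);
        // Replace known tags; otherwise keep the original {TAG} text.
        return array_key_exists($key, $values) ? (string) $values[$key] : $m[0];
    },
    $text
);

echo $result, "\n";
```

Lookups only ever read from the fixed $values array, so a tag such as {REPLACE} in the document cannot overwrite program variables.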

How to use ipinfodb to find country code only

I want to find the country code of my site visitor using the ipinfodb API.
When I try the following, http://api.ipinfodb.com/v3/ip-country/?key=<My_API_Key>&ip=<Some_IP>, it gives the following output:
OK;;<Some_IP>;US;UNITED STATES
How do I filter this output so that it shows only the country code?
Regards,
Timothy
EDIT:
In reply to Charles,
After searching on Google I found that the API accepts a format parameter set to XML, so the following works:
$xml = simplexml_load_file('http://api.ipinfodb.com/v3/ip-country/?key=<My_API>&format=xml&ip=<Some_IP>');
echo $xml->countryCode;
How can I get the same output without the XML argument ?
Thanks

OK;;<Some_IP>;US;UNITED STATES
How do I filter this output so that it shows only the country code?
I find it curious that you'd be able to invoke SimpleXML, but didn't think of explode, which turns a string into an array by splitting it on the given delimiter. In this case, we need to explode on ;:
$string_data = 'OK;;127.0.0.1;US;UNITED STATES';
$sploded_data = explode(';', $string_data);
print_r($sploded_data);
echo "\nCountry code: ", $sploded_data[3], "\n";
This should emit:
Array
(
[0] => OK
[1] =>
[2] => 127.0.0.1
[3] => US
[4] => UNITED STATES
)
Country code: US
You may wish to review the rest of the string manipulation functions to see if there's any other interesting things you may have missed.
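To make this a little more defensive, you could check the status field before trusting the country code. This is a minimal sketch. It assumes the semicolon-separated layout shown above: status, an empty second field, IP, country code, country name. The function name is hypothetical.

```php
<?php
// Hypothetical helper: return the country code from an ip-country response
// such as "OK;;127.0.0.1;US;UNITED STATES", or null if it cannot be used.
function country_code_from_response(string $response): ?string
{
    $fields = explode(';', trim($response));
    // Expect at least five fields and an "OK" status in the first one.
    if (count($fields) < 5 || $fields[0] !== 'OK') {
        return null;
    }
    return $fields[3] !== '' ? $fields[3] : null;
}

echo country_code_from_response('OK;;127.0.0.1;US;UNITED STATES'), "\n";
```

In practice the argument would be the API response body, for example file_get_contents() on the ip-country URL with your own key and IP filled in.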

The above answer isn't particularly helpful, because the server output is only static if you explicitly specify the same IP address every time. If you hardcode a location request for a particular IP address, it's going to be the same every time. What's the point?
Do this instead
Install the PHP class for their API.
Play around with the PHP sample found here; save it as its own webpage and observe the results. Let's call it "userlocation.php." Please note that the fields will be null if you load from localhost.
Okay, the trick to parsing this output is array_values(). It took me forever to figure this out, but eventually I stumbled upon it.
So...
// $n is a placeholder for the numeric index of the field you want
$locations = array_values($locations);
echo $locations[$n], "<br />\n";
echo $locations[$n + 1], "<br />\n";
// ...and so on
Whatever element you need, you can get in this way -- and it's dynamic. The code will return whatever country pertains to the user's IP address.
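array_values() discards an array's keys and renumbers the elements from 0, which is why numeric indexes work afterwards. Here is a minimal sketch with a hypothetical associative array. Apart from countryCode, which appears in the XML output above, the keys are illustrative and not necessarily what the ipinfodb class returns. If you know the key names, reading them directly is less fragile than relying on position.

```php
<?php
// Hypothetical result shaped like an associative location array.
$locations = [
    'statusCode'  => 'OK',
    'ipAddress'   => '127.0.0.1',
    'countryCode' => 'US',
    'countryName' => 'UNITED STATES',
];

// Re-index: keys become 0, 1, 2, ... in the original order.
$indexed = array_values($locations);

echo $indexed[2], "<br />\n";               // position-based access
echo $locations['countryCode'], "<br />\n"; // key-based access, same value
```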
One last note. Take care to paste your API key into the class file and not into userlocation.php. The class file variables are protected, which is a good thing.
Anyway, I'm no expert; just thought I'd share what I've learned.

Regex not finding all variables

I'm parsing some HTML that I have generated in a form. This is a token system. I'm trying to get the information from the regexp later on, but somehow it's turning up only the first of the matches. I found a regexp on the Web that did almost what I needed, except that it couldn't process multiple occurrences.
I want to be able to replace the content found, with content that was generated from the found string.
So, here is my code:
$result = preg_replace_callback("/<\/?\w+((\s+(\w|\w[\w-]*\w)(\s*=\s*(?:\".*?\"|'.*?'|[^'\">\s]+))?)+\s*|\s*)\/?>\[\*.*\*\]\<\/[a]\>/i", array(get_class($this), 'embed_video'), $str);
public function embed_video($matches)
{
print_r($matches);
return $matches[1] . 'foo';
}
I really need only the attributes, since they contain all of the valuable information. The contents of the tag are used only to find the token. This is an example of what needs to happen:
<a type="TypeOfToken1" id="IdOfToken1">[*SomeTokenTitle1*]</a>
<a type="TypeOfToken2" id="IdOfToken2">[*SomeTokenTitle2*]</a>
After the preg_replace_callback() this should be returned:
type="TypeOfToken1" id="IdOfToken1" type="TypeOfToken2" id="IdOfToken2"
But although the callback function outputs the matches, it does not replace them with its return value. So $result stays the same after preg_replace_callback(). What could be the problem?
An example with real data:
Input:
<p><a id="someToken1" rel="someToken1">[*someToken1*]</a> sdfsdf <a id="someToken2" rel="someToken2">[*someToken2*]</a></p>
returned $result:
id="someToken1" rel="someToken1"foo
Output of the print_r() in the callback function:
Array ( [0] => [*someToken1*] sdfsdf [*someToken2*] [1] => id="someToken1" rel="someToken1" [2] => rel="someToken1" [3] => rel [4] => ="someToken1" )
I think that it is not returning both of the strings it should.

For anyone else stumbling into a problem like this, try checking your regexp and its modifiers.
Regarding the parsing of the document, I'm still doing it, just not with HTML tags. I have instead gone with something more text-like that can be parsed more easily. In my case: [*TokenName::TokenDetails*].
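The print_r() output in the question fits a greedy quantifier. \[\*.*\*\] runs from the first [* to the last *], so both tokens collapse into a single match. This is a minimal sketch with a simpler, swapped-in pattern rather than the original one. It uses a lazy .*? and captures the attribute string. It assumes the tokens are plain <a ...>[*...*]</a> elements with no ">" inside attribute values. The U modifier, which makes quantifiers ungreedy by default, is an alternative fix.

```php
<?php
// Capture the attribute string of each <a ...>[*...*]</a> token.
// The lazy .*? stops at the first "*]" instead of the last one.
$pattern = '/<a\s+([^>]*)>\[\*.*?\*\]<\/a>/i';

$str = '<p><a id="someToken1" rel="someToken1">[*someToken1*]</a> sdfsdf '
     . '<a id="someToken2" rel="someToken2">[*someToken2*]</a></p>';

$result = preg_replace_callback(
    $pattern,
    function (array $m): string {
        return $m[1]; // keep only the attributes of each token
    },
    $str
);

// Escape so the surrounding <p> tags are visible when viewed in a browser.
echo htmlspecialchars($result), "\n";
```

Inside a class, the callback can also be given as array($this, 'embed_video').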

We Keep Coding
