preg_match returns an empty string even when there is a match - php

I am trying to extract all meta tags from a web page. I am currently using preg_match_all for that, but it returns empty strings for the array indexes.
<?php
$meta_tag_pattern = '/<meta(?:"[^"]*"[\'"]*|\'[^\']*\'[\'"]*|[^\'">])+>/';
$meta_url = file_get_contents('test.html');
if(preg_match_all($meta_tag_pattern, $meta_url, $matches) == 1)
echo "there is a match <br>";
print_r($matches);
?>
Returned array:
Array ( [0] => Array ( [0] => [1] => [2] => [3] => ) ) Array ( [0] => Array ( [0] => [1] => [2] => [3] => ) )

An example with DOMDocument:
$url = 'test.html';
$dom = new DOMDocument();
$dom->loadHTMLFile($url);
$metas = $dom->getElementsByTagName('meta');
foreach ($metas as $meta) {
echo htmlspecialchars($dom->saveHTML($meta));
}
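If the goal is the attribute values rather than the raw tag markup, the same DOMDocument approach can read them directly. The following is a minimal sketch, assuming the HTML is already in a string; the sample markup and the $tags array are made up for illustration.

```php
<?php
// Hypothetical sample markup standing in for the contents of test.html.
$html = '<html><head>'
      . '<meta name="description" content="Demo page">'
      . '<meta property="og:type" content="website">'
      . '</head><body></body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$tags = [];
foreach ($dom->getElementsByTagName('meta') as $meta) {
    // getAttribute() returns an empty string when the attribute is missing.
    $key = $meta->getAttribute('name') ?: $meta->getAttribute('property');
    if ($key !== '') {
        $tags[$key] = $meta->getAttribute('content');
    }
}

print_r($tags);
```

Meta tags that carry neither a name nor a property attribute are skipped in this sketch.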

UPDATED: Example grabbing meta tags from URL:
$meta_tag_pattern = '/<meta\s[^>]+>/';
$meta_url = file_get_contents('http://stackoverflow.com/questions/10551116/html-php-escape-and-symbols-while-echoing');
if(preg_match_all($meta_tag_pattern, $meta_url, $matches))
echo "there is a match <br>";
foreach ( $matches[0] as $value ) {
print htmlentities($value) . '<br>';
}
Outputs:
there is a match
<meta name="twitter:card" content="summary">
<meta name="twitter:domain" content="stackoverflow.com"/>
<meta name="og:type" content="website" />
...
Part of the problem appears to be that the browser interprets the matched meta tags as markup instead of displaying them as text when you print_r the output, so they need to be escaped first.
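To see the whole $matches array in the browser, one option is to capture the print_r output as a string and escape it before echoing. This is a small sketch under the assumption that the page content is already in a string; the sample HTML is invented for the example.

```php
<?php
// Hypothetical sample input standing in for file_get_contents('test.html').
$html = '<html><head><meta name="a" content="b"><meta charset="utf-8"></head></html>';

preg_match_all('/<meta\s[^>]+>/i', $html, $matches);

// print_r() with true as the second argument returns the dump instead of printing it.
$escaped = htmlspecialchars(print_r($matches, true));

// <pre> keeps the array layout readable in the browser.
echo '<pre>' . $escaped . '</pre>';
```

Viewing the page source or running the script from the command line shows the unescaped matches as well.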

Related

Preg Match Regular expression

What will the expression be for preg_match_all to get the content part of this string:
< meta itemprop="url" content="whatever content is here" >
So far I have tried:
preg_match_all('| < meta itemprop=/"url/" content=/"(.*?)/" > |',$input,$outputArray);
Try this expression:
<?php
$input = '< meta itemprop="url" content="whatever content is here" >';
preg_match_all('/content="(.*?)"/',$input,$outputArray);
print_r($outputArray);
?>
Output
Array
(
[0] => Array
(
[0] => content="whatever content is here"
)
[1] => Array
(
[0] => whatever content is here
)
)
Edit
If you want to fetch the content of only the tag with itemprop="url", restrict the match to that tag by changing the regex to
preg_match_all('/itemprop="url"[^>]*content="(.*?)"/',$input,$outputArray);

PHP: preg_match_all() - divide an array

I have the following code:
$str = '{"ok1", "ok2"},
{"ok3", "ok4"},
{"ok5", "ok6"}';
preg_match_all('/"([^"]*)"/', $str, $matches);
print_r($matches[1]);
which outputs this:
Array ( [0] => ok1 [1] => ok2 [2] => ok3 [3] => ok4 [4] => ok5 [5] => ok6 )
It works perfectly, but I want to split the result into array1, array2 and array3, so that the array is divided according to the values inside each {} group,
i.e.
`array1` will be `array("ok1", "ok2")`;
`array2` will be `array("ok3", "ok4")`;
`array3` will be `array("ok5", "ok6")`;
Kind of an overkill, but you could indeed achieve it with two regular expressions as well (if this is not some JSON code):
<?php
$string = '{"ok1", "ok2"}, {"ok3", "ok4"}, {"ok5", "ok6"}';
$regex = '~(?<=}),\s~';
$result = array();
$parts = preg_split($regex, $string);
foreach ($parts as $part) {
preg_match_all('~"(?<values>[^"]+)"~', $part, $elements);
$result[] = $elements["values"];
}
echo $result[0][1]; // ok2
?>
Jan's answer is very good and I am only posting mine here as a different way to approach the problem using regex - not to take away from his answer.
If you had a string like this:
$output_array = array();
$str = '{"ok1", "ok2", "ok9", "ok11"},
{"ok3", "ok4"},
{"ok5", "ok6", "ok99"}';
Then you could look for all sets of curly braces and store those into an array:
preg_match_all('~\{.*?\}~', $str, $matches);
Finally, just loop through each set of braces and match each set of data appearing in quotation marks. Then add those matches to your output array.
foreach ($matches[0] AS $set) {
preg_match_all('~".*?"~', $set, $set_matches);
$output_array[] = $set_matches[0];
}
print_r($output_array);
That will give you an array like this:
Array
(
[0] => Array
(
[0] => "ok1"
[1] => "ok2"
[2] => "ok9"
[3] => "ok11"
)
[1] => Array
(
[0] => "ok3"
[1] => "ok4"
)
[2] => Array
(
[0] => "ok5"
[1] => "ok6"
[2] => "ok99"
)
)
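If the surrounding quotation marks are not wanted in the result, capture groups can be used for both the braces and the quoted values. This is only a sketch of that variation, using the string from the question; the $groups and $result names are chosen for the example.

```php
<?php
$str = '{"ok1", "ok2"},
{"ok3", "ok4"},
{"ok5", "ok6"}';

// Group 1 holds the text between each pair of curly braces.
preg_match_all('~\{([^}]*)\}~', $str, $groups);

// For each brace group, collect the quoted values without their quotes.
$result = array_map(function (string $inner): array {
    preg_match_all('~"([^"]*)"~', $inner, $m);
    return $m[1];
}, $groups[1]);

print_r($result);
```

Here $result[0] corresponds to array1, $result[1] to array2, and so on.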

php Regular expressions basic

I want to extract the thread_id from this HTML code using a regex in PHP:
<a id="change_view" class="f_right" href="pm.php?view=chat&thread_id=462075438105382912">
I wrote this code, but it returns an empty array:
$success = preg_match_all('/pm\.php\?view=chat&thread_id=([^"]+)/', $con, $match2);
Is there a problem in my PHP code?
Well, you said it gives you an empty array, but it does not. Here is what print_r() shows:
Array
(
[0] => Array
(
[0] => pm.php?view=chat&thread_id=462075438105382912
)
[1] => Array
(
[0] => 462075438105382912
)
)
However, the full match in [0] contains more than the ID. To match only the string that comes after thread_id= and before & or ", use:
/(?<=thread_id=).*?(?=["&])/
Working example:
<?php
$con = '<a id="change_view" class="f_right" href="pm.php?view=chat&thread_id=462075438105382912">link</a>';
// The lazy .*? stops at the first " or & that follows thread_id=
preg_match_all('/(?<=thread_id=).*?(?=["&])/', $con, $arr);
echo "<pre>";
print_r($arr);
echo "</pre>";
?>
Output :
Array
(
[0] => Array
(
[0] => 462075438105382912
)
)
If you're only looking for the thread_id, this should do it (the digits end up in $match2[2]):
$success = preg_match_all('/(thread_id=)([\d]+)/', $con, $match2);
Or, for a single match:
if (preg_match('/thread_id=([0-9]+)/', $line, $matches))
$thread_id = $matches[1];
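A regex is not the only option here. Once the href value has been extracted, PHP's URL functions can split the query string without any pattern. This is a sketch of that different technique, assuming the href is already available as a plain string; the $href and $params names are illustrative.

```php
<?php
// Assumed: the href attribute value, already pulled out of the <a> tag.
$href = 'pm.php?view=chat&thread_id=462075438105382912';

// parse_url() with PHP_URL_QUERY returns only the part after the "?".
$query = parse_url($href, PHP_URL_QUERY);

// parse_str() turns "a=1&b=2" into an associative array of strings.
parse_str((string) $query, $params);

$threadId = $params['thread_id'] ?? null;
echo $threadId;
```

The ID stays a string, which avoids any precision concerns with such a large number.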

How can I extract value from an xml

I'm new to PHP. I'm trying to get data out of the XML below. In my code, $data->Address contains the following value:
$data->Address = "<tolist></tolist>
<cclist>
<cc>
<contactpersonname>niraj</contactpersonname>
<name>niraj</name>
<email>stgh#gmail.com</email>
<number>+91.3212365212</number>
<prefix>Ms.</prefix>
<contactpersonprefix>Ms.</contactpersonprefix>
</cc>
<cc>
<contactpersonname>fdg</contactpersonname>
<name>admin</name>
<email>admin12#gmail.com</email>
<number>+91.4554343234</number>
<prefix>Mr.</prefix>
<contactpersonprefix>Mr.</contactpersonprefix>
</cc>
</cclist>";
Now I want to extract the <contactpersonname> tag and print it. How can I do this?
Since your XML is missing a tag that encompasses all others, you need to create one in order to get parsers to work properly:
<?php
$buffer = "<tolist></tolist>
<cclist>
<cc>
<contactpersonname>niraj</contactpersonname>
<name>niraj</name>
<email>stgh#gmail.com</email>
<number>+91.3212365212</number>
<prefix>Ms.</prefix>
<contactpersonprefix>Ms.</contactpersonprefix>
</cc>
<cc>
<contactpersonname>fdg</contactpersonname>
<name>admin</name>
<email>admin12#gmail.com</email>
<number>+91.4554343234</number>
<prefix>Mr.</prefix>
<contactpersonprefix>Mr.</contactpersonprefix>
</cc>
</cclist>";
// ***** wrap the whole thing in a <root> tag...
$xml = simplexml_load_string("<root>".$buffer."</root>");
$array = json_decode(json_encode((array) $xml), 1);
echo "<pre>";
print_r($array);
echo "</pre>";
?>
Result:
Array
(
[tolist] => Array
(
)
[cclist] => Array
(
[cc] => Array
(
[0] => Array
(
[contactpersonname] => niraj
[name] => niraj
[email] => stgh#gmail.com
[number] => +91.3212365212
[prefix] => Ms.
[contactpersonprefix] => Ms.
)
[1] => Array
(
[contactpersonname] => fdg
[name] => admin
[email] => admin12#gmail.com
[number] => +91.4554343234
[prefix] => Mr.
[contactpersonprefix] => Mr.
)
)
)
)
UPDATED
Now you can navigate down to where you want to go with
echo "<pre>";
$ccList = $array['cclist'];
$cc = $ccList['cc'];
$contacts = array();
foreach($cc as $i=>$val) {
$contacts[$i]=$val['contactpersonname'];
}
echo "first contact: " . $contacts[0] . "<br>";
echo "second contact: " . $contacts[1] ."<br>";
Result:
first contact: niraj
second contact: fdg
You can convert the XML to an array with the following code:
$xml = simplexml_load_string('<root>' . $buffer . '</root>');
$array = json_decode(json_encode((array) $xml), 1);
Where $buffer is the XML string, wrapped in a single root element because the fragment has none.
Then you can obtain the first person's name as follows:
$name = $array['cclist']['cc'][0]['contactpersonname'];
It's a quick and dirty method to convert the XML to an array, but it works.
Try this (the fragment needs a single root element before it can be parsed):
$xml = new SimpleXMLElement('<root>' . $string . '</root>');
$results = $xml->xpath('cclist/cc/contactpersonname');
http://php.net/manual/en/simplexmlelement.xpath.php
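The XPath approach can be sketched end to end as follows. This assumes a shortened version of the fragment from the question, wrapped in a made-up root element so that it parses; the $fragment and $names variables are illustrative.

```php
<?php
// Shortened copy of the fragment from the question (no single root element).
$fragment = '<tolist></tolist>'
          . '<cclist>'
          . '<cc><contactpersonname>niraj</contactpersonname><name>niraj</name></cc>'
          . '<cc><contactpersonname>fdg</contactpersonname><name>admin</name></cc>'
          . '</cclist>';

// Wrap the fragment so the parser sees well-formed XML.
$xml = new SimpleXMLElement('<root>' . $fragment . '</root>');

// xpath() returns an array of SimpleXMLElement nodes; cast each to a string.
$names = array_map('strval', $xml->xpath('cclist/cc/contactpersonname'));

foreach ($names as $name) {
    echo $name, "\n";
}
```

Unlike the array conversion, this returns every matching element in one call, however many <cc> entries there are.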
$xml = simplexml_load_file("note.xml");
echo $xml->contactpersonname;
This requires you to load it from an XML file. If you already have the string in your code, a regex is an option, provided you know the data will never be malformed:
$pattern = '#<contactpersonname>(.*?)</contactpersonname>#';
preg_match_all($pattern, $data->Address, $matches);
print_r($matches[1]);

Match String and Get Variable PHP

I would like to use a placeholder such as '{is-id-1}' in a text to pull a certain piece of data from a database. Basically, if '{is-id-*}' is found in the text, I want to get the ID (e.g. for {is-id-1} the ID is 1) so that I can run some code with that ID.
So far I can get the ID from the braces, but I'm not sure how to replace the placeholder within the text.
<?php
$text = "test 1 dfhjsdh sdjkfhksdhfkj skjh {is-id-1} sdfhskdfh sdfsdjfhksd fjksdfhksd {is-id-2}";
preg_match_all('/{is-id-+(.*?)}/',$text, $matches);
print_r ($matches);
$replacewiththis = "this has been replaced, it was id: " . $idhere;
$text = preg_replace('/{is-id-+(.*?)}/', $replacewiththis, $text);
echo $text;
?>
The Array for the matches outputs:
Array (
[0] => Array (
[0] => {is-id-1}
[1] => {is-id-2}
)
[1] => Array (
[0] => 1
[1] => 2
)
)
I'm stuck now and not sure how to process each of the placeholders. Can anyone give me a hand?
Thanks.
I am not sure I understood well what you want, but I think this is it:
foreach($matches[1] as $match){
$replacewiththis = "this has been replaced, it was id: $match";
$text=str_replace('{is-id-'.$match.'}', $replacewiththis, $text);
}
echo $text;
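Another way to handle each placeholder is preg_replace_callback, which passes every match to a function so the replacement can depend on the captured ID. The sketch below assumes numeric IDs and uses a hypothetical $titles array in place of the database lookup.

```php
<?php
$text = "intro {is-id-1} middle {is-id-2} end";

// Hypothetical stand-in for the database: ID => replacement text.
$titles = [1 => 'first record', 2 => 'second record'];

$result = preg_replace_callback(
    '/\{is-id-(\d+)\}/',
    function (array $m) use ($titles): string {
        $id = (int) $m[1];
        // Leave the placeholder untouched when the ID is unknown.
        return $titles[$id] ?? $m[0];
    },
    $text
);

echo $result;
```

The callback runs once per placeholder, so a real lookup could go where $titles is read.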
