How can I extract the number 12345 from the following string in PHP?
<span id="jordan934" itemprop="distance"><span class='WebDistance'>#$#20B9; </span>12345</span></h3>
The following worked as long as that '#$#20B9' string was not in the markup:
$results = $dom->query('#jordan934');
$distance = false;
if (count($results)) {
$distance = (int)trim($results->current()->textContent);
}
return $distance;
}
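A small sketch of why the cast above returns 0. It uses DOMDocument/DOMXPath in place of the query object from the question (an assumption, since $dom is not shown): textContent joins the text of every descendant, so the currency prefix comes first and the (int) cast stops there.

```php
<?php
// Sample markup copied from the question.
$html = '<span id="jordan934" itemprop="distance"><span class=\'WebDistance\'>#$#20B9; </span>12345</span>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // silence warnings about imperfect HTML
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$span = $xpath->query('//span[@id="jordan934"]')->item(0);

// textContent includes the inner WebDistance span's text as well.
$text = trim($span->textContent);
var_dump($text);

// (int) only reads leading digits; the text starts with '#', so this is 0.
var_dump((int) $text);
```

Reading only the outer span's own text node, or matching the trailing digits, sidesteps the prefix.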
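Try using a regular expression on the span's text:
$str = trim($results->current()->textContent); // e.g. "#$#20B9; 12345"
preg_match('!\d+$!', $str, $matches);          // the digits at the end of the text
$distance = isset($matches[0]) ? (int)$matches[0] : false;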
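You could use a DOM object:
<?php
$html = '<span id="jordan934" itemprop="distance"><span class=\'WebDistance\'>#$#20B9; </span>12345</span></h3>';
$dom = new DomDocument();
libxml_use_internal_errors(true); // the stray </h3> would otherwise raise a warning
$dom->loadHTML($html);            // parse the string itself, not a file path
$finder = new DomXPath($dom);
$id = "jordan934";
// text() selects only the outer span's own text node, skipping the WebDistance span
$nodes = $finder->query("//span[@id='$id']/text()");
var_dump((int)trim($nodes->item(0)->nodeValue));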
Another, less widely known function is filter_var():
$str = 'jordan934';
$int = filter_var($str, FILTER_SANITIZE_NUMBER_INT);
http://php.net/manual/pl/function.filter-var.php
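A quick sketch of one caveat: FILTER_SANITIZE_NUMBER_INT removes everything except digits and the plus and minus signs, so it only gives the right answer when the string holds a single number. The second input is the question's span text; both inputs are just illustrations.

```php
<?php
// Keeps every digit plus "+" and "-", wherever they appear.
$fromId   = filter_var('jordan934', FILTER_SANITIZE_NUMBER_INT);
$fromText = filter_var('#$#20B9; 12345', FILTER_SANITIZE_NUMBER_INT);

var_dump($fromId);   // only one run of digits, so this is fine
var_dump($fromText); // the digits from the prefix get glued onto 12345
```

For the span text above, the DOM or regex approaches are safer.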
Related
How can I use a value from file_get_contents() in PHP as a number?
I want to use $val, which in this case is 50.0001, and add 20 to it so the result is 70.0001. But when I test it, it shows 0. Why, and how can I fix it?
<?php
$html = file_get_contents('https://www.example.com');
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$finder = new DomXPath($doc);
$node = $finder->query("//*[contains(@class, 'test')]");
$val = $doc->saveHTML($node->item(0));
$result = $val + 20;
echo $result;
?>
and the page at https://www.example.com contains:
<span class="test">50.0001</span>
Just get the textContent of the node and don't use saveHTML():
$val = $node->item(0)->textContent;
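A self-contained sketch of the difference, using the sample markup from the question instead of the remote page (fetching https://www.example.com is left out so it runs offline): saveHTML() returns the whole element as markup, while textContent returns just the number as a numeric string.

```php
<?php
// Sample markup from the question, standing in for file_get_contents().
$html = '<span class="test">50.0001</span>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$finder = new DOMXPath($doc);
$node = $finder->query("//*[contains(@class, 'test')]");

$markup = $doc->saveHTML($node->item(0)); // the element, tags included
$val = $node->item(0)->textContent;       // only the text inside the span
$result = (float) $val + 20;              // explicit cast for clarity
echo $result, "\n";
```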
I'm retrieving a remote page with PHP, getting a few links from that page and accessing each link and parsing it.
It takes about 12 seconds, which is far too long, so I need to optimize the code somehow.
My code looks something like this:
$result = get_web_page('THE_WEB_PAGE');
preg_match_all('/<a data-a="[^"]*" href="([^"]*)">/', $result['content'], $matches);
$re = array();
$index = 0;
foreach ($matches[1] as $lnk) { // group 1 holds the href
    $result = get_web_page($lnk);
    preg_match('/<span id="tests">(.*?)<\/span>/', $result['content'], $match);
    $re[$index]['test'] = $match[1];
    preg_match('/<span id="tests2">(.*?)<\/span>/', $result['content'], $match);
    $re[$index]['test2'] = $match[1];
    preg_match('/<span id="tests3">(.*?)<\/span>/', $result['content'], $match);
    $re[$index]['test3'] = $match[1];
    ++$index;
}
I have some more preg_match calls inside the loop.
How can I optimize my code?
Edit:
I've changed my code to use XPath instead of regex, and it became much slower.
Edit2:
Here is my full code:
<?php
$begin = microtime(TRUE);
$result = get_web_page('WEB_PAGE');
$dom = new DOMDocument();
$dom->loadHTML($result['content']);
$xpath = new DOMXPath($dom);
// Get the links
$matches = $xpath->evaluate('//li[@class = "lasts"]/a[@class = "lnk"]/@href | //li[@class=""]/a[@class = "lnk"]/@href');
if ($matches === FALSE) {
echo 'error';
exit();
}
foreach ($matches as $match) {
$links[] = 'WEB_PAGE'.$match->value;
}
$index = 0;
// For each link
foreach ($links as $link) {
echo (string)($index).' loop '.(string)(microtime(TRUE)-$begin).'<br>';
$result = get_web_page($link);
$dom = new DOMDocument();
$dom->loadHTML($result['content']);
$xpath = new DOMXPath($dom);
$match = $xpath->evaluate('concat(//span[@id = "header"]/span[@id = "sub_header"]/text(), //span[@id = "header"]/span[@id = "sub_header"]/following-sibling::text()[1])');
if ($match === FALSE) {
exit();
}
$data[$index]['name'] = $match;
$matches = $xpath->evaluate('//li[starts-with(@class, "active")]/a/text()');
if ($matches === FALSE) {
exit();
}
foreach ($matches as $match) {
$data[$index]['types'][] = $match->data;
}
$matches = $xpath->evaluate('//span[@title = "this is a title" and @class = "info"]/text()');
if ($matches === FALSE) {
exit();
}
foreach ($matches as $match) {
$data[$index]['info'][] = $match->data;
}
$matches = $xpath->evaluate('//span[@title = "this is another title" and @class = "name"]/text()');
if ($matches === FALSE) {
exit();
}
foreach ($matches as $match) {
$data[$index]['names'][] = $match->data;
}
++$index;
}
?>
As others mentioned, use a parser instead (i.e. DOMDocument) and combine it with XPath queries. Consider the following example:
<?php
# set up some dummy data
$data = <<<DATA
<div>
<a class='link'>Some link</a>
<a class='link' id='otherid'>Some link 2</a>
</div>
DATA;
$dom = new DOMDocument();
$dom->loadHTML($data);
$xpath = new DOMXPath($dom);
# all links
$links = $xpath->query("//a[@class = 'link']");
print_r($links);
# special id link
$special = $xpath->query("//a[@id = 'otherid']");
# and so on
$textlinks = $xpath->query("//a[starts-with(text(), 'Some')]");
?>
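print_r() on a DOMNodeList shows little more than its length, so here is a short sketch (same dummy data) that iterates the matches and reads their text. It also shows that evaluate(), unlike query(), can return a scalar such as a count.

```php
<?php
// Same dummy data as the example above.
$data = <<<DATA
<div>
<a class='link'>Some link</a>
<a class='link' id='otherid'>Some link 2</a>
</div>
DATA;

$dom = new DOMDocument();
$dom->loadHTML($data);
$xpath = new DOMXPath($dom);

// query() returns a DOMNodeList; iterate it to get at each element.
$texts = array();
foreach ($xpath->query("//a[@class = 'link']") as $a) {
    $texts[] = $a->textContent;
}
print_r($texts);

// evaluate() returns a number here, because count() yields one.
$count = $xpath->evaluate("count(//a[starts-with(text(), 'Some')])");
echo $count, "\n";
```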
Consider using a DOM library for PHP. This should be much faster.
Use PHP's DOMDocument with xpath queries:
http://php.net/manual/en/class.domdocument.php
See Jan's answer for more explanation.
The following also works but is less preferable, according to the comments.
For example:
http://simplehtmldom.sourceforge.net/
An example that gets all the a tags on a page:
<?php
include_once('simple_html_dom.php');
$url = "http://your_url/";
$html = new simple_html_dom();
$html->load_file($url);
foreach($html->find("a") as $link)
{
// do something with the link
}
?>
I want to remove all links whose href attribute matches the domain vnexpress.net.
This is a link example:
whatever
This is my code:
$contents = preg_replace('/<a\s*href=\"*vnexpress*\"\s(.*)>(.*)<\/a>/', '', $data->content);
Please help me! Thank you so much!
You've asked for a regular expression here, but it's not the right tool for parsing HTML.
$doc = new DOMDocument;
$doc->loadHTML($html); // load the html
$xpath = new DOMXPath($doc);
$links = $xpath->query("//a[contains(@href, 'vnexpress.net')]");
foreach ($links as $link) {
$link->parentNode->removeChild($link);
}
echo $doc->saveHTML();
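If you would rather keep the link text and only drop the a tag (which is what the regex answer below does with "$1"), one option is to move the link's children in front of it before removing it. This is a sketch on made-up sample HTML; it prints only the body's children because saveHTML() on the whole document also adds doctype, html and body wrappers.

```php
<?php
// Made-up sample with one matching and one non-matching link.
$html = '<p><a href="http://vnexpress.net/whatever">whatever <b>sss</b></a> '
      . '<a href="http://new.net/whatever">keep me</a></p>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
foreach ($xpath->query("//a[contains(@href, 'vnexpress.net')]") as $link) {
    // Move every child in front of the link, then drop the now-empty <a>.
    while ($link->firstChild) {
        $link->parentNode->insertBefore($link->firstChild, $link);
    }
    $link->parentNode->removeChild($link);
}

// Serialize only what is inside <body>.
$body = $doc->getElementsByTagName('body')->item(0);
$out = '';
foreach ($body->childNodes as $child) {
    $out .= $doc->saveHTML($child);
}
echo $out, "\n";
```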
Try this:
$re = "/<a[^>]+href=\"[^\"]*vnexpress\\.net[^>]+>(.*?)<\\/a>/m";
$str = "<a id=\"\" href=\"http://vnexpress.net/whatever\">whatever <b>sss</b> </a>\n<a id=\"\" href=\"http://new.net/whatever\">whatever</a>\n";
$subst = "$1";
$result = preg_replace($re, $subst, $str);
I found this function on Snipplr which grabs a div with a certain attribute. I tried to use it, but it didn't work. Is there something wrong with the way I'm using it?
http://snipplr.com/view.php?codeview&id=20987
function get_tag( $attr, $value, $xml, $tag=null ) {
if( is_null($tag) )
$tag = '\w+';
else
$tag = preg_quote($tag);
$attr = preg_quote($attr);
$value = preg_quote($value);
$tag_regex = "/<(".$tag.")[^>]*$attr\s*=\s*".
"(['\"])$value\\2[^>]*>(.*?)<\/\\1>/";
preg_match_all($tag_regex,
$xml,
$matches,
PREG_PATTERN_ORDER);
return $matches[3];
}
I changed it to work with a URL, like this:
function get_tag( $attr, $value, $page, $tag=null ) {
if( is_null($tag) )
$tag = '\w+';
else
$tag = preg_quote($tag);
$attr = preg_quote($attr);
$value = preg_quote($value);
$tag_regex = "/<(".$tag.")[^>]*$attr\s*=\s*".
"(['\"])$value\\2[^>]*>(.*?)<\/\\1>/";
$page = file_get_contents($page);
preg_match_all($tag_regex,
$page,
$matches,
PREG_PATTERN_ORDER);
return $matches[3];
}
get_tag("class","weather","http://www.masrawy.com","div");
How can I use this correctly?
Don't use a regex for this. Use something that can parse and query the DOM, like DOMDocument, Zend_Dom_Query or SimpleHTMLDOM.
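To see why the regex is fragile, here is a sketch that runs the question's get_tag() (copied as posted, with the missing semicolon added) against three made-up inputs: a nested div, content spread over several lines, and a div with more than one class.

```php
<?php
// get_tag() from the question, unchanged apart from the added semicolon.
function get_tag($attr, $value, $xml, $tag = null) {
    if (is_null($tag))
        $tag = '\w+';
    else
        $tag = preg_quote($tag);
    $attr = preg_quote($attr);
    $value = preg_quote($value);
    $tag_regex = "/<(".$tag.")[^>]*$attr\s*=\s*".
                 "(['\"])$value\\2[^>]*>(.*?)<\/\\1>/";
    preg_match_all($tag_regex, $xml, $matches, PREG_PATTERN_ORDER);
    return $matches[3];
}

// 1) Nested div: the lazy (.*?) stops at the first </div>.
$nested = '<div class="weather"><div>Cairo</div> 25C</div>';
print_r(get_tag('class', 'weather', $nested, 'div'));

// 2) Several lines: without the /s modifier "." does not match a newline,
//    so nothing is found.
$multiline = "<div class=\"weather\">\nCairo 25C\n</div>";
print_r(get_tag('class', 'weather', $multiline, 'div'));

// 3) Several classes: the attribute value must be exactly "weather".
$classes = '<div class="box weather">Cairo</div>';
print_r(get_tag('class', 'weather', $classes, 'div'));
```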
DOMDocument example:
$dom = new DomDocument();
$html = file_get_contents('http://www.masrawy.com');
$dom->loadHTML($html);
$finder = new DomXPath($dom);
$classname="weather";
$nodes = $finder->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");
$extracted = array();
foreach($nodes as $element)
{
// convert to html string
$extracted[] = $element->ownerDocument->saveXML($element);
}
// now iterate over extracted and output...
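A sketch of why the query wraps @class in concat() and normalize-space(): it matches "weather" as a whole class name, while a plain contains(@class, 'weather') would also match something like "weather-widget". The HTML here is made-up sample data, not the masrawy.com page.

```php
<?php
// Made-up sample: two divs carry the "weather" class, one only looks similar.
$html = '<div class="box weather">A</div>'
      . '<div class="weather-widget">B</div>'
      . '<div class="weather">C</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);
$classname = 'weather';

// Whole-token match: pad the class list with spaces and look for " weather ".
$token = $finder->query(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]"
);
// Plain substring match, for comparison.
$substring = $finder->query("//div[contains(@class, '$classname')]");

echo $token->length, ' ', $substring->length, "\n";
```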
A Zend_Dom_Query example:
$html = file_get_contents("http://www.masrawy.com");
$dom = new Zend_Dom_Query($html);
$results = $dom->query('div.weather');
$extracted = array();
foreach($results as $element)
{
// convert to html string
$extracted[] = $element->ownerDocument->saveXML($element);
}
// now iterate over extracted and output...
I currently have the following code:
$content = "
<name>Manufacturer</name><value>John Deere</value><name>Year</name><value>2001</value><name>Location</name><value>NSW</value><name>Hours</name><value>6320</value>";
I need a way to create an array of name => value pairs, e.g. Manufacturer => John Deere.
Can anyone help me with a simple code snippet? I tried some regex, but it doesn't even manage to extract the names or values, e.g.:
$pattern = "/<name>Manufacturer<\/name><value>(.*)<\/value>/";
preg_match_all($pattern, $content, $matches);
$st_selval = $matches[1][0];
You don't want to use regex for this. Try something like SimpleXML.
EDIT
Well, why don't you start with this:
<?php
$content = "<root>" . $content . "</root>";
$xml = new SimpleXMLElement($content);
print_r($xml);
?>
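Continuing from there, one way to build the name => value array with SimpleXML is to collect all name elements and all value elements and combine them. This sketch assumes the two always alternate one-to-one, as in the sample; array_combine() fails if the counts differ.

```php
<?php
$content = "<name>Manufacturer</name><value>John Deere</value><name>Year</name><value>2001</value><name>Location</name><value>NSW</value><name>Hours</name><value>6320</value>";

// Wrap in a root element so the fragment is well-formed XML.
$xml = new SimpleXMLElement('<root>' . $content . '</root>');

// xpath() relative to <root> returns arrays of SimpleXMLElement objects.
$names  = $xml->xpath('name');
$values = $xml->xpath('value');

// strval() turns each element into its text; pair them up by position.
$arr = array_combine(array_map('strval', $names), array_map('strval', $values));
print_r($arr);
```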
EDIT 2
Although some of the answers posted using regular expressions MAY work, you should get into the habit of using the correct tool for the job, and regular expressions are not the correct tool for parsing XML.
I'm using your $content variable:
$preg1 = preg_match_all('#<name>([^<]+)#', $content, $name_arr);
$preg2 = preg_match_all('#<value>([^<]+)#', $content, $val_arr);
$array = array_combine($name_arr[1], $val_arr[1]);
This is rather simple and can be solved with a regex:
$name = '<name>\s*([^<]+)</name>\s*';
$value = '<value>\s*([^<]+)</value>\s*';
$pattern = "|$name$value|"; # no literal space: the tags in $content are adjacent
preg_match_all($pattern, $content, $matches);
# create hash
$stuff = array_combine($matches[1], $matches[2]);
# display
var_dump($stuff);
Regards
rbo
First of all, never use regex to parse XML...
You could do this with an XPath query...
First, wrap the content in a root tag to make the parser happy (if it doesn't already have it):
$content = '<root>' . $content . '</root>';
Then, load the document
$dom = new DomDocument();
$dom->loadXml($content);
Then, initialize the XPATH
$xpath = new DomXpath($dom);
Write your query:
$xpathQuery = 'string(//name[text()="Manufacturer"]/following-sibling::value[1])';
Then, execute it:
$manufacturer = $xpath->evaluate($xpathQuery);
If I did the XPath right, $manufacturer should be John Deere...
You can see the docs on DomXpath, a basic primer on XPath, and a bunch of XPath examples...
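Edit: PHP's XPath engine (libxml2, XPath 1.0) does support the following-sibling axis, so the axis itself is fine as long as it is spelled correctly. Alternatively, you can skip the axis and step to the next sibling yourself: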
$xpathQuery = '//name[text()="Manufacturer"]';
$elements = $xpath->query($xpathQuery);
$manufacturer = $elements->item(0)->nextSibling->nodeValue;
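A sketch that builds the whole name => value array with DOMXPath, using the following-sibling axis relative to each name element (evaluate() accepts the context node as its second argument). It assumes each name is directly followed by its value, as in the sample.

```php
<?php
$content = '<name>Manufacturer</name><value>John Deere</value><name>Year</name><value>2001</value><name>Location</name><value>NSW</value><name>Hours</name><value>6320</value>';

$dom = new DOMDocument();
$dom->loadXML('<root>' . $content . '</root>');
$xpath = new DOMXPath($dom);

$arr = array();
foreach ($xpath->query('/root/name') as $name) {
    // string() turns the first following <value> into plain text.
    $arr[$name->textContent] = $xpath->evaluate('string(following-sibling::value[1])', $name);
}
print_r($arr);
```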
I think this is what you're looking for:
<?php
$content = "<name>Manufacturer</name><value>John Deere</value><name>Year</name><value>2001</value><name>Location</name><value>NSW</value><name>Hours</name><value>6320</value>";
$pattern = '#<name>([^<]*)</name><value>([^<]*)</value>#'; // [^<]* also allows spaces, e.g. "John Deere"
preg_match_all($pattern, $content, $matches);
$arr = array();
$count = count($matches[1]); // number of name/value pairs found
for ($i = 0; $i < $count; $i++) {
    $arr[$matches[1][$i]] = $matches[2][$i];
}
/* This is an example on how to use it */
echo "Location: " . $arr["Location"] . "<br><br>";
/* This is the array */
print_r($arr);
?>
If your array has a lot of elements, don't call count() in the for-loop condition; calculate the value once beforehand and use that instead.
My PHP may be off (I'll edit it if so), but here's some PHP (pseudo-)code to give some direction:
$pattern = '|<name>([^<]*)</name>\s*<value>([^<]*)</value>|';
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);
$arr = array();
for ($i = 0; $i < count($matches); $i++) {
    $arr[$matches[$i][1]] = $matches[$i][2];
}
$arr is the array that holds the name/value pairs.
Using XMLReader:
$content = '<name>Manufacturer</name><value>John Deere</value><name>Year</name><value>2001</value><name>Location</name><value>NSW</value><name>Hours</name><value>6320</value>';
$content = '<content>' . $content . '</content>';
$output = array();
$reader = new XMLReader();
$reader->XML($content);
$currentKey = null;
$currentValue = null;
while ($reader->read()) {
switch ($reader->name) {
case 'name':
$reader->read();
$currentKey = $reader->value;
$reader->read();
break;
case 'value':
$reader->read();
$currentValue = $reader->value;
$reader->read();
break;
}
if (isset($currentKey) && isset($currentValue)) {
$output[$currentKey] = $currentValue;
$currentKey = null;
$currentValue = null;
}
}
print_r($output);
The output is:
Array
(
[Manufacturer] => John Deere
[Year] => 2001
[Location] => NSW
[Hours] => 6320
)