How to fetch html string of XPath results? - php

Considering this code:
<div class="a">foo</div>
<div class="a"><div id="1">bar</div></div>
If I want to fetch all the values of divs with class a, I'll do the following query:
$q = $xpath->query('//div[@class="a"]');
However, I'll get this result:
foo
bar
But I want to get the actual value including the children tags. So it'll look like that:
foo
<div id="1">bar</div>
How can I accomplish that with XPath and DOMDocument only?
Solved by the function provided here.

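Note that the nodeValue property of a PHP DOM node does not act like .innerHTML in a browser: it returns only the text content of the node and its descendants (like .textContent), which is exactly the "foo" / "bar" result shown in the question. Once you've used XPath to get the node you want, serialize each of its child nodes with $node->ownerDocument->saveHTML($child) and concatenate the results to get the inner HTML.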

You can try to use
$xml = '<?xml version=\'1.0\' encoding=\'UTF-8\' ?>
<root>
<div class="a">foo</div>
<div class="a"><div id="1">bar</div></div>
</root>';
$xml = simplexml_load_string($xml);
var_dump($xml->xpath('//div[@class="a"]'));
But in this case you will have to iterate objects.
Output:
array(2) {
[0]=>
object(SimpleXMLElement)#2 (2) {
["@attributes"]=>
array(1) {
["class"]=>
string(1) "a"
}
[0]=>
string(3) "foo"
}
[1]=>
object(SimpleXMLElement)#3 (2) {
["@attributes"]=>
array(1) {
["class"]=>
string(1) "a"
}
["div"]=>
string(3) "bar"
}
}

Try something like:
$doc = new DOMDocument;
$doc->loadHTML('<div>Your HTML here.</div>');
$xpath = new DOMXpath($doc);
$node = $xpath->query('//div[@class="a"]')->item(0);
$html = $node->ownerDocument->saveHTML($node); // Get HTML of DOMElement.
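The snippet above returns the outer HTML of the matched div. For the inner HTML the question asks for, one possible sketch is to serialize each child node separately and join the pieces. The helper name innerHTML below is made up for this illustration; it is not a built-in DOM function.

```php
<?php
// Hypothetical helper (not part of PHP's DOM API): concatenates the
// serialized markup of every child of $node.
function innerHTML(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument serializes just that node.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

libxml_use_internal_errors(true); // keep parser warnings quiet for the fragment
$doc = new DOMDocument();
$doc->loadHTML('<div class="a">foo</div><div class="a"><div id="1">bar</div></div>');
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$results = [];
foreach ($xpath->query('//div[@class="a"]') as $div) {
    $results[] = innerHTML($div);
}

print_r($results);
```

Passing a node to saveHTML() needs PHP 5.3.6 or later; on older versions saveXML($child) is the usual substitute.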

Related

Getting data from XML

I am struggling with reading an XML file using PHP.
The XML I want to use is here:
http://www.gdacs.org/xml/rss.xml
Now, the data I am interested in are the "item" nodes.
I created the following function, which gets the data:
$rawData = simplexml_load_string($response_xml_data);
foreach($rawData->channel->item as $value) {
$title = $value->title;
....
This works fine.
The nodes with a namespace prefix such as "gdacs:xxxx" were slightly more problematic, but I used the following code, which also works:
$subject = $value->children('dc', true)->subject;
Now the problem I have is with the "resources" node.
Basically, a stripped-down version of it looks like this:
<channel>
<item>
<gdacs:resources>
<gdacs:resource id="xx" version="0" source="xx" url="xx" type="xx">
<gdacs:title>xxx</gdacs:title>
</gdacs:resource>
<gdacs:resource id="xx" version="0" source="xx" url="xx" type="xx">
<gdacs:title>xxx</gdacs:title>
</gdacs:resource>
<gdacs:resource id="xx" version="0" source="xx" url="xx" type="xx">
<gdacs:title>xxx</gdacs:title>
</gdacs:resource>
</gdacs:resources>
</item>
</channel>
How would I get the resources in this case? I was only ever able to get the first resource, and only its title. What I would like to do is get all the resource items whose "type" has a particular value, and get their URLs.
Walking through XML node by node is, in my experience, slow and excruciating.
Have a look at XPath: it is a way to extract data from XML through selectors (similar to CSS selectors).
http://php.net/manual/en/simplexmlelement.xpath.php
You can select elements by their attributes, much as you would in CSS:
<?php
$xmlStr = file_get_contents('some_xml.xml');
$xml = new SimpleXMLElement($xmlStr);
$items = $xml->xpath("//channel/item");
$urls_by_item = array();
foreach($items as $x) {
// Relative path: a leading "//" would search the whole document, not just this item.
$urls_by_item[] = $x->xpath("gdacs:resources/gdacs:resource[@type='image']/@url");
}
Consider using XPath's positional predicate (square brackets, e.g. item[1]) to align URLs with their corresponding titles. As a more involved modification of @Daniel Batkilin's answer, you can collect both pieces of data in a multidimensional associative array, which requires nested loops.
$xml = simplexml_load_file('http://www.gdacs.org/xml/rss.xml');
$xml->registerXPathNamespace('gdacs', 'http://www.gdacs.org');
$items = $xml->xpath("//channel/item");
$i = 1;
$out = array();
foreach($items as $x) {
$titles = $xml->xpath("//channel/item[".$i."]/gdacs:resources/gdacs:resource[@type='image']/gdacs:title");
$urls = $xml->xpath("//channel/item[".$i."]/gdacs:resources/gdacs:resource[@type='image']/@url");
for($j=0; $j<count($urls); $j++) {
// Append rather than keying by $j.$i, which can collide (e.g. "1"."11" and "11"."1").
$out[] = array('title' => (string)$titles[$j], 'url' => (string)$urls[$j]);
}
$i++;
}
$out = array_values($out);
var_dump($out);
ARRAY DUMP
array(40) {
[0]=>
array(2) {
["title"]=>
string(21) "Storm surge animation"
["url"]=>
string(92) "http://webcritech.jrc.ec.europa.eu/ModellingCyclone/cyclonesurgeVM/1000226/final/outres1.gif"
}
[1]=>
array(2) {
["title"]=>
string(26) "Storm surge maximum height"
["url"]=>
string(101) "http://webcritech.jrc.ec.europa.eu/ModellingCyclone/cyclonesurgeVM/1000226/final/P1_MAXHEIGHT_END.jpg"
}
[2]=>
array(2) {
["title"]=>
string(12) "Overview map"
["url"]=>
string(64) "http://dma.gdacs.org/saved/gdacs/tc/1000226/clouds_1000226_2.png"
}
[3]=>
array(2) {
["title"]=>
string(41) "Map of rainfall accummulation in past 24h"
["url"]=>
string(70) "http://dma.gdacs.org/saved/gdacs/tc/1000226/current_rain_1000226_2.png"
}
[4]=>
array(2) {
["title"]=>
string(23) "Map of extreme rainfall"
["url"]=>
string(62) "http://dma.gdacs.org/saved/gdacs/tc/1000226/rain_1000226_2.png"
}
[5]=>
array(2) {
["title"]=>
string(34) "Map of extreme rainfall (original)"
["url"]=>
string(97) "http://www.ssd.noaa.gov/PS/TROP/DATA/ETRAP/2015/NorthIndian/THREE/2015THREE.pmqpf.10100000.00.GIF"
}
...
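As a smaller, offline variant of the same idea, the sketch below runs a relative XPath query per item and reads the title and URL from each matching resource. The miniature feed and its example.com URLs are invented for illustration; only the gdacs namespace URI is taken from the code above.

```php
<?php
// Made-up miniature feed; the real one lives at http://www.gdacs.org/xml/rss.xml
$feed = <<<'XML'
<rss xmlns:gdacs="http://www.gdacs.org">
<channel>
<item>
<gdacs:resources>
<gdacs:resource id="1" type="image" url="http://example.com/a.png"><gdacs:title>Map A</gdacs:title></gdacs:resource>
<gdacs:resource id="2" type="report" url="http://example.com/b.pdf"><gdacs:title>Report B</gdacs:title></gdacs:resource>
</gdacs:resources>
</item>
<item>
<gdacs:resources>
<gdacs:resource id="3" type="image" url="http://example.com/c.png"><gdacs:title>Map C</gdacs:title></gdacs:resource>
</gdacs:resources>
</item>
</channel>
</rss>
XML;

$xml = simplexml_load_string($feed);
$out = [];
foreach ($xml->channel->item as $item) {
    // Registration applies to this SimpleXMLElement object only.
    $item->registerXPathNamespace('gdacs', 'http://www.gdacs.org');
    // Relative path, so only resources of the current item are matched.
    foreach ($item->xpath("gdacs:resources/gdacs:resource[@type='image']") as $resource) {
        $out[] = [
            'title' => (string) $resource->children('gdacs', true)->title,
            'url'   => (string) $resource['url'],
        ];
    }
}

print_r($out);
```

Because each resource element is handled as a unit, the title and URL stay paired without positional indexes.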

PHP XML losing attributes

I have a problem when I load an XML string into SimpleXMLElement (I also tried with DOMDocument, but the result is the same). In the XML I have this:
<definedNames>
<definedName name="name1" Id="1" hidden="1">NAME_TEST</definedName>
<definedName name="name2" Id="4" hidden="1">NAME_TEST_2</definedName>
</definedNames>
Now I need to access a specific tag using its 'name' attribute. But whenever I use print_r, var_dump or something else, I see the attributes of all the other elements, but when it comes to these elements I see only an array with
[0] => NAME_TEST,
[1] => NAME_TEST_2
I also tried XPath, but every time I refer to attributes in the expression I get an empty array.
So far I have tried XPath, SimpleXML and DOMDocument, but the result is always the same: an empty array. Any clue?
Edit:
$xl->LoadTemplate('#xl/workbook.xml');
if (isset($workbook) && is_array($workbook) && count($workbook) > 0) {
$dom = new DOMDocument();
$dom->loadXML($xl->Source);
$xpath = new DOMXpath($dom);
foreach ($xpath->evaluate('//definedName') as $definedName) {
echo $definedName->getAttribute('name');
}
} else {
$TBS->Source = preg_replace('~\<definedNames\>.*\<\/definedNames\>~', '', $TBS->Source);
}
Edit 2 (the XML):
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x15" xmlns:x15="http://schemas.microsoft.com/office/spreadsheetml/2010/11/main">
<definedNames>
<definedName name="name1" Id="1" hidden="1">NAME_TEST</definedName>
<definedName name="name2" Id="4" hidden="1">NAME_TEST_2</definedName>
</definedNames>
</workbook>
i know there is smth like , but i already tried :
$xpath->evaluate('//workbook/definedNames/definedName[@*]')
or
$xpath->evaluate('/workbook/definedNames/definedName[@name="name1"]')
but the result is still empty.
With DOMDocument, you use XPath to fetch values or nodes.
Load the XML into a document and create a DOMXpath instance for it:
$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXpath($dom);
Fetch a node value as string:
var_dump($xpath->evaluate('string(//definedName[@name="name1"])'));
Output:
string(9) "NAME_TEST"
Fetch all definedName element data into an array:
$byName = [];
foreach ($xpath->evaluate('//definedName') as $definedName) {
$byName[] = [
'name' => $definedName->getAttribute('name'),
'id' => $definedName->getAttribute('Id'),
'hidden' => $definedName->getAttribute('hidden'),
'text' => $definedName->nodeValue
];
}
var_dump($byName);
Output:
array(2) {
[0]=>
array(4) {
["name"]=>
string(5) "name1"
["id"]=>
string(1) "1"
["hidden"]=>
string(1) "1"
["text"]=>
string(9) "NAME_TEST"
}
[1]=>
array(4) {
["name"]=>
string(5) "name2"
["id"]=>
string(1) "4"
["hidden"]=>
string(1) "1"
["text"]=>
string(11) "NAME_TEST_2"
}
}
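One thing worth checking against the full workbook.xml from the question: its root element declares a default namespace, and XPath 1.0 matches unprefixed names only against elements in no namespace. That would explain the empty results. The sketch below assumes this is the cause; the prefix "s" is an arbitrary choice for the example, and the XML is trimmed to the parts that matter.

```php
<?php
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<definedNames>
<definedName name="name1" Id="1" hidden="1">NAME_TEST</definedName>
<definedName name="name2" Id="4" hidden="1">NAME_TEST_2</definedName>
</definedNames>
</workbook>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Bind a prefix of our own choosing ("s") to the document's default namespace.
$xpath->registerNamespace('s', 'http://schemas.openxmlformats.org/spreadsheetml/2006/main');

// Unprefixed names match nothing here, because the elements are namespaced.
$unprefixedCount = $xpath->evaluate('//definedName')->length;

// With the prefix, the element and its attributes are reachable.
$value = $xpath->evaluate('string(//s:definedName[@name="name1"])');
$names = [];
foreach ($xpath->evaluate('//s:definedName') as $definedName) {
    $names[] = $definedName->getAttribute('name');
}

var_dump($unprefixedCount, $value, $names);
```

The attributes themselves carry no prefix and are in no namespace, so @name stays unprefixed in the expression.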
You can use the attributes() method on each element of the SimpleXMLElement Object. You can then access them:
$xml = '<definedNames>
<definedName name="name1" Id="1" hidden="1">NAME_TEST</definedName>
<definedName name="name2" Id="4" hidden="1">NAME_TEST_2</definedName>
</definedNames>';
$x = new SimpleXMLElement($xml);
foreach ($x as $element) {
var_dump($element->attributes());
}
The above returns:
object(SimpleXMLElement)#4 (1) {
["@attributes"]=>
array(3) {
["name"]=>
string(5) "name1"
["Id"]=>
string(1) "1"
["hidden"]=>
string(1) "1"
}
}
object(SimpleXMLElement)#3 (1) {
["@attributes"]=>
array(3) {
["name"]=>
string(5) "name2"
["Id"]=>
string(1) "4"
["hidden"]=>
string(1) "1"
}
}
You can then access individual attributes inside the loop like this:
foreach ($x as $element) {
$element->attributes()->hidden;
$element->attributes()->Id;
$element->attributes()->name;
}
Try simplexml_load_string():
$p = '<definedNames>
<definedName name="name1" Id="1" hidden="1">NAME_TEST</definedName>
<definedName name="name2" Id="4" hidden="1">NAME_TEST_2</definedName>
</definedNames>';
$xml = simplexml_load_string($p);
echo $xml->definedName[0]->attributes()->name;
echo $xml->definedName[1]->attributes()->name;
Or use foreach to get all the attributes.

Fetch data from site using php and put in an array

<div>A/C:front<span style="color:red;margin:8px">/
</span>Anti-Lock Brakes<span style="color:red;margin:8px">/
</span>Passenger Airbag<span style="color:red;margin:8px">/
</span>Power Mirrors<span style="color:red;margin:8px">/
</span>Power Steering<span style="color:red;margin:8px">/
</span>Power Windows<span style="color:red;margin:8px">/
</span>Driver Airbag<span style="color:red;margin:8px">/
</span>No Accidents<span style="color:red;margin:8px">/
</span>Power Door Locks<span style="color:red;margin:8px">/</span>
</div>
Appears like this on website :
A/C:front/Anti-Lock Brakes/Passenger Airbag/Power Mirrors/Power Steering/Power Windows/Driver Airbag/No Accidents/Power Door Locks/
I used $content = file_get_contents('url'); and now I need to sift through the data.
I need to fetch each one of the options above and put them in an array, something like:
$option = array("A/C:front","Anti-Lock Brakes","Passenger Airbag",....);
Any idea how to do this using PHP?
With the source code everything is easier:
<?php
$dom = new DOMDocument;
@$dom->loadHTMLFile('http://www.sayuri.co.jp/used-cars/B37659-Nissan-Tiida%20Latio-japanese-used-cars');
$xpath = new DOMXPath($dom);
$nodes = iterator_to_array($xpath->query('//h4/following-sibling::div')->item(0)->childNodes);
$items = array_map(function ($node) {
return $node->nodeValue;
}, array_filter($nodes, function ($node) {
return $node->nodeValue != '/';
}));
var_dump($items);
This gave me the following:
array(9) {
[0]=>
string(9) "A/C:front"
[2]=>
string(16) "Anti-Lock Brakes"
[4]=>
string(16) "Passenger Airbag"
[6]=>
string(13) "Power Mirrors"
[8]=>
string(14) "Power Steering"
[10]=>
string(13) "Power Windows"
[12]=>
string(13) "Driver Airbag"
[14]=>
string(12) "No Accidents"
[16]=>
string(16) "Power Door Locks"
}
You might want to use array_values() on $items to reset the indexes. That's all!
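Sounds like you need DOMDocument. Specifically, the getElementsByTagName function. Note that in your markup the span elements hold only the "/" separators, so the loop below collects those; the option names are the text nodes that sit between the spans. So using your example, I suggest this as a starting point. Please adjust to suit your needs: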
// Get the contents of the URL.
$content = file_get_contents('url');
// Parse the HTML using `DOMDocument`
$dom = new DOMDocument();
@$dom->loadHTML($content);
// Search the parsed DOM structure for `span` elements.
$option = array();
foreach($dom->getElementsByTagName('span') as $span){
$option[] = $span->nodeValue;
}
// Dumps the values in `option` for review.
echo '<pre>';
print_r($option);
echo '</pre>';
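Since the option names are the text nodes between the span separators, another possible approach is to select the div's direct text children with XPath and drop the whitespace-only ones. The sketch below works on a shortened copy of the markup from the question instead of the live page, and the //div selector is an assumption that would need narrowing on a real page.

```php
<?php
// Shortened copy of the markup from the question.
$html = '<div>A/C:front<span style="color:red;margin:8px">/
</span>Anti-Lock Brakes<span style="color:red;margin:8px">/
</span>Passenger Airbag<span style="color:red;margin:8px">/</span>
</div>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$options = [];
// text() selects only the div's own text nodes, not the text inside the spans.
foreach ($xpath->query('//div/text()') as $text) {
    $value = trim($text->nodeValue);
    if ($value !== '') {
        $options[] = $value;
    }
}

print_r($options);
```

Appending with $options[] keeps the indexes consecutive, so no array_values() call is needed afterwards.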

xpath not return values

I am able to pull the necessary information using XPath when I var_dump the result of the following code. When I try to add a foreach loop to return all ["href"] values, I get a blank page. Any ideas where I am messing up?
$dom = new DOMDocument();
@$dom->loadHTML($source);
$xml = simplexml_import_dom($dom);
$rss = $xml->xpath("/html/body//a[@class='highzoom1']");
$links = $rss->href;
foreach ($links as $link){
echo $link;
}
Here is the array of information.
array(96) {
[0]=>
object(SimpleXMLElement)#3 (2) {
["@attributes"]=>
array(2) {
["href"]=>
string(49) "/p/18351/test1.html"
["class"]=>
string(10) "highzoom1"
}
[0]=>
string(36) ""test1"
}
[1]=>
object(SimpleXMLElement)#4 (2) {
["@attributes"]=>
array(2) {
["href"]=>
string(43) "/p/18351/test2.html"
["class"]=>
string(10) "highzoom1"
}
[0]=>
string(30) ""test2"
}
[2]=>
object(SimpleXMLElement)#5 (2) {
["@attributes"]=>
array(2) {
["href"]=>
string(48) "/p/18351/test3.html"
["class"]=>
string(10) "highzoom1"
}
[0]=>
string(35) ""test3"
}
Instead of:
$rss = $xml->xpath("/html/body//a[@class='highzoom1']");
use:
$hrefs = $xml->xpath("/html/body//a[@class='highzoom1']/@href");
The original XPath expression (the first one above) selects every a element whose class attribute is 'highzoom1' and that is a descendant of a body that is a child of the top element (named html) of the document.
However, you want to select the href attributes of these a elements, not the a elements themselves.
The second XPath expression above selects exactly the href attributes of these a elements.
$links = $rss->href;
will never work, as $rss is a plain array of SimpleXMLElement objects (that is what xpath() returns), and an array has no href property. Instead, you'd want to do this:
$rss = $xml->xpath("/html/body//a[@class='highzoom1']");
foreach($rss as $link) {
echo $link['href']; // SimpleXML exposes attributes through array syntax
}
Or you can address $rss as an array directly:
echo $rss[5]['href']; // echo out the href of the 6th link found.
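Put together, a minimal self-contained sketch of the loop could look like this. The $source string is invented sample markup standing in for the real page.

```php
<?php
// Invented sample markup standing in for the downloaded page.
$source = '<html><body>'
        . '<a class="highzoom1" href="/p/18351/test1.html">test1</a>'
        . '<a class="other" href="/skip.html">skip</a>'
        . '<a class="highzoom1" href="/p/18351/test2.html">test2</a>'
        . '</body></html>';

libxml_use_internal_errors(true); // alternative to the @ error suppression
$dom = new DOMDocument();
$dom->loadHTML($source);
libxml_clear_errors();

$xml = simplexml_import_dom($dom);

$hrefs = [];
foreach ($xml->xpath("/html/body//a[@class='highzoom1']") as $a) {
    // Cast to string: $a['href'] is itself a SimpleXMLElement.
    $hrefs[] = (string) $a['href'];
}

print_r($hrefs);
```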

how to convert object(SimpleXMLElement) to string

I am using XPath to parse text from a webpage, but it returns the text as an object. How can I get it as a string?
libxml_use_internal_errors(TRUE);
$dom = new DOMDocument();
$dom->loadHTML($source);
$xml = simplexml_import_dom($dom);
libxml_use_internal_errors(FALSE);
$username = $xml->xpath("//span[@class='user']");
var_dump of the $username array:
object(SimpleXMLElement)#3 (2) { ["@attributes"]=> array(1) { ["class"]=> string(4) "user" } [0]=> string(11) "bubblebubble1210" }
$node = $username[0]; // xpath() returns an array; take the first match
var_dump($node);
// object(SimpleXMLElement)#3 (1) { [0]=> string(11) "bubblebubble1210" }
$node will still be an object above, but you can cast it explicitly with (string) or use echo which will cast it implicitly.
You can use $username[0]->asXML(); to get that particular SimpleXMLElement object serialized as a string, tags included ($username itself is an array, so pick an element first).
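To make the difference between the two suggestions concrete, here is a small sketch run on an invented one-line document: the (string) cast yields the text content, while asXML() yields the element's markup.

```php
<?php
// Invented sample document for illustration.
$xml = simplexml_load_string('<div><span class="user">bubblebubble1210</span></div>');

$matches = $xml->xpath("//span[@class='user']");

// Text content only.
$name = (string) $matches[0];

// The element serialized as markup, including its tags and attributes.
$markup = $matches[0]->asXML();

var_dump($name, $markup);
```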
