I am struggling to read an XML file using PHP.
The XML I want to use is here:
http://www.gdacs.org/xml/rss.xml
Now, the data I am interested in are the "item" nodes.
I created the following function, which gets the data:
$rawData = simplexml_load_string($response_xml_data);
foreach($rawData->channel->item as $value) {
$title = $value->title;
....
This works fine.
The namespaced nodes (such as "gdacs:xxxx") were slightly more problematic, but I used the following code, which also works:
$subject = $value->children('dc', true)->subject;
Now the problem I have is with the "resources" node.
A stripped-down version of it looks like this:
<channel>
<item>
<gdacs:resources>
<gdacs:resource id="xx" version="0" source="xx" url="xx" type="xx">
<gdacs:title>xxx</gdacs:title>
</gdacs:resource>
<gdacs:resource id="xx" version="0" source="xx" url="xx" type="xx">
<gdacs:title>xxx</gdacs:title>
</gdacs:resource>
<gdacs:resource id="xx" version="0" source="xx" url="xx" type="xx">
<gdacs:title>xxx</gdacs:title>
</gdacs:resource>
</gdacs:resources>
</item>
</channel>
How would I get the resources in this case? I was only ever able to get the first resource, and only its title. What I would like to do is get all the resource items whose "type" has a particular value, and get their URLs.
In my experience, walking through the XML node by node is slow and painful.
Have a look at XPath: it is a way to extract data from XML through selectors (similar to CSS selectors):
http://php.net/manual/en/simplexmlelement.xpath.php
You can select elements by their attributes, much as you can in CSS:
<?php
$xmlStr = file_get_contents('some_xml.xml');
$xml = new SimpleXMLElement($xmlStr);
$items = $xml->xpath("//channel/item");
$urls_by_item = array();
foreach($items as $x) {
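// Relative path: only this item's resources are searched (a leading // would search the whole document).
$urls_by_item[] = $x->xpath("gdacs:resources/gdacs:resource[@type='image']/@url");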
}
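To make the per-item selection concrete, here is a minimal, self-contained sketch. The inline feed, the example.com URLs, the namespace URI and the variable names are assumptions made up for illustration; the real feed declares its own namespace on the root element.

```php
<?php
// Hypothetical sample feed; the namespace URI and all URLs are assumptions.
$feed = <<<XML
<rss xmlns:gdacs="http://www.gdacs.org">
  <channel>
    <item>
      <title>Event A</title>
      <gdacs:resources>
        <gdacs:resource id="1" type="image" url="http://example.com/a1.png"><gdacs:title>Map A1</gdacs:title></gdacs:resource>
        <gdacs:resource id="2" type="report" url="http://example.com/a2.pdf"><gdacs:title>Report A2</gdacs:title></gdacs:resource>
      </gdacs:resources>
    </item>
    <item>
      <title>Event B</title>
      <gdacs:resources>
        <gdacs:resource id="3" type="image" url="http://example.com/b1.png"><gdacs:title>Map B1</gdacs:title></gdacs:resource>
      </gdacs:resources>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($feed);

$urlsByItem = array();
foreach ($xml->channel->item as $item) {
    // Register the prefix on the element the query is run from.
    $item->registerXPathNamespace('gdacs', 'http://www.gdacs.org');

    $urls = array();
    // No leading slash: the path is evaluated relative to this item only.
    foreach ($item->xpath("gdacs:resources/gdacs:resource[@type='image']/@url") as $attr) {
        $urls[] = (string) $attr; // cast the attribute node to a plain string
    }
    $urlsByItem[(string) $item->title] = $urls;
}

print_r($urlsByItem);
```

Each item ends up with only its own image URLs, because the query starts at the item rather than at the document root.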
Consider using XPath's positional predicate, written in square brackets [], to align the URLs with their corresponding titles. As a more involved modification of @Daniel Batkilin's answer, you can collect both pieces of data in a multidimensional associative array, which requires nested loops.
$xml = simplexml_load_file('http://www.gdacs.org/xml/rss.xml');
$xml->registerXPathNamespace('gdacs', 'http://www.gdacs.org');
$items = $xml->xpath("//channel/item");
$i = 1;
$out = array();
foreach($items as $x) {
$titles = $xml->xpath("//channel/item[".$i."]/gdacs:resources/gdacs:resource[@type='image']/gdacs:title");
$urls = $xml->xpath("//channel/item[".$i."]/gdacs:resources/gdacs:resource[@type='image']/@url");
for($j=0; $j<count($urls); $j++) {
// Append each pair; a concatenated key such as $j.$i can collide (e.g. "1"."11" and "11"."1").
$out[] = array('title' => (string)$titles[$j], 'url' => (string)$urls[$j]);
}
$i++;
}
$out = array_values($out);
var_dump($out);
ARRAY DUMP
array(40) {
[0]=>
array(2) {
["title"]=>
string(21) "Storm surge animation"
["url"]=>
string(92) "http://webcritech.jrc.ec.europa.eu/ModellingCyclone/cyclonesurgeVM/1000226/final/outres1.gif"
}
[1]=>
array(2) {
["title"]=>
string(26) "Storm surge maximum height"
["url"]=>
string(101) "http://webcritech.jrc.ec.europa.eu/ModellingCyclone/cyclonesurgeVM/1000226/final/P1_MAXHEIGHT_END.jpg"
}
[2]=>
array(2) {
["title"]=>
string(12) "Overview map"
["url"]=>
string(64) "http://dma.gdacs.org/saved/gdacs/tc/1000226/clouds_1000226_2.png"
}
[3]=>
array(2) {
["title"]=>
string(41) "Map of rainfall accummulation in past 24h"
["url"]=>
string(70) "http://dma.gdacs.org/saved/gdacs/tc/1000226/current_rain_1000226_2.png"
}
[4]=>
array(2) {
["title"]=>
string(23) "Map of extreme rainfall"
["url"]=>
string(62) "http://dma.gdacs.org/saved/gdacs/tc/1000226/rain_1000226_2.png"
}
[5]=>
array(2) {
["title"]=>
string(34) "Map of extreme rainfall (original)"
["url"]=>
string(97) "http://www.ssd.noaa.gov/PS/TROP/DATA/ETRAP/2015/NorthIndian/THREE/2015THREE.pmqpf.10100000.00.GIF"
}
...
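An alternative that avoids aligning two separate result lists by index is to select the resource elements themselves and read the title and the URL from the same node. This is only a sketch under assumptions: the inline feed, its URLs and the namespace URI are invented for illustration.

```php
<?php
// Hypothetical sample feed; the namespace URI and all URLs are assumptions.
$feed = <<<XML
<rss xmlns:gdacs="http://www.gdacs.org">
  <channel>
    <item>
      <gdacs:resources>
        <gdacs:resource type="image" url="http://example.com/surge.gif"><gdacs:title>Storm surge animation</gdacs:title></gdacs:resource>
        <gdacs:resource type="report" url="http://example.com/report.pdf"><gdacs:title>Report</gdacs:title></gdacs:resource>
      </gdacs:resources>
    </item>
    <item>
      <gdacs:resources>
        <gdacs:resource type="image" url="http://example.com/overview.png"><gdacs:title>Overview map</gdacs:title></gdacs:resource>
      </gdacs:resources>
    </item>
  </channel>
</rss>
XML;

$ns  = 'http://www.gdacs.org';
$xml = simplexml_load_string($feed);
$xml->registerXPathNamespace('gdacs', $ns);

$out = array();
// Select whole resource elements, so title and url always come from the same node.
foreach ($xml->xpath("//channel/item/gdacs:resources/gdacs:resource[@type='image']") as $res) {
    $out[] = array(
        'title' => (string) $res->children($ns)->title, // child element in the gdacs namespace
        'url'   => (string) $res['url'],                // plain (non-namespaced) attribute
    );
}

print_r($out);
```

Because both values are read from one element, a resource with a missing title cannot shift the titles against the URLs.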
I have the following HTML snippet from a Wikipedia page:
<table class="wikitable">
<tbody>
<tr>
<td>mod_access</td>
<td>Versions older than 2.1</td>
<td>Included by Default</td>
</tr>
<tr>
<td>mod_actions</td>
<td>Versions 1.1 and later</td>
<td>Included by Default</td>
</tr>
<tr>
<td>mod_alias</td>
<td>Versions 1.1 and later</td>
<td>Included by Default</td>
</tr>
</tbody>
</table>
I have the following PHP code:
ini_set('display_errors','On');
$url="https://en.wikipedia.org/wiki/List_of_Apache_modules";
$dom=new DomDocument();
$dom->preserveWhiteSpace=false;
$dom->loadHtmlFile($url);
$xpath=new DomXpath($dom);
$elements=$xpath->query('//*[@id="mw-content-text"]/div/table/tbody/tr/td');
foreach($elements as $i=>$row){
$tds=$xpath->query('td',$row);
foreach($tds as $td){
echo "Td($i):", $td->nodeValue,"\n";
}
}
What I'd like in return is a numerically indexed array in which each entry is a table row holding that row's td values.
I'm not quite sure what to do next.
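If you remove both tbody and td from your first XPath query, it will find all of the tr elements (browsers insert tbody when rendering, but it is usually absent from the markup that DOMDocument parses):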
$elements = $xpath->query('//*[@id="mw-content-text"]/div/table/tr');
Then you can loop through each row, use your existing code to find td elements, and add them to an array:
$data = array();
foreach ($elements as $y => $row) {
$tds = $xpath->query('td', $row);
foreach($tds as $x => $td) {
$data[$y][$x] = $td->nodeValue;
}
}
var_dump($data);
Tested with PHP 5.6, this gives the following output:
array(157) {
[1]=>
array(6) {
[0]=>
string(10) "mod_access"
[1]=>
string(23) "Versions older than 2.1"
[2]=>
string(19) "Included by Default"
[3]=>
string(26) "Apache Software Foundation"
[4]=>
string(27) "Apache License, Version 2.0"
[5]=>
string(71) "Provides access control based on the client and the client's request[2]"
}
[2]=>
array(6) {
[0]=>
string(11) "mod_actions"
[1]=>
string(22) "Versions 1.1 and later"
[2]=>
string(19) "Included by Default"
[3]=>
string(26) "Apache Software Foundation"
[4]=>
string(27) "Apache License, Version 2.0"
[5]=>
string(62) "Provides CGI ability based on request method and media type[3]"
}
// etc ...
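For anyone who wants to try the row-by-row approach without fetching the live page, here is a small self-contained sketch. The inline table is a made-up stand-in for the Wikipedia markup, and the `//tr` step is an assumption chosen so the query works whether or not a tbody is present.

```php
<?php
// Hypothetical stand-in for the Wikipedia table.
$html = <<<HTML
<table class="wikitable">
<tr><th>Name</th><th>Compatibility</th><th>Status</th></tr>
<tr><td>mod_access</td><td>Versions older than 2.1</td><td>Included by Default</td></tr>
<tr><td>mod_actions</td><td>Versions 1.1 and later</td><td>Included by Default</td></tr>
</table>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$data  = array();
foreach ($xpath->query('//table[@class="wikitable"]//tr') as $row) {
    $cells = array();
    // 'td' is evaluated relative to the current row.
    foreach ($xpath->query('td', $row) as $td) {
        $cells[] = trim($td->nodeValue);
    }
    if ($cells) {          // the header row has only th cells, so it is skipped
        $data[] = $cells;  // appending keeps the outer array zero-based
    }
}

var_dump($data);
```

Appending with `$data[]` instead of using the row position as the key avoids the gap at index 0 that the header row leaves in the output above.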
I am using the PHP Simple HTML DOM Parser to scrape some results from a page.
At the moment I am having a problem with the function as it is not returning the array "$result".
Any help would be greatly appreciated :)
The result of the array:
array(1) { [0]=> array(6) { ["itemid"]=> string(6) "123456" ["title"]=> string(21) "XXX Prod1" ["unit"]=> string(6) "500ml " ["price"]=> string(4) "2.59" } [1]=> array(6) { ["itemid"]=> string(6) "123457" ["title"]=> string(27) "XXX Prod2" ["unit"]=> string(6) "500ml " ["price"]=> string(5) "10.49" }
Code in question:
function parseItems($html) {
    $result = array(); // Initialise so an empty array is returned when no products match
    foreach($html->find('div.product-stamp-inner') as $content) { // Each individual product on the page
        $detail = array(); // Start a fresh array for this product's details
        $detail['itemid'] = filter_var($content->find('a.product-title-link', 0)->href, FILTER_SANITIZE_NUMBER_FLOAT);
        $detail['title'] = $content->find('span.title', 0)->plaintext;
        $detail['unit'] = $content->find('span.unit-size', 0)->plaintext;
        $detail['price'] = filter_var($content->find('span.price', 0)->plaintext, FILTER_SANITIZE_NUMBER_FLOAT, FILTER_FLAG_ALLOW_FRACTION | FILTER_FLAG_ALLOW_THOUSAND);
        $result[] = $detail; // Puts all individual product arrays into one large array
    }
    //var_dump($result); --Testing purposes
    return $result;
}
I guess you are calling the function like this:
parseItems($html);
It should be called as follows, because the function returns a value and you need a variable to hold the result:
$retval = parseItems($html);
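The underlying point is that $result is local to the function, so the caller only sees it through the return value. A tiny sketch of the difference, using a hypothetical collectItems() function and made-up rows in place of the scraper:

```php
<?php
// Hypothetical stand-in for parseItems(): builds an array and returns it.
function collectItems(array $rows) {
    $result = array(); // local to the function; invisible to the caller
    foreach ($rows as $row) {
        $result[] = array('title' => $row[0], 'price' => $row[1]);
    }
    return $result;
}

$rows = array(array('XXX Prod1', '2.59'), array('XXX Prod2', '10.49'));

collectItems($rows);          // return value is discarded; nothing is stored
$items = collectItems($rows); // return value is captured and can be used

echo count($items), "\n";
```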
<div>A/C:front<span style="color:red;margin:8px">/
</span>Anti-Lock Brakes<span style="color:red;margin:8px">/
</span>Passenger Airbag<span style="color:red;margin:8px">/
</span>Power Mirrors<span style="color:red;margin:8px">/
</span>Power Steering<span style="color:red;margin:8px">/
</span>Power Windows<span style="color:red;margin:8px">/
</span>Driver Airbag<span style="color:red;margin:8px">/
</span>No Accidents<span style="color:red;margin:8px">/
</span>Power Door Locks<span style="color:red;margin:8px">/</span>
</div>
This appears on the website as:
A/C:front/Anti-Lock Brakes/Passenger Airbag/Power Mirrors/Power Steering/Power Windows/Driver Airbag/No Accidents/Power Door Locks/
I used $content = file_get_contents('url'); and now I need to sift through the data.
I need to fetch each one of the options above and put them in an array, something like:
$option = array("A/C:front", "Anti-Lock Brakes", "Passenger Airbag", ....);
Any idea how to do this using PHP?
With the source code everything is easier:
<?php
$dom = new DOMDocument;
@$dom->loadHTMLFile('http://www.sayuri.co.jp/used-cars/B37659-Nissan-Tiida%20Latio-japanese-used-cars');
$xpath = new DOMXPath($dom);
$nodes = iterator_to_array($xpath->query('//h4/following-sibling::div')->item(0)->childNodes);
$items = array_map(function ($node) {
return $node->nodeValue;
}, array_filter($nodes, function ($node) {
return $node->nodeValue != '/';
}));
var_dump($items);
This gave me the following:
array(9) {
[0]=>
string(9) "A/C:front"
[2]=>
string(16) "Anti-Lock Brakes"
[4]=>
string(16) "Passenger Airbag"
[6]=>
string(13) "Power Mirrors"
[8]=>
string(14) "Power Steering"
[10]=>
string(13) "Power Windows"
[12]=>
string(13) "Driver Airbag"
[14]=>
string(12) "No Accidents"
[16]=>
string(16) "Power Door Locks"
}
You might want to use array_values() on $items to reset the indexes. That's all!
Sounds like you need DOMDocument, specifically its getElementsByTagName method. Using your example, I suggest the following; please adjust it to suit your needs:
// Get the contents of the URL.
$content = file_get_contents('url');
// Parse the HTML using `DOMDocument`
$dom = new DOMDocument();
@$dom->loadHTML($content);
// Search the parsed DOM structure for `span` elements.
$option = array();
foreach($dom->getElementsByTagName('span') as $span){
$option[] = $span->nodeValue;
}
// Dumps the values in `option` for review.
echo '<pre>';
print_r($option);
echo '</pre>';
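Note that in the markup shown in the question, the span elements hold only the "/" separators, while the option names are text nodes sitting directly inside the div. A minimal sketch under that assumption, using a shortened inline copy of the markup instead of the live page, collects the text nodes rather than the spans:

```php
<?php
// Shortened copy of the markup from the question (an assumption about the page structure).
$content = '<div>A/C:front<span style="color:red;margin:8px">/
</span>Anti-Lock Brakes<span style="color:red;margin:8px">/
</span>Power Mirrors<span style="color:red;margin:8px">/</span>
</div>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($content);
libxml_clear_errors();

$xpath  = new DOMXPath($dom);
$option = array();
// text() matches only text nodes that are direct children of the div,
// so the "/" inside each span is not picked up.
foreach ($xpath->query('//div/text()') as $text) {
    $value = trim($text->nodeValue);
    if ($value !== '') { // skip whitespace-only nodes between tags
        $option[] = $value;
    }
}

print_r($option);
```

On a real page the `//div` step would need to be narrowed to the specific container, for example by an id or class if one exists.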
This is how the array comes out
array(3) { [0]=> string(3) "174" [1]=> object(SimpleXMLElement)#5 (1) { [0]=> string(2) "41" } [2]=> object(SimpleXMLElement)#4 (1) { [0]=> string(2) "21" } }
I'm using this code here that generates the array.
while($row = mysql_fetch_assoc($results)){
$values[] = $row['id'];
$dom = simplexml_load_file('../data/'.$row['id'].'.xml');
foreach($dom->children() as $child)
{
$values[] = $child->views;
}
}
var_dump($values);
The xml file looks like this
<?xml version="1.0"?>
<website site_id="174" user_id="26">
<view day="23" month="10" year="11">
<views>31</views>
</view>
<view day="23" month="12" year="11">
<views>21</views>
</view>
</website>
I need to get the values of the views elements into an array, but I keep getting these annoying object(SimpleXMLElement)#5 things in the array, and also this string(3). How do I get rid of those?
Thank you
Try replacing
$values[] = $child->views;
with
$values[] = (string)$child->views;
"How do I get rid of those"
If you don't need to see the type of each value, use print_r() instead of var_dump().
To explain (string): this is called typecasting. It also works with other types, such as (int) and (bool).
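Here is a short sketch of the cast applied to the XML from the question, loaded from an inline string rather than from a file so that it runs on its own. Casting to (int) instead of (string) is my own choice here, on the assumption that the view counts are wanted as numbers.

```php
<?php
// The XML from the question, inlined so the example is self-contained.
$xml = simplexml_load_string(
    '<website site_id="174" user_id="26">'
    . '<view day="23" month="10" year="11"><views>31</views></view>'
    . '<view day="23" month="12" year="11"><views>21</views></view>'
    . '</website>'
);

$values = array((string) $xml['site_id']); // attribute cast to a plain string
foreach ($xml->children() as $child) {
    // Without the cast this would store a SimpleXMLElement object.
    $values[] = (int) $child->views;
}

print_r($values);
```

After the casts the array holds only scalars ('174', 31 and 21), so it can be passed to functions such as json_encode() or array_sum() without surprises.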
I am able to pull the necessary information using XPath, as var_dump shows with the following code. But when I add a foreach loop to output all of the ["href"] values, I get a blank page. Any ideas where I am going wrong?
$dom = new DOMDocument();
@$dom->loadHTML($source);
$xml = simplexml_import_dom($dom);
$rss = $xml->xpath("/html/body//a[@class='highzoom1']");
$links = $rss->href;
foreach ($links as $link){
echo $link;
}
Here is the array of information.
array(96) {
[0]=>
object(SimpleXMLElement)#3 (2) {
["@attributes"]=>
array(2) {
["href"]=>
string(49) "/p/18351/test1.html"
["class"]=>
string(10) "highzoom1"
}
[0]=>
string(36) ""test1"
}
[1]=>
object(SimpleXMLElement)#4 (2) {
["@attributes"]=>
array(2) {
["href"]=>
string(43) "/p/18351/test2.html"
["class"]=>
string(10) "highzoom1"
}
[0]=>
string(30) ""test2"
}
[2]=>
object(SimpleXMLElement)#5 (2) {
["@attributes"]=>
array(2) {
["href"]=>
string(48) "/p/18351/test3.html"
["class"]=>
string(10) "highzoom1"
}
[0]=>
string(35) ""test3"
}
Instead of:
$rss = $xml->xpath("/html/body//a[@class='highzoom1']");
use:
$hrefs = $xml->xpath("/html/body//a[@class='highzoom1']/@href");
The original XPath expression (the first one above) selects every a element whose class attribute has the value 'highzoom1' and that is a descendant of a body element that is a child of the top element (named html) of the document.
However, you want to select the href attributes of these a elements, not the a elements themselves.
The second XPath expression above selects exactly the href attributes of these a elements.
$links = $rss->href;
will never work, because $rss is a plain array of SimpleXMLElement objects, and an array has no href property. Instead, loop over the array and read each element's href attribute using array-style access:
$rss = $xml->xpath("/html/body//a[@class='highzoom1']");
foreach($rss as $link) {
echo $link['href'], "\n";
}
Or you can index into $rss directly:
echo $rss[5]['href']; // echo out the href of the 6th link found.
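Putting the pieces together, a self-contained sketch might look like the following. The inline $source string is a made-up stand-in for the scraped page, and the $hrefs array is my own addition for collecting the results.

```php
<?php
// Hypothetical stand-in for the scraped page.
$source = '<html><body>'
    . '<a class="highzoom1" href="/p/18351/test1.html">test1</a>'
    . '<a class="other" href="/p/18351/skip.html">skip</a>'
    . '<a class="highzoom1" href="/p/18351/test2.html">test2</a>'
    . '</body></html>';

libxml_use_internal_errors(true); // suppress HTML parser warnings
$dom = new DOMDocument();
$dom->loadHTML($source);
libxml_clear_errors();

$xml = simplexml_import_dom($dom);

$hrefs = array();
// xpath() returns a plain array of SimpleXMLElement objects.
foreach ($xml->xpath("/html/body//a[@class='highzoom1']") as $link) {
    // Attributes are read with array syntax; ->href would look for a child element.
    $hrefs[] = (string) $link['href'];
}

print_r($hrefs);
```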