Using PHP, I would like to remove all the links in an unordered list and put them in an array. So the output would be: array[0]='Benefits', array[1]='Cost Savings', etc.
<ul>
<li><a href="#">Benefits</a></li>
<li><a href="#">Cost Savings</a></li>
<li><a href="#">Member listing</a></li>
</ul>
Using: preg_match_all('/<a href=\"(.*?)\"[.*]?>(.*?)<\/a>/i', $content, $matches);
I get:
array(3) { [0]=> array(3) { [0]=> string(24) "<a href="#">Benefits</a>" [1]=> string(28) "<a href="#">Cost Savings</a>" [2]=> string(30) "<a href="#">Member listing</a>" } [1]=> array(3) { [0]=> string(1) "#" [1]=> string(1) "#" [2]=> string(1) "#" } [2]=> array(3) { [0]=> string(8) "Benefits" [1]=> string(12) "Cost Savings" [2]=> string(14) "Member listing" } }
But I need to put it into one array.
To fetch the links, you can use DOMDocument and DOMXPath:
$html = '<html><body><ul>
<li><a href="#">Benefits</a></li>
<li><a href="#">Cost Savings</a></li>
<li><a href="#">Member listing</a></li>
</ul></body></html>';
$dom = new DOMDocument();
$dom->loadHTML( $html ); // loads the HTML into the document
$xpath = new DOMXPath( $dom );
$items = $xpath->query('//ul/li/a'); // matches every <a> inside an <li> inside a <ul>
$array = array();
foreach( $items as $item )
{
    $array[] = $dom->saveHTML( $item ); // using the parent document, get the HTML of a single element
}
// Array
// (
//     [0] => <a href="#">Benefits</a>
//     [1] => <a href="#">Cost Savings</a>
//     [2] => <a href="#">Member listing</a>
// )
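If only the link text is wanted, as in the question's expected output, a minimal sketch along the same lines could read each node's textContent instead of serializing it. The sample markup and the variable name $texts are assumptions made for illustration.

```php
<?php
// Sample markup assumed for illustration; the anchors mirror the question's list.
$html = '<html><body><ul>
<li><a href="#">Benefits</a></li>
<li><a href="#">Cost Savings</a></li>
<li><a href="#">Member listing</a></li>
</ul></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$texts = array();
foreach ($xpath->query('//ul/li/a') as $anchor) {
    // textContent is the text inside the element, without the <a> tag itself
    $texts[] = trim($anchor->textContent);
}

print_r($texts);
```

This gives one flat array of strings, which is the shape the question asks for.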
I am struggling with reading an XML file using PHP.
The XML I want to use is here:
http://www.gdacs.org/xml/rss.xml
Now, the data I am interested in are the "item" nodes.
I created the following function, which gets the data:
$rawData = simplexml_load_string($response_xml_data);
foreach($rawData->channel->item as $value) {
$title = $value->title;
....
This works fine.
The nodes with the "gdcs:xxxx" were slightly more problematic, but I used the following code, which also works:
$subject = $value->children('dc', true)->subject;
Now the problem I have is with the "resources" node.
A stripped-down version of it looks like this:
<channel>
<item>
<gdacs:resources>
<gdacs:resource id="xx" version="0" source="xx" url="xx" type="xx">
<gdacs:title>xxx</gdacs:title>
</gdacs:resource>
<gdacs:resource id="xx" version="0" source="xx" url="xx" type="xx">
<gdacs:title>xxx</gdacs:title>
</gdacs:resource>
<gdacs:resource id="xx" version="0" source="xx" url="xx" type="xx">
<gdacs:title>xxx</gdacs:title>
</gdacs:resource>
</gdacs:resources>
</item>
</channel>
How would I get the resources in this case? I was only ever able to get the first resource, and only its title. What I would like to do is get all the resource items whose "type" has a particular value, and get their URLs.
Walking through XML node by node is, in my experience, slow and painful.
Have a look at XPath: it is a way to extract data from XML through selectors (similar to CSS selectors).
http://php.net/manual/en/simplexmlelement.xpath.php
You can select elements by their attributes, similar to CSS:
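<?php
$xmlStr = file_get_contents('some_xml.xml');
$xml = new SimpleXMLElement($xmlStr);
$items = $xml->xpath("//channel/item");
$urls_by_item = array();
foreach ($items as $x) {
    // Register the gdacs prefix on the node that is being queried
    $x->registerXPathNamespace('gdacs', 'http://www.gdacs.org');
    // No leading "//" here: that would search the whole document, not just this item
    $urls_by_item[] = $x->xpath("gdacs:resources/gdacs:resource[@type='image']/@url");
}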
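For comparison, the same filtering can be sketched without XPath, continuing the children('dc', true) approach from the question. The sample feed below, its namespace URI and its attribute values are placeholders for illustration, not data from the real feed.

```php
<?php
// Stripped-down sample; the namespace URI and attribute values are placeholders.
$xmlStr = <<<XML
<rss xmlns:gdacs="http://www.gdacs.org">
<channel>
<item>
<gdacs:resources>
<gdacs:resource id="1" type="image" url="http://example.com/a.png"><gdacs:title>A</gdacs:title></gdacs:resource>
<gdacs:resource id="2" type="report" url="http://example.com/b.pdf"><gdacs:title>B</gdacs:title></gdacs:resource>
</gdacs:resources>
</item>
<item>
<gdacs:resources>
<gdacs:resource id="3" type="image" url="http://example.com/c.png"><gdacs:title>C</gdacs:title></gdacs:resource>
</gdacs:resources>
</item>
</channel>
</rss>
XML;

$xml = new SimpleXMLElement($xmlStr);
$urlsByItem = array();
foreach ($xml->channel->item as $item) {
    $urls = array();
    // children() with a prefix and true selects the child elements in that namespace
    foreach ($item->children('gdacs', true)->resources->resource as $resource) {
        // the attributes carry no prefix, so plain attributes() returns them
        $attrs = $resource->attributes();
        if ((string) $attrs->type === 'image') {
            $urls[] = (string) $attrs->url;
        }
    }
    $urlsByItem[] = $urls;
}

print_r($urlsByItem);
```

Looping over every resource, rather than reading ->resource once, is what avoids getting only the first one.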
Consider using XPath's positional predicate (square brackets, []) to keep each URL aligned with its title. As a more involved variation on @Daniel Batkilin's answer, you can collect both pieces of data in an associative multidimensional array, which requires nested loops.
$xml = simplexml_load_file('http://www.gdacs.org/xml/rss.xml');
$xml->registerXPathNamespace('gdacs', 'http://www.gdacs.org');
$items = $xml->xpath("//channel/item");
$i = 1;
$out = array();
foreach($items as $x) {
$titles = $xml->xpath("//channel/item[".$i."]/gdacs:resources/gdacs:resource[@type='image']/gdacs:title");
$urls = $xml->xpath("//channel/item[".$i."]/gdacs:resources/gdacs:resource[@type='image']/@url");
for($j=0; $j<count($urls); $j++) {
$out[$j.$i]['title'] = (string)$titles[$j];
$out[$j.$i]['url'] = (string)$urls[$j];
}
$i++;
}
$out = array_values($out);
var_dump($out);
ARRAY DUMP
array(40) {
[0]=>
array(2) {
["title"]=>
string(21) "Storm surge animation"
["url"]=>
string(92) "http://webcritech.jrc.ec.europa.eu/ModellingCyclone/cyclonesurgeVM/1000226/final/outres1.gif"
}
[1]=>
array(2) {
["title"]=>
string(26) "Storm surge maximum height"
["url"]=>
string(101) "http://webcritech.jrc.ec.europa.eu/ModellingCyclone/cyclonesurgeVM/1000226/final/P1_MAXHEIGHT_END.jpg"
}
[2]=>
array(2) {
["title"]=>
string(12) "Overview map"
["url"]=>
string(64) "http://dma.gdacs.org/saved/gdacs/tc/1000226/clouds_1000226_2.png"
}
[3]=>
array(2) {
["title"]=>
string(41) "Map of rainfall accummulation in past 24h"
["url"]=>
string(70) "http://dma.gdacs.org/saved/gdacs/tc/1000226/current_rain_1000226_2.png"
}
[4]=>
array(2) {
["title"]=>
string(23) "Map of extreme rainfall"
["url"]=>
string(62) "http://dma.gdacs.org/saved/gdacs/tc/1000226/rain_1000226_2.png"
}
[5]=>
array(2) {
["title"]=>
string(34) "Map of extreme rainfall (original)"
["url"]=>
string(97) "http://www.ssd.noaa.gov/PS/TROP/DATA/ETRAP/2015/NorthIndian/THREE/2015THREE.pmqpf.10100000.00.GIF"
}
...
I have a problem when I load an XML string into SimpleXMLElement (I also tried DOMDocument, but the result is the same). In the XML I have this:
<definedNames>
<definedName name="name1" Id="1" hidden="1">NAME_TEST</definedName>
<definedName name="name2" Id="4" hidden="1">NAME_TEST_2</definedName>
</definedNames>
Now I need to access a specific tag using its 'name' attribute. But whenever I try print_r, var_dump or something else, I see all the other attributes, yet when it comes to the definedName elements I see only an array with
[0] => NAME_TEST,
[1] => NAME_TEST_2
I also tried XPath, but every time I refer to the attributes I get an empty array.
So far I have tried XPath, SimpleXMLDom and DOMDocument, but the result is always the same: an empty array. Any clue?
#edit
$xl->LoadTemplate('#xl/workbook.xml');
if (isset($workbook) && is_array($workbook) && count($workbook) > 0) {
$dom = new DOMDocument();
$dom->loadXML($xl->Source);
$xpath = new DOMXpath($dom);
foreach ($xpath->evaluate('//definedName') as $definedName) {
echo $definedName->getAttribute('name');
}
} else {
$TBS->Source = preg_replace('~\<definedNames\>.*\<\/definedNames\>~', '', $TBS->Source);
}
#edit2 - xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x15" xmlns:x15="http://schemas.microsoft.com/office/spreadsheetml/2010/11/main">
<definedNames>
<definedName name="name1" Id="1" hidden="1">NAME_TEST</definedName>
<definedName name="name2" Id="4" hidden="1">NAME_TEST_2</definedName>
</definedNames>
</workbook>
I know there is something like , but I already tried:
$xpath->evaluate('//workbook/definedNames/definedName[@*]')
or
$xpath->evaluate('/workbook/definedNames/definedName[@name="name1"]')
The result is still empty.
With DOMDocument, you use Xpath to fetch values or nodes.
Load the XML into a document and create a DOMXPath instance for it:
$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXpath($dom);
Fetch a node value as string:
var_dump($xpath->evaluate('string(//definedName[@name="name1"])'));
Output:
string(9) "NAME_TEST"
Fetch all definedName element data into an array:
$byName = [];
foreach ($xpath->evaluate('//definedName') as $definedName) {
$byName[] = [
'name' => $definedName->getAttribute('name'),
'id' => $definedName->getAttribute('Id'),
'hidden' => $definedName->getAttribute('hidden'),
'text' => $definedName->nodeValue
];
}
var_dump($byName);
Output:
array(2) {
[0]=>
array(4) {
["name"]=>
string(5) "name1"
["id"]=>
string(1) "1"
["hidden"]=>
string(1) "1"
["text"]=>
string(9) "NAME_TEST"
}
[1]=>
array(4) {
["name"]=>
string(5) "name2"
["id"]=>
string(1) "4"
["hidden"]=>
string(1) "1"
["text"]=>
string(11) "NAME_TEST_2"
}
}
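Note that the workbook XML in the question's second edit declares a default namespace, and XPath 1.0 only matches unprefixed names against elements in no namespace. A minimal sketch of one way to handle that is to register a prefix for the namespace URI and use it in the expressions. The prefix "main" is an arbitrary choice made here, and the document is trimmed to the relevant parts.

```php
<?php
// Trimmed copy of the workbook XML from the question, default namespace kept.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<definedNames>
<definedName name="name1" Id="1" hidden="1">NAME_TEST</definedName>
<definedName name="name2" Id="4" hidden="1">NAME_TEST_2</definedName>
</definedNames>
</workbook>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
// "main" is an arbitrary prefix; only the URI has to match the document
$xpath->registerNamespace('main', 'http://schemas.openxmlformats.org/spreadsheetml/2006/main');

// Text content of the element whose name attribute is "name1"
$value = $xpath->evaluate('string(//main:definedName[@name="name1"])');

// All name attributes, in document order
$names = array();
foreach ($xpath->evaluate('//main:definedName') as $definedName) {
    $names[] = $definedName->getAttribute('name');
}

var_dump($value, $names);
```

The attributes themselves are not namespaced, so @name needs no prefix.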
You can use the attributes() method on each element of the SimpleXMLElement Object. You can then access them:
$xml = '<definedNames>
<definedName name="name1" Id="1" hidden="1">NAME_TEST</definedName>
<definedName name="name2" Id="4" hidden="1">NAME_TEST_2</definedName>
</definedNames>';
$x = new SimpleXMLElement($xml);
foreach ($x as $element) {
var_dump($element->attributes());
}
The above returns:
object(SimpleXMLElement)#4 (1) {
["@attributes"]=>
array(3) {
["name"]=>
string(5) "name1"
["Id"]=>
string(1) "1"
["hidden"]=>
string(1) "1"
}
}
object(SimpleXMLElement)#3 (1) {
["@attributes"]=>
array(3) {
["name"]=>
string(5) "name2"
["Id"]=>
string(1) "4"
["hidden"]=>
string(1) "1"
}
}
You can then access individual attributes inside the loop:
foreach ($x as $element) {
    echo $element->attributes()->hidden;
    echo $element->attributes()->Id;
    echo $element->attributes()->name;
}
Try simplexml_load_string():
$p = '<definedNames>
<definedName name="name1" Id="1" hidden="1">NAME_TEST</definedName>
<definedName name="name2" Id="4" hidden="1">NAME_TEST_2</definedName>
</definedNames>';
$xml = simplexml_load_string($p);
echo $xml->definedName[0]->attributes()->name;
echo $xml->definedName[1]->attributes()->name;
Or use foreach to get all the attributes.
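A minimal sketch of the foreach variant, assuming the same sample XML: iterating attributes() yields each attribute name with its value. The result array $all and its layout (keyed by element text) are choices made for illustration.

```php
<?php
$p = '<definedNames>
<definedName name="name1" Id="1" hidden="1">NAME_TEST</definedName>
<definedName name="name2" Id="4" hidden="1">NAME_TEST_2</definedName>
</definedNames>';
$xml = simplexml_load_string($p);

$all = array();
foreach ($xml->definedName as $definedName) {
    $attrs = array();
    // iterating attributes() yields attribute name => value pairs
    foreach ($definedName->attributes() as $attrName => $attrValue) {
        // cast to string, otherwise the value stays a SimpleXMLElement
        $attrs[$attrName] = (string) $attrValue;
    }
    // key each attribute set by the element's text content
    $all[(string) $definedName] = $attrs;
}

print_r($all);
```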
I am using the PHP Simple HTML DOM Parser to scrape some results from a page.
At the moment I am having a problem with the function as it is not returning the array "$result".
Any help would be greatly appreciated :)
The result of the array:
array(1) { [0]=> array(6) { ["itemid"]=> string(6) "123456" ["title"]=> string(21) "XXX Prod1" ["unit"]=> string(6) "500ml " ["price"]=> string(4) "2.59" } [1]=> array(6) { ["itemid"]=> string(6) "123457" ["title"]=> string(27) "XXX Prod2" ["unit"]=> string(6) "500ml " ["price"]=> string(5) "10.49" }
Code in question:
function parseItems($html) {
foreach($html->find('div.product-stamp-inner') as $content) { //Finds each individual product on page and extracts its details and stores it into its own array
$detail['itemid'] = filter_var($content->find('a.product-title-link', 0)->href, FILTER_SANITIZE_NUMBER_FLOAT);
$detail['title'] = $content->find('span.title', 0)->plaintext;
$detail['unit'] = $content->find('span.unit-size', 0)->plaintext;
$detail['price'] = filter_var($content->find('span.price', 0)->plaintext, FILTER_SANITIZE_NUMBER_FLOAT, FILTER_FLAG_ALLOW_FRACTION | FILTER_FLAG_ALLOW_THOUSAND);
$result[] = $detail; //Puts all individual product arrays into one large array
}
//var_dump($result); --Testing purposes
return $result;
}
I guess that you have a piece of code like this:
parseItems($html);
It should instead be the following, because the function returns a value and you need a variable to hold that result:
$retval = parseItems($html);
I am able to pull the necessary information using XPath when I use var_dump with the following code. When I try to add a foreach loop to return all ["href"] values, I get a blank page. Any ideas where I am messing up?
$dom = new DOMDocument();
@$dom->loadHTML($source);
$xml = simplexml_import_dom($dom);
$rss = $xml->xpath("/html/body//a[@class='highzoom1']");
$links = $rss->href;
foreach ($links as $link){
echo $link;
}
Here is the array of information.
array(96) {
[0]=>
object(SimpleXMLElement)#3 (2) {
["@attributes"]=>
array(2) {
["href"]=>
string(49) "/p/18351/test1.html"
["class"]=>
string(10) "highzoom1"
}
[0]=>
string(36) ""test1"
}
[1]=>
object(SimpleXMLElement)#4 (2) {
["@attributes"]=>
array(2) {
["href"]=>
string(43) "/p/18351/test2.html"
["class"]=>
string(10) "highzoom1"
}
[0]=>
string(30) ""test2"
}
[2]=>
object(SimpleXMLElement)#5 (2) {
["@attributes"]=>
array(2) {
["href"]=>
string(48) "/p/18351/test3.html"
["class"]=>
string(10) "highzoom1"
}
[0]=>
string(35) ""test3"
}
Instead of:
$rss = $xml->xpath("/html/body//a[@class='highzoom1']");
use:
$hrefs = $xml->xpath("/html/body//a[@class='highzoom1']/@href");
The original XPath expression (the first one above) selects every a element whose class attribute has the value 'highzoom1' and that is a descendant of a body element that is a child of the top element (named html) of the document.
However, you want to select the href attributes of these a elements, not the a elements themselves.
The second XPath expression above selects exactly the href attributes of these a elements.
$links = $rss->href;
will never work, because $rss is an array of SimpleXMLElement objects and an array has no href property. Instead, loop over the array and read the attribute from each element using array-style access:
$rss = $xml->xpath("/html/body//a[@class='highzoom1']");
foreach($rss as $link) {
    echo $link['href'];
}
Or you can index into $rss directly:
echo $rss[5]['href']; // echo out the href of the 6th link found.
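Put together, a self-contained sketch of collecting the href values as plain strings might look like this. The sample markup and the variable name $hrefs are assumptions made for illustration.

```php
<?php
// Sample markup assumed for illustration
$source = '<html><body>
<a class="highzoom1" href="/p/18351/test1.html">test1</a>
<a class="other" href="/skip.html">skip</a>
<a class="highzoom1" href="/p/18351/test2.html">test2</a>
</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($source);
$xml = simplexml_import_dom($dom);

$hrefs = array();
foreach ($xml->xpath("/html/body//a[@class='highzoom1']") as $link) {
    // array-style access reads an attribute; the cast turns it into a plain string
    $hrefs[] = (string) $link['href'];
}

print_r($hrefs);
```

Without the (string) cast, each entry would remain a SimpleXMLElement rather than a string.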
I have this SimpleXML object:
object(SimpleXMLElement)#176 (1) {
["record"]=>
array(2) {
[0]=>
object(SimpleXMLElement)#39 (2) {
["f"]=>
array(2) {
[0]=>
string(13) "stuff"
[1]=>
string(1) "1"
}
}
[1]=>
object(SimpleXMLElement)#37 (2) {
["f"]=>
array(2) {
[0]=>
string(13) "more stuff"
[1]=>
string(3) "90"
}
}
}
Why does is_array($object->record) return false? It clearly says it's an array. Why can't I detect it using is_array?
Also, I am unable to cast it as an array using (array) $object->record. I get this error:
Warning: It is not yet possible to
assign complex types to properties
SimpleXML nodes are objects that can contain other SimpleXML nodes. Use iterator_to_array().
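A minimal sketch of that suggestion, using a small sample document chosen for illustration. One detail matters: every repeated element is yielded under the same key (its tag name), so the second argument should be false to avoid later elements overwriting earlier ones.

```php
<?php
$xml = simplexml_load_string('<foo><bar>a</bar><bar>b</bar></foo>');

// Pass false as the second argument: every <bar> is yielded under the same
// key ("bar"), so preserving keys would keep only the last element.
$bars = iterator_to_array($xml->bar, false);

var_dump(is_array($xml->bar)); // the SimpleXMLElement itself is not an array
var_dump(is_array($bars));     // the converted result is
var_dump(count($bars));
```

The entries of $bars are still SimpleXMLElement objects; cast them with (string) where plain values are needed.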
It's not an array. The var_dump output is misleading. Consider:
<?php
$string = <<<XML
<?xml version='1.0'?>
<foo>
<bar>a</bar>
<bar>b</bar>
</foo>
XML;
$xml = simplexml_load_string($string);
var_dump($xml);
var_dump($xml->bar);
?>
Output:
object(SimpleXMLElement)#1 (1) {
["bar"]=>
array(2) {
[0]=>
string(1) "a"
[1]=>
string(1) "b"
}
}
object(SimpleXMLElement)#2 (1) {
[0]=>
string(1) "a"
}
As you can see by the second var_dump, it is actually a SimpleXMLElement.
I solved the problem using the count() function:
if( count( $xml ) > 1 ) {
// $xml is an array...
}