Trouble extracting td values from an HTML table string parsed with PHP XPath

I have the following HTML string snippet from a Wikipedia page:
<table class="wikitable">
<tbody>
<tr>
<td>mod_access</td>
<td>Versions older than 2.1</td>
<td>Included by Default</td>
</tr>
<tr>
<td>mod_actions</td>
<td>Versions 1.1 and later</td>
<td>Included by Default</td>
</tr>
<tr>
<td>mod_alias</td>
<td>Versions 1.1 and later</td>
<td>Included by Default</td>
</tr>
</tbody>
</table>
I have the following PHP code:
ini_set('display_errors','On');
$url="https://en.wikipedia.org/wiki/List_of_Apache_modules";
$dom=new DomDocument();
$dom->preserveWhiteSpace=false;
$dom->loadHtmlFile($url);
$xpath=new DomXpath($dom);
$elements=$xpath->query('//*[@id="mw-content-text"]/div/table/tbody/tr/td');
foreach($elements as $i=>$row){
$tds=$xpath->query('td',$row);
foreach($tds as $td){
echo "Td($i):", $td->nodeValue,"\n";
}
}
What I'd like to get back is a numerically indexed array in which each element is a table row holding that row's td values.
I'm not quite sure what to do next.

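Your first query already selects td elements, so the second query, which looks for td children of each td, finds nothing. If you remove both tbody and td from your first XPath query, it will find all of the tr elements (the tbody you see in browser developer tools is typically inserted by the browser; libxml's HTML parser, which DOMDocument uses, does not add one):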
$elements = $xpath->query('//*[@id="mw-content-text"]/div/table/tr');
Then you can loop through each row, use your existing code to find td elements, and add them to an array:
$data = array();
foreach ($elements as $y => $row) {
$tds = $xpath->query('td', $row);
foreach($tds as $x => $td) {
$data[$y][$x] = $td->nodeValue;
}
}
var_dump($data);
Tested with PHP 5.6; it gives this output:
array(157) {
[1]=>
array(6) {
[0]=>
string(10) "mod_access"
[1]=>
string(23) "Versions older than 2.1"
[2]=>
string(19) "Included by Default"
[3]=>
string(26) "Apache Software Foundation"
[4]=>
string(27) "Apache License, Version 2.0"
[5]=>
string(71) "Provides access control based on the client and the client's request[2]"
}
[2]=>
array(6) {
[0]=>
string(11) "mod_actions"
[1]=>
string(22) "Versions 1.1 and later"
[2]=>
string(19) "Included by Default"
[3]=>
string(26) "Apache Software Foundation"
[4]=>
string(27) "Apache License, Version 2.0"
[5]=>
string(62) "Provides CGI ability based on request method and media type[3]"
}
// etc ...
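For a quick offline check, here is a minimal sketch of the same approach run against an inline copy of the table instead of the live Wikipedia page. The header row and its column names are invented for illustration. Matching //tr anywhere under the table avoids depending on whether a tbody is present, and rows without td cells are skipped.

```php
<?php
// Sketch: parse an inline copy of the table (no network access needed) and
// collect each row's <td> text into a numerically indexed array.
$html = <<<'HTML'
<table class="wikitable">
<tr><th>Module</th><th>Version</th><th>Status</th></tr>
<tr><td>mod_access</td><td>Versions older than 2.1</td><td>Included by Default</td></tr>
<tr><td>mod_actions</td><td>Versions 1.1 and later</td><td>Included by Default</td></tr>
</table>
HTML;

libxml_use_internal_errors(true); // keep HTML parser warnings off the output
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$data = array();
// "//tr" under the table matches rows whether or not a <tbody> is present.
foreach ($xpath->query("//table[@class='wikitable']//tr") as $row) {
    $cells = array();
    // 'td' is evaluated relative to the current <tr>.
    foreach ($xpath->query('td', $row) as $td) {
        $cells[] = trim($td->nodeValue);
    }
    // Header rows contain only <th> cells, so they produce no entry.
    if (count($cells) > 0) {
        $data[] = $cells;
    }
}
print_r($data);
```

Because rows are appended with $data[] rather than $data[$y], the result is indexed from 0; the var_dump above starts at [1], presumably because the first row of the live table holds only th cells.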

Related

PHP Multidimensional Array foreach

Using PHP and MySQL, I have generated an array called $response.
A var_dump of $response is shown below.
array(2) {
["OperationRequest"]=>
array(4) {
["HTTPHeaders"]=>
array(1) {
[0]=>
array(1) {
["@attributes"]=>
array(2) {
["Name"]=>
string(9) "UserAgent"
["Value"]=>
string(14) "ApaiIO [2.1.0]"
}
}
}
["RequestId"]=>
string(36) "f53f381e-efb3-4fef-8e39-4f732b4b463e"
["Arguments"]=>
array(1) {
["Argument"]=>
array(11) {
[0]=>
array(1) {
["@attributes"]=>
array(2) {
["Name"]=>
string(14) "AWSAccessKeyId"
["Value"]=>
string(20) "KEY"
}
}
[1]=>
array(1) {
["@attributes"]=>
array(2) {
["Name"]=>
string(12) "AssociateTag"
["Value"]=>
string(11) "TAG"
}
}
[2]=>
array(1) {
["@attributes"]=>
array(2) {
["Name"]=>
string(6) "IdType"
["Value"]=>
string(4) "ISBN"
}
}
[3]=>
array(1) {
["@attributes"]=>
array(2) {
["Name"]=>
string(6) "ItemId"
["Value"]=>
string(38) "0751538310,9780141382067,9781305341141"
}
}
[4]=>
array(1) {
["@attributes"]=>
array(2) {
["Name"]=>
string(9) "Operation"
["Value"]=>
string(10) "ItemLookup"
}
}
... and so on
A json_encode of the array can be seen here (as requested in a comment).
I'd like to select the Title of these two items. From what I can see, it is located at:
Items > Item > ItemAttributes > Title
So, using a foreach loop, I tried the following:
foreach ($response as $item) {
echo $item['Items']['Item']['ItemAttributes']['Title']; // line 2
}
However, this returns the following error:
Message: Undefined index: Items. Line Number: 2
Where am I going wrong and what must I change in my code in order to achieve the desired result?
Also, any advice on how to 'read' multidimensional arrays would be greatly appreciated.
Thanks
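You are iterating over the wrong level of the array, which is why you are not getting the desired output. Items is a top-level key of $response, so inside foreach ($response as $item) each $item is already one of the top-level values (OperationRequest, Items, ...), and none of them has an Items key of its own.
Try this snippet, based on the JSON provided in the question: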
foreach($array["Items"]["Item"] as $key => $value)
{
print_r($value["ItemAttributes"]["Title"]);
echo PHP_EOL;
}
Output:
Panic
Panic
Captain Flinn and the Pirate Dinosaurs: Missing Treasure! (Captain Flinn)
To get only the unique titles:
$result = array();
foreach(json_decode($json,true)["Items"]["Item"] as $key => $value)
{
$result[]=$value["ItemAttributes"]["Title"];
}
print_r(array_unique($result));
Regarding "Also, any advice on how to 'read' multidimensional arrays would be greatly appreciated":
Paste your JSON-encoded string into
http://json.parser.online.fr
The "+" and "-" buttons in the right-hand panel expand and collapse each level, which makes the structure easy to read.
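If you would rather inspect the structure from PHP itself, a small recursive helper can print the full key path to every leaf value. This is a sketch: keyPaths() is a hypothetical helper, and $sample is a trimmed-down stand-in for $response that keeps only two leaves taken from the question.

```php
<?php
// Sketch of a hypothetical helper: list every leaf value of a nested array
// together with the chain of keys needed to reach it.
function keyPaths(array $arr, array $path = array())
{
    $lines = array();
    foreach ($arr as $key => $value) {
        $current = array_merge($path, array($key));
        if (is_array($value)) {
            // Descend into the sub-array, carrying the keys seen so far.
            $lines = array_merge($lines, keyPaths($value, $current));
        } else {
            $lines[] = implode(' > ', $current) . ' = ' . var_export($value, true);
        }
    }
    return $lines;
}

// Trimmed-down stand-in for $response (only two leaves kept).
$sample = array(
    'OperationRequest' => array('RequestId' => 'f53f381e-efb3-4fef-8e39-4f732b4b463e'),
    'Items' => array('Item' => array(
        array('ItemAttributes' => array('Title' => 'Panic')),
    )),
);
echo implode(PHP_EOL, keyPaths($sample)), PHP_EOL;
```

The second printed path, Items > Item > 0 > ItemAttributes > Title, shows that Title sits one level below a numeric index, which is why the answers loop over $response["Items"]["Item"].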
//Check that Items is not empty (empty() is also true when the key is not set)
if( empty($response["Items"]["Item"]) )
throw new Exception("Empty Item");
foreach($response["Items"]["Item"] as $item){
$title = $item['ItemAttributes']['Title'];
echo $title, PHP_EOL;
}
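You can debug it like this; the loop echoes the title where an Items index exists and dumps the key of every entry that lacks one: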
foreach ($response as $key => $item) {
if(isset($item['Items'])){ //Check Items index defined
echo $item['Items']['Item']['ItemAttributes']['Title'];
}else{
var_dump($key);
}
}

PHP | json_encode removes html tags

I have a big multidimensional array with multiple string values that contain HTML tags and attributes.
But when I use json_encode, some of the tags seem to be removed from the output.
Here is an example I tried in a separate file to check whether json_encode is actually the problem, and it turned out I was right.
<?php
$var = array(
"type" => "<p style=\"text-align: center;\">sfds</p>"
);
$encoded = json_encode($var);
echo '<pre>';
print_r($encoded);
How do I handle this kind of situation?
This is the result I got from the example:
{"type":"
sfds<\/p>"}
And this is the result I get from my multidimensional array:
{"data":[{"type":"columns","data":{"columns":[{"width":6,"blocks":[]},{"width":6,"blocks":[]}],"preset":"columns-6-6"}},{"type":"columns","data":{"columns":[{"width":6,"blocks":[{"type":"heading","data":{"text":"
I am the Heading<\/p>","mce_0":"
I am the Heading<\/p>"}},{"type":"heading","data":{"text":"
sfds<\/p>","mce_1":"
sfds<\/p>"}}]},{"width":6,"blocks":[{"type":"text","data":{"text":"
\n
Im Text<\/div>\n<\/div>","mce_2":"
\n
Im Text<\/div>\n<\/div>"}}]}],"preset":"columns-6-6"}},{"type":"text","data":{"text":"
\n
Im just a text<\/div>\n<\/div>","mce_3":"
\n
Im just a text<\/div>\n<\/div>"}}]}
=-=-=-=-=-=-
Update: here is the generated HTML source of the var_dump of the array I am trying to encode.
<pre> after set2_decode:<br>array(1) {
["data"]=>
array(3) {
[0]=>
array(2) {
["type"]=>
string(7) "columns"
["data"]=>
array(2) {
["columns"]=>
array(2) {
[0]=>
array(2) {
["width"]=>
int(6)
["blocks"]=>
array(0) {
}
}
[1]=>
array(2) {
["width"]=>
int(6)
["blocks"]=>
array(0) {
}
}
}
["preset"]=>
string(11) "columns-6-6"
}
}
[1]=>
array(2) {
["type"]=>
string(7) "columns"
["data"]=>
array(2) {
["columns"]=>
array(2) {
[0]=>
array(2) {
["width"]=>
int(6)
["blocks"]=>
array(2) {
[0]=>
array(2) {
["type"]=>
string(7) "heading"
["data"]=>
array(2) {
["text"]=>
string(23) "<p>I am the Heading</p>"
["mce_0"]=>
string(23) "<p>I am the Heading</p>"
}
}
[1]=>
array(2) {
["type"]=>
string(7) "heading"
["data"]=>
array(2) {
["text"]=>
string(39) "<p style="text-align: center;">sfds</p>"
["mce_1"]=>
string(39) "<p style="text-align: center;">sfds</p>"
}
}
}
}
[1]=>
array(2) {
["width"]=>
int(6)
["blocks"]=>
array(1) {
[0]=>
array(2) {
["type"]=>
string(4) "text"
["data"]=>
array(2) {
["text"]=>
string(59) "<div>
<div style="text-align: center;">Im Text</div>
</div>"
["mce_2"]=>
string(59) "<div>
<div style="text-align: center;">Im Text</div>
</div>"
}
}
}
}
}
["preset"]=>
string(11) "columns-6-6"
}
}
[2]=>
array(2) {
["type"]=>
string(4) "text"
["data"]=>
array(2) {
["text"]=>
string(65) "<div>
<div style="text-align: right;">Im just a text</div>
</div>"
["mce_3"]=>
string(65) "<div>
<div style="text-align: right;">Im just a text</div>
</div>"
}
}
}
}
</pre>
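Your browser is hiding the tags. json_encode() keeps them (by default it only escapes "/" as "\/" and quotes as \"), but when the output is rendered as HTML, the browser treats them as markup instead of displaying them.
Use htmlentities() to see all the tags: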
print_r(htmlentities($encoded));
htmlentities() should help. Call print_r() with its second argument set to true so that it returns the string instead of printing it, then escape the HTML:
echo htmlentities (print_r (json_encode($var), true));
I hope this helps!
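To confirm that json_encode() is not stripping anything, here is a minimal sketch using the example array from the question: it checks the encoded string, decodes it back, and escapes it before printing. JSON_UNESCAPED_SLASHES (available since PHP 5.4) only changes the \/ escaping and is optional.

```php
<?php
// Sketch showing that json_encode() keeps HTML tags intact; they only
// "disappear" when the browser renders the output as HTML.
$var = array(
    "type" => "<p style=\"text-align: center;\">sfds</p>"
);

// Default: "/" is escaped as "\/" and the inner quotes as \".
$encoded = json_encode($var);
// JSON_UNESCAPED_SLASHES leaves the slashes alone.
$readable = json_encode($var, JSON_UNESCAPED_SLASHES);

// Round trip: decoding returns the original string, tags included.
$decoded = json_decode($encoded, true);

// Escape before echoing into an HTML page so the tags are shown, not rendered.
echo '<pre>', htmlspecialchars($readable), '</pre>', PHP_EOL;
```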
It is not best practice to put HTML in a JSON response; it is better to build the HTML on the front end and just fill the already-built markup with the data. Note, however, that HTML cannot appear raw in a JSON response: inside a JSON string it is escaped (quotes become \" and, by default in PHP, slashes become \/). So make sure your HTML is properly built.
This is a good article to help you with that.
https://www.thorntech.com/2012/07/4-things-you-must-do-when-putting-html-in-json/

Getting data from XML

I am struggling with reading an XML file using PHP.
The XML I want to use is here:
http://www.gdacs.org/xml/rss.xml
The data I am interested in is in the "item" nodes.
I wrote the following code, which gets the data:
$rawData = simplexml_load_string($response_xml_data);
foreach($rawData->channel->item as $value) {
$title = $value->title;
....
This works fine.
The namespaced nodes (such as "gdacs:xxxx") were slightly more problematic, but I used the following code, which also works:
$subject = $value->children('dc', true)->subject;
Now the problem I have is with the "resources" node.
A stripped-down version of it looks like this:
<channel>
<item>
<gdacs:resources>
<gdacs:resource id="xx" version="0" source="xx" url="xx" type="xx">
<gdacs:title>xxx</gdacs:title>
</gdacs:resource>
<gdacs:resource id="xx" version="0" source="xx" url="xx" type="xx">
<gdacs:title>xxx</gdacs:title>
</gdacs:resource>
<gdacs:resource id="xx" version="0" source="xx" url="xx" type="xx">
<gdacs:title>xxx</gdacs:title>
</gdacs:resource>
</gdacs:resources>
</item>
</channel>
How would I get the resources in this case? I was only ever able to get the first resource, and only its title. What I would like to do is get all the resource items whose "type" has a particular value, and get their URLs.
In my experience, walking through XML the regular way is slow and excruciating.
Have a look at XPath: it's a way to extract data from XML using selectors (similar to CSS selectors).
http://php.net/manual/en/simplexmlelement.xpath.php
You can select elements by their attributes, much like in CSS:
<?php
$xmlStr = file_get_contents('some_xml.xml');
$xml = new SimpleXMLElement($xmlStr);
$items = $xml->xpath("//channel/item");
$urls_by_item = array();
foreach($items as $x) {
// "./" keeps the search inside the current item; a leading "//" would search the whole document
$urls_by_item [] = $x->xpath("./gdacs:resources/gdacs:resource[@type='image']/@url");
}
Consider using XPath positional predicates (a number in square brackets, e.g. item[1]) to align URLs with their corresponding titles. As a more involved modification of @Daniel Batkilin's answer, you can combine both pieces of data in an associative multidimensional array, which requires nested for loops.
$xml = simplexml_load_file('http://www.gdacs.org/xml/rss.xml');
$xml->registerXPathNamespace('gdacs', 'http://www.gdacs.org');
$items = $xml->xpath("//channel/item");
$i = 1;
$out = array();
foreach($items as $x) {
$titles = $xml->xpath("//channel/item[".$i."]/gdacs:resources/gdacs:resource[@type='image']/gdacs:title");
$urls = $xml->xpath("//channel/item[".$i."]/gdacs:resources/gdacs:resource[@type='image']/@url");
for($j=0; $j<count($urls); $j++) {
// Append instead of using a $j.$i key: concatenated keys such as "1"."11" and "11"."1" collide
$out[] = array('title' => (string)$titles[$j], 'url' => (string)$urls[$j]);
}
$i++;
}
var_dump($out);
Array dump:
array(40) {
[0]=>
array(2) {
["title"]=>
string(21) "Storm surge animation"
["url"]=>
string(92) "http://webcritech.jrc.ec.europa.eu/ModellingCyclone/cyclonesurgeVM/1000226/final/outres1.gif"
}
[1]=>
array(2) {
["title"]=>
string(26) "Storm surge maximum height"
["url"]=>
string(101) "http://webcritech.jrc.ec.europa.eu/ModellingCyclone/cyclonesurgeVM/1000226/final/P1_MAXHEIGHT_END.jpg"
}
[2]=>
array(2) {
["title"]=>
string(12) "Overview map"
["url"]=>
string(64) "http://dma.gdacs.org/saved/gdacs/tc/1000226/clouds_1000226_2.png"
}
[3]=>
array(2) {
["title"]=>
string(41) "Map of rainfall accummulation in past 24h"
["url"]=>
string(70) "http://dma.gdacs.org/saved/gdacs/tc/1000226/current_rain_1000226_2.png"
}
[4]=>
array(2) {
["title"]=>
string(23) "Map of extreme rainfall"
["url"]=>
string(62) "http://dma.gdacs.org/saved/gdacs/tc/1000226/rain_1000226_2.png"
}
[5]=>
array(2) {
["title"]=>
string(34) "Map of extreme rainfall (original)"
["url"]=>
string(97) "http://www.ssd.noaa.gov/PS/TROP/DATA/ETRAP/2015/NorthIndian/THREE/2015THREE.pmqpf.10100000.00.GIF"
}
...
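Here is a self-contained sketch of the per-item approach that runs a relative XPath query on each item, so every URL stays paired with its own title without positional bookkeeping. The XML is a small invented stand-in for the feed, the "g" prefix is an arbitrary local alias, and the namespace URI http://www.gdacs.org is taken from the answer above (check it against the xmlns:gdacs declaration in the real feed).

```php
<?php
// Sketch: for each <item>, collect title and url of every gdacs:resource
// whose type matches. The XML below is a hypothetical stand-in for the feed.
$xmlStr = <<<'XML'
<rss xmlns:gdacs="http://www.gdacs.org">
<channel>
<item>
<title>Event A</title>
<gdacs:resources>
<gdacs:resource id="r1" type="image" url="http://example.com/a1.png"><gdacs:title>Map A1</gdacs:title></gdacs:resource>
<gdacs:resource id="r2" type="link" url="http://example.com/a2.html"><gdacs:title>Report A2</gdacs:title></gdacs:resource>
<gdacs:resource id="r3" type="image" url="http://example.com/a3.png"><gdacs:title>Map A3</gdacs:title></gdacs:resource>
</gdacs:resources>
</item>
</channel>
</rss>
XML;

$ns = 'http://www.gdacs.org';
$type = 'image';
$xml = simplexml_load_string($xmlStr);
$result = array();
foreach ($xml->channel->item as $item) {
    // Register a prefix on the element we query from.
    $item->registerXPathNamespace('g', $ns);
    // The leading "./" keeps the search inside this <item>.
    foreach ($item->xpath("./g:resources/g:resource[@type='" . $type . "']") as $res) {
        $result[] = array(
            'item'  => (string) $item->title,
            'title' => (string) $res->children($ns)->title, // namespaced child
            'url'   => (string) $res['url'],                // plain attribute
        );
    }
}
print_r($result);
```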

create ul and li using a multidimensional array in php

I have the following array:
$tree_array
When I do a var_dump, I get:
array(6) {
[0]=> string(23) "$100,000 Cash Flow 2013"
[1]=> array(6) {
[0]=> string(1) "2" ["Goal_ID"]=> string(1) "2"
[1]=> string(13) "Sell Iron Oak" ["Opportunity"]=> string(13) "Sell Iron Oak"
[2]=> string(2) "10" ["OID"]=> string(2) "10"
}
[2]=> array(2) {
[0]=> string(32) "ask her if she would like to buy" ["Activity"]=> string(32) "ask her if she would like to buy"
}
[3]=> array(6) {
[0]=> string(1) "2" ["Goal_ID"]=> string(1) "2"
[1]=> string(8) "Sell Car" ["Opportunity"]=> string(8) "Sell Car"
[2]=> string(2) "11" ["OID"]=> string(2) "11"
}
[4]=> array(2) {
[0]=> string(52) "Call Roy back to see if he would like to purchase it" ["Activity"]=> string(52) "Call Roy back to see if he would like to purchase it"
}
[5]=> array(1) {
["tot_opp"]=> NULL
}
}
My end goal is to create unordered lists and list items (ul, li) from this data. More data will be added to the array as the database is updated, so it will keep growing. My goal is to loop through the array, have it generate the following markup, and keep creating lists as the data grows. I am new to PHP and not sure how to accomplish this.
<ul>
<li>$100,000 Cash Flow 2013</li>
<ul>
<li>Sell Iron Oak</li>
<ul>
<li>ask her if she would like to buy</li>
</ul>
<ul>
<li>Sell Car</li>
</ul>etc...
Any help will be greatly appreciated! Thank you in advance!
You need a recursive function for that, not a plain loop. That way it will handle any depth of your source array.
function make_list($arr)
{
$return = '<ul>';
foreach ($arr as $item)
{
$return .= '<li>' . (is_array($item) ? make_list($item) : $item) . '</li>';
}
$return .= '</ul>';
return $return;
}
echo make_list($tree_array);
Seems like a simple enough recursion to me:
function arrayToList($in) {
echo "<ul>";
foreach($in as $v) {
if( is_array($v)) arrayToList($v);
else echo '<li>' . $v . '</li>';
}
echo "</ul>";
}
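It looks like you have some duplicate values up there. Are you using mysql_fetch_array? By default it returns every column twice, once under a numeric index and once under its name. Use mysql_fetch_assoc or mysql_fetch_row instead, depending on whether you need an associative or an indexed array. (The mysql_* extension was removed in PHP 7.0; mysqli_fetch_assoc()/mysqli_fetch_row() or PDO::FETCH_ASSOC/PDO::FETCH_NUM are the current equivalents.)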
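Here is a minimal sketch combining the recursive approach with output escaping. Nested arrays go inside the current li, which keeps the generated HTML valid, and NULL values (such as tot_opp above) are skipped. make_safe_list() is a hypothetical name, and $tree_array is a hand-built example assuming the rows were fetched as associative arrays, so there are no duplicate numeric keys.

```php
<?php
// Sketch: turn a nested array into nested <ul>/<li> markup.
// Array keys are ignored; only the values are rendered.
function make_safe_list(array $arr)
{
    $html = '<ul>';
    foreach ($arr as $item) {
        if (is_array($item)) {
            // A nested list goes inside its own <li> so the HTML stays valid.
            $html .= '<li>' . make_safe_list($item) . '</li>';
        } elseif ($item !== null) {
            // Escape text so characters like < or & cannot break the markup.
            $html .= '<li>' . htmlspecialchars((string) $item) . '</li>';
        }
    }
    return $html . '</ul>';
}

// Hypothetical input shaped like the question's data.
$tree_array = array(
    '$100,000 Cash Flow 2013',
    array(
        'Opportunity' => 'Sell Iron Oak',
        'Activities'  => array('ask her if she would like to buy'),
    ),
);
echo make_safe_list($tree_array), PHP_EOL;
```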

XPath not returning values

Using the following code, I am able to pull the necessary information with XPath and see it with var_dump. But when I add a foreach loop to return all the ["href"] values, I get a blank page. Any ideas where I am messing up?
$dom = new DOMDocument();
@$dom->loadHTML($source);
$xml = simplexml_import_dom($dom);
$rss = $xml->xpath("/html/body//a[@class='highzoom1']");
$links = $rss->href;
foreach ($links as $link){
echo $link;
}
Here is the resulting array of information:
array(96) {
[0]=>
object(SimpleXMLElement)#3 (2) {
["@attributes"]=>
array(2) {
["href"]=>
string(49) "/p/18351/test1.html"
["class"]=>
string(10) "highzoom1"
}
[0]=>
string(36) ""test1"
}
[1]=>
object(SimpleXMLElement)#4 (2) {
["@attributes"]=>
array(2) {
["href"]=>
string(43) "/p/18351/test2.html"
["class"]=>
string(10) "highzoom1"
}
[0]=>
string(30) ""test2"
}
[2]=>
object(SimpleXMLElement)#5 (2) {
["@attributes"]=>
array(2) {
["href"]=>
string(48) "/p/18351/test3.html"
["class"]=>
string(10) "highzoom1"
}
[0]=>
string(35) ""test3"
}
Instead of:
$rss = $xml->xpath("/html/body//a[@class='highzoom1']");
use:
$hrefs = $xml->xpath("/html/body//a[@class='highzoom1']/@href");
The original XPath expression (the first one above) selects every a element whose class attribute is 'highzoom1' and which is a descendant of a body element that is a child of the top element (named html) of the document.
However, you want to select the href attributes of these a elements, not the a elements themselves.
The second XPath expression above selects exactly the href attributes of these a elements.
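$links = $rss->href;
will never work, because $rss is a plain PHP array of SimpleXMLElement objects (that is what SimpleXMLElement::xpath() returns), and an array has no href property. Also note that in SimpleXML, ->href would look for a child element named href; attributes are read with array syntax. Instead, you'd want to do this:
$rss = $xml->xpath("/html/body//a[@class='highzoom1']");
foreach($rss as $link) {
echo $link['href'], "\n";
}
Or you can address $rss as an array directly:
echo $rss[5]['href']; // echo out the href of the 6th link found.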
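Here is a self-contained sketch of both fixes, using a small inline HTML string as a stand-in for $source (the markup, including the class="other" link, is invented for illustration). It uses libxml_use_internal_errors() instead of the @ operator to keep parser warnings off the page.

```php
<?php
// Sketch: read href values from matching <a> elements with SimpleXML.
$source = '<html><body>'
    . '<a class="highzoom1" href="/p/18351/test1.html">test1</a>'
    . '<a class="other" href="/skip.html">skip</a>'
    . '<a class="highzoom1" href="/p/18351/test2.html">test2</a>'
    . '</body></html>';

libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($source);
$xml = simplexml_import_dom($dom);

// Option 1: select the href attribute nodes directly; casting each result
// to string yields the attribute value.
$hrefs = array();
foreach ($xml->xpath("/html/body//a[@class='highzoom1']/@href") as $href) {
    $hrefs[] = (string) $href;
}
print_r($hrefs);

// Option 2: select the <a> elements and read the attribute with array syntax.
foreach ($xml->xpath("/html/body//a[@class='highzoom1']") as $link) {
    echo $link['href'], PHP_EOL;
}
```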
