php and simpleXml - how to change node contents - php

I'm trying to change the contents of a node in an XML file using simpleXML. I know that the variable for the new node-contents contains the right stuff, but for some reason the file isn't changed when it is saved. I'm probably missing something basic, because I'm new to simpleXML. Here is the whole php script:
<?php
$doc=$_REQUEST["book"];
$div1=$_REQUEST["div1"];
$div2=$_REQUEST["div2"];
if ($div1=="") $div1=$_REQUEST["chapter"];
if ($div2=="") $div2=$_REQUEST["verse"];
$div3=$_REQUEST["div3"];
$textresponse=$_REQUEST["xmltext"];
$strippedresponse = "<?xml version='1.0'?>" . stripslashes($textresponse);
echo("Saved changes to " . $doc . " " . $div1 . "." . $div2 ."<br />");
$fileName="/home/ocp/public_html/sites/default/docs/drafts/".$doc.".xml";
$xmlDoc = simplexml_load_file($fileName);
$backupFileName="/home/ocp/public_html/sites/default/docs/backups/".$doc." ".date("Y-m-d H.i.s").".xml";
file_put_contents($backupFileName, $xmlDoc->asXML());
$backupSize = filesize($backupFileName);
echo("Backup {$backupFileName} created:".$backupSize." bytes<br />");
if ($doc) {
if ($div1) {
if ($div2) {
$newVerse = simplexml_load_string($strippedresponse);
$oldVerse = $xmlDoc->xpath("//div[#number='".$div1."']/div[#number='".$div2."']");
$oldVerse = $newVerse;
$newDoc = $xmlDoc->asXml();
file_put_contents($fileName, $newDoc);
$newSize = filesize($fileName);
echo("New file is ".$newSize." bytes <br />");
}
}
}
?>

I'll venture to say that this code certainly doesn't do what you want it to:
$newVerse = simplexml_load_string($strippedresponse);
$oldVerse = $xmlDoc->xpath("//div[#number='".$div1."']/div[#number='".$div2."']");
$oldVerse = $newVerse;
Changing the value of a PHP variable has no side-effects. In other word, nothing happens when you do $a = $b; except in some specific cases, and it's not one of them.
I don't know what you really want to achieve with this code. If you want to replace the (X)HTML inside a specific <div/> you will need to use DOM and create a DOMDocumentFragment, use appendXML() to populate it then substitute it to your old <div/>. Either that or create a new DOMDocument, loadXML() then importNode() to your old document and replaceChild() your old div.

SimpleXMLElement::xpath returns an array of SimpleXMLElement objects. Copies, not references. So $oldVerse = $newVerse; does not change $xmlDoc in any way.
SimpleXML is sufficient to read XML, for manipulation you might want to choose a more powerful alternative from http://www.php.net/manual/de/refs.xml.php, e.g. DOM.

Related

Retrieving Barometric and Other Climate Data Using simple_html_dom.php

I want to periodically (once a day or so) collect the barometric pressure reading for various USA weather stations. Using simple_html_dom.php I can scrape the entire page of this site, for example (https://www.localconditions.com/weather-alliance-nebraska/69301/). However, I don't know how to then parse this down to just the barometric pressure reading: in this case "30.26".
Here's the code that grabs all the html. Obviously the find('Barometer') element isn't working.
<?php
// example of how to use basic selector to retrieve HTML contents
include('simple_html_dom.php');
// get DOM from URL or file
$html = file_get_html('https://www.localconditions.com/weather-alliance-nebraska/69301/');
// find all span tags with class=gb1
foreach($html->find('strong') as $e)
echo $e->outertext . '<HR>';
// get an element representing the second paragraph
$element = $html->find("Barometer");
echo $e->outertext . '<br>';
// extract text from HTML
echo $html->plaintext;
?>
Any advise on how to parse this?
Thanks!
As mentioned by #bato3 in his comment, queries like this are far better handled with xpath. Unfortunately, neither DOMDocument nor simplexml (which I usually use to parse xml/html) could digest the html of this site (at least not when I tried). So we have to do it with simple_html_dom and resort to (somewhat inelegant) CSS selectors and string manipulation:
$dest = $html->find("//div[class='col-sm-6 col-md-6'] > p:has(> strong)");
foreach($dest as $e) {
$target = $e->innertext;
if (strpos($target, "Barometer")!== false){
$pressure = explode(" ", $target);
echo $pressure[2];
}
}
Output:
30.25 inHg.

Extract pattern from xml file using PHP?

I have a remote XML file. I need to read, find some values an save them in an array.
I've got load the file with (no problem with this):
$xml_external_path = 'http://example.com/my-file.xml';
$xml = file_get_contents($xml_external_path);
In this file there are many instances of:
<unico>4241</unico>
<unico>234</unico>
<unico>534534</unico>
<unico>2345334</unico>
I need to extract just the number of these strings and save them in a array. I guess I need to use a pattern like:
$pattern = '/<unico>(.*?)<\/unico>/';
But I'm not sure what to do next. Keep in mind that it is an .xml file.
Result should be a populated array like this:
$my_array = array (4241, 234, 534534,2345334);
You can better use XPath to read through an XML file. XPath is a variant of DOMDocument focused on reading and editing XML files. You can query an XPath variable using patterns, which is based on the simple Unix path syntax. So // means anywhere and ./ means relative to selected node. XPath->query() will return a DOMNodelist with all the nodes according to the pattern. The following code will do what you want:
$xmlFile = "
<unico>4241</unico>
<unico>234</unico>
<unico>534534</unico>
<unico>2345334</unico>";
$xmlDoc = new DOMDocument();
$xmlDoc->loadXML($xmlFile);
$xpath = new DOMXPath($xmlDoc);
// This code returns a DOMNodeList of all nodes with the unico tags in the file.
$unicos = $xpath->query("//unico");
//This returns an integer of how many nodes were found that matched the pattern
echo $unicos->length;
You can find more info on XPath and its syntax here: XPath on Wikipedia#syntax
DOMNodeList implements Traversable, so you can use foreach() to traverse it. If you really want a flat array you can simply convert is using simple code like in question #15807314:
$unicosArr = array();
foreach($unicos as $node){
$unicosArr[] = $node->nodeValue;
}
Using preg_match_all:
<?php
$xml = '<unico>4241</unico>
<unico>234</unico>
<unico>534534</unico>
<unico>2345334</unico>';
$pattern = '/<unico>(.*?)<\/unico>/';
preg_match_all($pattern,$xml,$result);
print_r($result[0]);
You could try this, it basically just loops through each line of the file and finds whatever is between the XML <unico> tags.
<?php
$file = "./your.xml";
$pattern = '/<unico>(.*?)<\/unico>/';
$allVars = array();
$currentFile = fopen($file, "r");
if ($currentFile) {
// Read through file
while (!feof($currentFile)) {
$m_sLine = fgets($currentFile);
// Check for sitename validity
if (preg_match($pattern, $m_sLine) == true) {
$curVar = explode("<unico>", $m_sLine);
$curVar = explode("</unico>", $curVar[1]);
$allVars[] = $curVar[0];
}
}
}
fclose($currentFile);
print_r($allVars);
Is this sort of what you want? :)

Updating DOMAttr value with URL results in parameters being lost unless htmlentities() is used. Why?

I am trying to modify links in a string containing HTML but am finding the modified URLs are missing parameters.
Example:
$html = '
<p>
Example 1
</p>';
libxml_use_internal_errors(true);
$dom = new \DOMDocument();
$dom->loadHTML($html);
$xpath = new \DOMXPath($dom);
foreach ($xpath->query('//a/#href') as $node) {
echo '$node->nodeValue: ' . $node->nodeValue . PHP_EOL;
$newValue = 'http://example2.com?foo=bar&bar=foobar';
echo '$newValue: ' . $newValue . PHP_EOL;
$node->nodeValue = $newValue;
echo '$node->nodeValue: ' . $node->nodeValue . PHP_EOL;
}
Output:
$node->nodeValue: http://example.com?foo=bar&bar=foobar
$newValue: http://example2.com?foo=bar&bar=foobar
$node->nodeValue: http://example2.com?foo=bar
As you can see, the second parameter is lost after updating the nodeValue.
While experimenting I tried changing $newValue to this:
$newValue = htmlentities('http://example2.com?foo=bar&bar=foobar');
And the output then becomes:
$node->nodeValue: http://example.com?foo=bar&bar=foobar
$newValue: http://example2.com?foo=bar&bar=foobar
$node->nodeValue: http://example2.com?foo=bar&bar=foobar
Why is it necessary for the new node value to be run through htmlentities()?
Ampersands are reserved characters in XML/HTML — they begin character references. If you try to write them directly to strings in the DOM things often blow up because the DOM doesn't know what you're trying to say. When you use htmlentities() first it encodes the "&" and everyone is speaking the same language again.
Fortunately there's no need for htmlentities() at all. Instead of setting the nodeValue directly, use the setAttribute() method of the href's owner.
Rather than:
$node->nodeValue = $newValue;
Use:
$node->ownerElement->setAttribute('href', $newValue);
Directly manipulating strings in the DOM can lead to problems that won't even necessarily manifest the same across systems. I didn't lose parameters with your example, I lost the entire URL.
I highly recommend sticking with setters whenever possible.

Parsing XML with PHP (simplexml)

Firstly, may I point out that I am a newcomer to all things PHP so apologies if anything here is unclear and I'm afraid the more layman the response the better. I've been having real trouble parsing an xml file in to php to then populate an HTML table for my website. At the moment, I have been able to get the full xml feed in to a string which I can then echo and view and all seems well. I then thought I would be able to use simplexml to pick out specific elements and print their content but have been unable to do this.
The xml feed will be constantly changing (structure remaining the same) and is in compressed format. From various sources I've identified the following commands to get my feed in to the right format within a string although I am still unable to print specific elements. I've tried every combination without any luck and suspect I may be barking up the wrong tree. Could someone please point me in the right direction?!
$file = fopen("compress.zlib://$url", 'r');
$xmlstr = file_get_contents($url);
$xml = new SimpleXMLElement($url,null,true);
foreach($xml as $name) {
echo "{$name->awCat}\r\n";
}
Many, many thanks in advance,
Chris
PS The actual feed
Since no one followed my closevote, I think I can just as well put my own comments as an answer:
First of all, SimpleXml can load URIs directly and it can do so with stream wrappers, so your three calls in the beginning can be shortened to (note that you are not using $file at all)
$merchantProductFeed = new SimpleXMLElement("compress.zlib://$url", null, TRUE);
To get the values you can either use the implicit SimpleXml API and drill down to the wanted elements (like shown multiple times elsewhere on the site):
foreach ($merchantProductFeed->merchant->prod as $prod) {
echo $prod->cat->awCat , PHP_EOL;
}
or you can use an XPath query to get at the wanted elements directly
$xml = new SimpleXMLElement("compress.zlib://$url", null, TRUE);
foreach ($xml->xpath('/merchantProductFeed/merchant/prod/cat/awCat') as $awCat) {
echo $awCat, PHP_EOL;
}
Live Demo
Note that fetching all $awCat elements from the source XML is rather pointless though, because all of them have "Bodycare & Fitness" for value. Of course you can also mix XPath and the implict API and just fetch the prod elements and then drill down to the various children of them.
Using XPath should be somewhat faster than iterating over the SimpleXmlElement object graph. Though it should be noted that the difference is in an neglectable area (read 0.000x vs 0.000y) for your feed. Still, if you plan to do more XML work, it pays off to familiarize yourself with XPath, because it's quite powerful. Think of it as SQL for XML.
For additional examples see
A simple program to CRUD node and node values of xml file and
PHP Manual - SimpleXml Basic Examples
Try this...
$url = "http://datafeed.api.productserve.com/datafeed/download/apikey/58bc4442611e03a13eca07d83607f851/cid/97,98,142,144,146,129,595,539,147,149,613,626,135,163,168,159,169,161,167,170,137,171,548,174,183,178,179,175,172,623,139,614,189,194,141,205,198,206,203,208,199,204,201,61,62,72,73,71,74,75,76,77,78,79,63,80,82,64,83,84,85,65,86,87,88,90,89,91,67,92,94,33,54,53,57,58,52,603,60,56,66,128,130,133,212,207,209,210,211,68,69,213,216,217,218,219,220,221,223,70,224,225,226,227,228,229,4,5,10,11,537,13,19,15,14,18,6,551,20,21,22,23,24,25,26,7,30,29,32,619,34,8,35,618,40,38,42,43,9,45,46,651,47,49,50,634,230,231,538,235,550,240,239,241,556,245,244,242,521,576,575,577,579,281,283,554,285,555,303,304,286,282,287,288,173,193,637,639,640,642,643,644,641,650,177,379,648,181,645,384,387,646,598,611,391,393,647,395,631,602,570,600,405,187,411,412,413,414,415,416,649,418,419,420,99,100,101,107,110,111,113,114,115,116,118,121,122,127,581,624,123,594,125,421,604,599,422,530,434,532,428,474,475,476,477,423,608,437,438,440,441,442,444,446,447,607,424,451,448,453,449,452,450,425,455,457,459,460,456,458,426,616,463,464,465,466,467,427,625,597,473,469,617,470,429,430,615,483,484,485,487,488,529,596,431,432,489,490,361,633,362,366,367,368,371,369,363,372,373,374,377,375,536,535,364,378,380,381,365,383,385,386,390,392,394,396,397,399,402,404,406,407,540,542,544,546,547,246,558,247,252,559,255,248,256,265,259,632,260,261,262,557,249,266,267,268,269,612,251,277,250,272,270,271,273,561,560,347,348,354,350,352,349,355,356,357,358,359,360,586,590,592,588,591,589,328,629,330,338,493,635,495,507,563,564,567,569,568/mid/2891/columns/merchant_id,merchant_name,aw_product_id,merchant_product_id,product_name,description,category_id,category_name,merchant_category,aw_deep_link,aw_image_url,search_price,delivery_cost,merchant_deep_link,merchant_image_url/format/xml/compression/gzip/";
$zd = gzopen($url, "r");
$data = gzread($zd, 1000000);
gzclose($zd);
if ($data !== false) {
$xml = simplexml_load_string($data);
foreach ($xml->merchant->prod as $pr) {
echo $pr->cat->awCat . "<br>";
}
}
<?php
$xmlstr = file_get_contents("compress.zlib://$url");
$xml = simplexml_load_string($xmlstr);
// you can transverse the xml tree however you want
foreach ($xml->merchant->prod as $line) {
// $line->cat->awCat -> you can use this
}
more information here
Use print_r($xml) to see the structure of the parsed XML feed.
Then it becomes obvious how you would traverse it:
foreach ($xml->merchant->prod as $prod) {
print $prod->pId;
print $prod->text->name;
print $prod->cat->awCat; # <-- which is what you wanted
print $prod->price->buynow;
}
$url = 'you url here';
$f = gzopen ($url, 'r');
$xml = new SimpleXMLElement (fread ($f, 1000000));
foreach($xml->xpath ('//prod') as $name)
{
echo (string) $name->cat->awCatId, "\r\n";
}

Retrieve XML from a third party page in PHP

I need read in and parse data from a third party website which sends XML data. All of this needs to be done server side.
What is the best way to do this using PHP?
You can obtain the remote XML data with, e.g.
$xmldata = file_get_contents("http://www.example.com/xmldata");
or with curl. Then use SimpleXML, DOM, whatever.
A good way of parsing XML is often to use XPP (XML Pull Parsing) librairy, PHP has an implementation of it, it's called XMLReader.
http://php.net/manual/en/class.xmlreader.php
I would suggest you to use DOMDocument (PHP inline built class)
A simple example of its power could be the following code:
/***********************************************************************************************
Takes the RSS news feeds found at $url and prints them as HTML code.
Each news is rendered in a <div class="rss"> block in the order: date + title + description.
***********************************************************************************************/
function Render($url, $max_feeds = 1000)
{
$doc = new DOMDocument();
if(#$doc->load($url, LIBXML_NOCDATA|LIBXML_NOBLANKS))
{
$feed_count = 0;
$items = $doc->getElementsByTagName("item");
//echo $items->length; //DEBUG
foreach($items as $item)
{
if($feed_count > $max_feeds)
break;
//Unfortunately inside <item> node elements are not always in same order, therefor we have to call many times getElementsByTagName
//WARNING: using iconv function instead of utf8_decode because this last one did not convert properly some characters like apostrophe 0x19 from techsport.it feeds.
$title = iconv('UTF-8', 'CP1252', $item->getElementsByTagName("title")->item(0)->firstChild->textContent); //can use "CP1252//TRANSLIT"
$description = iconv('UTF-8', 'CP1252', $item->getElementsByTagName("description")->item(0)->firstChild->textContent); //can use "CP1252//TRANSLIT"
$link = iconv('UTF-8', 'CP1252', $item->getElementsByTagName("link")->item(0)->firstChild->textContent); //can use "CP1252//TRANSLIT"
//pubDate tag is not mandatory in RSS [RSS2 spec: http://cyber.law.harvard.edu/rss/rss.html]
$pub_date = $item->getElementsByTagName("pubDate"); $date_html = "";
//play with date here if you want
echo "<div class='rss'>\n<p class='title'><a href='" . $link . "'>" . $title . "</a></p>\n<p class='description'>" . $description . "</p>\n</div>\n\n";
$feed_count++;
}
}
else
echo "<div class='rss'>Service not available.</div>";
}
I have been using simpleXML for a while.

Categories