I am currently using unset() to remove a parent node in SimpleXML and then writing the result back to the XML file.
This code was working a while ago; after cleaning up my code I can't find why it suddenly stopped working.
Debugging so far: the file can be accessed, execution enters the loop and the if statement, and the file gets saved (Notepad++ asks me to reload it), but the <systemInfo></systemInfo> node does not get deleted.
Here is my sample code:
$userorig = $_POST['user'];
$userinfos = simplexml_load_file('userInfo.xml'); // Opens the user XML file
foreach ($userinfos->userinfo->account as $account)
{
// Checks if the user in this iteration of the loop is the same as $userorig (the user i want to find)
if($account->user == $userorig)
{
echo "hello";
$rootSystem = $account->systemInfo;
unset($rootSystem);
}
}
$userinfos->saveXML('userInfo.xml');
My XML File:
<userinfos>
<userinfo>
<account>
<user>TIGERBOY-PC</user>
<toDump>2014-03-15 03:20:44</toDump>
<toDumpDone>0</toDumpDone>
<initialCheck>0</initialCheck>
<lastChecked>2014-03-16 07:12:17</lastChecked>
<alert>1</alert>
<systemInfo>
... (many nodes and sub nodes here) ...
</systemInfo>
</account>
</userinfo>
</userinfos>
Rather than iterating over the whole xml, use xpath to select the node:
$userorig = $_POST['user'];
$userinfos = simplexml_load_file('userInfo.xml'); // Opens the user XML file
$deletethisuser = $userinfos->xpath("/userinfos/userinfo/account[user = '$userorig']/systemInfo")[0];
unset($deletethisuser[0]);
Comments:
The [0] directly after the xpath(...) call requires PHP >= 5.4 (array dereferencing of a function result). If you are running a lower version, either upgrade or use:
$deletethisuser = $userinfos->xpath("/userinfos/userinfo/account[user = '$userorig']/systemInfo");
unset($deletethisuser[0][0]);
Advised reading: hakre's answer in this thread: Remove a child with a specific attribute, in SimpleXML for PHP
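To illustrate why the loop in the question leaves the file unchanged, here is a minimal sketch, assuming a cut-down version of the XML from the question (the <cpu> child is made up for illustration). Unsetting a variable that merely holds the node only destroys that variable; unsetting the property on its parent removes the element from the document.

```php
<?php
// Cut-down document; the <cpu> child is a placeholder, not from the question.
$xml = <<<XML
<userinfos><userinfo><account><user>TIGERBOY-PC</user><systemInfo><cpu>x</cpu></systemInfo></account></userinfo></userinfos>
XML;

$userinfos = simplexml_load_string($xml);
$account = $userinfos->userinfo->account;

// This only destroys the local variable; the node stays in the document.
$rootSystem = $account->systemInfo;
unset($rootSystem);
$countAfterLocalUnset = count($account->systemInfo);

// Unsetting the property on the parent removes the child element itself.
unset($account->systemInfo);
$countAfterPropertyUnset = count($account->systemInfo);
```

Either this form or the xpath form above still needs a call such as $userinfos->saveXML('userInfo.xml') afterwards to write the change back to disk.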
It works again. Sorry, I do not know why: I kept running it on multiple instances and now it works. The program behaved strangely before, but over about 15 further tries it did its job.
Related
I'm new to XML-related work and I'm stuck on an issue.
I have a mysql query which fetches url data nearly 5000 rows (1 row contains 1 url).
so i've implemented a cron which fetches 1000 rows at time from mysql with pagination. i need to do some validations on the urls and should append the valid urls in an xml file.
Here is my code
public function urlcheck()
{
$xFile = $this->base_path."sitemap/path/urls.xml";
$page = 0;
$cache_key = 'valid_urls';
$page = $this->cache->redis->get($cache_key);
if(!$page){
$page=0;
}
$xFile = simplexml_load_file($xFile);
$this->load->model('productnew/productnew_es6_m');
$urls= $this->db->query("SELECT url FROM product_data where `active` = 1 limit ".$page.",1000")->result();
$dom = new DOMDocument('1.0','UTF-8');
$dom->formatOutput = true;
$root = $dom->createElement('urlset');
$root->setAttribute('xsi:schemaLocation', 'http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd');
$root->setAttribute('xmlns:xsi', 'http://www.w3.org/2001/XMLSchema-instance');
$root->setAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');
$dom->appendChild($root);
foreach($urls as $val)
{
// validations here
$url = $dom->createElement('url');
$root->appendChild($url);
$lastmod = $dom->createElement('lastmod', date("Y-m-d"));
$url->appendChild($lastmod);
$page++;
}
$dom->saveXML();
$dom->save($xFile) or die('XML Create Error');
if(sizeof($urls) == 0){
$page = 0;
}
print_r($page);
$this->cache->redis->save($cache_key, $page, 432000);
// echo '<xmp>'. $dom->saveXML() .'</xmp>';
// $dom->saveXML();
// $dom->save($xFile) or die('XML Create Error');
}
After my first cron execution, 300 valid URLs out of 1000 are saved to the XML file.
Now let's say that in my second cron execution I have 200 valid URLs out of 1000.
My expected result is that these 200 are appended to the existing XML file, so that the file contains 500 valid URLs in total, and the file should be reset once all 5000 URLs have been processed, as I mentioned above.
But after executing the cron every time, old url data is being replaced with latest once.
I was wondering how do I save the url values without overwriting the XML.
Thank you in Advance!
As per the comment above, you are opening the file with one API (SimpleXML) but saving a new document with DOMDocument, thus overwriting previous work. Without SimpleXML you could perhaps try it like this, though it is untested.
public function urlcheck(){
$file=$this->base_path."sitemap/path/urls.xml";
$cache_key='valid_urls';
$page=$this->cache->redis->get($cache_key);
if(!$page)$page=0;
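$dom=new DOMDocument('1.0','UTF-8');
$dom->formatOutput = true;
# load the existing file (if any) so new nodes are appended to it rather than to an empty document
if( file_exists( $file ) )$dom->load( $file );
$col=$dom->getElementsByTagName('urlset');
# a DOMNodeList object is never `empty()`, so test its length instead
if( $col->length > 0 )$root=$col->item(0);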
else{
$root=$dom->createElement('urlset');
$dom->appendChild( $root );
$root->setAttribute('xsi:schemaLocation','http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd');
$root->setAttribute('xmlns:xsi','http://www.w3.org/2001/XMLSchema-instance');
$root->setAttribute('xmlns','http://www.sitemaps.org/schemas/sitemap/0.9');
}
# does a `page` node exist - if so use the value as the $page variable
$col=$dom->getElementsByTagName('page');
if( $col->length > 0 )$page=intval( $col->item(0)->nodeValue );
$this->load->model('productnew/productnew_es6_m');
$urls=$this->db->query("SELECT `url` FROM `product_data` where `active` = 1 limit ".$page.",1000")->result();
foreach( $urls as $val ){
$url = $dom->createElement('url');
$root->appendChild($url);
$lastmod = $dom->createElement('lastmod', date("Y-m-d"));
$url->appendChild($lastmod);
$page++;
}
$node=$dom->createElement( 'page', $page );
$root->insertBefore( $node, $root->firstChild );
if( empty( $urls ) )$page=0;
$dom->save( $file );
$this->cache->redis->save( $cache_key, $page, 432000 );
}
Appending to the document looks fine, but you never open the file you want to append to from disk. Therefore on each page you start with 0 URLs in the XML and append to an empty root node.
But after executing the cron every time, old url data is being replaced with latest once.
This is exactly the behaviour you describe, and it sounds like you don't load the XML file in the first place; you just write it.
So the question is perhaps how to open an XML file; by your description, appending already works.
Let's review, by reversing the introduction sentences of your question:
I need to do some validations on the urls and should append the valid urls in an xml file.
so i've implemented a cron which fetches 1000 rows at time from mysql with pagination.
I have a mysql query which fetches url data nearly 5000 rows (1 row contains 1 url).
On pages 2-5, the file that each set of 1000 URLs is appended to is already on disk, so you need to open it and append. On page 1, however, a file already on disk would belong to a previous cycle (its pages 1-5), so it must not be appended to.
So it looks like you have written the code only for when you're on the first page - to create a new document (and append to it).
And despite your question, appending does work, you write it yourself:
old url data is being replaced with latest once.
The only thing that does not work is to open the file on page 2 - 5.
So let's rephrase the question: How to open an XML file?
But first of all, the variable $page does not stand for a page as in pages 1-5 above. Despite its questionable name, $page holds the number of URLs processed so far in the cycle, not the page number in the pagination.
Regardless of its name, I'll use it for its value for this answer.
So now let's open the existing document for appending when $page is not 0:
...
$dom = new DOMDocument('1.0','UTF-8');
$dom->formatOutput = true;
if ($page !== 0) {
$dom->load(dom_import_simplexml($xFile)->ownerDocument->documentURI);
}
$col=$dom->getElementsByTagName('urlset');
...
Only on the first run will you get the described behaviour of the file being created anew, and in that case it's fine (on the first run $page === 0).
In any other case $page is not 0 and the file is opened from disk.
I've left the other parts of your code alone so that this example only introduces this 3-line if-clause.
The documentation for the load($file) function is available in the PHP docs, just in case you missed it so far:
https://www.php.net/manual/en/domdocument.load.php
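The load-then-append idea can be sketched in isolation as follows. This is a minimal sketch under stated assumptions, not the original code: the function name appendUrlsToSitemap, the example.com URLs and the temporary file path are hypothetical, and a <loc> element is added that the question's code does not show. It opens the file when it already exists, creates the root element otherwise, and appends in both cases.

```php
<?php
// Hypothetical helper: append URLs to a sitemap file, creating it on first use.
// Returns the total number of <url> elements in the saved document.
function appendUrlsToSitemap(string $file, array $urls): int
{
    $ns = 'http://www.sitemaps.org/schemas/sitemap/0.9';

    $dom = new DOMDocument('1.0', 'UTF-8');
    $dom->preserveWhiteSpace = false; // lets formatOutput re-indent a loaded file
    $dom->formatOutput = true;

    if (is_file($file)) {
        // Later runs: open the existing document and append to its root.
        $dom->load($file);
        $root = $dom->documentElement;
    } else {
        // First run: create a new document with an empty root element.
        $root = $dom->createElementNS($ns, 'urlset');
        $dom->appendChild($root);
    }

    foreach ($urls as $u) {
        $url = $root->appendChild($dom->createElementNS($ns, 'url'));
        $loc = $url->appendChild($dom->createElementNS($ns, 'loc'));
        // A text node escapes characters such as & in the URL.
        $loc->appendChild($dom->createTextNode($u));
        $url->appendChild($dom->createElementNS($ns, 'lastmod', date('Y-m-d')));
    }

    $dom->save($file);

    return $dom->getElementsByTagName('url')->length;
}

// Usage: two runs against the same (temporary, made-up) file path.
$file = sys_get_temp_dir() . '/sitemap_' . uniqid() . '.xml';
$afterFirstRun = appendUrlsToSitemap($file, ['https://example.com/a', 'https://example.com/b']);
$afterSecondRun = appendUrlsToSitemap($file, ['https://example.com/c?x=1&y=2']);
unlink($file);
```

The second call sees the two URLs written by the first call and adds a third, which is the behaviour the question asks for. Resetting the file at the start of a new cycle would then be a matter of deleting it (or skipping the load) when $page is 0.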
Try not to reuse the same variable name for different things if you want to come up to speed. Here I had to recycle the whole SimpleXMLElement and import it into DOM just to obtain the original XML file path for opening the document, because the path was no longer available as a plain string even though it was once stored in $xFile. But that is just a comment in the margin.
And as you're already using Redis, you perhaps may want to queue the URLs into it and process from there, then you'll likely not need the database paging. See Lists of the Redis Data-Types.
You can then also put the good URLs in there in a second list.
With two lists you can even constantly check the progress in Redis directly.
And when finally done, you can write the whole file at once in one transaction out of the good URLs in Redis.
If you want to throw some more (minimal) tech on it, take a look at Beanstalkd.
I've searched and tried a number of examples and answers I've found here, but I think my requirements are quite specific and I haven't been able to find an answer that's worked for me so far... Bodging a number of answers together hasn't worked either!
Aim - Update the hosplist_divert value of the same element that contains a specific name tag.
The XML file is hosted in a separate folder from the PHP page, in this case at /data/hospitals2.xml
My XML is like so:
<Document>
<Placemark>
<name>UHSM Wythenshawe</name>
<hosplist_divert>1</hosplist_divert>
</Placemark>
</Document>
There are approx 50 Placemark entries in the file.
So far I am only able to return all the hospital names from the file with,
// Create the SXE object
// You can read from file using the simplexml_load_file function
$url = "http://www.patientpathfinder.co.uk/user/nwasdos/data/hospitals2.xml";
$sxe = new SimpleXMLElement($url, NULL, TRUE);
$sxe->registerXPathNamespace('hospital','http://earth.google.com/kml/2.2');
// Fetch the right HOSPITAL using XPATH
// hospital name stored in Document/Placemark/name
// divert status stored in Document/Placemark/hosplist_divert
//trying to find above values for UHSM Wythenshawe
$result=$sxe->xpath('//hospital:name[.="UHSM Wythenshawe"]/parent::*');
foreach ($result as $hospital)
{
echo $hospital . "<br>";
}
// Update the values you want
//$target_hosp[0]->hosplist_divert = 'ON DIVERT';
// Store the updated values in the $xml variable
//$xml = $sxe->asXML();
// Print the updated XML
//echo $xml;
}
It took me about half a day to realise that I needed to define the namespace, but I haven't really had time to understand why the namespace is required, and I would be happy to remove it from the XML file in favour of a working solution.
Thanks to all that contribute,
Nick
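One possible approach, given as a hedged sketch rather than a tested fix for the live file: select the Placemark element itself (not the name) through the registered prefix, then assign to its hosplist_divert child. The inline XML, its <kml> root element and the second Placemark are assumptions made up so that the example is self-contained; the namespace URI and prefix are the ones from the question.

```php
<?php
// Assumed document shape: the <kml> root and the second Placemark are invented for this example.
$kml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2">
  <Document>
    <Placemark><name>UHSM Wythenshawe</name><hosplist_divert>1</hosplist_divert></Placemark>
    <Placemark><name>Other Hospital</name><hosplist_divert>0</hosplist_divert></Placemark>
  </Document>
</kml>
XML;

$sxe = new SimpleXMLElement($kml);
// XPath has no notion of a default namespace, so the elements need a prefix here.
$sxe->registerXPathNamespace('hospital', 'http://earth.google.com/kml/2.2');

// Select the Placemark whose name child matches.
$result = $sxe->xpath('//hospital:Placemark[hospital:name="UHSM Wythenshawe"]');
foreach ($result as $placemark) {
    // Assigning to an existing child replaces its text content.
    $placemark->hosplist_divert = 'ON DIVERT';
}

$updated = $sxe->asXML();
```

To persist the change, asXML() can be given a file name, but that has to be a writable local path such as the /data/hospitals2.xml file on the server; saving back to the http:// URL the document was read from is not expected to work.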
Each time I run the code, file updates and I can see the file last edited date and time are updated but the content in the XML file is not updated.
I just tried to update the following XML Code
<?xml version="1.0" encoding="utf-8"?>
<topcont>
<sitenondualtraining>
<title>The Heart of Awakening</title>
<descripition>nondual</descripition>
<link>www.test.com/post/latestpost</link>
</sitenondualtraining>
</topcont>
using PHP code
$topcont = new DOMDocument();
$topcont->load("http://fenner.tk/topcont.xml");
$topcont->topcont->sitenondualtraining->title = 'test';
$topcont->sitenondualtraining->descripition = $_POST['nd2'];
$topcont->sitenondualtraining->link = $_POST['nd3'];
$topcont->Save("topcont.xml");
I also tried
$topcont = new SimpleXmlElement('http://fenner.tk/topcont.xml',null, true);
$topcont->sitenondualtraining->title = $_POST['nd1'];
$topcont->sitenondualtraining->descripition = $_POST['nd2'];
$topcont->sitenondualtraining->link = $_POST['nd3'];
$topcont->asXml('topcont.xml');
But neither of these is working. Can anyone point out where the issue is? Thanks.
File permissions are set to 777, but it is still not working.
There are no errors, but these warnings appear:
Warning: Creating default object from empty value in /home/fenner/public_html/topads.php on line 20
Warning: Creating default object from empty value in /home/fenner/public_html/topads.php on line 21 /home/fenner/public_html/
Using DomDocument, you were almost there. You can do it like this:
$topcont = new DOMDocument();
$topcont->load("topcont.xml");
$topcont->getElementsByTagName("title")->item(0)->nodeValue = $_POST['nd1'];
// the tag name must match the file exactly; the XML in the question spells it "descripition"
$topcont->getElementsByTagName("descripition")->item(0)->nodeValue = $_POST['nd2'];
$topcont->getElementsByTagName("link")->item(0)->nodeValue = $_POST['nd3'];
$topcont->save("topcont.xml");
Just remember to sanitize your inputs before storing your data ;)
Also worth looking into is creating cdata sections and using replaceData, depending on what you intend to store in each node.
EDIT
In response to your comment below, you might want to change your XML structure a little if you are going to be handling multiple child nodes. This way it is easier to loop through and update the node you are interested in. You will see below that I moved 'sitenondualtraining' and 'siteradiantmind' to be ids of an 'item' node, though you could easily change this to something like <site id="nondualtraining"> if that's more like what you were looking for.
<?xml version="1.0" encoding="utf-8"?>
<topcont>
<item id="sitenondualtraining">
<title>test</title>
<description>hello test</description>
<link>hello</link>
</item>
<item id="siteradiantmind">
<title>The Heart of Awakening</title>
<description>radiantmind</description>
<link>www.radiantmind.com/post/latestpost</link>
</item>
</topcont>
Your PHP code would then be something like this, again this is quite basic and could be tidied up, but is a good start:
$items = $topcont->getElementsByTagName("item");
// loop through each item
foreach ($items as $item) {
$id = $item->getAttribute('id');
// check the item id to make sure we edit the correct one
if ($id == "sitenondualtraining") {
$item->getElementsByTagName("title")->item(0)->nodeValue = $_POST['nd1'];
$item->getElementsByTagName("description")->item(0)->nodeValue = $_POST['nd2'];
$item->getElementsByTagName("link")->item(0)->nodeValue = $_POST['nd3'];
}
}
If you were feeling a little adventurous, you could have a look at xpath and xpath query, you can find some sample code in most php docs to get you started and the comments from other users can be helpful as well.
For reference: getAttribute, getElementsByTagName.
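As a hedged sketch of the XPath route mentioned above, DOMXPath can select the title of one item directly instead of looping over every item. The inline document mirrors the restructured XML from this answer; the new title text is made up, and in the real script it would come from the sanitized $_POST value.

```php
<?php
$xmlString = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<topcont>
  <item id="sitenondualtraining"><title>test</title><description>hello test</description><link>hello</link></item>
  <item id="siteradiantmind"><title>The Heart of Awakening</title><description>radiantmind</description><link>www.radiantmind.com/post/latestpost</link></item>
</topcont>
XML;

$topcont = new DOMDocument();
$topcont->loadXML($xmlString);

$xpath = new DOMXPath($topcont);
// Select only the title inside the item with the wanted id.
$titles = $xpath->query('//item[@id="sitenondualtraining"]/title');

if ($titles->length > 0) {
    $title = $titles->item(0);
    // Clear the old content, then add a text node so that & is escaped properly.
    $title->nodeValue = '';
    $title->appendChild($topcont->createTextNode('New title & more'));
}

$result = $topcont->saveXML();
```

Going through createTextNode rather than assigning user input straight to nodeValue is a design choice here: it keeps characters such as & from being interpreted as the start of an entity.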
I apologize if this is a newbie question, but I cannot figure out why this doesn't work - and I can't seem to find anything about it when searching.
Basically, I am trying to scrape some user details from our site that are not available from the site's REST API, so I have to do it manually. I have compiled a text file of user IDs, which I use to fetch the wanted details for each user through Simple HTML DOM.
<?php
include('simple_html_dom.php') ;
include('functions.php') ;
$file = fopen("userids2.txt", "r") ;
while(!feof($file)) {
$userid = fgetss($file) ;
$url = 'http://<our URL>/user/'.$userid ;
echo $url ;
webscraper($url) ;
}
fclose($file) ;
?>
and here are the contents of functions.php:
<?php
function webscraper($loopurl) {
$html = new simple_html_dom();
$html->load_file($loopurl);
$test = $html->getElementsById('ctl00_ContentPlaceHolderDefault_UserViewUC_tabContainer_tabProfile_userProfile_ddWork') ;
foreach ($test as $element) {
echo $element ;
}
}
?>
The specific textfile used contains 4 userids that I know contain the information that I want. When I run the script it will only give me the output from the url from the last line in the textfile. It prints out the URLs fine, but refuses to load the remote html for the first three entries. If I delete the last line of the textfile, it then loads the new last line (which it refused to do before).
Any ideas?? Thanks in advance.
Doh.. I found out what the problem was. There was an "invisible" end of line character on all entries in the textfile EXCEPT the last one. So that was why it refused to work. Adding trim when retrieving the line fixed the problem:
$userid = trim(fgetss($file));
I probably should have known this, but at least I won't make this mistake next time :-).
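For readers on newer PHP versions: fgetss() was deprecated in PHP 7.3 and removed in PHP 8.0, so the same fix can be written with fgets() and trim(). A minimal sketch, assuming a temporary file with made-up IDs and a placeholder example.com base URL:

```php
<?php
// Build a small sample file; every line except the last ends with \r\n.
$path = sys_get_temp_dir() . '/userids_' . uniqid() . '.txt';
file_put_contents($path, "101\r\n102\r\n103\r\n104");

$urls = [];
$file = fopen($path, 'r');
// fgets() returns false at end of file, which avoids the feof() loop pattern.
while (($line = fgets($file)) !== false) {
    $userid = trim($line); // strips the trailing \r\n that broke the URLs
    if ($userid === '') {
        continue; // skip blank lines
    }
    $urls[] = 'http://example.com/user/' . $userid;
}
fclose($file);
unlink($path);
```

Each entry in $urls could then be passed to webscraper() as in the original loop.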
I'm making an interface-website to update a concert-list on a band-website.
The list is stored as an XML file and has this structure:
I already wrote a script that enables me to add a new gig to the list, this was relatively easy...
Now I want to write a script that enables me to edit a certain gig in the list.
Every gig is unique because of its first attribute, "id".
I want to use this reference to edit the other attributes in that node.
My PHP is very poor, so I hope someone can get me started on the right foot here...
My PHP script :
Well, I don't know what your XML structure looks like, but assuming something like this:
<gig id="someid">
<venue></venue>
<day></day>
<month></month>
<year></year>
</gig>
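$xml = new SimpleXmlElement('gig.xml',null, true);
// attributes are addressed with @ in XPath; xpath() returns an array of matching elements
$matches = $xml->xpath('//gig[@id="'.$_POST['id'].'"]');
if ($matches) {
$gig = $matches[0];
$gig->venue = $_POST['venue'];
$gig->month = $_POST['month'];
// etc..
$xml->asXml('gig.xml'); // save back to file
}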
Now, if all these data points are attributes instead, you can use $gig->attributes()->venue to access them.
There is really no need for a loop unless you are doing multiple updates with one post: you can get at any specific record via an XPath query. SimpleXML is also a lot lighter and a lot easier to use for this type of thing than DOMDocument, especially as you aren't using the features of DOMDocument.
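For the attribute case, here is a minimal sketch, assuming a made-up <gigs> document in which venue is an attribute of each gig. SimpleXML exposes attributes through array-style access, which also allows writing to them.

```php
<?php
// Assumed structure: the <gigs> root and the attribute values are invented for illustration.
$xml = new SimpleXMLElement('<gigs><gig id="1" venue="Old Venue"/><gig id="2" venue="Other Venue"/></gigs>');

$id = '1'; // in the real script this would come from the (validated) POST data
$matches = $xml->xpath('//gig[@id="' . $id . '"]');

if ($matches) {
    $gig = $matches[0];
    // Array-style access reads or writes an attribute of the element.
    $gig['venue'] = 'New Venue';
}

$firstVenue = (string) $xml->gig[0]['venue'];
$secondVenue = (string) $xml->gig[1]['venue'];
```

Because the id is interpolated into the XPath expression, it is worth checking it against the expected format (digits only, for example) before using it.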
You'll want to load the XML file into a DOMDocument:
<?php
$xml = new DOMDocument();
$xml->load("xmlfile.xml");
//find the tags that you want to update
$tags = $xml->getElementsByTagName("GIG");
//find the tag with the id you want to update
foreach ($tags as $tag) {
if($tag->getAttribute("id") == $id) { //found the tag, now update the attribute
$tag->setAttribute("[attributeName]", "[attributeValue]");
}
}
//save the xml
$xml->save("xmlfile.xml"); // save() needs the target file name
?>
The code is untested, but it shows the general idea.