Currently, I'm using this to append data to my items file:
$xmldoc = new DOMDocument();
$xmldoc->load('ex.xml');
$item= $xmldoc->createElement('item');
$item->setAttribute('id', '100');
$item->setAttribute('category', 'Fitness');
$item->setAttribute('name', 'Basketball');
$item->setAttribute('url', 'http://google.com');
$item->setAttribute('description', 'This is a description');
$item->setAttribute('price', '899');
$xmldoc->getElementsByTagName('items')->item(0)->appendChild($item);
$xmldoc->save('ex.xml');
Before appending, I would like to check whether an "item" element with the same id attribute value already exists.
If it does, that element should be updated with the new data instead.
Currently the code just appends and doesn't check anything.
$xmldoc = new DOMDocument();
$xmldoc->load('ex.xml');
$xpath = new DOMXPath($xmldoc);
$query = $xpath->query('/mainXML/items/item[@id = "100"]');
$create_new_node = false;
if($query->length == 0)
{
$item = $xmldoc->createElement('item');
$create_new_node = true;
}
else
{
$item = $query->item(0);
}
$item->setAttribute('id', '100');
$item->setAttribute('category', 'Fitness');
$item->setAttribute('name', 'Basketball');
$item->setAttribute('url', 'http://google.com');
$item->setAttribute('description', 'This is a description');
$item->setAttribute('price', '899');
if($create_new_node)
{
$xmldoc->getElementsByTagName('items')->item(0)->appendChild($item);
}
$xmldoc->save('ex.xml');
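The same check-then-update logic can be wrapped in a reusable function. This is a minimal sketch: `upsertItem()` is a hypothetical helper name, and it assumes the `/mainXML/items/item` layout used in the query above, an existing `<items>` element, and ids that contain no double quotes (the id is spliced into the XPath string).

```php
<?php
// Hypothetical helper: update the <item> whose id matches, or append a new one.
// Assumes the /mainXML/items/item structure and an existing <items> element.
function upsertItem(DOMDocument $doc, string $id, array $attributes): DOMElement
{
    $xpath = new DOMXPath($doc);
    // Attribute tests use "@"; XPath positions are 1-based, so [1] is the first match.
    $matches = $xpath->query('/mainXML/items/item[@id = "' . $id . '"][1]');

    if ($matches->length > 0) {
        $item = $matches->item(0);                  // existing element: update in place
    } else {
        $item = $doc->createElement('item');        // no match: create and append
        $doc->getElementsByTagName('items')->item(0)->appendChild($item);
    }

    $item->setAttribute('id', $id);
    foreach ($attributes as $name => $value) {
        $item->setAttribute($name, (string) $value);
    }
    return $item;
}

$doc = new DOMDocument();
$doc->loadXML('<mainXML><items><item id="100" name="Old"/></items></mainXML>');

upsertItem($doc, '100', ['category' => 'Fitness', 'name' => 'Basketball', 'price' => '899']);
upsertItem($doc, '101', ['category' => 'Fitness', 'name' => 'Football', 'price' => '499']);

echo $doc->saveXML();
```

Calling `$doc->save('ex.xml')` afterwards writes the result back to the file, as in the question.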
I haven't used this functionality, but it looks like a good match: DOMDocument::getElementById().
If you get a matching element, edit it; if not, append a new one.
If you have a DTD for this xml file that specifies that the "id" attribute is an ID type (i.e. its value is unique in a document and uniquely identifies its element), then you can use DOMDocument::getElementById().
Most likely, however, you do not have a DTD. In this case, you should just use XPath:
$xmldoc = new DOMDocument();
$xmldoc->load('ex.xml');
$xpath = new DOMXPath($xmldoc);
$results = $xpath->query('//items/item[@id="100"]');
if (!$results->length) {
$item= $xmldoc->createElement('item');
$item->setAttribute('id', '100');
$item->setAttribute('category', 'Fitness');
$item->setAttribute('name', 'Basketball');
$item->setAttribute('url', 'http://google.com');
$item->setAttribute('description', 'This is a description');
$item->setAttribute('price', '899');
$xmldoc->getElementsByTagName('items')->item(0)->appendChild($item);
$xmldoc->save('ex.xml');
}
You should also consider using SimpleXML for this task; the way this XML is structured and manipulated would probably be better suited to it.
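If you would rather use getElementById() without writing a DTD, DOMElement::setIdAttribute() can flag the id attribute as an ID on each element. A minimal sketch, assuming the same item/id layout as the question:

```php
<?php
// Sketch: make getElementById() work without a DTD by flagging "id" as an ID attribute.
$doc = new DOMDocument();
$doc->loadXML('<mainXML><items><item id="100" name="Basketball"/>'
    . '<item id="101" name="Football"/></items></mainXML>');

// Without a DTD, libxml does not know "id" is an ID, so register each one explicitly.
foreach ($doc->getElementsByTagName('item') as $el) {
    $el->setIdAttribute('id', true);
}

$item = $doc->getElementById('100');
if ($item !== null) {
    $item->setAttribute('name', 'Updated');   // found: edit in place
}
echo $doc->saveXML();
```

Newly created items would need the same setIdAttribute() call before they can be found this way.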
I'm trying to read the values of a table on a web page with DOMXPath::query in PHP. When I view this page in my web browser I can see the table, but when I run my query the table isn't visible and doesn't seem to be accessible.
This table has an id, but when I specify it in my query, a different one is returned. I want to read the table with the id 'totals', but I only get the one with the id 'per_game'. When I inspect the page's source, a lot of elements seem to be inside comments.
Here is my script:
<?php
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->strictErrorChecking = false;
$doc->recover = true;
$doc->loadHTMLFile('https://www.basketball-reference.com/players/j/jokicni01.html');
$xpath = new DOMXPath($doc);
$table = $xpath->query("//div[@id='totals']")->item(0);
$elem = $doc->saveXML($table);
echo $elem;
?>
How can I read the elements in the table with the id 'totals'?
The full path is /html/body/div[@id="wrap"]/div[@id="content"]/div[@id="all_totals"]/div[@class="table_outer_container"]/div[@id="div_totals"]/table[@id="totals"]
You can split your query into two parts: first, retrieve the comment in the correct div, then create a new document from the comment's content and query that for the element you want:
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->strictErrorChecking = false;
$doc->recover = true;
@$doc->loadHTMLFile('https://www.basketball-reference.com/players/j/jokicni01.html'); // @ silences warnings about malformed HTML
$xpath = new DOMXPath($doc);
// retrieve the comment section in 'all_totals' div
$all_totals_element = $xpath->query('/html/body/div[@id="wrap"]/div[@id="content"]/div[@id="all_totals"]/comment()')->item(0);
$all_totals_table = $doc->saveXML($all_totals_element);
// strip comment tags to keep the content inside
$all_totals_table = substr($all_totals_table, strpos($all_totals_table, '<!--') + strlen('<!--'));
$all_totals_table = substr($all_totals_table, 0, strpos($all_totals_table, '-->'));
// create a new Document with the content of the comment
$tableDoc = new DOMDocument;
$tableDoc->loadHTML($all_totals_table);
$xpath = new DOMXPath($tableDoc);
// second part of the query (loadHTML() wraps the fragment in <html><body>, so start with //)
$totals = $xpath->query('//div[@class="table_outer_container"]/div[@id="div_totals"]/table[@id="totals"]')->item(0);
echo $tableDoc->saveXML($totals);
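The comment delimiters don't have to be stripped by hand: a DOMComment node exposes its text through the `data` property. A minimal sketch of the same two-step approach, assuming a small stand-in HTML string instead of the live page; `extractCommentedCell()` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: return the first <td> text of table#totals hidden in a comment.
function extractCommentedCell(string $html): ?string
{
    libxml_use_internal_errors(true);      // tolerate real-world HTML warnings

    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $comment = $xpath->query('//div[@id="all_totals"]/comment()')->item(0);
    if ($comment === null || trim($comment->data) === '') {
        return null;                       // no comment, or nothing inside it
    }

    // DOMComment::$data is the comment text without the <!-- and --> delimiters.
    $tableDoc = new DOMDocument();
    $tableDoc->loadHTML($comment->data);

    // loadHTML() wraps the fragment in <html><body>, so search with // rather than /div.
    $cell = (new DOMXPath($tableDoc))->query('//table[@id="totals"]//td')->item(0);
    return $cell !== null ? $cell->textContent : null;
}

// Stand-in for the downloaded page; the real site's markup is assumed, not copied.
$html = '<html><body><div id="all_totals"><!-- <div id="div_totals">'
      . '<table id="totals"><tr><td>42</td></tr></table></div> --></div></body></html>';

var_dump(extractCommentedCell($html));
```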
I have the following source code:
<?php
function getTerms()
{
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('https://charitablebookings.com/terms'); // loads your HTML
$xpath = new DOMXPath($doc);
// returns a list of all links with rel=nofollow
$nodeList = $xpath->query("//div[@class='terms-conditions']");
$temp_dom = new DOMDocument();
$node = $nodeList->item(0);
$temp_dom = new DOMDocument();
foreach($nodeList as $n) $temp_dom->appendChild($temp_dom->importNode($n,true));
print_r($temp_dom->saveHTML());
}
getTerms();
?>
With this, I'm trying to get text from a web page by selecting a specific class. Nothing shows up in my browser when I print_r the temp_dom, and $node is null. What am I doing wrong?
The first issue is that DOMDocument::loadHTML() expects HTML content as its first parameter, not a URL. Fetch the page first (or use loadHTMLFile(), which does take a file name or URL):
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$html = file_get_contents('https://charitablebookings.com/terms');
$doc->loadHTML($html);
The second problem is your XPath expression, $xpath->query("//div[@class='terms-conditions']"): there is no div with the class terms-conditions in the document the server returns (it is probably added later by a JavaScript loader).
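Because the element may simply not be there, it helps to check the result before calling item(0). A minimal sketch, assuming an inline HTML string in place of the fetched page; `textByClass()` is a hypothetical helper name:

```php
<?php
// Sketch: query by class and guard against an empty result before using item(0).
// textByClass() is hypothetical; $html stands in for file_get_contents($url).
function textByClass(string $html, string $class): ?string
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);            // HTML content, not a URL
    $xpath = new DOMXPath($doc);

    $nodes = $xpath->query("//div[@class='" . $class . "']");
    if ($nodes->length === 0) {
        return null;                  // element not present in the server-side HTML
    }
    return trim($nodes->item(0)->textContent);
}

$html = '<html><body><div class="terms">Be nice.</div></body></html>';
var_dump(textByClass($html, 'terms'));             // string(8) "Be nice."
var_dump(textByClass($html, 'terms-conditions'));  // NULL
```

A null result is a hint that the content is generated client-side and never reaches PHP.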
Here is the code:
$doc = new DomDocument('1.0');
// create root node
$root = $doc->createElement('root');
$root = $doc->appendChild($root);
$signed_values = array('a' => 'eee', 'b' => 'sd', 'c' => 'df');
// process one row at a time
foreach ($signed_values as $key => $val) {
// add node for each row
$occ = $doc->createElement('error');
$occ = $root->appendChild($occ);
// add a child node for each field
foreach ($signed_values as $fieldname => $fieldvalue) {
$child = $doc->createElement($fieldname);
$child = $occ->appendChild($child);
$value = $doc->createTextNode($fieldvalue);
$value = $child->appendChild($value);
}
}
// get completed xml document
$xml_string = $doc->saveXML() ;
echo $xml_string;
If I print it in the browser, I don't get a nicely indented XML structure like
<xml> \n tab <child> etc.
I just get
<xml><child>ee</child></xml>
I also want the output to be UTF-8.
How can I do all this?
You can try to do this:
...
// get completed xml document
$doc->preserveWhiteSpace = false;
$doc->formatOutput = true;
$xml_string = $doc->saveXML();
echo $xml_string;
You can also set these parameters right after creating the DOMDocument:
$doc = new DomDocument('1.0');
$doc->preserveWhiteSpace = false;
$doc->formatOutput = true;
That's probably more concise. The output in both cases is:
<?xml version="1.0"?>
<root>
  <error>
    <a>eee</a>
    <b>sd</b>
    <c>df</c>
  </error>
  <error>
    <a>eee</a>
    <b>sd</b>
    <c>df</c>
  </error>
  <error>
    <a>eee</a>
    <b>sd</b>
    <c>df</c>
  </error>
</root>
I'm not aware of a way to change the indentation character(s) with DOMDocument, which indents with two spaces per level. You could post-process the XML with a line-by-line, regular-expression-based replacement (e.g. with preg_replace):
$xml_string = preg_replace('/(?:^|\G)  /um', "\t", $xml_string);
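A minimal sketch to see what the regex does, assuming DOMDocument's default two-space indentation: `^` anchors at each line start (the `m` flag), and `\G` continues right where the previous match ended, so only leading space pairs become tabs.

```php
<?php
// Sketch: pretty-print with DOMDocument (two-space indents), then convert each
// leading pair of spaces into a tab.
$doc = new DOMDocument('1.0');
$doc->formatOutput = true;

$root = $doc->appendChild($doc->createElement('root'));
$error = $root->appendChild($doc->createElement('error'));
$error->appendChild($doc->createElement('a', 'eee'));

$xml = $doc->saveXML();
// "    <a>" (two levels) becomes "\t\t<a>"; spaces later in a line are untouched.
$tabbed = preg_replace('/(?:^|\G)  /um', "\t", $xml);
echo $tabbed;
```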
Alternatively, the tidy extension's tidy_repair_string() can pretty-print XML data as well. You can specify the indentation level with it; however, tidy will never output tabs.
tidy_repair_string($xml_string, ['input-xml'=> 1, 'indent' => 1, 'wrap' => 0]);
With a SimpleXML object, you can simply do:
$domxml = new DOMDocument('1.0');
$domxml->preserveWhiteSpace = false;
$domxml->formatOutput = true;
/* @var $xml SimpleXMLElement */
$domxml->loadXML($xml->asXML());
$domxml->save($newfile);
Here $xml is your SimpleXMLElement object.
The pretty-printed result is then saved as a new file at the path in $newfile.
<?php
$xml = $argv[1];
$dom = new DOMDocument();
// Initial block (must be set before loading the XML string)
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
// End initial block
$dom->loadXML($xml);
$out = $dom->saveXML();
print_r($out);
I tried all the answers, but none worked, maybe because I'm appending and removing children before saving the XML.
After a lot of googling, I found a comment in the PHP documentation that solved it: I only had to reload the resulting XML into a new DOMDocument with these settings to make it work.
$outXML = $xml->saveXML();
$xml = new DOMDocument();
$xml->preserveWhiteSpace = false;
$xml->formatOutput = true;
$xml->loadXML($outXML);
$outXML = $xml->saveXML();
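Why the reload helps: libxml's formatter skips indentation inside any element that already contains text nodes, including whitespace left over from the original source, and preserveWhiteSpace only takes effect while parsing. A minimal sketch, with `reformatXml()` as a hypothetical helper name:

```php
<?php
// Sketch: re-parse with preserveWhiteSpace = false so formatOutput can indent everything.
// reformatXml() is a hypothetical helper name.
function reformatXml(string $xml): string
{
    $clean = new DOMDocument();
    $clean->preserveWhiteSpace = false;   // drop whitespace-only text nodes while parsing
    $clean->formatOutput = true;
    $clean->loadXML($xml);
    return $clean->saveXML();
}

// Document loaded with whitespace preserved (the default), then modified.
$doc = new DOMDocument();
$doc->loadXML("<root>\n<a>1</a></root>");
$doc->documentElement->appendChild($doc->createElement('b', '2'));
$doc->formatOutput = true;

$before = $doc->saveXML();        // not indented: <root> contains a "\n" text node
$after = reformatXml($before);    // indented two spaces per level
echo $before, $after;
```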
// ##### IN SUMMARY #####
$xmlFilepath = 'test.xml';
echoFormattedXML($xmlFilepath);
/*
* echo xml in source format
*/
function echoFormattedXML($xmlFilepath) {
header('Content-Type: text/xml'); // send as XML so the browser shows the markup instead of rendering it as HTML
echo formatXML($xmlFilepath); // format the xml to make it readable
} // echoFormattedXML
/*
* format xml so it can be easily read but will use more disk space
*/
function formatXML($xmlFilepath) {
$loadxml = simplexml_load_file($xmlFilepath);
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadXML($loadxml->asXML());
$formatxml = new SimpleXMLElement($dom->saveXML());
//$formatxml->saveXML("testF.xml"); // save as file
return $formatxml->saveXML();
} // formatXML
Two different issues here:
Set formatOutput to TRUE and preserveWhiteSpace to FALSE to generate formatted XML:
$doc->formatOutput = TRUE;
$doc->preserveWhiteSpace = FALSE;
Many web browsers (for example Internet Explorer and Firefox) apply their own formatting when displaying XML. Use the View Source feature or a plain text editor to inspect the actual output.
For UTF-8, see the xmlEncoding and encoding properties; passing 'UTF-8' as the second argument of the DOMDocument constructor makes saveXML() write an encoding declaration.
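A minimal sketch of both fixes applied to the question's document, assuming UTF-8 is the desired encoding:

```php
<?php
// Sketch: 'UTF-8' as the second constructor argument adds an encoding declaration;
// formatOutput adds the indentation.
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true;

$root = $doc->appendChild($doc->createElement('root'));
foreach (['a' => 'eee', 'b' => 'sd', 'c' => 'df'] as $name => $value) {
    $child = $root->appendChild($doc->createElement($name));
    $child->appendChild($doc->createTextNode($value));   // text nodes are escaped for us
}

$xml = $doc->saveXML();
echo $xml;
```

When sending this to a browser, a `Content-Type: text/xml; charset=UTF-8` header also tells the browser which encoding to use.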
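This is a slight variation on the theme above, but I'm posting it here in case others hit this and can't make sense of it, as I couldn't.
When using saveXML(), preserveWhiteSpace on the target DOMDocument does not apply to imported nodes (as of PHP 5.6). The setting only takes effect while a document is being parsed, so nodes imported from another document keep whatever whitespace text nodes they were loaded with.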
Consider the following code:
$dom = new DOMDocument(); //create a document
$dom->preserveWhiteSpace = false; //disable whitespace preservation
$dom->formatOutput = true; //pretty print output
$documentElement = $dom->createElement("Entry"); //create a node
$dom->appendChild ($documentElement); //append it
$message = new DOMDocument(); //create another document
$message->loadXML($messageXMLtext); //populate the new document from XML text
$node=$dom->importNode($message->documentElement,true); //import the new document content to a new node in the original document
$documentElement->appendChild($node); //append the new node to the document Element
$dom->saveXML($dom->documentElement); //print the original document
In this context, the $dom->saveXML(); statement will NOT pretty print the content imported from $message, but content originally in $dom will be pretty printed.
In order to achieve pretty printing for the entire $dom document, the line:
$message->preserveWhiteSpace = false;
must be included after the $message = new DOMDocument(); line, i.e., the document(s) from which nodes are imported must also have preserveWhiteSpace = false.
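A minimal sketch comparing both cases, assuming a small hard-coded message in place of `$messageXMLtext`; `buildEntry()` is a hypothetical helper, and its `$stripSource` flag toggles the fix:

```php
<?php
// Sketch: imported nodes keep the whitespace text nodes of their source document,
// so the source must also be parsed with preserveWhiteSpace = false.
function buildEntry(string $messageXml, bool $stripSource): string
{
    $dom = new DOMDocument();
    $dom->formatOutput = true;
    $entry = $dom->appendChild($dom->createElement('Entry'));

    $message = new DOMDocument();
    $message->preserveWhiteSpace = !$stripSource;   // false = drop indentation nodes on load
    $message->loadXML($messageXml);

    $entry->appendChild($dom->importNode($message->documentElement, true));
    return $dom->saveXML();
}

$messageXml = "<Message>\n  <Body>hi</Body>\n</Message>";
echo buildEntry($messageXml, false);   // <Message> content keeps its original layout
echo buildEntry($messageXml, true);    // whole tree re-indented
```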
Based on the answer by @heavenevil.
This function pretty-prints XML for display in the browser:
function prettyPrintXmlToBrowser(SimpleXMLElement $xml)
{
$domXml = new DOMDocument('1.0');
$domXml->preserveWhiteSpace = false;
$domXml->formatOutput = true;
$domXml->loadXML($xml->asXML());
$xmlString = $domXml->saveXML();
echo nl2br(str_replace(' ', '&nbsp;', htmlspecialchars($xmlString)));
}
I'm trying to clean up some bad HTML using DOMDocument. The HTML has a <div class="article"> element with <br/><br/> instead of </p><p>. I want to regex these into paragraphs, but I can't seem to get my node back into the original document:
//load entire doc
$doc = new DOMDocument();
$doc->loadHTML($htm);
$xpath = new DOMXpath($doc);
//get the article
$article = $xpath->query("//div[@class='article']")->parentNode;
//get as string
$article_htm = $doc->saveXML($article);
//regex the bad markup
$article_htm2 = preg_replace('/<br\/><br\/>/i', '</p><p>', $article_htm);
//create new doc w/ new html string
$doc2 = new DOMDocument();
$doc2->loadHTML($article_htm2);
$xpath2 = new DOMXpath($doc2);
//get the original article node
$article_old = $xpath->query("//div[@class='article']");
//get the new article node
$article_new = $xpath2->query("//div[@class='article']");
//replace original node with new node
$article->replaceChild($article_old, $article_new);
$article_htm_new = $doc->saveXML();
//dump string
var_dump($article_htm_new);
All I get is a 500 Internal Server Error, and I'm not sure what I'm doing wrong.
There are several issues:
$xpath->query() returns a DOMNodeList, not a node. You must select an item from the list, e.g. with ->item(0).
replaceChild() expects the new node as its first argument and the node to replace as its second.
$article_new belongs to another document, so you must first import it into $doc with importNode().
Fixed code:
//load entire doc
$doc = new DOMDocument();
$doc->loadHTML($htm);
$xpath = new DOMXpath($doc);
//get the article
$article = $xpath->query("//div[@class='article']")->item(0)->parentNode;
//get as string
$article_htm = $doc->saveXML($article);
//regex the bad markup
$article_htm2 = preg_replace('/<br\/><br\/>/i', '</p><p>', $article_htm);
//create new doc w/ new html string
$doc2 = new DOMDocument();
$doc2->loadHTML($article_htm2);
$xpath2 = new DOMXpath($doc2);
//get the original article node
$article_old = $xpath->query("//div[@class='article']")->item(0);
//get the new article node
$article_new = $xpath2->query("//div[@class='article']")->item(0);
//import the new node into $doc
$article_new=$doc->importNode($article_new,true);
//replace original node with new node
$article->replaceChild($article_new, $article_old);
$article_htm_new = $doc->saveHTML();
//dump string
var_dump($article_htm_new);
Instead of using two documents, you could create a DocumentFragment from $article_htm2 and use that fragment as the replacement.
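A minimal sketch of the DocumentFragment approach, assuming a small sample `$htm` and that the rewritten markup is well-formed XML (appendXML() requires that). The regex accepts both `<br/>` and `<br>` in case the node is serialized HTML-style:

```php
<?php
// Sketch: rewrite the article's markup as a string, parse it into a fragment owned
// by the same document, and swap it in. No second DOMDocument or importNode() needed.
libxml_use_internal_errors(true);

$htm = '<html><body><div class="article"><p>One<br/><br/>Two</p></div></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($htm);
$xpath = new DOMXPath($doc);

$old = $xpath->query("//div[@class='article']")->item(0);
$markup = $doc->saveXML($old);
$markup = preg_replace('#<br\s*/?>\s*<br\s*/?>#i', '</p><p>', $markup);

$fragment = $doc->createDocumentFragment();
$fragment->appendXML($markup);                   // markup must be well-formed XML
$old->parentNode->replaceChild($fragment, $old); // new node first, old node second

echo $doc->saveHTML();
```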
I think it should be
$article->parentNode->replaceChild($article_new, $article_old);
since the article is not a child of itself (replaceChild() takes the new node first, then the node being replaced).
A web service returns XML in this format:
<string>
  <NewDataSet>
    <DealBlotter>
      <CustomerReference>161403239</CustomerReference>
      <Symbol>EUR/USD</Symbol>
      <BuySell>S</BuySell>
      <ContractValue>-100000</ContractValue>
      <Price>1.35070</Price>
      <CounterValue>-135070</CounterValue>
      <TradeDate>2011-01-20 22:05:21.690</TradeDate>
      <ConfirmationNumber>78967117</ConfirmationNumber>
      <Status>C</Status>
      <lTID>111913820</lTID>
    </DealBlotter>
  </NewDataSet>
</string>
I'm using cURL to fetch this and then:
$xml = simplexml_load_string($result);
$dom = new DOMDOcument();
// Load your XML as a string
$dom->loadXML($xml);
// Create new XPath object
$xpath = new DOMXpath($dom);
$res = $xpath->query("/NewDataSet/DealBlotter");
foreach($res as $node)
{
print "i went inside foreach";
$custref = ($node->getElementsByTagName("CustomerReference")->item(0)->nodeValue);
print $custref;
$ccy = ($node->getElementsByTagName("Symbol")->item(0)->nodeValue);
print $ccy;
$type = ($node->getElementsByTagName("BuySell")->item(0)->nodeValue);
$lots = ($node->getElementsByTagName("ContractValue")->item(0)->nodeValue);
$price = ($node->getElementsByTagName("Price")->item(0)->nodeValue);
$confnumber = ($node->getElementsByTagName("ConfirmationNumber")->item(0)->nodeValue);
$status = ($node->getElementsByTagName("Status")->item(0)->nodeValue);
$ltid = ($node->getElementsByTagName("lTID")->item(0)->nodeValue);
$time = ($node->getElementsByTagName("TradeDate")->item(0)->nodeValue);
}
But nothing gets printed except the dummy statement.
Using $res = $xpath->query("/string/NewDataSet/DealBlotter"); did not help either. Also, print_r($res); only shows that it's a DOMNodeList object.
Doing this also does not print anything:
$objDOM = new DOMDocument();
$objDOM->load($result);
$note = $objDOM->getElementsByTagName("DealBlotter");
foreach( $note as $value )
{
print "hello";
$tasks = $value->getElementsByTagName("Symbol");
$task = (string)$tasks->item(0)->nodeValue;
$details = $value->getElementsByTagName("Status");
$detail = (string)$details->item(0)->nodeValue;
print "$task :: $detail <br>";
}
There are a few problems.
The first is how you're loading the XML. Get rid of the SimpleXML line: it's not needed, and it's messing things up. Instead, just do $dom->loadXML($result);. There's no reason to load SimpleXML first if you're going to pass the result directly into DOMDocument.
The second is your query. A leading / starts at the document root, and each / after that selects direct children only, so the first name in the path must be the root element. Either add the root element to the path:
$res = $xpath->query("/string/NewDataSet/DealBlotter");
Or change the leading slash to //, which matches descendants at any depth:
$res = $xpath->query("//NewDataSet/DealBlotter");
Finally, a var_dump() of $res isn't going to tell you much. Instead, I like to use var_dump($res->length), since it tells you how many matches there are rather than just that it's a DOMNodeList (which you already know).
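Putting the fixes together, a minimal sketch: `$result` is a trimmed, hard-coded stand-in for the cURL response, and it assumes the elements are not in a default namespace (if the real response declares one, the names would need a prefix registered with DOMXPath::registerNamespace()). Relative XPath evaluated against each DealBlotter node replaces the getElementsByTagName() chains:

```php
<?php
// Sketch: load the response string straight into DOMDocument, query from the real
// root (<string>), and check ->length before reading values.
$result = '<string><NewDataSet><DealBlotter>'
        . '<CustomerReference>161403239</CustomerReference>'
        . '<Symbol>EUR/USD</Symbol><Status>C</Status>'
        . '</DealBlotter></NewDataSet></string>';

$dom = new DOMDocument();
$dom->loadXML($result);
$xpath = new DOMXPath($dom);

$res = $xpath->query('/string/NewDataSet/DealBlotter');
var_dump($res->length);   // int(1)

$deals = [];
foreach ($res as $node) {
    // string(...) returns the text of the child element, or '' if it is missing.
    $deals[] = [
        'custref' => $xpath->evaluate('string(CustomerReference)', $node),
        'symbol'  => $xpath->evaluate('string(Symbol)', $node),
        'status'  => $xpath->evaluate('string(Status)', $node),
    ];
}
print_r($deals);
```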