Building XML with PHP - Performance in mind - php

When building XML in PHP, is it quicker to build a string, then echo out the string or to use the XML functions that php gives you? Currently I'm doing the following:
UPDATED to better code snippet:
$searchParam = mysql_real_escape_string($_POST['s']);
$search = new Search($searchParam);
if($search->retResult()>0){
$xmlRes = $search->buildXML();
}
else {
$xmlRes = '<status>no results</status>';
}
$xml = "<?xml version=\"1.0\"?>";
$xml.="<results>";
$xml.=$xmlRes;
$xml.="</results>";
header ("content-type: text/xml");
header ("content-length: ".strlen($xml));
echo($xml);
class Search {
private $num;
private $q;
function __construct($s){
$this->q = mysql_query('select * from foo_table where name = "'.$s.'"');
$this->num = mysql_num_rows($this->q);
}
function retResult(){
return $this->num;
}
function buildXML(){
$xml ='<status>success</status>';
$xml.='<items>';
while($row = mysql_fetch_object($this->q)){
$xml.='<item>';
$desTag = '<info><![CDATA[';
foreach ($row as $key => $current){
if($key=='fob'){
//do something with current
$b = mysql_query('select blah from dddd where id ='.$current);
$a = mysql_fetch_array($b);
$xml.='<'.$key.'>'.$a['blah'].'</'.$key.'>';
}
else if($key =='this' || $key=='that'){
$desTag .= ' '.$current;
}
else {
$xml.='<'.$key.'>'.$current.'</'.$key.'>';
}
}
$desTag.= ']]></info>';
$xml.=$desTag;
$xml.='</item>';
}
$xml.='</items>';
return $xml;
}
}
Is there a faster way of building the xml? I get to about 2000 items and it starts to slow down..
Thanks in advance!

Use the xml parser. Remember when you concatenate a string, you have to reallocate the WHOLE STRING on every concatenation.
For small strings, string concatenation is probably faster, but in your case definitely use the XML functions.

I see that you're making no attempt to escape the text before concatenating it. That means that sooner or later you're going to generate something that is almost-but-not-quite XML, and which will be rejected by any conforming parser.
Use a library (XMLWriter is probably more performant than others, but I haven't done XML with PHP).
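For illustration, here is a minimal sketch of what the XMLWriter approach could look like. The $rows array and its values are made up to stand in for the database result; the element names follow the question. It assumes the array keys are valid XML element names.

```php
<?php
// Hypothetical rows standing in for the database result set.
$rows = [
    ['name' => 'Tom & Jerry', 'price' => '9.99'],
    ['name' => 'a < b',       'price' => '1.50'],
];

$writer = new XMLWriter();
$writer->openMemory();
$writer->startDocument('1.0', 'UTF-8');
$writer->startElement('results');
$writer->writeElement('status', 'success');
$writer->startElement('items');
foreach ($rows as $row) {
    $writer->startElement('item');
    foreach ($row as $key => $value) {
        // writeElement() escapes special characters in the text content.
        $writer->writeElement($key, $value);
    }
    $writer->endElement(); // closes item
}
$writer->endElement(); // closes items
$writer->endElement(); // closes results
$writer->endDocument();

$xml = $writer->outputMemory();
```

The escaping is done by the library here, so values containing "&" or "<" still end up as well-formed XML.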

You have a SQL query inside of a loop, which is usually quite a bad idea. Even if each query takes half a millisecond to complete, it's still a whole second just to execute those 2000 queries.
What you need to do is post the two queries in a new question so that someone can show you how to turn them into a single query using a JOIN.
Database stuff usually largely outweighs any kind of micro-optimization. Whether you use string concatenation or XMLWriter doesn't matter when you're executing several thousand queries.
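As a rough idea of what the single query might look like: the table and column names below come from the question's snippet, but the join condition (foo_table.fob matching dddd.id) is only a guess from the inner query, and PDO is swapped in for the mysql_* functions. Treat it as a sketch, not the actual schema.

```php
<?php
// Sketch only: the join condition is an assumption based on the question's
// inner query ("select blah from dddd where id = <fob>").
const ITEMS_WITH_BLAH_SQL =
    'SELECT f.*, d.blah
       FROM foo_table AS f
       LEFT JOIN dddd AS d ON d.id = f.fob
      WHERE f.name = :name';

// $pdo is assumed to be an already-connected PDO instance.
function fetchItemsWithBlah(PDO $pdo, string $name): array
{
    $stmt = $pdo->prepare(ITEMS_WITH_BLAH_SQL);
    $stmt->execute([':name' => $name]); // bound parameter, no manual escaping
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

With this, the loop that builds the XML gets blah along with every row and never has to query the database again.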

Try to echo in each iteration (put the echo $xml just before the end of the while loop, and reset $xml at the beginning of each iteration). That should be quicker.
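A minimal sketch of that idea, assuming the rows arrive as associative arrays with valid element names as keys. The helper names itemXml() and streamItems() are made up for the example.

```php
<?php
// Hypothetical helper: turns one row into an <item> fragment, escaping values.
function itemXml(array $row): string
{
    $xml = '<item>';
    foreach ($row as $key => $value) {
        $xml .= '<' . $key . '>'
              . htmlspecialchars((string) $value, ENT_XML1 | ENT_QUOTES)
              . '</' . $key . '>';
    }
    return $xml . '</item>';
}

// Echoes the document piece by piece instead of building one big string.
function streamItems(iterable $rows): void
{
    echo '<?xml version="1.0"?><results><items>';
    foreach ($rows as $row) {
        echo itemXml($row); // emit each item as soon as it is built
    }
    echo '</items></results>';
}
```

Note that the question sends a content-length header computed from strlen($xml); when streaming like this the total length is not known up front, so that header would have to go.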

That code snippet doesn't make a lot of sense, please post some actual code, reduced for readability.
A faster version of the code you posted would be
$xml = '';
while ($row = mysql_fetch_row($result))
{
$xml .= '<items><test>' . implode('</test><test>', $row) . '</test></items>';
}
In general, using mysql_fetch_object() is slightly slower than the other options.
Perhaps what you were trying to do was something like this:
$xml = '<items>';
while ($row = mysql_fetch_assoc($result))
{
$xml .= '<item>';
foreach ($row as $k => $v)
{
$xml .= '<' . $k . '>' . htmlspecialchars($v) . '</' . $k . '>';
}
$xml .= '</item>';
}
$xml .= '</items>';
As mentioned elsewhere, you have to escape the values unless you're 100% sure they will never contain a special character such as "<", ">" or "&". This also applies to $k, actually. In that kind of script, it is also generally more performant to use XML attributes instead of nodes.
With so little information about your goal, all we can do is micro-optimize. Perhaps you should work on the principles behind your script instead? For instance, do you really have to generate 2000 items? Can you cache the result, or anything else? Can you paginate the result?
Quick word about using PHP's XML libraries, XMLWriter will generally be slightly slower than using string manipulation. Everything else will be noticeably slower than strings.

Related

PHP json_encode slow for big arrays

I have some json_encode related issues: I need to use a big array (several hundred thousand items), each item with a very simple structure (one key, one string value).
json_decode works OK, but as soon as I want to json_encode it, it's awfully slow.
Since I fully control the data here, I tried to write a super simple JSON encoder, and it's fast.
I'm quite surprised, since my encoding function is crude and does not have any of the inner PHP optimizations that are quite certainly present in json_encode.
Any idea what the problem might be?
I put my encoder function below for reference.
Thanks
protected function simpleJsonEncoder($data) {
if (is_array($data)) {
$is_indexed = (array_values($data) === $data);
$tab_str = [];
if ($is_indexed) {
foreach($data as $item) {
$str_item = $this->simpleJsonEncoder($item);
$tab_str[] = $str_item;
}
$result = '[' . implode(',', $tab_str) . ']';
}
else {
foreach($data as $index => $item) {
$str_item = $this->simpleJsonEncoder($item);
$tab_str[] = '"' . htmlspecialchars($index, ENT_QUOTES) . '":' . $str_item;
}
$result = '{' . implode(',', $tab_str) . '}';
}
}
else {
$result = '"' . htmlspecialchars($data, ENT_QUOTES) . '"';
}
return $result;
}
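One thing worth keeping in mind about the encoder above: htmlspecialchars() applies HTML escaping, not JSON escaping, so its output is not interchangeable with json_encode(). The small sketch below, with a made-up sample value, shows the difference.

```php
<?php
$value = 'Tom & "Jerry"';

// JSON escaping: the quotes are backslash-escaped, the ampersand is left alone.
$viaJsonEncode = json_encode($value);

// HTML escaping wrapped in quotes, as the simple encoder does for scalars.
$viaHtmlEscape = '"' . htmlspecialchars($value, ENT_QUOTES) . '"';

var_dump($viaJsonEncode === $viaHtmlEscape);       // the two strings differ
var_dump(json_decode($viaJsonEncode) === $value);  // round-trips
var_dump(json_decode($viaHtmlEscape) === $value);  // does not round-trip
```

So any comparison of the two encoders by hash will show different results as soon as the data contains such characters, independent of speed.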
For posterity: I've been trying to find alternatives to json_encode for syncing large amounts of data, serialize is way quicker but returns a string much larger in size obviously. I stumbled across this page. I tried out this function - the md5 hash is different from json_encode and the time difference is negligible. From everything I've read recently, they've optimized json_encode somewhere along the line.
I'm on PHP 7.3 and time is in seconds (big object)
"user_func_hash": "xxx",
"user_func_time": 45.33081293106079,
"json_encode_hash": "yyy",
"json_encode_time": 45.759231090545654

Get XML tags from asXML()

I am parsing through an XML document and getting the values of nested tags using asXML(). This works fine, but I would like to move this data into a MySQL database whose columns match the tags of the file. So essentially how do I get the tags that asXML() is pulling text from?
This way I can eventually do something like: INSERT INTO db.table (TheXMLTag) VALUES ('XMLTagText');
This is my code as of now:
$xml = simplexml_load_file($target_file) or die ("Error: Cannot create object");
foreach ($xml->Message->SettlementReport->SettlementData as $main ){
$value = $main->asXML();
echo '<pre>'; echo $value; echo '</pre>';
}
foreach ($xml->Message->SettlementReport->Order as $main ){
$value = $main->asXML();
echo '<pre>'; echo $value; echo '</pre>';
}
This is what my file looks like to give you an idea (So essentially how do I get the tags within [SettlementData], [0], [Fulfillment], [Item], etc. ?):
I would like to move this data into a MySQL database whose columns match the tags of the file.
Your problem is twofold.
The first part of the problem is to do the introspection on the database structure. That is, obtain all table names and obtain the column names of these. Most modern databases offer this functionality, so does MySQL. In MySQL those are the INFORMATION_SCHEMA Tables. You can query them as if those were normal database tables. I generally recommend PDO for that in PHP, mysqli is naturally doing the job perfectly as well.
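As a sketch of that introspection step with PDO: the function name tableColumns() is made up, and $pdo is assumed to be an already-connected instance pointing at your MySQL server.

```php
<?php
// Sketch: list a table's column names via INFORMATION_SCHEMA (MySQL).
// $pdo is assumed to be an already-connected PDO instance.
function tableColumns(PDO $pdo, string $schema, string $table): array
{
    $stmt = $pdo->prepare(
        'SELECT COLUMN_NAME
           FROM INFORMATION_SCHEMA.COLUMNS
          WHERE TABLE_SCHEMA = :schema AND TABLE_NAME = :table
          ORDER BY ORDINAL_POSITION'
    );
    $stmt->execute([':schema' => $schema, ':table' => $table]);
    // FETCH_COLUMN returns a flat array with the first column of each row.
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}
```

The returned list can then serve as a whitelist when matching XML element names against column names.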
The second part is parsing the XML data and mapping its data onto the database tables (you use SimpleXMLElement for that in your question, so I refer to it specifically). For that you first of all need to find out how you would like to map the data from the XML onto the database. An XML file does not have a 2D structure like a relational database table; it has a tree structure.
For example (if I read your question right) you identify Message->SettlementReport->SettlementData as the first "table". For that specific example it is easy as the <SettlementData> only has child-elements that could represent a column name (the element name) and value (the text-content). For that it is easy:
header('Content-Type: text/plain; charset=utf-8');
$table = $xml->Message->SettlementReport->SettlementData;
foreach ($table as $name => $value ) {
echo $name, ': ', $value, "\n";
}
As you can see, specifying the key assignment in the foreach clause will give you the element name with SimpleXMLElement. Alternatively, the SimpleXMLElement::getName() method does the same (just an example which does the same just with slightly different code):
header('Content-Type: text/plain; charset=utf-8');
$table = $xml->Message->SettlementReport->SettlementData;
foreach ($table as $value) {
$name = $value->getName();
echo $name, ': ', $value, "\n";
}
In this case you benefit from the fact that the Iterator provided in the foreach of the SimpleXMLElement you access via $xml->...->SettlementData traverses all child-elements.
A more generic concept would be Xpath here. So bear with me presenting you a third example which - again - does a similar output:
header('Content-Type: text/plain; charset=utf-8');
$rows = $xml->xpath('/*/Message/SettlementReport/SettlementData');
foreach ($rows as $row) {
foreach ($row as $column) {
$name = $column->getName();
$value = (string) $column;
echo $name, ': ', $value, "\n";
}
}
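To connect this back to the INSERT you have in mind, here is a hedged sketch that collects element names and values and builds a prepared-statement string from them. The sample XML, the table name settlement_data and the column whitelist are all invented for the example, since the real file is not shown.

```php
<?php
// Hypothetical sample; the real file's child elements are not shown in the question.
$xml = simplexml_load_string(
    '<Root><Message><SettlementReport><SettlementData>'
    . '<SettlementID>123</SettlementID><Currency>USD</Currency><Unknown>x</Unknown>'
    . '</SettlementData></SettlementReport></Message></Root>'
);

// Column names that exist in the target table (for instance from INFORMATION_SCHEMA).
$allowedColumns = ['SettlementID', 'Currency'];

$row = [];
foreach ($xml->Message->SettlementReport->SettlementData->children() as $column) {
    $name = $column->getName();
    // Only accept element names that are known columns; never trust them blindly.
    if (in_array($name, $allowedColumns, true)) {
        $row[$name] = (string) $column;
    }
}

$sql = sprintf(
    'INSERT INTO settlement_data (%s) VALUES (%s)',
    implode(', ', array_keys($row)),
    implode(', ', array_fill(0, count($row), '?'))
);
// With a connected PDO instance: $pdo->prepare($sql)->execute(array_values($row));
```

Checking the names against a whitelist matters because column names cannot be bound as parameters; only the values can.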
However, as mentioned earlier, mapping a tree structure (N-depth) onto a 2D structure (a database table) might not always be that straightforward.
If you're looking for what an outcome could be (there will most often be data loss or data duplication), a more complex PHP example is given in previous Q&As:
How excel reads XML file?
PHP XML to dynamic table
Please note: since such mappings can be complex in their own right, the questions and answers inherit that complexity. This first of all means they might not be easy to read, but also, perhaps more importantly, they might just not apply to your question. They are merely there to broaden your view and provide some examples for certain scenarios.
I hope this is helpful, please provide any feedback in form of comments below. Your problem might or might not be less problematic, so this hopefully helps you to decide how/where to go on.
I tried with SimpleXML but it skips text data. However, using the Document Object Model extension works.
This returns an array where each element is an array with 2 keys: tag and text, returned in the order in which the tree is walked.
<?php
// recursive, pass by reference (spare memory ? meh...)
// can skip non tag elements (removes lots of empty elements)
function tagData(&$node, $skipNonTag=false) {
// get function name, allows to rename function without too much work
$self = __FUNCTION__;
// init
$out = array();
$innerXML = '';
// get document
$doc = $node->nodeName == '#document'
? $node
: $node->ownerDocument;
// current tag
// we use a reference to innerXML to fill it later to keep the tree order
// without ref, this would go after the loop, children would appear first
// not really important but we never know
if(!(mb_substr($node->nodeName,0,1) == '#' && $skipNonTag)) {
$out[] = array(
'tag' => $node->nodeName,
'text' => &$innerXML,
);
}
// build current innerXML and process children
// check for children
if($node->hasChildNodes()) {
// process children
foreach($node->childNodes as $child) {
// build current innerXML
$innerXML .= $doc->saveXML($child);
// repeat process with children
$out = array_merge($out, $self($child, $skipNonTag));
}
}
// return current + children
return $out;
}
$xml = new DOMDocument();
$xml->load($target_file) or die ("Error: Cannot load xml");
$tags = tagData($xml, true);
//print_r($tags);
?>

Ampersands in database

I am trying to write a php function that goes to my database and pulls a list of URLS and arranges them into an xml structure and creates an xml file.
Problem is, Some of these urls will contain an ampersand that ARE HTML encoded. So, the database is good, but currently, when my function tries to grab these URLS, the script will stop at the ampersands and not finish.
One example link from database:
http://www.mysite.com/myfile.php?select=on&league_id=8&sport=15
function buildXML($con) {
//build xml file
$sql = "SELECT * FROM url_links";
$res = mysql_query($sql,$con);
$gameArray = array ();
while ($row = mysql_fetch_array($res))
{
array_push($row['form_link']);
}
$xml = '<?xml version="1.0" encoding="utf-8"?><channel>';
foreach ($gameArray as $link)
{
$xml .= "<item><link>".$link."</link></item>";
}
$xml .= '</channel>';
file_put_contents('../xml/full_rankings.xml',$xml);
}
mysql_close($con);
session_write_close();
If i need to alter the links in the database, that can be done.
You can use PHP's html_entity_decode() on the $link to convert &amp; back to &.
In your XML, you could also wrap the link in <![CDATA[]]> to allow it to contain the characters.
$xml .= "<item><link><![CDATA[" . html_entity_decode($link) . "]]></link></item>";
UPDATE
Just noticed you're actually not putting anything into the $gameArray:
array_push($row['form_link']);
Try:
$gameArray[] = $row['form_link'];
* @Musa looks to have noticed it first, for due credit.
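An alternative to CDATA, sketched here under the assumption that the stored link really contains &amp; as described in the question: decode it to the raw URL, then escape it exactly once for XML with htmlspecialchars().

```php
<?php
// Link as stored in the database (HTML-encoded ampersands, per the question).
$stored = 'http://www.mysite.com/myfile.php?select=on&amp;league_id=8&amp;sport=15';

// Decode to the raw URL first, then escape exactly once for XML.
$raw = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');
$xml = '<?xml version="1.0" encoding="utf-8"?><channel>'
     . '<item><link>' . htmlspecialchars($raw, ENT_XML1 | ENT_QUOTES, 'UTF-8') . '</link></item>'
     . '</channel>';

// Reading it back with a parser yields the raw URL again.
$parsed = simplexml_load_string($xml);
echo (string) $parsed->item->link, PHP_EOL;
```

Either way, the point is that the file on disk must contain an escaped ampersand (or a CDATA section) while consumers of the XML see the plain "&".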
Look at this line
array_push($row['form_link']);
you never put anything in the $gameArray array, it should be
array_push($gameArray, $row['form_link']);
You need to use htmlspecialchars_decode. It will decode any encoded special characters in string passed to it.
This is most likely what you are looking for:
http://www.php.net/manual/en/function.mysql-real-escape-string.php
Read the documentation, there are examples at the bottom of the page...
'&' in oracleSQL and MySQL are used in queries as a logical operator which is why it is tossing an error.
You may also want to decode the HTML...

Passing Object Operators As Strings (PHP)

I'm building a script that takes the contents of several (~13) news feeds and parses the XML data and inserts the records into a database. Since I don't have any control over the structure of the feeds, I need to tailor an object operator for each one to drill down into the structure in order to get the information I need.
The script works just fine if the target node is one step below the root, but if my string contains a second step, it fails ( 'foo' works, but 'foo->bar' fails). I've tried escaping characters and eval(), but I feel like I'm missing something glaringly obvious. Any help would be greatly appreciated.
// Roadmaps for xml navigation
$roadmap[1] = "deal"; // works
$roadmap[2] = "channel->item"; // fails
$roadmap[3] = "deals->deal";
$roadmap[4] = "resource";
$roadmap[5] = "object";
$roadmap[6] = "product";
$roadmap[8] = "channel->deal";
$roadmap[13] = "channel->item";
$roadmap[20] = "product";
$xmlSource = $xmlURL[$fID];
$xml=simplexml_load_file($xmlSource) or die(mysql_error());
if (!(empty($xml))) {
foreach($xml->$roadmap[$fID] as $div) {
include('./_'.$incName.'/feedVars.php');
include('./_includes/masterCategory.php.inc');
$test = sqlVendors($vendorName);
} // end foreach
echo $vUpdated." records updated.<br>";
echo $vInserted." records Inserted.<br><br>";
} else {
echo $xmlSource." returned an empty set!";
} // END IF empty $xml result
While Fosco's solution will work, it is indeed very dirty.
How about using xpath instead of object properties?
$xml->xpath('deals/deal');
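A small sketch of how the existing roadmap strings could be reused with xpath(): the arrows are simply replaced by slashes. The sample feed below is invented; only the 'channel->item' roadmap comes from the question.

```php
<?php
// Hypothetical feed; only the 'channel->item' roadmap comes from the question.
$xml = simplexml_load_string(
    '<rss><channel><item><title>A</title></item><item><title>B</title></item></channel></rss>'
);

$roadmap = 'channel->item';

// Turn the arrow-separated roadmap into a relative XPath expression.
$path = str_replace('->', '/', $roadmap); // gives channel/item

$titles = [];
foreach ($xml->xpath($path) as $div) {
    $titles[] = (string) $div->title;
}
```

Single-step roadmaps such as "deal" pass through unchanged, so the same loop covers both cases without eval().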
PHP isn't going to magically turn your string which includes -> into a second level search.
Quick and dirty hack...
eval('$node = $xml->' . $roadmap[$fID] . ';');
foreach($node as $div) {

Parsing XML with PHP (simplexml)

Firstly, may I point out that I am a newcomer to all things PHP so apologies if anything here is unclear and I'm afraid the more layman the response the better. I've been having real trouble parsing an xml file in to php to then populate an HTML table for my website. At the moment, I have been able to get the full xml feed in to a string which I can then echo and view and all seems well. I then thought I would be able to use simplexml to pick out specific elements and print their content but have been unable to do this.
The xml feed will be constantly changing (structure remaining the same) and is in compressed format. From various sources I've identified the following commands to get my feed in to the right format within a string although I am still unable to print specific elements. I've tried every combination without any luck and suspect I may be barking up the wrong tree. Could someone please point me in the right direction?!
$file = fopen("compress.zlib://$url", 'r');
$xmlstr = file_get_contents($url);
$xml = new SimpleXMLElement($url,null,true);
foreach($xml as $name) {
echo "{$name->awCat}\r\n";
}
Many, many thanks in advance,
Chris
PS The actual feed
Since no one followed my closevote, I think I can just as well put my own comments as an answer:
First of all, SimpleXml can load URIs directly and it can do so with stream wrappers, so your three calls in the beginning can be shortened to (note that you are not using $file at all)
$merchantProductFeed = new SimpleXMLElement("compress.zlib://$url", null, TRUE);
To get the values you can either use the implicit SimpleXml API and drill down to the wanted elements (like shown multiple times elsewhere on the site):
foreach ($merchantProductFeed->merchant->prod as $prod) {
echo $prod->cat->awCat , PHP_EOL;
}
or you can use an XPath query to get at the wanted elements directly
$xml = new SimpleXMLElement("compress.zlib://$url", null, TRUE);
foreach ($xml->xpath('/merchantProductFeed/merchant/prod/cat/awCat') as $awCat) {
echo $awCat, PHP_EOL;
}
Note that fetching all $awCat elements from the source XML is rather pointless though, because all of them have "Bodycare & Fitness" for value. Of course you can also mix XPath and the implicit API and just fetch the prod elements and then drill down to the various children of them.
Using XPath should be somewhat faster than iterating over the SimpleXmlElement object graph, though it should be noted that the difference is negligible (read 0.000x vs 0.000y) for your feed. Still, if you plan to do more XML work, it pays off to familiarize yourself with XPath, because it's quite powerful. Think of it as SQL for XML.
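Mixing the two could look roughly like this. The inline XML is a made-up, heavily shortened stand-in for the feed that only reuses element names mentioned in this thread.

```php
<?php
// Hypothetical, heavily shortened feed using element names from this thread.
$xml = simplexml_load_string(
    '<merchantProductFeed><merchant>'
    . '<prod><text><name>Soap</name></text><cat><awCat>Bodycare &amp; Fitness</awCat></cat></prod>'
    . '<prod><text><name>Mat</name></text><cat><awCat>Bodycare &amp; Fitness</awCat></cat></prod>'
    . '</merchant></merchantProductFeed>'
);

$rows = [];
// XPath selects the prod elements; property access drills into each one.
foreach ($xml->xpath('/merchantProductFeed/merchant/prod') as $prod) {
    $rows[] = [(string) $prod->text->name, (string) $prod->cat->awCat];
}
```

Each entry in $rows then holds one product's name and category, ready to be printed into a table row.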
For additional examples see
A simple program to CRUD node and node values of xml file and
PHP Manual - SimpleXml Basic Examples
Try this...
$url = "http://datafeed.api.productserve.com/datafeed/download/apikey/58bc4442611e03a13eca07d83607f851/cid/97,98,142,144,146,129,595,539,147,149,613,626,135,163,168,159,169,161,167,170,137,171,548,174,183,178,179,175,172,623,139,614,189,194,141,205,198,206,203,208,199,204,201,61,62,72,73,71,74,75,76,77,78,79,63,80,82,64,83,84,85,65,86,87,88,90,89,91,67,92,94,33,54,53,57,58,52,603,60,56,66,128,130,133,212,207,209,210,211,68,69,213,216,217,218,219,220,221,223,70,224,225,226,227,228,229,4,5,10,11,537,13,19,15,14,18,6,551,20,21,22,23,24,25,26,7,30,29,32,619,34,8,35,618,40,38,42,43,9,45,46,651,47,49,50,634,230,231,538,235,550,240,239,241,556,245,244,242,521,576,575,577,579,281,283,554,285,555,303,304,286,282,287,288,173,193,637,639,640,642,643,644,641,650,177,379,648,181,645,384,387,646,598,611,391,393,647,395,631,602,570,600,405,187,411,412,413,414,415,416,649,418,419,420,99,100,101,107,110,111,113,114,115,116,118,121,122,127,581,624,123,594,125,421,604,599,422,530,434,532,428,474,475,476,477,423,608,437,438,440,441,442,444,446,447,607,424,451,448,453,449,452,450,425,455,457,459,460,456,458,426,616,463,464,465,466,467,427,625,597,473,469,617,470,429,430,615,483,484,485,487,488,529,596,431,432,489,490,361,633,362,366,367,368,371,369,363,372,373,374,377,375,536,535,364,378,380,381,365,383,385,386,390,392,394,396,397,399,402,404,406,407,540,542,544,546,547,246,558,247,252,559,255,248,256,265,259,632,260,261,262,557,249,266,267,268,269,612,251,277,250,272,270,271,273,561,560,347,348,354,350,352,349,355,356,357,358,359,360,586,590,592,588,591,589,328,629,330,338,493,635,495,507,563,564,567,569,568/mid/2891/columns/merchant_id,merchant_name,aw_product_id,merchant_product_id,product_name,description,category_id,category_name,merchant_category,aw_deep_link,aw_image_url,search_price,delivery_cost,merchant_deep_link,merchant_image_url/format/xml/compression/gzip/";
$zd = gzopen($url, "r");
$data = gzread($zd, 1000000);
gzclose($zd);
if ($data !== false) {
$xml = simplexml_load_string($data);
foreach ($xml->merchant->prod as $pr) {
echo $pr->cat->awCat . "<br>";
}
}
<?php
$xmlstr = file_get_contents("compress.zlib://$url");
$xml = simplexml_load_string($xmlstr);
// you can traverse the xml tree however you want
foreach ($xml->merchant->prod as $line) {
// $line->cat->awCat -> you can use this
}
Use print_r($xml) to see the structure of the parsed XML feed.
Then it becomes obvious how you would traverse it:
foreach ($xml->merchant->prod as $prod) {
print $prod->pId;
print $prod->text->name;
print $prod->cat->awCat; # <-- which is what you wanted
print $prod->price->buynow;
}
$url = 'your url here';
$f = gzopen ($url, 'r');
$xml = new SimpleXMLElement (fread ($f, 1000000));
foreach($xml->xpath ('//prod') as $name)
{
echo (string) $name->cat->awCatId, "\r\n";
}
