Sort OPML with simplexml


I've read through a bunch of posts here but I still can't figure out how to sort the data I am reading from an OPML file using simplexml functions. I realize this is kind of a duplicate, but I'm apparently too slow to get this right using abstracted examples.
I have a pretty standard OPML file with contents like:
<?xml version="1.0" encoding="UTF-8"?>
<opml version="1.0">
<!-- OPML generated by Fever -->
<head><title>Fever // Coffee</title></head>
<body>
<outline type="rss" text="I Love Coffee" title="I Love Coffee" xmlUrl="http://en.ilovecoffee.jp/posts/rss" htmlUrl="http://www.ilovecoffee.jp"/>
<outline type="rss" text="Dear Coffee I Love You" title="Dear Coffee I Love You" xmlUrl="http://feeds.feedburner.com/DearCoffeeILoveYou" htmlUrl="http://www.dearcoffeeiloveyou.com"/>
</body>
</opml>
I am generating a Markdown list using the simplest possible code:
foreach ($opml->body->outline as $feed) {
echo '* [' . $feed['title'] . '](' . $feed['htmlUrl'] . ')' . "\n";
}
I simply want to sort the list by the "title" attribute, but I can't get my head around how to do so.
It's my understanding I need to convert the xml object into an array, which I can do with:
$json = json_encode($opml);
$xml_array = json_decode($json,TRUE);
But I can't seem to get things right to sort that array by the "title" attribute.

Rather than trying to blindly convert the whole XML document into an array (which will rarely be a particularly useful array), you should build an array with the items you want:
$feed_list = array();
foreach ($opml->body->outline as $feed) {
$feed_list[] = $feed;
}
You can then use usort() to sort the items by whatever condition you want, e.g. using strcasecmp() to give a case-insensitive alphanumeric order:
usort($feed_list, function($a, $b) {
// $a and $b are SimpleXMLElement objects which are being sorted
return strcasecmp( (string)$a['title'], (string)$b['title'] );
});
You now have a sorted array to pass to your existing display logic:
foreach ( $feed_list as $feed ) {
echo '* [' . $feed['title'] . '](' . $feed['htmlUrl'] . ')' . "\n";
}
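Putting the three steps together, here is a minimal end-to-end sketch. It assumes the OPML is available as a string (shortened here from the sample in the question); with a real file you would load it with simplexml_load_file() instead. The $source and $markdown variable names are only for illustration.

```php
<?php
// Sample OPML, shortened from the question.
$source = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<opml version="1.0">
<head><title>Fever // Coffee</title></head>
<body>
<outline type="rss" text="I Love Coffee" title="I Love Coffee" htmlUrl="http://www.ilovecoffee.jp"/>
<outline type="rss" text="Dear Coffee I Love You" title="Dear Coffee I Love You" htmlUrl="http://www.dearcoffeeiloveyou.com"/>
</body>
</opml>
XML;

$opml = simplexml_load_string($source);

// Copy the <outline> elements into a plain PHP array so usort() can reorder them.
$feed_list = array();
foreach ($opml->body->outline as $feed) {
    $feed_list[] = $feed;
}

// Case-insensitive comparison on the "title" attribute.
usort($feed_list, function ($a, $b) {
    return strcasecmp((string) $a['title'], (string) $b['title']);
});

// Build the Markdown list from the sorted array.
$markdown = '';
foreach ($feed_list as $feed) {
    $markdown .= '* [' . $feed['title'] . '](' . $feed['htmlUrl'] . ')' . "\n";
}
echo $markdown;
```

The elements in $feed_list are still SimpleXMLElement objects, so attribute access with $feed['title'] keeps working after the sort.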

Related

Separating XML products by Category in PHP

I read an XML file in PHP by
$xml=simplexml_load_file("./files/downloaded.xml");
This file contains many products in different categories.
I want to separate them by category.
This is how I view the file after reading it:
print "<pre>";
print_r($xml);
print "</pre>";
I separated the products by the following code
$baby = array();
for($x=0;$x < count($xml->product); $x++)
{
if( preg_match("#Groceries > ([ a-zA-Z0-9]+) >#i",$xml->product[$x]->category,$match) )
{
$match[1] = str_replace(" ", "" , strtolower($match[1]) );
if($match[1] == "baby"){
$baby[] = $xml->product[$x];
}
}
}
The matching products end up in an array named $baby, which I view with the following code:
print "<pre>";
print_r($baby);
print "</pre>";
Now I want to save this as baby.xml and baby.json file but I don't know how to save this.
I tried this code to save these files
$baby_json = simplexml_load_string($baby);
$json = json_encode($baby_json);
file_put_contents("./files/foodcupobard.json",$json);
file_put_contents("./files/foodcupobard.xml",$baby);
But it is not working after separation.
Here is the code which works before separation
$xml=simplexml_load_file("./files/downloaded.xml");
$xml_json = simplexml_load_string($xml);
$json = json_encode($xml_json);
file_put_contents("./files/baby.json",$json);
file_put_contents("./files/baby.xml",$xml);
The reason it is not working after separation is that $baby is then a plain array instead of a SimpleXMLElement object. Can anyone help me save these separated products into baby.xml and baby.json files? Or is there another way to separate these products with PHP code?
Any Help would be much appreciated!
Thanks :)

Manipulate the original SimpleXml object instead of creating an array.
Then save as XML with $xml->asXML($filename);
Use xpath to select <product> nodes with a certain <category>. xpath is like SQL for XML:
/products/product[starts-with(category, 'Foo > Bar >')]
Comments:
expression will return all <product> having a <category> starting with "Foo > Bar >"
[] enclose a condition.
you could use the contains function instead of starts-with
code example:
$products = $xml->xpath("/products/product[starts-with(category, 'Foo > Bar >')]");
BUT $products is a plain array of SimpleXml elements, not a SimpleXml object itself, so calling asXML() on it won't work.
Solution 1:
select all <product> that are NOT in the desired category
delete those from $xml
save with asXML()
code example:
$products = $xml->xpath("/products/product[not(starts-with(category, 'Foo > Bar >'))]");
foreach ($products as $product)
unset($product[0]);
This is the self-reference-technique to delete a node with unset.
show the manipulated XML:
echo $xml->asXML();
see it working: https://eval.in/512140
Solution 2
Go with the original $products and build a new XML string from it.
$newxmlstr = ''; // start with an empty string to avoid an undefined-variable notice
foreach ($products as $product)
$newxmlstr = $newxmlstr . $product->asXML();
$newxmlstr = "<products>" . $newxmlstr . "</products>";
see it working: https://eval.in/512153
I prefer solution 1. XML manipulation by string functions carries the risk of error. If the original XML is really large, solution 2 might be faster.
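To tie solution 1 back to the original goal (an XML file and a JSON file), here is a rough sketch. The sample products and the 'Groceries > Baby >' prefix are made up for illustration; the real document would come from simplexml_load_file(). The results are kept in strings here, but asXML() also accepts a filename, and the JSON string can be written with file_put_contents().

```php
<?php
// Hypothetical sample data standing in for ./files/downloaded.xml.
$xml = simplexml_load_string(
    '<products>'
    . '<product><name>Wipes</name><category>Groceries > Baby > Care</category></product>'
    . '<product><name>Tea</name><category>Groceries > Drinks > Hot</category></product>'
    . '</products>'
);

// Solution 1: select every <product> that is NOT in the wanted category...
$unwanted = $xml->xpath("/products/product[not(starts-with(category, 'Groceries > Baby >'))]");
foreach ($unwanted as $product) {
    unset($product[0]); // ...and remove it from the document (self-reference technique)
}

// $xml is still a SimpleXMLElement, so both conversions work directly on it.
$xmlString  = $xml->asXML();      // or $xml->asXML('baby.xml') to write a file
$jsonString = json_encode($xml);  // or file_put_contents('baby.json', $jsonString)

echo $xmlString, "\n", $jsonString, "\n";
```

One thing to watch with json_encode() on SimpleXML: if only a single <product> remains, it is encoded as an object rather than a one-element list, so the consumer of the JSON has to handle both shapes.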

Get XML tags from asXML()

I am parsing through an XML document and getting the values of nested tags using asXML(). This works fine, but I would like to move this data into a MySQL database whose columns match the tags of the file. So essentially how do I get the tags that asXML() is pulling text from?
This way I can eventually do something like: INSERT INTO db.table (TheXMLTag) VALUES ('XMLTagText');
This is my code as of now:
$xml = simplexml_load_file($target_file) or die ("Error: Cannot create object");
foreach ($xml->Message->SettlementReport->SettlementData as $main ){
$value = $main->asXML();
echo '<pre>'; echo $value; echo '</pre>';
}
foreach ($xml->Message->SettlementReport->Order as $main ){
$value = $main->asXML();
echo '<pre>'; echo $value; echo '</pre>';
}
This is what my file looks like to give you an idea (So essentially how do I get the tags within [SettlementData], [0], [Fulfillment], [Item], etc. ?):

I would like to move this data into a MySQL database whose columns match the tags of the file.
Your problem is twofold.
The first part of the problem is to do the introspection on the database structure. That is, obtain all table names and obtain the column names of these. Most modern databases offer this functionality, so does MySQL. In MySQL those are the INFORMATION_SCHEMA Tables. You can query them as if those were normal database tables. I generally recommend PDO for that in PHP, mysqli is naturally doing the job perfectly as well.
The second part is parsing the XML data and mapping its data onto the database tables (you use SimpleXMLElement for that in your question, so I refer to it specifically). For that you first of all need to decide how you would like to map the data from the XML onto the database. An XML file does not have a 2D structure like a relational database table; it has a tree structure.
For example (if I read your question right) you identify Message->SettlementReport->SettlementData as the first "table". For that specific example it is easy as the <SettlementData> only has child-elements that could represent a column name (the element name) and value (the text-content). For that it is easy:
header('Content-Type: text/plain; charset=utf-8');
$table = $xml->Message->SettlementReport->SettlementData;
foreach ($table as $name => $value ) {
echo $name, ': ', $value, "\n";
}
As you can see, specifying the key assignment in the foreach clause will give you the element name with SimpleXMLElement. Alternatively, the SimpleXMLElement::getName() method does the same (just an example which does the same just with slightly different code):
header('Content-Type: text/plain; charset=utf-8');
$table = $xml->Message->SettlementReport->SettlementData;
foreach ($table as $value) {
$name = $value->getName();
echo $name, ': ', $value, "\n";
}
In this case you benefit from the fact that the Iterator provided in the foreach of the SimpleXMLElement you access via $xml->...->SettlementData traverses all child-elements.
A more generic concept would be Xpath here. So bear with me presenting you a third example which - again - does a similar output:
header('Content-Type: text/plain; charset=utf-8');
$rows = $xml->xpath('/*/Message/SettlementReport/SettlementData');
foreach ($rows as $row) {
foreach ($row as $column) {
$name = $column->getName();
$value = (string) $column;
echo $name, ': ', $value, "\n";
}
}
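Once the element names and values are available, they can be turned into an INSERT statement. The following is only a sketch under assumptions: the sample XML, its element names (SettlementID, TotalAmount) and the table name settlement_data are hypothetical, and the statement is only built, not executed.

```php
<?php
// Hypothetical sample; element names stand in for the real report's tags.
$xml = simplexml_load_string(
    '<Envelope><Message><SettlementReport><SettlementData>'
    . '<SettlementID>123</SettlementID><TotalAmount>45.67</TotalAmount>'
    . '</SettlementData></SettlementReport></Message></Envelope>'
);

$row = $xml->Message->SettlementReport->SettlementData;

$columns = array();
$values  = array();
foreach ($row->children() as $column) {
    $columns[] = $column->getName(); // element name becomes the column name
    $values[]  = (string) $column;   // text content becomes the value
}

// Placeholders keep the values out of the SQL text. The column names are
// inserted as-is, so they should be checked against the real table first
// (for example against INFORMATION_SCHEMA.COLUMNS).
$sql = sprintf(
    'INSERT INTO settlement_data (%s) VALUES (%s)',
    implode(', ', $columns),
    implode(', ', array_fill(0, count($columns), '?'))
);

echo $sql, "\n";
// With an existing PDO connection in $pdo:
// $pdo->prepare($sql)->execute($values);
```

This only covers the flat case where every child element maps to one column; nested elements such as Order/Fulfillment/Item need their own mapping decision.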
However, as mentioned earlier, mapping a tree structure (N-depth) onto a 2D structure (a database table) might not always be that straightforward.
If you're looking for what an outcome could look like (there will most often be data loss or data duplication), a more complex PHP example is given in previous Q&As:
How excel reads XML file?
PHP XML to dynamic table
Please note: Such mappings can be complex on their own, and the questions and answers inherit that complexity. This means they might not be easy to read and, perhaps more importantly, might not apply to your question at all. They are merely meant to broaden your view and provide some examples for certain scenarios.
I hope this is helpful; please provide any feedback in the comments below. Your problem might or might not be less problematic, so this hopefully helps you decide how and where to go on.

I tried with SimpleXML but it skips text data. However, using the Document Object Model extension works.
This returns an array where each element is an array with 2 keys: tag and text, returned in the order in which the tree is walked.
<?php
// recursive, pass by reference (spare memory ? meh...)
// can skip non tag elements (removes lots of empty elements)
function tagData(&$node, $skipNonTag=false) {
// get function name, allows to rename function without too much work
$self = __FUNCTION__;
// init
$out = array();
$innerXML = '';
// get document
$doc = $node->nodeName == '#document'
? $node
: $node->ownerDocument;
// current tag
// we use a reference to innerXML to fill it later to keep the tree order
// without ref, this would go after the loop, children would appear first
// not really important but we never know
if(!(mb_substr($node->nodeName,0,1) == '#' && $skipNonTag)) {
$out[] = array(
'tag' => $node->nodeName,
'text' => &$innerXML,
);
}
// build current innerXML and process children
// check for children
if($node->hasChildNodes()) {
// process children
foreach($node->childNodes as $child) {
// build current innerXML
$innerXML .= $doc->saveXML($child);
// repeat process with children
$out = array_merge($out, $self($child, $skipNonTag));
}
}
// return current + children
return $out;
}
$xml = new DOMDocument();
$xml->load($target_file) or die ("Error: Cannot load xml");
$tags = tagData($xml, true);
//print_r($tags);
?>

Access a node value according to the attribute [duplicate]

This question already has answers here:
SimpleXML: Selecting Elements Which Have A Certain Attribute Value
(2 answers)
Closed 7 years ago.
I'm having trouble accessing a value from an XML file. Specifically, I can't extract the FrenchText. All I ever get is the value from the GermanText. Does anyone have an idea how to get the value of the node containing the French string directly?
Consulting the web, I found this, but it only results in an error
echo $STRUCTURE_ITEM->NAME[@lang="fr"]
Any help would be greatly appreciated.
Here's my code
<?php
$myXMLData = '<?xml version="1.0" encoding="utf-8"?>
<CATALOG>
<CLASSIFICATION>
<STRUCTURE_ITEM>
<KEY>3605</KEY>
<NAME lang="de">GermanText1</NAME>
<NAME lang="fr">FrenchText1</NAME>
<PARENT_ID>worlds</PARENT_ID>
<SORT>0</SORT>
</STRUCTURE_ITEM>
<STRUCTURE_ITEM>
<KEY>3606</KEY>
<NAME lang="de">GermanText2</NAME>
<NAME lang="fr">FrenchText2</NAME>
<PARENT_ID>3605</PARENT_ID>
<SORT>0</SORT>
</STRUCTURE_ITEM>
</CLASSIFICATION>
</CATALOG>';
$xml=simplexml_load_string($myXMLData);
foreach($xml->CLASSIFICATION->children() as $STRUCTURE_ITEM) {
echo $STRUCTURE_ITEM->KEY . ", ";
echo $STRUCTURE_ITEM->NAME . ", ";
echo $STRUCTURE_ITEM->NAME . ", "; // <---- The problem lies here
echo $STRUCTURE_ITEM->PARENT_ID . "<br>";
} ;
?>
EDIT
Thanks for the valuable input. I've tried
$category_name_fr = $xml->xpath('//STRUCTURE_ITEM/NAME[@lang="fr"]');
Now I get the french values, but I get back an array containing all available french text. (FrenchText1, FrenchText2). But I just want the value of the current node.

Let me suggest an approach different from xpath: an inner foreach loop will iterate over all <NAME> and check for the lang attribute:
$lang = "fr";
foreach ($xml->CLASSIFICATION->STRUCTURE_ITEM as $item) {
echo $item->KEY . ", ";
foreach ($item->NAME as $name)
if ($name['lang'] == $lang) echo $name . ", ";
echo $item->PARENT_ID . "<br />";
}
See it in action: https://eval.in/303058
EDIT: If your XML is consistent meaning that de is always first and fr is always second, simply do:
$lang = 1; // 1 = fr, 0 = de
foreach ($xml->CLASSIFICATION->STRUCTURE_ITEM as $item) {
echo $item->KEY . ", ";
echo $item->NAME[$lang] . ", ";
echo $item->PARENT_ID . "<br />";
}
See this in action: https://eval.in/303062

Your xpath expression does something like this:
1. Starts from the document root and proceeds in the direction called 'descendant-or-self' (shortcut is //)
2. From this context it searches in the 'child' direction, which is the default (notice that no direction is given after /), and looks for elements called STRUCTURE_ITEM
3. From the updated context (the STRUCTURE_ITEM elements), xpath once more looks for child elements, this time called NAME
4. Finally, all nodes that fit are checked with an additional test (placed in []). The test says: check whether there is a node in the 'attribute' direction (shortcut @) that is named lang and has the value "fr".
Because multiple nodes fit, all of them are returned together in one array. If you wish to extract the NAME elements separately, try to shrink your initial context (don't start from the document root [//]).
@edit:
this seems helpful: XPath query return multiple nodes on different levels
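Shrinking the context can be sketched like this: call xpath() on each STRUCTURE_ITEM with a relative path, so only the NAME children of the current item are considered. The XML is cut down from the question; the $lines and $french variables are just for illustration.

```php
<?php
$xml = simplexml_load_string(
    '<CATALOG><CLASSIFICATION>'
    . '<STRUCTURE_ITEM><KEY>3605</KEY><NAME lang="de">GermanText1</NAME><NAME lang="fr">FrenchText1</NAME></STRUCTURE_ITEM>'
    . '<STRUCTURE_ITEM><KEY>3606</KEY><NAME lang="de">GermanText2</NAME><NAME lang="fr">FrenchText2</NAME></STRUCTURE_ITEM>'
    . '</CLASSIFICATION></CATALOG>'
);

$lines = array();
foreach ($xml->CLASSIFICATION->STRUCTURE_ITEM as $item) {
    // No leading "/" or "//": the path is evaluated relative to $item.
    $matches = $item->xpath('NAME[@lang="fr"]');
    // xpath() returns an array; take the first match if there is one.
    $french  = $matches ? (string) $matches[0] : '';
    $lines[] = $item->KEY . ', ' . $french;
}
echo implode("\n", $lines), "\n";
```

Unlike the index-based variant, this does not depend on the order of the NAME elements inside each item.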

Breaking foreach at certain string/ Reading through text file and generating XML

I don't know if this is the right way to go about it, but right now I am dealing with a very large text file of membership details. It is really inconsistent though, but typically conforming to this format:
Name
School
Department
Address
Phone
Email
&&^ (indicating the end of the individual record)
What I want to do with this information is read through it, and then format it into XML.
So right now I have a foreach reading through the long file like this:
<?php
$textline = file("asrlist.txt");
foreach($textline as $showline){
echo $showline . "<br>";
}
?>
And that's where I don't know how to continue. Can anybody give me some hints on how I could organize these records into XML?

Here is a straightforward solution using simplexml:
$text = file_get_contents('asrlist.txt'); // read the whole file as one string
$members = explode('&&^', $text); // building array $members
$xml = new SimpleXMLElement('<?xml version="1.0" encoding="UTF-8"?><members></members>');
$fieldnames = array('name','school','department','address','phone','email');
// set $fieldsep to character(s) that separate fields from each other in your textfile
$fieldsep = "\n"; // a guess: one field per line, as in your sample
foreach ($members as $member) {
if (trim($member) === '') continue; // skip the empty chunk after the last &&^
$m = explode($fieldsep, trim($member)); // build array $m; $m[0] would contain "name" etc.
$xmlmember = $xml->addChild('member');
foreach ($m as $key => $data)
// ignore extra lines without a field name; escape "&" and friends for XML
if (isset($fieldnames[$key])) $xmlmember->addChild($fieldnames[$key], htmlspecialchars(trim($data)));
} // foreach $members
$xml->asXML('mymembers.xml');
For reading and parsing the text-file, CSV-related functions could be a good alternative, as mentioned by other users.

To read big files you can use fgetcsv

If && works as a delimiter for records in that file, you could start by replacing it with </member><member>. Prepend <member> to the whole file and append </member> at the end. You will then have something XML-like.
How to replace?
You might find unix tools like sed useful.
sed 's/&&/\<\/member\>\<member\>/' <input.txt >output.xml
You can also accomplish it with PHP, using str_replace():
foreach($textline as $showline){
echo str_replace( '&&', '</member><member>', $showline ) . "<br>";
}

Parsing XML with PHP (simplexml)

Firstly, may I point out that I am a newcomer to all things PHP so apologies if anything here is unclear and I'm afraid the more layman the response the better. I've been having real trouble parsing an xml file in to php to then populate an HTML table for my website. At the moment, I have been able to get the full xml feed in to a string which I can then echo and view and all seems well. I then thought I would be able to use simplexml to pick out specific elements and print their content but have been unable to do this.
The xml feed will be constantly changing (structure remaining the same) and is in compressed format. From various sources I've identified the following commands to get my feed in to the right format within a string although I am still unable to print specific elements. I've tried every combination without any luck and suspect I may be barking up the wrong tree. Could someone please point me in the right direction?!
$file = fopen("compress.zlib://$url", 'r');
$xmlstr = file_get_contents($url);
$xml = new SimpleXMLElement($url,null,true);
foreach($xml as $name) {
echo "{$name->awCat}\r\n";
}
Many, many thanks in advance,
Chris
PS The actual feed

Since no one followed my closevote, I think I can just as well put my own comments as an answer:
First of all, SimpleXml can load URIs directly and it can do so with stream wrappers, so your three calls in the beginning can be shortened to (note that you are not using $file at all)
$merchantProductFeed = new SimpleXMLElement("compress.zlib://$url", null, TRUE);
To get the values you can either use the implicit SimpleXml API and drill down to the wanted elements (like shown multiple times elsewhere on the site):
foreach ($merchantProductFeed->merchant->prod as $prod) {
echo $prod->cat->awCat , PHP_EOL;
}
or you can use an XPath query to get at the wanted elements directly
$xml = new SimpleXMLElement("compress.zlib://$url", null, TRUE);
foreach ($xml->xpath('/merchantProductFeed/merchant/prod/cat/awCat') as $awCat) {
echo $awCat, PHP_EOL;
}
Note that fetching all $awCat elements from the source XML is rather pointless though, because all of them have "Bodycare & Fitness" for value. Of course you can also mix XPath and the implicit API and just fetch the prod elements and then drill down to the various children of them.
Using XPath should be somewhat faster than iterating over the SimpleXmlElement object graph. Though it should be noted that the difference is negligible (read 0.000x vs 0.000y) for your feed. Still, if you plan to do more XML work, it pays off to familiarize yourself with XPath, because it's quite powerful. Think of it as SQL for XML.
For additional examples see
A simple program to CRUD node and node values of xml file and
PHP Manual - SimpleXml Basic Examples
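Mixing the two styles could look roughly like the sketch below. The inline XML is a made-up, cut-down stand-in for the feed (the pId element is assumed from the feed structure), so treat the element names as assumptions; with the real feed you would construct the object from the compress.zlib:// URL as shown above.

```php
<?php
// Hypothetical cut-down feed with the structure used in this answer.
$xml = simplexml_load_string(
    '<merchantProductFeed><merchant>'
    . '<prod><pId>1</pId><cat><awCat>Bodycare &amp; Fitness</awCat></cat></prod>'
    . '<prod><pId>2</pId><cat><awCat>Bodycare &amp; Fitness</awCat></cat></prod>'
    . '</merchant></merchantProductFeed>'
);

$rows = array();
// XPath selects the <prod> elements...
foreach ($xml->xpath('/merchantProductFeed/merchant/prod') as $prod) {
    // ...and the implicit API drills down into each one.
    $rows[] = (string) $prod->pId . ': ' . (string) $prod->cat->awCat;
}
echo implode(PHP_EOL, $rows), PHP_EOL;
```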

Try this...
$url = "http://datafeed.api.productserve.com/datafeed/download/apikey/58bc4442611e03a13eca07d83607f851/cid/97,98,142,144,146,129,595,539,147,149,613,626,135,163,168,159,169,161,167,170,137,171,548,174,183,178,179,175,172,623,139,614,189,194,141,205,198,206,203,208,199,204,201,61,62,72,73,71,74,75,76,77,78,79,63,80,82,64,83,84,85,65,86,87,88,90,89,91,67,92,94,33,54,53,57,58,52,603,60,56,66,128,130,133,212,207,209,210,211,68,69,213,216,217,218,219,220,221,223,70,224,225,226,227,228,229,4,5,10,11,537,13,19,15,14,18,6,551,20,21,22,23,24,25,26,7,30,29,32,619,34,8,35,618,40,38,42,43,9,45,46,651,47,49,50,634,230,231,538,235,550,240,239,241,556,245,244,242,521,576,575,577,579,281,283,554,285,555,303,304,286,282,287,288,173,193,637,639,640,642,643,644,641,650,177,379,648,181,645,384,387,646,598,611,391,393,647,395,631,602,570,600,405,187,411,412,413,414,415,416,649,418,419,420,99,100,101,107,110,111,113,114,115,116,118,121,122,127,581,624,123,594,125,421,604,599,422,530,434,532,428,474,475,476,477,423,608,437,438,440,441,442,444,446,447,607,424,451,448,453,449,452,450,425,455,457,459,460,456,458,426,616,463,464,465,466,467,427,625,597,473,469,617,470,429,430,615,483,484,485,487,488,529,596,431,432,489,490,361,633,362,366,367,368,371,369,363,372,373,374,377,375,536,535,364,378,380,381,365,383,385,386,390,392,394,396,397,399,402,404,406,407,540,542,544,546,547,246,558,247,252,559,255,248,256,265,259,632,260,261,262,557,249,266,267,268,269,612,251,277,250,272,270,271,273,561,560,347,348,354,350,352,349,355,356,357,358,359,360,586,590,592,588,591,589,328,629,330,338,493,635,495,507,563,564,567,569,568/mid/2891/columns/merchant_id,merchant_name,aw_product_id,merchant_product_id,product_name,description,category_id,category_name,merchant_category,aw_deep_link,aw_image_url,search_price,delivery_cost,merchant_deep_link,merchant_image_url/format/xml/compression/gzip/";
$zd = gzopen($url, "r");
$data = gzread($zd, 1000000);
gzclose($zd);
if ($data !== false) {
$xml = simplexml_load_string($data);
foreach ($xml->merchant->prod as $pr) {
echo $pr->cat->awCat . "<br>";
}
}

<?php
$xmlstr = file_get_contents("compress.zlib://$url");
$xml = simplexml_load_string($xmlstr);
// you can traverse the xml tree however you want
foreach ($xml->merchant->prod as $line) {
// $line->cat->awCat -> you can use this
}
more information here

Use print_r($xml) to see the structure of the parsed XML feed.
Then it becomes obvious how you would traverse it:
foreach ($xml->merchant->prod as $prod) {
print $prod->pId;
print $prod->text->name;
print $prod->cat->awCat; # <-- which is what you wanted
print $prod->price->buynow;
}

$url = 'your url here';
$f = gzopen ($url, 'r');
$xml = new SimpleXMLElement (fread ($f, 1000000));
foreach($xml->xpath ('//prod') as $name)
{
echo (string) $name->cat->awCatId, "\r\n";
}
