php and XML range into array

The code below helps me get the WHOLE XML and put it into an array. What I'm wondering is: what would be a good way to get only items 3 to 6, or any arbitrary range, instead of the whole document?
$d = new DOMDocument();
$d->load('http://news.google.com/?output=rss');
foreach ($d->getElementsByTagName('item') as $t) {
    $list = array ( 'title' => $t->getElementsByTagName('title')->item(0)->nodeValue);
    array_push($mt_arr, $list);
}
Thanks

You can use Xpath.
You can either use DOMXPath (with your DOMDocument) or SimpleXMLElement's xpath() method to run an XPath query that returns only the subset of nodes you want.
$xpath = new DOMXPath($d);
$nodes = $xpath->query('/SOME/XPATH/STATEMENT');
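To make the range part concrete, here is a minimal sketch using position() in a predicate. The inline feed is made-up sample data so the snippet runs without network access, and it assumes the items sit under a channel element as in RSS 2.0; adjust the path to your document.

```php
<?php
// Made-up sample feed with 8 items, standing in for the real RSS document.
$xml = '<rss><channel>';
for ($i = 1; $i <= 8; $i++) {
    $xml .= "<item><title>Story $i</title></item>";
}
$xml .= '</channel></rss>';

$d = new DOMDocument();
$d->loadXML($xml);

$xpath = new DOMXPath($d);
// position() is 1-based and counts only the <item> children selected by this step.
$items = $xpath->query('//channel/item[position() >= 3 and position() <= 6]');

$mt_arr = array();
foreach ($items as $t) {
    $mt_arr[] = array(
        'title' => $t->getElementsByTagName('title')->item(0)->nodeValue,
    );
}
print_r($mt_arr);
```

Swapping the two numbers in the predicate for variables gives you any arbitrary range.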

Related

Complex Loop through a complex SimpleXMLElement

I need to save some values from XML.
First step - I get the structure:
$xml = $dom_xml->saveXML();
$xml_ = new \SimpleXMLElement($xml);
dd($xml_);
Here TextFrame has 8 arrays. Each of them has a PathPointType, which has 4 more arrays with 3 attributes each. I need these attributes from each TextFrame.
I can get, for instance, the Anchor value by doing this:
$res = $xml_
    ->Spread
    ->TextFrame
    ->Properties
    ->PathGeometry
    ->GeometryPathType
    ->PathPointArray
    ->PathPointType
    ->attributes();
dd($res['Anchor']);
(BTW: is there a prettier way to get it?)
But the question is: how can I loop through all the arrays and save the values separately for each array?
I assume this needs nested foreach loops, perhaps in conjunction with a for loop?
Or is it better to achieve this using DOMDocument?
As it looks as though you are starting off with DOMDocument (you are using $dom_xml->saveXML() to generate the XML), it may be easier to continue using it; it also has some easy features for getting the details you're after.
Using getElementsByTagName() allows you to get a list of the elements with a specific tag name from a start point, so starting with $dom_xml, get all of the <TextFrame> elements. Then foreach() over this list and using this element as a start point, use getElementsByTagName("PathPointType") to get the nested <PathPointType> elements. At this point you can then use getAttribute("Anchor") for each of the attributes you need from the <PathPointType> elements...
$textFrames = $dom_xml->getElementsByTagName("TextFrame");
foreach ( $textFrames as $frame ) {
    $pathPointTypes = $frame->getElementsByTagName("PathPointType");
    foreach ( $pathPointTypes as $type ) {
        echo $type->getAttribute("Anchor").PHP_EOL;
    }
}
Edit
You can extend the code to build an array of frames and then the anchors within that. This code also stores the anchor in an associative array so that if you add the other attributes, you can add them here (or remove it if you don't need another layer of detail)...
$frames = [];
foreach ( $textFrames as $frame ) {
    $anchors = [];
    $pathPointTypes = $frame->getElementsByTagName("PathPointType");
    foreach ( $pathPointTypes as $type ) {
        $anchors[] = ['Anchor' => $type->getAttribute("Anchor")];
    }
    $frames[] = $anchors;
}
Also if you have some way of identifying the frames, you could create an associative array at that level as well...
$frames[$frameID] = $anchors;
As a complement to the existing answer from Nigel Ren, I thought I'd show how the same loops look with SimpleXML.
Firstly, note that you don't need to convert the XML to a string and back if you want to switch between DOM and SimpleXML for any reason; you can use simplexml_import_dom, which just swaps out the interface:
$sxml = simplexml_import_dom($dom_xml);
Next we need our TextFrame elements; we could either step through the structure explicitly, as you had before:
$textFrames = $sxml->Spread->TextFrame;
Or we could use XPath to search for matching tag names within our current node (. is the current element, and // means "any descendant"):
$textFrames = $sxml->xpath('.//TextFrame');
The first will give you a SimpleXMLElement object, and the second an array, but either way, you can use foreach to go through the matches.
This time we definitely want an XPath expression to get the PathPointType nodes, to avoid all the nested loops through levels we're not that interested in:
foreach ( $textFrames as $frame ) {
    $pathPointTypes = $frame->xpath('.//PathPointType');
    foreach ( $pathPointTypes as $type ) {
        echo $type['Anchor'] . PHP_EOL;
    }
}
Note that you don't need to call $type->attributes(); unless you're dealing with namespaces, all you need to get an attribute is $node['AttributeName']. Beware that attributes in SimpleXML are objects though, so you'll often want to force them to be strings with (string)$node['AttributeName'].
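A quick way to see the object-versus-string difference for yourself is sketched below. The XML fragment and its attribute values are made up for illustration; only the element and attribute names come from the question.

```php
<?php
// Hypothetical fragment shaped like the structure in the question.
$sxml = simplexml_load_string(
    '<Spread><TextFrame><PathPointType Anchor="1 2" LeftDirection="3 4"/></TextFrame></Spread>'
);
$type = $sxml->TextFrame->PathPointType;

$raw    = $type['Anchor'];          // still a SimpleXMLElement object
$anchor = (string) $type['Anchor']; // a plain PHP string

var_dump($raw instanceof SimpleXMLElement);
var_dump($anchor);
```

The cast matters most when you store the value in an array, compare it strictly, or pass it to json_encode().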
To take the final example, you might then have something like this:
$frames = [];
foreach ( $sxml->Spread->TextFrame as $frame ) {
    $anchors = [];
    $pathPointTypes = $frame->xpath('.//PathPointType');
    foreach ( $pathPointTypes as $type ) {
        $anchors[] = ['Anchor' => (string)$type['Anchor']];
    }
    $frames[] = $anchors;
}

Replace a foreach loop with XPath expression using DOMXPath

I want to replace a foreach loop with an XPath expression, but I need the DOMXPath object to return more than one list.
I have the following XML (simplified), and I am using DOMDocument and DOMXPath to iterate over it:
<a:RoomsType>
    <a:Rooms>
        <a:Room>
            <a:RPH>0</a:RPH>
        </a:Room>
        <a:Room>
            <a:RPH>1</a:RPH>
        </a:Room>
        <a:Room>
            <a:RPH>2</a:RPH>
        </a:Room>
        <a:Room>
            <a:RPH>0</a:RPH>
        </a:Room>
    </a:Rooms>
</a:RoomsType>
I want to split the rooms by the RPH number, creating a list of rooms for each RPH number. Currently, I'm using the following code:
//$xpath is a DOMXPath object
$roomsToIterate = $this->xpath->query("//a:RoomsType/a:Rooms/a:Room");
$roomList = array();
foreach ($roomsToIterate as $room) {
    $rphCandidate = $room->getElementsByTagName("RPH")->item(0)->nodeValue;
    if (!isset($roomList[$rphCandidate])) {
        $roomList[$rphCandidate] = array();
    }
    $roomList[$rphCandidate][] = $room;
}
This works for now, but I want to replace the foreach loop with an XPath expression. I can use the expression $rooms = $this->xpath->query("//a:RoomsType/a:Rooms/a:Room[a:RPH='{$rph}']"); with $rph being a number, but how can I do it if I don't know the RPH (it could be anything between 0 and 99)? Is it possible?
In short, is there any way to replace my foreach loop using XPath?
I was thinking about using registerPhpFunctions and a custom function, but I am concerned about the performance of this approach compared with the foreach loop.
An XPath 1.0 expression that selects nodes returns a single list of nodes. It can to some extent flatten an existing structure if you use an axis like descendant or ancestor, but the result is still one list of nodes. It cannot group or aggregate them.
You could fetch a list of nodes with a specific RPH value, but you would need to do this for each value, so the result would be another loop. That means fetching all RPH values, making them unique, iterating over them and executing an XPath expression for each value.
Your current solution is fine.
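For comparison, the per-value alternative described above could be sketched roughly like this. The namespace URI urn:example:rooms is an assumption made up for the example (the simplified XML in the question does not show the real one), and the document is built inline so the snippet runs on its own.

```php
<?php
// Sample document; the namespace URI is hypothetical.
$xml = '<a:RoomsType xmlns:a="urn:example:rooms"><a:Rooms>'
     . '<a:Room><a:RPH>0</a:RPH></a:Room>'
     . '<a:Room><a:RPH>1</a:RPH></a:Room>'
     . '<a:Room><a:RPH>2</a:RPH></a:Room>'
     . '<a:Room><a:RPH>0</a:RPH></a:Room>'
     . '</a:Rooms></a:RoomsType>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('a', 'urn:example:rooms');

// Step 1: collect the distinct RPH values (array keys make them unique).
$rphValues = array();
foreach ($xpath->query('//a:RoomsType/a:Rooms/a:Room/a:RPH') as $rphNode) {
    $rphValues[trim($rphNode->nodeValue)] = true;
}

// Step 2: one XPath query per value, which is still a loop.
$roomList = array();
foreach (array_keys($rphValues) as $rph) {
    $rooms = $xpath->query("//a:RoomsType/a:Rooms/a:Room[a:RPH='{$rph}']");
    $roomList[$rph] = iterator_to_array($rooms);
}
```

This walks the document once per distinct value plus once to collect the values, so the single foreach from the question does less work for the same result.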

Check if XML element is existing in loop

For a website I'm making, I need to get data from an external XML file.
I load the data like this:
$doc = new DOMDocument();
$url = 'http://myurl/results/xml/12345';
if (!$doc->load($url))
{
    echo json_encode(array('error'=> 'error'));
    exit;
}
$xpath = new DOMXPath($doc);
$program_date = $xpath->query('//game/date');
Then I use a foreach loop to get all the data:
if($program_date){
    foreach($program_date as $node){
        $programArray['program_date'][] = $node->nodeValue;
    }
}
The problem I'm having is that sometimes a game doesn't have a date.
When a game doesn't have a date, I just want to store "-" instead of the date from the XML file. My problem is that I don't know how to check whether a date is present in the data.
I tried a lot of approaches, like isset, !isset, else, !empty, empty, with
$teamArray['program_kind'][] = "-";
but nothing works...
Can someone help me with this problem?
Thanks in advance
You need to iterate the game elements, use them as a context and fetch the data with additional XPath expressions.
But one thing first. Use DOMXPath::evaluate(). DOMXPath::query() only supports location paths. It can only return a node list. But XPath expressions can return scalar values, too.
$xpath = new DOMXPath($doc);
$games = $xpath->evaluate('//game');
The result of //game will always be a DOMNodeList object. It can be an empty list, but you can directly iterate it. A condition like if ($games) will always be true.
foreach ($games as $game) {
Now that you have the game element node, you can use it as a context to fetch other data.
$date = $xpath->evaluate('string(date)', $game);
string() casts the first node of the location path into a string. If it cannot match a node, it returns an empty string. Look at normalize-space() if you want to strip surrounding whitespace at the same time.
You can validate if the game element has a date node using count().
$hasDate = $xpath->evaluate('count(date) > 0', $game);
The result of this XPath expression is always a boolean.
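Putting the pieces together, a minimal sketch of the whole loop might look like this. The inline XML is made-up sample data standing in for the remote file, and the root element name results is an assumption.

```php
<?php
// Made-up sample: the second game has no <date>, the third has padded whitespace.
$doc = new DOMDocument();
$doc->loadXML(
    '<results>'
    . '<game><date>2015-03-01</date></game>'
    . '<game/>'
    . '<game><date> 2015-03-08 </date></game>'
    . '</results>'
);

$xpath = new DOMXPath($doc);
$programArray = array('program_date' => array());

foreach ($xpath->evaluate('//game') as $game) {
    // normalize-space() returns '' when there is no <date> child or it is blank.
    $date = $xpath->evaluate('normalize-space(date)', $game);
    $programArray['program_date'][] = ($date !== '') ? $date : '-';
}

print_r($programArray);
```

Because the loop runs once per game rather than once per date, the array keeps one entry per game and the positions stay aligned with any other per-game arrays you build.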

DOMXpath - Get href attribute and text value of an a element

So I have a HTML string like this:
<td class="name">
Some Name
</td>
<td class="name">
Some Name2
</td>
Using XPath I'm able to get value of href attribute using this Xpath query:
$domXpath = new \DOMXPath($this->domPage);
$hrefs = $domXpath->query("//td[@class='name']/a/@href");
foreach($hrefs as $href) {...}
And it's even easier to get the text value, like this:
// Xpath auto. strips any html tags so we are
// left with clean text value of a element
$domXpath = new \DOMXPath($this->domPage);
$names = $domXpath->query("//td[@class='name']");
foreach($names as $name) {...}
Now I'm curious to know: how can I combine those two queries to get both values with only one query (if something like that is even possible)?
Fetch
//td[@class='name']/a
and then pluck the text with nodeValue and the attribute with getAttribute('href').
Apart from that, you can combine XPath queries with the union operator | so you can use
//td[@class='name']/a/@href|//td[@class='name']
as well.
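If you go the union route, keep in mind that you get one flat list mixing attribute nodes and element nodes, so you have to tell them apart yourself. A rough sketch, with made-up href values (the question's HTML does not show the real links) and well-formed markup loaded as XML for simplicity:

```php
<?php
// Hypothetical markup; the href values are invented for the example.
$dom = new DOMDocument();
$dom->loadXML(
    '<table><tr>'
    . '<td class="name"><a href="/one">Some Name</a></td>'
    . '<td class="name"><a href="/two">Some Name2</a></td>'
    . '</tr></table>'
);
$domXpath = new DOMXPath($dom);
$nodes = $domXpath->query("//td[@class='name']/a/@href|//td[@class='name']");

$hrefs = array();
$names = array();
foreach ($nodes as $node) {
    if ($node instanceof DOMAttr) {
        $hrefs[] = $node->value;             // the href attribute node
    } else {
        $names[] = trim($node->textContent); // the td element's text
    }
}
print_r($hrefs);
print_r($names);
```

Since the two lists then have to be matched up by position, fetching the a elements and reading both values from each one is usually the simpler option.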
To reduce the code to a single loop, try:
$anchors = $domXpath->query("//td[@class='name']/a");
foreach($anchors as $a)
{
    print $a->nodeValue." - ".$a->getAttribute("href")."<br/>";
}
As per above :) Too slow ..
Simplest way, evaluate is for this task!
The simplest way to obtain a value is by evaluate() method:
$xp = new DOMXPath($dom);
$v = $xp->evaluate("string(/etc[1]/@stringValue)");
Note: it is important to limit the XPath result to one item (the first a in this case) and to cast the value with string(), round(), etc.
So, in a set of multiple items, using your foreach code,
$names = $domXpath->query("//td[@class='name']");
foreach($names as $contextNode) {
    $text = $domXpath->evaluate("string(./a[1])",$contextNode);
    $href = $domXpath->evaluate("string(./a[1]/@href)",$contextNode);
}
PS: this example only illustrates evaluate(). When the information already exists on the node, use whatever offers the best performance, such as the methods getAttribute(), saveXML(), etc. and the properties $nodeValue, $textContent, etc. supplied by DOMNode. See @Gordon's answer for this particular problem. An XPath subquery (with a context node) is good for complex cases, or to simplify your code by avoiding hasChildNodes() checks plus loops over $childNodes, with no significant gain in performance.

iterating over unknown XML structure with PHP (DOM)

I want to write a function that parses a (theoretically) unknown XML data structure into an equivalent PHP array.
Here is my sample XML:
<?xml version="1.0" encoding="UTF-8"?>
<content>
<title>Sample Text</title>
<introduction>
<paragraph>This is some rudimentary text</paragraph>
</introduction>
<description>
<paragraph>Here is some more text</paragraph>
<paragraph>Even MORE text</paragraph>
<sub_section>
<sub_para>This is a smaller, sub paragraph</sub_para>
<sub_para>This is another smaller, sub paragraph</sub_para>
</sub_section>
</description>
</content>
I modified this DOM iterating function from devarticles:
$data = 'path/to/xmldoc.xml';
$xmlDoc = new DOMDocument(); #create a DOM element
$xmlDoc->load( $data ); #load data into the element
$xmlRoot = $xmlDoc->firstChild; #establish root
function xml2array($node)
{
    if ($node->hasChildNodes())
    {
        $subNodes = $node->childNodes;
        foreach ($subNodes as $subNode)
        {
            #filter node types
            if (($subNode->nodeType != 3) || (($subNode->nodeType == 3)))
            {
                $arraydata[$subNode->nodeName]=$subNode->nodeValue;
            }
            xml2array($subNode);
        }
    }
    return $arraydata;
}
//The getNodesInfo function call
$xmlarray = xml2array($xmlRoot);
// print the output - with a little bit of formatting for ease of use...
foreach($xmlarray as $xkey)
{
echo"$xkey<br/><br/>";
}
Now, because of the way I'm passing the elements to the array I'm overwriting any elements that share a node name (since I ideally want to give the keys the same names as their originating nodes). My recursion isn't great... However, even if I empty the brackets - the second tier of nodes are still coming in as values on the first tier (see the text of the description node).
Anyone got any ideas how I can better construct this?
You might be better off just snagging some code off the net
http://www.bin-co.com/php/scripts/xml2array/
/**
* xml2array() will convert the given XML text to an array in the XML structure.
* Link: http://www.bin-co.com/php/scripts/xml2array/
* Arguments : $contents - The XML text
* $get_attributes - 1 or 0. If this is 1 the function will get the attributes as well as the tag values - this results in a different array structure in the return value.
* $priority - Can be 'tag' or 'attribute'. This changes the structure of the resulting array. For 'tag', the tags are given more importance.
* Return: The parsed XML in an array form. Use print_r() to see the resulting array structure.
* Examples: $array = xml2array(file_get_contents('feed.xml'));
* $array = xml2array(file_get_contents('feed.xml'), 1, 'attribute');
*/
function xml2array($contents, $get_attributes=1, $priority = 'tag') {
You might be interested in SimpleXML or xml_parse_into_struct.
$arraydata is neither passed to subsequent calls to xml2array() nor is the return value used, so yes "My recursion isn't great..." is true ;-)
To append a new element to an existing array you can use empty square brackets, $arr[] = 123; $arr[$x][] = 123;
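Combining both points (use the return value of the recursive call, and append with empty brackets), a minimal sketch could look like the following. It is only one possible shape for the output array: attributes are ignored, mixed content is not handled, and every node name maps to a list so that repeated names are kept.

```php
<?php
// A sketch only: ignores attributes and mixed content.
function xml2array(DOMNode $node)
{
    $result = array();
    foreach ($node->childNodes as $child) {
        if ($child->nodeType !== XML_ELEMENT_NODE) {
            continue; // skip text nodes, comments and whitespace
        }
        // Does this child contain further elements?
        $hasElementChild = false;
        foreach ($child->childNodes as $grandChild) {
            if ($grandChild->nodeType === XML_ELEMENT_NODE) {
                $hasElementChild = true;
                break;
            }
        }
        // Leaves become strings; everything else recurses and uses the return value.
        $value = $hasElementChild ? xml2array($child) : $child->nodeValue;
        // Empty brackets append, so repeated node names do not overwrite each other.
        $result[$child->nodeName][] = $value;
    }
    return $result;
}

$xmlDoc = new DOMDocument();
$xmlDoc->loadXML(
    '<content><title>Sample Text</title><description>'
    . '<paragraph>Here is some more text</paragraph>'
    . '<paragraph>Even MORE text</paragraph>'
    . '</description></content>'
);
$xmlarray = xml2array($xmlDoc->documentElement);
print_r($xmlarray);
```

Starting from documentElement rather than firstChild avoids picking up a comment or processing instruction that happens to come before the root element.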
You might also want to check out XML Unserializer
http://pear.php.net/package/XML_Serializer/redirected
