I have an XML file that I need to parse to extract values. Below is a snippet of the XML:
<?xml version="1.0"?>
<mobile>
<userInfo>
</userInfo>
<CATALOG>
<s0>
<SUB0>
<DESCR>Paranormal Studies</DESCR>
<SUBJECT>147</SUBJECT>
</SUB0>
</s0>
<sA>
<SUB0>
<DESCR>Accounting</DESCR>
<SUBJECT>ACCT</SUBJECT>
</SUB0>
<SUB1>
<DESCR>Accounting</DESCR>
<SUBJECT>ACCTG</SUBJECT>
</SUB1>
<SUB2>
<DESCR>Anatomy</DESCR>
<SUBJECT>ANATOMY</SUBJECT>
</SUB2>
<SUB3>
<DESCR>Anthropology</DESCR>
<SUBJECT>ANTHRO</SUBJECT>
</SUB3>
<SUB4>
<DESCR>Art</DESCR>
<SUBJECT>ART</SUBJECT>
</SUB4>
<SUB5>
<DESCR>Art History</DESCR>
<SUBJECT>ARTHIST</SUBJECT>
</SUB5>
</sA>
So I need to grab all the child elements of <sA>, and there are more elements named <sB> and so on.
But I don't know how to get all of the child elements of <sA>, <sB>, etc.
How about this:
$xmlstr = LoadTheXMLFromSomewhere();
$xml = simplexml_load_string($xmlstr); // a plain function call, so no "new"
$result = $xml->xpath('//sA');
foreach ($result as $node){
//do something with node
}
PHP has a nice class for accessing XML, called SimpleXML for a reason. Consider it seriously if your code only needs to read part of the XML (that is, query it). Also consider writing those queries in XPath, which is usually the most concise way to do it.
Notice that I did the example with sA nodes only, but you can configure your code for other node types really easily.
Hope I can help!
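To get the children of every group (<s0>, <sA>, <sB>, ...) in one pass, the XPath idea above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the inline XML is a shortened, hypothetical copy of the question's snippet with the closing tags added, and the $subjects array layout is my own choice.

```php
<?php
// Hypothetical sample mirroring the question's structure (closing tags added).
$xmlstr = <<<'XML'
<mobile>
  <CATALOG>
    <s0>
      <SUB0><DESCR>Paranormal Studies</DESCR><SUBJECT>147</SUBJECT></SUB0>
    </s0>
    <sA>
      <SUB0><DESCR>Accounting</DESCR><SUBJECT>ACCT</SUBJECT></SUB0>
      <SUB1><DESCR>Anatomy</DESCR><SUBJECT>ANATOMY</SUBJECT></SUB1>
    </sA>
  </CATALOG>
</mobile>
XML;

$xml = simplexml_load_string($xmlstr);

$subjects = [];
// '/mobile/CATALOG/*' matches every group element (s0, sA, sB, ...).
foreach ($xml->xpath('/mobile/CATALOG/*') as $group) {
    // children() yields the SUB0, SUB1, ... elements of this group.
    foreach ($group->children() as $sub) {
        $subjects[$group->getName()][] = [
            'descr'   => (string) $sub->DESCR,
            'subject' => (string) $sub->SUBJECT,
        ];
    }
}

print_r($subjects);
```

Casting to (string) matters here, because SimpleXML hands back element objects rather than plain strings.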
You should look into simplexml_load_string(), as I'm pretty sure it would make your life a lot easier. It returns a SimpleXMLElement that you can use like so:
$xml = simplexml_load_string($xmlString); // $xmlString holds your huge XML string
// $xml already represents the root element, so start at CATALOG
foreach ($xml->CATALOG->sA->children() as $value){
// do things with sA children
}
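$xml = new DOMDocument();
$xml->load('path_to_xml');
// documentElement is the root element (<mobile> in the snippet)
$root = $xml->documentElement;
// getElementsByTagName() returns a node list; item(0) picks its first entry
$catalog = $root->getElementsByTagName('CATALOG')->item(0);
$nodes = $catalog->getElementsByTagName('sA')->item(0)->childNodes;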
I'm trying to read an RSS feed from Flickr but it has some nodes which are not readable by Simple XML (media:thumbnail, flickr:profile, and so on).
How do I get round this? My head hurts when I look at the documentation for the DOM. So I'd like to avoid it as I don't want to learn.
I'm trying to get the thumbnail by the way.
The solution is explained in this nice article. You need the children() method for accessing XML elements which contain a namespace. This code snippet is quoted from the article:
$feed = simplexml_load_file('http://www.sitepoint.com/recent.rdf');
foreach ($feed->item as $item) {
$ns_dc = $item->children('http://purl.org/dc/elements/1.1/');
echo $ns_dc->date;
}
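For the thumbnail specifically, the value sits in an attribute of the namespaced element rather than in its text. Here is a minimal sketch, assuming the feed uses the Media RSS namespace (http://search.yahoo.com/mrss/) for the media prefix; the inline feed and its URL are made up for illustration.

```php
<?php
// Hypothetical, trimmed-down feed; a real Flickr feed has many more nodes.
$rss = <<<'XML'
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Sample photo</title>
      <media:thumbnail url="http://example.com/thumb.jpg" height="75" width="75"/>
    </item>
  </channel>
</rss>
XML;

$feed = simplexml_load_string($rss);

$thumbnails = [];
foreach ($feed->channel->item as $item) {
    // Switch to the media namespace to reach <media:thumbnail>.
    $media = $item->children('http://search.yahoo.com/mrss/');
    // The url attribute itself has no namespace, so ask for the
    // un-namespaced attributes explicitly via attributes().
    $thumbnails[] = (string) $media->thumbnail->attributes()->url;
}

print_r($thumbnails);
```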
With the latest version, you can now reference colon nodes with curly brackets.
$item->{'itunes:duration'}
You're dealing with a namespace? I think you need to use the ->children method.
$ns_dc = $item->children('http://namespace.org/');
Can you provide a snippet with the xml declaration?
There is an even simpler way in PHP to access namespaced XML nodes without declaring a namespace.
To get the value of <su:authorEmail> from the following source:
<item>
<title>My important article</title>
<pubDate>Mon, 29 Feb 2017 00:00:00 +0000</pubDate>
<link>https://myxmlsource.com/32984</link>
<guid>https://myxmlsource.com/32984</guid>
<author>Blogs, Jo</author>
<su:departments>
<su:department>Human Affairs</su:department>
</su:departments>
<su:authorHash>4f329b923419b3cb2c654d615e22588c</su:authorHash>
<su:authorEmail>hIwW14tLc+4l/oo7agmRrcjwe531u+mO/3IG3xe5jMg=</su:authorEmail>
<dc:identifier>/32984/Download/0032984-11042.docx</dc:identifier>
<dc:format>Journal article</dc:format>
<dc:creator>Blogs, Jo</dc:creator>
<slash:comments>0</slash:comments>
</item>
Use the following code:
$rss = new DOMDocument();
$rss->load('https://myxmlsource.com/rss/xml');
$nodes = $rss->getElementsByTagName('item');
foreach ($nodes as $node) {
$title = $node->getElementsByTagName('title')->item(0)->nodeValue;
$author = $node->getElementsByTagName('author')->item(0)->nodeValue;
$authorHash = $node->getElementsByTagName('authorHash')->item(0)->nodeValue;
$department = $node->getElementsByTagName('department')->item(0)->nodeValue;
$email = decryptEmail($node->getElementsByTagName('authorEmail')->item(0)->nodeValue);
}
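If two prefixes ever use the same local name, matching on the name alone becomes ambiguous. A namespace-aware variant could look like the sketch below. Note that the namespace URI for su is not shown in the source above, so http://example.com/ns/su is purely a placeholder, and the inline XML is a shortened stand-in for the real feed.

```php
<?php
// Placeholder URI: the real one is whatever the feed declares for xmlns:su.
$suNamespace = 'http://example.com/ns/su';

$xml = <<<'XML'
<rss xmlns:su="http://example.com/ns/su">
  <channel>
    <item>
      <title>My important article</title>
      <su:authorHash>4f32</su:authorHash>
    </item>
  </channel>
</rss>
XML;

$rss = new DOMDocument();
$rss->loadXML($xml);

$hashes = [];
foreach ($rss->getElementsByTagName('item') as $item) {
    // Match on namespace URI plus local name instead of the bare name.
    $list = $item->getElementsByTagNameNS($suNamespace, 'authorHash');
    // Guard against items that lack the element.
    if ($list->length > 0) {
        $hashes[] = $list->item(0)->nodeValue;
    }
}

print_r($hashes);
```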
I want to create dynamic tags in XML using PHP,
like this: <wsse:Username>fqsuser01</wsse:Username>
The main thing is that the prefix of the tag ("wsse" in this example) should be changeable.
What do I need to do to create this XML file with PHP?
Thanks,
For this purpose you can use XMLWriter, for example (another option is SimpleXML). Both are part of the PHP core, so no third-party libraries are needed. wsse is a namespace prefix; you can read more about namespaces here.
Here is some example code:
<?php
//create a new xmlwriter object
$xml = new XMLWriter();
//using memory for string output
$xml->openMemory();
//set the indentation to true (if false all the xml will be written on one line)
$xml->setIndent(true);
//create the document tag, you can specify the version and encoding here
$xml->startDocument();
//Create an element
$xml->startElement("root");
//Write to the element
$xml->writeElement("r1:id", "1");
$xml->writeElement("r2:id", "2");
$xml->writeElement("r3:id", "3");
$xml->endElement(); //End the element
//output the xml
echo $xml->outputMemory();
?>
Result:
<?xml version="1.0"?>
<root>
<r1:id>1</r1:id>
<r2:id>2</r2:id>
<r3:id>3</r3:id>
</root>
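The result above uses prefixes that are never declared, which a strict parser may complain about. XMLWriter also has namespace-aware methods, so a variant with a changeable prefix might look like this sketch. The function name buildUsername, the wrapper element Security and the URI http://example.com/wsse are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: builds a small document with a configurable prefix.
function buildUsername(string $prefix, string $namespaceUri, string $username): string
{
    $xml = new XMLWriter();
    $xml->openMemory();
    $xml->startDocument('1.0', 'UTF-8');
    // startElementNs() writes <prefix:Security> and declares xmlns:prefix on it.
    $xml->startElementNs($prefix, 'Security', $namespaceUri);
    // A null URI reuses the prefix without declaring the namespace again.
    $xml->writeElementNs($prefix, 'Username', null, $username);
    $xml->endElement();
    $xml->endDocument();

    return $xml->outputMemory();
}

$doc = buildUsername('wsse', 'http://example.com/wsse', 'fqsuser01');
echo $doc;
```

Changing the first argument is all it takes to emit a different prefix.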
You could use a string and convert it to XML using simplexml_load_string(). The string must be well formed.
<?php
$usernames= array(
'username01',
'username02',
'username03'
);
$xml_string = '<wsse:Usernames>';
foreach($usernames as $username ){
$xml_string .= "<wsse:Username>$username</wsse:Username>";
}
$xml_string .= '</wsse:Usernames>';
// The closing "XML;" marker must start at the very beginning of its own line
$note = <<<XML
$xml_string
XML;
$xml=simplexml_load_string($note);
?>
If you wanted to be able to change the namespace prefix on each XML element, you would do something very similar to what is shown above: form the string with dynamic prefixes.
The closing heredoc marker (XML;) is picky: before PHP 7.3 it must start in the first column and be followed by nothing but the semicolon. See https://www.w3schools.com/php/func_simplexml_load_string.asp for an example that you can copy & paste.
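Two things are easy to miss when building XML from strings: the prefix should be bound to a namespace URI with an xmlns attribute, and values need escaping, or a username containing & or < breaks the document. Here is a rough sketch of both, assuming a placeholder URI (http://example.com/wsse) and made-up usernames.

```php
<?php
$prefix = 'wsse';
$namespaceUri = 'http://example.com/wsse'; // placeholder, not the real URI
$usernames = ['username01', 'Tom & Jerry'];

// Declare the prefix once on the wrapper element.
$xmlString = sprintf('<%s:Usernames xmlns:%s="%s">', $prefix, $prefix, $namespaceUri);
foreach ($usernames as $username) {
    // Escape &, <, > and quotes so the string stays well formed.
    $escaped = htmlspecialchars($username, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    $xmlString .= sprintf('<%1$s:Username>%2$s</%1$s:Username>', $prefix, $escaped);
}
$xmlString .= sprintf('</%s:Usernames>', $prefix);

$xml = simplexml_load_string($xmlString);

// Read the values back through the namespace to confirm the round trip.
$names = [];
foreach ($xml->children($namespaceUri)->Username as $node) {
    $names[] = (string) $node;
}

print_r($names);
```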
I've got an xml like this:
<father>
<son>Text with <b>HTML</b>.</son>
</father>
I'm using simplexml_load_string to parse it into a SimpleXMLElement. Then I read my node like this:
$xml->father->son->__toString(); //output: "Text with .", but expected "Text with <b>HTML</b>."
I need to handle simple HTML such as
<b>text</b> or <br/> inside the XML, which is sent by many users.
My problem is that I can't just ask them to use CDATA, because they won't be able to handle it properly and they are already used to doing without it.
Also, if possible, I don't want the file to be edited, because the information needs to be exactly what the user sent.
simplexml_load_string simply drops the HTML element and anything inside it.
How can I keep the information?
SOLUTION
To handle the problem I used asXml(), as explained by @ThW:
$tmp = $xml->father->son->asXml(); //<son>Text with <b>HTML</b>.</son>
I just added a preg_match to strip the wrapping <son> tags.
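For anyone who wants to see that solution end to end, here is a minimal sketch. It uses preg_replace rather than preg_match (my substitution, since removing the wrapper is a replacement), and it assumes <father> is the root element, so the path is $xml->son.

```php
<?php
$xml = simplexml_load_string('<father><son>Text with <b>HTML</b>.</son></father>');

// asXML() on an element serializes the element itself, markup included.
$outer = $xml->son->asXML();

// Strip the opening <son ...> tag and the closing </son> tag.
$inner = preg_replace('~^<son\b[^>]*>|</son>$~', '', $outer);

echo $inner;
```

This works for a simple wrapper like the one above; a self-closing <son/> would need separate handling.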
A CDATA section is a character node, just like a text node, but it does less encoding/decoding. This is mostly a downside, actually. On the upside, something in a CDATA section might be more readable for a human, and it allows for some backwards compatibility in special cases. (Think HTML script tags.)
For an XML API they are nearly the same. Here is a small DOM example (SimpleXML abstracts too much).
$document = new DOMDocument();
$father = $document->appendChild(
$document->createElement('father')
);
$son = $father->appendChild(
$document->createElement('son')
);
$son->appendChild(
$document->createTextNode('With <b>HTML</b><br>It\'s so nice.')
);
$son = $father->appendChild(
$document->createElement('son')
);
$son->appendChild(
$document->createCDataSection('With <b>HTML</b><br>It\'s so nice.')
);
$document->formatOutput = TRUE;
echo $document->saveXml();
Output:
<?xml version="1.0"?>
<father>
<son>With &lt;b&gt;HTML&lt;/b&gt;&lt;br&gt;It's so nice.</son>
<son><![CDATA[With <b>HTML</b><br>It's so nice.]]></son>
</father>
As you can see, they are serialized very differently, but from the API's point of view they are basically interchangeable. If you're using an XML parser, the value you get back should be the same in both cases.
So the first possibility is just letting the HTML fragment be stored in a character node. It is just a string value for the outer XML document itself.
The other way would be using XHTML. XHTML is XML-compatible HTML. You can mix and match different XML formats, so you could add the XHTML fragment as part of the outer XML.
That seems to be what you're receiving. But SimpleXML has some problems with mixed nodes, so here is an example of how you can read it with DOM.
$xml = <<<'XML'
<father>
<son>With <b>HTML</b><br/>It's so nice.</son>
</father>
XML;
$document = new DOMDocument();
$document->loadXml($xml);
$xpath = new DOMXpath($document);
$result = '';
foreach ($xpath->evaluate('/father/son[1]/node()') as $child) {
$result .= $document->saveXml($child);
}
echo $result;
Output:
With <b>HTML</b><br/>It's so nice.
Basically you need to save each child of the son element as XML.
SimpleXML is based on the same DOM library internally. That allows you to convert a SimpleXMLElement into a DOM node. From there you can again save each child as XML.
$father = new SimpleXMLElement($xml);
$sonNode = dom_import_simplexml($father->son);
$document = $sonNode->ownerDocument;
$result = '';
foreach ($sonNode->childNodes as $child) {
$result .= $document->saveXml($child);
}
echo $result;
I have different XML files, and for each one I renamed the individual tags so that every file uses the same tag names. That was easy because each function was customized for its XML file.
But instead of writing 7 new functions, one per XML file, I now want to check whether an XML file has a specified child or not. Because if I write:
foreach ($items as $item) {
$node = dom_import_simplexml($item);
$title = $node->getElementsByTagName('title')->item(0)->textContent;
$price = $node->getElementsByTagName('price')->item(0)->textContent;
$url = $node->getElementsByTagName('url')->item(0)->textContent;
$publisher = $node->getElementsByTagName('publisher')->item(0)->textContent;
$category = $node->getElementsByTagName('category')->item(0)->textContent;
$platform = $node->getElementsByTagName('platform')->item(0)->textContent;
}
I sometimes get: PHP Notice: Trying to get property of non-object in ...
For example, take two different XML files. One contains publisher, category and platform, the other does not:
XML 1:
<products>
<product>
<desc>This is a Test</desc>
<price>11.69</price>
<price_base>12.99</price_base>
<publisher>Stackoverflow</publisher>
<category>PHP</category>
</packshot>
<title>Check if child exists? - SimpleXML (PHP)</title>
<url>http://stackoverflow.com/questions/ask</url>
</product>
</products>
XML 2:
<products>
<product>
<image></image>
<title>Questions</title>
<price>23,90</price>
<url>google.de</url>
<platform>Stackoverflow</platform>
</product>
</products>
You see, sometimes an XML file contains publisher, category and platform, and sometimes it doesn't. It could also be that not every node of an XML file contains all of these child elements, as in the first example!
So for every node of an XML file I need to check individually whether it contains publisher, category and/or platform.
How can I do that with SimpleXML?
I thought about a switch statement, but first I need to know which children each node contains.
EDIT:
Maybe I found a solution. Is that a solution or not?
if($node->getElementsByTagName('platform')->item(0)){
echo $node->getElementsByTagName('platform')->item(0)->textContent . "\n";
}
Greetings and Thank You!
One way to rome... (working example)
$xml = "<products>
<product>
<desc>This is a Test</desc>
<price>11.69</price>
<price_base>12.99</price_base>
<publisher>Stackoverflow</publisher>
<category>PHP</category>
<title>Check if child exists? - SimpleXML (PHP)</title>
<url>http://stackoverflow.com/questions/ask</url>
</product>
</products>";
$xml = simplexml_load_string($xml);
#set fields to look for
foreach(['desc','title','price','publisher','category','platform','image','whatever'] as $path){
#get the first node
$result = $xml->xpath("product/{$path}[1]");
#validate and set
$coll[$path] = $result?(string)$result[0]:null;
#if you need here a local variable do (2 x $)
${$path} = $coll[$path];
}
#here i do array_filter() to remove all NULL entries
print_r(array_filter($coll));
#if local variables needed do
extract($coll);#this creates $desc, $price
Note: </packshot> is an invalid node and has been removed here.
XPath syntax: https://www.w3schools.com/xmL/xpath_syntax.asp
Firstly, you're over-complicating your code by switching from SimpleXML to DOM with dom_import_simplexml. The things you're doing with DOM can be done in much shorter code with SimpleXML.
Instead of this:
$node = dom_import_simplexml($item);
$title = $node->getElementsByTagName('title')->item(0)->textContent;
you can just use:
$title = (string)$item->title[0];
or even just:
$title = (string)$item->title;
To understand why this works, take a look at the SimpleXML examples in the manual.
Armed with that knowledge, you'll be amazed at how simple it is to see if a child exists or not:
if ( isset($item->title) ) {
$title = (string)$item->title;
} else {
echo "There is no title!";
}
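The same isset() check extends naturally to a list of optional fields. Below is a small sketch: the helper name readProduct and the choice of null for missing children are my assumptions, and the inline XML is a shortened version of "XML 2" from the question.

```php
<?php
// Hypothetical helper: returns one value per requested field,
// with null for children the product does not have.
function readProduct(SimpleXMLElement $item, array $fields): array
{
    $row = [];
    foreach ($fields as $field) {
        // isset() on a SimpleXML property is true only if the child exists.
        $row[$field] = isset($item->{$field}) ? (string) $item->{$field} : null;
    }
    return $row;
}

$xml = simplexml_load_string(
    '<products><product><title>Questions</title><price>23,90</price>'
    . '<platform>Stackoverflow</platform></product></products>'
);

$rows = [];
foreach ($xml->product as $item) {
    $rows[] = readProduct($item, ['title', 'price', 'publisher', 'category', 'platform']);
}

print_r($rows);
```

One function then covers all the feeds, whichever optional children each product happens to carry.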
I have an XML file loaded into a DOM document,
I wish to iterate through all 'foo' tags, getting values from every tag below it. I know I can get values via
$element = $dom->getElementsByTagName('foo')->item(0);
foreach($element->childNodes as $node){
$data[$node->nodeName] = $node->nodeValue;
}
However, what I'm trying to do, is from an XML like,
<stuff>
<foo>
<bar></bar>
<value/>
<pub></pub>
</foo>
<foo>
<bar></bar>
<pub></pub>
</foo>
<foo>
<bar></bar>
<pub></pub>
</foo>
</stuff>
iterate over every foo tag, and get specific bar or pub, and get values from there.
Now, how do I iterate over foo so that I can still access specific child nodes by name?
Not tested, but what about:
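$elements = $dom->getElementsByTagName('foo');
$data = array();
foreach($elements as $node){
foreach($node->childNodes as $child) {
// skip the whitespace text nodes that sit between the child elements
if ($child->nodeType === XML_ELEMENT_NODE) {
$data[] = array($child->nodeName => $child->nodeValue);
}
}
}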
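To keep access to specific children by name, it may be handier to build one array per foo, keyed by child name. Here is a sketch along those lines; the element values (a, p1, ...) are invented, since the XML in the question is empty.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML(
    '<stuff>'
    . '<foo><bar>a</bar><value/><pub>p1</pub></foo>'
    . '<foo><bar>b</bar><pub>p2</pub></foo>'
    . '</stuff>'
);

$data = [];
foreach ($dom->getElementsByTagName('foo') as $foo) {
    $row = [];
    foreach ($foo->childNodes as $child) {
        // Only element nodes carry a useful name; text nodes are skipped.
        if ($child->nodeType === XML_ELEMENT_NODE) {
            $row[$child->nodeName] = $child->nodeValue;
        }
    }
    // One entry per <foo>, so $data[1]['pub'] addresses a specific child.
    $data[] = $row;
}

print_r($data);
```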
It's generally much better to use XPath to query a document than it is to write code that depends on knowledge of the document's structure. There are two reasons. First, there's a lot less code to test and debug. Second, if the document's structure changes it's a lot easier to change an XPath query than it is to change a bunch of code.
Of course, you have to learn XPath, but (most of) XPath isn't rocket science.
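In PHP 5 and later, the DOM extension performs XPath queries through the DOMXPath class and its query() and evaluate() methods (xpath_eval belonged to the old PHP 4 DOM XML extension). It's documented here, and the user notes include some pretty good examples.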
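In current PHP versions the DOMXPath class runs such queries. As a rough illustration for the foo/bar/pub document from the question (with invented values, since the original elements are empty):

```php
<?php
$dom = new DOMDocument();
$dom->loadXML(
    '<stuff>'
    . '<foo><bar>a</bar><value/><pub>p1</pub></foo>'
    . '<foo><bar>b</bar><pub>p2</pub></foo>'
    . '</stuff>'
);

$xpath = new DOMXPath($dom);

$data = [];
// query() returns every <foo>; each one then serves as the context node.
foreach ($xpath->query('/stuff/foo') as $foo) {
    $data[] = [
        // string(...) yields '' when the child is missing, so no null checks.
        'bar' => $xpath->evaluate('string(bar)', $foo),
        'pub' => $xpath->evaluate('string(pub)', $foo),
    ];
}

print_r($data);
```

If the structure changes later, only the two path expressions need to be touched.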
Here's another (lazy) way to do it.
$data[][$node->nodeName] = $node->nodeValue;
With FluidXML you can query and iterate over XML very easily.
$data = [];
$store_child = function($i, $fooChild) use (&$data) {
$data[] = [ $fooChild->nodeName => $fooChild->nodeValue ];
};
fluidxml($dom)->query('//foo/*')->each($store_child);
https://github.com/servo-php/fluidxml