How to retrieve all the data inside all the elements in xml? - php

I am having an issue getting xml data in php
My xml is fairly complicated and there are several nested children in the tag.
xml
<?xml version="1.0" encoding="UTF-8"?>
<book id="5">
<title id="76">test title</title>
<figure id="77"></figure>
<ch id="id78">
<aa id="80"><emph>content1</emph></aa>
<ob id="id_84" page-num="697" extra-info="4"><emph type="bold">opportunity.</emph></ob>
<ob id="id_85" page-num="697" extra-info="5"><emph type="bold">test data.</emph></ob>
<para id="id_86" page-num="697">2008.</para>
<body>
..more elements
<content>more contents..
</content>
</body>
</ch>
My code:
//I need to load many different xml files.
$xml_file = simplexml_load_file($filename);
foreach ($xml_file->children() as $child){
echo $child->getName().':'. $child."<br>";
}
The code above only displays
book, title, figure, ch, but not the elements inside the ch tag. How do I display all the elements inside each tag? Any tips? Thanks a lot!

Two things:
You need to match your <ob> </objective> tags.
Your foreach needs to be recursive. Check whether each item in your foreach has children, then recursively loop over those child elements. I'd recommend a separate function that calls itself for this.
Example:
$xml_file = simplexml_load_file($filename);
parseXML($xml_file->children());
function parseXML($xml_children)
{
foreach ($xml_children as $child){
echo $child->getName().':'. $child."<br>";
if ($child->count() > 0)
{
parseXML($child->children());
}
}
}

You need to make a recursive call:
parseAllXml($xml_file);
function parseAllXml($xmlcontent)
{
foreach($xmlcontent->children() as $child)
{
echo $child->getName().':'. $child."<br>";
$is_further_child = ( count($child->children()) >0 )?true:false;
if( $is_further_child )
{
parseAllXml($child);
}
}
}
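As an alternative to writing the recursion by hand, here is a minimal sketch that asks XPath for every element at any depth. The sample document is a trimmed, hypothetical version of the question's XML, and the $names array exists only to make the result easy to inspect.

```php
<?php
// Hypothetical sample, trimmed from the XML in the question.
$xml_string = <<<XML
<book id="5">
  <title id="76">test title</title>
  <ch id="id78">
    <aa id="80"><emph>content1</emph></aa>
    <para id="id_86" page-num="697">2008.</para>
  </ch>
</book>
XML;

$xml = simplexml_load_string($xml_string);

// '//*' matches every element in the document, at any depth, in document order.
$names = [];
foreach ($xml->xpath('//*') as $element) {
    $names[] = $element->getName();
    // trim() drops the whitespace-only text of container elements such as <ch>.
    echo $element->getName() . ': ' . trim((string) $element) . "<br>\n";
}
```

This visits the root element too, so skip the first match if you only want descendants.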

Related

Convert xml to html with emphasis in php

I have an XML file that contains the following content.
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE article>
<article
xmlns="http://docbook.org/ns/docbook" version="5.0"
xmlns:xlink="http://www.w3.org/1999/xlink" >
<para>
This is an <emphasis role="strong">test</emphasis> sentence.
</para>
</article>
When I use
$xml_data = simplexml_load_string($filedata);
foreach ($xml_data['para'] as $data) {
echo $data;
}
I got "This is an sentence." But I want to get "This is an <b>test</b> sentence." as the result.
Instead of simplexml_load_string I'd recommend DOMDocument, but that is just a personal preference. A naïve implementation might just do a string replacement and that might totally work for you. However, since you've provided actual XML that even includes a NS I'm going to try to keep this as XML-centric as possible, while skipping XPath which could possibly be used, too.
This code loads the XML and walks every node. If it finds a <para> element, it walks all of the children of that node looking for an <emphasis> node, and if it finds one it replaces it with a new node that is a <b> element.
The replacement process is a little complex, however, because if we just use nodeValue we might lose any HTML that lives in there, so we need to walk the children of the <emphasis> node and clone those into our replacement node.
Because the source document has a NS, however, we also need to remove that from our final HTML. Since we are going from XML to HTML, I think that is a safe use of str_replace without going too crazy in XML land for that.
The code should have enough comments to make sense, hopefully.
<?php
$filedata = <<<EOT
<?xml version="1.0" encoding="utf-8" ?>
<article
xmlns="http://docbook.org/ns/docbook" version="5.0"
xmlns:xlink="http://www.w3.org/1999/xlink" >
<para>
This is an <emphasis role="strong">hello <em>world</em></emphasis> sentence.
</para>
</article>
EOT;
$dom = new DOMDocument();
$dom->loadXML($filedata);
foreach($dom->documentElement->childNodes as $node){
if(XML_ELEMENT_NODE === $node->nodeType && 'para' === $node->nodeName){
// Replace any emphasis elements
foreach($node->childNodes as $childNode) {
if(XML_ELEMENT_NODE === $childNode->nodeType && 'emphasis' === $childNode->nodeName){
// This is arguably the most "correct" way to replace, just in case
// there's extra nodes inside. A cheaper way would be to not loop
// and just use the nodeValue however you might lose some HTML.
$newNode = $dom->createElement('b');
foreach($childNode->childNodes as $grandChild){
$newNode->appendChild($grandChild->cloneNode(true));
}
$childNode->replaceWith($newNode);
}
}
// Build our output
$output = '';
foreach($node->childNodes as $childNode) {
$output .= $dom->saveHTML($childNode);
}
// The provided XML has a namespace, and when cloning nodes that NS comes
// along. Since we are going from regular XML to irregular HTML I think
// a string replacement is best.
$output = str_replace(' xmlns="http://docbook.org/ns/docbook"', '', $output);
echo $output;
}
}
Demo here: https://3v4l.org/04Tc3#v8.0.23
NOTE: PHP 8 added replaceWith. If you are using PHP 7 or earlier, you'd call replaceChild on the parent node instead and adjust the code a bit.
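For PHP 7, a minimal sketch of the same replacement using replaceChild might look like this. It assumes a simplified, namespace-free <para> as input and uses saveXML for the output to keep the example short. Copying the child list with iterator_to_array is my own precaution, not part of the answer's code.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<para>This is an <emphasis role="strong">hello <em>world</em></emphasis> sentence.</para>');

// Copy the live child list into an array so replacing nodes cannot upset the loop.
foreach (iterator_to_array($dom->documentElement->childNodes) as $childNode) {
    if (XML_ELEMENT_NODE === $childNode->nodeType && 'emphasis' === $childNode->nodeName) {
        $newNode = $dom->createElement('b');
        foreach ($childNode->childNodes as $grandChild) {
            $newNode->appendChild($grandChild->cloneNode(true));
        }
        // replaceChild(new, old) is called on the parent, unlike replaceWith.
        $childNode->parentNode->replaceChild($newNode, $childNode);
    }
}

$output = $dom->saveXML($dom->documentElement);
echo $output;
```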
What if you have the following XML?
<entry>
<para>This is the first text</para>
<emphasis>This is the second text</emphasis>
<para>This is the <emphasis>next</emphasis> text</para>
<itemizedlist>
<listitem>
<para>
This is an paragraph inside a list
</para>
</listitem>
<itemizedlist>
<listitem>
<para>
This is an paragraph inside a list inside a list
</para>
</listitem>
</itemizedlist>
</itemizedlist>
</entry>
using
if(XML_ELEMENT_NODE === $stuff2->nodeType && 'para' === $stuff2->nodeName){
$newNode = $dom->createElement('p');
foreach($stuff2->childNodes as $grandChild){
$newNode->appendChild($grandChild->cloneNode(true));
}
$stuff2->replaceWith($newNode);
}
if (XML_ELEMENT_NODE === $stuff2->nodeType && 'itemizedlist' === $stuff2->nodeName) {
$newNode = $dom->createElement('ul');
foreach($stuff2->childNodes as $grandChild){
$newNode->appendChild($grandChild->cloneNode(true));
}
$stuff2->replaceWith($newNode);
}
if(XML_ELEMENT_NODE === $stuff2->nodeType && 'emphasis' === $stuff2->nodeName){
$newNode = $dom->createElement('b');
foreach($stuff2->childNodes as $grandChild){
$newNode->appendChild($grandChild->cloneNode(true));
}
$stuff2->replaceWith($newNode);
}
if (XML_ELEMENT_NODE === $stuff2->nodeType && 'listitem' === $stuff2->nodeName) {
$newNode = $dom->createElement('li');
foreach($stuff2->childNodes as $grandChild){
$newNode->appendChild($grandChild->cloneNode(true));
}
$stuff2->replaceWith($newNode);
}
only results in
<p>This is the first text</p>
<emphasis>This is the second text</emphasis>
<para>This is the <emphasis>next</emphasis> text</para>
<itemizedlist>
<listitem>
<para>This is an paragraph inside a list</para>
</listitem>
<itemizedlist>
<listitem>
<para>This is an paragraph inside a list inside a list</para>
</listitem>
</itemizedlist>
</itemizedlist>
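A likely explanation for the result above is that the snippet only looks at one level of nodes and replaces them while looping over a live child list. Here is a minimal sketch of one way around that. The $map array and the shortened sample XML are illustrative and not from the original posts. It snapshots every element first, then moves each mapped element's children into its replacement.

```php
<?php
// Shortened, hypothetical version of the XML above.
$xml = <<<XML
<entry>
<para>This is the <emphasis>next</emphasis> text</para>
<itemizedlist>
<listitem><para>This is a paragraph inside a list</para></listitem>
</itemizedlist>
</entry>
XML;

// Assumed source-tag => HTML-tag mapping.
$map = ['para' => 'p', 'emphasis' => 'b', 'itemizedlist' => 'ul', 'listitem' => 'li'];

$dom = new DOMDocument();
$dom->loadXML($xml);

// Snapshot all elements (any depth) before changing the tree.
$elements = iterator_to_array($dom->getElementsByTagName('*'));

foreach ($elements as $element) {
    if (!isset($map[$element->nodeName])) {
        continue;
    }
    $replacement = $dom->createElement($map[$element->nodeName]);
    // Move (not clone) the children, so nested elements in the snapshot stay valid.
    while ($element->firstChild) {
        $replacement->appendChild($element->firstChild);
    }
    $element->parentNode->replaceChild($replacement, $element);
}

// Serialize everything inside the root element.
$html = '';
foreach ($dom->documentElement->childNodes as $child) {
    $html .= $dom->saveHTML($child);
}
echo $html;
```

Because the children are moved rather than cloned, the order in which elements are processed does not matter. Attributes are not copied in this sketch.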

How to remove XML tag based on child attribute using php?

I have an XML like below
<entries>
<entry>
<title lang="en">Sample</title>
<entrydate>0</entrydate>
<contents>0</contents>
<entrynum>0</entrynum>
</entry>
<entry>
<title lang="fr">Sample</title>
<entrydate>1</entrydate>
<contents>1</contents>
<entrynum>1</entrynum>
</entry>
</entries>
Is there a way in PHP to delete the parent node (entry) based on the title lang attribute? I need to keep only the en ones, so in this case I would need to get the XML without the second entry node.
I tried looking around but couldn't find any solution...
Use the DOMDocument class to parse the string into an XML document. Then use the DOMXPath class to find the target element and DOMNode::removeChild() to remove it from the document.
$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
// select target entry tag
$entry = $xpath->query("//entry[title[@lang='fr']]")->item(0);
// remove selected element
$entry->parentNode->removeChild($entry);
$xml = $doc->saveXML();
You can check the result in the demo.
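The snippet above removes only the first French entry. If the goal is to keep only the "en" entries, a minimal sketch along the same lines could select every entry whose title has a different lang and remove them all. The "de" entry in the sample is hypothetical, added only to show that more than one entry is removed.

```php
<?php
$xml = <<<XML
<entries>
<entry><title lang="en">Sample</title></entry>
<entry><title lang="fr">Sample</title></entry>
<entry><title lang="de">Sample</title></entry>
</entries>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Select every <entry> whose <title> has a lang attribute other than 'en'.
// iterator_to_array() takes a snapshot before anything is removed.
$unwanted = iterator_to_array($xpath->query("//entry[title[@lang != 'en']]"));
foreach ($unwanted as $entry) {
    $entry->parentNode->removeChild($entry);
}

$result = $doc->saveXML();
echo $result;
```

Entries whose title has no lang attribute at all are kept by this expression. Use //entry[not(title[@lang='en'])] if those should go as well.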
You could also read your file and generate a new one with your modifications:
<?php
$entries = array('title' => "What's For Dinner",
'link' => 'http://menu.example.com/',
'description' => 'Choose what to eat tonight.');
print "<entries>\n";
foreach ($entries as $element => $content) {
print " <$element>";
print htmlentities($content);
print "</$element>\n";
}
print "</entries>";
?>
Use the method described in this answer, i.e.
<?php
$xml = simplexml_load_file('1.xml');
$del_items = [];
foreach ($xml->entry as $e) {
$attr = $e->title->attributes();
if ($attr && $attr['lang'] != 'en') {
$del_items []= $e;
}
}
foreach ($del_items as $e) {
$dom = dom_import_simplexml($e);
$dom->parentNode->removeChild($dom);
}
echo $xml->asXML();
Output
<?xml version="1.0" encoding="UTF-8"?>
<entries>
<entry>
<title lang="en">Sample</title>
<entrydate>0</entrydate>
<contents>0</contents>
<entrynum>0</entrynum>
</entry>
</entries>
The items cannot be removed within the first loop, because otherwise we may break the iteration chain. Instead, we collect the entry objects into the $del_items array, then remove them from the XML in a separate loop.

SimpleXML: trouble with parent with attributes

Need help with updating some SimpleXML code I wrote a long time ago. The XML file I'm parsing is formatted in a new way, but I can't figure out how to navigate it.
Example of old XML format:
<?xml version="1.0" encoding="UTF-8"?>
<pf version="1.0">
<pinfo>
<pid><![CDATA[test1 pid]]></pid>
<picture><![CDATA[http://test1.image]]></picture>
</pinfo>
<pinfo>
<pid><![CDATA[test2 pid]]></pid>
<picture><![CDATA[http://test2.image]]></picture>
</pinfo>
</pf>
and then the new XML format (note "category name" added):
<?xml version="1.0" encoding="UTF-8"?>
<pf version="1.2">
<category name="Cname1">
<pinfo>
<pid><![CDATA[test1 pid]]></pid>
<picture><![CDATA[http://test1.image]]></picture>
</pinfo>
</category>
<category name="Cname2">
<pinfo>
<pid><![CDATA[test2 pid]]></pid>
<picture><![CDATA[http://test2.image]]></picture>
</pinfo>
</category>
</pf>
And below the old code for parsing that doesn't work since the addition of "category name" in the XML:
$pinfo = new SimpleXMLElement($_SERVER['DOCUMENT_ROOT'].'/xml/file.xml', null, true);
foreach($pinfo as $resource)
{
$Profile_id = $resource->pid;
$Image_url = $resource->picture;
// and then some echo´ing of the collected data inside the loop
}
What do I need to add or do completely different? I tried with xpath,children and sorting by attributes but no luck - SimpleXML has always been a mystery to me :)
You were iterating over all <pinfo> elements located in the root element previously:
foreach ($pinfo as $resource)
Now all <pinfo> elements have moved from the root element into the <category> elements. You now need to query those elements first:
foreach ($pinfo->xpath('/*/category/pinfo') as $resource)
The variable name $pinfo is now misleading, so it is better to make a few more changes:
$xml = new SimpleXMLElement($_SERVER['DOCUMENT_ROOT'].'/xml/file.xml', null, true);
$pinfos = $xml->xpath('/*/category/pinfo');
foreach ($pinfos as $pinfo) {
$Profile_id = $pinfo->pid;
$Image_url = $pinfo->picture;
// ... and then some echo´ing of the collected data inside the loop
}
The category elements exist as their own array when you load the XML file. The XML you are used to parsing is contained within. All you need to do is wrap your current code with another foreach. Other than that there isn't much to change.
foreach($pinfo as $category)
{
foreach($category as $resource)
{
$Profile_id = $resource->pid;
$Image_url = $resource->picture;
// and then some echo´ing of the collected data inside the loop
}
}
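If you also need the category name for each entry, a minimal sketch of the nested loop could read the name attribute from each <category>. This assumes the new XML format from the question. The inline sample string and the $rows array are only there for illustration.

```php
<?php
$xml_string = <<<XML
<pf version="1.2">
  <category name="Cname1">
    <pinfo>
      <pid><![CDATA[test1 pid]]></pid>
      <picture><![CDATA[http://test1.image]]></picture>
    </pinfo>
  </category>
  <category name="Cname2">
    <pinfo>
      <pid><![CDATA[test2 pid]]></pid>
      <picture><![CDATA[http://test2.image]]></picture>
    </pinfo>
  </category>
</pf>
XML;

$xml = simplexml_load_string($xml_string);

$rows = [];
foreach ($xml->category as $category) {
    // Attributes are read with array syntax; cast to string to get the plain value.
    $category_name = (string) $category['name'];
    foreach ($category->pinfo as $pinfo) {
        $rows[] = [$category_name, (string) $pinfo->pid, (string) $pinfo->picture];
    }
}

foreach ($rows as $row) {
    echo implode(' | ', $row), "\n";
}
```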

Deleting elements from xml file with foreach in php

Hi, I have code like this:
$doc = new DOMDocument();
$doc->Load('courses.xml');
foreach ($doc->getElementsByTagName('courses') as $tagcourses)
{
foreach ( $tagcourses ->getElementsByTagName('course') as $tagcourse)
{
if(($tagcourse->getAttribute('instructorId')) == $iid){
$tagcourses->removeChild($tagcourse);
}
}
}
$doc->Save('courses.xml');
And I have an XML file:
<courses>
<course courseId="1" instructorId="1">
<course_code>456</course_code>
<course_name>bil</course_name>
</course>
<course courseId="2" instructorId="2">
<course_code>234</course_code>
<course_name>math</course_name>
</course>
<course courseId="3" instructorId="2">
<course_code>341</course_code>
<course_name>cs</course_name>
</course>
<course courseId="4" instructorId="2">
<course_code>244</course_code>
<course_name>phyc</course_name>
</course>
</courses>
In this code I tried to remove the elements whose instructor id matches $iid. The problem is that all courses with this instructor id must be removed, but my program removes only the first one. Can you suggest a solution? Thanks.
getElementsByTagName() returns a live node list. If you remove an element from it in a loop, the loop is then iterating over a different set of elements than it started with, and the results are unpredictable. Instead, store the nodes you want to remove in an array, then iterate over that and remove them.
$doc = new DOMDocument();
$doc->Load('courses.xml');
$to_remove = array();
foreach ($doc->getElementsByTagName('courses') as $tagcourses)
{
foreach ( $tagcourses ->getElementsByTagName('course') as $tagcourse)
{
if(($tagcourse->getAttribute('instructorId')) == $iid){
$to_remove[] = $tagcourse;
}
}
}
// Remove the nodes stored in your array
// by removing it from its parent
foreach ($to_remove as $node)
{
$node->parentNode->removeChild($node);
}
$doc->Save('courses.xml');
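Another common way to deal with a live node list is to walk it backwards by index, so that a removal only shifts items that were already visited. A minimal sketch, assuming an inline copy of the courses XML and a hypothetical $iid of '2':

```php
<?php
$iid = '2'; // hypothetical instructor id to remove

$doc = new DOMDocument();
$doc->loadXML(
    '<courses>'
    . '<course courseId="1" instructorId="1"/>'
    . '<course courseId="2" instructorId="2"/>'
    . '<course courseId="3" instructorId="2"/>'
    . '<course courseId="4" instructorId="2"/>'
    . '</courses>'
);

$courses = $doc->getElementsByTagName('course');

// Iterate from the last item to the first: removing item $i leaves items 0..$i-1 untouched.
for ($i = $courses->length - 1; $i >= 0; $i--) {
    $course = $courses->item($i);
    if ($course->getAttribute('instructorId') === $iid) {
        $course->parentNode->removeChild($course);
    }
}

$remaining = $doc->getElementsByTagName('course')->length;
echo $remaining;
```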

xml and php getting tag elements with certain element and outputting

I have two XML files. I first get one and loop through it, then I need to take an id from the first XML file, find it in the second one, and echo out the results associated with that id. If I were to do this with SQL I would simply do this:
$query = mysql_query("SELECT * FROM HotelSummary WHERE roomTypeCode = '$id'") or die();
while($row=mysql_fetch_array($query)){
$name = $row['Name'];
}
echo $name;
How can I do this with XML and PHP?
I recommend you to read the DOMDocument documentation.
It's quite heavy but also powerful (it is not always clear what happens, but the Internet should always give you a solution).
You can simply walk through your first document, finding your Id and then find your DOMElement via an XPath.
<?php
$dom = new DOMDocument();
$dom->load('1.xml');
foreach ($dom->getElementsByTagName('article') as $node) {
// your conditions to find out the id
$id = $node->getAttribute('id');
}
$dom = new DOMDocument();
$dom->load('2.xml');
$xpath = new DOMXPath($dom);
$element = $xpath->query("//*[@id='".$id."']")->item(0);
// would echo "top_2" based on my example files
echo $element->getAttribute('name');
Based on following test files:
1.xml
<?xml version="1.0" encoding="UTF-8"?>
<articles>
<article id="foo_1">
<title>abc</title>
</article>
<article id="foo_2">
<title>def</title>
</article>
</articles>
2.xml
<?xml version="1.0" encoding="UTF-8"?>
<tests>
<test id="foo_1" name="top_1">
</test>
<test id="foo_2" name="top_2">
</test>
</tests>
Use SimpleXML to create an object representation of the file. You can then loop through the elements of the Simple XML object.
Depending on the format of the XML file:
Assuming it is:
<xml>
<roomTypeCode>
<stuff>stuff</stuff>
<name>Skunkman</name>
</roomTypeCode>
<roomTypeCode>
<stuff>other stuff</stuff>
<name>Someone Else</name>
</roomTypeCode>
</xml>
It would be something like this:
$xml = simplexml_load_file('xmlfile.xml');
for($i = 0; $i < count($xml->roomTypeCode); $i++)
{
if($xml->roomTypeCode[$i]->stuff == "stuff")
{
$name = $xml->roomTypeCode[$i]->name;
}
}
That connects to the XML file, finds how many roomTypeCode entries there are, searches for the value of "stuff" within and when it matches it correctly, you can access anything having to do with that XML entry.
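The same lookup can also be written as a single XPath query, which is closer in spirit to the SQL WHERE clause. A minimal sketch, assuming the XML layout above. The $wanted variable is hypothetical and is concatenated into the expression without escaping, so it should not contain quotes.

```php
<?php
$xml = simplexml_load_string(
    '<xml>'
    . '<roomTypeCode><stuff>stuff</stuff><name>Skunkman</name></roomTypeCode>'
    . '<roomTypeCode><stuff>other stuff</stuff><name>Someone Else</name></roomTypeCode>'
    . '</xml>'
);

$wanted = 'stuff'; // hypothetical value to search for

// Select the <name> of every <roomTypeCode> whose <stuff> child equals $wanted.
$matches = $xml->xpath("//roomTypeCode[stuff='" . $wanted . "']/name");

// xpath() returns an array of matches; take the first one if there is any.
$name = $matches ? (string) $matches[0] : null;
echo $name;
```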
