Why is my recursive loop creating too many children? - php

I'm using a recursive PHP function to parse an XML document into a nested list. However, the recursion is broken: it creates duplicate elements within the list, as well as blank elements.
The XML (a list of family tree data) is structured as follows:
<?xml version="1.0" encoding="UTF-8"?>
<family>
<indi>
<id>id1</id>
<fn>Thomas</fn>
<bday></bday>
<dday></dday>
<spouse></spouse>
<family>
<indi>
<id>id1</id>
<fn>Alexander</fn>
<bday></bday>
<dday></dday>
<spouse></spouse>
<family>
</family>
</indi>
<indi>
<id>id1</id>
<fn>John</fn>
<bday></bday>
<dday></dday>
<spouse></spouse>
<family>
<indi>
<id>id1</id>
<fn>George</fn>
<bday></bday>
<dday></dday>
<spouse></spouse>
<family>
</family>
</indi>
</family>
</indi>
</family>
</indi>
</family>
And here's my PHP code, which loads the XML file and then recurses through it to create a nested <ul>:
<?php
function outputIndi($indi) {
echo '<li>';
$id = $indi->getElementsByTagName('id')->item(0)->nodeValue;
echo '<span class="vcard person" id="' . $id . '">';
$fn = $indi->getElementsByTagName('fn')->item(0)->nodeValue;
$bday = $indi->getElementsByTagName('bday')->item(0)->nodeValue;
echo '<span class="edit fn">' . $fn . '</span>';
echo '<span class="edit bday">' . $bday . '</span>';
// ...
echo '</span>';
echo '<ul>';
$family = $indi->getElementsByTagName('family');
foreach ($family as $subIndi) {
outputIndi($subIndi);
}
echo '</ul></li>';
}
$doc = new DOMDocument();
$doc->load('armstrong.xml');
outputIndi($doc);
?>
EDIT: here's the desired outcome (nested lists, with each <ul> representing a family and each <li> representing an individual):
<ul>
<li>
<span class="vcard">
<span class="fn">Thomas</span>
<span class="bday"></span>
<span class="dday"></span>
</span>
<ul>
... repeat for all descendants ...
</ul>
</li>
</ul>
You can see the output at http://chris-armstrong.com/gortin. Any ideas where I'm going wrong? I think it has something to do with the $subIndi value, but whenever I try to change it I get an error. I'd really appreciate any help!

Sounds perfect! Could you give me an example? Does this mean I can save the data as XML, then load it in as nested <ul> lists?
Yes, you can do exactly that. Here's an XSL stylesheet that renders nested <ul> lists:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h2>Family tree</h2>
<ul>
<!-- works whether the root element is <indi> or <family> -->
<xsl:apply-templates select="indi | family/indi" />
</ul>
</body>
</html>
</xsl:template>
<!-- one <li> per individual -->
<xsl:template match="indi">
<li>
<div>
<xsl:value-of select="id" />: <xsl:value-of select="fn" />
(<xsl:value-of select="bday" /> to <xsl:value-of select="dday" />)
</div>
<!-- apply-templates on this person's own 'family' node: this is the recursion -->
<xsl:apply-templates select="family" />
</li>
</xsl:template>
<!-- one nested <ul> per non-empty family -->
<xsl:template match="family">
<xsl:if test="indi">
<ul>
<xsl:apply-templates select="indi" />
</ul>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
I don't know PHP, but this article shows how to transform XML using a stylesheet like the one above.
You can also link the stylesheet from the XML file itself by adding an <?xml-stylesheet type="text/xsl" href="..."?> processing instruction at the top of the file.
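Since the answer stops short of the PHP side, here is a minimal sketch of running a stylesheet like this with PHP's XSL extension (XSLTProcessor), assuming ext-xsl is enabled. To keep it self-contained, the XML (a trimmed copy of the question's data) and a cut-down stylesheet are inline strings; in practice you would load armstrong.xml and your .xsl file with DOMDocument::load().

```php
<?php
// Assumes PHP's XSL extension (ext-xsl) is enabled; it provides XSLTProcessor.

// Trimmed copy of the question's XML (root element <family>).
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<family>
  <indi>
    <id>id1</id>
    <fn>Thomas</fn>
    <family>
      <indi><id>id2</id><fn>Alexander</fn><family></family></indi>
      <indi>
        <id>id3</id>
        <fn>John</fn>
        <family>
          <indi><id>id4</id><fn>George</fn><family></family></indi>
        </family>
      </indi>
    </family>
  </indi>
</family>
XML;

// Cut-down stylesheet: one <li> per <indi>, one nested <ul> per non-empty <family>.
$xsl = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <xsl:apply-templates select="family"/>
  </xsl:template>
  <xsl:template match="family">
    <xsl:if test="indi">
      <ul><xsl:apply-templates select="indi"/></ul>
    </xsl:if>
  </xsl:template>
  <xsl:template match="indi">
    <li>
      <span class="fn"><xsl:value-of select="fn"/></span>
      <!-- recursion: render this person's own <family> -->
      <xsl:apply-templates select="family"/>
    </li>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xmlDoc = new DOMDocument();
$xmlDoc->loadXML($xml);

$xslDoc = new DOMDocument();
$xslDoc->loadXML($xsl);

$proc = new XSLTProcessor();
$proc->importStylesheet($xslDoc); // compile the stylesheet
$html = $proc->transformToXml($xmlDoc); // serialized according to <xsl:output>
echo $html;
```

Empty <family></family> elements produce no <ul> because of the xsl:if test, so leaf individuals render as a bare <li>.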

getElementsByTagName() returns all matching descendant elements, not just the immediate children:
$family = $indi->getElementsByTagName('family');
foreach ($family as $subIndi) {
outputIndi($subIndi);
}
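So outputIndi() is called again for grandchildren and every deeper descendant, which produces the duplicates. Each call also receives a <family> element rather than an <indi>, so for an empty <family></family> the id/fn lookups find nothing and you get blank items.
Here is an example (from another Stack Overflow question) that only looks at direct children: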
for ($n = $indi->firstChild; $n !== null; $n = $n->nextSibling) {
if ($n instanceof DOMElement && $n->tagName == "family") {
outputIndi($n);
}
}
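The same direct-children idea, applied to the question's whole tree, might look like the sketch below (an illustration, not code from either post). childElements(), childText(), renderFamily() and renderIndi() are hypothetical helpers, and the inline XML is a trimmed copy of the question's data; with a real file you would call $doc->load('armstrong.xml') instead. Because each function only looks at direct children, every person is rendered exactly once.

```php
<?php
// Return the direct child elements of $parent with the given tag name
// (unlike getElementsByTagName(), which searches all descendants).
function childElements(DOMElement $parent, string $tag): array {
    $found = [];
    foreach ($parent->childNodes as $child) {
        if ($child instanceof DOMElement && $child->tagName === $tag) {
            $found[] = $child;
        }
    }
    return $found;
}

// Text of the first direct child with the given tag, or '' if there is none.
function childText(DOMElement $parent, string $tag): string {
    $matches = childElements($parent, $tag);
    return $matches ? $matches[0]->textContent : '';
}

// A <family> becomes a <ul>; each of its direct <indi> children becomes an <li>.
function renderFamily(DOMElement $family): string {
    $items = '';
    foreach (childElements($family, 'indi') as $indi) {
        $items .= renderIndi($indi);
    }
    return $items === '' ? '' : '<ul>' . $items . '</ul>';
}

function renderIndi(DOMElement $indi): string {
    $html = '<li><span class="vcard person" id="' . htmlspecialchars(childText($indi, 'id')) . '">';
    $html .= '<span class="edit fn">' . htmlspecialchars(childText($indi, 'fn')) . '</span>';
    $html .= '<span class="edit bday">' . htmlspecialchars(childText($indi, 'bday')) . '</span>';
    $html .= '</span>';
    // Recurse only into this person's own <family>, never straight into grandchildren.
    foreach (childElements($indi, 'family') as $family) {
        $html .= renderFamily($family);
    }
    return $html . '</li>';
}

$xml = <<<'XML'
<family>
  <indi><id>id1</id><fn>Thomas</fn>
    <family>
      <indi><id>id2</id><fn>Alexander</fn><family></family></indi>
      <indi><id>id3</id><fn>John</fn>
        <family><indi><id>id4</id><fn>George</fn><family></family></indi></family>
      </indi>
    </family>
  </indi>
</family>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$output = renderFamily($doc->documentElement);
echo $output;
```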

Replace this
$family = $indi->getElementsByTagName('family');
foreach ($family as $subIndi) {
outputIndi($subIndi);
}
with this:
if(!empty($indi))
foreach($indi as $subIndi){
outputIndi($subIndi);
}
Note that if ($indi->hasChildNodes()) is a better check than if (!empty($indi)).

Related

Replace spans in PHP but keep content inside

I have the following string:
<span style="font-size: 13px;">
<span style="">
<span style="">
<span style="font-family: Roboto, sans-serif;">
<span style="">
Some text content
</span>
</span>
</span>
</span>
</span>
and I want to change this string to the following using PHP:
<span style="font-size: 13px;">
<span style="font-family: Roboto, sans-serif;">
Some text content
</span>
</span>
I have no idea how to do that: when I use str_replace to remove the <span style="">, I don't know how to remove the matching </span> while keeping the content inside. Another problem is that I don't know exactly how many <span style=""> tags are in my string, and the string contains more than one of these blocks.
Thanks in advance for your help, and sorry if this is a stupid question; I'm still learning.
This is easily done with a proper HTML parser. PHP has DOMDocument, which parses X/HTML into a Document Object Model that you can then manipulate however you want.
The trick is to traverse the DOM tree recursively, visit each node, and replace the ones you don't want. To do this, I've written a short helper method by extending DOMDocument:
$html = <<<'HTML'
<span style="font-size: 13px;">
<span style="">
<span style="">
<span style="font-family: Roboto, sans-serif;">
<span style="">
Some text content
</span>
</span>
</span>
</span>
</span>
HTML;
class MyDOMDocument extends DOMDocument {
public function walk(DOMNode $node, $skipParent = false) {
if (!$skipParent) {
yield $node;
}
if ($node->hasChildNodes()) {
foreach ($node->childNodes as $n) {
yield from $this->walk($n);
}
}
}
}
libxml_use_internal_errors(true);
$dom = new MyDOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$keep = $remove = [];
foreach ($dom->walk($dom->childNodes->item(0)) as $node) {
if ($node->nodeName !== "span") { // we only care about span nodes
continue;
}
// we'll get rid of all span nodes that don't have the style attribute
if (!$node->hasAttribute("style") || !strlen($node->getAttribute("style"))) {
$remove[] = $node;
foreach($node->childNodes as $child) {
$keep[] = [$child, $node];
}
}
}
// first move every child out in front of its span, then remove the (now empty) spans, so the inner nodes are kept
foreach($keep as [$a, $b]) {
$b->parentNode->insertBefore($a, $b);
}
foreach($remove as $a) {
if ($a->parentNode) {
$a->parentNode->removeChild($a);
}
}
// Now we should have a rebuilt DOM tree with what we expect:
echo $dom->saveHTML();
Output:
<span style="font-size: 13px;">
<span style="font-family: Roboto, sans-serif;">
Some text content
</span>
</span>
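A shorter alternative to the generator walk, swapped in here rather than taken from the answer, is to let DOMXPath find the spans. The expression string-length(@style) = 0 is true both when the attribute is empty and when it is missing, which mirrors the answer's hasAttribute()/strlen() test. unwrapEmptySpans() is a hypothetical helper name. XPath query results are a static list, so the tree can be modified while looping over them.

```php
<?php
// Hypothetical helper: unwrap every <span> whose style attribute is missing or empty.
function unwrapEmptySpans(string $html): string {
    libxml_use_internal_errors(true); // silence parser warnings about the fragment
    $dom = new DOMDocument();
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // string-length() of a missing attribute is 0, so this matches both cases.
    foreach ($xpath->query('//span[string-length(@style) = 0]') as $span) {
        $parent = $span->parentNode;
        // Move each child out in front of the span, preserving their order...
        while ($span->firstChild !== null) {
            $parent->insertBefore($span->firstChild, $span);
        }
        // ...then drop the now-empty span.
        $parent->removeChild($span);
    }
    return $dom->saveHTML();
}

$html = '<span style="font-size: 13px;"><span style=""><span style="">'
      . '<span style="font-family: Roboto, sans-serif;"><span style="">Some text content'
      . '</span></span></span></span></span>';

$result = unwrapEmptySpans($html);
echo $result;
```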
For a more general way to modify HTML documents, take a look at XSLT (Extensible Stylesheet Language Transformations). PHP supports it through its XSL extension (the XSLTProcessor class).
You then write an XSL document containing your transformation rules:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html" indent="yes"/>
<!-- remove elements with empty styles, but keep their content -->
<xsl:template match="*[@style and string-length(@style) = 0]">
<xsl:apply-templates />
</xsl:template>
<!-- catch-all to copy any elements that aren't matched in other templates -->
<xsl:template match="*">
<xsl:copy>
<!-- copy the attributes of the element -->
<xsl:copy-of select="@*" />
<!-- continue applying templates to this element's children, including text nodes -->
<xsl:apply-templates select="node()" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Then your PHP:
$sourceHtml = new DOMDocument();
$sourceHtml->loadHTMLFile('source.html'); // parse as HTML; load() would require well-formed XML
$xsl = new DOMDocument();
$xsl->load('transform.xsl');
$xsltProcessor = new XSLTProcessor;
$xsltProcessor->importStyleSheet($xsl); // attach the xsl rules
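// either print the transformed markup directly...
echo $xsltProcessor->transformToXML($sourceHtml);
// ...or get a DOMDocument back and save it as an HTML file
$transformedHtml = $xsltProcessor->transformToDoc($sourceHtml);
$transformedHtml->saveHTMLFile('transformed.html');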
XSLT is very powerful for this kind of thing: you can write rules based on parent/sibling relationships and modify attributes and content accordingly.

How to read all nodes of an XML file with more than three node levels using PHP?

I am writing a PHP script to read all nodes of an XML file that has more than three levels of nesting (depth > 2). However, I can only read the first-level children correctly.
I would really appreciate it if you could point out the mistake I made when trying to read the second-level child nodes.
My XML file:
<?xml version="1.0" encoding="utf-8"?>
<document nipperstudio="2.3.10.3500" xmlversion="2" xmlrevision="3">
<report>
<part index="1" title="Your Report" ref="YOURREPORT">
<section index="1.1" title="Introduction" ref="INTRODUCTION">
<text>Inside the section 1.1.:</text>
<list type="bullet">
<listitem>detailed description of list item 01;</listitem>
<listitem>detailed description of list item 02;</listitem>
</list>
</section>
<section index="1.2" title="Report Conventions" ref="REPORTCONVENTIONS">
<text>This report makes use of the text conventions.</text>
<table index="3" title="Report text conventions" ref="REPORTTEXTCONVENTIONS">
<headings>
<heading>Convention</heading>
<heading>Description</heading>
</headings>
</table>
</section>
</part>
<part index="2" title="Security Audit" ref="SECURITYAUDIT">
<section index="2.1" title="Introduction" ref="INTRODUCTION">
<text>Inside the section 2.1.:</text>
<list type="bullet">
<listitem>detailed description of list item 01;</listitem>
<listitem>detailed description of list item 02;</listitem>
</list>
<section index="2.1.1" title="Issue Overview" ref="ISSUEOVERVIEW">
<text>Inside the section 2.1.1</text>
<text title="Issue Finding">The is the body text of 2.1.1.</text>
</section>
<section index="2.1.2" title="Rating Overview" ref="RATINGSYSTEM">
<text>Inside the section 2.1.1</text>
<text title="Issue Finding">The is the body text of 2.1.1.</text>
</section>
</section>
<section index="2.2" title="section title" ref="SECTION2.2">
<section index="2.2.1" title="Finding" ref="FINDING">
<text>Inside the section 2.2.1</text>
<text title="Issue Finding">The is the body text of 2.2.1.</text>
</section>
</section>
</part>
</report>
</document>
My PHP Script is given below.
Test XML Reader
<html>
<title>Test XML Reader</title>
<body>
<p>Output from xmlreader</p>
<?php
readXmlFiles();
?>
</body>
</html>
<?php
function readXmlFiles(){
// create the reader object
$reader = new XMLReader();
// reader the XML file.
$reader->open("./fwxml/03.xml"); //open the xml file to read.
while($reader->read()) {
switch($reader->nodeType) {
case (XMLREADER::ELEMENT):
if ($reader->localName == 'report') { //read the local name of the node
$node = $reader->expand();
$dom = new DomDocument();
$n = $dom->importNode($node,true);
$dom->appendChild($n);
foreach(($dom->getElementsByTagName('part')) as $fwpart) {
$parttitle = $fwpart->getAttribute('title');
echo "=====".$parttitle."=====<br>";
foreach(($fwpart->childNodes) as $cnode){
if($cnode->nodeName == 'section'){
$index = $cnode->getAttribute('index');
$title = $cnode->getAttribute('title');
$ref = $cnode->getAttribute('ref');
echo "Index = " .$index."<br>";
echo "Title = " .$title."<br>";
echo "Ref = " .$ref."<br>";
$fwsec = $dom->getElementsByTagName('section');
echo $fwsec->item(0)->nodeValue."<br>";
echo "<br><br><br>";
}//end of if
}//end of foreach
}
}
break; //end of XMLREADER::ELEMENT
case (XMLREADER::END_ELEMENT):
// do something based on when the element closes.
break;
}
}
} //end of function
?>
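In the script above, two things appear to stop it at the first level: the inner loop only iterates $fwpart->childNodes, so nested sections such as 2.1.1 are never visited, and $dom->getElementsByTagName('section')->item(0) always returns the first <section> in the whole document, so the same text is printed each time. One way to reach sections at any depth is recursion: handle a section, then call the same function on its direct <section> children. A minimal sketch on a trimmed copy of the question's XML; collectSections() is a hypothetical name, and it returns lines instead of echoing so the result is easy to check.

```php
<?php
// Hypothetical helper: collect "index title" lines for every nested <section>,
// indented two spaces per level of depth.
function collectSections(DOMElement $parent, int $depth = 0): array {
    $lines = [];
    foreach ($parent->childNodes as $child) {
        // Only direct <section> children here; deeper ones are reached by recursion.
        if ($child instanceof DOMElement && $child->tagName === 'section') {
            $lines[] = str_repeat('  ', $depth)
                . $child->getAttribute('index') . ' ' . $child->getAttribute('title');
            $lines = array_merge($lines, collectSections($child, $depth + 1));
        }
    }
    return $lines;
}

$xml = <<<'XML'
<document>
  <report>
    <part index="2" title="Security Audit">
      <section index="2.1" title="Introduction">
        <section index="2.1.1" title="Issue Overview"/>
        <section index="2.1.2" title="Rating Overview"/>
      </section>
      <section index="2.2" title="section title">
        <section index="2.2.1" title="Finding"/>
      </section>
    </part>
  </report>
</document>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

$lines = [];
foreach ($dom->getElementsByTagName('part') as $part) {
    $lines[] = '===== ' . $part->getAttribute('title') . ' =====';
    $lines = array_merge($lines, collectSections($part));
}
echo implode("\n", $lines), "\n";
```

Inside collectSections() you could also read each section's <text> children the same way, by checking for tagName === 'text'.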

Preg replace divs with li but keep class active

I'm trying to work out how to write a PHP preg_replace for a string that will convert
<div class="active make_link">1</div>
<div class="make_link digit">2</div>
<div class="make_link digit">3</div>
etc
to
<li class="active">1</li>
<li>2</li>
<li>3</li>
etc
Figured out how to replace the elements but not how to keep the class active.
$new_pagination = preg_replace('/<div[^>]*>(.*)<\/div>/U', '<li>$1</li>', $old_pagination);
Any ideas?
Try this; you can do it with str_ireplace() too:
<?php
$html='<div class="active make_link">1</div>
<div class="make_link digit">2</div>
<div class="make_link digit">3</div>';
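// the replacements are applied in order, so '<div' is handled before the class strings
echo str_ireplace(array('<div', '</div', 'class="active make_link"', ' class="make_link digit"'), array('<li', '</li', 'class="active"', ''), $html);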
Or, with the Simple HTML DOM library:
require_once('simple_html_dom.php');
$doc = str_get_html($string);
foreach($doc->find('div') as $div){
$div->tag = 'li';
preg_match('/active/', $div->class, $m);
$div->class = @$m[0]; // @ suppresses the notice when 'active' isn't found
}
echo $doc;
This may seem a bit excessive, but it's a good use-case for XSLT:
$xslt = <<<XML
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|node()">
<xsl:copy><xsl:apply-templates select="@*|node()" /></xsl:copy>
</xsl:template>
<xsl:template match="div">
<li>
<xsl:if test="@*[name()='class' and contains(., 'active')]">
<xsl:attribute name="class">active</xsl:attribute>
</xsl:if>
<xsl:apply-templates select="node()" />
</li>
</xsl:template>
</xsl:stylesheet>
XML;
It uses the identity rule and then overrides handling for <div>, adding a class="active" for nodes that have such a class name.
$xsl = new XSLTProcessor;
$doc = new DOMDocument;
$doc->loadXML($xslt);
$xsl->importStyleSheet($doc);
$doc = new DOMDocument;
$html = <<<HTML
<div class="active make_link">1</div>
<div class="make_link digit">2<div>test</div></div>
<div class="make_link digit">3</div>
HTML;
$doc->loadHTML($html);
echo $xsl->transformToDoc($doc)->saveHTML();
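If the XSL extension isn't available, the same transformation can be done with plain DOMDocument. A minimal sketch (not from any of the answers above) that replaces each <div> with an <li> and keeps class="active" only when "active" is one of the div's classes; divsToListItems() is a hypothetical helper name. DOM has no "rename element" call, so the sketch creates a new <li>, moves the children across, and swaps it in.

```php
<?php
// Hypothetical helper: turn each <div> into an <li>, keeping class="active" only
// when "active" is one of the div's classes and dropping every other class.
function divsToListItems(string $html): string {
    libxml_use_internal_errors(true); // ignore parser warnings about the fragment
    $dom = new DOMDocument();
    // Wrap the fragment in a single root element so it parses as one tree.
    $dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    $root = $dom->documentElement;

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//div') as $div) {
        if ($div->isSameNode($root)) {
            continue; // skip our own wrapper
        }
        $li = $dom->createElement('li');
        $classes = preg_split('/\s+/', $div->getAttribute('class'), -1, PREG_SPLIT_NO_EMPTY);
        if (in_array('active', $classes, true)) {
            $li->setAttribute('class', 'active');
        }
        // Move the div's content (text and any nested elements) into the new <li>.
        while ($div->firstChild !== null) {
            $li->appendChild($div->firstChild);
        }
        $div->parentNode->replaceChild($li, $div);
    }

    // Serialize only the wrapper's children, not the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$result = divsToListItems('<div class="active make_link">1</div>
<div class="make_link digit">2</div>
<div class="make_link digit">3</div>');
echo $result;
```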

PHP function to compare equality of XML elements

I have an XML file that is a backup of my phone's contacts, and I am trying to write a PHP script that retrieves only the contacts that have a phone number assigned to them. The file contains contacts from several different applications.
The XML has these elements:
<Contact>
<Id>5238</Id>
<GivenName>friend1</GivenName>
<FullName>friendA</FullName>
<CreateTime>0001-01-01T00:00:00+00:00</CreateTime>
<ModifyTime>0001-01-01T00:00:00+00:00</ModifyTime>
<Starred>false</Starred>
<AccountName>SIM</AccountName>
<AccountType>com.anddroid.contacts.sim</AccountType>
</Contact>
<PhoneNumbers>
<Id>53</Id>
<ContactId>1380</ContactId>
<Name>2</Name>
<Value>07123456789</Value>
<Primary>2</Primary>
</PhoneNumbers>
<Contact>
<Id>328</Id>
<FamilyName>tee</FamilyName>
<GivenName>friend2</GivenName>
<FullName>friend2 tee</FullName>
<CreateTime>0001-01-01T00:00:00+00:00</CreateTime>
<ModifyTime>0001-01-01T00:00:00+00:00</ModifyTime>
<Picture>18948</Picture>
<Starred>false</Starred>
<AccountName>xxxxxxx@hotmail.com</AccountName>
<AccountType>com.htc.socialnetwork.facebook</AccountType>
</Contact>
I want to write a PHP script that retrieves the FullName from Contact and the Value from PhoneNumbers wherever Contact/Id matches PhoneNumbers/ContactId.
I created this code:
<?php
$xml = simplexml_load_file("Contact.xml");
$i=0;
$k=0;
foreach ($xml->Contact as $contact) {
if ($contact->AccountName == "SIM"){
echo "Contact: " . $k . "<br /> "; echo $contact->nodeValue[$k] . "<br /> " . $contact->FullName . "<br /> ";
$k++;
}
}
foreach ($xml->PhoneNumbers as $number) {
echo "Contact: " . $i . "<br /> "; echo $number->Value . "<br /> ";
$i++;
}
?>
It outputs 53 contacts and 173 numbers. If I leave out the if ($contact->AccountName == "SIM") check, it outputs the same numbers but 700+ contacts. I'd just like some help writing a function (or similar) that outputs only the contacts whose phone numbers I already have.
Any help is appreciated.
Thank you
I would suggest using an XSL stylesheet:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
<xsl:template match="/">
<ul><xsl:apply-templates/></ul>
</xsl:template>
<xsl:template match="Contact">
<!-- select phonenumbers with the matching ContactId -->
<xsl:variable name="numbers" select="//PhoneNumbers[ContactId=current()/Id]"/>
<!-- when any matching PhoneNumber has been found, continue -->
<xsl:if test="count($numbers) > 0">
<li>
<xsl:value-of select="FullName"/>
<ul>
<!-- call a named template with the matching PhoneNumbers as param -->
<xsl:call-template name="printNumbers">
<xsl:with-param name="numbers" select="$numbers" />
</xsl:call-template>
</ul>
</li>
</xsl:if>
</xsl:template>
<xsl:template name="printNumbers">
<xsl:param name="numbers" />
<!-- loop through PhoneNumbers and print the Value -->
<xsl:for-each select="$numbers">
<li><xsl:value-of select="Value" /></li>
</xsl:for-each>
</xsl:template>
<xsl:template match="PhoneNumbers"/>
</xsl:stylesheet>
How to use the stylesheet:
<?php
$doc = new DOMDocument();
$xsl = new XSLTProcessor();
$doc->load('path/to/stylesheet.xsl');
$xsl->importStyleSheet($doc);
$doc->load('Contact.xml');
echo $xsl->transformToXML($doc);
?>
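If you'd rather stay in plain PHP, the same join can be sketched with SimpleXML: first index the PhoneNumbers entries by ContactId, then keep only the Contact entries whose Id appears in that index. This assumes all entries sit under a single root element; the question's snippet doesn't show one, so the <Contacts> root and the matching ids below are made-up sample data, and contactsWithNumbers() is a hypothetical helper name.

```php
<?php
// Hypothetical sample data: the <Contacts> root and these ids are made up.
$source = <<<'XML'
<Contacts>
  <Contact><Id>5238</Id><FullName>friendA</FullName><AccountName>SIM</AccountName></Contact>
  <Contact><Id>328</Id><FullName>friend2 tee</FullName><AccountName>facebook</AccountName></Contact>
  <PhoneNumbers><Id>53</Id><ContactId>5238</ContactId><Value>07123456789</Value></PhoneNumbers>
  <PhoneNumbers><Id>54</Id><ContactId>5238</ContactId><Value>07000000000</Value></PhoneNumbers>
</Contacts>
XML;

// Hypothetical helper: return name/numbers pairs for contacts with at least one number.
function contactsWithNumbers(SimpleXMLElement $xml): array {
    // 1. Index phone numbers by the contact they belong to (PhoneNumbers/ContactId).
    $numbersById = [];
    foreach ($xml->PhoneNumbers as $number) {
        $numbersById[(string) $number->ContactId][] = (string) $number->Value;
    }
    // 2. Keep only contacts whose Id appears in that index.
    $result = [];
    foreach ($xml->Contact as $contact) {
        $id = (string) $contact->Id;
        if (isset($numbersById[$id])) {
            $result[] = ['name' => (string) $contact->FullName, 'numbers' => $numbersById[$id]];
        }
    }
    return $result;
}

$contacts = contactsWithNumbers(simplexml_load_string($source));
foreach ($contacts as $c) {
    echo htmlspecialchars($c['name']), ': ', htmlspecialchars(implode(', ', $c['numbers'])), "<br />\n";
}
```

With the real backup you would pass simplexml_load_file('Contact.xml') instead. Building the index first means each PhoneNumbers entry is read once, rather than searching all numbers for every contact.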

Parse XML to a nested list with PHP

I have an XML file that contains family tree data in a nested structure, and I want to parse it into a nested list.
I have the following code:
<?php
$doc = new DOMDocument();
$doc->load('armstrong.xml');
echo $doc->saveXML();
?>
Which loads in the following XML file and prints it as-is
<?xml version="1.0" encoding="UTF-8"?>
<indi>
<id>id1</id>
<fn>Matt</fn>
<bday>1919</bday>
<dday>2000</dday>
<spouse>Evelyn Ross</spouse>
<family>
<indi>
<id>id2</id>
<fn>Jane</fn>
<bday></bday>
<dday></dday>
<spouse></spouse>
<family>
</family>
</indi>
<indi>
<id>id3</id>
<fn>Jason</fn>
<bday></bday>
<dday></dday>
<spouse></spouse>
<family>
</family>
</indi>
<indi>
<id>id4</id>
<fn>Samuel</fn>
<bday></bday>
<dday></dday>
<spouse></spouse>
<family>
<indi>
<id>id5</id>
<fn>John</fn>
<bday></bday>
<dday></dday>
<spouse></spouse>
<family>
</family>
</indi>
<indi>
<id>id6</id>
<fn>John</fn>
<bday></bday>
<dday></dday>
<spouse></spouse>
<family>
</family>
</indi>
</family>
</indi>
</family>
However I want to parse it into the following format:
<ul>
<li>
<span class="vcard person" id="id1">
<span class="edit fn">Matt</span>
<span class="edit bday">1956</span>
<span class="edit dday"></span>
<span class="edit spouse">Eunace Fulton</span>
</span>
<ul> ... List of Family ... </ul>
</li>
</ul>
I'm pretty new to php, so if this is an incredibly simple problem I apologise! Would really appreciate any ideas.
EDIT
I'm now using the following recursive function, but I'm still having problems:
$doc = new DOMDocument();
$doc->load('armstrong.xml');
function outputIndi($indi) {
$i = new DOMDocument();
$i = $indi;
echo '<li>';
echo '<span class="edit fn">' . $indi->getElementsByTagName("fn") . '</span>'; // name not a real attribute, must access through DOM
echo '<span class="edit bday">' . $indi->getElementsByTagName("bday") . '</span>'; // ditto
// ...
echo '<ul>';
foreach ($indi->getElementsByTagName("family") as $subIndi) { // again, family not a real attribute
outputIndi($subIndi);
}
echo '</ul>';
echo '</li>';
}
outputIndi($doc->documentRoot);
?>
Here's your code, using recursion. You'll need to add the remaining fields (dday, spouse) yourself:
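function outputIndi(DOMElement $indi) {
echo '<li>';
$id = $indi->getElementsByTagName('id')->item(0)->nodeValue;
echo '<span class="vcard person" id="' . htmlspecialchars($id) . '">';
$fn = $indi->getElementsByTagName('fn')->item(0)->nodeValue;
$bday = $indi->getElementsByTagName('bday')->item(0)->nodeValue;
echo '<span class="edit fn">' . htmlspecialchars($fn) . '</span>';
echo '<span class="edit bday">' . htmlspecialchars($bday) . '</span>';
// ...
echo '</span>'; // close the vcard span before the nested list
echo '<ul>';
// the first <family> descendant is this person's own <family> element
$family = $indi->getElementsByTagName('family')->item(0);
if ($family !== null) {
foreach ($family->childNodes as $subIndi) {
// childNodes also contains whitespace text nodes; recurse into <indi> elements only
if ($subIndi instanceof DOMElement && $subIndi->tagName === 'indi') {
outputIndi($subIndi);
}
}
}
echo '</ul>';
echo '</li>';
}
$doc = new DOMDocument();
$doc->load('armstrong.xml');
echo '<ul>';
outputIndi($doc->documentElement);
echo '</ul>';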
You see, it outputs all the information about an <indi>, loops through each child of its <family>, and calls itself on each one. Does that make sense?
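The answer leaves dday and spouse as an exercise. A minimal sketch that fills them in, in the order shown in the question's desired output, and returns the markup as a string instead of echoing it (handy for testing or caching). renderPerson() is a hypothetical name, the inline XML is a trimmed copy of the question's armstrong.xml, and it assumes each <indi> has at most one of each child element (in particular, one <family>).

```php
<?php
// Hypothetical helper: render one <indi> (and, recursively, its family) as an <li>.
function renderPerson(DOMElement $indi): string {
    // Map the direct child elements of <indi> by tag name (assumes one of each).
    $children = [];
    foreach ($indi->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $children[$child->tagName] = $child;
        }
    }
    // Escaped text of a direct child, or '' when that child is missing.
    $text = fn(string $tag): string =>
        isset($children[$tag]) ? htmlspecialchars($children[$tag]->textContent) : '';

    $html = '<li><span class="vcard person" id="' . $text('id') . '">';
    foreach (['fn', 'bday', 'dday', 'spouse'] as $field) {
        $html .= '<span class="edit ' . $field . '">' . $text($field) . '</span>';
    }
    $html .= '</span>';

    // Recurse into the <indi> children of this person's own <family>.
    $kids = '';
    if (isset($children['family'])) {
        foreach ($children['family']->childNodes as $sub) {
            if ($sub instanceof DOMElement && $sub->tagName === 'indi') {
                $kids .= renderPerson($sub);
            }
        }
    }
    return $html . '<ul>' . $kids . '</ul></li>';
}

$xml = <<<'XML'
<indi>
  <id>id1</id><fn>Matt</fn><bday>1919</bday><dday>2000</dday><spouse>Evelyn Ross</spouse>
  <family>
    <indi><id>id2</id><fn>Jane</fn><bday></bday><dday></dday><spouse></spouse><family></family></indi>
    <indi><id>id4</id><fn>Samuel</fn><bday></bday><dday></dday><spouse></spouse>
      <family>
        <indi><id>id5</id><fn>John</fn><bday></bday><dday></dday><spouse></spouse><family></family></indi>
      </family>
    </indi>
  </family>
</indi>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$html = '<ul>' . renderPerson($doc->documentElement) . '</ul>';
echo $html;
```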
