How can I retrieve data from a nested XML node using PHP?

I am new to XML and data retrieval, and I have a problem with this code.
XML code:
<?xml version="1.0" encoding="UTF-8"?>
<site>
<page>
<content>
<P>
<FONT size="2" face="Tahoma">
<STRONG>text...</STRONG>
</FONT>
</P>
<P>
<FONT size="2" face="Tahoma">text....</FONT>
</P>
<P align="center">
<IMG style="WIDTH: 530px" border="1" alt="" src="http://www.alkul.com/online/2014/5/6/child%20disorder.jpg">
</P>
<P>
<STRONG>
<FONT size="2" face="Tahoma">text3</FONT>
</STRONG>
</P>
<P>
<STRONG>
<FONT size="2" face="Tahoma">text1</FONT>
</STRONG>
</P>
</content>
</page>
</site>
PHP code:
<?php
$html = "";
$url = "Data.xml";
$xml = simplexml_load_file($url);
for ($i = 0; $i<10; $i++) {
$title = $xml->page[$i]->content->P->FONT;
$html .= "<p>$title</p>";
}
echo $html;
I just need to display the contents of the content node, but the output is empty.

First of all, the provided XML is not valid; you should receive the following error:
Warning: simplexml_load_string(): Entity: line 8: parser error : Opening and ending tag mismatch: IMG line 8 and P
In XML, the IMG element needs to be closed like this:
<IMG style="WIDTH: 530px" border="1" alt="" src="http://www.alkul.com/online/2014/5/6/child%20disorder.jpg"/>
Note the forward slash at the end of the element.
If you do not see that error, please look in your error log or enable error reporting in PHP.
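If you'd rather collect those parser messages in code than hunt for them in a log, libxml can buffer them for you. This is a minimal sketch using libxml_use_internal_errors() and libxml_get_errors(); the helper name parseXmlWithErrors() and the two sample strings are made up for illustration.

```php
<?php
// Hypothetical helper: parse an XML string and return [SimpleXMLElement|false, error messages].
function parseXmlWithErrors(string $xml): array
{
    // Buffer libxml errors instead of emitting PHP warnings; remember the old setting.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = simplexml_load_string($xml);   // false when the XML is not well-formed

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);   // restore the caller's setting
    return [$doc, $messages];
}

// Unclosed IMG (as in the question) versus a self-closed IMG.
[$broken, $brokenErrors] = parseXmlWithErrors('<P><IMG src="a.jpg"></P>');
[$fixed, $fixedErrors] = parseXmlWithErrors('<P><IMG src="a.jpg"/></P>');

foreach ($brokenErrors as $message) {
    echo $message, "\n";
}
```

simplexml_load_string() and simplexml_load_file() return false when the document is not well-formed, so checking that return value before touching $xml->page is a cheap way to avoid the confusing "empty output" symptom.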
Now the XML can be parsed by SimpleXML. I ended up with this:
$pList = $xml->xpath('./page/content/P');
foreach ($pList as $pElement) {
$text = strip_tags($pElement->asXML());
echo $text . "<br>";
}
It selects all the P elements into $pList and iterates over the list. For each element it takes the XML and strips all tags from it, leaving you with just the "inner text" for each element.
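A variation on the same XPath idea, sketched under the assumption that the IMG tag has already been self-closed: instead of serializing each P with asXML() and stripping the tags, it converts the node to DOM with dom_import_simplexml() and reads textContent, which already holds only the text. Skipping empty results (the image-only paragraph) is my own addition.

```php
<?php
// Sample from the question with the IMG tag self-closed so it is well-formed XML.
$source = <<<'XML'
<site>
<page>
<content>
<P><FONT size="2" face="Tahoma"><STRONG>text...</STRONG></FONT></P>
<P><FONT size="2" face="Tahoma">text....</FONT></P>
<P align="center"><IMG style="WIDTH: 530px" border="1" alt="" src="http://www.alkul.com/online/2014/5/6/child%20disorder.jpg"/></P>
<P><STRONG><FONT size="2" face="Tahoma">text3</FONT></STRONG></P>
<P><STRONG><FONT size="2" face="Tahoma">text1</FONT></STRONG></P>
</content>
</page>
</site>
XML;

$xml = simplexml_load_string($source);

$texts = [];
foreach ($xml->xpath('./page/content/P') as $pElement) {
    // textContent concatenates all descendant text nodes, i.e. the "inner text".
    $text = trim(dom_import_simplexml($pElement)->textContent);
    if ($text !== '') {   // skip paragraphs that only hold an image
        $texts[] = $text;
    }
}

echo implode("<br>\n", $texts);
```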
Lastly, I'd suggest you use the PHP Simple HTML DOM Parser as it is quite easy to use and more tailored towards scraping data from HTML.

If you only want to display what is in the content node, here is your code:
<?php
$html = "";
$url = "data.xml";
$xml = simplexml_load_file($url);
$title = $xml->page->content->asXML();
$html .= "<p>$title</p>";
echo $html;
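Note that asXML() on the content element includes the <content> tags themselves. If you only want what is inside, one option is to concatenate asXML() of each child element. This is a sketch assuming content holds only element children (children() skips bare text nodes) and that the IMG tag has been self-closed; the sample document is shortened.

```php
<?php
// Shortened sample document (IMG self-closed so SimpleXML can parse it).
$source = <<<'XML'
<site><page><content><P><FONT size="2" face="Tahoma">text...</FONT></P><P align="center"><IMG alt="" src="x.jpg"/></P></content></page></site>
XML;

$xml = simplexml_load_string($source);

// children() yields only element nodes, so stray text directly inside <content> is skipped.
$inner = '';
foreach ($xml->page->content->children() as $child) {
    $inner .= $child->asXML();   // serialize each child element, tags included
}

echo "<div>$inner</div>";
```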

You have HTML inside an XML node. It needs to be XML-encoded, which is normally done with a CDATA block. You can then just use the $xml->page->content element with echo or by casting it to a string.
XML (take note of the <![CDATA[ ... ]]> part):
<?xml version="1.0" encoding="UTF-8"?>
<site>
<page>
<content><![CDATA[
<P>
<FONT size="2" face="Tahoma">
<STRONG>text...</STRONG>
</FONT>
</P>
<P>
<FONT size="2" face="Tahoma">text....</FONT>
</P>
<P align="center">
<IMG style="WIDTH: 530px" border="1" alt="" src="http://www.alkul.com/online/2014/5/6/child%20disorder.jpg">
</P>
<P>
<STRONG>
<FONT size="2" face="Tahoma">text3</FONT>
</STRONG>
</P>
<P>
<STRONG>
<FONT size="2" face="Tahoma">text1</FONT>
</STRONG>
</P>
]]></content>
</page>
</site>
PHP:
$url = "Data.xml";
$xml = simplexml_load_file($url);
$firstTenPages = new LimitIterator(new IteratorIterator($xml->page), 0, 10);
foreach ($firstTenPages as $page)
{
echo $page->content;
}
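A self-contained sketch of the CDATA approach with an inline document: casting content to a string returns the CDATA text as-is, so even the unclosed IMG survives, because the parser does not interpret markup inside CDATA. The limit is lowered to 2 here (with three made-up pages) purely so the LimitIterator cut-off is visible.

```php
<?php
// Three made-up pages; the HTML sits inside CDATA, so it need not be well-formed XML.
$source = <<<'XML'
<site>
<page><content><![CDATA[<P><FONT size="2">page one</FONT></P><IMG src="a.jpg">]]></content></page>
<page><content><![CDATA[<P>page two</P>]]></content></page>
<page><content><![CDATA[<P>page three</P>]]></content></page>
</site>
XML;

$xml = simplexml_load_string($source);

// Limit lowered to 2 (instead of 10) so the cut-off is visible with three pages.
$firstPages = new LimitIterator(new IteratorIterator($xml->page), 0, 2);

$html = '';
foreach ($firstPages as $page) {
    $html .= (string) $page->content;   // the cast returns the raw CDATA text
}

echo $html;
```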

Related

Replace spans in PHP but keep the content inside

I have the following string:
<span style="font-size: 13px;">
<span style="">
<span style="">
<span style="font-family: Roboto, sans-serif;">
<span style="">
Some text content
</span>
</span>
</span>
</span>
</span>
and I want to change this string to the following using PHP:
<span style="font-size: 13px;">
<span style="font-family: Roboto, sans-serif;">
Some text content
</span>
</span>
I have no idea how to do that, because when I use str_replace to replace the <span style="">, I don't know how to replace the matching </span> and keep the content inside. My next problem is that I don't know exactly how many <span style=""> elements are in my string. I also have more than one of these blocks in my string.
Thanks in advance for your help, and sorry if this is a stupid question; I'm still learning.
This is easily done with a proper HTML parser. PHP has DOMDocument, which parses X/HTML into a Document Object Model that you can then manipulate however you want.
The trick to solving this problem is to recursively traverse the DOM tree, visiting each node and replacing the ones you don't want. To do this, I've written a short helper method by extending DOMDocument:
$html = <<<'HTML'
<span style="font-size: 13px;">
<span style="">
<span style="">
<span style="font-family: Roboto, sans-serif;">
<span style="">
Some text content
</span>
</span>
</span>
</span>
</span>
HTML;
class MyDOMDocument extends DOMDocument {
public function walk(DOMNode $node, $skipParent = false) {
if (!$skipParent) {
yield $node;
}
if ($node->hasChildNodes()) {
foreach ($node->childNodes as $n) {
yield from $this->walk($n);
}
}
}
}
libxml_use_internal_errors(true);
$dom = new MyDOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$keep = $remove = [];
foreach ($dom->walk($dom->childNodes->item(0)) as $node) {
if ($node->nodeName !== "span") { // we only care about span nodes
continue;
}
// we'll get rid of all span nodes that have no style attribute or an empty one
if (!$node->hasAttribute("style") || !strlen($node->getAttribute("style"))) {
$remove[] = $node;
foreach($node->childNodes as $child) {
$keep[] = [$child, $node];
}
}
}
// first move each removed span's children in front of it (the parent is looked up at move time), then delete the emptied spans
foreach($keep as [$a, $b]) {
$b->parentNode->insertBefore($a, $b);
}
foreach($remove as $a) {
if ($a->parentNode) {
$a->parentNode->removeChild($a);
}
}
// Now we should have a rebuilt DOM tree with what we expect:
echo $dom->saveHTML();
Output:
<span style="font-size: 13px;">
<span style="font-family: Roboto, sans-serif;">
Some text content
</span>
</span>
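For comparison, a shorter sketch of the same unwrap step that lets DOMXPath find the spans instead of a custom walker. Since the question mentions several such blocks in one string, it wraps the input in a temporary <div> and serializes only that wrapper's children. The function name unwrapEmptySpans() is hypothetical, and treating a whitespace-only style as empty is an assumption.

```php
<?php
// Hypothetical helper: unwrap <span> elements whose style is missing or blank.
function unwrapEmptySpans(string $html): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // Temporary <div> wrapper so several top-level blocks parse as one fragment.
    $dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $spans = $xpath->query('//span[not(@style) or normalize-space(@style) = ""]');

    foreach ($spans as $span) {
        $parent = $span->parentNode;   // looked up now, after any earlier unwraps
        // Move the children in front of the span, keeping their order.
        while ($span->firstChild !== null) {
            $parent->insertBefore($span->firstChild, $span);
        }
        $parent->removeChild($span);
    }

    // Serialize the wrapper's children only, dropping the temporary <div>.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$html = '<span style="font-size: 13px;"><span style=""><span style="">'
      . '<span style="font-family: Roboto, sans-serif;"><span style="">Some text content</span></span>'
      . '</span></span></span>';

echo unwrapEmptySpans($html);
```

Because each span is unwrapped through its current parentNode, nested empty spans come out right regardless of the order in which they are processed.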
For a more general way to modify HTML documents, take a look at XSLT (Extensible Stylesheet Language Transformations). PHP has an XSLT extension (the XSLTProcessor class).
You then write an XML document containing your transformation rules:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html" indent="yes"/>
<!-- remove spans with empty styles -->
<xsl:template match="*[@style and string-length(@style) = 0]">
<xsl:apply-templates />
</xsl:template>
<!-- catch all to copy any elements that aren't matched in other templates -->
<xsl:template match="*">
<xsl:copy>
<!-- copy the attributes of the element -->
<xsl:copy-of select="@*" />
<!-- continue applying templates to all children, including text nodes -->
<xsl:apply-templates />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Then your PHP:
$sourceHtml = new DOMDocument();
$sourceHtml->load('source.html');
$xsl = new DOMDocument();
$xsl->load('transform.xsl');
$xsltProcessor = new XSLTProcessor;
$xsltProcessor->importStyleSheet($xsl); // attach the xsl rules
echo $xsltProcessor->transformToXML($sourceHtml);
$transformedHtml = $xsltProcessor->transformToDoc($sourceHtml);
$transformedHtml->saveHTMLFile('transformed.html');
XSLT is very powerful for this kind of thing: you can set all sorts of rules based on parent/sibling relationships and modify attributes and content accordingly.
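A self-contained sketch of the XSLT route, assuming the xsl extension is installed and the fragment is well-formed XML with a single root element. It keeps the stylesheet in a string (loadXML() instead of load()) and uses the common identity template (@*|node()), so attributes and text nodes such as "Some text content" are copied too. The helper name stripEmptySpans() is hypothetical.

```php
<?php
// Requires the xsl extension (XSLTProcessor). Hypothetical helper: stripEmptySpans().
function stripEmptySpans(string $fragment): string
{
    $xslSource = <<<'XSL'
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>
  <!-- unwrap spans whose style attribute is present but blank -->
  <xsl:template match="span[@style and string-length(normalize-space(@style)) = 0]">
    <xsl:apply-templates/>
  </xsl:template>
  <!-- identity template: copy everything else, including attributes and text -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
XSL;

    $source = new DOMDocument();
    $source->loadXML($fragment);          // the fragment must be well-formed XML

    $xsl = new DOMDocument();
    $xsl->loadXML($xslSource);

    $processor = new XSLTProcessor();
    $processor->importStylesheet($xsl);   // attach the rules
    return (string) $processor->transformToXML($source);
}

$html = '<span style="font-size: 13px;"><span style=""><span style="font-family: Roboto, sans-serif;">'
      . '<span style="">Some text content</span></span></span></span>';

// Only run the demo when ext-xsl is loaded.
if (class_exists('XSLTProcessor')) {
    echo stripEmptySpans($html);
}
```

The span template's predicate gives it a higher default priority (0.5) than the identity template's node() and @* alternatives (-0.5), so it wins for matching spans.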

How can I print XML node values in order with PHP?

I have XML like this:
<description>
<heading id="01"> Math </heading>
<p id="01"> Text 1 </p>
<heading id="02"> History </heading>
<p id="02"> Text 2</p>
<p id="03"> Text 3</p>
<heading id="03"> Biology </heading>
<p id="04"> Text 4 </p>
</description>
I also have many XML files with this structure; they differ only in the number of <p> nodes under each <heading> node.
How can I print the first <heading> followed by its <p> nodes, then the second heading, and so on?
I tried to use foreach, but it doesn't work.
My code:
<?php
$xml=simplexml_load_file("NWB2.xml") or die("Error: Cannot create object");
echo "<b>".$xml->{'description'}->{'heading'}."</b>";
echo "<p>".$xml->{'description'}->{'p'}."</p>";
?>
If it's just a case of applying a different style depending on the tag, you can use a foreach() loop and output each value according to its tag name:
$xml=simplexml_load_file("NWB2.xml") or die("Error: Cannot create object");
foreach ( $xml as $type => $value ) {
if ($type == "p") {
echo $value;
}
else {
echo "<b>$value</b>";
}
}
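If you also want each <p> grouped under its heading (for example, to build sections), here is a sketch that walks the children in document order and attaches every <p> to the most recent <heading>. It assumes heading texts are unique, since they are used as array keys; the output markup is only an example.

```php
<?php
$source = <<<'XML'
<description>
<heading id="01"> Math </heading>
<p id="01"> Text 1 </p>
<heading id="02"> History </heading>
<p id="02"> Text 2</p>
<p id="03"> Text 3</p>
<heading id="03"> Biology </heading>
<p id="04"> Text 4 </p>
</description>
XML;

$xml = simplexml_load_string($source);

// Walk the children in document order; each <p> belongs to the most recent <heading>.
$sections = [];
$current = null;
foreach ($xml->children() as $name => $node) {
    $text = trim((string) $node);
    if ($name === 'heading') {
        $current = $text;
        $sections[$current] = [];
    } elseif ($name === 'p' && $current !== null) {
        $sections[$current][] = $text;
    }
}

foreach ($sections as $heading => $paragraphs) {
    echo '<b>' . htmlspecialchars($heading) . "</b>\n";
    foreach ($paragraphs as $paragraph) {
        echo '<p>' . htmlspecialchars($paragraph) . "</p>\n";
    }
}
```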

Get img tag details using preg_replace [PHP]

This is my content:
<p><img src="http://localhost/contents/uploads/2017/11/1.jpg" width="215" height="1515"></p>
This is my PHP code:
function convert_the_content($content){
$content = preg_replace('/<p><img.+src=[\'"]([^\'"]+)[\'"].*>/i', "<p class=\"uploaded-img\"><img class=\"lazy-load\" data-src=\"$1\" /></p>", $content);
return $content;
}
I am using this code to add a class to the <p> and <img> tags and to convert src="" to data-src="".
The problem is that my code removes the width and height attributes from the <img> tag. How can I change my code so that it keeps these attributes too?
NOTE: My content may contain many <img> and <p> tags.
If you only have this exact HTML snippet, you can do it more simply with str_replace:
$html = <<< HTML
<p><img src="http://localhost/contents/uploads/2017/11/1.jpg" width="215" height="1515"></p>
HTML;
$html = str_replace('<p>', '<p class="foo">', $html);
$html = str_replace(' src=', ' data-src=', $html);
echo $html;
This will output
<p class="foo"><img data-src="http://localhost/contents/uploads/2017/11/1.jpg" width="215" height="1515"></p>
If you are trying to convert arbitrary HTML, consider using a DOM Parser instead:
<?php
$html = <<< HTML
<html>
<body>
<p><img src="http://localhost/contents/uploads/2017/11/1.jpg" width="215" height="1515"></p>
<p><img width="215" height="1515" src="http://localhost/contents/uploads/2017/11/1.png"></p>
<p ><img
class="blah"
height="1515"
width="215"
src="http://localhost/contents/uploads/2017/11/1.png"
>
</p>
</body>
</html>
HTML;
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors(false);
$xpath = new DOMXPath($dom);
foreach ($xpath->evaluate('//p[img]') as $paragraphWithImage) {
$paragraphWithImage->setAttribute('class', 'foo');
foreach ($paragraphWithImage->getElementsByTagName('img') as $image) {
$image->setAttribute('class', trim('bar ' . $image->getAttribute('class')));
$image->setAttribute('data-src', $image->getAttribute('src'));
$image->removeAttribute('src');
}
}
echo $dom->saveHTML($dom->documentElement);
Output:
<html><body>
<p class="foo"><img width="215" height="1515" class="bar" data-src="http://localhost/contents/uploads/2017/11/1.jpg"></p>
<p class="foo"><img width="215" height="1515" class="bar" data-src="http://localhost/contents/uploads/2017/11/1.png"></p>
<p class="foo"><img class="bar blah" height="1515" width="215" data-src="http://localhost/contents/uploads/2017/11/1.png"></p>
</body></html>
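The same DOM approach, sketched for the question's case where the content is a fragment rather than a full page: a temporary <div> wrapper avoids the added <html><body>, and only the wrapper's children are serialized. The class names uploaded-img and lazy-load are taken from the question; the function name lazyLoadImages() is hypothetical.

```php
<?php
// Hypothetical helper: lazyLoadImages(); class names taken from the question.
function lazyLoadImages(string $content): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // Temporary wrapper so a fragment with several <p> tags parses without <html>/<body>.
    $dom->loadHTML('<div>' . $content . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//p[img]') as $paragraph) {
        $paragraph->setAttribute('class', 'uploaded-img');
        foreach ($paragraph->getElementsByTagName('img') as $image) {
            // Keep existing classes and move src to data-src; width/height are left untouched.
            $image->setAttribute('class', trim('lazy-load ' . $image->getAttribute('class')));
            $image->setAttribute('data-src', $image->getAttribute('src'));
            $image->removeAttribute('src');
        }
    }

    // Serialize only the wrapper's children, dropping the temporary <div>.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo lazyLoadImages('<p><img src="http://localhost/contents/uploads/2017/11/1.jpg" width="215" height="1515"></p>');
```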

Regex - Replacing content - eZ Publish XML field

I have XML content that I want to modify before using the eZ Publish 5 API to create it.
I am trying to write a regex to modify the content.
Here is the XML code that I have (with HTML entities):
Screenshot of the XML code: http://img15.hostingpics.net/pics/453268xmlcode.jpg
I want to match empty.jpg in:
<img alt="" src="http://www.asite.org/empty.jpg" />
And replace the whole line for each occurrence with:
<custom name="my_checkbox"></custom>
Problem:
The img tag can sometimes contain other attributes, like height="15" width="12":
<img height="15" alt="" width="12" src="http://www.asite.org/empty.jpg" />
And sometimes the attributes come after the src attribute, in a different order.
The aim would be:
Screenshot of the desired XML: http://img15.hostingpics.net/pics/318980xmlcodeaim.jpg
I've tried many things so far, but nothing has worked.
Thanks in advance for helping.
Cheers!
EDIT:
Here is an example of what I've tried so far:
/(<img [a-z = ""]* src="http:\/\/www\.asite\.org\/empty\.jpg" \/&gt)/g
Since we're dealing with XML, I've used an XML parser to reach the desired section.
Then we can apply a regex (~<img.*?>(?=</span)~) to select the image tag and replace it with your custom tag (note that in the object returned by the XML parser, the HTML entities are replaced with their equivalent characters).
Here is a piece of code that emulates and handles your situation:
<?php
$xmlstr = <<<XML
<sections>
<section>
<paragraph>
<literal class="html">
&lt;img alt=&quot;&quot; src=&quot;http://asite.org/empty.png&quot; /&gt;&lt;/span&gt;&lt;/span&gt; Yes/no&amp;nbsp;&lt;br /&gt;
&lt;img alt=&quot;&quot; src=&quot;http://asite.org/empty.png&quot; /&gt;&lt;/span&gt;&lt;/span&gt; Other text/no&amp;nbsp;&lt;br /&gt;
</literal>
</paragraph>
</section>
</sections>
XML;
$sections = new SimpleXMLElement($xmlstr);
foreach ($sections->section->paragraph as $paragraph) {
$re = "~<img.*?>(?=</span)~";
$subst = "<custom name=\"my_checkbox\"></custom>";
$paragraph->literal = preg_replace($re, $subst, $paragraph->literal);
}
echo $sections->asXML();
?>
The output is:
<?xml version="1.0"?>
<sections>
<section>
<paragraph>
<literal class="html">
&lt;custom name="my_checkbox"&gt;&lt;/custom&gt;&lt;/span&gt;&lt;/span&gt; Yes/no&amp;nbsp;&lt;br /&gt;
&lt;custom name="my_checkbox"&gt;&lt;/custom&gt;&lt;/span&gt;&lt;/span&gt; Other text/no&amp;nbsp;&lt;br /&gt;
</literal>
</paragraph>
</section>
</sections>
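The regex above relies on the image being followed by </span>. If you want to target empty.jpg specifically, wherever src sits among the other attributes, one option is to let [^>]* absorb attributes on both sides of src, since it cannot run past the end of the tag. This sketch assumes the text has already been decoded by the XML parser as described above and that attribute values are quoted; replaceEmptyImages() is a hypothetical name.

```php
<?php
// Hypothetical helper: replace every <img> whose src ends in /empty.jpg, whatever the attribute order.
function replaceEmptyImages(string $html): string
{
    // [^>]* stays inside one tag, so attributes before and after src are both allowed;
    // \1 requires the closing quote to match the opening one.
    $pattern = '~<img\b[^>]*\bsrc\s*=\s*(["\'])[^"\']*/empty\.jpg\1[^>]*>~i';
    return preg_replace($pattern, '<custom name="my_checkbox"></custom>', $html);
}

$text = '<img alt="" src="http://www.asite.org/empty.jpg" /> Yes/no<br />'
      . '<img height="15" alt="" width="12" src="http://www.asite.org/empty.jpg" /> Other<br />'
      . '<img src="http://www.asite.org/photo.jpg" alt="" />';

echo replaceEmptyImages($text);
```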

XPath preserving line breaks and other HTML tags

Below is the source of the HTML page:
<h3>Background</h3>
<p>Example 1<br>Example 2<br> </br> <ul></li>ABC<li></ul>
</p>
<h3>Job Description</h3>
<p>content of job description</p>
This is the XPath query:
//node()[preceding::h3[text()="Background"] and following-sibling::h3[text()="Job Description"]]
I need this output:
<p>Example 1<br>Example 2<br> </br> <ul></li>ABC<li></ul>
</p>
With Simple HTML DOM (simple_html_dom), you would need to do something like:
$html = str_get_html($str);
foreach($html->find('h3') as $h3){
if($h3->text() == 'Background'){
echo $h3->next_sibling();
}
}
// <p>Example 1<br>Example 2<br> </br> <ul></li>ABC<li></ul> </p>
You can't get there with DOM or XPath because the HTML is too invalid (ul elements inside p elements).
This line fixed the code. It now preserves the <br> and <li> tags.
//node()[preceding::h3[text()="Background"] and following-sibling::h3[text()="Job Description"]]/node()
I have added /node() at the end of the string.
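A minimal sketch of running that query with DOMXPath, on a simplified and valid version of the page (libxml may restructure the invalid original unpredictably). The point it illustrates is that serializing each matched node with saveHTML() keeps tags like <br>, whereas nodeValue or textContent would return only the text.

```php
<?php
// Simplified, valid version of the page from the question.
$source = <<<'HTML'
<h3>Background</h3>
<p>Example 1<br>Example 2</p>
<h3>Job Description</h3>
<p>content of job description</p>
HTML;

$dom = new DOMDocument();
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($source);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
$query = '//node()[preceding::h3[text()="Background"] and following-sibling::h3[text()="Job Description"]]/node()';

// saveHTML($node) keeps the markup (<br> etc.); nodeValue/textContent would drop it.
$fragment = '';
foreach ($xpath->query($query) as $node) {
    $fragment .= $dom->saveHTML($node);
}

echo $fragment;
```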
