PHP DOMDocument - Get outer node and protect <![CDATA[]]> blocks as string

I have an XML file, and some of its nodes contain a CDATA block like this:
<item>
<content>OneWord</content>
</item>
<item>
<content><![CDATA[Some Text or Serialized arrays]]></content>
</item>
I tried to get the outer node as below:
$file = 'file.xml';
$contents = file_get_contents( $file );
$dom = new DOMDocument( '1.0', 'utf-8' );
$dom->loadXML( $contents, LIBXML_NOCDATA );
$xpath = new DOMXPath( $dom );
// -- get outer
$item = $xpath->query( './item' )->item(1);
$str = $dom->saveXML($item);
var_dump($str);
It prints the item node without the CDATA block, but I want the node to keep its CDATA blocks.
Thanks

Is it not as simple as removing the LIBXML_NOCDATA option ("Merge CDATA as text nodes")?
For me,
$dom = new DOMDocument( '1.0', 'utf-8' );
$dom->loadXML( $contents );
$xpath = new DOMXPath( $dom );
// -- get outer
$item = $xpath->query( './item' )->item(1);
$str = $dom->saveXML($item);
var_dump($str);
outputs
string '<item>
<content><![CDATA[Some Text or Serialized arrays]]></content>
</item>' (length=78)
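To make the effect of the flag concrete, here is a minimal sketch that loads the same string twice, once with the default options and once with LIBXML_NOCDATA. The <items> wrapper element and the variable names are assumptions added so that the sample is well-formed and self-contained.

```php
<?php
// Sample document; the <items> wrapper is an assumption so the XML is well-formed.
$contents = '<items><item><content>OneWord</content></item>'
          . '<item><content><![CDATA[Some Text or Serialized arrays]]></content></item></items>';

// Default load: the CDATA section stays a separate CDATA node in the tree.
$keep = new DOMDocument();
$keep->loadXML($contents);
$withCdata = $keep->saveXML($keep->getElementsByTagName('item')->item(1));

// LIBXML_NOCDATA: the parser merges CDATA sections into ordinary text nodes.
$merge = new DOMDocument();
$merge->loadXML($contents, LIBXML_NOCDATA);
$withoutCdata = $merge->saveXML($merge->getElementsByTagName('item')->item(1));

echo $withCdata, PHP_EOL;    // <item><content><![CDATA[Some Text or Serialized arrays]]></content></item>
echo $withoutCdata, PHP_EOL; // <item><content>Some Text or Serialized arrays</content></item>
```

Once the section has been merged at parse time, the original CDATA boundaries are gone from the tree, so the flag has to be dropped at load time rather than worked around when saving.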

Related

Xml Append/Add new node php

I'm trying to update an XML file by adding a node to it. Currently I can create and overwrite the file with my new node, but what I need is to add the new node to the existing file (.xml), and I am exhausted. I am new to PHP. I've tried every piece of code on this site, and my current code, shown here, still can't add the node. Please help.
$doc = new DOMDocument;
// Load the XML
///$doc->loadXML("<root/>");
//---- ///$xml = new Document;
///$xml ->loadXML($xml);
//$xml = simplexml_load_file("pole.xml");
$title = $_POST["title"];
$xml = <<<XML
<item>
<title>$title</title>
</item>
XML;
$xml = new Document;
$xml ->loadXML($xml);
$xml ->appendXML($xml);
$xml = new SimpleXMLElement($xml);
echo $xml->saveXML('pole.xml');
I can offer no advice on SimpleXML, but since the code above attempts to use DOMDocument, perhaps the following simple example will be of use.
$filename='pole.xml';
# Stage 1
# -------
// Create an instance of DOMDocument and then
// generate whatever XML you need using DOMDocument
// and save.
libxml_use_internal_errors( true );
$dom=new DOMDocument('1.0','utf-8');
$dom->formatOutput=true;
$dom->preserveWhiteSpace=true;
$root=$dom->createElement('Root');
$dom->appendChild( $root );
$item=$dom->createElement('Item');
$title=$dom->createElement('title','Hello World');
$item->appendChild( $title );
$root->appendChild( $item );
$dom->save( $filename );
$dom=null;
This yields the following XML:
<?xml version="1.0" encoding="utf-8"?>
<Root>
  <Item>
    <title>Hello World</title>
  </Item>
</Root>
To then modify the XML file you have created or downloaded etc:
# Stage 2
# -------
// A new instance of DOMDocument is NOT strictly necessary here
// if you are continuing to work with the generated XML but for the purposes
// of this example assume stage 1 and stage 2 are done in isolation.
// Find the ROOT node of the document and then add some more data...
// This simply adds two new simple nodes that have various attributes
// but could be considerably more complex in structure.
$dom=new DOMDocument;
$dom->formatOutput=true;
$dom->preserveWhiteSpace=false;
$dom->load( $filename );
# Find the Root node... !!important!!
$root=$dom->getElementsByTagName('Root')->item(0);
# add a new node
$item=$dom->createElement('Banana','Adored by monkeys');
$attributes=array(
    'Origin' => 'Central America',
    'Type' => 'Berry',
    'Genus' => 'Musa'
);
foreach( $attributes as $attr => $value ){
    $attr=$dom->createAttribute( $attr );
    $attr->value=$value;
    $item->appendChild( $attr );
}
#ensure that you add the new node to the dom
$root->appendChild( $item );
#new node
$item=$dom->createElement('Monkey','Enemies of Bananas');
$attributes=array(
    'Phylum' => 'Chordata',
    'Class' => 'Mammalia',
    'Order' => 'Primates'
);
foreach( $attributes as $attr => $value ){
    $attr=$dom->createAttribute( $attr );
    $attr->value=$value;
    $item->appendChild( $attr );
}
$root->appendChild( $item );
$dom->save( $filename );
$dom=null;
This modifies the XML file and yields the following:
<?xml version="1.0" encoding="utf-8"?>
<Root>
  <Item>
    <title>Hello World</title>
  </Item>
  <Banana Origin="Central America" Type="Berry" Genus="Musa">Adored by monkeys</Banana>
  <Monkey Phylum="Chordata" Class="Mammalia" Order="Primates">Enemies of Bananas</Monkey>
</Root>
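Applied to the original question (adding an item with a user-supplied title to an existing file), the same approach might look like the sketch below. The function name appendItem is hypothetical, and the item/title element names are taken from the question's heredoc; the root element is reached through documentElement so its name does not matter.

```php
<?php
// Hypothetical helper: appends <item><title>...</title></item> to the root of an existing file.
function appendItem(string $filename, string $title): void
{
    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false; // lets formatOutput re-indent the whole tree
    $dom->formatOutput = true;
    if (!$dom->load($filename)) {
        throw new RuntimeException("Cannot load $filename");
    }

    $item = $dom->createElement('item');
    $titleNode = $dom->createElement('title');
    // createTextNode handles a literal & safely, whereas createElement's
    // second argument treats & as the start of an entity reference.
    $titleNode->appendChild($dom->createTextNode($title));
    $item->appendChild($titleNode);

    // documentElement is the root element, whatever its name.
    $dom->documentElement->appendChild($item);
    $dom->save($filename);
}
```

It could then be called as appendItem('pole.xml', $_POST['title']), assuming pole.xml already exists and contains a root element.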

PHP XML parser CDATA keyword missing after parse

I have the following XML, and I want to read the value inside the "content" tag.
"<?xml version='1.0' encoding='ISO-8859-1'?>
<ad modelVersion='0.9'>
<richmediaAd>
<content>
<![CDATA[<script src=\"mraid.js\"></script>
<div class=\"celtra-ad-v3\">
<img src=\"data: image/png, celtra\" style=\"display: none\"onerror=\"(function(img){ varparams={ 'channelId': '45f3f23c','clickUrl': 'http%3a%2f%2fexamplehost.com%3a53766%2fCloudMobRTBWeb%2fClickThroughHandler.ashx%3fadid%3de6983c95-9292-4e16-967d-149e2e77dece%26cid%3d352%26crid%3d850'};varreq=document.createElement('script');req.id=params.scriptId='celtra-script-'+(window.celtraScriptIndex=(window.celtraScriptIndex||0)+1);params.clientTimestamp=newDate/1000;req.src=(window.location.protocol=='https: '?'https': 'http')+': //ads.celtra.com/e7f5ce18/mraid-ad.js?';for(varkinparams){req.src+='&'+encodeURIComponent(k)+'='+encodeURIComponent(params[ k ]); }img.parentNode.insertBefore(req, img.nextSibling);})(this);\"/>
</div>]]>
</content>
<width>320</width>
<height>50</height>
</richmediaAd>
</ad>"
I tried 2 methods (SimpleXML and DOM). I managed to get the value but found the keyword "CDATA" missing. What I got inside "content" tag was:
<script src="mraid.js"></script>
<div class="celtra-ad-v3">
<img src="data: image/png, celtra" style="display: none"onerror="(function(img){ varparams={ 'channelId': '45f3f23c','clickUrl': 'http%3a%2f%2fexamplehost.com%3a53766%2fCloudMobRTBWeb%2fClickThroughHandler.ashx%3fadid%3de6983c95-9292-4e16-967d-149e2e77dece%26cid%3d352%26crid%3d850'};varreq=document.createElement('script');req.id=params.scriptId='celtra-script-'+(window.celtraScriptIndex=(window.celtraScriptIndex||0)+1);params.clientTimestamp=newDate/1000;req.src=(window.location.protocol=='https: '?'https': 'http')+': //ads.celtra.com/e7f5ce18/mraid-ad.js?';for(varkinparams){req.src+='&'+encodeURIComponent(k)+'='+encodeURIComponent(params[ k ]); }img.parentNode.insertBefore(req, img.nextSibling);})(this);"/>
</div>
I know the parser was trying to sort of "beautify" the XML by removing CDATA. But what I want is just the raw data with "CDATA" tag in it. Is there any way to achieve this?
Appreciate your help.
Below are my two methods for your reference:
Method 1:
$type = simplexml_load_string($response['adm']) or die("Error: Cannot create object");
$data = $type->richmediaAd[0]->content;
Yii::warning((string) $data);
Yii::warning(strpos($data, 'CDATA'));
Method 2:
$doc = new \DOMDocument();
$doc->loadXML($response['adm']);
$richmediaAds = ($doc->getElementsByTagName("richmediaAd"));
foreach($richmediaAds as $richmediaAd){
    $contents = $richmediaAd->getElementsByTagName("content");
    foreach($contents as $content){
        Yii::warning($content->nodeValue);
    }
}
I'll improve this if I can, but you can explicitly target the "CDATA section" node of your content element and pass that node to $doc->saveXML( $node ) to get its exact XML, including the CDATA wrapper.
$doc = new \DOMDocument();
$doc->loadXML( $xml );
$xpath = new \DOMXPath( $doc );
$nodes = $xpath->query( '/ad/richmediaAd/content');
foreach( $nodes[0]->childNodes as $node )
{
    if( $node->nodeType === XML_CDATA_SECTION_NODE )
    {
        echo $doc->saveXML( $node ); // string content
    }
}
Edit: You may wish to add a fallback in case no CDATA section is found.
Without XPath:
$doc = new \DOMDocument();
$doc->loadXML( $xml );
$doc->normalize();
foreach( $doc->getElementsByTagName('content')->item(0)->childNodes as $node )
{
    if( $node->nodeType === XML_CDATA_SECTION_NODE )
    {
        echo $doc->saveXML( $node ); // string content
    }
}
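The fallback mentioned in the edit could be sketched roughly as follows. The helper name contentAsString and the shortened sample XML are assumptions for illustration; the idea is to return the CDATA markup when a CDATA section exists and the plain text content otherwise.

```php
<?php
// Hypothetical helper: returns the CDATA markup of $element if it has a CDATA child,
// otherwise falls back to the element's plain text content.
function contentAsString(DOMElement $element): string
{
    $doc = $element->ownerDocument;
    foreach ($element->childNodes as $node) {
        if ($node->nodeType === XML_CDATA_SECTION_NODE) {
            return $doc->saveXML($node); // includes the <![CDATA[ ... ]]> wrapper
        }
    }
    // Fallback: no CDATA section was found.
    return $element->textContent;
}

// Shortened sample document, assumed for illustration.
$doc = new DOMDocument();
$doc->loadXML('<ad><richmediaAd><content><![CDATA[<div>hi</div>]]></content><width>320</width></richmediaAd></ad>');
$content = $doc->getElementsByTagName('content')->item(0);
$width = $doc->getElementsByTagName('width')->item(0);

echo contentAsString($content), PHP_EOL; // <![CDATA[<div>hi</div>]]>
echo contentAsString($width), PHP_EOL;   // 320
```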

Creating PHP -> XML with special character '&'

I'm trying to create an XML from PHP with special characters.
$xml = new SimpleXMLElement('<?xml version="1.0" encoding="UTF-8" ' . 'standalone="yes"?><Root/>');
$xml->addChild('NAME', $variable);
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadXML($xmlAusgabe = $xml->asXML());
$dom->save('../test.xml');
When there is a special character like '&' in it the output is empty.
I thought these characters are available in UTF-8.
Can someone help me?
Embedding XML always seems to cause problems :( Try this:
Replace
$dom->loadXML($xmlAusgabe = $xml->asXML());
with
$xmlAusgabe = $xml->asXML();
$xmlAusgabe = mb_convert_encoding( $xmlAusgabe, 'HTML-ENTITIES', 'UTF-8') ;
$dom->loadXML( $xmlAusgabe );
Then check that both encoding and fileencoding are set to UTF-8. If you use the Vim editor:
set encoding=utf-8
set fileencoding=utf-8
UPDATE
libxml_use_internal_errors( true ); // collect parse errors so libxml_get_errors() can return them
$xml = new SimpleXMLElement('<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Root/>');
$xml->addChild('NAME', mb_convert_encoding( $variable, 'HTML-ENTITIES', 'UTF-8') );
$xmlAusgabe = $xml->asXML();
$dom = new DOMDocument();
$dom->loadXML( $xmlAusgabe );
$dom->save('../test.xml');
I checked this code and it runs. To print any parse errors, use:
print_r( libxml_get_errors() );
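As far as I know, the 'HTML-ENTITIES' conversion only rewrites non-ASCII characters and leaves a bare & untouched (it is also deprecated as of PHP 8.2), so it may not help with the character from the question. A commonly used alternative, sketched here with an assumed sample value for $variable, is to escape the value with htmlspecialchars() before passing it to addChild():

```php
<?php
$variable = 'Fish & Chips'; // sample value, assumed for illustration

$xml = new SimpleXMLElement('<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Root/>');
// addChild() does not escape a bare "&", so escape the value first.
$xml->addChild('NAME', htmlspecialchars($variable, ENT_XML1 | ENT_NOQUOTES, 'UTF-8'));

$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadXML($xml->asXML());

$output = $dom->saveXML(); // use $dom->save('../test.xml') to write the file instead
echo $output;
```

The saved document then contains <NAME>Fish &amp; Chips</NAME>, which any XML parser reads back as "Fish & Chips".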

Replace content of node using PHP and XPath

I have a string of 'source html' and a string of 'replacement html'. In the 'source html' I want to look for a node with a specific class and replace its content with my 'replacement html'. I have tried using the replaceChild method, but this seems to require that I traverse a level up (parentNode).
This doesn't work
$dom = new DOMDocument;
$dom->loadXml($sourceHTML);
$replacement = $dom->createDocumentFragment();
$replacement->appendXML($replacementHTML);
$xpath = new DOMXPath($dom);
$oldNode = $xpath->query('//div[contains(@class,"arrangement--index__field-dato")]')->item(0);
$oldNode->replaceChild($replacement, $oldNode);
This works, but it's not the content which is being replaced
$dom = new DOMDocument;
$dom->loadXml($sourceHTML);
$replacement = $dom->createDocumentFragment();
$replacement->appendXML($replacementHTML);
$xpath = new DOMXPath($dom);
$oldNode = $xpath->query('//div[contains(@class,"arrangement--index__field-dato")]')->item(0);
$oldNode->parentNode->replaceChild($replacement, $oldNode);
How do I replace the content or the node I have queried for?
Instead of replacing the node itself, remove its children and insert the new content as a child node. Because childNodes is a live list, removing nodes inside a foreach over it skips some of them, so loop on firstChild instead. Something like
while ($oldNode->firstChild)
    $oldNode->removeChild($oldNode->firstChild);
$oldNode->appendChild($replacement);
This will replace the contents (children) instead of the node itself.
This seems to work!
$dom = new DOMDocument;
$dom->loadXml($sourceHTML);
$replacement = $dom->createDocumentFragment();
$replacement->appendXML($replacementHTML);
$xpath = new DOMXPath($dom);
$oldNode = $xpath->query('//div[contains(@class,"arrangement--index__field-dato")]')->item(0);
$oldNode->removeChild($oldNode->firstChild);
$oldNode->appendChild($replacement);
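Note that removing only firstChild is enough when the target node has a single child. If it can have several (for example an element followed by text), a more general sketch would empty the node completely before appending the fragment. The sample markup below is an assumption for illustration.

```php
<?php
// Sample input, assumed for illustration: the target div has two children.
$sourceHTML = '<div><div class="arrangement--index__field-dato"><span>old</span> text</div></div>';
$replacementHTML = '<b>new</b>';

$dom = new DOMDocument();
$dom->loadXML($sourceHTML);
$xpath = new DOMXPath($dom);
$oldNode = $xpath->query('//div[contains(@class, "arrangement--index__field-dato")]')->item(0);

// Remove every child; looping on firstChild avoids skipping nodes in the live childNodes list.
while ($oldNode->firstChild) {
    $oldNode->removeChild($oldNode->firstChild);
}

// Parse the replacement markup into a fragment and append it to the now-empty node.
$replacement = $dom->createDocumentFragment();
$replacement->appendXML($replacementHTML);
$oldNode->appendChild($replacement);

$result = $dom->saveXML($oldNode);
echo $result; // <div class="arrangement--index__field-dato"><b>new</b></div>
```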

How to extract a node attribute from XML using PHP's DOM Parser

I've never really used the DOM parser before and now I have a question.
How would I go about extracting the URL from this markup:
<files>
<file path="http://www.thesite.com/download/eysjkss.zip" title="File Name" />
</files>
Using SimpleXML:
$xml = new SimpleXMLElement($xmlstr);
echo $xml->file['path']."\n";
Output:
http://www.thesite.com/download/eysjkss.zip
To do it with DOM you do
$dom = new DOMDocument;
$dom->load( 'file.xml' );
foreach( $dom->getElementsByTagName( 'file' ) as $file ) {
    echo $file->getAttribute( 'path' );
}
You can also do it with XPath:
$dom = new DOMDocument;
$dom->load( 'file.xml' );
$xPath = new DOMXPath( $dom );
foreach( $xPath->evaluate( '/files/file/@path' ) as $path ) {
    echo $path->nodeValue;
}
Or as a string value directly:
$dom = new DOMDocument;
$dom->load( 'file.xml' );
$xPath = new DOMXPath( $dom );
echo $xPath->evaluate( 'string(/files/file/@path)' );
You can fetch individual nodes also by traversing the DOM manually
$dom = new DOMDocument;
$dom->preserveWhiteSpace = FALSE;
$dom->load( 'file.xml' );
echo $dom->documentElement->firstChild->getAttribute( 'path' );
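The preserveWhiteSpace = FALSE line matters in the manual traversal: with the default setting, the first child of the root element is the whitespace text node before the file element, and text nodes have no getAttribute() method. A small sketch of a traversal that does not depend on that setting, assuming the markup is held in a string named $xmlstr:

```php
<?php
// Same markup as in the question, kept in a string with its line breaks.
$xmlstr = "<files>\n  <file path=\"http://www.thesite.com/download/eysjkss.zip\" title=\"File Name\" />\n</files>";

// With whitespace preserved (the default), firstChild is the text node holding "\n  ".
$dom = new DOMDocument();
$dom->loadXML($xmlstr);
$first = $dom->documentElement->firstChild;
$isText = $first->nodeType === XML_TEXT_NODE; // true

// Skipping non-element nodes works regardless of preserveWhiteSpace.
$path = null;
foreach ($dom->documentElement->childNodes as $child) {
    if ($child instanceof DOMElement) {
        $path = $child->getAttribute('path');
        break;
    }
}
echo $path, PHP_EOL; // http://www.thesite.com/download/eysjkss.zip
```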
Marking this CW, because this has been answered multiple times before (just with different elements), including by me, but I am too lazy to find the duplicate.
You can use PHP Simple HTML DOM Parser, which is a PHP library: http://simplehtmldom.sourceforge.net/
