PHP XML parser CDATA keyword missing after parse - php

I have the following XML and want to read the value inside the "content" tag.
"<?xml version='1.0' encoding='ISO-8859-1'?>
<ad modelVersion='0.9'>
<richmediaAd>
<content>
<![CDATA[<script src=\"mraid.js\"></script>
<div class=\"celtra-ad-v3\">
<img src=\"data:image/png,celtra\" style=\"display: none\" onerror=\"(function(img){ var params={ 'channelId': '45f3f23c','clickUrl': 'http%3a%2f%2fexamplehost.com%3a53766%2fCloudMobRTBWeb%2fClickThroughHandler.ashx%3fadid%3de6983c95-9292-4e16-967d-149e2e77dece%26cid%3d352%26crid%3d850'};var req=document.createElement('script');req.id=params.scriptId='celtra-script-'+(window.celtraScriptIndex=(window.celtraScriptIndex||0)+1);params.clientTimestamp=new Date/1000;req.src=(window.location.protocol=='https:'?'https':'http')+'://ads.celtra.com/e7f5ce18/mraid-ad.js?';for(var k in params){req.src+='&'+encodeURIComponent(k)+'='+encodeURIComponent(params[ k ]); }img.parentNode.insertBefore(req, img.nextSibling);})(this);\"/>
</div>]]>
</content>
<width>320</width>
<height>50</height>
</richmediaAd>
</ad>"
I tried two methods (SimpleXML and DOM). Both return the value, but the "CDATA" wrapper is missing. This is what I got from the "content" tag:
<script src="mraid.js"></script>
<div class="celtra-ad-v3">
<img src="data:image/png,celtra" style="display: none" onerror="(function(img){ var params={ 'channelId': '45f3f23c','clickUrl': 'http%3a%2f%2fexamplehost.com%3a53766%2fCloudMobRTBWeb%2fClickThroughHandler.ashx%3fadid%3de6983c95-9292-4e16-967d-149e2e77dece%26cid%3d352%26crid%3d850'};var req=document.createElement('script');req.id=params.scriptId='celtra-script-'+(window.celtraScriptIndex=(window.celtraScriptIndex||0)+1);params.clientTimestamp=new Date/1000;req.src=(window.location.protocol=='https:'?'https':'http')+'://ads.celtra.com/e7f5ce18/mraid-ad.js?';for(var k in params){req.src+='&'+encodeURIComponent(k)+'='+encodeURIComponent(params[ k ]); }img.parentNode.insertBefore(req, img.nextSibling);})(this);"/>
</div>
I understand that the parser "beautifies" the XML by stripping the CDATA wrapper and returning only its contents. What I want is the raw data, including the "CDATA" markers. Is there any way to achieve this?
I appreciate your help.
Below are my two methods for reference:
Method 1:
$type = simplexml_load_string($response['adm']) or die("Error: Cannot create object");
$data = $type->richmediaAd[0]->content;
Yii::warning((string) $data);
Yii::warning(strpos($data, 'CDATA'));
Method 2:
$doc = new \DOMDocument();
$doc->loadXML($response['adm']);
$richmediaAds = ($doc->getElementsByTagName("richmediaAd"));
foreach($richmediaAds as $richmediaAd){
$contents = $richmediaAd->getElementsByTagName("content");
foreach($contents as $content){
Yii::warning($content->nodeValue);
}
}

You can explicitly target the CDATA section node inside your content element and pass it to $doc->saveXML( $node ) to get that node's exact XML, including the CDATA wrapper.
$doc = new \DOMDocument();
$doc->loadXML( $xml );
$xpath = new \DOMXPath( $doc );
$nodes = $xpath->query( '/ad/richmediaAd/content');
foreach( $nodes[0]->childNodes as $node )
{
if( $node->nodeType === XML_CDATA_SECTION_NODE )
{
echo $doc->saveXML( $node ); // string content
}
}
Edit: You may wish to add a fallback for the case where no CDATA section is found.
Without XPath:
$doc = new \DOMDocument();
$doc->loadXML( $xml );
$doc->normalize();
foreach( $doc->getElementsByTagName('content')->item(0)->childNodes as $node )
{
if( $node->nodeType === XML_CDATA_SECTION_NODE )
{
echo $doc->saveXML( $node ); // string content
}
}
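The fallback mentioned in the edit could be sketched roughly as follows. This is only an illustration under stated assumptions: rawContent() is a hypothetical helper name, and returning the trimmed text value when no CDATA section exists is one possible choice, not something the question prescribes.

```php
<?php
// Hypothetical helper: returns the serialized CDATA section(s) of the first
// <content> element, or its plain text value when no CDATA node exists.
function rawContent(string $xml): ?string
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        return null; // not well-formed XML
    }
    $content = $doc->getElementsByTagName('content')->item(0);
    if ($content === null) {
        return null; // no <content> element at all
    }
    $raw = '';
    foreach ($content->childNodes as $node) {
        if ($node->nodeType === XML_CDATA_SECTION_NODE) {
            // saveXML() on the node keeps the <![CDATA[ ... ]]> wrapper
            $raw .= $doc->saveXML($node);
        }
    }
    // Fallback (an assumption): no CDATA section, so use the trimmed text value.
    return $raw !== '' ? $raw : trim($content->textContent);
}

$withCdata = rawContent('<ad><content><![CDATA[<b>hi</b>]]></content></ad>');
$withoutCdata = rawContent('<ad><content> plain </content></ad>');
```

Here $withCdata keeps the wrapper, while $withoutCdata falls back to the element's text.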

Related

Plesk XML formatting

In PHP I have this code for building an XML request for the Plesk API.
$request = <<<EOF
<packet version="1.6.7.0">
<mail>
<update>
<set>
<filter>
<site-id>$site_id</site-id>
<mailname>
<name>$name</name>
<autoresponder>
<enabled>true</enabled>
<subject>$subject</subject>
<text>$mail_body</text>
<end_date>$date</end_date>
</autoresponder>
</mailname>
</filter>
</set>
</update>
</mail>
</packet>
EOF;
However, I get this response: 1014 Parser error: Cannot parse the XML from the source specified
I have tried indenting the XML with 2, 3, and 4 spaces and with tabs, but it still cannot be parsed.
What am I doing wrong?
You can't reliably build valid XML by string concatenation, especially when it has to carry complex content like an email body.
Not all characters are allowed inside XML tags: characters such as < and & must be properly escaped. Fortunately, PHP has parsers that do this job for you.
First of all, create an empty XML template (check its validity with an XML validator):
$xml = '<?xml version="1.0" encoding="utf-8" ?>
<packet version="1.6.7.0">
<mail>
<update>
<set>
<filter>
<site-id/>
<mailname>
<name/>
<autoresponder>
<enabled/>
<subject/>
<text/>
<end_date/>
</autoresponder>
</mailname>
</filter>
</set>
</update>
</mail>
</packet>
';
Then, load it into a DOMDocument object and init a DOMXPath object:
$dom = new DomDocument();
$dom->loadXML( $xml );
$xpath = new DOMXPath( $dom );
Then, find each node that you want to change and set/update its node value:
$nodes = $xpath->query( 'mail/update/set/filter/site-id' );
$nodes->item(0)->nodeValue = $site_id;
$nodes = $xpath->query( 'mail/update/set/filter/mailname/name' );
$nodes->item(0)->nodeValue = $name;
For the <autoresponder> children, you can perform a loop through each child, using * at the end of your search pattern:
$nodes = $xpath->query( 'mail/update/set/filter/mailname/autoresponder/*' );
foreach( $nodes as $node )
{
if( 'enabled' == $node->nodeName )
{
$node->nodeValue = 'true';
}
elseif( 'subject' == $node->nodeName )
{
$node->nodeValue = $subject;
}
elseif( 'text' == $node->nodeName )
{
$cdata = $dom->createCDATASection( $mail_body );
$node->appendChild( $cdata );
}
elseif( 'end_date' == $node->nodeName )
{
$node->nodeValue = $date;
}
}
Note the different syntax used for the mail body: I use a CDATA node there. If your XML doesn't allow CDATA, replace it with the standard ->nodeValue syntax. Alternatively, you can use the CDATA method for all the nodes.
When the XML is ready, you can echo it by:
echo $dom->saveXML();
DOMXPath lets you perform complex searches in the XML tree. It's not mandatory in your case, because you start from a short, empty, unambiguous template. I use it for demonstration purposes, but you can replace a line like this:
$nodes = $xpath->query( 'mail/update/set/filter/site-id' );
with:
$nodes = $dom->getElementsByTagName( 'site-id' );
and it will work fine.
Read more about DOMDocument
Read more about DOMXPath
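To illustrate the escaping point above, here is a minimal sketch showing how a text node is escaped on output, next to the htmlspecialchars() equivalent for plain string templates. The $subject value is made up for the example and is not taken from the question.

```php
<?php
// Hypothetical value containing characters that are illegal as raw XML text.
$subject = 'Out of office <until Monday> & back soon';

$dom = new DOMDocument('1.0', 'utf-8');
$node = $dom->createElement('subject');
// createTextNode() stores the string literally; saveXML() escapes it on output.
$node->appendChild($dom->createTextNode($subject));
$dom->appendChild($node);

$escaped = $dom->saveXML($node);

// For a plain string template, htmlspecialchars() with ENT_XML1 escapes the
// same special characters before the value is interpolated.
$manual = '<subject>' . htmlspecialchars($subject, ENT_XML1 | ENT_QUOTES, 'UTF-8') . '</subject>';

// Round trip: parsing the escaped markup gives back the original string.
$check = new DOMDocument();
$check->loadXML($manual);
$roundTrip = $check->documentElement->textContent;
```

If the heredoc from the question has to stay, wrapping each interpolated variable in htmlspecialchars() this way is probably the smallest change that keeps the packet well-formed.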

Fetch value from XML Object?

I need to fetch the value of the "joinmeetingurl" element from the XML. I tried the following, but it returns nothing. Please help me fetch the value.
<?php
$xml = '<serv:message xmlns:serv="http://www.webex.com/schemas/2002/06/service"
xmlns:com="http://www.webex.com/schemas/2002/06/common"
xmlns:meet="http://www.webex.com/schemas/2002/06/service/meeting"
xmlns:att="http://www.webex.com/schemas/2002/06/service/attendee">
<serv:header>
<serv:response>
<serv:result>SUCCESS</serv:result>
<serv:gsbstatus>PRIMARY</serv:gsbstatus>
</serv:response>
</serv:header>
<serv:body>
<serv:bodycontent xsi:type="meet:getjoinurlMeetingResponse"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<meet:joinmeetingurl>meetingURL</meet:joinmeetingurl>
</serv:bodycontent>
</serv:body>
</serv:message>';
$xml = simplexml_load_string($xml);
$items = $xml->registerXPathNamespace('meet','http://www.webex.com/schemas/2002/06/service/meeting');
$resp = $xml->xpath('//meet:joinmeetingurl');
?>
I always get an empty value for $resp.
Your XPath should've worked, and you can cast the element to string to get the value, for example :
$xml = <<<XML
<serv:message xmlns:serv="http://www.webex.com/schemas/2002/06/service"
xmlns:com="http://www.webex.com/schemas/2002/06/common"
xmlns:meet="http://www.webex.com/schemas/2002/06/service/meeting"
xmlns:att="http://www.webex.com/schemas/2002/06/service/attendee">
<serv:header>
<serv:response>
<serv:result>SUCCESS</serv:result>
<serv:gsbstatus>PRIMARY</serv:gsbstatus>
</serv:response>
</serv:header>
<serv:body>
<serv:bodycontent xsi:type="meet:getjoinurlMeetingResponse"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<meet:joinmeetingurl>meetingURL</meet:joinmeetingurl>
</serv:bodycontent>
</serv:body>
</serv:message>
XML;
$xml = simplexml_load_string($xml);
$xml->registerXPathNamespace('meet','http://www.webex.com/schemas/2002/06/service/meeting');
$resp = $xml->xpath('//meet:joinmeetingurl');
echo (string)$resp[0];
eval.in demo
output :
meetingURL
I can offer no guidance on SimpleXML and its associated functions, but it seems quite simple with the standard DOMDocument and DOMXPath:
$dom=new DOMDocument;
$dom->loadXML( $xml );
$xpath=new DOMXPath( $dom );
$col=$xpath->query('//meet:joinmeetingurl');
foreach( $col as $node )echo $node->nodeValue;
$dom=null;
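The DOM snippet above relies on the "meet" prefix being declared on the document's root element. A slightly more defensive sketch, assuming a trimmed-down copy of the response, binds its own prefix with registerNamespace() so the query does not depend on the prefix the sender happened to choose. The prefix "m" is arbitrary.

```php
<?php
// Trimmed-down copy of the response from the question (an assumption for brevity).
$xml = '<serv:message xmlns:serv="http://www.webex.com/schemas/2002/06/service"
 xmlns:meet="http://www.webex.com/schemas/2002/06/service/meeting">
 <serv:body><serv:bodycontent>
 <meet:joinmeetingurl>meetingURL</meet:joinmeetingurl>
 </serv:bodycontent></serv:body>
</serv:message>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Bind our own prefix "m" to the namespace URI; it does not have to match
// the prefix used inside the document ("meet").
$xpath->registerNamespace('m', 'http://www.webex.com/schemas/2002/06/service/meeting');

// string(...) returns the text of the first match, or '' when nothing matches.
$url = $xpath->evaluate('string(//m:joinmeetingurl)');
```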

Change tag attribute value with PHP DOMDocument

I want to change the value of the attribute of a tag with PHP DOMDocument.
For example, say we have this line of HTML:
<a href="http://example.com/">Click here</a>
I load the above code in PHP as follows:
$dom = new DOMDocument;
$dom->loadHTML('<a href="http://example.com/">Click here</a>');
I want to change the "href" value to "http://google.com/" using the DOMDocument extension of PHP. Is this possible?
Thanks for the help as always!
$dom = new DOMDocument();
$dom->loadHTML('<a href="http://example.com/">Click here</a>');
// Update the href of every <a> element, then print the document once
foreach ($dom->getElementsByTagName('a') as $item) {
$item->setAttribute('href', 'http://google.com/');
}
echo $dom->saveHTML();
$dom = new DOMDocument;
$dom->loadHTML('<a href="http://example.com/">Click here</a>');
$elements = $dom->getElementsByTagName( 'a' );
if($elements instanceof DOMNodeList) {
foreach($elements as $domElement) {
$domElement->setAttribute('href', 'http://www.google.com/');
}
}
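As a small variation on the answers above, the sketch below only rewrites anchors that already carry an href and then serializes a single node instead of the whole document. The sample markup and its link target are assumptions made up for this example.

```php
<?php
// Assumed sample markup; the original link target is a placeholder.
$html = '<p><a href="http://example.com/">Click here</a> or <a name="top">stay</a></p>';

$dom = new DOMDocument();
$dom->loadHTML($html);

foreach ($dom->getElementsByTagName('a') as $a) {
    // Skip named anchors and other <a> elements without an href.
    if ($a->hasAttribute('href')) {
        $a->setAttribute('href', 'http://google.com/');
    }
}

// saveHTML($node) serializes just that node, without the DOCTYPE, <html>
// and <body> wrappers that loadHTML() adds around a fragment.
$p = $dom->getElementsByTagName('p')->item(0);
$result = $dom->saveHTML($p);
```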

How to extract a node attribute from XML using PHP's DOM Parser

I've never really used the DOM parser before and now I have a question.
How would I go about extracting the URL from this markup:
<files>
<file path="http://www.thesite.com/download/eysjkss.zip" title="File Name" />
</files>
Using simpleXML:
$xml = new SimpleXMLElement($xmlstr);
echo $xml->file['path']."\n";
Output:
http://www.thesite.com/download/eysjkss.zip
To do it with DOM you do
$dom = new DOMDocument;
$dom->load( 'file.xml' );
foreach( $dom->getElementsByTagName( 'file' ) as $file ) {
echo $file->getAttribute( 'path' );
}
You can also do it with XPath:
$dom = new DOMDocument;
$dom->load( 'file.xml' );
$xPath = new DOMXPath( $dom );
foreach( $xPath->evaluate( '/files/file/@path' ) as $path ) {
echo $path->nodeValue;
}
Or as a string value directly:
$dom = new DOMDocument;
$dom->load( 'file.xml' );
$xPath = new DOMXPath( $dom );
echo $xPath->evaluate( 'string(/files/file/@path)' );
You can fetch individual nodes also by traversing the DOM manually
$dom = new DOMDocument;
$dom->preserveWhiteSpace = FALSE;
$dom->load( 'file.xml' );
echo $dom->documentElement->firstChild->getAttribute( 'path' );
Marking this CW, because this has been answered before multiple times (just with different elements), including me, but I am too lazy to find the duplicate.
You can also use PHP Simple HTML DOM Parser, a PHP library: http://simplehtmldom.sourceforge.net/

How do I remove a specific node using its attribute value in PHP XML Dom?

My question is best phrased as:
Remove a child with a specific attribute, in SimpleXML for PHP
except I'm not using SimpleXML.
I'm new to XML in PHP, so I may not be doing this the best way.
I have an XML file created with $dom->save($xml) for each individual user (not placing them all in one XML file, for undisclosed reasons).
It gives me the XML declaration <?xml version="1.0"?> (I have no idea how to change it to something else, but hopefully that's not the point).
<?xml version="1.0"?>
<details>
<person>name</person>
<data1>some data</data1>
<data2>some data</data2>
<data3>some data</data3>
<category id="0">
<categoryName>Cat 1</categoryName>
<categorydata1>some data</categorydata1>
</category>
<category id="1">
<categoryName>Cat 2</categoryName>
<categorydata1>some data</categorydata1>
<categorydata2>some data</categorydata2>
<categorydata3>some data</categorydata3>
<categorydata4>some data</categorydata4>
</category>
</details>
I want to remove a category that has a specific value in its id attribute, using the DOM class in PHP, when I run a function triggered by a remove button.
The following is a debug version of the function I'm trying to get to work. What am I doing wrong?
function CatRemove($myXML){
$xmlDoc = new DOMDocument();
$xmlDoc->load( $myXML );
$categoryArray = array();
$main = $xmlDoc->getElementsByTagName( "details" )->item(0);
$mainElement = $xmlDoc->getElementsByTagName( "details" );
foreach($mainElement as $details){
$currentCategory = $details->getElementsByTagName( "category" );
foreach($currentCategory as $category){
$categoryID = $category->getAttribute('id');
array_push($categoryArray, $categoryID);
if($categoryID == $_POST['categorytoremoveValue']) {
return $categoryArray;
}
}
}
$xmlDoc->save( $myXML );
}
The above always gives me an array of [0]->0 when I move the return outside the if.
Is there a better way? I've tried using getElementById as well, but I have no idea how to make that work.
I would prefer not to use an attribute, though, if that would make things easier.
Ok, let’s try this complete example of use:
function CatRemove($myXML, $id) {
$xmlDoc = new DOMDocument();
$xmlDoc->load($myXML);
$xpath = new DOMXpath($xmlDoc);
$nodeList = $xpath->query('//category[@id="'.(int)$id.'"]');
if ($nodeList->length) {
$node = $nodeList->item(0);
$node->parentNode->removeChild($node);
}
$xmlDoc->save($myXML);
}
// test data
$xml = <<<XML
<?xml version="1.0"?>
<details>
<person>name</person>
<data1>some data</data1>
<data2>some data</data2>
<data3>some data</data3>
<category id="0">
<categoryName>Cat 1</categoryName>
<categorydata1>some data</categorydata1>
</category>
<category id="1">
<categoryName>Cat 2</categoryName>
<categorydata1>some data</categorydata1>
<categorydata2>some data</categorydata2>
<categorydata3>some data</categorydata3>
<categorydata4>some data</categorydata4>
</category>
</details>
XML;
// write test data into file
file_put_contents('untitled.xml', $xml);
// remove category node with the id=1
CatRemove('untitled.xml', 1);
// dump file content
echo '<pre>', htmlspecialchars(file_get_contents('untitled.xml')), '</pre>';
So you want to remove the category node with a specific id?
$node = $xmlDoc->getElementById("12345");
if ($node) {
$node->parentNode->removeChild($node);
}
You could also use XPath to get the node, for example:
$xpath = new DOMXpath($xmlDoc);
$nodeList = $xpath->query('//category[@id="12345"]');
if ($nodeList->length) {
$node = $nodeList->item(0);
$node->parentNode->removeChild($node);
}
I haven’t tested it but it should work.
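One caveat about the getElementById() variant above: as far as I know, it only resolves attributes that are declared as IDs (through a DTD, xml:id, or setIdAttribute()), so a plain id attribute like the one in the question is not found by default. A minimal sketch, assuming a cut-down document with made-up id values "c1" and "c2":

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<details><category id="c1"/><category id="c2"/></details>');

// Without a DTD, "id" is an ordinary attribute, so the lookup finds nothing.
$before = $doc->getElementById('c2');

// Mark each "id" attribute as an ID so getElementById() can resolve it.
foreach ($doc->getElementsByTagName('category') as $category) {
    $category->setIdAttribute('id', true);
}

$node = $doc->getElementById('c2');
if ($node !== null) {
    $node->parentNode->removeChild($node);
}

// Count the <category> elements that are left after the removal.
$remaining = $doc->getElementsByTagName('category')->length;
```

The XPath variant avoids this extra step, which is probably why it is the more common choice.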
Can you try with this modified version:
function CatRemove($myXML, $id){
$doc = new DOMDocument();
$doc->loadXML($myXML);
$xpath = new DOMXpath($doc);
$nodeList = $xpath->query("//category[@id='$id']");
foreach ($nodeList as $element) {
$element->parentNode->removeChild($element);
}
echo htmlentities($doc->saveXML());
}
It's working for me. Just adapt it to your needs. It's not intended to be used as-is; it's just a proof of concept.
You also have to remove the xml declaration from the string.
The above function, modified to remove an email address from a mailing list:
function CatRemove($myXML, $id) {
$xmlDoc = new DOMDocument();
$xmlDoc->load($myXML);
$xpath = new DOMXpath($xmlDoc);
$nodeList = $xpath->query('//subscriber[@email="'.$id.'"]');
if ($nodeList->length) {
$node = $nodeList->item(0);
$node->parentNode->removeChild($node);
}
$xmlDoc->save($myXML);
}
$xml = 'list.xml';
$to = $_POST['email']; // the user has already submitted their email address using a form
CatRemove($xml,$to);
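Because the email address comes straight from $_POST, concatenating it into the XPath string can break the query (or match unintended nodes) when the value contains a quote. One way around this, sketched below, is to compare the attribute in PHP instead. removeSubscriber() is a hypothetical helper, and the <list>/<subscriber> layout is an assumption, since list.xml is not shown.

```php
<?php
// Hypothetical helper: removes every <subscriber> whose email attribute equals
// $email and returns how many were removed. Comparing in PHP avoids building
// an XPath expression from user input.
function removeSubscriber(DOMDocument $doc, string $email): int
{
    $matches = [];
    foreach ($doc->getElementsByTagName('subscriber') as $subscriber) {
        if ($subscriber->getAttribute('email') === $email) {
            // Collect first: the node list is live and changes during removal.
            $matches[] = $subscriber;
        }
    }
    foreach ($matches as $node) {
        $node->parentNode->removeChild($node);
    }
    return count($matches);
}

// Assumed file layout, including an address that contains a single quote.
$doc = new DOMDocument();
$doc->loadXML('<list><subscriber email="a@example.com"/><subscriber email="o\'neil@example.com"/></list>');

$removed = removeSubscriber($doc, "o'neil@example.com");
$left = $doc->getElementsByTagName('subscriber')->length;
```

To persist the change, the caller would still load the file with $doc->load() beforehand and call $doc->save() afterwards, as in the function above.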
