XML cleaning [xml:space] using PHP


How can I remove all attributes like xml:space="preserve" from an XML document to get a clean result?
Original XML:
<table>
<actor xml:space="preserve"> </actor>
</table>
I want the result to look like this:
<table>
<actor> </actor>
</table>
EDIT
This is the PHP code:
function produce_XML_object_tree($raw_XML) {
libxml_use_internal_errors(true);
try {
$xmlTree = new SimpleXMLElement($raw_XML);
} catch (Exception $e) {
// Something went wrong.
$error_message = 'SimpleXMLElement threw an exception.';
foreach(libxml_get_errors() as $error_line) {
$error_message .= "\t" . $error_line->message;
}
trigger_error($error_message);
return false;
}
return $xmlTree;
}
$xml_feed_url = "www.xmlpage.com/web.xml";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $xml_feed_url);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$xml = curl_exec($ch);
curl_close($ch);
$cont = produce_XML_object_tree($xml);
echo json_encode($cont);

Use an xpath expression to locate the attributes and remove them.
Example:
//$xml = your xml string
$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//@xml:space') as $attr) {
$attr->ownerElement->removeAttributeNode($attr);
}
echo $dom->saveXML();
Output:
<?xml version="1.0"?>
<table>
<actor> </actor>
</table>
This removes every xml:space attribute. To target only those xml:space attributes whose value is "preserve", change the query to //@xml:space[.="preserve"].
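As a minimal sketch of the value-restricted variant, the snippet below removes only the attributes set to "preserve". The second element (`note`, with `xml:space="default"`) is an assumption added for illustration so that the difference is visible; it is not part of the original question.

```php
<?php
// Sample input; the <note> element is a hypothetical addition for illustration.
$xml = '<table><actor xml:space="preserve"> </actor><note xml:space="default"> </note></table>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Select only xml:space attributes whose value is exactly "preserve".
foreach ($xpath->query('//@xml:space[. = "preserve"]') as $attr) {
    $attr->ownerElement->removeAttributeNode($attr);
}

// Serialize the root element only, which leaves out the XML declaration.
$cleaned = $dom->saveXML($dom->documentElement);
echo $cleaned, "\n";
```

The attribute on `note` should be left untouched, because its value does not match the predicate.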

$string = str_ireplace('xml:space="preserve"','',$string);
function produce_XML_object_tree($raw_XML) {
    libxml_use_internal_errors(true);
    // Strip the attribute from the raw string before parsing;
    // str_ireplace() works on strings, not on SimpleXMLElement objects.
    $raw_XML = str_ireplace(' xml:space="preserve"', '', $raw_XML);
    try {
        $xmlTree = new SimpleXMLElement($raw_XML);
    } catch (Exception $e) {
        // Something went wrong.
        $error_message = 'SimpleXMLElement threw an exception.';
        foreach (libxml_get_errors() as $error_line) {
            $error_message .= "\t" . $error_line->message;
        }
        trigger_error($error_message);
        return false;
    }
    return $xmlTree;
}
$xml_feed_url = "www.xmlpage.com/web.xml";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $xml_feed_url);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$xml = curl_exec($ch);
curl_close($ch);
$cont = produce_XML_object_tree($xml);
echo json_encode($cont);

If the goal is to remove all attribute nodes that carry a namespace prefix, you can select them via XPath and remove them from the XML document.
The XPath query for all attributes with a prefix compares the name (prefix plus local name) with the local-name (local name only). If the two differ, the attribute has a prefix and you have a match:
//@*[name(.) != local-name(.)]
Querying specific nodes with SimpleXML and XPath in order to delete them was outlined earlier in an answer to the question "Remove a child with a specific attribute, in SimpleXML for PHP" (Nov 2008). It is straightforward with the SimpleXML self-reference:
$xml = simplexml_load_string($buffer);
foreach ($xml->xpath('//@*[name(.) != local-name(.)]') as $attr) {
unset($attr[0]);
}
The self-reference here is to remove the attribute $attr via $attr[0].
Full Example:
$buffer = <<<XML
<table>
<actor class="foo" xml:space="preserve"> </actor>
</table>
XML;
$xml = simplexml_load_string($buffer);
foreach ($xml->xpath('//@*[name(.) != local-name(.)]') as $attr) {
unset($attr[0]);
}
echo $xml->asXML();
Example Output:
<?xml version="1.0"?>
<table>
<actor class="foo"> </actor>
</table>

Related

Insert XML result from curl to database

I'm trying to insert the XML result from cURL into a database, but the data never gets inserted. I have tried both the mysqli approach and PDO, with the same result. Can anyone help me out with this? Here's my code:
MYSQLI
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url."?".$queryString);
curl_setopt($ch, CURLOPT_POST,1);
curl_setopt($ch, CURLOPT_POSTFIELDS,$parameters);
curl_setopt( $ch, CURLOPT_HTTPHEADER, array('Content-Type: text/xml'));
// Save response to the variable $data
curl_setopt($ch, CURLOPT_FOLLOWLOCATION,1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
// Close Curl connection
if ($data === False) {
echo "cURL ERROR:".curl_error($ch);
}
curl_close($ch);
echo $url."?".$queryString;
echo '<pre>';
echo htmlspecialchars(print_r($data, true));
echo '</pre>';
$xml= simplexml_load_string($data);
//print_r($xml) ;
con = mysqli_connect("localhost","root","");
mysqli_select_db($con,"test") or die(mysqli_connect_error());
foreach($xml -> Products as $product) {
$product->Product;
{
$name = mysqli_real_escape_string($product->name);
$short_description = mysqli_real_escape_string($product->short_description);
$brand = mysqli_real_escape_string($product->brand);
$model = mysqli_real_escape_string($product->model);
$warranty_type =mysqli_real_escape_string($product->warranty_type);
$sql = "Insert into test (name, short_description, brand,model,warranty_type ) " . "Values ('$name','$short_description','$brand','$model','$warranty_type')";
$result = mysqli_query($con, $sql);
error_log($sql);
if (!$result) {
echo 'Error:' . mysqli_error($con);
} else {
echo 'Success';
}
}
}
PDO
$db= new PDO('mysql:host=localhost;dbname=test','root','');
$xmldoc= new DOMDocument();
$xmldoc -> load($data);
$xmldata = $xmldoc ->getElementsByTagName('Product');
$xmlcount = $xmldata ->length;
for($i=0; $i< $xmlcount; $i++){
$name = $xmldata -> item($i) ->getElementsByTagName('name')->item(0)->childNodes->item(0)->nodeValue->item(0);
$short_description = $xmldata -> item($i) ->getElementsByTagName('short_description')->item(0)->childNodes->item(0)->nodeValue->item(0);
$brand = $xmldata -> item($i) ->getElementsByTagName('brand')->item(0)->childNodes->item(0)->nodeValue->item(0);
$model = $xmldata -> item($i) ->getElementsByTagName('model')->item(0)->childNodes->item(0)->nodeValue->item(0);
$warranty_type = $xmldata -> item($i) ->getElementsByTagName('warranty_type')->item(0)->childNodes->item(0)->nodeValue->item(0);
$stmt = $db->prepare("insert into test values(?,?,?,?,?)");
$stmt -> bindParam(1,$name);
$stmt -> bindParam(2,$short_description);
$stmt -> bindParam(3,$brand);
$stmt -> bindParam(4,$model);
$stmt -> bindParam(5,$warranty_type);
$stmt ->execute();
}
With the PDO version, the IDE reports that getElementsByTagName is not found in DOMNode. I am using PhpStorm 2016.2.1 as my IDE.
This is the response I get from cURL, in XML format:
<SuccessResponse>
<Head>
<RequestId>0a1530d415089888818591159e</RequestId>
<RequestAction>GetProducts</RequestAction>
<ResponseType>Product</ResponseType>
<Timestamp>2017-10-26T03:34:40+0000</Timestamp>
<isES>true</isES>
</Head>
<Body>
<TotalProducts>589</TotalProducts>
<Products>
<Product>
<PrimaryCategory>10001164</PrimaryCategory>
<SPUId/>
<Attributes>
<name>Big Rabbit Furry Key Chain (Light Blue)</name>
<short_description><ul> <li><span style="font-
family:arial,sans-serif; font-size:10pt">A perfect gift for yourself
or someone special</span></li> <li><span style="font-
family:arial,sans-serif; font-size:10pt">Good accessory for handbag,
tote backpack cellphones, keychain or car</span></li>
<li><span style="color:rgb(17, 17, 17); font-family:arial,sans-
serif; font-size:10pt">Super cute and soft</span></li>
<li><span style="color:rgb(17, 17, 17); font-family:arial,sans-
serif; font-size:10pt">Ideal companion of your keys, bags, cellphones
car or other wonderful objects</span></li> <li><span
style="color:black; font-family:arial,sans-serif; font-
size:10pt">Size(cm): 10 x 6 x 18</span></li> </ul>
</short_description>
<video/>
<services/>
<brand>Not Specified</brand>
<model>bulu15-lightblue</model>
<color_family/>
<Hazmat/>
<warranty_type>No Warranty</warranty_type>
<warranty/>
<product_warranty/>
<name_ms>Big Rabbit Furry Key Chain (Light Blue)</name_ms>
<product_warranty_en/>
<description_ms/>
<external_url/>
</Attributes>
<Skus>
<Sku>
<Status>active</Status>
<quantity>100</quantity>
<tax_class>default</tax_class>
<_compatible_variation_>...</_compatible_variation_>
<SellerSku>test112</SellerSku>
<ShopSku/>
<package_content><p>1 x Big Rabbit Furry Key Chain (Light Blue)</p></package_content>
<Url/>
<package_width>6.00</package_width>
<package_height>18.00</package_height>
<special_price>0.0</special_price>
<price>0.0</price>
<package_length>10.00</package_length>
<package_weight>0.10</package_weight>
<Available>100</Available>
<Images>
<Image>https://my-live.slatic.net/original/601531319cd1dca721c70637e96cc403.jpg</Image>
<Image>https://my-live.slatic.net/original/7e6ad47bbfa1de474fe36c20b06c35f7.jpg</Image>
<Image>https://my-live.slatic.net/original/02bc2102d6864a867c25328724f0c7c7.jpg</Image>
<Image/>
<Image/>
<Image/>
<Image/>
<Image/>
</Images>
</Sku>
</Skus>
</Product>

There are a few issues with the PDO version of your code.
The first is that you're calling $xmldoc -> load($data);. load() expects a file name, but you're passing in the actual data, so change this to loadXML().
Second, when accessing the elements of the data, you're using ->nodeValue->item(0). nodeValue already gives you the contents of the node as a string, so you need to stop at that point. In the code below I've changed it only for the name, but it shows how the rest of your code needs to change.
$xmldoc= new DOMDocument();
$xmldoc -> loadXML($data);
$xmldata = $xmldoc ->getElementsByTagName('Product');
$xmlcount = $xmldata ->length;
for($i=0; $i< $xmlcount; $i++){
$name = $xmldata -> item($i) ->getElementsByTagName('name')->item(0)->childNodes->item(0)->nodeValue;
echo $name;
}
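A fuller sketch of the same idea, under stated assumptions: `$data` here is a trimmed stand-in for the cURL response, and `extract_products()` and `insert_products()` are hypothetical helper names, not part of the original code. It reads each field with textContent, which also copes with empty elements, and passes each row to a prepared statement. The table and column names are taken from the question's mysqli version.

```php
<?php
// Trimmed stand-in for the cURL response shown in the question.
$data = <<<XML
<SuccessResponse>
  <Body>
    <Products>
      <Product>
        <Attributes>
          <name>Big Rabbit Furry Key Chain (Light Blue)</name>
          <brand>Not Specified</brand>
          <model>bulu15-lightblue</model>
          <warranty_type>No Warranty</warranty_type>
        </Attributes>
      </Product>
    </Products>
  </Body>
</SuccessResponse>
XML;

$fields = ['name', 'short_description', 'brand', 'model', 'warranty_type'];

// Hypothetical helper: returns one array of field values per <Product>.
function extract_products(string $xmlString, array $fields): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xmlString); // loadXML() takes the XML text itself, not a file name
    $rows = [];
    foreach ($doc->getElementsByTagName('Product') as $product) {
        $row = [];
        foreach ($fields as $field) {
            $node = $product->getElementsByTagName($field)->item(0);
            // A missing element becomes an empty string.
            $row[] = $node ? $node->textContent : '';
        }
        $rows[] = $row;
    }
    return $rows;
}

// Hypothetical helper: inserts the rows with one prepared statement.
// Defined but not called here, since it needs a live database connection.
function insert_products(PDO $db, array $rows): void
{
    $stmt = $db->prepare(
        'INSERT INTO test (name, short_description, brand, model, warranty_type) VALUES (?, ?, ?, ?, ?)'
    );
    foreach ($rows as $row) {
        $stmt->execute($row);
    }
}

$rows = extract_products($data, $fields);
print_r($rows);
```

With a connection such as the one in the question, the call would be insert_products($db, $rows). Naming the columns in the INSERT keeps the statement working if the table has extra columns such as an auto-increment id.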

XPath not working as expected [php]

I often use XPath with PHP for parsing pages, but this time I don't understand the behavior I get on this specific page with the following code. I hope you can help me with this.
Code that I use to parse this page http://www.jeuxvideo.com/recherche.php?m=9&t=10&q=Call+of+duty :
<?php
$What = 'Call of duty';
$What = urlencode($What);
$Query = 'http://www.jeuxvideo.com/recherche.php?m=9&t=10&q='.$What;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $Query);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
$response = curl_exec($ch);
curl_close($ch);
/*
$search = array("<article", "</article>");
$replace = array("<div", "</div>");
$response = str_replace($search, $replace, $response);
*/
$dom = new DOMDocument();
@$dom->loadHTML($response);
$xpath = new DOMXPath($dom);
$elements = $xpath->query('//article[@class="recherche-aphabetique-item"]/a');
//$elements = $xpath->query('//div[@class="recherche-aphabetique-item"]/a');
count($elements);
var_dump($elements);
?>
fiddle to test it :
http://phpfiddle.org/main/code/r9n6-d0j0
I just want to get all "a" nodes that are inside "article" nodes with the class "recherche-aphabetique-item", but it returns nothing.
As you can see in the commented-out code, I tried replacing the HTML5 article elements with div, but I got the same behavior.
Thanks for your help.

I'm seeing lots of "DOMDocument::loadHTML(): Unexpected end tag" errors; you could use libxml's internal error handling functions to deal with these. Also, when I looked at the DOM of the remote site I could not see any a tags that would match the XPath query, only span tags.
<?php
$What = 'Call of duty';
$What = urlencode($What);
$Query = 'http://www.jeuxvideo.com/recherche.php?m=9&t=10&q='.$What;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $Query);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
$response = curl_exec($ch);
curl_close($ch);
/* try to suppress errors using libxml */
libxml_use_internal_errors( true );
$dom = new DOMDocument();
/* additional flags for DOMDocument */
$dom->validateOnParse=false;
$dom->standalone=true;
$dom->strictErrorChecking=false;
$dom->recover=true;
$dom->formatOutput=false;
@$dom->loadHTML($response);
libxml_clear_errors();
$xpath = new DOMXPath($dom);
$elements = $xpath->query('//article[@class="recherche-aphabetique-item"]/span');
count( $elements );
var_dump( $elements );
?>
output
object(DOMNodeList)#97 (1) { ["length"]=> int(94) }
You could further simplify this perhaps by trying:
$What = 'Call of duty';
$What = urlencode($What);
$Query = 'http://www.jeuxvideo.com/recherche.php?m=9&t=10&q='.$What;
libxml_use_internal_errors( true );
$dom = new DOMDocument();
$dom->validateOnParse=false;
$dom->standalone=true;
$dom->strictErrorChecking=false;
$dom->recover=true;
$dom->formatOutput=false;
@$dom->loadHTMLFile($Query);
libxml_clear_errors();
$xpath = new DOMXPath($dom);
$elements = $xpath->query('//article[@class="recherche-aphabetique-item"]/span');
count($elements);
foreach( $elements as $node )echo $node->nodeValue,'<br />';
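To try the query without depending on the remote site, here is a small offline sketch. The `$html` string is hypothetical markup that only imitates the structure described above (article elements with a span child); the real page may differ.

```php
<?php
// Hypothetical markup standing in for the remote page.
$html = '<html><body>'
      . '<article class="recherche-aphabetique-item"><span>Call of Duty</span></article>'
      . '<article class="recherche-aphabetique-item"><span>Call of Duty 2</span></article>'
      . '<article class="other"><span>Ignored</span></article>'
      . '</body></html>';

// Collect parser warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// @class selects the class attribute; the predicate matches the full attribute value.
$elements = $xpath->query('//article[@class="recherche-aphabetique-item"]/span');

$titles = [];
foreach ($elements as $node) {
    $titles[] = trim($node->nodeValue);
}
print_r($titles);
```

Note that [@class="..."] compares against the whole attribute value, so an element carrying additional classes would not match this predicate.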

DOM parsing with php is returning zero child elements

I want to extract URL data the way Facebook does. For that I am using PHP's DOMDocument. When retrieving DOM content, for example the "title" element, DOMDocument returns 0 elements. Here is my code:
<?php
header("Content-Type: text/xml");
echo '<?xml version="1.0" encoding="UTF-8" ?>';
//$url = $_REQUEST["url"];
$url = "http://business.tutsplus.com/articles/how-to-set-up-your-first-magento-store--fsw-43137";
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_HEADER,0);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_FOLLOWLOCATION,1);
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER,false);
$data = curl_exec($ch);
curl_close($ch);
$dom = new DOMDocument();
@$dom->loadHTML($data);
$title = $dom->getElementsByTagName("title");
//$title = $dom->find("title");
echo "<urlData>";
echo "<title>";
echo $title->length;
echo "</title>";
echo "</urlData>";
?>
Here $title->length is returning 0 elements. What is the problem?

php process xml into array

I'm trying to process XML (from a website) into a PHP array.
I have tried the following code, which works for everything in results, but I also need to get totalpage and I can't see how to do that.
function get_content($url)
/// basically opens the page and stores it as a variable. Buggered if I know how it works!
{
$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_HEADER, 0);
ob_start();
curl_exec ($ch);
curl_close ($ch);
$string = ob_get_contents();
ob_end_clean();
return $string;
$string = NULL;
$ch = NULL;
$url = NULL;
}
$url = "url";
$content = get_content($url);
$content_x = explode("<result>", $content);
foreach ($content_x as $item)
{
$p1 = strpos($item, '<title>');
$p2 = strpos($item, '</title>');
$l1 = $p2 - $p1;
echo '<br>'.$title = substr($item, $p1, $l1);
}
xml site feed
<?xml version="1.0" encoding="UTF-8"?>
<response version="2">
<totalpage>1005</totalpage>
<results>
<result>
<title>test</title>
<title2>test2</title2>
<title3>test3</title3>
<result>
<result>
<title>test</title>
<title2>test2</title2>
<title3>test3</title3>
<result>
<result>
<title>test</title>
<title2>test2</title2>
<title3>test3</title3>
<result>
........so on
<results>
</response>
I need to get totalpage and everything in results.
How do I get totalpage, and is there a better way to process the results?

You absolutely should not be using string manipulation to parse XML. There are a number of PHP libraries that can do this; I'd recommend SimpleXML.
Usage would be:
$xml = simplexml_load_string($content);
// $xml already represents the root <response> element, so read its children directly.
$totalpage = $xml->totalpage;
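A slightly longer sketch that reads totalpage and also collects every result into an array. It assumes the feed is well-formed, with each result closed by a proper end tag (the sample in the question shows opening tags only, which is presumably a typo). The `$content` string below is a shortened stand-in for the real feed. The object returned by simplexml_load_string() represents the root response element, so its children are read from it directly.

```php
<?php
// Shortened stand-in for the feed, assuming properly closed tags.
$content = <<<XML
<response version="2">
  <totalpage>1005</totalpage>
  <results>
    <result><title>test</title><title2>test2</title2><title3>test3</title3></result>
    <result><title>test</title><title2>test2</title2><title3>test3</title3></result>
  </results>
</response>
XML;

$xml = simplexml_load_string($content);

// Cast explicitly: SimpleXML returns objects, not plain scalars.
$totalpage = (int) $xml->totalpage;

$results = [];
foreach ($xml->results->result as $result) {
    $results[] = [
        'title'  => (string) $result->title,
        'title2' => (string) $result->title2,
        'title3' => (string) $result->title3,
    ];
}

echo $totalpage, "\n";
print_r($results);
```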

Selecting a content type from youtube API

I have an xml feed from the youtube API. So far I have successfully lifted the title,thumbnail, ratings etc. However I am struggling to parse the content url from the following feed.
<feed xmlns='http://www.w3.org/2005/Atom' xmlns:media='http://search.yahoo.com/mrss/'>
<entry>
<media:group>
<media:content url='http://www.youtube.com/v/ZTUVgYoeN_b?f=gdata_standard...' type='application/x-shockwave-flash' medium='video' isDefault='true' expression='full' duration='215' yt:format='5'/>
<media:content url='rtsp://rtsp2.youtube.com/ChoLENy73bIAEQ1kgGDA==/0/0/0/video.3gp' type='video/3gpp' medium='video' expression='full' duration='215' yt:format='1'/>
<media:content url='rtsp://rtsp2.youtube.com/ChoLENy73bIDRQ1kgGDA==/0/0/0/video.3gp' type='video/3gpp' medium='video' expression='full' duration='215' yt:format='6'/>
</media:group>
</entry>
</feed>
Below is my code so far. I believe this has worked for other items because I was not selecting from multiple nodes i.e. there is only one title, thumbnail url etc...
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $xml_feed_url);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$xml = curl_exec($ch);
curl_close($ch);
$feed = produce_XML_object_tree($xml);
$entry = $feed->entry;
$media = $entry->children('http://search.yahoo.com/mrss/');
$urla=$media->content->attributes();
$url=$urla["url"];
function produce_XML_object_tree($raw_XML)
{
libxml_use_internal_errors(true);
try
{
$xmlTree = new SimpleXMLElement($raw_XML);
}
catch (Exception $e)
{
// Something went wrong.
$error_message = 'SimpleXMLElement threw an exception.';
foreach(libxml_get_errors() as $error_line)
{
$error_message .= "\t" . $error_line->message;
}
trigger_error($error_message);
return false;
}
return $xmlTree;
}
How can I select one of these content types?

Solved
To select one of the content types I just had to use a simple index. Hopefully this is a lesson I only need to learn once:
$urla=$media->group->content[0]->attributes();
$url=$urla["url"];
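If the position of the wanted entry is not guaranteed, one alternative is to pick the content element by its type attribute instead of by index. This is only a sketch: the `$feedXml` string is a shortened stand-in for the feed with placeholder example.com URLs, and 'video/3gpp' is simply one of the types shown in the question.

```php
<?php
// Shortened stand-in for the feed; the URLs are placeholders.
$feedXml = <<<XML
<feed xmlns='http://www.w3.org/2005/Atom' xmlns:media='http://search.yahoo.com/mrss/'>
  <entry>
    <media:group>
      <media:content url='http://example.com/video.swf' type='application/x-shockwave-flash'/>
      <media:content url='rtsp://example.com/video.3gp' type='video/3gpp'/>
    </media:group>
  </entry>
</feed>
XML;

$feed = simplexml_load_string($feedXml);
$media = $feed->entry->children('http://search.yahoo.com/mrss/');

// Walk all media:content elements and keep the first one with the wanted type.
$url = null;
foreach ($media->group->content as $content) {
    $attrs = $content->attributes();
    if ((string) $attrs['type'] === 'video/3gpp') {
        $url = (string) $attrs['url'];
        break;
    }
}
echo $url, "\n";
```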
