Remove multiple empty nodes with SimpleXML - php

I want to delete all the empty nodes in my XML document using SimpleXML.
Here is my code:
$xs = file_get_contents('liens.xml')or die("Fichier XML non chargé");
$doc_xml = new SimpleXMLElement($xs);
foreach($doc_xml->xpath('//*[not(text())]') as $torm)
unset($torm);
$doc_xml->asXML("liens.xml");
I saw with a print_r() that XPath is grabbing something, but nothing is removed from my XML file.

$file = 'liens.xml';
$xpath = '//*[not(text())]';
if (!$xml = simplexml_load_file($file)) {
throw new Exception("Fichier XML non chargé");
}
foreach ($xml->xpath($xpath) as $remove) {
unset($remove[0]);
}
$xml->asXML($file);

I know this post is a bit old, but in your foreach, $torm is just a local variable that is reassigned on every iteration. unset($torm) only destroys that variable; it does nothing to the original $doc_xml object.
Instead, you need to remove the element itself:
foreach($doc_xml->xpath('//*[not(text())]') as $torm)
unset($torm[0]);
This works by using a SimpleXMLElement self-reference.
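As a minimal sketch of the self-reference technique, assuming a small inline document in place of liens.xml (the sample XML is made up for illustration). The XPath is narrowed to //*[not(*) and not(normalize-space())], which is an assumption about what "empty" should mean: //*[not(text())] would also match parent elements that contain only child elements when the file has no whitespace between tags.

```php
<?php
// Sample document (hypothetical): two empty <link> elements, one filled.
$xml = new SimpleXMLElement(
    '<links><link>http://example.com</link><link/><group><link></link></group></links>'
);

// Select leaf elements with no child elements and no non-whitespace text.
foreach ($xml->xpath('//*[not(*) and not(normalize-space())]') as $empty) {
    // $empty[0] refers to the element itself, so unset() removes it
    // from the document instead of just destroying the loop variable.
    unset($empty[0]);
}

echo $xml->asXML();
```

Note that <group> is left in place: it only becomes empty after its child is removed, so a second pass would be needed to drop it as well.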

Related

Extract pattern from xml file using PHP?

I have a remote XML file. I need to read it, find some values and save them in an array.
I've loaded the file with (no problem with this):
$xml_external_path = 'http://example.com/my-file.xml';
$xml = file_get_contents($xml_external_path);
In this file there are many instances of:
<unico>4241</unico>
<unico>234</unico>
<unico>534534</unico>
<unico>2345334</unico>
I need to extract just the numbers from these strings and save them in an array. I guess I need to use a pattern like:
$pattern = '/<unico>(.*?)<\/unico>/';
But I'm not sure what to do next. Keep in mind that it is an .xml file.
Result should be a populated array like this:
$my_array = array (4241, 234, 534534,2345334);
You would be better off using XPath to read an XML file. XPath is a query language for selecting nodes in an XML document; in PHP it is available through the DOMXPath class, which operates on a DOMDocument. You query with patterns based on a simple Unix-like path syntax: // means anywhere in the document and ./ means relative to the selected node. DOMXPath::query() returns a DOMNodeList with all the nodes that match the pattern. The following code will do what you want:
// XML needs a single root element, so the sample is wrapped in <root>.
$xmlFile = "<root>
<unico>4241</unico>
<unico>234</unico>
<unico>534534</unico>
<unico>2345334</unico>
</root>";
$xmlDoc = new DOMDocument();
$xmlDoc->loadXML($xmlFile);
$xpath = new DOMXPath($xmlDoc);
// This returns a DOMNodeList of all nodes with the unico tag in the file.
$unicos = $xpath->query("//unico");
// This returns an integer: how many nodes matched the pattern.
echo $unicos->length;
You can find more info on XPath and its syntax here: XPath on Wikipedia#syntax
DOMNodeList implements Traversable, so you can use foreach() to traverse it. If you really want a flat array, you can convert it using simple code like in question #15807314:
$unicosArr = array();
foreach($unicos as $node){
$unicosArr[] = $node->nodeValue;
}
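Putting the two snippets above together, a minimal end-to-end sketch might look like this. The inline sample and its <root> wrapper are assumptions for illustration; with the real file, the string returned by file_get_contents() in the question would be passed to loadXML() instead.

```php
<?php
// Sample document (hypothetical); a real .xml file already has a single root.
$xml = '<root><unico>4241</unico><unico>234</unico>'
     . '<unico>534534</unico><unico>2345334</unico></root>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$my_array = [];
foreach ($xpath->query('//unico') as $node) {
    // nodeValue is a string; cast it to get integers as in the question.
    $my_array[] = (int) $node->nodeValue;
}

print_r($my_array);
```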
Using preg_match_all:
<?php
$xml = '<unico>4241</unico>
<unico>234</unico>
<unico>534534</unico>
<unico>2345334</unico>';
$pattern = '/<unico>(.*?)<\/unico>/';
preg_match_all($pattern,$xml,$result);
// $result[0] holds the full matches including the tags; $result[1] holds the captured numbers.
print_r($result[1]);
You could try this; it loops through each line of the file and captures whatever is between the XML <unico> tags.
<?php
$file = "./your.xml";
$pattern = '/<unico>(.*?)<\/unico>/';
$allVars = array();
$currentFile = fopen($file, "r");
if ($currentFile) {
// Read through the file line by line
while (($m_sLine = fgets($currentFile)) !== false) {
// Capture the text between the <unico> tags, if present on this line
if (preg_match($pattern, $m_sLine, $matches)) {
$allVars[] = $matches[1];
}
}
// Only close the handle if it was opened successfully
fclose($currentFile);
}
print_r($allVars);
Is this sort of what you want? :)

PHP return value after XML exploration

I have a PHP array with many XML user-file URLs:
$tab_users[0]=john.xml
$tab_users[1]=chris.xml
$tab_users[n...]=phil.xml
For each user a <zoom> tag is filled or not, depending if user filled it up or not:
john.xml = <zoom>Some content here</zoom>
chris.xml = <zoom/>
phil.xml = <zoom/>
I'm trying to go through the user data and display the first filled <zoom> tag, but randomized: each time you reload the page, the <div id="zoom"> content is different.
$rand=rand(0,$n); // $n is the number of users
$datas_zoom=zoom($n,$rand);
My PHP function
function zoom($n,$rand) {
global $tab_users;
$datas_user=new SimpleXMLElement($tab_users[$rand],null,true);
$tag=$datas_user->xpath('/user');
//if zoom found
if($tag[0]->zoom !='') {
$txt_zoom=$tag[0]->zoom;
}
... some other stuff here
// no "zoom" value found
if ($txt_zoom =='') {
echo 'RAND='.$rand.' XML='.$tab_users[$rand].'<br />';
$datas_zoom=zoom($r,$n,$rand); } // random zoom fct again and again till...
}
else {
echo 'ZOOM='.$txt_zoom.'<br />';
return $txt_zoom; // we got it!
}
}
echo '<br />Return='.$datas_zoom;
The problem is: when by chance the first XML file explored contains "zoom" information, the function returns it, but if not, nothing is returned. An example of the results when the first one happens to be the good one:
// for RAND=0, XML=john.xml
ZOOM=Anything here
Return=Some content here // we're lucky
Unlucky:
RAND=1 XML=chris.xml
RAND=2 XML=phil.xml
// then for RAND=0 and XML=john.xml
ZOOM=Anything here
// content found but Return is empty
Return=
What's wrong?
I suggest importing the values into a database table, generating a single local file, or something like that, so that you don't have to open and parse all the XML files for each request.
Reading multiple files is a lot slower than reading a single file. And with a database, even the random logic can be moved to SQL.
You are currently using SimpleXML, but fetching a single value from an XML document is actually easier with DOM. SimpleXMLElement::xpath() only supports XPath expressions that return a node list, but DOMXPath::evaluate() can return the scalar value directly:
$document = new DOMDocument();
$document->load($xmlFile);
$xpath = new DOMXpath($document);
$zoomValue = $xpath->evaluate('string(//zoom[1])');
//zoom[1] will fetch the first zoom element node in a node list. Casting the list into a string will return the text content of the first node or an empty string if the list was empty (no node found).
For the sake of this example assume that you generated an XML like this
<zooms>
<zoom user="u1">z1</zoom>
<zoom user="u2">z2</zoom>
</zooms>
In this case you can use Xpath to fetch all zoom nodes and get a random node from the list.
$document = new DOMDocument();
$document->loadXml($xml);
$xpath = new DOMXpath($document);
$zooms = $xpath->evaluate('//zoom');
$zoom = $zooms->item(mt_rand(0, $zooms->length - 1));
var_dump(
[
'user' => $zoom->getAttribute('user'),
'zoom' => $zoom->textContent
]
);
Your main issue is that you are not returning any value when there is no zoom found.
$datas_zoom=zoom($r,$n,$rand); // no return keyword here!
When you're using recursion, you usually want to "chain" return values all the way up until you find the one you need. $datas_zoom is not a global variable, so it will not "leak" out of your function. Please read PHP's variable scope documentation for more info.
Then again, you're calling the zoom function with three arguments ($r, $n, $rand) while it only accepts two ($n and $rand). Also, $r is undefined, $n is not used at all, and you are most likely passing the same $rand value again and again, which obviously cannot work.
Also note that there are too many closing braces in your code.
I think the best approach for your problem is to shuffle the array and then take elements from it one at a time, without recursion (which should be slightly faster):
function zoom($tab_users) {
// shuffle an array once
shuffle($tab_users);
// init variable
$txt_zoom = null;
// repeat until zoom is found or there
// are no more elements in array
do {
$rand = array_pop($tab_users);
$datas_user = new SimpleXMLElement($rand, null, true);
$tag=$datas_user->xpath('/user');
//if zoom found
if($tag[0]->zoom !='') {
$txt_zoom=$tag[0]->zoom;
}
} while(!$txt_zoom && !empty($tab_users));
return $txt_zoom;
}
$datas_zoom = zoom($tab_users); // your zoom is here!
Please read more about php scopes, php functions and recursion.
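The shuffle-then-iterate idea can also be sketched with a plain foreach and an explicit string cast. This is only an illustration under assumptions: findRandomZoom() is a hypothetical name, <zoom> is assumed to be a direct child of the root <user> element, and the temporary files stand in for john.xml, chris.xml and phil.xml.

```php
<?php
// Hypothetical helper: returns the first non-empty <zoom> text found
// when visiting the files in random order, or null if none is filled.
function findRandomZoom(array $files): ?string
{
    shuffle($files); // works on the local copy; the caller's array is untouched
    foreach ($files as $file) {
        $user = simplexml_load_file($file);
        if ($user === false) {
            continue; // skip files that cannot be parsed
        }
        $zoom = trim((string) $user->zoom);
        if ($zoom !== '') {
            return $zoom;
        }
    }
    return null;
}

// Demo data written to temporary files (contents are made up).
$samples = [
    '<user><zoom>Some content here</zoom></user>',
    '<user><zoom/></user>',
    '<user><zoom/></user>',
];
$files = [];
foreach ($samples as $sample) {
    $path = tempnam(sys_get_temp_dir(), 'zoom');
    file_put_contents($path, $sample);
    $files[] = $path;
}

$datas_zoom = findRandomZoom($files);
array_map('unlink', $files); // clean up the temporary files
echo $datas_zoom;
```

Because every path through the function ends in a return, the caller always gets either the text or null, which avoids the lost return value from the recursive version.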
There's no reason for recursion. A simple loop would do.
$datas_user=new SimpleXMLElement($tab_users[$rand],null,true);
$tag=$datas_user->xpath('/user');
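// xpath() returns a plain array, so use count() instead of ->length;
// subtract 1 because rand() includes its upper bound.
$max = count($tag) - 1;
while(true) {
$test_index = rand(0, $max);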
if ($tag[$test_index]->zoom != "") {
break;
}
}
Of course, you might want to add a bit more logic to handle the case where NO zooms have text set, in which case the above would be an infinite loop.

PHP: How to change part of XML using DomElement

I am trying to make a function that changes part of an XML document using XPath. I used part of someone else's post:
/*********************************************************************
Function to replace part of an XML
**********************************************************************/
function replacePartofXML($element, $methodName, $methodValue, $xml, $newPartofXML)
{
$xpathstring = "//" . $element . "[@$methodName = \"$methodValue\"]";
$xml->xpath($xpathstring);
//$domToChange = dom_import_simplexml($xml->xpath($xpathstring));
$domToChange = dom_import_simplexml($xml);
$domReplace = dom_import_simplexml($newPartofXML);
$nodeImport = $domToChange->ownerDocument->importNode($domReplace, TRUE);
$domToChange->parentNode->replaceChild($nodeImport, $domToChange);
return($xml);
}
What I want to do is return the modified XML. I can't use dom_import_simplexml($xml->node->node) because my XML has many repeating elements (they have different IDs, which is why I am trying to use XPath).
The commented line does not work either, as xpath() returns an array and dom_import_simplexml() cannot import arrays.
Thanks for your input.
You can take the first element returned by xpath() in case you believe the target element is unique (no-element-returned checking omitted) :
$domToChange = dom_import_simplexml($xml->xpath($xpathstring)[0]);
or iterate through the return value of xpath() and replace one by one.
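A minimal sketch of the first option with the missing check added. The function name replaceElementByAttribute() and the sample documents are hypothetical; the sketch assumes the attribute value contains no double quotes, since it is interpolated directly into the XPath string.

```php
<?php
// Hypothetical helper: replaces the first element whose attribute matches.
// Returns true if a replacement happened, false if nothing matched.
function replaceElementByAttribute(
    SimpleXMLElement $xml,
    string $element,
    string $attribute,
    string $value,
    SimpleXMLElement $replacement
): bool {
    $matches = $xml->xpath("//{$element}[@{$attribute}=\"{$value}\"]");
    if (empty($matches)) {
        return false; // no matching element (or the query failed)
    }
    $target = dom_import_simplexml($matches[0]);
    $new = dom_import_simplexml($replacement);
    // The replacement lives in another document, so import it first.
    $imported = $target->ownerDocument->importNode($new, true);
    $target->parentNode->replaceChild($imported, $target);
    return true;
}

$xml = new SimpleXMLElement(
    '<items><item id="1">old</item><item id="2">keep</item></items>'
);
$replacement = new SimpleXMLElement('<item id="1">new</item>');
$replaced = replaceElementByAttribute($xml, 'item', 'id', '1', $replacement);

echo $xml->asXML();
```

SimpleXML and DOM share the same underlying tree, so the change made through the DOM node is visible in $xml without converting back.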

Remove tags with Simple HTML DOM parser [duplicate]

I would like to use Simple HTML DOM to remove all images in an article so I can easily create a small snippet of text for a news ticker but I haven't figured out how to remove elements with it.
Basically I would do
Get content as HTML string
Remove all image tags from content
Limit content to x words
Output.
Any help?
There are no dedicated methods for removing elements. You just find all the img elements and then do
$e->outertext = '';
When you only clear the outer text, you delete the HTML content itself, but if you perform another find on the same elements, they will still appear in the result. The reason is that the Simple HTML DOM object still holds its internal structure for the element, only without its actual content. To really delete the element, reload the HTML as a string into the same variable. This way the object is recreated without the deleted content, and the Simple HTML DOM tree is rebuilt without it.
Here is an example function:
public function removeNode($selector)
{
foreach ($this->find($selector) as $node)
{
$node->outertext = '';
}
$this->load($this->save());
}
Put this function inside the simple_html_dom class and you're good.
I think you are having difficulties because you forgot to save (dump the internal DOM tree back into a string).
Try this:
$html = file_get_html("http://example.com");
foreach($html ->find('img') as $item) {
$item->outertext = '';
}
$html->save();
echo $html;
I could not figure out where to put the function so I just put the following directly in my code:
$html->load($html->save());
It basically locks changes made in the for loop back into the html per above.
The supposed solutions are quite expensive and practically unusable in a big loop or other kind of repetition.
I prefer to use "soft deletes":
foreach ($html->find('somecondition') as $item) {
// $someCheck is a placeholder condition; the attribute is a marker to check in later code
if ($someCheck) $item->setAttribute('softDelete', true);
$item->outertext = '';
foreach ($foo as $bar) {
if (!$bar->getAttribute('softDelete')) {
//do something
}
}
}
This is working for me:
foreach($html->find('element') as $element){
$element = NULL;
}
Adding new answer since removeNode is definitely a better way of removing it:
$html->removeNode('img');
This method probably was not available when accepted answer was marked. You do not need to loop the html to find each one, this will remove them.
Use outerhtml instead of outertext
<div id='your_div'>the contents of your div</div>
$your_div->outertext = '';
echo $your_div; // echoes <div id='your_div'></div>
$your_div->outerhtml= '';
echo $your_div; // echoes nothing
Try this:
$dom = new Dom();
$dom->loadStr($text);
foreach ($dom->find('element') as $element) {
$element->delete();
}
This works now:
$element->remove();
You can see the documentation for the method here.
Below I remove the HEADER and all SCRIPT nodes of the incoming url by using 2 different methods of the FIND() function. Remove the 2nd parameter to return an array of all matching nodes then just loop through the nodes.
$clean_html = file_get_html($url);
// Find and remove 1st instance of node.
$node = $clean_html->find('header', 0);
$node->remove();
// Find and remove all instances of node.
$nodes = $clean_html->find('script');
foreach($nodes as $node) {
$node->remove();
}
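For comparison, the workflow from the question (remove images, then limit to x words) can be sketched with PHP's built-in DOM extension instead of Simple HTML DOM. This swaps in a different technique, not the library's own API; removeImages() is a hypothetical helper name and the sample markup is made up.

```php
<?php
// Hypothetical helper using the built-in DOM extension (not Simple HTML DOM).
// Returns the HTML fragment with every <img> element removed.
function removeImages(string $html): string
{
    if (trim($html) === '') {
        return '';
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc->loadHTML($html);
    libxml_clear_errors();

    // getElementsByTagName() returns a live list, so copy it before removing.
    foreach (iterator_to_array($doc->getElementsByTagName('img')) as $img) {
        $img->parentNode->removeChild($img);
    }

    // loadHTML() wraps fragments in <html><body>; return only the body's children.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$article = '<p>Breaking <img src="a.png" alt="x"> news from the ticker desk</p>';
$clean = removeImages($article);

// Limit the plain text to the first 3 words for the ticker snippet.
$words = preg_split('/\s+/', trim(strip_tags($clean)));
$snippet = implode(' ', array_slice($words, 0, 3));
echo $snippet;
```

Removing nodes from a real DOM tree means later queries no longer see them, so no reload step is needed as with the outertext approach.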
