Store the values of nested DOMNodes in a PHP array - php

I have the following html structure:
<span class="1">
<span class="name">
</span>
<span class="books">
<span class="english">
</span>
<span class="english">
</span>
</span>
</span>
<span class="2">
<span class="name">
</span>
<span class="books">
<span class="english">
</span>
<span class="english">
</span>
</span>
</span>
...
I am using the following function to retrieve it:
$oDomObject = $oDomXpath->query("//span[number(@class)=number(@class)]");
How can I store the values in a PHP array keeping the nesting order?
foreach ($oDomObject as $oObject) {
..*SOMETHING*..
}
Thank you for your help!

You will want to build a recursive function that resembles the following.
WARNING: Not tested, and it may require some tweaking, but it should point you in the right direction.
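$myArray = array();
foreach ($oDomObject as $oObject) {
    $myArray[] = getChildren($oObject);
}

function getChildren($nodeObj) {
    $retArray = array();
    // Walk the children; recursing on $nodeObj itself would never terminate.
    foreach ($nodeObj->childNodes as $childObj) {
        if ($childObj->hasChildNodes()) {
            // Nested node: append an array of its children's values.
            $retArray[] = getChildren($childObj);
        } elseif (trim($childObj->nodeValue) !== '') {
            // Leaf node: append its value, skipping whitespace-only text.
            $retArray[] = $childObj->nodeValue;
        }
    }
    return $retArray;
}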
What it does: if it encounters a node without children, it appends the node's value to the array. Otherwise, it appends an array of the children's values. This repeats at every level, however deep the nesting goes.
Things to think about:
What do I want my array to look like when this finishes? Beyond a few levels of depth, the result gets ridiculous and very annoying to traverse.
Why am I appending to an array, which I am likely to loop through again, instead of handling the desired operation right now?
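As a rough illustration of the recursive idea, here is a minimal sketch that uses PHP's bundled DOM extension. The helper name spanToArray, the sample text values, and the 'class'/'value' array layout are assumptions made for this example, not something given in the question.

```php
<?php
// Sample markup modeled on the question; the text values are made up.
$html = '<span class="1"><span class="name">Alice</span>'
      . '<span class="books"><span class="english">Emma</span>'
      . '<span class="english">Ulysses</span></span></span>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hypothetical helper: turn an element into a nested array or a string.
function spanToArray(DOMElement $element)
{
    $children = array();
    foreach ($element->childNodes as $child) {
        if ($child instanceof DOMElement) {
            // Keep document order; store the class next to the value.
            $children[] = array(
                'class' => $child->getAttribute('class'),
                'value' => spanToArray($child),
            );
        }
    }
    // A leaf element has no element children, so return its text.
    return $children ? $children : trim($element->textContent);
}

$result = array();
foreach ($xpath->query('//span[number(@class) = number(@class)]') as $span) {
    // Top-level spans are keyed by their numeric class.
    $result[$span->getAttribute('class')] = spanToArray($span);
}

print_r($result);
```

Storing children as a list rather than keying them by class keeps repeated classes such as "english" from overwriting each other.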

Related

Php simple html dom parser find string with any character

I have this html
<div class="price-box">
<p class="old-price">
<span class="price-label">This:</span>
<span class="price" id="old-price-326">
8,69 € </span>
</p>
<p class="special-price">
<span class="price-label">This is:</span>
<span class="price" id="product-price-326">
1,99 € </span> <span style="">/ 6.87 </span>
</p>
</div>
I need to get "1,99 €", but the numeric part of the id 'product-price-326' changes randomly. How can I match 'product-price-*'? I tried
foreach($preke->find('span[id="product-price-[0-9]"]') as $div)
and
foreach($preke->find('span[id="product-price-"]') as $div)
but it doesn't work.
As per my comment, here's what you need to do:
foreach($preke->find('span[id^="product-price-"]') as $div) {} // note the ^ before the =
^= means starts with.
I am not sure what $preke is, but if it's a DOM selector that supports proper attribute selectors you can use
$preke->find('span[id^="product-price"]')
or
$preke->find('span[id*="product-price"]')
The ^= tells it to look for elements that have an ID starting with "product-price", and the *= tells it to look for elements that have an ID containing "product-price".
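For comparison, the same "starts with" and "contains" matching can be sketched with PHP's bundled DOM extension instead of the asker's parser. The markup below is trimmed from the question, and the euro sign is left out only to keep the sketch free of encoding concerns.

```php
<?php
// Trimmed sample markup; the euro sign is omitted on purpose.
$html = '<div class="price-box">'
      . '<p class="old-price"><span class="price" id="old-price-326">8,69</span></p>'
      . '<p class="special-price"><span class="price" id="product-price-326">1,99</span></p>'
      . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// starts-with() plays the role of the CSS ^= operator.
$prefixMatches = $xpath->query('//span[starts-with(@id, "product-price-")]');

// contains() plays the role of the CSS *= operator; "price-" occurs in both ids.
$substringMatches = $xpath->query('//span[contains(@id, "price-")]');

foreach ($prefixMatches as $span) {
    echo trim($span->textContent), "\n";
}
```

Only the span whose id begins with "product-price-" is printed, while the substring query also picks up "old-price-326".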
Try it like this; it might work:
foreach($preke->find('span[id^="product-price-"]') as $div) { /* Code */ }
Why not get it using the class?
echo $preke->find('.special-price', 0)->find('.price', 0)->plaintext;
This will get you 1,99 €.

Remove the matching closing </span> when the expression contains more than one

I have a very large piece of code that does not work well with one specific HTML structure.
There is an expression:
<span class="*0">
<span class="*1">TEXT</span>
...
<span class="*2">TEXT</span>
</span>
There is a regular expression:
$mstr = '#<span class="0">(.*?)</span>#';
What is needed:
Remove the outer span (<span class="*0">) together with its correct closing tag.
My regex stops at the first closing tag in the row :(
Here is a solution. I don't know if it fits your needs, but it does the job. It simply looks for all the starting tags and closing tags, stores their substring positions and pairs them. Then it removes the tag with the class you need.
One note: if a tag is not properly closed, this could fail, so I would suggest you build in some safety measures.
$start_pos=stripos($var,'<span class="*0">');
$len=strlen($var);
$str_len=strlen('<span class="*0">');
// Collect the position of every opening <span tag.
$open_pos=array();
$offset=0;
do{
    $p=stripos($var,'<span',$offset);
    if($p===false){break;}
    $open_pos[]=$p;
    $offset=$p+1;
}while($offset<$len);
// Collect the position of every closing </span> tag.
$close_pos=array();
$offset=0;
do{
    $p=stripos($var,'</span>',$offset);
    if($p===false){break;}
    $close_pos[]=$p;
    $offset=$p+1;
}while($offset<$len);
// Pair off opening and closing tags until the target's closing tag is found.
$t=0;
do{
    $change=false;
    for($i=0;$i<count($open_pos)-1;$i++){
        foreach($close_pos as $k=>$v){
            if($open_pos[$i+1]>$v){
                if($open_pos[$i]==$start_pos){
                    $end_pos=$v;
                    break 3;
                }
                unset($open_pos[$i],$close_pos[$k]);
                $open_pos=array_values($open_pos);
                $close_pos=array_values($close_pos);
                $change=true;
                break 2;
            }
        }
    }
    if($open_pos[$i]!=$start_pos){
        unset($open_pos[$i],$close_pos[0]);
        $open_pos=array_values($open_pos);
        $close_pos=array_values($close_pos);
        $change=true;
    }
    else{
        $end_pos=$close_pos[0];
        break; // only the do-while loop encloses this point
    }
    if(count($open_pos)<2)break;
    $t++;
}while($t<1000); // safety cap on the number of passes
// Replace the closing tag first so $start_pos stays valid.
$var=substr_replace($var,'###',$end_pos,7);
$var=substr_replace($var,'###',$start_pos,$str_len);
echo $var;
Tested on this beautiful HTML:
$var='<span class="*A">a
<span class="*B">b
<span class="*E">e</span>
<span class="*C">c
<span class="*D">d
<span class="*E">e</span>
<span class="*0">BEFORE THIS ONE
<span class="*F">a</span>
<span class="*G">g
<span class="*H">h
<span class="*J">j</span>
</span>
<span class="*K">k</span>
<span class="*L">l</span>
<span class="*M">m</span>
_GGG</span>
<span class="*N">n</span>
BETWEEN</span>BETWEEN
<span class="*O">o
<span class="*P">p</span>
_OOO</span>
</span>
_CCC</span>
<span class="*Q">q
<span class="*R">r</span>
_RRR</span>
</span>
</span>
';
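The same goal, finding the closing tag that belongs to one particular opening span, can also be sketched with a depth counter instead of paired position lists. This is an alternative approach, not the answer's method; the helper name findClosingSpan and the short sample string are assumptions for illustration, and like the code above it expects balanced tags.

```php
<?php
// Hypothetical helper: return the offset of the </span> that closes the
// <span ...> starting at $startPos, or false if the markup is unbalanced.
function findClosingSpan($html, $startPos)
{
    $depth = 0;
    $offset = $startPos;
    // Match either an opening <span ...> tag or a closing </span> tag.
    while (preg_match('#<(/?)span\b[^>]*>#i', $html, $m, PREG_OFFSET_CAPTURE, $offset)) {
        // Opening tags go one level deeper, closing tags come back up.
        $depth += ($m[1][0] === '/') ? -1 : 1;
        if ($depth === 0) {
            return $m[0][1];
        }
        $offset = $m[0][1] + strlen($m[0][0]);
    }
    return false;
}

$var = '<span class="*A">a<span class="*0">x<span class="*1">TEXT</span>'
     . '<span class="*2">TEXT</span>y</span>z</span>';
$open = '<span class="*0">';

$startPos = strpos($var, $open);
$endPos = ($startPos === false) ? false : findClosingSpan($var, $startPos);

if ($endPos !== false) {
    // Replace the closing tag first so $startPos stays valid.
    $var = substr_replace($var, '###', $endPos, strlen('</span>'));
    $var = substr_replace($var, '###', $startPos, strlen($open));
}
echo $var, "\n";
```

Because the counter starts at the target's own opening tag, the first time it returns to zero marks that tag's matching close, no matter how many spans are nested inside.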

Find all elements except for those with certain class with simple_html_dom.php

I am using the simple_html_dom parser and I want to fetch data from html code that looks like this:
<pre class="root">
<span class="B bgB"></span>
<span class="B bgB"></span>
<span class="B bgB"></span>
<span class="B bgB"></span>
<span class="W"></span>
<span class="Y DH"> </span>
<span class="Y DH">Some text</span>
</pre>
etc..
But I only want to get the content from the ones without the bgB class. So far I have this code:
$elements = $html->find('pre.root span[class!=bgB]');
But all spans are fetched and later printed, not only the ones without the bgB class. How can I accomplish this?
It can't be done with simple_html_dom, but if you switch to a parser that supports it, you can use the CSS :not() pseudo-class:
$html = str_get_html($str);
$elements = $html->find('pre.root span:not(.bgB)');
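If switching parsers is not an option, a similar exclusion can be sketched with PHP's bundled DOM extension and XPath. This swaps in a different technique from the answer's; the text inside the sample spans is made up for the example.

```php
<?php
// Sample markup modeled on the question; the span contents are made up.
$html = '<pre class="root">'
      . '<span class="B bgB">skip</span>'
      . '<span class="W">keep 1</span>'
      . '<span class="Y DH">Some text</span>'
      . '</pre>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Pad the class list with spaces so "bgB" only matches as a whole token.
$query = '//pre[@class="root"]/span'
       . '[not(contains(concat(" ", normalize-space(@class), " "), " bgB "))]';

$texts = array();
foreach ($xpath->query($query) as $span) {
    $texts[] = $span->textContent;
}
print_r($texts);
```

The space padding matters because a plain contains(@class, "bgB") would also exclude unrelated classes that merely include those letters.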

Scrape HTML & count children using Simple HTML DOM

I'm trying to collect data from a website, and want to count the number of elements inside another element. Targeting different DOM elements works fine, but for some reason the $count variable in the example below stays at "0". I'm probably missing something really silly, but I can't seem to find it.
The HTML on the website is as follows:
<div id="list_options">
<div class="list_mtgdef_option pointer">
<div class="list_mtgdef_foildesc shadow">
</div>
<div class="list_mtgdef_stock tooltip">
<div class="list_mtgdef_stock_left">
<span class="foil008469_1 block "></span>
<span class="foil008469_2 block transparency_25"></span>
</div>
<div class="list_mtgdef_stock_right">
<span class="008469_8 block "></span>
<span class="008469_7 block "></span>
<span class="008469_6 block "></span>
<span class="008469_5 block "></span>
<span class="008469_4 block "></span>
<span class="008469_3 block "></span>
<span class="008469_2 block "></span>
<span class="008469_1 block "></span>
</div>
</div>
</div>
</div>
And this is the php I'm using:
$array = array();
foreach($html->find('#list_options .list_mtgdef_option') as $element) {
$count = 0;
foreach($element->find('.list_mtgdef_stock', 0)->childNodes(1)->childNodes as $node) {
if(!($node instanceof \DomText))
$count++;
}
$row = array(
'stock' => strval($count)
);
array_push($array, $row);
}
echo json_encode($array);
You can just do:
count($element->find('.list_mtgdef_stock > *[2] > *'))
//=>8
Silly indeed: you are missing the () after the second childNodes:
$element->find('.list_mtgdef_stock', 0)->childNodes(1)->childNodes()
I solved it by counting the element's children:
$element= $element->find('.list_mtgdef_stock', 0);
count($element->children())
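The counting step can also be sketched with PHP's bundled DOM extension, where XPath's count() does the work. This is an alternative to the Simple HTML DOM calls above; the markup is a shortened version of the question's, with the span classes simplified.

```php
<?php
// Shortened markup; span classes are simplified for the example.
$html = '<div id="list_options"><div class="list_mtgdef_option pointer">'
      . '<div class="list_mtgdef_stock tooltip">'
      . '<div class="list_mtgdef_stock_left"><span></span><span></span></div>'
      . '<div class="list_mtgdef_stock_right">'
      . str_repeat('<span class="block"></span>', 8)
      . '</div></div></div></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$rows = array();
$options = $xpath->query('//*[@id="list_options"]/div[contains(@class, "list_mtgdef_option")]');
foreach ($options as $option) {
    // evaluate() returns a float for count(); "*" counts element children only.
    $count = $xpath->evaluate(
        'count(.//div[contains(@class, "list_mtgdef_stock_right")]/*)',
        $option
    );
    $rows[] = array('stock' => strval((int) $count));
}
echo json_encode($rows), "\n";
```

Using "*" in the path sidesteps the text-node filtering that the original loop did by hand.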

php preg_replace the last link of two

I have many links generated by a foreach loop. Each iteration outputs a DOM tree like:
<span id="span1">
<a(.*?)/test/(.*?)>word1</a>
</span>
<span id="span2">
<a(.*?)/fold/(.*?)>word2</a>
</span>
Now I want to replace the last of the two links, so that the whole code becomes:
<span id="span1">
<a(.*?)/test/(.*?)>word1</a><!-- keep this link, do not replace it -->
</span>
<span id="span2">
word2
</span>
My preg_replace code here:
$code = '<span>test1</span><span>test2</span>';
echo preg_replace('%href="(.*?)/fold/(.*?)"%', 'href="#" class="replaced" title="$2"', $code);
I want to get code like:
<span id="span1">
test1
</span>
<span id="span2">
test2
</span>
But it outputs <span id="span1">word2</span>, which is not what I expected. How can I do this properly? Thanks.
This will work (fixed):
preg_replace('(href="(.*?)/fold/(.*?)">(.*?)</a>)', 'href="#" class="replaced" title="$3">$3</a>', $code);
Thanks for onatm's suggestion. In the end, I used simple_html_dom to check each span and got the code I needed.
$code = <<<EOT
<span id="span1">word1</span><span id="span2">word2</span>
EOT;
$html = str_get_html($code);
if($html->find("span[id=span1]")) {
foreach($html->find("span[id=span1]") as $data1)
$result1 = $data1;
}
if($html->find("span[id=span2]")) {
foreach($html->find("span[id=span2]") as $data2)
$result2 = preg_replace('%href="(.*?)/fold/(.*?)">(.*?)</a>%', 'href="#" class="replaced" title="$3">$3</a>', $data2);
}
echo $result1.''.$result2;
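As a regex-free variant of the same idea, the link in the second span can be replaced by its text with PHP's bundled DOM extension. This is only a sketch: the href values are made up, since the original links were not shown in full.

```php
<?php
// The href values are made up for this example.
$code = '<span id="span1"><a href="/a/test/1">word1</a></span>'
      . '<span id="span2"><a href="/b/fold/2">word2</a></span>';

$doc = new DOMDocument();
$doc->loadHTML($code);
$xpath = new DOMXPath($doc);

// Copy the matches into an array before modifying the tree.
$links = iterator_to_array(
    $xpath->query('//span[@id="span2"]/a[contains(@href, "/fold/")]')
);
foreach ($links as $link) {
    // Swap the whole <a> element for a plain text node with the same text.
    $text = $doc->createTextNode($link->textContent);
    $link->parentNode->replaceChild($text, $link);
}

$result = '';
foreach ($xpath->query('//span') as $span) {
    $result .= $doc->saveHTML($span);
}
echo $result, "\n";
```

The link in span1 is left untouched because the query only selects anchors inside span2 whose href contains "/fold/".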
