how to create child nodes with php xpath query - php

My code looks something like this:
$b = $xpath->query('//td[@class="abc"]');
foreach ($b as $b2) { /*contents*/ }
Each td contains the following:
<h2><a href="http://www.example.com">Link Title</a>
<span class="spanclass">Spinning</span></h2>
<div class="xyz">ABC XYZ abc xyz</div>
<span class="phr">Style</span>red
If I print it with "print_r($b2->nodeValue);", the output looks like this:
Link Title
Spinning
ABC XYZ abc xyz
Style red
I then tried this code inside the loop over each $b2:
foreach ($xpath->query('.//a[@href]', $b2) as $child) {
print_r($child->nodeValue);
}
The output is:
Link Title
I need this information from there:
$title = 'Link Title';
$titlelink = 'http://www.example.com';
$spanclassinfo = 'Spinning';
$stylerandom = 'red';
Any help?
Thanks in advance.

You should check whether the element has attributes, with something like this.
Get all attributes of each element:
foreach ($b as $b2) {
if ($b2->hasAttributes()) {
foreach ($b2->attributes as $attr) {
$name = $attr->nodeName;
$value = $attr->nodeValue;
echo 'Attribute ' . $name . ' - ' . $value . '<br />';
}
}
}
Get specific attributes from an element
foreach ($b as $b2) {
if ($b2->hasAttributes()) {
$titlelink = $b2->attributes->getNamedItem('href')->value;
}
}
You'll need to add in some checks to make sure that the attribute you're trying to select actually exists. That way you're not trying to grab the href attribute on a span tag.
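As a rough sketch of how the four values from the question could be pulled out with relative XPath queries: the markup below is reconstructed from the question, the `<a href>` inside the `<h2>` is an assumption, and `extractCellInfo` is a hypothetical helper name, not part of any library.

```php
<?php
// Sample markup reconstructed from the question; the <a href> is assumed.
$html = '<table><tr><td class="abc">
<h2><a href="http://www.example.com">Link Title</a>
<span class="spanclass">Spinning</span></h2>
<div class="xyz">ABC XYZ abc xyz</div>
<span class="phr">Style</span>red
</td></tr></table>';

// Hypothetical helper: returns one array of values per matching td.
function extractCellInfo(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep warnings about sloppy HTML quiet
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $rows = [];
    foreach ($xpath->query('//td[@class="abc"]') as $td) {
        // The leading "." and the second argument keep each query inside this td.
        $link = $xpath->query('.//a[@href]', $td)->item(0);
        $rows[] = [
            'title'         => $link ? trim($link->nodeValue) : null,
            'titlelink'     => $link ? $link->getAttribute('href') : null,
            'spanclassinfo' => trim($xpath->evaluate('string(.//span[@class="spanclass"])', $td)),
            // The loose text after <span class="phr"> is its first following text sibling.
            'stylerandom'   => trim($xpath->evaluate('string(.//span[@class="phr"]/following-sibling::text()[1])', $td)),
        ];
    }
    return $rows;
}

print_r(extractCellInfo($html));
```

The `$link ? ... : null` checks play the role of the attribute checks mentioned above: a td without a link yields null instead of an error.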

Related

How Can i get the child element using class using php DOMXPath?

I want to get a child element with a specific class from HTML. I have managed to find elements by tag name, but I can't figure out how to get the child element with a specific class.
Here is my code:
<?php
$html = file_get_contents('myfileurl'); //get the html returned from the following url
$pokemon_doc = new DOMDocument();
libxml_use_internal_errors(TRUE); //disable libxml errors
if (!empty($html)) { //if any html is actually returned
$pokemon_doc->loadHTML($html);
libxml_clear_errors(); //remove errors for yucky html
$pokemon_xpath = new DOMXPath($pokemon_doc);
//get all the li elements with class "content"
$pokemon_row = $pokemon_xpath->query("//li[@class='content']");
if ($pokemon_row->length > 0) {
foreach ($pokemon_row as $row) {
$title = $row->getElementsByTagName('h3');
foreach ($title as $a) {
echo "Title: ";
echo strip_tags($a->nodeValue). '<br>';
}
$links = $row->getElementsByTagName('a');
foreach ($links as $l) {
echo "Link: ";
echo strip_tags($l->nodeValue). '<br>';
}
$desc = $row->getElementsByTagName('span');
//I tried this but it didn't work; I want to get the span with class desc
//$desc = $row->query("//span[@class='desc']");
foreach ($desc as $d) {
echo "DESC: ";
echo strip_tags($d->nodeValue) . '<br><br>';
}
// echo $row->nodeValue . "<br/>";
}
}
}
?>
Please let me know in the comments if this is a duplicate (I couldn't find one), or if the question is unclear or not explained well.
Thanks.
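One likely reason the commented-out attempt fails is that query() is a method of DOMXPath, not of the row element, and a path starting with "//" searches the whole document. A minimal sketch of a relative query follows; the markup and the helper name `descriptionsPerRow` are made up for illustration.

```php
<?php
// Hypothetical markup standing in for the page in the question.
$html = '<ul>
<li class="content"><h3>First</h3><a href="/one">One</a>
<span class="meta">x</span><span class="desc">First description</span></li>
<li class="content"><h3>Second</h3><a href="/two">Two</a>
<span class="desc">Second description</span></li>
</ul>';

// Hypothetical helper: collects the text of every span.desc inside each li.content.
function descriptionsPerRow(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $out = [];
    foreach ($xpath->query("//li[@class='content']") as $row) {
        // Call query() on the DOMXPath object and pass $row as the context node;
        // the leading "." restricts the search to descendants of $row.
        foreach ($xpath->query(".//span[@class='desc']", $row) as $span) {
            $out[] = trim($span->nodeValue);
        }
    }
    return $out;
}

print_r(descriptionsPerRow($html));
```

Note that `@class='desc'` only matches when the attribute is exactly "desc". If the span may carry several classes, a predicate such as `contains(concat(' ', normalize-space(@class), ' '), ' desc ')` is a common workaround.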

Simple HTML DOM Not Finding DIV

I have code that tries to extract the Event SKU from the Robot Events page; here is an example. The code I am using doesn't find any SKU on the page. The SKU is on line 411, in a div with the class "product-sku". My code doesn't even find the div on the page and just downloads all the events. Here is my code:
<?php
require('simple_html_dom.php');
$html = new simple_html_dom();
if(!$events)
{
echo mysqli_error($con);
}
while($event = mysqli_fetch_row($events))
{
$htmldown = file_get_html($event[4]);
$html->load($htmldown);
echo "Downloaded";
foreach ($html->find('div[class=product-sku]') as $row) {
$sku = $row->plaintext;
echo $sku;
}
}
?>
Can anyone help me fix my code?
This code uses PHP's DOMDocument class. It works for the sample HTML below. Please try it.
// new dom object
$dom = new DOMDocument();
// HTML string
$html_string = '<html>
<body>
<div class="product-sku1" name="div_name">The this the div content product-sku</div>
<div class="product-sku2" name="div_name">The this the div content product-sku</div>
<div class="product-sku" name="div_name">The this the div content product-sku</div>
</body>
</html>';
//load the html
$html = $dom->loadHTML($html_string);
//keep white space (this is the default)
$dom->preserveWhiteSpace = TRUE;
//get all the divs by tag name
$divs = $dom->getElementsByTagName('div');
// loop over the all DIVs
foreach ($divs as $div) {
if ($div->hasAttributes()) {
foreach ($div->attributes as $attribute){
if($attribute->name === 'class' && $attribute->value == 'product-sku'){
// Print DIV class name and content
echo 'DIV Class Name: '.$attribute->value.PHP_EOL;
echo 'DIV Content: '.$div->nodeValue.PHP_EOL;
}
}
}
}
I would use a regex (regular expression) to accomplish pulling skus out.
The regex:
preg_match('~<div class="product-sku"><b>Event Code:</b>(.*?)</div>~',$html,$matches);
See php regex docs.
New code:
<?php
if(!$events)
{
echo mysqli_error($con);
}
while($event = mysqli_fetch_row($events))
{
$htmldown = curl_init($event[4]);
curl_setopt($htmldown, CURLOPT_RETURNTRANSFER, true);
$html=curl_exec($htmldown);
curl_close($htmldown);
echo "Downloaded";
preg_match('~<div class="product-sku"><b>Event Code:</b>(.*?)</div>~',$html,$matches);
foreach ($matches as $row) {
echo $row;
}
}
?>
And actually in this case (using that webpage), since there is only one SKU,
instead of:
foreach ($matches as $row) {
echo $row;
}
You could just use: echo $matches[1]; (The reason for array index 1 is because the whole regex pattern plus the sku will be in $matches[0] but just the subgroup containing the sku is in $matches[1].)
Try using:
require('simple_html_dom.php');
$html = new simple_html_dom();
if(!$events)
{
echo mysqli_error($con);
}
while($event = mysqli_fetch_row($events))
{
$htmldown = file_get_html($event[4]); // $event[4] is a URL; str_get_html() expects an HTML string
echo "Downloaded";
foreach ($htmldown->find('div[class=product-sku]') as $row) {
$sku = $row->plaintext;
echo $sku;
}
}
and if class "product-sku" is only for div's then you can use
$htmldown->find('.product-sku')

i have an article driven from a db and it contains image tag, i need a regex to grab that image and put it into anchor tags

Some text before <img style="float: left;" src="../images/265218_imgthw.jpg" alt="" width="81" height="88"> some text after
Say I have something like that inside some text coming from the db. I need it to be changed to:
Some text before some text after
Any help, please?
I tried splitting the text on the img tag with explode. It works fine, but I need to reduce the code using a regex. This is how I do it:
$string = "<img";
$explored = explode($string,$desc);
$desc = "";
foreach ($explored as $key => $value) {
static $counter = 0;
if(strpos($explored[$key], '/>')){
$desc .= "<a href='".$path.$image[$counter]."' class='lightview'><img".$explored[$key];
$counter++;
} else $desc .= $explored[$key]."<a href='".$path.$image[$key]."' class='lightview'><img";
}
$string = "/>";
$explored = explode($string,$desc);
$desc = "";
foreach ($explored as $key => $value) {
if(strpos($explored[$key], '<img'))
{
$desc .= $explored[$key]."/></a>";
}
else $desc .= $explored[$key];
}
You can do it using Simple HTML DOM. Suppose the img element is in a variable named $t:
// '$t' is the element to be wrapped with anchor
$tempHtml = str_get_html('<a>' . $t->outertext . '</a>');
$link = $tempHtml->find('a', 0);
// '$src' is the url of the link
$link->href = $image['src'];
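For comparison, here is a sketch that wraps every img in an anchor using PHP's built-in DOMDocument instead of a regex or Simple HTML DOM. The helper name `wrapImagesInLinks` is hypothetical, and using the image's own src as the link target is an assumption, since the question builds the href from `$path` and `$image`, which are not shown in full.

```php
<?php
// Hypothetical helper: wraps each <img> in <a href="..." class="...">.
// Assumption: the link target is the image's own src attribute.
function wrapImagesInLinks(string $fragment, string $class = 'lightview'): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    // Wrap the fragment in a div so it can be located again after parsing.
    $doc->loadHTML('<div>' . $fragment . '</div>');
    libxml_clear_errors();

    // Copy the node list to an array first: the list is live and
    // would shift under us while the tree is being edited.
    $images = iterator_to_array($doc->getElementsByTagName('img'));
    foreach ($images as $img) {
        $a = $doc->createElement('a');
        $a->setAttribute('href', $img->getAttribute('src'));
        $a->setAttribute('class', $class);
        $img->parentNode->replaceChild($a, $img); // put <a> where <img> was
        $a->appendChild($img);                    // then move <img> inside it
    }

    // Serialize only the children of the wrapper div, not the whole document.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo wrapImagesInLinks('Some text before <img style="float: left;" src="../images/265218_imgthw.jpg" alt="" width="81" height="88"> some text after');
```

A DOM-based approach does not depend on whether the tag ends in ">" or "/>", which is what the explode version above relies on.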

How to get only required xpath elements?

I have an xml file that I want to store a node's rank attribute in a variable.
I tried:
echo $var = $xmlobj->xpath("//Listing[@rank]");
to no avail; it just prints ArrayArray.
How can this be done?
if($xmlobj = simplexml_load_string(file_get_contents($xml_feed)))
{
foreach($xmlobj as $listing)
{
// echo 'Session ID: ' . $sessionId = $listing->sessionId . '<br />';
// echo 'Result Set: ' . $ResultSet = $listing->ResultSet . '<br />';
print_r($xmlobj->xpath("//Listing[@rank]"));
// $result = $xmlobj->xpath("/page/");
// print_r($result);
}
}
Henrik's suggestion:
foreach($xmlobj as $listing)
{
$var = $xmlobj->xpath("//Listing[@rank]");
foreach ($var as $xmlElement)
{
echo (string)$xmlElement;
}
}
Here you go
<page>
<ResultSet id="adListings" numResults="3">
<Listing rank="1" title="Reliable Local Moving Company" description="TEST." siteHost="www.example.com">
</Listing>
Edit after playing around with the posted example xml:
My initial answer was somewhat off track: casting to string would give you the inner text of the selected elements, if they have any (not the case here).
"//Listing[@rank]" selects all 'Listing' elements that have a 'rank' attribute. If you want to select the attributes themselves, use "//Listing/@rank".
To output an attribute, use the SimpleXMLElement with array syntax: $xmlElement['rank']
So in your case:
foreach($xmlobj as $listing)
{
$var = $xmlobj->xpath("//Listing/@rank");
foreach ($var as $xmlElement)
{
echo $xmlElement['rank'];
}
}
or
foreach($xmlobj as $listing)
{
$var = $xmlobj->xpath("//Listing[@rank]");
foreach ($var as $xmlElement)
{
echo $xmlElement['rank'];
echo $xmlElement['title']; // Added to demonstrate difference
}
}
should work.
In the first case, the $xmlElement would only contain the 'rank' attribute while in the second, it would contain the complete 'Listing' element (hence allowing the title output).
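A self-contained sketch of the second variant, storing each rank in a plain PHP string: the XML is the excerpt from the question with closing tags added so that it parses, and `listingRanks` is a hypothetical helper name.

```php
<?php
// The excerpt from the question, with closing tags added (an assumption) so it is well-formed.
$xml = '<page>
<ResultSet id="adListings" numResults="3">
<Listing rank="1" title="Reliable Local Moving Company" description="TEST." siteHost="www.example.com">
</Listing>
</ResultSet>
</page>';

// Hypothetical helper: returns the rank attribute of every Listing that has one.
function listingRanks(string $xml): array
{
    $page = simplexml_load_string($xml);
    $ranks = [];
    // xpath() returns an array of SimpleXMLElement objects, which is why
    // echoing the result directly prints "Array".
    foreach ($page->xpath('//Listing[@rank]') as $listing) {
        // Attributes are read with array syntax; cast to get a plain string.
        $ranks[] = (string) $listing['rank'];
    }
    return $ranks;
}

print_r(listingRanks($xml));
```

Because "//" already searches the whole document, a single loop over the xpath() result is enough; an outer loop over `$xmlobj` would only repeat the same query.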

Need some help with XML parsing

The XML feed is located at: http://xml.betclick.com/odds_fr.xml
I need a php loop to echo the name of the match, the hour, and the bets options and the odds links.
The function will select and display ONLY the matchs of the day with streaming="1" and the bets type "Ftb_Mr3".
I'm new to xpath and simplexml.
Thanks in advance.
So far I have:
<?php
$xml_str = file_get_contents("http://xml.betclick.com/odds_fr.xml");
$xml = simplexml_load_string($xml_str);
// need xpath magic
$xml->xpath();
// display
?>
XPath is pretty simple once you get the hang of it.
You basically want to get every match tag with a certain attribute:
//match[@streaming=1]
This will work perfectly: it gets every match tag anywhere in the document whose streaming attribute equals 1.
I just realised you also want matches with a bet type of "Ftb_Mr3":
//match[@streaming=1]/bets/bet[@code="Ftb_Mr3"]
This returns the bet node, though; we want the match, which we know is the grandparent:
//match[@streaming=1]/bets/bet[@code="Ftb_Mr3"]/../..
The two dots work like they do in file paths and get us back to the match.
Now, to work this into your sample, just change the final bit to:
// need xpath magic
$nodes = $xml->xpath('//match[@streaming=1]/bets/bet[@code="Ftb_Mr3"]/../..');
foreach($nodes as $node) {
echo $node['name'].'<br/>';
}
to print all the match names.
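An alternative to climbing back up with "/../.." is a nested predicate, which keeps the match as the selected node from the start. The sketch below runs on a made-up feed excerpt: the element nesting (sport, event, match, bets, bet, choice), the attribute names, and the helper name `streamedMatches` are assumptions based on this thread, not a verified copy of the real feed.

```php
<?php
// Hypothetical feed excerpt; structure and values are invented for illustration.
$feed = '<sports>
<sport name="Football"><event name="Ligue 1">
<match name="A - B" streaming="1" start_date="2011-05-01T20:00:00">
<bets><bet code="Ftb_Mr3" name="Match Result">
<choice name="%1%" odd="1.80"/><choice name="Draw" odd="3.20"/><choice name="%2%" odd="4.10"/>
</bet></bets></match>
<match name="C - D" streaming="0" start_date="2011-05-01T21:00:00">
<bets><bet code="Ftb_Mr3" name="Match Result"><choice name="%1%" odd="2.00"/></bet></bets>
</match>
</event></sport>
</sports>';

// Hypothetical helper: match name => (choice name => odd) for streamed matches
// that offer the given bet code.
function streamedMatches(string $feed, string $betCode = 'Ftb_Mr3'): array
{
    $xml = simplexml_load_string($feed);
    $result = [];
    // The second predicate filters matches by a descendant bet without leaving the match node.
    $matches = $xml->xpath('//match[@streaming=1][bets/bet[@code="' . $betCode . '"]]');
    foreach ($matches as $match) {
        $odds = [];
        // A path without a leading slash is evaluated relative to $match.
        foreach ($match->xpath('bets/bet[@code="' . $betCode . '"]/choice') as $choice) {
            $odds[(string) $choice['name']] = (float) $choice['odd'];
        }
        $result[(string) $match['name']] = $odds;
    }
    return $result;
}

print_r(streamedMatches($feed));
```

Restricting the result to today's matches would need an extra comparison on the start_date attribute in PHP, which is left out here.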
I don't know how to work xpath really, but if you want to 'loop it', this should get you started:
<?php
$xml = simplexml_load_file("odds_fr.xml");
foreach ($xml->children() as $child)
{
foreach ($child->children() as $child2)
{
foreach ($child2->children() as $child3)
{
foreach($child3->attributes() as $a => $b)
{
echo $a,'="',$b,"\"<br/>";
}
}
}
}
?>
That gets you to the 'match' tag, which has the 'streaming' attribute. I don't really know what 'matches of the day' are, either, but...
It's basically right out of the W3Schools reference:
http://www.w3schools.com/PHP/php_ref_simplexml.asp
I am using this on a project, scraping Betclic odds with:
<?php
$match_csv = fopen('matches.csv', 'w');
$bet_csv = fopen('bets.csv', 'w');
$xml = simplexml_load_file('http://xml.cdn.betclic.com/odds_en.xml');
$bookmaker = 'Betclick';
foreach ($xml as $sport) {
$sport_name = $sport->attributes()->name;
foreach ($sport as $event) {
$event_name = $event->attributes()->name;
foreach ($event as $match) {
$match_name = $match->attributes()->name;
$match_id = $match->attributes()->id;
$match_start_date_str = str_replace('T', ' ', $match->attributes()->start_date);
$match_start_date = strtotime($match_start_date_str);
if (!empty($match->attributes()->live_id)) {
$match_is_live = 1;
} else {
$match_is_live = 0;
}
if ($match->attributes()->streaming == 1) {
$match_is_running = 1;
} else {
$match_is_running = 0;
}
// pass the fields as an array so that names containing commas stay in one column
$match_row = array($match_id, $bookmaker, $sport_name, $event_name, $match_name, $match_start_date, $match_is_live, $match_is_running);
fputcsv($match_csv, array_map('strval', $match_row));
foreach ($match as $bets) {
foreach ($bets as $bet) {
$bet_name = $bet->attributes()->name;
foreach ($bet as $choice) {
// team numbers are surrounded by %, we strip them
$choice_name = str_replace('%', '', $choice->attributes()->name);
// get the float value of the odds
$odd = (float)$choice->attributes()->odd;
// collect the fields of the row to be written to the csv file
$bet_row = array($match_id, $bet_name, $choice_name, $odd);
fputcsv($bet_csv, array_map('strval', $bet_row));
}
}
}
}
}
}
fclose($match_csv);
fclose($bet_csv);
?>
I then load the csv files into MySQL. It runs once a minute and works great so far.
