Get content inside nested div class using Goutte PHP

Sorry for my bad English.
I want to scrape some content from a website, but the div classes are nested and that confuses me.
Basically, the structure is:
<div id="gsc_vcd_table">
<div class="gs_scl">
<div class="gsc_vcd_field">
Pengarang
</div>
<div class="gsc_vcd_value">
I Anggara Wijaya, Djoko Budiyanto Setyohadi
</div>
</div>
<div class="gs_scl">
<div class="gsc_vcd_field">
Tanggal Terbit
</div>
<div class="gsc_vcd_value">
2017/3/1
</div>
</div>
</div>
I want to get the text I Anggara Wijaya, Djoko Budiyanto Setyohadi from the Pengarang field and 2017/3/1 from the Tanggal Terbit field. This is what I have so far:
$crawlerdetail=$client->request('GET',$detail);
$detailscholar=$crawlerdetail->filter('div.gsc_vcd_table');
foreach ($detailscholar as $key)
{
$keyCrawler=new Crawler($key);
$pengarang=($scCrawler->filter('div.gsc_vcd_value')->count()) ? $scCrawler->filter('div.gsc_vcd_value')->text() : '';
echo $pengarang;
}
Help me please.

If you want to use the SimpleXMLElement class, see this code:
<?php
$string = <<<XML
<div id="gsc_vcd_table">
<div class="gs_scl">
<div class="gsc_vcd_field">
Pengarang
</div>
<div class="gsc_vcd_value">
I Anggara Wijaya, Djoko Budiyanto Setyohadi
</div>
</div>
<div class="gs_scl">
<div class="gsc_vcd_field">
Tanggal Terbit
</div>
<div class="gsc_vcd_value">
2017/3/1
</div>
</div>
</div>
XML;
$xml = new SimpleXMLElement($string);
$result1 = $xml->xpath("//div[contains(@class, 'gsc_vcd_field')]");
$result2 = $xml->xpath("//div[contains(@class, 'gsc_vcd_value')]");
foreach ($result1 as $key => $node) {
echo "FIELD: $result1[$key] , VALUE: $result2[$key]<br>\n";
}
To get the XPath pattern of any element, you can inspect it in Chrome and choose Copy XPath.
Another solution is to use preg_match_all (note that this pattern assumes \r\n line endings):
preg_match_all('/<div class="gsc_vcd_field">\r\n(.*?)\r\n.*<\/div>\r\n.*<div class="gsc_vcd_value">\r\n(.*?)\r\n.*<\/div>/', $string, $matches);
foreach ($matches[1] as $key => $match) {
echo "FIELD: " . $matches[1][$key] . " , VALUE: " . $matches[2][$key] . "<br>\n";
}
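Since the question is really about pairing each field with its value, here is a minimal sketch that reads both relative to their shared gs_scl row, using PHP's built-in DOMDocument and DOMXPath rather than Goutte. The $details array name is an assumption made for illustration, and the markup is the sample from the question. Note that gsc_vcd_table is an id in that markup, so a CSS selector for it would be #gsc_vcd_table, not div.gsc_vcd_table.

```php
<?php
// Sample markup copied from the question; on a live page this would be the fetched HTML.
$html = <<<HTML
<div id="gsc_vcd_table">
<div class="gs_scl">
<div class="gsc_vcd_field">
Pengarang
</div>
<div class="gsc_vcd_value">
I Anggara Wijaya, Djoko Budiyanto Setyohadi
</div>
</div>
<div class="gs_scl">
<div class="gsc_vcd_field">
Tanggal Terbit
</div>
<div class="gsc_vcd_value">
2017/3/1
</div>
</div>
</div>
HTML;

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$details = []; // hypothetical name: maps field label => value

// Visit each row, then read the field and value relative to that row.
foreach ($xpath->query('//div[@id="gsc_vcd_table"]/div[@class="gs_scl"]') as $row) {
    $field = $xpath->query('./div[@class="gsc_vcd_field"]', $row)->item(0);
    $value = $xpath->query('./div[@class="gsc_vcd_value"]', $row)->item(0);
    if ($field !== null && $value !== null) {
        $details[trim($field->textContent)] = trim($value->textContent);
    }
}

echo $details['Pengarang'] ?? '', "\n";
echo $details['Tanggal Terbit'] ?? '', "\n";
```

Looking values up by label, instead of by position, should keep working if the page adds or reorders rows.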

Related

simple html dom traversal confusion when looping

I'm trying to use the php script simplehtmldom to loop over divs on a web page while scraping.
Right now I have this:
$url = "https://test.com/";
$html = new simple_html_dom();
$html->load_file($url);
$item_list = $html->find('div.main div[id]');
foreach ($item_list as $item)
{
echo $item->outertext . PHP_EOL;
}
This will give me many like this (from the echo in the loop above):
<div id=1>
<div>
stuff here
</div>
<div>
<span class="title">name</span>
</div>
</div>
<div id=2>
<div>
stuff here
</div>
<div>
<span class="title">name 2</span>
</div>
</div>
What I'm trying to do is loop over the span with class=title, but no matter what I can't seem to quite get the right selector. Could someone help me out?
You can get the spans by adding span[class=title] to the selector:
$item_list = $html->find('div.main div[id] span[class=title]');
foreach ($item_list as $item)
{
echo $item->outertext . PHP_EOL;
}

XPath PHP - check if class exists in a parent class

I have this kind of HTML file:
<div class="find-this">I do not need this</div>
<div class="content ">
<div class="find-this">
<span class="yellowcard"></span>
<span class="name">Cristiano Ronaldo</span>
</div>
</div>
<div class=" content">
<div class="find-this">
<span class="redcard"></span>
<span class="name">Lionel Messi</span>
</div>
</div>
So far, I can get the find-this divs that are inside a parent with the content class:
$nodes = $xpath->query("//div[contains(@class,'content')]//div[@class='find-this']");
foreach ($nodes as $key => $node) {
echo "Player ". $key .": " . $node->nodeValue;
}
Result:
Player0: Cristiano Ronaldo
Player1: Lionel Messi
How can I find out which find-this div is the parent of <span class="yellowcard"> and which one is the parent of <span class="redcard">?
Thank you in advance.
To select the find-this div which is a parent of <span class="yellowcard"> and the div which is parent of <span class="redcard"> use the XPaths shown below:
$yellow_nodes = $xpath->query("//span[@class='yellowcard']/parent::div[@class='find-this']");
$red_nodes = $xpath->query("//span[@class='redcard']/parent::div[@class='find-this']");
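An alternative, if you would rather keep the single loop from the question, is to test each find-this node for a card span as you go. The sketch below assumes the sample markup from the question and uses DOMXPath::evaluate() with a context node; the $cards array is a hypothetical name introduced for illustration.

```php
<?php
$html = <<<HTML
<div class="find-this">I do not need this</div>
<div class="content ">
<div class="find-this">
<span class="yellowcard"></span>
<span class="name">Cristiano Ronaldo</span>
</div>
</div>
<div class=" content">
<div class="find-this">
<span class="redcard"></span>
<span class="name">Lionel Messi</span>
</div>
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$cards = []; // hypothetical name: player name => card colour

$nodes = $xpath->query("//div[contains(@class,'content')]//div[@class='find-this']");
foreach ($nodes as $node) {
    // Expressions starting with "./" are evaluated relative to $node.
    $name = trim($xpath->evaluate("string(./span[@class='name'])", $node));
    if ($xpath->evaluate("boolean(./span[@class='yellowcard'])", $node)) {
        $cards[$name] = 'yellow';
    } elseif ($xpath->evaluate("boolean(./span[@class='redcard'])", $node)) {
        $cards[$name] = 'red';
    }
}

print_r($cards);
```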

How do I loop through multiple child nodes of XML?

I am having some trouble trying to loop through an XML document. The XML looks like this:
<data>
<weather>
<hourly>
<time>0</time>
<tempC>17</tempC>
<tempF>62</tempF>
<windspeedMiles>24</windspeedMiles>
<windspeedKmph>39</windspeedKmph>
</hourly>
<hourly>
<time>3</time>
<tempC>16</tempC>
<tempF>60</tempF>
<windspeedMiles>22</windspeedMiles>
<windspeedKmph>35</windspeedKmph>
</hourly>
</weather>
<weather>
<hourly>
<time>0</time>
<tempC>17</tempC>
<tempF>62</tempF>
<windspeedMiles>24</windspeedMiles>
<windspeedKmph>39</windspeedKmph>
</hourly>
<hourly>
<time>3</time>
<tempC>16</tempC>
<tempF>60</tempF>
<windspeedMiles>22</windspeedMiles>
<windspeedKmph>35</windspeedKmph>
</hourly>
</weather>
</data>
My code (below) loops through all the 'weather' nodes, but it only picks out the first 'hourly' child node of each one and completely skips the second. Could someone help me? To be honest, I do not know enough about looping to fix it, and it's driving me nuts.
Here is my PHP code. It loads an XML document from a URL and formats the results into div tags, but as I said, it only reaches the first 'hourly' node of each 'weather' node.
<?php
// load SimpleXML
$data = new SimpleXMLElement('myOnlineXMLdocument.xml', null, true);
echo <<<EOF
<div class="observationRow">
<div class="observationTitleSmall"><br>Time</div>
<div class="observationTitleSmall"><br>Temp C</div>
<div class="observationTitleSmall"><br>Temp F</div>
<div class="observationTitleSmall"><br>Wind Speed MPH</div>
<div class="observationTitleSmall"><br>Wind Speed KMPH</div>
</div>
EOF;
foreach($data as $weather) // loop through our hours
{
echo <<<EOF
<div>
<div class="observationCellSmall"><br>{$weather->time}</div>
<div class="observationCellSmall"><br>{$weather->tempC}</div>
<div class="observationCellSmall"><br>{$weather->tempF}</div>
<div class="observationCellSmall"><br>{$weather->hourly->windspeedMiles}</div>
<div class="observationCellSmall"><br>{$weather->hourly->windspeedKmph}</div>
EOF;
}
echo '</div>';
?>
EDITED CODE:
$str = "";
foreach($data->weather as $weather)
{
foreach ($weather->hourly as $hour)
{
$str .= "
<div>";
if ($hour->time == "0") {
$str .= "
<div class='observationCellSmall'><br>$weather->date</div>
<div class='observationCellSmall'><br>$weather->maxtempC</div>
<div class='observationCellSmall'><br>$weather->mintempC</div>";
}
$str .= "
<div class='observationCellSmall'><br>$hour->time</div>
<div class='observationCellSmall'><br>$hour->tempC</div>
<div class='observationCellSmall'><br>$hour->tempF</div>
<div class='observationCellSmall'><br>$hour->windspeedMiles</div>
<div class='observationCellSmall'><br>$hour->windspeedKmph</div>
</div>
";
}
}
echo $str;
Using a slenderized version of your XML feed, that generates this:
<div>
<div class='observationCellSmall'><br>2013-08-19</div>
<div class='observationCellSmall'><br>17</div>
<div class='observationCellSmall'><br>15</div>
<div class='observationCellSmall'><br>0</div>
<div class='observationCellSmall'><br>15</div>
<div class='observationCellSmall'><br>59</div>
<div class='observationCellSmall'><br>11</div>
<div class='observationCellSmall'><br>18</div>
</div>
<div>
<div class='observationCellSmall'><br>300</div>
<div class='observationCellSmall'><br>15</div>
<div class='observationCellSmall'><br>59</div>
<div class='observationCellSmall'><br>13</div>
<div class='observationCellSmall'><br>21</div>
</div>
<div>
<div class='observationCellSmall'><br>2013-08-20</div>
<div class='observationCellSmall'><br>21</div>
<div class='observationCellSmall'><br>16</div>
<div class='observationCellSmall'><br>0</div>
<div class='observationCellSmall'><br>17</div>
<div class='observationCellSmall'><br>62</div>
<div class='observationCellSmall'><br>11</div>
<div class='observationCellSmall'><br>18</div>
</div>
<div>
<div class='observationCellSmall'><br>300</div>
<div class='observationCellSmall'><br>16</div>
<div class='observationCellSmall'><br>61</div>
<div class='observationCellSmall'><br>10</div>
<div class='observationCellSmall'><br>17</div>
</div>
You need a nested loop: one to loop over the weather nodes, and another to loop over the hourly nodes inside each of them.
foreach($data->weather as $weather) {
foreach($weather->hourly as $hourly) {
// code here
}
}
I don't remember the SimpleXML API 100% off the top of my head; if that doesn't work, you might need to use ->children() or something similar to make it iterable.
Either that, or use xpath and nab the hourlies directly: /data/weather/hourly.
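As a rough sketch of the XPath route, SimpleXMLElement::xpath() returns every matching hourly node in document order, so a single loop is enough. The XML here is a trimmed-down version of the sample in the question, and the $rows array is a hypothetical name used only for illustration.

```php
<?php
// Trimmed-down version of the XML from the question (only time and tempC kept).
$xmlString = <<<XML
<data>
<weather>
<hourly><time>0</time><tempC>17</tempC></hourly>
<hourly><time>3</time><tempC>16</tempC></hourly>
</weather>
<weather>
<hourly><time>0</time><tempC>17</tempC></hourly>
<hourly><time>3</time><tempC>16</tempC></hourly>
</weather>
</data>
XML;

$data = new SimpleXMLElement($xmlString);

$rows = []; // hypothetical name: one formatted line per hourly node

// xpath() returns an array of SimpleXMLElement objects, one per match.
foreach ($data->xpath('/data/weather/hourly') as $hourly) {
    $rows[] = (string) $hourly->time . 'h: ' . (string) $hourly->tempC . ' C';
}

echo implode("\n", $rows), "\n";
```

The nested loop is still the better fit when you also need values from the parent weather node, as in the edited code above.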

Substitute a phrase and characters from a result with simple html dom

My code works and returns the price of an item from an external online store, but the result still contains HTML markup and letters. I want only the digits, without "," or letters, for example "123".
This is part of the external mobile store's page:
<div class="prod-box-separation" style="padding-left:15px;padding-right:15px;text-align:center;padding-top:7px;">
<div style="color:#cc1515;">
<div class="price-box">
<span class="regular-price" id="product-price-47488">
<span >
<span class="price">2.443,<sup>00</sup> RON</span>
</span>
</span>
</div>
</div>
</div>
<div class="prod-box-separation" style="padding-left:10px;padding-right:10px;">
<style>
.delivery {
display:block;
}
</style>
<p class="availability in-stock">
<div class="stock_info">Produs in stoc</div>
<div class="delivery"><div class="delivery_title">Livrare(in timpul orelor de program):</div>
<div class="delivery_item">Bucuresti - BANEASA : imediat</div>
<div class="delivery_item">Bucuresti - EROILOR : luni dupa ora 13.00.</div>
<div class="delivery_item">CURIER : Marti</div>
</div>
</p>
Garanţie: 12 luni
Here is my actual code:
<?php
include_once('../simple_html_dom.php');
$dom = file_get_html("http://www.site.com/page.html");
// alternatively use str_get_html($html) if you have the html string already...
foreach ($dom->find('span[class=price]') as $node)
{
echo $node->innertext;
}
?>
My result is: 2.443,<sup>00</sup> RON. The result I want is: 2.443 or 2443.
You could do something like this:
<?php
include_once('../simple_html_dom.php');
$dom = file_get_html("http://www.site.com/page.html");
// alternatively use str_get_html($html) if you have the html string already...
foreach ($dom->find('span[class=price]') as $node)
{
$result = $node->innertext;
$price = explode(",<sup>", $result);
echo $price[0];
}
?>
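If you want the bare digits (2443) instead of 2.443, one possible sketch is to strip the tags, cut at the decimal comma, and then drop every non-digit character. It assumes the price always uses "." as the thousands separator and "," as the decimal separator, as in the sample; the hard-coded $innertext stands in for $node->innertext.

```php
<?php
// Stand-in for $node->innertext from the loop above.
$innertext = '2.443,<sup>00</sup> RON';

// Remove the <sup> markup, then keep only the part before the decimal comma.
$whole = explode(',', strip_tags($innertext))[0];

// Drop everything that is not a digit (here: the thousands separator).
$digits = preg_replace('/\D+/', '', $whole);

$price = (int) $digits;
echo $price, "\n";
```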

PHP remove all div with class="myclass" except first one + add another div instead of others

I have a $content with
<div class="slidesWrap">
<div class="slidesСontainer">
<div class="myclass">
...
</div>
<div class="myclass">
...
</div>
...
...
<div class="myclass">
...
</div>
</div>
<div class="nav">
...
</div>
</div>
some other text here, <p></p> bla-bla-bla
I would like to remove via PHP all the divs with class="myclass" except the first one, and add another div instead of others, so that the result is:
<div class="slidesWrap">
<div class="slidesСontainer">
<div class="myclass">
...
</div>
<div>Check all divs here</div>
</div>
<div class="nav">
...
</div>
</div>
some other text here, <p></p> bla-bla-bla
Would be grateful if someone can point me a solution.
UPDATE 2:
Based on a similar question, I came up with the following test code:
$content = '<div class="slidesWrap">
<div class="slidesСontainer">
<div class="myclass">
</div>
<div class="myclass">
</div>
<div class="myclass">
</div>
</div>
<div class="nav">
</div>
</div>
some other text here, <p></p> bla-bla-bla';
$dom = new DOMDocument();
$dom->loadHtml($content);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//*[@class="myClass" and position()>1]') as $liNode) {
$liNode->parentNode->removeChild($liNode);
}
echo $dom->saveXml($dom->documentElement);
Any ideas where I can test it?
Here is what you are looking for (similar to your edit, but it removes the added html tags):
$doc = new DOMDocument();
$doc->loadHTML($content);
$xp = new DOMXpath($doc);
$elements = $xp->query("//div[@class='myclass']");
if($elements->length > 1)
{
$newElem = $doc->createElement("div");
$newElem->appendChild($doc->createTextNode("Check all divs "));
$newElemLink = $newElem->appendChild($doc->createElement("a"));
$newElemLink->setAttribute("href", "myurl");
$newElemLink->appendChild($doc->createTextNode("here"));
$elements->item(1)->parentNode->replaceChild($newElem, $elements->item(1));
for($i = $elements->length - 1; $i > 1 ; $i--)
{
$elements->item($i)->parentNode->removeChild($elements->item($i));
}
}
echo $doc->saveXML($doc->getElementsByTagName('div')->item(0));
$var = ':not(.myClass:eq(1))';
$var.removeClass("myClass");
$var.addClass("some_other_Class");
If I understand you correctly, you have a string called $content that holds all of that markup.
It's probably not the best solution, but here is my attempt (which works fine for me):
if( substr_count($content, '<div class="myclass') > 1 ) {
$parts = explode('<div class="myclass',$content);
echo '<div class="myclass'.$parts[1];
echo '<div>Check all divs here</div>';
}
else {echo $content;}
