My code correctly extracts the price of an item from an external online store, but the result still contains HTML markup, punctuation and letters. I want only the number, without "," or letters like "ABC", e.g. "123".
This is part of the external mobile store's page:
<div class="prod-box-separation" style="padding-left:15px;padding-right:15px;text-align:center;padding-top:7px;">
<div style="color:#cc1515;">
<div class="price-box">
<span class="regular-price" id="product-price-47488">
<span >
<span class="price">2.443,<sup>00</sup> RON</span>
</span>
</span>
</div>
</div>
</div>
<div class="prod-box-separation" style="padding-left:10px;padding-right:10px;">
<style>
.delivery {
display:block;
}
</style>
<p class="availability in-stock">
<div class="stock_info">Produs in stoc</div>
<div class="delivery"><div class="delivery_title">Livrare(in timpul orelor de program):</div>
<div class="delivery_item">Bucuresti - BANEASA : imediat</div>
<div class="delivery_item">Bucuresti - EROILOR : luni dupa ora 13.00.</div>
<div class="delivery_item">CURIER : Marti</div>
</div>
</p>
Garanţie: 12 luni
Here is my current code:
<?php
include_once('../simple_html_dom.php');
$dom = file_get_html("http://www.site.com/page.html");
// alternatively use str_get_html($html) if you have the html string already...
foreach ($dom->find('span[class=price]') as $node)
{
echo $node->innertext;
}
?>
My result is 2.443,<sup>00</sup> RON, but the result I want is 2.443 or 2443.
You could do something like this:
<?php
include_once('../simple_html_dom.php');
$dom = file_get_html("http://www.site.com/page.html");
// alternatively use str_get_html($html) if you have the html string already...
foreach ($dom->find('span[class=price]') as $node)
{
$result = $node->innertext;
$price = explode(",<sup>", $result);
echo $price[0];
}
?>
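The explode() call above depends on the exact ",<sup>" sequence. A slightly more defensive sketch strips the tags first and then keeps only the digits of the whole-number part. It assumes the store always uses "." as the thousands separator and "," as the decimal separator; priceToInt() is a hypothetical helper name, not part of simple_html_dom.

```php
<?php
// Sketch: turn the scraped innertext into a plain integer price.
// Assumes prices look like "2.443,<sup>00</sup> RON"
// ("." = thousands separator, "," = decimal separator).
function priceToInt(string $innerHtml): int
{
    // strip_tags gives "2.443,00 RON"; keep what precedes the comma: "2.443"
    $whole = explode(',', strip_tags($innerHtml))[0];
    // Drop every non-digit character (the thousands dot): "2443"
    return (int) preg_replace('/\D/', '', $whole);
}

echo priceToInt('2.443,<sup>00</sup> RON'), PHP_EOL; // 2443
```

Inside the original loop this would be used as echo priceToInt($node->innertext);.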
I want to scrape some content from a website, but the nested div classes are confusing me.
The structure is basically:
<div id="gsc_vcd_table">
<div class="gs_scl">
<div class="gsc_vcd_field">
Pengarang
</div>
<div class="gsc_vcd_value">
I Anggara Wijaya, Djoko Budiyanto Setyohadi
</div>
</div>
<div class="gs_scl">
<div class="gsc_vcd_field">
Tanggal Terbit
</div>
<div class="gsc_vcd_value">
2017/3/1
</div>
</div>
</div>
I want to get the text "I Anggara Wijaya, Djoko Budiyanto Setyohadi" from the Pengarang (author) field and "2017/3/1" from the Tanggal Terbit (publication date) field.
$crawlerdetail=$client->request('GET',$detail);
$detailscholar=$crawlerdetail->filter('div.gsc_vcd_table');
foreach ($detailscholar as $key)
{
$keyCrawler=new Crawler($key);
$pengarang=($scCrawler->filter('div.gsc_vcd_value')->count()) ? $scCrawler->filter('div.gsc_vcd_value')->text() : '';
echo $pengarang;
}
Help me please.
If you want to use the SimpleXMLElement class, see this code:
<?php
$string = <<<XML
<div id="gsc_vcd_table">
<div class="gs_scl">
<div class="gsc_vcd_field">
Pengarang
</div>
<div class="gsc_vcd_value">
I Anggara Wijaya, Djoko Budiyanto Setyohadi
</div>
</div>
<div class="gs_scl">
<div class="gsc_vcd_field">
Tanggal Terbit
</div>
<div class="gsc_vcd_value">
2017/3/1
</div>
</div>
</div>
XML;
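$xml = new SimpleXMLElement($string);
// Select the field labels and the values, both in document order
$result1 = $xml->xpath("//div[contains(@class, 'gsc_vcd_field')]");
$result2 = $xml->xpath("//div[contains(@class, 'gsc_vcd_value')]");
foreach ($result1 as $key => $node) {
    // trim() removes the newlines and indentation around each text
    echo "FIELD: " . trim($result1[$key]) . " , VALUE: " . trim($result2[$key]) . "<br>\n";
}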
To get the XPath of any element, you can also inspect it in Chrome's developer tools and choose Copy > Copy XPath.
Another solution is to use preg_match_all (\s* instead of \r\n so the pattern works with any line endings):
preg_match_all('/<div class="gsc_vcd_field">\s*(.*?)\s*<\/div>\s*<div class="gsc_vcd_value">\s*(.*?)\s*<\/div>/s', $string, $matches);
foreach ($matches[1] as $key => $match) {
echo "FIELD: " . $matches[1][$key] . " , VALUE: " . $matches[2][$key] . "<br>\n";
}
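Both approaches above collect labels and values into two separate lists and rely on them lining up. A sketch using PHP's built-in DOM extension (DOMDocument and DOMXPath, a swapped-in technique rather than either answer's method) reads each gs_scl row instead, so every label stays paired with its own value. The HTML is the question's sample condensed onto fewer lines, and $details is an illustrative name.

```php
<?php
// Sketch: read each gs_scl row as a (field => value) pair with the
// built-in DOM extension, so labels and values cannot drift out of step.
$html = <<<HTML
<div id="gsc_vcd_table">
  <div class="gs_scl">
    <div class="gsc_vcd_field">Pengarang</div>
    <div class="gsc_vcd_value">I Anggara Wijaya, Djoko Budiyanto Setyohadi</div>
  </div>
  <div class="gs_scl">
    <div class="gsc_vcd_field">Tanggal Terbit</div>
    <div class="gsc_vcd_value">2017/3/1</div>
  </div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$details = [];
foreach ($xpath->query('//div[@id="gsc_vcd_table"]/div[contains(@class, "gs_scl")]') as $row) {
    // Relative queries (leading ".") only search inside the current row
    $field = $xpath->query('.//div[contains(@class, "gsc_vcd_field")]', $row)->item(0);
    $value = $xpath->query('.//div[contains(@class, "gsc_vcd_value")]', $row)->item(0);
    if ($field !== null && $value !== null) {
        $details[trim($field->textContent)] = trim($value->textContent);
    }
}

print_r($details);
```

Looking up a single field is then just $details['Pengarang'] or $details['Tanggal Terbit'].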
I have an HTML page that I'm trying to read. I used htmlsql.class.php before, but since it is old and outdated, I have to switch to phpQuery.
The HTML markup is:
<ul class="small-block-grid-1 medium-block-grid-2 large-block-grid-3">
<li>
<div data-widget-type="epg.tvGuide.channel" data-view="epg.tvGuide.channel" id="widget-765574917197" class=" widget-epg_tvGuide_channel">
<div class="group-box">
<div class="group-header l-center" data-action="togglePreviousBroadcasts">
<span class="header-text">
<img src="logo.png" style="height: 40px" />
</span>
</div>
<div>
<div class="tvGuide-item is-past">
<span data-action="toggleEventMeta">
06:15 what a day
</span>
<div class="tvGuide-item-meta">
Some text.
<div>Näita rohkem</div>
</div>
</div>
<div class="tvGuide-item is-current">
<span data-action="toggleEventMeta">
06:15 what a day
</span>
<div class="tvGuide-item-meta">
Some text.
<div>Näita rohkem</div>
</div>
</div>
<div class="tvGuide-item">
<span data-action="toggleEventMeta">
06:15 what a day
</span>
<div class="tvGuide-item-meta">
Some text.
<div>Näita rohkem</div>
</div>
</div>
</div>
</div>
With the previous library it was fairly easy:
$wsql->select('li');
if (!$wsql->query('SELECT * FROM span')){
print "Query error: " . $wsql->error;
exit;
}
foreach($wsql->fetch_array() as $row){
But that way I could not read the class attribute, and I need to know whether an item is current or not.
I'm new to phpQuery, and real-life examples are hard to find.
Can someone point me in the right direction?
I would like to get the span text and the item meta, and I would also like to know whether the div's class is "is-past" or "is-current".
You can find information about phpQuery here: https://code.google.com/archive/p/phpquery/
I prefer the "one-file" version at the top of the downloads list:
https://code.google.com/archive/p/phpquery/downloads
Simple examples based on your code:
// for loading files use phpQuery::newDocumentFileHTML();
// for plain strings use phpQuery::newDocument();
$document = phpQuery::newDocumentFileHTML('http://domain.com/yourFile.html');
$items = pq($document)->find('.tvGuide-item');
foreach($items as $item) {
if(pq($item)->hasClass('is-past') === true) {
// matching past items
}
if(pq($item)->hasClass('is-current') === true) {
// matching current items
}
// examples for finding elements and grabbing text/attributes
$span = pq($item)->find('span');
$text_in_span = pq($span)->text();
$meta = pq($item)->find('.tvGuide-item-meta');
$link_in_meta = pq($meta)->find('a');
$href_of_link_in_meta = pq($link_in_meta)->attr('href');
}
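If phpQuery is not available (the project is archived), the same checks can be sketched with PHP's built-in DOM extension. This is a substitute for phpQuery, not its API: the class test splits the class attribute into tokens, which is roughly what hasClass() does. The HTML is a trimmed copy of the question's markup (the nested "Näita rohkem", i.e. "Show more", divs are left out), and labelling items with neither class as "upcoming" is an assumption; classTest() is a hypothetical helper.

```php
<?php
// Sketch using PHP's built-in DOM extension as a stand-in for phpQuery.
// Trimmed copy of the question's markup ("Näita rohkem" divs left out).
$html = <<<HTML
<div>
  <div class="tvGuide-item is-past">
    <span data-action="toggleEventMeta">06:15 what a day</span>
    <div class="tvGuide-item-meta">Some text.</div>
  </div>
  <div class="tvGuide-item is-current">
    <span data-action="toggleEventMeta">06:15 what a day</span>
    <div class="tvGuide-item-meta">Some text.</div>
  </div>
  <div class="tvGuide-item">
    <span data-action="toggleEventMeta">06:15 what a day</span>
    <div class="tvGuide-item-meta">Some text.</div>
  </div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// XPath 1.0 idiom for "has class X": matches whole tokens only,
// so 'tvGuide-item' does not match 'tvGuide-item-meta'
function classTest(string $class): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
}

$items = [];
foreach ($xpath->query('//div[' . classTest('tvGuide-item') . ']') as $node) {
    $classes = preg_split('/\s+/', trim($node->getAttribute('class')));
    if (in_array('is-past', $classes, true)) {
        $state = 'past';
    } elseif (in_array('is-current', $classes, true)) {
        $state = 'current';
    } else {
        $state = 'upcoming'; // assumption: no state class means not yet aired
    }
    $span = $xpath->query('./span', $node)->item(0);
    $meta = $xpath->query('./div[' . classTest('tvGuide-item-meta') . ']', $node)->item(0);
    $items[] = [
        'title' => $span !== null ? trim($span->textContent) : '',
        'meta' => $meta !== null ? trim($meta->textContent) : '',
        'state' => $state,
    ];
}

print_r($items);
```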
I have this HTML:
<div class="price" itemprop="offers" itemscope itemtype="http://schema.org/Offer">
<small class="old-price">Stara cena: 1.890 RSD</small>
<span>Ušteda: <strong>1.000 RSD</strong></span>
<h5>890 <em>RSD</em>
<div class="tooltip"><p>Cene sa popustom uz gotovinsko plaćanje za online porudžbine</p></div>
</h5>
<span style="display:none" itemprop="priceCurrency" content="RSD"></span>
<span itemprop="price" content="890.00"></span>
</div>
I'm collecting prices from the tag like this:
foreach($html->find('span[itemprop=price]') as $element) {
$niz['price'][] = $element->content;
}
Now I also need to collect the text of the small tag <small class="old-price">Stara cena: 1.890 RSD</small> ("Stara cena" is Serbian for "old price") if it exists; if it does not exist, I need an empty string in the array instead.
So I need something like this:
if($html->find('small[class=old-price]',0))
{
$niz['oldprice'][] = $element->innertext;
}else{
$niz['oldprice'][] = '';
}
The problem is that I only get the elements with class=old-price in the array, and never a single empty string.
Any advice would be appreciated.
Hi, can you please use this code:
foreach($html->find('small[class=old-price]') as $element) {
if($element->plaintext)
{
$niz['oldprice'][] = $element->plaintext;
}else{
$niz['oldprice'][] = '';
}
}
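Note that the loop above only visits small tags that exist, so it still never appends an empty string for a product that has no old price. One way to get exactly one entry per product is to loop over each offer container and look for the tag inside it. The sketch below does that with PHP's built-in DOM extension rather than simple_html_dom, and the second product in the sample HTML is made up to show the missing-tag case. With simple_html_dom, the analogous per-container lookup would be $offer->find('small[class=old-price]', 0), which should return null when nothing matches.

```php
<?php
// Sketch with PHP's built-in DOM extension instead of simple_html_dom.
// One pass per offer container, so every product adds exactly one
// 'price' entry and one 'oldprice' entry ('' when <small> is missing).
// The second offer is a hypothetical product without an old price.
$html = <<<HTML
<div class="price" itemprop="offers"><small class="old-price">Stara cena: 1.890 RSD</small><span itemprop="price" content="890.00"></span></div>
<div class="price" itemprop="offers"><span itemprop="price" content="450.00"></span></div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$niz = ['price' => [], 'oldprice' => []];
foreach ($xpath->query('//div[@itemprop="offers"]') as $offer) {
    // Relative queries (leading ".") only search inside this offer
    $price = $xpath->query('.//span[@itemprop="price"]', $offer)->item(0);
    $old = $xpath->query('.//small[@class="old-price"]', $offer)->item(0);
    $niz['price'][] = $price !== null ? $price->getAttribute('content') : '';
    $niz['oldprice'][] = $old !== null ? trim($old->textContent) : '';
}

print_r($niz);
```

Because both arrays are filled in the same iteration, $niz['price'][$i] and $niz['oldprice'][$i] always describe the same product.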
Here is the HTML snippet from which I need to fetch the first child of the div with class u-Row-6:
<div class="u-Row-6">
<div class='article_details_price2'>
<strong >
855,90 € *
</strong>
<div class="PseudoPrice">
<em>EVP: 999,00 € *</em>
<span>
(14.32 % <span class="frontend_detail_data">gespart</span>)
</span>
</div>
</div>
</div>
For this I have used the following code:
foreach($dom->getElementsByTagName('div') as $p) {
if ($p->getAttribute('class') == 'u-Row-6') {
if ($first) {
$name = $p->firstChild-nodeValue;
$name = str_replace('€', '', $name);
$name = str_replace(chr(194), " ", $name);
$first = false;
}
}
}
But for some reason this code is not working for me.
There are a number of problems with your code:
$first is not initialized to a true value, which prevents the string replacement code from running even once.
$p->firstChild-nodeValue lacks a > before nodeValue, so PHP reads it as $p->firstChild minus a constant named nodeValue instead of a property access.
$p->firstChild will actually resolve to a text node (the whitespace between <div class="u-Row-6"> and <div class='article_details_price2'>), not the strong you are looking for, and not <div class='article_details_price2'> either, as one might have expected.
You may want to use an XPath query instead, to get all the strong tags within a div of class "u-Row-6", and then loop through the found tags:
$src = <<<EOS
<div class="u-Row-6">
<div class='article_details_price2'>
<strong >
855,90 € *
</strong>
<div class="PseudoPrice">
<em>EVP: 999,00 € *</em>
<span>
(14.32 % <span class="frontend_detail_data">gespart</span>)
</span>
</div>
</div>
</div>
EOS;
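$dom = new DOMDocument();
// The encoding hint makes libxml read $src as UTF-8, so "€" is not garbled
$dom->loadHTML('<?xml encoding="UTF-8">' . $src);
$xpath = new DOMXPath($dom);
$strongTags = $xpath->query('//div[@class="u-Row-6"]//strong');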
foreach ($strongTags as $tag) {
echo "The strong tag contents: " . $tag->nodeValue, PHP_EOL;
// Replacement code goes here ...
}
Output:
The strong tag contents:
855,90 € *
XPath queries are actually quite handy.
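The "Replacement code goes here" step could look like the sketch below, which turns the strong tag's text into a number instead of only stripping the "€". It assumes the page always writes prices with "." for thousands and "," for decimals; parseEuroPrice() is a hypothetical helper name.

```php
<?php
// Sketch of the replacement step: turn text such as "\n855,90 € *\n"
// into a float. Assumes German-style formatting ("." thousands, "," decimals).
function parseEuroPrice(string $text): float
{
    // Keep only digits, dots and commas: "855,90"
    $clean = preg_replace('/[^0-9.,]/', '', $text);
    // Drop thousands dots first, then turn the decimal comma into a dot: "855.90"
    $clean = str_replace(['.', ','], ['', '.'], $clean);
    return (float) $clean;
}

var_dump(parseEuroPrice("\n855,90 € *\n")); // float(855.9)
```

Inside the loop it would be called as $price = parseEuroPrice($tag->nodeValue);.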
I have a string $content containing:
<div class="slidesWrap">
<div class="slidesСontainer">
<div class="myclass">
...
</div>
<div class="myclass">
...
</div>
...
...
<div class="myclass">
...
</div>
</div>
<div class="nav">
...
</div>
</div>
some other text here, <p></p> bla-bla-bla
I would like to use PHP to remove all the divs with class="myclass" except the first one, and add another div in place of the others, so that the result is:
<div class="slidesWrap">
<div class="slidesСontainer">
<div class="myclass">
...
</div>
<div>Check all divs here</div>
</div>
<div class="nav">
...
</div>
</div>
some other text here, <p></p> bla-bla-bla
I would be grateful if someone could point me to a solution.
UPDATE 2:
I found a similar question, and from it I came up with the following test code:
$content = '<div class="slidesWrap">
<div class="slidesСontainer">
<div class="myclass">
</div>
<div class="myclass">
</div>
<div class="myclass">
</div>
</div>
<div class="nav">
</div>
</div>
some other text here, <p></p> bla-bla-bla';
$dom = new DOMDocument();
$dom->loadHtml($content);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//*[@class="myclass" and position()>1]') as $liNode) {
$liNode->parentNode->removeChild($liNode);
}
echo $dom->saveXml($dom->documentElement);
Any ideas where I can test it?
Here is what you are looking for (similar to your update, but it drops the <html> and <body> tags that loadHTML adds):
$doc = new DOMDocument();
$doc->loadHTML($content);
$xp = new DOMXpath($doc);
$elements = $xp->query("//div[@class='myclass']");
if($elements->length > 1)
{
$newElem = $doc->createElement("div");
$newElem->appendChild($doc->createTextNode("Check all divs "));
$newElemLink = $newElem->appendChild($doc->createElement("a"));
$newElemLink->setAttribute("href", "myurl");
$newElemLink->appendChild($doc->createTextNode("here"));
$elements->item(1)->parentNode->replaceChild($newElem, $elements->item(1));
for($i = $elements->length - 1; $i > 1 ; $i--)
{
$elements->item($i)->parentNode->removeChild($elements->item($i));
}
}
echo $doc->saveXML($doc->getElementsByTagName('div')->item(0));
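For comparison, here is a more compact sketch of the same idea. Wrapping the path in parentheses, (//div[@class="myclass"])[position() > 1], makes position() count matches across the whole document instead of among each element's siblings; the question's position()>1 only gives the right answer because all the myclass divs share one parent. The sample $content is a shortened, single-line version of the question's HTML, and the placeholder is plain text (no link).

```php
<?php
// Sketch: keep the first div.myclass, replace the second with a placeholder
// div and remove the rest. Sample HTML is a shortened copy of the question's.
$content = '<div class="slidesWrap"><div class="slidesContainer">'
    . '<div class="myclass">1</div><div class="myclass">2</div><div class="myclass">3</div>'
    . '</div><div class="nav"></div></div>';

$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DOMXPath($doc);

// Parentheses: position() counts matches document-wide, not among siblings
$extra = $xpath->query('(//div[@class="myclass"])[position() > 1]');
if ($extra->length > 0) {
    $placeholder = $doc->createElement('div', 'Check all divs here');
    $second = $extra->item(0);
    $second->parentNode->replaceChild($placeholder, $second);
    // The XPath result is a snapshot, so removing nodes while looping is safe
    for ($i = 1; $i < $extra->length; $i++) {
        $node = $extra->item($i);
        $node->parentNode->removeChild($node);
    }
}

// Serialize only the wrapper div, without the <html><body> that loadHTML adds
$result = $doc->saveHTML($xpath->query('//div[@class="slidesWrap"]')->item(0));
echo $result, PHP_EOL;
```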
If changing the classes on the client side with jQuery is acceptable, you can re-class every match except the first:
$('.myclass').slice(1).removeClass('myclass').addClass('some_other_Class');
If I understand correctly, you have a string called $content with all that content in it.
It's probably not the best solution, but here is my attempt (which works fine for me):
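if (substr_count($content, '<div class="myclass') > 1) {
    $parts = explode('<div class="myclass', $content);
    // Everything after the last myclass div's closing </div>
    // (assumes the myclass divs contain no nested divs)
    $last = end($parts);
    $tail = substr($last, strpos($last, '</div>') + strlen('</div>'));
    echo $parts[0] . '<div class="myclass' . $parts[1];
    echo '<div>Check all divs here</div>';
    echo $tail;
} else {
    echo $content;
}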