Here is what I have, and it works perfectly:
include('simple_html_dom.php');
// Create DOM from URL or file
$html = file_get_html('http://localhost/index.html');
The page has a button labeled "phone number". Clicking it reveals this div:
<div class="phone" style="display: none;">
<span class="number"> 212-222-3453</span>
</div>
Is there a way to change it to display:block before I scrape the data?
Yes, use the code below.
include('simple_html_dom.php');
$html = file_get_html('index.html');
$phoneArray = $html->find('div[class=phone]');
$phoneArray[0]->style="display:block";
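For comparison, here is a minimal sketch of the same idea using PHP's bundled DOM extension (DOMDocument and DOMXPath) rather than Simple HTML DOM. The inline markup is an assumption that mirrors the snippet in the question. It also shows that the number can be read without changing the style at all, since a parser does not apply CSS:

```php
<?php
// Markup copied from the question; in practice it would come from the fetched page.
$markup = '<div class="phone" style="display: none;">'
        . '<span class="number"> 212-222-3453</span></div>';

libxml_use_internal_errors(true); // keep libxml parse warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($markup);

$xpath = new DOMXPath($doc);
$div = $xpath->query('//div[@class="phone"]')->item(0);

// The text is available even while the div is styled display: none.
$number = trim($div->textContent);

// Rewrite the inline style, then serialize just that element.
$div->setAttribute('style', 'display:block');
$updated = $doc->saveHTML($div);

echo $number, "\n";
echo $updated, "\n";
```

Changing the style only matters if the modified markup is going to be output again; for scraping the number itself, reading the text is enough.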
I have code like this, which fetches data from another website.
require('simple_html_dom.php');
$html = file_get_html("http://www.example.com");
$info['diesel'] = $html->find(".on .price",0)->innertext;
$info['pb95'] = $html->find(".pb .price",0)->innertext;
$info['lpg'] = $html->find(".lpg .price",0)->innertext;
The HTML on the other website looks like this:
<a href="#" class="station-detail-wrapper on text-center active">
<h3 class="fuel-header">ON</h3>
<div class="price">
5,97
<span>zł</span>
</div>
</a>
So if I use echo $info['diesel'], it shows me 5,97 zł. I would like to remove the <span>zł</span> so that only the price is shown.
Maybe you can replace that span tag with an empty string:
echo $info['diesel']=str_replace("<span>zł</span>","",$info['diesel']);
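The str_replace approach depends on the span matching character for character. As an alternative, here is a small sketch that reads only the div's own text node, using PHP's bundled DOM extension instead of Simple HTML DOM (a swapped-in technique). The markup is a trimmed copy of the question's sample, and the variable names are illustrative:

```php
<?php
// Trimmed copy of the sample markup from the question.
$markup = <<<HTML
<div class="price">
5,97
<span>zł</span>
</div>
HTML;

libxml_use_internal_errors(true); // keep libxml parse warnings out of the output
$doc = new DOMDocument();
// The XML prolog is a common hint that makes libxml read the input as UTF-8.
$doc->loadHTML('<?xml encoding="UTF-8">' . $markup);

$xpath = new DOMXPath($doc);
// text() selects only the div's direct text nodes, so the <span> is skipped.
$price = trim($xpath->evaluate(
    'string(//div[@class="price"]/text()[normalize-space()][1])'
));

echo $price, "\n";
```

Because the query never touches the span, it keeps working if the currency label or its attributes change.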
In PHP Simple HTML DOM, how do I select all <strong> tags that are inside a div with class abc, which is itself inside a div with class 123?
<div class="123">
<div class="abc">
<strong>Text</strong>
</div>
</div>
You need to use a selector like div.123 div.abc strong and get the first element of the result. Here is a working example:
<?php
require 'simple_html_dom.php';
$html =<<<html
<div class="123">
<div class="abc">
<strong>Text</strong>
</div>
</div>
html;
$dom = str_get_html($html);
$el = $dom->find('div.123 div.abc strong', 0);
print $el;
print "\n";
print $el->innertext;
Result:
<strong>Text</strong>
Text
You can refer to the manual for a better understanding of how selectors work.
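To illustrate what the descendant selector is doing, here is a rough XPath equivalent written against PHP's bundled DOM extension rather than Simple HTML DOM. The hasClass helper is hypothetical and exists only for this sketch:

```php
<?php
$markup = <<<HTML
<div class="123">
<div class="abc">
<strong>Text</strong>
</div>
</div>
HTML;

// Hypothetical helper: XPath test for "the class attribute contains this token".
function hasClass(string $name): string
{
    return sprintf(
        'contains(concat(" ", normalize-space(@class), " "), " %s ")',
        $name
    );
}

libxml_use_internal_errors(true); // keep libxml parse warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// "//" between steps plays the role of the space (descendant) in the CSS selector.
$query = sprintf('//div[%s]//div[%s]//strong', hasClass('123'), hasClass('abc'));

$texts = [];
foreach ($xpath->query($query) as $strong) {
    $texts[] = $strong->textContent;
}

print_r($texts);
```

Looping over the result collects every matching <strong>, whereas passing 0 as the second argument to find() in the answer above returns only the first one.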
I am using Simple HTML DOM parser to fetch some data. Everything worked great until I enabled the read more plugin on my WordPress site.
The hidden content (the rest of the article) is inside this div.
A sample:
<div class="mycontent">
Here is some content
<div class="brm" style="display: none;">
Here is another content but it's not vissible because the style of this div is set to display:none
</div>
<p>read more..</p>
</div>
So far I am using:
$url = "http://www.myurl.com";
$html = new simple_html_dom();
$html->load_file($url);
$maindiv = $html->find('div.mycontent',0)->outertext;
It displays everything except the content inside the div <div class="brm" style="display: none;">
Any ideas how to get the hidden content?
It actually does get that div:
include 'simple_html_dom.php';
$str = <<<EOF
<div class="mycontent">
Here is some content
<div class="brm" style="display: none;">
Here is another content but it's not vissible because the style of this div is set to display:none
</div>
<p>read more..</p>
</div>
EOF;
$html = str_get_html($str);
echo $html->find('div.mycontent',0)->outertext;
// <div class="mycontent"> Here is some content <div class="brm" style="display: none;"> Here is another content but it's not vissible because the style of this div is set to display:none </div> <p>read more..</p> </div>
I have a div that contains other HTML tags along with text.
I want to extract only the text from this div, including the text inside all of its nested tags.
<div class="rpr-help m-chm">
<div class="header">
<h2 class="h6">Repair Help</h2>
</div><!-- /end .header -->
<div class="inner m-bsc">
<ul>
<li>Repair Video</li>
<li>Repair Q&A</li>
</ul>
</div>
<div>
<br>
<span class="h4">Cross Reference Information</span><br>
<p>Part Number 285753A (AP3963893) replaces 1195967, 280152, 285140, 285743, 285753, 3352470, 3363664, 3364002, 3364003, 62672, 62693, 661560, 80008, 8559748, AH1485646, EA1485646, PS1485646.
<br>
</p>
</div>
</div>
Here is my regex:
preg_match_all("/<div class=\"rpr-help m-chm\">(.*)<\/.*>/s", $urlcontent, $description);
It works fine when I assign this complete div to the $urlcontent variable.
But when I fetch data from a real URL, like $urlcontent = "www.test.com/test.html";
it returns the markup of the entire web page.
How can I get the content inside <div class="rpr-help m-chm">?
Does my regex need any correction?
Any help would be appreciated. Thanks.
HTML/XHTML cannot be parsed reliably with regex. Source:
You can't parse [X]HTML with regex. Because HTML can't be parsed by
regex. Regex is not a tool that can be used to correctly parse HTML.
Depending on the language you use, please consider using a third-party library for HTML parsing.
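Use this function:
function GetclassContent($tagStart, $tagEnd, $content)
{
    // Keep everything after the first occurrence of the opening tag
    $first_step = explode($tagStart, $content);
    if (!isset($first_step[1])) {
        return ''; // opening tag not found
    }
    // Cut at the first closing tag, so nested elements will truncate the result
    $second_step = explode($tagEnd, $first_step[1]);
    return $second_step[0];
}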
To use the function above:
$website = "http://www.test.com/test.html";
$content = file_get_contents($website);
$tagStart = '<div class="rpr-help m-chm">';
$tagEnd = "</div>";
$RequiredContent = GetclassContent($tagStart,$tagEnd,$content);
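Splitting on tag strings stops at the first closing div, so a container with nested divs is cut short. As a hedged alternative, the sketch below uses PHP's bundled DOM extension to collect every text node under the target div. The markup is a shortened stand-in for the page in the question, not the real page:

```php
<?php
// Shortened stand-in for the page in the question (an assumption).
$content = <<<HTML
<html><body>
<div class="rpr-help m-chm">
<div class="header"><h2 class="h6">Repair Help</h2></div>
<div><span class="h4">Cross Reference Information</span>
<p>Part Number 285753A (AP3963893) replaces 1195967, 280152.</p>
</div>
</div>
<div class="footer">Unrelated footer</div>
</body></html>
HTML;

libxml_use_internal_errors(true); // keep libxml parse warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DOMXPath($doc);

// Collect every non-blank text node below the target div, however deeply nested.
$parts = [];
$query = '//div[@class="rpr-help m-chm"]//text()[normalize-space()]';
foreach ($xpath->query($query) as $node) {
    $parts[] = trim($node->nodeValue);
}
$text = implode(' ', $parts);

echo $text, "\n";
```

With a live page, $content would come from file_get_contents() on a full URL including the scheme, as in the usage lines above.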
I'm having trouble figuring out how to use PHP Simple HTML DOM Parser for pulling information from a website.
require('simple_html_dom.php');
$html = file_get_html('https://example.com');
$ret = array();
foreach($html->find(".project-card-mini-wrap") as $element) {
echo $element;
}
The output of $element is:
<div class="project-card-mini-wrap">
<a class="project_item block mb2 green-dark" href="/projects/andrewkostirev/kostirev-the-real-you">
<div class="project_thumbnail hover-group border border-box mb1">
<img alt="Project image" class="hover-zoomin fit" src="https://ksr-ugc.imgix.net/projects/2123706/photo-original.png?v=1444253259&w=218&h=162&fit=crop&auto=format&q=92&s=9d6c437e96b720dce82fc9b598b3e8ae" />
<div class="funding_tag highlight">10 days to go</div>
<div class="hover-zoomout bg-green-90">
<p class="white p2 h5">A clothing brand like never seen before</p>
</div>
</div>
<div class="project_name h5 bold"> KOSTIREV - THE REAL YOU </div>
</a>
</div>
This is the information I'd like to pull from the website:
1: Link href
2: Image src
3: Project name
Hopefully this will provide some insight to you as well as to other users of PHP Simple HTML DOM Parser:
foreach($html->find(".project-card-mini-wrap") as $element) {
echo "Project name: ",$element->find('.project_name',0)->innertext,"<br/>\n";
echo "Image source: ",$element->find('img',0)->src,"<br/>\n";
echo "Link: ",$element->find('a',0)->href,"<br/>\n";
}
Produces this output:
Project name: KOSTIREV - THE REAL YOU
Image source: https://ksr-ugc.imgix.net/projects/2123706/photo-original.png?v=1444253259&w=218&h=162&fit=crop&auto=format&q=92&s=9d6c437e96b720dce82fc9b598b3e8ae
Link: /projects/andrewkostirev/kostirev-the-real-you
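The same extraction can be sketched with PHP's bundled DOM extension, collecting the fields into an array instead of echoing them. This is an illustration under assumptions: the markup is a trimmed copy of the card above, and the image URL is a shortened placeholder rather than the real one:

```php
<?php
// Trimmed copy of the card markup; the image URL is a shortened placeholder.
$markup = <<<HTML
<div class="project-card-mini-wrap">
<a class="project_item block mb2 green-dark" href="/projects/andrewkostirev/kostirev-the-real-you">
<div class="project_thumbnail hover-group border border-box mb1">
<img alt="Project image" class="hover-zoomin fit" src="https://example.com/photo-original.png" />
</div>
<div class="project_name h5 bold"> KOSTIREV - THE REAL YOU </div>
</a>
</div>
HTML;

libxml_use_internal_errors(true); // keep libxml parse warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

$projects = [];
foreach ($xpath->query('//div[@class="project-card-mini-wrap"]') as $card) {
    // Passing $card as the second argument scopes each query to the current card.
    $projects[] = [
        'name'  => trim($xpath->evaluate(
            'string(.//div[contains(@class, "project_name")])',
            $card
        )),
        'image' => $xpath->evaluate('string(.//img/@src)', $card),
        'link'  => $xpath->evaluate('string(.//a/@href)', $card),
    ];
}

print_r($projects);
```

Scoping each query to the card keeps the fields of one project together when a page lists several cards.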
I tried this and it worked, thanks for the help! Here is something I made using primewire.ag as an example. The goal was to extract all the links on a given page.
<?php
require('simple_html_dom.php');
// Create DOM from URL or file
$html = file_get_html('http://www.primewire.ag/watch-2805774-Star-Wars-The-Last-Jedi-online-free');
// Find all movie links; hrefs on the page are relative, so prepend the site prefix
$linkPrefix = 'http://primewire.ag';
foreach($html->find(".movie_version_link") as $linkClass) {
echo "Link: ",$linkPrefix,$linkClass->find('a',0)->href,"<br/>\n";
}
?>
This is also a good library for scraping and traversing HTML:
https://github.com/paquettg/php-html-parser