I have three divs inside a parent div called results_all, as you can see below:
<div id="results_all">
<div class="result_information">
</div>
<div class="result_information">
</div>
<div class="result_information">
</div>
</div>
All I want to do is wrap every div whose class is result_information in <tr><th>,
so that the final result is:
<tr><th>
<div class="result_information">
</div>
</th></tr>
<tr><th>
<div class="result_information">
</div>
</th></tr>
<tr><th>
<div class="result_information">
</div>
</th></tr>
How can I do this using DOMDocument in PHP?
This will match the class using DOMXPath. It is then just a case of outputting the matched content with the appropriate tags when you loop over the results.
$html = '<div id="results_all">
<div class="result_information">
test3
</div>
<div class="result_information">
test2
</div>
<div class="result_information">
test
</div>
</div>
';
$classname = 'result_information';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$results = $xpath->query("//*[contains(@class, '$classname')]");
for($i = 0; $i < $results->length; $i++ ){
print '<tr><th><div class="result_information">' . $results->item($i)->nodeValue . '</div></th></tr>';
}
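The loop above prints new markup from each node's text. If the goal is to restructure the document itself, a minimal sketch along these lines might work, assuming every matched div should be moved inside a newly created tr/th pair. The variable names are only illustrative.

```php
<?php
$html = '<div id="results_all">
<div class="result_information">test3</div>
<div class="result_information">test2</div>
<div class="result_information">test</div>
</div>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Copy the matches into an array first, since the loop moves nodes around.
$divs = iterator_to_array($xpath->query("//div[contains(@class, 'result_information')]"));
foreach ($divs as $div) {
    $tr = $dom->createElement('tr');
    $th = $dom->createElement('th');
    $tr->appendChild($th);
    // Put the <tr> where the div was, then move the div into the <th>.
    $div->parentNode->replaceChild($tr, $div);
    $th->appendChild($div);
}

// Serialize only the container, not the implied <html>/<body> wrapper.
$container = $xpath->query('//*[@id="results_all"]')->item(0);
$wrappedCount = $xpath->query('//tr/th/div[@class="result_information"]')->length;
echo $dom->saveHTML($container);
```

Note that a tr directly inside a div is not valid HTML on its own, so in practice the container would probably need to become a table as well.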
I need to scrape the following elements inside each of the divs with class="product-grid-item" (the page contains several of them), but I have no clue how to do it, so I need help before I pull my hair out.
1 - The link and image inside the div with class="product-element-top2":
<a href="https://...this_link" class="product-image-link"> (just need the link)
<img width="300" height="300" src="https://...this_image_url... (just need this image URL)
2 - The title inside the h3 tag as follows;
<h3 class="wd-entities-title"><a href="https://...linkhere">The title goes here (just the title)
3 - Last but not least, I need to grab the price inside this:
<span class="price"><span class="woocommerce-Price-amount amount"><bdi><span class="woocommerce-Price-currencySymbol">€</span>20,00</bdi></span></span> (just the "€20.00")
Here's the full HTML:
<div class="product-grid-item" data-loop="1">
<div class="product-element-top">
<a href="https://...linkhere" class="product-image-link">
<img width="300" height="300" src="https://image-goes-here.jpg" class="attachment-woocommerce_thumbnail size-woocommerce_thumbnail"> </a>
<div class="top-information wd-fill">
<h3 class="wd-entities-title">The title goes here</h3>
<span class="price"><span class="woocommerce-Price-amount amount"><bdi><span class="woocommerce-Price-currencySymbol">€</span>20,00</bdi></span></span>
<div class="wd-add-btn wd-add-btn-replace woodmart-add-btn">
<span>Options</span></div>
</div>
<div class="wd-buttons wd-pos-r-t color-scheme-light woodmart-buttons">
<div class="wd-compare-btn product-compare-button wd-action-btn wd-style-icon wd-compare-icon">
Buy
</div>
<div class="quick-view wd-action-btn wd-style-icon wd-quick-view-icon wd-quick-view-btn">
quick view
</div>
<div class="wd-wishlist-btn wd-action-btn wd-style-icon wd-wishlist-icon woodmart-wishlist-btn">
<a class="" href="https://linkhere/wishlist/" data-key="dcf36756534755" data-product-id="387654" data-added-text="See Wishlist">Wishlist</a>
</div>
</div>
<div class="quick-shop-wrapper wd-fill wd-scroll">
<div class="quick-shop-close wd-action-btn wd-style-text wd-cross-icon">Close</div>
<div class="quick-shop-form wd-scroll-content">
</div>
</div>
</div>
</div>
One of my clumsy attempts:
$html = file_get_contents("https://url-here.goetohere");
$DOM = new DOMDocument();
$DOM->loadHTML($html);
$finder = new DomXPath($DOM);
$classname = 'product-grid-item';
$classname = 'product-element-top2';
$classname = 'product-element-top2';
$classname = 'wd-entities-title';
$classname = 'price';
$nodes = $finder->query("//*[contains(@class, '$classname')]");
foreach ($nodes as $node) {
echo 'here »» ' . htmlentities($node->nodeValue) . '<br>';
}
Assuming that the HTML is being fetched correctly before any DOM processing is attempted, it is fairly straightforward to construct some basic XPath expressions to find the indicated content.
As per the comment "page contains several of them", the sample below contains 2 product-grid-item divs, as you'll note in the output.
$html='
<div class="product-grid-item" data-loop="1">
<div class="product-element-top">
<a href="https://...linkhere" class="product-image-link">
<img width="300" height="300" src="https://image-goes-here.jpg" class="attachment-woocommerce_thumbnail size-woocommerce_thumbnail">
</a>
<div class="top-information wd-fill">
<h3 class="wd-entities-title">
The title goes here
</h3>
<span class="price">
<span class="woocommerce-Price-amount amount">
<bdi>
<span class="woocommerce-Price-currencySymbol">€</span>20,00
</bdi>
</span>
</span>
<div class="wd-add-btn wd-add-btn-replace woodmart-add-btn">
<a href="https://...linkhere" data-quantity="1" class="button product_type_variable add_to_cart_button add-to-cart-loop">
<span>Options</span>
</a>
</div>
</div>
<div class="wd-buttons wd-pos-r-t color-scheme-light woodmart-buttons">
<div class="wd-compare-btn product-compare-button wd-action-btn wd-style-icon wd-compare-icon">
Buy
</div>
<div class="quick-view wd-action-btn wd-style-icon wd-quick-view-icon wd-quick-view-btn">
quick view
</div>
<div class="wd-wishlist-btn wd-action-btn wd-style-icon wd-wishlist-icon woodmart-wishlist-btn">
<a class="" href="https://linkhere/wishlist/" data-key="dcf36756534755" data-product-id="387654" data-added-text="See Wishlist">Wishlist</a>
</div>
</div>
<div class="quick-shop-wrapper wd-fill wd-scroll">
<div class="quick-shop-close wd-action-btn wd-style-text wd-cross-icon">
Close
</div>
<div class="quick-shop-form wd-scroll-content"></div>
</div>
</div>
</div>
<div class="product-grid-item" data-loop="1">
<div class="product-element-top">
<a href="https://www.example.com/banana" class="product-image-link">
<img width="300" height="300" src="https://www.example.com/kittykat.jpg" class="attachment-woocommerce_thumbnail size-woocommerce_thumbnail">
</a>
<div class="top-information wd-fill">
<h3 class="wd-entities-title">
Oh look, another title!
</h3>
<span class="price">
<span class="woocommerce-Price-amount amount">
<bdi>
<span class="woocommerce-Price-currencySymbol">€</span>540,00
</bdi>
</span>
</span>
<div class="wd-add-btn wd-add-btn-replace woodmart-add-btn">
<a href="https://www.example.com/gorilla" data-quantity="1" class="button product_type_variable add_to_cart_button add-to-cart-loop">
<span>Options</span>
</a>
</div>
</div>
<div class="wd-buttons wd-pos-r-t color-scheme-light woodmart-buttons">
<div class="wd-compare-btn product-compare-button wd-action-btn wd-style-icon wd-compare-icon">
Buy
</div>
<div class="quick-view wd-action-btn wd-style-icon wd-quick-view-icon wd-quick-view-btn">
quick view
</div>
<div class="wd-wishlist-btn wd-action-btn wd-style-icon wd-wishlist-icon woodmart-wishlist-btn">
<a class="" href="https://www.example.com/wishlist/" data-key="dcf36756534755" data-product-id="387654" data-added-text="See Wishlist">Wishlist</a>
</div>
</div>
<div class="quick-shop-wrapper wd-fill wd-scroll">
<div class="quick-shop-close wd-action-btn wd-style-text wd-cross-icon">
Close
</div>
<div class="quick-shop-form wd-scroll-content"></div>
</div>
</div>
</div>';
To process the downloaded HTML:
# set the libxml parameters and create new DOMDocument/XPath objects.
libxml_use_internal_errors( true );
$dom=new DOMDocument;
$dom->validateOnParse=false;
$dom->strictErrorChecking=false;
$dom->recover=true;
$dom->loadHTML( $html );
libxml_clear_errors();
$xp=new DOMXPath( $dom );
# some basic XPath expressions
$exprs=(object)array(
'product-link' => '//a[@class="product-image-link"]',
'product-img-src' => '//a[@class="product-image-link"]/img',
'h3-title-text' => '//h3[@class="wd-entities-title"]',
'price' => '//span[@class="price"]/span/bdi'
);
# find the keys (for convenience) to be used below
$keys=array_keys( get_object_vars( $exprs ) );
# store results here
$res=array();
# loop through all patterns and issue XPath query.
foreach( $exprs as $key => $expr ){
# add key to output and set as an array.
$res[ $key ]=[];
$col=$xp->query( $expr );
# find the data if the query succeeds
if( $col && $col->length > 0 ){
foreach( $col as $node ){
switch( $key ){
case $keys[0]:$res[$key][]=$node->getAttribute('href');break;
case $keys[1]:$res[$key][]=$node->getAttribute('src');break;
case $keys[2]:$res[$key][]=trim($node->textContent);break;
case $keys[3]:$res[$key][]=trim($node->textContent);break;
}
}
}
}
# show the result or do really interesting things with the data
printf('<pre>%s</pre>',print_r($res,true));
Which yields:
Array
(
[product-link] => Array
(
[0] => https://...linkhere
[1] => https://www.example.com/banana
)
[product-img-src] => Array
(
[0] => https://image-goes-here.jpg
[1] => https://www.example.com/kittykat.jpg
)
[h3-title-text] => Array
(
[0] => The title goes here
[1] => Oh look, another title!
)
[price] => Array
(
[0] => â¬20,00
[1] => â¬540,00
)
)
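The garbled euro sign in the price values is most likely an encoding issue: loadHTML() falls back to ISO-8859-1 when the markup declares no charset, so the UTF-8 bytes of € get misread. A minimal sketch of one common workaround, assuming the fetched HTML is UTF-8, is to prepend a charset hint before parsing. The shortened $html here is only illustrative.

```php
<?php
$html = '<span class="price"><span class="woocommerce-Price-amount amount"><bdi>'
      . '<span class="woocommerce-Price-currencySymbol">€</span>20,00</bdi></span></span>';

libxml_use_internal_errors(true);
$dom = new DOMDocument;
// The meta tag is only a hint for the parser; it assumes the input really is UTF-8.
$dom->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
libxml_clear_errors();

$xp = new DOMXPath($dom);
// string() returns the text of the first matching node.
$price = trim($xp->evaluate('string(//span[@class="price"]/span/bdi)'));
echo $price, PHP_EOL;
```

If the page already carries its own charset declaration, this hint should not be needed.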
I am trying to scrape a table that sits inside nested divs and spans. I am getting the data, but as one block of content rather than formatted as a table. What I want is all the data in table form, with rows and columns. How can I get this output in table format? Any help will be appreciated. Thanks in advance.
Edit: I have included the HTML table code, which is commented out because it is not working.
<html>
<div class="container">
<div class="table-responsive">
<table class="table table-striped table-condensed">
<thead>
<tr bgcolor='#ddbbff'>
<th>Stations</th>
<th>Day/Date</th>
<th>Arrive</th>
<th>Depart</th>
<th>Status</th>
</tr>
</thead>
<tbody>
<?php
//error_reporting(E_ALL);
//ini_set('display_errors', '1');
$url = "https://www.railmitra.com/train-running-status/12129?day=yesterday";
$class_to_scrape="well well-sm";
$html = file_get_contents($url);
$document = new DOMDocument();
$document->loadHTML($html);
$selector = new DOMXPath($document);
$anchors = $selector->query("/html/body//div[@class='". $class_to_scrape ."']");
echo "ok, no php syntax errors. <br>Lets see what we scraped.<br>";
foreach ($anchors as $node) {
$full_content = innerHTML($node);
echo "<br>".$full_content."<br>" ;
//echo "<tr>";
//echo "<td>" . $stations[] = $cells->item(0)->textContent . "</td>";
//echo "<td>" . $dayDate[] = $cells->item(1)->textContent . "</td>";
//echo "<td>" . $arr[] = $cells->item(2)->textContent . "</td>";
//echo "<td>" . $dep[] = $cells->item(3)->textContent . "</td>";
//echo "<td>" . $status[] = $cells->item(4)->textContent . "</td>";
}
/* this function preserves the inner content of the scraped element.
** http://stackoverflow.com/questions/5349310/how-to-scrape-web-page-data-without-losing-tags
** So be sure to go and give that post an uptick too:)
**/
function innerHTML(DOMNode $node)
{
$doc = new DOMDocument();
foreach ($node->childNodes as $child) {
$doc->appendChild($doc->importNode($child, true));
}
return $doc->saveHTML();
}
?>
</tbody>
</table>
</div>
</div>
</html>
edit: added html source
<div class="well well-sm">
<div class="row">
<div class="col-7 col-md-4">
<span class="ind-crossed">
<i class="fa fa-check-circle-o" aria-hidden="true">
::before
</i>
</span>
" Pune Junction"
</div>
<div class="col-5 col-md-3">
<span> day1 </span>
<span> 09-sep </span>
</div>
<div class="col-2 col-md-1">
<span> 00:00 </span>
<br>
</div>
<div class="col-2 col-md-1">
<span> 18:25 </span>
<br>
</div>
<div class="col-8 col-md-3 text-right">
<span class="tl-msg text-success"> Right Time </span>
</div>
</div>
</div>
<div class="well well-sm">...</div>
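One possible way to turn these blocks into table rows is to treat each "well well-sm" div as a row and each column div inside its row wrapper as a cell. This is only a sketch, assuming every block on the page follows the structure shown above. The inline sample and the $rows array are illustrative, not part of the original script.

```php
<?php
$html = '<div class="well well-sm">
  <div class="row">
    <div class="col-7 col-md-4"><span class="ind-crossed"><i class="fa fa-check-circle-o" aria-hidden="true"></i></span> Pune Junction</div>
    <div class="col-5 col-md-3"><span> day1 </span> <span> 09-sep </span></div>
    <div class="col-2 col-md-1"><span> 00:00 </span><br></div>
    <div class="col-2 col-md-1"><span> 18:25 </span><br></div>
    <div class="col-8 col-md-3 text-right"><span class="tl-msg text-success"> Right Time </span></div>
  </div>
</div>';

libxml_use_internal_errors(true);
$document = new DOMDocument();
$document->loadHTML($html);
libxml_clear_errors();
$selector = new DOMXPath($document);

$rows = [];
foreach ($selector->query("//div[@class='well well-sm']") as $well) {
    $cells = [];
    // Relative query: the column divs two levels below this block.
    foreach ($selector->query('./div/div', $well) as $col) {
        // Trim and collapse runs of whitespace inside each cell.
        $cells[] = preg_replace('/\s+/', ' ', trim($col->textContent));
    }
    $rows[] = $cells;
}

foreach ($rows as $cells) {
    echo '<tr><td>' . implode('</td><td>', array_map('htmlspecialchars', $cells)) . '</td></tr>' . "\n";
}
```

Placed inside the tbody, the final loop would emit one tr per station with five td cells matching the header columns.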
The output of the script is:
"Outstanding 1... Outstanding 2... Outstanding 3..."
How can I output only, for example, the second element of the array, so that I can output the results separately?
Like this: $out[1]
<?php
$html = '
<div class="page-wrapper">
<section class="page single-review" itemtype="http://schema.org/Review" itemscope="" itemprop="review">
<article class="review clearfix">
<div class="review-content">
<div class="review-text" itemprop="reviewBody">
Outstanding 1...
</div>
<div class="review-text" itemprop="reviewBody">
Outstanding 2...
</div>
</div>
</article>
</section>
</div>
<div class="review-text" itemprop="reviewBody">
Outstanding 3...
</div>
';
$dom = new DOMDocument;
$dom->loadHTML ($html);
$xpath = new DOMXPath ($dom);
foreach ($xpath->query (".//div[@class='review-text']") as $review)
{
$out = $review->nodeValue;
echo $out;
}
?>
There are two ways to do this. Either store the entire query result in a variable like this
$arr = $xpath->query (".//div[@class='review-text']");
and then access the item you want with
$arr[1]->nodeValue
or loop over the results and act only on the position you want:
$counter = 0;
foreach ($xpath->query (".//div[@class='review-text']") as $review)
{
if ($counter == 1) {
$out = $review->nodeValue;
echo $out;
}
$counter++;
}
Pick whichever option suits your needs. The second one also lets you run other code inside the foreach besides the printing.
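As a hedged variation on both options, the node list returned by query() also has an item() method, and XPath itself can select by position. The sketch below assumes a shortened version of the HTML from the question; note that XPath positions start at 1 while item() starts at 0.

```php
<?php
$html = '<div class="review-content">
<div class="review-text" itemprop="reviewBody">Outstanding 1...</div>
<div class="review-text" itemprop="reviewBody">Outstanding 2...</div>
</div>
<div class="review-text" itemprop="reviewBody">Outstanding 3...</div>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Option A: fetch the list, then take the second item (zero-based index 1).
$reviews = $xpath->query("//div[@class='review-text']");
$second = $reviews->item(1);            // null when there is no such item
$out = $second !== null ? trim($second->nodeValue) : '';

// Option B: let XPath pick the second match directly (one-based position 2).
$viaXPath = trim($xpath->evaluate("string((//div[@class='review-text'])[2])"));

echo $out, PHP_EOL, $viaXPath, PHP_EOL;
```

The null check matters when the page might contain fewer matches than expected.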
You should try something like this:
<?php
$html = '
<div class="page-wrapper">
<section class="page single-review" itemtype="http://schema.org/Review" itemscope="" itemprop="review">
<article class="review clearfix">
<div class="review-content">
<div class="review-text" itemprop="reviewBody">
Outstanding 1...
</div>
<div class="review-text" itemprop="reviewBody">
Outstanding 2...
</div>
</div>
</article>
</section>
</div>
<div class="review-text" itemprop="reviewBody">
Outstanding 3...
</div>
';
$dom = new DOMDocument;
$dom->loadHTML ($html);
$xpath = new DOMXPath ($dom);
$reviews = $xpath->query (".//div[@class='review-text']");
echo $reviews[1]->nodeValue; // Outstanding 2...
The point is to keep all the reviews in one node list and then echo the one you want by its index.
I need the values "2.5 (0.5)" and "3.5".
My pattern is '/class="match_total_goal_div">.+</s'
But it is not working.
Please help.
<div class="match_total_goal_div">
2.5 (0.5) </div>
<div class="match_half_goal_div hide" ">
</div>
</td>
<td class="text-center corner_goal_range">
<div>
<span class="newlabel">N.A.</span>
</div>
.
.
.
<div class="match_total_goal_div">
3.5 </div>
.
.
.
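First, you need to add parentheses around your .+ to capture the desired data. You also need a question mark to make the quantifier lazy: .+?. Without it, and with the s modifier, .+ greedily matches everything up to the last < in the input.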
Hope this can help you
$str = '<div class="match_total_goal_div">
2.5 (0.5) </div>
<div class="match_total_goal_div">
3.5 </div>';
$pattern = '/class="match_total_goal_div">(.+?)</s';
preg_match_all($pattern, $str, $matches);
var_dump($matches);
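The captured values still carry the surrounding newlines and spaces. A small follow-up sketch, assuming only the trimmed text is wanted, takes the first capture group and trims each entry. The $values name is just for illustration.

```php
<?php
$str = '<div class="match_total_goal_div">
2.5 (0.5) </div>
<div class="match_total_goal_div">
3.5 </div>';
$pattern = '/class="match_total_goal_div">(.+?)</s';
preg_match_all($pattern, $str, $matches);

// $matches[0] holds the full matches, $matches[1] the first capture group.
$values = array_map('trim', $matches[1]);
print_r($values);
```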
Check this code to accomplish your goal
<?php
$html = '<div class="match_total_goal_div">
2.5 (0.5) </div>
<div class="match_half_goal_div hide">
</div>
<td class="text-center corner_goal_range"></td>
<div>
<span class="newlabel">N.A.</span>
</div>
<div class="match_total_goal_div">
3.5 </div>';
$DOM = new DOMDocument();
$DOM->loadHTML($html);
$finder = new DomXPath($DOM);
$classname = 'match_total_goal_div';
$nodes = $finder->query("//*[contains(@class, '$classname')]");
foreach ($nodes as $node) {
echo $node->nodeValue."\n";
}
?>
Live demo : http://sandbox.onlinephpfunctions.com/code/b3e645ac56b9f7bf57d4519abd6b1be90ed87945
I'm having a lot of trouble parsing the page source of a results page. The results page returns data about businesses in a city. This data includes name, address, phone number, owner name and URL. Any help would be much appreciated.
This is an example of one of the results (There are hundreds in the original file):
<div class="ListingResults_All_CONTAINER ListingResults_Level3_CONTAINER">
<div class="ListingResults_Level3_HEADER">
<div class="ListingResults_All_ENTRYTITLERIGHT">
<div><img src="/external/wcpages/images/L3more.gif" alt="317 at Montgomery"></div>
</div>
<div class="ListingResults_All_ENTRYTITLELEFT">
<div class="ListingResults_All_ENTRYTITLELEFTBOX"><strong><span itemprop="name">317 at Montgomery</span></strong></div>
</div>
</div>
<div class="ListingResults_Level3_MAIN">
<div class="ListingResults_Level3_MAINRIGHT">
<div class="ListingResults_Level3_MAINRIGHTBOX">
<div class="ListingResults_Level3_LOGO"><img src="http://www.centerstateceo.com/external/wcpages/wcwebcontent/webcontentpage.aspx?contentid=2071" class="ListingResults_Level3_LOGOIMG"><div style="width:100%;height:1px;overflow:hidden;"></div>
</div>
<div class="ListingResults_MAINRIGHTBOXDIVIDER" style="width:100%;overflow:hidden;height:1px;">_</div>
<div class="ListingResults_Level3_AFFILIATIONS"></div>
</div>
</div>
<div class="ListingResults_Level3_MAINLEFT">
<div class="ListingResults_Level3_MAINLEFTBOX" itemtype="http://data-vocabulary.org/Address" itemscope="" itemprop="address"><span itemprop="street-address">317 Montgomery St.</span><br><span itemprop="locality">Syracuse</span>, <span itemprop="region">NY</span> <span itemprop="postal-code">13202 </span><div class="ListingResults_Level3_MAINCONTACT"><img src="/external/wcpages/images/maincontact.gif" alt="Mr. Dean Whittles">Mr. Dean Whittles</div>
<div class="ListingResults_Level3_PHONE1"><img src="/external/wcpages/images/phone.gif" alt="Work Phone: (315) 214-4267">(315) 214-4267</div>
</div>
</div>
</div>
<div class="ListingResults_Level3_FOOTER">
<div class="ListingResults_Level3_DESCRIPTION">
<div class="ListingResults_Level3_DESCRIPTIONBOX"></div>
</div>
<div class="ListingResults_Level3_FOOTERRIGHT">
<div class="ListingResults_Level3_FOOTERRIGHTBOX">
<div class="ListingResults_Level3_SOCIALMEDIA"></div>
</div>
</div>
<div class="ListingResults_Level3_FOOTERRIGHT">
<div class="ListingResults_Level3_FOOTERRIGHTBOX">
<div class="ListingResults_Level3_COUPONS"></div>
</div>
</div>
<div class="ListingResults_Level3_FOOTERLEFT">
<div class="ListingResults_Level3_FOOTERLEFTBOX"><span class="ListingResults_Level3_LEARNMORE"><a href="/Restaurants/317-at-Montgomery-7897" class="level3_footer_left_box_a friendly">
Learn More
</a></span><span class="ListingResults_Level3_VISITSITE"> | <a href="http://www.317syr.com" onclick="recordReferralOnClick('20947', '7897', 'W');" target="_blank">
Visit Site
</a></span><span class="ListingResults_Level3_MAP"> | Show on Map</span></div>
</div>
</div>
</div>
PHP Code from Comment:
<?php
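// $data is assumed to hold the page source shown above
$dom = new DOMDocument();
$dom->loadHTML($data);
$spans = $dom->getElementsByTagName('span');
foreach ($spans as $el) {
// the second child may be a text node, so check that it is an element first
$children = $el->childNodes->item(1);
if ($children instanceof DOMElement && $children->tagName == 'a') {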
$url = $children->getAttribute('href');
echo $url;
continue;
}
$user_param = $el->getAttribute('itemprop');
$value = $el->nodeValue;
if ($user_param != "") {
echo $user_param . " " . $value . "\n";
}
}
?>
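An alternative to walking every span is to query each listing container and pull the fields out with relative XPath expressions. The following is only a sketch, assuming each result uses the same itemprop attributes and class names as the example above. The reduced $data sample, the image paths in it, and the $fields map are illustrative.

```php
<?php
// Reduced sample of one listing; the real page is assumed to repeat this shape.
$data = '<div class="ListingResults_All_CONTAINER ListingResults_Level3_CONTAINER">
  <span itemprop="name">317 at Montgomery</span>
  <div class="ListingResults_Level3_MAINLEFTBOX" itemprop="address">
    <span itemprop="street-address">317 Montgomery St.</span><br>
    <span itemprop="locality">Syracuse</span>, <span itemprop="region">NY</span>
    <span itemprop="postal-code">13202 </span>
    <div class="ListingResults_Level3_MAINCONTACT"><img src="/maincontact.gif" alt="Mr. Dean Whittles">Mr. Dean Whittles</div>
    <div class="ListingResults_Level3_PHONE1"><img src="/phone.gif" alt="Work Phone: (315) 214-4267">(315) 214-4267</div>
  </div>
  <span class="ListingResults_Level3_VISITSITE"> | <a href="http://www.317syr.com" target="_blank">Visit Site</a></span>
</div>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($data);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Relative expressions, evaluated once per listing container.
$fields = [
    'name'    => 'string(.//span[@itemprop="name"])',
    'street'  => 'string(.//span[@itemprop="street-address"])',
    'city'    => 'string(.//span[@itemprop="locality"])',
    'state'   => 'string(.//span[@itemprop="region"])',
    'zip'     => 'string(.//span[@itemprop="postal-code"])',
    'contact' => 'string(.//div[contains(@class, "MAINCONTACT")])',
    'phone'   => 'string(.//div[contains(@class, "PHONE1")])',
    'url'     => 'string(.//span[contains(@class, "VISITSITE")]/a/@href)',
];

$listings = [];
foreach ($xpath->query('//div[contains(@class, "ListingResults_All_CONTAINER")]') as $container) {
    $row = [];
    foreach ($fields as $key => $expr) {
        // A missing field simply yields an empty string.
        $row[$key] = trim($xpath->evaluate($expr, $container));
    }
    $listings[] = $row;
}
print_r($listings);
```

Keeping the queries relative to each container keeps the fields of one business together, even when some listings lack a phone number or URL.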