I'm trying to fetch a product name and price from the website Toplivo.bg.
I am using the Simple HTML DOM parser to get them. Here is my code:
<?php
include_once('simple_html_dom.php');
$link="https://toplivo.bg/en/products/Construction-materials/Dry-construction-mixtures/Screeds-and-flooring";
$html = file_get_html($link);
//Price
foreach ($html->find('div[class="content"]') as $text){
echo $text -> plaintext.'<br>';
}
?>
The problem is that I first need to select a warehouse on the website to get the price for "Baumit Cement screed Baumit Solido E160, 25 kg".
Can I select it by default through PHP code? For example, I want to select "Plovdiv region -> Plovdiv Store".
Thanks for helping!
This can be achieved using cURL. Complete code below:
<?php
include_once('simple_html_dom.php');
$link = "https://toplivo.bg/en/products/Construction-materials/Dry-construction-mixtures/Screeds-and-flooring";
$cookieFile = 'cookie-45fg.txt';
// let's use curl to create a get request first to select a store while keeping the session using a cookie file
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://toplivo.bg/izborNaSklad/39');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);  // save cookies to this file when the handle is cleaned up
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // enable the cookie engine and send stored cookies
curl_exec($ch); // the response body is not needed, only the session cookie it sets
// now let's fetch the raw content of the store products page with the same handle (same session)
curl_setopt($ch, CURLOPT_URL, $link);
$output = curl_exec($ch);
curl_close($ch);
// since we have the raw input, we can use the str_get_html method instead of file_get_html
$html = str_get_html($output);
if ($html) {
    //Price
    foreach ($html->find('div[class="content"]') as $text) {
        echo $text->plaintext . '<br>';
    }
}
?>
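If you would rather not depend on simple_html_dom, the `div[class="content"]` lookup can also be done with PHP's built-in DOM extension. This is a minimal sketch, not the answerer's method: `content_texts()` is a hypothetical helper name, and it assumes the product name and price sit inside `<div class="content">` elements, as the selector above implies. You would call it on `$output` from the second `curl_exec()` call.

```php
<?php
// Hypothetical helper: return the text of every <div class="content"> in an HTML string,
// roughly what $html->find('div[class="content"]') ... ->plaintext gives with simple_html_dom.
function content_texts(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }
    $dom = new DOMDocument();
    // real-world pages are rarely valid HTML: collect parser warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $texts = [];
    // exact match on the class attribute, like div[class="content"]
    foreach ($xpath->query('//div[@class="content"]') as $div) {
        // collapse runs of whitespace so each product reads as one line
        $texts[] = trim(preg_replace('/\s+/', ' ', $div->textContent));
    }
    return $texts;
}
```

Like `div[class="content"]`, the XPath test `@class="content"` compares the whole attribute value, so a div with `class="content other"` is skipped.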
Related
I want to create a PHP script that alerts me when a new notice is published on my work website. Take the following page URL:
http://www.mahapwd.com/nit/ueviewnotice.asp?noticeid=1767
From this page I want variables for the Date & Time of Meeting (date and time as two separate variables), the Place of Meeting, and Published On.
Please help me create a working PHP script.
I tried to write the following script, but it gives too many errors:
<?php
$url1 = "http://www.mahapwd.com/nit/ueIndex.asp?district=12";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
preg_match("/href=(.*)\", $data, $urldata);
$url2 = "http://www.mahapwd.com/nit/$urldata[1];
curl_setopt($ch, CURLOPT_URL, $url2);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data2 = curl_exec($ch);
preg_match("/Published On:</b>(.*)<\/font>", $data, $pubDt);
$PubDate = $pubDt[1];
preg_match("/Time of Meeting:</b>(.*) ", $data, $MtDt);
$MeetDate = $MtDt[1];
preg_match("/Time of Meeting:</b>$MtDt[1] (.*)</font>", $data, $MtTime);
$MeetTime = $MtTime[1];
preg_match("/Place of Meeting:</b>(.*)<\/font>", $data, $pubDt);
$PubDate = $pubDt[1];
?>
Hello, I have written some simple code for you. You can download simple_html_dom.php from http://simplehtmldom.sourceforge.net/
<?php
require_once "simple_html_dom.php";
$url = 'http://www.mahapwd.com/nit/ueviewnotice.asp?noticeid=1767';
// fetch and parse the page
$html1 = file_get_html($url);
if (!$html1) {
    echo "no content";
} else {
    // find all tables on the page
    $element1 = $html1->find('table');
    // here is the table you need (the third one, index 2)
    if (isset($element1[2])) {
        $input = $element1[2];
        // now you can go through its cells
        foreach ($input->find('td') as $element) {
            // here you can read each value, save it to the database, then check it
            echo $element->plaintext . '<br>';
        }
    }
}
?>
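For comparison, the regex script in the question fails for several reasons: some string literals are never closed (e.g. `"/href=(.*)\"`), the patterns have no closing `/` delimiter while also containing unescaped `/` in `</b>`, and the notice page is fetched into `$data2` but every pattern searches `$data`. Below is a sketch of the extraction step only, assuming the page marks each field up as `<b>Label:</b>value</font>`, which is what the question's patterns expect; `notice_fields()` is a hypothetical helper, and if the real markup differs the patterns need adjusting.

```php
<?php
// Hypothetical helper: pull the notice fields out of the page HTML with regular expressions.
// ASSUMPTION: each field looks like "<b>Label:</b>value</font>" (as the question's patterns expect).
function notice_fields(string $html): array
{
    $fields = ['published' => null, 'meetDate' => null, 'meetTime' => null, 'place' => null];

    // '#' as the delimiter means the '/' in </b> and </font> needs no escaping;
    // the non-greedy (.*?) stops at the first </font> rather than the last one on the page.
    if (preg_match('#Published On:\s*</b>\s*(.*?)\s*</font>#is', $html, $m)) {
        $fields['published'] = $m[1];
    }
    // "Date & Time of Meeting": first non-space run is the date, the rest is the time
    if (preg_match('#Time of Meeting:\s*</b>\s*(\S+)\s+(.*?)\s*</font>#is', $html, $m)) {
        $fields['meetDate'] = $m[1];
        $fields['meetTime'] = $m[2];
    }
    if (preg_match('#Place of Meeting:\s*</b>\s*(.*?)\s*</font>#is', $html, $m)) {
        $fields['place'] = $m[1];
    }
    return $fields;
}
```

You would call it as `$fields = notice_fields($data2);` after fetching the notice page; any field whose pattern does not match stays `null` instead of triggering an undefined-index notice.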
I am scraping data from a website, where I'm trying to get all product URLs from a category page. I am not sure why simple_html_dom isn't returning the product URLs from the category page. Here is my PHP code:
// Require simplehtmldom
require_once 'includes/simplehtmldom_1_5/simple_html_dom.php';
// Category page URL
$srcurl = 'http://www.lastcall.com/Hers/Womens-Apparel/Dresses/Cocktail/cat11210008_cat5900001_cat6150001/c.cat#userConstrainedResults=true&refinements=&page=1&pageSize=120&sort=&definitionPath=/nm/commerce/pagedef_rwd/template/EndecaDriven&locationInput=&radiusInput=100&onlineOnly=&allStoresInput=false&rwd=true&catalogId=cat11210008';
$html = file_get_html($srcurl); // get DOM from URL or file
file_get_html wasn't returning any HTML elements from "lastcall" (it was working fine for other websites' URLs), so I used PHP's cURL like this:
// Line 1 to 4 same
// $html = file_get_html($srcurl); // get DOM from URL or file
$curl = curl_init();
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_URL, $srcurl);
curl_setopt($curl, CURLOPT_REFERER, $srcurl);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
$str = curl_exec($curl);
curl_close($curl);
$html = new simple_html_dom(); // Create a DOM object
$html->load($str); // Load HTML from a string
echo $html; // Display data on test page
After switching to cURL I get only the header and footer of the page at the above URL; the products block, from which I could extract all the product links, is missing. I just need help getting the products block; after that I can write the matching to pull out the product links. Thanks in advance.
Regards,
Ankur
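A likely cause, offered as a hypothesis rather than a diagnosis: everything after `#` in `$srcurl` is a URL fragment, and HTTP clients, libcurl included, never send the fragment to the server. So the request does not carry `page=1&pageSize=120&...` at all; those values can only be read by client-side JavaScript, which is also what typically fills in a product grid like this one, and neither file_get_html nor cURL runs JavaScript. The sketch below (hypothetical helper `split_fragment()`) only shows which part of such a URL actually reaches the server.

```php
<?php
// Hypothetical helper: split a URL into the part sent to the server and the #fragment.
function split_fragment(string $url): array
{
    $pos = strpos($url, '#');
    if ($pos === false) {
        return [$url, ''];         // no fragment: the whole URL is sent
    }
    return [
        substr($url, 0, $pos),     // what cURL actually requests
        substr($url, $pos + 1),    // stays in the browser, never sent
    ];
}

// Usage with a shortened version of the category URL from the question
[$sent, $fragment] = split_fragment('http://www.lastcall.com/Hers/Womens-Apparel/Dresses/Cocktail/cat11210008_cat5900001_cat6150001/c.cat#page=1&pageSize=120');
parse_str($fragment, $clientParams); // ['page' => '1', 'pageSize' => '120']
echo $sent, "\n";
```

If that is what is happening, the browser's developer tools (Network tab) usually show the separate request the page's JavaScript makes for the product data, and that request is the one to fetch with cURL.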
This example only displayed a blank page for me.
This one did as well.
As far as I know I have the latest version of PHP and cURL set up properly, so there shouldn't be any problem on that end. I'd prefer JavaScript for retrieving products, but I'm open-minded.
I'm not highly skilled, but I'd like to get my foot in the door.
Edit: here is the code that doesn't work, and the error it gives me.
<?php
// Your developer key
$cj_id = "My ID - omitted for privacy.";
// Your website ID
$website_id = "Also removed for privacy.";
// Keywords to search for
$keywords = "credit+card";
// URL to query with cURL
$url = "https://product-search.api.cj.com/v2/product-search?website-id=$website_id&keywords=$keywords";
// Initiate the cURL fetch
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, FALSE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
// Send authorization header with the CJ ID. Without this, the query won't work
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Authorization: '.$cj_id));
$result = curl_exec($ch);
// Put the results to an object
$resultXML = simplexml_load_string($result);
// Print the results
print "<pre>";
print_r($resultXML);
print "</pre>";
?>
Now, this is the error that it's giving me.
SimpleXMLElement Object
(
[error-message] => Invalid Key provided. Valid keys are: advertiser-ids, advertiser-sku, currency, high-price, high-sale-price, isbn, keywords, low-price, low-sale-price, manufacturer-name, manufacturer-sku, page-number, records-per-page, serviceable-area, sort-by, sort-order, upc, website-id
)
You have an error in your URL; try this:
$url = "https://product-search.api.cj.com/v2/product-search?website-id=$website_id&keywords=$keywords";
instead of :
$url = "https://product-search.api.cj.com/v2/product-search?website-id=$website_id&keywords=$keywords";
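One way to avoid mistakes when assembling this URL by hand is `http_build_query()`: it URL-encodes each value (a space in the keywords becomes `+`) and, with `'&'` passed explicitly as the separator, emits a plain `&` regardless of the `arg_separator.output` setting. A minimal sketch; `cj_search_url()` is a hypothetical helper, and the parameter names are taken from the list of valid keys in the error message above.

```php
<?php
// Hypothetical helper: build the product-search URL from an array of parameters.
function cj_search_url(string $websiteId, string $keywords, array $extra = []): string
{
    // keys come from the "Valid keys are: ..." list in the API's error message
    $params = ['website-id' => $websiteId, 'keywords' => $keywords] + $extra;
    // third argument forces a plain '&' between the key=value pairs
    return 'https://product-search.api.cj.com/v2/product-search?'
        . http_build_query($params, '', '&');
}

echo cj_search_url('1234567', 'credit card', ['records-per-page' => 50]), "\n";
// https://product-search.api.cj.com/v2/product-search?website-id=1234567&keywords=credit+card&records-per-page=50
```

Pass the keywords unencoded ("credit card", not "credit+card"); otherwise the `+` itself is encoded as `%2B`.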
<?php
echo '<pre>';
$url='https://product-search.api.cj.com/v2/product-search?website-id=your-id-key-here&advertiser-ids=4415206&records-per-page=999&serviceable-area=US';
$CJ_KEY='0085eb59c8928f028ba5b27bccfe17cdd20cf4e9079b977b2cc6df72752abab9205676a2f7ee67befe9dccab85f656ef46aba49e500faccbf75dfc6e03f655334d/00848a3f9bf0e13525bce27f008d6245c3e42ae80f2d80a8d9d2220807ca386f4b10146cbbcfff06aafb5e49c03a3318213389dee7861abb2dd7229470390a89c9';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, FALSE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Authorization: '.$CJ_KEY));
$curl_results = curl_exec($ch);
curl_close($ch);
$xml = simplexml_load_string($curl_results);
var_dump($xml);
// Loop over the products here and insert each one into the database
echo '</pre>';
// If you do not set records-per-page=999, the API returns the latest 50 products by default
// advertiser-ids=4415206 is the ID of an advertiser in CJ; you can replace it with another ID
?>
Hope this helps, good luck!
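The answer leaves the "insert each product into the database" loop as a comment. A sketch of that loop, assuming the response looks roughly like `<cj-api><products><product><name>…</name><price>…</price></product>…</products></cj-api>`; those element names are an assumption, so check them against a real response (for example with the `var_dump()` above) first. `product_rows()` is a hypothetical helper that turns the XML into plain rows, which you could then insert with a PDO prepared statement.

```php
<?php
// Hypothetical helper: turn a product-search XML response into plain rows for a DB insert.
// ASSUMPTION: products sit at <cj-api><products><product> with <name> and <price> children.
function product_rows(string $xmlString): array
{
    $xml = simplexml_load_string($xmlString);
    if ($xml === false || !isset($xml->products)) {
        return []; // not XML, or no <products> element (e.g. an <error-message> response)
    }
    $rows = [];
    foreach ($xml->products->product as $product) {
        $rows[] = [
            'name'  => (string) $product->name,  // SimpleXML nodes must be cast to scalars
            'price' => (float) $product->price,
        ];
    }
    return $rows;
}
```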
<?php echo file_get_contents ("http://www.google.com/"); ?>
But I only want to get the contents of a certain tag at that URL, not the whole page. How do I echo just the content between the tags?
Refer to this PHP manual page; cURL may also help you.
You can also use a user-defined function instead of file_get_contents():
function get_content($URL){
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $URL);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
echo get_content('http://example.com');
Hope it resolves your issue.
I think you want to extract the content of a specific HTML tag in the file. You could use regular expressions for this, but it is better to parse the HTML document; see the following link:
http://php.net/manual/en/class.domdocument.php
<?php
// collect libxml parse warnings instead of printing them (real pages are rarely valid HTML)
libxml_use_internal_errors(true);
$url = "http://stackoverflow.com/questions/15947331/php-echo-file-get-contents-how-to-get-content-in-a-certain-tag";
$dom = new DOMDocument();
$dom->loadHTML(file_get_contents($url));
// print the text of every <a> element on the page
foreach ($dom->getElementsByTagName('a') as $element) {
    echo $element->nodeValue . '<br/>';
}
exit;
More info: http://www.php.net/manual/en/class.domdocument.php
There you can see how to select elements by id or class, how to get elements' attribute values etc.
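The classic DOMDocument API has `getElementsByTagName()` and `getElementById()` but no lookup by class; the usual route for that is `DOMXPath`. A minimal sketch, with `links_by_class()` as a hypothetical helper, that selects `<a>` elements carrying a given class and reads their text and `href` attribute. It assumes the class name contains no double quote, since it is inserted directly into the XPath expression.

```php
<?php
// Hypothetical helper: text and href of every <a> whose class list contains $class.
function links_by_class(string $html, string $class): array
{
    if ($html === '') {
        return [];
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy real-world markup
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // pad the class list with spaces so "item" matches "item big" but not "items"
    $query = sprintf('//a[contains(concat(" ", normalize-space(@class), " "), " %s ")]', $class);
    $links = [];
    foreach ($xpath->query($query) as $a) {
        $links[] = ['text' => trim($a->textContent), 'href' => $a->getAttribute('href')];
    }
    return $links;
}
```

Usage would be `links_by_class(file_get_contents($url), 'some-class')`, where `'some-class'` stands for whatever class the target links carry.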
Note: it's better to get content via cURL instead of file_get_contents. For example:
function file_get_contents_curl($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
Also note that on some websites you have to specify options like CURLOPT_USERAGENT etc., otherwise the content may not be returned.
Here are the other options: http://www.php.net/manual/en/function.curl-setopt.php
Hi, can anyone help me select a specific div from the content of a webpage?
Let's say I want to get the div with id="wrapper_content" from the webpage http://www.test.com/page3.php.
My current code looks something like this (not working):
//REG EXP.
$s_searchFor = '#^/.dont know what to put here..#ui';
//CURL
$ch = curl_init();
$timeout = 5; // set to zero for no timeout
curl_setopt ($ch, CURLOPT_URL, 'http://www.test.com/page3.php');
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
if(!preg_match($s_searchFor, $ch))
{
$file_contents = curl_exec($ch);
}
curl_close($ch);
// display file
echo $file_contents;
So I'd like to know how I can use regular expressions to find a specific div and discard the rest of the webpage, so that $file_contents only contains the div.
HTML isn't a regular language, so you shouldn't parse it with regex. Instead I would recommend an HTML parser such as Simple HTML DOM or PHP's DOM extension.
If you were going to use Simple HTML DOM you would do something like the following:
$html = str_get_html($file_contents);
$elem = $html->find('div[id=wrapper_content]', 0);
Even if you used regex your code still wouldn't work correctly. You need to get the contents of the page before you can use regex.
//wrong
if(!preg_match($s_searchFor, $ch)){
$file_contents = curl_exec($ch);
}
//right
$file_contents = curl_exec($ch); //get the page contents
if (preg_match($s_searchFor, $file_contents, $matches)) { //match the element
    $file_contents = $matches[0]; //set the file_contents var to the matched element
}
include('simple_html_dom.php');
$html = str_get_html($file_contents);
$elem = $html->find('div[id=wrapper_content]', 0);
Download simple_html_dom.php
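The same lookup also works with PHP's built-in DOM extension, without downloading anything. A minimal sketch; `element_html_by_id()` is a hypothetical helper that returns the outer HTML of the first element with the given id, or `null` if there is none. It assumes the id contains no double quote.

```php
<?php
// Hypothetical helper: outer HTML of the element with the given id, or null if absent.
function element_html_by_id(string $html, string $id): ?string
{
    if ($html === '') {
        return null;
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate invalid markup quietly
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query(sprintf('//*[@id="%s"]', $id));
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    // passing a node to saveHTML() serialises just that element and its children
    return $dom->saveHTML($nodes->item(0));
}
```

With the page fetched into `$file_contents`, `echo element_html_by_id($file_contents, 'wrapper_content');` prints only that div, which is the result the question asks for.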
Check out Hpricot (a Ruby HTML parser); it lets you elegantly select sections of a document.
First you would use curl to get the document, then use Hpricot to get the part you need.