How to pass store selection in Simple HTML DOM parser - php

I'm trying to fetch a product name and price on this website Toplivo.bg
I am using the Simple HTML DOM parser to get it. Here is my code
include_once('simple_html_dom.php');
$link="https://toplivo.bg/en/products/Construction-materials/Dry-construction-mixtures/Screeds-and-flooring";
$html = file_get_html($link);
//Price
foreach ($html->find('div[class="content"]') as $text){
echo $text -> plaintext.'<br>';
}
?>
The problem is that first, I need to select the warehouse on the website to get the price for "Baumit Cement screed Baumit Solido E160, 25 kg".
Can I select it by default through PHP code? For example, I want to select the "Plovdiv region -> Plovdiv Store"
Thanks for helping!

This can be achieved using cURL. Complete code below:
<?php
include_once('simple_html_dom.php');
$link = "https://toplivo.bg/en/products/Construction-materials/Dry-construction-mixtures/Screeds-and-flooring";
// let's use curl to create a get request first to select a store while keeping the session using a cookie file
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://toplivo.bg/izborNaSklad/39');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie-45fg.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie-45fg.txt');
$output = curl_exec($ch);
curl_setopt($ch, CURLOPT_URL, $link); // now let's fetch the raw content of the store products page
$output = curl_exec($ch);
$html = str_get_html($output); // since we have the raw input, we can use the str_get_html method instead of file_get_html
//Price
foreach ($html->find('div[class="content"]') as $text){
echo $text->plaintext . '<br>';
}
?>

Related

extract specific data from webpage using php

I wants to create a php script for alerts from my work website when new notice is published, so following the page url
http://www.mahapwd.com/nit/ueviewnotice.asp?noticeid=1767
from this page i want a variable for Date & Time of Meeting (Date and time seperately two variables)
Place of Meeting and Published On
please help me to create a perfect php script.
I tried to create following script but it gives to many errors
<?php
$url1 = "http://www.mahapwd.com/nit/ueIndex.asp?district=12";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
preg_match("/href=(.*)\", $data, $urldata);
$url2 = "http://www.mahapwd.com/nit/$urldata[1];
curl_setopt($ch, CURLOPT_URL, $url2);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data2 = curl_exec($ch);
preg_match("/Published On:</b>(.*)<\/font>", $data, $pubDt);
$PubDate = $pubDt[1];
preg_match("/Time of Meeting:</b>(.*)&nbsp", $data, $MtDt);
$MeetDate = $MtDt[1];
preg_match("/Time of Meeting:</b>$MtDt[1]&nbsp(.*)</font>", $data, $MtTime);
$MeetTime = $MtTime[1];
preg_match("/Place of Meeting:</b>(.*)<\/font>", $data, $pubDt);
$PubDate = $pubDt[1];
?>
Hello i have done simple code for you. You can download simple_html_dom.php from http://simplehtmldom.sourceforge.net/
require_once "simple_html_dom.php";
$url='http://www.mahapwd.com/nit/ueviewnotice.asp?noticeid=1767';
//parse url
for ($i=0;$i<1;$i++) {
$html1 = file_get_html($url);
if(!$html1){ echo "no content"; }
else {
//here is parsed html
$string1 = $html1;
//now you need to find table
$element1=$html1->find('table');
//here is a table you need
$input=$element1[2];
//now you can select row from here
foreach($input->find('td') as $element) {
//in here you can find name than save it to database than check it
}
}
}

simple_html_dom not loading all elements

I am doing data scrapping on a website, where I'm trying to get all product URL's from a category page. I am not sure why simple_html_dom isn't returning the product URLs from the category page. Here is my PHP code.
// Require simplehtmldom
require_once 'includes/simplehtmldom_1_5/simple_html_dom.php';
// Category page URL
$srcurl = 'http://www.lastcall.com/Hers/Womens-Apparel/Dresses/Cocktail/cat11210008_cat5900001_cat6150001/c.cat#userConstrainedResults=true&refinements=&page=1&pageSize=120&sort=&definitionPath=/nm/commerce/pagedef_rwd/template/EndecaDriven&locationInput=&radiusInput=100&onlineOnly=&allStoresInput=false&rwd=true&catalogId=cat11210008';
$html = file_get_html($srcurl); // get DOM from URL or file
The file_get_html wasn't displaying any HTML element from "lastcall" (was working fine for other website URLs). So, I used PHPs CURL like this,
// Line 1 to 4 same
// $html = file_get_html($srcurl); // get DOM from URL or file
$curl = curl_init();
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_URL, $srcurl);
curl_setopt($curl, CURLOPT_REFERER, $srcurl);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
$str = curl_exec($curl);
curl_close($curl);
$html = new simple_html_dom(); // Create a DOM object
$html->load($str); // Load HTML from a string
echo $html; // Disply data on test page
After using CURL I am getting only the header and footer data from the above URL but the page is not displaying products block, from where I can actually extract all the products link. I just need help with displaying the products block, later I can implement match case to get the product links. Thanks in advance.
Regards,
Ankur

Does anyone have any scripts to share that retrieve products using Commission Junction's API that actually work?

This example only displayed a blank page for me.
This one did as well.
I've got the latest version of PHP and cURL set up properly, as far as I know so there shouldn't be any problem at that end. I'd prefer JavaScript to retrieve products but I'm open minded.
I happen to not be highly skilled, but I'd like to get my foot in the door.
edit: I will show you the code that doesn't work, and the error it is giving me.
<?php
// Your developer key
$cj_id = "My ID - omitted for privacy.";
// Your website ID
$website_id = "Also removed for privacy.";
// Keywords to search for
$keywords = "credit+card";
// URL to query with cURL
$url = "https://product-search.api.cj.com/v2/product-search?website-id=$website_id&keywords=$keywords";
// Initiate the cURL fetch
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, FALSE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
// Send authorization header with the CJ ID. Without this, the query won't work
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Authorization: '.$cj_id));
$result = curl_exec($ch);
// Put the results to an object
$resultXML = simplexml_load_string($result);
// Print the results
print "<pre>";
print_r($resultXML);
print "</pre>";
?>
Now, this is the error that it's giving me.
SimpleXMLElement Object
(
[error-message] => Invalid Key provided. Valid keys are: advertiser-ids, advertiser-sku, currency, high-price, high-sale-price, isbn, keywords, low-price, low-sale-price, manufacturer-name, manufacturer-sku, page-number, records-per-page, serviceable-area, sort-by, sort-order, upc, website-id
)
You have a error in your URL, try this:
$url = "https://product-search.api.cj.com/v2/product-search?website-id=$website_id&keywords=$keywords";
instead of :
$url = "https://product-search.api.cj.com/v2/product-search?website-id=$website_id&keywords=$keywords";
<?php
echo '<pre>';
$url='https://product-search.api.cj.com/v2/product-search?website-id=your-id-key-here&advertiser-ids=4415206&records-per-page=999&serviceable-area=US';
$CJ_KEY='0085eb59c8928f028ba5b27bccfe17cdd20cf4e9079b977b2cc6df72752abab9205676a2f7ee67befe9dccab85f656ef46aba49e500faccbf75dfc6e03f655334d/00848a3f9bf0e13525bce27f008d6245c3e42ae80f2d80a8d9d2220807ca386f4b10146cbbcfff06aafb5e49c03a3318213389dee7861abb2dd7229470390a89c9';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, FAlSE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Authorization: '.$CJ_KEY));
$curl_results = curl_exec($ch);
$xml = simplexml_load_string($curl_results);
var_dump($xml);
// Loop Insert Product to database
echo '<pre>';
// if you no set: records-per-page=999, default get 50 products latest
// advertiser-ids=4415206 is Id of Advertiser in CJ, you can replace other id ,
Hope helpful for you , good luck !
?>

<?php echo file_get_contents how to get content in a certain tag

<?php echo file_get_contents ("http://www.google.com/"); ?>
but I only want to get the contents of the tag in the url...how to do that...?
I need to echo the content between a tag....not the whole page
Refer this PHP manual and cURL which also help you.
You may also use user define function instead of file_get_contents():
function get_content($URL){
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $URL);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
echo get_content('http://example.com');
Hope, it will resolve your issue.
I think you want to extract content from a specific html tag in the file. For this you can use regular expressions. However view the following link to parse an HTML document file:
http://php.net/manual/en/class.domdocument.php
libxml_use_internal_errors(true);
$url = "http://stackoverflow.com/questions/15947331/php-echo-file-get-contents-how-to-get-content-in-a-certain-tag";
$dom = new DomDocument();
$dom->loadHTML(file_get_contents($url));
foreach($dom->getElementsByTagName('a') as $element) {
echo $element->nodeValue.'<br/>';
}
exit;
More info: http://www.php.net/manual/en/class.domdocument.php
There you can see how to select elements by id or class, how to get elements' attribute values etc.
Note: It's better to get content via cURL instead of get_file_contents. For example:
function file_get_contents_curl($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
Also note that on some websites you have to specify options like CURLOPT_USERAGENT etc., otherwise the content may not be returned.
Here are the other options: http://www.php.net/manual/en/function.curl-setopt.php

Selecting a specific div from a extern webpage using CURL

Hi can anyone help me how to select a specific div from the content of a webpage.
Let's say i want to get the div with id="wrapper_content" from webpage http://www.test.com/page3.php.
My current code looks something like this: (not working)
//REG EXP.
$s_searchFor = '#^/.dont know what to put here..#ui';
//CURL
$ch = curl_init();
$timeout = 5; // set to zero for no timeout
curl_setopt ($ch, CURLOPT_URL, 'http://www.test.com/page3.php');
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
if(!preg_match($s_searchFor, $ch))
{
$file_contents = curl_exec($ch);
}
curl_close($ch);
// display file
echo $file_contents;
So i'd like to know how i can use reg expressions to find a specific div and how to unset the rest of the webpage so that $file_content only contains the div.
HTML isn't regular, so you shouldn't use regex. Instead I would recommend a HTML Parser such as Simple HTML DOM or DOM
If you were going to use Simple HTML DOM you would do something like the following:
$html = str_get_html($file_contents);
$elem = $html->find('div[id=wrapper_content]', 0);
Even if you used regex your code still wouldn't work correctly. You need to get the contents of the page before you can use regex.
//wrong
if(!preg_match($s_searchFor, $ch)){
$file_contents = curl_exec($ch);
}
//right
$file_contents = curl_exec($ch); //get the page contents
preg_match($s_searchFor, $file_contents, $matches); //match the element
$file_contents = $matches[0]; //set the file_contents var to the matched elements
include('simple_html_dom.php');
$html = str_get_html($file_contents);
$elem = $html->find('div[id=wrapper_content]', 0);
Download simple_html_dom.php
check our hpricot, it lets you elegantly select sections
first you would use curl to get the document, then use hpricot to get the part you need

Categories