Scrape Amazon.com webpage with PHP

I'm trying to fetch the HTML of a remote Amazon URL. I had working code, but maybe they changed something; I'm not sure. I've spent hours trying code samples and plugins from various places, but nothing works. Here's what I have right now, which doesn't work either:
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $item['URL']);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
$output = json_decode(curl_exec($curl));
//echo curl_getinfo($curl, CURLINFO_HTTP_CODE);
curl_close($curl);
#file_put_contents($graphics_file_root.'rps/amazon/temp2.html',$output);
$html = new DOMDocument();
#$html->loadHTML($output);
#file_put_contents($graphics_file_root.'rps/amazon/temp.html',$html->saveHTML());
$temp = $html->getElementsByTagName('img');
$html = file_get_contents($item['URL']);
#file_put_contents($graphics_file_root.'rps/amazon/temp2.html',$html);
$temp = $html->getElementsByTagName('img');
echo count($temp);
print_r($temp);
This doesn't work. simple_html_dom doesn't work. Nothing does that I can find.

It looks like some of the code I found online was JSON-specific; removing the json_decode() call fixed it:
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $item['URL']);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
$output = curl_exec($curl);
//echo curl_getinfo($curl, CURLINFO_HTTP_CODE);
curl_close($curl);
//file_put_contents($graphics_file_root.'rps/amazon/temp2.html',$output);
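$html = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML is rarely valid; keep parser warnings quiet
$html->loadHTML($output); // the document must be loaded before it can be queried
//file_put_contents($graphics_file_root.'rps/amazon/temp.html',$html->saveHTML());
$temp = $html->getElementsByTagName('img');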
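For illustration, here is a minimal sketch of the parsing step on a canned HTML string rather than a live Amazon page. The helper name extract_image_sources and the sample markup are assumptions, not part of the original code. Note that the document has to be loaded with loadHTML() before getElementsByTagName() can find anything.

```php
<?php
// Hypothetical helper (not from the original post): collect the src of every <img>.
function extract_image_sources(string $markup): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        if ($img->hasAttribute('src')) {
            $sources[] = $img->getAttribute('src');
        }
    }
    return $sources;
}

// Made-up markup standing in for the string returned by curl_exec().
$sample = '<html><body><img src="a.jpg"><p>text</p><img src="b.png" alt="b"><img alt="no src"></body></html>';
$sources = extract_image_sources($sample);
print_r($sources);
```

In the real script, $output from curl_exec() would take the place of $sample.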

Related

Get a number from a site with curl and then using it in my application

I am experimenting with curl to extract a number from a site, and I got that working nice and dandy.
The problem is that the variable I get is an object, so I cannot use it on my site the way I would like. I have Googled a lot, and what I found is that you cannot convert an object to a float.
include "simple_html_dom.php";
$url = "https://oilprice.com/oil-price-charts";
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec($curl);
curl_close($curl);
$html = new simple_html_dom();
$html->load($response);
foreach($html->find('tr[data-id="46"]') as $link);
foreach($link->find('td[3]') as $link);
echo $link . "<br>";
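One way around the object-to-float problem is to read the element's text as a string and cast that string. The sketch below swaps in PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom so that it runs on its own; the sample row, the column position, and the number format are made up for illustration and may not match the real page.

```php
<?php
// Made-up stand-in for the fetched page; the real markup may differ.
$sample = '<table><tr data-id="46"><td>WTI Crude</td><td>1,234.56</td><td>+0.42</td></tr></table>';

$doc = new DOMDocument();
$previous = libxml_use_internal_errors(true);
$doc->loadHTML($sample);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($doc);
// Second cell of the row with data-id="46" (XPath positions are 1-based).
$cell = $xpath->query('//tr[@data-id="46"]/td[2]')->item(0);

$price = null;
if ($cell !== null) {
    // textContent is a plain string; drop thousands separators before casting.
    $text = str_replace(',', '', trim($cell->textContent));
    if (is_numeric($text)) {
        $price = (float) $text;
    }
}
var_dump($price);
```

With simple_html_dom itself, the same idea should apply to the element's plaintext property: take the string, clean it, then cast it with (float).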

PHP - get json data from url

I'm trying to load JSON data from this URL:
http://api.opencagedata.com/geocode/v1/json?query=48.84737%2C2.28605&pretty=1&no_annotations=1&no_dedupe=1&key=b61388b5a248b7cfcaa9579ed290485b
Using file_get_contents works with other JSON URLs, but this one is strange: echoing the result shows only "{", the first line. strlen gives 1480, which is right, and substr(2, 18) gives "documentation", which is right too, but I still can't echo the entire text. Maybe there's some way to read the text line by line and save it in another string? The entire text is still written to the text file in full.
Here's the PHP code I tried:
<?php
$url = file_get_contents("http://api.opencagedata.com/geocode/v1/json?query=48.84737%2C2.28605&pretty=1&no_annotations=1&no_dedupe=1&key=b61388b5a248b7cfcaa9579ed290485b");
$save = file_put_contents("filename.txt", $url);
echo $url;
?>
I also tried this function, with the same result:
function get_data($url) {
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}

You can decode the response into a PHP array with json_decode:
function get_data($url) {
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return json_decode($data, true);
}
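As a rough sketch of what the decoding step gives you, the snippet below runs json_decode on a canned string and checks for errors before using the result. The helper name decode_json and the sample payload are assumptions; the real API response has more fields.

```php
<?php
// Hypothetical helper: decode a JSON string into an array, or return null on failure.
function decode_json(string $raw): ?array
{
    $decoded = json_decode($raw, true);
    if (json_last_error() !== JSON_ERROR_NONE || !is_array($decoded)) {
        return null;
    }
    return $decoded;
}

// Made-up payload standing in for the API response.
$raw = "{\n  \"documentation\": \"https://example.com/api\",\n  \"results\": [{\"formatted\": \"Paris, France\"}]\n}";
$data = decode_json($raw);

if ($data !== null) {
    // Individual values can now be read by key instead of echoing the raw text.
    echo $data['results'][0]['formatted'], "\n";
}
```

Returning null on failure makes a truncated or invalid response easy to detect before any array access.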

curl json with cookie returns "‹ŠŽÿÿ)»L"

I am trying to get the content of this JSON: http://steamcommunity.com/market/pricehistory/?country=DE&currency=3&appid=730&market_hash_name=Chroma%20Case
This is my code:
$url = "http://steamcommunity.com/market/pricehistory/?country=DE&currency=3&appid=730&market_hash_name=Chroma%20Case";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIE, 'steamLogin = 76561198075419487%7C%7C3F1A776553C4BE1D0F6DA83059052E79DB7EB3C7');
$output = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
$json_string = json_encode($output, JSON_PRETTY_PRINT);
Printing $json_string outputs nothing, and printing $output gives "‹ŠŽÿÿ)»L". I would like to grab the actual content of the page; the steamLogin cookie is needed for that. The cookie I hardcoded in the source is the one currently stored in my browser.
If you need any more info, feel free to ask.

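Adding curl_setopt($ch, CURLOPT_ENCODING, ""); fixed it. The response was most likely gzip-compressed; an empty string tells cURL to advertise every encoding it supports and to decompress the response automatically.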
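Garbled output like the one in the question is typical of a compressed body printed raw. The sketch below reproduces that situation locally, assuming the server used gzip; the JSON payload is made up, and the manual gzdecode() call stands in for what cURL does itself once CURLOPT_ENCODING is set.

```php
<?php
// Made-up JSON payload standing in for the server response.
$json = '{"success":true,"prices":[]}';

// Roughly what a server sends when it answers with Content-Encoding: gzip.
$compressed = gzencode($json);

// gzip streams start with the magic bytes 0x1f 0x8b, which print as garbage.
$magic = bin2hex(substr($compressed, 0, 2));

// Manual decoding of the compressed body.
$decoded = gzdecode($compressed);
$data = json_decode($decoded, true);

echo $magic, "\n";
var_dump($data['success']);
```

Note that the decoded body is already a JSON string, so it should be passed to json_decode, not json_encode.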

Save Page As XML

I have made a script that generates an IMDB API link that returns a movie's data as XML.
Once the link is generated, the script should save the response to an XML file. The only issue is that the contents aren't being saved.
Link generated:
http://imdbapi.org/?title=One+Piece&type=xml&plot=simple&mt=none&episode=0&aka=simple&release=simple
PHP script:
$url="http://imdbapi.org/?title=One+Piece&type=xml&plot=simple&mt=none&episode=0&aka=simple&release=simple";
$curl = curl_init();
$data = fopen("text.xml", "w");
curl_setopt ($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_FILE, $data);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
curl_exec ($curl);
if ( !$data ) {
echo "No";
} else {
$contents = curl_exec($curl);
fwrite($data, $contents);
}
curl_close($curl);
fclose($data);

Instead of passing a file handle via CURLOPT_FILE, you can have cURL return the response as a string:
$ch = curl_init('http://imdbapi.org/?title=One+Piece&type=xml&plot=simple&mt=none&episode=0&aka=simple&release=simple');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec($ch);
curl_close($ch);
Now $response contains your XML, and you can save it with something like:
file_put_contents('filename.xml', $response);
Make sure that filename.xml is writable.
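A minimal sketch of the saving step, with a check that the string really is XML and that the write succeeded. The helper name save_xml, the sample document, and the temporary file path are assumptions made for illustration.

```php
<?php
// Hypothetical helper: write $xml to $path only if it parses as XML.
function save_xml(string $path, string $xml): bool
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $xml !== '' && $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$ok) {
        return false;
    }
    // file_put_contents returns the number of bytes written, or false on failure.
    return file_put_contents($path, $xml) !== false;
}

// Made-up document standing in for $response from curl_exec().
$xml = '<movies><movie><title>One Piece</title></movie></movies>';
$path = tempnam(sys_get_temp_dir(), 'xml');
$saved = save_xml($path, $xml);
$roundTrip = file_get_contents($path);
unlink($path);

var_dump($saved);
```

Checking the return value catches the case where the target file is not writable, which would otherwise fail silently.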

php CURL not working with dynamic URLs

I can't work out why cURL cannot find this URL; it simply gets a 400 error page.
My code is very simple and works fine with non-dynamic URLs.
I am hoping it's something easy to spot, for example a missing cURL option.
I have tried using $url = urlencode($url), but that didn't work either.
Here's the code:
$url = 'http://www.destinations-uk.com/accommodations.php?link=accommodations&country=england&category=Reviews&id=1';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 2);
curl_setopt($ch, CURLOPT_TIMEOUT, 15);
$r = curl_exec($ch);
$r = explode("\n", $r);
$keys = array();
if(!empty($r)) $keys[] = array_shift($r);
foreach($r as $line){
preg_match('/.+:\s/',$line,$match);
if($match) $keys[substr($match[0],0,-2)] = preg_replace('/.+:\s/','', $line);
}
print_r($keys);

Perhaps this is something done on the server side to prevent automated requests.
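To see what the server actually answered (for example the 400 status mentioned in the question), the header block can be split into a status line and name/value pairs. Here is a minimal sketch on a canned header string; the helper name parse_headers and the sample headers are made up for illustration.

```php
<?php
// Hypothetical helper: split a raw header block into status line and fields.
function parse_headers(string $raw): array
{
    $lines = preg_split('/\r\n|\n|\r/', trim($raw));
    $parsed = ['status' => array_shift($lines), 'fields' => []];
    foreach ($lines as $line) {
        // Limit 2 keeps colons inside the value (dates, URLs) intact.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $parsed['fields'][trim($parts[0])] = trim($parts[1]);
        }
    }
    return $parsed;
}

// Made-up response headers standing in for the output of curl_exec().
$raw = "HTTP/1.1 400 Bad Request\r\nDate: Mon, 01 Jan 2024 10:00:00 GMT\r\nContent-Type: text/html\r\n\r\n";
$headers = parse_headers($raw);

// The status code is the second space-separated token of the status line.
$code = (int) explode(' ', $headers['status'])[1];
print_r($headers);
```

When only the status code is needed, curl_getinfo($ch, CURLINFO_HTTP_CODE) returns it directly without any parsing.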
