PHP - Get the prices displayed by Google

I would like to retrieve all the prices from a Google search in a PHP file.
Example of a price search: https://www.google.com/search?ei=QBN1XIfYDrG5gwfmq6bwDg&q=860+evo+500go&oq=860+evo+500go&gs_l=psy-ab.3..0j0i10j0i22i10i30j0i22i30l3.5044.6363..6572...0.0..0.59.347.6......0....1..gws-wiz.......0i71j0i20i263j0i67j0i203.HYjd3deC288
file_get_contents doesn't work, so I have to use cURL, as in this topic:
PHP file_get_contents error 503
Now I don't know how to write the rest of the script.
I guess I have to create a loop and use preg_match to keep only what I need.
Is that right? Could I have an example?
Here is the beginning of my script:
$url = "https://www.google.com/search?ei=QBN1XIfYDrG5gwfmq6bwDg&q=860+evo+500go&oq=860+evo+500go&gs_l=psy-ab.3..0j0i10j0i22i10i30j0i22i30l3.5044.6363..6572...0.0..0.59.347.6......0....1..gws-wiz.......0i71j0i20i263j0i67j0i203.HYjd3deC288";
function curl_get_file_contents($URL) {
$c = curl_init();
curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($c, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($c, CURLOPT_URL, $URL);
$contents = curl_exec($c);
curl_close($c);
if ($contents) return $contents;
else return FALSE;
foreach ($variable as $key => $value) {
echo $result;
}
}

This will not work reliably. Google search is quite sensitive to scraping and quickly starts responding with a CAPTCHA challenge when it suspects automated access.
I would recommend looking into a better source for the information you need; otherwise you just risk burning time writing code that won't run, because you can't consistently get your data in the first place.
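That said, for completeness, here is a minimal sketch of the loop the question asks about, assuming the results page can be fetched at all (per the caveat above). The price pattern is an assumption about euro-formatted prices in the markup, not anything Google documents:

// Minimal sketch, assuming curl_get_file_contents() returned the HTML.
// The regex matches euro-style prices such as "123,45 €" and is an
// assumption about the page markup.
$html = curl_get_file_contents($url);
if ($html !== FALSE) {
    preg_match_all('/\d+(?:[.,]\d{2})?\s*€/u', $html, $matches);
    foreach ($matches[0] as $price) {
        echo $price . "\n";
    }
}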

Related

Getting Instagram's followers data using cURL

I'm trying to fetch the number of followers of an Instagram account through web scraping and cURL. Using their API might be easier, but I want to know why this won't work, because in many cases I've gotten data through the HTML.
$url = 'https://www.instagram.com/cats_of_instagram/';

function getUrlContent($url) {
    try {
        $curl_connection = curl_init($url);
        curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
        curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, false);
        // The page body is stored in $data
        $data = curl_exec($curl_connection);
        // Look for the "followers" span in the returned HTML
        $position = strpos($data, "<span data-reactid=\".0.1.0.0:0.1.3.1.0.2\"> followers</span>");
        print_r($position);
        curl_close($curl_connection);
    } catch (Exception $e) {
        return $e->getMessage();
    }
}
The problem is that strpos does not return a position for this line:
$position = strpos($data, "<span data-reactid=\".0.1.0.0:0.1.3.1.0.2\"> followers</span>");
You can't do that.
The element you're looking for is rendered by JavaScript after the page has loaded.
curl doesn't wait for scripts to run (nor does it run any); it just returns the HTML.
You can easily verify this by printing $data, or by looking at the page's source.
To "see" the element you're looking for, you need to use the DOM inspector.

Check if a file exists on a remote website without knowing the extension

I need to find a file named _template on more than one remote server.
The file will have one of two extensions:
_template.htm
or
_template.php
I can find a file using cURL on a remote server, which is easy, but I can't wrap my head around how to check for one or the other.
The code is in a loop and will search about 20 different sites for this file. If it finds the file, it just needs to say: Found it. If it finds neither of the two files, it needs to say: None Found.
Doing cURL twice takes too long.
Searching through the FTP array also takes too long.
My current code is below. It doesn't work, and I know why it doesn't work, thanks guys; what I'd like to know is how to make it work. (This is in a loop.)
$template_file = glob("/_template.{htm,php}", GLOB_BRACE);

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $client_link.$template_file);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
$data = curl_exec($ch);
curl_close($ch);

// Extract the last HTTP status code from the response headers
preg_match_all("/HTTP\/1\.[10]\s(\d{3})/", $data, $matches);
$code = end($matches[1]);

if (!$data) {
    echo "No Files Found<br>";
} else {
    if ($code == 200) {
        $filefound = 'Found '.$template_file;
    } elseif ($code == 404) {
        $filefound = '404 File not Found';
    }
}
The problem is with the glob function: it matches paths on the local filesystem. Add a print_r($template_file); after the glob and you'll see that it doesn't match anything (or it only matches things on your local system).
What you need to do is build the URLs one by one:
foreach (array("htm", "php") as $ext)
{
    // build the URL string
    $url = "http://example.com/_template." . $ext;
    // now do whatever you need to with the URL...
}
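Putting it together, here is a minimal sketch of the whole check, assuming $client_link holds the site's base URL as in the question. It issues one HEAD request per extension and stops at the first hit, so it makes at most two requests per site:

function findTemplate($client_link) {
    foreach (array("htm", "php") as $ext) {
        $ch = curl_init($client_link . "/_template." . $ext);
        curl_setopt($ch, CURLOPT_NOBODY, true);        // HEAD request, no body download
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
        curl_exec($ch);
        // Status code of the final response, after any redirects
        $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);
        if ($code == 200) {
            return "Found it: _template." . $ext;
        }
    }
    return "None Found";
}

Since the HEAD request skips the body and the loop short-circuits when the .htm variant exists, the common case stays at a single cheap request per site, which addresses the "doing cURL twice takes too long" concern.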

How do you detect if a remote website uses Flash?

I am trying to write a PHP tool that detects whether a remote website uses Flash. So far I have written a script that detects whether embed or object elements exist, which indicates that Flash may be in use, but some sites obfuscate their code, which renders this check useless.
include_once('simple_html_dom.php');

$url = 'http://example.com/'; // placeholder: the site to check (undefined in the original)
$flashTotalCount = 0;

function file_get_contents_curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

// Parse the fetched page with simple_html_dom so find() is available
$html = str_get_html(file_get_contents_curl($url));

// Count <embed> and <object> elements as indicators of Flash
foreach ($html->find('embed') as $pageEmbed) {
    $flashTotalCount++;
}
foreach ($html->find('object') as $pageObject) {
    $flashTotalCount++;
}

if ($flashTotalCount == 0) {
    echo "NO FLASH";
} else {
    echo "FLASH";
}
Would anyone know of a way to check whether a website uses Flash or, if possible, to get header information showing that Flash is being used?
Any advice would be helpful.
As far as I understand, Flash can be loaded by JavaScript, so you would have to actually execute the web page. For that you'll have to use a tool like this:
http://seleniumhq.org/docs/02_selenium_ide.html#the-waitfor-commands-in-ajax-applications
I don't think it is usable from PHP.
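As a rough heuristic that stays in PHP (my own assumption, not part of the answer above), you could also scan the raw page text for .swf references, which catches many Flash files loaded by JavaScript by filename:

// Heuristic only: a ".swf" reference anywhere in the raw HTML/JS
// suggests Flash, even when no <embed>/<object> tag is present
$raw = file_get_contents_curl($url);
if (preg_match('/\.swf\b/i', $raw)) {
    echo "FLASH (found a .swf reference)";
} else {
    echo "NO FLASH detected in the static source";
}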

Getting around the AJAX same-origin policy: code for a PHP AJAX request forwarder?

I want to bypass the AJAX same-origin policy by having a PHP page on my site that basically acts as a JSON proxy. E.g. I make an AJAX request like this:
mysite.com/myproxy.php?url=blah.com/api.json&a=1&b=2
It then makes a request to:
blah.com/api.json?a=1&b=2
And returns the JSON (or whatever) result to the original requester.
Now I assume I'd be stupidly reinventing the wheel if I wrote this PHP code (plus I don't know PHP!), so is there some pre-existing code to do this? I'm sure I'm not the only one who has butted heads with the same-origin policy before.
Oh yeah, JSONP isn't an option for this particular API.
Thanks all
Okay, here's something.
Put this into a PHP script and call it like this:
script.php?url=blah
POSTing the fields you want forwarded on to the target server.
<?php
// Forward any POSTed fields on to the target URL
$curlPost = http_build_query($_POST);

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $_GET['url']);
curl_setopt($ch, CURLOPT_HEADER, 0);  // 0, not 1: don't mix response headers into the body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $curlPost);
$data = curl_exec($ch);
curl_close($ch);

// Echo the response as-is; wrapping it in json_encode() would
// double-encode a response that is already JSON
echo $data;
?>
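The question's example also passes extra query parameters (a=1&b=2) that should end up on the target URL. A small, untested addition along those lines, appending every GET parameter except url itself:

// Forward all GET parameters except 'url' to the target
$params = $_GET;
unset($params['url']);
$target = $_GET['url'];
if (!empty($params)) {
    $sep = (strpos($target, '?') === false) ? '?' : '&';
    $target .= $sep . http_build_query($params);
}
// ...then use $target in place of $_GET['url'] in the script above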
Now, this script is a bit too open for my liking, so to increase security I would recommend adding a whitelist of allowed domains.
So add this to the top:
$whitelist = array('http://www.google.com', 'http://www.ajax.com');

$list = array();
foreach ($whitelist as $w) {
    $list[] = parse_url($w, PHP_URL_HOST);
}

// parse_url (not pathinfo) extracts the host from the requested URL
$host = parse_url($_GET['url'], PHP_URL_HOST);
if (!in_array($host, $list)) {
    die('no access to that domain');
}

Building a 'simple' PHP URL proxy

I need to implement a simple PHP proxy in a web application I am building (it's Flash-based, and the destination service provider doesn't allow edits to their crossdomain.xml file).
Can any PHP gurus offer advice on the following two options? Also, I think, but am not sure, that I need to include some header info as well.
Thanks for any feedback!
option 1
$path = $_GET['path'];
readfile($path);
option 2
$content = file_get_contents($_GET['path']);
if ($content !== false)
{
    echo($content);
}
else
{
    // there was an error
}
First of all, never ever serve a file based only on user input. Imagine what would happen if someone called your script like this:
http://example.com/proxy.php?path=/etc/passwd
Then onto the issue: what kind of data are you proxying? If it could be any kind at all, then you need to detect the content type from the content and pass it on, so the receiving end knows what it's getting. I would suggest using something like HTTP_Request2 from PEAR (see: http://pear.php.net/package/HTTP_Request2) if at all possible. If you have access to it, you could do something like this:
// HTTP_Request2 is installed via PEAR (see the link above)
require_once 'HTTP/Request2.php';

// First validate that the request is to an actual web address
if (!preg_match("#^https?://#", $_GET['path'])) {
    header("HTTP/1.1 404 Not found");
    echo "Content not found, bad URL!";
    exit();
}

// Make the request
$req = new HTTP_Request2($_GET['path']);
$response = $req->send();

// Output a Content-type header using the content type of the original file
header("Content-type: " . $response->getHeader("Content-type"));

// And provide the file body
echo $response->getBody();
Note that this code hasn't been tested; it's just meant to give you a starting point.
Here's another solution using curl.
Can anyone comment?
$ch = curl_init();
$timeout = 30;
// Pass the original client's user agent through to the target site
$userAgent = $_SERVER['HTTP_USER_AGENT'];

curl_setopt($ch, CURLOPT_URL, $_REQUEST['url']);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);

$response = curl_exec($ch);

if (curl_errno($ch)) {
    echo curl_error($ch);
} else {
    echo $response;
}
curl_close($ch); // close the handle on both paths
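Since the question mentions needing header info, one untested addition: forward the upstream Content-Type to the client. Note that curl_getinfo() must be called before curl_close(), and header() must run before the body is echoed:

// In the success branch, before echoing $response:
$contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
if ($contentType) {
    header("Content-Type: " . $contentType);
}
echo $response;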
