Thanks for looking at my question.
I want to fetch the mobile version of a page using either file_get_contents() or cURL. I know this can be done by modifying the HTTP headers of the request. Can you please give me a simple example of how to do so?
Thanks again!
Regards,
Sanket
As an alternative, file_get_contents and stream_context_create can also be used:
$opts = array('http' =>
array(
'header' => 'User-agent: Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420.1 (KHTML, like Gecko) Version/3.0 Mobile/3B48b Safari/419.3',
)
);
$context = stream_context_create($opts);
$result = file_get_contents($url, false, $context);
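The same idea can be wrapped in a small helper. The following is only a sketch: mobile_context() is a hypothetical name, and the 10-second timeout is an assumed default. It uses the 'user_agent' HTTP context option, which PHP sends as the User-Agent header when the 'header' option does not already set one.

```php
<?php
// Hypothetical helper (not from the answer above): builds a stream context
// that sends the given User-Agent string with each HTTP request.
function mobile_context(string $userAgent, int $timeoutSeconds = 10)
{
    return stream_context_create(array(
        'http' => array(
            'user_agent' => $userAgent,      // sent as the User-Agent header
            'timeout'    => $timeoutSeconds, // read timeout in seconds (assumed value)
        ),
    ));
}

$ua = 'Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420.1 (KHTML, like Gecko) Version/3.0 Mobile/3B48b Safari/419.3';
$context = mobile_context($ua);

// Inspect what the context will send, without making a request.
$options = stream_context_get_options($context);

// Intended use: $result = file_get_contents($url, false, $context);
```

file_get_contents() returns false on failure, so it is worth checking the result before using it.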
Is this what you are looking for?
curl -A "Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420+ (KHTML, like Gecko) Version/3.0 Mobile/1A543a Safari/419.3" http://example.com/your-url
You need to set the user agent string:
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420.1 (KHTML, like Gecko) Version/3.0 Mobile/3B48b Safari/419.3');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$page = curl_exec($ch);
curl_close($ch);
I'm trying to scrape newark.com.
I have written the code below and tested it locally, where it works well:
<?php
$link = 'https://www.newark.com/';
$proxy = ['server' => '172.93.142.42:3128'];
$user_agents = ['Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36', 'Mozilla/5.0 (Linux; Android 8.0.0; H3113 Build/50.1.A.10.40; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/68.0.3440.91 Mobile Safari/537.36 [FB_IAB/FB4A;FBAV/185.0.0.39.72;]', 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E302 [FBAN/FBIOS;FBAV/166.0.0.53.95;FBBV/101310068;FBDV/iPhone7,2;FBMD/iPhone;FBSN/iOS;FBSV/11.3.1;FBSS/2;FBCR/vodafoneP;FBID/phone;FBLC/en_GB;FBOP/5;FBRV/102694127]', 'Mozilla/5.0 (Linux; Android 7.0; Studio Mega Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.111 Mobile Safari/537.36 OPR/46.3.2246.127744', 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_2_6 like Mac OS X) AppleWebKit/604.5.6 (KHTML, like Gecko) Mobile/15D100 [FBAN/FBIOS;FBAV/168.0.0.57.90;FBBV/103647182;FBDV/iPhone9,3;FBMD/iPhone;FBSN/iOS;FBSV/11.2.6;FBSS/2;FBCR/MEO;FBID/phone;FBLC/pt_PT;FBOP/5;FBRV/104934021]'];
$user_agent = $user_agents[array_rand($user_agents)];
//$user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36';
$curl_handler = curl_init();
curl_setopt_array($curl_handler, array(
CURLOPT_URL => $link,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_USERAGENT => $user_agent,
CURLOPT_PROXY => $proxy['server'],
));
$result = curl_exec($curl_handler);
curl_close($curl_handler);
$result = mb_convert_encoding($result, 'UTF-8');
header('Content-type: text/html; charset=utf-8');
echo($result);
However, when I run this code on my US servers, it does not work: the script runs for a long time and then outputs nothing.
When I change the URL to www.google.com, the script works on my servers too. I've added proxies to my code, but that didn't help with the URL I need.
I guess the problem is related to that specific URL. Any help?
I have this code
<?php
$ua = array(
"Mozilla/5.0 (compatible; MSIE 9.0; AOL 9.7; AOLBuild 4343.19; Windows NT 6.1; WOW64; Trident/5.0; FunWebProducts)",
"Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; XH; rv:8.578.498) fr, Gecko/20121021 Camino/8.723+ (Firefox compatible)",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36",
"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1",
"Mozilla/5.0 (compatible, MSIE 11, Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko",
"Mozilla/5.0 (X11; U; Linux i686; fr-fr) AppleWebKit/525.1+ (KHTML, like Gecko, Safari/525.1+) midori/1.19",
"Opera/9.80 (X11; Linux i686; Ubuntu/14.10) Presto/2.12.388 Version/12.16",
"Mozilla/5.0 (Linux; U; Android 4.0.3; de-ch; HTC Sensation Build/IML74K) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30");
$uar = array_rand($ua);
$url = "sometestserverisetup";
$ip = '127.0.0.1';
$port = '9051';
$auth = 'mypwwhateveritis';
$command = 'signal NEWNYM';
$fp = fsockopen($ip,$port,$error_number,$err_string,10);
if(!$fp) { echo "ERROR: $error_number : $err_string";
return false;
} else {
fwrite($fp,"AUTHENTICATE \"".$auth."\"\n");
$received = fread($fp,512);
fwrite($fp,$command."\n");
$received = fread($fp,512);
}
fclose($fp);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_PROXY, "127.0.0.1:9050");
curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_VERBOSE, 0);
curl_setopt($ch,CURLOPT_USERAGENT,$ua[$uar]);
$response = curl_exec($ch);
echo $response;
?>
Everything works fine with my test site, and it displays correctly. However, certain sites (google.com, amazon.com, youtube, facebook) only display a blank page when I echo the response.
Is there some cURL option that needs to be set for these pages to display properly?
Looking at var_dump(curl_getinfo($ch)); after calling curl_exec() can be helpful.
I tested your code and found that in some cases the sites send a 302 response with a Location header to redirect the browser. cURL does not follow redirects by default, so a successful request can still produce an empty response.
Adding
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
made every site you mentioned return a response in my tests. Depending on what you are doing (searches, logins, form submissions), you will probably find that redirects are common, so you need to tell cURL to follow them with that option.
Beyond that, you can set CURLOPT_HEADER to true to include the response headers in the output and see what's going on, in addition to using curl_getinfo() to make sure the connection was successful (both through Tor and to the site).
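As a rough sketch of that kind of check: describe_transfer() below is a hypothetical helper, not part of cURL. It summarises the 'http_code' and 'redirect_url' entries that curl_getinfo() returns, together with the result of curl_exec(). The values in the demonstration are made up so that it runs without a network connection.

```php
<?php
// Hypothetical helper: turns a curl_getinfo() array and the curl_exec()
// result into a one-line summary of what happened.
function describe_transfer(array $info, $body): string
{
    $code = isset($info['http_code']) ? (int) $info['http_code'] : 0;
    if ($body === false || $code === 0) {
        // curl_exec() failed or no HTTP status was received
        return 'no HTTP response (connection or proxy problem)';
    }
    if ($code >= 300 && $code < 400) {
        $target = !empty($info['redirect_url']) ? $info['redirect_url'] : 'unknown';
        return "redirect ($code) to $target; enable CURLOPT_FOLLOWLOCATION";
    }
    if ($body === '') {
        return "HTTP $code with an empty body";
    }
    return "HTTP $code, " . strlen($body) . ' bytes';
}

// Intended use, right after $response = curl_exec($ch):
// echo describe_transfer(curl_getinfo($ch), $response);

// Offline demonstration with made-up values:
$summary = describe_transfer(
    array('http_code' => 302, 'redirect_url' => 'https://www.example.com/'),
    ''
);
echo $summary, "\n";
```

When curl_exec() returns false, curl_error($ch) gives the underlying error message, which helps distinguish a Tor or proxy failure from a problem with the target site.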
I am using cURL to load a web page and output it on a page on my server. However, when the page is returned, no images are shown because they are linked with root-relative paths such as href="/image.png". Is there a way, using cURL, to prepend the site's URL to any link that starts with href="/ ?
function pull_html($url, $device)
{
$ch = curl_init();
// Pick a User-Agent string for the requested device
if ($device == 'iPhone') {
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25');
} elseif ($device == 'iPad') {
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25');
}
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
// Store the result first; code placed after a return statement never runs
$html = curl_exec($ch);
curl_close($ch);
return $html;
}
Use a simple str_replace()
return str_replace('href="/', 'href="'.$url.'/', curl_exec($ch));
http://php.net/manual/en/function.str-replace.php
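Note that the plain str_replace() only works when $url is the site root without a trailing slash, and it leaves src attributes untouched. A slightly more careful variant is sketched below. absolutize_root_links() is a hypothetical helper name, and the sample HTML and URLs are made up for illustration. It derives the scheme and host with parse_url() and rewrites both href and src values that start with a single slash.

```php
<?php
// Hypothetical helper: prefixes root-relative href/src values with the
// origin (scheme, host and optional port) of the page they came from.
function absolutize_root_links(string $html, string $pageUrl): string
{
    $parts = parse_url($pageUrl);
    if ($parts === false || empty($parts['scheme']) || empty($parts['host'])) {
        return $html; // not an absolute URL, leave the markup unchanged
    }
    $origin = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $origin .= ':' . $parts['port'];
    }
    // Match href="/... or src="/... but skip protocol-relative "//..." URLs.
    return preg_replace(
        '~\b(href|src)=(["\'])/(?!/)~i',
        '${1}=${2}' . $origin . '/',
        $html
    );
}

// Made-up sample markup for illustration:
$html  = '<a href="/about">About</a><img src="/image.png"><script src="//cdn.example.org/x.js"></script>';
$fixed = absolutize_root_links($html, 'http://example.com/news/index.html');
echo $fixed, "\n";
```

This still ignores document-relative paths such as src="image.png" and any <base> tag, so an HTML parser like DOMDocument may be a better fit for pages with more varied markup.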
My code isn't working. I've tried a few things, but I'm new to PHP. Here's what I have; it always returns a blank page.
<?php
ini_set('display_errors',1);
error_reporting(E_ALL);
$rnd = $_GET['rnd'];
$ch = curl_init("http://chat.website.com/script/login.php?rnd=".$rnd);
$request_headers = array();
$request_header[] = (
'User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36',
'Content-Type: application/x-www-form-urlencoded',
'onLoad: [type Function]',
'p: password',
'u: username',
'owner: [object Object]
');
curl_setopt($ch, CURLOPT_HTTPHEADER, $request_headers);
$userdata = curl_exec($ch);
echo $userdata;
?>
You are passing $request_headers to curl_setopt(), but your data is in $request_header. Also check that the array itself is well formed.
Or maybe try something like this:
// CURLOPT_HTTPHEADER expects a flat list of "Name: value" strings
$request_headers = array(
'User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36',
'Content-Type: application/x-www-form-urlencoded',
'onLoad: [type Function]',
'p: password',
'u: username',
'owner: [object Object]',
);
I found my error: I wasn't sending the request as a POST.
Here's the working code, in case anyone needs it:
<?php
ini_set('display_errors',1);
error_reporting(E_ALL);
// Fall back to 1 when no rnd parameter is supplied
$rnd = isset($_GET['rnd']) ? $_GET['rnd'] : 1;
$ch = curl_init("http://chat.website.com/scripts/login.php?rnd=".$rnd);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, "onLoad=%5Btype%20Function%5D&p=password&u=username&owner=%5Bobject%20Object%5D");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$userdata = curl_exec($ch);
echo $userdata;
?>
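The hand-encoded CURLOPT_POSTFIELDS string above could also be generated with http_build_query(), which avoids escaping mistakes. This is a minimal sketch using the placeholder field values from the post. Passing PHP_QUERY_RFC3986 makes spaces encode as %20, as in the original string; the default encoding would use + instead.

```php
<?php
// Field names and values are the placeholders from the post above.
$fields = array(
    'onLoad' => '[type Function]',
    'p'      => 'password',
    'u'      => 'username',
    'owner'  => '[object Object]',
);

// PHP_QUERY_RFC3986 percent-encodes spaces as %20 instead of +.
$body = http_build_query($fields, '', '&', PHP_QUERY_RFC3986);
echo $body, "\n";

// Intended use: curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
```

Passing a string to CURLOPT_POSTFIELDS sends the body as application/x-www-form-urlencoded, whereas passing an array makes cURL send multipart/form-data.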
How can I scrape a site using an iPad User-Agent?
I have the code below, which uses cURL in PHP and outputs the page source, but I still can't find the tags in it. On an iPad, or in Safari with an iPad User-Agent, the tags appear when the site is loaded.
Thanks!
<?php
$useragent = "Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B334b Safari/531.21.10";
$ch = curl_init ("http://www.cbsnews.com/video/watch/?id=7370279n&tag=mg;mostpopvideo");
curl_setopt ($ch, CURLOPT_USERAGENT, $useragent); // set user agent
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true);
// curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
echo $output = curl_exec ($ch);
curl_close($ch);
?>
Try using curl from the command line, with a perl script such as this:
use strict;
use warnings;
my $ua = "Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B334b Safari/531.21.10";
my $curl = "curl -A '$ua'";
my $server = "http://www.cbsnews.com";
my $startpage = "$server/video/watch/?id=7370279n&tag=mg;mostpopvideo";
my $path = "/path/to/download/to";
# Quote the URL so the shell does not interpret & and ;
open(my $fh, "$curl -L '$startpage' |") or die "Cannot open website: $!";
while (<$fh>)
{
# Link to a file directly under the server root
if (/<a\s+[^>]*href=\"\Q$server\E\/([^\"\/]+)\"/)
{
my $file = $1;
system("$curl -e '$startpage' '$server/$file' > '$path/$file'");
next;
}
# Link to a file inside one or more folders
if (/<a\s+[^>]*href=\"\Q$server\E\/([^\"]+)\/([^\"\/]+)\"/)
{
my $folder = $1;
my $file = "$folder/$2";
system("mkdir -p '$path/$folder'");
system("$curl -e '$startpage' '$server/$file' > '$path/$file'");
next;
}
}
close($fh);