I'm using the following code, which falls back to cURL:
$url = "http://www.web_site.com";
$string = @file_get_contents($url);
if(!$string){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0');
$string = curl_exec($ch);
curl_close($ch);
}
But my website suddenly stopped working because of this code, and once I remove the cURL part it works fine.
I thought my hosting provider had disabled cURL, so I checked,
and it appears to be enabled. So what is wrong?
Any help is appreciated. What should I say to my hosting provider?
The file_get_contents call doesn't look at the response headers. Try using cURL with CURLOPT_FOLLOWLOCATION enabled and CURLOPT_MAXREDIRS set to whatever limit you prefer.
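A minimal sketch of that suggestion, assuming the cURL extension is loaded. The helper names redirectOptions and fetchWithRedirects and the default limit of 5 are illustrative, not from the original post:

```php
<?php
// Hypothetical helper: the option set for a fetch that follows redirects.
function redirectOptions(string $url, int $maxRedirects = 5): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,          // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,          // follow Location: headers
        CURLOPT_MAXREDIRS      => $maxRedirects, // stop after this many redirects
    ];
}

// Hypothetical helper: returns the body as a string, or false on failure.
function fetchWithRedirects(string $url, int $maxRedirects = 5)
{
    $ch = curl_init();
    curl_setopt_array($ch, redirectOptions($url, $maxRedirects));
    $body = curl_exec($ch);
    if ($body === false) {
        // Log the reason so a silent failure is easier to report to the host.
        error_log('cURL error: ' . curl_error($ch));
    }
    curl_close($ch);
    return $body;
}
```

Calling fetchWithRedirects($url) in place of the inline cURL block would keep the rest of the script unchanged. Logging curl_error() may also show why the request fails on the hosting account.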
I'm writing a script that scrapes the site www.piratebay.se. The script was working fine two or three days ago, but now I'm having problems with it.
This is my code:
$URL = 'http://thepiratebay.se';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $URL);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1");
curl_setopt($ch, CURLOPT_COOKIE, "language=pt_BR; c[thepiratebay.se][/][language]=pt_BR");
$fonte = curl_exec ($ch);
curl_close ($ch);
echo $fonte;
The response of this code is not clean HTML, but looks like this instead:
��[s۸N>��k�9��-ىmI7��$�8�.v��͕���$h���y�G�Sg:ӷ>�5����ʱ�aor&���.v)���������) d�w��8w�l����c�u""1����F*G��ِ�2$�6�C�}��z(bw�� 4Ƒz6�S��t4�K��x�6u���~�T���ACJb��T^3�USPI:Mf��n�'��4��� ��XE�QQ&�c5�`'β�T Y]D�Q�nBfS�}a�%� ���R) �Zn��̙ ��8IB�a����L�
I already tried setting the user agent in .htaccess, PHP and cURL, but without success.
Add this:
curl_setopt($ch, CURLOPT_ENCODING , "gzip");
I tested it in my local environment and it works fine with this option.
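The garbled output above is consistent with a gzip-compressed body being printed without decoding. A small offline sketch, assuming the zlib extension is available, shows the effect on a made-up HTML string; the variable names are illustrative:

```php
<?php
// Made-up page content standing in for the real response.
$html = '<html><body>Hello</body></html>';

// What a server sends when it answers with Content-Encoding: gzip.
$compressed = gzencode($html);

// gzip streams start with the magic bytes 0x1f 0x8b.
$isGzip = substr($compressed, 0, 2) === "\x1f\x8b";

// Decoding restores the readable HTML; with CURLOPT_ENCODING set,
// cURL performs this step before returning the body.
$decoded = gzdecode($compressed);
```

Setting CURLOPT_ENCODING to an empty string instead of "gzip" makes cURL advertise every encoding it supports, which may be a more flexible choice.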
I'm trying to crawl some data from a URL
with the help of Simple HTML DOM.
But when I start my crawler, it gives this error:
** failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found**
I have tried cURL, but it also returns a 404 error.
Here is my Simple HTML DOM code:
function getURLContent($url)
{
$html = new simple_html_dom();
$html->load_file($url);
/* I perform some operations here */
}
And with cURL:
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_HEADER, false);
$data = curl_exec($curl);
echo $data;
curl_close($curl);
How can I do this?
Thanks in advance.
Yes, try setting the user agent:
curl_setopt($curl,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
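Add these to your code (here $ch is your cURL handle) and try:
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1");
curl_setopt($ch, CURLOPT_URL, $url); // target URL (CURLOPT_HEADER expects a boolean, not a URL)
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers); // $headers: array of "Name: value" strings
curl_setopt($ch, CURLOPT_AUTOREFERER, true); // set the Referer header automatically on redirects
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // skips certificate verification; leave it true for HTTPS URLs you need to trust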
A 404 error means the page was not found. Try Fiddler to capture the request parameters your real browser sends, and pass the same parameters via cURL in your script.
If you are getting a "blocked" error page instead, try changing the User-Agent, use a proxy address (free proxies are easy to find on the internet), or maintain the session while requesting your page. Fiddler will help you with this.
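As a sketch of passing browser-like headers through cURL: CURLOPT_HTTPHEADER expects a list of "Name: value" strings, so a small helper can build that list from an associative array. The helper name buildHeaderLines and the header values below are illustrative assumptions, not values captured from any real session:

```php
<?php
// Hypothetical helper: turn ['Name' => 'value'] pairs into "Name: value" lines.
function buildHeaderLines(array $headers): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return $lines;
}

// Example values only; replace them with what Fiddler shows for your browser.
$headers = buildHeaderLines([
    'Accept'          => 'text/html,application/xhtml+xml',
    'Accept-Language' => 'en-US,en;q=0.5',
]);

// With an existing handle $curl, the list would be applied like this:
// curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
```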
I've written a script, using cURL, that logs in to a page that provides free streaming; then, again with cURL, I go to the subpage of the chosen stream to watch it.
Everything works fine while the script runs on localhost (I'm using XAMPP), but when I put it on my web server it says that it can't connect to the network. The only thing that looks different is the cookie file: on the web server it has no newlines (\n); everything is on one line.
How do I deal with this? This is the class I use to connect to the page:
class openTV {
public $channel;
function __construct($channel) {
$this -> channel = $channel;
}
function openChannel() {
$login_email = 'mail@gmail.com';
$login_pass = 'pass';
$fp = fopen("cookie.txt", "w");
fclose($fp);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://strona/user/login');
curl_setopt($ch, CURLOPT_POSTFIELDS,'email='.urlencode($login_email).'&password='.urlencode($login_pass));
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3");
curl_setopt($ch, CURLOPT_REFERER, "http://strona/user/login");
$page = curl_exec($ch);
curl_setopt($ch, CURLOPT_URL, $this->channel);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3");
curl_setopt($ch, CURLOPT_REFERER, $this->channel);
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');
curl_setopt($ch, CURLOPT_POST, 0);
$info = curl_getinfo ($ch);
$page = curl_exec($ch);
preg_match('/session_token=\[[a-zA-Z0-9]{8}\]/', $page, $matches);
$return['token'] = substr($matches[0], 31, 8);
preg_match('/<object(.*)>[.\s\S]*<\/object>/', $page, $matches);
$return['player'] = $matches[0];
//$return['player'] = $page;
$return['channel'] = $this->channel;
return $return;
}
}
You're using http://strona/ as your host.
Your server probably uses a different configuration, one that doesn't try appending .example.com when it fails to find the host directly. On Linux that setting can be seen in /etc/resolv.conf:
search example.com
nameserver 1.2.3.4
Using the full domain name (http://strona.example.com) or the IP address should fix the problem.
If it doesn't, check whether you can ping (or otherwise connect to) the target host from the server; it may be a networking issue.
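One way to check name resolution from PHP itself, as a rough sketch: gethostbyname() returns the host name unchanged when the lookup fails, so comparing input and output tells you whether the server can resolve it. The helper name hostResolves is hypothetical:

```php
<?php
// Hypothetical helper: true if $host is an IP literal or resolves via DNS.
function hostResolves(string $host): bool
{
    if (filter_var($host, FILTER_VALIDATE_IP) !== false) {
        return true; // already an IP address, nothing to resolve
    }
    // gethostbyname() returns the unmodified host name on failure.
    return gethostbyname($host) !== $host;
}

// Extract the host part from the login URL used in the class above.
$host = parse_url('http://strona/user/login', PHP_URL_HOST);
```

Running var_dump(hostResolves($host)) on both localhost and the web server should show whether the short name resolves in each environment.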
cURL returns nothing when run on the server. Everything works well on localhost, but on remote hosting getSearchResults() returns nothing (or only a 302 header). Is something wrong with the server configuration? (I tried two different hosts.) Could it be related to CURLOPT_FOLLOWLOCATION? I tried both true and false on localhost and it works either way. On the remote host, following redirects is not allowed for some reason, but since it works without it locally I don't think that matters.
<?php
class cURL
{
private $username;
private $password;
private static $tmpfname;
public function __construct($username,$password) {
$this->username = $username;
$this->password = $password;
$this->makeCookies($username, $password);
}
private function makeCookies($username, $password) {
self::$tmpfname = tempnam("/tmp", "Cookie");
$useragent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1";
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
curl_setopt($ch, CURLOPT_COOKIEFILE, self::$tmpfname);
curl_setopt($ch, CURLOPT_COOKIEJAR, self::$tmpfname);
curl_setopt($ch, CURLOPT_URL,"http://vk.com/login.php");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, "email={$username}&pass={$password}");
ob_start();
curl_exec($ch);
ob_end_clean();
curl_close($ch);
unset($ch);
}
private function getHTML($url){
$useragent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1";
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
curl_setopt($ch, CURLOPT_COOKIEFILE, self::$tmpfname);
curl_setopt($ch, CURLOPT_COOKIEJAR, self::$tmpfname);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
$contents = curl_exec($ch);
curl_close($ch);
return $contents;
}
public function getSearchResults($songname) {
$songname = urlencode($songname);
$contents = $this->getHTML("http://vk.com/search?c[section]=audio&c[q]={$songname}");
return $contents;
}
}
?>
A 302 status code is a redirect, so you need to be able to use CURLOPT_FOLLOWLOCATION (or an equivalent) to get anything useful out of it.
There are plenty of implementations of a redirect-following mechanism on the web for servers that run PHP in safe mode. For example, here (the first place you should look, actually) is one I once modified for my own script. It can process multiple redirects and is written so that you can easily understand and modify it.
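A minimal sketch of such a manual redirect loop, not the implementation the answer links to. It assumes the server sends absolute URLs in the Location header and leaves out the cookie options used in the class above; the names extractLocation and fetchFollowingRedirects are hypothetical:

```php
<?php
// Hypothetical helper: pull the Location value out of a raw header block.
function extractLocation(string $rawHeaders): ?string
{
    if (preg_match('/^Location:\s*(\S+)/mi', $rawHeaders, $m)) {
        return $m[1];
    }
    return null;
}

// Hypothetical helper: follow redirects by hand, up to $maxRedirects hops.
// Returns the final body as a string, or false on error or too many hops.
function fetchFollowingRedirects(string $url, int $maxRedirects = 5)
{
    for ($i = 0; $i <= $maxRedirects; $i++) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HEADER, true); // keep headers so Location can be read
        $response = curl_exec($ch);
        if ($response === false) {
            curl_close($ch);
            return false;
        }
        $status     = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
        curl_close($ch);

        $headers = substr($response, 0, $headerSize);
        $body    = substr($response, $headerSize);

        // Anything outside 3xx is treated as the final response.
        if ($status < 300 || $status >= 400) {
            return $body;
        }
        $next = extractLocation($headers);
        if ($next === null) {
            return $body; // redirect status without a Location header
        }
        $url = $next; // assumption: Location holds an absolute URL
    }
    return false; // redirect limit exceeded
}
```

In the class above, the same loop could replace the single curl_exec() call in getHTML(), with the CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR options added to each request so the login session carries across hops.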
I've searched everywhere and cannot find how to post data using VB.NET.
So I was wondering if someone could convert this cURL code I wrote into VB.NET :)
$useragent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE ); // return into a variable
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_POST, TRUE);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
$result = curl_exec( $ch ); // run!
curl_close($ch);
$data is an array; I'm not sure how that will work in VB.NET, though.
cURL is a standalone application, and a PHP extension lets you use it seamlessly from PHP code. So you could install cURL and call it via shell commands from .NET, though you might also find this useful:
http://curl.haxx.se/libcurl/dotnet/
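When porting, it may help to know what the PHP code actually sends. If CURLOPT_POSTFIELDS receives an array, PHP's cURL extension posts it as multipart/form-data; if it receives a string, the body is sent as application/x-www-form-urlencoded. A sketch of building the string form with http_build_query(), using made-up field names:

```php
<?php
// Made-up form fields standing in for the question's $data array.
$data = ['name' => 'John Doe', 'message' => 'hello & welcome'];

// URL-encode the pairs into a single request body string.
$body = http_build_query($data);

// With an existing handle $ch, the string would be posted like this:
// curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
```

A port to another platform then only needs to send the same encoded string with a Content-Type of application/x-www-form-urlencoded.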