I'm a novice PHP programmer. Last week I read about cURL, and it caught my attention, so I started studying it. At first I copied and pasted code posted on different blogs, and it ran fine, like my code below:
<?php
$handle=curl_init('http://www.google.co.kr/');
curl_setopt($handle, CURLOPT_VERBOSE, true);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);
$content = curl_exec($handle);
echo $content;
?>
But why can't I cURL the website http://www.todayhumor.co.kr/? Since then, using the same code as above, it outputs nothing.
Looking forward to your positive responses, guys. Thank you in advance.
After calling curl_exec($handle) you should close the session with curl_close($handle). Maybe you tried so many times that it no longer works because you have too many open sessions on your local server. I would add that line to your code, restart XAMPP, and try again.
Edit:
The server rejects requests without a valid user-agent. Add a user-agent to your request; this worked for me:
curl_setopt($handle, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36');
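For completeness, here is a sketch of the asker's script with both suggestions applied (the user-agent string is only an example; any realistic browser string should work):
<?php
$handle = curl_init('http://www.todayhumor.co.kr/');
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);
// identify as a regular browser so the server does not reject the request
curl_setopt($handle, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36');
$content = curl_exec($handle);
// close the session when done, as suggested above
curl_close($handle);
echo $content;
?>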
Related
I am trying to echo site data, and for 95% of sites file_get_contents or cURL works just fine, but for a few sites it never works no matter what I try. I tried to set a proper user agent and changed SSL verification to false, but nothing worked.
Test site where it fails with 403 Forbidden: https://norskbymiriams.dk/
wget is also unable to fetch SSL sites, even though it is compiled with SSL support (checked with wget -V).
I tried the snippets below; none worked for this particular site.
file_get_contents
$list_url = "https://norskbymiriams.dk/";
$html = file_get_contents($list_url);
echo $html;
curl
$handle=curl_init('https://norskbymiriams.dk');
curl_setopt($handle, CURLOPT_HEADER, true);
curl_setopt($handle, CURLOPT_VERBOSE, true);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
curl_setopt($handle, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($handle, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36");
curl_setopt($handle, CURLOPT_FOLLOWLOCATION, true);
$content = curl_exec($handle);
echo $content;
Any help would be great.
Some websites analyse a request extremely thoroughly. If there is a single thing that makes that web server think you are a crawling bot, it might return 403.
I would try this:
Make a request from a browser, see all the request headers, and place them in my cURL request (simulating a real browser).
My cURL request would look like this:
curl 'https://norskbymiriams.dk/' \
  -H 'Upgrade-Insecure-Requests: 1' \
  -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36' \
  --compressed
Please try it; it works.
You can make a request in Chrome, for example, and use the Network tab in Developer Tools to inspect the page request. If you right-click on it, you will see "Copy as cURL".
Then test each header separately in your actual cURL request, see which one is the missing link, add it, and continue your crawling.
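If you want to replay those copied headers in PHP rather than on the command line, a minimal sketch might look like this (the header values are taken from the browser request above and are only examples):
$handle = curl_init('https://norskbymiriams.dk/');
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
// replay the headers captured from the browser request above
curl_setopt($handle, CURLOPT_HTTPHEADER, array(
    'Upgrade-Insecure-Requests: 1',
    'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36',
));
// the equivalent of curl's --compressed flag: accept any encoding cURL supports
curl_setopt($handle, CURLOPT_ENCODING, '');
$content = curl_exec($handle);
curl_close($handle);
echo $content;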
I've got a really odd problem and no idea how to debug it. Maybe an experienced developer can help me. I have the following code:
$url = 'https://home.mobile.de/home/ses.html?customerId=471445&json=true&_='.time();
echo $url;
$agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.130 Safari/537.36';
// Initiate curl
$ch = curl_init();
// Activate debugging
curl_setopt($ch, CURLOPT_VERBOSE, true);
// Disable SSL verification
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
// Return the response as a string; if false, curl_exec() prints it directly
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Set browser user agent
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
// Set the url
curl_setopt($ch, CURLOPT_URL,$url);
// Execute
$result=curl_exec($ch);
// Closing
curl_close($ch);
$php_object = json_decode($result);
var_dump($php_object);
I've put this code into a PHP file called playground.php. If I open playground.php with Chrome (I am using MAMP as a local server), everything works as expected. It also works if I run "php playground.php" on the OS X command line, but for some reason it does not work when I run it inside the PhpStorm CLI.
Any idea what could be wrong and how I can debug this issue?
Many thanks in advance.
Thanks to LazyOne I was able to find out that a firewall rule was blocking the outgoing request. Many thanks!
I seem to be in a bit of a predicament. As far as I am aware, there have been no changes to PHP or Apache, yet code that has worked for almost 6 months just stopped working today at 2pm.
The code is:
function ls_record($prospectid,$campid){
$api_post = "method=NewProspect&prospect_id=".$prospectid."&campaign_id=".$campid;
$ch = curl_init();
curl_setopt($ch, CURLOPT_FRESH_CONNECT, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_POST, TRUE);
curl_setopt($ch, CURLOPT_POSTFIELDS, $api_post);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_URL, "http://XXXXX/XXXXXX/store.php");
$x = print_r(curl_exec($ch), TRUE);
return $x;
}
It returns NULL. I tried using file_get_contents(), which also returns NULL. I checked the Apache error logs and see nothing... I need some help on this one.
Do you have access to the command line of the server? It could be that the destination has blocked you somehow.
If you have command line access, try this
wget http://XXXXX/XXXXXX/store.php
That should at least return something (even if only the headers).
Use curl_getinfo to check your cURL execution status. It may be that the server you are trying to extract content from needs your cURL to set a user-agent; some sites check the user-agent to block unwanted cURL access.
Below is the user agent I use to disguise my cURL as a desktop Chrome browser:
curl_setopt($ch,CURLOPT_USERAGENT,' Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36');
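As a quick sketch of the curl_getinfo check mentioned above (assuming $ch is your initialised handle with your options already set):
$result = curl_exec($ch);
// HTTP status code of the last transfer; 0 means the request never completed
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
if ($result === false) {
    // the transfer itself failed; curl_error explains why
    echo 'cURL error: ' . curl_error($ch);
} elseif ($status >= 400) {
    echo 'Server refused the request with HTTP ' . $status;
}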
I faced the same problem on my server because of low internet speed. The internet speed dropped for some time and cURL took too long to execute, so it returned a timeout error. After a few minutes it worked fine again without any changes on the server.
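If slow connections are the suspect, you can make the timeout behaviour explicit rather than relying on defaults; a minimal sketch, assuming $ch is an initialised handle (the 10- and 30-second values are arbitrary examples):
// give up if connecting takes longer than 10 seconds
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
// give up if the whole transfer takes longer than 30 seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
$result = curl_exec($ch);
if ($result === false) {
    // errno 28 is CURLE_OPERATION_TIMEDOUT
    echo 'cURL failed: ' . curl_error($ch) . ' (errno ' . curl_errno($ch) . ')';
}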
I am transferring an object array. I have a cURL client (the submitter) on my own server and a listening script on someone else's server, which is not under my control. I think they are blocking incoming cURL requests, because when I test with a normal HTML <form> it works, but not via cURL.
So I think they have placed some restriction on cURL.
My questions are:
Can a server restrict/block incoming cURL requests?
If so, can I trick/change the HTTP header (User-Agent) in my initiating cURL script?
Or is there some other possible explanation?
Thanks!
If you are still facing the problem, do the following.
1.
$config['useragent'] = 'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0';
curl_setopt($curl, CURLOPT_USERAGENT, $config['useragent']);
curl_setopt($curl, CURLOPT_REFERER, 'https://www.domain.com/');
2.
$dir = dirname(__FILE__);
$config['cookie_file'] = $dir . '/cookies/' . md5($_SERVER['REMOTE_ADDR']) . '.txt';
curl_setopt($curl, CURLOPT_COOKIEFILE, $config['cookie_file']);
curl_setopt($curl, CURLOPT_COOKIEJAR, $config['cookie_file']);
NOTE: You need a cookies folder in the script's directory.
3.
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
If doing these doesn't solve the problem, then give sample input/output/errors, etc., so that a more precise solution can be provided.
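Put together, the three steps above might look like this sketch (the target URL is a placeholder; everything else mirrors the snippets above):
$curl = curl_init('https://www.example.com/'); // hypothetical target URL
// step 1: browser-like user agent and referer
$config['useragent'] = 'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0';
curl_setopt($curl, CURLOPT_USERAGENT, $config['useragent']);
curl_setopt($curl, CURLOPT_REFERER, 'https://www.domain.com/');
// step 2: persist cookies between requests (requires a cookies folder)
$dir = dirname(__FILE__);
$config['cookie_file'] = $dir . '/cookies/' . md5($_SERVER['REMOTE_ADDR']) . '.txt';
curl_setopt($curl, CURLOPT_COOKIEFILE, $config['cookie_file']);
curl_setopt($curl, CURLOPT_COOKIEJAR, $config['cookie_file']);
// step 3: follow redirects and return the body as a string
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($curl);
curl_close($curl);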
$agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)';
$curl=curl_init();
curl_setopt($curl, CURLOPT_USERAGENT, $agent);
On the server side, we can block some requests by recognising header fields (including Referer, Cookie, User-Agent and so on) in the HTTP request, the IP address, and the access frequency. In most cases, requests generated by a machine differ from human requests in some way, for example no Referer or Cookie header, or a higher access frequency, so we can write rules to deny such requests.
Given the above, you can do your best to simulate real requests by filling in the header fields, using random and slower frequencies, and using more IP addresses (which sounds like an attack).
Generally, if you use a lower frequency, avoid putting heavy load on their server, and follow their access rules, they will seldom block your requests.
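For illustration, a minimal PHP sketch of the kind of server-side rule described above (the exact criteria are assumptions; real sites use far more elaborate checks, often at the web-server or firewall level):
<?php
// deny requests with no user-agent or referer, a common bot signature
if (empty($_SERVER['HTTP_USER_AGENT']) || empty($_SERVER['HTTP_REFERER'])) {
    http_response_code(403);
    exit('Forbidden');
}
// otherwise serve the page as normal
?>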
A server cannot block only cURL requests, because they are just HTTP requests. So changing the user agent of your cURL request can solve your problem, as the server will think you are connecting through the browser presented in the UA.
An example of a cURL GET call in PHP that reads a file into a variable. The solution was somewhere on Stack Overflow; it is not mine.
By the way, if you want to do it this way, you need to be able to execute PHP code from within HTML. Edit /etc/apache2/mods-enabled/mime.conf, go to the end of the file, and add the following line before the closing </IfModule> tag:
AddType application/x-httpd-php .html .htm
Verified and tested with Apache 2.4.23 and PHP 5.6.17-1 under Debian. I chose to execute PHP in an HTML file because development is faster.
The example code begins:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title></title>
</head>
<body>
<?php
$host = "https://tgftp.nws.noaa.gov/data/observations/metar/decoded/CYHU.TXT";
$agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)";
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $host);
curl_setopt($curl, CURLOPT_USERAGENT, $agent);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
// execute once and capture the response (the original called curl_exec() twice,
// which fetched the file twice and discarded the first result)
$ftp_result = curl_exec($curl);
curl_close($curl);
print_r($ftp_result);
// and now the big work commences:
// extracting the text ...
$zelocation="";
$zedatetime="";
$zewinddirection="";
$zewindspeed="";
$zeskyconditions="";
$zetemp="";
$zehumidity="";
?>
</body>
</html>
I've faced the same issue when I was trying to log in to a website using cURL: the server rejected my request until I sent the user-agent header and the cookies returned when entering the login page. However, you can use this curl library if you are not familiar with cURL.
$curl = new Curl();
$curl->setHeader('user-agent', 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0');
// Disable SSL verification
$curl->setOpt(CURLOPT_SSL_VERIFYPEER, '0');
$curl->post($url, $data);
$response = $curl->getRawResponse();
I'm writing a cURL script, but how can I check whether it's working and passing everything properly when it visits the website?
$ckfile = '/tmp/cookies.txt';
$useragent= "Mozilla/5.0 (iPhone; U; CPU iPhone OS 3_0_1 like Mac OS X; en-us) AppleWebKit/528.18 (KHTML, like Gecko) Mobile/7A400";
$ch = curl_init ("http://website.com");
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $useragent); // set user agent
curl_setopt ($ch, CURLOPT_COOKIEJAR, $ckfile);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
$output = curl_exec ($ch);
curl_close($ch);
Just make a PHP page like this on your server and point your script at your own URL:
var_dump($_SERVER);
and check the HTTP_USER_AGENT string.
You can also achieve the same things by looking at the Apache logs.
But I am pretty sure curl is setting the User-Agent string like it should ;-)
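As a concrete sketch of that test (whoami.php is a hypothetical name for the test page; host it anywhere you control):
<?php
// whoami.php -- test endpoint on your own server:
// echoes back the user agent the request arrived with
echo $_SERVER['HTTP_USER_AGENT'];
?>
And on the client side:
<?php
$ch = curl_init('http://localhost/whoami.php'); // point at your own test page
curl_setopt($ch, CURLOPT_USERAGENT, 'MyTestAgent/1.0'); // any marker string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// if this prints MyTestAgent/1.0, cURL is sending the user agent correctly
echo curl_exec($ch);
curl_close($ch);
?>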
You'll find the Firefox extension LiveHTTPHeaders helpful for seeing exactly what happens to the headers during a normal browsing session:
http://livehttpheaders.mozdev.org/
This will increase your understanding of how your target server responds, and even shows if it redirects your request internally.