I was using cURL to scrape content from a site, and just recently my page started hanging when it reached curl_exec($ch). After some tests I noticed that it could load any other page from my own domain, but when attempting to load anything external I get a "connect() timeout!" error.
Here's a simplified version of what I was using:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,'http://www.google.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0);
$contents = curl_exec ($ch);
curl_close ($ch);
echo $contents;
?>
Here's some info I have about my host from my phpinfo():
PHP Version 5.3.1
cURL support enabled
cURL Information 7.19.7
Host i686-pc-linux-gnu
I don't have access to SSH or modifying the php.ini file (however I can read it). But is there a way to tell if something was recently set to block cURL access to external domains? Or is there something else I might have missed?
Thanks,
Dave
I'm not aware of any setting like that; it would not make much sense.
Since you are on a remote webserver without console access, my guess is that your activity was detected by the host, or more likely it caused issues, and so they firewalled you.
A silent iptables DROP would cause this.
When scraping Google you need to use proxies for more than a handful of requests, and you should never abuse your webserver's primary IP if it's not your own. That's likely a breach of the host's TOS and could even result in legal action if they get banned from Google (which can happen).
Take a look at Google rank checker; it's a PHP script that does exactly what you want using cURL and proper IP management.
I can't think of anything other than a firewall on your side that would cause a timeout.
I'm not sure why you're getting a connect() timeout! error, but there is a problem with the following line:
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0);
Unless it is set to 1, curl_exec() prints the page directly instead of returning its content into your $contents.
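To make the difference visible, here is a minimal sketch of a fetch helper that returns the body and also reports cURL's own error number and message. The function name fetch_url() and the timeout values are my own assumptions, not something from the question.

```php
<?php
// Hypothetical helper (not from the original post): fetch a URL and report
// the body together with cURL's error number and message.
function fetch_url($url, $connectTimeout = 10, $totalTimeout = 30)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    // Return the response as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Seconds allowed for the connect phase, so a blocked connection fails fast.
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $connectTimeout);
    // Seconds allowed for the whole transfer.
    curl_setopt($ch, CURLOPT_TIMEOUT, $totalTimeout);

    $body = curl_exec($ch); // false on failure

    // The handle is released when $ch goes out of scope.
    return array(
        'body'  => ($body === false) ? null : $body,
        'errno' => curl_errno($ch),
        'error' => curl_error($ch),
    );
}

// Example call (commented out so the snippet makes no request on its own):
// $r = fetch_url('http://www.google.com');
// echo ($r['errno'] === 0) ? $r['body'] : "cURL error {$r['errno']}: {$r['error']}";
```

With a short connect timeout the script fails quickly with an error number you can act on, rather than hanging inside curl_exec().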
Ok, I am having a hard time simply trying to get contents from Go Daddy Host server to our company's proprietary server. Originally I was using file_get_contents, then I searched all over SO and realized curl was a better option to bypass security and configuration. Here is my code:
function get_content($URL){
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $URL);
$data = curl_exec($ch);
if(curl_errno($ch)){
echo 'Curl error: ' . curl_error($ch);
}
curl_close($ch);
return $data;
}
echo 'curl:' . get_content('https://xxx-xxxxx:4032/test2.html');
Here is the error:
Curl error: Failed to connect to xxx-xxxxx.com port 4032: Connection refused
Here are some facts:
If I enter the URL into my browser, I am able to retrieve test2.html
If I execute the EXACT script on a different web host (Lunar Pages), it works perfectly fine
get_content() will work on google.com
Go Daddy representatives cannot help us
On our server, we've disabled the firewall (while we tested this)
I would have posted this with a comment, but I don't have enough upvotes to do that. GoDaddy is one of the worst hosts for custom code. Sure they're good for things like WordPress, but if you're wanting custom functionality within your code, they're one of the worst.
As just one example, GoDaddy blocks most file_get_contents and cURL calls within their firewall. I would go with a host like HostGator or Digital Ocean. Both are cheap but not nearly as limiting.
Before making a switch, I would try to run this same code on another environment locally and make sure you can connect.
Here is the code in question:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://api.ipify.org?format=json');
curl_setopt($ch, CURLOPT_PROXY, 'ip:port');
curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:pass');
$result = curl_exec($ch);
echo curl_error($ch);
The proxy and proxyauth I'm using most definitely work. I've actually tried multiple proxies from various sources with and without auth, but every time I get a connection timeout when connecting to the proxy.
Is there a config setting preventing proxies from being used or something that I'm not aware of? Any help here will be greatly appreciated.
You need to use the CONNECT method for https URLs:
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
But some proxies don't support this feature.
I solved this issue and just wanted to update this question.
The problem was with the server configuration. Only the most common outbound ports were enabled, and I was using proxies with random ports across the whole spectrum. Enabling the ports that I needed (you could just have the restriction disabled altogether) fixed the issue.
So if you run into this same issue where you're timing out while trying to connect to a proxy, contact your support and ask about any restrictions on the outbound ports.
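If you want some evidence before contacting support, a plain TCP probe can show which outbound ports are reachable from the server. This is only a sketch: can_connect() and probe_ports() are names I made up, and a false result does not tell you whether the port was refused by the remote side or dropped by a firewall on your side.

```php
<?php
// Hypothetical helper: try a plain TCP connection to see whether an outbound
// port is reachable from this server at all.
function can_connect($host, $port, $timeout = 5.0)
{
    $errno = 0;
    $errstr = '';
    // @ suppresses the PHP warning on failure; only the return value matters here.
    $socket = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($socket === false) {
        return false;
    }
    fclose($socket);
    return true;
}

// Hypothetical helper: probe several ports on one host and return port => bool.
function probe_ports($host, array $ports, $timeout = 5.0)
{
    $report = array();
    foreach ($ports as $port) {
        $report[$port] = can_connect($host, $port, $timeout);
    }
    return $report;
}

// Example call (commented out; replace the host and ports with your proxy's):
// print_r(probe_ports('proxy.example.com', array(80, 443, 8080, 3128)));
```

If the common ports come back true and the proxy's port comes back false from the server but true from your own machine, that points at an outbound port restriction.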
Sounds like an HTTPS issue to me. Try:
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch,CURLOPT_SSL_VERIFYHOST, false);
It might also just be a slow URL, so try:
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT ,0);
curl_setopt($ch, CURLOPT_TIMEOUT, 400);//max seconds to allow cURL functions to run
If there are any redirects involved, you can also try:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
If that doesn't work, try testing more URLs to see whether it's an https issue with every site you try, and whether you can recreate the same issue with http. Also make sure you have PHP's error reporting enabled, and see if you're getting any errors or warnings. Try deliberately causing PHP to generate an error or warning to check that error reporting is working correctly.
I used cURL to get data from another website. Sometimes it returns data and sometimes an empty result.
Here is my Code
function get_data($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
$agent=$_SERVER["HTTP_USER_AGENT"];
curl_setopt($ch,CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER, false);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$returned_content = get_data('www.example.com');
echo $returned_content;
Possibly you are opening too many connections to one IP address, so the server blocks some of them, which causes the intermittent failures.
Not receiving any content back could be due to one or more out of many different reasons and you need to figure out which.
Use curl_error($ch) in your code after the transfer has been performed to see if curl has told you about a problem.
Investigate the response headers. An HTTP transfer can be successful and not return any error, yet respond with only HTTP headers and no body at all. The HTTP headers may then contain clues as to why. Perhaps it is a redirect you should follow, perhaps it requires HTTP authentication, or perhaps the server indicates a temporary server glitch.
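As a rough sketch of that triage, the helper below turns the values you can read after a transfer into a short diagnosis. The function name explain_result() and the wording of the messages are my own; the inputs are what curl_errno(), curl_error(), curl_getinfo($ch, CURLINFO_HTTP_CODE) and curl_exec() give you.

```php
<?php
// Hypothetical helper: summarise the outcome of a finished transfer.
// $errno / $error : from curl_errno($ch) and curl_error($ch)
// $httpCode       : from curl_getinfo($ch, CURLINFO_HTTP_CODE)
// $body           : the return value of curl_exec($ch) with CURLOPT_RETURNTRANSFER on
function explain_result($errno, $error, $httpCode, $body)
{
    if ($errno !== 0) {
        // Transport-level failure: no HTTP response was received.
        return "cURL error $errno: $error";
    }
    if ($httpCode >= 300 && $httpCode < 400) {
        return "HTTP $httpCode: redirect, consider CURLOPT_FOLLOWLOCATION";
    }
    if ($httpCode === 401 || $httpCode === 407) {
        return "HTTP $httpCode: authentication required";
    }
    if ($httpCode === 403 || $httpCode === 429) {
        return "HTTP $httpCode: request refused or rate limited";
    }
    if ($httpCode >= 500) {
        return "HTTP $httpCode: server-side error, possibly temporary";
    }
    if ($httpCode >= 400) {
        return "HTTP $httpCode: client error";
    }
    if ($body === '' || $body === false || $body === null) {
        return "HTTP $httpCode: no body returned";
    }
    return "HTTP $httpCode: OK, " . strlen($body) . " bytes";
}
```

Logging this string on every call makes it much easier to see which of the possible reasons is behind the intermittent empty results.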
Curl in PHP returns null on AWS EC2 instance
I had a similar issue and fixed it by making sure the versions and settings in the php.ini files in the PHP5 and Apache2 folders were the same. If they are not, Apache2 tries to run the versions set in its own php.ini. Also make sure PHP5-libcurl is installed; that matters too.
The intermittent failures may be due to flaky DNS servers. Even if they aren't, you'll still benefit from using Google's DNS servers.
See: https://developers.google.com/speed/public-dns/docs/using
Solution
Make sure there are no spaces in any variables or in the URL, as in $user_name = "Lokman Hosen". You have to URL-encode the value so that it becomes "Lokman+Hosen" or "Lokman%20Hosen" (or avoid the space altogether, as in "Lokman_Hosen"). You can use the
urlencode()
function to encode the space. cURL now does not allow any raw space in a URL.
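For illustration, here is a small sketch of encoding query values before handing the URL to cURL. The helper name build_url() and the example base URL are assumptions of mine. Note that it is the encoding functions (urlencode(), rawurlencode(), http_build_query()) that produce the escaped form.

```php
<?php
// Hypothetical helper: append percent-encoded query parameters to a base URL.
function build_url($base, array $params)
{
    // PHP_QUERY_RFC3986 encodes a space as %20 (the default encoding would use +).
    return $base . '?' . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

$url = build_url('http://www.example.com/profile', array('user_name' => 'Lokman Hosen'));
// $url can now be passed to curl_setopt($ch, CURLOPT_URL, $url).
```

For a single value, rawurlencode('Lokman Hosen') gives the %20 form and urlencode('Lokman Hosen') gives the + form.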
I have a URL for an RSS feed:
$url = 'http://www.myurl.com/sth?format=RSS';
I can open it in a browser without a problem. But
$feed->load($url)
returned 'false'. So I started investigating:
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_exec($ch);
$retcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
print curl_error($ch); // prints 'couldn't connect to host'
echo "CODE: ".$retcode; // $retcode is 0
$file_headers = get_headers($url);
echo $file_headers[0]; // is null
So, what can be the reason for such behaviour? Is some port blocked on myurl.com server? Is there a way to work around it (like create local copy of the file and work on it)?
The site has probably implemented some block on external connections, such as a check on the User-Agent or Referer header.
Maybe the server is doing some sniffing and doesn't serve anything on that URL if it finds that curl is doing the work. You could try PhantomJS and/or Selenium to get around such filters. Selenium has PHP bindings.
If you're on CentOS (this is a known issue on that flavour), do the following to test whether SELinux is the cause. You can apply more specific filtering later.
> emacs /etc/selinux/config
Locate the following line:
SELINUX=enforcing
Change it to:
SELINUX=disabled
Save the file, reboot so the change takes effect, and try again. It could also be your localhost firewall, given that you can open the URL in a browser without a problem.
If SELinux is the issue, set it back to enforcing and run
setsebool -P httpd_can_network_connect on
if you want httpd to be able to connect to TCP ports.
I send an item code to a web service in XML format using cURL (PHP). I get the correct response on localhost, but when I run it on the server it shows
cURL Error (7): couldn't connect to host
And here's my code:
function xml_post($post_xml, $url)
{
$user_agent = $_SERVER['HTTP_USER_AGENT'];
$ch = curl_init(); // initialize curl handle
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FAILONERROR, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_TIMEOUT, 50);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_xml);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
// curl_setopt($ch, CURLOPT_PORT, $port);
$data = curl_exec($ch);
$curl_errno = curl_errno($ch);
$curl_error = curl_error($ch);
if ($curl_errno > 0) {
echo "cURL Error ($curl_errno): $curl_error\n";
} else {
echo "Data received\n";
}
curl_close($ch);
echo $data;
}
I send the item code to Tally and fetch the details from it. I tried both PHP 4+ and PHP 5+, and nothing works. Is there any solution?
cURL error code 7 (CURLE_COULDNT_CONNECT) is very explicit: it means "Failed to connect() to host or proxy".
The following code should work on any system with unrestricted outbound access:
$ch = curl_init("http://google.com"); // initialize curl handle
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$data = curl_exec($ch);
print($data);
If you cannot see the Google page, then either your URL is wrong or you have a firewall or restriction issue.
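To narrow things down from the error number alone, a small lookup like the sketch below can help. The function name curl_errno_hint() and the hint texts are my own; the numeric codes are libcurl's (6 is CURLE_COULDNT_RESOLVE_HOST, 7 is CURLE_COULDNT_CONNECT, 28 is CURLE_OPERATION_TIMEDOUT).

```php
<?php
// Hypothetical helper: map a few common libcurl error numbers to a next step.
function curl_errno_hint($errno)
{
    switch ($errno) {
        case 0:
            return 'no transport error';
        case 6: // CURLE_COULDNT_RESOLVE_HOST
            return 'DNS lookup failed: check the host name and the DNS resolver on the server';
        case 7: // CURLE_COULDNT_CONNECT
            return 'TCP connection failed: check the port, firewalls and SELinux';
        case 28: // CURLE_OPERATION_TIMEDOUT
            return 'timed out: packets may be silently dropped, or the remote host is slow';
        default:
            return 'see the libcurl error code list for code ' . $errno;
    }
}

// Example (commented out): echo curl_errno_hint(curl_errno($ch));
```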
The "CURL ERROR 7 Failed to connect to Permission denied" error occurs when the curl request is blocked, for whatever reason, by a firewall or something similar.
You will face this issue whenever the curl request does not use a standard port.
For example, a request to a URL on port 1234 will hit this issue, whereas a URL on port 80 will return results without trouble.
This error is most commonly seen on CentOS and other operating systems with SELinux.
You need to either disable SELinux or change it to permissive.
Have a look at this one:
http://www.akashif.co.uk/php/curl-error-7-failed-to-connect-to-permission-denied
Hope this helps
If you have tried all the ways and failed, try this one command:
setsebool -P httpd_can_network_connect on
If your network sits behind a proxy, you should set the proxy URL and port in PHP:
curl_setopt($ch, CURLOPT_PROXY, "http://url.com"); //your proxy url
curl_setopt($ch, CURLOPT_PROXYPORT, "80"); // your proxy port number
This solved my problem.
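If you set the same proxy in several places, it may be tidier to build the options once and apply them with curl_setopt_array(). The sketch below assumes a made-up helper name, proxy_options(), and a placeholder proxy host; the CURLOPT_* constants are the standard ones.

```php
<?php
// Hypothetical helper: collect proxy-related cURL options in one array.
function proxy_options($proxyHost, $proxyPort, $userPwd = null)
{
    $options = array(
        CURLOPT_PROXY          => $proxyHost,
        CURLOPT_PROXYPORT      => (int) $proxyPort, // the port is passed as an integer
        CURLOPT_RETURNTRANSFER => true,
    );
    if ($userPwd !== null) {
        // "user:password" for proxies that require authentication.
        $options[CURLOPT_PROXYUSERPWD] = $userPwd;
    }
    return $options;
}

// Example (commented out; proxy.example.com is a placeholder):
// $ch = curl_init('http://www.example.com/');
// curl_setopt_array($ch, proxy_options('proxy.example.com', 8080));
// $data = curl_exec($ch);
```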
In my case I had something like cURL Error (7): ... Operation Timed Out. I'm using the network connection of the company I work for, and I needed to create some environment variables. The following worked for me:
In Linux terminal:
$ export https_proxy=yourProxy:80
$ export http_proxy=yourProxy:80
On Windows I created the same environment variables the Windows way.
I hope it helps!
Regards!
Are you able to hit that URL by browser or by PHP script? The error shown is that you could not connect. So first confirm that the URL is accessible.
Check whether ports 80 and 443 are blocked. Alternatively, look up the IP address of graph.facebook.com and add it to your /etc/hosts file.
You can also get this error if you hit the same URL with multiple HTTP requests at the same time. Many of the curl requests won't be able to connect and so return with an error.
This issue can also be caused by making curl calls to https when it is not configured on the remote device. Calling over http can resolve this problem in these situations, at least until you configure ssl on the remote.
In my case, the problem was caused by the hosting provider I was using blocking http packets addressed to their IP block that originated from within their IP block. Un-frickin-believable!!!
For a couple of days I was totally blocked on this. I'm very very new to networking/vms but was keen to try set it up myself instead of paying a hosting company to do it for me.
Context
I'm rebuilding the server side for a map-based app that uses PHP routines to return various bits of data from internal sources as well as external APIs. I started an Oracle VM instance and installed and set up Apache and PHP. Everything ran fine until one of my PHP routines tried to execute a cURL request. I implemented error logging, only to find that I didn't even get a message, just '7', despite my implementation being very similar to the above. My PHP routine that reads an internal file for data was running successfully, so I was fairly sure it wasn't an Apache or PHP issue. I also checked my Apache error logs and found nothing telling.
Solution
I nearly gave up. There's talk of disabling SELinux above and in other articles. I tried that and it did work for my purposes, but here's a really good article on why you shouldn't disable SELinux: https://www.electronicdesign.com/technologies/embedded-revolution/article/21807408/dont-do-it-disabling-selinux
If temporarily disabling it works, that confirms SELinux is blocking you. If, like me, you don't want to leave it disabled, I found a neat little command that prints out any SELinux issues in a more readable fashion:
sealert -a /var/log/audit/audit.log
This returned the following:
found 1 alerts in /var/log/audit/audit.log
--------------------------------------------------------------------------------
SELinux is preventing php-fpm from name_connect access on the tcp_socket port 443.
Great, I now get a bit more information than just '7'. Reading further down, I can see it actually makes suggestions:
***** Plugin catchall_boolean (24.7 confidence) suggests ******************
If you want to allow httpd to can network connect
Then you must tell SELinux about this by enabling the 'httpd_can_network_connect' boolean.
Do
setsebool -P httpd_can_network_connect 1
This has been mentioned further above, but now I have a bit more context and an explanation of what it does. I ran the command, and I'm in business. Furthermore, my SELinux is still set to enforcing, meaning my machine is more secure.
Many other suggestions are printed as well, so if you're blocked it might be worth checking /var/log/audit/audit.log.