Problem with getting source code of a web page using PHP curl - php

I have no problem getting the source code of the web page on my local server with this:
$html = file_get_contents('https://opac.nlai.ir');
On my host, the code below also worked fine until a few days ago:
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'http://opac.nlai.ir');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);
$result = curl_exec($curl);
But today I found out that the site now uses SSL: it no longer works over http and forces a redirect to https. So I searched and found this as a fix:
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
The code above works just fine for sites such as "https://google.com" (and every other https site I've tried!),
but not for that specific website ("https://opac.nlai.ir").
In that case, the page takes about a minute to load (!) and var_dump($result) finally gives "bool(false)".
I don't know how that website differs from other websites, and I really want to know what causes the problem.
Sorry for my English.

Just posting this answer for the record.
As noted in the comments on the question, the website you were trying to fetch was blocking your requests because of the geographic location of the server that sent them. It seems to respond only to IP addresses from specific locations.
The solution, verified by @Hesam, was to use cURL through a proxy whose IP address is located in an allowed region; he found at least one that works well.
He followed the instructions found in this other SO post:
How to use cURL via a proxy
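For reference, routing a cURL request through a proxy could look roughly like the sketch below. The proxy address `203.0.113.10:8080` used in the note afterwards is a placeholder from a documentation IP range, and `build_proxy_options()` and `fetch_via_proxy()` are hypothetical helper names, not something from the linked post; you would substitute a proxy that actually sits in the allowed region.

```php
<?php
// Hypothetical helper: option set for a request routed through a proxy.
function build_proxy_options(string $url, string $proxy): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow the http -> https redirect
        CURLOPT_CONNECTTIMEOUT => 10,     // seconds to wait for the connection
        CURLOPT_TIMEOUT        => 30,     // overall limit, so a blocked request fails sooner
        CURLOPT_PROXY          => $proxy, // "host:port" of the proxy
    ];
}

// Hypothetical helper: performs the request; returns the body, or false on failure.
function fetch_via_proxy(string $url, string $proxy)
{
    $curl = curl_init();
    curl_setopt_array($curl, build_proxy_options($url, $proxy));
    $result = curl_exec($curl);
    if ($result === false) {
        // Log why the transfer failed instead of silently returning false.
        error_log('cURL error ' . curl_errno($curl) . ': ' . curl_error($curl));
    }
    return $result;
}
```

Calling `fetch_via_proxy('https://opac.nlai.ir', '203.0.113.10:8080')` would then send the request through that (placeholder) proxy.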

Related

PHP CURL: Instagram returns empty string to remote server

I found out that I can get latest user posts in JSON format with a simple CURL request to
https://www.instagram.com/some_nickname/?__a=1
So I am trying:
$url = 'https://www.instagram.com/some_nickname/?__a=1';
$curl = curl_init();
curl_setopt($curl, CURLOPT_POST, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_URL, $url);
$response = curl_exec($curl);
The code works fine on my local machine and returns a JSON array, from which I get the latest posts of a user. The problem is that the very same code returns an empty string from any of my remote servers (I've tried 3 different web-hosting providers).
Please advise.
Instagram has blocked the IP ranges of hosting providers. I have seen the following information:
Instagram has started applying extremely restrictive limits to
requests from IPs that are detected as cloud servers.
A couple of requests will go through at first, but after a moment, all
requests will be blocked, with a very long timer until they are
unblocked again. After this timer, you get a couple more requests, and
the cycle repeats.
Since websites must be hosted in the cloud, any interaction with
Instagram becomes extremely difficult if the cloud provider is on
Instagram's block list. All the cloud providers I've tried are on that
list.
You need to use a proxy. But that will be a difficult task, because (in my experience) many proxies are blocked as well.

PHP cURL not working when send POST request to fetch data

I have recently been working on a project in which I need to scrape some data from an external website. It works on localhost but stopped working on the live host. I searched Google and Stack Overflow, where people suggested enabling the PHP cURL extension and so on, but everything is already enabled, because I do a lot of other scraping on that hosting and it works without a hitch.
The code is:
$url = "https://pakesimdata.com/sim/search-result.php";
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_TIMEOUT, 40);
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POSTFIELDS, "cnnum=3005210911");
echo $html = curl_exec($ch);
When I request https://pakesimdata.com/sim/search.php or https://pakesimdata.com/ and echo the result, I get the page, but when I send the POST request to https://pakesimdata.com/sim/search-result.php I get nothing. I also added error handling, but it reported no error, which has given me a real headache. I can't work out what is going on or which part I need to change to get the results.
It is working on Localhost but stopped working on the live host.
This suggests two possibilities:
1.) Your host does not allow outgoing HTTP connections, which you have already ruled out.
2.) The remote host does not like your scraping.
Regarding 2.):
They may have blocked your IP address, or some other mechanism may be protecting the page from being used by your script; operators of services like this usually do not like being scraped by bots.
Localhost will probably appear as your own private connection, likely with a dynamic,
constantly changing IP address, like an ordinary user.
Your real server will have a fixed IP address, and the external website probably analyzes its traffic and blocks IP addresses for abuse when many requests come from the same one.
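When a request "shows nothing", it helps to look at what cURL itself reports before guessing. Here is a minimal sketch, assuming you collect `curl_errno()`, `curl_error()` and the HTTP status from `curl_getinfo()` right after `curl_exec()`. `describe_curl_result()` is a hypothetical helper name, and the values in the demo lines are made up.

```php
<?php
/**
 * Hypothetical helper: turns the outcome of curl_exec() into a readable diagnosis.
 *
 * @param string|false $body     return value of curl_exec() with CURLOPT_RETURNTRANSFER set
 * @param int          $errno    curl_errno($ch)
 * @param string       $error    curl_error($ch)
 * @param int          $httpCode curl_getinfo($ch, CURLINFO_HTTP_CODE)
 */
function describe_curl_result($body, int $errno, string $error, int $httpCode): string
{
    if ($body === false) {
        // Transport-level failure: timeout, DNS, refused connection, TLS problem, ...
        return "transfer failed (cURL error $errno: $error)";
    }
    if ($httpCode >= 400) {
        // The server answered but refused or failed the request.
        return "server answered with HTTP $httpCode";
    }
    if ($body === '') {
        return "HTTP $httpCode with an empty body";
    }
    return "HTTP $httpCode, " . strlen($body) . " bytes received";
}

// Demo with made-up values (no network access needed):
echo describe_curl_result(false, 28, 'Operation timed out', 0), PHP_EOL;
echo describe_curl_result('', 0, '', 403), PHP_EOL;
```

In the script from the question, the call would be `describe_curl_result($html, curl_errno($ch), curl_error($ch), curl_getinfo($ch, CURLINFO_HTTP_CODE))`, made right after `curl_exec($ch)`. An HTTP 403 or an empty 200 response on the live host only would be consistent with the remote site blocking that server.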

PHP curl call not working when the caller script is called from outside the LAN

I am encountering a strange situation with a PHP cURL request.
It works perfectly when the script is called from the same local network as the server running it, but when somebody calls it from outside, it no longer works.
This is my code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, ROOT_DIR."directory/someScript.php");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 1);
$output = curl_exec($ch);
curl_close($ch);
When I run someScript.php manually, even from outside the LAN, it works just fine. The issue occurs only when I try to run it by calling it with cURL from another PHP script.
Does anybody have any idea? The cURL library is enabled, and both scripts are on the same server.
I am using port forwarding on port 80 to make the server visible on the Internet.
I decided to post the solution in case somebody else has the same problem.
In the end I found the cause. The problem was that I was calling the script using the external IP instead of the internal one. Furthermore, because of this, XAMPP redirected the call to its default page, and because I didn't use the output I did not notice that.
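A silent redirect like this one can be caught by inspecting what `curl_getinfo()` reports after the call. The sketch below is one possible check, not the code I actually used. `find_redirect()` is a hypothetical helper name, and the address and path in the demo are made-up values.

```php
<?php
/**
 * Hypothetical helper: inspects the array returned by curl_getinfo($ch) and
 * returns the redirect target if the request was answered with a 3xx status,
 * or null if it was not redirected.
 */
function find_redirect(array $info): ?string
{
    $code = $info['http_code'] ?? 0;
    if ($code >= 300 && $code < 400) {
        // 'redirect_url' holds the Location target when redirects are not followed.
        $target = $info['redirect_url'] ?? '';
        return $target !== '' ? $target : '(unknown target)';
    }
    return null;
}

// Demo with made-up values (no network access needed):
$info = ['http_code' => 302, 'redirect_url' => 'http://192.0.2.1/dashboard/'];
$target = find_redirect($info);
if ($target !== null) {
    echo "Request was redirected to $target", PHP_EOL;
}
```

In the script from the question, this would be called as `find_redirect(curl_getinfo($ch))` after `curl_exec($ch)` and before `curl_close($ch)`.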

Steam as OpenID stopped working when I changed server

I built a website which uses Steam as its OpenID provider. I first hosted it on shared hosting.
But the traffic grew from 50 users a day to 1000 a day. I wasn't expecting that and had to change my host. I took another shared hosting plan with better performance to see how it would grow. But now there is a problem.
My OpenID login with Steam, which worked perfectly on the previous host, doesn't work anymore. I tried with Google, and it worked. So I don't think my script uses functionality that isn't enabled on my new host.
When I enter the Steam identity, the page loads for about 30 seconds and then Chrome returns an error, ERR_EMPTY_RESPONSE. I tried setting error_reporting to E_ALL, but the result is the same.
I am using LightOpenID, and here is the relevant portion of the code:
$openid->identity = 'http://steamcommunity.com/openid';
header('Location: ' . $openid->authUrl());
Actually, it doesn't work whenever I call $openid->authUrl(). Here is the complete code: http://pastebin.com/rChDzECq
How can I resolve this? Thank you in advance.
I also had problems fighting the LightOpenID code in the last couple of hours. I finally made it work and here is what I learned in the process.
Absolutely no HTML output before the header() command. Even the tiniest space prevented the command from redirecting anything.
My server wouldn't allow the use of file_get_contents() and just returned a 404 error from the URL passed to it. You can solve this problem with a custom file_get_contents_curl() command. Here's the one I used myself:
function file_get_contents_curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
I know this is a late answer but I hope it will help someone.

PHP curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false) too slow

I use this method to get Facebook API data; it is just a search query. But I find that using curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); makes the cURL request take much longer (over 10 seconds).
Is there another cURL approach that runs faster?
NOTE: I am currently testing on localhost.
$url = "https://graph.facebook.com/search?access_token=".$token."&q=dallas&type=post&scope=publish_stream,offline_access,user_status,read_stream";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
//curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 2);
//curl_setopt($ch, CURLOPT_CAINFO, dirname(__FILE__). '/file.crt'); // as Lumbendil recommended, with a .crt file exported via Firefox; still slow.
$body= curl_exec($ch);
curl_close ($ch);
PS: I do not want to use an SDK, because I failed to set up the SDK for testing on localhost, although I have read many articles on how to do it. I set http://127.0.0.1/facebook as my callback URL, but it just failed. So I would still like a simple cURL approach.
Thanks.
You could use a .crt file and verify against that instead of ignoring SSL verification, as explained here.
To keep all the information in one place: In your code, you should write the following:
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($ch, CURLOPT_CAINFO, '/path/to/crt/file.crt');
To obtain the certificate, you should go with the browser to the page, and then with "view certificate" you have to export it. Remember that you must export it as X.509 Certificate (PEM) for this to work. For a more detailed guide on how to export the certificate, visit the link provided.
If skipping the certificate check takes 10 seconds, the problem is not with the certificate or with the checking and, quite frankly, it probably isn't with SSL at all.
Skipping the certificate check should be very fast and not measurable compared to how long the rest of the SSL handshake takes.
To properly track down the problem, I would recommend you use the curl command line tool and its --trace-ascii and --trace-time options to see what takes the time. You may need to snoop on the network with Wireshark or similar to get an even better picture of what's going on.
I can't see how the other suggestions of adding a certificate check to the mix will make anything faster.
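If running the command line tool is inconvenient, a rougher picture is available from PHP itself, because `curl_getinfo()` exposes cumulative timings for the transfer. The sketch below is an assumption about how one might split them into phases and is not a replacement for a real trace. `timing_breakdown()` is a hypothetical helper name, and the numbers in the demo are made up to illustrate a slow DNS lookup.

```php
<?php
/**
 * Hypothetical helper: converts the cumulative timings in the array returned
 * by curl_getinfo($ch) into per-phase durations, in seconds.
 */
function timing_breakdown(array $info): array
{
    $dns     = $info['namelookup_time']    ?? 0.0; // until the host name was resolved
    $connect = $info['connect_time']       ?? 0.0; // until the TCP connection was made
    $pre     = $info['pretransfer_time']   ?? 0.0; // until the transfer could start (includes TLS)
    $start   = $info['starttransfer_time'] ?? 0.0; // until the first byte arrived
    $total   = $info['total_time']         ?? 0.0; // until the transfer finished

    return [
        'dns_lookup'     => $dns,
        'tcp_connect'    => $connect - $dns,
        'tls_and_setup'  => $pre - $connect,
        'waiting_server' => $start - $pre,
        'download'       => $total - $start,
    ];
}

// Demo with made-up values (no network access needed):
$info = [
    'namelookup_time'    => 10.0,
    'connect_time'       => 10.25,
    'pretransfer_time'   => 10.5,
    'starttransfer_time' => 10.75,
    'total_time'         => 11.0,
];
foreach (timing_breakdown($info) as $phase => $seconds) {
    printf("%-15s %.2f s\n", $phase, $seconds);
}
```

In the script from the question, the call would be `timing_breakdown(curl_getinfo($ch))` after `curl_exec($ch)` and before `curl_close($ch)`. Whichever phase dominates shows where to look next.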
Just a side note, but if you do wish to use the SDK, you can work around the local issue by editing your hosts file and adding localhost.local for 127.0.0.1. The file is /etc/hosts on a Linux machine and C:\WINDOWS\system32\drivers\etc\hosts on a Windows machine.
Then in the Facebook app settings, simply set localhost.local as your domain and set your site url accordingly.
You should be ready to go then.
