Can servers block curl requests specifically? - php

Generally speaking, is it possible for a server to block a PHP cURL request?
I've been making cURL requests every 15 minutes to a certain public-facing URL for about 6-8 months. Suddenly the other day it stopped working, and the URL started returning an empty string.
When I hit the URL in a browser or with a Python GET request, it returns the expected data.
I decided to try hitting the same URL with a file_get_contents() function in PHP, and that works as expected as well.
Since I found a bandaid solution for now, is there any difference between the default headers that cURL sends vs file_get_contents() that would allow one request to be blocked and the other to get through?

Generally speaking, is it possible for a server to block a PHP cURL request?
Sort of. The server can block requests if your user agent string looks like it comes from curl. Try using the -A option to set a custom user agent string.
curl -A "Foo/1.1" <url>
Edit: Oops I see you said "from PHP", so just set the CURLOPT_USERAGENT option:
curl_setopt($curl, CURLOPT_USERAGENT, 'Foo/1.1');

A lot of websites block requests based on the user agent. The best workaround I can think of is to open the developer console in Chrome and click on the Network tab. Go to the URL of the website you are trying to access and find the request that fetches the data you need. Right-click on that request and copy it as cURL. It will include all the headers that your browser is sending.
If you add all of those headers to your cURL request in PHP, the web server will not be able to tell your cURL request from your browser's request by its headers.
You will need to update those headers once every couple of years (some websites try to forbid old versions of Firefox or Chrome that bots have been abusing for years).
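As a rough sketch of what that looks like in PHP: the header values below are placeholders I made up, not copied from any real browser session, and browserLikeHeaders() and fetchLikeBrowser() are hypothetical helper names. Replace the values with whatever "Copy as cURL" gives you.

```php
<?php
// Placeholder header lines; swap in the ones copied from your own browser.
function browserLikeHeaders(): array
{
    return [
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.5',
        'Cache-Control: no-cache',
    ];
}

// Sends a GET request with the copied headers and returns the body,
// or false if the transfer failed.
function fetchLikeBrowser(string $url, string $userAgent)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,                 // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,                 // follow redirects like a browser would
        CURLOPT_USERAGENT      => $userAgent,           // the User-Agent string from your browser
        CURLOPT_HTTPHEADER     => browserLikeHeaders(), // the remaining copied headers
        CURLOPT_ENCODING       => '',                   // accept any encoding libcurl supports
        CURLOPT_TIMEOUT        => 10,
    ]);
    return curl_exec($ch);
}
```

You would call it as fetchLikeBrowser($url, $userAgentCopiedFromBrowser) and check the result against false before using it.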

Forget curl. Think about it from the perspective of an HTTP request. All the server sees is that. If your curl request contains something (user agent header for instance) that the server can use to filter out requests, it can use this to reject those requests.
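To make the server's point of view concrete, here is a minimal sketch of the kind of check a site could run. The function name and the blocklist are my own assumptions for illustration; real sites use their own (usually private) rules.

```php
<?php
// Hypothetical filter: returns true when the User-Agent looks scripted.
function looksLikeScriptedClient(?string $userAgent): bool
{
    // Assumption: a missing or empty User-Agent is treated as suspicious.
    if ($userAgent === null || trim($userAgent) === '') {
        return true;
    }
    // Assumed blocklist of substrings, matched case-insensitively.
    foreach (['curl/', 'python-requests', 'wget/'] as $needle) {
        if (stripos($userAgent, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// In a server-side script this could be used as:
// if (looksLikeScriptedClient($_SERVER['HTTP_USER_AGENT'] ?? null)) {
//     http_response_code(403);
//     exit;
// }
```

Anything else the request carries (Referer, cookies, the client IP) could be filtered the same way.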

Related

PHP fopen not working on one particular domain

I'm trying to download file from remote url with fopen function.
The problem is that the function returns false for one website that I need. For other domains the function works fine.
How could that be? Is there some option in PHP that I need to set? Or can that website protect the file from certain kinds of access (the file is available from a browser)?
There are a number of checks the server side can do to prevent misuse of its service. One example is a check of the HTTP Referer header, which indicates that your request was made by a browser navigating from a link to the object.
You can simulate all that if you want to, but for that you have to find out exactly what the difference is between your request and one the browser successfully makes. Two things to do for that:
Find out the exact error message you receive back. The easiest way to do that is to use PHP's cURL extension instead of fopen() for your request; it allows you to dump everything you get back. There might be valuable information in the reply, such as a reason.
Monitor both requests with a network sniffer, for example tcpdump or Wireshark. Comparing the two requests tells you the exact difference. That, again, is the information you need to precisely rebuild the browser's request in your script.
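For the first step, a sketch along these lines dumps the status code, the response headers and the body. The names debugFetch() and splitRawResponse() are made up for this example.

```php
<?php
// Splits a raw response (headers followed by body, as returned when
// CURLOPT_HEADER is enabled) at the header size reported by cURL.
function splitRawResponse(string $raw, int $headerSize): array
{
    return [
        'headers' => rtrim(substr($raw, 0, $headerSize)),
        'body'    => (string) substr($raw, $headerSize),
    ];
}

// Fetches a URL and returns either the transport error or the full reply.
function debugFetch(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HEADER         => true, // include response headers in the output
        CURLOPT_TIMEOUT        => 10,
    ]);
    $raw = curl_exec($ch);
    if ($raw === false) {
        return ['error' => curl_error($ch)];
    }
    $info = curl_getinfo($ch);
    return ['status' => $info['http_code']]
        + splitRawResponse($raw, $info['header_size']);
}
```

Running print_r(debugFetch($url)) against the failing domain should show whether you get a 403, a redirect, or an error page with an explanation.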
On some shared hosting or some VPSes, fopen on remote URLs does not work or is disabled inside PHP (for example via the allow_url_fopen setting). Try using cURL to get the content. If that does not work, the last resort (only if you are sending information via GET and not receiving data) is to use an <img> tag and send the request via its "src" attribute. That works ONLY if you are sending information; if you need to receive something, you need to use either AJAX or cURL.

Selenium 2 WebDriver: How to verify that an image request has been received and fulfilled?

Is there a good way to check that a GET request for an image, something like https://api.google.com/v1/__x.gif, has been received & fulfilled in Selenium 2 WebDriver with php?
Initially I thought that I could make an XHR request and alert() the responseText, using assertEquals to compare my expected string to the actual output. Quickly realized this wasn't going to work, since I wanted to see the page's network requests that I'm testing.
After more research, I found two very different possibilities:
First being captureNetworkTraffic (pending response from Sauce Labs support to see if this is possible):
The second option (which I don't completely understand) would be setting up a proxy server.
I'm new to stackoverflow and a beginner when it comes to server requests. Thank you for the help in advance!
Option 1 and 2 are the same; you use a proxy to capture network traffic.
When creating a driver instance in Webdriver, you have the ability to set a proxy. This is a server and port through which the browser will direct all network traffic. Proxies can do many things such as creating mock responses, manipulating requests etc, but in your case, you want the proxy to record the request made, forward on to the required server, record the response, and return response back to browser.
If you use a proxy like Browsermob, you can interrogate the requests during the test run, as the proxy has an API (e.g. get me the latest request the browser made and assert it was a POST).
There appears to be a PHP library to wrap interaction with the Browsermob instance https://packagist.org/packages/chartjes/php-browsermob-proxy
So, in your test;
Start proxy
Create driver using the proxy setting
Go to required page
Assert that request was made in Browsermob
Of course, the other, simpler approach could be to get the image src URL from the HTML via Selenium, then make a GET request in the test using an HTTP client. If it returns an image, then you can say that the URL from the img tag works, and that may be good enough for your testing.
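A minimal sketch of that simpler check using PHP's cURL extension as the HTTP client. The function names are hypothetical, and $src is assumed to be the URL you already read from the img tag through Selenium.

```php
<?php
// Decides whether a status code and Content-Type describe a served image.
function looksLikeImageResponse(int $status, ?string $contentType): bool
{
    return $status === 200
        && $contentType !== null
        && stripos($contentType, 'image/') === 0;
}

// Requests the URL and reports whether the server answered with an image.
function imageUrlWorks(string $src): bool
{
    $ch = curl_init($src);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 10,
    ]);
    if (curl_exec($ch) === false) {
        return false; // transport-level failure (DNS, timeout, ...)
    }
    return looksLikeImageResponse(
        (int) curl_getinfo($ch, CURLINFO_HTTP_CODE),
        curl_getinfo($ch, CURLINFO_CONTENT_TYPE) ?: null
    );
}
```

In the test you could then assert that imageUrlWorks($src) is true. Note that this only shows the URL is reachable, not that the browser itself requested it; for that you still need the proxy.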

RESTful PHP: how does it work from client side to server side?

I'm getting confused by the tutorials/examples on RESTful PHP. Most of them use frameworks (such as Slim) and end up as APIs, which confuses me more. I would like to avoid all frameworks and API building at first, until I thoroughly understand how to create a simple RESTful PHP site.
I understand that,
REST, at its core, is really just about using the right HTTP verb for
the job. GET, POST, PUT, DELETE all have meanings, and if something is
described as being RESTful, all it really means is that the site/app
in question adheres to those meanings.
And from the php server side, I understand this is how I can detect the REST request types,
$_SERVER['REQUEST_METHOD']
But my main problem is how to send these REST types (GET, POST, PUT, DELETE) via URLs to the server?
For instance from this tutorial,
A PUT request is used when you wish to create or update the resource identified by the URL. For example,
http://restfulphp.com/clients/robin
Then,
DELETE should perform the contrary of PUT; it should be used when you want to delete the resource identified by the URL of the request.
http://restfulphp.com/clients/anne
http://restfulphp.com/clients/robin and http://restfulphp.com/clients/anne are the same clean URL pattern. How can I know that the former is meant for PUT and the latter is meant for DELETE? Or where should I set in the form/ html to differentiate them?
And it gets complicated when curl comes in - what is it to do with a REST website?
From that tutorial above,
Once you have cURL installed, type:
curl -v google.com
Where should I type??
How curl is going to help me to DELETE or to PUT? How $_SERVER['REQUEST_METHOD'] is going to detect REST type from curl?
curl -v -X DELETE /clients/anne
and
curl -v -X PUT -d "some text"
The questions I asked may sound stupid but it would great if someone can help me to understand these.
EDIT:
Argg PHP cURL - another thing that confuses me further and deeper! Why do I need it if I am going to send REST request types via XMLHttpRequest object? It seems that it is meant for communicating with other API providers which I want to avoid at this stage.
And if I can send REST request types with PHP cURL within my local website, how and where should I place these lines from this Q&A,
$ch = curl_init();
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
curl_setopt($ch, CURLOPT_COOKIEFILE,$cookieFile);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_POST, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 4);
If I use PHP cURL then what happens to $_SERVER['REQUEST_METHOD'] at the server side?
http://restfulphp.com/clients/robin and http://restfulphp.com/clients/anne are the same clean URL pattern. How can I know that the former is meant for PUT and the latter is meant for DELETE?
You can't tell from the URL. The client has to couple the URL with an HTTP verb. Part of the point of REST is that you might want to do different things to the same resource (there isn't much point in PUTting something if you will never want to GET it in the future).
Or where should I set in the form/ html to differentiate them?
An HTML form can only trigger POST and GET requests. If you want to use a different HTTP verb from a webpage then you need to use JavaScript and the XMLHttpRequest object.
And it gets complicated when curl comes in - what is it to do with a REST website?
curl is a library (and a command line client that uses that library) for making HTTP requests with.
When you are developing a REST interface, you can use the command line client to test your server. You can also use it as an actual client, but you would probably be better off writing a dedicated tool so you have a nice command line interface instead of one that forces you to deal with the HTTP side of things whenever you use it.
As a library, you can use it to write clients for your REST interface.
Where should I type??
In a command line shell (such as bash or Windows PowerShell) in a terminal emulator (such as XTerm or the Windows console).
How curl is going to help me to DELETE or to PUT?
It lets you specify the HTTP verb you send. You can see that in the examples you put after your question.
How $_SERVER['REQUEST_METHOD'] is going to detect REST type from curl?
It tells you which verb was used to make the request.
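A small sketch of reading it on the server. describeRequest() is a made-up helper that takes the $_SERVER array as a parameter so you can try it without a web server.

```php
<?php
// Returns e.g. "DELETE /clients/anne" for the request described by $server.
function describeRequest(array $server): string
{
    // The verb the client used, whether that client was curl, a browser or PHP.
    $method = strtoupper($server['REQUEST_METHOD'] ?? 'GET');
    // The path part of the requested URL, without any query string.
    $path = parse_url($server['REQUEST_URI'] ?? '/', PHP_URL_PATH);
    return "$method $path";
}

// In a real script: echo describeRequest($_SERVER);
```

With the curl -v -X DELETE example from the question, this would report the DELETE verb together with /clients/anne.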
Why do I need PHP cURL if I am going to send REST request types via XMLHttpRequest object?
You don't. You can use PHP if you want to make requests from your server. You can use XMLHttpRequest if you want to make requests from your browser.
It seems that it is meant for communicating with other API providers which I want to avoid at this stage.
You can use it to communicate between a customer facing web server that you control and a backend system with a (non-customer facing) HTTP interface that you also control.
And if I can send REST request types with PHP cURL within my local website, how and where should I place these lines
At the point in your server side code (or your other PHP client) that you want to make the request to the REST service.
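For example, a sketch like the following could sit at that point. normalizeVerb() and sendRestRequest() are hypothetical names, and the list of accepted verbs is just the four from the question.

```php
<?php
// Upper-cases the verb and rejects anything outside the four REST verbs.
function normalizeVerb(string $method): string
{
    $verb = strtoupper(trim($method));
    if (!in_array($verb, ['GET', 'POST', 'PUT', 'DELETE'], true)) {
        throw new InvalidArgumentException("Unsupported HTTP verb: $method");
    }
    return $verb;
}

// Sends a request with the given verb and optional body via PHP cURL.
function sendRestRequest(string $method, string $url, ?string $body = null): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_CUSTOMREQUEST  => normalizeVerb($method), // the verb the remote server will see
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 4,
    ]);
    if ($body !== null) {
        curl_setopt($ch, CURLOPT_POSTFIELDS, $body); // request body, e.g. for PUT
    }
    $response = curl_exec($ch);
    return [
        'status' => (int) curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'body'   => $response === false ? null : $response,
    ];
}
```

Calling sendRestRequest('DELETE', 'http://restfulphp.com/clients/anne') would be the PHP counterpart of the curl -X DELETE command line.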
If I use PHP cURL then what happens to $_SERVER['REQUEST_METHOD'] at the server side?
Then, assuming that you are using server side PHP (and not, for instance, command line PHP):
The browser will make a request to your server
On your server, $_SERVER['REQUEST_METHOD'] will be the method the browser made the request with
The PHP running on the server will make a request to some other server (or possibly another part of the same server)
On that second server, $_SERVER['REQUEST_METHOD'] will be the method the PHP made the request with
The server for the second request will respond to the PHP client script
The server for the first request will respond to the browser
First of all: you are right about the POST, GET, DELETE, PUT methods. cURL (http://nl1.php.net/curl) is important if you need to call another web service from your code (you can execute POSTs, PUTs, DELETEs, GETs). If you just want to test your web services you could use RESTClient (https://addons.mozilla.org/nl/firefox/addon/restclient/). With this tool you can try out these methods easily and see the result.
Of course, in your HTML you have to set the form method to call the right web service (keep in mind that HTML forms themselves only support GET and POST):
<form method="PUT_YOUR_METHOD" action="WS_URL">
It also possible to call these methods from jQuery, javascript, Java(Android), Objective C(iOS) and so on.
But my main problem is how to send these REST types (GET, POST, PUT,
DELETE) via URLs to the server?
How you send the request type depends on what you're using. For example, as you mentioned, curl can use:
curl -v -X DELETE /clients/anne
To send a delete request. $_SERVER['REQUEST_METHOD'] on the server will then contain the appropriate request type and the script can decide what to do (in this case delete the resource provided the user has access).
In jQuery you can use:
$.ajax({
    url: '/clients/anne',
    type: 'DELETE',
    success: function(result) {
        // Success!
    }
});
Where should I type??
In that example the curl command would be typed into the command line. PHP also has a curl library that you could use but the syntax is different.
Or where should I set in the form/ html to differentiate them?
In theory you would place, say, "PUT" in the form's method attribute. However see this question.
You're asking two questions.
"(...) the same clean URL pattern. How can I know that the former is meant for PUT and the latter is meant
for DELETE?"
It isn't. The idea of REST:
The information in your application is structured as a collection of resources.
Each resource is addressable: it has a URL.
All operations on the information are expressed in terms of elementary CRUD operations on resources.
The HTTP verbs are used to implement the different operations.
So you don't use URLs to express operations; you use them to express
the objects being operated on.
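To illustrate, here is a toy sketch in which the URL path only names the resource and the verb selects the operation. handleRest() and the in-memory $store array are assumptions for the example; a real service would use a database and proper responses.

```php
<?php
// $store maps a resource path (e.g. "/clients/robin") to its stored content.
// Returns [HTTP status code, content or null].
function handleRest(array &$store, string $method, string $path, ?string $body = null): array
{
    $exists = array_key_exists($path, $store);
    switch (strtoupper($method)) {
        case 'GET':    // read the resource
            return $exists ? [200, $store[$path]] : [404, null];
        case 'PUT':    // create or update the resource at this URL
            $store[$path] = (string) $body;
            return [$exists ? 200 : 201, $store[$path]];
        case 'DELETE': // remove the resource
            if (!$exists) {
                return [404, null];
            }
            unset($store[$path]);
            return [204, null];
        default:       // verb not supported in this sketch
            return [405, null];
    }
}
```

The same path, /clients/robin, can be PUT, fetched with GET and later removed with DELETE; only the verb changes.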
"Where should I type??''
Into a command line interpreter such as CMD (on Windows) or bash (on Linux).
You probably need to install curl first.

How to unblock cURL on ANY_XYZ_WEBSITE.com?

I have a website that was grabbing data from "ANY_XYZ_WEBSITE.com."
I was using cURL to grab data automatically and then modifying it for my needs. But recently "ANY_XYZ_WEBSITE.com" has blocked all cURL requests and I am unable to grab data from their website. Is there any other way to get the data?
I am using PHP on IIS.
In all probability, they are blocking you based on the User-Agent header.
So --
curl_setopt($ch, CURLOPT_USERAGENT, "SomethingElse/1.0");
before firing the request off.
If you want to masquerade as a real browser, http://www.user-agents.org/ is a comprehensive resource of different user-agents actually in current use.
But I'm seconding Polynomial's sentiment -- there's probably a reason for the site blocking cURL, so just don't be evil while requesting data from them.
You can try changing the agent string. CURLOPT_USERAGENT
Never hit the same domain in parallel, or more than once within an interval of at least three seconds. If you can wait, try to keep the interval at ten seconds or more.
Make sure your crawler reads and follows the robots.txt file before crawling a domain.
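The first rule can be sketched roughly as below. secondsToWait() and politeFetch() are hypothetical helpers, the ten-second default is just the figure suggested above, and $fetch is whatever function you already use to download a URL.

```php
<?php
// Seconds still to wait before $host may be hit again, given the time of
// the last hit per host and the current time (both in seconds).
function secondsToWait(array $lastHit, string $host, float $now, float $minInterval = 10.0): float
{
    if (!isset($lastHit[$host])) {
        return 0.0; // never hit before, no need to wait
    }
    return max(0.0, $minInterval - ($now - $lastHit[$host]));
}

// Fetches the URLs one at a time, sleeping so that no host is hit more
// often than once per $minInterval seconds.
function politeFetch(array $urls, callable $fetch, float $minInterval = 10.0): array
{
    $lastHit = [];
    $results = [];
    foreach ($urls as $url) {
        $host = (string) parse_url($url, PHP_URL_HOST);
        $wait = secondsToWait($lastHit, $host, microtime(true), $minInterval);
        if ($wait > 0) {
            usleep((int) round($wait * 1000000));
        }
        $results[$url] = $fetch($url);
        $lastHit[$host] = microtime(true);
    }
    return $results;
}
```

Because the loop is sequential, it also never hits a domain in parallel. It does not read robots.txt; that check would have to be added separately.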
P.S.: Your cURL has not been blocked; you have been blocked. And it's not a user agent problem.
What to do now?
Have patience. Wait for a while. Refresh your IP (if it is dynamic) and hit again, but this time following the two instructions above. If you are still getting blocked, you need to share your code and the website you are talking about to get a legal solution.

Observe the HTTP request made on a page and post it to a PHP script

I have a web page that contains an external SWF object that loads random content by making HTTP requests to its server. Is there any way I can implement a sort of observer for the page that stores all the HTTP requests that were made once the page was loaded?
I'll appreciate any help on the topic, I just need a point to start on.. I don't even know if this is possible.
HTTP requests from a Flash file occur on the client, so if the request goes off to a different server (not yours) your server won't even know that it's occurred. You'd need to install something on the client to track it.
