I want to allow users to upload urls for images (a bit like on this site in the markup). The only difference is that I'm going to store these in my database. I want to ensure nothing too malicious can be done.
After looking around, I've seen cURL recommended for checking the Content-Type, as getimagesize() apparently downloads the full image, which not only has security implications (apparently; I'm really not an expert) but will also be slow.
So far my code is looking like this:
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
// don't download content
curl_setopt($curl, CURLOPT_NOBODY, 1);
curl_setopt($curl, CURLOPT_FAILONERROR, 1);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)');
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);

if (curl_exec($curl) === FALSE) {
    curl_close($curl);
    return false;
}

$contentType = curl_getinfo($curl, CURLINFO_CONTENT_TYPE); // get content type
curl_close($curl);

if (strpos($contentType, 'image') !== false) {
    // valid image
}
However, I'm not entirely sure if this is the correct way to go about this. I've also seen a lot about sanitising the URLs, but I'm not entirely sure what that would entail.
Any help on securing this part of my web app and preparing it for storage would be highly appreciated.
As a quick aside, I'm hoping to do the same for YouTube links so if you have any recommendations for that, I'd appreciate it - though I've not begun research into this yet.
Regards,
Mike
You can also escape special chars and what not using the function below
<?php
$url = htmlspecialchars(addslashes($_POST["inputName"]));
?>
This will add slashes and turn characters such as & into HTML entities (https://www.w3schools.com/html/html_entities.asp), so you might need to reverse this process when reading the data back from the database.
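To reverse it when reading the data back out, something like this should work (just a sketch; $row["inputName"] is a placeholder for however you fetch the stored value):

<?php
// Undo the escaping in the opposite order it was applied:
// htmlspecialchars() was applied last, so decode it first, then strip the slashes.
$stored = $row["inputName"]; // placeholder for the value read from the database
$original = stripslashes(htmlspecialchars_decode($stored));
?>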
If you're going to allow users to upload a file by URL, start by downloading the file (using cURL, or any other tool you like). Don't bother making an initial request to check the Content-Type -- what ultimately matters is the content of the file, not the headers it happens to be served with by the original server.
Once you've downloaded the image, perform any further checks on the local file. Make sure it is an appropriate format, and is not too large, then convert it to your preferred format.
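For example, here is a rough sketch of that "download first, then validate locally" flow (the 5 MB cap and the allowed formats are illustrative choices only, not a complete implementation):

<?php
// Rough sketch: fetch the whole file, then validate the actual bytes locally.
$maxBytes = 5 * 1024 * 1024; // illustrative 5 MB cap

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_MAXREDIRS, 3);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$data = curl_exec($ch);
curl_close($ch);

$valid = false;
if ($data !== false && strlen($data) <= $maxBytes) {
    // Write to a temp file and let PHP inspect the actual bytes.
    $tmp = tempnam(sys_get_temp_dir(), 'img');
    file_put_contents($tmp, $data);

    $info = getimagesize($tmp); // false if the bytes are not a real image
    if ($info !== false && in_array($info[2], array(IMAGETYPE_JPEG, IMAGETYPE_PNG, IMAGETYPE_GIF), true)) {
        $valid = true;
        // from here, re-encode with GD/Imagick into your preferred format
    }
    unlink($tmp);
}
?>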
Other notes:
Don't use a fake User-Agent. Use an accurate one which represents what web site is responsible for the request, e.g. "MySite/1.0 http://example.com/". (Other webmasters will thank you for this!)
It's a good idea to do a DNS lookup on the domain before requesting it, to protect your server from DNS rebinding attacks. Make sure that the resulting IP does not point to your private network, or to localhost, before you make an HTTP request.
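A sketch of that check might look like this (the filter flags used here are the standard PHP ones; the overall flow is just an illustration):

<?php
// Resolve the host first and refuse anything on a private/loopback address.
$host = parse_url($url, PHP_URL_HOST);
$ip   = gethostbyname($host); // returns the unmodified string if resolution fails

// FILTER_FLAG_NO_PRIV_RANGE rejects 10/8, 172.16/12, 192.168/16;
// FILTER_FLAG_NO_RES_RANGE rejects 0/8, 127/8, 169.254/16, 240/4, etc.
$publicIp = filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE);

$safeToFetch = ($publicIp !== false);
// Only proceed with the cURL request if $safeToFetch is true. To be thorough about
// rebinding, also pin cURL to this exact IP (e.g. via CURLOPT_RESOLVE) so the name
// is not resolved a second time with a different answer.
?>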
Related
I have a HTML/PHP/JS page that I use for an automation process.
On load, it performs a cURL request like this:
function get_data($url) {
    $curl = curl_init();
    $timeout = 5;
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    $data = curl_exec($curl);
    curl_close($curl);
    return $data;
}
$html = get_data($url);
Then it uses DOMDocument to retrieve a specific element on the remote page. My PHP code handles it, makes some operations, then stores it in a variable.
My purpose, as you can guess, is to simulate a "normal" connection. To do so, I used the Tamper tool to see what requests are performed when I physically interact with the remote page. The HTTP headers consist of the UA, cookies (among them a session cookie), and so on. The only POST variable I have to send back is my PHP variable (the one which was calculated and stored in a PHP var). I also tested the process with Chrome, which allows me to copy/paste requests as cURL.
My question is simple: is there a way to handle HTTP requests/cookies in a simple way? Or do I have to retrieve them, parse them, store them and send them back one by one?
Indeed, a request and a response are slightly different, but in this case they share many things in common. So I wonder if there is a way to explore the remote page as a browser would do, and interact with it, using for instance an extra PHP library.
Or maybe I'm doing it the wrong way and should use another language (Perl, ...)?
The code shown above does not handle requests and cookies; I've tried, but it got a bit too tricky to handle, hence this question :) I'm not lazy, but I wonder if there is a simpler way to achieve my goal.
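To make it concrete, is something like the following the right direction? (An untested sketch on my side; the URLs and the POST field name are placeholders, and I'm just using cURL's standard cookie-jar options.)

$jar = tempnam(sys_get_temp_dir(), 'cookies'); // cookie jar shared by both requests

$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, $jar);  // save cookies the server sets
curl_setopt($ch, CURLOPT_COOKIEFILE, $jar); // send them back on later requests

// 1. GET the remote page (the session cookie ends up in the jar)
curl_setopt($ch, CURLOPT_URL, 'http://example.com/remote-page');
$html = curl_exec($ch);

// ... parse $html with DOMDocument and compute the value to send back ...
$myValue = 'computed-from-the-dom'; // placeholder

// 2. POST the computed value back with the same handle, so the same cookies are reused
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, array('myField' => $myValue)); // field name is a placeholder
$response = curl_exec($ch);

curl_close($ch);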
Thanks for your advice, and sorry for my English.
I am writing a code in PHP which fetches the content in a particular format from around 20 websites.
It is working normally for all the websites except one. Now, here is the issue.
I am using file_get_contents() to fetch images from the website and save them on my server. The image is present on the remote server and is accessible via a browser, but I am getting a 404 response when fetching it via code.
I am unable to understand the issue, as this method works perfectly for the other websites.
Does it have something to do with the headers being sent? Any help will be greatly appreciated.
The answer is probably: yes...
They're checking user-agents, I suppose.
And those are sent in your headers. You can fake your user-agent. Don't use file_get_contents() though, as that one doesn't allow faking your user-agent.
Look into curl.
Edit 1
Barmar's link shows how to use file_get_contents() with a different user-agent at the same time. It's worthwhile looking into...
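For reference, that approach looks roughly like this (a sketch; the UA string is just an example):

// file_get_contents() can send a custom User-Agent via a stream context.
$context = stream_context_create(array(
    'http' => array(
        'user_agent' => 'Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0',
    ),
));
$image = file_get_contents($url, false, $context);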
Edit 2
But it could also be about them checking the referrer... If that is the case you really need to use cURL to be able to set the referrer.
Edit 3
Having seen the URL now, and looking at the 404 error you get (not a 50x), I advise you to check whether the URL is being escaped and parsed correctly. I see that the URL contains spaces, and two slashes after the domain name. Check that spaces are escaped as %20, and whether the double slash shouldn't be collapsed to a single slash.
So
http://celebslam.celebuzz.com//bfm_gallery/2014/03/Lindsay Lohan 2 Broke Girls/gallery_enlarged/gallery_enlarged-lindsay-lohan-2-broke-girls-01.jpg
Should become
http://celebslam.celebuzz.com/bfm_gallery/2014/03/Lindsay%20Lohan%202%20Broke%20Girls/gallery_enlarged/gallery_enlarged-lindsay-lohan-2-broke-girls-01.jpg
And notice, the server is CaSe-SeNsItIvE !
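A sketch of cleaning such a URL up in PHP before requesting it (it encodes only the path portion and collapses the duplicate slash; adjust to taste):

$parts = parse_url($url);

// Collapse duplicate slashes in the path, then percent-encode each segment
// (rawurlencode() would also encode the slashes themselves, so encode per segment).
$path = preg_replace('#/+#', '/', $parts['path']);
$path = implode('/', array_map('rawurlencode', explode('/', $path)));

$clean = $parts['scheme'] . '://' . $parts['host'] . $path;
// e.g. spaces become %20 and "//bfm_gallery" becomes "/bfm_gallery"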
Yep, first of all, check whether that site checks the referrer on image access. For example, try opening the image directly in your browser.
It may also check the User-Agent field, among other things.
Fetching the file with cURL will probably help (code examples are easy to find, or I'll give you a simple class).
P.S. Just curious: can you give some example image URLs to try?
Probably the referrer or the user agent. This function includes both:
function file_get_contents_custom($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_REFERER, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; U; Linux; i686; en-US; rv:1.6) Gecko Debian/1.6-7');
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
Update:
The image you linked works fine for me using file_get_contents(). It might be that the server has some sort of DDoS protection. How many requests per second are you making, on average?
I have the same code running on multiple sites/servers. 2 days ago the code started returning http_code = 0 and the error message "empty reply from server" on one of the servers.
Can anyone shed any light as to why a particular server would be working one day, then not working the next? I have submitted a ticket to the ISP explaining the issue but they cannot seem to find what is wrong (yet).
I guess the question really is, what would/could change on a server to stop this from working?
What is interesting, though, is that the URL I am referencing doesn't get touched on the server returning the error. If I change the URL to point to something that doesn't exist, the same error is returned. So it appears that cURL POST requests as a whole are being rejected by the server. I currently have other cURL scripts hitting these problem sites that are still working, but they do not have POST options in them.
The issue is definitely related to cURL POST requests on this server, and they are being rejected pretty much immediately.
On the server in question I have 15+ separate accounts, and every one of them returns the same result, so I don't think it's anything I have changed, as I know I haven't made any wholesale changes to ALL the sites at the time this issue arose. Of the 6 other sites I have hosted elsewhere, everything is still working fine with exactly the same code.
I have tried various combinations/changes to options from posts I have read, but nothing has really made a difference: the working sites still work and the non-working sites still don't.
function sendWSRequest($url, $xml) {
    // $headers[] = 'Content-Type: application/xml; charset=utf-8';
    $headers[] = 'Content-Type: text/xml; charset=utf-8';
    $headers[] = 'Content-Length: ' . strlen($xml);

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_HEADER, true);
    // curl_setopt($ch, CURLINFO_HEADER_OUT, false);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $xml);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    // curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
    $result = curl_exec($ch);
    if ($result === false) {
        print 'error with curl - ' . curl_error($ch) . '<br />';
    }
    $info = curl_getinfo($ch);
    curl_close($ch);
    return $result;
}
Any help would be greatly appreciated.
EDIT
To summarise based on further investigations, when the script errors, nothing registers in the server access logs. So it appears that CURL requests containing POST options are being rejected before access is granted/logged...
Cheers
Greg J
I know this is an old thread, but I found a solution that may save someone else a headache:
I just began encountering this exact problem with a web site hosted at GoDaddy which was working until recently. To investigate the problem I created an HTML page with a form containing the same fields being submitted in the POST data via cURL.
The browser-submitted HTML form worked while the cURL POST resulted in the Empty reply from server error. So I examined the difference between the headers submitted by the browser and those submitted by cURL using the PHP apache_request_headers() function on my development system where both the cURL and browser submissions worked.
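The comparison itself can be done with a throwaway script that simply dumps whatever headers each client sends (a sketch; apache_request_headers() needs Apache or a compatible SAPI):

<?php
// dump-headers.php: request this once from the browser form and once from cURL,
// then diff the two outputs to see which headers differ.
header('Content-Type: text/plain');
foreach (apache_request_headers() as $name => $value) {
    echo $name . ': ' . $value . "\n";
}
?>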
As soon as I added the "User-Agent" header submitted by my browser to the cURL POST, the problem site worked as expected instead of returning an empty reply:
CURLOPT_HTTPHEADER =>
array("User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0")
I did not experiment with other/simpler User-Agent headers since this quick fix solved my problem.
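In the code from the question, this just amounts to appending the header to the $headers array that is already being built, something like this:

$headers[] = 'Content-Type: text/xml; charset=utf-8';
$headers[] = 'Content-Length: ' . strlen($xml);
$headers[] = 'User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0';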
According to the PHP manual, the POST data should be urlencoded:
CURLOPT_POSTFIELDS: The full data to post in a HTTP "POST" operation.
[...] This parameter can either be passed as a urlencoded string like 'para1=val1&para2=val2&...' or as an array with the field name as key and field data as value. If value is an array, the Content-Type header will be set to multipart/form-data. As of PHP 5.2.0, value must be an array if files are passed to this option with the @ prefix. As of PHP 5.5.0, the @ prefix is deprecated and files can be sent using CURLFile.
So you might try with
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, 'xml=' . urlencode($xml));
and see what happens. Or, in any case, start with an empty or very simple field to see if it at least arrives at the destination server.
Update
I've checked this setup on a test machine and it works. The problem is then likely not on the PHP or cURL side at all, at this point. Can you request a list of software/hardware updates on that machine and network in the last few days?
Otherwise, I'd try to capture outgoing traffic so as to determine whether the request leaves the server (and the problem is in between, e.g. a misconfigured firewall: hence my inclusion of "hardware" in the change list), or doesn't leave the server at all. In this latter case the culprits could be:
updates to cURL library
updates to PHP cURL module and/or PHP binaries
updates to "software" firewall rules
updates to ancillary network libraries (unlikely; they should be HTTP agnostic and not differentiate a POST from, say, a GET or HEAD)
OK, as it turns out, a rather reluctant host recompiled Apache2 and PHP, which resolved the issue.
The host claims (in their opening statement to my support ticket) that no updates to either Apache2 or PHP had been performed around the time the issue occurred.
The behavior was such that the server wasn't even acknowledging a cURL request that contained the POST options; the target URL was never reached.
Thank you so much to all who provided their advice. Particularly Isemi who has gone to great lengths to find a resolution.
I'm trying to write a simple PHP script which automatically sets up new etherpads (see http://etherpad.com/).
They don't have an API (yet) for creating new pads so I'm trying to figure if I can do things another way.
After playing around some, I found that if you append a random string to etherpad.com for a not-yet-created pad, it comes back with a form asking if you want to create a new etherpad at that address. If you submit that form, a new pad will be created at that URL.
My thought then was I could just create a PHP script using CURL that would duplicate that form and trick etherpad into creating a new pad at whatever URL I give it. I wrote the script but so far I can't get it working. Can someone tell me what I'm doing wrong?
First, here's the HTML form on the etherpad creation page:
<p><tt id="padurl">http://etherpad.com/lsdjfsljfa-fdj-lsdf</tt></p>
<br/>
<p>There is no EtherPad document here. Would you like to create one?</p>
<input type="hidden" value="lsdjfsljfa-fdj-lsdf" name="padId"/>
<input type="submit" value="Create Pad" id="createPad"/>
Then here's my code which tries to submit the form using CURL
$ch = curl_init();

// set POST variables
$url = "http://etherpad.com/ep/pad/create?padId=ldjfal-djfa-ldkfjal";
$fields = array(
    'padId' => urlencode("ldjfal-djfa-ldkfjal"),
);
$useragent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)";

// set user agent
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);

// url-ify the data for the POST
$fields_string = '';
foreach ($fields as $key => $value) {
    $fields_string .= $key . '=' . $value;
}
print_r($fields_string);

// set the url, number of POST vars, POST data
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, count($fields));
curl_setopt($ch, CURLOPT_POSTFIELDS, $fields_string);

// execute post
$result = curl_exec($ch);
print_r($result);

// close connection
curl_close($ch);
When I run the script, PHP reports back that everything executed correctly but etherpad doesn't create my pad. Any clues what's going on?
I have not investigated this specific site but I guess there are some important headers which are missing. Here is a very general approach that is applicable for nearly any website:
Use a network sniffer such as Wireshark to capture all connectons. Then compare the sent POST fields with yours.
An even easier way is to use Netcat. Just save the page to disk, change the form-URL to http://localhost:3333/ and run
$ nc -l -p 3333
Now open the local HTML file and fill in the fields appropriately. Immediately you will see all headers that would have been transmitted to the host.
(There are also extensions for Mozilla Firefox but in general they just slow down the browser without providing much benefit.)
Also read what I have posted on To auto fill a text area using php curl, as it might help you with your implementation in PHP.
By the way, you are sending the parameter "padId" via GET and POST. That is not necessary. Check what the Etherpad-form actually uses and stick with it.
My guess is that you're missing the cookies and/or the referrer. It may be checking the referrer to ensure people aren't creating pads without confirmation.
Wireshark will help, but add those to your cURL request and see if it works.
Here's the answer a friend helped me come up with:
They're apparently doing some cookie validation, that's why your script isn't working. You can find this out by loading the new pad creation prompt page, clearing your cookies, and then reloading the page. It won't work. Tricky, but effective for most casual bots.

Here's a script that gets around the limitation. Just insert your desired $padId and away you go.
<?php
$padId = 'asdfjklsjfgadslkjflskj';

$ch = curl_init();

# for debugging
curl_setopt($ch, CURLOPT_HEADER, true);

# parse cookies and follow all redirects
curl_setopt($ch, CURLOPT_COOKIEFILE, '/dev/null');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

# first, fetch the prompt page to pick up a cookie
curl_setopt($ch, CURLOPT_URL, 'http://etherpad.com/' . urlencode($padId));
$result = curl_exec($ch);
echo $result;

# next, post to actually create the etherpad
curl_setopt($ch, CURLOPT_URL, 'http://etherpad.com/ep/pad/create');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, 'padId=' . urlencode($padId));
$result = curl_exec($ch);
echo $result;

curl_close($ch);
To create a file directly from HTML or TEXT
Use the setText or setHTML API endpoint. http://etherpad.org/doc/v1.5.0/#index_sethtml_padid_html
To easily do this use the Etherpad PHP Client https://github.com/TomNomNom/etherpad-lite-client
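A rough sketch of what that looks like (I'm assuming the client's method names mirror the HTTP API, i.e. createPad() and setHTML(); check the library's README for the exact class name and signatures):

<?php
// Sketch only: class and method names are assumed to mirror the Etherpad HTTP API.
require_once 'etherpad-lite-client.php';

$client = new EtherpadLiteClient('YOUR_API_KEY', 'http://your-etherpad-host:9001/api');

$padId = 'my-new-pad';
$client->createPad($padId, 'initial text');                      // plain text
$client->setHTML($padId, '<p>Hello <strong>world</strong></p>'); // or HTML
?>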
To post from a file
This feature is provided by an Etherpad plugin. To enable it...
Install the Etherpad ep_post_data plugin by typing npm install ep_post_data on your Etherpad instance.
At your client machine's CLI, type: curl -X POST -d @yourfile.here http://youretherpad/post
Replace yourfile.here with your file.
Replace the URL with the Etherpad instance you want to post to.
Source: http://blog.etherpad.org/2014/12/17/post-to-etherpad-with-this-simple-plugin/
What is the best way to check whether a given URL points to a valid file (i.e. does not return a 404/301/etc.)? I've got a script that will load certain .js files on a page, but I need a way to verify each URL it receives points to a valid file.
I'm still poking around the PHP manual to see which file functions (if any) will actually work with remote URLs. I'll edit my post as I find more details, but if anyone has already been down this path feel free to chime in.
Using file_get_contents() is overkill for this purpose, since the HTTP headers alone are enough to make the decision, so use cURL to send a HEAD request:
<?php
// create a new cURL resource
$ch = curl_init();

// set URL and request headers only (HEAD request, no body)
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_NOBODY, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

// perform the request and read the status code
curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

// close cURL resource, and free up system resources
curl_close($ch);

// anything other than 200 (e.g. 404, 301) means the URL is not directly valid
$valid = ($status === 200);
?>
One such way would be to request the URL and check that you get a response with a status code of 200 back. Aside from that, there's really no good way, because the server can handle the request however it likes (including giving you other status codes for files that exist, but that you don't have access to, for a number of reasons).
If your server doesn't have fopen wrappers enabled (any server with decent security won't), then you'll have to use the CURL functions.
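Conversely, if fopen wrappers are enabled, a minimal sketch with get_headers() would be enough (assuming the status line of the first response is the one you care about, which it is here since a 301 should be rejected anyway):

// get_headers() returns false on failure. When redirects are followed it returns
// the headers of every response, so $headers[0] is the original URL's status line.
$headers = @get_headers($url);
if ($headers === false) {
    $ok = false; // could not connect at all
} else {
    $parts  = explode(' ', $headers[0]); // e.g. "HTTP/1.1 200 OK"
    $status = (int) $parts[1];
    $ok     = ($status === 200);
}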