Bbcode question. This:
$text = preg_replace("#\[img\](.*)\[\/img\]#si",
"<img src=\"$1\" border=\"0\" />", $text);
works fine, but at the same time it's a big security problem, for example:
[img]http://www.domain.com/delete-account/[/img]
or
[img]http://www.domain.com/logout/[/img]
Any ideas how to control this so that only image links which end with .jpg are converted into HTML?
[img]http://www.domain.com/image.jpg[/img]
Thanks.
According to the HTTP/1.1 standard, requesting URLs with GET (the method used to fetch images) should not result in any actions, such as a logout. Therefore, you don't need to restrict yourself to URLs ending in .jpg, and in general it is a bad idea because there are other image formats, and a URL is in general unrelated to its content type.
More to the point, if requesting a URL does change state on a server vulnerable.net, this Cross-Site Request Forgery vulnerability can be exploited anyway by setting up a custom server that 302-redirects http://evil.com/img.jpg to http://vulnerable.net/logout.
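For illustration, the attacker's img.jpg endpoint would only need to do something like this (a minimal sketch):
<?php
// The "image" URL simply forwards the browser to the state-changing
// endpoint on the vulnerable site.
header('Location: http://vulnerable.net/logout', true, 302);
exit;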
FYI, if you really wanted to replace only URLs ending with .jpg, you can just insert it in the group:
$text = preg_replace("#\[img\](.*\.jpg)\[\/img\]#si",
"<img src=\"$1\" border=\"0\" />", $text);
But this is not a security mechanism, and it fails if the browser (or an aggressively caching proxy, or a virus scanner, or ...) prefetches URLs. GET requests should not result in any action.
Another way to think about this problem is to check on the server side that GET and POST requests are not treated as equivalent.
A POST request can alter data on the server side; a GET request mustn't change anything. That's the HTTP protocol. An IMG tag is always a GET request, and the browser can perform this GET request without any risk, so the problem is on the server side: every action that can alter data (database, session, etc.) must check that the request is a POST. For example, your /post or /delete-account URL should return either a 403 or a 200 with a form page asking for a POST confirmation. If this is wrong in your application, then you'll have problems not only with altered IMG tags, but possibly also with 'HTML page speeders' that preload GET references, or even bots.
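As a rough illustration (the endpoint and messages are assumptions, and this alone is not a complete CSRF defence), a state-changing handler such as /delete-account could guard itself like this:
<?php
// A GET request must not change anything: refuse the action and ask for a
// POST confirmation instead of performing it.
if ($_SERVER['REQUEST_METHOD'] !== 'POST') {
    http_response_code(403);
    exit('This action requires a POST confirmation.');
}
// ... only now proceed with the account deletion, logout, etc.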
If you can find a copy of this excellent book you may find some more advanced image-link problems and filtering tricks. For example, links to foreign websites can sometimes be a problem. But that is a far more complex topic; start by handling GET and POST requests in a sensible way.
Related
I am concerned about the safety of fetching content from unknown url in PHP.
We will basically use cURL to fetch HTML content from a user-provided URL and look for Open Graph meta tags, to show the links as content cards.
Because the url is provided by the user, I am worried about the possibility of getting malicious code in the process.
I have another question: does curl_exec actually download the full file to the server? If yes, is it possible that viruses or malware could be downloaded when using cURL?
Using cURL is similar to using fopen() and fread() to fetch content from a file.
Safe or not, depends on what you're doing with the fetched content.
From your description, your server works as some kind of intermediary that extracts specific subcontent from a fetched HTML content.
Even if the fetched content contains malicious code, your server never executes it, so no harm will come to your server.
Additionally, because your server only extracts specific subcontent (Open Graph meta tags, as you say),
everything else that is not what you're looking for in the fetched content is ignored,
which means your users are automatically protected.
Thus, in my opinion, there is no need to worry.
Of course, this relies on the assumption that the content extraction process is sound.
Someone should take a look at it and confirm it.
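For illustration, a sound extraction process could parse the fetched HTML with DOMDocument and keep only the Open Graph tags, so nothing else from the page ever reaches your users (a minimal sketch; the variable names are placeholders):
// Fetch the page and keep only its Open Graph meta tags.
// $url is the user-provided URL; everything else in the HTML is discarded.
$html = @file_get_contents($url); // or the cURL equivalent
$og = [];
if ($html !== false) {
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings from broken markup
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $prop = $meta->getAttribute('property');
        if (strpos($prop, 'og:') === 0) {
            // Escape before the value is ever displayed to users.
            $og[$prop] = htmlspecialchars($meta->getAttribute('content'), ENT_QUOTES, 'UTF-8');
        }
    }
}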
does curl_exec actually download the full file to the server?
It depends on what you mean by "full file".
If you mean "the entire HTML content", then yes.
If you mean "including all the CSS and JS files that the feched HTML content may refer to", then no.
is it possible that viruses or malware could be downloaded when using cURL?
The answer is yes.
The fetched HTML content may contain malicious code; however, if you don't execute it, no harm will come to you.
Again, I'm assuming that your content extraction process is sound.
Short answer: file_get_contents is safe for retrieving data, and so is cURL. It is up to you what you do with that data.
A few guidelines:
1. Never run eval() on that data.
2. Don't save it to the database without filtering.
3. You don't even need file_get_contents or cURL.
Use: get_meta_tags
array get_meta_tags ( string $filename [, bool $use_include_path = false ] )
// Example
$tags = get_meta_tags('http://www.example.com/');
You will get all the meta tags parsed and filtered into an array.
You can use httpclient.class instead of file_get_contents or cURL, because it connects to the page through a socket. After downloading the data you can extract the metadata using preg_match.
Expanding on the answer made by Ray Radin.
Tips on precautionary measures
He is correct that if you use a sound process to search the fetched resource, there should be no problem in fetching whatever URL is provided. Some examples:
Don't store the file in a public-facing directory on your webserver; then you expose yourself to it being executed.
Don't store it in a database; this might lead to a second-order SQL injection attack.
In general, don't store anything from the resource you are requesting; if you have to, use a specific whitelist of what you are searching for.
Check the header information
Even though there is no foolproof way of validating what a given URL points to, there are ways you can make your life easier and prevent some potential issues.
For example a url might point to a large binary, large image file or something similar.
Make a HEAD request first to get the header information, then look at the Content-Type and Content-Length headers to see whether the content is a plain-text HTML file.
You should not fully trust these, however, since they can be spoofed. Doing this will still make sure that even non-malicious content won't crash your script; requesting large image files is presumably something you don't want to happen anyway.
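A rough sketch of such a check with cURL (the size threshold and checks are illustrative only):
// HEAD request: fetch only the headers, not the body.
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_exec($ch);
$type   = (string) curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
$length = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
curl_close($ch);
// Remember these headers can be spoofed; treat this as a sanity check only.
if (strpos($type, 'text/html') !== 0 || $length > 1024 * 1024) {
    exit('Refusing to fetch: not HTML or too large');
}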
Guzzle
I recommend using Guzzle to do your requests, since in my opinion it provides some functionality that should make this easier.
It is safe, but you will need to do a proper data check before using it, as you should with any data input anyway.
I have been searching everywhere to try and find a solution to this. I have recently been running scans on our websites to find any vulnerabilities to XSS and SQL Injection. Some items have been brought to my attention.
Any data which is user inputted is now validated and sanitized using filter_var().
My issue now is with XSS and persons manipulating the URL. The simple one which seems to be everywhere is:
http://www.domainname.com/script.php/">< script>alert('xss');< /script >
This then changes some of the $_SERVER variables and causes all of my relative paths to CSS, links, images, etc.. to be invalid and the page doesn't load correctly.
I clean any variables that are used within the script, but I am not sure how I get around removing this unwanted data in the URL.
Thanks in advance.
Addition:
This then causes a simple link in a template file:
<a href="anotherpage.php">Link</a>
to actually link to:
"http://www.domainname.com/script.php/">< script>alert('xss');< /script >/anotherpage.php
This then changes some of the $_SERVER variables and causes all of my relative paths to CSS, links, images, etc.. to be invalid and the page doesn't load correctly.
This sounds like you made a big mistake with your website, and you should rethink how you inject link information from the input into your output.
Filtering input alone does not help here, you need to filter the output as well.
Often it's easier, if your application receives a request that does not match the set of allowed requests, to just return a 404 error.
I am not sure how I get around removing this unwanted data in the URL.
Actually, the request has already been sent, so the URL is set. You can't "change" it. It's just the information about what was requested.
It's now your job to deal with it, not to blindly pass it around any longer, e.g. into your output (which is how your links end up broken).
Edit: You now wrote more specifically what you're concerned about. I would go along with dqhendricks here: Who cares?
If you really feel uncomfortable with the fact that a user is just using her browser and enters any URL she feels like entering, well, the technically correct response is:
400 Bad Request (ref)
And return a page with no URIs, only fully-qualified (absolute) URIs, or a redefinition of the base URI; otherwise the browser will take the URI entered into its address bar as the base URI. See Uniform Resource Identifier (URI): Generic Syntax, RFC 3986, Section 5: Reference Resolution.
first, if someone adds that crap to their url, who cares if the page doesn't load images correctly? also if the request isn't valid, why would it load any page? why are you using SERVER vars to get paths anyways?
second, you should also be escaping any user-submitted database input with the appropriate method for your particular database to avoid SQL injection; filter_var generally will not help (see the sketch after this list).
third, xss is simple to protect against: any user-submitted data that is to be displayed on any page needs to be escaped with htmlspecialchars(). This is easier to ensure if you use a view class that you can build this escaping into.
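A minimal sketch of the second and third points (the PDO connection details, table, and field names are assumptions):
// SQL injection: use a parameterised query instead of building SQL from input.
$pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$stmt = $pdo->prepare('SELECT * FROM users WHERE email = ?');
$stmt->execute([$_POST['email']]);

// XSS: escape user-submitted data at output time.
echo htmlspecialchars($_POST['comment'], ENT_QUOTES, 'UTF-8');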
To your concern about XSS: the altered URL won't get into your page unless you blindly use the related $_SERVER variables. The fact that the relative links seem to include the URL-injected script is a browser behavior that risks only breaking your relative links. Since you are not blindly using the $_SERVER variables, you don't have to worry.
To your concern about your relative paths breaking: Don't use relative paths. Reference all your resources with at least a root-of-domain path (starting with a slash) and this sort of URL corruption will not break your site in the way you described.
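For example (the stylesheet path is illustrative):
<!-- Relative path: breaks when extra segments are appended to the request URL -->
<link rel="stylesheet" href="css/style.css" />
<!-- Root-of-domain path: resolves the same way regardless of the request URL -->
<link rel="stylesheet" href="/css/style.css" />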
How can I prevent (unauthorized) people from reading a message on a website (e.g. by looking in the browser cache for the text/images)?
It's a PUBLIC (!) site (meaning: no logins here!)
But:
the (secret) message is only shown for a certain time.
the message might be shown only if a password is given.
Problems:
In Opera, for example, the page (= page contents/text) could be indexed by the browser and searched.
One idea was to create an image with the message ... but: images too - even when a "no cache" header is sent - could be retrieved from Firefox's cache.
Also: recreating the message from single characters rendered as images does not work (at least I think so at the moment). I tried this method, but it makes output quite slow. (Writing this, I notice that I do not need to create the images at runtime; I could create images of single letters in advance and refer to them in the HTML by pseudo-random rather than real names.)
I also had the idea to output an encoded message (ROT13) in the HTML and use a JS onload handler to decode the message immediately. Problem: if this code is in the HTML it could be recovered from the cache later on. At least, if someone searches through the (Opera) cache, that person would probably not think of entering search terms in encoded form.
Programming language is PHP.
You can't. What if someone takes a screenshot of this?
You could add the secret code to the page with javascript, after the page is loaded. You'd want to retrieve the secret code via AJAX, then write it to the page - that way, the code isn't cached in the HTML part of the source, and it isn't sitting in the javascript within the page's source code.
Content piped in with AJAX is pretty ephemeral, it won't be cached or otherwise recorded.
Since I don't know anything about your HTML or what (if any) javascript framework you might be using, I can't give you a code sample, but you should be able to work with the concept.
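On the PHP side, the endpoint serving the secret could look roughly like this (a sketch; the file name, variables, and expiry handling are assumptions):
<?php
// secret.php - returns the message only while it is still valid, and asks
// clients and proxies not to cache the response.
header('Cache-Control: no-store, no-cache, must-revalidate');
header('Pragma: no-cache');
header('Expires: 0');
header('Content-Type: text/plain; charset=UTF-8');

$secretMessage = 'the message';   // e.g. loaded from the database
$validUntil    = time() + 60;     // e.g. stored when the message was created

if (time() <= $validUntil) {
    echo $secretMessage;
} else {
    http_response_code(410);      // Gone
}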
Realistically if it is sent to the client and displayed on screen then you can not prevent the message from being saved or stored on the client machine. Whatever you do to prevent that save could still be bypassed by a simple screenshot.
If you are not concerned about the person the message is targeted at saving said message, then I think your best course of action would be to use Flash, with Flash making a call to the server to retrieve the message and display it. Another option may be to use JavaScript to perform a call (AJAX) to the server, which then sends back the message, and you alter the DOM to display it. I don't think that would be cached, but unless you use SSL it could be stored by intermediate proxies.
I am working on a PHP site that allows users to post a listing for their business related to the site's theme. This includes a single link URL, some text, and an optional URL for an image file.
Example:
<img src="http://www.somesite.com" width="40" />
ABC Business
<p>
Some text about how great abc business is...
</p>
The HTML in the text is filtered using the class from htmlpurifier.org and the content is checked for bad words, so I feel pretty good about that part.
The image file URL is always placed inside a <img src="" /> tag with a fixed width and validated to be an actual HTTP URL, so that should be Ok.
The dangerous part is the link.
Question:
How can I be sure that the link does not point to some SPAM, unsafe, or porn site (using code)?
I can check headers for 404, etc... but is there a quick and easy way to validate a site's content from a link?
EDIT:
I am using a CAPTCHA and do require registration before posting is allowed.
It's going to be very hard to determine this yourself by scraping the site URLs in question. You'll probably want to rely on some third-party API which can check for you.
http://code.google.com/apis/safebrowsing/
Check out that API, you can send it a URL and it will tell you what it thinks. This one is mainly checking for malware and phishing... not so much porn and spam. There are others that do the same thing, just search around on google.
is there a quick and easy way to validate a site's content from a link?
No. There is no global white/blacklist of URLs which you can use to somehow filter out "bad" sites, especially since your definition of a "bad" site is so unspecific.
Even if you could look at a URL and tell whether the page it points to has bad content, it's trivially easy to disguise a URL these days.
If you really need to prevent this, you should moderate your content. Any automated solution is going to be imperfect and you're going to wind up manually moderating anyways.
Manual moderation, perhaps. I can't think of any way to automate this other than using some sort of blacklist, but even then that is not always reliable as newer sites might not be on the list.
Additionally, you could try using cURL and downloading the index page and looking for certain keywords that would raise a red flag, and then perhaps hold those for manual validation.
I would suggest having a list of these keywords in an array (porn, sex, etc.). If the index page that you downloaded with cURL has any of those keywords, reject or flag it for moderation.
This is not reliable nor is it the most optimized way of approving links.
Ultimately, you should have manual moderation regardless, but if you wish to automate it, this is a possible route for you to take.
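A rough sketch of that approach (the keyword list and timeout are illustrative):
// Fetch the linked page and flag the listing for manual review if the page
// contains any blacklisted keywords.
$blacklist = ['keyword1', 'keyword2']; // whatever terms you consider red flags

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html = curl_exec($ch);
curl_close($ch);

$flagged = false;
if ($html !== false) {
    $haystack = strtolower($html);
    foreach ($blacklist as $word) {
        if (strpos($haystack, $word) !== false) {
            $flagged = true; // hold the listing for manual moderation
            break;
        }
    }
}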
You can create a little moderation system that transfers content created by users into an approval queue that only administrators can access, so they can approve the content that should be displayed on the site.
I need to confirm something before I go accuse someone of ... well I'd rather not say.
The problem:
We allow users to upload images and embed them within text on our site. In the past we allowed users to hotlink to our images as well, but due to server load we unfortunately had to stop this.
Current "solution":
The method the programmer used to solve our "too many connections" issue was to rename the file that receives and processes image requests (image_request.php) to image_request2.php, and replace the contents of the original with
<?php
header("HTTP/1.1 500 Internal Server Error") ;
?>
Obviously this has caused all images with their src attribute pointing to the original image_request.php to be broken, and is also the wrong code to be sending in this case.
Proposed solution:
I feel a more elegant solution would be:
In .htaccess
If the request is for image_request.php
Check referrer
If referrer is not our site, send the appropriate header
If referrer is our site, proceed to image_request.php and process image request
What I would like to know is:
Compared to simply returning a 500 for each request to image_request.php:
How much more load would be incurred if we were to use my proposed alternative solution outlined above?
Is there a better way to do this?
Our main concern is that the site stays up. I am not willing to agree that breaking all internally linked images is the best / only way to solve this. I refuse to tell our users that because of something WE changed they must now manually change the embed code in all their previously uploaded content.
OK, then you can use the mod_rewrite capability of Apache to prevent hot-linking:
http://www.cyberciti.biz/faq/apache-mod_rewrite-hot-linking-images-leeching-howto/
Using mod_rewrite will probably give you less load than running a PHP script. I think your solution would be lighter.
Make sure that you only block access in step 3 if the referer header is not empty. Some browsers and firewalls block the referer header completely and you wouldn't want to block those.
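For illustration, a hedged sketch of the kind of rules that tutorial describes, allowing empty referers as mentioned above (the domain is a placeholder):
RewriteEngine On
# Let requests with an empty Referer through (some browsers/firewalls strip it)
RewriteCond %{HTTP_REFERER} !^$
# Let requests coming from our own site through
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
# Anything else asking for image_request.php gets a 403 Forbidden
RewriteRule ^image_request\.php$ - [F]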
I assume you store image paths in the database with the IDs of the images, right?
And then you query the database for the image path, giving it the image ID.
I suggest you install memcached on the server and cache the results of those requests. It's easy to do in PHP. After that you can watch the server load and decide whether you need to stop the hotlinking at all.
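A minimal sketch of that caching (the server address, key name, TTL, and lookup helper are assumptions):
$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);

$key  = 'image_path_' . $imageId;
$path = $cache->get($key);

if ($path === false) {
    // Cache miss: look the path up in the database, then remember it.
    $path = lookUpImagePathInDatabase($imageId); // hypothetical helper
    $cache->set($key, $path, 300);               // cache for five minutes
}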
Your increased load is equal to that of a string comparison in PHP (zilch).
The obfuscation solution doesn't even solve the problem to begin with, as it doesn't stop future hotlinking from happening. If you do check the referrer header, make absolutely certain that all major mainstream browsers will set the header as you expect. It's an optional header, and the behavior might vary from browser to browser for images embedded in an HTML document.
You likely have sessions enabled for all requests (whether they're authenticated or not) -- as a backup plan, you can also rename your session cookie name to something obscure (edit: obscurity here actually doesn't matter as long as the cookie is set for your host only (and it is)) and check that a cookie by that name is set in image_request.php (no cookie set would indicate that it's a first-request to your site). Only use that as a fallback or redundancy check. It's worse than checking the referrer.
If you were generating the IMG HTML on the fly from markdown or something else, you could use a private-key hash strategy with a short-lived expiry time attached to the query string. Completely airtight, but it seems way over the top for what you're doing.
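A hedged sketch of that signed-URL idea (the key, parameter names, and lifetime are assumptions): the link carries an expiry time and an HMAC, and image_request.php refuses anything expired or tampered with.
<?php
const SECRET_KEY = 'change-me';

// When generating the <img> tag:
$imageId = 123;                      // whichever image is being embedded
$expires = time() + 300;             // link valid for five minutes
$sig     = hash_hmac('sha256', $imageId . '|' . $expires, SECRET_KEY);
$src     = "/image_request.php?id={$imageId}&expires={$expires}&sig={$sig}";

// In image_request.php, before serving the image:
$expected = hash_hmac('sha256', $_GET['id'] . '|' . $_GET['expires'], SECRET_KEY);
if ($_GET['expires'] < time() || !hash_equals($expected, $_GET['sig'])) {
    http_response_code(404);
    exit;
}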
Also, there is no "appropriate header" for lying to a client about the availability of a resource ;) Just send a 404.