PHP input validation for a single URL input

I have this very simple script that lets the user specify the URL of any site. The script then replaces the "data" attribute of an object tag with that URL, so the site of the user's choice is displayed inside the object on the HTML page.
How can I validate the input so the user can't load any page from my own site inside the object? I have noticed that doing so will display my source code.
The code:
<?php
$url = 'http://www.google.com';
if (array_key_exists('_check', $_POST)) {
    $url = $_POST['url'];
}
// gets the title from the selected page
$file = fopen($url, "r") or die("Can't read input stream");
$text = fread($file, 16384);
if (preg_match('/<title>(.*?)<\/title>/is', $text, $found)) {
    $title = $found[1];
} else {
    $title = "Untitled Document";
}
?>
Edit: (more details)
This is NOT meant to be a proxy. I am letting the users decide which website is loaded into an object tag (similar to an iframe). The only thing PHP is going to read is the title tag from the input url, so it can be used as the title of my site. (Don't worry, it's not to trick the user.) Although it may display the title of any site, it will not bypass any filters in any other way.
I am also aware of the vulnerabilities involved in what I am doing; that's why I'm looking into validation.

As gahooa said, I think you need to be very careful with what you're doing here, because you're playing with fire. It's possible to do safely, but be very cautious with what you do with the data from the URL the user gives you.
For the specific problem you're having, though, I assume it happens when the input is a local filename, for example if someone types "index.php" into the box. All you need to do is make sure their URL starts with "http://" so that fopen uses the HTTP wrapper instead of opening a local file. Something like this before the fopen line should do the trick:
if (!preg_match('/^http:\/\//', $url))
$url = 'http://'.$url;

parse_url: http://us3.php.net/parse_url
You can check for scheme and host.
If the scheme is http, then make sure the host is not your website. I would suggest using preg_match to grab the part between the dots: for www.google.com or google.com, use preg_match to get the word "google".
If the host is an IP address, I am not sure what you want to do in that situation. By default, that preg_match would only get the middle two numbers and the dot (assuming you try to use preg_match to get the site name before the .com).
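A minimal sketch of that check, assuming the site's own hostname is available in $_SERVER['HTTP_HOST'] (that part is an assumption):
<?php
// Reject anything that is not an absolute http URL, and anything
// whose host is this site. parse_url() returns false on seriously
// malformed URLs.
$parts = parse_url($url);
if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
    die("Not a valid absolute URL");
}
if (strtolower($parts['scheme']) !== 'http') {
    die("Only http:// URLs are allowed");
}
if (strcasecmp($parts['host'], $_SERVER['HTTP_HOST']) === 0) {
    die("Loading pages from this site is not allowed");
}
?>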

Are you aware that you are creating an open HTTP proxy, which can be a really bad idea?
Do you even need to fetch the contents of the URL? Why don't you let your user's browser do that by supplying it with the URL?
Assuming you do need to fetch the URL, consider validating against a known "whitelist" of URLs. If you can't restrict it to a known list, then you are back to the open proxy again...
Use a regular expression (preg) to ensure it is a good HTTP url, and then use the CURL extension to do the actual request.
Mixing the fopen() family of functions with user supplied parameters is a recipe for potential disaster.
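A rough sketch of the whitelist-plus-cURL approach; the $whitelist entries below are placeholders, not a recommendation:
<?php
// Only fetch URLs that appear verbatim in a known list.
$whitelist = array('http://www.google.com/', 'http://www.example.com/');
if (!isset($_POST['url']) || !in_array($_POST['url'], $whitelist, true)) {
    die("URL not allowed");
}

$ch = curl_init($_POST['url']);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // don't hang on slow hosts
$text = curl_exec($ch);
curl_close($ch);
?>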

You could use the PHP filter extension.
filter_var($url, FILTER_VALIDATE_URL) or
filter_input(INPUT_POST, 'url', FILTER_VALIDATE_URL);
http://php.net/manual/en/function.filter-input.php
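A minimal usage sketch: FILTER_VALIDATE_URL returns false on failure (and filter_input() returns null when the variable is missing), and it accepts any scheme, so an explicit http/https check is still worth adding:
<?php
$url = filter_input(INPUT_POST, 'url', FILTER_VALIDATE_URL);
if (!$url || !preg_match('#^https?://#i', $url)) {
    die("Invalid URL");
}
?>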
Also try these documents, referenced by this filter-related PHP wiki post by Yasuo Ohgaki (https://wiki.php.net/rfc/add_validate_functions_to_filter?s[]=filter):
https://www.securecoding.cert.org/confluence/display/seccode/Top+10+Secure+Coding+Practices
https://www.owasp.org/index.php/OWASP_Secure_Coding_Practices_-_Quick_Reference_Guide
http://cwe.mitre.org/top25/mitigations.html

Related

How to find if I have the same domain name but with a different extension

I have a problem that I thought was going to be simple to solve, but I cannot figure it out.
I have a database full of URLs like:
http://www.domain.com/page.html
http://domain.com/page.html
http://sub.domain.co.in/page.html
http://sub.sub.domain.it/page.html
http://other.domain.net/page.html
http://ok.domain.com/ok.html
etc...
now, given http://www.domain.co.uk/page.html
I need to figure out whether that page is already in the database, assuming that a different extension does not change the content.
The final goal is simple: I am building a site where people can submit pages, and those pages need to be unique to avoid duplicate content. Users are submitting Google Maps .com and Google Maps .co.in, creating duplicates of the same page, so I need to figure out whether a submitted page has already been submitted under a different domain extension. If a match is found, I will also check the title and content, in case the domain extension DOES change the content (like www.wyska.net and www.wyska.com).
in other words:
maps.google.com === maps.google.it === maps.google.co.in === maps.google.co.uk .....
only if content is "similar" (I will have to work on figure out what "similar" means too)
so far I have (but it doesn't work):
<?php
$url = 'http://www.domain.com/text.html'; // works with this domain
$parse = parse_url($url);
var_dump($parse);
var_dump(pathinfo($parse['host'])); // "extension" comes out as "com"

$url = 'http://sub.sub.domain.co.in/text.html'; // does not work with this domain
$parse = parse_url($url);
var_dump($parse);
var_dump(pathinfo($parse['host'])); // "extension" comes out as just "in", not "co.in"
?>
if necessary I can even break the domain into different parts and store those parts instead of the full domain.
I was thinking of doing a search and replace on the domain extension, but I haven't been able to find a full list of domain extensions to use. Something like: if it ends with any of those strings, then remove that part from the domain.
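A rough sketch of that idea; a real solution would need the full Public Suffix List (publicsuffix.org), so the $suffixes array below is only a placeholder sample:
<?php
// Strip a known "extension" (public suffix) off the end of a host so
// that maps.google.com and maps.google.co.in compare as equal.
function strip_suffix($host) {
    $suffixes = array('.co.uk', '.co.in', '.com', '.net', '.it');
    foreach ($suffixes as $suffix) {
        if (substr($host, -strlen($suffix)) === $suffix) {
            return substr($host, 0, -strlen($suffix));
        }
    }
    return $host;
}
$a = strip_suffix(parse_url('http://maps.google.com/page.html', PHP_URL_HOST));
$b = strip_suffix(parse_url('http://maps.google.co.in/page.html', PHP_URL_HOST));
var_dump($a === $b); // bool(true) -- both reduce to "maps.google"
?>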

How to deep link to a Facebook App (NOT Page Tab)

I need to link to a specific page in my Facebook app. The app is not in a page tab, and cannot be in one due to project constraints.
This is the url format:
https://apps.facebook.com/myappname
I would need to pass a parameter at the end (like /next.html or ?page=next) so that I can link to the specific page directly from outside the app (from an email).
How would I set this up? My project uses PHP and jQuery. I would love to be able to do this strictly in Javascript if possible.
I have found tons of info on how to deep link a page tab or a mobile app, but not to a regular application. I have found messages stating it's possible, but nothing about how to actually do it anywhere online or on Facebook.
Thanks for your help.
EDIT:
Okay, I got it working in PHP. For anyone else with this issue, this is what I did.
Add a "?" at the very end of the 'Site URL' in your FB app, then create a redirect file similar to this as your app landing page (just use absolute paths instead of relative ones like I did below):
<?php
$query = $_SERVER['QUERY_STRING'];
$params = explode("/", $query);
if (in_array("gallery", $params)) {
    header("Location: /gallery.html");
    exit;
} else {
    header("Location: /index.html");
    exit;
}
?>
This answer is what helped me figure this out:
$_GET on facebook iframe app
I may be missing something here, but why don't you just link to http://apps.facebook.com/yourapp/something.php? This should automatically load your canvas URL with something.php appended to the path.
Obviously this won't work if your canvas URL points to a specific file rather than a directory, but plenty of apps do this with success.
When you use the "?", you are just issuing a GET request, so all of the info you require will exist in the $_GET array.
Rather than querying the $_SERVER array, query the $_GET array.
So if you had:
http://myurl.com?info=foobar
You can simply access that info using:
$info = $_GET['info'];
It is good practice to check for the existence first though:
if (isset($_GET['info'])) {
    $info = $_GET['info'];
} else {
    $info = "default";
}
Incidentally, if you use the & character you can have multiple parameters:
http://myurl.com?info=foo&moreinfo=bar
You get a special parameter called app_data that you can use however you want. I've used it in the past to encode a full query string for my internal app, for example &app_data=My/Custom/Page
More found in this SO question: Retrieve Parameter From Page Tab URL
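As a hedged sketch (signed_request handling varies, and signature verification against your app secret is omitted here): canvas apps receive a signed_request POST parameter whose payload is base64url-encoded JSON, and app_data, when Facebook passes it along, shows up inside that payload:
<?php
// Decode the payload half of signed_request and pull out app_data.
// NOTE: verify the signature with your app secret in production.
$app_data = null;
if (isset($_POST['signed_request'])) {
    list($sig, $payload) = explode('.', $_POST['signed_request'], 2);
    $data = json_decode(base64_decode(strtr($payload, '-_', '+/')), true);
    if (isset($data['app_data'])) {
        $app_data = $data['app_data'];
    }
}
?>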

url or content as a variable in the header of the page

I am designing a site where external links from various sources are shown on my page. I am using:
$url=$_GET['url'];
$website_data = file_get_contents($url);
echo $website_data;
so essentially a user would click on a hyperlink like www.test.com/display_page.php?url=http://www.xyz.com/article/2.jpg
My page, list_of_images.php, typically has a list of images, each with an href like the one above. When any image is clicked it goes to display_page.php, which shows our banner at the top of the page, some text, and then the image beneath that. The image could be from any website.
I am currently sending the url directly and grabbing it using GET. I understand that users/hackers can craft malicious values for the url variable and could break the server or do something harmful, so I would like to avoid sending the url directly. What is an alternate approach for this problem?
The safe approach is to use a fixed set of resources stored in either an array or a database, and the appropriate key as a parameter.
$ress = array('1' => 'http://www.google.com/', ...);
$res = isset($ress[$_GET['res']]) ? $ress[$_GET['res']] : null; // unknown keys yield null
I would make sure the url starts with http:// or https://:
if(preg_match("`^https?://`i", $_GET['url']))
// do stuff
You may also want to make sure it isn't pointing anywhere internal:
if(preg_match('`^https?://(?!localhost|127\.|192\.|10\.0\.)`i', $_GET['url']))
// do stuff
Rather than a big dirty regex, you could go for a more elegant host black-list approach, but you get my drift...
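One way to sketch that blacklist, using parse_url() plus filter_var()'s IP-range flags; note this still won't catch a hostname that merely resolves to an internal IP:
<?php
// Pull out the host, then reject known-internal names and any IP
// literal in a private or reserved range.
$host = parse_url($_GET['url'], PHP_URL_HOST);
$blocked = in_array(strtolower((string)$host), array('localhost'), true);
if (filter_var($host, FILTER_VALIDATE_IP) !== false) {
    $blocked = $blocked || filter_var($host, FILTER_VALIDATE_IP,
        FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE) === false;
}
if ($blocked) {
    die("URL points somewhere internal");
}
?>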
Try doing this using the POST method instead of GET.

In php, can parse_url and http_build_url be used to detect malformed urls and prevent xss attacks? Is there something better?

I want to allow users of my site to post urls. These urls would then be rendered on the site in the href attributes of <a> tags. Basically, user A posts a url, my site displays it on the page as an <a> tag, then user B clicks it to see pictures of kittens.
I want to prevent javascript execution and xss attacks, and ensure there are no malformed urls in the output I generate.
Example: User A posts a malformed url, supposedly to pictures of kittens. My site tries to generate an <a> tag from user A's data, then user B clicks the resulting link. User A has actually posted a malformed url which adds a javascript "onclick" event to the <a> tag, sending the victim's cookies to another site.
So I want to only allow correctly formed urls, and block out anything other than http/https protocols. Since I'm not allowing anything here which doesn't look like a url, and the user is not providing me html, it should be pretty simple to check by parsing and reforming the url.
My thinking is that parse_url should fail with an error on malformed urls, or it replaces illegal characters with '_'. I can check the separated parts of the url for allowed protocols as well. Then by constructing a url using http_build_url, I take the parts separated by parse_url and put them back together into a url which is known to be correctly formed. So by breaking them down this way first, I can give the user an error message when it fails instead of putting a sanitized broken url in my page.
The question is, will this prevent xss attacks from doing evil if a user clicks the link? Does the parsed and rebuilt url need further escaping? Is there a better way to do this? Shouldn't this be a solved problem by now with functions in the standard php libraries?
I really don't want to write a parser myself and I'm not going to even consider regular expressions.
Thanks!
What you need to do is just escape content properly when building your html. This means that when a value has a " in it, you build your html with &quot; instead.
Protecting against XSS isn't primarily about validating URL's it's about proper escaping. (although you probably want to be sure that it's a http: or https: link)
For a more detailed list of what to escape when building html strings (ie: the href attribute) see HTML, URL and Javascript Escaping
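A minimal sketch of "check the scheme, then escape on output"; $userUrl here is a made-up variable name. ENT_QUOTES makes the value safe inside a quoted attribute, which blocks the onclick-injection trick described above:
<?php
if (!preg_match('#^https?://#i', $userUrl)) {
    die("Only http/https links are allowed");
}
$safe = htmlspecialchars($userUrl, ENT_QUOTES, 'UTF-8');
echo '<a href="' . $safe . '">' . $safe . '</a>';
?>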
No, parse_url is not meant to be a URL validator.
You can use filter_var for this:
filter_var($someURL, FILTER_VALIDATE_URL);
So, in PHP, you would use something like:
<?php
$userlink = "http://google.com";
$newlink = htmlentities($userlink);
$link = "<a href=\"$newlink\">$newlink</a>";
?>
Depending on a few other things, you might just validate the URL by checking if it points to any content. Here is an example:
<?php
// URL to test
// $url = "";
$content = file_get_contents($url);
if (!empty($content)) {
    echo "Success:<br /><iframe src=\"$url\" style=\"height:400px; width:400px; margin:0px auto;\"></iframe>";
} else {
    echo "Failed: Nothing exists at this url.";
}
?>
cURL is another option. With cURL you can return just the HTTP headers and then check the status code: 404 = page not found, 200 = OK, 201 = Created, 202 = Accepted, and so on.
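A sketch of that cURL status check, doing a headers-only request and reading the response code:
<?php
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_NOBODY, true);         // fetch headers only, no body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // don't print the response
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_exec($ch);
$code = curl_getinfo($ch, CURLINFO_HTTP_CODE);  // 200 = OK, 404 = not found, ...
curl_close($ch);
if ($code >= 200 && $code < 300) {
    echo "Success: the URL returned HTTP $code";
} else {
    echo "Failed: the URL returned HTTP $code";
}
?>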

How to obtain anchor part of URL after # in php

While using the LightBox mechanism in my project I got a URL:
http://nhs/search-panel.php#?patientid=2
I need to collect that patientid from it through the GET mechanism. Is that possible in PHP?
Simply put: you can't! Browsers don't send the fragment (the part of the URL after the hash mark) in their requests to the server. You must rely on some client-side JavaScript: perhaps you can rewrite the URL before using it.
Maybe everybody else is right and a simple $_GET is enough, but if the # in your URL (http://nhs/search-panel.php#?patientid=2) is supposed to be there, you would have to handle it with JavaScript (and Ajax, e.g. jQuery), because everything after the # is not included in the request as far as I know.
If you check your server logs, you should see that no browser actually transmits the #anchor part of the URL in the request, so you can't pick it up on the server side.
If you need it, you'll have to write some JavaScript to extract it from document.location.href and send it to your server, either by turning it into a regular GET parameter and redirecting the user, or in the background with an XMLHttpRequest/AJAX call.
Edit: Whoops, this won't work. The other posters are correct in saying that anything after the hash never reaches your server.
Something along these lines should do you:
//Get complete URI, will contain data after the hash
$uri = $_SERVER['REQUEST_URI'];
//Just get the stuff after the hash
list(,$hash) = explode('#', $uri);
//Parse the value into array (will put value in $query)
parse_str($hash, $query);
var_dump($query);
