How to check if there is anything at URL?

Part of my site requires users to input URLs, but if they mistype a URL or deliberately enter a non-existent one, I end up with a bad record in my database.
For example, in Chrome, if there isn't anything at a URL you get the error
message "Oops! Google Chrome could not find fdsafadsfadsf.com" (this is the case I'm referring to).
This could be solved by checking the URL to see if there is anything there. The only way I can think of is loading the external URL in a PHP file and then parsing its content, but I hope there is a method that doesn't put unneeded strain on my server.
What other ways exist to check if there is anything at a particular URL?

I would just make a HEAD request. This will work with most servers, and avoids downloading the entire page, so it is very efficient.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html
All you have to do is parse the status code returned. If it is 200, then you're good.
Example implementation with cURL here: http://icfun.blogspot.com/2008/07/php-get-server-response-header-by.html
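
A minimal sketch of the HEAD-request idea with PHP's cURL extension, including the timeout mentioned in another answer. The function names headStatus and isReachableStatus are hypothetical, and treating only 2xx as "something is there" is an assumption you may want to loosen.

```php
<?php
// Assumption: only a 2xx final status counts as "something is at this URL".
function isReachableStatus(int $status): bool
{
    return $status >= 200 && $status < 300;
}

// Send a HEAD request and return the final HTTP status code,
// or 0 if the request failed (DNS error, refused connection, timeout).
function headStatus(string $url, int $timeoutSeconds = 5): int
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // HEAD instead of GET: no body is downloaded
        CURLOPT_RETURNTRANSFER => true,  // do not echo anything
        CURLOPT_FOLLOWLOCATION => true,  // report the status after redirects
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    $ok = curl_exec($ch);
    return $ok === false ? 0 : (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}
```

Calling isReachableStatus(headStatus($url)) before saving the record would reject URLs that do not resolve or do not answer. Some servers refuse HEAD requests, so a failed check may warrant a fallback GET.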

You can use PHP's get_headers($url), which returns false if there is no response.

If you're willing to include a tiny Flash embed, you can make a cross-domain AJAX call from the client to see if anything useful is at the destination. This would avoid involving your server at all.
http://jimbojw.com/wiki/index.php?title=Introduction_to_Cross-Domain_Ajax

I would use cURL to do this; that way you can specify a timeout.
See the comments on: http://php.net/manual/en/function.get-headers.php

Related

Convert external resource to https

My site loads images from other sites, and this has been causing warnings since I switched from plain HTTP to HTTPS. I know why this is happening, but I'm wondering how to correct it.
The best solution I have seen is here, but I don't understand how it works.
The poster suggests prepending https://example.com/imageserver?url= to the image url. This doesn't work. So what am I missing? What is imageserver?
I hope this makes sense, I'm not sure if I'm not just missing something obvious here.

imageserver could be a PHP script that fetches the image and outputs its contents.
A very simple example, not very safe:
echo file_get_contents($_GET['url']);
The idea here is that the browser now gets the images from your secure server instead of the original non-https server.
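
A slightly more defensive sketch of such a script, for illustration only. The function names isFetchableImageUrl and serveRemoteImage are hypothetical, and it assumes allow_url_fopen is enabled. It restricts the scheme to http/https and relays only data that PHP recognizes as an image, but it still does not stop requests to hosts inside your own network.

```php
<?php
// Accept only absolute http:// or https:// URLs, so the script cannot be
// pointed at local files or other URL schemes.
function isFetchableImageUrl(string $url): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return $scheme === 'http' || $scheme === 'https';
}

// Fetch the remote image and relay it to the browser.
function serveRemoteImage(string $url): void
{
    if (!isFetchableImageUrl($url)) {
        http_response_code(400);           // bad or unsupported URL
        return;
    }
    $data = @file_get_contents($url);
    if ($data === false) {
        http_response_code(502);           // upstream fetch failed
        return;
    }
    $info = @getimagesizefromstring($data); // false if the data is not an image
    if ($info === false) {
        http_response_code(415);
        return;
    }
    header('Content-Type: ' . $info['mime']);
    echo $data;
}
```

The script behind https://example.com/imageserver would then call serveRemoteImage($_GET['url'] ?? '').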

Mod Rewrite and passing a URL as parameter

I am having a small trouble with mod rewrite. A friend of mine is writing a script that allows you to upload images.
What we want to do is allow the user to append a domain name to a direct image link, and the script would retrieve the image from the supplied URL.
For example, if the image is at: http://www.test.com/image.jpg, adding domain.com/http://www.test.com/image.jpg would allow a script to retrieve that url (test.com) to get the image we want.
EDIT: HTTP is in front of the URL because I don't want the user to have to remove the HTTP manually. They see an image in their browser, they append "domain.com" before it, http and all and the script retrieves that image and stores it on our server.
The rule I am using is:
RewriteRule ^([\w|.|/]+(jpg|png|gif))$ /upload.php?url=http://$1 [B,L,NC,R=302]
This correctly matches URLs, but the colon in http:// causes problems.
If the user inputs: domain.com/www.test.com/image.jpg, it works.
If the user inputs: domain.com/http://www.test.com/image.jpg, it doesn't work and I get a 403 forbidden page (XAMPP on Windows).
If the user inputs: domain.com/http//www.test.com/image.jpg, it works (no colon in http).
EDIT: By working, I mean if I test it locally, I get to see the URL I pass in a $_GET['url'] parameter correctly, instead of seeing an error 403.
Can you please tell me what is wrong with this rule and how to fix it? Or any alternative solutions to achieve the behavior we want?
Thank you.

Well, I think I've found the problem. It wasn't the regex, nor mod_rewrite itself.
So it's a bug in Apache on Windows that has been declared WONTFIX.
For reference, see this StackOverflow thread: and this bug report
I'm posting what I found and will consider this question answered. Thank you all!

You could use urlencode() in PHP.

This approach is cumbersome, error-prone and insecure (for example, an image URL isn't required to end with one of those well-known file extensions).
If I understand your use case, it starts when the user is browsing the web, viewing an image, and wants to share it via your service. He then types http://your.sharing.service by hand in the browser's address bar, just before the existing text. You then use mod_rewrite to trigger your script, but I think your regex (and your service too) will fail in a number of unpredictable ways.
I have never used a service like this, and I think the standard approach of using a button to submit the URL to some script (let's say http://my.service.com/upload?url=...) should be preferred.
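
A minimal sketch of passing the image URL as an ordinary query parameter, combining the urlencode() suggestion with the submit-to-a-script approach. The helper name buildUploadLink is hypothetical; /upload.php is the script from the question. This only applies where your own code builds the link, not when the user types into the address bar.

```php
<?php
// Build a link to the upload script with the image URL percent-encoded,
// so the colon and slashes never appear in the request path.
function buildUploadLink(string $imageUrl): string
{
    return '/upload.php?url=' . urlencode($imageUrl);
}
```

For http://www.test.com/image.jpg this yields /upload.php?url=http%3A%2F%2Fwww.test.com%2Fimage.jpg, and PHP decodes the value again before it reaches $_GET['url'].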

The problem is in your regex...
Try:
^((http|https)+[\:\/\/\w\.]+(jpg|png|gif))$

How do I get this URL without considering the Apache settings?

Hello, I have this URL that I need to get with PHP:
http://www.domain.com/forum/#forum/General-discussions-0.htm
The problem is that this is not a real URL, but the mask created by the .htaccess.
I need to get the visible URL and not the real path of the file, because I need to compare it with some PHP variables I have.
In fact, the real path looks like this:
http://domain.com/modules/boonex/forum/index.php
which is totally useless to me.
How do I get the first URL as it is?

You can't get that from http://www.domain.com/forum/#forum/General-discussions-0.htm. Everything after the fragment marker (#) is not even sent to the server, so there is no way to retrieve it other than a delayed update with JavaScript. All the server will receive is http://www.domain.com/forum/, and on the onload event of your document you could load something in with JavaScript.

Look into the source code; the site may not have real URLs at all. The part after # is for AJAX-based navigation. It may mean that there are no real URLs on that site; if there are, they should be extracted from <a href="someurl"> elements, as they might be masked using JavaScript.

With
file_get_contents();
for example. Neither the user nor your server needs to care about .htaccess.
It is the server processing the request that has to direct you to the correct address.
However, PHP ignores everything after #, so in this case you have no chance of getting it without the real URL.
As @Wrikken said, there is no way to get the part of the URL after the # fragment marker.

How can I prevent content from unauthorized views (php/js)

How can I prevent (unauthorized) people from reading a message on a website (e.g. by looking in the browser cache for the text/images)?
It's a PUBLIC (!) site (meaning: no logins here!)
But:
the (secret) message is only shown for a certain time.
the message might be shown only if a password is given.
Problems:
In Opera, for example, a page (its contents/text) can be indexed by the browser and searched.
One idea was to create an image with the message, but images can also be retrieved from Firefox's cache, even when a "no cache" header is sent.
Also: Recreating the message from single characters as images does not work (at least I think so at the moment). I tried this method, but it makes output quite slow. (Writing this, I notice that I do not need to create the images at runtime; I could create images of single letters in advance and refer to them in the HTML by pseudo-random names rather than their real ones.)
I also had the idea to output an encoded message (ROT13) in the HTML and use JS .onload to decode the message immediately. Problem: if this code is in the HTML, it could be recovered from the cache later on. At least if someone searches through the (Opera) cache, the person would probably not think of entering encoded search terms.
The programming language is PHP.

You can't. What if someone takes a screenshot of this?

You could add the secret code to the page with javascript, after the page is loaded. You'd want to retrieve the secret code via AJAX, then write it to the page - that way, the code isn't cached in the HTML part of the source, and it isn't sitting in the javascript within the page's source code.
Content piped in with AJAX is pretty ephemeral, it won't be cached or otherwise recorded.
Since I don't know anything about your HTML or what (if any) javascript framework you might be using, I can't give you a code sample, but you should be able to work with the concept.

Realistically if it is sent to the client and displayed on screen then you can not prevent the message from being saved or stored on the client machine. Whatever you do to prevent that save could still be bypassed by a simple screenshot.
If you are not concerned about the person the message is targeted at saving said message then I think your best course of action would be to use Flash with Flash doing a call to the server to retrieve the message and display it. Another option may be to use javascript to perform some form of call (AJAX) to the server which then sends back the message and you alter the DOM to display the message. I don't think that would be cached but unless you use SSL it could be stored by intermediate proxies.

Efficient Method for Preventing Hotlinking via .htaccess

I need to confirm something before I go accuse someone of ... well I'd rather not say.
The problem:
We allow users to upload images and embed them within text on our site. In the past we allowed users to hotlink to our images as well, but due to server load we unfortunately had to stop this.
Current "solution":
The method the programmer used to solve our "too many connections" issue was to rename the file that receives and processes image requests (image_request.php) to image_request2.php, and replace the contents of the original with
<?php
header("HTTP/1.1 500 Internal Server Error") ;
?>
Obviously this has caused all images with their src attribute pointing to the original image_request.php to be broken, and is also the wrong code to be sending in this case.
Proposed solution:
I feel a more elegant solution would be:
In .htaccess:
1. If the request is for image_request.php
2. Check referrer
3. If referrer is not our site, send the appropriate header
4. If referrer is our site, proceed to image_request.php and process image request
What I would like to know is:
Compared to simply returning a 500 for each request to image_request.php:
How much more load would be incurred if we were to use my proposed alternative solution outlined above?
Is there a better way to do this?
Our main concern is that the site stays up. I am not willing to agree that breaking all internally linked images is the best / only way to solve this. I refuse to tell our users that because of something WE changed they must now manually change the embed code in all their previously uploaded content.

Ok, then you can use mod_rewrite capability of Apache to prevent hot-linking:
http://www.cyberciti.biz/faq/apache-mod_rewrite-hot-linking-images-leeching-howto/

Using mod_rewrite will probably cause less load than running a PHP script. I think your solution would be lighter.
Make sure that you only block access in step 3 if the Referer header is not empty. Some browsers and firewalls strip the Referer header completely, and you wouldn't want to block those.
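
The decision logic described above, sketched in PHP purely for illustration (the answer suggests doing this in mod_rewrite, which would likely be lighter). The function name isAllowedReferer is hypothetical, and allowing subdomains of your own host is an assumption.

```php
<?php
// Decide whether an image request should be served, given the Referer
// header value (null or '' when the client sent none) and our own host name.
function isAllowedReferer(?string $referer, string $ownHost): bool
{
    // Some browsers and firewalls strip the header; do not block those.
    if ($referer === null || $referer === '') {
        return true;
    }
    $host = parse_url($referer, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // header present but not a parseable URL
    }
    $host = strtolower($host);
    $suffix = '.' . strtolower($ownHost);
    // Allow the bare host and (assumption) any subdomain such as www.
    return $host === strtolower($ownHost)
        || substr($host, -strlen($suffix)) === $suffix;
}
```

In image_request.php this could be called as isAllowedReferer($_SERVER['HTTP_REFERER'] ?? null, 'example.com'), where example.com stands in for your own domain.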

I assume you store image paths in a database along with the image IDs, right?
And then you query the database for the image path by image ID.
I suggest you install Memcached on the server and cache user requests. It's easy to do in PHP. After that you can look at the server load and decide whether you need to stop hotlinking at all.

Your increased load is equal to that of a string comparison in PHP (zilch).
The obfuscation solution doesn't even solve the problem to begin with, as it doesn't stop future hotlinking from happening. If you do check the referrer header, make absolutely certain that all major mainstream browsers will set the header as you expect. It's an optional header, and the behavior might vary from browser to browser for images embedded in an HTML document.
You likely have sessions enabled for all requests (whether they're authenticated or not) -- as a backup plan, you can also rename your session cookie name to something obscure (edit: obscurity here actually doesn't matter as long as the cookie is set for your host only (and it is)) and check that a cookie by that name is set in image_request.php (no cookie set would indicate that it's a first-request to your site). Only use that as a fallback or redundancy check. It's worse than checking the referrer.
If you were generating the IMG HTML on the fly from markdown or something else, you could use a private key hash strategy with a short-lived expiry time attached to the query string. Completely airtight, but it seems way over the top for what you're doing.
Also, there is no "appropriate header" for lying to a client about the availability of a resource ;) Just send a 404.
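
One way the private key hash strategy mentioned above could look, as a sketch under assumptions: the names IMAGE_SIGNING_KEY, signImageQuery and isValidImageQuery are hypothetical, and HMAC-SHA256 over the image ID and expiry timestamp is my choice, not something the answer specifies.

```php
<?php
// Hypothetical secret; in practice load it from configuration, not source code.
const IMAGE_SIGNING_KEY = 'change-me';

// Build the query string for an image link that expires at $expires (Unix time).
function signImageQuery(int $imageId, int $expires, string $key = IMAGE_SIGNING_KEY): string
{
    $sig = hash_hmac('sha256', $imageId . '|' . $expires, $key);
    return http_build_query(['id' => $imageId, 'expires' => $expires, 'sig' => $sig]);
}

// Check a parsed query (e.g. $_GET) against the current time and the key.
function isValidImageQuery(array $query, int $now, string $key = IMAGE_SIGNING_KEY): bool
{
    if (!isset($query['id'], $query['expires'], $query['sig']) || !is_string($query['sig'])) {
        return false;
    }
    if ((int) $query['expires'] < $now) {
        return false; // link has expired
    }
    $expected = hash_hmac('sha256', (int) $query['id'] . '|' . (int) $query['expires'], $key);
    return hash_equals($expected, $query['sig']); // constant-time comparison
}
```

The page that renders the IMG tags would append signImageQuery($id, time() + 300) to the image URL, and image_request.php would serve the image only if isValidImageQuery($_GET, time()) returns true. A hotlinked copy of the URL stops working once the expiry time passes.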
