Verifying a domain using PHP

I have a member area where members can add their domains, which are then displayed on their profile page. Now I want to add a verification process, just like Google Webmasters does, where they need to upload a certain file and so on.
Please tell me what's the best way to do this?
Thanks :)

Generate a token for each domain (SHA-1 of the domain or so), store it in your DB or what have you.
Generate a text file containing the token on user request.
Ask the user to tell you when to poll, or poll every now and then to check the URL. This can easily be done with file_get_contents in PHP if fopen wrappers are enabled.
The token is obviously compared to the token in your DB to make sure it wasn't just a random file present on a random domain.
It could be a good idea to check at some interval whether the file is still there, to keep someone from selling the domain but remaining in control.
It's not really black art, as we can assume the user has access to their domain once any specific request which proves access can be fulfilled by the user. There's no real way to fool the system except doing some DNS magic, or gaining entry to the webserver running on the domain, which is out of your control anyway.
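A minimal PHP sketch of that flow. The domain_verifications table, its columns and the verify.txt filename are placeholders I made up, not anything from the answer above:

<?php
// A minimal sketch of the approach above. Table/column names and the
// /verify.txt path are placeholders, not part of the original answer.

// 1. Generate and store a token when the user adds a domain.
function create_verification_token(PDO $db, int $userId, string $domain): string
{
    $token = sha1($domain . bin2hex(random_bytes(16))); // unguessable per-domain token
    $stmt  = $db->prepare(
        'INSERT INTO domain_verifications (user_id, domain, token) VALUES (?, ?, ?)'
    );
    $stmt->execute([$userId, $domain, $token]);
    return $token; // tell the user: "put this in http://<your-domain>/verify.txt"
}

// 2. Poll the domain and compare the file contents with the stored token.
function verify_domain(PDO $db, string $domain): bool
{
    $stmt = $db->prepare('SELECT token FROM domain_verifications WHERE domain = ?');
    $stmt->execute([$domain]);
    $expected = $stmt->fetchColumn();
    if ($expected === false) {
        return false;
    }

    // Needs fopen wrappers / allow_url_fopen; @ hides warnings for unreachable hosts.
    $contents = @file_get_contents('http://' . $domain . '/verify.txt');

    return $contents !== false && trim($contents) === $expected;
}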

Not sure if that's the best way, but I think Google does something like this:
get user's domain name (e.g. "http://example.com")
generate unique code and store in db
tell user where to upload the code (e.g. something like "/verification.txt")
after confirmation, make an HTTP request for the code ("http://example.com/verification.txt") from your own server to the user's server
compare the code you received to the code in the db
You may want to generate consistently the same code for the same domain.
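For that last point, one way (a sketch only, with a placeholder secret) is to derive the code from the domain plus a server-side secret, so the same domain always yields the same code and there is nothing extra to store:

<?php
// Same domain + same secret => same code every time.
function verification_code(string $domain, string $secretKey): string
{
    return hash_hmac('sha256', strtolower($domain), $secretKey);
}

// e.g. ask the user to upload this string to http://example.com/verification.txt
echo verification_code('example.com', 'replace-with-a-long-random-secret');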

This question is convoluted. I think you need to spell out what you are looking for a little better.
EDIT #1:
Generate an md5 hash and give it to the user, tell them to put it on their domain and provide a URL to where it is. This could be in a txt file or anything.
Then read that file and check whether the md5 string exists in there.
Actually, I would come up with something slightly different from a single md5. Maybe three of them, so that you reduce the chance they find it on some other domain and then give you that URL.
This can still be spoofed unless you nail down constraints, like it has to be a text file, the file must only contain the md5... etc.
Right now I can type in an md5 but it doesn't mean I control this website:
md5("i fooled you") = "0afb2d659b709f8ad499f4b87d9162f0"
But if I handed you the URL of this answer, your system might accidentally think I have admin here.
I recommend creating a file, making them upload the file and giving you the URL to it. But even that won't necessarily work, because there are many sites where you can just upload something.
Maybe if it's a PHP-encoded file that can execute? That's kind of a security flaw, because I don't know that I would upload just anyone's PHP file. Typically, if you don't have admin, nobody is going to let you upload a PHP file that would actually run.
You might want to create a PHP call-home script, but that's going to be a bad idea. People wouldn't use it.

Another way it could be done is:
Get the domain name.
Generate a random code/string.
Store this in your database.
Make a meta tag with the random code as the content.
Use file_get_contents() on the index page of the website.
Then search the page for the meta tag with the code stored in the database.
Use an if statement for success or failure.
The meta tag should look like this:
<meta name="site-verification" content="1010101010101010101010101010101010101010" />
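A rough sketch of that check in PHP; the meta name simply mirrors the example tag above, and the function name is invented for illustration:

<?php
// Fetch the index page and look for the verification meta tag described above.
function has_verification_meta(string $domain, string $expectedCode): bool
{
    $html = @file_get_contents('http://' . $domain . '/');
    if ($html === false) {
        return false;
    }

    // Parse the page and look for <meta name="site-verification" content="...">
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings from imperfect real-world HTML

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if ($meta->getAttribute('name') === 'site-verification'
            && $meta->getAttribute('content') === $expectedCode) {
            return true;
        }
    }
    return false;
}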

Actually, just creating an md5 string for the domain name and letting the site owner put that in a meta tag so you can check it would already work fine ...

Related

cURL source retrieval (ignore files/images)

We've developed an IRC bot in PHP which, among many other functions, will respond with the page title of any URL a user sends to the channel. The problem I'm having is that when someone posts the URL of an image or a file, the bot tries to retrieve that file or image.
I'm trying to determine the best way to go about solving this issue. Should I filter the URL inputs and regex them for all possible file types? That seems daunting and exhaustive, to say the least. If anyone caught on to it, they could simply put a huge file somewhere with a senseless extension, post that URL in the channel, and time the bot out.
I feel like I'm missing a cURL option which could make it simply ignore file retrievals which aren't simply ASCII in nature. Any advice or suggestions?
One idea could be to do a HEAD request first and only download the page if the content type is text/html. Or you could just read the first 1000 characters (or something small) and check if the title is there; if it isn't, assume it's something other than HTML.
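A minimal sketch of the HEAD-request idea with cURL; the timeout value and function name are just illustrative:

<?php
// HEAD the URL first and only treat it as a page if it claims to be HTML.
function is_probably_html(string $url): bool
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // HEAD request: headers only, no body
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 5,
    ]);
    curl_exec($ch);

    $contentType = (string) curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    curl_close($ch);

    // e.g. "text/html; charset=UTF-8"
    return stripos($contentType, 'text/html') === 0;
}

// Only fetch the full page (and its <title>) when it looks like HTML.
// var_dump(is_probably_html('http://example.com/'));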

How to do fake URL detection in PHP

I'm working on a script for indexing and downloading a whole website from a user-submitted URL.
For example, when a user submits a domain like http://example.com, I copy all the links on the index page, go on to download the pages they link to, and start again from the first one.
I do this part with cURL and regular expressions to download the pages and extract the links.
However,
some dubious websites generate fake URLs. For example, if you go to http://example.com?page=12 it has links to http://example.com?page=12&id=10 or http://example.com?page=13 and so on.
This creates a loop and the script can't finish downloading the site.
Is there any way to detect these kinds of pages?
P.S.: I think Google, Yahoo and some other search engines face this kind of problem too, but their databases are clean and they don't show this kind of data in their search results.
Some pages may use GET variables and be perfectly valid (as you've mentioned here, ?page=12 and ?page=13 may both be acceptable). So what I believe you're actually looking for here is a unique page.
It's not possible, however, to detect these straight from the URL. ?page=12 may point to exactly the same thing as ?page=12&id=1 does; or it may not. The only way to detect one of these is to download it, compare the download to pages you've already got, and as a result find out whether it really is one you haven't seen yet. If you have seen it before, don't crawl its links.
Minor side note here: Make sure you block websites from a different domain, otherwise you may accidentally start crawling the whole web :)
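A rough sketch of that de-duplication plus the same-domain check, with the crawl loop itself left out; the function names are invented for illustration:

<?php
// Skip pages whose content you've already downloaded, and stay on one domain.
$startHost  = parse_url('http://example.com', PHP_URL_HOST);
$seenHashes = [];   // hash of every page body downloaded so far

function is_new_page(string $body, array &$seenHashes): bool
{
    $hash = md5($body);
    if (isset($seenHashes[$hash])) {
        return false;   // identical content seen before: don't crawl its links
    }
    $seenHashes[$hash] = true;
    return true;
}

function is_same_site(string $url, string $startHost): bool
{
    // Block other domains, otherwise you may accidentally start crawling the whole web.
    return parse_url($url, PHP_URL_HOST) === $startHost;
}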

Making a PDF or page viewable only once, and protecting it from being printed or shared

On our Wordpress site we would like to have some pages or files made available only to members who pay a fee to view the material once. The content could be either a site page or a PDF, but the key is we want a member to only be able to see it just that one time, and we also want to be sure the client cannot print, copy or share it.
We realize anything on the screen can be grabbed, and yes, there in theory will always be some who will then run it through an OCR or simply type it out, but the number will be relatively small, especially within our specific group. So with all that said, do you know of a "best" way to protect a page or file from being easily shared or printed?
Thank you!
By the very nature of web content being loaded into your browser it's technically on your system as a temporary file. PDF is meant to be a PORTABLE document.
As for a webpage you can create new print styles that will mess up the printing, and add some javascript to make copying a pain, but this is hacking the intended purpose of web documents.
Another alternative (not that I'm endorsing this!) would be to make the content in Flash! It's always a pain to rip off ;)
You could set up a simple database access table that stores a user id against the page/file URL.
access_id | user_id | resource_url
when the user views a page then you can check against this table, something like:
SELECT access_id
FROM access_table
WHERE user_id = {YOUR_USER_ID}
AND resource_url = "{CURRENT_URL}"
if you get a result then allow access, and delete the row. Next time they try the URL, there will be no result, so deny access.
With PDFs, to implement this you would need a wrapper script that you call with a parameter ($_GET['resource_id']), which contains the above code and then outputs the PDF contents to the browser using headers and file_get_contents().
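A rough sketch of such a wrapper script. The access_table columns come from the example above; the session handling, PDO connection details and the mapping from resource_id to a file path are all placeholders:

<?php
// pdf.php?resource_id=... - a sketch of the wrapper described above.
session_start();

function lookup_resource_url(int $id): string
{
    // Placeholder: however you map resource ids to file paths in your application.
    return '/var/private/pdfs/' . $id . '.pdf';
}

$userId      = (int) ($_SESSION['user_id'] ?? 0);
$resourceUrl = lookup_resource_url((int) ($_GET['resource_id'] ?? 0));

$db = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass'); // your own credentials

// Is there still an unused access grant for this user and this file?
$stmt = $db->prepare(
    'SELECT access_id FROM access_table WHERE user_id = ? AND resource_url = ?'
);
$stmt->execute([$userId, $resourceUrl]);
$accessId = $stmt->fetchColumn();

if ($accessId === false) {
    header('HTTP/1.1 403 Forbidden');
    exit('You no longer have access to this document.');
}

// Burn the grant so the same URL will not work a second time.
$db->prepare('DELETE FROM access_table WHERE access_id = ?')->execute([$accessId]);

// Stream the PDF to the browser rather than linking to the file directly.
header('Content-Type: application/pdf');
header('Content-Disposition: inline; filename="document.pdf"');
echo file_get_contents($resourceUrl);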
NOTE: This of course won't solve the problem completely, as others have mentioned, but it should add an extra layer of protection, as it will prevent a URL from being shared.

Using a QR code securely

All,
I'm going to use a QR code from the following URL:
http://qrcode.kaywa.com/
I want to use the URL option, so when someone scans it they are sent to the URL that I specified in the code. I want to have something like the following URL:
http://www.website.com/web-page/?type=uplights&action=checkout
Based on the variables in the URL I want to allow my user to insert some data.
Is there a way to secure this so that I know a user got to this URL from scanning the QR code instead of just typing that information into the URL?
Thanks!
Short Answer: Not directly.
QR codes were not designed to keep the content stored within them secret. Someone could use a QR reader to scan your URL, store it, and keep using it over and over again without actually scanning it again.
One way we used to circumvent this issue was to encrypt our URL such that our own application (Based on ZXing) would be the only one capable of reading our QR code. It then sends the actual request with a nonce over a secure channel such that a replay attack would also be rendered useless (in case someone was sniffing outbound connections). All other readers see the encrypted URL which isn't of any use.
Other than that, there isn't another way of ensuring the user actually does scan your QR and doesn't type it out/paste it in.
The way we implemented this:
We stored the URL as http://www.website.com/app.php?<encrypted_string>. If someone read our URL with a different QR decoder, they would be taken to our app.php page, which urged them to read the QR using our application.
Our app itself, on encountering that URL stripped off the encrypted query-string, decrypted it, and formed its own request to the right page. In PHP, you could execute that request at the server-end itself, so it is never visible to the user. You could use mcrypt as detailed here for encryption.
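A rough server-side sketch of that approach. The original answer points at mcrypt, which has since been removed from PHP, so this uses openssl_encrypt instead; the key and the URL-safe encoding are made up for illustration:

<?php
// Encrypt the real query string so only your own app/server can read it.
const QR_KEY = 'replace-with-a-32-byte-random-key!!';   // keep this secret

function encrypt_qr_payload(string $query): string
{
    $iv     = random_bytes(16);
    $cipher = openssl_encrypt($query, 'aes-256-cbc', QR_KEY, OPENSSL_RAW_DATA, $iv);
    // URL-safe base64: this is what goes after app.php?
    return rtrim(strtr(base64_encode($iv . $cipher), '+/', '-_'), '=');
}

function decrypt_qr_payload(string $payload)
{
    $raw = base64_decode(strtr($payload, '-_', '+/'));
    if ($raw === false || strlen($raw) <= 16) {
        return false;
    }
    $iv     = substr($raw, 0, 16);
    $cipher = substr($raw, 16);
    return openssl_decrypt($cipher, 'aes-256-cbc', QR_KEY, OPENSSL_RAW_DATA, $iv);
}

// e.g. the QR code would encode:
// 'http://www.website.com/app.php?' . encrypt_qr_payload('type=uplights&action=checkout');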
You can add a secret-ish parameter to the URL and not publish the URL with that parameter. But basically, no, you still won't know if someone didn't just type in that URL. (For example, I may have used the QR code, then cut and paste the URL in an email to a friend, and that friend may have typed it in.) But you'll know that they probably didn't just type it in.
QR codes are just easily reversible encodings for text. There's no magic there. So there are things you can do to make it less likely that someone typed in the URL, but you can never be certain.
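If you go the secret-ish parameter route, one sketch (the parameter name and secret are invented here) is to sign the query string so a URL typed or altered by hand at least fails validation, even though it still can't prove a scan happened:

<?php
// Append an HMAC over the query string; reject requests whose signature doesn't match.
const URL_SECRET = 'replace-with-a-long-random-secret';

function signed_qr_url(string $base, array $params): string
{
    $query = http_build_query($params);
    $sig   = hash_hmac('sha256', $query, URL_SECRET);
    return $base . '?' . $query . '&sig=' . $sig;
}

function signature_is_valid(array $get): bool
{
    $sig = $get['sig'] ?? '';
    unset($get['sig']);
    // Note: parameter order must match the order used when signing.
    $expected = hash_hmac('sha256', http_build_query($get), URL_SECRET);
    return hash_equals($expected, $sig);
}

// The URL you encode in the QR image:
// signed_qr_url('http://www.website.com/web-page/', ['type' => 'uplights', 'action' => 'checkout']);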

Efficient Method for Preventing Hotlinking via .htaccess

I need to confirm something before I go accuse someone of ... well I'd rather not say.
The problem:
We allow users to upload images and embed them within text on our site. In the past we allowed users to hotlink to our images as well, but due to server load we unfortunately had to stop this.
Current "solution":
The method the programmer used to solve our "too many connections" issue was to rename the file that receives and processes image requests (image_request.php) to image_request2.php, and replace the contents of the original with
<?php
header("HTTP/1.1 500 Internal Server Error");
?>
Obviously this has caused all images with their src attribute pointing to the original image_request.php to be broken, and is also the wrong code to be sending in this case.
Proposed solution:
I feel a more elegant solution would be:
In .htaccess
If the request is for image_request.php
Check referrer
If referrer is not our site, send the appropriate header
If referrer is our site, proceed to image_request.php and process image request
What I would like to know is:
Compared to simply returning a 500 for each request to image_request.php:
How much more load would be incurred if we were to use my proposed alternative solution outlined above?
Is there a better way to do this?
Our main concern is that the site stays up. I am not willing to agree that breaking all internally linked images is the best / only way to solve this. I refuse to tell our users that because of something WE changed they must now manually change the embed code in all their previously uploaded content.
OK, then you can use the mod_rewrite capability of Apache to prevent hotlinking:
http://www.cyberciti.biz/faq/apache-mod_rewrite-hot-linking-images-leeching-howto/
Using mod_rewrite will probably give you less load than running a PHP script. I think your solution would be lighter.
Make sure that you only block access in step 3 if the referer header is not empty. Some browsers and firewalls block the referer header completely and you wouldn't want to block those.
I assume you store image paths in the database with the ids of the images, right?
And then you query the database for the image path, giving it the image id.
I suggest you install Memcached on the server and cache user requests. It's easy to do in PHP. After that you will see the server load and can decide whether you should stop this hotlinking thing at all.
Your increased load is equal to that of a string comparison in PHP (zilch).
The obfuscation solution doesn't even solve the problem to begin with, as it doesn't stop future hotlinking from happening. If you do check the referrer header, make absolutely certain that all major mainstream browsers will set the header as you expect. It's an optional header, and the behavior might vary from browser to browser for images embedded in an HTML document.
You likely have sessions enabled for all requests (whether they're authenticated or not) -- as a backup plan, you can also rename your session cookie name to something obscure (edit: obscurity here actually doesn't matter as long as the cookie is set for your host only (and it is)) and check that a cookie by that name is set in image_request.php (no cookie set would indicate that it's a first-request to your site). Only use that as a fallback or redundancy check. It's worse than checking the referrer.
If you were generating the IMG HTML on the fly from Markdown or something else, you could use a private-key hash strategy with a short-lived expiry time attached to the query string. Completely airtight, but it seems way over the top for what you're doing.
Also, there is no "appropriate header" for lying to a client about the availability of a resource ;) Just send a 404.
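For reference, a sketch of that private-key hash strategy (the secret, parameter names and the 5-minute window are placeholders): the generated URL carries an expiry timestamp plus an HMAC over it, and image_request.php rejects anything expired or tampered with.

<?php
// Signed, expiring image URLs as mentioned above.
const IMG_SECRET = 'replace-with-a-long-random-secret';

// Used when generating the <img> HTML on the fly.
function signed_image_url(int $imageId, int $ttlSeconds = 300): string
{
    $expires = time() + $ttlSeconds;
    $sig     = hash_hmac('sha256', $imageId . '|' . $expires, IMG_SECRET);
    return "/image_request.php?id=$imageId&expires=$expires&sig=$sig";
}

// At the top of image_request.php:
function request_is_valid(array $get): bool
{
    $id      = (int) ($get['id'] ?? 0);
    $expires = (int) ($get['expires'] ?? 0);
    $sig     = (string) ($get['sig'] ?? '');

    if ($expires < time()) {
        return false;   // link has expired
    }
    $expected = hash_hmac('sha256', $id . '|' . $expires, IMG_SECRET);
    return hash_equals($expected, $sig);
}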
