Detect when a request is for preview generation - PHP

We have certain action links which are one-time use only. Some of them do not require any action from the user other than viewing them. And here comes the problem: when you share such a link in, say, Viber, Slack or anything else that generates a preview of the link (or unfurls the link, as Slack calls it), it gets counted as used because it was requested.
Is there a reliable way to detect these preview-generating requests solely via PHP? And if it's possible, how does one do that?

This is not possible with 100% accuracy in PHP alone, as PHP only deals with HTTP requests, which are quite abstracted from the client. Strictly speaking, you cannot even guarantee that the user has actually seen the response, even though it was legitimately requested by the user.
The options you have:
use checkboxes like "I've read this" (violates the no-action requirement)
use JavaScript to send an "I've read this" request without user interaction (violates the PHP-alone requirement)
rely on cookies: redirect the user with a Set-Cookie header, then redirect back to show the content and mark the URL as consumed (still not 100% guaranteed, and may result in infinite redirects for bots that follow 302 redirects but do not persist cookies)
rely on request headers (could work if you had a finite list of supported bots, and all of them provided a signature; see the sketch after this list)
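The last option can be sketched in PHP roughly as follows. This is a hedged sketch, not a complete solution: the user-agent substrings below are illustrative examples of signatures commonly sent by link-preview fetchers, and the list would have to be verified and maintained for the services you actually care about.
<?php
// Sketch of the "known bot signatures" approach. The signature list is an
// assumption for illustration and needs to be maintained by hand.
function isLikelyLinkPreviewBot(): bool
{
    $userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';

    $previewBotSignatures = [
        'Slackbot-LinkExpanding', // Slack unfurler
        'facebookexternalhit',    // Facebook / Messenger
        'Twitterbot',
        'TelegramBot',
        'WhatsApp',
        'Viber',
        'Discordbot',
        'LinkedInBot',
        'SkypeUriPreview',
    ];

    foreach ($previewBotSignatures as $signature) {
        if (stripos($userAgent, $signature) !== false) {
            return true;
        }
    }
    return false;
}

// Usage: only consume the one-time link for requests that do not look like previews.
if (!isLikelyLinkPreviewBot()) {
    // markLinkAsUsed($token); // hypothetical helper in your application
}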

I've looked all over the internet to solve this problem, and I've found some workarounds to verify whether the request is for link preview generation.
Then, I've created a tool to solve it. It's on GitHub:
https://github.com/brunoinds/link-preview-detector
You only need to call a single method from the class:
<?php
require('path/to/LinkPreviewOrigin.php');
$response = LinkPreviewOrigin::isForLinkPreview();
//true or false
I hope this solves your question!

Related

How to prevent file access from cURL using PHP [duplicate]

I have a webserver, and certain users have been retrieving my images using an automated script. I wish to redirect them to an error page or give them an invalid image, but only if it's a cURL request.
My image resides at http://example.com/images/AIDd232320233.png. Is there some way I can route it with .htaccess to my controller's index function, where I can check whether it's an authentic request?
And my other question: how can I check browser headers to distinguish between most likely authentic requests and ones done with a cURL request?
Unfortunately, the short answer is 'no.'
cURL provides all of the necessary options to "spoof" any browser. That is to say, more specifically, browsers identify themselves via specific header information, and cURL provides all of the tools to set header data in whatever manner you choose. So, directly distinguishing two requests from one another is not possible.*
*Without more information. Common methods to determine if there is a Live Human initiating the traffic are to set cookies during previous steps (attempts to ensure that the request is a natural byproduct of a user being on your website), or using a Captcha and a cookie (validate someone can pass a test).
The simplest is to set a cookie, which will really only ensure that bad programmers don't get through, or programmers who don't want to spend the time to tailor their scraper to your site.
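A minimal sketch of that cookie check, assuming an .htaccess rewrite routes image requests to a PHP script and that your normal pages set a visit cookie beforehand; the cookie name, file layout and placeholder image are assumptions for illustration:
<?php
// serve-image.php -- assumes a rewrite rule such as:
//   RewriteRule ^images/(.*)$ serve-image.php?file=$1 [L]
// and that regular page views set the "site_visit" cookie.

$file = basename($_GET['file'] ?? '');              // strip any path components
$path = __DIR__ . '/protected-images/' . $file;

if (!isset($_COOKIE['site_visit']) || !is_file($path)) {
    // No cookie (likely a bare cURL request) or unknown file: serve a placeholder.
    header('HTTP/1.1 403 Forbidden');
    header('Content-Type: image/png');
    readfile(__DIR__ . '/invalid.png');
    exit;
}

header('Content-Type: image/png');
readfile($path);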
The more tried and true approach is a Captcha, as it requires the user to interact to prove they have blood in their veins.
If the image is not a "download" but more of a piece of a greater whole (say, just an image on your site), a Captcha could be used to validate a human before giving them access to the site as a whole. Or if it is a download, it would be presented before unlocking the download.
Unfortunately, Captchas are "a pain," both to set up and for the end-user. They don't make a whole lot of sense for general-purpose access; they are a little overboard.
For general-purpose stuff, you can really only throttle IPs, set download limits and the like. And even there, you have nothing you can do if the requests are distributed. Them's the breaks, really...

What should I do in my webpage so that only a user from a web browser can request a page and post some data and not from user using Fiddler?

I recently created two pages, front-end.php and back-end.php.
front-end.php posts some data to back-end.php on mouse click (I am currently using AJAX for this).
Now if I use Fiddler, I am also able to post data to back-end.php. I do not want this to happen. What should I do?
I searched the Internet for an answer and came across the term 'setting the User-Agent', but the solution was not explained clearly.
Regarding what I want: I do not want some bot or other automated program to get data from some source and post it to my back-end.php. I want to ensure that the user comes to my webpage and then posts the data.
User-Agent is a header that your browser sends to the web server with each request, identifying itself. Here you can see what it is like.
Fiddler sends "*" or "Fiddler" as the user agent, so you can ignore requests having those values. However, this is far from an optimal solution to your problem because one can simply spoof the User-Agent header by sending whatever she likes.
Another non-secure condition would be to check the referer, so that you ignore all requests except those coming from front-end.php. Keep in mind that this, too, can be spoofed by the user.
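A hedged sketch of those two (weak) checks in PHP; both headers are trivially spoofed, so treat this as a nuisance filter at best:
<?php
// Reject requests that identify themselves as Fiddler or that do not claim
// to come from front-end.php. Both checks rely on client-supplied headers.
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
$referer   = $_SERVER['HTTP_REFERER'] ?? '';

$looksLikeFiddler = $userAgent === '*' || stripos($userAgent, 'Fiddler') !== false;
$cameFromFrontEnd = stripos($referer, 'front-end.php') !== false;

if ($looksLikeFiddler || !$cameFromFrontEnd) {
    http_response_code(403);
    exit('Forbidden');
}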
You should keep also in mind that since a user can send data to the web-server using her browser, there is nothing that can stop her from sending data or making requests using any other way.
In general, web developers should respect the user's freedom and not force such tactics, so please be more specific and tell us what exactly is the real problem you want to solve and a more elegant/secure solution may exist.
EDIT: If you do not want crawlers to index some/all of your pages you should add them in your robots.txt file.
Regarding all other automations/programs I'm afraid there is no perfect way to be sure if the request was made from a human being or a robot. I would do two things: a) Make sure to add validation rules to my backend and b) as a last resort implement a CAPTCHA test.
I would use a CAPTCHA only if absolutely necessary because it irritates most users and makes their lives difficult.
You should add a hash of some internal secret and the value you want to send. As you are the only one who knows how to make the hash, someone using Fiddler cannot know how to create it.
For instance, you make a hash of "asdflj8######GJlk" concatenated with whatever the value in your form is. Now the hacker cannot change the form. The problem is, the same value (with the same hash) can still be posted from another place. To stop this from happening, you should make sure all hashes can be used only once. The only thing a hacker can do now is post your request from Fiddler instead of from your site, but not at the same time. As a final step you can add something like a time limitation.
So what you need is a hash with:
a secret
a method to make the hash single-use
a method to make the hash time-limited
add this as a field. Specific implementation is left as an exercise ;)
I would not use the User-Agent; it can be easily faked.
(These methods are the same ones that payment providers use to ensure the data (e.g. the amount to be paid!) is not tampered with.)
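A minimal sketch of the three ingredients listed above (secret, single-use, time-limited), using an HMAC over the form value plus a timestamp and a one-time nonce. The secret value, the session-based nonce storage and the 5-minute window are assumptions for illustration:
<?php
const SECRET = 'asdflj8######GJlk'; // keep this server-side only

function makeToken(string $value): array
{
    $nonce     = bin2hex(random_bytes(16));
    $timestamp = time();
    $hash      = hash_hmac('sha256', $value . '|' . $nonce . '|' . $timestamp, SECRET);
    return ['nonce' => $nonce, 'timestamp' => $timestamp, 'hash' => $hash];
}

function verifyToken(string $value, string $nonce, int $timestamp, string $hash): bool
{
    if (session_status() !== PHP_SESSION_ACTIVE) {
        session_start();
    }
    $expected = hash_hmac('sha256', $value . '|' . $nonce . '|' . $timestamp, SECRET);

    $fresh   = (time() - $timestamp) < 300;              // time-limited
    $unused  = empty($_SESSION['used_nonces'][$nonce]);  // single-use
    $genuine = hash_equals($expected, $hash);            // made with the secret

    if ($fresh && $unused && $genuine) {
        $_SESSION['used_nonces'][$nonce] = true;         // burn the nonce
        return true;
    }
    return false;
}
Embed the three token parts as hidden fields in front-end.php and verify them in back-end.php before processing the POST.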
The shortest answer is that anything your browser can do, Fiddler can do. It can send any header it wants using any value it wants.
If your goal is to be able to pass some values from one page to another, without ANYONE changing them (either the browser or Fiddler) you use a Message Authentication Code (a signed hash of the data).
ASP.NET builds this feature in for their "ViewState" data; see http://msdn.microsoft.com/en-us/library/system.web.configuration.pagessection.enableviewstatemac(v=vs.110).aspx
However, that precludes the client (e.g. your JavaScript) from changing the values at all; if JavaScript can change the values, it means that it has the key, and if it has the key, so does Fiddler.

Is there any secure way to allow cross-site AJAX requests?

I am currently working on a script that website owners could install that would allow users to highlight a word and see the definition of the word in a small popup div. I am only doing this as a hobby in my spare time and have no intention of selling it or anything, but nevertheless I want it to be secure.
When the text is highlighted it sends an AJAX request to my domain to a PHP page that then looks up the word in a database and outputs a div containing the information. I understand that the same-origin policy prohibits me from accomplishing this with normal AJAX, but I also cannot use JSONP because I need to return HTML, not JSON.
The other option I've looked into is adding
header("Access-Control-Allow-Origin: *");
to my PHP page.
Since I really don't have much experience in security, being that I do this as a hobby, could someone explain to me the security risks in using Access-Control-Allow-Origin: * ?
Or is there a better way I should look into to do this?
Cross-Origin Resource Sharing (CORS), the specification behind the Access-Control-Allow-Origin header field, was established to allow cross-origin requests via XMLHttpRequest while still protecting users from malicious sites reading the response, by providing an interface that allows the server to define which cross-origin requests are allowed and which are not. So CORS is more than simply Access-Control-Allow-Origin: *, which denotes that XHR requests are allowed from any origin.
Now to your question: assuming that your service is public and doesn't require any authentication, using Access-Control-Allow-Origin: * to allow XHR requests from any origin is secure. But make sure to only send that header field in those scripts you want to allow that access policy.
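A hedged sketch of how that endpoint might send the header, either unconditionally (if the dictionary lookup really is public) or only for an allow-list of origins; the origin list is an assumption for illustration:
<?php
// definition.php -- public, unauthenticated lookup endpoint.
$allowedOrigins = ['https://example-blog.com', 'https://another-site.org']; // assumed list

$origin = $_SERVER['HTTP_ORIGIN'] ?? '';
if (in_array($origin, $allowedOrigins, true)) {
    header('Access-Control-Allow-Origin: ' . $origin);
    header('Vary: Origin');
}
// Alternatively, for a fully public endpoint:
// header('Access-Control-Allow-Origin: *');

// ... look up the word and echo the HTML snippet here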
"When the text is highlighted it sends an AJAX request to my domain to a PHP page that then looks up the word in a database and outputs a div containing the information. I understand that the same-origin policy prohibits me from accomplishing this with normal AJAX, but I also cannot use JSONP because I need to return HTML, not JSON."
As hek2mgl notes, JSONP would work fine for this. All you'd need to do is wrap your HTML in a JSONP wrapper, like this:
displayDefinition({"word": "example", "definition": "<div>HTML text...</div>"});
where displayDefinition() is a JS function that shows a popup with the given HTML code (and maybe caches it for later use).
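On the server side, a hedged sketch of what producing that JSONP response might look like in PHP; the callback name, parameter names and the lookup helper are assumptions for illustration:
<?php
// jsonp-definition.php -- wraps the HTML definition in the caller's callback.
$word     = $_GET['word'] ?? '';
$callback = preg_replace('/[^A-Za-z0-9_]/', '', $_GET['callback'] ?? 'displayDefinition');

$html = lookUpDefinitionHtml($word); // hypothetical DB lookup returning a <div> string

header('Content-Type: application/javascript');
echo $callback . '(' . json_encode(['word' => $word, 'definition' => $html]) . ');';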
"The other option I've looked into is adding header("Access-Control-Allow-Origin: *"); to my PHP page. Since I really don't have much experience in security, being that I do this as a hobby, could someone explain to me the security risks in using Access-Control-Allow-Origin: *?"
The risks are essentially the same as for JSONP; in either case, you're allowing any website to make arbitrary GET requests to your script (which they can actually do anyway) and read the results (which, using normal JSON, they generally cannot, although older browsers may have some security holes that can allow this). In particular, if a user visits a malicious website while being logged into your site, and if your site may expose sensitive user data through JSONP or CORS, then the malicious site could gain access to such data.
For the use case you describe, either method should be safe, as long as you only use it for that particular script, and as long as the script only does what you describe it doing (looks up words and returns their definitions).
Of course, you should not use either CORS or JSONP for scripts you don't want any website to access, like bank transfer forms. Such scripts, if they can modify data on the server, generally also need to employ additional defenses such as anti-CSRF tokens to prevent "blind" CSRF attacks where the attacker doesn't really care about the response, but only about the side effects of the request. Obviously, the anti-CSRF tokens themselves are sensitive user-specific data, and so should not be obtainable via CORS, JSONP or any other method that bypasses same-origin protections.
"Or is there a better way I should look into to do this?"
One other (though not necessarily better) way could be for your PHP script to return the definitions as HTML, and for the popups to consist of just an iframe element pointing to the script.
JSONP should fit your needs. It is a widely deployed web technique that aims to solve cross-domain issues. You should also know about CORS, which addresses some disadvantages of JSONP. The links I gave you also contain information about security considerations for these techniques.
You wrote:
but I also cannot use JSONP because I need to return HTML, not JSON.
Why not? You could use a JSONP response like this:
callback({'content':'<div class="myclass">...</div>'});
and then inject result.content into the current page using DOM manipulation.
The concept of CSRF (Cross-Site Request Forgery) may be your concern:
http://en.wikipedia.org/wiki/Cross-site_request_forgery
There are multiple ways to limit this issue; the most commonly used technique is a CSRF token.
Further, you should also put in an IP-based rate limiter, limiting the number of requests made from a certain IP, to mitigate DoS attacks in case you become a target. You can find some help in "How do I throttle my site's API users?"
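A minimal sketch of the CSRF-token technique mentioned above: store a random token in the session, embed it in the form, and reject posts that do not echo it back. The field and session key names are assumptions for illustration:
<?php
if (session_status() !== PHP_SESSION_ACTIVE) {
    session_start();
}

if (empty($_SESSION['csrf_token'])) {
    $_SESSION['csrf_token'] = bin2hex(random_bytes(32)); // per-session secret
}

// In the form markup, echo the token into a hidden field named "csrf_token".

// When handling the POST:
if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    $sent = $_POST['csrf_token'] ?? '';
    if (!hash_equals($_SESSION['csrf_token'], $sent)) {
        http_response_code(403);
        exit('Invalid CSRF token');
    }
    // ... process the request
}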
CORS issues are simple - do you want anyone to be able to remotely AJAX your stuff on your domain? This could be extremely dangerous if you've got forms that are prone to CSRF. Here is an example plucked straight out of my head.
The set-up:
A bank whose online banking server has CORS headers set to accept all (ACAO: *) (call it A)
A legitimate customer who is logged in (call them B)
A hacker who happens to be able to make the client run anything (call it E)
An A<->B conversation is deemed lawful. However, if the hacker can manage to make the mark (B) load a site with a bit of JS that can fire off AJAX requests (easy through permanent XSS flaws on big sites), he/she can get B to fire requests to A, which will be allowed and treated as normal requests!
You could do so many horrible things with this. Imagine that the bank has a form where the input are as follows:
POST:
* targetAccountID -> the account that will receive money
* money -> the amount to be transferred
If the mark is logged in, I could inject:
$.ajax({ url: "http://A/money.transfer.php", data: { targetAccountID: 123, money: 9999 } });
And suddenly, anyone who visits the site and is logged in to A will see their account drained of 9999 units.
THIS is why CORS is to be taken with a pinch of salt - in practice, DO NOT open more than you need to open. Open your API and that is it.
A cool side note, CORS does not work for anything before IE9. So you'll need to build a fallback, possibly iframes or JSONP.
I wrote about this very topic a short while back: http://www.sebrenauld.co.uk/en/index/view/access-json-apis-remotely-using-javascript in a slightly happier form than Wikipedia, by the way. It's a topic I hold dear to my heart, as I've had to contend with API development a couple of times.

Sharing Sessions with 302 Redirects/IMG SRC/ JSON-P and implications with Google SEO/Pagerank or Other Problems

I am currently researching the best way to share the same session across two domains (for a shared shopping cart / shared account feature). I have decided on two of three different approaches:
Every 15 minutes, send a one time only token (made from a secret and user IP/user agent) to "sync the sessions" using:
img src tag
<img src="http://domain-two.com/sessionSync.png?token=urlsafebase64_hash">
This displays an empty 1x1 pixel image and starts a remote session with the same session ID on the remote server. The png is actually a PHP script with some mod_rewrite action.
Drawbacks: what if images are disabled?
A succession of 302 redirect headers (almost the same as above, just sending the token using 302s instead):
redirect to domain-2.com/sessionSync.php?token=urlsafebase64_hash
then from domain-2.com/sessionSync, set(or refresh) the session and redirect back to domain-1.com to continue original request.
Question: What does Google think about this in terms of SEO/PageRank? Will their bots have issues crawling my site properly? Will they think I am trying to trick the user?
Drawbacks: 3 requests before a user gets a page load, which is slower than the IMG technique.
Advantages: Almost always works?
use jsonp to do the same as above.
Drawbacks: won't work if JavaScript is disabled. I am avoiding this option particularly because of this.
Advantages: callback function on success may be useful (but not really in this situation)
My questions are:
What will Google think of using 302s as stated in example 2 above? Will they punish me?
What do you think the best way is?
Are there any security considerations introduced by any of these methods?
Am I not realizing something else that might cause problems?
Thanks for all the help in advance!
Just some ideas:
You could use the JSONP approach and use the <noscript> tag to fall back to the 302-chain mode.
You won't find a lot of JS-disabled clients among the human part of your web clients.
But the web crawlers will mostly fall into the 302-chain mode, and if you care about them you could implement some user-agent checks in sessionSync to give them specific instructions, for example a 301 permanent redirect. Your session synchronisation needs are maybe not valid for web crawlers; maybe you can redirect them permanently (so only the first time) without handling any specific session synchronisation for them. It depends on your implementation of this 302 chain, but you could as well set something in the crawler's session to let it crawl domain-1 without any check on domain-2, since that depends on the URLs you generate on the page, and you could keep something in the session to prevent the domain-2 redirect during URL generation.
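A hedged sketch of what that crawler special-casing in sessionSync.php could look like; the bot pattern, the token helpers and the redirect targets are assumptions for illustration:
<?php
// sessionSync.php -- sketch only; isValidSyncToken() and
// sharedSessionIdFromToken() are hypothetical helpers in your application.
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
$isCrawler = (bool) preg_match('/googlebot|bingbot|slurp/i', $userAgent);

if ($isCrawler) {
    // Crawlers get a single permanent redirect and no session gymnastics.
    header('Location: https://domain-1.com/', true, 301);
    exit;
}

$token = $_GET['token'] ?? '';
if (isValidSyncToken($token)) {
    session_id(sharedSessionIdFromToken($token)); // adopt the shared session ID
    session_start();
}

// Send the human visitor back with a temporary redirect.
header('Location: ' . ($_GET['return'] ?? 'https://domain-1.com/'), true, 302);
exit;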

How to identify curl request

Is there a way to detect in my script whether the request is coming from a normal web browser or from some script executing cURL? I can see the headers and can distinguish with the User-Agent and a few other headers, but fake headers can be set in cURL, so I am not able to track the request.
Please suggest ways of identifying cURL or other similar non-browser requests.
The only way to catch most "automated" requests is to code in logic that spots activity that couldn't possibly be human with a browser.
For example: hitting pages too fast, filling out a form too fast, or never fetching an external resource referenced in the HTML file (like a fake CSS file served through a PHP script); you can check whether the requesting IP downloaded it in the previous stage of your site (kind of like a reverse honeypot). You would need to exclude certain IPs/user agents from being blocked, though, otherwise you'll block Google's web spiders, etc.
This is probably the only way of doing it if curl (or any other automated script) is faking its headers to look like a browser.
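A hedged sketch of that fake-CSS honeypot: pages reference a stylesheet that is really a PHP script recording the visitor's IP, and later requests from IPs that never fetched it are treated as likely bots. The flat-file storage is purely for illustration, and exclusions for known crawlers are omitted:
<?php
// honeypot.css.php -- linked from every page as <link rel="stylesheet" href="honeypot.css.php">
file_put_contents(
    __DIR__ . '/seen-ips.txt',
    $_SERVER['REMOTE_ADDR'] . PHP_EOL,
    FILE_APPEND | LOCK_EX
);
header('Content-Type: text/css');
echo '/* intentionally empty */';

// Elsewhere, before serving a protected page:
function ipFetchedHoneypot(string $ip): bool
{
    $seen = @file(__DIR__ . '/seen-ips.txt', FILE_IGNORE_NEW_LINES) ?: [];
    return in_array($ip, $seen, true);
}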
Strictly speaking, there is no way.
Although there are indirect techniques, I would never discuss them in public, especially on a site like Stack Overflow, which encourages screen scraping, content swiping, autoposting and all this dirty roboting stuff.
In some cases you can use CAPTCHA test to tell a human from a bot.
As far as I know, you can't see the difference between a "real" call from your browser and one from cURL.
You can compare the header (User-Agent), but that's all I know.
