How do bookmarks work? - php

I'm interested in how bookmarks work for social network sites like Facebook, for example: when you look at someone's profile it's
www.facebook.com/customname
or, if they haven't made one yet, it's
www.facebook.com/generatedname
Is there a GET request somewhere I'm missing? Is
www.facebook.com/profile.php?key=
hidden in the URL? But how does the server know to interpret the URL to look for someone's profile page? How does it work? Thanks!

Yes, the request is usually hidden using rewrite engines such as mod_rewrite.
As such something like facebook.com/customname is rewritten to facebook.com/profile.php?key=customname, which then internally looks up the correct profile page from the database.
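As a rough illustration (the rewrite rule, script name, and database layout below are hypothetical, not Facebook's actual setup), the server-side half could look something like this:

<?php
// Hypothetical .htaccess rule that forwards pretty URLs to profile.php:
//   RewriteEngine On
//   RewriteCond %{REQUEST_FILENAME} !-f
//   RewriteRule ^([a-zA-Z0-9_.-]+)$ profile.php?key=$1 [L,QSA]
//
// profile.php then receives the custom name as an ordinary GET parameter.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass'); // assumed credentials

$key = $_GET['key'] ?? '';

// Look up the profile for that custom name.
$stmt = $pdo->prepare('SELECT * FROM profiles WHERE custom_name = ?');
$stmt->execute([$key]);
$profile = $stmt->fetch();

if ($profile === false) {
    http_response_code(404);
    exit('No such profile');
}

// ...render the profile page...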

There is a solution called mod_rewrite, which translates the URL visited by the user (and visible to the user) into the path of the script that actually handles it (along with all the parameters).
Example: when you visit e.g. http://www.facebook.com/ben, the server may internally translate it into www.facebook.com/profile.php?name=ben without you noticing it (because it happens on the server side).
That is how it is done.
But there is another, loosely related technique that happens on the client side (within the user's browser, not on the server). It is called pushState and it is an HTML5 feature (HTML5 is the newer standard supporting application-like behaviour in modern browsers).
Just look at a pushState demonstration (it lets a page change the URL and go back and forth, but if you type the visited URL directly you will see that there is nothing on the server at that path). To build something similar, you will need to learn JavaScript (the language of scripts executed on the browser's side).
As an alternative to pushState, some pages (like Twitter and, as far as I recall, Facebook) use solutions based on the location hash (the part of the URL after #), which lets them maintain compatibility with some deprecated browsers, like IE7 etc.
Maybe this is far more than your question asked, but you should now be pretty well informed about how the URL visible to the user may differ from what is really invoked.
If you have any additional questions, let me know.

They probably use .htaccess or a similar mechanism to redirect all requests to a single entry file. That file starts processing the request and can also check whether there is an account for the customname specified in the URL.
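A minimal sketch of such a single entry file, assuming a catch-all rewrite rule (e.g. RewriteRule ^ index.php [L]) sends every request to it; the file names and the lookup helper are placeholders:

<?php
// index.php - hypothetical front controller; all requests are routed here.

function accountExists(string $name): bool {
    // Placeholder: a real app would query the database here.
    return in_array($name, ['alice', 'bob'], true);
}

$path = trim(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH), '/');

if ($path === '') {
    require 'home.php';          // home page
} elseif (accountExists($path)) {
    require 'profile.php';       // /customname -> show that account's profile
} else {
    http_response_code(404);
    require '404.php';
}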

Related

Detect when request is for preview generation

We have certain action links which are one time use only. Some of them do not require any action from the user other than viewing them. And here comes the problem: when you share one in, say, Viber, Slack or anything else that generates a preview of the link (or unfurls it, as Slack calls it), it gets counted as used, since it was requested.
Is there a reliable way to detect these preview-generating requests solely via PHP? And if it's possible, how does one do that?
Not possible with 100% accuracy in PHP alone, as it deals with HTTP requests, which are quite abstracted from the client. Strictly speaking you cannot even guarantee that the user has actually seen the response, even though it was legitimately requested by the user.
The options you have:
use checkboxes like "I've read this" (violates no-action requirement)
use javascript to send "I've read this" request without user interaction (violates PHP alone requirement)
rely on cookies: redirect the user with a Set-Cookie header, then redirect back to show the content and mark the URL as consumed (still not 100% guaranteed, and may result in infinite redirects for bots that follow 302 redirects but do not persist cookies)
rely on request headers (could work if you had a finite list of supported bots and all of them provide a signature; see the sketch below)
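A rough sketch of that last option, assuming a hand-maintained list of preview-bot User-Agent signatures (the list below is illustrative, not exhaustive):

<?php
// Hypothetical helper: guess whether the request comes from a link-preview bot
// by matching the User-Agent against known signatures.
function looksLikePreviewBot(): bool {
    $ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
    // Illustrative signatures only; real bots change and new ones appear.
    $signatures = ['Slackbot', 'facebookexternalhit', 'TelegramBot', 'WhatsApp', 'Viber', 'Twitterbot'];
    foreach ($signatures as $sig) {
        if (stripos($ua, $sig) !== false) {
            return true;
        }
    }
    return false;
}

if (looksLikePreviewBot()) {
    // Serve the page, but do not mark the one-time link as consumed.
}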
I've looked all over the internet to solve this problem and found some workarounds to verify whether a request is for link preview generation.
Then I created a tool to solve it. It's on GitHub:
https://github.com/brunoinds/link-preview-detector
You only need to call a single method from the class:
<?php
require 'path/to/LinkPreviewOrigin.php'; // adjust the path to wherever the class file lives
$response = LinkPreviewOrigin::isForLinkPreview();
// true or false
I hope this solves your problem!

URL shortener services: final link gets hit by the API

I have used URL shortener services, such as goo.gl or bit.ly, to shorten long URLs in my applications using their respective APIs. These APIs are very convenient; unfortunately, I have noticed that the long URL gets hit when they shorten it. Let me explain the issue a bit. Let's say, for instance, that I want users to validate something (such as an email address, or a confirmation) and propose to them in my application a link to visit in order to validate it. I take this long URL and use the API to shorten it. The target link (a PHP script, for example) gets hit when I call the shorten API, which makes the validation process useless.
One solution would be to make an intermediate button on the target page which the user has to click to confirm, but that adds another step to the validation process, which I would like to keep simple.
I would like to know if anyone has already encountered this problem or has a clue how to solve it.
Thanks for any help.
I can't speak to Google but at Bitly we crawl a portion of the URLs shortened via our service to support various product features (spam checking, title fetching, etc) which is the cause of the behavior you are seeing.
In this type of situation we make two recommendations:
Use robots.txt to mark relevant paths as "disallowed". This is a light form of protection as there's nothing forcing clients to respect robots.txt but well behaved bots like BitlyBot or GoogleBot will respect your robots.txt file.
As mentioned by dwhite.me in a comment and as you acknowledged in your post, it is usually best not to perform any state-changing actions in response to GET requests (a sketch follows below). As always, there's a judgement call on the risks involved vs the added complexity of a safer approach.
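A minimal sketch of that second recommendation, where the emailed GET link only shows a confirmation form and the actual validation happens on POST (the file name and helper are hypothetical):

<?php
// validate.php (hypothetical): GET shows a confirm button, POST performs the change.

function markTokenConsumed(string $token): void {
    // Placeholder: a real app would flag the token as used in its database.
}

$token = $_REQUEST['token'] ?? '';

if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    // Only now is the token consumed / the account activated.
    markTokenConsumed($token);
    echo 'Thanks, your address is confirmed.';
} else {
    // Bots that merely fetch the URL (shortener crawlers, preview bots)
    // stop here and never trigger the state change.
    echo '<form method="post">';
    echo '  <input type="hidden" name="token" value="' . htmlspecialchars($token) . '">';
    echo '  <button type="submit">Confirm my email address</button>';
    echo '</form>';
}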

How to determine whether real users are browsing my site or just crawlers, in PHP

I want to know whether a user is actually looking at my site (I know the page may just be loaded by the browser and displayed, without a human actually looking at it).
I know of two methods that should work.
JavaScript.
If the page is loaded by a browser, it will run the JS code automatically (unless the browser forbids it), which can then use AJAX to call back to the server.
A 1×1 transparent image in the HTML.
Use an img tag to call back to the server.
Does anyone know the pitfalls of these methods, or any better method?
Also, I don't know how to detect a 0×0 or 1×1 iframe, which could defeat the above methods.
A bot can drive a real browser, e.g. http://browsershots.org, and it can request that 1x1 image.
In short, there is no real way to tell. The best you could do is use a CAPTCHA, but then it degrades the experience for humans.
Just use a CAPTCHA where required (user sign up, etc).
The image way seems better, as Javascript might be turned off by normal users as well. Robots generally don't load images, so this should indeed work. Nonetheless, if you're just looking to filter a known set of robots (say Google and Yahoo), you can simply check for the HTTP User Agent header, as those robots will actually identify themselves as being a robot.
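For reference, a minimal tracking-pixel endpoint combining the image beacon with a simple User-Agent filter might look like this (the file name, bot pattern, and logging are all placeholders):

<?php
// pixel.php (hypothetical): referenced as <img src="pixel.php?page=..."> in the HTML.
$ua = $_SERVER['HTTP_USER_AGENT'] ?? '';

// Skip logging for self-identifying crawlers (illustrative pattern only).
if (!preg_match('/bot|crawler|spider|slurp/i', $ua)) {
    // Placeholder for real logging (database, analytics, etc.).
    file_put_contents('views.log', date('c') . ' ' . ($_GET['page'] ?? '-') . PHP_EOL, FILE_APPEND);
}

// Return a 1x1 transparent GIF.
header('Content-Type: image/gif');
header('Cache-Control: no-store');
echo base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');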
You can create a Google Webmasters account; it tells you how to configure your site for bots and also shows how a robot will read your website.
I agree with others here, this is really tough - generally nice crawlers will identify themselves as crawlers, so using the User-Agent is a pretty good way to filter out those guys. A good source for user agent strings can be found at http://www.useragentstring.com. I've used Chris Schuld's PHP script (http://chrisschuld.com/projects/browser-php-detecting-a-users-browser-from-php/) to good effect in the past.
You can also filter these guys at the server level using the Apache config or .htaccess file, but I've found that to be a losing battle keeping up with it.
However, if you watch your server logs you'll see lots of suspect activity with valid (browser) user-agents or funky user-agents so this will only work so far. You can play the blacklist/whitelist IP game, but that will get old fast.
Lots of crawlers do load images (i.e. Google image search), so I don't think that will work all the time.
Very few crawlers have Javascript engines, so that is probably a good way to differentiate them. And let's face it, how many users actually turn off Javascript these days? I've seen the stats on that, but I think those stats are very skewed by the sheer number of crawlers/bots out there that don't identify themselves. However, a caveat is that I have seen that the Google bot does run Javascript now.
So, bottom line, it's tough. I'd go with a hybrid strategy for sure - if you filter using user-agent, images, IP and Javascript I'm sure you'll get most bots, but expect some to get through despite that.
Another idea: you could always use a known Javascript browser quirk to test whether the reported user-agent (if it's a browser) really is that browser.
"Nice" robots like those from google or yahoo will usually respect a robots.txt file. Filtering by useragent might also help.
But in the end - if someone wants to gain automated access it will be very hard to prevent that; you should be sure it is worth the effort.
Inspect the User-Agent header of the http request.
Crawlers should set this to anything but a known browser.
Here are the Googlebot headers: http://code.google.com/intl/nl-NL/web/controlcrawlindex/docs/crawlers.html
In PHP you can get the user agent with:
$Uagent = $_SERVER['HTTP_USER_AGENT'];
Then you just compare it with the known headers.
As a tip, preg_match() could be handy to do this all in a few lines of code, for example:
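A short sketch along those lines (the pattern is illustrative, not a complete bot list):

<?php
$Uagent = $_SERVER['HTTP_USER_AGENT'] ?? '';

// Illustrative pattern only; extend it with whatever crawlers you care about.
if (preg_match('/Googlebot|Bingbot|Slurp|DuckDuckBot|Baiduspider/i', $Uagent)) {
    // Treat as a crawler.
} else {
    // Treat as a (probable) human visitor.
}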

Showing my website in a user's language

I have a small search engine site and I was wondering if there was any way of displaying my site in the users language. I am looking for an inventive and quick way that can also reside on just one URL.
I hope you can understand my question.
You could use the HTTP header "Accept-Language" to detect which languages the user has chosen as their preferred ones in their browser.
In PHP, this will be available (if sent by the browser) in $_SERVER, which is an array that contains (amongst other things) HTTP headers sent by the client.
This specific header should be available as $_SERVER['HTTP_ACCEPT_LANGUAGE'].
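A small sketch of picking a supported language from that header (the supported list and the 'en' fallback are assumptions):

<?php
// Assumed: the site has translations for these languages, with 'en' as fallback.
$supported = ['en', 'de', 'fr', 'es'];
$header = $_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? '';

$lang = 'en';
// The header looks like "de-DE,de;q=0.9,en;q=0.8" - take the first supported tag.
foreach (explode(',', $header) as $part) {
    $code = strtolower(substr(trim(explode(';', $part)[0]), 0, 2));
    if (in_array($code, $supported, true)) {
        $lang = $code;
        break;
    }
}
// $lang now holds the language to render the page in.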
I am assuming you already have different versions of the site in various languages. Most sites seem to just ask the user what their language is and then save that in a cookie. You can probably guess a user's language using an IP-to-location tool.
You are probably more interested in this, though: http://techpatterns.com/downloads/php_language_detection.php. This PHP script allows you to detect the user's language based on info sent by their browser. It might not be completely accurate, though, so you should always have an option to switch the language.
If you don't have translations of your page, you can redirect users to a google translate page.
There is a really easy solution for this. Just use Google's Translate Elements JS addon. You drop the JS on the page and Google takes care of the rest.
http://translate.google.com/translate_tools
The only downside is that they cannot fully interact with the site using this. By that I mean they cannot input something in their own language and you get back the input in yours. Also searches will have to be done in the sites native language. So really this just depends on what you are trying to accomplish here.
You could use a script which checks for a language cookie.
If the language cookie is set, you can use its value to load the right language variables;
if not, you find out the user's current language in whatever way you prefer. I think there are lots of ways; I don't know which is best.
Additionally, you would place a form somewhere on the site where the user can click a language, and submit it by POST to a script which then sets a cookie (or overwrites the current cookie, if there already is one).
This method obviously works with one URL for all your languages, which I think is quite nice about it...
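A rough sketch of that cookie approach (the cookie name, the allowed language list, and the file layout are all assumptions):

<?php
$allowed = ['en', 'de', 'fr'];               // assumed set of available translations

if (isset($_POST['lang']) && in_array($_POST['lang'], $allowed, true)) {
    // User picked a language in the form: remember it for a year.
    setcookie('lang', $_POST['lang'], time() + 365 * 24 * 3600, '/');
    $lang = $_POST['lang'];
} elseif (isset($_COOKIE['lang']) && in_array($_COOKIE['lang'], $allowed, true)) {
    $lang = $_COOKIE['lang'];
} else {
    $lang = 'en';                            // or detect via Accept-Language, IP, etc.
}

// Load the matching language variables (file layout assumed).
$strings = include "lang/{$lang}.php";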

Sharing Sessions with 302 Redirects/IMG SRC/ JSON-P and implications with Google SEO/Pagerank or Other Problems

I am currently researching the best way to share the same session across two domains (for a shared shopping cart / shared account feature). I have decided on two of three different approaches:
Every 15 minutes, send a one-time-only token (made from a secret and the user's IP/user agent) to "sync the sessions" using one of the following (a token sketch follows the three options):
img src tag
img src="http://domain-two.com/sessionSync.png?token=urlsafebase64_hash"
This displays an empty 1x1 pixel image and starts a remote session with the same session ID on the remote server. The png is actually a PHP script with some mod_rewrite action.
Drawbacks: what if images are disabled?
A succession of 302 redirect headers (almost the same as above, just sending the token using 302s instead):
redirect to domain-2.com/sessionSync.php?token=urlsafebase64_hash
then from domain-2.com/sessionSync, set (or refresh) the session and redirect back to domain-1.com to continue the original request.
Question: What does Google think about this in terms of SEO/PageRank? Will their bots have issues crawling my site properly? Will they think I am trying to trick the user?
Drawbacks: 3 requests before a user gets a page load, which is slower than the IMG technique.
Advantages: Almost always works?
Use JSONP to do the same as above.
Drawbacks: won't work if Javascript is disabled. I am avoiding this option particularly because of this.
Advantages: callback function on success may be useful (but not really in this situation)
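As an aside, the one-time token mentioned in option 1 could be built as an HMAC over the user's IP and user agent plus a timestamp, roughly like this (the secret, the field layout, and the validation window are all assumptions):

<?php
// Shared secret known to both domains (assumption: distributed out of band).
const SYNC_SECRET = 'change-me';

// Build a token that is only valid for ~15 minutes and only for this client.
function makeSyncToken(string $sessionId): string {
    $payload = implode('|', [
        $sessionId,
        $_SERVER['REMOTE_ADDR'],
        $_SERVER['HTTP_USER_AGENT'] ?? '',
        time(),
    ]);
    $sig = hash_hmac('sha256', $payload, SYNC_SECRET);
    // URL-safe base64 of payload plus signature.
    return rtrim(strtr(base64_encode($payload . '|' . $sig), '+/', '-_'), '=');
}

// On domain-two.com the token is decoded, the HMAC is recomputed and compared
// with hash_equals(), and the timestamp is checked against a 15-minute window
// before the session is started with the embedded session ID.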
My questions are:
What will google think of using 302's as stated in example 2 above? Will they punish me?
What do you think the best way is?
Are there any security considerations introduced by any of these methods?
Am I not realizing something else that might cause problems?
Thanks for all the help in advance!
Just some ideas:
You could use the JSONP approach and use the <noscript> tag to fall back to the 302-chain mode.
You won't find many JS-disabled clients among the human part of your web clients.
But web crawlers will mostly fall into the 302-chain mode, and if you care about them you could implement some user-agent checks in sessionSync to give them specific instructions - for example, a 301 permanent redirect. Your session synchronisation needs may not apply to web crawlers, so maybe you can redirect them permanently (so only the first time) without handling any session synchronisation for them. It depends on your implementation of the 302 chain, but you could also set something in the crawler's session to let it crawl domain-1 without any check on domain-2; that depends on the URLs you generate on the page, and you could keep something in the session to prevent the domain-2 redirect during URL generation.
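A sketch of that user-agent branch inside a hypothetical sessionSync.php (the bot pattern and the redirect URLs are placeholders):

<?php
// sessionSync.php on domain-2.com (hypothetical)
$ua = $_SERVER['HTTP_USER_AGENT'] ?? '';

if (preg_match('/Googlebot|Bingbot|Slurp/i', $ua)) {
    // Crawlers do not need session sync: send them on permanently, once.
    header('Location: https://domain-1.com/', true, 301);
    exit;
}

// Human visitors: validate the token (see the HMAC sketch in the question above),
// adopt the shared session ID via session_id()/session_start(), then bounce back.
$token = $_GET['token'] ?? '';
// ...token validation and session handling would go here...
header('Location: https://domain-1.com/', true, 302);
exit;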
