Stop Facebook probing a site for content with PHP

Okay, so when you post a link on Facebook, it does a quick scan of the page to find images and text etc. to create a sort of preview on their site. I'm sure other social networks such as Twitter do much the same, too.
Anyway, I created a sort of "one time message" system, but when you create a message and send the link in a chat on Facebook, it probes the page and renders the message as "seen".
I know that the Facebook probe uses the user agent facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php), so I could just block all requests from anything with that user agent, but I was wondering: is there a more efficient way of achieving this for all sites that "probe" links for content?

No, there's no fool-proof way to do this. The easiest way to achieve something like this is to manually block certain visitors from marking the content as seen.
Every entity on the web identifies itself with a user agent. Not every non-human entity identifies itself in a unique way, but there are online databases like this one that can help you achieve your goal.
As for trying to block all bots via robots.txt: not every bot respects that standard. I would speculate that Facebook visits any shared link in order to prevent malware from being spread across their network.

You could try something like this in your robots.txt file:
User-agent: *
Disallow: /
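If you prefer to handle it in PHP rather than relying on robots.txt (which, as noted above, not every bot respects), a minimal sketch could look like this. The list of user-agent substrings is just an example, and mark_message_seen() is a hypothetical stand-in for whatever your one-time-message code does:

<?php
// Skip "burning" the one-time message for known crawler user agents.
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$botPatterns = array('facebookexternalhit', 'Facebot', 'Twitterbot', 'LinkedInBot', 'Slackbot');

$isBot = false;
foreach ($botPatterns as $pattern) {
    if (stripos($agent, $pattern) !== false) {
        $isBot = true;
        break;
    }
}

if (!$isBot) {
    // Only a real visitor marks the message as seen.
    mark_message_seen($messageId); // hypothetical function from your own code
}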

Related

Tracking visitors coming from Facebook

I have written a PHP-based blog for the company I work for, not using any frameworks. I am having trouble tracking users who come from my Facebook page's posts to my blog (not WordPress).
I have created a shortlink URL. Let's say it is sample.co and it redirects traffic to sample.com. Everything seems fine until here. The problem starts here.
I am logging all users' IPs and user agents. But even if I get 500 visits, my code adds something like 3,000 visits. Facebook stats and Analytics show similar numbers (~500 visits). I see that the IPs added to MySQL are all different. It usually happens with Android users. I have read somewhere that Facebook sometimes renders the actual URL to their users when it shows the post; I mean instead of the widget, Facebook shows the whole page. I am not quite sure about that, to be honest.
To solve this problem, I created a jQuery script on my page that listens for the user's scroll event. It worked great, and I am no longer seeing too much traffic. But this time the problem is that I am counting fewer users: even if I get 500 users from Facebook and Analytics shows similar results, my script adds only 200-300 to MySQL.
Does anyone know a better way to track real traffic? Or are you aware of such a problem?
Thanks
It should be filtered on the basis of user agent.
https://developers.facebook.com/docs/sharing/webmasters/crawler
how to detect search engine bots with php?
Identifying users through IP is a good idea, but if the user's IP keeps changing, it's better to use cookies.
http://php.net/manual/en/function.uniqid.php
If the cookie does not exist, treat it as a new user.
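A rough sketch of the cookie idea, assuming you only want to count each visitor once (the cookie name, lifetime, and record_visit() function are all made up for the example):

<?php
// Count a visitor only the first time we see them.
if (!isset($_COOKIE['visitor_id'])) {
    $visitorId = uniqid('', true);
    // Cookie valid for 30 days; must be sent before any output.
    setcookie('visitor_id', $visitorId, time() + 60 * 60 * 24 * 30, '/');
    // Treat this request as a new, unique visitor.
    record_visit($visitorId, $_SERVER['REMOTE_ADDR'], $_SERVER['HTTP_USER_AGENT']);
}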
I have found the answer. The problem is called preview (prefetch). Here is the link:
https://www.facebook.com/business/help/1514372351922333
Simply put, Facebook preloads everything when it shows the thumbnail to the visitor, in order to speed up your page's load time. It sends an X-Purpose: preview header, so you can simply check whether the HTTP_X_PURPOSE header's value is preview. If it is, do not count the request as a visitor.
Here are more detailed descriptions:
http://inchoo.net/dev-talk/mitigating-facebook-x-fb-http-engine-liger/
http://inchoo.net/dev-talk/magento-website-hammering-facebook-liger/
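Based on that, a minimal check could look like the sketch below. The X-Purpose header is the one described above; the extra X-FB-HTTP-Engine: Liger check is an assumption taken from the Inchoo articles linked here:

<?php
// Serve the page to Facebook's prefetcher, but don't log it as a real visit.
$purpose  = isset($_SERVER['HTTP_X_PURPOSE']) ? $_SERVER['HTTP_X_PURPOSE'] : '';
$fbEngine = isset($_SERVER['HTTP_X_FB_HTTP_ENGINE']) ? $_SERVER['HTTP_X_FB_HTTP_ENGINE'] : '';

$isPrefetch = (strcasecmp($purpose, 'preview') === 0) || (strcasecmp($fbEngine, 'Liger') === 0);

if (!$isPrefetch) {
    record_visit(); // hypothetical: your own tracking code
}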

Can I stop robots from ruining statistics in Google Analytics?

I use Google Analytics to get visitor statistics on my website (PHP), and I see that a lot of traffic comes from sites like share-buttons.xyz, traffic2cash.xyz and top1-seo-service.com. I think this is because I use SEO-friendly URLs (for looks in the address bar).
This is not really a problem for the site itself, but when I look at the statistics in Google Analytics they include these robots and non-users, so the numbers are not accurate.
Is there a way to block these robots or do I have to subtract the robots visits from the statistics manually every time I want a report?
If you see this happening, you can prospectively exclude them from all future reports in GA by using a filter on that view (Admin > Filters, create the filter, then apply it to the specific view).
If you specifically want to do it proactively using PHP, then you could use some regex to match undesirable referrers in the request headers and return nothing.
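A hedged sketch of that approach, using the spam domains from the question as the example pattern list (referrer spam that is injected directly into Google Analytics never reaches your server, so this only catches bots that actually visit):

<?php
// Return nothing to requests whose referrer matches known spam domains.
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';

if (preg_match('/(share-buttons\.xyz|traffic2cash\.xyz|top1-seo-service\.com)/i', $referer)) {
    http_response_code(403);
    exit;
}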
The answer to the main question is yes, but it requires you to be persistent; it is basically an ongoing task that you will need to perform. Yes, I know it is a pain.
Just so you know, this has nothing to do with PHP or your friendly URLs: your website is a victim of what is known as ghost referrals. Google has not publicly said anything on the topic, but just recently I found this article reporting that Google has finally found a solution here.
However, I choose to be sceptical about this. In the meantime, this is what you need to do:
Make sure to leave one view untouched, without any filters (read the fourth paragraph).
In Google Analytics > Admin > View > View Settings, check "Exclude all hits from known bots and spiders", like this.
In the same view, block spam bots: a) check the list of ghost referrals in YOUR REPORT following this method, and b) create a filter like this.
I recommend reading this article in full; it contains lots of details and more information.
Some people like to create filters with regex listing all the spammy bots; if you want to check an up-to-date list, visit this repository.

What is proper way to block bots/crawlers hitting link/page?

I am working on analytics and I am getting many inaccurate results, mostly because of either social media bots or other random bots like BufferBot, DataMinr, etc. from Twitter.
Is there any web API/database of all known bots available which I can use to check whether a visitor is a bot or a human?
Or is there any good way to block such bots so that they don't affect the stats in terms of analytics?
You can link to a hidden page that is blocked by robots.txt. When visited, it captures the user agent and IP address of the bot and then appends one or both of them to a .htaccess file, which blocks them permanently. It only catches bad bots and is automated, so you don't have to do anything to maintain it.
Just make sure you set up the robots.txt file first, and then give the good bots a fair chance to read it and update their crawling accordingly.
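A rough sketch of that trap page, assuming it lives at a URL that is disallowed in robots.txt (trap.php is a made-up name) and that appending to .htaccess from PHP is acceptable in your setup:

<?php
// trap.php: linked from a hidden link and disallowed in robots.txt,
// so only bots that ignore robots.txt should ever reach it.
$ip    = $_SERVER['REMOTE_ADDR'];
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : 'unknown';

// Log the offender, then deny its IP from now on.
file_put_contents(__DIR__ . '/bad-bots.log', date('c') . " $ip $agent\n", FILE_APPEND);
file_put_contents(__DIR__ . '/.htaccess', "Deny from $ip\n", FILE_APPEND);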
Create a file called robots.txt in your root and add the following lines:
User-agent: *
Disallow: /
There is no way to outright block ALL bots; it would take an insane amount of time. You could use a .htaccess file or a robots.txt. Stopping Google from indexing the site is easy, but blocking bot traffic can get complicated and act like a house of cards.
I suggest using this list of crawlers/web-bots http://www.robotstxt.org/db.html

How to know if user came from a Facebook link?

If there is a link posted on Facebook to my website and a user follows it, I would like to display custom content using PHP. I tried the following method...
$_SERVER['HTTP_REFERER']
Facebook must block this feature, because it is not working. Is there a method for this that actually works with Facebook?
This news is a year old, yet I see they are still using it. I post it here because it is still pretty informative. The source of the quote is https://www.facebook.com/note.php?note_id=10151070897728920
Restricting the Referrer
We still need to let the websites you navigate to know the traffic is from Facebook, but we also want to prevent them from reading the full source url. Otherwise, they could know where on the site you were when you clicked their link. In order to strike this balance, we've taken advantage of a new feature called the meta referrer, currently available in Chrome 17+ and Safari 6+. This allows us to specify how much of the source url to share with the external site via the Referer header. If you're using one of these supported browsers you can take advantage of this new feature. Otherwise, your browser will be routed to the slightly slower older system.
This change should reduce the impact of the link shim on your browsing (especially when accessing Facebook from a cellular network) and should help save around a second for a typical user.
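In practice this means that the Referer header, when it is sent at all, should still contain a facebook.com host even though the full source URL is stripped. A minimal check along those lines (keeping in mind the referrer is never guaranteed to be present):

<?php
// Show custom content when the visitor appears to come from Facebook.
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
$host    = parse_url($referer, PHP_URL_HOST);

if ($host && preg_match('/(^|\.)facebook\.com$/i', $host)) {
    // Visitor arrived via a Facebook link: show the custom content here.
}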

How to programmatically "press" a 'Like' button through a Facebook Application?

I'm developing this Facebook Application and I was wondering if it's possible (and how) to programmatically, through the Facebook PHP Graph API, press some 'Like' button on some page?
Of course, this is optional on my application... I'm still not ready to really explain what application I'm doing, but it would be interesting to code such a feature.
Is it possible somehow?
By your description it sounds like you're trying to get a user to like something without the user knowingly clicking a Like button. This sort of interaction is not condoned by Facebook, I think. There are various black-hat ways to accomplish this, though; one fairly elaborate one is described here: http://www.liquidrhymes.com/2010/08/25/smoking-hot-bartender-is-some-smoking-hot-facebook-spam/
UPDATE: Sorry, I might be wrong. If you get the stream_publish extended permission from the user, you might be able to like posts on their behalf by doing a POST to /POST_ID/likes. See Publishing to Facebook in http://developers.facebook.com/docs/api
You cannot do this. Facebook won't let you do a POST to /POST_ID/likes; you can only do a GET request to retrieve likes. What you are trying to do is a violation of Facebook's TOS. I would suggest just adding a Like button and "forcing" them to like before they continue with your application. However, in my opinion even that is kind of silly, because they can instantly go and unlike it after they have used your application.
I was looking for the same thing, but not to force a user into liking something; it was actually for their own protection.
Here is where I come from: on a web site (maybe on multiple pages) there is an "I Like" button, implemented as described by Facebook.
Each time a user goes to that page, the browser makes a request to Facebook, through the iframe that contains the button, providing all the information we are used to seeing in a web server log file.
If the user has logged in to Facebook in the past and not cleared the cache, the request will also contain the cookie identifying the Facebook user.
So, even more than analytics, Facebook knows all about the user's activity on those pages.
I wanted the user to only give this information when they decide to.
My solution was to have a button (as a graphic only) on the page. When the user clicks it, a new frame opens, and only there is the Facebook code executed.
Obviously, in the new frame I could not put the normal "I Like" code, since that would require a second click from the user. At this point I would need the "programmatic clicking of the I Like button".
It is not an Open Graph solution, but it works: the frame just does a redirect to
http://www.facebook.com/share.php?u=URL
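So the PHP behind that frame can be as simple as a redirect to the share page; the $url value here is just a hypothetical page to share:

<?php
// Redirect the newly opened frame to Facebook's share page.
$url = 'http://example.com/page-to-share'; // hypothetical target page
header('Location: https://www.facebook.com/share.php?u=' . urlencode($url));
exit;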
