I would like to track all views of a page using PHP and MySQL. I will be tracking the number of times a person viewed the page and their IP address, along with the current date. However, is there a way to make sure you're tracking actual users rather than bots/spiders?
Two options that I see:
1: Create a "hidden" link on your home page to a honey pot. Anyone who hits the honey pot page should be considered a bot and not included in your stats.
2: Not a foolproof way, but you could compare the browser's User-Agent string to a whitelist of known web browsers. This string can be spoofed, so it's not the most reliable.
Personally, I'd go with the first option.
For the honey pot:
On your home page, I'd add something like this:
<a href="/honeypot.php" style="display: none;">ReallyNotATrap</a>
and on the honey pot page itself something like this:
$botIp = $_SERVER['REMOTE_ADDR'];
// open DB connection (PDO assumed)
$stmt = $pdo->prepare('INSERT INTO BlackList (ip, date) VALUES (?, NOW())');
$stmt->execute([$botIp]); // plus any other data you care about logging
// close DB connection
Then for your stats code, simply compare every user's IP to the BlackList table. If the user isn't on it, record the stats.
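A minimal sketch of that comparison (PDO and the BlackList table from above are assumed):
$ip = $_SERVER['REMOTE_ADDR'];
$stmt = $pdo->prepare('SELECT 1 FROM BlackList WHERE ip = ?');
$stmt->execute([$ip]);
if (!$stmt->fetchColumn()) {
    // IP is not blacklisted: record the page view here
}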
EDIT
As pointed out below, googlebot can get tricked by this. If this matters to you (if you're just filtering for your own stats and not filtering content, it shouldn't), include your honeypot page in your robots.txt. Google will read the file and avoid the trap; other, nastier bots will still fall into it. Since Google will avoid our trap, I would also use option 2 and filter out Google's User-Agent string from the stats.
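For example, a minimal robots.txt entry (assuming the honeypot lives at the hypothetical path /honeypot.php):
User-agent: *
Disallow: /honeypot.php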
The number of real users should basically be the total number of visitors minus the bots. If you want, you can check the User-Agent header, which will tell you who is browsing the site.
You could try out my tracking script; it's pretty simple to implement, and bots and spiders will show up with a bunk browser string, so it's easy to weed them out. I use this on all my company's sites for analytics. There's one caveat though: if you use this for keyword tracking, you may be disappointed real soon, because Google is starting to change the structure of their query strings for logged-in users.
https://github.com/k4t434sis/tracking.php
Related
I have built a pretty basic advertisement manager for a website in PHP.
I say basic because it's not complex like Google or Facebook ads, or even most high-end ad servers. It doesn't handle payments or user targeting.
It serves the purpose for my low-traffic site, though: simply show a random banner ad, and count impressions and clicks.
Features:
Ad slot/position on page
Banner image
Name
View/impression counter
Click counter
Start and end date, or never ending
Disable/enable ad
I want to gradually add more functionality to the system, though.
One thing I have noticed is that the impressions/views counter often seems inflated.
I believe the cause is social networks' spiders and bots, as well as search engine spiders.
For example, if someone enters a URL from a page on my website into Facebook, Google+, Twitter, LinkedIn, Pinterest, or another network, those sites will often spider my site to gather the webpage's title, images, and description.
I would really like to stop these from counting as advertisement impressions/views when an actual human is not viewing the page.
I realize it will be very hard to detect all of these, but if there is a way to catch a majority of them, at least it will make my stats a little more accurate.
So I am reaching out for any help or ideas on how to achieve my goal. Please do not say to use another advertisement system; that is not in the cards. Thank you.
You need to serve the ads with JavaScript. That's the only way to avoid most of the crawlers. Only browsers load dependencies like images, JS, and CSS; 99% of the robots avoid them.
You can also do this:
// basic crawler detection and block script (no legit browser should match this)
if (!empty($_SERVER['HTTP_USER_AGENT']) && preg_match('~(bot|crawl)~i', $_SERVER['HTTP_USER_AGENT'])) {
    // this is a crawler and you should not show ads here
}
You'll have much better stats this way. Use JS for ads.
PS: You could also try setting a cookie in JS and later checking for it. Crawlers might accept cookies sent by PHP over HTTP, but for cookies set in JS there's a 99.9% chance they'll miss them, because they would need to load and interpret a JS file. Only browsers do that.
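A minimal sketch of that cookie idea (the cookie name js_check is hypothetical):
// On every page, emit a tiny script that sets a cookie from JavaScript.
echo "<script>document.cookie = 'js_check=1; path=/';</script>";
// On later requests, count the impression only if the JS-set cookie came back;
// crawlers that never execute JS will never send it.
if (isset($_COOKIE['js_check'])) {
    // likely a real browser: record the ad impression here
}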
You could do something like this:
There is a good list of crawlers in text format here: http://www.robotstxt.org/db/all.txt
Assume you've collected all of the user agents from that file, lowercased, in an array called $botList:
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? strtolower($_SERVER['HTTP_USER_AGENT']) : null;
if ($ua && in_array($ua, $botList)) {
    // this is probably a bot
}
Of course, the user agent can easily be changed or may sometimes be missing, but search engines like Google and Yahoo are honest about themselves.
A crawler will download robots.txt even if it doesn't respect it, if only out of curiosity. This is a good indication you might be dealing with one, although it's not definitive.
You can also detect a crawler if it visits a huge number of links in a very short time, though this can be quite complicated to do in code.
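A rough sketch of such a rate check (the hits table with ip and hit_at columns is hypothetical):
$stmt = $pdo->prepare('SELECT COUNT(*) FROM hits WHERE ip = ? AND hit_at > NOW() - INTERVAL 10 SECOND');
$stmt->execute([$_SERVER['REMOTE_ADDR']]);
if ($stmt->fetchColumn() > 20) {
    // more than 20 pages in 10 seconds: probably a crawler
}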
But these approaches are only worthwhile if you can't or don't want to run JavaScript. Otherwise, go with CodeAngry's answer.
Edit: In response to @keune's answer: you could keep all the visitor IPs, run them through the list in a cron job, and then publish the updated visitor count.
Try this:
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (preg_match("/^(Mozilla|Opera|PSP|Bunjalloo|wii)/i", $ua) && !preg_match("/bot|crawl|crawler|slurp|spider|link|checker|script|robot|discovery|preview/i", $ua)) {
    // It's not a bot
} else {
    // It's a bot
}
At the moment, I'm working on a website that could use some extra usability, so I want to launch a couple of modal windows to aid users on their first visit to a couple of pages.
I want to check whether it is a user's first time viewing a specific page. I've read about how you can run into problems when using cookies to do this: they can be deleted, the user can use a different PC or device, etc.
Also, I want to check whether it's their first time viewing multiple pages, not only directly after login.
I'm guessing a good idea for this would be to make a separate table with the pages I need, setting a boolean for each once it is viewed. Something like the sketch below is what I have in mind.
Would this be the best way of going about doing this?
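(Table and column names here are hypothetical:)
// Has this logged-in user seen this page before?
$stmt = $pdo->prepare('SELECT 1 FROM page_views WHERE user_id = ? AND page = ?');
$stmt->execute([$userId, 'profile']);
if (!$stmt->fetchColumn()) {
    // first view: show the modal, then record the view
    $pdo->prepare('INSERT INTO page_views (user_id, page) VALUES (?, ?)')->execute([$userId, 'profile']);
}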
There isn't a highly reliable way of doing that:
You can use cookies, but as you said, they are not reliable: a user can change PCs, delete cookies, change browsers, etc.
You can try using the IP address, but that's also not reliable. If a user switches addresses (which today can happen as you walk down the street with your mobile phone), they'll see the page over and over again. Moreover, if some other user happens to stumble upon the IP address the first user used, they won't see your tour/tutorial.
What I suggest is that you use cookies to detect whether the user is new, but don't automatically throw the help modules at them; prompt them with a non-obtrusive toolbar at the top or bottom (never a popup window or lightbox).
That way you get most of the users (because many people use the same browser and computer and rarely delete all their cookies), and even a user who has deleted their cookies won't be disturbed that much.
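A minimal cookie-based sketch of that idea (the cookie name seen_tour is hypothetical):
if (!isset($_COOKIE['seen_tour'])) {
    // first visit, as far as cookies can tell: show the non-obtrusive toolbar
    setcookie('seen_tour', '1', time() + 60 * 60 * 24 * 365, '/');
    $showTour = true;
} else {
    $showTour = false;
}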
There is no reliable approach unless the user is registered and logs in with a username and password.
As mentioned before, there is no reliable way of detecting users (or detecting whether a user is visiting the site for the first time). I also recommend Madara Uchiha's approach; you could additionally use HTML5 local storage on top of cookies, though neither is 100% reliable.
You can, however, try user recognition without relying on cookies or HTML5 storage, but this is extremely complicated; you don't want to do this.
Just to satisfy your curiosity about how to do this, check this epic answer on a related question:
User recognition without cookies or local storage
I believe there is no complete solution. A possible approach consists of several parameters that first have to be defined; only by considering them together can we talk about what is and isn't possible.
My parameters are below:
treat "user detection" as a set of features of a web page, and detail them
look at reactions on the page (for example, how quickly a user clicks on elements)
inspect elements
URL injection
other reactions, such as clicks on specific spots placed on the page
time spent on the page, compared against a threshold defined for checking and authorizing
and so on, with solutions along the lines of the ones above.
I'm currently working on my referral system, but I have a problem with protecting it from fraud.
Okay, here's how it works for now:
user registers and activates their account
the user now has access to the control panel, where their unique link appears in the following format: domain.tld/ref/12345
when someone else clicks the user's link, they must click a specific button to confirm it is not some kind of fraud (like "click here, you'll get $100" or something)
the system writes the visitor's IP to a database, and some data to cookies, to prevent re-pressing the button. The user now has +1 point.
But the problem is that a visitor can change their IP, clear their cookies, and hit the button again. It takes a few seconds, and that's not OK; that's cheating.
How do I prevent this? Is there some trick to get a unique computer ID, or something that can't be changed so easily?
Really, the only options are to tie the process to something which is not so easily manipulated by the user: super cookies, browser fingerprints, OpenID, email addresses, and telephone numbers (the latter two using some sort of validation step before a vote is counted).
The only way you can be certain a referred party does not reuse a referral code is for the original user to send a different one-time-use-only referral URL to each person. Once the code has been used, it is flagged as such in (or removed entirely from) your database so that it cannot be used again.
How you prevent the original user from sending multiple links to the same person is another matter, and not an easy one to resolve.
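A rough sketch of one-time tokens (the referral_tokens table and its columns are hypothetical; random_bytes assumes PHP 7+):
// issue a token
$token = bin2hex(random_bytes(16));
$pdo->prepare('INSERT INTO referral_tokens (user_id, token, used) VALUES (?, ?, 0)')->execute([$userId, $token]);
// redeem it: the UPDATE can only succeed once per token
$stmt = $pdo->prepare('UPDATE referral_tokens SET used = 1 WHERE token = ? AND used = 0');
$stmt->execute([isset($_GET['token']) ? $_GET['token'] : '']);
if ($stmt->rowCount() === 1) {
    // valid, unused token: credit the referrer
}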
Who do you perceive to be the threat?
Although it's certainly not 100% accurate, you can still fingerprint visitors using, for example, a combination of their IP and browser user agent; with some JavaScript you can even go for screen size or installed fonts. Using these pieces of information, you can set up a system where you save the fingerprints in a data table and, in the same record, store the session ID (from the cookie). When a new visitor arrives, test their fingerprint against the recent fingerprints with different visitor IDs. If you find a large number of matching fingerprints (you define the threshold) with different sessions, you can raise an alert for possible fraud.
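A minimal server-side sketch of such a fingerprint (the exact signals combined here are just an example):
$fingerprint = sha1(
    (isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : '') . '|' .
    (isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '') . '|' .
    (isset($_SERVER['HTTP_ACCEPT_LANGUAGE']) ? $_SERVER['HTTP_ACCEPT_LANGUAGE'] : '')
);
// store $fingerprint alongside session_id(), then count distinct sessions per fingerprint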
Cheers
How about storing the link with the user when they navigate to it? Then in the database you will have the link, and if the user has already been to it, you deny them. Seems like it could work, and then you wouldn't have to worry about the cookies, etc.
I want to create a private URL of the form
http://domain.com/content.php?secret_token=XXXXX
Then, only visitors who have the exact URL (e.g. received by email) can see the page. We check $_GET['secret_token'] before displaying the content.
My problem is that if, by any chance, search bots find the URL, they will simply index it and the URL will be public. Is there a practical method to avoid bot visits and subsequent indexing?
Possible but unfavorable methods:
Login system (e.g. a PHP session): but I do not want to require user login.
Password-protected folder: the problem is as above.
Using robots.txt: many search engine bots do not respect it.
What you are talking about is security through obscurity. It's never a good idea. If you must, I would offer these thoughts:
Make the link expire (see the sketch after this list)
Lock the link to the class C or class D range of the IP it was first accessed from
Have the page challenge the user with something like a logic question before forwarding to the real page with a time-sensitive token (a 2-step process), and if the challenge fails, send a 404 back so the crawler stops.
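A minimal sketch of the expiring link (the secret_tokens table and its columns are hypothetical):
$token = isset($_GET['secret_token']) ? $_GET['secret_token'] : '';
$stmt = $pdo->prepare('SELECT 1 FROM secret_tokens WHERE token = ? AND expires_at > NOW()');
$stmt->execute([$token]);
if (!$stmt->fetchColumn()) {
    http_response_code(404); // expired or unknown token: look like a missing page
    exit;
}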
Try generating a 5-6 character alphanumeric password and attaching it to the email, so even if robots spider the URL, they need the password to access the page. (Just an extra safety measure.)
If there is no link to it (including that the folder has no index view), a robot won't find it.
You could return a 404 if the token is wrong. This way, a robot (and anyone else who doesn't have the token) will think there is no such page.
As long as you don't link to it, no spider will pick it up. And, since you don't want any password protection, the link is going to work for everyone. Consider disabling the secret key after it is used.
You only need to tell the search engines not to index /content.php; search engines that honor robots.txt won't index any pages that start with /content.php.
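For example:
User-agent: *
Disallow: /content.php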
Leaving the link unpublished will be OK in most circumstances...
...However, I will warn you that the prevalence of browser toolbars (Google's and Yahoo's come to mind) changes the game. One company I worked for had pages from their intranet indexed in Google. You could search for the page and a few results came up, but you couldn't access them unless you were inside our firewall or VPN'd in.
We figured the only way those links could have propagated to Google was through the toolbar. (If anyone has a better explanation, I'd love to hear it...) I've been out of that company a while now, so I don't know if they ever figured out definitively what happened there.
I know, strange but true...
I currently have a website built in PHP, I'm hoping to build a referral system tonight.
My theory is that I could dynamically generate a URL and place it on each user's homepage, such as
"Referral url = www.mysite.co.uk/referral.php?user=myuser"
Then I could have a script in referral.php which gets the username and runs an SQL query updating the corresponding row in my table.
The only thing is, anybody could then add their own name and sign up multiple accounts.
What is the best way to go about building something like this?
Thanks
Suppose that to get the referral URL, my users had to click a button which generates it with a random value, i.e. mysite.com/refer.php?user=234234, at the same time storing it in the db.
Once somebody visits refer.php, the referrer gets his credits or benefit added to his row in the db, and at the same time his referral code is set to 0, making the code usable only once.
Each time he hits the button on his page, his referral code would change.
Would this be valid, do you think?
You generally have the right idea, but protecting against fraud in a referral system is difficult. You can check the visitor's IP address in $_SERVER and add it to the database for each request, throwing away duplicates or limiting referrals from IP addresses that don't come from the user who signed up. Like HTTP_REFERER, this can easily be spoofed as well (using Tor, for example).
It's a tough problem that isn't something you can truly "solve." Like most fraud cases, you can only do your best to mitigate the effect.
EDIT TO ADD: You can also require referrals to "mature" by forcing the referred user to be active on the site for a defined period of time (say, 30 days), to increase the effort required of spammers/cheaters. But again, this doesn't "solve" the problem; all you can do is make it tougher for them to game the system. And occasionally, by doing this stuff, you can ruin the user experience. So how do you balance it? Tough question. :)
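A rough sketch of such a maturation check, run on a schedule (the referrals table and its columns are hypothetical):
// credit only referrals whose referred user signed up 30+ days ago and is still active
$pdo->prepare(
    'UPDATE referrals SET credited = 1
     WHERE credited = 0
       AND signup_date < NOW() - INTERVAL 30 DAY
       AND last_active > NOW() - INTERVAL 7 DAY'
)->execute();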
EDIT TO ADDRESS YOUR EDIT: Contemplate the following scenarios if we implement your plan:
1) I click the button and get my code. Then I paste it to... whom, exactly? One person? That's not particularly good for sharing on Facebook, MySpace (does this still exist?), or my personal blog. I have to generate a referral code for EVERY person I send to the site? That not only scales terribly, but is a horrible user experience as well.
2) Let's say I figure out what you're doing. I develop a bot that clicks that button 4 trillion times. What now?
You could use the 'HTTP_REFERER' value to check what page the request came from, and only allow votes for that user from their own page. This can be spoofed, but most people won't know how.
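A rough sketch (the URL pattern is hypothetical, and the header may be absent or forged):
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
if (strpos($referer, 'mysite.co.uk/user/' . $userId) !== false) {
    // the request plausibly came from the referring user's own page
}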
You can award the referral points for some additional action on your site, for example registration, payment, or anything that is behind a captcha check.
But do not put any obstacles in front of your regular, legit users!
You could embed this information in the POST data; it's a tad more secure.
Or you can add a restriction that any user may be upvoted by any other EXISTING user only once. And to make it more "secure", generate user IDs with a random seed.