Advanced affiliate tracking techniques - php

I'm building an open source affiliate tracking system to release as a Codeigniter Spark and to incorporate into a few of my own projects. Anyway, at a basic level this is very simple stuff. This is how I'm tracking referrals:
User is referred to a specific endpoint (e.g. /ref/293203 with 293203 being the affiliate id) that sets a cookie, sets a session, and stores information about the user (their ip address and user agent). In this case, even if cookies are disabled, I can still match them either by sessions (which don't have to be cookie based) or by useragent + ip match. There are some other edge cases which this wouldn't cover, and I was hoping some of you might be able to provide some insight:
User deletes cookies and revisits later from a different browser & ip address (ie, they check out something for the first time on their iPhone, but end up purchasing it from home on their computer)
IP address and user agents match, but this user isn't actually unique (for example, I've worked from an incubator where all 40 people in the building are using the same static IP and many are using the same browser)
Any ideas on how to address those edge cases? Any others which I have not considered?

I've written a number of affiliate tracking engines from scratch as well as helped build a number that are used by major affiliate networks, and unfortunately you have to either rely on cookies or use your 2nd other option. However i would recommend including more then just the IP address and user agent match. I have heard of some people using which plugins are installed in the browser, which language the browser is in etc. There have been studies done that I cant find a link for at the moment you can achieve a very high level of accuracy using a combination of browser metrics.
That being said i would not recommend relying on PHP sessions to track users as it will no longer work if you end up having enough traffic that will require additional servers where the session stored on one, will not be stored in the other (also the fact that PHP sessions rely on cookies anyway, so you may as well use your own).
Your other case you mentioned about deleting cookies or using a different device to complete the transaction are unfortunately impossible to tie together without some sort of unifying system that will let you associate multiple devices/browsers etc. Its always known to networks that there will be a untraceable drop off that is usually worked into the pricing structure that affiliates are paid out. Ie maybe you can expect that 15% of sales will not be able to be tied to an affiliate properly and that can be either extra money for you, or you can bump up your payout. (as you would expect, most just eat the profit =)
There is another method i will share a link for, I've seen it talked about many times however i have never seen a proof of concept showing that it will work using the caching ETAG header. http://kuza55.blogspot.com/2007/05/tracking-users-with-cache-data.html
Hope this helps, and good luck its a fun thing to write!

Checkout https://github.com/samyk/evercookie and I cannot WAIT until you release your spark.

Related

Is there any reliable way to identify the user machine in a unique way? [duplicate]

I need to figure out a way uniquely identify each computer which visits the web site I am creating. Does anybody have any advice on how to achieve this?
Because i want the solution to work on all machines and all browsers (within reason) I am trying to create a solution using javascript.
Cookies will not do.
I need the ability to basically create a guid which is unique to a computer and repeatable, assuming no hardware changes have happened to the computer. Directions i am thinking of are getting the MAC of the network card and other information of this nature which will id the machine visiting the web site.
Introduction
I don't know if there is or ever will be a way to uniquely identify machines using a browser alone. The main reasons are:
You will need to save data on the users computer. This data can be
deleted by the user any time. Unless you have a way to recreate this
data which is unique for each and every machine then your stuck.
Validation. You need to guard against spoofing, session hijacking, etc.
Even if there are ways to track a computer without using cookies there will always be a way to bypass it and software that will do this automatically. If you really need to track something based on a computer you will have to write a native application (Apple Store / Android Store / Windows Program / etc).
I might not be able to give you an answer to the question you asked but I can show you how to implement session tracking. With session tracking you try to track the browsing session instead of the computer visiting your site. By tracking the session, your database schema will look like this:
sesssion:
sessionID: string
// Global session data goes here
computers: [{
BrowserID: string
ComputerID: string
FingerprintID: string
userID: string
authToken: string
ipAddresses: ["203.525....", "203.525...", ...]
// Computer session data goes here
}, ...]
Advantages of session based tracking:
For logged in users, you can always generate the same session id from the users username / password / email.
You can still track guest users using sessionID.
Even if several people use the same computer (ie cybercafe) you can track them separately if they log in.
Disadvantages of session based tracking:
Sessions are browser based and not computer based. If a user uses 2 different browsers it will result in 2 different sessions. If this is a problem you can stop reading here.
Sessions expire if user is not logged in. If a user is not logged in, then they will use a guest session which will be invalidated if user deletes cookies and browser cache.
Implementation
There are many ways of implementing this. I don't think I can cover them all I'll just list my favorite which would make this an opinionated answer. Bear that in mind.
Basics
I will track the session by using what is known as a forever cookie. This is data which will automagically recreate itself even if the user deletes his cookies or updates his browser. It will not however survive the user deleting both their cookies and their browsing cache.
To implement this I will use the browsers caching mechanism (RFC), WebStorage API (MDN) and browser cookies (RFC, Google Analytics).
Legal
In order to utilize tracking ids you need to add them to both your privacy policy and your terms of use preferably under the sub-heading Tracking. We will use the following keys on both document.cookie and window.localStorage:
_ga: Google Analytics data
__utma: Google Analytics tracking cookie
sid: SessionID
Make sure you include links to your Privacy policy and terms of use on all pages that use tracking.
Where do I store my session data?
You can either store your session data in your website database or on the users computer. Since I normally work on smaller sites (let than 10 thousand continuous connections) that use 3rd party applications (Google Analytics / Clicky / etc) it's best for me to store data on clients computer. This has the following advantages:
No database lookup / overhead / load / latency / space / etc.
User can delete their data whenever they want without the need to write me annoying emails.
and disadvantages:
Data has to be encrypted / decrypted and signed / verified which creates cpu overhead on client (not so bad) and server (bah!).
Data is deleted when user deletes their cookies and cache. (this is what I want really)
Data is unavailable for analytics when users go off-line. (analytics for currently browsing users only)
UUIDS
BrowserID: Unique id generated from the browsers user agent string. Browser|BrowserVersion|OS|OSVersion|Processor|MozzilaMajorVersion|GeckoMajorVersion
ComputerID: Generated from users IP Address and HTTPS session key.
getISP(requestIP)|getHTTPSClientKey()
FingerPrintID: JavaScript based fingerprinting based on a modified fingerprint.js. FingerPrint.get()
SessionID: Random key generated when user 1st visits site. BrowserID|ComputerID|randombytes(256)
GoogleID: Generated from __utma cookie. getCookie(__utma).uniqueid
Mechanism
The other day I was watching the wendy williams show with my girlfriend and was completely horrified when the host advised her viewers to delete their browser history at least once a month. Deleting browser history normally has the following effects:
Deletes history of visited websites.
Deletes cookies and window.localStorage (aww man).
Most modern browsers make this option readily available but fear not friends. For there is a solution. The browser has a caching mechanism to store scripts / images and other things. Usually even if we delete our history, this browser cache still remains. All we need is a way to store our data here. There are 2 methods of doing this. The better one is to use a SVG image and store our data inside its tags. This way data can still be extracted even if JavaScript is disabled using flash. However since that is a bit complicated I will demonstrate the other approach which uses JSONP (Wikipedia)
example.com/assets/js/tracking.js (actually tracking.php)
var now = new Date();
var window.__sid = "SessionID"; // Server generated
setCookie("sid", window.__sid, now.setFullYear(now.getFullYear() + 1, now.getMonth(), now.getDate() - 1));
if( "localStorage" in window ) {
window.localStorage.setItem("sid", window.__sid);
}
Now we can get our session key any time:
window.__sid || window.localStorage.getItem("sid") || getCookie("sid") || ""
How do I make tracking.js stick in browser?
We can achieve this using Cache-Control, Last-Modified and ETag HTTP headers. We can use the SessionID as value for etag header:
setHeaders({
"ETag": SessionID,
"Last-Modified": new Date(0).toUTCString(),
"Cache-Control": "private, max-age=31536000, s-max-age=31536000, must-revalidate"
})
Last-Modified header tells the browser that this file is basically never modified. Cache-Control tells proxies and gateways not to cache the document but tells the browser to cache it for 1 year.
The next time the browser requests the document, it will send If-Modified-Since and If-None-Match headers. We can use these to return a 304 Not Modified response.
example.com/assets/js/tracking.php
$sid = getHeader("If-None-Match") ?: getHeader("if-none-match") ?: getHeader("IF-NONE-MATCH") ?: "";
$ifModifiedSince = hasHeader("If-Modified-Since") ?: hasHeader("if-modified-since") ?: hasHeader("IF-MODIFIED-SINCE");
if( validateSession($sid) ) {
if( sessionExists($sid) ) {
continueSession($sid);
send304();
} else {
startSession($sid);
send304();
}
} else if( $ifModifiedSince ) {
send304();
} else {
startSession();
send200();
}
Now every time the browser requests tracking.js our server will respond with a 304 Not Modified result and force an execute of the local copy of tracking.js.
I still don't understand. Explain it to me
Lets suppose the user clears their browsing history and refreshes the page. The only thing left on the users computer is a copy of tracking.js in browser cache. When the browser requests tracking.js it recieves a 304 Not Modified response which causes it to execute the 1st version of tracking.js it recieved. tracking.js executes and restores the SessionID that was deleted.
Validation
Suppose Haxor X steals our customers cookies while they are still logged in. How do we protect them? Cryptography and Browser fingerprinting to the rescue. Remember our original definition for SessionID was:
BrowserID|ComputerID|randomBytes(256)
We can change this to:
Timestamp|BrowserID|ComputerID|encrypt(randomBytes(256), hk)|sign(Timestamp|BrowserID|ComputerID|randomBytes(256), hk)
Where hk = sign(Timestamp|BrowserID|ComputerID, serverKey).
Now we can validate our SessionID using the following algorithm:
if( getTimestamp($sid) is older than 1 year ) return false;
if( getBrowserID($sid) !== createBrowserID($_Request, $_Server) ) return false;
if( getComputerID($sid) !== createComputerID($_Request, $_Server) return false;
$hk = sign(getTimestamp($sid) + getBrowserID($sid) + getComputerID($sid), $SERVER["key"]);
if( !verify(getTimestamp($sid) + getBrowserID($sid) + getComputerID($sid) + decrypt(getRandomBytes($sid), hk), getSignature($sid), $hk) ) return false;
return true;
Now in order for Haxor's attack to work they must:
Have same ComputerID. That means they have to have the same ISP provider as victim (Tricky). This will give our victim the opportunity to take legal action in their own country. Haxor must also obtain HTTPS session key from victim (Hard).
Have same BrowserID. Anyone can spoof User-Agent string (Annoying).
Be able to create their own fake SessionID (Very Hard). Volume atacks won't work because we use a time-stamp to generate encryption / signing key so basically its like generating a new key for each session. On top of that we encrypt random bytes so a simple dictionary attack is also out of the question.
We can improve validation by forwarding GoogleID and FingerprintID (via ajax or hidden fields) and matching against those.
if( GoogleID != getStoredGoodleID($sid) ) return false;
if( byte_difference(FingerPrintID, getStoredFingerprint($sid) > 10%) return false;
These people have developed a fingerprinting method for recognising a user with a high level of accuracy:
https://panopticlick.eff.org/static/browser-uniqueness.pdf
We investigate the degree to which modern web browsers
are subject to “device fingerprinting” via the version and configuration information that they will transmit to websites upon request. We
implemented one possible fingerprinting algorithm, and collected these
fingerprints from a large sample of browsers that visited our test side,
panopticlick.eff.org. We observe that the distribution of our finger-
print contains at least 18.1 bits of entropy, meaning that if we pick a
browser at random, at best we expect that only one in 286,777 other
browsers will share its fingerprint. Among browsers that support Flash
or Java, the situation is worse, with the average browser carrying at least
18.8 bits of identifying information. 94.2% of browsers with Flash or Java
were unique in our sample.
By observing returning visitors, we estimate how rapidly browser fingerprints might change over time. In our sample, fingerprints changed quite
rapidly, but even a simple heuristic was usually able to guess when a fingerprint was an “upgraded” version of a previously observed browser’s
fingerprint, with 99.1% of guesses correct and a false positive rate of only
0.86%.
We discuss what privacy threat browser fingerprinting poses in practice,
and what countermeasures may be appropriate to prevent it. There is a
tradeoff between protection against fingerprintability and certain kinds of
debuggability, which in current browsers is weighted heavily against privacy. Paradoxically, anti-fingerprinting privacy technologies can be self-
defeating if they are not used by a sufficient number of people; we show
that some privacy measures currently fall victim to this paradox, but
others do not.
It's not possible to identify the computers accessing a web site without the cooperation of their owners. If they let you, however, you can store a cookie to identify the machine when it visits your site again. The key is, the visitor is in control; they can remove the cookie and appear as a new visitor any time they wish.
A possibility is using flash cookies:
Ubiquitous availability (95 percent of visitors will probably have flash)
You can store more data per cookie (up to 100 KB)
Shared across browsers, so more likely to uniquely identify a machine
Clearing the browser cookies does not remove the flash cookies.
You'll need to build a small (hidden) flash movie to read and write them.
Whatever route you pick, make sure your users opt IN to being tracked, otherwise you're invading their privacy and become one of the bad guys.
There is a popular method called canvas fingerprinting, described in this scientific article: The Web Never Forgets:
Persistent Tracking Mechanisms in the Wild. Once you start looking for it, you'll be surprised how frequently it is used. The method creates a unique fingerprint, which is consistent for each browser/hardware combination.
The article also reviews other persistent tracking methods, like evercookies, respawning http and Flash cookies, and cookie syncing.
More info about canvas fingerprinting here:
Pixel Perfect: Fingerprinting Canvas in HTML5
https://en.wikipedia.org/wiki/Canvas_fingerprinting
You may want to try setting a unique ID in an evercookie (it will work cross browser, see their FAQs):
http://samy.pl/evercookie/
There is also a company called ThreatMetrix that is used by a lot of big companies to solve this problem:
http://threatmetrix.com/our-solutions/solutions-by-product/trustdefender-id/
They are quite expensive and some of their other products aren't very good, but their device id works well.
Finally, there is this open source jquery implementation of the panopticlick idea:
https://github.com/carlo/jquery-browser-fingerprint
It looks pretty half baked right now but could be expanded upon.
Hope it helps!
There is only a small amount of information that you can get via an HTTP connection.
IP - But as others have said, this is not fixed for many, if not most Internet users due to their ISP's dynamic allocation policies.
Useragent String - Nearly all browsers send what kind of browser they are with every request. However, this can be set by the user in many browsers today.
Collection of request fields - There are other fields sent with each request, such as supported encodings, etc. These, if used in the aggregate can help to ID a user's machine, but again are browser dependent and can be changed.
Cookies - Setting a cookie is another way to identify a machine, or more specifically a browser on a machine, but as others have said, these can be deleted, or turned off by the users, and are only applicable on a browser, not a machine.
So, the correct response is that you cannot achieve what you would live via the HTTP over IP protocols alone. However, using a combination of cookies, as well as IP, and the fields in the HTTP request, you have a good chance at guessing, sort of, what machine it is. Users tend to use only one browser, and often from one machine, so this may be fairly relieable, but this will vary depending on the audience...techies are more likely to mess with this stuff, and use more machines/browsers. Additionally, this could even be coupled with some attempt to geo-locate the IP, and use that data as well. But in any case, there is no solution that will be correct all of the time.
There are flaws with both cookie and non-cookie approaches. But if you can forgive the shortcomings of the cookie approach, here's an idea.
If you're already using Google Analytics on your site, then you don't need to write code to track unique users yourself. Google Analytics does that for you via the __utma cookie value, as described in Google's documentation. And by reusing this value you're not creating additional cookie payload, which has efficiency benefits with page requests.
And you could write some code easily enough to access that value, or use this script's getUniqueId() function.
As with the previous solutions cookies are a good method, be aware that they identify browsers though. If I visited a website in Firefox and then in Internet Explorer cookies would be stored for both attempts seperately. Some users also disable cookies (but more people disable JavaScript).
Another method to consider would be I.P. and hostname identification (be aware these can vary for dial-up/non-static IP users, AOL also uses blanket IPs). However since this only identifies networks this might not work as well as cookies.
The suggestions to use cookies aside, the only comprehensive set of identifying attributes available to interrogate are contained in the HTTP request header. So it is possible to use some subset of these to create a pseudo-unique identifier for a user agent (i.e., browser). Further, most of this information is possibly already being logged in the so-called "access log" of your web server software by default and, if not, can be easily configured to do so. Then, a utlity could be developed that simply scans the content of this log, creating fingerprints of each request comprised of, say, the IP address and User Agent string, etc. The more data available, even including the contents of specific cookies, adds to the quality of the uniqueness of this fingerprint. Though, as many others have stated already, the HTTP protocol doesn't make this 100% foolproof - at best it can only be a fairly good indicator.
When i use a machine which has never visited my online banking web site i get asked for additional authentification. then, if i go back a second time to the online banking site i dont get asked the additional authentification...i deleted all cookies in IE and relogged onto my online banking site fully expecting to be asked the authentification questions again. to my surprise i was not asked. doesnt this lead one to believe the bank is doing some kind of pc tagging which doesnt involve cookies?
This is a pretty common type of authentication used by banks.
Say you're accessing your bank website via example-isp.com. The first time you're there, you'll be asked for your password, as well as additional authentication. Once you've passed, the bank knows that user "thatisvaliant" is authenticated to access the site via example-isp.com.
In the future, it won't ask for extra authentication (beyond your password) when you're accessing the site via example-isp.com. If you try to access the bank via another-isp.com, the bank will go through the same routine again.
So to summarize, what the bank's identifying is your ISP and/or netblock, based on your IP address. Obviously not every user at your ISP is you, which is why the bank still asks you for your password.
Have you ever had a credit card company call to verify that things are OK when you use a credit card in a different country? Same concept.
Really, what you want to do cannot be done because the protocols do not allow for this. If static IPs were universally used then you might be able to do it. They are not, so you cannot.
If you really want to identify people, have them log in.
Since they will probably be moving around to different pages on your web site, you need a way to keep track of them as they move about.
So long as they are logged in, and you are tracking their session within your site via cookies/link-parameters/beacons/whatever, you can be pretty sure that they are using the same computer during that time.
Ultimately, it is incorrect to say this tells you which computer they are using if your users are not using your own local network and do not have static IP addresses.
If what you want to do is being done with the cooperation of the users and there is only one user per cookie and they use a single web browser, just use a cookie.
You can use fingerprintjs2
new Fingerprint2().get(function(result, components) {
console.log(result) // a hash, representing your device fingerprint
console.log(components) // an array of FP components
//submit hash and JSON object to the server
})
After that you can check all your users against existing and check JSON similarity, so even if their fingerprint mutates, you still can track them
Because i want the solution to work on all machines and all browsers (within reason) I am trying to create a solution using javascript.
Isn't that a really good reason not to use javascript?
As others have said - cookies are probably your best option - just be aware of the limitations.
I guess the verdict is i cannot programmatically uniquely identify a computer which is visiting my web site.
I have the following question. When i use a machine which has never visited my online banking web site i get asked for additional authentification. then, if i go back a second time to the online banking site i dont get asked the additional authentification. reading the answers to my question i decided it must be a cookie involved. therefore, i deleted all cookies in IE and relogged onto my online banking site fully expecting to be asked the authentification questions again. to my surprise i was not asked. doesnt this lead one to believe the bank is doing some kind of pc tagging which doesnt involve cookies?
further, after much googling today i found the following company who claims to sell a solution which does uniquely identify machines which visit a web site. http://www.the41.com/products.asp.
i appreciate all the good information if you could clarify further this conflicting information i found i would greatly appreciate it.
I would do this using a combination of cookies and flash cookies. Create a GUID and store it in a cookie. If the cookie doesn't exist, try to read it from the flash cookie. If it's still not found, create it and write it to the flash cookie. This way you can share the same GUID across browsers.
I think cookies might be what you are looking for; this is how most websites uniquely identify visitors.
Cookies won't be useful for determining unique visitors. A user could clear cookies and refresh the site - he then is classed as a new user again.
I think that the best way to go about doing this is to implement a server side solution (as you will need somewhere to store your data). Depending on the complexity of your needs for such data, you will need to determine what is classed as a unique visit. A sensible method would be to allow an IP address to return the following day and be given a unique visit. Several visits from one IP address in one day shouldn't be counted as uniques.
Using PHP, for example, it is trivial to get the IP address of a visitor, and store it in a text file (or a sql database).
A server side solution will work on all machines, because you are going to track the user when he first loads up your site. Don't use javascript, as that is meant for client side scripting, plus the user may have disabled it in any case.
Hope that helps.
I will give my ideas starting from simpler to more complex.
In all the above you can create sessions and the problem essentialy translates to match session with request.
a) (difficulty: easy) use client hardware to store explicitely a session id/hash of some sort (there are quite some privace/security issues so make sure you hash anything you store ), solutions include:
cookies storage
browser storage/webDB/ (more exotic browser solutions )
extensions with permission to store things in files.
The above suffer from the fact the the user can just empty his cache in case he doesn want.
b) (difficulty: medium) Login based authentication.
Most modern web frameworks provide such solution the core idea is you let the user voluntarily identify himself, quite straghtforward but adds complexity in the architecture.
The above suffer from additional complexity and making essentially non public content.
c)(difficulty: hard -R&D) Identification based on metadata, (browser ip/language /browser / and other privace invasice stuff so make sure you let your users know or you miay get sued )
non perfect solution can get more complicated (a user typing with specific frequency or using mouse with specific patterns ? you even apply ML solutions ).
The claimed solutions
The most powerful since the user even without wanting explicitely he can be identified. It is straight invasion of privacy(see GDPR) and not perfect eg. ip can change .
Assuming you don't want the user to be in control, you can't. The web doesn't work like that, the best you can hope for is some heuristics.
If it is an option to force your visitor to install some software and use TCPA you may be able to pull something off.
My post might not be a solution, but I can provide an example, where this feature has been implemented.
If you visit the signup page of www.supertorrents.org for the first time from you computer, it's fine. But if you refresh the page or open the page again, it identifies you've previously visited the page. The real beauty comes here - it identifies even if you re-install Windows or other OS.
I read somewhere that they store the CPU ID. Although I couldn't find how do they do it, I seriously doubt it, and they might use MAC Address to do it.
I'll definitely share if I find how to do it.
A Trick:
Create 2 Registration Pages:
First Registration Page: without any email or security check (just with username and password)
Second Registration Page: with high security level (email verification request and security image and etc.)
For customer satisfaction, and easy registration, default
registration page should be the (First Registration Page) but in the
(First Registration Page) there is a hidden restriction. It's IP
Restriction. If an IP tried to register for second time, (for example less than 1 hour) instead of
showing the block page. you can show the (Second Registration Page)
automatically.
in the (First Registration Page) you can set (for example: block 2
attempts from 1 ip for just 1 hour or 24 hours) and after (for example) 1 hour, you can open access from that ip automatically
Please note: (First Registration Page) and (Second Registration Page) should not be in separated pages. you make just 1 page. (for example: register.php) and make it smart to switch between First PHP Style and Second PHP Style

How to recognize 2 PCs with same IP and browser(version)

I want to give a "like" option on my page for non-logged users.
The simpliest thing would be to detect user IP ( e.g. by $_SERVER['REMOTE_ADDR']).
More sophisticated would be detecting user's agent (e.g. by $_SERVER['HTTP_USER_AGENT']).
But I want to give like-posibility for "each PC in family" (real-life family) - this could also mean they all have not only the same IP, not only the same browser but also the same browser-version...
So how would I determinate whether it is a different PC? (without using cookies/session)
I want to store one "like" per PC and since cookies can be cleared I didn't want to use them :)
I wanted to abstract my particular interest from the whole problematics - so I did.
However you should never trust user input (as David pointed out) - so do not base your final like-count on just that ! At least put a likes/per IP limit and combine it with cookies/session.
Your only option to do this outside the simple methods of using cookies, logins, etc. is to do browser fingerprinting. This technique involves gather a variety of information that the browser outputs to the server/webpage and making a hash of it to create a unique ID for that client. It has a remarkably high accuracy and would work fairly well under the circumstances you are describing.
It is based on the idea that "no two browsers are exactly the same". In other words, you look at screen resolution, user agent strings, active plugins, etc. and create a "fingerprint" of those settings. There is almost always going to be a variance in some way.
There are available libraries that can help get you started. Here is one that is very easy to implement and understand... https://github.com/Valve/fingerprintjs
You can use sessions without using cookies. When the user logs in, they get a token, and this token is appended to every URL they visit. (In PHP you can see this if you disable cookies in the browser, you will get "PHPSESSIONID" in the URL). So, if you make users log in before voting / liking / whatever, then you can achieve this using sessions but not cookies.
If you are talking about public users without a login mechanism, then there really isn't any way to achieve this, unless you set a cookie recording the fact that this browser has voted.
Note however that not only can cookies be deleted, but it won't actually achieve what you want unless everyone in the family uses a different browser or has a separate login on their operating system. Otherwise they are effectively all the same user as far as you can tell. Also people can use multiple browsers so one person could vote / like more than once anyway.
Detecting the User Agent can easily be spoofed; so it isnt a reliable way. The best way to do this is sessions or cookies. Why do you not wish to use them?
Short answer: you can't.
Remember, each request to a web server is a new event. Cookies are the only way to persist data between calls. So if you rule them out you really can't differentiate them. This is a major reason why Google puts long life cookies on their site.
Can cookies be deleted? Sure. But they're really the only option you have.
You cannot give a single identity to a PC.
Cookies can be cleared.
User logins can be done from different computers.
$ip.$http_user_agent will not work.
User may restart the modem and ISP might assign a new IP.
Or use a different browser to change $http_user_agent.
Or another system on a LAN might have the same $http_user_agent.
What is the significance of giving one "like" per PC (provided you are able to even identify a PC correctly)?
What if two different users with different tastes use the same PC?

Tracking Computers in an Online Game

So i am running into a problem with people making multiple accounts to make there account better with more resource in game.
So my dilemma is, Many users go thru proxys or NATd info so some legimated users would be banned if i only have 1 user per ip.
Is there a way (with Javascript and PHP) to Get a uniq identifier specific to a computer (without Hardware changes, computer hardware change probably would change the identifier).
Any idea or comments would be much appericated
(The following was revived from a response made by Paul, but deleted by another for being out of place.)
I cant change the client to much because its a browser based game so getting the hwid would be possible. But how with JS or PHP.
Adding timers and restrictions to prevent transfers are in place but doesnt stop them entirely there is an option to email for an IP exception. but that is slow an tedious. Im wondering if there is a definitive to generate a specific id or identifier for a specific computer (Not ip based) that would make it so multiple accounts cant be logged in from the same computer but can be logged in from the same ip
As we are talking about a different account, probably on a different IP and client, you cannot easily find out clone accounts.
You can go for two more heuristic and gameplay options
As suggested before (by #dqhendricks), divide your resources and implement your sharing etc in such a way that you can't easily help your other account with every new account. Make finding other accounts in the beginning hard/impossible, make shareable resources a higher level feature etc. Downside is that this changes the gameplay, it doesn't have to be desireable.
You can perform heuristics on behaviour. There can be specific behaviour that is unwanted: only interaction with 1 other account etc. You could tweak some of the variables etc, but you could easily see suspicious behaviour. Make some sort of 'balance' calculation. Most ingame interactions have some sort of balance. Ofcourse, better players may have a good deal because they know more, or the other way around: they make a bad deal to help smaller players. But when one player only gives and never takes, it's "helping" without acutally playing itself: that might mean it's a clone
Everything with ip-adresses or client-information ($_SERVER) etc is worthless in this case as far as I'm concerned..
You could prevent multiple logins on the same Account (username/password).
If the issue is that they make multiple accounts with many different email addresses etc... and new usernames and passwords then you might be able to do it with Cookies for example that use a unique hardware id and then you simple check that not more than one account is active at any one time based on this hwid. if the hardware changes it doesnt matter as it is relative.
To clarify if the HW id for the first login is 1234 then the second login with generate the same hwid. If you check the cookies or your database (doesnt matter where you store it) for the same hwid then you know its already logged in.
If the hardware changes it doesnt matter as they will still both generate the same hwid.
If they use two computers though the haardware id will be different and this will work.
{sharable resources} = {total resources} - MAX(({starting resources} - {spent resources}), 0)
make only non-starting resources sharable, or maybe make sharing resources an ability you don't gain until level x.
Preventing spoofed/duplicate accounts (while ensuring all legitimate accounts work) is a very difficult task -- for reasons laid out by others. In addition to trying to guard against concurrent multiple accounts, one must guard against non-concurrent multiple account usage.
The core issue isn't so much in determining where an account connected from, but being able to trust that a user only has one account -- and to this end the only "real" solution is to use a system which already provides this sort of information, such as a credit card or paypal account ;-) That is, simply prevent someone from creating a new account (although an account can have multiple aliases/profiles, but these can be trivially tracked) unless they can prove "uniqueness".
(Also consider that two people may have two different accounts on the same machine.)
Happy coding.
I run an online game server aswell. To prevent your dilema, either modify the game client to read the MAC Address, and only allow 1 account per computer. Or log the ip's and only allow the resources to be given to that ip twice.
3rd option: Don't allow transfering of materials from the same ip addresses
4th option: Add a timer on the transfering of resources, make them wait 10 minutes of gameplay before they can do anything like getting rid of the items for others to get

Best method to prevent gaming with anonymous voting

I am about to write a voting method for my site. I want a method to stop people voting for the same thing twice. So far my thoughts have been:
Drop a cookie once the vote is complete (susceptible to multi browser gaming)
Log IP address per vote (this will fail in proxy / corporate environments)
Force logins
My site is not account based as such, although it aggregates Twitter data, so there is scope for using Twitter OAuth as a means of identification.
What existing systems exist and how do they do this?
The best thing would be to disallow anonymous voting. If the user is forced to log in you can save the userid with each vote and make sure that he/she only votes once.
The cookie approach is very fragile since cookies can be deleted easily. The IP address approach has the shortcoming you yourself describe.
One step towards a user auth system but not all of the complications:
Get the user to enter their email address and confirm their vote, you would not eradicate gaming but you would make it harder for gamers to register another email address and then vote etc.
Might be worth the extra step.
Let us know what you end up going for.
If you want to go with cookies after all, use an evercookie.
evercookie is a javascript API available that produces
extremely persistent cookies in a browser. Its goal
is to identify a client even after they've removed standard
cookies, Flash cookies (Local Shared Objects or LSOs), and
others.
evercookie accomplishes this by storing the cookie data in
several types of storage mechanisms that are available on
the local browser. Additionally, if evercookie has found the
user has removed any of the types of cookies in question, it
recreates them using each mechanism available.
Multi-browser cheating won't be affected, of course.
What type of gaming do you want to protect yourself against? Someone creating a couple of bots and bombing you with thousands (millions) of requests? Or someone with no better things to do and try to make 10-20 votes?
Yes, I know: both - but which one is your main concern in here?
Using CAPTCHA together with email based voting (send a link to the email to validate the vote) might work well against bots. But a human can more or less easily exploit the email system (as I comment in one answer and post here again)
I own a custom domain and I can have any email I want within it.
Another example: if your email is
myuser*#gmail.com*, you could use
"myuser+1#gmail.com"
myuser+2#gmail.com, etc (the plus sign and the text after
it are ignored and it is delivered
to your account). You can also include
dots in your username (my.user#gmail.com). (This only
works on gmail addresses!)
To protect against humans, I don't know ever-cookie but it might be a good choice. Using OAuth integrated with twitter, FB and other networks might also work well.
Also, remember: requiring emails for someone to vote will scare many people off! You will get many less votes!
Another option is to limit the number of votes your system accepts from each ip per minute (or hour or anything else). To protect against distributed attacks, limit the total number of votes your system accepts within a timeframe.
Different approach, just to provide an alternative:
Assuming most people know how to behave or just can't be bothered to misbehave, just retroactively clean the votes. This would also keep voting unobtrusive for the voters.
So, set cookies, log every vote and afterwards (or on a time interval?) go through the results and remove duplicates based on the cookie values, IP/UserAgent combinations etc.
I'd assume that not actively blocking multiple votes from same person keeps the usage of highly technical circumvention methods to a minimum and the results are easy to clean.
As a down side, you can't probably show the actual vote counts live on the user interface, or eyebrows will be raised when bunch of votes just happen to go missing.
Although I probably wouldn't do this myself, but look at these cookies, they are pretty hard to get rid of:
http://samy.pl/evercookie/
A different way that I had to approach this problem and fight voting fraud, was to require an email address, then a person could still vote, but the votes wouldn't count until they clicked on a link in the email. This was easier than full on registration, but was still very effective in eliminating most of the fraudulent votes.
If you don't want force users to log, consider this evercookie, but force java script to enable logging!
This evercookie is trivial to block because it is java script based. The attacker would not likely use browser, with curl he could generate tousends of requests. Hovewer such tools have usually poor javascript support.
Mail is even easier to cheat. When you run your own server, you can accept all email addresses, so you will have practically unlimited pool of addresses to use.

Ways to determine returning "anonymous" guests in PHP

Two types of users visit my website: registered users and guests. Registered users are tracked and retained by PHP session, cookies and individual logins. Guests I find trickier to manage so I give little to no authority to contribute content.
I want to open up editing for users (registered or not) to save favourites, shopping carts, vote in polls, moderate, tag, upload and comment. At the same time, protect the data from other non-registered users.
What are some options or best practices to determine a unique visitor from another and some security measures to consider when allowing them to contribute more? Should I work around their security/restriction settings to provide contribution service or should I expect them to meet in the middle and relax some of their settings to allow cookies etc?
IP Address - The IP is an option but only temporary. It can change for a user reconnecting to their Internet with a different IP, and for that matter another user may eventually share the same IP. IP can also be anonymous, shared or misleading.
PHP Sessions - OK for a session (and if enabled) but lost on closing the browser.
Cookies - Can store session data but is easily (and often) disabled from the client-side.
Header data - combining known details of he user might at least group users - ISP, browser, operating system, referring website etc.
Edit: I'm still having trouble getting my head around all the the key factors involved... we set up a cookie for a guest. Attach a bunch of edits and user history to that session. If the cookie is removed, the data floats around attached to nothing and the user loses their data. Or if the user logs in, the guest and user data should be merged...
I think cookies would probably be the best option here as it's the only way you are going to be 100% sure requests are unique. Of course, you could possibly add a mix: if cookies are disabled you might be able to try other options of identification such as the IP address method, but that could make it overly-complex.
As you say, IP address is liable to change and in some organizations there may be a group of proxy servers setup which make requests originate from different IPs. Of course, you could check X_FORWARDED_FOR, but they are still liable to change.
Header data is probably going to prove difficult to get good results on I think. If you've got an organization that has the same browser, OS, IP it is going to show people as being the same. Even people not in the same organization may still appear similar (i.e AOL users who get their traffic usually routed through proxy servers, the majority will probably be using the 'AOL browser' that gets shipped with it giving similar headers).
Out of those two options, the IP one is going to be easy to implement but obviously there are the problems I outlined. Checking for unique data in the headers will prove to be absolute pain I think.
Obviously as you say, sessions are lost on closing the browser, and it appears you want the system to behave as if they were a registered user so cookies seem a more obvious choice (especially as you want the 'favourites' to be persistent).
I would just go with sessions.
Your users could change IP addresses (prone to mixup behind NATs and proxies), modify/delete cookies (certainly possible), or change their header (easily through switching browsers).
There is no secure way of identifying a guest if they do not want to be identified. Just go with the standard: cookies/sessions.
You should use sessions.
Sessions id are stored in a cookie (or for users who doesn't accept cookie, stored in the url with the PHPSID argument)
They won't be erased when the user will close his browser, it just depends on how you set your session/cookies options.
You can set up the timelife of a session to whatever you want, so don't bother with this.
You should also tell to your user about this (enable cookie)
Concerning the data which could be merged when log in, it's your job, to merge it in a proper way, or even ask the user if the option should be saved or not.
You could use a combination of all of the identifying points that you can find, that are not likely to change, and are likely to be unique - looking at panopticlick you can gather a bunch of data such as installed fonts and browser plugins. You could take those kinds of more unique data points and hash them to give you an id, and then compare it against the less unique data like useragents and ip addresses.
But honestly, that's super complicated and sneaky. Use cookies/session. If the user doesn't want to enable cookies then they don't want all your anonymous tracking for a reason, and you should honor their decision.

Categories