Tracking users on my site to avoid duplicate accounts created


I'm building a website that sells items cheaply, with the proceeds going to charity. You can't just buy an item, though: quantities will be very limited, so we want to give out free raffle tickets (daily) to users who visit the site. We'll then do a random drawing, and the winner can buy the item.
My concern is people making 1,000 accounts to improve their odds of winning. I need a good way to prevent this from happening. Right now I'm thinking of checking IP ranges (12.12.x.x) to see if that range has already received daily raffle tickets, but how reliable is that, given that proxies let people use different IPs?

The somewhat-standard solution is to require each user to provide an email address when they create their account. You then send an email to that address containing a unique link. When that link is clicked, you activate the account associated with that email; before that, the account can do nothing.
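The activation flow described above could be sketched roughly as follows. This is a minimal illustration, not a complete implementation: the in-memory `pending` and `active` stores, the function names, and the `example.com` URL are all assumptions standing in for real database tables and a real mailer.

```python
import secrets

# Hypothetical in-memory stores; a real site would use database tables.
pending = {}    # token -> email address awaiting confirmation
active = set()  # email addresses whose accounts are activated


def start_signup(email):
    """Create an unguessable token and return the link to email to the user."""
    token = secrets.token_urlsafe(32)
    pending[token] = email.strip().lower()
    # The URL is a placeholder; sending the email itself is out of scope here.
    return "https://example.com/activate?token=" + token


def activate(token):
    """Activate the account tied to the token; each token works only once."""
    email = pending.pop(token, None)
    if email is None:
        return False  # unknown or already used token
    active.add(email)
    return True
```

Until `activate` succeeds, the email is not in `active`, so the rest of the site can treat the account as unusable.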

You can have multiple steps of security in this case.
Have users sign up with unique email address and verify that when they sign up.
Log the IP address they signed up with in your database.
This will not keep them from using proxies or creating multiple email addresses.
I suggest also having them provide a unique street address.
If they try signing up again with that same address, reject them.
You can also check the phone number for extra security.
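A duplicate check on street address and phone number only works if trivial variations ("123 Main St." versus "123  main st") compare equal. The sketch below shows one possible normalization; the rules, the function names, and the in-memory sets are assumptions for illustration, and real address matching is usually fuzzier than this.

```python
import re

# Hypothetical stores of values already used by some account.
seen_addresses = set()
seen_phones = set()


def normalize_address(street, postcode):
    """Lowercase the street, drop punctuation, collapse spaces, tidy the postcode."""
    street = re.sub(r"[^a-z0-9 ]", "", street.lower())
    street = re.sub(r"\s+", " ", street).strip()
    postcode = re.sub(r"\s+", "", postcode).upper()
    return (street, postcode)


def normalize_phone(phone):
    """Keep digits only, so formatting differences do not matter."""
    return re.sub(r"\D", "", phone)


def try_register(street, postcode, phone):
    """Return False if the address or the phone number is already in use."""
    addr = normalize_address(street, postcode)
    tel = normalize_phone(phone)
    if addr in seen_addresses or tel in seen_phones:
        return False
    seen_addresses.add(addr)
    seen_phones.add(tel)
    return True
```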

Related

notifying user you use their IP address

I'm making a statistics program in PHP, but do I need to notify users that I am storing their IP address? I've converted it to an INT with ip2long, but I am not sure whether this is allowed without a notification or a privacy statement on the site.
In short:
Is it allowed to save a visitor's IP address without their knowledge?

An IP address in isolation is not personal data under the Data Protection Act, according to the Information Commissioner. But an IP address can become personal data when combined with other information or when used to build a profile of an individual, even if that individual's name is unknown.
In the hands of an ISP an IP address becomes personal data when combined with other information that is held – which will include a customer's name and address. In the hands of a website operator, it can become personal data through user profiling.
Most sites do not profile their users using IP addresses. They typically use IP addresses for demographic purposes such as counting visitors, their countries of origin and their choice of ISP. Their organisation might also be identifiable.
Sites typically gather statistical data about the path that users take through a website and the page from which they left the site. Banking websites might also use IP addresses as a security measure – for example, if a customer regularly accesses his account from an IP address in London, access to that customer's account from an IP address in Moscow might indicate fraud.
The most common privacy concern surrounding IP addresses is their use in marketing. A visitor's path through a website could be followed and any adverts that are clicked can be identified. On the next visit, that user could be shown ads that are similar to those he clicked on the previous visit. But this fails when the user has a dynamic IP address: the user will be unknown.
Another alternative would be:
Accordingly, most websites prefer to use cookies to track users for personalised marketing purposes in preference to IP addresses. A cookie is a small text file that is sent from a website to a visitor's computer. The cookie file can be used to identify an individual and a website operator can build a detailed profile of that person's activity at its site. Users can set their web browsers to refuse cookies but most users accept them, often unwittingly.
I may be wrong, but since an IP address on its own is not personal data and you are only using it for demographic purposes, you should not have to notify users about it. The practice appears to be legal, and almost all websites use IP addresses for tracking purposes.

Preventing abuse to an invite system

Recently I helped some friends ship an invite system on their website that works like this: a user creates an account, we send a verification email, and when he verifies the email address he gets one free credit to spend on the website. In addition, he has personalized links he can share on social networks or via email, and when people register using such a link (email-verified accounts again) he gets one credit per invite. It is much like the invite system on thefancy.com or any other reward-driven invite system on the web.
Lately we have seen elevated rates of fake user accounts, which are probably automated. The registration page features a CAPTCHA, but we're aware this can be bypassed. We also see elevated rates of users creating disposable email addresses to register accounts through a specific invite link, thus crediting one legitimate user who then spends the free credits he earns.
I am looking for an automated way to prevent this kind of abuse. I am currently investigating rate limits on invites/registrations that come from the same IP address, but that approach has its own flaws.
Any other production tested ideas?
Thank you
Edit:
I've also proposed two-factor registration via SMS, but it was turned down due to a budget shortage.
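For reference, the per-IP rate limit mentioned in the question could look something like the sliding-window sketch below. The limit of 3 signups per hour, the names, and the in-memory store are assumptions for illustration; as the question notes, shared or rotating IP addresses make this an imperfect signal.

```python
import time
from collections import defaultdict, deque

MAX_SIGNUPS = 3        # assumed limit per window
WINDOW_SECONDS = 3600  # assumed window length (one hour)

_recent = defaultdict(deque)  # ip -> timestamps of recent accepted signups


def allow_signup(ip, now=None):
    """Return True if this IP is still under the limit, and record the signup."""
    now = time.time() if now is None else now
    stamps = _recent[ip]
    # Forget signups that have fallen out of the window.
    while stamps and now - stamps[0] >= WINDOW_SECONDS:
        stamps.popleft()
    if len(stamps) >= MAX_SIGNUPS:
        return False
    stamps.append(now)
    return True
```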

It seems you need to require more than just a verified email address before a user can send invites, ideally something that shows the user has participated in your site in some way. Without knowing what your site is it's hard to give specifics, but the StackOverflow equivalent would be requiring users to have at least X reputation before they can invite others. If you're running a forum you could require that they've made at least X posts.
I'd also suggest a small time limit before new accounts can invite - e.g. they have to have been a member for at least X days. This complicates automated invites somewhat.
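Both conditions (some minimum participation and a minimum account age) can be combined into one eligibility check. The sketch below is only an illustration: the thresholds of 7 days and 5 actions, and the idea of a single `activity_count`, are assumptions that would need to be replaced with whatever "participation" means on the actual site.

```python
from datetime import datetime, timedelta

MIN_ACCOUNT_AGE = timedelta(days=7)  # assumed value for "X days"
MIN_ACTIVITY = 5                     # assumed value for "X posts" or reputation


def can_invite(created_at, activity_count, now=None):
    """An account may send invites only once it is old and active enough."""
    now = now or datetime.now()
    old_enough = (now - created_at) >= MIN_ACCOUNT_AGE
    active_enough = activity_count >= MIN_ACTIVITY
    return old_enough and active_enough
```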

An extremely simple method that I have used before is to have an additional input in the registration form that is hidden using CSS (i.e. has display:none). Most form bots will fill this field in whereas humans will not (because it is not visible). In your server-side code you can then just reject any POST with the input populated.
Simple, but I've found it to be very effective!
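On the server side, the hidden-field check amounts to a few lines. In this sketch the field name `website` is a hypothetical choice (something a bot is likely to fill in), and `form` is assumed to be a plain dictionary of the submitted POST fields.

```python
HONEYPOT_FIELD = "website"  # hypothetical name of the CSS-hidden input


def is_probable_bot(form):
    """Humans never see the hidden field, so any value in it suggests a bot."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())
```

A registration handler would reject the POST whenever `is_probable_bot(form)` is true, ideally without telling the client why.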

A few ideas:
Ban use of emails like 'mailinator'.
Place a delay on the referral reward, allowing you to extend fraud detection time period, giving you more time to detect bogus accounts and respond accordingly.
Require the referred user to create a revenue generating transaction before you give out any referral rewards (I know that might not be a shift you can make) - possibly in turn increasing the reward to account for the inconvenience to the referrer (you should be saving money through decreased fraud so not a hard sell).
Machine learning. Ongoing observations and tuning with your fraud detection. The more data you have the better you will be able to identify these cases. (IP addresses as you mention.) Shipping / billing info even more telling if it applies - beware adjacent PO boxes.
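The first two ideas in this list are simple enough to sketch. The blocklist below contains a single illustrative entry and would need to be maintained in practice; the 14-day delay and the function names are assumptions, not values from the answer.

```python
from datetime import datetime, timedelta

# Illustrative blocklist; a real one would be much longer and kept up to date.
DISPOSABLE_DOMAINS = {"mailinator.com"}
REWARD_DELAY = timedelta(days=14)  # assumed length of the fraud-review period


def is_disposable(email):
    """Check the domain part of the address against the blocklist."""
    domain = email.rsplit("@", 1)[-1].strip().lower()
    return domain in DISPOSABLE_DOMAINS


def reward_due(referred_at, flagged_as_fraud, now=None):
    """Pay the referral credit only after the delay, and only if not flagged."""
    now = now or datetime.now()
    return (not flagged_as_fraud) and (now - referred_at) >= REWARD_DELAY
```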

Add a CAPTCHA test to the confirmation page. I would be wondering if your CAPTCHA is sturdy enough if it is getting bypassed somehow. You might consider using the (hateful) reCaptcha which seems popular. A CAPTCHA on the confirmation page would reduce the risk that a 'bot is submitting the confirmation page. In other words, it would implement the idea of client interaction with the site after registration. A similar method would be to ask for the registrant's password.

Storing information on unregistered customer users

An issue that has come to light is to open up our application (we can visualise it a bit like an online shop) to unregistered users.
At the moment, there is an admin system where staff are added by superusers and a website with customers who add themselves by registering.
We have been asked to allow customers to use the website without registering or logging in, but we don't want to break the 'orders' table - we still need to refer to each customer individually and maintain the registered users functionality (address lookup, purchase history, etc). The main idea we've been mulling over is to use the unregistered customer's email address as a replacement for the surrogate key (or a hash of it) in the customer table so that new and old customers can just enter their email address at checkout to be added to our database and receive confirmation of their order. The problem of different email addresses per customer can be alleviated by a 'merge' tool on the admin side, and the problem of multiple customers sharing the same email (some office environments) isn't that much of a problem for us.
The main question is this: how do real-world applications handle unregistered users?
Update in response to answers
We don't want to force registered users to login each time even if their email address is already on our system as a registered user. Also, if people are advocating using the email address as a key, how would you deal with a scenario where a registered account holder gives up their email address to someone else?

In our company, we do it this way:
When ANY user makes an order, we look up his email (which he is required to specify and which is unique) in the customers table.
If it isn't there, we simply create the user (we already have all the required data from the user's order) and mark him as registred=0.
We then continue the order process with his user ID.
When somebody registers under that email, we simply update his credentials (whatever he specifies), while keeping his order history and everything else. I don't think that is a security concern: the user is required to confirm the email address, so unless the account is really his, he wouldn't register anyway.
We don't allow an already registered email to place an order as a guest, so that should remove the need to merge emails: nobody can have both a registered and an unregistered account under one email address, and once someone has registered, he can never shop unregistered again. Hope this helps.
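The lookup-or-create flow in this answer might be sketched as below, using an in-memory SQLite database. The table layout, the column name `registered`, and the function names are assumptions made for the example; email confirmation and credentials are left out.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    """CREATE TABLE customers (
           id INTEGER PRIMARY KEY,
           email TEXT NOT NULL UNIQUE,
           registered INTEGER NOT NULL DEFAULT 0
       )"""
)


def customer_id_for_guest_order(email):
    """Find or create the customer row used for a guest (not logged in) order."""
    email = email.strip().lower()
    row = db.execute(
        "SELECT id, registered FROM customers WHERE email = ?", (email,)
    ).fetchone()
    if row is None:
        # First order from this email: create an unregistered customer.
        return db.execute(
            "INSERT INTO customers (email) VALUES (?)", (email,)
        ).lastrowid
    if row[1]:
        # Registered emails may not order as guests; they must log in.
        raise PermissionError("this email belongs to a registered account")
    return row[0]


def register_email(email):
    """Upgrade (or create) the row on registration, keeping the same id."""
    email = email.strip().lower()
    db.execute("INSERT OR IGNORE INTO customers (email) VALUES (?)", (email,))
    db.execute("UPDATE customers SET registered = 1 WHERE email = ?", (email,))
    return db.execute(
        "SELECT id FROM customers WHERE email = ?", (email,)
    ).fetchone()[0]
```

Because registration reuses the existing row, any orders already linked to that customer id stay attached to the account.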

I would recommend against using a natural key in this scenario since they are not interested in registering and wouldn't expect their details to be remembered (at least that's what I think of in un-registered mode).
Use a completely synthetic key (like a counter) and just go with that.

Unlike Pal, I would strongly favour using natural data as a key.
"use the unregistered customer's email address as a surrogate key"
First, that statement is an oxymoron. A surrogate key, by definition, is unrelated to the actual data.
Next, if you have their email address that means they have completed some sort of registration process.
If you have users sharing an email address (i.e. an assertion of identity) then by definition they want to share that identity - trying to differentiate between them is not your problem - particularly where you're already providing a mechanism for them to be individually identifiable.
The only thing you have to worry about is whether by using the email address in the absence of other authentication, you are leaking information which would otherwise be confidential (e.g. previous orders)

What ways are there to store information about an anonymous/guest user in a database?

Our application has an online shop among other features, and users are normally requested to register before completing a sale, creating a unique customer_ID in the process. When they return, they can log in and their contact details and transaction history are retrieved from the database.
We are now exploring what to do in the case of an 'anonymous' or 'guest' customer, opening up the online shop to customers who don't want to register, and also for sales logged in the backend application, where taking the customer's email, postal address, etc is just too time consuming. The solution has applications outside the online shop too.
Multiple companies use the same database, and the database is built on a party model structure, so we have explored a few options:
1. Store all anonymous customers under one pre-defined customer_ID in the transaction table:
   a. customer_ID = 0 for every anonymous user, and customer_ID > 0 for every real user
      - This is straightforward to hard-code into the application
      - But more involved to determine which customers belong to which company
      - Should details for customer_ID = 0 exist in the customer table in the database or as an object in the application?
        - If in the database, what database-level constraints can be made to ensure that it always exists?
        - If not in the database, then foreign key constraints from transaction.customer_ID to customer.customer_ID no longer work
   b. customer_ID is the same as the company party_ID
      - Easier to determine aggregate sales for each company, etc
      - This would confuse matters as it would appear that the company is its own customer, rather than other unique customers
2. Generate a unique customer_ID for every new anonymous customer (per session)
   - What if the same physical user returns? There will be many records repeating the same sort of data: email, shipping address, etc.
3. Use another unique key, such as email address, to refer to a customer
   - Not always reliable, as people sometimes use more than one email address or leave old addresses behind.
   - What if there is no email address to be taken, as is the case on the shop floor, pro forma invoices, etc?
4. Some other Stack Overflow inspired solution!
Addition
A combination of #2 and #3 has been suggested elsewhere - attempt to store a single record for each customer, using the email address if possible, or a new record on every visit if not.
I should point out that we don't need to store a record for every anonymous customer, but it just seems that the relational database was built to deal with relationships, so having a NULL or a customer_ID in the transaction table that doesn't reference an actual customer record just seems wrong...
I must also stress that the purpose of this question is to determine what real-world solutions there are for recording 'casual' transactions where no postal address or email address is given (imagine a supermarket checkout) alongside online shop transactions where an email address and postal address are given, whether they are stored or not.
What solutions have the SO community used in the past?

Assuming you require an e-mail address for all online orders, you could create a temporary account for every customer at the completion of each order when they are not logged in.
This can be done by using the shipping address and other information provided during checkout to fill in the account, and e-mailing a random temporary password to them (optionally flagging it to require changing on the first log-in, if that functionality is built into the website). This requires minimal effort on their part to setup the account, and allows them to sign in to check their order status.
Since the primary key in your database is the customer_id, it should not cause conflicts if they continue making new accounts with the same e-mail/address/etc, unless you have code in place to prevent duplicates already. It's rare for someone to create more than one temporary account though, since it's easier to log in with the password e-mailed to them than entering their data again.
For the backend orders, we generally create an account in the same way as above for every customer. However, if they don't have an e-mail address (or they only want to purchase by phone), we generate an account with their shipping information and a blank e-mail address (have to code an exception to not send temporary passwords/order confirmations when it's blank). The customer_id is given to them, and their shipping information and company name are stored in the account to look up and expedite future orders.

I doubt there are any perfect solutions to this problem. You simply have to make a choice: How important is it to guarantee recognizable customer history in contrast to the improvement in conversions you get from not forcing a customer to go through a full registration process.
If you go without forcing registration, you will not be able to recognize returning customers 100% of the time. One might argue that even with registration that will not be possible, as users sometimes choose to create new accounts for various reasons. But you might be able to do something that's "good enough" by understanding the data you already have.
For example, in some countries, postcodes are quite specific. Are they specific enough? Depends in which countries you operate and also how your customer base is built. If you tend to only have one user per household, maybe.
Or depending on which payment methods you support, you might consider building a one-way hash of the credit card number ("pseudo-unique ID"). Some payment solutions actually do return a unique "payer ID", which could be perfect -- assuming that you get something from all the payment services you support.
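A "pseudo-unique ID" derived from a card number could be sketched as below. Note one deliberate change from the answer: a plain one-way hash of a card number can be brute-forced because card numbers have little entropy, so this sketch uses a keyed hash (HMAC-SHA256) instead. The key value and function name are placeholders, and storing anything derived from card data may carry compliance obligations that this example does not address.

```python
import hashlib
import hmac

# Placeholder secret; in practice this would come from secure configuration.
SECRET_KEY = b"replace-with-a-real-secret"


def payer_pseudo_id(card_number):
    """Derive a stable identifier from a card number without storing the number."""
    # Ignore spaces and dashes so the same card always yields the same id.
    digits = "".join(ch for ch in card_number if ch in "0123456789")
    return hmac.new(SECRET_KEY, digits.encode("ascii"), hashlib.sha256).hexdigest()
```

Where the payment provider already returns its own payer ID, using that value directly avoids touching card data at all.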

I would assign a unique Customer ID to save the data, and then on future purchases that the same anonymous purchaser makes you could look to see if the same email address and/or first line of the address and post code already exists. If you ask for a phone number, compare that. Basically you need something fairly unique whilst getting rid of possible errors (eg. only looking at first line of the address - it's very likely that there is more than one 123 Main Street! But there will be only one 123 Main Street with post code ABC123).
Once you know this, you could automatically create them an account - sending the customer an email saying that you've noticed that they purchased previously, and to save them time use this email address and this automatically generated password. When they login for the first time, do a quick security check (maybe value of last invoice), then let them check their details. You could even do this during the checkout process. I think by showing that you can save them time by doing it automatically could be a bonus.
If you don't want to do this, then it's possible to set a cookie, though you'd have to warn the user if you are trading from within the EU (cookie laws).
The problem with someone having different email address and postal address, could be that they order for business, then they could order for personal use (hard to say as we don't know what you're selling).

Of course you can track the sale session using cookies. But you can also register anonymous users on your site without letting them realize that they are filling in a registration form. Let them follow the sale process, and at the end generate a unique customer ID, email it to them, and display it on the site after they complete their sale. They can then log in to the site just by entering their customer ID.

I usually use the user IP, I create an account with just an ID and his IP address. When he registers, I just update his record. Other things than IP could (and should) be used too, like creating a random token to put in a cookie to bypass any session system in use so you can make it last longer for example.
Now, in the application, you have to make your user class able to "identify" your users with either this IP/token, a real session or any other login system you may have in place.

Mmmm ... we generate a unique ID for guest users (a hash value or UUID used as the username) and store it in the basket table. You could also store it in the customer table, if you don't mind cluttering your database with such data. The UUID is stored in a cookie in the customer's browser until he checks out the basket. The cookie is also handy for assigning the anonymous account (and the contents of its basket) to a valid account if the user decides to register later on or before checking out the basket.
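A rough sketch of that guest-ID approach follows. The in-memory `baskets` dictionary stands in for the basket table, the `guest-` prefix and function names are assumptions, and setting or reading the browser cookie is left to whatever web framework is in use.

```python
import uuid

baskets = {}  # owner key (guest id or customer id) -> list of item ids


def new_guest_id():
    """Generate the value to store in the visitor's cookie."""
    return "guest-" + uuid.uuid4().hex


def add_to_basket(owner, item):
    baskets.setdefault(owner, []).append(item)


def merge_guest_basket(guest_id, customer_id):
    """On registration or login, move the guest's items to the real account."""
    items = baskets.pop(guest_id, [])
    baskets.setdefault(customer_id, []).extend(items)
```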

I would create a unique customer ID, and store it on the user's machine as a cookie. This should decrease the number of users with multiple customer IDs.
But in all seriousness, as long as your number of ids isn't getting into the hundreds of thousands (and congrats if it is!), it's not going to hurt your database as long as you have proper indexes on the customer id column.
If you had id=0 for each customer wouldn't you have just as many rows though? I think that's inescapable if you want to keep all the data, right? A zero id might cause your index extra problems too.

The way I do it is: not at all.
Simply put, let people pay through paypal or whatever you put as payment solution and get the data from there, automatically create the user and the order after the payment is processed using the provided API.
At that point you have all the information you need and can definitely store just enough for your statistics / e-spam marketing.
Keep it simple, require NOTHING, not even people to enter a customer id or an e-mail, and they'll all love it.
Doing that I still have every bit of information I could be interested in and the user experience is as fast/easy as can be.

Is It Easy to Make an Email Address Book Invite?

Twitter and Facebook invite new users to send an invitation to everyone in their Gmail, Hotmail, or Yahoo Mail accounts.
Is it easy to add this functionality to a website?
Thanks,
John

Last I checked, you basically have to pretend to be a web browser, then programmatically log in to the site, scrape the contacts, and compose/send the message. It isn't difficult, but it is time-consuming, as each of these services works differently.
It does, however, look like people have written scripts for some of this: example.

Yes!
What they generally do is send a special URL in the email that contains a code, for example:
www.mysite.com?UserCode=ABC
That code (ABC) is associated with the user's email, so the application understands which user is trying to subscribe. You must keep the pair (email, code) in a database.
HTH

All of the above answers are correct, here is a summary and some more explanation:
You first need to get the user's login for each service you want to get contacts from (I personally don't understand why people would do that - I would never give my GMail password to Facebook, let alone some little-known web site).
Then you can simulate a login to the said website and grab their contact list as an export (all serious email services allow you to export the contact list as CSV or something). You can implement this yourself or use some external library such as contactgrabber mentioned by Haim.
You then go over their list of contacts and for each contact you generate a key (you want to generate a unique key for each email you send so you'd know who responded to you). Generating the keys is easy - take some info like the current user's email plus the target email address, add the current time and pass everything to a hashing function like SHA1 - should do the trick.
Now store in a database table for each contact you got: the inviting user's ID, the email address being invited and the key you generated.
Lastly send a nice email to each contact with a URL to your website's "invitation activation page" with the correct key applied - like so: http://www.somesite.com/invited?key=123456780abcdefgh
When that page is accessed, get the key from the URL and find it in the table; that gives you the email address that activated the invite and the user who invited them. From here you can take it wherever you want.
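The key generation and lookup steps could be sketched like this, following the recipe in the answer (inviter's email, target email, and current time fed through SHA-1). The in-memory `invites` dictionary stands in for the database table, and the function names are assumptions. Since all the hashed inputs are guessable, a random token from Python's `secrets` module would be a safer choice if the key must be unpredictable.

```python
import hashlib
import time

invites = {}  # key -> (inviting user's id, invited email address)


def create_invite(inviter_id, inviter_email, invited_email):
    """Generate a per-invite key, store it, and return the activation URL."""
    raw = f"{inviter_email}|{invited_email}|{time.time()}"
    key = hashlib.sha1(raw.encode("utf-8")).hexdigest()
    invites[key] = (inviter_id, invited_email)
    return "http://www.somesite.com/invited?key=" + key


def lookup_invite(key):
    """Return (inviter_id, invited_email) for a key, or None if unknown."""
    return invites.get(key)
```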
