Encrypting php-sql user data and structure - php

I'm working on an online platform that processes financial data, and I'm looking to encrypt all the data on it, so that nothing can leak unless a person gives out his personal password. I hope you can help me with the following:
The platform consists of a PHP file that interacts with a MySQL database. A user's personal key is defined by a session_id, given after login.
Table 1: Users (user_id, email, password, etc.) - this stores the data any user needs to log on to our platform and access any of our templates.
Table 2: User_template1_data (template_user_id, user_id, email, name, specific info) - this stores the data specific to the template.
Table 3, 4, 5, 6: logs (log_id, datetime, details, template_user_id, etc.).
In the log tables (which could be payments, jobs, expenses, etc.) we store the sensitive data that nobody other than the user may read, not even the admins, unless given access to it.
Now the way I had it in mind was following:
Example:
Table 1:
user_id - Name - email - pass
1 - Jay - jay.. - (salt / hashed)
Table 2:
template_user_id - user_id - name - email - division
1 - 1 - Jay - jay.. - accountant
The PHP will see that user 1 is obviously an accountant, which is not really sensitive data.
However, in Table 3 we add the following data:
> log_id - User_template1_data - date - client - earnings
> 1 - 1 - 2016-01-01 - Gucci SLR - 5000,00
> 2 - 1 - 2016-02-01 - Prada SLR - 51000,00
> 3 - 1 - 2016-03-01 - Chanel SLR - 15000,00
This connection should be encrypted, so nobody who hacks into the database can see the connections. I was thinking of the following; please correct me if I'm wrong:
> log_id - User_template1_data - date - client - earnings
> 1 - hash and salt of the User_template1_data - 2016-01-01 - Gucci SLR - 5000
> 2 - hash and salt of the User_template1_data - 2016-02-01 - Prada SLR - 51000
> 3 - hash and salt of the User_template1_data - 2016-03-01 - Chanel SLR - 15000
This way only the person who is logged in stores the sensitive data encrypted; it will not have to be decrypted then.
I AM A ROOKIE TO ENCRYPTING SO PLEASE ADVISE ME IF I'M WRONG
Most importantly, which encryption should I use, and how should I build it up? Much appreciated!

In general you should set up the database so it can be trusted to be safe, but it could be a fun experiment to solve this. The most important part is using standard encryption methods in your programming language; in the case of PHP, go for the openssl extension.
Hashing with a salt will not help much here, since hashing is a one-way operation, so not even the user owning the data will be able to decrypt it.
I would have gone for something like this:
- Use an encryption key which you store encrypted with the user's password, and which you decrypt and put in the session when the user logs in. Of course make sure sessions are encrypted on disk and that you store the password using bcrypt, but encrypt the key using the clear-text password so that the encryption key cannot be recovered from the database alone.
- When storing sensitive data, encrypt it using the key in the session; when reading the data back, decrypt it using the same key.
- Remember to update the stored encrypted key when the user changes password.
I suggest using AES-256-CBC for encryption.
If this is going to be used for something more than experimenting or learning I suggest first to search for existing solutions for this.
Also remember that if the user forgets the password, there will be no way of getting the data back.
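The steps above can be sketched in PHP with the openssl extension. This is an illustrative sketch, not a vetted design; the variable names are made up, and a real KDF (e.g. hash_pbkdf2 or libsodium) would be preferable to the plain SHA-256 used here to derive the key-encrypting key.

```php
<?php
// Sketch of the scheme above: a per-user data key is wrapped with a key
// derived from the user's password, unwrapped at login, and used for the
// sensitive log columns.
$password = 'correct horse battery staple';

// 1) On signup: generate a random data key and wrap it with the password.
$dataKey = random_bytes(32);
$kek     = hash('sha256', $password, true);          // key-encrypting key
$ivLen   = openssl_cipher_iv_length('aes-256-cbc');  // 16 bytes
$iv      = random_bytes($ivLen);
$wrapped = base64_encode(
    $iv . openssl_encrypt($dataKey, 'aes-256-cbc', $kek, OPENSSL_RAW_DATA, $iv)
);
// $wrapped is what you store in the users table

// 2) On login: unwrap the data key and keep it in the (encrypted) session.
$blob       = base64_decode($wrapped);
$sessionKey = openssl_decrypt(
    substr($blob, $ivLen), 'aes-256-cbc', $kek,
    OPENSSL_RAW_DATA, substr($blob, 0, $ivLen)
);

// 3) Encrypt each sensitive log value with the session key before INSERT,
//    and decrypt it again after SELECT.
$earnings   = '5000,00';
$iv2        = random_bytes($ivLen);
$ciphertext = openssl_encrypt($earnings, 'aes-256-cbc', $sessionKey, OPENSSL_RAW_DATA, $iv2);
$plaintext  = openssl_decrypt($ciphertext, 'aes-256-cbc', $sessionKey, OPENSSL_RAW_DATA, $iv2);
```

The wrapped key can only be opened with the clear-text password, so a database dump alone reveals neither the data key nor the log values.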

Related

how to save users visits in an optimized way in Laravel 5.4

I am saving my users' visit statistics like this in a MySQL database:
visit_id
visit_page
visit_route
visit_referrer
visit_ip_address
visit_device
created_at
updated_at
My question is: will my MySQL database handle this number of visits, around 100,000 per day? And can it cause problems after 5 years of being saved like this (100,000 per day)?
And if the answers are YES, what is the optimized way to do this? (I don't want to empty my visits table.)
First of all, you may need to rethink MySQL. If you want to keep using an RDBMS, you may want to research Postgres; with some optimization it can handle that number of rows. On the other hand, if you are open to switching to NoSQL, then I recommend Elasticsearch. It can handle this kind of data very well.
If you choose to stay with Postgres/MySQL, you can restructure your schema by separating the visitor data (unique users) from the visited-pages data as follows:
visitors
- id
- ip_address
- device
- first_visit (created_at)
- latest_visit (updated_at)
visited_pages
- id
- page_title
- page_route
- first_visit (created_at)
- latest_visit (updated_at)
page_visit
- id
- visitor_id
- page_id
- visited_at (created_at) //no need for updated at
The largest table will be the last one, and it won't contain much data per row; you will also not have to repeat data like the route, page title, and IP address every time.
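A sketch of the normalized schema in PHP/PDO, using an in-memory SQLite database purely for illustration (column types simplified; in production this would be your MySQL or Postgres server):

```php
<?php
// Sketch of the visitors / visited_pages / page_visit split described above.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE visitors      (id INTEGER PRIMARY KEY, ip_address TEXT, device TEXT)');
$db->exec('CREATE TABLE visited_pages (id INTEGER PRIMARY KEY, page_title TEXT, page_route TEXT)');
$db->exec('CREATE TABLE page_visit    (id INTEGER PRIMARY KEY, visitor_id INT, page_id INT, visited_at TEXT)');

// One row per unique visitor and per unique page...
$db->exec("INSERT INTO visitors (ip_address, device) VALUES ('203.0.113.7', 'mobile')");
$db->exec("INSERT INTO visited_pages (page_title, page_route) VALUES ('Home', '/')");

// ...and one small row per visit, referencing both by id.
$stmt = $db->prepare('INSERT INTO page_visit (visitor_id, page_id, visited_at) VALUES (?, ?, ?)');
$stmt->execute([1, 1, '2017-05-01 12:00:00']);
$stmt->execute([1, 1, '2017-05-01 12:05:00']);  // repeat visit: no duplicated ip/route

$visits = $db->query('SELECT COUNT(*) FROM page_visit')->fetchColumn();
```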

PHP-MYSQL encryption that is safe

I'm working on a messaging app, and I would like to encrypt the messages in the safest way possible.
Currently I'm doing the following:
chatMessages table
userId | message
---------|---------------------
1 | ENCRYPTED%withChatMasterKey%
2 | ENCRYPTED%withChatMasterKey%
...
chatUsers table
userId | chatMasterKey (it is encrypted aswell with users ID's)
---------|---------------------
1 | ENCRYPTED%withUserId(1)%
2 | ENCRYPTED%withUserId(2)%
...
The messages are encrypted with a random masterKey that is stored in the chatUsers table, where the masterKey itself is encrypted again, this time using the user's ID as the key.
So to decrypt the messages, you have to do this in PHP:
$message = $messagesArray[0]["message"];
$chatMasterKey = decrypt($chatUser[0]["chatMasterKey"], $userId);
$messageDecoded = decrypt($message, $chatMasterKey);
That works well: if somebody hacks the database, he will not be able to read out the messages.
But is this safe? If someone hacks the .php files as well, then he will be able to figure out the method I'm using and could decrypt the messages.
Is there a better way to encrypt messages?
(For security reasons I changed some details in the post.)

Using PHP to display varying text based on user, and recording click-through rate

Let's say that I have 3 different headlines for an article:
"Man Bites Dog"
"This Man Unhinged His Jaw as He Approached A Dog, What Happens Next Will Shock You!"
"Only 90's Kids Will Remember That Time a Man Bit a Dog"
I want to use PHP to randomly display one of these three headlines based on the current user (so they're not getting new headlines each time they refresh), then record the number of clicks for each version of the headline via SQL where I get something similar to:
USER HEADLINE CLICK?
1    1     No
2    3     Yes
3    2     Yes
4    3     No
5    2     Yes
6    1     No
Specifically, I'd like advice about:
- Retrieving some sort of variable that's unique to the user (IP address, maybe?)
- Randomly assigning a number (1-3, in the example) based on that unique user variable.
- Displaying different text based on the assigned number.
I can figure out the SQL stuff once I figure this part out. I appreciate any advice you can provide.
You have three problems here:
How to identify a user consistently
How to count user clicks (actions)
How to get the resulting statistics
Here I think that showing different subjects on one page is not a problem.
Problem 1
Basically you can use an IP address, but it is not a constant ID for a user. For example, if the user uses a mobile phone and walks around, he can switch between towers or lose the connection and then restore it with a different IP.
There are many ways to identify a user on the web, but there is no way to identify a user 100% without authorization (an active action done by the user).
For example, you can set a cookie for the user containing his generated ID. You can easily generate an ID; you can look here. When you have set the cookie and the user comes back to you, you will know who it is and can do the stuff you need.
Also, on user uniqueness, you can read this article - browser uniqueness.
Also, if you use a cookie, you can easily store the subject ID there for your task. If you will not use a cookie, I recommend MongoDB for this kind of task (many objects with small data that must be retrieved from the db very fast and inserted very fast, and there are no updates in your case).
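A minimal sketch of the cookie approach described above (the visitor_id cookie name and the helper function are made up for illustration):

```php
<?php
// Sketch of cookie-based identification: generate a random ID on the first
// visit, set it as a cookie, and reuse it on later visits.
// generateVisitorId() is an illustrative helper, not a library function.
function generateVisitorId(): string {
    return bin2hex(random_bytes(16));   // 32 hex chars, unguessable
}

if (isset($_COOKIE['visitor_id'])) {
    $visitorId = $_COOKIE['visitor_id'];   // returning visitor
} else {
    $visitorId = generateVisitorId();      // first visit
    setcookie('visitor_id', $visitorId, time() + 86400 * 365);
}
```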
Problem 2
You showed a table that has 3 fields: ID, used title, is title clicked.
With this kind of table you will lose all non-unique clicks (when a user clicks the subject twice, comes back tomorrow, or refreshes the target page multiple times).
I suggest you to use following kind of table
ID - some unique id, auto increment field will be good here
Date - some period of measurements (daily, hourly or something like that)
SubjectID - id of subject that was shown
UniqueClicks - count of users that clicks on subject
Clicks - Total count of clicks on subject
In this case you will have aggregated data by period of time and you will easily show data in admin panel
But we still have the problem of collecting this data. The solution depends on the number of users. If there are more than 1000 clicks per minute, I think you need some logging system. For example, you can append all data to a file 'clickLog-' . date('Ymd_H') . '.log' in some static format, for example:
clientId;SubjectId;
When the hour ends you can aggregate this data with a shell script or your code and put it into the db:
cat clickLog-20160907_12.log | sort -u | awk -F';' '{print $2}' | sort | uniq -c
After this command you will have 2 columns of data: the first will be the count of unique clicks and the second will be the subject ID.
By modifying this script you can get total clicks by just removing the sort -u section.
Also, if you have several subject IDs, you can do it with a for loop.
For example, a bash script for unique clicks could be the following:
for i in subj1 subj2 subj3; do
  uniqClicks=$(cat clickLog-20160907_12.log |
    grep ';'$i';$' |
    sort -u |
    wc -l);
  clicks=$(cat clickLog-20160907_12.log |
    grep ';'$i';$' |
    wc -l);
  # save data here
done
After these manipulations you will have aggregated data ready for calculations, and the source data preserved for future processing (if needed).
Your db will also stay small and fast, while all the source data is stored in files.
Problem 3
If you implement the solution from the Problem 2 section, all queries for getting statistics will be so simple that your database will execute them very fast.
For example you can run this query in PostgreSQL:
SELECT
SubjectId,
sum(uniqueClicks) AS uniqueClicks,
sum(clicks) AS clicks
FROM
statistic_table
WHERE
Date BETWEEN '2016-09-01 00:00:00' and '2016-09-08 00:00:00'
GROUP BY
SubjectId
ORDER BY
sum(uniqueClicks) DESC
In this case, with 3 subject IDs and hourly aggregation, you will get 504 new rows per week (3 subjects * 24 hours * 7 days), which is a really small amount of data for a database.
Alternatives
You can also use Google Analytics for all the calculations. In that case you need to do some other steps, most of them configuration steps to enable the Google Analytics monitoring script on your site. Once you have it, you can easily configure goal support and simply attach the subject ID as additional data using the GA script API.
You can use the IP of the user or his MAC; if the user is registered on the site you can use the user ID.
For the second part you can use the PHP function mt_rand():
mt_rand(min, max) -> if you want a number between 1 and 3, use mt_rand(1,3);
Then use an array to store the three different headlines and use the randomly generated number to access the array.
Better yet, generate a number between 0-2, because arrays start at 0.
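Putting the pieces together, a sketch: store the three headlines in an array and derive a stable index from the user's identifier, so a refresh does not change the variant (crc32 is an illustrative choice, not part of the answer; plain mt_rand would give a new number on every request unless you store the result):

```php
<?php
// Sketch: pick one of three headlines deterministically per user.
$headlines = [
    "Man Bites Dog",
    "This Man Unhinged His Jaw as He Approached A Dog, What Happens Next Will Shock You!",
    "Only 90's Kids Will Remember That Time a Man Bit a Dog",
];

function headlineFor(string $userId, array $headlines): string {
    // crc32 maps the id to a non-negative integer; % 3 gives a stable 0..2
    $index = crc32($userId) % count($headlines);
    return $headlines[$index];
}

$a = headlineFor('user-42', $headlines);
$b = headlineFor('user-42', $headlines);   // same user, same headline
```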

Best way to scale data, decrease loading time, make my webhost happy

For a Facebook Application, I have to store a list of friends of a user in my MySQL database. This list is requested from my db, compared with other data, etc.
Currently, I store this list of friends within my user table, the uids of the friends are put together in one 'text' field, with a '|' as separator. For example:
ID - UID - NAME - FRIENDS => 1 - 123456789 - John Doe - 987654321|123456|765432
My PHP file requests this row and extracts the list of friends by exploding that field on '|'. This all works fine; every 1000 users take about 5MB of disk space.
Now the problem:
For an extra feature, I also need to save the names of the friends of the user. I can do this in different ways:
1) Save this data in an extra table. For example:
ID - UID - NAME => 1 - 1234321 - Jane Doe
If I need the name of the friend with ID 1234321, I can request the name from this table. However, the problem is that this table will keep growing, until all users on Facebook are indexed (>500million rows). My webhost is not going to like this! Such a table will take about 25GB of diskspace.
2) Another solution is to extend the data saved in the user table, by adding the name to the UID in the friends field (with an extra separator, let's use ','). For example:
ID - UID - NAME - FRIENDS => 1 - 123456789 - John Doe - 987654321,Mike Jones|123456,Tom Bright|765432,Rick Smith
For this solution I have to alter the script to add another extra explode on ','. I'm not sure how much extra disk space this is going to take... but the data doesn't get any easier to handle this way!
3) A third solution gives a good overview of all the data, but will cause the database to be huge. In this solution we create a table of friends, with a row for every friendship. For example:
ID - UID - FRIENDUID => 1 - 123456789 - 54321
ID - UID - FRIENDUID => 3 - 123456789 - 65432
ID - UID - FRIENDUID => 2 - 987654321 - 54321
ID - UID - FRIENDUID => 4 - 987654321 - 65432
As you can see in this example, it gives a very good overview of all the friendships. However, with about 500 million users, and let's say an average of 300 friendships per user, this will create a table with 150 billion rows. My host is definitely not going to like that... and I think this kind of table will take a lot of disk space...
So... How to solve this problem? What do you think, what is the best way to store the UIDs + names of friends of a user on Facebook? How to scale this kind of data? Or do you have another (better) solution than the three possibilities mentioned above?
Hope you can help me!
> If I need the name of the friend with ID 1234321, I can request the name from this table. However, the problem is that this table will keep growing, until all users on Facebook are indexed (>500million rows). My webhost is not going to like this! Such a table will take about 25GB of diskspace.
If storing the names of the users you need really takes 25GB, then it takes 25GB. You can't move data around and expect it to get smaller - and the overhead of a table is not that much. Instead, you need to focus on only storing the data you actually need. It is unlikely that everyone on Facebook uses your application (if it were the case, you shouldn't be using a host where 25GB of space is a worry).
So instead of indexing the entirety of Facebook (which would be difficult regardless), just store the data relevant for the people who actually use your application and their immediate friends, which is a much smaller dataset.
Your first proposed solution is the proper way to do it; it eliminates any potential redundancy in name storage.
I agree with Amber, solution 1 is going to be the most efficient way to store this data. If you want to stick with your current approach (similar to solution 2), you may want to consider storing the friendship data as a JSON string. It won't produce the shortest possible string, but it will be very easy to parse.
To save the data:
$friends = array(
    'uid1' => 'John Smith',
    'uid2' => 'Jane Doe'
);
$str = json_encode($friends);
// save $str to the database in the "friends" column
To get the data back:
// get $str from the database
$friends = json_decode($str, TRUE);
var_dump($friends);
I really think you should go with the third option. For scalability you would want to do this.
With the first method you have a LOT of redundant data, because if 1 is friends with 2, then 2 is also friends with 1, but you are storing both relations.
This also makes the 150 billion row count impossible. It is more likely to be at most half of that, because a relation row can work both ways!
So the first user will generate 300 rows in the table, but the second user (if he is friends with the first) will generate just 299. Continue like this and the last user won't generate a single relation row, because they are all already present!
Also, when you start searching for certain relations, the third option will be much faster, since you'll have an int index instead of a fulltext index, which probably saves another 50% in both storage and processing speed.
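The "works both ways" point can be sketched like this: store each pair once, smaller uid first, and query in both directions. In-memory SQLite is used purely for illustration:

```php
<?php
// Sketch of storing each friendship once (smaller uid first) and querying
// it in both directions.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE friendships (uid_a INT, uid_b INT, PRIMARY KEY (uid_a, uid_b))');

function addFriendship(PDO $db, int $u, int $v): void {
    // canonical order means (1,2) and (2,1) are the same single row
    $stmt = $db->prepare('INSERT OR IGNORE INTO friendships VALUES (?, ?)');
    $stmt->execute([min($u, $v), max($u, $v)]);
}

function friendsOf(PDO $db, int $u): array {
    // the relation works both ways, so check both columns
    $stmt = $db->prepare(
        'SELECT CASE WHEN uid_a = ? THEN uid_b ELSE uid_a END AS friend
         FROM friendships WHERE uid_a = ? OR uid_b = ?');
    $stmt->execute([$u, $u, $u]);
    return array_map('intval', $stmt->fetchAll(PDO::FETCH_COLUMN));
}

addFriendship($db, 123456789, 54321);
addFriendship($db, 54321, 123456789);   // duplicate direction, ignored
$friends = friendsOf($db, 123456789);
```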
If your application will reach 500 million users you will just have to get a better hosting service.

Dilemma, searching a hashed field when no other information is known

I'm having a dilemma. I have a field hashedX that is a hashed/salted value and the salt is saved in the same row in the mysql database as is common practice.
hashedX saltX
------ ------
hashed1 ssai3
hashed2 woddp
hashed3 92ofu
When I receive inputX, I need to know if it matches any of the values in hashedX, such as hashed1, hashed2, or hashed3. So typically I would take my input, hash/salt it, and compare it to the values of hashedX. Pseudo code:
$hashed_input = hash ($input with $salt );
select * from tablename where $hashed_input is hashedX
The problem is I don't know which saltX I need to even compute $hashed_input before I can do any select.
I could go through the database rows one by one, try that row's salt on my input, then check whether the input, hashed/salted with that salt, matches that row's hashedX. If I have 100,000 records, my guess is that this would be painfully slow. I have no idea how slow, since I'm not that great at databases.
Is there a better way to do this, than selecting all rows, looping through them, using that row's salt to hash input, then comparing again to the hashed value in the db?
If it is possible (it depends on your hash formula), define a MySQL user-defined function on the database side for the hash formula (see CREATE FUNCTION). This way you will be able to get your results in one simple request:
SELECT hashedX, saltX FROM tablename WHERE UDFhash(input, saltX) = hashedX ;
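If the hash formula cannot be expressed as a UDF, the row-by-row fallback the asker describes looks roughly like this (a sketch over an in-memory array standing in for the SELECTed rows; hash_equals gives a timing-safe comparison):

```php
<?php
// Sketch of the row-by-row fallback: re-hash the input with each row's salt
// and compare. $rows stands in for the fetched table contents.
$rows = [
    ['hashedX' => hash('sha256', 'secret' . 'ssai3'), 'saltX' => 'ssai3'],
    ['hashedX' => hash('sha256', 'other'  . 'woddp'), 'saltX' => 'woddp'],
];

function findMatch(array $rows, string $input): ?array {
    foreach ($rows as $row) {
        $candidate = hash('sha256', $input . $row['saltX']);
        if (hash_equals($row['hashedX'], $candidate)) {   // timing-safe compare
            return $row;
        }
    }
    return null;   // no row matched
}

$match = findMatch($rows, 'secret');
```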
You don't specify which hash algorithm you're using in PHP. MySQL supports MD5 and SHA1 hash algorithms as builtin functions:
SELECT ...
FROM tablename
WHERE SHA1(CONCAT(?, saltX)) = hashedX;
SHA2 algorithms are supported in MySQL 5.5, but this is only available in pre-beta release at this time. See http://dev.mysql.com/doc/refman/5.5/en/news-5-5-x.html for releases.
> Is there a better way to do this, than selecting all rows, looping through them, using that row's salt to hash input, then comparing again to the hashed value in the db?
Yes. A much better way.
Typically a salt is used precisely to prevent what you are trying to do. So either you don't want to use a salt, or you don't want to do this kind of lookup.
If you are checking an entered password against a given user account or object, you should keep a reference to the object in the same row that holds the salt and the hashed salt+password. Require the account name / object to be supplied along with the password, then look up the row for that account name or object and compare the password against its salt + hash.
If you are keeping a record of items you've seen before, then you should just use a plain hash (or a Bloom filter) and forget the salt, because it doesn't buy you anything.
If you're doing something new / creative, please describe what it is.
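The per-account lookup recommended above is exactly what PHP's built-in password API supports: fetch the one row for the given account, then verify. password_hash() embeds a random salt in the returned string, so no separate salt column is needed.

```php
<?php
// Sketch of the per-account check described above.
$stored = password_hash('s3cret', PASSWORD_BCRYPT);   // done once, at signup

// Later, at login: SELECT the single row for the given account name,
// then compare the submitted password against its stored hash.
$ok  = password_verify('s3cret', $stored);
$bad = password_verify('wrong',  $stored);
```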
