I would like to ask for advice about my own analytics system.
So far the system collects all clicks and saves them in a SQL database.
First part of the analytics.
The SQL log table (logs) looks like this:
+----+----------------------+-------------+---------------------------------------------+----------------+--------------+----------+
| id | time | address | address_to | ip | resolution | id_guest |
|----+----------------------+-------------+---------------------------------------------+----------------+--------------+----------|
| 1 | 2013-12-03#14:31:35 | index.php | https://www.youtube.com/watch?v=6VJBBUqr1wM | 89.XX.XXX.6 | 1366x768 | 6 |
| 2 | 2013-12-03#14:48:21 | file.php | https://www.youtube.com/watch?v=0EWbonj7f18 | 89.XX.XXX.6 | 1366x768 | 6 |
| 3 | 2013-12-03#16:16:55 | contact.php | https://www.youtube.com/watch?v=_o-XIryB2gg | 178.XX.XXX.140 | 1920x1080 | 11 |
| 4 | 2013-12-03#16:21:32 | index.php | https://www.youtube.com/watch?v=z0M96LyTyX4 | 178.XX.XXX.140 | 1920x1080 | 11 |
| 5 | 2013-12-03#16:44:32 | movies.php | https://www.youtube.com/watch?v=cUhPA5qIxDQ | 178.XX.XXX.140 | 1920x1080 | 11 |
+----+----------------------+-------------+---------------------------------------------+----------------+--------------+----------+
Each click is added to the database as a new record.
All the movies on my website are stored in a second table of the SQL database (movies):
+----+----------------------+-------------+---------------------+
| id | name | address | tags |
|----+----------------------+-------------+---------------------|
| 1 | 2013-12-03#14:31:35 | 6VJBBUqr1wM | bass,electro,trance |
| 2 | 2013-12-03#14:48:21 | 0EWbonj7f18 | electro,house,new |
| 3 | 2013-12-03#16:16:55 | _o-XIryB2gg | electro,party,set |
| 4 | 2013-12-03#16:21:32 | z0M96LyTyX4 | trance,house,new |
| 5 | 2013-12-03#16:44:32 | cUhPA5qIxDQ | techno,new,set |
+----+----------------------+-------------+---------------------+
Everything works flawlessly. The database now holds all the movies viewed by each user, whom I want to identify precisely, so I also write down the IP address and screen resolution.
First question:
Is this a good method for identifying a user?
--
Second part of analytics.
Now I want to use the collected logs to display an interface with movies based on what the user has already browsed.
When a user enters the website, I select all of that user's logs from the database.
From the logs I take each film's identifier and look it up in the movies table to fetch its tags, which I put into an array. For example, the user with ID = 6 will have the array:
array(
    [0] => bass,
    [1] => electro,
    [2] => trance,
    [3] => electro,
    [4] => house,
    [5] => new
);
Now I count how often each tag occurs and sort the result, most frequent first:
array(
    [electro] => 2,
    [bass] => 1,
    [trance] => 1,
    [house] => 1,
    [new] => 1
);
Based on the contents of this array I can show the user videos that might interest him.
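The counting and sorting step can be sketched like this (Python used purely for illustration, with the sample tags from above; in PHP, array_count_values() plus arsort() does the same job):

```python
from collections import Counter

# Tags gathered from the movies one guest has viewed
# (the sample data for user 6 from above).
viewed_tags = ["bass", "electro", "trance", "electro", "house", "new"]

# Count each tag and order by frequency, most common first.
tag_counts = Counter(viewed_tags).most_common()
print(tag_counts)
# [('electro', 2), ('bass', 1), ('trance', 1), ('house', 1), ('new', 1)]
```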
Everything worked perfectly, but I have only just discovered a problem...
The logs table already holds more than 4.5 million records. As you can imagine, searching through that many records takes a lot of time, and entering the site sometimes takes up to 10 seconds...
I hope my poor English is fairly clear.
Please, any advice on how to solve this page-loading problem.
Use indexes where needed. It's hard to tell exactly where, since you didn't show any queries; basically you want indexes on the columns that appear in the WHERE part of your queries and in JOINs. You don't have to index a column that stays the same most of the time - isloggedin, isadmin, language and so on.
Make summary tables for the data you need to search. For example, if you need to know the preferred resolution or how many times a user has visited the site, you can have a cron job parse this data for all users and store it in a summary table. This can also be used to produce statistics if you need them. For those tags you could have a table with user_id, tag and count.
If you only need the last visited page, last resolution and so on, just make a table for that, where you store and update one row per user.
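A rough sketch of such a tag summary table (SQLite here for illustration; the user_tags table name is made up, and logs.address_to is simplified to hold the video id directly rather than the full YouTube URL):

```python
import sqlite3
from collections import Counter

# Minimal stand-ins for the question's tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE logs (id_guest INTEGER, address_to TEXT);
CREATE TABLE movies (address TEXT, tags TEXT);
CREATE TABLE user_tags (user_id INTEGER, tag TEXT, count INTEGER);
INSERT INTO logs VALUES (6, '6VJBBUqr1wM'), (6, '0EWbonj7f18');
INSERT INTO movies VALUES ('6VJBBUqr1wM', 'bass,electro,trance'),
                          ('0EWbonj7f18', 'electro,house,new');
""")

# Cron job: scan the big tables once, then rebuild the small summary.
counts = Counter()
for user_id, tags in con.execute(
        "SELECT l.id_guest, m.tags FROM logs l "
        "JOIN movies m ON m.address = l.address_to"):
    for tag in tags.split(","):
        counts[(user_id, tag)] += 1

con.execute("DELETE FROM user_tags")
con.executemany("INSERT INTO user_tags VALUES (?, ?, ?)",
                [(u, t, n) for (u, t), n in counts.items()])

# Page view: one cheap query against the summary
# instead of scanning 4.5M log rows.
top = con.execute("SELECT tag, count FROM user_tags WHERE user_id = 6 "
                  "ORDER BY count DESC").fetchall()
```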
I'm struggling to come up with an efficient solution for determining user access to a specified folder, using PHP (specifically Laravel) and MySQL. I want to create a system that has Google Drive-esque functionality...
For example, Joe Bloggs creates many folders within folders, e.g. Level 1 > Level 2 > Level 3 > Level 4 > Level 5. Within any of these folders there can be any number of additional sub-files and folders.
This would be the resulting database structure -
Table name: users
| id | name |
| -- | ---------- |
| 1 | Joe Bloggs |
| 2 | John Snow |
Table name: folders
| id | parent_id | author_id | name |
| -- | --------- | --------- | --------- |
| 1 | NULL | 1 | Level 1 |
| 2 | 1 | 1 | Level 2 |
| 3 | 2 | 1 | Level 3 |
| 4 | 3 | 1 | Level 4 |
| 5 | 4 | 1 | Level 5 |
| 6 | 2 | 1 | Level 3.1 |
| 7 | 2 | 1 | Level 3.2 |
Table name: folders_users
| id | folder_id | user_id | owner | read | write |
| -- | --------- | ------- | ----- | ---- | ----- |
| 1  | 1         | 1       | 1     | 1    | 1     |
| 2  | 3         | 2       | 0     | 1    | 1     |
So based on record 1 in folders_users, Joe Bloggs should have owner, read & write permissions for all folders underneath Level 1. Joe Bloggs then gives John Snow read & write access to Level 3, which in turn should give John Snow read & write access to Level 3, Level 3.1, Level 3.2 and anything created under any of these in future.
Additionally, it should be possible for a user to star a folder. I'd imagine this can simply be achieved with a separate table and query this separately -
Table name: starred_folders
| id | folder_id | user_id |
| -- | --------- | ------- |
| 1 | 7 | 2 |
The current solution I have is for every folder in the chain a user has permission to access, a record is created in the folders_users table. I feel like this is just overcomplicating things and creating excessive numbers of records. This is especially true when it comes to sharing a folder as I have to recreate the entire tree for that one user. Or, imagine if a user revokes write access to one of the shared users, the entire tree (potentially hundreds of records) has to be updated for a single flag.
What would be the best way to generate these trees, and to quickly and efficiently determine a user's access level for any given folder? I suspect the only way to do this is recursion, but I'm concerned about its efficiency. Or should I perhaps be using something entirely different from MySQL for this? I've had a brief look at graph databases, but I can't see them being a way forward for us, as we don't have the infrastructure to support them.
Thanks,
Chris.
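One way to avoid materialising a folders_users row per descendant is to store only the explicit grants and resolve access by walking up the parent chain; with MySQL 8+ (or SQLite, used below so the sketch is self-contained) a recursive CTE does this in a single query. Schema and data are taken from the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE folders (id INTEGER, parent_id INTEGER, author_id INTEGER, name TEXT);
CREATE TABLE folders_users (folder_id INTEGER, user_id INTEGER,
                            owner INTEGER, read INTEGER, write INTEGER);
INSERT INTO folders VALUES
  (1, NULL, 1, 'Level 1'), (2, 1, 1, 'Level 2'), (3, 2, 1, 'Level 3'),
  (4, 3, 1, 'Level 4'), (5, 4, 1, 'Level 5'),
  (6, 2, 1, 'Level 3.1'), (7, 2, 1, 'Level 3.2');
-- Only two explicit grants: Joe on Level 1, John on Level 3.
INSERT INTO folders_users VALUES (1, 1, 1, 1, 1), (3, 2, 0, 1, 1);
""")

def access(user_id, folder_id):
    # Collect the folder and all of its ancestors, then take the
    # nearest explicit grant for this user (if any).
    return con.execute("""
        WITH RECURSIVE chain(id, parent_id, depth) AS (
            SELECT id, parent_id, 0 FROM folders WHERE id = ?
            UNION ALL
            SELECT f.id, f.parent_id, c.depth + 1
            FROM folders f JOIN chain c ON f.id = c.parent_id
        )
        SELECT fu.owner, fu.read, fu.write
        FROM chain JOIN folders_users fu ON fu.folder_id = chain.id
        WHERE fu.user_id = ?
        ORDER BY chain.depth LIMIT 1""",
        (folder_id, user_id)).fetchone()  # (owner, read, write) or None
```

With this shape, revoking John Snow's access is a single-row change on folder 3 rather than an update across the whole subtree.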
I'm writing this as a solution, not the most efficient one.
You can add a column to your folders table (let's call it access) and store in it the ids of the people who have access to that folder and its children. I assume that when you want to show information about a folder you must fetch its parents' information from the table as well, so you won't need any extra queries for that.
If you have just a single kind of access, you can simply store values in this column like user1,user2,...; if not, you can serialize an array like this:
[
"read" => [user1,user2,...],
"write" => [user2]
]
Of course you could instead add a column for each kind of access, but if you have many of them, this might be a solution too.
For an online game, I have a table that contains all the plays, and some information on those plays, like the difficulty setting etc.:
+---------+---------+------------+------------+
| play-id | user-id | difficulty | timestamp |
+---------+---------+------------+------------+
| 1 | abc | easy | 1335939007 |
| 2 | def | medium | 1354833214 |
| 3 | abc | easy | 1354833875 |
| 4 | abc | medium | 1354833937 |
+---------+---------+------------+------------+
In another table, after the game has finished, I store some stats related to that specific game, like the score etc:
+---------+----------------+--------+
| play-id | type | value |
+---------+----------------+--------+
| 1 | score | 201487 |
| 1 | enemies_killed | 17 |
| 1 | gems_found | 4 |
| 2 | score | 110248 |
| 2 | enemies_killed | 12 |
| 2 | gems_found | 7 |
+---------+----------------+--------+
Now, I want to make a distribution graph so users can see in what score percentile they are. So I basically want the boundaries of the percentiles.
If it were at the level of individual scores, I could rank the scores and start from there, but it needs to be at the highscore level (each user's best score). So mathematically, I would need to sort all the users' highscores and then find the percentiles.
I'm in doubt what's the best approach here.
On one hand, constructing an array that holds all the highscores seems like a performance-heavy thing to do, because it needs to cycle through both tables and match the scores to the users (the first table holds around 10M rows).
On the other hand, keeping a separate table with each user's highscore would make things easier, but it feels like it goes against the rule of avoiding data redundancy.
Another approach that came to mind was doing the heavy computation once a week and keeping the result in a separate table, or doing it on only a (statistically relevant) subset of the data.
Or maybe I'm completely missing the point here and should use a completely different database setup?
What's the best practice here?
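The periodic-precompute idea could look something like this (SQLite for illustration; the highscores table name and underscored column names are assumptions, and the percentile rule is a simple nearest-rank one):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE plays (play_id INTEGER, user_id TEXT);
CREATE TABLE play_stats (play_id INTEGER, type TEXT, value INTEGER);
CREATE TABLE highscores (user_id TEXT PRIMARY KEY, highscore INTEGER);
INSERT INTO plays VALUES (1, 'abc'), (2, 'def'), (3, 'abc');
INSERT INTO play_stats VALUES (1, 'score', 201487), (2, 'score', 110248),
                              (3, 'score', 150000);
""")

# Scheduled job (e.g. weekly): one highscore row per user, so the
# 10M-row plays table is scanned once, not on every page view.
con.execute("""
    INSERT OR REPLACE INTO highscores
    SELECT p.user_id, MAX(s.value)
    FROM plays p JOIN play_stats s
      ON s.play_id = p.play_id AND s.type = 'score'
    GROUP BY p.user_id""")

# Percentile boundaries come from the small summary table.
scores = [r[0] for r in con.execute(
    "SELECT highscore FROM highscores ORDER BY highscore")]

def percentile_boundary(p):
    # Nearest-rank percentile over the per-user highscores.
    idx = max(0, int(len(scores) * p / 100) - 1)
    return scores[idx]
```

The redundancy worry is usually answered by treating highscores as a derived cache: it can always be rebuilt from the two source tables, so it doesn't have to be kept transactionally perfect.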
I have a problem that I can't figure out; I'm not experienced enough (or it can't be done!). I've trawled Google for the answer with no luck.
I have a system where I need to assign an ID to each row, taken from another table. The catch is that the ID must be unique for each row created in the batch.
Basically, I'm selling links on my Tumblr accounts. I need to assign a Tumblr account to each link that a customer purchases, but I want to spread the links across all available Tumblr accounts so that duplicates are kept to the minimum possible.
The URLs - each link that a customer buys is stored in this table (urls_anchors):
+----------+--------------------+------------+-----------+------+
| clientID | URL | Anchor | tumblrID | paid |
+----------+--------------------+------------+-----------+------+
| 1234 | http://example.com | Click here | 67 | Yes |
| 1234 | http://example.com | Click here | 66 | Yes |
| 1234 | http://example.com | Click here | 65 | Yes |
| 1234 | http://example.com | Click here | 64 | Yes |
+----------+--------------------+------------+-----------+------+
All of the Tumblr accounts available for allocation are stored in this table (tumblrs):
+----------+-------------------+------------+
| tumblrID | tumblrURL | spacesLeft |
+----------+-------------------+------------+
| 64 | http://tumblr.com | 9 |
| 65 | http://tumblr.com | 9 |
| 66 | http://tumblr.com | 9 |
| 67 | http://tumblr.com | 9 |
+----------+-------------------+------------+
My best attempt at this has been the following query:
INSERT INTO `urls_anchors` (`clientID`, `URL`, `Anchor`, `tumblrID`, `paid`)
VALUES ('$clientID', '$url', '$line',
        (SELECT @rank := @rank + 1 AS tumblrID FROM tumblrs WHERE @rank < 68 LIMIT 1),
        'No')
This works, but it keeps incrementing indefinitely, when there are only X tumblrs to assign. I need the query to wrap back around when it reaches the last row of tumblrs and run through the list again.
Also, I'm using this in a PHP script; I'm not sure if that's of any significance.
Any help would be MASSIVELY appreciated!
Thanks for looking :)
You can use a SELECT query as the source of data to insert.
INSERT INTO urls_anchors (`clientID`, `URL`,`Anchor`, `tumblrID`, `paid`)
SELECT '$clientID','$url','$line', tumblrID, 'No'
FROM tumblrs
LIMIT $number_of_rows
This will assign $number_of_rows different tumblrID values to the rows.
If you need to assign more tumblr IDs than are available, you'll need to do this in a loop, subtracting the number of rows inserted from $number_of_rows each time. You can use mysqli_affected_rows() to find out how many rows were inserted on each pass.
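That loop can be sketched like this (Python/SQLite so the example is self-contained; in the PHP version, mysqli_affected_rows() plays the role of rowcount here):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tumblrs (tumblrID INTEGER);
CREATE TABLE urls_anchors (clientID INTEGER, url TEXT, anchor TEXT,
                           tumblrID INTEGER, paid TEXT);
INSERT INTO tumblrs VALUES (64), (65), (66), (67);
""")

def assign_links(client_id, url, anchor, number_of_rows):
    # Keep inserting batches, cycling through the available tumblr
    # accounts until the requested number of links exists.
    remaining = number_of_rows
    while remaining > 0:
        cur = con.execute(
            "INSERT INTO urls_anchors "
            "SELECT ?, ?, ?, tumblrID, 'No' FROM tumblrs LIMIT ?",
            (client_id, url, anchor, remaining))
        if cur.rowcount <= 0:   # no tumblr accounts at all: stop
            break
        remaining -= cur.rowcount

# 10 links over 4 accounts: two full passes plus a partial one.
assign_links(1234, "http://example.com", "Click here", 10)
```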
OK, last post on this subject (I hope). I've been trying to get to grips with normalisation for the tables of a website I've been building, and I have to be honest that I've struggled with it. But after my last post it seems that I may have finally grasped it and set my tables up properly.
However, one question remains. If I create a table that is seemingly in third normal form, is it acceptable to have areas of white space, or empty cells, if the data is relevant to that specific table? Let me give you an example:
On a news website I have an Authors_Table
+----+-----------+----------+-----------------+-------------------+---------+----------+---------+
| ID | FIRSTNAME | SURNAME | EMAIL | BIO ( REQUIRED ) | TWITTER | FACEBOOK | WEBSITE |
+----+-----------+----------+-----------------+-------------------+---------+----------+---------+
| 01 | Brian     | Griffin  | brian@gmail.com | About me...       | URL     |          | URL     |
| 02 | Meg       | Griffin  | meg@gmail.com   | About me...       | URL     |          |         |
| 03 | Peter     | Griffin  | peter@gmail.com | About me...       |         | URL      | URL     |
| 04 | Glen      | Quagmire | glen@gmail.com  | About me...       | URL     | URL      |         |
+----+-----------+----------+-----------------+-------------------+---------+----------+---------+
This would be used on the article page to give a few details about who wrote it, which is very common in newspapers and on modern blogs. Now, the last three columns - Facebook, Twitter, Website - are obviously relevant to the author & therefore to the PK (ID). As you know, though, not everyone has a Twitter account, a website or a Facebook page, so the content of these cells is rather flexible and empty cells will obviously occur in some cases.
It was suggested to do it another way so I produced:
Links
+----+-------------------+
| ID | TYPE |
+----+-------------------+
| 01 | Facebook |
| 02 | Twitter |
| 03 | Website |
+----+-------------------+
Author_Links
+----------+--------+------+
| AUTHOR | TYPE | LINK |
+----------+--------+------+
| 01 | 01 | URL |
| 01 | 02 | URL |
| 01 | 03 | URL |
| 02 | 02 | URL |
| 02 | 03 | URL |
| 03 | 01 | URL |
+----------+--------+------+
Now I understand the concept of this, but isn't it just as "correct" to have and use the original table? Updates can be made using a form & PHP, say:
$update_link_sql = "UPDATE authors SET facebook = ' NEW VALUE ' WHERE id = '$author_id'";
$update_link_res = mysqli_query($con, $update_link_sql);
As for me, Authors_Table is correct.
| ID | FIRSTNAME | SURNAME | EMAIL | BIO ( REQUIRED ) | TWITTER | FACEBOOK | WEBSITE |
The only reason to have three tables:
Authors
| ID | FIRSTNAME | SURNAME | EMAIL | BIO ( REQUIRED ) |
Link_types
| ID | TYPE |
Author_links
| AUTHOR_ID | LINK_TYPE_ID | URL |
...is that your authors could have more than one link of a specific type (for example, two Twitter accounts - by the way, is that even allowed?).
If we assume that no author can have more than one account of each type, your version with a single table is correct.
Either way is acceptable depending on functional requirements.
If you need to dynamically add more URL types/fields to the profile, use the latter.
If there are only ever going to be three, the former is better.
No need to over-engineer.
Yes, it's "correct" to store "optional" attributes as columns in the entity table. It's only when we have repeated values - e.g. multiple Facebook pages for one author - that we'd want to implement the child table. (We don't want to store "repeating" attributes in the entity table.)
As long as there's a restriction in the model, that an attribute will be limited to a single value (a single facebook page, a single twitter, etc.) those attributes can be stored in the entity table. We'd just use a NULL value to indicate that a value is not present.
One benefit of the separate table approach (outlined in your post) is that it would be "easier" to add a new "type" of URL. For example, if in the future we want to store a blogspot URL, or an instagram URL, instead of having to modify the entity table to add new columns, we can simply add rows to the "link_type" table and "author_link" table. That's the big benefit there.
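To make that "new type is just a new row" point concrete, here's a small sketch (SQLite for illustration; the URLs are placeholders):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE link_types (id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE author_links (author_id INTEGER, link_type_id INTEGER, url TEXT);
INSERT INTO link_types VALUES (1, 'Facebook'), (2, 'Twitter'), (3, 'Website');
INSERT INTO author_links VALUES (1, 2, 'https://twitter.com/brian'),
                                (1, 3, 'https://example.com');
-- Supporting Instagram later needs no schema change, just a row:
INSERT INTO link_types VALUES (4, 'Instagram');
""")

# One join fetches every link an author has, whatever the types are.
links = con.execute("""
    SELECT t.type, a.url
    FROM author_links a JOIN link_types t ON t.id = a.link_type_id
    WHERE a.author_id = 1 ORDER BY t.id""").fetchall()
```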
Let's say we have the following tables.
Table Pages:
id | short_name | long_name | token
1 | Mail | My mail box | mail
2 | All mails | All mails | all
3 | Inbox | Inbox only | inb
4 | Users | Users | users
5 | All users | All users | all
and table navigation:
id | parent_id | page_id
1 | 0 | 4
2 | 0 | 1
3 | 1 | 2
4 | 1 | 3
5 | 4 | 5
I worked with page ids alone for a long time. It was easy to find the details of a page with only one value, $_GET['id'], because page ids are all unique.
Now I want to create a human-readable (token-based) navigation system.
But there is one problem: tokens are not always unique.
For example: index.php?page=mail&subpage=all and index.php?page=users&subpage=all.
I can't figure out how to find the short_name and long_name (or other information) of these two pages from two or more variables ($_GET['page'] and $_GET['subpage']).
Maybe I'm going about this the wrong way. If you think so, please suggest your idea and explain it. Thanks in advance.
Sorry if this doesn't work out of the box, but does this help?
SELECT * FROM Pages
JOIN navigation ON Pages.id=navigation.page_id
WHERE navigation.parent_id=(SELECT id FROM Pages WHERE token={$page})
AND Pages.token={$subpage}
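For what it's worth, that query shape does resolve the two ambiguous "all" pages once the parent token is included. A quick self-contained check (SQLite, using placeholders instead of interpolating the $_GET values directly, which also avoids SQL injection):

```python
import sqlite3

# The Pages and navigation data from the question.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE pages (id INTEGER, short_name TEXT, long_name TEXT, token TEXT);
CREATE TABLE navigation (id INTEGER, parent_id INTEGER, page_id INTEGER);
INSERT INTO pages VALUES
  (1, 'Mail', 'My mail box', 'mail'), (2, 'All mails', 'All mails', 'all'),
  (3, 'Inbox', 'Inbox only', 'inb'), (4, 'Users', 'Users', 'users'),
  (5, 'All users', 'All users', 'all');
INSERT INTO navigation VALUES (1, 0, 4), (2, 0, 1), (3, 1, 2), (4, 1, 3), (5, 4, 5);
""")

def find_page(page_token, subpage_token):
    # Resolve a subpage by its own token plus its parent's token.
    return con.execute("""
        SELECT p.short_name, p.long_name FROM pages p
        JOIN navigation n ON p.id = n.page_id
        WHERE n.parent_id = (SELECT id FROM pages WHERE token = ?)
          AND p.token = ?""", (page_token, subpage_token)).fetchone()
```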