Recording user pageviews, then turning this into useful data? - php

My site logs clicks to a database whenever a user views an article. This table is automatically cleared out every 3 days. We use the data to work out the most viewed pages over that 3 day period.
I'd also like to use this data for marketing purposes, eg, to determine which users like which sections of the site.
I've written a script in php that does the following:
grab the user IDs of any logged-in member who viewed an article in the past 3 days
for each user, I run a query to count how many times they viewed articles within each section, e.g., Bobby viewed 10 pages in Food & Drink and 6 pages in Sport. I tried combining this step and the previous one, but got weird results for the totals.
This gives me a nice array in this form:
[Bobby Jones] => Array
(
[Film] => 10
[Home Page] => 1
[Food & Drink] => 2
[Health & Beauty] => 1
[Gifts & Gadgets] => 3
)
What I want from this data is to eventually have a table that logs the data in the array above, and then when I run my queries, increments it.
Unfortunately, this adds a lot of overhead. When I have my arrays like the one above, I have to run another query to check if that combination of user and category already exists in the database, and if it does, I have to increment it by that day's values. Eg, if Bobby viewed 10 Film pages last week, and this week viewed 6, I need to UPDATE the table. If he's never viewed a Food page before, I need to INSERT instead.
This query is returning 400 or so users who've interacted with the site in the last 3 days. This means that for each user I have to do 1 query to get their browsing totals, 1 query to see if they've already browsed that category before, and another query to update/insert, depending on whether they've browsed it or not. You can see how inefficient this is.
Can anyone suggest a better way of doing this? My ultimate goal is to end up with a table that shows me how frequently my users browse my categories, so I can say "show me all the users who like Food & Drink" etc.
Thanks,
Matt

You can accomplish the UPDATE/INSERT behavior using MySQL's INSERT...ON DUPLICATE KEY UPDATE.
You can combine the SELECT and INSERT queries using MySQL's INSERT...SELECT.
If you post more details (say, a schema, for example), I could show you a sample query combining those two techniques, though the MySQL Manual is pretty in-depth on both subjects.

If you're using MySQL and the version is sufficiently high, look into INSERT ... ON DUPLICATE KEY UPDATE. That should cut down on a query.
Then make sure your tables are properly keyed, and those two queries should be a breeze.
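The two techniques suggested above can be combined into a single statement that aggregates the click log and upserts the totals. Below is a minimal sketch, not the poster's actual code: it uses Python's built-in sqlite3 so it can run as-is, and SQLite's ON CONFLICT ... DO UPDATE stands in for MySQL's ON DUPLICATE KEY UPDATE. The table and column names (pageviews, user_section_totals, user_id, section, views) are assumptions, since no schema was posted.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE pageviews (user_id INTEGER, section TEXT);
        CREATE TABLE user_section_totals (
            user_id INTEGER NOT NULL,
            section TEXT    NOT NULL,
            views   INTEGER NOT NULL,
            PRIMARY KEY (user_id, section)  -- the key the upsert relies on
        );
        """
    )
    return conn


def roll_up_pageviews(conn):
    """Fold the raw click log into per-user, per-section running totals.

    One statement replaces the per-user SELECT / check / INSERT-or-UPDATE
    round trips. In MySQL the analogous clause is ON DUPLICATE KEY UPDATE.
    """
    conn.execute(
        """
        INSERT INTO user_section_totals (user_id, section, views)
        SELECT user_id, section, COUNT(*)
        FROM pageviews
        WHERE user_id IS NOT NULL            -- logged-in members only
        GROUP BY user_id, section
        ON CONFLICT (user_id, section)
        DO UPDATE SET views = views + excluded.views
        """
    )
    conn.commit()
```

Because the statement adds to the existing totals, each logged click should be rolled up only once, for example by running it just before the log table is cleared.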

Related

MySQL Performance for Online Games Highscore Lists

I have a question about making "Highscore-Lists".
Let's say I have an online game with 1.000.000 active users. Each user has points from 0 to X. Now, I want to show a ranking list. It would be insane to show all million entries on one page, so it is divided into Y pages (100 entries per page => 10.000 pages).
I am not really sure how to solve it.
1. The easiest way to do that would be loading all 1m entries in one SELECT, getting the result, finding the current user with a for loop, and showing that specific page. (But all the other 999.900 entries would be kept in RAM even though they are not shown.) For a page change I could just reuse the result data with no second database call. (So I don't care about point changes during that time.)
SELECT UserName, UserID, Points FROM UserAccount ORDER BY Points;
2. My second idea was to load each page individually, but then I do not know
2.1 if it is really better performance
2.2 how to get the right start page because I only have the points of the user but not really his place
So how could I solve that problem? I don't really know what MySQL can handle. Are many small calls better than one huge call?
Can I even save huge result data?
The second solution would pick up all changed points with each page change, but I care more about performance than about an always up-to-date list.
Thank you for your help!
Markus
Use pagination. In SQL it's a "limit" clause:
SELECT UserName, UserID, Points FROM UserAccount ORDER BY Points LIMIT 0, 20;
The above query will return only the first 20 rows of the original selection.
You can pass page parameters via get, like this: highscore.php?page=1 or ?page=2 and so on.
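The question also asks how to find the page a given user is on. One possible approach, sketched below with Python's built-in sqlite3 as a stand-in for MySQL, is to count how many users rank ahead of that user and divide by the page size. The table and column names come from the question; sorting in descending order, breaking ties by UserID, and the helper names fetch_page and page_of_user are assumptions.

```python
import sqlite3

PER_PAGE = 100  # entries per page, as in the question


def fetch_page(conn, page, per_page=PER_PAGE):
    """Return one page of the ranking; pages are numbered from 1."""
    offset = (page - 1) * per_page
    return conn.execute(
        "SELECT UserName, UserID, Points FROM UserAccount "
        "ORDER BY Points DESC, UserID LIMIT ? OFFSET ?",
        (per_page, offset),
    ).fetchall()


def page_of_user(conn, user_id, per_page=PER_PAGE):
    """Return the page number on which the given user appears, or None."""
    row = conn.execute(
        "SELECT Points FROM UserAccount WHERE UserID = ?", (user_id,)
    ).fetchone()
    if row is None:
        return None
    points = row[0]
    # Count users ranked ahead, using the same ordering as fetch_page.
    (ahead,) = conn.execute(
        "SELECT COUNT(*) FROM UserAccount "
        "WHERE Points > ? OR (Points = ? AND UserID < ?)",
        (points, points, user_id),
    ).fetchone()
    return ahead // per_page + 1
```

Both queries benefit from an index on Points, so that neither sorting nor counting has to scan the whole table.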

PHP MySQL advanced filtering

For a new version of a website, I am making a "Top User" section on the homepage. This allows people to vote for the user on a bunch of different categories. All this information is stored in a MySQL database in two main tables. One table has the information (id (autogenerated), Username, Date added, etc.) and the other has the rates (id (autogenerated), linkID, rowAvg, avg, avg2, etc.). The linkID from the rate table corresponds to the id from the information table.
The query goes through rate_table, orders it by highest averages, then selects the first 10 results. Then, using the linkID of each row, I select the actual information for that user from info_table. This is fine and dandy as long as each user only has 1 rate. When a user has 2 or more rates, and both of the rates are positive, the query selects both of them and pulls the information for each of the 2 rows, which is then displayed as two different users even though it is the same person.
How can I make the rate_table query know when there are multiple rates for the same linkID and average them together instead of showing them as two different people? I was thinking of inserting each linkID into an array, then, for each subsequently selected row, checking whether that linkID is already in the array. If not, insert it; if it is, average them together, store the result in the array, and use the array to populate the table on the homepage. I feel like I am overthinking this. Sorry if this is a little confusing. If you need more information, let me know.
P.S. I'm still learning my way around MySQL queries so I apologize if I am going about this the completely wrong way and you spent the last few minutes trying to understand what I was saying :P
P.P.S. I know I shouldn't be using MySQL_Query anymore, and should be doing prepared statements. I want to master MySQL queries because that is what FMDB databases for iOS use then I will move onto learning prepared statements.
Take a look at some of the MySQL Aggregate Functions.
AVG() may be what you're looking for. The average of one result is just that result, so that should still work. Use the GROUP BY clause on the column that should be unique in order to run the aggregate calculation on the grouped rows.
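As a rough illustration of that advice, the sketch below groups the rates by linkID, averages them, and joins the user information in the same query. It runs on Python's built-in sqlite3, but the SELECT itself is plain SQL. The table names follow the question; treating rowAvg as the column to average, and the helper name top_users, are assumptions.

```python
import sqlite3


def top_users(conn, limit=10):
    """Return (id, Username, overall average) for the best-rated users.

    GROUP BY collapses all rates for one user into a single row, so a
    user with several rates appears only once.
    """
    return conn.execute(
        """
        SELECT i.id, i.Username, AVG(r.rowAvg) AS overall
        FROM rate_table AS r
        JOIN info_table AS i ON i.id = r.linkID
        GROUP BY i.id, i.Username
        ORDER BY overall DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
```

This replaces both the array bookkeeping and the second lookup per row, since the join returns the user's details alongside the averaged rate.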

Save additional information to MYSQL Database and use a simple query, or use complex query?

I have a drupal site, and am trying to use php to grab some data from my database. What I need to do is to display, in a user's profile, how many times they were the first person to review a venue (exactly like Yelp's "First" tally). I'm looking at two options, and trying to decide which is the better way to approach it.
First Option: The first time a venue is reviewed, save the value of the reviewer's user ID into a table in the database. This table will be dedicated to storing the UID of the first user to review each venue. Then, use a simple query to display a count in the user's profile of the number of times their UID appears in this table.
Second Option: Use a set of several more complex queries to display the count in the user's profile, without storing any extra data in the database. This will rely on several queries which will have to do something along the lines of:
Find the ID for each review the user has created
Check the ID of the venue contained in each review
Find the first review for each venue based on the venue ID stored in the review
Get the User ID of the author for the first review
Check which, if any, of these Author UIDs match the current user's UID
I'm assuming that this would involve creating an array of the IDs in step one, and then somehow executing each step for each item in the array. There would also be 3 or 4 different tables involved in the query.
I'm relatively new to writing SQL queries, so I'm wondering if it would be better to perform the set of potentially longer queries, or to take the small database hit and use a much much smaller count query instead. Is there any way to compare the advantages of either, or is it like comparing apples and oranges?
The volume of extra data stored will be negligible; the simplification to the processing will be significant. The data won't change (the first person to review a venue won't change), so there is a negligible update burden. Go with the extra data and simpler query.
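A minimal sketch of the first option, using Python's built-in sqlite3 as a stand-in for the site's MySQL database. The table name venue_first_review and the helpers record_review and first_review_count are hypothetical; a real Drupal site would hook this into its own review-saving code.

```python
import sqlite3

# Hypothetical table: one row per venue, holding the UID of its first reviewer.
SCHEMA = """
CREATE TABLE venue_first_review (
    venue_id INTEGER PRIMARY KEY,
    uid      INTEGER NOT NULL
);
"""


def record_review(conn, venue_id, uid):
    """Call whenever a review is saved; only the first one per venue sticks.

    The primary key on venue_id makes later inserts for the same venue
    no-ops. SQLite spells this INSERT OR IGNORE; MySQL has INSERT IGNORE.
    """
    conn.execute(
        "INSERT OR IGNORE INTO venue_first_review (venue_id, uid) VALUES (?, ?)",
        (venue_id, uid),
    )
    conn.commit()


def first_review_count(conn, uid):
    """Number of venues this user was the first to review."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM venue_first_review WHERE uid = ?", (uid,)
    ).fetchone()
    return count
```

An index on uid would keep the count query cheap as the table grows.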

Need to store user-to-user data in the database. How to avoid a disaster?

In my requirements, every user on the website can see a score attached to other users. It gets calculated based on their profile parameters. My score for someone else will be one value, but their score for me will be another.
What I have done so far
Table in the MySQL database like so:
___UserID1___|___UserID2___|___Score___|___Last_Updated___
1 | 2 | 45 | 1235686744
2 | 1 | 24 | 1235645332
When a user views someone's page, my score class checks whether the record for this pair exists in the database and, if not, calculates it and records it. This works fine, because no one will look at absolutely every user page on the site.
Now I need to pull users and sort them based on score. So I thought I could create a cron job and run it every night, so that it updates the scores in the database and creates them for every pair of users, both ways.
Well, the problem is that I am planning a system for over 500,000 users, and I am worried it will bring my database down and create a huge database. So for 500,000 users we are talking about 250 billion records... :/
Does anyone know any other way of creating this feature? Maybe calculation on the fly... or any other way?
If I was in your situation I would create the calculation on the fly. I would generate the scores using your function and then store the values into the database then. That way whenever any user visits any page, the scores are updated. This is an incremental approach rather than trying to run the function on every single combination possible at once. Plus no more database disaster :)
If you have a page that ranks all the users by score, it would be much simpler if you use pagination and use the ORDER BY and OFFSET, LIMIT features of SQL queries instead of fetching all users at once.
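A sketch of that compute-on-first-view approach, using Python's built-in sqlite3 in place of MySQL. The table name user_scores and the compute_score placeholder are hypothetical; the real scoring function depends on profile parameters that the question does not describe.

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE user_scores (
    UserID1      INTEGER NOT NULL,
    UserID2      INTEGER NOT NULL,
    Score        INTEGER NOT NULL,
    Last_Updated INTEGER NOT NULL,
    PRIMARY KEY (UserID1, UserID2)   -- one row per ordered pair
);
"""


def compute_score(viewer_id, target_id):
    """Hypothetical placeholder for the real profile-based calculation."""
    return (viewer_id * 31 + target_id * 17) % 100


def get_score(conn, viewer_id, target_id):
    """Return the stored score for this ordered pair, computing it on demand."""
    row = conn.execute(
        "SELECT Score FROM user_scores WHERE UserID1 = ? AND UserID2 = ?",
        (viewer_id, target_id),
    ).fetchone()
    if row is not None:
        return row[0]
    score = compute_score(viewer_id, target_id)
    conn.execute(
        "INSERT INTO user_scores (UserID1, UserID2, Score, Last_Updated) "
        "VALUES (?, ?, ?, ?)",
        (viewer_id, target_id, score, int(time.time())),
    )
    conn.commit()
    return score
```

Only pairs that are actually viewed ever get a row, so the table grows with real usage rather than with the square of the user count.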

MySQL - creating an SQL Algorithm to determine random 'popular' content

I'm looking to create an SQL query (in MySQL) that will display 6 random, yet popular entries in my web application.
My database has the following tables:
favorites
submissions
submissions_tags
tags
users
submissions_tags and tags are cross-referencing tables that give each submission a certain number of tags.
submissions contains boolean featured, int downloads, and int views, all three of which I'd like to use to weight this query with.
The favorites table is again a cross-reference table with the fields submission_id and user_id. Counting the number of times each submission has been favorited would be good to weigh the results with.
So basically I want to select 6 random rows weighted with these four variables - featured, downloads, views, and favorite count. Each time the user refreshes the page, I want a new random 6 to be selected. So maybe the query could limit it to 12 most-recent but only pluck 6 random results out to show. Is that a sensible idea in terms of processing etc.?
So my question is, how can I go about writing this query? Where should I begin? I am using PHP/CodeIgniter to drive this site with. Is it possible to get the entire lot in one query, or will I have to use multiple queries to do this? Or, do I need to simplify my ideas?
Thanks,
Jack
I've implemented something similar to this before. The route I took was to have a script run on the server every XX minutes to fill a table with a pool of items (say 20-30 items). Then the query used in your application would randomly pick 5 or so from that table.
You just need to set up an algorithm to select those 20-30 items. @Emmerman's is similar to what I used before to calculate a popularity_number, where I took weights of multiple associations to the item (views, downloads, etc.) to get an overall number. We also used an age to make sure the pool of items stayed up-to-date. You'll have to tinker with the algorithm over time to make sure the relevant items are being populated.
The idea is to calculate a popularity score, for example:
popularity = featured*W1 + downloads*W2 + views*W3 + fcount*W4
Where W1-W4 are constant weights.
Then add some random number to popularity and sort for it.
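The formula above can be sketched in a few lines of Python. The weight values and the size of the random jitter are made-up numbers for illustration, and fcount is assumed to be the favorite count already tallied per submission.

```python
import random

# Hypothetical weights W1..W4; these would need tuning on real data.
WEIGHTS = {"featured": 50.0, "downloads": 2.0, "views": 0.5, "fcount": 5.0}


def popularity(item, weights=WEIGHTS):
    """popularity = featured*W1 + downloads*W2 + views*W3 + fcount*W4"""
    return sum(weights[key] * item[key] for key in weights)


def pick_popular(items, count=6, jitter=100.0, rng=random):
    """Add a random number to each popularity, sort, and keep the top few.

    A larger jitter means more randomness; jitter=0 gives a pure ranking.
    """
    scored = [(popularity(item) + rng.uniform(0, jitter), item) for item in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:count]]
```

The same weighted sum could also be computed in the SQL query itself, with the random term added before sorting; doing it in application code just makes the weights easier to tinker with.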
