I'm working on an already-made Facemash-like script. It shows two pictures, and the user chooses which picture he likes better.
I wanted to add a small improvement: never show a user a combination of two pictures he has already voted on.
I have tried two approaches, but neither is good enough or comfortable for the user.
First one - The two pictures are chosen at random. After a vote, a new record is created in the database with that specific combination and the vote's value. If the combination of two pictures already exists as a record in the database, the page shows the historical vote and refreshes after a few seconds to produce another random combination.
Second one - The moment picture names are added to the database, the script creates every possible combination as a record. This works well, because the script can pull a random record that does not yet contain a result and save the vote's value into it, so repeats are impossible. The main problem appears when new pictures are added: the database becomes huge right from the start, and creating all possible combinations up front takes forever.
Because of that I'm looking for another solution. I would appreciate even a small piece of advice that might help me find a way.
Your first approach scales better; you just want to avoid showing a historical vote. You need to keep a history of votes anyway, so use that history as a filter: in the SELECT statement you use to get the random faces, LEFT JOIN on the history table and use the join as a filter.
Example:
SELECT faces.uid f_uid, votes.uid v_uid FROM faces
LEFT JOIN votes ON votes.user_id=# AND (faces.uid=votes.face_id1 OR
faces.uid=votes.face_id2)
WHERE votes.uid IS NULL
ORDER BY RAND() LIMIT 2
That will make sure they never see the same face twice. It will get slower the more faces a user has voted on, but it won't be noticeably slower until they have cast many hundreds of votes.
That said, you could change the LIMIT to something like 20 and cache the result (e.g. in the session). You then have the next 10 pairings (20/2 = 10) ready to go. That is sort of a combination of approaches 1 & 2.
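A sketch of that batched variant, reusing the query above (the application would store the 20 ids in the session, serve them as 10 pairs, and re-run the query only once the cache is exhausted):
SELECT faces.uid f_uid, votes.uid v_uid FROM faces
LEFT JOIN votes ON votes.user_id=# AND (faces.uid=votes.face_id1 OR
faces.uid=votes.face_id2)
WHERE votes.uid IS NULL
ORDER BY RAND() LIMIT 20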
I have a question about making highscore lists.
Let's say I have an online game with 1,000,000 active users. Each user has points from 0 to X. Now I want to show a ranking list. It would be insane to show all million entries on one page, so it is divided into Y pages (100 entries per page => 10,000 pages).
I am not really sure how to solve it.
1. The easiest way would be to load all 1m entries
in one SELECT, take the result, find the current user with a for loop, and show that specific page (even though all the other 999,900 entries sit in RAM without ever being shown). For a page change I could reuse the result data with no second database call. (So I don't care about point changes in the meantime.)
SELECT UserName, UserID, Points FROM UserAccount ORDER BY Points;
2. My second idea was to load each page individually, but then I do not know
2.1 whether it really performs better
2.2 how to find the right starting page, because I only have the user's points but not his actual rank
So how could I solve this problem? I don't really know what MySQL can handle. Are many small calls better than one huge call?
Can I even hold such a huge result set?
The second solution would pick up point changes with each page change, but I care more about performance than about an always up-to-date list.
Thank you for your help!
Markus
Use pagination. In SQL that's the LIMIT clause:
SELECT UserName, UserID, Points FROM UserAccount ORDER BY Points LIMIT 0, 20;
The above query returns only the first 20 rows of the original selection.
You can pass the page parameter via GET, like this: highscore.php?page=1 or ?page=2 and so on, and compute the offset from it.
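For point 2.2 (finding the right page for the current user), you can count how many players rank above them. A sketch, assuming the list shows the highest points first:
SELECT COUNT(*) AS num_above
FROM UserAccount
WHERE Points > (SELECT Points FROM UserAccount WHERE UserID = ?);
-- page = floor(num_above / 100), offset = page * 100
SELECT UserName, UserID, Points FROM UserAccount
ORDER BY Points DESC LIMIT 100 OFFSET 300;  -- e.g. page 3 at 100 rows per page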
I'm interested in how and why a many-to-many relationship is better than storing the information in one row.
Example: I have two tables, Users and Movies (with very big data). I need to establish a "viewed" relationship.
I have two ideas:
Make another column in the Users table called "views", where I store the ids of the movies this user has viewed as a string, for example: "2,5,7...". Then I process this string in PHP.
Make a new table users_movies (many-to-many) with columns user_id and movie_id. A row with user_id=5 and movie_id=7 means that user 5 has viewed movie 7.
I'm interested in which of these methods is better, and WHY. Please consider that the data is quite big.
The second method is better in just about every way. Not only will it let the database use its indexes to find records faster, it will also make modifications far, far easier.
Approach 1) could answer the question "Which movies has user X viewed?" with an SQL clause like "...FIND_IN_SET(movie_id, user_movielist)...". But the other way round ("Which users have viewed movie X?") won't work at the SQL level.
That's why I would always go for approach 2): a clean, normalized structure where both directions are simple joins.
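A minimal sketch of that structure (column names follow the question; the composite primary key also prevents duplicate view rows):
CREATE TABLE users_movies (
    user_id  INT NOT NULL,
    movie_id INT NOT NULL,
    PRIMARY KEY (user_id, movie_id),  -- one view row per user/movie pair
    KEY idx_movie (movie_id)          -- fast lookups by movie as well
);
-- which movies has user 5 viewed?
SELECT movie_id FROM users_movies WHERE user_id = 5;
-- which users have viewed movie 7?
SELECT user_id FROM users_movies WHERE movie_id = 7;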
It's all about your needs. If you need performance, then you must accept redundancy of information and add a column. If your main goal is to respect the normalization paradigm, then you should not have any redundancy at all.
When I have to make this kind of choice, I try to weigh the space lost to redundancy against the frequency of the query of interest and the performance it needs.
A few more thoughts.
In your first situation, if you look up a particular user you can easily get the list of ids of the films they have seen. But you would then need a separate query to get details such as the titles of those movies. This might be one query using IN with the list of ids, or one query per film id. Either way it is inefficient and clunky.
With MySQL there is a possible fudge for joining in this situation using the FIND_IN_SET() function (although a downside is that you are straying into non-standard SQL). You could join your table of films to the users using ON FIND_IN_SET(film.id, users.film_id) > 0. However, this will not use an index for the join, and it involves a function (which, while quick for what it does, is slow when performed on thousands of rows).
If you wanted to find all the users who had viewed any film a particular user had viewed, it is a bit more difficult. You can't just use FIND_IN_SET, as it requires a single string and a comma-separated list. As a single query you would need to join the particular user to the film table to get a lot of intermediate rows, and then join that back against the users (using FIND_IN_SET) to find the other users.
There are ways in SQL to split up a comma-separated list of values, but they are messy, and anyone who has to maintain such code will hate it!
These are all fudges. With the 2nd solution these things are easy to do, and the resulting joins can easily use indexes (possibly whole queries can be answered from the indexes without touching the actual data).
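For example, the "users who viewed any film a particular user viewed" query above collapses into one self-join; a sketch, assuming the users_movies table from the question and user 5 as the particular user:
SELECT DISTINCT other.user_id
FROM users_movies mine
JOIN users_movies other ON other.movie_id = mine.movie_id
    AND other.user_id <> mine.user_id
WHERE mine.user_id = 5;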
A further issue with the first solution is data integrity. You would have to manually check that a film doesn't appear twice for a user (with the 2nd solution this is easily enforced with a unique key). You also cannot add a foreign key to ensure that every film id stored for a user actually exists. Further, you would have to manually ensure that nothing inserts a stray character string into your delimited list of ids.
My situation is this... I have a sorted table of opportunities. We have a paid service that lets people view the opportunities on the website at any time. However, we also want an unpaid view that shows a random subset of the opportunities, and that subset must always be the same. The opportunities are sorted by date; they expire and are removed from the list, at which point a new one should appear in the free search. The problem is that the free view always has to show the same opportunities. (For example, I can't just pick random rows, because refreshing would cycle through all of them; likewise I can't just take the ones closest to or furthest from expiry, because people would still end up seeing the entire list.)
My only solution so far is to add an extra column to the table that marks a row as openly displayed, then count how many are on display and, if rows are missing, randomly promote a few more. Below is a mock-up...
SELECT COUNT(id) AS total FROM opportunities WHERE display_status="open";
...
while (total < requiredNumber) {
    UPDATE opportunities SET display_status="open" WHERE display_status="private" ORDER BY random() LIMIT (required - total);
}
Can anyone think of a better way to solve this problem, preferably one that doesn't require adding another column to the table and that avoids conflicts when many people load the page at the same time? One final note: it can't be a fixed stride through the set (e.g. pick one, skip a few, take the next).
Any thoughts/comments would be very helpful,
Thanks.
One way to make sure that a user only sees the same set of random rows is to feed the random number generator a seed that is linked to that user (such as their user_id). That means every user gets a random ordering of rows but it's always the same random ordering for each user.
Your code would be something like:
SELECT ...
FROM ...
WHERE ...
ORDER BY random(<user id>)
LIMIT <however many>
Note: as Twelfth pointed out, as new rows are created, they will get new order values and may end up in your random selection.
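If this is MySQL, RAND() accepts an optional integer seed and produces a repeatable sequence for it, so a concrete version of the sketch (the table and the numbers here are illustrative) could be:
SELECT id
FROM opportunities
ORDER BY RAND(1234)   -- 1234 stands in for the viewer's user id; same seed, same ordering
LIMIT 10;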
I'm the type that doesn't like to lose information... including which random rows someone got to see. However, I don't like the idea of modifying your existing table...
Create a second table, random_rows or something to that effect, to save the user's id and the ids of the random records they got to see. Inner join to that table whenever you need to find the same rows again. You can also put expiry dates and the like in the table, so the user isn't permanently stuck with the same 10 rows.
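A rough sketch of that (all names are made up for illustration; assumes opportunities has an id primary key):
CREATE TABLE random_rows (
    user_id        INT NOT NULL,
    opportunity_id INT NOT NULL,
    expires_at     DATETIME NOT NULL,   -- so the assignment isn't permanent
    PRIMARY KEY (user_id, opportunity_id)
);
-- fetch the same rows this user saw before
SELECT o.*
FROM opportunities o
INNER JOIN random_rows r ON r.opportunity_id = o.id
WHERE r.user_id = ? AND r.expires_at > NOW();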
I recently wrote a survey application that has done its job, and all the data is gathered. Now I have to analyze the data, and I'm running into time issues.
I have to find out how many people selected each option and display it all.
I'm using this query, which does do its job:
SELECT COUNT(*)
FROM survey
WHERE users = ? AND `table` = ? AND col = ? AND `row` = ? AND selected = ?
GROUP BY users, `table`, col, `row`, selected
As is evident from the "?" placeholders, I'm using MySQLi (in PHP) to fetch the data as needed, but I fear this is part of what makes it so slow.
The table consists of all the fields above (plus a unique ID), and all of them are integers.
To explain some of the fields:
Each survey was divided into 3 or 4 tables (sized from 2x3 to 5x5) with a happiness grade from 1 to 10 to select from. (Questions are on the right and top of the table; you answer where the questions intersect.)
users - age groups
table, row, col - explained above
selected - the chosen grade, explained above
Now, with the surveys complete and around 1 million entries in the table, the query is getting very slow. Sometimes it takes around 3 minutes; sometimes (I guess) the time limit expires and you get no data at all. I also don't have access to the full database, just my empty "testing" one, since the customer is kind of paranoid :S (and his server seems a bit slow).
Now (after the initial essay) my questions: I left indexing out intentionally because, with a lot of data being written during the survey, it seemed like a bad idea. But since no new data is coming in at this point, would it make sense to index all the fields of the table? How much sense does it make to index integers that never go above 10? (As you can guess, I haven't got a clue about indexes.) Do I need the primary unique ID in this table?
I read somewhere that indexing may help GROUP BY, but only if you group by the first columns in a table (and since my ID is first and, from my point of view, useless, can I remove it and gain anything by it?)
Is there another way to write my query that would do the same thing in less time?
Thanks for all your suggestions in advance!
Add an index on the columns you GROUP BY or filter on in the WHERE clause. So that's ONE index incorporating users, table, col, row and selected in your case.
Some quick rules:
Combine the fields so the WHERE columns come first and the GROUP BY columns last.
If you have other queries that use only part of the index (e.g. users, table, col and selected), then put the missing column (row, in this example) last.
Don't use too many indexes, as each one slows updates to the table marginally - so on a really large system you need to balance queries against indexes.
Edit: do you need the GROUP BY users, table, col, row at all, since these are fixed by the WHERE? If the WHERE has already filtered them down to single values, you only need to group by selected, as sketched below.
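A sketch of both suggestions, using the column names from the question (table and row are backticked since they are reserved words in some MySQL versions; dropping selected from the WHERE returns the counts for all grades in one query):
ALTER TABLE survey
    ADD INDEX idx_answers (users, `table`, col, `row`, selected);
SELECT selected, COUNT(*)
FROM survey
WHERE users = ? AND `table` = ? AND col = ? AND `row` = ?
GROUP BY selected;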
I'm looking to create an SQL query (in MySQL) that will display 6 random, yet popular entries in my web application.
My database has the following tables:
favorites
submissions
submissions_tags
tags
users
submissions_tags and tags are cross-referencing tables that give each submission a certain number of tags.
submissions contains the boolean featured, int downloads, and int views, all three of which I'd like to use to weight this query.
The favorites table is again a cross-reference table with the fields submission_id and user_id. Counting the number of times each submission has been favorited would also be a good weighting input.
So basically I want to select 6 random rows weighted with these four variables: featured, downloads, views, and favorite count. Each time the user refreshes the page, I want a new random 6 to be selected. So maybe the query could limit it to the 12 most recent but only pluck 6 random results out of those to show. Is that a sensible idea in terms of processing, etc.?
So my question is: how can I go about writing this query? Where should I begin? I am using PHP/CodeIgniter to drive this site. Is it possible to get the entire lot in one query, or will I have to use multiple queries? Or do I need to simplify my ideas?
Thanks,
Jack
I've implemented something similar to this before. The route I took was to have a script run on the server every XX minutes to fill a table with a pool of items (say 20-30). The query in your application then randomly picks 5 or so from that table.
You just need to set up an algorithm to select those 20-30 items. @Emmerman's is similar to what I used before to calculate a popularity number, where I took weights on the item's associations (views, downloads, etc.) to get an overall score. We also used an age factor to make sure the pool of items stayed up to date. You'll have to tinker with the algorithm over time to make sure relevant items keep being selected.
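A rough sketch of the moving parts (popular_pool and the weights are invented for illustration; the refill would run from cron every XX minutes):
-- refill the pool with the current top 30 by popularity
TRUNCATE TABLE popular_pool;
INSERT INTO popular_pool (submission_id)
SELECT id
FROM submissions
ORDER BY featured * 10 + downloads * 2 + views DESC
LIMIT 30;
-- per page view: pull 6 at random from the small pool
SELECT s.*
FROM popular_pool p
JOIN submissions s ON s.id = p.submission_id
ORDER BY RAND()
LIMIT 6;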
The idea is to calculate a popularity score, which can be e.g.
popularity = featured*W1 + downloads*W2 + views*W3 + fcount*W4
Where W1-W4 are constant weights.
Then add some random number to the popularity and sort by it, as in the sketch below.
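In MySQL that might look like the following sketch (the weights standing in for W1-W4 and the size of the random jitter are arbitrary placeholders):
SELECT s.id,
       s.featured * 10
         + s.downloads * 2
         + s.views
         + COALESCE(f.fcount, 0) * 5
         + RAND() * 50 AS popularity   -- the jitter reshuffles results on each refresh
FROM submissions s
LEFT JOIN (
    SELECT submission_id, COUNT(*) AS fcount
    FROM favorites
    GROUP BY submission_id
) f ON f.submission_id = s.id
ORDER BY popularity DESC
LIMIT 6;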