Large Dataset Cron Job Suggestions

I have an interesting situation of calculations that need to be made in a project, and am looking for an efficient way to handle it. Here is the scenario.
We are making a "Polling" website, where users answer Poll questions. They can answer each question once.
We are generating a "Score" for every user based on their answers. They receive 1 point for each other user that answered the same.
For Example:
Question 1 has 2 answers, "Yes" and "No"
7 Users answered "Yes" and 3 answered "No"
Each User that answered "Yes" adds 7 points to their score
Each User that answered "No" adds 3 points to their score
If a 4th User answers "No", 1 extra point is added to each User that answered "No"
As you can imagine, it would be far too many calculations to do this on the fly, since lots of user scores must be regenerated every time a question is answered. So I want to do this as a Cron Job every X hours.
My Data currently returns a single Row for each Question answered by a single user, along with how many points each answer is worth (comma separated: 7,3)
How should I go about regenerating these results? I do not want to use a simple "Foreach" to loop through every User, as this doesn't seem like it will scale as the User base grows. Is there a way to run PHP scripts in the background or concurrently, so that the loop does not hang?
Any help or suggestions are greatly appreciated!
EDIT:
Sorry, I should have explained the database a bit too.
This is a WordPress website, so some of the data is in the default WordPress postmeta table. The tally is stored as a comma-separated meta_key value for the "Post" (Poll question).
All answers are stored in their own answers table. Each answer is a row in the table, and it includes user_id, post_id (of the poll question), and the answer chosen (an index into the comma-separated meta_key value).
And this is the query I am using to get all the answers for a particular User:
SELECT * FROM `wp_myo_ip` LEFT JOIN `wp_postmeta` ON `wp_myo_ip`.`myo_polling_id` = `wp_postmeta`.`post_id` AND `wp_postmeta`.`meta_key` = 'myo-votes' WHERE `wp_myo_ip`.`myo_polling_ip` = 1
The myo_polling_ip column is actually the User ID

Based on absolutely no database information given ...
UPDATE answer_tbl
JOIN (SELECT answer_tbl.id, IF(answer_tbl.answer = 'YES', COUNT(DISTINCT yes_tbl.id), COUNT(DISTINCT no_tbl.id)) AS score
FROM answer_tbl
LEFT JOIN answer_tbl AS yes_tbl ON answer_tbl.question_id = yes_tbl.question_id AND yes_tbl.answer = 'YES'
LEFT JOIN answer_tbl AS no_tbl ON answer_tbl.question_id = no_tbl.question_id AND no_tbl.answer = 'NO'
GROUP BY answer_tbl.id) AS score_tbl ON answer_tbl.id = score_tbl.id
SET answer_tbl.score = score_tbl.score
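The tally-then-sum idea can also be sketched in plain PHP, which may help when checking what the cron job should produce. This is only a minimal sketch: the function name computeScores and the input shape (one [user_id, question_id, answer_index] triple per answer) are assumptions, and it follows the worked example in the question, where each answer is worth the size of its own group (7 for "Yes", 3 for "No").

```php
<?php
// Hypothetical helper: $answers holds one [user_id, question_id, answer_index] triple per row.
function computeScores(array $answers): array
{
    // Pass 1: count how many users picked each option of each question.
    $tally = [];
    foreach ($answers as [$userId, $questionId, $choice]) {
        $tally[$questionId][$choice] = ($tally[$questionId][$choice] ?? 0) + 1;
    }

    // Pass 2: every answer adds the size of its own group to that user's score.
    $scores = [];
    foreach ($answers as [$userId, $questionId, $choice]) {
        $scores[$userId] = ($scores[$userId] ?? 0) + $tally[$questionId][$choice];
    }

    return $scores; // user_id => total score
}
```

Both passes are linear in the number of answers, so the work grows with the answers table rather than with users times questions.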

Related

Double selection with RAND() in php

Please I need help with this problem I'm facing. I'm building an examination system and I'm using the Rand() function to select questions from the "question" table. The user's answers are also saved in the "user_answer" table.
Now my problem is a question sometimes gets selected twice or thrice so I need a query that will check that if a question has already been answered in the "user_answer" table, it should reselect another question from the "question" table.

You cannot exclude with Rand() directly.
If you query from a database, you could add something like
select *
from question q
left join user_answer ua on ua.question_id = q.id
where ua.id is null
group by q.id
This will try to connect to an answer (ANY answer, you probably want to add some user selection into that), and only give back the questions where it FAILS (ua.id is null) to do so.
If you cannot do it by query and have it all in some PHP array, what you could do is keep track of the available question IDs in an array. Each time you pick a question at random, you remove that item (value!) from the array, and reindex the array (keeping the values, which are the question IDs, and ordering the keys from 0 to the number of questions - 1).
That way you can do a rand(0, count($questionIds) - 1) again to pick the next one (rand() includes both bounds, so the upper bound must be the last index).
Another way would be to use a loop, and continue as long as the picked question is already in the used questions array.
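The array approach might look roughly like this. It is a sketch only: pickQuestion is a made-up name, and it assumes $questionIds is a plain list with keys 0..n-1.

```php
<?php
// Hypothetical helper: removes and returns one random question ID, or null when none are left.
function pickQuestion(array &$questionIds): ?int
{
    if ($questionIds === []) {
        return null;
    }

    // random_int() includes both bounds, so the upper bound is the last index.
    $index = random_int(0, count($questionIds) - 1);
    $picked = $questionIds[$index];

    unset($questionIds[$index]);                // remove the used ID
    $questionIds = array_values($questionIds);  // reindex keys to 0..n-1

    return $picked;
}
```

Because the picked ID leaves the pool, the same question cannot come up twice for as long as the same array is reused.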

Implementing voting or "likes" using MySQL [closed]

I am creating a system where users (who are identified by a user id number) will be allowed to vote on posts (think Reddit, StackOverflow, etc).
Users can vote a post up or not vote at all on it.
The number of votes on a given post can easily be stored within the table containing the posts.
Keeping track of who has voted, however, is a different task entirely that I'm not sure how to approach.
I was thinking I could have a table that would have two columns: user id and post id.
When they vote on a post, I add their user id and post id to that table. If they unvote, I remove that entry from the table.
EG:
User ID | Post ID
1 | 3949
1 | 4093
2 | 3949
etc...
Is this a reasonable solution?

Yes, this is a reasonably simple and easy solution to the problem. You can do the same for your comments (if you like). In your MAIN_POST table, assign a post_id and use this same post_id in the other tables: comments(post_id, user_id, post_comment, comment_time) and votes(post_id, user_id, vote_status), where vote_status can be 1 for a vote up and 0 for a vote down. It will complicate your SQL queries for retrieving data a little, but you can do it. On the Android side there are a lot of tricks for handling and presenting this data in the application, and you can make this vote (like) and comment idea work just like Facebook (YOU for your own comments and likes and NAMES for others).

I wouldn't remove rows from the table. I understand why you would want to do that, but why lose the information? Instead, keep a +1/-1 value for each entry and then sum up the values for a post:
select sum(vote)
from uservotes
where postid = 1234;
And, I agree with Rick that you should also include the creation date/time.
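To illustrate the keep-everything idea, here is an in-memory sketch in PHP. The class VoteLog and its methods are hypothetical stand-ins for a uservotes table with postid, userid, vote and a creation timestamp; a real implementation would run the SUM query above instead.

```php
<?php
// Hypothetical in-memory stand-in for the uservotes table.
final class VoteLog
{
    /** @var array<int, array{postid: int, userid: int, vote: int, created: int}> */
    private array $rows = [];

    // Appends a +1 (vote) or -1 (unvote) entry; nothing is ever deleted.
    public function record(int $postId, int $userId, int $vote, int $created): void
    {
        if ($vote !== 1 && $vote !== -1) {
            throw new InvalidArgumentException('vote must be +1 or -1');
        }
        $this->rows[] = ['postid' => $postId, 'userid' => $userId, 'vote' => $vote, 'created' => $created];
    }

    // Equivalent of: select sum(vote) from uservotes where postid = ?
    public function score(int $postId): int
    {
        $sum = 0;
        foreach ($this->rows as $row) {
            if ($row['postid'] === $postId) {
                $sum += $row['vote'];
            }
        }
        return $sum;
    }
}
```

An unvote becomes a second row with -1, so the net score is still correct while the full history stays available.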

Using an 'in between' or 'joining' table is a perfectly acceptable solution in this case. If relevant you could even add a timestamp to the relation and show to the user when a user has upvoted something.
Also it is important to take care of proper Indexes and Keys to have your table structure also perform properly once the dataset grows.

MySql -- How to keep a record of used entries?

I have an application (more or less a quiz app) where I have saved all my 1000 quizzes in a MySQL database. I want to retrieve a random question from this table when a user requests one, which I can easily do using the RAND() function in MySQL. My problem is that I don't want to give the same question two or more times to a user. How can I keep a record of retrieved questions? Do I have to create tables for each and every user? Won't that increase the load time? Please help me, any help would be a big favor.
-regards

If you want it for a short time, use the user's $_SESSION for that.
If you need it long term (say, not asking the same questions tomorrow), you'll have to create an additional table, questions2users, where you'll store the user id and the questions the user has already been asked.
Retrieving a question in both cases would require a simple NOT IN condition:
// Short term: exclude the IDs kept in the session (cast to int before interpolating)
$sql = "SELECT * FROM questions WHERE id NOT IN (" . implode(",", array_map("intval", $_SESSION["asked"])) . ")";
-- Long term: exclude the IDs recorded for this user
SELECT * FROM questions
WHERE id NOT IN (
SELECT question_id FROM questions2users WHERE userid = 123
)
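One detail worth handling in the session variant is an empty list, since NOT IN () is a syntax error in MySQL. A possible sketch, using placeholders instead of string interpolation (the function name buildUnaskedQuery is an assumption, not part of any library):

```php
<?php
// Hypothetical helper: returns [sql, params] for fetching questions not yet asked.
function buildUnaskedQuery(array $askedIds): array
{
    // Normalise to unique integers so only numbers ever reach the query.
    $askedIds = array_values(array_unique(array_map('intval', $askedIds)));

    if ($askedIds === []) {
        // Nothing asked yet: skip the filter instead of emitting "NOT IN ()".
        return ['SELECT * FROM questions', []];
    }

    // One "?" placeholder per ID, to be bound by the database driver.
    $placeholders = implode(',', array_fill(0, count($askedIds), '?'));

    return ["SELECT * FROM questions WHERE id NOT IN ($placeholders)", $askedIds];
}
```

The returned pair can be passed to PDO::prepare() and PDOStatement::execute(), for example.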

my problem is , I don't want to give the same question two or more times to a user,
how can i keep a record of retrieved questions? Do I have to create tables for each
and every users? won't that increase the load time?
Yes, but possibly not so much.
You keep a single extra table with userId, questionId and insert there the questions already asked to the various users.
When you ask question 123 to user 456, you run a single INSERT
INSERT INTO askedQuestions (userId, questionId) VALUES (456, 123);
Then you extract questions from questions with a LEFT JOIN
SELECT questions.* FROM questions
LEFT JOIN askedQuestions ON (questions.id = askedQuestions.questionId AND askedQuestions.userId = {$_SESSION['userId']} )
WHERE askedQuestions.userId IS NULL
ORDER BY RAND() LIMIT 1;
If you keep askedQuestions indexed on (userId, questionId), joining will be very efficient.
Notes on RAND()
Selecting on a table like this should not be done with ORDER BY RAND(), which will retrieve all the rows in the table before outputting one of them. Normally you would choose a questionId at random and select the question with that questionId, which would be far faster. But here you have no guarantee that the question has not already been asked to that user, so the faster query might fail.
When most questions are still free to ask, you can use
WHERE questions.id IN ( RAND(N), RAND(N), RAND(N), ... )
AND askedQuestions.userId IS NULL LIMIT 1
where N is the number of questions and RAND(N) is shorthand for a random integer between 1 and N (MySQL's own RAND(N) treats N as a seed, so generate the numbers in PHP or with FLOOR(1 + RAND() * N)). Chances are that at least one of the random numbers you extract will still be free. The IN will decrease performance, and you will have to strike a balance with the number of RANDs. When questions are almost all asked, chances of a match decrease, and your query might return nothing even with many RANDs (also because RANDs will start yielding duplicate IDs, in what is known as the Birthday Paradox).
One way to achieve the best of both worlds could be to fix a maximum number of attempts, say, three (or better still, based on the number of questions left over).
For X times you generate (in PHP) a set of Y random ids between 1 and 1000, and try to retrieve (userId, questionId) from askedQuestions. The table is thin and indexed, so this is really fast. If you fail, then the extracted questionId is random and free, and you can run
SELECT * FROM questions WHERE id = {$tuple['questionId']};
which is also very fast. If you succeed X times, i.e., for X times, all Y random questionIds are registered as being already asked, then you run the full query. Most users will be served almost instantly (two very quick queries), and only a few really dedicated users will require more processing. You might want to set some kind of alerting to warn you of users running out of questions.
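The bounded-retry strategy could be sketched like this. It is an in-memory illustration under stated assumptions: pickFreeQuestionId is a made-up name, $asked (id => true) stands in for the indexed askedQuestions lookup for one user, and question IDs are assumed to be contiguous from 1 to $totalQuestions.

```php
<?php
// Hypothetical helper: $asked maps already-asked question IDs to true for one user.
function pickFreeQuestionId(array $asked, int $totalQuestions, int $attempts = 3, int $batchSize = 5): ?int
{
    if ($totalQuestions < 1) {
        return null;
    }

    // Fast path: X attempts of Y random IDs each, checked against the thin lookup.
    for ($i = 0; $i < $attempts; $i++) {
        for ($j = 0; $j < $batchSize; $j++) {
            $candidate = random_int(1, $totalQuestions);
            if (!isset($asked[$candidate])) {
                return $candidate; // random and still free
            }
        }
    }

    // Slow path: stands in for the full LEFT JOIN ... ORDER BY RAND() query.
    $free = array_values(array_diff(range(1, $totalQuestions), array_keys($asked)));

    return $free === [] ? null : $free[random_int(0, count($free) - 1)];
}
```

A null result means the user has run out of questions, which is the case the alerting mentioned above would catch.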

One solution is to add an ID column in the question table and when you serve it to a user you check that ID with the list of questions that you served the user.
You can use in memory data structure like List to keep track of the questions that are served to a particular user. This way, you only need array of Lists instead of tables to get the job done.

mysql Query - no more entries for same qID and same userID

The following description is a simple example with questions and answers. But the logic of my site is similar.
Let's say the tables are:
USERS table: USER_ID, etc
QUESTIONS table: QUESTION_ID, TEXT, CATEGORY, CORRECT_RESPONSE, AVAILABLE
RESPONSES table: QUESTION_ID, USER_ID, RESPONSE_VALUE
PROFILE table: USER_ID, CATEGORY_Questions, YEAR, NUMBER_OF_ANSWERED, Number_OF_CORRECT, POINTS
The questions will be available to be answered by users for few hours. Every question has the same 3 choices for answers YES/NO/DEPENDS.
So I want users to go click on one of them for example and store an entry on RESPONSES table (ok this query is easy) and then not be able to answer the same question again.
Users will be able to edit the question for some time, and after this period I want the question to be shown as answered until the end of the day, when I will mark the question as AVAILABLE=NO and it will be removed from the unanswered questions. What is the most efficient way to do this?

There are a lot of ways to achieve this, depending on the context. One of them is to create a boolean (bit) column called answered and another column AnswerDate (datetime or timestamp). When the user answers a question, store the answer time, then use PHP or JavaScript to update the answered flag in the table once the period of time you want has elapsed.
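The time check itself can be done from the stored AnswerDate alone. A minimal sketch, where canStillEdit and the window length are assumptions chosen for illustration:

```php
<?php
// Hypothetical helper: true while the response is still inside its edit window.
function canStillEdit(DateTimeImmutable $answeredAt, DateTimeImmutable $now, int $windowSeconds): bool
{
    // Compare elapsed seconds since the answer against the allowed window.
    return $now->getTimestamp() - $answeredAt->getTimestamp() <= $windowSeconds;
}
```

With a check like this, the answered flag can be derived when the page is rendered, so the exact timing of a background update matters less.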

Choosing data pseudo-randomly with even distribution

I'm currently working on a medium-sized web project, and I've run into a problem.
What I want to do is display a question, together with an image. I have a (global) list of questions and a (global) list of images; all questions should be asked for all images.
As far as the user can see, the question and image should be chosen at random. However, the statistics from the answers (question/image pair) will be used for research purposes. This means that all the question/image pairs must be chosen such that the answers will be distributed evenly across all questions and across all images.
A user should only be able to answer a specific question/image pair one time.
I am using a MySQL database and PHP. Currently, I have three database tables:
tbl_images (image_id)
tbl_questions (question_id)
tbl_answers (answer_id, image_id, question_id, user_id)
The other columns are not related to this specific problem.
Solution 1:
Track how many times each image/question has been used (add a column in each table). Always choose the image and question that has been asked the least.
Problem:
What I'm actually interested in is distribution among questions for an image and vice versa, not that each question is even globally.
Solution 2:
Add another table, containing all question/image-pairs along with how many times it has been asked. Choose the lowest combination (first row if count column is sorted by ascending order).
Problem:
Does not enforce that the user can only answer a question once. Also does not give the appearance that the choice is random to the user.
Solution 3:
Same as #2, but store question/image/user_id in table.
Problem:
Performance issues (?), a lot of space wasted for each user. There will probably be semi-large amounts of data (thousands of questions/images and at least hundreds of users).
Solution 4:
Choose a question and image at true random from all available. With a large enough amount of answers they will be distributed evenly.
Problem:
If I add a new question or image, it will not get more answers than the others and will therefore never catch up. I want an even amount of statistics for all question/image pairs.
Solution 5:
Weighted random. Choose a number of question/image pairs (say about 10-100) at true random and pick the best (as in, lowest global count) of these that the user has not answered.
Problem:
Does not guarantee that a recently added question or image gets a lot of answers quickly.
Solution #5 is probably the best one I've come up with so far.
Your input is very much appreciated, thank you for your time.

From what I understand of your problem, I would go with #1. However, you do not need a new column. I would create an SQL view instead, because it sounds like you'll need to report on things like that anyway. A view is basically a stored select that acts like a table. Thus you would create a view for keeping the total of each question answered for each image:
DROP VIEW IF EXISTS "main"."view_image_question_count";
CREATE VIEW "view_image_question_count" AS
SELECT image_id, question_id, COUNT(*) AS "total"
FROM answer
GROUP BY image_id, question_id;
Then, you need a quick and easy way to get the next best image/question combo to ask:
DROP VIEW IF EXISTS "main"."view_next_best_question";
CREATE VIEW "view_next_best_question" AS
SELECT a.*, user_id
FROM view_image_question_count a
JOIN answer USING( image_id, question_id )
JOIN question USING(question_id)
JOIN image USING(image_id)
ORDER BY total ASC;
Now, if you need to report on your image-to-question performance, you can do so by:
SELECT * FROM view_image_question_count
If you need the next best image+question to ask for a user, you would call:
SELECT * FROM view_next_best_question WHERE user_id != {USERID} LIMIT 1
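The != {USERID} part is meant to prevent getting a question the user has already answered, but it only skips that user's own rows: a pair the user has answered can still come back through another user's row, so a strict exclusion needs a NOT EXISTS check against the answer table. The LIMIT restricts the result to one row.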
Disclaimer: There is probably a lot that could be done to optimize this. I just wanted to post something for thought.
Also, here is the database dump I used for testing. http://pastebin.com/yutyV2GU
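For comparison, Solution 5 from the question (sample a handful of pairs at random, then take the least-answered one) could be sketched in PHP as below. This is an illustration under assumptions: pickPair is a made-up name, pairs are keyed as "imageId:questionId" strings, and the counts would really come from the database. Unlike the wording of Solution 5, it removes the user's answered pairs before sampling, so the sample is never wasted on excluded pairs.

```php
<?php
// Hypothetical helper.
// $pairCounts:     "imageId:questionId" => number of answers recorded for that pair
// $answeredByUser: "imageId:questionId" => true for pairs this user already answered
function pickPair(array $pairCounts, array $answeredByUser, int $sampleSize): ?string
{
    // Drop pairs the user has already answered.
    $candidates = array_keys(array_diff_key($pairCounts, $answeredByUser));
    if ($candidates === []) {
        return null;
    }

    // Take a random sample so the choice still looks random to the user.
    shuffle($candidates);
    $sample = array_slice($candidates, 0, $sampleSize);

    // Within the sample, prefer the pair with the fewest answers so far.
    $best = null;
    foreach ($sample as $pair) {
        if ($best === null || $pairCounts[$pair] < $pairCounts[$best]) {
            $best = $pair;
        }
    }

    return $best;
}
```

A larger $sampleSize pushes new pairs to catch up faster, while a smaller one makes the selection look more random.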
