PHP coders' suggestions on an alternative to a WHERE NOT IN query

I am trying to build a simple randomised voting system for a site. Currently, and I believe wrongly, I have the following setup:
As the user goes to the random voting section of the site, he is presented with a votable item that, firstly, is the item that was last voted on the longest time ago; secondly, isn't something the user has voted on before; and thirdly, must be relevant to them based on their list of relevant subjects. As a side note, if the user skips the vote, the same item can't be shown to them again later in the list, and every single vote must be recorded to produce statistics.
Currently, the way I am doing this is by holding a serialized array against their account in the database containing a list of vote item Id numbers that they have previously voted for.
I would like to say at this point that I don't condone this, and inserting a serialized array into the database was silly, and I regret my actions ;) .
Nevertheless, I couldn't figure out another way to do this. At this point, the array is built up as users vote, which adds to the list, but the query would become massive if I were to continue and someone had up to 100 things they could vote for. I am currently using this query to get the next item in the list:
//$finarr is the received unserialized list of values from the database
$sql = "SELECT * FROM vote_item_headers WHERE Id NOT IN (".$finarr.") ORDER BY LastVoted DESC LIMIT 1";
//execute query and display results
I should also note that I haven't even added the third requirement above yet because, frankly, I didn't know where to begin with a query that has to encompass all of the requirements above.
A bit more information you might find relevant:
I have some other tables which are explained below.
core_users = list of users and their interests
core_interests = a list of all interests
core_language = a list of the different possible languages that an item could fall into
vote_item_headers = a list of all the votable items with a reference in the interests and lang tables to define extra properties of the votable item.
core_votes = the master list of people's votes
I am really sorry if this is too vague for you guys; all I really want is guidance on dealing with large amounts of information that need to be combined to get a result.
Any suggestions welcome. I am happy to restructure the entire thing just to get it right.

Instead of a serialized array, store one row per vote in a table such as:
voted
user_id | Id
----------+------
1 | 12
1 | 14
1 | 187
2 | 23
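You can then exclude the items a user has already voted on with a subquery:
SELECT * FROM vote_item_headers
WHERE Id NOT IN (
    SELECT Id FROM voted WHERE user_id=1234
)
# ASC so that the item voted on longest ago comes first
ORDER BY LastVoted ASC LIMIT 1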
But you also want only relevant posts, so keep those in a table too:
relevant_posts
user_id | Id
----------+------
1 | 342
1 | 253
1 | 32
2 | 53
SELECT vote_item_headers.* FROM vote_item_headers
# cut down the number of rows returned with the relevant_posts table
INNER JOIN relevant_posts ON (
    vote_item_headers.Id=relevant_posts.Id
    AND relevant_posts.user_id=1234
)
WHERE vote_item_headers.Id NOT IN (
    SELECT Id FROM voted WHERE user_id=1234
)
# ASC so that the item voted on longest ago comes first
ORDER BY LastVoted ASC LIMIT 1
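To sanity-check what the combined query is meant to return, here is a minimal in-memory sketch in PHP of the same selection rules: relevant to the user, not already voted on or skipped, and last voted on the longest time ago. The function name nextVoteItem, the array shapes and the sample rows are assumptions made up for illustration; in the real system the filtering would stay in SQL.

```php
<?php
declare(strict_types=1);

/**
 * In-memory model of the selection rules (hypothetical helper, not part of the original schema).
 *
 * @param array<int, array{Id:int, LastVoted:string}> $items       rows from vote_item_headers
 * @param int[]                                        $seenIds     ids the user voted on or skipped
 * @param int[]                                        $relevantIds ids relevant to the user
 */
function nextVoteItem(array $items, array $seenIds, array $relevantIds): ?array
{
    // Keep only relevant items the user has not seen yet.
    $candidates = array_filter(
        $items,
        fn(array $item): bool =>
            in_array($item['Id'], $relevantIds, true)
            && !in_array($item['Id'], $seenIds, true)
    );
    if ($candidates === []) {
        return null; // nothing left to vote on
    }
    // 'Y-m-d H:i:s' strings sort chronologically as plain strings, oldest first.
    usort($candidates, fn(array $a, array $b): int => strcmp($a['LastVoted'], $b['LastVoted']));
    return $candidates[0];
}

// Assumed sample data: item 342 was already voted on, item 53 is not relevant.
$items = [
    ['Id' => 32,  'LastVoted' => '2013-01-05 10:00:00'],
    ['Id' => 253, 'LastVoted' => '2013-01-02 09:00:00'],
    ['Id' => 342, 'LastVoted' => '2013-01-01 08:00:00'],
    ['Id' => 53,  'LastVoted' => '2012-12-31 08:00:00'],
];
$next = nextVoteItem($items, [342], [342, 253, 32]);
echo $next['Id'], PHP_EOL; // prints 253
```

Recording a skip as a row in the same table as the votes (perhaps with a flag column) would make skipped items drop out through the same exclusion, which is why the sketch takes a single list of seen ids.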

Related

How to analyze item's path through system

I'm looking for a little bit of direction for how to analyze a problem. I work for a small manufacturing company. We paint about 150 items per day. Those items then go to Quality Control. About 70% pass QC. The remaining 30% have to be repaired in some way.
We have 5 different repair categories: Repaint, Reclear, Remake, Reglaze, Fix.
Every time an order gets QC'd, my system inputs some data into a "Repairs" MySQL table. If it passes QC, it's given a category of Great. Its structure is like this:
id | Repair  | Date
---+---------+------------
5  | repaint | 2013-01-01
6  | reclear | 2013-01-01
5  | great   | 2013-01-02
...etc
I need to be able to perform analysis on what actions are happening. I'd like to know what 'paths' items are going down.
For example: what percentage of items have the categories Reclear->Repaint->Great? What percentage have Repaint->Repaint->Remake->Great? (Every item should eventually end with 'Great'.)
I'm kind of stuck on where to start in figuring out how to analyze this.
Should I be keeping track of the repair number in the table? If I did that, then maybe I could use a self join to select orders where repairnum=1 AND repair='Repaint' joined with repairnum=2 AND repair='Great'. This would tell me which orders went down the path Repaint->Great. I'm a little hesitant to go this route because 1) I don't want to have to run a query to get the repair number before I insert a new row into the table, and 2) it seems like I'd need some pretty nasty queries to analyze items that have 5 or 6 (or more) repairs.
Perhaps someone can point me in the right direction?
My app is in php and mysql.
You don't need a separate "repair number", because you have the date when each repair was made, so can order by that (assuming you store time as well if more than one repair can be made in a day).
The "path" for an item is the list of its repairs, in order of date. If you just say SELECT repair FROM repairs WHERE id=5 ORDER BY date ASC you'll get them as rows.
The trick is to turn these into a single value representing the whole path, using GROUP_CONCAT - SELECT GROUP_CONCAT(repair ORDER BY date ASC SEPARATOR '->') FROM repairs WHERE id=5
Once you have that, you can run that for all products in the DB using a GROUP BY, and then look for patterns in it with HAVING:
SELECT
id,
GROUP_CONCAT(repair ORDER BY date ASC SEPARATOR '->') as path
FROM
repairs
GROUP BY
id
HAVING
path = 'Repaint->Repaint->Remake->Great'
Note that I don't have a copy of MySQL to try this out with, so I may have made a mistake, but the manual suggests that the above should work.
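Since the question asks for percentages per path, here is a rough PHP sketch that tallies them from the raw rows. It builds the same '->' joined path per item that GROUP_CONCAT would produce and then counts how often each path occurs. The function name pathPercentages and the array shape are assumptions for illustration, and rows of one item are assumed to have distinct dates (or date plus time).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: share of items following each repair path.
 *
 * @param array<int, array{id:int, repair:string, date:string}> $rows rows from the repairs table
 * @return array<string, float> path => percentage of items
 */
function pathPercentages(array $rows): array
{
    // Sort by date so each item's repairs are appended in chronological order.
    usort($rows, fn(array $a, array $b): int => strcmp($a['date'], $b['date']));

    // Collect the ordered repairs per item id.
    $paths = [];
    foreach ($rows as $row) {
        $paths[$row['id']][] = $row['repair'];
    }

    // Count how many items share each full path.
    $counts = [];
    foreach ($paths as $steps) {
        $path = implode('->', $steps);
        $counts[$path] = ($counts[$path] ?? 0) + 1;
    }

    $total = count($paths);
    $percentages = [];
    foreach ($counts as $path => $count) {
        $percentages[$path] = round(100 * $count / $total, 1);
    }
    return $percentages;
}
```

Feeding it the result of a plain SELECT id, repair, date FROM repairs would give one entry per distinct path, so no self joins are needed however many repairs an item has had.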

MySQL database structure for infinite items per user

I have a MySQL database with a growing number of users. Each user has a specific ID, a list of items they want, and a list of items they have.
The current database was created some time ago. Each user has a single row in a WANT or HAVE table, with the user ID as the primary key and 50 columns per row, and each WANT or HAVE item has a specific ID number.
This currently limits each user to 50 items and greatly complicates searches and other functions with the database.
When redoing the database, would it be viable to instead simply create 2-column WANT and HAVE tables, with each row holding the user ID and the item ID? That way there is no 'theoretical' limit to items per user.
Each time a member loads the profile page, a list of their want and have items will then be compiled using a simple SELECT WHERE ID = ##### statement on the have or want table.
Furthermore, I would need to make comparisons of user-to-user item lists, find the most common items and the users with the most items, run complete user searches for items that one user wants and the other user has, and so on.
The number of users will range from 5000 to 20000, and each user averages about 15 to 20 items.
Will this be a viable MySQL structure, or do I have to rethink my strategy?
Thanks a lot for your help!
This will certainly be a viable structure in MySQL. It can handle very large amounts of data. When you build it, though, make sure that you put proper indexes on the user/item IDs so that the queries return nice and quick.
This is called a one-to-many relationship in database terms.
Table1 holds:
userName | ID
Table2 holds:
userID | ItemID
You simply put as many rows into the second table as you want.
In your case, I would probably structure the tables as this:
users
id | userName | otherFieldsAsNeeded
items
userID | itemID | needWantID
This way, you can either have a simple lookup for needWantID - for example 1 for Need, 2 for Want. But later down the track, you can add 3 for wishlist for example.
Edit: just make sure that you aren't storing your item information in the items table; store only the user's relationship to the item there. Keep all the item information in a separate table (itemDetails, for example) which holds your descriptions, prices and whatever else you want.
I would recommend 2 tables, a Wants table and a Have table. Each table would have a user_id and product_id. I think this is the most normalized and gives you "unlimited" items per user.
Or, you could have one table with a user_id, product_id, and type ('WANT' or 'HAVE'). I would probably go with option 1.
As you mentioned in your question, yes, it would make much more sense to have separate tables for WANTs and HAVEs. Each table could have an Id column which relates the row to the user, and a column that dictates what the WANT or HAVE item actually is. This method would allow much more room to expand.
It should be noted that if you have a lot of these rows, you may need to increase the capacity of your server in order to maintain quick queries. Millions of rows can put a great deal of strain on the server (depending on your setup).
What you're theorizing is a very legitimate database structure. For a many-to-many relationship (which is what you want), the only way I've seen this done is to, like you say, have a relationships table with user_id and item_id as the columns. You could expand on it, but that's the basic idea.
This design is much more flexible and allows for the infinite items per user that you want.
In order to handle wants and haves, you could create two tables, or you could use just one with a third column holding a single byte that indicates whether the user/item match is a want or a have. Depending on the specifics of your project, either would be a viable option.
So, what you would end up with is at least the following tables:
Table: users
Cols:
user_id
any other user info
Table: relationships
Cols:
user_id
item_id
type (1 byte/boolean)
Table: items
Cols:
item_id
any other item info
Hope that helps!
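As a rough illustration of why the one-row-per-relationship layout makes the comparisons from the question easy, here is a small PHP sketch that finds the items one user wants and another user has, working on rows shaped like the relationships table above. The function name tradeMatches and the TYPE_WANT / TYPE_HAVE values are assumptions for the example; in practice this would be a single join on the relationships table.

```php
<?php
declare(strict_types=1);

// Assumed encoding of the one-byte type column.
const TYPE_WANT = 0;
const TYPE_HAVE = 1;

/**
 * Hypothetical helper: items that $wanterId wants and $haverId has.
 *
 * @param array<int, array{user_id:int, item_id:int, type:int}> $relationships
 * @return int[] matching item ids
 */
function tradeMatches(array $relationships, int $wanterId, int $haverId): array
{
    $wants = [];
    $haves = [];
    foreach ($relationships as $row) {
        if ($row['user_id'] === $wanterId && $row['type'] === TYPE_WANT) {
            $wants[] = $row['item_id'];
        } elseif ($row['user_id'] === $haverId && $row['type'] === TYPE_HAVE) {
            $haves[] = $row['item_id'];
        }
    }
    // Items present in both lists, reindexed from 0.
    return array_values(array_intersect($wants, $haves));
}
```

With the old 50-column layout the same comparison would have to look through every column of both rows, which is the complication the question describes.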

Ordering using PHP but some entries can have the same value

I'm working on a horse rating system and I need to assign values to each horse based on the value of another (already filled) field (all this is stored in a MySQL db).
Consider the following simplified example:-
A four horse race where the odds for each horse are as follows:-
Horse A - 2/1
Horse B - 3/1
Horse C - 3/1
Horse D - 5/1
As Horse A has the lowest price, I want to give it a value of 1.
However, Horse B and C have the same price and so I want to give them both 2.
Horse D has the next highest price and so I want to give it the value of 3.
When I first started to do this, I thought it would be easy but it has now reached the stage where the loops are driving me loopy. Any suggestions would be greatly appreciated.
Many thanks in advance.
In view of the response I received below from Daan then I should also add that my problem is further compounded by the fact that my table has several subsets (i.e. it contains more than one race on any given day and they need to be ranked individually).
My table is currently:-
racedate | racetime | racecourse | horsename | forecast | forecast_rate | id
The racedate for the purposes of this will always be the same. The racetime and racecourse together identify the race in question.
forecast is the price given to each horse (this has already been entered at this stage) and this is what needs to have the shared ranking done on it to be stored in forecast_rate.
id is just the unique index for each entry in the table.
This is what I have now got to (and it doesn't work... surprise...)
$testdude=mysql_query("SELECT DISTINCT racecourse,racetime FROM picking") or die(mysql_error());
while($rih=mysql_fetch_array($testdude)){
$testdude1=mysql_query("SELECT s1.forecast, s1.horsename, COUNT(DISTINCT s2.forecast) AS rank FROM picking s1 JOIN picking s2 ON (s1.forecast <= s2.forecast) GROUP BY s1.horsename;");
while($rih1=mysql_fetch_array($testdude1)){
mysql_query("UPDATE picking SET forecast_rate='$testdude1[rank]' where horsename='$testdude1[horsename]'") or die(mysql_error());
}
}
This is called shared ranking and it's easiest to do this in MySQL. Take a look at this tutorial and see whether you can get that to work. If not, please provide more details about your table lay-out, and I'll get you a tailored example :)
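If you would rather do the ranking in PHP, as the title suggests, here is a minimal sketch of shared ranking computed per race. It assumes the forecast column holds fractional odds as strings such as '3/1' and that racecourse plus racetime identify a race, as described in the question; the function names oddsToFloat and sharedRanks are made up for illustration.

```php
<?php
declare(strict_types=1);

/** Convert fractional odds such as '3/1' to a number (3.0). Assumes the 'a/b' format. */
function oddsToFloat(string $odds): float
{
    [$num, $den] = array_map('floatval', explode('/', $odds));
    return $num / $den;
}

/**
 * Hypothetical helper: shared rank of each row within its own race.
 *
 * @param array<int, array{id:int, racetime:string, racecourse:string, forecast:string}> $rows
 * @return array<int, int> row id => value for forecast_rate
 */
function sharedRanks(array $rows): array
{
    // Group the prices by race, keyed by row id.
    $races = [];
    foreach ($rows as $row) {
        $race = $row['racecourse'] . '|' . $row['racetime'];
        $races[$race][$row['id']] = oddsToFloat($row['forecast']);
    }

    $ranks = [];
    foreach ($races as $prices) {
        asort($prices); // lowest price first, row ids kept as keys
        $rank = 0;
        $previous = null;
        foreach ($prices as $id => $price) {
            if ($price !== $previous) {
                $rank++; // a new price starts the next rank
                $previous = $price;
            }
            $ranks[$id] = $rank; // equal prices share the current rank
        }
    }
    return $ranks;
}
```

For the four-horse example in the question this gives 1, 2, 2 and 3, and each id in the result can then be written back with one UPDATE ... WHERE id = ? per row.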

querying records that have revisions related to them

I created a commenting system that allows users to submit comments on each item.
It turned into a bit of project/scope creep, and now I need to implement the ability for users to edit their original comments and keep track of those comments.
All comments are located in the comments table
comments: id, comment, item_id, timestamp
Now that revisions must be tracked, I created a new table titled revisions:
comment_id, revision_id, timestamp
All comments (new or old) are entered into the comments table. If the user decides to revise an existing comment, the revision is entered as a new record in comments and then recorded in the revisions table: the id created for the new comment is passed into revisions.revision_id, and revisions.comment_id is populated with the id of the original comment the user revised (hope I didn't lose you).
Now I've come to the problem I need help with: I need to display a list of all comments for a specific item, which would have a query of something like
select * from comments where item_id = 1
Now that I have added the revisions table, I need to retrieve a list of comments for the specific item (just like the above query does) and (here's the kicker) if any comment has been revised, I need to return the most recent version of that comment.
What is the best way of accomplishing this?
I thought about running two queries: one to retrieve all the comments in the comments table and store them in an array, and another to return all records in the revisions table, where I would make revisions.comment_id distinct and return only the most recent one.
The revisions query might look something like this:
select comment_id DISTINCT, revision_id, timestamp
from revisions order by timestamp desc
What is the best way of only displaying the most recent version of each comment (some will have revisions and most won't)?
I am not a SQL expert, so can this be accomplished using SQL, or will I need to run two different queries, store the data in separate arrays, then run through each array, compare, and strip out the older versions of each comment? Example (partly in theory) below:
// assumes $comments is keyed by comment id
foreach ($revisions as $r):
    // a comment that appears as comment_id has been superseded, so drop it
    unset($comments[$r['comment_id']]);
endforeach;
return $comments; // return the comments array after it was stripped of the older comments
I imagine that running one query which returns only the most recent version of each comment is the best practice. If so, could you provide the appropriate query? Otherwise, is running two queries into two arrays and stripping values out of the comments array the best way, or is there a better one?
Thanks in advance.
First off, I'll add two alternative approaches and then I'll edit with a query to deal with your current schema.
Option 1 - Add a deleted flag to your comments. When a comment is revised, do as you already do but also mark the original as deleted. Then you just need WHERE deleted = 0 wherever you want active comments.
Option 2 - Change your revision table to be a clone of the comment table, plus an additional field for when the revision was made. Now, whenever you revise a comment, don't create a new record in comment; just update the existing row and add a new row to the revisions table. This is easily maintained with a trigger and is a very standard auditing pattern.
EDIT Option 3 - A query to cope with your schema.
As described, if I make a comment, then edit it twice (with no other activity), I get something like this...
id | comment | item_id | timestamp
----+--------------+---------+-----------
1 | Hello, | 1 | 13:00
2 | World! | 1 | 14:00
3 | Hello, World | 1 | 15:00
comment_id | revision_id | timestamp
-----------+-------------+-----------
1 | 2 | 14:00
2 | 3 | 15:00
Based on this, the live comment is the only one without an entry in the revisions table...
SELECT *
FROM comments
WHERE NOT EXISTS (SELECT * FROM revisions WHERE revisions.comment_id = comments.id)
AND item_id = @item_id

Select records with values up to and over (but only one over) another value?

I have data such as...
ID | Amount
-----------------
1 | 50.00
2 | 40.00
3 | 15.35
4 | 70.50
etc. And I have a value I'm working up to, in this case let's say 100.00. I want to get all records up to 100.00 in order of the ID. And I want to grab one more than that, because I want to fill it up all the way to the value I'm aiming for.
That is to say, I want to get, in this example, records 1, 2, and 3. The first two total up to 90.00, and 3 pushes the total over 100.00. So I want a query to do that for me. Does such a thing exist in MySQL, or am I going to have to resort to PHP array looping?
Edit:
To put it in English terms: let's say they have $100 in their account. I want to know which of their requests can be paid, either in toto or partially. So I can pay off the $50 and the $40, and part of the $15.35. I don't care, at this point in the program, about the partialness; I only want to find out which ones qualify in any way.
Yes, it is possible:
set @total:=0;
select * from
(
    select *, if(@total>=100, 0, 1) as included, @total:=@total+Amount
    from your_table
    order by id
) as alls
where included=1
order by id;
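For comparison, the PHP array looping that the question mentions is short as well. This is only a sketch: the function name payableIds is made up, the rows are assumed to arrive already ordered by ID, and amounts are assumed to come back from MySQL as decimal strings. It works in whole cents to avoid floating-point drift when summing.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: ids of the requests that can be paid in full or in part.
 *
 * @param array<int, array{ID:int, Amount:string}> $rows   rows ordered by ID
 * @param string                                    $target amount available, e.g. '100.00'
 * @return int[]
 */
function payableIds(array $rows, string $target): array
{
    $targetCents = (int) round((float) $target * 100);
    $totalCents = 0;
    $ids = [];
    foreach ($rows as $row) {
        if ($totalCents >= $targetCents) {
            break; // the target is already covered, so stop before this row
        }
        $ids[] = $row['ID'];
        $totalCents += (int) round((float) $row['Amount'] * 100);
    }
    return $ids;
}

$rows = [
    ['ID' => 1, 'Amount' => '50.00'],
    ['ID' => 2, 'Amount' => '40.00'],
    ['ID' => 3, 'Amount' => '15.35'],
    ['ID' => 4, 'Amount' => '70.50'],
];
print_r(payableIds($rows, '100.00')); // ids 1, 2 and 3
```

A row is included whenever the running total before it is still short of the target, which is the same test the included column performs in the query.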
Referring to the last sentence: doesn't MySQL's SUM() cut it?
