MySQL performance with large WHERE IN() clause - php

Let's say we have a table with 4 columns: id (int 11, indexed), title, content, category (varchar 5).
I have a user select a category. Each category can contain up to 999 objects. Using SELECT id FROM table WHERE category = ? I get a list of all objects.
I then have the user select/deselect some of the objects. After which I need to select the content of the remaining selected objects.
Now my question is as follows: should I worry about performance when using SELECT content FROM table WHERE id IN($array)? Or would it be better to use SELECT content FROM table WHERE category = ? AND id IN($array)? The idea here being that I first filter down to at most 999 objects before performing the IN()...
Does this make any sense? Or should I not be using the IN() at all?

It sounds like you always have content showing on the screen?
999 is a long list to put on the screen. Re-think your UI.
When selected/deselected, what happens? Do you gray out the content? If so, that is a UI issue, not a database issue. If you store the subset that is currently "selected", then how/where is that stored? And, do you want to store it after each select/deselect? Or wait until he clicks "Submit"?
In other words, I don't see why this is a database question.
Back to the queries in question:
INDEX(category)
SELECT ... FROM tbl WHERE category = ...; -- This is optimal
PRIMARY KEY(id)
SELECT ... FROM tbl WHERE id IN (...); -- optimal for an arbitrary set
INDEX(category, id)
SELECT ... FROM tbl WHERE category = ... AND id IN (...)
-- use this only if both parts are needed for filtering,
-- not for optimizing
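If in doubt, EXPLAIN shows which index each variant uses; a quick sketch, with the table name and literal values as stand-ins:
EXPLAIN SELECT content FROM tbl WHERE id IN (1, 2, 3);
EXPLAIN SELECT content FROM tbl WHERE category = 'abcde' AND id IN (1, 2, 3);
-- With PRIMARY KEY(id), the first form is already a set of point lookups by id;
-- adding the category predicate filters further but does not speed it up.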

Add second (conditional) result from second table to SQL query

I have two tables in a database. One stores names/details of users with an index ID; the other stores articles they have written, which just keeps the user's ID as a reference (field author). So far so simple. I can easily query a list of articles and include in the query a request for the user's name and status:
SELECT a.name, a.status, s.* FROM articles s, author a WHERE s.author=a.id
The problem comes when I occasionally have a second author credit, referenced in field author2. Up till now I've been doing what I assume is a very inefficient second query when I iterate through the results, just to get the second author's name and status from the table (pseudocode):
while (fetch a row) {
    if (author2 != 0) {
        query("SELECT name, status FROM author WHERE id=author2");
    }
    etc.
}
While this worked fine in PHP/MySQL (even if clunky), I'm forced to upgrade to PHP7/PDO and I'd like to get the benefits of unbuffered queries, so this nested query won't work. Obviously one simple solution would be to PDO->fetchAll() the entire result set first, then iterate all the rows in a foreach loop and do these extra queries per row.
But it would be far more efficient to get that second bit of data somehow incorporated into the main query, pulling from the author table using the second ID (author2) as well as the main ID, so that there are name2 and status2 fields added to each row. I just cannot see how to do it...
It should be noted that while the primary author ID field is ALWAYS non-zero, the author2 field will contain zero if there is no second ID, and there is NO author ID 0 in the author table, so any solution would need to handle an author2 ID of 0 by providing null strings or something in those fields, rather than giving an error. (Or far less elegantly, a dummy author ID 0 with null data could be added to the author table, I suppose.)
Can anyone suggest a revised original query that can avoid such secondary queries?
Never use commas in the FROM clause. Always use proper, explicit, standard JOIN syntax.
For your query, use LEFT JOIN:
SELECT s.*,
       a1.name, a1.status,
       a2.name AS name2, a2.status AS status2
FROM articles s
LEFT JOIN author a1 ON s.author = a1.id
LEFT JOIN author a2 ON s.author2 = a2.id
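Note that with the LEFT JOINs, an author2 of 0 matches no row in author, so name2 and status2 simply come back as NULL. If actual empty strings are preferred, as the question suggests, IFNULL can supply them; a variant of the query above:
SELECT s.*,
       a1.name, a1.status,
       IFNULL(a2.name, '') AS name2,
       IFNULL(a2.status, '') AS status2
FROM articles s
LEFT JOIN author a1 ON s.author = a1.id
LEFT JOIN author a2 ON s.author2 = a2.id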
Gordon Linoff's answer looks like what you need.
I would have added this as a comment but it is too long of a message...
I just have a question/comment regarding normalization of the database. Would there ever be an instance when there is an author3? If so then you should probably have an ArticleAuthor table. Since you are rebuilding the code anyway this may be an improvement to consider.
I don't know the names and data types of the information you are storing so this is a primitive example of the structure I would suggest.
Table Article
ArticleID
ArticleData...
Table Author
AuthorID
AuthorName
AuthorStatus
Table ArticleAuthor
ArticleID
AuthorID
If the Status is dependent on the Author and Article combination, then AuthorStatus would be moved to the ArticleAuthor table, like this:
Table ArticleAuthor
ArticleID
AuthorID
Status
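With the join table in place, one query can still return each article together with all of its authors; a sketch using GROUP_CONCAT, reusing the table and column names from the outline above:
SELECT a.ArticleID,
       GROUP_CONCAT(au.AuthorName ORDER BY au.AuthorName SEPARATOR ', ') AS Authors
FROM Article a
JOIN ArticleAuthor aa ON aa.ArticleID = a.ArticleID
JOIN Author au ON au.AuthorID = aa.AuthorID
GROUP BY a.ArticleID;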

SELECT FROM 100 tables in 1 database

I am trying to SELECT id, description, title FROM table1, table2, table100
Say I get this working: is it better for me to just combine all my tables in phpMyAdmin?
The problem is I have around 100 tables, all for different categories of books, so I want to keep them separated in their individual tables.
I am trying to make a search engine that searches all the books in the entire database. All tables have the same column names.
So really all I am trying to do is search the entire database's tables for an id, description, and title. My search works; I just can only search 1 table, and every solution I have found online only really works efficiently with 2 or 3 tables.
Thanks in advance.
The best option is to redesign your database: put everything into a single table with an additional "category" column.
In the meantime, you can create a view that UNIONs the tables, with an additional column for the category.
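A minimal sketch of such a view, showing just three of the hundred tables, with made-up category labels:
CREATE VIEW all_books AS
SELECT id, description, title, 'fiction' AS category FROM table1
UNION ALL
SELECT id, description, title, 'history' AS category FROM table2
UNION ALL
SELECT id, description, title, 'poetry' AS category FROM table100;

-- The search engine then queries one logical table:
SELECT id, description, title FROM all_books WHERE title LIKE '%dune%';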
I recommend redesigning the model and unifying these 100 tables into 1, adding a new category column with an integer value, not a string value. That way, you can index the category column together with the other fields (id, description, title) to speed up the query.
This resolution makes it easier to avoid pain later.
I recommend keeping one table A with id, description, title, category, and creating another table B with the categories. Table A should have a foreign key referencing the categories table. Then create a query to retrieve the books of a specific category.
Example:
SELECT id, description, title, category FROM books WHERE category = "drama"
I think it speaks to the database design itself as mentioned by most here. You've a few options depending on how much time you have on your hands:
(Short term / quick fix) A central table with all your current fields plus category as a flag to differentiate between the current tables you have. So your insert will be something like "INSERT INTO newtable (id, description, title, category) SELECT id, description, title, 'Fiction' FROM table1;"
If your tables are incrementally named like table1, table2, up to table100, you could write a quick PHP script that loops through the inserts, incrementing the table name on each iteration until the last table. (A SQL-only alternative is sketched below.)
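As that SQL-only alternative, MySQL can generate the 100 INSERT statements from information_schema; a sketch, assuming the tables really are named table1 through table100 and newtable is the central table from the previous point:
SELECT CONCAT('INSERT INTO newtable (id, description, title, category) ',
              'SELECT id, description, title, ''', table_name, ''' FROM ',
              table_name, ';') AS stmt
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name LIKE 'table%';
-- Run the generated statements, substituting real category labels for the table names.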
In the long run, you could invest in a JSON field that houses all your other data, excluding the keys that pertain to a single entry.
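A sketch of that longer-term idea, assuming MySQL 5.7+ for the JSON type; the column and key names are illustrative:
ALTER TABLE newtable ADD COLUMN extra JSON;

-- Category-specific attributes live in the JSON document instead of new columns:
UPDATE newtable
SET extra = '{"isbn": "978-0-00-000000-0", "pages": 320}'
WHERE id = 1;

-- They can still be read back in queries:
SELECT id, title, JSON_UNQUOTE(JSON_EXTRACT(extra, '$.isbn')) AS isbn
FROM newtable;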

Order by votes - PHP

I have a voting script which pulls out the number of votes per user.
Everything is working, except I need to now display the number of votes per user in order of number of votes. Please see my database structure:
Entries:
UserID, FirstName, LastName, EmailAddress, TelephoneNumber, Image, Status
Voting:
item, vote, nvotes
The item field contains vt_img followed by the UserID (so, for example: vt_img4), and both vote & nvotes hold the number of votes.
Any ideas how I can relate those together and display the users in order of the most voted at the top?
Thanks
You really need to change the structure of the voting table so that you can do a normal join. I would strongly suggest adding either a pure userID column, or at the very least not making it a concat of two other columns. Based on an ID you could then easily do something like this:
select a.userID, a.firstName, b.votes
from entries a
join voting b on a.userID = b.userID
order by b.votes desc
The other option is to consider (if it is a one to one relationship) simply merging the data into one table which would make it even easier again.
At the moment, this really is an XY problem, you are looking for a way to join two tables that aren't meant to be joined. While there are (horrible, ghastly, terrible) ways of doing it, I think the best solution is to do a little extra work and alter your database (we can certainly help with that so you don't lose any data) and then you will be able to both do what you want right now (easily) and all those other things you will want to do in the future (that you don't know about right now) will be oh so much easier.
Edit: It seems like this is a great opportunity to use a Trigger to insert the new row for you. A MySQL trigger is an action that the database will make when a certain predefined action takes place. In this case, you want to insert a new row into a table when you insert a row into your main table. The beauty is that you can use a reference to the data in the original table to do it:
DELIMITER //
CREATE TRIGGER Entries_Trigger AFTER INSERT ON Entries
FOR EACH ROW BEGIN
    -- NEW.UserID is the auto_increment ID of the row just added to Entries
    INSERT INTO Voting VALUES (NEW.UserID, 0, 0);
END//
DELIMITER ;
This will work in the following manner - When a row is inserted into your Entries table, the database will insert the row (creating the auto_increment ID and the like) then instantly call this trigger, which will then use that newly created UserID to insert into the second table (along with some zeroes for votes and nvotes).
Your database is badly designed. It should be:
Voting:
item, user_id, vote, nvotes
Placing the item id and the user id into the same column as a concatenated string with a delimiter is just asking for trouble. This isn't scalable at all. Look up the basics on Normalization.
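As a sketch of that repair, assuming every item value really has the form vt_img<UserID> ('vt_img' is 6 characters, so the numeric part starts at position 7):
ALTER TABLE Voting ADD COLUMN UserID INT UNSIGNED;

UPDATE Voting
SET UserID = CAST(SUBSTRING(item, 7) AS UNSIGNED);

-- Once the backfill is verified, index the new column and drop the old one:
ALTER TABLE Voting ADD INDEX (UserID), DROP COLUMN item;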
You could try this:
SELECT *
FROM Entries e
JOIN Voting v ON (CONCAT('vt_img', e.UserID) = v.item)
ORDER BY nvotes DESC
but please notice that this query might be quite slow due to the fact that the join field for Entries table is built at query time.
You should consider changing your database structure so that Voting contains a UserID field in order to do a direct join.
I'm figuring the Entries table is where votes are cast (your database schema doesn't make much sense to me; it seems like you could structure it a little better). If the votes are actually on the Votes table and that's connected to a user, then you should have a UserID field in that table too. Either way the example will help.
Let's say you add UserID to the Votes table and this is where a user's votes are stored; then this would be your query:
SELECT Users.id,
SUM(Votes.nvotes) AS user_votes
FROM Users
JOIN Votes ON Users.id = Votes.UserID
GROUP BY Users.id
ORDER BY user_votes DESC
Use ORDER BY in your query:
SELECT column_name(s)
FROM table_name
ORDER BY column_name(s) ASC|DESC

I need some advice on storing data in mysql, where one needs to store more than one, let's say userids, for a single post?

In cases when someone needs to store more than one value in a cell, what approach is more desirable and advisable: storing it with delimiters or glue and exploding it into an array later for processing in the server-side language of choice, for example:
$returnedFromDB = "159|160|161|162|163|164|165";
$myIdArray = explode("|",$returnedFromDB);
or as a JSON or PHP serialized array, like this.
:6:{i:0;i:1;i:1;i:2;i:2;i:3;i:3;i:4;i:4;i:5;i:5;i:6;}
then later unserialize it into an array and work with it,
OR
have a new row for every new entry like this
postid 12 | showto 2
postid 12 | showto 3
postid 12 | showto 5
postid 12 | showto 6
postid 12 | showto 8
instead of postid 12 | showto "2|3|4|6|8|5|".
OR postid 12 | showto ":6:{i:0;i:2;i:1;i:3;i:2;i:3;i:3;i:4;i:4;i:5;i:5;i:6;}".
Thanks, looking forward to your opinions :D
In cases when someone needs to store more than one value in a cell, what approach is more desirable and advisable: storing it with delimiters or glue and exploding it into an array later for processing in the server-side language of choice, for example.
Neither. Oh goodness, neither! Edgar F. Codd is rolling in his grave right now.
Storing delimited data in a text field is no better than storing it in a flat file. The data becomes unqueryable. Storing PHP serialized data in a text field is even worse because then only PHP can parse the data.
You want a nice, happy, normalized database.
The thing you're trying to describe is a many-to-many relationship. Each user can maintain one or more posts. Likewise, each post can be maintained by one or more users. Right? Then something like this will work:
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    ...
);
CREATE TABLE posts (
    post_id INTEGER PRIMARY KEY,
    ...
);
CREATE TABLE user_posts (
    user_id INTEGER REFERENCES users(user_id),
    post_id INTEGER REFERENCES posts(post_id),
    UNIQUE KEY (user_id, post_id)
);
-- All posts made by user 22.
SELECT posts.*
FROM posts, user_posts
WHERE user_posts.user_id = 22
AND posts.post_id = user_posts.post_id
-- All users that worked on post 47
SELECT users.*
FROM users, user_posts
WHERE user_posts.post_id = 47
AND users.user_id = user_posts.user_id
Most of the time the recommendation is that many-to-many relationships (such as posts to users) should have a mapping table with 1 row for each post-user combination (in other words, your "new row for every new entry" version).
It's more optimal for things like join queries, and lets you retrieve only the data you need.
You should only serialize data in the DB if the data never needs to be processed by the DB. For example, you could serialize the user IDs in the user_id field if you never need to do a query with the user_id field; e.g. never selecting anything based on user.
If these are posts (blog/news/etc. posts?) then I'm pretty confident you'll need to be able to query them by user. Normalizing the user into another table would serve you well:
CREATE TABLE posts (post_id, ....);
CREATE TABLE post_users (post_id, user_id, ...);
You can then get the users in a different query, or use GROUP_CONCAT:
SELECT post_id, GROUP_CONCAT(user_id)
FROM posts
JOIN post_users USING (post_id)
GROUP BY post_id;
When you need to show user name, just join to the users table to get their name in the GROUP_CONCAT.
From an RDBMS point of view I would 'have a new row for every new entry'.
That's called an m:n relationship table.
You can then query the data however you like.
If you need postid 12 | showto ":6:{i:0;i:2;i:1;i:3;i:2;i:3;i:3;i:4;i:4;i:5;i:5;i:6;}". you can do
SELECT postid, CONCAT(':',count(showto),':{i:',GROUP_CONCAT(showto SEPARATOR ';i:'),';}') AS showto
FROM tablename
GROUP BY postid
However, if you only ever need the data in one form and never do any other kind of queries on that data, then you may as well store the string.

How do I assign a rotating category to database entries in the order the records come in?

I have a table which gets entries from a website, and as those entries go into the database, they need to be assigned the next category on a list of categories that may be changed at any time.
For this reason I can't do something simple like mapping the first category of 5 to IDs 1, 6, 11, 16, and so on.
I've considered reading in the list of currently possible categories, checking the value of the last one inserted, and then giving the new record the next category, but I imagine that if two requests come in at the same moment, I could potentially assign them both the same category rather than the next two in sequence.
So, my current round of thinking is the following:
lock the tables ( categories and records )
insert the newest row into records
get the newest row's ID
select the row previous to the insert (by using ORDER BY auto_inc_name DESC LIMIT 1, 1)
take the previous row's category, and grab the next one from the cat list
update the new inserted row
unlock the table
I'm not 100% sure this will work right, and there's possibly a much easier way to do it, so I'm asking:
A. Will this work as I described in the original problem?
B. Do you have a better/easier way to do this?
Thanks ~
I would do it way simpler... just make a table with one entry, "last_category" (TINYINT UNSIGNED NOT NULL). Every time you do an insert, just increment that value, and reset it as necessary.
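A sketch of that counter-table idea, using MySQL's LAST_INSERT_ID(expr) so the increment and the read happen atomically; the table name and the count of 5 categories are assumptions:
CREATE TABLE category_counter (last_category TINYINT UNSIGNED NOT NULL);
INSERT INTO category_counter VALUES (0);

-- Advance the counter, wrapping after the 5th category, and read it back:
UPDATE category_counter
SET last_category = LAST_INSERT_ID(MOD(last_category + 1, 5));
SELECT LAST_INSERT_ID() AS next_category;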
I'm not sure I understand your problem, but as I understand it you would like to have something like
category | data
-----------------
0 | lorem
1 | ipsum
.... | ...
4 | dolor
0 | sit
... | ...
How about having a unique auto_increment column, and let category be the MOD 5 of this column?
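That way the category never needs to be stored at all; a sketch, assuming an auto_increment column named id is added and the table is called records (a made-up name):
SELECT id, MOD(id, 5) AS category, data
FROM records;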
If you need 100% correct behaviour it sounds like you will need to lock something somewhere so that all your inserts line up properly. You might be able to avoid locking the category table if you use a single SQL statement to insert your data. I'm not sure how MySQL differs but in Oracle I can do this:
insert into my_table (id, col1, col2, col3, category_id)
select :1, :2, :3, :4, c.id -- :1, :2, etc. are bind variables. :1 corresponds to the ID.
from
    (select
        id, -- category id
        count(*) over (partition by 1) cnt, -- count of how many categories there are
        row_number() over (partition by 1 order by category.id) rn -- row number for current row in result set
    from category) c
where c.rn = mod(:1, c.cnt) + 1 -- row_number is 1-based, hence the + 1
This way, in one statement, I insert the next record based on the categories that existed at that moment. The insert automatically locks the my_table table until you commit. It grabs the category based on the modulus of the ID. This link shows you how to do a row-number in MySQL. I'm not sure if count(*) requires GROUP BY in MySQL; in Oracle it does, so I used a partition instead to count the whole result set.
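For what it's worth, MySQL 8+ supports window functions too, so roughly the same statement can be written natively; a sketch with ? placeholders standing in for the bind variables (the first and last ? are both the new row's ID):
INSERT INTO my_table (id, col1, col2, col3, category_id)
SELECT ?, ?, ?, ?, c.id
FROM (SELECT id,
             COUNT(*) OVER () AS cnt,              -- number of categories
             ROW_NUMBER() OVER (ORDER BY id) AS rn -- 1-based position
      FROM category) c
WHERE c.rn = MOD(?, c.cnt) + 1;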
