I have a table of names with a structure like this:
id      int(11)      - auto-increment PK
name    varchar(20)
gender  varchar(10)
taken   tinyint      - bool value
I want to get a random name from a single row where gender is, say, male and taken is false. How can I do that without slowing things down?
What comes to mind is to SELECT all the rows where gender = male and taken = false, then use PHP's rand(1, total_rows) and take the name at that randomly generated position in the array of results.
Or I can use something like the query below, but RAND() is going to slow down the process (according to other questions on Stack Overflow):
SELECT * FROM xyz WHERE (`long`='0' AND lat='0') ORDER BY RAND() LIMIT 1
You can take the following approach:
Select the ids that meet your criteria, e.g. SELECT id FROM table WHERE gender = 'male' AND taken = 0
Choose one id at random in PHP
Fetch the whole row for that id, e.g. SELECT * FROM table WHERE id = <id>
This approach makes the best use of MySQL's query cache. The query in step 3 has a good chance of hitting an id that was fetched before, in which case the query cache can serve the result without touching the table. Furthermore, if a cache such as memcached or Redis is added later, step 3 can be served from it without going to the database at all.
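A minimal sketch of those three steps, assuming the table is called names (the question doesn't give a table name) and that taken is stored as 0/1:

-- Step 1: fetch only the ids that match the criteria (a small result set)
SELECT id FROM names WHERE gender = 'male' AND taken = 0;

-- Step 2 happens on the client: pick one of the returned ids at random,
-- e.g. with PHP's array_rand()

-- Step 3: fetch the full row for the chosen id (a cheap primary-key lookup)
SELECT * FROM names WHERE id = 42;   -- 42 stands in for the id picked in step 2

Because step 3 is a lookup by primary key, it stays fast no matter how large the table grows, and repeated ids hit the caches mentioned above.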
I have a one-to-many relationship of rooms and their occupants:
Room | User
1 | 1
1 | 2
1 | 4
2 | 1
2 | 2
2 | 3
2 | 5
3 | 1
3 | 3
Given a list of users, e.g. 1, 3, what is the most efficient way of determining which room is completely/perfectly filled by them? In this case it should return room 3: although both users are also in room 2, room 2 has other occupants as well, so it is not a "perfect" fit.
I can think of several solutions to this, but am not sure about the efficiency. For example, I can do a group concatenate on the user (ordered ascending) grouping by room, which will give me comma separated strings such as "1,2,4", "1,2,3,5" and "1,3". I can then order my input list ascending and look for a perfect match to "1,3".
Or I can count the total number of users in each room that contains both users 1 and 3, and then select the room whose user count equals two.
Note that I want the most efficient way, or at least a way that scales up to millions of users and rooms. Each room will have around 25 users. Another thing I want to consider is how to pass this list to the database. Should I construct a query by concatenating AND userid = 1 AND userid = 3 AND userid = 5 and so on? Or is there a way to pass the values as an array into a stored procedure?
Any help would be appreciated.
For example, I can do a group concatenate on the user (ordered ascending) grouping by room, which will give me comma separated strings such as "1,2,4", "1,2,3,5" and "1,3". I can then order my input list ascending and look for a perfect match to "1,3".
First, a word of advice, to improve your level of function as a developer: stop thinking of the data, and of the solution, in terms of CSVs. It limits you to thinking in spreadsheet terms and prevents you from thinking in Relational Data terms. You do not need to construct strings and then match strings; when the data is in the database, you can match it there.
Solution
Now then, in Relational data terms, what exactly do you want? You want the rooms where the count of users that match your argument user list is highest. Is that correct? If so, the code is simple.
You haven't given the tables. I will assume room, user, room_user, with deadly ids on the first two, and a composite key on the third. I can give you the SQL solution; you will have to work out how to do it in the non-SQL.
Another thing I want to consider is how to pass this list to the database. Should I construct a query by concatenating AND userid = 1 AND userid = 3 AND userid = 5 and so on? Or is there a way to pass the values as an array into a stored procedure?
To pass the list to the stored proc, since it needs a single calling parm of variable length, you have to create a CSV list of users. Let's call that parm @user_list. (Note, that is not contemplating the data as CSV; that is passing a list to a proc in a single parm, because you can't pass an unknown number of identified users to a proc otherwise.)
Since you constructed the @user_list on the client, you may as well compute @user_count (the number of members in the list) while you are at it, on the client, and pass that to the proc.
Something like:
CREATE PROC room_user_match_sp (
    @user_list  CHAR(255),
    @user_count INT
    ...
    )
AS
-- validate parms, etc
...
SELECT  room_id,
        match_count,
        match_count / @user_count * 100 AS match_pct
    FROM (
        SELECT  room_id,
                COUNT(user_id) AS match_count  -- no. of users matched
            FROM room_user
            WHERE user_id IN ( @user_list )
            GROUP BY room_id                   -- get one row per room
        ) AS match_room                        -- has any matched users
    WHERE match_count = MAX( match_count )     -- remove this while testing
It is not clear whether you want full matches only. If so, use:
WHERE match_count = @user_count
Expectation
You have asked for a proc-based solution, so I have given that. Yes, it is the fastest. But keep in mind that for this kind of requirement and solution, you could construct the SQL string on the client, and execute it on the "server" in the usual manner, without using a proc. The proc is faster here only because the code is compiled and that step is removed, as opposed to that step being performed every time the client calls the "server" with the SQL string.
The point I am making here is: with the data in a reasonably Relational form, you can obtain the result you are seeking using a single SELECT statement; you don't have to mess around with work tables, temp tables or intermediate steps, which would require a proc. Here the proc is not required; you are implementing one purely for performance reasons.
I make this point because it is clear from your question that your expectation of the solution is "gee, I can't get the result directly, I have to work with the data first, I am ready and willing to do that". Such intermediate work steps are required only when the data is not Relational.
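For the full-match case, a single SELECT along these lines would do it in MySQL terms (a minimal sketch, assuming the room_user table above and an argument list of users 1 and 3, i.e. a count of 2):

SELECT room_id
    FROM room_user
    GROUP BY room_id
    HAVING SUM(user_id IN (1, 3)) = 2   -- every user in the argument list is in the room
       AND COUNT(*) = 2                 -- and the room has no other occupants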
Maybe not the most efficient SQL, but something like:
SELECT x.room_id,
SUM(x.occupants) AS occupants,
SUM(x.selectees) AS selectees,
SUM(x.selectees) / SUM(x.occupants) as percentage
FROM ( SELECT room_id,
COUNT(user_id) AS occupants,
NULL AS selectees
FROM Rooms
GROUP BY room_id
UNION
SELECT room_id,
NULL AS occupants,
COUNT(user_id) AS selectees
FROM Rooms
WHERE user_id IN (1,3)
GROUP BY room_id
) x
GROUP BY x.room_id
ORDER BY percentage DESC
This will give you a list of rooms ordered by the "best fit" percentage,
i.e. it works out a percentage of fulfilment based on the number of people in the room and the number of people from your set who are in it.
I have a table fs_city with 3 million cities from all around the world, as well as a table fs_country.
When the user visits the website it detects their country code. The user is required to select their country and city from a box (which looks up cities in fs_city on each key press); if the user is from the USA and types "Ne", they will get a drop-down list containing "New York", for example. I do this to create an autocomplete input.
The problem is that the query is sent on every key press; in my example "Ne", that is two queries against the table fs_city.
Also, even if there's only one query, it takes 6 seconds to return a response from that table... My table has primary keys.
This is my SQL query:
SELECT
ci.city_id,
ci.country_code,
ci.city,
ci.region,
co.country_name
FROM
fs_city as ci,
fs_country as co
WHERE
ci.city LIKE "Ne%"
AND co.country = ci.country_code
AND ci.country_code = :country
ORDER BY
ci.city ASC
LIMIT 0,20
How can I create an autocomplete feature (on key press), and how can I speed up queries against the fs_city table?
Updated:
fs_city primary key: city_id
fs_country primary key: country_id
Engine: InnoDB
jQuery UI has a nice widget that does exactly what you are looking for, and I'm pretty sure you can limit it to only start querying the database after a set number of characters, so you can limit your results. That should solve your speed problem and shorten your JavaScript. http://jqueryui.com/autocomplete/
EDIT: MySQL also has a LIMIT clause to cap the number of rows returned, which might also help.
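If the query itself is still slow, one thing worth trying (a sketch, assuming the column names shown in the question) is a composite index, so the country filter and the 'Ne%' prefix search can both be satisfied from the index:

ALTER TABLE fs_city ADD INDEX idx_country_city (country_code, city);

-- With the index in place the prefix search becomes an index range scan
-- instead of scanning every city row:
SELECT ci.city_id, ci.city, ci.region
    FROM fs_city AS ci
    WHERE ci.country_code = :country
      AND ci.city LIKE 'Ne%'
    ORDER BY ci.city
    LIMIT 20;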
I'm trying to get 4 random results from a table that holds approx 7 million records. Additionally, I also want to get 4 random records from the same table that are filtered by category.
Now, as you would imagine, doing random sorting on a table this large causes the queries to take a few seconds, which is not ideal.
One other method I thought of for the non-filtered result set would be to just get PHP to select some random numbers between 1 and 7,000,000 or so and then do an IN(...) in the query to grab only those rows. And yes, I know this method has a caveat in that you may get fewer than 4 rows if a record with that id no longer exists.
However, the above method obviously will not work with the category filtering, as PHP doesn't know which record numbers belong to which category and hence cannot choose which ids to select from.
Are there any better ways I can do this? The only way I can think of would be to store the record ids for each category in another table, select random results from that, and then select only those ids from the main table in a secondary query; but I'm sure there is a better way!
You could of course use the RAND() function in a query with a LIMIT and a WHERE (for the category). That, however, as you pointed out, entails a scan of the table, which takes time, especially in your case due to the volume of data.
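As a rough sketch (the question doesn't give table or column names, so these are made up):

SELECT *
    FROM items
    WHERE category_id = 7     -- hypothetical category
    ORDER BY RAND()           -- forces MySQL to sort the whole filtered set randomly
    LIMIT 4;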
Your other alternative, again as you pointed out, to store id/category_id in another table might prove a bit faster, but again there has to be a LIMIT and a WHERE on that table, which will contain the same number of records as the master table.
A different approach (if applicable) would be to have a table per category and store the IDs in it. If your categories are fixed or do not change that often, then you should be able to use that approach. In that case you effectively remove the WHERE clause, and RAND() with a LIMIT on each category table will be faster, since each category table contains only a subset of the records from your main table.
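A rough sketch of that layout, with entirely hypothetical names:

-- One small id table per category, kept in sync with the main table
CREATE TABLE category_7_items (
    item_id INT NOT NULL PRIMARY KEY
);

-- RAND() now only shuffles the per-category subset; the full rows are then
-- fetched from the main table by primary key
SELECT i.*
    FROM items AS i
    JOIN (SELECT item_id FROM category_7_items ORDER BY RAND() LIMIT 4) AS r
      ON r.item_id = i.id;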
Some other alternatives would be to use a key/value store just for that operation. MongoDB or Google App Engine can help with that and are really fast.
You could also go towards the approach of a Master/Slave in your MySQL. The slave replicates content in real time but when you need to perform the expensive query you query the slave instead of the master, thus passing the load to a different machine.
Finally you could go with Sphinx which is a lot easier to install and maintain. You can then treat each of those category queries as a document search and let Sphinx randomize the results. This way you offset this expensive operation to a different layer and let MySQL continue with other operations.
Just some issues to consider.
Working off your random number approach
Get the max id in the database.
Create a temp table to store your matches.
Loop n times doing the following:
Generate a random number between 1 and maxId
Get the first record with a record Id greater than the random number and insert it into your temp table
Your temp table now contains your random results.
Or you could dynamically generate SQL with a UNION to do the query in one step.
(SELECT * FROM myTable WHERE ID >= FLOOR(RAND() * (SELECT MAX(ID) FROM myTable)) AND Category = zzz ORDER BY ID LIMIT 1)
UNION
(SELECT * FROM myTable WHERE ID >= FLOOR(RAND() * (SELECT MAX(ID) FROM myTable)) AND Category = zzz ORDER BY ID LIMIT 1)
UNION
(SELECT * FROM myTable WHERE ID >= FLOOR(RAND() * (SELECT MAX(ID) FROM myTable)) AND Category = zzz ORDER BY ID LIMIT 1)
UNION
(SELECT * FROM myTable WHERE ID >= FLOOR(RAND() * (SELECT MAX(ID) FROM myTable)) AND Category = zzz ORDER BY ID LIMIT 1)
Note: my SQL may not be exactly right, as I'm not a MySQL guy, but the theory should be sound.
First you need to get the number of rows... something like this:
select count(1) from tbl where category = ?
then select a random number
$offset = rand(0, $rowsNum - 1);
and select a row at that offset, using the same category filter and a stable order:
select * FROM tbl WHERE category = ? ORDER BY id LIMIT $offset, 1
In this way you avoid missing ids. The only problem is that you need to run the second query several times; a UNION may help in this case.
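For example, with four distinct offsets generated in PHP (the numbers below are just placeholders), the picks can be combined into a single round trip:

(SELECT * FROM tbl WHERE category = ? ORDER BY id LIMIT 3, 1)
UNION
(SELECT * FROM tbl WHERE category = ? ORDER BY id LIMIT 17, 1)
UNION
(SELECT * FROM tbl WHERE category = ? ORDER BY id LIMIT 42, 1)
UNION
(SELECT * FROM tbl WHERE category = ? ORDER BY id LIMIT 58, 1)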
For MySQL you can use
RAND()
SELECT column FROM table
ORDER BY RAND()
LIMIT 4
How can I number my results so that the lowest ID is #1 and the highest ID is #numberOfResults?
Example: if I have a table with only 3 rows in it, whose IDs are 24, 87 and 112, it would pull like this:
ID   24   87   112
Num   1    2     3
The reason why I want this, is my manager wants items to be numbered like item1, item2, etc. I initially made it so it used the ID but he saw them like item24, item87, item112. He didn't like that at all and wants them to be like item1, item2, item3. I personally think this is going to lead to problems because if you are deleting and adding items, then item2 will not always refer to the same thing and may cause confusion for the users. So if anyone has a better idea I would like to hear it.
Thanks.
I agree with the comments about not using a numbering scheme like this if the numbers are going to be used for anything other than a simple ordered display of items with numbers. If the numbers are actually going to be tied to something, then this is a really bad idea!
Use a variable, and increment it in the SELECT statement:
SELECT
    id,
    (@row := @row + 1) AS row
FROM table,
    (SELECT @row := 0) AS row_count;
Example:
CREATE TABLE `table1` (
  `id` int(11) NOT NULL auto_increment,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

INSERT INTO table1 VALUES (24), (87), (112);
SELECT
    id,
    (@row := @row + 1) AS row
FROM table1,
    (SELECT @row := 0) AS row_count;
+-----+------+
| id | row |
+-----+------+
| 24 | 1 |
| 87 | 2 |
| 112 | 3 |
+-----+------+
How it works
@row is a user-defined variable. It is necessary to set it to zero before the main SELECT statement runs. This can be done like this:
SELECT @row := 0;
or like this:
SET @row := 0;
But it is handy to tie the two statements together. This can be done by creating a derived table, which is what happens here:
FROM table,
    (SELECT @row := 0) AS row_count;
The second SELECT actually gets run first. Once that's done, it's just a case of incrementing the value of @row for every row retrieved:
@row := @row + 1
The @row value is incremented every time a row is retrieved. It will always generate a sequential list of numbers, no matter what order the rows are accessed. So it's handy for some things, and dangerous for others...
It sounds like it would be better to just generate that number in your code instead of coming up with some convoluted way of doing it in SQL. When looping through your elements, just maintain the sequence there.
What is the ID being used for?
If it's only for quick and easy reference, then that's fine; but if it's to be used for deleting or managing in any way, as you mentioned, then your only option would be to assign a new ID column that is unique for each row in the table. Doing this is pointless, though, because it duplicates the purpose of your initial ID column.
My company had a similar challenge on a CMS system that used an order field to sort the articles on the front page of the site. The users wanted a "promote, demote" icon that they could click that would move an article up or down.
Again, not ideal, but the strategy we used was to build a promote function, and an accompanying demote function, that identified the current sort value via a query, added or subtracted one from the previous or next value respectively, then set the value of the promoted/demoted item. It was also vital to engineer the record insert to set the initial sort value of newly added records accurately, so inserts wouldn't create a duplicate value. This was also enforced at the DB level for safety's sake. The user was never allowed to key in the sort value directly, only to promote or demote via the icons. To be honest, it worked quite well for the users.
If you have to go this route... it's not impossible. But there is brain damage involved...
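A rough sketch of the promote step, using entirely made-up names (articles, id, sort_order, with lower sort_order meaning higher on the page); it swaps the clicked article's sort value with the row directly above it:

-- Promote article 42 by one position
START TRANSACTION;
SET @cur  := (SELECT sort_order FROM articles WHERE id = 42);
SET @prev := (SELECT MAX(sort_order) FROM articles WHERE sort_order < @cur);
UPDATE articles SET sort_order = @cur  WHERE sort_order = @prev;            -- move the neighbour down
UPDATE articles SET sort_order = @prev WHERE id = 42 AND @prev IS NOT NULL; -- move article 42 up
COMMIT;

The demote function mirrors this with MIN() and a greater-than comparison; and if sort_order is enforced as unique at the DB level, the swap needs a temporary placeholder value between the two updates.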
Let's say that I've got a table like this (id is auto-increment):
id | col1 | col2
1 | 'msg'| 'msg'
2 | 'lol'| 'lol2'
3 | 'xxx'| 'x'
Now, I want to delete row number 2, and I get something like this:
id | col1 | col2
1 | 'msg'| 'msg'
3 | 'xxx'| 'x'
The thing is, what I want to get is this:
id | col1 | col2
1 | 'msg'| 'msg'
2 | 'xxx'| 'x'
How can I do that in the EASIEST way (my knowledge of MySQL is very poor)?
You shouldn't do that.
Do not take an auto-incremented unique identifier as an ordinal number.
The word "unique" means that the identifier should be stuck to its row forever.
There is no connection between these numbers and row enumeration.
Imagine you want to select records in alphabetical order. Where would your precious numbers go?
A database is not an ordered list, as you probably think of it. It is not a flat file with rows stored in a predefined order; it has a totally different ideology. Rows in a database do not have any order. They are ordered only at select time, and only if explicitly requested with an ORDER BY clause.
Also, a database is supposed to do the searching for you. So you can see that with filtered rows or a different ordering, this auto-increment number will have absolutely nothing to do with the rows' actual positions.
If you want to enumerate the output, that's the presentation layer's job. Just add a counter on the PHP side.
And again: these numbers are supposed to identify a certain record. If you change this number, you'd never find your record again.
Take this very site for example. Stack Overflow identifies its questions with such a number:
stackoverflow.com/questions/3132439/mysql-auto-decrementing-value
So, imagine you saved this page address to a bookmark. Now Jeff comes along and renumbers the whole database. You press your bookmark and land on a different question. The whole site would become a terrible mess.
Remember: Renumbering unique identifiers is evil!
I think there is no way to do this directly. Maybe you can do an "update" operation, but you would have to do it for every record after the deleted record. It is a very bad solution for this.
Why use an auto-increment if you want to change it manually?
It is not good practice to change the value of an auto_increment column. However, if you are sure you want to, the following should help.
If you are only deleting a single record at a time, you could use a transaction:
START TRANSACTION;
DELETE FROM table1 WHERE id = 2;
UPDATE table1 SET id = id - 1 WHERE id > 2;
COMMIT;
However, if you delete multiple records, you will have to drop the column and re-add it. This is probably not guaranteed to number the rows in the same order as before.
ALTER TABLE table1 DROP id;
ALTER TABLE table1 ADD id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY;
Also, if you have data that relies on these IDs, you will need to make sure it is updated.
You can renumber the whole table like this:
SET #r := 0;
UPDATE mytable
SET id = (#r := #r + 1)
ORDER BY
id;
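If you renumber like this, you may also want to reset the auto-increment counter so the next insert continues from the new highest id; setting it to a value lower than the current maximum makes MySQL bump it to MAX(id) + 1:

ALTER TABLE mytable AUTO_INCREMENT = 1;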