I have a MySQL database that contains over 400,000 rows. For my web-based script, I have a pagination function. One of the steps to determine how many pages there should be is returning the number of rows in the table.
Let's pretend the table name is data.
I'm wondering what is the most efficient method to ONLY return the number of rows in the database.
I could obviously do something like:
$getRows = mysql_query("SELECT id FROM `data`") or die(mysql_error());
$rows = mysql_num_rows($getRows);
That way it only selects the id. But it still selects 400,000+ IDs' worth of data and holds it in memory (I think?), which seems less efficient than a method such as reading the table status. I'm just not 100% sure how to use the table status method.
Feedback & opinions would be awesome. Thanks guys!
Use COUNT:
SELECT count(id) FROM data
See this question for more info on getting counts. Make sure your id has an index in your table.
Now, to find the number of unique rows, you can do
SELECT count(distinct(id)) FROM data
Alternatively, if you want to find the highest ID number (assuming your IDs are auto-incrementing and unique), you can try SELECT max(id) FROM data. Note that this only matches the row count if the IDs start at 1 and have no gaps, for example from deleted rows.
I'd highly recommend this site to learn these basic functions:
http://sqlzoo.net/
400,000 rows is not a lot at all. Keep it simple and just do:
select count(*)
from `data`
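To connect the count back to the paging question, here is a minimal sketch of turning the row count into a page count. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs on its own; the `page_count` helper, the 95 sample rows and the page size of 10 are all assumptions made for illustration.

```python
import math
import sqlite3

def page_count(conn, per_page):
    """Return how many pages are needed to show every row of `data`."""
    # COUNT(*) returns a single row holding one number, so no ids
    # are transferred to the application.
    (total_rows,) = conn.execute("SELECT COUNT(*) FROM data").fetchone()
    return math.ceil(total_rows / per_page)

# In-memory stand-in for the real table (assumption: 95 rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO data (id) VALUES (?)", [(i,) for i in range(1, 96)])

pages = page_count(conn, per_page=10)
print(pages)  # 10
```

The same SELECT COUNT(*) statement should work unchanged in MySQL; only the connection code would differ.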
I'm new to sql & php and unsure about how to proceed in this situation:
I created a mysql database with two tables.
One is just a list of users with their data, each having a unique id.
The second one awards certain amounts of points to users, with relevant columns being the user id and the amount of awarded points. This table is supposed to get new entries regularly and there's no limit to how many times a single user can appear in it.
On my php page I now want to display a list of users sorted by their point total.
My first approach was creating a "points_total" column in the user table, intending to run some kind of query that would calculate and update the correct total for each user every time new entries are added to the other table. To retrieve the data I could then use a very simple query and even use sql's sort features.
However, while it's easy to update the total for a specific user with the sum where function, I don't see a way to do that for the whole user table. After all, plain sql doesn't offer the ability to iterate over each row of a table, or am I missing a different way?
I could probably do the update by going over the table in php, but then again, I'm not sure if that is even a good approach in the first place, because in a way storing the point data twice (the total in one table and then the point breakdown with some additional information in a different table) seems redundant.
A different option would be forgoing the extra column and instead calculating the sums every time the PHP page is accessed, then doing the sorting in PHP. However, I suppose this would be much slower than having the data ready in the database, which could be a problem if the tables have a lot of entries?
I'm a bit lost here so any advice would be appreciated.
To get the total points awarded, you could use a query similar to this:
SELECT
`users`.`user_name`,
`users`.`user_id`,
SUM(`points`.`points_award`) AS `points`,
COUNT(`points`.`points_award`) AS `numberOfAwards`
FROM `users`
JOIN `points`
ON `users`.`user_id` = `points`.`user_id`
GROUP BY `users`.`user_id`
ORDER BY `users`.`user_name` -- or whatever column you want, e.g. `points` DESC to sort by total
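As a rough sketch of the two approaches discussed in the question, the example below computes totals on demand and also refreshes a stored `points_total` column for every user with one UPDATE and a correlated subquery, so no row-by-row loop is needed. It runs against an in-memory SQLite database through Python's sqlite3 module as a stand-in for MySQL. The sample users, the LEFT JOIN and the COALESCE (so users without awards show 0) are assumptions added for illustration, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (user_id INTEGER PRIMARY KEY, user_name TEXT,
                         points_total INTEGER DEFAULT 0);
    CREATE TABLE points (user_id INTEGER, points_award INTEGER);
    INSERT INTO users (user_id, user_name) VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO points VALUES (1, 5), (1, 7), (2, 20);
""")

# Option A: compute the totals on demand and let the database sort them.
on_demand = conn.execute("""
    SELECT users.user_name, COALESCE(SUM(points.points_award), 0) AS total
    FROM users
    LEFT JOIN points ON points.user_id = users.user_id
    GROUP BY users.user_id
    ORDER BY total DESC
""").fetchall()

# Option B: refresh the stored total for every user in one statement.
conn.execute("""
    UPDATE users
    SET points_total = (
        SELECT COALESCE(SUM(points_award), 0)
        FROM points
        WHERE points.user_id = users.user_id
    )
""")
stored = conn.execute(
    "SELECT user_name, points_total FROM users ORDER BY points_total DESC"
).fetchall()

print(on_demand)  # [('bob', 20), ('alice', 12), ('carol', 0)]
print(stored)     # [('bob', 20), ('alice', 12), ('carol', 0)]
```

Option A avoids storing the total twice; Option B trades that redundancy for a cheaper read, and has to be rerun (or replaced by a per-user update) whenever new points are added.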
I have a table in MySQL that I'm accessing from PHP. For example, let's have a table named THINGS:
things.ID - int primary key
things.name - varchar
things.owner_ID - int for joining with another table
My select statement to get what I need might look like:
SELECT * FROM things WHERE owner_ID = 99;
Pretty straightforward. Now, I'd like users to be able to specify a completely arbitrary order for the items returned from this query. The list will be displayed, they can then click an "up" or "down" button next to a row and have it moved up or down the list, or possibly a drag-and-drop operation to move it to anywhere else. I'd like this order to be saved in the database (same or other table). The custom order would be unique for the set of rows for each owner_ID.
I've searched for ways to provide this ordering without luck. I've thought of a few ways to implement this, but help me fill in the final option:
1. Add an INT column and set its value to whatever I need to get rows returned in my order. This presents the problem of scanning row-by-row to find the insertion point, and possibly needing to update the sort column of the preceding/following rows.
2. Have a "next" and "previous" column, implementing a linked list. Once I find my place, I'll just have to update max 2 rows to insert the row. But this requires scanning for the location from row #1.
3. Some SQL/relational DB trick I'm unaware of...
I'm looking for an answer to #3 because it may be out there, who knows. Plus, I'd like to offload as much as I can on the database.
From what I've read you need a new table containing the ordering of each user, say it's called *user_orderings*.
This table should contain the user ID, the position of the thing and the ID of the thing. The pair (user_id, thing_id) should be the PK. This way you need to update this table every time the order changes, but you can get the things for a user in the order they want by joining it with the things table and using ORDER BY on the position column of user_orderings. It should work.
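A minimal sketch of that separate ordering table, assuming a `position` column and a hypothetical `swap_positions` helper for the "up"/"down" buttons. Python's sqlite3 module stands in for MySQL so the example is self-contained; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE things (ID INTEGER PRIMARY KEY, name TEXT, owner_ID INTEGER);
    CREATE TABLE user_orderings (
        user_id  INTEGER,
        thing_id INTEGER,
        position INTEGER,
        PRIMARY KEY (user_id, thing_id)
    );
    INSERT INTO things VALUES (1, 'apple', 99), (2, 'banana', 99), (3, 'cherry', 99);
    INSERT INTO user_orderings VALUES (99, 3, 1), (99, 1, 2), (99, 2, 3);
""")

def ordered_names(conn, user_id):
    """Return the user's things in their saved custom order."""
    rows = conn.execute("""
        SELECT things.name
        FROM things
        JOIN user_orderings ON user_orderings.thing_id = things.ID
        WHERE user_orderings.user_id = ?
        ORDER BY user_orderings.position
    """, (user_id,))
    return [name for (name,) in rows]

def swap_positions(conn, user_id, thing_a, thing_b):
    """Exchange the positions of two things; only two rows are updated."""
    pos = dict(conn.execute(
        "SELECT thing_id, position FROM user_orderings "
        "WHERE user_id = ? AND thing_id IN (?, ?)",
        (user_id, thing_a, thing_b)))
    update = "UPDATE user_orderings SET position = ? WHERE user_id = ? AND thing_id = ?"
    conn.execute(update, (pos[thing_b], user_id, thing_a))
    conn.execute(update, (pos[thing_a], user_id, thing_b))

before = ordered_names(conn, 99)
swap_positions(conn, 99, 1, 2)   # move 'banana' above 'apple'
after = ordered_names(conn, 99)
print(before)  # ['cherry', 'apple', 'banana']
print(after)   # ['cherry', 'banana', 'apple']
```

Swapping neighbours touches two rows. A drag-and-drop move to an arbitrary slot would still need to shift the positions in between, which this sketch does not cover.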
The simplest expression of an ordered list is: 3,1,2,4. We can store this as a string in the parent table; so if our table is photos with the foreign key profile_id, we'd place our photo order in profiles.photo_order. We can then consider this field in our order by clause by utilizing the find_in_set() function. This requires either two queries or a join. I use two queries but the join is more interesting, so here it is:
select photos.photo_id, photos.caption
from photos
join profiles on profiles.profile_id = photos.profile_id
where photos.profile_id = 1
order by find_in_set(photos.photo_id, profiles.photo_order);
Note that you would probably not want to use find_in_set() in a where clause due to performance implications, but in an order by clause, there are few enough results to make this fast.
I am indexing all the columns that I use in my Where / Order by, is there anything else I can do to speed the queries up?
The queries are very simple, like:
SELECT COUNT(*)
FROM TABLE
WHERE user = id
AND other_column = 'something'
I am using PHP 5, MySQL client version: 4.1.22 and my tables are MyISAM.
Talk to your DBA. Run your local equivalent of showplan. For a query like your sample, I would suspect that a covering index on the columns id and other_column would greatly speed up performance. (I assume user is a variable or niladic function).
A good general rule is that the columns in the index should go from left to right in descending order of variance. That is, the column with the most distinct values should be the first column in the index, and the column with the fewest distinct values should be the last. It seems counterintuitive, but there you go. The query optimizer likes narrowing things down as fast as possible.
If all your queries include a user id then you can start with the assumption that userid should be included in each of your indexes, probably as the first field. (Can we assume that the user id is highly selective? i.e. that any single user doesn't have more than several thousand records?)
So your indexes might be:
user + otherfield1
user + otherfield2
etc.
If your user id is really selective, like several dozen records, then just the index on that field should be pretty effective (sub-second return).
What's nice about a "user + otherfield" index is that MySQL doesn't even need to look at the data records. The index has a pointer for each record, and it can just count the pointers.
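The "user + otherfield" idea can be tried out with a small sketch like the following. It uses Python's sqlite3 module rather than MySQL, so the plan output is SQLite's and not what MySQL's EXPLAIN would print; the `records` table, the `user_id` column name, the `idx_user_other` index and the generated rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "other_column TEXT, payload TEXT)"
)
conn.executemany(
    "INSERT INTO records (user_id, other_column, payload) VALUES (?, ?, ?)",
    [(i % 50, "something" if i % 3 == 0 else "else", "x" * 20) for i in range(3000)],
)

# Composite index on both columns used in the WHERE clause.
conn.execute("CREATE INDEX idx_user_other ON records (user_id, other_column)")

query = "SELECT COUNT(*) FROM records WHERE user_id = ? AND other_column = ?"
(matches,) = conn.execute(query, (7, "something")).fetchone()

# Ask the engine how it intends to run the query; the last field of each
# plan row is a human-readable description.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (7, "something"))]

print(matches)  # 20
for step in plan:
    print(step)
```

In MySQL the equivalent check would be to prefix the query with EXPLAIN and look at which key is chosen.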
Please note I am a beginner to this.
I have two questions:
1) How can I order the results of a query randomly?
example query:
$get_questions = mysql_query("SELECT * FROM item_bank_tb WHERE item_type=1 OR item_type=3 OR item_type=4");
2) What is the best method to select random rows from a table? Let's say I want to grab 10 rows at random from a table.
Many thanks,
If you don't mind sacrificing complexity on the insert/update/delete operations for speed on the select, you can always add a sequence number and make sure it's maintained on insert/update/delete, then whenever you do a select, simply select on one or more random numbers from within this range. If the "sequence" column is indexed, I think that's about as fast as you'll get.
An alternative is "shuffling". Add a sequence column, insert random values into this column, and whenever you select records, order by the sequence column and update the selected record sequences to new random values. The update should only affect the records you've retrieved, so shouldn't be too costly ... but it may be worth running some tests against your dataset.
This may be a fairly evil thing to say, but I'll say it anyway ... is there ever a need to display 'random' data? if you're trying to display random records, you may be doing something wrong.
Think about Amazon ... do they display random products, or do they display popular ones, and 'things other people bought when they looked at this'. Does SO give you a list of random questions to the right of here, or a list of related ones? Just some food for thought.
SELECT * FROM item_bank_tb WHERE item_type in(1,3,4) order by rand() limit 10
Beware that ORDER BY RAND() is very slow on large record sets.
EDIT. Take a look at this very interesting article that presents a different approach.
http://explainextended.com/2009/03/01/selecting-random-rows/
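One possible alternative to sorting every row randomly is to fetch only the matching ids, pick some at random in the application, and then load just those rows. The sketch below is not the approach from the linked article; it is a simple illustration under the assumption that the list of matching ids fits comfortably in memory. Python's sqlite3 module stands in for MySQL, and the `random_items` helper and sample data are hypothetical.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item_bank_tb (id INTEGER PRIMARY KEY, item_type INTEGER)")
conn.executemany(
    "INSERT INTO item_bank_tb (item_type) VALUES (?)",
    [(i % 5,) for i in range(200)],
)

def random_items(conn, how_many):
    """Return up to `how_many` random rows with item_type 1, 3 or 4."""
    # Step 1: fetch only the ids of the eligible rows.
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM item_bank_tb WHERE item_type IN (1, 3, 4)")]
    # Step 2: choose ids at random in the application, without repeats.
    chosen = random.sample(ids, min(how_many, len(ids)))
    if not chosen:
        return []
    # Step 3: load the full rows for just the chosen ids.
    placeholders = ",".join("?" * len(chosen))
    rows = conn.execute(
        f"SELECT id, item_type FROM item_bank_tb WHERE id IN ({placeholders})",
        chosen,
    ).fetchall()
    random.shuffle(rows)  # the IN query does not preserve the random order
    return rows

picked = random_items(conn, 10)
print(len(picked))  # 10
```

This still reads every matching id, so it mainly helps when the rows themselves are wide; whether it beats ORDER BY RAND() on a given table is something to measure.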
I'm trying to find a way to check whether some IDs are already in the DB. If an ID is already in the DB, I'd naturally like to avoid processing the row it represents.
Right now I'm doing a single query per ID to check for it, but I think this is too expensive in time: if I'm checking 20 IDs, the script takes up to 30 seconds.
I know I can do a simple WHERE id=1 OR id=2 OR id=3, but I'd like to know, for a certain group of IDs, which ones are already in the database and which ones are not.
I don't know much about transactions, but maybe they could be useful here.
Any thoughts are highly appreciated!
Depends how you determine the "Group of IDs"
If you can do it with a query, you can likely use a join or exists clause.
for example
SELECT firstname
from people p
where not exists (select 1 from otherpeople op where op.firstname = p.firstname)
This will select all the people who are not in the otherpeople table.
If you just have a list of IDs, then use WHERE id NOT IN (1,3,4...)
30 seconds for 20 queries on a single value is a long time. Did you create an index on the ID field to speed things up?
Also, if you create a unique key on the ID field, you can just insert all IDs. The database will throw errors and not insert those IDs that already exist, but you can ignore those errors.
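To answer the "which of these IDs exist and which don't" part with a single round trip, one option is an IN query followed by a set difference in the application. This is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL; the `items` table and the `split_known_ids` helper are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (id) VALUES (?)", [(1,), (2,), (5,)])

def split_known_ids(conn, candidate_ids):
    """Return (ids already stored, ids not yet stored) using one query."""
    if not candidate_ids:
        return [], []
    placeholders = ",".join("?" * len(candidate_ids))
    found = {row[0] for row in conn.execute(
        f"SELECT id FROM items WHERE id IN ({placeholders})",
        list(candidate_ids),
    )}
    # Anything the query did not return is missing from the table.
    missing = [i for i in candidate_ids if i not in found]
    return sorted(found), missing

existing, new_ids = split_known_ids(conn, [1, 2, 3, 4, 5])
print(existing, new_ids)  # [1, 2, 5] [3, 4]
```

With the primary key (or another index) on the id column, one query for 20 IDs should be far cheaper than 20 separate ones.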