I'm trying to find a way to check whether some IDs are already in the DB. If an ID is already there, I'd naturally want to skip processing the row it represents.
Right now I'm running a separate query for each ID, but I think this costs too much time: when I'm checking 20 IDs the script takes up to 30 seconds.
I know I can do a simple WHERE id=1 OR id=2 OR id=3, but given a certain group of IDs I'd like to know which ones are already in the database and which ones are not.
I don't know much about transactions, but maybe they could be useful here.
Any thoughts are highly appreciated!
It depends on how you determine the "group of IDs".
If you can do it with a query, you can likely use a join or an EXISTS clause.
For example:
SELECT firstname
from people p
where not exists (select 1 from otherpeople op where op.firstname = p.firstname)
This will select all the people who are not in the otherpeople table.
If you just have a list of IDs, then use WHERE id NOT IN (1,3,4...)
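For the "which of my IDs already exist" case, a minimal sketch of the single-query idea, assuming Python's built-in sqlite3 as a stand-in for MySQL and a hypothetical table called items. One IN query returns the IDs that exist, and a set difference gives the ones that don't:

```python
import sqlite3

def split_known_ids(conn, candidate_ids):
    """Return (present, missing) for candidate_ids using a single query."""
    if not candidate_ids:
        return set(), set()
    # One "?" placeholder per ID keeps the query parameterized.
    placeholders = ",".join("?" for _ in candidate_ids)
    rows = conn.execute(
        f"SELECT id FROM items WHERE id IN ({placeholders})",
        list(candidate_ids),
    ).fetchall()
    present = {row[0] for row in rows}
    return present, set(candidate_ids) - present

# Hypothetical table and data, just for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (id) VALUES (?)", [(1,), (3,), (4,)])

present, missing = split_known_ids(conn, [1, 2, 3, 5])
print(sorted(present), sorted(missing))
```

Only the IDs in missing would then need to be processed.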
30 seconds for 20 queries on a single value is a long time. Did you create an index on the ID field to speed things up?
Also, if you create a unique key on the ID field you can just insert all the IDs. The database will throw errors and not insert those IDs that already exist, but you can ignore those errors.
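A rough sketch of the unique-key approach, again assuming sqlite3 as a stand-in and a hypothetical items table. SQLite spells "skip rows that violate the key" as INSERT OR IGNORE; MySQL has INSERT IGNORE for the same purpose:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The primary key acts as the unique key on id (hypothetical schema).
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, "old"), (3, "old")])

incoming = [(1, "new"), (2, "new"), (3, "new"), (5, "new")]
# Rows whose id already exists are silently skipped instead of raising.
conn.executemany(
    "INSERT OR IGNORE INTO items (id, payload) VALUES (?, ?)", incoming
)

inserted_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM items WHERE payload = 'new' ORDER BY id"
    )
]
print(inserted_ids)
```

The existing rows keep their old payload, so this only fits when skipping duplicates is really what you want.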
I'm not very experienced with the more advanced MySQL query stuff (mostly basic queries, then return and parse the response, etc.).
However, I'm not clear on the correct approach when I need multiple things (responses) from the database. Is there a way to get them all from a single query, or do I need to run a new query each time?
Background:
I use PDO to do a SELECT statement
ie:
$getAllVideos_sql = "SELECT * FROM $tableName WHERE active IS NOT NULL OR active != 'no' ORDER BY topic, speaker_last, title;";
$getAllVideos_stmt = $conn->prepare($getAllVideos_sql);
$getAllVideos_stmt->execute();
$getAllVideos_stmt->setFetchMode(PDO::FETCH_ASSOC);
$results = $getAllVideos_stmt->fetch(PDO::FETCH_ASSOC);
//parse as I see fit
This gives me my 'chunk of data' that I can pick apart and display as I want.
However, I also want to be able to give some stats (totals):
the total number of (distinct) 'topics', as well as the total count of 'titles' (which should all be unique by default).
Do I need to do another query, prepare, execute, setFetchMode, fetch all over again?
Is this the proper way to do this? Or is there a way to crib off the initial commands that are already in play?
To be clear, I'm not really looking for a query. I'm looking to understand the proper way to do this when you need several pieces of data like I do. Multiple queries and executions, etc.?
Or maybe it can, and should, be done in one snippet, with an adjustment to the query itself so that it returns the sub-select/sub-query info?
This isn't the correct syntax, because it only returns 1 record (although the total topic count seems to be correct, even though I only get 1 record returned):
SELECT *, COUNT(DISTINCT topic) AS totalTopics, COUNT(DISTINCT title) AS totalTitles FROM $tableName;
Maybe this is the more proper approach: try to include these totals/details in the main query and pick them out?
Hope this makes sense.
Thanks
I don't think you're going to get anything very clean that'll do this, however something like this might work:
SELECT * from $Table t
INNER JOIN (
SELECT COUNT(DISTINCT Topic) as TotalTopics FROM $Table
) s ON 1 = 1
INNER JOIN (
SELECT COUNT(DISTINCT Title) as TotalTitles FROM $Table
) f ON 1 = 1
WHERE ( Active IS NOT NULL ) AND Active != 'no'
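To see what that join returns, here is a small sketch assuming sqlite3 as a stand-in for MySQL and a made-up videos table. CROSS JOIN is used in place of INNER JOIN ... ON 1 = 1, which is the same thing. Every row carries the two totals as extra columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for $tableName.
conn.execute("CREATE TABLE videos (title TEXT, topic TEXT, active TEXT)")
conn.executemany(
    "INSERT INTO videos VALUES (?, ?, ?)",
    [
        ("Intro", "sql", "yes"),
        ("Joins", "sql", "yes"),
        ("Loops", "php", "no"),
        ("Arrays", "php", "yes"),
    ],
)

rows = conn.execute(
    """
    SELECT v.title, v.topic, s.total_topics, f.total_titles
    FROM videos v
    CROSS JOIN (SELECT COUNT(DISTINCT topic) AS total_topics FROM videos) s
    CROSS JOIN (SELECT COUNT(DISTINCT title) AS total_titles FROM videos) f
    WHERE v.active IS NOT NULL AND v.active != 'no'
    ORDER BY v.topic, v.title
    """
).fetchall()

for row in rows:
    print(row)
```

Note that the totals are computed over the whole table, not just the active rows, because the subqueries have no WHERE of their own.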
Especially with web applications, many people regularly do counts or other aggregations somewhere along the way. If the context is global, such as all topics for all users, keeping some stored aggregates helps, rather than re-querying all the record counts every time.
For example, if you have a table with a list of "topics", add a column to it for uniqueTitleCount. Then, using a trigger, whenever a new title is added to a topic the count is automatically updated by adding 1. You can pre-populate this column with a correlated update on the "topics" table; once the trigger is set, you can just read that column.
This also helps with something else I see many times: people wanting "the most recent". If your system has auto-increment IDs in its tables, you can similarly store the most recent ID created for a given topic, or even the most recent for a given title/document/thread, so you don't have to keep doing something like:
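A minimal sketch of the stored-count idea, assuming sqlite3 in place of MySQL and hypothetical topics and titles tables (title_count plays the role of uniqueTitleCount). It pre-populates the column with a correlated update and then lets an insert trigger keep it current:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE topics (
        id INTEGER PRIMARY KEY,
        name TEXT,
        title_count INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE titles (
        id INTEGER PRIMARY KEY,
        topic_id INTEGER NOT NULL REFERENCES topics(id),
        title TEXT
    );
    INSERT INTO topics (id, name) VALUES (1, 'sql'), (2, 'php');
    INSERT INTO titles (topic_id, title) VALUES (1, 'Intro'), (1, 'Joins');

    -- Pre-populate the stored count from the rows that already exist.
    UPDATE topics
    SET title_count = (
        SELECT COUNT(*) FROM titles WHERE titles.topic_id = topics.id
    );

    -- From now on every new title bumps its topic's counter by 1.
    CREATE TRIGGER titles_after_insert AFTER INSERT ON titles
    BEGIN
        UPDATE topics SET title_count = title_count + 1
        WHERE id = NEW.topic_id;
    END;
    """
)

conn.execute("INSERT INTO titles (topic_id, title) VALUES (1, 'Indexes')")
conn.execute("INSERT INTO titles (topic_id, title) VALUES (2, 'Arrays')")

counts = dict(conn.execute("SELECT name, title_count FROM topics"))
print(counts)
```

A real setup would also want a matching delete trigger (and possibly an update trigger) so the counter does not drift.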
select documentID, other_stuff
from sometable
where documentID in ( select max( documentID )
from sometable
where title = 'something' )
Use these where they make sense and your optimization pull-downs get easier to handle. You could even keep a counter per document "title" and a most recent posting date, so they can quickly be sorted by interest, frequency of activity, whatever.
I have a MySQL database that contains over 400,000 rows. For my web based script, I have a page function. One of the steps to determine how many pages there should be is returning the number of rows in the table.
Let's pretend the table name is data.
I'm wondering what is the most efficient method to ONLY return the number of rows in the database.
I could obviously do something like:
$getRows = mysql_query("SELECT id FROM `data`") or die(mysql_error());
$rows = mysql_num_rows($getRows);
So that it only selects the id. But that will still be selecting 400,000+ IDs' worth of data and storing it on the stack (I think?), which seems less efficient than using a method such as finding the table status. I'm just not 100% sure how to use the table status method.
Feedback & opinions would be awesome. Thanks guys!
Use count:
SELECT count(id) FROM data
See this question for more info on getting counts. Make sure your id has an index in your table.
Now, to find the number of unique rows, you can do
SELECT count(distinct(id)) FROM data
Alternatively, if you want to find the highest ID number (assuming your IDs are auto-incremented and unique), you can try SELECT max(id) FROM data to return the highest ID number present.
I'd highly recommend this site to learn these basic functions:
http://sqlzoo.net/
400,000 rows is not a lot at all. Keep it simple and just do:
select count(*)
from `data`
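For the paging use case, a small sketch assuming sqlite3 as a stand-in and a hypothetical page size of 10. The database does the counting and only a single number comes back to the script:

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO data (body) VALUES (?)", [("row",)] * 45)

PER_PAGE = 10  # hypothetical page size

# COUNT(*) returns one row with one column, not 45 rows of ids.
(total_rows,) = conn.execute("SELECT COUNT(*) FROM data").fetchone()
total_pages = math.ceil(total_rows / PER_PAGE)
print(total_rows, total_pages)
```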
I have recently written a survey application that has done its job, and all the data is gathered. Now I have to analyze the data and I'm having some timing issues.
I have to find out how many people selected each option and display it all.
I'm using this query, which does do its job:
SELECT COUNT(*)
FROM survey
WHERE users = ? AND `table` = ? AND col = ? AND `row` = ? AND selected = ?
GROUP BY users, `table`, col, `row`, selected
As is evident from the "?", I'm using MySQLi (in PHP) to fetch the data when needed, but I fear this is what makes it so slow.
The table consists of all the elements above (plus a unique ID), and all of them are integers.
To explain some of the fields:
Each survey was divided into 3 or 4 tables (sized from 2x3 to 5x5) with a 1 to 10 happiness grade to select from. (The questions are on the right and top of the table, and you answer where the questions intersect.)
users - age groups
table, row, col - explained above
selected - dooooh explained above
Now, with the surveys complete and around 1 million entries in the table, the query is getting very slow. Sometimes it takes about 3 minutes; sometimes (I guess) the time limit expires and you get no data at all. I also don't have access to the full database, just my empty "testing" one, since the customer is kinda paranoid :S (and his server seems to be a bit slow).
Now (after the initial essay) my questions are: I left indexing out intentionally because, with a lot of data being written during the survey, it would have been a bad idea. But since no new data is coming in at this point, would it make sense to index all the fields of the table? How much sense does it make to index integers that never go above 10? (As you can guess, I haven't got a clue about indexes.) Do I need the primary unique ID in this table?
I read somewhere that indexing may help GROUP BY, but only if you group by the first columns in a table (and since my ID is first and, from my point of view, useless, can I remove it and gain anything by doing so?).
Is there another way to write my query that would basically do the same thing but in a shorter period of time?
Thanks for all your suggestions in advance!
Add an index on the columns that you GROUP BY or filter on in the WHERE. In your case that's ONE index incorporating users, table, col, row and selected.
Some quick rules:
Order the fields so that the WHERE columns come first and the GROUP BY columns last.
If you have other queries that only use part of it (e.g. users, table, col and selected), then put the missing column (row, in this example) last.
Don't use too many indexes, as each one slows updates to the table marginally, so on a really large system you need to balance queries against indexes.
Edit: do you need the GROUP BY users, table, col, row, given that these are already used in the WHERE? If the WHERE has already filtered on them, you only need to group by "selected".
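A rough sketch of the single composite index, assuming sqlite3 as a stand-in for MySQL. The columns tbl and rw are hypothetical renames of table and row, chosen only to avoid reserved words, and the data is made up. EXPLAIN QUERY PLAN is SQLite's way of showing that the count is answered from the index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE survey (id INTEGER PRIMARY KEY, users INTEGER, "
    "tbl INTEGER, col INTEGER, rw INTEGER, selected INTEGER)"
)
# ONE index covering every column the query filters on.
conn.execute(
    "CREATE INDEX idx_survey_lookup "
    "ON survey (users, tbl, col, rw, selected)"
)

# Made-up answers: one row per (users, tbl, col, rw) combination.
answers = [
    (u, t, c, r, (u + t + c + r) % 10 + 1)
    for u in range(1, 4)
    for t in range(1, 4)
    for c in range(1, 4)
    for r in range(1, 4)
]
conn.executemany(
    "INSERT INTO survey (users, tbl, col, rw, selected) VALUES (?, ?, ?, ?, ?)",
    answers,
)

query = (
    "SELECT COUNT(*) FROM survey "
    "WHERE users = ? AND tbl = ? AND col = ? AND rw = ? AND selected = ?"
)
params = (1, 1, 1, 1, 5)
(count,) = conn.execute(query, params).fetchone()

# The last column of each plan row describes how the table is accessed.
plan = " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, params))
print(count, plan)
```

In MySQL the equivalent check would be running EXPLAIN on the query and looking at which key it picks.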
I have a table in MySQL that I'm accessing from PHP. For example, let's have a table named THINGS:
things.ID - int primary key
things.name - varchar
things.owner_ID - int for joining with another table
My select statement to get what I need might look like:
SELECT * FROM things WHERE owner_ID = 99;
Pretty straightforward. Now, I'd like users to be able to specify a completely arbitrary order for the items returned from this query. The list will be displayed, they can then click an "up" or "down" button next to a row and have it moved up or down the list, or possibly a drag-and-drop operation to move it to anywhere else. I'd like this order to be saved in the database (same or other table). The custom order would be unique for the set of rows for each owner_ID.
I've searched for ways to provide this ordering without luck. I've thought of a few ways to implement this, but help me fill in the final option:
1. Add an INT column and set its value to whatever I need to get rows returned in my order. This presents the problem of scanning row-by-row to find the insertion point, and possibly needing to update the sort column of the preceding/following rows.
2. Have a "next" and a "previous" column, implementing a linked list. Once I find my place, I'll only have to update at most 2 rows to insert the row. But this requires scanning for the location from row #1.
3. Some SQL/relational DB trick I'm unaware of...
I'm looking for an answer to #3 because it may be out there, who knows. Plus, I'd like to offload as much as I can on the database.
From what I've read, you need a new table containing each user's ordering; say it's called user_orderings.
This table should contain the user ID, the position of the thing and the ID of the thing. The pair (user_id, thing_id) should be the PK. This way you need to update this table every time the order changes, but you can get the things for a user in the order they want by joining user_orderings with the things table and using ORDER BY on the position. It should work.
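A minimal sketch of that ordering table, assuming sqlite3 as a stand-in and hypothetical column names (position, thing_id). The "up"/"down" buttons then become a swap of two position values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE things (id INTEGER PRIMARY KEY, name TEXT, owner_id INTEGER);
    CREATE TABLE user_orderings (
        user_id INTEGER,
        thing_id INTEGER,
        position INTEGER NOT NULL,
        PRIMARY KEY (user_id, thing_id)
    );
    INSERT INTO things VALUES (1, 'apple', 99), (2, 'banana', 99), (3, 'cherry', 99);
    INSERT INTO user_orderings VALUES (99, 3, 1), (99, 1, 2), (99, 2, 3);
    """
)

def ordered_names(conn, user_id):
    """Return the user's things in their saved custom order."""
    rows = conn.execute(
        "SELECT t.name FROM user_orderings o "
        "JOIN things t ON t.id = o.thing_id "
        "WHERE o.user_id = ? ORDER BY o.position",
        (user_id,),
    ).fetchall()
    return [row[0] for row in rows]

def swap_positions(conn, user_id, thing_a, thing_b):
    """Exchange the positions of two things (an 'up' or 'down' click)."""
    pos = dict(
        conn.execute(
            "SELECT thing_id, position FROM user_orderings "
            "WHERE user_id = ? AND thing_id IN (?, ?)",
            (user_id, thing_a, thing_b),
        )
    )
    update = (
        "UPDATE user_orderings SET position = ? "
        "WHERE user_id = ? AND thing_id = ?"
    )
    with conn:  # both updates commit together
        conn.execute(update, (pos[thing_b], user_id, thing_a))
        conn.execute(update, (pos[thing_a], user_id, thing_b))

before = ordered_names(conn, 99)
swap_positions(conn, 99, 3, 1)
after = ordered_names(conn, 99)
print(before, after)
```

A drag-and-drop move to an arbitrary slot would need to renumber the rows in between, which this sketch does not cover.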
The simplest expression of an ordered list is: 3,1,2,4. We can store this as a string in the parent table; so if our table is photos with the foreign key profile_id, we'd place our photo order in profiles.photo_order. We can then consider this field in our order by clause by utilizing the find_in_set() function. This requires either two queries or a join. I use two queries but the join is more interesting, so here it is:
select photos.photo_id, photos.caption
from photos
join profiles on profiles.profile_id = photos.profile_id
where photos.profile_id = 1
order by find_in_set(photos.photo_id, profiles.photo_order);
Note that you would probably not want to use find_in_set() in a where clause due to performance implications, but in an order by clause, there are few enough results to make this fast.
I've been trying to create some stats for my table, but it has over 3 million rows so it is really slow.
I'm trying to find the most popular values for the column name, and also show how many times each one occurs.
I'm using this at the moment, but it doesn't work because it's too slow and I just get errors.
$total = mysql_query("SELECT `name`, COUNT(*) as b FROM `people` GROUP BY `name` ORDER BY `b` DESC LIMIT 0,5;")or die(mysql_error());
As you may see, I'm trying to get all the names and how many times each name has been used, but only show the top 5 to hopefully speed it up.
I would then like to be able to get the values like this:
while($row = mysql_fetch_array($total)){
echo $row['name'].': '.$row['b']."\r\n";
}
And it will show things like this;
Bob: 215
Steve: 120
Sophie: 118
RandomGuy: 50
RandomGirl: 50
I don't care much about the ordering of names with equal counts, like RandomGirl and RandomGuy being the wrong way round.
I think I have provided enough information. :) I would like the names to be case-insensitive if possible though. Bob should be the same as BoB, bOb, BOB and so on.
Thank-you for your time
Paul
Limiting the results to the top 5 won't give you much of a speed-up. You'll gain time in the result retrieval, but on the MySQL side the whole table still needs to be scanned (to count).
You will speed up your count query by having an index on the name column, of course, as only the index will be scanned and not the table.
Now, if you really want to speed up the result and avoid scanning the name index each time you need it (which will still be quite slow if you really have millions of rows), then the only other solution is computing the stats when inserting, deleting or updating rows in this table. That means using triggers on this table to maintain a statistics table next to it. Then you will only have a simple select query on the statistics table, with only 5 rows read. But you will slow down your insert, delete and update operations (which are already quite slow, especially if you maintain indexes), so if the stats are important you should study this solution.
Do you have an index on name? It might help.
Since you are doing the counting/grouping and then sorting, an index on name doesn't help at all: MySQL has to go through all the rows every time, and there is no way to optimize this. You need to have a separate stats table like this:
CREATE TABLE name_stats( name VARCHAR(n), cnt INT, UNIQUE( name ), INDEX( cnt ) )
and you should update this table whenever you add a new row to 'people' table like this:
INSERT INTO name_stats VALUES( 'Bob', 1 ) ON DUPLICATE KEY UPDATE cnt = cnt + 1;
Querying this table for the list of top names should give you the results instantaneously.
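A small sketch of that stats table, assuming sqlite3 as a stand-in. SQLite (3.24 or newer) spells the upsert as ON CONFLICT ... DO UPDATE rather than MySQL's ON DUPLICATE KEY UPDATE, and COLLATE NOCASE is used here as one way to get the case-insensitive matching asked for in the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOCASE makes 'Bob', 'bOb' and 'BOB' collide on the unique key.
conn.execute(
    "CREATE TABLE name_stats ("
    "name TEXT COLLATE NOCASE UNIQUE, cnt INTEGER NOT NULL)"
)
conn.execute("CREATE INDEX idx_name_stats_cnt ON name_stats (cnt)")

def record_name(conn, name):
    """Call this whenever a row is added to the people table."""
    conn.execute(
        "INSERT INTO name_stats (name, cnt) VALUES (?, 1) "
        "ON CONFLICT(name) DO UPDATE SET cnt = cnt + 1",
        (name,),
    )

for name in ["Bob", "bOb", "Steve", "BOB", "Sophie", "steve"]:
    record_name(conn, name)

top = conn.execute(
    "SELECT name, cnt FROM name_stats ORDER BY cnt DESC, name LIMIT 5"
).fetchall()
print(top)
```

The stored spelling is whichever variant was inserted first; deletes from people would need a matching decrement.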