I have created a Facebook-like page that pulls notifications from several different tables, let's say about 8 tables. Each table has a different structure with different columns, so the first thing that came to mind was a global table, like a table of contents, refreshed with every new hit. I know inserts are resource intensive, but I was hoping that since it is a mostly static table, I'd only add maybe one new record every 100 visitors, so I thought "MAYBE" I could get away with this. I was wrong: I managed to get deadlocks from just three people hammering the website.
So anyway, now I have to redo it using a different method. Initially I was going to use views, but I have an issue with views: the selected result has to contain the ID of a user. Here is an example of a SELECT statement from PHP:
$get_events = "
SELECT id, " . $userId . ", 'admin_events', 0, event_start_time
FROM admin_events
WHERE CURDATE() < event_start_time AND
NOT EXISTS(SELECT id
FROM admin_event_registrations
WHERE user_id = " . $userId . " AND admin_events.id = event_id) AND
NOT EXISTS(SELECT id
FROM admin_event_declines
WHERE user_id = " . $userId . " AND admin_events.id = event_id) AND
event_capacity > (SELECT COUNT(*) FROM admin_event_registrations WHERE event_id = admin_events.id)
LIMIT 1
Sorry about the messiness. In any event, as you can see, I need to return the user ID from the page as a selected column from the table. I could not figure out how to do that with views, so I don't think views are the way I will be heading, because there are a lot more of these types of queries. I come from an MSSQL background and I love stored procedures, so if there are stored procedures for MySQL, that would be excellent.
Next I started thinking about temp tables. The table will be in memory, it will probably be 150 rows max, and there will be no deadlocks. Is it still very expensive to do inserts on a temp table? Will I end up crashing the server? Right now we have maybe 100 users per day, but I want to be future-proof for when we get more users.
After a long thought, I figured that the only remaining way is to use PHP and get all the results as an array. The problem is that I'd get something like:
$my_array[0]["date_created"] = <current_date>
The problem with the above is that I have to sort by date_created, but this is a multidimensional array.
Anyway, to pull 150 to 200 records max from a database, which approach would you take: temp table, view, or PHP?
Some thoughts:
Temp Tables:
Temporary tables will only last as long as the session is alive. If you run the code in a PHP script, the temporary table will be destroyed automatically when the script finishes executing.
Views:
These are mainly for hiding complexity, in that you create a view with a join and then access it like a single table. The underlying code is a SELECT statement.
PHP Array:
A bit more cumbersome than SQL to get data from. However, PHP does have some functions to make life easier but no real query language.
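For instance, sorting the multidimensional array from the question by its date_created key is a one-liner with usort. A minimal sketch, assuming date_created holds MySQL DATETIME strings (which compare correctly as plain strings):

// Sort $my_array newest-first on the date_created key of each row.
usort($my_array, function ($a, $b) {
    return strcmp($b['date_created'], $a['date_created']);
});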
Stored Procedures:
There are stored procedures in MySQL - see: http://dev.mysql.com/doc/refman/5.0/en/stored-routines-syntax.html
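For illustration only, a minimal stored procedure sketch (the procedure name and body are made up, not taken from the question):

DELIMITER //
CREATE PROCEDURE get_upcoming_events(IN p_user_id INT)
BEGIN
    -- Hypothetical example: upcoming events the given user has not registered for.
    SELECT ae.id, ae.event_start_time
    FROM admin_events AS ae
    WHERE CURDATE() < ae.event_start_time
      AND NOT EXISTS (SELECT 1 FROM admin_event_registrations r
                      WHERE r.user_id = p_user_id AND r.event_id = ae.id);
END //
DELIMITER ;

-- Called as:
CALL get_upcoming_events(42);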
My Recommendation:
First, re-write your query using the MySQL Query Analyzer: http://www.mysql.com/products/enterprise/query.html
Now I would use PDO to put my values into an array using PHP. This still leaves the initial heavy lifting to the DB engine and keeps you from making multiple calls to the DB server.
Try this:
SELECT id, " . $userId . ", 'admin_events', 0, event_start_time
FROM admin_events AS ae
LEFT JOIN admin_event_registrations AS aer
ON ae.id = aer.event_id
LEFT JOIN admin_event_declines AS aed
ON ae.id = aed.event_id
WHERE aed.user_id = ". $userid ."
AND aer.user_id = ". $userid ."
AND aed.id IS NULL
AND aer.id IS NULL
AND CURDATE() < ae.event_start_time
AND ae.event_capacity > (
SELECT SUM(IF(aer2.event_id IS NOT NULL, 1, 0))
FROM admin_event_registrations aer2
JOIN admin_events AS ae2
ON aer2.event_id = ae2.id
WHERE aer2.user_id = ". $userid .")
LIMIT 1
It still has a correlated subquery for the capacity check, but you should find this noticeably faster than the NOT EXISTS version. Note that the per-user conditions sit in the ON clauses of the LEFT JOINs; together with the IS NULL checks they act as anti-joins ("no registration and no decline for this user"). The COUNT(*) in the capacity subquery returns 0 rather than NULL when an event has no registrations, so empty events are handled correctly. MySQL can join tables easily (they should all be of the same table type, though), and with the join statements this should reduce your overall query time significantly.
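If you go the PDO route, here is a rough sketch of fetching that result into a PHP array in one round trip. The connection values are placeholders, the capacity subquery is omitted for brevity, and two separate placeholders are used because not every PDO configuration allows repeating a named one:

// Hypothetical connection values; replace with your own.
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$sql = "SELECT ae.id, ae.event_start_time
        FROM admin_events AS ae
        LEFT JOIN admin_event_registrations AS aer
               ON ae.id = aer.event_id AND aer.user_id = :uid_reg
        LEFT JOIN admin_event_declines AS aed
               ON ae.id = aed.event_id AND aed.user_id = :uid_dec
        WHERE aer.id IS NULL
          AND aed.id IS NULL
          AND CURDATE() < ae.event_start_time
        LIMIT 1";

$stmt = $pdo->prepare($sql);
$stmt->execute(array('uid_reg' => $userId, 'uid_dec' => $userId));
$events = $stmt->fetchAll(PDO::FETCH_ASSOC);   // all rows land in a PHP array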
The problem is that you are using correlated subqueries. I imagine that your query takes a little while to run if it's not in the query cache? That's what would be causing your table to lock and causing contention.
Switching the table type to InnoDB would help, but your core problem is your query.
150 to 200 records is a very small amount. MySQL does support stored procedures, but this isn't something you would need them for. Inserts are not resource intensive on their own, but a lot of them at once, or in rapid sequence, can cause issues (use the bulk insert syntax for those).
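For reference, bulk insert syntax simply means packing several rows into one INSERT statement (the table and column names here are only illustrative):

INSERT INTO notifications (user_id, source_table, created_at)
VALUES (1, 'admin_events', NOW()),
       (2, 'admin_events', NOW()),
       (3, 'admin_events', NOW());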
Related
I have looked around a lot and tried different methods, and I wanted to improve my import mechanism for big data. Importing data on insert works great; however, I hit an issue when I want to update existing data based on two WHERE conditions.
I first load the data from the source and place it in a CSV file, then use LOAD DATA LOCAL INFILE to import the data into a temp table.
Then I insert from the temp table into the main table as follows, which works as expected: fast and using a low amount of server resources.
INSERT INTO $table ($fields) SELECT $fields FROM $temptable WHERE (ua,gm_id) NOT IN (SELECT ua,gm_id FROM $table)
I then have the following to update the records. The reason I created this method is that the insert-on-duplicate-key approach did not work: it always inserted a new record. I think I don't understand how that method works, or I have not used it in the right way. Both UA and GM_ID are indexes on both tables, but I can't get it to work. The issue with the script below is that if I update 8000 rows, it uses 200% CPU and takes over 5 to 8 minutes, which is of course not great.
$query = "UPDATE $table a INNER JOIN $temptable b ON a.gm_id=b.gm_id AND a.ua=b.ua SET ";
foreach($update_columns as $column => $status){
$query .= "a.$column=b.$column,";
}
$query = trim($query, ",");
$result = $pdo->query($query);
Can someone point me in the right direction as to what I should be using?
I want to update certain columns from the temp table to the main table. This code executes a lot of times during the day; sometimes it updates just 100 rows, but sometimes 8k or 60k rows, and the columns can change.
I hope the sample codes are clear.
Thanks in advance for assistance.
"Both UA and GM_ID are indexes on both tables" -- Two separate indexes is the wrong approach. You must have a "composite" UNIQUE(UA, GM_ID) (in either order). If that pair is not unique, then you cannot use IODKU.
WHERE .. NOT IN ( SELECT ... ) is very inefficient. WHERE ... NOT EXISTS ( SELECT ... ) is better; LEFT JOIN ... WHERE .. IS NULL is even better. See "SQL #1" in http://mysql.rjweb.org/doc.php/staging_table#normalization
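For example, your NOT IN insert rewritten as a LEFT JOIN (same made-up table names as above):

INSERT INTO main_table (ua, gm_id, price, status)
SELECT t.ua, t.gm_id, t.price, t.status
FROM temp_table AS t
LEFT JOIN main_table AS m
       ON m.ua = t.ua AND m.gm_id = t.gm_id
WHERE m.ua IS NULL;   -- keep only rows not already present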
Read the rest of that blog for more tips on high speed ingestion.
I am running queries on a table that has thousands of rows:
$sql="select * from call_history where extension_number = '0536*002' and flow = 'in' and DATE(initiated) = '".date("Y-m-d")."' ";
and it's taking forever to return results.
The SQL itself is
select *
from call_history
where extension_number = '0536*002'
and flow = 'in'
and DATE(initiated) = 'dateFromYourPHPcode'
Is there any way to make it run faster? Should I put the WHERE DATE(initiated) = '".date("Y-m-d")."' condition before the extension_number condition?
Or should I select all rows where DATE(initiated) = '".date("Y-m-d")."' and put that in a while loop, then run all my other queries (where extension_number = ...) within the while loop?
Here are some suggestions:
1) Replace SELECT * with only the fields you need.
2) Add indexes on the table fields used in your WHERE clause.
3) Avoid running queries in loops; this causes multiple round trips to the SQL server.
4) Fetch all the data at once.
5) Apply a LIMIT clause as and when required; don't select all the records.
6) Fire two different queries: one for counting the total number of records and one for fetching the records per page (e.g. 10, 20, 50, etc...)
7) If applicable, create Database Views and get data from them instead of tables.
Thanks
The order of clauses under WHERE is irrelevant to optimization.
Pro-tip, also suggested by somebody else: never use SELECT * in a query in a program unless you have a good reason to do so. "I don't feel like writing out the names of the columns I need" isn't a good reason. Always enumerate the columns you need. MySQL and other database systems can often optimize things in surprising ways when the list of columns you need is available.
Your query contains this selection criterion.
AND DATE(initiated) = 'dateFromYourPHPcode'
Notice that this search criterion takes the form
FUNCTION(column) = value
This form of search defeats the use of any index on that column. Your initiated column has a TIMESTAMP data type. Try this instead:
AND initiated >= 'dateFromYourPHPcode'
AND initiated < 'dateFromYourPHPcode' + INTERVAL 1 DAY
This will find all the initiated items in the particular day. And, because it doesn't use a function on the column value it can use an index range scan to do that, which performs well. It may, or may not, also help without an index. It's worth a try.
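As a sketch, the PHP side could compute the two day boundaries once and bind them. This assumes a PDO connection in $pdo; the variable names are made up:

$day  = date("Y-m-d");                          // start of today
$next = date("Y-m-d", strtotime("+1 day"));     // start of tomorrow

$sql = "SELECT extension_number, flow, initiated
        FROM call_history
        WHERE extension_number = ?
          AND flow = 'in'
          AND initiated >= ?
          AND initiated < ?";

$stmt = $pdo->prepare($sql);
$stmt->execute(array('0536*002', $day, $next));
$calls = $stmt->fetchAll(PDO::FETCH_ASSOC);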
I suspect your ideal index for this particular search would be created by
ALTER TABLE call_history
ADD INDEX flowExtInit (flow, extension_number, initiated)
You should ask the administrator of the database to add this index if your query needs good performance.
You should add an index to your table; that way MySQL will fetch the rows faster. I have not tested it, but the command should be something like this:
ALTER TABLE `call_history` ADD INDEX `callhistory` (`extension_number`,`flow`,`initiated`);
(Note that you cannot index an expression such as DATE(initiated) here; index the initiated column itself and use the range comparison shown above.)
I am running a select * from table order by date desc query using PHP on a MySQL DB server, where the table has a lot of records, which slows down the response time.
So, is there any way to speed up the response? If indexing is the answer, which columns should I index?
An index speeds up searching when you have a WHERE clause or do a JOIN with fields you have indexed. In your case you don't do that: You select all entries in the table. So using an index won't help you.
Are you sure you need all of the data in that table? When you later filter, search or aggregate this data in PHP, you should look into ways to do that in SQL so that the database sends less data to PHP.
You need to use a caching system.
The best one I know is Memcache. It's really great for speeding up your application, and it doesn't touch the database at all.
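A rough sketch of that idea with the PHP Memcached extension (the server address, cache key, table name, TTL and the $pdo connection are all assumptions, not from the question):

$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$key  = 'recent_rows';             // hypothetical cache key
$rows = $mc->get($key);

if ($rows === false) {
    // Cache miss: query the database once, then keep the result for 5 minutes.
    $stmt = $pdo->query("SELECT id, date FROM your_table ORDER BY date DESC LIMIT 100");
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
    $mc->set($key, $rows, 300);
}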
Simple answer: you can't speed anything up using software.
Reason: you're selecting entire contents of a table and you said it's a large table.
What you could do is cache the data, but not using Memcache, because it has a limit on how much data it can cache (1 MB per key), so if your data exceeds that, good luck using Memcache to cache a huge result set without coming up with an efficient scheme for maintaining keys and values.
Indexing won't help because you haven't got a WHERE clause; what it could do is speed up the ORDER BY clause slightly. Prefix your query with EXPLAIN to see how it will be executed (for instance, whether the sort uses an index or a filesort), and profile it to see how much time is spent retrieving and sorting the data versus transmitting it over the network.
If your application requires a lot of data in order for it to work, then you have these options:
Get a better server that can push the data faster
Redesign your application because if it requires so much data in order to run, it might not be designed with efficiency in mind
Optimizing queries is a big topic and beyond the scope of this question.
Here are some highlights that will speed up your SELECT statement:
Use a proper index
Limit the number of records
Use only the columns that you require (instead of writing select * from table, use select col1, col2 from table)
Limiting a query with a large offset is a little tricky in MySQL.
This SELECT statement with a large offset will be slow because it has to process a large set of rows:
SELECT * FROM table order by whatever LIMIT m, n;
To optimize this query, here is a simple solution:
select A.* from table A
inner join (select id from table order by whatever limit m, n) B
on A.id = B.id
order by A.whatever
I'm making a micro-blogging website. Users can follow each other, and I have to build a stream of posts (an activity stream) for the current user ($userid) based on the users they are following, like on Twitter. I know two ways of implementing this. Which one is better?
Tables:
Table: posts
Columns: PostID, AuthorID, TimeStamp, Content
Table: follow
Columns: poster, follower
The first way, by joining these two tables:
select `posts`.* from `posts`,`follow` where `follow`.`follower`='$userid' and
`posts`.`AuthorID`=`follow`.`poster` order by `posts`.`postid` desc
The second way is by making an array of the users $userid is following (the posters), then doing a PHP implode on this array, and then using WHERE IN:
One thing I'd like to mention here is that I'm storing the number of users a user is following in the `following` column of the `users` table, so here I use this number as a LIMIT when extracting the list of posters - the 'followingList':
function followingList($userid){
    $listArray = array();

    $limit = "select `following` from `users` where `userid`='$userid' limit 1";
    $limit = mysql_query($limit);
    $limit = mysql_fetch_row($limit);
    $limit = (int) $limit[0];

    $sql = "select `poster` from `follow` where `follower`='$userid' limit $limit";
    $result = mysql_query($sql);
    while($data = mysql_fetch_row($result)){
        $listArray[] = $data[0];
    }

    $posters = implode("','", $listArray);
    return $posters;
}
Now I have a comma-separated list of the user IDs the current $userid is following. And now, selecting the posts to build the activity stream:
$posters=followingList($userid);
$sql = "select * from `posts` where (`AuthorID` in ('$posters'))
order by `postid` desc";
Which of the two methods is better?
And can knowing the total number of following (the number of users the current user is following) make things faster in the first method, as it does in the second method?
Any other better method?
You should go all the way with the first option. Always try, as much as possible, to process the data on the MySQL server instead of in your PHP code. PHP will not implicitly cache the results of the operations, while MySQL will.
The most important thing is to make sure you index your data correctly. Try using "EXPLAIN" statements to make sure you have optimized your database as much as possible and use #1 to link your data together.
http://dev.mysql.com/doc/refman/5.0/en/explain.html
This will also allow you to compute statistics later, while the second method requires you to do part of that processing yourself.
The first important point is that PHP is good at building pages but very bad at managing data: everything manipulated by PHP fills memory, and there is no special behaviour PHP can apply to avoid using too much memory, except crashing.
On the other side, the database's job is to analyse the relations between the tables and the real numbers involved in the query (cardinality of indexes and statistics on rows and index usage, in fact), and a lot of different mechanisms can be chosen by the engine depending on the size of the data (merge joins, temporary tables, etc.). That means you could have 256,278,242 posts and 145,268 users with 5,684 followers on average, and the database's job would be to find the fastest way to give you an answer. Well, when you hit really big numbers you'll see that all databases are not equal, but that's another problem.
On the PHP side, retrieving the list of users from the first query could become very slow with a big number of followed users, let's say 15,000. Simply building the query string with 15,000 identifiers inside would take quite a big amount of memory, and transferring this new query to the SQL server would also be slow. It's definitely the wrong way.
Now be careful of the way you build your SQL request. A request is something you should be able to read from top to bottom, explaining what you really want. This will help the (good) SQL engine choose the right solution.
select `posts`.*
from `posts`
INNER JOIN `follow` ON `posts`.`AuthorID` = `follow`.`poster`
where `follow`.`follower` = '#userid'
order by `posts`.`postid` desc
LIMIT 15
Several remarks:
I have used an INNER JOIN. I want an inner join, so let's write it; it will be easier for me to read later and it should be the same for the query analyser.
If #userid is an int, do not use quotes. Please use ints for identifiers (this is really faster than strings). On the PHP side, cast the int: "SELECT ..." . (int) $user_id . " ORDER ...", or better, use a query with bound parameters (this is for security); see the sketch at the end of this answer.
I have used a LIMIT 15; an offset could be used as well if you want to show some pagination controls around the posts. Let's say this query would match 15,263 documents from my 5,642 followed users: you do not want, and the user does not want, 15,263 documents shown on one web page. Knowing via $limit that the number is 15,263 is a good thing, but certainly not as a request limit. You know this number, but the database may know it as well if it has a good query analyser and good internal statistics.
The request limit has several goals:
1. Limit the size of the data transferred from the database to your PHP script
2. Limit the memory usage of your PHP script (an array with 15,263 documents containing some HTML stuff... ouch)
3. Limit the size of the final user output (and get a faster response)
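As mentioned in remark 2, here is a minimal sketch of the same query through PDO with a bound integer parameter ($pdo is assumed to be an existing PDO connection):

$sql = "SELECT p.*
        FROM posts AS p
        INNER JOIN follow AS f ON p.AuthorID = f.poster
        WHERE f.follower = :follower
        ORDER BY p.PostID DESC
        LIMIT 15";

$stmt = $pdo->prepare($sql);
$stmt->bindValue(':follower', (int) $userid, PDO::PARAM_INT);
$stmt->execute();
$stream = $stmt->fetchAll(PDO::FETCH_ASSOC);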
I've been trying to create some stats for my table, but it has over 3 million rows so it is really slow.
I'm trying to find the most popular value in the name column and also show how many times it pops up.
I'm using this at the moment, but it doesn't work because it's too slow and I just get errors.
$result = mysql_query("SELECT `name`, COUNT(*) as b FROM `people` GROUP BY `name` ORDER BY `b` DESC LIMIT 0,5;") or die(mysql_error());
As you may see, I'm trying to get all the names and how many times each name has been used, but only show the top 5 to hopefully speed it up.
I would then like to be able to get the values like this:
while($row = mysql_fetch_array($result)){
echo $row['name'].': '.$row['b']."\r\n";
}
And it will show things like this;
Bob: 215
Steve: 120
Sophie: 118
RandomGuy: 50
RandomGirl: 50
I don't care much about the ordering of ties afterwards, like RandomGirl and RandomGuy being the wrong way round.
I think I've provided enough information. :) I would like the names to be case-insensitive if possible, though: Bob should be the same as BoB, bOb, BOB and so on.
Thank-you for your time
Paul
Limiting the results to the top 5 won't give you much of a speed-up; you'll gain time in result retrieval, but on the MySQL side the whole table still needs to be scanned (to count).
You will speed up your count query by having an index on the name column, as only the index will be scanned and not the table.
Now if you really want to speed up the result and avoid scanning the name index every time you need it (which will still be quite slow if you really have millions of rows), then the only other solution is computing the stats when inserting, deleting or updating rows on this table, that is, using triggers on this table to maintain a statistics table next to it. Then you will only need a simple select query on that statistics table, with only 5 rows read. But you will slow down your insert, delete and update operations (which are already quite slow, especially if you maintain indexes), so if the stats are important you should study this solution.
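For illustration, a rough sketch of such a trigger, assuming a statistics table with a UNIQUE name column and a cnt counter (here called name_stats; a real setup would also need the matching DELETE and UPDATE triggers):

DELIMITER //
CREATE TRIGGER people_after_insert
AFTER INSERT ON people
FOR EACH ROW
BEGIN
    -- Keep the per-name counter in sync on every insert into people.
    INSERT INTO name_stats (name, cnt)
    VALUES (NEW.name, 1)
    ON DUPLICATE KEY UPDATE cnt = cnt + 1;
END //
DELIMITER ;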
Do you have an index on name? It might help.
Since you are doing the counting/grouping and then sorting, an index on name alone doesn't help: MySQL has to go through all the rows every time, and there is no way to optimize this. You need a separate stats table like this:
CREATE TABLE name_stats( name VARCHAR(n), cnt INT, UNIQUE( name ), INDEX( cnt ) )
and you should update this table whenever you add a new row to 'people' table like this:
INSERT INTO name_stats VALUES( 'Bob', 1 ) ON DUPLICATE KEY UPDATE cnt = cnt + 1;
Querying this table for the list of top names should give you the results instantaneously.
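For example, the stats query then reduces to a lookup on the small table:

SELECT name, cnt
FROM name_stats
ORDER BY cnt DESC
LIMIT 5;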