I have two tables employee and attendance.
employee : empID, empName
attendance: attendanceID, empID, date, inTime, outTime
I need to show this data in a grid with the employee name on the left side, followed by the dates. So the column headers would be: Emp Name, 1, 2, 3, 4, ..., 30. Every day of the month needs to be printed, with or without data.
I can think of two ways to do this.
Get the attendance and employee data in a single join query, ordered by empID. Then loop through the result set and print each record if it matches the current date, continuing until the empID changes in the current loop iteration.
Loop through the employees; then, for each day of the month, query the database for that particular employee's attendance on that particular date:
foreach ($employees as $emp)
{
    $empID = $emp['empID'];
    for ($day = 1; $day <= $maxDaysInTheMonth; $day++)
    {
        $attendance = getAttendanceFromDatabase($empID, $day);
    }
}
To improve performance, we try to minimize database round trips and unnecessary loops. I would like to implement the second way, since it has the fewest conditions and loops and the code is clean. But it makes a database retrieval for every employee, every day. Can someone point out some facts about performance, please?
Fetching the records in a single query and looping through them is better, as it only has to call the database server once. The second way has to call the database server many times, which is much more costly.
Then build an associative array from the data, indexed by empID. After generating the array you can use it however you want.
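A minimal sketch of that grouping step (the row layout is assumed to match the join query's output, i.e. each row carries empID, empName, and date):

```php
<?php
// Group the joined rows by empID so each employee's attendance
// can be looked up per day without re-querying the database.
function groupByEmployee(array $rows): array
{
    $byEmp = [];
    foreach ($rows as $row) {
        $empID = $row['empID'];
        if (!isset($byEmp[$empID])) {
            $byEmp[$empID] = ['empName' => $row['empName'], 'dates' => []];
        }
        // Key by day of month for quick lookup while printing the grid.
        $day = (int) date('j', strtotime($row['date']));
        $byEmp[$empID]['dates'][$day] = $row;
    }
    return $byEmp;
}
```

When printing the grid you then check `isset($byEmp[$empID]['dates'][$day])` for each cell instead of hitting the database.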
Try this query
$sql = "SELECT employee.empName AS empName, attendance.date AS date FROM employee JOIN attendance ON employee.empID = attendance.empID";
As @Sougata suggests, fetching the records in a single query and looping through them is better. But keep in mind that query performance should be improved as follows:
Avoid Multiple Joins in a Single Query
Try to avoid writing a SQL query with multiple joins that includes outer joins, CROSS APPLY, OUTER APPLY, and other complex subqueries. These reduce the optimizer's choices for deciding the join order and join type. Sometimes the optimizer is forced to use nested loop joins, irrespective of the performance consequences, for queries with excessively complex CROSS APPLY clauses or subqueries.
Avoid Use of Non-correlated Scalar Subqueries
You can rewrite your query to move a non-correlated scalar subquery into a separate query, instead of making it part of the main query, and store the output in a variable that can be referred to in the main query or a later part of the batch. This gives the optimizer better options, which may help it return accurate cardinality estimates along with a better plan.
Creation and Use of Indexes
We are aware that an index can magically reduce data-retrieval time, but it has the reverse effect on DML operations, which may degrade query performance. Given this, indexing is a challenging task, but it can help to improve SQL query performance and give you the best query response time.
Create a Highly Selective Index
Selectivity defines the percentage of qualifying rows in the table (qualifying rows / total rows). If the ratio of qualifying rows to total rows is low, the index is highly selective and most useful. A non-clustered index is most useful if the ratio is around 5% or less, meaning the index can eliminate 95% of the rows from consideration. If an index returns more than 5% of the rows in a table, it probably will not be used; either a different index will be chosen or created, or the table will be scanned.
Position a Column in an Index
The order, or position, of a column in an index also plays a vital role in SQL query performance. An index can help to improve SQL query performance if the criteria of the query match the leftmost columns of the index key. As a best practice, the most selective columns should be placed leftmost in the key of a non-clustered index.
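Applied to the attendance schema from the question, a composite index along these lines (a sketch; the table and column names come from the question) keeps empID, the column used for equality lookups, leftmost:

```sql
-- empID first: the grid query filters by one employee, then scans a date range.
ALTER TABLE attendance ADD INDEX emp_date_idx (empID, date);
```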
I need to solve the following task: I have a fairly large array of IDs in a PHP script, and I need to select from a MySQL DB all rows whose IDs are NOT IN this array.
There are several similar questions (e.g. How to find all records which are NOT in this array? (MySql)), and the most popular answer is to use a NOT IN () construction with implode(',', $array) inside the brackets.
And this worked... until my array grew to 2007 IDs and about 20 kB (in my case), at which point I got a "MySQL server has gone away" error. As I understand it, this is because of the length of the query.
There are also some solutions to this problem like this:
SET GLOBAL max_allowed_packet=1073741824;
(just taken from this question).
Probably I could do it this way, but now I doubt that the NOT IN (implode) approach is a good one for big arrays (I expect that in my case the array can grow to 8000 IDs and 100 kB).
Is there any better solution for a big arrays?
Thanks!
EDIT 1
As a solution, it is recommended to insert all the IDs from the array into a temporary table and then use a JOIN to solve the initial task. This is clear. However, I have never used temporary tables, and therefore I have an additional question (probably worth asking as a separate question, but I decided to leave it here):
If I need to run this routine several times during one MySQL session, which approach is better:
Each time I need to SELECT IDs NOT IN the PHP array, I create a NEW temporary table (all those tables will be deleted when the MySQL connection terminates, i.e. when my script finishes).
I create one temporary table and DROP it after I have made the needed SELECT.
I TRUNCATE the temporary table afterwards.
Which is better? Or have I missed something else?
In such cases it is usually better to create a temporary table and perform the query against it instead. It'd be something along the lines of:
CREATE TEMPORARY TABLE t1 (a int);
INSERT INTO t1 VALUES (1),(2),(3);
SELECT * FROM yourtable
LEFT JOIN t1 on (yourtable.id=t1.a)
WHERE t1.a IS NULL;
Of course, the INSERT statement should be constructed so that you insert all the values from your array into the temporary table.
Edit: Inserting all the values in a single INSERT statement would most probably lead to the same problem you already faced. Hence I'd suggest using a prepared statement that is executed repeatedly to insert the data into the temporary table while you iterate through the PHP array.
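A minimal sketch of that loop using PDO (assuming a PDO connection in $pdo and the ID array in $ids; the table names t1 and yourtable are carried over from the example above):

```php
<?php
// Prepare once, execute once per ID: each individual query stays tiny,
// so the max_allowed_packet limit is never an issue.
$pdo->exec('CREATE TEMPORARY TABLE t1 (a INT)');
$stmt = $pdo->prepare('INSERT INTO t1 (a) VALUES (?)');
foreach ($ids as $id) {
    $stmt->execute([$id]);
}

// The anti-join from the answer above then finds rows NOT IN $ids.
$rows = $pdo->query(
    'SELECT yourtable.* FROM yourtable
     LEFT JOIN t1 ON yourtable.id = t1.a
     WHERE t1.a IS NULL'
)->fetchAll();
```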
I once had to tackle this problem, but with an IN(id) WHERE clause containing approx. 20,000 to 30,000 identifiers.
The way I got around this with the SELECT query was to reduce the number of identifiers filtered per query and increase the number of times I sent the same query, in order to extract the same data.
You could use array_chunk in PHP and split the 20,000 identifiers into 15 chunks, which would give you 15 separate SQL calls, each filtering by roughly 1,333 identifiers (you can use more chunks to reduce the number of identifiers per call further). In your case, if you just split the 2,007 identifiers into 10 chunks, it would reduce the number of identifiers you push to the database to about 200 per SQL request. There are other ways to optimize this further, with temporary tables and so forth.
By dividing up the number of identifiers you're filtering on, each query will run faster than if you were to send every identifier to the database in a single dump.
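The chunking itself is straightforward; a sketch for the IN() case described above (the query template and table name are assumptions for illustration):

```php
<?php
// Split the ID list into batches and build one IN() query per batch,
// keeping each individual query comfortably under the packet limit.
function buildChunkedQueries(array $ids, int $chunkSize): array
{
    $queries = [];
    foreach (array_chunk($ids, $chunkSize) as $chunk) {
        // intval each ID so the interpolated list is safe.
        $list = implode(',', array_map('intval', $chunk));
        $queries[] = "SELECT * FROM yourtable WHERE id IN ($list)";
    }
    return $queries;
}
```

Note this union-of-chunks trick only works for IN(); a chunked NOT IN() would wrongly match nearly every row, which is why the temporary-table answer above is the better fit for the original question.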
Currently I am working on a PHP project. For an extension of the project I needed to add more data to the MySQL database, all of it into one particular table. That table is now 610.1 MB in size and has 3,491,534 rows. It holds 22 distinct values in one column; one of those values has 1,700,000 rows and another has 800,000 rows.
Since then, running a SELECT statement takes a long time (6.890 sec) to execute. Every column in that table that could usefully have an index has one, yet it still takes a long time.
I tried two things to speed up retrieval:
1. A stored procedure with indexes on the relevant table columns.
2. Partitions.
Again, both took a long time to execute a SELECT query against the distinct values that have a large number of rows. Can anyone suggest a better alternative for my problem, or let me know if I made a mistake in what I tried earlier?
When working with a large number of rows like you do, you should be careful with heavy, complex, nested SELECT statements. Each level of nesting uses more resources to get to the results you want.
If you are using something like:
SELECT DISTINCT column FROM table
WHERE condition
and it is still taking long to execute even though you have indexes and partitioning in place, then the bottleneck might be physical resources.
Tune your structure and then tune your code.
Hope this helps.
I am wondering about the best way to lay out my data in a MySQL table so that it can be queried in the fastest manner to gather an array of daily values to be further utilized by PHP.
So far, I have laid out the table as such:
item_id  price_date  price_amount
1        2000-03-01  22.4
2        2000-03-01  19.23
3        2000-03-01  13.4
4        2000-03-01  14.95
1        2000-03-02  22.5
2        2000-03-02  19.42
3        2000-03-02  13.4
4        2000-03-02  13.95
with item_id defined as an index.
Also, I am using:
"SELECT DISTINCT price_date FROM table_name"
to get an array containing a unique list of dates.
Furthermore, the part of the code that is within a loop (and the focus of my optimization question) is currently written as:
"SELECT price_amount FROM table_name WHERE item_id = 1 ORDER BY price_date"
This second "SELECT" statement is actually within a loop where I am selecting/storing-in-array the daily prices of each item_id requested.
All is currently functioning and pulling the data from MySQL properly; however, both of the above "SELECT" statements take approx. 4-5 seconds each per run, and when looping through 100+ products to create a summary, this adds up to a very inefficient and slow information system.
Is there a more efficient way to structure the MySQL table and/or the SELECT statements to retrieve the results faster? Perhaps by defining an index on a different column? I have used the EXPLAIN command to return information for the queries, but I am unsure how to use the EXPLAIN output to increase the efficiency of my queries.
Thanks in advance for any mySQL wizards that may be able to assist.
Single column index
I am using:
"SELECT DISTINCT price_date FROM table_name"
to get an array containing a unique list of dates.
This query can be executed more efficiently if you create an index for the price_date column:
ALTER TABLE table_name ADD INDEX price_idx (price_date);
Multiple column index
Furthermore, the part of the code that is within a loop (and the focus of my optimization question) is currently written as:
"SELECT price_amount FROM table_name WHERE item_id = 1 ORDER BY price_date"
For the second query, you should create an index covering both the item_id and price_date columns:
ALTER TABLE table_name ADD INDEX item_price_idx (item_id, price_date);
I know this is a bit late, but I stumbled across this and thought I would throw my thoughts into the mix.
Indexes used well are very helpful in speeding up queries (EXPLAIN shows some really good results about which indexes are being chosen, if any, for a specific query). However, efficient PHP will help even more.
In your case you do not show the PHP, but it looks like you take a list of dates and then loop through it, finding all the items for each date to get the prices. It would be more efficient to do something like the following:
SELECT item_id, price_amount FROM table_name WHERE price_date = ? ORDER BY item_id, price_amount
with an index (preferably a unique index) on (price_date, item_id, price_amount).
You then have a single loop through the resulting rows, not a loop with multiple SQL calls (this is especially true if your SQL server is separate from the PHP box, as an external network connection has an overhead).
4-5 seconds for a single query is very slow, though (by a factor of at least 100x), so it would indicate a problem: a very large table with no usable key, or potentially disk issues.
I realized that the response to a MySQL query becomes much faster when you create an index on the column you use for "ORDER BY", e.g.
SELECT username FROM table ORDER BY registration_date DESC
Now I'm wondering which indices I should create to optimize the request time.
For example I frequently use the following queries:
SELECT username FROM table WHERE
registration_date > ".(time() - 10000)."
SELECT username FROM table WHERE
registration_date > ".(time() - 10000)."
&& status='active'
SELECT username FROM table WHERE
status='active'
SELECT username FROM table ORDER BY registration_date DESC
SELECT username FROM table WHERE
registration_date > ".(time() - 10000)."
&& status='active'
ORDER BY birth_date DESC
Question 1:
Should I set up separate indices for the first three request types? (i.e. one index for the column "registration_date", one index for the column "status", and another index for the combination of both?)
Question 2:
Are different indices used independently for "WHERE" and for "ORDER BY"? Say I have a combined index on the columns "status" and "registration_date", and another index on just the column "birth_date". Should I set up another combined index across the three columns ("status", "registration_date", and "birth_date")?
There are no hard-and-fast rules for indices or query optimization. Each case needs to be considered and examined.
Generally speaking, however, you can and should add indices to columns that you frequently sort by or use in WHERE clauses. (Answer to Question 2: no, the same indices are potentially used for both ORDER BY and WHERE.) Whether to use a multi-column index or a single-column one depends on the frequency of the queries. Also note that single-column indices may be combined by MySQL using the Index Merge optimization:
The Index Merge method is used to retrieve rows with several range scans and to merge their results into one. The merge can produce unions, intersections, or unions-of-intersections of its underlying scans. This access method merges index scans from a single table; it does not merge scans across multiple tables.
(more reading: http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html)
Multi-column indices also require that you take care to structure your queries in such a way that your use of indexed columns matches the column order in the index:
MySQL cannot use an index if the columns do not form a leftmost prefix of the index. Suppose that you have the SELECT statements shown here:
SELECT * FROM tbl_name WHERE col1=val1;
SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;
SELECT * FROM tbl_name WHERE col2=val2;
SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;
If an index exists on (col1, col2, col3), only the first two queries use the index. The third and fourth queries do involve indexed columns, but (col2) and (col2, col3) are not leftmost prefixes of (col1, col2, col3).
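For the queries in the question, that leftmost-prefix rule suggests indexes along these lines (a sketch; the table and column names are taken from the question, and `table` is backticked because it is a reserved word):

```sql
-- Serves the registration_date range filter and the ORDER BY registration_date query.
ALTER TABLE `table` ADD INDEX reg_idx (registration_date);

-- status (equality) leftmost, registration_date (range) second: serves the
-- combined filter, and its leftmost prefix also serves the status-only filter.
ALTER TABLE `table` ADD INDEX status_reg_idx (status, registration_date);
```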
Bear in mind that indices DO have a performance consideration of their own -- it is possible to "over-index" a table. Each time a record is inserted or an indexed column is modified, the index/indices will have to be rebuilt. This does demand resources, and depending on the size and structure of your table, it may cause a decrease in responsiveness while the index building operations are active.
Use EXPLAIN to find out exactly what is happening in your queries. Analyze, experiment, and don't over-do it. The shotgun approach is not appropriate for database optimization.
Documentation
MySQL EXPLAIN - http://dev.mysql.com/doc/refman/5.0/en/explain.html
How MySQL uses indices - http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
Index Merge Optimization - http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html
To quote this page:
[Indices] will slow down your updates and inserts.
That's the tradeoff you have to weigh. To optimize your table, you should put indices only on the columns you are most likely to apply conditions to; the more indices you have, the slower your data-changing operations become. In that sense, I personally don't see much merit in creating combined indices: if you create all 7 possible combinations of indices over 3 columns, you are most definitely putting more drag on your updates and inserts than just using 3 indices for 3 columns (and even that can be debatable). On the other hand, if the data is edited much, much less than it is SELECTed, then indices can really help you speed things up.
Something else to take into consideration (again quoting the above page):
If your table is very small [...] it's worse to use an index than to leave it out and just let it do a table scan. Indexes really only come in handy with tables that have a lot of rows.
Yes, it is a good idea to have indexes on the columns you use often, both in ORDER BY and in your WHERE clauses.
But be aware: UPDATEs, INSERTs and DELETEs slow down if you have indexes, because after such an operation the index must be updated too.
So, as a rule-of-thumb: If your application is read-intensive, use the indexes where you think they help.
If your application is often updating the data, be careful, because that may get slow because of the indexes.
When in doubt, simply get your hands dirty and study the output of EXPLAIN.
http://dev.mysql.com/doc/refman/5.6/en/explain.html
As for the first two examples, you can satisfy them with one index: {registration_date, status}. Such an index can support filters on the first item (registration_date) or on both.
It does not work for status alone, however. The question for status is how selective it is, i.e. what proportion of records have status = 'active'. If this is a high proportion (so that, on average, every database page contains an active record), then an index may not help very much.
The order by's are trickier. I don't know if mysql uses indexes for this purpose. Often, using an index for sorting entire records is less efficient than just sorting the records. Using the index causes a random access pattern to the records in the pages, which can cause major performance problems for tables larger than the page cache.
Use the EXPLAIN function on your SELECT statements to determine where your joins are slowing down (the more rows that are referenced, the slower it will be). Then apply your indices to those columns.
EXPLAIN SELECT * FROM table JOIN table2 ON a = b WHERE conditions;
I have two tables here:
ITEMS:
ID | DETAILS | ..... | OWNER
USERS:
ID | NAME | ....
Where ITEMS.OWNER = USERS.ID
I'm listing the items out with their respective owners' names. For this I could use a join on both tables, or I could select all the ITEMS and loop through them, making an SQL query to retrieve the tuple of each item's owner. That's:
1 SQL query with a JOIN
versus
1 x 20 single-table SQL queries
Which would be the better approach in terms of speed?
Thanks
Of course a JOIN will be faster.
Making 20 queries will imply:
Parsing them 20 times
Making 20 index seeks to find the start of the index range on items
Returning 20 recordsets (each with its own metadata).
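For the tables in the question, the single-query version would look something like this (column names are taken from the question; the alias is illustrative):

```sql
SELECT ITEMS.ID, ITEMS.DETAILS, USERS.NAME AS owner_name
FROM ITEMS
JOIN USERS ON ITEMS.OWNER = USERS.ID;
```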
Every query has overhead. If you can do something with one query, it's (almost) always better to do it with one query. And most database engines are smarter than you: even if it's better to split a query in some way, the database will figure that out itself.
An example of overhead: if you perform 100 queries, there will be a lot more traffic between your application and your database server.
In general, if you really want to know something about performance, benchmark the various approaches, measure the parameters you're interested in, and make a decision based on the results of the benchmark.
Good luck!
Executing a join will be much quicker, as well as better practice.
A join would be a lot quicker than performing another query on the child table for each record in the parent table.
You can also enable performance data in SQL to see the results for yourself..
http://wraithnath.blogspot.com/2011/01/getting-performance-data-from-sql.html
N