Query takes a lot of time when using an aggregate function - php

The table has 100,000 records and the query takes 20-21 seconds when using an aggregate function. How can I optimize this query?
SELECT source, sum(product_price*quantity) AS price
FROM `sheet`
WHERE source !=''
GROUP BY source
ORDER BY `price` DESC
I have also added an index to the table:
ALTER TABLE `dbname`.`sheet` ADD INDEX `reprting_module` (`source`(30));
This is the output after running EXPLAIN on the query:

First of all, you're asking your MySQL server to do some computation in this query, and then to sort the results. It will take some time. It necessarily must examine every, or almost every, row of your table. There's no magic to make those operations instantaneous.
Secondly, your WHERE source != '' filter may be defeating your indexing. You could try WHERE source > '' instead. That will allow MySQL's query planner to random-access your index, then scan it sequentially.
Third, your prefix index on source (source(30)) doesn't help performance here.
Fourth, you can try creating a compound covering index on these columns:
ALTER TABLE dbname.sheet
ADD INDEX `reprting_module` (source, product_price, quantity);
Then write your query like this:
SELECT source, SUM(product_price*quantity) AS price
FROM sheet
WHERE source > ''
GROUP BY source
ORDER BY SUM(product_price*quantity) DESC
If you're lucky this will be a little faster. Why? Because MySQL can satisfy your entire query by random-accessing the index to the first non-empty source value, then sequentially scanning just the index to perform your computation.
Notice that the query I showed, with the index I showed, will be very fast indeed if you use
WHERE source = 'some-particular-value'
to narrow down the scope of the computation.
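If you want to verify whether the covering index is actually being used, EXPLAIN can tell you; a quick sketch (assuming a PDO connection in $pdo), where the Extra column should include "Using index" once the compound index is in place:
$plan = $pdo->query(
    "EXPLAIN SELECT source, SUM(product_price*quantity) AS price
     FROM sheet
     WHERE source > ''
     GROUP BY source
     ORDER BY SUM(product_price*quantity) DESC"
)->fetchAll(PDO::FETCH_ASSOC);
print_r($plan); // look for "Using index" in the Extra column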

"Prefix" indexes, such as INDEX(source(30)), are virtually useless. Please provide SHOW CREATE TABLE. If source could be VARCHAR(255) or smaller, simply add INDEX(source) But that is probably not useful here, since most of the table needs to be read.
How much RAM do you have? What is the value of innodb_buffer_pool_size? How big (GB) is the table? These combine to ask whether you are CPU-bound or I/O-bound, and whether a simple tuning fix can change it from I/O to CPU, thereby possibly speeding it up to 2 seconds. (20 seconds seems very high for a mere 100K rows.)
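If you are unsure how to check those numbers, something along these lines should do (a sketch assuming a PDO connection in $pdo; the schema and table names are taken from the question):
$bp = $pdo->query("SHOW VARIABLES LIKE 'innodb_buffer_pool_size'")
          ->fetch(PDO::FETCH_ASSOC);
$sz = $pdo->query(
    "SELECT ROUND((data_length + index_length) / 1024 / 1024 / 1024, 2) AS gb
     FROM information_schema.tables
     WHERE table_schema = 'dbname' AND table_name = 'sheet'"
)->fetch(PDO::FETCH_ASSOC);
echo "buffer pool: {$bp['Value']} bytes, table: {$sz['gb']} GB\n";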

Related

MySQL for selecting MAXIMUM differences of two columns

I have a table with the following columns:
ItemCode VARCHAR
PriceA DECIMAL(10,4)
PriceB DECIMAL(10,4)
The table has around 1,000 rows.
My requirement is to check the difference (PriceA-PriceB) for each row and then display top 50 items that have maximum price differences.
There are two ways I can implement this:
1) Trust that the SQL calculation is non-complex, easy, and fast, and run the following query:
SELECT ItemCode, (PriceA - PriceB) AS PDiff FROM testtable ORDER BY PDiff DESC LIMIT 50
2) Add one more column (called PriceDiff), which will store the difference (PriceA-PriceB).
However, these values will have to be inserted manually and need extra space, but then I can simply run a MAX(PriceDiff)-style select query to get the top 50.
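(If the MySQL version is 5.7 or newer, a stored generated column could keep such a column in sync automatically instead of inserting it by hand; a rough sketch, with a made-up index name and a PDO connection assumed in $pdo:)
// The column is computed and stored by MySQL on every write, and it can be indexed,
// so the top-50 query becomes a simple index-ordered read.
$pdo->exec(
    "ALTER TABLE testtable
     ADD COLUMN PriceDiff DECIMAL(10,4) AS (PriceA - PriceB) STORED,
     ADD INDEX idx_pricediff (PriceDiff)"
);
$top50 = $pdo->query(
    "SELECT ItemCode, PriceDiff FROM testtable ORDER BY PriceDiff DESC LIMIT 50"
)->fetchAll(PDO::FETCH_ASSOC);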
My question is: in terms of speed and efficiency for a web application (displaying results on a website/app), which of the above methods is better?
I have tried timing each query, but both report similar figures, so I am unable to draw any conclusions from that.
Any explanation by the experts, or any fine-tuning of code, will be really appreciated.
Thanks
In general, to improve performance you always have to make a tradeoff between memory and time. Caching results will improve speed, but it takes more memory. You can reduce memory usage by calculating things on the fly, at the expense of performance.
In your case, storing an additional 1,000+ values in the DB is a matter of a few extra KB. Calculating the diff on the fly will have a negligible impact on performance. Either option is absolute peanuts to any DB and server.
I would stick with doing calculations on the fly as that is less complex and keeps the db normalized.
The first method is fastest, but is prone to error, as was mentioned.
May I suggest another solution, using a primary key: set the value of the new column from within the web application, keyed on the primary key of each row.
Then, when you want to know the top 50, use your second method and select from the table that stores the differences.
These links explain primary keys and how to use them:
http://www.mysqltutorial.org/mysql-primary-key/
https://www.w3schools.com/sql/sql_primarykey.asp

Ways to speed up a SELECT from a db table with 8,000,000 rows?

Hello, world!
I have read about this on the web, but I have not found a suitable solution. I am not a pro in SQL, and I have the simplest table, which contains 10 columns and 8,000,000 rows, and the simplest query (I am working with PHP):
SELECT `field_name_1`, `field_name_2`, `field_name_3`
FROM `table_name`
WHERE `some_field`=#here_is_an_integer#
ORDER BY `some_field`
LIMIT 10;
Maybe some indexes, caching, or something like that would help.
If you have any thoughts about this, I'll be glad for your help, or just tell me which direction I should follow to find the solution.
Thank you!
Use an index on some_field and, ideally, on all columns you use in SQL WHERE clauses.
If you only want to display data, use SQL pagination with LIMIT.
And, as an admin, set up caching and other MySQL (or MariaDB) settings for better searching, as described in "Top 20+ MySQL Best Practices".
... simple answer, if you have the space available:
MySQL> ALTER TABLE table_name ADD INDEX ( some_field );
Here are some things to think about:
Make sure all fields used in JOINs, WHERE conditions, or ORDER BY / GROUP BY clauses have appropriate indexes (unique or plain); a sketch follows this list.
With many rows in a table, the memory cache of the server must be able to store the temporary resultset.
Use InnoDB for slower inserts and faster selects, and tune InnoDB to be able to store the temporary resultset.
Do not use RAND() so that a resultset can be query-cached.
Filter early; try to put conditions on JOIN .. AND x=y, instead of on the final WHERE condition (unless of course it is about the main table). (INNER) JOIN in the most optimal order; if you have many users and few reports, for example, start by selecting the users first, limiting the number of rows immediately before doing other joins.
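As a sketch of the indexing point above: a composite index puts the equality-filtered column first and the sorted column second. Here other_field and idx_filter_sort are invented names (the question sorts on some_field itself, in which case the single-column index shown earlier is enough), and a PDO connection in $pdo is assumed:
// Filter on some_field, then read rows already ordered by other_field straight from
// the index, avoiding a filesort.
$pdo->exec("ALTER TABLE `table_name` ADD INDEX `idx_filter_sort` (`some_field`, `other_field`)");

$stmt = $pdo->prepare(
    "SELECT `field_name_1`, `field_name_2`, `field_name_3`
     FROM `table_name`
     WHERE `some_field` = :val
     ORDER BY `other_field`
     LIMIT 10"
);
$stmt->execute(['val' => 123]);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);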
Perhaps you can formalize your question a bit better (maybe with the use of a question mark somewhere).
Anyways, if you want to increase the speed of a select on a single table such as the one you describe, then at the very least the column involved in the WHERE clause should have a proper index. This could be just a standard 'KEY (some_field)' if it is an integer type. Otherwise, if it is a string (i.e. varchar or varbinary) field with a reasonable amount of cardinality within the first n bytes, you can use a prefix index and do something like 'KEY (some_field(8))' to minimize the index size (subsequently increasing performance by decreasing btree search time).
Ideally the server will have enough RAM (and a large enough innodb_buffer_pool_size if you are using InnoDB) to keep the aforementioned index in memory. Also look into https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html#sysvar_innodb_adaptive_hash_index if your server has the available resources and your application has the data access patterns to justify it.
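A small sketch of checking whether an 8-byte prefix is selective enough before creating such an index (column and index names follow the question's placeholders; a PDO connection in $pdo is assumed):
// Ratio of distinct 8-character prefixes to total rows: the closer to 1,
// the closer the prefix index behaves to a full index on the column.
$sel = $pdo->query(
    "SELECT COUNT(DISTINCT LEFT(`some_field`, 8)) / COUNT(*) AS selectivity
     FROM `table_name`"
)->fetch(PDO::FETCH_ASSOC);

if ($sel['selectivity'] > 0.9) {
    $pdo->exec("ALTER TABLE `table_name` ADD INDEX `idx_prefix` (`some_field`(8))");
}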
Hope this helps!

MySQL+PHP: How to paginate data from complex query with ORDER BY on user-selected column

I have a table with currently ~1500 rows, which is expected to grow over time (I can't say by how much, but still). The website is read-only and lets users run complex queries through some forms; the search query is then completely URL-encoded, since it's a public database. It's important to know that users can select which column the data must be sorted by.
I'm not concerned about adding some indexes and slowing down INSERTs and UPDATEs (they are only performed occasionally by admins), since the workload is basically heavy reading, but I need to paginate results: some popular queries can return 900+ results, and that takes up too much space and RAM on the client side (each result is further processed into a fairly rich <div> HTML element with an <img>, by the way).
I'm aware of the use of LIMIT {$n} OFFSET {$m}, but would like to avoid it.
I'm also aware of this approach:
Query
SELECT *
FROM table
WHERE {$filters} AND id > {$last_id}
ORDER BY id ASC
LIMIT {$results_per_page}
and that's what I'd like to use, but that requires rows to be sorted only by their ID!
I've come up with (what I think is) a very similar query to custom sort results and allow efficient pagination.
Query:
SELECT *
FROM table
WHERE {$filters} AND {$column_id} > {$last_column_id}
ORDER BY {$column} ASC
LIMIT {$results_per_page}
but that unfortunately requires a {$last_column_id} value to pass between pages!
I know indexes (especially unique indexes) are basically automatically-updated integer-based columns that "rank" a table by values of a column (be it integer, varchar etc.), but I really don't know how to make MySQL return the needed $last_column_id for that query to work!
The only thing I can come up with is to put an additional "XYZ_id" integer column next to every "XYZ" column users can sort results by, then update values periodically through some scripts, but is it the only way to make it work? Please help.
(Too many comments to fit into a 'comment'.)
Is the query I/O bound? Or CPU bound? It seems like a mere 1500 rows would lead to being CPU-bound and fast enough.
What engine are you using? How much RAM? What are the settings of key_buffer_size and innodb_buffer_pool_size?
Let's see SHOW CREATE TABLE. If the table is full of big BLOBs or TEXT fields, we need to code the query to avoid fetching those bulky fields only to throw them away because of OFFSET. Hint: Fetch the LIMIT IDs, then reach back into the table to get the bulky columns.
The only way for this to be efficient:
SELECT ...
WHERE x = ...
ORDER BY y
LIMIT 100,20
is to have INDEX(x,y). But even that will still have to step over 100 cow paddies.
You have implied that there are many possible WHERE and ORDER BY clauses? That would imply that adding enough indexes to cover all cases is probably impractical?
"Remembering where you left off" is much better than using OFFSET, so try to do that. That avoids the already-discussed problem with OFFSET.
Do not use WHERE (a,b) > (x,y); that construct used not to be optimized well. (Perhaps 5.7 has fixed it, but I don't know.)
My blog on OFFSET discusses your problem. (However, it may or may not help your specific case.)
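To make "remembering where you left off" concrete for a user-selected sort column, here is a rough PHP sketch; the sortable-column whitelist, the request parameter names, and the id tiebreaker are all assumptions, and it presumes a matching INDEX(sort_column, id) exists. The question's {$filters} would be ANDed into the WHERE clause.
// Whitelist the columns a user may sort by, so request input never reaches the SQL text as-is.
$sortable = ['name', 'price', 'created_at'];   // hypothetical column names
$col = in_array($_GET['sort'] ?? '', $sortable, true) ? $_GET['sort'] : 'id';

// Keyset pagination: the previous page passes along the last row's sort value and id.
$lastVal = $_GET['last_val'] ?? '';
$lastId  = (int)($_GET['last_id'] ?? 0);

$stmt = $pdo->prepare(
    "SELECT * FROM `table`
     WHERE (`$col` > :v1 OR (`$col` = :v2 AND `id` > :last_id))
     ORDER BY `$col` ASC, `id` ASC
     LIMIT 20"
);
$stmt->execute(['v1' => $lastVal, 'v2' => $lastVal, 'last_id' => $lastId]);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

// The last row's $col value and id become last_val / last_id in the next-page link.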

Is there a way to speed up this query with no WHERE clause?

I have about 1 million rows, so it's going pretty slow. Here's the query:
$sql = "SELECT `plays`,`year`,`month` FROM `game`";
I've looked up indexes, but they only make sense to me when there's a WHERE clause.
Any ideas?
Indexes can make a difference even without a WHERE clause depending on what other columns you have in your table. If the 3 columns you are selecting only make up a small proportion of the table contents a covering index on them could reduce the amount of pages that need to be scanned.
Not moving as much data around, though, either by adding a WHERE clause or by doing the processing in the database, would be better if possible.
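For instance, a covering index over exactly the three selected columns lets MySQL scan the much narrower index instead of the full rows; the index name is made up and a PDO connection in $pdo is assumed:
// All selected columns live in the index, so the table rows never need to be read.
$pdo->exec("ALTER TABLE `game` ADD INDEX `idx_plays_year_month` (`plays`, `year`, `month`)");

$rows = $pdo->query("SELECT `plays`, `year`, `month` FROM `game`")
            ->fetchAll(PDO::FETCH_ASSOC);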
If you don't need all 1 million records, you can pull n records:
$sql = "SELECT `plays`,`year`,`month` FROM `game` LIMIT 0, 1000";
Where the first number is the offset (where to start from) and the second number is the number of rows. You might want to use ORDER BY too, if only pulling a select number of records.
You won't be able to make that query much faster, short of fetching the data from a memory cache instead of the db. Fetching a million rows takes time. If you need more speed, figure out if you can have the DB do some of the work, e.g. sum/group things together.
If you're not using all the rows, you should use the LIMIT clause in your SQL to fetch only a certain range of those million rows.
If you really need all the 1 million rows to build your output, there's not much you can do from the database side.
However you may want to cache the result on the application side, so that the next time you'd want to serve the same output, you can return the processed output from your cache.
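One possible shape for that application-side cache, assuming the APCu extension is available (the cache key and TTL are arbitrary):
// Serve the heavy result set from cache when possible; hit the database only on a miss.
$cacheKey = 'game_plays_all';
$rows = apcu_fetch($cacheKey, $hit);
if (!$hit) {
    $rows = $pdo->query("SELECT `plays`, `year`, `month` FROM `game`")
                ->fetchAll(PDO::FETCH_ASSOC);
    apcu_store($cacheKey, $rows, 600);   // cache for 10 minutes
}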
The realistic answer is no. With no restrictions (i.e. a WHERE clause or a LIMIT) on your query, then you're almost guaranteed a full table scan every time.
The only way to decrease the scan time would be to have less data (or perhaps a faster disk). It's possible that you could re-work your data to make your rows more efficient (CHARS instead of VARCHARS in some cases, TINYINTS instead of INTS, etc.), but you're really not going to see much of a speed difference with that kind of micro-optimization. Indexes are where it's at.
Generally if you're stuck with a case like this where you can't use indexes, but you have large tables, then it's the business logic that requires some re-working. Do you always need to select every record? Can you do some application-side caching? Can you fragment the data into smaller sets or tables, perhaps organized by day or month? Etc.

How to efficiently paginate large datasets with PHP and MySQL?

As some of you may know, use of the LIMIT keyword in MySQL does not preclude it from reading the preceding records.
For example:
SELECT * FROM my_table LIMIT 10000, 20;
Means that MySQL will still read the first 10,000 records and throw them away before producing the 20 we are after.
So, when paginating a large dataset, high page numbers mean long load times.
Does anyone know of any existing pagination class/technique/methodology that can paginate large datasets in a more efficient way i.e. that does not rely on the LIMIT MySQL keyword?
In PHP if possible as that is the weapon of choice at my company.
Cheers.
First of all, if you want to paginate, you absolutely have to have an ORDER BY clause. Then you simply have to use that clause to dig deeper in your data set. For example, consider this:
SELECT * FROM my_table ORDER BY id LIMIT 20
You'll have the first 20 records, let's say their id's are: 5,8,9,...,55,64. Your pagination link to page 2 will look like "list.php?page=2&id=64" and your query will be
SELECT * FROM my_table WHERE id > 64 ORDER BY id LIMIT 20
No offset, only 20 records read. It doesn't allow you to jump arbitrarily to any page, but most of the time people just browse the next/prev page. An index on "id" will improve the performance, even with big OFFSET values.
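A small PHP sketch of that next-page flow (it assumes a PDO connection in $pdo and that the rows carry an id column):
// Page 1 arrives with no id; later pages pass the last id that was shown.
$lastId = (int)($_GET['id'] ?? 0);

$stmt = $pdo->prepare("SELECT * FROM `my_table` WHERE `id` > :last_id ORDER BY `id` LIMIT 20");
$stmt->execute(['last_id' => $lastId]);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

// Build the "next page" link from the last id in the current result set.
if ($rows) {
    $next = end($rows);
    $page = (int)($_GET['page'] ?? 1) + 1;
    echo '<a href="list.php?page=' . $page . '&id=' . (int)$next['id'] . '">Next</a>';
}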
A solution might be to not use the limit clause, and use a join instead -- joining on a table used as some kind of sequence.
For more information, on SO, I found this question / answer, which gives an example -- that might help you ;-)
There are basically 3 approaches to this, each of which have their own trade-offs:
Send all 10000 records to the client, and handle pagination client-side via Javascript or the like. Obvious benefit is that only a single query is necessary for all of the records; obvious downside is that if the record size is in any way significant, the size of the page sent to the browser will be of proportionate size - and the user might not actually care about the full record set.
Do what you're currently doing, namely SQL LIMIT and grab only the records you need with each request, completely stateless. Benefit in that it only sends the records for the page currently requested, so requests are small, downsides in that a) it requires a server request for each page, and b) it's slower as the number of records/pages increases for later pages in the result, as you mentioned. Using a JOIN or a WHERE clause on a monotonically increasing id field can sometimes help in this regard, specifically if you're requesting results from a static table as opposed to a dynamic query.
Maintain some sort of state object on the server which caches the query results and can be referenced in future requests for a limited period of time. Upside is that it has the best query speed, since the actual query only needs to run once; downside is having to manage/store/cleanup those state objects (especially nasty for high-traffic websites).
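A rough sketch of the third approach, using the PHP session as the state store (the key scheme is made up; a shared cache such as Redis or Memcached would usually scale better):
session_start();

// $sql is the fully built query for this search; run it once and cache the full result.
$cacheKey = 'results_' . md5($sql);
if (!isset($_SESSION[$cacheKey])) {
    $_SESSION[$cacheKey] = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);
}

// Later page requests slice the cached result instead of re-running the query.
$page    = max(1, (int)($_GET['page'] ?? 1));
$perPage = 20;
$rows    = array_slice($_SESSION[$cacheKey], ($page - 1) * $perPage, $perPage);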
SELECT * FROM my_table LIMIT 10000, 20;
means show 20 records starting from record #10000 in the search. If you are using primary keys in the WHERE clause, there will not be a heavy load on MySQL.
Any other method of pagination, such as using a join, will take a really heavy load.
I'm not aware of the performance decrease that you've mentioned, and I don't know of any other solution for pagination; however, an ORDER BY clause might help you reduce the load time.
The best way is to define an index field in my_table, and for every newly inserted row you need to increment this field. After that you can use WHERE YOUR_INDEX_FIELD BETWEEN 10000 AND 10020
It will be much faster.
Some other options:
Partition the tables per page, so you can ignore the LIMIT.
Store the results in a session (a good idea would be to create an md5 hash of that data, and then use that to cache the session across multiple users).
