MySQL Select Query 17000 rows slow - php

The problem is that displaying about 17,000 records on one page takes a very long time, and when I open the page in several tabs at once, the later tabs only finish loading after the first one has.
That seems wrong to me, because a web application should be able to serve requests concurrently, and I don't understand why this happens.
I added indexes to speed things up, but as far as I can tell they don't help. If I run a SELECT * FROM ... LIMIT 30000 in phpMyAdmin, it takes 10 minutes and then ends in a server error, and I don't know why.
How can I make inserting, reading and writing data faster?
This page selects 2002 comments (rows) and is already slow:
https://www.prodigy-official.de/punity/questions/show?question=11
This page selects 17,000 rows and does not load at all:
https://www.prodigy-official.de/punity/questions/show?question=10
Why?
I use InnoDB as the storage engine.
The server runs in a virtual machine.
Server info:
http://prntscr.com/pm8e9j

Glancing through the code, it seems like you display a question, plus all its comments?
$question_id = $_GET['question'];

SELECT id, username_id, question_id, comment, comment_date
FROM `user.comments`
WHERE `question_id` = $question_id
  AND is_reply_from_comment_id = '0'
ORDER BY id

foreach result as $comment_id
{
    SELECT id, username_id, comment, comment_date
    FROM `user.comments`
    WHERE is_reply_from_comment_id = $comment_id
    ORDER BY id
}
For the sanity of your users, you must put a limit on the number of comments displayed all at once.
Replacing INDEX(is_reply_from_comment_id) with INDEX(is_reply_from_comment_id, question_id) will help the first SELECT without hurting the second.
Do you understand that the schema limits the table to only 32K rows? (It sounds like you will soon hit that limit.)
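As a rough sketch of both suggestions, using the table and column names from the code above (the index name, page size and placeholder binding are made up, so adjust them to your schema):

ALTER TABLE `user.comments`
  ADD INDEX idx_reply_question (is_reply_from_comment_id, question_id);
-- then drop the old single-column INDEX(is_reply_from_comment_id) if it exists

SELECT id, username_id, question_id, comment, comment_date
FROM `user.comments`
WHERE question_id = ?              -- bind via a prepared statement instead of interpolating $_GET
  AND is_reply_from_comment_id = '0'
ORDER BY id
LIMIT 50 OFFSET 0;                 -- cap the number of top-level comments shown per page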

Related

Performance issue: Load data infile is making select query slow

I have a program that scans Twitter, Facebook and Google+ 24 hours a day. For each user a search list runs in a loop, fetching 100 results at a time until there are no further results, and inserts them with
Yii::app()->db->createCommand(
"LOAD DATA INFILE '/var/tmp/inboxli_user".$user.".txt'
INTO TABLE inbox
FIELDS TERMINATED BY ',$%'
LINES STARTING BY 'thisisthebeginningxxx'
(created_on, created_at, tweet, tweet_id, profile_image,
twitter_user_id, screenname, followers, lang, tags, type,
positive_score, readme, answered, deleted, searchlist_id,
handled_by, used_as_newsitem, user_id)
" )->execute();
into the database, in order to keep the load on the server as small as possible. However, while my functions are doing the bulk insert, my SELECT queries run very slowly. Normally the inbox loads within 1.5 seconds, but while an insert is running it can take around 20 seconds for a page to open.
My question: how can I optimize this so that inserts and selects can use the database at the same time without slowing each other down?
Get off MyISAM! Use InnoDB; it does a much better job of not locking out other actions.
LOAD DATA is very efficient; increase the batch size to, say, 500.
What indexes do you have? Let's see SHOW CREATE TABLE. DROP any unnecessary indexes; this will speed up the LOAD.
Consider turning off the Query cache.
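For the engine switch and the index check, the statements are short (a sketch; inbox is the table from the question):

-- Move the bulk-loaded table to InnoDB so readers aren't locked out during LOAD DATA
ALTER TABLE inbox ENGINE=InnoDB;

-- Inspect the current indexes, then DROP any that your SELECTs don't actually need
SHOW CREATE TABLE inbox;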
Well, first you should make sure you have indexed your table correctly. See How does database indexing work?
That will speed up the SELECT statements considerably.
Second, consider splitting your file into multiple chunks, so the database server can flush its caches and logs after each file it loads instead of holding everything in one huge transaction.
See: https://www.percona.com/blog/2008/07/03/how-to-load-large-files-safely-into-innodb-with-load-data-infile/
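A rough sketch of that chunking idea, assuming one record per line in the generated file (the file in the question uses custom field and line delimiters, so adjust the splitting accordingly):

// Split a large LOAD DATA file into smaller chunk files so that each load stays
// a manageable transaction (hypothetical helper, not part of Yii).
function splitLoadFile($path, $linesPerChunk = 10000)
{
    $lines  = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $chunks = array_chunk($lines, $linesPerChunk);
    $files  = array();
    foreach ($chunks as $i => $chunk) {
        $chunkPath = $path . '.part' . $i;
        file_put_contents($chunkPath, implode("\n", $chunk) . "\n");
        $files[] = $chunkPath;
    }
    return $files; // run one LOAD DATA INFILE per chunk file
}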

PHP - Query on page with many requests

I have around 700-800 visitors on my home page at any given time (according to analytics) and a lot of hits in general. However, I want to show live statistics about my users and other data on the homepage, so I have this:
$stmt = $dbh->prepare("
    SELECT
        COUNT(*) AS totalusers,
        SUM(cashedout) AS cashedout,
        (SELECT SUM(value) FROM xeon_stats_clicks
         WHERE typ='1') AS totalclicks
    FROM users
");
$stmt->execute();
$stats = $stmt->fetch();
Which I then use as $stats["totalusers"] etc.
Table users has `22210` rows, with an index on `id, username, cashedout`; table xeon_stats_clicks has indexes on `value` and `typ`.
However, whenever I enable the above query my website instantly becomes very slow. As soon as I disable it, the load time drops back to normal.
How else can this be done?
You should not do it that way. You will eventually exhaust your precious DB resources, as you are now experiencing. A better way is to run a separate cron job at a 30-second or 1-minute interval, and write the result to a file:
file_put_contents('stats.txt', $stats["totalusers"]);
and then on your mainpage
<span>current users :
<b><? echo file_get_contents('stats.txt');?></b>
</span>
The beauty is that the HTTP server will cache this file, so until stats.txt is changed, a copy will be served from the cache as well.
Example of saving and loading JSON via a file:
$test = array('test' => 'qwerty');
file_put_contents('test.txt', json_encode($test));
echo json_decode(file_get_contents('test.txt'))->test;
This will output qwerty. Replace $test with $stats, as in:
echo json_decode(file_get_contents('stats.txt'))->totalclicks;
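Putting it together, the cron job might look roughly like this (a sketch; the DSN, credentials and file path are placeholders, the query is the one from the question):

// stats_cron.php - run every 30 seconds or every minute via cron.
// Writes the aggregated stats to a file the homepage reads,
// instead of hitting the database on every request.
$dbh = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'pass');

$stmt = $dbh->query("
    SELECT
        COUNT(*) AS totalusers,
        SUM(cashedout) AS cashedout,
        (SELECT SUM(value) FROM xeon_stats_clicks WHERE typ = '1') AS totalclicks
    FROM users
");
$stats = $stmt->fetch(PDO::FETCH_ASSOC);

file_put_contents('stats.txt', json_encode($stats));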
From what I can tell, there is nothing about this query that is specific to any user on the site. So if you have this query being executed for every user that makes a request, you are making thousands of identical queries.
You could do a sort of caching like so:
Create a table that basically looks like the output of this query.
Make a PHP script that just executes this query and updates the aforementioned table with the latest result.
Execute this PHP script as a cron job every minute to update the stats.
Then the query that gets run for every request can be real simple, like:
SELECT totalusers, cashedout, totalclicks FROM stats_table
From the query, I can't see any real reason to use a sub-query in there, as it doesn't use any of the data in the users table, and it's likely that this is slowing it down - if memory serves, it will query that xeon_stats_clicks table once for every row in your users table (which is a lot of rows by the looks of things).
Try doing it as two separate queries rather than one.
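For example, something along these lines (a sketch using the table names from the question):

-- one aggregate per table, no sub-query
SELECT COUNT(*) AS totalusers, SUM(cashedout) AS cashedout FROM users;

SELECT SUM(value) AS totalclicks FROM xeon_stats_clicks WHERE typ = '1';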

Optimize Pagination and Filter Query Results with Count

I am having performance issues when dealing with pagination and filtering of products, as seen on many e-commerce sites; here is an example from Zappos.
It's the standard pattern:
Showing 1-10 of 132 results. [prev] 1 2 [3] 4 ... 13 [next]
[10] Results per page
To me it seems like a large part of the problem is that the query is run twice: once to count the number of results and again to actually populate the array. Below is the "filter" query:
SELECT product_id, product_title, orderable
FROM table_view
WHERE (family_title = 'Shirts' OR category_title = 'Shirts')
AND ((detail_value = 'Blue' AND detail_title = 'Color')
OR (detail_value = 'XL' AND detail_title = 'Size'))
GROUP BY product_id, product_title, orderable
HAVING COUNT(detail_title)=2
ORDER BY product_id
LIMIT 10 OFFSET 0
The query takes about 20ms to run by itself. The table it selects from is a view that joins about five different tables. The user supplies detail_value and detail_title as the filtering criteria, along with the family and category; the LIMIT is set by the "results per page" selection, so if they want to view all results the limit is set to 2000. Every time they move to a new page via the pagination, the whole query runs again. Below is a snippet of the PHP: $products is an array of the query results, and $number_of_results is a count of the same query run with the maximum limit.
$products = filter($value, $category_title, $number_per_page, $subcategory, $start_number);
$number_of_results = count(filter($value, $category_title, 2000, $subcategory, 0));
$pages = ceil($number_of_results / $number_per_page);
When run on my local machine the results page takes about 600-800ms to load; deployed to Heroku, the page takes 13-16 seconds. I've left out a lot of the PHP code, but I'm using PHP's PDO class to turn the query results into objects to display. The tables being joined are the product table, the category table, the detail table, and the two tables linking them via foreign keys.
Google results show that this is a pretty common/complex problem, but I have yet to come across any real solution that works for me.
Pagination queries generally need to run several times: once to determine how many records would be shown, again to grab a screen of records, then subsequent queries to grab each following screen, and so on.
Two solutions to slow pagination queries are:
Use a cursor to pull n-records from the open query resultset
Speed up the queries
Solution 1 can be expensive memory-wise for the server's resources and might not scale well if you have many concurrent users generating queries like this. It might also be difficult to implement cursors with the PDO class you're using.
Solution 2 could be done via improving view queries, adding indexes, etc. However that may not be enough. If the tables are read much more often than they are written to, you might try using UPDATE/INSERT/DELETE trigger tricks. Rather than running the query against a VIEW, create a table with the same column structure and data as the VIEW. Any time that one of the underlying tables changes, manually modify this new table to follow the changes. This will slow down writes, but greatly improve reading.
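As a rough illustration of that idea (the flat table mirrors the columns used in the filter query above; the underlying table `product` and its columns `id`, `title`, `orderable` are hypothetical, since the real schema isn't shown):

-- A flat, indexed copy of the view's output
CREATE TABLE table_view_flat AS
SELECT product_id, product_title, orderable, family_title, category_title,
       detail_value, detail_title
FROM table_view;

ALTER TABLE table_view_flat
  ADD INDEX idx_filter (family_title, category_title, detail_title, detail_value);

-- Keep the flat copy in sync when one of the underlying tables changes
CREATE TRIGGER product_after_update AFTER UPDATE ON product
FOR EACH ROW
  UPDATE table_view_flat
  SET product_title = NEW.title, orderable = NEW.orderable
  WHERE product_id = NEW.id;

The filter/pagination query then runs against table_view_flat instead of the five-table view.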

How to efficiently paginate large datasets with PHP and MySQL?

As some of you may know, use of the LIMIT keyword in MySQL does not preclude it from reading the preceding records.
For example:
SELECT * FROM my_table LIMIT 10000, 20;
This means that MySQL will still read the first 10,000 records and throw them away before producing the 20 we are after.
So, when paginating a large dataset, high page numbers mean long load times.
Does anyone know of any existing pagination class/technique/methodology that can paginate large datasets in a more efficient way i.e. that does not rely on the LIMIT MySQL keyword?
In PHP if possible as that is the weapon of choice at my company.
Cheers.
First of all, if you want to paginate, you absolutely have to have an ORDER BY clause. Then you simply have to use that clause to dig deeper in your data set. For example, consider this:
SELECT * FROM my_table ORDER BY id LIMIT 20
You'll have the first 20 records, let's say their id's are: 5,8,9,...,55,64. Your pagination link to page 2 will look like "list.php?page=2&id=64" and your query will be
SELECT * FROM my_table WHERE id > 64 ORDER BY id LIMIT 20
No offset, only 20 records read. It doesn't allow you to jump arbitrarily to any page, but most of the time people just browse the next/prev page. An index on "id" will improve the performance, even with big OFFSET values.
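In PHP, a minimal PDO sketch of this keyset approach might look like this ($dbh is an assumed PDO connection; the table and column names follow the example above):

// Keyset ("seek") pagination: remember the last id of the previous page
// and continue from there instead of using a growing OFFSET.
$lastId  = isset($_GET['id']) ? (int) $_GET['id'] : 0;
$perPage = 20;

$stmt = $dbh->prepare(
    "SELECT * FROM my_table WHERE id > :last_id ORDER BY id LIMIT :per_page"
);
$stmt->bindValue(':last_id', $lastId, PDO::PARAM_INT);
$stmt->bindValue(':per_page', $perPage, PDO::PARAM_INT);
$stmt->execute();
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

// The "next" link carries the last id of this page, e.g. list.php?page=3&id=...
$nextId = $rows ? end($rows)['id'] : null;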
A solution might be to not use the limit clause, and use a join instead -- joining on a table used as some kind of sequence.
For more information, I found this question/answer on SO, which gives an example -- that might help you ;-)
There are basically 3 approaches to this, each of which has its own trade-offs:
Send all 10000 records to the client, and handle pagination client-side via Javascript or the like. Obvious benefit is that only a single query is necessary for all of the records; obvious downside is that if the record size is in any way significant, the size of the page sent to the browser will be of proportionate size - and the user might not actually care about the full record set.
Do what you're currently doing, namely SQL LIMIT and grab only the records you need with each request, completely stateless. Benefit in that it only sends the records for the page currently requested, so requests are small, downsides in that a) it requires a server request for each page, and b) it's slower as the number of records/pages increases for later pages in the result, as you mentioned. Using a JOIN or a WHERE clause on a monotonically increasing id field can sometimes help in this regard, specifically if you're requesting results from a static table as opposed to a dynamic query.
Maintain some sort of state object on the server which caches the query results and can be referenced in future requests for a limited period of time. Upside is that it has the best query speed, since the actual query only needs to run once; downside is having to manage/store/cleanup those state objects (especially nasty for high-traffic websites).
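A crude sketch of the third approach, caching one user's result set in the session and slicing it per page ($dbh and the query are placeholders; a shared cache such as memcached would scale better than per-user sessions):

session_start();

$perPage = 20;
$page    = isset($_GET['page']) ? max(1, (int) $_GET['page']) : 1;

// Run the expensive query once and keep the result for this session.
if (!isset($_SESSION['search_results'])) {
    $stmt = $dbh->query("SELECT * FROM my_table ORDER BY id");
    $_SESSION['search_results'] = $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Each page is just a slice of the cached result set.
$rows = array_slice($_SESSION['search_results'], ($page - 1) * $perPage, $perPage);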
SELECT * FROM my_table LIMIT 10000, 20;
means: show 20 records, starting from record #10000 of the result set. If you use the primary key in the WHERE clause, there will not be a heavy load on MySQL.
Any other method of pagination, such as a join-based approach, will put a much heavier load on the server.
I'm not aware of the performance decrease you've mentioned, and I don't know of any other solution for pagination; however, an ORDER BY clause might help you reduce the load time.
The best way is to define an indexed sequence field in my_table and increment it for every newly inserted row. After that you can use WHERE YOUR_INDEX_FIELD BETWEEN 10000 AND 10020.
It will be much faster.
Some other options:
Partition the table per page, so you can skip the LIMIT entirely.
Store the results in a session (a good idea is to create an md5 hash of the query, and use that to cache the result set across multiple users).

Pagination Strategies for Complex (slow) Datasets

What are some of the strategies being used for pagination of data sets that involve complex queries? count(*) takes ~1.5 sec so we don't want to hit the DB for every page view. Currently there are ~45k rows returned by this query.
Here are some of the approaches I've considered:
Cache the row count and update it every X minutes
Limit (and offset) the rows counted to 41 (for example) and display the page picker as "1 2 3 4 ..."; then recompute if anyone actually goes to page 4 and display "... 3 4 5 6 7 ..."
Get the row count once and store it in the user's session
Get rid of the page picker and just have a "Next Page" link
I've had to engineer a few pagination strategies using PHP and MySQL for a site that does over a million page views a day. I pursued the strategy in stages:
Multi-column indexes: I should have done this first, before attempting a materialized view.
Generating a materialized view. I created a cron job that did a common denormalization of the document tables I was using. I would SELECT ... INTO OUTFILE ... and then create the new table, and rotate it in:
SELECT ... INTO OUTFILE '/tmp/ondeck.txt' FROM mytable ...;
CREATE TABLE ondeck_mytable LIKE mytable;
LOAD DATA INFILE '/tmp/ondeck.txt' INTO TABLE ondeck_mytable...;
DROP TABLE IF EXISTS dugout_mytable;
RENAME TABLE atbat_mytable TO dugout_mytable, ondeck_mytable TO atbat_mytable;
This kept the lock time on the write-contended mytable down to a minimum, and the pagination queries could hammer away at the atbat materialized view. I've simplified the above, leaving out the actual data manipulation, which is unimportant.
Memcache: I then created a wrapper around my database connection to cache these paginated results in memcache. This was a huge performance win. However, it was still not good enough.
Batch generation: I wrote a PHP daemon and extracted the pagination logic into it. It would detect changes to mytable and periodically regenerate all the pages, from the oldest changed record to the most recent, onto the webserver's filesystem. With a bit of mod_rewrite I could check whether a page existed on disk and serve it up directly. This also allowed me to take effective advantage of reverse proxying by letting Apache detect If-Modified-Since headers and respond with 304 response codes. (Obviously, I removed the option of letting users select the number of results per page, an unimportant feature.)
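A stripped-down illustration of serving those pre-generated pages (the paths and file naming here are made up; the real setup did the existence check in mod_rewrite):

// Front controller: if the batch daemon already rendered this page to disk,
// serve the static copy; otherwise fall through to building it dynamically.
$page      = isset($_GET['page']) ? max(1, (int) $_GET['page']) : 1;
$cacheFile = '/var/cache/pages/list-' . $page . '.html';

if (is_file($cacheFile)) {
    header('Last-Modified: ' . gmdate('D, d M Y H:i:s', filemtime($cacheFile)) . ' GMT');
    readfile($cacheFile);
    exit;
}
// ...otherwise run the pagination query and render the page as usual.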
Updated:
RE count(*): when using MyISAM tables, COUNT wasn't a problem once I was able to reduce the read-write contention on the table. If I were using InnoDB, I would create a trigger that updates an adjacent table with the row count; the trigger would just do +1 or -1 depending on whether the statement was an INSERT or DELETE.
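That counter-table idea, sketched out (the counter table and trigger names are made up; mytable is the table being paginated):

-- Adjacent table holding the current row count of mytable
CREATE TABLE mytable_count (total INT UNSIGNED NOT NULL) ENGINE=InnoDB;
INSERT INTO mytable_count SELECT COUNT(*) FROM mytable;

CREATE TRIGGER mytable_count_ins AFTER INSERT ON mytable
FOR EACH ROW UPDATE mytable_count SET total = total + 1;

CREATE TRIGGER mytable_count_del AFTER DELETE ON mytable
FOR EACH ROW UPDATE mytable_count SET total = total - 1;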
RE page-pickers (thumbwheels): when I moved to aggressive query caching, thumbwheel queries were also cached, and when it came to batch-generating the pages I was using temporary tables, so computing the thumbwheel was no problem. A lot of the thumbwheel calculation was simplified because it became a predictable filesystem pattern that really only needed the largest page number; the smallest page number was always 1.
Windowed thumbwheel: the example you give above for a windowed thumbwheel (<< 4 [5] 6 >>) should be pretty easy to do without any queries at all, as long as you know your maximum number of pages.
My suggestion is to ask MySQL for one row more than you need in each query, and decide based on the number of rows in the result set whether or not to show the next-page link.
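In PHP that is only a few lines (a sketch; $dbh, the table and the ORDER BY column are placeholders):

$perPage = 20;
$page    = isset($_GET['page']) ? max(1, (int) $_GET['page']) : 1;

// Ask for one row more than we will display.
$stmt = $dbh->prepare("SELECT * FROM mytable ORDER BY id LIMIT :lim OFFSET :off");
$stmt->bindValue(':lim', $perPage + 1, PDO::PARAM_INT);
$stmt->bindValue(':off', ($page - 1) * $perPage, PDO::PARAM_INT);
$stmt->execute();
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

// If the extra row came back there is a next page; don't display the extra row itself.
$hasNextPage = count($rows) > $perPage;
$rows        = array_slice($rows, 0, $perPage);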
MySQL has a specific mechanism to compute the count of a result set without the LIMIT clause: FOUND_ROWS().
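It is used together with the SQL_CALC_FOUND_ROWS modifier, roughly like this:

SELECT SQL_CALC_FOUND_ROWS *
FROM mytable
WHERE col1 = :myvalue
ORDER BY id
LIMIT 20 OFFSET 0;

-- returns the count the previous SELECT would have had without the LIMIT
SELECT FOUND_ROWS();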
MySQL is quite good at optimizing LIMIT queries: it picks an appropriate join buffer, filesort buffer, etc., just large enough to satisfy the LIMIT clause.
Also note that with 45k rows you probably don't need exact count. Approximate counts can be figured out using separate queries on the indexed fields. Say, this query:
SELECT COUNT(*)
FROM mytable
WHERE col1 = :myvalue
  AND col2 = :othervalue

can be approximated by this one:

SELECT COUNT(*) *
       (
           SELECT COUNT(*)
           FROM mytable
       ) / 1000
FROM (
         SELECT 1
         FROM mytable
         WHERE col1 = :myvalue
           AND col2 = :othervalue
         LIMIT 1000
     ) AS sample

which is much more efficient on MyISAM.
If you give an example of your complex query, probably I can say something more definite on how to improve its pagination.
I'm by no means a MySQL expert, but perhaps give up the COUNT(*) and go with COUNT(id) instead?
