Does MySQL load the whole table into cache every time? - php

Let's say I have a table with, say, 1 million rows, where the first column is the primary key.
Then, if I run the following:
SELECT * FROM table WHERE id='tomato117' LIMIT 1
Does the table ALL get put into the cache (thereby causing the query to slow down as more and more rows are added), or would the number of rows in the table not matter, since the query uses the primary key?
edit: (added limit 1)

If id is defined as the primary key, there is only one record with the value tomato117, so the LIMIT is not useful.
Using SELECT * will make MySQL read from disk, because it is unlikely that all columns are stored in the index (MySQL cannot fetch them from the index alone). In theory, this affects performance.
However, your SQL matches the query cache conditions, so MySQL will store the result in the query cache for subsequent use.
If your query cache size is huge, MySQL will keep storing SQL results in the query cache until memory is full.
This comes with a cost: whenever your table is updated, query cache invalidation becomes harder for MySQL.
http://www.mysqlperformanceblog.com/2007/03/23/beware-large-query_cache-sizes/
http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/

Nothing of the sort.
It will only fetch the row you selected and perhaps a few other blocks. They will remain in cache until something pushes them out.
By cache, I refer to the innodb buffer pool, not query cache, which should probably be off anyway.
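If you want to check what you actually have, both settings are ordinary MySQL system variables (a quick sketch; the values on your server will differ):
SHOW VARIABLES LIKE 'innodb_buffer_pool_size'; -- InnoDB buffer pool size, in bytes
SHOW VARIABLES LIKE 'query_cache_size'; -- 0 means the query cache is disabled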

SELECT * FROM table WHERE id = 'tomato117' LIMIT 1
When tomato117 is found, the search stops. If you don't set LIMIT 1, the search continues to the end of the table: tomato117 could be the second row, and MySQL would still scan the remaining 1 000 000 rows looking for another tomato117.
http://forge.mysql.com/wiki/Top10SQLPerformanceTips
Showing rows 0 - 0 (1 total, Query took 0.0159 sec)
SELECT *
FROM `forum_posts`
WHERE pid = 643154
LIMIT 0 , 30
Showing rows 0 - 0 (1 total, Query took 0.0003 sec)
SELECT *
FROM `forum_posts`
WHERE pid = 643154
LIMIT 1
Table is about 1GB, 600 000+ rows.

If you add the word EXPLAIN before the word SELECT, it will show you a table with a summary of how many rows it's reading instead of the normal results.
If your table has an index on the id column (including if it's set as primary key), the engine will be able to jump straight to the exact row (or rows, for a non-unique index) and only read the minimal amount of data. If there's no index, it will need to read the whole table.
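For example, against the forum_posts table shown earlier (a sketch; the exact EXPLAIN columns vary by MySQL version):
EXPLAIN SELECT * FROM `forum_posts` WHERE pid = 643154;
-- with an index on pid: type is const or ref, rows shows 1, key names the index used
-- without an index: type is ALL and rows approaches the full table size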

Best approach to select most viewed posts from last n hours

I'm using PHP and MySQL (InnoDB engine).
As the MySQL reference manual says, selecting with a comparison on one column and ordering by another cannot use the same index for both.
I have a table named News.
This table has at least 1 million records with two important columns: time_added and number_of_views.
I need to select the most viewed records from the last n hours. What is the best index to do this? Or is it possible to run this kind of query very fast on a table with millions of records?
I've already done this for "last day": I can select the most viewed records from the last day by adding a new column (date_added). But if I decide to select these records from the last week, I'm in trouble again.
First, write the query:
select n.*
from news n
where time_added >= date_sub(now(), interval <n> hour)
order by number_of_views desc
limit ??;
The best index is (time_added, number_of_views). Actually, number_of_views won't be used for the full query, but I would include it for other possible queries.
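A minimal sketch of that index (table and column names taken from the question; the index name is arbitrary):
ALTER TABLE news ADD INDEX idx_time_views (time_added, number_of_views);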
First you must add the following lines to my.cnf (in the [mysqld] section):
query_cache_size = 32M (or more)
query_cache_limit = 32M (or more)
query_cache_size sets the overall size of the cache.
The other option worth attention, query_cache_limit, sets the maximum size of a single query result that can be placed in the cache.
You can check the status of the cache with the following:
show global status like 'Qcache%';
http://dev.mysql.com/doc/refman/5.7/en/mysql-indexes.html
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3). For more information, see http://dev.mysql.com/doc/refman/5.7/en/multiple-column-indexes.html
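To make the leftmost-prefix rule concrete, a sketch with a hypothetical table t:
ALTER TABLE t ADD INDEX idx_abc (col1, col2, col3);
SELECT * FROM t WHERE col1 = 1; -- can use idx_abc (prefix col1)
SELECT * FROM t WHERE col1 = 1 AND col2 = 2; -- can use idx_abc (prefix col1, col2)
SELECT * FROM t WHERE col2 = 2 AND col3 = 3; -- cannot: the leftmost column col1 is missing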
You need a summary table. Since 'hour' is your granularity, something like this might work:
CREATE TABLE HourlyViews (
the_hour DATETIME NOT NULL,
ct SMALLINT UNSIGNED NOT NULL,
PRIMARY KEY(the_hour)
) ENGINE=InnoDB;
It might need another column (and add it to the PK) if there is some breakdown of the items you are counting. And you might want some other things SUM'd or COUNT'd in this table.
Build and maintain this table incrementally. That is, every hour, add another row to the table. (Or you could keep it updated with INSERT .. ON DUPLICATE KEY UPDATE ...)
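A hedged sketch of that hourly upsert, assuming the news table from the question (here counting rows added per hour; as noted above, you may want to SUM something instead, and the time window would come from your hourly job):
INSERT INTO HourlyViews (the_hour, ct)
SELECT '2016-01-01 12:00:00', COUNT(*)
FROM news
WHERE time_added >= '2016-01-01 12:00:00'
AND time_added < '2016-01-01 13:00:00'
ON DUPLICATE KEY UPDATE ct = VALUES(ct);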
More on Summary Tables
Then change the query to use that table; it will be a lot faster.

Update Current Row in MySQL Loop

I have a MySQL table with over 16 million rows and there is no primary key. Whenever I try to add one, my connection crashes. I have tried adding one as an auto-increment in phpMyAdmin and in a shell, but the connection is always lost after about 10 minutes.
What I would like to do is loop through the table's rows in PHP, so I can limit the number of results, and add an auto-incremented ID number to each returned row. Since each query would affect fewer rows, the load on MySQL would be reduced and I wouldn't lose my connection.
I want to do something like
SELECT * FROM MYTABLE LIMIT 1000001, 2000000;
Then, in the loop, update the current row
UPDATE (current row) SET ID='$i++'
How do I do this?
Note: the original data was given to me as a txt file. I don't know if there are duplicates but I cannot eliminate any rows. Also, no rows will be added. This table is going to be used only for querying purposes. When I have added indexes, however, there were no problems.
I suspect you are trying to use phpMyAdmin to add the index. As handy as it is, it is a PHP script and is limited to the same resources as any PHP script on your server: typically 30-60 seconds of run time and a limited amount of RAM.
Suggest you get the MySQL query you need to add the index, then SSH in and use the command-line MySQL client to add your indexes.
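For instance, the auto-increment key the question asks for can be added in a single statement from the mysql command-line client (a sketch; it will still take a while on 16 million rows, but it won't hit PHP's limits):
ALTER TABLE MYTABLE ADD COLUMN ID INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;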
If you don't have duplicate rows, then the following approach might shed some light.
Suppose you want to assign the incremented ID to the first 10,000 rows:
UPDATE MYTABLE
INNER JOIN (
    SELECT
        *,
        @rn := @rn + 1 AS row_number
    FROM MYTABLE, (SELECT @rn := 0) var
    ORDER BY SOME_OF_YOUR_FIELD
    LIMIT 0, 10000
) t
ON t.field1 = MYTABLE.field1 AND t.field2 = MYTABLE.field2 AND .... t.fieldN = MYTABLE.fieldN
SET MYTABLE.ID = t.row_number;
For the next 10,000 rows you just need to change two things:
(SELECT @rn := 10000) var
LIMIT 10000, 10000
Repeat..
Note: ORDER BY SOME_OF_YOUR_FIELD is important; otherwise you would get the results in random order. Better to create a function that takes limit and offset as parameters and does this job, since you need to repeat the process.
Explanation:
The idea is to create a temporary table (t) having N rows, with a unique row number assigned to each row. Then make an inner join between your main table MYTABLE and this temporary table t, matching on all the fields, and update the ID field of the matching row in MYTABLE with the incremented value (in this case row_number).
Another IDEA:
You may use multithreading in PHP to do this job.
Create N threads.
Assign each thread a non-overlapping region (1 to 10000, 10001 to 20000, etc.) like the above query.
Caution: the query will get slower at higher offsets.

Getting random results from large tables

I'm trying to get 4 random results from a table that holds approx 7 million records. Additionally, I also want to get 4 random records from the same table that are filtered by category.
Now, as you would imagine doing random sorting on a table this large causes the queries to take a few seconds, which is not ideal.
One other method I thought of for the non-filtered result set would be to just get PHP to select some random numbers between 1 - 7,000,000 or so and then do an IN(...) with the query to only grab those rows - and yes, I know that this method has a caveat in that you may get less than 4 if a record with that id no longer exists.
However, the above method obviously will not work with the category filtering, as PHP doesn't know which record numbers belong to which category and hence cannot pick the record numbers to select from.
Are there any better ways I can do this? The only way I can think of would be to store the record IDs for each category in another table, select random results from that, and then select only those record IDs from the main table in a secondary query; but I'm sure there is a better way!?
You could of course use the RAND() function in a query with a LIMIT and a WHERE (for the category). That, however, as you pointed out, entails a scan of the table, which takes time, especially in your case due to the volume of data.
Your other alternative, again as you pointed out, storing id/category_id in another table, might prove a bit faster, but again there has to be a LIMIT and a WHERE on that table, which will also contain the same number of records as the master table.
A different approach (if applicable) would be to have a table per category and store the IDs in it. If your categories are fixed or do not change often, you should be able to use that approach. In that case you effectively remove the WHERE clause, and a RAND() with a LIMIT on each category table will be faster, since each category table contains only a subset of the records in your main table.
Some other alternatives would be to use a key/value store just for that operation. MongoDB or Google App Engine can help with that and are really fast.
You could also go towards the approach of a Master/Slave in your MySQL. The slave replicates content in real time but when you need to perform the expensive query you query the slave instead of the master, thus passing the load to a different machine.
Finally you could go with Sphinx which is a lot easier to install and maintain. You can then treat each of those category queries as a document search and let Sphinx randomize the results. This way you offset this expensive operation to a different layer and let MySQL continue with other operations.
Just some issues to consider.
Working off your random number approach
Get the max id in the database.
Create a temp table to store your matches.
Loop n times doing the following
Generate a random number between 1 and maxId
Get the first record with a record Id greater than the random number and insert it into your temp table
Your temp table now contains your random results.
Or you could dynamically generate sql with a union to do the query in one step.
(SELECT * FROM myTable WHERE ID >= (SELECT FLOOR(RAND() * MAX(ID)) FROM myTable)
AND Category = 'zzz' ORDER BY ID LIMIT 1)
UNION
(SELECT * FROM myTable WHERE ID >= (SELECT FLOOR(RAND() * MAX(ID)) FROM myTable)
AND Category = 'zzz' ORDER BY ID LIMIT 1)
UNION
(SELECT * FROM myTable WHERE ID >= (SELECT FLOOR(RAND() * MAX(ID)) FROM myTable)
AND Category = 'zzz' ORDER BY ID LIMIT 1)
UNION
(SELECT * FROM myTable WHERE ID >= (SELECT FLOOR(RAND() * MAX(ID)) FROM myTable)
AND Category = 'zzz' ORDER BY ID LIMIT 1)
Note: my sql may not be valid, as I'm not a mySql guy, but the theory should be sound
First you need to get the number of rows ... something like this:
select count(1) from tbl where category = ?
then pick a random offset:
$offset = rand(0, $rowsNum - 1);
and select a row at that offset:
select * FROM tbl WHERE category = ? LIMIT $offset, 1
This way you avoid missing IDs. The only problem is that you need to run the second query several times. UNION may help in this case.
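Following the UNION suggestion, a sketch that grabs several random offsets in one round trip (offsets 3 and 42 are just example values computed in PHP; each branch needs parentheses to carry its own LIMIT):
(SELECT * FROM tbl WHERE category = ? LIMIT 3, 1)
UNION ALL
(SELECT * FROM tbl WHERE category = ? LIMIT 42, 1);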
For MySQL you can use
RAND()
SELECT column FROM table
ORDER BY RAND()
LIMIT 4

Query speed with LIMIT on millions of records

Hi, I have a 7 million record db table for testing query speed.
I tested two queries, which are the same query with different LIMIT parameters:
query 1 -
SELECT *
FROM table
LIMIT 20, 50;
query 2 -
SELECT *
FROM table
LIMIT 6000000, 6000030;
query exec times are:
query 1 - 0.006 sec
query 2 - 5.500 sec
In both of these queries I am fetching the same number of records, but in the second case it's taking more time. Can someone please explain the reasons behind this?
Without looking into it too closely, my assumption is that this occurs because the first query only has to read to the 50th record to return results, whereas the second query has to read past six million before returning results. Basically, the first query just finishes sooner.
I would assume that this has an incredible amount to do with the makeup of the table - field types and keys, etc.
If a record is made up of fixed-length fields (e.g. CHAR vs. VARCHAR), then the DBMS can just calculate where the nth record starts and jump there. If it's variable-length, it has to read the records to determine where the nth record starts. Similarly, I'd further assume that tables with appropriate primary keys would be quicker to query than those without such keys.
I think the slowdown is tied to the fact that you are using LIMIT with offsets and are querying the table with no additional context for indexing. It's possible the first is just faster because it can get to the offset quicker.
It's the difference between returning 50 rows and 6,000,030 rows (or ~1 million rows, since you said there were only 7 million).
With two arguments, the first argument specifies the offset of the first row to return, and the second specifies the maximum number of rows to return. The offset of the initial row is 0 (not 1):
SELECT * FROM tbl LIMIT 5,10; # Retrieve rows 6-15
http://dev.mysql.com/doc/refman/5.0/en/select.html
Also, I think you're looking for 30-row pages, so your queries should use 30 as the second parameter in the LIMIT clause:
SELECT *
FROM table
LIMIT 20, 30;
SELECT *
FROM table
LIMIT 6000000, 30;

Is it possible to have 2 limits in a MySQL query?

OK, here is the situation (using PHP/MySQL): you are getting results from a large MySQL table.
Let's say your MySQL query returns 10,000 matching results and you have a paging script to show 20 results per page. Your query might look like this.
So page 1 query
SELECT column
FROM table_name
WHERE userId=1
AND somethingelse='something else'
LIMIT 0,20
So page 2 query
SELECT column
FROM table_name
WHERE userId=1
AND somethingelse='something else'
LIMIT 20,40
Now you are grabbing 20 results at a time, but there are a total of 10,000 rows that match your search criteria.
How can you return only 3,000 of the 10,000 results and still do your paging of 20 per page with a LIMIT 20 in your query?
I thought this was impossible, but Myspace does it on their browse page somehow. I know they aren't using PHP/MySQL, but how can it be achieved?
UPDATE
I see some people have replied with a couple of methods; it seems none of these would actually improve the performance by limiting the number to 3,000?
Program your PHP so that when it finds itself ready to issue a query that ends with LIMIT 3000, 20 or higher, it just stops and doesn't issue the query.
Or am I missing something?
Update:
MySQL treats the LIMIT clause nicely.
Unless you have SQL_CALC_FOUND_ROWS in your query, MySQL just stops parsing results, sorting etc. as soon as it finds enough records to satisfy your query.
When you have something like that:
SELECT column
FROM table_name
WHERE userId=1
AND somethingelse='something else'
LIMIT 0, 20
MySQL will fetch the first 20 records that satisfy the criteria and stop.
It doesn't matter how many records match the criteria: 50 or 1,000,000, the performance will be the same.
If you add an ORDER BY to your query and don't have an index, then MySQL will of course need to browse all the records to find the first 20.
However, even in this case it will not sort all 10,000: it will keep a "running window" of the top 20 records and sort only within this window, whenever it finds a record with a value large (or small) enough to enter the window.
This is much faster than sorting the whole myriad.
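You can check which case applies with EXPLAIN (a sketch; the ORDER BY column is hypothetical, since the question's query has no ORDER BY):
EXPLAIN SELECT column FROM table_name
WHERE userId=1 AND somethingelse='something else'
ORDER BY column LIMIT 0, 20;
-- 'Using filesort' in the Extra column means the running-window sort described above;
-- an index covering the WHERE and ORDER BY columns makes it disappear.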
MySQL, however, is not good at pipelining recordsets. This means that this query:
SELECT column
FROM (
    SELECT column
    FROM table_name
    WHERE userId=1
    AND somethingelse='something else'
    LIMIT 3000
) AS t
LIMIT 0, 20
is worse performance-wise than the first one.
MySQL will fetch 3,000 records, cache them in a temporary table (or in memory) and apply the outer LIMIT only after that.
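The extra materialization step is visible in the execution plan: EXPLAIN on the nested query shows a DERIVED row for the temporary table (a sketch using the placeholders above):
EXPLAIN SELECT column FROM (
    SELECT column FROM table_name
    WHERE userId=1 AND somethingelse='something else'
    LIMIT 3000
) AS t LIMIT 0, 20;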
Firstly, the LIMIT parameters are offset and number of records, so the second parameter should always be 20 - you don't need to increment it.
Surely if you know the upper limit of rows you want to retrieve, you can just put this into the logic which runs the query, i.e. check that offset + limit <= 3000.
As Sohnee said, or (depending on your requirements) you can fetch all 3,000 records with SQL and then use array_slice in PHP to get chunks of the array.
You could achieve this with a subquery...
SELECT name FROM (
    SELECT name FROM tblname LIMIT 0, 3000
) `Results` LIMIT 20, 20
Or with a temporary table, whereby you select all 3,000 rows into a temp table and then page by the temporary row ID, which will be sequential.
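A sketch of that temporary-table variant (names are hypothetical; the AUTO_INCREMENT rid is the sequential row ID to page by):
CREATE TEMPORARY TABLE Results (
    rid INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
) SELECT column FROM table_name WHERE userId=1 LIMIT 3000;
SELECT column FROM Results WHERE rid BETWEEN 21 AND 40; -- rows for page 2, at 20 per page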
You can compute the LIMIT clause from the page number p in your PHP code (offset 20*p, count 20) and limit the value of the page number to 150.
Or you could fetch the 3,000 records and then, using jQuery tabs, split the records into 20 per page.
