Pagination: 2 queries (row count and data) or 1 larger query - PHP

As I do not know much about the speed and complexity of PHP and MySQL(i) scripts, I have this question:
I have a database with 3 tables:
'Products', with about 9 fields, containing product data such as 'long' content text.
'Categories', with 2 fields, containing the names of the categories.
'Productcategories', with 2 fields, recording which product belongs to which categories. Each product is part of 1-3 categories.
In order to set up pagination (I need the row count because I want to know what the last page is), I was wondering what the most efficient way to do it is, and whether it depends on the number of products (50, 100, 500?). The results returned depend on the chosen category:
"SELECT * FROM `productcategories`
JOIN products ON products.proID = productcategories.proID
WHERE productcategories.catID =$category";
Idea 1:
One query which selects only 1 field instead of all of them, and then counts the total rows for my pagination with mysqli_num_rows().
A second query which directly selects the 5 or 10 products (with LIMIT, I expect) to actually be shown.
Idea 2:
Only 1 query (the one above), on which you use mysqli_num_rows() for the row count and later filter out the rows you want to show.
I do not know which is best. Idea 1 seems faster as you have to select a lot less data, but I do not know whether needing 2 queries affects the speed much. Which is faster: fetching 'big' amounts of data or running extra queries?
Feel free to correct me if I am completely on the wrong path with my ideas.

It is generally considered best practice to return as little data as possible, so the short answer is to use the two queries.
However, MySQL does provide one interesting function that will return the row count that would have been returned without the LIMIT clause:
FOUND_ROWS()
Just keep in mind that not all DBMSs implement this, so use it with care.
Example:
mysql> SELECT SQL_CALC_FOUND_ROWS * FROM tbl_name
-> WHERE id > 100 LIMIT 10;
mysql> SELECT FOUND_ROWS();
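For illustration, a minimal mysqli sketch of that two-statement flow, applied to the query from the question (the $con connection and the page size of 10 are assumptions, not part of the original code):
// Sketch only: fetch one page plus the un-LIMITed row count in one round trip.
// Assumes an open mysqli connection in $con; the (int) cast is a minimal injection guard.
$result = mysqli_query($con, "SELECT SQL_CALC_FOUND_ROWS products.*
    FROM productcategories
    JOIN products ON products.proID = productcategories.proID
    WHERE productcategories.catID = " . (int)$category . "
    LIMIT 0, 10");
$total = mysqli_fetch_row(mysqli_query($con, "SELECT FOUND_ROWS()"));
$lastPage = (int)ceil($total[0] / 10); // last page number for the pagination links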

Use SELECT COUNT(1) AS count ... for the total number of rows. Then select the data as needed for pagination with LIMIT 0,10 or something like that.
Also, for the total count you don't need to join to the products or categories tables, as those are only needed for displaying extra info.
"SELECT count(1) as count FROM `productcategories` WHERE catID=$category";
Then for data:
"SELECT * FROM `productcategories`
JOIN categories ON categories.catID = productcategories.catID
JOIN products ON products.proID = productcategories.proID
WHERE productcategories.catID = $category LIMIT 0, 10";
Replacing * with the actual fields needed would be better, though.
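Putting both pieces together, here is a minimal sketch of this two-query approach using mysqli prepared statements (the $con connection, the $_GET['page'] handling and the page size are assumptions; table and column names come from the question):
// Sketch only: one cheap count query, then one LIMITed data query.
$perPage = 10;
$page = isset($_GET['page']) ? max(1, (int)$_GET['page']) : 1;
$offset = ($page - 1) * $perPage;

// 1) Total row count, for computing the last page.
$stmt = mysqli_prepare($con, "SELECT COUNT(1) FROM productcategories WHERE catID = ?");
mysqli_stmt_bind_param($stmt, 'i', $category);
mysqli_stmt_execute($stmt);
mysqli_stmt_bind_result($stmt, $total);
mysqli_stmt_fetch($stmt);
mysqli_stmt_close($stmt);
$lastPage = (int)ceil($total / $perPage);

// 2) Only the rows for the current page.
$stmt = mysqli_prepare($con, "SELECT products.* FROM productcategories
    JOIN products ON products.proID = productcategories.proID
    WHERE productcategories.catID = ? LIMIT ?, ?");
mysqli_stmt_bind_param($stmt, 'iii', $category, $offset, $perPage);
mysqli_stmt_execute($stmt);
$products = mysqli_stmt_get_result($stmt); // iterate with mysqli_fetch_assoc()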

Related

How do I improve the speed of these PHP MySQLi queries without indexing?

Let's start by saying that I can't use INDEXING, as I need the INSERT, DELETE and UPDATE operations on this table to be super fast, which they are.
I have a page that displays a summary of order units collected in a database table. To populate the table, an order number is created and then the individual units associated with that order are scanned in, to record which units belong to each order.
For the purposes of this example the table has the following columns:
id, UID, order, originator, receiver, datetime
The individual unit quantities can be in the 1000s per order, and the entire table is growing to hundreds of thousands of units.
The summary page displays the number of units per order and the first and last unit number for each order. I limit the display to the last 30 order numbers.
For example:
Order 10 has 200 units. first UID 1510 last UID 1756
Order 11 has 300 units. first UID 1922 last UID 2831
..........
..........
Currently the response time for the query is about 3 seconds, as the code performs the following:
Look up the last 30 orders by id and sort by order number
While looking at each order number in the array
-- Count the number of database rows that have that order number
-- Select the first UID from all the rows as first
-- Select the last UID from all the rows as last
Display the result
I've determined that the majority of the time is taken by the count of the number of units in each order (~1.8 seconds) and then by determining the first and last numbers in each order (~1 second).
I am really interested in whether there is a way to speed up these queries without INDEXING. Here is the code with the queries.
The first request selects the last 30 orders processed, selected by id and grouped by order number. This gives the last 30 unique order numbers.
$result = mysqli_query($con, "SELECT `order`, ANY_VALUE(receiver) AS receiver, ANY_VALUE(originator) AS originator, ANY_VALUE(id) AS id
FROM scandb
GROUP BY `order`
ORDER BY id DESC
LIMIT 30");
While fetching the last 30 order numbers, count the number of units and get the first and last UID for each order (note that order must be backtick-quoted throughout, as it is a reserved word in MySQL).
while ($row = mysqli_fetch_array($result)) {
    $count = mysqli_fetch_array(mysqli_query($con, "SELECT COUNT(*) AS count FROM scandb WHERE `order` = '".$row['order']."'"));
    $firstLast = mysqli_fetch_array(mysqli_query($con, "SELECT (SELECT UID FROM scandb WHERE `order` = '".$row['order']."' ORDER BY UID ASC LIMIT 1) AS first, (SELECT UID FROM scandb WHERE `order` = '".$row['order']."' ORDER BY UID DESC LIMIT 1) AS last"));
    echo "<td align='center'>".$count['count']."</td>";
    echo "<td align='center'>".$firstLast['first']."</td>";
    echo "<td align='center'>".$firstLast['last']."</td>";
}
With 100K lines in the database, this whole page takes about 3 seconds to build. The majority of the time is spent in the $count and $firstLast queries. I'd like to know if there is a more efficient way to get the same data in less time without indexing the table. Any special tricks that anyone has would be greatly appreciated.
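For reference, the per-order COUNT and first/last UID can usually be folded into the first query with aggregate functions, replacing the 60 per-order queries in the loop with a single pass (a sketch against the scandb table described above; without an index it is still a full scan, but only one):
SELECT `order`,
       COUNT(*) AS count,
       MIN(UID) AS first,
       MAX(UID) AS last,
       MAX(id)  AS last_id
FROM scandb
GROUP BY `order`
ORDER BY last_id DESC
LIMIT 30;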
Design your database with caution
This first tip may seem obvious, but the fact is that most database problems come from badly-designed table structures.
For example, I have seen people storing information such as client info and payment info in the same database column. For both the database system and the developers who will have to work on it, this is not a good thing.
When creating a database, always spread information across separate tables, use clear naming standards and make use of primary keys.
Know what you should optimize
If you want to optimize a specific query, it is extremely useful to be able to get an in-depth look at how it is executed. Using the EXPLAIN statement, you will get lots of useful info on how MySQL processes a specific query, as shown in the example below:
EXPLAIN SELECT * FROM ref_table,other_table WHERE ref_table.key_column=other_table.column;
Don’t select what you don’t need
A very common way to get the desired data is to use the * symbol, which will get all fields from the desired table:
SELECT * FROM wp_posts;
Instead, you should definitely select only the desired fields as shown in the example below. On a very small site with, let’s say, one visitor per minute, that wouldn’t make a difference. But on a site such as Cats Who Code, it saves a lot of work for the database.
SELECT title, excerpt, author FROM wp_posts;
Avoid queries in loops
When using SQL along with a programming language such as PHP, it can be tempting to use SQL queries inside a loop. But doing so is like hammering your database with queries.
This example illustrates the whole “queries in loops” problem:
foreach ($display_order as $id => $ordinal) {
    $sql = "UPDATE categories SET display_order = $ordinal WHERE id = $id";
    mysql_query($sql);
}
Here is what you should do instead:
UPDATE categories
SET display_order = CASE id
WHEN 1 THEN 3
WHEN 2 THEN 4
WHEN 3 THEN 5
END
WHERE id IN (1,2,3)
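As a hedged illustration, the single statement above could be generated from the same $display_order array along these lines (the integer casts are an added minimal guard, since the values are interpolated straight into the SQL):
// Sketch: build one CASE update instead of running one query per row.
$cases = '';
$ids = array();
foreach ($display_order as $id => $ordinal) {
    $cases .= sprintf(" WHEN %d THEN %d", (int)$id, (int)$ordinal);
    $ids[] = (int)$id;
}
$sql = "UPDATE categories SET display_order = CASE id" . $cases . " END"
     . " WHERE id IN (" . implode(',', $ids) . ")";
mysql_query($sql); // same legacy API as the loop example above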
Use join instead of subqueries
As a programmer, subqueries are something you can be tempted to use and abuse. Subqueries, as shown below, can be very useful:
SELECT a.id,
    (SELECT MAX(created)
     FROM posts
     WHERE author_id = a.id) AS latest_post
FROM authors a
Although subqueries are useful, they can often be replaced by a join, which is usually faster to execute.
SELECT a.id, MAX(p.created) AS latest_post
FROM authors a
INNER JOIN posts p
ON (a.id = p.author_id)
GROUP BY a.id
Source: http://20bits.com/articles/10-tips-for-optimizing-mysql-queries-that-dont-suck/

MySQL query: one random item?

I have an issue with displaying a random item from a table. Is there a way to display just one random item, but never the particular item I prefer not to be displayed?
I have a table PRODUCT
with the columns
ID
TITLE
URL_TITLE
and
$link = "coffee";
I want to display just one random product from the PRODUCT table, but not the one matching $link.
In other words, I want a random item from the table, but when $link = "coffee", that product must never be the one returned.
P.S.
$link is matched against the URL_TITLE column :)
This should help:
SELECT ID, TITLE, URL_TITLE
FROM PRODUCT
WHERE URL_TITLE != "coffee"
ORDER BY RAND()
LIMIT 1;
Note that in some versions of SQL, != is written <>.
Obviously, if you want to select all columns or a different subset, just use SELECT * or whatever you need.
Edit: as per HamZa's comment and James' answer, using ORDER BY RAND() is bad practice on large tables. You could potentially generate a random ID and select that, checking that it's not the excluded one, but if there is a whole bunch you can't select, you could end up calling this query a ton of times (which is bad).
Using RAND() in queries is not ideal, especially on large tables, but even on small ones, as you never know when they'll grow (i.e. the site or service exceeds your expectations, etc.).
You can use it like this:
select * from PRODUCT where URL_TITLE != '$link' ORDER BY RAND() LIMIT 1
But you should use something better, like temp tables; or you could select all the IDs from the database in one query, use PHP to pick a random one from them all, and then grab that row from the database by the ID PHP selected, as sketched below:
SELECT ID
FROM PRODUCT
WHERE URL_TITLE != "coffee";
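A short sketch of that ID-list approach (the $con mysqli connection is an assumption; the table and column names come from the question):
// Sketch: fetch the eligible IDs once, pick the random one in PHP.
$ids = array();
$result = mysqli_query($con, "SELECT ID FROM PRODUCT WHERE URL_TITLE != 'coffee'");
while ($row = mysqli_fetch_assoc($result)) {
    $ids[] = (int)$row['ID'];
}
$randomId = $ids[array_rand($ids)];

// Grab the chosen row by primary key - a cheap, index-friendly lookup.
$stmt = mysqli_prepare($con, "SELECT ID, TITLE, URL_TITLE FROM PRODUCT WHERE ID = ?");
mysqli_stmt_bind_param($stmt, 'i', $randomId);
mysqli_stmt_execute($stmt);
$product = mysqli_fetch_assoc(mysqli_stmt_get_result($stmt));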
Although I don't know PHP, I think a simple way to do it is to pass the $link value as part of a where condition.
Using 'pure' SQL:
select *
from product
where URL_TITLE <> 'coffee'
order by rand()
limit 1

Getting random results from large tables

I'm trying to get 4 random results from a table that holds approx 7 million records. Additionally, I also want to get 4 random records from the same table, filtered by category.
Now, as you would imagine, doing random sorting on a table this large causes the queries to take a few seconds, which is not ideal.
One other method I thought of for the non-filtered result set would be to have PHP select some random numbers between 1 and 7,000,000 or so and then do an IN (...) in the query to grab only those rows - and yes, I know this method has a caveat in that you may get fewer than 4 rows if a record with one of those ids no longer exists.
However, the above method obviously will not work with the category filtering, as PHP doesn't know which record numbers belong to which category and hence cannot pick the record numbers to select from.
Are there any better ways to do this? The only way I can think of would be to store the record ids for each category in another table, select random results from that, and then fetch only those record IDs from the main table in a second query; but I'm sure there is a better way!?
You could of course use the RAND() function in a query with a LIMIT and a WHERE (for the category). That, however, as you pointed out, entails a scan of the table, which takes time, especially in your case due to the volume of data.
Your other alternative, again as you pointed out, is to store id/category_id pairs in another table. That might prove a bit faster, but again there has to be a LIMIT and a WHERE on that table, which will also contain the same number of records as the master table.
A different approach (if applicable) would be to have a table per category and store the IDs in it. If your categories are fixed or do not change often, you should be able to use that approach. In that case you effectively remove the WHERE clause, and a RAND() with a LIMIT on each category table will be faster, since each category table contains only a subset of the records of your main table.
Some other alternatives would be to use a key/value store just for this operation. MongoDB or Google App Engine can help with that and are really fast.
You could also go with a master/slave setup in MySQL. The slave replicates content in real time, but when you need to perform the expensive query you query the slave instead of the master, thus passing the load to a different machine.
Finally, you could go with Sphinx, which is a lot easier to install and maintain. You can then treat each of those category queries as a document search and let Sphinx randomize the results. This way you offload this expensive operation to a different layer and let MySQL continue with other operations.
Just some issues to consider.
Working off your random number approach
Get the max id in the database.
Create a temp table to store your matches.
Loop n times doing the following
Generate a random number between 1 and maxId
Get the first record with a record Id greater than the random number and insert it into your temp table
Your temp table now contains your random results.
Or you could dynamically generate SQL with a UNION to do the query in one step:
(SELECT t.* FROM myTable t JOIN (SELECT FLOOR(1 + RAND() * (SELECT MAX(ID) FROM myTable)) AS rid) r
 ON t.ID >= r.rid WHERE t.Category = 'zzz' ORDER BY t.ID LIMIT 1)
UNION
(SELECT t.* FROM myTable t JOIN (SELECT FLOOR(1 + RAND() * (SELECT MAX(ID) FROM myTable)) AS rid) r
 ON t.ID >= r.rid WHERE t.Category = 'zzz' ORDER BY t.ID LIMIT 1)
UNION
(SELECT t.* FROM myTable t JOIN (SELECT FLOOR(1 + RAND() * (SELECT MAX(ID) FROM myTable)) AS rid) r
 ON t.ID >= r.rid WHERE t.Category = 'zzz' ORDER BY t.ID LIMIT 1)
UNION
(SELECT t.* FROM myTable t JOIN (SELECT FLOOR(1 + RAND() * (SELECT MAX(ID) FROM myTable)) AS rid) r
 ON t.ID >= r.rid WHERE t.Category = 'zzz' ORDER BY t.ID LIMIT 1)
Note: I'm not a MySQL guy, so double-check the syntax, but the theory should be sound - each branch computes one random starting ID in a derived table (so RAND() is evaluated once per branch, not once per row) and then grabs the first matching row at or above it. Since UNION removes duplicates, you may occasionally get fewer than 4 rows.
First you need to get the number of rows ... something like this:
select count(1) from tbl where category = ?
Then select a random offset:
$offset = rand(0, $rowsNum - 1);
and select the row at that offset:
select * from tbl where category = ? LIMIT $offset, 1
This way you avoid missing ids. The only problem is that you need to run the second query several times (once per random row you want). A UNION may help in that case.
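A minimal mysqli sketch of that count-plus-offset approach (the $con connection is an assumption; tbl and category are the placeholders used above):
// Sketch: count the candidates, then fetch one row at a random zero-based offset.
$stmt = mysqli_prepare($con, "SELECT COUNT(1) FROM tbl WHERE category = ?");
mysqli_stmt_bind_param($stmt, 's', $category);
mysqli_stmt_execute($stmt);
mysqli_stmt_bind_result($stmt, $rowsNum);
mysqli_stmt_fetch($stmt);
mysqli_stmt_close($stmt);

$offset = rand(0, $rowsNum - 1); // LIMIT offsets are zero-based
$stmt = mysqli_prepare($con, "SELECT * FROM tbl WHERE category = ? LIMIT ?, 1");
mysqli_stmt_bind_param($stmt, 'si', $category, $offset);
mysqli_stmt_execute($stmt);
$row = mysqli_fetch_assoc(mysqli_stmt_get_result($stmt));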
For MySQL you can use
RAND()
SELECT column FROM table
ORDER BY RAND()
LIMIT 4

MySQL percent-based query

Is this kind of mysql query possible?
SELECT power
FROM ".$table."
WHERE category IN ('convertible')
AND type = bwm40%
AND type = audi60%
ORDER BY RAND()
It would go something like this: from all the cars, select the power of the ones that are convertible, but 40% of the selection should be BMWs and the other 60% Audis.
Can this be done with mysql?
I can't seem to make it work with the idea below; it gives me an error. Here is how I tried it:
$result = mysql_query("
SELECT power, torque FROM ".$table."
WHERE category IN ('convertible')
ORDER BY (case type when 'bmw' then 0.4 when 'audi' then 0.6) * RAND() DESC
LIMIT ".$offset.", ".$rowsperpage."");
You could try adjusting the randomness using a CASE:
SELECT power
FROM table
WHERE category IN ('convertible')
AND type IN ('bmw', 'audi')
ORDER BY (CASE type WHEN 'bmw' THEN Wbmw WHEN 'audi' THEN Waudi END) * RAND() DESC
Where Wbmw and Waudi are weighting factors. (Note that a CASE expression must be closed with END; its absence is the likely cause of the error in your attempt.) Then you'd add a LIMIT clause to chop off the results at your desired size. That won't guarantee your desired proportions, but it might be good enough for your purposes.
You'd want to play with the weighting factors (Wbmw and Waudi above) a bit to get the results you want. The weighting factors depend on the frequencies of bmw and audi in your database, so 0.2 and 0.8, for example, might work better. As Chris notes in the comments, 0.4 and 0.6 would only work if you have a 50/50 split between BMW and Audi. Putting the weights in a separate table would make this approach easier to maintain, and the SQL would be prettier.
I doubt this can be done properly in a single statement. Personally I would:
Calculate the COUNT() for each car type, grabbing both counts in one query.
Retrieve the two car types separately using subqueries, with LIMIT set to the correct amount and the offset based on the desired percentage (so if you want 20 results total, starting at offset 40, and BMW is 40%, then the BMW subquery would take 8 results starting at offset 16 - they need to be integer values).
Use a UNION to combine the results and ORDER BY RAND() to mix them together, as sketched below.
That's only two actual queries - one for the counts, one combined query for the results - and you could wrap them in a stored procedure if performance is that much of an issue.
You could combine them using a statement prepare/execute from the results - have a look at this method from a possible duplicate question.
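A rough sketch of that UNION idea, assuming a cars table with the columns from the question and a page of 20 rows at the 40/60 split (the table name, page size and offsets are illustrative only):
(SELECT power, torque FROM cars
 WHERE category = 'convertible' AND type = 'bmw'
 LIMIT 16, 8)   -- BMW share: 8 of the 20-row page, offset = 40% of 40
UNION ALL
(SELECT power, torque FROM cars
 WHERE category = 'convertible' AND type = 'audi'
 LIMIT 24, 12)  -- Audi share: 12 of the 20-row page, offset = 60% of 40
ORDER BY RAND();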

Possible to limit results returned, yet leave 'pagination' to table?

I am building a PHP site using jQuery and the DataTables plugin. My page is laid out just as it needs to be, with pagination working, but in dealing with large datasets I have noticed the server is pulling ALL returned rows, as opposed to the 10-row (can be more) limit stated for each 'page'.
Is it possible to limit the results of a query and yet keep, say, the ID numbers of those results in memory, so that when page 2 is hit (or the result count is changed) only the new data is fetched?
Does it even make sense to do it this way?
I just don't want to query a DB that returns 2000 rows and then have a 'front-end' plugin make it look like the other results are hidden, when they are truthfully on the page from the start.
The LIMIT clause in MySQL has two parts -- the offset and the row count.
To get the first 10 rows:
SELECT ... LIMIT 0,10
To get the next 10 rows:
SELECT ... LIMIT 10,10
To get the next 10 rows:
SELECT ... LIMIT 20,10
As long as you ORDER the result set the same each time, you absolutely don't have to (and don't want to) first ask the database to send you all 2000 rows.
To display paging in conjunction with this, you still need to know how many total rows match your query. There are two ways to handle that --
1) Ask for a row count with a separate query
SELECT COUNT(*) FROM table WHERE ...
2) Use the SQL_CALC_FOUND_ROWS hint in your query, which will tell MySQL to calculate how many rows match the query before returning only the 10 you asked for. You then issue a SELECT FOUND_ROWS() query to get that result.
SELECT SQL_CALC_FOUND_ROWS column1, column2 ... LIMIT 0,10
Option 2 is generally preferable, since the count is computed as part of the main query rather than by running the search twice; the follow-up SELECT FOUND_ROWS() is trivially cheap. (Note, though, that SQL_CALC_FOUND_ROWS and FOUND_ROWS() are deprecated as of MySQL 8.0.17, where a separate COUNT(*) query is the recommended replacement.)
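For the DataTables case specifically, here is a rough sketch of a server-side processing endpoint built on approach #1 (the mytable name and column list are placeholders; DataTables 1.10+ sends start/length parameters and expects recordsTotal, recordsFiltered and data in the JSON response):
// Sketch: minimal server-side paging endpoint for DataTables.
// Assumes a mysqli connection in $con; 'mytable' and its columns are placeholders.
$start  = isset($_GET['start'])  ? max(0, (int)$_GET['start'])  : 0;  // row offset sent by DataTables
$length = isset($_GET['length']) ? max(1, (int)$_GET['length']) : 10; // page size sent by DataTables

$total = mysqli_fetch_row(mysqli_query($con, "SELECT COUNT(*) FROM mytable"));

$stmt = mysqli_prepare($con, "SELECT id, name FROM mytable ORDER BY id LIMIT ?, ?");
mysqli_stmt_bind_param($stmt, 'ii', $start, $length);
mysqli_stmt_execute($stmt);
$rows = mysqli_fetch_all(mysqli_stmt_get_result($stmt), MYSQLI_ASSOC);

header('Content-Type: application/json');
echo json_encode(array(
    'recordsTotal'    => (int)$total[0],
    'recordsFiltered' => (int)$total[0], // same as total when no search filter is applied
    'data'            => $rows,
));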
