MySQL percent-based query - PHP

Is this kind of mysql query possible?
SELECT power
FROM ".$table."
WHERE category IN ('convertible')
AND type = bmw40%
AND type = audi60%
ORDER BY RAND()
It would go something like this: from all the cars, select the power of the ones that are convertible, but 40% of the selection would be BMWs and the other 60% Audis.
Can this be done with mysql?
I can't seem to make it work with the idea below; it gives me an error. Here is how I tried it:
$result = mysql_query("
SELECT power, torque FROM ".$table."
WHERE category IN ('convertible')
ORDER BY (case type when 'bmw' then 0.4 when 'audi' then 0.6) * RAND() DESC
LIMIT ".$offset.", ".$rowsperpage."");

You could try adjusting the randomness using a CASE:
SELECT power
FROM table
WHERE category IN ('convertible')
AND type IN ('bmw', 'audi')
ORDER BY (CASE type WHEN 'bmw' THEN Wbmw WHEN 'audi' THEN Waudi END) * RAND() DESC
Where Wbmw and Waudi are weighting factors. Then you'd add a LIMIT clause to chop off the results at your desired size. That won't guarantee your desired proportions but it might be good enough for your purposes.
You'd want to play with the weighting factors (Wbmw and Waudi above) a bit to get the results you want. The weighting factors would depend on the frequencies of BMW and Audi in your database, so 0.2 and 0.8, for example, might work better. As Chris notes in the comments, 0.4 and 0.6 would only work if you have a 50/50 split between BMW and Audi. Putting the weights in a separate table would make this approach easier to maintain, and the SQL would be prettier.
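As a rough sketch of that separate-table idea, assuming a hypothetical type_weights table with columns (type, weight) seeded with ('bmw', 0.4) and ('audi', 0.6), and a cars table standing in for $table:

// Weight is looked up per row from the hypothetical type_weights table,
// so tuning the proportions only means updating that table.
$sql = "SELECT c.power
        FROM cars c
        JOIN type_weights w ON w.type = c.type
        WHERE c.category IN ('convertible')
        ORDER BY w.weight * RAND() DESC
        LIMIT 20";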

I doubt this can be done properly in a single statement. Personally, I would:
Calculate the COUNT() for each car type, grabbing them together in one query.
Retrieve both car types separately using sub-queries with LIMIT set to the correct amount and offset based on the percentage desired (so if you want 20 results total, starting at offset 40, and BMW is 40%, then the BMW limit would be 8 results starting at offset 16 - they need to be integer values).
Use a UNION to combine the results, with ORDER BY RAND() to mix them together.
That's only two actual queries: one for the counts, one combined query for the results. You could combine them in a stored procedure if performance is that much of an issue.
You could combine them using a statement prepare/execute from the results - have a look at this method from a possible duplicate question.
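A rough sketch of that plan in PHP, with hypothetical table/column names (cars with category, type, and an id column for a stable inner order); the arithmetic mirrors the 20-results-at-offset-40 example above:

// Derive integer per-type limits/offsets from the desired percentages.
$rowsperpage = 20;
$offset      = 40;
$bmwShare    = 0.4;                                   // desired proportion for BMW
$bmwLimit    = (int) round($rowsperpage * $bmwShare); // 8
$audiLimit   = $rowsperpage - $bmwLimit;              // 12
$bmwOffset   = (int) round($offset * $bmwShare);      // 16
$audiOffset  = $offset - $bmwOffset;                  // 24

// One combined query: a deterministic inner ORDER BY keeps paging consistent,
// and the outer ORDER BY RAND() mixes the two makes together.
$sql = "(SELECT power FROM cars WHERE category = 'convertible' AND type = 'bmw'
         ORDER BY id LIMIT $bmwOffset, $bmwLimit)
        UNION ALL
        (SELECT power FROM cars WHERE category = 'convertible' AND type = 'audi'
         ORDER BY id LIMIT $audiOffset, $audiLimit)
        ORDER BY RAND()";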

Related

Pagination: 2 Queries (row-count and data) or 1 larger query

As I do not know anything about the speed and complexity of PHP and MySQL(i) scripts, I have this question:
I have a database with 3 tables:
'Products' with about 9 fields, containing product data, such as 'long' content text.
'Categories' with 2 fields, containing the names of the categories.
'Productcategories' with 2 fields, recording which product has which categories. Each product is part of 1-3 categories.
In order to set up pagination (I need the row count because I wish to know what the last page is), I was wondering what the most efficient way to do it is, and whether it depends on the number of products (50, 100, 500?). The results returned depend on a chosen category:
"SELECT * FROM `productcategories`
JOIN products ON products.proID = productcategories.proID
WHERE productcategories.catID =$category";
Idea 1:
One query which selects only one field, instead of all, and then counts the total rows for my pagination with mysqli_num_rows().
A second query which directly selects the 5 or 10 (with LIMIT, I expect) products to actually be shown.
Idea 2:
Only one query (the one above), on which you use mysqli_num_rows() for the row count and later on filter out the rows you want to show.
I do not know which is best. Idea 1 seems faster, as you have to select a lot less data, but I do not know whether needing two queries influences the speed much. Which is faster: collecting 'big' amounts of data or doing extra queries?
Feel free to correct me if I am completely on the wrong path with my ideas.
It is generally considered best practice to return as little data as possible so the short answer is to use the two queries.
However, MySQL does provide one interesting function that will allow you to return the row count that would have been returned without the limit clause:
FOUND_ROWS()
Just keep in mind that not all DBMSs implement this (and MySQL itself has deprecated SQL_CALC_FOUND_ROWS as of 8.0.17), so use it with care.
Example:
mysql> SELECT SQL_CALC_FOUND_ROWS * FROM tbl_name
-> WHERE id > 100 LIMIT 10;
mysql> SELECT FOUND_ROWS();
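In PHP that could look like the following sketch (mysqli assumed rather than the old mysql_* API, with mysqlnd available for get_result(); table and column names taken from the question; both statements must run on the same connection):

// Fetch one page, then ask MySQL how many rows the query would have
// returned without the LIMIT clause.
$page = $mysqli->query(
    "SELECT SQL_CALC_FOUND_ROWS products.*
     FROM productcategories
     JOIN products ON products.proID = productcategories.proID
     WHERE productcategories.catID = " . (int) $category . "
     LIMIT 0, 10"
);
$total = $mysqli->query("SELECT FOUND_ROWS()")->fetch_row()[0];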
Use select count(1) as count... for the total number of rows. Then select the data as needed for pagination with limit 0,10 or something like that.
Also, for the total count you don't need to join to the products or categories tables, as those joins would only be used for displaying extra info.
"SELECT count(1) as count FROM `productcategories` WHERE catID=$category";
Then for data:
"SELECT * FROM `productcategories`
JOIN categories ON categories.catID = productcategories.catID
JOIN products ON products.proID = productcategories.proID
WHERE productcategories.catID=$category limit 0,10";
Replacing * with the actual fields needed would be better, though.
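Putting the two queries together, a minimal sketch using mysqli prepared statements (so $category is bound rather than interpolated; $currentPage is assumed to come from the request, and mysqlnd is assumed for get_result()):

// Query 1: cheap count for the pagination links.
$stmt = $mysqli->prepare("SELECT count(1) FROM productcategories WHERE catID = ?");
$stmt->bind_param("i", $category);
$stmt->execute();
$total    = $stmt->get_result()->fetch_row()[0];
$lastPage = (int) ceil($total / 10);

// Query 2: only the rows for the current page.
$offset = ($currentPage - 1) * 10;
$stmt = $mysqli->prepare(
    "SELECT products.* FROM productcategories
     JOIN products ON products.proID = productcategories.proID
     WHERE productcategories.catID = ?
     ORDER BY products.proID
     LIMIT ?, 10"
);
$stmt->bind_param("ii", $category, $offset);
$stmt->execute();
$rows = $stmt->get_result()->fetch_all(MYSQLI_ASSOC);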

MySQL ORDER BY RAND() performance issue and solution

I was using ORDER BY RAND() to generate random rows from the database without any issue, but I realised that as the database size increases, RAND() causes heavy load on the server. So I looked for an alternative and tried generating one random number with PHP's rand() function, using it as the id in the MySQL query. That was very, very fast, since MySQL already knows the row id.
But the issue is that in my table not all id numbers are available; for example, the ids run 1, 2, 5, 9, 12 and so on.
If PHP's rand() generates a number like 3 or 4, the query returns nothing, as there is no row with id 3 or 4.
What is the best way to generate random numbers, preferably in PHP, so that only ids available in the table are generated? It must check the table. Please advise.
$id23=rand(1,100000000);
SELECT items FROM tablea where status='0' and id='$id23' LIMIT 1
The above query is fast, but sometimes generates a number which is not available in the database.
SELECT items FROM tablea where status=0 order by rand() LIMIT 1
The above query is too slow and causes heavy load on the server.
First of all, generate a random value from 1 to MAX(id), not 100000000.
Then there are at least a couple of good solutions:
Use > not =
SELECT items FROM tablea where status='0' and id>'$id23' LIMIT 1
Create an index on (status,id,items) to make this an index-only query.
Use =, but just try again with a different random value if you don't find a hit. Sometimes it will take several tries, but often it will take only one. The = should be faster since it can use the primary key, and if it is faster and gets a hit in one try 90% of the time, that could make up for the other 10% of the time when it takes more than one try. It depends on how many gaps you have in your id values.
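A sketch of that retry approach (mysqli assumed; $maxId is assumed to be fetched once with SELECT MAX(id), and the loop is bounded so sparse ids cannot make it spin forever):

// Probe random ids with = until one exists; cap the number of tries.
$row = null;
for ($try = 0; $try < 10 && $row === null; $try++) {
    $id  = rand(1, $maxId);
    $res = $mysqli->query(
        "SELECT items FROM tablea WHERE status = '0' AND id = $id LIMIT 1"
    );
    $row = $res->fetch_row(); // null when that id has no row
}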
Use your DB to find the max value from the table, generate a random number less than or equal to that value, and grab the first row whose id is greater than or equal to your random number. No PHP necessary.
-- The derived table fixes the random target once (RAND() directly in the WHERE
-- clause would be re-evaluated per row), and ORDER BY id makes "first row" well defined.
SELECT t.items
FROM tablea t
JOIN (SELECT FLOOR(1 + RAND() * MAX(id)) AS rnd FROM tablea) r ON t.id >= r.rnd
WHERE t.status = '0'
ORDER BY t.id
LIMIT 1
You are correct: ORDER BY RAND() is not a good solution if you are dealing with large datasets. Depending on how often the order needs to be randomized, what you can do is generate a column with a random number and then update that number at some predefined interval.
You would take that column and use it as your sort index. This works well for a heavy-read environment and produces a predictable random order for a certain period of time.
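A sketch of that idea (mysqli assumed; the rand_sort column and index names are hypothetical additions to the question's table):

// One-time setup: add the random sort column and index it.
$mysqli->query("ALTER TABLE tablea ADD COLUMN rand_sort DOUBLE,
                ADD INDEX idx_rand_sort (rand_sort)");

// Run at your chosen interval (e.g. from cron) to reshuffle the order.
$mysqli->query("UPDATE tablea SET rand_sort = RAND()");

// Cheap, index-backed "random" reads between reshuffles.
$result = $mysqli->query(
    "SELECT items FROM tablea WHERE status = '0' ORDER BY rand_sort LIMIT 1"
);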
A possible solution is to use a random LIMIT offset:
$id23 = rand(0, $numberOfRows - 1);
SELECT items FROM tablea where status='0' LIMIT $id23, 1
This won't produce any missed rows, but (as hek2mgl mentioned) it requires knowing the number of rows in the select.

Getting random results from large tables

I'm trying to get 4 random results from a table that holds approx 7 million records. Additionally, I also want to get 4 random records from the same table that are filtered by category.
Now, as you would imagine doing random sorting on a table this large causes the queries to take a few seconds, which is not ideal.
One other method I thought of for the non-filtered result set would be to just get PHP to select some random numbers between 1 and 7,000,000 or so and then do an IN(...) with the query to grab only those rows - and yes, I know this method has a caveat in that you may get fewer than 4 rows if a record with one of those ids no longer exists.
However, the above method obviously will not work with the category filtering as PHP doesn't know which record numbers belong to which category and hence cannot select the record numbers to select from.
Are there any better ways I can do this? The only way I can think of would be to store the record ids for each category in another table, select random results from that, and then select only those record ids from the main table in a secondary query; but I'm sure there is a better way!?
You could of course use the RAND() function on a query using a LIMIT and WHERE (for the category). That, however, as you pointed out, entails a scan of the database, which takes time, especially in your case due to the volume of data.
Your other alternative, again as you pointed out, to store id/category_id in another table might prove a bit faster, but again there has to be a LIMIT and WHERE on that table, which will also contain the same number of records as the master table.
A different approach (if applicable) would be to have a table per category and store the IDs in it. If your categories are fixed or do not change that often, you should be able to use that approach. In that case you will effectively remove the WHERE from the clause, and getting a RAND() with a LIMIT on each category table would be faster, since each category table contains a subset of the records from your main table.
Some other alternatives would be to use a key/value or document database just for that operation. MongoDB or Google App Engine can help with that and are really fast.
You could also go towards a Master/Slave approach in your MySQL setup. The slave replicates content in real time, but when you need to perform the expensive query you query the slave instead of the master, thus passing the load to a different machine.
Finally, you could go with Sphinx, which is a lot easier to install and maintain. You can then treat each of those category queries as a document search and let Sphinx randomize the results. This way you offload this expensive operation to a different layer and let MySQL continue with other operations.
Just some issues to consider.
Working off your random number approach
Get the max id in the database.
Create a temp table to store your matches.
Loop n times, doing the following:
Generate a random number between 1 and maxId
Get the first record with an id greater than the random number and insert it into your temp table
Your temp table now contains your random results.
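Those steps might look like this in PHP (mysqli assumed; the temp table name is hypothetical, the category filter from the question is kept, and duplicate hits are possible when two probes land on the same row):

// Collect 4 random rows via random-id probes into a temporary table.
$maxId = $mysqli->query("SELECT MAX(ID) FROM myTable")->fetch_row()[0];
$mysqli->query("CREATE TEMPORARY TABLE tmp_random LIKE myTable");
for ($i = 0; $i < 4; $i++) {
    $rnd = rand(1, $maxId);
    $mysqli->query(
        "INSERT INTO tmp_random
         SELECT * FROM myTable
         WHERE ID >= $rnd AND Category = 'zzz'
         ORDER BY ID LIMIT 1"
    );
}
$results = $mysqli->query("SELECT * FROM tmp_random")->fetch_all(MYSQLI_ASSOC);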
Or you could dynamically generate SQL with a UNION to do the query in one step.
(SELECT t.* FROM myTable t JOIN (SELECT FLOOR(RAND() * MAX(ID)) AS rnd FROM myTable) r
 ON t.ID >= r.rnd WHERE t.Category = 'zzz' ORDER BY t.ID LIMIT 1)
UNION
(SELECT t.* FROM myTable t JOIN (SELECT FLOOR(RAND() * MAX(ID)) AS rnd FROM myTable) r
 ON t.ID >= r.rnd WHERE t.Category = 'zzz' ORDER BY t.ID LIMIT 1)
UNION
(SELECT t.* FROM myTable t JOIN (SELECT FLOOR(RAND() * MAX(ID)) AS rnd FROM myTable) r
 ON t.ID >= r.rnd WHERE t.Category = 'zzz' ORDER BY t.ID LIMIT 1)
UNION
(SELECT t.* FROM myTable t JOIN (SELECT FLOOR(RAND() * MAX(ID)) AS rnd FROM myTable) r
 ON t.ID >= r.rnd WHERE t.Category = 'zzz' ORDER BY t.ID LIMIT 1)
Note: my SQL may not be valid, as I'm not a MySQL guy, but the theory should be sound.
First you need to get the number of rows ... something like this
select count(1) from tbl where category = ?
then select a random number
$offset = rand(0, $rowsNum - 1);
and select a row at that offset
select * FROM tbl where category = ? LIMIT $offset, 1
In this way you avoid missing ids. The only problem is that you need to run the second query several times. A UNION may help in this case.
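As a sketch, with the two queries wired together in PHP (mysqli prepared statements assumed; binding an integer placeholder inside LIMIT works with MySQL prepared statements):

// Count the matching rows once.
$stmt = $mysqli->prepare("select count(1) from tbl where category = ?");
$stmt->bind_param("i", $category);
$stmt->execute();
$rowsNum = $stmt->get_result()->fetch_row()[0];

// Pick a random 0-based offset and fetch exactly one row.
$offset = rand(0, $rowsNum - 1);
$stmt = $mysqli->prepare("select * FROM tbl where category = ? LIMIT ?, 1");
$stmt->bind_param("ii", $category, $offset);
$stmt->execute();
$row = $stmt->get_result()->fetch_assoc();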
For MySQL you can use
RAND()
SELECT column FROM table
ORDER BY RAND()
LIMIT 4

Select is much slower when selecting latest records

A table with about 70K records is displayed on a site, showing 50 records per page.
Pagination is done with limit offset,50 on the query, and the records can be ordered on different columns.
Browsing the latest pages (so the offset is around 60,000) makes the queries much slower than when browsing the first pages (about 10x)
Is this an issue of using the limit command?
Are there other ways to get the same results?
With large offsets, MySQL needs to browse more records.
Even if the plan uses filesort (which means that all records should be browsed), MySQL optimizes it so that only $offset + $limit top records are sorted, which makes it much more efficient for lower values of $offset.
The typical solution is to index the columns you are ordering on, record the last value of the columns and reuse it in the subsequent queries, like this:
SELECT *
FROM mytable
ORDER BY
value, id
LIMIT 0, 10
which outputs:
value id
1 234
3 57
4 186
5 457
6 367
8 681
10 366
13 26
15 765
17 345 -- this is the last one
To get to the next page, you would use:
SELECT *
FROM mytable
WHERE (value, id) > (17, 345)
ORDER BY
value, id
LIMIT 0, 10
which uses the index on (value, id).
Of course this won't help with arbitrary access pages, but helps with sequential browsing.
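In PHP, the seek approach might look like this sketch (mysqli assumed; the last value/id pair is assumed to be carried between page requests, e.g. in the query string):

// "Seek" pagination: resume after the last (value, id) pair seen.
$lastValue = 17;  // hypothetical values carried over from the previous page
$lastId    = 345;
$stmt = $mysqli->prepare(
    "SELECT * FROM mytable
     WHERE (value, id) > (?, ?)
     ORDER BY value, id
     LIMIT 10"
);
$stmt->bind_param("ii", $lastValue, $lastId);
$stmt->execute();
$rows = $stmt->get_result()->fetch_all(MYSQLI_ASSOC);
// Remember the last row's value/id to build the next page's WHERE clause.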
Also, MySQL has certain issues with late row lookup. If the columns are indexed, it may be worth trying to rewrite your query like this:
SELECT *
FROM (
SELECT id
FROM mytable
ORDER BY
value, id
LIMIT $offset, $limit
) q
JOIN mytable m
ON m.id = q.id
See this article for more detailed explanations:
MySQL ORDER BY / LIMIT performance: late row lookups
It's how MySQL deals with limits. If it can sort on an index (and the query is simple enough), it can stop searching after finding the first offset + limit rows. So LIMIT 0,10 means that if the query is simple enough, it may only need to scan 10 rows. But LIMIT 1000,10 means that at minimum it needs to scan 1010 rows. Of course, the actual number of rows that need to be scanned depends on a host of other factors. But the point here is that the lower the limit + offset, the lower the lower bound on the number of rows that need to be scanned is...
As for workarounds, I would optimize your queries so that the query itself, without the LIMIT clause, is as efficient as possible. EXPLAIN is your friend in this case...

MySQL performance – Multiple queries or one inefficient query?

I have three tables, each contain some common information, and some information that is unique to the table.
For example: uid, date are universal among the tables, but one table can contain a column type while the other contains currency.
I need to query the database and get the last 20 entries (date DESC) that have been entered in all three tables.
My options are:
Query the database once, with one large query containing three UNION ALL clauses, and pass along fake values for columns, i.e.:
FROM (
SELECT uid, date, currency, 0, 0, 0
and later on
FROM (
SELECT uid, date, 0, type, 0, 0
This would leave me with a lot of null-valued fields.
OR I can query the database three times, and somehow within PHP sort through the information to get the combined latest 20 posts. This would leave me with an excess of information - 60 posts to look through ((LIMIT 20) * 3) - and force me to perform some type of additional quicksort every time.
What option is better/any alternate ideas?
Thanks.
Those two options are more similar than you make it sound.
When you perform the single large query with UNIONs, MySQL will still be performing three separate queries, just as you propose doing in your alternative plan, and then combining them into a single result.
So, you can either let MySQL do the filtering (and LIMIT) for you, or you can do it yourself. Given that choice, letting MySQL do all the work sounds far preferable.
Having extra columns in the result set could theoretically hinder performance, but with so small a result set as your 20 rows, I wouldn't expect it to have any detectable impact.
It all depends on how big your tables are. If each table has a few thousand records, you can go with the first solution (UNION) and you'll be fine.
On bigger tables, I'd probably go with the second solution, mostly because it will use much less memory (RAM) than the UNION way and still be reasonably fast.
But I would advise you to think about your data model and maybe optimize it. The fact that you have to use UNION-based queries usually means there's room for optimization, typically by merging the three tables, with an added "type" field (the name isn't good at all, but you see my point).
If you know your limits, you can LIMIT each query and have the UNION run on only a little data. This should be better, as MySQL will return only 20 rows and will do the sorting faster than you can in PHP...
select * from (
  -- aliases on the first SELECT name the derived table's columns (pad names are placeholders)
  (SELECT uid, date, currency, 0 AS type, 0 AS pad1, 0 AS pad2 from table_a order by date desc limit 20)
  union
  (SELECT uid, date, 0, type, 0, 0 from table_b order by date desc limit 20)
  ...
) t order by date desc limit 20
