mysqli query with 50% within parameter and other half random - php

So, I'm struggling here with a question:
Is it possible to make a query in MySQL similar to this:
at least 20% of the results from category A
at least 15% of the results from category B
at least 10% of the results from category C
at least 10% of the results from category D
and the other 50% random (category A, B, C, D, E, F, G, and so on)?
I've searched for a while but didn't find a good answer, so I'm hoping that anyone here can give a good hint.
Thanks in advance!!
Before anyone asks and gives me a thumbs down, I'm asking this so I can profile my website visitors. It's not at all a random question.
PS: the php tag is there because I sometimes use PHP functions to solve this kind of problem and the website is PHP based.

I'm not sure exactly what you are trying to do, and I think you could solve your problem with a different approach, but I thought about your question anyway and the only idea that came to mind is the following:
Suppose you have a total of 5000 rows and you want to SELECT only 50 of them, with those 50 rows distributed according to your percentages.
$limit = 50;
$cat_a_per = (int) ($limit * 0.2); // 20% of the results
$cat_b_per = (int) ($limit * 0.1); // 10% instead of your 15%, because with 15% the percentages add up to 105% (try summing them up :) )
$cat_c_per = (int) ($limit * 0.1); // 10% of the results
$cat_d_per = (int) ($limit * 0.1); // 10% of the results
$rest_per = (int) ($limit * 0.5); // the remaining 50%
// Now create five MySQL queries like the following:
"SELECT * FROM my_table WHERE cat='A' LIMIT $cat_a_per" ..
"SELECT * FROM my_table WHERE cat='B' LIMIT $cat_b_per" ..
"SELECT * FROM my_table WHERE cat='C' LIMIT $cat_c_per" ..
"SELECT * FROM my_table WHERE cat='D' LIMIT $cat_d_per" ..
"SELECT * FROM my_table ORDER BY RAND() LIMIT $rest_per" .. // random rows from any category
Now combine the results into one array, or use UNION (wrapping each SELECT that has its own LIMIT in parentheses), and you are good to go ...
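A minimal sketch of the quota-plus-random-fill idea, written in Python against an in-memory SQLite database so that it runs on its own. The demo data, the helper name `sample_with_quotas` and the integer percentages are assumptions for illustration; on MySQL the ordering function would be `RAND()` rather than SQLite's `RANDOM()`. The random fill skips rows that were already picked, so the total stays at the requested limit.

```python
import sqlite3

# Hypothetical demo data: 5000 rows spread evenly over categories A..G.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, cat TEXT)")
conn.executemany(
    "INSERT INTO my_table (id, cat) VALUES (?, ?)",
    [(i, "ABCDEFG"[i % 7]) for i in range(1, 5001)],
)


def sample_with_quotas(conn, limit, quotas):
    """Return `limit` rows: a guaranteed share per category, the rest random."""
    picked = []
    for cat, percent in quotas.items():
        n = limit * percent // 100  # integer row count for this category
        picked += conn.execute(
            "SELECT id, cat FROM my_table WHERE cat = ? ORDER BY RANDOM() LIMIT ?",
            (cat, n),
        ).fetchall()
    # Fill the remainder from any category, skipping rows already chosen.
    ids = [row[0] for row in picked]
    marks = ",".join("?" * len(ids))
    picked += conn.execute(
        f"SELECT id, cat FROM my_table WHERE id NOT IN ({marks}) "
        "ORDER BY RANDOM() LIMIT ?",
        (*ids, limit - len(ids)),
    ).fetchall()
    return picked


rows = sample_with_quotas(conn, 50, {"A": 20, "B": 10, "C": 10, "D": 10})
```

Because the fill is drawn from every category, A to D can end up with more than their minimum share, which matches the "at least" wording of the question.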

Related

How to Limit MySQL query based on total records count percentage [duplicate]

Let's say I have a list of values, like this:
id value
----------
A 53
B 23
C 12
D 72
E 21
F 16
..
I need the top 10 percent of this list - I tried:
SELECT id, value
FROM list
ORDER BY value DESC
LIMIT COUNT(*) / 10
But this doesn't work. The problem is that I don't know the number of records before I run the query. Any ideas?
Best answer I found:
SELECT *
FROM (
SELECT list.*, @counter := @counter + 1 AS counter
FROM (SELECT @counter := 0) AS initvar, list
ORDER BY value DESC
) AS X
WHERE counter <= (10/100 * @counter)
ORDER BY value DESC;
Change the 10 to get a different percentage.
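For comparison, a minimal sketch of the same "top N percent" result done from client code in two steps: count first, then bind the computed row count as the LIMIT. It uses Python with an in-memory SQLite table so it is self-contained; the helper name `top_percent` and the 40 demo rows are assumptions, not part of the original question.

```python
import sqlite3

# Hypothetical demo table with 40 rows whose values are 1..40.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE list (id TEXT PRIMARY KEY, value INTEGER)")
conn.executemany(
    "INSERT INTO list (id, value) VALUES (?, ?)",
    [(f"row{i}", i) for i in range(1, 41)],
)


def top_percent(conn, percent):
    """Return the top `percent` percent of rows, ordered by value descending."""
    (total,) = conn.execute("SELECT COUNT(*) FROM list").fetchone()
    n = total * percent // 100  # LIMIT needs an integer
    return conn.execute(
        "SELECT id, value FROM list ORDER BY value DESC LIMIT ?", (n,)
    ).fetchall()


top = top_percent(conn, 10)  # 10% of 40 rows is 4 rows
```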
In case you are doing this for an out-of-order or random situation, I've started using the following style (each row is kept with probability 0.1, so you get roughly, not exactly, 10% of the rows):
SELECT id, value FROM list HAVING RAND() > 0.9
If you need it to be random but controllable you can use a seed (example with PHP):
SELECT id, value FROM list HAVING RAND($seed) > 0.9
Lastly - if this is a sort of thing that you need full control over you can actually add a column that holds a random value whenever a row is inserted, and then query using that
SELECT id, value FROM list HAVING `rand_column` BETWEEN 0.8 AND 0.9
Since this does not require sorting, or ORDER BY - it is O(n) rather than O(n lg n)
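A minimal sketch of the stored-random-column variant, in Python with SQLite so it runs without a server. The table layout, the helper name `random_slice` and the fixed seed are assumptions made for the demo. It shows the two properties mentioned above: the slice holds roughly a tenth of the rows, and the same range returns the same rows every time.

```python
import random
import sqlite3

random.seed(0)  # fixed seed only so the demo is repeatable
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE list (id INTEGER PRIMARY KEY, value INTEGER, rand_column REAL)"
)
# Store a random number in [0, 1) with each row at insert time.
conn.executemany(
    "INSERT INTO list (value, rand_column) VALUES (?, ?)",
    [(i, random.random()) for i in range(10_000)],
)


def random_slice(conn, low, high):
    """Rows whose stored random value falls in [low, high); no sorting needed."""
    return conn.execute(
        "SELECT id, value, rand_column FROM list "
        "WHERE rand_column >= ? AND rand_column < ?",
        (low, high),
    ).fetchall()


sample = random_slice(conn, 0.8, 0.9)  # about 10% of the table
```

Picking a different range of the same width (for example 0.1 to 0.2) gives a different, non-overlapping sample of about the same size.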
You can also try this:
SET @amount = (SELECT COUNT(*) FROM page) DIV 10; -- integer division, since LIMIT needs an integer
PREPARE STMT FROM 'SELECT * FROM page LIMIT ?';
EXECUTE STMT USING @amount;
This works around the MySQL bug described here: http://bugs.mysql.com/bug.php?id=19795
Hope it helps.
I realize this is VERY old, but it still pops up as the top result when you google SQL limit by percent, so I'll try to save you some time. This is pretty simple to do these days in SQL Server (T-SQL); note that MySQL itself does not support TOP. The following would give the OP the results they need:
SELECT TOP 10 PERCENT
id,
value
FROM list
ORDER BY value DESC
To get a quick and dirty random 10 percent of your table, the following would suffice:
SELECT TOP 10 PERCENT
id,
value
FROM list
ORDER BY NEWID()
I have an alternative which hasn't been mentioned in the other answers: if you access the database from any language where you have full access to the MySQL API (i.e. not the MySQL CLI), you can launch the query, ask how many rows there will be and then break out of the loop when it is time.
E.g. in Python:
...
maxnum = cursor.execute(query)  # assuming a driver such as MySQLdb, where execute() returns the row count
for num, row in enumerate(cursor):
    if num >= .1 * maxnum:  # Here I break the loop once I have 10% of the rows.
        break
    do_stuff...
This works only with mysql_store_result(), not with mysql_use_result(), as the latter requires that you always accept all needed rows.
OTOH, the traffic for my solution might be too high - all rows have to be transferred.

Pagination: 2 Queries(row-count and data) or 1 larger query

As I do not know anything about speed and complexity of php and mysql(i) scripts, I had this question:
I have a database with 3 tables:
'Products' with about 9 fields. Containing data of products, like 'long' content text.
'Categories' with 2 fields. Containing name of categories
'Productcategories' with 2 fields. Containing which product has which categories. Each product is part of 1-3 categories.
In order to set up pagination (I need row_count because I wish to know what the last page is), I was wondering what the most efficient way to do it is, and whether it depends on the number of products (50, 100, 500?). The results returned depend on a chosen category:
"SELECT * FROM `productcategories`
JOIN products ON products.proID = productcategories.proID
WHERE productcategories.catID =$category";
Idea 1:
1 query which only selects 1 field, instead of all. And then counts the total rows for my pagination with mysqli_num_rows().
A second query which directly selects 5 or 10 (with LIMIT I expect) products to be actually shown.
Idea 2:
Only 1 query (above), on which you use mysqli_num_rows() for the row count and later on filter out the rows you want to show.
I do not know which is best. Idea 1 seems faster as you have to select a lot less data, but I do not know whether needing 2 queries influences the speed a lot. Which is faster: collecting 'big' amounts of data or doing more queries?
Feel free to correct me if I am completely on the wrong path with my ideas.
It is generally considered best practice to return as little data as possible so the short answer is to use the two queries.
However, MySQL does provide one interesting function that will allow you to return the row count that would have been returned without the limit clause:
FOUND_ROWS()
Just keep in mind that not all DBMSs implement this, and that MySQL itself deprecated SQL_CALC_FOUND_ROWS and FOUND_ROWS() as of 8.0.17, so use with care.
Example:
mysql> SELECT SQL_CALC_FOUND_ROWS * FROM tbl_name
-> WHERE id > 100 LIMIT 10;
mysql> SELECT FOUND_ROWS();
Use select count(1) as count... for the total number of rows. Then select data as needed for pagination with limit 0,10 or something like that.
Also for total count you don't need to join to the products or categories tables as that would only be used for displaying extra info.
"SELECT count(1) as count FROM `productcategories` WHERE catID=$category";
Then for data:
"SELECT * FROM `productcategories`
JOIN categories ON categories.catID = productcategories.catID
JOIN products ON products.proID = productcategories.proID
WHERE productcategories.catID=$category limit 0,10";
Replacing * with actual fields needed would be better though.
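The two-query approach can be sketched as follows, in Python with an in-memory SQLite database so it is runnable as-is. The table and column names follow the question; the helper name `fetch_page`, the `name` column and the 23 demo products are assumptions. Parameters are bound with placeholders rather than interpolated into the SQL string.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (proID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE productcategories (proID INTEGER, catID INTEGER);
    """
)
# Hypothetical demo data: 23 products, all in category 1.
conn.executemany(
    "INSERT INTO products (proID, name) VALUES (?, ?)",
    [(i, f"product {i}") for i in range(1, 24)],
)
conn.executemany(
    "INSERT INTO productcategories (proID, catID) VALUES (?, 1)",
    [(i,) for i in range(1, 24)],
)


def fetch_page(conn, category, page, per_page=10):
    """Return (rows for a 1-based page, number of the last page)."""
    # Query 1: cheap count on the link table only, no join needed.
    (total,) = conn.execute(
        "SELECT COUNT(*) FROM productcategories WHERE catID = ?", (category,)
    ).fetchone()
    last_page = max(1, math.ceil(total / per_page))
    # Query 2: only the rows that will actually be shown.
    rows = conn.execute(
        "SELECT products.proID, products.name FROM productcategories "
        "JOIN products ON products.proID = productcategories.proID "
        "WHERE productcategories.catID = ? "
        "ORDER BY products.proID LIMIT ? OFFSET ?",
        (category, per_page, (page - 1) * per_page),
    ).fetchall()
    return rows, last_page


rows, last_page = fetch_page(conn, category=1, page=3)
```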

php paging and the use of limit clause

Imagine you got a 1m record table and you want to limit the search results down to say 10,000 and not more than that. So what do I use for that? Well, the answer is use the limit clause.
example
select recid from mytable order by recid asc limit 10000
This is going to give me the first 10,000 records entered into this table (the lowest recids; ordering desc would give the last 10,000).
So far no paging.
But the limit phrase is already in use.
That brings the question to the next level.
What if I want to page through this particular record set 100 records at a time? Since the limit clause is already part of the original query, how do I use it again, this time to take care of the paging?
If the original query did not have a limit clause to begin with, I'd adjust it to limit 0,100, then limit 100,100, then limit 200,100, and so on while the paging takes its course. But at this time, I cannot.
You'd almost think you'd want to use two limit clauses one after the other, which is not gonna work.
limit 10000 limit 0,100 for sure it would error out.
So what's the solution in this particular case?
[EDIT]
After I posted this question, I realized that the limit 10000 in the original query becomes meaningless once the paging routine kicks in. I somehow confused myself and thought that order by recid asc limit 10000 in its entirety was part of the where clause. In reality, the limit 10000 portion has no bearing on the recordset content, other than confining the recordset to the requested size. So there is no point in keeping the limit 10000 once the paging starts. I'm sorry for wasting your time with this. :(
I'd say get rid of the first limit, then either don't bother doing a count of the table and simply page through at most 10000 rows, or take the lesser of the count and your limit (i.e. 10000) and do the pagination based on that.
i.e.
$perpage = 100;
$pages = (int) ceil($totalcount / $perpage); // round up so a partial last page counts
$page = isset($_GET['Page']) ? (int) $_GET['Page'] : 0; // zero-based page index
if ($page >= $pages || $page < 0)
{
    $page = 0;
}
$limit = "LIMIT " . ($page * $perpage) . ", " . $perpage;
To calculate totalcount, do
SELECT COUNT(*) FROM mytable
then check it against your limit, i.e.
if($totalcount > 10000)
{
$totalcount = 10000;
}
The reason to do a dedicated count query is that it requires very little DB to PHP data transfer, and many DBMS's can optimize the crap out of it compared to a full table SELECT query.
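The arithmetic in this answer can be checked with a small sketch. It is written in Python only so that it can be run in isolation; the function name `limit_clause` and its parameter names are assumptions. It caps the total at 10000, uses zero-based pages as in the PHP above, and also shortens the final page so the cap is never exceeded.

```python
import math


def limit_clause(page, total_count, per_page=100, cap=10_000):
    """Build the LIMIT clause for a zero-based page over at most `cap` rows."""
    total = min(total_count, cap)          # the lesser of the count and the cap
    pages = math.ceil(total / per_page)    # number of pages, rounding up
    if page < 0 or page >= pages:
        page = 0                           # fall back to the first page
    offset = page * per_page
    count = max(0, min(per_page, total - offset))  # last page may be shorter
    return f"LIMIT {offset}, {count}"


print(limit_clause(99, 1_000_000))  # last page inside the 10000-row cap
```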
LIMIT can have two arguments, the first being the offset and the second being how many records to return. So
LIMIT 5,10
Will skip the first 5 records then fetch the next 10.
You will have to set your limit based on the current page. Something like
LIMIT (CURRENT_PAGE -1) * PAGE_SIZE , PAGE_SIZE
So if you had ten records per page and were on page 2 you would skip the first ten records and grab the next ten.
The offset suggestion is a great one and you should probably use that. But if for some reason offset doesn't fit your needs (say someone inserting a new record would shift your page slightly) you could also add a where recid > #### clause to your query. This is how you would paginate when working with Twitter API.
Here is an example in PHP.
<?php
$query = 'select recid from mytable';
if (isset($_GET['recid']) && $_GET['recid'] != '') {
    // cast to int so that user input cannot inject SQL
    $query = $query.' where recid > '.(int) $_GET['recid'];
}
$query = $query.' order by recid asc limit 10000';
//LOG INTO MYSQL (assumed here to leave a mysqli connection in $link)
$result = mysqli_query($link, $query); // the old mysql_* functions were removed in PHP 7
$last_id = '';
while ($row = mysqli_fetch_assoc($result)) {
    //DO YOUR DISPLAY WORK
    $last_id = $row['recid'];
}
echo '<a href="?recid='.$last_id.'">Next Page</a>';
?>
Again, a bit more complicated than needs to be but will return set pages.
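The same "where recid > last seen id" technique, sketched in Python against an in-memory SQLite table so that it can be run without a server. The helper name `next_page`, the page size of 100 and the 250 demo rows are assumptions for illustration.

```python
import sqlite3

# Hypothetical demo table with recid 1..250.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (recid INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO mytable (recid) VALUES (?)", [(i,) for i in range(1, 251)])


def next_page(conn, after_recid=0, page_size=100):
    """Return (recids of the next page, last recid seen or None if empty)."""
    rows = conn.execute(
        "SELECT recid FROM mytable WHERE recid > ? ORDER BY recid ASC LIMIT ?",
        (after_recid, page_size),
    ).fetchall()
    ids = [row[0] for row in rows]
    return ids, (ids[-1] if ids else None)


# Walk through all pages by feeding the last id back in.
pages = []
last_id = 0
while True:
    ids, last = next_page(conn, last_id)
    if not ids:
        break
    pages.append(ids)
    last_id = last
```

Rows inserted while someone is paging do not shift the page boundaries, because each page starts strictly after the last id already shown.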
You can use the offset keyword. If this is your full query, then you could use:
select recid from mytable order by recid asc limit 100 offset 300
which, for example, would give you rows 300-399 (counting from zero). Obviously you will increase the offset by 100 for every page: for the first page offset = 0, for the second page offset = 100, etc.
In general, offset = (page-1)*100
And as @Mathieu Lmbert said, you can make sure the offset doesn't go past 9900.
Effectively there can be only one limit, and it must be in the query. So the only solution will be to adjust the LIMIT clause in your query.
The thing you shouldn't do is to read all 10.000 entries, throwing away the 9900 that you do not want.

Mysql percent based query

Is this kind of mysql query possible?
SELECT power
FROM ".$table."
WHERE category IN ('convertible')
AND type = bmw40%
AND type = audi60%
ORDER BY RAND()
Would go something like this: from all the cars, select the power of the ones that are convertible, but 40% of the selection would be bmw's and the other 60% audi's.
Can this be done with mysql?
I can't seem to make it work with the idea below; it gives me an error. Here is how I tried it:
$result = mysql_query("
SELECT power, torque FROM ".$table."
WHERE category IN ('convertible')
ORDER BY (case type when 'bmw' then 0.4 when 'audi' then 0.6) * RAND() DESC
LIMIT ".$offset.", ".$rowsperpage."");
You could try adjusting the randomness using a CASE:
SELECT power
FROM table
WHERE category IN ('convertible')
AND type IN ('bmw', 'audi')
ORDER BY (CASE type WHEN 'bmw' THEN Wbmw WHEN 'audi' THEN Waudi END) * RAND() DESC
Where Wbmw and Waudi are weighting factors. Then you'd add a LIMIT clause to chop off the results at your desired size. That won't guarantee your desired proportions but it might be good enough for your purposes.
You'd want to play with the weighting factors (Wbmw and Waudi above) a bit to get the results you want. The weighting factors would depend on the frequencies of bmw and audi in your database, so 0.2 and 0.8, for example, might work better. As Chris notes in the comments, 0.4 and 0.6 would only work if you have a 50/50 split between BMW and Audi. Putting the weights in a separate table would make this approach easier to maintain and the SQL would be prettier.
Doubt this can be done properly in a single statement. Personally I would:
Calculate the COUNT() for each car type, grab them together in a query.
Retrieve both car types separately using sub-queries with LIMIT set to the correct amount and offset based on the percentage desired (so if you want 20 results total, starting at 40, and BMW is 40%, then the limit would be 8 results starting at 16; they need to be integer values).
Using a UNION to combine the results, ORDER BY RAND() to mix them together.
That's only two actual queries, one for the counts, one combined query for the results, you could combine them in a stored procedure if performance is that much of an issue.
You could combine them using a statement prepare/execute from the results - have a look at this method from a possible duplicate question.
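A minimal sketch of the per-type LIMIT/OFFSET approach from this answer, in Python with an in-memory SQLite table. The `cars` table layout, the helper names `type_window` and `mixed_page`, and the demo data are assumptions. Each type is paged in a stable order by `id` and the combined page is shuffled in client code, instead of using ORDER BY RAND() inside the queries, so that consecutive pages do not overlap.

```python
import random
import sqlite3

# Hypothetical demo data: 100 BMW and 100 Audi convertibles.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cars (id INTEGER PRIMARY KEY, category TEXT, type TEXT, power INTEGER)"
)
conn.executemany(
    "INSERT INTO cars (category, type, power) VALUES ('convertible', ?, ?)",
    [("bmw" if i % 2 else "audi", 100 + i) for i in range(200)],
)


def type_window(page_size, start, percent):
    """Integer (limit, offset) for one type: e.g. 40% of 20 from 40 is (8, 16)."""
    return page_size * percent // 100, start * percent // 100


def mixed_page(conn, page_size, start, shares):
    """Fetch one page with fixed proportions per type, then mix the rows."""
    rows = []
    for car_type, percent in shares.items():
        limit, offset = type_window(page_size, start, percent)
        rows += conn.execute(
            "SELECT type, power FROM cars "
            "WHERE category = 'convertible' AND type = ? "
            "ORDER BY id LIMIT ? OFFSET ?",
            (car_type, limit, offset),
        ).fetchall()
    random.shuffle(rows)  # mix the two types together for display
    return rows


page = mixed_page(conn, page_size=20, start=40, shares={"bmw": 40, "audi": 60})
```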

How can I show my ads an equal number of times?

I'm building my own advertisement platform, and I have a little problem. How can I show my ads an equal number of times?
So for example:
Name | Views
Ads 1 | 100
Ads 2 | 98
Ads 3 | 99
So my system needs to show the ad with the fewest views, in this case "Ads 2" or "Ads 3".
That way all my ads keep up with each other's views. So when my 3 ads have 3.000 views in total, there should be 1.000 views on every ad.
I'm coding in PHP, and I don't have an example, because I need inspiration on how to fix my problem.
Select your least viewed ad like this:
SELECT * FROM ads ORDER BY views ASC LIMIT 0, 1
This way, all the ads with less views will slowly catch up.
-- Edit, using your next requirement
$probabilityForHighestScore = 30; // percent of page views given to the top-scoring ad
$random = rand(1, 100);
if ($random > $probabilityForHighestScore)
    $sql = "SELECT * FROM ads ORDER BY views ASC LIMIT 0, 1";
else
    $sql = "SELECT * FROM ads ORDER BY score DESC LIMIT 0, 1";
If you need something else, you'd better explain you whole requirement first. Because if it is not clear for you, it won't be clear in your question, and the answers won't do what you want.
I'm assuming you have a database table containing information about these ads. You could add, if you don't already have it, a views field to that table. Then, whenever you need to display an ad, you just grab the one with the lowest view count, add 1 to its view counter, and display the ad.
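A minimal sketch of that rotation, in Python with an in-memory SQLite table seeded with the three ads from the question. The helper name `show_next_ad` and the tie-break on `id` are assumptions; the tie-break only makes the order predictable when several ads have the same view count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ads (id INTEGER PRIMARY KEY, name TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO ads (name, views) VALUES (?, ?)",
    [("Ads 1", 100), ("Ads 2", 98), ("Ads 3", 99)],
)


def show_next_ad(conn):
    """Pick the least viewed ad, count the view, and return the ad's name."""
    ad_id, name = conn.execute(
        "SELECT id, name FROM ads ORDER BY views ASC, id ASC LIMIT 1"
    ).fetchone()
    conn.execute("UPDATE ads SET views = views + 1 WHERE id = ?", (ad_id,))
    return name


first_three = [show_next_ad(conn) for _ in range(3)]
```

After these three calls every ad sits at 100 views, and from then on the ads are shown in turn, so the counters never drift more than one view apart.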
Edit:
The problem with @MarvinLabs' solution, as I explain in the comments, is that it's giving a huge bonus to a single record.
Let's say you have 50 separate ads in your system. Let's also say that your highest scoring record has a score of 9.9/10, and your second highest scoring record has a score of 9.8/10. Both of these are very high scoring items, but if you use @MarvinLabs' code, the highest scoring record will get 30% of all views, while the second highest scoring record will get 1.4% of all views (70 percent of all views divided across the 49 non-highest scoring ads).
What you might want to consider is allowing for a larger range of high scoring ads to be included. You can do this in any one of three ways:
First, you can set a threshold, or multiple thresholds, which assign a certain percentage of views to certain ranges of scores. For example, you could have it so that ads which score 9/10 or more get 30% of all views. You would do that like this:
$random = rand(1,100);
if ($random <= 30) {
    // 30% of the time: pick among the high-scoring ads
    $sql = "SELECT * FROM ads WHERE score >= 9 ORDER BY views ASC";
} else {
    $sql = "SELECT * FROM ads WHERE score < 9 ORDER BY views ASC";
}
The problem with this is that if you don't have any ads with a score above 9, you won't get anything back. For that reason, you probably don't want to use this method.
Second, you could spread your 30% of views across the top 5 or 10 ads:
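-- MySQL rejects LIMIT inside an IN (...) subquery, so join against a derived table instead
SELECT ads.*
FROM ads
JOIN
(SELECT id
FROM ads
ORDER BY score DESC
LIMIT 10) AS top_ads ON top_ads.id = ads.id
ORDER BY ads.views ASC;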
This solves the problem of "what if I don't have any records above the threshold" while still spreading the "high score bonus" across more than just a single record. The problem with this, if you consider it a problem that is, is that it doesn't scale with the volume of ads you have on record. Whether you have 10 records or 10,000 records, you'll still give the bonus to just 10 (or 20, or 50.. whatever you set) records.
If you want to scale, you'll want the third solution.
The third solution is to set your limit based on a percentage of the total number of records in the table. Since MySQL doesn't have a built-in way of handling this, you'll need to work around it in one of two ways:
One way is to do it the lazy way and run two queries: one to get the current record count, and another built from that count. For example:
$query1 = "SELECT COUNT(*) FROM ads";
//store result in $count
$percentage = (int) round($count * 0.10); //get 10% of records
// MySQL rejects LIMIT inside an IN (...) subquery, so join against a derived table instead
$query2 = "SELECT ads.* FROM ads " .
"JOIN (SELECT id " .
" FROM ads " .
" ORDER BY score DESC " .
" LIMIT " . $percentage . ") AS top_ads ON top_ads.id = ads.id " .
"ORDER BY ads.views ASC";
A better way would be to avoid the second round-trip to the database and use a prepared statement:
SELECT @percentage := CAST(ROUND(COUNT(*) * 10/100) AS UNSIGNED) FROM ads;
PREPARE PERCENTAGE FROM
'SELECT ads.*
FROM ads
JOIN (SELECT id FROM ads
ORDER BY score DESC
LIMIT ?) AS top_ads ON top_ads.id = ads.id
ORDER BY ads.views ASC';
EXECUTE PERCENTAGE USING @percentage;
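A minimal sketch of the percentage-based variant, in Python with an in-memory SQLite table so it runs on its own. The helper name `least_viewed_of_top`, the demo scores and the view counts are assumptions. It counts the ads, turns the percentage into a row count, and then picks the least viewed ad among the top scorers.

```python
import sqlite3

# Hypothetical demo data: 50 ads, score and views both growing with the id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ads (id INTEGER PRIMARY KEY, score REAL, views INTEGER)")
conn.executemany(
    "INSERT INTO ads (id, score, views) VALUES (?, ?, ?)",
    [(i, i / 5, 2 * i) for i in range(1, 51)],
)


def least_viewed_of_top(conn, percent):
    """Among the top `percent` percent of ads by score, return the least viewed."""
    (total,) = conn.execute("SELECT COUNT(*) FROM ads").fetchone()
    top_n = max(1, round(total * percent / 100))  # always consider at least one ad
    return conn.execute(
        "SELECT ads.id, ads.score, ads.views FROM ads "
        "JOIN (SELECT id FROM ads ORDER BY score DESC LIMIT ?) AS top_ads "
        "ON top_ads.id = ads.id "
        "ORDER BY ads.views ASC LIMIT 1",
        (top_n,),
    ).fetchone()


ad = least_viewed_of_top(conn, 10)  # top 10% of 50 ads is the 5 best-scoring ads
```

With this data the five best-scoring ads are ids 46 to 50, and id 46 has the fewest views among them.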
