SQL: How to select rows that sum up to a certain value - php

I want to select rows that sum up to a certain value.
My SQL (SQL Fiddle):
id  user_id  storage
1   1        1983349
2   1        42552
3   1        367225
4   1        1357899
37  1        9314493
The query should accumulate storage values up to 410000 and return the rows that fit, like this:
id  user_id  storage
2   1        42552
3   1        367225
As you can see, 42552 + 367225 = 409777. It selected the two rows whose total comes closest to 410000 without exceeding it.
I have tried everything, but it didn't work :(
Sorry for my language, I am German.

You can use a correlated subquery to get the running total and retrieve the rows whose running total is less than the specified number. (Note that I changed the storage column to INT; if it is a VARCHAR, the comparison would return the wrong result.)
select id, user_id, storage
from uploads t
where storage + coalesce((select sum(storage) from uploads
                          where storage < t.storage), 0) < 410000
order by storage
SQL Fiddle
Edit: When there are duplicate values in the storage column, the running sum has to account for them by including a condition on the id column. (Here a < condition is used, so the smallest id for a duplicate storage value gets picked up first.)
select id, user_id, storage
from uploads t
where storage + coalesce((select sum(storage) from uploads
                          where storage < t.storage
                             or (storage = t.storage and id < t.id)), 0) < 410000
order by storage
order by storage
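The selection rule both queries implement can be sketched outside SQL. The following is an illustrative Python simulation (not the answer's code) of the same greedy running total, using the sample data from the question:

```python
# Take rows in ascending storage order while the running total stays under the limit.
rows = [(1, 1983349), (2, 42552), (3, 367225), (4, 1357899), (37, 9314493)]  # (id, storage)
LIMIT = 410000

selected = []
total = 0
for row_id, storage in sorted(rows, key=lambda r: r[1]):
    if total + storage < LIMIT:
        selected.append(row_id)
        total += storage
```

This picks ids 2 and 3 with a total of 409777, matching the expected output in the question.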

This is what you need:
SET @suma = 0;
SELECT @suma := @suma + `storage`, id, storage FROM table
WHERE @suma <= 410000
ORDER BY storage ASC;
I added "ORDER BY storage ASC" to skip rows that have too much storage.

Related

display huge data in batches of 100 every hour in mysql/php

I have a database with more than 600 rows but I can only retrieve/display 100 every hour. So I use
select * from table ORDER BY id DESC LIMIT 100
to retrieve the first 100. How do I write a script that will retrieve the data in batches of 100 every 1hr so that I can use it in a cron job?
Possible solution.
Add a field to mark that the record was already shown.
ALTER TABLE tablename
ADD COLUMN shown TINYINT NULL DEFAULT NULL;
NULL means the record has not been selected yet, 1 means the record is marked for selection, and 0 means the record has already been selected.
When you need to select up to 100 records:
2.1. Mark records to be shown
UPDATE tablename
SET shown = 1
WHERE shown = 1
OR shown IS NULL
ORDER BY shown = 1 DESC, id ASC
LIMIT 100;
The shown = 1 condition in the WHERE clause covers records that were marked earlier but not selected due to some error; shown = 1 DESC re-marks such records before unmarked ones.
If there are 100 or fewer unselected records, all of them will be marked; otherwise only the 100 records with the lowest ids (the oldest) will be marked.
2.2. Select marked records.
SELECT *
FROM tablename
WHERE shown = 1
ORDER BY id
LIMIT 100;
2.3. Mark selected records.
UPDATE tablename
SET shown = 0
WHERE shown = 1
ORDER BY id
LIMIT 100;
This is applicable when only one client selects the records.
If many clients may work in parallel, and each record must be selected by only one client, then use some client number (unique across all clients) to mark a record for selection instead of 1.
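To see the three-step cycle end to end, here is a rough Python simulation of the mark/select/unmark flow (an illustration only; `fetch_batch` and the in-memory record list are stand-ins for the SQL above):

```python
# shown is None (never selected), 1 (marked), or 0 (already selected), as in the answer.
BATCH = 3  # stand-in for LIMIT 100

records = [{"id": i, "shown": None} for i in range(1, 8)]

def fetch_batch(records, batch=BATCH):
    # 2.1: mark up to `batch` records, re-marking leftovers (shown == 1) before unmarked ones
    candidates = [r for r in records if r["shown"] in (1, None)]
    candidates.sort(key=lambda r: (r["shown"] != 1, r["id"]))
    for r in candidates[:batch]:
        r["shown"] = 1
    # 2.2: select the marked records
    batch_rows = [r["id"] for r in records if r["shown"] == 1][:batch]
    # 2.3: mark the selected records as already shown
    for r in records:
        if r["id"] in batch_rows:
            r["shown"] = 0
    return batch_rows

first = fetch_batch(records)
second = fetch_batch(records)
third = fetch_batch(records)
```

Each call returns the next batch in id order, and no record is returned twice.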
Of course if there is only one client, and you guarantee that selection will not fail, you may simply store last shown ID somewhere (on the client side, or in some service table on the MySQL side) and simply select "next 100" starting from this stored ID:
SELECT *
FROM tablename
WHERE id > @stored_id
ORDER BY id
LIMIT 100;
and
SELECT MAX(id)
FROM (SELECT id
      FROM tablename
      WHERE id > @stored_id
      ORDER BY id
      LIMIT 100) AS last_batch;
to store in place of the previous @stored_id. (The subquery is needed because MAX(id) on its own ignores the LIMIT and returns the maximum over all rows past @stored_id.)
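The stored-ID approach boils down to keyset pagination. A small Python sketch (purely illustrative; `next_batch` is a made-up helper) of how the stored id advances:

```python
# Pretend table with ids 1..250; each call fetches the next `batch` rows with
# id greater than the stored id, then advances the stored id.
all_ids = list(range(1, 251))
BATCH = 100

def next_batch(stored_id, batch=BATCH):
    rows = [i for i in all_ids if i > stored_id][:batch]
    new_stored_id = rows[-1] if rows else stored_id
    return rows, new_stored_id

stored = 0
batch1, stored = next_batch(stored)  # ids 1..100
batch2, stored = next_batch(stored)  # ids 101..200
batch3, stored = next_batch(stored)  # ids 201..250
```

Unlike OFFSET-based paging, each query only scans forward from the stored id, so it stays fast as the table grows.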
Thank you @Akina and @Vivek_23 for your contributions. I was able to figure out an easier way to go about it.
Add a new field to the table, e.g. shownstatus.
Create a cron job to display 100 (LIMIT 100) records whose shownstatus is not marked as shown, then update each record's shownstatus to shown. NB: if the cron job runs every hour for the whole day, all records get displayed and their shownstatus updated to shown by close of day.
Create a second cron job to reset every record's shownstatus to notshown.
The downside is that you can only display a total of 2,400 records a day, i.e. 100 records every hour times 24 hours. So if your table grows to about 10,000 records, the cron job will need at least 5 days to display all of them.
Still open to a better approach if there's any, but till then, I will have to just stick to this for now.
Let's say you made a cron that hits a URL something like
http://yourdomain.com/fetch-rows
or a script for instance, like
your_project_folder/fetch-rows.php
Let's say you have a DB table in place that looks something like this:
| id | offset | created_at |
|----|--------|---------------------|
| 1 | 100 | 2019-01-08 03:15:00 |
| 2 | 200 | 2019-01-08 04:15:00 |
Your script:
<?php
define('FETCH_LIMIT', 100);

$conn = mysqli_connect(....); // connect to DB

// select the last record to get the latest offset
$result = mysqli_query($conn, "select * from cron_hit_table where id = (select max(id) from cron_hit_table)");

$offset = 0; // initial default offset
if (mysqli_num_rows($result) > 0) {
    $offset = intval(mysqli_fetch_assoc($result)['offset']);
}

// Now, hit your query with $offset included
$result = mysqli_query($conn, "select * from table ORDER BY id DESC LIMIT $offset,100");
while ($row = mysqli_fetch_assoc($result)) {
    // your data processing
}

// insert a new row to store the offset for the next cron hit
$offset += FETCH_LIMIT; // increment current offset
mysqli_query($conn, "insert into cron_hit_table(offset) values($offset)"); // id is auto increment; created_at defaults to current_timestamp
mysqli_close($conn);
Whenever cron hits, you fetch last row from your hit table to get the offset. Hit the query with that offset and store the next offset for next hit in your table.
Update:
As pointed out by @Dharman in the comments, you can use PDO for a more abstracted way of dealing with different types of databases (but make sure you have the appropriate driver for it; see the list of drivers PDO supports), along with safer, parameterized queries.

How can I find the leaderboard position of a user based on the amount of points they have using SQL?

In my database I have a profile table which includes the following columns:
Id, profilename,points,lastupdated
Users can be put into the profile table more than once, and for the leaderboard I use the value of the max(lastupdated) column.
Below I have SQL code to try and determine someones leaderboard position, and it works except for 1 small problem.
SELECT COUNT( DISTINCT `profilename` )
FROM `profiletable`
WHERE `points` >= integer
If two people have the same points, then they have the same leaderboard position, but this makes the count work in a way that I didn't intend.
For example:
position 1: profilename john points 500
position 2: profilename steve points 300
position 2: profilename alice points 300
position 4: profilename ben points 200
When I run the SQL query with integer set to john's points, the count returns 1, but when I run it for steve or alice, I get 3 instead of 2.
I have also tried using points > integer, but this returns 1 for steve or alice.
How can I modify the SQL query so that the correct leaderboard position is returned by the count?
Use greater than in the where clause and add 1 to the count:
SELECT COUNT( DISTINCT `profilename` ) + 1
FROM `profiletable`
WHERE `points` > integer
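The logic behind the + 1 is easy to verify by hand. Here is a Python sketch of the same counting rule, using the john/steve/alice/ben example (the `position` helper is just for illustration):

```python
# Leaderboard position = 1 + number of distinct profiles strictly ahead on points.
standings = [("john", 500), ("steve", 300), ("alice", 300), ("ben", 200)]

def position(points):
    ahead = {name for name, p in standings if p > points}
    return len(ahead) + 1
```

This gives john position 1, steve and alice position 2, and ben position 4, matching the desired output.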
Update:
If you need to take the last-updated time into account as well, then you first need to restrict each profile to the record with its latest lastupdated value:
SELECT COUNT(p1.`profilename` ) + 1
FROM `profiletable` p1
LEFT JOIN `profiletable` p2 ON p1.profilename=p2.profilename
and p1.lastupdated<p2.lastupdated
WHERE p1.`points` > integer and p2.profilename is null
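The anti-join keeps, for each profile, only the row with no later lastupdated. Here is a Python sketch of that greatest-per-group step plus the count (the data and helper names are illustrative):

```python
profiles = [
    # (profilename, points, lastupdated)
    ("john", 450, 1), ("john", 500, 2),
    ("steve", 300, 1),
    ("alice", 300, 2),
    ("ben", 200, 1),
]

# greatest-per-group: keep each profile's most recent row
latest = {}
for name, points, updated in profiles:
    if name not in latest or updated > latest[name][1]:
        latest[name] = (points, updated)

def position(points):
    # 1 + number of profiles whose latest points are strictly higher
    return 1 + sum(1 for p, _ in latest.values() if p > points)
```

Note that john's older 450-point row is ignored; only his latest 500 points count toward the standings.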

MySQL update IDs on update, insert and delete

In my current application I am making a menu structure that can recursively create sub-menus with itself. However, because of this I am finding it difficult to allow any sort of reordering. Most applications might just order by an "ordering" column; in this case that does not seem impossible, just a little harder.
What I want to do is use the ID column. So if I update id 10 to be id 1 then id 1 that was there previously becomes 2.
What I was thinking at a suggestion from a friend was to use cascades. However doing a little more research that does not seem to work as I was thinking it might.
So my question is, is there an ability to do this naively in MySQL? If so what way might I do that? And if not what would you suggest to come to the end result?
Columns:
id title alias icon parent
Parents have a lower id than their children, so the script can build the array before putting the children inside. That part works; however, if I use an ordering column I will have to devise a numbering system that ensures a child element never sorts higher than its parent in the results. Possible, but if I update a parent I must also update all its children individually, resulting in more MySQL queries than I would want.
I am no MySQL expert so this is why I brought up this question, I feel there might be a perfect solution to this that can allow the least overhead when it comes to the speed of the application.
Doing it on the ID column would be tough because you can't ever have 2 rows with the same ID, so you can't set row 10 to row 1 until after you've set row 1 to row 2, but you can't set row 1 to row 2 until you set row 2 to row 3, and so on. You'd have to delete row 10 and then do an UPDATE ... SET id = id + 1 WHERE id < 10, and you'd also have to tell MySQL to start from the highest number and go down.
You'd have to do it in separate queries like this:
Move ID 10 to ID 2
DELETE FROM table WHERE id = 10;
UPDATE table SET id = id + 1 WHERE id >= 2 AND id < 10 ORDER BY id DESC
INSERT INTO table (id, ...) VALUES (2, ...);
Another option, if you don't want to delete and reinsert would be to set the id for row 10 to be MAX(id) + 1 and then set it to 1 after
Also if you want to move row 2 to row 10 you'd have to subtract the id:
Move ID 2 to ID 10
DELETE FROM table WHERE id = 2;
UPDATE table SET id = id - 1 WHERE id > 2 AND id <= 10 ORDER BY id DESC
INSERT INTO table (id, ...) VALUES (10, ...);
If you don't have your ID column set as UNSIGNED, you could make all the IDs you want to switch negative, since AUTO_INCREMENT doesn't produce negative numbers. Still, this is pretty hacky and I wouldn't recommend it. You also probably need to lock the table so no other writes happen while this is running:
Move ID 2 to ID 10
UPDATE table SET id = id * -1 WHERE id > 2 AND id <= 10;
UPDATE table SET id = 10 WHERE id = 2;
UPDATE table SET id = id * -1 - 1 WHERE id < -2 AND id >= -10;
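To convince yourself the three UPDATEs do what they claim, here is a Python simulation of the negative-id shuffle (a sketch; `move_id` is an invented helper, not part of any library):

```python
# Move the row with id `src` to id `dst`; rows in (src, dst] each shift down by one.
ids = {i: f"row{i}" for i in range(1, 11)}

def move_id(ids, src, dst):
    # step 1: park the ids in (src, dst] at their negatives
    moved = {(-i if src < i <= dst else i): v for i, v in ids.items()}
    # step 2: give the source row its destination id
    moved[dst] = moved.pop(src)
    # step 3: fold the negatives back, shifted down by one (id = -id - 1)
    return {(-i - 1 if i < 0 else i): v for i, v in moved.items()}

result = move_id(ids, 2, 10)
```

After the move, the old row 2 sits at id 10, rows 3 through 10 have shifted to 2 through 9, and row 1 is untouched.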

Select a random row but with odds

I have a dataset of rows, each with an 'odds' number between 1 and 100, and I want to select a random row weighted by those odds, in the most efficient way possible. The odds do not necessarily add up to 100.
I have had a few ideas.
a)
Select the whole dataset and then add all the odds up and generate a random number between 1 and that number. Then loop through the dataset deducting the odds from the number until it is 0.
I was hoping to minimize the impact on the database so I considered if I could only select the rows I needed.
b)
SELECT * FROM table WHERE (100*RAND()) < odds
I considered LIMIT 0,1
But then if items have the same probability, only one of them will be returned.
Alternatively, take the whole result set and pick a random row from it... but then the odds are skewed: it becomes a weighted draw followed by an unweighted one, tilting the result even further in favour of the higher odds.
I guess I could order by odds ASC, take the whole dataset, and then with PHP pick a random row from among those sharing the same odds as the first (lowest) record.
Seems like a clumsy solution.
Does anyone have a superior solution? If not which one of the above is best?
Do some up-front work: add some columns to your table that help the selection. For example, suppose you have these rows:
X 2
Y 3
Z 1
We add some cumulative values
Key  Odds  Start  End
X    2     0      1    // range 0->1, 2 values == odds
Y    3     2      4    // range 2->4, 3 values == odds
Z    1     5      5    // range 5->5, 1 value == odds
Start and End are chosen as follows: the first row has a Start of zero; each subsequent row's Start is one more than the previous row's End; End is Start + Odds - 1.
Now pick a random number R in the range 0 to Max(End)
Select * from T where R >= T.Start and R <= T.End
If the database is sufficiently clever, we may be able to use
Select * from T where R >= T.Start and R <= (T.Start + T.Odds - 1)
I'm speculating that having an End column with an index may give the better performance. Also, Max(End) could be stashed somewhere and updated by a trigger when necessary.
Clearly there's some hassle in updating the Start/End. This may not be too bad if either
The table contents are stable
or insertions are in some way naturally ordered, so that each new row just continues from the old highest.
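A quick Python sketch of building the Start/End columns and selecting by a random number, using the X/Y/Z example above (illustrative only; `pick` is a made-up helper standing in for the SQL WHERE clause):

```python
import random

rows = [("X", 2), ("Y", 3), ("Z", 1)]  # (key, odds)

# Build cumulative ranges: Start = previous End + 1, End = Start + Odds - 1.
ranges = []
start = 0
for key, odds in rows:
    ranges.append((key, start, start + odds - 1))
    start += odds

max_end = ranges[-1][2]

def pick(r):
    # SQL equivalent: WHERE R >= Start AND R <= End
    return next(key for key, s, e in ranges if s <= r <= e)

r = random.randint(0, max_end)
choice = pick(r)
```

Each key is hit with probability odds / total, since it owns exactly `odds` of the `total` integer values.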
What if you took your code, and added an ORDER BY RAND() and LIMIT 1?
SELECT * FROM table WHERE (100*RAND()) < odds ORDER BY RAND() LIMIT 1
This way, even if you have multiples of the same probability, it will always come back randomly ordered, then you just take the first entry.
select * from table
where id between 1 and 100 and ((id % 2) <> 0)
order by NewId()
Hmm. Not entirely clear what result you want, so bear with me if this is a bit crazy. That being said, how about:
Make a new table. The table is a fixed data table, and looks like this:
Odds
====
1
2
2
3
3
3
4
4
4
4
etc,
etc.
Then join from your dataset to that table on the odds column. You'll get as many rows back for each row in your table as the given odds of that row.
Then just pick one of that set at random.
If you have an index on the odds column, and a primary key, this would be very efficient:
SELECT id, odds FROM table WHERE odds > 0
The database wouldn't even have to read from the table, it would get everything it needed from the odds index.
Then, you'll select a random value between 1 and the number of rows returned.
Then select that row from the array of rows returned.
Then, finally, select the whole target row:
SELECT * FROM table WHERE id = ?
This ensures an even distribution across all rows with an odds value.
Alternatively, put the odds in a different table, with an autoincrement primary key.
Odds
ID odds
1 4
2 9
3 56
4 12
Store the ID foreign key in the main table instead of the odds value, and index it.
First, get the max value. This never touches the database. It uses the index:
SELECT MAX(ID) FROM Odds
Get a random value between 1 and the max.
Then select the record.
SELECT * FROM table
JOIN Odds ON Odds.ID = table.ID
WHERE Odds.ID >= ?
ORDER BY Odds.ID
LIMIT 1
This will require some maintenance if you tend to delete Odds value or roll back inserts to keep the distribution even.
There is a whole chapter on random selection in the book SQL Antipatterns.
I didn't try it, but maybe something like this (with ? a random number from 0 to SUM(odds) - 1)?
SET @prob := 0;
SELECT
    T.*,
    (@prob := @prob + T.odds) AS prob
FROM table T
HAVING prob > ?
LIMIT 1
This is basically the same as your idea a), but entirely within one (well, technically two if you count the variable set-up) SQL commands.
A general solution, suitable for O(log(n)) updates, is something like this:
Store objects as leaves of a (balanced) tree.
At each branch node, store the weights of all objects under it.
When adding, removing, or modifying nodes, update weights of their parents.
Then pick a number between 0 and (total weight - 1) and navigate down the tree until you find the right object.
Since you don't care about the order of things in the tree, you can store them as an array of N pointers and N-1 numbers.
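One concrete way to realize this O(log n) scheme is a Fenwick (binary indexed) tree over the weights. The `WeightedPicker` class below is a sketch of that idea, not a reference implementation:

```python
class WeightedPicker:
    """Weighted selection with O(log n) updates via a Fenwick tree."""

    def __init__(self, weights):
        self.n = len(weights)
        self.tree = [0] * (self.n + 1)
        for i, w in enumerate(weights):
            self.update(i, w)

    def update(self, i, delta):
        # add delta to the weight of item i (0-based)
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def total(self):
        i, s = self.n, 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s

    def find(self, r):
        # smallest 0-based index whose cumulative weight exceeds r (0 <= r < total)
        pos, bit = 0, 1
        while bit * 2 <= self.n:
            bit *= 2
        while bit:
            nxt = pos + bit
            if nxt <= self.n and self.tree[nxt] <= r:
                r -= self.tree[nxt]
                pos = nxt
            bit //= 2
        return pos
```

Drawing a random r in [0, total) and calling find(r) selects each item with probability proportional to its weight, and adjusting a weight is a single update call.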

How to sort rows in a table and fetch row positions?

Consider the following table structure:
id  name   rank  position
-------------------------
1   Steve  5     0
2   David  3     0
3   Helen  9     0
4   Mark   15    0
I need a fast algorithm to rate these rows by rank column and store the "position" of each row in position field in realtime.
Now I have a brute-force solution:
SELECT * FROM table ORDER BY rank DESC
And then fetch the result in a loop, updating every row to fill the position column. But what if I have thousands of entries? How can I optimize it?
I answered a very similar question specific to MySQL a few days ago:
Automate MySQL fields (like Excel)
The trick is not to store the column in the database but to calculate it dynamically when you fetch the results. Alternatively you could use triggers to update the column every time you make an insert, update or delete.
Pick one:
WITH t AS
(
    SELECT
        ROW_NUMBER() OVER (ORDER BY rank) AS RowNum,
        RANK() OVER (ORDER BY rank) AS RankNum,
        DENSE_RANK() OVER (ORDER BY rank) AS DenseRankNum,
        *
    FROM table
)
UPDATE t SET position = RankNum
--UPDATE t SET position = RowNum
--UPDATE t SET position = DenseRankNum
;
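For intuition, here is what the three window functions produce for the sample table, computed in Python (assuming a higher rank value means a better position, so the ordering is descending; the helper names are illustrative):

```python
rows = [("Steve", 5), ("David", 3), ("Helen", 9), ("Mark", 15)]

# ROW_NUMBER: position in the sorted order, ties broken arbitrarily
ordered = sorted(rows, key=lambda r: r[1], reverse=True)
row_number = {name: i + 1 for i, (name, _) in enumerate(ordered)}

def rank_of(points):
    # RANK(): 1 + number of rows strictly ahead (leaves gaps after ties)
    return 1 + sum(1 for _, p in rows if p > points)

def dense_rank_of(points):
    # DENSE_RANK(): 1 + number of distinct values strictly ahead (no gaps)
    return 1 + len({p for _, p in rows if p > points})
```

With no ties in the sample data all three agree: Mark 1, Helen 2, Steve 3, David 4. They only diverge when two rows share the same rank value.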