MySQL query to average time - PHP

I play a lot of board games and I maintain a site/database which keeps track of several statistics. One of the tables keeps track of various times. Its structure looks like this:
gameName (text - the name of the board game)
numPeople (int - the number of people that played)
timeArrived (timestamp - the time we arrived at the house we are playing the game)
beginSetup (timestamp - the time when we begin to set up the game)
startPlay (timestamp - the time we actually start playing the game)
gameEnd (timestamp - the time the game is finished)
Basically, what I want to do is use these times to get some interesting/useful info (like which game on average takes the longest to set up, which game on average takes the longest to play, which game is the longest from arrival to finish, etc.). Normally, I rely way too much on PHP: I would just do a SELECT * ..., grab all the times, and then do some PHP calculations to find all the stats. But I know that MySQL can do all this for me with a query. Unfortunately, I get pretty lost when it comes to more complex queries, so I'd like some help.
I'd like some examples of a couple of queries, and hopefully I can figure out other average-time queries once someone gets me started. What would the query look like for the longest average time to play a board game? What about the quickest game to set up on average?
Additional Info:
drew010 - You got me off to a great start but I'm not getting the results I expected. I'll give you some real examples...
I've got a game called Harper and it's been played twice (so there are two records in the database with time entries). Here is what the times look like for it:
beginSetup(1) = 2012-07-25 12:06:03
startPlay(1) = 2012-07-25 12:47:14
gameEnd(1) = 2012-07-25 13:29:45
beginSetup(2) = 2012-08-01 12:06:30
startPlay(2) = 2012-08-01 12:55:00
gameEnd(2) = 2012-08-01 13:40:32
When I run the query you provided (and convert the seconds into hours/minutes/seconds) I get these results (sorry, I don't know how to do the cool table you did):
gameName = Harper
Total Time = 03:34:32
...and other incorrect numbers.
From the numbers, the average Total Time should be about 1 hour and 24 minutes, not 3 hours and 34 minutes. Any idea why I'd be getting incorrect numbers?
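One way to track down a discrepancy like this is to list the per-row durations before averaging them, so a row with a bad timestamp stands out. A minimal sketch, against the same hypothetical `table`, using MySQL's TIMEDIFF():
SELECT
    gameName,
    TIMEDIFF(startPlay, beginSetup) AS setupTime,
    TIMEDIFF(gameEnd, startPlay) AS playTime,
    TIMEDIFF(gameEnd, beginSetup) AS totalTime
FROM `table`
WHERE gameName = 'Harper';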

Here is a query to get the average setup time and play time for each game, hope it helps:
SELECT
gameName,
AVG(UNIX_TIMESTAMP(startPlay) - UNIX_TIMESTAMP(beginSetup)) AS setupTime,
AVG(UNIX_TIMESTAMP(gameEnd) - UNIX_TIMESTAMP(startPlay)) AS gameTime,
AVG(UNIX_TIMESTAMP(gameEnd) - UNIX_TIMESTAMP(beginSetup)) AS totalTime
FROM `table`
GROUP BY gameName
ORDER BY totalTime DESC;
Should yield results similar to:
+----------+-----------+-----------+-----------+
| gameName | setupTime | gameTime  | totalTime |
+----------+-----------+-----------+-----------+
| chess    | 1100.0000 | 1250.0000 | 2350.0000 |
| checkers |  466.6667 |  100.5000 |  933.3333 |
+----------+-----------+-----------+-----------+
I just inserted about 8 test rows with some random data so my numbers don't make sense, but that is the result you would get.
Note that this will scan your entire table, so it could take a while depending on how many records you have. It's definitely something you want to run in the background periodically if you have a considerable number of game records.
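Since the averages come back as raw seconds, MySQL can also format them as HH:MM:SS for you instead of converting in PHP. A sketch of the total-time average wrapped in SEC_TO_TIME() (the `table` name is hypothetical, as above):
SELECT
    gameName,
    SEC_TO_TIME(AVG(UNIX_TIMESTAMP(gameEnd) - UNIX_TIMESTAMP(beginSetup))) AS avgTotalTime
FROM `table`
GROUP BY gameName
ORDER BY avgTotalTime DESC;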

For something like how long it took to set up, you could write something like (in MySQL the function is TIMESTAMPDIFF; the three-argument DATEDIFF is SQL Server syntax):
SELECT TIMESTAMPDIFF(HOUR, beginSetup, startPlay) -- in hours, how long to set up
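Wrapped in AVG() and grouped per game, the same expression answers the "quickest to set up on average" question; a sketch against the same hypothetical `table`:
SELECT
    gameName,
    AVG(TIMESTAMPDIFF(MINUTE, beginSetup, startPlay)) AS avgSetupMinutes
FROM `table`
GROUP BY gameName
ORDER BY avgSetupMinutes ASC
LIMIT 1;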

Related

Sort large SQL database of Electrical Meter Readings through PHP

I'm not sure how to ask this properly, as I'm a little green at this, and since I can't phrase it properly I haven't been able to Google for answers.
Backstory: I manage an apartment complex. Every apartment has a digital electrical meter. Every day I can download a CSV file of all units and their readings.
Using PHP and SQL, I can pull the unit # from a table called tenants. Then I can reference the specific unit # in a search in my browser from a specific date, and it will automatically calculate the usage for the month (or whatever range I select).
I have that part down! What I'm trying to do now is create a one-button pull where I can see all usage from all tenants in one easy table.
Right now the database looks like this:
|UNIT|KWH|DATE |
|101 |100|01/01/2022|
|102 |80 |01/01/2022|
|103 |110|01/01/2022|
|104 |108|01/01/2022|
|101 |110|01/02/2022|
|102 |90 |01/02/2022|
|103 |125|01/02/2022|
|104 |128|01/02/2022|
ETC
It just keeps growing as I import the CSV file daily into the database.
What I want to be able to quickly see is:
|UNIT|TOTAL KWH|DATE RANGE|
|101 |10 |01/01/2022 - 01/30/2022|
|102 |10 |01/01/2022 - 01/30/2022|
|103 |15 |01/01/2022 - 01/30/2022|
|104 |20 |01/01/2022 - 01/30/2022|
The code below gives me the specific unit:
SELECT Max(KWH)-Min(KWH) AS TOTALKWH,UNIT AS UNIT
FROM testdb
WHERE UNIT = 'Unit_220'
AND Date >='11/01/2022' AND Date <='11/30/2022'
I'm stuck on how to select all units and not just a specific unit. Any thoughts on how to do this easily? Or perhaps a better way than what I'm currently doing?
You can use MySQL to achieve your desired idea:
SELECT
tb1.UNIT,
(
tb1.KWH -
(
SELECT
tb3.KWH
FROM
kwh AS tb3
WHERE
tb3.DATE = DATE_ADD(tb1.DATE, INTERVAL -1 MONTH) AND tb3.UNIT = tb1.UNIT
)
) AS "TOTAL KWH",
CONCAT(
DATE_ADD(tb1.DATE, INTERVAL -1 MONTH),
" ~ ",
DATE_ADD(tb1.DATE, INTERVAL -1 DAY)
) AS "DATE RANGE"
FROM
kwh AS tb1
WHERE
(
SELECT
COUNT(tb2.DATE)
FROM
kwh AS tb2
WHERE
tb2.DATE < tb1.DATE
) >= 1
ORDER BY
tb1.DATE;
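Alternatively, since your own query already computes usage as MAX(KWH) - MIN(KWH), simply dropping the WHERE UNIT = ... filter and grouping by unit gives one row per unit. A sketch (assuming Date is a real DATE column; if it is stored as an MM/DD/YYYY string you would need STR_TO_DATE() first):
SELECT
    UNIT,
    MAX(KWH) - MIN(KWH) AS `TOTAL KWH`,
    CONCAT(MIN(Date), ' - ', MAX(Date)) AS `DATE RANGE`
FROM testdb
WHERE Date >= '2022-11-01' AND Date <= '2022-11-30'
GROUP BY UNIT;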

Create view from live stats

I created a system to input results from a school basketball tournament. The idea is that after the game the operators will input the result in a format that the system fetches and saves in the DB, in a format like the one below:
Date | Team | Score 1Q | Score 2Q | Score 3Q | Score 4Q | Score OT | Final Score | W | L | Won over Team | Lost to Team | Regular Season? | Finals?
I created a PHP page that calculates many stats from the table above, like Total Wins, Win %, Avg. Points, Avg. Points per Quarter, % Turnaround Games when losing at Half Time or 3Q, % Finals games disputed, Times became champions, etc., and many more deep stats.
But I was thinking of creating a view with this information calculated in the DB in real time, instead of having the script handle it.
But how can I turn the SELECTs needed from the first table into a working second "table" with all the calculations done whenever we make the selection?
Thanks
#decio, I think your idea of creating a view to calculate those stats is not a bad one. You might be able to do so with something similar to the following SQL script:
CREATE VIEW result_stats_view AS SELECT SUM(W) as total_wins, SUM(L) as total_losses FROM precalculate_stats_table_name;
This shows the total wins and losses for the season, but you probably get the idea. Check out MySQL aggregate functions (like average, sum, etc.) here:
https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html
Once you have your calculations added to the view, you can simply run a query like this to get your calculated data:
SELECT * from result_stats_view
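As a slightly fuller sketch of the same idea (the results table name and its Team, W, L, and Final Score columns are assumptions based on the layout above), the view could compute several of the stats in one place:
CREATE OR REPLACE VIEW result_stats_view AS
SELECT
    Team,
    SUM(W) AS total_wins,
    SUM(L) AS total_losses,
    SUM(W) / COUNT(*) AS win_pct,
    AVG(`Final Score`) AS avg_points
FROM results
GROUP BY Team;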

MySQL query to average data if within bounds for graph

I have a large amount of data from a data-logging device stored in a MySQL DB that I want to place on a graph. I want to show a month's worth of data; the logging is per second.
I’m using PHP and the Google Charts library to draw the graph as an image client side.
There is no point trying to display 2,628,000 points on an on-screen graph, so I want an SQL query that gives an average data point for, say, each hour (3,600 rows down to 1) instead of each second, unless a value is out of bounds.
The whole point of the graph is to show whether the value has gone out of bounds and when it did.
The current SQL query to get the data required for last month, for example, is below; the first problem is that PHP hits its memory limit before it's able to return the data:
SELECT Tms, Hz FROM log WHERE Tms >= ".$start." AND Tms <=".$finish." ORDER BY Tms ASC
The average value should be, for example, 60; the upper limit is 61.5 and the lower limit is 58.5. Any value outside of these should be returned as-is; otherwise the hour's worth of data should be returned as an average for that hour.
EDIT: To answer the comments:
DB structure is:
ID - double - AUTO_INCREMENT
Tms - timestamp
Hz - float
Example Data is:
ID | Tms | Hz
1 | 1559347082 | 59.91
2 | 1559347083 | 59.98
3 | 1559347084 | 60.53
4 | 1559347085 | 62.03
5 | 1559347086 | 61.11
6 | 1559347087 | 60.93
7 | 1559347088 | 60.88
.......
3606 | 1559350686 | 59.99
The expected result would be an array of results, with all of the values within an hour averaged, unless there is a value out of bounds.
So for the data above, items 1,2,3 would be returned with the average Tms: 1559347083 and average Hz: 60.14, but the next value in the array of results would be Tms: 1559347085 and Hz: 62.03.
Results:
Tms: 1559347083 | Hz: 60.14
Tms: 1559347085 | Hz: 62.03
Tms: 1559348886 | Hz: 60.17
The maximum number of points to be averaged or grouped together would be 3,600 rows = 1 hour, so the graph still shows some movement.
One of the current errors when trying to select a large amount of data:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes)
This happens because each result row is placed into an array, so I can add the bounds values to draw a clear line on the graph:
while ($row = $result->fetch_assoc()) {
    $dataPoint = array($row['Tms'], '58.5', $row[$graph], '61.5');
    // ...
    array_push($dataPoints, $dataPoint);
}
This array ($dataPoints) then gets passed to a function to either output as JSON or output as CSV using fputcsv
It is not logical, or useful, to have one query that gives both hourly averages and individual out-of-bounds values. This requires two queries. So let's start with the first, the hourly average:
SELECT
COUNT(ID) AS CountID,
DATE(Tms) AS DateTms,
HOUR(Tms) AS HourTms,
AVG(Hz) AS AvgHz
FROM
log
WHERE
Tms >= '2019-01-01 12:00:00' AND
Tms <= '2019-12-12 12:00:00'
GROUP BY
DATE(Tms), HOUR(Tms)
ORDER BY
DateTms, HourTms
I've put real dates in the WHERE conditions, instead of the undocumented variables $start and $finish, but these can, of course, be replaced. I've added a counter, because it is always useful, and finally, because we report for each hour of the day, I have added a date. The GROUP BY DATE(Tms), HOUR(Tms) groups by whole hours within each day (grouping by HOUR(Tms) alone would lump together the same hour from different days).
The second query is about the out-of-bounds values. It is simply:
SELECT
ID,
Tms,
Hz
FROM
log
WHERE
Tms >= '2019-01-01 12:00:00' AND
Tms <= '2019-12-12 12:00:00' AND
(Hz < 58.5 OR Hz > 61.5)
ORDER BY
Tms ASC
You can easily combine the results of these two queries into one array with PHP. However...
I am worried that the last query might produce too much data when there are too many out-of-bounds values. And that's probably what you're saying in your later addition to the question. To solve this you could work with an hourly average of the out-of-bounds values. You would have to use two queries for this, one for values below the lower limit and one for those above the upper limit. I'll show the first one here:
SELECT
COUNT(ID) AS CountID,
DATE(Tms) AS DateTms,
HOUR(Tms) AS HourTms,
AVG(Hz) AS AvgHz
FROM
log
WHERE
Tms >= '2019-01-01 12:00:00' AND
Tms <= '2019-12-12 12:00:00' AND
Hz < 58.5
GROUP BY
DATE(Tms), HOUR(Tms)
ORDER BY
DateTms, HourTms
This looks very much like the first query, which is a good thing. The only addition is the range limiting of the Hz value. The other query simply has Hz > 61.5 instead. The results of the three queries can be collected in an array and displayed in a graph.
The three queries could be forced into one query, but I don't see the advantage of that. With three separate queries you could, for instance, write a PHP function that does the query and gets the results, and all you need to vary, using function parameters, is range limiting and the start/finish times.
Finally, a bit about your database. I see you use a double for the ID; that should probably be an integer. Also, don't forget to put indexes on Tms and Hz, otherwise your queries might be very slow.
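For reference, adding those indexes is a one-liner (the index names here are arbitrary):
ALTER TABLE log
    ADD INDEX idx_tms (Tms),
    ADD INDEX idx_hz (Hz);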

Altering thousands of records every page refresh

I am starting to think about my new project and I've found a couple of speed issues, so I hope you can help me with selecting a good and elegant way to code it.
Each user has database records of the "places" he has visited. Each place has "schools", a number of schools in that particular place. Each school has classes. Each class may end its "learning year" at a different time, so its number should increment when the date is >= the end of the learning year.
So we have such a database:
"places" table:
place | user_id |
-----------------
1 | 4 |
2 | 4 |
User no. 4 visited places no. 1 and 2.
"schools" table:
school | place |
----------------
5 | 2 |
6 | 2 |
Place 2 has two schools, with IDs 5 and 6.
"class" table:
class | school | end_learning | class_number
---------------------------------------------
20 | 5 | 01.01.2013 | 2
21 | 5 | 03.01.2013 | 3
22 | 5 | 05.01.2013 | 4
School 5 has 3 classes with IDs 20, 21, and 22. If the date is greater than 01.01.2013, the class number of class 20 should be incremented to 3 and the end-learning date changed to 01.01.2014. And so on.
And now we get to the problem: if there are 1,000 places, each with 100 schools, each with 10 classes, we get 1,000,000 records. That's a lot. Because all I have presented is just a simple example, I have to consider updating the whole database every time a user refreshes the page, so I'm afraid it might be laggy with that number of records.
I could also serialize the classes into one field in the schools table:
school | place | classes
-------------------------------------------------------------------------
5 | 2 | serialized class 20, 21, 22 with end_learning field and class number
6 | 2 | other serialized classes from school 6
In that case I get 10 times fewer records, but each time I have to deserialize the data, check the dates, and if a date is earlier than now, alter it, serialize it, and save it back to the database. The second problem is that I have to select all records from the DB to manipulate them, not only those that need to be altered.
I am also thinking about having two databases: one with records that might need changes in the more distant future, and a second with records that might need changes in the next 24 hours (the near future). Every 24 hours, all the classes that end their learning year in the next 24 hours are moved to the "near future" DB, so every page refresh works on thousands of records instead of hundreds of thousands or millions; the millions of records (the distant future) are only scanned once per day, to build the "near future" table.
What do you think about all those database schemas? Maybe you have a better idea?
I don't quite understand the business logic or data model you outline - but I will assume you have thought this through.
Firstly, RDBMS solutions like MySQL are really, really good at managing large numbers of records, as long as the data you are working with is relational. As far as I can tell, you will be searching across many records, but only updating a few (a user will only be enrolled in a limited number of classes); I don't see this as a huge problem.
Secondly, it's nearly always better to go with the "standard" relational model, until you can prove it doesn't meet your performance needs, than to go for "exotic" solutions from the start (I class your serialization and partitioning solutions as "exotic" for the purposes of this answer). A lot of time and energy has gone into optimizing SQL performance; if there were a simple alternative, it would be part of the standard solution. There are, of course, points at which the standard relational model doesn't scale (Facebook-size traffic, for instance), or business domains where the relational model doesn't really fit (documents, graphs). However, all the alternatives have benefits and drawbacks just like "standard" MySQL.
Thirdly, the best way to deal with possible performance issues is, well, to deal with them. In code. Build a test rig, create a schema according to the relational model, populate it with test data (e.g. using DbMonster), throw some load at it (e.g. using JMeter) and tune your schema and queries to prove your situation doesn't fit the standard solution. Only go for something exotic if you really can prove that you can't play nice with standard, relational database stuff.
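As a concrete illustration of staying relational: the class number from the question does not have to be rewritten in bulk on every page view; it can be derived at read time from end_learning and only persisted by a periodic batch job. A sketch, assuming end_learning is a DATE column:
SELECT
    class,
    school,
    class_number + IF(
        CURDATE() >= end_learning,
        TIMESTAMPDIFF(YEAR, end_learning, CURDATE()) + 1, -- one increment per learning year that has ended
        0
    ) AS effective_class_number
FROM class;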

Popularity Algorithm

I'm making a digg-like website that is going to have a homepage with different categories. I want to display the most popular submissions.
Our rating system is simply "likes", like "I like this" and whatnot. We basically want to display the submissions with the highest number of "likes" per time. We want to have three categories: all-time popularity, last week, and last day.
Does anybody know of a way to help? I have no idea how to go about doing this and making it efficient. I thought that we could use some sort of cron job to run every 10 minutes and pull in the number of likes over the last 10 minutes... but I've been told that's pretty inefficient.
Help?
Thanks!
Typically Digg and Reddit-like sites go by the date of the submission and not the times of the votes. This way all it takes is a simple SQL query to find the top submissions for X time period. Here's a pseudo-query to find the 10 most popular links from the past 24 hours using this method:
select * from submissions
where (current_time - post_time) < 86400
order by score desc limit 10
Basically, this query says to find all the submissions where the number of seconds between now and the time it was posted is less than 86400, which is 24 hours in UNIX time.
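In MySQL specifically, that pseudo-query might look like the following sketch (assuming submissions has a post_time DATETIME column and a denormalized score column):
SELECT *
FROM submissions
WHERE post_time >= NOW() - INTERVAL 1 DAY
ORDER BY score DESC
LIMIT 10;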
If you really want to measure popularity within X time interval, you'll need to store the post and time for every vote in another table:
create table votes (
post integer,
time datetime,
vote integer, -- +1 for upvote, -1 for downvote
foreign key (post) references submissions(id)
);
Then you can generate a list of the most popular posts between X and Y times like so:
select sum(vote), post from votes
where X < time and time < Y
group by post
order by sum(vote) desc limit 10;
From here you're just a hop, skip, and inner join away from getting the post data tied to the returned ids.
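That last step might look something like this sketch, with placeholder dates standing in for X and Y (and assuming id is the primary key of submissions):
SELECT s.*, top_posts.score
FROM submissions s
JOIN (
    SELECT post, SUM(vote) AS score
    FROM votes
    WHERE time BETWEEN '2012-01-01' AND '2012-01-08'
    GROUP BY post
    ORDER BY score DESC
    LIMIT 10
) top_posts ON top_posts.post = s.id
ORDER BY top_posts.score DESC;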
Do you have a decent DB setup? Can we please hear about your CREATE TABLE details and indices? Assuming a sane setup, the DB should be able to pull the counts you require fast enough to suit your needs! For example (net of indices and keys, which somewhat depend on what DB engine you're using), given two tables:
CREATE TABLE submissions (subid INT, `when` DATETIME, etc etc)
CREATE TABLE likes (subid INT, `when` DATETIME, etc etc)
you can get the top 33 all-time popular submissions as
SELECT *, COUNT(likes.subid) AS score
FROM submissions
JOIN likes USING(subid)
GROUP BY submissions.subid
ORDER BY COUNT(likes.subid) DESC
LIMIT 33
and those voted for within a certain time range as
SELECT *, COUNT(likes.subid) AS score
FROM submissions
JOIN likes USING(subid)
WHERE likes.`when` BETWEEN initial_time AND final_time
GROUP BY submissions.subid
ORDER BY COUNT(likes.subid) DESC
LIMIT 33
If you were storing "votes" (positive or negative) in likes, instead of just counting each entry there as +1, you could simply use SUM(likes.vote) instead of the COUNTs.
For stable lists like all-time and last-week, which are not supposed to change very fast, I think you should save the list in your cache with an expiration time of around a day or longer.
If you are concerned about having correct counts in real time, you can check on every page view by comparing the post against the lowest-ranked post in the cached list.
All you need to do is take care of synchronizing the cache with the actual database.
thethanghn
Queries where the order is some function of the current time can become real performance problems. Things get much simpler if you can bucket by calendar time and update scores for each bucket as people vote.
To complete nobody_'s answer I would suggest you read up on the documentation (if you are using MySQL of course).
