I was searching for some effective peak counter algorithm but couldn't find one that suits my needs:
C-like language (it will be in PHP but I could translate from JS/C/Cpp)
Data is not traversable forward
Has static thresholds - detects peaks that are over 270 Watts AND those peaks are short so there is also a time restriction.
I already checked here:
Peak signal detection in realtime timeseries data
But that does not seem to be a viable solution for me.
Here is some example data:
Data has following format:
$a = [
    'timestamp' => 1500391300.342,
    'power' => 383.87,
];
On this chart there are 14 peaks, so I need an algorithm to count them in a loop.
Each peak can have more than 10 points. The data is a continuous array that keeps filling itself: there is access to at least 100 previous points, but there is no access to future data.
I have also prepared Calc (ODC) document with 15 peaks and some example data.
Here it is
So far I have a simple algorithm that counts rising slopes, but it does not work correctly (there may be no single slope over 270W, the rise may be split across 5 points, or the jump may be too long to count as a peak):
if ($previousLog['power'] - $deviceLog['power'] > 270) {
$numberShots++;
}
Thanks in advance, any tips would be helpful.
Simple hysteresis should work:
1. Find a sample that is above 270W.
2. While samples are above 240W, keep track of the largest sample seen and its timestamp.
3. When you find a sample that is below 240W, go back to step 1.
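The hysteresis steps can be sketched roughly as follows. This is written in Python for brevity and should translate line by line to PHP. The function name `count_peaks`, the `(timestamp, power)` tuple layout and the optional `max_duration` check (a guess at the "time restriction" from the question) are assumptions, not part of the original answer.

```python
def count_peaks(samples, high=270.0, low=240.0, max_duration=None):
    """Return (timestamp, power) of the largest sample in each detected peak.

    samples: iterable of (timestamp, power) pairs in chronological order.
    A peak opens when power rises above `high` and closes when it drops
    below `low`. `max_duration` (hypothetical, in timestamp units) rejects
    peaks that stay open too long.
    """
    peaks = []
    in_peak = False
    start_ts = best_ts = best_power = None
    for ts, power in samples:
        if not in_peak:
            if power > high:                      # step 1: peak starts
                in_peak = True
                start_ts = ts
                best_ts, best_power = ts, power
        elif power < low:                         # step 3: peak ends
            in_peak = False
            if max_duration is None or ts - start_ts <= max_duration:
                peaks.append((best_ts, best_power))
        elif power > best_power:                  # step 2: track the maximum
            best_ts, best_power = ts, power
    return peaks


samples = [(0, 100), (1, 300), (2, 350), (3, 250), (4, 200), (5, 280), (6, 100)]
print(len(count_peaks(samples)))  # number of peaks found
```

Only the previous state is needed, so this works on data that arrives one sample at a time. A peak that is still open when the data ends is not counted until a sample below the lower threshold arrives.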
I'm wondering what the chances are of getting 100 using mt_rand(1,100)?
Are the chances 1 in 100? Does that mean I'll get 100 at least once if I "roll" 100 times?
I've been wondering about this for a while but I can't find any solution.
The reason I ask is that I'm trying to calculate how many times I have to roll in order to get 100 guaranteed.
<?php
$roll = mt_rand(1,100);
echo $roll;
?>
Regards Dennis
Are the chances 1 in 100? Does that mean I'll get 100 at least once if I "roll" 100 times?
No, that's not how random number generators work. Take an extreme example:
mt_rand(1, 2)
One would assume that over a long enough time frame the number of 1s and the number of 2s would be about the same. However, it is perfectly possible to get a sequence of 10 consecutive 1s. Just because it's random doesn't mean that a specific number must appear; if that were the case, it would no longer be random.
I'm trying to calculate how many times I have to roll in order to get 100 guaranteed.
Mathematically, there is no number where 100 is guaranteed to be in the sequence. If each roll is independent there is a 99/100 chance that it won't be 100.
For two rolls this is (99/100)^2, or about 98% likely. For 100 rolls it's about 37% likely that you won't roll a single 100 in that set. In fact, you need to roll in sets of 459 to have a less than 1% chance of having no 100s in the set (230 rolls only gets you below 10%).
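The arithmetic can be checked with a few lines of code. This is a small sketch in Python, assuming independent rolls with a 1/100 chance each; the function names `prob_no_hit` and `rolls_needed` are made up for illustration.

```python
import math


def prob_no_hit(rolls, p=0.01):
    """Probability that `rolls` independent rolls never produce the target."""
    return (1.0 - p) ** rolls


def rolls_needed(max_miss_prob, p=0.01):
    """Smallest number of rolls whose miss probability is below max_miss_prob."""
    return math.ceil(math.log(max_miss_prob) / math.log(1.0 - p))


print(round(prob_no_hit(2), 4))    # two rolls
print(round(prob_no_hit(100), 4))  # one hundred rolls
print(rolls_needed(0.01))          # rolls for under 1% chance of no 100
```

The miss probability shrinks with every extra roll but never reaches zero, which is why no number of rolls gives a guarantee.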
The probability of getting 100 on any single call is 1/100, but there is no guarantee of getting 100 when you call the function 100 times. You have to look at a much bigger sample. For example, if you call this function 100,000,000 times, you can expect 100 to come up about 1,000,000 times.
This can be answered in a better way if you let us know about your use case in more detail.
Getting 1 out of 100 rolls is just a statistical way of explaining it. Although the chance is 1% (meaning 1 out of 100), it doesn't mean you really will get it once in every 100 rolls. It's a matter of chance.
mt_rand uses the Mersenne Twister to generate pseudo-random numbers, which are said to be uniformly distributed. So if you set min and max values, the result should (most likely) also be uniformly distributed.
So you can only talk about the probability of getting a number in the given range, and about the expected number of tries until you get a specific number or all numbers in the range.
This means there is no number of rolls that guarantees you will get a specific number at least once.
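To illustrate the "expected number of tries" idea, here is a small simulation sketch in Python. It assumes a fair 1-100 roll; for such a roll the expected number of tries until a specific value appears is 1/p = 100. The names `tries_until_hit` and `average_tries` are hypothetical, and Python's `random.Random` stands in for `mt_rand`.

```python
import random


def tries_until_hit(rng, target=100, low=1, high=100):
    """Roll until `target` comes up and return how many rolls it took."""
    tries = 0
    while True:
        tries += 1
        if rng.randint(low, high) == target:
            return tries


def average_tries(trials, seed=0):
    """Average number of rolls needed over `trials` independent experiments."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return sum(tries_until_hit(rng) for _ in range(trials)) / trials


print(average_tries(5000))  # should land near 100
```

Individual experiments vary widely (some hit on the first roll, some take several hundred), even though the average settles near 100.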
I have a simple PHP/HTML page that runs MySQL queries to pull temperature data and display on a graph. Every once in a while there is some bad data read from my sensors (DHT11 Temp / RH sensors, read by Arduino), where there will be a spike that is too high or too low, so I know it's not a good data point. I have found this is easy to deal with if it is "way" out of range, as in not a sane temperature, I just use a BETWEEN statement to filter out any records that are not possibly true.
I do realize that ultimately this should be fixed at the source so these bad readings never post in the first place, however as a debugging tool, I do actually want to record those errors in my DB, so I can track down the points in time when my hardware was erroring.
However, this does not help with the occasional spikes that actually fall within the range of sane temperatures. For example, if it is 65 F outside and the sensor occasionally throws an odd reading of 107 F, it totally screws up my graphs, scaling, etc. I can't filter that with a BETWEEN (that I know of), because 107 F is actually a plausible summertime temperature in my region.
Is there a way to filter out values based on their neighboring rows? Can I do something like, if I am reading five rows for the sake of simplicity, and their result is: 77,77,76,102,77 ... that I can say "anything that is more than (x) difference between sequential rows, ignore it because it's bad data" ?
[/longWinded]
It is hard to answer without your schema so I did a SQLFiddle to reproduce your problem.
You need to average the temperature over a time frame and then compare this value with the current row. If the difference is too big, we don't select the row. In my Fiddle this is done by:
abs(temp - (SELECT AVG(temp) FROM temperature AS t
            WHERE t.timeRead BETWEEN
                DATE_ADD(temperature.timeRead, INTERVAL -3 HOUR)
                AND
                DATE_ADD(temperature.timeRead, INTERVAL 3 HOUR))) < 8
This condition calculates the average temperature over the window from 3 hours before to 3 hours after the current row (the current row included). If the row differs from that average by 8 degrees or more, we skip it.
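The same windowed-average check can be sketched outside the database. This is a minimal Python illustration, assuming readings are `(time_in_seconds, temperature)` pairs; the function name `filter_spikes` and its parameters are invented for the example, with defaults mirroring the 3 hour window and 8 degree threshold.

```python
def filter_spikes(readings, window_s=3 * 3600, max_diff=8.0):
    """Keep readings that stay within max_diff of their neighbourhood average.

    readings: list of (time_in_seconds, temperature) pairs.
    The average covers every reading within window_s seconds of the
    current one, the current reading included.
    """
    kept = []
    for t, temp in readings:
        nearby = [v for (u, v) in readings if abs(u - t) <= window_s]
        average = sum(nearby) / len(nearby)
        if abs(temp - average) < max_diff:
            kept.append((t, temp))
    return kept


readings = [(0, 77), (600, 77), (1200, 76), (1800, 102), (2400, 77)]
print(filter_spikes(readings))  # the 102 reading is dropped
```

Note that the spike itself is part of the average, so a large outlier in a sparse window pulls the average towards it. A median over the window would be less sensitive to that, at the cost of a more involved query.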
I have lots of sensor data from which I need to be able to detect changes reliably. It comes from a water level sensor in a remote client, which uses an accelerometer & float to get the water level. My problem is that the data can be noisy (it varies by 2-5 units per measurement), and sometimes I need to detect changes as low as 7-9 units.
When I graph the data, a change is quite obvious to the human eye, but how would I go about detecting it programmatically? Right now I'm just trying to detect changes bigger than x, but that is not too reliable. I've attached a sample graph and pointed out the changes with arrows. The huge changes in the beginning are just testing, so that's not normal behaviour for the data.
The data is in a MySQL database and the code is in PHP, so if you could point me in the right direction I'd highly appreciate it!
EDIT: Also there can be some spikes in the data which are not considered valid but rather a mistake in the data.
EDIT: Example data can be found from http://pastebin.com/x8C9AtAk
The algorithm would need to run every 30 mins or so and should be able to detect changes within the last 2-4 pings. Each ping is in 3-5min interval.
I made some awk that you, or someone else, might like to experiment with. I average the last 10 (m) samples excluding the current one, and also average the last 2 samples (n) and then calculate the difference between the two and output a message if the absolute difference exceeds a threshold.
#!/bin/bash
awk -F, '
# j will count number of samples
# we will average last m samples and last n samples
BEGIN {j=0;m=10;n=2}
{d[j]=$3;id[j++]=$1" "$2} # Store this point in array d[]
END { # Do this at end after reading all samples
for(i=m;i<j;i++){ # Iterate over all samples, skipping the first m so the long average has a full window
totlastm=0 # Calculate average over last m not incl current
for(k=m;k>0;k--)totlastm+=d[i-k]
avelastm=totlastm/m # Average = total/m
totlastn=0 # Calculate average over last n
for(k=n-1;k>=0;k--)totlastn+=d[i-k]
avelastn=totlastn/n # Average = total/n
dif=avelastm-avelastn # Calculate difference between ave last m and ave last n
if(dif<0)dif=-dif # Make absolute
mesg="";
if(dif>4)mesg="<-Change detected"; # Make message if change large
printf "%s: Sample[%d]=%d,ave(%d)=%.2f,ave(%d)=%.2f,dif=%.2f%s\n",id[i],i,d[i],m,avelastm,n,avelastn,dif,mesg;
}
}
' <(tr -d '"' < levels.txt)
The last bit <(tr...) just removes the double quotes before sending the file levels.txt to awk.
Here is an excerpt from the output:
18393344 2014-03-01 14:08:34: Sample[1319]=343,ave(10)=342.00,ave(2)=342.00,dif=0.00
18393576 2014-03-01 14:13:37: Sample[1320]=343,ave(10)=342.10,ave(2)=343.00,dif=0.90
18393808 2014-03-01 14:18:39: Sample[1321]=343,ave(10)=342.10,ave(2)=343.00,dif=0.90
18394036 2014-03-01 14:23:45: Sample[1322]=342,ave(10)=342.30,ave(2)=342.50,dif=0.20
18394266 2014-03-01 14:28:47: Sample[1323]=341,ave(10)=342.20,ave(2)=341.50,dif=0.70
18394683 2014-03-01 14:38:16: Sample[1324]=346,ave(10)=342.20,ave(2)=343.50,dif=1.30
18394923 2014-03-01 14:43:17: Sample[1325]=348,ave(10)=342.70,ave(2)=347.00,dif=4.30<-Change detected
18395167 2014-03-01 14:48:25: Sample[1326]=345,ave(10)=343.20,ave(2)=346.50,dif=3.30
18395409 2014-03-01 14:53:28: Sample[1327]=347,ave(10)=343.60,ave(2)=346.00,dif=2.40
18395645 2014-03-01 14:58:30: Sample[1328]=347,ave(10)=343.90,ave(2)=347.00,dif=3.10
The right way to go about problems of this kind is to build a model of the phenomenon of interest and also a model of the noise process, and then make inferences about the phenomenon given some data. These inferences are necessarily probabilistic. The general computation you need to carry out is P(H_k | data) = P(data | H_k) P(H_k) / sum_j (P(data | H_j) P(H_j)) (a generalized form of Bayes' rule), where the H_k are all the hypotheses of interest, such as "step of magnitude m at time t" or "noise of magnitude s". In this case there might be a large number of plausible hypotheses, covering all possible magnitudes and times. You might need to limit the range of hypotheses considered in order to make the problem tractable, e.g. by only looking back a certain number of time steps.
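As a rough illustration of that computation, here is a deliberately small Python sketch. It makes several simplifying assumptions that are not in the answer: the pre-step level `base` and the noise standard deviation `sigma` are known, the noise is Gaussian, the only hypotheses are "no step" and "a step of magnitude m at index t" for a short list of magnitudes, and half of the prior mass goes to "no step". The function name `step_posteriors` is made up.

```python
import math


def step_posteriors(data, base, sigma, magnitudes, prior_no_step=0.5):
    """Posterior probability of each hypothesis given the data.

    Hypotheses are keyed as (t, m): a step of magnitude m starting at
    index t. The key (None, 0.0) is the "no step" hypothesis.
    """
    def log_likelihood(t, m):
        # Gaussian log-likelihood, dropping the constant shared by all hypotheses
        total = 0.0
        for i, y in enumerate(data):
            mean = base + (m if t is not None and i >= t else 0.0)
            total -= (y - mean) ** 2 / (2.0 * sigma ** 2)
        return total

    steps = [(t, m) for t in range(1, len(data)) for m in magnitudes]
    step_prior = (1.0 - prior_no_step) / len(steps)
    log_post = {(None, 0.0): math.log(prior_no_step) + log_likelihood(None, 0.0)}
    for t, m in steps:
        log_post[(t, m)] = math.log(step_prior) + log_likelihood(t, m)

    # Normalise (the denominator of Bayes' rule), shifting by the max for stability
    peak = max(log_post.values())
    weights = {h: math.exp(v - peak) for h, v in log_post.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


levels = [342, 343, 342, 341, 343, 350, 351, 349, 350]
post = step_posteriors(levels, base=342.0, sigma=2.0, magnitudes=[-8.0, 8.0])
best = max(post, key=post.get)
print(best, round(post[best], 3))  # most probable hypothesis and its posterior
```

In practice `base` and `sigma` would have to be estimated from earlier data, and the grid of magnitudes and the look-back length are the knobs that keep the number of hypotheses manageable.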
I have a system that logs date:time and it returns results such as:
05.28.2013 11:58pm
05.27.2013 10:20pm
05.26.2013 09:47pm
05.25.2013 07:30pm
05.24.2013 06:24pm
05.23.2013 05:36pm
What I would like to be able to do is produce a list of date:time predictions for the next few days, so a person could see when the next event might occur.
Example of prediction results:
06.01.2013 04:06pm
05.31.2013 03:29pm
05.30.2013 01:14pm
Thoughts on how to go about doing time prediction of this kind with PHP?
The basic answer is "no". Programming tools are not designed to do prediction. Statistical tools are designed for that purpose. You should be thinking more about R, SPSS, SAS, or some other similar tool. Some databases have rudimentary data analysis tools built-in, which is another (often inferior) option.
The standard statistical technique for time-series prediction is called ARIMA analysis (auto-regressive integrated moving average). It is unlikely that you are going to be implementing that in php/SQL. The standard statistical technique for estimating time between events is Poisson regression. It is also highly unlikely that you are going to be implementing that in php/SQL.
I observe that your data points are once per day in the evening. I might guess that this is the end of some process that runs during the day. The end time is based on the start time and the duration of the process.
What can you do? Often a reasonable prediction is "what happened yesterday". You would be surprised at how hard it is to beat this prediction for weather forecasting and for estimating the stock market. Another very reasonable method is the average of historical values.
If you know something about your process, then an average by day of the week can work well. You can also get more sophisticated, and do Monte Carlo estimates, by measuring the average and standard deviation, and then pulling a random value from a statistical distribution. However, the average value would work just as well in your case.
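One way to apply the "average of historical values" idea to the logged times is to average the gap between consecutive events and extrapolate from the latest one. The following Python sketch does that. Averaging the gaps (rather than the clock times) is an assumption on my part, as are the names `predict_next` and `STAMP_FORMAT`. It is a rough guess, not a statistical forecast.

```python
from datetime import datetime

STAMP_FORMAT = "%m.%d.%Y %I:%M%p"  # matches stamps such as "05.28.2013 11:58pm"


def predict_next(stamps, count=3):
    """Extrapolate `count` future events using the mean gap between past events."""
    times = sorted(datetime.strptime(s, STAMP_FORMAT) for s in stamps)
    if len(times) < 2:
        raise ValueError("need at least two events to measure a gap")
    # Mean gap = total span divided by the number of gaps
    mean_gap = (times[-1] - times[0]) / (len(times) - 1)
    return [times[-1] + mean_gap * (i + 1) for i in range(count)]


log = [
    "05.28.2013 11:58pm", "05.27.2013 10:20pm", "05.26.2013 09:47pm",
    "05.25.2013 07:30pm", "05.24.2013 06:24pm", "05.23.2013 05:36pm",
]
for guess in predict_next(log):
    print(guess.strftime("%m.%d.%Y %I:%M%p"))
```

For the sample log the mean gap is a little over 25 hours, so each guess falls slightly later in the day than the previous event.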
I would suggest that you study a bit about statistics/data mining/predictive analytics before attempting to do any "predictions". At the very least, if you really have a problem in this domain, you should be looking for the right tools to use.
As Gordon Linoff posted, the simple answer is "no", but you can write some code that will give a rough guess on what the next time will be.
I wrote a very basic example on how to do this on my site http://livinglion.com/2013/05/next-occurrence-in-datetime-sequence/
Here's a possible way that this could be done, using PHP + MySQL:
You can have a table with two fields: a DATE field and a TIME field (essentially storing the date + time portion separately). Say that the table is named "timeData" and the fields are:
eventDate: date
eventTime: time
Your primary key would be the combination of eventDate and eventTime, so that they're never repeated as a pair.
Then, you can do a query like:
SELECT eventTime, count(*) as counter FROM timeData GROUP BY eventTime ORDER BY counter DESC LIMIT 0, 10
The query above returns the 10 most frequent event times, ordered by frequency. You can then sort these again from smallest to largest.
This way, you can return rough time predictions, which should become more accurate as you gather more data each day.
I have a tricky question that I've looked into a couple of times without figuring it out.
Some backstory: I am making a text-based RPG where players fight against animals, monsters, etc. It works like any other game where the opponents deal a number of hit points of damage to each other every round.
The problem: I am using the random-function in php to generate the final value of the hit, depending on levels, armor and such. But I'd like the higher values (like the max hit) to appear less often than the lower values.
This is an example-graph:
How can I reproduce something like this using PHP and the rand-function? When typing rand(1,100) every number has an equal chance of being picked.
My idea is this: Make a 2nd degree (or quadratic function) and use the random number (x) to do the calculation.
Would this work like I want?
The question is a bit tricky, please let me know if you'd like more information and details.
Please look at this beautiful article:
http://www.redblobgames.com/articles/probability/damage-rolls.html
It has interactive diagrams showing dice rolls and the percentage of each result.
This should be very useful for you.
Pay attention to this kind of rolling random number:
roll1 = rollDice(2, 12);
roll2 = rollDice(2, 12);
damage = min(roll1, roll2);
This should give you what you look for.
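A runnable version of that idea might look like the sketch below. It is in Python and assumes that `rollDice(2, 12)` means "roll two 12-sided dice and add them up"; the helper names `roll_dice` and `damage_roll` are made up for the example.

```python
import random


def roll_dice(count, sides, rng=random):
    """Sum of `count` dice with `sides` sides each (assumed meaning of rollDice)."""
    return sum(rng.randint(1, sides) for _ in range(count))


def damage_roll(rng=random):
    """Take the lower of two rolls, which makes high damage values rarer."""
    return min(roll_dice(2, 12, rng), roll_dice(2, 12, rng))


rng = random.Random(7)  # seeded so the run is repeatable
rolls = [damage_roll(rng) for _ in range(10)]
print(rolls)
```

Taking the minimum of two rolls shifts the distribution towards low values while keeping the same range (2 to 24), so the maximum hit still exists but shows up much less often.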
OK, here's my idea :
Let's say you've got an array of elements (a,b,c,d) and you want to randomly pick one of them. Doing a rand(1,4) to get the random element index would mean that all elements have an equal chance to appear (25%).
Now, let's say we take this array: (a,b,c,d,d).
Here we still have 4 distinct elements, but they no longer have equal chances to appear.
a,b,c : 20%
d : 40%
Or, let's take this array :
(1,2,3,...,97,97,97,98,98,98,99,99,99,100,100,100,100)
Hint: this way you don't just bias the random number generation, you actually set the desired probability of appearance of each number (or of a range of numbers).
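The repeated-elements trick can be sketched in a few lines. This Python example shows both the literal version (a pool with duplicates) and an equivalent version with explicit weights; the function names `pick_from_pool` and `pick_weighted` are invented for illustration.

```python
import random
from collections import Counter


def pick_from_pool(pool, rng=random):
    """Pick uniformly from a pool in which likelier values appear more often."""
    return rng.choice(pool)


def pick_weighted(values, weights, rng=random):
    """Same idea without building the pool: one weight per distinct value."""
    return rng.choices(values, weights=weights, k=1)[0]


rng = random.Random(42)  # seeded so the run is repeatable
counts = Counter(pick_from_pool(["a", "b", "c", "d", "d"], rng) for _ in range(10000))
print(counts)  # "d" should come up roughly twice as often as the others
```

To make high hit values rarer instead of more common, give them smaller weights than the low values, for example weights that decrease as the value increases.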
So, that's how I would go about that :
If you want numbers from 1 to 100 (with higher numbers appearing more frequently), get a random number from 1 to 1000 and associate it with a wider range. E.g.
rand = 800-1000 => rand/10 (80->100)
rand = 600-800 => rand/9 (66->88)
...
Or something like that. (You could use any math operation you imagine, modulo or whatever... and play with your algorithm). I hope you get my idea.
Good luck! :-)