PHP: Get peaks from a dataset with values and timestamps

I need to get the peaks from a dataset using PHP. The dataset is made of timestamp/value pairs, and I need to get the three peaks, as in the image, with their three relative timestamps.
This is a graphical representation of the dataset:
But I don't need to represent it graphically; I'd just like a simple return of an array of the three value/timestamp pairs. I also need a sort of threshold to avoid false-positive peaks, for example a minimum variation like from 0 to 400 (I'll define it as needed, but I need a threshold).
You can find the example dataset here:
https://wetransfer.com/downloads/d7d20a726285ea29ae2ff682764b045020210401192032/13e788
Many thanks for the help, I'm stuck with this. I have searched on Stack Overflow and have seen some algorithms, but I can't apply them to my case.

I think the size of your sample data suggests a database is the proper tool for the job. So, assuming the data is already stored in a readings table with two numeric columns, ts and reading, this query may help.
select ts, reading
from
(
    select ts, reading,
           reading - lag(reading) over (order by ts) as variation
    from readings
) as t
where variation >= 400 -- or whatever your threshold value may be
order by reading desc
limit 3;
This is the PostgreSQL dialect, which I am most comfortable with. You can rewrite it in another SQL dialect if necessary and then easily pull the result data into PHP, for example using PDO.
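For completeness, a minimal sketch of pulling those three peaks into a PHP array with PDO; the DSN, credentials and the exact threshold are assumptions, while the readings table and columns come from the answer above.

$pdo = new PDO('pgsql:host=localhost;dbname=weather', 'user', 'password');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$threshold = 400; // minimum rise from the previous reading to count as a peak

$sql = "select ts, reading
        from (
            select ts, reading,
                   reading - lag(reading) over (order by ts) as variation
            from readings
        ) as t
        where variation >= :threshold
        order by reading desc
        limit 3";

$stmt = $pdo->prepare($sql);
$stmt->execute(['threshold' => $threshold]);

// Array of up to three ['ts' => ..., 'reading' => ...] rows.
$peaks = $stmt->fetchAll(PDO::FETCH_ASSOC);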

Related

Full text document similarity search

I have a big database of articles, and before adding new items to the DB I'd like to check whether similar items already exist; if so, I want to group them together so that later I can easily display them as a group of similar items.
Currently we use PHP's similar_text() function, which is very simple but surprisingly precise and fully satisfies our needs. The problem is that before we add an item to the DB, we first need to pull X items from the DB and loop through every single one to check whether our new item is at least 75% similar to any of them, so that we can group them together. This uses a lot of resources and time that we don't really have.
We use MySQL and Solr for all our queries. I've tried MySQL full-text search and Solr's More Like This. Compared to PHP's implementation they are super fast and efficient, but I just can't get the kind of robust percentage score that similar_text() provides, and accurate grouping is crucial for us.
For example, using this MySQL query:
SELECT id, body, ROUND(((MATCH(body) AGAINST ('ARTICLE TEXT')) / scores.max_score) * 100) as relevance
FROM natural_text_test,
(SELECT MAX(MATCH(body) AGAINST('ARTICLE TEXT')) as max_score FROM natural_text_test LIMIT 1) scores
HAVING relevance > 75
ORDER BY relevance DESC
I get that an article with 130 words is 85% similar to another article with 4,700 words, whereas PHP's similar_text() returns only a 3% similarity score, which is well below our threshold and correct in our case.
I've also looked into the Levenshtein distance algorithm, but it seems to have the same problem as MySQL and Solr.
There has to be a better way to handle similarity checks. Maybe I'm using the algorithms incorrectly?
Based on some of the comments, I might propose this...
It seems that 75%-similar documents would have a lot of the same sentences in the same order.
Break the doc into sentences
Take a crude hash of each sentence and map it to a visible ASCII character. This gives you a string that is, perhaps, 1/100th the size of the original doc.
Store that with the doc.
When searching, use levenshtein() on this string to find 'similar' documents.
Sure, hashing is imperfect, etc. But this is fast. And you could apply some other technique to double-check the few docs that are close.
For a hash, I might do something like this:
$md5 = md5($sentence);
$x = hexdec(substr($md5, 0, 2)) & 0x3F;  // take 6 bits out of that hex string
$hash = chr(ord('0') + $x);              // map to a visible ASCII character
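A rough end-to-end sketch of that idea; the sentence splitting, the exact hash, the distance cut-off and the similar_text() double-check are all assumptions layered on top of the outline above (note that older PHP versions limit levenshtein() arguments to 255 characters, which is another reason to keep the signatures short).

// Build a compact signature string: one visible ASCII character per sentence.
function doc_signature(string $text): string
{
    // Naive sentence split on ., ! and ? -- good enough for a sketch.
    $sentences = preg_split('/(?<=[.!?])\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $sig = '';
    foreach ($sentences as $sentence) {
        $x = hexdec(substr(md5($sentence), 0, 2)) & 0x3F; // 6 bits -> 0..63
        $sig .= chr(ord('0') + $x);                       // visible ASCII '0'..'o'
    }
    return $sig;
}

// $storedDocs: id => ['sig' => precomputed signature, 'body' => full article text].
function find_similar(string $newDoc, array $storedDocs, int $maxDistance = 5): array
{
    $newSig = doc_signature($newDoc);
    $matches = [];
    foreach ($storedDocs as $id => $doc) {
        // Cheap pre-filter on the tiny signature strings.
        if (levenshtein($newSig, $doc['sig']) <= $maxDistance) {
            // Expensive double-check only for the few close candidates.
            similar_text($newDoc, $doc['body'], $percent);
            if ($percent >= 75) {
                $matches[$id] = $percent;
            }
        }
    }
    arsort($matches);
    return $matches; // id => similarity percent, best match first
}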

Efficient way of emulating LIMIT (FETCH), OFFSET in Progress OpenEdge 10.1B SQL using PHP

I want to be able to use the equivalent of MySQL's LIMIT, OFFSET in Progress OpenEdge 10.1B.
Whilst the FETCH/OFFSET commands are available as of Progress OpenEdge 11, version 10.1B unfortunately does not have them, so it is difficult to produce paged recordsets (e.g. records 1-10, 11-20, 21-30, etc.).
ROW_NUMBER is also not supported by 10.1B. It seems the functionality is pretty much the same as was found in SQL Server 2000.
If always searching in the order of the primary key id (pkid), this could be achieved by using "SELECT TOP 10 * FROM table ORDER BY pkid ASC", then identifying the last pkid and finding the next set with "SELECT TOP 10 * FROM table WHERE pkid>last_pkid ORDER BY pkid ASC"; this, however, only works when sorting by the pkid.
My solution was to write a PHP function where I could pass the limit and offset and then return only the results where the row number was between those defined values. I use TOP to return no more than the sum of the limit and offset.
function limit_query($sql, $limit = NULL, $offset = 0)
{
    global $db; // $db is my DB wrapper class
    $out = array();
    if ($limit !== NULL) {
        // str_replace_first() is a small helper that replaces only the first occurrence
        $sql = str_replace_first("SELECT", "SELECT TOP " . ($limit + $offset), $sql);
    }
    $query = $db->query($sql);
    $i = 0;
    while ($row = $db->fetch($query)) {
        if ($i >= $offset) { // only add to the return array once past the offset
            $out[] = $row;
        }
        $i++;
    }
    $db->free_result($query);
    return $out;
}
This works well on small recordsets or on the first few pages of results, but if the total results run into the thousands and you want to see results on page 20, 100 or 300, it is very slow and inefficient (page one queries only the first 10 results and page 2 the first 20, but page 100 will query the first 1,000).
In most cases the user will probably not venture past page 2 or 3, so the lack of efficiency perhaps isn't a major issue, but I do wonder if there is a more efficient way of emulating this functionality.
Sadly, upgrading to a newer version of Progress, or a superior database such as MySQL is not an option, as the db is provided by third-party software.
Can anyone suggest alternative, more efficient methods?
I am not sure I fully understand the question, so here's an attempt to give you an answer:
You probably won't be able to do what you want with a single hit to the db. Just by sorting records / adding functions you probably won't achieve the paging functionality you are trying to get. As far as I know, Progress won't number the rows unless, as you said, you're sorting by some ascending pkid.
My suggestion would be a back-end procedure that builds the query with a batch size equal to the page size (in your case 10) and loops to fetch the next batch until you reach the records you need, as in the sketch below. Look into batching datasets, or use an open query with MAX-ROWS.
Hope it helps, or at least gives you an idea. I actually like your PHP implementation; it seems like a good workaround and isn't ugly to maintain.
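A rough PHP sketch of that batching idea, walking forward by pkid so each batch reads only one page of rows instead of the first limit + offset rows; the $db wrapper and the pkid ordering come from the question, the rest is an assumption (and, as noted there, it only works when sorting by pkid).

function page_by_pkid($db, $table, $pageSize, $pageNumber)
{
    $lastPkid = 0;
    $rows = array();
    // Walk forward one batch at a time until the requested page is reached.
    for ($page = 1; $page <= $pageNumber; $page++) {
        $sql = "SELECT TOP " . (int)$pageSize . " * FROM " . $table .
               " WHERE pkid > " . (int)$lastPkid . " ORDER BY pkid ASC";
        $query = $db->query($sql);
        $rows = array();
        while ($row = $db->fetch($query)) { // assumes fetch() returns an associative array
            $rows[] = $row;
            $lastPkid = $row['pkid'];
        }
        $db->free_result($query);
        if (count($rows) < $pageSize) {
            break; // ran out of records before reaching the requested page
        }
    }
    return $rows; // the rows for the requested page (or the last, partial page)
}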
You should be able to install an upgraded version of Progress, convert your database(s) and recompile the code against the new version. Normally your support contract through your vendor would provide you with the latest version of Progress (OpenEdge), so this shouldn't be a huge issue. Going from version 10 to 11 shouldn't cause any compile issues and will give you all of the SQL benefits of the newer version.
Honestly, your comment about MySQL being superior is a little confusing, but that's a discussion for another day. ;D
Best regards!

Do a calculation in SQL query or PHP?

This is my first post, so I hope I follow all the conventions :D
I have a MySQL database with x and y coordinates and a PHP web service that receives another pair of x and y coords. I want to calculate the squared distance, (x2-x1)^2 + (y2-y1)^2, and sort from closest to farthest.
The SQL query (in PHP) would be:
SELECT ((x_coor-$x)*(x_coor-$x)+(y_coor-$y)*(y_coor-$y)) AS SquareDis FROM $table ORDER BY SquareDis
Which is gentler on performance: doing this in the SQL query or in the PHP program?
Thanks for all answers in advance!
If you want to sort by the results, then do the calculation in SQL.
Otherwise, you are just using the database as a "file store" and not taking advantage of the functionality that it offers. In addition, by doing the ordering in the database, you can limit the number of rows being returned -- another optimization.
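A minimal PDO sketch of that suggestion, with both the ordering and a row limit pushed into the query; the coords table name, the connection details and the limit of 20 are assumptions, while the column names come from the question.

$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'password');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Squared-distance ordering done in SQL, closest rows first, capped at 20 rows.
$sql = "SELECT x_coor, y_coor,
               ((x_coor - ?) * (x_coor - ?) + (y_coor - ?) * (y_coor - ?)) AS SquareDis
        FROM coords
        ORDER BY SquareDis
        LIMIT 20";

$stmt = $pdo->prepare($sql);
$stmt->execute([$x, $x, $y, $y]);
$closest = $stmt->fetchAll(PDO::FETCH_ASSOC);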

Basic PHP/MySQL math example

I have a PHP form that grabs user-entered data and then posts it to a MySQL database. I'd like to know how I can take the mathematical difference between two fields and write it to a third field in the database.
For example, I'd like to subtract "travel_costs" from "show_1_price" and write the difference to the "total_cost" field. What's the best way to do this? Thanks so much.
You can later run a select query: SELECT show_1_price - travel_costs AS pricediff FROM my_table; then grab the value in PHP and do an insert query again...
It should be simple to do on the PHP side of things. How about:
$query = sprintf("INSERT INTO my_table (travel_costs, show_1_price, total_cost) VALUES (%d, %d, %d)",
                 $travel_costs, $show_1_price, $show_1_price - $travel_costs);
Generally, though, it is bad form to store a value in a database that can be calculated from other values, the reason being that you may never access this value again yet you are using storage for it. CPU cycles are much more abundant today, so calculate the value when needed. This is not a golden rule though - there are times when it could be more efficient to store the calculated value - although this is not usually the case.
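For instance, a sketch of calculating the total only at read time rather than storing it; the PDO connection in $pdo and the id column are assumptions, while the table and other column names follow the answers above.

// Compute the difference when it's needed instead of storing it.
$stmt = $pdo->prepare("SELECT show_1_price, travel_costs,
                              show_1_price - travel_costs AS total_cost
                       FROM my_table
                       WHERE id = ?");
$stmt->execute([$id]);
$row = $stmt->fetch(PDO::FETCH_ASSOC);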

MySQL speed of executing max(), min(), sum() on a relatively large database

I have a relatively large database (130,000+ rows) of weather data, which is accumulating very fast (a new row is added every 5 minutes). On my website I publish min/max data for the day and for the entire existence of my weather station (which is around 1 year).
Now I would like to know whether I would benefit from creating additional tables where these min/max data are stored, rather than letting PHP run a MySQL query that searches for the day's min/max and the all-time min/max. Would a query for max(), min() or sum() (I need sum() to total rain accumulation for months) take that much longer than a simple query against a table that already holds those min, max and sum values?
That depends on whether your columns are indexed or not. In the case of MIN() and MAX(), you can read the following in the MySQL manual:
MySQL uses indexes for these operations:
To find the MIN() or MAX() value for a specific indexed column key_col. This is optimized by a preprocessor that checks whether you are using WHERE key_part_N = constant on all key parts that occur before key_col in the index. In this case, MySQL does a single key lookup for each MIN() or MAX() expression and replaces it with a constant.
In other words, if your columns are indexed you are unlikely to gain much performance benefit from denormalization. If they are NOT, you will definitely gain performance.
As for SUM(), it is likely to be faster on an indexed column, but I'm not really confident about the performance gains here.
Please note that you should not be tempted to index every column after reading this post: if you add indexes, your update queries will slow down!
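As a concrete illustration of that manual passage, a sketch with assumed table and column names for the weather data (and an assumed PDO connection in $pdo); the composite index would be created once, e.g. during setup.

// Run once, e.g. in a setup/migration script: index on (reading_day, temperature).
$pdo->exec("CREATE INDEX idx_day_temp ON readings (reading_day, temperature)");

// The key part before temperature (reading_day) is fixed with '=', so MySQL can
// resolve MIN()/MAX() with single key lookups instead of scanning the table.
$stmt = $pdo->query("SELECT MIN(temperature) AS day_min,
                            MAX(temperature) AS day_max
                     FROM readings
                     WHERE reading_day = CURDATE()");
$dayMinMax = $stmt->fetch(PDO::FETCH_ASSOC);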
Yes, denormalization should help performance a lot in this case.
There is nothing wrong with storing calculations for historical data that will not change in order to gain performance benefits.
While I agree with RedFilter that there is nothing wrong with storing historical data, I don't agree about the performance boost you will get. Your database is not what I would consider a heavy-use database.
One of the major advantages of databases is indexes. They use advanced data structures to make data access lightning fast. Just think: every primary key you have is an index. You shouldn't be afraid of them. Of course, it would probably be counterproductive to make all your fields indexes, but that should never really be necessary. I would suggest researching indexes more to find the right balance.
As for the work done when a change happens, it is not that bad. An index is a tree-like representation of your field data. This is done to reduce a search to a small number of near-binary decisions.
For example, think of finding a number between 1 and 100. Normally you would randomly stab at numbers, or you would just start at 1 and count up. This is slow. Instead, it would be much faster if you set it up so that you could ask whether you were over or under each time you choose a number. You would start at 50 and ask if you are over or under; if over, choose 75, and so on until you found the number. Instead of possibly going through 100 numbers, you would only have to go through around seven to find the correct one.
The problem arises when you add 50 numbers and make the range 1 to 150. If you start at 50 again, your search is less optimized, as there are now 100 numbers above you; your binary search is out of balance. So what you do is rebalance your search by starting at the new mid-point, namely 75.
So the work a database does is just an adjustment to rebalance the mid-point of its index. It isn't actually a lot of work. If you are working on a database that is large and requires many changes a second, you would definitely need a strong strategy for your indexes. In a small database that gets very few changes, like yours, it's not a problem.
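A tiny PHP sketch of that guessing game, just to make the halving concrete; the numbers and the function name are purely illustrative.

// Guess a number between $low and $high by halving the range each time.
function guesses_needed(int $target, int $low = 1, int $high = 100): int
{
    $guesses = 0;
    while ($low <= $high) {
        $guesses++;
        $mid = intdiv($low + $high, 2);
        if ($mid == $target) {
            return $guesses;     // found it
        } elseif ($mid < $target) {
            $low = $mid + 1;     // the target is over the guess
        } else {
            $high = $mid - 1;    // the target is under the guess
        }
    }
    return $guesses;
}

// e.g. guesses_needed(87) is 7 -- far fewer than counting up from 1.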
