mysql interpolate data - php

Is there an 'easy way' to grab two rows of data from a table, and add rows with values 'in-between'?
I want to read a latitude, a longitude and a timestamp from each row, compare the timestamp with the one from the previous row, and interpolate new rows if the gap is bigger than my minimum. For example, given two rows 1 minute apart, I want to add a row for every 10 seconds in between.
Is a stored procedure the best way to go about this? The easiest?
Currently using MySQL and PHP.

I would just grab the data and do the math in PHP. SQL isn't all that versatile, and you'd be saving yourself a headache.
EDIT: Actually, just for the fun of it, you could make the math easier by left-joining to a calendar table.
First you need a table ints with a single column i holding the values 0-9. Then you can do something like:
SELECT cal.t, loc.lat, loc.lng
FROM (
    -- one row per 10-second step, from {start_time} up to 9990 seconds later
    SELECT {start_time} + INTERVAL (a.i*1000 + b.i*100 + c.i*10) SECOND AS t
    FROM ints AS a
    JOIN ints AS b
    JOIN ints AS c
    HAVING t <= {end_time}
) AS cal
LEFT JOIN locations AS loc ON cal.t = loc.stamp
ORDER BY cal.t;
This would return a table with NULL values for lat and lng where there isn't an entry on the 10-second mark, so you could iterate through and do the math for just those. Keep in mind, this only works if all the datapoints you do have (other than the start and end) land right on a 10-second mark.
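The "do the math" step is plain linear interpolation between two known fixes. A minimal sketch follows, written in Python for brevity (the arithmetic ports directly to PHP). The function name interpolate_fixes and the tuple layout are assumptions made for illustration, and the straight-line blend of latitude and longitude is only reasonable for short gaps that do not cross the antimeridian.

```python
from datetime import datetime, timedelta

def interpolate_fixes(start, end, step_seconds=10):
    """Return interpolated (timestamp, lat, lng) tuples strictly between two fixes.

    `start` and `end` are (datetime, lat, lng) tuples; this layout is hypothetical.
    """
    t0, lat0, lng0 = start
    t1, lat1, lng1 = end
    total = (t1 - t0).total_seconds()
    points = []
    elapsed = step_seconds
    while elapsed < total:
        frac = elapsed / total  # fraction of the way from start to end
        points.append((
            t0 + timedelta(seconds=elapsed),
            lat0 + (lat1 - lat0) * frac,
            lng0 + (lng1 - lng0) * frac,
        ))
        elapsed += step_seconds
    return points
```

For two fixes one minute apart and a 10-second step, this yields five new points (at 10, 20, 30, 40 and 50 seconds), which could then be inserted back into the table.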

Related

Using MYSQL, SELECT every nth row from a subquery SELECT

I have a table of over 300,000 rows and I would like to render this data on a graph, but 300,000 rows isn't really necessary all at once. For example, even though there may be 100 rows of data for a given day, I don't need to display all that data if I'm showing a whole year worth of data. So I would like to "granularize" the data.
I was thinking of getting everything and then using a script to remove what I don't need, but that seems like it would be much slower and harder on the database.
So here's what I have so far.
SET @row_number := 0;
SELECT @row_number := @row_number + 1 as row_number,
price, region, timestamp as row_number FROM pricehistory;
This gives me all the rows and numbers them. I was planning on adding a where clause to get every 1000 rows (i.e. every nth row) like this
SET @row_number := 0;
SELECT @row_number := @row_number + 1 as row_number,
price, region, timestamp as row_number FROM pricehistory
WHERE row_number % 1000 = 0;
But MySQL doesn't see row_number as a column for some reason. Any ideas? I've looked at other solutions online, but they don't seem to work for MySQL in particular.
As Racil's comment suggested, you can just go by an auto-incremented id field if you have one; but you've stated the amount of data for different dates could be different, so this could make for a very distorted graph. If you select every 1000th record for a year and half the rows are from the last 3 months ("holiday shopping" for a commerce example), the latter half of a year graph will actually reflect the latter quarter of the year. For more useful results you're most likely better off with something like this:
SELECT region, DATE(timestamp) AS theDate
, AVG(price), MIN(price), MAX(price)
FROM pricehistory
GROUP BY region, theDate
;
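To make the grouping idea concrete, here is a rough Python sketch of what the GROUP BY region, DATE(timestamp) query computes. The function name daily_summary and the (region, timestamp, price) tuple layout are assumptions for illustration only; in practice the database should do this work.

```python
from collections import defaultdict
from datetime import datetime

def daily_summary(rows):
    """Group (region, timestamp, price) tuples by region and calendar day."""
    buckets = defaultdict(list)
    for region, ts, price in rows:
        buckets[(region, ts.date())].append(price)
    # One summary per (region, day), mirroring AVG / MIN / MAX in the query.
    return {
        key: {"avg": sum(prices) / len(prices), "min": min(prices), "max": max(prices)}
        for key, prices in buckets.items()
    }
```

Each day contributes exactly one point per region, so busy periods no longer take up a disproportionate share of the graph.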
It doesn't look like I'm going to get another answer so I'll go ahead and write the solution I came up with.
My data is pretty evenly distributed as it grabs prices at regular intervals so there's no reason to worry about that.
Here's my solution.
Let's say I have 500,000 rows and want to display a subset of about 5,000 of them. 500000 / 5000 is 100, so I use 100 as the divisor in my SELECT statement, like this: SELECT * FROM pricehistory WHERE id % 100 = 0;
Here is the actual code
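public function getScaleFactor($startDate, $endDate) {
    $numPricePoints = $this->getNumPricePointsBetweenDates($startDate, $endDate);
    // Default to 1 so that "id % 1 = 0" keeps every row when there are few points.
    $scaleFactor = 1;
    if ($numPricePoints > $this->desiredNumPricePoints) {
        // floor() returns a float, so cast to int before it goes into the SQL.
        $scaleFactor = (int) floor($numPricePoints / $this->desiredNumPricePoints);
    }
    return $scaleFactor;
}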
I then use $scaleFactor in the SQL like this: SELECT * FROM pricehistory WHERE id % {$scaleFactor} = 0;
This isn't a perfect solution because you don't always end up with 5000 rows exactly, but I don't NEED exactly 5000 rows. I'm just trying to reduce the resolution of the data while still getting a graph that looks close to what it would be had I used all 500,000 rows.
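The arithmetic behind this can be sketched in a few lines of Python. The names scale_factor and every_nth are hypothetical, and the sketch assumes each row is a dict with an integer "id" key; it only mirrors the id % n = 0 filter and is not a replacement for the SQL.

```python
def scale_factor(num_rows, desired_rows):
    """Integer divisor n such that keeping every nth row gives roughly desired_rows."""
    if desired_rows <= 0:
        raise ValueError("desired_rows must be positive")
    # Never go below 1, so small result sets are returned in full.
    return max(1, num_rows // desired_rows)

def every_nth(rows, n):
    """Keep rows whose id is a multiple of n, like WHERE id % n = 0."""
    return [row for row in rows if row["id"] % n == 0]
```

With 500,000 rows and a target of 5,000, the factor is 100. Gaps in the id sequence (deleted rows, for instance) are one reason the final count may not be exactly the target.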

Finding Interval of a data present on latest 2 dates

I'm developing a web-based tool that helps analyze the intervals at which numbers occur in a 6-digit lottery.
Let us focus on one number first, say 7.
The SQL query I've written so far:
SELECT * FROM `l642` WHERE `1d`=7 OR `2d`=7 OR `3d`=7 OR `4d`=7 OR `5d`=7
OR `6d`=7 ORDER BY `draw_date` DESC LIMIT 2
This pulls the two latest draws in which the number 7 is present.
I'm thinking of using DATEDIFF, but I'm confused about how to get the previous value so I can subtract it from the latest draw_date.
My goal is to list the intervals for the numbers 1-42, and I plan to accomplish it using PHP.
Looking forward to your help
A few ideas spring to mind.
(1) First, since your result set is already ordered, loop over the two rows in PHP: take $date1 = $row['draw_date'] from the first row, then fetch the next (last) row and set $date2 = $row['draw_date']. date_diff() expects DateTime objects rather than strings, so convert them first:
$diff = date_diff(date_create($date2), date_create($date1));
$diff->days is then the difference in days.
(2) A second way is to have MySQL return the DATEDIFF by including a row number in the result set and doing a self-join with aliases, say alias a for row 1 and alias b for row 2:
DATEDIFF(a.draw_date, b.draw_date)
How one goes about getting rownumber could be either:
(2a) rownumber found here: With MySQL, how can I generate a column containing the record index in a table?
(2b) a work table with an id INT AUTO_INCREMENT PRIMARY KEY column, filled with INSERT ... SELECT from your LIMIT 2 query (with a TRUNCATE TABLE on the work table between iterations 1 to 42 to reset the AUTO_INCREMENT counter).
The entire thing could be wrapped with an outer table 1 to 42 where 42 rows are brought back with 2 columns (num, number_of_days), but that wasn't your question.
So considering how infrequent you are probably doing this, I would probably recommend not over-engineering it and would shoot for #1
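Option (1) boils down to subtracting the two most recent draw dates for a number. A minimal sketch in Python, assuming the draws have already been fetched as (draw_date, set_of_numbers) pairs; that layout and the function names are hypothetical and not part of the original schema.

```python
from datetime import date

def latest_interval_days(draws, number):
    """Days between the two most recent draws containing `number`, or None."""
    hits = sorted((d for d, nums in draws if number in nums), reverse=True)
    if len(hits) < 2:
        return None  # the number has appeared fewer than two times
    return (hits[0] - hits[1]).days

def intervals_for_all(draws, highest=42):
    """Map every number 1..highest to its latest interval in days."""
    return {n: latest_interval_days(draws, n) for n in range(1, highest + 1)}
```

`draws` must be a list (not a one-shot iterator) because intervals_for_all walks it once per number.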

MySQL sort within one column of data

For example, if I had a column in a database called RANDOM that has random bits of information distinguished by their end notation like this:
RANDOM
1. 12312 KM, 201 M, 1213 H, 101029 DOLLARS
2. 231 KM, 2351 M, 754 H, 345 DOLLARS, 120 L, 1201 FT
3. 2324 M
Some entries have other miscellaneous but important data points, while others may only have one or two.
I would like to sort using only data within column RANDOM.
$RESULT = mysqli_query($CON, "SELECT * FROM TABLE WHERE RANDOM CONTAINS 'M' ORDER BY 'NUMBER BEFORE M'");
Therefore this would find the 3 rows that contain 'M' and then sort by the number in front of 'M'. Similarly with other variables like KM or DOLLARS. Is this possible using pure MySQL in a single statement?
Your sort would require a 2 aspirin headache with
ORDER BY substr('str',locate(str,a,b),locate(str,b,c))
If I understand the question, then yes, you could do it in one statement; it just requires some manipulation, as your data is all in the one column.
SELECT * FROM
(SELECT *, CAST(LEFT(RANDOM, INSTR(RANDOM, '.') - 1) AS UNSIGNED) AS ORD_NUMBER
FROM `test_table`
) DERIVED_TABLE ORDER BY ORD_NUMBER DESC
Basically we get the number at the start by using INSTR to find the first '.' (assuming this is the format for all items), and then, to be safe, we cast it to an integer. The data is returned through a derived table so that we can sort on the dynamically calculated ORD_NUMBER.
Using this principle you can filter however you like :)
But maybe rethink the database design: needing any kind of query like this suggests the schema isn't set up correctly (the simple solution is a new column holding that value :))
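The question asks for the number in front of a given unit (such as M or KM), which is a different extraction from the leading number handled above. A hedged Python sketch of that parsing step follows; value_for_unit and sort_by_unit are hypothetical names, and the sketch assumes each value is written as digits, optional whitespace, then the unit.

```python
import re

def value_for_unit(text, unit):
    """Return the number written directly before `unit` in text, or None."""
    # \b after the unit stops "M" from matching the start of a longer word.
    pattern = r"(\d+(?:\.\d+)?)\s*" + re.escape(unit) + r"\b"
    match = re.search(pattern, text)
    return float(match.group(1)) if match else None

def sort_by_unit(rows, unit):
    """Keep rows that contain `unit`, ordered by the number in front of it."""
    with_values = [(value_for_unit(row, unit), row) for row in rows]
    found = [(value, row) for value, row in with_values if value is not None]
    return [row for value, row in sorted(found, key=lambda pair: pair[0])]
```

Because "KM" is never matched when searching for "M" (the character before M would have to be a digit or whitespace), the two units stay separate. Storing each measurement in its own column would still make this unnecessary.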

MySQL Query Between Two Ranges

I need help with a query. I am taking input from a user where they enter a range between 1-100. So it could be like 30-40 or 66-99. Then I need a query to pull data from a table that has a high_range and a low_range to find a match to any number in their range.
So if a user did 30-40 and the table had entries for 1-80, 21-33, 32-40, 40-41, 66-99, and 1-29 it would find all but the last two in the table.
What is the easiest way to do this?
Thanks
If I understood correctly (i.e. you want any range that overlaps the one entered by the user), I'd say:
SELECT * FROM table WHERE low <= $high AND high >= $low
What I understood is that the range is stored in this format low-high. If that is the case, then this is a poor design. I suggest splitting the values into two columns: low, and high.
If you already have the values split, you can use some statement like:
SELECT * FROM myTable WHERE low <= $needleHigherBound AND high >= $needleLowerBound
If you have the values stored in one column, and insist they stay so, you might find the SUBSTRING_INDEX function of MySQL useful. But in this case, you'll have to write a complicated query to parse all the values of all the rows, and then compare them to your search values. It seems like a lot of effort to cover up a design flaw.
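The overlap condition used in both answers (low <= $high AND high >= $low) can be checked against the example from the question. A small Python sketch, with ranges_overlap as a hypothetical helper name and both ends of each range treated as inclusive:

```python
def ranges_overlap(low_a, high_a, low_b, high_b):
    """True if the inclusive ranges [low_a, high_a] and [low_b, high_b] share a value."""
    return low_a <= high_b and high_a >= low_b

# Stored ranges from the question, tested against the user's range 30-40.
stored = [(1, 80), (21, 33), (32, 40), (40, 41), (66, 99), (1, 29)]
matches = [r for r in stored if ranges_overlap(r[0], r[1], 30, 40)]
```

Here `matches` holds the first four ranges, and 66-99 and 1-29 are excluded, which agrees with the result the question expects.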

distance calculations in mysql queries

I have to query a database of thousands of entries and order this by the distance from a specified point.
The issue is that each entry has a latitude and longitude and I would need to retrieve each entry to calculate its distance. With a large database, I don't want to retrieve each row, this may take some time.
Is there any way to build this into the mysql query so that I only need to retrieve the nearest 15 entries.
E.g.
`SELECT events.id, caclDistance($latlng, events.location) AS distance FROM events ORDER BY distance LIMIT 0,15`
function caclDistance($old, $new){
//Calculates the distance between $old and $new
}
Option 1:
Do the calculation on the database by switching to a database with geospatial support.
Option 2:
Do the calculation on the database using a stored procedure like this:
DELIMITER //
CREATE FUNCTION calcDistance (latA double, lonA double, latB double, LonB double)
RETURNS double DETERMINISTIC
BEGIN
    -- Haversine formula; all four inputs are in degrees
    SET @RlatA = radians(latA);
    SET @RlonA = radians(lonA);
    SET @RlatB = radians(latB);
    SET @RlonB = radians(LonB);
    SET @deltaLat = @RlatA - @RlatB;
    SET @deltaLon = @RlonA - @RlonB;
    SET @d = SIN(@deltaLat/2) * SIN(@deltaLat/2) +
        COS(@RlatA) * COS(@RlatB) * SIN(@deltaLon/2) * SIN(@deltaLon/2);
    RETURN 2 * ASIN(SQRT(@d)) * 6371.01;
END//
DELIMITER ;
If you have an index on latitude and longitude in your database, you can reduce the number of distance calculations by working out an initial bounding box in PHP ($minLat, $maxLat, $minLong and $maxLong) and limiting the rows to a subset of your entries based on that (WHERE latitude BETWEEN $minLat AND $maxLat AND longitude BETWEEN $minLong AND $maxLong). Then MySQL only needs to execute the distance calculation for that subset of rows.
If you're simply using a stored procedure to calculate the distance, then SQL still has to look through every record in your database and calculate the distance for every record before it can decide whether to return that row or discard it.
Because the calculation is relatively slow to execute, it would be better if you could reduce the set of rows that need to be calculated, eliminating rows that will clearly fall outside of the required distance, so that we're only executing the expensive calculation for a smaller number of rows.
If you consider that what you're doing is basically drawing a circle on a map, centred on your initial point, with a radius equal to your distance, then the formula simply identifies which rows fall within that circle... but it still has to check every single row.
Using a bounding box is like drawing a square on the map first with the left, right, top and bottom edges at the appropriate distance from our centre point. Our circle will then be drawn within that box, with the Northmost, Eastmost, Southmost and Westmost points on the circle touching the borders of the box. Some rows will fall outside that box, so SQL doesn't even bother trying to calculate the distance for those rows. It only calculates the distance for those rows that fall within the bounding box to see if they fall within the circle as well.
Within your PHP (guess you're running PHP from the $ variable name), we can use a very simple calculation that works out the minimum and maximum latitude and longitude based on our distance, then set those values in the WHERE clause of your SQL statement. This is effectively our box, and anything that falls outside of that is automatically discarded without any need to actually calculate its distance.
There's a good explanation of this (with PHP code) on the Movable Type website that should be essential reading for anybody planning to do any GeoPositioning work in PHP.
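As a rough illustration of the bounding-box step, here is a Python sketch (the same arithmetic is straightforward in PHP). The function name bounding_box is hypothetical, the radius constant is the 6371.01 km value used in calcDistance, and the formula is the simple approximation: it ignores the poles and boxes that cross the antimeridian.

```python
import math

EARTH_RADIUS_KM = 6371.01  # same multiplier as in calcDistance

def bounding_box(lat, lon, radius_km):
    """Return (min_lat, max_lat, min_lon, max_lon) in degrees around a centre point."""
    # Angular size of the radius along a meridian.
    lat_delta = math.degrees(radius_km / EARTH_RADIUS_KM)
    # Longitude degrees shrink by cos(latitude), so the box must widen;
    # this divides by zero at exactly +/-90 degrees, which the sketch does not handle.
    lon_delta = math.degrees(
        radius_km / (EARTH_RADIUS_KM * math.cos(math.radians(lat)))
    )
    return lat - lat_delta, lat + lat_delta, lon - lon_delta, lon + lon_delta
```

The four returned values correspond to $minLat, $maxLat, $minLong and $maxLong in the WHERE clause; rows inside the box still need the exact distance check, because the corners of the box lie outside the circle.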
EDIT
The value 6371.01 in the calcDistance stored procedure is the multiplier that gives a result in kilometers. Use an appropriate alternative multiplier if you want the result in miles, nautical miles, meters, or another unit.
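To show how the multiplier changes the unit, here is the same haversine calculation sketched in Python. The haversine function name and the unit keys are assumptions for illustration; the alternative radii are derived from 6371.01 km using the standard definitions of 1.609344 km per mile and 1.852 km per nautical mile.

```python
import math

EARTH_RADIUS_KM = 6371.01
EARTH_RADIUS = {
    "km": EARTH_RADIUS_KM,
    "m": EARTH_RADIUS_KM * 1000.0,
    "mi": EARTH_RADIUS_KM / 1.609344,
    "nmi": EARTH_RADIUS_KM / 1.852,
}

def haversine(lat_a, lon_a, lat_b, lon_b, unit="km"):
    """Great-circle distance between two points given in degrees."""
    rlat_a, rlat_b = math.radians(lat_a), math.radians(lat_b)
    delta_lat = rlat_a - rlat_b
    delta_lon = math.radians(lon_a) - math.radians(lon_b)
    d = (math.sin(delta_lat / 2) ** 2
         + math.cos(rlat_a) * math.cos(rlat_b) * math.sin(delta_lon / 2) ** 2)
    # Only the final multiplier depends on the unit.
    return 2 * math.asin(math.sqrt(d)) * EARTH_RADIUS[unit]
```

One degree of longitude at the equator comes out at roughly 111.2 km, or about 69.1 miles with unit="mi".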
SELECT events.id FROM events
ORDER BY pow((lat - pointlat),2) + pow((lon - pointlon),2) ASC
LIMIT 0,15
You don't have to calculate the absolute distance in meters using the radius of the earth and so forth.
To get the closest points you only need the points ordered with relative distance.
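One caveat with ordering by squared differences of raw degrees: a degree of longitude is shorter than a degree of latitude everywhere except at the equator, so the ordering can be wrong away from it. A hedged Python sketch that scales the longitude difference by the cosine of the reference latitude is shown below; the function name nearest and the dict keys "lat" and "lon" are assumptions, and the result is still an approximation that suits short distances only.

```python
import math

def nearest(points, lat, lon, limit=15):
    """Order points by an approximate relative distance from (lat, lon)."""
    # Shrink longitude differences to match the length of a latitude degree.
    scale = math.cos(math.radians(lat))

    def relative_distance(point):
        d_lat = point["lat"] - lat
        d_lon = (point["lon"] - lon) * scale
        return d_lat ** 2 + d_lon ** 2  # no square root needed for ordering

    return sorted(points, key=relative_distance)[:limit]
```

At latitude 60, a point 1.5 degrees east is closer than a point 1 degree north, which the unscaled expression would rank the other way round. The same scaling can be applied in SQL by multiplying the longitude term by a precomputed cosine.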
Is this what you're looking for? http://zcentric.com/2010/03/11/calculate-distance-in-mysql-with-latitude-and-longitude/
I think stored procedures are what you're looking for.
If your question is a "find my nearest" or "store finder" type question then you can google for those terms. Generally though, that type of data is accompanied by a postal code of some description, and it is possible to narrow down the list (as Mark Maker points out) by association with postal code.
Every case is different, and this may not apply to you, just throwing it out there.
