Distance formula for MariaDB nearest 200 places without radius - php

I have MariaDB (server version 10.0.23-MariaDB) with latitude and longitude columns (FLOAT(10,6)) plus a geo_location column (GEOMETRY) that was calculated from the latitude and longitude columns.
I would like to find the nearest 200 people to a given person. The person at the center has a latitude and longitude that are passed to the query. Is there a way to do that without a radius? If the population density is high, the radius would be small; if the population density is low, the radius would be large.
There are about 4 million rows, and it needs to be as fast as possible. The rows can be filtered first by the county in which they reside. Some counties are very large with low population density, and others are small with high population density. I need the fastest way to find the nearest 200 people.

SELECT *, ST_DISTANCE(geo_location, POINT(lon, lat)) AS distance
FROM geotable
ORDER BY distance ASC
LIMIT 200;
The bad news is that this will be very slow, because ST_DISTANCE() cannot use a spatial index. You should restrict the query with a maximum radius so that fewer rows are examined:
-- lon, lat: coordinates of the person at the center
SET @dist = 100; -- maximum radius in miles (about 69 miles per degree of latitude)
SET @rlon1 = lon - @dist / ABS(COS(RADIANS(lat)) * 69);
SET @rlon2 = lon + @dist / ABS(COS(RADIANS(lat)) * 69);
SET @rlat1 = lat - (@dist / 69);
SET @rlat2 = lat + (@dist / 69);
SELECT *, ST_DISTANCE(geo_location, POINT(lon, lat)) AS distance
FROM geotable
WHERE ST_WITHIN(geo_location, ENVELOPE(LINESTRING(POINT(@rlon1, @rlat1), POINT(@rlon2, @rlat2))))
ORDER BY distance ASC
LIMIT 200;
Or, if you have the POLYGON coordinates of each county, you could use those instead of a maximum radius.
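The maximum-radius query above can also answer the original "no fixed radius" question: start with a small box and double it until at least 200 rows lie inside the circle of that radius, at which point no row outside the box can be closer. Below is a minimal Python sketch over an in-memory list, assuming (lat, lon) tuples in degrees and the same 69 miles per degree approximation as the SQL; in the database, each pass would be the indexed ST_WITHIN query with a larger radius. The names `nearest_k`, `start_dist` and `max_dist` are hypothetical, and the sketch ignores the antimeridian and the poles.

```python
import math

MILES_PER_DEG = 69.0  # same flat-earth approximation as the SQL above


def bounding_box(lat, lon, dist):
    """Return (lat1, lat2, lon1, lon2) enclosing a circle of `dist` miles."""
    dlat = dist / MILES_PER_DEG
    dlon = dist / abs(math.cos(math.radians(lat)) * MILES_PER_DEG)
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def approx_miles(lat0, lon0, lat, lon):
    """Planar distance, scaling longitude by cos(lat0) exactly as the box does."""
    x = (lon - lon0) * math.cos(math.radians(lat0)) * MILES_PER_DEG
    y = (lat - lat0) * MILES_PER_DEG
    return math.hypot(x, y)


def nearest_k(points, lat, lon, k, start_dist=1.0, max_dist=12500.0):
    """Hypothetical helper: k nearest (lat, lon) points, widening the box as needed."""
    dist = start_dist
    while True:
        lat1, lat2, lon1, lon2 = bounding_box(lat, lon, dist)
        # In SQL this step is the indexed ST_WITHIN(...) prefilter.
        candidates = [
            (approx_miles(lat, lon, p[0], p[1]), p)
            for p in points
            if lat1 <= p[0] <= lat2 and lon1 <= p[1] <= lon2
        ]
        inside_circle = sum(1 for d, _ in candidates if d <= dist)
        # Once k points lie within `dist`, nothing outside the box can beat them.
        if inside_circle >= k or dist >= max_dist:
            candidates.sort(key=lambda c: c[0])
            return [p for _, p in candidates[:k]]
        dist *= 2
```

For example, with five points near (0, 0), `nearest_k(points, 0.0, 0.0, 3)` widens the box from 1 to 4 miles before three points fall inside the circle. Dense counties stop after a few small boxes; sparse ones keep doubling.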

Six decimal places is good enough (16 cm / 0.5 ft), but FLOAT (1.7 m / 5.6 ft) loses some of that precision. It is essentially never good to tack (M,N) onto FLOAT or DOUBLE; you incur two roundings, one of which is a waste.
There is no straightforward way to "find nearest" on the globe, because ordinary indexes work on only one dimension. However, by using partitioning for one dimension and a clustered PRIMARY KEY for the other, you can do a pretty good job.
The real problem with most solutions is the large number of disk blocks that need to be hit without finding valid items. In fact, usually well over 90% of the rows touched are not needed.
All of this is 'solved' in my lat/lng blog. It will touch maybe 800 rows to get the 200 you want, and they will be well clustered, so only a few blocks need to be touched. It does not need any pre-filtering on county, but it does need some radical restructuring of the table. And if you want to distinguish two people embracing each other, I suggest a scaled INT (16 mm / 5/8 in): degrees * 10000000. Also, FLOAT won't work with PARTITIONing; INT will. The code in that blog uses a scaled MEDIUMINT (2.7 m / 8.8 ft), but that could be changed.
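The scaled-INT idea (degrees * 10000000) is simple arithmetic; here is a minimal Python sketch of the conversion in both directions. It assumes a signed 32-bit INT column and a rough 111,195 m per degree (mean Earth radius); `to_scaled_int` and `from_scaled_int` are hypothetical names. The final line shows that one unit is 1e-7 degree, whose cell diagonal is about 1.6 cm, in line with the 16 mm figure above.

```python
import math

SCALE = 10_000_000          # degrees * 10^7, as suggested above
METERS_PER_DEG = 111_195.0  # approx. length of one degree of latitude


def to_scaled_int(deg):
    """Store a coordinate as a signed 32-bit INT (degrees * 10^7)."""
    value = round(deg * SCALE)
    if not -2**31 <= value < 2**31:
        raise ValueError("coordinate does not fit in a signed INT")
    return value


def from_scaled_int(value):
    """Convert the stored INT back to degrees."""
    return value / SCALE


# Worst-case longitude fits: 180 * 10^7 = 1.8e9 < 2^31 - 1.
print(to_scaled_int(180.0), 2**31 - 1)

# One unit is 1e-7 degree; the diagonal of a one-unit cell in millimetres:
step_m = METERS_PER_DEG / SCALE
print(round(step_m * math.sqrt(2) * 1000, 1), "mm")
```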

Related

Match location between coordinates

I have a static table with location name, latitude, longitude, tolerance. for example:
NY, 40.7128, 74.0060, 100 x 50
There are 600 records at this time, the table will grow slowly.
Also there is a dynamic table with MAC address, and coordinates that change every few minutes:
AC233F271FE4, 40.7228, 74.0110
There are 4000 records.
I want to count how many MAC addresses are within tolerance for each location. Accuracy for earth being a sphere/ellipsoid is not important.
At first I was going to calculate the distance between two points in the SQL query when I want to display the count, but now I am wondering whether it is better to calculate this in PHP when I update the coordinates for a MAC address. I could add a location column, calculate the closest point, and update each MAC's lat/long/location every few minutes. Then the SQL query for displaying the count would be a simple SELECT COUNT(*) ... GROUP BY.
The main question is - at what stage is it better to determine location within tolerance?
Second question: do I use SQL's geography functions (I have read that they are slow) or the Haversine formula, and how do I build the tolerance into the distance between two points?
P.S. I've been googling; my current plan is to make a function that finds the nearest location name for given coordinates, and to make a computed column in the dynamic table that calls the function. In my mind it will work like this:
receive MAC address, lat, long => update lat, long where MAC the same as received => computed column runs the function to get the name for updated coordinates => in front end I can hopefully display MAC + location name.
This looks overcomplicated and confusing to me, would be educational to find a better way.
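The app-side approach described in the question (resolve the location when a MAC's coordinates are updated, then just GROUP BY) could be sketched as below. It assumes "100 x 50" means a rectangle 100 m east-west by 50 m north-south centred on the location, which the question does not say, and it uses a flat-earth metres-per-degree approximation, which the question says is acceptable. `within_tolerance`, `locate` and `count_per_location` are hypothetical names; overlapping rectangles resolve to the first match.

```python
import math
from collections import Counter

M_PER_DEG_LAT = 111_320.0  # rough metres per degree of latitude


def within_tolerance(loc, lat, lon):
    """loc = (name, lat, lon, width_m, height_m); box centred on the location (assumed)."""
    _name, clat, clon, width_m, height_m = loc
    dx = (lon - clon) * M_PER_DEG_LAT * math.cos(math.radians(clat))  # metres east
    dy = (lat - clat) * M_PER_DEG_LAT                                  # metres north
    return abs(dx) <= width_m / 2 and abs(dy) <= height_m / 2


def locate(locations, lat, lon):
    """Name of the first location whose box contains the point, or None."""
    for loc in locations:
        if within_tolerance(loc, lat, lon):
            return loc[0]
    return None


def count_per_location(locations, macs):
    """macs = {mac: (lat, lon)}; same result as SELECT location, COUNT(*) ... GROUP BY location."""
    counts = Counter()
    for lat, lon in macs.values():
        name = locate(locations, lat, lon)
        if name is not None:
            counts[name] += 1
    return counts
```

In the question's flow, `locate` would run when a MAC's coordinates are updated, and its result would be stored in the location column.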
This feels like a great use of the geospatial capabilities in SQL Server. And you can (mostly) do it with what you have, but with some augmentations for performance. Here's what I'd suggest.
Add a column to your static locations table that represents a polygon for your tolerance. I'm not sure what "100 x 50" means (a bounding box that's 100 m x 50 m? If so, what's the orientation?). Either way, presumably you can derive a polygon from the lat/long and the tolerance. Persist that in a column of type geography, and put a spatial index on this column.
Similarly, add a column to your dynamic table. In some ways this is easier than the above, as you can just make it a computed column (persisted if you can) defined as geography::Point(Latitude, Longitude, 4326), where 4326 is the SRID for WGS 84. Put a spatial index on this column as well.
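One way to derive that polygon, under the guess that "100 x 50" means metres east-west by metres north-south, is to build a WKT rectangle in application code. A minimal Python sketch; `tolerance_polygon_wkt` is a hypothetical helper, and the resulting string could then be loaded with `geography::STGeomFromText(wkt, 4326)`. WKT lists longitude before latitude, and the ring runs counter-clockwise (south-west, south-east, north-east, north-west) so that SQL Server's geography type treats the small rectangle, not the rest of the globe, as the interior.

```python
import math

M_PER_DEG = 111_320.0  # rough metres per degree of latitude


def tolerance_polygon_wkt(lat, lon, width_m, height_m):
    """WKT rectangle centred on (lat, lon); width east-west, height north-south (assumed)."""
    half_lat = (height_m / 2) / M_PER_DEG
    half_lon = (width_m / 2) / (M_PER_DEG * math.cos(math.radians(lat)))
    s, n = lat - half_lat, lat + half_lat
    w, e = lon - half_lon, lon + half_lon
    # "longitude latitude" pairs, counter-clockwise, closed by repeating the first point.
    ring = [(w, s), (e, s), (e, n), (w, n), (w, s)]
    return "POLYGON((" + ", ".join(f"{x:.6f} {y:.6f}" for x, y in ring) + "))"
```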
Finally, you have what you need. You can now run a geospatial query like so:
select *
from dbo.dynamicTable as d
join dbo.staticTable as s
on s.BoundingBox.STContains(d.Point) = 1;
The spatial indexes on the two tables are what make this reasonable. Otherwise, it has to do a Cartesian join between the two, which is unlikely to perform well, regardless of the specifics.

How to group objects based on longitude/latitude proximity using laravel/php

I have a group of users. The user count could be 50 or could be 2000. Each should have a long/lat that I have retrieved from Google Geo api.
I need to query them all, and group them by proximity and a certain count. Say the count is 12 and I have 120 users in the group. I want to group people by how close they are (long/lat) to other people. So that I wind up with 10 groups of people who are close in proximity.
I currently have the google geo coding api setup and would prefer to use that.
TIA.
-- Update
I have been googling this for a while, and it appears that I am looking for a spatial query that returns groups by proximity.
Keep in mind that this problem grows quadratically with every user you add, because the number of distance calculations is proportional to the square of the number of users (it's actually N*(N-1) distances), so a 2000-user base would mean almost 4 million distance calculations on every pass. Keep that in mind when sizing the resources you need.
Are you looking to group them based on straight-line (actually great circle) distance or based on walking/driving distance?
If the former, the great circle distance can be approximated with simple math if you're able to tolerate a small margin of error and wish to assume the earth is a sphere. From GCMAP.com:
Earth's hypothetical shape is called the geoid and is approximated by
an ellipsoid or an oblate spheroid. A simpler model is to use a
sphere, which is pretty close and makes the math MUCH easier. Assuming
a sphere of radius 6371.2 km, convert longitude and latitude to
radians (multiply by pi/180) and then use the following formula:
theta = lon2 - lon1
dist = acos(sin(lat1) × sin(lat2) + cos(lat1) × cos(lat2) × cos(theta))
if (dist < 0) dist = dist + pi
dist = dist × 6371.2
The resulting distance is in kilometers.
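The quoted formula (spherical law of cosines on a 6371.2 km sphere) translates directly into Python; this is a minimal sketch, and `great_circle_km` is a hypothetical name. Python's `math.acos` returns a value in [0, pi], so the "if (dist < 0)" step never triggers here; the clamp guards against rounding pushing the cosine slightly outside [-1, 1], which would otherwise raise an error for identical points.

```python
import math

R_KM = 6371.2  # sphere radius used in the quoted formula


def great_circle_km(lat1, lon1, lat2, lon2):
    """Spherical law of cosines; inputs in degrees, result in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    theta = lon2 - lon1
    c = math.sin(lat1) * math.sin(lat2) + math.cos(lat1) * math.cos(lat2) * math.cos(theta)
    c = max(-1.0, min(1.0, c))  # keep acos in its domain despite rounding
    return math.acos(c) * R_KM
```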
Now, if you need precise calculations and are willing to spend the CPU cycles needed for much more complex math, you can use Vincenty's formulae, which use the WGS-84 reference ellipsoid model of the earth that is used for navigation, mapping and so on.
As to the algorithm itself, you need to build a to-from matrix with the result of each calculation. Each row and column would represent each node. Two simplifications you may consider:
Distance does not depend on direction of travel, so $dist[n][m] == $dist[m][n] (no need to calculate the whole matrix, just half of it)
Distance from a node to itself is always 0, so there is no need to calculate it; but since you intend to group by proximity, to avoid a user being grouped with itself you may want to force $dist[m][m] to an arbitrary, abnormally large constant (for instance $dist[m][m] = 22000 (miles), which works as long as all your users are on the planet).
After making all the calculations, use an array sorting method to find the X closest nodes to each node and there you have it
(you may or may not want to prevent a user being grouped on more than one group, but that's just business logic)
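The matrix approach above, sketched in Python: only the upper triangle is computed and then mirrored, the diagonal is forced to the large 22000-mile constant, and each row is sorted to pick the X closest users. Using the haversine distance on a sphere of radius 3958.8 miles is my choice, not the answer's; any of the formulas above would do. `closest_nodes` is a hypothetical name, and the sketch does not stop a user from appearing in several groups.

```python
import math


def haversine_miles(a, b):
    """Great-circle distance between (lat, lon) pairs in miles (sphere, R = 3958.8)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(min(1.0, h)))


def closest_nodes(users, x, self_dist=22000.0):
    """For each user index, return the indices of its x closest users."""
    n = len(users)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = self_dist          # never pick yourself
        for j in range(i + 1, n):       # upper triangle only, then mirror it
            dist[i][j] = dist[j][i] = haversine_miles(users[i], users[j])
    return [sorted(range(n), key=lambda j: dist[i][j])[:x] for i in range(n)]
```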
Actual code would be a little too much to provide at this time without seeing some of your progress first, but this is basically what you need to do algorithmically.
... it appears that I am looking for a spatial query that returns groups by proximity. ...
You could use hdbscan. Your groups are actually clusters in hdbscan wording. You would need to work with min_cluster_size and min_samples to get your groups right.
https://hdbscan.readthedocs.io/en/latest/parameter_selection.html
https://hdbscan.readthedocs.io/en/latest/
It appears that hdbscan runs under Python.
Here are two links on how to call Python from PHP:
Calling Python in PHP,
Running a Python script from PHP
Here is some more information on which clustering algorithm to choose:
http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/Comparing%20Clustering%20Algorithms.ipynb
http://scikit-learn.org/stable/modules/clustering.html#clustering
Use GeoHash algorithm[1]. There is a PHP implementation[2]. You may pre-calculate geohashes with different precision, store them in SQL database alongside lat-lon values and query using native GROUP BY.
https://en.wikipedia.org/wiki/Geohash
https://github.com/lvht/geohash
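A minimal geohash encoder plus prefix grouping, written in Python for illustration (the PHP library linked above has its own encoder, whose API is not reproduced here). Users whose hashes share the first `precision` characters fall into the same cell, which is what a GROUP BY on the first n characters of a stored hash column would give you. `geohash_encode` and `group_by_prefix` are hypothetical names. Cells have fixed boundaries, so two users a few metres apart can land in neighbouring cells, and groups will not have a fixed size such as 12.

```python
from collections import defaultdict

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=9):
    """Standard geohash: alternate longitude/latitude bisection bits, 5 bits per char."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, ch, even = [], 0, 0, True  # even bit -> longitude
    while len(chars) < precision:
        rng, val = (lon_rng, lon) if even else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            ch = (ch << 1) | 1
            rng[0] = mid
        else:
            ch <<= 1
            rng[1] = mid
        even = not even
        bits += 1
        if bits == 5:
            chars.append(BASE32[ch])
            bits, ch = 0, 0
    return "".join(chars)


def group_by_prefix(users, precision):
    """users = [(name, lat, lon)]; names sharing a geohash prefix end up together."""
    groups = defaultdict(list)
    for name, lat, lon in users:
        groups[geohash_encode(lat, lon, precision)].append(name)
    return dict(groups)
```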

Finding cities close to one another using longitude and latitude [duplicate]

This question already has answers here:
MySQL Great Circle Distance (Haversine formula)
(9 answers)
Each user in my db is associated with a city (with its longitude and latitude).
How would I go about finding out which cities are close to one another?
i.e. in England, Cambridge is fairly close to London.
So If I have a user who lives in Cambridge. Users close to them would be users living in close surrounding cities, such as London, Hertford etc.
Any ideas how I could go about this? And also, how would I define what is close? i.e. in the UK close would be much closer than if it were in the US as the US is far more spread out.
Ideas and suggestions. Also, do you know any services that provide this sort of functionality?
Thanks
If you can call an external web service, you can use the GeoNames API for locating nearby cities within some radius that you define:
http://www.geonames.org/export/web-services.html
Getting coordinates from city names is called geocoding (reverse geocoding goes the other way, from coordinates to names). Google Maps has a nice API for that.
There is also the GeoNames project, where you can get huge databases of cities, zip codes, etc. and their coordinates.
However, if you already have the coordinates, it's a simple calculation to get the distance.
The tricky thing is to get a nice performant version of it. You probably have it stored in a mysql database, so you need to do it there and fast.
It is absolutely possible. I once did a project including that code, I will fetch it and post it here.
However, to speed things up I would recommend first doing a rectangular selection around the center coordinates. This is very, very fast using B-tree indexes or, even better, multidimensional range search structures. Inside that rectangle you can then calculate the exact distances on a limited set of data.
Outside that rectangular selection the distances are so large that they do not need to be displayed or calculated so accurately. Or just display the country, continent or something like that.
I am still at the office, but when I get home I can fetch the code for you. In the meantime, it would be good if you could tell me how you store your data.
Edit: in the meantime, here is a function that looks right to me (I did it without a function, in one query...):
DELIMITER //
CREATE FUNCTION `get_distance_between_geo_locations`(`lat1` DOUBLE, `long1` DOUBLE, `lat2` DOUBLE, `long2` DOUBLE)
RETURNS DOUBLE
LANGUAGE SQL
DETERMINISTIC
CONTAINS SQL
SQL SECURITY DEFINER
COMMENT ''
BEGIN
DECLARE distance DOUBLE DEFAULT -1;
DECLARE earthRadius DOUBLE DEFAULT 6371.009; -- km; use 3958.761 for miles
DECLARE axis DOUBLE;
IF ((lat1 IS NOT NULL) AND (long1 IS NOT NULL) AND (lat2 IS NOT NULL) AND (long2 IS NOT NULL)) THEN -- bit of protection against bad data
-- haversine: axis is the square of half the chord length between the points
SET axis = (SIN(RADIANS(lat2-lat1)/2) * SIN(RADIANS(lat2-lat1)/2) + COS(RADIANS(lat1)) * COS(RADIANS(lat2)) * SIN(RADIANS(long2-long1)/2) * SIN(RADIANS(long2-long1)/2));
SET distance = earthRadius * (2 * ATAN2(SQRT(axis), SQRT(1-axis)));
END IF;
RETURN distance;
END //
DELIMITER ;
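For checking results outside MySQL, here is a Python transcription of the same atan2-based haversine calculation, using the function's 6371.009 km radius and its -1 return for missing input. It is a testing sketch, not a replacement for the stored function; the clamp on `axis` is my addition, guarding against rounding for nearly antipodal points.

```python
import math

EARTH_RADIUS_KM = 6371.009  # use 3958.761 for miles, as in the function's comments


def get_distance_between_geo_locations(lat1, long1, lat2, long2):
    """Haversine with atan2, returning -1 when any input is missing (like the SQL)."""
    if None in (lat1, long1, lat2, long2):
        return -1.0
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(long2 - long1)
    axis = (math.sin(dlat / 2) ** 2
            + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
            * math.sin(dlon / 2) ** 2)
    axis = min(1.0, axis)  # rounding guard
    return EARTH_RADIUS_KM * 2 * math.atan2(math.sqrt(axis), math.sqrt(1 - axis))
```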
I quoted this from here: http://sebastian-bauer.ws/en/2010/12/12/geo-koordinaten-mysql-funktion-zur-berechnung-des-abstands.html
and here is another link: http://www.andrewseward.co.uk/2010/04/sql-function-to-calculate-distance.html
The simplest way to do this would be to calculate a bounding box from the latitude and longitude of the city and a distance (by converting the distance to degrees of latitude and longitude).
Once you have that box (min latitude, max latitude, min longitude, max longitude), query for other cities whose latitude and longitude are inside the bounding box. This will get you an approximate list, and should be quite fast as it will be able to use any indexes you might have on the latitude and longitude columns.
From there you can narrow the list down if desired using a real "distance between points on a sphere" function.
You need a spatial index or GIS functionality. What database are you using? MySQL and PostgreSQL both have GIS support which would allow you to find the N nearest cities using an SQL query.
Another option you might want to consider would be to put all of the cities into a spatial search tree like a kd-tree. Kd-trees efficiently support nearest-neighbor searches, as well as fast searches for all points in a given bounding box. You could then find nearby cities by searching for a few of the city's nearest neighbors, then using the distance to those neighbors to get an estimate size for a bounding box to search in.
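A small kd-tree sketch in Python. Instead of indexing raw latitude/longitude, it converts each city to a point on the unit sphere, because straight-line (chord) distance there increases monotonically with great-circle distance, so neighbours come out in the right order without special handling of longitude. This 3D conversion is my addition, not part of the answer; `build` and `k_nearest` are hypothetical names, and a real project would more likely use an existing library. The query point itself is returned first at distance 0.

```python
import heapq
import math


def to_xyz(lat, lon):
    """Unit-sphere coordinates; chord length grows with great-circle distance."""
    la, lo = math.radians(lat), math.radians(lon)
    return (math.cos(la) * math.cos(lo), math.cos(la) * math.sin(lo), math.sin(la))


def build(points, depth=0):
    """points: list of (xyz, name). Returns nested (point, axis, left, right) tuples."""
    if not points:
        return None
    axis = depth % 3
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build(points[:mid], depth + 1), build(points[mid + 1:], depth + 1))


def _search(node, target, k, heap):
    if node is None:
        return
    (xyz, name), axis, left, right = node
    d = math.dist(xyz, target)
    if len(heap) < k:
        heapq.heappush(heap, (-d, name))       # max-heap via negated distance
    elif d < -heap[0][0]:
        heapq.heapreplace(heap, (-d, name))
    diff = target[axis] - xyz[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    _search(near, target, k, heap)
    # Only cross the splitting plane if it is closer than the current k-th best.
    if len(heap) < k or abs(diff) < -heap[0][0]:
        _search(far, target, k, heap)


def k_nearest(tree, target, k):
    """Return [(chord_distance, name)] for the k nearest points, closest first."""
    heap = []
    _search(tree, target, k, heap)
    return sorted((-neg_d, name) for neg_d, name in heap)
```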

distance calculations in mysql queries

I have to query a database of thousands of entries and order this by the distance from a specified point.
The issue is that each entry has a latitude and longitude and I would need to retrieve each entry to calculate its distance. With a large database, I don't want to retrieve each row, this may take some time.
Is there any way to build this into the mysql query so that I only need to retrieve the nearest 15 entries.
E.g.
`SELECT events.id, calcDistance($latlng, events.location) AS distance FROM events ORDER BY distance LIMIT 0,15`
function calcDistance($old, $new){
//Calculates the distance between $old and $new
}
Option 1:
Do the calculation on the database by switching to a database that supports spatial (GIS) queries.
Option 2:
Do the calculation on the database using a stored function like this:
DELIMITER //
CREATE FUNCTION calcDistance (latA DOUBLE, lonA DOUBLE, latB DOUBLE, lonB DOUBLE)
RETURNS DOUBLE DETERMINISTIC
BEGIN
  -- local variables (not @session variables), converted to radians
  DECLARE rLatA DOUBLE DEFAULT RADIANS(latA);
  DECLARE rLatB DOUBLE DEFAULT RADIANS(latB);
  DECLARE deltaLat DOUBLE DEFAULT RADIANS(latA - latB);
  DECLARE deltaLon DOUBLE DEFAULT RADIANS(lonA - lonB);
  DECLARE d DOUBLE;
  SET d = SIN(deltaLat/2) * SIN(deltaLat/2) +
          COS(rLatA) * COS(rLatB) * SIN(deltaLon/2) * SIN(deltaLon/2);
  -- haversine: central angle times Earth's mean radius in km
  RETURN 2 * ASIN(SQRT(d)) * 6371.01;
END//
DELIMITER ;
If you have an index on latitude and longitude in your database, you can reduce the number of distance calculations by working out an initial bounding box in PHP ($minLat, $maxLat, $minLong and $maxLong) and limiting the rows to a subset of your entries based on that (WHERE latitude BETWEEN $minLat AND $maxLat AND longitude BETWEEN $minLong AND $maxLong). Then MySQL only needs to execute the distance calculation for that subset of rows.
If you simply use a stored function to calculate the distance, SQL still has to look through every record in your database and calculate the distance for each one before it can decide whether to return that row or discard it.
Because the calculation is relatively slow to execute, it would be better if you could reduce the set of rows that need to be calculated, eliminating rows that will clearly fall outside of the required distance, so that we're only executing the expensive calculation for a smaller number of rows.
If you consider that what you're doing is basically drawing a circle on a map, centred on your initial point, with a radius of distance, then the formula simply identifies which rows fall within that circle... but it still has to check every single row.
Using a bounding box is like drawing a square on the map first with the left, right, top and bottom edges at the appropriate distance from our centre point. Our circle will then be drawn within that box, with the Northmost, Eastmost, Southmost and Westmost points on the circle touching the borders of the box. Some rows will fall outside that box, so SQL doesn't even bother trying to calculate the distance for those rows. It only calculates the distance for those rows that fall within the bounding box to see if they fall within the circle as well.
Within your PHP (I'm guessing you're using PHP, from the $ variable name), we can use a very simple calculation that works out the minimum and maximum latitude and longitude based on our distance, then put those values in the WHERE clause of your SQL statement. This is effectively our box, and anything that falls outside of it is automatically discarded without any need to actually calculate its distance.
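A minimal Python sketch of that min/max calculation, using the same 6371.01 km radius as calcDistance; the returned values would fill the BETWEEN bounds of the WHERE clause. It uses the arcsin form of the longitude offset, asin(sin(r) / cos(lat)), which is slightly wider than simply dividing the angular radius by cos(latitude) and so does not clip the circle; this choice is mine, not necessarily what the Movable Type page does. `bounding_box` is a hypothetical name, and the sketch does not split boxes that cross the 180th meridian.

```python
import math

EARTH_RADIUS_KM = 6371.01  # same multiplier as calcDistance


def bounding_box(lat, lon, dist_km):
    """(min_lat, max_lat, min_lon, max_lon) containing every point within dist_km."""
    ang = dist_km / EARTH_RADIUS_KM  # angular radius in radians
    min_lat = lat - math.degrees(ang)
    max_lat = lat + math.degrees(ang)
    if min_lat <= -90 or max_lat >= 90:
        # A pole is inside the circle, so every longitude qualifies.
        return max(min_lat, -90.0), min(max_lat, 90.0), -180.0, 180.0
    dlon = math.degrees(math.asin(math.sin(ang) / math.cos(math.radians(lat))))
    return min_lat, max_lat, lon - dlon, lon + dlon
```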
There's a good explanation of this (with PHP code) on the Movable Type website that should be essential reading for anybody planning to do any GeoPositioning work in PHP.
EDIT
The value 6371.01 in the calcDistance stored function is the Earth's mean radius in kilometers, so the result is in kilometers. Use an appropriate alternative radius if you want the result in miles, nautical miles, meters, or whatever.
SELECT events.id FROM events
ORDER BY pow((lat - pointlat),2) + pow((lon - pointlon),2) ASC
LIMIT 0,15
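You don't have to calculate the absolute distance in meters using the radius of the earth and so forth.
To get the closest points you only need the points ordered by relative distance. Note, however, that a degree of longitude shrinks by cos(latitude) away from the equator, so scale the longitude difference, e.g. pow((lon - pointlon) * cos(radians(pointlat)), 2), or the ordering can be wrong at higher latitudes.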
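The raw-degree ordering above treats a degree of longitude as the same length as a degree of latitude, which is only true at the equator. A tiny Python check: at 60°N, a point 0.15° east is about 8 km away while a point 0.1° north is about 11 km away, yet the raw-degree key ranks the northern point first. `plain_key` and `scaled_key` are hypothetical names for the two ORDER BY expressions.

```python
import math


def plain_key(lat0, lon0, p):
    """The ORDER BY above: squared difference in raw degrees."""
    return (p[0] - lat0) ** 2 + (p[1] - lon0) ** 2


def scaled_key(lat0, lon0, p):
    """Same, but shrink longitude differences by cos(latitude) first."""
    k = math.cos(math.radians(lat0))
    return (p[0] - lat0) ** 2 + ((p[1] - lon0) * k) ** 2


origin = (60.0, 0.0)
east, north = (60.0, 0.15), (60.1, 0.0)
print(min([east, north], key=lambda p: plain_key(*origin, p)))   # picks north
print(min([east, north], key=lambda p: scaled_key(*origin, p)))  # picks east
```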
Is this what you're looking for? http://zcentric.com/2010/03/11/calculate-distance-in-mysql-with-latitude-and-longitude/
I think stored procedures are what you're looking for.
If your question is a "find my nearest" or "store finder" type question then you can google for those terms. Generally though, that type of data is accompanied by a postal code of some description, and it is possible to narrow down the list (as Mark Maker points out) by association with postal code.
Every case is different, and this may not apply to you, just throwing it out there.

How do I calculate and use a Morton (z-index) value to index geodata with PHP/MySQL?

I have a MySQL table of records, each with a lat/lng coordinate. Searches are conducted on this data based on a center point and a radius (any records within the radius is returned). I'm using the spherical law of cosines to calculate the distance within my query. My problem is that indexing the geodata is horribly inefficient (lat/lng values are stored as floats). Using MySQL's spatial extensions is not an option. With datasets around 100k in size the query takes an unreasonable amount of time to execute.
I've done some research and it seems like using a z-index i.e. Morton number could help. I could calculate the Morton number for each record on insertion and then calculate a high/low Morton value for a bounding box based on the Earth's radius/center point/given search radius.
I only know enough about this stuff to build my app so I'm not entirely sure if this would work, and I also don't know how I can compute the Morton number in PHP. Would this be a bitwise operation?
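Computing a Morton number is indeed a bitwise operation: quantize each coordinate to an unsigned integer, then interleave their bits. A minimal Python sketch (PHP has the same <<, >>, & and | operators); the 16-bit resolution and the names `quantize`, `interleave` and `morton` are arbitrary choices for illustration. Because the code increases with each coordinate, every point inside a lat/lng box has a Morton number between those of the box's south-west and north-east corners, but that range also includes many points outside the box, so the exact distance check is still needed.

```python
def quantize(value, lo, hi, bits=16):
    """Map a coordinate in [lo, hi] onto an integer in [0, 2**bits - 1]."""
    cells = (1 << bits) - 1
    return round((value - lo) / (hi - lo) * cells)


def interleave(x, y, bits=16):
    """Morton (Z-order) code: bit i of x goes to bit 2i, bit i of y to bit 2i+1."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def morton(lat, lon, bits=16):
    """Morton number of a lat/lng pair, longitude in the even bits."""
    return interleave(quantize(lon, -180.0, 180.0, bits),
                      quantize(lat, -90.0, 90.0, bits), bits)
```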
If your radius is small compared to the size of the Earth, then you can probably get by with simple 2D Pythagoras rather than expensive 3D spherical geometry. This is probably less true the closer you get to the poles, so I hope you're not mapping penguins or polar bears!
Next, think about the bounding boxes for your problem. You know they must be within +/- $radius of the search point. Convert the search radius to degrees (for longitude, divide by cos(latitude), since a degree of longitude shrinks away from the equator) and find all records where lat/lon is within the box defined by the search center +/- $radiusindegrees.
If you do that search first and come up with a list of possible matches then you have only to filter out the corners of your search box from the resulting data set. If you get back the lat/lon of the matching points you can calculate the distance in PHP and avoid having to calculate it for all points in the table. Did that make sense?
Use the database to find everything that fits within a square bounding box, and then use PHP to filter out the points that are outside the desired radius.
