I would like to implement a search by distance on a website.
A user living in a given city should be able to find all users living within, for example, 100 or 200 km.
I have a table in my database that stores all the cities and their coordinates.
I thought about creating another table that would store the distance between every pair of cities, but my database contains 36,000 cities and that would produce a huge number of records...
How could I make this search more simply knowing that my project will be developed with Symfony and Doctrine?
Thanks in advance.
You can use the accepted answer here to determine the distance between coordinates:
Measuring the distance between two coordinates in PHP
For performance reasons you need a geospatial index to query such a database efficiently. MongoDB, for example, has built-in support for this.
If performance is not an issue, you can simply store the locations in a relational database table and calculate the distances in SQL. See this question for more information about that approach: Geo-Search (Distance) in PHP/MySQL (Performance)
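Since the question mentions Symfony and Doctrine, here is a minimal sketch of that second approach, run as a native SQL query through a Doctrine DBAL connection. The table and column names (city, latitude, longitude) and the $cityLat/$cityLng/$radiusKm variables are assumptions for illustration, not taken from the actual schema:

// Hypothetical sketch: find all cities within $radiusKm of a given point.
// Assumes a recent Doctrine DBAL connection ($conn) and a `city` table
// with `name`, `latitude` and `longitude` columns stored in degrees.
$sql = '
    SELECT c.name,
           6371 * ACOS(
               COS(RADIANS(?)) * COS(RADIANS(c.latitude)) *
               COS(RADIANS(c.longitude) - RADIANS(?)) +
               SIN(RADIANS(?)) * SIN(RADIANS(c.latitude))
           ) AS distance_km
    FROM city c
    HAVING distance_km <= ?
    ORDER BY distance_km ASC';

// 6371 is the mean Earth radius in km.
$rows = $conn->fetchAllAssociative($sql, [$cityLat, $cityLng, $cityLat, $radiusKm]);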
I'm going to build an app where users can see points of interest (POIs) within a predefined radius around their location.
My first idea was to store the latitude and longitude of all POIs in a database and to compare the user's location with the POIs' locations via SQL.
The problem, I think, is performance. If there are thousands of POIs and thousands of user requests with their locations, it wouldn't be very economical. Or is this no problem for today's servers?
My next approach was to divide the map into quadrants and only look at the surrounding quadrants.
tl;dr:
All in all I'm looking for:
a way to do an radius search
at best caching the results for other users
the cache will be updated when a new POI is being registered.
If you have any ideas how to realize something like that, please let me know.
Thank you
Fabian
I think what you are looking for is the Haversine formula, which allows you to find the distance between two points on a sphere (in this case the Earth). An implementation using SQL would be something like this:
ACOS(
    SIN(RADIANS($latitude)) *
    SIN(RADIANS(T.latitude)) +
    COS(RADIANS($latitude)) *
    COS(RADIANS(T.latitude)) *
    COS(RADIANS($longitude - T.longitude))
) * 6378.137 AS distance
Adding this to the SELECT of your query will return a column called distance, calculating (in km) how far the point ($latitude, $longitude), normally the user, is from (T.latitude, T.longitude), normally the element of the table.
If you want to filter out elements further away than a certain distance, you can add a condition like:
HAVING distance<$radius
I assume you are using MySQL; if that is the case, you have to use HAVING instead of WHERE to put a condition on a computed column (distance).
A complete example of a query would be like this:
SELECT T.*, ACOS(
    SIN(RADIANS($latitude)) *
    SIN(RADIANS(T.latitude)) +
    COS(RADIANS($latitude)) *
    COS(RADIANS(T.latitude)) *
    COS(RADIANS($longitude - T.longitude))
) * 6378.137 AS distance
FROM your_table AS T
HAVING distance < $radius
ORDER BY distance
LIMIT $limit
If you want to optimize performance a bit more, add a LIMIT to the query so that you get, for example, only the 10 nearest places.
Take your time to consider spatial data types as well, since they were specifically made for this kind of work.
Note that I do not recommend inserting your PHP variables directly into the query; it is really insecure. I did that only as an example.
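For reference, a hedged version of the same query with bound parameters (plain PDO here; the table and column names are the ones used above):

$sql = 'SELECT T.*, ACOS(
            SIN(RADIANS(?)) * SIN(RADIANS(T.latitude)) +
            COS(RADIANS(?)) * COS(RADIANS(T.latitude)) *
            COS(RADIANS(? - T.longitude))
        ) * 6378.137 AS distance
        FROM your_table AS T
        HAVING distance < ?
        ORDER BY distance
        LIMIT 10';

$stmt = $pdo->prepare($sql);
// Bind the user position and the radius instead of interpolating them.
$stmt->execute([$latitude, $latitude, $longitude, $radius]);
$places = $stmt->fetchAll(PDO::FETCH_ASSOC);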
Hope this helps you.
I have a database like this
http://i.stack.imgur.com/MHEwr.jpg
I have a PHP function, get_distance($person_location), which computes the distance of that address from the (web) user.
I need a query which will use that function and return the data from the database ordered by distance from the user [using the get_distance($person_location) PHP function].
Can anyone help me please?
You can't sort your SQL results on the serverside by the result of a PHP function.*
There are two approaches to your general problem:
1. Move calculation to SQL
Your distance computation probably relies on geo-coordinates (latitude and longitude). Save this data for every address in the database and then do the distance computation in SQL as well.
Find more on how to do this in MySQL here: Fastest Way to Find Distance Between Two Lat/Long Points
Your todo list for this. Do the following things ONCE:
Get all your addresses from the DB
Calculate the geo coordinates for each address with your PHP API
Update your database and put those geo coordinates in extra columns
Do the following things from now on:
Every time you add a row to your table, calculate the geo coordinates beforehand with your API and add them as well
Every time you change an address in the database, calculate the new geo coordinates with your PHP API and update them as well
Every time you need to calculate the distance for the current user to all other addresses, do a SELECT query which computes the distance and does the sorting
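A minimal sketch of that last step, assuming the table is called addresses and already has latitude/longitude columns populated as described above (all names here are placeholders):

$sql = 'SELECT a.*,
               6371 * ACOS(
                   COS(RADIANS(?)) * COS(RADIANS(a.latitude)) *
                   COS(RADIANS(a.longitude) - RADIANS(?)) +
                   SIN(RADIANS(?)) * SIN(RADIANS(a.latitude))
               ) AS distance_km
        FROM addresses a
        ORDER BY distance_km ASC';

// $userLat / $userLng: the current user's coordinates from your geocoding API.
$stmt = $pdo->prepare($sql);
$stmt->execute([$userLat, $userLng, $userLat]);
$sorted = $stmt->fetchAll(PDO::FETCH_ASSOC);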
2. Do everything in PHP
Query your database for all addresses, put them into a PHP array, compute the distance to the current user with your function and then sort your array.
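For completeness, a rough sketch of approach 2, assuming get_distance() is your existing function and the table/column names are placeholders:

// Approach 2: pull everything and sort in PHP (not recommended for large tables).
$addresses = $pdo->query('SELECT * FROM addresses')->fetchAll(PDO::FETCH_ASSOC);

// Compute the distance once per row with the existing PHP helper...
foreach ($addresses as &$row) {
    $row['distance'] = get_distance($row['person_location']); // column name is hypothetical
}
unset($row);

// ...then sort the array by that distance (PHP 7+ spaceship operator).
usort($addresses, function ($a, $b) {
    return $a['distance'] <=> $b['distance'];
});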
I strongly suggest not to do that, however, and to implement everything on the database side instead (Approach 1).
* Well, in theory you could, by calculating the distance for every address offline, updating a temporary table with the results, and then querying your table again using this temporary table to sort your results. However, this is even worse than doing everything in PHP; you shouldn't even consider it!
IMHO it is not possible to use PHP functions in your query, only things like the aggregate functions provided by MySQL.
I guess you need to process the data in PHP.
I want to love DynamoDB, but the major drawback is the query/scan over the whole DB to pull the results for one query. Would I be better off sticking with MySQL, or is there another solution I should be aware of?
Uses:
Newsfeed items (Pulls most recent items from table where id in x,x,x,x,x)
User profile relationships (users follow and friend each other)
User lists (users can have up to 1,000 items in one list)
I am happy to mix and match database solutions. The main use is lists.
There will be a few million lists eventually, ranging from 5 to 1000 items per list. The list table is formatted as follows: list_id(bigint)|order(int(1))|item_text(varchar(500))|item_text2(varchar(12))|timestamp(int(11))
The main queries on this DB would be on the 'list_relations' table:
SELECT item_text FROM lists WHERE list_id = 539830
I suppose my main question is: can we get all items for a particular list_id without a slow query/scan? And by "slow" do people mean a second, or a few minutes?
Thank you
I'm not going to address whether or not it's a good choice or the right choice, but you can do what you're asking. I have a large DynamoDB instance with vehicle VINs as the hash key and something else as the range key, plus a secondary index on VIN and a timestamp field, and I am able to make fast queries over thousands of records for specific vehicles across timestamp searches, no problem.
Constructing your schema in DynamoDB requires different considerations than building in MySQL.
You want to avoid scans as much as possible, which means picking your hash key carefully.
Depending on your exact queries, you may also need multiple tables holding the same data, but with different hash keys to suit your querying needs.
You also did not mention the LSI and GSI features of DynamoDB; these also help your query-ability, but they have their own sets of drawbacks. It is difficult to advise further without knowing more details about your requirements.
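To make that concrete for the list example above, a query by hash key (no scan) might look roughly like this with the AWS SDK for PHP, assuming list_id was chosen as the table's hash key; the region and the rest of the setup are illustrative:

use Aws\DynamoDb\DynamoDbClient;

$client = new DynamoDbClient([
    'region'  => 'us-east-1',   // illustrative
    'version' => 'latest',
]);

// Query (not Scan): DynamoDB only reads the items that share this hash key.
$result = $client->query([
    'TableName'                 => 'lists',
    'KeyConditionExpression'    => 'list_id = :id',
    'ExpressionAttributeValues' => [':id' => ['N' => '539830']],
    'ProjectionExpression'      => 'item_text',
]);

foreach ($result['Items'] as $item) {
    echo $item['item_text']['S'], PHP_EOL;
}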
I just recently discovered sphinx search which I want to use for my PHP application. I have a table of geolocations where every record stores a country code. For every user who uses the search function to look up geopositions, I know which country he is from.
How would I reweight the results so that the matches are ordered by ascending distance to the user's country? I have already calculated a distance matrix from each country to every other country, which I can access via SQL. The country information in the geolocation database is stored as a 2-letter ISO country code.
What is a good solution for this problem? I have heard about UDFs; are they applicable here? Is it possible to solve this more easily by reformatting my table?
Thank you very much.
The "easiest" way to solve this is to have coordinates for each country. You then store the coordinates for each record in the Sphinx index, and when searching you look up the user's coordinates and use them in the search. This way Sphinx calculates the distance dynamically.
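As a rough sketch of that idea: if every document in the index carries lat/lng attributes (stored in radians here), Sphinx can compute and sort by the distance itself via GEODIST() over SphinxQL. The index name, attribute names and port are assumptions:

// SphinxQL speaks the MySQL protocol, so plain mysqli works
// (searchd typically listens on port 9306 for SphinxQL).
$sphinx = new mysqli('127.0.0.1', '', '', '', 9306);

// The lat/lng attributes are assumed to be stored in radians;
// GEODIST() then returns the distance in meters.
$query = sprintf(
    "SELECT id, GEODIST(lat, lng, %f, %f) AS dist
     FROM geo_index
     WHERE MATCH('%s')
     ORDER BY dist ASC
     LIMIT 20",
    $userLatRad,
    $userLngRad,
    $sphinx->real_escape_string($searchTerm)
);

$result = $sphinx->query($query);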
Did you have coordinates like this to create the matrix? It also presupposes that you are just using a single 'point' per country; if your matrix is more advanced, e.g. taking the closest point on the borders of each country (to make distances between oddly shaped countries better), then it won't work so well.
In theory you could perhaps do this with payloads, by using the country name as keywords and the distance in a payload (arranged specially so that close distances have a high weight), but it will probably be expensive to index and might not work all that well in practice.
I work on a site which sells, let's say, stuff and offers a "vendor search". In this search you enter your city, postal code, or region and a distance (in km or miles), and the site gives you a list of vendors.
To do that, I have a database of the vendors. In the form used to save these vendors, you enter their full address, and when you click the save button a request is made to Google Maps to get their latitude and longitude.
When someone does a search, I look in a table where I store all the search terms and their lat/lng.
This table looks like
+--------+-------+------+
| term | lat | lng |
+--------+-------+------+
So the first query is something very simple
SELECT lat, lng FROM my_search_table WHERE term = 'the term'
If I find a result, I then search with a nice method for all the vendors in the range the visitor wants and print the result on a map.
If I don't find a result, I search with a levenshtein function, because people writing bruxelle or bruxeles instead of bruxelles is really common and I don't want to make a request to Google Maps every time (I also have a "how many times searched" column in my table to get some stats).
So I query my_search_table with no WHERE clause and loop through all the results to find the smallest levenshtein distance. If the smallest result is greater than 2, I request the coordinates from Google Maps.
Here is my problem. For some countries (we have several sites all around the world), my_search_table has 15-20k+ entries... and PHP doesn't (really) like looping over that much data (which I perfectly understand), so my request hits the PHP timeout. I could increase this timeout, but the problem will be the same in a few months.
So I tried a levenshtein MySQL function (found on Stack Overflow, btw), but it's also very slow.
So my question is "is there any way to make this search fast even on very large datasets ?"
My suggestion is based on three things:
First, your data set is big. That means it's big enough to reject the idea of "select all" + "run levenshtein() in the PHP application".
Second, you have control over your database. So you can adjust some architecture-related things
Finally, performance of SELECT queries is the most important thing, while performance for adding new data doesn't matter.
The thing is, you cannot perform a fast levenshtein search because levenshtein itself is slow: calculating the levenshtein distance is an expensive operation. Thus, you'll not be able to resolve the issue with a "smart search" alone. You'll have to prepare some data.
A possible solution would be: create a group index and assign it while adding/updating data. That means you'll store an additional column holding some hash (numeric, for example). When adding new data, you'll:
Perform a levenshtein-distance search over all records in your table against the inserted data (for that you may use either your application or the MySQL function you've already mentioned)
Set the group index of the new row to the group index value of the rows found in the previous step
If nothing is found, set some new group index value (it is the first row and there are no similar rows yet) that differs from any group index values already present in the table
To find the desired rows, you'll just need to select rows with the same group index value. That means your SELECT queries will be very fast. But yes, this causes a huge overhead when adding/changing your data, so it isn't applicable when the performance of updating/inserting matters.
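A rough sketch of that insert-time grouping, using the asker's my_search_table plus a hypothetical integer column group_id (the threshold of 2 follows the question):

// Assign a group_id to a newly added search term.
function assignGroupId(PDO $pdo, $newTerm)
{
    // One expensive pass over the existing terms: slow on insert, fast on select.
    $rows = $pdo->query('SELECT term, group_id FROM my_search_table')
                ->fetchAll(PDO::FETCH_ASSOC);

    $bestDistance = PHP_INT_MAX;
    $bestGroup    = null;
    foreach ($rows as $row) {
        $d = levenshtein($newTerm, $row['term']);
        if ($d < $bestDistance) {
            $bestDistance = $d;
            $bestGroup    = (int) $row['group_id'];
        }
    }

    // Same threshold as in the question: a distance above 2 means "no similar term".
    if ($bestGroup === null || $bestDistance > 2) {
        // First row of a new group: pick a value not used yet (ignoring concurrency here).
        $bestGroup = (int) $pdo->query('SELECT COALESCE(MAX(group_id), 0) + 1 FROM my_search_table')
                               ->fetchColumn();
    }

    $stmt = $pdo->prepare('INSERT INTO my_search_table (term, group_id) VALUES (?, ?)');
    $stmt->execute([$newTerm, $bestGroup]);

    return $bestGroup;
}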
You could try MySQL's SOUNDS LIKE operator:
SELECT lat, lng FROM my_search_table WHERE term SOUNDS LIKE "the term"
You can use a k-d tree or a ternary search tree to speed up the search. The idea is to narrow down the candidates with a binary-search-like traversal instead of comparing the term against every record.
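A compact ternary search tree sketch in that spirit (PHP 7.4+ for the typed properties; it only shows exact lookups over, say, city names, and fuzzy matching would need a tolerant traversal on top; everything here is illustrative):

// Each node holds one character and three children (lower / equal / higher),
// so a lookup walks the tree like a binary search over characters.
class TstNode
{
    public string $ch;
    public bool $isEnd = false;
    public ?TstNode $lo = null;
    public ?TstNode $eq = null;
    public ?TstNode $hi = null;

    public function __construct(string $ch) { $this->ch = $ch; }
}

// Insert a non-empty word into the tree, returning the (possibly new) subtree root.
function tstInsert(?TstNode $node, string $word, int $i = 0): TstNode
{
    $ch = $word[$i];
    if ($node === null) {
        $node = new TstNode($ch);
    }
    if ($ch < $node->ch) {
        $node->lo = tstInsert($node->lo, $word, $i);
    } elseif ($ch > $node->ch) {
        $node->hi = tstInsert($node->hi, $word, $i);
    } elseif ($i < strlen($word) - 1) {
        $node->eq = tstInsert($node->eq, $word, $i + 1);
    } else {
        $node->isEnd = true;
    }
    return $node;
}

// Exact lookup: true if the word was inserted before.
function tstContains(?TstNode $node, string $word, int $i = 0): bool
{
    if ($node === null) {
        return false;
    }
    $ch = $word[$i];
    if ($ch < $node->ch) return tstContains($node->lo, $word, $i);
    if ($ch > $node->ch) return tstContains($node->hi, $word, $i);
    if ($i === strlen($word) - 1) return $node->isEnd;
    return tstContains($node->eq, $word, $i + 1);
}

// Usage: build the tree once from the city names, then test candidates.
$root = null;
foreach (['bruxelles', 'paris', 'lyon'] as $city) {
    $root = tstInsert($root, $city);
}
var_dump(tstContains($root, 'paris')); // bool(true)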