I have built an application in CakePHP that lists businesses. There are about 2000 entries, and the latitude and longitude coordinates for each business are in the DB.
I now am trying to tackle the search function.
There will be an input box where the user can put a street address, city, or zipcode, and then I would like it to return the 11 closest businesses as found from the database.
How would I go about doing this?
I use the Yahoo Geo Planet API to identify the place corresponding to the search term the user entered. This normally matches multiple places, so you have to present them back to the user to get them to pick the right one. Then, once you know the right place and its lat/long, which the Yahoo API provides, you can use the haversine formula to get the closest businesses to the user's location. There's a good example in the answer to this question.
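As a rough illustration of the haversine step, here is a minimal Python sketch (the thread is about PHP, so treat it as pseudocode to port). The field names `lat` and `lng`, the sample rows and the `closest` helper are assumptions for the example, not part of the original application.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def closest(businesses, lat, lon, n=11):
    """Return the n businesses nearest to (lat, lon), closest first."""
    return sorted(businesses, key=lambda b: haversine_km(lat, lon, b["lat"], b["lng"]))[:n]


# Hypothetical sample rows standing in for the businesses table.
businesses = [
    {"name": "A", "lat": 47.6062, "lng": -122.3321},
    {"name": "B", "lat": 45.5152, "lng": -122.6784},
    {"name": "C", "lat": 44.9429, "lng": -123.0351},
]
nearest = closest(businesses, 45.6, -122.6, n=2)
```

With only about 2000 rows, sorting every row like this is probably fast enough; for larger tables you would want to narrow the candidates first.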
I'd approach this by creating a square around the point. Getting the point is a whole task in itself, as you'll need a postcode database or API, and those tend to cost money, either to buy the database or per lookup.
If you do it by city or similar, you could probably (I'm not sure) get the lat/long for that city from GMaps.
Then I'd try and get 4 corner longlat coords around that point. Then I could search the database for values which are between those.
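A minimal sketch of the "square around the point" idea, in Python for illustration. The 111.32 km-per-degree constant is an approximation, and the function names and the `lat`/`lng` keys are made up for the example. It does not handle the poles or the 180° meridian.

```python
import math

KM_PER_DEGREE_LAT = 111.32  # approximate length of one degree of latitude


def bounding_box(lat, lon, half_side_km):
    """Return (min_lat, max_lat, min_lon, max_lon) for a square centred on the point."""
    dlat = half_side_km / KM_PER_DEGREE_LAT
    # A degree of longitude gets shorter by cos(latitude) away from the equator.
    dlon = half_side_km / (KM_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def in_box(row, box):
    """True if the row's coordinates fall between the box edges."""
    min_lat, max_lat, min_lon, max_lon = box
    return min_lat <= row["lat"] <= max_lat and min_lon <= row["lng"] <= max_lon
```

The four values map directly onto two BETWEEN conditions in a database query.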
Either way it's a tricky thing I'd say, fab question though! Interested to see people's suggestions.
Related
Say I have a database table representing users with potentially millions of records (Wishful thinking). This table contains a whole bunch of information about each user including information about their location:
City
County/State etc
Country
Latitude
Longitude
Geohash based on the latitude/longitude values.
I would like to implement a feature whereby a logged-in user can search for other users that are nearby.
Ideally, I would like to grab say the 20 users that are geographically closest to the user, followed by the next 20, and the next 20 etc. So essentially I want to be able to order my users table by the distance from a certain point.
Approach 1
I have some previous experience with the haversine formula which I used to calculate the distance between one point and a few hundred others. This approach would be ideal on a relatively small record set but I fear it would become incredibly slow with such a large record set.
Approach 2
I've additionally done some research into geohashing and I understand how the hash is calculated and I get the theory behind how it represents a location and how precision is lost with shorter resolutions. I could of course grab the users that are located near the user's geographical area by grabbing users that have a similar beginning to their geohash (Based on a precision I specify - and potentially looking in the neighbouring regions) but that doesn't solve the problem of needing to sort by location. This approach is also not great for edge cases where 2 users may be very close to one another but lie close to the edges of 2 regions represented by the geohash.
Any ideas/suggestion towards the approach would be greatly appreciated. I'm not looking for code in particular but links to good examples and resources would be helpful.
Thanks,
Jonathon
Edit
Approach 3
After some thought I've come up with another potential solution to consider. Upon receiving each user's location information, I would store information about the location (town/city, area, country, latitude, longitude, geohash maybe) in a separate table (say locations). I would then connect the user to the location by a foreign key. This would give me a much smaller dataset to work with. To find nearby users I could then simply find other locations that are close to the user's location and then use their IDs to find other users. Perhaps some sort of caching could be then implemented by storing a list of the nearby location IDs for each location.
You can try a space-filling curve. Translate each coordinate to binary and interleave the bits, then treat the result as a base-4 number. You are also wrong on one point: a geohash can be used to sort by location as well. Most likely you would use a bounding box to filter the candidates and then apply the haversine formula.
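To make the interleaving concrete, here is a small Python sketch of a Z-order (Morton) key. The 16-bit quantization and the function names are assumptions chosen for the example; a geohash is built on the same bit-interleaving idea.

```python
def quantize(value, lo, hi, bits=16):
    """Map a value in [lo, hi] to an integer in [0, 2**bits - 1]."""
    scale = (1 << bits) - 1
    return int((value - lo) / (hi - lo) * scale)


def interleave(x, y, bits=16):
    """Interleave the bits of x and y into a single Morton (Z-order) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return code


def z_order_key(lat, lon, bits=16):
    """Single sortable integer for a latitude/longitude pair."""
    x = quantize(lon, -180.0, 180.0, bits)
    y = quantize(lat, -90.0, 90.0, bits)
    return interleave(x, y, bits)
```

Each pair of bits in the key is one base-4 digit. Points with close keys are usually close on the map, but two close points can still end up with distant keys at cell boundaries, which is why a bounding box plus an exact distance check is still needed afterwards.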
I have a database table filled with addresses. The table is over 4,000 records long.
I am wondering what the best way is to take the addresses, compare them with a search field, and sort them by distance from the search field location. The Google API documentation says requests are limited to something like 25,000 per day. Does that mean I can only do about 6 searches per day?
In my opinion, yes. Google is smart about calculating the distance between 2 LatLng's, because it gives you the distance along streets and roads, not the straight-line distance between 2 points (which would be easy to calculate in PHP).
Saving LatLng's of those 4000 addresses wouldn't do you any good, because you still need to ask google about the distance from a user's address to each of them. You can't calculate that yourself even if you have all the LatLng's (you need the map).
I guess you could save each user input, and save that address with 4000 distances to each location... but that would only be useful for a user returning to the site for the 2nd time.
...
Ok, I have this idea:
Do store the LatLng's of each of the 4000 locations.
Store distances between all 4000 locations (so that if you pick one, you could get the list of the all the others, ordered by distance).
When you get the address from a user, convert it to a LatLng, and use simple mathematics to find the closest location in a straight line.
Using the list of distances to all the other locations from the database, ask Google for the actual distance to that location and to the 10-20 locations closest to it; for the rest, use the distances from the closest location as stored in the database.
This way you'd get the first 10-20 accurate distances to the closest locations from user input, and the rest would be pulled from the database - they would actually be distances from the closest location to the other locations.
I believe that since the addresses don't change that much, you can cache the latitude/longitude somewhere and refer to those instead of making repeated requests. Please elaborate if there are other mitigating conditions, of course.
I'm building a directory that lets you view businesses around you, based on your current location. Right now, these businesses are stored in the database as address1, address2, city, province, postal_code.
If I'm wanting to do distances, should I be storing lat/long as well? What's the best way to go about this?
I'm using PHP, HTML5 geolocation and Google Maps.
Unless you have other ways to narrow your database search, you probably do want to keep lat/long, as it will help you filter your database search for nearby addresses. It's complicated to accurately compute exact distances between two points using lat/long, but it's at least a starting point for narrowing down the number of addresses to look at in more detail.
Also, if you want to be able to run when you don't have a data connection, you need some data to use in lieu of a web-lookup. It's possible to get a reasonably accurate distance from lat-long without having to worry too much about the convergence of longitude lines, at least for an around-me kind of an accuracy.
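One possible reading of "reasonably accurate" here is the equirectangular approximation: treat the area as flat and scale the longitude difference by the cosine of the mean latitude. A minimal Python sketch (the function name is made up for the example):

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def approx_distance_km(lat1, lon1, lat2, lon2):
    """Flat-earth approximation, adequate for short 'around me' distances."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    x = math.radians(lon2 - lon1) * math.cos(mean_lat)  # east-west component
    y = math.radians(lat2 - lat1)                       # north-south component
    return EARTH_RADIUS_KM * math.hypot(x, y)
```

It needs no network lookup and only one cosine per pair, but the error grows with distance and near the poles, so it is a filter or a rough ranking rather than an exact figure.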
You don't "have to". If you look at the distance service in Google Maps, you can pass either an address or a lat/long:
https://developers.google.com/maps/documentation/javascript/distancematrix
You don't want to do a distance calculation for every entry in your database, you need to narrow it down. The postal_code may be sufficient for this. If it isn't, you can quickly narrow the search with lat/lon by restricting each to a range around the location; this provides a nearly square trapezoid region which will probably be good enough as-is.
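A sketch of that narrowing step, using Python with an in-memory SQLite table purely for illustration. The table name `businesses`, its columns and the sample rows are assumptions. The range query does the cheap filtering, and the exact distance is only computed for the rows that survive.

```python
import math
import sqlite3


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearby(conn, lat, lon, radius_km, limit=20):
    """Prefilter by a lat/lon range in SQL, then sort the survivors by exact distance."""
    dlat = radius_km / 111.32  # approximate km per degree of latitude
    dlon = radius_km / (111.32 * math.cos(math.radians(lat)))
    rows = conn.execute(
        "SELECT name, lat, lng FROM businesses "
        "WHERE lat BETWEEN ? AND ? AND lng BETWEEN ? AND ?",
        (lat - dlat, lat + dlat, lon - dlon, lon + dlon),
    ).fetchall()
    scored = [(haversine_km(lat, lon, r[1], r[2]), r[0]) for r in rows]
    # Drop the corners of the box that lie outside the circle.
    return sorted(s for s in scored if s[0] <= radius_km)[:limit]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE businesses (name TEXT, lat REAL, lng REAL)")
conn.execute("CREATE INDEX idx_lat_lng ON businesses (lat, lng)")
conn.executemany(
    "INSERT INTO businesses VALUES (?, ?, ?)",
    [("Seattle", 47.6062, -122.3321), ("Portland", 45.5152, -122.6784), ("Salem", 44.9429, -123.0351)],
)
results = nearby(conn, 45.6, -122.6, radius_km=100)
```

`results` is a list of `(distance_km, name)` pairs, closest first.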
I have a whitelist of cities. Let's say Seattle, Portland, Salem. Using GeoIP, I'd detect the user's city. Let's call it $user_city. Based on $user_city, I want to display classified listings from the nearest city in my whitelist (Seattle || Portland || Salem) within 140 miles. If no whitelisted city is within 140 miles, I'd just show a drop-down and ask the user to manually select a city.
There are a few ways of doing this:
calculate this on the fly (I found an algorithm in one of SO answers)
with help of DB (let me explain):
create a table called regions
regions will have
city 1 | city 2 | distance (up to 140 miles)
city 1= cities from whitelist
city 2= any city within 140 miles from city 1
This would create a reasonably sized table. If my whitelist has 200 cities and there are 40 cities (or towns) within 140 miles of each one, this would create 8000 rows.
Now, when a user comes to my site:
1) I check if user is from whitelist city already (city 1 column). If so, display that city
2). If not, check if $user_city is in "city 2" column
2a) if it is, get whitelist city with lowest distance
2b) if it is not, display drop-down for manual input
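The lookup steps above could be sketched like this (Python for illustration; the rows and mileages in `REGIONS` are made-up sample data, not real distances, and all names are hypothetical):

```python
# Hypothetical precomputed data: the whitelist and the rows of the "regions" table.
WHITELIST = {"Seattle", "Portland", "Salem"}
REGIONS = [
    # (city 1 from whitelist, city 2 nearby, distance in miles) - illustrative values only
    ("Seattle", "Tacoma", 34),
    ("Portland", "Vancouver WA", 9),
    ("Portland", "Woodburn", 30),
    ("Salem", "Woodburn", 18),
]


def build_index(regions):
    """Map each nearby city to the whitelisted city with the lowest distance."""
    index = {}
    for city_1, city_2, miles in regions:
        best = index.get(city_2)
        if best is None or miles < best[1]:
            index[city_2] = (city_1, miles)
    return index


NEAREST = build_index(REGIONS)


def resolve_city(user_city):
    """Return the whitelisted city to show, or None to trigger the manual drop-down."""
    if user_city in WHITELIST:          # step 1
        return user_city
    match = NEAREST.get(user_city)      # step 2
    return match[0] if match else None  # steps 2a / 2b
```

Because the index is a plain dictionary, it could just as well be shipped to the browser as JSON and evaluated there.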
Final constraint: whichever method we select, it has to work from within iFrame. I mean, can I create this page on my mysite1.com and embed this page inside someothersite2.com inside an iframe? Will it still be able to get user_city and find nearest whitelisted city? I know there are some cross-domain scripting rules so I am not sure if iFrame would be able to get user-ip address, pass it to GeoIP, and resolve it to $user_city
So, my question:
How best to do this? If a lot of people embed my page in their page (using iframe) then my server would get pounded 10000s of times per second (wishful thinking, but let's assume that's the case). I don't know if a DB would be able to handle so much pounding. I don't want to have to pay for more DB servers or web-servers. I want to minimize resource-requirement at my end. So, I don't mind offloading a bit of work to user's browser via JavaScript.
EDIT:
Some answers have recommended storing lat/long and then doing the math. The reason I suggested creating a 'regions' table is that this way all the math is precomputed. If I have a "whitelist" of cities and I precompute all possible nearby cities for each whitelisted city, then I don't have to compute the distance (using the haversine formula, for example) every time.
Is it possible to offload all of this to user's browser via some crafty use of Java Script? I don't want to overload my server for a free service. It might make money but I am very close to broke and I am afraid my server would go down before I make enough money to pay for the upgrades.
So, the three constraints of this problem are: 1) it should work from inside an iframe (I am hoping this will go viral and every blogger will want to embed my site in an iframe on their page); 2) it should be very fast; 3) it should minimize the load on my server.
Use one table City and do a mysql math-calculation for every query, with the addition of a cache layer eg memcache. Fair performance and very flexible!
Use two tables City (id,lat,lng,name) and Distance (city_id1,city_id2,dist), get your result by a traditional JOIN. (Could use a cache layer too.) Not very flexible.
Custom data structure: CityObj (id,lat,lng,data[blob]). Just serialize and compress a PHP array of the cities and store it. This might raise your eyebrows, but as we know the bottleneck is never CPU or memory, it's disk I/O. This is one read from an index on an INT, as opposed to the JOIN, which uses a tmp-table. This is not very flexible but will be fast and scalable. Easy to shard and cluster.
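A rough sketch of the serialize-and-compress idea. It uses Python's `json` and `zlib` in place of PHP's serialization, so it shows the shape of the approach rather than the answer's exact method; the `city_obj` dictionary is a hypothetical stand-in for the CityObj table and the distances are sample values.

```python
import json
import zlib


def pack_neighbours(neighbours):
    """Serialize a list of (city_id, distance) pairs and compress it into one blob."""
    return zlib.compress(json.dumps(neighbours).encode("utf-8"))


def unpack_neighbours(blob):
    """Reverse of pack_neighbours: decompress and rebuild the list of pairs."""
    return [tuple(pair) for pair in json.loads(zlib.decompress(blob).decode("utf-8"))]


# Hypothetical in-memory stand-in for the CityObj table, keyed by integer id.
city_obj = {
    1: {"lat": 45.5152, "lng": -122.6784, "data": pack_neighbours([(2, 47.0), (3, 173.5)])},
}
nearby = unpack_neighbours(city_obj[1]["data"])
```

One primary-key read returns everything needed for a city, at the cost of having to rebuild the blob whenever the city list changes.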
Is it possible to offload all of this to user's browser via some crafty use of Java Script? I don't want to overload my server for a free service. It might make money but I am very close to broke and I am afraid my server would go down before I make enough money to pay for the upgrades.
Yes, it is possible...using Google Maps API and the geometry library. The function you are looking for is google.maps.geometry.spherical.computeDistanceBetween. Here is an example that I made a while ago that might help get you started. I use jQuery here. Take a look at the source to see what's happening and modify as needed. Briefly:
supplierZips is an Array of zip codes comparable to your city whitelist.
The first thing I do on page load is geocode the whitelist locations. You can actually do this ahead of time and cache the results, if your city whitelist is constant. This'll speed up your app.
When the user enters a zip code, I first check if it's a valid zip from a json dataset of all valid zip codes in the U.S.( http://ampersand.no.de/maps/validUSpostalCodes.json, 352 kb, data generated from zip code data at http://www.geonames.org).
If the zip is valid, I compute the distance between that zip and each location in the whitelist, using the aforementioned computeDistanceBetween in the Google Maps API.
Hope this helps get you started.
You just have to get the lat and the long of each city and add it to the database.
So every city only has 1 record. No distances are stored, only the position on the globe.
Once you have that you can easily run a query using the haversine formula ( http://en.wikipedia.org/wiki/Haversine_formula ) to get the nearest cities within a range.
know there are some cross-domain scripting rules so I am not sure if iFrame would be able to get user-ip address
It will be possible to get the user ip or whatever if you just get the info from the embedded page.
I don't know if a DB would be able to handle so much pounding
If you have that many requests you should have by then found a way to make a buck with it :-) which you can use for upgrades :D
Your algorithm seems generally correct. What I would do is use PostGIS (a postgresql plugin, and easier to set up than it looks :-D). I believe the additional learning curve is totally worth it, it is THE standard for geodata.
If you put the whitelist cities in as POINTs, with latitudes and longitudes, you can actually ask PostGIS to sort by distance to a given lat/lon. It should be much more efficient than doing it yourself (PostGIS is very optimized).
You could get lats and longs of your user cities (and the whitelist cities) by using a geocoding API like Yahoo Placefinder or Google Maps. What I would do would be to have a table (either the same as the whitelist cities or not) that stores city name, lat, and lon, and do lookups on that. If the city name isn't found though, hit the API you are using, and cache the result in the table. This way you'll quickly not need to hit the API except for obscure places. The API is fast too.
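The look-up-then-cache flow might look roughly like this. It is a Python sketch with an in-memory SQLite table; `fake_geocode` is a hypothetical stand-in for a real geocoding API call, and the table and class names are invented for the example.

```python
import sqlite3


def fake_geocode(city_name):
    """Hypothetical stand-in for a real geocoding API request (no network call here)."""
    known = {"Seattle": (47.6062, -122.3321), "Portland": (45.5152, -122.6784)}
    return known.get(city_name)


class CityLocator:
    """Looks a city up in a local table first and only calls the geocoder on a miss."""

    def __init__(self, conn, geocode):
        self.conn = conn
        self.geocode = geocode
        self.api_calls = 0  # counts how often the geocoder was actually used
        conn.execute(
            "CREATE TABLE IF NOT EXISTS cities (name TEXT PRIMARY KEY, lat REAL, lon REAL)"
        )

    def locate(self, name):
        row = self.conn.execute(
            "SELECT lat, lon FROM cities WHERE name = ?", (name,)
        ).fetchone()
        if row is not None:
            return row  # cache hit, no API call
        self.api_calls += 1
        coords = self.geocode(name)
        if coords is not None:
            # Cache the result so the next lookup for this city stays local.
            self.conn.execute("INSERT INTO cities VALUES (?, ?, ?)", (name, *coords))
        return coords
```

Unknown cities are not cached in this sketch, so they hit the geocoder every time; whether to cache misses as well is a design choice.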
If you're really going to be seeing that kind of server load, you may want to look into using something besides PHP though (such as node.js). Incidentally, you shouldn't have any trouble geocoding from an iframe. From the server's point of view, it's just as if the browser were going to that page "normally".
So, I am trying to develop an application that will display user listings. The site should detect the user's location (I am using the MaxMind API for that) and then show listings from the user's location plus cities within a user-specified radius.
How do I do this? MaxMind API lets me detect user's city by IP address but how do I find nearby cities?
Reference site: www.oodle.com (you can also manually change city+radius).
Sanguine
Rather than store and compare cities, store and compare latitudes and longitudes, which are concrete locations rather than ambiguous names. All of MaxMind's GeoIP databases provide them. A quick Google search should provide you the math to calculate distances between points on the earth.
If you actually want to find nearby cities, not nearby users as you've said, then you need a database mapping cities to locations. Again, MaxMind provides this with all their databases. Go to their website, go to the page about the database you purchased or downloaded, and look at the instructions for inserting the CSV format into a SQL database. That'll get you the latitude and longitude of each city. Then, again, a Google search will provide you the math to calculate the distance between two points on the earth (lat/long pairs) in a SQL query. Order by that calculation to get the nearest cities.
Sorry to give you only advice rather than code, but there's a lot of little things you've just gotta do yourself to build this site.