I have a table with over 2 million rows. One of the columns is an address, and some rows share the same address. I am using PHP.
On my website, I want the user to enter their zip code, and I will return all results within that zip code. I would then use Google to geocode them and place them on a map. The problem is that Google charges per query, so I can't waste time and money requesting coordinates for an address I already have. Here is what I believe to be the correct approach:
1) Ask the user for their zip code
2) Run "Select * with 'Zip Code' = $user_zip" (paraphrasing)
3) Geocode the first address and plot it on the map
4) Check for matching addresses in the result and group them with the mapped result
5) Find the next new address
6) Repeat steps 3-5 until complete
Is there a better way to approach this? I am looking for efficiency, an easy way to manipulate all matching results at once, and the fewest possible queries. If my way is correct, can someone please help me with the logic for steps 3-5?
If I understand this right, what you are trying to do is render a map with a marker for each record in your database that falls within a certain zip code. Your challenge is that you need coordinates to render each marker. The biggest waste of resources in your approach is that you do not store the coordinates of each address in your database. I would suggest that you:
1 - Alter the endpoint (or script, or whatever) that creates these records in your database so that it fetches the coordinates and stores them with the record.
2 - Run a one-time migration to fetch coordinates for each existing record. I understand that doing this for 2 million rows could be "costly" with Google's Geocoding API (my estimate is $1,000 for 2 million API calls); to save on costs, you could look into some of the open-source mapping tools.
Either way, fetching coordinates during the request lifecycle is a waste of resources, and it will significantly slow down your responses.
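If you still want the grouping logic from steps 3-5 of the question, here is a minimal sketch of it. It assumes the coordinates you already know are available as an address-to-[lat, lng] map (standing in for the stored coordinates column the answer recommends), and that `$geocode` is a hypothetical callable wrapping whatever geocoding service you use. Each distinct address is geocoded at most once, and all rows sharing it are grouped under one marker.

```php
<?php
// Group query rows by address and geocode each distinct address at most once.
// $rows:    rows from the zip-code query, each with an 'address' key.
// $cache:   address => [lat, lng] for addresses already geocoded (hypothetical stand-in
//           for coordinates stored in the database); newly geocoded addresses are added.
// $geocode: hypothetical callable returning [lat, lng] for an address.
function groupAndGeocode(array $rows, array &$cache, callable $geocode): array
{
    $markers = [];
    foreach ($rows as $row) {
        $address = $row['address'];
        if (!isset($markers[$address])) {
            // Only call the (paid) geocoder for addresses we have never seen before.
            if (!isset($cache[$address])) {
                $cache[$address] = $geocode($address);
            }
            [$lat, $lng] = $cache[$address];
            $markers[$address] = ['lat' => $lat, 'lng' => $lng, 'rows' => []];
        }
        // Every row with this address shares the same marker.
        $markers[$address]['rows'][] = $row;
    }
    return array_values($markers);
}
```

Persisting `$cache` back to the database after each run is what turns this into the "store the coordinates" approach, so later requests never hit the geocoder for the same address.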
I've got a list of shops that I have put in a JavaScript array. I have their addresses as well.
I need to create an autocomplete that lets me type in a city name and displays the 3 shops nearest to that location. I imagine it will need to interface with Google's APIs somehow, but I'm not sure where to start.
I've got the actual jQuery autocomplete working with an AJAX script, but I don't know how to find the nearest locations.
You need the lat/long locations of the stores (https://developers.google.com/maps/documentation/geocoding/). Then you need the lat/long location of the user. With some relatively simple mathematics, you can then calculate the distance between these two points:
// Approximate distance in km between ($lat1, $lon1) and ($lat2, $lon2), given in degrees,
// rounded to 1 decimal; dividing by 57.29578 converts degrees to radians.
$distance = round(6371 * M_PI * sqrt(($lat2 - $lat1) * ($lat2 - $lat1)
    + cos($lat2 / 57.29578) * cos($lat1 / 57.29578) * ($lon2 - $lon1) * ($lon2 - $lon1)) / 180, 1);
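To get the 3 nearest shops, you can compute this distance for every store and keep the smallest ones. A minimal sketch, assuming each store is an array with hypothetical `name`, `lat` and `lon` keys (coordinates already geocoded, in degrees); it uses the same equirectangular approximation as the formula in this answer, without rounding:

```php
<?php
// Equirectangular approximation of the distance in km (inputs in degrees).
function approxDistanceKm(float $lat1, float $lon1, float $lat2, float $lon2): float
{
    $dLat = $lat2 - $lat1;
    $dLon = $lon2 - $lon1;
    $cosTerm = cos(deg2rad($lat1)) * cos(deg2rad($lat2));
    return 6371 * M_PI / 180 * sqrt($dLat * $dLat + $cosTerm * $dLon * $dLon);
}

// Return the $n stores closest to ($lat, $lon), each with an added 'distance' key (km).
function nearestStores(array $stores, float $lat, float $lon, int $n = 3): array
{
    foreach ($stores as &$store) {
        $store['distance'] = approxDistanceKm($lat, $lon, $store['lat'], $store['lon']);
    }
    unset($store); // break the reference left over from the foreach
    usort($stores, fn($a, $b) => $a['distance'] <=> $b['distance']);
    return array_slice($stores, 0, $n);
}
```

Sorting the whole list is fine for a few hundred shops; the approximation is good over the short distances a "nearest shop" search typically cares about.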
If you have a large number of stores and a large number of users, I advise caching these distances in a MySQL table; you have to compute the distance to every store in your database. So you create a table for each location (e.g. zip code) that requests this, and set up a cron job to remove these tables every hour or so.
So the process is:
User asks for the nearest store
You get their location through the Google API (or your own storage)
Check whether there's a table for their location
If yes, give them the results directly; if not, generate the table and then give them the results
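The cache-or-compute step of that process can be sketched as follows. This is an illustration only: a PHP array stands in for the per-zip-code MySQL tables, the one-hour lifetime mirrors the cron interval mentioned above, and `$compute` is a hypothetical callable that returns the nearest stores for a zip code.

```php
<?php
// Return cached results for $zip if they are younger than $ttl seconds,
// otherwise compute them and store them with a creation timestamp.
// $cache stands in for the per-zip-code tables; $now is passed in for testability.
function cachedNearest(string $zip, array &$cache, callable $compute, int $now, int $ttl = 3600): array
{
    if (isset($cache[$zip]) && $now - $cache[$zip]['created'] < $ttl) {
        return $cache[$zip]['results']; // cache hit: no distance calculations needed
    }
    $results = $compute($zip);          // cache miss or expired entry: recompute
    $cache[$zip] = ['created' => $now, 'results' => $results];
    return $results;
}
```

In production you would call it with `time()` and persist the cache; the expiry check here plays the role of the cron job that drops stale tables.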
Mind that Google only allows a limited number of requests. Even though this number is large (I believe 25,000 requests per day), it may be advisable to store the lat/lon locations of your stores AND users. This would also improve speed.
I made something similar: I fetched the lat/lon of each location at the moment it was inserted into the database and stored it in a separate per-zip-code lat/lon table.
I am designing a web app where I need to determine which places listed in my DB are within the user's driving distance.
Here is a broad overview of the process that I am currently using:
Get the user's current location via Google's Maps API
Run through each place in my database (approx. 100), checking whether the place is within the user's driving distance using the Google Places API. I parse the returned JSON with PHP to see if any locations exist given the user's coordinates.
If a place is within the user's driving distance, display the top locations (limited to 20 by Google Places); otherwise, don't display it
This process works fine when I am running through a handful of places, but running through 100 places is much slower and makes 100 API calls. With Google's current limit of 100,000 calls per day, this could become an issue down the road.
So is there a better way to determine which places in my database are within a user's driving distance? I do not want to keep track of addresses in my DB; I want to rely on Google for that.
Thanks.
You can use the formula found here to calculate the distance between zip codes:
http://support.sas.com/kb/5/325.html
This is not precise (doorstep to doorstep), but you can calculate the distance from the user's zip code to the location's zip code.
Using this method, you won't even have to hit Google's API.
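A minimal sketch of the zip-to-zip idea, assuming you have a table of zip code centroids (here a hypothetical `$centroids` array of zip => [lat, lon] in degrees, which in practice you would load from a zip code database). It uses the haversine form of the great-circle distance; the expression on the linked SAS page may be written differently.

```php
<?php
// Great-circle distance in miles between the centroids of two ZIP codes,
// or null if either ZIP code is missing from the (hypothetical) centroid table.
function zipDistanceMiles(string $zipA, string $zipB, array $centroids): ?float
{
    if (!isset($centroids[$zipA], $centroids[$zipB])) {
        return null; // unknown ZIP code
    }
    [$lat1, $lon1] = $centroids[$zipA];
    [$lat2, $lon2] = $centroids[$zipB];
    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);
    // Haversine formula
    $a = sin($dLat / 2) ** 2
        + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;
    return 3958.8 * 2 * atan2(sqrt($a), sqrt(1 - $a)); // 3958.8 = mean Earth radius in miles
}
```

Note that this gives straight-line distance, not driving distance, which is the precision trade-off mentioned above.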
I have an unconventional idea for you. It will seem very odd at first, because it works in exactly the opposite order from what you would expect. However, you might come to see the logic.
To put it into action, you'll need a broad category of things that you want the user to see. For instance, I'm going to go with "supermarkets".
There is a wonderful API in Google Places called nearbySearch. Its real strength is that it lets you rank places by distance. We will make use of this.
Prerequisites
Modify your database to store the unique ID that nearbySearch returns for each place. This isn't against the ToS, and we'll need it.
Get a list of those IDs.
The plan
When you get the user's location, query nearbySearch for your category, and loop through the results with the following rules:
If the result's ID matches something in your database, you have that result. Bonus #1: it's sorted by distance, ascending! Bonus #2: you already get its lat/lng!
If the result's ID does not match, you can either silently skip it or use it and add it to your database. This means you can quite literally update your database on the fly with little to no manual work, as an added bonus.
When you have run through the results, you will have IDs that never came up. Calculate the point-to-point distance to the furthest result in Google's data, and you will have the maximum distance covered from your point. If this is too small, use the technique I described here to do a compounded search.
The only requirement is that you need to know roughly what you are searching for. However, consider this: your normal query cycle takes anywhere between 1 and 100 Google queries. My method takes 1 for a 50 km radius. :-)
To calculate distances, you will need the haversine formula rather than a zip code lookup, by the way. This has the added advantage of being truly international.
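The matching loop can be sketched like this. It assumes `$results` is the decoded `results` array of a nearbySearch JSON response, where each entry has a `place_id` and `geometry.location.lat`/`lng` (the current Places field names; the answer only says "unique ID"), and that `$knownIds` holds the place IDs you stored beforehand. The haversine function provides the point-to-point distances.

```php
<?php
// Haversine great-circle distance in km (inputs in degrees).
function haversineKm(float $lat1, float $lon1, float $lat2, float $lon2): float
{
    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);
    $a = sin($dLat / 2) ** 2 + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;
    return 6371 * 2 * atan2(sqrt($a), sqrt(1 - $a));
}

// Split nearbySearch results into places you know and places you don't,
// and report how far the furthest result is (the radius this search covered).
function matchNearby(array $results, array $knownIds, float $userLat, float $userLng): array
{
    $known = array_flip($knownIds); // O(1) membership checks
    $matches = [];
    $unknown = [];
    $maxKm = 0.0;
    foreach ($results as $place) {  // already ordered by distance when using rankby=distance
        $loc = $place['geometry']['location'];
        $km = haversineKm($userLat, $userLng, $loc['lat'], $loc['lng']);
        $maxKm = max($maxKm, $km);
        if (isset($known[$place['place_id']])) {
            $matches[] = ['place_id' => $place['place_id'], 'km' => $km];
        } else {
            $unknown[] = $place['place_id']; // skip, or add to your database
        }
    }
    return ['matches' => $matches, 'unknown' => $unknown, 'searchedRadiusKm' => $maxKm];
}
```

If `searchedRadiusKm` is smaller than the distance you care about, that is the signal to fall back to a compounded search.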
Important caveats
This search method depends directly on the trade-off between the places you know about and the distance. If you are searching radii under 10 km, this method needs only one request.
If, however, you have to do compounded searching, bear in mind that each request cycle will cost you 3N requests, where N is the number of queries generated in the previous cycle. Therefore, if you only have 3 places in a 100 km radius, it makes more sense to look up each place individually.
Let's say your site has 200,000 unique users a day. Your server is heavily loaded, and you do NOT have the resources to buy a bigger/better server, so you are stuck with what you have.
Now, whenever a user comes to your site, you need to do some calculation: compute the distance between the user's city (as detected via GeoIP) and a whitelist of cities, and find the nearest whitelisted city within a 140-mile radius.
Would you do this calculation via PHP or via JavaScript?
First, would you precalculate all nearby cities within a 140-mile radius of the whitelisted cities? E.g., whitelisted city 1 can have 20 nearby cities. Or would you do the calculation on the fly every time?
For example:
Whitelist = Detroit, MI
and nearby city = Kalamazoo, MI (140 miles)
Second, if precomputed, would you store this in an XML file or a MySQL table? Then we just have to search through a table (MySQL or XML, no more than 1 MB in size). I am guessing XML would be inefficient because the client browser (JavaScript) would have to download a 1 MB XML file and search through it, which would make the page load even slower. Using the DB might be faster, but then the DB load increases (if 200,000 unique users load the page over the course of a day).
Maybe the best way would be to precompute, store the precomputed results in XML, and then use PHP to search through the XML and find the nearest whitelisted city to the user?
If you, the site, are actually relying on the city information, then you must do the calculation on the server.
Database queries are almost always going to be faster than XML searches for sufficiently large XML files. You can optimize the query, MySQL will cache things, etc.
Pre-calculating all city-to-city distances would be a way to go, for sure. But GeoIP doesn't only provide city names; it also gives actual latitude/longitude locations. I'm sure the possible list of cities changes fairly constantly, too.
I would look into using the geospatial capabilities of MySQL. There is a general overview of searching by coordinates here:
Fastest Way to Find Distance Between Two Lat/Long Points
In short, what you will do is set up a database of the cities you care about, with their lat/long, and query that table based on the GeoIP-provided lat/long.
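Because the whitelist is small, the server-side lookup itself is cheap. A minimal sketch, assuming GeoIP has already given you the visitor's latitude/longitude and that the whitelisted cities (with hypothetical `name`, `lat`, `lon` keys) have been loaded from the cities table once and cached:

```php
<?php
// Return the nearest whitelisted city within $maxMiles of ($lat, $lon),
// with an added 'miles' key, or null if no whitelisted city is close enough.
function nearestWhitelistedCity(float $lat, float $lon, array $cities, float $maxMiles = 140.0): ?array
{
    $best = null;
    $bestMiles = $maxMiles;
    foreach ($cities as $city) {
        $dLat = deg2rad($city['lat'] - $lat);
        $dLon = deg2rad($city['lon'] - $lon);
        // Haversine formula, Earth radius in miles
        $a = sin($dLat / 2) ** 2
            + cos(deg2rad($lat)) * cos(deg2rad($city['lat'])) * sin($dLon / 2) ** 2;
        $miles = 3958.8 * 2 * atan2(sqrt($a), sqrt(1 - $a));
        if ($miles <= $bestMiles) {
            $best = $city + ['miles' => $miles];
            $bestMiles = $miles;
        }
    }
    return $best;
}
```

For a large city table you would push the distance filter into the MySQL query instead, as in the linked question, rather than scanning every row in PHP.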
One of the sites I work on is a social networking site of sorts, and the content would be greatly enhanced by using some sort of location service to recommend "friends" based on proximity. The site focuses on the US, but with potential users worldwide.
I've considered creating an associative array or relational database with countries, states/provinces/territories, counties, and cities to provide a rough way to drill down to their relative proximity, but this can be extremely unwieldy and complicated very quickly.
I've also considered IP geolocation, but the results tend to be unreliable (some services show my company's IP as located some 600 miles northeast), and I would at least need some sort of fallback to look up, for instance, a zip/postal code.
Can you tell me a clearly defined way to effectively do this sort of lookup locally, without using third-party APIs, preferably with some reference to where to gather the basic information in the first place? I'm currently running PHP 5.3.2 and MySQL 5.1.44, if it makes any difference.
Thank you!
EDIT:
Added a bounty to try to get better ideas, or other ways of handling the problem, perhaps more efficiently. As it is, the load time due to the huge database size is insane. I figure I definitely need to improve my caching, but I'm trying to see if there's anything I should be doing with regard to improving my location system.
This might be a bit obvious, but the only way you can know a user's location with the best degree of accuracy is to actually:
Ask the User where they are!
Once you have asked the user where they are, you can then use third-party services to figure out distances.
If you don't want to use any third-party service, as your question mentions, you could download one of the geo databases and integrate it into your own service.
The source which I use is Yahoo Geo Planet.
You can download the entire GeoPlanet data file, which comes in TSV format. When I downloaded it, I just imported it into MySQL using mysqlimport.
http://developer.yahoo.com/geo/geoplanet/data/
It contains a record for every distinct geographic location in the world: a ton of postcodes, districts, regions, countries, practically everything you would ever need.
In addition, it contains neighbours, so you can query for geographic regions that are close to one another.
Unfortunately, simply asking where they are isn't quite good enough, and while GeoPlanet is a good option (and I have decided to use it), I didn't feel it was a complete answer. Yes, it works, but how? Aliases don't cover misspellings, and while outsiders call San Francisco things like "San Fran" or "Frisco", locals use "The City", so aliases don't always work. I needed some level of exactitude.
Well, after some work, here's the approach I've used, which is a bit intensive, and may not be an option for everybody, but works for me:
First, grab a copy of the GeoPlanet DB in TSV format from http://developer.yahoo.com/geo/geoplanet/data/ (105 MB zipped).
To import this into my MySQL DB, I created the tables with columns named according to the Readme file in the zip. geoplanet_places was the only table given a primary key, on WOE_ID. This table and geoplanet_adjacencies are really the only ones I need at the moment. For me, the import was done locally into my DB using:
mysqlimport --socket=/PATH/TO/SOCKET/mysql.sock --user=EXAMPLE --password=EXAMPLE DATABASE_NAME /PATH/TO/DOWNLOADED/GEOPLANET/DATA/geoplanet_places.tsv
I stripped the version number from the .tsv, and used the filename as the table name. Your experience may be significantly different, but I'm adding it for clarity. Import all the files you want.
I decided to have two options for people entering their profile data: You always have to select your country (from an option list, using ISO 3166 Alpha-2 Codes as the value), but we can then use either the postal (ZIP/PIN) code to look up where they are; or, for countries like Ireland lacking a national postal code system, they can enter their city and province name.
To search using country and postal code, I can do something like this:
SELECT Parent_ID FROM geoplanet_places WHERE ISO = "$ctry" AND Name="$zip" AND PlaceType="ZIP";
I count the results. If there are 0, the place is not known, and I assume a problem (an error is logged to confirm it is not a fluke). If there is more than one, the results are listed and a follow-up screen asks the user to confirm where they reside. Ideally this should never happen with postal codes, but it may occur when searching by location name. If there is exactly one, I store the Parent_ID in their profile as I continue to query upward, passing the Parent_ID back in to compare against WOE_ID, like so:
SELECT Name, WOE_ID, Parent_ID FROM geoplanet_places WHERE WOE_ID="$pid";
Where $pid is the previous Parent_ID. I'll use this later when rendering the page to determine location, and Town/City is a low enough level to apply proximity checks on the adjacencies table. Trying to join the results was significantly slower than issuing multiple queries when I ran it in MySQL Workbench. I continue the queries until Parent_ID="1", meaning that its parent is the world (i.e. the place is a country).
I decided that when searching by text entry for city, state/province, and country, I'll have to guarantee accurate entry by using a Metaphone processor to determine the user's likely selection if it can't be found the first time. Unfortunately, some people either can't spell, or the primary language of the site is not their primary language.
To display a location, I start with the WOE_ID stored in their profile, get the name, then look up its parent. I comma-separate the names to get a result like Irvine, Orange, CA, USA. I can look up any one of these names to find other members in proximity using the adjacencies and places tables.
Again, this probably isn't the best way to go about it, and geolocation can change if, for instance, you're on a trip using hotel Wi-Fi; however, this method seems "close enough for government work", so I thought I'd share my solution, for whatever it's worth.
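A minimal sketch of such a Metaphone fallback, using PHP's built-in `metaphone()`. The `$candidates` array is a hypothetical stand-in for place names you would load from geoplanet_places (for example, all towns in the selected country); loading them is not shown.

```php
<?php
// Return every candidate place name that has the same Metaphone key as the user's input.
function metaphoneMatches(string $input, array $candidates): array
{
    $key = metaphone($input);
    $matches = [];
    foreach ($candidates as $name) {
        if (metaphone($name) === $key) {
            $matches[] = $name; // sounds like the input; ask the user to confirm
        }
    }
    return $matches;
}
```

Since several places can share a key, the matches are best shown on the same confirmation screen used when a lookup returns more than one result.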
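The parent walk that builds the display string can be sketched as below. `$lookup` is a hypothetical callable returning `['Name' => ..., 'Parent_ID' => ...]` for a WOE_ID, or null if it is unknown; in production it would run the `WHERE WOE_ID=...` query shown earlier, ideally as a prepared statement. The loop stops when the parent is 1 (the world), with a depth limit as a guard against bad data.

```php
<?php
// Build "Irvine, Orange, CA, USA" by following Parent_ID links up to the world (WOE_ID 1).
function placeLabel(int $woeId, callable $lookup, int $maxDepth = 10): string
{
    $names = [];
    for ($depth = 0; $woeId !== 1 && $depth < $maxDepth; $depth++) {
        $row = $lookup($woeId);
        if ($row === null) {
            break; // unknown WOE_ID: return what we have so far
        }
        $names[] = $row['Name'];
        $woeId = (int) $row['Parent_ID'];
    }
    return implode(', ', $names);
}
```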
This solution is generally more accurate and useful than matching only at the city level, but it will require you to use third-party services for geocoding when a user signs up if you only have their address. Hope it still helps.
1) Get the user's location. Use as much information as you can get:
Ask them where they are when they register
Use the HTML5/JS navigator.geolocation API http://merged.ca/iphone/html5-geolocation (works well with iPhones and the like)
Use IP geolocation database like http://www.maxmind.com/app/geolitecity (it's free, can be downloaded once and used locally, though it should be updated monthly for best results)
2) You need to store the location's latitude and longitude along with the user. If you don't already have it from a sensor lookup or GeoIP database, you will need to do a geocode lookup on the address. You asked not to use a third-party service, but there really isn't a way around it (that's why the services exist; rolling your own is very complicated and expensive). See http://en.wikipedia.org/wiki/Geocoding#List_of_some_geocoding_systems for a list of geocoding services you can use.
// Google Maps Geocoding API example
$address = "$line1, $city, $state $zip, $country";
$query = http_build_query(array(
    'key'     => 'YOUR_GMAPS_API_KEY', // replace with your own API key
    'address' => $address,
));
$ch = curl_init('https://maps.googleapis.com/maps/api/geocode/json?' . $query);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);
$data = ($response === false) ? null : json_decode($response, true);
if (isset($data['status']) && $data['status'] === 'OK') {
    $latLong = $data['results'][0]['geometry']['location']; // array('lat' => ..., 'lng' => ...)
} else {
    $latLong = null; // lookup failed: log it and retry later
}
3) You can now search users by calculating the distance from your search location to each user's location and putting a limit on it for the proximity. Example:
(reference: http://jehiah.cz/a/spatial-proximity-searching-using-latlongs)
$myLat = 45.5;
$myLong = -73.5833;
$range = 2; // miles
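// LEAST(1, ...) keeps floating-point rounding from pushing ACOS's argument above 1,
// which would make ACOS return NULL (e.g. for a user exactly at the search point).
$sql = "SELECT *,
        TRUNCATE(DEGREES(ACOS(LEAST(1,
            SIN(RADIANS(latitude)) * SIN(RADIANS({$myLat}))
            + COS(RADIANS(latitude)) * COS(RADIANS({$myLat}))
            * COS(RADIANS(longitude - {$myLong}))
        ))) * 69.09, 1) AS distance
    FROM users
    HAVING distance < {$range}";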
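On a large users table, this query evaluates the trigonometry for every row. An optional refinement, not part of the referenced approach, is to first restrict rows to a latitude/longitude box around the search point so MySQL can use ordinary indexes on `latitude`/`longitude`. A minimal sketch, using the same 69.09 miles-per-degree constant as the query:

```php
<?php
// Latitude/longitude box around ($lat, $lon) that contains every point within $miles.
// Approximation: not valid across the poles or the 180th meridian.
function boundingBox(float $lat, float $lon, float $miles): array
{
    $dLat = $miles / 69.09;                              // degrees of latitude
    $dLon = $miles / (69.09 * cos(deg2rad($lat)));       // degrees of longitude shrink with latitude
    return [
        'minLat' => $lat - $dLat, 'maxLat' => $lat + $dLat,
        'minLon' => $lon - $dLon, 'maxLon' => $lon + $dLon,
    ];
}

// Example: a WHERE clause to add before the HAVING filter of the distance query.
$box = boundingBox(45.5, -73.5833, 2);
$where = sprintf(
    'WHERE latitude BETWEEN %F AND %F AND longitude BETWEEN %F AND %F',
    $box['minLat'], $box['maxLat'], $box['minLon'], $box['maxLon']
);
```

The exact distance calculation still runs afterwards; the box only discards rows that cannot possibly be within range.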