Combining SQL results into groups based on column value - php

I have a table with over 2 million rows. One of the columns is an address, and some rows share a common address. I am using PHP.
On my website, I want the user to enter their zip code, and I will return all results within that zip code. I would then use Google to geolocate them on a map. The problem is that since Google charges per query, I can't waste time and money requesting coordinates for an address I already have. Here is what I believe to be the correct approach:
1) Ask the user for their zip code
2) Run "Select * with 'Zip Code' = $user_zip" (paraphrasing)
3) Run a geolocate on the first address and plot it on the map
4) Check for matching addresses in the result set and group them with the mapped result
5) Find the next new address
6) Repeat steps 3-6 until complete
Is there a better way to approach this? I am looking for efficiency, an easy way to manipulate all matching results at once, and the smallest number of queries. If my approach is correct, can someone please help me with the logic for steps 3-5?

If I understand this right, what you are trying to do is render a map with a marker for each record in your database that is within a certain zip area, and your challenge is that you need coordinates to render each marker. The biggest issue with your approach, in terms of wasting resources, is that you do not store the coordinates of each address in your database. I would suggest that you:
1 - Alter the endpoint (or script, or whatever) that creates these records in your DB to fetch the coordinates and store them in the database.
2 - Run a one-time migration to fetch coordinates for each existing record. Doing this for 2 million rows could be "costly" with Google's Geocoding (the estimate is about $1,000 for 2 million API calls). To save costs you could look into some of the open-source map tools (Nominatim on OpenStreetMap data, for example).
Either way, fetching coordinates during the request lifecycle is both a waste of resources and a significant drag on response times.
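A minimal sketch of point 1, assuming a PDO connection, a places table with lat/lng columns, and the Google Geocoding web service; all names here are illustrative:

// Geocode once, at insert time, so the map page never calls Google per request.
// $pdo, the places table and its columns are assumptions for illustration.
function geocode($address, $apiKey) {
    $url = 'https://maps.googleapis.com/maps/api/geocode/json?address='
         . urlencode($address) . '&key=' . urlencode($apiKey);
    $response = json_decode(file_get_contents($url), true);
    if (!$response || $response['status'] !== 'OK') {
        return null; // over quota, zero results, network error, ...
    }
    return $response['results'][0]['geometry']['location']; // ['lat' => ..., 'lng' => ...]
}

function insertPlace(PDO $pdo, $address, $zip, $apiKey) {
    $coords = geocode($address, $apiKey);
    $stmt = $pdo->prepare('INSERT INTO places (address, zip, lat, lng) VALUES (?, ?, ?, ?)');
    $stmt->execute([
        $address,
        $zip,
        $coords ? $coords['lat'] : null, // null marks rows for a retry pass
        $coords ? $coords['lng'] : null,
    ]);
}

Point 2 is then the same geocode() call in a loop over the rows WHERE lat IS NULL, throttled to respect the rate limit.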

Related

jquery autocomplete for shops near location

I've got a list of shops that I have put in a JavaScript array, and I have their addresses as well.
I need to create an autocomplete which lets me type in a city name and displays the 3 shops nearest to that location. I imagine it will need to interface with Google's APIs somehow, but I'm not sure where to start.
I've got the actual autocomplete jQuery stuff working with an AJAX script, but I don't know how to work out which shops are located nearest.
You need the lat/long locations of the stores: https://developers.google.com/maps/documentation/geocoding/ Then you need the lat/long location of the user. With some relatively simple mathematics you can then calculate the distance between these two points:
// Approximate great-circle distance in km (57.29578 = degrees per radian):
$distance = round((6371 * 3.1415926 * sqrt(($lat2 - $lat1) * ($lat2 - $lat1)
    + cos($lat2 / 57.29578) * cos($lat1 / 57.29578)
    * ($lon2 - $lon1) * ($lon2 - $lon1)) / 180), 1);
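For instance, with two assumed test points (roughly Amsterdam and Rotterdam):

$lat1 = 52.3702; $lon1 = 4.8952; // Amsterdam (assumed test values)
$lat2 = 51.9244; $lon2 = 4.4777; // Rotterdam
$distance = round((6371 * 3.1415926 * sqrt(($lat2 - $lat1) * ($lat2 - $lat1)
    + cos($lat2 / 57.29578) * cos($lat1 / 57.29578)
    * ($lon2 - $lon1) * ($lon2 - $lon1)) / 180), 1);
echo $distance; // roughly 57 km as the crow flies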
If you have a large number of stores and a large number of users, I advise caching these distances in a MySQL table; you have to do this for each store in your database. So you create a table for each zip code (for example) that requests this, and put up a cron job to remove these tables every hour or so.
So the process:
User asks for the nearest store
You get his location through the Google API (or your own storage)
Check if there's a table for his location
If yes, give him the results directly; if not, generate the table and then give him the results
Mind that Google only allows a limited number of data requests. Even though this number is huge (I believe 25,000 requests per day), it may be advisable to store the lat/lon locations of your stores AND users; that would also improve speed.
I made something similar to this: I fetched the lat/lon locations at the moment a location was inserted into the database and inserted them into a separate per-zipcode lat/lon table.
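A rough sketch of that per-zipcode cache, reusing the distance formula from above (the $pdo handle, the stores table, and the distances_* naming are all illustrative assumptions):

// Build (or reuse) a per-zipcode table of store distances.
function distancesForZip(PDO $pdo, $zip, $zipLat, $zipLon) {
    $table = 'distances_' . preg_replace('/\D/', '', $zip); // digits only, to be safe

    // Reuse the cache table if an earlier visitor from this zip built it.
    if (!$pdo->query("SHOW TABLES LIKE '$table'")->fetch()) {
        $pdo->exec("CREATE TABLE `$table` (store_id INT PRIMARY KEY, distance FLOAT)");
        $insert = $pdo->prepare("INSERT INTO `$table` VALUES (?, ?)");
        foreach ($pdo->query('SELECT id, lat, lon FROM stores') as $store) {
            $d = round((6371 * 3.1415926 * sqrt(
                    pow($store['lat'] - $zipLat, 2)
                    + cos($store['lat'] / 57.29578) * cos($zipLat / 57.29578)
                    * pow($store['lon'] - $zipLon, 2)
                ) / 180), 1);
            $insert->execute([$store['id'], $d]);
        }
    }
    return $pdo->query("SELECT * FROM `$table` ORDER BY distance")->fetchAll();
}
// A cron job would then DROP the distances_* tables every hour or so.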

How to quickly determine if multiple places are within users vicinity - Google Places API

I am designing a web app where I need to determine which places listed in my DB are within the user's driving distance.
Here is a broad overview of the process that I am currently using:
Get the user's current location via Google's Maps API
Run through each place in my database (approx. 100), checking whether the place is within the user's driving distance using the Google Places API. I return and parse the JSON with PHP to see if any locations exist given the user's coordinates.
If a place is within the user's driving distance, display the top locations (limited to 20 by Google Places); otherwise don't display it
This process works fine when I am running through a handful of places, but running through 100 places is much slower, and makes 100 API calls. With Google's current limit of 100,000 calls per day, this could become an issue down the road.
So is there a better way to determine which places in my database are within a user's driving distance? I do not want to keep track of addresses in my DB; I want to rely on Google for that.
Thanks.
You can use the formula found here to calculate the distance between zip codes:
http://support.sas.com/kb/5/325.html
This is not precise (door-step to door-step) but you can calculate the distance from the user's zip code to the location's zip code.
Using this method, you won't even have to hit Google's API.
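If you keep a local table of zip-code centroids (freely available, e.g. from GeoNames), the lookup needs no external API at all. A minimal sketch, assuming a zips(zip, lat, lon) table and a PDO connection; the names are illustrative:

// Great-circle distance between two zip codes' center points, no Google call.
// 3959 is roughly Earth's radius in miles. (No handling of unknown zips here.)
function zipDistanceMiles(PDO $pdo, $zipA, $zipB) {
    $stmt = $pdo->prepare('SELECT lat, lon FROM zips WHERE zip = ?');
    $stmt->execute([$zipA]);
    $a = $stmt->fetch(PDO::FETCH_ASSOC);
    $stmt->execute([$zipB]);
    $b = $stmt->fetch(PDO::FETCH_ASSOC);

    $latA = deg2rad($a['lat']);
    $latB = deg2rad($b['lat']);
    $dLon = deg2rad($a['lon'] - $b['lon']);
    // min() clamps float error so acos() never sees a value just above 1.
    return 3959 * acos(min(1.0, sin($latA) * sin($latB) + cos($latA) * cos($latB) * cos($dLon)));
}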
I have an unconventional idea for you. This will seem very, very odd when you think about it for the first time, as it works in exactly the opposite order from what you would expect. However, you might see the logic.
In order to put it in action, you'll need a broad category of stuff that you want the user to see. For instance, I'm going to go with "supermarkets".
There is a wonderful API, part of Google Places, called nearbySearch. Its true wonder is that it allows you to rank places by distance. We will make use of this.
Pre-requisites
Modify your database to store the unique ID returned for nearbySearch places. This isn't against the ToS, and we'll need it.
Get a list of those IDs.
The plan
When you get the user's location, query nearbySearch for your category, and loop through results with the following constraints:
If the result's ID matches something in your database, you have that result. Bonus #1: it's sorted by distance, ascending! Bonus #2: you already get the lat/long for it!
If the result's ID does not match, you can either silently skip it or use it and add it to your database. This means that you can quite literally update your database on-the-fly with little to no manual work as an added bonus.
When you have run through the request, you will have IDs that never came up in the results. Calculate the point-to-point distance of the furthest result in Google's data and you will have the max distance from your point. If this is too small, use the technique I described here to do a compounded search.
The only requirement is: you need to know roughly what you are searching for. However, consider this: your normal query cycle takes anywhere between 1 and 100 Google queries. My method takes 1 for a 50km radius. :-)
To calculate distances, you will need Haversine's formula rather than doing a zip code lookup, by the way. This has the added advantage of being truly international.
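As a rough sketch of the plan above in PHP, using the Places Nearby Search web service (the table and column names, the API key handling, and the "supermarket" type are assumptions for illustration; the response's place_id field is the unique ID you would store):

// Query nearbySearch ranked by distance and keep only the places whose IDs
// already exist in our database.
function knownPlacesNearby(PDO $pdo, $lat, $lng, $apiKey) {
    $url = 'https://maps.googleapis.com/maps/api/place/nearbysearch/json'
         . '?location=' . $lat . ',' . $lng
         . '&rankby=distance&type=supermarket&key=' . urlencode($apiKey);
    $response = json_decode(file_get_contents($url), true);

    // IDs we already know about, loaded once and flipped for O(1) lookups.
    $known = array_flip($pdo->query('SELECT place_id FROM places')->fetchAll(PDO::FETCH_COLUMN));

    $matches = [];
    foreach ($response['results'] as $place) {
        if (isset($known[$place['place_id']])) {
            $matches[] = $place; // already distance-sorted; geometry.location is free
        }
        // else: optionally INSERT the new place to grow the database on the fly
    }
    return $matches;
}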
Important caveats
This search method directly depends on the trade-off between the places you know about and the distance. For radii of less than about 10km, this method will only ever generate one request.
If, however, you have to do compounded searching, bear in mind that each request cycle will cost you 3N, where N is the number of queries generated on the last cycle. Therefore, if you only have 3 places in a 100km radius, it makes more sense to look up each place individually.

fusion tables limitation

I want to create a web site using Google Maps and Fusion Tables where users can leave a marker on a map with a message, so that other users will be able to see this marker on their map. And all of this in real time.
I've already created a small prototype.
And I've got a question: Google has limitations on using FT, so users won't be able to see markers placed beyond the first 100,000 rows:
Only the first 100,000 rows of data in a table are mapped or included in query results. Queries with spatial predicates only return data from within this first 100,000 rows. Therefore, if you apply a filter to a very large table and the filter matches data in rows after the first 100K, these rows are not displayed.
How can I overcome this limitation? Or is it better to create my own database and use a marker clusterer to work with large amounts of markers, and therefore forget about FT?
AFAIK there is no way around the 100K limitation. Perhaps a Google Premier license, which costs money, would allow you to overcome this; I'm not sure. Another possibility is to maintain 5 Fusion Tables, each with a maximum of 100K rows. You can display 5 Fusion Table layers at a time via the GMap API, so I don't see why this wouldn't work; you'd just have to run your query code against all the current layers. I've done this with 2 layers (both much smaller than 100K) and it worked fine.
At your broadest view, where you are threatened by the 100k limit, use the following algorithm:
Union(newest 50k records, random sample of 50k records from all older)
and plot this "limited" set. There is no way that view is actually going to display every single point to your user.
When he zooms in, grab the current viewing dimensions through JS and filter your DB records for those within the viewing area.
That is, first you'd check whether the filter returns 100k rows (meaning there are probably 100k+ records that could be displayed); if so, apply the random-sampling algorithm again to reduce the set to 100k. If the filter returns <100k, then you no longer need to random-sample and you are no longer threatened by the limit.
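A sketch of that sampling in MySQL, wrapped in PHP (table and column names are assumptions; note that ORDER BY RAND() is expensive at this scale, so a precomputed random column would be better in production):

$sql = "
    (SELECT id, lat, lng FROM markers
      ORDER BY created_at DESC LIMIT 50000)  -- newest 50k
    UNION ALL
    (SELECT id, lat, lng FROM markers
      WHERE id NOT IN (SELECT id FROM
            (SELECT id FROM markers ORDER BY created_at DESC LIMIT 50000) newest)
      ORDER BY RAND() LIMIT 50000)           -- random 50k of the rest
";
$rows = $pdo->query($sql)->fetchAll();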

XML search or DB search / javascript (client side) or php (server side) calculation

Let's say your site has 200,000 unique users a day, so your server is heavily loaded, and you do NOT have the resources to buy a bigger/better server. You are stuck with what you have.
Now, whenever a user comes to your site, you need to do some calculation (calculate the distance between the user's city, as detected via GeoIP, and some whitelist of cities, then figure out the nearest city within a 140-mile radius).
Would you do this calculation via PHP or via JavaScript?
First, would you precalculate all nearby cities within a 140-mile radius of the whitelisted cities? For example, whitelist city 1 can have 20 nearby cities. Or would you do an on-the-fly calculation every time?
For example:
Whitelist = Detroit, MI
and nearby city = Kalamazoo, MI (140 miles)
Second, if precomputed: would you store this in an XML file or some MySQL table? Then we just have to search through a table (MySQL, or XML no more than 1 MB in size). I am guessing the XML route would be inefficient because the client browser (JavaScript) would have to download the 1 MB XML file and search through it, which would make page load time even slower. Using a DB might be faster, but then the DB load increases (if 200,000 unique users are trying to load the page over the course of a day).
Maybe the best way would be to precompute, store the precomputed results in XML, and then use PHP to search through the XML and find the nearest whitelisted city to the user?
If you, the site, are actually relying on the city information, then you must do the calculation on the server.
Database queries are almost always going to be faster than XML searches for sufficiently large XML files. You can optimize the query, MySQL will cache things, etc.
Pre-calculating all city-city distances would be a way to go, for sure. GeoIP doesn't only provide city names; it gives actual latitude/longitude locations as well. I'm sure the list of possible cities changes fairly regularly, too.
I would look into using the geospatial capabilities of MySQL. There is a general overview of searching by coordinates here:
Fastest Way to Find Distance Between Two Lat/Long Points
In short, you will set up a database table of the cities you care about, with their lat/long, and query that table based on the GeoIP-provided lat/long.
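As a sketch, assuming a cities table with name, lat and lon columns, the whole "nearest city within 140 miles" lookup can be one query:

// Let MySQL compute the great-circle distance and sort; 3959 ~ Earth radius in miles.
$stmt = $pdo->prepare("
    SELECT name,
           3959 * ACOS(
               SIN(RADIANS(:lat1)) * SIN(RADIANS(lat))
             + COS(RADIANS(:lat2)) * COS(RADIANS(lat)) * COS(RADIANS(lon - :lon))
           ) AS miles
    FROM cities
    HAVING miles <= 140
    ORDER BY miles
    LIMIT 1
");
$stmt->execute(['lat1' => $userLat, 'lat2' => $userLat, 'lon' => $userLon]);
$nearest = $stmt->fetch(PDO::FETCH_ASSOC); // false if nothing is within 140 miles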

most efficient way of calculating nearest city (from whitelist)

I have a whitelist of cities. Let's say Seattle, Portland, Salem. Using GeoIP, I'd detect the user's city. Let's call it $user_city. Based on $user_city, I want to display classified listings from the nearest city in my whitelist (Seattle || Portland || Salem) within 140 miles. If no whitelisted city is within 140 miles, I'd just show a drop-down and ask the user to manually select a city.
There are a few ways of doing this:
calculate this on the fly (I found an algorithm in one of the SO answers)
with the help of a DB (let me explain):
create a table called regions
regions will have
city 1 | city 2 | distance (up to 140 miles)
city 1 = cities from the whitelist
city 2 = any city within 140 miles of city 1
This would create a reasonably sized table: if my whitelist has 200 cities, and there are 40 cities (or towns) within 140 miles of each, this would create 8,000 rows.
Now, when a user comes to my site:
1) I check if the user is already in a whitelisted city (the "city 1" column). If so, display that city.
2) If not, check if $user_city is in the "city 2" column.
2a) If it is, get the whitelisted city with the lowest distance.
2b) If it is not, display the drop-down for manual input.
Final constraint: whichever method we select, it has to work from within an iframe. I mean, can I create this page on mysite1.com and embed it inside someothersite2.com in an iframe? Will it still be able to get $user_city and find the nearest whitelisted city? I know there are some cross-domain scripting rules, so I am not sure whether an iframe would be able to get the user's IP address, pass it to GeoIP, and resolve it to $user_city.
So, my question:
How best to do this? If a lot of people embed my page in theirs (using an iframe), then my server would get pounded tens of thousands of times per second (wishful thinking, but let's assume that's the case). I don't know if a DB would be able to handle so much pounding, and I don't want to have to pay for more DB servers or web servers; I want to minimize the resource requirements at my end. So I don't mind offloading a bit of work to the user's browser via JavaScript.
EDIT:
Some answers have recommended storing lat/long and then doing the math. The reason I suggested creating a 'regions' table is that this way all the math is precomputed: if I have a whitelist of cities, and I precompute every possible nearby city for each whitelisted city, then I don't have to compute distances (using the Haversine formula, for example) every time.
Is it possible to offload all of this to the user's browser via some crafty use of JavaScript? I don't want to overload my server for a free service. It might make money, but I am very close to broke and I am afraid my server would go down before I make enough money to pay for the upgrades.
So, the three constraints of this problem are: 1) it should work from inside an iframe (I am hoping this will go viral and every blogger will want to embed my site in their page's iframe), 2) it should be very fast, and 3) it should minimize the load on my server.
Use one table, City, and do a MySQL math calculation for every query, with the addition of a cache layer, e.g. memcache. Fair performance and very flexible!
Use two tables, City (id, lat, lng, name) and Distance (city_id1, city_id2, dist), and get your result with a traditional JOIN. (Could use a cache layer too.) Not very flexible.
Custom data structure: CityObj (id, lat, lng, data[blob]). Just serialize and compress a PHP array of the cities and store it. This might raise your eyebrows, but as we know the bottleneck is never CPU or memory, it's disk I/O. This is one read from an index on an INT, as opposed to the JOIN, which uses a temp table. This is not very flexible but will be fast and scalable, and easy to shard and cluster.
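For what the third option looks like in practice, here is a minimal sketch of the serialize-and-compress round trip (the city_cache table is an assumption):

// Store the whole city list as one compressed blob: a lookup is then a single
// primary-key read. city_cache(id, data BLOB) is an assumed table.
$blob = gzcompress(serialize($cities)); // $cities: array of ['id', 'lat', 'lng', ...]
$pdo->prepare('REPLACE INTO city_cache (id, data) VALUES (1, ?)')->execute([$blob]);

// Later, one indexed read rehydrates everything for the in-PHP distance scan:
$raw = $pdo->query('SELECT data FROM city_cache WHERE id = 1')->fetchColumn();
$cities = unserialize(gzuncompress($raw));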
Is it possible to offload all of this to user's browser via some crafty use of Java Script? I don't want to overload my server for a free service. It might make money but I am very close to broke and I am afraid my server would go down before I make enough money to pay for the upgrades.
Yes, it is possible... using the Google Maps API and its geometry library. The function you are looking for is google.maps.geometry.spherical.computeDistanceBetween. Here is an example that I made a while ago that might help get you started. I use jQuery here. Take a look at the source to see what's happening and modify as needed. Briefly:
supplierZips is an Array of zip codes comparable to your city whitelist.
The first thing I do on page load is geocode the whitelist locations. You can actually do this ahead of time and cache the results, if your city whitelist is constant. This'll speed up your app.
When the user enters a zip code, I first check that it's a valid zip against a JSON dataset of all valid zip codes in the U.S. ( http://ampersand.no.de/maps/validUSpostalCodes.json, 352 kb, generated from the zip code data at http://www.geonames.org).
If the zip is valid, I compute the distance between that zip and each location in the whitelist, using the aforementioned computeDistanceBetween from the Google Maps API.
Hope this helps get you started.
You just have to get the lat and long of each city and add them to the database, so every city has only one record. No distances are stored, just each city's position on the globe.
Once you have that, you can easily do a query using the haversine formula ( http://en.wikipedia.org/wiki/Haversine_formula ) to get the nearest cities within a range.
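One common refinement, my addition rather than part of this answer: pre-filter with an indexable bounding box so the trigonometry only runs on a handful of candidate rows. A sketch, assuming a cities(name, lat, lon) table and the 140-mile radius:

$r = 140 / 3959;                               // 140 miles as an angle in radians
$dLat = rad2deg($r);                           // latitude degrees spanned by 140 miles
$dLon = rad2deg($r / cos(deg2rad($userLat)));  // more longitude degrees are needed away from the equator
$stmt = $pdo->prepare("
    SELECT name, lat, lon FROM cities
    WHERE lat BETWEEN :latMin AND :latMax
      AND lon BETWEEN :lonMin AND :lonMax
");
$stmt->execute([
    'latMin' => $userLat - $dLat, 'latMax' => $userLat + $dLat,
    'lonMin' => $userLon - $dLon, 'lonMax' => $userLon + $dLon,
]);
// Apply the exact haversine formula to the few surviving rows to pick the nearest.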
I know there are some cross-domain scripting rules so I am not sure if an iframe would be able to get the user's IP address
It will be possible to get the user's IP (or whatever else you need) if you just gather that info from within the embedded page, since the embedded page is still served by your server.
I don't know if a DB would be able to handle so much pounding
If you have that many requests, you should by then have found a way to make a buck with it :-) which you can use for upgrades :D
Your algorithm seems generally correct. What I would do is use PostGIS (a PostgreSQL extension, and easier to set up than it looks :-D). I believe the additional learning curve is totally worth it; it is THE standard for geodata.
If you put the whitelist cities in as POINTs, with latitudes and longitudes, you can actually ask PostGIS to sort by distance to a given lat/lon. It should be much more efficient than doing it yourself (PostGIS is very optimized).
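A sketch of what that query could look like, assuming a whitelist_cities table with a GEOGRAPHY column named geog (so ST_Distance returns meters):

// Nearest whitelisted city, sorted by PostGIS rather than in PHP.
$stmt = $pdo->prepare("
    SELECT name,
           ST_Distance(geog, ST_SetSRID(ST_MakePoint(:lon, :lat), 4326)::geography) AS meters
    FROM whitelist_cities
    ORDER BY meters
    LIMIT 1
");
$stmt->execute(['lon' => $userLon, 'lat' => $userLat]);
$nearest = $stmt->fetch(PDO::FETCH_ASSOC);
if ($nearest && $nearest['meters'] <= 140 * 1609.34) {
    // within 140 miles of a whitelisted city; otherwise show the drop-down
}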
You could get lats and longs for your user cities (and the whitelist cities) by using a geocoding API like Yahoo PlaceFinder or Google Maps. What I would do is have a table (either the same as the whitelist cities table or not) that stores city name, lat, and lon, and do lookups on that. If the city name isn't found, hit the API you are using and cache the result in the table. This way you'll quickly stop needing to hit the API except for obscure places. The API is fast, too.
If you're really going to be seeing that kind of server load, you may want to look into using something besides PHP (such as node.js). Incidentally, you shouldn't have any trouble geocoding from an iframe; from the point of view of the server, it's just like the browser is visiting that page "normally".
