I was thinking about using either the Google or Yahoo API to calculate the distance from one zip code to another, and to get the city for a zip code. However, the API calls are limited, and the website I am working on will query the API multiple times across multiple pages.
I was wondering where I can find either a database of zip codes and cities, or a zip code database with lat/long that I can query for distances.
I did some googling, and most of the free ones I downloaded were either inaccurate or too large to fit in my database.
Thanks
I will be using PHP
In the past I have used this PHP class. While I haven't used it very extensively, it did what I needed it to do in terms of Zip Code lookup and distance.
Commercial zip code databases with lat/long are available. They are not expensive and are not large (well, if you restrict to the USA, 40K small records or so). I have had good luck with zip-finder.com in the past, but one important caveat: once you begin maintaining your own zip code table(s), you will need to keep it in sync with whatever the USPS does with zip codes over time. One really irritating thing they do is remove zip codes.
That said, calculating distance is pretty trivial, but you only get one lat/long point per zip code (more or less the centroid of the area). For a large zip code, your distance accuracy can have a mile or more of slop in it, so be aware of that.
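For reference, here is a minimal PHP sketch of that centroid-to-centroid great-circle (haversine) calculation; the example coordinates are only approximate centroids, not taken from any particular database.

```php
<?php
// Great-circle (haversine) distance between two lat/long points, in miles.
// Good enough for zip-centroid estimates; remember the caveat above about
// large zip codes having a mile or more of slop.
function haversine_miles(float $lat1, float $lon1, float $lat2, float $lon2): float
{
    $earthRadiusMiles = 3958.8;

    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);

    $a = sin($dLat / 2) ** 2
       + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;

    return $earthRadiusMiles * 2 * asin(sqrt($a));
}

// Example: rough centroids for 10001 (New York, NY) and 90210 (Beverly Hills, CA).
echo round(haversine_miles(40.75, -73.99, 34.10, -118.41)) . " miles\n";
```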
This is the best free zip code database you will find: http://federalgovernmentzipcodes.us/
It works fine if you don't need super-accurate lat/lng. Unfortunately the lat/lng values are not that accurate, not because of the two decimal places, but because they are simply a bit off.
I reworked this database to make the lat/lng more accurate by hitting the Google Maps geocoding API:
maps.googleapis.com/maps/api/geocode/json?address="zip code city state";
You can try this API
http://ws.geonames.org/postalCodeSearch?postalcode=10033
It gives out results with lat and long too.
Check out PHP-ZipCode-Class. You can adapt it to any number of zip code databases. Personally, I would go with a commercial database as the free ones can easily get outdated. Trust me on that. I tried to maintain a free database on a high traffic e-commerce site for years. Would have been a LOT CHEAPER to just buy a commercial database. If you really insist on a free database, here are a couple that I know of (but have not tried).
http://zipcode.jassing.com/
http://federalgovernmentzipcodes.us/
You're correct about the API limitations. The same is true with Bing (although their API is pretty good).
If you go with a database...
Although it takes a little work, you can get it free from the US Census TIGER data, which is what most low-end third-party ZIP Code databases are based on. Just know that in 2000 they replaced the ZIP Code with the ZCTA (which is ZIP Code-like, but not exact). I’ve included the link below, which has an explanation of ZCTAs from the census site: http://www.census.gov/geo/ZCTA/zcta.html
Other things to consider: most latitude and longitude centroids are based on geometric calculations, meaning they could fall in the middle of forestry land, large lakes, or parks (e.g. Central Park) where no people live. I’m not sure what your needs are, but that may be fine for you. If you need population-based centers, you will probably need commercial data (see http://greatdata.com/zip-codes-lat-long and click the ‘more details’ tab at the top for an explanation of this topic).
Also, determine whether you only need the major city for each ZIP Code (a one-to-one relationship, normally 40,000+ records) or, when a ZIP Code boundary covers more than one city, you need each city listed as a separate record (~57,000 records). Most locators and address validation utilities need the latter.
I've been using the zip-code database from http://zip-info.com/ for many years. It's updated every quarter (very important) and is very accurate. The database is about $50 and I purchase an update twice a year.
There are something like 54,000 five-digit zip codes in the US, so any good database is going to be large; just strip out the data fields you don't need (limit it to zip/lat/lon) if you want to reduce storage (though the savings are minimal). As Rob said, distance calcs are easy to do: just look for a script that does great-circle calculations as a starting point.
My end goal is sending 3 million records to the Google Maps API to show as markers, but before I get to that...
I haven't been able to even load up 1 million into a PHP array. The data is 18 digits for each element, with 2 columns and 1 million rows.
The query is just a straight-up SELECT *, but I'm running out of memory when looping through and storing the matching records in an array. I've tried using an SplFixedArray but I'm not having any luck with that either.
I need to find a good way to batch this and split it up. After running some tests, I can pull about 500k into an array without hitting the memory limit (which is already 512M!), so could I just do this in 2 or 3 queries? I will still need the full amount of data saved into arrays on the server side before the page loads so I can pass it to Maps, so I'm assuming batching won't fix that, as it will all still be in memory?
Edit: there's a big comment chain growing, but mostly everyone is in agreement that this is a bad idea for one reason or another. So my solution is to scale it back to about 300k points, which will be achievable with a lot less head-bashing.
There really isn't any point trying to pump millions of map markers to google maps.
It simply isn't feasible from a memory perspective or a performance perspective. Just think of the size of that data: even if each marker is only a single byte of data, that's 3 megabytes. But in reality, each marker will need about 20 bytes minimum, just for the co-ords and the JSON markup, so that's 60 megabytes before you even start adding a description to each one. Your system isn't going to be able to transfer that to Google anything like fast enough to make it usable on the web. And even if you could, Google isn't going to accept you sending that kind of volume of data to them every time someone wants to look at your map.
And in any case, having all those markers on the map wouldn't be usable anyway; they'd obscure the whole map and each other, and just make a mess.
Even sending a few hundred map markers to Google at once is pushing it. Sending millions is just not going to happen.
So what can you do instead? How do other sites manage to have hundreds of thousands or even millions of markers? The answer is simple: they don't. They only send markers to Google for the portion of the map that is being displayed.
At wider zoom levels, you don't even display markers; you'd have a pre-rendered heat-map showing where your coverage is. At closer zoom levels, you would only load the map pins for the area being displayed. As the map is moved, you would load more.
A good example of what I mean is the Xfinity wifi map. They have hundreds of thousands of points on the map, but never load more than a few dozen markers at once. Quick and manageable.
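To make that viewport-only idea concrete, here is a minimal sketch of a PHP endpoint that returns just the markers inside the visible bounding box. The table, columns, and request parameters are placeholders, not anything from the question.

```php
<?php
// Hypothetical endpoint: the map JS passes its current viewport bounds
// (south, west, north, east) and gets back only the markers inside them.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$south = (float) $_GET['south'];
$west  = (float) $_GET['west'];
$north = (float) $_GET['north'];
$east  = (float) $_GET['east'];

// A 'markers' table with indexed lat/lng columns is assumed; the LIMIT keeps
// the payload small even when the user is zoomed far out.
$stmt = $pdo->prepare(
    'SELECT id, lat, lng FROM markers
     WHERE lat BETWEEN :south AND :north
       AND lng BETWEEN :west AND :east
     LIMIT 500'
);
$stmt->execute(['south' => $south, 'north' => $north, 'west' => $west, 'east' => $east]);

header('Content-Type: application/json');
echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));
```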
Leaving aside the issue of whether fetching 1M records into the client is a good idea, and assuming that it is necessary: it is important to understand how the MySQL client protocol works. There are two modes. One stores the entire result set in the client at once (STORE_RESULT), with memory allocated for the whole set; the other fetches one row at a time (USE_RESULT), with memory allocated only for the current row. To avoid the memory problem, your client needs to use the USE_RESULT mode.
For some examples, take a look at:
http://php.net/manual/en/mysqlinfo.concepts.buffering.php
You want an unbuffered query.
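A minimal sketch of the unbuffered (USE_RESULT) mode with mysqli; the table and column names here are made up:

```php
<?php
// MYSQLI_USE_RESULT streams rows one at a time instead of buffering the whole
// result set in PHP memory (the default, MYSQLI_STORE_RESULT).
$mysqli = new mysqli('localhost', 'user', 'pass', 'app');

$result = $mysqli->query('SELECT lat, lng FROM points', MYSQLI_USE_RESULT);

while ($row = $result->fetch_assoc()) {
    // Process each row here (e.g. stream it out as JSON) instead of
    // accumulating everything into one giant array.
}

$result->free();   // must free the result before running another query on this connection
$mysqli->close();

// The PDO equivalent is to set PDO::MYSQL_ATTR_USE_BUFFERED_QUERY to false.
```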
A company we do business with wants to give us a 1.2 GB CSV file every day containing about 900,000 product listings. Only a small portion of the file changes every day, maybe less than 0.5%, and it's really just products being added or dropped, not modified. We need to display the product listings to our partners.
What makes this more complicated is that our partners should only be able to see product listings available within a 30-500 mile radius of their zip code. Each product listing row has a field for what the actual radius for the product is (some are only 30, some are 500, some are 100, etc.; 500 is the max). A partner in a given zip code is likely to only have 20 results or so, meaning that there's going to be a ton of unused data. We don't know all the partner zip codes ahead of time.
We have to consider performance, so I'm not sure what the best way to go about this is.
Should I have two databases: one with zip codes and latitude/longitude, using the Haversine formula for calculating distance, and the other the actual product database? And then what do I do? Return all the zip codes within a given radius and look for matches in the product database? For a 500-mile radius that's going to be a ton of zip codes. Or should I write a MySQL function?
We could use Amazon SimpleDB to store the database...but then I still have this problem with the zip codes. I could make two "domains" as Amazon calls them, one for the products, and one for the zip codes? I don't think you can make a query across multiple SimpleDB domains, though. At least, I don't see that anywhere in their documentation.
I'm open to some other solution entirely. It doesn't have to be PHP/MySQL or SimpleDB. Just keep in mind our dedicated server is a P4 with 2 GB of RAM. We could upgrade the RAM; it's just that we can't throw a ton of processing power at this. Or we could store and process the database every night on a VPS somewhere, where it wouldn't be a problem if the VPS were unbearably slow while that 1.2 GB CSV is being processed. We could even process the file offline on a desktop computer and then remotely update the database every day... except then I still have this problem with zip codes and product listings needing to be cross-referenced.
You might want to look into PostgreSQL and PostGIS. It has features similar to MySQL's spatial indexing, without the need to use MyISAM (which, in my experience, tends to become corrupt, as opposed to InnoDB).
In particular Postgres 9.1, which allows k-nearest-neighbour search queries using GiST indexes.
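For illustration, a sketch of the kind of query that refers to, run from PHP via PDO. The 'listings' table, its 'geom' column (SRID 4326 point geometry with a GiST index), and the example coordinates are assumptions, not from the question.

```php
<?php
// Nearest-first lookup using Postgres 9.1 KNN ordering on a PostGIS GiST index.
// The same lng/lat pair is bound twice under different names because pdo_pgsql
// does not allow reusing a named placeholder.
$pdo = new PDO('pgsql:host=localhost;dbname=app', 'user', 'pass');

$sql = "
    SELECT id,
           ST_Distance(geom::geography,
                       ST_SetSRID(ST_MakePoint(:lng1, :lat1), 4326)::geography
           ) / 1609.34 AS miles
    FROM   listings
    ORDER  BY geom <-> ST_SetSRID(ST_MakePoint(:lng2, :lat2), 4326)  -- KNN via the GiST index
    LIMIT  200
";

$stmt = $pdo->prepare($sql);
$stmt->execute(['lng1' => -71.06, 'lat1' => 42.36,
                'lng2' => -71.06, 'lat2' => 42.36]);   // the partner's zip centroid

foreach ($stmt as $row) {
    // Keep only the rows whose 'miles' falls within that product's own radius field.
}
```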
Well, that is an interesting problem indeed.
This is actually two issues: how you should index the database, and how you keep it up to date. The first you can achieve as you describe, but normalization may or may not be a problem, depending on how you are storing the zip code. This primarily comes down to what your data looks like.
As for the second one, this is more my area of expertise. You can have your client upload the CSV to you as they currently do, keep a copy of yesterday's file, and run the two through a diff utility, or you can leverage Perl, PHP, Python, Bash, or any other tools you have to find the lines that have changed (see the sketch below). Pass those into a second step that updates your database. I have dealt with clients with issues along these lines, and scripting it away tends to be the best choice. If you need help organizing your script, that is always available.
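As a rough illustration of that diff idea in PHP (the file names are placeholders, and it assumes column 0 of the CSV is a stable product ID, which may not match the real layout):

```php
<?php
// Find products added or dropped since yesterday by comparing the two CSVs.
// Only product IDs are kept in memory, not the full 1.2 GB of rows.
function product_ids(string $path): array
{
    $ids = [];
    $fh = fopen($path, 'r');
    while (($row = fgetcsv($fh)) !== false) {
        $ids[$row[0]] = true;     // column 0 assumed to be the product ID
    }
    fclose($fh);
    return $ids;
}

$yesterday = product_ids('listings-yesterday.csv');
$today     = product_ids('listings-today.csv');

$added   = array_keys(array_diff_key($today, $yesterday));   // IDs to INSERT
$dropped = array_keys(array_diff_key($yesterday, $today));   // IDs to DELETE

// Since listings are only ever added or dropped (never modified), applying
// just these two small sets avoids reloading all 900,000 rows every night.
```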
Need some ideas/help on the best way to approach a new data system design. Basically, the way this will work is that there will be a bunch of different databases/tables that will need to be updated on a regular (daily/weekly/monthly) basis with new records.
The people that will be imputing the data will be proficient in excel. The input process will be done via a simple upload form. Then the system needs to add what was imported to the existing data in the databases. There needs to be a "rollback" process that'll reset the database to any day within the last week.
There will be approximately 30 to 50 different data sources. The main interface will be an online search area, so all of the records need to be indexed/searchable.
Ideas/thoughts on how best to approach this? It needs to be built mostly out of PHP/MySQL.
imputing the data
Typo?
What you are asking takes people with several years of formal training to do. Conventionally, the approach would be to draw up a set of requirements, then a set of formal specifications; then the architecture of the system would be designed, then the data design, then the code implementation. There are other approaches which tend to shortcut this. However, even in the case of a single table (although it does not necessarily follow that one "simple upload form" corresponds to one table), with a single developer there's a couple of days' work before any part of the design could be finalised, the majority of which is finding out what the system is supposed to do. But you've given no indication of the usage or data complexity of the system.
Also what do you mean by upload? That implies they'll be manipulating the data elsewhere and uploading files rather than inputting values directly.
You can't adequately describe the functionality of a complete system in a 9 line SO post.
You're unlikely to find people here to do your work for free.
You're not going to get the information you're asking for in a S.O. answer.
You seem to be struggling to use the right language to describe the facts you know.
Your question is very vague.
I’ve got a database full of UK postcodes, and now I’d like to be able to store the latitude and longitude of each of these postcodes along with its record in the database.
Is there any way that I can obtain this data free without violating any T&C's?
I know I could do this using the Google Maps API for each postcode, but I have well over 20,000 postcodes in this database, and fetching the lat and lng for each of them every time is not really an option.
Thanks in advance,
M
Postcode location data is available free now. See this answer:
https://stackoverflow.com/a/2625123/587803
https://www.ordnancesurvey.co.uk/opendatadownload/products.html
The one you want is "Code-Point Open".
Here it is: a dumped SQL table full of postcodes, lat/lon, and X/Y coordinates :)
Download the zip (UK-Postcodes.zip) and,
for more info, read [same-website]/guides/article.php?article=64 :)
I haven't tested it, but it should be working, lad :)
Happy coding ^^
Maybe Nominatim should do the trick. The only problem is that it might not be up to date.
Geonames.org offers a free downloadable database. You could try the Bing API or other commercial offerings, but they have rather restrictive T&Cs on how many queries you can make within a time period and how you can use the results (it takes a lawyer to interpret the nuances).
I found Geonames.org to be more than sufficient for my needs.
For postcode database, unfortunately I cannot help you.
But for Google... You know, you don't have to do this for EVERY postcode EACH time. You can cache the values in your database and do it only once for each new postcode. Although this will take some time, it's possible.
Yeah, it still sucks. But better than nothing.
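A minimal sketch of that cache-first approach, assuming a hypothetical `postcodes` table with nullable lat/lng columns and a Google Geocoding API key (and the question's T&C concern still applies: Google's terms restrict how geocoding results may be stored and displayed, so check them first):

```php
<?php
// Look up a postcode's lat/lng: use the cached value if we already have it,
// otherwise geocode it once via Google and store the result for next time.
function postcode_latlng(PDO $pdo, string $postcode, string $apiKey): ?array
{
    $stmt = $pdo->prepare('SELECT lat, lng FROM postcodes WHERE postcode = ?');
    $stmt->execute([$postcode]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    if ($row && $row['lat'] !== null) {
        return $row;                              // cache hit: no API call
    }

    $url = 'https://maps.googleapis.com/maps/api/geocode/json?address='
         . urlencode($postcode . ', UK') . '&key=' . $apiKey;
    $data = json_decode(file_get_contents($url), true);
    if (empty($data['results'])) {
        return null;                              // postcode not found
    }

    $loc = $data['results'][0]['geometry']['location'];
    $pdo->prepare('UPDATE postcodes SET lat = ?, lng = ? WHERE postcode = ?')
        ->execute([$loc['lat'], $loc['lng'], $postcode]);

    return ['lat' => $loc['lat'], 'lng' => $loc['lng']];
}
```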
http://www.freethepostcode.org/ provides information you may find useful.
I've only seen this done from a local database with commercial software, possibly because of the licensing restrictions the Royal Mail put on their postcode data.
There are APIs with reasonably high rate limits that can geocode UK postcodes, though. Yahoo's geocoding API is restricted to 5,000 hits a day, but that means you could have your entire database done in about four days, store the lat/lon in the database, and just use the API to add new addresses as they come in, say.
Also: how accurate do you need the information to be? If you just need a rough location, you could geocode only the outward code (the first part of the postcode, e.g. "BS1"); I've seen user-sourced databases of at least arguable legality around the net for that.
I am planning on creating a small website for my personal book collection. To automate the process a little bit, I would like to create the following functionality:
The website will ask me for the ISBN number of the book and will then automatically fetch the title and add it to my database.
Although I am mainly interested in doing this in PHP, I also have some Java implementation ideas for this. I believe it would also help if the answer were as language-agnostic as possible.
This is the LibraryThing founder. We have nothing to offer here, so I hope my comments will not seem self-serving.
First, the comment about Amazon, ASINs and ISBN numbers is wrong in a number of ways. In almost every circumstance where a book has an ISBN, the ASIN and the ISBN are the same. ISBNs have not simply switched to 13 digits; rather, an ISBN can be either 10 or 13 digits. Ten-digit ISBNs can be expressed as 13-digit ones starting with 978, which means every ISBN currently in existence has both a 10- and a 13-digit form. There are all sorts of libraries available for converting between ISBN10 and ISBN13. Basically, you add 978 to the front and recalculate the checksum digit at the end.
ISBN13 was invented because publishers were running out of ISBNs. In the near future, when 979-based ISBN13s start being used, they will not have an ISBN10 equivalent. To my knowledge, there are no published books with 979-based ISBNs, but they are coming soon. Anyway, the long and short of it is that Amazon uses the ISBN10 form for all 978 ISBN10s. In any case, whether or not Amazon uses ten or thirteen-digit ASINs, you can search Amazon by either just fine.
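For illustration, a minimal PHP sketch of that conversion, prefixing 978 and recomputing the check digit (the example ISBN is arbitrary):

```php
<?php
// Convert a 10-digit ISBN to its 13-digit form: take the nine data digits,
// prefix "978", and recalculate the check digit.
function isbn10_to_isbn13(string $isbn10): string
{
    // Strip hyphens/spaces, drop the old ISBN-10 check digit, prefix 978.
    $core = '978' . substr(preg_replace('/[^0-9Xx]/', '', $isbn10), 0, 9);

    // ISBN-13 check digit: digits weighted 1,3,1,3,... summed modulo 10.
    $sum = 0;
    foreach (str_split($core) as $i => $digit) {
        $sum += ((int) $digit) * (($i % 2) ? 3 : 1);
    }
    return $core . ((10 - $sum % 10) % 10);
}

echo isbn10_to_isbn13('0-306-40615-2');   // prints 9780306406157
```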
Personally, I wouldn't put ISBN DB at the top of your list. ISBN DB mines from a number of sources, but it's not as comprehensive as Amazon or Google. Rather, I'd look into Amazon—including the various international Amazons—and then the new Google Book Data API and, after that, the OpenLibrary API. For non-English books, there are other options, like Ozone for Russian books.
If you care about the highest-quality data, or if you have any books published before about 1970, you will want to look into data from libraries, available by Z39.50 protocol and usually in MARC format, or, with a few libraries in Dublin Core, using the SRU/SRW protocol. MARC format is, to a modern programmer, pretty strange stuff. But, once you get it, it's also better data and includes useful fields like the LCCN, DDC, LCC, and LCSH.
LibraryThing runs off a homemade Python library that queries some 680 libraries and converts the many flavors of MARC into Amazon-compatible XML, with extras. We are currently reluctant to release the code, but we may release it as a service soon.
Google has its own API for Google Books that lets you query the Google Books database easily. The protocol is JSON-based and you can view the technical information about it here.
You essentially just have to request the following URL:
https://www.googleapis.com/books/v1/volumes?q=isbn:YOUR_ISBN_HERE
This will return the information about the book in JSON format.
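For example, a minimal PHP sketch of that lookup (the ISBN is just an example value):

```php
<?php
// Fetch book data from the Google Books API by ISBN and pull out the title.
$isbn = '9780306406157';   // example ISBN
$url  = 'https://www.googleapis.com/books/v1/volumes?q=isbn:' . urlencode($isbn);

$data = json_decode(file_get_contents($url), true);

if (!empty($data['items'])) {
    echo $data['items'][0]['volumeInfo']['title'], PHP_EOL;
} else {
    echo "No book found for ISBN $isbn\n";
}
```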
Check out ISBN DB API. It's a simple REST-based web service. Haven't tried it myself, but a friend has had successful experiences with it.
It'll give you the book title, author information, and, depending on the book, a number of other details you can use.
Try https://gumroad.com/l/RKxO
I purchased this database about 3 weeks ago for a book citation app I'm making. I haven't had any quality problems, and virtually every book I scanned was found. The only problem is that they provide the file as CSV and I had to convert 20 million lines, which took me almost an hour! Also, the monthly updates are not deltas; the entire database is sent, which works for me but might be some work for others.
I haven't tried it, but take a look at isbndb
API Description: Introduction
ISBNdb.com's remote access application programming interface (API) is designed to allow other websites and standalone applications to use the vast collection of data collected by ISBNdb.com since 2003. As of this writing, in July 2005, the data includes nearly 1,800,000 books; almost 3,000,000 library records; close to a million subjects; hundreds of thousands of author and publisher records parsed out of library data; and more than 10,000,000 records of actual and historic prices.
Some ideas of how the API can be used include:
- Cataloguing home book collections
- Building and verifying bookstores' inventories
- Empowering forums and online communities with more useful book references
- Automated cross-merchant price lookups over messaging devices or phones
Using the API you can look up information by keywords, by ISBN, by authors or publishers, etc. In most situations the API is fast enough to be used in interactive applications.
The data is heavily cross-linked -- starting at a book you can retrieve information about its authors, then other books of these authors, then their publishers, etc.
The API is primarily intended for use by programmers. The interface strives to be platform and programming language independent by employing open standard protocols and message formats.
Although the other answers are correct, this one explains the process in a little more detail, using the Google Books API.
https://giribhatnagar.wordpress.com/2015/07/12/search-for-books-by-their-isbn/
All you need to do is:
1. Create an appropriate HTTP request
2. Send it and receive the JSON object containing details about the book
3. Extract the title from the received information
The response you get is in JSON. The code given on the above site is for Node.js, but I'm sure it won't be difficult to reproduce that in PHP (or any other language, for that matter).
To obtain data for a given ISBN you need to interact with some online service like isbndb.
One of the best sources for bibliographic information is the Amazon web service. It provides you with all the bibliographic info plus the book cover.
You might want to look into LibraryThing, it has an API that would do what you want and they handle things like mapping multiple ISBNs for different editions of a single "work".
As an alternative to isbndb (which seems like the perfect answer) I had the impression that you could pass an ISBN into an Amazon product URL to go straight to the Amazon page for the book. While this doesn't programmatically return the book title, it might have been a useful extra feature in case you wanted to link to Amazon user reviews from your database.
However, this link appears to show that I was wrong. What Amazon actually uses is the ASIN, and while this used to be the same as the 10-digit ISBN, those are no longer the only kind: ISBNs now also come in a 13-digit form (though there is a straight conversion from the old 10-digit type).
But more usefully, the same link does talk about the Amazon API which can convert from ISBN to ASIN and is likely to also let you look up titles and other information. It is primarily aimed at Amazon affiliates, but no doubt it could do the job if for some reason isbndb does not.
Edit: Tim Spalding above points out a few practical facts about ISBNs. I was slightly too pessimistic in assuming that ASINs would not correspond any more.
You may also try this database: http://www.usabledatabases.com/database/books-isbn-covers/
It's got more books/ISBNs than most web services you can currently find on the web, but it's probably overkill for your small site.