I'm developing a website and I need to get the boundaries of an area depending on the user's input.
For example, the user wants to know the boundaries of a city named x. How should I get that from OpenStreetMap? I've heard of XAPI and Osmosis but couldn't find any examples anywhere.
Thanks!
I took a stab at doing this with JavaScript here:
https://github.com/pgkelley4/city-boundaries-google-maps
Basically it comes down to finding the relation that OpenStreetMap stores the city boundaries as.
I used something like the following query to get the area:
area[name="Seattle"]["is_in:state_code"="WA"];foreach(out;);
Or if that doesn't find anything, going through the node to find any associated areas:
node[name="New York"][is_in~"NY"];foreach(out;is_in;out;);
To get the relation ID, subtract 3600000000 from the area ID returned by the above queries. Then get the relation from its ID:
(relation(relationID);>;);out;
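For completeness, a rough PHP sketch chaining those two steps together might look like this (Overpass interpreter endpoint with JSON output; purely illustrative, no error handling or rate limiting):
<?php
// Fetch the area for a city, derive the relation ID, then fetch the relation.
$endpoint = 'http://overpass-api.de/api/interpreter';

$areaQuery = '[out:json];area[name="Seattle"]["is_in:state_code"="WA"];out;';
$area = json_decode(file_get_contents($endpoint . '?data=' . urlencode($areaQuery)), true);

if (empty($area['elements'])) {
    die('No matching area found');
}

// Area IDs are relation IDs plus 3600000000
$relationId = $area['elements'][0]['id'] - 3600000000;

$relQuery = "[out:json];(relation($relationId);>;);out;";
$relation = json_decode(file_get_contents($endpoint . '?data=' . urlencode($relQuery)), true);

// $relation['elements'] now holds the relation plus its ways and nodes, which still
// have to be assembled into rings (see the multipolygon algorithm link below).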
You can test out queries here, mine could probably be improved on:
http://overpass-api.de/query_form.html
That is how to get the city boundaries, processing them is another matter as nothing is in order within the relation. For that see my GitHub project and:
http://wiki.openstreetmap.org/wiki/Relation:multipolygon/Algorithm
Also I would note that the OpenStreetMap data for city boundaries is spotty. It is missing for big cities like Dallas and LA from what I can tell.
I'm trying to retrieve release info from the MusicBrainz database using a PHP script on my server. I have a list of songs, with song title and artist name, and I'm trying to retrieve the date of the first release of that song, along with other info about that release.
I realise that the search won't always be 100% accurate, but the list consists of fairly rare and unique songs so it should at least get me on a good track.
I've gotten pretty far with my script (it returns results and everything), but I'm unsure about how exactly to write the query. The documentation is quite confusing and doesn't feature an example where you search for both song title and artist.
This is my code:
// this info is normally fetched from my DB, but just as a simple example (as it is returned):
$artist = "ZZTop";
$song_title = "It's only Love";
// I'm having trouble with this part:
$mb_query = 'http://www.musicbrainz.org/ws/2/recording?query=' . $song_title .' ANDartist:' . $artist ;
$xml = simplexml_load_file($mb_query);
$releasedate = $xml->{'recording-list'}->recording[0]->{'release-list'}->release[0]->date;
At first I tried to rawurlencode() the $artist and $song_title, but funnily enough that didn't return any results, so I figured I'd just leave them as plain strings. The query returns results, but they are really off, and I have the feeling only part of the query is getting picked up (for instance only the song title and not the artist).
Does anyone know the right way to do this?
The query created by your example code is this:
http://musicbrainz.org/ws/2/recording?query=It%27s%20only%20Love%20ANDartist:ZZTop
The problems are:
ANDartist: should be AND artist:
ZZTop should be "ZZ Top", otherwise the artist isn't found. You could add ZZTop as an alias if you really think that is how many people spell it.
You might want to use phrases ("...") to search for full titles. Otherwise MB searches for titles including (It's OR only OR Love) and artist:"ZZ Top". However, results containing all words will be rated higher and show up at the top, so this is optional.
So the correct/precise query to use would be:
http://musicbrainz.org/ws/2/recording?query=%22It%27s%20only%20Love%22%20AND%20artist:%22ZZ%20Top%22 (2 results)
A more fuzzy query that works would be:
http://musicbrainz.org/ws/2/recording?query=It%27s%20only%20Love%20AND%20artist:%28ZZ%20Top%29 (80 results, using artist:(ZZ Top) to search for ZZ or Top artists)
See the MusicBrainz Search Documentation and the Lucene Search Syntax for details.
This code works for me (on PHP 5.5.13) instead of your line:
$mb_query = 'http://www.musicbrainz.org/ws/2/recording?query="'.$song_title.'"'
.' AND artist:"'.$artist.'"';
The PHP documentation says you only need to use rawurlencode() prior to PHP 5.1.0.
Additionally you might want to use a pre-made library to work with the MusicBrainz Web Service more easily. There is a PHP Library for WS/2 listed on the MB Documentation. I haven't tried it myself though.
Bonus:
If you have problems finding recordings because the artist is spelled differently on your end you can search for the artist (including aliases) first and then use the id of the artist for the recording search. Note that you can't use the alias in a recording search directly.
This query will search for ZZTop in the artist names, artist aliases and artist sortname:
http://musicbrainz.org/ws/2/artist?query=%22ZZTop%22
(see the artist search field documentation).
With that search you get a unique ID: a81259a0-a2f5-464b-866e-71220f2739f1. Note that you might get multiple results, so you may want to save a list of the results with a high score and try the other entries if you can't find the recording in the next step.
Now you can use the ID instead of the name in the recording search:
http://musicbrainz.org/ws/2/recording?query=%22It%27s%20only%20Love%22%20AND%20arid:a81259a0-a2f5-464b-866e-71220f2739f1
You can also use arid:(... OR ...) if you got multiple results from the artist query.
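For illustration, a minimal PHP sketch of that two-step lookup might look like this (untested; it keeps the naive [0] indexing from the question's code and does not escape Lucene special characters in the title):
<?php
$artist = 'ZZTop';
$song_title = "It's only Love";

// Step 1: find the artist MBID (matches names, aliases and sort names)
$artistUrl = 'http://musicbrainz.org/ws/2/artist?query=' . rawurlencode('"' . $artist . '"');
$artistXml = simplexml_load_file($artistUrl);
$artistId  = (string) $artistXml->{'artist-list'}->artist[0]['id'];

// Step 2: search recordings by title, restricted to that artist ID
$recordingUrl = 'http://musicbrainz.org/ws/2/recording?query='
              . rawurlencode('"' . $song_title . '" AND arid:' . $artistId);
$recordingXml = simplexml_load_file($recordingUrl);

$releasedate = (string) $recordingXml->{'recording-list'}->recording[0]->{'release-list'}->release[0]->date;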
I just recently discovered sphinx search which I want to use for my PHP application. I have a table of geolocations where every record stores a country code. For every user who uses the search function to look up geopositions, I know which country he is from.
How would I reweight the results so that matches are ordered by ascending distance from the user's country? I have already calculated a distance matrix between every pair of countries, which I can access via SQL. The country information in the geolocation database is stored as a 2-letter ISO country code.
What is a good solution for this problem? I have heard about UDFs; are they applicable here? Is it possible to solve this more easily by reformatting my table?
Thank you very much.
The "easiest" way to solve this is to have coordinates for each country. You then store the coordinates for each record in the Sphinx index, and when searching you look up the user's country coordinates and use them in the search. This way Sphinx calculates the distance dynamically.
Did you have coordinates like this to create the matrix? It also presupposes that you are just using a single 'point' per country; if your matrix is more advanced, e.g. taking the closest point on the borders of each country (to make distances between oddly shaped countries better), then it won't work so well.
In theory you could perhaps do this with payloads, by using the country name as keywords and the distance in a payload (arranged specially so that close distances get a high weight), but that would probably be expensive to index and might not work all that well in practice.
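If you go the coordinates-per-record route from the first paragraph, a rough sketch with the classic SphinxClient PHP API could look like this (attribute names, index name and coordinates are placeholders; Sphinx expects radians):
<?php
require_once 'sphinxapi.php';

$cl = new SphinxClient();
$cl->SetServer('localhost', 9312);

// Anchor point: coordinates of the searching user's country, in radians
$userLat = deg2rad(50.85);
$userLng = deg2rad(4.35);
$cl->SetGeoAnchor('latitude', 'longitude', $userLat, $userLng);

// Sort matches by the computed @geodist (metres), then by relevance
$cl->SetSortMode(SPH_SORT_EXTENDED, '@geodist ASC, @weight DESC');

$result = $cl->Query('some search terms', 'geolocations');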
I work on a site which sells, let's say, stuff and offers a "vendor search": you enter your city, postal code, or region plus a distance (in km or miles), and the site gives you a list of vendors.
To do that, I have a database with the vendors. In the form to save these vendors, you enter their full address and when you click on the save button, a request to google maps is made in order to get their latitude and longitude.
When someone does a search, I look on a table where I store all the search terms and their lat/lng.
This table looks like
+--------+-------+------+
| term | lat | lng |
+--------+-------+------+
So the first query is something very simple
select lat, lng from my_search_table where term = "the term"
If I find a result, I then search with a nice method for all the vendors in the range the visitor wants and print the result on a map.
If I don't find a result, I search with a Levenshtein function, because people writing bruxelle or bruxeles instead of bruxelles is really common, and I don't want to make a request to Google Maps every time (I also have a "how many times searched" column in my table to get some stats).
So I query my_search_table with no WHERE clause and loop through all results to get the smallest Levenshtein distance. If the smallest distance is greater than 2, I request coordinates from Google Maps.
Here is my problem. For some countries (we have several sites all around the world), my_search_table has 15-20k+ entries... and PHP doesn't (really) like looping over that much data (which I perfectly understand), so my request hits the PHP timeout. I could increase the timeout, but the problem will be the same in a few months.
So I tried a Levenshtein MySQL function (found on Stack Overflow, btw), but it's also very slow.
So my question is "is there any way to make this search fast even on very large datasets ?"
My suggestion is based on three things:
First, your data set is big: big enough to reject the idea of "select all" + "run levenshtein() in the PHP application".
Second, you have control over your database, so you can adjust some architecture-related things.
Finally, the performance of SELECT queries is the most important thing, while the performance of adding new data doesn't matter.
The thing is, you cannot perform a fast Levenshtein search, because calculating the Levenshtein distance is itself slow. Thus, you won't be able to resolve the issue with "smart search" alone; you'll have to prepare some data.
A possible solution would be to create a group index and assign it when adding/updating data. That means you'll store an additional column holding some hash (numeric, for example). When adding new data, you'll:
Perform a search with Levenshtein distance (using either your application or the MySQL function you've already mentioned) over all records in your table against the inserted data.
Set the group index of the new row to the group index value of the rows found in the previous step.
If nothing is found, set a new group index value (it's the first row and there are no similar rows yet) that is different from any group index values already present in the table.
To find the desired rows, you then just select rows with the same group index value. That means your SELECT queries will be very fast. But yes, this causes extremely large overhead when adding/changing data, so it isn't applicable to cases where the performance of updating/inserting matters.
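As a rough sketch of the insert-time part (PDO assumed; the group_idx column and the threshold of 2 are made up for illustration, and the lat/lng columns are omitted):
<?php
function assignGroupIndex(PDO $pdo, $newTerm)
{
    $bestGroup = null;
    $bestDist  = PHP_INT_MAX;

    // The expensive Levenshtein scan happens once, at insert time, not at search time
    foreach ($pdo->query('SELECT term, group_idx FROM my_search_table') as $row) {
        $dist = levenshtein($newTerm, $row['term']);
        if ($dist < $bestDist) {
            $bestDist  = $dist;
            $bestGroup = $row['group_idx'];
        }
    }

    if ($bestGroup === null || $bestDist > 2) {
        // No similar term yet: start a new group
        $bestGroup = (int) $pdo->query('SELECT COALESCE(MAX(group_idx), 0) + 1 FROM my_search_table')->fetchColumn();
    }

    $stmt = $pdo->prepare('INSERT INTO my_search_table (term, group_idx) VALUES (?, ?)');
    $stmt->execute(array($newTerm, $bestGroup));

    return $bestGroup;
}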
You could try MySQL's SOUNDS LIKE operator:
SELECT lat, lng FROM my_search_table WHERE term SOUNDS LIKE "the term"
You can use a kd-tree or a ternary tree to speed up the search. The idea is to use a binary search.
I am a field service technician and I have an inventory of parts that is either issued to me by the company I work for or through orders for specific jobs. I am trying to design a website to manage my parts, both on-hand inventory and parts that have been returned or transferred to someone else. Here is the information I need to track:
part number(10 digit)
req number(8 digit, unique)
description(up to 50 characters)
location(Van or shed).
WorkOrder("w"+9 digits ex: 'W212141234')
BOL(15 digit bill of lading #)
TransferDate(date I get rid of part)
TransferMethod(enum 'DEF','RTS','OBF')
I will probably use PHP to make a website and interact with the MySQL database.
What is the best design? A multi-table approach or one table with webpages that display queries of only certain fields? I need a list of on hand parts that list part number, req number, description, and location. I will also need to be able to have "defective returns" view that will list what parts I returned as DEF with all the remaining fields filled in.
Besides the "on hand" fields, the rest of the fields won't have data until they are no longer "on hand".
I really appreciate any help because I am new to both SQL and PHP. I have experimented with Ruby on Rails and django but I am not sure if I need to tackle all that at this point.
Even though you give some information on your issue, it is hard to approach it, as the question "what is the best design" is itself vague.
What I would do is this:
MYSQL TABLE DESIGN
Table parts
req number(int(8), unique, KEY)
part number(int(10))
description(varchar(50))
location(enum 'Van','shed')
WorkOrder(varchar(10))
BOL(varchar(15))
TransferDate(date)
TransferMethod(enum 'DEF','RTS','OBF')
onhand (boolean)
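For illustration, that column list could be turned into DDL along these lines (run once, from PHP here but any MySQL client works; names, lengths and types are only a sketch):
<?php
$pdo = new PDO('mysql:host=localhost;dbname=inventory', 'user', 'pass'); // placeholder credentials

$pdo->exec("
    CREATE TABLE parts (
        req_number      INT UNSIGNED NOT NULL,           -- 8-digit, unique
        part_number     BIGINT UNSIGNED NOT NULL,        -- 10-digit
        description     VARCHAR(50),
        location        ENUM('Van','Shed'),
        work_order      VARCHAR(10),                     -- e.g. 'W212141234'
        bol             VARCHAR(15),                     -- bill of lading #
        transfer_date   DATE NULL,
        transfer_method ENUM('DEF','RTS','OBF') NULL,
        onhand          TINYINT(1) NOT NULL DEFAULT 1,   -- boolean flag
        PRIMARY KEY (req_number)
    )
");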
PHP SCRIPTS
Then I would make two PHP scripts, each with a single query and a table displaying the info:
onhand.php
select *fields filled for on hand parts* from parts where onhand = 1
notonhand.php
select *fields filled for not on hand parts* from parts where onhand = 0
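A minimal version of onhand.php along those lines might be (PDO assumed; connection details and the exact column list are placeholders, and notonhand.php would be identical with onhand = 0):
<?php
$pdo = new PDO('mysql:host=localhost;dbname=inventory', 'user', 'pass');

$stmt = $pdo->prepare(
    'SELECT part_number, req_number, description, location
       FROM parts
      WHERE onhand = :onhand'
);
$stmt->execute(array(':onhand' => 1));

echo '<table>';
foreach ($stmt as $row) {
    echo '<tr>'
       . '<td>' . htmlspecialchars($row['part_number']) . '</td>'
       . '<td>' . htmlspecialchars($row['req_number'])  . '</td>'
       . '<td>' . htmlspecialchars($row['description']) . '</td>'
       . '<td>' . htmlspecialchars($row['location'])    . '</td>'
       . '</tr>';
}
echo '</table>';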
Background
I am creating a MySQL database to store items such as courses where there may be many attributes to a single course. For example:
A single course may have any or all of the following attributes:
Title (varchar)
Secondary Title (varchar)
Description (text)
Date
Time
Specific Location (varchar; eg. White Hall Room 7)
General Location (varchar; eg. Las Vegas, NV)
Location Coords (floats; eg. lat, long)
etc.
The database is set up as follows:
A table storing specific course info:
courses table:
Course_ID (a Primary Key unique ID for each course)
Creator_ID (a unique ID for the creator)
Creation_Date (datetime of course creation)
Modified_Date (where this is the most recent timestamp the course was modified)
The table storing each courses multiple attributes is set up as follows:
course_attributes table:
Attribute_ID (a unique ID for each attribute)
Course_ID (reference to the specific course attribute is for)
Attribute (varchar defining the attribute; eg. 'title')
Value (text containing value of specified attribute; eg. 'Title Of My Course')
Desire
I would like to search this database using sphinx search. With this search, I have different fields weighing different amounts, for example: 'title' would be more important than 'description'.
Specific search fields that I wish to have are:
Title
Date
Location (string)
Location (geo - lat/long)
The Question
Should I define a View in Mysql to organize the attributes according to 'title', 'description', etc., or is there a way to define my sphinx.conf file to understand specific attributes?
I am open to all suggestions to solving this problem, whether it be rearrangement of the database/tables or the way in which I search.
Let me know if you need any additional details to help me find a solution.
Thanks in advance for the help
Update
OK, so after reading some of the answers, I feel that I should provide some additional information.
Latitude / Longitude
The latitude/longitude attributes are created by me internally after receiving the general location string. I can generate the values in any way I wish, meaning that I can store them together in a single lat/long attribute as 'float lat, float long' values or any other desired format. This is done only after they have been generated from the initial location string and verified. This is to guard against malformed data as #X-Zero and #Cody have suggested.
Keep in mind that the latitude and longitude were merely illustrating the need for that field to be searchable, as opposed to anything more than that. It is simply another attribute; one of many.
Weighting Search Results
I know how to add weights to results in a Sphinx search query:
$cl->setFieldWeights( array('title'=>1000, 'description'=>500) );
This causes the title column to have a higher weight than the description column if the structure was as #X-Zero suggested. My question was more directed to how one would apply the above logic with the current table definition.
Database Structure, Views, and Efficiency
Using my introductory knowledge of Views, I was thinking that I could possibly create something that displays a row for each course where each attribute is its own column. I don't know how to accomplish this or if it's even possible.
I am not the most confident with database structures, but the reason I set my tables up as described was because there are many cases where not all of the fields will be completed for every course and I was attempting to be efficient [yes, it seems as though I've failed].
I was thinking that using my current structure, each attribute would contain a value and would therefore cause no wasted space in the table. Alternatively, if I had a table with tons of potential attributes, I would think there would be wasted space. If I am incorrect, I am happy to learn why my understanding is wrong.
Let me preface this by saying that I've never even heard of Sphinx, nor (obviously) used it. However, from a database perspective...
Doing multi-domain columns like this is a terrible (I will hunt you down and kill you) idea. For one thing, it's impossible to index or sort meaningfully, period. You also have to pray that you don't get a latitude attribute with textual data (and because this can only be enforced programmatically, I'm going to guarantee it will happen) - doing so will cause all distance-based formulas to crash. And speaking of location, what happens if somebody stores a latitude without a longitude (note that this is possible regardless of whether you are storing a single GeoLocation attribute, or the pair)?
Your best bet is to do the following:
Figure out which attributes will always be required. These belong in the course table (...mostly).
For each related set of optional attributes, create a table. For example, location (although this should probably be required...), which would contain Latitude/Longitude, City, State, Address, Room, etc. Allow the columns to be nullable (in sets - add constraints so users can't add just longitude and not latitude).
For every set of common queries, add a view. Even (perhaps especially) if you persist in using your current design, use a view. This promotes separation between the logical and physical implementations of the database. (This assumes searching by SQL.) You will then be able to search by specifying view_column is null or view_column = input_parameter or whichever.
For weighted searching (assuming dynamic weighting) your query will need to use left joins (inside the view as well - please document this), and use prepared-statement host-parameters (just save yourself the trouble of trying to escape things yourself). Check each set of parameters (both lat and long, for example), and assign the input weighting to a new column (per attribute), which can be summed up into a 'total' column (which must be over some threshold).
EDIT:
Using views:
For your structure, what you would normally do is left join to the attributes table multiple times (one for each attribute needed), keying off of the attribute (which should really be an int FK to a table; you don't want both 'title' and 'Title' in there) and joining on course_id - the value would be included as part of the select. Using this technique, it would be simple to then get the list of columns, which you can then apparently weight in Sphinx.
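For illustration, such a view over the structure from the question might look like this (view name, column aliases and the chosen attributes are made up; run once from PHP here, or directly in a MySQL client):
<?php
$pdo = new PDO('mysql:host=localhost;dbname=courses_db', 'user', 'pass'); // placeholder credentials

$pdo->exec("
    CREATE VIEW course_search AS
    SELECT c.Course_ID,
           t.Value AS title,
           d.Value AS description,
           l.Value AS general_location
      FROM courses c
      LEFT JOIN course_attributes t
             ON t.Course_ID = c.Course_ID AND t.Attribute = 'title'
      LEFT JOIN course_attributes d
             ON d.Course_ID = c.Course_ID AND d.Attribute = 'description'
      LEFT JOIN course_attributes l
             ON l.Course_ID = c.Course_ID AND l.Attribute = 'general_location'
");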
The problem with this is if you need to do any data conversion - you are betting that you'll be able to find all conversions if the type ever changes. When using strongly typed columns, this is between trivial (the likelihood is that you end up with a uniquely named column) and unnecessary (views usually take their datatype definitions from the fields in the query); with your architecture, you'll likely end up looking through too many false positives.
Database efficiency:
You're right, unfilled columns are wasted space. Usually, when something is optional(ish), that means you may need an additional table. That is why I suggested splitting location off into its own table: this prevents events which don't need a location (... what?) from 'wasting' the space, but then forces any event that defines a location to specify all required information. There's an additional benefit to splitting it off this way: if multiple events all use the same location (... not at the same time, we hope), a cross-reference table will save you a lot of space. Way more than your attributes table ever could (you're still having to store the complete location per event, after all). If you still have a lot of 'optional' attributes, I hear that NoSQL is made for these kinds of things (but I haven't really looked into it). Other than that, the cost of an additional table is trivial; the cost of the data inside may not be, but the space required is weighed against the perceived value of the data stored. Remember that disk space is relatively cheap - it's developer/maintainer time that is expensive.
Side note for addresses:
You are probably going to want to create an address table. This would be completely divorced from the event information, and would include (among other things) the precomputed latitude/longitude (in the recommended datatype - I don't know what it is, but it's for sure not a comma-separated string). You would then have an event_address table that would be the cross-reference between the events and where they take place - if there is additional information (such as room), that should be kept in a location table that is referenced (instead of referencing address directly). Once a lat/long value is computed, you should never need to change it.
Thoughts on later updates for lat/long:
While specifying the lat/long values yourself is better, you're going to want to make them a required part of the address table (or part of/in addition to a purely lat/long only table). Frankly, multi-value columns (delimited lists) of any sort are just begging for trouble - you keep having to parse them every time you search on them (among other related issues). And the moment you make them separate rows, one of the pair will eventually get dropped - Murphy himself will personally intervene, if necessary. Additionally, updating them at different times from the addresses will result in an address having a lat/long pair that does not match; your best bet is to compute this at insertion time (there are a number of webservices to find this information for you).
Multi-domain tables:
With a multi-domain table, you're basically betting that the domain key (attribute) will never become out of sync with the value (err, value). I don't care how good you are; somewhere, somehow, it's going to happen. At my company, we had one of these in our legacy application (it stored FK links and which files the FKs refer to, along with an attribute). At one point an application was installed in production which promptly began storing the correct file links, but the FK links to a different file, for a given class of attribute. Thankfully, there were audit records in another file which allowed this to be reversed (... as near as they were able to tell).
In summary:
Revisit your required/optional data. Don't be afraid to create additional tables, each for a single entity, with every column for a single domain; you will also need relationship tables. You may also wish to place your audit data (last_updated_time) in a set of separate tables (single-domain tables will help immensely in this regard).
In the sphinx config you define your index and the SQL queries that populate it. You can define basic attributes; see Sphinx Attributes.
Sphinx also supports geo searches on lat/long, but they need to be expressed in radians, definitely not text columns like you have. I agree with X-Zero that storing lat/lng values as strings is a bad idea.