Server-side clustering for google maps api v3 - php

I am currently developing a Google Maps overview widget that displays locations as markers on the map. The number of markers varies from several hundred up to thousands (10,000 and up). Right now I am using MarkerClusterer for Google Maps v3 1.0 and the Google Maps JavaScript API v3 (Premier), and it works decently for, let's say, a hundred markers. Since the number of markers will increase, I need a new way of clustering them. From what I've read, the only way to keep performance up is to move the clustering from the client side to the server side. Does anyone know a good PHP5 library which can do this for me?
At the moment I am digging deeper into the layer mechanisms of Google Maps. Maybe there are also a few leading PHP libraries I could start to check out? I also ran across Fusion Tables, but since I need clustering I don't think it's the right solution.
Thanks in advance!

I don't know of a server-side library that'll do the job for you. I can however give you some pointers on how to implement one yourself.
The basic approach to clustering is simply to calculate the distance between your markers and when two of them are close enough you replace them with a single marker located at the mid-point between the two.
Instead of just having a limitation on how close to each other markers may be, you may also (or instead) choose to limit the number of clusters/markers you want as a result.
To accomplish this you could calculate the distance between all pairs of markers, sort them, and then merge from the top until you only have as many markers/clusters as you wish.
To refine the mid-point positioning when forming a cluster you may take into account the number of actual markers represented by each of the two to be merged. Think of that number as a weight and the line between the two markers as a scale. Then instead of always choosing the mid-point, choose the point that would balance the scale.
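To make this concrete, here is a minimal (and deliberately naive) PHP sketch of the merge-until-limit idea. All names are invented for illustration, and a squared planar distance is used since only comparisons matter:

    <?php
    class Marker {
        public $lat, $lng, $weight; // weight = number of original markers represented
        public function __construct($lat, $lng, $weight = 1) {
            $this->lat = $lat; $this->lng = $lng; $this->weight = $weight;
        }
    }

    // Merge two markers into one placed at the weighted "balance point".
    function mergePair(Marker $a, Marker $b) {
        $w = $a->weight + $b->weight;
        return new Marker(
            ($a->lat * $a->weight + $b->lat * $b->weight) / $w,
            ($a->lng * $a->weight + $b->lng * $b->weight) / $w,
            $w
        );
    }

    // Repeatedly merge the closest pair until only $limit markers remain.
    function clusterNaive(array $markers, $limit) {
        while (count($markers) > $limit) {
            $best = null; $bi = $bj = 0;
            foreach ($markers as $i => $a) {
                foreach ($markers as $j => $b) {
                    if ($j <= $i) continue;
                    $d = pow($a->lat - $b->lat, 2) + pow($a->lng - $b->lng, 2);
                    if ($best === null || $d < $best) {
                        $best = $d; $bi = $i; $bj = $j;
                    }
                }
            }
            $merged = mergePair($markers[$bi], $markers[$bj]);
            unset($markers[$bi], $markers[$bj]);
            $markers[] = $merged;
            $markers = array_values($markers); // reindex for the next pass
        }
        return $markers;
    }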
I'd guess that this simple form of clustering is good enough if you have a limited number of markers. If your data set (the number of markers and their positions) is roughly static, you can calculate the clustering on the server once in a while, cache it, and serve clients directly from the cache.
However, if you need to support large scale scenarios potentially with markers all over the world you'll need a more sophisticated approach.
The clustering algorithm described above does not scale. Its computation cost grows at least quadratically with the number of markers, since every pair must be compared.
To remedy this you could split the world into partitions and calculate clustering and serve clients from each partition. This would indeed support scaling since the workload can be split and performed by several (roughly) independent servers.
The question then is how to find a good partitioning scheme. You may also want to consider providing different clustering of markers at different zoom levels, and your partitioning scheme should incorporate this as well to allow scaling.
Google divides the map into tiles with x-, y- and z-coordinates, where x and y are the horizontal and vertical positions of the tile counted from the north-west corner of the map, and z is the zoom level.
At the minimum zoom level (zero) the entire map consists of a single tile (all tiles are 256×256 pixels). At the next zoom level that tile is divided into four sub-tiles. This continues: at zoom level 2 each of those four tiles has been divided into four sub-tiles, giving a total of 16 tiles. Zoom level 3 has 64 tiles, level 4 has 256, and so on. (The number of tiles at zoom level z is 4^z.)
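For reference, the standard Web Mercator math for finding which tile a coordinate falls into looks like this in PHP (the function name is mine, not part of any API):

    <?php
    function latLngToTile($lat, $lng, $z) {
        $n = pow(2, $z); // tiles per axis at this zoom level
        $x = (int) floor(($lng + 180) / 360 * $n);
        $latRad = deg2rad($lat);
        $y = (int) floor((1 - log(tan($latRad) + 1 / cos($latRad)) / M_PI) / 2 * $n);
        return array('x' => $x, 'y' => $y, 'z' => $z);
    }

    // Example: the tile holding the Eiffel Tower at zoom 12.
    print_r(latLngToTile(48.8583, 2.2945, 12)); // x = 2074, y = 1409, z = 12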
Using this partitioning scheme you could calculate clustering per tile, starting at the deepest zoom level (highest z-coordinate) and bubbling up until you reach the top.
The set of markers to be clustered for a single tile is the union of all markers (some of which may represent clusters) of its four sub tiles.
This gives you a limited computational cost and also gives you a nice way of chunking up the data to be sent to the client. Instead of requesting all markers for a given zoom level (which would not scale) clients can request markers on a tile-by-tile basis as they are loaded into the map.
There is however a flaw in this approach: Consider two adjacent tiles, one to the left and one to the right. If the left tile contains a marker/cluster at its far right side and the right tile contains a marker/cluster at its far left side, then those two markers/clusters should be merged but won't be since we're performing the clustering mechanism for each tile individually.
To remedy this you could post-process tiles after they have been clustered so that you merge markers/clusters that lie on each of the four edges, taking into account each of the eight adjacent tiles for a given tile. This post-merging mechanism will only work if we can assume that no single cluster is large enough to affect surrounding markers that are not in the same sub-tile. This is, however, a reasonable assumption.
As a final note: With the scaled out approach you'll have clients making several small requests. These requests will have locality (i.e. tiles are not randomly requested, but instead tiles that are geographically close to each other are also typically accessed together).
To improve lookup/query performance you would benefit from using search keys (representing the tiles) that also have this locality property (since this would store data for adjacent tiles in adjacent data blocks on disk - improving read time and cache utilization).
You can form such a key using the tile/sub-tile partitioning scheme. Let the top tile (the single one spanning the entire map) have the empty string as its key. Next, let each of its sub-tiles have the keys A, B, C and D. The next level would have the keys AA, AB, AC, AD, BA, BB, ..., DC, DD.
Apply this recursively and you'll end up with a partitioning key that identifies your tiles, allows quick transformation to x-, y-, z-coordinates, and has the locality property. This key naming scheme is sometimes called a quadkey, stemming from the fact that the partitioning scheme forms a quadtree. The locality property is the same as you get when using a Z-order curve to map a 2D value to a 1D value.
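A small sketch of building such a key from tile coordinates, following the A-D lettering above; note that dropping the last letter of a key yields the parent tile's key, and sorting keys of equal length lexicographically gives exactly the Z-order locality just mentioned:

    <?php
    function tileToQuadKey($x, $y, $z) {
        $letters = array('A', 'B', 'C', 'D'); // NW, NE, SW, SE quadrants
        $key = '';
        for ($i = $z; $i > 0; $i--) {
            $bit = 1 << ($i - 1);
            $quadrant = (($x & $bit) ? 1 : 0) + (($y & $bit) ? 2 : 0);
            $key .= $letters[$quadrant];
        }
        return $key;
    }

    echo tileToQuadKey(3, 5, 3); // "CBD"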
Please let me know if you need more details.

This article has some PHP examples for marker clustering:
http://www.appelsiini.net/2008/11/introduction-to-marker-clustering-with-google-maps

You could try my free clustering app. It can handle more pins than the client-side Google Maps API and offers k-means and grid-based clustering.
https://github.com/biodiv/anycluster

Related

Peak detection / Slice discrete data

I have been given a task: given discrete data like this, I need to slice it into 5 pieces, determined by the template the data creates.
I am not allowed to guess a template, because every input looks different.
My approach was to find peaks in the data (above or below zero), then use that pattern of peaks to slice the data. Here is what I got (not for the above data):
The top graph shows the peaks in the data, and because I know I have exactly 5 pieces and 15 points, I can say that every piece has 3 points and then slice it, which gives the second graph in that picture.
Out of 40 inputs, I managed to do this for only 5 of them, because my "peak detection" algorithm is very basic.
What peak detection algorithm should I use that can also find local minima and has a PHP implementation or simple pseudocode? I am a beginner in this field of data analysis, so I need your tips.
Finally, am I even going in the right direction with slicing this data, or is there a better-known way to do it?
EDIT:
My bad for not explaining before: the goal of this slicing is to create a uniform, time-independent model for a slice, meaning that long and short pieces end up the same length, and that goes for each peak. If this is done per slice by simply stretching, the data looks noisy. (This part is still in development, so I didn't describe it before.)
And I don't know how to do it without the peaks, because every slice has different timings for its different parts (1 second, 1.1 seconds, etc.).
Find the 4 longest non-intersecting subsets in your data where the values remain within some tolerance of zero. If you don't know how many beats you have to isolate, peak detection becomes more relevant, since the number of peaks above a given threshold defines how many sections you dissect.
I don't think you're the first person to attack this sort of problem...
https://www.biopac.com/knowledge-base/extracting-heart-rate-from-a-noisy-ecg-signal/
Edit:
As far as a peak finding algorithm I think this paper provides some methods.
http://www.ifi.uzh.ch/dbtg/teaching/thesesarch/ReportRSchneider.pdf
The approach labeled Significant Peak-Valley Algorithm more or less boils down to finding local extrema (minima and maxima) in regions beyond (below and above, respectively) a threshold defined by some number of standard deviations from the mean.
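As a starting point, a rough PHP sketch of that thresholded local-extrema idea (my own simplification, not the paper's exact algorithm) could be:

    <?php
    function findPeaks(array $data, $numStdDevs = 1.0) {
        $n = count($data);
        $mean = array_sum($data) / $n;
        $var = 0;
        foreach ($data as $v) { $var += ($v - $mean) * ($v - $mean); }
        $std = sqrt($var / $n);
        $upper = $mean + $numStdDevs * $std; // significance thresholds
        $lower = $mean - $numStdDevs * $std;

        $peaks = array();
        for ($i = 1; $i < $n - 1; $i++) {
            $isMax = $data[$i] > $data[$i - 1] && $data[$i] >= $data[$i + 1];
            $isMin = $data[$i] < $data[$i - 1] && $data[$i] <= $data[$i + 1];
            if ($isMax && $data[$i] > $upper) {
                $peaks[] = array('index' => $i, 'type' => 'max');
            } elseif ($isMin && $data[$i] < $lower) {
                $peaks[] = array('index' => $i, 'type' => 'min');
            }
        }
        return $peaks;
    }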

Road coordinate caching (breaking Google's T&Cs?)

We have sets of rough coordinates for a few hundred road routes across the UK.
The waypoints along the routes are often 50 metres apart. This means that when we draw a line through the coordinates (our software limits us to straight lines) they sometimes cut across roads, buildings etc.
The plan is to create a PHP script that will run the coordinates through something which will return close, nicely placed road coordinates and insert them into our database, essentially replacing our spread out, 50 metre apart coordinates.
The only technology we've found that can do this is the Google Maps Directions API. If we pass the waypoints along the route, Google can return a perfect road route with the coordinates of each step on a particular leg/straight of the route.
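Concretely, the call we'd be making would look something like this sketch (field names follow the Directions JSON response; the coordinates here are placeholders, and the client ID and signature a Premier account uses are omitted):

    <?php
    $params = array(
        'origin'      => '51.5074,-0.1278',
        'destination' => '51.7520,-1.2577',
        'waypoints'   => '51.6000,-0.5000|51.7000,-0.9000', // our rough coords
        'sensor'      => 'false',
    );
    $url = 'https://maps.googleapis.com/maps/api/directions/json?' . http_build_query($params);
    $response = json_decode(file_get_contents($url), true);

    // Each step of each leg carries nicely snapped road coordinates.
    $roadCoords = array();
    foreach ($response['routes'][0]['legs'] as $leg) {
        foreach ($leg['steps'] as $step) {
            $roadCoords[] = $step['end_location']; // array('lat' => ..., 'lng' => ...)
        }
    }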
We'd go ahead and do this now if we weren't uncertain about this being allowed.
https://developers.google.com/maps/faq
We have read through the Google Maps FAQ and can't find anything about caching the road coordinates. We don't want to do anything that would breach the terms of service as our application heavily relies on other Google APIs and Google Maps itself.
How should we continue? If this breaks Google's terms of service, couldn't we just randomize the returned coordinates slightly so they're not the same? How could it be proved we'd broken the terms then?

using Graph DB to store distance between locations with PHP

I need to be able to quickly find the n closest destinations for a given destination, calculate an n × n distance matrix for n destinations, and perform several other such operations related to distances between two or more destinations.
I have read that a graph DB will give far better performance than a MySQL database. My application is written in PHP.
So my question is: is it possible to use a graph DB with a PHP application? If so, which open-source option is best, how should this data be stored in the graph DB, and how would it be accessed?
Thanks in advance.
Neo4j is a very solid graph DB with flexible (if a bit complex) licensing. It implements the Blueprints API and should be pretty easy to use from just about any language, including PHP. It also has a REST API, which is about as flexible as it gets, and there is at least one good example of using it from PHP.
Depending on what data you have, there are a number of ways to store it.
If you have "route" data, where your points are already connected to each other via specific paths (ie. you can't jump from one point directly to another), then you simply make each point a node and the connections you have between points in your routes are edges between nodes, with the distances as properties of those edges. This would give you a graph that looks like your classic "traveling salesman" sort of problem, and calculating distances between nodes is just a matter of doing a weighted breadth-first search (assuming you want shortest path).
If you can jump from place to place in your data set, then you have a fully connected graph. Obviously this is a lot of data, and it grows quadratically as you add more destinations, but a graph DB is probably better at dealing with it than a relational DB is. To store the distances, as you add each node to the graph you also add an edge to every existing node, with the distance pre-calculated as one of its properties. Then, to retrieve the distance between a pair of nodes, you simply find the edge between them and read its distance property.
However, if you have a large number of fully-connected nodes, you would probably be better off just storing the coordinates of those nodes and calculating the distances as-needed, and optionally caching the results to speed things up.
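For that option, the great-circle (haversine) formula is the usual way to compute distances from coordinates; a small sketch:

    <?php
    function haversineKm($lat1, $lng1, $lat2, $lng2) {
        $r = 6371; // mean Earth radius in km
        $dLat = deg2rad($lat2 - $lat1);
        $dLng = deg2rad($lng2 - $lng1);
        $a = sin($dLat / 2) * sin($dLat / 2)
           + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLng / 2) * sin($dLng / 2);
        return 2 * $r * asin(sqrt($a));
    }

    echo haversineKm(51.5074, -0.1278, 48.8566, 2.3522); // London-Paris, roughly 344 km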
Lastly, if you use the Blueprints API and the other tools in that stack, like Gremlin and Rexster, you should be able to swap in/out any compatible graph database, which lets you play around with different implementations that may meet your needs better, like running Titan on top of a Cassandra / Hadoop cluster.
Yes, a graph database will give you more performance than a MySQL or Postgres extension can. One that looks really slick is OrientDB; there's a beta PHP client that uses the binary protocol and another that uses HTTP as the transport layer.
As for example code, Alessandro (from odino.org) wrote an implementation of Dijkstra's algorithm along with a full explanation of how to use it with OrientDB to find the minimum distance between cities.
Actually it's not so much about the database as about the indexes. I've used MongoDB's geospatial indexing and search (it's a document DB), which has geo indexing designed for finding the elements nearest to given coordinates, with good results. Still, it runs only simple queries (find nearest), and it gets a bit slow if your index doesn't fit in RAM (I used the GeoNames DB with 8 million places and got 0.005-2.5 s per query on a VM - partly HDD overhead, and probably the index didn't fit in RAM).
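For what it's worth, a nearest-places query with the legacy Mongo PHP driver of that era looked roughly like this (database, collection and field names are made up; the newer mongodb extension shapes the query similarly):

    <?php
    $m = new MongoClient();
    $places = $m->geo->places;

    // One-time: a 2d index on the coordinate field enables $near queries.
    $places->ensureIndex(array('loc' => '2d'));

    // Ten places nearest to a given point (note: lng before lat for 2d indexes).
    $cursor = $places->find(
        array('loc' => array('$near' => array(2.2945, 48.8583)))
    )->limit(10);
    foreach ($cursor as $doc) {
        echo $doc['name'], "\n";
    }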

php based game - castle layout

I am making a PHP-based RPG / city-building game. My idea is to have a 9×9 grid as the castle layout, with the centre 3×3 being the inner castle; upon clicking that section, players will see a 6×6 grid of the inner castle. Players will also be able to acquire tiles of the outer grid, turning them into inner castle. Every tile can be built upon. What would be the best way to represent this in the database, taking scalability into account?
The only approach I have come up with so far is a 3-column table (idcastle, Y, X), with X being a string of 18 numbers; I would use substr to see if there is a building on a given tile.
However, I think I will run into scalability issues if there are a lot of castles (since each castle requires 18 rows).
It depends how you query the database. I would suggest dynamically building your id as castleid_x_y_z; that way you are only querying against the primary key, which makes it very quick.
Plus, use something like Redis to handle it, as it is generally limited only by network speed. If you have too many castles you can just push people to a second server; you don't even need to worry about scaling beyond that, since you would never split one castle over two servers.
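A sketch of that key scheme with phpredis (key format and values are illustrative):

    <?php
    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);

    // Store what's built on tile (x=4, y=7) of castle 1042.
    $redis->set('castle:1042:4:7', 'barracks');

    // Reading a tile is a single key lookup - no scanning required.
    $building = $redis->get('castle:1042:4:7'); // "barracks", or false if empty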

Adding many markers kills my google maps - how do I do this?

I'm stuck here again. I have a database with over 120,000 coordinates that I need displayed on a Google Map integrated in my application. The thing is, as I've found out the hard way, that simply looping through all of the coordinates, creating an individual marker for each, and adding it using the addOverlay function kills the browser. So that definitely has to be the wrong way to do this. I've read a bit on clustering and zoom-level bunching; I do understand that there's no point in rendering all of the markers, especially since most of them won't be seen in non-rendered parts of the map, but I have no idea how to get this to work.
How do I fix this? Please guys, I need some help here :(
There is a good comparison of various techniques here http://www.svennerberg.com/2009/01/handling-large-amounts-of-markers-in-google-maps/
However, given your volume of markers, you definitely want a technique that only renders the markers that should be seen in the current view (assuming that number is modest; if not, there are techniques in the link for doing sensible things).
If you really have more than 120,000 items, there is no way that any of the client-side clusterers or managers will work. You will need to handle the markers server-side.
There is a good discussion here with some options that may help you.
Update: I've posted this on SO before, but this tutorial describes a server-side clustering method in PHP. It's meant to be used with the Static Maps API, but I've built it so that it will return clustered markers whenever the view changes. It works pretty well, though there is a delay in transferring the markers whenever the view changes. Unfortunately I haven't tried it with more than 3,000 markers - I don't know how well it would handle 120,000. Good luck!
I've not done any work with Google maps specifically but many moons ago, I was involved in a project which managed a mobile workforce for a large Telco.
They had similar functionality in that they had maps which they could zoom in on for their allocated jobs (local to the machine rather than over the network), and we solved a problem which sounds very similar to yours. Points of interest on the maps were called landmarks and were indicated by small markers called landmark pointers, which the worker could select to get a textual description.
At the minimum zoom, there would have been a plethora of landmark pointers, making the map useless. We made a command decision to limit the landmark pointers to a smaller number (400). In order to do that, the map was divided into a 20x20 matrix no matter what the zoom level, which gave us 400 matrix elements.
Then, if a landmark shared the same matrix element as another, the application combined them and generated a single landmark pointer with the descriptive text containing the text of all the landmarks in that matrix element.
That way there were never more than 400 landmark pointers. As the minion zoomed in, the landmark pointers were regenerated and landmarks could end up in different matrix elements - in that case, they were no longer combined with other landmarks.
Similarly, zooming out sometimes merged two or more landmarks into a single landmark pointer.
That sounds like what you're trying to achieve with "clustering or zoom level bunching" although, as I said, I have little experience with Google Maps itself so I'm not sure this is possible. But given Google's reputation, I suspect it is.
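In PHP terms, that bucketing mechanism might look roughly like this sketch (all names invented; markers are plain lat/lng/text arrays and the viewport is given by its bounds):

    <?php
    function gridCluster(array $markers, $south, $west, $north, $east, $size = 20) {
        $cells = array();
        foreach ($markers as $m) {
            if ($m['lat'] < $south || $m['lat'] > $north ||
                $m['lng'] < $west  || $m['lng'] > $east) continue; // off-screen
            $row = min($size - 1, (int) (($m['lat'] - $south) / ($north - $south) * $size));
            $col = min($size - 1, (int) (($m['lng'] - $west)  / ($east - $west)  * $size));
            $cells[$row . ':' . $col][] = $m;
        }
        // One combined "pointer" per occupied cell - never more than $size * $size.
        $pointers = array();
        foreach ($cells as $bucket) {
            $lat = $lng = 0;
            $texts = array();
            foreach ($bucket as $m) {
                $lat += $m['lat']; $lng += $m['lng']; $texts[] = $m['text'];
            }
            $pointers[] = array(
                'lat'   => $lat / count($bucket),
                'lng'   => $lng / count($bucket),
                'texts' => $texts, // combined descriptive text
            );
        }
        return $pointers;
    }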
I suggest that you use a marker manager class such as this one along with your existing code. Marker manager class allows you to manage thousands of markers and optimizes memory usage. There is a variety of marker managers (there is not just one) and I suggest you Google a bit.
Here is a non-cluster solution if you want to display hundreds or even thousands of markers very quickly. You can use a combination of OverlayView and DocumentFragment.
http://nickjohnson.com/b/google-maps-v3-how-to-quickly-add-many-markers
If only there were something more powerful than JS for this...
OK, enough sarcasm :).
Have you used the Flash Maps API? Kevin Macdonald has successfully used it to cluster not 120K markers but 1,000,000 markers. Check out the Million Marker Map:
http://www.spatialdatabox.com/million-marker-map/million-marker-map.html
Map responsiveness is pretty much un-affected in this solution. If you are interested you can contact him here: http://www.spatialdatabox.com/sdb-contact-sales.html
Try this one:
http://googlegeodevelopers.blogspot.com/2009/04/markerclusterer-solution-to-too-many.html
It's an old question that already has many answers, but Stack Overflow serves as a reference, so I hope this helps anyone searching for the same problem.
There is a fairly simple solution: use HTML5 canvas. Though it sounds strange, it's the fastest way to load up to 10,000 markers as well as labels, which I'm sure no browser can handle as normal markers. Not conventional markers, but light markers.
