I have a distance matrix as a two-dimensional array, like this:
I need to find clusters of elements using this matrix. I can do it using hierarchical clustering, like k-means. I have found an example here: PHP K-Means
How can I convert my two-dimensional array into an array of points like the one in this example?
$points = [
[80,55],[86,59],[19,85],[41,47],[57,58],
[76,22],[94,60],[13,93],[90,48],[52,54],
[62,46],[88,44],[85,24],[63,14],[51,40],
[75,31],[86,62],[81,95],[47,22],[43,95],
[71,19],[17,65],[69,21],[59,60],[59,12],
[15,22],[49,93],[56,35],[18,20],[39,59],
[50,15],[81,36],[67,62],[32,15],[75,65],
[10,47],[75,18],[13,45],[30,62],[95,79],
[64,11],[92,14],[94,49],[39,13],[60,68],
[62,10],[74,44],[37,42],[97,60],[47,73],
];
First, a nitpick: k-means is not a hierarchical clustering algorithm; see https://www.quora.com/What-is-the-difference-between-k-means-and-hierarchical-clustering for details on the difference.
Second: you don't want to convert a distance matrix back to the points it originated from, as that would be taking a step back. Sadly, the k-means implementation you linked only has an API that accepts raw coordinates and assumes Euclidean distance, so you have several options, depending on your requirements:
Where do you get the distance matrix from? If possible, get the raw coordinates (and make sure the distance measure is Euclidean distance) and use the library you linked.
Override the Point class in the library you linked, specifically the getDistanceWith method, so that it returns values from your matrix.
If you only need to calculate the clusters once, use Python with sklearn or SciPy, which do exactly what you want. See especially SciPy's linkage function: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.cluster.hierarchy.linkage.html
Write your own code: clustering is a fairly approachable topic, so it makes a nice coding exercise.
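To illustrate the last option, here is a minimal sketch of clustering that works directly on a distance matrix. Note that it is not k-means: it swaps in single-linkage agglomerative clustering, which only needs pairwise distances. The function name clusterByDistanceMatrix() and the sample matrix are made up for illustration, and the matrix is assumed to be square and symmetric.

```php
<?php
// Sketch: agglomerative (single-linkage) clustering on a precomputed,
// symmetric distance matrix. clusterByDistanceMatrix() is a hypothetical
// helper, not part of the linked k-means library.
function clusterByDistanceMatrix(array $dist, $k)
{
    $k = max(1, (int) $k); // at least one cluster must remain

    // Start with every element in its own cluster.
    $clusters = array();
    foreach (array_keys($dist) as $i) {
        $clusters[] = array($i);
    }

    // Merge the two closest clusters until only $k remain.
    while (count($clusters) > $k) {
        $best = INF;
        $bestA = 0;
        $bestB = 1;
        $n = count($clusters);
        for ($a = 0; $a < $n; $a++) {
            for ($b = $a + 1; $b < $n; $b++) {
                // Single linkage: cluster distance = closest pair of members.
                foreach ($clusters[$a] as $i) {
                    foreach ($clusters[$b] as $j) {
                        if ($dist[$i][$j] < $best) {
                            $best = $dist[$i][$j];
                            $bestA = $a;
                            $bestB = $b;
                        }
                    }
                }
            }
        }
        $clusters[$bestA] = array_merge($clusters[$bestA], $clusters[$bestB]);
        array_splice($clusters, $bestB, 1);
    }
    return $clusters;
}

// Made-up 4x4 matrix: elements 0 and 1 are close, as are 2 and 3.
$dist = array(
    array(0, 1, 9, 8),
    array(1, 0, 7, 9),
    array(9, 7, 0, 2),
    array(8, 9, 2, 0),
);
$clusters = clusterByDistanceMatrix($dist, 2);
```

Each returned cluster is a list of row indices into the matrix. The nested loops are fine for small inputs but get slow for large matrices, which is where a library implementation would pay off.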
I have searched a lot on this topic, and have found no answers.
I have my own statistics package where I'm saving the geoip data of my users (along with a bunch of other data). I'm using the maxmind geoip library to get this information.
In my backend I'm currently displaying this data as text with two columns: one for the country name and another for the number of visits from that country.
I'd like to generate a map with this data,
something like a world map with the countries I have visits from highlighted.
Heat mapping would be nice, but not required.
I don't really care if it's generated with PHP (GD image library) or jQuery, since I'm already using both of those technologies for the statistics backend. But I'd REALLY like to do this without Google Analytics or their graphing APIs.
I'll try to sketch an approach. This is how I would attempt it:
Get a Full HD (or other high-resolution) world map image.
Keep the coordinates in the WGS84 standard (float values; otherwise they are easy to convert).
Scale the real coordinates to the image, but before that ...
The major part of this work is math. I'm not a mathematician, but I know that it has to be applied here, and why.
The main goal is to project the coordinates onto a flat surface. WGS84 uses an oblate spheroid as its reference surface (semi-major axis = 6378137 m; flattening = 1/298.257223563), so it is not a perfect sphere, and that should be taken into account. In addition, the image has to be georeferenced somehow: you should know at least the coordinates of its corners (that is the easiest case).
The calculations for this are not very heavy; everything comes down to elementary plane geometry.
Here is a library that could help you work with geospatial data: http://www.gdal.org/.
My advice, if you know nothing about this field, is to consult a specialist (perhaps a satellite or mobile communications specialist at a university), or to try Google if you are familiar enough with math and geodesy, and ask for a mathematical model for projecting GPS coordinates onto a flat surface. You will definitely get an answer.
If you don't need very high accuracy, try it yourself; you may have some luck.
You can experiment in MATLAB (more applicable in this case) or Mathcad if you know enough math, and try positioning a few points on the raster.
If you find the answer in the near future, I would be glad if you posted it here or shared your solution for this particular case with me.
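For the low-accuracy case mentioned above, a rough sketch of the scaling step might look like this. It assumes the world map image uses an equirectangular (plate carrée) projection whose edges sit at exactly -180/+180 degrees longitude and +90/-90 degrees latitude; any other projection needs different math. The function name lonLatToPixel() and the 1920x960 image size are made up for illustration.

```php
<?php
// Sketch: map WGS84 longitude/latitude to pixel coordinates on a world map.
// Assumes an equirectangular image spanning -180..180 degrees longitude
// (left to right) and 90..-90 degrees latitude (top to bottom).
// lonLatToPixel() is a hypothetical helper.
function lonLatToPixel($lon, $lat, $width, $height)
{
    $x = ($lon + 180.0) / 360.0 * $width;
    $y = (90.0 - $lat) / 180.0 * $height; // image y grows downwards
    return array((int) round($x), (int) round($y));
}

// The point at 0 degrees longitude, 0 degrees latitude lands in the
// centre of a 1920x960 image.
list($x, $y) = lonLatToPixel(0.0, 0.0, 1920, 960);
```

The resulting pixel position could then be handed to a GD drawing call such as imagefilledellipse() to mark a visitor location.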
I need to finally answer my own question here, for anyone else who stumbles across this:
I have been using the d3.js library with the TopoJSON extension.
https://github.com/mbostock/topojson
I have a 2D array with various entries at different positions. Some positions have the same value (say 5). From any given element, I need to find the nearest block with value 5.
Image is in this link :)
This is the image to understand the problem better
In the picture above, we could use concepts from digital image processing to find the m-distance between each pair of blocks. But if the problem space is too big (say an array of 100x100 or 200x200), that solution will be time-consuming.
While looking for a solution I found these links:
Wikipedia Link for Nearest Neighbour
Apart from this, how do I map this whole thing into a program?
You can try any PL/SQL code for this, then you can get the nearest point from there.
The simplest (though maybe not the most efficient) way is to use method #1 from the Wikipedia article, a linear search, which works as follows:
Loop through all the coordinate pairs, finding the distance from each one to the point you are testing. Formula: sqrt((x2-x1)^2+(y2-y1)^2)
Keep track of which point is closest to the point you are testing, and of the closest distance so far.
After each calculation, test whether the new distance is shorter; if so, overwrite the closest-distance and closest-point variables.
I can expand this if you like.
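The linear search described above can be sketched in PHP roughly as follows. The function name nearestCell() and the sample grid are made up for illustration; the grid is assumed to be a 2D array indexed as [row][column], and the distance is the Euclidean formula given above.

```php
<?php
// Sketch of the linear search: scan every cell and keep the closest match.
// nearestCell() is a hypothetical helper; $grid is indexed [row][col].
function nearestCell(array $grid, $fromRow, $fromCol, $target)
{
    $best = null;     // position of the closest matching cell so far
    $bestDist = INF;  // its distance from the starting cell
    foreach ($grid as $r => $row) {
        foreach ($row as $c => $value) {
            // Skip non-matching cells and the starting cell itself.
            if ($value !== $target || ($r === $fromRow && $c === $fromCol)) {
                continue;
            }
            $d = sqrt(pow($r - $fromRow, 2) + pow($c - $fromCol, 2));
            if ($d < $bestDist) {
                $bestDist = $d;
                $best = array($r, $c);
            }
        }
    }
    return array($best, $bestDist);
}

// Made-up grid: find the cell with value 5 nearest to row 1, column 2.
$grid = array(
    array(5, 0, 0, 0),
    array(0, 0, 1, 0),
    array(0, 0, 0, 5),
);
list($cell, $distance) = nearestCell($grid, 1, 2, 5);
```

For a 100x100 grid this is 10,000 distance calculations per query, which is usually acceptable; it only becomes a concern if the search is repeated for many starting cells.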
I have a set of numbers:
1,22
1,46
32,1
1,9
32,22
1,14
1,45
1,33
33,22
45,22
32,46
32,9
3,1
3,9
3,22
3,32
3,46
9,22
46,22
46,45
46,33
15,1
15,46
15,6
15,22
15,3
15,9
15,45
15,33
15,32
15,14
I need to build combinations from them, with the rule that a new pair can only be appended if its first number is the same as the last number of the previous pair.
For example, if I have the pair {15,1}, the next one can only be {1,46} and the one after that {46,45}, and the final pair must end with the first number of the whole set. In this case it could be, for example, {45,1}.
So the end result with a limit of 4 pairs would be
{15,1,1,46,46,45,45,1}
I can do basic power sets and generate all possible combinations from a set of numbers, but this seems to be too advanced for me.
I can do C, JavaScript or PHP, so any help or solutions are highly appreciated. And for clarification, this is not homework; this is just something I would like to learn and understand.
This looks as if some graph data structure, and some graph algorithms, would be appropriate. Your graph would comprise nodes (each of which is a number) and edges (each of which represents one of your pairs). Then write the appropriate routine for walking round the graph. It's not entirely clear from your question what the rules for the walk are, but I guess you know.
EDIT
Of course, I should point out that what you have is already a graph data structure: a list of pairs like this is an edge list, which is easily turned into an adjacency list. Google around for algorithms and representations.
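As a rough sketch of that idea, the pairs can be grouped into an adjacency list and walked depth-first. Since the exact rules of the walk are not clear from the question, the rule used here is an assumption: each pair is a directed edge, a chain has exactly the requested number of pairs, and a pair is not reused within one chain. The closing condition from the question is left out. The function names buildAdjacency() and findChains() are made up for illustration.

```php
<?php
// Sketch: treat each pair as a directed edge and walk the adjacency list.
// buildAdjacency() and findChains() are hypothetical helpers; the walk rule
// (chains of exactly $length pairs, no pair reused within a chain) is an
// assumption about what the question intends.
function buildAdjacency(array $pairs)
{
    $adj = array();
    foreach ($pairs as $pair) {
        $adj[$pair[0]][] = $pair[1]; // first number -> list of second numbers
    }
    return $adj;
}

function findChains(array $adj, $start, $length, array $path = array())
{
    if (count($path) === $length) {
        return array($path); // chain is complete
    }
    $chains = array();
    $next = isset($adj[$start]) ? $adj[$start] : array();
    foreach ($next as $to) {
        $edge = array($start, $to);
        if (in_array($edge, $path)) {
            continue; // do not reuse a pair within one chain
        }
        $extended = $path;
        $extended[] = $edge;
        $chains = array_merge($chains, findChains($adj, $to, $length, $extended));
    }
    return $chains;
}

// A few pairs taken from the question's list.
$pairs = array(array(15, 1), array(1, 46), array(46, 45), array(46, 33), array(1, 9));
$adj = buildAdjacency($pairs);
$chains = findChains($adj, 15, 3); // all chains of 3 pairs starting at 15
```

Adding the closing rule would be a matter of filtering the returned chains by the second number of their last pair, once it is settled which number the chain has to end on.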
I only have access to PHP5 (no PostGIS)
I have a bunch of suburb shapefiles, and a few events with lat-lon points. I've zero experience with shapefiles.
What is the best way to check which shapefiles contain these lat-long points (using only PHP)?
Do I convert the shapefiles to a lat-long polygon and use standard polygon-point intersection equation?
Or is there some awesome PHP library for loading/working with shapefiles?
Each shapefile consists of 3 parts: shp, shx and dbf. The shp file contains the geometry, the shx file is an index that helps access the shp, and the dbf is a plain old dBase file that contains data for each record.
You can extract the bounding box from the shp file as follows:
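$handle = fopen("path/to/file.shp", "rb");
fseek($handle, 36); // the bounding box starts at byte 36 of the file header
// Read Xmin, Ymin, Xmax, Ymax as four 8-byte doubles in one call
$bbox = unpack("dmin_x/dmin_y/dmax_x/dmax_y", fread($handle, 32));
fclose($handle);
$min_x = $bbox["min_x"];
$min_y = $bbox["min_y"];
$max_x = $bbox["max_x"];
$max_y = $bbox["max_y"];
// Note: "d" uses machine byte order, so this code will only work on a little-endian machine
// You'll need to do a byte swap on big-endian systems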
Then you can test to see if a given event lies in the shapefile's bounding box.
if (($event_x >= $min_x) && ($event_x <= $max_x)
&& ($event_y >= $min_y) && ($event_y <= $max_y)) {
// the event lies inside this shapefile's bounding box
}
You can put this in a loop and get the subset of your shapefiles whose bounding boxes contain a given event. This doesn't mean your event is inside a polygon in a given shapefile, but it'll get you close. If you need an exact solution, you'll have to extract the polygons and do a point-in-polygon test.
Disclaimer: consider the above code pseudocode. I don't know PHP, so there are probably some bugs. Also, if you can switch to Python, things get a lot easier: there are existing libraries that provide shapefile parsing and spatial indexing, so you can determine exactly which polygons a point intersects in a highly efficient manner.
Ref: ESRI Shapefile Whitepaper, http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
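For the exact test mentioned above, a common choice is the ray-casting (even-odd) point-in-polygon check. Here is a minimal sketch, assuming the polygon vertices have already been extracted from the shp file into a plain array of (x, y) pairs for a single ring without holes; the extraction itself is not shown. The function name pointInPolygon() is made up for illustration.

```php
<?php
// Sketch: ray-casting (even-odd) point-in-polygon test.
// pointInPolygon() is a hypothetical helper; $polygon is a list of
// array(x, y) vertices for a single ring, without holes.
function pointInPolygon($x, $y, array $polygon)
{
    $inside = false;
    $n = count($polygon);
    for ($i = 0, $j = $n - 1; $i < $n; $j = $i++) {
        list($xi, $yi) = $polygon[$i];
        list($xj, $yj) = $polygon[$j];
        // Does the edge from vertex j to vertex i straddle the horizontal
        // line through the point, with the crossing to the point's right?
        if ((($yi > $y) !== ($yj > $y))
            && ($x < ($xj - $xi) * ($y - $yi) / ($yj - $yi) + $xi)) {
            $inside = !$inside; // each crossing flips inside/outside
        }
    }
    return $inside;
}

// Made-up square polygon, tested with one inner and one outer point.
$square = array(array(0, 0), array(10, 0), array(10, 10), array(0, 10));
$isInside = pointInPolygon(5, 5, $square);   // true
$isOutside = pointInPolygon(15, 5, $square); // false
```

Running the cheap bounding-box check first and this test only on the survivors keeps the number of polygon tests small. Points lying exactly on an edge may be reported either way.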
To work with shapefiles, I suggest loading them into a database with spatial relations, and using the spatial relation capabilities of the database.
I personally use PostgreSQL with the PostGIS extension for this. It has a utility for converting shapefiles into SQL insert statements. Then you can put your point into WKT (well-known text) and query the database for the shape(s) it intersects with.
I do not believe PHP itself has any built-in functions for dealing with GIS.
EDIT: Damn, I'm sorry, I didn't see the (no PostGIS) part until after posting. You might be able to convert your polygons to WKT and use a polygon-point intersection test.
I know this question is quite old, but as a service to the community and future users looking for similar functionality in native PHP, I'd like to point out that my PHP Shapefile is a free and open source PHP library that can read and write any ESRI Shapefile, without any third party dependency.
Link to GitHub project: https://github.com/gasparesganga/php-shapefile
I am looking for an algorithm (PHP would be ideal) that can, given two sets of coordinates (start and end), calculate the geographical coordinates along that path at given intervals (say every mile). Note that I am not looking for something like Bresenham's algorithm; I want the exact coordinates along the path.
You need to find the latitude/longitude of a point at a given distance along a great circle passing through your start- and end-point. You'll find the formulae worked out here, which you should be able to adapt to your use case.
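As a rough PHP sketch of that approach, the code below uses the standard intermediate-point formula for a great circle. It assumes a spherical Earth with a mean radius of 3958.8 miles, which is an approximation (WGS84 is an ellipsoid), and it is not meaningful for antipodal end points, where the great circle is not unique. The function name pointsAlongGreatCircle() and its parameters are made up for illustration.

```php
<?php
// Sketch: points at a fixed spacing along the great circle between two
// coordinates, on a spherical Earth (an approximation of WGS84).
// pointsAlongGreatCircle() is a hypothetical helper; the default radius of
// 3958.8 miles is an assumed mean Earth radius.
function pointsAlongGreatCircle($lat1, $lon1, $lat2, $lon2, $stepMiles = 1.0, $radiusMiles = 3958.8)
{
    $p1 = deg2rad($lat1);
    $l1 = deg2rad($lon1);
    $p2 = deg2rad($lat2);
    $l2 = deg2rad($lon2);

    // Angular distance between the end points (haversine formula).
    $h = pow(sin(($p2 - $p1) / 2), 2)
       + cos($p1) * cos($p2) * pow(sin(($l2 - $l1) / 2), 2);
    $delta = 2 * asin(min(1.0, sqrt($h)));
    if ($delta == 0.0) {
        return array(array($lat1, $lon1)); // start and end coincide
    }

    $total = $delta * $radiusMiles; // path length in miles
    $points = array();
    for ($i = 0; $i * $stepMiles < $total; $i++) {
        $f = ($i * $stepMiles) / $total; // fraction of the path covered
        $a = sin((1 - $f) * $delta) / sin($delta);
        $b = sin($f * $delta) / sin($delta);
        $x = $a * cos($p1) * cos($l1) + $b * cos($p2) * cos($l2);
        $y = $a * cos($p1) * sin($l1) + $b * cos($p2) * sin($l2);
        $z = $a * sin($p1) + $b * sin($p2);
        $points[] = array(
            rad2deg(atan2($z, sqrt($x * $x + $y * $y))), // latitude
            rad2deg(atan2($y, $x)),                      // longitude
        );
    }
    $points[] = array($lat2, $lon2); // always finish on the end point
    return $points;
}

// Example call with made-up coordinates: one point per mile.
$path = pointsAlongGreatCircle(51.5, -0.12, 51.6, 0.1, 1.0);
```

Each entry is a (latitude, longitude) pair in degrees. The last segment is usually shorter than the step, since the path length is rarely an exact multiple of it.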