Manipulating maps - php

Given a set of floorplans (in AutoCAD, SVG, or whatever format need be...), I would like to programmatically generate directions from point A to point B. Basically I would like to say: "How do I get from room 101 to room 143?" (or, for triple bonus points, from room 101 to room 323). Anyone have any ideas how to go about this? I am pretty language-agnostic at this point, although I know C(++), Erlang, PHP and Python the best. I do realize this is a tall order.
Thanks!

The general term for this is pathfinding. The problem has been studied extensively for 2D diagrams. I would break apart the problem into these sections:
Convert the CAD model of the floor into a simple model of rooms, doors, and hallways.
Run a pathfinding algorithm on that floor from source to destination, with constraints for human motion.
Convert the results to text directions (turn right, go straight, etc.). The addition of landmarks may be helpful.
For multiple floors, you could just use the one-floor implementation and go from (e.g.) 104 to the 1st-floor stairs, then from the 3rd-floor stairs to 311. The conversion of the CAD drawing to a semantically useful format seems like the most difficult step to me; a sketch of the pathfinding step follows.
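For illustration, here is a minimal sketch of the pathfinding step in Python, assuming the building has already been converted into a weighted graph (the room and hallway names below are hypothetical):

import heapq

def dijkstra(graph, source, dest):
    # Shortest path on a dict-of-dicts graph where graph[a][b] = cost.
    queue = [(0, source, [source])]  # (cost so far, node, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dest:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, step in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (cost + step, neighbour, path + [neighbour]))
    return None  # no route found

# Hypothetical fragment of one floor: rooms and halls as nodes,
# doors as edges, weights as walking distances in metres.
floor1 = {
    'room101': {'hall_a': 3},
    'hall_a':  {'room101': 3, 'hall_b': 12},
    'hall_b':  {'hall_a': 12, 'room143': 4},
    'room143': {'hall_b': 4},
}
print(dijkstra(floor1, 'room101', 'room143'))
# (19, ['room101', 'hall_a', 'hall_b', 'room143'])

The returned node sequence is exactly what the third step would then phrase as turn-by-turn directions.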

I know you want to use PHP, but I recommend Python and networkx. You have to convert your building into a set of (origin, destination, cost) triples and then run either a TSP solver (as mentioned by still standing), A*, or Dijkstra.
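A minimal sketch of that approach (hypothetical room names; networkx reads the 'weight' edge attribute by default):

import networkx as nx

G = nx.Graph()
# (origin, destination, cost) triples describing one floor
G.add_weighted_edges_from([
    ('room101', 'hall_a', 3),
    ('hall_a', 'hall_b', 12),
    ('hall_b', 'room143', 4),
    ('hall_b', 'stairs_1', 6),
])
print(nx.dijkstra_path(G, 'room101', 'room143'))
# ['room101', 'hall_a', 'hall_b', 'room143']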

Read about the traveling salesman problem. There are an infinite number of paths from point A to point B. Are you looking for the shortest? What is your means of transport? Can you fly, or are you forced to walk or drive? These are factors in determining a solution.

Generate countries map based on geoip data

I have searched a lot on this topic, and have found no answers.
I have my own statistics package where I'm saving the geoip data of my users (along with a bunch of other data). I'm using the MaxMind GeoIP library to get this information.
So, in my backend I'm visualizing this data as text with basically two columns: one for the country name and another for the number of visits from that country.
I'd like to generate a map from this data: something like a world map with the countries I have visits from highlighted.
Heat mapping would be nice, but not required.
I don't really care if it's generated with PHP (the GD image library) or jQuery, since I'm already using both of those technologies for the statistics backend. But I'd REALLY like to do this without Google Analytics or their graphing APIs.
I can only speculate, but here is how I would try to do it:
Get a full-HD (or otherwise high-resolution) world map image.
Keep the coordinates in the WGS84 standard (as float values; otherwise they are easy to convert).
Then map the real coordinates onto the scale of the image, but before that...
The major part of this work is math. I'm not actually a mathematician, but I know that it has to be applied here and why.
The main goal is to project coordinates onto a flat surface, because WGS84 uses an oblate spheroid as its reference surface (equatorial radius = 6378137 m; flattening = 1/298.257223563), so it is not a perfect sphere and this should be taken into account. Also, the image should be geo-bound somehow: at the very least you should know the coordinates of its corners (that is the easiest case).
The calculations in that case are not very heavy; everything reduces to elementary plane geometry.
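As a minimal sketch of that easiest case, assuming the image is geo-bound with known corner coordinates and drawn in a plain equirectangular projection (spheroid flattening is ignored here, which is acceptable for a coarse visit map):

def latlon_to_pixel(lat, lon, img_w, img_h,
                    west=-180.0, east=180.0, south=-90.0, north=90.0):
    # Map WGS84 degrees onto pixel coordinates of an equirectangular
    # (plate carree) raster bound by the given corner coordinates.
    x = (lon - west) / (east - west) * img_w
    y = (north - lat) / (north - south) * img_h  # pixel y grows downwards
    return int(round(x)), int(round(y))

# e.g. Helsinki (60.17 N, 24.94 E) on a 1920x960 world map
print(latlon_to_pixel(60.17, 24.94, 1920, 960))  # (1093, 159)

For a raster in some other projection, a proper geospatial library (such as the GDAL mentioned below) can do the transformation exactly.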
Here is a library that can help you work with geospatial data: http://www.gdal.org/.
My advice: if you know nothing about this field, consult some specialists (perhaps a satellite-communications or mobile-communications specialist at a university or academy), or try Google if you are familiar enough with the math and geodesy, and ask for a mathematical model for projecting GPS coordinates onto a flat surface; you will definitely get an answer.
If you don't need very high accuracy, try it yourself; you may well have luck.
You can also try it in MATLAB (more applicable in this case) or Mathcad, if you know enough math, and position a few points on the raster yourself.
If you find the answer in the near future, I would be glad if you posted it here or shared your solution for this particular case.
I need to finally answer my own question here. For anyone else who stumbles across this:
I have been using the d3js library with the topojson extension.
https://github.com/mbostock/topojson

Comparing Images (shape of the silhouette image)

I'm looking for a way to compare a number of silhouettes and determine which two are most alike; obviously I would like to do this in the most efficient way possible. I thought perhaps this could be done using the ImageMagick morphology functionality, but perhaps I'm misunderstanding that feature: http://www.imagemagick.org/Usage/morphology/#intro
Any thoughts?
Mathematical morphology is a technique used in digital image processing mostly for performing image shape analysis. Therefore it can be used for comparing two images to discover whether their shape is similar.
Various methods can be used to achieve this. The basic operations are erosion and dilation, and the others are more or less based on them. You can apply iterations of erosion with an appropriately chosen structuring element (chosen with the nature of the images in mind) to obtain the basic shape of each image, and then compare them pixel-wise. Alternatively, you can check after each iteration whether anything remains of the images, and base their similarity on that. This of course works only for some simple cases; more complex methods with special treatment have to be implemented for general use. A minimal sketch of the first idea follows.
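This sketch assumes the silhouettes are already binarized into same-sized boolean arrays (SciPy is used here instead of ImageMagick, purely for brevity):

import numpy as np
from scipy import ndimage

def shape_distance(a, b, iterations=3):
    # Erode both silhouettes down to their rough core shape,
    # then count pixel-wise disagreements (lower = more alike).
    core_a = ndimage.binary_erosion(a, iterations=iterations)
    core_b = ndimage.binary_erosion(b, iterations=iterations)
    return np.count_nonzero(core_a ^ core_b)

# Toy example: a filled square vs. itself and vs. a shifted copy
square = np.zeros((32, 32), dtype=bool)
square[8:24, 8:24] = True
shifted = np.roll(square, 4, axis=1)
print(shape_distance(square, square))   # 0
print(shape_distance(square, shifted))  # > 0

To find the two most alike silhouettes in a set, compute this distance for every pair and take the minimum.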
If you would like to use mathematical morphology for comparing images, I recommend familiarizing yourself with the concept by reading some related materials, e.g. Mathematical Morphology. There are also other ways you may find more suitable for your task, but the problem of comparing images is in general a very complicated issue and not straightforward at all.
A silhouette is a view of an object or scene as a solid shape of a single color, usually black; the shape depicts the outline of the object, while the interior is featureless. It would have been useful if you could have given examples of your test images. For such a simplified kind of image, it is worth observing the spatial distribution of shape components of different sizes. This is efficiently obtained from the granulometry operation in mathematical morphology, which basically extracts the connected components of the image at different scales. You could look at this paper, which uses granulometry to characterize human silhouettes using a dictionary learnt from positive and negative human shapes. A rough sketch of the operation follows.
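This is a simplified illustration of a granulometry (pattern spectrum); the disc construction and comparison metric are my own simplifications, not taken from the paper:

import numpy as np
from scipy import ndimage

def granulometry(binary_img, max_radius=8):
    # Pattern spectrum: how many foreground pixels survive an opening
    # with discs of increasing radius. Similar shapes give similar curves.
    spectrum = []
    for r in range(1, max_radius + 1):
        y, x = np.ogrid[-r:r + 1, -r:r + 1]
        disc = x * x + y * y <= r * r   # crude disc of radius r
        opened = ndimage.binary_opening(binary_img, structure=disc)
        spectrum.append(np.count_nonzero(opened))
    return np.asarray(spectrum)

# Two silhouettes can then be compared by the distance between spectra:
# np.linalg.norm(granulometry(a) - granulometry(b))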

Use GD or any other php library to build a workflow

I am developing a sports website that will keep a record of all tournaments in tennis, football and rugby. My database structure is built to hold who plays whom in which tournament, so it would just take a SELECT to display all the information. The type of workflow I am talking about is the one commonly used in the sports arena, where players' names are listed head to head and the level of the match (knockout, quarter-final, semi-final, etc.) is also listed. I do not know the correct term for this, though. I will give you an example of how it would look.
I am sure this is possible by using web technology, I am just finding it hard on where to start. Any advice or suggestions are much appreciated. Also if there are any libraries I could use for this, that would be immensely helpful.
Depending on how you want to format the information you should be able to do it in a few ways.
You could use GD like you mentioned, but that may get a bit tedious once you get to larger and larger brackets. (I don't have a lot of experience with GD, but I know the basics.)
I have implemented a 256-person ladder/bracket using HTML and CSS. This proved to be pretty simple to do, and it should scale easily and be easy to make changes to.
Well on a first glance I would see the following data:
Teams
Cups (having Rounds)
Rounds (of Matches)
Matches (of Teams)
You could model that into a relational database, e.g. MySQL.
You can then create models in classes for your application, e.g. in PHP.
You can then create a web UI to display the data you've entered into the database. You can use GD for that if you need to, though I think HTML is not bad for this; I would do it with simple text-based output first before turning everything into an image.
Maybe that's helpful. Was a bit lengthy for a comment, so I added it as an answer.
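For illustration, a rough sketch of those entities (hypothetical field names; in production these would map to MySQL tables with foreign keys). Python is used here just to keep the sketch compact; the PHP model classes would mirror it:

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Team:
    name: str

@dataclass
class Match:
    home: Team
    away: Team
    winner: Optional[Team] = None   # filled in once the match is played

@dataclass
class Round:
    label: str                      # 'knockout', 'quarter-final', ...
    matches: List[Match] = field(default_factory=list)

@dataclass
class Cup:
    name: str
    rounds: List[Round] = field(default_factory=list)

semi = Round('semi-final', [Match(Team('Team A'), Team('Team B'))])
cup = Cup('Hypothetical Open', [semi])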

Is there any way to detect strings like putjbtghguhjjjanika?

People search in my website and some of these searches are these ones:
tapoktrpasawe
qweasd qwa as
aıe qwo ıak kqw
qwe qwe qwe a
My question: is there any way to detect strings similar to the ones above?
I suppose it is impossible to detect 100% of them, but any solution will be welcomed :)
Edit: I mean gibberish searches. For example, some people search for strings like "asdqweasdqw", "paykaprkg", "iwepr wepr ow" in my search engine, and I want to detect such gibberish searches.
It doesn't matter whether a search returns 0 results or not, so I can't use that logic:
some new brands or products would be ignored if I only considered "regular words".
Thank you for your help
You could build a model of character to character transitions from a bunch of text in English. So for example, you find out how common it is for there to be a 'h' after a 't' (pretty common). In English, you expect that after a 'q', you'll get a 'u'. If you get a 'q' followed by something other than a 'u', this will happen with very low probability, and hence it should be pretty alarming. Normalize the counts in your tables so that you have a probability. Then for a query, walk through the matrix and compute the product of the transitions you take. Then normalize by the length of the query. When the number is low, you likely have a gibberish query (or something in a different language).
If you have a bunch of query logs, you might first make a model of general English text, and then heavily weight your own queries in that model training phase.
For background, read about Markov Chains.
Edit, I implemented this here in Python:
https://github.com/rrenaud/Gibberish-Detector
and buggedcom rewrote it in PHP:
https://github.com/buggedcom/Gibberish-Detector-PHP
Sample output:
my name is rob and i like to hack True
is this thing working? True
i hope so True
t2 chhsdfitoixcv False
ytjkacvzw False
yutthasxcvqer False
seems okay True
yay! True
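For illustration, here is a minimal sketch of the transition-matrix idea described in this answer (this is not the linked implementation; the corpus file name and the threshold are placeholders you would tune on real queries):

import math
from collections import defaultdict

ALPHABET = 'abcdefghijklmnopqrstuvwxyz '

def train(text):
    # Count character-to-character transitions, then normalise each
    # row of the matrix into probabilities.
    counts = defaultdict(lambda: defaultdict(int))
    cleaned = [c for c in text.lower() if c in ALPHABET]
    for a, b in zip(cleaned, cleaned[1:]):
        counts[a][b] += 1
    return {a: {b: n / sum(row.values()) for b, n in row.items()}
            for a, row in counts.items()}

def avg_log_prob(query, probs):
    # Product of the transition probabilities, normalised by length
    # (computed in log space to avoid underflow on long queries).
    cleaned = [c for c in query.lower() if c in ALPHABET]
    pairs = list(zip(cleaned, cleaned[1:]))
    if not pairs:
        return float('-inf')
    floor = 1e-6  # probability for transitions never seen in training
    return sum(math.log(probs.get(a, {}).get(b, floor))
               for a, b in pairs) / len(pairs)

model = train(open('english_corpus.txt').read())  # any large English text
threshold = -5.0  # placeholder; tune on known-good and known-bad queries
print(avg_log_prob('is this thing working', model) > threshold)  # likely True
print(avg_log_prob('t2 chhsdfitoixcv', model) > threshold)       # likely False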
You could do what Stackoverflow does and calculate the entropy of the string.
Of course, this is just one of many heuristics SO uses to determine low-quality answers, and should not be relied upon as 100% accurate.
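A minimal sketch of that entropy heuristic (any cut-off value would be a placeholder to tune; very low entropy catches repetitive junk, though mashed-keyboard strings can still score high):

import math
from collections import Counter

def shannon_entropy(s):
    # Bits per character: 0 for a repeated character, higher when
    # many distinct characters occur with even frequency.
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())

print(shannon_entropy('aaaaaaa'))        # 0.0
print(shannon_entropy('tapoktrpasawe'))  # ~3.03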
Assuming you mean gibberish searches... it would be more trouble than it's worth. You are providing them with search functionality; let them use it however they please. I'm sure there are algorithms out there that detect strange character groupings, but it would probably be more resource/labour-intensive than simply returning no results.
I had to solve a closely related problem for a source code mining project, and although the package is written in Python and not PHP, it seemed worth mentioning here in case it can still be useful somehow. The package is Nostril (for "Nonsense String Evaluator") and it is aimed at determining whether strings extracted during source-code mining are likely to be class/function/variable/etc. identifiers or random gibberish. It works well on real text too, not just program identifiers. Nostril uses n-grams (similar to the Gibberish Detector in the answer by Rob Neuhaus) in combination with a custom TF-IDF scoring function. It comes pretrained, and is ready to use out of the box.
Example: the following code,
from nostril import nonsense
real_test = ['bunchofwords', 'getint', 'xywinlist', 'ioFlXFndrInfo',
             'DMEcalPreshowerDigis', 'httpredaksikatakamiwordpresscom']
junk_test = ['faiwtlwexu', 'asfgtqwafazfyiur', 'zxcvbnmlkjhgfdsaqwerty']
for s in real_test + junk_test:
    print('{}: {}'.format(s, 'nonsense' if nonsense(s) else 'real'))
will produce the following output:
bunchofwords: real
getint: real
xywinlist: real
ioFlXFndrInfo: real
DMEcalPreshowerDigis: real
httpredaksikatakamiwordpresscom: real
faiwtlwexu: nonsense
asfgtqwafazfyiur: nonsense
zxcvbnmlkjhgfdsaqwerty: nonsense
The project is on GitHub and I welcome contributions.
I'd think you could detect these strings the same way you could detect "regular words." It's just pattern matching, no?
As to why users are searching for these strings, that's the bigger question. You may be able to stem the gibberish searches some other way. For example, if it's comment-spam phrases that people (or a script) are looking for, then install a CAPTCHA.
Edit: Another end-run around interpreting the input is to throttle it slightly. Allow a search every 10 seconds or so. (I recall seeing this on forum software, as well as various places on SO.) This will take some of the fun out of searching for sdfpjheroptuhdfj over and over again, and at the same time won't interfere with the users who are searching for, and finding, their stuff.
As some people commented, there are no hits in google for tapoktrpasawe or putjbtghguhjjjanika (Well, there are now, of course) so if you have a way to do a quick google search through an API, you could throw out any search terms that got no Google results and weren't the names of one of your products. Why you would want to do this is a whole other question - are you trying to save effort for your search library? Make your hand-review of "popular search terms" more meaningful? Or are you just frustrated at the inexplicable behaviour of some of the people out on the big wide internet? If it's the latter, my advice is just let it go, even if there is a way to prevent it. Some other weirdness will come along.
Short answer - Gibberish Search
A probabilistic language model works.
Logic
A word is made up of a sequence of characters. If we look at how frequently each pair of adjacent characters occurs together, sum those frequencies over all contiguous character pairs in the word, and check whether the sum crosses a threshold (derived from English words), we can decide whether the string is a proper English word. In brief, this is the logic Markov chains are famous for.
Link
For the mathematics of gibberish detection and a better understanding, refer to this video: https://www.youtube.com/watch?v=l15C8UJu17s . Thanks!!
If the search is performed on products, you could cache their names or codes and check the search terms against that list before querying the database. Otherwise, if your site is for English users, you could build a dictionary of character sequences that aren't used in the English language, like "qwkfagsd". Although, agreeing with the other answer, that will be more resource-intensive than doing nothing.

Reducing graph data without losing graph shape

I have a dataset with 100 000 datapoints which I have to plot on a graph. The resulting graph will be about 500px wide, so for every pixel there will be about 200 datapoints, which seems quite unnecessary.
I need to find a way to get rid of the excess datapoints without losing the shape of the graph to speed up the rendering. Currently the rendering of all 100 000 points can take 10+ seconds as I'm also using anti-aliasing and other "effects".
I tried to approach this problem by plotting only every 200th datapoint, but this results in some of the more significant points being missed (think of spikes in the graph that I want to be able to show). I also thought of splitting the dataset into chunks of 200 datapoints and taking the maximum value from every chunk, but that won't work either.
Is anyone aware of a method that would suit my needs here? The language I'm using is PHP, graph is created by GD and data is coming from MySQL, so optimizations to some of those are welcome.
The data is in this format:
Datetime Value
2005-01-30 00:00:00 35.30
2005-01-30 01:00:00 35.65
2005-01-30 02:00:00 36.15
2005-01-30 03:00:00 35.95
...
And the resulting graph currently looks like this:
(Graph sample: http://www.ulmanen.fi/stuff/graph-sample.png)
I know this question is quite old, but I had an almost identical problem.
To reduce the number of points to display without affecting the shape of the graph, we use the Ramer-Douglas-Peucker algorithm. The difference in shape between the uncompressed graph and the one produced by this algorithm is unnoticeable.
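For reference, a minimal recursive sketch of Ramer-Douglas-Peucker in Python (the epsilon tolerance is a placeholder to tune against your data):

import math

def rdp(points, epsilon):
    # Drop points that deviate less than epsilon from the straight
    # line joining the first and last point of the segment.
    if len(points) < 3:
        return points
    (x1, y1), (x2, y2) = points[0], points[-1]
    seg_len = math.hypot(x2 - x1, y2 - y1)
    best_i, best_d = 0, 0.0
    for i, (x, y) in enumerate(points[1:-1], start=1):
        if seg_len == 0:
            d = math.hypot(x - x1, y - y1)
        else:  # perpendicular distance from (x, y) to the chord
            d = abs((x2 - x1) * (y1 - y) - (x1 - x) * (y2 - y1)) / seg_len
        if d > best_d:
            best_i, best_d = i, d
    if best_d > epsilon:
        left = rdp(points[:best_i + 1], epsilon)
        right = rdp(points[best_i:], epsilon)
        return left[:-1] + right  # avoid duplicating the split point
    return [points[0], points[-1]]

pts = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
print(rdp(pts, epsilon=1.0))
# [(0, 0), (2, -0.1), (3, 5), (7, 9)]

Because it always keeps the point that deviates most, spikes survive while flat runs collapse, which addresses the every-200th-point problem from the question.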
It seems to me that 1 in 200 is pretty serious data loss, and if those 200 values that should be represented by one value on the graph aren't close enough to be meaningfully substituted with an average, you have yourself a problem. If an average isn't good enough, you must find a criterion for which data is more significant and should be included, and we can't help you with that because we don't know what kind of data it is, its statistical properties, or why any value would be more significant than another. With that additional info, maybe a more specific answer could be given.
EDIT: After looking at the graph, it seems that you need both the minimum and the maximum in a given interval, because the dark blue area represents values between those two, correct? Maybe you can take 100 values and make the graph from their minimum, maximum, and average, so that every point in the graph is made from 6 values instead of 200, or something like that.
Another approach that might work is splitting the data up into 200-point bins and discarding all but the maximum, minimum, and median points in each interval. Each of the three points in the interval gets plotted at its original location, so the locations of the extreme values won't change. Using the median instead of the mean will probably work better for your data set, because the maxima are much more extreme than the minima, which would cause the filtered graph to shift upwards if you used the mean.
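A rough sketch of that binning idea (assuming the data is a list of (x, y) tuples, as in the question's Datetime/Value table):

def decimate_min_max_med(points, bin_size=200):
    # Split samples into fixed-size bins and keep only the minimum,
    # maximum and median y of each bin, at their original x positions.
    kept = []
    for i in range(0, len(points), bin_size):
        chunk = sorted(points[i:i + bin_size], key=lambda p: p[1])
        trio = {chunk[0], chunk[-1], chunk[len(chunk) // 2]}
        kept.extend(sorted(trio))  # restore x order within the bin
    return kept

# 100 000 points -> at most 1 500 plotted points, spikes intact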
One approach to your problem is max-min decimation; I suggest you Google for a definition and an algorithm, as I don't have either to hand or I would share them with you.
Beyond that, I think you might use a low-pass (anti-aliasing) filter followed by simple decimation (i.e. throwing away excess points).
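A short sketch of that filter-then-decimate pipeline using SciPy (the 'values' array is a hypothetical stand-in for the 100 000 samples; large factors are best decimated in stages):

import numpy as np
from scipy.signal import decimate

y = np.asarray(values, dtype=float)  # hypothetical: the raw samples
# decimate() low-pass filters, then keeps every q-th sample;
# 200x overall, applied in stages as 10 * 10 * 2
for q in (10, 10, 2):
    y = decimate(y, q)
# y now has ~500 points; note the filtering smooths narrow spikes

The trade-off: a low-pass filter deliberately attenuates exactly the kind of narrow spikes the question wants to keep, so the min/max approaches above may suit this data better.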
I think that an ordinary average over each bunch of 200 points would be quite enough.
I don't know what your code/data source looks like, but is it possible to do a DISTINCT in your MySQL SELECT statement to reduce the number of data points being brought back to your application?
