Rotating between URLs, how precise would randomly picking from an array be? - php

thanks for taking the time to read this.
My goal here is to rotate between links, anywhere from 1 link up to, let's say 4.
The easy way to do this, would be to make an array of the links and using php, pick one randomly to display.
While this is pretty easy and quick to set up, it also has me worried a bit, because random selection doesn't guarantee an even split, especially not on a small scale.
Giving you some numbers here: let's say my website gets anywhere from 3000 to 5000 unique impressions a day. How evenly would a random pick from an array distribute those impressions across 2, 3 or 4 links?
If anyone has an idea on how to make a system that rotates very accurately and evenly, let me know!
Thanks in advance to anyone that can help me out :)

Over a lengthy period of time with many impressions, most random functions will be evenly distributed. For a small sample, the results may be noticeably skewed... but the more impressions you accumulate, the closer the split will get to even.
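For the random approach, a one-liner over the array is enough. A minimal sketch (the URLs are placeholders):
$links = array(
    'https://example.com/a',
    'https://example.com/b',
    'https://example.com/c',
);
// Uniform random pick; at 3000-5000 impressions a day the per-link
// counts will be close, but any single day can still be skewed.
echo $links[array_rand($links)];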
But for perfectly even distribution, nothing beats a straight cafeteria-plate "next-up" array.
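A minimal "next-up" sketch, assuming a small counter file is acceptable (the file name and locking scheme are illustrative):
$links = array('https://example.com/a', 'https://example.com/b', 'https://example.com/c');
$fp = fopen(__DIR__ . '/rotation.counter', 'c+');
flock($fp, LOCK_EX);                      // serialize concurrent hits
$count = (int) stream_get_contents($fp);  // previous impression count
$link  = $links[$count % count($links)];  // strict round-robin: exactly even
ftruncate($fp, 0);
rewind($fp);
fwrite($fp, (string) ($count + 1));
flock($fp, LOCK_UN);
fclose($fp);
echo '<a href="' . htmlspecialchars($link) . '">link</a>';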
Either way, I think you will be satisfied.

Related

Get longest common substring based on string similarity

I have a table with a column that includes names like:
Home Improvement Guide
Home Improvement Advice
Home Improvement Costs
Home Gardening Tips
I would like the result to be:
Home Improvement
Home Gardening Tips
Based on a search for the word 'Home'.
This can be accomplished in MySQL or PHP or a combination of the two. I have been pulling my hair out trying to figure this out; any help in the right direction would be greatly appreciated. Thanks.
Edit / Problem kinda solved:
I think this problem can be solved much easier by changing the logic a little. For anyone else with this problem, here is my solution.
Get the sql results
Find the first occurrence of the searched word, one string at a time, and get the next word in the string to the right of it.
The results would include the searched word concatenated with the distinct adjoining word.
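In PHP, the "adjoining word" step could be sketched like this (assuming $rows holds the name column returned by the query; the regex approach is one option among several):
$search = 'Home';
$rows = array(
    'Home Improvement Guide',
    'Home Improvement Advice',
    'Home Improvement Costs',
    'Home Gardening Tips',
);
$groups = array();
foreach ($rows as $name) {
    // Match the searched word plus the word immediately after it.
    if (preg_match('/\b' . preg_quote($search, '/') . '\s+(\S+)/i', $name, $m)) {
        $groups[$search . ' ' . $m[1]] = true;
    }
}
print_r(array_keys($groups)); // Home Improvement, Home Gardening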
Not as good of a solution, but it works for my project. Thanks for the help everyone.
This is too long for a comment.
I don't think that Levenshtein distance does what you want. Consider:
Home Improvement
Home Improvement Advice on Kitchen Remodeling
Home Gardening
The first and third are closer by the Levenshtein measure than the first and second. And yet, I'm guessing that you want the first and second to be paired.
I have an idea of the algorithm you want. Something like this:
Compare every returned string to every other string
Measure the length of the initial overlap
Find the maximum over all the strings, and pair those
Repeat the process with the second largest overlap and so on
Painful, but not impossible to implement in SQL. Maybe very painful.
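In PHP the overlap measurement is straightforward; a rough sketch of the pairing idea (all names are illustrative):
// Length of the initial (prefix) overlap between two strings.
function prefixOverlap($a, $b) {
    $max = min(strlen($a), strlen($b));
    for ($i = 0; $i < $max; $i++) {
        if ($a[$i] !== $b[$i]) {
            return $i;
        }
    }
    return $max;
}

$strings = array(
    'Home Improvement',
    'Home Improvement Advice on Kitchen Remodeling',
    'Home Gardening',
);
$pairs = array();
for ($i = 0; $i < count($strings); $i++) {
    for ($j = $i + 1; $j < count($strings); $j++) {
        $pairs[] = array(prefixOverlap($strings[$i], $strings[$j]), $i, $j);
    }
}
rsort($pairs); // largest overlap first; $pairs[0] pairs strings 0 and 1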
What this suggests to me is that you are looking for a hierarchy among the products. My suggestion is to just include a category column and return the category. You may need to manually insert the categories into your data.

Reducing graph data without losing graph shape

I have a dataset with 100 000 datapoints which I have to plot on a graph. The resulting graph will be about 500px wide, so for every pixel there will be about 200 datapoints, which seems quite unnecessary.
I need to find a way to get rid of the excess datapoints without losing the shape of the graph to speed up the rendering. Currently the rendering of all 100 000 points can take 10+ seconds as I'm also using anti-aliasing and other "effects".
I tried to approach this problem by just taking every 200th datapoint and plotting them, but this results in some of the more significant points being missed (think of spikes in the graph that I want to be able to show). I also thought of splitting the dataset into chunks of 200 datapoints, then taking the maximum value from every chunk, but that won't work either.
Is anyone aware of a method that would suit my needs here? The language I'm using is PHP, graph is created by GD and data is coming from MySQL, so optimizations to some of those are welcome.
The data is in this format:
Datetime Value
2005-01-30 00:00:00 35.30
2005-01-30 01:00:00 35.65
2005-01-30 02:00:00 36.15
2005-01-30 03:00:00 35.95
...
And the resulting graph currently looks like this:
(graph sample: http://www.ulmanen.fi/stuff/graph-sample.png)
I know this question is quite old but I had a problem almost similar.
To reduce the number of points to display without affecting the shape of the graph, we used the Ramer-Douglas-Peucker algorithm. The difference in shape between the uncompressed graph and the one reduced with this algorithm is unnoticeable.
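For reference, a rough PHP sketch of Ramer-Douglas-Peucker ($points is an array of array(x, y) pairs, $epsilon the maximum allowed deviation; the names are illustrative):
function rdp(array $points, $epsilon) {
    $n = count($points);
    if ($n < 3) {
        return $points;
    }
    // Find the point furthest from the line joining the endpoints.
    $maxDist = 0.0;
    $index = 0;
    for ($i = 1; $i < $n - 1; $i++) {
        $d = perpDist($points[$i], $points[0], $points[$n - 1]);
        if ($d > $maxDist) {
            $maxDist = $d;
            $index = $i;
        }
    }
    if ($maxDist > $epsilon) {
        // Significant deviation: keep that point, recurse on both halves.
        $left  = rdp(array_slice($points, 0, $index + 1), $epsilon);
        $right = rdp(array_slice($points, $index), $epsilon);
        return array_merge(array_slice($left, 0, -1), $right); // drop duplicate join point
    }
    return array($points[0], $points[$n - 1]); // segment collapses to its endpoints
}

function perpDist($p, $a, $b) {
    $dx = $b[0] - $a[0];
    $dy = $b[1] - $a[1];
    $len = sqrt($dx * $dx + $dy * $dy);
    if ($len == 0.0) {
        return sqrt(pow($p[0] - $a[0], 2) + pow($p[1] - $a[1], 2));
    }
    // Triangle area over base length gives the height.
    return abs($dy * $p[0] - $dx * $p[1] + $b[0] * $a[1] - $b[1] * $a[0]) / $len;
}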
It seems to me that 1 in 200 is pretty serious data loss, and if those 200 values that should be represented by one value on the graph aren't close enough to be meaningfully substituted with an average, you have yourself a problem. If an average isn't good enough, you must find a criterion to tell what data is more significant and should be included, and we can't help you with that because we don't know what kind of data it is, its statistical properties, or why any value would be more significant than another. With that additional info, maybe a more specific answer could be given.
EDIT: After looking at the graph, it seems that you need both the minimum and the maximum in a given interval, because the dark blue area covers the values between those two, correct? Maybe you can take 100 values and make a graph from the minimum, maximum, and average, so that every point in the graph is made from 6 instead of 200 values, or something like that.
Another approach that might work is splitting the graph up into 200 point bins, and discard all but the maximum, minimum, and median points in each interval. Each of the three points in the interval gets plotted at its original location, so the locations of the extreme values won't change. Using the median instead of the mean will probably work better for your data set because the maxima are much more extreme than the minima, which would cause the filtered graph to shift upwards if you used the mean.
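A quick sketch of that binning idea (assuming $data maps timestamp => value in chronological key order; the names are illustrative):
function decimate(array $data, $binSize = 200) {
    $kept = array();
    foreach (array_chunk($data, $binSize, true) as $bin) {
        asort($bin);                       // sort by value, keep original keys
        $keys = array_keys($bin);
        $kept[$keys[0]] = $bin[$keys[0]];  // minimum
        $mid = $keys[(int) (count($keys) / 2)];
        $kept[$mid] = $bin[$mid];          // median
        $last = $keys[count($keys) - 1];
        $kept[$last] = $bin[$last];        // maximum
    }
    ksort($kept); // restore chronological order
    return $kept;
}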
One approach to your problem is max-min decimation; I suggest you Google for a definition and an algorithm. I don't have either to hand, or I would share them with you.
Beyond that, I think you might use a low-pass (anti-aliasing) filter followed by simple decimation (i.e. throwing away excess points).
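A crude sketch of that combination, with a moving average standing in as the low-pass step (window and step sizes are illustrative):
function lowPassDecimate(array $values, $window, $step) {
    $out = array();
    // Average a sliding window (simple low-pass), then jump ahead by
    // $step so only one smoothed point per step survives.
    for ($i = 0; $i + $window <= count($values); $i += $step) {
        $out[] = array_sum(array_slice($values, $i, $window)) / $window;
    }
    return $out;
}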
I think that an ordinary average of each bunch of 200 points would be just enough.
I don't know what your code/data source looks like, but is it possible to do a DISTINCT on your MySQL SELECT statement to reduce the number of data points being brought back to your application?

how to store and search mp3 by its content

I want to store multiple mp3 files and search them by supplying some part of a song, to detect which song it is.
I am thinking of storing all the binary content in MySQL, and when I want to search for a specific song by content I will take some middle portion of the song and match it against the binary data in MySQL.
My questions are:
Is this a reasonable way to find songs by their content?
Is it right to store the songs' content in the database or should I use the filesystem?
This is not going to work. MP3 is a "lossy" format. That means that it constantly alters subtle nuances of the music when encoding, thus producing totally different byte-wise data on almost every encoding for the same song.
Also, even in an uncompressed format like WAV, two identical records at different volumes will produce different byte data. So, it is impossible to compare music by comparing the byte values of the file's contents.
A binary comparison will work only for two exact identical copies of the same MP3 file. It won't even work anymore when you re-encode the same MP3 file with identical settings.
Comparing music is not a trivial matter, several approaches exist but to my knowledge none that can be used in PHP.
If you're lucky, there exists a web service that allows some kind of matching. Expect it to be commercial in some way, though - I doubt we are at the stage where this kind of thing can be used free of charge.
Is this a reasonable way to find songs by their content?
Only if you can be sure that the part you get as search criterium will actually be an excerpt from that particular MP3 file... and that is very, very unlikely. If the part can be from a different source (i.e. a different recording of the same song, or just a differently compressed MP3), you'll have to use audio fingerprinting which is vastly more complicated.
Is it right to store the songs' content in the database or should I use the filesystem?
If you do simple binary matching, there is no point in using a database. If you have a more complex indexing technique (such as audio fingerprints) then using a database can make sense.
As others have pointed out - comparing MP3s by looking at the binary content of files is not going to work.
I wrote something like this in Java whilst at university for my final year project. I'd be more than happy to send you the source code. It dealt in relative similarities - "song X is more similar to song Y than it is to song Z", rather than matches, but it might be a step in the right direction.
And please, whatever you do, don't try and do this in PHP. The algorithm I used needed me to compute (if I remember correctly - I worked on this around 3 years ago) 30 30x30 matrices for each MP3 it analysed. Each song took around 30 seconds to process to a set of matrices on my clunky old machine (I'm sure my new PC could get the job done significantly quicker). Once I had those matrices for n songs a second step computed differences between each pair of songs, and a third step reduced those differences down to m-dimensional space. Each of these 3 steps takes a fair amount of horsepower, and PHP definitely isn't the right horse for the job.
What PHP might work for is a frontend - I ended up with a queryable web-app written in Ruby on Rails, where I had a simple backend which stored the co-ordinates of each song in m-dimensional space (I happened to choose m = 6) - given a particular song, or fragment, X, you could then compute songs within a certain "distance" of X.
NB. I should probably point out that all the code I wrote was basically just a wrapper around libraries others had written - libraries by some smart people at a university in Austria - those libraries took two songs and generated the matrices - all I did was compute distances and map the distances of lots of songs into m-dimensional space. Wish I was smart enough to have done the first bit too!
I don't fully understand what you're trying to do, but if you're going to index an MP3 collection, it's probably a better idea to store a hash (of sufficient length) rather than the actual file.
The problem is that the bytes don't give you any insight into the CONTENT of the file, i.e. the music in it. Even if you cut the metadata from the bytes before comparing (to get rid of noise like changes in spelling/capitalisation of metadata), you only know something about the unique file itself. So you could compare two identical files (i.e. exact duplicates) for equality, but you couldn't compare any two random files for similarity.
To search songs, you probably want to index their tags and focus on a nice, easy-to-use UI so users can look for them in flexible ways.
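If exact-duplicate lookup turns out to be all that is needed, a tiny sketch of the hash idea mentioned above (the table and column names are made up):
// Identical files produce identical hashes; re-encodes do not.
$hash = sha1_file('/music/song.mp3');
// e.g. SELECT title FROM tracks WHERE content_hash = :hash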
As said above, the same song will show different content bytes depending on the encoding.
However, one idea pointing in your direction, and I'm not sure how feasible it is, would be to index some song patterns that may uniquely identify it. For example, what do all Johnny Cash songs have in common? Volume, tone, a combination of them? And when you get a portion of content, you could extract that same pattern from it and match. That would be an interesting concept.

Best way to get a random word for a captcha script in PHP

I am working on a new captcha script and it is almost complete, except I would like to have a list of words; for example, let's say I have a list of 300 five-letter words that I would like to use for the captcha image text.
What would be the best way, performance-wise, to deal with this list on a high-traffic site?
Read the words from a text file on every load
Store in an array
other?
Using a fixed list of words could make your captcha weak, since it restricts the number of variations to just n!/(n - k)! options. With n = 300 words and k = 2 different words per captcha, that is just 89,700 options no matter how long the words are.
If you used a sequence of four random letters (a-z) instead, you would get more options (exactly n^k = 26^4 = 456,976).
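A minimal sketch of that random-letters alternative:
// Four letters drawn uniformly from a-z: 26^4 = 456,976 combinations.
$captcha = '';
for ($i = 0; $i < 4; $i++) {
    $captcha .= chr(rand(ord('a'), ord('z')));
}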
If you just want 300 words to choose from, I'd just put them all in an array in straight PHP code and pull one out randomly. That would give the best performance.
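Something like this sketch (the word list itself is abbreviated here):
// words.php is plain PHP source, so it benefits from any opcode cache
// like an ordinary include; array_rand() picks an index uniformly.
$words = array('apple', 'baker', 'candy', /* ...296 more... */ 'zebra');
echo $words[array_rand($words)];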
Best option for performance
It would be best to put the list of words in memory (APC or Memcache => google/stackoverflow search for APC or Memcache) to get the best performance, because disc IO is what will make your site slow most of the time. For this you should have a box with enough memory (>= 128MB) and the ability to install software (APC/Memcache). If you want good performance on a high traffic site, you should be willing to pay for it!
If you are on a shared hosting provider (but then you won't get the best performance), it would be best to put the words in an array in the same file, because every require statement will fetch the file from disc.
return random word
Like lucky said, you can fetch a random word with a simple rand() call:
return $words[rand(0, count($words) - 1)];
Where $words is the array with all the words.
VPS hosting
vpslink
Slicehost
These are some cheap VPS hosts I found using Google, but I think you should do some more research to find the best VPS host for your high-performance site.
Instead of 300 words, you could simply generate a random number and display that. No need for a list, or loading a list, or managing the list, ....
Just how many logons per second do you need to handle? This doesn't seem like the right place to spend time in optimization. Just about any way you find the random word should be fine, especially if your word list is only 300 words.
I'd start with a simple text file, one word per line, and just do something simple like
$words = file("wordlist.txt", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
return $words[rand(0, count($words) - 1)];
and only if it really proved to be a bottleneck would I change it to do a random fseek() or some other "high performance" trick.

PHP: Script for generating Crossword game?

I need a script for generating a crossword game. I have a list of 8 words for which I want to generate a crossword game, let's say on a 15-column by 15-row grid.
I am not sure how to approach this problem. How can I generate this using PHP? Can anyone tell me how to do that?
I think that sounds easier than it is in practice, certainly when you start with a list of only 15-20 words. It is very difficult to fit those words into a crossword this way. In most cases it will even be impossible...
I think this is a fun idea and I will try it some time; it should be possible. Of course you never know if there is a possible layout for the given words in the given size, but if you try tons of combinations with an algorithm I think that should get some "acceptable" results.
I'd just start with the first word: put it on the grid, and then try all the remaining words in all positions, and so on. That way you get a really huge number of combinations, which you could discard whenever they break the wanted size, and in the end you might have a nice list of possibilities and show, say, the 10 smallest of them to choose from. My GF is away this weekend, maybe I'll have a try. I think recursion could be the right way to do that; see the sketch below.
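For what it's worth, a rough backtracking sketch under strong simplifying assumptions (words go only left-to-right or top-to-bottom, and a placement is legal when every covered cell is empty or already holds the same letter; a real generator would additionally require crossings and forbid accidental adjacent words):
// Can $word be written starting at ($r, $c) in the given direction?
function canPlace(array $grid, $word, $r, $c, $vert, $size) {
    $len = strlen($word);
    if (($vert ? $r : $c) + $len > $size) {
        return false; // word would run off the grid
    }
    for ($k = 0; $k < $len; $k++) {
        $cell = $grid[$vert ? $r + $k : $r][$vert ? $c : $c + $k];
        if ($cell !== '.' && $cell !== $word[$k]) {
            return false; // conflicts with a letter already placed
        }
    }
    return true;
}

function place(array $grid, $word, $r, $c, $vert) {
    for ($k = 0; $k < strlen($word); $k++) {
        $grid[$vert ? $r + $k : $r][$vert ? $c : $c + $k] = $word[$k];
    }
    return $grid; // $grid was passed by value, so this is a new copy
}

// Try to place every word recursively; null means "dead end, backtrack".
function solve(array $grid, array $words, $size) {
    if (count($words) == 0) {
        return $grid; // all words placed
    }
    $word = array_shift($words);
    for ($r = 0; $r < $size; $r++) {
        for ($c = 0; $c < $size; $c++) {
            foreach (array(false, true) as $vert) {
                if (canPlace($grid, $word, $r, $c, $vert, $size)) {
                    $result = solve(place($grid, $word, $r, $c, $vert), $words, $size);
                    if ($result !== null) {
                        return $result;
                    }
                }
            }
        }
    }
    return null;
}

$size = 15;
$grid = array_fill(0, $size, array_fill(0, $size, '.'));
$solution = solve($grid, array('crossword', 'word', 'game'), $size);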
