Related
I was wondering if anyone knows of any library, script, or service that can spell check a string and return a suggestion of the properly spelled word or suggestions if more that one properly spelled word that it could be written in PHP.
I would prefer if there wasn't a limit on the amount of queries I could do, so not like Google's APIs.
It would be great if it could function like this:
// string to be spell checked stored in variable
$misspelledString = "The quick brown lama jumped over the lazy dog.";
//pass that variable to function
//function returns suggestion or suggestions as an array of string or strings
$suggestion = spellCheck($misspelledString);
echo "Did you mean ".$suggestion[0];
You can try the included Pspell functions:
http://php.net/manual/en/ref.pspell.php
Or an external plugin, like this one:
http://www.phpspellcheck.com/
Check this SO question for an example.
Not quite as nice an API as in your example, but Pspell would be an option. It may already be included with your system copy of PHP. You'll need aspell libraries for each language you want to check.
http://php.net/manual/en/book.pspell.php
On my debian based machine, it's included in the system repositories as a separate package, php5-pspell.
You need to have "pspell" PHP extension, you can install it on Linux using CLI:
sudo apt-get install php-pspell;
sudo service apache2 restart;
The code is very simple:
if ($word = $_GET['word']) {
$spellLink = pspell_new("en");
if (!pspell_check($spellLink, $word)) {
$suggestions = pspell_suggest($spellLink, $word);
echo '<p>Did you mean: <i>"'.$suggestions[0].'"</i>?</p>';
}
}
I attempted to create a class that takes a list of phrases and compares that to the user inputs. What I was trying to do is get things like Porshre Ceyman to correct to Porsche Cayman for example.
This class requires an array of correct terms $this->full_model_list , and an array of the user input $search_terms. I took out the contruct so you will need to pass in the full_model_list. Note, this didn't fully work so I decided to scrap it, it was adapted from someone looking to correct large sentences ...
You would call it like so:
$sth = new SearchTermHelper;
$resArr = $sth->spellCheckModelKeywords($search_terms)
Code (VERY BETA) :
<?php
/*
// ---------------------------------------------------------------------------------------------------------------------
// ---------------------------------------------------------------------------------------------------------------------
//
// FUNCTION: Search Term Helper Class
// PURPOSE: Handles finding matches and such with search terms for keyword searching.
// DETAILS: Functions below build search combinations, find matches, look for spelling issues in words etc.
//
// ---------------------------------------------------------------------------------------------------------------------
// ---------------------------------------------------------------------------------------------------------------------
*/
class SearchTermHelper
{
public $full_model_list;
private $inv;
// --------------------------------------------------------------------------------------------------------------
// -- return an array of metaphones for each word in a string
// --------------------------------------------------------------------------------------------------------------
private function getMetaPhone($phrase)
{
$metaphones = array();
$words = str_word_count($phrase, 1);
foreach ($words as $word) {
$metaphones[] = metaphone($word);
}
return $metaphones;
}
// --------------------------------------------------------------------------------------------------------------
// -- return the closest matching string found in $this->searchAgainst when compared to $this->input
// --------------------------------------------------------------------------------------------------------------
public function findBestMatchReturnString($searchAgainst, $input, $max_tolerance = 200, $max_length_diff = 200, $min_str = 3, $lower_case = true, $search_in_phrases = true)
{
if (empty($searchAgainst) || empty($input)) return "";
//weed out strings we thing are too small for this
if (strlen($input) <= $min_str) return $input;
$foundbestmatch = -1;
if ($lower_case) $input = strtolower($input);
//sort list or else not best matches may be found first
$counts = array();
foreach ($searchAgainst as $s) {
$counts[] = strlen($s);
}
array_multisort($counts, $searchAgainst);
//get the metaphone equivalent for the input phrase
$tempInput = implode(" ", $this->getMetaPhone($input));
$list = array();
foreach ($searchAgainst as $phrase) {
if ($lower_case) $phrase = strtolower($phrase);
if ($search_in_phrases) $phraseArr = explode(" ",$phrase);
foreach ($phraseArr as $word) {
//get the metaphone equivalent for each phrase we're searching against
$tempSearchAgainst = implode(' ', $this->getMetaPhone($word));
$similarity = levenshtein($tempInput, $tempSearchAgainst);
if ($similarity == 0) // we found an exact match
{
$closest = $word;
$foundbestmatch = 0;
echo "" . $closest . "(" . $foundbestmatch . ") <br>";
break;
}
if ($similarity <= $foundbestmatch || $foundbestmatch < 0) {
$closest = $word;
$foundbestmatch = $similarity;
//keep score
if (array_key_exists($closest, $list)) {
//echo "" . $closest . "(" . $foundbestmatch . ") <br>";
$list[$closest] += 1;
} else {
$list[$closest] = 1;
}
}
}
if ($similarity == 0 || $similarity <= $max_tolerance) break;
}
// if we find a bunch of a value, assume it to be what we wanted
if (!empty($list)) {
if ($most_occuring = array_keys($list, max($list)) && max($list) > 10) {
return $closest;
}
}
//echo "input:".$input."(".$foundbestmatch.") match: ".$closest."\n";
// disallow results to be all that much different in char length (if you want)
if (abs(strlen($closest) - strlen($input)) > $max_length_diff) return "";
// based on tolerance of difference, return if match meets this requirement (0 = exact only 1 = close, 20+ = far)
return ((int)$foundbestmatch <= (int)$max_tolerance) ? $closest : "";
}
// --------------------------------------------------------------------------------------------------------------
// -- Handles passing arrays instead of a string above ( could have done this in the func above )
// --------------------------------------------------------------------------------------------------------------
public function findBestMatchReturnArray($searchAgainst, $inputArray, $max_tolerance = 200, $max_length_diff = 200, $min_str = 3)
{
$results = array();
$tempStr = '';
foreach ($inputArray as $item) {
if ($tmpStr = $this->findBestMatchReturnString($searchAgainst, $item, $max_tolerance, $max_length_diff, $min_str))
$results[] = $tmpStr;
}
return (!empty($results)) ? $results : $results = array();
}
// --------------------------------------------------------------------------------------------------------------
// -- Build combos of search terms -- So we can check Cayman S or S Cayman etc.
// careful, this is very labor intensive ( O(n^k) )
// --------------------------------------------------------------------------------------------------------------
public function buildSearchCombinations(&$set, &$results)
{
for ($i = 0; $i < count($set); $i++) {
$results[] = $set[$i];
$tempset = $set;
array_splice($tempset, $i, 1);
$tempresults = array();
$this->buildSearchCombinations($tempset, $tempresults);
foreach ($tempresults as $res) {
$results[] = trim($set[$i]) . " " . trim($res);
}
}
}
// --------------------------------------------------------------------------------------------------------------
// -- Model match function -- Get best model match from user input.
// --------------------------------------------------------------------------------------------------------------
public function findBestSearchMatches($model_type, $search_terms, $models_list)
{
$partial_search_phrases = array();
if (count($search_terms) > 1) {
$this->buildSearchCombinations($search_terms, $partial_search_phrases); // careful, this is very labor intensive ( O(n^k) )
$partial_search_phrases = array_diff($partial_search_phrases, $search_terms);
for ($i = 0; $i < count($search_terms); $i++) $partial_search_phrases[] = $search_terms[$i];
$partial_search_phrases = array_values($partial_search_phrases);
} else {
$partial_search_phrases = $search_terms;
}
//sort list or else not best matches may be found first
$counts = array();
foreach ($models_list as $m) {
$counts[] = strlen($m);
}
array_multisort($counts,SORT_DESC,$models_list);
unset($counts);
//sort list or else not best matches may be found first
foreach ($partial_search_phrases as $p) {
$counts[] = strlen($p);
}
array_multisort($counts,SORT_DESC,$partial_search_phrases);
$results = array("exact_match" => '', "partial_match" => '');
foreach ($partial_search_phrases as $term) {
foreach ($models_list as $model) {
foreach ($model_type as $mt) {
if (strpos(strtolower($model), strtolower($mt)) !== false) {
if ((strtolower($model) == strtolower($term) || strtolower($model) == strtolower($mt . " " . $term))
) {
// echo " " . $model . " === " . $term . " <br>";
if (strlen($model) > strlen($results['exact_match']) /*|| strtolower($term) != strtolower($mt)*/
) {
$results['exact_match'] = strtolower($model);
return $results;
}
} else if (strpos(strtolower($model), strtolower($term)) !== false) {
if (strlen($term) > strlen($results['partial_match'])
|| strtolower($term) != strtolower($mt)
) {
$results['partial_match'] = $term;
//return $results;
}
}
}
}
}
}
return $results;
}
// --------------------------------------------------------------------------------------------------------------
// -- Get all models in DB for Make (e.g. porsche) (could include multiple makes)
// --------------------------------------------------------------------------------------------------------------
public function initializeFullModelList($make) {
$this->full_model_list = array();
$modelsDB = $this->inv->getAllModelsForMakeAndCounts($make);
foreach ($modelsDB as $m) {
$this->full_model_list[] = $m['model'];
}
}
// --------------------------------------------------------------------------------------------------------------
// -- spell checker -- use algorithm to check model spelling (could expand to include english words)
// --------------------------------------------------------------------------------------------------------------
public function spellCheckModelKeywords($search_terms)
{
// INPUTS: findBestMatchReturnArray($searchList, $inputArray,$tolerance,$differenceLenTolerance,$ignoreStringsOfLengthX,$useLowerCase);
//
// $searchList, - The list of items you want to get a match from
// $inputArray, - The user input value or value array
// $tolerance, - How close do we want the match to be 0 = exact, 1 = close, 2 = less close, etc. 20 = find a match 100% of the time
// $lenTolerance, - the number of characters between input and match allowed, ie. 3 would mean match can be +- 3 in length diff
// $ignoreStrLessEq, - min number of chars that must be before checking (i.e. if 3 ignore anything 3 in length to check)
// $useLowerCase - puts the phrases in lower case for easier matching ( not needed per se )
// $searchInPhrases - compare against every word in searchList (which could be groups of words per array item (so search every word past to function
$tolerance = 0; // 1-2 recommended
$lenTolerance = 1; // 1-3 recommended
$ignoreStrLessEq = 3; // may not want to correct tiny words, 3-4 recommended
$useLowercase = true; // convert to lowercase matching = true
$searchInPhrases = true; //match words not phrases, true recommended
$spell_checked_search_terms = $this->findBestMatchReturnArray($this->full_model_list, $search_terms, $tolerance, $lenTolerance, $ignoreStrLessEq, $useLowercase,$searchInPhrases);
$spell_checked_search_terms = array_values($spell_checked_search_terms);
// return spell checked terms
if (!empty($spell_checked_search_terms)) {
if (strpos(strtolower(implode(" ", $spell_checked_search_terms)), strtolower(implode(" ", $search_terms))) === false //&&
// strlen(implode(" ", $spell_checked_search_terms)) > 4
) {
return $spell_checked_search_terms;
}
}
// or just return search terms as is
return $search_terms;
}
}
?>
I need to generate about 100000 unique code. I have tried following code. But it is getting slower. Can anybody suggest me how can I make it faster, soon I have to generate unique code of 1M.
$uniqueArray = array();
for($i = 1; $i <= 100000; $i++) {
$pass = substr(md5(uniqid(mt_rand(), true)) , 0, 6);
if (!in_array($pass, $uniqueArray)) {
$uniqueArray[] = 'AB' . $pass;
}
}
As is, your code is not guaranteed to generate a given amount of unique values. You would need to keep track of the actual number of values in $uniqueArray to be sure.
That said, you can speed up the generation by not searching in the array but by using the array as a map (key => value), e.g.:
<?php
declare(strict_types=1);
error_reporting(-1);
ini_set('display_errors', 'On');
/**
* #return string
*/
function createUniqueCode(): string
{
// ...or any other implementation
return 'AB' . substr(md5(uniqid((string)mt_rand(), true)), 0, 6);
}
$uniqueMap = [];
$n = 10000;
while ($n > 0) {
$pass = createUniqueCode();
if (!isset($uniqueMap[$pass])) {
$uniqueMap[$pass] = true;
$n--;
}
}
$codes = array_keys($uniqueMap);
echo count($codes);
<?php
$count = 100000;
$uniqueArray = range(0,$count-1);
shuffle($uniqueArray);
array_walk($uniqueArray,function(&$v,$k){$v = "AB".sprintf("%06d",$v);});
Runtime <100ms for 100000 values.
Note: If you need the full uniqueness of the values then use the pure uniqueid and dont reduce the results with substr() or other functions.
I have something like this array of strings, as seen they are very similar, with the exception of the same one place in the string:
$strings = [
'This is +1% better than last time!',
'This is +2% better than last time!',
'This is +3% better than last time!',
'This is +4% better than last time!',
'This is +5% better than last time!',
...
];
// Psuedo code
From this I'd like to end up with
$array = [
'1',
'2',
'3',
'4',
'5',
...
];
// And preferrably
$string = 'This is +%s% better than last time!';
Via a function that takes any array of similar strings and outputs what is actually different in them.
If you know that your array of similar strings will only differ in one place than you can compare them from the beginning until they differ and from the end until they differ and record those offsets. Then extract the string beginning at the first difference until the last difference from each string in your array.
function myfunc($strings) {
// if the array is empty we don't have anything to do
if (count($strings) == 0) return "";
// count how many words are in a string (use the first one)
$num_tokens = 0;
$tok = strtok($strings[0], " ");
while ($tok !== false) {
$num_tokens++;
$tok = strtok(" ");
}
$output = "";
$tokens = [];
// iterate over each word of the string
for ($w = 0; $w < $num_tokens; $w++) {
// iterate over each string
for ($s = 0; $s < count($strings); $s++) {
// extract the same word of each string
$tokens[$s] = strtok($strings[$s], " ");
// remove that word from the string so it
// will not be extracted again
$strings[$s] = trim(substr($strings[$s], strlen($tokens[$s])));
}
$first_token = true;
$tmp = "";
// If all words we have extracted are equal, we add that
// word to the $output string. Otherwise we add '+%s%'.
for ($s = 0; $s < count($strings); $s++) {
// In the first iteration we just extract the word. We
// will start comparing from the next iteration.
if ($first_token) {
$tmp = $tokens[$s];
$first_token = false;
} else {
// If the words are not the same, we will add '+%s%' and
// exit the loop.
if ($tokens[$s] != $tmp) {
$tmp = "+%s%";
break;
}
}
}
// Add the word to the $output string. If it's the first
// word we don't add a white space before it.
if ($output == "") {
$output .= $tmp;
} else {
$output .= " $tmp";
}
}
return $output;
}
Example of usage:
$strings = [
'This is +1% better than last time!',
'This is +2% better than last time!',
'This is +3% better than last time!',
'This is +4% better than last time!',
'This is +5% better than last time!',
];
echo myfunc($strings);
Warning for a very long post with a lot of code!
Thanks a lot for the help everybody, all of which gave me very good hints of where to go with this. Here is my solution / class to solve this problem, which is an extension of the code of Vicente Olivert Riera's answer and the method FlyingFoX explained in his/her answer:
Link to GitHub repo
class StringDiff
{
/**
* The unformatted string to be used in the
* vsprintf call
* #var string
*/
protected $unformattedString = '';
/**
* Array with arguments replacing the %s in the
* unformatted string
* #var array
*/
protected $args = [];
/**
* Returns the arguments to be used for a vsprintf
* call along with the format string
* #return array
*/
public function getArgs()
{
return $this->args;
}
/**
* Returns the unformatted string to be used in
* a vsprint call along with the arguments
* #return string
*/
public function getUnformattedString()
{
return $this->unformattedString;
}
/**
* Takes an array argument of very similarly formatted
* strings and fills in the $unformattedString and $args
* variables from the data provided
* #param array $strings Group of very similar strings
* #return void
*/
public function run(array $strings)
{
// If there are no strings, return nothing
if (count($strings) == 0) return '';
// Replacing % with %% so the vsprintf call doesn't
// bug the arguments
$strings = str_replace('%', '%%', $strings);
$num_words = 0;
// Explodes each string into many smaller containing
// only words
foreach($strings as $key => $string)
{
$strings[$key] = explode(' ', $string);
}
$num_words = count($strings[0]);
// Array containing the indices of the substrings
// that are different
$sub_str_nr = [];
// Loops through all the words in each string
for ($n = 0; $n < $num_words; $n++)
{
// First round only sets the string to be compared with
$first_round = true;
for ($s = 0; $s < count($strings); $s++)
{
if ($first_round)
{
$first_round = false;
$tmp[0] = $strings[$s][$n];
}
else
{
$tmp[1] = $strings[$s][$n];
if ($tmp[0] == $tmp[1])
{
$tmp[0] = $tmp[1];
}
else
{
if (!in_array($n, $sub_str_nr))
{
$sub_str_nr[] = $n;
}
}
}
}
}
// Array to hold the arguments, i.e. all the strings
// that differ from each other. From these the differences
// will be deduced and put into the $this->args variable
$args = [];
foreach($sub_str_nr as $nr)
{
$tmpArgs = [];
for ($a = 0; $a < count($strings); $a++)
{
$tmpArgs[] = $strings[$a][$nr];
}
$args[] = $tmpArgs;
}
foreach($args as $key => $arg)
{
// Offset from the beginning of the string that is still the same
$front_offset = 0;
// If the offset from the beginning has been maxed
$front_flag = true;
// Offset from the end of the string that is still the same
$back_offset = 0;
// Id the offset from the end has been maxed
$back_flag = true;
// The string to be compared against is the first in line
$tmp = $arg[0];
while ($front_flag || $back_flag)
{
// Flag 1 & 2 limits to only one increase of offset per loop
$flag1 = true;
$flag2 = true;
for ($a = 1; $a < count($strings); $a++)
{
// The two following if statements compare substring
// to substring of length one
if ($front_flag && $flag1)
{
if (substr($tmp, $front_offset, 1) != substr($arg[$a], $front_offset, 1) || is_numeric(substr($arg[$a], $front_offset, 1)))
{
$front_flag = false;
}
else
{
$front_offset++;
$flag1 = false;
}
}
if ($back_flag && $flag2)
{
if (substr($tmp, strlen($tmp) - $back_offset - 1, 1) != substr($arg[$a], strlen($arg[$a]) - $back_offset - 1, 1) || is_numeric(substr($arg[$a], strlen($arg[$a]) - $back_offset - 1, 1)))
{
$back_flag = false;
}
else
{
$back_offset++;
$flag2 = false;
}
}
}
}
// Sets the $this->args variable with the found arguments
foreach($arg as $arkey => $ar)
{
$this->args[$arkey][$key] = (float)substr($arg[$arkey], $front_offset, strlen($arg[$arkey]) - $back_offset - $front_offset);
}
// Changes the strings for the unformatted string, switches
// out the varying part to %s
$strings[0][$sub_str_nr[$key]] = substr($arg[0], 0, $front_offset) . '%s' . substr($arg[0], strlen($arg[0]) - $back_offset, $back_offset);
}
// Creates the unformatted string from the array of
// words, which originates from the original long string
$unformattedString = '';
foreach($strings[0] as $string)
{
$unformattedString.= ' ' . $string;
}
// Trim whitespaces in the beginning and end of the
// formatted string
$this->unformattedString = trim($unformattedString);
return;
}
}
How to use:
$stringd = new StringDiff;
$test_array = [
"Your Cooldown Reduction cap is increased to 41% and you gain 1% Cooldown Reduction",
"Your Cooldown Reduction cap is increased to 42% and you gain 2% Cooldown Reduction",
"Your Cooldown Reduction cap is increased to 43% and you gain 3% Cooldown Reduction",
"Your Cooldown Reduction cap is increased to 44% and you gain 4% Cooldown Reduction",
"Your Cooldown Reduction cap is increased to 45% and you gain 5% Cooldown Reduction",
];
$stringd->run($test_array);
foreach($stringd->getArgs() as $arg)
{
echo vsprintf($stringd->getUnformattedString(), $arg) . '<br>';
}
Outputs:
Your Cooldown Reduction cap is increased to 41% and you gain 1% Cooldown Reduction
Your Cooldown Reduction cap is increased to 42% and you gain 2% Cooldown Reduction
Your Cooldown Reduction cap is increased to 43% and you gain 3% Cooldown Reduction
Your Cooldown Reduction cap is increased to 44% and you gain 4% Cooldown Reduction
Your Cooldown Reduction cap is increased to 45% and you gain 5% Cooldown Reduction
array(5) {
[0]=>
array(2) {
[0]=>
float(41)
[1]=>
float(1)
}
[1]=>
array(2) {
[0]=>
float(42)
[1]=>
float(2)
}
[2]=>
array(2) {
[0]=>
float(43)
[1]=>
float(3)
}
[3]=>
array(2) {
[0]=>
float(44)
[1]=>
float(4)
}
[4]=>
array(2) {
[0]=>
float(45)
[1]=>
float(5)
}
}
Your Cooldown Reduction cap is increased to %s%% and you gain %s%% Cooldown Reduction
And yes, if you are wondering, this is related to the Riot API.
If you have suggestions for improvement or any changes at all, feel free to comment down below :)
I was wondering if anyone knows of any library, script, or service that can spell check a string and return a suggestion of the properly spelled word or suggestions if more that one properly spelled word that it could be written in PHP.
I would prefer if there wasn't a limit on the amount of queries I could do, so not like Google's APIs.
It would be great if it could function like this:
// string to be spell checked stored in variable
$misspelledString = "The quick brown lama jumped over the lazy dog.";
//pass that variable to function
//function returns suggestion or suggestions as an array of string or strings
$suggestion = spellCheck($misspelledString);
echo "Did you mean ".$suggestion[0];
You can try the included Pspell functions:
http://php.net/manual/en/ref.pspell.php
Or an external plugin, like this one:
http://www.phpspellcheck.com/
Check this SO question for an example.
Not quite as nice an API as in your example, but Pspell would be an option. It may already be included with your system copy of PHP. You'll need aspell libraries for each language you want to check.
http://php.net/manual/en/book.pspell.php
On my debian based machine, it's included in the system repositories as a separate package, php5-pspell.
You need to have "pspell" PHP extension, you can install it on Linux using CLI:
sudo apt-get install php-pspell;
sudo service apache2 restart;
The code is very simple:
if ($word = $_GET['word']) {
$spellLink = pspell_new("en");
if (!pspell_check($spellLink, $word)) {
$suggestions = pspell_suggest($spellLink, $word);
echo '<p>Did you mean: <i>"'.$suggestions[0].'"</i>?</p>';
}
}
I attempted to create a class that takes a list of phrases and compares that to the user inputs. What I was trying to do is get things like Porshre Ceyman to correct to Porsche Cayman for example.
This class requires an array of correct terms $this->full_model_list , and an array of the user input $search_terms. I took out the contruct so you will need to pass in the full_model_list. Note, this didn't fully work so I decided to scrap it, it was adapted from someone looking to correct large sentences ...
You would call it like so:
$sth = new SearchTermHelper;
$resArr = $sth->spellCheckModelKeywords($search_terms)
Code (VERY BETA) :
<?php
/*
// ---------------------------------------------------------------------------------------------------------------------
// ---------------------------------------------------------------------------------------------------------------------
//
// FUNCTION: Search Term Helper Class
// PURPOSE: Handles finding matches and such with search terms for keyword searching.
// DETAILS: Functions below build search combinations, find matches, look for spelling issues in words etc.
//
// ---------------------------------------------------------------------------------------------------------------------
// ---------------------------------------------------------------------------------------------------------------------
*/
class SearchTermHelper
{
public $full_model_list;
private $inv;
// --------------------------------------------------------------------------------------------------------------
// -- return an array of metaphones for each word in a string
// --------------------------------------------------------------------------------------------------------------
private function getMetaPhone($phrase)
{
$metaphones = array();
$words = str_word_count($phrase, 1);
foreach ($words as $word) {
$metaphones[] = metaphone($word);
}
return $metaphones;
}
// --------------------------------------------------------------------------------------------------------------
// -- return the closest matching string found in $this->searchAgainst when compared to $this->input
// --------------------------------------------------------------------------------------------------------------
public function findBestMatchReturnString($searchAgainst, $input, $max_tolerance = 200, $max_length_diff = 200, $min_str = 3, $lower_case = true, $search_in_phrases = true)
{
if (empty($searchAgainst) || empty($input)) return "";
//weed out strings we thing are too small for this
if (strlen($input) <= $min_str) return $input;
$foundbestmatch = -1;
if ($lower_case) $input = strtolower($input);
//sort list or else not best matches may be found first
$counts = array();
foreach ($searchAgainst as $s) {
$counts[] = strlen($s);
}
array_multisort($counts, $searchAgainst);
//get the metaphone equivalent for the input phrase
$tempInput = implode(" ", $this->getMetaPhone($input));
$list = array();
foreach ($searchAgainst as $phrase) {
if ($lower_case) $phrase = strtolower($phrase);
if ($search_in_phrases) $phraseArr = explode(" ",$phrase);
foreach ($phraseArr as $word) {
//get the metaphone equivalent for each phrase we're searching against
$tempSearchAgainst = implode(' ', $this->getMetaPhone($word));
$similarity = levenshtein($tempInput, $tempSearchAgainst);
if ($similarity == 0) // we found an exact match
{
$closest = $word;
$foundbestmatch = 0;
echo "" . $closest . "(" . $foundbestmatch . ") <br>";
break;
}
if ($similarity <= $foundbestmatch || $foundbestmatch < 0) {
$closest = $word;
$foundbestmatch = $similarity;
//keep score
if (array_key_exists($closest, $list)) {
//echo "" . $closest . "(" . $foundbestmatch . ") <br>";
$list[$closest] += 1;
} else {
$list[$closest] = 1;
}
}
}
if ($similarity == 0 || $similarity <= $max_tolerance) break;
}
// if we find a bunch of a value, assume it to be what we wanted
if (!empty($list)) {
if ($most_occuring = array_keys($list, max($list)) && max($list) > 10) {
return $closest;
}
}
//echo "input:".$input."(".$foundbestmatch.") match: ".$closest."\n";
// disallow results to be all that much different in char length (if you want)
if (abs(strlen($closest) - strlen($input)) > $max_length_diff) return "";
// based on tolerance of difference, return if match meets this requirement (0 = exact only 1 = close, 20+ = far)
return ((int)$foundbestmatch <= (int)$max_tolerance) ? $closest : "";
}
// --------------------------------------------------------------------------------------------------------------
// -- Handles passing arrays instead of a string above ( could have done this in the func above )
// --------------------------------------------------------------------------------------------------------------
public function findBestMatchReturnArray($searchAgainst, $inputArray, $max_tolerance = 200, $max_length_diff = 200, $min_str = 3)
{
$results = array();
$tempStr = '';
foreach ($inputArray as $item) {
if ($tmpStr = $this->findBestMatchReturnString($searchAgainst, $item, $max_tolerance, $max_length_diff, $min_str))
$results[] = $tmpStr;
}
return (!empty($results)) ? $results : $results = array();
}
// --------------------------------------------------------------------------------------------------------------
// -- Build combos of search terms -- So we can check Cayman S or S Cayman etc.
// careful, this is very labor intensive ( O(n^k) )
// --------------------------------------------------------------------------------------------------------------
public function buildSearchCombinations(&$set, &$results)
{
for ($i = 0; $i < count($set); $i++) {
$results[] = $set[$i];
$tempset = $set;
array_splice($tempset, $i, 1);
$tempresults = array();
$this->buildSearchCombinations($tempset, $tempresults);
foreach ($tempresults as $res) {
$results[] = trim($set[$i]) . " " . trim($res);
}
}
}
// --------------------------------------------------------------------------------------------------------------
// -- Model match function -- Get best model match from user input.
// --------------------------------------------------------------------------------------------------------------
public function findBestSearchMatches($model_type, $search_terms, $models_list)
{
$partial_search_phrases = array();
if (count($search_terms) > 1) {
$this->buildSearchCombinations($search_terms, $partial_search_phrases); // careful, this is very labor intensive ( O(n^k) )
$partial_search_phrases = array_diff($partial_search_phrases, $search_terms);
for ($i = 0; $i < count($search_terms); $i++) $partial_search_phrases[] = $search_terms[$i];
$partial_search_phrases = array_values($partial_search_phrases);
} else {
$partial_search_phrases = $search_terms;
}
//sort list or else not best matches may be found first
$counts = array();
foreach ($models_list as $m) {
$counts[] = strlen($m);
}
array_multisort($counts,SORT_DESC,$models_list);
unset($counts);
//sort list or else not best matches may be found first
foreach ($partial_search_phrases as $p) {
$counts[] = strlen($p);
}
array_multisort($counts,SORT_DESC,$partial_search_phrases);
$results = array("exact_match" => '', "partial_match" => '');
foreach ($partial_search_phrases as $term) {
foreach ($models_list as $model) {
foreach ($model_type as $mt) {
if (strpos(strtolower($model), strtolower($mt)) !== false) {
if ((strtolower($model) == strtolower($term) || strtolower($model) == strtolower($mt . " " . $term))
) {
// echo " " . $model . " === " . $term . " <br>";
if (strlen($model) > strlen($results['exact_match']) /*|| strtolower($term) != strtolower($mt)*/
) {
$results['exact_match'] = strtolower($model);
return $results;
}
} else if (strpos(strtolower($model), strtolower($term)) !== false) {
if (strlen($term) > strlen($results['partial_match'])
|| strtolower($term) != strtolower($mt)
) {
$results['partial_match'] = $term;
//return $results;
}
}
}
}
}
}
return $results;
}
// --------------------------------------------------------------------------------------------------------------
// -- Get all models in DB for Make (e.g. porsche) (could include multiple makes)
// --------------------------------------------------------------------------------------------------------------
public function initializeFullModelList($make) {
$this->full_model_list = array();
$modelsDB = $this->inv->getAllModelsForMakeAndCounts($make);
foreach ($modelsDB as $m) {
$this->full_model_list[] = $m['model'];
}
}
// --------------------------------------------------------------------------------------------------------------
// -- spell checker -- use algorithm to check model spelling (could expand to include english words)
// --------------------------------------------------------------------------------------------------------------
public function spellCheckModelKeywords($search_terms)
{
// INPUTS: findBestMatchReturnArray($searchList, $inputArray,$tolerance,$differenceLenTolerance,$ignoreStringsOfLengthX,$useLowerCase);
//
// $searchList, - The list of items you want to get a match from
// $inputArray, - The user input value or value array
// $tolerance, - How close do we want the match to be 0 = exact, 1 = close, 2 = less close, etc. 20 = find a match 100% of the time
// $lenTolerance, - the number of characters between input and match allowed, ie. 3 would mean match can be +- 3 in length diff
// $ignoreStrLessEq, - min number of chars that must be before checking (i.e. if 3 ignore anything 3 in length to check)
// $useLowerCase - puts the phrases in lower case for easier matching ( not needed per se )
// $searchInPhrases - compare against every word in searchList (which could be groups of words per array item (so search every word past to function
$tolerance = 0; // 1-2 recommended
$lenTolerance = 1; // 1-3 recommended
$ignoreStrLessEq = 3; // may not want to correct tiny words, 3-4 recommended
$useLowercase = true; // convert to lowercase matching = true
$searchInPhrases = true; //match words not phrases, true recommended
$spell_checked_search_terms = $this->findBestMatchReturnArray($this->full_model_list, $search_terms, $tolerance, $lenTolerance, $ignoreStrLessEq, $useLowercase,$searchInPhrases);
$spell_checked_search_terms = array_values($spell_checked_search_terms);
// return spell checked terms
if (!empty($spell_checked_search_terms)) {
if (strpos(strtolower(implode(" ", $spell_checked_search_terms)), strtolower(implode(" ", $search_terms))) === false //&&
// strlen(implode(" ", $spell_checked_search_terms)) > 4
) {
return $spell_checked_search_terms;
}
}
// or just return search terms as is
return $search_terms;
}
}
?>
I have a PHP script that generates some strings which will be used as license keys:
function KeyGen(){
$key = md5(microtime());
$new_key = '';
for($i=1; $i <= 25; $i ++ ){
$new_key .= $key[$i];
if ( $i%5==0 && $i != 25) $new_key.='-';
}
return strtoupper($new_key);
}
$x = 0;
while($x <= 10) {
echo KeyGen();
echo "<br />";
$x++;
}
After running the script once, I got these:
8B041-EC7D2-0B9E3-09846-E8C71
C8D82-514B9-068BC-8BF80-05061
A18A3-E05E5-7DED7-D09ED-298C4
FB1EC-C9844-B9B20-ADE2F-0858F
E9AED-945C8-4BAAA-6938D-713ED
4D284-C5A3B-734DF-09BD6-6A34C
EF534-3BAE4-860B5-D3260-1CEF8
D84DB-B8C72-5BDEE-1B4FE-24E90
93AF2-80813-CD66E-E7A5E-BF0AE
C3397-93AA3-6239C-28D9F-7A582
D83B8-697C6-58CD1-56F1F-58180
What I now am trying to do is change it so that I have another function that will check if the key has been generated using my script. Currently, what I am thinking is setting the $key to the MD5 of one specific string (for example, test) but, of course, that returns all the strings the same.
Can anyone help?
There are three basic ways of handling this. How you do it will depend on how many keys you're generating, and how important is may be to be able to invalidate keys at a later day. Which you choose is up to you.
Option 1: Server Database Storage
When the server generates a key (like using your algorithm), you store it in a database. Then later all you need to do to check the key is see if it's in the database.
Note that your algorithm needs a lot more entropy than you're providing it. The current timestamp is NOT enough. Instead, use strong randomness:
$key = mcrypt_create_iv($length_needed, MCRYPT_DEV_URANDOM);
Or, if you don't have mcrypt:
$key = openssl_random_pseudo_bytes($length_needed);
Or if you don't have mcrypt and openssl, use a library
Note that md5 returns a hex output (a-f0-9), where all of the above return full random binary strings (characters 0 - 255). So either base64_encode() it, or bin2hex() it.
Pros:
Simple to implement
Can "deactive" issued keys at a later date
Impossible to forge a new key
Cons:
Requires persistent storage per key
May not scale that well
Requires "key server" to validate keys
Option 2: Signing Keys
Basically, you generate a strong random key (from here out called the private key), and store it on your server. Then, when generating the license key, you generate a random blob, and then HMAC sign it with the private key, and make the license part of that block. That way, you don't need to store each individual key.
function create_key($private_key) {
$rand = mcrypt_create_iv(10, MCRYPT_DEV_URANDOM);
$signature = substr(hash_hmac('sha256', $rand, $private_key, true), 0, 10);
$license = base64_encode($rand . $signature);
return $license;
}
function check_key($license, $private_key) {
$tmp = base64_decode($license);
$rand = substr($tmp, 0, 10);
$signature = substr($tmp, 10);
$test = substr(hash_hmac('sha256', $rand, $private_key, true), 0, 10);
return $test === $signature;
}
Pros:
Simple to implement
Does not require persistent storage
Trivial to scale
Cons:
Cannot "Deactivate" keys individual
Requires storing "private keys"
Requires "key server" to validate keys.
Option 3: Public Key Crypto
Basically, you generate a public/private key pair. You embed the public key in your application. Then, you generate a key (similar to "signing keys" above), but instead of signing it with the HMAC signature, you sign it with a private key.
That way, the application (which has the public key) can verify the signature directly without needing to call back to your server.
function create_key($private_key) {
$rand = mcrypt_create_iv(10, MCRYPT_DEV_URANDOM);
$pkeyid = openssl_get_privatekey($private_key);
openssl_sign($rand, $signature, $pkeyid);
openssl_free_key($pkeyid);
$license = base64_encode($rand . $signature);
return $license;
}
function check_key($license, $public_key) {
$tmp = base64_decode($license);
$rand = substr($tmp, 0, 10);
$signature = substr($tmp, 10);
$pubkeyid = openssl_get_publickey($public_key);
$ok = openssl_verify($rand, $signature, $pubkeyid);
openssl_free_key($pubkeyid);
return $ok === 1;
}
Pros:
Simple to implement
Does not require persistent storage
Trivial to scale
Does not require "key server" to validate keys
Cons:
Cannot "Deactivate" keys individual
Requires storing "private keys"
Note:
This solution is on the assumption you want your licence key to always be in fixed format (see below) and still self authenticated
FORMAT : XXXXX-XXXXX-XXXXX-XXXXX-XXXX
If that is not the case refer to #ircmaxell for a better solution
Introduction
Self authenticated serial is tricky solution because:
Limited Size of Serial
It need to authenticate it self without Database or any storage
If private key is leaked .. it can easily be reversed
Example
$option = new CheckProfile();
$option->name = "My Application"; // Application Name
$option->version = 0.9; // Application Version
$option->username = "Benedict Lewis"; // you can limit the key to per user
$option->uniqid = null; // add if any
$checksum = new Checksum($option);
$key = $checksum->generate();
var_dump($key, $checksum->check($key));
Output
string '40B93-C7FD6-AB5E6-364E2-3B96F' (length=29)
boolean true
Please note that any modification in the Options would change the key and make it invalid;
Checking for collision
I just ran this simple test
set_time_limit(0);
$checksum = new Checksum($option);
$cache = array();
$collision = $error = 0;
for($i = 0; $i < 100000; $i ++) {
$key = $checksum->generate();
isset($cache[$key]) and $collision ++;
$checksum->check($key) or $error ++;
$cache[$key] = true;
}
printf("Fond %d collision , %d Errors in 100000 entries", $collision, $error);
Output
Fond 0 collision , 0 Errors in 100000 entries
Better Security
By default the script uses sha1 but PHP has a lot of better hash functions you can get that with the following code
print_r(hash_algos());
Example
$checksum = new Checksum($option, null, "sha512");
Class Used
class Checksum {
// Used used binaray in Hex format
private $privateKey = "ec340029d65c7125783d8a8b27b77c8a0fcdc6ff23cf04b576063fd9d1273257"; // default
private $keySize = 32;
private $profile;
private $hash = "sha1";
function __construct($option, $key = null, $hash = "sha1") {
$this->profile = $option;
$this->hash = $hash;
// Use Default Binary Key or generate yours
$this->privateKey = ($key === null) ? pack('H*', $this->privateKey) : $key;
$this->keySize = strlen($this->privateKey);
}
private function randString($length) {
$r = 0;
switch (true) {
case function_exists("openssl_random_pseudo_bytes") :
$r = bin2hex(openssl_random_pseudo_bytes($length));
break;
case function_exists("mcrypt_create_ivc") :
default :
$r = bin2hex(mcrypt_create_iv($length, MCRYPT_DEV_URANDOM));
break;
}
return strtoupper(substr($r, 0, $length));
}
public function generate($keys = false) {
// 10 ramdom char
$keys = $keys ? : $this->randString(10);
$keys = strrev($keys); // reverse string
// Add keys to options
$this->profile->keys = $keys;
// Serialise to convert to string
$data = json_encode($this->profile);
// Simple Random Chr authentication
$hash = hash_hmac($this->hash, $data, $this->privateKey);
$hash = str_split($hash);
$step = floor(count($hash) / 15);
$i = 0;
$key = array();
foreach ( array_chunk(str_split($keys), 2) as $v ) {
$i = $step + $i;
$key[] = sprintf("%s%s%s%s%s", $hash[$i ++], $v[1], $hash[$i ++], $v[0], $hash[$i ++]);
$i ++; // increment position
}
return strtoupper(implode("-", $key));
}
public function check($key) {
$key = trim($key);
if (strlen($key) != 29) {
return false;
}
// Exatact ramdom keys
$keys = implode(array_map(function ($v) {
return $v[3] . $v[1];
}, array_map("str_split", explode("-", $key))));
$keys = strrev($keys); // very important
return $key === $this->generate($keys);
}
}
What you are actually looking for is an algorithm like Partial Key Validation
See this article for the workings and port it to PHP
http://www.brandonstaggs.com/2007/07/26/implementing-a-partial-serial-number-verification-system-in-delphi/
Store these keys in a database when you create them.Later match them with the database rows and voila..It will be done
Note that it's not impossible that you will get duplicate keys with this algorithm, it's unlikely, but so is winning the lottery. You will have to store the keys in a database or file to check if it allready exists.