How to replace substrings using randomly selected array elements?

Some background:
In a previous question, I needed to swap values between column 0 and column 1 (0 to 1, or 1 to 0) for any matching word found in the $myVar variable. The solution posted by a user here on Stack Overflow was ideal:
$myWords=array(
array('funny','sad'),
array('fast','slow'),
array('beautiful','ugly'),
array('left','right'),
array('5','five'),
array('strong','weak')
);
// prepare values from $myWords for use with strtr()
$replacements=array_combine(array_column($myWords,0),array_column($myWords,1))+
array_combine(array_column($myWords,1),array_column($myWords,0));
echo strtr($myVar,$replacements);
Inputs/Outputs:
$myVar='I was beautiful and strong when I was 5 now I\'m ugly and weak';
//outputs: I was ugly and weak when I was five now I'm beautiful and strong
This Question:
But what about arrays that offer more than one option for the swap? How can I make the system replace a matched word with any of the other words in its group, without the risk of echoing back the very word from $myVar that triggered the replacement?
$myWords=array(
array('funny','sad','very sad','hyper funny'),
array('fast','slow','faster','very slow'),
array('beautiful','Studious man','great beauty','very hardworking'),
);
$myVar = 'That man is really fast and very hardworking';
How can I make the system pick among the other options (with rand, mt_rand, etc.) while excluding the words that triggered the replacement (here: fast, very hardworking), so that there is no risk of $myVar coming back unchanged?
Possible expected output:
$myVar = 'That man is really faster and Studious man';
fast must not be replaced by fast and very hardworking must not be replaced by very hardworking.

I think this is what you are trying to do...
Check the string for a random selection from each subarray.
If there is a match, you want to replace it with any of the other words in the same subarray.
If there is no match, just move on to the next subarray.
$myWords=array(
array('funny','sad','very sad','hyper funny'),
array('fast','slow','faster','very slow'),
array('beautiful','Studious man','great beauty','very hardworking'),
);
foreach($myWords as $words){
// randomize the subarray
shuffle($words);
// pipe-together the words and return just one match
// Bad pattern (incorrect use of \b): '/\b\K'.implode('|',$words).'\b/'
if(preg_match('/\K\b(?:'.implode('|',$words).')\b/',$myVar,$out)){
// generate "replace_pair" from matched word and a random remaining subarray word
// replace and preserve the new sentence
$myVar=strtr($myVar,[$out[0]=>current(array_diff($words,$out))]);
}
}
echo $myVar;
If:
$myVar='That man is really fast and very hardworking';
Then the output could be any of the following and more:
That man is really faster and great beauty
That man is really slow and Studious man
etc...
Effectively, no matter what random replacement happens, the output will never be the same as the input.
This is the preg_match_all() version:
$myWords=array(
array('funny','sad','very sad','hyper funny'),
array('fast','slow','faster','very slow'),
array('beautiful','Studious man','great beauty','very hardworking'),
);
$myVar='The slow epic fail was both sad and funny';
foreach($myWords as $words){
$replacepairs=[]; // clear array
// Bad pattern (incorrect use of \b): '/\b\K'.implode('|',$words).'\b/'
if(preg_match_all('/\K\b(?:'.implode('|',$words).')\b/',$myVar,$out)){ // match all occurrences
foreach($out[0] as $w){
$remaining=array_diff($words,[$w]); // find the remaining valid replacement words
shuffle($remaining); // randomize the remaining replacement words
$replacepairs[$w]=current($remaining); // pluck first value from remaining words
}
$myVar=strtr($myVar,$replacepairs); // use replacepairs on sentence
}
}
echo $myVar;
Possible outputs:
The faster epic fail was both hyper funny and hyper funny
The very slow epic fail was both very sad and hyper funny
The fast epic fail was both funny and very sad
etc...
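As a variation on the two loops above, here is a minimal single-pass sketch. It is not the answer's code: swapWithSibling() is a hypothetical helper name, and the sketch assumes the words are matched case-sensitively and that every group has at least two distinct words. It uses preg_replace_callback() so each match is swapped for a random other word of its own group, and preg_quote() so words containing regex metacharacters are safe.

```php
<?php
// Hypothetical helper: replace every listed word with a random *other* word
// from the same group, in one pass over the text.
function swapWithSibling(string $text, array $groups): string
{
    // Map each word to the remaining words of its group.
    $siblings = [];
    foreach ($groups as $group) {
        foreach ($group as $word) {
            $siblings[$word] = array_values(array_diff($group, [$word]));
        }
    }

    // Try longer words first so that "very slow" wins over "slow".
    $words = array_keys($siblings);
    usort($words, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    $alternation = implode('|', array_map(function ($w) {
        return preg_quote($w, '/');
    }, $words));

    return preg_replace_callback(
        '/\b(?:' . $alternation . ')\b/',
        function ($m) use ($siblings) {
            $options = $siblings[$m[0]];
            return $options[array_rand($options)]; // never the matched word itself
        },
        $text
    );
}
```

Because the replacement text is not scanned again, a word that was just swapped in cannot be swapped back within the same call.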

Related

Matching a longest word using Regex only

I need to match longest word of given string using regex:
for example given string
S = "hello night axe axbxbxx prom etc..."
character set 1 = [abcdexy]
character set 2 = [mnrpo]
I need to get only one word that satisfies two constraints: every character in the word must come from a single set, and the chosen word must be the longest such word. I tried to solve this using PHP regex:
preg_match("/\b[abcdexy]+/",$s, $match1);
preg_match("/\b[mnrpo]+/",$s, $match2);
if(strlen($match1[0]) > strlen($match2[0]))
{
//output match1[0];
}
else
{
//output match2[0]
}
The expected output should be axbxbxx since it contain only characters from set 1 and it is the longest between words that belong to one of the two sets.
My question is, can I make this work using only regex without need for strlen() testing?

You can write a single regex expression that uses a pipe to match both character ranges, then sort the matched values by descending length and access the first element's value.
Code:
$string='hello proxy night pom-pom-mop axe prom etc decayed';
if (preg_match_all('~\b(?:[a-exy]+|[m-pr]+)\b~', $string, $out)) {
usort($out[0], function($a, $b) {return strlen($b) - strlen($a);}); // or spaceship operator if you like
echo $out[0][0];
} else {
echo "no matches";
}
Output:
decayed
The above method is not "tie-aware", so if two or more values share the highest length, you will only get one value in the output. I think you need to build in some additional logic to handle these fringe cases, like:
Output all highest-length values, or
Set a secondary criterion to break ties on length.
I'll not bother coding up these solution extensions since I prefer not to go down rabbit holes.
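For readers who do want the first extension (output all highest-length values), one possible sketch follows. longestMatches() is a hypothetical name, and the pattern is simply reused from the code above; this is one way to handle ties, not the answerer's method.

```php
<?php
// Hypothetical helper: return every distinct match that shares the maximum length.
function longestMatches(string $text): array
{
    if (!preg_match_all('~\b(?:[a-exy]+|[m-pr]+)\b~', $text, $out)) {
        return []; // no word is built from a single character set
    }

    $max = max(array_map('strlen', $out[0]));

    // Keep only the longest matches, drop duplicates, reindex from 0.
    $longest = array_filter($out[0], function ($word) use ($max) {
        return strlen($word) === $max;
    });

    return array_values(array_unique($longest));
}

print_r(longestMatches('hello proxy night pom-pom-mop axe prom etc decayed'));
```

The matches are returned in the order they appear in the string, so a secondary tie-break rule could be applied to the returned array afterwards.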

I need a PHP regular expression to validate string format of 5 digits, one comma

I have a huge PHP input box on a webpage. This input should only take 5 digit string separated by commas:
00100,00247,90277,97030,00657
notice the last one has no comma at the end.
Is there a regular expression that can do this? Since the input box is very large and can take 100+ of these items, I want to validate it on the PHP server side before the database is queried and thus avoid any SQL injection attempts.
The query should only run if the input consists of 5-digit numbers, each followed by a comma except for the last one.
These are a state's public water system ID's by the way.

I believe this will get the result you're looking for, though explode may be the better option.
/^(?:\d{5},)*\d{5}$/
This will only match 1 or more 5-digit numbers that are comma delimited with no spaces.

Since this is user submitted data, your validation should be more flexible. What if the user accidentally puts a space after one of the commas? Or a line break gets inserted?
I realize you are looking for a regex solution but may I suggest using explode to create an array and apply a rule to each element. Having them separated into elements allows more flexibility when validating and storing:
$nums = explode(',', '00100,00247,90277,97030,00657');
foreach ($nums as $num) {
if (!preg_match('/^\d{5}$/', trim($num))) {
// error!
}
}

I'd explode it and validate each string individually:
$input = '00100,00247,90277,97030,00657';
$input_array = explode(',', $input);
$is_valid = true;
foreach ($input_array as $number) {
if (!preg_match('/^\d{5}$/', trim($number))) { // each item must be exactly 5 digits
$is_valid = false;
}
}
print($is_valid);

I think you rather need str_getcsv:
$row = str_getcsv($input); // $input is the submitted string
// $row is an array containing your digits

Simple. This regex matches a value having one or more comma separated 5-digit numbers:
if (preg_match('/^\d{5}(\s*,\s*\d{5})*$/', $value)) {
// Good value
}
It allows whitespace between the numbers as well.
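A minimal sketch of how the whitespace-tolerant pattern above could be combined with the explode-style answers: validate the whole string first, then split it. parseSystemIds() is a hypothetical helper name; the D modifier and the optional leading/trailing whitespace are additions that are not in the original pattern.

```php
<?php
// Hypothetical helper: return the list of 5-digit IDs, or null if the input is invalid.
function parseSystemIds(string $input): ?array
{
    // Whole-string check: 5 digits, then zero or more ", 5 digits" groups.
    // D makes $ match only at the very end of the subject.
    if (!preg_match('/^\s*\d{5}(?:\s*,\s*\d{5})*\s*$/D', $input)) {
        return null;
    }

    // Safe to split now: every piece is known to be exactly 5 digits.
    return preg_split('/\s*,\s*/', trim($input));
}

var_dump(parseSystemIds('00100,00247, 90277'));
var_dump(parseSystemIds('00100,0024'));
```

Validation like this narrows the input, but the query itself should still use prepared statements or bound parameters.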

This might work:
/^\d{5}(?:,\d{5})*$/
edit 1 noticed ridgerunner has the same answer, so disregard this.
edit 2 some notes on performance.
Failure analysis
Backtracking give back on failure:
^\d{5}(?:,\d{5})*$ gives back ,\d{5}
^(?:\d{5},)*\d{5}$ gives back \d{5},
Post Backtracking regressive topography checks:
(After backtracking give back, checks are to the right of the one that gave back)
^\d{5}(?:,\d{5})*$ checks for $
^(?:\d{5},)*\d{5}$ checks for \d{5}$
Winner: ^\d{5}(?:,\d{5})*$
NON-Backtracking regexes (using possessive quantifier +):
^\d{5}(?:,\d{5})*+$ gives nothing back, fails immediately
^(?:\d{5},)*+\d{5}$ gives nothing back fails immediately
Benchmarks
Using a string of 50 blocks of \d{5},.
The sample string is matched against each regex in a loop of 100,000 times.
Failure was induced at the end of the string, removed for a success test.
Success:
All took 1 second to complete a successful run.
Failure, Backtracking:
^\d{5}(?:,\d{5})*$ took 1.2 seconds best
^(?:\d{5},)*\d{5}$ took 1.6 seconds
Failure, Non-Backtracking:
^\d{5}(?:,\d{5})*+$ took .9 seconds
^(?:\d{5},)*+\d{5}$ took .9 seconds
Conclusions
Backtracking - Put the smallest post-backtracking check
after the backtracking sub-expression. In this case, the
smallest is $.
In general, put the required expressions ahead of the optional ones.
Best ^\d{5}(?:,\d{5})*$
NON-Backtracking - It doesn't matter.
^\d{5}(?:,\d{5})*+$ or ^(?:\d{5},)*+\d{5}$
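To repeat a comparison like this yourself, a rough timing harness might look like the sketch below. timePattern() is a hypothetical helper, the subject strings are assumptions modelled on the description above (50 five-digit blocks, with a stray character appended to force failure), and absolute timings will differ by machine and PHP/PCRE version.

```php
<?php
// Hypothetical helper: time $iterations runs of preg_match() for one pattern.
function timePattern(string $pattern, string $subject, int $iterations = 100000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        preg_match($pattern, $subject);
    }
    return microtime(true) - $start;
}

$good = implode(',', array_fill(0, 50, '12345')); // 50 comma-separated blocks
$bad  = $good . 'x';                                // failure induced at the end

$patterns = [
    '/^\d{5}(?:,\d{5})*$/',   // backtracking, required part first
    '/^(?:\d{5},)*\d{5}$/',   // backtracking, optional part first
    '/^\d{5}(?:,\d{5})*+$/',  // possessive
    '/^(?:\d{5},)*+\d{5}$/',  // possessive
];

foreach ($patterns as $pattern) {
    printf(
        "%-26s success: %.3fs  failure: %.3fs\n",
        $pattern,
        timePattern($pattern, $good),
        timePattern($pattern, $bad)
    );
}
```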

PHP & word counting from string

Trying to take numbers from a text file and see how many times they occur.
I've gotten to the point where I can print all of them out, but I want to display just the number once, and then maybe the occurrences after them (ie: Key | Amount; 317 | 42).
Not looking for an Answer per se, all learning is good, but if you figure one out for me, that would be awesome as well!

preg_match_all will return the number of matches against a string.
$count = preg_match_all("#$key#", $string);
print "{$key} - {$count}";

So if you're already extracting the data you need, you can do this using a (fairly) simple array:
$counts = array();
foreach ($keysFoundFromFile AS $key) {
if (!isset($counts[$key])) $counts[$key] = 0;
$counts[$key]++;
}
print_r($counts);
If you're already looping to extract the keys from the file, then you can simply assign them directly to the $counts array without making a second loop.

I think you're looking for the function substr_count().
Just a heads up though, if you're looking for "123" and it finds "4512367" it will match it as part of it. The alternative would be using RegEx and using word boundaries:
$count = preg_match_all('|\b'. preg_quote($num) .'\b|', $text);
(preg_quote() for good practice, \b for word boundaries so we can be assured that it's not a number embedded in another number.)
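Putting the pieces from these answers together, a short sketch for the "Key | Amount" listing the question asks for could look like this. countNumbers() is a hypothetical name, and it assumes the file contents have already been read into a string (for example with file_get_contents()).

```php
<?php
// Hypothetical helper: count how often each whole number occurs in $text.
function countNumbers(string $text): array
{
    // \b keeps "123" from matching inside "4512367".
    preg_match_all('/\b\d+\b/', $text, $matches);

    $counts = array_count_values($matches[0]); // number => occurrences
    arsort($counts);                           // most frequent first

    return $counts;
}

foreach (countNumbers("317 42 317 8 42 317") as $key => $amount) {
    echo "$key | $amount\n";
}
```

array_count_values() does the counting in one call, so no manual loop over the matches is needed.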

What is the best way to unscramble words with PHP?

I have a word list and I want to unscramble words using this word list, in PHP.
It seems to me that PHP doesn't have a built-in function that does this. So could someone please suggest a good algorithm to do this, or at least point me in the right direction?
EDIT: edited to add example
So basically, what I'm talking about is I have a list of words:
apple
banana
orange
Then, I'm given a bunch of jumbled letters.
pplea
nanaba
eroang

Given a dictionary of known words:
foreach ($list as $word)
{
if (count_chars($scrambled_word,1) == count_chars($word,1))
echo "$word\n";
}
Edit: A simple optimization would be to move the count_chars($scrambled_word,1) call outside the loop since it never changes:
$letters = count_chars($scrambled_word,1);
foreach ($list as $word)
{
if ($letters == count_chars($word,1))
echo "$word\n";
}

Warning: I rarely use PHP, so this is dealing only with a general algorithm that should work in almost any language, not anything specific to PHP.
Presumably you have a word in which the letters have been rearranged, and you want to find what word(s) could be made from those letters.
If that's correct, the general idea is fairly simple: take a copy of your word list, and sort the letters in each word into alphabetical order. Put the sorted and unsorted versions of each word side by side, and sort the whole thing by the sorted words (but keeping each unsorted word along with its sorted version). You may want to collapse duplicates together, so that (for example) instead of {abt : bat} and {abt : tab}, you have: {abt: bat, tab}
Then, to match up a scrambled word, sort its letters in alphabetical order. Look for matches in your dictionary (since it's sorted, you can use a binary search). When you find a match, the result is the word (or words) associated with that sorted letter group. Using the example above, if the scrambled word was "tba", you'd sort it to get "abt", then look up "abt" to get "bat" and "tab".
Edit: As @Moron pointed out in the comments, sorting and binary search aren't really crucial points in themselves. The basic points are to turn all equivalent inputs into identical keys, then use some sort of fast lookup by key to find the word(s) for that key.
Sorting the letters in each word is one easy way to turn equivalent inputs into identical keys. Sorting the list and doing a binary search is one easy way to do fast lookups by key.
In both cases, there are quite a few alternatives. I'm not at all sure the alternatives are likely to improve performance a lot, but they certainly could.
Just for example, instead of a pure binary search you could have a second level of index that told you where the keys starting with 'a' were, the keys starting with 'b', and so on. Given that a couple of extremely frequently-used letters are near the beginning of the alphabet ('e' and 'a', for example) you might be better off sorting the words so that relatively uncommon letters ('q', 'z', etc.) are toward the front of the key, and the most commonly used letters are at the end. This would give that first lookup based on the initial character the greatest discrimination.
On the sort/binary search side, there are probably more alternatives, and probably better arguments to be made in favor of using something else. Hash tables typically allow lookups in (nearly) constant time. Tries can reduce storage substantially, especially when many words share a common prefix. The only obvious disadvantage is that the code for either one is probably more work (though PHP's array type is hash-based, so you could probably use it quite nicely).
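Since PHP's array type is hash-based, the "identical keys plus fast lookup" idea described above can be sketched in a few lines. The function names letterKey(), buildIndex() and unscramble() are hypothetical, and the sketch assumes single-byte (ASCII) words.

```php
<?php
// Hypothetical helper: all anagrams of a word share the same sorted-letter key.
function letterKey(string $word): string
{
    $letters = str_split(strtolower($word));
    sort($letters);
    return implode('', $letters);
}

// Build the lookup table once: key => list of dictionary words with that key.
function buildIndex(array $dictionary): array
{
    $index = [];
    foreach ($dictionary as $word) {
        $index[letterKey($word)][] = $word;
    }
    return $index;
}

// Look up a scrambled word; returns every dictionary word it could be.
function unscramble(string $scrambled, array $index): array
{
    $key = letterKey($scrambled);
    return isset($index[$key]) ? $index[$key] : [];
}

$index = buildIndex(['apple', 'banana', 'orange', 'bat', 'tab']);
print_r(unscramble('tba', $index));
```

Building the index costs one pass over the dictionary; each lookup afterwards only has to sort the letters of the scrambled word.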

It is possible to unscramble in O(log p + n) where
p = size of dictionary
n = length of word to be unscrambled
Assume a constant, c, the most occurrences of some letter within any word plus 1.
Assume a constant, k, the number of letters in the alphabet.
Assume a constant, j, the most number of words that can share the same hash or letter-sorted version.
Initialization of O(p) space:
1. Using the dictionary, D, create an associated list of letter-sorted words, L, which will be of size at most p since each word has exactly one sorted version.
2. Associate another column to L with a numerical hash of integers which can range [0, c^k-1].
3. For each word in L, generate its hash with the following function:
hash(word) = 0 if word is empty or (c^i + hash(remaining substring of the word))
where i is the zero-based alphabet index of the first letter.
Algorithm:
1. In O(n), determine hash, h, of the letter sorted version of the word in question.
2. In O(log p), search for the hash in L.
3. In O(n), list j associated words of length n.
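The hash function from step 3 of the initialization can be sketched as below. letterHash() is a hypothetical name; the sketch assumes lowercase a-z only and a small constant c (4 here), so that c^25 still fits in a 64-bit integer. With a larger c, or words that repeat a letter c or more times, the values would overflow or collide.

```php
<?php
// Hypothetical helper: hash(word) = sum of c^i over the letters of the word,
// where i is the zero-based alphabet index of each letter.
function letterHash(string $word, int $c = 4): int
{
    $hash = 0;
    foreach (str_split(strtolower($word)) as $letter) {
        if ($letter >= 'a' && $letter <= 'z') {
            $hash += $c ** (ord($letter) - ord('a'));
        }
    }
    return $hash;
}

// Anagrams produce the same value because addition ignores letter order.
var_dump(letterHash('bat') === letterHash('tab'));
```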

Try these
http://www.php.net/manual/en/function.similar-text.php
http://www.php.net/manual/en/function.soundex.php
http://www.php.net/manual/en/function.levenshtein.php

The slow option would be to generate all permutations of the letters in a scrambled word, then probe them via pspell_check().
If you however can use a raw dictionary text file, then the best option is to just use a regular expression to scan it:
$dict = file_get_contents("words.txt"); // one word per line
$n = strlen($word);
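// Build the pattern by concatenation: variables are not interpolated inside single quotes
if (preg_match('/^[' . preg_quote($word, '/') . ']{' . $n . '}$/im', $dict, $match)) {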
print $match[0];
}
I'm quite certain PCRE is significantly faster at searching for permutations than PHP and the guessing approach.

Make use of PHP's array functions since they can solve this for you.
$words = array('hello', 'food', 'stuff', 'happy', 'fast');
$scrambled_word = 'oehll';
foreach ($words as $word)
{
// Same length?
if (strlen($scrambled_word) === strlen($word))
{
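// Convert to arrays and compare the sorted letters
// (array_diff() alone would ignore how often each letter occurs)
$a = str_split($word);
$b = str_split($scrambled_word);
sort($a);
sort($b);
if ($a === $b)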
{
print "Your word is: $word";
}
}
}
Basically, you look for something the same length - then you ask PHP to see if all the letters are the same.

If you have a really large list of words and want this unscramble operation to be fast, I'd try putting the word list into a database. Next add a field to your word list table that is the sum of the ascii values of the word, and then add an index on this ascii sum.
Whenever you want to retrieve a list of possible matches just search the word table for ascii sums that match the sum of the scrambled letters. Keep in mind that you may have a few false matches so you'll have to compare all of the matched words to ensure they contain only the letters of your scrambled word (but the result set should be pretty small).
If you don't want to use a database you could implement the same basic idea using a file, just sort the list by the sum value for faster retrieval of all matches.
Example Data assumes all lowercase (a=97, b=98, c=99, ...)
bat => 311,
cat => 312, ...
Example php function to figure out the sum for a word
function asciiSum($word) {
$characters = str_split(strtolower($word));
$sum = 0;
foreach($characters as $character) {
$sum += ord($character);
}
return $sum;
}
Even faster: add another field to the database that represents the string length, then you can search for words based on an ascii sum and a string length which would further reduce the number of false matches you would need to check for.

Any faster, simpler alternative to php preg_match

I am using CakePHP 1.3 and I have a textarea where users submit articles. On submit, I want to look into the article for certain keywords and add the respective tags to the article.
I was thinking of preg_match, but the preg_match pattern has to be a string, so I would have to loop through a (big) array.
Is there an easier way to plug the keywords array into the pattern?
I appreciate all your help.
Thanks.

I suggest treating your array of keywords like a hash table. Lowercase the article text, explode by spaces, then loop through each word of the exploded array. If the word exists in your hash table, push it to a new array while keeping track of the number of times it's been seen.
I ran a quick benchmark comparing regex to hash tables in this scenario. To run it with regex 1000 times, it took 17 seconds. To run it with a hash table 1000 times, it took 0.4 seconds. It should be an O(n+m) process.
$keywords = array("computer", "dog", "sandwich");
$article = "This is a test using your computer when your dog is being a dog";
$arr = explode(" ", strtolower($article));
$tracker = array();
foreach($arr as $word){
if(in_array($word, $keywords)){
if(isset($tracker[$word]))
$tracker[$word]++;
else
$tracker[$word] = 1;
}
}
The $tracker array would output: "computer" => 1, "dog" => 2. You can then do the process to decide what tags to use. Or if you don't care about the number of times the keyword appears, you can skip the tracker part and add the tags as the keywords appear.
EDIT: The keyword array may need to be an inverted index array to ensure the fastest lookup. I am not sure how in_array() works, but if it searches, then this isn't as fast as it should be. An inverted index array would look like
array("computer" => 1, "dog" => 1, "sandwich" => 1); // "1" can be any value
Then you would do isset($keywords[$word]) to check if the word matches a keyword, instead of in_array(), which should give you O(1). Someone else may be able to clarify this for me though.
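A minimal sketch of the inverted-index idea from the EDIT above, assuming array_flip() is used to build the lookup and str_word_count() to split the text (so punctuation does not stick to the words). countKeywords() is a hypothetical name.

```php
<?php
// Hypothetical helper: count keyword occurrences using a hash lookup per word.
function countKeywords(string $article, array $keywords): array
{
    // Inverted index: keyword => position, so isset() replaces in_array().
    $lookup = array_flip($keywords);

    $tracker = [];
    foreach (str_word_count(strtolower($article), 1) as $word) {
        if (isset($lookup[$word])) {
            $tracker[$word] = isset($tracker[$word]) ? $tracker[$word] + 1 : 1;
        }
    }
    return $tracker;
}

$keywords = array("computer", "dog", "sandwich");
$article  = "This is a test using your computer when your dog is being a dog";
print_r(countKeywords($article, $keywords));
```

in_array() scans the keyword list linearly for every word, whereas isset() on an array key is a hash lookup, which is why the flipped array helps when the keyword list is large.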

If you don't need the power of regular expressions, you should just use strpos().
You will still need to loop through the array of words, but strpos is much, much faster than preg_match.

Of course, you could try matching all the keywords using one single regexp, like /word1|word2|word3/, but I'm not sure it is what you are looking for. And also I think it would be quite heavy and resource-consuming.
Instead, you can try a different approach, such as splitting the text into words and checking whether the words are interesting or not. I would make use of str_word_count() using something like:
$text = 'this is my string containing some words, some of the words in this string are duplicated, some others are not.';
$words_freq = array_count_values(str_word_count($text, 1));
that splits the text into words and counts occurrences. Then you can check with isset($words_freq[$keyword]) (the words are the array keys) or array_intersect(array_keys($words_freq), $my_keywords).
If, as I guess, you are not interested in the case of the keywords, you can strtolower() the whole text before splitting it into words.
Of course, the only way to determine which approach is the best is to setup some testing, by running various search functions against some "representative" and quite long text and measuring the execution time and resource usage (try microtime(TRUE) and memory_get_peak_usage() to benchmark this).
EDIT: I cleaned up a bit the code and added a missing semi-colon :)

If you want to look for multiple words from an array, then combine said array into an regular expression:
$regex_array = implode("|", array_map("preg_quote", $array));
preg_match_all("/($regex_array)/", $src, $tags);
This converts your array into /(word|word|word|word|word|...)/. The array_map and preg_quote part is optional, only needed if the $array might contain special characters.
Avoid strpos and loops for this case. preg_match is faster for searching for alternatives.
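A slightly fuller sketch of the single-regex approach, under some assumptions that are not in the answer: keywords are matched as whole words, case-insensitively, and duplicates are collapsed. findTags() is a hypothetical name, and the \b anchors only behave as expected for keywords that start and end with a letter, digit or underscore.

```php
<?php
// Hypothetical helper: return the distinct keywords that occur in $text.
function findTags(string $text, array $keywords): array
{
    if (!$keywords) {
        return []; // an empty alternation would match everywhere
    }

    // Escape each keyword, including the "/" delimiter, then join with "|".
    $alternation = implode('|', array_map(function ($keyword) {
        return preg_quote($keyword, '/');
    }, $keywords));

    preg_match_all('/\b(?:' . $alternation . ')\b/i', $text, $matches);

    // Normalise case and drop repeated hits.
    return array_values(array_unique(array_map('strtolower', $matches[0])));
}

print_r(findTags('My Dog ate a sandwich, bad dog!', ['dog', 'sandwich', 'computer']));
```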

strtr()
If given two arguments, the second should be an array in the form array('from' => 'to', ...). The return value is a string where all the occurrences of the array keys have been replaced by the corresponding values. The longest keys will be tried first. Once a substring has been replaced, its new value will not be searched again.
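A small illustration of the two properties described above (longest keys first, replaced text not searched again), contrasted with str_replace(), which applies its replacements one after another. The sample words are made up for the example.

```php
<?php
$pairs = ['fast' => 'slow', 'slow' => 'fast', 'faster' => 'quicker'];

// strtr(): one pass over the string; "faster" is tried before "fast",
// and text that was just inserted is never replaced again.
$swapped = strtr('fast, slow, faster', $pairs);
echo $swapped, "\n";

// str_replace(): each search term is applied in turn to the running result,
// so the second replacement undoes the first.
$chained = str_replace(['fast', 'slow'], ['slow', 'fast'], 'fast, slow');
echo $chained, "\n";
```

This is why strtr() suits tagging or swapping words in a single call: the result does not depend on the order of the pairs.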

Add tags manually? Just like we add tags here at SO.
