how to preg match a string with a levenshtein distance in PHP


How can I preg_match a string while tolerating a variable Levenshtein distance in the pattern?
$string = 'i eat apples and oranges all day long';
$find = 'and orangis';
$distance = 1;
$matches = pregMatch_withLevensthein($find, $distance, $string);
This would return 'and oranges';

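Convert the search string into a regexp that matches any text whose words have the same lengths, collect the candidates with preg_match_all(), and compare each candidate to the search string with levenshtein(). Candidates within the allowed distance are returned. Because the regexp fixes the length of each word, this tolerates substituted characters but not inserted or deleted ones.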
$string = 'i eat apples and oranges all day long';
$find = 'and orangis';
$distance = 1;
$matches = preg_match_levensthein($find, $distance, $string);
var_dump($matches);
function preg_match_levensthein($find, $distance, $string)
{
$found = array();
// Convert find into regex
$parts = explode(' ', $find);
$regexes = array();
foreach ($parts as $part) {
$regexes[] = '[a-z0-9]{' . strlen($part) . '}';
}
$regexp = '#' . implode('\s', $regexes) . '#i';
// Find all matches; $matches[0] holds every full-pattern match
preg_match_all($regexp, $string, $matches);
foreach ($matches[0] as $match) {
// Check levenshtein distance and add to the found if within bounds
if (levenshtein($match, $find) <= $distance) {
$found[] = $match;
}
}
// return found
return $found;
}
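To also tolerate inserted or deleted characters, which a fixed-length pattern cannot do, one option is to slide a window of words over the text and compare each window with levenshtein(). The following is only a rough sketch of that alternative, not the answer's method; the function name find_within_distance is hypothetical, and it assumes the candidate has the same number of words as $find.

```php
<?php
// Hypothetical helper: returns every run of N consecutive words (N = number
// of words in $find) whose Levenshtein distance to $find is within $distance.
function find_within_distance(string $find, int $distance, string $string): array
{
    $needleWords = preg_split('/\s+/', trim($find), -1, PREG_SPLIT_NO_EMPTY);
    $words = preg_split('/\s+/', trim($string), -1, PREG_SPLIT_NO_EMPTY);
    $size = count($needleWords);
    $needle = implode(' ', $needleWords);
    $found = [];

    if ($size === 0) {
        return $found;
    }

    // Slide a window of $size words across the text.
    for ($i = 0; $i + $size <= count($words); $i++) {
        $candidate = implode(' ', array_slice($words, $i, $size));
        // Case-insensitive comparison, mirroring the "i" flag used above.
        if (levenshtein(strtolower($candidate), strtolower($needle)) <= $distance) {
            $found[] = $candidate;
        }
    }

    return $found;
}

var_dump(find_within_distance('and orangis', 1, 'i eat apples and oranges all day long'));
```

With the sample input this should yield a single candidate, 'and oranges'. Searching for 'and orange' (one character shorter) with the same distance finds it as well, which the fixed-length regexp would miss.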

Related

Replace the Nth occurrence of char in a string with a new substring

I want to do a str_replace() but only at the Nth occurrence.
Inputs:
$originalString = "Hello world, what do you think of today's weather";
$findString = ' ';
$nthOccurrence = 8;
$newWord = ' beautiful ';
Desired Output:
Hello world, what do you think of today's beautiful weather

Here is a tight little regex with \K that allows you to replace the nth occurrence of a string without repeating the needle in the pattern. If your search string is dynamic and might contain characters with special meaning, then preg_quote() is essential to the integrity of the pattern.
If you wanted to statically write the search string and nth occurrence into your pattern, it could be:
(?:.*?\K ){8}
or more efficiently for this particular case: (?:[^ ]*\K ){8}
\K tells the regex pattern to "forget" any previously matched characters in the fullstring match. In other words, "restart the fullstring match" or "Keep from here". In this case, the pattern only keeps the 8th space character.
Code: (Demo)
function replaceNth(string $input, string $find, string $replacement, int $nth = 1): string {
$pattern = '/(?:.*?\K' . preg_quote($find, '/') . '){' . $nth . '}/';
return preg_replace($pattern, $replacement, $input, 1);
}
echo replaceNth($originalString, $findString, $newWord, $nthOccurrence);
// Hello world, what do you think of today's beautiful weather
Another way to frame the question is: "How do I insert a new string after the nth instance of a search string?" Here is a non-regex approach that limits the number of explosions, prepends the new string to the last element, then re-joins the elements. (Demo)
$originalString = "Hello world, what do you think of today's weather";
$findString = ' ';
$nthOccurrence = 8;
$newWord = 'beautiful '; // notice that leading space was removed
function insertAfterNth($input, $find, $newString, $nth = 1) {
$parts = explode($find, $input, $nth + 1);
$parts[$nth] = $newString . $parts[$nth];
return implode($find, $parts);
}
echo insertAfterNth($originalString, $findString, $newWord, $nthOccurrence);
// Hello world, what do you think of today's beautiful weather

I found an answer here - https://gist.github.com/VijayaSankarN/0d180a09130424f3af97b17d276b72bd
$subject = "Hello world, what do you think of today's weather";
$search = ' ';
$occurrence = 8;
$replace = ' nasty ';
/**
* String replace nth occurrence
*
* @param string $search Search string
* @param string $replace Replace string
* @param string $subject Source string
* @param int $occurrence Nth occurrence
* @return string Replaced string
*/
function str_replace_n($search, $replace, $subject, $occurrence)
{
// Escape regex metacharacters, including the "/" delimiter
$search = preg_quote($search, '/');
return preg_replace("/^((?:(?:.*?$search){".--$occurrence."}.*?))$search/", "$1$replace", $subject);
}
echo str_replace_n($search, $replace, $subject, $occurrence);

$originalString = "Hello world, what do you think of today's weather";
$findString = ' ';
$nthOccurrence = 8;
$newWord = ' beautiful ';
$array = str_split($originalString);
$count = 0;
$num = 0;
foreach ($array as $char) {
if($findString == $char){
$count++;
}
$num++;
if($count == $nthOccurrence){
// Replace the matched character (index $num - 1) rather than inserting after it
array_splice( $array, $num - 1, 1, $newWord );
break;
}
}
$newString = '';
foreach ($array as $char) {
$newString .= $char;
}
echo $newString;

I would consider something like:
function replaceNth($string, $substring, $replacement, $nth = 1){
$a = explode($substring, $string); $n = $nth-1;
for($i=0,$l=count($a)-1; $i<$l; $i++){
$a[$i] .= $i === $n ? $replacement : $substring;
}
return join('', $a);
}
$originalString = 'Hello world, what do you think of today\'s weather';
$test = replaceNth($originalString, ' ', ' beautiful ' , 8);
$test2 = replaceNth($originalString, 'today\'s', 'good');

First explode the string on the search string, then concatenate the parts back together with the search string between them, except at the chosen occurrence, where the replacement string is used instead (occurrences are counted from 0 for convenience):
function str_replace_nth($search, $replace, $subject, $number = 0) {
$parts = explode($search, $subject);
$lastPartKey = array_key_last($parts);
$result = '';
foreach($parts as $key => $part) {
$result .= $part;
if($key != $lastPartKey) {
if($key == $number) {
$result .= $replace;
} else {
$result .= $search;
}
}
}
return $result;
}
Usage:
$originalString = "Hello world, what do you think of today's weather";
$findString = ' ';
$nthOccurrence = 7;
$newWord = ' beautiful ';
$result = str_replace_nth($findString, $newWord, $originalString, $nthOccurrence);
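The explode-based answers above rebuild the whole string. As a minimal sketch of another option, assuming a 1-based occurrence count, one can walk the string with strpos() to locate the Nth occurrence and then swap it out with substr_replace(). The name replace_nth_occurrence is hypothetical.

```php
<?php
// Hypothetical helper: replaces only the $nth (1-based) occurrence of $search.
// Returns $subject unchanged when there are fewer than $nth occurrences.
function replace_nth_occurrence(string $subject, string $search, string $replace, int $nth): string
{
    if ($search === '' || $nth < 1) {
        return $subject;
    }

    $offset = 0;
    $position = false;
    for ($i = 0; $i < $nth; $i++) {
        $position = strpos($subject, $search, $offset);
        if ($position === false) {
            return $subject; // not enough occurrences
        }
        // Continue searching after the occurrence just found.
        $offset = $position + strlen($search);
    }

    return substr_replace($subject, $replace, $position, strlen($search));
}

$originalString = "Hello world, what do you think of today's weather";
echo replace_nth_occurrence($originalString, ' ', ' beautiful ', 8);
// Hello world, what do you think of today's beautiful weather
```

Returning the input unchanged when the occurrence does not exist is a design choice of this sketch; throwing an exception would be equally reasonable.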

php search on string comma separated and get element that match

I have a question, if anyone can help me to solve this. I have a string separated by commas, and I want to find an item that partially matches:
$search = "PrintOrder";
$string = "IDperson, Inscription, GenomaPrintOrder, GenomaPrintView";
I need to get only the full string from partial match as a result of filter:
$result = "GenomaPrintOrder";

With preg_match_all you can do it like this.
PHP code
<?php
$subject = "IDperson, Inscription, GenomaPrintOrder, GenomaPrintView, NewPrintOrder";
$pattern = '/\b([^,]*PrintOrder[^,]*)\b/';
preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
foreach ($matches as $val) {
echo "Matched: " . $val[1]. "\n";
}
?>
Output
Matched: GenomaPrintOrder
Matched: NewPrintOrder

$search = "PrintOrder";
$string = "IDperson, Inscription, GenomaPrintOrder, GenomaPrintView";
$result = array();
$tmp = explode(",", $string);
foreach($tmp as $entry){
// Look for the search term ($search), not the whole list ($string)
if(strpos($entry, $search) !== false)
$result[] = trim($entry);
}
This will get you an array with all strings that match your search-string.

You can use a regular expression to get the result:
$search = "PrintOrder";
$string = "IDperson, Inscription, GenomaPrintOrder, GenomaPrintView";
$regex = '/([^,]*' . preg_quote($search, '/') . '[^,]*)/';
preg_match($regex, $string, $match);
$result = trim($match[1]); // $result == 'GenomaPrintOrder'

$search = "PrintOrder";
$string = "IDperson, Inscription, GenomaPrintOrder, GenomaPrintView";
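$array = explode(", ", $string);
// Quote $search and match it anywhere in the item; word boundaries would reject "GenomaPrintOrder"
print_r(array_filter($array, function($var) use ($search) { return preg_match('/' . preg_quote($search, '/') . '/i', $var); }));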

Since there are so many different answers already, here is another:
$result = preg_grep("/$search/", explode(", ", $string));
print_r($result);
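If regex metacharacters in $search are a concern (the preg_grep pattern above interpolates it unquoted), a plain substring test avoids the pattern entirely. A minimal sketch, assuming items are separated by commas with optional surrounding spaces; the name find_items is hypothetical.

```php
<?php
// Hypothetical helper: returns every comma-separated item containing $search.
function find_items(string $search, string $list): array
{
    // Split on commas and strip the whitespace around each item.
    $items = array_map('trim', explode(',', $list));
    $matches = array_filter($items, function ($item) use ($search) {
        return strpos($item, $search) !== false;
    });
    return array_values($matches); // reindex from 0
}

print_r(find_items('PrintOrder', 'IDperson, Inscription, GenomaPrintOrder, GenomaPrintView'));
```

For the sample list this returns a one-element array holding GenomaPrintOrder.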

Edit all odd words in string to upper case

I need to edit all odd words to upper case.
Here is a sample input string:
very long string with many words
Expected output:
VERY long STRING with MANY words
I have this code, but it seems to me that it could be done in a better way.
<?php
$lines = file($_FILES["fname"]["tmp_name"]);
$pattern = "/(\S[\w]*)/";
foreach($lines as $value)
{
$words = NULL;
$fin_str = NULL;
preg_match_all($pattern, $value, $matches);
for($i = 0; $i < count($matches[0]); $i = $i + 2){
$matches[0][$i] = strtoupper($matches[0][$i]);
$fin_str = implode(" ", $matches[0]);
}
echo $fin_str ."<br>";
}
P.S. I need to use only preg_match function.

Here's a preg_replace_callback example:
<?php
$str = 'very long string with many words';
$newStr = preg_replace_callback('/([^ ]+) +([^ ]+)/',
function($matches) {
return strtoupper($matches[1]) . ' ' . $matches[2];
}, $str);
print $newStr;
// VERY long STRING with MANY words
?>
You only need to match the repeating pattern: /([^ ]+) +([^ ]+)/, a pair of words. preg_replace_callback then applies the callback to every non-overlapping match in the string, so each pair is replaced in turn. The callback is needed because strtoupper has to be called on the first captured group. Note that a trailing unpaired word (when the string has an odd number of words) is not matched by this pattern and is left unchanged.
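The pair-based pattern leaves the last word untouched when the string has an odd number of words. A sketch of a variation that handles that case, assuming a word is any run of non-whitespace characters: match every word and let the callback keep a counter. The helper name uppercase_odd_words is hypothetical.

```php
<?php
// Hypothetical helper: uppercases the 1st, 3rd, 5th, ... word of $str.
function uppercase_odd_words(string $str): string
{
    $index = 0;
    return preg_replace_callback('/\S+/', function ($m) use (&$index) {
        $index++;
        // Odd positions (1-based) are uppercased; even ones are left as-is.
        return $index % 2 === 1 ? strtoupper($m[0]) : $m[0];
    }, $str);
}

echo uppercase_odd_words('very long string with many words');
// VERY long STRING with MANY words
```

Because only the words themselves are replaced, the original whitespace between them is preserved.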

If you have to use regular expressions, this should get you started:
$input = 'very long string with many words';
if (preg_match_all('/\s*(\S+)\s*(\S+)/', $input, $matches)) {
$words = array();
foreach ($matches[1] as $key => $odd) {
$even = isset($matches[2][$key]) ? $matches[2][$key] : null;
$words[] = strtoupper($odd);
if ($even) {
$words[] = $even;
}
}
echo implode(' ', $words);
}
This will output:
VERY long STRING with MANY words

You may not need a regex at all; simply use explode and concatenate the string again:
<?php
function upperizeEvenWords($str){
$out = "";
$arr = explode(' ', $str);
for ($i = 0; $i < count($arr); $i++){
if (!($i%2)){
$out .= strtoupper($arr[$i])." ";
}
else{
$out .= $arr[$i]." ";
}
}
return trim($out);
}
$str = "very long string with many words";
echo upperizeEvenWords($str);

str_word_count and Arabic text

I used the function str_word_count to count how many ARABIC words are in a text, but it returns zero:
$sentence = 'بِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ';
$countSentence = str_word_count($sentence);
echo 'Total words '.$countSentence.'<br />';
Thanks in advance

Try using this function:
if (!function_exists('utf8_str_word_count')){
function utf8_str_word_count($string, $format = 0, $charlist = null) {
if ($charlist === null) {
$regex = '/\\pL[\\pL\\p{Mn}\'-]*/u';
}
else {
$split = array_map('preg_quote',
preg_split('//u',$charlist,-1,PREG_SPLIT_NO_EMPTY));
$regex = sprintf('/(\\pL|%1$s)([\\pL\\p{Mn}\'-]|%1$s)*/u',
implode('|', $split));
}
switch ($format) {
default:
case 0:
// For PHP >= 5.4.0 this is fine:
return preg_match_all($regex, $string);
// For PHP < 5.4 it's necessary to do this:
// $results = null;
// return preg_match_all($regex, $string, $results);
case 1:
$results = null;
preg_match_all($regex, $string, $results);
return $results[0];
case 2:
$results = null;
preg_match_all($regex, $string, $results, PREG_OFFSET_CAPTURE);
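return empty($results[0])
? array()
: array_combine(
// Each match is array(text, offset): key by offset, value is the text.
// array_column avoids passing values to end()/reset(), which expect references.
array_column($results[0], 1),
array_column($results[0], 0));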
}
}
}
Example
$sentence = 'بِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ';
$countSentence = utf8_str_word_count($sentence);
echo 'Total words '.$countSentence.'<br />';
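If a word is simply taken to be a run of non-whitespace characters, a shorter sketch is possible. This is an assumption that differs from the function above, which matches letter sequences and therefore also splits on punctuation. The name count_words_utf8 is hypothetical.

```php
<?php
// Hypothetical helper: counts whitespace-separated tokens in a UTF-8 string.
function count_words_utf8(string $text): int
{
    // The "u" modifier makes the pattern treat $text as UTF-8.
    $words = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    // preg_split returns false on failure, e.g. for invalid UTF-8 input.
    return $words === false ? 0 : count($words);
}

$sentence = 'بِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ';
echo 'Total words ' . count_words_utf8($sentence);
// Total words 4
```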

str_repeat reverse (shrink strings)

str_repeat(A, B) repeats string A, B times:
$string = "This is a " . str_repeat("test", 2) .
"! " . str_repeat("hello", 3) . " and Bye!";
// Return "This is a testtest! hellohellohello and Bye!"
I need the reverse operation:
str_shrink($string, array("hello", "test"));
// Return "This is a test(x2)! hello(x3) and Bye!" or
// "This is a [test]x2! [hello]x3 and Bye!"
What is the best, most efficient way to create a str_shrink function?

Here are two versions that I could come up with.
The first uses a regular expression and replaces consecutive repetitions of the $needle string with a single $needle string followed by the repetition count. This is the more thoroughly tested version and, as far as I know, handles all kinds of input successfully.
function str_shrink( $str, $needle)
{
if( is_array( $needle))
{
foreach( $needle as $n)
{
$str = str_shrink( $str, $n);
}
return $str;
}
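// Escape regex metacharacters in the needle, including the "/" delimiter
$quoted = preg_quote( $needle, '/');
$regex = '/(' . $quoted . ')(?:' . $quoted . ')+/i';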
return preg_replace_callback( $regex, function( $matches) { return $matches[1] . '(x' . substr_count( $matches[0], $matches[1]) . ')'; }, $str);
}
The second uses string manipulation to continually replace occurrences of the $needle concatenated with itself. Note that this one will fail if $needle.$needle occurs more than once in the input string (The first one does not have this problem).
function str_shrink2( $str, $needle)
{
if( is_array( $needle))
{
foreach( $needle as $n)
{
$str = str_shrink2( $str, $n);
}
return $str;
}
$count = 1; $previous = -1;
while( ($i = strpos( $str, $needle.$needle)) !== false) // !== false also handles a match at position 0
{
$str = str_replace( $needle.$needle, $needle, $str);
$count++;
$previous = $i;
}
if( $count > 1)
{
$str = substr( $str, 0, $previous) . $needle .'(x' . $count . ')' . substr( $str, $previous + strlen( $needle));
}
return $str;
}
Edit: I didn't realize that the desired output wanted to include the number of repetitions. I've modified my examples accordingly.

You can play around with this one; it has not been tested much, though:
function shrink($s, $parts, $mask = "%s(x%d)"){
foreach($parts as $part){
$removed = 0;
$regex = '/(' . preg_quote($part, '/') . ')+/';
preg_match_all($regex, $s, $matches, PREG_OFFSET_CAPTURE);
if(empty($matches[0]))
continue;
foreach($matches[0] as $m){
$offset = $m[1] - $removed;
$nb = substr_count($m[0], $part);
$counter = sprintf($mask, $part, $nb);
$s = substr($s, 0, $offset) . $counter . substr($s, $offset + strlen($m[0]));
// Track how much shorter the string became so later offsets stay aligned
$removed += strlen($m[0]) - strlen($counter);
}
}
return $s;
}

I think you can try with:
<?php
$string = "This is a testtest! hellohellohello and Bye!";
function str_shrink($string, $array){
$tr = array();
foreach($array as $el){
$n = substr_count($string, $el);
$tr[$el] = $el.'(x'.$n.')';
$pattern[] = '/('.$el.'\(x'.$n.'\))+/i';
}
return preg_replace($pattern, '${1}', strtr($string,$tr));
}
echo $string;
echo '<br/>';
echo str_shrink($string,array('test','hello')); //This is a test(x2)! hello(x3) and Bye!
?>
Here is a second version that also accepts a single string as the needle:
<?php
$string = "This is a testtest! hellohellohello and Bye!";
function str_shrink($string, $array){
$tr = array();
$array = is_array($array) ? $array : array($array);
foreach($array as $el){
$sN = 'x'.substr_count($string, $el);
$tr[$el] = $el.'('.$sN.')';
$pattern[] = '/('.$el.'\('.$sN.'\))+/i';
}
return preg_replace($pattern, '${1}', strtr($string,$tr));
}
echo $string;
echo '<br/>';
echo str_shrink($string,array('test','hello')); //This is a test(x2)! hello(x3) and Bye!
echo '<br/>';
echo str_shrink($string,'test'); //This is a test(x2)! hellohellohello and Bye!
?>

I kept it short:
function str_shrink($haystack, $needles, $match_case = true) {
if (!is_array($needles)) $needles = array($needles);
foreach ($needles as $k => $v) $needles[$k] = preg_quote($v, '/');
$regexp = '/(' . implode('|', $needles) . ')+/' . ($match_case ? '' : 'i');
return preg_replace_callback($regexp, function($matches) {
return $matches[1] . '(x' . (strlen($matches[0]) / strlen($matches[1])) . ')';
}, $haystack);
}
For cases like str_shrink("aaa", array("a", "a(x3)")), this returns "a(x3)", which I thought was more likely the intended result when an array is specified. For the other behavior, which gives "a(x3)(x1)", call the function with each needle individually.
If you don't want single occurrences to get "(x1)", change:
return $matches[1] . '(x' . (strlen($matches[0]) / strlen($matches[1])) . ')';
to:
$multiple = strlen($matches[0]) / strlen($matches[1]);
return $matches[1] . (($multiple > 1) ? '(x' . $multiple . ')' : '');

Here's a very direct, single-regex technique and you don't need to collect the words in the string in advance.
There will be some fringe cases to mitigate which are not represented in the sample input, but as for the general purpose of this task, I reckon this is the way that I'd script this in my project.
Match (and capture) any full word that is repeated one or more times.
Match the contiguous repetitions of the word.
Replace the fullstring match (substring of multiple words) with the captured first instance of the word.
Before returning the replacement string for re-insertion, add the desired formatting and calculate the number of repetitions by dividing the fullstring length by the captured string's length.
Code: (Demo)
$string = "This is a " . str_repeat("test", 2) .
"!\n" . str_repeat("hello", 3) . " and Bye!\n" .
"When I sleep, the thought bubble says " . str_repeat("zz", 3) . ".";
echo preg_replace_callback(
'~\b(\w+?)\1+\b~',
function($m) {
return "[{$m[1]}](" . (strlen($m[0]) / strlen($m[1])) . ")";
},
$string
);
Output:
This is a [test](2)!
[hello](3) and Bye!
When I sleep, the thought bubble says [z](6).
For a whitelist of needles, this adaptation to my above code does virtually the same job.
Code: (Demo)
function str_shrink($string, $needles) {
// this escaping is unnecessary if only working with alphanumeric characters
$needles = array_map(function($needle) {
return preg_quote($needle, '~');
}, $needles);
return preg_replace_callback(
'~\b(' . implode('|', $needles) . ')\1+\b~',
function($m) {
return "[{$m[1]}](" . (strlen($m[0]) / strlen($m[1])) . ")";
},
$string
);
}
echo str_shrink($string, ['test', 'hello']);
Output:
This is a [test](2)!
[hello](3) and Bye!
When I sleep, the thought bubble says zzzzzz.
