how to do a preg_replace on a string in php? - php

i have some simple code that does a preg match:
$bad_words = array('dic', 'tit', 'fuc',); //for this example i replaced the bad words
for($i = 0; $i < sizeof($bad_words); $i++)
{
if(preg_match("/$bad_words[$i]/", $str, $matches))
{
$rep = str_pad('', strlen($bad_words[$i]), '*');
$str = str_replace($bad_words[$i], $rep, $str);
}
}
echo $str;
So, if $str was "dic" the result will be '*' and so on.
Now there is a small problem if $str == f.u.c. The solution might be to use:
$pattern = '~f(.*)u(.*)c(.*)~i';
$replacement = '***';
$foo = preg_replace($pattern, $replacement, $str);
In this case i will get ***, in any case. My issue is putting all this code together.
I've tried:
$pattern = '~f(.*)u(.*)c(.*)~i';
$replacement = 'fuc';
$fuc = preg_replace($pattern, $replacement, $str);
$bad_words = array('dic', 'tit', $fuc,);
for($i = 0; $i < sizeof($bad_words); $i++)
{
if(preg_match("/$bad_words[$i]/", $str, $matches))
{
$rep = str_pad('', strlen($bad_words[$i]), '*');
$str = str_replace($bad_words[$i], $rep, $str);
}
}
echo $str;
The idea is that $fuc becomes fuc then I place it in the array then the array does its jobs, but this doesn't seem to work.

First of all, you can do all of the bad word replacements with one (dynamically generated) regex, like this:
$bad_words = array('dic', 'tit', 'fuc',);
$str = preg_replace_callback("/\b(?:" . implode( '|', $bad_words) . ")\b/",
function( $match) {
return str_repeat( '*', strlen( $match[0]));
}, $str);
Now, you have the problem of people adding periods in between the word, which you can search for with another regex and replace them as well. However, you must keep in mind that . matches any character in a regex, and must be escaped (with preg_quote() or a backslash).
$bad_words = array_map( function( $el) {
return implode( '\.', str_split( $el));
}, $bad_words);
This will create a $bad_words array similar to:
array(
'd\.i\.c',
't\.i\.t',
'f\.u\.c'
)
Now, you can use this new $bad_words array just like the above one to replace these obfuscated ones.
Hint: You can make this array_map() call "better" in the sense that it can be smarter to catch more obfuscations. For example, if you wanted to catch a bad word separated with either a period or a whitespace character or a comma, you can do:
$bad_words = array_map( function( $el) {
return implode( '(?:\.|\s|,)', str_split( $el));
}, $bad_words);
Now if you make that obfuscation group optional, you'll catch a lot more bad words:
$bad_words = array_map( function( $el) {
return implode( '(?:\.|\s|,)?', str_split( $el));
}, $bad_words);
Now, bad words should match:
f.u.c
f,u.c
f u c
fu c
f.uc
And many more.

Related

Php search replacer

I have a search String: $str (Something like "test"), a wrap string: $wrap (Something like "|") and a text string: $text (Something like "This is a test Text").
$str is 1 Time in $text. What i want now is a function that will wrap $str with the wrap defined in $wrap and output the modified text (even if $str is more than one time in $text).
But it shall not output the whole text but just 1-2 of the words before $str and then 1-2 of the words after $str and "..." (Only if it isn`t the first or last word). Also it should be case insensitive.
Example:
$str = "Text"
$wrap = "<span>|</span>"
$text = "This is a really long Text where the word Text appears about 3 times Text"
Output would be:
"...long <span>Text</span> where...word <span>Text</span> appears...times <span>Text</span>"
My Code (Obviusly doesnt works):
$tempar = preg_split("/$str/i", $text);
if (count($tempar) <= 2) {
$result = "... ".substr($tempar[0], -7).$wrap.substr($tempar[1], 7)." ...";
} else {
$amount = substr_count($text, $str);
for ($i = 0; $i < $amount; $i++) {
$result = $result.".. ".substr($tempar[$i], -7).$wrap.substr($tempar[$i+1], 0, 7)." ..";
}
}
If you have a tipp or a solution dont hesitate to let me know.
I have taken your approach and made it more flexible. If $str or $wrap changes you could have escaping issues within the regex pattern so I have used preg_quote.
Note that I added $placeholder to make it clearer, but you can use $placeholder = "|" if you don't like [placeholder].
function wrapInString($str, $text, $element = 'span') {
$placeholder = "[placeholder]"; // The string that will be replaced by $str
$wrap = "<{$element}>{$placeholder}</{$element}>"; // Dynamic string that can handle more than just span
$strExp = preg_quote($str, '/');
$matches = [];
$matchCount = preg_match_all("/(\w+\s+)?(\w+\s+)?({$strExp})(\s+\w+)?(\s+\w+)?/i", $text, $matches);
$response = '';
for ($i = 0; $i < $matchCount; $i++) {
if (strlen($matches[1][$i])) {
$response .= '...';
}
if (strlen($matches[2][$i])) {
$response .= $matches[2][$i];
}
$response .= str_replace($placeholder, $matches[3][$i], $wrap);
if (strlen($matches[4][$i])) {
$response .= $matches[4][$i];
}
if (strlen($matches[5][$i]) && $i == $matchCount - 1) {
$response .= '...';
}
}
return $response;
}
$text = "text This is a really long Text where the word Text appears about 3 times Text";
string(107) "<span>text</span> This...long <span>text</span> where...<span>text</span> appears...times <span>text</span>"
To make the replacement case insensitive you can use the i regex option.
If I understand your question correct, just a little bit of implode and explode magic needed
$text = "This is a really long Text where the word Text appears about 3 times Text";
$arr = explode("Text", $text);
print_r(implode('<span>Text</span>', $arr));
If you specifically need to render the span tags using HTML, just write it that way
$arr = explode("Text", $text);
print_r(implode('<span>Text</span>', $arr));
Use patern below to get your word and 1-2 words before and after
/((\w+\s+){1,2}|^)text((\s+\w+){1,2}|$)/i
demo
In PHP code it can be:
$str = "Text";
$wrap = "<span>|</span>";
$text = "This is a really long Text where the word Text appears about 3 times Text";
$temp = str_replace('|', $str, $wrap); // <span>Text</span>
// find patern and 1-2 words before and after
// (to make it casesensitive, delete 'i' from patern)
if(preg_match_all('/((\w+\s+){1,2}|^)text((\s+\w+){1,2}|$)/i', $text, $match)) {
$res = array_map(function($x) use($str, $temp) { return '... '.str_replace($str, $temp, $x) . ' ...';}, $match[0]);
echo implode(' ', $res);
}

Only reverse the direction of consecutive Hebrew words in a string including non-Hebrew characters

I am trying to reverse a string containing Hebrew from RTL to LTR, but my coding attempt is reversing brackets as well. strrev() didn't work because it does not actually work for UTF8 strings. So I wrote a custom function, below is my code:
$str = 'תירס גדול A-10 (פרי גליל)';
function utf8_strrev($str)
{
$arr = '';
$words = explode(" ", $str);
$start_tag = '(';
$end_tag = ')';
foreach ($words as $word)
{
if (preg_match("/\p{Hebrew}/u", $word))
{
preg_match_all('/./us', $word, $ar);
echo print_r($ar[0]);
echo '<br>';
$arr = join('', array_reverse($ar[0])) . " " . $arr;
} else
{
preg_match_all('/./us', $word, $ar);
$arr = join('', $ar[0]) . " " . $arr;
}
}
return $arr;
}
OUTPUT :
)לילג ירפ( A-10 לודג סרית
what it should be:
(לילג ירפ) A-10 לודג סרית
Found this function in the comments on the docs that KoenHoeijmakers posted. I tested it, but I don't read Hebrew so it is hard for me to tell if it's working correctly.
function utf8_strrev($str){
preg_match_all('/./us', $str, $ar);
return join('',array_reverse($ar[0]));
}
Edit
Based on reading your question again, I think this works as you need it to?
function utf8_strrev($str)
{
$arr = '';
$words = explode(" ", $str);
$start_tag = '(';
$end_tag = ')';
foreach ($words as $word)
{
if (preg_match("/\p{Hebrew}/u", $word))
{
preg_match_all('/./us', $word, $ar);
$arr = join('', array_reverse($ar[0])) . " " . $arr;
} else
{
preg_match_all('/./us', $word, $ar);
$arr = join('', $ar[0]) . " " . $arr;
}
}
return preg_replace(array('/\)(.)/','/(.)\(/','/\}(.)/','/(.)\{/'),array('($1','$1)','{$1','$1}'),$arr);
}
$str='תירס גדול A-10 {פרי גליל}';
echo utf8_strrev($str);
Outut
{לילג ירפ} A-10 לודג סרית
Again, I don't read Hebrew, but hopefully it answers your question.
Note The reason I used preg_replace instead of str_replace is because the string replace method was giving me issues text like ( somthing something ( or ) something something )
I don't speak/read Hebrew, but this seems to work as desired.
I generally dislike nesting preg_ calls inside preg_ calls, but in this case, splitting the unicode characters into an array then reversing the elements prevents the need to bother with mb_encodings.
Code: (Demo)
$str = 'תירס גדול A-10 (פרי גליל)';
echo preg_replace_callback(
"/\p{Hebrew}+(?: \p{Hebrew}+)*/u",
fn($m) => implode(
array_reverse(
preg_split('~~u', $m[0], 0, PREG_SPLIT_NO_EMPTY)
)
),
$str
);
Output:
סרית A-10 (לילג ירפ)
The preg_replace_callback() pattern isolates consecutive space-delimited Hebrew words, then the anonymous function splits the multibyte letters into individual array elements, before reversing their order and joining the elements back into a single mutated string.

Regular expression for finding multiple patterns from a given string

I am using regular expression for getting multiple patterns from a given string.
Here, I will explain you clearly.
$string = "about us";
$newtag = preg_replace("/ /", "_", $string);
print_r($newtag);
The above is my code.
Here, i am finding the space in a word and replacing the space with the special character what ever i need, right??
Now, I need a regular expression that gives me patterns like
about_us, about-us, aboutus as output if i give about us as input.
Is this possible to do.
Please help me in that.
Thanks in advance!
And finally, my answer is
$string = "contact_us";
$a = array('-','_',' ');
foreach($a as $b){
if(strpos($string,$b)){
$separators = array('-','_','',' ');
$outputs = array();
foreach ($separators as $sep) {
$outputs[] = preg_replace("/".$b."/", $sep, $string);
}
print_r($outputs);
}
}
exit;
You need to do a loop to handle multiple possible outputs :
$separators = array('-','_','');
$string = "about us";
$outputs = array();
foreach ($separators as $sep) {
$outputs[] = preg_replace("/ /", $sep, $string);
}
print_r($outputs);
You can try without regex:
$string = 'about us';
$specialChar = '-'; // or any other
$newtag = implode($specialChar, explode(' ', $string));
If you put special characters into an array:
$specialChars = array('_', '-', '');
$newtags = array();
foreach ($specialChars as $specialChar) {
$newtags[] = implode($specialChar, explode(' ', $string));
}
Also you can use just str_replace()
foreach ($specialChars as $specialChar) {
$newtags[] = str_replace(' ', $specialChar, $string);
}
Not knowing exactly what you want to do I expect that you might want to replace any occurrence of a non-word (1 or more times) with a single dash.
e.g.
preg_replace('/\W+/', '-', $string);
If you just want to replace the space, use \s
<?php
$string = "about us";
$replacewith = "_";
$newtag = preg_replace("/\s/", $replacewith, $string);
print_r($newtag);
?>
I am not sure that regexes are the good tool for that. However you can simply define this kind of function:
function rep($str) {
return array( strtr($str, ' ', '_'),
strtr($str, ' ', '-'),
str_replace(' ', '', $str) );
}
$result = rep('about us');
print_r($result);
Matches any character that is not a word character
$string = "about us";
$newtag = preg_replace("/(\W)/g", "_", $string);
print_r($newtag);
in case its just that... you would get problems if it's a longer string :)

Check if any array values are present at the end of a string

I am trying to test if a string made up of multiple words and has any values from an array at the end of it. The following is what I have so far. I am stuck on how to check if the string is longer than the array value being tested and that it is present at the end of the string.
$words = trim(preg_replace('/\s+/',' ', $string));
$words = explode(' ', $words);
$words = count($words);
if ($words > 2) {
// Check if $string ends with any of the following
$test_array = array();
$test_array[0] = 'Wizard';
$test_array[1] = 'Wizard?';
$test_array[2] = '/Wizard';
$test_array[4] = '/Wizard?';
// Stuck here
if ($string is longer than $test_array and $test_array is found at the end of the string) {
Do stuff;
}
}
By end of string do you mean the very last word? You could use preg_match
preg_match('~/?Wizard\??$~', $string, $matches);
echo "<pre>".print_r($matches, true)."</pre>";
I think you want something like this:
if (preg_match('/\/?Wizard\??$/', $string)) { // ...
If it has to be an arbitrary array (and not the one containing the 'wizard' strings you provided in your question), you could construct the regex dynamically:
$words = array('wizard', 'test');
foreach ($words as &$word) {
$word = preg_quote($word, '/');
}
$regex = '/(' . implode('|', $words) . ')$/';
if (preg_match($regex, $string)) { // ends with 'wizard' or 'test'
Is this what you want (no guarantee for correctness, couldn't test)?
foreach( $test_array as $testString ) {
$searchLength = strlen( $testString );
$sourceLength = strlen( $string );
if( $sourceLength <= $searchLength && substr( $string, $sourceLength - $searchLength ) == $testString ) {
// ...
}
}
I wonder if some regular expression wouldn't make more sense here.

Uppercase the first character of each word in a string except 'and', 'to', etc

How can I make upper-case the first character of each word in a string accept a couple of words which I don't want to transform them, like - and, to, etc?
For instance, I want this - ucwords('art and design') to output the string below,
'Art and Design'
is it possible to be like - strip_tags($text, '<p><a>') which we allow and in the string?
or I should use something else? please advise!
thanks.
None of these are really UTF8 friendly, so here's one that works flawlessly (so far)
function titleCase($string, $delimiters = array(" ", "-", ".", "'", "O'", "Mc"), $exceptions = array("and", "to", "of", "das", "dos", "I", "II", "III", "IV", "V", "VI"))
{
/*
* Exceptions in lower case are words you don't want converted
* Exceptions all in upper case are any words you don't want converted to title case
* but should be converted to upper case, e.g.:
* king henry viii or king henry Viii should be King Henry VIII
*/
$string = mb_convert_case($string, MB_CASE_TITLE, "UTF-8");
foreach ($delimiters as $dlnr => $delimiter) {
$words = explode($delimiter, $string);
$newwords = array();
foreach ($words as $wordnr => $word) {
if (in_array(mb_strtoupper($word, "UTF-8"), $exceptions)) {
// check exceptions list for any words that should be in upper case
$word = mb_strtoupper($word, "UTF-8");
} elseif (in_array(mb_strtolower($word, "UTF-8"), $exceptions)) {
// check exceptions list for any words that should be in upper case
$word = mb_strtolower($word, "UTF-8");
} elseif (!in_array($word, $exceptions)) {
// convert to uppercase (non-utf8 only)
$word = ucfirst($word);
}
array_push($newwords, $word);
}
$string = join($delimiter, $newwords);
}//foreach
return $string;
}
Usage:
$s = 'SÃO JOÃO DOS SANTOS';
$v = titleCase($s); // 'São João dos Santos'
since we all love regexps, an alternative, that also works with interpunction (unlike the explode(" ",...) solution)
$newString = preg_replace_callback("/[a-zA-Z]+/",'ucfirst_some',$string);
function ucfirst_some($match)
{
$exclude = array('and','not');
if ( in_array(strtolower($match[0]),$exclude) ) return $match[0];
return ucfirst($match[0]);
}
edit added strtolower(), or "Not" would remain "Not".
How about this ?
$string = str_replace(' And ', ' and ', ucwords($string));
You will have to use ucfirst and loop through every word, checking e.g. an array of exceptions for each one.
Something like the following:
$exclude = array('and', 'not');
$words = explode(' ', $string);
foreach($words as $key => $word) {
if(in_array($word, $exclude)) {
continue;
}
$words[$key] = ucfirst($word);
}
$newString = implode(' ', $words);
I know it is a few years after the question, but I was looking for an answer to the insuring proper English in the titles of a CMS I am programming and wrote a light weight function from the ideas on this page so I thought I would share it:
function makeTitle($title){
$str = ucwords($title);
$exclude = 'a,an,the,for,and,nor,but,or,yet,so,such,as,at,around,by,after,along,for,from,of,on,to,with,without';
$excluded = explode(",",$exclude);
foreach($excluded as $noCap){$str = str_replace(ucwords($noCap),strtolower($noCap),$str);}
return ucfirst($str);
}
The excluded list was found at:
http://www.superheronation.com/2011/08/16/words-that-should-not-be-capitalized-in-titles/
USAGE: makeTitle($title);

Categories