Replace whole word only with or without regex - php

I am trying to replace occurrences of whole words in a string. There are similar questions here at SO like this and this.
Answers to all these questions recommend using regex like this:
$needle = "a";
$haystack = "oh my dear a";
$haystack = preg_replace("/$needle\b/", 'God', $haystack);
echo $haystack;
This works well for whole words: it echoes oh my dear God
But if I replace a with a. in both needle and haystack, i.e.
$needle = "a.";
$haystack = "oh my dear a.";
the output becomes oh my deGod a. because . gets evaluated as regex.
I would want a. to be replaced by God with or without regex.

\b only recognizes word boundaries in the ASCII sense: by default it marks the transition between a \w character (a-z, A-Z, 0-9 or _) and anything else. Also, . has a special meaning in regular expressions: it matches any single character (except newline).
If the "needle" may contain special characters, use preg_quote() and create DIY boundaries.
preg_quote() takes str and puts a backslash in front of every character that is part of the regular expression syntax. This is useful if you have a run-time string that you need to match in some text and the string may contain special regex characters.
$haystack = preg_replace("~(?<=^| )" . preg_quote($needle, "~") . "(?= |$)~", 'God', $haystack);
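To see the pieces working together, here is a minimal sketch that wraps the pattern above in a function and runs it on both inputs from the question. The function name replace_whole_word is hypothetical, and the sketch assumes that words are separated by plain spaces only.

```php
<?php
// Hypothetical helper name; it wraps the pattern from the answer above.
function replace_whole_word(string $needle, string $replacement, string $haystack): string
{
    // preg_quote() escapes regex metacharacters in $needle (and the "~" delimiter),
    // so "a." is matched literally instead of "a followed by any character".
    $pattern = '~(?<=^| )' . preg_quote($needle, '~') . '(?= |$)~';
    return preg_replace($pattern, $replacement, $haystack);
}

echo replace_whole_word('a', 'God', 'oh my dear a'), "\n";   // oh my dear God
echo replace_whole_word('a.', 'God', 'oh my dear a.'), "\n"; // oh my dear God
```

The lookarounds only check for a space or the string edge, so a needle next to a comma or a tab would not be treated as a whole word; widen them if your input needs that.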

Maybe this will give you an inspiration...
$haystack="oh my dear a." ;
$needle="a" ;
$hout=$haystack ;
$hout=~ s/\b$needle\b\./God/g ;
print "$haystack $needle $hout\n";
...produces this output...
oh my dear a. a oh my dear God
This works in perl. Sorry, my php is way rusty.

Related

preg_match and preg_replace in php

I have a task to find and replace words starting and ending with "#".
Example - my string will look like:
Put your hands up in the air for #performer1# , Put your hands up in the air for #event#.
What I expect as output is:
Put your hands up in the air for #performer1# , Put your hands up in the air for #event#.
I'm a beginner and have no idea about regular expressions in PHP. Can someone help?
As you already suggested, the preg_replace function should do the trick. What you need now is a regular expression like this:
$string = "Put your hands up in the air for #performer#, ...";
$pattern = "/#(\w+)#/";
$replacement = '<strong>$1</strong>';
$new_string = preg_replace($pattern, $replacement, $string);
The magic bit is the $pattern variable, where you specify what to look for. If you put parentheses around something, you can reference the captured contents in the $replacement variable (here as $1).
The \w+ basically says: match as many characters as possible (and at least one) that are either a-z, A-Z, 0-9 or _.
The PHP PCRE Pattern Syntax can give you some more hints about how to use regular expressions.
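If the placeholders should be swapped for real values rather than wrapped in a tag, the same pattern can be combined with preg_replace_callback. This is only a sketch: the $values lookup table and its contents are made up for illustration and are not part of the question.

```php
<?php
// Hypothetical lookup table; the keys and values are invented for this example.
$values = ['performer1' => 'Alice', 'event' => 'the gala'];
$string = 'Put your hands up in the air for #performer1# , Put your hands up in the air for #event#.';

$result = preg_replace_callback('/#(\w+)#/', function (array $m) use ($values) {
    // $m[0] is the whole match including the # marks, $m[1] is the captured name.
    // Unknown placeholders are left untouched.
    return $values[$m[1]] ?? $m[0];
}, $string);

echo $result, "\n";
// Put your hands up in the air for Alice , Put your hands up in the air for the gala.
```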

Regular Expression - php - getting spaces not preceded and not followed by a word

Having something like this:
'This or is or some or information or stuff or attention here or testing'
I want to capture all the [spaces] that are neither preceded nor followed by the word or.
I reached this, I think I'm on the right track.
/\s(?<!(\bor\b))\s(?!(\bor\b))/
or this
/(?=\s(?<!(\bor\b))(?=\s(?!(\bor\b))))/
I'm not getting all the spaces, though. What is wrong with this? (The second one was an attempt to get the "and" going.)
Try this:
<?php
$str = 'This or is or some or information or stuff or attention is not here or testing';
$matches = null;
preg_match_all('/(?<!\bor\b)[\s]+(?!\bor\b)/', $str, $matches);
var_dump($matches);
?>
How about (?<!or)\s(?!or):
$str='This or is or some or information or stuff or attention here or testing';
echo preg_replace('/(?<!or)\s(?!or)/','+',$str);
>>> This or is or some or information or stuff or attention+here or testing
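This uses negative lookbehind and lookahead. As written, it also skips the space in Tor operator, for example, because Tor merely ends in or. If you want to exclude only the standalone word or, require a space on the far side of or inside each lookaround (this variant does not cover an or at the very start or end of the string):
$str='Tor operator';
echo preg_replace('/(?<!\sor)\s(?!or\s)/','+',$str);
>>> Tor+operator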
Code: (PHP Demo) (Pattern Demo)
$string = "You may organize to find or seek a neighbor or a pastor in a harbor or orchard.";
echo preg_replace('~(?<!\bor) (?!or\b)~', '_', $string);
Output:
You_may_organize_to_find or seek_a_neighbor or a_pastor_in_a_harbor or orchard.
Effectively the pattern says:
Match every space IF:
the space is not preceded by the full word "or" (a word that ends in "or" doesn't count), and
the space is not followed by the full word "or" (a word that begins with "or" doesn't count)
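As a quick check of the two rules above, here is a small sketch that runs the same pattern on the string from the question and on a "Tor operator" edge case. The underscore is used only to make the matched spaces visible.

```php
<?php
$pattern = '~(?<!\bor) (?!or\b)~';

// String from the question: only the space between "attention" and "here"
// has no standalone "or" next to it.
$question = 'This or is or some or information or stuff or attention here or testing';
$marked = preg_replace($pattern, '_', $question);
echo $marked, "\n";
// This or is or some or information or stuff or attention_here or testing

// "Tor" ends in "or" and "operator" begins with "or", but neither is the
// whole word, so the space between them is still matched.
$edge = preg_replace($pattern, '_', 'Tor operator');
echo $edge, "\n";
// Tor_operator
```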

Regex preg_match change only special character between quote

$value = "abrak'adabra' baba";
$pattern = array();
$replacement = array();
$pattern[] = '/(\'[^\']+\')|(a)/e';
$replacement = "strlen('\\2') ? 'i' : '\\0'";
The code above changes abrak'adabra' baba into ibrik'adabra' bibi.
What I want to do is to change abrak'adabra' baba into abrak'idibri' baba. How can I do that?
Honestly, I don't really understand the regex pattern above.
Here is what I do and don't understand about the code:
The $pattern says: (any text enclosed in two quotes with no quote in between) or (the character "a"). In the replacement, PHP code such as strlen works because the /e modifier is used. But I can't understand why there is an "or" in the pattern.
If the length of the second group (the a character) is more than zero, then replace it with "i", else do something else (I don't understand what \0 means).
I'd appreciate any help. This regex stuff has been frustrating me :(
Using the e modifier (eval) in patterns is dangerous, as someone could potentially execute malicious code on your server (see the manual's section on that for more). The modifier was deprecated in PHP 5.5 and removed in PHP 7.0.
Instead, if you need to do extra processing on matched items, you can use preg_replace_callback:
// Find all characters between single quotes
$result = preg_replace_callback('/\'(.*?)\'/', function($matches){
// Replace 'a' with 'i' in found matches
return '\''.str_replace('a', 'i', $matches[1]).'\'';
}, $value);
If all you're doing is replacing a with i between the quotes, there may be more optimal ways to go about it, but this way you have room for more advanced processing on the strings found between quotes.
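For reference, a self-contained run of the callback approach on the string from the question might look like this. It reuses the variable names from the snippets above and assumes nothing beyond them.

```php
<?php
$value = "abrak'adabra' baba";

// Match each single-quoted run lazily; group 1 is the text between the quotes.
$result = preg_replace_callback('/\'(.*?)\'/', function (array $matches) {
    // Only the captured text is changed; the quotes are put back around it.
    return "'" . str_replace('a', 'i', $matches[1]) . "'";
}, $value);

echo $result, "\n";
// abrak'idibri' baba
```

Text outside the quotes never reaches the callback, which is why abrak and baba stay as they are.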

A more efficient string cleaning Regex in PHP

Okay, I was hoping someone could help me with a little regex-fu.
I am trying to clean up a string.
Basically, I am:
Replacing all characters except A-Za-z0-9 with a replacement.
Replacing consecutive duplicates of the replacement with a single instance of the replacement.
Trimming the replacement from the beginning and end of the string.
Example Input:
(&&(%()$()#&#&%&%%(%$+-_The dog jumped over the log*(&)$%&)#)##%&)&^)##)
Required Output:
The+dog+jumped+over+the+log
I am currently using this very discombobulated code and just know there is a much more elegant way to accomplish this....
function clean($string, $replace){
$ok = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
$ok .= $replace;
$pattern = "/[^".preg_quote($ok, "/")."]/";
return trim(preg_replace('/'.preg_quote($replace.$replace).'+/', $replace, preg_replace($pattern, $replace, $string)),$replace);
}
Could a Regex-Fu Master please grace me with a simpler/more efficient solution?
A much better solution suggested and explained by Botond Balázs and hakre:
function clean($string, $replace, $skip=""){
// Escape $skip
$escaped = preg_quote($replace.$skip, "/");
// Regex pattern
// Replace all consecutive occurrences of "Not OK"
// characters with the replacement
$pattern = '/[^A-Za-z0-9'.$escaped.']+/';
// Execute the regex
$result = preg_replace($pattern, $replace, $string);
// Trim and return the result
return trim($result, $replace);
}
I'm not a "regex ninja" but here's how I would do it.
function clean($string, $replace){
// Remove all "not OK" characters from the beginning and the end:
$result = preg_replace('/^[^A-Za-z0-9]+/', '', $string);
$result = preg_replace('/[^A-Za-z0-9]+$/', '', $result);
// Replace all consecutive occurrences of "not OK"
// characters with the replacement:
$result = preg_replace('/[^A-Za-z0-9]+/', $replace, $result);
return $result;
}
I guess this could be simplified further, but when dealing with regexes, clarity and readability are often more important than being clever or writing super-optimal code.
Let's see how it works:
/^[^A-Za-z0-9]+/:
^ matches the beginning of the string.
[^A-Za-z0-9] matches all non-alphanumeric characters
+ means "match one or more of the previous thing"
/[^A-Za-z0-9]+$/:
same thing as above, except $ matches the end of the string
/[^A-Za-z0-9]+/:
same thing as above, except it matches mid-string too
EDIT: OP is right that the first two can be replaced with a call to trim():
function clean($string, $replace){
// Replace all consecutive occurrences of "not OK"
// characters with the replacement:
$result = preg_replace('/[^A-Za-z0-9]+/', $replace, $string);
return trim($result, $replace);
}
I don't want to sound super-clever, but I would not call it regex-foo.
What you do is actually pretty much headed in the right direction because you use preg_quote; many others are not even aware of that function.
However, you probably use it in the wrong place: you quote characters that go inside a character class, and that has (similar but) different quoting rules than the rest of a regex.
Additionally, regular expressions were designed with a case like yours in mind. That is probably the part where you are looking for a wizard. Let's see some options for making your negated character class more compact (I leave out the code that generates it to make this more visible):
[^0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]
There are constructs like 0-9, A-Z and a-z that can represent exactly that. As you can see, - is a special character inside a character class: it is not taken literally but denotes a range from one character to another:
[^0-9A-Za-z]
So that is already more compact and represents the same. There are also notations like \d and \w which might be handy in your case. But I take the first variant for a moment, because I think it's already pretty visible what it does.
The other part is the repetition. Let's see, there is + which means one or more. So you want to replace one or more of the non-matching characters. You use it by adding it at the end of the part that should match one or more times (and by default it's greedy, so if there are 5 characters, those 5 will be taken, not 4):
[^0-9A-Za-z]+
I hope this is helpful. Another step would be to also just drop the non-matching characters at the beginning and end, but it's early in the morning and I'm not that fluent with that.
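Putting the compact character class and the + quantifier together with trim() for the leading and trailing leftovers, a minimal sketch could look like the following. It keeps the function name clean from the question and assumes the replacement is a single character such as "+".

```php
<?php
function clean(string $string, string $replace): string
{
    // Collapse every run of non-alphanumeric characters into one replacement.
    $result = preg_replace('/[^0-9A-Za-z]+/', $replace, $string);
    // A run at the very start or end leaves one replacement behind; strip it.
    return trim($result, $replace);
}

$input = '(&&(%()$()#&#&%&%%(%$+-_The dog jumped over the log*(&)$%&)#)##%&)&^)##)';
echo clean($input, '+'), "\n";
// The+dog+jumped+over+the+log
```

Note that trim() treats its second argument as a list of characters, so a multi-character replacement would need a different way of stripping the ends.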

Regex to match sentences with at least n words

I'm trying to pull all sentences from a text that consist of, say, at least 5 words in PHP. Assuming sentences end with full stop, question or exclamation mark, I came up with this:
/[\w]{5,*}[\.|\?|\!]/
Any ideas, what's wrong?
Also, what needs to be done for this to work with UTF-8?
\w only matches a single character. A single word would be \w+. If you need at least 5 words, you could do something like:
/(\w+\s){4,}\w+[.?!]/
i.e. at least 4 words followed by spaces, followed by another word followed by a sentence delimiter.
I agree with the solution posted here. If you're using the preg functions in PHP, you can add the 'u' pattern modifier to make this work with UTF-8, for example /(\w+\s){4,}\w+[.?!]/u.
A method without regex:
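To make the word count adjustable, the pattern can be built from the minimum number of words. This is a sketch under assumptions: the function name sentences_with_min_words and its signature are hypothetical, and $minWords is expected to be at least 1.

```php
<?php
// Hypothetical helper; the name and signature are assumptions for illustration.
function sentences_with_min_words(string $text, int $minWords): array
{
    // ($minWords - 1) words, each followed by whitespace, then one more word
    // and a sentence delimiter. The "u" modifier treats the input as UTF-8.
    $pattern = '/(\w+\s){' . ($minWords - 1) . ',}\w+[.?!]/u';
    preg_match_all($pattern, $text, $matches);
    return $matches[0]; // full matches only
}

$text = 'This is a sentence with seven words. Too short. Does this one have enough words? No!';
$found = sentences_with_min_words($text, 5);
print_r($found);
// Matches: "This is a sentence with seven words." and "Does this one have enough words?"
```

Because each word must be followed directly by whitespace, a sentence containing commas or other inner punctuation is cut at that point, so the pattern would need extending for real-world text.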
$str = "this is a more than five word sentence. But this is not. Neither this. NO";
$sentences = explode(".", $str);
foreach ($sentences as $s) {
    $words = explode(' ', $s);
    // Keep sentences with at least five non-empty words
    if (count(array_filter($words, 'is_notempty')) >= 5) {
        echo "Found matching sentence : $s" . "<br/>";
    }
}
function is_notempty($x)
{
return !empty($x);
}
This outputs:
Found matching sentence : this is a more than five word sentence
