I am using preg_replace($oldWords, $newWords, $string); to replace an array of words.
I want to replace every word starting with foo with hello, and every word starting with bar with world,
i.e. foo123 should change to hello, foobar should change to hello, barx5 should change to world, etc.
If my arrays are defined as:
$oldWords = array('/foo/', '/bar/');
$newWords = array('hello', 'world');
then foo123 changes to hello123 and not hello. Similarly, barx5 changes to worldx5 and not world.
How do I replace the complete matched word?
Thanks.
This is actually pretty simple if you understand regex, as well as how preg_replace works.
Firstly, your replacement arrays are incorrectly formed. What is:
$oldWords = array('\foo\', '\bar\');
Should instead be:
$oldWords = array('/foo/', '/bar/');
The backslash in PHP escapes the character after it, so the trailing \' is read as a literal quote instead of the end of the string. Your string literals were never terminated, which broke the rest of your code.
As to your actual question, however, you can achieve the desired effect with this:
$oldWords = array('/foo\w*/', '/bar\w*/');
\w matches any word character, while * is a quantifier meaning zero or more repetitions.
With those two additions, the regex matches foo plus any number of word characters directly after it, and preg_replace replaces that entire match.
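As a quick check of the patterns above, here is a minimal sketch that runs them on the sample words from the question. The input string is an assumption made up for illustration. Note that preg_replace applies the patterns one after another, so the second pattern sees the output of the first.

```php
<?php
// Patterns from the answer: the prefix plus any trailing word characters.
$oldWords = array('/foo\w*/', '/bar\w*/');
$newWords = array('hello', 'world');

// Sample input (an assumption, built from the words in the question).
$string = 'foo123 foobar barx5';

$result = preg_replace($oldWords, $newWords, $string);
echo $result, "\n"; // hello hello world
```

Without a leading \b, /bar\w*/ would also match inside a word such as rebar. Prefixing each pattern with \b restricts it to words that actually start with the prefix.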
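One way to do it is to loop through the array and check each word. Since we are only checking the first three letters, I would use substr() instead of a regex, because regex functions are typically slower for a fixed-prefix check like this.
foreach ($oldWords as &$word) { // by reference, so the assignment below changes the array
    $newWord = substr($word, 0, 3); // first three characters
    if ($newWord === 'foo') {
        $word = 'hello';
    } elseif ($newWord === 'bar') {
        $word = 'world';
    }
}
unset($word); // drop the reference left over from the loop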
I have an array of words and a string, and I want to add a hashtag to each word in the string that matches an entry in the array. I use this loop to find and replace the words:
foreach($testArray as $tag){
$str = preg_replace("~\b".$tag."~i","#\$0",$str);
}
Problem: let's say I have the words "is" and "isolated" in my array. I will get ##isolated in the output, which means the word "isolated" is matched once for "is" and once for "isolated". The pattern ignores the fact that "#isolated" no longer starts with "is"; it starts with "#".
This is only an example; I don't want to solve just this case but every other possibility as well:
$str = "this is isolated is an example of this and that";
$testArray = array('is','isolated','somethingElse');
Output will be:
this #is ##isolated #is an example of this and that
You may build a regex with an alternation group enclosed with word boundaries on both ends and replace all the matches in one pass:
$str = "this is isolated is an example of this and that";
$testArray = array('is','isolated','somethingElse');
echo preg_replace('~\b(?:' . implode('|', $testArray) . ')\b~i', '#$0', $str);
// => this #is #isolated #is an example of this and that
The regex will look like
~\b(?:is|isolated|somethingElse)\b~
If you want to make your approach work, you might add a negative lookbehind after \b: "~\b(?<!#)".$tag."~i","#\$0". The lookbehind fails every match that is preceded by #.
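The lookbehind variant of the original loop can be sketched as a small function. The name tagWords is hypothetical, and the preg_quote call is an addition that is not in the original loop; it assumes the tags may contain regex metacharacters.

```php
<?php
// Hypothetical helper: prefix every word that starts with one of the tags with '#'.
function tagWords(string $str, array $tags): string
{
    foreach ($tags as $tag) {
        // \b anchors at a word start; (?<!#) skips words that are already tagged.
        $pattern = '~\b(?<!#)' . preg_quote($tag, '~') . '~i';
        $str = preg_replace($pattern, '#$0', $str);
    }
    return $str;
}

$tagged = tagWords(
    'this is isolated is an example of this and that',
    array('is', 'isolated', 'somethingElse')
);
echo $tagged, "\n"; // this #is #isolated #is an example of this and that
```

As in the original loop, there is no trailing \b, so the tag "is" still tags "isolated" by prefix. The lookbehind only prevents the second "#" from being added.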
One way to do that is to split your string into words and build an associative array from your original array of words (to avoid calling in_array):
$str = "this is isolated is an example of this and that";
$testArray = array('is','isolated','somethingElse');
$hash = array_flip(array_map('strtolower', $testArray));
$parts = preg_split('~\b~', $str);
for ($i=1; $i<count($parts); $i+=2) {
$low = strtolower($parts[$i]);
if (isset($hash[$low])) $parts[$i-1] .= '#';
}
$result = implode('', $parts);
echo $result;
This way, your string is processed only once, whatever the number of words in your array.
I'm trying to create a regex, without success.
This is what I get as an input string:
String A: "##(ABC 50a- {+} UDF 69,22g,-) {*} 3##"
String B: "##ABC 0,10,- DEF {/} 9 ABC {*} UHG 3-##"
And this is what I need the regex to produce:
Result A: "(50+69,22)*3"
Result B: "0,10/9*3"
I just can't get the number replacement combined with the operation symbols.
This is what i got:
'/[^0-9\+\-\*\/\(\)\.]/'
Thankful for every help.
One simple solution consists of getting rid of everything you don't want.
So replace this:
\{(.+?)\}|[^0-9,{}()]+|(?<!\d),|,(?!\d)
With $1.
Simple enough:
$input = "(ABC 50a- {+} UDF 69,22g,-) {*} 3";
$output = preg_replace('#\{(.+?)\}|[^0-9,{}()]+|(?<!\d),|,(?!\d)#', '$1', $input);
\{(.+?)\} part matches everything inside {...} and outputs it (it gets replaced by $1)
[^0-9,{}()]+ gets rid of every character not belonging to the ones we're trying to keep (it's replaced with an empty string)
(?<!\d),|,(?!\d) throws out commas which are not part of a number
Unfortunately, I can't say much else without a better spec.
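To see how the three alternatives interact, here is a minimal sketch that applies the same pattern to both input strings from the question, including the surrounding ## markers. The variable names are placeholders chosen for this example.

```php
<?php
// Regex from the answer: keep the contents of {...}, drop unwanted characters and stray commas.
$pattern = '#\{(.+?)\}|[^0-9,{}()]+|(?<!\d),|,(?!\d)#';

$inputs = array(
    '##(ABC 50a- {+} UDF 69,22g,-) {*} 3##',
    '##ABC 0,10,- DEF {/} 9 ABC {*} UHG 3-##',
);

$outputs = array();
foreach ($inputs as $input) {
    // Group 1 is only set by the first alternative, so '$1' deletes every other match.
    $outputs[] = preg_replace($pattern, '$1', $input);
}

print_r($outputs); // "(50+69,22)*3" and "0,10/9*3"
```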
A good start would be to write down in words the patterns that you want to match. For instance, you've said that you know the operations are inside {}, but that doesn't appear anywhere in your first attempt at a regex.
You can also break it down into separate sections, and then build it up later. So for instance you might say:
if you see parentheses, keep them in the final answer
a number is made up either of digits...
...or digits followed by a comma and more digits
an operation is always in curly braces, and is either +, -, *, or /
everything else should be thrown away
Given the above list:
matching parentheses is easy: [()]
matching a digit can be done with [0-9] or \d; at least one is +; so "digits" is \d+
comma digits is easy: ,\d+; make it optional with ? and you get \d+(,\d+)?
any of four operations is just [+*/-]; escape the / and - to get [+*\/\-]
don't forget that { and } have special meanings in regexes, so need to be escaped as \{ and \}; our list of operations in braces becomes: \{[+*\/\-]\}
Now we have to put it together; one way would be to use preg_match_all to find all occurrences of any of those patterns, in order, and then we can stick them back together. So our regex is just "this or this or this or this":
/[()]|\d+(,\d+)?|\{[+*\/\-]\}/
I haven't tested this, but given the explanation of how I arrived at it, hopefully you can figure out how to test parts of it and tweak it if necessary.
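One way to try the pattern is the sketch below. The function name extractExpression is hypothetical, and stripping the braces with str_replace after the match is an assumption about how the pieces get stuck back together.

```php
<?php
// Hypothetical helper: collect every token the regex recognises, then glue them together.
function extractExpression(string $input): string
{
    // Parentheses, numbers with an optional ",digits" part, or an operator in braces.
    preg_match_all('/[()]|\d+(,\d+)?|\{[+*\/\-]\}/', $input, $matches);

    // $matches[0] holds the full matches in the order they appear in the input.
    $expression = implode('', $matches[0]);

    // The operators were matched together with their braces; drop the braces.
    return str_replace(array('{', '}'), '', $expression);
}

echo extractExpression('##(ABC 50a- {+} UDF 69,22g,-) {*} 3##'), "\n";   // (50+69,22)*3
echo extractExpression('##ABC 0,10,- DEF {/} 9 ABC {*} UHG 3-##'), "\n"; // 0,10/9*3
```

Because only operators inside braces are matched, the bare "-" characters in the input are ignored rather than treated as subtraction.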
I'm not good at regex, but I found another approach.
Do an EXTRA check of the input before running eval!
$string = "(ABC 50a- {+} UDF 69,22g) {*} 3";
$new ='';
$string = str_split($string);
foreach($string as $char) {
if(!ctype_alnum($char) || ctype_digit($char) ){
//you don't want letters, except symbols like {, ( etc
$new .=$char;
}
}
//echo $new; will output -> ( 50- {+} 69,22) {*} 3
//remove the brackets although you could put it in the if statement ...
$new = str_replace(array('{','}'),array('',''), $new);
//floating point numbers use dot not comma
$new = str_replace(',','.', $new);
$p = eval('return '.$new.';');
print $p; // -57.66
Used: ctype_digit, ctype_alnum, eval, str_split, str_replace
P.S: I assumed that the minus before the base operation is taken into account.
Just a quick try before leaving the office ;-)
$data = array(
"(ABC 50a- {+} UDF 69,22g) {*} 3",
"ABC 0,10- DEF {/} 9 ABC {*} UHG 3-"
);
foreach($data as $d) {
echo $d . " = " . extractFormula($d) . "\n";
}
function extractFormula($string) {
$regex = '/([()])|([0-9]+(,[0-9]+)?)|\{([+\*\/-])\}/';
preg_match_all($regex, $string, $matches);
$formula = implode(' ', $matches[0]);
$formula = str_replace(array('{', '}'), '', $formula); // empty string, not NULL (deprecated since PHP 8.1)
return $formula;
}
Output:
(ABC 50a- {+} UDF 69,22g) {*} 3 = ( 50 + 69,22 ) * 3
ABC 0,10- DEF {/} 9 ABC {*} UHG 3- = 0,10 / 9 * 3
If someone would like to fiddle around with the code, here is a live example: http://sandbox.onlinephpfunctions.com/code/373d76a9c0948314c1d164a555bed847f1a1ed0d
I've been wondering: is it possible to group every 2 words using regex? For 1 word I use this:
((?:\w'|\w|-)+)
This works great, but I need it for 2 words (or even more later on).
But if I use this one:
((?:\w'|\w|-)+) ((?:\w'|\w|-)+) it will make groups of 2, but not really how I want it. And when it encounters a special char it will start over.
Let me give you an example:
If I use it on this text: This is an . example text using & my / Regex expression
It will make groups of
This is
example text
regex expression
and I want groups like this:
This is
is an
an example
example text
text using
using my
my regex
regex expression
It is okay if it resets after a . So that it won't match hello . guys together for example.
Is this even possible to accomplish? I've just started experimenting with regex, so I don't quite know what is possible.
If this isn't possible could you point me in a direction that I should take with my problem?
Thanks in advance!
Regex is overkill for this. Simply collect the words, then create the pairs:
$a = array('one', 'two', 'three', 'four');
$pairs = array();
$prev = null;
foreach($a as $word) {
if ($prev !== null) {
$pairs[] = "$prev $word";
}
$prev = $word;
}
Live demo: http://ideone.com/8dqAkz
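Applied to the sentence from the question, the same idea might look like the sketch below. The helper name wordPairs is hypothetical; it reuses the word pattern from the question to collect the words and, as an assumption, does not reset at a period.

```php
<?php
// Hypothetical helper: collect the words, then join each word with its successor.
function wordPairs(string $text): array
{
    // Word pattern taken from the question; a lone "-" would also count as a word.
    preg_match_all("/(?:\\w'|\\w|-)+/", $text, $matches);
    $words = $matches[0];

    $pairs = array();
    for ($i = 0; $i + 1 < count($words); $i++) {
        $pairs[] = $words[$i] . ' ' . $words[$i + 1];
    }
    return $pairs;
}

$pairs = wordPairs('This is an . example text using & my / Regex expression');
print_r($pairs); // "This is", "is an", "an example", ... "Regex expression"
```

To reset at sentence boundaries, one option is to split the text on periods first and call the helper once per sentence.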
Try this:
$samp = "This is an . example text using & my / Regex expression";
//removes anything other than letters and spaces
$samp = preg_replace('/[^A-Z ]/i', "", $samp);
//collapses runs of spaces into a single space
$samp = preg_replace('/ +/', " ", $samp);
//the following code splits the sentence into words
$jk = explode(" ",$samp);
$i = sizeof($jk);
$j = 0;
//this combines words in desired format
$array = array(); // must be an array: $array[] on a string is a fatal error in PHP 7.1+
for($j=0;$j<$i-1;$j++)
{
$array[] = $jk[$j]." ".$jk[$j+1];
}
print_r($array);
EDIT
For your question:
I've changed the regex like this: "/[^A-Z0-9-' ]/i" so it doesn't
mess up words like 'you're' and '9-year-old' for example. But by doing
this when there is a separate - or ' in my text, it will treat those
as separate words. I know why it does this but is it preventable?
Change the regex like this:
preg_replace('/[^A-Z0-9 ]+[^A-Z0-9\'-]/i', "", $samp)
First, strip out non-word characters (replace \W with '', but keep the whitespace, since \W matches spaces too and the words must stay separated). Then perform your match. Many problems can be made simpler by breaking them down. Regexes are no exception.
Alternatively, strip out non-word characters, condense whitespace into single spaces, then use explode on space and array_chunk to group your words into pairs.
I have a list of badwords one of them is "S.A."
All my words are saved in an array and I loop through the array and do the replacement.
The word S.A. replaces things that I don't want replaced; I need it to replace S.A. only as a word by itself.
So example "This S.A. was bad." should become "This was bad".
But now when I run it on a string with (for example) SEAS in it, it will replace it... no idea why...
ps: the if then check is because if the string $v is exactly the badword, it should not be removed.
Shouldn't the \b word indicator in the regex make the word an exact match?
ps: I only want to remove full words, not a part of a word.
If I have Apple in the badlist but someone wrote Apples it should not be replaced.
foreach ($this->badwordArr as $badword) {
if (strtoupper($v) == strtoupper($badword)) {
// no replace because its the only word
$data[$key] = $v;
} else {
$pattern = "/\b$badword\b/i";
$v = preg_replace($pattern, " ", $v);
}
}
What's wrong with my regex pattern?!?!
The dot is a meta character in a regex that represents any character. You need to escape it:
S\.A\.
To avoid revising your whole list, you can use:
$badword_escaped = preg_quote($badword, '/'); // second argument also escapes the / delimiter
$pattern = "/\b$badword_escaped\b/i";
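One caveat with \b: it only matches between a word character and a non-word character, so the trailing \b after the final dot of S.A. does not match when a space follows. A possible workaround, sketched below, is to use lookarounds instead. The helper name removeBadWords is hypothetical, and collapsing the leftover spaces is an assumption based on the expected output in the question.

```php
<?php
// Hypothetical helper: remove whole-word occurrences of each bad word.
function removeBadWords(string $text, array $badwords): string
{
    foreach ($badwords as $badword) {
        // Lookarounds instead of \b: no word character directly before or after the match.
        $pattern = '/(?<!\w)' . preg_quote($badword, '/') . '(?!\w)/i';
        $text = preg_replace($pattern, ' ', $text);
    }
    // Collapse the runs of spaces left behind by the replacements.
    return trim(preg_replace('/\s+/', ' ', $text));
}

echo removeBadWords('This S.A. was bad.', array('S.A.', 'Apple')), "\n"; // This was bad.
echo removeBadWords('SEAS and Apples', array('S.A.', 'Apple')), "\n";    // SEAS and Apples
```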
I have a string that has the following structure:
ABC_ABC_PQR_XYZ
Where PQR has the structure:
ABC+JKL
and
ABC itself is a string that can contain alphanumeric characters and a few other characters like "_", "-", "+", "." and follows no set structure:
e.g. qWe_rtY-asdf or pkl123
so, in effect, the string can look like this:
qWe_rtY-asdf_qWe_rtY-asdf_qWe_rtY-asdf+JKL_XYZ
My goal is to find out what string constitutes ABC.
I was initially just using
$arrString = explode("_",$string);
to return $arrString[0] before I was made aware that ABC ($arrString[0]) itself can contain underscores, thus rendering it incorrect.
My next attempt was exploding it on "_" anyway and then comparing each of the exploded parts with the first part until I get a semblance of a pattern:
function getPatternABC($string)
{
$count = 0;
$pattern ="";
$arrString = explode("_", $string);
foreach($arrString as $expString)
{
if(strcmp($expString,$arrString[0])!==0 || $count==0)
{
$pattern = $pattern ."_". $arrString[$count];
$count++;
}
else break;
}
return substr($pattern,1);
}
This works great - but I wanted to know if there was a more elegant way of doing this using regular expressions?
Here is the regex solution:
'^([a-zA-Z0-9_+-]+)_\1_\1\+'
What this does is match (starting from the beginning of the string) the longest possible sequence consisting of the characters inside the square brackets (edit that per your spec). The sequence must appear exactly twice, each time followed by an underscore, and then must appear once more followed by a plus sign (this is actually the first half of PQR with the delimiter before JKL). The rest of the input is ignored.
You will find ABC captured as capture group 1.
So:
$input = 'qWe_rtY-asdf_qWe_rtY-asdf_qWe_rtY-asdf+JKL_XYZ';
$result = preg_match('/^([a-zA-Z0-9_+-]+)_\1_\1\+/', $input, $matches);
if ($result) {
echo $matches[1]; // capture group 1 holds ABC
}
Sure, just make a regular expression that matches your pattern. In this case, something like this:
preg_match('/^([a-zA-Z0-9_+.-]+)_\1_\1\+JKL_XYZ$/', $string, $match);
Your ABC is in $match[1].
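Wrapped in a function, the backreference approach could look like the sketch below. The name findAbc is hypothetical, and the character class is the one from this answer with the trailing JKL_XYZ check left out; adjust both to the real spec.

```php
<?php
// Hypothetical helper: return ABC, or null when the input does not fit ABC_ABC_ABC+...
function findAbc(string $input): ?string
{
    // \1 must repeat exactly what group 1 captured, so the regex engine
    // backtracks until it finds a prefix that occurs three times in a row.
    if (preg_match('/^([a-zA-Z0-9_+.-]+)_\1_\1\+/', $input, $match)) {
        return $match[1];
    }
    return null;
}

echo findAbc('qWe_rtY-asdf_qWe_rtY-asdf_qWe_rtY-asdf+JKL_XYZ'), "\n"; // qWe_rtY-asdf
echo findAbc('pkl123_pkl123_pkl123+JKL_XYZ'), "\n";                   // pkl123
```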
If the presence of underscores in these strings has a low frequency, it may be worth checking to see if a simple explode() will do it before bothering with regex.
<?php
$str = 'ABC_ABC_PQR_XYZ';
if(substr_count($str, '_') == 3)
$abc = explode('_', $str)[0]; // reset() needs a variable; passing explode() directly raises a notice
else
$abc = regexy_function($str);
?>