I'm writing a function that should retrieve every occurrence of the terms I pass to it.
I'm Italian, so I think an example will make this clearer.
I want to check whether my phrase contains certain fruits.
OK, so here is my PHP code:
$pattern='<apple|orange|pear|lemon|Goji berry>i';
$phrase="I will buy an apple to do an applepie!";
preg_match_all($pattern,$phrase,$match);
The result is an array with two matches: "apple" and the "apple" inside "applepie".
How can I match only exact occurrences?
Reading the manual, I found:
http://php.net/manual/en/regexp.reference.anchors.php
I tried \A, \Z, ^ and $, but none of them seems to work correctly in my case!
Can someone help me?
EDIT: After @cris85's answer, I'll try to improve my question...
My real pattern contains over 200 terms and the phrase is over 10,000 characters long, so the real case is too large to include here.
After some trials I found an error with the term "microsoft exchange"! Are there special characters that I must escape?
At the moment I escape "+", "-", ".", "?", "$" and "*".
The anchors you tried to use are for the full string, not per word. You can use word boundaries to match individual words. This should allow you to find only complete fruit matches:
$pattern='<\b(?:apple|orange|pear|lemon|Goji berry)\b>i';
The ?: makes the group non-capturing, so no additional capture group is created.
Here's the definition from regex-expressions of what a boundary matches:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
PHP Demo: https://3v4l.org/h5GCf
Regex Demo: https://regex101.com/r/5aBaMO/1/
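For a long term list like the 200-entry one mentioned in the question's edit, escaping each special character by hand is error-prone. The following is a minimal sketch, assuming the terms are plain strings; buildWordPattern is a hypothetical helper name, not something from the question. It relies on PHP's preg_quote() to escape each term before wrapping the alternation in word boundaries:

```php
<?php
// Hypothetical helper: builds a whole-word, case-insensitive pattern from a term list.
function buildWordPattern(array $terms): string
{
    // preg_quote() escapes regex metacharacters; '/' is passed because it is the delimiter used here.
    $escaped = array_map(function ($term) {
        return preg_quote($term, '/');
    }, $terms);

    return '/\b(?:' . implode('|', $escaped) . ')\b/i';
}

$pattern = buildWordPattern(['apple', 'orange', 'pear', 'lemon', 'Goji berry', 'microsoft exchange']);
$phrase  = 'I will buy an apple to do an applepie!';

preg_match_all($pattern, $phrase, $match);
print_r($match[0]); // only the standalone "apple" is matched
```

Note that \b only helps when each term starts and ends with a word character; a term such as "c++" would need a different boundary check.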
I have a regular expression to escape all special characters in a search string. This works great, however I can't seem to get it to work with word boundaries. For example, with the haystack
add +
or
add (+)
and the needle
+
the regular expression /\+/gi matches the "+". However the regular expression /\b\+/gi doesn't. Any ideas on how to make this work?
Using
add (plus)
as the haystack and /\bplus/gi as the regex, it matches fine. I just can't figure out why the escaped characters are having problems.
\b is a zero-width assertion: it doesn't consume any characters, it just asserts that a certain condition holds at a given position. A word boundary asserts that the position is either preceded by a word character and not followed by one, or followed by a word character and not preceded by one. (A "word character" is a letter, a digit, or an underscore.) In your string:
add +
...there's a word boundary at the beginning because the a is not preceded by a word character, and there's one after the second d because it's not followed by a word character. The \b in your regex (/\b\+/) is trying to match between the space and the +, which doesn't work because neither of those is a word character.
Try changing it to:
/\b\s?\+/gi
Edit:
Extend this concept as far as you want. If you want the first + after any word boundary:
/\b[^+]*\+/gi
Boundaries are very conditional assertions; what they anchor depends on what they touch. See this answer for a detailed explanation, along with what else you can do to deal with it.
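To make the behaviour concrete, here is a small PHP sketch (the variable names are illustrative only). It compares a plain \b before the escaped +, a variant that consumes an optional whitespace character after the boundary, and a lookaround-based alternative, which is one possible way to say "not glued to a word character" without relying on \b:

```php
<?php
$haystack = 'add +';

// \b cannot match between a space and "+", because neither is a word character.
$plainBoundary = preg_match('/\b\+/', $haystack);            // 0

// The boundary after "add" matches, then the optional space and the "+" are consumed.
$withOptionalSpace = preg_match('/\b\s?\+/', $haystack);     // 1

// Alternative: assert that no word character touches the "+" on either side.
$lookaround = preg_match('/(?<!\w)\+(?!\w)/', $haystack);    // 1
$lookaroundParens = preg_match('/(?<!\w)\+(?!\w)/', 'add (+)'); // 1

var_dump($plainBoundary, $withOptionalSpace, $lookaround, $lookaroundParens);
```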
What is going on with the \w character type? At the moment the code outputs an array called $replace in which each name is kept in full, except that the first name is reduced to its first letter. I don't really understand what it's doing to get to this point. I know \w is any word character, but that doesn't help me.
<?php
$rappers = array('Drake Themotto', 'Tom Ford', 'Lil Wayne');
$replace = preg_replace('/(\w)\w* (\w)/', '\1 \2', $rappers);
print_r($replace);
?>
From left to right your regex contains:
A group with one word character
Zero or more word characters
A space
A group with one word character
For "Drake Themotto" this means:
The first group \1 will be "D"
The following word characters "rake" match but will not be stored
The space will not be stored
The second group \2 will be "T"
For the replacement this means that the matching part of your string is "Drake T". This matching string will be replaced by "\1 \2" which is "D T" in this case.
After that come the remaining characters, "hemotto". Your regex does not mention them, and it has neither a $ to anchor the end of the string (in which case the regex would not match at all) nor another \w* to match (and thereby remove) the rest of the word, so the rest is simply left alone. Because preg_replace only replaces the matched part, the unmatched rest stays in the result unchanged, giving "D Themotto".
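The walkthrough above can be checked with a short sketch. The second call is a variation, not part of the question: it adds a trailing \w* so that the rest of the last name is matched and therefore dropped too.

```php
<?php
$rappers = ['Drake Themotto', 'Tom Ford', 'Lil Wayne'];

// Original pattern: only "Drake T" is matched and replaced, "hemotto" is left untouched.
$partial = preg_replace('/(\w)\w* (\w)/', '\1 \2', $rappers);
print_r($partial);  // D Themotto, T Ford, L Wayne

// Variation: the extra \w* also consumes the rest of the second word.
$initials = preg_replace('/(\w)\w* (\w)\w*/', '\1 \2', $rappers);
print_r($initials); // D T, T F, L W
```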
I have to search for and replace all the words starting with # and # in a sentence. Can you please let me know the best way to do this in PHP? I tried:
preg_replace('/(\#+|\#+).*?(?=\s)/','--', $string);
This replaces only one word in a sentence. I want all the matches to be replaced.
I cannot use the g modifier here as in Perl.
preg_replace replaces all matches by default. If it is not doing so, it is an issue with your pattern or the data.
Try this pattern instead:
(?<!\S)[##]+\w+
(?<!\S) - do not match if the pattern is preceded by a non-whitespace character.
[##]+ - match one or more of # and #.
\w+ - match one or more word characters (letters, digits, underscores). This will preserve punctuation. For example, "#foo." would be replaced by "--." (the trailing dot is kept). If you don't want this, you could use \S+ instead, which matches all characters that are not whitespace.
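A brief sketch of the difference between \w+ and \S+ described above. For brevity it uses only # as the marker character, and the sample sentence is made up for illustration:

```php
<?php
$string = 'Ping #foo. Then see #bar and a#b';

// \w+ stops at the dot, so the punctuation after "#foo" survives.
$keepPunctuation = preg_replace('/(?<!\S)#+\w+/', '--', $string);
echo $keepPunctuation, "\n"; // Ping --. Then see -- and a#b

// \S+ runs up to the next whitespace, so the dot is replaced as well.
$dropPunctuation = preg_replace('/(?<!\S)#+\S+/', '--', $string);
echo $dropPunctuation, "\n"; // Ping -- Then see -- and a#b
```

In both cases "a#b" is left alone, because the lookbehind (?<!\S) rejects a # that directly follows a non-whitespace character.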
A word starting with a given character implies that there is a space right before that character. Try something like this:
/(?<!\S)[##].*(?=[^a-z])/
Why not use (?=\s)? Because if there is some punctuation right after the word, it's not part of the word. Note: you can replace [^a-z] with any list of characters that are not allowed in your words.
Be careful though, there are two particular cases where that doesn't work. You have to use three preg_replace calls in a row; the two others are for words that begin and end the string:
/^[##].*(?=[^a-z])/
/(?<!\S)[##].*$/
Try this:
$string = "#Test let us meet_me#noon see #Prasanth";
$new_pro_name = preg_replace('/(?<!\S)(#\w+|#\w+)/','--', $string);
echo $new_pro_name;
This replaces all the words starting with # OR #
Output: -- let us meet_me#noon see --
If you want to replace the word after # OR # even if it is in the middle of a word:
$string = "#Test let us meet_me#noon see #Prasanth";
$new_pro_name = preg_replace('/(#\w+|#\w+)/','--', $string);
echo $new_pro_name;
Output: -- let us meet_me-- see --
I have to create regex to match ugly abbreviations and numbers. These can be one of following "formats":
1) [any alphabet char length of 1 char][0-9]
2) [double][whitespace][2-3 length of any alphabet char]
I tried to match double:
preg_match("/^-?(?:\d+|\d*\.\d+)$/", $source, $matches);
But I couldn't get it to match the following example: 1.1 AA My test title. What is wrong with my regex, and how can I add the other formats to it too?
Your regex says: "start of string, followed by an optional -, followed by either at least one digit, or zero or more digits, a dot and at least one digit, followed by the end of string."
So your regex could match, for example, 4.5, -.1, etc. This is exactly what you tell it to do.
Your test input string does not match because there are other characters after the number 1.1, and even if it somehow magically matched, your "double" matching regex is wrong.
For a double without scientific notation you usually use this regex :
[-+]?\b[0-9]+(\.[0-9]+)?\b
Now that we have this out of our way we need a whitespace \s and
[2-3 length of alphabet]
Now I have no idea what [2-3 length of alphabet] means but by combining the above you get a regex like this :
[-+]?\b[0-9]+(\.[0-9]+)?\b\s[2-3 length of alphabet]
You can also place anchors ^$ if you want the string to match entirely :
^[-+]?\b[0-9]+(\.[0-9]+)?\b\s[2-3 length of alphabet]$
Feel free to ask if you are stuck! :)
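As one possible way to put the pieces together, here is a sketch that covers both formats from the question. It assumes that "[2-3 length of any alphabet char]" means two or three ASCII letters, written as [A-Za-z]{2,3}; that reading is an assumption, as are the sample titles other than the first:

```php
<?php
// Double without scientific notation, as in the pattern above (group made non-capturing).
$double = '[-+]?\b[0-9]+(?:\.[0-9]+)?\b';

// Format 1: one letter followed by a digit.
// Format 2: double, whitespace, then 2-3 letters (assumed meaning of the question's wording).
$pattern = '/^(?:[A-Za-z][0-9]|' . $double . '\s[A-Za-z]{2,3}\b)/';

$samples = ['1.1 AA My test title', 'A5 Another title', 'Plain title'];
$results = [];
foreach ($samples as $sample) {
    // Store the matched prefix, or null when the title has no such prefix.
    $results[$sample] = preg_match($pattern, $sample, $m) ? $m[0] : null;
}
var_dump($results); // "1.1 AA", "A5", NULL
```

Only the start of the string is anchored here, because the title text follows the prefix.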
I see multiple issues with your regex:
You try to match the whole string (as a number) by the anchors: ^ at the beginning and $ at the end. If you don't want that, remove those.
The number group is non-capturing. It will be checked for matches, but those won't be added to $matches. That's because of the ?: at the start of (?:...). Remove ?: to make that group capturing.
You place the shorter digit pattern before the longer one. If you swap the order, the regex engine will try the longer one first and, on success, prefer it over the shorter one.
Maybe this already solves your issue:
preg_match("/-?(\d*\.\d+|\d+)/", $source, $matches);
Demo
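The effect of the alternation order can be seen in a short sketch, using the sample string from the question (the variable names are illustrative):

```php
<?php
$source = '1.1 AA My test title';

// Shorter alternative first: \d+ succeeds on "1" and the engine stops there.
preg_match('/-?(\d+|\d*\.\d+)/', $source, $shortFirst);
echo $shortFirst[1], "\n"; // 1

// Longer alternative first: \d*\.\d+ is tried first and matches the whole number.
preg_match('/-?(\d*\.\d+|\d+)/', $source, $longFirst);
echo $longFirst[1], "\n";  // 1.1
```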
I'm trying to figure out how to write a regex that can detect if in my string, any character is repeated more than five times consecutively? For example it wouldn't detect "hello", but it would detect "helloooooooooo".
Any ideas?
Edit: Sorry, to clarify, I need it to detect the same character repeated more than five times, not any sequence of five characters. I also need it to work with any character, not just "o" as in my example. ".{5,}" is no good because it matches any sequence of five or more characters, not the same character repeated.
This should do it
(\w)\1{5,}
(\w) matches any word character and captures it in the first group
\1{5,} checks that the captured character repeats at least 5 more times (6 in total)
Usage :
$input = 'helloooooooooo';
if (preg_match('/(\w)\1{5,}/', $input)) {
    # Successful match
} else {
    # Match attempt failed
}
Correction, should be (.)\1{5,}, I believe. My mistake. This gets you:
(.) #Any character
\1 #The character captured by (.)
{5,} #At least 5 more repetitions (total of at least 6)
You can also restrict it to letters by using (\w)\1{5,} or ([a-zA-Z])\1{5,}
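The difference between (.) and (\w), and the "at least 6 in total" threshold, can be checked with a small sketch. The sample strings are made up for illustration:

```php
<?php
$samples  = ['hello', 'hellooooo', 'helloooooo', 'wow!!!!!!'];
$anyChar  = [];
$wordChar = [];

foreach ($samples as $s) {
    $anyChar[$s]  = preg_match('/(.)\1{5,}/', $s);   // any character, 6+ in a row
    $wordChar[$s] = preg_match('/(\w)\1{5,}/', $s);  // word characters only
}

print_r($anyChar);  // hello: 0, hellooooo (5 o's): 0, helloooooo (6 o's): 1, wow!!!!!!: 1
print_r($wordChar); // hello: 0, hellooooo: 0, helloooooo: 1, wow!!!!!!: 0
```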
You can use the regex:
(.)\1{5,}
Explanation:
. : Meta char that matches any char.
() : Used for grouping and remembering the matched single char.
\1 : Back reference to the single char that was remembered in the previous step.
{5,} : Quantifier for 5 or more.
and in PHP you can use it as:
$input = 'helloooooooooo';
if(preg_match('/(.)\1{5,}/',$input,$matches)) {
echo "Found repeating char $matches[1] in $input";
}
Output:
Found repeating char o in helloooooooooo
Yep.
(.)\1+
This will match repeated sequences of any character.
The \1 looks at the contents of the first set of brackets. (so if you have more complex regex, you'd need to adjust it to the correct number so it picks up the right set of brackets).
If you need to specify, say more than three of them:
(.)\1{3,}
The \1 syntax is quite powerful: for example, you can also use it elsewhere in your regex to search for the same character appearing in different places in your search string.
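As an illustration of using \1 in a different place, here is a minimal sketch that checks whether a string starts and ends with the same character. sameFirstAndLast is a hypothetical helper name chosen for this example:

```php
<?php
// Hypothetical helper: true when the first and last characters are identical.
// Needs at least two characters, because \1 must match a second occurrence.
function sameFirstAndLast(string $s): bool
{
    return preg_match('/^(.).*\1$/', $s) === 1;
}

var_dump(sameFirstAndLast('level')); // bool(true)
var_dump(sameFirstAndLast('hello')); // bool(false)
```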