Can someone explain what this function
preg_replace('/&\w;/', '', $buf)
does? I have looked at various tutorials and found that it replaces the pattern /&\w;/ with string ''. But I can't understand the pattern /&\w;/. What does it represent?
Similarly in
preg_match_all("/(\b[\w+]+\b)/", $buf, $words)
I can't understand what does the string "/(\b[\w+]+\b)/" represents.
Please help. Thanks in advance :)
The explanation of your first expression is simple, it is:
& # Match the character “&” literally
\w # Match a single character that is a “word character” (letters, digits, and underscores)
; # Match the character “;” literally
The second one is:
( # Match the regular expression below and capture its match into backreference number 1
\b # Assert position at a word boundary
[\w+] # Match a single character present in the list below
# A word character (letters, digits, and underscores)
# The character “+”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\b # Assert position at a word boundary
)
The preg_replace function makes use of regular expressions. Regular expressions allow you to find patterns in text in a really powerful way.
To be able to use functions like preg_replace or preg_match I recommend you to take a look first at how regular expressions work.
You can gather a lot of info on this site http://www.regular-expressions.info/
And you can use software tools to help you understand the regex (like RegexBuddy)
In regular expressions, \w stands for any "word" character. That is: a-z, A-Z, 0-9 and underscore. \b stands for "word boundary", that is the beginning and end of a word (a series of word characters).
So, /&\w;/ is a regular expression to match the & sign, followed by a series of word characters, followed by a ;. For example, &foobar; would match, and preg_replace will replace it with an empty string.
In that same manner, /(\b[\w+]+\b)/ matches a word boundary, followed by multiple word characters, followed by another word boundary. The words are captured separately using the parenthesis. So, this regular expression will simply return the words in a string as an array.
Related
I have a regular expression to escape all special characters in a search string. This works great, however I can't seem to get it to work with word boundaries. For example, with the haystack
add +
or
add (+)
and the needle
+
the regular expression /\+/gi matches the "+". However the regular expression /\b\+/gi doesn't. Any ideas on how to make this work?
Using
add (plus)
as the haystack and /\bplus/gi as the regex, it matches fine. I just can't figure out why the escaped characters are having problems.
\b is a zero-width assertion: it doesn't consume any characters, it just asserts that a certain condition holds at a given position. A word boundary asserts that the position is either preceded by a word character and not followed by one, or followed by a word character and not preceded by one. (A "word character" is a letter, a digit, or an underscore.) In your string:
add +
...there's a word boundary at the beginning because the a is not preceded by a word character, and there's one after the second d because it's not followed by a word character. The \b in your regex (/\b\+/) is trying to match between the space and the +, which doesn't work because neither of those is a word character.
Try changing it to:
/\b\s?+/gi
Edit:
Extend this concept as far as you want. If you want the first + after any word boundary:
/\b[^+]*+/gi
Boundaries are very conditional assertions; what they anchor depends on what they touch. See this answer for a detailed explanation, along with what else you can do to deal with it.
I require a regex to match the string in the following way:
#1234abc : Should get matched
#abc123 : Should get matched
#123abc123 : Should get matched
#123 : Should not get matched
#123_ : Should not get matched
#123abc_ : Should get matched
This implies that it should only get matched if the string contains numbers or underscore along with alphabets. Only numbers/underscore should not get matched. Any other special characters should not get matched either.
This regex is basically to get hashtags from string. I have already tried the following but it didn't worked well for me.
preg_match_all('/(?:^|\s)#([a-zA-Z0-9_]+$)/', $text, $matches);
Please suggest something.
If you need to match hashtags in the format you specified in a larger string, use
(?<!\S)#\w*[a-zA-Z]\w*
See the regex demo
Details:
(?<!\S) - there must be a start of string or a whitespace before
# - a hash symbol
\w* - 0+ word chars (that is, letters, digits or underscore)
[a-zA-Z] - a letter (you may use \p{L} instead)
\w* - 0+ word chars.
Other alternatives (that may appear faster, but are a bit more complex):
(?<!\S)#(?![0-9_]+\b)\w+
(?<!\S)#(?=\w*[a-zA-Z])\w+
The point here is that the pattern basically matches 1+ word chars preceded with # that is either at the string start or after whitespace, but (?![0-9_]+\b) negative lookahead fails all matches where the part after # is all digits/underscores, and the (?=\w*[a-zA-Z]) positive lookahead requires that there should be at least 1 ASCII letter after 0+ word chars.
You can use this Regex:
((.*?(\d+)[a-zA-Z]+.*)|(.*[a-zA-Z]+(\d+).*)).
Access it here: http://regexr.com/3ef6q
see it working:
Do:
^(?=.*[A-Za-z])[\w_]+$
[\w_]+ matches one or more of letters, digits, _
The zero width positive lookahead pattern, (?=.*[A-Za-z]), makes sure the match contains at least one letter
Demo
im looking for a regex that matches words that repeat a letter(s) more than once and that are next to each other.
Here's an example:
This is an exxxmaple oooonnnnllllyyyyy!
By far I havent found anything that can exactly match:
exxxmaple and oooonnnnllllyyyyy
I need to find it and place them in an array, like this:
preg_match_all('/\b(???)\b/', $str, $arr) );
Can somebody explain what regexp i have to use?
You can use a very simple regex like
\S*(\w)(?=\1+)\S*
See how the regex matches at http://regex101.com/r/rF3pR7/3
\S matches anything other than a space
* quantifier, zero or more occurance of \S
(\w) matches a single character, captures in \1
(?=\1+) postive look ahead. Asserts that the captrued character is followed by itsef \1
+ quantifiers, one or more occurence of the repeated character
\S* matches anything other than space
EDIT
If the repeating must be more than once, a slight modification of the regex would do the trick
\S*(\w)(?=\1{2,})\S*
for example http://regex101.com/r/rF3pR7/5
Use this if you want discard words like apple etc .
\b\w*(\w)(?=\1\1+)\w*\b
or
\b(?=[^\s]*(\w)\1\1+)\w+\b
Try this.See demo.
http://regex101.com/r/kP8uF5/20
http://regex101.com/r/kP8uF5/21
You can use this pattern:
\b\w*?(\w)\1{2}\w*
The \w class and the word-boundary \b limit the search to words. Note that the word boundary can be removed, however, it reduces the number of steps to obtain a match (as the lazy quantifier). Note too, that if you are looking for words (in the common meaning), you need to remove the word boundary and to use [a-zA-Z] instead of \w.
(\w)\1{2} checks if a repeated character is present. A word character is captured in group 1 and must be followed with the content of the capture group (the backreference \1).
I would like to match the whole "word"—one that starts with a number character and that may include special characters but does not end with a '%'.
Match these:
112 (whole numbers)
10-12 (ranges)
11/2 (fractions)
11.2 (decimal numbers)
1,200 (thousand separator)
but not
12% (percentages)
A38 (words starting with a alphabetic character)
I've tried these regular expressions:
(\b\p{N}\S)*)
but that returns '12%' in '12%'
(\b\p{N}(?:(?!%)\S)*)
but that returns '12' in '12%'
Can I make an exception to the \S term that disregards %?
Or will have to do something else?
I'll be using it in PHP, but just write as you would like and I'll convert it to PHP.
This matches your specification:
\b\p{N}\S*+(?<!%)
Explanation:
\b # Start of number
\p{N} # One Digit
\S*+ # Any number of non-space characters, match possessively
(?<!%) # Last character must not be a %
The possessive quantifier \S*+ makes sure that the regex engine will not backtrack into a string of non-space characters it has already matched. Therefore, it will not "give back" a % to match 12 within 12%.
Of course, that will also match 1!abc, so you might want to be more specific than \S which matches anything that's not a whitespace character.
Can i make an exception to the \S term that disregards %
Yes you can:
[^%\s]
See this expression \b\d[^%\s]* here on Regexr
\d+([-/\.,]\d+)?(?!%)
Explanation:
\d+ one or more digits
(
[-/\.,] one "-", "/", "." or ","
\d+ one or more digits
)? the group above zero or one times
(?!%) not followed by a "%" (negative lookahead)
KISS (restrictive):
/[0-9][0-9.,-/]*\s/
try this one
preg_match("/^[0-9].*[^%]$/", $string);
Try this PCRE regex:
/^(\d[^%]+)$/
It should give you what you need.
I would suggest just:
(\b[\p{N},.-]++(?!%))
That's not very exact regarding decimal delimiters or ranges. (As example). But the ++ possessive quantifier will eat up as many decimals as it can. So that you really just need to check the following character with a simple assertion. Did work for your examples.
/(?![a-z]+:)/
Anyone knows?
the / are delimiters.
?! is negative lookahead.
[a-z] is a character class (any character in the a-z range)
+ is one-or-more times of the preceding pattern ([a-z] in this case)
: is just the colon literal
It roughly means "look ahead and make sure there are no alpha characters followed by a colon".
This regex would make more sense if it had a start of string anchor: /^(?![a-z]+:/, so it wouldn't match abc: (like one of the other answers say), but without the (^) I don't know how useful this is.
according to Regex Buddy (a product i highly recommend):
Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?![a-z]+:)»
Match a single character in the range between “a” and “z” «[a-z]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “:” literally «:»
(?!REGEX) is the syntax for negative lookahead. Check the link for an explanation of lookaheads.
The regex fails if the pattern [a-z]+: appear in the string from the current position. If the pattern is not found, regex would succeed, but won't consume any characters.
It would match 123: or abc but not abc:
It would match the : in abc:.