It is necessary to check string. It should contain one substring AND dont contain other substring. This should works only with regexp.
Examples. We accept string with 'fruit' substring, but don't accept string contains substring 'love':
We all love green fruit. -> dont match
We all love to walk. -> dont match
We all green fruit. -> match
Write one regex if it possible.
/(?<!love).+fruit/ dont work
I think this will work
^(?!.*\blove\b)(?=.*\bfruit\b).*
<------------><------------->
Don't match Match this
this word word
Regex Demo
NOTE :- You can remove \b if you assume to match substring..
Surely, you can achieve what you want with strpos, but you specified you only need a regex solution. Note that this is not the best approach for this task unless you need to check for the substrings in a specific context (like within word boundaries, or after or before specific symbols, etc.)
The (?<!love).+fruit regex matches any 1+ characters that are not preceded with love substring up to the fruit substring. It will match I love fruit because the lookbehind asserts true at the beginning of the string, then .+ grabs the whole string, then backtracking does its job to get fruit.
In fact, you only need 1 lookahead to check if there is no love anchored at the start of the string:
^(?!.*love).*fruit
^^^^^^^^^^
See the regex demo
You only check for the substring love with (?!.*love) at the beginning of the string (due to ^), and then, if it is missing, the regex goes on matching any characters (other than a newline if /s modifier is not use) up to the last fruit.
Here is a PHP demo:
$re = "/^(?!.*love).*fruit/";
if (preg_match($re, "We all love green fruit."))
{
echo "Matched!"; // Won't be displayed since there is no match
}
im looking for a regex that matches words that repeat a letter(s) more than once and that are next to each other.
Here's an example:
This is an exxxmaple oooonnnnllllyyyyy!
By far I havent found anything that can exactly match:
exxxmaple and oooonnnnllllyyyyy
I need to find it and place them in an array, like this:
preg_match_all('/\b(???)\b/', $str, $arr) );
Can somebody explain what regexp i have to use?
You can use a very simple regex like
\S*(\w)(?=\1+)\S*
See how the regex matches at http://regex101.com/r/rF3pR7/3
\S matches anything other than a space
* quantifier, zero or more occurance of \S
(\w) matches a single character, captures in \1
(?=\1+) postive look ahead. Asserts that the captrued character is followed by itsef \1
+ quantifiers, one or more occurence of the repeated character
\S* matches anything other than space
EDIT
If the repeating must be more than once, a slight modification of the regex would do the trick
\S*(\w)(?=\1{2,})\S*
for example http://regex101.com/r/rF3pR7/5
Use this if you want discard words like apple etc .
\b\w*(\w)(?=\1\1+)\w*\b
or
\b(?=[^\s]*(\w)\1\1+)\w+\b
Try this.See demo.
http://regex101.com/r/kP8uF5/20
http://regex101.com/r/kP8uF5/21
You can use this pattern:
\b\w*?(\w)\1{2}\w*
The \w class and the word-boundary \b limit the search to words. Note that the word boundary can be removed, however, it reduces the number of steps to obtain a match (as the lazy quantifier). Note too, that if you are looking for words (in the common meaning), you need to remove the word boundary and to use [a-zA-Z] instead of \w.
(\w)\1{2} checks if a repeated character is present. A word character is captured in group 1 and must be followed with the content of the capture group (the backreference \1).
I'm using capturing groups in regular expressions for the first time and I'm wondering what my problem is, as I assume that the regex engine looks through the string left-to-right.
I'm trying to convert an UpperCamelCase string into a hyphened-lowercase-string, so for example:
HelloWorldThisIsATest => hello-world-this-is-a-test
My precondition is an alphabetic string, so I don't need to worry about numbers or other characters. Here is what I tried:
mb_strtolower(preg_replace('/([A-Za-z])([A-Z])/', '$1-$2', "HelloWorldThisIsATest"));
The result:
hello-world-this-is-atest
This is almost what I want, except there should be a hyphen between a and test. I've already included A-Z in my first capturing group so I would assume that the engine sees AT and hyphenates that.
What am I doing wrong?
The Reason your Regex will Not Work: Overlapping Matches
Your regex matches sA in IsATest, allowing you to insert a - between the s and the A
In order to insert a - between the A and the T, the regex would have to match AT.
This is impossible because the A is already matched as part of sA. You cannot have overlapping matches in direct regex.
Is all hope lost? No! This is a perfect situation for lookarounds.
Do it in Two Easy Lines
Here's the easy way to do it with regex:
$regex = '~(?<=[a-zA-Z])(?=[A-Z])~';
echo strtolower(preg_replace($regex,"-","HelloWorldThisIsATest"));
See the output at the bottom of the php demo:
Output: hello-world-this-is-a-test
Will add explanation in a moment. :)
The regex doesn't match any characters. Rather, it targets positions in the string: the positions between the change in letter case. To do so, it uses a lookbehind and a lookahead
The (?<=[a-zA-Z]) lookbehind asserts that what precedes the current position is a letter
The (?=[A-Z]) lookahead asserts that what follows the current position is an upper-case letter.
We just replace these positions with a -, and convert the lot to lowercase.
If you look carefully on this regex101 screen, you can see lines between the words, where the regex matches.
Reference
Lookahead and Lookbehind Zero-Length Assertions
Mastering Lookahead and Lookbehind
I've separated the two regular expressions for simplicity:
preg_replace(array('/([a-z])([A-Z])/', '/([A-Z]+)([A-Z])/'), '$1-$2', $string);
It processes the string twice to find:
lowercase -> uppercase boundaries
multiple uppercase letters followed by another uppercase letter
This will have the following behaviour:
ThisIsHTMLTest -> This-Is-HTML-Test
ThisIsATest -> This-Is-A-Test
Alternatively, use a look-ahead assertion (this will effect the reuse of the last capital letter that was used in the previous match):
preg_replace('/([A-Z]+|[a-z]+)(?=[A-Z])/', '$1-', $string);
To fix the interesting use case Jack mentioned in your comments (avoid splitting of abbreviations), I went with zx81's route of using lookahead and lookbehinds.
(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])
You can split it in two for the explanation:
First part
(?<= look behind to see if there is:
[a-z] any character of: 'a' to 'z'
) end of look-behind
(?= look ahead to see if there is:
[A-Z] any character of: 'A' to 'Z'
) end of look-ahead
(TL;DR: Match between strings of the CamelCase Pattern.)
Second part
(?<= look behind to see if there is:
[A-Z] any character of: 'A' to 'Z'
) end of look-behind
(?= look ahead to see if there is:
[A-Z] any character of: 'A' to 'Z'
[a-z] any character of: 'a' to 'z'
) end of look-ahead
(TL;DR: Special case, match between abbreviation and CamelCase pattern)
So your code would then be:
mb_strtolower(preg_replace('/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/', '-', "HelloWorldThisIsATest"));
Demo of matches
Demo of code
I have some text with heading string and set of letters.
I need to get first one-digit number after set of string characters.
Example text:
ABC105001
ABC205001
ABC305001
ABCD105001
ABCD205001
ABCD305001
My RegEx:
^(\D*)(\d{1})(?=\d*$)
Link: http://www.regexr.com/390gv
As you cans see, RegEx works ok, but it captures first groups in results also. I need to get only this integer and when I try to put ?= in first group like this: ^(?=\D*)(\d{1})(?=\d*$) , Regex doesn't work.
Any ideas?
Thanks in advance.
(?=..) is a lookahead that means followed by and checks the string on the right of the current position.
(?<=...) is a lookbehind that means preceded by and checks the string on the left of the current position.
What is interesting with these two features, is the fact that contents matched inside them are not parts of the whole match result. The only problem is that a lookbehind can't match variable length content.
A way to avoid the problem is to use the \K feature that remove all on the left from match result:
^[A-Z]+\K\d(?=\d*$)
You're trying to use a positive lookahead when really you want to use non-capturing groups.
The one match you want will work with this regex:
^(?:\D*\d{1})(\d*)$
The (?: string will start a non-capturing group. This will not come back in matches.
So, if you used preg_match(';^(?:\D*\d{1})(\d*)$;', $string, $matches) to find your match, $matches[1] would be the string for which you're looking. (This is because $matches[0] will always be the full match from preg_match.)
try:
^(?:\D*)(\d{1})(?=\d*$) // (?: is the beginning of a no capture group
I am trying to search for some pattern in PHP with the help of preg_match. Search pattern is like this (but this is wrong):
/[\d\s*-\s*\d\s*(usd|eur)]{1}/i
\d starts with integer,
\s* there can be any number of whitespaces,
- there must be exactly one minus sign
\s* there can be any number of whitespaces,
\d then must be integer
\s* there can be any number of whitespaces,
(usd|eur) any of the following words must be present but one
[\d\s*-\s*\d\s*(usd|eur)]{1} - in string there should be exactly one occurence
the above pattern does not work, what I am doing wrong? For testing:
<?php
$pattern = '/[\d\s*-\s*\d\s*(usd|eur)]{1}/i';
$query = '100-120 100-120';
echo $pattern.'<br/>';
echo $query.'<br/>';
if(preg_match($pattern, $query))
echo 'OK';
else
echo 'not OK!';
?>
Note:
I am trying to pull out data like this:
The price of item is 100 - 120 usd in our market
[...] is a character class. It means "match any one of these characters". [abc] will match a,b, or c. It doesn't match the string "abc".
In addition:
{1} means "match the preceding expression one time". However, matching once is the default. There is no need to explicitly tell it to match one time.
\d matches a single numeric digit. Based on your example, you want \d+ - match a number made up of at least one digit.
Here is what your pattern should look like:
/\d+\s*-\s*\d+\s*(usd|eur)/i
Regular expressions are a powerful tool for examining and modifying text. Regular expressions themselves, with a general pattern notation almost like a mini programming language, allow you to describe and parse text. They enable you to search for patterns within a string, extracting matches flexibly and precisely. However, you should note that because regular expressions are more powerful, they are also slower than the more basic string functions. You should only use regular expressions if you have a particular need.
This tutorial gives a brief overview of basic regular expression syntax and then considers the functions that PHP provides for working with regular expressions.
The Basics
Matching Patterns
Replacing Patterns
Array Processing
PHP supports two different types of regular expressions: POSIX-extended and Perl-Compatible Regular Expressions (PCRE). The PCRE functions are more powerful than the POSIX ones, and faster too, so we will concentrate on them.
The Basics
In a regular expression, most characters match only themselves. For instance, if you search for the regular expression "foo" in the string "John plays football," you get a match because "foo" occurs in that string. Some characters have special meanings in regular expressions. For instance, a dollar sign ($) is used to match strings that end with the given pattern. Similarly, a caret (^) character at the beginning of a regular expression indicates that it must match the beginning of the string. The characters that match themselves are called literals. The characters that have special meanings are called metacharacters.
The dot (.) metacharacter matches any single character except newline (). So, the pattern h.t matches hat, hothit, hut, h7t, etc. The vertical pipe (|) metacharacter is used for alternatives in a regular expression. It behaves much like a logical OR operator and you should use it if you want to construct a pattern that matches more than one set of characters. For instance, the pattern Utah|Idaho|Nevada matches strings that contain "Utah" or "Idaho" or "Nevada". Parentheses give us a way to group sequences. For example, (Nant|b)ucket matches "Nantucket" or "bucket". Using parentheses to group together characters for alternation is called grouping.
If you want to match a literal metacharacter in a pattern, you have to escape it with a backslash.
To specify a set of acceptable characters in your pattern, you can either build a character class yourself or use a predefined one. A character class lets you represent a bunch of characters as a single item in a regular expression. You can build your own character class by enclosing the acceptable characters in square brackets. A character class matches any one of the characters in the class. For example a character class [abc] matches a, b or c. To define a range of characters, just put the first and last characters in, separated by hyphen. For example, to match all alphanumeric characters: [a-zA-Z0-9]. You can also create a negated character class, which matches any character that is not in the class. To create a negated character class, begin the character class with ^: [^0-9].
The metacharacters +, *, ?, and {} affect the number of times a pattern should be matched. + means "Match one or more of the preceding expression", * means "Match zero or more of the preceding expression", and ? means "Match zero or one of the preceding expression". Curly braces {} can be used differently. With a single integer, {n} means "match exactly n occurrences of the preceding expression", with one integer and a comma, {n,} means "match n or more occurrences of the preceding expression", and with two comma-separated integers {n,m} means "match the previous character if it occurs at least n times, but no more than m times".
Now, have a look at the examples:
Regular Expression Will match...
foo The string "foo"
^foo "foo" at the start of a string
foo$ "foo" at the end of a string
^foo$ "foo" when it is alone on a string
[abc] a, b, or c
[a-z] Any lowercase letter
[^A-Z] Any character that is not a uppercase letter
(gif|jpg) Matches either "gif" or "jpeg"
[a-z]+ One or more lowercase letters
[0-9\.\-] Аny number, dot, or minus sign
^[a-zA-Z0-9_]{1,}$ Any word of at least one letter, number or _
([wx])([yz]) wy, wz, xy, or xz
[^A-Za-z0-9] Any symbol (not a number or a letter)
([A-Z]{3}|[0-9]{4}) Matches three letters or four numbers
Perl-Compatible Regular Expressions emulate the Perl syntax for patterns, which means that each pattern must be enclosed in a pair of delimiters. Usually, the slash (/) character is used. For instance, /pattern/.
The PCRE functions can be divided in several classes: matching, replacing, splitting and filtering.
Matching Patterns
The preg_match() function performs Perl-style pattern matching on a string. preg_match() takes two basic and three optional parameters. These parameters are, in order, a regular expression string, a source string, an array variable which stores matches, a flag argument and an offset parameter that can be used to specify the alternate place from which to start the search:
preg_match ( pattern, subject [, matches [, flags [, offset]]])
The preg_match() function returns 1 if a match is found and 0 otherwise. Let's search the string "Hello World!" for the letters "ll":
<?php
if (preg_match("/ell/", "Hello World!", $matches)) {
echo "Match was found <br />";
echo $matches[0];
}
?>
The letters "ll" exist in "Hello", so preg_match() returns 1 and the first element of the $matches variable is filled with the string that matched the pattern. The regular expression in the next example is looking for the letters "ell", but looking for them with following characters:
<?php
if (preg_match("/ll.*/", "The History of Halloween", $matches)) {
echo "Match was found <br />";
echo $matches[0];
}
?>
Now let's consider more complicated example. The most popular use of regular expressions is validation. The example below checks if the password is "strong", i.e. the password must be at least 8 characters and must contain at least one lower case letter, one upper case letter and one digit:
<?php
$password = "Fyfjk34sdfjfsjq7";
if (preg_match("/^.*(?=.{8,})(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*$/", $password)) {
echo "Your passwords is strong.";
} else {
echo "Your password is weak.";
}
?>
The ^ and $ are looking for something at the start and the end of the string. The ".*" combination is used at both the start and the end. As mentioned above, the .(dot) metacharacter means any alphanumeric character, and * metacharacter means "zero or more". Between are groupings in parentheses. The "?=" combination means "the next text must be like this". This construct doesn't capture the text. In this example, instead of specifying the order that things should appear, it's saying that it must appear but we're not worried about the order.
The first grouping is (?=.{8,}). This checks if there are at least 8 characters in the string. The next grouping (?=.[0-9]) means "any alphanumeric character can happen zero or more times, then any digit can happen". So this checks if there is at least one number in the string. But since the string isn't captured, that one digit can appear anywhere in the string. The next groupings (?=.[a-z]) and (?=.[A-Z]) are looking for the lower case and upper case letter accordingly anywhere in the string.
Finally, we will consider regular expression that validates an email address:
<?php
$email = firstname.lastname#aaa.bbb.com;
$regexp = "/^[^0-9][A-z0-9_]+([.][A-z0-9_]+)*[#][A-z0-9_]+([.][A-z0-9_]+)*[.][A-z]{2,4}$/";
if (preg_match($regexp, $email)) {
echo "Email address is valid.";
} else {
echo "Email address is <u>not</u> valid.";
}
?>
This regular expression checks for the number at the beginning and also checks for multiple periods in the user name and domain name in the email address. Let's try to investigate this regular expression yourself.
For the speed reasons, the preg_match() function matches only the first pattern it finds in a string. This means it is very quick to check whether a pattern exists in a string. An alternative function, preg_match_all(), matches a pattern against a string as many times as the pattern allows, and returns the number of times it matched.
Replacing Patterns
In the above examples, we have searched for patterns in a string, leaving the search string untouched. The preg_replace() function looks for substrings that match a pattern and then replaces them with new text. preg_replace() takes three basic parameters and an additional one. These parameters are, in order, a regular expression, the text with which to replace a found pattern, the string to modify, and the last optional argument which specifies how many matches will be replaced.
preg_replace( pattern, replacement, subject [, limit ])
The function returns the changed string if a match was found or an unchanged copy of the original string otherwise. In the following example we search for the copyright phrase and replace the year with the current.
<?php
echo preg_replace("/([Cc]opyright) 200(3|4|5|6)/", "$1 2007", "Copyright 2005");
?>
In the above example we use back references in the replacement string. Back references make it possible for you to use part of a matched pattern in the replacement string. To use this feature, you should use parentheses to wrap any elements of your regular expression that you might want to use. You can refer to the text matched by subpattern with a dollar sign ($) and the number of the subpattern. For instance, if you are using subpatterns, $0 is set to the whole match, then $1, $2, and so on are set to the individual matches for each subpattern.
In the following example we will change the date format from "yyyy-mm-dd" to "mm/dd/yyy":
<?php
echo preg_replace("/(\d+)-(\d+)-(\d+)/", "$2/$3/$1", "2007-01-25");
?>
We also can pass an array of strings as subject to make the substitution on all of them. To perform multiple substitutions on the same string or array of strings with one call to preg_replace(), we should pass arrays of patterns and replacements. Have a look at the example:
<?php
$search = array ( "/(\w{6}\s\(w{2})\s(\w+)/e",
"/(\d{4})-(\d{2})-(\d{2})\s(\d{2}:\d{2}:\d{2})/");
$replace = array ('"$1 ".strtoupper("$2")',
"$3/$2/$1 $4");
$string = "Posted by John | 2007-02-15 02:43:41";
echo preg_replace($search, $replace, $string);?>
In the above example we use the other interesting functionality - you can say to PHP that the match text should be executed as PHP code once the replacement has taken place. Since we have appended an "e" to the end of the regular expression, PHP will execute the replacement it makes. That is, it will take strtoupper(name) and replace it with the result of the strtoupper() function, which is NAME.
Array Processing
PHP's preg_split() function enables you to break a string apart basing on something more complicated than a literal sequence of characters. When it's necessary to split a string with a dynamic expression rather than a fixed one, this function comes to the rescue. The basic idea is the same as preg_match_all() except that, instead of returning matched pieces of the subject string, it returns an array of pieces that didn't match the specified pattern. The following example uses a regular expression to split the string by any number of commas or space characters:
<?php
$keywords = preg_split("/[\s,]+/", "php, regular expressions");
print_r( $keywords );
?>
Another useful PHP function is the preg_grep() function which returns those elements of an array that match a given pattern. This function traverses the input array, testing all elements against the supplied pattern. If a match is found, the matching element is returned as part of the array containing all matches. The following example searches through an array and all the names starting with letters A-J:
<?php
$names = array('Andrew','John','Peter','Nastin','Bill');
$output = preg_grep('/^[a-m]/i', $names);
print_r( $output );
?>