Need regexp with condition - php

I need to check a string. It should contain one substring AND not contain another substring. This has to work with a regexp only.
Example: we accept a string that contains the substring 'fruit', but reject a string that contains the substring 'love':
We all love green fruit. -> don't match
We all love to walk. -> don't match
We all green fruit. -> match
A single regex, if possible.
/(?<!love).+fruit/ doesn't work

I think this will work
^(?!.*\blove\b)(?=.*\bfruit\b).*
 <------------><------------->
 Don't match   Match this
 this word     word
Regex Demo
NOTE: You can remove \b if you want to match substrings rather than whole words.
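To illustrate what the \b changes, here is a small sketch comparing the two variants. The sample sentence is invented for the comparison: it contains "love" only inside the longer word "gloves".

```php
<?php
$withBoundaries    = '/^(?!.*\blove\b)(?=.*\bfruit\b).*/';
$withoutBoundaries = '/^(?!.*love)(?=.*fruit).*/';

// Invented sample: "gloves" contains "love" only as part of a longer word.
$s = 'I need gloves to pick fruit.';

$asWord      = preg_match($withBoundaries, $s);    // 1: "love" is not a separate word here
$asSubstring = preg_match($withoutBoundaries, $s); // 0: "love" occurs as a substring

echo $asWord, ' ', $asSubstring, PHP_EOL;
```

Which variant is right depends on whether "gloves" should count as containing "love".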

Surely, you can achieve what you want with strpos, but you specified you only need a regex solution. Note that this is not the best approach for this task unless you need to check for the substrings in a specific context (like within word boundaries, or after or before specific symbols, etc.)
The (?<!love).+fruit regex matches 1 or more characters up to the fruit substring, as long as the position where the match starts is not immediately preceded by love. It will match I love fruit because the lookbehind is checked only at the start position: it succeeds at the beginning of the string, then .+ grabs the whole string, then backtracking does its job to get fruit.
In fact, you only need 1 lookahead to check if there is no love anchored at the start of the string:
^(?!.*love).*fruit
^^^^^^^^^^
See the regex demo
You only check for the substring love with (?!.*love) at the beginning of the string (due to ^), and then, if it is missing, the regex goes on matching any characters (other than a newline if the /s modifier is not used) up to the last fruit.
Here is a PHP demo:
$re = "/^(?!.*love).*fruit/";
if (preg_match($re, "We all love green fruit."))
{
echo "Matched!"; // Won't be displayed since there is no match
}
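As a quick check, the three sample strings from the question can be run through the same pattern. This is only a sketch; the helper name containsFruitButNotLove is made up for illustration.

```php
<?php
// Hypothetical helper: true when $s contains "fruit" and no "love" anywhere before a newline.
function containsFruitButNotLove(string $s): bool
{
    // ^(?!.*love) rejects the string if "love" occurs anywhere after the start;
    // .*fruit then requires "fruit" somewhere on the same line.
    return preg_match('/^(?!.*love).*fruit/', $s) === 1;
}

$samples = [
    'We all love green fruit.',
    'We all love to walk.',
    'We all green fruit.',
];

foreach ($samples as $s) {
    echo $s, ' -> ', containsFruitButNotLove($s) ? 'match' : "don't match", PHP_EOL;
}
```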

Related

PHP/Laravel trim all but last word in a namespace

I'm trying to trim a fully qualified namespace so that only the last word is left. An example namespace is App\Models\FruitTypes\Apple, where that final word could be any number of fruit types. Shouldn't this...
$fruitName = 'App\Models\FruitTypes\Apple';
trim($fruitName, "App\\Models\\FruitTypes\\");
...do the trick? It is returning an empty string. If I try to trim just App\\Models\\ it returns FruitTypes\Apples as expected. I know the backslash is an escape character, but doubling should treat those as actual backslashes.
If you want to use native functionality for this rather than string manipulation, then ReflectionClass::getShortName will do the job:
$reflection = new ReflectionClass('App\\Models\\FruitTypes\\Apple');
echo $reflection->getShortName();
Apple
See https://3v4l.org/eVl9v
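As for why trim() returns an empty string: its second argument is a list of characters to strip from both ends, not a prefix. The sketch below shows that, plus a plain string alternative for when the class cannot be loaded (ReflectionClass throws if the class does not exist). The helper lastNamespaceSegment is my own name, not something from the answers here.

```php
<?php
$fruitName = 'App\Models\FruitTypes\Apple';

// trim() strips any of the listed characters from both ends, one at a time.
// Every character of $fruitName also appears in the mask, so nothing survives.
$trimmed = trim($fruitName, "App\\Models\\FruitTypes\\");
var_dump($trimmed); // string(0) ""

// Hypothetical helper: take everything after the last backslash.
function lastNamespaceSegment(string $fqcn): string
{
    $pos = strrpos($fqcn, '\\');
    return $pos === false ? $fqcn : substr($fqcn, $pos + 1);
}

echo lastNamespaceSegment($fruitName), PHP_EOL; // Apple
```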
preg_match() with the regex pattern \\([[:alpha:]]*)$ should do the trick. Inside a PHP string literal each of those two backslashes has to be escaped again, and the match is returned through the third argument rather than the return value:
preg_match('/\\\\([[:alpha:]]*)$/', $fruitName, $trimmed);
Your result will then live in $trimmed[1]. If you don't mind the pattern being a bit less explicit, you could do:
preg_match('/([[:alpha:]]*)$/', $fruitName, $trimmed);
And your result would then be in $trimmed[0].
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
preg_match - php.net
(matches is the third parameter that I named $trimmed, see documentation for full explanation)
An explanation for the regex pattern
\\ matches the character \ literally to establish the start of the match.
The parentheses () create a capturing group to return the match or a substring of the match.
In the capturing group ([[:alpha:]]*):
[:alpha:] matches an alphabetic character [a-zA-Z]
The * quantifier means match between zero and unlimited times, as many times as possible
Then $ asserts position at the end of the string.
So basically, "Find the last \ then return all letters between it and the end of the string".
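Put together, a minimal sketch of the capture-group approach could look like this. Note the four backslashes in the PHP literal: PHP turns them into two, and the regex engine reads those two as one literal backslash.

```php
<?php
$fruitName = 'App\Models\FruitTypes\Apple';

// Four backslashes in the PHP literal become two in the regex,
// which PCRE reads as one literal backslash.
$pattern = '/\\\\([[:alpha:]]*)$/';

if (preg_match($pattern, $fruitName, $trimmed) === 1) {
    echo $trimmed[0], PHP_EOL; // full match, including the backslash: \Apple
    echo $trimmed[1], PHP_EOL; // first capture group: Apple
}
```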

NOT words in Regex Pattern

I am trying to grab the text after the first hyphen in a pattern
<title>.*?-(.*?)(-|<\/title>)
which then grabs DesiredText from the pattern below:
<title>Stuff - DesiredText - Other Stuff</title>
However in this pattern:
<title>Stuff - Unwanted - DesiredText - Otherstuff</title>
I want it to skip the 'Unwanted' text and match the text after the next hyphen instead (DesiredText). I made a regex101 with both patterns and need to modify my basic regex so that if a word or words I don't want to match are present in that capture group it then matches the second hyphen text instead:
https://regex101.com/r/veSqH3/1
I believe this is what you are looking for. The key is in using the caret (^) character within the square-bracket character list ([]). Using the caret and brackets together indicates a blacklist: the class will only match characters that are NOT in the list.
https://regex101.com/r/alAZhj/3
Pattern: <title>.*?-\s*([^-\s]*)\s*- End<\/title>
This matches anything in between the middle hyphens that is not a hyphen or space. You can of course modify the pattern to include such characters by using the following pattern.
Pattern: <title>.*?-\s*([^-]*)\s*- End<\/title>
This will match anything in between the middle hyphens that is not a hyphen, so that you can have less restricted text in there.
This will use a negative lookahead to disqualify Note. There may be ways to optimize the pattern, but I cannot do so with confidence because I don't know how variable your input strings are.
Pattern: /<title>.*?- (?P<title>(?!Note).*?)(?= -|<])/
Demo
I am using a positive lookahead to ensure the captured match doesn't have any unwanted trailing characters.
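Here is a rough sketch of the same negative-lookahead idea applied to the two titles from the question. It is a simplified variant of the pattern, not the exact one above: the helper name firstWantedSegment is hypothetical, and it assumes segments are separated by " - " and contain no hyphens themselves.

```php
<?php
// Hypothetical helper: first " - "-delimited segment that does not start with $unwanted.
function firstWantedSegment(string $html, string $unwanted): ?string
{
    $skip = preg_quote($unwanted, '~');
    // (?!...) rejects a segment beginning with the unwanted word, so the lazy
    // .*? before it is forced to extend to the next "- ".
    $pattern = '~<title>.*?- ((?!' . $skip . ')[^-]*?)(?= -|</title>)~';
    return preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
}

echo firstWantedSegment('<title>Stuff - DesiredText - Other Stuff</title>', 'Unwanted'), PHP_EOL;
echo firstWantedSegment('<title>Stuff - Unwanted - DesiredText - Otherstuff</title>', 'Unwanted'), PHP_EOL;
```

Both calls print DesiredText.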
If you just want the second last delimited value, you could do something like this to return the value as the fullstring match:
~- \K[^-]*(?= - [^-]*?</title>)~
Or faster with a capture group:
~- ([^-]*) - [^-]*?</title>~
This assumes there are no hyphens in the value.
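For comparison, a short sketch running both variants against the second title from the question. With \K the value is the full match ($viaK[0]); with the capture group it is group 1 ($viaGroup[1]). The variable names are just for this example.

```php
<?php
$html = '<title>Stuff - Unwanted - DesiredText - Otherstuff</title>';

// \K drops everything matched so far, so the full match is just the value.
preg_match('~- \K[^-]*(?= - [^-]*?</title>)~', $html, $viaK);

// Same idea with a capture group instead of \K and a lookahead.
preg_match('~- ([^-]*) - [^-]*?</title>~', $html, $viaGroup);

echo $viaK[0], PHP_EOL;     // DesiredText
echo $viaGroup[1], PHP_EOL; // DesiredText
```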
I took a different approach and focused on returning the capture prior to the last word, rather than any sort of negation. In this way it's highly generic.
This pattern will match what you want in the capture group:
\s-\s([a-zA-Z]+)\s-\s[a-zA-Z]+<\/title>
If you are concerned that this only match between title tags, then you can add:
<title>.*?\s-\s([a-zA-Z]+)\s-\s[a-zA-Z]+<\/title>
Here's a link to the Test
The only limitation I see is that it matches single words separated by whitespace and hyphens, so if your desired match is "- Some phrase -" then this won't work, but that was not indicated in your example. It's a bit unclear because you used "Other Stuff" and then "Otherstuff".

Php regex that matches substring followed by any length of character and then comma

I have a long string containing Copyright: 'any length of unknown string here',
what regex should I write to exactly match this as substring in a string?
I tried this preg_replace('/Copyright:(.*?)/', 'mytext', $str); but it's not working, it only matches the Copyright:
A lazily quantified pattern at the end of the pattern will always match no text in case of *? and 1 char only in case of +?, i.e. will match as few chars as possible to return a valid match.
You need to make sure the match extends to the closing ', by putting those characters into the pattern:
'/Copyright:.*?\',/'
               ^^^
See the regex demo
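A small sketch of the difference. The input string is invented for illustration, since the question does not show the real one.

```php
<?php
// Sample input is invented for illustration.
$str = "Name: 'Demo', Copyright: 'Acme Corp 2015', Version: '1.0',";

// A lazy .*? at the very end of the pattern matches nothing,
// so only "Copyright:" itself is replaced.
$lazyOnly = preg_replace('/Copyright:(.*?)/', 'mytext', $str);

// With the closing ', in the pattern, .*? has to expand up to it.
$upToComma = preg_replace('/Copyright:.*?\',/', 'mytext', $str);

echo $lazyOnly, PHP_EOL;  // Name: 'Demo', mytext 'Acme Corp 2015', Version: '1.0',
echo $upToComma, PHP_EOL; // Name: 'Demo', mytext Version: '1.0',
```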
The ? in your group 1 (.*?) makes this block lazy, i.e. matching as few characters as possible. Removing that would solve it.
Copyright:(.*)',
However, that would match everything up to the last ', on that same line. If you have more text on that line, make sure to limit it further. The parentheses () are only there for grouping, to make the match easier to inspect; you can do without them.
I usually use RegExr.com to test my regular expressions. There are many other similar tools online; note that this one has a great UX, but does not support lookbehind.

Fall back to beginning of string in RegEx

Is it possible to have a RegEx fall back to the beginning of the string and begin matching again?
Here's why I ask. Given the below string, I'd like to capture the sub strings black, red, blue, and green in that order, regardless of the order of occurrence in the subject string and only if all substrings are present in the subject string.
$str ='blue-ka93-red-kdke3-green-weifk-black'
So, for all of the below strings, the RegEx should capture black, red, blue, and green (in that order)
'blue-ka93-red-kdke3-green-weifk-black'
'green-ka93-red-kdke3-blue-weifk-black'
'blue-ka93-black-kdke3-green-weifk-red'
'green-ka93-black-kdke3-blue-weifk-red'
I wonder if there isn't a way to match a capture group then fall back to the start of the string and find the next capture group. I was hoping that something like ^.*(?=(black))^.*(?=(red))^.*(?=(blue))^.*(?=(green)) would work but of course the ^ and lookaheads do not behave this way.
Is it possible to construct such a RegEx?
For context, I'll be using the RegEx in PHP.
You can use
^(?=.*(black))(?=.*(red))(?=.*(blue))(?=.*(green))
Note: This will require all these keywords to be in the string.
See demo
There is no way to reset the RegEx index when matching, so you can only use the capturing mechanism inside positive lookaheads anchored at the start. The lookaheads match at an empty location at the start of the string (due to ^), and each of those lookaheads in the RegEx above will be executed one after another if the previous one returned true (found a string of text meeting its pattern).
Your RegEx did not work the same way because you matched and consumed the text with .* (this subpattern was outside the lookaheads) and then repeated the start-of-string anchor, which automatically fails a RegEx if you do not use a multiline modifier.
Why not just use capture groups for maintaining the order?
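In PHP, the anchored-lookahead approach could be used roughly like this. The helper name captureColors is hypothetical.

```php
<?php
// Hypothetical helper: returns the four colors in fixed order, or null if any is missing.
function captureColors(string $s): ?array
{
    $re = '/^(?=.*(black))(?=.*(red))(?=.*(blue))(?=.*(green))/';
    if (preg_match($re, $s, $m) !== 1) {
        return null;
    }
    // $m[0] is the (empty) overall match; groups 1..4 hold the colors.
    return array_slice($m, 1);
}

print_r(captureColors('blue-ka93-red-kdke3-green-weifk-black'));
```

The overall match is empty because lookaheads consume nothing; only the capture groups carry data.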
^(?:(black)|(red)|(blue)|(green)|.)+$
This will match any string, all colors are optional.
See demo at regex101 or php demo at eval.in
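A sketch of how the alternation variant behaves in PHP, including the case where a color is missing. The helper name captureOptionalColors is made up for this example.

```php
<?php
// Hypothetical helper: colors found, in the fixed order black, red, blue, green.
// A color that never occurs comes back as an empty string.
function captureOptionalColors(string $s): array
{
    preg_match('/^(?:(black)|(red)|(blue)|(green)|.)+$/', $s, $m);
    $found = [];
    for ($i = 1; $i <= 4; $i++) {
        // PHP omits trailing unmatched groups from $m, hence the ?? fallback.
        $found[] = $m[$i] ?? '';
    }
    return $found;
}

print_r(captureOptionalColors('green-ka93-black-kdke3-blue-weifk-red'));
```

Unlike the lookahead version, this one still matches when a color is absent, so the caller has to check for empty entries.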

REGEX - match words that contain letters repeating next to each other

I'm looking for a regex that matches words in which a letter (or letters) is repeated more than once in a row.
Here's an example:
This is an exxxmaple oooonnnnllllyyyyy!
So far I haven't found anything that can exactly match:
exxxmaple and oooonnnnllllyyyyy
I need to find it and place them in an array, like this:
preg_match_all('/\b(???)\b/', $str, $arr);
Can somebody explain what regexp I have to use?
You can use a very simple regex like
\S*(\w)(?=\1+)\S*
See how the regex matches at http://regex101.com/r/rF3pR7/3
\S matches anything other than whitespace
* quantifier, zero or more occurrences of \S
(\w) matches a single word character, captured in \1
(?=\1+) positive lookahead. Asserts that the captured character is followed by itself (\1)
+ quantifier, one or more occurrences of the repeated character
\S* matches anything other than whitespace
EDIT
If the repeating must be more than once, a slight modification of the regex would do the trick
\S*(\w)(?=\1{2,})\S*
for example http://regex101.com/r/rF3pR7/5
Use this if you want to discard words like apple, etc.
\b\w*(\w)(?=\1\1+)\w*\b
or
\b(?=[^\s]*(\w)\1\1+)\w+\b
Try this. See the demos:
http://regex101.com/r/kP8uF5/20
http://regex101.com/r/kP8uF5/21
You can use this pattern:
\b\w*?(\w)\1{2}\w*
The \w class and the word boundary \b limit the search to words. Note that the word boundary can be removed; however, it reduces the number of steps needed to obtain a match (as does the lazy quantifier). Note too that if you are looking for words (in the common meaning), you need to remove the word boundary and use [a-zA-Z] instead of \w.
(\w)\1{2} checks if a repeated character is present. A word character is captured in group 1 and must be followed with the content of the capture group (the backreference \1).
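To get the words into an array as asked, the pattern can be passed to preg_match_all. A minimal sketch, with a made-up helper name:

```php
<?php
// Hypothetical helper: words containing the same word character three or more times in a row.
function wordsWithTripledLetters(string $text): array
{
    // $m[0] holds the full matches, i.e. the whole words.
    preg_match_all('/\b\w*?(\w)\1{2}\w*/', $text, $m);
    return $m[0];
}

print_r(wordsWithTripledLetters('This is an exxxmaple oooonnnnllllyyyyy!'));
```

This returns exxxmaple and oooonnnnllllyyyyy for the sample sentence, while a word like apple, with only a doubled letter, is skipped.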
