Regex to only match two asterisks - php

I'm sorry if the question is unclear.
I am trying to make a regular expression that replaces every string that starts and ends with ** with "Test" (for now, at least).
Currently this is my pattern:
\*{2}[\w\s]+\*{2}
This works, so strings like **Car**, **123** and **This is a test** get replaced with "Test", but ***Bird***, for example, also becomes *Test*.
So my question is whether there is a way to make sure strings only get replaced with "Test" when there are exactly two * at the beginning and the end, no more (so ***Bird*** stays ***Bird*** and doesn't get replaced).

In my opinion, you can use a permissive regex that matches the *-chars-* pattern without caring how many * there are before and after.
Then use preg_replace_callback to inspect the captured groups and return Test only if exactly two * appear before and after. This way, your code is much more readable and simple.
Snippet:
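<?php
$text = '**Car** and ***Bird***'; // sample input, for illustration only
$newText = preg_replace_callback(
    '/([*]+)[^*]+([*]+)/',
    function ($matches) {
        // Replace only when exactly two stars sit on each side; otherwise keep the original text.
        return strlen($matches[1]) == 2 && strlen($matches[2]) == 2 ? 'Test' : $matches[0];
    },
    $text
);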
If you want to keep the text between the ** as is and make it bold, capture it in a group and wrap it in bold tags.
Snippet:
<?php
$newText = preg_replace_callback(
'/([*]+)([^*]+)([*]+)/',
function ($matches) {
return strlen($matches[1]) == 2 && strlen($matches[3]) == 2 ? '<b>' . $matches[2] . '</b>' : $matches[0];
},
$text
);

You can do it with a handful of zero-length assertions. This is the regex that I suggest:
(?<!\*)\*{2}(?!\*).*?(?<!\*)\*{2}(?!\*)
Explanation:
(?<!\*) - A negative lookbehind: the match must not be preceded by a star character. It can be preceded by any other character, or by the start of the line. For the record, ^ is a well-known zero-length assertion.
\*{2} - matches two stars
(?!\*) - A negative lookahead: the next character must not be a star. Because this is a zero-length assertion, that next character is not consumed.
.*? - everything else. The ? makes the * lazy (non-greedy), so when a line contains several **...** pairs, each pair is matched separately instead of one match running from the first ** to the last. You can also put this part in a group if you want to do something with it later.
(?<!\*) - A negative lookbehind, another zero-length assertion: the character just before the closing stars must not be a star.
\*{2} - two stars, to close the match
(?!\*) - A negative lookahead: the match must not be followed by a star. It can be followed by any other character, or by the end of the line. By the way, $ is a well-known zero-length assertion.
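A minimal sketch of this pattern in use with preg_replace, assuming the goal from the question (replace each span wrapped in exactly two stars with "Test"). The sample strings are made up for illustration:

```php
<?php
// Exactly two stars on each side: the lookarounds reject a neighbouring third star.
$pattern = '/(?<!\*)\*{2}(?!\*).*?(?<!\*)\*{2}(?!\*)/';

$text = '**Car** and ***Bird*** and **This is a test**';
$newText = preg_replace($pattern, 'Test', $text);
echo $newText, PHP_EOL; // Test and ***Bird*** and Test

// The lazy .*? keeps two pairs on the same line as two separate matches.
$twoPairs = preg_replace($pattern, 'Test', '**a** b **c**');
echo $twoPairs, PHP_EOL; // Test b Test
```

Because . does not match line breaks here, a span is never matched across lines; the s flag would change that.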

Related

How to capture all phrases that don't have a pattern in the middle of themselves?

I want to capture all strings that don't have the pattern a[a-z]* in the middle segment (between the two hyphens), as in the example below:
<?php
$myStrings = array(
"123-456",
"123-7-456",
"123-Apple-456",
"123-0-456",
"123-Alphabet-456"
);
foreach($myStrings as $myStr){
var_dump(
preg_match("/123-(?!a[a-z]*)-456/i", $myStr)
);
}
?>
You can use the following pattern:
^(?:(123-(?:(?![aA][a-zA-Z]*).*)-456)|(123-456))$
It uses a regex non-capturing group (?:) and a regex negative lookahead (?!) to find all inner sections that do not start with 'a' (or 'A') followed by any letters. The case with no inner section (123-456) is added (with the | sign) as a second alternative. Both alternatives are wrapped in a non-capturing group so that ^ and $ apply to each of them; without it, ^ would only anchor the first alternative and $ only the second.
A lookahead is a zero-length assertion, so the middle part still has to be consumed before 456 can match. To consume it, use e.g. \w+- (one or more word characters and a hyphen) inside an optional group that starts with your lookahead condition, and use the i flag for case-insensitive matching.
To filter an array, preg_grep can be used:
preg_grep('~^123-(?:(?!a[a-z]*-)\w+-)?456$~i', $myStrings);
There is also an invert option: PREG_GREP_INVERT. If you don't need to check the start and end, a simpler pattern like -a[a-z]*- without a lookahead can be used with it.
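A short sketch of both preg_grep variants on the question's $myStrings array, using only the two patterns mentioned above; array_values() is there just to reindex the result:

```php
<?php
$myStrings = array(
    "123-456",
    "123-7-456",
    "123-Apple-456",
    "123-0-456",
    "123-Alphabet-456",
);

// Anchored pattern: the optional middle segment must not start with "a"/"A".
$kept = preg_grep('~^123-(?:(?!a[a-z]*-)\w+-)?456$~i', $myStrings);

// Simpler alternative: drop every string containing a hyphen-delimited "a..." segment.
$keptInvert = preg_grep('~-a[a-z]*-~i', $myStrings, PREG_GREP_INVERT);

// preg_grep keeps the original keys (0, 1, 3 here); array_values() renumbers them.
print_r(array_values($kept));
print_r(array_values($keptInvert));
```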
Match the unwanted pattern and negate the result (keeping it between hyphens, so that an "a" elsewhere in the string does not count):
!preg_match('/-a[a-z]*-/i', $yourStr);
Don't try to do everything with a regex when programming languages exist to do the job.
You are not getting a match because the lookahead assertion (?!a[a-z]*) in the pattern 123-(?!a[a-z]*)-456 does not consume any characters: after matching the first -, the pattern has to directly match another hyphen, as if it were 123--456.
If you move the last hyphen inside the lookahead, as in 123-(?!a[a-z]*-)456, you only get one match, for 123-456, because you are still not matching the middle part of the string.
Another option in PHP is to consume the part that you don't want and then use (*SKIP)(*F) ("SKIP FAIL"):
^123-(?:a[a-z]*-(*SKIP)(*F)|\w+-)?456$
Explanation
^ Start of string
123- Match literally
(?: Non capture group for the alternation
a[a-z]*-(*SKIP)(*F) Match a, then zero or more chars a-z, then -, and make the match fail at this point without retrying the other alternative
| Or
\w+- Match 1+ word chars followed by -
)? Close the non capture group and make it optional to also match when there is no middle part
456 Match literally
$ End of string
Example
$myStrings = array(
"123-456",
"123-7-456",
"123-Apple-456",
"123-0-456",
"123-Alphabet-456",
"123-b-456"
);
foreach($myStrings as $myStr) {
if (preg_match("/^123-(?:a[a-z]*-(*SKIP)(*F)|\w+-)?456$/i", $myStr, $match)) {
echo "Match for $match[0]" . PHP_EOL;
} else {
echo "No match for $myStr" . PHP_EOL;
}
}
Output
Match for 123-456
Match for 123-7-456
No match for 123-Apple-456
Match for 123-0-456
No match for 123-Alphabet-456
Match for 123-b-456

Sanitize phone number: regular expression to match all except the first occurrence if it is in the first position

Regarding this post "https://stackoverflow.com/questions/35413960/regular-expression-match-all-except-first-occurence", I'm wondering how to find the first occurrence in a string in PHP only if the string starts with a specific character.
I would like to sanitize phone numbers. Example of a bad phone number:
+49+12423#23492#aosd#+dasd
Regex to remove all "+" except the first occurrence:
\G(?:\A[^\+]*\+)?+[^\+]*\K\+
Problem: it should remove the other "+" characters only if the string starts with "+", not when the first occurrence is at a later position.
The regex to remove everything except numbers is easy:
[^0-9]*
But I don't know how to combine those two in one regex. Otherwise, I would just use preg_replace() twice.
Of course, I could use a workaround like if ($str[0] === '+') {...}, but I'd prefer to learn something new (regex :)
Thanks for helping.
You can use
(?:\G(?!\A)|^\+)[^+]*\K\+
Details:
(?:\G(?!\A)|^\+) - either the end of the preceding successful match or a + at the start of string
[^+]* - zero or more chars other than +
\K - match reset operator discarding the text matched so far
\+ - a + char.
See the PHP demo:
$re = '/(?:\G(?!\A)|^\+)[^+]*\K\+/m';
$str = '+49+12423#23492#aosd#+dasd';
echo preg_replace($re, '', $str);
// => +4912423#23492#aosd#dasd
You seem to want to combine the two queries:
A regex to remove everything except numbers
A regex to remove all "+" except first occurence
Here is my two cents:
(?:^\+|\d)(*SKIP)(*F)|.
Replace what is matched with nothing.
(?:^\+|\d) - A non-capture group to match a starting literal plus or any digit in the range from 0-9.
(*SKIP)(*F) - Consume the previously matched characters and force a failure, so they are excluded from the result and matching resumes after them.
| - Or:
. - Any single character other than newline.
I'd like to think that this is a slight adaptation of what some consider "The best regex trick ever" where one would first try to match what you don't want, then use an alternation to match what you do want. With the use of the backtracking control verbs (*SKIP)(*F) we reverse the logic. We first match what we do want, exclude it from the results and then match what we don't want.
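A minimal PHP sketch of this replacement on the question's sample number; the second input, without a leading "+", is an illustrative extra case:

```php
<?php
// Protect a leading "+" and every digit with (*SKIP)(*F); any other char is matched by "." and removed.
$re = '/(?:^\+|\d)(*SKIP)(*F)|./';

$clean = preg_replace($re, '', '+49+12423#23492#aosd#+dasd');
echo $clean, PHP_EOL; // +491242323492

// Without a leading "+", every "+" is removed.
$noPlus = preg_replace($re, '', '49+123');
echo $noPlus, PHP_EOL; // 49123
```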

Why doesn't my regex with lookaheads match?

I need (in PHP) to split a sentence by a word that cannot be the first or the last one in the sentence. Say the word is "pression"; here is my regex:
/^.+?(?=[\s\.\,\:\;])pression(?=[\s\.\,\:\;]).+$/i
Live here: https://regex101.com/r/CHAhKj/1/
First, it doesn't match.
Next, is it possible at all to split that way? I tried a simplified example:
print_r(preg_split('/^.+pizza.+$/', 'my pizza is cool'));
Live here: http://sandbox.onlinephpfunctions.com/code/10b674900fc1ef44ec79bfaf80e83fe1f4248d02
and it prints an array of 2 empty strings, whereas I expect
['my ', ' is cool']
I need (in PHP) to split a sentence by the word that cannot be the first or the last one in the sentence
You may use this regex:
(?<=[^\s.?]\h)pression(?=\h[^\s.?])
RegEx Details:
(?<=[^\s.?]\h): Lookbehind to assert that before the current position there is a character that is not whitespace, a dot or a ?, followed by a horizontal space.
pression: Match the word pression
(?=\h[^\s.?]): Lookahead to assert that after the current position there is a horizontal space followed by a character that is not whitespace, a dot or a ?.
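Because the lookarounds consume nothing, this pattern can be passed straight to preg_split, and only the word itself is removed. A sketch with the pizza example from the question (the other two inputs are illustrative):

```php
<?php
// Split on "pizza" only when it is neither the first nor the last word.
$re = '/(?<=[^\s.?]\h)pizza(?=\h[^\s.?])/';

$middle = preg_split($re, 'my pizza is cool'); // ['my ', ' is cool']
$first  = preg_split($re, 'pizza is cool');    // ['pizza is cool'], no split
$last   = preg_split($re, 'I like pizza');     // ['I like pizza'], no split

print_r($middle);
```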
First, ^.+?(?=[\s\.\,\:\;])pression(?=[\s\.\,\:\;]).+$ can't match any string at all, because the (?=[\s\.\,\:\;])p part requires p to also be a whitespace char or one of the characters . , : ;, which is impossible and invalidates the whole match at once.
Second, the ^.+pizza.+$ pattern does not ensure that the matched pizza is not the first or last word in a sentence, as . matches whitespace, too. It does not return anything meaningful, because preg_split uses each match to break the string into chunks; since this pattern matches the entire string, the two empty values are the empty chunks before and after that single match.
That said, all you need is:
preg_match('~^(.*?\w\W+)pression(\W+\w.*)$~is', $text, $m)
Details:
^ - start of string
(.*?\w\W+) - Capturing group 1: any zero or more chars, as few as possible, then a word char and then one or more non-word chars
pression - a word
(\W+\w.*) - Capturing group 2: one or more non-word chars, a word char, and then any zero or more chars as many as possible
$ - end of string.
The s flag makes . match across lines, and the i flag makes the pattern case-insensitive.
See the PHP demo:
$text = "You can use any regular expression pression inside the lookahead ";
if (preg_match('~^(.*?\w\W+)pression(\W+\w.*)$~is', $text, $m)) {
echo $m[1] . " << | >> " . $m[2];
}
// => You can use any regular expression << | >> inside the lookahead

Regex match overlap/crossover

I need to capitalise acronyms in some text.
I currently have this regex to match on the acronyms:
/(^|[^a-z0-9])(ECU|HVAC|ABS|ESC|EGR|ADAS|HEV|HMI)($|[^a-z0-9])/ig
Explanation: this is aiming to match any of the acronyms where they are either at the start or end of the text, or there isn't a letter or number either side of them (as then they might be part of a word - e.g. I wouldn't want to replace the "Esc" in the word "Escape").
This works most of the time, but doesn't work for the following example:
"abs/esc"
It matches the abs, but not the esc. I'm guessing this is because the matches overlap, in that the forward slash is part of the match relating to abs.
Can anyone suggest how to get a match on both?
As a side note, I'm using PHP's preg_replace_callback to perform the transformation afterwards:
$name = 'abs/esc';
$name = preg_replace_callback('/(^|[^a-z0-9])(ECU|HVAC|ABS|ESC|EGR|ADAS|HEV|HMI)($|[^a-z0-9])/i', function($matches) {
return $matches[1] . strtoupper($matches[2]) . $matches[3];
}, $name);
Yes, the reason is the overlap: when matching abs, the regex also consumes the /. Then, for esc, it cannot find [^a-z0-9], because the next character it scans is e.
You could use this RegEx instead:
\b(ECU|HVAC|ABS|ESC|EGR|ADAS|HEV|HMI)\b
\b is a word boundary: it does not consume any characters, so there will be no overlap.
You can also change your RegEx to use a Positive Lookahead, since this also does not consume characters:
(^|[^a-z0-9])(ECU|HVAC|ABS|ESC|EGR|ADAS|HEV|HMI)(?=$|[^a-z0-9])
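A sketch of both fixes plugged into the question's preg_replace_callback call; the helper names upperWordBoundary and upperLookahead are hypothetical. With the lookahead version there is no third group any more, so the callback only returns $m[1] plus the upper-cased $m[2]:

```php
<?php
$acronyms = 'ECU|HVAC|ABS|ESC|EGR|ADAS|HEV|HMI';

// Option 1: \b word boundaries consume no characters, so adjacent matches cannot overlap.
function upperWordBoundary(string $name, string $acronyms): string
{
    return preg_replace_callback('/\b(' . $acronyms . ')\b/i', function ($m) {
        return strtoupper($m[1]);
    }, $name);
}

// Option 2: the trailing group becomes a lookahead, so the "/" after "abs"
// is left for the next match to use as its leading delimiter.
function upperLookahead(string $name, string $acronyms): string
{
    return preg_replace_callback('/(^|[^a-z0-9])(' . $acronyms . ')(?=$|[^a-z0-9])/i', function ($m) {
        return $m[1] . strtoupper($m[2]);
    }, $name);
}

echo upperWordBoundary('abs/esc', $acronyms), PHP_EOL; // ABS/ESC
echo upperLookahead('abs/esc', $acronyms), PHP_EOL;    // ABS/ESC
echo upperLookahead('Escape', $acronyms), PHP_EOL;     // Escape
```

One difference to keep in mind: \b treats _ as a word character, so "abs_esc" is left unchanged by the \b version, while the lookahead version (which only excludes letters and digits) turns it into "ABS_ESC".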

How to extract a certain number from a string using a regular expression in PHP?

I have a String (filename): s_113_2.3gp
How can I extract the number that appears after the second underscore? In this case it's '2', but in some cases it can be a number with several digits.
Also, the number of digits after the first underscore can vary, so the length of this string is not constant.
You can use a capturing group:
preg_match('/_(\d+)\.\w+$/', $str, $matches);
$number = $matches[1];
\d+ represents 1 or more digits. The parentheses around that capture it, so you can later retrieve it with $matches[1]. The . needs to be escaped, because otherwise it would match any character but line breaks. \w+ matches 1 or more word characters (digits, letters, underscores). And finally the $ represents the end of the string and "anchors" the regular expression (otherwise you would get problems with strings containing multiple .).
This also allows for arbitrary file extensions.
As Ωmega pointed out below there is another possibility, that does not use a capturing group. With the concept of lookarounds, you can avoid matching _ at the start and the \.\w+$ at the end:
preg_match('/(?<=_)\d+(?=\.\w+$)/', $str, $matches);
$number = $matches[0];
However, I would recommend profiling before applying this rather small optimization. But it is something to keep in mind (or rather, to read up on!).
Using regex lookaround it is very short code:
$n = preg_match('/(?<=_)\d+(?=\.)/', $str, $m) ? $m[0] : "";
...which reads: find one or more digits \d+ that are between underscore (?<=_) and period (?=\.)
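A side-by-side sketch of the capturing-group and lookaround approaches, each returning an empty string when nothing matches, like the one-liner above. The helper names numberViaGroup and numberViaLookaround and the second filename are hypothetical:

```php
<?php
// Capturing group: the digits between an underscore and the file extension.
function numberViaGroup(string $str): string
{
    return preg_match('/_(\d+)\.\w+$/', $str, $m) ? $m[1] : '';
}

// Lookarounds: "_" and ".ext" are only asserted, so the whole match $m[0] is the number.
function numberViaLookaround(string $str): string
{
    return preg_match('/(?<=_)\d+(?=\.\w+$)/', $str, $m) ? $m[0] : '';
}

foreach (['s_113_2.3gp', 's_7_1234.mp4'] as $file) {
    echo $file, ' => ', numberViaGroup($file), ' / ', numberViaLookaround($file), PHP_EOL;
}
```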
