How to find and replace single quote in some string? - php

Because using a string as the assertion in assert() ("Checks if assertion is FALSE") is DEPRECATED as of PHP 7.2, I need to change all such calls.
I want to use PhpStorm's "Replace in Path" function with a regex to change every place where this occurs. What would that regular expression look like?
Example occurrences:
assert('is_array($config)');
assert('is_string($location)');
assert('is_string($entityId)');
I managed to find and replace the first quote like this:
.*?(\bassert\('\b)[^$]* is replaced with assert(. But I don't know how to handle the last quote.
The result must be:
assert(is_array($config));
assert(is_string($location));
assert(is_string($entityId));
I found a solution for the first single quote; I still need one for the last single quote.
Any ideas?

One option to match only the single quotes could be to use \G to assert the position at the end of the previous match, and \K to forget what was matched so far, and then match the single quote.
In the replacement use an empty string.
(?:^.*?\bassert\(|\G(?!^))[^']*\K'(?=.*\);)
About the pattern
(?: Non capturing group
^.*?\bassert\( Match from the start of the string in a non greedy way until you encounter assert(
| Or
\G(?!^) Assert position at the previous match, not at the start
) Close non capturing group
[^']*\K' Match 0+ chars other than ', forget what was matched so far and match '
(?=.*\);) Assert that a closing parenthesis followed by ; appears further to the right on the same line
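A minimal PHP sketch of the \G/\K pattern with preg_replace, assuming the three sample lines are held in one string. The variable names and the /m flag are illustrative assumptions; the quotes in the pattern are escaped because it is written as a single-quoted PHP string.

```php
<?php
// Pattern from the answer; \' is a literal single quote inside a single-quoted PHP string.
$re = '/(?:^.*?\bassert\(|\G(?!^))[^\']*\K\'(?=.*\);)/m';

// Nowdoc (<<<'CODE') keeps $config etc. literal, so nothing is interpolated.
$code = <<<'CODE'
assert('is_array($config)');
assert('is_string($location)');
assert('is_string($entityId)');
CODE;

// Each matched single quote is replaced with an empty string.
$fixed = preg_replace($re, '', $code);
echo $fixed, PHP_EOL;
```

Because [^']* can also cross the line break, the \G chain simply continues from one assert( line to the next.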
Another way could be to use 3 capturing groups: match the two ' characters that you want to remove outside the groups, and use the groups in the replacement ($1$2$3):
^(.*?\bassert\()'([^']+)'(.*)$
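A sketch of the three-group variant in PHP, assuming the same multi-line input; the /m flag (so ^ and $ work per line) and the variable names are assumptions for illustration:

```php
<?php
// Group 1: everything up to "assert(", group 2: the quoted body, group 3: the rest of the line.
$re = '/^(.*?\bassert\()\'([^\']+)\'(.*)$/m';

$code = <<<'CODE'
assert('is_array($config)');
assert('is_string($location)');
assert('is_string($entityId)');
CODE;

// Re-emit the three groups without the two quotes between them.
$fixed = preg_replace($re, '$1$2$3', $code);
echo $fixed, PHP_EOL;
```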

We would be better off starting with an expression that has more boundaries, such as:
.*?(\bassert\b)\s*\(('|")\s*([^'"]*?)\s*('|")\s*\)\s*;
Test
$re = '/.*?(\bassert\b)\s*\((\'|")\s*([^\'"]*?)\s*(\'|")\s*\)\s*;/s';
$str = 'assert (" is_string($entityId)") ;';
$subst = '$1($3);';
echo preg_replace($re, $subst, $str); // assert(is_string($entityId));

Related

How to capture all phrases that don't have a pattern in the middle of them?

I want to capture all strings that do not have the pattern a[a-z]* in the middle position (between the hyphens) in the example below:
<?php
$myStrings = array(
"123-456",
"123-7-456",
"123-Apple-456",
"123-0-456",
"123-Alphabet-456"
);
foreach($myStrings as $myStr){
var_dump(
preg_match("/123-(?!a[a-z]*)-456/i", $myStr)
);
}
?>
You can try the following solution:
^(123-(?:(?![aA][a-zA-Z]*).*)-456)|(123-456)$
It uses a regex non-capturing group (?:) and a negative lookahead (?!) to find all inner sections that do not start with 'a' (or 'A') followed by any letters. The case with no inner section (123-456) is added as a second alternative (with the | sign). Note that ^ only anchors the first alternative and $ only the second; if the whole string must match, wrap both alternatives in a group: ^(?:...|...)$.
A lookahead is a zero-length assertion, so the middle part still has to be consumed before 456 can be matched. To consume it, use e.g. \w+- (one or more word characters followed by a hyphen) inside an optional group that starts with your lookahead condition. Use the i flag for case-insensitive matching.
To search an array, preg_grep can be used:
preg_grep('~^123-(?:(?!a[a-z]*-)\w+-)?456$~i', $myStrings);
preg_grep also has an invert flag: PREG_GREP_INVERT. If you don't need to check the start and end, a simpler pattern such as -a[a-z]*- without a lookahead could be used with that flag.
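A small PHP sketch comparing the anchored lookahead pattern with the inverted -a[a-z]*- search on the sample array; the variable names $kept and $keptInverted are only illustrative:

```php
<?php
$myStrings = array(
    "123-456",
    "123-7-456",
    "123-Apple-456",
    "123-0-456",
    "123-Alphabet-456"
);

// Anchored: an optional "word-" middle part that must not be a[a-z]*- (case-insensitive).
$kept = preg_grep('~^123-(?:(?!a[a-z]*-)\w+-)?456$~i', $myStrings);

// Unanchored and inverted: keep the elements that do NOT contain -a[a-z]*-.
$keptInverted = preg_grep('/-a[a-z]*-/i', $myStrings, PREG_GREP_INVERT);

// preg_grep preserves the original keys; array_values() reindexes for display.
print_r(array_values($kept));
print_r(array_values($keptInverted));
```

Both calls keep 123-456, 123-7-456 and 123-0-456 for this input, and the returned arrays keep the original keys (0, 1 and 3).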
Match the pattern and invert the result:
!preg_match('/a[a-z]*/i', $yourStr);
Don't try to do everything with a regex when programming languages exist to do the job.
You are not getting a match because in the pattern 123-(?!a[a-z]*)-456 the lookahead (?!a[a-z]*) is zero-width: it checks what follows but consumes nothing. So after matching the first -, the pattern has to match another hyphen directly, as if it were 123--456.
If you move the last hyphen inside the lookahead, like 123-(?!a[a-z]*-)456, you only get one match, for 123-456, because the middle part of the string is still not being matched.
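The two failing patterns described above can be checked side by side; this is a sketch on the question's sample strings, with $original and $movedHyphen as illustrative names:

```php
<?php
$myStrings = ["123-456", "123-7-456", "123-Apple-456", "123-0-456", "123-Alphabet-456"];

// The lookahead consumes nothing, so this effectively requires the literal "123--456".
$original = '/123-(?!a[a-z]*)-456/i';
// With the hyphen inside the lookahead, still nothing is consumed between "123-" and "456".
$movedHyphen = '/123-(?!a[a-z]*-)456/i';

foreach ($myStrings as $s) {
    printf("%-18s original=%d movedHyphen=%d\n", $s, preg_match($original, $s), preg_match($movedHyphen, $s));
}
```

Only 123-456 matches the second pattern, and none of the strings match the first.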
Another option in PHP is to consume the part that you don't want and then use (*SKIP)(*F):
^123-(?:a[a-z]*-(*SKIP)(*F)|\w+-)?456$
Explanation
^ Start of string
123- Match literally
(?: Non capture group for the alternation
a[a-z]*-(*SKIP)(*F) Match a, then 0+ chars a-z, then -, and then skip and fail the match
| Or
\w+- Match 1+ word chars followed by -
)? Close the non capture group and make it optional to also match when there is no middle part
456 Match literally
$ End of string
Example
$myStrings = array(
"123-456",
"123-7-456",
"123-Apple-456",
"123-0-456",
"123-Alphabet-456",
"123-b-456"
);
foreach($myStrings as $myStr) {
if (preg_match("/^123-(?:a[a-z]*-(*SKIP)(*F)|\w+-)?456$/i", $myStr, $match)) {
echo "Match for $match[0]" . PHP_EOL;
} else {
echo "No match for $myStr" . PHP_EOL;
}
}
Output
Match for 123-456
Match for 123-7-456
No match for 123-Apple-456
Match for 123-0-456
No match for 123-Alphabet-456
Match for 123-b-456

Sanitize phone number: regex to match all occurrences except the first one, if it is in the first position

Regarding this post "https://stackoverflow.com/questions/35413960/regular-expression-match-all-except-first-occurence", I'm wondering how, in PHP, to find the first occurrence in a string only if the string starts with a specific character.
I would like to sanitize phone numbers. Example of a bad phone number:
+49+12423#23492#aosd#+dasd
Regex to remove all "+" except the first occurrence:
\G(?:\A[^\+]*\+)?+[^\+]*\K\+
Problem: it should remove every "+" (other than the first) only if the string starts with "+", not when the first occurrence is at a position after the first character.
The regex to remove everything except numbers is easy:
[^0-9]*
But I don't know how to combine those two within one regex. I would just use preg_replace() twice.
Of course I could use a workaround like if ($str[0] === '+') {...}, but I'd prefer to learn something new (regex :)
Thanks for helping.
You can use
(?:\G(?!\A)|^\+)[^+]*\K\+
Details:
(?:\G(?!\A)|^\+) - either the end of the preceding successful match or a + at the start of string
[^+]* - zero or more chars other than +
\K - match reset operator discarding the text matched so far
\+ - a + char.
In PHP:
$re = '/(?:\G(?!\A)|^\+)[^+]*\K\+/m';
$str = '+49+12423#23492#aosd#+dasd';
echo preg_replace($re, '', $str);
// => +4912423#23492#aosd#dasd
You seem to want to combine the two queries:
A regex to remove everything except numbers
A regex to remove all "+" except first occurence
Here is my two cents:
(?:^\+|\d)(*SKIP)(*F)|.
Replace what is matched with an empty string.
(?:^\+|\d) - A non-capture group to match a starting literal plus or any digit in the range from 0-9.
(*SKIP)(*F) - Consume the previously matched characters, then force a failure so that they are skipped and left out of the replacement.
| - Or:
. - Any single character other than newline.
I'd like to think that this is a slight adaptation of what some consider "The best regex trick ever", where you first try to match what you don't want and then use an alternation to match what you do want. With the backtracking control verbs (*SKIP)(*F) we reverse the logic: we first match what we do want, exclude it from the results, and then match what we don't want.
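A minimal PHP sketch of this single-pass (*SKIP)(*F) replacement on the question's sample number; the second call, on the made-up input 49+123, is only there to show what happens without a leading "+":

```php
<?php
// A leading "+" or any digit is matched, then (*SKIP)(*F) fails the attempt and resumes
// after it, so those characters are never replaced.
// Every other character (except a newline) is matched by "." and removed.
$re = '/(?:^\+|\d)(*SKIP)(*F)|./';

$str = '+49+12423#23492#aosd#+dasd';
$clean = preg_replace($re, '', $str);
echo $clean, PHP_EOL;                            // +491242323492

// Without a leading "+", every "+" is removed.
echo preg_replace($re, '', '49+123'), PHP_EOL;   // 49123
```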

Regex curly braces and quotes get inner text

In the string {lang('stmt')} I want to get just stmt; the string may also look like {lang("stmt")}.
I'm bad with regex, I've tried {lang(.*?)} which gives me ('stmt').
You might match {lang(" or {lang(' and capture the ' or " using a capturing group. This group can be used with a backreference to match the same character.
Use \K to forget what was previously matched.
Then match 0+ characters non-greedily with .*? and use a positive lookahead with the backreference \1 to assert that what follows is ')} or ")}.
\{lang\((['"])\K.*?(?=\1\)})
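A short PHP sketch of the \K plus backreference pattern; the third input, with mismatched quotes, is a hypothetical case added to show that \1 requires the same quote on both sides:

```php
<?php
// Group 1 captures the opening quote; \K drops "{lang(" and the quote from the match;
// the lookahead requires the same quote followed by ")}".
$re = '/\{lang\(([\'"])\K.*?(?=\1\)})/';

$inputs = ['{lang(\'stmt\')}', '{lang("stmt")}', '{lang(\'stmt")}'];

foreach ($inputs as $s) {
    // $m[0] holds only the text after \K, i.e. the inner value.
    if (preg_match($re, $s, $m)) {
        echo "$s => $m[0]", PHP_EOL;
    } else {
        echo "$s => no match (quotes differ)", PHP_EOL;
    }
}
```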
Match either ' or " with a character set, then lazy-repeat any character until the first capture group can be matched again:
lang\((['"])(.*?)\1
https://regex101.com/r/MBKhX3/1
In PHP code:
$str = "{lang('stmt')}";
preg_match('/lang\(([\'"])(.*?)\1/', $str, $matches);
print(json_encode($matches));
Result:
["lang('stmt'","'","stmt"]
(the string you want will be in the second capture group)
Try this one too.
lang\([('")][a-z]*['")]\)
Keep the ( and ) outside the capturing group (.*?) to get the value without ( and ).
regex:
{lang\(['"](.*?)['"]\)}
php: '/{lang\([\'"](.*?)[\'"]\)}/'

Match a group multiple times in a single regex

Hi my question is simple:
Using a PCRE regex, I want to match all possible hashtags in an article, but only if they are inside a <figcaption>. E.g.:
<figcaption>blah blah #hashtag1, #hashtag2</figcaption>
I made an attempt here: https://regex101.com/r/aL9vS8/1. Removing the last ? changes the capture from #hashtag1 to #hashtag2, but I can't get both.
I am not even sure it is doable in one single regex in PHP.
Any idea to help me? :)
If there is no way in one single regex (really? even working with recursion (?R)?? :p), please suggest the most efficient way possible performance wise.
Thank you!
[EDIT]
If there is no way, my next idea in PHP is to:
Match every figcaption with preg_replace_callback
In the callback match every instance of #hashtag.
Can I get your opinions on this? Is there a better way? My articles are not very long.
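The two-step idea above could be sketched like this in PHP. The caption pattern ~<figcaption\b[^>]*>.*?</figcaption>~s and the inner ~#\w+~ are assumptions for illustration; unlike the single-regex answer, this only handles captions that have a closing </figcaption>:

```php
<?php
$html = "<figcaption>blah # blah #hashtag1, #hashtag2</figcaption> #ee <figcaption>#ddddd</figcaption>";

// Step 1: find each <figcaption>...</figcaption> (the s flag lets . cross newlines).
$result = preg_replace_callback(
    '~<figcaption\b[^>]*>.*?</figcaption>~s',
    function (array $fig): string {
        // Step 2: inside one caption, wrap every #word in a span.
        return preg_replace('~#\w+~', '<span class="highlight">$0</span>', $fig[0]);
    },
    $html
);
echo $result, PHP_EOL;
```

The hashtag #ee outside the captions is left untouched, and a bare "#" followed by a space is not wrapped.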
Please suggest the most efficient way possible performance wise
The most reliable way to match some text between delimiters with a PCRE regex is to use custom boundaries with the \G operator. However, the trailing boundary is a multicharacter string, and to match any text except </figcaption> you'd need a tempered greedy token. Since this token is very resource-consuming, it must be unrolled.
Here is a fast, reliable PCRE regex for your task:
(?:<figcaption|(?!^)\G)[^<#]*(?:(?:<(?!\/figcaption>)|#\B)[^<#]*)*\K#\w+
Details:
(?:<figcaption|(?!^)\G) - Matches <figcaption or the end of the previous successful match
More details: (?:<figcaption|(?!^)\G) is a non-capturing group ((?:...)) that only groups and does not keep track of what was matched (no value is stored for it), and it matches 2 alternatives (| is the alternation operator): 1) the literal text <figcaption or 2) (?!^)\G, a location right after the previous successful match (\G also matches at the start of the string, so the negative lookahead (?!^) is added to exclude that case).
[^<#]* - 0+ chars other than < and #
(?:(?:<(?!\/figcaption>)|#\B)[^<#]*)* - 0+ sequences of:
(?:<(?!\/figcaption>)|#\B) - a < not followed by /figcaption>, or a # not followed by a word char
[^<#]* - 0+ chars other than < and #
\K - omit the text matched so far
#\w+ - # and 1+ word chars
Even more details:
\K:
The escape sequence \K causes any previously matched characters not to be included in the final matched sequence. For example, the pattern:
foo\Kbar
matches foobar, but reports that it has matched bar. This feature is similar to a lookbehind assertion.
(?:(?:<(?!\/figcaption>)|#\B)[^<#]*)*: Here, we have an outer non-capturing group (?:...)* to enable matching a sequence of subpatterns zero or more times (a quantifier such as * can only be applied to a group when a sequence of subpatterns must be repeated), and the inner non-capturing group (?:<(?!\/figcaption>)|#\B)[^<#]* is just a shorter way to write <(?!\/figcaption>)[^<#]*|#\B[^<#]* (it groups the two alternatives <(?!\/figcaption>) and #\B before their common "suffix" [^<#]*).
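The \K behaviour quoted above (foo\Kbar) can be checked in isolation; a minimal sketch using PREG_OFFSET_CAPTURE to show where the reported match starts:

```php
<?php
// \K drops "foo" from the reported match, although "foo" must still be present.
preg_match('/foo\Kbar/', 'foobar', $m, PREG_OFFSET_CAPTURE);
echo $m[0][0], ' at offset ', $m[0][1], PHP_EOL;   // bar at offset 3

// A lone "bar" does not match, because "foo" is still required before it.
var_dump(preg_match('/foo\Kbar/', 'bar'));          // int(0)
```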
Wrapping in a tag: just use preg_replace with the <span class="highlight">$0</span> replacement pattern:
Code:
$re = '~(?:<figcaption|(?!^)\G)[^<#]*(?:(?:<(?!\/figcaption>)|#\B)[^<#]*)*\K#\w+~';
$str = "<figcaption>blah # blah #hashtag1, #hashtag2</figcaption> #ee <figcaption>#ddddd";
$subst = "<span class=\"highlight\">$0</span>";
$result = preg_replace($re, $subst, $str);
echo $result;
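If you only need to extract the hashtags rather than wrap them, the same pattern can be passed to preg_match_all; a sketch on the same sample string:

```php
<?php
$re = '~(?:<figcaption|(?!^)\G)[^<#]*(?:(?:<(?!\/figcaption>)|#\B)[^<#]*)*\K#\w+~';
$str = "<figcaption>blah # blah #hashtag1, #hashtag2</figcaption> #ee <figcaption>#ddddd";

// Same \G chain as above, but collecting the matches instead of replacing them.
preg_match_all($re, $str, $matches);
print_r($matches[0]);
```

Here #ee is not collected because it lies outside any <figcaption> and the \G chain stops at </figcaption>.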

PHP lookahead assertion at the end of the regex

I want to write a regex with assertions to extract the number 55 from the string unknownstring/55.1. Here is my regex:
$str = 'unknownstring/55.1';
preg_match('/(?<=\/)\d+(?=\.1)$/', $str, $match);
So basically I am trying to say: give me the number that comes after a slash and is followed by a dot and the number 1, with no characters after that. But the regex does not match. When I removed the $ from the end, it matched. However, that condition is essential: I need this to be the end of the string, because the unknownstring part can contain similar text, e.g. unknow/545.1nstring/55.1. Perhaps I could use preg_match_all and take the last match, but I want to understand why the first regex does not work and where my mistake is.
Thanks
Use anchor $ inside lookahead:
(?<=\/)\d+(?=\.1$)
You cannot use $ outside the positive lookahead: the lookahead does not consume \.1, so right after \d+ the position is still in front of .1, which is NOT the end of the input.
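A quick PHP check of both versions, using the strings from the question ($match and $match2 are illustrative names):

```php
<?php
$str = 'unknownstring/55.1';

// $ outside the lookahead: after \d+ the engine is still before ".1", which is not the end.
var_dump(preg_match('/(?<=\/)\d+(?=\.1)$/', $str));   // int(0)

// $ inside the lookahead: ".1" must be followed by the end of the string.
preg_match('/(?<=\/)\d+(?=\.1$)/', $str, $match);
echo $match[0], PHP_EOL;                               // 55

// The earlier "/545.1" is rejected because its ".1" is not at the end.
preg_match('/(?<=\/)\d+(?=\.1$)/', 'unknow/545.1nstring/55.1', $match2);
echo $match2[0], PHP_EOL;                              // 55
```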
