Let's say I can have strings like these:
^(www.|)mysite1.com$
^(.*)mysite2.com(.*)$
^(www\.|)mysite3\.com$
How do I get only the mysite1, mysite2 or mysite3 part of such strings? I tried replacing the non-alphanumeric characters with an empty string using:
preg_replace("/[^A-Za-z0-9]/", '', $mystring);
But that returns me
mysite1com
mysite2com
mysite3com
Thanks in advance.
What you might do is use preg_match instead of preg_replace and use for example this regex:
\^\([^)]+\)\K[A-Za-z0-9]+
That would match
\^ # Match ^
\( # Match (
[^)]+ # Match not ) one or more times
\) # Match )
\K # Reset the starting point of the reported match
[A-Za-z0-9]+ # Match one or more upper/lowercase character or digit
For example:
preg_match("/\^\([^)]+\)\K[A-Za-z0-9]+/", "^(www.|)mysite1.com$", $matches);
echo $matches[0];
With preg_replace an approach could be to use 3 capturing groups where the value you want to keep is in the second group.
In the replacement, you would use $2:
(\^\([^)]+\))([A-Za-z0-9]+)(.*)
preg_replace("/(\^\([^)]+\))([A-Za-z0-9]+)(.*)/", '$2', $mystring);
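As a rough sketch, the preg_match approach can be run over the three sample strings from the question. The helper name extractSiteName is hypothetical and only used for illustration; the pattern is the \K one shown earlier in this answer.

```php
<?php
// Hypothetical helper: returns the alphanumeric run that follows the
// leading "^( ... )" group, or null when the string has a different shape.
function extractSiteName(string $s): ?string
{
    if (preg_match('/\^\([^)]+\)\K[A-Za-z0-9]+/', $s, $m) === 1) {
        return $m[0];
    }
    return null;
}

$samples = [
    '^(www.|)mysite1.com$',
    '^(.*)mysite2.com(.*)$',
    '^(www\.|)mysite3\.com$',
];

foreach ($samples as $s) {
    echo extractSiteName($s), "\n";
}
// mysite1
// mysite2
// mysite3
```

Returning null for strings that do not start with ^( ... ) makes it easier to spot inputs the pattern was not written for.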
I have the following Regex in my PHP code:
// markers for italic set *Text*
if (substr_count($line, '*')>=2)
{
$line = preg_replace('#\*{1}(.*?)\*{1}#', '<i>$1</i>', $line);
}
which works great.
However, when a $line holds a <br>, e.g.
*This is my text<br>* Some other text
Then the regex still considers the text and transforms it to:
<i>This is my text<br></i> Some other text
The goal is to not translate the text if a <br> is encountered. How to do that with a Regex - using a so called "negative lookahead" or how can the existing Regex be changed?
Note: Strings like *This is my text*<br>Some other text<br>And again *italic*<br>END should still be considered and transformed.
Idea: Or should I explode the $line and then iterate over the results with the regex?!
Using match-what-you-don't-want and discard technique, you may use this regex in PHP (PCRE):
\*[^*]*<br>\*(*SKIP)(*F)|\*([^*]*)\*
and replace with <i>$1</i>
PHP code:
$r = preg_replace('/\*[^*]*<br>\*(*SKIP)(*F)|\*([^*]*)\*/',
'<i>$1</i>', $input);
Explanation:
\*: Match a *
[^*]*: Match 0 or more non-* characters
<br>: Match <br>
\*: Match closing *
(*SKIP)(*F): PCRE verbs to discard and skip this match
|: OR
\*([^*]*)\*: Match string enclosed by *s
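A minimal sketch of the (*SKIP)(*F) pattern applied to the two sample lines from the question. The function name italicize is an assumption made for this example.

```php
<?php
// Hypothetical helper name; the pattern is the (*SKIP)(*F) one explained above.
function italicize(string $line): string
{
    // First alternative: a *...<br>* span is matched and then discarded
    // by (*SKIP)(*F), so it stays untouched.
    // Second alternative: any remaining *...* span is wrapped in <i> tags.
    $result = preg_replace(
        '/\*[^*]*<br>\*(*SKIP)(*F)|\*([^*]*)\*/',
        '<i>$1</i>',
        $line
    );
    return $result ?? $line; // preg_replace returns null on error
}

echo italicize('*This is my text<br>* Some other text'), "\n";
// *This is my text<br>* Some other text
echo italicize('*This is my text*<br>Some other text<br>And again *italic*<br>END'), "\n";
// <i>This is my text</i><br>Some other text<br>And again <i>italic</i><br>END
```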
You can replace matches of the regular expression
\*((?:(?!<br>)[^*])+)\*
with
'<i>$1</i>'
where $1 holds the text between the asterisks. (Using $0 instead would keep the asterisks in the output, since $0 is the whole match.)
The regular expression can be broken down as follows.
\* # match '*'
( # begin capture group 1
(?: # begin a non-capture group
(?!<br>) # negative lookahead asserts that the next four chars are not '<br>'
[^*] # match any char other than '*'
)+ # end non-capture group and execute one or more times
) # end capture group 1
\* # match '*'
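A short sketch of the negative-lookahead approach in PHP, written with a capturing group so that the asterisks are dropped from the output. The variable names are assumptions for this example. This form rejects a <br> anywhere between the asterisks, not only directly before the closing one.

```php
<?php
// Each character between the asterisks must not be '*' and must not
// start a '<br>'. Group 1 holds the text without the asterisks.
$pattern = '/\*((?:(?!<br>)[^*])+)\*/';

$lines = [
    '*This is my text<br>* Some other text',
    '*This is my text*<br>Some other text<br>And again *italic*<br>END',
    '*a<br>b*',
];

$results = [];
foreach ($lines as $line) {
    $results[] = preg_replace($pattern, '<i>$1</i>', $line);
}

echo implode("\n", $results), "\n";
// *This is my text<br>* Some other text
// <i>This is my text</i><br>Some other text<br>And again <i>italic</i><br>END
// *a<br>b*
```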
How can I use preg_match_all() to get 1a1a-1a1a and 2B2B2-B2 from the following string:
$string = 'Hello #1a1a-1a1a and #2B2B2-B2 too';
My aim is to capture every # followed by a uuid.
I tried:
preg_match_all("/#(.*)/", $string, $matches);
preg_match_all("/#.*?/U", $string, $matches);
preg_match_all("/#([^\"]+)/si", $a, $matches);
but I can't get it to work.
Use the /(?<=#)[\w-]+/ pattern, which matches any run of word characters or hyphens that follows a #:
preg_match_all("/(?<=#)[\w-]+/", $string, $matches);
print_r($matches[0]);
Output
Array
(
[0] => 1a1a-1a1a
[1] => 2B2B2-B2
)
The #(.*) regex matches a # and then greedily any 0 or more chars other than line break chars (i.e. the rest of the line). /#.*?/U is a synonymous pattern: the U modifier inverts greediness, so it is equal to /#.*/, except that the text after # is not captured into a group. #([^\"]+) matches # and captures into Group 1 any one or more chars other than ", so it will match either up to the first " or to the end of the string if there is no ".
I suggest using
preg_match_all('~#\K[\w-]+~', $s, $matches)
#\K[\w-]+ will match #, \K will remove it from the match, and [\w-]+ will match 1 or more word or - chars that will be returned.
To make the pattern a bit more restrictive, say, to only match letters or digits after # that can be hyphen separated, you may use
'~#\K[A-Z0-9]+(?:-[A-Z0-9]+)*~i'
Here, [A-Z0-9]+ matches 1 or more alphanumeric chars and (?:-[A-Z0-9]+)* will match 0 or more repetitions of a - followed by 1+ alphanumeric chars. The i modifier makes the pattern case insensitive.
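To illustrate the difference between the loose and the more restrictive pattern, here is a small sketch. The $edge string is an invented input, not taken from the question.

```php
<?php
$string = 'Hello #1a1a-1a1a and #2B2B2-B2 too';
$edge   = '#abc- #-x #a_b'; // made-up edge cases for illustration

$loose  = '~#\K[\w-]+~';
$strict = '~#\K[A-Z0-9]+(?:-[A-Z0-9]+)*~i';

preg_match_all($strict, $string, $m);
print_r($m[0]); // 1a1a-1a1a and 2B2B2-B2

preg_match_all($loose, $edge, $looseMatches);
print_r($looseMatches[0]); // abc-, -x, a_b

preg_match_all($strict, $edge, $strictMatches);
print_r($strictMatches[0]); // abc, a
```

The strict pattern never returns a leading or trailing hyphen, and it stops at an underscore, because only letters and digits are allowed around each hyphen.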
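Your regexes are matching:
#(.*) Matches # and captures in a group any character 0+ times, greedily, including spaces, so it matches everything from the first # to the end of your example
#.*? Matches # followed by any character 0+ times non-greedily, which on its own matches only the #; with the U modifier you added, greediness is inverted, so it again matches up to the end of the line
#([^\"]+) Matches # and captures in a group 1+ characters that are not a ", which matches everything after the first # in your example, as it contains no "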
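A quick way to check these descriptions is to run the attempts against the sample string. This is only a sketch; the variable names $m1 to $m3 are arbitrary.

```php
<?php
$string = 'Hello #1a1a-1a1a and #2B2B2-B2 too';

// Greedy: one match that runs from the first # to the end of the line.
preg_match_all('/#(.*)/', $string, $m1);

// Lazy without the U modifier: .*? matches as little as possible,
// so each match is just the # itself.
preg_match_all('/#.*?/', $string, $m2);

// Lazy with the U modifier: greediness is inverted, so this behaves
// like the greedy version again.
preg_match_all('/#.*?/U', $string, $m3);

print_r($m1[0]); // one element: #1a1a-1a1a and #2B2B2-B2 too
print_r($m2[0]); // two elements: # and #
print_r($m3[0]); // one element: #1a1a-1a1a and #2B2B2-B2 too
```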
To capture every # followed by a uuid, you could use a character class to list what you would allow to match and repeat that pattern preceded by a dash in a non capturing group 1+ times.
If you want to match the uuid only, you could capture the values in a capturing group.
#([a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)+)
$string = 'Hello #1a1a-1a1a and #2B2B2-B2 too';
preg_match_all("/#([a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)+)/", $string, $matches);
print_r($matches[1]);
Result
Array
(
[0] => 1a1a-1a1a
[1] => 2B2B2-B2
)
Try this; it will capture the run of word characters and hyphens after each '#', no matter how many characters:
preg_match_all("/#([\w-]+)/", $string, $matches);
In the following string {lang('stmt')} I want to get just the stmt where it may also be as follows {lang("stmt")}.
I'm bad with regex, I've tried {lang(.*?)} which gives me ('stmt').
You might match {lang(" or {lang(' and capture the ' or " using a capturing group. This group can be used with a backreference to match the same character.
Use \K to forget what was previously matched.
Then match 0+ characters non greedy .*? and use a positive lookahead using the backreference \1 to assert what follows is ')} or ")}
\{lang\((['"])\K.*?(?=\1\)})
Match either ' or " with a character set, then lazy-repeat any character until the first capture group can be matched again:
lang\((['"])(.*?)\1
https://regex101.com/r/MBKhX3/1
In PHP code:
$str = "{lang('stmt')}";
preg_match('/lang\(([\'"])(.*?)\1/', $str, $matches);
print(json_encode($matches));
Result:
["lang('stmt'","'","stmt"]
(the string you want will be in the second capture group)
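A small sketch that wraps the backreference idea in a function and also requires the closing )}. The name langKey is hypothetical, and the extra \)\} at the end of the pattern is an addition to the pattern above.

```php
<?php
// Hypothetical helper: returns the key inside {lang('...')} or {lang("...")},
// or null when the input has neither form.
function langKey(string $s): ?string
{
    // Group 1 captures the opening quote; \1 requires the same quote to close.
    if (preg_match('/\{lang\(([\'"])(.*?)\1\)\}/', $s, $m) === 1) {
        return $m[2];
    }
    return null;
}

echo langKey("{lang('stmt')}"), "\n"; // stmt
echo langKey('{lang("stmt")}'), "\n"; // stmt
var_dump(langKey('{lang(\'stmt")}')); // NULL, the quotes do not match
```

Because the closing quote must equal the opening one, a mixed pair such as 'stmt" is rejected instead of being silently accepted.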
Try this one too; the value is in the first capture group.
lang\(['"]([a-z]*)['"]\)
Keep the ( and ) outside the (.*?) group to get the value without them. Inside a character class, | is a literal character, so it is not needed there.
regex:
\{lang\(['"](.*?)['"]\)\}
php: '/\{lang\([\'"](.*?)[\'"]\)\}/'
I need to get all matches in a string where a word begins with # and then contains only alphanumeric (0-9a-z) characters. For example, from the string #ww#ee x##vx #ss #aa assadd #sfsd I need to get these pieces:
#ss
#aa
#sfsd
I am trying:
$str = "#ww#ee x##vx #ss #aa assadd #sfsd";
preg_match_all("#(^|\s)\#([0-9a-z]+)(\s+|$)#ui", $str, $matches);
var_dump( $matches );
But this gives only #ss and #sfsd, and skips #aa.
What would be right pattern for this?
You can use the following regex
'~\B(?<!#)#([0-9a-z]+)(?:\s|$)~iu'
Here is a PHP example:
$re = '~\B(?<!#)#([0-9a-z]+)(?:\s|$)~ui';
$str = "#ww#ee x##vx #ss #aa assadd #sfsd";
preg_match_all($re, $str, $matches);
print_r($matches);
The regex explanation:
\B - match the non-word boundary location (that is, everywhere but between ^ and \w, \w and $, \W and \w, \w and \W)
(?<!#) - fail the match if there is a # before the current location
# - a # symbol (does not have to be escaped)
([0-9a-z]+) - Group 1 (since the (...) are not escaped, they capture a subpattern and store it in a special memory slot)
(?:\s|$) - a non-capturing group (only meant to group alternatives) matching a whitespace (\s) or $.
The u and i modifiers (after the closing ~ delimiter) allow proper handling of Unicode strings (u) and make the pattern case insensitive (i).
Note that \B requires either the start of the string or a non-word character before #. But you do not want to match if another # precedes the #wwww-like string. Thus, we have to use the negative lookbehind (?<!#) that restricts the matches even further.
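The pattern in the question skips #aa because its trailing (\s+|$) consumes the space that the next match needs for (^|\s). As an alternative sketch (not the pattern from this answer), whitespace lookarounds check the surroundings without consuming anything:

```php
<?php
$str = '#ww#ee x##vx #ss #aa assadd #sfsd';

// (?<!\S) : no non-whitespace character directly before '#'
// (?!\S)  : no non-whitespace character directly after the tag
// Neither assertion consumes input, so adjacent tags are all found.
preg_match_all('~(?<!\S)#([0-9a-z]+)(?!\S)~iu', $str, $matches);

print_r($matches[0]); // #ss, #aa, #sfsd
print_r($matches[1]); // ss, aa, sfsd
```

With this form, $matches[0] holds the tags without any trailing whitespace.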
I am working on a script that develops certain strings of alphanumeric characters, separated by a dash -. I need to test the string to see if there are any sets of characters (the characters that lie in between the dashes) that are the same. If they are, I need to consolidate them. The repeating chars would always occur at the front in my case.
Examples:
KRS-KRS-454-L
would become:
KRS-454-L
DERP-DERP-545-P
would become:
DERP-545-P
<?php
$s = 'KRS-KRS-454-L';
echo preg_replace('/^(\w+)-(?=\1)/', '', $s);
?>
// KRS-454-L
This uses a positive lookahead (?=...) to check for repeated strings.
Note that \w also contains the underscore. If you want to limit to alphanumeric characters only, use [a-zA-Z0-9].
Also, I've anchored with ^ as you've mentioned: "The repeating chars would always occur at the front [...]"
Try the pattern:
/([a-z]+)(?:-\1)*(.*)/i
and replace it with:
$1$2
A demo:
$tests = array(
'KRS-KRS-454-L',
'DERP-DERP-DERP-545-P',
'OKAY-666-A'
);
foreach ($tests as $t) {
echo preg_replace('/([a-z]+)(?:-\1)*(.*)/i', '$1$2', $t) . "\n";
}
produces:
KRS-454-L
DERP-545-P
OKAY-666-A
A quick explanation:
([a-z]+) # group the first "word" in match group 1
(?:-\1)* # match a hyphen followed by what was matched in
# group 1, and repeat it zero or more times
(.*) # match the rest of the input and store it in group 2
In the replacement string, $1 and $2 are replaced by what was matched by group 1 and group 2 in the pattern above.
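One caveat with a bare backreference is that it also matches when the second segment merely starts with the first one (for example KRS-KRSX-1). A hedged sketch of a stricter variant, with a hypothetical function name, requires the repeated segment to end at a dash or at the end of the string:

```php
<?php
// Hypothetical helper: collapses a repeated leading segment, but only
// when each repetition is a complete dash-delimited segment.
function collapseLeadingRepeat(string $s): string
{
    // ^([a-zA-Z0-9]+)  first segment, captured in group 1
    // (?:-\1)+         one or more repetitions of "-" plus the same text
    // (?=-|$)          the last repetition must end at a dash or at the end
    $result = preg_replace('/^([a-zA-Z0-9]+)(?:-\1)+(?=-|$)/', '$1', $s);
    return $result ?? $s; // preg_replace returns null on error
}

echo collapseLeadingRepeat('KRS-KRS-454-L'), "\n";        // KRS-454-L
echo collapseLeadingRepeat('DERP-DERP-DERP-545-P'), "\n"; // DERP-545-P
echo collapseLeadingRepeat('OKAY-666-A'), "\n";           // OKAY-666-A
echo collapseLeadingRepeat('KRS-KRSX-1'), "\n";           // KRS-KRSX-1
```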
Use the regex ((?:[A-Z-])+)\1{1} and replace the matched string with $1.
The backreference \1 matches a second, immediately following copy of whatever group 1 captured; the {1} quantifier means exactly once and can be omitted.
You need backreferences. Using Perl syntax, this would work for you:
$line =~ s/([A-Za-z0-9]+-)\1+/$1/g;