Use PHP preg_match_all() to capture UUIDs preceded by a # character

How can I use preg_match_all() to get 1a1a-1a1a and 2B2B2-B2 from the following string:
$string = 'Hello #1a1a-1a1a and #2B2B2-B2 too';
My aim is to capture every UUID that follows a #.
I tried:
preg_match_all("/#(.*)/", $string, $matches);
preg_match_all("/#.*?/U", $string, $matches);
preg_match_all("/#([^\"]+)/si", $a, $matches);
but none of them works.

Use the pattern /(?<=#)[\w-]+/, which matches any run of word characters or hyphens that follows a #:
preg_match_all("/(?<=#)[\w-]+/", $string, $matches);
print_r($matches[0]);
Output
Array
(
[0] => 1a1a-1a1a
[1] => 2B2B2-B2
)

The #(.*) regex matches a # and then greedily any 0 or more chars other than line break chars (i.e. the rest of the line). /#.*?/U is a synonymous pattern: the U flag inverts greediness, so it is equal to /#.*/; the text after # just is not captured into a group. #([^\"]+) matches # and captures into Group 1 one or more chars other than ", so it will match either up to the first " or to the end of the string if there is no ".
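The behaviour described above can be checked with a short script. This is a minimal sketch that reuses $string from the question; running the third attempt against $string instead of the undefined $a is an assumption about what was intended.

```php
<?php
$string = 'Hello #1a1a-1a1a and #2B2B2-B2 too';

// Attempt 1: greedy .* runs to the end of the line.
preg_match_all("/#(.*)/", $string, $m1);

// Attempt 2: the U flag flips the lazy .*? back to greedy,
// so the whole rest of the line is matched (nothing is captured).
preg_match_all("/#.*?/U", $string, $m2);

// Attempt 3: the input contains no ", so [^"]+ also runs to the end.
// ($string is used here instead of $a; that substitution is an assumption.)
preg_match_all('/#([^"]+)/si', $string, $m3);

print_r($m1[1]); // one element: 1a1a-1a1a and #2B2B2-B2 too
print_r($m2[0]); // one element: #1a1a-1a1a and #2B2B2-B2 too
print_r($m3[1]); // one element: 1a1a-1a1a and #2B2B2-B2 too
```

In all three cases there is a single match that swallows the second # as well, which is why none of the attempts returns the two UUIDs separately.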
I suggest using
preg_match_all('~#\K[\w-]+~', $s, $matches)
#\K[\w-]+ will match #, \K will remove it from the match, and [\w-]+ will match 1 or more word or - chars, which will be returned.
To make the pattern a bit more restrictive, say, to only match letters or digits after # that can be hyphen separated, you may use
'~#\K[A-Z0-9]+(?:-[A-Z0-9]+)*~i'
Here, [A-Z0-9]+ matches 1 or more alphanumeric chars and (?:-[A-Z0-9]+)* matches 0 or more repetitions of a - followed by 1+ alphanumeric chars. The i modifier makes the pattern case insensitive.
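To see how the two suggested patterns differ, here is a small sketch. The sample string is hypothetical (the #a_b token is not in the question) and was added only to show where the stricter pattern stops.

```php
<?php
// Hypothetical input: the question's string plus one extra token, #a_b.
$s = 'Hello #1a1a-1a1a and #2B2B2-B2 too, but not #a_b';

// Loose pattern: any run of word chars (letters, digits, _) or hyphens after #.
preg_match_all('~#\K[\w-]+~', $s, $loose);

// Stricter pattern: alphanumeric runs separated by single hyphens.
preg_match_all('~#\K[A-Z0-9]+(?:-[A-Z0-9]+)*~i', $s, $strict);

print_r($loose[0]);  // 1a1a-1a1a, 2B2B2-B2, a_b
print_r($strict[0]); // 1a1a-1a1a, 2B2B2-B2, a
```

The stricter pattern stops at the underscore, so it returns only "a" for #a_b, while the loose one returns "a_b".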

Your regexes are matching:
#(.*) matches # and captures in a group any character 0+ times, greedily, including spaces, so it matches everything after the first # in your example
#.*? matches # followed by any character 0+ times, non-greedily, so on its own it matches only the #; the U flag you added inverts this and makes it greedy again
#([^\"]+) matches # and captures in a group 1+ characters that are not a ", so it also matches everything after the first # in your example
To capture every # followed by a uuid, you could use a character class to list the characters you allow, then repeat a dash followed by that class 1+ times in a non-capturing group.
If you want only the uuid, without the #, you could capture the value in a capturing group.
#([a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)+)
$string = 'Hello #1a1a-1a1a and #2B2B2-B2 too';
preg_match_all("/#([a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)+)/", $string, $matches);
print_r($matches[1]);
Result
Array
(
[0] => 1a1a-1a1a
[1] => 2B2B2-B2
)

Try this; it captures the word characters and hyphens after each '#', no matter how many characters there are:
preg_match_all("/#([\w-]+)/", $string, $matches);

Related

Maximum character length for PHP multiline regular expressions?

I'm trying to evaluate a multiline RegExp with preg_match_all.
Unfortunately there seems to be a character limit around 24,000 characters (24,577 to be specific).
Does anyone know how to get this to work?
Pseudo-code:
<?php
$data = 'TRACE: aaaa(24,577 characters)';
preg_match_all('/([A-Z]+): ((?:(?![A-Z]+:).)*)\n/s', $data, $matches);
var_dump($matches);
?>
Working example (with < 24,577 characters): https://3v4l.org/8iRCc
Example that's NOT working (with > 24,577 characters): https://3v4l.org/ceKn6
You might rewrite the pattern using a negated character class instead of the tempered greedy token approach with the negative lookahead:
([A-Z]+): ([^A-Z\r\n]*(?>(?:\r?\n|[A-Z](?![A-Z]*:))[^A-Z\r\n]*)*)\r?\n
([A-Z]+): Capture group 1, match 1+ uppercase chars, then : and a space
( Capture group 2
[^A-Z\r\n]* Match 0+ times any char except A-Z or a newline
(?> Atomic group
(?: Non capture group
\r?\n Match a newline
| Or
[A-Z] Match a char A-Z
(?![A-Z]*:) Negative lookahead, assert that what follows is not optional chars A-Z and then :
) Close non capture group
[^A-Z\r\n]* Optionally match any char except A-Z or a newline
)* Close atomic group and optionally repeat
)\r?\n Close group 2 and match a newline
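As an illustration of the breakdown above, here is a minimal sketch that runs the pattern on a small, hypothetical input with two records. It only shows what the groups capture; it does not reproduce the 24,577-character case from the question.

```php
<?php
// Hypothetical sample data: two records, the first spanning two lines.
$data = "TRACE: aaa\nbbb Ccc\nERROR: boom\n";

$re = '/([A-Z]+): ([^A-Z\r\n]*(?>(?:\r?\n|[A-Z](?![A-Z]*:))[^A-Z\r\n]*)*)\r?\n/';

preg_match_all($re, $data, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // json_encode makes the embedded newline visible as \n
    printf("%s => %s\n", $m[1], json_encode($m[2]));
}
// TRACE => "aaa\nbbb Ccc"
// ERROR => "boom"
```

The uppercase C in "Ccc" is accepted inside group 2 because it is not followed by optional uppercase chars and a colon, while the E of "ERROR:" is rejected by the lookahead and ends the first record.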
If the TRACE: is at the start of the string, you can also add an anchor:
^([A-Z]+): ([^A-Z\r\n]*(?>(?:\r?\n|[A-Z](?![A-Z]*:))[^A-Z\r\n]*)*)\r?\n
Edit
If every record starts with the same format, you can match that opening line and then all following lines that do not start with the opening format.
^([A-Z]+): (.*(?:\r?\n(?![A-Z]+: ).*)*)
The pattern matches:
^ Start of string (or start of each line when the m flag is used)
([A-Z]+): Capture group 1
( Capture group 2
.* Match the rest of the line
(?:\r?\n(?![A-Z]+: ).*)* Repeat matching all lines that do not start with the pattern [A-Z]+:
) Close group 2
In php you can use
$re = '/^([A-Z]+): (.*(?:\r?\n(?![A-Z]+: ).*)*)/m';
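A minimal sketch of how this line-based pattern could be used follows. The sample data is hypothetical, and the check of the return value with preg_last_error() is an addition that is not part of the answer; the source does not say which PCRE limit the original pattern ran into.

```php
<?php
// Hypothetical sample data; there is no trailing newline, so the last body is just "done".
$data = "TRACE: first line\n  continued line\nERROR: something failed\nINFO: done";

$re = '/^([A-Z]+): (.*(?:\r?\n(?![A-Z]+: ).*)*)/m';

$count = preg_match_all($re, $data, $matches, PREG_SET_ORDER);

if ($count === false) {
    // preg_match_all() returns false on failure; preg_last_error() then holds
    // a code such as PREG_BACKTRACK_LIMIT_ERROR or PREG_JIT_STACKLIMIT_ERROR.
    echo 'PCRE error code: ', preg_last_error(), "\n";
} else {
    foreach ($matches as $m) {
        echo $m[1], ' => ', json_encode($m[2]), "\n";
    }
}
// TRACE => "first line\n  continued line"
// ERROR => "something failed"
// INFO => "done"
```

Checking for false matters with very long inputs, because an empty $matches array can otherwise be mistaken for "no match".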
Try this
preg_match('/\A(?>[^\r\n]*(?>\r\n?|\n)){0,4}[^\r\n]*\z/',$data)

Get only alphanumeric part of regex string

Let's say I can have strings like these:
^(www.|)mysite1.com$
^(.*)mysite2.com(.*)$
^(www\.|)mysite3\.com$
How do I get only the mysite1, mysite2 or mysite3 part of such strings? I tried setting the non-alphanumeric parts to an empty string using:
preg_replace("/[^A-Za-z0-9]/", '', $mystring);
But that returns:
mysite1com
mysite2com
mysite3com
Thanks in advance.
What you might do is use preg_match instead of preg_replace and use for example this regex:
\^\([^)]+\)\K[A-Za-z0-9]+
That would match
\^ # Match ^
\( # Match (
[^)]+ # Match not ) one or more times
\) # Match )
\K # Reset the starting point of the reported match
[A-Za-z0-9]+ # Match one or more upper/lowercase character or digit
For example:
preg_match("/\^\([^)]+\)\K[A-Za-z0-9]+/", "^(www.|)mysite1.com$", $matches);
echo $matches[0];
With preg_replace an approach could be to use 3 capturing groups where the value you want to keep is in the second group.
In the replacement, you would use $2:
(\^\([^)]+\))([A-Za-z0-9]+)(.*)
preg_replace("/(\^\([^)]+\))([A-Za-z0-9]+)(.*)/", '$2', $mystring);
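Both approaches can be tried on the three strings from the question. This is a sketch; the variable names $inputs, $names and $replaced are introduced here for illustration only.

```php
<?php
// The three example strings from the question (single quotes keep the backslashes).
$inputs = [
    '^(www.|)mysite1.com$',
    '^(.*)mysite2.com(.*)$',
    '^(www\.|)mysite3\.com$',
];

// Approach 1: preg_match with \K, keeping only the alphanumeric run after the group.
$names = [];
foreach ($inputs as $input) {
    if (preg_match('/\^\([^)]+\)\K[A-Za-z0-9]+/', $input, $m) === 1) {
        $names[] = $m[0];
    }
}
print_r($names); // mysite1, mysite2, mysite3

// Approach 2: preg_replace with three groups, keeping the second one.
// preg_replace accepts an array of subjects and returns an array.
$replaced = preg_replace('/(\^\([^)]+\))([A-Za-z0-9]+)(.*)/', '$2', $inputs);
print_r($replaced); // mysite1, mysite2, mysite3
```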

How can I split text on a dot delimiter when some dots must not be treated as separators?

Example text:
There is an unique news in itlogic.com. I was read it when Mrs.leafa is cooking.
I want to get output like this:
Array (
[0] There is an unique news in itlogic.com.
[1] I was read it when Mrs.leafa is cooking.
)
If I use explode() with '.' as the first parameter, itlogic.com and Mrs.leafa are separated.
I think preg_split is a good tool for this as there may or may not be a space after the dot, right?
$array = preg_split("/\.(?=\s|$)/m", $Text);
Explanation:
\. Match a period
(?=\s|$) Then assert a whitespace character or end of line afterwards
See here: Click on preg_split, http://www.phpliveregex.com/p/kdz
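Note that the split pattern above consumes the period itself, so the pieces lose their trailing dot. A possible variant, which is an assumption on my part and not part of the answer, is to split on the whitespace that follows a period and use a lookbehind so the period stays in the piece:

```php
<?php
$text = 'There is an unique news in itlogic.com. I was read it when Mrs.leafa is cooking.';

// Split on whitespace that directly follows a period.
// The lookbehind (?<=\.) does not consume the period, so it stays in the piece.
$sentences = preg_split('/(?<=\.)\s+/', $text);

print_r($sentences);
// [0] => There is an unique news in itlogic.com.
// [1] => I was read it when Mrs.leafa is cooking.
```

This variant would still split after an abbreviation that is followed by a space, such as "Mrs. leafa".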
Update #2
Regex:
(?(DEFINE) # Construct a definition structure
(?<punc>[!?.]+) # Define `punc` group: one or more of `.`, `?` and `!`
) # End of definition
\b # Match a word boundary position
(?> # Open an atomic (non-capturing) group (a)
[a-z0-9] # Match a digit or a lower case letter
\w* # And any number of word characters
| # Or
[A-Z] # Match an upper case letter
\w{3,} # And three or more word characters
(?= # Followed by
(?&punc) # One or more `.`, `?` and `!` characters
) # End of positive lookahead
) # End of group (a)
(?&punc) # Match one or more `.`, `?` and `!` characters
\K\B\s* # Reset match, assert a non-word-boundary position + any number of whitespaces
PHP code (assuming $RE holds the regex above, wrapped in delimiters with the x flag):
$str = 'There is an unique news in itlogic.com. I was read it when Mrs. leafa is cooking.';
print_r(preg_split($RE, $str, -1, PREG_SPLIT_NO_EMPTY));
Outputs:
Array
(
[0] => There is an unique news in itlogic.com.
[1] => I was read it when Mrs. leafa is cooking.
)
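For completeness, here is a self-contained sketch of the snippet above with $RE defined. The ~ delimiters and the compact layout without comments are my choices, not the answer's; the x flag is needed so that the whitespace inside the pattern is ignored.

```php
<?php
// The regex from above, without comments, wrapped in ~ delimiters.
$RE = '~
(?(DEFINE)
  (?<punc>[!?.]+)
)
\b
(?>
  [a-z0-9] \w*
  |
  [A-Z] \w{3,} (?=(?&punc))
)
(?&punc)
\K \B \s*
~x';

$str = 'There is an unique news in itlogic.com. I was read it when Mrs. leafa is cooking.';

// PREG_SPLIT_NO_EMPTY drops the empty piece produced by the match at the end of the string.
$parts = preg_split($RE, $str, -1, PREG_SPLIT_NO_EMPTY);
print_r($parts);
```

"Mrs." does not trigger a split because a capitalised word needs at least four characters to qualify, and "itlogic.com" is kept together because the dot is followed directly by a word character.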
Try this:
$s= explode('. ',$your_sentence);

Replace words in a string that have more than 6 digits (regex, PHP)

I want to replace all words in a string that contain more than 6 digits.
Example:
'my contact no is (432)(323)(322). my other number is +1239343. another one is 343as32240'
TO:
'my contact no is [removed]. my other number is [removed]. another one is [removed]'
I am aware of regex and preg_replace. Just need correct regex for this.
You can use this regex for search:
(?<=\h|^)(?:[^\h\d]*\d){6}\S*
and replace by [removed].
Breakup:
(?<=\h|^) # lookbehind to assert previous position is line start or whitespace
(?: # start of non capturing group
[^\h\d]*\d # 0 or more non-space and non-digits followed by 1 digit
) # end of non capturing group
{6} # repeat this group 6 times, so the word has at least 6 digits
\S* # followed by 0 or more non-space characters
Code:
$result = preg_replace('/(?<=\h|^)(?:[^\h\d]*\d){6}\S*/', '[removed]', $str);
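Running the replacement on the example string from the question gives the following; this is a sketch for checking the behaviour, nothing more. Two details are worth noting: {6} matches words with six or more digits (use {7} for strictly more than six), and the trailing \S* also swallows a period attached to the word, so the result differs slightly from the desired output in the question.

```php
<?php
$str = 'my contact no is (432)(323)(322). my other number is +1239343. another one is 343as32240';

$result = preg_replace('/(?<=\h|^)(?:[^\h\d]*\d){6}\S*/', '[removed]', $str);

echo $result, "\n";
// my contact no is [removed] my other number is [removed] another one is [removed]
```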

Get all pieces from string, when it begins with #

I need to get all matches in a string where a word begins with # and then contains only alphanumeric 0-9a-z characters. For example, from this string #ww#ee x##vx #ss #aa assadd #sfsd I need to get these pieces:
#ss
#aa
#sfsd
I am trying:
$str = "#ww#ee x##vx #ss #aa assadd #sfsd";
preg_match_all("#(^|\s)\#([0-9a-z]+)(\s+|$)#ui", $str, $matches);
var_dump( $matches );
But this gives only #ss and #sfsd, and skips #aa.
What would be the right pattern for this?
You can use the following regex
'~\B(?<!#)#([0-9a-z]+)(?:\s|$)~iu'
Here is a full example:
$re = '~\B(?<!#)#([0-9a-z]+)(?:\s|$)~ui';
$str = "#ww#ee x##vx #ss #aa assadd #sfsd";
preg_match_all($re, $str, $matches);
print_r($matches);
The regex explanation:
\B - match a non-word boundary location (that is, everywhere except between ^ and \w, \w and $, \W and \w, or \w and \W)
(?<!#) - fail the match if there is a # before the current location
# - a # symbol (does not have to be escaped)
([0-9a-z]+) - Group 1 (since the (...) are not escaped, they capture a subpattern and store it in a special memory slot)
(?:\s|$) - a non-capturing group (only meant to group alternatives) matching a whitespace (\s) or $.
The ui modifiers allow proper handling of Unicode strings (u) and make the pattern case insensitive (i); ~ is just the delimiter.
Note that \B requires a non-word character (or the start of the string) before #. But you do not want to match if another # precedes the #wwww-like string. Thus, we have to use the negative lookbehind (?<!#), which restricts the matches even further.
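The difference between the pattern from the question and the suggested one can be seen side by side in this short sketch. The original pattern skips #aa because its trailing (\s+|$) consumes the space that the next match would need for its leading (^|\s); \B and the lookbehind consume nothing, so the problem does not arise.

```php
<?php
$str = "#ww#ee x##vx #ss #aa assadd #sfsd";

// Pattern from the question: the trailing (\s+|$) eats the space before #aa.
preg_match_all('#(^|\s)\#([0-9a-z]+)(\s+|$)#ui', $str, $old);

// Suggested pattern: \B and (?<!#) are zero-width, so no space is needed before #.
preg_match_all('~\B(?<!#)#([0-9a-z]+)(?:\s|$)~ui', $str, $new);

print_r($old[2]); // ss, sfsd
print_r($new[1]); // ss, aa, sfsd
```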
