Match rest of string with regex - php

I have a string like this
ch:keyword
ch:test
ch:some_text
I need a regular expression that matches all of these strings; however, it must not match the following:
ch:   (ch: followed by a space, or any number of spaces)
ch: (ch: followed by nothing)
I am able to deduce the length of the string with the 'ch:' in it.
Any help would be appreciated; I am using PHP's preg_match().
Edit: I have tried this:
preg_match("/^ch:[A-Za-z_0-9]/", $str, $matches)
However, this only matches 1 character after ch:. I tried putting a * after the closing square bracket, but then it also matches ch: followed by spaces (or by nothing), which I don't want.

preg_match('/^ch:(\S+)/', $string, $matches);
print_r($matches);
\S+ is for matching 1 or more non-space characters. This should work for you.
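A quick way to check this answer against the sample lines from the question. The $lines array below is assumed test data, not something from the original post. Because the pattern is not anchored at the end, a line such as ch:abc def would still match, with only abc captured.

```php
<?php
// Assumed test data: three lines that should match, two that should not.
$lines = ['ch:keyword', 'ch:test', 'ch:some_text', 'ch:   ', 'ch:'];

$accepted = [];
foreach ($lines as $line) {
    // \S+ requires at least one non-space character right after "ch:".
    if (preg_match('/^ch:(\S+)/', $line, $matches)) {
        $accepted[] = $matches[1]; // group 1 holds the text after "ch:"
    }
}

print_r($accepted); // keyword, test, some_text
```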

Try this regular expression:
^ch:\S.*$
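A sketch contrasting this pattern with ^ch:(\S+) on a hypothetical line that contains a space after the first word. Both reject the question's unwanted cases, but they differ in how much of the line they keep:

```php
<?php
// Hypothetical input where the text after "ch:" contains a space.
$line = 'ch:some text';

// ^ch:\S.*$ only requires the first character after "ch:" to be non-space;
// .* then consumes the rest of the line, spaces included.
preg_match('/^ch:\S.*$/', $line, $whole);

// ^ch:(\S+) stops at the first whitespace character.
preg_match('/^ch:(\S+)/', $line, $word);

echo $whole[0], "\n"; // ch:some text
echo $word[1], "\n";  // some

// Both patterns reject "ch:" followed by spaces or by nothing.
var_dump(preg_match('/^ch:\S.*$/', 'ch:   ')); // int(0)
var_dump(preg_match('/^ch:\S.*$/', 'ch:'));    // int(0)
```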

$str = <<<TEXT
ch:keyword
ch:test
ch:
ch:some_text
ch: red
TEXT;
preg_match_all('|ch\:(\S+)|', $str, $matches);
echo '<pre>'; print_r($matches); echo '</pre>';
Output:
Array
(
    [0] => Array
        (
            [0] => ch:keyword
            [1] => ch:test
            [2] => ch:some_text
        )
    [1] => Array
        (
            [0] => keyword
            [1] => test
            [2] => some_text
        )
)
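The pattern above is not anchored, so it would also pick up ch: in the middle of a line. A minimal variant, assuming the input is multi-line text like the heredoc above, anchors each line with ^ and the m modifier. The extra line note ch:inline is hypothetical test data:

```php
<?php
// Same sample text as above, plus a hypothetical line where "ch:" is not at
// the start of the line.
$str = <<<TEXT
ch:keyword
ch:test
ch:
ch:some_text
ch: red
note ch:inline
TEXT;

// Unanchored: also picks up "ch:inline" in the middle of a line.
preg_match_all('/ch:(\S+)/', $str, $loose);

// With the m modifier, ^ matches at the start of every line, so only lines
// that begin with "ch:" are considered.
preg_match_all('/^ch:(\S+)/m', $str, $anchored);

print_r($loose[1]);    // keyword, test, some_text, inline
print_r($anchored[1]); // keyword, test, some_text
```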

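Try using this (anchored with ^ instead of a lookbehind, because PCRE rejects a lookbehind such as (?<! +) whose length is not fixed):
preg_match('/^ch:[^ ].*/', $str);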
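For illustration, here is what happens with the lookbehind form and with a start-anchored form. PCRE refuses to compile a lookbehind whose length is unbounded, such as (?<! +), so preg_match() returns false. The anchored pattern in this sketch is one substitute, not necessarily what the answerer intended:

```php
<?php
// A lookbehind with an unbounded quantifier cannot be compiled by PCRE, so
// preg_match() emits a warning and returns false (suppressed here with @).
$result = @preg_match('/(?<! +)ch:[^ ].*/', 'ch:keyword');
var_dump($result); // bool(false)

// Anchoring at the start of the string gives the intended behaviour.
var_dump(preg_match('/^ch:[^ ].*/', 'ch:keyword')); // int(1)
var_dump(preg_match('/^ch:[^ ].*/', 'ch: red'));    // int(0)
var_dump(preg_match('/^ch:[^ ].*/', 'ch:'));        // int(0)
```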

Related

PHP split string into integer, string and special character

I need to split strings of this format, e.g. CF12:10, into an array like the one below:
[0] => CF, [1] => 12, [2] => 10
The letter and number parts of the string can be any length. I have found PHP's preg_match function but don't know how to write a regular expression for my case. Any solution would be highly appreciated.
You could use this regex to match the individual parts:
^(\D+)(\d+):(.*)$
It matches the start of the string, one or more non-digit characters (\D+), followed by one or more digits (\d+), a colon, and then any characters after the : up to the end of the string. In PHP you can use preg_match to find all the matching groups (array_shift then removes the full match at index 0):
$input = 'CF12:10';
preg_match('/^(\D+)(\d+):(.*)$/', $input, $matches);
array_shift($matches);
print_r($matches);
Output:
Array
(
[0] => CF
[1] => 12
[2] => 10
)
Demo on 3v4l.org
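If the result is needed in more than one place, the same pattern can be wrapped in a small function. This is a sketch: the name splitCode, the named groups and the null return on invalid input are illustrative choices, not part of the original answer (the ?array return type assumes PHP 7.1 or later):

```php
<?php
/**
 * Hypothetical helper: split strings shaped like "CF12:10" into
 * [letters, number, rest]. Returns null when the input does not fit.
 */
function splitCode(string $input): ?array
{
    // Named groups make the intent explicit; numeric keys are still present.
    if (!preg_match('/^(?<prefix>\D+)(?<number>\d+):(?<rest>.*)$/', $input, $m)) {
        return null;
    }
    return [$m['prefix'], $m['number'], $m['rest']];
}

print_r(splitCode('CF12:10'));       // CF, 12, 10
print_r(splitCode('ABCD1234:5678')); // ABCD, 1234, 5678
var_dump(splitCode('12:10'));        // NULL, since there are no leading non-digits
```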
Try the following code; it may help you:
$str = 'C12:10';
$arr = preg_match('~^(.*?)(\d+):(.*)~m', $str, $matches);
array_shift($matches);
echo '<pre>';print_r($matches);
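The two answers use different prefixes, \D+ and the lazy (.*?). A small comparison, using 12:10 as a hypothetical input with no letters, shows where they diverge (the m modifier is dropped here because it has no effect on a single-line string):

```php
<?php
$strict = '/^(\D+)(\d+):(.*)$/'; // first answer: prefix must be non-digits
$lazy   = '~^(.*?)(\d+):(.*)~';  // second answer, without the m modifier

// Both agree on the example from the question.
preg_match($strict, 'CF12:10', $s1); // CF12:10, CF, 12, 10
preg_match($lazy, 'CF12:10', $l1);   // CF12:10, CF, 12, 10

// With no letters in front, \D+ refuses to match, while (.*?)
// captures an empty prefix.
$strictHit = preg_match($strict, '12:10', $s2); // 0, no match
$lazyHit   = preg_match($lazy, '12:10', $l2);   // 1, $l2[1] is ''
```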

Regular expression to extract a numeric value on a changing position within a variable string

How can I extract the numeric part of a string when most of the string can change? /data/ is always present and is followed by the relevant, variable, numeric part (in this case 123456).
differentcontentLocationhttps://example.com/api/result/13548/data/123456differentstuffincludingwhitespacesandnewlines8484
$str = "differentcontentLocationhttps://example.com/api/result/13548/data/123456differentstuffincludingwhitespacesandnewlines8484";
$str2 = "differentcontentLocationhttps://example.com/api/result/13548/data/123456";
In this example I need 123456. The only constant parts in the string are /data/ and maybe the first part of the URL, like https://.
preg_match("#/data/([0-9]+)([^0-9]+)#siU", $str, $matches);
This results in Array ( [0] => /data/123456d [1] => 123456 [2] => d ), which would be acceptable. But if nothing follows the relevant numeric part, as in $str2, this expression fails. I've tried to make the trailing part optional with preg_match("#/data/([0-9]+)(([^0-9]+)?)#siU", $str2, $matches);, but it fails too, returning only the first digit of the numeric part.
The U (ungreedy) modifier swaps greediness, so it makes every greedy subpattern lazy here: ([0-9]+) stops after a single digit. You should remove it together with ([^0-9]+). You also do not need the s (DOTALL) modifier, because there is no . in your pattern whose behavior that flag could change.
preg_match("#/data/([0-9]+)#i", $str, $matches);
Now, the pattern will match:
/data/ - a sequence of literal chars
([0-9]+) - Group 1 capturing 1+ digits (same as (\d+))
See the PHP demo.
$str = "differentcontentLocationhttps://example.com/api/result/13548/data/123456differentstuffincludingwhitespacesandnewlines8484";
$str2 = "differentcontentLocationhttps://example.com/api/result/13548/data/123456";
preg_match("#/data/([0-9]+)#i", $str, $matches);
print_r($matches); // Array ( [0] => /data/123456 [1] => 123456 )
preg_match("#/data/([0-9]+)#i", $str2, $matches2);
print_r($matches2); // Array ( [0] => /data/123456 [1] => 123456 )
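To see the effect of the U modifier described above, the sketch below runs the question's second attempt (written against /data/ and $str2, assuming /ads/ and $x in the question were slips) next to the fixed pattern:

```php
<?php
$str2 = "differentcontentLocationhttps://example.com/api/result/13548/data/123456";

// The question's attempt: with U, [0-9]+ becomes lazy and stops after one
// digit, and the optional trailing group is satisfied by matching nothing.
preg_match("#/data/([0-9]+)(([^0-9]+)?)#siU", $str2, $lazy);
echo $lazy[1], "\n"; // 1

// Without U, [0-9]+ is greedy again and takes every digit.
preg_match("#/data/([0-9]+)#", $str2, $greedy);
echo $greedy[1], "\n"; // 123456
```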

How to split a string into an array using a given regex expression

I am trying to explode / preg_split a string so that I get an array of all the values that are enclosed in ( ). I've tried the following code, but I always get an empty array. I have tried many things but can't seem to get it right.
Could anyone spot what I am missing to get my desired output?
$pattern = "/^\(.*\)$/";
$string = "(y3,x3),(r4,t4)";
$output = preg_split($pattern, $string);
print_r($output);
Current output Array ( [0] => [1] => )
Desired output Array ( [0] => "(y3,x3)," [1] => "(r4,t4)" )
With preg_split() your regex should be matching the delimiters within the string to split the string into an array. Your regex is currently matching the values, and for that, you can use preg_match_all(), like so:
$pattern = "/\(.*?\)/";
$string = "(y3,x3),(r4,t4)";
preg_match_all($pattern, $string, $output);
print_r($output[0]);
This outputs:
Array
(
[0] => (y3,x3)
[1] => (r4,t4)
)
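As for why the question's preg_split() call returned two empty strings: its pattern is anchored with ^ and $, so it matches the entire input, and preg_split() returns whatever lies on either side of that single delimiter. The sketch below also shows why this answer uses the lazy .*? rather than .*:

```php
<?php
$string = "(y3,x3),(r4,t4)";

// The question's pattern matches the entire string, so preg_split() treats
// the whole string as one delimiter and returns the empty text on each side.
$parts = preg_split("/^\(.*\)$/", $string);
var_dump($parts); // two empty strings

// A greedy \(.*\) runs from the first "(" to the last ")", so it returns one
// element instead of two; \(.*?\) stops at the first ")" each time.
preg_match_all("/\(.*\)/", $string, $greedy);
print_r($greedy[0]); // one element: (y3,x3),(r4,t4)
```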
If you want to use preg_split(), you need to match the , in ),( without consuming the parentheses, like so:
$pattern = "/(?<=\)),(?=\()/";
$string = "(y3,x3),(r4,t4)";
$output = preg_split($pattern, $string);
print_r($output);
This uses a positive lookbehind and a positive lookahead to find the , between the two parenthesized groups and split on it. It produces the same output as above.
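You can use a simpler regex like \B,\B to split the string, which may also perform better because it avoids lookahead and lookbehind.
\B is a non-word boundary. Since , is itself a non-word character, \B,\B matches a comma only when the characters on both sides of it are non-word characters too; in this input that is only the , between ) and (.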
Here is a working example:
http://regex101.com/r/cV7bO7/1
$pattern = "/\B,\B/";
$string = "(y3,x3),(r4,t4),(r5,t5)";
$result = preg_split($pattern, $string);
$result will contain:
Array
(
[0] => (y3,x3)
[1] => (r4,t4)
[2] => (r5,t5)
)
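The two preg_split() patterns are not equivalent on every input. In this sketch, with a hypothetical space after the separating comma, \B,\B still splits (a space is also a non-word character) while the lookaround version does not:

```php
<?php
// Hypothetical input with a space after the separating comma.
$string = "(y3,x3), (r4,t4)";

// \B,\B only needs non-word characters on both sides of the comma, and a
// space counts, so it still splits (leaving a leading space on item two).
$byBoundary = preg_split("/\B,\B/", $string);

// The lookaround version requires "),(" exactly, so it does not split here.
$byLookaround = preg_split("/(?<=\)),(?=\()/", $string);

var_dump($byBoundary);   // "(y3,x3)", " (r4,t4)"
var_dump($byLookaround); // "(y3,x3), (r4,t4)"
```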

PHP preg_match_all $matches output contains 3 rows

Here is my test code:
$test = '#12345 abc #12 #abd engng#geneng';
preg_match_all('/(^|\s)#([^# ]+)/', $test, $matches);
print_r($matches);
And the output $matches:
Array ( [0] => Array ( [0] => #12345 [1] => #12 [2] => #abd ) [1] => Array ( [0] => [1] => [2] => ) [2] => Array ( [0] => 12345 [1] => 12 [2] => abd ) )
My question is why does it have an empty row?
[1] => Array ( [0] => [1] => [2] => )
If I get rid of (^|\s) in the regex, the second row disappears. However, I would then not be able to prevent matching #geneng.
Any answer will be appreciated.
The extra row comes from (^|\s): it is a capturing group, so whatever it matches (the empty string at the start of the subject, or the whitespace before #) is stored in the $matches array. You can avoid this with a lookaround; in this case, a positive lookbehind does the job:
preg_match_all('/(?<=^|\s)#([^# ]+)/', $test, $matches);
This matches # and the text after it only if the # is preceded by whitespace or sits at the start of the string (^). It's important to note that lookarounds do not actually consume characters; they just assert that the current position is preceded or followed by something matching the given subpattern.
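Running the lookbehind version on the question's $test string gives only two rows. PCRE accepts this lookbehind because each top-level alternative (^ and \s) has a fixed length, even though the two lengths differ:

```php
<?php
$test = '#12345 abc #12 #abd engng#geneng';

// The lookbehind checks for start-of-string or whitespace without capturing
// or consuming it, so only the whole match and group 1 remain.
preg_match_all('/(?<=^|\s)#([^# ]+)/', $test, $matches);

print_r($matches[0]); // #12345, #12, #abd
print_r($matches[1]); // 12345, 12, abd
```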
Demo
It's because (^|\s) in your pattern is a capturing group:
preg_match_all('/(^|\s)#([^# ]+)/', $test, $matches);
It is stored as capture group 1, so to avoid that you can simply make it non-capturing by adding ?: right after the opening parenthesis:
preg_match_all('/(?:^|\s)#([^# ]+)/', $test, $matches);
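With the non-capturing group the empty row disappears, but note one difference from the lookbehind approach: the whitespace is still consumed, so it remains part of the whole-pattern match in $matches[0]. A sketch on the question's $test string:

```php
<?php
$test = '#12345 abc #12 #abd engng#geneng';

// (?:...) groups the alternation without creating a capture group.
preg_match_all('/(?:^|\s)#([^# ]+)/', $test, $matches);

// The whitespace is no longer captured on its own, but it is still part of
// the overall match, so $matches[0] keeps the leading space.
var_dump($matches[0]); // "#12345", " #12", " #abd"
var_dump($matches[1]); // "12345", "12", "abd"
```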
preg_match_all uses the PREG_PATTERN_ORDER flag by default. This means that you will obtain:
$matches[0] -> all substrings that match the whole pattern
$matches[1] -> all capture groups 1
$matches[2] -> all capture groups 2
etc.
You can change this behavior using the PREG_SET_ORDER flag:
$matches[0] -> array with the whole pattern and the capture groups for the first result
$matches[1] -> same for the second result
$matches[2] -> etc.
In your code (PREG_PATTERN_ORDER by default), $matches[1] contains only empty or whitespace items, because it holds what capture group 1, (^|\s), matched.
There are two sets of capturing parentheses, which is why you get the extra row: each capturing group gets its own array in $matches. Removing one of them (or making it non-capturing) removes one array.
FYI: in this case you cannot use [^|\s] instead of (^|\s), because inside a character class a leading ^ means negation: [^|\s] matches any single character that is not | or whitespace.
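A sketch comparing both flags on the question's original pattern, showing where the empty or whitespace entries from (^|\s) end up in each layout:

```php
<?php
$test = '#12345 abc #12 #abd engng#geneng';
$pattern = '/(^|\s)#([^# ]+)/'; // the question's original pattern

// Default (PREG_PATTERN_ORDER): one array per group.
preg_match_all($pattern, $test, $byPattern);
// $byPattern[1] holds what (^|\s) matched: "", " ", " "

// PREG_SET_ORDER: one array per match, each holding [whole, group 1, group 2].
preg_match_all($pattern, $test, $bySet, PREG_SET_ORDER);
// $bySet[1] is [" #12", " ", "12"]

var_dump($byPattern[1], $bySet[1]);
```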

Regular expression repeat inside a pattern

I have the following text and I would like to preg_match_all what is within the {'s and }'s if it contains only a-zA-Z0-9 and :
some text,{SOMETHING21} {SOMETHI32NG:MORE}some msdf{TEXT:GET:2}sdfssdf sdf sdf
I am trying to match {SOMETHING21}, {SOMETHI32NG:MORE} and {TEXT:GET:2}; there can be several :'s within a tag.
What I currently have is:
preg_match_all('/\{([a-zA-Z0-9\-]+)(\:([a-zA-Z0-9\-]+))*\}/', $from, $matches, PREG_SET_ORDER);
It works as expected for {SOMETHING21} and {SOMETHI32NG:MORE} but for {TEXT:GET:2} it only matches TEXT and 2
So it only matches the first and last word within the tag, and leaves the middle ones out of the $matches array. Is this even possible or should I just match them and then explode on : ?
-- edit --
Well, the question isn't whether I can get the tags; it is whether I can get them grouped without having to explode the results again. Even though my current regex finds all the results, the subpattern does not return all of its matches in $matches.
I hope the following will clear it up a bit more:
\{ // the match has to start with {
([a-zA-Z0-9\-]+) // after the { the match needs to have alphanum consisting of 1 or more characters
(
\: // if we have : it should be followed by alphanum consisting of 1 or more characters
([a-zA-Z0-9\-]+) // <---- !! this is what it is about !! even though this subexpression is between brackets it is not put into $matches if more than one of these is found
)* // there could be none or more of the previous subexpression
\} // the match has to end with }
You can't get all the matched values of a repeated capturing group; you only get the last one.
So you have to match the pattern:
preg_match_all('/{([a-z\d-]+(?::[a-z\d-]+)*)}/i', $from, $matches);
and then split each element in $matches[1] on :.
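A sketch of both points on the question's $from string: the repeated group keeping only its last value, and the match-then-explode approach. The arrow function assumes PHP 7.4 or later:

```php
<?php
$from = "some text,{SOMETHING21} {SOMETHI32NG:MORE}some msdf{TEXT:GET:2}sdfssdf sdf sdf";

// A repeated capturing group only keeps its last repetition.
preg_match('/\{([a-zA-Z0-9\-]+)(\:([a-zA-Z0-9\-]+))*\}/', '{TEXT:GET:2}', $one);
// $one is ["{TEXT:GET:2}", "TEXT", ":2", "2"]; "GET" is lost.

// Match each whole tag body first, then split it on ":".
preg_match_all('/{([a-z\d-]+(?::[a-z\d-]+)*)}/i', $from, $matches);
$parts = array_map(fn (string $tag): array => explode(':', $tag), $matches[1]);

print_r($parts); // [SOMETHING21], [SOMETHI32NG, MORE], [TEXT, GET, 2]
```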
I used non-capture groupings to eliminate the inner groups, and just capture the outer complete colon-separated list.
$from = "some text,{SOMETHING21} {SOMETHI32NG:MORE}some msdf{TEXT:GET:2}sdfssdf sdf sdf";
preg_match_all('/\{((?:[a-zA-Z0-9\-]+)(?:\:(?:[a-zA-Z0-9\-]+))*)\}/', $from, $matches, PREG_SET_ORDER);
print_r($matches);
Result:
Array
(
    [0] => Array
        (
            [0] => {SOMETHING21}
            [1] => SOMETHING21
        )
    [1] => Array
        (
            [0] => {SOMETHI32NG:MORE}
            [1] => SOMETHI32NG:MORE
        )
    [2] => Array
        (
            [0] => {TEXT:GET:2}
            [1] => TEXT:GET:2
        )
)
Maybe I didn't understand the requirement, but...
preg_match_all('/{[A-Za-z0-9:-]+}/', $from, $matches, PREG_PATTERN_ORDER);
results in:
Array
(
    [0] => Array
        (
            [0] => {SOMETHING21}
            [1] => {SOMETHI32NG:MORE}
            [2] => {TEXT:GET:2}
        )
)
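This simpler character class does not check where the colons go, so it also accepts tags that the stricter patterns reject. A sketch, with {A::B} and {:A} as hypothetical inputs:

```php
<?php
// Stricter form: words separated by single colons (non-capturing groups).
$strict = '/\{[a-zA-Z0-9\-]+(?:\:[a-zA-Z0-9\-]+)*\}/';
// The character-class form from this answer.
$loose = '/{[A-Za-z0-9:-]+}/';

foreach (['{TEXT:GET:2}', '{A::B}', '{:A}'] as $tag) {
    printf("%-14s strict=%d loose=%d\n", $tag, preg_match($strict, $tag), preg_match($loose, $tag));
}
// {TEXT:GET:2}: both match; {A::B} and {:A}: only the loose pattern matches.
```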
