PHP preg_match_all grab info between matches

I can't figure this out. How do I grab the information between the regex matches?
My issue seems to be that there are newlines in the string. If I compress it to one line per "Title", some of my attempts work.
I want an output that looks like this:
Array
(
    [0] => Array
        (
            [0] => Title1#
            [1] => - contenta
- contentb
        )
    [1] => Array
        (
            [0] => Sometitle2#
            [1] => - contenta
- contentb
        )
    [2] => Array
        (
            [0] => ABC3#
            [1] => - asdfasdfasdf
- random stuff
more
something
        )
)
Here are some of my attempts so far (I even tried some preg_split here), with the example string.
<?php
$str = 'Title1#
-contenta
-contentb
Sometitle2#
-contenta
-contentb
ABC3#
- asdfasdfasdf
- random stuff
more
something';
$re = '/[A-Za-z]{1,10}[0-9]?#\s?(.*\s)/m';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
print_r($matches);
$re = '/([A-Za-z]{1,10}[0-9]?#\s?)/m';
// -1 = no limit; passing null for the limit is deprecated as of PHP 8.1
$keywords = preg_split($re, $str, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($keywords);
$parts = preg_split('/([A-Za-z]{1,10}[0-9]?#\s?)/m', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($parts);
?>
Thanks!

You may use this regex in preg_match_all:
$re = '~(?ms)^([^#\n]+#)\s+(.*?(?=\n+[^#\n]*#\s|\z))~';
RegEx Details:
(?ms): Enable MULTILINE and DOTALL modes
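^: Line start
([^#\n]+#)\s+: First capture group: a line that ends with #; the \s+ after the group consumes the whitespace (newline) that follows it
(.*?(?=\n+[^#\n]*#\s|\z)): Second capture group: lazily match 0 or more characters (including newlines) until a lookahead sees either the next title line (text ending in # followed by whitespace) or the end of the string (\z)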
Code:
$re = '/(?ms)^([^#\n]+#)\s+(.*?(?=\n+[^#\n]*#\s|\z))/';
$str = 'Title1#
-contenta
-contentb
Sometitle2#
-contenta
-contentb
ABC3#
- asdfasdfasdf
- random stuff
more
something';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
// Print the entire match result
var_dump($matches);
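If you want exactly the [title, content] pairs shown in the question, one option is to drop the full-match element [0] from each PREG_SET_ORDER entry. This is a minimal sketch, assuming PHP 7.4+ (for the arrow function) and Unix (\n) line endings in the input:

```php
<?php
// Sketch: reshape the PREG_SET_ORDER result into [title, content] pairs.
$re = '/(?ms)^([^#\n]+#)\s+(.*?(?=\n+[^#\n]*#\s|\z))/';

// Same sample text as above, written with \n escapes (assumes Unix line endings).
$str = "Title1#\n-contenta\n-contentb\nSometitle2#\n-contenta\n-contentb\n"
     . "ABC3#\n- asdfasdfasdf\n- random stuff\nmore\nsomething";

preg_match_all($re, $str, $matches, PREG_SET_ORDER);

// Each $m is [full match, title, content]; keep only the two capture groups.
$result = array_map(fn(array $m): array => [$m[1], $m[2]], $matches);

print_r($result);
```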

Related

php preg_match_all simple regex returns empty values

I need to extract a predefined set of hashtags from a blob of text, then extract the number that follows right after each one, if any. E.g., I'd need to extract 30 from "Test string with #other30 hashtag". I assumed preg_match_all would be the right choice.
Some test code:
$hashtag = '#other';
$string = 'Test string with #other30 hashtag';
$matches = [];
preg_match_all('/' . $hashtag . '\d*/', $string, $matches);
print_r($matches);
Output:
Array
(
    [0] => Array
        (
            [0] => #other30
        )
)
Perfect... Works as expected. Now to extract the number:
$string = $matches[0][0]; // #other30
$matches = [];
preg_match_all('/\d*/', $string, $matches);
print_r($matches);
Output:
Array
(
    [0] => Array
        (
            [0] =>
            [1] =>
            [2] =>
            [3] =>
            [4] =>
            [5] =>
            [6] => 30
            [7] =>
        )
)
What? Looks like it's trying to match every character?
I'm aware of some preg_match_all related answers (one, two), but they all use a parenthesized subpattern. According to the documentation, it is optional.
What am I missing? How do I simply get all matches of such a basic regex as /\d*/ into an array? There doesn't seem to be a more appropriate function in PHP for that.
I never thought I'd be scratching my head with such a basic thing in PHP. Much appreciated.
You need to replace:
preg_match_all('/\d*/', $string, $matches);
with:
preg_match_all('/\d+/', $string, $matches);
Replace * with +
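Because:
* matches zero or more times, so it also matches the empty string at every position where no digit starts.
+ matches one or more times, so every match contains at least one digit.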
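A quick way to see the difference is to run both patterns on the same string. This sketch reuses the #other30 value from the question:

```php
<?php
$string = '#other30';

// \d* can match the empty string, so the engine reports an (empty) match
// at every position where no digit starts, plus one at the very end.
preg_match_all('/\d*/', $string, $star);

// \d+ requires at least one digit, so only the real number is returned.
preg_match_all('/\d+/', $string, $plus);

print_r($star[0]);
print_r($plus[0]);
```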
You can use a capturing group:
preg_match_all('/' . $hashtag . '(\d*)/', $string, $matches);
echo $matches[1][0] . "\n";
//=> 30
Here (\d*) will capture the number after $hashtag.
Also note that you can use \K to reset the start of the reported match, so that only the part after it is returned. And of course you need \d+ instead of \d* to match one or more digits; otherwise \d* also matches the empty gaps between characters, where zero digits occur.
So your code can be reduced to
$hashtag = '#other';
$string = 'Test string with #other30 #other31 hashtag';
preg_match_all('/' . $hashtag . '\K\d+/', $string, $matches);
print_r($matches[0]);
See the demo at eval.in and consider using preg_quote for $hashtag.
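A sketch of the preg_quote() suggestion, assuming the pattern keeps / as its delimiter (so / is passed as preg_quote()'s second argument). Quoting matters once the tag contains characters such as ., + or /, which would otherwise be read as regex syntax:

```php
<?php
$hashtag = '#other';
$string = 'Test string with #other30 #other31 hashtag';

// preg_quote() escapes regex metacharacters in $hashtag; the second
// argument also escapes the '/' delimiter used below.
$pattern = '/' . preg_quote($hashtag, '/') . '\K\d+/';

// \K drops the tag from the reported match, leaving only the digits.
preg_match_all($pattern, $string, $matches);
print_r($matches[0]);
```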
<?php
$hashtag = '#other';
$string = 'Test string with #other30 hashtag';
$matches = [];
preg_match_all('/' . $hashtag . '\d*/', $string, $matches);
// preg_match_all() returns the number of matches; the matches themselves go into $m
preg_match_all('#\d+#', $matches[0][0], $m);
echo $m[0][0];
?>

How to get Variable from Regex in PHP?

How can I get only the name/variable that is "regexed", like the $1 or $0 in the anchor's href in this case?
When I try to echo $1 or $0, I get a syntax error because it's a number.
At the moment, $str is a whole text.
function convertHashtags($str){
$regex = "/#+([a-zA-Z0-9_]+)/";
$str = preg_replace($regex, '$0', $str);
return($str);
}
Simply use preg_match before preg_replace, e.g.:
preg_match($regex, $str, $matches);
Assuming the pattern actually matched, you should have the results in $matches[0] and $matches[1] which are the equivalent of $0 and $1 in the replace string.
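For example, with the pattern from the question and some made-up sample text:

```php
<?php
$regex = "/#+([a-zA-Z0-9_]+)/";
$str = 'Hello #world, this is #php'; // hypothetical sample text

// preg_match() stops at the first match; $matches[0] holds what $0 refers
// to in a replacement string and $matches[1] holds what $1 refers to.
if (preg_match($regex, $str, $matches)) {
    echo $matches[0], "\n"; // prints "#world"
    echo $matches[1], "\n"; // prints "world"
}
```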
FYI, the $n tokens in the replacement string are not variables, though I can see how that can be confusing. They are simply references to the matched groups (or to the entire match, in the case of $0) in the regex.
See http://php.net/manual/function.preg-replace.php#refsect1-function.preg-replace-parameters
To find multiple matches in $str, use preg_match_all(). It works almost the same way, except that it populates $matches with a collection of all the matches. Pass the PREG_SET_ORDER flag as the 4th argument to group the results by match, which makes the array easier to work with. For example...
$str = ' xD #lol and #testing';
$regex = '/#(\w+)/';
preg_match_all($regex, $str, $allMatches, PREG_SET_ORDER);
print_r($allMatches);
produces...
Array
(
    [0] => Array
        (
            [0] => #lol
            [1] => lol
        )
    [1] => Array
        (
            [0] => #testing
            [1] => testing
        )
)

Get array of usernames from #Twitter like string

How do I get an array of usernames from a string tagged like in Twitter with the '#' prefix using regex or similar?
For example:
Input:
hello #person my name is #joebloggs
Output (array):
['person', 'joebloggs']
Another solution
#[^\s]+
Usage:
$string = 'hello #person my name is #joebloggs';
$pattern = '/#[^\s]+/';
preg_match_all($pattern, $string, $matches);
print_r($matches[0]);
Output:
Array
(
[0] => #person
[1] => #joebloggs
)
Do this:
$regex = '~#\K\S+~';
preg_match_all($regex, $yourstring, $matches);
print_r($matches[0]);
Explanation
# matches the # character (but it will not be returned)
The \K tells the engine to drop what was matched so far from the final match it returns
\S+ matches one or more non-whitespace characters
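Applied to the input from the question (assigned to $string here instead of the placeholder $yourstring), the \K pattern returns the bare names:

```php
<?php
$string = 'hello #person my name is #joebloggs';

// \K drops the '#' from the reported match, so $matches[0] holds bare names.
preg_match_all('~#\K\S+~', $string, $matches);

print_r($matches[0]);
```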
Use this:
<?php
$re = "/(?<=#)[^\s]+/";
$str = "asdasd asda 232 #asdasd sd232 soi #other asdnasda asjdajh #asdasd";
preg_match_all($re, $str, $matches);
print_r($matches);
Demo here: https://eval.in/173103
Output:
Array
(
    [0] => Array
        (
            [0] => asdasd
            [1] => other
            [2] => asdasd
        )
)

How to split a string into an array using a given regex expression

I am trying to explode / preg_split a string so that I get an array of all the values that are enclosed in ( ). I've tried the following code, but I always get an empty array. I have tried many things, but I can't seem to get it right.
Could anyone spot what I am missing to get my desired output?
$pattern = "/^\(.*\)$/";
$string = "(y3,x3),(r4,t4)";
$output = preg_split($pattern, $string);
print_r($output);
Current output: Array ( [0] => [1] => )
Desired output: Array ( [0] => "(y3,x3)," [1] => "(r4,t4)" )
With preg_split() your regex should be matching the delimiters within the string to split the string into an array. Your regex is currently matching the values, and for that, you can use preg_match_all(), like so:
$pattern = "/\(.*?\)/";
$string = "(y3,x3),(r4,t4)";
preg_match_all($pattern, $string, $output);
print_r($output[0]);
This outputs:
Array
(
[0] => (y3,x3)
[1] => (r4,t4)
)
If you want to use preg_split(), you would want to match the , between ),( without consuming the parentheses, like so:
$pattern = "/(?<=\)),(?=\()/";
$string = "(y3,x3),(r4,t4)";
$output = preg_split($pattern, $string);
print_r($output);
This uses a positive lookbehind and a positive lookahead to find the , between the two parenthesized groups and split on it. It produces the same output as above.
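You can use a simpler regex like \B,\B to split the string; it avoids lookahead and lookbehind, which may also make it slightly faster.
\B is a non-word boundary, so it matches only the , between ) and (: both of its neighbours are non-word characters, while each comma inside (y3,x3) has a word character next to it. This relies on that shape of the data; a comma between two non-word characters inside the parentheses would also be split on.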
Here is a working example:
http://regex101.com/r/cV7bO7/1
$pattern = "/\B,\B/";
$string = "(y3,x3),(r4,t4),(r5,t5)";
$result = preg_split($pattern, $string);
$result will contain:
Array
(
[0] => (y3,x3)
[1] => (r4,t4)
[2] => (r5,t5)
)
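A small sketch that runs the three patterns from this thread side by side on the three-pair string, to confirm they return the same array:

```php
<?php
$string = '(y3,x3),(r4,t4),(r5,t5)';

// 1. Match the values themselves (lazy .*? stops at the first ')').
preg_match_all('/\(.*?\)/', $string, $m);
$byMatch = $m[0];

// 2. Split on a comma that sits between ')' and '(' (lookarounds).
$byLookaround = preg_split('/(?<=\)),(?=\()/', $string);

// 3. Split on a comma with no word boundary on either side.
$byNonBoundary = preg_split('/\B,\B/', $string);

// Both comparisons print bool(true).
var_dump($byMatch === $byLookaround, $byLookaround === $byNonBoundary);
```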

Match rest of string with regex

I have a string like this
ch:keyword
ch:test
ch:some_text
I need a regular expression which will match all of the strings; however, it must not match the following:
ch: (ch: followed by a space, or any number of spaces)
ch: (ch: followed by nothing)
I am able to deduce the length of the string with the 'ch:' in it.
Any help would be appreciated; I am using PHP's preg_match().
Edit: I have tried this:
preg_match("/^ch:[A-Za-z_0-9]/", $str, $matches)
However, this only matches 1 character after the string. I tried putting a * after the closing square bracket, but this matches spaces, which I don't want.
preg_match('/^ch:(\S+)/', $string, $matches);
print_r($matches);
\S+ is for matching 1 or more non-space characters. This should work for you.
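A quick way to check the anchored pattern against both the valid strings and the two forms the question wants rejected (the input list below is made up for illustration):

```php
<?php
// Hypothetical inputs: three valid strings plus "ch: " and "ch:", which must be rejected.
$inputs = ['ch:keyword', 'ch:test', 'ch:some_text', 'ch: ', 'ch:'];

$accepted = [];
foreach ($inputs as $input) {
    // ^ch: anchors the prefix; \S+ needs at least one non-whitespace char right after it.
    if (preg_match('/^ch:(\S+)/', $input, $m)) {
        $accepted[] = $m[1];
    }
}

print_r($accepted);
```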
Try this regular expression:
^ch:\S.*$
$str = <<<TEXT
ch:keyword
ch:test
ch:
ch:some_text
ch: red
TEXT;
preg_match_all('|ch\:(\S+)|', $str, $matches);
echo '<pre>'; print_r($matches); echo '</pre>';
Output:
Array
(
    [0] => Array
        (
            [0] => ch:keyword
            [1] => ch:test
            [2] => ch:some_text
        )
    [1] => Array
        (
            [0] => keyword
            [1] => test
            [2] => some_text
        )
)
Try using this:
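// PCRE lookbehinds must have a bounded length, so (?<! +) does not compile;
// checking the single character before ch: with (?<! ) is enough
preg_match('/(?<! )ch:[^ ].*/', $str);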
