PHP: preg_match `eats` my string - php

Parsing code
$str = 'My name is Michael. I am a sportsman!';
preg_match('|My name is (.*?)\. I am a (.*?)|', $str, $m);
print_r($m);
returns this array:
Array ( [0] => My name is Michael. I am a [1] => Michael [2] => )
Where is sportsman?

That's because the second (.*?) is lazy and nothing follows it in the pattern, so it is satisfied by matching as little as possible, which is the empty string. You should add the end-of-string anchor, like this:
preg_match('|My name is (.*?)\. I am a (.*?)$|', $str, $m);
You could also make the second expression greedy:
preg_match('|My name is (.*?)\. I am a (.*)|', $str, $m);
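As a quick check of the explanation above, this minimal sketch runs the three variants side by side on the same input. The variable names $lazy, $anchored and $greedy are only illustrative.

```php
<?php
$str = 'My name is Michael. I am a sportsman!';

// Lazy group at the very end of the pattern: it can stop immediately,
// so it captures the empty string.
preg_match('|My name is (.*?)\. I am a (.*?)|', $str, $lazy);

// Same lazy group, but the $ anchor forces it to extend to the end.
preg_match('|My name is (.*?)\. I am a (.*?)$|', $str, $anchored);

// Greedy group: consumes everything that is left.
preg_match('|My name is (.*?)\. I am a (.*)|', $str, $greedy);

var_dump($lazy[2]);     // string(0) ""
var_dump($anchored[2]); // string(10) "sportsman!"
var_dump($greedy[2]);   // string(10) "sportsman!"
```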

Related

PHP preg_match_all grab info between matches

I can't figure this out. How do I grab the information between the regex matches?
My issue seems to be that there are newlines in the string. If I compress it to one line per "Title", some of my attempts work.
I want an output that looks like this:
Array
(
[0] => Array
(
[0] => Title1#
[1] => - contenta
- contentb
)
[1] => Array
(
[0] => Sometitle2#
[1] => - contenta
- contentb
)
[2] => Array
(
[0] => ABC3#
[1] => - asdfasdfasdf
- random stuff
more
something
)
)
Here are some of my attempts so far (I even tried some preg_split here), with example the string.
<?php
$str = 'Title1#
-contenta
-contentb
Sometitle2#
-contenta
-contentb
ABC3#
- asdfasdfasdf
- random stuff
more
something';
$re = '/[A-Za-z]{1,10}[0-9]?#\s?(.*\s)/m';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
print_r($matches);
$re = '/([A-Za-z]{1,10}[0-9]?#\s?)/m';
$keywords = preg_split($re, $str, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($keywords);
$parts = preg_split('/([A-Za-z]{1,10}[0-9]?#\s?)/m', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($parts);
?>
Thanks!
You may use this regex in preg_match_all:
$re = '~(?ms)^([^#\n]+#)\s+(.*?(?=\n+[^#\n]*#\s|\z))~';
RegEx Demo
RegEx Details:
(?ms): Enable MULTILINE and DOTALL modes
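^: Line start
([^#\n]+#)\s+: First capture group. Matches a title: one or more characters other than # or a newline, followed by #. The \s+ after the group consumes the whitespace that follows the title.
(.*?(?=\n+[^#\n]*#\s|\z)): Second capture group. Lazily matches any characters up to the point where either the next title line (newlines followed by text ending in #) or the end of the string (\z) lies ahead.
Code: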
$re = '/(?ms)^([^#\n]+#)\s+(.*?(?=\n+[^#\n]*#\s|\z))/';
$str = 'Title1#
-contenta
-contentb
Sometitle2#
-contenta
-contentb
ABC3#
- asdfasdfasdf
- random stuff
more
something';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
// Print the entire match result
var_dump($matches);
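With PREG_SET_ORDER, each entry of $matches holds the full match at index 0, the title at index 1 and the content at index 2. A minimal sketch of reshaping that into the title/content pairs the question asked for could look like this. The $pairs name is an assumption for illustration, and the sample string is written with \n escapes instead of literal line breaks.

```php
<?php
$re  = '/(?ms)^([^#\n]+#)\s+(.*?(?=\n+[^#\n]*#\s|\z))/';
$str = "Title1#\n-contenta\n-contentb\nSometitle2#\n-contenta\n-contentb\n"
     . "ABC3#\n- asdfasdfasdf\n- random stuff\nmore\nsomething";

preg_match_all($re, $str, $matches, PREG_SET_ORDER);

// Keep only the two capture groups: [title, content].
$pairs = array_map(function (array $m) {
    return [$m[1], $m[2]];
}, $matches);

print_r($pairs);
```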

Regular expression to extract a numeric value on a changing position within a variable string

How can I extract the numeric part of a string that follows /data/ when most of the string can change? /data/ is always present and followed by the relevant, variable, numeric part (in this case 123456).
differentcontentLocationhttps://example.com/api/result/13548/data/123456differentstuffincludingwhitespacesandnewlines8484
$str = "differentcontentLocationhttps://example.com/api/result/13548/data/123456differentstuffincludingwhitespacesandnewlines8484";
$str2 = "differentcontentLocationhttps://example.com/api/result/13548/data/123456";
In this example I need 123456. The only constant parts in the string are /data/ and maybe the first part of the URL, like https://.
preg_match("#/data/([0-9]+)([^0-9]+)#siU", $str, $matches);
This results in Array ( [0] => /data/123456d [1] => 123456 [2] => d ), which would be acceptable. But if nothing follows the relevant numeric part, as in $str2, this expression fails. I've tried to make the trailing part optional with preg_match("#/data/([0-9]+)(([^0-9]+)?)#siU", $str2, $matches);, but it fails, too, returning only the first digit of the numeric part.
The U modifier swaps greediness, so it makes every greedy subpattern lazy here. You should remove it together with ([^0-9]+). You also do not need the DOTALL modifier, because there is no . in your pattern whose behavior the s flag could change.
preg_match("#/data/([0-9]+)#i", $str, $matches);
Now, the pattern will match:
/data/ - a sequence of literal chars
([0-9]+) - Group 1 capturing 1+ digits (same as (\d+))
See the PHP demo.
$str = "differentcontentLocationhttps://e...content-available-to-author-only...e.com/api/result/13548/data/123456differentstuffincludingwhitespacesandnewlines8484";
$str2 = "differentcontentLocationhttps://e...content-available-to-author-only...e.com/api/result/13548/data/123456";
preg_match("#/data/([0-9]+)#i", $str, $matches);
print_r($matches); // Array ( [0] => /data/123456 [1] => 123456 )
preg_match("#/data/([0-9]+)#i", $str2, $matches2);
print_r($matches2); // Array ( [0] => /data/123456 [1] => 123456 )
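To make the effect of the U modifier concrete, here is a small sketch that runs the question's optional-tail attempt and the corrected pattern against $str2. The names $withU and $withoutU are only illustrative.

```php
<?php
$str2 = "differentcontentLocationhttps://example.com/api/result/13548/data/123456";

// With U, [0-9]+ turns lazy and stops after one digit; the optional
// tail is lazy as well, so it matches nothing and the match ends there.
preg_match("#/data/([0-9]+)(([^0-9]+)?)#siU", $str2, $withU);

// Without U, [0-9]+ is greedy and takes the whole run of digits.
preg_match("#/data/([0-9]+)#i", $str2, $withoutU);

var_dump($withU[1]);    // string(1) "1"
var_dump($withoutU[1]); // string(6) "123456"
```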

How to get Variable from Regex in PHP?

How can I get only the name that the regex matched, like the $1 or $0 in the anchor's href in this case?
When I try to echo $1 or $0, I get a syntax error because the name is a number.
At the moment, $str is the whole text.
function convertHashtags($str){
$regex = "/#+([a-zA-Z0-9_]+)/";
$str = preg_replace($regex, '$0', $str);
return($str);
}
Simply use preg_match before preg_replace, e.g.:
preg_match($regex, $str, $matches);
Assuming the pattern actually matched, you should have the results in $matches[0] and $matches[1] which are the equivalent of $0 and $1 in the replace string.
FYI, the $n tokens in the replacement string are not variables though I can see how that can be confusing. They are simply references to matched groups (or the entire match in the case of $0) in the regex.
See http://php.net/manual/function.preg-replace.php#refsect1-function.preg-replace-parameters
To find multiple matches in $str, use preg_match_all(). It's almost the same only it populates $matches with a collection of matches. Use the PREG_SET_ORDER flag as the 4th argument to make the array workable. For example...
$str = ' xD #lol and #testing';
$regex = '/#(\w+)/';
preg_match_all($regex, $str, $allMatches, PREG_SET_ORDER);
print_r($allMatches);
produces...
Array
(
[0] => Array
(
[0] => #lol
[1] => lol
)
[1] => Array
(
[0] => #testing
[1] => testing
)
)
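Since $0 and $1 only have meaning inside the replacement string, one way to work with the matched text as a real PHP value is preg_replace_callback(). The sketch below shows both styles. The /hashtag/ URL path and the function names linkHashtags and linkHashtagsCallback are assumptions made for illustration, not part of the original code.

```php
<?php
// Replacement-string style: $1 and $0 are expanded by preg_replace itself.
function linkHashtags(string $str): string
{
    $regex = '/#+([a-zA-Z0-9_]+)/';
    return preg_replace($regex, '<a href="/hashtag/$1">$0</a>', $str);
}

// Callback style: the groups arrive as an ordinary array, so they can be
// inspected, escaped or stored like any other PHP value.
function linkHashtagsCallback(string $str): string
{
    $regex = '/#+([a-zA-Z0-9_]+)/';
    return preg_replace_callback($regex, function (array $m) {
        return '<a href="/hashtag/' . rawurlencode($m[1]) . '">'
            . htmlspecialchars($m[0]) . '</a>';
    }, $str);
}

echo linkHashtags('xD #lol'), "\n";
echo linkHashtagsCallback('xD #lol'), "\n";
```

Both calls print xD <a href="/hashtag/lol">#lol</a> for this input.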

Get array of usernames from #Twitter like string

How do I get an array of usernames from a string tagged like in Twitter with the '#' prefix using regex or similar?
For example:
Input:
hello #person my name is #joebloggs
Output (array):
['person', 'joebloggs']
Another solution
#[^\s]+
Usage:
$string = 'hello #person my name is #joebloggs';
$pattern = '/#[^\s]+/';
preg_match_all($pattern, $string, $matches);
print_r($matches[0]);
Output:
Array
(
[0] => #person
[1] => #joebloggs
)
Do this:
$regex = '~#\K\S+~';
preg_match_all($regex, $yourstring, $matches);
print_r($matches[0]);
See the matches in the Regex Demo.
Explanation
# matches the literal # character (but it will not be returned)
\K tells the engine to drop what was matched so far from the final match it returns
\S+ matches one or more non-whitespace characters
Use this:
<?php
$re = "/(?<=#)[^\s]+/";
$str = "asdasd asda 232 #asdasd sd232 soi #other asdnasda asjdajh #asdasd";
preg_match_all($re, $str, $matches);
print_r($matches);
Demo here: https://eval.in/173103
Output:
Array
(
[0] => Array
(
[0] => asdasd
[1] => other
[2] => asdasd
)
)
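One difference worth keeping in mind with the patterns above: \S+ and [^\s]+ take everything up to the next whitespace, including punctuation that touches the name. If names are assumed to consist of word characters only (an assumption the question does not confirm), \w+ is a tighter alternative. A minimal sketch with a hypothetical input:

```php
<?php
$string = 'hello #person, my name is #joebloggs.';

// Non-whitespace: trailing punctuation is part of the match.
preg_match_all('/#\K\S+/', $string, $nonSpace);

// Word characters only: stops before the comma and the full stop.
preg_match_all('/#\K\w+/', $string, $wordChars);

print_r($nonSpace[0]);  // person, and joebloggs. (punctuation included)
print_r($wordChars[0]); // person and joebloggs
```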

Match rest of string with regex

I have a string like this
ch:keyword
ch:test
ch:some_text
I need a regular expression which will match all of the strings, however, it must not match the following:
ch: (ch: is followed by a space, or any number of spaces)
ch: (ch: is followed by nothing)
I am able to deduce the length of the string with the 'ch:' in it.
Any help would be appreciated; I am using PHP's preg_match()
Edit: I have tried this:
preg_match("/^ch:[A-Za-z_0-9]/", $str, $matches)
However, this only matches 1 character after the string. I tried putting a * after the closing square bracket, but this matches spaces, which I don't want.
preg_match('/^ch:(\S+)/', $string, $matches);
print_r($matches);
\S+ is for matching 1 or more non-space characters. This should work for you.
Try this regular expression:
^ch:\S.*$
$str = <<<TEXT
ch:keyword
ch:test
ch:
ch:some_text
ch: red
TEXT;
preg_match_all('|ch\:(\S+)|', $str, $matches);
echo '<pre>'; print_r($matches); echo '</pre>';
Output:
Array
(
[0] => Array
(
[0] => ch:keyword
[1] => ch:test
[2] => ch:some_text
)
[1] => Array
(
[0] => keyword
[1] => test
[2] => some_text
)
)
Try using this:
preg_match('/(?<! )ch:[^ ].*/', $str);
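To check a pattern against the cases from the question one string at a time, a loop like the following can help. It is a sketch that uses the /^ch:(\S+)/ pattern from the first answer; the $inputs list and the $results array are illustrative names.

```php
<?php
$inputs  = ['ch:keyword', 'ch:test', 'ch:some_text', 'ch:   ', 'ch:'];
$results = [];

foreach ($inputs as $input) {
    // preg_match returns 1 on a match and 0 otherwise.
    $matched = preg_match('/^ch:(\S+)/', $input, $m) === 1;
    $results[$input] = $matched ? $m[1] : null;
}

var_dump($results);
```

The first three inputs yield their keyword, while the last two yield null because nothing other than whitespace follows ch:.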
