Finding values in a string via regex in php

I am trying to extract information from a textarea that contains placeholders such as [name], finding each item enclosed in square brackets with a regex (so far I have tried preg_match, preg_split, preg_quote and preg_match_all). The problem seems to be in the regex pattern I am providing.
My current regex:
$menuItems = preg_match_all('/[^[][([^[].*)]/U', $_SESSION['emailBody'], $menuItems);
I have tried many other patterns e.g.
/(?[...]\w+): (?[...]\d+)/
Any help that can be provided with this is greatly appreciated.
EDIT:
Sample input:
[email] address [to] name [from] someone
Message displayed on var_dump of the $menuItems variable:
array(1) { [0]=> string(0) "" }
EDIT 2:
Thank you to everyone for the help and support with this, I am pleased to say that it is all up and running perfectly!

From the comment stream above, you can simplify the regular expression as follows:
preg_match_all('/\[(.*)\]/U', $_SESSION['emailBody'], $menuItems);
One thing to note:
preg_match_all() fills the array in its 3rd parameter with the results of the matches. Your example line then overwrites this array with the result of preg_match_all() (an integer).
You should then be able to iterate over the results by using the following loop:
foreach ($menuItems[1] as $menuItem) {
// ...
}
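A minimal sketch of the point above, using the sample input from the question in place of $_SESSION['emailBody'] (the names $body and $count are illustrative, not from the original post). It keeps the return value and the matches in separate variables:

```php
<?php
// Sample input from the question, standing in for $_SESSION['emailBody'].
$body = '[email] address [to] name [from] someone';

// The return value is the number of full matches (or false on error);
// the matches themselves are written into $menuItems by reference.
$count = preg_match_all('/\[(.*)\]/U', $body, $menuItems);

echo $count, "\n";                      // number of bracketed items found
foreach ($menuItems[1] as $menuItem) {  // group 1: text without the brackets
    echo $menuItem, "\n";
}
```

Assigning the return value back to $menuItems, as in the question, would replace the array of matches with that count.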

Escape the square brackets and remove the dot:
$menuItems = preg_match_all('/[^[]\[([^[]*)\]/U', $_SESSION['emailBody'], $menuItems);
// here __^ __^ ^
preg_match_all returns the number of matches, not the matches themselves. Pass a separate variable as the last parameter to receive them:
preg_match_all('/\[([^[\]]*)\]/U', $_SESSION['emailBody'], $matches);
The matches are in the array $matches
print_r($matches);
Working example:
$str = '[email] address [to] name [from] someone';
preg_match_all('/\[([^[\]]*)\]/U', $str, $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => [email]
[1] => [to]
[2] => [from]
)
[1] => Array
(
[0] => email
[1] => to
[2] => from
)
)

Here is a simple solution. This regex captures every item enclosed in brackets, including the brackets themselves.
If you don't want the brackets in the result, change the regex to $regex = "/(?:\\[(\\w+)\\])/mi"; and read $matches[1].
$subject = "[email] address [to] name [from] someone";
$regex = "/(\\[\\w+\\])/mi";
$matches = array();
preg_match_all($regex, $subject, $matches); // no & here: call-time pass-by-reference is an error in modern PHP
print_r($matches);

Related

How to write regex to find empty space after colon in string with no new line in text format?

I am writing a regex to find the words after a colon in my pdftotext output.
I am using xpdf to convert a PDF uploaded by the user into text:
$text1 = (new Pdf('C:\xpdf-tools-win-4.00\bin64\pdftotext.exe'))
->setPdf('path')
->setOptions(['layout', 'layout'])
->text();
$string = $text1;
$regex = '/(?<=: ).+/';
preg_match_all($regex, $string, $matches);
In ->setPdf('path'), path is the path of the uploaded file.
I am getting the data below:
Full Name: XYZ
Nationality: Indian
Date of Birth: 1/1/1988
Permanent Residence Address:
In the data above, the residence address is empty.
I wrote a regex to find the words after each colon,
but $matches contains only:
Current O/P:
Array
(
[0] => Array
(
[0] => xyz
[1] => Indian
[2] => 1/1/1988
)
)
The regex skips a line when only whitespace, or nothing at all, follows the colon.
I want the result array to include the empty value too.
Expected O/P:
Array
(
[0] => Array
(
[0] => xyz
[1] => Indian
[2] => 1/1/1988
[3] =>
)
)

Note: The OP has changed his question after several answers were given.
This is an answer to the original question.
Here is one solution, using preg_match_all. We can try matching on the following pattern:
(?<=:)[ ]*(\S*(?:[ ]+\S+)*)
This pattern looks behind for a colon, skips any spaces that follow it, and then captures zero or more space-separated words up to the end of the line. We read index 1 of the array filled by preg_match_all because we only want what the first capture group captured.
$input = "name: xyz\naddress: db,123,eng.\nage:\ngender: male\nother: hello world goodbye";
preg_match_all ("/(?<=:)[ ]*(\S*(?:[ ]+\S+)*)$/m", $input, $array);
print_r($array[1]);
Array
(
[0] => xyz
[1] => db,123,eng.
[2] =>
[3] => male
[4] => hello world goodbye
)
Using capture groups is a good way to go here, because the captured group, in theory, should appear in the output array, even if there is no captured term.
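As a hedged extension of the same idea, the sketch below captures the label as well as the value, so that an empty value stays attached to its label. The pattern and the name $fields are illustrative and not part of the answer above; the input is the answer's own sample string.

```php
<?php
$input = "name: xyz\naddress: db,123,eng.\nage:\ngender: male\nother: hello world goodbye";

// Group 1: the label before the first colon on a line.
// Group 2: the rest of the line after optional spaces (may be empty).
preg_match_all('/^([^:\n]+):[ ]*(.*)$/m', $input, $m);

// Pair each label with its value; "age" maps to an empty string.
$fields = array_combine($m[1], $m[2]);
print_r($fields);
```

Because group 2 always takes part in the match, a line such as "age:" still produces an entry, with an empty string as its value.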

Your code, $regex = '/\b: \s*'\K[\w-]+/i';, ends the string right before \K: it contains three quotes, and the first two delimit the pattern, leaving \K[\w-]+/i' outside the string.
In any case, you can use a capture group to grab whatever follows the colon, even when it is empty:
$regex = '/^.+?:[ ]*(.*)$/m'; should work. The values are then in $matches[1].

Strange behavior of preg_match_all php

I have a very long string of html. From this string I want to parse pairs of rus and eng names of cities. Example of this string is:
$html = '
Абакан
Хакасия республика
Абан
Красноярский край
Абатский
Тюменская область
';
My code is:
$subject = $this->html;
$pattern = '/<a href="([\/a-zA-Z0-9-"]*)">([а-яА-Я]*)/';
preg_match_all($pattern, $subject, $matches);
To test it I use RegExr; you can see it here: http://regexr.com/399co
The test there uses the global modifier, /g.
Because PHP has no /g modifier, I use the preg_match_all function. But the result of preg_match_all is very strange:
Array
(
[0] => Array
(
[0] => <a href="/forecasts5000/russia/republic-khakassia/abakan">Абакан
[1] => <a href="/forecasts5000/russia/krasnoyarsk-territory/aban">Абан
[2] => <a href="/forecasts5000/russia/tyumen-area/abatskij">Аба�
[3] => <a href="/forecasts5000/russia/arkhangelsk-area/abramovskij-ma">Аб�
)
[1] => Array
(
[0] => /forecasts5000/russia/republic-khakassia/abakan
[1] => /forecasts5000/russia/krasnoyarsk-territory/aban
[2] => /forecasts5000/russia/tyumen-area/abatskij
[3] => /forecasts5000/russia/arkhangelsk-area/abramovskij-ma
)
[2] => Array
(
[0] => Абакан
[1] => Абан
[2] => Аба�
[3] => Аб�
)
)
First, it found only the first match (but I need an array with all matches).
Second, the result looks very strange to me. I want to get pairs such as
/forecasts5000/russia/republic-khakassia/abakan and Абакан
What am I doing wrong?

Element 0 of the result is an array of each of the full matches of the regexp. Element 1 is an array of all the matches for capture group 1, element 2 contains capture group 2, and so on.
You can invert this by using the PREG_SET_ORDER flag. Then element 0 will contain all the results from the first match, element 1 will contain all the results from the second match, and so on. Within each of these, [0] will be the full match, and the remaining elements will be the capture groups.
If you use this option, you can then get the information you want with:
foreach ($matches as $match) {
$url = $match[1];
$text = $match[2];
// Do something with $url and $text
}
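A sketch of the loop above in context. The markup in $html is an assumption: the question's HTML was lost, so the anchors are reconstructed from the href and text pairs in its output. Two further changes are mine rather than the answer's. The u modifier is added because the truncated entries such as Аба� are consistent with the pattern being run byte by byte instead of as UTF-8, and the stray quote is dropped from the first character class.

```php
<?php
// Assumed markup: reconstructed from the href/text pairs in the question's output.
$html = '<a href="/forecasts5000/russia/republic-khakassia/abakan">Абакан</a>'
      . '<a href="/forecasts5000/russia/tyumen-area/abatskij">Абатский</a>';

// The "u" modifier makes PCRE treat both pattern and subject as UTF-8,
// so [а-яА-Я] is a range of characters rather than of bytes.
$pattern = '/<a href="([\/a-zA-Z0-9-]*)">([а-яА-Я]*)/u';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $match) {
    $pairs[$match[1]] = $match[2];   // URL => city name
}
print_r($pairs);
```

This assumes the script file and the input are UTF-8 encoded.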

You can also use T-Regx library which has separate methods for each case :)
pattern('<a href="([/a-zA-Z0-9-"]*)">([а-яА-Я]*)')
->match($this->html)
->forEach(function (Match $match) {
$text = $match->text();
$group = $match->group(1);
echo "Match $text with group $group";
});
It also has automatic delimiters.

Pattern for preg_match

I have a string that contains the pattern "[link:activate/$id/$test_code]". I need to get the word activate, $id and $test_code out of it whenever the pattern [link.....] occurs.
I also tried getting the inner items with grouping, but that only returns activate and $test_code; I couldn't get $id. Please help me get the action name and all the parameters in an array.
Below is my code and output
Code
function match_test()
{
$string = "Sample string contains [link:activate/\$id/\$test_code] again [link:anotheraction/\$key/\$second_param]]] also how the other ationc like [link:action] works";
$pattern = '/\[link:([a-z\_]+)(\/\$[a-z\_]+)+\]/i';
preg_match_all($pattern,$string,$matches);
print_r($matches);
}
Output
Array
(
[0] => Array
(
[0] => [link:activate/$id/$test_code]
[1] => [link:anotheraction/$key/$second_param]
)
[1] => Array
(
[0] => activate
[1] => anotheraction
)
[2] => Array
(
[0] => /$test_code
[1] => /$second_param
)
)

Try this:
$subject = <<<'LOD'
Sample string contains [link:activate/$id/$test_code] again [link:anotheraction/$key/$second_param]]] also how the other ationc like [link:action] works
LOD;
$pattern = '~\[link:([a-z_]+)((?:/\$[a-z_]+)*)]~i';
preg_match_all($pattern, $subject, $matches);
print_r($matches);
If you need \$id and \$test_code separated, you can use this instead:
$pattern = '~\[link:([a-z_]+)(/\$[a-z_]+)?(/\$[a-z_]+)?]~i';

Is this what you are looking for?
/\[link:([\w\d]+)\/(\$[\w\d]+)\/(\$[\w\d]+)\]/
Edit:
Also the problem with your expression is this part:
(\/\$[a-z\_]+)+
Although you have repeated the group, the match returns only one value for it, because it is still a single group declaration: each repetition overwrites the previous capture, so only the last one is kept. The regex engine won't invent extra group numbers for you (not that I've ever seen, anyway).
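One way around the single-capture limit, sketched here under assumptions: capture the whole repeated part in one group and split it afterwards. The $links structure and the shortened sample string are illustrative, not from the original posts.

```php
<?php
$string = 'Sample [link:activate/$id/$test_code] again [link:action] works';

// Group 2 wraps the entire repeated part, so no repetition is lost.
$pattern = '~\[link:([a-z_]+)((?:/\$[a-z_]+)*)\]~i';

preg_match_all($pattern, $string, $matches, PREG_SET_ORDER);

$links = [];
foreach ($matches as $match) {
    // "/$id/$test_code" becomes ['$id', '$test_code']; an empty string becomes []
    $params = $match[2] === '' ? [] : explode('/', ltrim($match[2], '/'));
    $links[] = ['action' => $match[1], 'params' => $params];
}
print_r($links);
```

This handles any number of parameters, including none, without a separate optional group for each one.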

Regular Expressions: get what is outside of the brackets

I'm using PHP and I have text like:
first [abc] middle [xyz] last
I need to get what's inside and outside of the brackets. Searching in StackOverflow I found a pattern to get what's inside:
preg_match_all('/\[.*?\]/', $m, $s)
Now I'd like to know the pattern to get what's outside.
Regards!

You can use preg_split for this as:
$input ='first [abc] middle [xyz] last';
$arr = preg_split('/\[.*?\]/',$input);
print_r($arr);
Output:
Array
(
[0] => first
[1] => middle
[2] => last
)
This allows some surrounding spaces in the output. If you don't want them you can use:
$arr = preg_split('/\s*\[.*?\]\s*/',$input);
preg_split splits the string on a pattern. The pattern here is [ followed by anything followed by ]. The regex for "anything" is .*. Because [ and ] are regex metacharacters that delimit a character class, we escape them to match them literally, which gives \[.*\]. By default .* is greedy and matches as much as possible, so here it would consume abc] middle [xyz. To avoid this we make it non-greedy by appending a ?, giving \[.*?\]. Since "anything" here really means anything other than ], we can also use \[[^]]*?\]
EDIT:
If you want to extract words that are both inside and outside the [], you can use:
$arr = preg_split('/\[|\]/',$input);
which splits the string on a [ or a ]
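If the inside and outside pieces need to be told apart rather than mixed in one list, a possible variation (a sketch, not part of the answer above) is to capture the bracket contents and pass PREG_SPLIT_DELIM_CAPTURE to preg_split. The names $inside and $outside are illustrative.

```php
<?php
$input = 'first [abc] middle [xyz] last';

// With PREG_SPLIT_DELIM_CAPTURE the captured group (the bracket contents)
// is returned between the pieces: even indexes hold outside text,
// odd indexes hold inside text.
$parts = preg_split('/\[([^\]]*)\]/', $input, -1, PREG_SPLIT_DELIM_CAPTURE);

$inside = [];
$outside = [];
foreach ($parts as $i => $part) {
    if ($i % 2 === 1) {
        $inside[] = $part;
    } elseif (trim($part) !== '') {
        $outside[] = trim($part);   // drop the surrounding spaces
    }
}
print_r($inside);
print_r($outside);
```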

$inside = '\[.+?\]';
$outside = '[^\[\]]+';
$or = '|';
preg_match_all(
"~ $inside $or $outside~x",
"first [abc] middle [xyz] last",
$m);
print_r($m);
or less verbose
preg_match_all("~\[.+?\]|[^\[\]]+~", $str, $matches);

Use preg_split instead of preg_match.
preg_split('/\[.*?\]/', 'first [abc] middle [xyz] last');
Result:
array(3) {
[0]=>
string(6) "first "
[1]=>
string(8) " middle "
[2]=>
string(5) " last"
}

Everyone says you should use preg_split, but only one person replied with an expression that meets your needs; it was a little verbose at first, though that answer has since been updated.
This expression is what most of the replies have suggested:
/\[.*?\]/
But that only returns
Array
(
[0] => first
[1] => middle
[2] => last
)
You stated that you want what's inside and outside the brackets, so an update would be to split on the brackets themselves:
/[\[\]]/
This gives you:
Array
(
[0] => first
[1] => abc
[2] => middle
[3] => xyz
[4] => last
)
As you can see, that keeps the surrounding whitespace as well, so let's go a step further and get rid of it:
/\s*[\[\]]\s*/
This gives the desired result:
Array
(
[0] => first
[1] => abc
[2] => middle
[3] => xyz
[4] => last
)
This, I think, is the expression you're looking for.

How does this RegEx for parsing emails work in PHP?

Okay, I have the following PHP code to extract an email address of the following two forms:
Random Stranger <email#domain.com>
email#domain.com
Here is the PHP code:
// The first example
$sender = "Random Stranger <email#domain.com>";
$pattern = '/([\w_-]*#[\w-\.]*)|.*<([\w_-]*#[\w-\.]*)>/';
preg_match($pattern,$sender,$matches,PREG_OFFSET_CAPTURE);
echo "<pre>";
print_r($matches);
echo "</pre><hr>";
// The second example
$sender = "user#domain.com";
preg_match($pattern,$sender,$matches,PREG_OFFSET_CAPTURE);
echo "<pre>";
print_r($matches);
echo "</pre>";
My question is... what is in $matches? It seems to be a strange collection of arrays. Which index holds the match from the parenthesis? How can I be sure I'm getting the email address and only the email address?
Update:
Here is the output:
Array
(
[0] => Array
(
[0] => Random Stranger
[1] => 0
)
[1] => Array
(
[0] =>
[1] => -1
)
[2] => Array
(
[0] => user#domain.com
[1] => 5
)
)
Array
(
[0] => Array
(
[0] => user#domain.com
[1] => 0
)
[1] => Array
(
[0] => user#domain.com
[1] => 0
)
)

This doesn't help with your preg question, but it will simplify your code. Since those are the only two formats, you don't need regular expressions:
$parts = explode('<', rtrim($sender, '>')); // end() takes its argument by reference, so it needs a variable
echo end($parts);

The following is copied directly from the help doc at http://us.php.net/preg_match
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.

The preg_match() manual page explains how $matches works. It's an optional parameter that gets filled with the results of any bracketed sub-expression from your regexp, in the order that they matched. $matches[0] is always the entire expression match, followed by the sub-expressions.
So for example, that pattern contains two sub-expressions, ([\w_-]*#[\w-\.]*) and ([\w_-]*#[\w-\.]*), one in each alternative. The parts matching them are put into $matches[1] and $matches[2], respectively. For Random Stranger <email#domain.com> only the second alternative can match, so without the offset flag $matches would look like this:
Array(
0 => "Random Stranger <email#domain.com>",
1 => "",
2 => "email#domain.com"
)
Think of it as passing an array named $matches by reference, that gets filled with all the sub-parts that are matched.
Edit - note that you are using the PREG_OFFSET_CAPTURE flag, which alters the behaviour of how $matches gets filled, so your result won't match my example. The manual explains how this flag alters the capture as well. In this case, instead of a set of matched sub-expressions, you get a multidimensional array of each expression with the position it was found at in the string.
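To get the address and only the address whichever alternative matched, one option is to check which group was filled. The sketch below is an assumption-laden illustration: extractAddress is a hypothetical helper, the pattern is a tightened variant of the question's (anchored first alternative, hyphen placed last in each character class), and # stands in for @ exactly as in the question.

```php
<?php
// Hypothetical helper, not from the original post.
function extractAddress(string $sender): ?string
{
    // Alternative 1: the whole string is a bare address.
    // Alternative 2: the address sits inside angle brackets.
    $pattern = '/^([\w-]+#[\w.-]+)$|<([\w-]+#[\w.-]+)>/';
    if (!preg_match($pattern, $sender, $matches)) {
        return null;
    }
    // Index 2 exists only when the "<...>" alternative matched;
    // otherwise the address is in group 1.
    return $matches[2] ?? $matches[1];
}

echo extractAddress('Random Stranger <email#domain.com>'), "\n";
echo extractAddress('email#domain.com'), "\n";
```

Without PREG_OFFSET_CAPTURE each entry of $matches is a plain string, which keeps the lookup simple.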
