How to use regular expressions in PHP to extract data like this?

I am using the following code in php to extract username, password and email:
$subject = "fjcljt # 123456789 # chengyong702#126.com";
$pattern2 = '/^(\w+\ # ){2}?\w+ ?/';
preg_match($pattern2, $subject, $matches);
but the returned result using print_r is Array ( [0] => fjcljt # 123456789 # chengyong702 [1] => 123456789 # )
What am I doing wrong with preg_match here?

If " # " delimits your values, there's no need for a regex at all:
$subject = "fjcljt # 123456789 # chengyong702#126.com";
$subject = array_map('trim', explode(" # ", $subject)); // split on " # " so the "#" inside the email is left alone

preg_match stores the entire match in [0], and then each captured group in [i]. A captured group is denoted by the parentheses in your $pattern2. Since there's only one set of parentheses, there's only one captured group.
Even though that group matches twice, only the latest repetition is stored in group 1, being 123456789 # (overriding the fjcljt #).
To get separate groups you have to write the captured groups in your regex explicitly, as opposed to repeating one group with {2}:
$pattern2 = '/^(\w+\ # )(\w+\ # )\w+ ?/';
Then your return array will have [1] being fjcljt # and [2] being 123456789 #.
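A minimal sketch of the difference, using the subject string from the question. The variable names $repeated and $explicit are illustrative only, and the patterns are slightly simplified versions of the ones above.

```php
<?php
$subject = "fjcljt # 123456789 # chengyong702#126.com";

// One group repeated twice: group 1 is overwritten on each repetition,
// so only the last repetition survives.
preg_match('/^(\w+ # ){2}\w+/', $subject, $repeated);

// Two explicit groups: each part gets its own slot.
preg_match('/^(\w+ # )(\w+ # )\w+/', $subject, $explicit);

print_r($repeated); // [1] holds "123456789 # "
print_r($explicit); // [1] holds "fjcljt # ", [2] holds "123456789 # "
```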

list($username, $password, $email) = explode(' # ', $subject);

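Try using explode instead of a regex; for a simple split like this, a regular expression is more work than you need.
$data = explode(' # ', 'fjcljt # 123456789 # chengyong702#126.com');
Then you can access the data like this:
$data[0]; // username
$data[1]; // password
$data[2]; // email
EDIT
Use the delimiter with its surrounding whitespace, as above:
" # "
Splitting on a bare "#" would also split the email, which contains a "#" of its own.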

There are two things here. First, you are using a quantifier {2} on a capture group (..). What happens is that you only get the last of the two repetitions as result group [1]. If you want to get both numbers/words separately, you have to expand the regex.
The second problem is that \w+ does not match #, so you only get half the email.
$pattern2 = '/^(\w+) # (\w+) # ([\w#.]+)/';
Might be what you wanted.
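As a rough sketch of how that pattern could be used on the question's string (the variable names $username, $password and $email are assumptions for illustration):

```php
<?php
$subject = "fjcljt # 123456789 # chengyong702#126.com";

// Three explicit groups; the last character class also allows "#" and "."
// so the whole email is captured.
if (preg_match('/^(\w+) # (\w+) # ([\w#.]+)/', $subject, $m)) {
    // $m[0] is the full match; the captures start at index 1.
    list(, $username, $password, $email) = $m;
    echo "$username / $password / $email\n";
}
```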

Not sure what the F bomb you're trying to do; however, if you're trying to get login credentials you could try something like this:
if(preg_match('/^.*\#.*$/i', $type_of_login) > 0)
{
$request = User::get_by_email($type_of_login);
}
else
{
//get by username or whatevers....
}
//then extract the password!!

Related

A simple pattern works fine with preg_match_all; how do I use it with preg_replace?

Thanks for your help.
My goal is to use preg_replace with a pattern to remove some very simple strings.
Using only preg_replace, on this string or others, I need to remove any content between <tag and the next > symbol. The pattern is simple:
$x = '#<\w+(\s+[^>]*)>#is';
$s = 'DATA<td class="td1">111</td><td class="td2">222</td>DATA';
preg_match_all($x, $s, $Q);
print_r($Q[1]);
[1] => Array
(
[0] => class="td1"
[1] => class="td2"
)
This works great!
Now I try to remove the strings using the same pattern:
$new_string = '';
$Q = preg_replace($x, "\\1$new_string", $s);
print_r($Q);
The result is completely different.
What is wrong with my use of preg_replace?
Using only preg_replace(), how can I remove these strings?
(We could use foreach(...) to remove each string, but where is the error in my code?)
The result I expect for this input:
$s = 'DATA<td class="td1">111</td><td class="td2">222</td>DATA';
is this output:
$Q = 'DATA<td>111</td><td>222</td>DATA';
Let's break down your RegEx, #<\w+(\s+[^>]*)>#is, and see if that helps.
# // Start delimiter
< // Literal `<` character
\w+ // One or more word-characters, a-z, A-Z, 0-9 or _
( // Start capturing group
\s+ // One or more whitespace characters
[^>]* // Zero or more characters that are not the literal `>`
) // End capturing group
> // Literal `>` character
# // End delimiter
is // Ignore case and `.` matches all characters including newline
Given the input DATA<td class="td1">DATA this matches <td class="td1"> and captures class="td1". The difference between match and capture is very important.
When you use preg_match you'll see the entire match at index 0, and any subsequent captures at incrementing indexes.
When you use preg_replace the entire match will be replaced. You can use the captures, if you so choose, but you are replacing the match.
I'm going to say that again: whatever you pass as the replacement string will replace the entirety of the found match. If you say $1 or \\1, you are saying replace the entire match with just the capture.
Going back to the sample after the breakdown, using $1 is the equivalent of calling:
str_replace('<td class="td1">', ' class="td1"', $string);
which you can see here: https://3v4l.org/ZkPFb
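For reference, here is a small sketch of what that looks like on the full string from the question, assuming the original pattern is left unchanged:

```php
<?php
$s = 'DATA<td class="td1">111</td><td class="td2">222</td>DATA';
$x = '#<\w+(\s+[^>]*)>#is';

// Each whole match (e.g. <td class="td1">) is replaced by capture 1 only,
// which is the attribute text including its leading space.
$out = preg_replace($x, '$1', $s);
echo $out, "\n";
// DATA class="td1"111</td> class="td2"222</td>DATA
```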
To your question "how to change [0] by $new_string", you are doing it correctly, it is your RegEx itself that is wrong. To do what you are trying to do, your pattern must capture the tag itself so that you can say "replace the HTML tag with all of the attributes with just the tag".
As one of my comments noted, this is where you'd invert the capturing. You aren't interested in capturing the attributes; you are throwing those away. Instead, you are interested in capturing the tag itself:
$string = 'DATA<td class="td1">DATA';
$pattern = '#<(\w+)\s+[^>]*>#is';
echo preg_replace($pattern, '<$1>', $string);
Demo: https://3v4l.org/oIW7d
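Applied to the two-cell string from the question, the inverted capture can be sketched like this (the variable name $clean is illustrative):

```php
<?php
$s = 'DATA<td class="td1">111</td><td class="td2">222</td>DATA';

// Capture only the tag name; the attributes are matched but not captured,
// so they disappear when the match is replaced by "<$1>".
$clean = preg_replace('#<(\w+)\s+[^>]*>#is', '<$1>', $s);
echo $clean, "\n"; // DATA<td>111</td><td>222</td>DATA
```

Closing tags such as </td> are untouched because \w+ cannot match the "/" that follows "<".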

Split log data with php preg_match

My task is to analyze log files with a PHP script.
I'm going to use a regex to split log records for further analysis.
Log records look like the following:
param1=val1;param2=val2;param3=val3;[int1Param1=int1Val1;int1Param2=int1Val2;][int2Param1=int2Val1;int2Param2=int2Val2;][int3Param1=int3Val1;int3Param2=int3Val2;]param4=val4;
So, I have a set of parameters and values I have to analyze, and I have no problem with this part.
My concern is the "session data", which sits inside a series of square brackets between param3 and param4. The issue is that I have no idea how many records I'll have in this part (there can be 0 or more of them).
I'm identifying this part with the following regex:
(\[[^\]\[]+\])*
It perfectly identifies the complete string between "param3=val3;" and "param4=val4;" and returns it as element 0 of preg_match's $matches array. What I also need is to get each of these bracketed records as array elements, for further analysis of their content, but $matches contains only 2 elements: 0, the whole string, and 1, the last bracketed record.
Any ideas?
Thanks Dennis.
You can use preg_match_all on the string like so:
preg_match_all("/\[[^][]+\]/", $log, $results);
print_r($results);
This results in:
Array
(
[0] => Array
(
[0] => [int1Param1=int1Val1;int1Param2=int1Val2;]
[1] => [int2Param1=int2Val1;int2Param2=int2Val2;]
[2] => [int3Param1=int3Val1;int3Param2=int3Val2;]
)
)
Demo here.
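If the content of each bracket also needs to be broken down, one possible follow-up is to capture the inner text and split it. This is only a sketch: the names $sessions, $found and $pairs are made up for illustration, the log line is shortened, and it assumes every entry has the form key=value;.

```php
<?php
$log = 'param1=val1;param2=val2;param3=val3;'
     . '[int1Param1=int1Val1;int1Param2=int1Val2;]'
     . '[int2Param1=int2Val1;int2Param2=int2Val2;]'
     . 'param4=val4;';

$sessions = [];
// Capture the text between each pair of brackets.
preg_match_all('/\[([^][]+)\]/', $log, $found);
foreach ($found[1] as $inner) {
    $pairs = [];
    // "a=b;c=d;" becomes ["a=b", "c=d"]; the trailing ";" is trimmed first.
    foreach (explode(';', rtrim($inner, ';')) as $pair) {
        list($key, $value) = explode('=', $pair, 2);
        $pairs[$key] = $value;
    }
    $sessions[] = $pairs;
}
print_r($sessions);
```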
What you can do:
$pattern = '~(?:(?<new>\[)|\G(?!^))(?<key>[^]=]++)=(?<val>[^][;]++);~';
$subject = 'param1=val1;param2=val2;param3=val3;[int1Param1=int1Val1;int1Param2=int1Val2;][int2Param1=int2Val1;int2Param2=int2Val2;][int3Param1=int3Val1;int3Param2=int3Val2;]param4=val4;';
if (preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER)) {
$i=0;
foreach ($matches as $match) {
if ($match['new']) $i++;
$result[$i][$match['key']]=$match['val'];
}
print_r($result);
}
pattern explanation:
~ # pattern delimiter
(?: # open a non-capturing group
(?<new>\[) # the named group "new" contains a possible "[". It's useful
# to know when a new content in square brackets begins.
| # or
\G(?!^) # a match (that can't be at the start of the string)
# contiguous (\G) to a preceding match
) # close the non-capturing group
(?<key>[^]=]++) # named group "key"
=
(?<val>[^][;]++) # named group "val"
;
~
The alternation in the non-capturing group describes two possibilities. The first is [, which matches the first key/value pair inside square brackets. The second, which is forced to be contiguous to a preceding match, lets the second (and later) pairs succeed.

Looping within a regular expression

Can a regex find a pattern for this?
{{foo.bar1.bar2.bar3}}
where the groups would be
$1 = foo $2 = bar1 $3 = bar2 $4 = bar3 and so on.
It would be like re-running the expression over and over again until it fails to get a match.
The current expression I am working on is
(?:\{{2})([\w]+).([\w]+)(?:\}{2})
Here's a link from regexr.
http://regexr.com?3203h
--
OK, I guess I didn't explain well what I'm trying to achieve here.
Let's say I am trying to replace all
.barX inside a {{foo . . . }}
My expected result would be
$foo->bar1->bar2->bar3
This should work, assuming no braces are allowed within the match:
preg_match_all(
'%(?<= # Assert that the previous character(s) are either
\{\{ # {{
| # or
\. # .
) # End of lookbehind
[^{}.]* # Match any number of characters besides braces/dots.
(?= # Assert that the following regex can be matched here:
(?: # Try to match
\. # a dot, followed by
[^{}]* # any number of characters except braces
)? # optionally
\}\} # Match }}
) # End of lookahead%x',
$subject, $result, PREG_PATTERN_ORDER);
$result = $result[0];
I'm not a PHP person, but I managed to construct this piece of code here:
preg_match_all("([a-z0-9]+)",
"{{foo.bar1.bar2.bar3}}",
$out, PREG_PATTERN_ORDER);
foreach($out[0] as $val)
{
echo($val);
echo("<br>");
}
The code above prints the following:
foo
bar1
bar2
bar3
It should allow you to exhaustively search a given string by using a simple regular expression. I think that you should also be able to get what you want by removing the braces and splitting the string.
I don't think so, but it's relatively painless to just split the string on periods like so:
$str = "{{foo.bar1.bar2.bar3}}";
$str = str_replace(array("{","}"), "", $str);
$values = explode(".", $str);
print_r($values); // Yields an array with values foo, bar1, bar2, and bar3
EDIT: In response to your question edit, you could replace all barX in a string by doing the following:
$str = "{{foo.bar1.bar2.bar3}}";
$newStr = preg_replace('#bar\d#', "hi", $str);
echo $newStr; // outputs "{{foo.hi.hi.hi}}"
I don't know the correct syntax in PHP, for pulling out the results, but you could do:
\{{2}(\w+)(?:\.(\w+))*\}{2}
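That would capture the first hit in the first capturing group and the rest in the second capturing group, in engines that keep every repetition of a group (such as .NET). PCRE, which PHP uses, keeps only the last repetition of a repeated group, so in PHP the second group would hold just bar3. regexr.com is lacking the ability to show that as far as I can see though. Try out Expresso, and you'll see what I mean.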
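In PHP specifically, a repeated group only reports its last repetition, so one way to get the result the question asks for is to match the whole dotted path once and split it afterwards. The sketch below is an assumption about what is wanted (turning {{a.b.c}} into $a->b->c); the variable names are illustrative.

```php
<?php
// A repeated group keeps only its last repetition in PCRE.
preg_match('/\{{2}(\w+)(?:\.(\w+))*\}{2}/', '{{foo.bar1.bar2.bar3}}', $m);
// $m[1] is "foo", $m[2] is "bar3"; bar1 and bar2 are not kept.

$str = 'Hello {{foo.bar1.bar2.bar3}} world';

// Match the dotted path as a single capture, then split it in a callback.
$converted = preg_replace_callback(
    '/\{\{(\w+(?:\.\w+)*)\}\}/',
    function (array $match) {
        // $match[1] is the dotted path, e.g. "foo.bar1.bar2.bar3".
        return '$' . implode('->', explode('.', $match[1]));
    },
    $str
);
echo $converted, "\n"; // Hello $foo->bar1->bar2->bar3 world
```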

PHP: Get last Tag of a String with Regular Expressions

A quite simple problem (with a difficult solution): I have a string in PHP like the following:
['one']['two']['three']
From this, I need to extract the last tag, so that I end up with three.
It is also possible that the tags are numbers, like
[1][2][3]
in which case I need to get 3.
How can I solve this?
Thanks for your help!
Flo
Your tag is \[[^\]]+\].
3 Tags are: (\[[^\]]+\]){3}
3 Tags at end are: (\[[^\]]+\]){3}$
N Tags at end are: (\[[^\]]+\])*$ (N 0..n)
Example:
<?php
$string = "['one']['two']['three'][1][2][3]['last']";
preg_match("/((?:\[[^\]]+\]){3})$/", $string, $match);
print_r($match); // Array ( [0] => [2][3]['last'] [1] => [2][3]['last'] )
This tested code may work for you:
function getLastTag($text) {
$re = '/
# Match contents of last [Tag].
\[ # Literal start of last tag.
(?: # Group tag contents alternatives.
\'([^\']+)\' # Either $1: single quoted,
| (\d+) # or $2: un-quoted digits.
) # End group of tag contents alts.
\] # Literal end of last tag.
\s* # Allow trailing whitespace.
$ # Anchor to end of string.
/x';
if (preg_match($re, $text, $matches)) {
if ($matches[1] !== '') return $matches[1]; // Either single quoted,
return $matches[2]; // or non quoted digits (a plain truthiness test would miss "0").
}
return null; // No match. Return NULL.
}
Here is a regex that may work for you. Try this:
[^\[\]']*(?='?\]$)
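A short sketch of that regex in use; the wrapper function lastTag is a hypothetical name introduced only for this example.

```php
<?php
// Hypothetical wrapper for illustration.
function lastTag($text) {
    // Characters other than brackets and quotes that sit right before
    // the final "]" (with an optional closing quote in between).
    if (preg_match('/[^\[\]\']*(?=\'?\]$)/', $text, $m)) {
        return $m[0];
    }
    return null; // no closing tag at the end of the string
}

echo lastTag("['one']['two']['three']"), "\n"; // three
echo lastTag("[1][2][3]"), "\n";               // 3
```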

PHP regular expressions: extract data inside brackets

I have some string like Western Australia 223/5 (59.3 ov)
I would like to split this string and extract the following information with regular expressions
$team = 'Western Australia'
$runs = 223/5
$overs = 59.3
The issue is that the format of the text varies; it may be any of the following
Western Australia 223/5 (59.3 ov)
Australia 223/5 (59.3 ov)
KwaZulu-Natal Inland
Sri Lanka v West Indies
Any help (for example, whether this is possible with a single regexp) will be appreciated.
if (preg_match(
'%^ # start of string
(?P<team>.*?) # Any number of characters, as few as possible (--> team)
(?:\s+ # Try to match the following group: whitespace plus...
(?P<runs>\d+ # a string of the form number...
(?:/\d+)? # optionally followed by /number
) # (--> runs)
)? # optionally
(?:\s+ # Try to match the following group: whitespace plus...
\( # (
(?P<overs>[\d.]+) # a number (optionally decimal) (--> overs)
\s+ov\) # followed by ov)
)? # optionally
\s* # optional whitespace at the end
$ # end of string
%six',
$subject, $regs)) {
$team = $regs['team'];
$runs = $regs['runs'];
$overs = $regs['overs'];
} else {
$result = "";
}
You might need to catch an error if the matches <runs> and/or <overs> are not actually present in the string. I don't know much about PHP. (Don't know much biology...SCNR)
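One way to handle the missing parts is sketched below. It assumes the same pattern as above, written on one line without the x flag; the function name parseScoreLine and the choice of null for absent parts are illustrative, not part of the original answer.

```php
<?php
// Hypothetical helper for illustration; returns null for parts that are absent.
function parseScoreLine($line) {
    $re = '%^(?P<team>.*?)(?:\s+(?P<runs>\d+(?:/\d+)?))?'
        . '(?:\s+\((?P<overs>[\d.]+)\s+ov\))?\s*$%s';
    if (!preg_match($re, $line, $regs)) {
        return null;
    }
    // Groups that did not take part in the match are either missing from
    // $regs (trailing groups) or an empty string, so normalise both to null.
    $get = function ($name) use ($regs) {
        return (isset($regs[$name]) && $regs[$name] !== '') ? $regs[$name] : null;
    };
    return [
        'team'  => $regs['team'],
        'runs'  => $get('runs'),
        'overs' => $get('overs'),
    ];
}

print_r(parseScoreLine('Western Australia 223/5 (59.3 ov)'));
print_r(parseScoreLine('KwaZulu-Natal Inland'));
```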
Assuming you are using preg_match, you can use the following:
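preg_match('/^([\w\s-]+?)\s*(\d+\/\d+)?\s*(\(\d+\.\d+ ov\))?$/', $input, $matches); // lazy +? so the team name does not swallow the digits of the score; "-" allows names like KwaZulu-Natal
Then, you can inspect $matches to see which of the formats you are supposed to handle was found.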
See preg_match documentation for more information.
