PHP regex: extract values from a string

I am new to regular expressions and I am trying to extract some specific values from this string:
"Iban: EU4320000713864374\r\nSwift: DTEADCCC\r\nreg.no 2361 \r\naccount no. 1234531735"
Values that I am trying to extract:
EU4320000713864374
2361
This is what I am trying to do now:
preg_match('/[^Iban: ](?<iban>.*)[^\\r\\nreg.no ](?<regnr>.*)[^\\r\\n]/',$str,$matches);
All I get back is null or an empty array. Any suggestions would be highly appreciated.

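The square brackets do not do what you expect: [^Iban: ] is a negated character class that matches a single character other than I, b, a, n, a colon or a space. You perhaps meant to anchor at the beginning of a line: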
$result = preg_match(
'/^Iban: (?<iban>.*)\R.*\R^reg\.no (?<regnr>.*)/m'
, $str, $matches
);
This requires the multi-line modifier (the m at the very end), so that ^ also matches at the start of each line. I also replaced \r\n with \R so that the pattern handles all kinds of line-separator sequences.
Example: https://eval.in/47062
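A slightly better variant captures only non-whitespace characters, so the trailing space after 2361 (and, with PCRE's default newline setting, a \r that . would otherwise match) stays out of the captured values: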
$result = preg_match(
'/^Iban: (?<iban>\S*)\R.*\R^reg\.no (?<regnr>\S*)/m'
, $str, $matches
);
Example: https://eval.in/47069
The result is then (beautified):
Array
(
[0] => "Iban: EU4320000713864374
Swift: DTEADCCC
reg.no 2361"
[iban] => "EU4320000713864374"
[1] => "EU4320000713864374"
[regnr] => "2361"
[2] => "2361"
)

preg_match("/Iban: (\\S+).*reg.no (\\S+)/s", $str, $matches);
Note how newlines are handled: the dot (.) does not match a newline character unless the s flag is specified.
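Putting the suggestions above together, a minimal sketch might look like the following. The helper name extractBankFields and its return shape are assumptions made for illustration, not part of the original answers. The dot in reg.no is escaped here so that it only matches a literal period.

```php
<?php
// Hypothetical helper (name and return shape are illustrative only).
function extractBankFields(string $text): ?array
{
    // \S+ stops at whitespace, so neither a trailing space nor "\r" is captured.
    // The m modifier lets ^ match at the start of every line; \R matches any line break.
    $pattern = '/^Iban: (?<iban>\S+)\R.*\R^reg\.no (?<regnr>\S+)/m';

    if (preg_match($pattern, $text, $m) !== 1) {
        return null; // no match (or a regex error)
    }

    return ['iban' => $m['iban'], 'regnr' => $m['regnr']];
}

$str = "Iban: EU4320000713864374\r\nSwift: DTEADCCC\r\nreg.no 2361 \r\naccount no. 1234531735";
$fields = extractBankFields($str);
print_r($fields);
```

Returning null when nothing matches keeps the caller from reading named groups that were never set.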

Related

Trying to create a regex in PHP that matches patterns inside a pattern

I have seen some regex examples where the string is "Test string: Group1Group2", and using preg_match_all(), matching for patterns of text that exists inside the tags.
However, what I am trying to do is a bit different, where my string is something like this:
"some t3xt../s8fo=123,sij(variable1=123,variable2=743,variable3=535)"
What I want to do is match the sections such as 'variable=123' that exist inside the parenthesis.
What I have so far is this:
if (preg_match_all("/\(([^\)]*?)\)/", $string_value, $matches))
{
print_r( $matches[1] );
}
But this just captures everything that's inside the parenthesis, and doesn't match anything else.
Edit:
The desired output would be:
"variable1=123"
"variable2=743"
"variable3=535"
The output that I am getting is:
"variable1=123,variable2=743,variable3=535"
You can extract the matches you need with a single call to preg_match_all if the matches do not contain (, ) or ,:
$s = '"some t3xt../s8fo=123,sij(variable1=123,variable2=743,variable3=535)"';
if (preg_match_all('~(?:\G(?!\A),|\()\K[^,]+(?=[^()]*\))~', $s, $matches)) {
print_r($matches[0]);
}
Details:
(?:\G(?!\A),|\() - either end of the preceding successful match and a comma, or a ( char
\K - match reset operator that discards all text matched so far from the current overall match memory buffer
[^,]+ - one or more chars other than a comma (use [^,]* if you expect empty matches, too)
(?=[^()]*\)) - a positive lookahead that requires zero or more chars other than ( and ) and then a ) immediately to the right of the current location.
I would do this:
preg_match("/\(([^\)]+)\)/", $string_value, $matches);
$result = explode(",", $matches[1]);
If the end result should be an array of key => value pairs, you can turn the captured text into a query string and parse it:
preg_match("/\(([^\)]+)\)/", $string_value, $matches);
parse_str(str_replace(',', '&', $matches[1]), $result);
Which yields:
Array
(
[variable1] => 123
[variable2] => 743
[variable3] => 535
)
Or replace the commas with a newline (\n) and use parse_ini_string().
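As a variation on the explode() approach, here is a rough sketch that guards against a missing match and splits each item into key and value by hand. The function name parseParenthesizedPairs is hypothetical, and it assumes the values contain no commas or parentheses.

```php
<?php
// Hypothetical helper: returns the key=value pairs inside the first (...) group.
function parseParenthesizedPairs(string $input): array
{
    // Capture everything between the first "(" and the next ")".
    if (preg_match('/\(([^)]+)\)/', $input, $m) !== 1) {
        return [];
    }

    $pairs = [];
    foreach (explode(',', $m[1]) as $item) {
        // A limit of 2 keeps any further "=" inside the value;
        // array_pad covers items that have no "=" at all.
        [$key, $value] = array_pad(explode('=', $item, 2), 2, '');
        $pairs[trim($key)] = trim($value);
    }

    return $pairs;
}

$pairs = parseParenthesizedPairs('some t3xt../s8fo=123,sij(variable1=123,variable2=743,variable3=535)');
print_r($pairs);
```

Unlike parse_str(), this does not URL-decode the values or rewrite characters in the keys, which may or may not be what you want.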

Matching whole words between commas, or a comma at the beginning, or a comma at the end with Regex

I have a string like this:
page-9000,page-template,page-type,page-category-128,image-195,listing-latest,rss-latest,even-more-info,even-more-tags
I made this regex that I expect to get the whole tags with:
(?<=\,)(rss-latest|listing-latest-no-category|category-128|page-9000)(?=\,)
I want it to match all the occurrences.
In this case:
page-9000 and rss-latest.
This regex checks whole words between commas just fine, but it ignores the first and the last tag because they are not between commas (obviously).
I've also tried checking whether the tag is between commas OR has a comma only at the beginning OR only at the end; however, that gives me false positives, as it would match:
category-128
while the string contains:
page-category-128
Any help?
Try using the following pattern:
(?<=,|^)(rss-latest|listing-latest-no-category|category-128|page-9000)(?=,|$)
The only change I have made is to add boundary markers ^ and $ to the lookarounds to also match on the start and end of the input.
Script:
$input = "page-9000,page-template,page-type,page-category-128,image-195,listing-latest,rss-latest,even-more-info,even-more-tags";
preg_match_all("/(?<=,|^)(rss-latest|listing-latest-no-category|category-128|page-9000)(?=,|$)/", $input, $matches);
print_r($matches[1]);
This prints:
Array
(
[0] => page-9000
[1] => rss-latest
)
Here is a non-regex way using explode and array_intersect:
$arr1 = explode(',', 'page-9000,page-template,page-type,page-category-128,image-195,listing-latest,rss-latest,even-more-info,even-more-tags');
$arr2 = explode('|', 'rss-latest|listing-latest-no-category|category-128|page-9000');
print_r(array_intersect($arr1, $arr2));
Output:
Array
(
[0] => page-9000
[6] => rss-latest
)
The (?<=\,) and (?=\,) lookarounds require a , on both sides of the match. You also want to match at the start and end of the string, so you need to either explicitly allow , or the start/end of the string, or use double-negating logic with negated character classes inside negative lookarounds.
You may use
(?<![^,])(?:rss-latest|listing-latest-no-category|category-128|page-9000)(?![^,])
Here, (?<![^,]) matches the start of string position or a , and (?![^,]) matches the end of string position or ,.
Now you do not even need a capturing group: a non-capturing group, (?:...), avoids its overhead. preg_match_all does not have to store the submatches, and the resulting array is much cleaner.
PHP demo:
$re = '/(?<![^,])(?:rss-latest|listing-latest-no-category|category-128|page-9000)(?![^,])/m';
$str = 'page-9000,page-template,page-type,page-category-128,image-195,listing-latest,rss-latest,even-more-info,even-more-tags';
if (preg_match_all($re, $str, $matches)) {
print_r($matches[0]);
}
// => Array ( [0] => page-9000 [1] => rss-latest )
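If the list of wanted tags comes from a variable rather than being hard-coded, the same lookarounds can be wrapped around a generated alternation. This is only a sketch: findWholeTags is a made-up name, and it assumes the tags themselves never contain a comma.

```php
<?php
// Hypothetical helper: returns the wanted tags that occur as whole comma-separated items.
function findWholeTags(string $csv, array $wanted): array
{
    $quoted = [];
    foreach ($wanted as $tag) {
        // preg_quote escapes regex metacharacters and the "/" delimiter.
        $quoted[] = preg_quote($tag, '/');
    }

    // (?<![^,]) = start of string or preceded by a comma;
    // (?![^,])  = end of string or followed by a comma.
    $re = '/(?<![^,])(?:' . implode('|', $quoted) . ')(?![^,])/';
    preg_match_all($re, $csv, $m);

    return $m[0];
}

$tags = 'page-9000,page-template,page-type,page-category-128,image-195,listing-latest,rss-latest,even-more-info,even-more-tags';
$found = findWholeTags($tags, ['rss-latest', 'listing-latest-no-category', 'category-128', 'page-9000']);
print_r($found);
```

Note that category-128 is not reported, because in the input it only appears as part of page-category-128.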

Split String With preg_match

I have this string:
$productList="
Saluran Dua(Bothway)-(TAN007);
Speedy Password-(INET PASS);
Memo-(T-Memo);
7-pib r-10/10-(AM);
FBI (R/N/M)-(Rr/R(A));
";
I want a result like this:
Array(
[0]=>TAN007
[1]=>INET PASS
[2]=>T-Memo
[3]=>AM
[4]=>Rr/R(A)
);
I used:
$separator = '/\-\(([A-z ]*)\)/';
preg_match_all($separator, $productList, $match);
$value=$match[1];
but the result is:
Array(
[0]=>INET PASS
[1]=>AM
);
There must be something wrong in the code. Can anybody help with this?
Your regex does not include all the characters that can appear in the piece of text you want to capture.
A regex that works for this input is:
$match = array();
preg_match_all('/-\((.*)\);/', $productList, $match);
Explanation (from the inside to outside):
.* matches any run of characters except a newline, so each match stays on one line;
(.*) is the expression above put in parentheses to capture the match in $match[1];
-\((.*)\); is the above in context: it matches only when preceded by -( and followed by );; the parentheses are escaped so that they are literal characters rather than a capturing group;
there is no need to escape - in a regex; it has a special meaning only inside character classes ([A-Z], for example), and even there a dash right after the [ or right before the ] has no special meaning; e.g. [-A-Z] means a dash (-) or any capital letter (A to Z).
Now, print_r($match[1]); looks like this:
Array
(
[0] => TAN007
[1] => INET PASS
[2] => T-Memo
[3] => AM
[4] => Rr/R(A)
)
For the 1st line you need 0-9,
for the 3rd line you need a - in the character class, and
for the last line you need ( and ) as well as /.
Try this:
#\-\(([a-zA-Z/0-9(\)\- ]*)\)#
Try this regex:
$separator = '#\-\(([A-Za-z0-9/\-\(\) ]*)\)#';
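For comparison, here is a small self-contained check of the catch-all approach. Anchoring on ); at the end of each line (the $ together with the m modifier) is my own addition rather than something taken from the answers, and the input is rebuilt with "\n" line endings as an assumption about the original string.

```php
<?php
// The product list from the question, one entry per line.
$productList = implode("\n", [
    'Saluran Dua(Bothway)-(TAN007);',
    'Speedy Password-(INET PASS);',
    'Memo-(T-Memo);',
    '7-pib r-10/10-(AM);',
    'FBI (R/N/M)-(Rr/R(A));',
]);

// "-(" starts the code, ");" at the end of the line closes it.
// The greedy .* backtracks until ");" sits right before the line end,
// so nested parentheses such as "Rr/R(A)" stay inside the capture.
preg_match_all('/-\((.*)\);$/m', $productList, $match);
$codes = $match[1];
print_r($codes);
```

The character-class variants work too, but they need to be extended every time a new character shows up in a code.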

Pattern matching css rules

I have the following pattern:
[\{\}].*[\{\}]
With the following test strings (can provide more if needed):
}.prop{hello:ars;} //shouldn't match
}#prop{} //should match
}.prop #prop {} //should match
The purpose of the pattern is to find empty css rulesets. Can someone suggest how I go about excluding matches with characters between the second set of brackets? I will be updating the pattern as I get closer to a solution.
edit:
on http://gskinner.com/RegExr/
this pattern: [\}].*[\{]{1}[/}]{1}
seems to have the desired result, although it breaks when transferred to PHP for reasons I don't understand.
edit:
First, apologies if this should be a separate question.
Using the pattern in the first edit in php:
$pattern = "/[\}].*[\{]{1}[/}]{1}/";
preg_match_all ($pattern, $new_css, $p);
print_r($p);
When $new_css is a string of the content of an uploaded css file containing empty rulesets, $p is never populated. Yet I know this pattern is ok. Can anyone see what the issue is?
edit: final solution
//take out other unwanted characters
$pattern = "/\}([\.#\w]\w+\s*)+{}/";
//do it twice to beat any deformation
$new_css = preg_replace ($pattern, '}', $new_css);
$new_css = preg_replace ($pattern, '}', $new_css);
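Use single quotes around the regex (or double the \ characters) so that the backslashes reach the regex engine unchanged. The pattern from your first edit also contains an unescaped / inside [/}]. Because / is the delimiter here, PHP treats that / as the end of the pattern and the rest as modifiers, so preg_match_all() fails with a warning and $p stays empty. Escape it as \/, use a different delimiter such as ~, or write [\}] if a closing brace was intended.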
Try the pattern: '/}([\.#]\w+\s*)+{}/'
$new_css = "{}.prop{hello:ars;}
{}#prop{} //should match
}.prop #prop {} //should match
}.prop { aslkdfj}
}.prop { }
";
$pattern = '/}([\.#]\w+\s*)+{}/';
preg_match_all ($pattern, $new_css, $p);
print_r($p);
This outputs:
Array
(
[0] => Array
(
[0] => }#prop{}
[1] => }.prop #prop {}
)
[1] => Array
(
[0] => #prop
[1] => #prop
)
)
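One thing worth checking with the pattern from the first edit is the delimiter. In PHP the pattern string needs delimiters, and a / inside the pattern ends it early when / is also the delimiter. The sketch below keeps that pattern unchanged and only switches the delimiter to ~; the sample CSS is the three test strings from the question without the trailing comments.

```php
<?php
$css = "}.prop{hello:ars;}\n}#prop{}\n}.prop #prop {}\n";

// "~" is the delimiter, so the "/" inside the character class needs no escaping.
$pattern = '~[\}].*[\{]{1}[/}]{1}~';

preg_match_all($pattern, $css, $p);
$emptyRules = $p[0];
print_r($emptyRules);
```

Only the two empty rulesets are reported; the rule with a declaration inside its braces is skipped.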

Why doesn't this PHP regular expression extract the url from the css value?

I want to extract the url from the background css property "url('/img/hw (11).jpg') no-repeat". I tried:
$re = '/url\(([\'\"]?.*\.[png|jpg|jpeg|gif][\'\"]?)\)/i';
$text = "url('/img/hw (11).jpg')";
preg_match_all($re, $text, $matches);
print_r($matches);
and it gives me :
Array
(
[0] => Array
(
)
[1] => Array
(
)
)
Here is a working regex. The character class [png|jpg|jpeg|gif] matches only a single character, so replace the square brackets with parentheses to get alternation. The ".*" in the middle of your regex is also greedier than it needs to be. Finally, since the string is in single quotes, you do not need to escape the double quotes.
$re = '/url\(([\'"]?.[^\'"]*\.(png|jpg|jpeg|gif)[\'"]?)\)/i';
Try:
/url\(([\'\"]?.*\.(png|jpg|jpeg|gif)[\'\"]?)\)/i
Instead. The square brackets do a character-by-character comparison rather than the or comparison you're looking for.
I think the problem lies in this part: [png|jpg|jpeg|gif]. A character class matches only a single character.
You should do this instead :
/url\([\'\"]?(.*\.(jpg|png|jpeg|gif)[\'\"]?)\)/
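If you only want the path itself, without the surrounding quotes, one possible variation is to capture the optional quote separately and require the same quote again before the closing parenthesis. The helper name extractCssUrl and the lazy .*? are my own choices here, not taken from the answers above.

```php
<?php
// Hypothetical helper: returns the image path inside url(...), or null if none is found.
function extractCssUrl(string $value): ?string
{
    // Group 1: optional opening quote, matched again by \1 before ")".
    // Group 2: the path, ending in one of the listed extensions (alternation, not a character class).
    $re = '/url\(\s*([\'"]?)(.*?\.(?:png|jpe?g|gif))\1\s*\)/i';

    return preg_match($re, $value, $m) === 1 ? $m[2] : null;
}

$url = extractCssUrl("url('/img/hw (11).jpg') no-repeat");
var_dump($url);
```

The parentheses inside the file name are not a problem, because the match only ends where the extension is followed by the closing quote and ).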
