Extract part of string matching pattern - regex, close but no cigar

I have a string that can be very long and contain various lines and characters.
I want to extract all lines that are surrounded by SB and EB:
SB1EB
SBa description of various lengthEB
SB123.456.78EB
SB99.99EB
SB99.99EB
SB2EB
SBanother description of various lengthEB
SB123.456.00EB
SB199.99EB
SB199.99EB
3
another description of various length that I don't want to return
123.456.00
599.99
599.99
SB60EB
SBanother description of various length that i want to keepEB
SB500.256.10EB
SB0.99EB
SB0.99EB
another bit of text that i don't want - can span multiple lines
This is the pattern I am using in PHP:
preg_match_all('/SB(\d+)EB\nSB(\w.*)EB\nSB(\d{3}\.\d{3}\.\d{2})EB\nSB(\d.*)EB\nSB(\d.*)EB\n/', $string, $matches)
So this should hopefully return:
[0] -> SB1EB
SBa description of various lengthEB
SB123.456.78EB
SB99.99EB
SB99.99EB
[1] -> SB2EB
SBanother description of various lengthEB
SB123.456.00EB
SB199.99EB
SB199.99EB
[2] -> SB60EB
SBanother description of various length that i want to keepEB
SB500.256.10EB
SB0.99EB
SB0.99EB
But I'm obviously doing something wrong because it isn't matching anything. Can somebody help please?
SOLUTION:
Based on @Sajid's reply:
if (preg_match_all('/(?:SB.+?EB(?:[\r\n]+|$))/', $string, $result)) {
    for ($i = 0; $i < count($result[0]); $i++) {
        // Strip the SB/EB markers from the current line
        $single_item = $result[0][$i];
        $single_item = str_replace("SB", "", $single_item);
        $single_item = str_replace("EB", "", $single_item);
        // A 123.456.78-style id marks a record; its neighbours hold the other fields
        // (note: the neighbouring entries still include their SB/EB markers)
        if (preg_match('/(\d{3}\.\d{3}\.\d{2})/', $single_item)) {
            $id = $single_item;
            $qty = $result[0][$i-2];
            $name = $result[0][$i-1];
            $price = $result[0][$i+1];
            $total = $result[0][$i+2];
        }
    }
}
It's a bit messy, but it works! :)
Thanks

A bit of a hack, but this will do the job:
$a = array();
if (preg_match_all('/(?:SB.+?EB(?:[\r\n]+|$)){5}/', $x, $a)) {
print_r($a);
}
Note that ?: makes the groups non-capturing, {5} requires five consecutive SB...EB lines per match, and the results will be in $a[0] (e.g. $a[0][0], $a[0][1], $a[0][2], ...).
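For comparison, here is a minimal sketch that keeps the question's original idea (one capture group per line) but returns each record's fields by name. It assumes, without confirmation from the question, that the input may use Windows \r\n line endings, which is one plausible reason the \n-only pattern matched nothing; PCRE's \R matches \n, \r\n or \r. The function name extractRecords and the field names are hypothetical.

```php
<?php
// Sketch: pull each 5-line SB...EB record out with named capture groups.
// Assumption: lines may end in "\n" or "\r\n"; \R matches either sequence.
function extractRecords(string $text): array
{
    $pattern = '/^SB(?<qty>\d+)EB\R'              // quantity line
        . 'SB(?<name>.+?)EB\R'                    // free-text description
        . 'SB(?<id>\d{3}\.\d{3}\.\d{2})EB\R'      // id such as 123.456.78
        . 'SB(?<price>[\d.]+)EB\R'                // unit price
        . 'SB(?<total>[\d.]+)EB/m';               // total; /m lets ^ match at each line start
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    $records = [];
    foreach ($matches as $m) {
        // Keep only the named groups, dropping the numeric duplicates.
        $records[] = [
            'qty'   => $m['qty'],
            'name'  => $m['name'],
            'id'    => $m['id'],
            'price' => $m['price'],
            'total' => $m['total'],
        ];
    }
    return $records;
}

$text = "SB1EB\r\nSBa descriptionEB\r\nSB123.456.78EB\r\nSB99.99EB\r\nSB99.99EB\r\n"
      . "3\r\nnot wanted\r\n"
      . "SB60EB\nSBkeep meEB\nSB500.256.10EB\nSB0.99EB\nSB0.99EB\n";
print_r(extractRecords($text));
```

Lines without the SB/EB markers (like the "3" and "not wanted" lines) never satisfy ^SB, so they are skipped rather than swallowed into a record.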

preg_match_all('/SB\d+EB.*?(?=(?:SB\d+EB)|$)/s', $subject, $result, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($result[0]); $i++) {
# Matched text = $result[0][$i];
}
Based on your input, I use the "header" string SB\d+EB as an entry point and consume everything until the next "header" or the end of the input. The /s modifier makes . match newlines as well.
Explanation:
# SB\d+EB.*?(?=(?:SB\d+EB)|$)
#
# Options: dot matches newline
#
# Match the characters “SB” literally «SB»
# Match a single digit 0..9 «\d+»
# Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
# Match the characters “EB” literally «EB»
# Match any single character «.*?»
# Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
# Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=(?:SB\d+EB)|$)»
# Match either the regular expression below (attempting the next alternative only if this one fails) «(?:SB\d+EB)»
# Match the regular expression below «(?:SB\d+EB)»
# Match the characters “SB” literally «SB»
# Match a single digit 0..9 «\d+»
# Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
# Match the characters “EB” literally «EB»
# Or match regular expression number 2 below (the entire group fails if this one fails to match) «$»
# Assert position at the end of the string (or before the line break at the end of the string, if any) «$»

Related

Is it possible to match all attributes in a preg_match with empty or missing attributes?

I'm having a little bit of an issue with preg_match.
I have a string that can come with attributes in any order (eg. [foobar a="b" c="d" f="g"] or [foobar c="d" a="b" f="g"] or [foobar f="g" a="b" c="d"] etc.)
These are the patterns I have tried:
// Matches when all searched for attributes are present
// doesn't match if one of them is missing
// http://www.phpliveregex.com/p/dHi
$pattern = '\[foobar\b(?=\s)(?=(?:(?!\]).)*\s\ba=(["|\'])((?:(?!\1).)*)\1)(?=(?:(?!\]).)*\s\bc=(["\'])((?:(?!\3).)*)\3)(?:(?!\]).)*]';
// Matches only when attributes are in the right order
// http://www.phpliveregex.com/p/dHj
$pattern = '\[foobar\s+a=["\'](?<a>[^"\']*)["\']\s+c=["\'](?<c>[^"\']*).*?\]';
I'm trying to figure it out, but can't seem to get it right.
Is there a way to match all the attributes, even when other ones are missing or empty (a='')?
I've even toyed with using explode on the spaces between the attributes and then str_replace, but that seemed like overkill and not the right way to go about this.
In the links I've only matched a="b" and c="d", but I also want these cases to match when there is an e="f" or a z="x" as well.

If you have the [...] strings as separate strings, not inside larger text, it is easy to use a \G based regex to mark a starting boundary ([some_text) and then match any key-value pair with some basic regex subpatterns using negated character classes.
Here is the regex:
(?:\[foobar\b|(?!^)\G)\s+\K(?<key>[^=]+)="(?<val>[^"]*)"(?=\s+[^=]+="|])
Here is what it matches in human words:
(?:\[foobar\b|(?!^)\G) - a leading boundary, the regex engine should find it first before proceeding, and it matches literal [foobar or the end of the previous successful match (\G matches the string start or position right after the last successful match, and since we need the latter only, the negative lookahead (?!^) excludes the beginning of the string)
\s+ - 1 or more whitespace characters (they separate the tag name from the attributes)
\K - regex operator that forces the regex engine to omit all the matched characters grabbed so far. A cool alternative to a positive lookbehind in PCRE.
(?<key>[^=]+) - Named capture group "key" matching 1 or more characters other than a =.
=" - matches a literal =" sequence
(?<val>[^"]*) - Named capture group "val" matching 0 or more characters (due to the * quantifier) other than a "
" - a literal " that is a closing delimiter for a value substring.
(?=\s+[^=]+="|]) - a positive lookahead making sure there is a next attribute or the end of the [tag xx="yy"...] entity.
PHP code:
$re = '/(?:\[foobar\b|(?!^)\G)\s+\K(?<key>[^=]+)="(?<val>[^"]*)"(?=\s+[^=]+="|])/';
$str = "[foobar a=\"b\" c=\"d\" f=\"g\"]";
preg_match_all($re, $str, $matches);
print_r(array_combine($matches["key"], $matches["val"]));
Output: [a] => b, [c] => d, [f] => g.
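The regex above only accepts double-quoted values, while the question also mentions single-quoted empty values such as a=''. A minimal sketch that accepts either quote style by capturing the opening quote and requiring the same one to close it; parseFoobarAttributes is a hypothetical name, and the sketch assumes attribute values never contain a ] character.

```php
<?php
// Sketch: read key/value pairs from a [foobar ...] tag in any order,
// accepting double- or single-quoted values, including empty ones (a='').
// Assumption: values never contain a "]" character.
function parseFoobarAttributes(string $tag): array
{
    $attrs = [];
    if (!preg_match('/^\[foobar\b([^\]]*)\]$/', $tag, $m)) {
        return $attrs; // not a [foobar ...] tag
    }
    // (["\']) captures the opening quote; \2 requires the same quote to close.
    preg_match_all('/(\w+)\s*=\s*(["\'])(.*?)\2/', $m[1], $pairs, PREG_SET_ORDER);
    foreach ($pairs as $p) {
        $attrs[$p[1]] = $p[3]; // later duplicates overwrite earlier ones
    }
    return $attrs;
}

// Merge with defaults so missing attributes still have a value.
$defaults = ['a' => '', 'c' => '', 'f' => ''];
print_r(array_merge($defaults, parseFoobarAttributes("[foobar f=\"g\" a='']")));
```

Merging with a defaults array is one way to cover the "missing attribute" case; the order of attributes inside the tag does not matter because each pair is matched independently.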

You could use the following function:
function toAssociativeArray($str) {
    // Single key/pair extraction pattern:
    $pattern = '(\w+)\s*=\s*"([^"]*)"';
    $res = array();
    // Valid string?
    if (preg_match("/\[foobar((\s+$pattern)*)\]/", $str, $matches)) {
        // Yes, extract key/value pairs:
        preg_match_all("/$pattern/", $matches[1], $matches);
        for ($i = 0; $i < count($matches[1]); $i += 1) {
            $res[$matches[1][$i]] = $matches[2][$i];
        }
    }
    return $res;
}
This is how you could use it:
// Some test data:
$testData = array('[foobar a="b" c="d" f="g"]',
'[foobar a="b" f="g" a="d"]',
'[foobar f="g" a="b" c="d"]',
'[foobar f="g" a="b"]',
'[foobar f="g" c="d" f="x"]');
// Properties I am interested in, with a default value:
$base = array("a" => "null", "c" => "nothing", "f" => "");
// Loop through the test data:
foreach ($testData as $str) {
// get the key/value pairs and merge with defaults:
$res = array_merge($base, toAssociativeArray($str));
// print value of the "a" property
echo "value of a is {$res['a']} <br>";
}
This script outputs:
value of a is b
value of a is d
value of a is b
value of a is b
value of a is null

extracting data from string using regex

I am new to my job as well as to regular expressions. I am using PHP.
For the following string, I want to extract the report number.
Dear Patient! (patient name) Your Reports(report number) has arrived.
Can someone help me create a regular expression?
Thank you.
Solved:
$str ='Dear Patient! (P.JOHN) Your Reports (REPORTNO9) has arrived.';
$str = str_replace('(', '', $str);
$str = str_replace(')', '', $str);
preg_match('/Reports\s*(\w+)/', $str, $match);
echo $match[1]; //=> "REPORTNO9"
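The str_replace calls can be avoided by matching the parentheses literally. A minimal sketch, assuming the report number is always the parenthesised token right after "Reports"; extractReportNumber is a hypothetical helper name.

```php
<?php
// Sketch: match the parentheses literally instead of stripping them first.
// Assumption: the report number is the parenthesised token right after "Reports".
function extractReportNumber(string $str): ?string
{
    // \( and \) are literal parentheses; ([^)]*) captures everything between them.
    if (preg_match('/Reports\s*\(([^)]*)\)/', $str, $match)) {
        return $match[1];
    }
    return null; // no "Reports (...)" in the string
}

echo extractReportNumber('Dear Patient! (P.JOHN) Your Reports (REPORTNO9) has arrived.'); // REPORTNO9
```

Because the patient name's parentheses come before "Reports", they are never considered, so the name can contain any characters.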

The Regular Expression
/Dear (\w+)! Your Reports(.*?)(?=has arrived)/
PHP usage
<?php
$subject = 'Dear Patient! Your Reports(report number) has arrived.';
if (preg_match('/Dear (\w+)! Your Reports(.*?)(?=has arrived)/', $subject, $regs)) {
var_dump($regs);
}
Result
array(3) {
[0]=>
string(42) "Dear Patient! Your Reports(report number) "
[1]=>
string(7) "Patient"
[2]=>
string(16) "(report number) "
}
Explanation
"
Dear\ # Match the characters “Dear ” literally
( # Match the regular expression below and capture its match into backreference number 1
\w # Match a single character that is a “word character” (letters, digits, etc.)
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
!\ Your\ Reports # Match the characters “! Your Reports” literally
( # Match the regular expression below and capture its match into backreference number 2
. # Match any single character that is not a line break character
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
)
(?= # Assert that the regex below can be matched, starting at this position (positive lookahead)
has\ arrived # Match the characters “has arrived” literally
)
"

You can use explode() to extract a specific part of a string like this, so you don't have to use a regex (the older split() function was removed in PHP 7.0):
<?php
$my_string = "Dear Patient! Your Reports(report number) has arrived."; // Put your string here
$array_my_string = explode('Reports', $my_string);
$tempResult = $array_my_string[1]; // Contains "(report number) has arrived."
$array_my_string = explode(' has arrived', $tempResult);
$finalResult = $array_my_string[0]; // Contains "(report number)"
?>

php regular expression minimum and maximum length doesn't work as expected

I want to create a regular expression in PHP that allows the user to enter a phone number in any of the formats below.
345-234 898
345 234-898
235-123-456
548 812 346
The minimum length of number should be 7 and maximum length should be 12.
The problem is that the regular expression ignores the minimum and maximum length, and I don't know why. Please help me solve it. Here is the regular expression:
if (preg_match("/^([0-9]+((\s?|-?)[0-9]+)*){7,12}$/", $string)) {
echo "ok";
} else {
echo "not ok";
}
Thanks for reading my question. I will wait for responses.

You should use the start (^) and end ($) anchors in your pattern:
$subject = "123456789";
$pattern = '/^[0-9]{7,9}$/i';
if(preg_match($pattern, $subject)){
echo 'matched';
}else{
echo 'not matched';
}

You can use preg_replace to strip out non-digit characters and then check the length of the resulting string.
$onlyDigits = preg_replace('/\\D/', '', $string);
$length = strlen($onlyDigits);
if ($length < 7 OR $length > 12)
echo "not ok";
else
echo "ok";

Simply do this:
if (preg_match("/^\d{3}[ -]\d{3}[ -]\d{3}$/", $string)) {
Here \d matches any digit from 0-9, and [ -] matches either a space or a hyphen.

You can check the length with a lookahead assertion (?=...) at the beginning of the pattern:
/^(?=.{7,12}$)[0-9]+(?:[\s-]?[0-9]+)*$/
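A quick sketch for checking the lookahead version against the formats from the question. The 6-digit and 14-character inputs are made-up failing cases, not from the original post.

```php
<?php
// Sketch: the lookahead (?=.{7,12}$) checks the total length (digits plus
// separators) before the rest of the pattern checks the format.
$pattern = '/^(?=.{7,12}$)[0-9]+(?:[\s-]?[0-9]+)*$/';

$inputs = ['345-234 898', '345 234-898', '235-123-456', '548 812 346', '123456', '1234-5678-9012'];
foreach ($inputs as $number) {
    echo $number, ': ', preg_match($pattern, $number) ? 'ok' : 'not ok', "\n";
}
```

The four sample numbers (11 characters each) print "ok"; "123456" (6 characters) and "1234-5678-9012" (14 characters) print "not ok" because the lookahead fails before the format is even checked.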

Breaking down your original regex, it can read like the following:
^ # start of input
(
[0-9]+ # any number, 1 or more times
(
(\s?|-?) # a space, or a dash.. maybe
[0-9]+ # any number, 1 or more times
)* # repeat group 0 or more times
)
{7,12} # repeat full group 7 to 12 times
$ # end of input
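So you're allowing "one or more digits", followed by zero or more groups of "optional space or dash, then one or more digits", and then repeating that whole thing 7 to 12 times. The {7,12} quantifier counts repetitions of the group, not characters, and since each repetition can itself consume any number of digits and separators, it doesn't enforce the length you intended.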
You could take a more restricted approach and write out each individual number block:
(
\d{3} # any 3 numbers
(?:[ ]+|-)? # any (optional) spaces or a hyphen
\d{3} # any 3 numbers
(?:[ ]+|-)? # any (optional) spaces or a hyphen
\d{3} # any 3 numbers
)
Simplified:
if (preg_match('/^(\d{3}(?:[ ]+|-)?\d{3}(?:[ ]+|-)?\d{3})$/', $string)) {
If you want to restrict the separators to be only a single space or a hyphen, you can update the regex to use [ -] instead of (?:[ ]+|-); if you want this to be "optional" (i.e. there can be no separator between number groups), add in a ? to the end of each.
if (preg_match('/^(\d{3}[ -]\d{3}[ -]\d{3})$/', $string)) {

This may help you out (a Laravel custom validation rule):
Validator::extend('price', function ($attribute, $value, $args) {
return preg_match('/^\d{0,8}(\.\d{1,2})?$/', $value);
});

Regex: match two adjacent strings

I have some code such as:
if('hello' == 2 && 'world' !== -1){
return true;
}
I'm having some trouble matching the condition in the if statement. The first regex I thought of was /'.*'/, but this matches:
'hello'
' == 2 && '
'world'
which isn't what I was hoping for. I only want to match the single quotes and the text inside.
'hello'
'world'
Does anybody have any idea?

Try this
preg_match_all('/\'[^\'\r\n]*\'/m', $subject, $result, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($result[0]); $i++) {
# Matched text = $result[0][$i];
}
Explanation
"
' # Match the character “'” literally
[^'\\r\\n] # Match a single character NOT present in the list below
# The character “'”
# A carriage return character
# A line feed character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
' # Match the character “'” literally
"

The two matching groups in this should pick up your quoted values:
^.*(\'.*?\').*(\'.*?\').*$

For your specific case
\'[a-z]*?\'
For the entire code, if you have uppercase characters in the quotes, you can use
\'[a-zA-Z]*?\'
However, if you have special characters in the quotes as well, you can use what @Chris Cooper suggested. Depending on your needs, a variety of answers are possible.
Note: '?' after * makes * non-greedy, so it won't try to search until the last quote.
It also matters which regex method you use to get the answers.

Here's what I came up with!
preg_match_all("#'[^'\n\r]*'#", $subject, $matches);
Match '.
Match any character that is not ', new line, or carriage return.
Match '.
Without all of the escaping, I think it's a bit more readable—for being a regular expression, anyway.
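To see why the first attempt over-matched, here is a small sketch running both the greedy /'.*'/ and the negated-class pattern over the snippet from the question (reassembled as a PHP string for illustration).

```php
<?php
// Sketch comparing the question's greedy pattern with the negated-class version.
$subject = "if('hello' == 2 && 'world' !== -1){\n    return true;\n}";

preg_match_all("/'.*'/", $subject, $greedy);         // .* runs to the LAST quote on the line
preg_match_all("#'[^'\n\r]*'#", $subject, $quoted);  // [^'...]* stops at the NEXT quote

print_r($greedy[0]); // one match: 'hello' == 2 && 'world'
print_r($quoted[0]); // two matches: 'hello' and 'world'
```

The greedy .* backtracks only as far as the last quote it can find, so everything between the first and last quote ends up in a single match; excluding ' from the repeated class makes each match end at the first closing quote.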

php - preg_replace / backreference - Syntax to extract parts from mail address

I have a variable like this:
$mail_from = "Firstname Lastname <email@domain.com>";
I would like to receive either an
array(name=>"firstname lastname", email=>"email@domain.com")
or
the values in two separate variables ($name = "...", $email = "...")
I have been playing around with preg_replace but somehow can't get it done.
I did an extensive search but did not find a way to do this.
This is the closest I got:
$str = 'My First Name <email@domain.com>';
preg_match('~(?:"([^"]*)")?\s*(.*)~',$str,$var);
print_r($var);
echo "<br>Name: ".$var[0];
echo "<br>Mail: ".$var[2];
How do I get "email@domain.com" into $var['x']?
Thank you.

This works for your example and should always work when the email is within angle brackets.
$str = 'My First Name <email@domain.com>';
preg_match('~(?:([^<]*?)\s*)?<(.*)>~', $str, $var);
print_r($var);
echo "<br>Name: ".$var[1];
echo "<br>Mail: ".$var[2];
Explanation:
(?:([^<]*?)\s*)? optionally matches everything up to the first <; everything except the trailing whitespace is stored in group 1.
<(.*)> matches whatever is between the angle brackets and stores it in group 2.
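If the associative array from the question is the goal, named groups can produce it directly. A minimal sketch, assuming the address is always inside <...> and the name part is optional; parseMailFrom is a hypothetical name.

```php
<?php
// Sketch: named groups give the array shape the question asks for.
// Assumption: the address is always enclosed in <...>; the name part is optional.
function parseMailFrom(string $mailFrom): ?array
{
    // (?<name>[^<]*?) lazily takes text before "<"; \s* drops the trailing space.
    if (!preg_match('/^\s*(?<name>[^<]*?)\s*<(?<email>[^>]+)>\s*$/', $mailFrom, $m)) {
        return null; // no <address> found
    }
    return ['name' => $m['name'], 'email' => $m['email']];
}

print_r(parseMailFrom('Firstname Lastname <email@domain.com>'));
```

Returning null when there is no <...> part lets the caller distinguish "no address" from "address without a name" (which yields an empty name string).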

// Try this:
$mail_from = "Firstname Lastname <email@domain.com>";
$a = explode("<", $mail_from);
$b = str_replace(">", "", $a[1]); // email address
$c = trim($a[0]);                 // name, without the trailing space
echo $b, "\n";
echo $c, "\n";

Try this:
(?<=")([^"<>]+?) *<([^<>"]+)>(?=")
Explanation:
(?<=")([^"<>]+?) *<([^<>"]+)>(?=")
Options: ^ and $ match at line breaks
Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=")»
Match the character “"” literally «"»
Match the regular expression below and capture its match into backreference number 1 «([^"<>]+?)»
Match a single character NOT present in the list “"<>” «[^"<>]+?»
Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
Match the character “ ” literally « *»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character “<” literally «<»
Match the regular expression below and capture its match into backreference number 2 «([^<>"]+)»
Match a single character NOT present in the list “<>"” «[^<>"]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “>” literally «>»
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=")»
Match the character “"” literally «"»
Code:
$result = preg_replace('/(?<=")([^"<>]+?) *<([^<>"]+)>(?=")/m', '<br>Name:$1<br>Mail:$2', $subject);
