extracting data from string using regex

extracting data from string using regex - php

I am new to job as well as for regular expressions. I am using php.
For the following string i want to extract the report number.
Dear Patient! (patient name) Your Reports(report number) has arrived.
can someone help me in creating a regular expression.
thank you
Solved:
$str ='Dear Patient! (P.JOHN) Your Reports (REPORTNO9) has arrived.';
$str = str_replace('(', '', $str);
$str = str_replace(')', '', $str);
preg_match('/Reports\s*(\w+)/', $str, $match);
echo $match[1]; //=> "REPORTNO9"

The Regular Expression
/Dear (\w+)! Your Reports(.*?)(?=has arrived)/
PHP usage
<?php
$subject = 'Dear Patient! Your Reports(report number) has arrived.';
if (preg_match('/Dear (\w+)! Your Reports(.*?)(?=has arrived)/', $subject, $regs)) {
var_dump($regs);
}
Result
array(3) {
[0]=>
string(42) "Dear Patient! Your Reports(report number) "
[1]=>
string(7) "Patient"
[2]=>
string(16) "(report number) "
}
Explanation
"
Dear\ # Match the characters “Dear ” literally
( # Match the regular expression below and capture its match into backreference number 1
\w # Match a single character that is a “word character” (letters, digits, etc.)
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
!\ Your\ Reports # Match the characters “! Your Reports” literally
( # Match the regular expression below and capture its match into backreference number 2
. # Match any single character that is not a line break character
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
)
(?= # Assert that the regex below can be matched, starting at this position (positive lookahead)
has\ arrived # Match the characters “has arrived” literally
)
"

You can use "split()" to extract the specific part of a string like this, so you don't have to use regex :
<?php
$my_string = ""; // Put there you string
$array_my_string = array();
$array_my_string = split('Reports', $my_string);
$tempResult = array_my_string[1]; // Will contains "(report number) has arrived."
$array_my_string = split(' has arrived', $tempResult);
$finalResult = $array_my_result[0]; // Will contains "(report number)"
?>

Related

PHP preg_match returns two matches instead of one

I have this string: ATL.556808.UMO20.02 and I want to get only UMO20.02.
Here is my preg_match:
$e = preg_match('"\.[^\.]+\.(.*?)$"si', $t, $m);
But this code return two matches instead of one. I got:
array(2) {
[0]=> string(16) ".556808.UMO20.02"
[1]=> string(8) "UMO20.02"
}
But I want to get one match:
array(1) {
[0]=> string(8) "UMO20.02"
}
Where is the problem?

You don't have to use the s and i flags as there are no specific cases for upper or lowercase chars, and the dot does not have to match a newline in the example data.
You can use
\.[^.]+\.\K.+$
\. Match .
[^.]+\. Match 1+ times any char except a .
\K Forget what is matched
.+ Match any char 1+ times
$ End of string
Regex demo
Example code
$re = '/\.[^.]+\.\K.+$/';
$str = 'ATL.556808.UMO20.02';
preg_match($re, $str, $matches);
print_r($matches);
Output
Array
(
[0] => UMO20.02
)

Your \.[^\.]+\.(.*?)$ regex matches a ., then any one or more chars other than a dot, then a dot, and then any zero or more chars as few as possible (but as many as necessary to complete a match) up to the end of string. The .*? must be tempered to match any chars but dots.
To remove all up to and including the second dot, you can use
$t = 'ATL.556808.UMO20.02';
echo preg_replace('~^(?:[^.]+\.){2}~', '', $t);
// => UMO20.02
See the PHP demo. See the regex demo. Details:
^ - start of string
(?:[^.]+\.){2} - two occurrences of any one or more chars other than a . and then a . char

split string in numbers and text but accept text with a single digit inside

Let's say I want to split this string in two variables:
$string = "levis 501";
I will use
preg_match('/\d+/', $string, $num);
preg_match('/\D+/', $string, $text);
but then let's say I want to split this one in two
$string = "levis 5° 501";
as $text = "levis 5°"; and $num = "501";
So my guess is I should add a rule to the preg_match('/\d+/', $string, $num); that looks for numbers only at the END of the string and I want it to be between 2 and 3 digits.
But also the $text match now has one number inside...
How would you do it?

To slit a string in two parts, use any of the following:
preg_match('~^(.*?)\s*(\d+)\D*$~s', $s, $matches);
This regex matches:
^ - the start of the string
(.*?) - Group 1 capturing any one or more characters, as few as possible (as *? is a "lazy" quantifier) up to...
\s* - zero or more whitespace symbols
(\d+) - Group 2 capturing 1 or more digits
\D* - zero or more characters other than digit (it is the opposite shorthand character class to \d)
$ - end of string.
The ~s modifier is a DOTALL one forcing the . to match any character, even a newline, that it does not match without this modifier.
Or
preg_split('~\s*(?=\s*\d+\D*$)~', $s);
This \s*(?=\s*\d+\D*$) pattern:
\s* - zero or more whitespaces, but only if followed by...
(?=\s*\d+\D*$) - zero or more whitespaces followed with 1+ digits followed with 0+ characters other than digits followed with end of string.
The (?=...) construct is a positive lookahead that does not consume characters and just checks if the pattern inside matches and if yes, returns "true", and if not, no match occurs.
See IDEONE demo:
$s = "levis 5° 501";
preg_match('~^(.*?)\s*(\d+)\D*$~s', $s, $matches);
print_r($matches[1] . ": ". $matches[2]. PHP_EOL);
print_r(preg_split('~\s*(?=\s*\d+\D*$)~', $s, 2));

Find words starting and ending with dollar signs $ in PHP

I am looking to find and replace words in a long string. I want to find words that start looks like this: $test$ and replace it with nothing.
I have tried a lot of things and can't figure out the regular expression. This is the last one I tried:
preg_replace("/\b\\$(.*)\\$\b/im", '', $text);
No matter what I do, I can't get it to replace words that begin and end with a dollar sign.

Use single quotes instead of double quotes and remove the double escape.
$text = preg_replace('/\$(.*?)\$/', '', $text);
Also a word boundary \b does not consume any characters, it asserts that on one side there is a word character, and on the other side there is not. You need to remove the word boundary for this to work and you have nothing containing word characters in your regular expression, so the i modifier is useless here and you have no anchors so remove the m (multi-line) modifier as well.
As well * is a greedy operator. Therefore, .* will match as much as it can and still allow the remainder of the regular expression to match. To be clear on this, it will replace the entire string:
$text = '$fooo$ bar $baz$ quz $foobar$';
var_dump(preg_replace('/\$(.*)\$/', '', $text));
# => string(0) ""
I recommend using a non-greedy operator *? here. Once you specify the question mark, you're stating (don't be greedy.. as soon as you find a ending $... stop, you're done.)
$text = '$fooo$ bar $baz$ quz $foobar$';
var_dump(preg_replace('/\$(.*?)\$/', '', $text));
# => string(10) " bar quz "
Edit
To fix your problem, you can use \S which matches any non-white space character.
$text = '$20.00 is the $total$';
var_dump(preg_replace('/\$\S+\$/', '', $text));
# string(14) "$20.00 is the "

There are three different positions that qualify as word boundaries \b:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
$ is not a word character, so don't use \b or it won't work. Also, there is no need for the double escaping and no need for the im modifiers:
preg_replace('/\$(.*)\$/', '', $text);
I would use:
preg_replace('/\$[^$]+\$/', '', $text);

You can use preg_quote to help you out on 'quoting':
$t = preg_replace('/' . preg_quote('$', '/') . '.*?' . preg_quote('$', '/') . '/', '', $text);
echo $t;
From the docs:
This is useful if you have a run-time string that you need to match in some text and the string may contain special regex characters.
The special regular expression characters are: . \ + * ? [ ^ ] $ ( ) { } = ! < > | : -

Contrary to your use of word boundary markers (\b), you actually want the inverse effect (\B)-- you want to make sure that there ISN'T a word character next to the non-word character $.
You also don't need to use capturing parentheses because you are not using a backreference in your replacement string.
\S+ means one or more non-whitespace characters -- with greedy/possessive matching.
Code: (Demo)
$text = '$foo$ boo hi$$ mon$k$ey $how thi$ $baz$ bar $foobar$';
var_export(
preg_replace(
'/\B\$\S+\$\B/',
'',
$text
)
);
Output:
' boo hi$$ mon$k$ey $how thi$ bar '

php - preg_replace / backreference - Syntax to extract parts from mail address

I do have a var like this:
$mail_from = "Firstname Lastname <email#domain.com>";
I would like to receive either an
array(name=>"firstname lastname", email=>"email#domain.com")
or
the values in two separate vars ($name = "...", $email = "...")
I have been playing around with preg_replace but somehow do not get it done ...
Did extensive search but did not find a way to get this done.
This is the closest I got:
$str = 'My First Name <email#domain.com>';
preg_match('~(?:"([^"]*)")?\s*(.*)~',$str,$var);
print_r($var);
echo "<br>Name: ".$var[0];
echo "<br>Mail: ".$var[2];
How do I get "email#domain.com" into $var['x]?
Thank you.

This works for your example and should always work, when the email is within angle brackets.
$str = 'My First Name <email#domain.com>';
preg_match('~(?:([^<]*?)\s*)?<(.*)>~', $str, $var);
print_r($var);
echo "<br>Name: ".$var[1];
echo "<br>Mail: ".$var[2];
Explanation:
(?:([^<]*?)\s*)? matches optionally everything that is not a < and everything except the trailing whitespace is stored in group 1.
<(.*)> matches something between angle brackets and store it in group 2.

//trythis
$mail_from = "Firstname Lastname <email#domain.com>";
$a = explode("<", $mail_from);
$b=str_replace(">","",$a[1]);
$c=$a[0];
echo $b;
echo $c;

Try this:
(?<=")([^"<>]+?) *<([^<>"]+)>(?=")
Explanation:
<!--
(?<=")([^"<>]+?) *<([^<>"]+)>(?=")
Options: ^ and $ match at line breaks
Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=")»
Match the character “"” literally «"»
Match the regular expression below and capture its match into backreference number 1 «([^"<>]+?)»
Match a single character NOT present in the list “"<>” «[^"<>]+?»
Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
Match the character “ ” literally « *»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character “<” literally «<»
Match the regular expression below and capture its match into backreference number 2 «([^<>"]+)»
Match a single character NOT present in the list “<>"” «[^<>"]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “>” literally «>»
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=")»
Match the character “"” literally «"»
-->
Code:
$result = preg_replace('/(?<=")([^"<>]+?) *<([^<>"]+)>(?=")/m', '<br>Name:$1<br>Mail:$2', $subject);

Extract part of string matching pattern - regex, close but no cigar

I have a string that can be very long and contain various lines and characters.
I am wanting to extract all lines that are surrounded by SB & EB:
SB1EB
SBa description of various lengthEB
SB123.456.78EB
SB99.99EB
SB99.99EB
SB2EB
SBanother description of various lengthEB
SB123.456.00EB
SB199.99EB
SB199.99EB
3
another description of various length that I don't want to return
123.456.00
599.99
599.99
SB60EB
SBanother description of various length that i want to keepEB
SB500.256.10EB
SB0.99EB
SB0.99EB
another bit of text that i don't want - can span multiple lines
This is the pattern I am using in PHP:
preg_match_all('/SB(\d+)EB\nSB(\w.*)EB\nSB(\d{3}\.\d{3}\.\d{2})EB\nSB(\d.*)EB\nSB(\d.*)EB\n/', $string, $matches)
So this should hopefully return:
[0] -> SB1EB
SBa description of various lengthEB
SB123.456.78EB
SB99.99EB
SB99.99EB
[1] -> SB2EB
SBanother description of various lengthEB
SB123.456.00EB
SB199.99EB
SB199.99EB
[2] -> SB60EB
SBanother description of various length that i want to keepEB
SB500.256.10EB
SB0.99EB
SB0.99EB
But I'm obviously doing something wrong because it isn't matching anything. Can somebody help please?
SOLUTION:
Based on #Sajid reply:
if (preg_match_all('/(?:SB.+?EB(?:[\r\n]+|$))/', $string, $result)) {
for($i=0;$i<count($result[0]);$i++){
$single_item = $result[0][$i];
$single_item = str_replace("SB","",$single_item);
$single_item = str_replace("EB","",$single_item);
if (preg_match('/(\d{3}\.\d{3}\.\d{2})/', $single_item)) {
$id = $single_item;
$qty = $result[0][$i-2];
$name = $result[0][$i-1];
$price = $result[0][$i+1];
$total = $result[0][$i+2];
}
}
}
It's a bit messy, but it works! :)
Thanks

A bit of a hack, but this will do the job:
$a = array();
if (preg_match_all('/(?:SB.+?EB(?:[\r\n]+|$)){5}/', $x, $a)) {
print_r($a);
}
Note that ?: is used to make the group non-capture, and the results will be in $a[0] (eg, $a[0][0], $a[0][1], $a[0][2] ...)

Based on #Sajid reply:
if (preg_match_all('/(?:SB.+?EB(?:[\r\n]+|$))/', $string, $result))
{
for ($i=0; $i<count($result[0]); $i++)
{
$single_item = $result[0][$i];
$single_item = str_replace("SB","",$single_item);
$single_item = str_replace("EB","",$single_item);
if (preg_match('/(\d{3}\.\d{3}\.\d{2})/', $single_item))
{
$id = $single_item;
$qty = $result[0][$i-2];
$name = $result[0][$i-1];
$price = $result[0][$i+1];
$total = $result[0][$i+2];
}
}
}
It's a bit messy, but it works! :)

preg_match_all('/SB\d+EB.*?(?=(?:SB\d+EB)|$)/s', $subject, $result, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($result[0]); $i++) {
# Matched text = $result[0][$i];
}
So basically what I am doing (based on your input) is simply checking the "header" string SB\d+EB as an entry point and consuming everything until I find another "header" or the end of the input. Note the /s modifier so that . matches newlines.
Explanation:
# SB\d+EB.*?(?=(?:SB\d+EB)|$)
#
# Options: dot matches newline
#
# Match the characters “SB” literally «SB»
# Match a single digit 0..9 «\d+»
# Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
# Match the characters “EB” literally «EB»
# Match any single character «.*?»
# Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
# Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=(?:SB\d+EB)|$)»
# Match either the regular expression below (attempting the next alternative only if this one fails) «(?:SB\d+EB)»
# Match the regular expression below «(?:SB\d+EB)»
# Match the characters “SB” literally «SB»
# Match a single digit 0..9 «\d+»
# Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
# Match the characters “EB” literally «EB»
# Or match regular expression number 2 below (the entire group fails if this one fails to match) «$»
# Assert position at the end of the string (or before the line break at the end of the string, if any) «$»

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

extracting data from string using regex - php

Related

PHP preg_match returns two matches instead of one

split string in numbers and text but accept text with a single digit inside

Find words starting and ending with dollar signs $ in PHP

php - preg_replace / backreference - Syntax to extract parts from mail address

Extract part of string matching pattern - regex, close but no cigar

Categories

Resources