PHP Regex - how to remove string after "Newline * regards" - php

I've got strings like the following:
Hi X
Blah
Kind regards
ABC
And
Hi X
Blah
Regards
CBA
So the key is the newline and the word "regards" (case insensitive). I'd like to use PHP to get the part of the string before the line that contains "regards". E.g. for these examples, the result should just be:
Hi X
Blah
I've tried the below but it doesn't work as intended in some cases (E.g. if "Kind" appears multiple times in the string). Thanks in advance!
$matches = array();
if (preg_match("/\n(.*?)regards/i", $message, $matches) == 1) {
$stop_at = $matches[1];
$split = explode($stop_at,$message);
$message = $split[0];
}

What you're really after is a regex that handles multi-line strings. For this, you can use the m flag (PCRE_MULTILINE).
I would use preg_split() to split the string on your token, for example
$found = trim(preg_split('/^.*regards$/im', $message, 2)[0]);
Demo ~ https://3v4l.org/idMcP
Some notes:
I've used trim() to remove the empty line after "Blah" (your examples exclude it)
I've set a limit of 2 on preg_split(). This is redundant given you're only retrieving the first split but in my head, it means PHP does less work (realities may vary).
This will also split on a line whose last word merely ends in "regards" without being the word "regards", for example this word I just made up, "goregards" (it's like a shin guard but for viscera).
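One possible way around the "goregards" caveat is to add word boundaries around the token. The sketch below is only an illustration: the function name text_before_regards is made up for this example, and it assumes you also want to cut at lines where "regards" is followed by other text (such as a trailing comma).

```php
<?php
// Hypothetical helper (not from the answer above): return the text that
// precedes the first line containing the whole word "regards".
function text_before_regards(string $message): string
{
    // \b...\b stops "goregards" from matching.
    // m makes ^ and $ match at line boundaries, i ignores case.
    $parts = preg_split('/^.*\bregards\b.*$/im', $message, 2);

    // trim() drops the newline left behind after the last kept line.
    return trim($parts[0]);
}

echo text_before_regards("Hi X\nBlah\nKind regards\nABC"), "\n";
```

A message with no such line comes back unchanged (apart from the trim), since preg_split() then returns the whole string as its only element.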

You can use the regular expression
(?si).+\b(?=\n+[\w ]*regards)
It will match everything up to a word boundary, then look ahead for newline(s) followed by a line which has "regards" on it (possibly preceded by a combination of word characters or spaces).
$str = "Hi X
Blah
Kind regards
ABC";
preg_match('/(?si).+\b(?=\n\s*[\w ]*regards)/', $str, $match);
echo $match[0]; // the text before the "regards" line


preg_replace - similar patterns

I have a string that contains something like "LAB_FF, LAB_FF12" and I'm trying to use preg_replace to look for both patterns and replace them with different strings, using this pattern:
/LAB_[0-9A-F]{2}|LAB_[0-9A-F]{4}/
So input would be
LAB_FF, LAB_FF12
and the output would need to be
DAB_FF, HAD_FF12
Problem is, for the second string, it interprets it as "LAB_FF" instead of "LAB_FF12" and so the output is
DAB_FF, DAB_FF
I've tried splitting the input line out using 2 different preg_match statements, the first looking for the {2} pattern and the second looking for the {4} pattern. This sort of works in that I can get the correct output into 2 separate strings but then can't combine the two strings to give the single amended output.
\b is a word boundary. With it, the pattern only matches where the word actually ends, so LAB_[0-9A-F]{2} can no longer match just the first part of LAB_FF12.
https://regex101.com/r/upY0gn/1
$pattern = "/\bLAB_[0-9A-F]{2}\b|\bLAB_[0-9A-F]{4}\b/";
Seeing the comment on the other answer about how to replace the string, this is one way.
With two capture groups, preg_match puts an empty string in the output array for an earlier group that did not match (in this case the first).
Then it's just a matter of picking the substitute for the group that did match and appending the rest of the match with substr.
$re = '/(\bLAB_[0-9A-F]{2}\b)|(\bLAB_[0-9A-F]{4}\b)/';
$str = 'LAB_FF12';
preg_match($re, $str, $matches);
var_dump($matches);
$substitutes = ["", "DAB", "HAD"];
for ($i = 1; $i < count($matches); $i++) {
    if ($matches[$i] != "") {
        // Replace the first three characters ("LAB") with the substitute
        $result = $substitutes[$i] . substr($matches[$i], 3);
        break;
    }
}
echo $result;
https://3v4l.org/gRvHv
You can specify exact amounts in one set of curly braces, e.g. {2,4}.
Just tested this and seems to work:
/LAB_[0-9A-F]{2,4}/
LAB_FF, LAB_FFF, LAB_FFFF
EDIT: My mistake, that actually matches between 2 and 4. If you change the order of your selections it matches the first it comes to, e.g.
/LAB_([0-9A-F]{4}|[0-9A-F]{2})/
LAB_FF, LAB_FFFF
EDIT2: The following will match LAB_even_amount_of_characters:
/LAB_([0-9A-F]{2})+/
LAB_FF, LAB_FFFF, LAB_FFFFFF...
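To get the single amended output the question asks for, one option is to combine the reordered alternation and the word boundaries from the answers above with preg_replace_callback(), which rewrites every match in one pass. This is just a sketch: the function name relabel is made up here, and the DAB/HAD prefixes are taken from the question's expected output.

```php
<?php
// Hypothetical helper: choose the new prefix from the number of hex digits
// that followed "LAB_" in each match.
function relabel(string $input): string
{
    return preg_replace_callback(
        // Try the 4-digit form first, then the 2-digit form; \b prevents
        // a partial match inside a longer label.
        '/\bLAB_([0-9A-F]{4}|[0-9A-F]{2})\b/',
        function (array $m): string {
            $prefix = strlen($m[1]) === 4 ? 'HAD' : 'DAB';
            return $prefix . '_' . $m[1];
        },
        $input
    );
}

echo relabel('LAB_FF, LAB_FF12'), "\n"; // DAB_FF, HAD_FF12
```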

PHP RegExp match and substitute

I am testing a RegExp with the online regexr.com tool. I will test the string against multiple cases, but I can't get substitution to work.
The RegEx for matching the string is:
/^[0-9]{1,3}[0-9]{6,7}$/
Which matches local mobile number in my country like this:
0921234567
But then I want to substitute the number in this way: add a "+" sign, add my country code "385", add a "." sign, and then finally add the matched number with the leading zero stripped.
Final number will be:
+385.921234567
I have a basic idea of how to insert the matched string, but I am not sure how to prepend characters and strip the zero from the matched string in the following substitution pattern:
\+$&\n\t
I will use PHP preg_replace function.
EDIT:
As someone wisely mentioned, there is a possibility that there will be one, two or no zeros, but I will create separate test cases with a regex just testing the number of zeros. Doing so in one regex seems too complicated for now.
Possible numbers will be:
0921234567
00111921234567
Where 111 is country code. I know that some country codes consist of 2 or 3 digits, but I will create special cases, for most country codes.
You can use this preg_replace to strip optional zeroes from start of your mobile #:
$str = preg_replace('~^0*(\d{7,9})$~', '+385.$1', $str);
^[0-9]([0-9]{1,2}[0-9]{6,7})$
You just need to add groups. Replace by +385.$1. See demo.
https://regex101.com/r/cJ6zQ3/22
$re = "/^[0-9]([0-9]{1,2}[0-9]{6,7})$/m";
$str = "0921234567\n";
$subst = "+385.$1";
$result = preg_replace($re, $subst, $str);
I would use a 2-step solution:
Check if we match the main regex
Replace the number by pre-pending + + country code + . + number without leading zeros.
PHP code:
$re = "/^[0-9]{7,10}$/";
$str = "0921234567";
if (preg_match($re, $str, $match)) {
echo "+385." . preg_replace('/^0+/', '', $match[0]);
}
Note that splitting out the character class in your regex pattern makes no sense when not using capture groups: ^[0-9]{7,10}$ is then the same as ^[0-9]{1,3}[0-9]{6,7}$, meaning match 7 to 10 digits from start to end of the string.
Leading zeros are easily trimmed from the start with /^0+/ regex.
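If you do want one regex covering both forms from the question's edit (a local number with one leading zero, and an international number starting with 00 plus the country code), something along these lines might work. It is a sketch under assumptions: the country code is taken to be 385 as in the answers above, the subscriber part is assumed to be 7 to 9 digits, and the function name to_international is made up.

```php
<?php
// Hypothetical helper: normalise "0921234567" or "00385921234567"
// to "+385.921234567". Returns null when the input has another shape.
function to_international(string $number): ?string
{
    // Optional prefix: either "00385" (international) or a single "0" (local),
    // then capture the 7 to 9 remaining digits.
    if (preg_match('/^(?:00385|0)?(\d{7,9})$/', $number, $m) === 1) {
        return '+385.' . $m[1];
    }
    return null;
}

echo to_international('0921234567'), "\n";
echo to_international('00385921234567'), "\n";
```

Other country codes would need their own prefix in the alternation, which is where the separate test cases mentioned in the question come in.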

Regex to extract substring

really struggling with this...hopefully someone can put me on the right path to a solution.
My input string is structured like this:
66-2141-A-AC107-7
I'm interested in extracting the string 'AC107' using a single regular expression. I know how to do this with other PHP string functions, but I have to do this with a regular expression.
What I need is to extract all data between the third and fourth hyphens. The structure of each section is not fixed (i.e, 66 may be 8798709 and 2141 may be 38). The presence of the number of hyphens is guaranteed (i.e., there will always be a total of four (4) hyphens).
Any help/guidance is greatly appreciated!
This will do what you need:
(?:[^-]*-){3}([^-]+)
Explanation:
(?:[^-]*-) Look for zero or more non-hyphen characters followed by a hyphen
{3} Look for three of the blocks just described
([^-]+) Capture all the consecutive non-hyphen characters from that point forward (will automatically cut off before the next hyphen)
You can use it in PHP like this:
$str = '66-2141-A-AC107-7';
preg_match('/^(?:[^-]*-){3}([^-]+)/', $str, $matches);
echo $matches[1]; // prints AC107
This should look for anything followed by a hyphen 3 times, and then group 2 (the second set of parentheses) will hold your value, followed by another hyphen and anything else.
/^(.*-){3}(.*)-(.*)/
You can access it by using $2. In php, it would be like this:
$string = '66-2141-A-AC107-7';
preg_match('/^(.*-){3}(.*)-(.*)/', $string, $matches);
$special_id = $matches[2];
print $special_id;
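Since the question says the other sections can vary in length, here is a small sketch that wraps the [^-] pattern from the first answer in a function and checks the preg_match() result before reading the group. The function name fourth_field is made up for this example.

```php
<?php
// Hypothetical helper: return the text between the third and fourth
// hyphens, or null when the input has fewer than three hyphens.
function fourth_field(string $code): ?string
{
    // Skip three "anything-but-hyphen, then hyphen" blocks, then capture
    // the run of non-hyphen characters that follows.
    if (preg_match('/^(?:[^-]*-){3}([^-]+)/', $code, $m) === 1) {
        return $m[1];
    }
    return null;
}

echo fourth_field('66-2141-A-AC107-7'), "\n";    // AC107
echo fourth_field('8798709-38-A-AC107-7'), "\n"; // AC107
```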

regex to find all text after delimited string

I have some content that contains a token string in the form
$string_text = '[widget_abc]This is some text. This is some text, etc...';
And I want to pull all the text after the first ']' character
So the returned value I'm looking for in this example is:
This is some text. This is some text, etc...
preg_match("/^.+?\](.+)$/is" , $string_text, $match);
echo trim($match[1]);
Edit
As per author's request - added explanation:
preg_match(param1, param2, param3) is a function that tests a string against a regular expression and stores the first match it finds
param1 = "/^.+?\](.+)$/is"
"//" - the delimiters you put around your regular expression in param1
i - makes the match case insensitive (it doesn't care if your letters are 'a' or 'A')
s - lets . match newline characters too, so the match can run over multiple lines
^ - start the check from the beginning of the string
$ - go all the way to the end of the string
. - represents any character
.+ - one or more characters of anything
.+? - one or more characters of anything, but as few as possible
.+?\] - one or more characters of anything until you reach the first ] (there is a backslash before ] because ] has a special meaning in regular expressions - look it up)
(.+)$ - capture everything after ] and store it as a separate element in the array defined in param3
param2 = the string that you created.
I tried to simplify the explanations, I might be off, but I think I'm right for the most part.
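To see what the s flag actually changes, here is a small sketch that runs the same pattern with and without it on a two-line string. The function name text_after_bracket and the sample text are made up for illustration.

```php
<?php
// Hypothetical helper: apply the pattern above with the given flags and
// return the captured text after the first "]", or null if nothing matched.
function text_after_bracket(string $text, string $flags): ?string
{
    if (preg_match('/^.+?\](.+)$/' . $flags, $text, $m) === 1) {
        return trim($m[1]);
    }
    return null;
}

$text = "[widget_abc]First line.\nSecond line.";

// With s, "." also matches the newline, so both lines are captured.
var_dump(text_after_bracket($text, 'is'));

// Without s, "." stops at the newline and "$" cannot match there,
// so the whole pattern fails on multi-line input.
var_dump(text_after_bracket($text, 'i'));
```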
The regex (?<=]).* will solve this problem if you can guarantee that there are no other square brackets on the line. In PHP the code will be:
if (preg_match('/(?<=\]).*/', $input, $group)) {
$match = $group[0];
}
This will transform [widget_abc]This is some text. This is some text, etc... into This is some text. This is some text, etc.... It matches everything that follows the ].
$output = preg_replace('/^[^\]]*\]/', '', $string_text);
Is there any particular reason why a regex is wanted here?
echo substr(strstr($string_text, ']'), 1);
A regex is definitely overkill for this instance.
Here is a nice one-liner :
list(, $result) = explode(']', $inputText, 2);
It does the job and is way less expensive than using regular expressions.

How to strip this part of my string?

$string = "Hot_Chicks_call_me_at_123456789";
How can I strip away everything so that I only have the numbers after the last letter in the string above?
Example, I need a way to check a string and remove everything in front of (the last UNDERSCORE FOLLOWED by the NUMBERS)
Any smart solutions for this?
Thanks
BTW, it's PHP!
Without using a regular expression
$string = "Hot_Chicks_call_me_at_123456789";
$parts = explode( '_', $string );
echo end( $parts ); // end() takes its argument by reference, so it needs a variable
If it always ends in a number you can just match /(\d+)$/ with regex, is the formatting consistent? Is there anything between the numbers like dashes or spaces?
You can use preg_match for the regex part.
<?php
$subject = "abcdef_sdlfjk_kjdf_39843489328";
preg_match('/(\d+)$/', $subject, $matches);
if ( count( $matches ) > 1 ) {
echo $matches[1];
}
I only recommend this solution if speed isn't an issue, and if the formatting is completely consistent.
PHP's PCRE Regular Expression engine was built for this kind of task
$string = "Hot_Chicks_call_me_at_123456789";
$new_string = preg_replace('{^.*_(\d+)$}x','$1',$string);
//same thing, but with whitespace ignoring and comments turned on for explanations
$new_string = preg_replace('{
^.* #match any character at start of string
_ #up to the last underscore
(\d+) #followed by all digits repeating at least once
$ #up to the end of the string
}x','$1',$string);
echo $new_string . "\n";
To be a bit churlish, your stated specification would suggest the following algorithm:
def trailing_number(s):
    results = list()
    for char in reversed(s):
        if char.isalpha(): break
        if char.isdigit(): results.append(char)
    return ''.join(reversed(results))
It returns only the digits from the end of the string up to the first letter it encounters.
Of course this example is in Python, since I don't know PHP nearly as well. However it should be easily translated as the concept is easy enough ... reverse the string (or iterate from the end towards the beginning) and accumulate digits until you find a letter and break (or fall out of the loop at the beginning of the string).
In C it would be more efficient to use something a bit like for (p = s + strlen(s); p > s; p--) to walk backwards through the string, saving a pointer to the most recently encountered digit until we break or drop out of the loop at the beginning of the string. Then return the pointer into the middle of the string where our most recent (leftmost) digit was found.
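Since the question is about PHP, here is a rough translation of the Python function above. It is only a sketch of the same idea, and it assumes the ctype extension is available (it normally is in a default PHP build).

```php
<?php
// Rough PHP translation of the Python trailing_number() above: walk
// backwards, collect digits, and stop at the first letter encountered.
function trailing_number(string $s): string
{
    $digits = '';
    for ($i = strlen($s) - 1; $i >= 0; $i--) {
        $char = $s[$i];
        if (ctype_alpha($char)) {
            break; // first letter from the end: stop here
        }
        if (ctype_digit($char)) {
            $digits = $char . $digits; // prepend to keep the original order
        }
    }
    return $digits;
}

echo trailing_number("Hot_Chicks_call_me_at_123456789"), "\n"; // 123456789
```

Like the Python version, it skips non-letter, non-digit characters such as the underscore rather than stopping at them.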
