Regex to extract substring - php

Really struggling with this... hopefully someone can put me on the right path to a solution.
My input string is structured like this:
66-2141-A-AC107-7
I'm interested in extracting the string 'AC107' using a single regular expression. I know how to do this with other PHP string functions, but I have to do this with a regular expression.
What I need is to extract all data between the third and fourth hyphens. The structure of each section is not fixed (e.g., 66 may be 8798709 and 2141 may be 38). The number of hyphens is guaranteed (i.e., there will always be a total of four (4) hyphens).
Any help/guidance is greatly appreciated!

This will do what you need:
(?:[^-]*-){3}([^-]+)
Explanation:
(?:[^-]*-) Look for zero or more non-hyphen characters followed by a hyphen
{3} Look for three of the blocks just described
([^-]+) Capture all the consecutive non-hyphen characters from that point forward (will automatically cut off before the next hyphen)
You can use it in PHP like this:
$str = '66-2141-A-AC107-7';
preg_match('/^(?:[^-]*-){3}([^-]+)/', $str, $matches);
echo $matches[1]; // prints AC107
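As a minimal sketch of how this pattern behaves when the section lengths vary, it can be wrapped in a small function. The name extractFourthSection is hypothetical and not part of the original answer; it returns null when the pattern does not match.

```php
<?php
// Hypothetical helper (not from the original answer): returns the text
// between the third and fourth hyphens, or null when the pattern fails.
function extractFourthSection(string $input): ?string
{
    // Skip three "non-hyphens followed by a hyphen" blocks, then capture
    // the run of non-hyphen characters that follows.
    if (preg_match('/^(?:[^-]*-){3}([^-]+)/', $input, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

echo extractFourthSection('66-2141-A-AC107-7'), "\n";    // AC107
echo extractFourthSection('8798709-38-A-AC107-7'), "\n"; // AC107
var_dump(extractFourthSection('66-2141'));               // NULL
```

Checking the return value of preg_match before reading $matches[1] avoids an undefined-index notice on malformed input.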

This looks for anything followed by a hyphen, three times. Group 2 (the second set of parentheses) then holds your value, which is followed by another hyphen and anything else.
/^(.*-){3}(.*)-(.*)/
You can access it by using $2. In PHP, it would look like this:
$string = '66-2141-A-AC107-7';
preg_match('/^(.*-){3}(.*)-(.*)/', $string, $matches);
$special_id = $matches[2];
print $special_id;

Related

preg_replace - similar patterns

I have a string that contains something like "LAB_FF, LAB_FF12" and I'm trying to use preg_replace to look for both patterns and replace them with different strings, using this pattern:
/LAB_[0-9A-F]{2}|LAB_[0-9A-F]{4}/
So input would be
LAB_FF, LAB_FF12
and the output would need to be
DAB_FF, HAD_FF12
Problem is, for the second string, it interprets it as "LAB_FF" instead of "LAB_FF12" and so the output is
DAB_FF, DAB_FF
I've tried splitting the input line out using 2 different preg_match statements, the first looking for the {2} pattern and the second looking for the {4} pattern. This sort of works in that I can get the correct output into 2 separate strings but then can't combine the two strings to give the single amended output.
\b is a word boundary. With it, the pattern only matches where the word actually starts and ends, so LAB_FF no longer matches inside LAB_FF12.
https://regex101.com/r/upY0gn/1
$pattern = "/\bLAB_[0-9A-F]{2}\b|\bLAB_[0-9A-F]{4}\b/";
Regarding the comment on the other answer about how to replace the string, this is one way.
A capture group that fails ahead of the one that matches leaves an empty entry in the output array.
In this case there is one (the first).
Then it's just a matter of using substr.
$re = '/(\bLAB_[0-9A-F]{2}\b)|(\bLAB_[0-9A-F]{4}\b)/';
$str = 'LAB_FF12';
preg_match($re, $str, $matches);
var_dump($matches);
$substitutes = ["", "DAB", "HAD"];
for ($i = 1; $i < count($matches); $i++) {
    if ($matches[$i] != "") {
        $result = $substitutes[$i] . substr($matches[$i], 3);
        break;
    }
}
echo $result;
https://3v4l.org/gRvHv
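To handle the whole input line in one pass, one option is preg_replace_callback. This is only a sketch under the assumption that the two-digit form should become DAB_ and the four-digit form HAD_, as in the question's expected output; it is not taken from either answer.

```php
<?php
$input = 'LAB_FF, LAB_FF12';

// Longer alternative first, so four hex digits win over two when both fit.
// The word boundaries stop LAB_FF from matching inside LAB_FF12.
$pattern = '/\bLAB_([0-9A-F]{4}|[0-9A-F]{2})\b/';

$output = preg_replace_callback(
    $pattern,
    function (array $m): string {
        // $m[1] holds the hex digits; its length picks the new prefix.
        $prefix = strlen($m[1]) === 4 ? 'HAD' : 'DAB';
        return $prefix . '_' . $m[1];
    },
    $input
);

echo $output, "\n"; // DAB_FF, HAD_FF12
```

The callback keeps both replacements in a single call, so there is no need to recombine two separately amended strings.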
You can specify exact amounts in one set of curly braces, e.g. {2,4}.
Just tested this and seems to work:
/LAB_[0-9A-F]{2,4}/
LAB_FF, LAB_FFF, LAB_FFFF
EDIT: My mistake, that actually matches anything from 2 to 4 characters. Alternatives are tried from left to right, so if you put the longer one first it wins whenever it fits, e.g.
/LAB_([0-9A-F]{4}|[0-9A-F]{2})/
LAB_FF, LAB_FFFF
EDIT2: The following will match LAB_ followed by an even number of characters:
/LAB_([0-9A-F]{2})+/
LAB_FF, LAB_FFFF, LAB_FFFFFF...
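The effect of alternative order can be checked with a short comparison. This sketch reuses the question's input string and the two patterns quoted above; the variable names are illustrative.

```php
<?php
$input = 'LAB_FF, LAB_FF12';

// Shorter alternative first: the two-digit branch succeeds and the
// four-digit branch is never tried.
preg_match_all('/LAB_[0-9A-F]{2}|LAB_[0-9A-F]{4}/', $input, $shortFirst);

// Longer alternative first: four digits are matched when available.
preg_match_all('/LAB_([0-9A-F]{4}|[0-9A-F]{2})/', $input, $longFirst);

print_r($shortFirst[0]); // LAB_FF, LAB_FF
print_r($longFirst[0]);  // LAB_FF, LAB_FF12
```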

Move multiple letters in string using regex

Using a regular expression I want to move two letters in a string.
W28
L36
W29-L32
Should be changed to:
28W
36L
29W-32L
The numbers vary between 25 and 44. The letters that need to be moved are always "W" and/or "L" and the "W" is always first when they both exist in the string.
I need to do this with a single regular expression using PHP. Any ideas would be awesome!
EDIT:
I'm new to regular expressions and tried a lot of things without success. The closest I came was using "/\b(W34)\b/" for each possibility. I also found something about using variables in the replace function but had no luck using these.
Your regex \b(W34)\b matches exactly W34 as a whole word. You need a character class to match W or L, some alternatives to match the numeric range, and capturing groups to rearrange the parts.
You can use the following regex replacement:
$re = '/\b([WL])(2[5-9]|3[0-9]|4[0-4])\b/';
$str = "W28\nL36\nW29-L32";
$result = preg_replace($re, '$2$1', $str);
echo $result;
Here, ([WL]) matches and captures either W or L into group 1, and (2[5-9]|3[0-9]|4[0-4]) matches integers from 25 to 44 and captures them into group 2. Backreferences are used to reverse the order of the groups in the replacement string.
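A quick way to confirm the numeric range is to run the same pattern over values at and just outside its edges. The sample inputs below are assumptions chosen for that purpose; only the regex comes from the answer.

```php
<?php
$re = '/\b([WL])(2[5-9]|3[0-9]|4[0-4])\b/';

// Sample inputs are assumptions chosen to probe the edges of the 25-44 range.
$samples = ['W25', 'L44', 'W24', 'L45', 'W29-L32'];

foreach ($samples as $sample) {
    // Out-of-range values do not match and are printed unchanged.
    echo $sample, ' => ', preg_replace($re, '$2$1', $sample), "\n";
}
```

W24 and L45 fall outside the range, so they come back untouched, while the in-range values have their letter moved behind the number.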

Regex pattern to match any character except the last one

I am trying to match a string using two different patterns to work together.
My source string is something like this:
Text, white-spaces, new lines and more text then ^^^^<customtag>
I need a group (the second one) that captures one caret or none, followed by an HTML-like tag. The first group should capture everything else.
That means the string above should produce this:
(Group 1)Text, white-spaces, new lines and more text then ^^^
(Group 2)^<customtag>
In the source string there may be no carets, one, or up to two thousand.
I need a good pattern that matches all those carets except the last one.
The code below is what I tried.
preg_match_all('/([\s\S]*\^*)(\^?<\w+>)$/', $string, $matches);
Please note: I used [\s\S] instead of the dot to match any character as well as white-spaces and new lines too.
You can use the following regex:
(?s)(.*)((\^|(?<!\^))<[^>]+>)
PHP code:
preg_match_all('/(?s)(.*)((\^|(?<!\^))<[^>]+>)/', $string, $matches);
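To see how the two groups split, the pattern can be run with preg_match against a string with several carets and one with none. The sample strings are assumptions modeled on the question's input.

```php
<?php
$pattern = '/(?s)(.*)((\^|(?<!\^))<[^>]+>)/';

// Sample strings are assumptions modeled on the question's input.
$samples = [
    "Text and\nmore text then ^^^^<customtag>",
    "Text with no caret <customtag>",
];

$results = [];
foreach ($samples as $sample) {
    if (preg_match($pattern, $sample, $m)) {
        // $m[1] is everything before the tag, $m[2] is the optional caret plus tag.
        $results[] = [$m[1], $m[2]];
        var_dump($m[1], $m[2]);
    }
}
```

The greedy (.*) backs off one character at a time, so group 2 gets a single caret when one precedes the tag; the lookbehind handles the case where there is no caret at all.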
You can use it like this:
preg_match_all('/(.*)((\^<[^>]*>)|([^\^]<[^>]*>))$/', $string, $matches);
See it working here: http://regexr.com?383g9
In this other link it is working fine: http://regex101.com/r/eQ3vV7

pregmatch between characters and any numeric

I'm stuck writing a preg_match.
I have a string:
XPMG_ar121023.txt
and need to extract the 2 letters between XPMG_ and the first digit - be it a 0-9
$str = 'XPMG_ar121023.txt';
preg_match('/('XPMG_')|[0-9\,]))/', $str, $match);
print_r($match);
Maybe this isn't the best option: My characters will always be
You can just do
$str = "XPMG_ar121023.txt" ;
preg_match('/_([a-z]+)/i', $str, $match);
var_dump($match[1]);
Output
string 'ar' (length=2)
This is too simple for a regular expression. Just $match = substr($str, 5, 2) would get what you're asking for.
Let me walk through this step by step so as to help you solve similar problems in the future. Suppose we have the following format for our filenames:
XPMG_ar121023.txt
We know what we want to capture: the "ar" right after the _ and just before the numbers begin. So our expression would look something like this:
_[a-z]+
This is pretty straightforward. We start by looking for an underscore, followed by one or more letters between a and z. The square brackets define a character class. Our class consists of the alphabet, but you can add specific digits and more if you like.
Now because we want to capture only the letters, we need to put parentheses around that part of the pattern:
_([a-z]+)
In the result we will now have access to only that subpattern. Next we put our delimiters in place to specify where our pattern begins and ends:
/_([a-z]+)/
And lastly, after our closing delimiter we can add some modifiers. As it is written, our pattern only looks for lower-case letters. We can add the i modifier to make this case-insensitive:
/_([a-z]+)/i
Voila, we're done. Now we can pass it into preg_match to see what it spits out:
preg_match( "/_([a-z]+)/i", "XPMG_ar121023.txt", $match );
This function takes a pattern as the first parameter, a string to match it against as the second, and lastly a variable to spit the results into. When all is said and done, we can check $match for our data.
The results of this operation follow:
array(2) {
[0]=> string(3) "_ar"
[1]=> string(2) "ar"
}
These are the contents of $match. Notice that the full match is found at index 0 of the array, and our captured portion is provided at index 1.
echo $match[1]; // ar
Hope this helps.
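If the filenames always follow the question's shape, a slightly stricter variant can anchor on the prefix and require exactly two letters before the first digit. This is a sketch under that assumption; the function name extractCode is hypothetical.

```php
<?php
// Hypothetical helper; the XPMG_ prefix and the two-letter code followed by
// a digit are taken from the question's sample filename.
function extractCode(string $filename): ?string
{
    // ^XPMG_ pins the prefix, ([a-z]{2}) captures two letters, \d requires a digit next.
    if (preg_match('/^XPMG_([a-z]{2})\d/i', $filename, $match) === 1) {
        return $match[1];
    }
    return null;
}

echo extractCode('XPMG_ar121023.txt'), "\n"; // ar
var_dump(extractCode('OTHER_ar121023.txt')); // NULL
```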
Well, why not:
$letters = $str[5].$str[6];
:)
After all, you'll always need the 2 chars after the fixed prefix, and there are many ways to get them that do not require a regexp (substr() being the best anyway).

regex match as 1 unit

I am using PHP to run a regex on some strings.
The strings look like:
somethingsomethin.somethingsomething.extension
I want to match the bits between the 2 periods and including the 2 periods part in the above:
.somethingsomething.
I came up with something simple like: \..+\.
The problem is that it matches all the periods in something like this:
somethingsomethin....somethingsomething....extension matches as ....somethingsomething.... when I only want .somethingsomething..
How can I get my regex expression to match as "1 unit" and to match only once?
Since . matches any character, including a literal ., exclude literal dots from the middle: \.[^.]+\. or possibly \.\w+\..
The .+ is greedy and runs across all the periods in your example. Try this:
<?php
$str = 'somethingsomethin....somethingsomething....extension';
preg_match('#\.\w+\.#', $str, $m);
print_r($m);
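For comparison, here is a short sketch that runs the question's original pattern next to the negated character class on the same string. The variable names are illustrative.

```php
<?php
$str = 'somethingsomethin....somethingsomething....extension';

preg_match('#\..+\.#', $str, $greedy);    // original attempt
preg_match('#\.[^.]+\.#', $str, $noDots); // dots excluded from the middle

echo $greedy[0], "\n"; // ....somethingsomething....
echo $noDots[0], "\n"; // .somethingsomething.
```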
