preg_replace - similar patterns - php

I have a string that contains something like "LAB_FF, LAB_FF12" and I'm trying to use preg_replace to look for both patterns and replace them with different strings, using this pattern:
/LAB_[0-9A-F]{2}|LAB_[0-9A-F]{4}/
So input would be
LAB_FF, LAB_FF12
and the output would need to be
DAB_FF, HAD_FF12
Problem is, for the second string, it interprets it as "LAB_FF" instead of "LAB_FF12" and so the output is
DAB_FF, DAB_FF
I've tried splitting the input line out using 2 different preg_match statements, the first looking for the {2} pattern and the second looking for the {4} pattern. This sort of works in that I can get the correct output into 2 separate strings but then can't combine the two strings to give the single amended output.

\b is a word boundary. With it, the pattern only matches where the word actually ends, so LAB_FF no longer matches the start of LAB_FF12.
https://regex101.com/r/upY0gn/1
$pattern = "/\bLAB_[0-9A-F]{2}\b|\bLAB_[0-9A-F]{4}\b/";
Regarding the comment on the other answer about how to replace the string, this is one way.
Each alternative that fails to match leaves an empty entry in the output array.
In this case there is one (the first).
Then it's just a matter of substr.
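$re = '/(\bLAB_[0-9A-F]{2}\b)|(\bLAB_[0-9A-F]{4}\b)/';
$str = 'LAB_FF12';
preg_match($re, $str, $matches);
var_dump($matches);
// Index 0 is a placeholder so $substitutes lines up with the capture group numbers.
$substitutes = ["", "DAB", "HAD"];
for ($i = 1; $i < count($matches); $i++) {
    if ($matches[$i] != "") {
        // Swap the 3-letter prefix and keep the rest of the match.
        $result = $substitutes[$i] . substr($matches[$i], 3);
        break;
    }
}
echo $result;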
https://3v4l.org/gRvHv
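A shorter route to the single amended output, as a rough sketch: preg_replace also accepts arrays of patterns and replacements, so each word-bounded pattern can carry its own substitution. The function name relabelWithArrays is made up for illustration, and the DAB_/HAD_ prefixes are taken from the question.

```php
<?php
// Hypothetical helper (not from the answer above): one pattern per replacement.
function relabelWithArrays(string $input): string
{
    $patterns = [
        '/\bLAB_([0-9A-F]{2})\b/', // exactly two hex digits
        '/\bLAB_([0-9A-F]{4})\b/', // exactly four hex digits
    ];
    // $1 re-inserts the captured hex digits after the new prefix.
    $replacements = ['DAB_$1', 'HAD_$1'];

    return preg_replace($patterns, $replacements, $input);
}

echo relabelWithArrays('LAB_FF, LAB_FF12'), "\n"; // DAB_FF, HAD_FF12
```

Because of the \b on both sides, the two-digit pattern cannot match inside LAB_FF12, so the order of the array entries should not matter here.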

You can specify exact amounts in one set of curly braces, e.g. {2,4}.
Just tested this and seems to work:
/LAB_[0-9A-F]{2,4}/
LAB_FF, LAB_FFF, LAB_FFFF
EDIT: My mistake, that actually matches between 2 and 4. If you change the order of your selections it matches the first it comes to, e.g.
/LAB_([0-9A-F]{4}|[0-9A-F]{2})/
LAB_FF, LAB_FFFF
EDIT2: The following will match LAB_even_amount_of_characters:
/LAB_([0-9A-F]{2})+/
LAB_FF, LAB_FFFF, LAB_FFFFFF...
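To turn the "longer alternative first" pattern into the replacement the question asks for, something like the following might work. It is only a sketch: relabelByLength is a hypothetical name, and picking the prefix from the length of the captured digits is my assumption about the intended rule.

```php
<?php
// Hypothetical helper: choose the new prefix from the length of the captured digits.
function relabelByLength(string $input): string
{
    return preg_replace_callback(
        '/LAB_([0-9A-F]{4}|[0-9A-F]{2})/', // four-digit alternative is tried first
        function (array $m): string {
            $prefix = strlen($m[1]) === 4 ? 'HAD_' : 'DAB_';
            return $prefix . $m[1];
        },
        $input
    );
}

echo relabelByLength('LAB_FF, LAB_FF12'), "\n"; // DAB_FF, HAD_FF12
```

The callback runs once per match, so both replacements end up in one output string.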

Related

How to get a number from a html source page?

I'm trying to retrieve the followed by count on my instagram page. I can't seem to get the Regex right and would very much appreciate some help.
Here's what I'm looking for:
y":{"count":
That's the beginning of the string, and I want the 4 numbers after that.
$string = preg_replace("{y"\"count":([0-9]+)\}","",$code);
Someone suggested this ^ but I can't get the formatting right...
You haven't posted your strings, so what the regex should be is a guess... so I'll answer why your code fails.
preg_replace('"followed_by":{"count":\d')
This is very far from the correct preg_replace usage. You need to give it the replacement string and the string to search on. See http://php.net/manual/en/function.preg-replace.php
Your second usage:
$string = preg_replace(/^y":{"count[0-9]/","",$code);
Is closer but preg_replace is global so this is searching your whole file (or it would if not for the anchor) and will replace the found value with nothing. What you really want (I think) is to use preg_match.
$string = preg_match('/y":\{"count(\d{4})/', $code, $match);
$counted = $match[1];
This presumes your regex was kind of correct already.
Per your update:
Demo: https://regex101.com/r/aR2iU2/1
$code = 'y":{"count:1234';
$string = preg_match('/y":\{"count:(\d{4})/', $code, $match);
$counted = $match[1];
echo $counted;
PHP Demo: https://eval.in/489436
I removed the ^, which requires the match to begin at the start of your string, escaped the {, and made the \d be 4 characters long. The () is a capture group and stores whatever is found inside of it, in this case the 4 numbers.
Also if this isn't just for learning you should be prepared for this to stop working at some point as the service provider may change the format. The API is a safer route to go.
This regexp should capture value you're looking for in the first group:
\{"count":([0-9]+)\}
Use it with preg_match_all function to easily capture what you want into array (you're using preg_replace which isn't for retrieving data but for... well replacing it).
Your regexp isn't working because you didn't escape the curly brackets. You also didn't add a quantifier (the plus sign in my example), so it would only capture the first digit anyway.
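Putting the pieces together, a minimal sketch could look like this. The sample string is an assumption based on the "followed_by":{"count": fragment quoted above (the real page source was not posted), and followedByCount is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns the follower count, or null if the pattern is not found.
function followedByCount(string $code): ?int
{
    // \{ and \} are escaped; (\d+) captures one or more digits.
    if (preg_match('/"followed_by":\{"count":(\d+)\}/', $code, $match)) {
        return (int) $match[1];
    }
    return null;
}

// Assumed sample input; the actual markup may differ.
$code = '"followed_by":{"count":1234}';
echo followedByCount($code), "\n"; // 1234
```

Using \d+ instead of \d{4} means it should keep working if the count grows or shrinks in length.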

I need to find a way to explode a specific string that has quotes in it

I'm having serious trouble with this and I'm not really experienced enough to understand how I should go about it.
To start off I have a very long string known as $VC. Each time it's slightly different but will always have some things that are the same.
$VC is an htmlspecialchars() string that looks something like
Example Link... Lots of other stuff in between here... 80] ,[] ,"","3245697351286309258",[] ,["812750926... and it goes on ...80] ,[] ,"","6057413202557366578",[] ,["103279554... and it continues on
In this case the <a> tag is always the same so I take my information from there. The numbers listed after it such as ,"3245697351286309258",[] and ,"6057413202557366578",[] will also always be in the same format, just different numbers and one of those numbers will always be a specific ID.
I then find that specific ID I want, I will always want that number inside pid%3D and %26oid.
$pid = explode("pid%3D", $VC, 2);
$pid = explode("%26oid", $pid[1], 2);
$pid = $pid[0];
In this case that number is 6057413202557366578. Next I want to explode $VC in a way that lets me put everything after ,"6057413202557366578",[] into a variable as its own string.
This is where things start to break down. What I want to do is the following
$vinfo = explode(',"'.$pid.'",[]',$VC,2);
$vinfo = $vinfo[1]; //Everything after the value I used to explode it.
Now naturally I did look around and try other things such as preg_split and preg_replace but I've got to admit, it is beyond me and as far as I can tell, those don't let you put your own variable in the middle of them (e.g. ',"'.$pid.'",[]').
If I'm understanding the whole regular expression idea, there might be other problems in that if I look for it without the $pid variable (e.g. just the surrounding characters), it will pick up the similar parts of the string before it gets to the one I want, (e.g. the ,"3245697351286309258",[]).
I hope I've explained this well enough, the main question though is - How can I get the information after that specific part of the string (',"'.$pid.'",[]') into a variable?
I hope this does what you want:
pid%3D(?P<id>\d+).*?"(?P=id)",\[\](?P<vinfo>.*?)}\);<\/script>
It captures the number after pid%3D in group id, and everything after "id",[] (until the next occurrence of });</script>) in group vinfo.
Here's a demo with shortened text.
The problem of capturing more than you want is fixed using capture groups. You wrap part of a regular expression in parentheses to capture it.
You can use preg_match_all to do more robust regular expression capture. You get an array containing the strings that matched the entire pattern, plus the partial match for each capture group you use. We'll start by matching the parts of the string you want. There are no capture groups at this point:
$text = 'Example Link... Lots of other stuff in between here... 80] ,[] ,"","3245697351286309258",[] ,["812750926... and it goes on ...80] ,[] ,"","6057413202557366578",[] ,["103279554... and it continues on"';
$pattern = '/,"\\d+",\\[\\]/';
preg_match_all($pattern,
$text,
$out, PREG_PATTERN_ORDER);
echo $out[0][0]; //echo ,"3245697351286309258",[]
Now to get just the pids into a variable, you can add a capture group in your pattern. The capture group is done by adding parentheses:
$text = ...
$pattern = '/,"(\\d+)",\\[\\]/'; // the \d+ match will be captured
preg_match_all($pattern,
$text,
$out, PREG_PATTERN_ORDER);
$pids = $out[1];
echo $pids[0]; // echo 3245697351286309258
Notice the first (and only in this case) capture group is in $out[1] (which is an array). What we have captured is all the digits.
To capture everything else, assuming everything is between square brackets, you could match more and capture it. To address the question, we'll use two capture groups. The first will capture the digits and the second will capture everything matching square brackets and everything in between:
$text = ...;
$pattern = '/,"(\\d+)",\\[\\] ,(\\[.+?\\])/';
preg_match_all($pattern,
$text,
$out, PREG_PATTERN_ORDER);
$pids = $out[1];
$contents = $out[2];
echo $pids[0] . "=" . $contents[0] ."\n";
echo $pids[1] . "=". $contents[1];
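On the point in the question about putting your own variable in the middle of a pattern: that is possible, and preg_quote escapes any characters in the variable that are special in a regex. A rough sketch, assuming the ,"<pid>",[] marker format described in the question (afterMarker is a made-up name):

```php
<?php
// Hypothetical helper: returns everything after ,"<pid>",[] or null if the marker is absent.
function afterMarker(string $vc, string $pid): ?string
{
    // preg_quote escapes regex metacharacters in $pid; '/' is the delimiter used here.
    $pattern = '/,"' . preg_quote($pid, '/') . '",\[\](.*)/s';
    return preg_match($pattern, $vc, $m) ? $m[1] : null;
}

// Shortened sample based on the string in the question.
$vc = '80] ,[] ,"","3245697351286309258",[] ,["812750926... 80] ,[] ,"","6057413202557366578",[] ,["103279554...';
echo afterMarker($vc, '6057413202557366578'), "\n"; //  ,["103279554...
```

The s modifier lets . match newlines too, in case the real string spans several lines.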

Regex to extract substring

Really struggling with this... hopefully someone can put me on the right path to a solution.
My input string is structured like this:
66-2141-A-AC107-7
I'm interested in extracting the string 'AC107' using a single regular expression. I know how to do this with other PHP string functions, but I have to do this with a regular expression.
What I need is to extract all data between the third and fourth hyphens. The structure of each section is not fixed (i.e, 66 may be 8798709 and 2141 may be 38). The presence of the number of hyphens is guaranteed (i.e., there will always be a total of four (4) hyphens).
Any help/guidance is greatly appreciated!
This will do what you need:
(?:[^-]*-){3}([^-]+)
Explanation:
(?:[^-]*-) Look for zero or more non-hyphen characters followed by a hyphen
{3} Look for three of the blocks just described
([^-]+) Capture all the consecutive non-hyphen characters from that point forward (will automatically cut off before the next hyphen)
You can use it in PHP like this:
$str = '66-2141-A-AC107-7';
preg_match('/^(?:[^-]*-){3}([^-]+)/', $str, $matches);
echo $matches[1]; // prints AC107
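Since the question says the sections vary in length, here is a small sketch that wraps the same pattern in a function and tries it on a second, made-up input. The name fourthField and the second sample string are assumptions for illustration.

```php
<?php
// Hypothetical helper: returns the text between the third and fourth hyphens, or null.
function fourthField(string $str): ?string
{
    // Skip three "anything-but-hyphen, then hyphen" blocks, then capture up to the next hyphen.
    return preg_match('/^(?:[^-]*-){3}([^-]+)/', $str, $matches) ? $matches[1] : null;
}

echo fourthField('66-2141-A-AC107-7'), "\n";    // AC107
echo fourthField('8798709-38-A-AC107-7'), "\n"; // AC107
```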
This should look for anything followed by a hyphen 3 times and then in group 2 (the second set of parentheses) it will have your value, followed by another hyphen and anything else.
/^(.*-){3}(.*)-(.*)/
You can access it by using $2. In PHP, it would be like this:
$string = '66-2141-A-AC107-7';
preg_match('/^(.*-){3}(.*)-(.*)/', $string, $matches);
$special_id = $matches[2];
print $special_id;

pregmatch between characters and any numeric

I'm stuck writing a preg_match
I have a string:
XPMG_ar121023.txt
and need to extract the 2 letters between XPMG_ and the first digit - be it a 0-9
$str = 'XPMG_ar121023.txt';
preg_match('/('XPMG_')|[0-9\,]))/', $str, $match);
print_r($match);
Maybe this isn't the best option: My characters will always be
You can just do
$str = "XPMG_ar121023.txt" ;
preg_match('/_([a-z]+)/i', $str, $match);
var_dump($match[1]);
Output
string 'ar' (length=2)
This is too simple for a regular expression. Just $match = substr($str,5,2) would get what you're asking for.
Let me walk through this step by step so as to help you solve similar problems in the future. Suppose we have the following format for our filenames:
XPMG_ar121023.txt
We know what we want to capture, we want the "ar" right after the _ and just before the numbers begin. So our expression would look something like this:
_[a-z]+
This is pretty straightforward. We're starting by looking for an underscore, followed by one or more letters between a and z. The square brackets define a character class. Our class consists of the alphabet, but you can push specific numbers in there and more if you like.
Now because we want to capture only the letters, we need to put parentheses around that part of the pattern:
_([a-z]+)
In the result we will now have access to only that subpattern. Next we put our delimiters in place to specify where our pattern begins, and ends:
/_([a-z]+)/
And lastly, after our closing delimiter we can add some modifiers. As it is written, our pattern only looks for lower-case letters. We can add the i modifier to make this case-insensitive:
/_([a-z]+)/i
Voila, we're done. Now we can pass it into preg_match to see what it spits out:
preg_match( "/_([a-z]+)/i", "XPMG_ar121023.txt", $match );
This function takes a pattern as the first parameter, a string to match it against as the second, and lastly a variable to spit the results into. When all is said and done, we can check $match for our data.
The results of this operation follow:
array(2) {
[0]=> string(3) "_ar"
[1]=> string(2) "ar"
}
This is the contents of $match. Notice our full pattern is found in the first index of the array, and our captured portion is provided in the second index of the array.
echo $match[1]; // ar
Hope this helps.
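If you want the pattern to follow the wording of the question more literally (exactly two letters between XPMG_ and the first digit), a stricter variant might look like the sketch below. The helper name twoLetterCode is hypothetical.

```php
<?php
// Hypothetical helper: returns the two letters after XPMG_ that precede a digit, or null.
function twoLetterCode(string $filename): ?string
{
    // ^XPMG_ anchors on the fixed prefix, ([a-z]{2}) captures two letters, \d requires a digit next.
    return preg_match('/^XPMG_([a-z]{2})\d/i', $filename, $match) ? $match[1] : null;
}

echo twoLetterCode('XPMG_ar121023.txt'), "\n"; // ar
```

Unlike /_([a-z]+)/i, this rejects names that lack the prefix or have a different number of letters.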
Well, why not:
$letters = $str[5].$str[6];
:)
After all, you'll always need the 2 chars after the fixed prefix, and there are many ways that do not require a regexp (substr() being the best anyway).

extract text between two words in php

I got the following URL
http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/B000NO9GT4/ref=sr_1_1?m=ATVPDKIKX0DER&s=toys-and-games&ie=UTF8&qid=1350518571&sr=1-1&keywords=lego
and I want to extract
B000NO9GT4
that is the ASIN. Up to now, I can search within the string, but not in the way I require. I saw the split function and I saw explode, but can't find a way out. Also, the URLs will differ in length, so I can't hardcode the length either. The only thing that makes some sense in my mind is to split the string so that
http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/
becomes the first part
and
B000NO9GT4/ref=sr_1_1?m=ATVPDKIKX0DER&s=toys-and-games&ie=UTF8&qid=1350518571&sr=1-1&keywords=lego
becomes the second part. From the second part, I should extract B000NO9GT4.
In the same way, I would want to get the product name LEGO-Ultimate-Building-Set-Pieces from the first part.
I am very bad at regex and can't find a way out.
Can somebody guide me on how I can do it in PHP?
thanks
This grabs both pieces of information that you are looking to capture:
$url = 'http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/B000NO9GT4/ref=sr_1_1?m=ATVPDKIKX0DER&s=toys-and-games&ie=UTF8&qid=1350518571&sr=1-1&keywords=lego';
$path = parse_url($url, PHP_URL_PATH);
if (preg_match('#^/([^/]+)/dp/([^/]+)/#i', $path, $matches)) {
echo "Description = {$matches[1]}<br />"
."ASIN = {$matches[2]}<br />";
}
Output:
Description = LEGO-Ultimate-Building-Set-Pieces
ASIN = B000NO9GT4
Short Explanation:
Any expressions enclosed in ( ) will be saved as a capture group. This is how we get at the data in $matches[1] and $matches[2].
The expression ([^/]+) says to match all characters EXCEPT / so in effect it captures everything in the URL between the two / separators. I use this pattern twice. The [ ] actually defines the character class which was /, the ^ in this case negates it so instead of matching / it matches everything BUT /. Another example is [a-f0-9] which would say to match the characters a,b,c,d,e,f and the numbers 0,1,2,3,4,5,6,7,8,9. [^a-f0-9] would be the opposite.
# is used as the delimiter for the expression
^ following the delimiter means match from the beginning of the string.
See www.regular-expressions.info and PCRE Pattern Syntax for more info on how regexps work.
You can try
$str = "http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/B000NO9GT4/ref=sr_1_1?m=ATVPDKIKX0DER&s=toys-and-games&ie=UTF8&qid=1350518571&sr=1-1&keywords=lego" ;
list(,$desc,,$num,) = explode("/",parse_url($str,PHP_URL_PATH));
var_dump($desc,$num);
Output
string 'LEGO-Ultimate-Building-Set-Pieces' (length=33)
string 'B000NO9GT4' (length=10)
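As a hedged variation on the regex answer, named capture groups can make the result easier to read. This is a sketch only: the function name parseAmazonUrl and the array keys name and asin are made up, and it assumes the /<name>/dp/<asin> path layout shown in the question.

```php
<?php
// Hypothetical helper: returns ['name' => ..., 'asin' => ...] or null if the path does not fit.
function parseAmazonUrl(string $url): ?array
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null; // malformed URL or no path component
    }
    // (?P<name>...) and (?P<asin>...) are named groups; [^/]+ stops at the next slash.
    if (preg_match('#^/(?P<name>[^/]+)/dp/(?P<asin>[^/]+)#', $path, $m)) {
        return ['name' => $m['name'], 'asin' => $m['asin']];
    }
    return null;
}

$url = 'http://www.amazon.com/LEGO-Ultimate-Building-Set-Pieces/dp/B000NO9GT4/ref=sr_1_1?m=ATVPDKIKX0DER&s=toys-and-games&ie=UTF8&qid=1350518571&sr=1-1&keywords=lego';
$info = parseAmazonUrl($url);
echo $info['name'], "\n"; // LEGO-Ultimate-Building-Set-Pieces
echo $info['asin'], "\n"; // B000NO9GT4
```

Unlike the pattern in the first answer, this one does not require a trailing slash after the ASIN.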
