I have a regex:
preg_match_all('#^(((?:-?>?(?:[A-Z]{3})?\d{3})+)-([0-9]{2})([0-9]{2})([0-9]{2})-\n\/O.([A-Z]{3}).KCLE.([A-Z]{2}).([A-Z]).([0-9]{4}).[0-9]{6}T[0-9]{4}Z-([0-9]{2})([0-9]{4}T[0-9]{4}Z[\/]))#', '', $matches)
that runs against a string(s) on a webpage. An example of a possible string:
OHZ012>018-PAZ015-060815-
/O.EXP.KCLE.BH.S.0015.000000T0000Z-170806T0700Z/
This will correctly match the string. However, for $matches[2] it will output
OHZ012>018-PAZ015
I want this line to read: 012>018-015 (i.e. remove the letters from that group).
I have tried the following using preg_replace:
$matches = preg_replace('/([A-Z]{3})/','',$matches);
Now if I print out $matches[2] it just gives me the 3rd character as opposed to the group. So for example, it will print out "2" instead of "012>018-015". Any idea why it isn't printing out the entire group as I would expect?
preg_match_all populates your $matches variable with an array of arrays. The third parameter of preg_replace should be either a string or an array of strings, so that is probably where you were running into the issue.
$matches[2], however, is an array of strings, so you can call preg_replace passing it as the third parameter and get your results.
$matches[2] = preg_replace('/([A-Z]{3})/','',$matches[2]);
If you would like a more generic letter replacement regex, you can use /[A-Z]/i to remove all letters in the strings.
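As a rough sketch of the fix above: the snippet below uses a deliberately simplified pattern (an assumption for illustration, not the full pattern from the question) and an assumed `$text` variable, then strips the letters from the one captured group that holds the zone codes.

```php
<?php
// Assumed sample input, taken from the example string in the question.
$text = "OHZ012>018-PAZ015-060815-\n/O.EXP.KCLE.BH.S.0015.000000T0000Z-170806T0700Z/";

// Simplified pattern (illustrative only): group 1 is the zone list, group 2 the date.
preg_match_all('#^((?:-?>?(?:[A-Z]{3})?\d{3})+)-(\d{6})-#m', $text, $matches);

// $matches is an array of arrays; $matches[1] is an array of strings,
// so it can be passed to preg_replace directly.
$matches[1] = preg_replace('/[A-Z]{3}/', '', $matches[1]);

echo $matches[1][0], "\n"; // 012>018-015
```

Passing the whole `$matches` array instead would hand preg_replace nested arrays, which it cannot treat as subject strings.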
Here's a string:
n%3A171717%2Cn%3A%747474%2Cn%3A555666%2Cn%3A1234567&bbn=555666
From this string how can I extract 1234567 ? Need a good logic / syntax.
I guess preg_match would be a better option than explode function in PHP.
It's for a PHP script that extracts data. The numbers can vary, and so can the number of occurrences; only %2Cn%3A will always be in front of the numbers. The end will always have &bbn=anyNumber.
That looks like part of an encoded URL, so there are bound to be better ways to do it, but after urldecode() your string looks like:
n:171717,n:t7474,n:555666,n:1234567&bbn=555666
So:
preg_match_all('/n:(\d+)/', urldecode($string), $matches);
echo array_pop($matches[1]);
Parenthesized matches are in $matches[1] so just array_pop() to get the last element.
If &bbn= can be anywhere (except for at the beginning) then:
preg_match('/n:(\d+)&bbn=/', urldecode($string), $matches);
echo $matches[1];
only %2Cn%3A will always be there in front of the numbers
The urldecoded equivalent of %2Cn%3A is ,n: and the last "enclosing boundary", &bbn, remains as is.
preg_match function will do the job:
preg_match("/(?<=,n:)\d+(?=&bbn)/", urldecode("n%3A171717%2Cn%3A%747474%2Cn%3A555666%2Cn%3A1234567&bbn=555666"), $m);
print_r($m[0]); // "1234567"
how would I avoid that the following :
$_SESSION['myVar']=preg_match("[^a-zA-Z]",'',$_SESSION['myVar']);
echo $_SESSION['myVar'];
displays
0
and have it output the variable's content instead? preg_match returns a mixed type, but this shouldn't be the problem...
Why can't the value of the string itself be shown with echo (when comparing its contents, it is OK)?
Formerly I had
$_SESSION['myVar']=ereg_replace("[^a-zA-Z]",'',$_SESSION['myVar']);
and the output of ereg_replace correctly displayed the variable content.
PCRE patterns in PHP need delimiters, and you probably want preg_replace:
preg_replace("/[^a-zA-Z]/",'',$_SESSION['myVar']);
Even if you had used preg_replace, the brackets ([...]) would be interpreted as delimiters, so the engine would literally try to match a-zA-Z at the beginning of the string and would not interpret the construct as a character class.
preg_match returns an int (or false on error), not mixed: http://php.net/manual/en/function.preg-match.php
Use the matches parameter to get your matches.
The problem is that preg_match returns an integer: 1 if the pattern matched, 0 if it didn't (and false on error). preg_match only tests for a match; it doesn't replace anything. Here's how you use preg_match:
$matches = array();
preg_match("/[^a-zA-Z]/", $_SESSION["myVar"], $matches);
print_r($matches); // The first match is in the array.
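The difference between the two functions can be sketched side by side. The `$value` variable below is an assumed stand-in for `$_SESSION['myVar']`, used only so the snippet runs on its own.

```php
<?php
// Assumed stand-in for $_SESSION['myVar'].
$value = 'abc123-def';

// preg_match reports whether the pattern matched (1 or 0) and fills $matches
// with the first match only.
$matched = preg_match('/[^a-zA-Z]/', $value, $matches);

// preg_replace returns the modified string, which is what ereg_replace used to do.
$cleaned = preg_replace('/[^a-zA-Z]/', '', $value);

echo $matched, "\n";     // 1
echo $matches[0], "\n";  // 1 (the first non-letter character found)
echo $cleaned, "\n";     // abcdef
```

Assigning the result of preg_match back to the variable is what produced the 0 in the question: it overwrote the string with the match count.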
I'm trying to convert a Notepad++ regex to a PHP regular expression that gets the IDs from a list of URLs in this format:
http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html
http://www.example.com/category-example/1471337-text-blah-blah-2-blah-2010.html
Using the Notepad++ regex function, I get the output that I need (a list of comma-separated IDs) in two steps:
(.*)/ replace with space
-(.*) replace with comma
Result:
1371937,1471337
I tried to do something similar with PHP's preg_replace, but I can't figure out the correct regex. The example below removes everything except digits, but it doesn't work as expected, since there can also be numbers that do not belong to the ID.
$bb = preg_replace('/[^0-9]+/', ',', $_POST['Text']);
?>
Which is the correct structure?
Thanks
If you are matching against:
http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html
To get:
1371937
You would:
$url = "http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html";
preg_match( "/[^\d]+(\d+)-/", $url, $matches );
$code = $matches[1];
.. which matches all non-numeric characters, then an unbroken string of numbers, until it reaches a '-'
If all you want to do is find the ID, then you should use preg_match, not preg_replace.
You've got lots of options for the pattern, the simplest being:
$url = 'http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html';
preg_match('/\d+/', $url, $matches);
echo $matches[0];
Which simply finds the first bunch of numbers in the URL. This works for the examples.
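To get the comma-separated list the question asks for, one possible sketch is to run preg_match_all over the whole text and join the captures. The `$text` variable stands in for `$_POST['Text']`, and the pattern assumes each ID directly follows a slash and is followed by a hyphen, as in the two sample URLs.

```php
<?php
// Assumed input: the two sample URLs from the question, one per line.
$text = "http://www.example.com/category-example/1371937-text-blah-blah-blah-2012.html\n"
      . "http://www.example.com/category-example/1471337-text-blah-blah-2-blah-2010.html";

// Capture a run of digits that sits between a "/" and a "-".
// Digits elsewhere in the slug (such as 2012 or 2) are not preceded by "/", so they are skipped.
preg_match_all('~/(\d+)-~', $text, $matches);

// $matches[1] holds the captured IDs, one per URL.
$ids = implode(',', $matches[1]);

echo $ids, "\n"; // 1371937,1471337
```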
I've tried multiple functions to extract whatever is between two strings. The delimiters might contain special characters; I guess that's why none of them worked for me.
My current function:
function between($str, $startTag, $endTag){
$delimiter = '#';
$regex = $delimiter . preg_quote($startTag, $delimiter)
. '(.*?)'
. preg_quote($endTag, $delimiter)
. $delimiter
. 's';
preg_match($regex, $str, $matches);
return $matches;
}
Example of string:
#{ST#RT}#
Text i want
#{END}#
#{ST#RT}#
Second text i want
#{END}#
How to improve that or suggest another solution to:
Support any kind of character or new lines
Extract multiple strings if found
Current Behavior: Only returns the first match, And also returns the match plus the surrounding tags which is unwanted
Use the s modifier (PCRE_DOTALL) so that the . character also matches newlines; your function already appends it. (The m modifier only changes where ^ and $ anchor in multi-line input.)
preg_match('/foo.+bar/s', $str);
//                    ^--- this
Use preg_match_all() to get your multiple strings:
preg_match_all($regex, $str, $matches);
return $matches[1]; // an array of the strings
Edit:
The reason your current code returns the match plus the surrounding tags is because you're using return $matches. The $matches array has several elements in it. Index 0 is always the entire string that matched the expression. Indexes 1 and higher are your capture groups. In your expression, you had only one capture group (the "string"), so you would have wanted to only do return $matches[1] instead of return $matches.
You can use preg_match_all to extract multiple strings. Besides that, your code seems simple enough, and simpler is normally faster.
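Putting those suggestions together, a minimal sketch of the function might look like this. The name `betweenAll` and the trimming of surrounding whitespace are assumptions added for illustration; they are not part of the original function.

```php
<?php
// Hypothetical variant of between(): returns every inner text, without the tags.
function betweenAll(string $str, string $startTag, string $endTag): array
{
    $delimiter = '#';
    // preg_quote escapes regex metacharacters and the chosen delimiter in both tags.
    $regex = $delimiter . preg_quote($startTag, $delimiter)
           . '(.*?)'
           . preg_quote($endTag, $delimiter)
           . $delimiter
           . 's'; // s lets . match newlines

    preg_match_all($regex, $str, $matches);

    // Index 1 is the capture group; trim() drops the newlines around each text (assumed to be wanted).
    return array_map('trim', $matches[1]);
}

$input = "#{ST#RT}#\nText i want\n#{END}#\n#{ST#RT}#\nSecond text i want\n#{END}#";
print_r(betweenAll($input, '#{ST#RT}#', '#{END}#'));
```

With the sample input this should yield the two inner strings as separate array elements.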
I'm trying to capture the text "Capture This" in $string below.
$string = "</th><td>Capture This</td>";
$pattern = "/<\/th>\r.*<td>(.*)<\/td>$/";
preg_match ($pattern, $string, $matches);
echo($matches);
However, that just returns "Array". I also tried printing $matches using print_r, but that gave me "Array ( )".
This pattern will only come up once, so I just need it to match one time. Can somebody please tell me what I'm doing wrong?
The problem is that you require a CR character \r. Also you should make the search lazy inside the capturing group and use print_r to output the array. Like this:
$pattern = "/<\/th>.*<td>(.*?)<\/td>$/";
You can see it in action here: http://codepad.viper-7.com/djRJ0e
Note that it's recommended to parse html with a proper html parser rather than using regex.
Two things:
You need to drop the \r from your regex as there is no carriage return character in your input string.
Change echo($matches) to print_r($matches) or var_dump($matches)
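Both points can be combined into a short sketch using the string from the question. The if guard is an addition for illustration, so that nothing is printed when the pattern does not match.

```php
<?php
$string = "</th><td>Capture This</td>";

// No \r in the pattern, and a lazy capture group for the cell contents.
$pattern = '/<\/th>.*<td>(.*?)<\/td>$/';

if (preg_match($pattern, $string, $matches)) {
    // $matches[0] is the whole match; $matches[1] is the first capture group.
    echo $matches[1], "\n"; // Capture This
}
```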