regular expression to extract a part of string - php

I have the following transaction format from a core banking system:
This is a <test> and only <test> hope <u> understand
From this I want to extract
<test><test><u> (including the angle brackets).
I can do that with plain substring operations, but it will be too slow. Is there any way to capture the text between < and > using regex functions?

The easiest I can think of is to use preg_match_all() and then join() the results together to form the final string:
function get_bracketed_words($str)
{
    if (preg_match_all('/<[a-z]+>/', $str, $matches)) {
        return join('', $matches[0]);
    }
    return '';
}
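A quick usage sketch against the sample line from the question. The second function, get_bracketed_any(), is a hypothetical variant (not part of the original answer) that accepts any characters except angle brackets between < and >, in case the real data contains uppercase letters or digits.

```php
<?php
// The function from the answer above, repeated so this snippet runs on its own.
function get_bracketed_words($str)
{
    if (preg_match_all('/<[a-z]+>/', $str, $matches)) {
        return join('', $matches[0]);
    }
    return '';
}

// Hypothetical variant: anything between < and > that is not itself a bracket.
function get_bracketed_any($str)
{
    if (preg_match_all('/<[^<>]+>/', $str, $matches)) {
        return implode('', $matches[0]);
    }
    return '';
}

$line = 'This is a <test> and only <test> hope <u> understand';
echo get_bracketed_words($line), "\n";          // <test><test><u>
echo get_bracketed_any('Ref <TXN42> ok'), "\n"; // <TXN42>
```

Both variants make a single pass over the string, so they should not be noticeably slower than hand-written substring code.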

If you use this, it should not be too slow (Perl code as an example here):
while (my $line = <FILE>) {
    my ($request) = ($line =~ /RequestArray:(.*)/);
    next unless $request;
    # here, you can split $request into sub-pieces using another regex
    # ...
}

Related

Parse string based on pattern

I am using php 5 to parse a string. My input string looks like the following:
{Billion is|Millions are|Trillion is} {an extremely |a| a generously |
a very} { tiny|little |smallish |short |small} stage in a vast
{galactic| |large|huge|tense|big |cosmic}
{universe|Colosseum|planet|arena}.
Find below my minimum viable example:
<?php
function process($text)
{
return preg_replace_callback('/\[(((?>[^\[\]]+)|(?R))*)\]/x', array(
$this,
'replace'
), $text);
}
function replace($text)
{
$text = $this->process($text[1]);
$parts = explode('|', $text);
return $parts[array_rand($parts)];
}
$text = "{Billion is|Millions are|Trillion is} {an extremely |a| a generously | a very} { tiny|little |smallish |short |small} stage in a vast {galactic| |large|huge|tense|big |cosmic} {universe|Colosseum|planet|arena}.";
$res = process($text);
echo $res;
As you can see I am trying to parse the following pattern f.ex.: {Billion is|Millions are|Trillion is} using the above regex, /\[(((?>[^\[\]]+)|(?R))*)\]/x.
As a result I am getting the same string as inputted. I would like to get as an output for example:
Billion is a very little stage in a vast huge arena.
Any suggestions what I am doing wrong?
How would your current code generate anything?
Your regex doesn't fit: it matches nested bracketed groups ([...]), not braced ones. Try {([^}]*)} to capture everything inside {...} into $m[1], assuming there are no nested braces.
Read about preg_replace_callback(). The second argument must be a valid callback; array($this, 'replace') only works inside a class method, where $this exists.
A working code with some further adjustments could look like this:
function process($text) {
    return preg_replace_callback('/{([^}]*)}/', 'replace', $text);
}
function replace($m) {
    $parts = explode('|', $m[1]);
    shuffle($parts);
    return $parts[0];
}
$text = "{Billion is|Millions are|Trillion is} {an extremely|a|a generously|a very} {tiny|little|smallish|short|small} stage in a vast {galactic||large|huge|tense|big|cosmic} {universe|Colosseum|planet|arena}.";
echo process($text);
Billion is a generously short stage in a vast Colosseum.
Here is a demo at eval.in
(you can also use an anonymous function if PHP >= 5.3)
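As a minimal sketch of the anonymous-function variant mentioned above (assuming PHP 5.3 or later and no nested braces), the callback can be written inline. The function name spin() is made up for this example, and array_rand() is used here in place of shuffle() to pick one alternative.

```php
<?php
// Replace each {a|b|c} group with one randomly chosen alternative.
// Assumes groups are not nested.
function spin($text)
{
    return preg_replace_callback('/{([^}]*)}/', function ($m) {
        $parts = explode('|', $m[1]);
        return $parts[array_rand($parts)];
    }, $text);
}

// Output varies from run to run because the choice is random.
echo spin('{Billion is|Millions are} a {tiny|small} stage.'), "\n";
```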

substr() to preg_replace() matches php

I have two functions in PHP, trimmer($string,$number) and toUrl($string). I want to trim the URLs extracted by toUrl() to, for example, 20 characters: from https://www.youtube.com/watch?v=HU3GZTNIZ6M to https://www.youtube.com/wa...
function trimmer($string,$number) {
$string = substr ($string, 0, $number);
return $string."...";
}
function toUrl($string) {
$regex="/[^\W ]+[^\s]+[.]+[^\" ]+[^\W ]+/i";
$string= preg_replace($regex, "<a href='\\0'>".trimmer("\\0",20)."</a>",$string);
return $string;
}
The problem is that the match is only available as the placeholder \\0, not as a variable like $url that could easily be passed to trimmer().
The question is: how do I apply substr() to \\0, something like substr("\\0",0,20)?
What you want is preg_replace_callback:
function _toUrl_callback($m) {
    return "<a href='" . $m[0] . "'>" . trimmer($m[0], 20) . "</a>";
}
function toUrl($string) {
    $regex = "/[^\W ]+[^\s]+[.]+[^\" ]+[^\W ]+/i";
    $string = preg_replace_callback($regex, "_toUrl_callback", $string);
    return $string;
}
Also note that (side notes wrt your question):
You have a syntax error: '$regex' is not going to work, because variables are not interpolated in single-quoted strings.
You may want to look for a better regex to match URLs; a quick search will turn up plenty.
You may want to run your matches through htmlspecialchars() (mainly because of "&", though that depends on how you escape the rest of the string).
EDIT: Made it more PHP 4 friendly, requested by the asker.
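A minimal sketch that applies the htmlspecialchars() side note, assuming PHP 5.3 or later (so it is not PHP 4 friendly) and the same URL pattern as in the question. As a further assumption of this example, trimmer() is changed to add "..." only when something was actually cut.

```php
<?php
// Shorten $string to $number characters, adding "..." only when something was cut.
function trimmer($string, $number)
{
    if (strlen($string) <= $number) {
        return $string;
    }
    return substr($string, 0, $number) . '...';
}

function toUrl($string)
{
    // Same pattern as in the question; it is not a robust URL matcher.
    $regex = "/[^\W ]+[^\s]+[.]+[^\" ]+[^\W ]+/i";
    return preg_replace_callback($regex, function ($m) {
        // Escape both the attribute value and the visible label.
        $href  = htmlspecialchars($m[0], ENT_QUOTES);
        $label = htmlspecialchars(trimmer($m[0], 20), ENT_QUOTES);
        return "<a href='" . $href . "'>" . $label . "</a>";
    }, $string);
}

echo toUrl('see https://www.youtube.com/watch?v=HU3GZTNIZ6M now'), "\n";
```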

PHP:preg_replace function

$text = "
<tag>
<html>
HTML
</html>
</tag>
";
I want to replace all the text present inside the tags with htmlspecialchars(). I tried this:
$regex = '/<tag>(.*?)<\/tag>/s';
$code = preg_replace($regex,htmlspecialchars($regex),$text);
But it doesn't work.
I am getting the output as htmlspecialchars of the regex pattern. I want to replace it with htmlspecialchars of the data matching with the regex pattern.
What should I do?
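You're replacing the match with the escaped pattern itself: htmlspecialchars($regex) is evaluated once, before preg_replace() runs. You're not using back-references or the e modifier either (the latter is deprecated since PHP 5.5 and removed in PHP 7), and in this case preg_replace_callback is the way to go:
$code = preg_replace_callback($regex,'replaceCallback',$text);
This passes the matched groups to the callback as an array and uses its return value as the replacement. Because htmlspecialchars() expects a string rather than an array, it cannot be used as the callback directly; wrap it instead: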
function replaceCallback($matches)
{
    if (is_array($matches))
    {
        $matches = implode ('', array_slice($matches, 1));//first element is full string
    }
    return htmlspecialchars($matches);
}
Or, if your PHP version permits it:
preg_replace_callback($regex, function($matches)
{
    $return = '';
    for ($i=1, $j = count($matches); $i<$j;$i++)
    {//loop like this, skips first index, and allows for any number of groups
        $return .= htmlspecialchars($matches[$i]);
    }
    return $return;
}, $text);
Try any of the above until you find something that works. Incidentally, if all you want is to remove <tag> and </tag>, why not go for the much faster:
echo htmlspecialchars(str_replace(array('<tag>','</tag>'), '', $text));
That's just keeping it simple, and it'll almost certainly be faster, too.
See the quickest, easiest way in action here
If you want to isolate the actual contents as defined by your pattern, you could use preg_match($regex,$text,$hits);. This gives you an array of hits: $hits[0] contains the whole matched string, and the bits that were between the parentheses in the pattern start at $hits[1]. You can then manipulate these matches, possibly using htmlspecialchars(), and combine them again into $code.
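A minimal sketch of escaping only the captured contents, assuming PHP 5.3 or later and assuming the surrounding <tag> and </tag> markers should be kept in the output (the question does not say whether they should):

```php
<?php
$text  = "<tag>\n<html>\nHTML\n</html>\n</tag>";
$regex = '/<tag>(.*?)<\/tag>/s';

// $m[0] is the whole match, $m[1] the text between <tag> and </tag>.
$code = preg_replace_callback($regex, function ($m) {
    return '<tag>' . htmlspecialchars($m[1]) . '</tag>';
}, $text);

echo $code, "\n";
```

Dropping the two literal '<tag>' and '</tag>' strings from the callback would give the variant that strips the markers.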

php replace regular expression instead of string replace

I'm trying to give my client the ability to call a function that returns various code snippets by inserting a shortcode in their WYSIWYG editor.
For example, they will write something like...
[getSnippet(1)]
This will call my getSnippet($id) php function and output the appropriate 'chunk'.
It works when I hard code the $id like this...
echo str_replace('[getSnippet(1)]',getSnippet(1),$rowPage['sidebar_details']);
However, I really want to make the '1' dynamic. I'm sort of on the right track with something like...
function getSnippet($id) {
if ($id == 1) {
echo "car";
}
}
$string = "This [getSnippet(1)] is a sentence.This is the next one.";
$regex = '#([getSnippet(\w)])#';
$string = preg_replace($regex, '. \1', $string);
//If you want to capture more than just periods, you can do:
echo preg_replace('#(\.|,|\?|!)(\w)#', '\1 \2', $string);
Not quite working :(
Firstly in your regex you need to add literal parentheses (the ones you have just capture \w but that will not match the parentheses themselves):
$regex = '#(\[getSnippet\((\w)\)\])#';
I also escaped the square brackets, otherwise they will open a character class. Also be aware that this captures only one character for the parameter!
But I recommend you use preg_replace_callback, with a regex like this:
function getSnippet($id) {
if ($id == 1) {
return "car";
}
}
function replaceCallback($matches) {
return getSnippet($matches[1]);
}
$string = preg_replace_callback(
'#\[getSnippet\((\w+)\)\]#',
'replaceCallback',
$string
);
Note that I changed the echo in your getSnippet to a return.
Within the callback, $matches[1] will contain the first captured group, which in this case is your parameter (which now allows for multiple characters). Of course, you could also adjust your getSnippet function to read the id from the $matches array instead of redirecting through the replaceCallback.
But this approach here is slightly more flexible, as it allows you to redirect to multiple functions. Just as an example, if you changed the regex to #\[(getSnippet|otherFunction)\((\w+)\)\]# then you could find two different functions, and replaceCallback could find out the name of the function in $matches[1] and call the function with the parameter $matches[2]. Like this:
function getSnippet($id) {
...
}
function otherFunction($parameter) {
...
}
function replaceCallback($matches) {
return $matches[1]($matches[2]);
}
$string = preg_replace_callback(
'#\[(getSnippet|otherFunction)\((\w+)\)\]#',
'replaceCallback',
$string
);
It really depends on where you want to go with this. The important thing is, there is no way of processing an arbitrary parameter in a replacement without using preg_replace_callback.
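One possible way to structure the multi-function dispatch, sketched under assumptions: the $handlers whitelist and the renderShortcodes() name are hypothetical and not part of the answer above, and PHP 5.3 or later is assumed for the closure. The pattern accepts any function-like name, but only names present in the whitelist are called, so editor content cannot invoke arbitrary PHP functions.

```php
<?php
function getSnippet($id)
{
    return $id == 1 ? 'car' : '';
}

// Hypothetical whitelist: shortcode name => PHP callable.
$handlers = array(
    'getSnippet' => 'getSnippet',
);

function renderShortcodes($string, array $handlers)
{
    return preg_replace_callback(
        '#\[(\w+)\((\w+)\)\]#',
        function ($m) use ($handlers) {
            // Unknown shortcodes are left untouched.
            if (!isset($handlers[$m[1]])) {
                return $m[0];
            }
            return call_user_func($handlers[$m[1]], $m[2]);
        },
        $string
    );
}

echo renderShortcodes('This [getSnippet(1)] is a sentence. [phpinfo(1)] stays.', $handlers), "\n";
// This car is a sentence. [phpinfo(1)] stays.
```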

Linkify Regex Function PHP Daring Fireball Method

So, I know there are a ton of related questions on SO, but none of them are quite what I'm looking for. I'm trying to implement a PHP function that will convert text URLs from a user-generated post into links. I'm using the 'improved' Regex from Daring Fireball towards the bottom of the page: http://daringfireball.net/2010/07/improved_regex_for_matching_urls
The function does not return anything, and I'm not sure why.
<?php
if ( false === function_exists('linkify') ):
function linkify($str) {
$pattern = '(?xi)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))';
return preg_replace($pattern, "<a href=\"\\0\">\\0</a>", $str);
}
endif;
?>
Can someone please help me get this to work?
Thanks!
Try this:
$pattern = '(?xi)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`\!()\[\]{};:\'".,<>?«»“”‘’]))';
return preg_replace("!$pattern!i", "<a href=\"\\0\">\\0</a>", $str);
PHP's preg functions need delimiters around the pattern. The i at the end makes it case-insensitive.
Update
If you use # as the delimiter, you won't need to escape the ! in the pattern, so you can use the original pattern string unchanged (it does not contain a #): "#$pattern#i"
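To illustrate why the original function appeared to return nothing, here is a small sketch with a deliberately simplified URL pattern (an assumption for brevity, not the Daring Fireball regex). Without delimiters, preg_replace() emits a warning and returns NULL instead of the modified string.

```php
<?php
// No delimiters: PHP raises a warning (suppressed here with @) and returns NULL.
$bad = @preg_replace('https?://\S+', '[link]', 'go to http://example.com now');
var_dump($bad); // NULL

// Same pattern wrapped in # delimiters, so the slashes need no escaping.
$good = preg_replace('#https?://\S+#i', '[link]', 'go to http://example.com now');
echo $good, "\n"; // go to [link] now
```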
Update 2
To ensure that the links are correct, do this:
$pattern = '(?xi)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))';
return preg_replace_callback("#$pattern#i", function($matches) {
    $input = $matches[0];
    $url = preg_match('!^https?://!i', $input) ? $input : "http://$input";
    return '<a href="' . $url . '">' . "$input</a>";
}, $str);
This now prepends http:// to URLs that lack a scheme, so that the browser doesn't treat them as relative links.
I only wanted to extract the URLs from a string, using the same regex as in the answer above by d_inevitable, without turning them into links or caring about the rest of the string. This is what I did; hope it helps.
/**
* Returns the URLs found in a string as an array.
* This does NOT return the string, only the URLs within it.
*/
function get_urls($str){
$regex = '(?xi)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))';
preg_match_all("#$regex#i", $str, $matches);
$urls = $matches[0];
return $urls;
}
