Parse string based on pattern - php

I am using php 5 to parse a string. My input string looks like the following:
{Billion is|Millions are|Trillion is} {an extremely |a| a generously |
a very} { tiny|little |smallish |short |small} stage in a vast
{galactic| |large|huge|tense|big |cosmic}
{universe|Colosseum|planet|arena}.
Find below my minimum viable example:
<?php
function process($text)
{
return preg_replace_callback('/\[(((?>[^\[\]]+)|(?R))*)\]/x', array(
$this,
'replace'
), $text);
}
function replace($text)
{
$text = $this->process($text[1]);
$parts = explode('|', $text);
return $parts[array_rand($parts)];
}
$text = "{Billion is|Millions are|Trillion is} {an extremely |a| a generously | a very} { tiny|little |smallish |short |small} stage in a vast {galactic| |large|huge|tense|big |cosmic} {universe|Colosseum|planet|arena}.";
$res = process($text);
echo $res;
As you can see, I am trying to parse patterns such as {Billion is|Millions are|Trillion is} using the above regex, /\[(((?>[^\[\]]+)|(?R))*)\]/x.
As a result I get the input string back unchanged. I would like to get output such as:
Billion is a very little stage in a vast huge arena.
Any suggestions what I am doing wrong?

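How would your current code generate anything?
Your regex doesn't fit. It matches nested bracketed [...] groups, not braced {...} ones. Try {([^}]*)} to capture everything inside {...} into $m[1], provided there are no nested braces.
Read about preg_replace_callback(). The second argument must be a valid callable: array($this, 'replace') only works inside a class method, and your functions are plain functions, so pass the function name 'replace' instead.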
A working code with some further adjustments could look like this:
function process($text) {
return preg_replace_callback('/{([^}]*)}/', 'replace', $text);
}
function replace($m) {
$parts = explode('|', $m[1]);
shuffle($parts);
return $parts[0];
}
$text = "{Billion is|Millions are|Trillion is} {an extremely|a|a generously|a very} {tiny|little|smallish|short|small} stage in a vast {galactic||large|huge|tense|big|cosmic} {universe|Colosseum|planet|arena}.";
echo process($text);
Billion is a generously short stage in a vast Colosseum.
Here is a demo at eval.in
(you can also use an anonymous function if PHP >= 5.3)
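The anonymous-function variant mentioned above might look like the sketch below. It assumes PHP >= 5.3 and no nested braces; the function name spin_flat is made up for illustration, and array_rand() is used in place of shuffle() to pick one option.

```php
<?php
// Sketch: the same flat {a|b|c} replacement, with the callback inlined as a closure.
// spin_flat is a hypothetical name, not part of the original answer.
function spin_flat($text)
{
    return preg_replace_callback('/\{([^}]*)\}/', function ($m) {
        $parts = explode('|', $m[1]);        // split the options inside the braces
        return $parts[array_rand($parts)];   // pick one option at random
    }, $text);
}

echo spin_flat("{Billion is|Millions are} a {tiny|small} stage."), "\n";
```

Because the closure is only used in one place, nothing extra is added to the global function namespace.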

Related

substr() to preg_replace() matches php

I have two functions in PHP, trimmer($string,$number) and toUrl($string). I want to trim the URLs extracted with toUrl() to, for example, 20 characters, so that https://www.youtube.com/watch?v=HU3GZTNIZ6M becomes something like https://www.youtube.com/wa...
function trimmer($string,$number) {
$string = substr ($string, 0, $number);
return $string."...";
}
function toUrl($string) {
$regex="/[^\W ]+[^\s]+[.]+[^\" ]+[^\W ]+/i";
$string= preg_replace($regex, "<a href='\\0'>".trimmer("\\0",20)."</a>",$string);
return $string;
}
But the problem is that the match is only available as the back-reference \\0, not as a variable like $url that could easily be passed to trimmer().
The question is: how do I apply substr() to \\0, something like substr("\\0",0,20)?
What you want is preg_replace_callback:
function _toUrl_callback($m) {
return "<a href='" . $m[0] . "'>" . trimmer($m[0], 20) . "</a>";
}
function toUrl($string) {
$regex = "/[^\W ]+[^\s]+[.]+[^\" ]+[^\W ]+/i";
$string = preg_replace_callback($regex, "_toUrl_callback", $string);
return $string;
}
Also note that (side notes wrt your question):
You have a syntax error, '$regex' is not going to work (PHP does not substitute variable names inside single-quoted strings)
You may want to look for better regexps to match URLs, you'll find plenty of them with a quick search
You may want to run your matches through htmlspecialchars() (mainly because of problems with "&", but that depends on how you escape the rest of the string)
EDIT: Made it more PHP 4 friendly, requested by the asker.
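As a rough illustration of the htmlspecialchars() side note, the callback could escape the URL for the href attribute and for the visible label separately. This is only a sketch: the simplified pattern ~https?://\S+~i is an assumption standing in for the regex from the question, and the label is trimmed before escaping so that an entity such as &amp; is never cut in half.

```php
<?php
// Shorten a string to $number characters and append an ellipsis (as in the question).
function trimmer($string, $number)
{
    return substr($string, 0, $number) . "...";
}

// Callback: escape the full URL for the attribute and the trimmed URL for the label.
function _toUrl_callback($m)
{
    $href  = htmlspecialchars($m[0], ENT_QUOTES);
    $label = htmlspecialchars(trimmer($m[0], 20), ENT_QUOTES);
    return "<a href='" . $href . "'>" . $label . "</a>";
}

function toUrl($string)
{
    // Simplified URL pattern (assumption of this sketch, not the asker's regex).
    return preg_replace_callback('~https?://\S+~i', '_toUrl_callback', $string);
}

echo toUrl("see https://www.youtube.com/watch?v=HU3GZTNIZ6M now"), "\n";
```

Note that trimmer() appends "..." even when the URL is shorter than the limit, exactly as in the original function.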

regular expression to extract a part of string

I have the following transaction format from a core banking system:
This is a <test> and only <test> hope <u> understand
from which I want
<test><test><u> (along with the <>)
With simple substring operations I can do that, but it will be too slow. Is there any way to capture the text between < and > using regex functions?
The easiest I can think of is to use preg_match_all() and then join() the results together to form the final string:
function get_bracketed_words($str)
{
if (preg_match_all('/<[a-z]+>/', $str, $matches)) {
return join('', $matches[0]);
}
return '';
}
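For reference, here is the function applied to the sample line from the question. It is a usage sketch only; the comment about [^<>]+ is a suggestion for inputs that contain more than lowercase letters between the angle brackets, not something the question requires.

```php
<?php
function get_bracketed_words($str)
{
    // [a-z]+ only matches lowercase letters; a broader class such as [^<>]+
    // could be used if tags may contain digits or uppercase characters.
    if (preg_match_all('/<[a-z]+>/', $str, $matches)) {
        return join('', $matches[0]);   // $matches[0] holds every full match
    }
    return '';
}

echo get_bracketed_words('This is a <test> and only <test> hope <u> understand'), "\n";
// prints: <test><test><u>
```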
If you use this, it should not be too slow (Perl code as an example here):
while (my $line = <FILE>) {
my ($request) = ($line =~ /RequestArray:(.*)/);
next unless $request;
# here, you can split $requests to sub-pieces using another regex
# ...
}

php replace regular expression instead of string replace

I'm trying to give my client the ability to call a function that returns various code snippets by inserting a shortcode in their WYSIWYG editor.
For example, they will write something like...
[getSnippet(1)]
This will call my getSnippet($id) php function and output the appropriate 'chunk'.
It works when I hard code the $id like this...
echo str_replace('[getSnippet(1)]',getSnippet(1),$rowPage['sidebar_details']);
However, I really want to make the '1' dynamic. I'm sort of on the right track with something like...
function getSnippet($id) {
if ($id == 1) {
echo "car";
}
}
$string = "This [getSnippet(1)] is a sentence.This is the next one.";
$regex = '#([getSnippet(\w)])#';
$string = preg_replace($regex, '. \1', $string);
//If you want to capture more than just periods, you can do:
echo preg_replace('#(\.|,|\?|!)(\w)#', '\1 \2', $string);
Not quite working :(
Firstly in your regex you need to add literal parentheses (the ones you have just capture \w but that will not match the parentheses themselves):
$regex = '#(\[getSnippet\((\w)\)\])#';
I also escaped the square brackets, otherwise they will open a character class. Also be aware that this captures only one character for the parameter!
But I recommend you use preg_replace_callback, with a regex like this:
function getSnippet($id) {
if ($id == 1) {
return "car";
}
}
function replaceCallback($matches) {
return getSnippet($matches[1]);
}
$string = preg_replace_callback(
'#\[getSnippet\((\w+)\)\]#',
'replaceCallback',
$string
);
Note that I changed the echo in your getSnippet to a return.
Within the callback $matches[1] will contain the first captured group, which in this case is your parameter (which now allows for multiple characters). Of course, you could also adjust your getSnippet function to read the id from the $matches array instead of redirecting through the replaceCallback.
But this approach here is slightly more flexible, as it allows you to redirect to multiple functions. Just as an example, if you changed the regex to #\[(getSnippet|otherFunction)\((\w+)\)\]# then you could find two different functions, and replaceCallback could find out the name of the function in $matches[1] and call the function with the parameter $matches[2]. Like this:
function getSnippet($id) {
...
}
function otherFunction($parameter) {
...
}
function replaceCallback($matches) {
return $matches[1]($matches[2]);
}
$string = preg_replace_callback(
'#\[(getSnippet|otherFunction)\((\w+)\)\]#',
'replaceCallback',
$string
);
It really depends on where you want to go with this. The important thing is, there is no way of processing an arbitrary parameter in a replacement without using preg_replace_callback.
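One possible variation on the dispatching callback, sketched below, keeps an explicit map of allowed shortcode names instead of listing them in the regex. The map, the body of otherFunction and the [unknown(2)] shortcode are assumptions made up for this example; unknown names are left in the text untouched.

```php
<?php
function getSnippet($id)
{
    return $id == 1 ? "car" : "";
}

// Hypothetical second handler, only here to show the dispatch.
function otherFunction($parameter)
{
    return strtoupper($parameter);
}

function replaceCallback($matches)
{
    // Explicit map of shortcode name => PHP function (an assumption of this sketch).
    $handlers = array(
        'getSnippet'    => 'getSnippet',
        'otherFunction' => 'otherFunction',
    );
    $name = $matches[1];
    if (!isset($handlers[$name])) {
        return $matches[0];   // unknown shortcode: leave the text as it was
    }
    return call_user_func($handlers[$name], $matches[2]);
}

$string = "This [getSnippet(1)] is [otherFunction(fast)] and [unknown(2)].";
echo preg_replace_callback('#\[(\w+)\((\w+)\)\]#', 'replaceCallback', $string), "\n";
// prints: This car is FAST and [unknown(2)].
```

Either way, only names you have listed yourself are ever called, which matters because the text comes from an editor that users can type into.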

Find links in string with PHP. Differ from normal and youtube links

I have a string that contain links. I want my php to do different things with my links, depending on the url.
Answer:
function fixLinks($text)
{
$links = array();
$text = strip_tags($text);
$pattern = '!(https?://[^\s]+)!';
if (preg_match_all($pattern, $text, $matches)) {
list(, $links) = ($matches);
}
$i = 0;
$links2 = array();
foreach($links AS $link) {
if(strpos($link,'youtube.com') !== false) {
$search = "!(http://.*youtube\.com.*v=)?([a-zA-Z0-9_-]{11})(&.*)?!";
$youtube = 'http://www.youtube.com/watch?v=\\2';
$link2 = preg_replace($search, $youtube, $link);
} else {
$link2 = preg_replace('#(https?://([-\w\.]+)+(:\d+)?(/([\-\w/_\.]*(\?\S+)?)?)?)#', '<u>$1</u>', $link);
}
$links2[$i] = $link2;
$i++;
}
$text = str_replace($links, $links2, $text);
$text = nl2br($text);
return $text;
}
First of all, ditch eregi. It has been deprecated since PHP 5.3 and was removed in PHP 7.0.
Then, doing this in just one pass is maybe a stretch too far. I think you'll be better off splitting this into three phases.
Phase 1 runs a regex search over your input, finding everything that looks like a link, and storing it in a list.
Phase 2 iterates over the list, checking whether a link goes to youtube (parse_url is tremendously useful for this), and putting a suitable replacement into a second list.
Phase 3: you now have two lists, one containing the original matches, one containing the desired replacements. Run str_replace over your original text, providing the match list for the search parameter and the replacement list for the replacements.
There are several advantages to this approach:
The regular expression for extracting links can be kept relatively simple, since it doesn't have to take special hostnames into account
It is easier to debug; you can dump the search and replace arrays prior to phase 3, and see if they contain what you expect
Because you perform all replacements in one go, you avoid problems with overlapping matches or replacing a piece of already-replaced text (after all, the replaced text still contains a URL, and you don't want to replace that again)
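The three phases could be sketched roughly as follows. The function name, the CSS class names and the simple link pattern are assumptions for illustration. The sketch also swaps str_replace() for strtr() with an array: str_replace() applies its search/replace pairs one after another, so a later pair can still match inside an earlier replacement, whereas strtr() makes a single pass and never rescans replaced text.

```php
<?php
// fixLinksThreePhase is a hypothetical name; class names are placeholders.
function fixLinksThreePhase($text)
{
    // Phase 1: collect everything that looks like a link.
    if (!preg_match_all('!https?://\S+!', $text, $matches)) {
        return $text;
    }

    // Phase 2: build one replacement per link, checking the host with parse_url().
    $map = array();
    foreach ($matches[0] as $link) {
        $host = (string) parse_url($link, PHP_URL_HOST);
        $isYoutube = (bool) preg_match('/(^|\.)youtube\.com$/i', $host);
        $class = $isYoutube ? 'youtube' : 'external';
        $safe = htmlspecialchars($link, ENT_QUOTES);
        $map[$link] = '<a class="' . $class . '" href="' . $safe . '">' . $safe . '</a>';
    }

    // Phase 3: swap all matches in one pass (longest keys are tried first).
    return strtr($text, $map);
}

echo fixLinksThreePhase("a http://www.youtube.com/watch?v=abc b https://example.com/x"), "\n";
```

Dumping $map just before phase 3 shows exactly which link will be replaced by what, which is the debugging advantage described above.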
tdammers' answer is good, but another option is to use preg_replace_callback. If you go with that, then the process changes a little:
Create a regular expression to match all links, same as his Phase 1
In the callback, search for the YouTube video id. This will require running a second preg_match, which is (in my opinion) the biggest problem with this technique.
Return the replacement string, based on whether or not it's YouTube.
The code would look something like this:
function replaceem($matches) {
$url = $matches[0];
preg_match('~youtube\.com.*v=([\w\-]{11})~', $url, $matches);
return isset($matches[0]) ?
'<a href="youtube.php?id='.$matches[1].'" class="fancy">'.
'http://www.youtube.com/watch?v='.$matches[1].'</a>' :
'<a href="'.$url.'" title="Åben link" alt="Åben link" '.
'target="_blank">'.$url.'</a>';
}
$text = preg_replace_callback('~(?:f|ht)tps?://[^\s]+~', 'replaceem', $text);

Regular Expression Help - Brackets within brackets

I'm trying to develop a function that can sort through a string that looks like this:
Donny went to the {park|store|{beach with friends|beach alone}} so he could get a breath of fresh air.
What I intend to do is search the text recursively for {} patterns that contain no { or } themselves, so that only the innermost text is selected. I will then use PHP to split the contents into an array and select one item at random, repeating the process until the whole string has been parsed and a complete sentence is left.
I just cannot wrap my head around regular expressions though.
Appreciate any help!
Don't know about maths theory behind this ;-/ but in practice that's quite easy. Try
$text = "Donny went to the {park|store|{beach with friends|beach alone}} so he could get a breath of fresh air. ";
function rnd($matches) {
$words = explode('|', $matches[1]);
return $words[rand() % count($words)];
}
do {
$text = preg_replace_callback('~{([^{}]+)}~', 'rnd', $text, -1, $count);
} while($count > 0);
echo $text;
Regular expressions in the classical sense are not capable of counting and therefore cannot find matching brackets reliably.
What you need is a grammar.
See this related question.
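That said, PHP's PCRE engine goes beyond classical regular expressions and supports recursive patterns via (?R), which is enough to match balanced braces. The sketch below assumes PHP >= 5.3 for the closure; spin_nested is a made-up name, and resolving the inner groups before splitting on | is a design choice of this example rather than the only option.

```php
<?php
$str = 'Donny went to the {park|store|{beach with friends|beach alone}} so he could get a breath of fresh air.';

// (?R) re-enters the whole pattern, so the outermost balanced {...} group is matched.
preg_match('/\{(?:[^{}]++|(?R))*\}/', $str, $m);
echo $m[0], "\n";   // prints: {park|store|{beach with friends|beach alone}}

// spin_nested is a hypothetical helper built on the same recursive pattern.
function spin_nested($text)
{
    return preg_replace_callback('/\{((?:[^{}]++|(?R))*)\}/', function ($m) {
        $inner = spin_nested($m[1]);         // resolve nested groups first
        $parts = explode('|', $inner);       // then split the flat options
        return $parts[array_rand($parts)];   // and pick one at random
    }, $text);
}

echo spin_nested($str), "\n";
```

Since nested groups are flattened into their parent's option list before the choice is made, the probabilities are not the same as picking level by level.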
$str="Donny went to the {park|store|{beach {with friends}|beach alone}} so he could get a breath of fresh air. ";
$s = explode("}",$str);
foreach($s as $v){
if(strpos($v,"{")!==FALSE){
$t=explode("{",$v);
print end($t)."\n";
}
}
output
$ php test.php
with friends
Regular expressions don't deal well with recursive stuff, but PHP does:
$str = 'Donny went to the {park|store|{beach with friends|beach alone}} so he could get a breath of fresh air.';
echo parse_string($str), "\n";
function parse_string($string) {
if ( preg_match('/\{([^{}]+)\}/', $string, $matches) ) {
$inner_elements = explode('|', $matches[1]);
$random_element = $inner_elements[array_rand($inner_elements)];
$string = str_replace($matches[0], $random_element, $string);
$string = parse_string($string);
}
return $string;
}
You could do this with a lexer/parser. I don't know of any options in PHP (but since there are XML parsers in PHP, there are no doubt generic parsers). On the other hand, what you're asking to do is not too complicated. Using strings in PHP (substring, etc.) you could probably do this in a few recursive functions.
You will then finally have created a MadLibz generator in PHP with a simple grammar. Pretty cool.
