I am trying to create a regex that takes this input:
> quote
the rest of it
> another paragraph
the rest of it
and outputs this:
quote
the rest of it
another paragraph
the rest of it
with a resulting HTML of
<blockquote>
<p>quote
the rest of it</p>
<p>another paragraph
the rest of it</p>
</blockquote>
This is what I have so far:
$text = preg_replace_callback('/^>(.*)(...)$/m',function($matches){
return '<blockquote>'.$matches[1].'</blockquote>';
},$text);
Any help or suggestion would be appreciated
Here is a possible solution for the given example.
$text = "> quote
the rest of it
> another paragraph
the rest of it";
preg_match_all('/^>([\w\s]+)/m', $text, $matches);
$out = $text;
// $matches always holds one sub-array per group, so test the captures themselves.
if (!empty($matches[1])) {
    $out = '<blockquote>';
    foreach ($matches[1] as $match) {
        $out .= '<p>'.trim($match).'</p>';
    }
    $out .= '</blockquote>';
}
echo $out;
Outputs :
<blockquote><p>quote
the rest of it</p><p>another paragraph
the rest of it</p></blockquote>
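Note that the character class [\w\s] stops at the first punctuation mark, so a quoted paragraph containing a comma or a full stop would be cut short. A minimal sketch of a variant that captures everything up to the next line starting with ">" is shown here. It assumes the input consists only of quoted paragraphs; the function name quotes_to_blockquote is hypothetical, and the HTML escaping is an addition that was not requested in the question.

```php
<?php
// Hypothetical helper: turns "> "-prefixed paragraphs into one <blockquote>.
function quotes_to_blockquote(string $text): string
{
    // m: ^ matches at every line start; s: "." also matches line breaks.
    // Each capture runs lazily up to the next line that starts with ">" or to the end.
    if (!preg_match_all('/^>\s?(.*?)(?=^>|\z)/ms', $text, $matches)) {
        return $text; // nothing quoted, leave the text unchanged
    }
    $out = '<blockquote>';
    foreach ($matches[1] as $paragraph) {
        // Escaping is an assumption: the question does not say whether input is trusted.
        $out .= '<p>' . htmlspecialchars(trim($paragraph)) . '</p>';
    }
    return $out . '</blockquote>';
}
```

Calling quotes_to_blockquote() on the sample text from the question should give the same markup as the loop above.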
Try this regex:
(?s)>((?!(\r?\n){2}).)*+
meaning:
(?s) # enable dot-all option
> # match the character '>'
( # start capture group 1
(?! # start negative look ahead
( # start capture group 2
\r? # match the character '\r' and match it once or none at all
\n # match the character '\n'
){2} # end capture group 2 and repeat it exactly 2 times
) # end negative look ahead
. # match any character
)*+ # end capture group 1 and repeat it zero or more times, possessively
The \r?\n matches Windows, *nix and (newer) macOS line breaks. If you need to account for really old Mac computers, add a lone \r as an alternative: \r?\n|\r
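As a rough illustration of how this pattern behaves in PHP, the sketch below returns the first ">" block, which ends at the first blank line. The function name first_quote_block is hypothetical, and the sample inputs are made up for the example.

```php
<?php
// Hypothetical helper: returns the first ">" block, up to the next blank line.
function first_quote_block(string $text): ?string
{
    // Single-quoted so that \r and \n reach the regex engine unchanged.
    $pattern = '/(?s)>((?!(\r?\n){2}).)*+/';
    return preg_match($pattern, $text, $m) ? $m[0] : null;
}
```

For the input "> quote\nthe rest of it\n\nplain text" this should return the two quoted lines without the trailing blank line.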
question: https://stackoverflow.com/a/2222331/9238511
Related
I want to do something like Stack Overflow does: convert the [text](url) style into an HTML link. Here is my attempt:
$str = '[link](#)';
$str = str_replace('[','<a href="',$str); // output: <a href="link](#)
$str = str_replace(']','">',$str); // output: <a href="link">(#)
$str = str_replace('(','',$str); // output: <a href="link">#)
$str = str_replace(')','</a>',$str); // output: <a href="link">#</a>
But now I need to swap link and #. How can I do that?
Take a look at preg_replace(); it lets you do the replacement with a regex, e.g.:
$str = preg_replace("/\[(.*?)\]\((.*?)\)/", "<a href='$2'>$1</a>", $str);
regex explanation:
\[(.*?)\]\((.*?)\)
\[ matches the character [ literally
1st Capturing group (.*?)
.*? matches any character (except newline)
Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
\] matches the character ] literally
\( matches the character ( literally
2nd Capturing group (.*?)
.*? matches any character (except newline)
Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
\) matches the character ) literally
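If the link text or URL can come from users, the captured parts probably should be escaped before they are put into the markup. Here is a minimal sketch using preg_replace_callback with the same regex; the function name markdown_links_to_html is hypothetical, and the escaping step is an assumption that goes beyond the original answer.

```php
<?php
// Hypothetical helper: converts [text](url) into an HTML anchor.
function markdown_links_to_html(string $str): string
{
    return preg_replace_callback(
        '/\[(.*?)\]\((.*?)\)/',
        function (array $m): string {
            // Escape both parts so quotes or angle brackets cannot break the markup.
            $text = htmlspecialchars($m[1], ENT_QUOTES);
            $url  = htmlspecialchars($m[2], ENT_QUOTES);
            return "<a href='{$url}'>{$text}</a>";
        },
        $str
    );
}
```

For the string from the question, markdown_links_to_html('[link](#)') should yield the same anchor as the preg_replace() call above.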
I split a string '3(1-5)' like this:
$pattern = '/^(\d+)\((\d+)\-(\d+)\)$/';
preg_match($pattern, $string, $matches);
But I need to do the same thing for decimals, i.e. '3.5(1.5-4.5)'.
And what do I have to do, if the user writes '3,5(1,5-4,5)'?
Output of '3.5(1.5-4.5)' should be:
$matches[1] = 3.5
$matches[2] = 1.5
$matches[3] = 4.5
You can use the following regular expression.
$pattern = '/^(\d+(?:[.,]\d+)?)\(((?1))-((?1))\)$/';
The first capturing group ( ... ) matches the following pattern:
( # group and capture to \1:
\d+ # digits (0-9) (1 or more times)
(?: # group, but do not capture (optional):
[.,] # any character of: '.', ','
\d+ # digits (0-9) (1 or more times)
)? # end of grouping
) # end of \1
Afterwards we look for an opening parenthesis, then recurse into (match/capture) the 1st subpattern, followed by a hyphen (-), then recurse into the 1st subpattern again, followed by a closing parenthesis.
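To show the pattern in use, here is a minimal sketch that also converts the three parts to floats, treating a decimal comma like a dot. The function name parse_range and the float conversion are assumptions added for illustration.

```php
<?php
// Hypothetical helper: returns [value, min, max] as floats, or null when the input does not match.
function parse_range(string $input): ?array
{
    $pattern = '/^(\d+(?:[.,]\d+)?)\(((?1))-((?1))\)$/';
    if (!preg_match($pattern, $input, $m)) {
        return null;
    }
    // Normalise a decimal comma to a dot before converting to float.
    return array_map(
        function (string $n): float {
            return (float) str_replace(',', '.', $n);
        },
        array_slice($m, 1) // drop the full match, keep groups 1 to 3
    );
}
```

Both '3.5(1.5-4.5)' and '3,5(1,5-4,5)' should come back as the same three floats, and the integer form '3(1-5)' still matches.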
This pattern should help (each number must have at least two digits, so a plain '3(1-5)' will not match):
^(\d+\.?\,?\d+)\((\d+\,?\.?\d+)\-(\d+\.?\,?\d+)\)$
I ran into a problem when trying to match all numbers found between specific words on my page. How would you match all the numbers in the following text, but only between the words "begin" and "end"?
11
a
b
13
begin
t
899
y
50
f
end
91
h
This works:
preg_match("/begin(.*?)end/s", $text, $out);
preg_match_all("/[0-9]{1,}/", $out[1], $result);
But can it be done in one expression?
I tried this, but it doesn't do the trick:
preg_match_all("/begin.*([0-9]{1,}).*end/s", $text, $out);
You can make use of the \G anchor like this, and some lookaheads to make sure that you're not going 'out of territory' (out of the area between the two words):
(?:begin|(?!^)\G)(?:(?=(?:(?!begin).)*end)\D)*?(\d+)
(?: # Begin of first non-capture group
begin # Match 'begin'
| # Or
(?!^)\G # Start the match from the previous end of match
) # End of first non-capture group
(?: # Second non-capture group
(?= # Positive lookahead
(?:(?!begin).)* # Negative lookahead to prevent running into another 'begin'
end # And make sure that there's an 'end' ahead
) # End positive lookahead
\D # Match non-digits
)*? # Second non-capture group repeated many times, lazily
(\d+) # Capture digits
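A minimal sketch of running this pattern from PHP follows. The wrapper name numbers_between_markers is hypothetical, and the s flag is an assumption added here so that "." inside the lookahead can cross the line breaks in the sample text.

```php
<?php
// Hypothetical wrapper around the \G pattern above.
function numbers_between_markers(string $text): array
{
    // The s flag lets "." cross line breaks inside the lookahead.
    $pattern = '~(?:begin|(?!^)\G)(?:(?=(?:(?!begin).)*end)\D)*?(\d+)~s';
    preg_match_all($pattern, $text, $m);
    return $m[1]; // the captured digit runs, in order of appearance
}
```

With the text from the question, the result should contain only the numbers between the two words.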
Ideal solution
What is really needed here is a positive lookbehind with variable width. The regex would end up like this:
~(?<=begin.*)\d+(?=.*end)~s
However, as of this writing, the PHP regex flavor doesn't support this feature; only fixed-width lookbehind is supported (the .NET flavor does support it, though).
Workaround
To achieve our goal, we can use preg_replace_callback with the following regex:
~(?<token>begin|end)|(?<number>\d+)|.*?~s
Sample code
function extract_number($input) {
    $in_region = false; // true while we are between "begin" and "end"
    // A closure (instead of a nested named function) lets extract_number() be called more than once.
    $matchNumbers = function ($match) use (&$in_region) {
        // Groups that did not take part in the match may be absent, so default to ''.
        switch ($match['token'] ?? '') {
            case 'begin':
                $in_region = true;
                break;
            case 'end':
                $in_region = false;
                break;
        }
        if ($in_region && isset($match['number'])) {
            return $match['number'].',';
        } else {
            return '';
        }
    };
    $ret = preg_replace_callback('~(?<token>begin|end)|(?<number>\d+)|.*?~s', $matchNumbers, $input);
    return array_filter(explode(',', $ret));
}
echo '<pre>';
var_dump(extract_number($str));
echo '</pre>';
Output (with OP's example)
array(2) {
[0]=>
string(3) "899"
[1]=>
string(2) "50"
}
Assuming your project data only has one begin and end "marker" in the text, you can build a more direct and efficient pattern...
Code:
$text = "11
a
b
13
begin
t
899
y
50
f
end
91
h";
var_export(preg_match_all('~(?:begin|\G(?!^))(?:(?!end)\D)+\K\d+~s', $text, $out) ? $out[0] : 'no matches');
Output:
array (
0 => '899',
1 => '50',
)
Layman's Breakdown:
(?:begin|\G(?!^)) #match "begin" or continue matching from the position immediately after previous match
(?:(?!end)\D)+ #match one or more non-digit characters while screening for "end". If "end" is found, immediately cease pattern execution.
\K #restart the fullstring match from this position; this avoids the expense of using a capture group on the desired digits
\d+ #match one or more digits (as much as possible)
See the Pattern Demo link for a more academic breakdown of the pattern.
I'm a little confused with preg_match and preg_replace. I have a very long content string (from a blog), and I want to find, separate and replace all [caption] tags. Possible tags can be:
[caption]test[/caption]
[caption align="center" caption="test" width="123"]<img src="...">[/caption]
[caption caption="test" align="center" width="123"]<img src="...">[/caption]
etc.
Here's the code I have (but I'm finding that it's not working the way I want it to...):
public function parse_captions($content) {
if(preg_match("/\[caption(.*) align=\"(.*)\" width=\"(.*)\" caption=\"(.*)\"\](.*)\[\/caption\]/", $content, $c)) {
$caption = $c[4];
$code = "<div>Test<p class='caption-text'>" . $caption . "</p></div>";
// Here, I'd like to ONLY replace what was found above (since there can be
// multiple instances)
$content = preg_replace("/\[caption(.*) width=\"(.*)\" caption=\"(.*)\"\](.*)\[\/caption\]/", $code, $content);
}
return $content;
}
The goal is to ignore the content position. You can try this:
$subject = <<<'LOD'
[caption]test1[/caption]
[caption align="center" caption="test2" width="123"][/caption]
[caption caption="test3" align="center" width="123"][/caption]
LOD;
$pattern = <<<'LOD'
~
\[caption # beginning of the tag
(?>[^]c]++|c(?!aption\b))* # followed by anything but c and ]
# or c not followed by "aption"
(?| # alternation group
caption="([^"]++)"[^]]*+] # the content is inside the opening tag
| # OR
]([^[]+) # outside
) # end of alternation group
\[/caption] # closing tag
~x
LOD;
$replacement = "<div>Test<p class='caption-text'>$1</p></div>";
echo htmlspecialchars(preg_replace($pattern, $replacement, $subject));
pattern (condensed version):
$pattern = '~\[caption(?>[^]c]++|c(?!aption\b))*(?|caption="([^"]++)"[^]]*+]|]([^[]++))\[/caption]~';
pattern explanation:
After the beginning of the tag you could have content before ] or the caption attribute. This content is described by:
(?> # atomic group
[^]c]++ # all characters that are not ] or c, 1 or more times
| # OR
c(?!aption\b) # c not followed by aption (to avoid the caption attribute)
)* # zero or more times
The branch-reset group (?| allows capture groups in different alternatives to share the same number:
(?|
# case: the target is in the caption attribute #
caption=" # (you can replace it by caption\s*+=\s*+")
([^"]++) # all that is not a " one or more times (capture group)
"
[^]]*+ # all that is not a ] zero or more times
| # OR
# case: the target is outside the opening tag #
] # square bracket close the opening tag
([^[]+) # all that is not a [ 1 or more times (capture group)
)
Both captures now have the same number, #1.
Note: if you are sure that no caption tag spans several lines, you can add the m modifier at the end of the pattern.
Note 2: all quantifiers are possessive and I use atomic groups where possible, for quick failures and better performance.
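The condensed pattern can also be used with a callback, which makes it easy to escape the captured caption before it goes into the markup. The sketch below is only an illustration: the function name replace_captions is hypothetical, the escaping is an added assumption, and, like the sample data above, it assumes that a tag carrying a caption attribute has an empty body.

```php
<?php
// Hypothetical helper built on the condensed pattern above.
function replace_captions(string $content): string
{
    $pattern = '~\[caption(?>[^]c]++|c(?!aption\b))*(?|caption="([^"]++)"[^]]*+]|]([^[]++))\[/caption]~';
    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            // $m[1] holds the caption whichever branch of (?| ... ) matched.
            return "<div>Test<p class='caption-text'>" . htmlspecialchars($m[1]) . "</p></div>";
        },
        $content
    );
}
```

Every [caption] tag in the input is replaced on its own, so several instances in one string are handled in a single call.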
Hint (and not an answer, per se)
Your best method of action would be:
Match everything after caption.
preg_match("#\[caption(.*?)\]#", $q, $match)
Use an explode function for extracting values in $match[1], if any.
explode(' ', trim($match[1]))
Check the values in array returned, and use in your code accordingly.
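The steps above can be sketched roughly as follows. One deviation, stated plainly: instead of explode(' ', ...), which would split an attribute value that contains spaces, this version pairs up name="value" items with preg_match_all. The function name caption_attributes is hypothetical.

```php
<?php
// Hypothetical helper: returns the attributes of the first [caption ...] tag as name => value.
function caption_attributes(string $content): array
{
    // Step 1: grab everything between "[caption" and the closing "]".
    if (!preg_match('#\[caption(.*?)\]#', $content, $match)) {
        return [];
    }
    // Step 2: pair up name="value" items; values may contain spaces.
    preg_match_all('/(\w+)="([^"]*)"/', $match[1], $pairs);
    // Step 3: hand back an associative array the caller can inspect.
    return array_combine($pairs[1], $pairs[2]);
}
```

A tag without attributes, such as [caption]test[/caption], gives an empty array.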
My search text is as follows.
...
...
var strings = ["aaa","bbb","ccc","ddd","eee"];
...
...
It contains many lines (it is actually a JavaScript file), but I only need to parse the values in the variable strings, i.e. aaa, bbb, ccc, ddd, eee.
The Perl code follows; a PHP version is at the bottom.
my $str = <<STR;
...
...
var strings = ["aaa","bbb","ccc","ddd","eee"];
...
...
STR
my @matches = $str =~ /(?:\"(.+?)\",?)/g;
print "@matches";
I know the above script will match all instances, but it will also pick up strings ("xyz") on the other lines. So I need to anchor on the string var strings =
/var strings = \[(?:\"(.+?)\",?)/g
The regex above captures aaa.
/var strings = \[(?:\"(.+?)\",?)(?:\"(.+?)\",?)/g
The one above captures aaa and bbb. To avoid repeating the group, I used the '+' quantifier as below.
/var strings = \[(?:\"(.+?)\",?)+/g
But I got only eee. So my question is: why did I get ONLY eee when I used the '+' quantifier?
Update 1: Using PHP preg_match_all (doing it to get more attention :-) )
$str = <<<STR
...
...
var strings = ["aaa","bbb","ccc","ddd","eee"];
...
...
STR;
preg_match_all("/var strings = \[(?:\"(.+?)\",?)+/",$str,$matches);
print_r($matches);
Update 2: Why did it match eee? Because of the greediness of (?:\"(.+?)\",?)+. With the greediness removed, /var strings = \[(?:\"(.+?)\",?)+?/, aaa is matched. But why only one result? Is there any way to achieve this with a single regex?
Here's a single-regex solution:
/(?:\bvar\s+strings\s*=\s*\[|\G,)\s*"([^"]*)"/g
\G is a zero-width assertion that matches the position where the previous match ended (or the beginning of the string if it's the first match attempt). So this acts like:
\bvar\s+strings\s*=\s*\[\s*"([^"]*)"
...on the first attempt, then:
,\s*"([^"]*)"
...after that, but each match has to start exactly where the last one left off.
Here's a demo in PHP, but it will work in Perl, too.
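A minimal PHP sketch of this approach, assuming the JavaScript source is already in a string; the function name js_string_array and the sample input in the usage note are made up for illustration.

```php
<?php
// Hypothetical helper: extracts the quoted items of "var strings = [...]" only.
function js_string_array(string $source): array
{
    // First match: anchored on "var strings = [". Later matches: must start with a
    // comma exactly where the previous match ended (\G).
    $pattern = '/(?:\bvar\s+strings\s*=\s*\[|\G,)\s*"([^"]*)"/';
    preg_match_all($pattern, $source, $m);
    return $m[1];
}
```

Quoted strings on other lines, for example in another array or a function call, are skipped because no match can start there.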
You may prefer this solution which first looks for the string var strings = [ using the /g modifier. This sets \G to match immediately after the [ for the next regex, which looks for all immediately following occurrences of double-quoted strings, possibly preceded by commas or whitespace.
my @matches;
if ($str =~ /var \s+ strings \s* = \s* \[ /gx) {
    @matches = $str =~ /\G [,\s]* "([^"]+)" /gx;
}
Despite using the /g modifier, your regex /var strings = \[(?:\"(.+?)\",?)+/g matches only once because there is no second occurrence of var strings = [. Each match returns a list of the values of the capture variables $1, $2, $3 etc. as they stood when the match completed, and /(?:"(.+?)",?)+/ (there is no need to escape the double quotes) captures multiple values into $1, leaving only the final value there. You need to write something like the above, which captures only a single value into $1 for each match.
Because the + repeats the whole group (?:"(.+?)",?) one or more times, and every repetition overwrites capture group 1. The group matches "aaa", then "bbb", and so on, so once the last repetition has matched, only "eee" is left in the capture.
use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new(qr/var strings = \[(?:"(.+?)",?)+/)->explain();
The regular expression:
(?-imsx:var strings = \[(?:"(.+?)",?)+)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  var strings =            'var strings = '
----------------------------------------------------------------------
  \[                       '['
----------------------------------------------------------------------
  (?:                      group, but do not capture (1 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    (                        group and capture to \1:
----------------------------------------------------------------------
      .+?                      any character except \n (1 or more
                               times (matching the least amount
                               possible))
----------------------------------------------------------------------
    )                        end of \1
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    ,?                       ',' (optional (matching the most amount
                             possible))
----------------------------------------------------------------------
  )+                       end of grouping
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
A simpler example would be:
my @m = ('abcd' =~ m/(\w)+/g);
print "@m";
This prints only d, because:
use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new(qr/(\w)+/)->explain();
The regular expression:
(?-imsx:(\w)+)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1 (1 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    \w                       word characters (a-z, A-Z, 0-9, _)
----------------------------------------------------------------------
  )+                       end of \1 (NOTE: because you are using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \1)
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
If you use the quantifier on the capture group, only the last instance will be used.
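The same behaviour can be observed with PHP's preg functions. The short sketch below contrasts a quantified capture group with repeating the whole match; the variable names are arbitrary.

```php
<?php
// A quantified capture group keeps only its last repetition.
preg_match('/(\w)+/', 'abcd', $single);
// $single[0] is the whole match, $single[1] the last repetition only.

// Repeating the whole match instead yields one capture per match.
preg_match_all('/(\w)/', 'abcd', $all);
// $all[1] lists every captured character in order.
```

This is why the working solutions collect one value per match rather than putting the quantifier on the group.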
Here's a way that works:
my $str = <<STR;
...
...
var strings = ["aaa","bbb","ccc","ddd","eee"];
...
...
STR
my @matches;
$str =~ m/var strings = \[(.+?)\]/; # get the array first
my $jsarray = $1;
@matches = $jsarray =~ m/"(.+?)"/g; # and get the strings from that
print "@matches";
Update:
A single-line solution (though not a single regex) would be:
@matches = ($str =~ m/var strings = \[(.+?)\]/)[0] =~ m/"(.+?)"/g;
But this is highly unreadable imho.