Select a block of YAML with regex


I have a big YAML file and I want to select an entire node using regex. For instance:
Node1:
  Child:
    GrandChild: foo
Node2:
  AnotherChild:
    AnotherGrandChild: bar
Node3:
  LastChild:
    LastGrandChild: foo
How can I use regex to select all of Node2 in the above example, and return:
Node2:
  AnotherChild:
    AnotherGrandChild: bar

Since everything else in that node is indented (if I understand YAML correctly), this works, at least on your example string:
$mask = '~(^%s:\n(?:^[ ].*\n?)*$)~m';
$pattern = sprintf($mask, 'Node2');
$r = preg_match($pattern, $yaml, $matches);
$node = reset($matches);
It works on my computer, at least. I wanted to make a codepad demo, but it gives an error, so I will check the regex.
The full version, with comments:
$yaml = <<<EOD
Node1:
  Child:
    GrandChild: foo
Node2:
  AnotherChild:
    AnotherGrandChild: bar
Node3:
  LastChild:
    LastGrandChild: foo
EOD;
$mask = '~
( # start capturing group
  ^ # a node always starts at the beginning of a line
  %s: # sprintf placeholder for the node name, followed by a colon
  $ # end of line after the node name
  \n
  (?: # non-capturing group to hold all subsequent, indented lines
    ^ # beginning of a subline
    (?:[ ]{2})+ # indentation is required, always a multiple of two spaces (non-capturing group)
    .*\n? # match anything else on that line, and optionally the newline character
  )* # 0 or more subsequent, indented lines
)$ # this ends a line, so the newline of the last indented line is not included (see \n? above)
# the following are modifiers:
# m - PCRE multiline modifier (in PHP same as in Perl)
# x - allows whitespace and comments inside the pattern
~mx
';
$pattern = sprintf($mask, 'Node2');
$r = preg_match($pattern, $yaml, $matches);
$node = reset($matches);
var_dump($node);
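A minimal sketch of how the short pattern above could be wrapped for arbitrary node names. The helper name selectYamlNode is an assumption made for illustration, not something from the answer, and preg_quote() is added so that a node name containing regex metacharacters is matched literally. It assumes a top-level node whose children are indented with spaces.

```php
<?php
// Hypothetical helper (name and signature are assumptions for illustration).
function selectYamlNode(string $yaml, string $name): ?string
{
    // Same pattern as the short version; preg_quote() escapes regex
    // metacharacters and the "~" delimiter in the node name.
    $mask = '~(^%s:\n(?:^[ ].*\n?)*$)~m';
    $pattern = sprintf($mask, preg_quote($name, '~'));

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match($pattern, $yaml, $matches) === 1) {
        return $matches[0];
    }
    return null;
}

$yaml = "Node1:\n  Child:\n    GrandChild: foo\n"
      . "Node2:\n  AnotherChild:\n    AnotherGrandChild: bar\n"
      . "Node3:\n  LastChild:\n    LastGrandChild: foo";

echo selectYamlNode($yaml, 'Node2'), "\n";
```

A name that does not occur as a top-level node yields null rather than an empty string, which makes the "not found" case easy to check.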

You probably want to use a library like php-yaml.

Related

php preg_replace_callback blockquote regex

I am trying to create a regex that takes this input:
> quote
the rest of it
> another paragraph
the rest of it
And OUTPUT
quote
the rest of it
another paragraph
the rest of it
with a resulting HTML of
<blockquote>
<p>quote
the rest of it</p>
<p>another paragraph
the rest of it</p>
</blockquote>
This is what I have below
$text = preg_replace_callback('/^>(.*)(...)$/m',function($matches){
return '<blockquote>'.$matches[1].'</blockquote>';
},$text);
Any help or suggestion would be appreciated

Here is a possible solution for the given example.
$text = "> quote
the rest of it
> another paragraph
the rest of it";
preg_match_all('/^>([\w\s]+)/m', $text, $matches);
$out = $text ;
if (!empty($matches[1])) { // $matches is always a non-empty array, so check the captured group
$out = '<blockquote>';
foreach ($matches[1] as $match) {
$out .= '<p>'.trim($match).'</p>';
}
$out .= '</blockquote>';
}
echo $out ;
Outputs :
<blockquote><p>quote
the rest of it</p><p>another paragraph
the rest of it</p></blockquote>

Try this regex:
(?s)>((?!(\r?\n){2}).)*+
meaning:
(?s) # enable dot-all option
> # match the character '>'
( # start capture group 1
(?! # start negative look ahead
( # start capture group 2
\r? # match the character '\r' and match it once or none at all
\n # match the character '\n'
){2} # end capture group 2 and repeat it exactly 2 times
) # end negative look ahead
. # match any character
)*+ # end capture group 1 and repeat it zero or more times, possessively
The \r?\n matches Windows, *nix and (newer) macOS line breaks. If you need to account for really old Mac computers, add a lone \r as an alternative: \r?\n|\r
question: https://stackoverflow.com/a/2222331/9238511
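A rough sketch of how that pattern might be applied to build the requested HTML. The function name quoteParagraphs is hypothetical, the inner groups are made non-capturing, and ^ with the m modifier is added so that only a ">" at the start of a line begins a paragraph. It assumes quoted paragraphs are separated by a blank line and simply discards any text outside them.

```php
<?php
// Hypothetical helper: wraps each ">"-prefixed paragraph in <p> and puts
// all of them into a single <blockquote>.
function quoteParagraphs(string $text): string
{
    // ">" at line start, then any characters up to (not including)
    // two consecutive line breaks. "s" lets "." match newlines.
    $pattern = '/^>((?:(?!(?:\r?\n){2}).)*+)/ms';

    if (!preg_match_all($pattern, $text, $matches)) {
        return $text; // nothing to quote
    }

    $out = '<blockquote>';
    foreach ($matches[1] as $paragraph) {
        $out .= '<p>' . trim($paragraph) . '</p>';
    }
    return $out . '</blockquote>';
}

$text = "> quote\nthe rest of it\n\n> another paragraph\nthe rest of it";
echo quoteParagraphs($text), "\n";
```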

PHP: How do I linkify all links inside a given text?

I am using the tool https://github.com/jmrware/LinkifyURL to detect URLs in a text unit. Unfortunately, it only recognizes one URL in the whole text. For example, if the text ought to be:
http://www.guiageo-americas.com/imagens/imagem-america-do-sul.jpg I
really think this should be working
http://www.youtube.com/watch?v=Cy8duEIHEig more text and some writing
here and there
what appears is:
http://www.guiageo-americas.com/imagens/imagem-america-do-sul.jpg I
really think this should be working
http://www.youtube.com/watch?v=Cy8duEIHEig more text and some writing
here and there
and what I want is:
http://www.guiageo-americas.com/imagens/imagem-america-do-sul.jpg I
really think this should be working
http://www.youtube.com/watch?v=Cy8duEIHEig more text and some writing
here and there
Any idea on why? Of course, I'll leave the PHP code here:
function linkify($text) {
/* $text being "http://www.guiageo-americas.com/imagens/imagem-america-do-sul.jpg I really think this should be working http://www.youtube.com/watch?v=Cy8duEIHEig more text and some writing here and there" */
$url_pattern = '/# Rev:20100913_0900 github.com\/jmrware\/LinkifyURL
# Match http & ftp URL that is not already linkified.
# Alternative 1: URL delimited by (parentheses).
(\() # $1 "(" start delimiter.
((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]#%]+) # $2: URL.
(\)) # $3: ")" end delimiter.
| # Alternative 2: URL delimited by [square brackets].
(\[) # $4: "[" start delimiter.
((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]#%]+) # $5: URL.
(\]) # $6: "]" end delimiter.
| # Alternative 3: URL delimited by {curly braces}.
(\{) # $7: "{" start delimiter.
((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]#%]+) # $8: URL.
(\}) # $9: "}" end delimiter.
| # Alternative 4: URL delimited by <angle brackets>.
(<|&(?:lt|\#60|\#x3c);) # $10: "<" start delimiter (or HTML entity).
((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]#%]+) # $11: URL.
(>|&(?:gt|\#62|\#x3e);) # $12: ">" end delimiter (or HTML entity).
| # Alternative 5: URL not delimited by (), [], {} or <>.
( # $13: Prefix proving URL not already linked.
(?: ^ # Can be a beginning of line or string, or
| [^=\s\'"\]] # a non-"=", non-quote, non-"]", followed by
) \s*[\'"]? # optional whitespace and optional quote;
| [^=\s]\s+ # or... a non-equals sign followed by whitespace.
) # End $13. Non-prelinkified-proof prefix.
( \b # $14: Other non-delimited URL.
(?:ht|f)tps?:\/\/ # Required literal http, https, ftp or ftps prefix.
[a-z0-9\-._~!$\'()*+,;=:\/?#[\]#%]+ # All URI chars except "&" (normal*).
(?: # Either on a "&" or at the end of URI.
(?! # Allow a "&" char only if not start of an...
&(?:gt|\#0*62|\#x0*3e); # HTML ">" entity, or
| &(?:amp|apos|quot|\#0*3[49]|\#x0*2[27]); # a [&\'"] entity if
[.!&\',:?;]? # followed by optional punctuation then
(?:[^a-z0-9\-._~!$&\'()*+,;=:\/?#[\]#%]|$) # a non-URI char or EOS.
) & # If neg-assertion true, match "&" (special).
[a-z0-9\-._~!$\'()*+,;=:\/?#[\]#%]* # More non-& URI chars (normal*).
)* # Unroll-the-loop (special normal*)*.
[a-z0-9\-_~$()*+=\/#[\]#%] # Last char can\'t be [.!&\',;:?]
) # End $14. Other non-delimited URL.
/imx';
//below goes my code
$url_replace = '$1$4$7$10$13<a style="color:blue;" onclick="toogleIframe(this)">$2$5$8$11$14</a>$3$6$9$12';
//echo preg_replace($url_pattern, $url_replace, $text);
return preg_replace($url_pattern, $url_replace, $text);
}

That's the kind of thing best left to a 3rd party library (which you're doing, so kudos). I'd recommend trying another one before you roll your own. purl is an excellent alternative.

You can use the following to replace all matches of your regex (though I wouldn't count on its performance):
while (preg_match($pattern, $string)) {
$string = preg_replace($pattern, $replacement, $string);
}
So, your function will become:
function linkify($text) {
    $url_pattern = "<your-pattern-string>";
    $url_replace = "<your-replacement-string>";
    // This terminates only if the replacement output no longer matches the pattern.
    while (preg_match($url_pattern, $text)) {
        $text = preg_replace($url_pattern, $url_replace, $text);
    }
    return $text;
}
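For reference, preg_replace() on its own already replaces every non-overlapping match unless a limit is passed, so a loop is only needed when matches overlap. The sketch below illustrates this with a deliberately simplified URL pattern; that pattern and the replacement markup are assumptions for the example and are far less thorough than the LinkifyURL pattern.

```php
<?php
// Simplified, hypothetical URL pattern (not the LinkifyURL pattern).
$pattern = '~\bhttps?://[^\s<>"]+~i';

$text = 'http://www.guiageo-americas.com/imagens/imagem-america-do-sul.jpg I really think '
      . 'this should be working http://www.youtube.com/watch?v=Cy8duEIHEig more text';

// Limit -1 means "no limit"; the fifth argument receives the number
// of replacements that were performed.
$linked = preg_replace($pattern, '<a href="$0">$0</a>', $text, -1, $count);

echo $count, "\n";
echo $linked, "\n";
```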

PHP - preg_match/preg_replace problems

I'm a little confused with preg_match and preg_replace. I have a very long content string (from a blog), and I want to find, separate and replace all [caption] tags. Possible tags can be:
[caption]test[/caption]
[caption align="center" caption="test" width="123"]<img src="...">[/caption]
[caption caption="test" align="center" width="123"]<img src="...">[/caption]
etc.
Here's the code I have (but I'm finding that it's not working the way I want it to...):
public function parse_captions($content) {
if(preg_match("/\[caption(.*) align=\"(.*)\" width=\"(.*)\" caption=\"(.*)\"\](.*)\[\/caption\]/", $content, $c)) {
$caption = $c[4];
$code = "<div>Test<p class='caption-text'>" . $caption . "</p></div>";
// Here, I'd like to ONLY replace what was found above (since there can be
// multiple instances)
$content = preg_replace("/\[caption(.*) width=\"(.*)\" caption=\"(.*)\"\](.*)\[\/caption\]/", $code, $content);
}
return $content;
}

The goal is to ignore the content position. You can try this:
$subject = <<<'LOD'
[caption]test1[/caption]
[caption align="center" caption="test2" width="123"][/caption]
[caption caption="test3" align="center" width="123"][/caption]
LOD;
$pattern = <<<'LOD'
~
\[caption # beginning of the tag
(?>[^]c]++|c(?!aption\b))* # followed by anything but c and ]
# or c not followed by "aption"
(?| # alternation group
caption="([^"]++)"[^]]*+] # the content is inside the beginning tag
| # OR
]([^[]+) # outside
) # end of alternation group
\[/caption] # closing tag
~x
LOD;
$replacement = "<div>Test<p class='caption-text'>$1</p></div>";
echo htmlspecialchars(preg_replace($pattern, $replacement, $subject));
pattern (condensed version):
$pattern = '~\[caption(?>[^]c]++|c(?!aption\b))*(?|caption="([^"]++)"[^]]*+]|]([^[]++))\[/caption]~';
pattern explanation:
After the beginning of the tag you can have either other content before the ] or the caption attribute. This content is described with:
(?> # atomic group
[^]c]++ # all characters that are not ] or c, 1 or more times
| # OR
c(?!aption\b) # c not followed by aption (to avoid the caption attribute)
)* # zero or more times
The branch reset group (?| allows capture groups in different alternatives to share the same number:
(?|
# case: the target is in the caption attribute #
caption=" # (you can replace it by caption\s*+=\s*+")
([^"]++) # all that is not a " one or more times (capture group)
"
[^]]*+ # all that is not a ] zero or more times
| # OR
# case: the target is outside the opening tag #
] # square bracket close the opening tag
([^[]+) # all that is not a [ 1 or more times (capture group)
)
The two captures now have the same number, #1.
Note: if you are sure that no caption tag spans several lines, you can add the m modifier at the end of the pattern.
Note2: the quantifiers are possessive and I use atomic groups where possible, for quick failures and better performance.

Hint (and not an answer, per se)
Your best course of action would be:
Match everything after caption.
preg_match("#\[caption(.*?)\]#", $q, $match)
Use explode to extract the values in $match[1], if any.
explode(' ', trim($match[1]))
Check the values in the returned array, and use them in your code accordingly.
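The steps above can be sketched roughly as follows. The function name captionAttributes is hypothetical, and the sketch swaps explode(' ', ...) for a second preg_match_all() over name="value" pairs, on the assumption that attribute values are double-quoted and may contain spaces.

```php
<?php
// Hypothetical helper, not part of the original hint.
function captionAttributes(string $tag): array
{
    $attributes = [];

    // Step 1: grab everything between "[caption" and the first "]".
    if (preg_match('#\[caption(.*?)\]#', $tag, $match)) {
        // Step 2: collect name="value" pairs. This is used instead of
        // explode(' ', ...) so that values containing spaces survive.
        preg_match_all('#(\w+)="([^"]*)"#', $match[1], $pairs, PREG_SET_ORDER);

        // Step 3: index the values by attribute name.
        foreach ($pairs as $pair) {
            $attributes[$pair[1]] = $pair[2];
        }
    }
    return $attributes;
}

print_r(captionAttributes('[caption align="center" caption="test" width="123"]'));
```

Because the result is keyed by attribute name, the order in which align, caption and width appear in the tag no longer matters.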

Looping within a regular expression

Can a regex find a pattern for this?
{{foo.bar1.bar2.bar3}}
where the groups would be
$1 = foo $2 = bar1 $3 = bar2 $4 = bar3 and so on.
It would be like redoing the expression over and over again until it fails to get a match.
The expression I am currently working on is
(?:\{{2})([\w]+).([\w]+)(?:\}{2})
Here's a link from regexr.
http://regexr.com?3203h
--
OK, I guess I didn't explain well what I'm trying to achieve here.
Let's say I am trying to replace every
.barX inside a {{foo . . . }}
My expected result would be
$foo->bar1->bar2->bar3

This should work, assuming no braces are allowed within the match:
preg_match_all(
'%(?<= # Assert that the previous character(s) are either
\{\{ # {{
| # or
\. # .
) # End of lookbehind
[^{}.]* # Match any number of characters besides braces/dots.
(?= # Assert that the following regex can be matched here:
(?: # Try to match
\. # a dot, followed by
[^{}]* # any number of characters except braces
)? # optionally
\}\} # Match }}
) # End of lookahead%x',
$subject, $result, PREG_PATTERN_ORDER);
$result = $result[0];

I'm not a PHP person, but I managed to construct this piece of code here:
preg_match_all("([a-z0-9]+)",
"{{foo.bar1.bar2.bar3}}",
$out, PREG_PATTERN_ORDER);
foreach($out[0] as $val)
{
echo($val);
echo("<br>");
}
The code above prints the following:
foo
bar1
bar2
bar3
It should allow you to exhaustively search a given string by using a simple regular expression. I think that you should also be able to get what you want by removing the braces and splitting the string.

I don't think so, but it's relatively painless to just split the string on periods like so:
$str = "{{foo.bar1.bar2.bar3}}";
$str = str_replace(array("{","}"), "", $str);
$values = explode(".", $str);
print_r($values); // Yields an array with values foo, bar1, bar2, and bar3
EDIT: In response to your question edit, you could replace all barX in a string by doing the following:
$str = "{{foo.bar1.bar2.bar3}}";
$newStr = preg_replace("#bar\d#", "hi", $str);
echo $newStr; // outputs "{{foo.hi.hi.hi}}"

I don't know the correct syntax in PHP, for pulling out the results, but you could do:
\{{2}(\w+)(?:\.(\w+))*\}{2}
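That captures the first hit in the first capturing group. Note that in PHP (PCRE) a repeated capturing group only keeps its last repetition, so the second group would hold bar3 rather than all of the remaining parts. regexr.com is lacking the ability to show the individual repetitions as far as I can see. Try out Expresso, and you'll see what I mean.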
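For the output the question asks for, one possible approach is to match the whole {{...}} expression once and rewrite the dots inside a callback. This is only a sketch: the function name toPropertyChain is hypothetical, and it assumes each segment consists of word characters only.

```php
<?php
// Hypothetical helper: turns {{foo.bar1.bar2}} into $foo->bar1->bar2.
function toPropertyChain(string $template): string
{
    return preg_replace_callback(
        // One capture for the whole dotted path between the braces.
        '/\{\{(\w+(?:\.\w+)*)\}\}/',
        function (array $m): string {
            // Replace every "." with "->" and prefix a literal "$".
            return '$' . str_replace('.', '->', $m[1]);
        },
        $template
    );
}

echo toPropertyChain('{{foo.bar1.bar2.bar3}}'), "\n";
```

Doing the per-segment work in the callback avoids needing one capture group per segment, so the path can be of any length.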

PHP: Get last Tag of a String with Regular Expressions

A fairly simple problem (with a difficult solution): I have a string in PHP like the following:
['one']['two']['three']
From this I must extract the last tag, so that I end up with three.
The tags may also be numbers, like
[1][2][3]
and then I must get 3.
How can I solve this?
Thanks for your help!
Flo

Your tag is \[[^\]]+\].
3 Tags are: (\[[^\]]+\]){3}
3 Tags at end are: (\[[^\]]+\]){3}$
N Tags at end are: (\[[^\]]+\])*$ (N is 0 or more)
Example:
<?php
$string = "['one']['two']['three'][1][2][3]['last']";
preg_match("/((?:\[[^\]]+\]){3})$/", $string, $match);
print_r($match); // Array ( [0] => [2][3]['last'] [1] => [2][3]['last'] )

This tested code may work for you:
function getLastTag($text) {
$re = '/
# Match contents of last [Tag].
\[ # Literal start of last tag.
(?: # Group tag contents alternatives.
\'([^\']+)\' # Either $1: single quoted,
| (\d+) # or $2: un-quoted digits.
) # End group of tag contents alts.
\] # Literal end of last tag.
\s* # Allow trailing whitespace.
$ # Anchor to end of string.
/x';
if (preg_match($re, $text, $matches)) {
if ($matches[1] !== '') return $matches[1]; // Either single quoted,
return $matches[2]; // or non quoted digits (a bare "0" included).
}
return null; // No match. Return NULL.
}

Here is a regex that may work for you. Try this:
[^\[\]']*(?='?\]$)
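A small sketch of how that pattern might be used from PHP. The function name lastTag is an assumption made for the example, and an empty last tag is treated as "no match" here.

```php
<?php
// Hypothetical wrapper around the pattern above.
function lastTag(string $text): ?string
{
    // Characters that are not brackets or quotes, followed by an
    // optional quote and the final "]" at the end of the string.
    $pattern = '/[^\[\]\']*(?=\'?\]$)/';

    if (preg_match($pattern, $text, $match) === 1 && $match[0] !== '') {
        return $match[0];
    }
    return null;
}

echo lastTag("['one']['two']['three']"), "\n";
echo lastTag('[1][2][3]'), "\n";
```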
