I have a big YAML file and I want to select an entire node using regex. For instance:
Node1:
  Child:
    GrandChild: foo
Node2:
  AnotherChild:
    AnotherGrandChild: bar
Node3:
  LastChild:
    LastGrandChild: foo
How can I use regex to select all of Node2 in the above example, and return:
Node2:
  AnotherChild:
    AnotherGrandChild: bar
As everything else in that node is indented (if I understand YAML correctly), this works, at least on your example string:
$mask = '~(^%s:\n(?:^[ ].*\n?)*$)~m';
$pattern = sprintf($mask, 'Node2');
$r = preg_match($pattern, $yaml, $matches);
$node = reset($matches);
At least it does on my computer. I wanted to make a codepad demo, but it gives an error, so I will check the regex.
Full blown:
$yaml = <<<EOD
Node1:
  Child:
    GrandChild: foo
Node2:
  AnotherChild:
    AnotherGrandChild: bar
Node3:
  LastChild:
    LastGrandChild: foo
EOD;
$mask = '~
  (                # start capturing group
    ^              # a node always starts at the beginning of a line
    %s:            # sprintf placeholder for the node name, followed by a colon
    $              # end of line for the node name
    \n
    (?:            # non-capturing group to hold all subsequent, indented lines
      ^            # beginning of a subline
      (?:[ ]{2})+  # indentation is required, always a multiple of two spaces (non-capturing group)
      .*\n?        # match anything else on that subsequent line, optionally the newline character
    )*             # 0 or more subsequent, indented lines
  )$               # this ends a line, so the newline of the last subsequent line is not taken over (see \n? above)
  # the following are modifiers:
  #   m - PCRE multiline modifier (in PHP same as in Perl)
  #   x - to allow spaces and the comments all over here ;)
~mx
';
$pattern = sprintf($mask, 'Node2');
$r = preg_match($pattern, $yaml, $matches);
$node = reset($matches);
var_dump($node);
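The same idea can be wrapped in a small reusable function. This is only a sketch under a few assumptions: the helper name extractYamlNode is made up for illustration, the node is a top-level key with nothing after the colon, and its children are indented with spaces. It uses a slightly simplified variant of the pattern above and runs the node name through preg_quote() so that special characters in the name cannot break the regex.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the text of
// one top-level node, or null when no such node is found.
function extractYamlNode(string $yaml, string $name): ?string
{
    // preg_quote() escapes regex metacharacters in the node name,
    // including the "~" delimiter used here.
    $pattern = sprintf('~^%s:[ ]*\n(?:[ ]+.*(?:\n|$))*~m', preg_quote($name, '~'));
    if (preg_match($pattern, $yaml, $matches) !== 1) {
        return null;
    }
    // Drop the trailing newline of the last indented line.
    return rtrim($matches[0], "\n");
}
```

A blank line inside the node would end the match early, which is one more reason to prefer a real YAML parser for anything beyond simple files.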
You probably want to use a library like php-yaml.
Related
I am trying to create a regex that will take this input:
> quote
the rest of it
> another paragraph
the rest of it
and output
quote
the rest of it
another paragraph
the rest of it
with a resulting HTML of
<blockquote>
<p>quote
the rest of it</p>
<p>another paragraph
the rest of it</p>
</blockquote>
This is what I have below
$text = preg_replace_callback('/^>(.*)(...)$/m',function($matches){
return '<blockquote>'.$matches[1].'</blockquote>';
},$text);
Any help or suggestion would be appreciated
Here is a possible solution for the given example.
$text = "> quote
the rest of it
> another paragraph
the rest of it";
preg_match_all('/^>([\w\s]+)/m', $text, $matches);
$out = $text;
// $matches itself is never empty after preg_match_all(), so check the captures.
if (!empty($matches[1])) {
    $out = '<blockquote>';
    foreach ($matches[1] as $match) {
        $out .= '<p>'.trim($match).'</p>';
    }
    $out .= '</blockquote>';
}
echo $out;
Outputs:
<blockquote><p>quote
the rest of it</p><p>another paragraph
the rest of it</p></blockquote>
Try this regex:
(?s)>((?!(\r?\n){2}).)*+
meaning:
(?s) # enable dot-all option
> # match the character '>'
( # start capture group 1
(?! # start negative look ahead
( # start capture group 2
\r? # match the character '\r' and match it once or none at all
\n # match the character '\n'
){2} # end capture group 2 and repeat it exactly 2 times
) # end negative look ahead
. # match any character
)*+ # end capture group 1 and repeat it zero or more times, possessively
The \r?\n matches Windows, *nix and (newer) macOS line breaks. If you need to account for really old Mac computers, add the single \r to it: \r?\n|\r
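As a rough sketch of how this pattern could be used in PHP: the function name quoteToHtml is invented for illustration, the dot-all option is written as the s modifier, and the input is assumed to separate paragraphs with a blank line (the pattern stops at two consecutive line breaks).

```php
<?php
// Hypothetical helper: wraps each "> ..." paragraph in <p> tags inside a
// single <blockquote>. Assumes paragraphs are separated by a blank line.
function quoteToHtml(string $text): string
{
    // Same pattern as above, with (?s) expressed as the "s" modifier.
    if (!preg_match_all('/>((?!(\r?\n){2}).)*+/s', $text, $matches)) {
        return $text; // nothing to convert
    }
    $html = '<blockquote>';
    foreach ($matches[0] as $paragraph) {
        // Drop the leading ">" marker and the surrounding whitespace.
        $html .= '<p>' . trim(substr($paragraph, 1)) . '</p>';
    }
    return $html . '</blockquote>';
}
```

Text without any ">" is returned unchanged.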
question: https://stackoverflow.com/a/2222331/9238511
I am using the tool https://github.com/jmrware/LinkifyURL to detect URLs in a text unit. Unfortunately, it only recognizes one URL in the whole text. For example, if the text ought to be:
http://www.guiageo-americas.com/imagens/imagem-america-do-sul.jpg I
really think this should be working
http://www.youtube.com/watch?v=Cy8duEIHEig more text and some writing
here and there
what appears is:
http://www.guiageo-americas.com/imagens/imagem-america-do-sul.jpg I
really think this should be working
http://www.youtube.com/watch?v=Cy8duEIHEig more text and some writing
here and there
and what I want is:
http://www.guiageo-americas.com/imagens/imagem-america-do-sul.jpg I
really think this should be working
http://www.youtube.com/watch?v=Cy8duEIHEig more text and some writing
here and there
Any idea on why? Of course, I'll leave the PHP code here:
function linkify($text) {
/* $text being "http://www.guiageo-americas.com/imagens/imagem-america-do-sul.jpg I really think this should be working http://www.youtube.com/watch?v=Cy8duEIHEig more text and some writing here and there" */
$url_pattern = '/# Rev:20100913_0900 github.com\/jmrware\/LinkifyURL
# Match http & ftp URL that is not already linkified.
# Alternative 1: URL delimited by (parentheses).
(\() # $1 "(" start delimiter.
((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]#%]+) # $2: URL.
(\)) # $3: ")" end delimiter.
| # Alternative 2: URL delimited by [square brackets].
(\[) # $4: "[" start delimiter.
((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]#%]+) # $5: URL.
(\]) # $6: "]" end delimiter.
| # Alternative 3: URL delimited by {curly braces}.
(\{) # $7: "{" start delimiter.
((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]#%]+) # $8: URL.
(\}) # $9: "}" end delimiter.
| # Alternative 4: URL delimited by <angle brackets>.
(<|&(?:lt|\#60|\#x3c);) # $10: "<" start delimiter (or HTML entity).
((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]#%]+) # $11: URL.
(>|&(?:gt|\#62|\#x3e);) # $12: ">" end delimiter (or HTML entity).
| # Alternative 5: URL not delimited by (), [], {} or <>.
( # $13: Prefix proving URL not already linked.
(?: ^ # Can be a beginning of line or string, or
| [^=\s\'"\]] # a non-"=", non-quote, non-"]", followed by
) \s*[\'"]? # optional whitespace and optional quote;
| [^=\s]\s+ # or... a non-equals sign followed by whitespace.
) # End $13. Non-prelinkified-proof prefix.
( \b # $14: Other non-delimited URL.
(?:ht|f)tps?:\/\/ # Required literal http, https, ftp or ftps prefix.
[a-z0-9\-._~!$\'()*+,;=:\/?#[\]#%]+ # All URI chars except "&" (normal*).
(?: # Either on a "&" or at the end of URI.
(?! # Allow a "&" char only if not start of an...
&(?:gt|\#0*62|\#x0*3e); # HTML ">" entity, or
| &(?:amp|apos|quot|\#0*3[49]|\#x0*2[27]); # a [&\'"] entity if
[.!&\',:?;]? # followed by optional punctuation then
(?:[^a-z0-9\-._~!$&\'()*+,;=:\/?#[\]#%]|$) # a non-URI char or EOS.
) & # If neg-assertion true, match "&" (special).
[a-z0-9\-._~!$\'()*+,;=:\/?#[\]#%]* # More non-& URI chars (normal*).
)* # Unroll-the-loop (special normal*)*.
[a-z0-9\-_~$()*+=\/#[\]#%] # Last char can\'t be [.!&\',;:?]
) # End $14. Other non-delimited URL.
/imx';
//below goes my code
$url_replace = '$1$4$7$10$13<a style="color:blue;" onclick="toogleIframe(this)">$2$5$8$11$14</a>$3$6$9$12';
//echo preg_replace($url_pattern, $url_replace, $text);
return preg_replace($url_pattern, $url_replace, $text);
}
That's the kind of thing best left to a 3rd party library (which you're doing, so kudos). I'd recommend trying another one before you roll your own. purl is an excellent alternative.
You can use the following to replace all matches of your regex (though I wouldn't count on its performance):
while (preg_match($pattern, $string)) {
$string = preg_replace($pattern, $replacement, $string);
}
So, your function will become:
function linkify($text) {
    $url_pattern = "<your-pattern-string>";
    $url_replace = "<your-replacement-string>";
    // Keep replacing for as long as the pattern still matches the text.
    while (preg_match($url_pattern, $text)) {
        $text = preg_replace($url_pattern, $url_replace, $text);
    }
    return $text;
}
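For comparison, preg_replace() has a default limit of -1, which means a single call already replaces every match it finds. The sketch below illustrates that with a deliberately simplified URL pattern. Both the pattern and the function name linkifySimple are assumptions made for the example, not the LinkifyURL pattern from the question, and the simplified pattern does not try to detect URLs that are already linked.

```php
<?php
// Simplified, hypothetical example: NOT the LinkifyURL pattern.
function linkifySimple(string $text): string
{
    // No explicit limit is passed, so every URL in $text is replaced.
    // $0 in the replacement refers to the whole match.
    return preg_replace('~\bhttps?://[^\s<>"]+~i', '<a href="$0">$0</a>', $text);
}
```

If a single call only replaces one URL, the cause is more likely in the pattern (for example its prefix check) than in preg_replace() itself.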
I'm a little confused with preg_match and preg_replace. I have a very long content string (from a blog), and I want to find, separate and replace all [caption] tags. Possible tags can be:
[caption]test[/caption]
[caption align="center" caption="test" width="123"]<img src="...">[/caption]
[caption caption="test" align="center" width="123"]<img src="...">[/caption]
etc.
Here's the code I have (but I'm finding that it's not working the way I want it to...):
public function parse_captions($content) {
if(preg_match("/\[caption(.*) align=\"(.*)\" width=\"(.*)\" caption=\"(.*)\"\](.*)\[\/caption\]/", $content, $c)) {
$caption = $c[4];
$code = "<div>Test<p class='caption-text'>" . $caption . "</p></div>";
// Here, I'd like to ONLY replace what was found above (since there can be
// multiple instances
$content = preg_replace("/\[caption(.*) width=\"(.*)\" caption=\"(.*)\"\](.*)\[\/caption\]/", $code, $content);
}
return $content;
}
The goal is to ignore the content position. You can try this:
$subject = <<<'LOD'
[caption]test1[/caption]
[caption align="center" caption="test2" width="123"][/caption]
[caption caption="test3" align="center" width="123"][/caption]
LOD;
$pattern = <<<'LOD'
~
\[caption                    # beginning of the tag
(?>[^]c]++|c(?!aption\b))*   # followed by anything but c and ]
                             # or c not followed by "aption"
(?|                          # alternation group
    caption="([^"]++)"[^]]*+]  # the content is inside the beginning tag
  |                          # OR
    ]([^[]+)                 # outside
)                            # end of alternation group
\[/caption]                  # closing tag
~x
LOD;
$replacement = "<div>Test<p class='caption-text'>$1</p></div>";
echo htmlspecialchars(preg_replace($pattern, $replacement, $subject));
pattern (condensed version):
$pattern = '~\[caption(?>[^]c]++|c(?!aption\b))*(?|caption="([^"]++)"[^]]*+]|]([^[]++))\[/caption]~';
pattern explanation:
After the beginning of the tag you could have content before ] or the caption attribute. This content is described by:
(?> # atomic group
[^]c]++ # all characters that are not ] or c, 1 or more times
| # OR
c(?!aption\b) # c not followed by aption (to avoid the caption attribute)
)* # zero or more times
The alternation group (?| (a branch reset group) lets the capture groups in each alternative share the same number:
(?|
# case: the target is in the caption attribute #
caption=" # (you can replace it by caption\s*+=\s*+")
([^"]++) # all that is not a " one or more times (capture group)
"
[^]]*+ # all that is not a ] zero or more times
| # OR
# case: the target is outside the opening tag #
] # square bracket close the opening tag
([^[]+) # all that is not a [ 1 or more times (capture group)
)
The two captures now have the same number, #1.
Note: if you are sure that no caption tag spans several lines, you can add the m modifier at the end of the pattern.
Note 2: all quantifiers are possessive and I use atomic groups where possible, for quick failures and better performance.
Hint (and not an answer, per se)
Your best course of action would be:
Match everything after caption.
preg_match("#\[caption(.*?)\]#", $q, $match)
Use an explode function for extracting values in $match[1], if any.
explode(' ', trim($match[1]))
Check the values in array returned, and use in your code accordingly.
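The three steps above could be sketched as follows. The function name captionAttributes is hypothetical, and the sketch assumes attribute values contain no spaces, since it splits on spaces as suggested. It only looks at the first [caption] tag in the content.

```php
<?php
// Hypothetical helper: returns the attributes of the first [caption ...] tag
// as a name => value array. Assumes values contain no spaces.
function captionAttributes(string $content): array
{
    $attributes = [];
    // Step 1: match everything after "caption" up to the closing bracket.
    if (!preg_match('#\[caption(.*?)\]#', $content, $match)) {
        return $attributes;
    }
    $raw = trim($match[1]);
    if ($raw === '') {
        return $attributes; // plain [caption] tag without attributes
    }
    // Step 2: split the attribute string on spaces.
    foreach (explode(' ', $raw) as $pair) {
        // Step 3: each piece is expected to look like name="value".
        $parts = explode('=', $pair, 2);
        if (count($parts) === 2) {
            $attributes[$parts[0]] = trim($parts[1], '"');
        }
    }
    return $attributes;
}
```

Because the result is keyed by attribute name, the order of align, caption and width in the tag no longer matters.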
Can a regex find a pattern for this?
{{foo.bar1.bar2.bar3}}
where the groups would be
$1 = foo $2 = bar1 $3 = bar2 $4 = bar3 and so on.
It would be like re-running the expression over and over until it fails to get a match.
The current expression I am working on is
(?:\{{2})([\w]+).([\w]+)(?:\}{2})
Here's a link from regexr.
http://regexr.com?3203h
--
OK, I guess I didn't explain well what I'm trying to achieve here.
Let's say I am trying to replace all
.barX inside a {{foo . . . }}
my expected results should be
$foo->bar1->bar2->bar3
This should work, assuming no braces are allowed within the match:
preg_match_all(
'%(?<= # Assert that the previous character(s) are either
\{\{ # {{
| # or
\. # .
) # End of lookbehind
[^{}.]* # Match any number of characters besides braces/dots.
(?= # Assert that the following regex can be matched here:
(?: # Try to match
\. # a dot, followed by
[^{}]* # any number of characters except braces
)? # optionally
\}\} # Match }}
) # End of lookahead%x',
$subject, $result, PREG_PATTERN_ORDER);
$result = $result[0];
I'm not a PHP person, but I managed to construct this piece of code here:
preg_match_all("([a-z0-9]+)",
"{{foo.bar1.bar2.bar3}}",
$out, PREG_PATTERN_ORDER);
foreach($out[0] as $val)
{
echo($val);
echo("<br>");
}
The code above prints the following:
foo
bar1
bar2
bar3
It should allow you to exhaustively search a given string by using a simple regular expression. I think that you should also be able to get what you want by removing the braces and splitting the string.
I don't think so, but it's relatively painless to just split the string on periods like so:
$str = "{{foo.bar1.bar2.bar3}}";
$str = str_replace(array("{","}"), "", $str);
$values = explode(".", $str);
print_r($values); // Yields an array with values foo, bar1, bar2, and bar3
EDIT: In response to your question edit, you could replace all barX in a string by doing the following:
$str = "{{foo.bar1.bar2.bar3}}";
$newStr = preg_replace("#bar\d#", "hi", $str);
echo $newStr; // outputs "{{foo.hi.hi.hi}}"
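For the result asked for in the edited question ($foo->bar1->bar2->bar3), one possible sketch uses preg_replace_callback() to match the whole {{...}} expression and then swaps the dots for arrows. The function name toPropertyChain is made up for illustration, and the sketch assumes each segment consists only of word characters.

```php
<?php
// Hypothetical helper: turns {{foo.bar1.bar2}} into $foo->bar1->bar2.
function toPropertyChain(string $template): string
{
    return preg_replace_callback(
        '/\{\{(\w+(?:\.\w+)*)\}\}/',
        function (array $m): string {
            // $m[1] holds the dotted path without the braces.
            return '$' . str_replace('.', '->', $m[1]);
        },
        $template
    );
}
```

Text outside the double braces is left untouched, so the function can be applied to a whole template string.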
I don't know the correct syntax in PHP, for pulling out the results, but you could do:
\{{2}(\w+)(?:\.(\w+))*\}{2}
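That would capture the first hit in the first capturing group and the rest in the second capturing group. Note that in PHP (PCRE) a repeated group keeps only its last repetition, so $2 would end up holding just bar3. regexr.com is lacking the ability to show the individual captures as far as I can see, though. Try out Expresso, and you'll see what I mean.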
Quite a simple problem (but a difficult solution): I have a string in PHP as follows:
['one']['two']['three']
And from this I must extract the last tag, so I finally get three.
It is also possible that there is a number, like
[1][2][3]
and then I must get 3.
How can I solve this?
Thanks for your help!
Flo
Your tag is \[[^\]]+\].
3 Tags are: (\[[^\]]+\]){3}
3 Tags at end are: (\[[^\]]+\]){3}$
N Tags at end are: (\[[^\]]+\])*$ (N 0..n)
Example:
<?php
$string = "['one']['two']['three'][1][2][3]['last']";
preg_match("/((?:\[[^\]]+\]){3})$/", $string, $match);
print_r($match); // Array ( [0] => [2][3]['last'] [1] => [2][3]['last'] )
This tested code may work for you:
function getLastTag($text) {
$re = '/
# Match contents of last [Tag].
\[ # Literal start of last tag.
(?: # Group tag contents alternatives.
\'([^\']+)\' # Either $1: single quoted,
| (\d+) # or $2: un-quoted digits.
) # End group of tag contents alts.
\] # Literal end of last tag.
\s* # Allow trailing whitespace.
$ # Anchor to end of string.
/x';
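    if (preg_match($re, $text, $matches)) {
        // Compare against '' instead of testing truthiness, so a tag of 0 or '0' is still returned.
        if ($matches[1] !== '') return $matches[1]; // Either single quoted,
        if (isset($matches[2])) return $matches[2]; // or non quoted digit.
    }
    return null; // No match. Return NULL.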
}
Here is a regex that may work for you. Try this:
[^\[\]']*(?='?\]$)
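A minimal sketch of how that pattern might be used from PHP. The function name lastTagContents is hypothetical, and the single quotes inside the pattern are escaped only because the pattern sits in a single-quoted PHP string.

```php
<?php
// Hypothetical helper: returns the contents of the last [tag], or null.
function lastTagContents(string $text): ?string
{
    // Same pattern as above; \' is just the PHP escape for a literal quote.
    if (preg_match('/[^\[\]\']*(?=\'?\]$)/', $text, $m) === 1) {
        return $m[0];
    }
    return null;
}
```

The lookahead requires the closing bracket to sit at the very end of the string, so only the last tag can match.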