PHP - preg_match/preg_replace problems - php

I'm a little confused with preg_match and preg_replace. I have a very long content string (from a blog), and I want to find, separate and replace all [caption] tags. Possible tags can be:
[caption]test[/caption]
[caption align="center" caption="test" width="123"]<img src="...">[/caption]
[caption caption="test" align="center" width="123"]<img src="...">[/caption]
etc.
Here's the code I have (but I'm finding that it's not working the way I want it to...):
public function parse_captions($content) {
if(preg_match("/\[caption(.*) align=\"(.*)\" width=\"(.*)\" caption=\"(.*)\"\](.*)\[\/caption\]/", $content, $c)) {
$caption = $c[4];
$code = "<div>Test<p class='caption-text'>" . $caption . "</p></div>";
// Here, I'd like to ONLY replace what was found above (since there can be
// multiple instances)
$content = preg_replace("/\[caption(.*) width=\"(.*)\" caption=\"(.*)\"\](.*)\[\/caption\]/", $code, $content);
}
return $content;
}

The goal is to find the caption text regardless of where the caption attribute sits among the other attributes. You can try this:
$subject = <<<'LOD'
[caption]test1[/caption]
[caption align="center" caption="test2" width="123"][/caption]
[caption caption="test3" align="center" width="123"][/caption]
LOD;
$pattern = <<<'LOD'
~
\[caption # beginning of the tag
(?>[^]c]++|c(?!aption\b))* # followed by anything but c and ]
# or c not followed by "aption"
(?| # branch reset group
caption="([^"]++)"[^]]*+] # the content is inside the opening tag
| # OR
]([^[]+) # outside
) # end of the branch reset group
\[/caption] # closing tag
~x
LOD;
$replacement = "<div>Test<p class='caption-text'>$1</p></div>";
echo htmlspecialchars(preg_replace($pattern, $replacement, $subject));
pattern (condensed version):
$pattern = '~\[caption(?>[^]c]++|c(?!aption\b))*(?|caption="([^"]++)"[^]]*+]|]([^[]++))\[/caption]~';
pattern explanation:
After the beginning of the tag, there may be content before the ] or before the caption attribute. This content is described with:
(?> # atomic group
[^]c]++ # all characters that are not ] or c, 1 or more times
| # OR
c(?!aption\b) # c not followed by aption (to avoid the caption attribute)
)* # zero or more times
The branch reset group (?| ... ) allows capture groups in different branches to share the same number:
(?|
# case: the target is in the caption attribute #
caption=" # (you can replace it by caption\s*+=\s*+")
([^"]++) # all that is not a " one or more times (capture group)
"
[^]]*+ # all that is not a ] zero or more times
| # OR
# case: the target is outside the opening tag #
] # the square bracket closes the opening tag
([^[]+) # all that is not a [ 1 or more times (capture group)
)
Both captures now have the same number: 1.
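Note: the m modifier would not help here, because it only changes the meaning of ^ and $ and this pattern uses neither. If you are sure that a caption tag never spans several lines, you can instead add \n to the negated character classes (for example [^]c\n]) so that the pattern cannot run past the end of a line.
Note 2: most quantifiers are possessive, and I use atomic groups where possible, so that failed matches are abandoned quickly and performance is better.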
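To make the shared numbering concrete, here is a minimal sketch that runs the same alternation once inside a plain non-capturing group and once inside a branch reset group. The two sample tags and the simplified patterns are assumptions made for illustration; they are not the full pattern from the answer.

```php
<?php
// Hypothetical sample tags, not taken from a real blog post.
$tags = [
    '[caption]outside[/caption]',
    '[caption caption="inside"][/caption]',
];

// Same alternation twice: once in a plain group, once in a branch reset group.
$plain = '~\[caption(?:\s+caption="([^"]+)"\]|\]([^\[]+))\[/caption\]~';
$reset = '~\[caption(?|\s+caption="([^"]+)"\]|\]([^\[]+))\[/caption\]~';

$results = [];
foreach ($tags as $tag) {
    preg_match($plain, $tag, $p);
    preg_match($reset, $tag, $r);
    // With (?: ... ) the text lands in group 1 or group 2 depending on the
    // branch that matched; with (?| ... ) it is always in group 1.
    $results[] = [
        'plain_group' => $p[1] !== '' ? 1 : 2,
        'reset_text'  => $r[1],
    ];
}

print_r($results);
```

With the plain group, the calling code has to check which group is filled; with the branch reset group, a single $1 in the replacement string is enough.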

Hint (and not an answer, per se)
Your best method of action would be:
Match everything after caption.
preg_match("#\[caption(.*?)\]#", $q, $match)
Use an explode function for extracting values in $match[1], if any.
explode(' ', trim($match[1]))
Check the values in array returned, and use in your code accordingly.
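The steps of this hint can be sketched as follows. The variable names and the way each name="value" pair is split are assumptions; note that splitting on spaces, as the hint suggests, breaks as soon as an attribute value itself contains a space.

```php
<?php
// Sample tag taken from the question.
$q = '[caption align="center" caption="test" width="123"]<img src="...">[/caption]';

$attributes = [];
// Step 1: capture everything between "[caption" and the first "]".
if (preg_match('#\[caption(.*?)\]#', $q, $match)) {
    // Step 2: split the attribute string on spaces.
    foreach (explode(' ', trim($match[1])) as $pair) {
        if (strpos($pair, '=') === false) {
            continue; // a bare [caption] tag yields one empty piece
        }
        // Step 3: split each piece into name and value, dropping the quotes.
        [$name, $value] = explode('=', $pair, 2);
        $attributes[$name] = trim($value, '"');
    }
}

print_r($attributes);
```

For values that may contain spaces, a dedicated attribute regex is the safer choice.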

Related

preg_match_all Compilation failed: range out of order in character class at offset

I am having trouble finding a specific object with a preg_match_all pattern. I have some text, but I would like to find just one specific part of it.
For example, I have this string of text:
sadasdasd:{"website":["https://bitcoin.org/"]tatic/cloud/img/coinmarketcap_grey_1.svg?_=60ffd80');display:inline-block;background-position:center;background-repeat:no-repeat;background-size:contain;width:239px;height:41px;} .cqVqre.cmc-logo--size-large{width:263px;height:45px;}
/* sc-component-id: sc-2wt0ni-0 */
However, I just need to find "website":["https://bitcoin.org/"], where the website is dynamic data. For example, the website could be Google: "website":["https://google.com/"]
Right now I have something like this, but it just returns a bulk of URLs. I need only the specific one.
$parsePage = "sadasdasd:{"website":["https://bitcoin.org/"]tatic/cloud/img/coinmarketcap_grey_1.svg?_=60ffd80');display:inline-block;background-position:center;background-repeat:no-repeat;background-size:contain;width:239px;height:41px;} .cqVqre.cmc-logo--size-large{width:263px;height:45px;}
/* sc-component-id: sc-2wt0ni-0 */";
$pattern = '/
\"website":[" # [ character
(?: # non-capturing group
[^{}] # anything that is not a { or }
| # OR
(?R) # recurses the entire pattern
)* # previous group zero or more times
\"] # ] character
/x';
preg_match_all($pattern, $parsePage, $matches);
print_r($matches[0]);
Try:
$pattern = '~"website":\["([^"]*)"~';
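A minimal sketch of how this pattern could be used, assuming a shortened copy of the sample text from the question. The URL is read from capture group 1.

```php
<?php
// Shortened copy of the sample text from the question.
$parsePage = 'sadasdasd:{"website":["https://bitcoin.org/"]tatic/cloud/img/coinmarketcap_grey_1.svg';

$pattern = '~"website":\["([^"]*)"~';

$url = null;
if (preg_match($pattern, $parsePage, $m)) {
    $url = $m[1]; // group 1 holds the text between the quotes
}

echo $url, PHP_EOL; // https://bitcoin.org/
```

Because the opening square bracket is escaped, the "range out of order in character class" compilation error from the original pattern should not occur.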

preg_split shortcode attributes into array

I would like to parse a shortcode into an array via preg_split.
This is an example shortcode:
[contactform id="8411" label="This is \" first label" label2='This is second \' label']
and this should be the resulting array:
Array
(
[id] => 8411
[label] => This is \" first label
[label2] => This is second \' label
)
I have this regexp:
$atts_arr = preg_split('~\s+(?=(?:[^\'"]*[\'"][^\'"]*[\'"])*[^\'"]*$)~', trim($shortcode, '[]'));
Unfortunately, this works only if there is no escaping of quotes \' or \".
Thx in advance!
Using preg_split is not always handy or appropriate, in particular when you have to deal with escaped quotes. A better approach is to use preg_match_all, for example:
$pattern = <<<'EOD'
~
(\w+) \s*=
(?|
\s* "([^"\\]*(?:\\.[^"\\]*)*)"
|
\s* '([^'\\]*(?:\\.[^'\\]*)*)'
# | uncomment if you want to handle unquoted attributes
# ([^]\s]*)
)
~xs
EOD;
if (preg_match_all($pattern, $yourshortcode, $matches))
$attributes = array_combine($matches[1], $matches[2]);
The pattern uses the branch reset feature (?|...(..)...|...(...)..) that gives the same number(s) to the capture groups for each branch.
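As a quick check, the following sketch applies a condensed, single-line form of the pattern to the shortcode from the question. The variable names are assumptions, and the commented-out branch for unquoted attributes is left out.

```php
<?php
// Shortcode from the question, kept in a nowdoc so the backslashes stay literal.
$shortcode = <<<'EOD'
[contactform id="8411" label="This is \" first label" label2='This is second \' label']
EOD;

// Condensed form of the pattern: key, "=", then a double- or single-quoted value.
$pattern = <<<'EOD'
~(\w+)\s*=(?|\s*"([^"\\]*(?:\\.[^"\\]*)*)"|\s*'([^'\\]*(?:\\.[^'\\]*)*)')~s
EOD;

$attributes = [];
if (preg_match_all($pattern, $shortcode, $matches)) {
    // Group 1 holds the keys; group 2 holds the values for both quote styles.
    $attributes = array_combine($matches[1], $matches[2]);
}

print_r($attributes);
```

The escaped quotes stay in the values exactly as written, backslash included, which is what the expected array in the question shows.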
I mentioned the \G anchor in my comment. This anchor succeeds if the current position is immediately after the last match. It is useful if you want to check the syntax of your shortcode from start to end at the same time (otherwise it is of no use here). Example:
$pattern2 = <<<'EOD'
~
(?:
\G(?!\A) # anchor for the position after the last match
# it ensures that all matches are contiguous
|
\[(?<tagName>\w+) # begining of the shortcode
)
\s+
(?<key>\w+) \s*=
(?|
\s* "(?<value>[^"\\]*(?:\\.[^"\\]*)*)"
|
\s* '([^'\\]*(?:\\.[^'\\]*)*)'
# | uncomment if you want to handle unquoted attributes
# ([^]\s]*)
)
(?<end>\s*+]\z)? # check that the end has been reached
~xs
EOD;
if (preg_match_all($pattern2, $yourshortcode, $matches) && end($matches['end']) !== '')
$attributes = array_combine($matches['key'], $matches['value']);
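The effect of the \G anchor can be seen by feeding a well-formed and a malformed shortcode to the same pattern. This is only a sketch: the function name parseShortcode and both sample shortcodes are hypothetical, the pattern is a condensed variant that names the value group in both branches, and the final check relies on preg_match_all filling unmatched groups with empty strings in its default ordering.

```php
<?php
// Hypothetical wrapper around a condensed variant of the \G pattern.
function parseShortcode(string $shortcode): ?array
{
    $pattern = <<<'EOD'
~(?:\G(?!\A)|\[(?<tagName>\w+))\s+(?<key>\w+)\s*=(?|\s*"(?<value>[^"\\]*(?:\\.[^"\\]*)*)"|\s*'(?<value>[^'\\]*(?:\\.[^'\\]*)*)')(?<end>\s*+]\z)?~s
EOD;

    // The shortcode is accepted only if the last contiguous match
    // also reached the closing bracket at the end of the string.
    if (preg_match_all($pattern, $shortcode, $m) && end($m['end']) !== '') {
        return array_combine($m['key'], $m['value']);
    }
    return null;
}

$ok  = parseShortcode('[contactform id="8411" label=\'two words\']');
$bad = parseShortcode('[contactform id="8411" oops label=\'x\']');

var_dump($ok, $bad);
```

In the second call the stray word breaks the chain of contiguous matches, so the end group is never filled and the function returns null instead of a partial attribute list.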

preg_match_all an array of items, then for each matches

I am editing some Interspire Email code. Currently the program goes through the HTML of the email before sending, and looks for 'a href' code, to replace the links. I want it to also go through and get form action="" and replace the urls in them (it does not currently). I think I can use the regex from this stack post:
PHP - Extract form action url from mailchimp subscribe form code using regex
but I'm having some difficulty wrapping my head around how to handle the arrays. The current code that just does the 'a href=' is below:
preg_match_all('%<a.+(href\s*=\s*(["\']?[^>"\']+?))\s*.+>%isU', $this->body['h'], $matches);
$links_to_replace = $matches[2];
$link_locations = $matches[1];
arsort($link_locations);
reset($links_to_replace);
reset($link_locations);
foreach ($link_locations as $tlinkid => $url) {
// so we know whether we need to put quotes around the replaced url or not.
$singles = false;
$doubles = false;
// make sure the quotes are matched up.
// ie there is either 2 singles or 2 doubles.
$quote_check = substr_count($url, "'");
if (($quote_check % 2) != 0) {
...
I know (or I think I know), that I need to replace preg_match_all with:
preg_match_all(array('%<a.+(href\s*=\s*(["\']?[^>"\']+?))\s*.+>%isU', '|form action="([^"]*?)" method="post" id="formid"|i'), $this->body['h'], $matches);
but then how are the '$matches' handled?
$links_to_replace = $matches[2];
$link_locations = $matches[1];
does not still hold true, does it? Is it possible to do what I'm thinking? Or would I need to write another function just to handle the 'form action=' separately from the 'a href'?
A suggestion:
$pattern = <<<'LOD'
~
(?| # branch reset feature: allows to have the same named
# capturing group in an alternation. ("type" here)
<a\s # the link case
(?> # atomic group: possible content before the "href" attribute
[^h>]++ # all that is not a "h" or the end of the tag ">"
|
\Bh++ # all "h" not preceded by a word boundary
|
h(?!ref\s*+=) # all "h" not followed by "ref=" or "ref ="
)*+ # repeat the atomic group zero or more times.
(?<type> href )
| #### OR ####
<form\s # the form case
(?> # possible content before the "action" attribute. (same principle)
[^a>]++
|
\Ba++
|
a(?!ction\s*+=)
)*+
(?<type> action )
)
\s*+ = \s*+ # optional spaces before and after the "=" sign
\K # resets all on the left from match result
(?<quote> ["']?+ )
(?<url> [^\s"'>]*+ )
\g{quote} # backreference to the "quote" named capture (", ', empty)
~xi
LOD;
Note that this pattern will only match the url with possible quotes. However, the attribute name will be stored inside the named capture group "type" if you need it.
Then you can use all of this with:
$html = preg_replace_callback($pattern,
function ($m) {
$url = $m['url'];
$type = strtolower($m['type']);
$quote = $m['quote'];
// do what you want with the url, type and quotes
return $quote . $url . $quote;
}, $html);
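For a runnable illustration of the same idea, here is a sketch with a deliberately simplified pattern. It is an assumption rather than the pattern above: it does not tie href to a tags or action to form tags, but it keeps the \K reset and the named groups. The sample HTML and the tracker URL are hypothetical.

```php
<?php
// Hypothetical sample HTML, used only for illustration.
$html = '<a class="x" href="http://a.example/">A</a>'
      . "<form action='/submit' method=\"post\">";

// Simplified pattern: find href/action inside an <a> or <form> tag, then use
// \K so that only the (possibly quoted) URL is part of the match.
$pattern = '~<(?:a|form)\b[^>]*?\s(?<type>href|action)\s*=\s*\K(?<quote>["\']?)(?<url>[^\s"\'>]*)\k<quote>~i';

$result = preg_replace_callback($pattern, function ($m) {
    $type = strtolower($m['type']);
    // Hypothetical rewrite: route the original URL through a tracker.
    $url = 'https://tracker.example/' . $type . '?u=' . rawurlencode($m['url']);
    // Put back the same quote character (or none) that surrounded the URL.
    return $m['quote'] . $url . $m['quote'];
}, $html);

echo $result, PHP_EOL;
```

Because everything before \K is dropped from the match, the callback only has to return the new quoted URL; the rest of the tag is left untouched.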

PHP: Get last Tag of a String with Regular Expressions

A quite simple problem (but with a difficult solution): I have a string in PHP like the following:
['one']['two']['three']
From this, I must extract the last tag, so that I end up with three.
It is also possible that the tags contain numbers, like
[1][2][3]
and then I must get 3.
How can I solve this?
Thanks for your help!
Flo
Your tag is \[[^\]]+\].
3 Tags are: (\[[^\]]+\]){3}
3 Tags at end are: (\[[^\]]+\]){3}$
N Tags at end are: (\[[^\]]+\])*$ (N 0..n)
Example:
<?php
$string = "['one']['two']['three'][1][2][3]['last']";
preg_match("/((?:\[[^\]]+\]){3})$/", $string, $match);
print_r($match); // Array ( [0] => [2][3]['last'] [1] => [2][3]['last'] )
This tested code may work for you:
function getLastTag($text) {
$re = '/
# Match contents of last [Tag].
\[ # Literal start of last tag.
(?: # Group tag contents alternatives.
\'([^\']+)\' # Either $1: single quoted,
| (\d+) # or $2: un-quoted digits.
) # End group of tag contents alts.
\] # Literal end of last tag.
\s* # Allow trailing whitespace.
$ # Anchor to end of string.
/x';
if (preg_match($re, $text, $matches)) {
if ($matches[1] !== '') return $matches[1]; // Either single quoted,
return $matches[2]; // or non quoted digits (also works for "0").
}
return null; // No match. Return NULL.
}
Here is a regex that may work for you. Try this:
[^\[\]']*(?='?\]$)
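A minimal sketch of how this last regex might be used on the two sample strings from the question. The helper name lastTagContent is hypothetical; the regex is the one suggested above, with the single quotes escaped for a PHP string literal.

```php
<?php
// Hypothetical helper name; the regex is the one suggested above.
function lastTagContent(string $text): ?string
{
    // Match a run of characters that are not brackets or quotes and that is
    // followed by an optional quote and the final closing bracket.
    $pattern = '~[^\[\]\']*(?=\'?\]$)~';
    return preg_match($pattern, $text, $m) ? $m[0] : null;
}

$quoted  = lastTagContent("['one']['two']['three']");
$numeric = lastTagContent('[1][2][3]');

var_dump($quoted, $numeric);
```

The lookahead keeps the closing quote and bracket out of the result, so no post-processing of the match is needed.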

Select a block of YAML with regex

I have a big YAML file and I want to select an entire node using regex. For instance:
Node1:
  Child:
    GrandChild: foo
Node2:
  AnotherChild:
    AnotherGrandChild: bar
Node3:
  LastChild:
    LastGrandChild: foo
How can I use regex to select all of Node2 in the above example, and return:
Node2:
  AnotherChild:
    AnotherGrandChild: bar
Since everything else in that node is indented (if I understand YAML correctly), this works at least on your example string:
$mask = '~(^%s:\n(?:^[ ].*\n?)*$)~m';
$pattern = sprintf($mask, 'Node2');
$r = preg_match($pattern, $yaml, $matches);
$node = reset($matches);
At least on my computer. I wanted to make a codepad demo, but it gives an error. I will check the regex.
Full blown:
$yaml = <<<EOD
Node1:
  Child:
    GrandChild: foo
Node2:
  AnotherChild:
    AnotherGrandChild: bar
Node3:
  LastChild:
    LastGrandChild: foo
EOD;
$mask = '~
( # start matching group
^ # a node start always at the beginning of a line
%s: # placeholder for sprintf for the node name + :
$ # end of line for the nodename
\n
(?: # non-matching group to hold all subsequent, indented lines
^ # beginning of sublines
(?:[ ]{2})+ # indentation is required, always a multiple of two spaces, non matching group
.*\n? # match anything else on that subsequent line, optionally the newline character
)* # 0 or more subsequent, indented lines
)$ # this ends a line, to not take over the newline of the last subsequent line (see \n? above).
# the following are modifiers:
# m - pcre multiline modifier (in php same as in perl)
# x - to allow spaces and the comments all over here ;)
~mx
';
$pattern = sprintf($mask, 'Node2');
$r = preg_match($pattern, $yaml, $matches);
$node = reset($matches);
var_dump($node);
You probably want to use a library like php-yaml.
