Regex to change quotation marks conflicts with HTML tags - PHP

I'm trying to create a PHP regex to filter my content in WordPress. I would like to transform quotation marks " " into « » with non-breaking spaces.
I use a Timber (Twig) filter to achieve this.
The problem is that this filter also changes the quotes inside link and image tags.
Example:
<a href="http://www.example.com">My link</a>
becomes
<a href=« http://www.example.com »>My link</a>
What could I add to my regex to avoid this? Can I have some help, please?
functions.php
public function add_to_twig( $twig ) {
$twig->addExtension( new Twig_Extension_StringLoader() );
$twig->addFilter( new Timber\Twig_Filter( 'changemarks', 'changemarks' ) );
return $twig;
}
function changemarks( $text ) {
$regex = '/"(.*?)"/';
$subst = '« $1 »';
$result = preg_replace($regex, $subst, $text);
return $result;
}
single.twig
{{ post.content|changemarks }}

It's difficult to write regular expressions for HTML. My solution is to match only quotes that have a space before the opening " and after the closing ", as in running text (attribute values don't have those spaces):
$regex = '/ "(.+?)" /';
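A different approach, sketched here under the assumption that no attribute value contains a literal ">", is to split the content into tags and text with preg_split() and run the quote replacement on the text parts only. The function name changemarks_text_only is hypothetical; it inserts U+00A0 as the non-breaking space the question asks for.

```php
<?php
// Hypothetical helper: turns "text" into « text » (with non-breaking spaces)
// in text only, leaving quotes inside tags such as href="..." untouched.
// Assumes no attribute value contains a literal ">".
function changemarks_text_only(string $html): string
{
    $nbsp = "\u{00A0}"; // U+00A0 NO-BREAK SPACE
    // The capturing group keeps each tag in the result array, so even
    // indexes are text and odd indexes are tags.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            continue; // a tag: leave its attributes alone
        }
        $parts[$i] = preg_replace('/"([^"<>]*)"/u', '«' . $nbsp . '$1' . $nbsp . '»', $part);
    }
    return implode('', $parts);
}

echo changemarks_text_only('<a href="http://www.example.com">My "link"</a>');
```

It could be registered with Timber in place of changemarks, in the same way as in the question's add_to_twig().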

Related

How to preg_match_all to get the text inside the tags "<h3>" and "<h3> <a/> </h3>"

Hello, I am currently creating an automatic table of contents for my WordPress website, based on:
https://webdeasy.de/en/wordpress-table-of-contents-without-plugin/
Problem:
Everything works unless the <h3> tag contains an <a> link; those headings end up missing from $names.
I think the problem is in this part of the regex:
preg_match_all("/<h[3,4](?:\sid=\"(.*)\")?(?:.*)?>(.*)<\/h[3,4]>/", $content, $matches);
// get text under <h3> or <h4> tag.
$names = $matches[2];
I have tried modifying the regex (I don't really understand it):
preg_match_all("/<h[3,4](?:\sid=\"(.*)\")?(?:.*)?><a(.*)>(.*)<\/a><\/h[3,4]>/", $content, $matches);
// get text under <a> tag.
$names = $matches[4];
The code above works for finding the text in <h3><a>a text</a></h3>, but now <h3> tags that don't contain an <a> tag are no longer matched.
My question:
How can I combine the two patterns above?
I would like the second pattern to be used whenever the first one does not produce a result.
Or is there a better solution? Thank you.
Here's a way to remove any tags inside header tags:
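$html = <<<EOT
<h3>Here's an alternative solution</h3> to using regex. <h3>It may <a name='#thing'>not</a></h3> be the most elegant solution, but it works
EOT;
// Group 2 holds the inner HTML of each heading.
preg_match_all('#<h(.*?)>(.*?)<\/h(.*?)>#si', $html, $matches);
foreach ($matches[0] as $num => $fullMatch) {
$look_for = preg_quote($fullMatch, "/");
// Opening tag without "<", e.g. "h3" or "h3 id='intro'".
$tag = str_replace("<", "", explode(">", $fullMatch)[0]);
// The closing tag must use only the tag name, not the attributes.
$tagName = strtok($tag, " ");
$replace_with = "<$tag>" . strip_tags($matches[2][$num]) . "</$tagName>";
// Replace only the first occurrence of this exact heading.
$html = preg_replace("/$look_for/", $replace_with, $html, 1);
}
echo "<pre>$html</pre>";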
The answer by @kinglish is the basis of this solution, thank you very much. I slightly modified and simplified it to fit the article linked in my question. This code worked for me:
preg_match_all('#(\<h[3-4])\sid=\"(.*?)\"?\>(.*?)(<\/h[3-4]>)#si',$content, $matches);
$tags = $matches[0];
$ids = $matches[2];
$raw_names = $matches[3];
/* Strip any other HTML tags (e.g. <a>) from $raw_names */
$clean_names= array_map(function($v){
return trim(strip_tags($v));
}, $raw_names);
$names = $clean_names;
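If a regex keeps getting in the way, PHP's built-in DOM extension can collect the same data. textContent flattens nested tags such as <a>, so headings with and without a link are handled the same way. This is a sketch of a swapped-in technique, not the article's method; the function name toc_headings and the shape of the returned array are assumptions.

```php
<?php
// Sketch using PHP's DOM extension instead of a regex.
// toc_headings() is a hypothetical name; it returns [['id' => ..., 'name' => ...], ...].
function toc_headings(string $content): array
{
    $doc = new DOMDocument();
    // Silence libxml warnings about HTML5 tags and fragments, then restore the setting.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration makes libxml read the fragment as UTF-8.
    $doc->loadHTML('<?xml encoding="utf-8"?>' . $content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $headings = [];
    // A single location path returns the nodes in document order.
    foreach ($xpath->query('//*[self::h3 or self::h4]') as $node) {
        $headings[] = [
            'id'   => $node->getAttribute('id'), // "" when there is no id
            // textContent joins the text of all descendants, so <a> is flattened.
            'name' => trim($node->textContent),
        ];
    }
    return $headings;
}
```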

How do I use RegEx in a Twig/Timber filter?

I'm trying to use preg_replace and a regex to remove the brackets [ and ] and all characters inside of them from the text output of a Twig v2 template using a Twig filter.
An example of the content output by Twig is:
An example of content is [caption id="attachment_4487"
align="alignright" width="500"] Lorem Ipsum caption blah
blah[/caption] and more content.
I want to remove the brackets and everything inside them, leaving "Lorem Ipsum caption blah blah" as well as "An example of content is", etc.
The problem is that right now no content is displayed at all when the filter is used. The issue may be the Twig filter construction, but I get no errors in the log. I've tested the regex at https://regex101.com/r/sN5hYk/1, but that could still be the issue.
The built-in Twig filter https://twig.symfony.com/doc/2.x/filters/striptags.html strips HTML, but not brackets.
This is my filter function in functions.php:
add_filter('timber/twig', function($twig) {
$twig->addExtension(new Twig_Extension_StringLoader());
$twig->addFilter(
new Twig_SimpleFilter(
'strip_square_brackets',
function($string) {
$string = preg_replace('\[.*?\]', '', $string);
return $string;
}
)
);
return $twig;
});
And the filter is called the standard way in the template.twig file:
{{ content|strip_square_brackets }}
The problem is that your pattern has no delimiters, so it is not a valid regex: preg_replace() emits a warning and returns null, which is why no content is displayed.
Your regex line should look like this:
$string = preg_replace('/\[.*?\]/', '', $string);
Now you're replacing a regex match with the empty string.
Bonus Edit:
The answer to your bonus question:
/\[.*?\].*\[.*?\]/
It matches two bracketed tags and everything in between.
A more robust alternative, if you always use caption (the / before caption is escaped because / is the pattern delimiter):
/\[caption.*\/caption\]/
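One thing worth checking with /\[.*?\]/: without the s modifier, . does not match a newline, and in the question's example the [caption ...] tag wraps onto a second line, so only [/caption] would be removed. A negated class such as [^\]]* also crosses line breaks. A minimal sketch; the function name strip_square_brackets_text is hypothetical and its body could serve as the filter callback:

```php
<?php
// Hypothetical standalone version of the filter body.
function strip_square_brackets_text(string $string): string
{
    // [^\]]* matches any character except "]", including newlines, so a
    // shortcode whose attributes wrap onto the next line is still removed.
    return preg_replace('/\[[^\]]*\]/', '', $string);
}

$content = "An example of content is [caption id=\"attachment_4487\"\n"
    . "align=\"alignright\" width=\"500\"] Lorem Ipsum caption blah\n"
    . "blah[/caption] and more content.";

echo strip_square_brackets_text($content);
```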
The code above works for me with a small tweak. This regular expression removes Visual Composer shortcodes.
add_filter('timber/twig', function($twig) {
$twig->addExtension(new Twig_Extension_StringLoader());
$twig->addFilter(
new Twig_SimpleFilter(
'strip_square_brackets',
function($string) {
$regexExp = "/\[(\/*)?vc_(.*?)\]/";
$string = preg_replace($regexExp, '', $string);
return $string;
}
)
);
return $twig;
});
and in twig file:
{{post.content|strip_square_brackets}}

PHP - Inner HTML recursive replace

I need to perform a recursive str_replace on a portion of HTML (by recursive I mean inner nodes first), so I wrote:
$str = //get HTML;
$pttOpen = '(\w+) *([^<]{1,100}?)';
$pttClose = '\w+';
$pttHtml = '(?:(?!(?:<x-)).+)';
while (preg_match("%<x-(?:$pttOpen)>($pttHtml)*</x-($pttClose)>%m", $str, $match)) {
list($outerHtml, $open, $attributes, $innerHtml, $close) = $match;
$newHtml = //some work....
str_replace($outerHtml, $newHtml, $str);
}
The idea is to first replace non-nested x-tags.
But it only works if innerHtml is on the same line as the opening tag (so I guess I misunderstood what the /m modifier does). I don't want to use a DOM library, because I just need simple string replacement. Any help?
Try this regex:
%<x-(?P<open>\w+)\s*(?P<attributes>[^>]*)>(?P<innerHtml>.*)</x-(?P=open)>%s
Demo
http://regex101.com/r/nA2zO5
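The key change in this pattern is the s modifier at the end. The m modifier from the question only affects ^ and $; it is s that lets . match newlines. A minimal check on a hypothetical three-line input:

```php
<?php
$html = "<x-foo>\ninner\n</x-foo>";

// /m only lets ^ and $ match at line breaks; "." still stops at "\n".
$withM = preg_match('%<x-foo>(.*)</x-foo>%m', $html);     // 0 (no match)

// /s lets "." match "\n", so the inner HTML may span several lines.
$withS = preg_match('%<x-foo>(.*)</x-foo>%s', $html, $m); // 1, $m[1] is "\ninner\n"
```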
Sample code
$str = // get HTML
$pattern = '%<x-(?P<open>\w+)\s*(?P<attributes>[^>]*)>(?P<innerHtml>.*)</x-(?P=open)>%s';
while (preg_match($pattern, $str, $matches)) {
$newHtml = sprintf('<ns:%1$s>%2$s</ns:%1$s>', $matches['open'], $matches['innerHtml']);
$str = str_replace($matches[0], $newHtml, $str);
}
echo htmlspecialchars($str);
Output
Initially, $str contained this text:
<x-foo>
sdfgsdfgsd
<x-bar>
sdfgsdfg
</x-bar>
<x-baz attr1='5'>
sdfgsdfg
</x-baz>
sdfgsdfgs
</x-foo>
It ends up with:
<ns:foo>
sdfgsdfgsd
<ns:bar>
sdfgsdfg
</ns:bar>
<ns:baz>
sdfgsdfg
</ns:baz>
sdfgsdfgs
</ns:foo>
Since I didn't know what work is done to build $newHtml, I mimicked it by replacing x- with ns: and removing any attributes.
Thanks to @Alex I came up with this:
%<x-(?P<open>\w+)\s*(?P<attributes>[^>]*?)>(?P<innerHtml>((?!<x-).)*)</x-(?P=open)>%is
Without ((?!<x-).)* in the innerHtml group, it doesn't work with nested tags the way I wanted (it matches the outer ones first). With it, the innermost ones are matched first. Hope this helps.
I don't know exactly what kind of changes you are trying to make; however, this is how I would proceed:
$pattern = <<<'EOD'
~
<x-(?<tagName>\w++) (?<attributes>[^>]*+) >
(?<content>(?>[^<]++|<(?!/?x-))*) #by far more efficient than (?:(?!</?x-).)*
</x-\k<tagName>>  # \k<...> is a back-reference; \g<...> would be a subroutine call
~x
EOD;
function callback($m) { // example function
return '<n-' . $m['tagName'] . $m['attributes'] . '>' . $m['content']
. '</n-' . $m['tagName'] . '>';
};
do {
$code = preg_replace_callback($pattern, 'callback', $code, -1, $count);
} while ($count);
echo htmlspecialchars(print_r($code, true));
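A self-contained version of the loop above, assuming the same rename-to-n- work as the example callback (the name rename_x_tags is hypothetical). It uses \k<tagName>, PCRE's named back-reference, for the closing tag, and repeats until a pass makes no replacement:

```php
<?php
// Hypothetical wrapper around the answer's pattern: each pass rewrites only
// x-tags whose content holds no other x-tag, so the innermost tags go first.
function rename_x_tags(string $code): string
{
    // Nowdoc: no variable interpolation; the x flag ignores the layout spaces.
    $pattern = <<<'EOD'
~
<x-(?<tagName>\w++) (?<attributes>[^>]*+) >
(?<content>(?>[^<]++|<(?!/?x-))*)
</x-\k<tagName>>
~x
EOD;
    do {
        $code = preg_replace_callback($pattern, function (array $m): string {
            // "attributes" keeps its leading space, e.g. " attr1='5'".
            return '<n-' . $m['tagName'] . $m['attributes'] . '>' . $m['content']
                . '</n-' . $m['tagName'] . '>';
        }, $code, -1, $count);
    } while ($count > 0); // stop once a pass finds nothing left to rename
    return $code;
}

echo htmlspecialchars(rename_x_tags("<x-foo>a<x-bar>b</x-bar><x-baz attr1='5'>c</x-baz>d</x-foo>"));
```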

PHP regular expression to replace links

I have this replace regex (taken from the phpBB source code):
$match = array(
'#<!\-\- ([mw]) \-\-><a (?:class="[\w-]+" )?href="(.*?)" target\=\"_blank\">.*?</a><!\-\- \1 \-\->#',
'#<!\-\- .*? \-\->#s',
'#<.*?>#s',
);
$replace = array( '\2', '', '');
$message = preg_replace($match, $replace, $message);
If I run it on a message like this:
asdfafdsfdfdsfds
<!-- m --><a class="postlink" href="http://website.com/link-is-looooooong.txt">http://website.com/link ... oooong.txt</a><!-- m -->
asdfafdsfdfdsfds4324
It returns this:
asdfafdsfdfdsfds
http://website.com/link ... oooong.txt
asdfafdsfdfdsfds4324
However, I would like to turn it into a replace function, so I can replace the link title in a block by providing the href.
I want to provide the URL, the new URL and the new title, so I can run a regex with these variables:
$url = 'http://website.com/link-is-looooooong.txt';
$new_title = 'hello';
$new_url = 'http://otherwebsite.com/';
It should return the same raw message, but with the link changed:
<!-- m --><a class="postlink" href="http://otherwebsite.com/">hello</a><!-- m -->
I've tried tweaking it into something like this but I can't get it right. I don't know how to build up the matched result so it has the same format after replacing.
$message = preg_replace('#<!\-\- ([mw]) \-\-><a (?:class="[\w-]+" )?href="'.preg_quote($url).'" target\=\"_blank\">(.*?)</a><!\-\- \1 \-\->#', $replace, $message);
You'll find that parsing HTML with regex can be a pain and get very complex. Your best bet is to use a DOM parser, like this one, and modify the links with that instead.
You need to capture the other parts in groups as well and then use them in the replacement. Try something like this:
$replace = '\1http://otherwebsite.com/\3hello\4';
$reg = '#(<!-- ([mw]) --><a (?:class="[\w-]+" )?href=")'.preg_quote($url, '#').'("(?: target="_blank")?>).*?(</a><!-- \2 -->)#';
$message = preg_replace($reg, $replace, $message);
See here.
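The same idea wrapped in the replace function the question asks for. The name replace_phpbb_link and its parameter order are assumptions; preg_replace_callback() is used so that characters such as $ or \ in the new URL or title are inserted literally instead of being read as group references. If the title comes from user input, it should go through htmlspecialchars() first.

```php
<?php
// Hypothetical helper: replace the URL and title of one phpBB magic link.
function replace_phpbb_link(string $message, string $url, string $newUrl, string $newTitle): string
{
    // Groups: 1 = comment + '<a ... href="', 2 = m/w marker,
    // 3 = closing quote (+ optional target) + '>', 4 = '</a>' + closing comment.
    $pattern = '#(<!-- ([mw]) --><a (?:class="[\w-]+" )?href=")'
        . preg_quote($url, '#')
        . '("(?: target="_blank")?>).*?(</a><!-- \2 -->)#';

    return preg_replace_callback(
        $pattern,
        // Concatenating in a callback keeps $ and \ in the new values literal.
        function (array $m) use ($newUrl, $newTitle): string {
            return $m[1] . $newUrl . $m[3] . $newTitle . $m[4];
        },
        $message
    );
}
```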

PHP regular expression to remove tags in HTML document

Say I have the following text:
..(content).............
<A HREF="http://foo.com/content" >blah blah blah </A>
...(continue content)...
I want to delete the link and the tag itself, while keeping the text in between. How do I do this with a regular expression, given that the URLs will all be different?
Many thanks.
This will remove all tags:
preg_replace("/<.*?>/", "", $string);
This will remove just the <a> tags (the i modifier is needed because your example uses uppercase <A>):
preg_replace("/<\\/?a(\\s+.*?>|>)/i", "", $string);
Avoid regular expressions whenever you can, especially when processing XML. In this case you can use strip_tags() or SimpleXML, depending on your string.
<?php
//example to extract the innerText from all anchors in a string
include('simple_html_dom.php');
$html = str_get_html('<A HREF="http://foo.com/content" >blah blah blah </A><A HREF="http://foo.com/content" >blah blah blah </A>');
//print the text of each anchor
foreach($html->find('a') as $e) {
echo $e->innertext;
}
?>
See PHP Simple DOM Parser.
Not pretty, but it does the job (case-insensitive, since your tags are uppercase):
$data = str_ireplace('</a>', '', $data);
$data = preg_replace('/<a[^>]+href[^>]+>/i', '', $data);
strip_tags() can also be used.
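A minimal example using the question's sample link. strip_tags() takes an optional list of tags to keep, so surrounding markup (a <p> wrapper is assumed here for illustration) can survive while the anchor is removed and its text stays:

```php
<?php
$html = '<p>..(content)..<A HREF="http://foo.com/content" >blah blah blah </A>..</p>';

// The second argument lists the tags to keep; <A ...> and </A> are not in it,
// so they are removed while the link text stays.
$clean = strip_tags($html, '<p>');

echo $clean; // <p>..(content)..blah blah blah ..</p>
```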
Please see examples here.
$pattern = '/href="([^"]*)"/';
I use this to replace anchors with a text string. Note that it calls itself through array('self', ...), so it must be defined as a method inside a class:
function replaceAnchorsWithText($data) {
$regex = '/(<a\s*'; // Start of anchor tag
$regex .= '(.*?)\s*'; // Any attributes or spaces that may or may not exist
$regex .= 'href=[\'"]+?\s*(?P<link>\S+)\s*[\'"]+?'; // Grab the link
$regex .= '\s*(.*?)\s*>\s*'; // Any attributes or spaces that may or may not exist before closing tag
$regex .= '(?P<name>\S+)'; // Grab the name
$regex .= '\s*<\/a>)/i'; // Any number of spaces between the closing anchor tag (case insensitive)
if (is_array($data)) {
// This is what will replace the link (modify to you liking)
$data = "{$data['name']}({$data['link']})";
}
return preg_replace_callback($regex, array('self', 'replaceAnchorsWithText'), $data);
}
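The function above relies on 'self', so it only works inside a class, and (?P<name>\S+) only matches a single word of link text. A standalone sketch of the same idea with a closure; the name replace_anchors_with_text is hypothetical, it assumes a quoted href, and it copies any markup inside the link as-is:

```php
<?php
// Hypothetical standalone version: <a ... href="URL" ...>Name</a> becomes "Name(URL)".
function replace_anchors_with_text(string $html): string
{
    // Group 1 is the opening quote, reused by \1 to find the closing quote.
    // "link" = href value, "name" = anchor text (may contain spaces).
    // i: matches <A HREF=...>; s: link text may span lines.
    $regex = '/<a\b[^>]*?\bhref\s*=\s*([\'"])(?P<link>.*?)\1[^>]*>(?P<name>.*?)<\/a>/is';

    return preg_replace_callback($regex, function (array $m): string {
        return trim($m['name']) . '(' . $m['link'] . ')';
    }, $html);
}

echo replace_anchors_with_text('see <A HREF="http://foo.com/content" >blah blah blah </A> now');
```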
Use str_replace.
