preg_replace() str_replace() apostrophe nightmare! - Drupal menu image replace - php

Can anyone help me decode why this doesn't work?
$cssid = preg_replace("/'/", "", $cssid);
Trying to strip the single quote marks from some html...
Thanks!
H
EDIT
This is the full function - it's designed to rebuild the Drupal menu using images, and it applies CSS classes to each item, allowing you to select the image you want. Stripping out spaces and apostrophes needs to be done or the CSS selector fails.
The title of the menu item causing all this problem is:
What's new
Pretty innocuous you'd think. (Except for that single ')
function primary_links_add_icons() {
$links = menu_primary_links();
$level_tmp = explode('-', key($links));
$level = $level_tmp[0];
$output = "<ul class=\"links-$level\">\n";
if ($links) {
foreach ($links as $link) {
$link = l($link['title'], $link['href'], $link['attributes'], $link['query'], $link['fragment']);
$cssid = str_replace(' ', '_', strip_tags($link));
$cssid = str_replace('\'', '', $cssid);
/*$link = preg_replace('#(<a.*?>).*?(</a>)#', '$1$2', $link);*/
$output .= '<li id="'.$cssid.'">' . $link .'</li>';
};
$output .= '</ul>';
}
return $output;
}
EDIT the saga continues...
I notice that I get the following error in phpMyAdmin:
The mbstring PHP extension was not
found and you seem to be using a
multibyte charset. Without the
mbstring extension phpMyAdmin is
unable to split strings correctly and
it may result in unexpected results.
I wonder whether this has something to do with it?
In any case, the SQL code is:
('primary-links', 951, 0, 'http://www.google.com', '', 'What''s New',
And this displays in FireBug once it's been rendered as:
<li id="What's_New">
I've created a menu item called "What#s New" and the str_replace() will work on that just fine, so it's ALL about this goddam apostrophe. I think I agree, the expression works, but it has to be an encoding problem. It really is a proper, common, apostrophe and not one of the variants, but for some reason PHP is absolutely unable to recognise it as such.
EDIT oh god oh god - it's Drupal again...
It appears that the function l(), which formats all the links, is completely impervious to having its output rewritten?! Whatever the case, this code works...
function primary_links_add_icons() {
$links = menu_primary_links();
$level_tmp = explode('-', key($links));
$level = $level_tmp[0];
$output = "<ul class=\"links-$level\">\n";
if ($links) {
foreach ($links as $link) {
$link['title'] = str_replace('\'', '', $link['title']);
$link = l($link['title'], $link['href'], $link['attributes'], $link['query'], $link['fragment']);
$cssid = str_replace(' ', '_', strip_tags($link));
/*$link = preg_replace('#(<a.*?>).*?(</a>)#', '$1$2', $link);*/
$output .= '<li id="'.$cssid.'">' . $link .'</li>';
};
$output .= '</ul>';
}
return $output;
}
2 hours later and I can carry on theming this site...
Thank you so much for all your suggestions, I'm going to point the drupal snippet authors at this post so hopefully other people will benefit from it too.

CSS image replacement is a much more commonly chosen way for menu item replacing:
First, install the Menu Attributes module so that you can assign CSS IDs to every menu item. (These attributes can be set from the menu item edit page in the admin panel.)
Then use CSS image replacement. Here is a good tutorial for this.
And this is the method I use for my sites:
#primary-tv
{
display: block;
width: 90px;
height: 0px;
padding-top: 41px;
background: url(images/nghtv.png);
}
This is an example for replacement with an image of 90 x 41px
And for the apostrophe replacement:
$cssid = str_replace("&#039;", "", htmlspecialchars($cssid, ENT_QUOTES));

escape the single quote.

Your code looks fine. But why don’t you use str_replace as you’re replacing a fixed string?
$cssid = str_replace("'", "", $cssid);

If str_replace("'","") doesn't work, are you sure the characters you want to remove are indeed normal apostrophes (') instead of weird alternatives (’), or some weird accent marks (´`˙ ̛̉῾᾿) or single quotes (‘’) or whatnot?
Or maybe the value of $cssid gets replaced back to original by some other bug?
Maybe you're looking at the wrong output for results?
Or by a far chance, are you accidentally running a different copy of the code than the one you are editing - btw, that's really annoying when it happens! :)

Given that it is HTML, have you considered that it may be represented as &#39; rather than '?
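If the entity theory is right, one way to check it is to decode entities before stripping the apostrophe. Below is a minimal sketch, assuming the link HTML contains the apostrophe as &#039; or &#39;; make_css_id() is a hypothetical helper name, not a Drupal function.

```php
<?php
// Hypothetical helper (not part of Drupal): build a CSS id from link HTML.
function make_css_id($html) {
    // Drop the markup, then turn entities such as &#039; back into characters.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    // Remove straight and curly (U+2019) apostrophes.
    $text = str_replace(array("'", "\xE2\x80\x99"), '', $text);
    // Replace spaces so the result is usable in a CSS selector.
    return str_replace(' ', '_', $text);
}

echo make_css_id('<a href="/new">What&#039;s New</a>'); // Whats_New
```

Decoding first means the same str_replace() handles both the literal character and its escaped forms.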

Related

PHP Looping Through Replacing Tags

I'm trying to do custom tags for links, colour and bullet points on a website so [l]...[/l] gets replaced by the link inside and [li]...[/li] gets replaced by a bullet point list.
I've got it half working but there's a problem with the link descriptions, heres the code:
// Takes in a paragraph, replaces all square-bracket tags with HTML tags. Calls the getBetweenTags() method to get the text between the square tags
function replaceTags($text)
{
$tags = array("[l]", "[/l]", "[list]", "[/list]", "[li]", "[/li]");
$html = array("<a style='text-decoration:underline;' class='common_link' href='", "'>" . getBetweenTags("[l]", "[/l]", $text) . "</a>", "<ul>", "</ul>", "<li>", "</li>");
return str_replace($tags, $html, $text);
}
// Takes in the start and end tag along with the paragraph, returns the text between the two tags.
function getBetweenTags($tag1, $tag2, $text)
{
$startsAt = strpos($text, $tag1) + strlen($tag1);
$endsAt = strpos($text, $tag2, $startsAt);
return substr($text, $startsAt, $endsAt - $startsAt);
}
The problem I'm having is when I have three links:
[l]http://www.example1.com[/l]
[l]http://www.example2.com[/l]
[l]http://www.example3.com[/l]
The links get replaced as:
http://www.example1.com
http://www.example1.com
http://www.example1.com
They are all hyperlinked correctly i.e. 1,2,3 but the text bit is the same for all links.
You can see it in action here at the bottom of the page with the three random links. How can I change the code so that each link is hyperlinked to the corresponding page and shows its own URL as the link text?
str_replace does all the grunt work for you. The problem is that:
getBetweenTags("[l]", "[/l]", $text)
doesn't change. It will match 3 times but it just resolves to "http://www.example1.com" because that's the first link on the page.
You can't really do a static replacement, you need to keep at least a pointer to where you are in the input text.
My advice would be to write a simple tokenizer/parser. It's actually not that hard. The tokenizer can be really simple: find all [ and ] and derive tags. Then your parser will try to make sense of the tokens. Your token stream can look like:
array(
array("string", "foo "),
array("tag", "l"),
array("string", "http://example"),
array("endtag", "l"),
array("string", " bar")
);
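A token stream like the one above could be produced roughly as follows. This is only a sketch: tokenize() is a made-up name, and it assumes tag names consist of letters only.

```php
<?php
// Hypothetical tokenizer: splits text into "string", "tag" and "endtag" tokens.
function tokenize($text) {
    // Keep the delimiters ([l], [/l], ...) in the result so no text is lost.
    $parts = preg_split('#(\[/?[a-z]+\])#i', $text, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $tokens = array();
    foreach ($parts as $part) {
        if (preg_match('#^\[(/?)([a-z]+)\]$#i', $part, $m)) {
            // A leading slash marks a closing tag.
            $tokens[] = array($m[1] === '/' ? 'endtag' : 'tag', strtolower($m[2]));
        } else {
            $tokens[] = array('string', $part);
        }
    }
    return $tokens;
}

print_r(tokenize('foo [l]http://example[/l] bar'));
```

A parser can then walk the tokens in order and emit HTML, which gives it the "pointer into the input" that a static str_replace() lacks.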
Here is how I would use preg_match_all instead personally.
$str='
[l]http://www.example1.com[/l]
[l]http://www.example2.com[/l]
[l]http://www.example3.com[/l]
';
preg_match_all('/\[(l|li|list)\](.+?)(\[\/\1\])/is',$str,$m);
if(isset($m[0][0])){
for($x=0;$x<count($m[0]);$x++){
$str=str_replace($m[0][$x],$m[2][$x],$str);
}
}
print_r($str);
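Note that the loop above only strips the tags. To build the actual links, each match needs its own replacement, and preg_replace_callback() is one way to get that. A sketch, where replaceLinkTags() is a hypothetical stand-in for the asker's replaceTags() and handles only [l]...[/l]:

```php
<?php
// Hypothetical replacement for replaceTags(): each [l]...[/l] gets its own URL.
function replaceLinkTags($text) {
    return preg_replace_callback('#\[l\](.+?)\[/l\]#is', function ($m) {
        // $m[1] is the text between this particular pair of tags.
        $url = htmlspecialchars(trim($m[1]), ENT_QUOTES);
        return "<a class='common_link' href='$url'>$url</a>";
    }, $text);
}

echo replaceLinkTags("[l]http://www.example1.com[/l]\n[l]http://www.example2.com[/l]");
```

Because the callback runs once per match, the link text always comes from the same tag pair as the href.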

PHP Regular Expression - Single Quote not working - TWIG pre-escaping

I got a problem with single quotes in a regular expression.
What I want to do is replace smileys in a string with an HTML image tag.
All smileys are working, except the sad smiley :'-( because it has a single quote in it.
Magic Quotes is turned off (tested with if (!get_magic_quotes_gpc()) dd('mq off');).
So, let me show you some code.
protected $emoticons = array(
// ...
'cry' => array(
'image' => '<img class="smiley" src="/image/emoticon/cry.gif" />',
'emoticons' => array(":'(", ";'(", ":'-(", ";'-(")
),
);
My method to replace all the emoticons is the following:
public function replaceEmoticons($input) {
$output = $input;
foreach ($this->emoticons as $emo_group_name => $emo_group) {
$regex_emo_part = array();
foreach ($emo_group['emoticons'] as $emoticon) {
$regex_emo_part[] = preg_quote($emoticon, '#');
}
$regex_emo_part = implode('|', $regex_emo_part);
$regex = '#(?!<\w)(' . $regex_emo_part .')(?!\w)#';
$output = preg_replace($regex, $emo_group['image'], $output);
}
return $output;
}
But as i said: ' kills it. No replacement there. :-) :-/ and so on are working. Why?
FYI Content of $regex: #(?!<\w)(\:\'\(|;\'\(|\:\'\-\(|;\'\-\()(?!\w)#
What is wrong here, can you help me?
UPDATE:
Thanks @cheery and cychoi. The replacing method is okay, you were right.
I found the problem. My string gets escaped before it is passed to the replaceEmoticons method. I use the TWIG templating engine, and I apply the |nl2br filter before my self-made replace_emoticons filter.
Let me show you. This is the output in the final template. It is a template to show a comment for a blog entry:
{{ comment.content|nl2br|replace_emoticons|raw }}
Problem: nl2br automatically pre-escapes the input string, so ' gets replaced by its escaped form, &#039;.
I need nl2br to show line breaks as <br />, and I need the escaping too, to disallow HTML tags in the user's input.
I need replace_emoticons to replace my emoticons (a self-made TWIG extension).
And I need raw at the end of the filter chain too; otherwise all the HTML smiley img tags get escaped and I see raw HTML in the comment's text.
What can I do here? The only problem seems to be that nl2br escapes ' too. That is not a bad idea in general, but in my case it breaks all sad smileys that contain a '.
Still searching for a solution to solve this and i hope you can help me.
Best,
titan
I added an optional parameter to the emoticon method:
public function replaceEmoticons($input, $use_emo_encoding_for_regex = true) {
and i changed the foreach part a lil' bit:
foreach ($emo_group['emoticons'] as $emoticon) {
if ($use_emo_encoding_for_regex === true) {
$emoticon = htmlspecialchars($emoticon, ENT_QUOTES);
}
$regex_emo_part[] = preg_quote($emoticon, '#');
}
It works! All emoticons are replaced!
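For anyone wondering why this helps, here is a small standalone check. It assumes the template engine's escaping behaves like htmlspecialchars() with ENT_QUOTES, which is what the fix above relies on.

```php
<?php
// What the comment text looks like after HTML escaping.
$escaped_input = htmlspecialchars("so sad :'-(", ENT_QUOTES); // so sad :&#039;-(

// Escape the emoticon the same way before quoting it for the regex.
$needle = preg_quote(htmlspecialchars(":'-(", ENT_QUOTES), '#');

$image  = '<img class="smiley" src="/image/emoticon/cry.gif" />';
$output = preg_replace('#' . $needle . '#', $image, $escaped_input);

echo $output; // so sad <img class="smiley" src="/image/emoticon/cry.gif" />
```

The unescaped pattern :'-( can never match because the input no longer contains a literal apostrophe.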

Replacing words with tag links in PHP

I have a text ($text) and an array of words ($tags). These words in the text should be replaced with links to other pages, without breaking the existing links in the text. In CakePHP there is a method in TextHelper for doing this, but it is broken: it corrupts the existing HTML links in the text. The method is supposed to work like this:
$text=Text->highlight($text,$tags,'\1',1);
Below there is existing code in CakePHP TextHelper:
function highlight($text, $phrase, $highlighter = '<span class="highlight">\1</span>', $considerHtml = false) {
if (empty($phrase)) {
return $text;
}
if (is_array($phrase)) {
$replace = array();
$with = array();
foreach ($phrase as $key => $value) {
$key = $value;
$value = $highlighter;
$key = '(' . $key . ')';
if ($considerHtml) {
$key = '(?![^<]+>)' . $key . '(?![^<]+>)';
}
$replace[] = '|' . $key . '|ix';
$with[] = empty($value) ? $highlighter : $value;
}
return preg_replace($replace, $with, $text);
} else {
$phrase = '(' . $phrase . ')';
if ($considerHtml) {
$phrase = '(?![^<]+>)' . $phrase . '(?![^<]+>)';
}
return preg_replace('|'.$phrase.'|i', $highlighter, $text);
}
}
You can see (and run) this algorithm here:
http://www.exorithm.com/algorithm/view/highlight
It can be made a little better and simpler with a few changes, but it still isn't perfect. Though less efficient, I'd recommend one of Ben Doom's solutions.
Replacing text in HTML is fundamentally different than replacing plain text. To determine whether text is part of an HTML tag requires you to find all the tags in order not to consider them. Regex is not really the tool for this.
I would attempt one of the following solutions:
Find the positions of all the words. Working from last to first, determine if each is part of a tag. If not, add the anchor.
Split the string into blocks. Each block is either a tag or plain text. Run your replacement(s) on the plain text blocks, and re-assemble.
I think the first one is probably a bit more efficient, but more prone to programmer error, so I'll leave it up to you.
If you want to know why I'm not approaching this problem directly, look at all the questions on the site about regex and HTML, and how regex is not a parser.
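The second approach (split into tag blocks and text blocks) might look like the sketch below. link_tags() and the /tags/ URL prefix are made up for illustration, and the split on <...> is still a simplification that assumes no stray < or > inside attribute values.

```php
<?php
// Hypothetical helper: link each tag word, but only in text that is outside
// HTML tags and outside existing <a>...</a> elements.
function link_tags($html, array $tags, $urlPrefix = '/tags/') {
    if (empty($tags)) {
        return $html;
    }
    $words = array_map(function ($tag) { return preg_quote($tag, '#'); }, $tags);
    $pattern = '#\b(' . implode('|', $words) . ')\b#i';
    $makeLink = function ($m) use ($urlPrefix) {
        return '<a href="' . $urlPrefix . rawurlencode(strtolower($m[1])) . '">' . $m[1] . '</a>';
    };
    // Even indexes hold plain text, odd indexes hold the captured tags.
    $parts = preg_split('#(<[^>]+>)#', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $insideLink = false;
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            // Track whether we are between <a ...> and </a>.
            if (preg_match('#^<a\b#i', $part)) {
                $insideLink = true;
            } elseif (preg_match('#^</a\b#i', $part)) {
                $insideLink = false;
            }
        } elseif (!$insideLink) {
            $parts[$i] = preg_replace_callback($pattern, $makeLink, $part);
        }
    }
    return implode('', $parts);
}

echo link_tags('PHP and <a href="/x">php docs</a>', array('php'));
```

All words are matched in a single pass per text block, so a replacement never gets rewritten by a later word.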
This code works just fine. What you may need to do is check the CSS for the <span class="highlight"> and make sure it is set to some color that lets you see that the text is highlighted.
.highlight { background-color: #FFE900; }
Amorphous - I noticed Gert edited your post. Are the two code fragments exactly as you posted them?
So even though the original code was designed for highlighting, I understand you're trying to repurpose it for generating links - it should, and does work fine for that (tested as posted).
HOWEVER escaping in the first code fragment could be an issue.
$text=Text->highlight($text,$tags,'\1',1);
Works fine... but if you use speech marks (double quotes) rather than single quote marks, PHP treats the backslash as the start of an escape sequence, so you need to escape it. If you don't, you get %01 links.
The correct way with speech marks is:
$text=Text->highlight($text,$tags,"\\1",1);
(Notice the use of \\1 instead of \1)
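The difference is easy to verify in isolation. A quick sketch using plain preg_replace() rather than the CakePHP helper:

```php
<?php
// Single quotes keep the backslash, so both of these are the two characters \1.
var_dump('\1' === "\\1");   // bool(true)

// In double quotes, "\1" is an octal escape for the byte 0x01.
var_dump("\1" === chr(1));  // bool(true)

// With the backslash escaped, the backreference works as intended.
echo preg_replace('|(foo)|i', "<b>\\1</b>", 'a foo b'); // a <b>foo</b> b
```

The byte 0x01 is what shows up URL-encoded as %01 in the generated links.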

PHP - Strings - Remove a HTML tag with a specific class, including its contents

I have a string like this:
<div class="container">
<h3 class="hdr"> Text </h3>
<div class="main">
text
<h3> text... </h3>
....
</div>
</div>
how do I remove the H3 tag with the .hdr class using as little code as possible ?
Using as little code as possible? Shortest code isn't necessarily best. However, if your HTML h3 tag always looks like that, this should suffice:
$html = preg_replace('#<h3 class="hdr">(.*?)</h3>#', '', $html);
Generally speaking, using regex for parsing HTML isn't a particularly good idea though.
Something like this is what you're looking for...
$output = preg_replace("#<h3 class=\"hdr\">(.*?)</h3>#is", "", $input);
Use "is" at the end of the regex: the i modifier makes the match case-insensitive, which is more flexible, and the s modifier lets the dot match newlines, so the tag can span several lines.
Stumbled upon this via Google - for anyone else feeling dirty using regex to parse HTML, here's a DOMDocument solution I feel much safer with going:
function removeTagByClass(string $html, string $className) {
$dom = new \DOMDocument();
$dom->loadHTML($html);
$finder = new \DOMXPath($dom);
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' {$className} ')]");
foreach ($nodes as $node) {
$node->parentNode->removeChild($node);
}
return $dom->saveHTML();
}
Thanks to this other answer for the XPath query.
try a preg_match, then a preg_replace on the following pattern:
/(<h3
[\s]+
[^>]*?
class=[\"\'][^\"\']*?hdr[^\"\']*?[\"\']
[^>]*?>
[\s\S\d\D\w\W]*?
<\/h3>)/i
It's messy, and it should work fine only if the h3 tag doesn't have inline javascript which might contain sequences that this regular expression will react to. It is far from perfect, but in simple cases where h3 tag is used it should work.
Haven't tried it though, might need adjustments.
Another way would be to copy that function, use your copy, without the h3, if it's possible.
This may help someone if the above solutions don't work. It removes an iframe and content carrying '-webkit-overflow-scrolling: touch;', like I had :)
RegEx, or regular expressions, describe what you would like to remove, and the PHP function preg_replace() removes every matching div or replaces it with something else. In the examples below, $incoming_data is where you put all your content before removing elements, and $result is the final product. Basically we are telling the code to find all divs with class="myclass" and replace them with ' ' (a single space).
How to remove a div and its contents by class in PHP
Just change “myclass” to whatever class your div has.
$result = preg_replace('#<div class="myclass">(.*?)</div>#', ' ',
$incoming_data);
How to remove a div and its contents by ID in PHP
Just change “myid” to whatever ID your div has.
$result = preg_replace('#<div id="myid">(.*?)</div>#', ' ', $incoming_data);
If your div has multiple classes?
Just change “myid” to whatever ID your div has like this.
$result = preg_replace('#<div id="myid(.*?)</div>#', ' ', $incoming_data);
or if div don’t have an ID, filter on the first class of the div like this.
$result = preg_replace('#<div class="myclass(.*?)</div>#', ' ', $incoming_data);
How to remove all headings in PHP
This is how to remove all headings.
$result = preg_replace('#<h1>(.*?)</h1>#', ' ', $incoming_data);
and if the heading have a class, do something like this:
$result = preg_replace('#<h1 class="myclass">(.*?)</h1>#', ' ', $incoming_data);
Source: http://www.lets-develop.com/html5-html-css-css3-php-wordpress-jquery-javascript-photoshop-illustrator-flash-tutorial/php-programming/remove-div-by-class-php-remove-div-contents/
$content = preg_replace('~(.*?)~', '', $content);
The above code only works if both halves of the div are on the same line. What if they aren't?
$content = preg_replace('~[^|]*?~', '', $content);
This works even if there is a line break in between, but fails if the rarely used | symbol appears in between. Does anyone know a better way?
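One option is the s modifier, which lets the dot match newlines without excluding any character. A sketch, where the div and its class name are placeholders rather than the markup from the post:

```php
<?php
// Assumed markup: the div and class name are placeholders.
$content = "before\n<div class=\"myclass\">\nline one | line two\n</div>\nafter";

// The "s" modifier lets "." match newlines; ".*?" keeps the match non-greedy.
$content = preg_replace('~<div class="myclass">.*?</div>~s', '', $content);

echo $content;
```

This still stops at the first </div>, so it will cut a div short if other divs are nested inside it.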

Need a regex to add css class to first and last list item

UPDATE:
Thank you all for your input. Some additional information.
It's really just a small chunk of markup (20 lines) I'm working with, and I had aimed to leverage a regex to do the work.
I also do have the ability to hack up the script (an ecommerce one) to insert the classes as the navigation is built. I wanted to limit the number of hacks I have in place to keep things easier on myself when I go to update to the latest version of the software.
With that said, I'm pretty aware of my situation and the various options available to me. The first part of my regex works as expected. I posted really more or less to see if someone would say, "hey dummy, this is easy just change this....."
After coming close with a few of my efforts, it's more of the principle at this point. To just know (and learn) a solution exists for this problem. I also hate being beaten by a piece of code.
ORIGINAL:
I'm trying to leverage regular expressions to add a CSS class to the first and last list items within an ordered list. I've tried a bunch of different ways but can't produce the results I'm looking for.
I've got a regular expression for the first list item but can't seem to figure a correct one out for the last. Here is what I'm working with:
$patterns = array('/<ul+([^<]*)<li/m', '/<([^<]*)(?<=<li)(.*)<\/ul>/s');
$replace = array('<ul$1<li class="first"','<li class="last"$2$3</ul>');
$navigation = preg_replace($patterns, $replace, $navigation);
Any help would be greatly appreciated.
Jamie Zawinski would have something to say about this...
Do you have a proper HTML parser? I don't know if there's anything like hpricot available for PHP, but that's the right way to deal with it. You could at least employ hpricot to do the first cleanup for you.
If you're actually generating the HTML -- do it there. It looks like you want to generate some navigation and have a .first and .last kind of thing on it. Take a step back and try that.
+1 to generating the right html as the best option.
But a completely different approach, which may or may not be acceptable to you: you could use javascript.
This uses jquery to make it easy ...
$(document).ready(
function() {
$('#id-of-ul li:first-child').addClass('first');
$('#id-of-ul li:last-child').addClass('last');
}
);
As I say, this may or may not be any use in this case, but I think it's a valid solution to the problem in some cases.
PS: You say ordered list, then give ul in your example. ol = ordered list, ul = unordered list
You wrote:
$patterns = array('/<ul+([^<]*)<li/m','/<([^<]*)(?<=<li)(.*)<\/ul>/s');
First pattern:
ul+ => you search something like ullll...
The m modifier is useless here, since you don't use ^ nor $.
Second pattern:
Using .* along with s is "dangerous", because you might select the whole document up to the last /ul of the page...
And well, I would just drop s modifier and use: (<li\s)(.*?</li>\s*</ul>) with replace: '$1class="last" $2'
In view of above remarks, I would write the first expression: <ul.*?>\s*<li
Although I am tired of seeing the Jamie Zawinski quote each time there is a regex question, Dustin is right in pointing you to an HTML parser (or just generating the right HTML from the start!): regexes and HTML don't mix well, because HTML syntax is complex, and unless you act on well-known, machine-generated output with a very predictable result, you are prone to get something breaking in some cases.
I don't know if anyone cares any longer, but I have a solution that works in my simple test case (and I believe it should work in the general case).
First, let me point out two things: While PhiLho is right that the s is "dangerous", since dots may match everything up to the end of the document, this may very well be what you want. It only becomes a problem with pages that are not well formed. Be careful with any such regex on large, manually written pages.
Second, php has a special meaning of backslashes, even in single quotes. Most regexen will perform well either way, but you should always double-escape them, just in case.
Now, here's my code:
<?php
$navigation='<ul>
<li>Coffee</li>
<li>Tea</li>
<li>Milk</li>
<li>Beer</li>
<li>Water</li>
</ul>';
$patterns = array('/<ul.*?>\\s*<li/',
'/<li((.(?<!<li))*?<\\/ul>)/s');
$replace = array('$0 class="first"',
'<li class="last"$1');
$navigation = preg_replace($patterns, $replace, $navigation);
echo $navigation;
?>
This will output
<ul>
<li class="first">Coffee</li>
<li>Tea</li>
<li>Milk</li>
<li>Beer</li>
<li class="last">Water</li>
</ul>
This assumes no line feeds inside the opening <ul...> tag. If there are any, use the s modifier on the first expression too.
The magic happens in (.(?<!<li))*?. This will match any character (the dot) that is not the beginning of the string <li, repeated any amount of times (the *) in a non-greedy fashion (the ?).
Of course, the whole thing would have to be expanded if there is a chance the list items already have the class attribute set. Also, if there is only one list item, it will match twice, giving it two such attributes. At least for xhtml, this would break validation.
You could load the navigation in a SimpleXML object and work with that. This prevents you from breaking your markup with some crazy regex :)
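A rough sketch of that idea follows. It assumes the navigation is well-formed XML with a single <ul> root and that the list items do not already carry a class attribute; add_first_last() is a made-up name.

```php
<?php
// Hypothetical helper: mark the first and last <li> of a well-formed <ul>.
function add_first_last($markup) {
    $ul = simplexml_load_string($markup);
    if ($ul === false) {
        return $markup; // Not well-formed XML: leave the markup untouched.
    }
    $count = count($ul->li);
    if ($count === 1) {
        // A single item is both first and last; add one attribute only.
        $ul->li[0]->addAttribute('class', 'first last');
    } elseif ($count > 1) {
        $ul->li[0]->addAttribute('class', 'first');
        $ul->li[$count - 1]->addAttribute('class', 'last');
    }
    // Serialize only the <ul> node, without an XML declaration.
    $node = dom_import_simplexml($ul);
    return $node->ownerDocument->saveXML($node);
}

echo add_first_last('<ul><li>Coffee</li><li>Tea</li><li>Milk</li></ul>');
```

Unlike the regex version, the single-item case gets one class attribute instead of two.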
As a preface .. this is waaay over-complicating things in most use-cases. Please see other answers for more sanity :)
Here is a little PHP class I wrote to solve a similar problem. It adds 'first', 'last' and any other classes you want. It will handle li's with no "class" attribute as well as those that already have some class(es).
<?php
/**
* Modify list items in pre-rendered html.
*
* Usage Example:
* $replaced_text = ListAlter::addClasses($original_html, array('cool', 'awsome'));
*/
class ListAlter {
private $classes = array();
private $classes_found = FALSE;
private $count = 0;
private $total = 0;
// No public instances.
private function __construct() {}
/**
* Adds 'first', 'last', and any extra classes you want.
*/
static function addClasses($html, $extra_classes = array()) {
$instance = new self();
$instance->classes = $extra_classes;
$total = preg_match_all('~<li([^>]*?)>~', $html, $matches);
$instance->total = $total ? $total : 0;
return preg_replace_callback('~<li([^>]*?)>~', array($instance, 'processListItem'), $html);
}
private function processListItem($matches) {
$this->count++;
$this->classes_found = FALSE;
$processed = preg_replace_callback('~(\w+)="(.*?)"~', array($this, 'appendClasses'), $matches[0]);
if (!$this->classes_found) {
$classes = $this->classes;
if ($this->count == 1) {
$classes[] = 'first';
}
if ($this->count == $this->total) {
$classes[] = 'last';
}
if (!empty($classes)) {
$processed = rtrim($matches[0], '>') . ' class="' . implode(' ', $classes) . '">';
}
}
return $processed;
}
private function appendClasses($matches) {
array_shift($matches);
list($name, $value) = $matches;
if ($name == 'class') {
$value = array_filter(explode(' ', $value));
$value = array_merge($value, $this->classes);
if ($this->count == 1) {
$value[] = 'first';
}
if ($this->count == $this->total) {
$value[] = 'last';
}
$value = implode(' ', $value);
$this->classes_found = TRUE;
}
return sprintf('%s="%s"', $name, $value);
}
}
