I have a preg_replace question. I am using preg_replace to generate contextual links in text blocks using the following code:
$contextualLinkStr = 'mytext';
$content = 'My text string which includes MYTEXT in various cases such as Mytext and mytext. However it also includes image tags such as <img src="http://www.myurl.com/mytext-1.jpg">';
$content = preg_replace('/' . $contextualLinkStr . '/i', '\\0', $content);
The preg_replace is working well on the text and generating the relevant links while retaining case but it's also generating a link within the URL of the image tag. I was thinking If I simply added a trailing space to the expression in the preg_replace function it would fix it due to the fact that all text instances will have a trailing space whereas no image urls will, as follows:
$content = preg_replace('/' . $contextualLinkStr . '/i' . ' ', '\\0' . ' ', $content);
But this doesn't work. Can anybody tell me how I make the trailing space a condition of the match?
Thanks in advance.
Jason.
I've just worked it out guys. I was being daft. For reference the relevant code is:
$content = preg_replace('/' . $contextualLinkStr . ' /i', '\\0 ', $content);
Thanks.
Related
Very(!) new to regex but...
I have the following text strings outputted from a $title variable:
A. This is a title
B. This is another title
etc...
I'm after the following:
<span>A.</span> This is a title
<span>B.</span> This is another title
etc...
Currently I have the following code:
$title = $element['#title'];
if (preg_match("([A-Z][\.])", $title)) {
return '<li' . drupal_attributes($element['#attributes']) . ">Blarg</li>\n";
} else {
return '<li' . drupal_attributes($element['#attributes']) . '>' . $output . $sub_menu . "</li>\n";
}
This replaces anything A. through to Z. with Blarg however I'm not sure how to progress this?
In the Text Wrangler app I could wrap regex in brackets and output each argument like so:
argument 1 = \1
argument 2 = \2
etc...
I know I need to add an additional regex to grab the remainder of the text string.
Perhaps a regex guru could help and novice out!
Thanks,
Steve
Try
$title = 'A. This is a title';
$title = preg_replace('/^[A-Z]\./', '<span>$0</span>', $title);
echo $title;
// <span>A.</span> This is a title
If the string contains newlines and other titles following them, add the m modifier after the ending delimiter.
If the regex doesn't match then no replacements will be made, so there is no need for the if statement.
Is it always just 2 char ("A.", "B.", "C.",...)
because then you could work with a substring instead of regex.
Just pick of the first 2 chars of the link and wrap the span around the substring
Try this (untested):
$title = $element['#title'];
if (preg_match("/([A-Z]\.)(.*)/", $title, $matches)) {
return '<li' . drupal_attributes($element['#attributes']) . "><span>{$matches[0]</span>{$matches[1]}</li>\n";
} else {
return '<li' . drupal_attributes($element['#attributes']) . '>' . $output . $sub_menu . "</li>\n";
}
The change here was to first add / to the start and end of the string (to denote it's a regex), then remove the [ and ] around the period . because that's just a literal character on its own, then to add another grouping which will match the rest of the string. I also Added a $matches to preg_match() to place these two matches in to to use later, which we do on the next life.
Note: You could also do this instead:
$title = preg_replace('/^([A-Z]\.)/', "<span>$1</span>", $title);
This will simply replace the A-Z followed by the period at the start of the string (denoted with the ^ character) with <span>, that character (grabbed with the brackets) and </span>.
Again, that's not tested, but should give you a headstart :)
I'm trying to write a code library for my own personal use and I'm trying to come up with a solution to linkify URLs and mail links. I was originally going to go with a regex statement to transform URLs and mail addresses to links but was worried about covering all the bases. So my current thinking is perhaps use some kind of tag system like this:
l:www.google.com becomes http://www.google.com and where m:john.doe#domain.com becomes john.doe#domain.com.
What do you think of this solution and can you assist with the expression? (REGEX is not my strong point). Any help would be appreciated.
Maybe some regex like this :
$content = "l:www.google.com some text m:john.doe#domain.com some text";
$pattern = '/([a-z])\:([^\s]+)/'; // One caracter followed by ':' and everything who goes next to the ':' which is not a space or tab
if (preg_match_all($pattern, $content, $results))
{
foreach ($results[0] as $key => $result)
{
// $result is the whole matched expression like 'l:www.google.com'
$letter = $results[1][$key];
$content = $results[2][$key];
echo $letter . ' ' . $content . '<br/>';
// You can put str_replace here
}
}
I am working on my Wordpress blog and its required to get the title of a post and split it at the "-". Thing is, its not working, because in the source its &ndash and when I look at the result on the website, its a "long minus" (–). Copying and pasting this long minus into some editor makes it a normal minus (-). I cant split at "-" nor at &ndash, but somehow it must be possible. When I created the article, I just typed "-" (minus), but somewhere it gets converted to – automatically.
Any ideas?
Thanks!
I think I found it. I remember that I have meet the similar problem that when I paste code in my post the quote mark transform to an em-quad one when display to readers.
I found that is in /wp-include/formatting.php line 56 (wordpress ver 3.3.1), it defined some characters need to replace
$static_characters = array_merge( array('---', ' -- ', '--', ' - ', 'xn–', '...', '``', '\'\'', ' (tm)'), $cockney );
$static_replacements = array_merge( array($em_dash, ' ' . $em_dash . ' ', $en_dash, ' ' . $en_dash . ' ', 'xn--', '…', $opening_quote, $closing_quote, ' ™'), $cockneyreplace );
and in line 85 it make an replacement
// This is not a tag, nor is the texturization disabled static strings
$curl = str_replace($static_characters, $static_replacements, $curl);
If you want to split a string at the "-" character, basically you must replace "-" with a space.
Try this:
$string_to_be_stripped = "my-word-test";
$chars = array('-');
$new_string = str_replace($chars, ' ', $string_to_be_stripped);
echo $new_string;
These lines splits the string at the "-". For example, if you have my-word-test, it will echo "my word test". I hope it helps.
For more information about the str_replace function click here.
If you want to do this in a WordPress style, try using filters. I suggest placing these lines in your functions.php file:
add_filter('the_title', function($title) {
$string_to_be_stripped = $title;
$chars = array('-');
$new_string = str_replace($chars, ' ', $string_to_be_stripped);
return $new_string;
})
Now, everytime you use the_title in a loop, the title will be escaped.
This regex is used to replace text links with a clickable anchor tag.
#(?<!href="|">)((?:https?|ftp|nntp)://[^\s<>()]+)#i
My problem is, I don't want it to change links that are in things like <iframe src="http//... or <embed src="http://...
I tried checking for a whitespace character before it by adding \s, but that didn't work.
Or - it appears they're first checking that an href=" doesn't already exist (?) - maybe I can check for the other things too?
Any thoughts / explanations how I would do this is greatly appreciated. Main, I just need the regex - I can implement in CakePHP myself.
The actual code comes from CakePHP's Text->autoLink():
function autoLinkUrls($text, $htmlOptions = array()) {
$options = var_export($htmlOptions, true);
$text = preg_replace_callback('#(?<!href="|">)((?:https?|ftp|nntp)://[^\s<>()]+)#i', create_function('$matches',
'$Html = new HtmlHelper(); $Html->tags = $Html->loadConfig(); return $Html->link($matches[0], $matches[0],' . $options . ');'), $text);
return preg_replace_callback('#(?<!href="|">)(?<!http://|https://|ftp://|nntp://)(www\.[^\n\%\ <]+[^<\n\%\,\.\ <])(?<!\))#i',
create_function('$matches', '$Html = new HtmlHelper(); $Html->tags = $Html->loadConfig(); return $Html->link($matches[0], "http://" . $matches[0],' . $options . ');'), $text);
}
You can expand the lookbehind at the beginning of those regexes to check for src=" as well as href=", like this:
(?<!href="|src="|">)
Ok, basically I have an array of bad urls and I would like to search through a string and strip them out. I want to strip everything from the opening tag to the closing tag, but only if the url in the hyperlink is in the array of bad urls. Here is how I would picture it working but I don't understand regular expressions well.
foreach($bad_urls as $bad_url){
$pattern = "/<a*$bad_url*</a>/";
$replacement = ' ';
preg_replace($pattern, $replacement, $content);
}
Thanks in advance.
Assuming that your 'bad urls' are properly formatted URLs, I would suggest doing something like this:
foreach($bad_urls as $bad_url){
$pattern = '/<[aA]\s.+[href|HREF]\=\"' . convert_to_pattern($bad_url) . '\".+<\/[aA]>/msU';
$replacement = ' ';
$content = preg_replace_all($pattern, $replacement, $content);
}
and separately
function convert_to_pattern($url)
{
searches = array('%', '&', '?', '.', '/', ';', ' ');
replaces = array('\%','\&','\?','\.','\/','\;','\ ');
return preg_replace_all($searches, $replaces, $url);
}
Please do not try to parse HTML using regular expressions. Just load up the HTML in a DOM, find all the <a> tags and check the href property. Much simpler and fool-proof.