The company I work for has asked me to let them place a modal box on a web page from the CMS, but they do not want to type HTML. I cannot for the life of me understand regex, so I can't get it working.
The layout of the code they should type is this:
++modal++
Some paragraph text.
Another paragraph.
++endmodal++
The paragraphs are already converted by markdown into <p>paragraph</p>.
So really the match has to be ++modal++, then any number of characters (A-Za-z0-9 and any symbol except +), then ++endmodal++, which is then replaced with HTML.
I'm not sure if preg_match or preg_replace should be used.
I got this far:
$string = '++modal++<p>Hello</p>++endmodal++';
$pattern = '/\+\+modal\+\+/';
preg_match($pattern, $string, $matches);
Thank you in advance.
EDIT: To be a bit more clear, I wish to replace ++modal++ and ++endmodal++ with HTML and leave the middle bit as is.
I don't really think you need a regex here, as your delimiters always stay the same and always sit at the same position in the string. Regular expressions also cost more resources than plain string functions and, as a third counterargument, you said you're not comfortable with them.
So why not use a simple replacement, or string trimming if it comes to that?
$search = array('++modal++', '++endmodal++');
$replacement = array('<tag>', '</tag>');
$str = '++modal++<p>Hello</p>++endmodal++';
$result = str_replace($search, $replacement, $str);
Where, of course, '<tag>' and '</tag>' are just example placeholders for your replacement.
This is what the manual for str_replace() says:
If you don't need fancy replacing rules (like regular expressions),
you should always use this function instead of preg_replace().
I think you should get your desired content using:
preg_match('/\+\+modal\+\+([^\+]+)\+\+endmodal\+\+/', $string, $matches);
// $matches[1] is now '<p>Hello</p>'
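Since the edit to the question asks for the markers to be replaced while the middle stays as is, a capture group with preg_replace() may be closer to what is wanted than preg_match(). This is a minimal sketch, assuming a hypothetical `<div class="modal">` wrapper (the real markup is whatever your modal script expects). It uses a lazy `(.*?)` rather than `[^\+]+`, so content that itself contains a + still matches.

```php
<?php
// Hypothetical wrapper markup; substitute whatever your modal script needs.
$string = 'Intro ++modal++<p>Hello</p><p>1 + 1 = 2</p>++endmodal++ outro';

// (.*?) is lazy, so each ++modal++ pairs with the nearest ++endmodal++.
// The s flag lets . match newlines, since the content may span several lines.
$pattern = '/\+\+modal\+\+(.*?)\+\+endmodal\+\+/s';
$result = preg_replace($pattern, '<div class="modal">$1</div>', $string);

echo $result;
```

The `$1` in the replacement puts the captured middle part back between the new tags.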
You're trying to re-invent the wheel here. You're trying to write a simple template system here, but there are dozens of templating tools for PHP that you could use, ranging from big and complex like Smarty and Twig to really simple ones that aren't much more than you're trying to write.
I haven't used them all, so rather than recommend one I'll point you to a list of template engines you could try. You'll probably find more with a quick bit of googling.
If you do insist on writing your own, it's important to consider security. If you're outputting anything that contains data entered by your users, you must make sure all your output is properly escaped and sanitised for display on a web page; there are numerous common hacks that can take advantage of an insecure templating system to completely compromise a site.
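As a rough illustration of the escaping point, here is a small sketch using htmlspecialchars(). The `$userText` value and the `<div class="modal">` wrapper are made up for the example. It applies to raw user-entered data, not to HTML your own markdown converter has already produced.

```php
<?php
// Hypothetical user-supplied value that ends up inside the modal.
$userText = '<script>alert("x")</script> & more';

// htmlspecialchars() turns &, <, > and quotes into HTML entities,
// so the browser shows the input as text instead of running it.
$safe = htmlspecialchars($userText, ENT_QUOTES, 'UTF-8');

$html = '<div class="modal"><p>' . $safe . '</p></div>';
echo $html;
```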
<?php
$string = '++modal++<p>Hello</p>++endmodal++';
$patterns = array();
$patterns[0] = "/\+\+modal\+\+/"; // put '\' just before +
$patterns[1] = "/\+\+endmodal\+\+/";
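// preg_replace() pairs patterns and replacements by array order, not by key,
// so list the replacements in the same order as the patterns
$replacements = array();
$replacements[0] = '<html>';
$replacements[1] = '</html>';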
echo preg_replace($patterns, $replacements, $string);
?>
Very similar to this example
So I started a post before but it got closed :( Since then I have managed to progress a little, realizing I need to somehow grab the content inside the [code][/code] tags and do a str_replace() on the smiley bbcode text within them. Here is what I have so far, but it's not working:
if (preg_match_all('~[code](.*?)[\/code]~i', $row['message'], $match)){
foreach($match[1] AS $key) {
$find = array(':)',':(',':P',':D',':O',';)','B)',':confused:',':mad:',':redface:',':rolleyes:',':unsure:');
$replace = array(':)',':(',':P',':D',':O',';)','B)',':confused:',':mad:',':redface:',':rolleyes:',':unsure:');
}
$message = str_replace($find, $replace, $key);
} else {
$message = $row['message'];
}
it just returns no message content at all.
if i change this line:
$message = str_replace($find, $replace, $key);
to this:
$message = str_replace($find, $replace, $row['message']);
it sort of works but replaces all smileys inside the whole message rather than just the content inside the [code][/code] tags, which I assume is being represented by $key?! ...any help please, it's causing my brain to overload!
I did find this question which is different but very relevant to mine but there was no real answer to it.
I think it might be easier for you to use an existing BBCode parser (e.g. NBBC), or at least to check how they did it. They did it in a much more intelligent way: rather than using regexps, they use a lexer, which splits the input into separate tags (shown below). Then they simply don't do anything to a [code] tag. If you still want to stay with your solution, I would, for one, split the text into arrays ($parts = explode('[code]', $bbcode), and then the same for [/code]). Every second list element should not have any other parsing applied. As stated before: there is no point in reinventing the wheel, so don't do that.
This is how their solution works:
[b]Hello![/b]
[code]
#!/usr/bin/python
import antigravity
[/code]
Python is much more awesome.
Becomes eg. an array with the following elements:
[b]Hello![/b]
[code]…[/code]
Python is much more awesome.
And then it applies the formatting it should apply. Much better and more human.
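The splitting idea can be sketched without a full lexer. The version below is only an approximation of the approach, not NBBC's actual code: it uses preg_split() with PREG_SPLIT_DELIM_CAPTURE so the [code] blocks stay in the result, and the smiley map is a made-up placeholder for real image markup.

```php
<?php
// Hypothetical smiley map; the replacement strings stand in for real image markup.
$smileys = array(':)' => '[smile]', ':(' => '[sad]');

$message = "Hi :) [code]if (a) :( b[/code] bye :(";

// With PREG_SPLIT_DELIM_CAPTURE the captured [code]...[/code] blocks are kept
// in the result: even indexes hold normal text, odd indexes hold code blocks.
$parts = preg_split('~(\[code\].*?\[/code\])~is', $message, -1, PREG_SPLIT_DELIM_CAPTURE);

foreach ($parts as $i => $part) {
    if ($i % 2 === 0) {
        // Normal text: apply the smiley replacement.
        $parts[$i] = str_replace(array_keys($smileys), array_values($smileys), $part);
    }
    // Odd indexes are [code] blocks and are left untouched.
}

$result = implode('', $parts);
echo $result;
```

To process only the inside of the [code] blocks instead, flip the parity check so the odd indexes are the ones that get changed.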
not a comment due to length.
I want to give users some formatting options like Reddit or Stack Overflow do, but want to keep it in PHP. How can I parse a string in PHP such that it recognizes patterns like **[anything here]**?
explode() doesn't seem to solve the problem as elegantly as I'd like. Should I just use nested ifs and fors based on explode()'s output? Is there a better solution here?
This has already been done countless times by others, I'd recommend using existing libraries.
For example, you can use Markdown: http://michelf.com/projects/php-markdown/
Check regular expressions:
$string = '*bold* or _underscored_ or even /italic/ ..';
// italic
$string = preg_replace('~/([^/]+)/~', '<i>$1</i>', $string);
// bold
$string = preg_replace('/\*([^\*]+)\*/', '<b>$1</b>', $string);
// underscore
$string = preg_replace('/_([^_]+)_/', '<u>$1</u>', $string);
echo $string;
output:
<b>bold</b> or <u>underscored</u> or even <i>italic</i> ..
or use a BBCode parser.
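For the exact `**...**` pattern from the question, a sketch along the same lines might look like this. The choice of `<strong>` and the escaping step are assumptions on my part: escaping first means any HTML the user typed is shown literally, and only the tags added here are live.

```php
<?php
$input = 'This is **bold** and **also <b>this</b>** text.';

// Escape first so user-typed HTML is displayed rather than interpreted.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// \*\*(.+?)\*\* matches the shortest run of text between two ** markers.
$html = preg_replace('/\*\*(.+?)\*\*/s', '<strong>$1</strong>', $escaped);

echo $html;
```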
I am trying to grab what is the h4 text
$regex = '/<h4>([A-Za-z0-9\,\.])/';
I am just getting the first letter back, I cannot figure out how to use * to keep grabbing everything to the first < character.
I have made countless attempts and know I am overlooking something simple.
So I was making that much harder than I needed to, the following works:
$regex = '/<h4>.*?<\/h4>/';
If you can trust that grabbing all characters up to the first < is a good enough rule then use this:
$regex = '/<h4>([^<]*?)</';
Of course that definition will only grab 'The ' from <h4>The <b>Best</b> Book</h4>. You can fix that by changing it to:
$regex = '/<h4>(.*?)<\/h4>/';
Which will grab everything between a <h4> and a </h4>, but still isn't perfect because anything like <h4 > or <h4 style="..."> will break it, along with a million other valid HTML examples. If you know that the contents won't have any < though, and you know your tag will always be exactly <h4> the first one works well enough for your situation.
If your situation is more complex you will want to use something like PHP's DOM extension (DOMDocument) which is meant for parsing HTML and XML, since neither are regular languages and cannot be parsed error free with regex.
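A minimal DOMDocument sketch for the h4 case, assuming the markup is available as a string (the sample HTML here is made up). Attributes on the tag and nested tags inside it are handled by the parser rather than by the pattern.

```php
<?php
$html = '<div><h4 class="title">The <b>Best</b> Book</h4><h4>Second</h4></div>';

$doc = new DOMDocument();
// Collect parser warnings internally; real-world pages are rarely valid.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$titles = array();
foreach ($doc->getElementsByTagName('h4') as $h4) {
    // textContent is the text of the element and all of its descendants.
    $titles[] = $h4->textContent;
}

print_r($titles);
```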
You can use the below function to accomplish this task.
function getTextBetweenTags($string, $tagname) {
// [^>]* skips any attributes; (.*?) lazily captures up to the first closing tag
$pattern = "/<{$tagname}\b[^>]*>(.*?)<\/{$tagname}>/s";
preg_match($pattern, $string, $matches);
return $matches;
}
In the first parameter you have to pass the complete string, and in the second parameter you have to pass the tag name ("h4").
I have an array like this:
$keywords = array( 'php', 'html', 'css' );
I have a db query to return a paragraph, which contains the keywords previously mentioned in the array.
I have a link template like this:
$linktpl = '<a href="%s" title="%s">%s</a>';
I want a simple function to scan that paragraph and on the fly, whenever it finds a keyword it converts it to a link using the link template above.
And if possible I want it to take into account singular and plural (like framework and frameworks)
and is it safe for SEO to make this automated keyword linking?
Any Ideas?
$string = 'this is the php test subject.';
// associate keywords with their urls
$urls = array(
'php' => 'http://www.php.net',
// and etc...
);
// this callback will take the matches from preg and generate the
// html link making use of the $urls dictionary
$linker = function($matches) use($urls) {
$urlKey = strtolower($matches[1]);
return sprintf(
'<a href="%s" title="%s">%s</a>',
$urls[$urlKey], $matches[1], $matches[1]
);
};
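// do the magic: quote each keyword on its own so the | separators stay unescaped
$quoted = array_map(function($k) { return preg_quote($k, '/'); }, $keywords);
$regex = '/\b(' . implode('|', $quoted) . ')\b/i';
$string = preg_replace_callback($regex, $linker, $string);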
The advantage of using regular expressions is that we can leverage the \b word-boundary assertion to handle cases such as (php), PHP., or phpp properly: the first two match, while phpp does not.
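The singular/plural part of the question could be approached by allowing an optional trailing s after the keyword. This is only a sketch for regular plurals (it will not handle forms like "libraries"), and the example.com URL is a placeholder.

```php
<?php
$keywords = array('framework', 'php');
// Hypothetical URL map; example.com is a placeholder.
$urls = array(
    'framework' => 'http://example.com/frameworks',
    'php'       => 'http://www.php.net',
);

$quoted = array_map(function ($k) { return preg_quote($k, '/'); }, $keywords);
// Group 1 is the keyword itself; the optional plural s sits in group 2.
$regex = '/\b(' . implode('|', $quoted) . ')(s?)\b/i';

$result = preg_replace_callback($regex, function ($m) use ($urls) {
    // Look the URL up by the singular form, but link the word as written.
    $url = $urls[strtolower($m[1])];
    return sprintf('<a href="%s" title="%s">%s</a>', $url, $m[0], $m[0]);
}, 'Two frameworks and one framework for PHP.');

echo $result;
```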
This will work but isn't necessarily the best way. It joins your array with pipe characters, and uses that string to build a regex. preg_replace() then does the rest. Requires that you change your link template to use the preg_replace() style instead of the printf() style
preg_replace("/\b(" . implode("|", $keywords) .")\b/", "<a href='\\1'>\\1</a>", $paragraph);
EDIT: added \b word boundaries so you only match whole words and not inner substrings.
$paragraph = /* YOUR PARAGRAPH CONTENT */;
$paragraph = str_replace( array( 'php' , 'html' , 'css' ) , array( 'PHP' , 'HTML' , 'CSS' ) , $paragraph );
First up, this can be way more complicated than it seems. Namely, this will replace words that are inside of another word, i.e. if we had script as a keyword, the term javascript would be half link, half word. I dunno if you care. One way to fix it would be to add spaces before and after the word. But again, this has its issues: what about punctuation (.,!?) etc.?
Depending on your needs you may need to do some regex and complicate it up. There is also the note that you could be creating links within links, if your text can contain links.
Just some items to think about. I think there are quite a few examples of this on SO already, so it may be worth searching this site to see what you can find. Given the overall complexity, I am not able to provide that code. If you just need the simple method, the ones the others have posted should work just fine.
Some references:
Replacing keywords in text with php & mysql
For your main question, one of the above 3 answers should suffice.
Regarding this question :
and is it safe for SEO to make this automated keyword linking?
It is safe enough, but there are some concerns which need to be addressed.
Check page 13 in this SEO Guide by Google. It is always better to have good anchor text, and I assume you won't get very good anchor text through this method.
As Brad explained, don't overdo it. So maybe have only 2-3 keywords per page, 1 link per keyword in a paragraph, and a total of 6-7 links on a page. You need to be careful not to have a lot of links.
"The title attribute specifies extra information about an element." So just dumping a keyword in there may not help.
It is always better to go for manual methods rather than automation for SEO'ing your stuff.
str_replace() will almost definitely be the fastest way of performing the search/replace
I'd suggest you first build your array of search words, then the replacements, and then perform the replace.
$searches = array("php", "html", "css");
$replacements = array();
while($row = mysql_fetch_assoc($r)) {
$replacements[] = sprintf($linktpl, $row['url'], $row['title'], $row['word']);
}
$html = str_replace($searches, $replacements, $html);
and is it safe for SEO to make this automated keyword linking?
Depends on how you make use of it. If it's too obvious and search engines see a pattern, you might wake up one day and find your sites banned from the SERPs.
Consider this string
hello awesome <a href="" rel="external" title="so awesome is cool"> stuff stuff
What regex could I use to match any occurrence of awesome which doesn't appear within the title attribute of the anchor?
So far, this is what I've come up with (it doesn't work, sadly)
/[^."]*(awesome)[^."]*/i
Edit
I took Alan M's advice and used a regex to capture every word and send it to a callback. Thanks Alan M for your advice. Here is my final code.
$plantDetails = end($this->_model->getPlantById($plantId));
$botany = new Botany_Model();
$this->_botanyWords = $botany->getArray();
foreach($plantDetails as $key=>$detail) {
$detail = preg_replace_callback('/\b[a-z]+\b/iU', array($this, '_processBotanyWords'), $detail);
$plantDetails[$key] = $detail;
}
And the _processBotanyWords()...
private function _processBotanyWords($match) {
$botanyWords = $this->_botanyWords;
$word = $match[0];
if (array_key_exists($word, $botanyWords)) {
return '' . $word . '';
} else {
return $word;
}
}
Hope this will help someone else some day! Thanks again for all your answers.
This subject comes up pretty much every day here and basically the issue is this: you shouldn't be using regular expressions to parse or alter HTML (or XML). That's what HTML/XML parsers are for. The above problem is just one of the issues you'll face. You may get something that mostly works but there'll still be corner cases where it doesn't.
Just use an HTML parser.
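As one possible shape of the parser approach, the sketch below walks only the text nodes with DOMXPath, so attribute values such as title are never touched. The sample markup and the replacement word "wonderful" are placeholders for whatever you actually want to do with each match.

```php
<?php
$html = '<p>hello awesome <a href="#" rel="external" title="so awesome is cool">stuff</a></p>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// //text() selects text nodes only; attributes are not text nodes.
foreach ($xpath->query('//text()') as $node) {
    $node->nodeValue = str_ireplace('awesome', 'wonderful', $node->nodeValue);
}

// loadHTML() wraps the fragment in html/body, so output only the body's children.
$body = $doc->getElementsByTagName('body')->item(0);
$result = '';
foreach ($body->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}

echo $result;
```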
Assuming this is related to the question you posted and deleted a little while ago (that was you, wasn't it?), it's your fundamental approach that's wrong. You said you were generating these HTML links yourself by replacing words from a list of keywords. The trouble is that keywords farther down the list sometimes appear in the generated title attributes and get replaced by mistake--and now you're trying to fix the mistakes.
The underlying problem is that you're replacing each keyword using a separate call to preg_replace, effectively processing the entire text over and over again. What you should do is process the text once, matching every single word and looking it up in your list of keywords; if it's on the list, replace it. I'm not set up to write/test PHP code, but you probably want to use preg_replace_callback:
$text = preg_replace_callback('/\b[A-Za-z]+\b/', "the_callback", $text);
"the_callback" is the name of a function that looks up the word and, if it's in the list, generates the appropriate link; otherwise it returns the matched word. It may sound inefficient, processing every word like this, but in fact it's a great deal more efficient than your original approach.
Sure, using a parsing library is the industrial-strength solution, but we all have times where we just want to write something in 10 seconds and be done. Next time you want to process the meaty text of a page, ignoring tags, try running your input through strip_tags first. This way you will get only the plain, visible text and your regex powers will again reign supreme.
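A quick sketch of the strip_tags() idea, using a string similar to the one in the question. Note that this only helps when you want to read or count the visible text; the tags are gone afterwards, so it does not give you back modified HTML.

```php
<?php
$html = 'hello awesome <a href="" rel="external" title="so awesome is cool"> stuff stuff</a>';

// strip_tags() removes the tags together with their attributes,
// so the "awesome" inside the title attribute disappears with them.
$text = strip_tags($html);

// Count whole-word occurrences in the visible text only.
$count = preg_match_all('/\bawesome\b/i', $text);

echo $count;
```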
This is so horrible I hesitate to post it, but if you want a quick hack, reverse the problem--instead of finding the stuff that isn't X, find the stuff that IS, change it, do the thing and change it back.
This is assuming you're trying to change awesome (to "wonderful"). If you're doing something else, adjust accordingly.
$string = 'Awesome is the man who <b>awesome</b> does and awesome is.';
$string = preg_replace('#(title\s*=\s*\"[^"]*?)awesome#is', "$1PIGDOG", $string);
$string = preg_replace('#awesome#is', 'wonderful', $string);
$string = preg_replace('#pigdog#is', 'awesome', $string);
Don't vote me down. I know it's a hack.