We use custom bbcode in our news posts
[newsImage]imageName.jpg[/newsImage]
I'd like to use preg_match to get the imageName.jpg from between those tags. The whole post is stored in a variable called $newsPost.
I'm new to regex and I just can't figure out the right expression to use in preg_match to get what I want.
Any help is appreciated. Also, do any of you know a good resource for learning what each of the characters in regex do?
preg_match_all('/\[newsImage\]([^\[]+)\[\/newsImage\]/i', $newsPost, $images);
The variable $images should then contain your list of matches.
http://www.php.net/manual/en/regexp.introduction.php
To answer your second question: A very good regex tutorial is regular-expressions.info.
Among other things, it also contains a regular expression syntax reference.
Since different regex flavors use a different syntax, you'll also want to look at the regex flavor comparison page.
As Rob said, but with preg_match and escaping the last ]:
preg_match('/\[newsImage\]([^\[]+)\[\/newsImage\]/i', $newsPost, $images);
$images[1] will contain the name of image file.
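A minimal sketch that puts the two calls side by side. The sample text in $newsPost is made up for illustration; in the question it would come from the stored post.

```php
<?php
// Hypothetical sample post, not taken from the question.
$newsPost = 'Intro [newsImage]first.jpg[/newsImage] text [newsImage]second.png[/newsImage]';

$pattern = '/\[newsImage\]([^\[]+)\[\/newsImage\]/i';

// preg_match stops at the first match; group 1 is the file name.
$first = null;
if (preg_match($pattern, $newsPost, $m)) {
    $first = $m[1];
}

// preg_match_all collects every match; $all[1] lists all file names.
preg_match_all($pattern, $newsPost, $all);
$names = $all[1];
```

With preg_match_all, index 0 holds the full matches including the tags, and index 1 holds only the captured file names.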
This is not exactly what you asked for, but you can replace your [newsImage] tags with <img> tags using the following code. It's not perfect, as it will fall down if you have an empty tag, e.g. [newsImage][/newsImage]
function process_image_code($text) {
    //regex looks for [newsImage]sometext[/newsImage]
    $urlReg ="/((?:\[newsImage]{1}){1}.{1,}?(?:\[\/newsImage]){1})/i";
    $pregResults = preg_split ($urlReg , $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $output = "";
    //loop array to build the output string
    for($i = 0; $i < sizeof($pregResults); $i++) {
        //if the array item has a regex match process the result
        if(preg_match($urlReg, $pregResults[$i]) ) {
            $pregResults[$i] = preg_replace ("/(?:\[\/newsImage]){1}/i","\" alt=\"Image\" border=\"0\" />",$pregResults[$i] ,1);
            // find if it has a http:// at the start of the image url
            if(preg_match("/(?:\[newsImage]http:\/\/?){1}/i",$pregResults[$i])) {
                $pregResults[$i] = preg_replace ("/(?:\[newsImage]?){1}/i","<img src=\"",$pregResults[$i] ,1);
            }else {
                $pregResults[$i] = preg_replace ("/(?:\[newsImage]?){1}/i","<img src=\"http://",$pregResults[$i] ,1);
            }
            $output .= $pregResults[$i];
        }else {
            $output .= $pregResults[$i];
        }
    }
    return $output;
}
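The same conversion can probably be done in one pass with preg_replace_callback. This is only a sketch: newsimage_to_img is a hypothetical name, the border attribute is left out, and unlike the function above it also accepts https:// URLs.

```php
<?php
// Sketch: turn [newsImage]path[/newsImage] into an <img> tag in one pass.
// newsimage_to_img is a hypothetical name, not part of the original answer.
function newsimage_to_img(string $text): string
{
    return preg_replace_callback(
        '/\[newsImage\]([^\[]+)\[\/newsImage\]/i',
        function (array $m): string {
            $src = trim($m[1]);
            // Prepend a scheme when none is given, similar to the function above.
            if (!preg_match('#^https?://#i', $src)) {
                $src = 'http://' . $src;
            }
            return '<img src="' . htmlspecialchars($src, ENT_QUOTES) . '" alt="Image" />';
        },
        $text
    );
}
```

Because the pattern requires at least one character between the tags, an empty [newsImage][/newsImage] pair is simply left untouched.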
$text = "
<tag>
<html>
HTML
</html>
</tag>
";
I want to replace all the text present inside the tags with htmlspecialchars(). I tried this:
$regex = '/<tag>(.*?)<\/tag>/s';
$code = preg_replace($regex,htmlspecialchars($regex),$text);
But it doesn't work.
I am getting the output as htmlspecialchars of the regex pattern. I want to replace it with htmlspecialchars of the data matching with the regex pattern.
What should I do?
You're replacing the match with the pattern itself, you're not using the back-references and the e-flag, but in this case, preg_replace_callback would be the way to go:
$code = preg_replace_callback($regex,'replaceCallback',$text);
This will pass the matched groups to the callback and use its return value as the replacement. The callback receives the groups as an array, so htmlspecialchars cannot be passed in directly; wrap it in a function of your own, for example:
function replaceCallback($matches)
{
    if (is_array($matches))
    {
        $matches = implode ('', array_slice($matches, 1));//first element is full string
    }
    return htmlspecialchars($matches);
}
Or, if your PHP version permits it:
$code = preg_replace_callback($regex, function($matches)
{
    $return = '';
    for ($i=1, $j = count($matches); $i<$j;$i++)
    {//loop like this, skips first index, and allows for any number of groups
        $return .= htmlspecialchars($matches[$i]);
    }
    return $return;
}, $text);
Try any of the above until you find something that works. Incidentally, if all you want to remove is <tag> and </tag>, why not go for the much faster:
echo htmlspecialchars(str_replace(array('<tag>','</tag>'), '', $text));
That's just keeping it simple, and it'll almost certainly be faster, too.
See the quickest, easiest way in action here
If you want to isolate the actual contents as defined by your pattern, you could use preg_match($regex,$text,$hits);. This will give you an array of hits: $hits[0] contains the whole matched string, and the bits that were between the parentheses in the pattern start at $hits[1]. You can then start manipulating these matches, possibly using htmlspecialchars, and combine them again into $code.
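A minimal sketch of the callback approach, assuming you want to keep the <tag> wrappers and escape only what is between them (the question does not say whether the wrappers should stay):

```php
<?php
$text = "<tag>\n<html>\nHTML\n</html>\n</tag>";
$regex = '/<tag>(.*?)<\/tag>/s';

// The callback receives the match array: $m[0] is the whole match,
// $m[1] the text captured between <tag> and </tag>.
$code = preg_replace_callback($regex, function (array $m): string {
    // Assumption: the wrappers are kept and only the inner text is escaped.
    return '<tag>' . htmlspecialchars($m[1]) . '</tag>';
}, $text);
```

Dropping the two string literals around htmlspecialchars($m[1]) would remove the wrappers instead.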
I'm using the code at the bottom to grab parameters from a wordpress shortcode. The shortcode itself looks like this:
[FLOWPLAYER=http://www.tvovermind.com/wp-content/uploads/2013/01/pll-316-21.jpg|http://www.tvovermind.com/wp-content/uploads/2013/01/PLL316_fv2.h264HD-Clip2.flv,440,280]
Or
[FLOWPLAYER=http://www.tvovermind.com/wp-content/uploads/2013/01/pll-316-21.jpg|http://www.tvovermind.com/wp-content/uploads/2013/01/PLL316_fv2.h264HD-Clip2.flv,440,280,false]
What I would like to have happen is that if the extra parameter (false/true) is missing then that match becomes "false", however with the current code if the parameter is missing a match is never made. Any ideas?
function legacy_hook($content){
    $regex = '/\[FLOWPLAYER=([a-z0-9\:\.\-\&\_\/\|]+)\,([0-9]+)\,([0-9]+)\,([a-z0-9\:\.\-\&\_\/\|]+)\]/i';
    $matches = array();
    preg_match_all($regex, $content, $matches);
    if($matches[0][0] != '') {
        foreach($matches[0] as $key => $data) {
            $content = str_replace($matches[0][$key], flowplayer::build_player($matches[2][$key], $matches[3][$key], $matches[1][$key],$matches[4][$key]),$content);
        }
    }
    return $content;
}
Your regex requires the last comma to be there, followed by one or more of the characters in the last set of brackets. Something like
/\[FLOWPLAYER=([a-z0-9\:\.\-\&\_\/\|]+)\,([0-9]+)\,([0-9]+)(\,[a-z]+)?\]/i
might be what you're after. The only issue is that you'll get the comma in the match too.
You then have to test whether the last group matched. preg_match_all returns the number of full matches, which does not tell you that. With the default flags, $matches[4] is always present and holds an empty string wherever the optional group did not match, so you could do an inline if:
(!empty($matches[4][$key]) ? ltrim($matches[4][$key], ',') : 'false')
You can add an alternation at the end of your expression:
(,true|,false|)
The empty last alternative allows the parameter to be missing. I haven't checked whether it works, but you get the idea.
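A sketch of the optional-parameter idea with the default filled in. parse_flowplayer is a hypothetical helper that only extracts the parameters, and the restriction of the last parameter to true/false is an assumption based on the examples in the question.

```php
<?php
// parse_flowplayer is a hypothetical helper; it only extracts the parameters.
function parse_flowplayer(string $content): array
{
    // (?:,(true|false))? makes the comma and the flag optional as a unit,
    // so the comma is not part of capture group 4.
    $regex = '/\[FLOWPLAYER=([a-z0-9:.\-&_\/|]+),([0-9]+),([0-9]+)(?:,(true|false))?\]/i';
    preg_match_all($regex, $content, $matches, PREG_SET_ORDER);

    $players = [];
    foreach ($matches as $m) {
        $players[] = [
            'media'  => $m[1],
            'width'  => $m[2],
            'height' => $m[3],
            // Fall back to "false" when the fourth parameter is missing.
            'flag'   => isset($m[4]) && $m[4] !== '' ? strtolower($m[4]) : 'false',
        ];
    }
    return $players;
}
```

PREG_SET_ORDER groups the captures per shortcode, which makes the "is group 4 there?" check a plain isset on each match.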
Here's my code:
$post = $_POST['test'];
$pattren='/((([http]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+#)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+#)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\/+=&#?;%#,.\w_]*)#?(?:[\w]*)?))/';
preg_match_all( $pattren, $post, $matches);
foreach($matches[0] as $match) {
$images[]= "<a href=\"$match\" target=\"_blank\" >$match</a> ";
}
for ($i = 0, $c = count($images); $i < $c; $i++) {
$html_links = str_replace($pattren,$images[$i], $post);
}
echo $html_links;
I'm trying to get all urls from $post and convert them to links, but something is wrong.
There are many things wrong with this code, including:
Not sure where you've got your regular expression ($pattren) from, but it looks like complete gibberish to me - [http]{3,9}: means "any of the characters 'h', 't', or 'p', repeated between 3 and 9 times, followed by a colon" - so it would match "thppppppt:", which doesn't look much like the beginning of a URL to me.
str_replace has nothing to do with regular expressions, so str_replace($pattren, ... is looking for the text of that regular expression in the input.
In actual fact, I'm not sure what replacement you are expecting to happen in that loop, since you've already copied $match into the correct parts of the string.
You are over-writing the variable $html_links every time around your second loop. There is also no need for 2 loops, unless there is code not shown - you could simply build the string in the foreach loop and do away with the $images array altogether.
And, incidentally, you have spelled "pattern" wrong, and used an inconsistent convention for your curly-braces - some prefer the { on its own line, some on the line with the for/foreach, but you've managed one of each. [Neither of these will affect the code, though]
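The point about [http]{3,9} is easy to check. A small sketch, where the https? pattern is only a guess at what was intended:

```php
<?php
// A character class matches single characters, not the word inside the brackets.
$classPattern = '/^[http]{3,9}:/';
// A literal scheme with an optional "s" is probably what was intended.
$schemePattern = '/^https?:/';

$classMatchesNoise  = preg_match($classPattern, 'thppppppt:');
$schemeMatchesNoise = preg_match($schemePattern, 'thppppppt:');
$schemeMatchesUrl   = preg_match($schemePattern, 'https://example.com');
```

The first call returns 1, so the character class happily accepts the nonsense string, while the literal scheme rejects it and accepts a real URL.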
Use preg_replace():
$post = $_POST['test'];
$pattern='%\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s';
$html_links = preg_replace($pattern, '<a href="$1" target="_blank">$1</a>', $post);
echo $html_links;
Updated with a good pattern from here.
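If the post text is not trusted, a callback lets you escape each URL before it goes into the markup. A minimal sketch: linkify is a hypothetical name, and the pattern is a deliberately simple stand-in, not the one from the answer.

```php
<?php
// linkify is a hypothetical helper name. The pattern is deliberately simple:
// it only recognises http:// and https:// URLs and stops at whitespace,
// angle brackets and double quotes.
function linkify(string $post): string
{
    return preg_replace_callback(
        '~\bhttps?://[^\s<>"]+~i',
        function (array $m): string {
            // Escape the URL before placing it in an attribute and in text.
            $url = htmlspecialchars($m[0], ENT_QUOTES);
            return '<a href="' . $url . '" target="_blank">' . $url . '</a>';
        },
        $post
    );
}
```

Note that this simple pattern will include trailing punctuation such as a full stop in the link, which the longer pattern above tries to avoid.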
I would like to modify HTML like
I am <b>Sadi, novice</b> programmer.
to
I am <b>Sadi, learner</b> programmer.
To do it, I will search using the string "novice programmer". How can I do this? Any ideas?
The search uses more than one word ("novice programmer"); it could be a whole sentence. Extra white space (e.g. new lines, tabs) should be ignored, and any tags must be ignored during the search. But during the replacement the tags must be preserved.
It is a sort of converter. It would be better if it were case insensitive.
Thank you
Sadi
More clarification:
I got some nice replies with possible solutions, but please keep posting if you have any ideas in mind.
I would like to clarify the problem further in case anyone missed it. The main post shows the problem as an example scenario.
1) The problem is to find and replace a string without considering the tags. Tags may show up within a single word. The string may contain multiple words. Tags only appear in the content string or the document. The search phrase never contains any tags.
We can easily remove all tags and do some text operations. But then another problem shows up.
2) The tags must be preserved, even after replacing the text. That is what the example shows.
Thank you Again for helping
ok i think this is what you want. it takes your input search and replace, splits them into arrays of strings delimited by space, generates a regexp that finds the input sentence with any number of whitespace/html tags, and replaces it with the replacement sentence with the same tags replaced between the words.
if the wordcount of the search sentence is higher than that of the replacement, it just uses spaces between any extra words, and if the replacement wordcount is higher than the search, it will add all 'orphaned' tags on the end. it also handles regexp chars in the find and replace.
<?php
function htmlFriendlySearchAndReplace($find, $replace, $subject) {
$findWords = explode(" ", $find);
$replaceWords = explode(" ", $replace);
$findRegexp = "/";
for ($i = 0; $i < count($findWords); $i++) {
$findRegexp .= preg_replace("/([\\$\\^\\|\\.\\+\\*\\?\\(\\)\\[\\]\\{\\}\\\\\\-])/", "\\\\$1", $findWords[$i]);
if ($i < count($findWords) - 1) {
$findRegexp .= "(\s?(?:<[^>]*>)?\s(?:<[^>]*>)?)";
}
}
$findRegexp .= "/i";
$replaceRegexp = "";
for ($i = 0; $i < count($findWords) || $i < count($replaceWords); $i++) {
if ($i < count($replaceWords)) {
$replaceRegexp .= str_replace("$", "\\$", $replaceWords[$i]);
}
if ($i < count($findWords) - 1) {
$replaceRegexp .= "$" . ($i + 1);
} else {
if ($i < count($replaceWords) - 1) {
$replaceRegexp .= " ";
}
}
}
return preg_replace($findRegexp, $replaceRegexp, $subject);
}
?>
here are the results of a few tests :
Original : <b>Novice Programmer</b>
Search : Novice Programmer
Replace : Advanced Programmer
Result : <b>Advanced Programmer</b>
Original : Hi, <b>Novice Programmer</b>
Search : Novice Programmer
Replace : Advanced Programmer
Result : Hi, <b>Advanced Programmer</b>
Original : I am not a <b>Novice</b> Programmer
Search : Novice Programmer
Replace : Advanced Programmer
Result : I am not a <b>Advanced</b> Programmer
Original : Novice <b>Programmer</b> in the house
Search : Novice Programmer
Replace : Advanced Programmer
Result : Advanced <b>Programmer</b> in the house
Original : <i>I am not a <b>Novice</b> Programmer</i>
Search : Novice Programmer
Replace : Advanced Programmer
Result : <i>I am not a <b>Advanced</b> Programmer</i>
Original : I am not a <b><i>Novice</i> Programmer</b> any more
Search : Novice Programmer
Replace : Advanced Programmer
Result : I am not a <b><i>Advanced</i> Programmer</b> any more
Original : I am not a <b><i>Novice</i></b> Programmer any more
Search : Novice Programmer
Replace : Advanced Programmer
Result : I am not a <b><i>Advanced</i></b> Programmer any more
Original : I am not a Novice<b> <i> </i></b> Programmer any more
Search : Novice Programmer
Replace : Advanced Programmer
Result : I am not a Advanced<b> <i> </i></b> Programmer any more
Original : I am not a Novice <b><i> </i></b> Programmer any more
Search : Novice Programmer
Replace : Advanced Programmer
Result : I am not a Advanced <b><i> </i></b> Programmer any more
Original : <i>I am a <b>Novice</b> Programmer</i> too, now
Search : Novice Programmer too
Replace : Advanced Programmer
Result : <i>I am a <b>Advanced</b> Programmer</i> , now
Original : <i>I am a <b>Novice</b> Programmer</i>, now
Search : Novice Programmer
Replace : Advanced Programmer Too
Result : <i>I am a <b>Advanced</b> Programmer Too</i>, now
Original : <i>I make <b>No money</b>, now</i>
Search : No money
Replace : Mucho$1 Dollar$
Result : <i>I make <b>Mucho$1 Dollar$</b>, now</i>
Original : <i>I like regexp, you can do [A-Z]</i>
Search : [A-Z]
Replace : [Z-A]
Result : <i>I like regexp, you can do [Z-A]</i>
I would do this:
if (preg_match('/(.*)novice((?:<.*>)?\s(?:<.*>)?programmer.*)/',$inString,$attributes)) {
    $inString = $attributes[1].'learner'.$attributes[2];
}
It should match any of the following:
novice programmer
novice</b> programmer
novice </b>programmer
novice<span> programmer
A text version of what the regex states would be something like: match any set of characters until you reach "novice" and put it into a capturing group; then maybe match something that starts with '<', has any number of characters after it and ends with '>' (but don't capture it); then match exactly one whitespace character; then maybe match another sequence that starts with '<' and ends with '>' (again not captured), which must be followed by "programmer" and any number of characters. Everything after "novice" goes into a second capture group.
I would do some specific testing though, as I may have missed some stuff. Regex is a programmer's best friend!
Well, there might be a better way, but off the top of my head (assuming that tags won't appear in the middle of words, HTML is well-formed, etc.)...
Essentially, you'll need three things (sorry if this sounds patronising, not intended that way):
1. A method of sub-string matching that ignores tags.
2. A way of making the replacement preserving the tags.
3. A way of putting it all together.
1 - This is probably the most difficult bit. One method would be to iterate through all of the characters in the source string (strings are basically arrays of characters so you can access the characters as if they are array elements), attempting to match as many characters as possible from the search string, stopping when you've either matched all of the characters or run out of characters to match. Any characters between and including '<' and '>' should be ignored. Some pseudo-code (check this over, it's late and there may be mistakes):
findMatch(startingPos : integer, subject : string, searchString : string){
    //Variables for keeping track of characters matched, positions, etc.
    inTag = false;
    matchFound = false;
    matchedCharacters = 0;
    matchStart = 0;
    matchEnd = 0;
    //Scan the subject (not the search string) from the starting position
    for(i from startingPos to length(subject) - 1){
        //Work out when entering or exiting tags, ignore tag contents
        if(subject[i] == '<' || subject[i] == '>'){
            inTag = !inTag;
        }
        else if(!inTag){
            //Check if the character matches expected in search string
            if(subject[i] == searchString[matchedCharacters]){
                if(!matchFound){
                    matchFound = true;
                    matchStart = i;
                }
                matchedCharacters++;
                //If all of the characters have been matched, return the start and end positions of the substring
                if(matchedCharacters == length(searchString)){
                    matchEnd = i;
                    return matchStart, matchEnd;
                }
            }
            else{
                //Reset counts if not found
                matchFound = false;
                matchedCharacters = 0;
            }
        }
    }
    //If no full matches were found, return error
    return -1;
}
2 - Split the HTML source code into three strings - the bit you want to work on (between the two positions returned by the matching function) and the part before and after. Split up the bit you want to modify using, for example:
$parts = preg_split("/(<[^>]*>)/",$string, -1, PREG_SPLIT_DELIM_CAPTURE);
Keep a record of where the tags are, concatenate the non-tag segments and perform substring replace on this as normal, then split the modified string up again and reassemble with the tags in place.
3 - This is the easy part, just concatenate the modified part and the other two bits back together.
I may have horribly over complicated this mind, if so just ignore me.
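Step 1 above could look roughly like this in PHP. This is a sketch under stated assumptions, not a full solution: find_ignoring_tags is a hypothetical name, the text is assumed to be single-byte, "<" and ">" are assumed to appear only as tag delimiters, $from must not point inside a tag, and runs of whitespace are not normalised.

```php
<?php
// find_ignoring_tags is a hypothetical name. It returns [start, end] byte
// offsets into $subject (end inclusive), or null when nothing matches.
function find_ignoring_tags(string $subject, string $search, int $from = 0): ?array
{
    $len = strlen($subject);
    $searchLen = strlen($search);
    if ($searchLen === 0) {
        return null;
    }
    $outerInTag = false;
    for ($start = $from; $start < $len; $start++) {
        // Never start a match on or inside a tag.
        if ($subject[$start] === '<') { $outerInTag = true; continue; }
        if ($subject[$start] === '>') { $outerInTag = false; continue; }
        if ($outerInTag) { continue; }

        $matched = 0;
        $inTag = false;
        for ($i = $start; $i < $len; $i++) {
            $ch = $subject[$i];
            if ($ch === '<') { $inTag = true; continue; }
            if ($ch === '>') { $inTag = false; continue; }
            if ($inTag) { continue; }            // skip tag contents
            if (strcasecmp($ch, $search[$matched]) !== 0) {
                break;                           // mismatch: try the next start
            }
            if (++$matched === $searchLen) {
                return [$start, $i];             // all characters matched
            }
        }
    }
    return null;
}
```

Restarting from every candidate position is slower than the single pass in the pseudo-code, but it avoids missing a match that begins in the middle of a failed attempt.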
Unless cOm's already written it, the regex would be the best way to go:
$cleaned_string = preg_replace('/<[^>]*>/', '', $raw_text);
Or something like that. I would need to research/test the regex.
Then you can just use a simple $foobar = str_replace($find, $replace_with, $cleaned_string); to find the text you want to replace.
Didn't realize he wanted to put the HTML back in. It's all regex for that, and more than I know at the moment.
Knowing what I do know, technique-wise I would probably use an expression that didn't ignore whitespace between the words, but did between the < and > brackets, then use the variable-containing abilities of regex to output.
Interesting problem.
I would use the DOM and XPath to find the closest nodes containing that text and then use substring matching to find out which bit of the string is in what node. That will involve character-per-character matching and possible backtracking, though.
Here is the first part, finding the container nodes:
<?php
error_reporting(E_ALL);
header('Content-Type: text/plain; charset=UTF-8');
$doc = new DOMDocument();
$doc->loadHTML(<<<EOD
<p>
<span>
<i>
I am <b>Sadi, novice</b> programmer.
</i>
</span>
</p>
<ul>
<li>
<div>
I am <em>Cornholio, novice</em> programmer of television shows.
</div>
</li>
</ul>
EOD
);
$xpath = new DOMXPath($doc);
// First, get a list of all nodes containing the text anywhere in their tree.
$nodeList = $xpath->evaluate('//*[contains(string(.), "programmer")]');
$deepestNodes = array();
// Now only keep the deepest nodes, because the XPath query will also return HTML, BODY, ...
foreach ($nodeList as $node) {
$deepestNodes[] = $node;
$ancestor = $node;
while (($ancestor = $ancestor->parentNode) && ($ancestor instanceof DOMElement)) {
$deepestNodes = array_filter($deepestNodes, function ($existingNode) use ($ancestor) {
return ($ancestor !== $existingNode);
});
}
}
foreach ($deepestNodes as $node) {
var_dump($node->tagName);
}
I hope that helps you along.
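For the simpler case where the search string sits entirely inside one text node, the DOM can do the replacement directly. A sketch under that assumption, with replace_in_text_nodes as a hypothetical helper name; it does not handle a phrase that spans several nodes, which is the hard part described above.

```php
<?php
// replace_in_text_nodes is a hypothetical helper. It only handles a search
// string that sits entirely inside one text node.
function replace_in_text_nodes(DOMDocument $doc, string $find, string $replace): void
{
    $xpath = new DOMXPath($doc);
    // Collect the nodes first, then modify them.
    $textNodes = iterator_to_array($xpath->query('//text()'));
    foreach ($textNodes as $node) {
        // Case-insensitive replacement inside the text only; tags are untouched.
        $node->nodeValue = str_ireplace($find, $replace, $node->nodeValue);
    }
}

$doc = new DOMDocument();
$doc->loadHTML('<p>I am <b>Sadi, novice</b> programmer.</p>');
replace_in_text_nodes($doc, 'novice', 'learner');
$bold = $doc->getElementsByTagName('b')->item(0)->textContent;
```

After the call, the text inside the <b> element reads "Sadi, learner" and the element itself is still in place.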
Since you didn't give exact specifics on what you will use this for, I will use your example of "I am sadi, novice programmer".
$before = 'I am <b>sadi, novice</b> programmer';
$after = preg_replace ('/I am (<.*>)?(.*), novice(<.*>)? programmer/','I am $1$2, learner$3 programmer',$before);
Alternatively, for any text:
$string = '<b>Hello</b>, world!';
$orig = 'Hello';
$replace = 'Goodbye';
$pattern = "/(<.*>)?$orig(<.*>)?/";
$final = "$1$replace$2";
$result = preg_replace($pattern,$final,$string);
//$result should now be '<b>Goodbye</b>, world!'
Hope that helped. :d
Edit: An example of your example, with the second piece of code:
$string = 'I am sadi, novice programmer.';
$orig = 'novice';
$replace = 'learner';
$pattern = "/(<.>)?$orig(<.>)?/";
$final = "$1$replace$2";
$result = htmlspecialchars(preg_replace($pattern,$final,$string));
echo $result;
The only problem is if you were searching for something that was more than a word long.
Edit 2: Finally came up with a way to do it across multiple words. Here's the code:
function htmlreplace($string,$orig,$replace)
{
$orig = explode(' ',$orig);
$replace = explode(' ',$replace);
$result = $string;
while (count($orig)>0)
{
$shift = array_shift($orig);
$rshift = array_shift($replace);
$pattern = "/$shift\s?(<.*>)?/";
$replacement = "$rshift$1";
$result = preg_replace($pattern,$replacement,$result);
}
$result .= implode(' ',$replace);
return $result;
}
Have fun! :d
I would like to be able to switch this...
My sample [a id="keyword" href="someURLkeyword"] test keyword test[/a] link this keyword here.
To...
My sample [a id="keyword" href="someURLkeyword"] test keyword test[/a] link this [a href="url"]keyword[/a] here.
I can't simply replace all instances of "keyword" because some are used in or within an existing anchor tag.
Note: Using PHP5 preg_replace on Linux.
Using regular expressions may not be the best way to solve this problem, but here is a quick solution:
function link_keywords($str, $keyword, $url) {
    $keyword = preg_quote($keyword, '/');
    $url = htmlspecialchars($url);
    // Split the string on all <a> tags, keeping the matched delimiters:
    $split_str = preg_split('#(<a\s.*?</a>)#i', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
    // loop through the results and process the sections between <a> tags
    $result = '';
    foreach ($split_str as $sub_str) {
        if (preg_match('#^<a\s.*?</a>$#i', $sub_str)) {
            $result .= $sub_str;
        } else {
            // split on all remaining tags
            $split_sub_str = preg_split('/(<.+?>)/', $sub_str, -1, PREG_SPLIT_DELIM_CAPTURE);
            foreach ($split_sub_str as $sub_sub_str) {
                if (preg_match('/^<.+>$/', $sub_sub_str)) {
                    $result .= $sub_sub_str;
                } else {
                    // wrap each keyword found in plain text in a link
                    $result .= preg_replace('/'.$keyword.'/', '<a href="'.$url.'">$0</a>', $sub_sub_str);
                }
            }
        }
    }
    return $result;
}
The general idea is to split the string into links and everything else. Then split everything outside of a link tag into tags and plain text and insert links into the plain text. That will prevent [p class="keyword"] from being expanded to [p class="[a href="url"]keyword[/a]"].
Again, I would try to find a simpler solution that does not involve regular expressions.
You can't do this with regular expressions alone. A regular expression simply matches a pattern, without regard to the surroundings, so it cannot tell whether a keyword sits inside an existing anchor. To do what you want, you need to parse the source into an abstract representation and then transform it into your target output.
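One way to get such an abstract representation in PHP is the DOM extension. The sketch below assumes the input is real HTML (angle brackets rather than the square brackets shown in the question) and ASCII text, so that byte offsets from strpos line up with the offsets splitText expects. link_keyword_dom is a hypothetical helper name.

```php
<?php
// link_keyword_dom is a hypothetical helper; it links occurrences of $keyword
// in text nodes that are not already inside an <a> element.
function link_keyword_dom(string $html, string $keyword, string $url): DOMDocument
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    // Only text outside existing anchors; attribute values are never text nodes.
    $nodes = iterator_to_array($xpath->query('//text()[not(ancestor::a)]'));
    foreach ($nodes as $node) {
        // Keep splitting while the keyword still occurs in the remaining text.
        while (($pos = strpos($node->nodeValue, $keyword)) !== false) {
            $match = $node->splitText($pos);              // starts at the keyword
            $rest  = $match->splitText(strlen($keyword)); // text after the keyword
            $link = $doc->createElement('a');
            $link->setAttribute('href', $url);
            $match->parentNode->replaceChild($link, $match);
            $link->appendChild($match);
            $node = $rest;
        }
    }
    return $doc;
}
```

Calling saveHTML() on the returned document gives the modified markup; the existing link and its id and href attributes are left as they were.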