I wrote a small search script for a client. It works and matching words get highlighted, but...
Imagine this situation:
search term: test
found result: Hello this is a test (a link whose href also contains "test")
In this example, both the 'test' in the href attribute and the one between the <a> tags get highlighted, which breaks the link.
How can I prevent this?
Edit:
So this is what I need: a regex replace function that replaces all matched search strings EXCEPT the ones located inside an href attribute.
You cannot parse XML with regular expressions. :( If you want a quick-and-dirty regex solution that still works in many cases, you can try this regex:
">[^<]*?(test)"
First it looks for the closing > of a tag, then it makes sure that no other tag is opened in between.
Ideally you want to parse HTML and replace only the textual parts of it.
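As a sketch of that approach, the function below loads the fragment with PHP's DOM extension, visits only text nodes (attribute values such as href are not text nodes), and wraps each match in <strong class="highlite">. It assumes ext-dom is available and that the input is a reasonably simple HTML fragment; highlightTextNodes() is a hypothetical name, and the class name is borrowed from the asker's code.

```php
<?php
// Minimal sketch: highlight a term only in text nodes, so attribute values
// such as href are never touched. Assumes ext-dom; the helper name is hypothetical.
function highlightTextNodes(string $html, string $term): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect, rather than print, parser warnings
    // Wrap the fragment in one root element; NOIMPLIED/NODEFDTD stop libxml
    // from adding <html>, <body> and a doctype around it.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    $pattern = '/(' . preg_quote($term, '/') . ')/i';
    $xpath = new DOMXPath($doc);
    // Copy the node list into an array before changing the tree.
    $textNodes = iterator_to_array($xpath->query('//text()'));

    foreach ($textNodes as $node) {
        // With a capturing group, PREG_SPLIT_DELIM_CAPTURE puts the matches at odd indexes.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) === 1) {
            continue; // term not present in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $strong = $doc->createElement('strong');
                $strong->setAttribute('class', 'highlite');
                $strong->appendChild($doc->createTextNode($part));
                $fragment->appendChild($strong);
            } elseif ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialise the children of the wrapper <div>, not the wrapper itself.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo highlightTextNodes('<a href="test.php">Hello this is a test</a>', 'test');
```

The href keeps its original value while the visible "test" is wrapped, because the search never looks at attributes in the first place.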
Got it!
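$body = $row['body'];
// Escape the term so characters such as . or ( are matched literally.
$quoted = preg_quote($search_string, '/');
// Skip any match followed by a > with no < in between, i.e. a match inside a tag.
$pattern = '/' . $quoted . '(?![^<]*>)/i';
// $0 re-inserts the text that actually matched, keeping its original case.
$replacement = '<strong class="highlite">$0</strong>';
$altered_body = preg_replace($pattern, $replacement, $body);
print($altered_body);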
I couldn't come up with a better title; my apologies. Basically, we're going to have text that looks like this:
Wow, thanks for that image. It really helps!
[quote]Here, this image may help you.
[img]http://www.url.to.image.jpg[/img]
[/quote]
The text could also appear as
Wow, thanks for that image. It really helps!
[quote="username"]Here, this image may help you.
[img]http://www.url.to.image.jpg[/img]
[/quote]
So, what we're wanting to do is grab any images that are inside a quote and replace those [img] tags with [url=http://www.url.to.image.jpg]Click here to view the image[/url]. But this operation should ONLY happen for images inside quote tags. I've looked at the various BBCode parsers for PHP but can't find anything that would be able to do this, and I'm unsure of the regex required for such a task.
You could try:
(\[quote[^]]*].*?)(?=\[img])\[img](.*?)\[/img]
with replacement string
$1[url=$2]Click here to view the image[/url]
I'm not sure of the exact PHP code. Perhaps:
$result = preg_replace('%(\[quote[^]]*].*?)(?=\[img])\[img](.*?)\[/img]%sim', '$1[url=$2]Click here to view the image[/url]', $subject);
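We must ensure that there is no [/quote] before the [img]. This can be done by using ((?!\[/quote]).)* instead of .*?; it matches one character at a time, but only where that character does not start a [/quote]: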
$regex = '#(\[quote[^]]*]((?!\[/quote]).)*)\[img](.*?)\[/img]#s';
$replace = '$1[url=$3]Click here to view the image[/url]';
$str = preg_replace($regex, $replace, $str);
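To check the tempered pattern above, here is a small sketch that wraps it in a function (quoteImagesToLinks() is a hypothetical name). One point worth stating: a single preg_replace pass converts only one [img] per [quote], because group 1 consumes the quote's text up to the image. The loop below simply reapplies the regex until nothing changes, so quotes containing several images are handled too.

```php
<?php
// Sketch around the regex above; quoteImagesToLinks() is a hypothetical helper.
function quoteImagesToLinks(string $str): string
{
    // Group 1: "[quote...]" plus any text that does not cross a "[/quote]".
    // Group 2: one character of that text (required by the tempered repetition).
    // Group 3: the image URL.
    $regex = '#(\[quote[^]]*]((?!\[/quote]).)*)\[img](.*?)\[/img]#s';
    $replace = '$1[url=$3]Click here to view the image[/url]';

    // Each pass converts at most one [img] per [quote], so repeat until stable.
    do {
        $previous = $str;
        $str = preg_replace($regex, $replace, $str);
    } while ($str !== $previous);

    return $str;
}

echo quoteImagesToLinks("Thanks!\n[quote=\"username\"]Here.\n[img]http://www.url.to.image.jpg[/img]\n[/quote]");
```

Images outside any quote, or after a [/quote], are left alone because the repetition can never reach them without crossing the closing tag.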
I have a shortcode which I want to be able to strip away depending on the context of the post. For example:
[tooltip slug="test"]Test Text[/tooltip]
I would like the output to be:
<span class="dummy">Test Text</span>
I have experimented (a lot!) with preg_replace, and I can't seem to get it to recognize that the text I want to keep starts after the ']' and ends at '[/tooltip]' without doing multiple passes.
Ideas?
Update: As so often happens, about 10 seconds after I wrote this, one of my attempts seemed to work. I don't think it's as good as the solution below, but FWIW...
$my_var .= preg_replace('/(?:\[tooltip slug=\"([^\"]*)"[^\>]*\]([^\<]*)\[\/tooltip\])/', '<span class="dummy">\\2</span>', $my_post->post_content);
Here is the simple regex you are looking for.
$result = preg_replace('%\[tooltip slug="[^"]*"]([^[]*)\[/tooltip]%',
'<span class="dummy">\1</span>', $subject);
What we do here is capture the text between the tooltip tags, and insert it in the replacement.
Let me know if you need any details.
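A quick way to try the regex above is to wrap it in a function and run it on a post that contains more than one shortcode; tooltipToSpan() is a hypothetical name. Because of ([^[]*), the tooltip text may not contain a [, so nested shortcodes are not handled by this sketch.

```php
<?php
// Sketch wrapping the answer's regex; tooltipToSpan() is a hypothetical helper.
function tooltipToSpan(string $content): string
{
    // slug="[^"]*" : the slug value, matched but discarded
    // ([^[]*)      : the text to keep, up to the next '['
    return preg_replace(
        '%\[tooltip slug="[^"]*"]([^[]*)\[/tooltip]%',
        '<span class="dummy">$1</span>',
        $content
    );
}

echo tooltipToSpan('See [tooltip slug="test"]Test Text[/tooltip] and [tooltip slug="x"]More[/tooltip].');
// prints: See <span class="dummy">Test Text</span> and <span class="dummy">More</span>.
```

preg_replace replaces every occurrence by default, so a single call handles the whole post.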
$test = preg_match('/\[([^\]]+)\]([^\[]+)\[/', '[tooltip slug="test"]Test Text[/tooltip]', $matches);
echo $matches[2];
I need to do some cleanup on strings that look like this:
$author_name = '<a href="http://en.wikipedia.org/wiki/Robert_Jones_Burdette>Robert Jones Burdette </a>';
Notice that the href attribute is missing its closing quote. I'm using DOMParser on a large table of these to extract the text, and it borks on this.
I would like to look at the string in $author_name and:
IF the first > does NOT have a " before it, replace it with "> to close the tag correctly. If it is okay, just skip and do the next step. Be sure not to replace the second > at all.
Using php regex, I haven't been able to find a working solution - I could chop up the whole thing and check its parts, but that would be slow and I think there must be a regex that can do what I want.
TIA
What you can do is find the first >, with or without a double quote (") before it, and replace it with ">. The limit of 1 makes sure no later > is touched:
$author_name = preg_replace('/(.+?)"?>(.+?)/', '$1">$2', $author_name, 1);
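A slightly tighter variant of the same idea, sketched under the assumption that every string starts with an <a href=" tag like the one in the question: anchoring with ^ and excluding > from the first group guarantees that only the first > is examined. closeFirstTag() is a hypothetical helper name. Note that it adds the quote whenever one is missing, which is exactly the rule asked for, but it would be wrong for a tag without a quoted attribute (e.g. <b>).

```php
<?php
// Hypothetical helper: make sure the first '>' is preceded by a double quote.
function closeFirstTag(string $html): string
{
    // ^([^>]*?) : everything before the first '>' (lazy, so an existing quote stays out of $1)
    // "?>       : the first '>', with or without a preceding double quote
    // limit 1   : never touch a later '>', such as the one in </a>
    return preg_replace('/^([^>]*?)"?>/', '$1">', $html, 1);
}

echo closeFirstTag('<a href="http://en.wikipedia.org/wiki/Robert_Jones_Burdette>Robert Jones Burdette </a>');
```

Already-correct strings pass through unchanged, so the function can be run over the whole table before handing it to the parser.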
http://www.barattalo.it/html-fixer/
Download that, then include it in your PHP script.
The rest is quite easy:
$dirty_html = ".....bad html here......";
$a = new HtmlFixer();
$clean_html = $a->getFixedHtml($dirty_html);
It's common for people to want to use regular expressions, but you must remember that HTML is not regular.
I am trying to grab the text of an h4 element:
$regex = '/<h4>([A-Za-z0-9\,\.])/';
I am only getting the first letter back, and I cannot figure out how to use * to keep grabbing everything up to the first < character.
I have made countless attempts and know I am overlooking something simple.
I was making that much harder than I needed to; the following works:
$regex = '/<h4>.*?<\/h4>/';
If you can trust that grabbing all characters up to the first < is a good enough rule then use this:
$regex = '/<h4>([^<]*?)</';
Of course, that definition will only grab 'The ' from <h4>The <b>Best</b> Book</h4>. You can fix that by changing it to:
$regex = '/<h4>(.*?)<\/h4>/';
Which will grab everything between a <h4> and a </h4>, but still isn't perfect because anything like <h4 > or <h4 style="..."> will break it, along with a million other valid HTML examples. If you know that the contents won't have any < though, and you know your tag will always be exactly <h4> the first one works well enough for your situation.
If your situation is more complex, you will want to use something like PHP's DOM extension (DOMDocument), which is meant for parsing HTML and XML; neither is a regular language, so neither can be parsed reliably with regex.
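A minimal sketch of the DOMDocument route, assuming the ext-dom extension is installed; h4Texts() is a hypothetical helper. Unlike the regexes above, it copes with attributes on the tag (<h4 style="...">) and with nested markup such as <b>.

```php
<?php
// Sketch: read the text of every <h4> with PHP's DOM extension instead of a regex.
// h4Texts() is a hypothetical helper name.
function h4Texts(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // don't print warnings for imperfect HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $texts = [];
    foreach ($doc->getElementsByTagName('h4') as $h4) {
        // textContent flattens nested elements: "The <b>Best</b> Book" -> "The Best Book".
        $texts[] = trim($h4->textContent);
    }
    return $texts;
}

print_r(h4Texts('<h4 style="color:red">The <b>Best</b> Book</h4><h4>Second</h4>'));
```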
You can use the function below for this task.
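function getTextBetweenTags($string, $tagname) {
    // Escape the tag name, allow optional attributes, and stop at the FIRST closing tag (lazy .*?).
    $tag = preg_quote($tagname, '/');
    $pattern = '/<' . $tag . '\b[^>]*>(.*?)<\/' . $tag . '>/si';
    // Return only the captured text (group 1), or null when there is no match.
    return preg_match($pattern, $string, $matches) ? $matches[1] : null;
}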
Pass the complete string as the first argument and the tag name ("h4") as the second.
I have the following pattern inside an HTML file that I would like to parse in PHP to get a link, but so far I don't see a solution: I am trying to use QueryPath, and my case is simply not a common DOM element:
<script>
to.addVariable("site_name","http://www.sitename.com");
</script>
I would just like to extract the link part of that pattern so I can print it.
I hope someone can recommend how to do this.
Thank you.
UPDATE: I would like to get http://www.sitename.com as a value from the code above using php, maybe with phpQuery or QueryPath.
Something like this should work, I guess:
<?PHP
$text = '
<script>
to.addVariable("site_name","http://www.sitename.com");
</script>
';
preg_match('#to\.addVariable\("site_name","([^"]+)"\);#', $text, $matches);
echo $matches[1];
?>
You can also use preg_match_all if you have more than one to.addVariable(... strings in your <script> section.
Try this regular expression:
$regex = '#to\.addVariable\("(.+?)",\s*"(.+?)"\)#';
Then use preg_match_all to get the matches. If you want to check that the URL is an actual URL, take any URL-matching regular expression and put it in place of the second .+?. As written, these patterns match anything between double quotes, so you should check that you got what you need unless you trust the source.
NOTE: " does not need to be escaped in a regex, and inside a single-quoted PHP string it needs no escaping either.
Hope I can help!
If you don't understand something drop a comment!
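Putting the two answers together, here is a sketch that uses preg_match_all to collect every to.addVariable("name","value") pair into an array. The helper name addVariables() is hypothetical, and the pattern tolerates optional whitespace around the comma, which is an assumption about how other pages might format the call.

```php
<?php
// Sketch: map every to.addVariable("name","value") call to name => value.
// addVariables() is a hypothetical helper name.
function addVariables(string $html): array
{
    // [^"]* stops each capture at the closing double quote.
    preg_match_all('#to\.addVariable\(\s*"([^"]*)"\s*,\s*"([^"]*)"\s*\)#', $html, $m, PREG_SET_ORDER);
    $vars = [];
    foreach ($m as $match) {
        $vars[$match[1]] = $match[2];
    }
    return $vars;
}

$text = '<script>
to.addVariable("site_name","http://www.sitename.com");
</script>';
$vars = addVariables($text);
echo $vars['site_name'];
```

Returning an associative array keeps the lookup by variable name simple when a page sets several variables.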