I have long struggled with languages such as PHP, JavaScript, HTML, etc., but my most troubling weakness is still regex.
Previously I got by without understanding it, but now I have reached the point where I have to use a regex function.
I want to replace an HTML tag created by a rich text editor (RTE): when I type [code] in the box and hit Enter, the RTE turns it into <div>[code]</div>
What I need is to change <div>[code]</div> into an opening HTML tag <div class="code">
I have tried using the PHP str_replace() function as below:
$content = str_replace(
'<div>[code]</div>',
'<div class="code">',
$_POST['content']
);
but it doesn't work. I think I may need to use the preg_replace() function, but I can't get it right.
Can someone show me sample code that does this?
With preg_replace(), you need to escape the [ and ] characters so that they match literal brackets.
Regex:
<(div)>\[([^\]]*)\]<\/\1>
Replacement string:
<\1 class="\2">
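A minimal PHP sketch of how this pattern and replacement might be applied. The sample $content string is made up for illustration, and ~ is chosen as the delimiter so that / needs no escaping. In the replacement, PHP accepts $1 and $2 as well as \1 and \2.

```php
<?php
// Hypothetical input, roughly what the rich text editor might submit.
$content = 'Intro <div>[code]</div> echo "hi";';

// <(div)> captures the tag name, \[([^\]]*)\] captures the text between
// the literal brackets, and \1 requires the matching closing tag.
$pattern = '~<(div)>\[([^\]]*)\]</\1>~';

// $1 is the tag name, $2 is the bracketed word used as the class name.
$replacement = '<$1 class="$2">';

$result = preg_replace($pattern, $replacement, $content);
echo $result, PHP_EOL;
```

Because the class name comes from the second capture group, the same call would also turn <div>[quote]</div> into <div class="quote">.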
Is there a regular expression that can match any of the following?
'<'+'script>'
'<s'+'cript>'
'<script'+'>'
'</'+'script>'
'</scr' + 'ipt>'
'<script></scrip'+'t>'
'<script type=text/javascript src="http://..."></scrip'+'t>'
I need to do this because HTML Tidy produces errors if I have these strings in the HTML. I want to remove them using preg_replace().
Wow, interesting, but I think a parser of some sort would be a more reliable solution.
The following regex is a bit of an abomination, but it will match what you want:
'</?(?:'\+')?(?=s).+(?=c).(?=r).+(?=i).+(?=p).+(?=t).+>'
It will also match a variety of tags that you don't want; I leave that to you:
'<scdcdacacapt type=text/javascript src="http://..."></cdscdcss'+'t>'
This is because of the javascript string in the type attribute, so if the word javascript appears inside any tag, it will match :(
Hopefully it's a starting point for you.
Use '\x3cscript\x3e' instead of '<script>'.
I'm trying to come up with a PHP regexp that would extract the content of the first [img]...[/img] tag in a text. The tag can be img or IMG.
I'd really appreciate any help.
With my poor regexp skills, I came up with the following, which doesn't work:
/[img](.+)[/img/]
Here is one example of text that should work:
http://pt.wikipedia.org/wiki/Erich_von_D%C3%A4niken]Erich Von Daniken[/url][/align] [align=center][img]http://www.ceticismoaberto.com/wp-content/uploads/2012/04/erich_von_daniken_7.jpg[/img]
It should return only:
http://www.ceticismoaberto.com/wp-content/uploads/2012/04/erich_von_daniken_7.jpg
I am using a webpage to test the regexp:
http://www.myregextester.com/index.php
The PHP code I'm using is:
$message=$post["message"];
//try to locate the first image on the post text
if (preg_match("!http://[^?#]+\.(?:jpe?g|png|gif)!Ui", $message, $matches)) {
return $matches[0];
}
The regexp above didn't work in some cases, like the one I showed before, which is why I'm trying a different approach.
You must escape all bracket characters, and perhaps your text contains line breaks. Try this:
\[img\](.|\n)*\[/img\]
This should do the trick:
/\[img\](.*?)\[\/img\]/i
The [ and ] characters must be escaped with \ because they have a special meaning to the regex parser.
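As a rough sketch, this is how the escaped pattern could be used with preg_match() on the sample text from the question. The s modifier is an addition not present in the answer; it is assumed here so that . can also cross line breaks inside the tag.

```php
<?php
// Sample text taken from the question.
$message = 'http://pt.wikipedia.org/wiki/Erich_von_D%C3%A4niken]Erich Von Daniken[/url][/align] '
    . '[align=center][img]http://www.ceticismoaberto.com/wp-content/uploads/2012/04/erich_von_daniken_7.jpg[/img]';

// i: accept img or IMG; s: let . match line breaks (assumed addition);
// the lazy .*? stops at the first [/img], so only the first tag is captured.
$firstImage = null;
if (preg_match('/\[img\](.*?)\[\/img\]/is', $message, $matches)) {
    $firstImage = $matches[1];
}

echo $firstImage, PHP_EOL;
```

preg_match() stops after the first match, which is what "the first [img] tag" asks for; preg_match_all() would collect every tag instead.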
I need to scrape some data from a website. For that I am using preg_match, but I am not able to write the regex for it. The data on the website is:
title="Russia"/></a>
<small>*</small> <a href="/profile/roman
I have written the regex as #title=\"Russia\"\/><\/a>((\n|\r)*)<small>*<\/small> <a href=\"/profile/(.+?)\"#sx
But this is not working and I don't know why. When I echo my regex, it prints #title="Russia"\/><\/a>(( | )*)*<\/small> . Where has the rest gone? And why is it not working?
Try this:
#title=\"Russia\"/></a>(\s*)<small>\*</small>\s+<a\s+href=\"/profile/(.+?)\"#sx
I have escaped the * because it's a metacharacter. Without the escape, you would match strings containing <small followed by zero or more >s.
You really should not use regexes to evaluate markup content, especially when you acquire it by scraping pages.
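A small sketch of that corrected pattern in use. The markup in the question is cut off, so the closing quote and the profile name example_user below are assumptions made purely for illustration.

```php
<?php
// Hypothetical markup: the question's sample is truncated, so the profile
// name "example_user" and the closing quote are assumed here.
$html = "title=\"Russia\"/></a>\n<small>*</small> <a href=\"/profile/example_user\">";

// Single quotes keep the backslashes intact. Under the x modifier literal
// whitespace in the pattern is ignored, which is why \s is used throughout.
$pattern = '#title="Russia"/></a>(\s*)<small>\*</small>\s+<a\s+href="/profile/(.+?)"#sx';

$profile = null;
if (preg_match($pattern, $html, $m)) {
    $profile = $m[2]; // group 1 is the whitespace, group 2 the profile name
}

echo $profile, PHP_EOL;
```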
In your case there are at least three reasons that might be responsible for breaking your regex.
Do not attempt to write your own whitespace matchers when you can simply use \s, which stands for "any whitespace character".
In regular expressions the asterisk (*) has a special meaning, which is why you can't simply use it to match a literal asterisk. If you want to collect the content inside the small element, you should use <small>(.*)</small> instead. If, on the other hand, you are actually expecting an asterisk, then you have to escape it like this: <small>\*</small>.
Your regex expects a closing quote for the href attribute on that last <a>, but in your sample markup there is none. Provided that the original page does have a closing quote, the following regex should do the trick. Note that with the x modifier literal spaces in the pattern are ignored, so whitespace is matched with \s.
#title=\"Russia\"\/><\/a>(\s*)<small>\*</small>\s+<a\s+href=\"/profile/(.+?)\"#sx
However, once again I have to advise using a DOM parser like DOMDocument for this, not only because it is much more reliable when handling markup content but also because it can interpret bad markup as well (if it's loaded as HTML, of course).
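To illustrate the DOMDocument suggestion, here is a minimal sketch. The markup is a made-up fragment modelled on the question (the image file name, link targets and example_user are all hypothetical), and the XPath expression is just one way to select the profile links.

```php
<?php
// Hypothetical markup modelled on the snippet in the question.
$html = '<p><a href="/country/russia"><img src="ru.png" title="Russia"/></a>' . "\n"
      . '<small>*</small> <a href="/profile/example_user">Example</a></p>';

libxml_use_internal_errors(true); // collect markup warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);            // loading as HTML tolerates imperfect markup

$xpath = new DOMXPath($doc);
$profiles = [];
// Select every link whose href starts with /profile/ and keep the rest of the path.
foreach ($xpath->query('//a[starts-with(@href, "/profile/")]') as $link) {
    $profiles[] = substr($link->getAttribute('href'), strlen('/profile/'));
}

print_r($profiles);
```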
I'm working in WordPress and need to be able to remove images and empty paragraphs. So far, I've found out how to remove images without a problem, but I then need to remove the empty paragraph tags left behind. I'm using PHP preg_replace to handle the regex functions.
So, as an example, I have the string:
<p style="text-align:center;"><img src="http://www.blah.com/image.jpg" alt="Blah Image" /></p><p>Some text</p>
I run this regex on it:
/<img.*?(>)/
And I end up with this string:
<p style="text-align:center;"></p><p>Some text</p>
I then need to be able to remove the empty paragraph. I tried this, but it removes all paragraphs and the contents of the paragraphs:
/<p[^>]*><\/p[^>]*>/
Any help/suggestions is greatly appreciated!
The correct regex is no regex. Use an HTML/DOM Parser instead. They're simple to use. Regex is for regular languages (which HTML is not).
/<p[^>]*><\/p[^>]*>/ (the regex you gave) should work fine. If it's giving you trouble you could try double-escaping the / like this: /<p[^>]*><\\/p[^>]*>/
PHP is funny about quoting and escape characters. For example "\n" is not equal to '\n'. The first is a line break, the second is a literal backslash followed by an 'n'. The PHP manual entry on string literals is probably worth a quick look.
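A short sketch that runs both patterns from the question on its sample string, written with single-quoted PHP strings so the backslash in \/ reaches the regex engine unchanged. The variable names are chosen for illustration only.

```php
<?php
// Sample string from the question.
$html = '<p style="text-align:center;"><img src="http://www.blah.com/image.jpg" alt="Blah Image" /></p><p>Some text</p>';

// Step 1: remove the image tags.
$withoutImages = preg_replace('/<img.*?(>)/', '', $html);

// Step 2: remove paragraphs that contain nothing between <p ...> and </p>.
$clean = preg_replace('/<p[^>]*><\/p[^>]*>/', '', $withoutImages);

echo $clean, PHP_EOL;

// The quoting difference mentioned above: "\n" is one character (a line break),
// '\n' is two characters (a backslash and the letter n).
echo strlen("\n"), ' ', strlen('\n'), PHP_EOL;
```

On this input the second pattern only removes the paragraph that is actually empty, so if whole paragraphs disappear in practice, the cause is likely elsewhere (for example in how the pattern string is quoted).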
I need to preg_match for
src="http:// "
where the blank space following // is the rest of the URL, ending with the ". My adapted pattern doesn't seem to work:
preg_match('#src="(http://[^"]+)#', $data, $match);
I am also struggling to get text that starts with > and ends with either a full stop (.), an exclamation mark (!), or a question mark (?). I have no idea how to do this one. An example of the text I want to preg_match is:
blahblahblah>Hello world this is what I want.
I'm hoping a kind preg_match guru can tell me the answer and save me hours of headscratching.
Thanks for reading.
As for the URL:
preg_match('#src="(.*?)"#', $data, $match);
and for the second case, use />(.*?)(\.|!|\?)/
(.*?)" matches lazily: it takes as few characters as possible and stops at the first closing double quote.
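Both patterns can be tried with a short sketch like the one below. The $data string is a hypothetical tag using example.com, while $text is the sample from the question.

```php
<?php
// Hypothetical markup for the first case; sample text from the question for the second.
$data = '<img src="http://example.com/image.jpg" alt="demo">';
$text = 'blahblahblah>Hello world this is what I want.';

// Case 1: everything between src=" and the next double quote.
$url = preg_match('#src="(.*?)"#', $data, $m) ? $m[1] : null;

// Case 2: text after > up to the first . ! or ?; group 2 holds the
// terminating punctuation, appended here to rebuild the sentence.
$sentence = preg_match('/>(.*?)(\.|!|\?)/', $text, $m) ? $m[1] . $m[2] : null;

echo $url, PHP_EOL;
echo $sentence, PHP_EOL;
```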
It seems that you want to parse a document or string that follows an HTML, DOM, XML, or similar structure.
Use XPath: navigate to the tag and have it return the src attribute. This will save you a lot of trouble, and you can forget about regular expressions.
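One possible shape of the XPath approach, sketched with PHP's DOMDocument and DOMXPath. The markup and the example.com URLs are invented for illustration.

```php
<?php
// Hypothetical markup with two images.
$data = '<div><img src="http://example.com/a.jpg"><img src="http://example.com/b.jpg"></div>';

libxml_use_internal_errors(true); // collect markup warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($data);

$xpath = new DOMXPath($doc);
$sources = [];
// //img/@src selects the src attribute node of every img element.
foreach ($xpath->query('//img/@src') as $attr) {
    $sources[] = $attr->nodeValue;
}

print_r($sources);
```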