<h1>title 1</h1 w:id="0"/><p>content</p><h1>title 2</h1 w:id="1"/>...
I want to remove the w:id="0"/ part from each closing tag such as </h1 w:id="0"/> (the id may be 0, 1, etc.).
I use this code:
preg_replace("</h1 (.*?)>",'',$html)
But it doesn't work anymore
Try this:
preg_replace("/<\/h1 (.*?)>/i",'</h1>',$html);
You are missing the delimiters on your regex.
A regex needs a starting mark and a matching finishing mark, so PHP can interpret its content as the pattern, followed by any flags.
Without recognized delimiters, it's impossible to differentiate between plain text and a regex.
Try this regex:
#</h1 (.*?)>#
Or:
~</h1 (.*?)>~
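PHP supports a few more delimiters, such as / and bracket pairs like <>. That is why your original pattern misbehaves: PHP reads the outer < and > as the delimiters, so the pattern it actually runs is just /h1 (.*?).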
As a side note, I would suggest the following regex:
~</h1( [^>]+)?>~i
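To see the suggested pattern in action, here is a minimal sketch. The input string is modeled on the markup from the question, and the variable name $clean is only illustrative.

```php
<?php
// Sample input modeled on the markup in the question.
$html = '<h1>title 1</h1 w:id="0"/><p>content</p><h1>title 2</h1 w:id="1"/>';

// "~" is the delimiter, so the "/" in "</h1" needs no escaping.
// "( [^>]+)?" optionally consumes a space plus everything up to the next ">".
$clean = preg_replace('~</h1( [^>]+)?>~i', '</h1>', $html);

echo $clean, PHP_EOL;
```

Replacing with '</h1>' rather than an empty string keeps the closing tag itself in place.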
I have HTML code like this:
##user_name
How can I remove the first # using preg_match?
I just want the preg_replace code.
Match the # that comes right before the <a tag. Try the following regular expression:
preg_replace("/#(<a)/", "$1", '##user_name');
preg_match — Perform a regular expression match
preg_replace — Perform a regular expression search and replace
So I guess you really need the second:
preg_replace('/^#/', '', '##user_name');
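A small sketch comparing the two suggestions side by side. The anchor tag in $html is an assumption: the sample in the question shows only "##user_name", while the first answer refers to an <a tag.

```php
<?php
// Hypothetical markup: the <a> tag and its href are assumptions for illustration.
$html = '##<a href="/users/1">user_name</a>';

// Remove a "#" only when it is the very first character of the string.
$fromStart = preg_replace('/^#/', '', $html);

// Remove the "#" that directly precedes "<a"; $1 puts the captured "<a" back.
$beforeTag = preg_replace('/#(<a)/', '$1', $html);

echo $fromStart, PHP_EOL;
echo $beforeTag, PHP_EOL;
```

On this input both calls leave a single # in front of the link; they differ only in which of the two # characters is dropped.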
I am running preg_replace on an HTML page. My pattern is meant to wrap certain words in a surrounding tag. However, sometimes my regular expression modifies the HTML tags themselves. For example, when I try to replace this text:
yasar
So that yasar reads <span class="selected-word">yasar</span>, my regular expression also replaces yasar in the alt attribute of the anchor tag. The preg_replace() I am currently using looks like this:
preg_replace("/(asf|gfd|oyws)/", '<span class=something>${1}</span>',$target);
How can I write a regular expression that doesn't match anything inside an HTML tag?
You can use an assertion for that, as you just have to ensure that the searched words occur somewhere after a >, or before any <. The latter test is easier to accomplish, as lookahead assertions can be variable length:
/(asf|foo|barr)(?=[^>]*(<|$))/
See also http://www.regular-expressions.info/lookaround.html for a nice explanation of that assertion syntax.
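Here is a rough sketch of that lookahead applied to a link. The sample markup and the word list are assumptions chosen to mirror the question, not the asker's real data.

```php
<?php
// Hypothetical input: "yasar" appears in an attribute, as link text, and as plain text.
$target = '<a href="#" alt="yasar">yasar</a> and yasar';

// The lookahead succeeds only if the next angle bracket after the word
// is "<" (or the string ends), i.e. the word is not inside a tag.
$pattern = '/(yasar)(?=[^>]*(<|$))/';
$result  = preg_replace($pattern, '<span class="selected-word">$1</span>', $target);

echo $result, PHP_EOL;
```

The occurrence inside alt="..." is left alone because the first bracket that follows it is a closing >.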
Yasar, resurrecting this question because it had another solution that wasn't mentioned.
Instead of just checking that the next tag character is an opening tag, this solution skips all <full tags>.
With all the disclaimers about using regex to parse html, here is the regex:
<[^>]*>(*SKIP)(*F)|word1|word2|word3
In code, it looks like this:
$target = "word1 <a skip this word2 >word2 again</a> word3";
$regex = "~<[^>]*>(*SKIP)(*F)|word1|word2|word3~";
$repl= '<span class="">\0</span>';
$new=preg_replace($regex,$repl,$target);
echo htmlentities($new);
Reference
How to match pattern except in situations s1, s2, s3
How to match a pattern unless...
This might be the kind of thing that you're after: http://snipplr.com/view/3618/
In general, I'd advise against this approach. A better alternative is to strip out all HTML tags and rely on BBCode instead, such as:
[b]bold text[/b] [i]italic text[/i]
However I appreciate that this might not work well with what you're trying to do.
Another option may be HTML Purifier, see: http://htmlpurifier.org/
Off the top of my head, this should work:
echo preg_replace("/<(.*)>(.*)<\/(.*)>/i","<$1><span class=\"some-class\">$2</span></$3>",$target);
But, I don't know how safe this would be. I am just presenting a possibility :)
/any_string/any_string/any_number
with this regular expression:
/(\w+).(\w+).(\d+)/
It works, but I need this url:
/specific_string/any_string/any_string/any_number
And I don't know how to get it. Thanks.
/(specific_string).(\w+).(\w+).(\d+)/
Note, though, that each . in your regular expression technically matches any character, not just the /. To match only slashes, escape them:
/(specific_string)\/(\w+)\/(\w+)\/(\d+)/
This will have it match only slashes.
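The difference is easy to check with a quick sketch. The test strings and the names $loose and $strict are made up for illustration.

```php
<?php
$loose  = '/(\w+).(\w+).(\d+)/';    // "." accepts any single character as separator
$strict = '/(\w+)\/(\w+)\/(\d+)/';  // only a literal "/" is accepted

$path  = '/foo/bar/12';  // slash-separated, as in the question
$other = 'foo-bar-12';   // same words, but separated by hyphens

$results = [
    'loose_path'   => preg_match($loose, $path),
    'strict_path'  => preg_match($strict, $path),
    'loose_other'  => preg_match($loose, $other),
    'strict_other' => preg_match($strict, $other),
];

print_r($results);
```

Only the strict pattern rejects the hyphen-separated string.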
This one will match the second url:
"/(\w+)\/(\w+)\/(\w+)\/(\d+)/"
/\/specific_string\/(\w+).(\w+).(\d+)/
Just insert the specific_string in the regexp:
/specific_string\/(\w+)\/(\w+)\/(\d+)/
Another variant with the outer delimiters changed to avoid extraneous escaping:
preg_match("#/FIXED_STRING/(\w+)/(\w+)/(\d+)#", $_SERVER["REQUEST_URI"], $matches);
I would use something like this:
"/\/specific_string\/([^\/]+)\/([^\/]+)\/(\d+)/"
I use [^\/]+ because that will match anything that is not a slash. \w+ will work almost all the time, but this will also work if there is an unexpected character in the path somewhere. Also note that my regex requires the leading slash.
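As a sketch of why [^\/]+ is the more forgiving choice, compare it with \w+ on a path containing a hyphen and a dot. The URL is invented for illustration, and ~ is used as the delimiter here purely so the slashes need no escaping.

```php
<?php
$wordOnly = '~/specific_string/(\w+)/(\w+)/(\d+)~';      // segments limited to word characters
$nonSlash = '~/specific_string/([^/]+)/([^/]+)/(\d+)~';  // segments may hold anything but "/"

// Hypothetical URL with characters that \w does not cover.
$url = '/specific_string/my-page/v1.2/42';

$wordOnlyHit = preg_match($wordOnly, $url);
$nonSlashHit = preg_match($nonSlash, $url, $m);

echo "word-only: $wordOnlyHit, non-slash: $nonSlashHit", PHP_EOL;
print_r($m);
```

With the non-slash version the three captured segments end up in $m[1], $m[2] and $m[3].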
If you want to get a little more complicated, the following regex will match both of the patterns you provided by making the leading segment optional:
"/^(?:\/specific_string)?\/([^\/]+)\/([^\/]+)\/(\d+)$/"
This will match:
"/any_string/any_string/any_number"
"/specific_string/any_string/any_string/any_number"
but it will not match
"/some_other_string/any_string/any_string/any_number"
Since I am completely useless at regex and this has been bugging me for the past half an hour, I think I'll post this up here as it's probably quite simple.
hey.exe
hey2.dll
pomp.jpg
In PHP I need to extract what's between the <a> tags, for example:
hey.exe
hey2.dll
pomp.jpg
Avoid using '.*' even if you make it ungreedy, until you have some more practice with RegEx. I think a good solution for you would be:
'/<a[^>]+>([^<]+)<\/a>/i'
Note the '/' delimiters - you must use the preg suite of regex functions in PHP. It would look like this:
preg_match_all($pattern, $string, $matches);
// matches get stored in '$matches' variable as an array
// matches in between the <a></a> tags will be in $matches[1]
print_r($matches);
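Put together, a self-contained sketch might look like this. The href values in $string are assumptions, since the question shows only the link texts.

```php
<?php
// Hypothetical input: three links whose texts match the question's example.
$string = '<a href="hey.exe">hey.exe</a>' . "\n"
        . '<a href="hey2.dll">hey2.dll</a>' . "\n"
        . '<a href="pomp.jpg">pomp.jpg</a>';

// [^>]+ stays inside the opening tag; [^<]+ captures text up to the next tag.
$pattern = '/<a[^>]+>([^<]+)<\/a>/i';

// preg_match_all returns the number of full matches found.
$count = preg_match_all($pattern, $string, $matches);

print_r($matches[1]); // the captured link texts
```

Note that [^<]+ assumes the link text contains no nested tags.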
This appears to work:
$pattern = '/<a.*?>(.*?)<\/a>/';
([^<]*)
I found this regular expression tester to be helpful.
Here is a very simple one:
<a.*>(.*)</a>
However, you should be careful if you have several matches in the same line, e.g.
hey.exehey2.dll
In this case, the correct regex would be:
<a.*?>(.*?)</a>
Note the '?' after the '*' quantifier. By default, quantifiers are greedy, which means they consume as many characters as they can (so the first pattern would return only "hey2.dll" in this example). By appending a question mark, you make them ungreedy, which should better fit your needs.
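The greedy and ungreedy behavior can be compared directly with a short sketch. The single-line input is an assumption that puts two links next to each other, as in the example above.

```php
<?php
// Hypothetical input: two links on the same line.
$line = '<a href="hey.exe">hey.exe</a><a href="hey2.dll">hey2.dll</a>';

// Greedy: ".*" runs as far as it can, so the match spans both links.
preg_match_all('~<a.*>(.*)</a>~', $line, $greedy);

// Ungreedy: ".*?" stops at the first possible position, one match per link.
preg_match_all('~<a.*?>(.*?)</a>~', $line, $lazy);

print_r($greedy[1]);
print_r($lazy[1]);
```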