Recently, I've been playing with BBCode in phpBB3. When I looked at a random post in the posts table of my database, I found that the image tag is written like this: [img:fcjsgy5j]. There are 8 random characters generated after [img: for each post.
[img:fcjsgy5j]http://imageurl.jpg[/img]
My question is: how can I use preg_replace() to strip the random characters and turn the tag into this?
<img src="http://imageurl.jpg">
$output = preg_replace("`\[img:.+?\](.*?)\[/img\]`i", '<img src="$1"/>', $input);
[ begins a character set. We don't want that; we want to match the literal [ character, so we have to escape it with a \
. matches any character
+ means we match 1 or more of the previous thing (any character)
? makes the previous quantifier ungreedy (.+ would match everything, right to the very end of the string; that's not what we want, we want it to match as little as possible... just up to the next ])
(.*?) matches all the junk between the [img] tags. Ungreedy again. We put () around it to make it a capturing group
The ` (back-tick) at the start and the end could be any character... whatever character you start with, you have to end with. A lot of people use / but I prefer the back-tick because it rarely appears anywhere inside the regular expression, thus I don't need to escape it.
The i at the very end means the expression will be case insensitive (it will match img, IMG, ImG, etc.)
The $1 in the replace refers back to the () section we denoted earlier... it basically takes whatever was matched there, and plops it into the place of $1
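As a quick check of the pattern explained above, here is a minimal sketch run against a post containing two image tags. The example.com URLs and the second random ID are made up for illustration; only the pattern and replacement come from the answer.

```php
<?php
// Two tags with different random IDs and different letter case (sample data, assumed).
$input = '[img:fcjsgy5j]http://example.com/1.jpg[/img] text [IMG:abcd1234]http://example.com/2.jpg[/IMG]';

// Same pattern as above: back-tick delimiters, ungreedy quantifiers, i flag.
$output = preg_replace('`\[img:.+?\](.*?)\[/img\]`i', '<img src="$1"/>', $input);

echo $output, "\n";
// Prints: <img src="http://example.com/1.jpg"/> text <img src="http://example.com/2.jpg"/>
```

Because both quantifiers are ungreedy, each tag is replaced on its own instead of the match running from the first [img: to the last [/img].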
$result = preg_replace('%\[img:[^]]+\]([^[]+)\[/img\]%', '<img src="\1">', $subject);
or, as a commented regex:
$result = preg_replace(
'%\[img: # match [img:
[^]]+ # match one or more non-] characters
\] # match ]
([^[]+) # match one or more non-[ characters
\[/img\] # match [/img]
%x',
'<img src="\1">', $subject);
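Try this code:
<?php
// Each pattern needs delimiters; here / is used, so the / in [/img] is escaped
$search = array(
'/\[img:.+?\](.*?)\[\/img\]/'
);
// $1 refers to the only capturing group, the image URL
$replace = array(
'<img src="$1">'
);
$result = preg_replace($search, $replace, $string);
?>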
I used the array form of preg_replace so that you can add more search and replace patterns in the future. I think you are trying to replace some BBCode tags. There are plenty of libraries on the net that handle BBCode correctly.
Edited
Like this one :
http://php.net/manual/en/book.bbcode.php
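To show what the array form buys you, here is a small sketch with a second pattern added. The [b:...] tag and its <strong> replacement are my own illustrative assumption, written in the same shape as the [img:...] tag from the question; they are not taken from phpBB's source.

```php
<?php
// Patterns and replacements are matched up by position in the two arrays.
$search = array(
    '/\[img:[^\]]+\](.*?)\[\/img\]/i',
    '/\[b:[^\]]+\](.*?)\[\/b\]/i',   // hypothetical second tag, for illustration only
);
$replace = array(
    '<img src="$1">',
    '<strong>$1</strong>',
);

$string = '[b:fcjsgy5j]Look:[/b] [img:fcjsgy5j]http://imageurl.jpg[/img]';
$result = preg_replace($search, $replace, $string);

echo $result, "\n";
// Prints: <strong>Look:</strong> <img src="http://imageurl.jpg">
```

preg_replace applies the patterns one after another, each to the result of the previous one, so the order of the arrays matters if the patterns can overlap.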
Here is my code
<img src="folder/img1.jpg?somestring">
<img src="folder/img2.jpg?somediffstring">
I want to replace somestring and somediffstring with another string throughout the whole HTML, using a regular expression or anything else. Please suggest a regular expression for PHP.
First of all, you shouldn't parse HTML with Regular Expressions.
Solution 1
Now, if you are exclusively parsing img tags, you could come up with a satisfying enough solution like this:
(\b\.jpg|\b\.png)\?(.+?)\"
That is:
(\b\.jpg|\b\.png) # 1st Capturing Group
\b\.jpg # 1st Alternative: match ``.jpg`` literally
\b\.png # 2nd Alternative: match ``.png`` literally
\? # Match the character ? literally
(.+?) # 2nd Capturing Group
.+? # Match any character between one and unlimited times,
# as few times as possible, expanding as needed.
\" # Match the character " literally
Problem
What's the problem? We are not checking if we are inside an img tag. This will match everywhere in the HTML.
Solution 2
Let's add the check for img > src:
<img.+?src=\".*?(\b\.jpg|\b\.png)\?(.+?)\"
That is:
<img # Match ``<img`` literally
.+? # Match any character between one and unlimited times,
# as few times as possible, expanding as needed.
# Needed in case there are rel or alt options inside the img tag.
src=\" # Match ``src="`` literally
... # The rest is same as before.
Problem
Does this really do its job? Apparently yes, but in reality no.
Consider the following HTML code
<img src="" />
<div style="background-image: url(../images/test-background.jpg?)">
blah blah
</div>
It shouldn't match, right? But it does (if you remove the line breaks). The regular expression above starts the match at <img src=" and stops at the " that closes the div's style attribute. The capturing group will contain the characters between ? and ", which here is just ), so substituting it will break the HTML.
This was just an example, but many other situations will match even if they should not.
Other solutions...?
No matter how many constraints you can add to your RegEx and how sophisticated it becomes... HTML is a Context-Free Language and it can't be captured by a Regular Expression, which only recognizes Regular Languages.
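For comparison, here is a rough sketch of the non-regex route using PHP's bundled DOM extension. It assumes the extension is enabled and that every query string should simply become ?bar; the wrapping <div> and the replacement value are made up for the example.

```php
<?php
// Sample markup based on the question; the wrapping <div> is an assumption.
$html = '<div><img src="folder/img1.jpg?somestring"><img src="folder/img2.jpg?somediffstring"></div>';

$doc = new DOMDocument();
// The two flags stop libxml from adding <html>/<body> wrappers and a doctype.
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

foreach ($doc->getElementsByTagName('img') as $img) {
    $src = $img->getAttribute('src');
    $pos = strpos($src, '?');
    if ($pos !== false) {
        // Keep everything before the ? and attach the new query string.
        $img->setAttribute('src', substr($src, 0, $pos) . '?bar');
    }
}

$out = $doc->saveHTML();
echo $out;
```

Only real src attributes of real img elements are touched here, so a stray .jpg? inside a style attribute, as in the div example above, is left alone.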
In PHP
Still sure you're gonna use Regular Expressions? Alright, then your PHP function is preg_replace. You only need to keep in mind that it will replace everything that matched, not only the capturing groups. Hence, you need to wrap what you want to "remember" into another capturing group:
$str = '<img src="folder/img1.jpg?foo">';
$pattern = '/(<img.+?src=\".*?(\b\.jpg|\b\.png)\?)(.+?)(\")/';
$replacement = '$1' . 'bar' . '$4';
$str_replaced = preg_replace($pattern, $replacement, $str);
// Now you have $str_replaced = '<img src="folder/img1.jpg?bar">';
With reference to this question: How can I use the captured group in the same regex
Suppose you want to change img1.jpg?somestring to img1.jpg?somestringAAA
and img2.jpg?somediffstring to img2.jpg?somediffstringAAA
Search for: src="([a-zA-Z.0-9_]*)[?]([a-zA-Z.0-9_]*)">
Replace with: src="$1?$2AAA">
Here $1 represents whatever is inside the first pair of parentheses (), i.e., img1.jpg,
and $2 represents the second pair of parentheses.
UPDATE:
$string = 'img1.jpg?somestring';
$pattern = '/([a-zA-Z.0-9_]*)[?]([a-zA-Z.0-9_]*)/i';
$replacement = '$1?$2AAA';
echo preg_replace($pattern, $replacement, $string);
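You can do it this way:
<?php
$url_value = "folder/img2.jpg?somediffstring";
// Keep everything before the ?, i.e. drop the query string; append the new string afterwards if needed
echo $url = substr($url_value, 0, strpos($url_value, "?"));
?>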
You can use the regex \?(\w*)"
If you want to replace somestring and somediffstring with xx, search with the regex \?(\w*)" and use ?xx" as the replacement value (the closing quote is part of the match, so it has to be put back).
https://regex101.com/r/S5pPuW/1
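A minimal sketch of that regex in PHP, assuming the two img lines from the question as input and xx as the new value:

```php
<?php
$html = '<img src="folder/img1.jpg?somestring">' . "\n"
      . '<img src="folder/img2.jpg?somediffstring">';

// The match includes the closing quote, so the replacement has to restore it.
$out = preg_replace('/\?(\w*)"/', '?xx"', $html);

echo $out, "\n";
// Prints:
// <img src="folder/img1.jpg?xx">
// <img src="folder/img2.jpg?xx">
```

Note that this pattern is not tied to img tags at all; it rewrites any ?word" sequence in the input.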
I've been stressing over this for the last day and just can't seem to get the right preg_replace regex combination. As always, any help is really appreciated.
My code is as follows. I just can't seem to target the . within the title, likely to be an ...
$content_title_spanned = preg_replace('/<h([1-6]{1})>\.<\/h\\1>/si', '<span class="full-stop">.</span>', $content);
Here is a regex that finds the first . within header tags (<h1> to <h6>) and replaces the . with the HTML you specified. The trick is using () to capture the text before and after the period as well, and substituting those strings back in the replacement with $1 and $4. I also use non-greedy capturing *? to ensure that if there are multiple <h1> matches in the string, it only matches the contents of the first one, not from the start of the first to the end of the last.
Search regex, with isx flags:
( <h([1-6])> .*? )
(\.)
( .*? <\/h\2> )
The x flag lets you write whitespace in the regex to make it clearer. If you prefer, you can write the above on one line, with less whitespace: (<h([1-6])> .*?) (\.) (.*? <\/h\2>)
Replacement string:
$1<span class="full-stop">.</span>$4
Example
<h1>My <b>cool</b> book.</h1>
<p>test</p>
is changed into
<h1>My <b>cool</b> book<span class="full-stop">.</span></h1>
<p>test</p>
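Wired into PHP, the search regex and replacement string above might look like this. It is only a sketch: I switched the delimiter to ~ so the / in the closing tag needs no escaping, and dropped the x flag by writing the pattern without whitespace.

```php
<?php
$content = "<h1>My <b>cool</b> book.</h1>\n<p>test</p>";

// Group 1: opening tag plus text before the period, group 2: heading level,
// group 3: the period, group 4: the rest up to the matching closing tag.
$pattern = '~(<h([1-6])>.*?)(\.)(.*?</h\2>)~is';
$replacement = '$1<span class="full-stop">.</span>$4';

$content_title_spanned = preg_replace($pattern, $replacement, $content);
echo $content_title_spanned, "\n";
```

One caveat worth knowing: with the s flag, a heading that contains no period at all can let .*? run on into later markup until it finds one, so this is safest on input where every heading ends in a period.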
<?php
$content = '<h1>1.</h1>abcd<h2>2.</h2>abcd<h3>3.</h3><p>abcd</p><h4>4.</h4><h5>5.</h5><h6>6.</h6>';
// Group 1: opening tag and heading text, group 2: a literal period right before the closing tag, group 3: the closing tag
$content_title_spanned = preg_replace('/(<h[1-6]>[^<]*)(\.)(<\/h[1-6]>)/', '$1<span class="full-stop">$2</span>$3', $content);
print_r($content_title_spanned);
?>
There is a website from which I would like to get every string matching the pattern <td> (any content) </td>.
So I wrote this:
preg_match("/<td>.*</td>/", $web , $matches);
die(var_dump($matches));
That returns null. How can I fix the problem? Thanks for helping.
OK.
I guess you are just not escaping the / properly.
Also use groups to capture the content you want.
<td>(.*)<\/td>
should do. You can try this regex on your given text here. Don't forget the global flag if you are matching ALL tds (preg_match_all in PHP).
Usually parsing HTML with regex is not a good idea, try to use DOM parsers instead.
Example -> http://simplehtmldom.sourceforge.net/
Test the above regex with
$web = file_get_contents('http://www.w3schools.com/html/html_tables.asp' );
preg_match_all("/<td>(.*)<\/td>/", $web , $matches);
print_r( $matches);
Lazy Quantifier, Different Delimiter
You need .*? rather than .*, otherwise you can overshoot the closing </td>. Also, your / delimiter needed to be escaped when it appeared in </td>. We can replace it with another one that doesn't need escaping.
Do this:
$regex = '~<td>.*?</td>~';
preg_match_all($regex, $web, $matches);
print_r($matches[0]);
Explanation
The ~ is just an esthetic tweak: you can use any delimiter you like around your regex pattern, and in general ~ is more versatile than /, which needs to be escaped more often, for instance in </td>.
The star quantifier in .*? is made "lazy" by the ? so that the dot only matches as many characters as needed to allow the next token to match (shortest match). Without the ?, the .* first matches the whole string, then backtracks only as far as needed to allow the next token to match (longest match).
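To see the difference between the two quantifiers side by side, here is a small sketch on a made-up table row (the row itself is an assumption, not taken from the site in the question):

```php
<?php
$web = '<tr><td>one</td><td>two</td></tr>';

// Greedy: .* runs to the end of the line, then backtracks to the LAST </td>.
preg_match_all('~<td>.*</td>~', $web, $greedy);

// Lazy: .*? stops at the FIRST </td> it can reach.
preg_match_all('~<td>.*?</td>~', $web, $lazy);

print_r($greedy[0]); // one element: <td>one</td><td>two</td>
print_r($lazy[0]);   // two elements: <td>one</td> and <td>two</td>
```

If the cells can span several lines, the s flag would also be needed so that the dot matches newlines.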
I have never worked with regular expressions before and I need them now and I am having some issues getting the expected outcome.
Consider this for example:
[x:3xerpz1z]Some Text[/x:3xerpz1z] Some More Text
Using the php preg_replace() function, I want to replace [x:3xerpz1z] with <start> and [/x:3xerpz1z] with </end> but I can't figure this out. I have read some regular expression tutorials but I am still confused.
I have tried this for the starting tag:
preg_replace('/(.*)\[x:/','<start>', $source_string);
The above would return:<start>3xerpz1z
As you can see, the "3xerpz1z" isn't getting removed and it needs to be stripped out. I can't hard code and search and replace "3xerpz1z" because the "3xerpz1z" chars are randomly generated and the characters are always different but the length of the tag is the same.
This is the desired output I want:
<start>Some Text</end> Some More Text
I haven't even tried processing [/x:3xerpz1z] because I can't even get the first tag going.
You must use capturing groups (....):
$data = '[x:3xerpz1z]Some Text[/x:3xerpz1z] Some More Text';
$result = preg_replace('~\[x:([^]]+)](.*?)\[/x:\1]~s', '<start>$2</end>', $data);
pattern details:
~ # pattern delimiter: better than / here (no need to escape slashes)
\[x:
([^]]+) # capture group 1: all that is not a ]
]
(.*?) # capture group 2: content
\[/x:\1] # \1 is a backreference to the first capturing group
~s # s allows the dot to match newlines
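If the opening and closing tags do not always come in matched pairs, a variant is to replace each kind of tag on its own. This is just a sketch of that alternative, not part of the answer above; it gives up the check that both tags carry the same random ID.

```php
<?php
$data = '[x:3xerpz1z]Some Text[/x:3xerpz1z] Some More Text';

// First pattern: opening tag with any ID. Second pattern: closing tag with any ID.
$patterns = array('~\[x:[^]]+]~', '~\[/x:[^]]+]~');
$replacements = array('<start>', '</end>');

$result = preg_replace($patterns, $replacements, $data);
echo $result, "\n";
// Prints: <start>Some Text</end> Some More Text
```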
I am trying to pull the anchor text from a link that is formatted this way:
<h3><b>File</b> : <a href="/en/browse/file/variable_text">i_want_this</a></h3>
I want only the anchor text for the link : "i_want_this"
"variable_text" varies according to the filename so I need to ignore that.
I am using this regex:
<a href=\"\/en\/browse\/file\/variable_text\">(.*?)<\/a>
This of course matches the complete link.
PHP's preg functions use PCRE (Perl Compatible Regular Expressions), which is very close to Perl's own regex syntax. If you want to learn a lot about regex, visit perlretut.org. Also, look into regex generators like exspresso.
For your use, know that regex quantifiers are greedy by default. That means that when you specify that you want something, followed by anything (any number of repetitions), followed by something else, the "anything" will grab as much as it can and stop only at the last place where that second something still matches.
to be more clear, what you want is this:
<a href="
any character, any number of times (regex = .* )
">
any character, any number of times (regex = .* )
</a>
Beyond that, you want to capture the second group of "any character, any number of times". You can do that using what are called capture groups (anything inside parentheses is captured as a group for later reference; referring to it later is called a back reference).
I would also look into named subpatterns: with those, you can reference your choice with a human-readable string rather than an array index. The syntax for those in PHP is (?P<name>pattern), where name is the name you want and pattern is the actual regex. I'll use that below.
So all that being said, here's the "lazy web" for your regex:
<?php
$str = '<h3><b>File</b> : <a href="/en/browse/file/variable_text">i_want_this</a></h3>';
$regex = '/(<a href\=".*">)(?P<target>.*)(<\/a>)/';
preg_match($regex, $str, $matches);
// This should output "i_want_this"
print $matches['target'];
?>
Oh, and one final thought. Depending on what you are doing exactly, you may want to look into SimpleXML instead of using regex for this. This would probably require that the tags we see are just snippets of a larger whole, as SimpleXML requires well-formed XML (or XHTML).
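A minimal sketch of the SimpleXML route, assuming the fragment is well-formed on its own and that the link looks like the one described in the question (the href value is taken from the asker's regex):

```php
<?php
// The <h3> fragment happens to be well-formed XML, so it can be parsed directly.
$str = '<h3><b>File</b> : <a href="/en/browse/file/variable_text">i_want_this</a></h3>';

$xml = simplexml_load_string($str);

// $xml is the <h3> element; ->a is its first <a> child.
$anchorText = (string) $xml->a;
$href = (string) $xml->a['href'];

echo $anchorText, "\n"; // i_want_this
echo $href, "\n";       // /en/browse/file/variable_text
```

On real-world HTML that is not well-formed, simplexml_load_string returns false, so a check on the return value would be needed there.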
I'm sure someone will probably have a more elegant solution, but I think this will do what you want done.
Where:
$subject = '<h3><b>File</b> : <a href="/en/browse/file/variable_text">i_want_this</a></h3>';
Option 1:
$pattern1 = '/(<a href=")(.*)(">)(.*)(<\/a>)/i';
preg_match($pattern1, $subject, $matches1);
print($matches1[4]);
Option 2:
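$pattern2 = '(<a href=")(.*)(">)(.*)(</a>)';
// Note: ereg() was deprecated in PHP 5.3 and removed in PHP 7.0, so this option only runs on older versions
ereg($pattern2, $subject, $matches2);
print($matches2[4]);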
Do not use regex to parse HTML. Use a DOM parser. Specify the language you're using, too.
Since it's in a captured group and since you claim it's matching, you should be able to reference it through $1 or \1 depending on the language.
$blah = preg_match( $pattern, $subject, $matches );
print_r($matches);
The thing to remember is that regexes return everything you searched for if it matches. You need to specify that you only care about the part you've surrounded in parentheses (the anchor text). I'm not sure what language you're using the regex in, but here's an example in Ruby:
string = '<a href="/en/browse/file/variable_text">i_want_this</a>'
data = string.match(/<a href=\"\/en\/browse\/file\/variable_text\">(.*?)<\/a>/)
puts data # => outputs '<a href="/en/browse/file/variable_text">i_want_this</a>'
If you specify what you want in parentheses, you can reference it:
string = '<a href="/en/browse/file/variable_text">i_want_this</a>'
data = string.match(/<a href=\"\/en\/browse\/file\/variable_text\">(.*?)<\/a>/)[1]
puts data # => outputs 'i_want_this'
Perl will have you use $1 instead of [1] like this:
$string = '<a href="/en/browse/file/variable_text">i_want_this</a>';
$string =~ m/<a href=\"\/en\/browse\/file\/variable_text\">(.*?)<\/a>/;
$data = $1;
print $data . "\n";
Hope that helps.
I'm not 100% sure if I understand what you want. This will match the content between the anchor tags. The URL must start with /en/browse/file/, but may end with anything.
#<a href="/en/browse/file/.+?">(.*?)</a>#
I used # as a delimiter as it made it clearer. It'll also help if you put them in single quotes instead of double quotes so you don't have to escape anything at all.
If you want to limit to numbers instead, you can use:
#<a href="/en/browse/file/[0-9]+">(.*?)</a>#
If it should have just 5 numbers:
#<a href="/en/browse/file/[0-9]{5}">(.*?)</a>#
If it should have between 3 and 6 numbers:
#<a href="/en/browse/file/[0-9]{3,6}">(.*?)</a>#
If it should have more than 2 numbers:
#<a href="/en/browse/file/[0-9]{3,}">(.*?)</a>#
This should work:
<a href="[^"]*">([^<]*)
This says: take EVERYTHING you find until you meet "
[^"]*
Same here! Take everything with you until you meet <
[^<]*
The parentheses around [^<]*
([^<]*)
group it, so you can collect that data in PHP! If you look in the PHP manual on preg_match you will see many fine examples there!
Good luck!
And for your concrete example:
<a href="/en/browse/file/variable_text">([^<]*)
I use
[^<]*
because in some examples...
.*?
can be extremely slow! You shouldn't use that if you can use
[^<]*
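Put into PHP, the negated-character-class pattern above could be used like this. The sample $subject is an assumption based on the link described in the question, and ~ is used as the delimiter so the pattern needs no escaping.

```php
<?php
$subject = '<h3><b>File</b> : <a href="/en/browse/file/variable_text">i_want_this</a></h3>';

// [^"]* consumes the href value, ([^<]*) captures the text up to the next tag.
if (preg_match('~<a href="[^"]*">([^<]*)~', $subject, $matches)) {
    $anchorText = $matches[1];
    echo $anchorText, "\n"; // i_want_this
}
```

Since [^<]* can never run past a <, the engine does not have to test the rest of the pattern at every position the way it does with .*?, which is where the speed difference comes from.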
You should use the tool Expresso for creating regular expression... Pretty handy..
http://www.ultrapico.com/Expresso.htm