Using preg_replace to reformat bbcode - php

I am migrating a forum from phpBB to native PHP, and I need to parse some BBCode tags that carry a uid. This is the code I use to convert them into regular BBCode without the uid:
$regex = "#\[quote:(.*)=(.*)\](.+)\[/quote:(.+)\]#isU";
$text = "outside sample
[quote:c1891a7ad3]
text with link https://www.facebook.com/groups/35688476100/?fref=ts [/quote:c1891a7ad3]
outside text
[quote:c1891a7ad3="Budi"]
written by me , - budi
[/quote:c1891a7ad3]";
preg_replace($regex,"[quote=$2]$3[\quote]",$text);
but the result is not what I expect, which is:
"outside sample
[quote:c1891a7ad3]
text with link https://www.facebook.com/groups/35688476100/?fref=ts [/quote:c1891a7ad3]
outside text
[quote="Budi"]
written by me , - budi
[\quote]"
How should the regex be modified to yield the expected result?

You have a mismatch between the pattern and the actual string you test against. In the pattern, you have / in [/quote] and in the string, you have \ ([\quote:c1891a7ad3]).
So, if your actual string in fact has /, all you need to fix is the (.*) part as the dot matches any character (including ]) and thus can overmatch even with lazy matching.
So, use
$regex = "#\[quote:([^]]*)=([^]]*)\](.+)\[/quote:([^]]+)\]#isU";
See IDEONE demo
In this regex, I am using a negated character class [^]]* that matches 0 or more characters other than ]. It makes sure we only match text inside [...]. With the original pattern, the first (.*) can run past the closing ] of the first tag: it matches c1891a7ad3] plus the following line up to ?fref, stopping only at the = in the URL. That is why the match has to be restricted to characters other than ].
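As a minimal sketch of the fix above, the following runs the corrected pattern against the sample text from the question. The replacement string uses [/quote] with a forward slash, which is an assumption about the intended closing tag; everything else is taken from the question and the answer.

```php
<?php
// Corrected pattern: [^]]* keeps each capture inside its own [...] tag.
$regex = '#\[quote:([^]]*)=([^]]*)\](.+)\[/quote:([^]]+)\]#isU';

$text = 'outside sample
[quote:c1891a7ad3]
text with link https://www.facebook.com/groups/35688476100/?fref=ts [/quote:c1891a7ad3]
outside text
[quote:c1891a7ad3="Budi"]
written by me , - budi
[/quote:c1891a7ad3]';

// $2 is the quoted author and $3 the quote body; the uid ($1, $4) is dropped.
// Assumption: the closing tag should be [/quote] with a forward slash.
$result = preg_replace($regex, '[quote=$2]$3[/quote]', $text);

echo $result;
```

Only the quote that carries an author (="Budi") is rewritten. The plain [quote:uid] block has no = inside its opening tag, so this pattern leaves it untouched and it would need a pattern of its own.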

Related

Replace all urls in string not matching url pattern in php

I'm using the following code to filter out urls from a block of HTML text in PHP.
preg_replace('#<a(?![^>]+?href="?http://keepthisdomain.com/foo/bar"?).*?>(.*?)</a>#i', '\1', $text);
It's intended to strip all links that do not match the specified URL pattern. However, I also want to keep all tags that have the attribute rel="shadowbox[a]" set.
How can I modify this preg_replace to do that?
You are better off not using regex at all and using a parser instead, for the reasons set forth in this answer.
That said, you can do it with regex, but it's tricky:
preg_replace('#<a(?![^>]+?\bhref="?http://keepthisdomain\.com/foo/bar"?|[^>]+\brel="shadowbox\[a\]").*?>(.*?)</a>#i', '\1', $text);
Details on the regex:
<a(?![^>]+?\bhref="?http://keepthisdomain\.com/foo/bar"?|[^>]+\brel="shadowbox\[a\]").*?>(.*?)</a>
Out of the following four tags, only the third would be replaced:
foo // left alone
foo // left alone
foo // REPLACED
foo // left alone
Edited with a minor tweak to make it match a literal . in .com, using \.
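To see which tags the pattern strips, here is a small sketch. The four sample tags are hypothetical (the domain other.example and the link texts are made up for illustration); only the pattern itself comes from the answer.

```php
<?php
// Pattern from the answer: strip <a> tags unless they point at the kept URL
// or carry rel="shadowbox[a]".
$pattern = '#<a(?![^>]+?\bhref="?http://keepthisdomain\.com/foo/bar"?|[^>]+\brel="shadowbox\[a\]").*?>(.*?)</a>#i';

// Hypothetical sample tags, not the ones from the original answer.
$links = [
    '<a href="http://keepthisdomain.com/foo/bar">one</a>',
    '<a href="http://other.example/page" rel="shadowbox[a]">two</a>',
    '<a href="http://other.example/page">three</a>',
    '<a rel="shadowbox[a]" href="http://keepthisdomain.com/foo/bar">four</a>',
];

// preg_replace() accepts an array subject and returns an array of results.
$out = preg_replace($pattern, '\1', $links);

echo implode("\n", $out), "\n";
```

With these inputs only the third tag is reduced to its link text; the other three are returned unchanged.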

php Regular Expression Issues - Can't remove/strip out and replace a string within a string

I have never worked with regular expressions before and I need them now and I am having some issues getting the expected outcome.
Consider this for example:
[x:3xerpz1z]Some Text[/x:3xerpz1z] Some More Text
Using the php preg_replace() function, I want to replace [x:3xerpz1z] with <start> and [/x:3xerpz1z] with </end> but I can't figure this out. I have read some regular expression tutorials but I am still confused.
I have tried this for the starting tag:
preg_replace('/(.*)\[x:/','<start>', $source_string);
The above would return: <start>3xerpz1z
As you can see, the "3xerpz1z" isn't getting removed and it needs to be stripped out. I can't hard code and search and replace "3xerpz1z" because the "3xerpz1z" chars are randomly generated and the characters are always different but the length of the tag is the same.
This is the desired output I want:
<start>Some Text</end> Some More Text
I haven't even tried processing [/x:3xerpz1z] because I can't even get the first tag going.
You must use capturing groups (....):
$data = '[x:3xerpz1z]Some Text[/x:3xerpz1z] Some More Text';
$result = preg_replace('~\[x:([^]]+)](.*?)\[/x:\1]~s', '<start>$2</end>', $data);
pattern details:
~ # pattern delimiter: better than / here (no need to escape slashes)
\[x:
([^]]+) # capture group 1: all that is not a ]
]
(.*?) # capture group 2: content
\[/x:\1] # \1 is a backreference to the first capturing group
~s # s allows the dot to match newlines
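A short sketch of what the backreference buys you: the closing tag is only accepted when its id equals the one captured in the opening tag. The second input string is a hypothetical example added for illustration.

```php
<?php
$pattern = '~\[x:([^]]+)](.*?)\[/x:\1]~s';

// Input from the question: opening and closing ids agree.
$data = '[x:3xerpz1z]Some Text[/x:3xerpz1z] Some More Text';
$result = preg_replace($pattern, '<start>$2</end>', $data);
echo $result, "\n";

// Hypothetical input whose closing id differs: \1 cannot match,
// so preg_replace() returns the string unchanged.
$mismatched = '[x:aaaaaaaa]Some Text[/x:bbbbbbbb]';
$unchanged = preg_replace($pattern, '<start>$2</end>', $mismatched);
echo $unchanged, "\n";
```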

PHP/Perl Regular expression help!

I have a string:
$string = 'This is my big <span class="big-string">string</span>';
I cannot figure out how to write a regular expression that will replace the 'b' in 'big' without replacing the 'b' in 'big-string'. I need to replace all occurrences of a substring except when that substring appears inside an HTML tag.
Any help is appreciated!
Edit
Maybe some more info will help. I'm working on an autocomplete feature that highlights whatever you're searching for in the current result set. Currently, if you have typed 'aut' in the search dialog, then the results look like this: <b>aut</b>omotive
The problem appears when I search for 'auto b'. First I replace all occurrences of 'auto' with '<b>auto</b>' then I replace all occurrences of 'b' with '<b>b</b>'. Unfortunately this second sweep changes '<b>auto</b>' to '<<b>b</b>>auto</<b>b</b>>'
Please do not try to parse HTML using regular expressions. Just load up the HTML in a DOM, walk over the text nodes and do a simple str_replace. You'll thank me around debugging time.
Is there a guarantee that 'big' won't be immediately preceded by "? If so, then s/([^"])b/$1foo/ should replace the b in question with foo.
If you insist upon using a regex, this one will do a pretty decent job:
$re = '/# (Crudely) match a sub-string NOT in an HTML tag.
    big        # The sub-string to be matched.
    (?=        # Assert we are not inside an HTML tag.
      [^<>]*   # Consume all non-<> up to...
      (?:<\w+  # either an HTML start tag,
      | $      # or the end of string.
      )        # End group of valid alternatives.
    )          # End "not-in-html-tag" lookahead assertion.
    /ix';
Caveats: This regex has very real limitations. The HTML must not have any angle brackets in the tag attributes. This regex also finds the target substring inside other parts of the HTML file such as comments, scripts and stylesheets, and this may not be desirable.
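Applied to the autocomplete case from the question, the lookahead can be sketched as follows. The helper name highlight_outside_tags and the sample text are hypothetical. The x flag is dropped here on purpose, so that a search term containing spaces is matched literally.

```php
<?php
// Hypothetical helper: wrap $needle in <b>...</b> where it occurs outside a tag.
function highlight_outside_tags(string $needle, string $html): string
{
    // preg_quote() escapes regex metacharacters (and the / delimiter) in the term.
    // The lookahead is the one from the answer: the next angle bracket must
    // open a start tag, or there must be none before the end of the string.
    $pattern = '/' . preg_quote($needle, '/') . '(?=[^<>]*(?:<\w+|$))/i';
    return preg_replace($pattern, '<b>$0</b>', $html);
}

// Two sweeps, as in the question: first 'auto', then 'b'.
$step1 = highlight_outside_tags('auto', 'automotive body shop');
$step2 = highlight_outside_tags('b', $step1);

echo $step1, "\n";
echo $step2, "\n";
```

The second sweep no longer touches the b inside the <b> and </b> tags. Note that because the lookahead only accepts a start tag or the end of the string, text that is directly followed by a closing tag is skipped as well, which is one more limitation on top of the caveats above.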

PHP preg_replace non-greedy trouble

I've been using the following site to test a PHP regex so I don't have to constantly upload:
http://www.spaweditor.com/scripts/regex/index.php
I'm using the following regex:
/(.*?)\.{3}/
on the following string (replacing with nothing):
Non-important data...important data...more important data
and preg_replace is returning:
more important data
yet I expect it to return:
important data...more important data
I thought the ? is the non-greedy modifier. What's going on here?
Your non-greedy modifier is working as expected. But preg_replace replaces all occurrences of the (non-greedy) match with the replacement text ("" in your case). If you want only the first one replaced, you can pass 1 as the optional 4th argument (limit) to the preg_replace function (PHP docs for preg_replace). On the website you linked, this can be accomplished by typing 1 into the text input between the word "Flags" and the word "limit".
Just an actual example of @Asaph's solution. In this example you don't need non-greediness because you can specify a limit.
Replace just the first occurrence of # in a line with a marker:
$line=preg_replace('/#/','zzzzxxxzzz',$line,1);
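For the string from the question, the effect of the limit argument can be sketched like this; the variable names are only illustrative.

```php
<?php
$subject = 'Non-important data...important data...more important data';
$pattern = '/(.*?)\.{3}/';

// No limit: every non-greedy match up to a "..." is removed.
$all = preg_replace($pattern, '', $subject);

// Limit of 1: only the first match is removed.
$first = preg_replace($pattern, '', $subject, 1);

echo $all, "\n";
echo $first, "\n";
```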

how to extract a portion of a string in php

I am using preg_replace() for some string replacement.
$str = "<aa>Let's find the stuff qwe in between <id>12345</id> these two previous brackets</h>";
$do = preg_match("/qwe(.*)12345/", $str, $matches);
which is working just fine and gives the following result
$match[0]=qwe in between 12345
$match[1]=in between
but I am using same logic to extract from the following string.
<text>
<src><![CDATA[<TEXTFORMAT LEADING="2"><P ALIGN="LEFT"><FONT FACE="Arial" SIZE="36" COLOR="#999999" LETTERSPACING="0" KERNING="0">r1 text 1 </FONT></P></TEXTFORMAT>]]></src>
<width>45%</width>
<height>12%</height>
<left>30.416666666666668%</left>
<top>3.0416666666666665%</top>
<begin>2s</begin>
<dur>10s</dur>
<transIn>fadeIn</transIn>
<transOut>fadeOut</transOut>
<id>E2159292994B083ACA7ABC7799BBEF3F7198FFA2</id>
</text>
I want to extract the string from
r1text1
to
</id>
The Regular expression I currently Have is:
preg_match('/r1text1(.*)</id\>/', $metadata, $matches);
where $metadata is the above string..
$matches does not return anything....
For some reason...how do i do it?
Thanks in advance
If you want to extract the text, you will probably want to use preg_match. The following might work:
preg_match('#\<P[^\>]*\>\<FONT[^\>]*\>(.*\</id\>)#', $string, $matches);
Whatever gets matched in the parentheses can be found later in the $matches array. In this case that is everything between a <P> tag followed by a <FONT> tag and </id>, including the latter.
Above regex is untested but might give you a general idea of how to do it. Adapt if your needs are a bit different :)
Even though I don't know why you would match the regex on an incomplete XML fragment (starting within a <![CDATA[ section and ending right before the closing XML tag </id>), you do have three obvious problems with your regex:
1. As Amri said: you have to escape the / character in the closing XML tag because you use / as the pattern delimiter. By the way, you don't have to escape the > character. That gives you: '/r1text1(.*)<\/id>/' Alternatively, you can change the pattern delimiter to #, for example: '#r1text1(.*)</id>#' (I will use the first pattern to further develop the expression).
2. As Rich Adams already said: the text in your example data is "r1_text_1" (_ is a space character) but you match against '/r1text1(.*)<\/id>/'. You have to include the spaces in your regex or allow for an uncertain number of spaces, such as '/r1(?:\s*)text(?:\s*)1(.*)<\/id>/' (the ?: is the syntax for non-capturing subpatterns).
3. The . (dot) in your regex does not match newlines by default. You have to add the s (PCRE_DOTALL) pattern modifier to let the . (dot) match newlines as well: '/r1(?:\s*)text(?:\s*)1(.*)<\/id>/s'
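The three fixes can be checked with a short sketch. The $metadata value below is a shortened version of the XML from the question (an assumption made to keep the example small); the patterns are the ones developed above.

```php
<?php
// Shortened sample of the XML from the question.
$metadata = <<<'XML'
<text>
<src><![CDATA[<P ALIGN="LEFT"><FONT FACE="Arial">r1 text 1 </FONT></P>]]></src>
<width>45%</width>
<id>E2159292994B083ACA7ABC7799BBEF3F7198FFA2</id>
</text>
XML;

// Without the s modifier the dot stops at the first newline, so there is no match.
$withoutDotall = preg_match('/r1(?:\s*)text(?:\s*)1(.*)<\/id>/', $metadata);

// With the s modifier the dot also matches newlines and the capture reaches </id>.
$withDotall = preg_match('/r1(?:\s*)text(?:\s*)1(.*)<\/id>/s', $metadata, $matches);

var_dump($withoutDotall, $withDotall);
echo $matches[1], "\n";
```

Here $matches[1] holds everything between "r1 text 1" and the closing </id> tag, so it ends with the id value itself.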
You probably need to parse your string/file and extract the value between the FONT tags, then insert the value into the id tag.
Try googling for PHP parsing.
Try this:
preg_match('/r1text1(.*)<\/id\>/', $metadata, $matches);
You are using / as the pattern delimiter, but your pattern also contains / in </id>. You can use \ as the escape character.
In the sample you have "r1 text 1 ", yet your regular expression has "r1text1". The regular expression doesn't match because there are spaces in the string you are trying to match it against. You should include the spaces in the regular expression.
