PHP Preg_Replace REGEX BB-Code - php

So I have created this function in PHP to output text in the required form. It is a simple BB-Code system. I have cut out the other BB-Codes from it to keep it shorter (Around 15 cut out)
My issue is the final one [title=blue]Test[/title] (Test data) does not work. It outputs exactly the same. I have tried 4-5 different versions of the REGEX code and nothing has changed it.
Does anyone know where I am going wrong or how to fix it?
function bbcode_format($str){
$str = htmlentities($str);
$format_search = array(
'#\[b\](.*?)\[/b\]#is',
'#\[title=(.*?)\](.*?)\[/title\]#i'
);
$format_replace = array(
'<strong>$1</strong>',
'<div class="box_header" id="$1"><center>$2</center></div>'
);
$str = preg_replace($format_search, $format_replace, $str);
$str = nl2br($str);
return $str;
}

Change the delimiter # to /. And change "/[/b\]" to "\[\/b\]". You need to escape the "/" since you need it as literal character.
Maybe the "array()" should use brackets: "array[]".
Note: I borrowed the answer from here: Convert BBcode to HTML using JavaScript/jQuery
Edit: I forgot that "/" isn't a metacharacter so I edited the answer accordingly.
Update: I wasn't able to make it work with function, but this one works. See the comments. (I used the fiddle on the accepted answer for testing from the question I linked above. You may do so also.) Please note that this is JavaScript. You had PHP code in your question. (I can't help you with PHP code at least for awhile.)
$str = 'this is a [b]bolded[/b], [title=xyz xyz]Title of something[/title]';
//doesn't work (PHP function)
//$str = htmlentities($str);
//notes: lose the single quotes
//lose the text "array" and use brackets
//don't know what "ig" means but doesn't work without them
$format_search = [
/\[b\](.*?)\[\/b\]/ig,
/\[title=(.*?)\](.*?)\[\/title\]/ig
];
$format_replace = [
'<strong>$1</strong>',
'<div class="box_header" id="$1"><center>$2</center></div>'
];
// Perform the actual conversion
for (var i =0;i<$format_search.length;i++) {
$str = $str.replace($format_search[i], $format_replace[i]);
}
//place the formatted string somewhere
document.getElementById('output_area').innerHTML=$str;
​
Update2: Now with PHP... (Sorry, you have to format the $replacements to your liking. I just added some tags and text to demostrate the changes.) If there's still trouble with the "title", see what kind of text you are trying to format. I made the title "=" optional with ? so it should work properly work texts like: "[title=id with one or more words]Title with id[/title]" and "[title]Title without id[/title]. Not sure thought if the id attribute is allowed to have spaces, I guess not: http://reference.sitepoint.com/html/core-attributes/id.
$str = '[title=title id]Title text[/title] No style, [b]Bold[/b], [i]emphasis[/i], no style.';
//try without this if there's trouble
$str = htmlentities($str);
//"#" works as delimiter in PHP (not sure abut JS) so no need to escape the "/" with a "\"
$patterns = array();
$patterns = array(
'#\[b\](.*?)\[/b\]#',
'#\[i\](.*?)\[/i\]#', //delete this row if you don't neet emphasis style
'#\[title=?(.*?)\](.*?)\[/title\]#'
);
$replacements = array();
$replacements = array(
'<strong>$1</strong>',
'<em>$1</em>', // delete this row if you don't need emphasis style
'<h1 id="$1">$2</h1>'
);
//perform the conversion
$str = preg_replace($patterns, $replacements, $str);
echo $str;

Related

PHP str_replace scraped content with wild card?

I'm looking for a solution to strip some HTML from a scraped HTML page. The page has some repetitive data I would like to delete so I tried with preg_replace() to delete the variable data.
Data I want to strip:
Producent:<td class="datatable__body__item" data-title="Producent">Example
Groep:<td class="datatable__body__item" data-title="Produkt groep">Example1
Type:<td class="datatable__body__item" data-title="Produkt type">Example2
....
...
Must be like this afterwards:
Producent:Example
Groep:Example1
Type:Example2
So a big piece is the same except the word within the data-title piece. How could I delete this piece of data?
I tried a few things like this one:
$pattern = '/<td class=\"datatable__body__item\"(.*?)>/';
$tech_specs = str_replace($pattern,"", $tech_specs);
But that didn't work. Is there any solution to this?
Just use a wildcard:
$newstr = preg_replace('/<td class="datatable__body__item" data-title=".*?">/', '', $str);
.*? means match anything but don't be greedy
Assuming that the string looked like this:
$string = 'Producent:<td class="datatable__body__item" data-title="Producent">Example';
You could get the beginning and the end of the string with this:
preg_match('/^(\w+:).*\>(\w+)/', $string, $matches);
echo implode([$matches[1], $matches[2]]);
Which, in this case, will throw Producent:Example. So, then you could add this output to another variable/array you intend to use.
OR, since you mentioned replacing:
$string = preg_replace('/^(\w+:).*\>(\w+)/', '$1$2', $string);
But then again, checking as it would probably come in a variable number of lines:
$string = 'Producent:<td class="datatable__body__item" data-title="Producent">Example
Groep:<td class="datatable__body__item" data-title="Produkt groep">Example1
Type:<td class="datatable__body__item" data-title="Produkt type">Example2';
$stringRows = explode(PHP_EOL, $string);
$pattern = '/^(\w+:).*\>(\w+)/';
$replacement = '$1$2';
foreach ($stringRows as &$stringRow) {
$stringRow = preg_replace($pattern, $replacement, $stringRow);
}
$string = implode(PHP_EOL, $stringRows);
Which will then output the string like you expect.
Explaining my regex:
the first group catches the first word until the two dots :, then another group to catch the last word. I had previously specified anchors for both ends, but when breaking each line this wouldn't work as expected, so I kept only the beginning.
^(\w+:) => the word in the beginning of the string until two dots appear
.*\> => everything else until smaller symbol appears (escaped by slash)
(\w+) => the word after the smaller than symbol
Well maybe my question wasn't that good written. I had a table which I needed to scrape from a website. I needed the info in the table, but had to cleanup some parts as mentioned. The solution I finally made was this one and it works. It still has a little work to do with manual replacements but that is because of the stupid " they use for inch. ;-)
Solution:
\\ find the table in the sourcecode
foreach($techdata->find('table') as $table){
\\ filter out the rows
foreach($table->find('tr') as $row){
\\ take the innertext using simplehtmldom
$tech_specs = $row->innertext;
\\ strip some 'garbage'
$tech_specs = str_replace(" \t\t\t\t\t\t\t\t\t\t\t<td class=\"datatable__body__item\">","", $tech_specs);
\\ find the first word of the string so I can use it
$spec1 = explode('</td>', $tech_specs)[0];
\\ use the found string to strip down the rest of the table
$tech_specs = str_replace("<td class=\"datatable__body__item\" data-title=\"" . $spec1 . "\">",":", $tech_specs);
\\ manual correction because of the " used
$tech_specs = str_replace("<td class=\"datatable__body__item\" data-title=\"tbv Montage benodigde 19\">",":", $tech_specs);
\\ manual correction because of the " used
$tech_specs = str_replace("<td class=\"datatable__body__item\" data-title=\"19\">",":", $tech_specs);
\\ strip some 'garbage'
$tech_specs = str_replace("\t\t\t\t\t\t\t\t\t\t","\n", $tech_specs);
$tech_specs = str_replace("</td>","", $tech_specs);
$tech_specs = str_replace(" ","", $tech_specs);
\\ put the clean row in an array ready for usage
$specs[] = $tech_specs;
}
}

Preg replace callback validation

So I need to re-write some old code that I found on a library.
$text = preg_replace("/(<\/?)(\w+)([^>]*>)/e",
"'\\1'.strtolower('\\2').'\\3'", $text);
$text = preg_replace("/<br[ \/]*>\s*/","\n",$text);
$text = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n",
$text);
And for the first one I have tried like this:
$text = preg_replace_callback(
"/(<\/?)(\w+)([^>]*>)/",
function($subs) {
return strtolower($subs[0]);
},
$text);
I'm a bit confused b/c I don't understand this part: "'\\1'.strtolower('\\2').'\\3'" so I'm not sure what should I replace it with.
As far as I understand the first line looks for tags, and makes them lowercase in case I have data like
<B>FOO</B>
Can you guys help me out here with a clarification, and If my code is done properly?
The $subs is an array that contains the whole value in the first item and captured texts in the subsequent items. So, Group 1 is in $subs[1], Group 2 value is in $subs[2], etc. The $subs[0] contains the whole match value, and you applied strtolower to it, but the original code left the Group 3 value (captured with ([^>]*>) that may also contain uppercase letters) intact.
Use
$text = preg_replace_callback("~(</?)(\w+)([^>]*>)~", function($subs) {
return $subs[1] . strtolower($subs[2]) . $subs[3];
}, $text);
See the PHP demo.

How to remove commas between double quotes in PHP

Hopefully, this is an easy one. I have an array with lines that contain output from a CSV file. What I need to do is simply remove any commas that appear between double-quotes.
I'm stumbling through regular expressions and having trouble. Here's my sad-looking code:
<?php
$csv_input = '"herp","derp","hey, get rid of these commas, man",1234';
$pattern = '(?<=\")/\,/(?=\")'; //this doesn't work
$revised_input = preg_replace ( $pattern , '' , $csv_input);
echo $revised_input;
//would like revised input to echo: "herp","derp,"hey get rid of these commas man",1234
?>
Thanks VERY much, everyone.
Original Answer
You can use str_getcsv() for this as it is purposely designed for process CSV strings:
$out = array();
$array = str_getcsv($csv_input);
foreach($array as $item) {
$out[] = str_replace(',', '', $item);
}
$out is now an array of elements without any commas in them, which you can then just implode as the quotes will no longer be required once the commas are removed:
$revised_input = implode(',', $out);
Update for comments
If the quotes are important to you then you can just add them back in like so:
$revised_input = '"' . implode('","', $out) . '"';
Another option is to use one of the str_putcsv() (not a standard PHP function) implementations floating about out there on the web such as this one.
This is a very naive approach that will work only if 'valid' commas are those that are between quotes with nothing else but maybe whitespace between.
<?php
$csv_input = '"herp","derp","hey, get rid of these commas, man",1234';
$pattern = '/([^"])\,([^"])/'; //this doesn't work
$revised_input = preg_replace ( $pattern , "$1$2" , $csv_input);
echo $revised_input;
//ouput for this is: "herp","derp","hey get rid of these commas man",1234
It should def be tested more but it works in this case.
Cases where it might not work is where you don't have quotes in the string.
one,two,three,four -> onetwothreefour
EDIT : Corrected the issues with deleting spaces and neighboring letters.
Well, I haven't been lazy and written a small function to do exactly what you need:
function clean_csv_commas($csv){
$len = strlen($csv);
$inside_block = FALSE;
$out='';
for($i=0;$i<$len;$i++){
if($csv[$i]=='"'){
if($inside_block){
$inside_block=FALSE;
}else{
$inside_block=TRUE;
}
}
if($csv[$i]==',' && $inside_block){
// do nothing
}else{
$out.=$csv[$i];
}
}
return $out;
}
You might be coming at this from the wrong angle.
Instead of removing the commas from the text (presumably so you can then split the string on the commas to get the separate elements), how about writing something that works on the quotes?
Once you've found an opening quote, you can check the rest of the string; anything before the next quote is part of this element. You can add some checking here to look for escaped quotes, too, so things like:
"this is a \"quote\""
will still be read properly.
Not exactly an answer you've been looking for - But I've used it for cleaning commas in numbers in CSV.
$csv = preg_replace('%\"([^\"]*)(,)([^\"]*)\"%i','$1$3',$csv);
"3,120", 123, 345, 567 ==> 3120, 123, 345, 567

regex for breadcrumb in php

I am currently building breadcrumb. It works for example for
http://localhost/researchportal/proposal/
<?php
$url_comp = explode('/',substr($url,1,-1));
$end = count($url_comp);
print_r($url_comp);
foreach($url_comp as $breadcrumb) {
$landing="http://localhost/";
$surl .= $breadcrumb.'/';
if(--$end)
echo '
<a href='.$landing.''.$surl.'>'.$breadcrumb.'</a>»';
else
echo '
'.$breadcrumb.'';
};?>
But when I typed in http://localhost////researchportal////proposal//////////
All the formatting was gone as it confuses my code.
I need to have the site path in an array like ([1]->researchportal, [2]->proposal)
regardless of how many slashes I put.
So can $url_comp = explode('/',substr($url,1,-1)); be turned into a regular expression to get my desired output?
You don't need regex. Look at htmlentities() and stripslashes() in the PHP manual. A regex will return a boolean value of whatever it says, and won't really help you achieve what you are trying to do. All the regex can let you do is say if the string matches the regex do something. If you put in a regex requiring at least 2 characters between each slash, then any time anyone puts more than one consecutive slash in there, the if statement will stop.
http://ca3.php.net/manual/en/function.stripslashes.php
http://ca3.php.net/manual/en/function.htmlentities.php
Found this on the php manual.
It uses simple str_replace statements, modifying this should achieve exactly what your post was asking.
<?
function stripslashes2($string) {
$string = str_replace("\\\"", "\"", $string);
$string = str_replace("\\'", "'", $string);
$string = str_replace("\\\\", "\\", $string);
return $string;
}
?>

Having problems getting a PHP regex to match

Here is my problem. It's probably a simple fix. I have a regex that I am using to replace a url BBCode. What I have right now that is not working looks like this.
<?php
$input_string = '[url=www.test.com]Test[url]';
$regex = '/\[url=(.+?)](.+?)\[\/url]/is';
$replacement_string = '$2';
echo preg_replace($regex, $replacement_string, $input_string);
?>
This currently outputs the original $input_string, while I would like it to output the following.
Test
What am I missing?
<?php
$input_string = '[url=www.test.com]Test[/url]';
$regex = '/\[url=(.+?)\](.+?)\[\/url\]/is';
$replacement_string = '$2';
echo preg_replace($regex, $replacement_string, $input_string);
?>
In your BBCode string, I closed the
[url] properly.
I escaped a ] in the regex (not sure if that was an actual problem).
Note that [url]http://example.org[/url] is also a valid way to make a link in BBCode.
You should listen to the comments suggesting you use an existing BBCode parser.
Change this line as follows:
$regex = '/[url=(.+?)](.+?)[url]/is';
OK, the formatting is not proper. While I figure it out, see this: http://pastebin.com/6pF0FEbA

Categories