preg_replace within the preg_replace - php

Right now I'm having issues replacing strings that already come out of preg_match. Let's say I have the bbcode [b]bla[/b]; replacing [b] with <b> works. But suppose, for testing purposes, that someone writes [b]hi [b]test[/b][/b]: what comes out is "hi [b]test[/b]" with everything bolded, but the inner [b] doesn't get replaced for some reason.
Currently this is my expression: /\[b\](.*)\[\/b\]/
Sorry, I didn't show my code, I'm new to this.
// Will convert string data into readable data
function ConvertStringData2ReadableData($UglyString) {
$CheckArrays = [
"QUOTE" => "/\[quote=?(.*)\](.*)\[\/quote\]/",
"BOLD" => "/\[b\](.*)\[\/b\]/",
"ITALIC" => "/\[i\](.*)\[\/i\]/",
];
$FanceString = $UglyString;
// QUOTES
do {
$FanceString = preg_replace_callback(
$CheckArrays['QUOTE'],
function($match) {
if (is_numeric($match[1])) {
$TPID = GetThreadPoster($match[1]);
$TPUN = GetUsernameS($TPID);
$statement = ('<div class="panel panel-default"><div class="panel-heading">'.$match[2].'<br>- <b>'.$TPUN.'</b></div></div>');
} elseif (!is_numeric($match[1])) {
$statement = ('<div class="panel panel-default"><div class="panel-heading">'.$match[2].'</div></div>');
}
return $statement;
},
$FanceString,
-1,
$count
);
} while ($count > 0);
// BOLD
do {
$FanceString = preg_replace($CheckArrays['BOLD'] , "<b>$1</b>" , $FanceString, -1, $count);
} while ($count > 0);
#$FanceString = preg_replace($CheckArrays['BOLD'] , "<b>$1</b>" , $FanceString, -1);
// ITALIC
do {
$FanceString = preg_replace($CheckArrays['ITALIC'] , "<i style='all: unset; font-style: italic;'>$1</i>" , $FanceString, -1, $count);
} while ($count > 0);
return($FanceString);
}

You could do something like this:
$string = '[b]hi [b]test[/b][/b]';
do {
$string = preg_replace('/\[b\](.*)\[\/b\]/', '<b>$1</b>', $string, -1, $count);
} while ($count > 0);
Or just use @Justinas' idea (from the comments on your question) if it's OK to replace all [b] with <b> and [/b] with </b> (regardless of whether they are in the right order or properly paired).
Edit: you also need to change your quote regex to this:
/\[quote(?:=(\d+))?\](.*)\[\/quote\]/s
s flag allows . to match newlines (you probably want to add it to the other ones too). I also fixed the quote ID capturing part.
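As a quick check of the quote pattern above, here is a minimal sketch. The sample strings and variable names are made up for illustration. It shows the optional numeric ID being captured and what the s flag changes for a multi-line quote:

```php
<?php
$pattern = '/\[quote(?:=(\d+))?\](.*)\[\/quote\]/s';

// Hypothetical sample inputs
$withId    = "[quote=42]first line\nsecond line[/quote]";
$withoutId = "[quote]no id here[/quote]";

preg_match($pattern, $withId, $m1);
preg_match($pattern, $withoutId, $m2);

echo $m1[1], "\n"; // 42
echo $m2[2], "\n"; // no id here

// Without the s flag, "." stops at the newline, so the multi-line quote is not matched.
$matchedWithoutFlag = preg_match('/\[quote(?:=(\d+))?\](.*)\[\/quote\]/', $withId);
```

When the ID is missing, group 1 comes back as an empty string, so a ctype_digit() or is_numeric() check on it still behaves sensibly.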

Because you are never going to be able to fully trust user data AND because bbcode is just as vulnerable as html to incorrect parsing by regex, you will never be 100% confident that this method will work.
Non-quote tags can just as easily be replaced by a non-regex method, so I am eliminating the pattern convolution by segmenting the logic.
I am implementing a recursive pattern for quote tags (assuming everything will be balanced) and using your do-while() technique -- I think this is the best approach. This will effectively work from outer quote inward on each iteration (while $count is positive).
Code: (Demo)
function bbcodequote2html($matches){
$text=(isset($matches[2])?$matches[2]:''); // avoid Notices
if(isset($matches[1]) && ctype_digit($matches[1])){
$TPID = "#{$matches[1]}"; // GetThreadPoster($match[1]);
$TPUN = "#{$matches[1]}"; // GetUsernameS($TPID);
$quotee="<br>- <b>$TPUN</b>";
}else{
$quotee=''; // no id value or id is non-numeric default to empty string
}
return "<div class=\"panel panel-default\"><div class=\"panel-heading\">$text$quotee</div></div>";
}
$bbcode=<<<BBCODE
[quote=2]Outer Quote[b]bold [b]nested bold[/b][/b]
[i]italic [i]nested italic[/i][/i][quote]Inner Quote 1: (no id)[/quote]
[quote=bitethatapple]Inner Quote 2[quote=1]Inner Quote 3[/quote] still inner quote 2 [quote=mickmackusa]Inner Quote 4[/quote] end of inner quote 2[/quote][/quote]
BBCODE;
$converted=str_replace(
['[b]','[/b]','[i]','[/i]'],
['<b>','</b>','<i style="all:unset;font-style:italic;">','</i>'],
$bbcode
);
do{
$converted=preg_replace_callback('~\[quote(?:=(.+?))?]((?:(?R)|.*?)+)\[/quote]~is','bbcodequote2html',$converted,-1,$count);
}while($count);
echo $converted;
It is difficult for me to display the output in a fashion that is easy to read. You may be best served to run my code on your server and check that the results render as desired.
Output:
<div class="panel panel-default"><div class="panel-heading">Outer Quote<b>bold <b>nested bold</b></b>
<i style="all:unset;font-style:italic;">italic <i style="all:unset;font-style:italic;">nested italic</i></i><div class="panel panel-default"><div class="panel-heading">Inner Quote 1: (no id)</div></div>
<div class="panel panel-default"><div class="panel-heading">Inner Quote 2<div class="panel panel-default"><div class="panel-heading">Inner Quote 3<br>- <b>#1</b></div></div> still inner quote 2 <div class="panel panel-default"><div class="panel-heading">Inner Quote 4</div></div> end of inner quote 2</div></div><br>- <b>#2</b></div></div>


Locating tags in a string in PHP (with respect to the string with tags removed)

I want to create a function that labels the location of certain HTML tags (e.g., italics tags) in a string with respect to the locations of characters in a tagless version of the string.
(I intend to use this label data to train a neural network for tag recovery from data that has had the tags stripped out.)
The magic function I want to create is label_italics() in the below code.
$string = 'Disney movies: <i>Aladdin</i>, <i>Beauty and the Beast</i>.';
$string_all_tags_stripped_but_italics = strip_tags($string, '<i>'); // same as $string in this example
$string_all_tags_stripped = strip_tags($string); // 'Disney movies: Aladdin, Beauty and the Beast.'
$featr_string = $string_all_tags_stripped.' '; // Add a single space at the end
$label_string = label_italics($string_all_tags_stripped_but_italics);
echo $featr_string; // 'Disney movies: Aladdin, Beauty and the Beast. '
echo $label_string; // '0000000000000001000000101000000000000000000010'
If a character is supposed to have an <i> or </i> tag immediately preceding it, it is labeled with a 1 in $label_string; otherwise, it is labeled with a 0 in $label_string. (I'm thinking I don't need to worry about the difference between <i> and </i> because the recoverer will simply alternate between <i> and </i> so as to maintain well-formed markup, but I'm open to reasons as to why I'm wrong about this.)
I'm just not sure what the best way to create label_italics() is.
I wrote this function that seems to work in most cases, but it also seems a little clunky and I'm posting here in hopes that there is a better way. (If this turns out to be the best way, the below function would be easily generalizable to any HTML tag passed in as a second argument to the function, which could be renamed label_tag().)
function label_italics($stripped) {
while ((stripos($stripped, '<i>') || stripos($stripped, '</i>')) !== FALSE) {
$position = stripos($stripped, '<i>');
if (is_numeric($position)) {
for ($c = 0; $c < $position; $c++) {
$output .= '0';
}
$output .= '1';
}
$stripped = substr($stripped, $position + 4, NULL);
$position = stripos($stripped, '</i>');
if (is_numeric($position)) {
for ($c = 0; $c < $position; $c++) {
$output .= '0';
}
$output .= '1';
}
$stripped = substr($stripped, $position + 5, NULL);
}
for ($c = 0; $c <= strlen($stripped); $c++) {
$output .= '0';
}
return $output;
}
The function produces bad output if the tags are surplus or the markup is badly formed in the input. For example, for the following input:
$string = 'Disney movies: <i><i>Aladdin</i>, <i>Beauty and the Beast</i>.';
The following misaligned output is given.
Disney movies: Aladdin, Beauty and the Beast.
0000000000000001000000000101000000000000000000010
(I'm also open to reasons why I'm going about the creation of the label data all wrong.)
I think I've got something. How about this:
function label_italics($string) {
return preg_replace(['/<i>/', '/<\/i>/', '/[^#]/', '/##0/', '/#0/'],
['#', '#', '0', '2', '1'], $string);
}
see: https://3v4l.org/cKG46
Note that you need to supply the string with the tags in it.
How does it work?
I use preg_replace() because it accepts regular expressions, which I need for one of the replacements. The function goes through the two arrays and executes each replacement in order. First it replaces all occurrences of <i> and </i> with # and everything else with 0. Then it replaces ##0 with 2 and #0 with 1. The 2 is an extra that makes it possible to handle an empty <i></i> pair. You can remove it, and simplify the function, if you don't need it.
The use of the # is arbitrary. You should use anything that doesn't clash with the content of your string.
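To see what each stage does, here is a small sketch that runs the replacements from the first version one step at a time. The sample string and the $step variables are only for illustration:

```php
<?php
$string = 'a <i>b</i> c';

// Step 1: every <i> and </i> becomes the placeholder #
$step1 = preg_replace(['/<i>/', '/<\/i>/'], '#', $string);
echo $step1, "\n"; // a #b# c

// Step 2: every character that is not the placeholder becomes 0
$step2 = preg_replace('/[^#]/', '0', $step1);
echo $step2, "\n"; // 00#0#00

// Step 3: a placeholder plus the 0 that follows it collapses into one label
$step3 = preg_replace(['/##0/', '/#0/'], ['2', '1'], $step2);
echo $step3, "\n"; // 00110
```

The result has five characters, one per character of the tagless string 'a b c'.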
Here's an updated version. It copes with tags at the end of the line and it ignores any # characters in the line.
function label_italics($string) {
return preg_replace(['/[^<\/i\>]/', '/<i>/', '/<\/i>/', '/i/', '/##0/', '/#0/'],
['0', '#', '#', '0', '2', '1'], $string . ' ');
}
See: https://3v4l.org/BTnLc
After some additional experimentation, this is what I arrived at:
$label_string = mb_ereg_replace('#0', '1', mb_ereg_replace('(#)\1+0', '1', mb_ereg_replace('\/', '0', mb_ereg_replace('i', '0', mb_ereg_replace('<\/i>', '#', mb_ereg_replace('<i>', '#', mb_ereg_replace('[^<\/i\>]', '0', mb_strtolower($featr_string))))))));
I couldn't get @KIKO Software's preg_replace()-based solution to work with multibyte strings. So I changed to this slightly ungainly, but better-operative, mb_ereg_replace()-based solution instead.
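Another way to get the same labels without placeholder characters is to split the string on the tags and count characters per chunk. This is only a sketch: label_tag_positions() is a hypothetical name, and it assumes the mbstring extension is loaded and the input is valid UTF-8 (with the u flag, preg_split() returns false on invalid UTF-8, which is not handled here).

```php
<?php
// Hypothetical helper: one label per character of the tag-stripped text (plus the
// trailing space), 1 if an opening or closing tag immediately precedes the character.
function label_tag_positions(string $html, string $tag = 'i'): string
{
    $pattern = '~(</?' . preg_quote($tag, '~') . '>)~iu';
    // With DELIM_CAPTURE, even indexes hold text chunks and odd indexes hold the tags.
    $parts = preg_split($pattern, $html . ' ', -1, PREG_SPLIT_DELIM_CAPTURE);

    $labels  = '';
    $pending = false; // true when a tag was seen and the next character must get a 1
    foreach ($parts as $index => $part) {
        if ($index % 2 === 1) {
            $pending = true;
            continue;
        }
        $length = mb_strlen($part, 'UTF-8');
        if ($length === 0) {
            continue; // surplus tags like <i><i> yield empty chunks; keep waiting
        }
        $labels .= ($pending ? '1' : '0') . str_repeat('0', $length - 1);
        $pending = false;
    }
    return $labels;
}

echo label_tag_positions('Disney movies: <i>Aladdin</i>, <i>Beauty and the Beast</i>.');
// 0000000000000001000000101000000000000000000010
```

Because adjacent tags collapse into a single 1, the surplus <i><i> input from the question stays aligned with the stripped text.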

php preg_match excluding text within html tags/attributes to find correct place to cut a string

I am trying to determine the absolute position of certain words within a block of html, but only if they are outside of an actual html tag. For instance, if I wanted to determine the position of the word "join" using preg_match in this text:
<p>There are 14 more days until our <a href="/somepage.html" target="_blank" rel="noreferrer noopener" aria-label="join us">holiday special</a> so come join us!</p>
I could use:
preg_match('/join/', $post_content, $matches, PREG_OFFSET_CAPTURE, $offset);
The problem is that this is matching the word within the aria-label attribute, when what I need is the one just after the link. It would be fine to match between the <a> and </a>, just not inside the brackets themselves.
My actual end goal, most of which I think I have working aside from this last element: I am trimming a block of html (not a full document) to cut it off at a specific word count. I am trying to determine the character at which that last word ends, and then joining the left side of the html block with only the html tags from the right side, so all html tags close gracefully. I thought I had it working until I ran into an example like the one I showed, where the last word also appeared within an html attribute, causing me to split the string at the wrong location. This is my code so far:
$post_content = strip_tags ( $p->post_content, "<a><br><p><ul><li>" );
$post_content_stripped = strip_tags ( $p->post_content );
$post_content_stripped = preg_replace("/[^A-Za-z0-9 ]/", ' ', $post_content_stripped);
$post_content_stripped = preg_replace("/\s+/", ' ', $post_content_stripped);
$post_content_stripped_array = explode ( " " , trim($post_content_stripped) );
$excerpt_wordcount = count( $post_content_stripped_array );
$cutpos = 0;
while($excerpt_wordcount>48){
$thiswordrev = "/" . strrev($post_content_stripped_array[$excerpt_wordcount - 1]) . "/";
preg_match($thiswordrev, strrev($post_content), $matches, PREG_OFFSET_CAPTURE, $cutpos);
$cutpos = $matches[0][1] + (strlen($thiswordrev) - 2);
array_pop($post_content_stripped_array);
$excerpt_wordcount = count( $post_content_stripped_array );
}
if($pwordcount>$excerpt_wordcount){
preg_match_all('/<\/?[^>]*>/', substr( $post_content, strlen($post_content) - $cutpos ), $closetags_result);
$excerpt_closetags = "" . $closetags_result[0][0];
$post_excerpt = substr( $post_content, 0, strlen($post_content) - $cutpos ) . $excerpt_closetags;
}else{
$post_excerpt = $post_content;
}
I am actually searching the string in reverse in this case, since I am walking word by word backwards from the end of the string, so I know that my html brackets are backwards, eg:
>p/<!su nioj emoc os >a/<laiceps yadiloh>"su nioj"=lebal-aira "renepoon rerreferon"=ler "knalb_"=tegrat "lmth.egapemos/"=ferh a< ruo litnu syad erom 41 era erehT>p<
But it's easy enough to flip all of the brackets before doing the preg_match, or I am assuming should be easy enough to have the preg_match account for that.
Do not use regex to parse HTML.
You have a simple objective: limit the text content to a given number of words, ensuring that the HTML remains valid.
To this end, I would suggest looping through text nodes until you count a certain number of words, and then removing everything after that.
$dom = new DOMDocument();
$dom->loadHTML($post_content);
$xpath = new DOMXPath($dom);
$all_text_nodes = $xpath->query("//text()");
$words_left = 48;
foreach( $all_text_nodes as $text_node) {
$text = $text_node->textContent;
$words = explode(" ", $text); // TODO: maybe preg_split on /\s/ to support more whitespace types
$word_count = count($words);
if( $word_count < $words_left) {
$words_left -= $word_count;
continue;
}
// reached the threshold
$words_that_fit = implode(" ", array_slice($words, 0, $words_left));
// If the above TODO is implemented, this will need to be adjusted to keep the specific whitespace characters
$text_node->textContent = $words_that_fit;
$remove_after = $text_node;
while( $remove_after->parentNode) {
while( $remove_after->nextSibling) {
$remove_after->parentNode->removeChild($remove_after->nextSibling);
}
$remove_after = $remove_after->parentNode;
}
break;
}
$output = substr($dom->saveHTML($dom->getElementsByTagName("body")->item(0)), strlen("<body>"), -strlen("</body>"));
Ok, I figured out a workaround. I don't know if this is the most elegant solution, so if someone sees a better one I would still love to hear it, but for now I realized that I don't have to actually have the html in the string I am searching to determine the position to cut, I just need it to be the same length. I grabbed all of the html elements and just created a dummy string replacing all of them with the same number of asterisks:
// create faux string with placeholders instead of html for search purposes
preg_match_all('/<\/?[^>]*>/', $post_content, $alltags_result);
$tagcount = count( $alltags_result );
$post_content_dummy = $post_content;
foreach($alltags_result[0] as $thistag){
$post_content_dummy = str_replace($thistag, str_repeat("*",strlen($thistag)), $post_content_dummy);
}
Then I just use $post_content_dummy in the while loop instead of $post_content, in order to find the cut position, and then $post_content for the actual cut. So far seems to be working fine.
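For reference, the same masking idea can be sketched in one pass with preg_replace_callback(), which replaces each tag occurrence where it stands. The sample HTML is shortened from the question and the variable names are only illustrative:

```php
<?php
$html = '<p>Our <a href="/somepage.html" aria-label="join us">holiday special</a> so come join us!</p>';

// Replace every tag with the same number of asterisks so offsets stay identical.
$masked = preg_replace_callback(
    '/<\/?[^>]*>/',
    function (array $match): string {
        return str_repeat('*', strlen($match[0]));
    },
    $html
);

// Search the masked copy, then use the offset on the real string.
preg_match('/join/', $masked, $found, PREG_OFFSET_CAPTURE);
$offset = $found[0][1];

echo substr($html, $offset, 8); // join us!
```

The word inside the aria-label attribute is hidden in the masked copy, so the first match is the one in the visible text.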

PHP extra whitespace not being deleted

I'm counting words in an article and removing common words such as "and" or "the".
I'm removing them with preg_replace.
After that is done, I do a quick clean-up of extra white space using:
$search_body = preg_replace('/\s+/',' ',$search_body);
However I've got some very stubborn white space that will not go away. I've tried
if($word == "" OR $word == " "){
//chop it's head off
}
But the if statement does not see $word as being just whitespace. I've also tried printing it to the screen to get the raw data type of it and it's still just showing up blank.
Here is the full regex that I'm using.
$pattern = array(
'/\&quot\;/',
'/[0-9]/',
'/\,/',
'/\./',
'/\!/',
'/\@/',
'/\#/',
'/\$/',
'/\%/',
'/\^/',
'/\&/',
'/\*/',
'/\(/',
'/\)/',
'/\_/',
'/\"/',
'/\'/',
'/\:/',
'/\;/',
'/\?/',
'/\`/',
'/\~/',
'/\[/',
'/\]/',
'/\{/',
'/\}/',
'/\|/',
'/\+/',
'/\=/',
'/\-/',
'/–/',
'/°/',
'/\bthe\b/',
'/\band\b/',
'/\bthat\b/',
'/\bhave\b/',
'/\bfor\b/',
'/\bnot\b/',
'/\bwith\b/',
'/\byou\b/',
'/\bthis\b/',
'/\bbut\b/',
'/\bhis\b/',
'/\bfrom\b/',
'/\bthey\b/',
'/\bsay\b/',
'/\bher\b/',
'/\bshe\b/',
'/\bwill\b/',
'/\bone\b/',
'/\ball\b/',
'/\bwould\b/',
'/\bthere\b/',
'/\btheir\b/',
'/\bwhat\b/',
'/\bout\b/',
'/\babout\b/',
'/\bwho\b/',
'/\bget\b/',
'/\bwhich\b/',
'/\bwhen\b/',
'/\bmake\b/',
'/\bcan\b/',
'/\blike\b/',
'/\btime\b/',
'/\bjust\b/',
'/\bhim\b/',
'/\bknow\b/',
'/\btake\b/',
'/\bpeople\b/',
'/\binto\b/',
'/\byear\b/',
'/\byour\b/',
'/\bgood\b/',
'/\bsome\b/',
'/\bcould\b/',
'/\bthem\b/',
'/\bsee\b/',
'/\bother\b/',
'/\bthan\b/',
'/\bthen\b/',
'/\bnow\b/',
'/\blook\b/',
'/\bonly\b/',
'/\bcome\b/',
'/\bits\b/', //it's?
'/\bover\b/',
'/\bthink\b/',
'/\balso\b/',
'/\bback\b/',
'/\bafter\b/',
'/\buse\b/',
'/\btwo\b/',
'/\bhow\b/',
'/\bour\b/',
'/\bwork\b/',
'/\bfirst\b/',
'/\bwell\b/',
'/\bway\b/',
'/\beven\b/',
'/\bnew\b/',
'/\bwant\b/',
'/\bbecause\b/',
'/\bany\b/',
'/\bthese\b/',
'/\bgive\b/',
'/\bday\b/',
'/\bmost\b/',
'/\bare\b/',
'/\bwas\b/',
'/\<\w+\>/', '/\<\/\w+\>/',
'/\b\w{1}\b/', //1 letter word
'/\b\w{2}\b/', //2 letter word
'/\//',
'/\</',
'/\>/'
);
$search_body = strip_tags($body);
$search_body = strtolower($search_body);
$search_body = preg_replace($pattern, ' ', $search_body);
$search_body = preg_replace('/\s+/',' ',$search_body);
$search_body = explode(" ", $search_body);
When exploded, blank values show up left and right.
The example text I am using is too long to post here, but I copied and pasted this article to test it, and it showed 32 counts of white space, not including the white space in front of or behind other words, even after using trim().
Here's a js.fiddle of the raw data that is being handled by php.
htmlentities and htmlspecialchars also show nothing.
Here's the code that counts all the values and puts them into one array.
$inhere = array();
$body_hold = array();
foreach($search_body as $value){
$value = trim($value);
if(in_array($value, $inhere) && $value != ""){
$key = array_search($value, $inhere);
$body_hold[$key]['count'] = $body_hold[$key]['count']+1;
}elseif($value != ""){
$inhere[] = $value;
$body_hold[] = array(
'count' => 1,
'word' => $value
);
}
}
rsort($body_hold);
Basic foreach to see values.
foreach($body_hold as $value){
$count = $value['count'];
$word = trim($value['word']);
echo "Count: ".$count;
echo " Word: ".$word;
echo '<br>';
}
Here's a PHP example of what it's returning
Are you sure you put the exact same data you're processing in the js.fiddle? Or did you get it from a subsequent post-processed step?
It's obviously a Wikipedia article. I went to that article on Wikipedia, opened it in Edit mode, and saw that there are &nbsp;s in the raw wikitext. However, those nbsp's don't appear in your js.fiddle data.
TL;DR: Check for &nbsp; in your processing (and convert to spaces, etc.).
Character 160 (the non-breaking space) looks like a space but isn't one. Replacing all of them with regular spaces (32) and then collapsing the repeated spaces will fix your problem.
$search_body = str_replace(chr(160), chr(32), $search_body);
$search_body = trim(preg_replace('/\s+/', ' ', $search_body));
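One caveat: chr(160) is the non-breaking space only in single-byte encodings such as ISO-8859-1. In UTF-8 the same character is the two bytes C2 A0, and it can also arrive as the literal entity &nbsp;. A minimal sketch that covers those two cases, assuming UTF-8 input (normalize_whitespace() is a made-up name):

```php
<?php
// Hypothetical helper: turn non-breaking spaces into regular ones, then collapse runs.
function normalize_whitespace(string $text): string
{
    // "\xC2\xA0" is the UTF-8 encoding of U+00A0; '&nbsp;' is the undecoded entity.
    $text = str_replace(['&nbsp;', "\xC2\xA0"], ' ', $text);
    return trim(preg_replace('/\s+/', ' ', $text));
}

$clean = normalize_whitespace("foo\xC2\xA0 \xC2\xA0bar&nbsp;baz ");
echo $clean, "\n"; // foo bar baz

// Splitting with PREG_SPLIT_NO_EMPTY also avoids empty array entries.
$words = preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);
```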

Preg_replace do not replace everything

$content = $this->comment->getContent(true);
$bbcodes = array (
'#\[cytat=(.*?)\](.*?)\[/cytat\]#' => '<div class="cytata">\\1 napisał/a </div> <div class="cytatb">\\2</div>',
'#\[cytat\](.*?)\[/cytat\]#' => '<div class="cytata">cytat</div><div class="cytatb">\\1</div>',
);
$content = preg_replace(array_keys($bbcodes), array_values($bbcodes), $content);
That preg_replace is not replacing every tag as it should.
For example, if there is only one tag, [cytat]some text[/cytat] (cytat means quote in Polish), then everything is OK and the output is
<div class="cytata">author napisał/a </div> <div class="cytatb">some text</div>
but if there is more than one quote, then preg_replace replaces only one tag, for example
<div class="cytata">o0skar napisał/a </div> <div class="cytatb">[cytat=o0skar]test nr2</div>[/cytat]
That's the output of a double quote, etc. Any ideas? Is something wrong?
Maybe I can put preg_replace in a while loop, but I don't know whether preg_replace returns anything I could use as the loop condition.
For the sake of regular expressions awesomeness, let's look at this one. I had to change the pattern by 1 character. I removed one of the lazy ? and made this a preg_replace_callback
function pregcallbackfunc($matches){
$pattern = '#\[cytat=(.*?)\](.*)\[/cytat\]#';
if(preg_match($pattern, $matches[2])){
$matches[2] = preg_replace_callback($pattern,'pregcallbackfunc', $matches[2]);
}
if($matches[2]){
return '<div class="cytata">'.$matches[1].' napisał/a </div> <div class="cytatb">'.$matches[2].'</div>';
}
return '<div class="cytata">cytat</div><div class="cytatb">'.$matches[1].'</div>';
}
$content = '[cytat=o0skar][cytat=o0skar]test nr2[/cytat][/cytat]';
$content = preg_replace_callback('#\[cytat=(.*?)\](.*)\[/cytat\]#', 'pregcallbackfunc', $content);
Making this recursive handles any level of nested quotes.
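On the while-loop idea from the question: preg_replace() can report how many replacements it made through its fifth, by-reference argument, so a loop is possible too. The following is a sketch of that alternative, not the approach above; convert_cytat() is a hypothetical name, and the patterns are tightened so that only innermost quotes match on each pass.

```php
<?php
// Hypothetical helper: convert [cytat] tags from the innermost pair outward.
function convert_cytat(string $content): string
{
    // The quote body may not contain another opening [cytat] or [cytat=...] tag.
    $body = '((?:(?!\[cytat[=\]]).)*?)';
    $patterns = [
        '#\[cytat=([^\]]*)\]' . $body . '\[/cytat\]#s',
        '#\[cytat\]' . $body . '\[/cytat\]#s',
    ];
    $replacements = [
        '<div class="cytata">$1 napisał/a </div> <div class="cytatb">$2</div>',
        '<div class="cytata">cytat</div><div class="cytatb">$1</div>',
    ];
    do {
        // $count receives the number of replacements made in this pass.
        $content = preg_replace($patterns, $replacements, $content, -1, $count);
    } while ($count > 0);
    return $content;
}

echo convert_cytat('[cytat=o0skar][cytat=o0skar]test nr2[/cytat][/cytat]');
```

Each pass removes at least one tag pair, so the loop stops once nothing is left to convert; unbalanced leftover tags are simply left in place.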

Create a blurb from text from a database (similar to "read more")

I am trying to cut text from a database off if <!-- break --> is found then only show what is before the break. I currently have this code
//get the description before the more link
$project_blurb = htmlspecialchars_decode($project_data['p_desc']);
if (strstr($project_blurb, '<!-- break -->')) {
$project_blurb = explode("<!-- break -->" , $project_blurb);
$project_desc = $project_blurb[0];
}
else{
$project_desc = $project_blurb;
}
This works, except that the text starts with a <p> tag and I'm currently cutting off the closing </p> tag, which breaks the html on the site. I want to know if there is a better way to get only the description before the "break". I've tried to use strip_tags, but this also strips the comment that I need to search for and creates some ugly formatting.
Thanks
A solution I came up with is similar to jheddings' method.
I cleaned up his script and used a code snippet I found on Snipplr, "Close Tags In A HTML Snippet", to find open tags and close them. (Note that I am assuming you really only care about closing p tags.)
Note: The snippet may have shortcomings, but it got the job done for the example I was working with.
In the example script below I take the sample blurb, cut out everything after the break marker, and append "..." to it. Then I strip_tags everything except the p tags, and use the closetags function to match all tags and close any that are unmatched.
It is far from neat, but if your data set is simple enough it may be a quick way to go about it.
<?php
$project_blurb = "<p>This is a blurb with content</p><p>This is another<!-- break -->blurb</p>";
if (($pos = strpos($project_blurb, '<!-- break -->')) !== false) {
$project_desc = substr($project_blurb, 0, $pos)."...";
} else {
$project_desc = $project_blurb;
}
$project_desc = strip_tags($project_desc, '<p>');
$project_desc = closetags($project_desc);
echo $project_desc;
function closetags ( $html )
{
#put all opened tags into an array
preg_match_all ( "#<([a-z]+)( .*)?(?!/)>#iU", $html, $result );
$openedtags = $result[1];
#put all closed tags into an array
preg_match_all ( "#</([a-z]+)>#iU", $html, $result );
$closedtags = $result[1];
$len_opened = count ( $openedtags );
# all tags are closed
if( count ( $closedtags ) == $len_opened )
{
return $html;
}
$openedtags = array_reverse ( $openedtags );
# close tags
for( $i = 0; $i < $len_opened; $i++ )
{
if ( !in_array ( $openedtags[$i], $closedtags ) )
{
$html .= "</" . $openedtags[$i] . ">";
}
else
{
unset ( $closedtags[array_search ( $openedtags[$i], $closedtags)] );
}
}
return $html;
}
?>
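The closetags() snippet above only checks whether a tag name appears among the closed tags, so repeated or nested tags can trip it up. A slightly stricter variant keeps a stack of open tags. This is still a regex-based sketch for simple snippets, not a real HTML parser; close_open_tags() and its list of void elements are assumptions for illustration.

```php
<?php
// Hypothetical helper: append closing tags for elements left open, innermost first.
function close_open_tags(string $html): string
{
    $void = ['br', 'hr', 'img', 'input', 'link', 'meta']; // elements that never get a closing tag
    preg_match_all('#<(/?)([a-z][a-z0-9]*)\b[^>]*?(/?)>#i', $html, $tags, PREG_SET_ORDER);

    $stack = [];
    foreach ($tags as $tag) {
        $name = strtolower($tag[2]);
        if ($tag[3] === '/' || in_array($name, $void, true)) {
            continue; // self-closing or void element
        }
        if ($tag[1] === '/') {
            if (end($stack) === $name) {
                array_pop($stack); // closing tag matches the most recently opened element
            }
            continue;
        }
        $stack[] = $name;
    }
    while ($stack) {
        $html .= '</' . array_pop($stack) . '>';
    }
    return $html;
}

echo close_open_tags('<p>This is a blurb with content</p><p>This is another...');
// <p>This is a blurb with content</p><p>This is another...</p>
```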
You guys are all doing it wrong!!
It's coming from the DB, so this is how I do it:
SELECT SUBSTR(description, 1, INSTR(description, '<!-- break -->') -1);
Why make it overly difficult with programming languages???
Demo!!
CREATE TEMPORARY TABLE foo (description text);
INSERT INTO foo VALUES ('This is a really long paragraph. <!-- break --> OK, not really ;-)');
SELECT SUBSTRING(description, 1, INSTR(description, '<!-- break -->') - 1) AS description FROM foo;
// Output: This is a really long paragraph.
If you need to select up to your break, call strip tags after you get the field. Also, you can make it a little more efficient by only searching for your break once:
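<?php
$project_blurb = htmlspecialchars_decode($project_data['p_desc']);
// strpos() takes the haystack first and returns false when the marker is missing
if (($pos = strpos($project_blurb, '<!-- break -->')) !== false) {
$project_desc = substr($project_blurb, 0, $pos);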
} else {
$project_desc = $project_blurb;
}
$project_desc = strip_tags($project_desc);
?>
Instead of putting a special comment in your text, consider defining a maximum length for your description, so you don't have to search the string. This will be much more efficient and won't require you to modify your input.
If you are going to link to the full text from your page, instead of expanding it in place, you could also consider letting your database do some of the work for you. Assuming you are using MySQL, you could use LEFT(p_desc, 200) to pull only the first 200 characters of the p_desc field.
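If you would rather apply the maximum length in PHP, a minimal sketch could look like this. truncate_description() is a hypothetical helper; it counts bytes, so multibyte text would need mb_substr() and mb_strlen() instead.

```php
<?php
// Hypothetical helper: strip tags, then cut at the last space before $maxLength bytes.
function truncate_description(string $text, int $maxLength = 200): string
{
    $plain = trim(strip_tags($text));
    if (strlen($plain) <= $maxLength) {
        return $plain;
    }
    $cut = substr($plain, 0, $maxLength);
    $lastSpace = strrpos($cut, ' ');
    if ($lastSpace !== false) {
        $cut = substr($cut, 0, $lastSpace); // avoid ending in the middle of a word
    }
    return rtrim($cut) . '...';
}

echo truncate_description('<p>This is a really long paragraph.</p>', 12);
// This is a...
```

Stripping the tags first means the cut can never land inside markup, at the cost of losing the formatting in the blurb.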
