Truncate text without truncating HTML

This string has 78 characters with HTML and 39 characters without HTML:
<p>I really like the Google search engine.</p>
I want to truncate this string based on the non-HTML character count, so for example if I wanted to truncate the above string to 24 characters, the output would be:
I really like the Google
The truncation counts only the characters that remain after stripping the HTML, not the markup itself, and it must not leave any HTML tags open.

Alright so this is what I put together and it seems to be working:
function truncate_html($string, $length, $postfix = '…', $isHtml = true) {
$string = trim($string);
$postfix = (strlen(strip_tags($string)) > $length) ? $postfix : '';
$i = 0;
$tags = []; // change to array() if php version < 5.4
if($isHtml) {
preg_match_all('/<[^>]+>([^<]*)/', $string, $tagMatches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
foreach($tagMatches as $tagMatch) {
if ($tagMatch[0][1] - $i >= $length) {
break;
}
$tag = substr(strtok($tagMatch[0][0], " \t\n\r\0\x0B>"), 1);
if ($tag[0] != '/') {
$tags[] = $tag;
}
elseif (end($tags) == substr($tag, 1)) {
array_pop($tags);
}
$i += $tagMatch[1][1] - $tagMatch[0][1];
}
}
return substr($string, 0, $length = min(strlen($string), $length + $i)) . (count($tags = array_reverse($tags)) ? '</' . implode('></', $tags) . '>' : '') . $postfix;
}
Usage:
truncate_html('<p>I really like the Google search engine.</p>', 24);
The function was grabbed from (made a small modification):
http://www.dzone.com/snippets/truncate-text-preserving-html
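The function above measures length in bytes with strlen and substr. A rough multibyte-aware sketch of the same idea follows. The name truncate_html_sketch, the list of void elements, and the href="#" placeholder in the usage line are assumptions made for illustration and are not part of the original snippet. It counts an entity such as &amp; as several characters and does not cope with a '>' inside an attribute value.

```php
<?php
// Hypothetical sketch: truncate by visible characters, then close any tags still open.
function truncate_html_sketch(string $html, int $limit, string $suffix = '…'): string
{
    // Split into tag tokens and text tokens, keeping the tags in the result.
    $tokens = preg_split('/(<[^>]+>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $void = ['br', 'hr', 'img', 'input', 'meta', 'link', 'wbr']; // assumed, not exhaustive
    $open = [];   // stack of currently open tag names
    $out  = '';
    $used = 0;    // visible characters emitted so far

    foreach ($tokens as $token) {
        if ($token[0] === '<') {
            if (preg_match('/^<\s*(\/?)\s*([a-zA-Z0-9]+)/', $token, $m)) {
                $name = strtolower($m[2]);
                if ($m[1] === '/') {
                    // Closing tag: pop it if it matches the innermost open tag.
                    if (end($open) === $name) {
                        array_pop($open);
                    }
                } elseif (!in_array($name, $void, true) && substr($token, -2) !== '/>') {
                    $open[] = $name;
                }
            }
            $out .= $token; // tags never count toward the limit
            continue;
        }

        $len = mb_strlen($token, 'UTF-8');
        if ($used + $len > $limit) {
            // Keep only what still fits, add the suffix, close tags innermost first.
            $out .= mb_substr($token, 0, $limit - $used, 'UTF-8') . $suffix;
            while ($open) {
                $out .= '</' . array_pop($open) . '>';
            }
            return $out;
        }
        $out .= $token;
        $used += $len;
    }
    return $out; // nothing had to be cut
}

echo truncate_html_sketch('<p>I really like the <a href="#">Google</a> search engine.</p>', 24, '...'), "\n";
// <p>I really like the <a href="#">Google</a>...</p>
```

Because the text is cut with mb_substr, a multibyte character at the cut point stays intact, which the byte-based version cannot guarantee.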

Related

PHP substring function returns odd symbol at the end [duplicate]

How can I get the first n characters of a string in PHP? What's the fastest way to trim a string to a specific number of characters, and append '...' if needed?

//The simple version for 10 Characters from the beginning of the string
$string = substr($string,0,10).'...';
Update:
Based on suggestion for checking length (and also ensuring similar lengths on trimmed and untrimmed strings):
$string = (strlen($string) > 13) ? substr($string,0,10).'...' : $string;
So you will get a string of max 13 characters; either 13 (or less) normal characters or 10 characters followed by '...'
Update 2:
Or as function:
function truncate($string, $length, $dots = "...") {
return (strlen($string) > $length) ? substr($string, 0, $length - strlen($dots)) . $dots : $string;
}
Update 3:
It's been a while since I wrote this answer and I don't actually use this code any more. I prefer this function which prevents breaking the string in the middle of a word using the wordwrap function:
function truncate($string,$length=100,$append="…") {
$string = trim($string);
if(strlen($string) > $length) {
$string = wordwrap($string, $length);
$string = explode("\n", $string, 2);
$string = $string[0] . $append;
}
return $string;
}

This functionality has been built into PHP since version 4.0.6. See the docs.
echo mb_strimwidth('Hello World', 0, 10, '...');
// outputs Hello W...
Note that the trim marker (the ellipsis above) is included in the truncated length.
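A small check of that point, assuming the mbstring extension is loaded. The variable names are illustrative only. Note that mb_strimwidth measures display width, so East Asian wide characters count as two columns.

```php
<?php
// The width argument (10 here) covers the kept text plus the trim marker.
$short = mb_strimwidth('Hello World', 0, 10, '...', 'UTF-8');
echo $short, "\n";                      // Hello W...
echo mb_strlen($short, 'UTF-8'), "\n";  // 10

// A string that already fits is returned unchanged, with no marker added.
$same = mb_strimwidth('Hello', 0, 10, '...', 'UTF-8');
echo $same, "\n";                       // Hello
```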

The Multibyte extension can come in handy if you need control over the string charset.
$charset = 'UTF-8';
$length = 10;
$string = 'Hai to yoo! I like yoo soo!';
if(mb_strlen($string, $charset) > $length) {
$string = mb_substr($string, 0, $length - 3, $charset) . '...';
}

Sometimes you need to limit the string to the last complete word, i.e. you don't want the last word to be broken, so you stop at the second-to-last word instead.
For example:
we need to limit "This is my String" to 6 chars, but instead of "This i..." we want "This...", i.e. we skip the broken letters of the last word.
Phew, I'm bad at explaining; here is the code.
class Fun {
public function limit_text($text, $len) {
if (strlen($text) < $len) {
return $text;
}
$text_words = explode(' ', $text);
$out = ''; // start with an empty string; passing null to strlen() is deprecated since PHP 8.1
foreach ($text_words as $word) {
if ((strlen($word) > $len) && $out == null) {
return substr($word, 0, $len) . "...";
}
if ((strlen($out) + strlen($word)) > $len) {
return $out . "...";
}
$out.=" " . $word;
}
return $out;
}
}

If you want to cut while being careful not to split words, you can do the following:
function ellipse($str,$n_chars,$crop_str=' [...]')
{
$buff=strip_tags($str);
if(strlen($buff) > $n_chars)
{
$cut_index=strpos($buff,' ',$n_chars);
$buff=substr($buff,0,($cut_index===false? $n_chars: $cut_index+1)).$crop_str;
}
return $buff;
}
If $str is shorter than $n_chars, it is returned untouched.
If the length of $str is equal to $n_chars, it is returned as is as well.
If $str is longer than $n_chars, the function looks for the next space at which to cut or, if there are no more spaces before the end, cuts $str abruptly at $n_chars.
NOTE: be aware that this method removes all tags from HTML input.

The CodeIgniter framework contains a helper for this, called the "text helper". Here's some documentation from CodeIgniter's user guide that applies: http://codeigniter.com/user_guide/helpers/text_helper.html
(just read the word_limiter and character_limiter sections).
Here are two functions from it that are relevant to your question:
if ( ! function_exists('word_limiter'))
{
function word_limiter($str, $limit = 100, $end_char = '…')
{
if (trim($str) == '')
{
return $str;
}
preg_match('/^\s*+(?:\S++\s*+){1,'.(int) $limit.'}/', $str, $matches);
if (strlen($str) == strlen($matches[0]))
{
$end_char = '';
}
return rtrim($matches[0]).$end_char;
}
}
And
if ( ! function_exists('character_limiter'))
{
function character_limiter($str, $n = 500, $end_char = '…')
{
if (strlen($str) < $n)
{
return $str;
}
$str = preg_replace("/\s+/", ' ', str_replace(array("\r\n", "\r", "\n"), ' ', $str));
if (strlen($str) <= $n)
{
return $str;
}
$out = "";
foreach (explode(' ', trim($str)) as $val)
{
$out .= $val.' ';
if (strlen($out) >= $n)
{
$out = trim($out);
return (strlen($out) == strlen($str)) ? $out : $out.$end_char;
}
}
}
}

if(strlen($text) > 10)
$text = substr($text,0,10) . "...";

Use substring
http://php.net/manual/en/function.substr.php
$foo = substr("abcde",0, 3) . "...";

I'm not sure if this is the fastest solution, but it looks like it is the shortest one:
$result = current(explode("\n", wordwrap($str, $width, "...\n")));
P.S. See some examples here https://stackoverflow.com/a/17852480/131337

This function does the job without breaking words in the middle:
function str_trim($str,$char_no){
if(strlen($str)<=$char_no)
return $str;
else{
$all_words=explode(" ",$str);
$out_str='';
foreach ($all_words as $word) {
$temp_str=($out_str=='')?$word:$out_str.' '.$word;
if(strlen($temp_str)>$char_no-3)//-3 for 3 dots
return $out_str."...";
$out_str=$temp_str;
}
}
}

The function I used:
function cutAfter($string, $len = 30, $append = '...') {
return (strlen($string) > $len) ?
substr($string, 0, $len - strlen($append)) . $append :
$string;
}
See it in action.

This is what I do:
function cutat($num, $tt){
if (mb_strlen($tt)>$num){
$tt=mb_substr($tt,0,$num-3).'...'; // reserve 3 characters for the dots so the result is at most $num long
}
return $tt;
}
where $num stands for number of chars, and $tt the string for manipulation.

I developed a function for this use
function str_short($string,$limit)
{
$len=strlen($string);
if($len>$limit)
{
$to_sub=$len-$limit;
$crop_temp=substr($string,0,-$to_sub);
return $crop_len=$crop_temp."...";
}
else
{
return $string;
}
}
Just call the function with the string and the limit,
e.g. str_short("hahahahahah",5);
It will cut off your string and add "..." at the end.
:)

To wrap this in a function (for repeated use) with a configurable length limit, use:
function string_length_cutoff($string, $limit, $subtext = '...')
{
return (strlen($string) > $limit) ? substr($string, 0, ($limit-strlen($subtext))).$subtext : $string;
}
// example usage:
echo string_length_cutoff('Michelle Lee Hammontree-Garcia', 26);
// or (for custom substitution text)
echo string_length_cutoff('Michelle Lee Hammontree-Garcia', 26, '..');

It's best to abstract your code like so (notice the limit is optional and defaults to 10):
print limit($string);
function limit($var, $limit=10)
{
if ( strlen($var) > $limit )
{
return substr($var, 0, $limit) . '...';
}
else
{
return $var;
}
}

substr() would be best, you'll also want to check the length of the string first
$str = 'someLongString';
$max = 7;
if(strlen($str) > $max) {
$str = substr($str, 0, $max) . '...';
}
wordwrap won't trim the string down, just split it up...
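To illustrate that remark: wordwrap only inserts line breaks, so a preview has to take the first line itself. This is a minimal sketch with made-up variable names, not the answerer's code.

```php
<?php
$text = 'The quick brown fox';

// wordwrap() keeps every word and only inserts breaks at the given width.
$wrapped = wordwrap($text, 10, "\n");        // "The quick\nbrown fox"

// Keeping just the first line turns the wrap into a word-safe truncation.
$firstLine = explode("\n", $wrapped, 2)[0];  // "The quick"
$preview   = ($firstLine === $text) ? $text : $firstLine . '...';

echo $preview, "\n"; // The quick...
```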

$width = 10;
$a = preg_replace ("~^(.{{$width}})(.+)~", '\\1…', $a);
or, cutting at a word boundary:
$a = preg_replace ("~^(.{1,{$width}}\b)(.+)~", '\\1…', $a);

This solution will not cut words; it adds the three dots after the first space at or beyond the limit.
I edited Raccoon29's solution and replaced all functions with mb_ functions so that it works for all languages, such as Arabic:
function cut_string($str, $n_chars, $crop_str = '...') {
$buff = strip_tags($str);
if (mb_strlen($buff) > $n_chars) {
$cut_index = mb_strpos($buff, ' ', $n_chars);
$buff = mb_substr($buff, 0, ($cut_index === false ? $n_chars : $cut_index + 1), "UTF-8") . $crop_str;
}
return $buff;
}

$yourString = "bla blaaa bla blllla bla bla";
$out = "";
if(strlen($yourString) > 22) {
while(strlen($yourString) > 22) {
$pos = strrpos($yourString, " ");
if($pos !== false && $pos <= 22) {
$out = substr($yourString,0,$pos);
break;
} else {
$yourString = substr($yourString,0,$pos);
continue;
}
}
} else {
$out = $yourString;
}
echo "Output String: ".$out;

If there is no hard requirement on the length of the truncated string, one can use this to truncate and prevent cutting the last word as well:
$text = "Knowledge is a natural right of every human being of which no one
has the right to deprive him or her under any pretext, except in a case where a
person does something which deprives him or her of that right. It is mere
stupidity to leave its benefits to certain individuals and teams who monopolize
these while the masses provide the facilities and pay the expenses for the
establishment of public sports.";
// we don't want new lines in our preview
$text_only_spaces = preg_replace('/\s+/', ' ', $text);
// truncates the text
$text_truncated = mb_substr($text_only_spaces, 0, mb_strpos($text_only_spaces, " ", 50));
// prevents last word truncation
$preview = trim(mb_substr($text_truncated, 0, mb_strrpos($text_truncated, " ")));
In this case, $preview will be "Knowledge is a natural right of every human being".
Live code example:
http://sandbox.onlinephpfunctions.com/code/25484a8b687d1f5ad93f62082b6379662a6b4713

How to fit text block into div?

I have a PHP script that echoes a string into an HTML div.
The div has room for only 50 chars in width, so if the string is longer, I have to cut it into however many lines it takes.
So I can use strlen to see where I should be cutting the string and echo a <br>. The problem is, I don't want to cut it in the middle of a word.
I had a solution in mind, but it seems that I'm overcomplicating this.
Thanks,

Just add an attribute class="wrapping" to the div you want to have this behaviour, and add this css to your HTML page:
div.wrapping {
word-wrap: break-word;
}

http://www.totallyphp.co.uk/code/shorten_a_text_string.htm
You can use this simple function to shorten your text

There are some examples in the PHP doc page for substr. Copy/pasting a couple:
$str = "aa bb ccc ddd ee fff gg hhh iii";
echo substr(($str=wordwrap($str,$len,'$$')),0,strpos($str,'$$'));
and
function _substr($str, $length, $minword = 3)
{
$sub = '';
$len = 0;
foreach (explode(' ', $str) as $word)
{
$part = (($sub != '') ? ' ' : '') . $word;
$sub .= $part;
$len += strlen($part);
if (strlen($word) > $minword && strlen($sub) >= $length)
{
break;
}
}
return $sub . (($len < strlen($str)) ? '...' : '');
}

wordwrap($string, 50, "<br>\n")
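A slightly fuller sketch of this wordwrap approach, under a few assumptions: wrap_for_div is a made-up name, the text is escaped for HTML output, and words longer than the width are force-cut. wordwrap counts bytes, so multibyte text may wrap earlier than expected.

```php
<?php
// Hypothetical helper: wrap plain text for a fixed-width div.
function wrap_for_div(string $text, int $width = 50): string
{
    // Wrap first, so the width applies to the raw text rather than to entities.
    $wrapped = wordwrap($text, $width, "\n", true); // true: also cut over-long words

    // Escape the text, then turn each newline into "<br>" followed by the newline.
    return nl2br(htmlspecialchars($wrapped, ENT_QUOTES, 'UTF-8'), false);
}

echo wrap_for_div('aaaa bbbb cccc', 9), "\n";
// aaaa bbbb<br>
// cccc
```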

write function
// Original PHP code by Chirp Internet: www.chirp.com.au
// Please acknowledge use of this code by including this header.
function myTruncate($string, $limit, $break=".", $pad="...") {
// return with no change if string is shorter than $limit
if(strlen($string) <= $limit) return $string;
// is $break present between $limit and the end of the string?
if(false !== ($breakpoint = strpos($string, $break, $limit))) {
if($breakpoint < strlen($string) - 1) {
$string = substr($string, 0, $breakpoint) . $pad;
}
}
return $string;
}

Extract HTML-like tags with PHP

I'm trying to extract OUTERMOST special HTML-like tags from a given string. Here's a sample string:
sample string with <::Class id="some id\" and more">text with possible other tags inside<::/Class> some more text
I need to find where in a string a <::Tag starts and where it ends. The problem is that it might contain nested tags. Is there a simple loop-like algorithm to find the FIRST occurrence of the <::Tag and the length of the string up to the matching <::/Tag>? I've tried a different way, using a simple HTML tag instead and using DomDocument, but it cannot tell me the position of the tag in the string. I cannot use external libraries; I'm just looking for pointers as to how this could be solved. Maybe you've seen an algorithm that does exactly that, in which case I'd like to have a look at it.
Thanks for the help.
P.S. Regex solutions will not work since there are nested tags. Recursive regex solutions will not work either. I'm just looking for a very simple parsing algorithm for this specific case.

What you're talking about here is making a template. Regex for parsing templates is very slow. Instead, your template-reading/processing engine should be doing a string parse. It's not super-easy, but it's also not terribly hard. Still, my advice is use another template library instead of reinventing the wheel.
There's an open-source template engine in PHPBB that you could utilize or learn from. Or, use something like Smarty. If performance is a major deal, have a look at Blitz.

strpos + strrpos (ouch...)
$str = 'sample string with <::Class id="some id" and more">text with possible <::Strong>other<::/Strong> tags inside<::/Class> some more text';
$tag = '<::';
$first = strpos($str, $tag);
$last = strrpos($str, $tag);
$rtn = array();
$cnt = 0;
while ($first<$last)
{
if (!$cnt)
{
$rtn[] = substr($str, 0, $first);
}
++$cnt;
$next = strpos($str, $tag, $first+1);
if ($next)
{
$pos = strpos($str, '>', $first);
$rtn[] = substr($str, $first, $pos-$first+1);
$rtn[] = substr($str, $pos+1, $next-$pos-1);
$first = $next;
}
}
With the $rtn, you can do whatever you want then ... this code is not perfect yet ...
array (
0 => 'sample string with ',
1 => '<::Class id="some id" and more">',
2 => 'text with possible ',
3 => '<::Strong>',
4 => 'other',
5 => '<::/Strong>',
6 => ' tags inside',
7 => '<::/Class> some more text',
)

So basically here's what I came up with. Something like ajreal's solution, only not as clean ;] I'm not even sure it works perfectly yet, but initial testing was successful.
protected function findFirstControl()
{
$pos = strpos($this->mSource, '<::');
if ($pos === false)
return false;
// get the control name
$endOfName = false;
$controlName = '';
$len = strlen($this->mSource);
$i = $pos + 3;
while (!$endOfName && $i < $len)
{
$char = $this->mSource[$i];
if (($char >= 'a' && $char <= 'z') || ($char >= 'A' && $char <= 'Z'))
$controlName .= $char;
else
$endOfName = true;
$i++;
}
if ($controlName == '')
return false;
$posOfEnd = strpos($this->mSource, '<::/' . $controlName, $i);
$posOfStart = strpos($this->mSource, '<::' . $controlName, $i);
if ($posOfEnd === false)
return false;
if ($posOfStart > $pos)
{
while ($posOfStart > $pos && $posOfEnd !== false && $posOfStart < $posOfEnd)
{
$i = $posOfStart + 1;
$n = $posOfEnd + 1;
$posOfStart = strpos($this->mSource, '<::' . $controlName, $i);
$posOfEnd = strpos($this->mSource, '<::/' . $controlName, $n);
}
}
if ($posOfEnd !== false)
{
$ln = $posOfEnd - $pos + strlen($controlName) + 5;
return array($pos, $ln, $controlName, substr($this->mSource, $pos, $ln));
}
else
return false;
}

Not an extendable solution, but it works.
$startPos = strpos($string, '<::Class');
$endPos = strrpos($string, '<::/Class>');
Note my use of strrpos to fix the nesting problem. Also, this will give you the start position of <::/Class>, not the end.
Why don't you just use regular XML and the DOM? Or just an existing template engine like Smarty?
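As a pointer toward the simple loop the question asks for, here is a depth-counting sketch. The function name find_outermost_tag and its return shape are assumptions. It uses a regex only to locate tag starts and tracks nesting with a counter, and it ignores the possibility of tag-like text or a '>' inside quoted attribute values.

```php
<?php
// Hypothetical helper: find the first outermost <::Name ...>...<::/Name> block.
// Returns [start offset, length, tag name], or null if no balanced block exists.
function find_outermost_tag(string $src): ?array
{
    // Locate the first opening tag and capture its name.
    if (!preg_match('/<::([A-Za-z]+)/', $src, $m, PREG_OFFSET_CAPTURE)) {
        return null;
    }
    $name  = $m[1][0];
    $start = $m[0][1];

    // Collect every opening or closing tag with this exact name from $start on.
    $pattern = '/<::(\/?)' . preg_quote($name, '/') . '\b/';
    preg_match_all($pattern, $src, $tags, PREG_SET_ORDER | PREG_OFFSET_CAPTURE, $start);

    $depth = 0;
    foreach ($tags as $tag) {
        // An opener raises the depth, a closer lowers it.
        $depth += ($tag[1][0] === '/') ? -1 : 1;
        if ($depth === 0) {
            // The matching closer ends at the next '>' after its start.
            $close = strpos($src, '>', $tag[0][1]);
            if ($close === false) {
                return null;
            }
            return [$start, $close - $start + 1, $name];
        }
    }
    return null; // unbalanced input
}

$s = 'a <::Class id="x">t <::Class>in<::/Class> u<::/Class> z';
[$pos, $len, $tagName] = find_outermost_tag($s);
echo substr($s, $pos, $len), "\n";
// <::Class id="x">t <::Class>in<::/Class> u<::/Class>
```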

optimizing a php function that trims strings

I programmed this PHP function that takes any text/HTML string and trims it.
For example:
gen_string("Hello, how are you today?",10);
Returns:
Hello, how...
The problem arises when the function string limit is the same as the position of a special character such as: á, ñ, etc...
In which case:
gen_string("Helló my friend",5);
Returns: Hell�...
Any ideas on how to solve this issue? This is the current function:
# string: advanced substr
function gen_string($string,$min,$clean=false) {
$text = trim(strip_tags($string));
if(strlen($text)>$min) {
$blank = strpos($text,' ');
if($blank) {
# limit plus last word
$extra = strpos(substr($text,$min),' ');
$max = $min+$extra;
$r = substr($text,0,$max);
if(strlen($text)>=$max && !$clean) $r=trim($r,'.').'...';
} else {
# if there are no spaces
$r = substr($text,0,$min).'...';
}
} else {
# if original length is lower than limit
$r = $text;
}
return trim($r);
}
Thanks!

You should use the multibyte string functions to correctly handle unicode characters.
For example you could try using mb_strimwidth to truncate a string to a specified length.
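The difference can be seen with the string from the question. This is a minimal sketch that assumes UTF-8 input and a loaded mbstring extension; the variable names are illustrative.

```php
<?php
$text = "Hell\u{F3} my friend"; // "Helló my friend"; ó takes two bytes in UTF-8

$byteCut = substr($text, 0, 5);              // 5 bytes: cuts through the middle of ó
$charCut = mb_substr($text, 0, 5, 'UTF-8');  // 5 characters: keeps ó whole

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)
var_dump($charCut === "Hell\u{F3}");            // bool(true)

// mb_strimwidth truncates and appends the marker in one call.
echo mb_strimwidth($text, 0, 8, '...', 'UTF-8'), "\n"; // Helló...
```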

You could also take a different approach and make use of the PCRE regex extension's UTF-8 capabilities (assuming your strings are UTF-8!).
function gen_string($string, $length)
{
$str = trim(strip_tags($string));
$strlen = strlen(utf8_decode($str));
// String is less than limit
if ($strlen <= $length) return $str;
// Shorten string, preserving whole "words" (non-whitespace)
preg_match('/^.{'.($length-1).'}\S*/su', $str, $match);
// Append ellipsis if needed (bytes length is OK to check)
if (strlen($match[0]) !== strlen($str)) $match[0] .= '...';
return $match[0];
}

Aside from the multibyte issue, maybe you can write it shorter
function gen_string($str, $limit) {
if (strlen($str) <= $limit)
return $str;
$offset = -(strlen($str) - $limit);
return substr($str, 0, strrpos($str, ' ', $offset)).'...';
}
It will limit the length of the string, so rather than cut it after the first word beyond the limit, it ensures that the length is never larger than the limit.
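A multibyte-aware sketch of the same "never longer than the limit" idea, under stated assumptions: truncate_words is a made-up name, the input is UTF-8, and the limit applies to the text before the suffix is appended.

```php
<?php
// Hypothetical helper: cut at the last space that keeps the text within $limit characters.
function truncate_words(string $text, int $limit, string $suffix = '...'): string
{
    if (mb_strlen($text, 'UTF-8') <= $limit) {
        return $text; // already short enough
    }

    // Look one character past the limit so a word ending exactly at the limit is kept.
    $window = mb_substr($text, 0, $limit + 1, 'UTF-8');
    $space  = mb_strrpos($window, ' ', 0, 'UTF-8');

    // Without any space in range, fall back to a hard cut at the limit.
    $cut = ($space === false)
        ? mb_substr($text, 0, $limit, 'UTF-8')
        : mb_substr($window, 0, $space, 'UTF-8');

    return rtrim($cut) . $suffix;
}

echo truncate_words('Hello, how are you today?', 10), "\n"; // Hello, how...
echo truncate_words("Hell\u{F3} my friend", 5), "\n";       // Helló...
```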

strlen() cannot be used for UTF-8 strings, because it counts bytes and would therefore also count the continuation bytes of multibyte characters, which should not be counted.
You can try with the following code:
define('PREG_CLASS_UNICODE_WORD_BOUNDARY',
'\x{0}-\x{2F}\x{3A}-\x{40}\x{5B}-\x{60}\x{7B}-\x{A9}\x{AB}-\x{B1}\x{B4}' .
'\x{B6}-\x{B8}\x{BB}\x{BF}\x{D7}\x{F7}\x{2C2}-\x{2C5}\x{2D2}-\x{2DF}' .
'\x{2E5}-\x{2EB}\x{2ED}\x{2EF}-\x{2FF}\x{375}\x{37E}-\x{385}\x{387}\x{3F6}' .
'\x{482}\x{55A}-\x{55F}\x{589}-\x{58A}\x{5BE}\x{5C0}\x{5C3}\x{5C6}' .
'\x{5F3}-\x{60F}\x{61B}-\x{61F}\x{66A}-\x{66D}\x{6D4}\x{6DD}\x{6E9}' .
'\x{6FD}-\x{6FE}\x{700}-\x{70F}\x{7F6}-\x{7F9}\x{830}-\x{83E}' .
'\x{964}-\x{965}\x{970}\x{9F2}-\x{9F3}\x{9FA}-\x{9FB}\x{AF1}\x{B70}' .
'\x{BF3}-\x{BFA}\x{C7F}\x{CF1}-\x{CF2}\x{D79}\x{DF4}\x{E3F}\x{E4F}' .
'\x{E5A}-\x{E5B}\x{F01}-\x{F17}\x{F1A}-\x{F1F}\x{F34}\x{F36}\x{F38}' .
'\x{F3A}-\x{F3D}\x{F85}\x{FBE}-\x{FC5}\x{FC7}-\x{FD8}\x{104A}-\x{104F}' .
'\x{109E}-\x{109F}\x{10FB}\x{1360}-\x{1368}\x{1390}-\x{1399}\x{1400}' .
'\x{166D}-\x{166E}\x{1680}\x{169B}-\x{169C}\x{16EB}-\x{16ED}' .
'\x{1735}-\x{1736}\x{17B4}-\x{17B5}\x{17D4}-\x{17D6}\x{17D8}-\x{17DB}' .
'\x{1800}-\x{180A}\x{180E}\x{1940}-\x{1945}\x{19DE}-\x{19FF}' .
'\x{1A1E}-\x{1A1F}\x{1AA0}-\x{1AA6}\x{1AA8}-\x{1AAD}\x{1B5A}-\x{1B6A}' .
'\x{1B74}-\x{1B7C}\x{1C3B}-\x{1C3F}\x{1C7E}-\x{1C7F}\x{1CD3}\x{1FBD}' .
'\x{1FBF}-\x{1FC1}\x{1FCD}-\x{1FCF}\x{1FDD}-\x{1FDF}\x{1FED}-\x{1FEF}' .
'\x{1FFD}-\x{206F}\x{207A}-\x{207E}\x{208A}-\x{208E}\x{20A0}-\x{20B8}' .
'\x{2100}-\x{2101}\x{2103}-\x{2106}\x{2108}-\x{2109}\x{2114}' .
'\x{2116}-\x{2118}\x{211E}-\x{2123}\x{2125}\x{2127}\x{2129}\x{212E}' .
'\x{213A}-\x{213B}\x{2140}-\x{2144}\x{214A}-\x{214D}\x{214F}' .
'\x{2190}-\x{244A}\x{249C}-\x{24E9}\x{2500}-\x{2775}\x{2794}-\x{2B59}' .
'\x{2CE5}-\x{2CEA}\x{2CF9}-\x{2CFC}\x{2CFE}-\x{2CFF}\x{2E00}-\x{2E2E}' .
'\x{2E30}-\x{3004}\x{3008}-\x{3020}\x{3030}\x{3036}-\x{3037}' .
'\x{303D}-\x{303F}\x{309B}-\x{309C}\x{30A0}\x{30FB}\x{3190}-\x{3191}' .
'\x{3196}-\x{319F}\x{31C0}-\x{31E3}\x{3200}-\x{321E}\x{322A}-\x{3250}' .
'\x{3260}-\x{327F}\x{328A}-\x{32B0}\x{32C0}-\x{33FF}\x{4DC0}-\x{4DFF}' .
'\x{A490}-\x{A4C6}\x{A4FE}-\x{A4FF}\x{A60D}-\x{A60F}\x{A673}\x{A67E}' .
'\x{A6F2}-\x{A716}\x{A720}-\x{A721}\x{A789}-\x{A78A}\x{A828}-\x{A82B}' .
'\x{A836}-\x{A839}\x{A874}-\x{A877}\x{A8CE}-\x{A8CF}\x{A8F8}-\x{A8FA}' .
'\x{A92E}-\x{A92F}\x{A95F}\x{A9C1}-\x{A9CD}\x{A9DE}-\x{A9DF}' .
'\x{AA5C}-\x{AA5F}\x{AA77}-\x{AA79}\x{AADE}-\x{AADF}\x{ABEB}' .
'\x{D800}-\x{F8FF}\x{FB29}\x{FD3E}-\x{FD3F}\x{FDFC}-\x{FDFD}' .
'\x{FE10}-\x{FE19}\x{FE30}-\x{FE6B}\x{FEFF}-\x{FF0F}\x{FF1A}-\x{FF20}' .
'\x{FF3B}-\x{FF40}\x{FF5B}-\x{FF65}\x{FFE0}-\x{FFFD}');
function utf8_strlen($text) {
if (function_exists('mb_strlen')) {
return mb_strlen($text);
}
// Do not count UTF-8 continuation bytes.
return strlen(preg_replace("/[\x80-\xBF]/", '', $text));
}
function utf8_truncate($string, $max_length, $wordsafe = FALSE, $add_ellipsis = FALSE, $min_wordsafe_length = 1) {
$ellipsis = '';
$max_length = max($max_length, 0);
$min_wordsafe_length = max($min_wordsafe_length, 0);
if (utf8_strlen($string) <= $max_length) {
// No truncation needed, so don't add ellipsis, just return.
return $string;
}
if ($add_ellipsis) {
// Truncate ellipsis in case $max_length is small.
$ellipsis = utf8_substr('...', 0, $max_length);
$max_length -= utf8_strlen($ellipsis);
$max_length = max($max_length, 0);
}
if ($max_length <= $min_wordsafe_length) {
// Do not attempt word-safe if lengths are bad.
$wordsafe = FALSE;
}
if ($wordsafe) {
$matches = array();
// Find the last word boundary, if there is one within $min_wordsafe_length
// to $max_length characters. preg_match() is always greedy, so it will
// find the longest string possible.
$found = preg_match('/^(.{' . $min_wordsafe_length . ',' . $max_length . '})[' . PREG_CLASS_UNICODE_WORD_BOUNDARY . ']/u', $string, $matches);
if ($found) {
$string = $matches[1];
}
else {
$string = utf8_substr($string, 0, $max_length);
}
}
else {
$string = utf8_substr($string, 0, $max_length);
}
if ($add_ellipsis) {
$string .= $ellipsis;
}
return $string;
}
function utf8_substr($text, $start, $length = NULL) {
if (function_exists('mb_substr')) {
return $length === NULL ? mb_substr($text, $start) : mb_substr($text, $start, $length);
}
else {
$strlen = strlen($text);
// Find the starting byte offset.
$bytes = 0;
if ($start > 0) {
// Count all the continuation bytes from the start until we have found
// $start characters or the end of the string.
$bytes = -1;
$chars = -1;
while ($bytes < $strlen - 1 && $chars < $start) {
$bytes++;
$c = ord($text[$bytes]);
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
}
}
}
elseif ($start < 0) {
// Count all the continuation bytes from the end until we have found
// abs($start) characters.
$start = abs($start);
$bytes = $strlen;
$chars = 0;
while ($bytes > 0 && $chars < $start) {
$bytes--;
$c = ord($text[$bytes]);
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
}
}
}
$istart = $bytes;
// Find the ending byte offset.
if ($length === NULL) {
$iend = $strlen;
}
elseif ($length > 0) {
// Count all the continuation bytes from the starting index until we have
// found $length characters or reached the end of the string, then
// backtrace one byte.
$iend = $istart - 1;
$chars = -1;
$last_real = FALSE;
while ($iend < $strlen - 1 && $chars < $length) {
$iend++;
$c = ord($text[$iend]);
$last_real = FALSE;
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
$last_real = TRUE;
}
}
// Backtrace one byte if the last character we found was a real character
// and we don't need it.
if ($last_real && $chars >= $length) {
$iend--;
}
}
elseif ($length < 0) {
// Count all the continuation bytes from the end until we have found
// abs($start) characters, then backtrace one byte.
$length = abs($length);
$iend = $strlen;
$chars = 0;
while ($iend > 0 && $chars < $length) {
$iend--;
$c = ord($text[$iend]);
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
}
}
// Backtrace one byte if we are not at the beginning of the string.
if ($iend > 0) {
$iend--;
}
}
else {
// $length == 0, return an empty string.
return '';
}
return substr($text, $istart, max(0, $iend - $istart + 1));
}
}

For your return statement you could try:
return htmlspecialchars(trim($r));
EDIT: I tried your code as you provided it and it ran fine for me without having to use htmlspecialchars(). This is probably due to the fact that the charset was set to UTF-8 in the <head> of the page the code was running on. So your options could be to set the encoding of the page like this:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
or to use htmlspecialchars() as above.


We Keep Coding
