optimizing a php function that trims strings

optimizing a php function that trims strings - php

i programmed this php function that takes any text/html string and trims it.
For example:
gen_string("Hello, how are you today?",10);
Returns:
Hello, how...
The problem arises when the function string limit is the same as the position of a special character such as: á, ñ, etc...
In which case:
gen_string("Helló my friend",5);
Returns: Hell�...
Any ideas on how to solve this issue? This is the current function:
# string: advanced substr
function gen_string($string,$min,$clean=false) {
$text = trim(strip_tags($string));
if(strlen($text)>$min) {
$blank = strpos($text,' ');
if($blank) {
# limit plus last word
$extra = strpos(substr($text,$min),' ');
$max = $min+$extra;
$r = substr($text,0,$max);
if(strlen($text)>=$max && !$clean) $r=trim($r,'.').'...';
} else {
# if there are no spaces
$r = substr($text,0,$min).'...';
}
} else {
# if original length is lower than limit
$r = $text;
}
return trim($r);
}
Thanks!

You should use the multibyte string functions to correctly handle unicode characters.
For example you could try using mb_strimwidth to truncate a string to a specified length.

You could also take a different approach and make use of the PCRE regex extension's UTF-8 capabilities (assuming your strings are UTF-8!).
function gen_string($string, $length)
{
$str = trim(strip_tags($string));
$strlen = strlen(utf8_decode($str));
// String is less than limit
if ($strlen <= $length) return $str;
// Shorten string, preserving whole "words" (non-whitespace)
preg_match('/^.{'.($length-1).'}\S*/su', $str, $match);
// Append ellipsis if needed (bytes length is OK to check)
if (strlen($match[0]) !== strlen($str)) $match[0] .= '...';
return $match[0];
}

Aside from the multibyte issue, maybe you can write it shorter
function gen_string($str, $limit) {
if ($str >= strlen($limit))
return $str;
$offset = -(strlen($str) - $limit);
return substr($str, 0, strrpos($str, ' ', $offset)).'...';
}
It will limit the length of the string, so rather than cut it after the first word beyond the limit, it ensures that the length is never larger than the limit.

strlen() cannot be used for UTF-8 string, because it would count also the continuation characters, which should not be counted.
You can try with the following code:
define('PREG_CLASS_UNICODE_WORD_BOUNDARY',
'\x{0}-\x{2F}\x{3A}-\x{40}\x{5B}-\x{60}\x{7B}-\x{A9}\x{AB}-\x{B1}\x{B4}' .
'\x{B6}-\x{B8}\x{BB}\x{BF}\x{D7}\x{F7}\x{2C2}-\x{2C5}\x{2D2}-\x{2DF}' .
'\x{2E5}-\x{2EB}\x{2ED}\x{2EF}-\x{2FF}\x{375}\x{37E}-\x{385}\x{387}\x{3F6}' .
'\x{482}\x{55A}-\x{55F}\x{589}-\x{58A}\x{5BE}\x{5C0}\x{5C3}\x{5C6}' .
'\x{5F3}-\x{60F}\x{61B}-\x{61F}\x{66A}-\x{66D}\x{6D4}\x{6DD}\x{6E9}' .
'\x{6FD}-\x{6FE}\x{700}-\x{70F}\x{7F6}-\x{7F9}\x{830}-\x{83E}' .
'\x{964}-\x{965}\x{970}\x{9F2}-\x{9F3}\x{9FA}-\x{9FB}\x{AF1}\x{B70}' .
'\x{BF3}-\x{BFA}\x{C7F}\x{CF1}-\x{CF2}\x{D79}\x{DF4}\x{E3F}\x{E4F}' .
'\x{E5A}-\x{E5B}\x{F01}-\x{F17}\x{F1A}-\x{F1F}\x{F34}\x{F36}\x{F38}' .
'\x{F3A}-\x{F3D}\x{F85}\x{FBE}-\x{FC5}\x{FC7}-\x{FD8}\x{104A}-\x{104F}' .
'\x{109E}-\x{109F}\x{10FB}\x{1360}-\x{1368}\x{1390}-\x{1399}\x{1400}' .
'\x{166D}-\x{166E}\x{1680}\x{169B}-\x{169C}\x{16EB}-\x{16ED}' .
'\x{1735}-\x{1736}\x{17B4}-\x{17B5}\x{17D4}-\x{17D6}\x{17D8}-\x{17DB}' .
'\x{1800}-\x{180A}\x{180E}\x{1940}-\x{1945}\x{19DE}-\x{19FF}' .
'\x{1A1E}-\x{1A1F}\x{1AA0}-\x{1AA6}\x{1AA8}-\x{1AAD}\x{1B5A}-\x{1B6A}' .
'\x{1B74}-\x{1B7C}\x{1C3B}-\x{1C3F}\x{1C7E}-\x{1C7F}\x{1CD3}\x{1FBD}' .
'\x{1FBF}-\x{1FC1}\x{1FCD}-\x{1FCF}\x{1FDD}-\x{1FDF}\x{1FED}-\x{1FEF}' .
'\x{1FFD}-\x{206F}\x{207A}-\x{207E}\x{208A}-\x{208E}\x{20A0}-\x{20B8}' .
'\x{2100}-\x{2101}\x{2103}-\x{2106}\x{2108}-\x{2109}\x{2114}' .
'\x{2116}-\x{2118}\x{211E}-\x{2123}\x{2125}\x{2127}\x{2129}\x{212E}' .
'\x{213A}-\x{213B}\x{2140}-\x{2144}\x{214A}-\x{214D}\x{214F}' .
'\x{2190}-\x{244A}\x{249C}-\x{24E9}\x{2500}-\x{2775}\x{2794}-\x{2B59}' .
'\x{2CE5}-\x{2CEA}\x{2CF9}-\x{2CFC}\x{2CFE}-\x{2CFF}\x{2E00}-\x{2E2E}' .
'\x{2E30}-\x{3004}\x{3008}-\x{3020}\x{3030}\x{3036}-\x{3037}' .
'\x{303D}-\x{303F}\x{309B}-\x{309C}\x{30A0}\x{30FB}\x{3190}-\x{3191}' .
'\x{3196}-\x{319F}\x{31C0}-\x{31E3}\x{3200}-\x{321E}\x{322A}-\x{3250}' .
'\x{3260}-\x{327F}\x{328A}-\x{32B0}\x{32C0}-\x{33FF}\x{4DC0}-\x{4DFF}' .
'\x{A490}-\x{A4C6}\x{A4FE}-\x{A4FF}\x{A60D}-\x{A60F}\x{A673}\x{A67E}' .
'\x{A6F2}-\x{A716}\x{A720}-\x{A721}\x{A789}-\x{A78A}\x{A828}-\x{A82B}' .
'\x{A836}-\x{A839}\x{A874}-\x{A877}\x{A8CE}-\x{A8CF}\x{A8F8}-\x{A8FA}' .
'\x{A92E}-\x{A92F}\x{A95F}\x{A9C1}-\x{A9CD}\x{A9DE}-\x{A9DF}' .
'\x{AA5C}-\x{AA5F}\x{AA77}-\x{AA79}\x{AADE}-\x{AADF}\x{ABEB}' .
'\x{D800}-\x{F8FF}\x{FB29}\x{FD3E}-\x{FD3F}\x{FDFC}-\x{FDFD}' .
'\x{FE10}-\x{FE19}\x{FE30}-\x{FE6B}\x{FEFF}-\x{FF0F}\x{FF1A}-\x{FF20}' .
'\x{FF3B}-\x{FF40}\x{FF5B}-\x{FF65}\x{FFE0}-\x{FFFD}');
function utf8_strlen($text) {
if (function_exists('mb_strlen')) {
return mb_strlen($text);
}
// Do not count UTF-8 continuation bytes.
return strlen(preg_replace("/[\x80-\xBF]/", '', $text));
}
function utf8_truncate($string, $max_length, $wordsafe = FALSE, $add_ellipsis = FALSE, $min_wordsafe_length = 1) {
$ellipsis = '';
$max_length = max($max_length, 0);
$min_wordsafe_length = max($min_wordsafe_length, 0);
if (utf8_strlen($string) <= $max_length) {
// No truncation needed, so don't add ellipsis, just return.
return $string;
}
if ($add_ellipsis) {
// Truncate ellipsis in case $max_length is small.
$ellipsis = utf8_substr('...', 0, $max_length);
$max_length -= utf8_strlen($ellipsis);
$max_length = max($max_length, 0);
}
if ($max_length <= $min_wordsafe_length) {
// Do not attempt word-safe if lengths are bad.
$wordsafe = FALSE;
}
if ($wordsafe) {
$matches = array();
// Find the last word boundary, if there is one within $min_wordsafe_length
// to $max_length characters. preg_match() is always greedy, so it will
// find the longest string possible.
$found = preg_match('/^(.{' . $min_wordsafe_length . ',' . $max_length . '})[' . PREG_CLASS_UNICODE_WORD_BOUNDARY . ']/u', $string, $matches);
if ($found) {
$string = $matches[1];
}
else {
$string = utf8_substr($string, 0, $max_length);
}
}
else {
$string = utf8_substr($string, 0, $max_length);
}
if ($add_ellipsis) {
$string .= $ellipsis;
}
return $string;
}
function utf8_substr($text, $start, $length = NULL) {
if (function_exists('mb_substr')) {
return $length === NULL ? mb_substr($text, $start) : mb_substr($text, $start, $length);
}
else {
$strlen = strlen($text);
// Find the starting byte offset.
$bytes = 0;
if ($start > 0) {
// Count all the continuation bytes from the start until we have found
// $start characters or the end of the string.
$bytes = -1;
$chars = -1;
while ($bytes < $strlen - 1 && $chars < $start) {
$bytes++;
$c = ord($text[$bytes]);
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
}
}
}
elseif ($start < 0) {
// Count all the continuation bytes from the end until we have found
// abs($start) characters.
$start = abs($start);
$bytes = $strlen;
$chars = 0;
while ($bytes > 0 && $chars < $start) {
$bytes--;
$c = ord($text[$bytes]);
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
}
}
}
$istart = $bytes;
// Find the ending byte offset.
if ($length === NULL) {
$iend = $strlen;
}
elseif ($length > 0) {
// Count all the continuation bytes from the starting index until we have
// found $length characters or reached the end of the string, then
// backtrace one byte.
$iend = $istart - 1;
$chars = -1;
$last_real = FALSE;
while ($iend < $strlen - 1 && $chars < $length) {
$iend++;
$c = ord($text[$iend]);
$last_real = FALSE;
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
$last_real = TRUE;
}
}
// Backtrace one byte if the last character we found was a real character
// and we don't need it.
if ($last_real && $chars >= $length) {
$iend--;
}
}
elseif ($length < 0) {
// Count all the continuation bytes from the end until we have found
// abs($start) characters, then backtrace one byte.
$length = abs($length);
$iend = $strlen;
$chars = 0;
while ($iend > 0 && $chars < $length) {
$iend--;
$c = ord($text[$iend]);
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
}
}
// Backtrace one byte if we are not at the beginning of the string.
if ($iend > 0) {
$iend--;
}
}
else {
// $length == 0, return an empty string.
return '';
}
return substr($text, $istart, max(0, $iend - $istart + 1));
}
}

For your return statement you could try:
return htmlspecialchars(trim($r));
EDIT: I tried your code as you provided it and it ran fine for me without having to use htmlspecialchars(). This is probably due to the face that in the <head> of the page the code was running on, the charset was set to UTF-8. So your options could be to set the encoding of the page like this:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
or to use htmlspecialchars() as above.

Related

PHP substring function returns odd symbol at the end [duplicate]

How can I get the first n characters of a string in PHP? What's the fastest way to trim a string to a specific number of characters, and append '...' if needed?

//The simple version for 10 Characters from the beginning of the string
$string = substr($string,0,10).'...';
Update:
Based on suggestion for checking length (and also ensuring similar lengths on trimmed and untrimmed strings):
$string = (strlen($string) > 13) ? substr($string,0,10).'...' : $string;
So you will get a string of max 13 characters; either 13 (or less) normal characters or 10 characters followed by '...'
Update 2:
Or as function:
function truncate($string, $length, $dots = "...") {
return (strlen($string) > $length) ? substr($string, 0, $length - strlen($dots)) . $dots : $string;
}
Update 3:
It's been a while since I wrote this answer and I don't actually use this code any more. I prefer this function which prevents breaking the string in the middle of a word using the wordwrap function:
function truncate($string,$length=100,$append="…") {
$string = trim($string);
if(strlen($string) > $length) {
$string = wordwrap($string, $length);
$string = explode("\n", $string, 2);
$string = $string[0] . $append;
}
return $string;
}

This functionality has been built into PHP since version 4.0.6. See the docs.
echo mb_strimwidth('Hello World', 0, 10, '...');
// outputs Hello W...
Note that the trimmarker (the ellipsis above) are included in the truncated length.

The Multibyte extension can come in handy if you need control over the string charset.
$charset = 'UTF-8';
$length = 10;
$string = 'Hai to yoo! I like yoo soo!';
if(mb_strlen($string, $charset) > $length) {
$string = mb_substr($string, 0, $length - 3, $charset) . '...';
}

sometimes, you need to limit the string to the last complete word ie: you don't want the last word to be broken instead you stop with the second last word.
eg:
we need to limit "This is my String" to 6 chars but instead of 'This i..." we want it to be 'This..." ie we will skip that broken letters in the last word.
phew, am bad at explaining, here is the code.
class Fun {
public function limit_text($text, $len) {
if (strlen($text) < $len) {
return $text;
}
$text_words = explode(' ', $text);
$out = null;
foreach ($text_words as $word) {
if ((strlen($word) > $len) && $out == null) {
return substr($word, 0, $len) . "...";
}
if ((strlen($out) + strlen($word)) > $len) {
return $out . "...";
}
$out.=" " . $word;
}
return $out;
}
}

If you want to cut being careful to don't split words you can do the following
function ellipse($str,$n_chars,$crop_str=' [...]')
{
$buff=strip_tags($str);
if(strlen($buff) > $n_chars)
{
$cut_index=strpos($buff,' ',$n_chars);
$buff=substr($buff,0,($cut_index===false? $n_chars: $cut_index+1)).$crop_str;
}
return $buff;
}
if $str is shorter than $n_chars returns it untouched.
If $str is equal to $n_chars returns it as is as well.
if $str is longer than $n_chars then it looks for the next space to cut or (if no more spaces till the end) $str gets cut rudely instead at $n_chars.
NOTE: be aware that this method will remove all tags in case of HTML.

The codeigniter framework contains a helper for this, called the "text helper". Here's some documentation from codeigniter's user guide that applies: http://codeigniter.com/user_guide/helpers/text_helper.html
(just read the word_limiter and character_limiter sections).
Here's two functions from it relevant to your question:
if ( ! function_exists('word_limiter'))
{
function word_limiter($str, $limit = 100, $end_char = '…')
{
if (trim($str) == '')
{
return $str;
}
preg_match('/^\s*+(?:\S++\s*+){1,'.(int) $limit.'}/', $str, $matches);
if (strlen($str) == strlen($matches[0]))
{
$end_char = '';
}
return rtrim($matches[0]).$end_char;
}
}
And
if ( ! function_exists('character_limiter'))
{
function character_limiter($str, $n = 500, $end_char = '…')
{
if (strlen($str) < $n)
{
return $str;
}
$str = preg_replace("/\s+/", ' ', str_replace(array("\r\n", "\r", "\n"), ' ', $str));
if (strlen($str) <= $n)
{
return $str;
}
$out = "";
foreach (explode(' ', trim($str)) as $val)
{
$out .= $val.' ';
if (strlen($out) >= $n)
{
$out = trim($out);
return (strlen($out) == strlen($str)) ? $out : $out.$end_char;
}
}
}
}

if(strlen($text) > 10)
$text = substr($text,0,10) . "...";

Use substring
http://php.net/manual/en/function.substr.php
$foo = substr("abcde",0, 3) . "...";

I'm not sure if this is the fastest solution, but it looks like it is the shortest one:
$result = current(explode("\n", wordwrap($str, $width, "...\n")));
P.S. See some examples here https://stackoverflow.com/a/17852480/131337

This function do the job without breaking words in the middle
function str_trim($str,$char_no){
if(strlen($str)<=$char_no)
return $str;
else{
$all_words=explode(" ",$str);
$out_str='';
foreach ($all_words as $word) {
$temp_str=($out_str=='')?$word:$out_str.' '.$word;
if(strlen($temp_str)>$char_no-3)//-3 for 3 dots
return $out_str."...";
$out_str=$temp_str;
}
}
}

The function I used:
function cutAfter($string, $len = 30, $append = '...') {
return (strlen($string) > $len) ?
substr($string, 0, $len - strlen($append)) . $append :
$string;
}
See it in action.

This is what i do
function cutat($num, $tt){
if (mb_strlen($tt)>$num){
$tt=mb_substr($tt,0,$num-2).'...';
}
return $tt;
}
where $num stands for number of chars, and $tt the string for manipulation.

I developed a function for this use
function str_short($string,$limit)
{
$len=strlen($string);
if($len>$limit)
{
$to_sub=$len-$limit;
$crop_temp=substr($string,0,-$to_sub);
return $crop_len=$crop_temp."...";
}
else
{
return $string;
}
}
you just call the function with string and limite
eg:str_short("hahahahahah",5);
it will cut of your string and add "..." at the end
:)

To create within a function (for repeat usage) and dynamical limited length, use:
function string_length_cutoff($string, $limit, $subtext = '...')
{
return (strlen($string) > $limit) ? substr($string, 0, ($limit-strlen(subtext))).$subtext : $string;
}
// example usage:
echo string_length_cutoff('Michelle Lee Hammontree-Garcia', 26);
// or (for custom substitution text
echo string_length_cutoff('Michelle Lee Hammontree-Garcia', 26, '..');

It's best to abstract you're code like so (notice the limit is optional and defaults to 10):
print limit($string);
function limit($var, $limit=10)
{
if ( strlen($var) > $limit )
{
return substr($string, 0, $limit) . '...';
}
else
{
return $var;
}
}

substr() would be best, you'll also want to check the length of the string first
$str = 'someLongString';
$max = 7;
if(strlen($str) > $max) {
$str = substr($str, 0, $max) . '...';
}
wordwrap won't trim the string down, just split it up...

$width = 10;
$a = preg_replace ("~^(.{{$width}})(.+)~", '\\1…', $a);
or with wordwrap
$a = preg_replace ("~^(.{1,${width}}\b)(.+)~", '\\1…', $a);

this solution will not cut words, it will add three dots after the first space.
I edited #Raccoon29 solution and I replaced all functions with mb_ functions so that this will work for all languages such as arabic
function cut_string($str, $n_chars, $crop_str = '...') {
$buff = strip_tags($str);
if (mb_strlen($buff) > $n_chars) {
$cut_index = mb_strpos($buff, ' ', $n_chars);
$buff = mb_substr($buff, 0, ($cut_index === false ? $n_chars : $cut_index + 1), "UTF-8") . $crop_str;
}
return $buff;
}

$yourString = "bla blaaa bla blllla bla bla";
$out = "";
if(strlen($yourString) > 22) {
while(strlen($yourString) > 22) {
$pos = strrpos($yourString, " ");
if($pos !== false && $pos <= 22) {
$out = substr($yourString,0,$pos);
break;
} else {
$yourString = substr($yourString,0,$pos);
continue;
}
}
} else {
$out = $yourString;
}
echo "Output String: ".$out;

If there is no hard requirement on the length of the truncated string, one can use this to truncate and prevent cutting the last word as well:
$text = "Knowledge is a natural right of every human being of which no one
has the right to deprive him or her under any pretext, except in a case where a
person does something which deprives him or her of that right. It is mere
stupidity to leave its benefits to certain individuals and teams who monopolize
these while the masses provide the facilities and pay the expenses for the
establishment of public sports.";
// we don't want new lines in our preview
$text_only_spaces = preg_replace('/\s+/', ' ', $text);
// truncates the text
$text_truncated = mb_substr($text_only_spaces, 0, mb_strpos($text_only_spaces, " ", 50));
// prevents last word truncation
$preview = trim(mb_substr($text_truncated, 0, mb_strrpos($text_truncated, " ")));
In this case, $preview will be "Knowledge is a natural right of every human being".
Live code example:
http://sandbox.onlinephpfunctions.com/code/25484a8b687d1f5ad93f62082b6379662a6b4713

SQL and PHP: Is it possible to convert a string to binary without using SQL function [duplicate]

I got the problem when convert between this 2 type in PHP. This is the code I searched in google
function strToHex($string){
$hex='';
for ($i=0; $i < strlen($string); $i++){
$hex .= dechex(ord($string[$i]));
}
return $hex;
}
function hexToStr($hex){
$string='';
for ($i=0; $i < strlen($hex)-1; $i+=2){
$string .= chr(hexdec($hex[$i].$hex[$i+1]));
}
return $string;
}
I check it and found out this when I use XOR to encrypt.
I have the string "this is the test", after XOR with a key, I have the result in string ↕↑↔§P↔§P ♫§T↕§↕. After that, I tried to convert it to hex by function strToHex() and I got these 12181d15501d15500e15541215712. Then, I tested with the function hexToStr() and I have ↕↑↔§P↔§P♫§T↕§q. So, what should I do to solve this problem? Why does it wrong when I convert this 2 style value?

For people that end up here and are just looking for the hex representation of a (binary) string.
bin2hex("that's all you need");
# 74686174277320616c6c20796f75206e656564
hex2bin('74686174277320616c6c20796f75206e656564');
# that's all you need
Doc: bin2hex, hex2bin.

For any char with ord($char) < 16 you get a HEX back which is only 1 long. You forgot to add 0 padding.
This should solve it:
<?php
function strToHex($string){
$hex = '';
for ($i=0; $i<strlen($string); $i++){
$ord = ord($string[$i]);
$hexCode = dechex($ord);
$hex .= substr('0'.$hexCode, -2);
}
return strToUpper($hex);
}
function hexToStr($hex){
$string='';
for ($i=0; $i < strlen($hex)-1; $i+=2){
$string .= chr(hexdec($hex[$i].$hex[$i+1]));
}
return $string;
}
// Tests
header('Content-Type: text/plain');
function test($expected, $actual, $success) {
if($expected !== $actual) {
echo "Expected: '$expected'\n";
echo "Actual: '$actual'\n";
echo "\n";
$success = false;
}
return $success;
}
$success = true;
$success = test('00', strToHex(hexToStr('00')), $success);
$success = test('FF', strToHex(hexToStr('FF')), $success);
$success = test('000102FF', strToHex(hexToStr('000102FF')), $success);
$success = test('↕↑↔§P↔§P ♫§T↕§↕', hexToStr(strToHex('↕↑↔§P↔§P ♫§T↕§↕')), $success);
echo $success ? "Success" : "\nFailed";

PHP :
string to hex:
implode(unpack("H*", $string));
hex to string:
pack("H*", $hex);

Here's what I use:
function strhex($string) {
$hexstr = unpack('H*', $string);
return array_shift($hexstr);
}

function hexToStr($hex){
// Remove spaces if the hex string has spaces
$hex = str_replace(' ', '', $hex);
return hex2bin($hex);
}
// Test it
$hex = "53 44 43 30 30 32 30 30 30 31 37 33";
echo hexToStr($hex); // SDC002000173
/**
* Test Hex To string with PHP UNIT
* #param string $value
* #return
*/
public function testHexToString()
{
$string = 'SDC002000173';
$hex = "53 44 43 30 30 32 30 30 30 31 37 33";
$result = hexToStr($hex);
$this->assertEquals($result,$string);
}

Using #bill-shirley answer with a little addition
function str_to_hex($string) {
$hexstr = unpack('H*', $string);
return array_shift($hexstr);
}
function hex_to_str($string) {
return hex2bin("$string");
}
Usage:
$str = "Go placidly amidst the noise";
$hexstr = str_to_hex($str);// 476f20706c616369646c7920616d6964737420746865206e6f697365
$strstr = hex_to_str($str);// Go placidly amidst the noise

You can try the following code to convert the image to hex string
<?php
$image = 'sample.bmp';
$file = fopen($image, 'r') or die("Could not open $image");
while ($file && !feof($file)){
$chunk = fread($file, 1000000); # You can affect performance altering
this number. YMMV.
# This loop will be dog-slow, almost for sure...
# You could snag two or three bytes and shift/add them,
# but at 4 bytes, you violate the 7fffffff limit of dechex...
# You could maybe write a better dechex that would accept multiple bytes
# and use substr... Maybe.
for ($byte = 0; $byte < strlen($chunk); $byte++)){
echo dechex(ord($chunk[$byte]));
}
}
?>

I only have half the answer, but I hope that it is useful as it adds unicode (utf-8) support
/**
* hexadecimal to unicode character
* #param string $hex
* #return string
*/
function hex2uni($hex) {
$dec = hexdec($hex);
if($dec < 128) {
return chr($dec);
}
if($dec < 2048) {
$utf = chr(192 + (($dec - ($dec % 64)) / 64));
} else {
$utf = chr(224 + (($dec - ($dec % 4096)) / 4096));
$utf .= chr(128 + ((($dec % 4096) - ($dec % 64)) / 64));
}
return $utf . chr(128 + ($dec % 64));
}
To string
var_dump(hex2uni('e641'));
Based on: http://www.php.net/manual/en/function.chr.php#Hcom55978

getting a substring(charset independent) between given two offsets

I just want to know if there is any built in php function where I can get a substring between given two keywords (keyword1 and keyword2). Note that keywords may repeat in the string so I must be able to get the substring between xth keyword1 and yth keyword2. Moreover, I mainly use unicode characters so the function should be charset independen.
Please help me out to handle this problem.
E.g. $string=This is their cat with a hat in the theater.
$keyword1="is"; $keyword2="the";
Task: how to get substring between 2nd occurance of "is" and 3nd occurance of "the" in the given string above.
Answer: " the cat with a hat in the "

You can use regular expressions:
$string = "This is their cat with a hat in the theater";
$regex1 = "/.*? is |^is/";
$regex2 = "/ the .*| the$/";
echo preg_replace($regex1, '', preg_replace($regex2, ' the', $string));
EDIT Here is more generic code:
function find($text, $str, $offset) {
$len = strlen($text);
$search_len = strlen($str);
$count = 0;
for ($i=0; $i<$len; ++$i) {
if (substr($text, $i, $search_len) == $str) {
if (++$count == $offset) {
return $i;
}
}
}
return -1;
}
function between($text, $word1, $offset1, $word2, $offset2) {
$start = find($text, $word1, $offset1);
$end = find($text, $word2, $offset2);
if ($start != -1 && $end != -1) {
return substr($text, $start + strlen($word1), $end-$start-strlen($word2));
} else {
return '';
}
}
$string = "This is their cat with a hat in the theater";
echo between($string, 'is', 2, 'the', 3);
echo between($string, 'at', 1, 'at', 3);

Combination of following two functions work for any string including unicode characters:
//Gets the position of a given substring with its offset;
function strposOffset($string, $search, $offset)
{
/*** explode the string ***/
$arr = explode($search, $string);
/*** check the search is not out of bounds ***/
switch( $offset )
{
case $offset == 0:
return false;
break;
case $offset > max(array_keys($arr)):
return false;
break;
default:
return mb_strlen(implode($search, array_slice($arr, 0, $offset)), "utf-8");
}
} //Source: www.phpro.org
//Extracts a substring between given two given substrings with their offsets.
function extractMiddleSubstr($string, $substr1, $offset1, $substr2, $offset2){
$strlen_substr1 = mb_strlen($substr1, "utf-8"); //length of substr1;
$strpos_substr1 = strposOffset($string, $substr1, $offset1); //position of substr1;
$strpos_substr2 = strposOffset($string, $substr2, $offset2); //position of substr2;
if($strpos_substr1!==null && $strpos_substr2!==null && $strpos_substr1!==false && $strpos_substr2!==false){
if($strpos_substr1<=$strpos_substr2){
$strpos_substr = $strlen_substr1+$strpos_substr1; //position of substr;
$strlen_substr = $strpos_substr2-$strpos_substr; //length of substr;
$substr = mb_substr($string, $strpos_substr, $strlen_substr, "utf-8"); //substr;
$substr = trim($substr); // removes whitespaces;
return $substr;
}else{
return false;
}
}else{
return false;
}
}

Truncate text without truncating HTML

This string has 78 characters with HTML and 39 characters without HTML:
<p>I really like the Google search engine.</p>
I want to truncate this string based on the non-HTML character count, so for example if I wanted to truncate the above string to 24 characters, the output would be:
I really like the Google
The truncation did not take into account the html when determining the number of characters to cut off, it only considered the stripped count. However, it didn't leave open HTML tags.

Alright so this is what I put together and it seems to be working:
function truncate_html($string, $length, $postfix = '…', $isHtml = true) {
$string = trim($string);
$postfix = (strlen(strip_tags($string)) > $length) ? $postfix : '';
$i = 0;
$tags = []; // change to array() if php version < 5.4
if($isHtml) {
preg_match_all('/<[^>]+>([^<]*)/', $string, $tagMatches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
foreach($tagMatches as $tagMatch) {
if ($tagMatch[0][1] - $i >= $length) {
break;
}
$tag = substr(strtok($tagMatch[0][0], " \t\n\r\0\x0B>"), 1);
if ($tag[0] != '/') {
$tags[] = $tag;
}
elseif (end($tags) == substr($tag, 1)) {
array_pop($tags);
}
$i += $tagMatch[1][1] - $tagMatch[0][1];
}
}
return substr($string, 0, $length = min(strlen($string), $length + $i)) . (count($tags = array_reverse($tags)) ? '</' . implode('></', $tags) . '>' : '') . $postfix;
}
Usage:
truncate_html('<p>I really like the Google search engine.</p>', 24);
The function was grabbed from (made a small modification):
http://www.dzone.com/snippets/truncate-text-preserving-html

Truncate a string to first n characters of a string and add three dots if any characters are removed

How can I get the first n characters of a string in PHP? What's the fastest way to trim a string to a specific number of characters, and append '...' if needed?

//The simple version for 10 Characters from the beginning of the string
$string = substr($string,0,10).'...';
Update:
Based on suggestion for checking length (and also ensuring similar lengths on trimmed and untrimmed strings):
$string = (strlen($string) > 13) ? substr($string,0,10).'...' : $string;
So you will get a string of max 13 characters; either 13 (or less) normal characters or 10 characters followed by '...'
Update 2:
Or as function:
function truncate($string, $length, $dots = "...") {
return (strlen($string) > $length) ? substr($string, 0, $length - strlen($dots)) . $dots : $string;
}
Update 3:
It's been a while since I wrote this answer and I don't actually use this code any more. I prefer this function which prevents breaking the string in the middle of a word using the wordwrap function:
function truncate($string,$length=100,$append="…") {
$string = trim($string);
if(strlen($string) > $length) {
$string = wordwrap($string, $length);
$string = explode("\n", $string, 2);
$string = $string[0] . $append;
}
return $string;
}

This functionality has been built into PHP since version 4.0.6. See the docs.
echo mb_strimwidth('Hello World', 0, 10, '...');
// outputs Hello W...
Note that the trimmarker (the ellipsis above) are included in the truncated length.

The Multibyte extension can come in handy if you need control over the string charset.
$charset = 'UTF-8';
$length = 10;
$string = 'Hai to yoo! I like yoo soo!';
if(mb_strlen($string, $charset) > $length) {
$string = mb_substr($string, 0, $length - 3, $charset) . '...';
}

sometimes, you need to limit the string to the last complete word ie: you don't want the last word to be broken instead you stop with the second last word.
eg:
we need to limit "This is my String" to 6 chars but instead of 'This i..." we want it to be 'This..." ie we will skip that broken letters in the last word.
phew, am bad at explaining, here is the code.
class Fun {
public function limit_text($text, $len) {
if (strlen($text) < $len) {
return $text;
}
$text_words = explode(' ', $text);
$out = null;
foreach ($text_words as $word) {
if ((strlen($word) > $len) && $out == null) {
return substr($word, 0, $len) . "...";
}
if ((strlen($out) + strlen($word)) > $len) {
return $out . "...";
}
$out.=" " . $word;
}
return $out;
}
}

If you want to cut being careful to don't split words you can do the following
function ellipse($str,$n_chars,$crop_str=' [...]')
{
$buff=strip_tags($str);
if(strlen($buff) > $n_chars)
{
$cut_index=strpos($buff,' ',$n_chars);
$buff=substr($buff,0,($cut_index===false? $n_chars: $cut_index+1)).$crop_str;
}
return $buff;
}
if $str is shorter than $n_chars returns it untouched.
If $str is equal to $n_chars returns it as is as well.
if $str is longer than $n_chars then it looks for the next space to cut or (if no more spaces till the end) $str gets cut rudely instead at $n_chars.
NOTE: be aware that this method will remove all tags in case of HTML.

The codeigniter framework contains a helper for this, called the "text helper". Here's some documentation from codeigniter's user guide that applies: http://codeigniter.com/user_guide/helpers/text_helper.html
(just read the word_limiter and character_limiter sections).
Here's two functions from it relevant to your question:
if ( ! function_exists('word_limiter'))
{
function word_limiter($str, $limit = 100, $end_char = '…')
{
if (trim($str) == '')
{
return $str;
}
preg_match('/^\s*+(?:\S++\s*+){1,'.(int) $limit.'}/', $str, $matches);
if (strlen($str) == strlen($matches[0]))
{
$end_char = '';
}
return rtrim($matches[0]).$end_char;
}
}
And
if ( ! function_exists('character_limiter'))
{
function character_limiter($str, $n = 500, $end_char = '…')
{
if (strlen($str) < $n)
{
return $str;
}
$str = preg_replace("/\s+/", ' ', str_replace(array("\r\n", "\r", "\n"), ' ', $str));
if (strlen($str) <= $n)
{
return $str;
}
$out = "";
foreach (explode(' ', trim($str)) as $val)
{
$out .= $val.' ';
if (strlen($out) >= $n)
{
$out = trim($out);
return (strlen($out) == strlen($str)) ? $out : $out.$end_char;
}
}
}
}

if(strlen($text) > 10)
$text = substr($text,0,10) . "...";

Use substring
http://php.net/manual/en/function.substr.php
$foo = substr("abcde",0, 3) . "...";

I'm not sure if this is the fastest solution, but it looks like it is the shortest one:
$result = current(explode("\n", wordwrap($str, $width, "...\n")));
P.S. See some examples here https://stackoverflow.com/a/17852480/131337

This function do the job without breaking words in the middle
function str_trim($str,$char_no){
if(strlen($str)<=$char_no)
return $str;
else{
$all_words=explode(" ",$str);
$out_str='';
foreach ($all_words as $word) {
$temp_str=($out_str=='')?$word:$out_str.' '.$word;
if(strlen($temp_str)>$char_no-3)//-3 for 3 dots
return $out_str."...";
$out_str=$temp_str;
}
}
}

The function I used:
function cutAfter($string, $len = 30, $append = '...') {
return (strlen($string) > $len) ?
substr($string, 0, $len - strlen($append)) . $append :
$string;
}
See it in action.

This is what i do
function cutat($num, $tt){
if (mb_strlen($tt)>$num){
$tt=mb_substr($tt,0,$num-2).'...';
}
return $tt;
}
where $num stands for number of chars, and $tt the string for manipulation.

I developed a function for this use
function str_short($string,$limit)
{
$len=strlen($string);
if($len>$limit)
{
$to_sub=$len-$limit;
$crop_temp=substr($string,0,-$to_sub);
return $crop_len=$crop_temp."...";
}
else
{
return $string;
}
}
you just call the function with string and limite
eg:str_short("hahahahahah",5);
it will cut of your string and add "..." at the end
:)

To create within a function (for repeat usage) and dynamical limited length, use:
function string_length_cutoff($string, $limit, $subtext = '...')
{
return (strlen($string) > $limit) ? substr($string, 0, ($limit-strlen(subtext))).$subtext : $string;
}
// example usage:
echo string_length_cutoff('Michelle Lee Hammontree-Garcia', 26);
// or (for custom substitution text
echo string_length_cutoff('Michelle Lee Hammontree-Garcia', 26, '..');

It's best to abstract you're code like so (notice the limit is optional and defaults to 10):
print limit($string);
function limit($var, $limit=10)
{
if ( strlen($var) > $limit )
{
return substr($string, 0, $limit) . '...';
}
else
{
return $var;
}
}

substr() would be best, you'll also want to check the length of the string first
$str = 'someLongString';
$max = 7;
if(strlen($str) > $max) {
$str = substr($str, 0, $max) . '...';
}
wordwrap won't trim the string down, just split it up...

$width = 10;
$a = preg_replace ("~^(.{{$width}})(.+)~", '\\1…', $a);
or with wordwrap
$a = preg_replace ("~^(.{1,${width}}\b)(.+)~", '\\1…', $a);

this solution will not cut words, it will add three dots after the first space.
I edited #Raccoon29 solution and I replaced all functions with mb_ functions so that this will work for all languages such as arabic
function cut_string($str, $n_chars, $crop_str = '...') {
$buff = strip_tags($str);
if (mb_strlen($buff) > $n_chars) {
$cut_index = mb_strpos($buff, ' ', $n_chars);
$buff = mb_substr($buff, 0, ($cut_index === false ? $n_chars : $cut_index + 1), "UTF-8") . $crop_str;
}
return $buff;
}

$yourString = "bla blaaa bla blllla bla bla";
$out = "";
if(strlen($yourString) > 22) {
while(strlen($yourString) > 22) {
$pos = strrpos($yourString, " ");
if($pos !== false && $pos <= 22) {
$out = substr($yourString,0,$pos);
break;
} else {
$yourString = substr($yourString,0,$pos);
continue;
}
}
} else {
$out = $yourString;
}
echo "Output String: ".$out;

If there is no hard requirement on the length of the truncated string, one can use this to truncate and prevent cutting the last word as well:
$text = "Knowledge is a natural right of every human being of which no one
has the right to deprive him or her under any pretext, except in a case where a
person does something which deprives him or her of that right. It is mere
stupidity to leave its benefits to certain individuals and teams who monopolize
these while the masses provide the facilities and pay the expenses for the
establishment of public sports.";
// we don't want new lines in our preview
$text_only_spaces = preg_replace('/\s+/', ' ', $text);
// truncates the text
$text_truncated = mb_substr($text_only_spaces, 0, mb_strpos($text_only_spaces, " ", 50));
// prevents last word truncation
$preview = trim(mb_substr($text_truncated, 0, mb_strrpos($text_truncated, " ")));
In this case, $preview will be "Knowledge is a natural right of every human being".
Live code example:
http://sandbox.onlinephpfunctions.com/code/25484a8b687d1f5ad93f62082b6379662a6b4713

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

optimizing a php function that trims strings - php

You should use the multibyte string functions to correctly handle unicode characters. For example you could try using mb_strimwidth to truncate a string to a specified length.

Related

PHP substring function returns odd symbol at the end [duplicate]

SQL and PHP: Is it possible to convert a string to binary without using SQL function [duplicate]

getting a substring(charset independent) between given two offsets

Truncate text without truncating HTML

Truncate a string to first n characters of a string and add three dots if any characters are removed

Categories

Resources