getting a substring(charset independent) between given two offsets

I just want to know if there is a built-in PHP function that returns the substring between two given keywords (keyword1 and keyword2). Note that the keywords may repeat in the string, so I must be able to get the substring between the xth keyword1 and the yth keyword2. Moreover, I mainly use Unicode characters, so the function should be charset independent.
Please help me solve this problem.
E.g. $string="This is their cat with a hat in the theater";
$keyword1="is"; $keyword2="the";
Task: how to get the substring between the 2nd occurrence of "is" and the 3rd occurrence of "the" in the string above.
Answer: " their cat with a hat in the "

You can use regular expressions:
$string = "This is their cat with a hat in the theater";
$regex1 = "/.*? is |^is/";
$regex2 = "/ the .*| the$/";
echo preg_replace($regex1, '', preg_replace($regex2, ' the', $string));
EDIT Here is more generic code:
function find($text, $str, $offset) {
    $len = strlen($text);
    $search_len = strlen($str);
    $count = 0;
    // Scan every byte position and count matches until the requested occurrence.
    for ($i = 0; $i < $len; ++$i) {
        if (substr($text, $i, $search_len) == $str) {
            if (++$count == $offset) {
                return $i;
            }
        }
    }
    return -1;
}
function between($text, $word1, $offset1, $word2, $offset2) {
    $start = find($text, $word1, $offset1);
    $end = find($text, $word2, $offset2);
    if ($start != -1 && $end != -1) {
        // The result starts right after $word1, so subtract the length of $word1.
        return substr($text, $start + strlen($word1), $end - $start - strlen($word1));
    } else {
        return '';
    }
}
$string = "This is their cat with a hat in the theater";
echo between($string, 'is', 2, 'the', 3);
echo between($string, 'at', 1, 'at', 3);

The combination of the following two functions works for any string, including Unicode characters:
// Gets the character position of the $offset-th occurrence of $search.
function strposOffset($string, $search, $offset)
{
    // Split the string on the search term.
    $arr = explode($search, $string);
    // Check that the requested occurrence is not out of bounds.
    if ($offset < 1 || $offset > count($arr) - 1) {
        return false;
    }
    // The pieces before the occurrence, rejoined, are exactly as long as its position.
    return mb_strlen(implode($search, array_slice($arr, 0, $offset)), "utf-8");
} // Source: www.phpro.org
//Extracts a substring between given two given substrings with their offsets.
function extractMiddleSubstr($string, $substr1, $offset1, $substr2, $offset2){
$strlen_substr1 = mb_strlen($substr1, "utf-8"); //length of substr1;
$strpos_substr1 = strposOffset($string, $substr1, $offset1); //position of substr1;
$strpos_substr2 = strposOffset($string, $substr2, $offset2); //position of substr2;
if($strpos_substr1!==null && $strpos_substr2!==null && $strpos_substr1!==false && $strpos_substr2!==false){
if($strpos_substr1<=$strpos_substr2){
$strpos_substr = $strlen_substr1+$strpos_substr1; //position of substr;
$strlen_substr = $strpos_substr2-$strpos_substr; //length of substr;
$substr = mb_substr($string, $strpos_substr, $strlen_substr, "utf-8"); //substr;
$substr = trim($substr); // removes whitespaces;
return $substr;
}else{
return false;
}
}else{
return false;
}
}
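
For comparison, here is a minimal sketch of the same idea built on mb_strpos instead of explode. The helper names mb_strpos_nth and mb_between are hypothetical (they appear in neither answer), and the sketch assumes the mbstring extension is loaded and the input is valid UTF-8. Unlike extractMiddleSubstr above, it does not trim the result.

```php
<?php
// Hypothetical helper: character offset of the $n-th occurrence of $needle
// (1-based), or false when there are fewer than $n occurrences.
function mb_strpos_nth(string $haystack, string $needle, int $n, string $enc = 'UTF-8')
{
    if ($n < 1 || $needle === '') {
        return false;
    }
    $pos = -1;
    for ($i = 0; $i < $n; $i++) {
        // Continue searching one character past the previous match.
        $pos = mb_strpos($haystack, $needle, $pos + 1, $enc);
        if ($pos === false) {
            return false;
        }
    }
    return $pos;
}

// Hypothetical helper: text between the $n1-th $w1 and the $n2-th $w2,
// or false when either occurrence is missing or they are in the wrong order.
function mb_between(string $text, string $w1, int $n1, string $w2, int $n2, string $enc = 'UTF-8')
{
    $start = mb_strpos_nth($text, $w1, $n1, $enc);
    $end = mb_strpos_nth($text, $w2, $n2, $enc);
    if ($start === false || $end === false) {
        return false;
    }
    $from = $start + mb_strlen($w1, $enc); // first character after $w1
    if ($end < $from) {
        return false;
    }
    return mb_substr($text, $from, $end - $from, $enc);
}

var_dump(mb_between('This is their cat with a hat in the theater', 'is', 2, 'the', 3));
```

Counting occurrences with mb_strpos keeps every offset in characters rather than bytes, so the start and length passed to mb_substr stay consistent for multibyte text.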

Related

PHP substring function returns odd symbol at the end [duplicate]

How can I get the first n characters of a string in PHP? What's the fastest way to trim a string to a specific number of characters, and append '...' if needed?

//The simple version for 10 Characters from the beginning of the string
$string = substr($string,0,10).'...';
Update:
Based on suggestion for checking length (and also ensuring similar lengths on trimmed and untrimmed strings):
$string = (strlen($string) > 13) ? substr($string,0,10).'...' : $string;
So you will get a string of max 13 characters; either 13 (or less) normal characters or 10 characters followed by '...'
Update 2:
Or as function:
function truncate($string, $length, $dots = "...") {
return (strlen($string) > $length) ? substr($string, 0, $length - strlen($dots)) . $dots : $string;
}
Update 3:
It's been a while since I wrote this answer and I don't actually use this code any more. I prefer this function which prevents breaking the string in the middle of a word using the wordwrap function:
function truncate($string,$length=100,$append="…") {
$string = trim($string);
if(strlen($string) > $length) {
$string = wordwrap($string, $length);
$string = explode("\n", $string, 2);
$string = $string[0] . $append;
}
return $string;
}

This functionality has been built into PHP since version 4.0.6. See the docs.
echo mb_strimwidth('Hello World', 0, 10, '...');
// outputs Hello W...
Note that the trim marker (the ellipsis above) is included in the truncated length.
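
As a small illustration of how the marker counts toward the width, here is a sketch using a multibyte string. It assumes the mbstring extension is loaded; the sample text and variable names are only for illustration. Keep in mind that mb_strimwidth measures display width, so East Asian wide characters count as two columns each.

```php
<?php
// Assumes the mbstring extension is loaded. "\u{F3}" is "ó" (PHP 7+ escape).
$text = "Hell\u{F3} my friend";

// Width 8 includes the 3-character marker, so only 5 characters of text remain.
$trimmed = mb_strimwidth($text, 0, 8, '...', 'UTF-8');

// A string that already fits is returned unchanged, without the marker.
$untouched = mb_strimwidth('Hello', 0, 10, '...', 'UTF-8');

echo $trimmed, "\n", $untouched, "\n";
```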

The Multibyte extension can come in handy if you need control over the string charset.
$charset = 'UTF-8';
$length = 10;
$string = 'Hai to yoo! I like yoo soo!';
if(mb_strlen($string, $charset) > $length) {
$string = mb_substr($string, 0, $length - 3, $charset) . '...';
}

Sometimes you need to limit the string to the last complete word, i.e. you don't want the last word to be broken, so you stop at the second-to-last word instead.
E.g.:
We need to limit "This is my String" to 6 characters, but instead of "This i..." we want "This...", i.e. we skip the broken letters of the last word.
I'm bad at explaining, so here is the code.
class Fun {
public function limit_text($text, $len) {
if (strlen($text) < $len) {
return $text;
}
$text_words = explode(' ', $text);
$out = null;
foreach ($text_words as $word) {
if ((strlen($word) > $len) && $out == null) {
return substr($word, 0, $len) . "...";
}
if ((strlen($out) + strlen($word)) > $len) {
return $out . "...";
}
$out.=" " . $word;
}
return $out;
}
}

If you want to cut the string while being careful not to split words, you can do the following:
function ellipse($str,$n_chars,$crop_str=' [...]')
{
$buff=strip_tags($str);
if(strlen($buff) > $n_chars)
{
$cut_index=strpos($buff,' ',$n_chars);
$buff=substr($buff,0,($cut_index===false? $n_chars: $cut_index+1)).$crop_str;
}
return $buff;
}
If $str is shorter than $n_chars, it is returned untouched.
If the length of $str equals $n_chars, it is returned as is as well.
If $str is longer than $n_chars, the function looks for the next space at which to cut; if there are no more spaces before the end, $str is cut bluntly at $n_chars instead.
NOTE: be aware that this method removes all tags in the case of HTML.

The codeigniter framework contains a helper for this, called the "text helper". Here's some documentation from codeigniter's user guide that applies: http://codeigniter.com/user_guide/helpers/text_helper.html
(just read the word_limiter and character_limiter sections).
Here's two functions from it relevant to your question:
if ( ! function_exists('word_limiter'))
{
function word_limiter($str, $limit = 100, $end_char = '…')
{
if (trim($str) == '')
{
return $str;
}
preg_match('/^\s*+(?:\S++\s*+){1,'.(int) $limit.'}/', $str, $matches);
if (strlen($str) == strlen($matches[0]))
{
$end_char = '';
}
return rtrim($matches[0]).$end_char;
}
}
And
if ( ! function_exists('character_limiter'))
{
function character_limiter($str, $n = 500, $end_char = '…')
{
if (strlen($str) < $n)
{
return $str;
}
$str = preg_replace("/\s+/", ' ', str_replace(array("\r\n", "\r", "\n"), ' ', $str));
if (strlen($str) <= $n)
{
return $str;
}
$out = "";
foreach (explode(' ', trim($str)) as $val)
{
$out .= $val.' ';
if (strlen($out) >= $n)
{
$out = trim($out);
return (strlen($out) == strlen($str)) ? $out : $out.$end_char;
}
}
}
}

if(strlen($text) > 10)
$text = substr($text,0,10) . "...";

Use substring
http://php.net/manual/en/function.substr.php
$foo = substr("abcde",0, 3) . "...";

I'm not sure if this is the fastest solution, but it looks like it is the shortest one:
$result = current(explode("\n", wordwrap($str, $width, "...\n")));
P.S. See some examples here https://stackoverflow.com/a/17852480/131337

This function does the job without breaking words in the middle:
function str_trim($str,$char_no){
if(strlen($str)<=$char_no)
return $str;
else{
$all_words=explode(" ",$str);
$out_str='';
foreach ($all_words as $word) {
$temp_str=($out_str=='')?$word:$out_str.' '.$word;
if(strlen($temp_str)>$char_no-3)//-3 for 3 dots
return $out_str."...";
$out_str=$temp_str;
}
}
}

The function I used:
function cutAfter($string, $len = 30, $append = '...') {
return (strlen($string) > $len) ?
substr($string, 0, $len - strlen($append)) . $append :
$string;
}
See it in action.

This is what I do:
function cutat($num, $tt){
    if (mb_strlen($tt)>$num){
        // Reserve three characters for the dots so the result stays within $num.
        $tt=mb_substr($tt,0,$num-3).'...';
    }
    return $tt;
}
where $num stands for the number of characters and $tt is the string to manipulate.

I developed a function for this:
function str_short($string, $limit)
{
    $len = strlen($string);
    if ($len > $limit)
    {
        // Keep the first $limit bytes and append the ellipsis.
        return substr($string, 0, $limit) . "...";
    }
    return $string;
}
Just call the function with the string and the limit,
e.g. str_short("hahahahahah", 5);
It will cut off your string and add "..." at the end.
:)

To wrap this in a function (for repeated use) with a dynamic length limit, use:
function string_length_cutoff($string, $limit, $subtext = '...')
{
    return (strlen($string) > $limit) ? substr($string, 0, ($limit-strlen($subtext))).$subtext : $string;
}
// example usage:
echo string_length_cutoff('Michelle Lee Hammontree-Garcia', 26);
// or (for custom substitution text)
echo string_length_cutoff('Michelle Lee Hammontree-Garcia', 26, '..');

It's best to abstract your code like so (notice the limit is optional and defaults to 10):
print limit($string);
function limit($var, $limit=10)
{
    if ( strlen($var) > $limit )
    {
        // Truncate the argument itself; $string is not defined inside the function.
        return substr($var, 0, $limit) . '...';
    }
    else
    {
        return $var;
    }
}

substr() would be best, you'll also want to check the length of the string first
$str = 'someLongString';
$max = 7;
if(strlen($str) > $max) {
$str = substr($str, 0, $max) . '...';
}
wordwrap won't trim the string down, just split it up...

$width = 10;
$a = preg_replace ("~^(.{{$width}})(.+)~", '\\1…', $a);
or, breaking at a word boundary as wordwrap would:
$a = preg_replace ("~^(.{1,{$width}}\b)(.+)~", '\\1…', $a);

This solution will not cut words; it adds three dots after the first space past the limit.
I edited @Raccoon29's solution and replaced all functions with their mb_ counterparts so that it works for all languages, such as Arabic.
function cut_string($str, $n_chars, $crop_str = '...') {
$buff = strip_tags($str);
if (mb_strlen($buff) > $n_chars) {
$cut_index = mb_strpos($buff, ' ', $n_chars);
$buff = mb_substr($buff, 0, ($cut_index === false ? $n_chars : $cut_index + 1), "UTF-8") . $crop_str;
}
return $buff;
}

$yourString = "bla blaaa bla blllla bla bla";
$out = "";
if(strlen($yourString) > 22) {
while(strlen($yourString) > 22) {
$pos = strrpos($yourString, " ");
if($pos !== false && $pos <= 22) {
$out = substr($yourString,0,$pos);
break;
} else {
$yourString = substr($yourString,0,$pos);
continue;
}
}
} else {
$out = $yourString;
}
echo "Output String: ".$out;

If there is no hard requirement on the length of the truncated string, one can use this to truncate and prevent cutting the last word as well:
$text = "Knowledge is a natural right of every human being of which no one
has the right to deprive him or her under any pretext, except in a case where a
person does something which deprives him or her of that right. It is mere
stupidity to leave its benefits to certain individuals and teams who monopolize
these while the masses provide the facilities and pay the expenses for the
establishment of public sports.";
// we don't want new lines in our preview
$text_only_spaces = preg_replace('/\s+/', ' ', $text);
// truncates the text
$text_truncated = mb_substr($text_only_spaces, 0, mb_strpos($text_only_spaces, " ", 50));
// prevents last word truncation
$preview = trim(mb_substr($text_truncated, 0, mb_strrpos($text_truncated, " ")));
In this case, $preview will be "Knowledge is a natural right of every human being".
Live code example:
http://sandbox.onlinephpfunctions.com/code/25484a8b687d1f5ad93f62082b6379662a6b4713

PHP Ucwords with or and special characters

Here is what I'm doing.
I have a couple of strings that are uppercase:
†HELLO THERE
DAY OR NIGHT
So to convert them, I'm using the following code:
ucwords(strtolower($string));
Here is the end result:
†hello There
Day Or Night
How can I ignore the † or any other special character so that the words show as
†Hello There
and how can I keep words like "or" all lowercase?

Try:
print preg_replace_callback('#([a-zA-ZÄÜÖäüö0-9]+)#u',function($a){
    return ucfirst(strtolower($a[0]));
},
'†hello THERE'
);
[a-zA-ZÄÜÖäüö0-9]+ matches a word that consists only of these characters (the u modifier makes the pattern treat the umlauts as UTF-8 characters rather than single bytes).
You can also use [\w]+ instead,
see: http://www.regular-expressions.info/wordboundaries.html
preg_replace_callback calls a function on each match.
function($a){} does something with the match, here ucfirst(strtolower()).
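
As a sketch of how the same callback idea could cover both parts of the question, the variant below matches any Unicode letter or digit run and leaves a configurable list of words in lowercase. The function name title_case_keep and its default word list are hypothetical, and the code assumes the mbstring extension and UTF-8 input.

```php
<?php
// Hypothetical helper, not part of the answer above.
function title_case_keep(string $text, array $keepLower = ['or', 'and', 'but']): string
{
    // \p{L} matches any Unicode letter and \p{N} any digit; the u flag enables UTF-8 mode.
    $result = preg_replace_callback('/[\p{L}\p{N}]+/u', function ($m) use ($keepLower) {
        $word = mb_strtolower($m[0], 'UTF-8');
        if (in_array($word, $keepLower, true)) {
            return $word; // listed words stay all lowercase
        }
        // Uppercase the first character only, leaving the rest lowercase.
        return mb_strtoupper(mb_substr($word, 0, 1, 'UTF-8'), 'UTF-8')
            . mb_substr($word, 1, null, 'UTF-8');
    }, $text);

    // preg_replace_callback returns null on invalid UTF-8; fall back to the input.
    return $result ?? $text;
}

echo title_case_keep('†HELLO THERE'), "\n";
echo title_case_keep('DAY OR NIGHT'), "\n";
```

Because † is punctuation rather than a letter, it is never part of a match and passes through untouched. Note that a listed word stays lowercase even at the start of the string, which may or may not be what you want.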

$lowerString = strtolower($string);
// explode() takes the separator first, then the string.
$stringArray = explode(' ', $lowerString);
foreach ($stringArray as $key => $singleString) {
    $formatedString = '';
    $upcased = false;
    for ($i = 0; $i < strlen($singleString); $i++) {
        // ord() gives the byte value of the character.
        $ascNum = ord($singleString[$i]);
        $word = $singleString[$i];
        if (!$upcased) {
            // Uppercase only the first ASCII letter of each word.
            if (($ascNum >= 65 && $ascNum <= 90) || ($ascNum >= 97 && $ascNum <= 122)) {
                $word = strtoupper($word);
                $upcased = true;
            }
        }
        $formatedString .= $word;
    }
    $stringArray[$key] = $formatedString;
}
$result = implode(' ', $stringArray);
Maybe a little complicated, but a clean idea.

ucwords(strtolower("†HELLO THERE"),"† "); the second parameter of ucwords is an optional delimiter. So by including both dagger and space, ucwords will work for the examples provided.
for your second question, see here

Assuming words are separated by a space:
<?php
function custom_ucfirst($s)
{
$s = strtolower($s);
$e = (strpos($s, ' ') !== false ? explode(' ', $s) : array($s));
$keep_all_lowercase = array('or','and','but');
foreach($e as $k=>$v)
{
if(!in_array($v, $keep_all_lowercase))
{
$str_split = str_split($v);
foreach($str_split as $k2=>$v2)
{
if(in_array($v2, range('a','z')))
{
$str_split[$k2] = strtoupper($v2);
break;
}
}
$e[$k] = implode('', $str_split);
}
}
return implode(' ', $e);
}
echo custom_ucfirst('†HELLO THERE .cloud. or sky what a nice an*d ()good day.');
// †Hello There .Cloud. or Sky What A Nice An*d ()Good Day.

Validate IBAN PHP

While designing a new platform, we tried to integrate IBAN numbers. We have to make sure that each IBAN is validated and that the IBAN stored in the database is always correct. So what would be a proper way to validate the number?

As the logic was explained in my other question, I've created a function myself. It is based on the logic explained in the Wikipedia article and includes country-specific length validation; find it below.
Algorithm and character lengths per country at https://en.wikipedia.org/wiki/International_Bank_Account_Number#Validating_the_IBAN.
function checkIBAN($iban)
{
if(strlen($iban) < 5) return false;
$iban = strtolower(str_replace(' ','',$iban));
$Countries = array('al'=>28,'ad'=>24,'at'=>20,'az'=>28,'bh'=>22,'be'=>16,'ba'=>20,'br'=>29,'bg'=>22,'cr'=>21,'hr'=>21,'cy'=>28,'cz'=>24,'dk'=>18,'do'=>28,'ee'=>20,'fo'=>18,'fi'=>18,'fr'=>27,'ge'=>22,'de'=>22,'gi'=>23,'gr'=>27,'gl'=>18,'gt'=>28,'hu'=>28,'is'=>26,'ie'=>22,'il'=>23,'it'=>27,'jo'=>30,'kz'=>20,'kw'=>30,'lv'=>21,'lb'=>28,'li'=>21,'lt'=>20,'lu'=>20,'mk'=>19,'mt'=>31,'mr'=>27,'mu'=>30,'mc'=>27,'md'=>24,'me'=>22,'nl'=>18,'no'=>15,'pk'=>24,'ps'=>29,'pl'=>28,'pt'=>25,'qa'=>29,'ro'=>24,'sm'=>27,'sa'=>24,'rs'=>22,'sk'=>24,'si'=>19,'es'=>24,'se'=>24,'ch'=>21,'tn'=>24,'tr'=>26,'ae'=>23,'gb'=>22,'vg'=>24);
$Chars = array('a'=>10,'b'=>11,'c'=>12,'d'=>13,'e'=>14,'f'=>15,'g'=>16,'h'=>17,'i'=>18,'j'=>19,'k'=>20,'l'=>21,'m'=>22,'n'=>23,'o'=>24,'p'=>25,'q'=>26,'r'=>27,'s'=>28,'t'=>29,'u'=>30,'v'=>31,'w'=>32,'x'=>33,'y'=>34,'z'=>35);
if(array_key_exists(substr($iban,0,2), $Countries) && strlen($iban) == $Countries[substr($iban,0,2)]){
$MovedChar = substr($iban, 4).substr($iban,0,4);
$MovedCharArray = str_split($MovedChar);
$NewString = "";
foreach($MovedCharArray AS $key => $value){
if(!is_numeric($MovedCharArray[$key])){
if(!isset($Chars[$MovedCharArray[$key]])) return false;
$MovedCharArray[$key] = $Chars[$MovedCharArray[$key]];
}
$NewString .= $MovedCharArray[$key];
}
if(bcmod($NewString, '97') == 1)
{
return true;
}
}
return false;
}

A slight modification of @PeterFox's answer, with a fallback for bcmod() when bcmath is not available:
<?php
function isValidIBAN ($iban) {
$iban = strtolower($iban);
$Countries = array(
'al'=>28,'ad'=>24,'at'=>20,'az'=>28,'bh'=>22,'be'=>16,'ba'=>20,'br'=>29,'bg'=>22,'cr'=>21,'hr'=>21,'cy'=>28,'cz'=>24,
'dk'=>18,'do'=>28,'ee'=>20,'fo'=>18,'fi'=>18,'fr'=>27,'ge'=>22,'de'=>22,'gi'=>23,'gr'=>27,'gl'=>18,'gt'=>28,'hu'=>28,
'is'=>26,'ie'=>22,'il'=>23,'it'=>27,'jo'=>30,'kz'=>20,'kw'=>30,'lv'=>21,'lb'=>28,'li'=>21,'lt'=>20,'lu'=>20,'mk'=>19,
'mt'=>31,'mr'=>27,'mu'=>30,'mc'=>27,'md'=>24,'me'=>22,'nl'=>18,'no'=>15,'pk'=>24,'ps'=>29,'pl'=>28,'pt'=>25,'qa'=>29,
'ro'=>24,'sm'=>27,'sa'=>24,'rs'=>22,'sk'=>24,'si'=>19,'es'=>24,'se'=>24,'ch'=>21,'tn'=>24,'tr'=>26,'ae'=>23,'gb'=>22,'vg'=>24
);
$Chars = array(
'a'=>10,'b'=>11,'c'=>12,'d'=>13,'e'=>14,'f'=>15,'g'=>16,'h'=>17,'i'=>18,'j'=>19,'k'=>20,'l'=>21,'m'=>22,
'n'=>23,'o'=>24,'p'=>25,'q'=>26,'r'=>27,'s'=>28,'t'=>29,'u'=>30,'v'=>31,'w'=>32,'x'=>33,'y'=>34,'z'=>35
);
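// Reject unknown country codes before looking up the expected length.
if (!isset($Countries[ substr($iban,0,2) ]) || strlen($iban) != $Countries[ substr($iban,0,2) ]) { return false; }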
$MovedChar = substr($iban, 4) . substr($iban,0,4);
$MovedCharArray = str_split($MovedChar);
$NewString = "";
foreach ($MovedCharArray as $k => $v) {
if ( !is_numeric($MovedCharArray[$k]) ) {
$MovedCharArray[$k] = $Chars[$MovedCharArray[$k]];
}
$NewString .= $MovedCharArray[$k];
}
if (function_exists("bcmod")) { return bcmod($NewString, '97') == 1; }
// http://au2.php.net/manual/en/function.bcmod.php#38474
$x = $NewString; $y = "97";
$take = 5; $mod = "";
do {
$a = (int)$mod . substr($x, 0, $take);
$x = substr($x, $take);
$mod = $a % $y;
}
while (strlen($x));
return (int)$mod == 1;
}

The accepted answer is not the preferred way of validation. The specification dictates the following:
1. Check that the total IBAN length is correct as per the country. If not, the IBAN is invalid.
2. Replace the two check digits by 00 (e.g. GB00 for the UK).
3. Move the four initial characters to the end of the string.
4. Replace the letters in the string with digits, expanding the string as necessary, such that A or a = 10, B or b = 11, and Z or z = 35. Each alphabetic character is therefore replaced by 2 digits.
5. Convert the string to an integer (i.e. ignore leading zeroes).
6. Calculate mod-97 of the new number, which results in the remainder.
7. Subtract the remainder from 98, and use the result for the two check digits. If the result is a single-digit number, pad it with a leading 0 to make a two-digit number.
I've written a class that validates, formats and parses strings according to the spec. Hope this helps some save the time required to roll their own.
The code can be found on GitHub here.
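
The check-digit steps listed above can be sketched roughly as follows. This is not the class mentioned in this answer; the function name iban_check_digits is hypothetical, and the sketch assumes the country code and the account part (BBAN) are already stripped of spaces and contain only ASCII letters and digits. The per-country length check from the first step is left out.

```php
<?php
// Hypothetical helper: computes the two check digits for a country code and BBAN.
function iban_check_digits(string $country, string $bban): string
{
    // Check digits set to 00 and the first four characters moved to the end.
    $rearranged = strtoupper($bban . $country) . '00';

    // Replace letters with numbers: ord('A') is 65, so A maps to 10 and Z to 35.
    $digits = '';
    foreach (str_split($rearranged) as $ch) {
        $digits .= ctype_alpha($ch) ? (string) (ord($ch) - 55) : $ch;
    }

    // Compute the remainder digit by digit so the number never overflows an integer.
    $rem = 0;
    foreach (str_split($digits) as $d) {
        $rem = ($rem * 10 + (int) $d) % 97;
    }

    // Subtract from 98 and pad to two digits.
    return str_pad((string) (98 - $rem), 2, '0', STR_PAD_LEFT);
}

// The example IBAN GB82 WEST 1234 5698 7654 32 from the Wikipedia article.
echo iban_check_digits('GB', 'WEST12345698765432'), "\n";
```

Validating an existing IBAN then amounts to recomputing the digits and comparing them with the two digits found at positions 3 and 4.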

The top-rated function does NOT work.
Just try a string with '%' in it...
I'm using this one:
function checkIBAN($iban) {
// Normalize input (remove spaces and make upcase)
$iban = strtoupper(str_replace(' ', '', $iban));
if (preg_match('/^[A-Z]{2}[0-9]{2}[A-Z0-9]{1,30}$/', $iban)) {
$country = substr($iban, 0, 2);
$check = intval(substr($iban, 2, 2));
$account = substr($iban, 4);
// To numeric representation
$search = range('A','Z');
foreach (range(10,35) as $tmp)
$replace[]=strval($tmp);
$numstr=str_replace($search, $replace, $account.$country.'00');
// Calculate checksum
$checksum = intval(substr($numstr, 0, 1));
for ($pos = 1; $pos < strlen($numstr); $pos++) {
$checksum *= 10;
$checksum += intval(substr($numstr, $pos,1));
$checksum %= 97;
}
return ((98-$checksum) == $check);
} else
return false;
}

I found this solution in the CakePHP 3.7 Validation class. A plain, beautiful PHP implementation.
/**
* Check that the input value has a valid International Bank Account Number IBAN syntax
* Requirements are uppercase, no whitespaces, max length 34, country code and checksum exist at right spots,
* body matches against checksum via Mod97-10 algorithm
*
* @param string $check The value to check
*
* @return bool Success
*/
public static function iban($check)
{
if (!preg_match('/^[A-Z]{2}[0-9]{2}[A-Z0-9]{1,30}$/', $check)) {
return false;
}
$country = substr($check, 0, 2);
$checkInt = intval(substr($check, 2, 2));
$account = substr($check, 4);
$search = range('A', 'Z');
$replace = [];
foreach (range(10, 35) as $tmp) {
$replace[] = strval($tmp);
}
$numStr = str_replace($search, $replace, $account . $country . '00');
$checksum = intval(substr($numStr, 0, 1));
$numStrLength = strlen($numStr);
for ($pos = 1; $pos < $numStrLength; $pos++) {
$checksum *= 10;
$checksum += intval(substr($numStr, $pos, 1));
$checksum %= 97;
}
return ((98 - $checksum) === $checkInt);
}

This function checks the IBAN and needs the GMP extension enabled: http://php.net/manual/en/book.gmp.php
function checkIban($string){
$to_check = substr($string, 4).substr($string, 0,4);
$converted = '';
for ($i = 0; $i < strlen($to_check); $i++){
$char = strtoupper($to_check[$i]);
if(preg_match('/[0-9A-Z]/',$char)){
if(!preg_match('/\d/',$char)){
$char = ord($char)-55;
}
$converted .= $char;
}
}
// prevent: "gmp_mod() $num1 is not an integer string" error
$converted = ltrim($converted, '0');
return strlen($converted) && gmp_strval(gmp_mod($converted, '97')) == 1;
}
enjoy !

optimizing a php function that trims strings

I wrote this PHP function that takes any text/HTML string and trims it.
For example:
gen_string("Hello, how are you today?",10);
Returns:
Hello, how...
The problem arises when the function string limit is the same as the position of a special character such as: á, ñ, etc...
In which case:
gen_string("Helló my friend",5);
Returns: Hell�...
Any ideas on how to solve this issue? This is the current function:
# string: advanced substr
function gen_string($string,$min,$clean=false) {
$text = trim(strip_tags($string));
if(strlen($text)>$min) {
$blank = strpos($text,' ');
if($blank) {
# limit plus last word
$extra = strpos(substr($text,$min),' ');
$max = $min+$extra;
$r = substr($text,0,$max);
if(strlen($text)>=$max && !$clean) $r=trim($r,'.').'...';
} else {
# if there are no spaces
$r = substr($text,0,$min).'...';
}
} else {
# if original length is lower than limit
$r = $text;
}
return trim($r);
}
Thanks!

You should use the multibyte string functions to correctly handle unicode characters.
For example you could try using mb_strimwidth to truncate a string to a specified length.
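
If the "limit plus the rest of the current word" behaviour of the original function should be kept, a multibyte-aware variant might look like the sketch below. The name mb_gen_string is hypothetical, the code assumes the mbstring extension and UTF-8 input, and returning the whole text when no later space exists is a design choice of this sketch rather than something the question specifies.

```php
<?php
// Hypothetical multibyte-aware variant of gen_string(); assumes UTF-8 input.
function mb_gen_string(string $string, int $min, string $enc = 'UTF-8'): string
{
    $text = trim(strip_tags($string));

    // Count characters, not bytes, so "ó" or "ñ" count as one.
    if (mb_strlen($text, $enc) <= $min) {
        return $text;
    }

    // First space at or after $min characters, so the current word stays whole.
    $space = mb_strpos($text, ' ', $min, $enc);
    if ($space === false) {
        // Only the last word is left; keep the text as it is.
        return $text;
    }

    // Cut at the space, drop trailing dots, then append the ellipsis.
    return rtrim(mb_substr($text, 0, $space, $enc), '.') . '...';
}

echo mb_gen_string("Hello, how are you today?", 10), "\n";
echo mb_gen_string("Hell\u{F3} my friend", 5), "\n";
```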

You could also take a different approach and make use of the PCRE regex extension's UTF-8 capabilities (assuming your strings are UTF-8!).
function gen_string($string, $length)
{
$str = trim(strip_tags($string));
$strlen = strlen(utf8_decode($str));
// String is less than limit
if ($strlen <= $length) return $str;
// Shorten string, preserving whole "words" (non-whitespace)
preg_match('/^.{'.($length-1).'}\S*/su', $str, $match);
// Append ellipsis if needed (bytes length is OK to check)
if (strlen($match[0]) !== strlen($str)) $match[0] .= '...';
return $match[0];
}

Aside from the multibyte issue, maybe you can write it shorter:
function gen_string($str, $limit) {
    // Nothing to do when the string already fits within the limit.
    if (strlen($str) <= $limit)
        return $str;
    // A negative offset makes strrpos() search backwards from the limit position.
    $offset = -(strlen($str) - $limit);
    return substr($str, 0, strrpos($str, ' ', $offset)).'...';
}
This limits the length of the string: rather than cutting after the first word beyond the limit, it ensures that the text before the ellipsis is never longer than the limit.

strlen() cannot be used for UTF-8 strings, because it also counts the continuation bytes, which should not be counted.
You can try with the following code:
define('PREG_CLASS_UNICODE_WORD_BOUNDARY',
'\x{0}-\x{2F}\x{3A}-\x{40}\x{5B}-\x{60}\x{7B}-\x{A9}\x{AB}-\x{B1}\x{B4}' .
'\x{B6}-\x{B8}\x{BB}\x{BF}\x{D7}\x{F7}\x{2C2}-\x{2C5}\x{2D2}-\x{2DF}' .
'\x{2E5}-\x{2EB}\x{2ED}\x{2EF}-\x{2FF}\x{375}\x{37E}-\x{385}\x{387}\x{3F6}' .
'\x{482}\x{55A}-\x{55F}\x{589}-\x{58A}\x{5BE}\x{5C0}\x{5C3}\x{5C6}' .
'\x{5F3}-\x{60F}\x{61B}-\x{61F}\x{66A}-\x{66D}\x{6D4}\x{6DD}\x{6E9}' .
'\x{6FD}-\x{6FE}\x{700}-\x{70F}\x{7F6}-\x{7F9}\x{830}-\x{83E}' .
'\x{964}-\x{965}\x{970}\x{9F2}-\x{9F3}\x{9FA}-\x{9FB}\x{AF1}\x{B70}' .
'\x{BF3}-\x{BFA}\x{C7F}\x{CF1}-\x{CF2}\x{D79}\x{DF4}\x{E3F}\x{E4F}' .
'\x{E5A}-\x{E5B}\x{F01}-\x{F17}\x{F1A}-\x{F1F}\x{F34}\x{F36}\x{F38}' .
'\x{F3A}-\x{F3D}\x{F85}\x{FBE}-\x{FC5}\x{FC7}-\x{FD8}\x{104A}-\x{104F}' .
'\x{109E}-\x{109F}\x{10FB}\x{1360}-\x{1368}\x{1390}-\x{1399}\x{1400}' .
'\x{166D}-\x{166E}\x{1680}\x{169B}-\x{169C}\x{16EB}-\x{16ED}' .
'\x{1735}-\x{1736}\x{17B4}-\x{17B5}\x{17D4}-\x{17D6}\x{17D8}-\x{17DB}' .
'\x{1800}-\x{180A}\x{180E}\x{1940}-\x{1945}\x{19DE}-\x{19FF}' .
'\x{1A1E}-\x{1A1F}\x{1AA0}-\x{1AA6}\x{1AA8}-\x{1AAD}\x{1B5A}-\x{1B6A}' .
'\x{1B74}-\x{1B7C}\x{1C3B}-\x{1C3F}\x{1C7E}-\x{1C7F}\x{1CD3}\x{1FBD}' .
'\x{1FBF}-\x{1FC1}\x{1FCD}-\x{1FCF}\x{1FDD}-\x{1FDF}\x{1FED}-\x{1FEF}' .
'\x{1FFD}-\x{206F}\x{207A}-\x{207E}\x{208A}-\x{208E}\x{20A0}-\x{20B8}' .
'\x{2100}-\x{2101}\x{2103}-\x{2106}\x{2108}-\x{2109}\x{2114}' .
'\x{2116}-\x{2118}\x{211E}-\x{2123}\x{2125}\x{2127}\x{2129}\x{212E}' .
'\x{213A}-\x{213B}\x{2140}-\x{2144}\x{214A}-\x{214D}\x{214F}' .
'\x{2190}-\x{244A}\x{249C}-\x{24E9}\x{2500}-\x{2775}\x{2794}-\x{2B59}' .
'\x{2CE5}-\x{2CEA}\x{2CF9}-\x{2CFC}\x{2CFE}-\x{2CFF}\x{2E00}-\x{2E2E}' .
'\x{2E30}-\x{3004}\x{3008}-\x{3020}\x{3030}\x{3036}-\x{3037}' .
'\x{303D}-\x{303F}\x{309B}-\x{309C}\x{30A0}\x{30FB}\x{3190}-\x{3191}' .
'\x{3196}-\x{319F}\x{31C0}-\x{31E3}\x{3200}-\x{321E}\x{322A}-\x{3250}' .
'\x{3260}-\x{327F}\x{328A}-\x{32B0}\x{32C0}-\x{33FF}\x{4DC0}-\x{4DFF}' .
'\x{A490}-\x{A4C6}\x{A4FE}-\x{A4FF}\x{A60D}-\x{A60F}\x{A673}\x{A67E}' .
'\x{A6F2}-\x{A716}\x{A720}-\x{A721}\x{A789}-\x{A78A}\x{A828}-\x{A82B}' .
'\x{A836}-\x{A839}\x{A874}-\x{A877}\x{A8CE}-\x{A8CF}\x{A8F8}-\x{A8FA}' .
'\x{A92E}-\x{A92F}\x{A95F}\x{A9C1}-\x{A9CD}\x{A9DE}-\x{A9DF}' .
'\x{AA5C}-\x{AA5F}\x{AA77}-\x{AA79}\x{AADE}-\x{AADF}\x{ABEB}' .
'\x{D800}-\x{F8FF}\x{FB29}\x{FD3E}-\x{FD3F}\x{FDFC}-\x{FDFD}' .
'\x{FE10}-\x{FE19}\x{FE30}-\x{FE6B}\x{FEFF}-\x{FF0F}\x{FF1A}-\x{FF20}' .
'\x{FF3B}-\x{FF40}\x{FF5B}-\x{FF65}\x{FFE0}-\x{FFFD}');
function utf8_strlen($text) {
if (function_exists('mb_strlen')) {
return mb_strlen($text);
}
// Do not count UTF-8 continuation bytes.
return strlen(preg_replace("/[\x80-\xBF]/", '', $text));
}
function utf8_truncate($string, $max_length, $wordsafe = FALSE, $add_ellipsis = FALSE, $min_wordsafe_length = 1) {
$ellipsis = '';
$max_length = max($max_length, 0);
$min_wordsafe_length = max($min_wordsafe_length, 0);
if (utf8_strlen($string) <= $max_length) {
// No truncation needed, so don't add ellipsis, just return.
return $string;
}
if ($add_ellipsis) {
// Truncate ellipsis in case $max_length is small.
$ellipsis = utf8_substr('...', 0, $max_length);
$max_length -= utf8_strlen($ellipsis);
$max_length = max($max_length, 0);
}
if ($max_length <= $min_wordsafe_length) {
// Do not attempt word-safe if lengths are bad.
$wordsafe = FALSE;
}
if ($wordsafe) {
$matches = array();
// Find the last word boundary, if there is one within $min_wordsafe_length
// to $max_length characters. preg_match() is always greedy, so it will
// find the longest string possible.
$found = preg_match('/^(.{' . $min_wordsafe_length . ',' . $max_length . '})[' . PREG_CLASS_UNICODE_WORD_BOUNDARY . ']/u', $string, $matches);
if ($found) {
$string = $matches[1];
}
else {
$string = utf8_substr($string, 0, $max_length);
}
}
else {
$string = utf8_substr($string, 0, $max_length);
}
if ($add_ellipsis) {
$string .= $ellipsis;
}
return $string;
}
function utf8_substr($text, $start, $length = NULL) {
if (function_exists('mb_substr')) {
return $length === NULL ? mb_substr($text, $start) : mb_substr($text, $start, $length);
}
else {
$strlen = strlen($text);
// Find the starting byte offset.
$bytes = 0;
if ($start > 0) {
// Count all the continuation bytes from the start until we have found
// $start characters or the end of the string.
$bytes = -1;
$chars = -1;
while ($bytes < $strlen - 1 && $chars < $start) {
$bytes++;
$c = ord($text[$bytes]);
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
}
}
}
elseif ($start < 0) {
// Count all the continuation bytes from the end until we have found
// abs($start) characters.
$start = abs($start);
$bytes = $strlen;
$chars = 0;
while ($bytes > 0 && $chars < $start) {
$bytes--;
$c = ord($text[$bytes]);
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
}
}
}
$istart = $bytes;
// Find the ending byte offset.
if ($length === NULL) {
$iend = $strlen;
}
elseif ($length > 0) {
// Count all the continuation bytes from the starting index until we have
// found $length characters or reached the end of the string, then
// backtrace one byte.
$iend = $istart - 1;
$chars = -1;
$last_real = FALSE;
while ($iend < $strlen - 1 && $chars < $length) {
$iend++;
$c = ord($text[$iend]);
$last_real = FALSE;
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
$last_real = TRUE;
}
}
// Backtrace one byte if the last character we found was a real character
// and we don't need it.
if ($last_real && $chars >= $length) {
$iend--;
}
}
elseif ($length < 0) {
// Count all the continuation bytes from the end until we have found
// abs($start) characters, then backtrace one byte.
$length = abs($length);
$iend = $strlen;
$chars = 0;
while ($iend > 0 && $chars < $length) {
$iend--;
$c = ord($text[$iend]);
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
}
}
// Backtrace one byte if we are not at the beginning of the string.
if ($iend > 0) {
$iend--;
}
}
else {
// $length == 0, return an empty string.
return '';
}
return substr($text, $istart, max(0, $iend - $istart + 1));
}
}

For your return statement you could try:
return htmlspecialchars(trim($r));
EDIT: I tried your code as you provided it and it ran fine for me without having to use htmlspecialchars(). This is probably due to the fact that in the <head> of the page the code was running on, the charset was set to UTF-8. So your options could be to set the encoding of the page like this:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
or to use htmlspecialchars() as above.

Truncate a string to first n characters of a string and add three dots if any characters are removed

How can I get the first n characters of a string in PHP? What's the fastest way to trim a string to a specific number of characters, and append '...' if needed?

//The simple version for 10 Characters from the beginning of the string
$string = substr($string,0,10).'...';
Update:
Based on suggestion for checking length (and also ensuring similar lengths on trimmed and untrimmed strings):
$string = (strlen($string) > 13) ? substr($string,0,10).'...' : $string;
So you will get a string of max 13 characters; either 13 (or less) normal characters or 10 characters followed by '...'
Update 2:
Or as function:
function truncate($string, $length, $dots = "...") {
return (strlen($string) > $length) ? substr($string, 0, $length - strlen($dots)) . $dots : $string;
}
Update 3:
It's been a while since I wrote this answer and I don't actually use this code any more. I prefer this function which prevents breaking the string in the middle of a word using the wordwrap function:
function truncate($string,$length=100,$append="…") {
$string = trim($string);
if(strlen($string) > $length) {
$string = wordwrap($string, $length);
$string = explode("\n", $string, 2);
$string = $string[0] . $append;
}
return $string;
}

This functionality has been built into PHP since version 4.0.6. See the docs.
echo mb_strimwidth('Hello World', 0, 10, '...');
// outputs Hello W...
Note that the trim marker (the ellipsis above) is included in the truncated length.

The Multibyte extension can come in handy if you need control over the string charset.
$charset = 'UTF-8';
$length = 10;
$string = 'Hai to yoo! I like yoo soo!';
if(mb_strlen($string, $charset) > $length) {
$string = mb_substr($string, 0, $length - 3, $charset) . '...';
}

Sometimes you need to limit the string to the last complete word, i.e. you don't want the last word to be broken, so you stop at the second-to-last word instead.
E.g.:
We need to limit "This is my String" to 6 characters, but instead of "This i..." we want "This...", i.e. we skip the broken letters of the last word.
I'm bad at explaining, so here is the code.
class Fun {
public function limit_text($text, $len) {
if (strlen($text) < $len) {
return $text;
}
$text_words = explode(' ', $text);
$out = ''; // start with an empty string so strlen() never receives null
foreach ($text_words as $word) {
if ((strlen($word) > $len) && $out === '') {
return substr($word, 0, $len) . "...";
}
if ((strlen($out) + strlen($word)) > $len) {
return ltrim($out) . "..."; // drop the leading separator space
}
$out .= " " . $word;
}
return ltrim($out);
}
}

If you want to cut the string without splitting words, you can do the following:
function ellipse($str,$n_chars,$crop_str=' [...]')
{
$buff=strip_tags($str);
if(strlen($buff) > $n_chars)
{
$cut_index=strpos($buff,' ',$n_chars);
$buff=substr($buff,0,($cut_index===false? $n_chars: $cut_index+1)).$crop_str;
}
return $buff;
}
If $str is shorter than $n_chars, it is returned untouched.
If the length of $str equals $n_chars, it is returned as is as well.
If $str is longer than $n_chars, the function looks for the next space at or after position $n_chars and cuts there; if there are no more spaces before the end, $str is cut bluntly at $n_chars instead.
NOTE: be aware that this method removes all tags in the case of HTML.
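The cases described above can be checked with a few calls. This is only a usage sketch: the function body is copied from the answer so the snippet runs on its own, and the sample strings are made up for illustration.

```php
<?php
// Copied from the answer above so the snippet is self-contained.
function ellipse($str, $n_chars, $crop_str = ' [...]')
{
    $buff = strip_tags($str);
    if (strlen($buff) > $n_chars) {
        $cut_index = strpos($buff, ' ', $n_chars);
        $buff = substr($buff, 0, ($cut_index === false ? $n_chars : $cut_index + 1)) . $crop_str;
    }
    return $buff;
}

echo ellipse('Hi', 5), "\n";                           // Hi
echo ellipse('<b>Hi</b>', 5), "\n";                    // Hi (tags stripped)
echo ellipse('The quick brown fox', 6, '...'), "\n";   // The quick ...
echo ellipse('Supercalifragilistic', 5, '...'), "\n";  // Super...
```

Note that the cut keeps the space it found, so the result can be longer than $n_chars and ends with a space before the crop string.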

The CodeIgniter framework contains a helper for this, called the "text helper". Here's some documentation from CodeIgniter's user guide that applies: http://codeigniter.com/user_guide/helpers/text_helper.html
(just read the word_limiter and character_limiter sections).
Here are two functions from it relevant to your question:
if ( ! function_exists('word_limiter'))
{
function word_limiter($str, $limit = 100, $end_char = '…')
{
if (trim($str) == '')
{
return $str;
}
preg_match('/^\s*+(?:\S++\s*+){1,'.(int) $limit.'}/', $str, $matches);
if (strlen($str) == strlen($matches[0]))
{
$end_char = '';
}
return rtrim($matches[0]).$end_char;
}
}
And
if ( ! function_exists('character_limiter'))
{
function character_limiter($str, $n = 500, $end_char = '…')
{
if (strlen($str) < $n)
{
return $str;
}
$str = preg_replace("/\s+/", ' ', str_replace(array("\r\n", "\r", "\n"), ' ', $str));
if (strlen($str) <= $n)
{
return $str;
}
$out = "";
foreach (explode(' ', trim($str)) as $val)
{
$out .= $val.' ';
if (strlen($out) >= $n)
{
$out = trim($out);
return (strlen($out) == strlen($str)) ? $out : $out.$end_char;
}
}
}
}

if(strlen($text) > 10)
$text = substr($text,0,10) . "...";

Use substring
http://php.net/manual/en/function.substr.php
$foo = substr("abcde",0, 3) . "...";

I'm not sure if this is the fastest solution, but it looks like it is the shortest one:
$result = current(explode("\n", wordwrap($str, $width, "...\n")));
P.S. See some examples here https://stackoverflow.com/a/17852480/131337

This function does the job without breaking words in the middle:
function str_trim($str,$char_no){
if(strlen($str)<=$char_no)
return $str;
else{
$all_words=explode(" ",$str);
$out_str='';
foreach ($all_words as $word) {
$temp_str=($out_str=='')?$word:$out_str.' '.$word;
if(strlen($temp_str)>$char_no-3)//-3 for 3 dots
return $out_str."...";
$out_str=$temp_str;
}
}
}

The function I used:
function cutAfter($string, $len = 30, $append = '...') {
return (strlen($string) > $len) ?
substr($string, 0, $len - strlen($append)) . $append :
$string;
}

This is what I do:
function cutat($num, $tt){
if (mb_strlen($tt)>$num){
$tt=mb_substr($tt,0,$num-3).'...'; // leave room for the 3 dots so the result is $num chars
}
return $tt;
}
where $num stands for the number of chars, and $tt is the string to manipulate.

I developed a function for this use
function str_short($string,$limit)
{
$len=strlen($string);
if($len>$limit)
{
$to_sub=$len-$limit;
$crop_temp=substr($string,0,-$to_sub);
return $crop_len=$crop_temp."...";
}
else
{
return $string;
}
}
You just call the function with the string and the limit,
e.g. str_short("hahahahahah",5);
It will cut your string and add "..." at the end.
:)

To wrap this in a function (for repeated use) with a configurable length limit, use:
function string_length_cutoff($string, $limit, $subtext = '...')
{
return (strlen($string) > $limit) ? substr($string, 0, ($limit-strlen($subtext))).$subtext : $string;
}
// example usage:
echo string_length_cutoff('Michelle Lee Hammontree-Garcia', 26);
// or (for custom substitution text)
echo string_length_cutoff('Michelle Lee Hammontree-Garcia', 26, '..');

It's best to abstract your code like so (notice the limit is optional and defaults to 10):
print limit($string);
function limit($var, $limit=10)
{
if ( strlen($var) > $limit )
{
return substr($var, 0, $limit) . '...';
}
else
{
return $var;
}
}

substr() would be best, you'll also want to check the length of the string first
$str = 'someLongString';
$max = 7;
if(strlen($str) > $max) {
$str = substr($str, 0, $max) . '...';
}
wordwrap won't trim the string down, just split it up...
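A quick sketch of that behaviour, with a made-up sample string: wordwrap() only inserts line breaks, so to truncate on a word boundary you still have to keep just the first wrapped line yourself.

```php
<?php
$str = 'The quick brown fox';

// wordwrap() only inserts line breaks; the full text is still there.
$wrapped = wordwrap($str, 10, "\n");
var_dump($wrapped === "The quick\nbrown fox");  // bool(true)

// To truncate on a word boundary, keep only the first wrapped line.
$first = explode("\n", $wrapped, 2)[0];
echo $first . '...', "\n";  // The quick...
```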

$width = 10;
$a = preg_replace ("~^(.{{$width}})(.+)~", '\\1…', $a);
or, breaking at a word boundary (similar to wordwrap):
$a = preg_replace ("~^(.{1,{$width}}\b)(.+)~", '\\1…', $a);

This solution will not cut words; it adds three dots after the first space at or after the limit.
I edited @Raccoon29's solution and replaced all functions with mb_ functions so that it works for all languages, such as Arabic.
function cut_string($str, $n_chars, $crop_str = '...') {
$buff = strip_tags($str);
if (mb_strlen($buff) > $n_chars) {
$cut_index = mb_strpos($buff, ' ', $n_chars);
$buff = mb_substr($buff, 0, ($cut_index === false ? $n_chars : $cut_index + 1), "UTF-8") . $crop_str;
}
return $buff;
}

$yourString = "bla blaaa bla blllla bla bla";
$out = "";
if(strlen($yourString) > 22) {
while(strlen($yourString) > 22) {
$pos = strrpos($yourString, " ");
if($pos !== false && $pos <= 22) {
$out = substr($yourString,0,$pos);
break;
} else {
$yourString = substr($yourString,0,$pos);
continue;
}
}
} else {
$out = $yourString;
}
echo "Output String: ".$out;

If there is no hard requirement on the length of the truncated string, one can use this to truncate and prevent cutting the last word as well:
$text = "Knowledge is a natural right of every human being of which no one
has the right to deprive him or her under any pretext, except in a case where a
person does something which deprives him or her of that right. It is mere
stupidity to leave its benefits to certain individuals and teams who monopolize
these while the masses provide the facilities and pay the expenses for the
establishment of public sports.";
// we don't want new lines in our preview
$text_only_spaces = preg_replace('/\s+/', ' ', $text);
// truncates the text
$text_truncated = mb_substr($text_only_spaces, 0, mb_strpos($text_only_spaces, " ", 50));
// prevents last word truncation
$preview = trim(mb_substr($text_truncated, 0, mb_strrpos($text_truncated, " ")));
In this case, $preview will be "Knowledge is a natural right of every human being".
Live code example:
http://sandbox.onlinephpfunctions.com/code/25484a8b687d1f5ad93f62082b6379662a6b4713
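The steps above can be wrapped in a reusable function. The following is a sketch only: the name preview_text, the $minLength parameter and the fallbacks for short or space-free text are assumptions added here, not part of the original answer. The guard for short input matters because mb_strpos() rejects an offset beyond the end of the string.

```php
<?php
// Hypothetical helper wrapping the steps above; name and defaults are assumptions.
function preview_text(string $text, int $minLength = 50): string
{
    // Collapse new lines and repeated whitespace into single spaces.
    $text = trim((string) preg_replace('/\s+/', ' ', $text));

    // Assumption: text that already fits is returned as is.
    if (mb_strlen($text, 'UTF-8') <= $minLength) {
        return $text;
    }

    // First space at or after $minLength.
    $end = mb_strpos($text, ' ', $minLength, 'UTF-8');
    if ($end === false) {
        return $text; // assumption: no later space, so keep the whole text
    }

    // Truncate there, then drop the last (possibly partial) word.
    $truncated = mb_substr($text, 0, $end, 'UTF-8');
    $lastSpace = mb_strrpos($truncated, ' ', 0, 'UTF-8');

    return $lastSpace === false
        ? $truncated
        : mb_substr($truncated, 0, $lastSpace, 'UTF-8');
}

$sample = 'Knowledge is a natural right of every human being of which no one has the right to deprive him or her.';
echo preview_text($sample), "\n";  // Knowledge is a natural right of every human being
```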
