I am using function:
private function random($len) {
if (#is_readable('/dev/urandom')) {
$f=fopen('/dev/urandom', 'r');
$urandom=fread($f, $len);
fclose($f);
}
$return='';
for ($i=0;$i<$len;++$i) {
if (!isset($urandom)) {
if ($i%2==0) mt_srand(time()%2147 * 1000000 + (double)microtime() * 1000000);
$rand=48+mt_rand()%64;
} else $rand=48+ord($urandom[$i])%64;
if ($rand>57)
$rand+=7;
if ($rand>90)
$rand+=6;
if ($rand==123) $rand=52;
if ($rand==124) $rand=53;
$return.=chr($rand);
}
return $return;
}
I have some forms which trigger this function and I get the error:
int(2) string(200) "is_readable(): open_basedir restriction in effect.
File(/dev/urandom) is not within the allowed path(s):
Is there a way to replace this function and not to use /dev/urandom ?
Thank you very much.
From the (previously accepted) answer:
Instead of urandom you can use "rand":
Nooooooooo!
Dealing with open_basedir is one of the things we handle gracefully in random_compat. Seriously consider importing that library then just using random_bytes() instead of reading from /dev/urandom.
Whatever you do, DON'T USE rand(). Even if you believe there's a use case for it, the security trade-offs are a lie.
Also, if you need a function to generate a random string (depends on PHP 7 or random_compat):
/**
* Note: See https://paragonie.com/b/JvICXzh_jhLyt4y3 for an alternative implementation
*/
function random_string($length = 26, $alphabet = 'abcdefghijklmnopqrstuvwxyz234567')
{
if ($length < 1) {
throw new InvalidArgumentException('Length must be a positive integer');
}
$str = '';
$alphamax = strlen($alphabet) - 1;
if ($alphamax < 1) {
throw new InvalidArgumentException('Invalid alphabet');
}
for ($i = 0; $i < $length; ++$i) {
$str .= $alphabet[random_int(0, $alphamax)];
}
return $str;
}
Demo code: https://3v4l.org/DOjNE
If your host doesn't support random_int() you can use a function which I made for myself.
function generateRandomString($length, $secureRand = false, $chars="0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ") {
if (!function_exists("random_int") && $secureRand) {
function random_int($min, $max) {
$range = $max - $min;
if ($range <= 0) return $min;
$log = ceil(log($range, 2));
$bytes = (int)($log / 8) + 1;
$filter = (int)(1 << ((int)($log + 1))) - 1;
do {
$rnd = hexdec(bin2hex(openssl_random_pseudo_bytes($bytes, $s)));
if (!$s) continue;
$rnd = $rnd & $filter;
} while ($rnd > $range);
return $min + $rnd;
}
}
$charsCount = strlen($chars) - 1;
$output = "";
for ($i=1; $i <= $length; $i++) {
if ($secureRand)
$output .= $chars[random_int(0, $charsCount)];
else
$output .= $chars[mt_rand(0, $charsCount)];
}
return $output;
}
If you need a secure random string (e.g. random passwords):
generateRandomString(8, true);
this will give you a 8 lenght string.
Related
I got the problem when convert between this 2 type in PHP. This is the code I searched in google
function strToHex($string){
$hex='';
for ($i=0; $i < strlen($string); $i++){
$hex .= dechex(ord($string[$i]));
}
return $hex;
}
function hexToStr($hex){
$string='';
for ($i=0; $i < strlen($hex)-1; $i+=2){
$string .= chr(hexdec($hex[$i].$hex[$i+1]));
}
return $string;
}
I check it and found out this when I use XOR to encrypt.
I have the string "this is the test", after XOR with a key, I have the result in string ↕↑↔§P↔§P ♫§T↕§↕. After that, I tried to convert it to hex by function strToHex() and I got these 12181d15501d15500e15541215712. Then, I tested with the function hexToStr() and I have ↕↑↔§P↔§P♫§T↕§q. So, what should I do to solve this problem? Why does it wrong when I convert this 2 style value?
For people that end up here and are just looking for the hex representation of a (binary) string.
bin2hex("that's all you need");
# 74686174277320616c6c20796f75206e656564
hex2bin('74686174277320616c6c20796f75206e656564');
# that's all you need
Doc: bin2hex, hex2bin.
For any char with ord($char) < 16 you get a HEX back which is only 1 long. You forgot to add 0 padding.
This should solve it:
<?php
function strToHex($string){
$hex = '';
for ($i=0; $i<strlen($string); $i++){
$ord = ord($string[$i]);
$hexCode = dechex($ord);
$hex .= substr('0'.$hexCode, -2);
}
return strToUpper($hex);
}
function hexToStr($hex){
$string='';
for ($i=0; $i < strlen($hex)-1; $i+=2){
$string .= chr(hexdec($hex[$i].$hex[$i+1]));
}
return $string;
}
// Tests
header('Content-Type: text/plain');
function test($expected, $actual, $success) {
if($expected !== $actual) {
echo "Expected: '$expected'\n";
echo "Actual: '$actual'\n";
echo "\n";
$success = false;
}
return $success;
}
$success = true;
$success = test('00', strToHex(hexToStr('00')), $success);
$success = test('FF', strToHex(hexToStr('FF')), $success);
$success = test('000102FF', strToHex(hexToStr('000102FF')), $success);
$success = test('↕↑↔§P↔§P ♫§T↕§↕', hexToStr(strToHex('↕↑↔§P↔§P ♫§T↕§↕')), $success);
echo $success ? "Success" : "\nFailed";
PHP :
string to hex:
implode(unpack("H*", $string));
hex to string:
pack("H*", $hex);
Here's what I use:
function strhex($string) {
$hexstr = unpack('H*', $string);
return array_shift($hexstr);
}
function hexToStr($hex){
// Remove spaces if the hex string has spaces
$hex = str_replace(' ', '', $hex);
return hex2bin($hex);
}
// Test it
$hex = "53 44 43 30 30 32 30 30 30 31 37 33";
echo hexToStr($hex); // SDC002000173
/**
* Test Hex To string with PHP UNIT
* #param string $value
* #return
*/
public function testHexToString()
{
$string = 'SDC002000173';
$hex = "53 44 43 30 30 32 30 30 30 31 37 33";
$result = hexToStr($hex);
$this->assertEquals($result,$string);
}
Using #bill-shirley answer with a little addition
function str_to_hex($string) {
$hexstr = unpack('H*', $string);
return array_shift($hexstr);
}
function hex_to_str($string) {
return hex2bin("$string");
}
Usage:
$str = "Go placidly amidst the noise";
$hexstr = str_to_hex($str);// 476f20706c616369646c7920616d6964737420746865206e6f697365
$strstr = hex_to_str($str);// Go placidly amidst the noise
You can try the following code to convert the image to hex string
<?php
$image = 'sample.bmp';
$file = fopen($image, 'r') or die("Could not open $image");
while ($file && !feof($file)){
$chunk = fread($file, 1000000); # You can affect performance altering
this number. YMMV.
# This loop will be dog-slow, almost for sure...
# You could snag two or three bytes and shift/add them,
# but at 4 bytes, you violate the 7fffffff limit of dechex...
# You could maybe write a better dechex that would accept multiple bytes
# and use substr... Maybe.
for ($byte = 0; $byte < strlen($chunk); $byte++)){
echo dechex(ord($chunk[$byte]));
}
}
?>
I only have half the answer, but I hope that it is useful as it adds unicode (utf-8) support
/**
* hexadecimal to unicode character
* #param string $hex
* #return string
*/
function hex2uni($hex) {
$dec = hexdec($hex);
if($dec < 128) {
return chr($dec);
}
if($dec < 2048) {
$utf = chr(192 + (($dec - ($dec % 64)) / 64));
} else {
$utf = chr(224 + (($dec - ($dec % 4096)) / 4096));
$utf .= chr(128 + ((($dec % 4096) - ($dec % 64)) / 64));
}
return $utf . chr(128 + ($dec % 64));
}
To string
var_dump(hex2uni('e641'));
Based on: http://www.php.net/manual/en/function.chr.php#Hcom55978
I have the following code:
for($a=1; $a<strlen($string); $a++){
for($b=1; $a+$b<strlen($string); $b++){
for($c=1; $a+$b+$c<strlen($string); $c++){
for($d=1; $a+$b+$c+$d<strlen($string); $d++){
$tempString = substr_replace($string, ".", $a, 0);
$tempString = substr_replace($tempString, ".", $a+$b+1, 0);
$tempString = substr_replace($tempString, ".", $a+$b+$c+2, 0);
$tempString = substr_replace($tempString, ".", $a+$b+$c+$d+3, 0);
echo $tempString."</br>";
}
}
}
}
What it does is to make all possible combinatons of a string with several dots.
Example:
t.est123
te.st123
tes.t123
...
test12.3
Then, I add one more dot:
t.e.st123
t.es.t123
...
test1.2.3
Doing the way I'm doing now, I need to create lots and lots of for loops, each for a determined number of dots. I don't know how I can turn that example into a functon or other easier way of doing this.
Your problem is a combination problem. Note: I'm not a math freak, I only researched this information because of interest.
http://en.wikipedia.org/wiki/Combination#Number_of_k-combinations
Also known as n choose k. The Binomial coefficient is a function which gives you the number of combinations.
A function I found here: Calculate value of n choose k
function choose($n, $k) {
if ($k == 0) {return 1;}
return($n * choose($n - 1, $k - 1)) / $k;
}
// 6 positions between characters (test123), 4 dots
echo choose(6, 4); // 15 combinations
To get all combinations you also have to choose between different algorithms.
Good post: https://stackoverflow.com/a/127856/1948627
UPDATE:
I found a site with an algorithm in different programming languages. (But not PHP)
I've converted it to PHP:
function bitprint($u){
$s= [];
for($n= 0;$u > 0;++$n, $u>>= 1) {
if(($u & 1) > 0) $s[] = $n;
}
return $s;
}
function bitcount($u){
for($n= 0;$u > 0;++$n, $u&= ($u - 1));
return $n;
}
function comb($c, $n){
$s= [];
for($u= 0;$u < 1 << $n;$u++) {
if(bitcount($u) == $c) $s[] = bitprint($u);
}
return $s;
}
echo '<pre>';
print_r(comb(4, 6));
It outputs an array with all combinations (positions between the chars).
The next step is to replace the string with the dots:
$string = 'test123';
$sign = '.';
$combs = comb(4, 6);
// get all combinations (Th3lmuu90)
/*
$combs = [];
for($i=0; $i<strlen($string); $i++){
$combs = array_merge($combs, comb($i, strlen($string)-1));
}
*/
foreach ($combs as $comb) {
$a = $string;
for ($i = count($comb) - 1; $i >= 0; $i--) {
$a = substr_replace($a, $sign, $comb[$i] + 1, 0);
}
echo $a.'<br>';
}
// output:
t.e.s.t.123
t.e.s.t1.23
t.e.st.1.23
t.es.t.1.23
te.s.t.1.23
t.e.s.t12.3
t.e.st.12.3
t.es.t.12.3
te.s.t.12.3
t.e.st1.2.3
t.es.t1.2.3
te.s.t1.2.3
t.est.1.2.3
te.st.1.2.3
tes.t.1.2.3
This is quite an unusual question, but I can't help but try to wrap around what you are tying to do. My guess is that you want to see how many combinations of a string there are with a dot moving between characters, finally coming to rest right before the last character.
My understanding is you want a count and a printout of string similar to what you see here:
t.est
te.st
tes.t
t.es.t
te.s.t
t.e.s.t
count: 6
To facilitate this functionality I came up with a class, this way you could port it to other parts of code and it can handle multiple strings. The caveat here is the strings must be at least two characters and not contain a period. Here is the code for the class:
class DotCombos
{
public $combos;
private function combos($string)
{
$rebuilt = "";
$characters = str_split($string);
foreach($characters as $index => $char) {
if($index == 0 || $index == count($characters)) {
continue;
} else if(isset($characters[$index]) && $characters[$index] == ".") {
break;
} else {
$rebuilt = substr($string, 0, $index) . "." . substr($string, $index);
print("$rebuilt\n");
$this->combos++;
}
}
return $rebuilt;
}
public function allCombos($string)
{
if(strlen($string) < 2) {
return null;
}
$this->combos = 0;
for($i = 0; $i < count(str_split($string)) - 1; $i++) {
$string = $this->combos($string);
}
}
}
To make use of the class you would do this:
$combos = new DotCombos();
$combos->allCombos("test123");
print("Count: $combos->combos");
The output would be:
t.est123
te.st123
tes.t123
test.123
test1.23
test12.3
t.est12.3
te.st12.3
tes.t12.3
test.12.3
test1.2.3
t.est1.2.3
te.st1.2.3
tes.t1.2.3
test.1.2.3
t.est.1.2.3
te.st.1.2.3
tes.t.1.2.3
t.es.t.1.2.3
te.s.t.1.2.3
t.e.s.t.1.2.3
Count: 21
Hope that is what you are looking for (or at least helps)....
I have this PHP code which is supposed to increase a URL shortener mask on each new entry.
My problem is that it dosen't append a new char when it hits the last one (z).
(I know incrementing is a safety issue since you can guess earlier entries, but this is not a problem in this instance)
If i add 00, it can figure out 01 and so on... but is there a simple fix to why it won't do it on its own?
(The param is the last entry)
<?php
class shortener
{
public function ShortURL($str = null)
{
if (!is_null($str))
{
for($i = (strlen($str) - 1);$i >= 0;$i--)
{
if($str[$i] != 'Z')
{
$str[$i] = $this->_increase($str[$i]);
#var_dump($str[$i]);
break;
}
else
{
$str[$i] = '0';
if($i == 0)
{
$str = '0'.$str;
}
}
}
return $str;
}
else {
return '0';
}
}
private function _increase($letter)
{
//Lowercase: 97 - 122
//Uppercase: 65 - 90
// 0 - 9 : 48 - 57
$ord = ord($letter);
if($ord == 122)
{
$ord = 65;
}
elseif ($ord == 57)
{
$ord = 97;
}
else
{
$ord++;
}
return chr($ord);
}
}
?>
Effectively, all you are doing is encoding a number into Base62. So if we take the string, decode it into base 10, increment it, and reencode it into Base62, it will be much easier to know what we are doing, and the length of the string will take care of itself.
class shortener
{
public function ShortURL($str = null)
{
if ($str==null) return 0;
$int_val = $this->toBase10($str);
$int_val++;
return $this->toBase62($int_val);
}
public function toBase62($num, $b=62) {
$base='0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
$r = $num % $b ;
$res = $base[$r];
$q = floor($num/$b);
while ($q) {
$r = $q % $b;
$q =floor($q/$b);
$res = $base[$r].$res;
}
return $res;
}
function toBase10( $num, $b=62) {
$base='0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
$limit = strlen($num);
$res=strpos($base,$num[0]);
for($i=1;$i<$limit;$i++) {
$res = $b * $res + strpos($base,$num[$i]);
}
return $res;
}
}
I set out to make a small project around a bounch of classes that return generators (php 5.5).
The main motivation for the small project was to expand on my TDD journey, fiddle with generators and have a package I could throw on packagist for later use.
The current state of the whole "project" can be found at Github
All tests are green, the methods does what I want. Now I want to refactor as I there is lots of dublication.
/**
* Returns a Generator with a even range.
*
* getEven(10); // 10,12,14,16,18,20,22 ...
* getEven(null, 10); // 10,8,6,4,2,0,-2,-4 ...
* getEven(10, null, 2); // 10,6,2, -2 ...
* getEven(10,20); // 10,12,14,16,18,20
* getEven(20,10); // 20,18,16,14,12,10
* getEven(10,20,2); // 10,14,18
*
* #param int|null $start
* #param int|null $end
* #param int $step
* #throws InvalidArgumentException|LogicException
* #return Generator
*/
public function getEven( $start = null, $end = null, $step = 1 )
{
// Throws LogicException
$this->throwExceptionIfAllNulls( [$start, $end] );
$this->throwExceptionIfInvalidStep($step);
// Throws InvalidArgumentException
$this->throwExceptionIfNotNullOrInt( [$start, $end] );
// infinite increase range
if(is_int($start) && is_null($end))
{
// throw LogicException
$this->throwExceptionIfOdd($start);
$Generator = function() use ($start, $step)
{
for($i = $start; true; $i += $step * 2)
{
yield $i;
}
};
}
// infinite decrease range
elseif(is_int($end) && is_null($start))
{
// throws LogicException
$this->throwExceptionIfUneven($end);
$Generator = function() use ($end, $step)
{
for($i = $end; true; $i -= $step * 2)
{
yield $i;
}
};
}
// predetermined range
else
{
// throws LogicException
$this->throwExceptionIfUneven($start);
$this->throwExceptionIfUneven($end);
// decrease
if($start >= $end)
{
$Generator = function() use ($start, $end, $step)
{
for($i = $start; $i >= $end; $i -= $step * 2)
{
yield $i;
}
};
}
// increase
else
{
$Generator = function() use ($start, $end, $step)
{
for($i = $start; $i <= $end; $i += $step * 2)
{
yield $i;
}
};
}
}
return $Generator();
}
The class also has a method named getOdd (and yes it looks alot like it ;) )
The main dublication is the closures $Generator = function() ... and the difference is mostly operators such as + - * / and arguments in the for loop. This is mainly the same in the rest of th class.
I read Dynamic Comparison Operators in PHP and come to the conclusion that there is no native method like compare(...)
Should I make a private/protected method for comparison. If so should I make a new class/function for this? I do not think it belongs in the current class.
Is it something else I am missing, I am unsure on how to DRY this up, in a proper way?
Btw. iknow a getEven, getOdd is kinda silly when i got a getRange With step function, but it is a more general refactoring / pattern question.
Update
#github the getEven and getOdd are now removed...
The code below has not been tested or verified to work, but I have faith in it and at least it shows one possible way of removing the multiple generator functions.
As you state yourself, the duplication you are trying to remove is mainly in the generator function. If you look into this you can see that every generator function you have can be written as this:
function createGenerator($index, $limit, $step) {
return function() use($index, $limit, $step) {
$incrementing = $step > 0;
for ($i = $index; true; $i += 2 * $step) {
if (($incrementing && $i <= $limit) || (!$incrementing && $i >= $limit)) {
yield $i;
}else {
break;
}
}
};
}
In order to utilize this you need to do some magic with the input arguments and it helps (at least makes it pretty) to define some constants. PHP allready got a PHP_INT_MAX constant holding the greatest value possible for an integer, however it does not got a PHP_INT_MIN. So I would define that as a constant of its own.
define('PHP_INT_MIN', ~PHP_INT_MAX);
Now lets take a look at the four cases in your function.
1) Infinite increase range
Infinte is a rather bold claim here, if we change it to "greatest value possible given the constraints of an int" we get a finite range from $index to PHP_INT_MAX, hence by setting $limit = PHP_INT_MAX; the above mentioned generator function will still be the same.
//$limit = PHP_INT_MAX;
createGenerator($index, PHP_INT_MAX, $step);
2) Infinite decrease range
The same argument as above can again be used here, but with a negativ $step and swapping $index and $limit;
//$index = $limit;
//$limit = PHP_INT_MIN;
//$step *= -1;
createGenerator($limit, PHP_INT_MIN, -1 * $step);
3) Predetermined decreasing range
Swap and negate once again.
//$temp = $index;
//$index = $limit;
//$limit = $temp;
//$step *= -1;
createGenerator($limit, $index, -1 * $step);
4) Predetermined increasing range
Well this is just the default case, where all arguments are given. And nothing needs to change.
createGenerator($index, $limit, $step);
The revised code
public function getEven($index = null, $limit = null, $step = 1) {
// Throws LogicException
$this->throwExceptionIfAllNulls([$index, $limit]);
$this->throwExceptionIfInvalidStep($step);
// Throws InvalidArgumentException
$this->throwExceptionIfNotNullOrInt([$index, $limit]);
//Generator function
function createGenerator($index, $limit, $step) {
return function() use($index, $limit, $step) {
$incrementing = $step > 0;
for ($i = $index; true; $i += 2 * $step) {
if (($incrementing && $i <= $limit) || (!$incrementing && $i >= $limit)) {
yield $i;
}else {
break;
}
}
};
}
// infinite increase range
if (is_int($index) && is_null($limit)) {
// throw LogicException
$this->throwExceptionIfodd($index);
return createGenerator($index, PHP_INT_MAX, $step);
}
// infinite decrease range
elseif (is_int($limit) && is_null($index)) {
// throws LogicException
$this->throwExceptionIfodd($limit);
return createGenerator($limit, PHP_INT_MIN, -1*$step);
}
// predetermined range
else {
// throws LogicException
$this->throwExceptionIfodd($index);
$this->throwExceptionIfodd($limit);
// decrease
if ($index >= $limit) {
return createGenerator($limit, $index, -1 * $step);
}
return createGenerator($index, $limit, $step);
}
}
i programmed this php function that takes any text/html string and trims it.
For example:
gen_string("Hello, how are you today?",10);
Returns:
Hello, how...
The problem arises when the function string limit is the same as the position of a special character such as: á, ñ, etc...
In which case:
gen_string("Helló my friend",5);
Returns: Hell�...
Any ideas on how to solve this issue? This is the current function:
# string: advanced substr
function gen_string($string,$min,$clean=false) {
$text = trim(strip_tags($string));
if(strlen($text)>$min) {
$blank = strpos($text,' ');
if($blank) {
# limit plus last word
$extra = strpos(substr($text,$min),' ');
$max = $min+$extra;
$r = substr($text,0,$max);
if(strlen($text)>=$max && !$clean) $r=trim($r,'.').'...';
} else {
# if there are no spaces
$r = substr($text,0,$min).'...';
}
} else {
# if original length is lower than limit
$r = $text;
}
return trim($r);
}
Thanks!
You should use the multibyte string functions to correctly handle unicode characters.
For example you could try using mb_strimwidth to truncate a string to a specified length.
You could also take a different approach and make use of the PCRE regex extension's UTF-8 capabilities (assuming your strings are UTF-8!).
function gen_string($string, $length)
{
$str = trim(strip_tags($string));
$strlen = strlen(utf8_decode($str));
// String is less than limit
if ($strlen <= $length) return $str;
// Shorten string, preserving whole "words" (non-whitespace)
preg_match('/^.{'.($length-1).'}\S*/su', $str, $match);
// Append ellipsis if needed (bytes length is OK to check)
if (strlen($match[0]) !== strlen($str)) $match[0] .= '...';
return $match[0];
}
Aside from the multibyte issue, maybe you can write it shorter
function gen_string($str, $limit) {
if ($str >= strlen($limit))
return $str;
$offset = -(strlen($str) - $limit);
return substr($str, 0, strrpos($str, ' ', $offset)).'...';
}
It will limit the length of the string, so rather than cut it after the first word beyond the limit, it ensures that the length is never larger than the limit.
strlen() cannot be used for UTF-8 string, because it would count also the continuation characters, which should not be counted.
You can try with the following code:
define('PREG_CLASS_UNICODE_WORD_BOUNDARY',
'\x{0}-\x{2F}\x{3A}-\x{40}\x{5B}-\x{60}\x{7B}-\x{A9}\x{AB}-\x{B1}\x{B4}' .
'\x{B6}-\x{B8}\x{BB}\x{BF}\x{D7}\x{F7}\x{2C2}-\x{2C5}\x{2D2}-\x{2DF}' .
'\x{2E5}-\x{2EB}\x{2ED}\x{2EF}-\x{2FF}\x{375}\x{37E}-\x{385}\x{387}\x{3F6}' .
'\x{482}\x{55A}-\x{55F}\x{589}-\x{58A}\x{5BE}\x{5C0}\x{5C3}\x{5C6}' .
'\x{5F3}-\x{60F}\x{61B}-\x{61F}\x{66A}-\x{66D}\x{6D4}\x{6DD}\x{6E9}' .
'\x{6FD}-\x{6FE}\x{700}-\x{70F}\x{7F6}-\x{7F9}\x{830}-\x{83E}' .
'\x{964}-\x{965}\x{970}\x{9F2}-\x{9F3}\x{9FA}-\x{9FB}\x{AF1}\x{B70}' .
'\x{BF3}-\x{BFA}\x{C7F}\x{CF1}-\x{CF2}\x{D79}\x{DF4}\x{E3F}\x{E4F}' .
'\x{E5A}-\x{E5B}\x{F01}-\x{F17}\x{F1A}-\x{F1F}\x{F34}\x{F36}\x{F38}' .
'\x{F3A}-\x{F3D}\x{F85}\x{FBE}-\x{FC5}\x{FC7}-\x{FD8}\x{104A}-\x{104F}' .
'\x{109E}-\x{109F}\x{10FB}\x{1360}-\x{1368}\x{1390}-\x{1399}\x{1400}' .
'\x{166D}-\x{166E}\x{1680}\x{169B}-\x{169C}\x{16EB}-\x{16ED}' .
'\x{1735}-\x{1736}\x{17B4}-\x{17B5}\x{17D4}-\x{17D6}\x{17D8}-\x{17DB}' .
'\x{1800}-\x{180A}\x{180E}\x{1940}-\x{1945}\x{19DE}-\x{19FF}' .
'\x{1A1E}-\x{1A1F}\x{1AA0}-\x{1AA6}\x{1AA8}-\x{1AAD}\x{1B5A}-\x{1B6A}' .
'\x{1B74}-\x{1B7C}\x{1C3B}-\x{1C3F}\x{1C7E}-\x{1C7F}\x{1CD3}\x{1FBD}' .
'\x{1FBF}-\x{1FC1}\x{1FCD}-\x{1FCF}\x{1FDD}-\x{1FDF}\x{1FED}-\x{1FEF}' .
'\x{1FFD}-\x{206F}\x{207A}-\x{207E}\x{208A}-\x{208E}\x{20A0}-\x{20B8}' .
'\x{2100}-\x{2101}\x{2103}-\x{2106}\x{2108}-\x{2109}\x{2114}' .
'\x{2116}-\x{2118}\x{211E}-\x{2123}\x{2125}\x{2127}\x{2129}\x{212E}' .
'\x{213A}-\x{213B}\x{2140}-\x{2144}\x{214A}-\x{214D}\x{214F}' .
'\x{2190}-\x{244A}\x{249C}-\x{24E9}\x{2500}-\x{2775}\x{2794}-\x{2B59}' .
'\x{2CE5}-\x{2CEA}\x{2CF9}-\x{2CFC}\x{2CFE}-\x{2CFF}\x{2E00}-\x{2E2E}' .
'\x{2E30}-\x{3004}\x{3008}-\x{3020}\x{3030}\x{3036}-\x{3037}' .
'\x{303D}-\x{303F}\x{309B}-\x{309C}\x{30A0}\x{30FB}\x{3190}-\x{3191}' .
'\x{3196}-\x{319F}\x{31C0}-\x{31E3}\x{3200}-\x{321E}\x{322A}-\x{3250}' .
'\x{3260}-\x{327F}\x{328A}-\x{32B0}\x{32C0}-\x{33FF}\x{4DC0}-\x{4DFF}' .
'\x{A490}-\x{A4C6}\x{A4FE}-\x{A4FF}\x{A60D}-\x{A60F}\x{A673}\x{A67E}' .
'\x{A6F2}-\x{A716}\x{A720}-\x{A721}\x{A789}-\x{A78A}\x{A828}-\x{A82B}' .
'\x{A836}-\x{A839}\x{A874}-\x{A877}\x{A8CE}-\x{A8CF}\x{A8F8}-\x{A8FA}' .
'\x{A92E}-\x{A92F}\x{A95F}\x{A9C1}-\x{A9CD}\x{A9DE}-\x{A9DF}' .
'\x{AA5C}-\x{AA5F}\x{AA77}-\x{AA79}\x{AADE}-\x{AADF}\x{ABEB}' .
'\x{D800}-\x{F8FF}\x{FB29}\x{FD3E}-\x{FD3F}\x{FDFC}-\x{FDFD}' .
'\x{FE10}-\x{FE19}\x{FE30}-\x{FE6B}\x{FEFF}-\x{FF0F}\x{FF1A}-\x{FF20}' .
'\x{FF3B}-\x{FF40}\x{FF5B}-\x{FF65}\x{FFE0}-\x{FFFD}');
function utf8_strlen($text) {
if (function_exists('mb_strlen')) {
return mb_strlen($text);
}
// Do not count UTF-8 continuation bytes.
return strlen(preg_replace("/[\x80-\xBF]/", '', $text));
}
function utf8_truncate($string, $max_length, $wordsafe = FALSE, $add_ellipsis = FALSE, $min_wordsafe_length = 1) {
$ellipsis = '';
$max_length = max($max_length, 0);
$min_wordsafe_length = max($min_wordsafe_length, 0);
if (utf8_strlen($string) <= $max_length) {
// No truncation needed, so don't add ellipsis, just return.
return $string;
}
if ($add_ellipsis) {
// Truncate ellipsis in case $max_length is small.
$ellipsis = utf8_substr('...', 0, $max_length);
$max_length -= utf8_strlen($ellipsis);
$max_length = max($max_length, 0);
}
if ($max_length <= $min_wordsafe_length) {
// Do not attempt word-safe if lengths are bad.
$wordsafe = FALSE;
}
if ($wordsafe) {
$matches = array();
// Find the last word boundary, if there is one within $min_wordsafe_length
// to $max_length characters. preg_match() is always greedy, so it will
// find the longest string possible.
$found = preg_match('/^(.{' . $min_wordsafe_length . ',' . $max_length . '})[' . PREG_CLASS_UNICODE_WORD_BOUNDARY . ']/u', $string, $matches);
if ($found) {
$string = $matches[1];
}
else {
$string = utf8_substr($string, 0, $max_length);
}
}
else {
$string = utf8_substr($string, 0, $max_length);
}
if ($add_ellipsis) {
$string .= $ellipsis;
}
return $string;
}
function utf8_substr($text, $start, $length = NULL) {
if (function_exists('mb_substr')) {
return $length === NULL ? mb_substr($text, $start) : mb_substr($text, $start, $length);
}
else {
$strlen = strlen($text);
// Find the starting byte offset.
$bytes = 0;
if ($start > 0) {
// Count all the continuation bytes from the start until we have found
// $start characters or the end of the string.
$bytes = -1;
$chars = -1;
while ($bytes < $strlen - 1 && $chars < $start) {
$bytes++;
$c = ord($text[$bytes]);
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
}
}
}
elseif ($start < 0) {
// Count all the continuation bytes from the end until we have found
// abs($start) characters.
$start = abs($start);
$bytes = $strlen;
$chars = 0;
while ($bytes > 0 && $chars < $start) {
$bytes--;
$c = ord($text[$bytes]);
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
}
}
}
$istart = $bytes;
// Find the ending byte offset.
if ($length === NULL) {
$iend = $strlen;
}
elseif ($length > 0) {
// Count all the continuation bytes from the starting index until we have
// found $length characters or reached the end of the string, then
// backtrace one byte.
$iend = $istart - 1;
$chars = -1;
$last_real = FALSE;
while ($iend < $strlen - 1 && $chars < $length) {
$iend++;
$c = ord($text[$iend]);
$last_real = FALSE;
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
$last_real = TRUE;
}
}
// Backtrace one byte if the last character we found was a real character
// and we don't need it.
if ($last_real && $chars >= $length) {
$iend--;
}
}
elseif ($length < 0) {
// Count all the continuation bytes from the end until we have found
// abs($start) characters, then backtrace one byte.
$length = abs($length);
$iend = $strlen;
$chars = 0;
while ($iend > 0 && $chars < $length) {
$iend--;
$c = ord($text[$iend]);
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
}
}
// Backtrace one byte if we are not at the beginning of the string.
if ($iend > 0) {
$iend--;
}
}
else {
// $length == 0, return an empty string.
return '';
}
return substr($text, $istart, max(0, $iend - $istart + 1));
}
}
For your return statement you could try:
return htmlspecialchars(trim($r));
EDIT: I tried your code as you provided it and it ran fine for me without having to use htmlspecialchars(). This is probably due to the face that in the <head> of the page the code was running on, the charset was set to UTF-8. So your options could be to set the encoding of the page like this:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
or to use htmlspecialchars() as above.