Similar to: How to read only 5 last line of the text file in PHP?
I have a large log file and I want to be able to show 100 lines from position X in the file.
I need to use fseek rather than file() because the log file is too large.
I have a similar function but it will only read from the end of the file. How can it be modified so that a start position can be specified as well? I would also need to start at the end of the file.
function read_line($filename, $lines, $revers = false)
{
$offset = -1;
$i = 0;
$fp = #fopen($filename, "r");
while( $lines && fseek($fp, $offset, SEEK_END) >= 0 ) {
$c = fgetc($fp);
if($c == "\n" || $c == "\r"){
$lines--;
if($revers){
$read[$i] = strrev($read[$i]);
$i++;
}
}
if($revers) $read[$i] .= $c;
else $read .= $c;
$offset--;
}
fclose ($fp);
if($revers){
if($read[$i] == "\n" || $read[$i] == "\r")
array_pop($read);
else $read[$i] = strrev($read[$i]);
return implode('',$read);
}
return strrev(rtrim($read,"\n\r"));
}
What I'm trying to do is create a web based log viewer that will start from the end of the file and display 100 lines, and when pressing the "Next" button, the next 100 lines preceding it will be shown.
If you're on Unix, you can utilize the sed tool. For example: to get line 10-20 from a file:
sed -n 10,20p errors.log
And you can do this in your script:
<?php
$page = 1;
$limit = 100;
$off = ($page * $limit) - ($limit - 1);
exec("sed -n $off,".($limit+$off-1)."p errors.log", $out);
print_r($out);
The lines are available in $out array.
This uses fseek to read 100 lines of a file starting from a specified offset. If the offset is greater than the number of lines in the log, the first 100 lines are read.
In your application, you could pass the current offset through the query string for prev and next and base the next offset on that. You could also store and pass the current file position for more efficiency.
<?php
$GLOBALS["interval"] = 100;
read_log();
function read_log()
{
$fp = fopen("log", "r");
$offset = determine_offset();
$interval = $GLOBALS["interval"];
if (seek_to_offset($fp, $offset) != -1)
{
show_next_button($offset, $interval);
}
$lines = array();
for ($ii = 0; $ii < $interval; $ii++)
{
$lines[] = trim(fgets($fp));
}
echo "<pre>";
print_r(array_reverse($lines));
}
// Get the offset from the query string or default to the interval
function determine_offset()
{
$interval = $GLOBALS["interval"];
if (isset($_GET["offset"]))
{
return intval($_GET["offset"]) + $interval;
}
return $interval;
}
function show_next_button($offset, $interval)
{
$next_offset = $offset + $interval;
echo "Next";
}
// Seek to the end of the file, then seek backward $offset lines
function seek_to_offset($fp, $offset)
{
fseek($fp, 0, SEEK_END);
for ($ii = 0; $ii < $offset; $ii++)
{
if (seek_to_previous_line($fp) == -1)
{
rewind($fp);
return -1;
}
}
}
// Seek backward by char until line break
function seek_to_previous_line($fp)
{
fseek($fp, -2, SEEK_CUR);
while (fgetc($fp) != "\n")
{
if (fseek($fp, -2, SEEK_CUR) == -1)
{
return -1;
}
}
}
Is "position X" measured in lines or bytes? If lines, you can easily use SplFileObject to seek to a certain line and then read 100 lines:
$file = new SplFileObject('log.txt');
$file->seek(199); // go to line 200
for($i = 0; $i < 100 and $file->valid(); $i++, $file->next())
{
echo $file->current();
}
If position X is measured in bytes, isn't it a simple matter of changing your initial $offset = -1 to a different value?
I would do it as followed:
function readFileFunc($tempFile){
if(#!file_exists($tempFile)){
return FALSE;
}else{
return file($tempFile);
}
}
$textArray = readFileFunc('./data/yourTextfile.txt');
$slicePos = count($textArray)-101;
if($slicePos < 0){
$slicePos = 0;
}
$last100 = array_slice($textArray, $slicePos);
$last100 = implode('<br />', $last100);
echo $last100;
Related
I created this function to converting numbers to words. And how I can convert words to number using this my function:
Simple function code:
$array = array("1"=>"ЯК","2"=>"ДУ","3"=>"СЕ","4"=>"ЧОР","5"=>"ПАНҶ","6"=>"ШАШ","7"=>"ҲАФТ","8"=>"ХАШТ","9"=>"НӮҲ","0"=>"НОЛ","10"=>"ДАҲ","20"=>"БИСТ","30"=>"СИ","40"=>"ЧИЛ","50"=>"ПАНҶОҲ","60"=>"ШАСТ","70"=>"ҲАФТОД","80"=>"ХАШТОД","90"=>"НАВАД","100"=>"САД");
$n = "98"; // Input number to converting
if($n < 10 && $n > -1){
echo $array[$n];
}
if($n == 10 OR $n == 20 OR $n == 30 OR $n == 40 OR $n == 50 OR $n == 60 OR $n == 70 OR $n == 80 OR $n == 90 OR $n == 100){
echo $array[$n];
}
if(mb_strlen($n) == 2 && $n[1] != 0)
{
$d = $n[0]."0";
echo "$array[$d]У ".$array[$n[1]];
}
My function so far converts the number to one hundred. How can I now convert text to a number using the answer of my function?
So, as #WillParky93 assumed, your input has spaces between words.
<?php
mb_internal_encoding("UTF-8");//For testing purposes
$array = array("1"=>"ЯК","2"=>"ДУ","3"=>"СЕ","4"=>"ЧОР","5"=>"ПАНҶ","6"=>"ШАШ","7"=>"ҲАФТ","8"=>"ХАШТ","9"=>"НӮҲ","0"=>"НОЛ","10"=>"ДАҲ","20"=>"БИСТ","30"=>"СИ","40"=>"ЧИЛ","50"=>"ПАНҶОҲ","60"=>"ШАСТ","70"=>"ҲАФТОД","80"=>"ХАШТОД","90"=>"НАВАД","100"=>"САД");
$postfixes = array("3" => "ВУ");
$n = "98"; // Input number to converting
$res = "";
//I also optimized your conversion of numbers to words
if($n > 0 && ($n < 10 || $n%10 == 0))
{
$res = $array[$n];
}
if($n > 10 && $n < 100 && $n%10 != 0)
{
$d = intval(($n/10));
$sd = $n%10;
$ending = isset($postfixes[$d]) ? $postfixes[$d] : "У";
$res = ($array[$d * 10]).$ending." ".$array[$sd];
}
echo $res;
echo "\n<br/>";
$splitted = explode(" ", $res);
//According to your example, you use only numerals that less than 100
//So, to simplify your task(btw, according to Google, the language is tajik
//and I don't know the rules of building numerals in this language)
if(sizeof($splitted) == 1) {
echo array_search($splitted[0], $array);
}
else if(sizeof($splitted) == 2) {
$first = $splitted[0];
$first_length = mb_strlen($first);
if(mb_substr($first, $first_length - 2) == "ВУ")
{
$first = mb_substr($first, 0, $first_length - 2);
}
else
{
$first = mb_substr($splitted[0], 0, $first_length - 1);
}
$second = $splitted[1];
echo (array_search($first, $array) + array_search($second, $array));
}
You didn't specify the input specs but I took the assumption you want it with a space between the words.
//get our input=>"522"
$input = "ПАНҶ САД БИСТ ДУ";
//split it up
$split = explode(" ", $input);
//start out output
$c = 0;
//set history
$history = "";
//loop the words
foreach($split as &$s){
$res = search($s);
//If number is 9 or less, we are going to check if it's with a number
//bigger than or equal to 100, if it is. We multiply them together
//else, we just add them.
if((($res = search($s)) <=9) ){
//get the next number in the array
$next = next($split);
//if the number is >100. set $nextres
if( ($nextres = search($next)) >= 100){
//I.E. $c = 5 * 100 = 500
$c = $nextres * $res;
//set the history so we skip over it next run
$history = $next;
}else{
//Single digit on its own
$c += $res;
}
}elseif($s != $history){
$c += $res;
}
}
//output the result
echo $c;
function search($s){
global $array;
if(!$res = array_search($s, $array)){
//grab the string length
$max = strlen($s);
//remove one character at a time until we find a match
for($i=0;$i<$max; $i++ ){
if($res = array_search(mb_substr($s, 0, -$i),$array)){
//stop the loop
$i = $max;
}
}
}
return $res;
}
Output is 522.
Im trying to make a script to get information in a file. The data in the file has this format:
#index|info1|info2|info3|info4|#index2|info21|info22|info23|info24
So I have to make a binary search direct on the file, look for the index and get all its information. For example:
$index = "index2";
Output: index2 info21 info22 info24
I tried:
$file = "file.txt";
// open the file for reading
$myfile = fopen($file, "r") or die("File cannot be openned");
$indexToSearch = "9780857293039";
$begging = 0;
$end = filesize($file) / sizeof($indexToSearch) - 1;
while($begging <= $end) {
$middle = ($end + $begging) / 2; );
$line = fread($myfile, $middle);
if(strcmp($line,$indexToSearch) == 0) {
echo "found";
break;
} else
if(strcmp($line,$indexToSearch) > 0) {
$end = $middle - 1;
} else {
$begging = $middle + 1;
}
}
fseek($myfile, $middle);
echo "<br><br>FINAL: ".fgets($myfile);
I want to write a code to remove extra delimiters for each row for a .csv file. With another code i have already determined that the .csv file only contains rows with too many delmiters. I further know which column (after nth delimiter) has extra delimiters. I have already written most of the code, but it's not working yet. Help is very much appreciated.
My PHP skills are still basic.
<?php
$delimiter = ';'; //type delimiter
$delimiter_start_column =23; //column-to-be-cleaned starts after this delimiter
$exp_delimiter =63; //expected delimiters per row
$total_delimiter =substr_count($line,$delimiter); //total delimiters in row
$delimiter_end_column =($exp_delimiter - $delimiter_start_column) + ($total_delimiter - $exp_delimiter); //column-to-be-cleaned ends before this delimiter
function splitleft($line,$delimiter,$delimiter_start_column){
$max = strlen($line);
$n = 0;
for($i=0;$i<$max;$i++){
if($line[$i]==$delimiter){
$n++;
if($n>=$delimiter_start_column){
break;
}
}
}
$arr[] = substr($line,0,$i);
return $arr;
}
function splitright($line,$delimiter,$delimiter_end_column){
$max = strlen($line);
$n = 0;
for($i=0;$i<$max;$i++){
if($line[$i]==$delimiter){
$n++;
if($n>=$delimiter_end_column){
break;
}
}
}
$arr[] = substr($line,$i,$max);
return $arr;
}
// determine start time in microseconds for runtime calculation
$file['datestamp'] = date("Y-m-d_H-i-s", $start);
$input['folder'] = 'input\\';
$input['file'] = ''; //enter filename
$output['folder'] = 'output\\';
$output['file_cleaned'] = $file['datestamp'].'_cleaned_';
// open input file read-only
$handle['input'] = #fopen($input['folder'].$input['file'], "r");
// initialize line, clean and dirty counters
$counter['total'] = 0;
$counter['cleaned'] = 0;
if($handle['input']) {
// open output files. set point to end of file.
$handle['cleaned'] = #fopen($output['folder'].$output['file_cleaned'].$input['file'], "a");
while(($line = fgets($handle['input'])) !== false) {
// increment line counter
$counter['total']++;
$result = substr_count($line, $delimiter);
if($result == $exp_delimiters AND $counter['line'] != 1 AND $line != $header) {
// if the number of delimiters matches the expected number as represented by $exp_delimiters
// increment clean lines counter
$counter['cleaned']++;
$output_file = $handle['cleaned'];
}
else {
// else, if the number of delimiters does not match the expectation
// remove extra delmiters from column
$line_cleaned = splitleft + str_replace(";","",substr($line,strlen(splitleft()),(strlen($line)-strlen(splitleft())-strlen(splitright()))) + splitright());
$output_file = $handle['cleaned'];
}
// prefix line number
$line = $counter['total'].$delimiter.$line;
// write line to correct output file
fwrite($output_file, $line_cleaned);
// output progress every 20.000 processed lines
if($counter['total'] % 20000 == 0) {
echo number_format($counter['total'], 0, ',', '.')."\r\n";
}
}
if(!feof($handle['input'])){
echo "Error: unexpected fgets() fail\n";
}
// close all input and output files
foreach($handle AS $close) {
fclose($close);
}
}
?>
The code below sends a msg when a value, obtained from a not so stable API, is within a certain range.
This code causes the load average, but not the CPU usage, to go up, likely due to high I/O wait.
CentOS 6.5
Kernel 2.6.32-431.11.2.el6.x86_64
Apache 2.2.15
PHP 5.3.3 (mod_fcgid/2.3.7)
> cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq]
The code had the same problematic effect on both dedicated hardware and on the cloud (both cases as a VM on KVM/Virtio).
What could be done to keep this code from *causing the processor to wait on instruction completion before processing new instructions**? I understand that lowering the timeout doesn't really solve the problem, only diminishes its impact.
*this is my understanding why this code causes load average to go up.
<?php
$min = '1';
$last_min = '1';
if (!empty($_GET['min'])) {
$min = $_GET['min'];
} else {
$min = false;
}
$ctx=stream_context_create(array('http'=>
array(
'timeout' => 10 // seconds timeout
)
));
$json = file_get_contents('https://www.domain.com/api/ticker/',false,$ctx);
if (!empty($json )) {
echo($json);
if (#file_get_contents('log.txt')) {
if (quote_changed($json)) {
file_put_contents('log.txt', $json, FILE_APPEND);
}
} else {
file_put_contents('log.txt', $json);
}
$obj = json_decode($json,true);
$last = $obj['ticker']['last'];
if (is_numeric($last)) {
$last = (int)$last;
$last_min = #file_get_contents('last_min.txt');
$notified = #file_get_contents('notified.txt');
if ($notified === false) {
$notified = 'false';
#echo "no notify file\n";
}
if (($last_min === false) || (($min) && ($last_min <> $min))) {
$last_min = 1;
$notified = 'false';
file_put_contents('last_min.txt', $min);
#echo "no min file or diff min\n";
}
#echo ('notified='.$notified.'\n');
if (($last >= $min) && ($notified=='false')) {
#$url = ('http://otherdomain.com/nexmo/sendmsg.php' . '?name=blah' . $last);
#file_get_contents($url);
#switch to SMS when going abroad and plugin new number when available
mail("8885551212#mail.net","Blah at".$last,"","From: gaia#domain.com\n");
file_put_contents('notified.txt', 'true');
#echo "msg sent\n";
} elseif (($last < $min) && ($notified=='true')) {
file_put_contents('notified.txt', 'false');
#echo "not sent\n";
}
}
}
function quote_changed($current) {
$previous = tailCustom('log.txt');
#echo ('previous='.$previous);
if ($previous === (trim($current))) {
return 0;
} else {
return 1;
}
}
function tailCustom($filepath, $lines = 1, $adaptive = true) {
// Open file
$f = #fopen($filepath, "rb");
if ($f === false) return false;
// Sets buffer size
if (!$adaptive) $buffer = 4096;
else $buffer = ($lines < 2 ? 64 : ($lines < 10 ? 512 : 4096));
// Jump to last character
fseek($f, -1, SEEK_END);
// Read it and adjust line number if necessary
// (Otherwise the result would be wrong if file doesn't end with a blank line)
if (fread($f, 1) != "\n") $lines -= 1;
// Start reading
$output = '';
$chunk = '';
// While we would like more
while (ftell($f) > 0 && $lines >= 0) {
// Figure out how far back we should jump
$seek = min(ftell($f), $buffer);
// Do the jump (backwards, relative to where we are)
fseek($f, -$seek, SEEK_CUR);
// Read a chunk and prepend it to our output
$output = ($chunk = fread($f, $seek)) . $output;
// Jump back to where we started reading
fseek($f, -mb_strlen($chunk, '8bit'), SEEK_CUR);
// Decrease our line counter
$lines -= substr_count($chunk, "\n");
}
// While we have too many lines
// (Because of buffer size we might have read too many)
while ($lines++ < 0) {
// Find first newline and remove all text before that
$output = substr($output, strpos($output, "\n") + 1);
}
// Close file and return
fclose($f);
return trim($output);
}
?>
Move the data into a database or if that's not an option, cache the unstable file into your server locally so you don't have to hit the external provider every time.
If neither of those are an option, it seems to me that the people providing the API need to improve their performance, you can verify this by just benchmarking how many milliseconds each hit takes through Firebug.
i programmed this php function that takes any text/html string and trims it.
For example:
gen_string("Hello, how are you today?",10);
Returns:
Hello, how...
The problem arises when the function string limit is the same as the position of a special character such as: á, ñ, etc...
In which case:
gen_string("Helló my friend",5);
Returns: Hell�...
Any ideas on how to solve this issue? This is the current function:
# string: advanced substr
function gen_string($string,$min,$clean=false) {
$text = trim(strip_tags($string));
if(strlen($text)>$min) {
$blank = strpos($text,' ');
if($blank) {
# limit plus last word
$extra = strpos(substr($text,$min),' ');
$max = $min+$extra;
$r = substr($text,0,$max);
if(strlen($text)>=$max && !$clean) $r=trim($r,'.').'...';
} else {
# if there are no spaces
$r = substr($text,0,$min).'...';
}
} else {
# if original length is lower than limit
$r = $text;
}
return trim($r);
}
Thanks!
You should use the multibyte string functions to correctly handle unicode characters.
For example you could try using mb_strimwidth to truncate a string to a specified length.
You could also take a different approach and make use of the PCRE regex extension's UTF-8 capabilities (assuming your strings are UTF-8!).
function gen_string($string, $length)
{
$str = trim(strip_tags($string));
$strlen = strlen(utf8_decode($str));
// String is less than limit
if ($strlen <= $length) return $str;
// Shorten string, preserving whole "words" (non-whitespace)
preg_match('/^.{'.($length-1).'}\S*/su', $str, $match);
// Append ellipsis if needed (bytes length is OK to check)
if (strlen($match[0]) !== strlen($str)) $match[0] .= '...';
return $match[0];
}
Aside from the multibyte issue, maybe you can write it shorter
function gen_string($str, $limit) {
if ($str >= strlen($limit))
return $str;
$offset = -(strlen($str) - $limit);
return substr($str, 0, strrpos($str, ' ', $offset)).'...';
}
It will limit the length of the string, so rather than cut it after the first word beyond the limit, it ensures that the length is never larger than the limit.
strlen() cannot be used for UTF-8 string, because it would count also the continuation characters, which should not be counted.
You can try with the following code:
define('PREG_CLASS_UNICODE_WORD_BOUNDARY',
'\x{0}-\x{2F}\x{3A}-\x{40}\x{5B}-\x{60}\x{7B}-\x{A9}\x{AB}-\x{B1}\x{B4}' .
'\x{B6}-\x{B8}\x{BB}\x{BF}\x{D7}\x{F7}\x{2C2}-\x{2C5}\x{2D2}-\x{2DF}' .
'\x{2E5}-\x{2EB}\x{2ED}\x{2EF}-\x{2FF}\x{375}\x{37E}-\x{385}\x{387}\x{3F6}' .
'\x{482}\x{55A}-\x{55F}\x{589}-\x{58A}\x{5BE}\x{5C0}\x{5C3}\x{5C6}' .
'\x{5F3}-\x{60F}\x{61B}-\x{61F}\x{66A}-\x{66D}\x{6D4}\x{6DD}\x{6E9}' .
'\x{6FD}-\x{6FE}\x{700}-\x{70F}\x{7F6}-\x{7F9}\x{830}-\x{83E}' .
'\x{964}-\x{965}\x{970}\x{9F2}-\x{9F3}\x{9FA}-\x{9FB}\x{AF1}\x{B70}' .
'\x{BF3}-\x{BFA}\x{C7F}\x{CF1}-\x{CF2}\x{D79}\x{DF4}\x{E3F}\x{E4F}' .
'\x{E5A}-\x{E5B}\x{F01}-\x{F17}\x{F1A}-\x{F1F}\x{F34}\x{F36}\x{F38}' .
'\x{F3A}-\x{F3D}\x{F85}\x{FBE}-\x{FC5}\x{FC7}-\x{FD8}\x{104A}-\x{104F}' .
'\x{109E}-\x{109F}\x{10FB}\x{1360}-\x{1368}\x{1390}-\x{1399}\x{1400}' .
'\x{166D}-\x{166E}\x{1680}\x{169B}-\x{169C}\x{16EB}-\x{16ED}' .
'\x{1735}-\x{1736}\x{17B4}-\x{17B5}\x{17D4}-\x{17D6}\x{17D8}-\x{17DB}' .
'\x{1800}-\x{180A}\x{180E}\x{1940}-\x{1945}\x{19DE}-\x{19FF}' .
'\x{1A1E}-\x{1A1F}\x{1AA0}-\x{1AA6}\x{1AA8}-\x{1AAD}\x{1B5A}-\x{1B6A}' .
'\x{1B74}-\x{1B7C}\x{1C3B}-\x{1C3F}\x{1C7E}-\x{1C7F}\x{1CD3}\x{1FBD}' .
'\x{1FBF}-\x{1FC1}\x{1FCD}-\x{1FCF}\x{1FDD}-\x{1FDF}\x{1FED}-\x{1FEF}' .
'\x{1FFD}-\x{206F}\x{207A}-\x{207E}\x{208A}-\x{208E}\x{20A0}-\x{20B8}' .
'\x{2100}-\x{2101}\x{2103}-\x{2106}\x{2108}-\x{2109}\x{2114}' .
'\x{2116}-\x{2118}\x{211E}-\x{2123}\x{2125}\x{2127}\x{2129}\x{212E}' .
'\x{213A}-\x{213B}\x{2140}-\x{2144}\x{214A}-\x{214D}\x{214F}' .
'\x{2190}-\x{244A}\x{249C}-\x{24E9}\x{2500}-\x{2775}\x{2794}-\x{2B59}' .
'\x{2CE5}-\x{2CEA}\x{2CF9}-\x{2CFC}\x{2CFE}-\x{2CFF}\x{2E00}-\x{2E2E}' .
'\x{2E30}-\x{3004}\x{3008}-\x{3020}\x{3030}\x{3036}-\x{3037}' .
'\x{303D}-\x{303F}\x{309B}-\x{309C}\x{30A0}\x{30FB}\x{3190}-\x{3191}' .
'\x{3196}-\x{319F}\x{31C0}-\x{31E3}\x{3200}-\x{321E}\x{322A}-\x{3250}' .
'\x{3260}-\x{327F}\x{328A}-\x{32B0}\x{32C0}-\x{33FF}\x{4DC0}-\x{4DFF}' .
'\x{A490}-\x{A4C6}\x{A4FE}-\x{A4FF}\x{A60D}-\x{A60F}\x{A673}\x{A67E}' .
'\x{A6F2}-\x{A716}\x{A720}-\x{A721}\x{A789}-\x{A78A}\x{A828}-\x{A82B}' .
'\x{A836}-\x{A839}\x{A874}-\x{A877}\x{A8CE}-\x{A8CF}\x{A8F8}-\x{A8FA}' .
'\x{A92E}-\x{A92F}\x{A95F}\x{A9C1}-\x{A9CD}\x{A9DE}-\x{A9DF}' .
'\x{AA5C}-\x{AA5F}\x{AA77}-\x{AA79}\x{AADE}-\x{AADF}\x{ABEB}' .
'\x{D800}-\x{F8FF}\x{FB29}\x{FD3E}-\x{FD3F}\x{FDFC}-\x{FDFD}' .
'\x{FE10}-\x{FE19}\x{FE30}-\x{FE6B}\x{FEFF}-\x{FF0F}\x{FF1A}-\x{FF20}' .
'\x{FF3B}-\x{FF40}\x{FF5B}-\x{FF65}\x{FFE0}-\x{FFFD}');
function utf8_strlen($text) {
if (function_exists('mb_strlen')) {
return mb_strlen($text);
}
// Do not count UTF-8 continuation bytes.
return strlen(preg_replace("/[\x80-\xBF]/", '', $text));
}
function utf8_truncate($string, $max_length, $wordsafe = FALSE, $add_ellipsis = FALSE, $min_wordsafe_length = 1) {
$ellipsis = '';
$max_length = max($max_length, 0);
$min_wordsafe_length = max($min_wordsafe_length, 0);
if (utf8_strlen($string) <= $max_length) {
// No truncation needed, so don't add ellipsis, just return.
return $string;
}
if ($add_ellipsis) {
// Truncate ellipsis in case $max_length is small.
$ellipsis = utf8_substr('...', 0, $max_length);
$max_length -= utf8_strlen($ellipsis);
$max_length = max($max_length, 0);
}
if ($max_length <= $min_wordsafe_length) {
// Do not attempt word-safe if lengths are bad.
$wordsafe = FALSE;
}
if ($wordsafe) {
$matches = array();
// Find the last word boundary, if there is one within $min_wordsafe_length
// to $max_length characters. preg_match() is always greedy, so it will
// find the longest string possible.
$found = preg_match('/^(.{' . $min_wordsafe_length . ',' . $max_length . '})[' . PREG_CLASS_UNICODE_WORD_BOUNDARY . ']/u', $string, $matches);
if ($found) {
$string = $matches[1];
}
else {
$string = utf8_substr($string, 0, $max_length);
}
}
else {
$string = utf8_substr($string, 0, $max_length);
}
if ($add_ellipsis) {
$string .= $ellipsis;
}
return $string;
}
function utf8_substr($text, $start, $length = NULL) {
if (function_exists('mb_substr')) {
return $length === NULL ? mb_substr($text, $start) : mb_substr($text, $start, $length);
}
else {
$strlen = strlen($text);
// Find the starting byte offset.
$bytes = 0;
if ($start > 0) {
// Count all the continuation bytes from the start until we have found
// $start characters or the end of the string.
$bytes = -1;
$chars = -1;
while ($bytes < $strlen - 1 && $chars < $start) {
$bytes++;
$c = ord($text[$bytes]);
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
}
}
}
elseif ($start < 0) {
// Count all the continuation bytes from the end until we have found
// abs($start) characters.
$start = abs($start);
$bytes = $strlen;
$chars = 0;
while ($bytes > 0 && $chars < $start) {
$bytes--;
$c = ord($text[$bytes]);
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
}
}
}
$istart = $bytes;
// Find the ending byte offset.
if ($length === NULL) {
$iend = $strlen;
}
elseif ($length > 0) {
// Count all the continuation bytes from the starting index until we have
// found $length characters or reached the end of the string, then
// backtrace one byte.
$iend = $istart - 1;
$chars = -1;
$last_real = FALSE;
while ($iend < $strlen - 1 && $chars < $length) {
$iend++;
$c = ord($text[$iend]);
$last_real = FALSE;
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
$last_real = TRUE;
}
}
// Backtrace one byte if the last character we found was a real character
// and we don't need it.
if ($last_real && $chars >= $length) {
$iend--;
}
}
elseif ($length < 0) {
// Count all the continuation bytes from the end until we have found
// abs($start) characters, then backtrace one byte.
$length = abs($length);
$iend = $strlen;
$chars = 0;
while ($iend > 0 && $chars < $length) {
$iend--;
$c = ord($text[$iend]);
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
}
}
// Backtrace one byte if we are not at the beginning of the string.
if ($iend > 0) {
$iend--;
}
}
else {
// $length == 0, return an empty string.
return '';
}
return substr($text, $istart, max(0, $iend - $istart + 1));
}
}
For your return statement you could try:
return htmlspecialchars(trim($r));
EDIT: I tried your code as you provided it and it ran fine for me without having to use htmlspecialchars(). This is probably due to the face that in the <head> of the page the code was running on, the charset was set to UTF-8. So your options could be to set the encoding of the page like this:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
or to use htmlspecialchars() as above.