PHP Remote file streaming with Resume Support

First, I am aware that similar questions have been asked before.
The title says most of it, but to spell it out:
the file is hosted on another server, and the user downloads it through my script, which streams it to them.
The problem is that the user can't resume the download once it has been paused. Are there any solutions?

You can try implementing your own download script using Accept-Ranges and Content-Range. Here is a proof of concept:
$download = new ResumeDownload("word.dir.txt", 50000); // delay in microseconds
$download->process();
Using Internet Download Manager
Paused State
Class Used
class ResumeDownload {
    private $file;
    private $name;
    private $boundary;
    private $delay = 0;
    private $size = 0;

    function __construct($file, $delay = 0) {
        if (! is_file($file)) {
            header("HTTP/1.1 400 Invalid Request");
            die("<h3>File Not Found</h3>");
        }
        $this->size = filesize($file);
        $this->file = fopen($file, "rb");
        $this->boundary = md5($file);
        $this->delay = $delay;
        $this->name = basename($file);
    }

    public function process() {
        $ranges = NULL;
        $t = 0;
        if ($_SERVER['REQUEST_METHOD'] == 'GET' && isset($_SERVER['HTTP_RANGE']) && $range = stristr(trim($_SERVER['HTTP_RANGE']), 'bytes=')) {
            $range = substr($range, 6);
            $ranges = explode(',', $range);
            $t = count($ranges);
        }
        header("Accept-Ranges: bytes");
        header("Content-Type: application/octet-stream");
        header("Content-Transfer-Encoding: binary");
        header(sprintf('Content-Disposition: attachment; filename="%s"', $this->name));
        if ($t > 0) {
            header("HTTP/1.1 206 Partial content");
            $t === 1 ? $this->pushSingle($range) : $this->pushMulti($ranges);
        } else {
            header("Content-Length: " . $this->size);
            $this->readFile();
        }
        flush();
    }

    // Send one byte range as a plain 206 response
    private function pushSingle($range) {
        $start = $end = 0;
        $this->getRange($range, $start, $end);
        header("Content-Length: " . ($end - $start + 1));
        header(sprintf("Content-Range: bytes %d-%d/%d", $start, $end, $this->size));
        fseek($this->file, $start);
        $this->readBuffer($end - $start + 1);
    }

    // Send several byte ranges as one multipart response
    private function pushMulti($ranges) {
        $length = $start = $end = 0;
        $tl = "Content-type: application/octet-stream\r\n";
        $formatRange = "Content-range: bytes %d-%d/%d\r\n\r\n";
        // First pass: compute the total body length
        foreach ( $ranges as $range ) {
            $this->getRange($range, $start, $end);
            $length += strlen("\r\n--$this->boundary\r\n");
            $length += strlen($tl);
            $length += strlen(sprintf($formatRange, $start, $end, $this->size));
            $length += $end - $start + 1;
        }
        $length += strlen("\r\n--$this->boundary--\r\n");
        header("Content-Length: $length");
        header("Content-Type: multipart/byteranges; boundary=$this->boundary");
        // Second pass: emit each part
        foreach ( $ranges as $range ) {
            $this->getRange($range, $start, $end);
            echo "\r\n--$this->boundary\r\n";
            echo $tl;
            echo sprintf($formatRange, $start, $end, $this->size);
            fseek($this->file, $start);
            $this->readBuffer($end - $start + 1);
        }
        echo "\r\n--$this->boundary--\r\n";
    }

    // Turn "start-end", "start-" or "-suffixLength" into absolute offsets
    private function getRange($range, &$start, &$end) {
        list($start, $end) = explode('-', trim($range));
        $fileSize = $this->size;
        if ($start == '') {
            $tmp = $end;
            $end = $fileSize - 1;
            $start = $fileSize - $tmp;
            if ($start < 0)
                $start = 0;
        } else {
            if ($end == '' || $end > $fileSize - 1)
                $end = $fileSize - 1;
        }
        if ($start > $end) {
            header("HTTP/1.1 416 Requested Range Not Satisfiable");
            header("Content-Range: bytes */" . $fileSize);
            exit();
        }
        return array($start, $end);
    }

    private function readFile() {
        while ( ! feof($this->file) ) {
            echo fgets($this->file);
        }
    }

    private function readBuffer($bytes, $size = 1024) {
        $bytesLeft = $bytes;
        while ( $bytesLeft > 0 && ! feof($this->file) ) {
            $bytesRead = $bytesLeft > $size ? $size : $bytesLeft;
            $bytesLeft -= $bytesRead;
            echo fread($this->file, $bytesRead);
            flush();
            if ($this->delay > 0)
                usleep($this->delay); // throttle: pause between chunks
        }
    }
}
File Used

If you're using PHP to serve the file, you have to implement all of the resuming logic yourself.
You'll have to send Accept-Ranges and respond appropriately to requests that carry a Range header.
That's a chunk of work. It might be easier to use mod_proxy.
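As a rough illustration of what "respond appropriately" involves, here is a minimal sketch that turns a single-range `Range` header into absolute byte offsets. The function name `parseSingleRange` is hypothetical, and multi-range requests are deliberately not handled (the function returns null for them, so a caller could fall back to sending the whole file).

```php
<?php
/**
 * Parse a single-range header such as "bytes=0-499", "bytes=500-" or "bytes=-200".
 * Returns [start, end] (inclusive offsets) or null if the range is invalid or unsatisfiable.
 * Hypothetical helper for illustration only.
 */
function parseSingleRange(string $header, int $size): ?array
{
    if (!preg_match('/^bytes=(\d*)-(\d*)$/', trim($header), $m)) {
        return null; // malformed, or more than one range
    }
    $first = $m[1];
    $last  = $m[2];
    if ($first === '' && $last === '') {
        return null;
    }
    if ($first === '') {
        // Suffix form: the last N bytes of the file
        $n = (int) $last;
        if ($n === 0) {
            return null;
        }
        $start = max(0, $size - $n);
        $end   = $size - 1;
    } else {
        $start = (int) $first;
        $end   = ($last === '') ? $size - 1 : min((int) $last, $size - 1);
    }
    if ($start > $end || $start >= $size) {
        return null; // the caller would answer 416 here
    }
    return [$start, $end];
}
```

A caller would answer 206 with `Content-Range: bytes start-end/size` when this returns offsets, and 416 when the header is present but unsatisfiable.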

What's the purpose of this? Is it only to hide the URLs, or to restrict downloads to members?
The way you described it, it's a bit tricky.
The remote server your script downloads from must itself support resuming downloads.
Your PHP script should check for the 'Range' request header and pass it through to the remote server (using sockets is probably your best option), so your script is actually acting as a proxy.
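The proxy idea can be sketched with PHP's http stream wrapper instead of raw sockets. This is only a sketch under several assumptions: `allow_url_fopen` is enabled, the remote server honours `Range`, and the names `upstreamHttpOptions` and `streamRemote` are made up for this example.

```php
<?php
/** Build http stream-context options that forward the client's Range header (hypothetical helper). */
function upstreamHttpOptions(?string $clientRange): array
{
    $http = ['method' => 'GET'];
    // Forward only a syntactically plausible byte-range header.
    if ($clientRange !== null && preg_match('/^bytes=\d*-\d*(\s*,\s*\d*-\d*)*$/', $clientRange)) {
        $http['header'] = ['Range: ' . $clientRange];
    }
    return ['http' => $http];
}

/** Stream $url to the client, relaying the status line and the range-related headers. */
function streamRemote(string $url): void
{
    $context = stream_context_create(upstreamHttpOptions($_SERVER['HTTP_RANGE'] ?? null));
    $remote = @fopen($url, 'rb', false, $context);
    if ($remote === false) {
        http_response_code(502); // upstream unreachable or refused the request
        return;
    }
    // For the http wrapper, wrapper_data holds the raw response header lines.
    $meta = stream_get_meta_data($remote);
    foreach ($meta['wrapper_data'] ?? [] as $line) {
        if (preg_match('/^(HTTP\/|Content-(Range|Length|Type):|Accept-Ranges:)/i', $line)) {
            header($line);
        }
    }
    fpassthru($remote);
    fclose($remote);
}
```

Because the remote status line (200 or 206) and `Content-Range` are relayed as-is, the download manager on the client side sees the same range semantics the remote server provides.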


Binary search in file by index

I'm trying to write a script that retrieves information from a file. The data in the file has this format:
So I have to do a binary search directly on the file: look up the index and get all of its information. For example:
$index = "index2";
Output: index2 info21 info22 info24
I tried:
$file = "file.txt";
// open the file for reading
$myfile = fopen($file, "r") or die("File cannot be opened");
$indexToSearch = "9780857293039";
$begging = 0;
$end = filesize($file) / sizeof($indexToSearch) - 1;
while($begging <= $end) {
    $middle = ($end + $begging) / 2;
    $line = fread($myfile, $middle);
    if(strcmp($line,$indexToSearch) == 0) {
        echo "found";
    } else
    if(strcmp($line,$indexToSearch) > 0) {
        $end = $middle - 1;
    } else {
        $begging = $middle + 1;
    }
}
fseek($myfile, $middle);
echo "<br><br>FINAL: ".fgets($myfile);
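One way to make this work is to binary-search over byte offsets rather than line numbers, since lines have different lengths. The sketch below assumes that each line starts with its index followed by a space, that the file is sorted by index in `strcmp` order, and that lines end with `\n`. The function names (`lineAtOrAfter`, `keyOf`, `findByIndex`) are hypothetical.

```php
<?php
/** Return the first full line that starts at byte offset >= $pos, or false at end of file. */
function lineAtOrAfter($fp, int $pos)
{
    if ($pos === 0) {
        rewind($fp);
    } else {
        fseek($fp, $pos - 1);
        fgets($fp); // discard the rest of the line that contains byte $pos - 1
    }
    return fgets($fp);
}

/** The index is assumed to be the first space-separated field of the line. */
function keyOf(string $line): string
{
    return explode(' ', rtrim($line, "\r\n"), 2)[0];
}

/** Binary search over byte offsets; returns the matching line or null. */
function findByIndex(string $path, string $index): ?string
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        return null;
    }
    $lo = 0;
    $hi = fstat($fp)['size'];
    // Find the smallest offset whose following line has a key >= $index.
    while ($lo < $hi) {
        $mid  = intdiv($lo + $hi, 2);
        $line = lineAtOrAfter($fp, $mid);
        if ($line === false || strcmp(keyOf($line), $index) >= 0) {
            $hi = $mid;
        } else {
            $lo = $mid + 1;
        }
    }
    $line = lineAtOrAfter($fp, $lo);
    fclose($fp);
    return ($line !== false && keyOf($line) === $index) ? rtrim($line, "\r\n") : null;
}
```

Each step reads at most two lines, so the number of reads grows with the logarithm of the file size instead of the number of lines.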

Detecting and storing length of audio files in a table

Is there any PHP function that will give me the duration of an MP3? I looked at the ID3 functions but I don't see anything there for duration. Besides, ID3 is a kind of tag that will not be present in every MP3, so relying on it would not make sense.
This should work for you, notice the getduration function:
Install getid3, but if you only need duration, you can delete all but these modules:
Access the duration with code like this:
$getID3 = new getID3;
$ThisFileInfo = $getID3->analyze($pathName);
$len = @$ThisFileInfo['playtime_string']; // playtime in minutes:seconds, formatted string
Get it at Sourceforge
I spent a lot of time on this, but without getID3 I could not find a way to get the duration of an audio file.
1) First download library of getID3 using below link:
2) Try this below code:
$filename = 'bcd4ecc6bf521da9b9a2d8b9616d1505.wav';
$getID3 = new getID3;
$file = $getID3->analyze($filename);
$playtime_seconds = $file['playtime_seconds'];
echo gmdate("H:i:s", $playtime_seconds);
You can get the duration of an MP3 or many other audio/video files by using ffmpeg.
Install ffmpeg on your server.
Make sure that shell_exec is not disabled in your PHP configuration.
// Only handle the audio/video files you want
if (preg_match('/[^?#]+\.(?:wma|mp3|wav|mp4)/', strtolower($file))) {
    $filepath = /* your file path */;
    // run ffmpeg from the Linux shell and grab the duration from its output
    $result = shell_exec('ffmpeg -i ' . escapeshellarg($filepath) . ' 2>&1 | grep -o \'Duration: [0-9:.]*\'');
    $duration = trim(str_replace('Duration: ', '', $result)); // 00:05:03.25
    // get the duration in milliseconds
    $timeArr = explode(':', $duration); // [hours, minutes, seconds]
    $t = $this->_times[$file] = ($timeArr[0] * 3600 + $timeArr[1] * 60 + $timeArr[2]) * 1000;
}
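The conversion from ffmpeg's `HH:MM:SS.xx` string to a number can be isolated into a small function. This is a minimal sketch; the name `ffmpegDurationToSeconds` is hypothetical, and it assumes the string has already been extracted from ffmpeg's output as in the snippet above.

```php
<?php
/**
 * Convert a duration string such as "00:05:03.25" into seconds.
 * Returns null when the string does not look like HH:MM:SS[.fraction].
 */
function ffmpegDurationToSeconds(string $duration): ?float
{
    if (!preg_match('/^(\d+):(\d{2}):(\d{2}(?:\.\d+)?)$/', trim($duration), $m)) {
        return null;
    }
    return (int) $m[1] * 3600 + (int) $m[2] * 60 + (float) $m[3];
}

$seconds = ffmpegDurationToSeconds("00:05:03.25");
echo $seconds === null ? "unparseable\n" : round($seconds * 1000) . " ms\n";
```

Returning null for unexpected input makes it easier to notice when ffmpeg printed an error (or `N/A`) instead of a duration.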
class MP3File
{
    protected $filename;

    public function __construct($filename)
    {
        $this->filename = $filename;
    }

    public static function formatTime($duration) //as hh:mm:ss
    {
        //return sprintf("%d:%02d", $duration/60, $duration%60);
        $hours = floor($duration / 3600);
        $minutes = floor( ($duration - ($hours * 3600)) / 60);
        $seconds = $duration - ($hours * 3600) - ($minutes * 60);
        return sprintf("%02d:%02d:%02d", $hours, $minutes, $seconds);
    }

    //Read first mp3 frame only... use for CBR constant bit rate MP3s
    public function getDurationEstimate()
    {
        return $this->getDuration($use_cbr_estimate=true);
    }

    //Read entire file, frame by frame... ie: Variable Bit Rate (VBR)
    public function getDuration($use_cbr_estimate=false)
    {
        $fd = fopen($this->filename, "rb");
        $duration = 0;
        $block = fread($fd, 100);
        $offset = $this->skipID3v2Tag($block);
        fseek($fd, $offset, SEEK_SET);
        while (!feof($fd))
        {
            $block = fread($fd, 10);
            if (strlen($block)<10) { break; }
            //looking for 1111 1111 111 (frame synchronization bits)
            else if ($block[0]=="\xff" && (ord($block[1])&0xe0)==0xe0 )
            {
                $info = self::parseFrameHeader(substr($block, 0, 4));
                if (empty($info['Framesize'])) { return $duration; } //some corrupt mp3 files
                fseek($fd, $info['Framesize']-10, SEEK_CUR);
                $duration += ( $info['Samples'] / $info['Sampling Rate'] );
            }
            else if (substr($block, 0, 3)=='TAG')
            {
                fseek($fd, 128-10, SEEK_CUR);//skip over id3v1 tag size
            }
            else
            {
                fseek($fd, -9, SEEK_CUR);
            }
            if ($use_cbr_estimate && !empty($info))
            {
                return $this->estimateDuration($info['Bitrate'],$offset);
            }
        }
        return round($duration);
    }

    private function estimateDuration($bitrate,$offset)
    {
        $kbps = ($bitrate*1000)/8; //bytes per second
        $datasize = filesize($this->filename) - $offset;
        return round($datasize / $kbps);
    }

    private function skipID3v2Tag(&$block)
    {
        if (substr($block, 0,3)=="ID3")
        {
            $id3v2_major_version = ord($block[3]);
            $id3v2_minor_version = ord($block[4]);
            $id3v2_flags = ord($block[5]);
            $flag_unsynchronisation = $id3v2_flags & 0x80 ? 1 : 0;
            $flag_extended_header = $id3v2_flags & 0x40 ? 1 : 0;
            $flag_experimental_ind = $id3v2_flags & 0x20 ? 1 : 0;
            $flag_footer_present = $id3v2_flags & 0x10 ? 1 : 0;
            $z0 = ord($block[6]);
            $z1 = ord($block[7]);
            $z2 = ord($block[8]);
            $z3 = ord($block[9]);
            if ( (($z0&0x80)==0) && (($z1&0x80)==0) && (($z2&0x80)==0) && (($z3&0x80)==0) )
            {
                $header_size = 10;
                $tag_size = (($z0&0x7f) * 2097152) + (($z1&0x7f) * 16384) + (($z2&0x7f) * 128) + ($z3&0x7f);
                $footer_size = $flag_footer_present ? 10 : 0;
                return $header_size + $tag_size + $footer_size;//bytes to skip
            }
        }
        return 0;
    }

    public static function parseFrameHeader($fourbytes)
    {
        static $versions = array(
            0x0=>'2.5',0x1=>'x',0x2=>'2',0x3=>'1', // x=>'reserved'
        );
        static $layers = array(
            0x0=>'x',0x1=>'3',0x2=>'2',0x3=>'1', // x=>'reserved'
        );
        static $bitrates = array(
            'V1L1'=>array(0,32,64,96,128,160,192,224,256,288,320,352,384,416,448),
            'V1L2'=>array(0,32,48,56, 64, 80, 96,112,128,160,192,224,256,320,384),
            'V1L3'=>array(0,32,40,48, 56, 64, 80, 96,112,128,160,192,224,256,320),
            'V2L1'=>array(0,32,48,56, 64, 80, 96,112,128,144,160,176,192,224,256),
            'V2L2'=>array(0, 8,16,24, 32, 40, 48, 56, 64, 80, 96,112,128,144,160),
            'V2L3'=>array(0, 8,16,24, 32, 40, 48, 56, 64, 80, 96,112,128,144,160),
        );
        static $sample_rates = array(
            '1' => array(44100,48000,32000),
            '2' => array(22050,24000,16000),
            '2.5' => array(11025,12000, 8000),
        );
        static $samples = array(
            1 => array( 1 => 384, 2 =>1152, 3 =>1152, ), //MPEGv1, Layers 1,2,3
            2 => array( 1 => 384, 2 =>1152, 3 => 576, ), //MPEGv2/2.5, Layers 1,2,3
        );
        //$b0=ord($fourbytes[0]);//will always be 0xff
        $b1=ord($fourbytes[1]);
        $b2=ord($fourbytes[2]);
        $b3=ord($fourbytes[3]);

        $version_bits = ($b1 & 0x18) >> 3;
        $version = $versions[$version_bits];
        $simple_version = ($version=='2.5' ? 2 : $version);
        $layer_bits = ($b1 & 0x06) >> 1;
        $layer = $layers[$layer_bits];
        $protection_bit = ($b1 & 0x01);
        $bitrate_key = sprintf('V%dL%d', $simple_version , $layer);
        $bitrate_idx = ($b2 & 0xf0) >> 4;
        $bitrate = isset($bitrates[$bitrate_key][$bitrate_idx]) ? $bitrates[$bitrate_key][$bitrate_idx] : 0;
        $sample_rate_idx = ($b2 & 0x0c) >> 2;//0xc => b1100
        $sample_rate = isset($sample_rates[$version][$sample_rate_idx]) ? $sample_rates[$version][$sample_rate_idx] : 0;
        $padding_bit = ($b2 & 0x02) >> 1;
        $private_bit = ($b2 & 0x01);
        $channel_mode_bits = ($b3 & 0xc0) >> 6;
        $mode_extension_bits = ($b3 & 0x30) >> 4;
        $copyright_bit = ($b3 & 0x08) >> 3;
        $original_bit = ($b3 & 0x04) >> 2;
        $emphasis = ($b3 & 0x03);

        $info = array();
        $info['Version'] = $version;//MPEGVersion
        $info['Layer'] = $layer;
        //$info['Protection Bit'] = $protection_bit; //0=> protected by 2 byte CRC, 1=>not protected
        $info['Bitrate'] = $bitrate;
        $info['Sampling Rate'] = $sample_rate;
        $info['Framesize'] = self::framesize($layer, $bitrate, $sample_rate, $padding_bit);
        $info['Samples'] = isset($samples[$simple_version][$layer]) ? $samples[$simple_version][$layer] : 0;
        return $info;
    }

    private static function framesize($layer, $bitrate,$sample_rate,$padding_bit)
    {
        if ($sample_rate == 0)
            return 0; //reserved sample rate: treat the frame as invalid
        if ($layer==1)
            return intval(((12 * $bitrate*1000 /$sample_rate) + $padding_bit) * 4);
        else //layer 2, 3
            return intval(((144 * $bitrate*1000)/$sample_rate) + $padding_bit);
    }
}

$mp3file = new MP3File("Chal_Halke.mp3");
$duration1 = $mp3file->getDurationEstimate();//(faster) for CBR only
$duration2 = $mp3file->getDuration();//(slower) for VBR (or CBR)
echo "duration: $duration1 seconds"."\n";
There is no native php function to do this.
Depending on your server environment, you may use a tool such as MP3Info.
$length = shell_exec('mp3info -p "%S" sample.mp3'); // total time in seconds
Earlier I provided a solution for both MP3 and WAV files. This solution is specifically for WAV files; it is more precise but takes longer to evaluate than the earlier one.
function calculateWavDuration( $file ) {
    $fp = fopen($file, 'rb');
    if (fread($fp, 4) == "RIFF") {
        fseek($fp, 20);
        $raw_header = fread($fp, 16);
        $header = unpack('vtype/vchannels/Vsamplerate/Vbytespersec/valignment/vbits', $raw_header);
        $pos = ftell($fp);
        // scan forward one byte at a time until the "data" chunk id is found
        while (fread($fp, 4) != "data" && !feof($fp)) {
            $pos++;
            fseek($fp, $pos);
        }
        $raw_header = fread($fp, 4);
        $data = unpack('Vdatasize', $raw_header);
        $sec = $data['datasize'] / $header['bytespersec'];
        $minutes = intval($sec / 60) % 60;
        $seconds = intval($sec) % 60;
        fclose($fp);
        return str_pad($minutes, 2, "0", STR_PAD_LEFT) . ":" . str_pad($seconds, 2, "0", STR_PAD_LEFT);
    }
    fclose($fp);
}
$file = '1.wav'; //Enter File wav
echo calculateWavDuration($file);
The MP3 length is not stored anywhere (in the "plain" MP3 format), since MP3 is designed to be "split" into frames and those frames will remain playable.
If you have no ID tag on which to rely, what you would need to do (there are tools and PHP classes that do this) is to read the whole MP3 file and sum the durations of each frame.
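The arithmetic behind "sum the durations of each frame" is small enough to show on its own. The two helper names below are hypothetical. The 1152 samples per frame and the 44100 Hz rate are the usual values for MPEG-1 Layer III, and the second function is the shortcut that only holds for constant-bit-rate files.

```php
<?php
/** Duration of one MP3 frame in seconds: samples per frame divided by the sampling rate. */
function mp3FrameSeconds(int $samplesPerFrame, int $sampleRate): float
{
    return $samplesPerFrame / $sampleRate;
}

/** CBR shortcut: audio bytes * 8 bits, divided by the bit rate in bits per second. */
function cbrDurationSeconds(int $audioBytes, int $kbps): float
{
    return $audioBytes * 8 / ($kbps * 1000);
}

// One MPEG-1 Layer III frame at 44.1 kHz lasts about 26 ms.
printf("%.4f s per frame\n", mp3FrameSeconds(1152, 44100));
// 4,000,000 bytes of audio at a constant 128 kbit/s.
printf("%.1f s total\n", cbrDurationSeconds(4000000, 128));
```

For a VBR file the bit rate changes from frame to frame, so only the per-frame sum gives a reliable result.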
$getID3 = new getID3;
$ThisFileInfo = $getID3->analyze($pathName);
// playtime in minutes:seconds, formatted string
$len = @$ThisFileInfo['playtime_string'];
// don't get playtime_string, but get playtime_seconds
$len = @$ThisFileInfo['playtime_seconds'] * 1000; // *1000 to get milliseconds
I hope this helps you.
Finally, I developed a solution with my own calculations. It works for MP3 and WAV files, but it assumes a fixed rate of 16000 bytes per second (128 kbit/s), so expect the result to be off for files encoded at any other rate. The solution is in PHP. I took the idea from the WAV format.
function calculateFileSize($file){
    $ratio = 16000; //bytespersec
    if (!$file) {
        exit("Verify file name and it's path");
    }
    $file_size = filesize($file);
    if (!$file_size)
        exit("Verify file, something wrong with your file");
    $duration = ($file_size / $ratio);
    $minutes = floor($duration / 60);
    $seconds = $duration - ($minutes * 60);
    $seconds = round($seconds);
    echo "$minutes:$seconds minutes";
}
$file = 'apple-classic.mp3'; //Enter File Name mp3/wav
calculateFileSize($file);
If you have FFMpeg installed, getting the duration is quite simple with FFProbe
$filepath = 'example.mp3';
$ffprobe = \FFMpeg\FFProbe::create();
$duration = $ffprobe->format($filepath)->get('duration');
echo gmdate('H:i:s', $duration);
FFMpeg is mentioned elsewhere, but here's a fuller explanation and example implementation.
Install ffmpeg for your system. E.g., on Ubuntu:
apt-get update && apt-get -y install ffmpeg
Install php-ffmpeg using Composer:
composer require php-ffmpeg/php-ffmpeg
Example utility class
namespace App\Utils;

use FFMpeg\FFProbe;

class Audio
{
    public static function duration(string $path): float
    {
        $probe = FFProbe::create();
        return $probe->format($path)->get('duration');
    }
}
Where $path is the absolute path or URL to your audio file. To use:
$duration = \App\Utils\Audio::duration($path);
echo $duration; // 24.476750
Of course, you can just use it directly where you need it. The point of the utility class is to show how it is used. You'll want to wrap the call in try/catch in a production setting. If you aren't using Composer, see @awavi's answer.

PHP: Read from certain point in file

Similar to: How to read only 5 last line of the text file in PHP?
I have a large log file and I want to be able to show 100 lines from position X in the file.
I need to use fseek rather than file() because the log file is too large.
I have a similar function but it will only read from the end of the file. How can it be modified so that a start position can be specified as well? I would also need to start at the end of the file.
function read_line($filename, $lines, $revers = false)
{
    $offset = -1;
    $i = 0;
    $read = $revers ? array('') : '';
    $fp = @fopen($filename, "r");
    while( $lines && fseek($fp, $offset, SEEK_END) >= 0 ) {
        $c = fgetc($fp);
        if($c == "\n" || $c == "\r"){
            $lines--;
            if($revers){
                $read[$i] = strrev($read[$i]);
                $i++;
                $read[$i] = '';
            }
        }
        if($revers) $read[$i] .= $c;
        else $read .= $c;
        $offset--;
    }
    fclose ($fp);
    if($revers){
        if($read[$i] == "\n" || $read[$i] == "\r")
            array_pop($read);
        else $read[$i] = strrev($read[$i]);
        return implode('',$read);
    }
    return strrev(rtrim($read,"\n\r"));
}
What I'm trying to do is create a web based log viewer that will start from the end of the file and display 100 lines, and when pressing the "Next" button, the next 100 lines preceding it will be shown.
If you're on Unix, you can utilize the sed tool. For example: to get line 10-20 from a file:
sed -n 10,20p errors.log
And you can do this in your script:
$page = 1;
$limit = 100;
$off = ($page * $limit) - ($limit - 1);
exec("sed -n $off,".($limit+$off-1)."p errors.log", $out);
The lines are available in $out array.
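When the file name or page number comes from user input, it is safer to build the command with integer arithmetic and `escapeshellarg`. A minimal sketch, with the hypothetical helper name `sedPageCommand`:

```php
<?php
/** Build a sed command that prints one page of $limit lines; pages are numbered from 1. */
function sedPageCommand(string $file, int $page, int $limit): string
{
    $first = ($page - 1) * $limit + 1;
    $last  = $page * $limit;
    // %d forces integers, and escapeshellarg() quotes the file name for the shell
    return sprintf('sed -n %d,%dp %s', $first, $last, escapeshellarg($file));
}

echo sedPageCommand('errors.log', 2, 100), "\n";
```

The resulting string can be passed to `exec()` in the same way as in the snippet above.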
This uses fseek to read 100 lines of a file starting from a specified offset. If the offset is greater than the number of lines in the log, the first 100 lines are read.
In your application, you could pass the current offset through the query string for prev and next and base the next offset on that. You could also store and pass the current file position for more efficiency.
$GLOBALS["interval"] = 100;

read_log();

function read_log()
{
    $fp = fopen("log", "r");
    $offset = determine_offset();
    $interval = $GLOBALS["interval"];
    if (seek_to_offset($fp, $offset) != -1)
    {
        show_next_button($offset, $interval);
    }
    $lines = array();
    for ($ii = 0; $ii < $interval; $ii++)
    {
        $lines[] = trim(fgets($fp));
    }
    echo "<pre>";
    echo htmlspecialchars(implode("\n", $lines));
    echo "</pre>";
}

// Get the offset from the query string or default to the interval
function determine_offset()
{
    $interval = $GLOBALS["interval"];
    if (isset($_GET["offset"]))
    {
        return intval($_GET["offset"]) + $interval;
    }
    return $interval;
}

function show_next_button($offset, $interval)
{
    $next_offset = $offset + $interval;
    echo "<a href=\"?offset=" . $offset . "\">Next</a>";
}

// Seek to the end of the file, then seek backward $offset lines
function seek_to_offset($fp, $offset)
{
    fseek($fp, 0, SEEK_END);
    for ($ii = 0; $ii < $offset; $ii++)
    {
        if (seek_to_previous_line($fp) == -1)
        {
            rewind($fp); // fewer lines than requested: start from the top
            return -1;
        }
    }
}

// Seek backward by char until line break
function seek_to_previous_line($fp)
{
    fseek($fp, -2, SEEK_CUR);
    while (fgetc($fp) != "\n")
    {
        if (fseek($fp, -2, SEEK_CUR) == -1)
        {
            return -1;
        }
    }
}
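The suggestion to store and pass the current file position can be sketched as follows: read backward in chunks from a known byte offset and return both the lines and the offset where the next (older) page ends. This is a sketch under assumptions: the name `readLinesBefore` is hypothetical, `$endPos` is expected to be the file size or an offset returned by a previous call, and `$count` must be at least 1.

```php
<?php
/**
 * Read up to $count complete lines that end just before byte offset $endPos.
 * Returns [lines in file order, byte offset where the first returned line starts].
 */
function readLinesBefore(string $path, int $endPos, int $count, int $chunk = 4096): array
{
    if ($endPos <= 0 || $count < 1) {
        return [[], 0];
    }
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        return [[], 0];
    }
    $buffer = '';
    $pos = $endPos;
    // Grow the buffer backward until it holds more than $count newlines or reaches the file start.
    while ($pos > 0 && substr_count($buffer, "\n") <= $count) {
        $read = min($chunk, $pos);
        $pos -= $read;
        fseek($fp, $pos);
        $buffer = fread($fp, $read) . $buffer;
    }
    fclose($fp);

    // Drop the terminator of the last line so explode() does not yield an empty tail.
    $hadNewline = substr($buffer, -1) === "\n";
    if ($hadNewline) {
        $buffer = (string) substr($buffer, 0, -1);
    }
    $lines = array_slice(explode("\n", $buffer), -$count);
    // Bytes consumed = the lines, the newlines between them, plus the stripped terminator.
    $bytes = strlen(implode("\n", $lines)) + ($hadNewline ? 1 : 0);
    $clean = array_map(function ($l) { return rtrim($l, "\r"); }, $lines);
    return [$clean, $endPos - $bytes];
}
```

A log viewer would start with `$endPos = filesize($path)`, show the returned lines, and put the returned offset into the "Next" link so the following request continues from there without rescanning the file.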
Is "position X" measured in lines or bytes? If lines, you can easily use SplFileObject to seek to a certain line and then read 100 lines:
$file = new SplFileObject('log.txt');
$file->seek(199); // go to line 200
for($i = 0; $i < 100 and $file->valid(); $i++, $file->next())
echo $file->current();
If position X is measured in bytes, isn't it a simple matter of changing your initial $offset = -1 to a different value?
I would do it as follows:
function readFileFunc($tempFile){
    if(!is_file($tempFile)){
        return FALSE;
    }
    return file($tempFile);
}
$textArray = readFileFunc('./data/yourTextfile.txt');
$slicePos = count($textArray)-101;
if($slicePos < 0){
    $slicePos = 0;
}
$last100 = array_slice($textArray, $slicePos);
$last100 = implode('<br />', $last100);
echo $last100;

How to generate excerpt with most searched words in PHP?

Here is an excerpt function:
function excerpt($text, $phrase, $radius = 100, $ending = "...") {
    if (empty($text) or empty($phrase)) {
        return $this->truncate($text, $radius * 2, $ending);
    }
    $phraseLen = strlen($phrase);
    if ($radius < $phraseLen) {
        $radius = $phraseLen;
    }
    $pos = strpos(strtolower($text), strtolower($phrase));
    $startPos = 0;
    if ($pos > $radius) {
        $startPos = $pos - $radius;
    }
    $textLen = strlen($text);
    $endPos = $pos + $phraseLen + $radius;
    if ($endPos >= $textLen) {
        $endPos = $textLen;
    }
    $excerpt = substr($text, $startPos, $endPos - $startPos);
    if ($startPos != 0) {
        $excerpt = substr_replace($excerpt, $ending, 0, $phraseLen);
    }
    if ($endPos != $textLen) {
        $excerpt = substr_replace($excerpt, $ending, -$phraseLen);
    }
    return $excerpt;
}
Its drawback is that it doesn't try to match as many of the searched words as possible; by default it only matches the phrase once.
How can I implement a version that does?
The code listed here thus far has not worked for me, so I spent some time thinking of an algorithm to implement. What I have now works decently, and it does not appear to be a performance problem; feel free to test. Results are not as snazzy as Google's snippets, as there is no detection of where sentences start and end. I could add this, but it would be that much more complicated and I'd have to throw in the towel on doing this in a single function. It's already getting crowded and could be better coded if, for example, the object manipulations were abstracted into methods.
Anyhow, this is what I have and it should be a good start. The most dense excerpt is determined and the resulting string will approximately be the span you have specified. I urge some testing of this code as I have not done a thorough job of it. Surely there are problematic cases to be found.
I also encourage anyone to improve on this algorithm, or simply the code to execute it.
// string excerpt(string $text, string $phrase, int $span = 100, string $delimiter = '...')
// parameters:
// $text - text to be searched
// $phrase - search string
// $span - approximate length of the excerpt
// $delimiter - string to use as a suffix and/or prefix if the excerpt is from the middle of a text
function excerpt($text, $phrase, $span = 100, $delimiter = '...') {
    $phrases = preg_split('/\s+/', $phrase);
    $regexp = '/\b(?:';
    foreach ($phrases as $phrase) {
        $regexp .= preg_quote($phrase, '/') . '|';
    }
    $regexp = substr($regexp, 0, -1) . ')\b/i';
    $matches = array();
    preg_match_all($regexp, $text, $matches, PREG_OFFSET_CAPTURE);
    $matches = $matches[0];
    $nodes = array();
    foreach ($matches as $match) {
        $node = new stdClass;
        $node->phraseLength = strlen($match[0]);
        $node->position = $match[1];
        $nodes[] = $node;
    }
    if (count($nodes) > 0) {
        $clust = new stdClass;
        $clust->nodes[] = array_shift($nodes);
        $clust->length = $clust->nodes[0]->phraseLength;
        $clust->i = 0;
        $clusters = new stdClass;
        $clusters->data = array($clust);
        $clusters->i = 0;
        foreach ($nodes as $node) {
            $lastClust = $clusters->data[$clusters->i];
            $lastNode = $lastClust->nodes[$lastClust->i];
            $addedLength = $node->position - $lastNode->position - $lastNode->phraseLength + $node->phraseLength;
            if ($lastClust->length + $addedLength <= $span) {
                $lastClust->nodes[] = $node;
                $lastClust->length += $addedLength;
                $lastClust->i += 1;
            } else {
                if ($addedLength > $span) {
                    $newClust = new stdClass;
                    $newClust->nodes = array($node);
                    $newClust->i = 0;
                    $newClust->length = $node->phraseLength;
                    $clusters->data[] = $newClust;
                    $clusters->i += 1;
                } else {
                    $newClust = clone $lastClust;
                    while ($newClust->length + $addedLength > $span) {
                        $shiftedNode = array_shift($newClust->nodes);
                        if ($shiftedNode === null) {
                            break;
                        }
                        $newClust->i -= 1;
                        $removedLength = $shiftedNode->phraseLength;
                        if (isset($newClust->nodes[0])) {
                            $removedLength += $newClust->nodes[0]->position - $shiftedNode->position;
                        }
                        $newClust->length -= $removedLength;
                    }
                    if ($newClust->i < 0) {
                        $newClust->i = 0;
                    }
                    $newClust->nodes[] = $node;
                    $newClust->length += $addedLength;
                    $clusters->data[] = $newClust;
                    $clusters->i += 1;
                }
            }
        }
        $bestClust = $clusters->data[0];
        $bestClustSize = count($bestClust->nodes);
        foreach ($clusters->data as $clust) {
            $newClustSize = count($clust->nodes);
            if ($newClustSize > $bestClustSize) {
                $bestClust = $clust;
                $bestClustSize = $newClustSize;
            }
        }
        $clustLeft = $bestClust->nodes[0]->position;
        $clustLen = $bestClust->length;
        $padding = round(($span - $clustLen)/2);
        $clustLeft -= $padding;
        if ($clustLeft < 0) {
            $clustLen += $clustLeft*-1 + $padding;
            $clustLeft = 0;
        } else {
            $clustLen += $padding*2;
        }
    } else {
        $clustLeft = 0;
        $clustLen = $span;
    }
    $textLen = strlen($text);
    $prefix = '';
    $suffix = '';
    if (!ctype_space($text[$clustLeft]) && isset($text[$clustLeft-1]) && !ctype_space($text[$clustLeft-1])) {
        while (!ctype_space($text[$clustLeft])) {
            $clustLeft += 1;
        }
        $prefix = $delimiter;
    }
    $lastChar = $clustLeft + $clustLen;
    if (!ctype_space($text[$lastChar]) && isset($text[$lastChar+1]) && !ctype_space($text[$lastChar+1])) {
        while (!ctype_space($text[$lastChar])) {
            $lastChar -= 1;
        }
        $suffix = $delimiter;
        $clustLen = $lastChar - $clustLeft;
    }
    if ($clustLeft > 0) {
        $prefix = $delimiter;
    }
    if ($clustLeft + $clustLen < $textLen) {
        $suffix = $delimiter;
    }
    return $prefix . trim(substr($text, $clustLeft, $clustLen+1)) . $suffix;
}
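The core idea of picking the densest excerpt can be looked at in isolation. The sketch below is not the function above: it is a simplified sliding-window variant that only looks at match start positions and ignores phrase lengths. The name `densestWindow` is hypothetical.

```php
<?php
/**
 * Given match positions (character offsets) and a span, find the window of at most
 * $span characters that contains the most matches.
 * Returns ['start' => offset of the first match in the window, 'count' => matches inside].
 */
function densestWindow(array $positions, int $span): array
{
    sort($positions); // also reindexes the array from 0
    $best = ['start' => 0, 'count' => 0];
    $left = 0;
    foreach ($positions as $right => $pos) {
        // Shrink the window from the left until it fits into $span again.
        while ($pos - $positions[$left] > $span) {
            $left++;
        }
        $count = $right - $left + 1;
        if ($count > $best['count']) {
            $best = ['start' => $positions[$left], 'count' => $count];
        }
    }
    return $best;
}

print_r(densestWindow([5, 200, 210, 230, 900], 100));
```

The excerpt would then be cut around `start`, with some padding on both sides, much like the `$padding` step in the function above.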
I came up with the below to generate excerpts. It works by finding all the locations of the matching words, then takes an excerpt based on which words are the closest together. In theory this does not sound very good, but in practice it works very well.
It's actually very close to how Sphider generates its snippets (for the record, that code lives in searchfuncs.php from line 529 to 566). I think the below is much easier to read and is free of the bugs that exist in Sphider. It also does not use regular expressions, which makes it a bit faster than other methods I have used.
I blogged about it here
// find the locations of each of the words
// Nothing exciting here. The array_unique is required
// unless you decide to make the words unique before passing in
function _extractLocations($words, $fulltext) {
    $locations = array();
    foreach($words as $word) {
        $wordlen = strlen($word);
        $loc = stripos($fulltext, $word);
        while($loc !== FALSE) {
            $locations[] = $loc;
            $loc = stripos($fulltext, $word, $loc + $wordlen);
        }
    }
    $locations = array_unique($locations);
    sort($locations); // the distance check below compares neighbouring positions
    return $locations;
}

// Work out which is the most relevant portion to display
// This is done by looping over each match and finding the smallest distance between two found
// strings. The idea being that the closer the terms are the better match the snippet would be.
// When checking for matches we only change the location if there is a better match.
// The only exception is where we have only two matches in which case we just take the
// first as will be equally distant.
function _determineSnipLocation($locations, $prevcount) {
    // If we only have 1 match we dont actually do the for loop so set to the first
    $startpos = isset($locations[0]) ? $locations[0] : 0;
    $loccount = count($locations);
    $smallestdiff = PHP_INT_MAX;
    // If we only have 2 skip as its probably equally relevant
    if(count($locations) > 2) {
        // skip the first as we check 1 behind
        for($i=1; $i < $loccount; $i++) {
            if($i == $loccount-1) { // at the end
                $diff = $locations[$i] - $locations[$i-1];
            }
            else {
                $diff = $locations[$i+1] - $locations[$i];
            }
            if($smallestdiff > $diff) {
                $smallestdiff = $diff;
                $startpos = $locations[$i];
            }
        }
    }
    $startpos = $startpos > $prevcount ? $startpos - $prevcount : 0;
    return $startpos;
}

// 1/6 ratio on prevcount tends to work pretty well and puts the terms
// in the middle of the extract
function extractRelevant($words, $fulltext, $rellength=300, $prevcount=50, $indicator='...') {
    $textlength = strlen($fulltext);
    if($textlength <= $rellength) {
        return $fulltext;
    }
    $locations = _extractLocations($words, $fulltext);
    $startpos = _determineSnipLocation($locations,$prevcount);
    // if we are going to snip too much...
    if($textlength-$startpos < $rellength) {
        $startpos = (int) ($startpos - ($textlength-$startpos)/2);
    }
    $reltext = substr($fulltext, $startpos, $rellength);
    // check to ensure we dont snip the last word if thats the match
    if( $startpos + $rellength < $textlength) {
        $reltext = substr($reltext, 0, strrpos($reltext, " ")).$indicator; // remove last word
    }
    // If we trimmed from the front add ...
    if($startpos != 0) {
        $reltext = $indicator.substr($reltext, strpos($reltext, " ") + 1); // remove first word
    }
    return $reltext;
}
function excerpt($text, $phrase, $radius = 100, $ending = "...") {
    $phraseLen = strlen($phrase);
    if ($radius < $phraseLen) {
        $radius = $phraseLen;
    }
    // use the first word of the phrase that occurs in the text
    $phrases = explode (' ',$phrase);
    foreach ($phrases as $phrase) {
        $pos = strpos(strtolower($text), strtolower($phrase));
        if ($pos !== false) break;
    }
    $startPos = 0;
    if ($pos > $radius) {
        $startPos = $pos - $radius;
    }
    $textLen = strlen($text);
    $endPos = $pos + $phraseLen + $radius;
    if ($endPos >= $textLen) {
        $endPos = $textLen;
    }
    $excerpt = substr($text, $startPos, $endPos - $startPos);
    if ($startPos != 0) {
        $excerpt = substr_replace($excerpt, $ending, 0, $phraseLen);
    }
    if ($endPos != $textLen) {
        $excerpt = substr_replace($excerpt, $ending, -$phraseLen);
    }
    return $excerpt;
}
I could not contact erisco, so I am posting his function with multiple fixes (most importantly multibyte support).
/**
 * @param string $text text to be searched
 * @param string $phrase search string
 * @param int $span approximate length of the excerpt
 * @param string $delimiter string to use as a suffix and/or prefix if the excerpt is from the middle of a text
 * @return string
 */
public static function excerpt($text, $phrase, $span = 100, $delimiter = '...')
{
    $phrases = preg_split('/\s+/u', $phrase);
    $regexp = '/\b(?:';
    foreach($phrases as $phrase)
    {
        $regexp.= preg_quote($phrase, '/') . '|';
    }
    $regexp = mb_substr($regexp, 0, -1) .')\b/ui';
    $matches = [];
    preg_match_all($regexp, $text, $matches, PREG_OFFSET_CAPTURE);
    $matches = $matches[0];
    $nodes = [];
    foreach($matches as $match)
    {
        $node = new stdClass;
        $node->phraseLength = mb_strlen($match[0]);
        $node->position = mb_strlen(substr($text, 0, $match[1])); // calculate UTF-8 position (@see
        $nodes[] = $node;
    }
    if(count($nodes) > 0)
    {
        $clust = new stdClass;
        $clust->nodes[] = array_shift($nodes);
        $clust->length = $clust->nodes[0]->phraseLength;
        $clust->i = 0;
        $clusters = new stdClass;
        $clusters->data = [$clust];
        $clusters->i = 0;
        foreach($nodes as $node)
        {
            $lastClust = $clusters->data[$clusters->i];
            $lastNode = $lastClust->nodes[$lastClust->i];
            $addedLength = $node->position - $lastNode->position - $lastNode->phraseLength + $node->phraseLength;
            if($lastClust->length + $addedLength <= $span)
            {
                $lastClust->nodes[] = $node;
                $lastClust->length+= $addedLength;
                $lastClust->i++;
            }
            else
            {
                if($addedLength > $span)
                {
                    $newClust = new stdClass;
                    $newClust->nodes = [$node];
                    $newClust->i = 0;
                    $newClust->length = $node->phraseLength;
                    $clusters->data[] = $newClust;
                    $clusters->i++;
                }
                else
                {
                    $newClust = clone $lastClust;
                    while($newClust->length + $addedLength > $span)
                    {
                        $shiftedNode = array_shift($newClust->nodes);
                        if($shiftedNode === null)
                        {
                            break;
                        }
                        $newClust->i--;
                        $removedLength = $shiftedNode->phraseLength;
                        if(isset($newClust->nodes[0]))
                        {
                            $removedLength+= $newClust->nodes[0]->position - $shiftedNode->position;
                        }
                        $newClust->length-= $removedLength;
                    }
                    if($newClust->i < 0)
                    {
                        $newClust->i = 0;
                    }
                    $newClust->nodes[] = $node;
                    $newClust->length+= $addedLength;
                    $clusters->data[] = $newClust;
                    $clusters->i++;
                }
            }
        }
        $bestClust = $clusters->data[0];
        $bestClustSize = count($bestClust->nodes);
        foreach($clusters->data as $clust)
        {
            $newClustSize = count($clust->nodes);
            if($newClustSize > $bestClustSize)
            {
                $bestClust = $clust;
                $bestClustSize = $newClustSize;
            }
        }
        $clustLeft = $bestClust->nodes[0]->position;
        $clustLen = $bestClust->length;
        $padding = intval(round(($span - $clustLen) / 2));
        $clustLeft-= $padding;
        if($clustLeft < 0)
        {
            $clustLen+= $clustLeft * -1 + $padding;
            $clustLeft = 0;
        }
        else
        {
            $clustLen+= $padding * 2;
        }
    }
    else
    {
        $clustLeft = 0;
        $clustLen = $span;
    }
    $textLen = mb_strlen($text);
    $prefix = '';
    $suffix = '';
    if($clustLeft > 0 && !ctype_space(mb_substr($text, $clustLeft, 1))
        && !ctype_space(mb_substr($text, $clustLeft - 1, 1)))
    {
        while(!ctype_space(mb_substr($text, $clustLeft, 1)))
        {
            $clustLeft++;
        }
        $prefix = $delimiter;
    }
    $lastChar = $clustLeft + $clustLen;
    if($lastChar < $textLen && !ctype_space(mb_substr($text, $lastChar, 1))
        && !ctype_space(mb_substr($text, $lastChar + 1, 1)))
    {
        while(!ctype_space(mb_substr($text, $lastChar, 1)))
        {
            $lastChar--;
        }
        $suffix = $delimiter;
        $clustLen = $lastChar - $clustLeft;
    }
    if($clustLeft > 0)
    {
        $prefix = $delimiter;
    }
    if($clustLeft + $clustLen < $textLen)
    {
        $suffix = $delimiter;
    }
    return $prefix . trim(mb_substr($text, $clustLeft, $clustLen + 1)) . $suffix;
}
