How to generate excerpt with most searched words in PHP? - php

Here is an excerpt function:
function excerpt($text, $phrase, $radius = 100, $ending = "...") {
    if (empty($text) or empty($phrase)) {
        return $this->truncate($text, $radius * 2, $ending);
    }

    $phraseLen = strlen($phrase);
    if ($radius < $phraseLen) {
        $radius = $phraseLen;
    }

    $pos = strpos(strtolower($text), strtolower($phrase));

    $startPos = 0;
    if ($pos > $radius) {
        $startPos = $pos - $radius;
    }

    $textLen = strlen($text);

    $endPos = $pos + $phraseLen + $radius;
    if ($endPos >= $textLen) {
        $endPos = $textLen;
    }

    $excerpt = substr($text, $startPos, $endPos - $startPos);
    if ($startPos != 0) {
        $excerpt = substr_replace($excerpt, $ending, 0, $phraseLen);
    }

    if ($endPos != $textLen) {
        $excerpt = substr_replace($excerpt, $ending, -$phraseLen);
    }

    return $excerpt;
}
Its drawback is that it only looks for the first occurrence of the whole phrase; it does not try to cover as many of the searched words as possible.
How can I implement an excerpt function that does?

None of the code listed here so far worked for me, so I spent some time thinking of an algorithm to implement. What I have now works decently, and performance does not appear to be a problem; feel free to test. The results are not as snazzy as Google's snippets, as there is no detection of where sentences start and end. I could add this, but it would be that much more complicated and I would have to give up on doing this in a single function. It is already getting crowded and could be coded better if, for example, the object manipulations were abstracted into methods.
Anyhow, this is what I have, and it should be a good start. The densest cluster of matches is determined, and the resulting string will be approximately the span you specified. I urge some testing of this code, as I have not tested it thoroughly. Surely there are problematic cases to be found.
I also encourage anyone to improve on this algorithm, or simply the code that executes it.
Enjoy.
// string excerpt(string $text, string $phrase, int $span = 100, string $delimiter = '...')
// parameters:
// $text - text to be searched
// $phrase - search string
// $span - approximate length of the excerpt
// $delimiter - string to use as a suffix and/or prefix if the excerpt is from the middle of a text
function excerpt($text, $phrase, $span = 100, $delimiter = '...') {
$phrases = preg_split('/\s+/', $phrase);
$regexp = '/\b(?:';
foreach ($phrases as $phrase) {
$regexp .= preg_quote($phrase, '/') . '|';
}
$regexp = substr($regexp, 0, -1) . ')\b/i';
$matches = array();
preg_match_all($regexp, $text, $matches, PREG_OFFSET_CAPTURE);
$matches = $matches[0];
$nodes = array();
foreach ($matches as $match) {
$node = new stdClass;
$node->phraseLength = strlen($match[0]);
$node->position = $match[1];
$nodes[] = $node;
}
if (count($nodes) > 0) {
$clust = new stdClass;
$clust->nodes[] = array_shift($nodes);
$clust->length = $clust->nodes[0]->phraseLength;
$clust->i = 0;
$clusters = new stdClass;
$clusters->data = array($clust);
$clusters->i = 0;
foreach ($nodes as $node) {
$lastClust = $clusters->data[$clusters->i];
$lastNode = $lastClust->nodes[$lastClust->i];
$addedLength = $node->position - $lastNode->position - $lastNode->phraseLength + $node->phraseLength;
if ($lastClust->length + $addedLength <= $span) {
$lastClust->nodes[] = $node;
$lastClust->length += $addedLength;
$lastClust->i += 1;
} else {
if ($addedLength > $span) {
$newClust = new stdClass;
$newClust->nodes = array($node);
$newClust->i = 0;
$newClust->length = $node->phraseLength;
$clusters->data[] = $newClust;
$clusters->i += 1;
} else {
$newClust = clone $lastClust;
while ($newClust->length + $addedLength > $span) {
$shiftedNode = array_shift($newClust->nodes);
if ($shiftedNode === null) {
break;
}
$newClust->i -= 1;
$removedLength = $shiftedNode->phraseLength;
if (isset($newClust->nodes[0])) {
$removedLength += $newClust->nodes[0]->position - $shiftedNode->position;
}
$newClust->length -= $removedLength;
}
if ($newClust->i < 0) {
$newClust->i = 0;
}
$newClust->nodes[] = $node;
$newClust->length += $addedLength;
$clusters->data[] = $newClust;
$clusters->i += 1;
}
}
}
$bestClust = $clusters->data[0];
$bestClustSize = count($bestClust->nodes);
foreach ($clusters->data as $clust) {
$newClustSize = count($clust->nodes);
if ($newClustSize > $bestClustSize) {
$bestClust = $clust;
$bestClustSize = $newClustSize;
}
}
$clustLeft = $bestClust->nodes[0]->position;
$clustLen = $bestClust->length;
$padding = round(($span - $clustLen)/2);
$clustLeft -= $padding;
if ($clustLeft < 0) {
$clustLen += $clustLeft*-1 + $padding;
$clustLeft = 0;
} else {
$clustLen += $padding*2;
}
} else {
$clustLeft = 0;
$clustLen = $span;
}
$textLen = strlen($text);
$prefix = '';
$suffix = '';
if (!ctype_space($text[$clustLeft]) && isset($text[$clustLeft-1]) && !ctype_space($text[$clustLeft-1])) {
while (!ctype_space($text[$clustLeft])) {
$clustLeft += 1;
}
$prefix = $delimiter;
}
$lastChar = $clustLeft + $clustLen;
if (!ctype_space($text[$lastChar]) && isset($text[$lastChar+1]) && !ctype_space($text[$lastChar+1])) {
while (!ctype_space($text[$lastChar])) {
$lastChar -= 1;
}
$suffix = $delimiter;
$clustLen = $lastChar - $clustLeft;
}
if ($clustLeft > 0) {
$prefix = $delimiter;
}
if ($clustLeft + $clustLen < $textLen) {
$suffix = $delimiter;
}
return $prefix . trim(substr($text, $clustLeft, $clustLen+1)) . $suffix;
}
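The clustering idea can be easier to follow in a stripped-down form. The sketch below is not the function above: matchOffsets() and densestWindow() are hypothetical names made up for illustration, offsets are plain byte offsets (no multibyte handling), and the window is measured between the start positions of matches only. It collects every whole-word match, then slides a window over the sorted offsets to find the largest group that fits in the span.

```php
<?php
// Hypothetical helper: byte offsets of every whole-word match of any term.
function matchOffsets(string $text, array $terms): array
{
    $quoted = array_map(function ($term) {
        return preg_quote($term, '/');
    }, $terms);
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';
    preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);
    // Each entry is [matched text, offset]; keep only the offsets.
    return array_column($matches[0], 1);
}

// Hypothetical helper: indexes [first, last] of the largest run of offsets
// whose start positions all lie within $span characters of each other.
// Returns null when there are no matches at all.
function densestWindow(array $positions, int $span): ?array
{
    if ($positions === []) {
        return null;
    }
    $best = [0, 0];
    $left = 0;
    foreach ($positions as $right => $pos) {
        // Drop matches from the left until the window fits in the span.
        while ($pos - $positions[$left] > $span) {
            $left++;
        }
        if ($right - $left > $best[1] - $best[0]) {
            $best = [$left, $right];
        }
    }
    return $best;
}

$offsets = matchOffsets('Red fish, blue fish', ['fish', 'red']);
$window  = densestWindow($offsets, 10); // [0, 1]: "Red" and the first "fish"
```

The excerpt would then be cut around $offsets[$window[0]] and padded on both sides, which is roughly what the padding code at the end of the function above does.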

I came up with the code below to generate excerpts. You can see the code here: https://github.com/boyter/php-excerpt It works by finding all the locations of the matching words, then takes an excerpt based on which words are closest together. In theory this does not sound very good, but in practice it works very well.
It's actually very close to how Sphider generates its snippets (for the record, that code lives in searchfuncs.php from line 529 to 566). I think the code below is much easier to read and is free of the bugs that exist in Sphider. It also does not use regular expressions, which makes it a bit faster than other methods I have used.
I blogged about it here: http://www.boyter.org/2013/04/building-a-search-result-extract-generator-in-php/
<?php
// find the locations of each of the words
// Nothing exciting here. The array_unique is required
// unless you decide to make the words unique before passing in
function _extractLocations($words, $fulltext) {
$locations = array();
foreach($words as $word) {
$wordlen = strlen($word);
$loc = stripos($fulltext, $word);
while($loc !== FALSE) {
$locations[] = $loc;
$loc = stripos($fulltext, $word, $loc + $wordlen);
}
}
$locations = array_unique($locations);
sort($locations);
return $locations;
}
// Work out which is the most relevant portion to display
// This is done by looping over each match and finding the smallest distance between two found
// strings. The idea being that the closer the terms are the better match the snippet would be.
// When checking for matches we only change the location if there is a better match.
// The only exception is where we have only two matches in which case we just take the
// first as will be equally distant.
function _determineSnipLocation($locations, $prevcount) {
// If we only have 1 match we dont actually do the for loop so set to the first
$startpos = $locations[0];
$loccount = count($locations);
$smallestdiff = PHP_INT_MAX;
// If we only have 2 skip as its probably equally relevant
if(count($locations) > 2) {
// skip the first as we check 1 behind
for($i=1; $i < $loccount; $i++) {
if($i == $loccount-1) { // at the end
$diff = $locations[$i] - $locations[$i-1];
}
else {
$diff = $locations[$i+1] - $locations[$i];
}
if($smallestdiff > $diff) {
$smallestdiff = $diff;
$startpos = $locations[$i];
}
}
}
$startpos = $startpos > $prevcount ? $startpos - $prevcount : 0;
return $startpos;
}
// 1/6 ratio on prevcount tends to work pretty well and puts the terms
// in the middle of the extract
function extractRelevant($words, $fulltext, $rellength=300, $prevcount=50, $indicator='...') {
$textlength = strlen($fulltext);
if($textlength <= $rellength) {
return $fulltext;
}
$locations = _extractLocations($words, $fulltext);
$startpos = _determineSnipLocation($locations,$prevcount);
// if we are going to snip too much...
if($textlength-$startpos < $rellength) {
$startpos = $startpos - intdiv($textlength - $startpos, 2); // integer division keeps the offset an int
}
$reltext = substr($fulltext, $startpos, $rellength);
// check to ensure we dont snip the last word if thats the match
if( $startpos + $rellength < $textlength) {
$reltext = substr($reltext, 0, strrpos($reltext, " ")).$indicator; // remove last word
}
// If we trimmed from the front add ...
if($startpos != 0) {
$reltext = $indicator.substr($reltext, strpos($reltext, " ") + 1); // remove first word
}
return $reltext;
}
?>
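The "closest words win" rule can be shown on its own. This is a simplified sketch rather than the code above: closestPairStart() is a hypothetical name, and it simply returns the position that begins the tightest pair of neighbouring matches, assuming at least one location is passed in.

```php
<?php
// Hypothetical helper: start of the two neighbouring matches that are
// closest together. Assumes $locations holds at least one offset.
function closestPairStart(array $locations): int
{
    sort($locations);
    $start    = $locations[0];
    $smallest = PHP_INT_MAX;
    for ($i = 0; $i < count($locations) - 1; $i++) {
        $gap = $locations[$i + 1] - $locations[$i];
        if ($gap < $smallest) {
            $smallest = $gap;
            $start    = $locations[$i];
        }
    }
    return $start;
}

// Matches at 5, 120, 130 and 400: the pair 120/130 is the tightest.
$start = closestPairStart([5, 120, 130, 400]); // 120
```

Backing off from that position by a fixed amount (the $prevcount parameter above) keeps the matched terms away from the very start of the excerpt.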

function excerpt($text, $phrase, $radius = 100, $ending = "...") {
$phraseLen = strlen($phrase);
if ($radius < $phraseLen) {
$radius = $phraseLen;
}
$phrases = explode (' ',$phrase);
foreach ($phrases as $phrase) {
$pos = strpos(strtolower($text), strtolower($phrase));
if ($pos !== false) break; // strpos() returns false when the word is not found
}
$startPos = 0;
if ($pos > $radius) {
$startPos = $pos - $radius;
}
$textLen = strlen($text);
$endPos = $pos + $phraseLen + $radius;
if ($endPos >= $textLen) {
$endPos = $textLen;
}
$excerpt = substr($text, $startPos, $endPos - $startPos);
if ($startPos != 0) {
$excerpt = substr_replace($excerpt, $ending, 0, $phraseLen);
}
if ($endPos != $textLen) {
$excerpt = substr_replace($excerpt, $ending, -$phraseLen);
}
return $excerpt;
}

I could not contact erisco, so I am posting his function with multiple fixes (most importantly multibyte support).
/**
* @param string $text text to be searched
* @param string $phrase search string
* @param int $span approximate length of the excerpt
* @param string $delimiter string to use as a suffix and/or prefix if the excerpt is from the middle of a text
*
* @return string
*/
public static function excerpt($text, $phrase, $span = 100, $delimiter = '...')
{
$phrases = preg_split('/\s+/u', $phrase);
$regexp = '/\b(?:';
foreach($phrases as $phrase)
{
$regexp.= preg_quote($phrase, '/') . '|';
}
$regexp = mb_substr($regexp, 0, -1) .')\b/ui';
$matches = [];
preg_match_all($regexp, $text, $matches, PREG_OFFSET_CAPTURE);
$matches = $matches[0];
$nodes = [];
foreach($matches as $match)
{
$node = new stdClass;
$node->phraseLength = mb_strlen($match[0]);
$node->position = mb_strlen(substr($text, 0, $match[1])); // calculate UTF-8 position (@see https://bugs.php.net/bug.php?id=67487)
$nodes[] = $node;
}
if(count($nodes) > 0)
{
$clust = new stdClass;
$clust->nodes[] = array_shift($nodes);
$clust->length = $clust->nodes[0]->phraseLength;
$clust->i = 0;
$clusters = new stdClass;
$clusters->data =
[
$clust
];
$clusters->i = 0;
foreach($nodes as $node)
{
$lastClust = $clusters->data[$clusters->i];
$lastNode = $lastClust->nodes[$lastClust->i];
$addedLength = $node->position - $lastNode->position - $lastNode->phraseLength + $node->phraseLength;
if($lastClust->length + $addedLength <= $span)
{
$lastClust->nodes[] = $node;
$lastClust->length+= $addedLength;
$lastClust->i++;
}
else
{
if($addedLength > $span)
{
$newClust = new stdClass;
$newClust->nodes =
[
$node
];
$newClust->i = 0;
$newClust->length = $node->phraseLength;
$clusters->data[] = $newClust;
$clusters->i++;
}
else
{
$newClust = clone $lastClust;
while($newClust->length + $addedLength > $span)
{
$shiftedNode = array_shift($newClust->nodes);
if($shiftedNode === null)
{
break;
}
$newClust->i--;
$removedLength = $shiftedNode->phraseLength;
if(isset($newClust->nodes[0]))
{
$removedLength+= $newClust->nodes[0]->position - $shiftedNode->position;
}
$newClust->length-= $removedLength;
}
if($newClust->i < 0)
{
$newClust->i = 0;
}
$newClust->nodes[] = $node;
$newClust->length+= $addedLength;
$clusters->data[] = $newClust;
$clusters->i++;
}
}
}
$bestClust = $clusters->data[0];
$bestClustSize = count($bestClust->nodes);
foreach($clusters->data as $clust)
{
$newClustSize = count($clust->nodes);
if($newClustSize > $bestClustSize)
{
$bestClust = $clust;
$bestClustSize = $newClustSize;
}
}
$clustLeft = $bestClust->nodes[0]->position;
$clustLen = $bestClust->length;
$padding = intval(round(($span - $clustLen) / 2));
$clustLeft-= $padding;
if($clustLeft < 0)
{
$clustLen+= $clustLeft * -1 + $padding;
$clustLeft = 0;
}
else
{
$clustLen+= $padding * 2;
}
}
else
{
$clustLeft = 0;
$clustLen = $span;
}
$textLen = mb_strlen($text);
$prefix = '';
$suffix = '';
if($clustLeft > 0 && !ctype_space(mb_substr($text, $clustLeft, 1))
&& !ctype_space(mb_substr($text, $clustLeft - 1, 1)))
{
$clustLeft++;
while($clustLeft < $textLen && !ctype_space(mb_substr($text, $clustLeft, 1))) // stop at the end of the text if no whitespace follows
{
$clustLeft++;
}
$prefix = $delimiter;
}
$lastChar = $clustLeft + $clustLen;
if($lastChar < $textLen && !ctype_space(mb_substr($text, $lastChar, 1))
&& !ctype_space(mb_substr($text, $lastChar + 1, 1)))
{
$lastChar--;
while($lastChar > $clustLeft && !ctype_space(mb_substr($text, $lastChar, 1))) // never walk back past the start of the excerpt
{
$lastChar--;
}
$suffix = $delimiter;
$clustLen = $lastChar - $clustLeft;
}
if($clustLeft > 0)
{
$prefix = $delimiter;
}
if($clustLeft + $clustLen < $textLen)
{
$suffix = $delimiter;
}
return $prefix . trim(mb_substr($text, $clustLeft, $clustLen + 1)) . $suffix;
}

Related

Fatal error : Using $this when not in object context in [duplicate]

I have the function below, but when I run the code I get this error:
Fatal error: Using $this when not in object context in E:....
How can I fix it? I replaced $this-> with self::, but that failed too.
Please help.
<?php
function cehck_files()
{
$file1 = 'C:\xampp\htdocs\test\test1.php';
$file2 = 'C:\xampp\htdocs\test\test2.php';
$test = $this->compareFiles($file1,$file2,true);
$test_display = $this->toTable($test);
echo "<pre>";
print_r($test_display);
print_r($test);
echo "</pre>";
}
function compareFiles($file1, $file2, $compareCharacters = false) {
return $this->compare(file_get_contents($file1),file_get_contents($file2),$compareCharacters);
}
function compare($string1, $string2, $compareCharacters = false) {
$start = 0;
if ($compareCharacters){
$sequence1 = $string1;
$sequence2 = $string2;
$end1 = strlen($string1) - 1;
$end2 = strlen($string2) - 1;
} else {
$sequence1 = preg_split('/\R/', $string1);
$sequence2 = preg_split('/\R/', $string2);
$end1 = count($sequence1) - 1;
$end2 = count($sequence2) - 1;
}
// skip any common prefix
while ($start <= $end1 && $start <= $end2 && $sequence1[$start] == $sequence2[$start]) {
$start ++;
}
// skip any common suffix
while ($end1 >= $start && $end2 >= $start && $sequence1[$end1] == $sequence2[$end2]) {
$end1 --;
$end2 --;
}
// compute the table of longest common subsequence lengths
$table = self::computeTable($sequence1, $sequence2, $start, $end1, $end2);
// generate the partial diff
$partialDiff =
self::generatePartialDiff($table, $sequence1, $sequence2, $start);
// generate the full diff
$diff = array();
for ($index = 0; $index < $start; $index ++){
$diff[] = array($sequence1[$index], UNMODIFIED);
}
while (count($partialDiff) > 0) $diff[] = array_pop($partialDiff);
for ($index = $end1 + 1; $index < ($compareCharacters ? strlen($sequence1) : count($sequence1)); $index ++) {
$diff[] = array($sequence1[$index], UNMODIFIED);
}
// return the diff
return $diff;
}
function computeTable($sequence1, $sequence2, $start, $end1, $end2) {
$length1 = $end1 - $start + 1;
$length2 = $end2 - $start + 1;
// initialise the table
$table = array(array_fill(0, $length2 + 1, 0));
// loop over the rows
for ($index1 = 1; $index1 <= $length1; $index1 ++) {
// create the new row
$table[$index1] = array(0);
// loop over the columns
for ($index2 = 1; $index2 <= $length2; $index2 ++){
// store the longest common subsequence length
if ($sequence1[$index1 + $start - 1] == $sequence2[$index2 + $start - 1]) {
$table[$index1][$index2] = $table[$index1 - 1][$index2 - 1] + 1;
} else {
$table[$index1][$index2] =
max($table[$index1 - 1][$index2], $table[$index1][$index2 - 1]);
}
}
}
// return the table
return $table;
}
function generatePartialDiff( $table, $sequence1, $sequence2, $start ) {
$diff = array();
// initialise the indices
$index1 = count($table) - 1;
$index2 = count($table[0]) - 1;
// loop until there are no items remaining in either sequence
while ($index1 > 0 || $index2 > 0){
// check what has happened to the items at these indices
if ($index1 > 0 && $index2 > 0 && $sequence1[$index1 + $start - 1] == $sequence2[$index2 + $start - 1]) {
// update the diff and the indices
$diff[] = array($sequence1[$index1 + $start - 1], UNMODIFIED);
$index1 --;
$index2 --;
} elseif ($index2 > 0 && $table[$index1][$index2] == $table[$index1][$index2 - 1]) {
// update the diff and the indices
$diff[] = array($sequence2[$index2 + $start - 1], INSERTED);
$index2 --;
}else{
// update the diff and the indices
$diff[] = array($sequence1[$index1 + $start - 1], DELETED);
$index1 --;
}
}
// return the diff
return $diff;
}
function toTable($diff, $indentation = '', $separator = '<br>') {
$html = $indentation . "<table class=\"diff\">\n";
// loop over the lines in the diff
$index = 0;
while ($index < count($diff)){
// determine the line type
switch ($diff[$index][1]){
// display the content on the left and right
case UNMODIFIED:
$leftCell =
self::getCellContent(
$diff, $indentation, $separator, $index, UNMODIFIED);
$rightCell = $leftCell;
break;
// display the deleted on the left and inserted content on the right
case DELETED:
$leftCell =
self::getCellContent(
$diff, $indentation, $separator, $index, DELETED);
$rightCell =
self::getCellContent(
$diff, $indentation, $separator, $index, INSERTED);
break;
// display the inserted content on the right
case INSERTED:
$leftCell = '';
$rightCell =
self::getCellContent(
$diff, $indentation, $separator, $index, INSERTED);
break;
}
// extend the HTML with the new row
$html .=
$indentation
. " <tr>\n"
. $indentation
. ' <td class="diff'
. ($leftCell == $rightCell
? 'Unmodified'
: ($leftCell == '' ? 'Blank' : 'Deleted'))
. '">'
. $leftCell
. "</td>\n"
. $indentation
. ' <td class="diff'
. ($leftCell == $rightCell
? 'Unmodified'
: ($rightCell == '' ? 'Blank' : 'Inserted'))
. '">'
. $rightCell
. "</td>\n"
. $indentation
. " </tr>\n";
}
// return the HTML
return $html . $indentation . "</table>\n";
}
?>
You are using $this inside plain functions that are not methods of any class, so there is no object for $this to refer to.
Instead of
$test = $this->compareFiles($file1,$file2,true);
Use:
$test = compareFiles($file1,$file2,true);
Also, change
return $this->compare(file_get_contents($file1),file_get_contents($file2),$compareCharacters);
To
return compare(file_get_contents($file1),file_get_contents($file2),$compareCharacters);
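Make the remaining changes the same way. That includes the self:: calls, which also only work inside a class: self::computeTable(...) has to become computeTable(...), and so on.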
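The other way round also works: keep $this-> and self::, and move the functions into a class. A minimal sketch of that structure, assuming a made-up class name (TextComparer) and deliberately trivial method bodies, since only the call structure matters here:

```php
<?php
// Hypothetical, simplified class: not the diff code from the question.
class TextComparer
{
    // Instance method: $this is available because it runs on an object.
    public function compareTexts(string $a, string $b): bool
    {
        return $this->compare(self::normalise($a), self::normalise($b));
    }

    private function compare(string $a, string $b): bool
    {
        return $a === $b;
    }

    // Static helper: reachable through self:: from inside the class.
    private static function normalise(string $text): string
    {
        return str_replace("\r\n", "\n", $text);
    }
}

$comparer = new TextComparer();
var_dump($comparer->compareTexts("a\r\nb", "a\nb")); // bool(true)
```

With this layout the calls have to go through an object ($comparer->compareTexts(...)) instead of being called as bare functions.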

automatic number increase in php based web form [closed]

So I have created an html form which then posts the results to a php file that overlays them on a PDF and then emails that PDF to myself and the email that was put in the form. All I want to do now is find a simple way to make it so that the PDF includes a sequential number.
For example: When the form is filled out for the first time the number 0001 is input automatically into the PDF and 0002 for the second time and so on.
Is there an easy PHP function to accomplish this?
Essentially I am creating an online invoicing form so when I do service calls I can create an invoice on the spot from a web browser which is then emailed to my office and the client.
Any help would be greatly appreciated.
For an incrementing number, you could keep a number in a database and then extract it, add 1 to it, use it, and then put it back in the DB for next time, but this seems complicated. Somebody in the comments mentioned using the timestamp, which would be done like so:
$invoicenumber = time(); // Unique only as long as no two invoices are created in the same second
The time function works like so (copied from w3schools):
The time() function returns the current time in the number of seconds since the Unix Epoch (January 1 1970 00:00:00 GMT).
Since the clock only moves forward, this number will not repeat, provided no two invoices are created within the same second. Note that it is not a sequential 0001, 0002, ... counter.
I hope this is helpful.
-Edit
You can also display this date/time in a readable format like so:
$time = time();
echo date("Y-m-d H:i:s",$time);
-Edit 2
If you want an incrementing number, you basically need a very simple database to save it, which might be as simple as a table called invoices, with a column called invoicenumber, which stores your invoice number in it. You could / probably should use this to store other invoice information in it too, so you'd have each invoice number saved (which means we want to only get the highest one)
Then your code would look like this, for each time you want to use it:
Firstly you'd have a database information file (settings.php or something similar) with your database definitions in it, which might look like this:
define('DB_HOST', 'localhost');
define('DB_USER', 'db_username');
define('DB_PASS', 'db_password');
define('DB_NAME', 'database_name');
Your code would look like this:
//Establish a mysql connection
$mysqli = new mysqli(DB_HOST, DB_USER, DB_PASS, DB_NAME);
//Set up a query to get the highest number
$query = "SELECT invoicenumber FROM invoices ORDER BY invoicenumber DESC LIMIT 1";
//Get the result
$result = $mysqli->query($query);
$row = $result->fetch_assoc();
//If we have a record
if($row){
//New invoice number
$invoicenumber = $row['invoicenumber'] + 1; // post-increment (++) would have returned the old value
//Else (database is empty, so start at the beginning)
}else{
$invoicenumber = 1;
}
//Now we have our invoice number, so do whatever you want with it
/**
* Code here to use the number
* */
//Now we wanna add the new invoice to the database, so
/**
* Add any other info to this statement if you want.
* If any of it is user submitted data, be sure to use prepared statements
* (just look at php.net's documentation on prepared statements)
* w3schools also has some nice tutorials on how to safely insert stuff
* in to a database, so check it all out :)
* */
$query = "INSERT INTO invoices(invoicenumber) VALUES($invoicenumber)";
//Execute the query
if($mysqli->query($query)){
//Show success
echo "Invoice $invoicenumber has been added to the database.";
}else{
//Show error
echo "Unfortunately we could not add invoice $invoicenumber to the database.";
}
//Now we can clear up our resources
$result->free();
$mysqli->close();
Please note: this is a very basic example. Yours will have additions and enhanced security if you are using user submitted data, so please do your homework and make sure that you fully understand each line of this code before you proceed to use it.
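If a database feels heavy for a single counter, a lock-protected file is one alternative. The following is a minimal sketch under stated assumptions, not a drop-in solution: nextInvoiceNumber() and the counter file path are hypothetical, the web server must be able to write to that file, and the four-digit padding only mirrors the 0001, 0002 format asked for in the question.

```php
<?php
// Hypothetical helper: reads, increments and stores a counter kept in
// $counterFile, and returns the new value left-padded with zeros.
function nextInvoiceNumber(string $counterFile, int $width = 4): string
{
    // 'c+' opens for reading and writing, creates the file if it is
    // missing, and does not truncate existing content.
    $handle = fopen($counterFile, 'c+');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $counterFile");
    }
    try {
        flock($handle, LOCK_EX);                  // one request at a time
        $current = (int) stream_get_contents($handle); // empty file counts as 0
        $next    = $current + 1;
        ftruncate($handle, 0);
        rewind($handle);
        fwrite($handle, (string) $next);
        fflush($handle);
        flock($handle, LOCK_UN);
    } finally {
        fclose($handle);
    }
    return str_pad((string) $next, $width, '0', STR_PAD_LEFT);
}
```

Called as nextInvoiceNumber('/path/to/invoice.counter'), the first call on an empty file returns 0001 and the next one 0002. The exclusive lock is what keeps two simultaneous form submissions from receiving the same number.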
I do exactly the same with patient accession numbers on patient reports.
include('/home/user/php/class.pdf2text.php');
$p2t = new PDF2Text();
$p2t->setFilename($pdf);
$p2t->decodePDF();
$data = $p2t->output();
$len = strlen($data);
$pos = strpos($data,$accession);
if ($pos !== false){
$in .= "$accession,";
$checked++;
}
else{
$missingPDF += 1;echo "\n<p> <span class='bold red'>INCORRECT ACCESSION NUMBER c=$row[0] p=$row[1]</span>\n";
}
if ($checked > 0){
$in = substr($in,0,-1) . ')';
$sql = "UPDATE `Patient` SET `PDF`=1 WHERE $in";
}
pdf2text.php
class PDF2Text {
// Some settings
var $multibyte = 4; // Use setUnicode(TRUE|FALSE)
var $convertquotes = ENT_QUOTES; // ENT_COMPAT (double-quotes), ENT_QUOTES (Both), ENT_NOQUOTES (None)
var $showprogress = true; // TRUE if you have problems with time-out
// Variables
var $filename = '';
var $decodedtext = '';
function setFilename($filename) {
// Reset
$this->decodedtext = '';
$this->filename = $filename;
}
function output($echo = false) {
if($echo) echo $this->decodedtext;
else return $this->decodedtext;
}
function setUnicode($input) {
// 4 for unicode. But 2 should work in most cases just fine
if($input == true) $this->multibyte = 4;
else $this->multibyte = 2;
}
function decodePDF() {
// Read the data from pdf file
$infile = @file_get_contents($this->filename, FILE_BINARY);
if (empty($infile))
return "";
// Get all text data.
$transformations = array();
$texts = array();
// Get the list of all objects.
preg_match_all("#obj[\n|\r](.*)endobj[\n|\r]#ismU", $infile . "endobj\r", $objects);
$objects = @$objects[1];
// Select objects with streams.
for ($i = 0; $i < count($objects); $i++) {
$currentObject = $objects[$i];
// Prevent time-out
@set_time_limit(0); // 0 removes the limit; the function requires an argument
if($this->showprogress) {
// echo ". ";
flush(); ob_flush();
}
// Check if an object includes data stream.
if (preg_match("#stream[\n|\r](.*)endstream[\n|\r]#ismU", $currentObject . "endstream\r", $stream )) {
$stream = ltrim($stream[1]);
// Check object parameters and look for text data.
$options = $this->getObjectOptions($currentObject);
if (!(empty($options["Length1"]) && empty($options["Type"]) && empty($options["Subtype"])) )
// if ( $options["Image"] && $options["Subtype"] )
// if (!(empty($options["Length1"]) && empty($options["Subtype"])) )
continue;
// Hack, length doesnt always seem to be correct
unset($options["Length"]);
// So, we have text data. Decode it.
$data = $this->getDecodedStream($stream, $options);
if (strlen($data)) {
if (preg_match_all("#BT[\n|\r](.*)ET[\n|\r]#ismU", $data . "ET\r", $textContainers)) {
$textContainers = @$textContainers[1];
$this->getDirtyTexts($texts, $textContainers);
} else
$this->getCharTransformations($transformations, $data);
}
}
}
// Analyze text blocks taking into account character transformations and return results.
$this->decodedtext = $this->getTextUsingTransformations($texts, $transformations);
}
function decodeAsciiHex($input) {
$output = "";
$isOdd = true;
$isComment = false;
for($i = 0, $codeHigh = -1; $i < strlen($input) && $input[$i] != '>'; $i++) {
$c = $input[$i];
if($isComment) {
if ($c == "\r" || $c == "\n") // double quotes, so the escapes are real control characters
$isComment = false;
continue;
}
switch($c) {
case "\0": case "\t": case "\r": case "\f": case "\n": case ' ': break;
case '%':
$isComment = true;
break;
default:
$code = hexdec($c);
if($code === 0 && $c != '0')
return "";
if($isOdd)
$codeHigh = $code;
else
$output .= chr($codeHigh * 16 + $code);
$isOdd = !$isOdd;
break;
}
}
if($input[$i] != '>')
return "";
if($isOdd)
$output .= chr($codeHigh * 16);
return $output;
}
function decodeAscii85($input) {
$output = "";
$isComment = false;
$ords = array();
for($i = 0, $state = 0; $i < strlen($input) && $input[$i] != '~'; $i++) {
$c = $input[$i];
if($isComment) {
if ($c == "\r" || $c == "\n") // double quotes, so the escapes are real control characters
$isComment = false;
continue;
}
if ($c == "\0" || $c == "\t" || $c == "\r" || $c == "\f" || $c == "\n" || $c == ' ')
continue;
if ($c == '%') {
$isComment = true;
continue;
}
if ($c == 'z' && $state === 0) {
$output .= str_repeat(chr(0), 4);
continue;
}
if ($c < '!' || $c > 'u')
return "";
$code = ord($input[$i]) & 0xff;
$ords[$state++] = $code - ord('!');
if ($state == 5) {
$state = 0;
for ($sum = 0, $j = 0; $j < 5; $j++)
$sum = $sum * 85 + $ords[$j];
for ($j = 3; $j >= 0; $j--)
$output .= chr($sum >> ($j * 8));
}
}
if ($state === 1)
return "";
elseif ($state > 1) {
for ($i = 0, $sum = 0; $i < $state; $i++)
$sum += ($ords[$i] + ($i == $state - 1)) * pow(85, 4 - $i);
for ($i = 0; $i < $state - 1; $i++) {
try {
if(false == ($o = chr($sum >> ((3 - $i) * 8)))) {
throw new Exception('Error');
}
$output .= $o;
} catch (Exception $e) { /*Dont do anything*/ }
}
}
return $output;
}
function decodeFlate($data) {
return @gzuncompress($data);
}
function getObjectOptions($object) {
$options = array();
if (preg_match("#<<(.*)>>#ismU", $object, $options)) {
$options = explode("/", $options[1]);
@array_shift($options);
$o = array();
for ($j = 0; $j < @count($options); $j++) {
$options[$j] = preg_replace("#\s+#", " ", trim($options[$j]));
if (strpos($options[$j], " ") !== false) {
$parts = explode(" ", $options[$j]);
$o[$parts[0]] = $parts[1];
} else
$o[$options[$j]] = true;
}
$options = $o;
unset($o);
}
return $options;
}
function getDecodedStream($stream, $options) {
$data = "";
if (empty($options["Filter"]))
$data = $stream;
else {
$length = !empty($options["Length"]) ? $options["Length"] : strlen($stream);
$_stream = substr($stream, 0, $length);
foreach ($options as $key => $value) {
if ($key == "ASCIIHexDecode")
$_stream = $this->decodeAsciiHex($_stream);
elseif ($key == "ASCII85Decode")
$_stream = $this->decodeAscii85($_stream);
elseif ($key == "FlateDecode")
$_stream = $this->decodeFlate($_stream);
elseif ($key == "Crypt") { // TO DO
}
}
$data = $_stream;
}
return $data;
}
function getDirtyTexts(&$texts, $textContainers) {
for ($j = 0; $j < count($textContainers); $j++) {
if (preg_match_all("#\[(.*)\]\s*TJ[\n|\r]#ismU", $textContainers[$j], $parts))
$texts = array_merge($texts, array(@implode('', $parts[1])));
elseif (preg_match_all("#T[d|w|m|f]\s*(\(.*\))\s*Tj[\n|\r]#ismU", $textContainers[$j], $parts))
$texts = array_merge($texts, array(@implode('', $parts[1])));
elseif (preg_match_all("#T[d|w|m|f]\s*(\[.*\])\s*Tj[\n|\r]#ismU", $textContainers[$j], $parts))
$texts = array_merge($texts, array(@implode('', $parts[1])));
}
}
function getCharTransformations(&$transformations, $stream) {
preg_match_all("#([0-9]+)\s+beginbfchar(.*)endbfchar#ismU", $stream, $chars, PREG_SET_ORDER);
preg_match_all("#([0-9]+)\s+beginbfrange(.*)endbfrange#ismU", $stream, $ranges, PREG_SET_ORDER);
for ($j = 0; $j < count($chars); $j++) {
$count = $chars[$j][1];
$current = explode("\n", trim($chars[$j][2]));
for ($k = 0; $k < $count && $k < count($current); $k++) {
if (preg_match("#<([0-9a-f]{2,4})>\s+<([0-9a-f]{4,512})>#is", trim($current[$k]), $map))
$transformations[str_pad($map[1], 4, "0")] = $map[2];
}
}
for ($j = 0; $j < count($ranges); $j++) {
$count = $ranges[$j][1];
$current = explode("\n", trim($ranges[$j][2]));
for ($k = 0; $k < $count && $k < count($current); $k++) {
if (preg_match("#<([0-9a-f]{4})>\s+<([0-9a-f]{4})>\s+<([0-9a-f]{4})>#is", trim($current[$k]), $map)) {
$from = hexdec($map[1]);
$to = hexdec($map[2]);
$_from = hexdec($map[3]);
for ($m = $from, $n = 0; $m <= $to; $m++, $n++)
$transformations[sprintf("%04X", $m)] = sprintf("%04X", $_from + $n);
} elseif (preg_match("#<([0-9a-f]{4})>\s+<([0-9a-f]{4})>\s+\[(.*)\]#ismU", trim($current[$k]), $map)) {
$from = hexdec($map[1]);
$to = hexdec($map[2]);
$parts = preg_split("#\s+#", trim($map[3]));
for ($m = $from, $n = 0; $m <= $to && $n < count($parts); $m++, $n++)
$transformations[sprintf("%04X", $m)] = sprintf("%04X", hexdec($parts[$n]));
}
}
}
}
function getTextUsingTransformations($texts, $transformations) {
$document = "";
for ($i = 0; $i < count($texts); $i++) {
$isHex = false;
$isPlain = false;
$hex = "";
$plain = "";
for ($j = 0; $j < strlen($texts[$i]); $j++) {
$c = $texts[$i][$j];
switch($c) {
case "<":
$hex = "";
$isHex = true;
$isPlain = false;
break;
case ">":
$hexs = str_split($hex, $this->multibyte); // 2 or 4 (UTF8 or ISO)
for ($k = 0; $k < count($hexs); $k++) {
$chex = str_pad($hexs[$k], 4, "0"); // Add tailing zero
if (isset($transformations[$chex]))
$chex = $transformations[$chex];
$document .= html_entity_decode("&#x".$chex.";");
}
$isHex = false;
break;
case "(":
$plain = "";
$isPlain = true;
$isHex = false;
break;
case ")":
$document .= $plain;
$isPlain = false;
break;
case "\\":
$c2 = $texts[$i][$j + 1];
if (in_array($c2, array("\\", "(", ")"))) $plain .= $c2;
elseif ($c2 == "n") $plain .= '\n';
elseif ($c2 == "r") $plain .= '\r';
elseif ($c2 == "t") $plain .= '\t';
elseif ($c2 == "b") $plain .= '\b';
elseif ($c2 == "f") $plain .= '\f';
elseif ($c2 >= '0' && $c2 <= '9') {
$oct = preg_replace("#[^0-9]#", "", substr($texts[$i], $j + 1, 3));
$j += strlen($oct) - 1;
$plain .= html_entity_decode("&#".octdec($oct).";", $this->convertquotes);
}
$j++;
break;
default:
if ($isHex)
$hex .= $c;
elseif ($isPlain)
$plain .= $c;
break;
}
}
$document .= "\n";
}
return $document;
}
}

auto increasing id

I would like to build a PHP script that automatically generates a new ID by increasing the previous one by 1.
E.g. A0009 becomes A0010 and A9999 becomes B0000.
I have written one that works, but it doesn't go beyond 5 characters:
e.g. Z9999 should go to A00000, and so on.
Any suggestions?
Here is my snippet:
<?php
function replaceChar($string2replace)
{
$charLength = strlen($string2replace)-1;
$charAt = array();
$charAt[4] = substr($string2replace, -1);
$charAt[3] = substr($string2replace, -2,1);
$charAt[2] = substr($string2replace, -3,1);
$charAt[1] = substr($string2replace, -4,1);
$charAt[0] = substr($string2replace, 0,1);
if($charAt[4] < 9)
{
$string2replace = substr_replace($string2replace,$charAt[4]+1,$charLength);
}
else
{
$charAt[4] = 0;
$string2replace = substr_replace($string2replace,$charAt[4],$charLength);
if($charAt[3] < 9)
{
$string2replace = substr_replace($string2replace,$charAt[3]+1,$charLength- 1,1);
}
else
{
$charAt[3] = 0;
$string2replace = substr_replace($string2replace,$charAt[3],$charLength-1,1);
if($charAt[2] < 9)
{
$string2replace = substr_replace($string2replace,$charAt[2]+1,$charLength-2,1);
}
else
{
$charAt[2] = 0;
$string2replace = substr_replace($string2replace,$charAt[2],$charLength-2,1);
if($charAt[1] < 9)
{
$string2replace = substr_replace($string2replace,$charAt[1]+1,$charLength-3,1);
}
else
{
$charAt[1] = 0;
$string2replace = substr_replace($string2replace,$charAt[1],$charLength-3,1);
}
if($charAt[0] < 'z')
{
$charAt[0] ++;
$string2replace = substr_replace($string2replace,$charAt[0],$charLength-4,1);
}
else
{
$charAt[0] = 'a';
$string2replace = substr_replace($string2replace,$charAt[0],$charLength-4,1);
}
}
}
}
return $string2replace;
}
$string2begin = 'A9999';
$generatedString = replaceChar($string2begin);
echo $string2begin . "<br />" . $generatedString;
?>
Your ID numbering scheme seems rather contrived, where the high-order digit is A-Z and the remaining digits are 0-9. If I understand that pattern correctly, this seems to do the trick:
function incrementID($id)
{
    $letter = $id[0];
    $number = substr($id, 1);
    $newNum = str_pad($number + 1, strlen($number), '0', STR_PAD_LEFT);
    // increase number only
    if (strlen($number) == strlen($newNum))
        return $letter . $newNum;
    // increase ID length ('Z' to 'A')
    if ($letter == 'Z')
        return 'A' . str_repeat('0', strlen($number) + 1);
    // change letter
    $newLetter = chr(ord($letter) + 1);
    return $newLetter . str_repeat('0', strlen($number));
}
printf("%s\n", incrementID('A0009')); // 'A0010'
printf("%s\n", incrementID('A9999')); // 'B0000'
printf("%s\n", incrementID('Z9999')); // 'A00000'
Even though your examples didn't fit this, I first assumed you really just wanted a base-36 number (any digit could be 0-9,A-Z, where A is 10 and Z is 35). Working with numbers in base-36 is easy because you can use base_convert() to convert them to customary base-10. This is all you would need to do to increment base-36 numbers:
function incrementBase36($id)
{
    $numVal = base_convert($id, 36, 10);
    $newId = base_convert($numVal + 1, 10, 36);
    return strtoupper($newId);
}
printf("%s\n", incrementBase36('A0009')); // 'A000A'
printf("%s\n", incrementBase36('A9999')); // 'A999A'
printf("%s\n", incrementBase36('Z9999')); // 'Z999A'
printf("%s\n", incrementBase36('AZZZZ')); // 'B0000'
printf("%s\n", incrementBase36('ZZZZZ')); // '100000'
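One caveat with the base-36 approach: base_convert() drops leading zeros, so an ID such as 0000Z would come back as 10 rather than 00010. A minimal sketch that pads the result back to the original width follows. The name incrementBase36Padded is hypothetical (it is not part of the answer above), and the sketch assumes IDs short enough for base_convert() to handle without losing precision.

```php
<?php
// Hypothetical helper: increment a base-36 ID while keeping its width.
// Assumes short IDs; base_convert() can lose precision on very large numbers.
function incrementBase36Padded(string $id): string
{
    $numVal = (int) base_convert($id, 36, 10);
    $newId  = strtoupper(base_convert((string) ($numVal + 1), 10, 36));
    // Left-pad with zeros; a result longer than the input is returned as-is.
    return str_pad($newId, strlen($id), '0', STR_PAD_LEFT);
}

printf("%s\n", incrementBase36Padded('0000Z')); // '00010'
printf("%s\n", incrementBase36Padded('A0009')); // 'A000A'
printf("%s\n", incrementBase36Padded('ZZZZZ')); // '100000'
```

When the value overflows the current width (ZZZZZ), str_pad() leaves the longer string untouched, so the ID simply grows by one character.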

optimizing a php function that trims strings

I programmed this PHP function that takes any text/HTML string and trims it.
For example:
gen_string("Hello, how are you today?",10);
Returns:
Hello, how...
The problem arises when the function string limit is the same as the position of a special character such as: á, ñ, etc...
In which case:
gen_string("Helló my friend",5);
Returns: Hell�...
Any ideas on how to solve this issue? This is the current function:
# string: advanced substr
function gen_string($string,$min,$clean=false) {
$text = trim(strip_tags($string));
if(strlen($text)>$min) {
$blank = strpos($text,' ');
if($blank) {
# limit plus last word
$extra = strpos(substr($text,$min),' ');
$max = $min+$extra;
$r = substr($text,0,$max);
if(strlen($text)>=$max && !$clean) $r=trim($r,'.').'...';
} else {
# if there are no spaces
$r = substr($text,0,$min).'...';
}
} else {
# if original length is lower than limit
$r = $text;
}
return trim($r);
}
Thanks!
You should use the multibyte string functions to handle Unicode characters correctly.
For example you could try using mb_strimwidth to truncate a string to a specified length.
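A minimal sketch of the mb_strimwidth() route, assuming the input is UTF-8 and the mbstring extension is loaded. The name truncate_utf8 is hypothetical. Note that the width passed to mb_strimwidth() includes the trim marker, and that this cuts at a fixed width rather than at a word boundary.

```php
<?php
// Hypothetical helper; requires the mbstring extension.
// $maxWidth is the total width of the result, including $ending.
function truncate_utf8(string $string, int $maxWidth, string $ending = '...'): string
{
    $text = trim(strip_tags($string));
    // Strings that already fit within $maxWidth are returned unchanged.
    return mb_strimwidth($text, 0, $maxWidth, $ending, 'UTF-8');
}

echo truncate_utf8("Helló my friend", 8), "\n";            // Helló...
echo truncate_utf8("Hello, how are you today?", 13), "\n"; // Hello, how...
echo truncate_utf8("Hello", 10), "\n";                     // Hello
```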
You could also take a different approach and make use of the PCRE regex extension's UTF-8 capabilities (assuming your strings are UTF-8!).
function gen_string($string, $length)
{
    $str = trim(strip_tags($string));
    // Count UTF-8 characters, not bytes (utf8_decode() is deprecated as of PHP 8.2)
    $strlen = preg_match_all('/./su', $str);
    // String is no longer than the limit
    if ($strlen <= $length) return $str;
    // Shorten string, preserving whole "words" (non-whitespace)
    preg_match('/^.{'.($length-1).'}\S*/su', $str, $match);
    // Append ellipsis if needed (byte length is OK to check)
    if (strlen($match[0]) !== strlen($str)) $match[0] .= '...';
    return $match[0];
}
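Aside from the multibyte issue, maybe you can write it shorter:
function gen_string($str, $limit) {
    // Nothing to trim if the string already fits
    if (strlen($str) <= $limit)
        return $str;
    // Negative offset: search backwards for the last space at or before $limit
    $offset = -(strlen($str) - $limit);
    return substr($str, 0, strrpos($str, ' ', $offset)).'...';
}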
It will limit the length of the string, so rather than cut it after the first word beyond the limit, it ensures that the length is never larger than the limit.
strlen() cannot be used on UTF-8 strings because it counts bytes, so the continuation bytes of multibyte characters are counted too, when they should not be.
You can try with the following code:
define('PREG_CLASS_UNICODE_WORD_BOUNDARY',
'\x{0}-\x{2F}\x{3A}-\x{40}\x{5B}-\x{60}\x{7B}-\x{A9}\x{AB}-\x{B1}\x{B4}' .
'\x{B6}-\x{B8}\x{BB}\x{BF}\x{D7}\x{F7}\x{2C2}-\x{2C5}\x{2D2}-\x{2DF}' .
'\x{2E5}-\x{2EB}\x{2ED}\x{2EF}-\x{2FF}\x{375}\x{37E}-\x{385}\x{387}\x{3F6}' .
'\x{482}\x{55A}-\x{55F}\x{589}-\x{58A}\x{5BE}\x{5C0}\x{5C3}\x{5C6}' .
'\x{5F3}-\x{60F}\x{61B}-\x{61F}\x{66A}-\x{66D}\x{6D4}\x{6DD}\x{6E9}' .
'\x{6FD}-\x{6FE}\x{700}-\x{70F}\x{7F6}-\x{7F9}\x{830}-\x{83E}' .
'\x{964}-\x{965}\x{970}\x{9F2}-\x{9F3}\x{9FA}-\x{9FB}\x{AF1}\x{B70}' .
'\x{BF3}-\x{BFA}\x{C7F}\x{CF1}-\x{CF2}\x{D79}\x{DF4}\x{E3F}\x{E4F}' .
'\x{E5A}-\x{E5B}\x{F01}-\x{F17}\x{F1A}-\x{F1F}\x{F34}\x{F36}\x{F38}' .
'\x{F3A}-\x{F3D}\x{F85}\x{FBE}-\x{FC5}\x{FC7}-\x{FD8}\x{104A}-\x{104F}' .
'\x{109E}-\x{109F}\x{10FB}\x{1360}-\x{1368}\x{1390}-\x{1399}\x{1400}' .
'\x{166D}-\x{166E}\x{1680}\x{169B}-\x{169C}\x{16EB}-\x{16ED}' .
'\x{1735}-\x{1736}\x{17B4}-\x{17B5}\x{17D4}-\x{17D6}\x{17D8}-\x{17DB}' .
'\x{1800}-\x{180A}\x{180E}\x{1940}-\x{1945}\x{19DE}-\x{19FF}' .
'\x{1A1E}-\x{1A1F}\x{1AA0}-\x{1AA6}\x{1AA8}-\x{1AAD}\x{1B5A}-\x{1B6A}' .
'\x{1B74}-\x{1B7C}\x{1C3B}-\x{1C3F}\x{1C7E}-\x{1C7F}\x{1CD3}\x{1FBD}' .
'\x{1FBF}-\x{1FC1}\x{1FCD}-\x{1FCF}\x{1FDD}-\x{1FDF}\x{1FED}-\x{1FEF}' .
'\x{1FFD}-\x{206F}\x{207A}-\x{207E}\x{208A}-\x{208E}\x{20A0}-\x{20B8}' .
'\x{2100}-\x{2101}\x{2103}-\x{2106}\x{2108}-\x{2109}\x{2114}' .
'\x{2116}-\x{2118}\x{211E}-\x{2123}\x{2125}\x{2127}\x{2129}\x{212E}' .
'\x{213A}-\x{213B}\x{2140}-\x{2144}\x{214A}-\x{214D}\x{214F}' .
'\x{2190}-\x{244A}\x{249C}-\x{24E9}\x{2500}-\x{2775}\x{2794}-\x{2B59}' .
'\x{2CE5}-\x{2CEA}\x{2CF9}-\x{2CFC}\x{2CFE}-\x{2CFF}\x{2E00}-\x{2E2E}' .
'\x{2E30}-\x{3004}\x{3008}-\x{3020}\x{3030}\x{3036}-\x{3037}' .
'\x{303D}-\x{303F}\x{309B}-\x{309C}\x{30A0}\x{30FB}\x{3190}-\x{3191}' .
'\x{3196}-\x{319F}\x{31C0}-\x{31E3}\x{3200}-\x{321E}\x{322A}-\x{3250}' .
'\x{3260}-\x{327F}\x{328A}-\x{32B0}\x{32C0}-\x{33FF}\x{4DC0}-\x{4DFF}' .
'\x{A490}-\x{A4C6}\x{A4FE}-\x{A4FF}\x{A60D}-\x{A60F}\x{A673}\x{A67E}' .
'\x{A6F2}-\x{A716}\x{A720}-\x{A721}\x{A789}-\x{A78A}\x{A828}-\x{A82B}' .
'\x{A836}-\x{A839}\x{A874}-\x{A877}\x{A8CE}-\x{A8CF}\x{A8F8}-\x{A8FA}' .
'\x{A92E}-\x{A92F}\x{A95F}\x{A9C1}-\x{A9CD}\x{A9DE}-\x{A9DF}' .
'\x{AA5C}-\x{AA5F}\x{AA77}-\x{AA79}\x{AADE}-\x{AADF}\x{ABEB}' .
'\x{D800}-\x{F8FF}\x{FB29}\x{FD3E}-\x{FD3F}\x{FDFC}-\x{FDFD}' .
'\x{FE10}-\x{FE19}\x{FE30}-\x{FE6B}\x{FEFF}-\x{FF0F}\x{FF1A}-\x{FF20}' .
'\x{FF3B}-\x{FF40}\x{FF5B}-\x{FF65}\x{FFE0}-\x{FFFD}');
function utf8_strlen($text) {
if (function_exists('mb_strlen')) {
return mb_strlen($text);
}
// Do not count UTF-8 continuation bytes.
return strlen(preg_replace("/[\x80-\xBF]/", '', $text));
}
function utf8_truncate($string, $max_length, $wordsafe = FALSE, $add_ellipsis = FALSE, $min_wordsafe_length = 1) {
$ellipsis = '';
$max_length = max($max_length, 0);
$min_wordsafe_length = max($min_wordsafe_length, 0);
if (utf8_strlen($string) <= $max_length) {
// No truncation needed, so don't add ellipsis, just return.
return $string;
}
if ($add_ellipsis) {
// Truncate ellipsis in case $max_length is small.
$ellipsis = utf8_substr('...', 0, $max_length);
$max_length -= utf8_strlen($ellipsis);
$max_length = max($max_length, 0);
}
if ($max_length <= $min_wordsafe_length) {
// Do not attempt word-safe if lengths are bad.
$wordsafe = FALSE;
}
if ($wordsafe) {
$matches = array();
// Find the last word boundary, if there is one within $min_wordsafe_length
// to $max_length characters. preg_match() is always greedy, so it will
// find the longest string possible.
$found = preg_match('/^(.{' . $min_wordsafe_length . ',' . $max_length . '})[' . PREG_CLASS_UNICODE_WORD_BOUNDARY . ']/u', $string, $matches);
if ($found) {
$string = $matches[1];
}
else {
$string = utf8_substr($string, 0, $max_length);
}
}
else {
$string = utf8_substr($string, 0, $max_length);
}
if ($add_ellipsis) {
$string .= $ellipsis;
}
return $string;
}
function utf8_substr($text, $start, $length = NULL) {
if (function_exists('mb_substr')) {
return $length === NULL ? mb_substr($text, $start) : mb_substr($text, $start, $length);
}
else {
$strlen = strlen($text);
// Find the starting byte offset.
$bytes = 0;
if ($start > 0) {
// Count all the continuation bytes from the start until we have found
// $start characters or the end of the string.
$bytes = -1;
$chars = -1;
while ($bytes < $strlen - 1 && $chars < $start) {
$bytes++;
$c = ord($text[$bytes]);
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
}
}
}
elseif ($start < 0) {
// Count all the continuation bytes from the end until we have found
// abs($start) characters.
$start = abs($start);
$bytes = $strlen;
$chars = 0;
while ($bytes > 0 && $chars < $start) {
$bytes--;
$c = ord($text[$bytes]);
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
}
}
}
$istart = $bytes;
// Find the ending byte offset.
if ($length === NULL) {
$iend = $strlen;
}
elseif ($length > 0) {
// Count all the continuation bytes from the starting index until we have
// found $length characters or reached the end of the string, then
// backtrace one byte.
$iend = $istart - 1;
$chars = -1;
$last_real = FALSE;
while ($iend < $strlen - 1 && $chars < $length) {
$iend++;
$c = ord($text[$iend]);
$last_real = FALSE;
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
$last_real = TRUE;
}
}
// Backtrace one byte if the last character we found was a real character
// and we don't need it.
if ($last_real && $chars >= $length) {
$iend--;
}
}
elseif ($length < 0) {
// Count all the continuation bytes from the end until we have found
// abs($length) characters, then backtrace one byte.
$length = abs($length);
$iend = $strlen;
$chars = 0;
while ($iend > 0 && $chars < $length) {
$iend--;
$c = ord($text[$iend]);
if ($c < 0x80 || $c >= 0xC0) {
$chars++;
}
}
// Backtrace one byte if we are not at the beginning of the string.
if ($iend > 0) {
$iend--;
}
}
else {
// $length == 0, return an empty string.
return '';
}
return substr($text, $istart, max(0, $iend - $istart + 1));
}
}
For your return statement you could try:
return htmlspecialchars(trim($r));
EDIT: I tried your code as you provided it and it ran fine for me without having to use htmlspecialchars(). This is probably due to the fact that the charset was set to UTF-8 in the <head> of the page the code was running on. So your options could be to set the encoding of the page like this:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
or to use htmlspecialchars() as above.

Restricting access to a site using IP address

I would like to know whether there is a way to restrict a site so that users can only access its inner pages if they are within a certain range of IP addresses or on a certain network.
The PHP scripts I have found so far can't differentiate real IPs from proxies.
Thanks
I wouldn't restrict on IP addresses. As you said, you can't know if it's a proxy. Furthermore, IP addresses can be easily spoofed.
Have you considered using Apache .htaccess files for that?
IP restriction with htaccess
You can try out a script I created that allows very advanced IP rules. I coded it years ago so I apologize in advance for the current shape of it.
Edit:
If you're looking for an "&" operator in the syntax don't bother. I forgot to add it when I coded this and looking back at this script now makes me cringe at the thought of touching it again.
<?php
##############################################################
# IP Expression Class #
# Easy IP-based Access Restrictions #
# Change Log: #
# - Added Range and limited IPv6 support #
# - Changed name from IPAR to IPEX #
# #
##############################################################
# Example Rules: #
# 69.[10-20].[^50].* #
# 69.*.[1-5 | 10-20 |^30].* #
# 60.12.2.* #
# 127.* #
# 69.1.1.1-70.1.1.1 <-- This is a range #
# #
# Usage: #
# Ipex::IsMatch($rule, $ip); #
# #
# [range] - Defines a range for a section of the IP #
# | - OR token. IP can match this range/number #
# ^ - NOT token. IP can not match this range/number #
# x-y - Defines a range from x to y #
# x - Exactly match x (x = a hex or dec number) #
# * - Match any number #
# #
#----------===============================-------------------#
# [ Written by Chris Tarquini ] #
#----------===============================-------------------#
##############################################################
define('IPR_DENY', false);
define('IPR_ALLOW', true);
define('IPR_ERR_MISMATCH',-1);
define('IPR_ERR_RANGE_MISMATCH',-2);
define('IPR_ERR_RANGE_INVALID',-3);
define('IPR_ERR_INVALID_RULE',-4);
define('IPR_ERR_IPv6_NOTSUPPORTED',-5); // returned by IsMatch() for IPv6 ranges
class IPEX
{
const TOKEN_RANGE_BEGIN = '[';
const TOKEN_RANGE_END = ']';
const TOKEN_WILDCARD = '*';
const TOKEN_RANGE_SPLIT = '-';
const TOKEN_OR = '|';
const TOKEN_NOT = '^';
const DEBUG_MODE = TRUE;
private static function trace($err){if(self::DEBUG_MODE) echo "$err\r\n";}
private static function FixRule($rule,$count = 4, $split='.')
{
$rule = explode($split,$rule);
$filler = 0;
$size = sizeof($rule);
for($i = 0; $i < $count; $i++)
{
if($i > $size) { $rule[] = $filler; $size++;}
else if(empty($rule[$i])) { $filler = self::TOKEN_WILDCARD; $rule[$i] = $filler;}
}
return $rule;
}
private static function FixIP($rule,$count = 4, $split='.')
{
$rule = explode($split,$rule);
$size = sizeof($rule);
for($i = 0; $i < $count; $i++)
{
if($i > $size) { $rule[] = 0; $size++;}
else if(empty($rule[$i])) { $rule[$i] = 0;}
}
return $rule;
}
private static function GetIpType(&$ip)
{
$mode = IPID::Identify($ip,$newip);
if($mode == IPID_IPv4_Embed) { $ip = $newip; return IPID_IPv4;}
return $mode;
}
private static function FixIPRange(&$start, &$stop)
{
$count = 4; $split = '.';
if(self::GetIpType($start) == IPID_IPv6) {$count = 8; $split = ':';}
$q = 0;
while($q < 2)
{
$filler = ($q == 0) ? 0 : 255;
$arr = explode($split,($q == 0) ? $start : $stop);
$size = sizeof($arr);
for($i = 0; $i < $count; $i++)
{
if($i > $size){ $arr[] = $filler; $size++;}
else if(empty($arr[$i])){ $arr[$i] = $filler; }
}
if($q == 0) $start = implode($split, $arr);
else $stop = implode($split,$arr);
$q++;
}
}
public static function IsInRange($start, $stop, $ip)
{
//Sorry guys we only support IPv4 for this ;(
self::FixIPRange($start,$stop);
self::trace("fixed: start = $start, stop = $stop");
$start = ip2long($start); $stop = ip2long($stop);
$ip = ip2long($ip);
self::trace("start = $start, stop = $stop, ip = $ip");
return ($ip >= $start && $ip <= $stop);
}
public static function IsAllowed($rule, $ip){return self::IsMatch($rule,$ip);}
public static function IsMatch($rule,$ip)
{
$mode = self::GetIpType($ip);
self::trace("ip type: $mode");
if(strpos($rule, self::TOKEN_RANGE_SPLIT) !== false && strpos($rule,self::TOKEN_RANGE_BEGIN) === false)
{
self::trace("ip range mode");
$test = explode(self::TOKEN_RANGE_SPLIT, $rule);
self::trace("range size: ".sizeof($test));
self::trace(print_r($test, true));
if(sizeof($test) != 2) return IPR_ERR_RANGE_INVALID;
$start = $test[0]; $end = $test[1];
if(empty($start) || empty($end)) return IPR_ERR_RANGE_INVALID;
self::trace("range start: $start, range stop: $end");
$rm1 = (self::IsHex($start)) ? $mode : self::GetIpType($start);
$rm2 = (self::IsHex($end)) ? $mode : self::GetIpType($end);
self::trace("range types: $rm1, $rm2\r\nip type: $mode");
if($rm1 != $rm2 || $rm1 != $mode) return IPR_ERR_RANGE_MISMATCH;
if($mode == IPID_IPv6) { return IPR_ERR_IPv6_NOTSUPPORTED;}
return self::IsInRange($start,$end,$ip);
}
if(self::GetIpType($rule) != $mode) return IPR_ERR_MISMATCH;
//all is good so far
$count = 4;
$split = '.'; if($mode==IPID_IPv6){$count = 8; $split=':';}
$rule = self::FixRule($rule, $count,$split);
$ip = self::FixIp($ip,$count,$split);
self::trace("ip: ".implode($split,$ip));
self::trace('rule: '.implode($split,$rule));
for($i = 0; $i < $count; $i++)
{
$r = str_replace(' ', '', $rule[$i]);
$ri = false;
if($r == self::TOKEN_WILDCARD) continue;
if($mode == IPID_IPv6 && self::IsHex($r)) { $ri = hexdec($r);}else if(is_numeric($r)) $ri = $r;
$x = $ip[$i];
if($mode == IPID_IPv6) $x = hexdec($x);
//* Exact Match *//
self::trace("rule[$i]: $ri");
self::trace("ip[$i]: $x");
if($ri !== false && $ri != $x) return IPR_DENY;
$len = strlen($r);
for($y = 0; $y < $len; $y++)
{
self::trace("y = $y");
if(substr($r, $y,1) == self::TOKEN_RANGE_BEGIN)
{
++$y;
self::trace("found range, y = $y");
$negflag = false;
$start = false;
$stop = false;
$allows = 0;
$denys = 0;
$q = 0;
$c = substr($r,$y,1);
while($c !== false)
{
self::trace("in range, char: $c");
//* Flags *//
$break = false;
$exec = false;
$toggle = false;
$reset = false;
if($c === self::TOKEN_RANGE_END) {$skiphex = true;$break = true; $exec = true; self::trace("found end of range");}
if($c === self::TOKEN_NOT) {if($q > 0){ $toggle = true; $exec = true;} else $negflag = !$negflag; $skiphex =false; self::trace("found TOKEN_NOT");}
if($c === self::TOKEN_OR) { $exec = true; $reset = true;$skiphex=true;self::trace("found TOKEN_OR");}
if($c === self::TOKEN_RANGE_SPLIT){ $skiphex = false;++$q; self::trace("found range split");}
//* Read Hex Tokens *//
if(!$skiphex && self::IsHexChar($c))
{
$n = self::ReadNextHexToken($r,$y);
if($mode == IPID_IPv6) $n = hexdec($n);
if($q == 0) $start = $n;
else if($q == 1) $stop = $n;
--$y; //fixes error
self::trace("parsed number: $n, y = $y");
}
if($reset) {$negflag = false; $start = false; $stop = false; $q = 0;}
if($exec)
{
self::trace("executing: start = $start, stop = $stop, x = $x");
self::trace("negflag = $negflag");
if($stop !== false && $x >= $start && $x <= $stop)
{
if($negflag) { ++$denys; $allows = 0; break;}
else ++$allows;
}
else if($stop === false && $start == $x)
{
if($negflag) { ++$denys; $allows = 0; break;}
else ++$allows;
}
self::trace("exec complete: allows = $allows, denys = $denys");
$q = 0;
}
if($toggle) $negflag = !$negflag;
if($break) break;
++$y;
$c = substr($r,$y,1);
}
if(!$allows) return IPR_DENY;
}
}
}
return IPR_ALLOW;
}
private static function ReadNextHexToken($buff, &$offset, $max = -1)
{
$str = '';
if($max == -1) { $max = strlen($buff);}
for(; $offset < $max; $offset++)
{
$c = substr($buff,$offset, 1);
if(self::IsHexChar($c))
$str .= $c;
else
return $str;
}
return $str;
}
private static function IsHex($x){ $len = strlen($x); for($i = 0; $i < $len; $i++) if(!self::IsHexChar(substr($x,$i,1))) return false; return true;}
private static function IsHexChar($x){self::trace("isHex($x);"); return (in_array(strtoupper($x),array('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F')));
}
}
######################
# IP Identify Class #
#####################
define('IPID_INVALID',false);
define('IPID_IPv4',2);
define('IPID_IPv6',3);
define('IPID_IPv4_Embed',6);
class IPID
{
public static function Identify($ip,&$ipconvert = false)
{
$ip = strtoupper($ip);
$ipconvert = $ip;
// Check if we are IPv4
if(strpos($ip,':') === false && strpos($ip,'.') !== false)
return IPID_IPv4;
//Is it one of those hybrids?
else if(strpos($ip,':FFFF') !== false && strpos($ip,'.') !== false)
{
$ipconvert = substr($ip,strpos($ip,':FFFF:')+6);
return IPID_IPv4_Embed;
}
// Is it IPv6?
else if(strpos($ip,':') !== false) return IPID_IPv6;
// What the...?
return IPID_INVALID;
}
}
?>
You can use it as long as you don't try and resell it and you keep the header as is.
<?php
//This function returns True if visitor IP is allowed.
//Otherwise it returns False.
function CheckAccess()
{
//allowed IP. Change it to your static IP
$allowedip = '127.0.0.1';
$ip = $_SERVER['REMOTE_ADDR'];
return ($ip == $allowedip);
}
Proxy servers should set the X-Forwarded-For HTTP header, which you could look up with $_SERVER['HTTP_X_FORWARDED_FOR']. Otherwise $_SERVER['REMOTE_ADDR'] can be used to get the IP address. As others have noted, both of these can be easily spoofed, and there is no requirement for proxies to set the X-Forwarded-For request header.
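A minimal sketch of that lookup. It only honours X-Forwarded-For when the direct peer (REMOTE_ADDR) is a proxy you control, since anyone can send that header. The function name clientIp, the $trustedProxies list and the 10.0.0.1 address are hypothetical placeholders, and the sketch assumes the trusted proxy appends the address it saw to the end of the header.

```php
<?php
// Hypothetical helper: pick the client IP from a $_SERVER-like array.
// X-Forwarded-For is only used when the request came from a proxy
// listed in $trustedProxies (assumption: you know your proxies' IPs).
function clientIp(array $server, array $trustedProxies = []): ?string
{
    $remote = $server['REMOTE_ADDR'] ?? null;
    if ($remote !== null
        && in_array($remote, $trustedProxies, true)
        && !empty($server['HTTP_X_FORWARDED_FOR'])
    ) {
        // The header may hold a comma-separated chain: "client, proxy1, ...".
        // The last entry is the one appended by the trusted proxy itself.
        $chain = array_map('trim', explode(',', $server['HTTP_X_FORWARDED_FOR']));
        $candidate = end($chain);
        if (filter_var($candidate, FILTER_VALIDATE_IP) !== false) {
            return $candidate;
        }
    }
    // Fall back to the address of the direct peer.
    return $remote;
}

// Usage with the real superglobal (10.0.0.1 is a placeholder proxy address):
// $ip = clientIp($_SERVER, ['10.0.0.1']);
```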
There is an ip2long() function in PHP which gives you an integer to use for range checking.
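A minimal sketch of such a range check, for IPv4 only. The name ipv4InRange is hypothetical, and the sketch assumes a 64-bit PHP build, where ip2long() always returns non-negative integers.

```php
<?php
// Hypothetical helper: is $ip inside the inclusive IPv4 range $start..$end?
function ipv4InRange(string $ip, string $start, string $end): bool
{
    $ipNum    = ip2long($ip);
    $startNum = ip2long($start);
    $endNum   = ip2long($end);
    // ip2long() returns false for anything that is not a valid IPv4 address.
    if ($ipNum === false || $startNum === false || $endNum === false) {
        return false;
    }
    return $ipNum >= $startNum && $ipNum <= $endNum;
}

var_dump(ipv4InRange('192.168.1.42', '192.168.1.0', '192.168.1.255')); // bool(true)
var_dump(ipv4InRange('192.168.2.1', '192.168.1.0', '192.168.1.255'));  // bool(false)
```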
To get the location of an IP address you need a lookup table which maps IP address ranges to approximate geographical locations (such lookup tables are typically not free). There are many services which offer IP address geolocation.
