I have been trying to improve this PHP script so that it generates every value from a to 9999999999.....9999999999 (up to 72 characters) for insertion into MySQL. So far it stops at 999. I have increased Apache's memory limit and the script execution time, but the result stays the same. Here is my script:
<?php
function evol($length = 1, $deb_chaine = '') {
$tab=array("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","0","1","2","3","4","5","6","7","8","9");
$str = '';
if(strlen($deb_chaine) <= ($length - 1)) {
foreach($tab as $lettre) {
$str .= ' '. $deb_chaine . $lettre;
}
if($deb_chaine == '') {
$str .= evol($length, 'a');
}
else { // otherwise
$last = substr($deb_chaine, -1);
$reste = substr($deb_chaine, 0, -1);
if($last == "9") {
$i = strlen($deb_chaine) - 1;
$reste = "";
while($i >= 0) {
if($deb_chaine[$i] == "9") {
$reste = 'a'. $reste;
}
else {
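// keep the untouched left part, bump this character, reset what follows to 'a'
$reste = substr($deb_chaine, 0, $i) . $tab[(array_search($deb_chaine[$i], $tab) + 1)] . substr($reste, 0, -1);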
break 1;
}
$i--;
}
$new = 'a';
}
else {
$new = $tab[(array_search($last, $tab) + 1)];
}
$str .= evol($length, ($reste . $new));
}
}
return $str;
}
echo evol(72);
?>
This code only produces the values from a to 999.
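One possible direction, sketched under assumptions: the function above recurses once per prefix and collects everything in a single string, so call depth and memory both grow with the number of values. Treating the string as a base-36 counter and yielding values one at a time avoids both. The names `nextCode()` and `generateCodes()` are hypothetical. Note that the full range up to 72 characters holds roughly 36^72 (about 10^112) values, so it cannot be enumerated or stored in practice; the sketch only shows how to step through the sequence lazily.

```php
<?php
// Hypothetical helper: returns the value that follows $code in the
// sequence a, b, ..., z, 0, ..., 9, aa, ab, ...
function nextCode(string $code): string {
    $alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789';
    $last = strlen($alphabet) - 1;
    for ($i = strlen($code) - 1; $i >= 0; $i--) {
        $pos = strpos($alphabet, $code[$i]);
        if ($pos < $last) {
            // No carry needed: bump this character and stop.
            $code[$i] = $alphabet[$pos + 1];
            return $code;
        }
        // Character was '9': wrap to 'a' and carry to the left.
        $code[$i] = $alphabet[0];
    }
    // Every position wrapped, so the string grows by one character.
    return $alphabet[0] . $code;
}

// Lazily yields codes from 'a' up to $maxLength characters.
function generateCodes(int $maxLength): Generator {
    for ($code = 'a'; strlen($code) <= $maxLength; $code = nextCode($code)) {
        yield $code;
    }
}

foreach (generateCodes(2) as $code) {
    echo $code, ' ';   // a b ... 9 aa ab ... 99
}
```

Because the generator keeps only the current value in memory, each code can be inserted into MySQL as it is produced instead of being concatenated first.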
I want to print a string with these colors:
yellow, when a char is an odd number;
blue, when a char is an even number;
red, when a char is a vowel;
green, when the char is a consonant.
For example: $string = "hi12";
h = green
i = red
1 = yellow
2 = blue
I've tried with this but it does not seem to work:
$string = "hi12";
$strCol = "";
$char = "";
$color = "";
for($i = 0; $i < strlen($string); $i++){
$char = $string[$i];
if(is_numeric($char)){
if(($char % 2) == 1){
$color = "<p style='color:yellow;'>" + $char + "</p>";
$strCol .= $color;
}
else if(($char % 2) == 0){
$color = "<p style='color:blue;'>" + $char + "</p>";
$strCol .= $color;
}
}
else{
if(preg_match('/^[aeiou]/i', $char)){
$color = "<p style='color:red;'>" + $char + "</p>";
$strCol .= $color;
}
else{
$color = "<p style='color:green;'>" + $char + "</p>";
$strCol .= $color;
}
}
}
echo $strCol;
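A likely cause, sketched under assumptions: in PHP `+` is the arithmetic operator and `.` concatenates strings, so the lines that build `$color` above do not produce markup. `<p>` is also a block element, which would put each character on its own line, whereas `<span>` keeps them inline. The function name `colorize()` is hypothetical.

```php
<?php
// Hypothetical helper: wraps each character of $string in a coloured <span>.
function colorize(string $string): string {
    $out = '';
    foreach (str_split($string) as $char) {
        if (ctype_digit($char)) {
            // Odd digits are yellow, even digits are blue.
            $color = ((int) $char % 2 === 1) ? 'yellow' : 'blue';
        } elseif (preg_match('/^[aeiou]$/i', $char)) {
            $color = 'red';      // vowel
        } elseif (ctype_alpha($char)) {
            $color = 'green';    // consonant
        } else {
            // Anything else (spaces, punctuation) is left unstyled.
            $out .= htmlspecialchars($char);
            continue;
        }
        $out .= "<span style='color:$color;'>$char</span>";
    }
    return $out;
}

echo colorize('hi12');
```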
$isGreen = false;
$isRed = false;
$isYellow = false;
$isBlue = false;
$string = 'some text with space';
$letterCount = strlen($string) - substr_count($string, ' ');
if (0 == $letterCount % 4) {
$isBlue = true;
} elseif (1 == $letterCount % 4) {
$isGreen = true;
} elseif (2 == $letterCount % 4) {
$isRed = true;
} else {
$isYellow = true;
}
for ($i = strlen($string) - 1; $i >= 0; $i--) {
$letter = substr($string, $i, 1);
if (' ' == $letter) {
continue;
}
if ($isGreen) {
$colour = 'green';
$isGreen = !$isGreen;
$isBlue = !$isBlue;
} elseif ($isRed) {
$colour = 'red';
$isGreen = !$isGreen;
$isRed = !$isRed;
} elseif ($isYellow) {
$colour = 'yellow';
$isRed = !$isRed;
$isYellow = !$isYellow;
} else {
$colour = 'blue';
$isBlue = !$isBlue;
$isYellow = !$isYellow;
}
$string = substr_replace($string, sprintf('<span style="color:%s">%s</span>', $colour, $letter) , $i, 1);
}
echo $string;
My company receives coupon codes from another company, and I need to process them for use on my company's website. The problem is that they send me a PDF file with the codes, and they say their system can only export to PDF. Strange.
I've told them several times that I need those coupons in plain text separated by something (a.k.a. CSV), and they don't take any notice.
My website is written in PHP and uses MySQL. I would like to offer a way to upload that coupons file and add the codes to the database. It would be easy if it were a CSV file, but it's not.
Is there any way I can programmatically (not manually) process PDF files as text files? Or is there any workaround for this situation?
I have found pdf2text to work well for quite some time.
It will not work with image-only (TIFF) PDFs; it works with text-searchable PDFs.
include('/home/user/php/class.pdf2text.php');
$p2t = new PDF2Text();
$p2t ->setFilename($pdf);
$p2t ->decodePDF();
$data = $p2t ->output();
$pos = strpos($data,$search);
if ($pos !== false){...}
Source:
<?php
class PDF2Text {
// Some settings
var $multibyte = 4; // Use setUnicode(TRUE|FALSE)
var $convertquotes = ENT_QUOTES; // ENT_COMPAT (double-quotes), ENT_QUOTES (Both), ENT_NOQUOTES (None)
var $showprogress = true; // TRUE if you have problems with time-out
// Variables
var $filename = '';
var $decodedtext = '';
function setFilename($filename) {
// Reset
$this->decodedtext = '';
$this->filename = $filename;
}
function output($echo = false) {
if($echo) echo $this->decodedtext;
else return $this->decodedtext;
}
function setUnicode($input) {
// 4 for unicode. But 2 should work in most cases just fine
if($input == true) $this->multibyte = 4;
else $this->multibyte = 2;
}
function decodePDF() {
// Read the data from pdf file
$infile = @file_get_contents($this->filename, FILE_BINARY);
if (empty($infile))
return "";
// Get all text data.
$transformations = array();
$texts = array();
// Get the list of all objects.
preg_match_all("#obj[\n|\r](.*)endobj[\n|\r]#ismU", $infile . "endobj\r", $objects);
$objects = @$objects[1];
// Select objects with streams.
for ($i = 0; $i < count($objects); $i++) {
$currentObject = $objects[$i];
// Prevent time-out
@set_time_limit(30); // restart the timer; set_time_limit() requires an argument
if($this->showprogress) {
// echo ". ";
flush(); ob_flush();
}
// Check if an object includes data stream.
if (preg_match("#stream[\n|\r](.*)endstream[\n|\r]#ismU", $currentObject . "endstream\r", $stream )) {
$stream = ltrim($stream[1]);
// Check object parameters and look for text data.
$options = $this->getObjectOptions($currentObject);
if (!(empty($options["Length1"]) && empty($options["Type"]) && empty($options["Subtype"])) )
// if ( $options["Image"] && $options["Subtype"] )
// if (!(empty($options["Length1"]) && empty($options["Subtype"])) )
continue;
// Hack, length doesnt always seem to be correct
unset($options["Length"]);
// So, we have text data. Decode it.
$data = $this->getDecodedStream($stream, $options);
if (strlen($data)) {
if (preg_match_all("#BT[\n|\r](.*)ET[\n|\r]#ismU", $data . "ET\r", $textContainers)) {
$textContainers = @$textContainers[1];
$this->getDirtyTexts($texts, $textContainers);
} else
$this->getCharTransformations($transformations, $data);
}
}
}
// Analyze text blocks taking into account character transformations and return results.
$this->decodedtext = $this->getTextUsingTransformations($texts, $transformations);
}
function decodeAsciiHex($input) {
$output = "";
$isOdd = true;
$isComment = false;
for($i = 0, $codeHigh = -1; $i < strlen($input) && $input[$i] != '>'; $i++) {
$c = $input[$i];
if($isComment) {
if ($c == "\r" || $c == "\n")
$isComment = false;
continue;
}
switch($c) {
case "\0": case "\t": case "\r": case "\f": case "\n": case ' ': break;
case '%':
$isComment = true;
break;
default:
$code = hexdec($c);
if($code === 0 && $c != '0')
return "";
if($isOdd)
$codeHigh = $code;
else
$output .= chr($codeHigh * 16 + $code);
$isOdd = !$isOdd;
break;
}
}
if($input[$i] != '>')
return "";
if($isOdd)
$output .= chr($codeHigh * 16);
return $output;
}
function decodeAscii85($input) {
$output = "";
$isComment = false;
$ords = array();
for($i = 0, $state = 0; $i < strlen($input) && $input[$i] != '~'; $i++) {
$c = $input[$i];
if($isComment) {
if ($c == "\r" || $c == "\n")
$isComment = false;
continue;
}
if ($c == "\0" || $c == "\t" || $c == "\r" || $c == "\f" || $c == "\n" || $c == ' ')
continue;
if ($c == '%') {
$isComment = true;
continue;
}
if ($c == 'z' && $state === 0) {
$output .= str_repeat(chr(0), 4);
continue;
}
if ($c < '!' || $c > 'u')
return "";
$code = ord($input[$i]) & 0xff;
$ords[$state++] = $code - ord('!');
if ($state == 5) {
$state = 0;
for ($sum = 0, $j = 0; $j < 5; $j++)
$sum = $sum * 85 + $ords[$j];
for ($j = 3; $j >= 0; $j--)
$output .= chr($sum >> ($j * 8));
}
}
if ($state === 1)
return "";
elseif ($state > 1) {
for ($i = 0, $sum = 0; $i < $state; $i++)
$sum += ($ords[$i] + ($i == $state - 1)) * pow(85, 4 - $i);
for ($i = 0; $i < $state - 1; $i++) {
try {
if(false == ($o = chr($sum >> ((3 - $i) * 8)))) {
throw new Exception('Error');
}
$output .= $o;
} catch (Exception $e) { /*Dont do anything*/ }
}
}
return $output;
}
function decodeFlate($data) {
return @gzuncompress($data);
}
function getObjectOptions($object) {
$options = array();
if (preg_match("#<<(.*)>>#ismU", $object, $options)) {
$options = explode("/", $options[1]);
@array_shift($options);
$o = array();
for ($j = 0; $j < @count($options); $j++) {
$options[$j] = preg_replace("#\s+#", " ", trim($options[$j]));
if (strpos($options[$j], " ") !== false) {
$parts = explode(" ", $options[$j]);
$o[$parts[0]] = $parts[1];
} else
$o[$options[$j]] = true;
}
$options = $o;
unset($o);
}
return $options;
}
function getDecodedStream($stream, $options) {
$data = "";
if (empty($options["Filter"]))
$data = $stream;
else {
$length = !empty($options["Length"]) ? $options["Length"] : strlen($stream);
$_stream = substr($stream, 0, $length);
foreach ($options as $key => $value) {
if ($key == "ASCIIHexDecode")
$_stream = $this->decodeAsciiHex($_stream);
elseif ($key == "ASCII85Decode")
$_stream = $this->decodeAscii85($_stream);
elseif ($key == "FlateDecode")
$_stream = $this->decodeFlate($_stream);
elseif ($key == "Crypt") { // TO DO
}
}
$data = $_stream;
}
return $data;
}
function getDirtyTexts(&$texts, $textContainers) {
for ($j = 0; $j < count($textContainers); $j++) {
if (preg_match_all("#\[(.*)\]\s*TJ[\n|\r]#ismU", $textContainers[$j], $parts))
$texts = array_merge($texts, array(@implode('', $parts[1])));
elseif (preg_match_all("#T[d|w|m|f]\s*(\(.*\))\s*Tj[\n|\r]#ismU", $textContainers[$j], $parts))
$texts = array_merge($texts, array(@implode('', $parts[1])));
elseif (preg_match_all("#T[d|w|m|f]\s*(\[.*\])\s*Tj[\n|\r]#ismU", $textContainers[$j], $parts))
$texts = array_merge($texts, array(@implode('', $parts[1])));
}
}
function getCharTransformations(&$transformations, $stream) {
preg_match_all("#([0-9]+)\s+beginbfchar(.*)endbfchar#ismU", $stream, $chars, PREG_SET_ORDER);
preg_match_all("#([0-9]+)\s+beginbfrange(.*)endbfrange#ismU", $stream, $ranges, PREG_SET_ORDER);
for ($j = 0; $j < count($chars); $j++) {
$count = $chars[$j][1];
$current = explode("\n", trim($chars[$j][2]));
for ($k = 0; $k < $count && $k < count($current); $k++) {
if (preg_match("#<([0-9a-f]{2,4})>\s+<([0-9a-f]{4,512})>#is", trim($current[$k]), $map))
$transformations[str_pad($map[1], 4, "0")] = $map[2];
}
}
for ($j = 0; $j < count($ranges); $j++) {
$count = $ranges[$j][1];
$current = explode("\n", trim($ranges[$j][2]));
for ($k = 0; $k < $count && $k < count($current); $k++) {
if (preg_match("#<([0-9a-f]{4})>\s+<([0-9a-f]{4})>\s+<([0-9a-f]{4})>#is", trim($current[$k]), $map)) {
$from = hexdec($map[1]);
$to = hexdec($map[2]);
$_from = hexdec($map[3]);
for ($m = $from, $n = 0; $m <= $to; $m++, $n++)
$transformations[sprintf("%04X", $m)] = sprintf("%04X", $_from + $n);
} elseif (preg_match("#<([0-9a-f]{4})>\s+<([0-9a-f]{4})>\s+\[(.*)\]#ismU", trim($current[$k]), $map)) {
$from = hexdec($map[1]);
$to = hexdec($map[2]);
$parts = preg_split("#\s+#", trim($map[3]));
for ($m = $from, $n = 0; $m <= $to && $n < count($parts); $m++, $n++)
$transformations[sprintf("%04X", $m)] = sprintf("%04X", hexdec($parts[$n]));
}
}
}
}
function getTextUsingTransformations($texts, $transformations) {
$document = "";
for ($i = 0; $i < count($texts); $i++) {
$isHex = false;
$isPlain = false;
$hex = "";
$plain = "";
for ($j = 0; $j < strlen($texts[$i]); $j++) {
$c = $texts[$i][$j];
switch($c) {
case "<":
$hex = "";
$isHex = true;
$isPlain = false;
break;
case ">":
$hexs = str_split($hex, $this->multibyte); // 2 or 4 (UTF8 or ISO)
for ($k = 0; $k < count($hexs); $k++) {
$chex = str_pad($hexs[$k], 4, "0"); // Add tailing zero
if (isset($transformations[$chex]))
$chex = $transformations[$chex];
$document .= html_entity_decode("&#x".$chex.";");
}
$isHex = false;
break;
case "(":
$plain = "";
$isPlain = true;
$isHex = false;
break;
case ")":
$document .= $plain;
$isPlain = false;
break;
case "\\":
$c2 = $texts[$i][$j + 1];
if (in_array($c2, array("\\", "(", ")"))) $plain .= $c2;
elseif ($c2 == "n") $plain .= "\n";
elseif ($c2 == "r") $plain .= "\r";
elseif ($c2 == "t") $plain .= "\t";
elseif ($c2 == "b") $plain .= "\x08"; // backspace; PHP has no "\b" escape
elseif ($c2 == "f") $plain .= "\f";
elseif ($c2 >= '0' && $c2 <= '9') {
$oct = preg_replace("#[^0-9]#", "", substr($texts[$i], $j + 1, 3));
$j += strlen($oct) - 1;
$plain .= html_entity_decode("&#".octdec($oct).";", $this->convertquotes);
}
$j++;
break;
default:
if ($isHex)
$hex .= $c;
elseif ($isPlain)
$plain .= $c;
break;
}
}
$document .= "\n";
}
return $document;
}
}
?>
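Once `output()` returns the text, the coupon codes still have to be separated from the surrounding words before they can go into the database. A minimal sketch, assuming the codes are runs of 8 to 16 upper-case letters or digits (the real format is not stated above, so the pattern is only a placeholder) and using a hypothetical helper name `extractCoupons()`:

```php
<?php
// Hypothetical helper: pulls coupon-like tokens out of extracted PDF text.
// The default pattern is an assumption; adjust it to the real code format.
function extractCoupons(string $text, string $pattern = '/\b[A-Z0-9]{8,16}\b/'): array {
    preg_match_all($pattern, $text, $matches);
    // Drop duplicates and reindex.
    return array_values(array_unique($matches[0]));
}

// Sample text standing in for the result of $p2t->output().
$data = "Coupon list\nABCD1234EF\nZX81QW90RT\nABCD1234EF\n";
foreach (extractCoupons($data) as $code) {
    echo $code, "\n";
}
```

The resulting array can then be inserted row by row with a prepared statement.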
How can I increase the performance of this code in PHP?
Or is there an alternative method to parse the comment: if a word starts with @, call at(), and if it starts with #, call hash()?
Here the sample comment is "#hash @at #####tag";
//this is the comment with mention,tag
function getCommentWithLinks($comment="@name #tag ###nam1 test", $pid, $img='', $savedid='', $source='', $post_facebook='', $fbCmntInfo='') {
//assign to facebook facebookComment bcz it is used to post into the fb wall
$this->facebookComment = $comment;
//split the comment based on the space
$comment = explode(" ", $comment);
//get the lenght of the splitted array
$cmnt_length = count($comment);
$store_cmnt = $tagid = '';
$this->img = $img;
$this->saveid = $savedid;//this is uspid in product saved table primary key
//$this->params = "&product=".base_url()."product/".$this->saveid;
$this->params['product'] = base_url()."product/".$this->saveid;
//$this->params['tags']='';
foreach($comment as $word) {
//check it is tag or not
//the first character must be a # and the rest alphanumeric; if any further # or @ exists then it is part of the comment
//find the length of the tag #mention
$len = strlen($word);
$cmt = $c = $tag_name = '';
$j = 0;
$istag = false;
for($i=0; $i<$len; $i++) {
$j = $i-1;
//check if the starting letter is # or not
if($word[$i] == '#') {
//insert tagname
if($istag) {
//insert $tag_name
$this->save_tag($tag_name, $pid);
$istag = false;
$tag_name = '';
}
//check for comment
if($i >= 1 && $word[$j]=='#') {
$this->store_cmnt .= $word[$i];
}else{
//append to the store_coment if the i is 1 or -1 or $word[j]!=#
$this->store_cmnt .= $word[$i];//23,#
}
}else if($word[$i]=='@') {
//insert tagname
if($istag) {
//insert $tag_name
$this->save_mention($tag_name, $pid, $fbCmntInfo);
$istag = false;
$tag_name = '';
}
//check for comment
if($i >= 1 && $word[$j]=='@') {
$this->store_cmnt .= $word[$i];
}else{
$this->store_cmnt .= $word[$i];//23,#
}
}else if( $this->alphas($word[$i]) && $i!=0){
if($tag_name=='') {
//check the length of the string
$strln=strlen($this->store_cmnt);//4
if($strln != 0) {
$c = substr($this->store_cmnt, $strln-1, $strln);//#
if($c=='#' || $c=='@') {
$this->store_cmnt = substr($this->store_cmnt, 0, $strln-1);//23,
$tag_name = $c;
}
}
//$tag_name='';
}
//check that the previous character is # or @, otherwise it is plain text
if($c=='#' || $c=='@') {
$tag_name .= $word[$i];
$istag = true;
//check if lenis == i then add anchor tag her
if($i == $len-1) {
$istag =false;
//check if it is # or @
if($c=='#')
$this->save_tag($tag_name,$pid);
else
$this->save_mention($tag_name,$pid,$fbCmntInfo);
//$this->store_cmnt .= '<a >'. $tag_name.'</a>';
}
}else{
$this->store_cmnt .= $word[$i];
}
}else{
if($istag) {
//insert $tag_name
$this->save_tag($tag_name,$pid);
$istag = false;
$tag_name = '';
}
$this->store_cmnt .= $word[$i];
}
}
$this->store_cmnt .=" ";
}
}
Try this; it may be helpful:
function getResultStr($data, $param1, $param2){
return $param1 != $param2?(''.$data.''):$data;
}
function parseWord($word, $symbols){
$result = $word;
$status = FALSE;
foreach($symbols as $symbol){
if(($pos = strpos($word, $symbol)) !== FALSE){
$status = TRUE;
break;
}
}
if($status){
$temp = $symFlag = '';
$result = '';
foreach(str_split($word) as $char){
//Checking whether chars are symbols (#, @)
if(in_array($char, $symbols)){
if($symFlag != ''){
$result .= getResultStr($temp, $symFlag, $temp);
}
$symFlag = $temp = $char;
} else if(ctype_alnum($char) or $char == '_'){
//accepts [0-9][A-Z][a-z] and underscore (_)
//Checking whether Symbol already started
if($symFlag != ''){
$temp .= $char;
} else {
//Just appending the char to $result
$result .= $char;
}
} else {
//accepts all special symbols except #, @ and _
if($symFlag != ''){
$result .= getResultStr($temp, $symFlag, $temp);
$temp = $symFlag = '';
}
$result .= $char;
}
}
$result .= getResultStr($temp, $symFlag, '');
}
return $result;
}
function parseComment($comment){
$str = '';
$symbols = array('#', '@');
foreach(explode(' ', $comment) as $word){
$result = parseWord($word, $symbols);
$str .= $result.' ';
}
return $str;
}
$str = "#Richard, #McClintock, a Latin professor at $%#Hampden_Sydney #College-in #Virginia, looked up one of the ######more obscure Latin words, #######%%#%##consectetur, from a Lorem Ipsum passage, and #going#through the cites of the word in classical literature";
echo "<br />Before Parsing : <br />".$str;
echo "<br /><br />After Parsing : <br />".parseComment($str);
Use strpos, preg_match, or strstr.
Please refer to the string functions in PHP. You can do it in a line or two with those built-in functions.
If they do not fit, it is better to write a regex.
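As a minimal sketch of the regex route, assuming a tag is a single `#` and a mention a single `@` followed by letters, digits, or underscores, and that runs such as `###` are treated as plain text: `preg_match_all()` can collect both kinds in one pass. The function name `extractTagsAndMentions()` is hypothetical; calls such as `save_tag()` and `save_mention()` from the question would then be applied to the returned arrays.

```php
<?php
// Hypothetical helper: returns ['tags' => [...], 'mentions' => [...]].
function extractTagsAndMentions(string $comment): array {
    $result = ['tags' => [], 'mentions' => []];
    // A marker counts only when it is not glued to a previous marker or word.
    preg_match_all('/(?<![#@\w])([#@])(\w+)/', $comment, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $key = ($m[1] === '#') ? 'tags' : 'mentions';
        $result[$key][] = $m[2];
    }
    return $result;
}

print_r(extractTagsAndMentions('#hash @at #####tag'));
```

With this pattern `#hash` is reported as a tag and `@at` as a mention, while `#####tag` is skipped because its markers are stacked.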
What is the best way to convert a string such as CO2 and make it output CO<sub>2</sub> via PHP?
Use preg_replace() to surround groups of digits with <sub></sub>
$input = "CO2";
echo preg_replace('/(\d+)/', '<sub>$1</sub>', $input);
// Using $input = "H2SO4";
// Prints:
// H<sub>2</sub>SO<sub>4</sub>
This version correctly does NOT subscript every digit (coefficients are left alone).
$s = "O2+2H2=H2O";
$len = strlen($s);
$html = '';
if($len > 0) {
$prev = $s[0];
$html = $prev;
for($i=1;$i<$len;$i++){
$ch = $s[$i];
if(is_numeric($ch) && 'a' <= strtolower($prev) && strtolower($prev) <= 'z') {
$html .= "<sub>$ch</sub>";
} else {
$html .= $ch;
}
$prev = $ch;
}
}
echo $html;
This prints O<sub>2</sub>+2H<sub>2</sub>=H<sub>2</sub>O. Note that the coefficient 2 in front of H2 is not subscripted.
Do you know LaTeX? It renders formulas very nicely.
You could use it on your page by including
<script language="JavaScript" src="http://thewe.net/tex/textheworld6.user.js"></script>
and writing your formula like this: [;CO_2;].
//With ions in the equation:
// charge written like: sign number
$s= "1H2SO4=> 2H+1 + 1SO4-2 " ;
//$s = "1O2 + 2H2=> 2H2O";
$len = strlen($s);
$html = '';
if($len > 0) {
$prev = $s[0];
$html = $prev;
for($i=1;$i<$len;$i++)
{
$ch = $s[$i];
if(is_numeric($ch) && 'a' <= strtolower($prev) && strtolower($prev) <= 'z')
{ $html .= "<sub>$ch</sub>"; }
else
{
if(($ch=="+" or $ch=="-") && '1' <= strtolower($s[$i+1]) && strtolower($s[$i+1]) <= '9')
{
$html .= "<sup>$ch</sup>";
$html .= "<sup>".$s[$i+1]."</sup>";
$i=$i+1;
}
else
{
$html .= $ch;
}
$prev = $ch;
}
}
}
echo $html;
The better way would be:
preg_replace('/([A-Z)])([0-9]+)/', '\1<sub>\2</sub>', $input)
That way you wouldn't have individual <sub> and </sub> around numbers higher than 9 next to each other (e.g., <sub>12</sub> and not <sub>1</sub><sub>2</sub>). It also accounts for parentheses and when you have numbers before the letters.
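To see what that pattern does, here is a small check. It assumes the character class is widened to `[A-Za-z)]` so that two-letter symbols such as `Ca` or `Cl` are also covered; that widening is not part of the answer above. `subscriptFormula()` is a hypothetical wrapper name.

```php
<?php
// Hypothetical wrapper around the preg_replace() call discussed above.
function subscriptFormula(string $input): string {
    // A run of digits is subscripted only when it follows a letter or ')'.
    return preg_replace('/([A-Za-z)])([0-9]+)/', '$1<sub>$2</sub>', $input);
}

echo subscriptFormula('2H2O'), "\n";       // 2H<sub>2</sub>O
echo subscriptFormula('C12H22O11'), "\n";  // C<sub>12</sub>H<sub>22</sub>O<sub>11</sub>
echo subscriptFormula('Ca(OH)2'), "\n";    // Ca(OH)<sub>2</sub>
```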
function formular($string){
$string .= ' ';
$len = strlen($string);
$str_return = '';
if($len > 0) {
$prev = $string[0];
$str_return = $prev;
for($i = 1; $i < $len; $i++){
$ch = $string[$i];
if(is_numeric($ch)){
if('a' <= strtolower($prev) && strtolower($prev) <= 'z' || $prev == ')'){
if(( $string[$i+1] == '-' || $string[$i+1] == '+') && !in_array(@$string[$i+2], ['C', 'O', 'H'])){
$str_return .= '<sup>' . $ch . '</sup>';
$str_return .= '<sup>' . $string[$i+1] . '</sup>';
$i++;
}else{
$str_return .= "<sub>$ch</sub>";
}
}else{
$str_return .= $ch;
$prev = $ch;
}
}else{
$str_return .= $ch;
$prev = $ch;
}
}
}
return $str_return;
}
Is there a way to do this without writing my own function?
For example:
$text = 'Test <span><a>something</a> something else</span>.';
$text = cutText($text, 2, null, 20, true);
//result: Test <span><a>something</a></span>
I need to make this function indestructible.
My problem is similar to this thread, but I need a better solution. I would like to keep nested tags untouched.
So far my algorithm is:
function cutText($content, $max_words, $max_chars, $max_word_len, $html = false) {
$len = strlen($content);
$res = '';
$word_count = 0;
$word_started = false;
$current_word = '';
$current_word_len = 0;
if ($max_chars == null) {
$max_chars = $len;
}
$inHtml = false;
$openedTags = array();
for ($i = 0; $i<$max_chars;$i++) {
if ($content[$i] == '<' && $html) {
$inHtml = true;
}
if ($inHtml) {
$max_chars++;
}
if ($html && !$inHtml) {
if ($content[$i] != ' ' && !$word_started) {
$word_started = true;
$word_count++;
}
$current_word .= $content[$i];
$current_word_len++;
if ($current_word_len == $max_word_len) {
$current_word .= '- ';
}
if (($content[$i] == ' ') && $word_started) {
$word_started = false;
$res .= $current_word;
$current_word = '';
$current_word_len = 0;
if ($word_count == $max_words) {
return $res;
}
}
}
if ($content[$i] == '<' && $html) {
$inHtml = true;
}
}
return $res;
}
But of course it won't work. I thought about remembering opened tags and closing them if they were not closed but maybe there is a better way?
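The idea of remembering opened tags can be sketched with a stack. This is a minimal illustration, not a full HTML parser: it assumes well-formed tag syntax, no `<` or `>` inside attribute values, and a short fixed list of void elements that never need a closing tag. The name `closeOpenTags()` is hypothetical.

```php
<?php
// Hypothetical helper: appends closing tags for elements left open in $html.
function closeOpenTags(string $html): string {
    $void = ['br', 'hr', 'img', 'input', 'meta', 'link'];
    $stack = [];
    // Capture an optional '/', the tag name, and an optional trailing '/'.
    preg_match_all('#<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>#', $html, $tags, PREG_SET_ORDER);
    foreach ($tags as $t) {
        $name = strtolower($t[2]);
        if ($t[3] === '/' || in_array($name, $void, true)) {
            continue;                       // self-closing or void element
        }
        if ($t[1] === '/') {
            // Closing tag: pop only if it matches the most recent open tag.
            if ($stack && end($stack) === $name) {
                array_pop($stack);
            }
        } else {
            $stack[] = $name;               // opening tag
        }
    }
    // Close whatever is still open, innermost first.
    while ($stack) {
        $html .= '</' . array_pop($stack) . '>';
    }
    return $html;
}

echo closeOpenTags('Test <span><a>something');
// Test <span><a>something</a></span>
```

Running the truncation first and this pass second keeps the two concerns separate: the cutter only has to avoid splitting a tag, and the closer repairs the nesting afterwards.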
This works perfectly for me:
function trimContent ($str, $trimAtIndex) {
$beginTags = array();
$endTags = array();
for($i = 0; $i < strlen($str); $i++) {
if( $str[$i] == '<' )
$beginTags[] = $i;
else if($str[$i] == '>')
$endTags[] = $i;
}
foreach($beginTags as $k=>$index) {
// Trying to trim in between tags. Trim after the last tag
if( ( $trimAtIndex >= $index ) && ($trimAtIndex <= $endTags[$k]) ) {
$trimAtIndex = $endTags[$k];
}
}
return substr($str, 0, $trimAtIndex);
}
Try something like this
function cutText($inputText, $start, $length) {
$temp = $inputText;
$res = array();
while (strpos($temp, '>')) {
$ts = strpos($temp, '<');
$te = strpos($temp, '>');
if ($ts > 0) $res[] = substr($temp, 0, $ts);
$res[] = substr($temp, $ts, $te - $ts + 1);
$temp = substr($temp, $te + 1, strlen($temp) - $te);
}
if ($temp != '') $res[] = $temp;
$pointer = 0;
$end = $start + $length - 1;
foreach ($res as &$part) {
if (substr($part, 0, 1) != '<') {
$l = strlen($part);
$p1 = $pointer;
$p2 = $pointer + $l - 1;
$partx = "";
if ($start <= $p1 && $end >= $p2) $partx = "";
else {
if ($start > $p1 && $start <= $p2) $partx .= substr($part, 0, $start-$pointer);
if ($end >= $p1 && $end < $p2) $partx .= substr($part, $end-$pointer+1, $l-$end+$pointer);
if ($partx == "") $partx = $part;
}
$part = $partx;
$pointer += $l;
}
}
return join('', $res);
}
Parameters:
$inputText - input text
$start - position of first character
$length - how many characters we want to remove
Example #1 - Removing first 3 characters
$text = 'Test <span><a>something</a> something else</span>.';
$text = cutText($text, 0, 3);
var_dump($text);
Output (removed "Tes")
string(47) "t <span><a>something</a> something else</span>."
Removing first 10 characters
$text = cutText($text, 0, 10);
Output (removed "Test somet")
string(40) "<span><a>hing</a> something else</span>."
Example 2 - Removing inner characters - "es" from "Test "
$text = cutText($text, 1, 2);
Output
string(48) "Tt <span><a>something</a> something else</span>."
Removing "thing something el"
$text = cutText($text, 9, 18);
Output
string(32) "Test <span><a>some</a>se</span>."
Hope this helps.
Well, maybe this is not the best solution but it's everything I can do at the moment.
OK, I solved this thing.
I divided it into 2 parts.
First, cutting text without destroying HTML:
function cutHtml($content, $max_words, $max_chars, $max_word_len) {
$len = strlen($content);
$res = '';
$word_count = 0;
$word_started = false;
$current_word = '';
$current_word_len = 0;
$letter_count = 0; // was used below without being initialised
if ($max_chars == null) {
$max_chars = $len;
}
$inHtml = false;
$openedTags = array();
$i = 0;
while ($i < $max_chars) {
//skip any html tags
if ($content[$i] == '<') {
$inHtml = true;
while (true) {
$res .= $content[$i];
$i++;
while($content[$i] == ' ') { $res .= $content[$i]; $i++; }
//skip any values
if ($content[$i] == "'") {
$res .= $content[$i];
$i++;
while(!($content[$i] == "'" && $content[$i-1] != "\\")) {
$res .= $content[$i];
$i++;
}
}
//skip any values
if ($content[$i] == '"') {
$res .= $content[$i];
$i++;
while(!($content[$i] == '"' && $content[$i-1] != "\\")) {
$res .= $content[$i];
$i++;
}
}
if ($content[$i] == '>') { $res .= $content[$i]; $i++; break;}
}
$inHtml = false;
}
if (!$inHtml) {
while($content[$i] == ' ') { $res .= $content[$i]; $letter_count++; $i++; } //skip spaces
$word_started = false;
$current_word = '';
$current_word_len = 0;
while (!in_array($content[$i], array(' ', '<', '.', ','))) {
if (!$word_started) {
$word_started = true;
$word_count++;
}
$current_word .= $content[$i];
$current_word_len++;
if ($current_word_len == $max_word_len) {
$current_word .= '-';
$current_word_len = 0;
}
$i++;
}
if ($letter_count > $max_chars) {
return $res;
}
if ($word_count < $max_words) {
$res .= $current_word;
$letter_count += strlen($current_word);
}
if ($word_count == $max_words) {
$res .= $current_word;
$letter_count += strlen($current_word);
return $res;
}
}
}
return $res;
}
And next thing is closing unclosed tags:
function cleanTags(&$html) {
$count = strlen($html);
$i = -1;
$openedTags = array();
while(true) {
$i++;
if ($i >= $count) break;
if ($html[$i] == '<') {
$tag = '';
$closeTag = '';
$reading = false;
//reading whole tag
while($html[$i] != '>') {
$i++;
while($html[$i] == ' ') $i++; //skip any spaces (need to be idiot proof)
if (!$reading && $html[$i] == '/') { //closing tag
$i++;
while($html[$i] == ' ') $i++; //skip any spaces
$closeTag = '';
while($html[$i] != ' ' && $html[$i] != '>') { //start reading first actual string
$reading = true;
$html[$i] = strtolower($html[$i]); //tags to lowercase
$closeTag .= $html[$i];
$i++;
}
$c = count($openedTags);
if ($c > 0 && $openedTags[$c-1] == $closeTag) array_pop($openedTags);
}
if (!$reading) //read only tag
while($html[$i] != ' ' && $html[$i] != '>') { //start reading first actual string
$reading = true;
$html[$i] = strtolower($html[$i]); //tags to lowercase
$tag .= $html[$i];
$i++;
}
//skip any values
if ($html[$i] == "'") {
$i++;
while(!($html[$i] == "'" && $html[$i-1] != "\\")) {
$i++;
}
}
//skip any values
if ($html[$i] == '"') {
$i++;
while(!($html[$i] == '"' && $html[$i-1] != "\\")) {
$i++;
}
}
if ($reading && $html[$i] == '/') { //self closed tag
$tag = '';
break;
}
}
if (!empty($tag)) $openedTags[] = $tag;
}
}
while (count($openedTags) > 0) {
$tag = array_pop($openedTags);
$html .= "</$tag>";
}
}
It's not idiot-proof, but TinyMCE will clean this up, so further cleaning is not necessary.
It may be a little long, but I don't think it will eat a lot of resources, and it should be faster than regex.