PHP Runtime Issue when Breaking a 10,000 char string into segments - php

$chapter is a string that stores a chapter of a book with 10,000 - 15,000 characters. I want to break up the string into segments with a minimum of 1000 characters but officially break after the next whitespace, so that I don't break up a word. The provided code will run successfully about 9 times and then it will run into a run time issue.
"Fatal error: Maximum execution time of 30 seconds exceeded in D:\htdocs\test.php on line 16"
<?php
$chapter = ("10000 characters")
$len = strlen($chapter);
$i=0;
do{$key="a";
for($k=1000;($key != " ") && ($i <= $len); $k = $k+1) {
$j=$i+$k; echo $j;
$key = substr($chapter,$j,1);
}
$segment = substr ($chapter,$i,$k);
$i=$j;
echo ($segment);
} while($i <= $len);
?>

I think your method of writing it has too much overhead, while increasing max_execution_time will help, not everyone is able to modify their server settings. This simple thing split 15000 bytes of lorum ipsum text (2k Words) into 1000 character segments. I assume it would do well with more, as the execution time was fairly quick.
//Define variables, Set $x as int(1 = true) to start
$chapter = ("15000 bytes of Lorum Ipsum Here");
$sections = array();
$x = 1;
//Start Splitting
while( $x ) {
//Get current length of $chapter
$len = strlen($chapter);
//If $chapter is longer than 1000 characters
if( $len > 1000 ) {
//Get Position of last space character before 1000
$x = strrpos( substr( $chapter, 0, 1000), " ");
//If $x is not FALSE - Found last space
if( $x ) {
//Add to $sections array, assign remainder to $chapter again
$sections[] = substr( $chapter, 0, $x );
$chapter = substr( $chapter, $x );
//If $x is FALSE - No space in string
} else {
//Add last segment to $sections for debugging
//Last segment will not have a space. Break loop.
$sections[] = $chapter;
break;
}
//If remaining $chapter is not longer than 1000, simply add to array and break.
} else {
$sections[] = $chapter;
break;
}
}
print_r($sections);
Edit:
Tested with 5k Words (33K bytes) In a fraction of a second. Divided the text up into 33 segments. (Whoops, I had it set to divide into 10K character segments, before.)
Added verbose comments to code, as to explain what everything does.

Here is a simple function to do that
$chapter = "Your full chapter";
breakChapter($chapter,1000);
function breakChapter($chapter,$size){
do{
if(strlen($chapter)<$size){
$segment=$chapter;
$chapter='';
}else{
$pos=strpos($chapter,' ', $size);
if ($pos==false){
$segment=$chapter;
$chapter='';
}else{
$segment=substr($chapter,0,$pos);
$chapter=substr($chapter,$pos+1);
}
}
echo $segment. "\n";
}while ($chapter!='');
}
checking each character is not a good option and is resource/time intensive
PS: I have not tested this (just typed in here), and this may not be the best way to do this. but the logic works!

You are always reading the $chapter from the start. You should delete the already read characters from $chapter so you will never read much more than 10000 characters. If you do this, you must also tweak the cycles.

try
set_time_limit(240);
at the begining of the code. (this is the ThrowSomeHardwareAtIt aproach )

It can be done in just one single line, wich speeds up your code a lot.
echo $segment = substr($chapter, 0, strpos($chapter, " ", 1000));
It wil take the substring of the chapter until 1000 + some characters until the first space.

Related

Advanced Algoithm issues to generate unique 6 char long code

I have used this algorithm to generate 100 codes. Now Once again i have searched the threads on this site and on google before posting. The algorthms found do work in generating the code the issue I got is i have to generate UNIQUE codes from the one i already have. The problem i am finding is all the scripts are all failing to execute.
So basically I have an ARRAY holding 100 voucher codes I need to generate 400 more voucher codes however while my algorithm's works it keeps timing out due to it taking to long to execute. I have tried several other algorithms from the threads on Stack overflow but for some reason they all keep timing out. I need advice on what I can do to my algorithm to actually produce 400 more unique 6 Char codes.
Here is script I have
/**
**Use this portion to get my current 100 codes in an array which i use
**/
$redeem = 'THIS IS A STRING WHICH CREATES AN ARRAY CONTIANING 100 VALUES';
$redeem = preg_replace('/\s+/', '', $redeem);
$redeem_code = str_split($redeem, 6);
/**
**Function generates unique 6 letter voucher codes
**
**/
function random_str($length, $keyspace = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ')
{
$str = '';
$max = mb_strlen($keyspace, '8bit') - 1;
for ($i = 0; $i < $length; ++$i) {
$str .= $keyspace[mt_rand(0, $max)];
}
return $str;
}
/**
**This is the query which checks against my array and then echo's a valid code
**
**/
$redeem_count = count($redeem_code);
$i = 0;
$ia = 1;
while($i <= 500){
$string = random_str(6);
if(in_array($string, $redeem_code))
{
echo $string;
$i= $i+1;
}
}
Constant Error Message on All functions tried.
Fatal error: Maximum execution time of 120 seconds exceeded in C:\wamp\www\nfr_chip.php on line 115
You never add anything to the array and only increase $i if the code already exist.
This should work...
$redeem_count = count($redeem_code);
while($redeem_count <= 500) {
$string = random_str(6);
if(!in_array($string, $redeem_code))
{
$redeem_code[] = $string;
$redeem_count++;
}
}
And for the way you are splitting your string with codes, instead of using preg_replace (avoid regex unless you really need it):
$redeem_code = explode(" ", $redeem);
You can safely use openssl_random_pseudo_bytes with bin2hex to generate pseudo-random, as the name implies, strings.
Here's an example:
for ($i=1;$i<=5;$i++){
print bin2hex(openssl_random_pseudo_bytes(3)) . PHP_EOL;
}
//output
9282fb
3b9798
c187a0
a058e3
df2e4b
3 bytes will give you a 6 char string

Search for pattern in a string

Pattern search within a string.
for eg.
$string = "111111110000";
FindOut($string);
Function should return 0
function FindOut($str){
$items = str_split($str, 3);
print_r($items);
}
If I understand you correctly, your problem comes down to finding out whether a substring of 3 characters occurs in a string twice without overlapping. This will get you the first occurence's position if it does:
function findPattern($string, $minlen=3) {
$max = strlen($string)-$minlen;
for($i=0;$i<=$max;$i++) {
$pattern = substr($string,$i,$minlen);
if(substr_count($string,$pattern)>1)
return $i;
}
return false;
}
Or am I missing something here?
What you have here can conceptually be solved with a sliding window. For your example, you have a sliding window of size 3.
For each character in the string, you take the substring of the current character and the next two characters as the current pattern. You then slide the window up one position, and check if the remainder of the string has what the current pattern contains. If it does, you return the current index. If not, you repeat.
Example:
1010101101
|-|
So, pattern = 101. Now, we advance the sliding window by one character:
1010101101
|-|
And see if the rest of the string has 101, checking every combination of 3 characters.
Conceptually, this should be all you need to solve this problem.
Edit: I really don't like when people just ask for code, but since this seemed to be an interesting problem, here is my implementation of the above algorithm, which allows for the window size to vary (instead of being fixed at 3, the function is only briefly tested and omits obvious error checking):
function findPattern( $str, $window_size = 3) {
// Start the index at 0 (beginning of the string)
$i = 0;
// while( (the current pattern in the window) is not empty / false)
while( ($current_pattern = substr( $str, $i, $window_size)) != false) {
$possible_matches = array();
// Get the combination of all possible matches from the remainder of the string
for( $j = 0; $j < $window_size; $j++) {
$possible_matches = array_merge( $possible_matches, str_split( substr( $str, $i + 1 + $j), $window_size));
}
// If the current pattern is in the possible matches, we found a duplicate, return the index of the first occurrence
if( in_array( $current_pattern, $possible_matches)) {
return $i;
}
// Otherwise, increment $i and grab a new window
$i++;
}
// No duplicates were found, return -1
return -1;
}
It should be noted that this certainly isn't the most efficient algorithm or implementation, but it should help clarify the problem and give a straightforward example on how to solve it.
Looks like you more want to use a sub-string function to walk along and check every three characters and not just break it into 3
function fp($s, $len = 3){
$max = strlen($s) - $len; //borrowed from lafor as it was a terrible oversight by me
$parts = array();
for($i=0; $i < $max; $i++){
$three = substr($s, $i, $len);
if(array_key_exists("$three",$parts)){
return $parts["$three"];
//if we've already seen it before then this is the first duplicate, we can return it
}
else{
$parts["$three"] = i; //save the index of the starting position.
}
}
return false; //if we get this far then we didn't find any duplicate strings
}
Based on the str_split documentation, calling str_split on "1010101101" will result in:
Array(
[0] => 101
[1] => 010
[2] => 110
[3] => 1
}
None of these will match each other.
You need to look at each 3-long slice of the string (starting at index 0, then index 1, and so on).
I suggest looking at substr, which you can use like this:
substr($input_string, $index, $length)
And it will get you the section of $input_string starting at $index of length $length.
quick and dirty implementation of such pattern search:
function findPattern($string){
$matches = 0;
$substrStart = 0;
while($matches < 2 && $substrStart+ 3 < strlen($string) && $pattern = substr($string, $substrStart++, 3)){
$matches = substr_count($string,$pattern);
}
if($matches < 2){
return null;
}
return $substrStart-1;

Select random strings over 6 and under 10 characters from text file

I have a text file, and I need to pick a random string that is over 6 characters and under 10 characters. Normally, I would use a script like this, which would work, but since it needs to be a certain length, that won't work. Does anybody have a solution to this?
A sample input would be something like this:
Apple
Banana
Orange
Strawberry
Blueberry
Pineapple
Somelongfruithere
Those values would be in a .txt file, each with a line break. An example of a string that would be allowed is pineapple, but apple or Somelongfruithere wouldn't be allowed.
You'll need to do something like this:
$lines = array();
$tmpLines = file('random.txt');
for($i = 0; $i < count($tmpLines); ++$i)
{
if(strlen($tmpLines[ $i ]) > 6 && strlen($tmpLines[ $i ]) < 10)
{
$lines[] = $tmpLines[ $i ];
}
}
$randomWord = $lines[ array_rand($lines) ];
A shorter way, in number of lines, goes like this (but is much less safe):
$randomWord = '';
$lines = file('random.txt');
while(strlen($randomWord) <= 6 || strlen($randomWord) >= 10)
$randomWord = $lines[ array_rand($lines) ];
The first option gets all the lines in the file, and copies only the ones between 6 and 10 chars in length to another array. When choosing a random element from this array, you are "guaranteed" a reasonable access time for any random string.
The second option simply continues to pick a random string until one of the proper length is chosen, but could potentially take a while depending on the random number generator's mood. Unlikely, but I wouldn't want to risk it. Always take reliability as the best approach, in my book.
I would say explode the text file into a variable, run your random generator to get a placement value (position of the random string) then in a loop (a do/while loop), pull the string from the exploded variable, and check it's length to ensure it's what you want
if (strlen($rand_word) > 6 && strlen($rand_word) < 10) {
//execute function and end loop
} else {
// keep checking using a new random placement number
}
The answer depends on the requirement.
If you need to select from a group of strings and only accept one that fits your criteria, then you'll need to use strlen and try again if it is not the correct length.
Otherwise, you're still going to need strlen, to make sure it is at least 6 Chars, but then you can use substr to cut it to 10. If whitespace does not count, use ltrim & rtrim before strlen and substr.
First find all the words that are in the right character range, then pull a random one from the resulting array:
$fileLines = file('somefile.txt');
$myWords = array();
foreach ($fileLines as $line)
{
$thisLine = split(" ",$line);
foreach ($thisLine as $word)
{
$length = strlen($word);
if ($length > 6 && $length < 10)
{
$myWords[] = $word;
}
}
}
$randomWord = $myWords[array_rand($myWords)];
Shortened way!
for ($i=7; $i<=9; $i++)
if (strlen($str) == $i )
echo "bingo! " . strlen($str);

Increment through A-Za-z0-9 in expanding string with PHP

I'm trying to generate strings in PHP with a group of valid characters, cycling through them and appending an extra character on the end of the string, until maximum length is reached. For example, desired output:
a,b,c,d,e,f,aa,ab,ac,ad,ae,af,ba,bb,bc,bd,be,bf,ca,cb..etc
This is my PHP function so far:
<?php
$chars = Array('a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',
'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','X','Y','Z',
'1','2','3','4','5','6','7','8','9','0');
$maxlen = 10;
$input = $chars[0];
while (1):
echo buildInput($maxlen, $chars, $input) . "\n";
endwhile;
function buildInput($maxlen, $chars, $previous)
{
if (array_search(substr($previous, -1), $chars) == sizeof($chars) - 1):
// end of input cycle reached, add another character
$previous = $previous . $chars[0];
endif;
if (strlen($previous) > $maxlen):
die('Max length reached');
endif;
// Remove last character, and append incremented char
$input = substr($previous, 0, -1);
$input = $input . $chars[array_search(substr($previous, -1), $chars)+1];
return $input;
}
?>
It only increments the last character of the string which gets to 0, then appends 'a' and starts over but without trying all the other possible permutations.
Could someone help me with a better method?
Is this the kind of thing you want?
<?php
$chars = array('a','b','c');
$max_length = 3;
function build($base_arr, $ctr) {
global $chars;
global $max_length;
$combos = array();
foreach ($base_arr as $base) {
foreach ($chars as $char) {
echo $base, $char, '<br />';
$combos[] = $base.$char;
}
}
if ($ctr < $max_length) {
build($combos, $ctr + 1);
}
}
foreach ($chars as $char) {
echo $char, '<br />';
}
build($chars, 2);
?>
It'll give you: a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, ..., bcc, caa, cab, cac, cba, cbb, cbc, cca, ccb, ccc.
Your array is so large, though, that using this method on it would take up way too much memory to work. Out of 62 characters (A-Z, a-z, 0-9), the number of possible 10-character permutations is 8.4 x 10^17; so hopefully, you'll be able to find a more efficient method or figure out a way to get the result you want without having to cycle through such a large array. I hope you find what you're looking for!
If you limit yourself to 0-9,a-z (only lower case), then you could use base_convert for this and do it in one line:
for($i = 0; $i < 1000; $i++) echo base_convert($i, 10, 36) . '<br/>';
Here's a demo.
This will print 200 letters: a,b,c,d,...,aa,...,cq
The buildString function will build our string from the least significant number (right) to the most significant (left). By performing a modulus division, you will find the array position of the next character. Add this character to the front of your string, and divide
your number by the size of your array (which is the base number in your character based number system), ignoring the rest.
To explain the method using our normal 10-based number system and the input of 123, you would simply pick the last digit, 3, divide the input by 10, pick the last digit 2, divide the input by 10, pick the last digit 1, divide the input by 10. The input is now 0 and your output is ready...
$chars = array('a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q'
,'r','s','t','u','v','w','x','y','z','A','B','C','D','E','F','G','H','I','J'
,'K','L','M','N','O','P','Q','R','S','T','U','V','X','Y','Z','1','2','3','4'
,'5','6','7','8','9','0');
$numChars = count($chars);
// Output numbers from 1 to 200 (a to cq)
for($i = 1; $i <= 200; $i++) {
echo buildString($i).'<br>';
}
// Will also work fine for large numbers - output "dxSA"
echo buildString(1000000).'<br>';
function buildString($int) {
global $chars;
global $numChars;
$output = '';
while($int) {
$output = $chars[($int-1) % $numChars] . $output;
$int = floor(($int-1) / $numChars);
}
return $output;
}
If you have access to gmp extension and PHP 5.3.2+ this will work for the charset you specified:
$result = strtr(
gmp_strval(gmp_init($i, 10), 62),
'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz',
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
);

Performance of tokenizing CSS in PHP

This is a noob question from someone who hasn't written a parser/lexer ever before.
I'm writing a tokenizer/parser for CSS in PHP (please don't repeat with 'OMG, why in PHP?'). The syntax is written down by the W3C neatly here (CSS2.1) and here (CSS3, draft).
It's a list of 21 possible tokens, that all (but two) cannot be represented as static strings.
My current approach is to loop through an array containing the 21 patterns over and over again, do an if (preg_match()) and reduce the source string match by match. In principle this works really good. However, for a 1000 lines CSS string this takes something between 2 and 8 seconds, which is too much for my project.
Now I'm banging my head how other parsers tokenize and parse CSS in fractions of seconds. OK, C is always faster than PHP, but nonetheless, are there any obvious D'Oh! s that I fell into?
I made some optimizations, like checking for '#', '#' or '"' as the first char of the remaining string and applying only the relevant regexp then, but this hadn't brought any great performance boosts.
My code (snippet) so far:
$TOKENS = array(
'IDENT' => '...regexp...',
'ATKEYWORD' => '#...regexp...',
'String' => '"...regexp..."|\'...regexp...\'',
//...
);
$string = '...CSS source string...';
$stream = array();
// we reduce $string token by token
while ($string != '') {
$string = ltrim($string, " \t\r\n\f"); // unconsumed whitespace at the
// start is insignificant but doing a trim reduces exec time by 25%
$matches = array();
// loop through all possible tokens
foreach ($TOKENS as $t => $p) {
// The '&' is used as delimiter, because it isn't used anywhere in
// the token regexps
if (preg_match('&^'.$p.'&Su', $string, $matches)) {
$stream[] = array($t, $matches[0]);
$string = substr($string, strlen($matches[0]));
// Yay! We found one that matches!
continue 2;
}
}
// if we come here, we have a syntax error and handle it somehow
}
// result: an array $stream consisting of arrays with
// 0 => type of token
// 1 => token content
Use a lexer generator.
The first thing I would do would be to get rid of the preg_match(). Basic string functions such as strpos() are much faster, but I don't think you even need that. It looks like you are looking for a specific token at the front of a string with preg_match(), and then simply taking the front length of that string as a substring. You could easily accomplish this with a simple substr() instead, like this:
foreach ($TOKENS as $t => $p)
{
$front = substr($string,0,strlen($p));
$len = strlen($p); //this could be pre-stored in $TOKENS
if ($front == $p) {
$stream[] = array($t, $string);
$string = substr($string, $len);
// Yay! We found one that matches!
continue 2;
}
}
You could further optimize that by pre-calculating the length of all your tokens and storing them in the $TOKENS array, so that you don't have to call strlen() all the time. If you sorted $TOKENS into groups by length, you could reduce the number of substr() calls further as well, as you could take a substr($string) of the current string being analyzed just once for each token length, and run through all the tokens of that length before moving on to the next group of tokens.
the (probably) faster (but less memory friendly) approach would be to tokenize the whole stream at once, using one big regexp with alternatives for each token, like
preg_match_all('/
(...string...)
|
(#ident)
|
(#ident)
...etc
/x', $stream, $tokens);
foreach($tokens as $token)...parse
Don't use regexp, scan character by character.
$tokens = array();
$string = "...code...";
$length = strlen($string);
$i = 0;
while ($i < $length) {
$buf = '';
$char = $string[$i];
if ($char <= ord('Z') && $char >= ord('A') || $char >= ord('a') && $char <= ord('z') || $char == ord('_') || $char == ord('-')) {
while ($char <= ord('Z') && $char >= ord('A') || $char >= ord('a') && $char <= ord('z') || $char == ord('_') || $char == ord('-')) {
// identifier
$buf .= $char;
$char = $string[$i]; $i ++;
}
$tokens[] = array('IDENT', $buf);
} else if (......) {
// ......
}
}
However, that makes the code unmaintainable, therefore, a parser generator is better.
It's an old post but still contributing my 2 cents on this.
one thing that seriously slows down the original code in the question is the following line :
$string = substr($string, strlen($matches[0]));
instead of working on the entire string, take just a part of it (say 50 chars) which are enough for all the possible regexes. then, apply the same line of code on it. when this string shrinks below a preset length, load some more data to it.

Categories