Recursive directory iterator with offset - php

Is it possible to start the loop from a certain point?
$iterator = new \RecursiveIteratorIterator(new \RecursiveDirectoryIterator($path, $flags));
$startTime = microtime(true);
foreach ($iterator as $pathName => $file) {
    // file processing here
    // after 5 seconds stop and continue in the next request
    $elapsedSecs = (microtime(true) - $startTime);
    if ($elapsedSecs > 5) {
        break;
    }
}
But how do I resume from my break point in the next request?

a) Pull the time calculation out of the foreach. You have a start time and want a runtime of 5 seconds, so you can calculate the end time beforehand (start time + 5s). Inside the foreach, simply check whether the current time is greater than or equal to the end time and, if so, break.
b) Q: Is it possible to start the loop from a certain point? How do I resume from my break point in the next request?
Two approaches come to mind.
First, you could store the last processing position and resume at that position + 1. You would save the last position of the iteration and fast-forward to it on the next request, for example by calling $iterator->next() until you reach the next item to process, which is $lastPosition + 1. You have to store $lastPosition (and re-create the iterator) and pick it up again on each request, until $lastPosition equals the total number of elements in the iterator.
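A minimal, untested sketch of that fast-forward idea, assuming the directory contents and their order don't change between requests, that session_start() has already been called so the position can live in $_SESSION, and using the $path, $flags and 5-second budget from the question:
$lastPosition = isset($_SESSION['lastPosition']) ? (int) $_SESSION['lastPosition'] : 0;

$iterator = new \RecursiveIteratorIterator(new \RecursiveDirectoryIterator($path, $flags));
$endTime  = microtime(true) + 5;
$current  = 0;

foreach ($iterator as $pathName => $file) {
    // Skip everything that was already handled in earlier requests.
    if ($current++ < $lastPosition) {
        continue;
    }
    // ... process $file here ...
    if (microtime(true) >= $endTime) {
        break;
    }
}

// Remember where to resume on the next request.
$_SESSION['lastPosition'] = $current;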
Or, you could turn the iterator into an array on the first run, $array = iterator_to_array($iterator);, and then use a shrinking-array approach. (Maybe someone else knows how to reduce an iterator object directly.) With this approach you only store the remaining data, which decreases request by request until it reaches 0.
The code below is untested. It's just a quick draft.
$starttime = time();
$endtime = $starttime + 5; // 5 sec
$totalElements = count($array);
for ($i = 0; $i < $totalElements; $i++) {
    if (time() >= $endtime) {
        break;
    }
    doStuffWith($array[$i]);
}
echo 'Processed ' . $i . ' elements in 5 seconds';
// exit condition is "totalElements to process = 0"
// greater than or equal to 1 means there is more work to do
if (($totalElements - $i) >= 1) {
    // chop off all the processed items from the initial array
    // and build the array for the next processing request
    $reduced_array = array_slice($array, $i);
    // save the reduced array to cache, session, or disk
    store($reduced_array);
} else {
    echo 'Done.';
}
// on the next request, load the array and resume the steps above...
All in all, this is batch processing and might be done more efficiently by a worker/job queue, like:
Gearman (the PHP manual has some Gearman examples), or
RabbitMQ / AMQP, or
the PHP libs listed here: https://github.com/ziadoz/awesome-php#queue.
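For example, here's a rough, untested Gearman sketch; it assumes the PECL gearman extension is installed, a gearmand server is running on localhost, and the 'process_file' task name is made up for the example. Each file is handed off to a background worker instead of racing a 5-second budget:
// Producer (the web request): queue each file as a background job.
$client = new GearmanClient();
$client->addServer(); // defaults to 127.0.0.1:4730

$iterator = new \RecursiveIteratorIterator(new \RecursiveDirectoryIterator($path, $flags));
foreach ($iterator as $pathName => $file) {
    $client->doBackground('process_file', $pathName);
}

// Worker (a separate long-running CLI process): does the actual file processing.
$worker = new GearmanWorker();
$worker->addServer();
$worker->addFunction('process_file', function (GearmanJob $job) {
    $pathName = $job->workload();
    // ... process the file here ...
});
while ($worker->work());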

Related

How to run php file for only 10 times a day without cron?

I am new to PHP, so please bear with me if this is an easy question. I have a PHP script that I want to be executed only 10 times a day and not more than that. I don't want to use cron for this. Is there any way to do this in PHP only?
Right now I have set a counter which increases by one every time anyone runs the script, and I limit it to 10 runs. If it exceeds that, it shows an error message.
function limit_run_times() {
    $counter = 1;
    $file = 'counter.txt';
    if (file_exists($file)) {
        $counter += file_get_contents($file);
    }
    file_put_contents($file, $counter);
    if ($counter > 11) {
        die("limit is exceeded!");
    }
}
I want an efficient way to do this so that the script is only executed 10 times each day, with the counter resetting to 0 every day. Is there any other efficient method?
I would rather recommend that you use a database instead - it's cleaner and simpler to maintain.
However, it is achievable with file handling as well. The file will have the format 2019-05-15	1 (date and counter separated by a tab, \t). Fetch the contents of the file and split the values with explode(). Then do your comparisons and checks, and return values accordingly.
function limit_run_times() {
    // Variable declarations
    $fileName = 'my_log.txt';
    $dailyLimit = 10;

    // On the very first run, create the log file so file_get_contents() has something to read
    if (!file_exists($fileName)) {
        file_put_contents($fileName, date("Y-m-d") . "\t" . 0);
    }

    $content = file_get_contents($fileName);
    $parts = explode("\t", $content);
    $date = $parts[0];
    $counter = $parts[1] + 1;

    // Check the counter - if it's higher than 10 on this date, stop here
    if ($counter > $dailyLimit && date("Y-m-d") === $date) {
        die("Daily execution limit ($dailyLimit) exceeded! Please try again tomorrow.");
    }

    // We only get here if the count is $dailyLimit or less.
    // If the stored date is not today, set the new date and reset the counter to 1
    // (as the script is executed now); otherwise keep the incremented counter.
    if (date("Y-m-d") !== $date) {
        $counter = 1;
        $date = date("Y-m-d");
    }
    file_put_contents($fileName, $date . "\t" . $counter);
    return true;
}
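For completeness, here is a rough, untested sketch of the database route recommended above, using SQLite through PDO so it stays self-contained; the run_counter.sqlite file name and the runs table are made up for the example:
function limit_run_times_db($dailyLimit = 10) {
    $pdo = new PDO('sqlite:run_counter.sqlite');
    $pdo->exec('CREATE TABLE IF NOT EXISTS runs (day TEXT PRIMARY KEY, counter INTEGER NOT NULL)');

    $today = date('Y-m-d');

    // Read today's counter (0 if there is no row yet).
    $stmt = $pdo->prepare('SELECT counter FROM runs WHERE day = ?');
    $stmt->execute([$today]);
    $counter = (int) $stmt->fetchColumn();

    if ($counter >= $dailyLimit) {
        die("Daily execution limit ($dailyLimit) exceeded! Please try again tomorrow.");
    }

    // Upsert today's counter; REPLACE keeps the example short.
    $stmt = $pdo->prepare('REPLACE INTO runs (day, counter) VALUES (?, ?)');
    $stmt->execute([$today, $counter + 1]);

    return true;
}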

How to make comparisons of a needle in a haystack more efficient

I've been struggling to make the following piece of code more efficient.
In short;
I've got a database with titles and descriptions; it will average around 10,000 texts. I want to compare these texts by splitting each title into words with mb_split and then looping through all other texts to check whether each word exists in them. Depending on how many matches were found, I want to write the article numbers to another table in that database.
The following code works and does the trick, but it takes a really long time to finish and uses a lot of resources. I can't seem to find a way to compare these texts more efficiently.
function compareArticle() {
    include '../include/write.php';
    $readNewsQuery = "select title,text,articleid,name from texts";
    $readNews = $dbwrite->query($readNewsQuery);
    if ($readNews) {
        // Fetch mysql data as an array
        $news = $readNews->fetch_all(MYSQLI_NUM);
        // Start foreach to read every article once
        foreach ($news as $item) {
            echo $item[2].'<br />';
            // Start another foreach to loop through the articles to compare with
            foreach ($news as $compare) {
                $strippedWords = mb_split(' +', $item[0]);
                $count = 0;
                $compareString = "";
                $compareString .= $compare[0];
                $compareString .= $compare[1];
                $compareString = strtolower($compareString);
                // Start yet another foreach to loop through the words
                foreach ($strippedWords as $word) {
                    // I only want to count the words that are longer than 4 characters
                    if (strlen($word) > 4) {
                        $word = strtolower($word);
                        if (strpos($compareString, $word) && $compare[2] != $item[2]) {
                            $count++;
                        }
                    }
                }
                if ($count > 5) {
                    echo $count.'<br />';
                    // Insert action to write comparison to database (item[2] and compare[2])
                }
            }
        }
    }
}
What I'd really like to know: can I be more efficient? Could I use fewer loops, or is there an easier way to search the array? If I can be more efficient, could someone give me a nudge in the right direction?
EDIT:
It might be useful to know what data I retrieve and what I want to write to another table:
texts-database is set up to include
| article id | title | text | sourcename
I compare the words in a title with the words of title and text combined for all other articles. If they match enough, I want to write both article id's to another table:
| id | original article id | compared article id |
Once you have looped through a news item, you no longer need to compare any other news items to it. For example, if news item 1 didn't match the other 50 news items, then when you start checking news item 2 you already know it doesn't match news item 1.
So instead of looping through the full list of news items twice, you can start your second loop at the current index + 1 of your first news article loop (you don't need to compare the current news item with itself).
Edit: Here's an example.
Optimized loop:
$matches = array();
$a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25];
$count = 0;
for ($i = 0; $i < count($a); ++$i) {
    for ($j = $i + 1; $j < count($a); ++$j) {
        if ($a[$i] == $a[$j]) {
            array_push($matches, "$i, $j");
        }
        $count++;
    }
}
echo "Optimized n loops: $count\n";
echo 'Matches: ' . count($matches);
// Output
// Optimized n loops: 435
// Matches: 5
Un-optimized loop:
$matches = array();
$count = 0;
for ($i = 0; $i < count($a); ++$i) {
    for ($j = 0; $j < count($a); ++$j) {
        if ($a[$i] == $a[$j]) {
            array_push($matches, "$i, $j");
        }
        $count++;
    }
}
$matches = array_unique($matches); // Dedupe
echo "Un-optimized n loops: $count\n";
echo 'Matches: ' . count($matches);
// Output
// Un-optimized n loops: 900
// Matches: 40
The un-optimized loop includes a lot of duplicate matches (e.g. index 9 matches index 15, and index 15 matches index 9 again; every index also matches itself).
I've executed a lot of tests and made a few changes to my script, and I now know what the biggest culprit was.
Original case:
Sample size of 10,000;
Execution time: over 600 seconds (ran into the max execution time).
Test case:
Completely stripped-down version of the original;
Sample size of 1,000;
Execution time: 24 seconds.
What made the biggest difference?
The biggest difference was changing the location of the following line:
$strippedWords = mb_split(' +', $item[0]);
I moved that line to the first loop instead of the second. This way the title from the first loop only gets split once per article instead of 1,000 times (once for every article it is compared against). I measured the differences in time:
mb_split in the second loop:
Total execution time in seconds: 162.17704296112
mb_split in the first loop:
Total execution time in seconds: 24.564566135406
That's an amazingly huge difference. I'm guessing mb_split isn't the easiest thing for PHP to do. Putting mb_split in the wrong part of my code made the script almost 7 times slower :|
strtolower()
After that result, I was curious what differences I could make changing the location of other text modifiers. So, I took strtolower() and put that, where possible, in the first loop as well.
strtolower() in second loop:
Total execution time in seconds: 44.315208911896
strtolower() in first loop:
Total execution time in seconds: 37.129139900208
Although this difference is a lot smaller, it's still a notable difference.
A possible other cause
I am not sure -- as I don't currently have the time to test this -- if this is completely true, but while testing a few cases I found my browser was acting up. When I told PHP to output a lot of information to my browser, the scripts felt like they ran longer, and the browser would stop showing information after a while, too.
If the occasion arises and I have some spare time left, I'll test this theory and try to see if my browser can actually affect the duration of my PHP scripts. I can't find a logical reason why it would, as I'd expect the browser to just crash and the PHP script to continue working server-side... but the thought crossed my mind a few times.
Anyway, here's the new script:
function compareArticle() {
    // For timing my script
    $time_start = microtime(true);
    include '../include/write.php';
    $readNewsQuery = "select title,text,articleid,name,datetoday from texts";
    $readNews = $dbwrite->query($readNewsQuery);
    $dateToday = date("Y-m-d");
    if ($readNews) {
        // Fetch mysql data as an array
        $news = $readNews->fetch_all(MYSQLI_NUM);
    }
    foreach ($news as $item) {
        // Decrease the sample pool
        if ($item[4] != $dateToday) {
            continue;
        }
        $strippedWords = strtolower($item[0]);
        $strippedWords = mb_split(' +', $strippedWords);
        // Start another foreach to loop through the articles to compare with
        foreach ($news as $compare) {
            $compareString = "";
            $compareString .= $compare[0];
            $compareString .= $compare[1];
            $count = 0;
            // Start yet another foreach to loop through the words
            foreach ($strippedWords as $word) {
                // I only want to count the words that are longer than 4 characters
                if (strlen($word) > 4) {
                    if (strpos(strtolower($compareString), $word)) {
                        $count++;
                    }
                }
            }
            if ($count > 5) {
                echo $count.'<br />';
                // Insert action to write comparison to database (item[2] and compare[2])
            }
        }
    }
    echo 'Total execution time in seconds: ' . (microtime(true) - $time_start);
}

Why is a sorted array slower than a non sorted array in PHP

I have the following script, and I know about the "branch prediction" principle, but that doesn't seem to be what's happening here.
Why is it faster to process a sorted array than an unsorted array? Here it seems to work the other way around.
When I run the following script without the sort($data), it takes 193.23883700371 seconds to complete.
When I enable the sort($data) line, the script takes 300.26129794121 seconds to complete.
Why is it so much slower in PHP? I used PHP 5.5 and 5.6.
In PHP 7 the script is faster when the sort() is not commented out.
<?php
$size = 32768;
$data = array_fill(0, $size, null);
for ($i = 0; $i < $size; $i++) {
    $data[$i] = rand(0, 255);
}
// Improved performance when disabled
//sort($data);
$total = 0;
$start = microtime(true);
for ($i = 0; $i < 100000; $i++) {
    for ($x = 0; $x < $size; $x++) {
        if ($data[$x] >= 127) {
            $total += $data[$x];
        }
    }
}
$end = microtime(true);
echo($end - $start);
Based on my comments above, the solution is to either find or implement a sort function that moves the values so that memory remains contiguous (and gives you the speedup), or to push the values from the sorted array into a second array so that the new array's memory is contiguous.
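An untested sketch of that second option: simply copy the sorted values into a fresh array so they are stored in insertion order again.
sort($data);

// Copy into a brand-new array: values are appended in sorted order,
// so the new array's internal storage follows that order.
$packed = [];
foreach ($data as $value) {
    $packed[] = $value;
}
$data = $packed;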
Assuming you MEANT to not time the actual sort (your code doesn't time that action), it's difficult to assess any true performance difference because you've filled the array with random data. That means one pass might have MANY more values greater than or equal to 127 (and thus run the additional statement) than another pass. To really compare the two, fill your array with an identical, fixed set of data. Otherwise, you'll never know whether the random fill is causing the time differences you're seeing.
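One simple way to do that, as an untested sketch, is to seed the random number generator so every run fills the array with the exact same values:
srand(42); // fixed seed: every run now sees the same sequence

$size = 32768;
$data = array_fill(0, $size, null);
for ($i = 0; $i < $size; $i++) {
    $data[$i] = rand(0, 255);
}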

Time it takes to assign a variable vs. assign + add

<?php
$a = microtime(true);
$num = 0;
for ($i = 0; $i < 10000000; $i++) {
    $num = $i;
}
$b = microtime(true);
echo $b - $a;
?>
I run this on Ubuntu 12.10 and Apache 2.
It gives me approx. 0.50 seconds when I run the assignment ten million times. BUT, with the same code, if instead of $num = $i I write $num = $i + 10;, it now takes almost 1.5 times less time to execute, around 0.36 consistently.
How come the simple assignment takes more time, while an assignment plus adding 10 to it takes less?
I am by no means an expert, but here are my findings:
$s = microtime(true);
for($i=0;$i<100000000;$i++) $tmp = $i;
$t = microtime(true);
for($i=0;$i<100000000;$i++) $tmp = $i+10;
$u = microtime(true);
echo ($t-$s).chr(10).($u-$t);
Results in:
9.9528648853302
9.0821340084076
On the other hand, using a constant value for the assignment test:
$x = 0;
$s = microtime(true);
for($i=0;$i<100000000;$i++) $tmp = $x;
$t = microtime(true);
for($i=0;$i<100000000;$i++) $tmp = $x+10;
$u = microtime(true);
echo ($t-$s).chr(10).($u-$t);
Results in:
6.1365358829498
9.3231790065765
This leads me to believe that the answer has something to do with opcode caching. I honestly couldn't tell you what about it makes the difference, but as you can see, using a constant value for the assignment makes a huge difference.
This is just an educated guess, based on looking at the latest php source on Github, but I'd say this difference is due to function call overhead in the interpreter source.
$tmp = $i;
compiles to a single opcode ASSIGN !2, !1;, which copies one named variable's value to another named variable. In the source, the key part looks like this:
if (EXPECTED(Z_TYPE_P(variable_ptr) <= IS_BOOL)) {
    /* nothing to destroy */
    ZVAL_COPY_VALUE(variable_ptr, value);
    zendi_zval_copy_ctor(*variable_ptr);
}
$tmp = $i + 10;
compiles to two opcodes ADD ~8 !1, 10; ASSIGN !2, ~8;, which creates a temporary variable ~8 and assigns its value to a named variable. In the source, the key part looks like this:
if (EXPECTED(Z_TYPE_P(variable_ptr) <= IS_BOOL)) {
    /* nothing to destroy */
    ZVAL_COPY_VALUE(variable_ptr, value);
}
Notice that there's an extra function call to zendi_zval_copy_ctor() in the first case. That function performs some bookkeeping as needed (e.g. if the original variable is a resource, it needs to make sure that resource is not freed until this new variable is gone, etc.). For a primitive type such as a number, there's nothing to do, but the function call itself introduces some overhead, which accumulates over 10 million iterations of your test. You should note that this overhead is normally negligible, because even in 10 million iterations it only accumulated to .14 seconds.
@Kolink's observation about a constant being faster can also be explained by the same function. It includes a check to avoid redundant copying when the new value is the same as the old one:
if (EXPECTED(variable_ptr != value)) {
copy_value:
    // the same code that handles `$tmp = $i` above
    if (EXPECTED(Z_TYPE_P(variable_ptr) <= IS_BOOL)) {
        /* nothing to destroy */
        ZVAL_COPY_VALUE(variable_ptr, value);
        zendi_zval_copy_ctor(*variable_ptr);
    } else {
        /* irrelevant to the question */
    }
}
So only the first assignment of $tmp = $x copies the value of $x, the following ones see that the value of $tmp would not change and skip the copying, making it faster.

Project Euler || Question 10

I'm attempting to solve Project Euler in PHP and running into a problem with my for loop conditions inside the while loop. Could someone point me in the right direction? Am I on the right track here?
The problem, btw, is to find the sum of all prime numbers below 2,000,000.
Other note: the problem I'm encountering is that the script seems to be a memory hog, and besides implementing the sieve I'm not sure how else to approach this. So I'm wondering if I did something wrong in the implementation.
<?php
// The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.
// Additional information:
// Sum below 100: 1060
// 1000: 76127
// (for testing)
// Find the sum of all the primes below 2,000,000.

// First, let's set n = 2 mill or the number we wish to find
// the primes under.
$n = 2000000;
// Then, let's set p = 2, the first prime number.
$p = 2;
// Now, let's create a list of all numbers from p to n.
$list = range($p, $n);
// Now the loop for Sieve of Eratosthenes.
// Also, let $i = 0 for a counter.
$i = 0;
while ($p * $p < $n) {
    // Strike off all multiples of p less than or equal to n
    for ($k = 0; $k < $n; $k++) {
        if ($list[$k] % $p == 0) {
            unset($list[$k]);
        }
    }
    // Re-initialize array
    sort($list);
    // Find first number on list after p. Let that equal p.
    $i = $i + 1;
    $p = $list[$i];
}
echo array_sum($list);
?>
You can make a major optimization to your middle loop.
for ($k = 0; $k < $n; $k++) {
    if ($list[$k] % $p == 0) {
        unset($list[$k]);
    }
}
Begin with 2*$p and increment by $p instead of by 1. This eliminates the need for the divisibility check as well as reducing the total number of iterations.
for ($k = 2 * $p; $k < $n; $k += $p) {
    if (isset($list[$k])) unset($list[$k]); // thanks matchu!
}
The suggestion above to check only odd numbers to begin with (other than 2) is a good idea as well, although since the inner loop never gets off the ground for those cases, I don't think it's that critical. I also can't help thinking the unsets are inefficient, though I'm not 100% sure about that.
Here's my solution, using a 'boolean' array for the primes rather than actually removing the elements. I like using map, filter, reduce and the like, but I figured I'd stick close to what you've done, and this might be more efficient (although longer) anyway.
$top = 2000000;
$plist = array_fill(2, $top, 1);
for ($a = 2; $a <= sqrt($top) + 1; $a++) {
    if ($plist[$a] == 1) {
        for ($b = ($a + $a); $b <= $top; $b += $a) {
            $plist[$b] = 0;
        }
    }
}
$sum = 0;
foreach ($plist as $k => $v) {
    $sum += $k * $v;
}
echo $sum;
When I did this for Project Euler I used Python, as I did for most problems, but someone who used PHP along the same lines as my solution claimed it ran in 7 seconds (page 2's SekaiAi, for those who can look). I don't really care for his form (putting the body of a for loop into its increment clause!), or the use of globals and the function he has, but the main points are all there. My convenient means of testing PHP runs through a server on a VMware Fusion local machine, so it's well slower; I can't really comment from experience.
I've got the code to the point where it runs, and passes on small examples (17, for instance). However, it's been 8 or so minutes, and it's still running on my machine. I suspect that this algorithm, though simple, may not be the most effective, since it has to run through a lot of numbers a lot of times. (2 million tests on your first run, 1 million on your next, and they start removing less and less at a time as you go.) It also uses a lot of memory since you're, ya know, storing a list of millions of integers.
Regardless, here's my final copy of your code, with a list of the changes I made and why. I'm not sure that it works for 2,000,000 yet, but we'll see.
EDIT: It hit the right answer! Yay!
Set memory_limit to -1 to allow PHP to take as much memory as it wants for this very special case (very, very bad idea in production scripts!)
In PHP, use % instead of mod
The inner and outer loops can't use the same variable; PHP considers them to have the same scope. Use, maybe, $j for the inner loop.
To avoid having the prime strike itself off in the inner loop, start $j at $i + 1
On the unset, you used $arr instead of $list ;)
You missed a $ on the unset, so PHP interprets $list[j] as $list['j']. Just a typo.
I think that's all I did. I ran it with some progress output, and the highest prime it's reached by now is 599, so I'll let you know how it goes :)
My strategy in Ruby on this problem was just to check whether every number under n was prime, testing divisors from 2 up to floor(sqrt(n)). It's also probably not an optimal solution and takes a while to execute, but only about a minute or two. That could be the algorithm, or that could just be Ruby being better at this sort of job than PHP :/
Final code:
<?php
ini_set('memory_limit', -1);
// The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.
// Additional information:
// Sum below 100: 1060
// 1000: 76127
// (for testing)
// Find the sum of all the primes below 2,000,000.

// First, let's set n = 2 mill or the number we wish to find
// the primes under.
$n = 2000000;
// Then, let's set p = 2, the first prime number.
$p = 2;
// Now, let's create a list of all numbers from p to n.
$list = range($p, $n);
// Now the loop for Sieve of Eratosthenes.
// Also, let $i = 0 for a counter.
$i = 0;
while ($p * $p < $n) {
    // Strike off all multiples of p less than or equal to n
    for ($j = $i + 1; $j < $n; $j++) {
        if ($list[$j] % $p == 0) {
            unset($list[$j]);
        }
    }
    // Re-initialize array
    sort($list);
    // Find first number on list after p. Let that equal p.
    $i = $i + 1;
    $p = $list[$i];
    echo "$i: $p\n";
}
echo array_sum($list);
?>
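For comparison, here is a rough PHP equivalent of the trial-division strategy mentioned above (an untested sketch; it will likely be noticeably slower than the sieve, but it uses very little memory):
<?php
// Check a single number by trial division up to floor(sqrt($n)).
function isPrime($n) {
    if ($n < 2) {
        return false;
    }
    for ($d = 2; $d * $d <= $n; $d++) {
        if ($n % $d == 0) {
            return false;
        }
    }
    return true;
}

$sum = 0;
for ($i = 2; $i < 2000000; $i++) {
    if (isPrime($i)) {
        $sum += $i;
    }
}
echo $sum;
?>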
