Best way to handle a large while loop in PHP - php

I have a script that runs via CRON that processes each row (or user) in one of the tables in my databases, then uses cURL to pull a URL based on the username found in the row, and then adds or updates additional information into the same row. This works fine for the most part, but seems to take about 20 minutes+ to go through the whole database and it seems to go slower and slower the farther it is into the while loop. I have about 4000 rows at the moment and there will be even more in the future.
Right now a simplified version of my code is like this:
$i = 0;
while ($i < $rows) {
    $username = mysql_result($query, $i, "username");
    curl_setopt($ch, CURLOPT_URL, 'http://www.test.com/'.$username.'.php');
    $page = curl_exec($ch);
    preg_match_all('htmlcode', $page, $test);
    foreach ($test as $test3) {
        $test2 = $test3[0];
    }
    mysql_query("UPDATE user SET info = '$test2' WHERE username = '$username'");
    ++$i;
}
I know MySQL queries shouldn't be in a while loop, and it's the last query for me to remove from it, but what is the best way to handle a while loop that needs to run over and over for a very long time?
I was thinking the best option would be to have the script run through the rows ten at a time then stop. For instance, since I have the script in CRON, I would like to have it run every 5 minutes and it would run through 10 rows, stop, and then somehow know to pick up the next 10 rows when the CRON job starts again. I have no idea how to accomplish this however.
Any help would be appreciated!

About loading the data step by step:
You could add a column "last_updated" to your table and update it every time you load the page. Then you compare the column with the current timestamp before you load the website again.
Example:
mysql_query("UPDATE user SET info = '$test2', last_updated = ".time()." WHERE username = '$username'");
And when you load your data, select only the rows that are due for a refresh: "WHERE last_updated < (time()-$time_since_last_update)"

What about dropping the 'foreach' loop?
Just use the last element of the $test array.

LIMIT and OFFSET are your friends here. Keep track of where you are through a DB field as suggested by Bastian or you could even store the last offset you used somewhere (could be a flat file) and then increase that every time you run the script. When you don't get any more data back, reset it to 0.
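The flat-file variant of this could be sketched roughly as follows. This is a minimal sketch, not a drop-in solution: run_batch() and the demo file name are hypothetical, and the $fetchBatch callback stands in for a query such as "SELECT username FROM user ORDER BY id LIMIT $batchSize OFFSET $offset". An in-memory array replaces the table so the example runs on its own.

```php
<?php
// Hypothetical helper: process one batch per cron run and remember the
// offset in a flat file between runs.
function run_batch(string $offsetFile, int $batchSize, callable $fetchBatch, callable $processRow): int
{
    // Read the offset left by the previous run (0 if the file is missing).
    $offset = is_file($offsetFile) ? (int) file_get_contents($offsetFile) : 0;

    $rows = $fetchBatch($offset, $batchSize);
    if (count($rows) === 0) {
        // No more data: reset to 0 and start over from the first row.
        $offset = 0;
        $rows = $fetchBatch($offset, $batchSize);
    }

    foreach ($rows as $row) {
        $processRow($row);
    }

    // Store where the next run should start.
    $next = $offset + count($rows);
    file_put_contents($offsetFile, (string) $next);
    return $next;
}

// Demo: an array stands in for the user table (assumption for illustration).
$users = ['alice', 'bob', 'carol', 'dave', 'erin'];
$fetch = function (int $offset, int $limit) use ($users): array {
    return array_slice($users, $offset, $limit);
};
$seen = [];
$process = function (string $username) use (&$seen): void {
    $seen[] = $username; // the cURL fetch and UPDATE would go here
};

$offsetFile = sys_get_temp_dir() . '/batch_offset_demo.txt';
if (is_file($offsetFile)) {
    unlink($offsetFile);
}
$first  = run_batch($offsetFile, 2, $fetch, $process); // alice, bob
$second = run_batch($offsetFile, 2, $fetch, $process); // carol, dave
$third  = run_batch($offsetFile, 2, $fetch, $process); // erin
$fourth = run_batch($offsetFile, 2, $fetch, $process); // wraps: alice, bob
```

With a real table, a stable ORDER BY matters, since LIMIT/OFFSET without one does not guarantee that consecutive batches cover different rows.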

Related

run the same script but with a different variable each given period of time

Let's say I have a text file that has a list of urls, from which social media comments must be parsed regularly. I don't want to parse comments from all pages at once as that's a significant load. I need to run my script with a different $url variable corresponding to a line from that text file each 5 minutes.
So it must take the first line as $url and complete the script using this variable, after 5 minutes the variable $url must change to the second line from that file and complete the script with it, in another 5 minutes the same must be repeated for the third line from that file, and so on. When it reaches the last line, it must start from the beginning.
Sorry, can't show any attempts, because I have no idea how to implement it, and I couldn't come up with an appropriate search request either.
As a first step, set up a cron job that runs your script (e.g. cron.php) every 5 minutes.
crontab
*/5 * * * * php /path_to_your_cron_php/cron.php
Let's assume that you have your URLs in a file named file.txt, one per line.
file.txt
https://www.google.com/
https://www.alexa.com/
https://www.yourdomain.com/
Let's create a file, index.txt, that keeps the index of the URL we want to process next. It has just one line with one value.
index.txt
0
cron.php
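<?php
$fileWithUrl = '/path/to/your/file.txt';
$indexFile = '/path/to/your/index.txt';
// Read the URLs without trailing newlines and skip blank lines.
$urls = file($fileWithUrl, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$maxIndex = count($urls);
$index = (int)file_get_contents($indexFile);
// Guard against a stale index if file.txt has shrunk.
if ($index >= $maxIndex) {
    $index = 0;
}
your_parse_function($urls[$index]);
// Store the next index, wrapping to 0 after the last URL.
file_put_contents($indexFile, ($index + 1 >= $maxIndex) ? 0 : $index + 1);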
As you can see, this script reads the contents of file.txt and index.txt. It converts the first into an array of URLs and casts the contents of index.txt to an integer index.
After your_parse_function() has run, the script replaces the content of index.txt with the incremented index, or resets it to 0 once the index passes the last URL in file.txt.
Since variables don't persist through different runs, you'd need to keep track of the ones you have already parsed and the ones that remain outside of your code.
The most efficient way would be to have a semaphore table with each URL on a single row, paired with a parsed/pending flag.
Each time the cron runs, select a single row from the semaphore table that is flagged pending. Assuming MySQL:
SELECT url
FROM semaphore
WHERE status = 'pending'
LIMIT 1;
This selects one URL (any one) that has yet to be parsed. Use it as the input to your parser and, after parsing, update the flag to parsed so it is not selected again.
Other approaches would be to keep a counter on a text file or a database table. Each time the cron runs, check what the counter is and process the next number. After processing, update the counter to the current value + 1.
EDIT:
This may be a simple way to solve your re-iteration with a variable list of URLs
1.- Create a table with the following fields:
id, url, status (pending/parsed), last_updated (datetime)
2.- On each run of your cron:
select url from semaphore where status='pending' order by last_updated asc limit 1
3.- If a URL is returned, process it. Upon completion, update the status to parsed and last_updated to the current timestamp.
If nothing is returned, update every row to status = pending (but leave the last_updated field alone) and then re-run the above query.
By doing this, you can be sure that when starting over, you'll first process the URL that has been "waiting" the longest.
PHP is pretty stateless by default, so once a script has finished executing, everything is wiped.
What I would do: Try a for loop, and use PHP's sleep() function for a break in between URLs. You can either run that loop as a cron job (better), or put it in a while (true) loop and never let it "finish".
https://secure.php.net/manual/en/function.sleep.php
If you want to do this with only the things you currently are using (PHP and that text file), you could just remove that first line from the text file when you process it and then append it back to the end once you're done. You'd either have to open two successive file handles or seek to the end of the file using one, but you wouldn't need any additional data structures/SQL/what have you. Make the text file itself rotate while you blindly fire cron every five minutes.
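A rough sketch of that rotating-file idea is shown below. The helper name rotate_url() and the example.com-style URLs are made up for illustration. For brevity it rewrites the file in one step and moves the line to the end before processing rather than after, which is a simplification of the remove-then-append sequence described above.

```php
<?php
// Hypothetical helper: take the first URL from the list and move it to the
// end of the file, so the next cron run sees the following URL first.
function rotate_url(string $path): ?string
{
    $urls = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($urls === false || count($urls) === 0) {
        return null; // nothing to process
    }
    $current = array_shift($urls); // first line is the one to process now
    $urls[] = $current;            // append it so it comes round again
    // LOCK_EX takes an exclusive lock on the file while it is being written.
    file_put_contents($path, implode("\n", $urls) . "\n", LOCK_EX);
    return $current;
}

// Demo with a temporary file and placeholder URLs.
$path = sys_get_temp_dir() . '/rotate_demo.txt';
file_put_contents($path, "https://a.example/\nhttps://b.example/\nhttps://c.example/\n");
$first  = rotate_url($path); // would be passed to the parser on run 1
$second = rotate_url($path); // would be passed to the parser on run 2
```

If a run crashes mid-parse, this version has already moved on to the next URL; appending only after a successful parse, as the answer suggests, would retry the same URL instead.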

how can I speed up my cron job / database update

I have a cron job that runs once every hour, to update a local database with hourly data from an API.
The database stores hourly data in rows, and the API returns 24 points of data, representing the past 24 hours.
Sometimes a data point is missed, so when I get the data back, I can't update only the latest hour. I also need to check whether I already have each data point, and fill in any gaps where gaps are found.
Everything is running and working, but the cron job takes at least 30 minutes to complete every time, and I wonder if there is any way to make this run better / faster / more efficiently?
My code does the following: (summary code for brevity!)
// loop through the 24 data points returned
for ($i = 0; $i < 24; $i += 1) {
    // check if the data is for today, because the past 24 hours data will include data from yesterday
    if ($thisDate == $todaysDate) {
        // check if data for this id and this time already exists
        $query1 = "SELECT reference FROM mydatabase WHERE ((id='$id') AND (hour='$thisTime'))";
        // if it doesnt exist, insert it
        if ($datafound==0) {
            $query2 = "INSERT INTO mydatabase (id,hour,data_01) VALUES ('$id','$thisTime','$thisData')";
        }
    }
}
And there are 1500 different IDs, so it does this 1500 times!
Is there any way I can speed up or optimise this code so it runs faster and more efficiently?
This does not seem very complex, and it should run in a few seconds. So my first guess, without knowing your database, is that you are missing an index. Please check whether there is an index on your id field. If id is not your unique key, consider adding a composite index on the two fields id and hour. If these indexes aren't already there, adding them should save a massive amount of time.
Another idea could be to retrieve all data for the last 24 hours in a single sql query, store the values in an array and do your checks if you already read that data only on your array.
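The array-based check could look something like the sketch below. The helper name missing_points(), the hour format, and the sample values are assumptions for illustration; $storedHours would come from a single query such as "SELECT hour FROM mydatabase WHERE id = ? AND hour >= ?" instead of 24 separate lookups.

```php
<?php
// Hypothetical helper: given the hours already stored for one id and the
// points returned by the API, return only the points that are missing.
function missing_points(array $storedHours, array $apiPoints): array
{
    // Flip to a lookup table so each check is an array lookup, not a query.
    $known = array_flip($storedHours);
    $missing = [];
    foreach ($apiPoints as $hour => $value) {
        if (!isset($known[$hour])) {
            $missing[$hour] = $value;
        }
    }
    return $missing;
}

// Sample data (made up): two hours are stored, the API returns three.
$storedHours = ['2024-01-01 00:00', '2024-01-01 02:00'];
$apiPoints = [
    '2024-01-01 00:00' => 1.5,
    '2024-01-01 01:00' => 1.7,
    '2024-01-01 02:00' => 1.9,
];
$toInsert = missing_points($storedHours, $apiPoints);
// Only the rows in $toInsert need an INSERT.
```

This cuts the work per id from up to 24 SELECTs plus inserts down to one SELECT plus only the inserts that are actually needed.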

Prevent a table from update queries for 20 seconds. possible?

Here is the problem:
I have a PHP file that is called once a second for 20 seconds. If I put an UPDATE query in it, the query is executed 20 times, but I want the table to be updated only once per 20 calls. After 20 seconds the table should accept updates again. How can I accomplish this? Is there something like a trigger that automatically blocks updates for a certain period of time?
Here is what I have tried so far:
I keep a record in a table holding the timestamp of the last update and check it on each call. If more than 20 seconds have passed, I run the update; otherwise I skip the update script. This works, but is there a more efficient method?
The fun and interesting method: on PHP 5.3 you can use the APC cache to store a variable for 20 seconds and run your query only when that variable does not exist. This would change in PHP 5.5, where PHP adopted a different caching method.
if(!$value = apc_fetch('key')) {
// Run your query and store the updated key
apc_store('key', true, 20);
}
The boring and dull but solidly future-proof method is to use a session variable to do effectively the same thing: check whether the last update is more than 20 seconds old.
if (!isset($_SESSION['timer']) || $_SESSION['timer'] <= time() - 20) {
    // run your query and update the timer with the update time.
    $_SESSION['timer'] = time();
}

"Loading Record Number [##] of [Total]" progress bar

I have a problem that may not have a solution, but hopefully someone out there can figure this out.
I developed a website in PHP/MySQL that uses HTML/CSS to process payroll. When the user submits the payroll for the past (2 week) period, it processes each employee's hours. For companies with <50 employees, it can process it pretty fast, but for companies with over 100 employees, it can take quite a while to process. What I would like ideally is not a generic 'Loading' bar or an estimated '35% loaded' bar since each company's payroll will vary greatly in employee numbers.
The best solution would be that as soon as they submit the pay period, I could pass the total record number from the PHP/MySQL processor/DB, then update the number as each employee is processed from the PHP processor, so the user would see "Processing Employee 35 of 134" for example where '35' would increment and be updated as each record is processed. Or, if not possible, I'd even be fine with a dynamic list such as:
Processing Employee 1 of 134
Processing Employee 2 of 134
Processing Employee 3 of 134
Processing Employee 4 of 134
and so on ...
Ajax or Javascript seem to be the best options to achieve this, however I can't figure out yet how to use them to achieve this. Any help or insight would be greatly appreciated. I'll continue looking and update this post if I find anything as well.
I've done that by calling the flush() function in PHP while iterating through the batch, but you could get tricky and update a hidden field and have a JavaScript function on setTimeout check that value and update a progress bar.
http://php.net/manual/en/function.flush.php
And progress bar:
http://docs.jquery.com/UI/Progressbar
What I would do with the dynamic list is:
$count = 0;
// some db query
// start output
echo "<ul>";
// iterate through records and perform dynamic insert
$count ++;
echo "<li>Processed " . $count . " records.</li>";
flush();
// end iteration
// end output
echo "</ul>";
If you only want to update every so many records, then, as you stated, get a total count and use the modulus operator in an if clause. For example, if you had 50 records and wanted to update every 5: if ($count % 5 == 0) { echo ...; flush(); }
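Put together, the modulus check and flush() might be sketched like this. The helper progress_message() and the record count of 12 are invented for the example, and reporting on the final record as well is an addition of this sketch so the user always sees the last count.

```php
<?php
// Hypothetical helper: return a status line every $every records and on the
// last record, or null when nothing should be printed.
function progress_message(int $count, int $total, int $every): ?string
{
    if ($count % $every === 0 || $count === $total) {
        return "Processing employee $count of $total";
    }
    return null;
}

$total = 12; // would come from a COUNT(*) query
$messages = [];
echo "<ul>\n";
for ($count = 1; $count <= $total; $count++) {
    // ... process one record here ...
    $message = progress_message($count, $total, 5);
    if ($message !== null) {
        $messages[] = $message;
        echo "<li>$message</li>\n";
        flush(); // ask PHP to send the output produced so far
    }
}
echo "</ul>\n";
```

Whether the browser actually shows each line as it is flushed also depends on output buffering and on the web server, so this is worth testing in the real setup.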
You would have to combine what Mike S. suggested with quick ajax calls (say every 500 ms). You could make an ajax call to a text file that is written to from your PHP file.
For example:
<?php
$count = 0;
mysql_connect('blahblah');
// start output
$query = mysql_query("SELECT ...");
while ($rs = mysql_fetch_assoc($query)) {
    // overwrite the status file with the current count
    $fh = fopen('filename.txt', 'w');
    fwrite($fh, $count);
    fclose($fh);
    ++$count;
}
?>
Then you need to make an ajax call every 500 ms (or more often) to filename.txt and read its contents to see how far along the processing is. You could even have the PHP file write [current_count]-[total_count] (for example 15-155 for record 15 of 155 total records) and do results.split('-') in your JavaScript.
My approach would be to store the total number of records and the current record number in session variables. Then set up a php page that returns the text/html of "Processing employee $currentRec of $totalRec".
When you submit the request from the main page, display a div on the page to show the status message. Fire off an ajax request to process the data and have it hide the div when it is complete. The code that processes the records can update the session variable as it goes along. At the same time fire off a periodical ajax request that gets the status message and updates the div's contents with the response. Have this continue until the div is no longer visible. You should have a status message on the page that pops up while the data is being processed to display the current record number, and it will update as often as you like based on how you set up the update timer.
The exact implementation would depend on whether you are using jQuery, prototype, plain Javascript, etc...

get new result from sql every week

I have a table from which I want to pick one row and show it to the user. Every week I want the website to automatically pick another row at random. So, basically, I want a new result every week, not every time a user visits the page.
I am using this code right now :
$res = mysql_query("SELECT COUNT(*) FROM fruit");
$row = mysql_fetch_array($res);
$offset = rand(0, $row[0]-1);
/* the first three lines to pick a row randomly from the table */
$res = mysql_query("SELECT * FROM fruit LIMIT $offset, 1");
$row = mysql_fetch_assoc($res);
This code gets a new result every time a user visits the page, and after every refresh another random row is chosen. I want it to update every week, with the same result for every user. Is there a PHP command that does that? If so, how does it work?
My suggestion would be as follows:
Store the random result id and a timestamp in some kind of persistent storage (file, DB table, etc).
Setup a cron job or other automated task to update the record above weekly. If you don't have access to such solutions, you could write code to do it on each page load and check against the timestamp column. However, that's pretty inefficient.
Yes there is. Use the date function in php and write each week and the corresponding row to a file using fwrite. Then, using an if statement, check if it is a new week and if it is get a new random row, write it to the file and return that, if it isn't, return the same one for that week.
A cronjob is the best solution. Create a script weeklynumber.php, much as what you have already, that generates an entry. After this, go to your console, and open your crontab file using crontab -e.
In here, you may add
0 0 * * 0 php /path/to/weeklynumber.php
This means that at every Sunday at 0:00, php /path/to/weeklynumber.php is executed.
But all of this assumes you're on UNIX and that you have access to creating cronjobs. If not, here's another solution: Hash the week number and year, and use that to generate the weekly number.
// Get the current week and year
$week = date('Wy');
// Get the MD5 hash of this
$hash = md5($week);
// Get the amount of records in the table
$count = mysql_result(mysql_query("SELECT COUNT(*) FROM fruit"),0);
// Convert the MD5 hash to an integer
$num = base_convert($hash, 16, 10);
// Use the last 6 digits of the number and take modulus $count
$num = substr($num,-6) % $count;
Note that the above will only work as long as the amount of records in your table doesn't change.
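A slightly tighter variant of the same hashing idea is sketched below, as an alternative rather than a correction. It uses only a short prefix of the MD5 hash so the value fits in an ordinary integer, and it uses date('oW') so the ISO year matches the ISO week number around New Year. The helper name weekly_offset() and the row count of 25 are assumptions for illustration.

```php
<?php
// Hypothetical helper: map a year-week key to a stable row offset.
function weekly_offset(string $weekKey, int $rowCount): int
{
    // 7 hex digits (28 bits) fit in a PHP integer, so no precision is lost.
    $n = hexdec(substr(md5($weekKey), 0, 7));
    return $n % $rowCount;
}

$weekKey = date('oW');                  // ISO year followed by ISO week number
$offset  = weekly_offset($weekKey, 25); // 25 stands in for SELECT COUNT(*)
// The row would then be fetched with: SELECT * FROM fruit LIMIT $offset, 1
```

The same caveat applies: if the number of rows changes during the week, the offset can point at a different row.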
And finally, just a little note to your current method. Instead of counting rows, getting a random number from PHP, and asking your DBMS to return that number, it can all be done with a single query
SELECT * FROM fruit ORDER BY RAND() LIMIT 1
