PHP While Loop Slowdown over time

I have a seemingly harmless while loop that goes through the result set of a MySQL query and compares the id returned from MySQL to one in a very large multidimensional array:
// mysqli query here; assume the result object is in $result
while ($row = $result->fetch_assoc())
{
    if (!in_array($row['id'], $multiDArray['dimensionOne']))
    {
        // do something
    }
}
When the script first executes, it runs through the results at about 2-5k rows per second; sometimes more, rarely less. The result set brings back 7 million rows, and the script peaks at 2.8GB of memory.
In terms of big data, this is not a lot.
The problem is, around the 600k mark, the loop starts to slow down, and by 800k, it is processing a few records a second.
In terms of server load and memory use, there are no issues.
This is behaviour I have noticed before in other scripts dealing with large data sets.
Is array seek time progressively slower as the internal pointer moves deeper?

That really depends on what happens inside the loop. I know you are convinced it's not a memory issue, but it looks like one. Programs usually get very slow when the system tries to get extra RAM by swapping, and going to the hard drive is obviously very slow; that may be what you are experiencing. It's very easy to benchmark.
In one terminal, run:
vmstat 3 100
Run your script and observe vmstat, looking at the IO and swap columns. If that really is not the case, then profile the execution with Xdebug. It might be tricky, because you do many iterations, and the profiling itself will also cause major IO.
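If vmstat shows no swapping or IO pressure, a cheap intermediate step before a full Xdebug profile is to instrument the loop itself. A minimal sketch, assuming the mysqli result object is in $result and using the same lookup array as the question, that reports throughput and memory every 100,000 rows:

// Instrumentation sketch: log rows/sec and memory so you can see whether the
// slowdown correlates with memory growth or sets in at a fixed row count.
$count = 0;
$start = microtime(true);

while ($row = $result->fetch_assoc()) {
    if (!in_array($row['id'], $multiDArray['dimensionOne'])) {
        // do something
    }

    if (++$count % 100000 === 0) {
        printf(
            "%d rows | %.0f rows/sec | %.1f MB\n",
            $count,
            $count / (microtime(true) - $start),
            memory_get_usage(true) / 1048576
        );
    }
}

Note that in_array() on a large array is a linear scan on every iteration, which makes the loop expensive overall, but by itself it would not explain a progressive slowdown unless that array keeps growing; the log above should show whether memory or the lookup is the dominant factor.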

Related

Query return speed is fine but fetch is very slow

PostgreSQL 14
PHP 7.4
I have a PHP call that returns a handle to a scrollable cursor (PDO::ATTR_CURSOR => PDO::CURSOR_SCROLL) with about 760,000 records. The handle is being returned in a reasonable amount of time. However, once I have the handle back, doing nothing but fetching each record in a loop is taking over 12 minutes. I have tried it as a forward-only cursor with similar results. I am on high performance processors and have plenty of memory. Each record has 181 numeric fields. How can I improve the performance of this?
$first = true;
while ($rec = $handle->fetch(PDO::FETCH_ASSOC, $first ? PDO::FETCH_ORI_FIRST : PDO::FETCH_ORI_NEXT))
{
    $first = false;
}
Updates to answer questions:
Network distance between client and server?
Zero. Both are on localhost.
Where is the return time being measured: in the database layer or in an application frontend?
Measured from the time the query is executed to when the handle is returned. Getting the handle is fine. It's the fetch loop itself that is taking forever once it starts. PHP is measuring the fetch loop execution time.
Do you need to fetch one by one or can you fetch in batches?
I could fetch in batches and process the batches one by one, but the base query is very heavy and running it repeatedly to get to an offset would not be good.
Try running your statement on a psql command line.
If it is slow there as well, you have to use EXPLAIN (ANALYZE, BUFFERS) to understand why (and add the execution plan to the question for further help).
If it is fast in psql, the problem is either the 760,000 client-server round trips or the fact that you didn't set cursor_tuple_fraction to 1.0.
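As a rough sketch of the cursor_tuple_fraction route (the DSN and query below are placeholders, not the original ones): setting it to 1.0 tells the planner to optimize the cursor's query for fetching all rows rather than just the first few.

$pdo = new PDO('pgsql:host=localhost;dbname=mydb', 'user', 'pass');
// Session-level setting; affects cursors declared afterwards on this connection.
$pdo->exec('SET cursor_tuple_fraction = 1.0');

$handle = $pdo->prepare(
    'SELECT * FROM big_table',   // stand-in for the heavy base query
    [PDO::ATTR_CURSOR => PDO::CURSOR_SCROLL]
);
$handle->execute();

$first = true;
while ($rec = $handle->fetch(PDO::FETCH_ASSOC, $first ? PDO::FETCH_ORI_FIRST : PDO::FETCH_ORI_NEXT)) {
    $first = false;
    // process $rec
}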

How can I do lengthy tasks in php while the max execution time is 30 seconds?

I'm parsing data from a text file to a mysql database. The problem is, after parsing a certain number of records (anywhere from 250,000 to 380,000) I get Fatal error: Maximum execution time of 30 seconds exceeded. I can work around this by splitting the file into smaller files, but this is a pain and I'd like to find a way to trick PHP into processing the whole file.
Is there a way to convince PHP to run lengthy processes, even though I don't have access to php.ini and can't change my maximum execution time?
Btw, here's my parsing script. Maybe I'm doing something wrong with my php code.
You may find you can improve performance by inserting rows several at a time. Try this syntax:
INSERT INTO
tbl_name (a,b,c)
VALUES(1,2,3),(4,5,6),(7,8,9)
;
The number of rows you should group together will be best found by experimentation. It might be that the more you add in the one statement, the faster it will be, but equally that might be slow if you don't have auto-commit turned on.
As someone mentions in the comments, putting too many in at once may max out the CPU and raise the eyebrows of the server admin. If this is something you need to be careful with on your host, try 200 rows at a time and add a small usleep() between iterations.
It's worth looking at the connection to your database too - is this on the same server, or is it on another server? The connection may be slow. Add some timing for, say, how long 5,000 rows take to insert, and then play around to see how to reduce it.
Here's the manual reference for INSERT; note that multi-row VALUES lists were historically a MySQL extension, so check support before relying on the same statement on another database engine.
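A rough sketch of the batching idea, assuming the parsed rows are already in an array $rows of (a, b, c) tuples and $db is an open mysqli connection (both names are placeholders):

$batchSize = 200; // tune by experimentation, as suggested above

foreach (array_chunk($rows, $batchSize) as $batch) {
    $values = [];
    foreach ($batch as $row) {
        // Escape each value; the columns (a, b, c) stand in for your real ones.
        $values[] = sprintf(
            "('%s', '%s', '%s')",
            $db->real_escape_string($row[0]),
            $db->real_escape_string($row[1]),
            $db->real_escape_string($row[2])
        );
    }
    $db->query('INSERT INTO tbl_name (a, b, c) VALUES ' . implode(',', $values));

    usleep(50000); // 50 ms pause per batch to stay polite on a shared host
}

Each batch is one round trip and one statement parse instead of 200, which is where most of the speedup tends to come from.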

PHP script (algorithm) - only loops till a certain iteration after which it stops

I am running an algorithm in PHP which has a lot of data involved. All the processing happens within a nested for loop. Strangely, the outer for loop stops working after 'X' iterations (where 'X' changes every time I run the script). It takes anywhere between 5 minutes and 30 minutes for the script to crash, depending on 'X'. It does not throw any errors, and it only does an incomplete printout of my var_dump (in the first iteration of the outer loop).
These are the precautions I took:
1. I have set the timeout limit in php.ini to 3600 seconds (60 minutes).
2. I am printing out memory_get_usage() after every outer for loop iteration and I have verified that it is much lower than the maximum memory allocated to PHP.
3. I am unsetting arrays once they are used.
4. I reuse variable names to limit memory within the for loop.
5. I have minimal calls to my DB.
I have been trying to solve this for a long time, to no avail. So my question is: what could be the cause of this problem, and how do I go about debugging it? Thank you so much!
Extra: If I work with a much smaller test data size, everything works fine.
Obviously, without code this is just a guess, but are you making sure to use a single connection to your database? If you are reconnecting on every iteration, you may end up with too many connections, which could cause a failure like this.
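A minimal sketch of that point, with placeholder credentials, loop bounds and table: new mysqli() is called once, outside the nested loops, and the same handle is reused inside them.

$db = new mysqli('localhost', 'user', 'pass', 'mydb'); // connect once

for ($i = 0; $i < 1000; $i++) {
    for ($j = 0; $j < 1000; $j++) {
        // ... heavy per-iteration work ...
        // Reuse $db here; never call new mysqli() inside the loops.
        $db->query('INSERT INTO results (i, j) VALUES (' . $i . ', ' . $j . ')');
    }
}

$db->close();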
This sounds like an issue with utilisation of your server cores and a similar answer/workaround could be found here: Boost Apache2 up to 4 cores usage, running PHP
Try running your datasets in parallel.

How does the garbage collector work in PHP

I have a PHP script that has a large array of people; it grabs their details from an external resource via SOAP, modifies the data and sends it back. Due to the size of the details I upped PHP's memory limit to 128MB. After about 4 hours of running (it will probably take 4 days to run) it ran out of memory. Here's the basics of what it does:
$people = getPeople();

foreach ($people as $person) {
    $data = get_personal_data();

    if ($data == "blah") {
        importToPerson("blah", $person);
    } else {
        importToPerson("else", $person);
    }
}
After it ran out of memory and crashed, I decided to initialise $data before the foreach loop, and according to top, memory usage for the process hasn't risen above 7.8% and it's been running for 12 hours.
So my question is: does PHP not run a garbage collector on variables initialised inside the loop, even if they are reused? Is the system reclaiming the memory while PHP hasn't marked it as usable yet, so it will eventually crash again? (I've upped the limit to 256MB now, so I've changed two things and I'm not sure which one has fixed it. I could probably change my script back to answer this, but I don't want to wait another 12 hours for it to crash to find out.)
I'm not using the Zend Framework, so I don't think the other question like this is relevant.
EDIT: I don't actually have an issue with the script or what it's doing. At the moment, as far as all system reporting is concerned, I don't have any issues. This question is about the garbage collector and how/when it reclaims resources in a foreach loop, and/or how the system reports the memory usage of a PHP process.
I don't know the insides of PHP's VM, but from my experience, it doesn't garbage collect whilst your page is running. This is because it throws away everything your page created when it finishes.
Most of the time, when a page runs out of memory and the limit is pretty high (and 128MB isn't high), there is an algorithm problem. Many PHP programmers assemble a structure of data, then pass it to the next step, which iterates over the structure, usually creating another one. Lather, rinse, repeat. Unfortunately, this approach is a big memory hog, and you end up creating multiple copies of your data in memory. Two of the really big changes in PHP 5 were that objects are reference counted, not copied, and that the entire string subsystem was made much, much faster. But it's still a problem.
To minimise memory use, you would look at re-structuring your algorithm so it can work with one piece of data from start to finish. Then you get the next and start again. Best case scenario is that you don't ever have the entire dataset in memory. For a database-backed website, this would mean processing a row of data from a database query all the way to presentation before getting the next. Of course, this approach isn't always possible and the script just has to keep a huge wodge of data in memory.
That said, you can use this memory-saving approach for part of the data. The trick is that you explicitly unset() a key variable or two at the end of the loop, which should reclaim the space. The other "best-practice" trick is to move data manipulation that doesn't need to be in the loop out of it, as you seem to have discovered.
I've run PHP scripts that need upwards of 1Gb of memory. You can set the memory limit per script, actually, with ini_set('memory_limit', '1G');
Use memory_get_usage() to see what's going on. You could put it inside the loop to watch how the memory allocation behaves.
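Combining those two suggestions for the loop in the question, a minimal sketch might look like this (the per-thousand sampling interval is arbitrary):

$people = getPeople();
$n = 0;

foreach ($people as $person) {
    $data = get_personal_data();

    if ($data == "blah") {
        importToPerson("blah", $person);
    } else {
        importToPerson("else", $person);
    }

    unset($data); // release the per-person payload before the next iteration

    if (++$n % 1000 === 0) {
        // Sample memory so growth (or the lack of it) is visible while running.
        echo memory_get_usage(true) . " bytes after $n people\n";
    }
}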
Have you tried looking at the system monitor or whatever to see how much memory php is using during that process?

Zend Lucene exhausts memory when indexing

An oldish site I'm maintaining uses Zend Lucene (ZF 1.7.2) as its search engine. I recently added two new tables to be indexed, together containing about 2000 rows of text data ranging between 31 bytes and 63kB.
The indexing worked fine a few times, but after the third run or so it started terminating with a fatal error due to exhausting its allocated memory. The PHP memory limit was originally set to 16M, which was enough to index all the other content, 200 rows of text at a few kilobytes each. I gradually increased the memory limit to 160M, but it still isn't enough and I can't increase it any higher.
When indexing, I first need to clear the previously indexed results, because the path scheme contains numbers which Lucene seems to treat as stopwords, returning every entry when I run this search:
$this->index->find('url:/tablename/12345');
After clearing all of the results I reinsert them one by one:
foreach ($urls as $v) {
    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::UnStored('content', $v['data']));
    $doc->addField(Zend_Search_Lucene_Field::Text('title', $v['title']));
    $doc->addField(Zend_Search_Lucene_Field::Text('description', $v['description']));
    $doc->addField(Zend_Search_Lucene_Field::Text('url', $v['path']));
    $this->index->addDocument($doc);
}
After about a thousand iterations the indexer runs out of memory and crashes. Strangely, doubling the memory limit only buys a few dozen more rows.
I've already tried adjusting the MergeFactor and MaxMergeDocs parameters (to values of 5 and 100 respectively) and calling $this->index->optimize() every 100 rows but neither is providing consistent help.
Clearing the whole search index and rebuilding it seems to result in a successful indexing most of the time, but I'd prefer a more elegant and less CPU intensive solution. Is there something I'm doing wrong? Is it normal for the indexing to hog so much memory?
I had a similar problem on a site I had to maintain that had at least three different languages and had to re-index the same 10,000+ (and growing) localized documents for each locale separately (each using its own localized search engine). Suffice it to say that it usually failed within the second pass.
We ended up implementing an Ajax-based re-indexing process that called the script a first time to initialize and start re-indexing. The script stopped after a predefined number of processed documents and returned a JSON value indicating whether it was complete, along with other progress information. We then called the same script again with the progress variables until it reported a completed state.
This also allowed us to show a progress bar for the process in the admin area.
For the cron job, we simply made a bash script doing the same task but with exit codes.
This was about 3 years ago and nothing has failed since then.
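A rough sketch of what such a chunked endpoint could look like; the batch size, the offset request parameter, the index path and the getUrls() helper are placeholders for illustration, not the original code:

$batchSize = 100;
$offset    = isset($_GET['offset']) ? (int) $_GET['offset'] : 0;

$index = Zend_Search_Lucene::open('/path/to/index');
$urls  = getUrls(); // same data source as the indexing loop above
$batch = array_slice($urls, $offset, $batchSize);

foreach ($batch as $v) {
    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::UnStored('content', $v['data']));
    $doc->addField(Zend_Search_Lucene_Field::Text('title', $v['title']));
    $doc->addField(Zend_Search_Lucene_Field::Text('description', $v['description']));
    $doc->addField(Zend_Search_Lucene_Field::Text('url', $v['path']));
    $index->addDocument($doc);
}
$index->commit();

// The Ajax poller (or a cron wrapper) re-invokes the script with the next
// offset until "done" comes back true.
header('Content-Type: application/json');
echo json_encode(array(
    'offset' => $offset + count($batch),
    'total'  => count($urls),
    'done'   => $offset + count($batch) >= count($urls),
));

Because each invocation is a fresh PHP process, memory from the previous chunk is released by the operating system between calls, which is what makes this approach sidestep the memory ceiling.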
