I am developing a networking application where I listen on a port and create a new socket and thread when a new connection request arrives; the architecture is working well but we are facing severe memory issues.
The problem is that even if I create a single thread, it does the work but the memory keeps on increasing.
To demonstrate the problem, please review the following code, where we start one thread of a class whose duty is to print a thread ID and a random number indefinitely.
class ThreadWorker extends Thread {
    public function run() {
        while (1) {
            echo $this->getThreadId()." => ".rand(1,1000)."\r\n";
        }
    }
}
$th = new ThreadWorker();
$th->start();
I am developing on Windows, and when I open Task Manager the PHP.exe memory usage keeps increasing until the system becomes unresponsive.
Please note that the PHP script is executed from command line:
PHP.exe pthreads-test.php
OK, I think the problem is that the thread loop is highly CPU-consuming. Avoid code like that. If you just want to echo a message, I recommend putting a sleep() call after it. Example:
class ThreadWorker extends Thread {
    public function run() {
        while (1) {
            echo $this->getThreadId()." => ".rand(1,1000)."\r\n";
            sleep(1);
        }
    }
}
EDIT
It seems there's a way to force garbage collection in PHP. On the other hand, sleep() is not a proper way to stabilize CPU use. Normally threads do things like reading from files, sockets or pipes, i.e., they often perform I/O operations, which are normally blocking (i.e. they pause the thread until the I/O operation is possible). This behaviour inherently yields the CPU and other resources to other threads, thus stabilizing the whole system.
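For example, a rough sketch of both ideas; stdin here is just a stand-in for whatever socket, pipe or file your thread actually blocks on:

<?php
class ThreadWorker extends Thread {
    public function run() {
        // assumption: php://stdin stands in for any blocking source (socket, pipe, file)
        $fp = fopen('php://stdin', 'r');
        while (($line = fgets($fp)) !== false) {   // blocks until data is available
            echo $this->getThreadId()." => ".trim($line)."\r\n";
            gc_collect_cycles();                   // explicitly run the cycle collector
        }
        fclose($fp);
    }
}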
Related
I have a computation-expensive backend process in Symfony2 / PHP that I would like to run multi-threaded.
Since I iterate over thousands of objects, I think I shouldn't start one thread per object. I would like to have a $cores variable that defines how many threads I want in parallel, then iterate through the loop and keep that many threads running. So every time a thread finishes, a new one with the next object should be started, until all objects are done.
Looking at the pthreads documentation and doing some Google searches, I can't find a usable example for this situation. All the examples I found have a fixed number of threads they run once; none of them iterate over thousands of objects.
Can someone point me into the right direction to get started? I understand the basics of setting up a thread and joining it, etc. but not how to do it in a loop with a wait condition.
The answer to the question is to use the Pool and Worker abstraction.
The basic idea is that you ::submit Threaded objects to the Pool, which it stacks onto the next available Worker, distributing your Threaded objects (round robin) across all Workers.
What follows is super simple code for PHP 7 (pthreads v3):
<?php
$jobs = [];
while (count($jobs) < 2000) {
    $jobs[] = mt_rand(0, 1999);
}

$pool = new Pool(8);

foreach ($jobs as $job) {
    $pool->submit(new class($job) extends Threaded {
        public function __construct(int $job) {
            $this->job = $job;
        }

        public function run() {
            var_dump($this->job);
        }
    });
}

$pool->shutdown();
?>
The jobs are pointless, obviously. In the real world, I guess your $jobs array keeps growing, so you can just swap foreach for some do {} while, and keep calling ::submit for new jobs.
In the real world, you will want to collect garbage in the same loop (just call Pool::collect with no parameters for default behaviour).
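For illustration only, that might look something like this; fetch_next_job() and shutdown_requested() are placeholders for however you produce work and decide to stop:

<?php
$pool = new Pool(8);
do {
    while ($job = fetch_next_job()) {            // hypothetical producer
        $pool->submit(new class($job) extends Threaded {
            public $job;
            public function __construct($job) { $this->job = $job; }
            public function run() { /* process $this->job */ }
        });
    }
    $pool->collect();                            // default behaviour: reap finished tasks
    usleep(100000);                              // avoid spinning while the queue is empty
} while (!shutdown_requested());                 // hypothetical stop condition
$pool->shutdown();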
Noteworthy, none of this would be possible if it really were the case that PHP wasn't intended to work in multi-threaded environments ... it definitely is.
That is the answer to the question, but it doesn't make it the best solution to your problem.
You have mentioned in comments that you assume 8 threads executing Symfony code will take up less memory than 8 processes. This is not the case; PHP is shared-nothing, all the time. You can expect 8 Symfony threads to take up as much memory as 8 Symfony processes, in fact, a little bit more. The benefit of using threads over processes is that they can communicate, synchronize and (appear to) share with each other.
Just because you can, doesn't mean you should. The best solution for the task at hand is probably to use some ready made package or software intended to do what is required.
Studying this stuff well enough to implement a robust solution is something that will take a long time, and you wouldn't want to deploy that first solution ...
If you decide to ignore my advice, and give it a go, you can find many examples in the github repository for pthreads.
Joe has a good approach, but I found a different solution elsewhere that I am now using. Basically, I have two commands, one control and one worker command. The control command starts background processes and checks their results:
protected function process($worker, $entity, $timeout=60) {
    $min = $this->em->createQuery('SELECT MIN(e.id) FROM BM2SiteBundle:'.$entity.' e')->getSingleScalarResult();
    $max = $this->em->createQuery('SELECT MAX(e.id) FROM BM2SiteBundle:'.$entity.' e')->getSingleScalarResult();
    $batch_size = ceil((($max-$min)+1)/$this->parallel);
    $pool = array();
    for ($i=$min; $i<=$max; $i+=$batch_size) {
        $builder = new ProcessBuilder();
        $builder->setPrefix($this->getApplication()->getKernel()->getRootDir().'/console');
        $builder->setArguments(array(
            '--env='.$this->getApplication()->getKernel()->getEnvironment(),
            'maf:worker:'.$worker,
            $i, $i+$batch_size-1
        ));
        $builder->setTimeout($timeout);
        $process = $builder->getProcess();
        $process->start();
        $pool[] = $process;
    }
    $this->output->writeln($worker.": started ".count($pool)." jobs");
    $running = 99;
    while ($running > 0) {
        $running = 0;
        foreach ($pool as $p) {
            if ($p->isRunning()) {
                $running++;
            }
        }
        usleep(250);
    }
    foreach ($pool as $p) {
        if (!$p->isSuccessful()) {
            $this->output->writeln('fail: '.$p->getExitCode().' / '.$p->getCommandLine());
            $this->output->writeln($p->getOutput());
        }
    }
}
where $this->parallel is a variable I set to 6 on my 8-core machine; it signifies the number of processes to start. Note that this method requires that I iterate over a specific entity (it splits by that), which is always true in my use cases.
It's not perfect, but it starts completely new processes instead of threads, which I consider the better solution.
The worker command takes min and max ID numbers and does the actual work for the set between those two.
This approach works as long as the data set is reasonably well distributed. If you have no data in the 1-1000 range but every ID between 1000 and 2000 is used, the first three processes would have nothing to do.
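For reference, the worker side could be sketched roughly like this; the command class, entity name and argument handling here are illustrative, not my actual code:

<?php
use Symfony\Bundle\FrameworkBundle\Command\ContainerAwareCommand;
use Symfony\Component\Console\Input\InputArgument;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;

class WorkerHourlyCommand extends ContainerAwareCommand
{
    protected function configure()
    {
        $this->setName('maf:worker:hourly')
            ->addArgument('min', InputArgument::REQUIRED)
            ->addArgument('max', InputArgument::REQUIRED);
    }

    protected function execute(InputInterface $input, OutputInterface $output)
    {
        $em = $this->getContainer()->get('doctrine')->getManager();
        // fetch only the slice of IDs this process is responsible for
        $entities = $em->createQuery(
                'SELECT e FROM BM2SiteBundle:Entity e WHERE e.id BETWEEN :min AND :max'
            )
            ->setParameters(array(
                'min' => $input->getArgument('min'),
                'max' => $input->getArgument('max'),
            ))
            ->getResult();

        foreach ($entities as $entity) {
            // ... the actual per-entity work goes here ...
        }
        $em->flush();
    }
}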
$p = new Pool(10);
for ($i = 0; $i < 1000; $i++) {
    $tasks[$i] = new workerThread($i);
}
foreach ($tasks as $task) {
    $p->submit($task);
}
// shutdown will wait for current queue to be completed
$p->shutdown();
// garbage collection check / read results
$p->collect(function($checkingTask){
    return ($checkingTask->isGarbage);
});

class workerThread extends Collectable {
    public function __construct($i){
        $this->i = $i;
    }
    public function run(){
        echo $this->i;
        ob_flush();
        flush();
    }
}
The code above is a simple example that can cause a crash. I'm trying to update the page in real time by putting ob_flush(); and flush(); in the Threaded object, and it mostly works as expected. The code above is not guaranteed to crash every time, but if you run it a few more times, sometimes the script stops and Apache restarts with the error message "httpd.exe Application error The instruction at "0x006fb17f" referenced memory at "0x028a1e20". The memory could not be "Written". Click on OK ."
I think it's caused by a flushing conflict when multiple threads try to flush at about the same time. What can I do to work around it so I can flush whenever there's new output?
Multiple threads should not write to standard output; there is no safe way to do this.
Zend provides no facility to make it safe; it works by coincidence, and will always be unsafe.
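A sketch of one safe alternative (my own illustration, not the only way): let each task store its output and have only the main thread write it once the pool has shut down.

<?php
class workerThread extends Threaded
{
    public $i;
    public $result;

    public function __construct($i) { $this->i = $i; }

    public function run()
    {
        // compute in the worker, but do not touch standard output here
        $this->result = "task {$this->i} done";
    }
}

$pool  = new Pool(10);
$tasks = array();
for ($i = 0; $i < 1000; $i++) {
    $tasks[$i] = new workerThread($i);
    $pool->submit($tasks[$i]);
}
$pool->shutdown();

// only the main thread writes to standard output
foreach ($tasks as $task) {
    echo $task->result, "\n";
    flush();
}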
I'm using gearman to distribute long running tasks across multiple worker servers. For one of my worker tasks, I attempt to invoke another background job. The background job is performed by another worker successfully... but that worker process doesn't respond to any new jobs that are added to gearman afterwards.
Anyone know what might be going on? Is this a feature of gearman?
EDIT:
Also, if I restart my workers they repeat the task that was queued by the other worker. Gearman appears to not be recognizing the job has completed.
EDIT 2:
tried:
var_dump($this->conn);
var_dump($this->handle);
From within the worker function that's called from my other worker. This is the output I receive:
NULL
string(0) ""
EDIT 3:
Well, I came up with a hacky way to solve this. The following is the relevant snippet of code. I'm using CodeIgniter for my project, and my gearman servers are stored in an array. I simply test in my job code whether the connection is null, and if so re-establish it using a random gearman server. I'm sure this sucks, so if anyone has some improved insight I would very much appreciate it.
class Net_Gearman_Job_notification_vc_friends_new_user extends Net_Gearman_Job_Common {
    private $CI;

    function __construct(){
        $this->CI =& get_instance();
        if (!$this->conn) {
            $gearman = $this->CI->config->item('gearman');
            $servers = $gearman['servers'];
            $key = array_rand($servers);
            $this->conn = Net_Gearman_Connection::connect($servers[$key]);
        }
    }
Figured it out! Pretty stupid actually; I forgot to call parent::__construct(); in my constructor... oops.
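For completeness, assuming the usual Net_Gearman_Job_Common constructor signature (connection, handle, init params), the constructor then becomes:

function __construct($conn, $handle, array $initParams = array()){
    parent::__construct($conn, $handle, $initParams);   // the call that was missing
    $this->CI =& get_instance();
}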
I'm having a terrible amount of problems with an XML parsing script leaking some memory in PHP.
I've made a solution by rewriting my whole OOP code to non-OOP (it was mostly database checks and inserts), and that seemed to plug the hole, but I'm curious as to what caused it. I'm using Zend Framework, and once I removed all of the model stuff, there are no leaks.
Just to give you an idea how bad it was:
I'm running through some 30k items in the same number of files, so one per file. It started out by using 5 MB(!) of RAM, when the file itself was only about 20 KB.
Could it be those referencing functions that I've read about? I thought that that bug was fixed?!
EDIT
I found out that the leak was due to using the Zend Framework database classes. Is there a way to call a shutdown function after each iteration, so that it would clear the resources?
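Something along these lines after each iteration is the kind of thing I have in mind ($db being the Zend_Db adapter, $files the list of files, and processFile() my per-file work):

<?php
// assumption: $db is a Zend_Db adapter, processFile() is the per-file routine
$db->getProfiler()->setEnabled(false);   // stop the profiler from keeping every query

foreach ($files as $file) {
    processFile($file, $db);
    $db->closeConnection();              // release the connection; it reopens lazily
    gc_collect_cycles();                 // PHP 5.3+: collect circular references
}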
It's pretty difficult to answer this as we have no code to work with.
Revert back to the OOP version of your sources and create a small class like so:
abstract class MemoryLeakLogger
{
    public static $_logs = array();

    public static function Start($id, $action)
    {
        self::$_logs[$id] = array(
            'action'       => $action,
            'start_ts'     => microtime(),
            'memory_start' => memory_get_usage()
        );
    }

    public static function End($id)
    {
        self::$_logs[$id]['end_ts'] = microtime();
        self::$_logs[$id]['memory_end'] = memory_get_usage();
    }

    public static function GetInformation() { return self::$_logs; }
}
and then within your application do the following:
MemoryLeakLogger::Start(":xml_parse_links_set_2", "parsing set to of links");
/*
* Here you would do the relative code
*/
MemoryLeakLogger::End(":xml_parse_links_set_2");
And so forth throughout your application. You will need to calculate the offsets for memory usage and time taken per action; once your script has completed, just print the information in a readable fashion and look for peaks.
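For example, a minimal dump of the collected data might look like this (the report format is just a suggestion):

<?php
foreach (MemoryLeakLogger::GetInformation() as $id => $log) {
    $memory = $log['memory_end'] - $log['memory_start'];   // bytes used by this action
    printf("%s (%s): %d bytes, %s -> %s\n",
        $id, $log['action'], $memory, $log['start_ts'], $log['end_ts']);
}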
You can also use xdebug to trace your application.
Hope this helps
I'm having problems with a batch insertion of objects into a database using symfony 1.4 and doctrine 1.2.
My model has a certain kind of object called "Sector", each of which has several objects of type "Cupo" (usually ranging from 50 up to 200000). These objects are pretty small; just a short identifier string and one or two integers. Whenever a group of Sectors are created by the user, I need to automatically add all these instances of "Cupo" to the database. In case anything goes wrong, I'm using a doctrine transaction to roll back everything. The problem is that I can only create around 2000 instances before php runs out of memory. It currently has a 128MB limit, which should be more than enough for handling objects that use less than 100 bytes. I've tried increasing the memory limit up to 512MB, but php still crashes and that doesn't solve the problem. Am I doing the batch insertion correctly or is there a better way?
Here's the error:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 71 bytes) in /Users/yo/Sites/grifoo/lib/vendor/symfony/lib/log/sfVarLogger.class.php on line 170
And here's the code:
public function save($conn=null){
    $conn = $conn ? $conn : Doctrine_Manager::connection();
    $conn->beginTransaction();
    try {
        $evento = $this->object;
        foreach ($evento->getSectores() as $s) {
            for ($j=0; $j<$s->getCapacity(); $j++) {
                $cupo = new Cupo();
                $cupo->setActivo($s->getActivo());
                $cupo->setEventoId($s->getEventoId());
                $cupo->setNombre($j);
                $cupo->setSector($s);
                $cupo->save();
            }
        }
        $conn->commit();
        return;
    }
    catch (Exception $e) {
        $conn->rollback();
        throw $e;
    }
}
Once again, this code works fine for less than 1000 objects, but anything bigger than 1500 fails. Thanks for the help.
Tried doing
$cupo->save();
$cupo->free();
$cupo = null;
(But substituting my code) And I'm still getting memory overflows. Any other ideas, SO?
Update:
I created a new environment in my databases.yml, that looks like:
all:
  doctrine:
    class: sfDoctrineDatabase
    param:
      dsn: 'mysql:host=localhost;dbname=.......'
      username: .....
      password: .....
      profiler: false
The profiler: false entry disables Doctrine's query logging, which normally keeps a copy of every query you make. It didn't stop the memory leak, but I was able to get about twice as far through my data import as I could without it.
Update 2
I added
Doctrine_Manager::connection()->setAttribute(Doctrine_Core::ATTR_AUTO_FREE_QUERY_OBJECTS, true );
before running my queries, and changed
$cupo = null;
to
unset($cupo);
And now my script has been churning away happily. I'm pretty sure it will finish without running out of RAM this time.
Update 3
Yup. That's the winning combo.
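Put together, the inner loop from my original save() now looks roughly like this (with profiler: false set in databases.yml as above):

<?php
Doctrine_Manager::connection()->setAttribute(Doctrine_Core::ATTR_AUTO_FREE_QUERY_OBJECTS, true);

foreach ($evento->getSectores() as $s) {
    for ($j = 0; $j < $s->getCapacity(); $j++) {
        $cupo = new Cupo();
        // ... set activo, evento_id, nombre and sector as in the original save() ...
        $cupo->save();
        unset($cupo);
    }
}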
I have just made a "daemonized" script with symfony 1.4, and setting the following stopped the memory hogging:
sfConfig::set('sf_debug', false);
For a symfony task, I also faced this issue and did the following things. They worked for me.
Disable debug mode. Add the following before the db connection is initialized:
sfConfig::set('sf_debug', false);
Set the auto-free query objects attribute for the db connection:
$connection->setAttribute(Doctrine_Core::ATTR_AUTO_FREE_QUERY_OBJECTS, true );
Free all objects after use:
$object_name->free()
Unset all arrays after use: unset($array_name)
Check all Doctrine queries used in the task and free all queries after use: $q->free()
(This is good practice whenever you use queries.)
That's all. Hope it may help someone.
Doctrine leaks and there's not much you can do about it. Make sure you use $q->free() whenever applicable to minimize the effect.
Doctrine is not meant for maintenance scripts. The only way to work around this problem is to break your script into parts that each perform part of the task. One way to do that is to add a start parameter to your script and, after a certain number of objects have been processed, have the script redirect to itself with a higher start value. This works well for me, although it makes writing maintenance scripts more cumbersome.
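A rough sketch of that idea, assuming a command-line script; the chunk size, the load/process helpers and the way the next run is launched are illustrative:

<?php
$start = isset($argv[1]) ? (int) $argv[1] : 0;
$chunk = 1000;

$records = load_records($start, $chunk);     // hypothetical: fetch the next slice
foreach ($records as $record) {
    process($record);                        // hypothetical per-record work
}

if (count($records) === $chunk) {
    // more to do: restart the script in a fresh process with a higher start value
    passthru(sprintf('php %s %d', escapeshellarg(__FILE__), $start + $chunk));
}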
Try unset($cupo); after every save. This should help. Another option is to split the script and do some batch processing.
Try to break the circular references which usually cause memory leaks with
$cupo->save();
$cupo->free(); //this call
as described in the Doctrine manual.
For me, I just initialized the task like this:
// initialize the database connection
$databaseManager = new sfDatabaseManager($this->configuration);
$connection = $databaseManager->getDatabase($options['connection'])->getConnection();
$config = ProjectConfiguration::getApplicationConfiguration('frontend', 'prod', true);
sfContext::createInstance($config);
(WITH PROD CONFIG)
and used free() after a save() on Doctrine objects.
The memory is stable at about 25 MB (memory_get_usage = 26.884071350098 MB) with PHP 5.3 on Debian Squeeze.
Periodically close and re-open the connection - not sure why but it seems PDO is retaining references.
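A hedged sketch of what that could look like inside the loop from the question (the batch size of 500 is arbitrary, and I'm assuming Doctrine reconnects lazily on the next query):

$cupo->save();
if ($j % 500 === 0) {
    $conn->close();   // Doctrine should reconnect lazily on the next query
}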
What is working for me is calling the free method like this:
$cupo->save();
$cupo->free(true); // free also the related components
unset($cupo);