I'm having a terrible amount of problems with an XML parsing script leaking some memory in PHP.
I worked around it by rewriting my whole OOP code as non-OOP code (it was mostly database checks and inserts), and that seemed to plug the hole, but I'm curious as to what caused it. I'm using Zend Framework, and once I removed all of the model code there were no leaks.
Just to give you an idea of how bad it was:
I'm running through some 30k items in the same number of files, so one per file. It started out using 5 MB of RAM when the file itself was only about 20 KB.
Could it be those referencing functions that I've read about? I thought that that bug was fixed?!
EDIT
I found out that the leak was due to using the Zend Framework database classes. Is there a way to call a shutdown function after each iteration, so that it would clear the resources?
It's pretty difficult to answer this as we have no code to work with.
Revert to the OOP version of your code and create a small class like this:
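abstract class MemoryLeakLogger
{
    // Collected samples, keyed by the id passed to Start()/End().
    public static $_logs = array();

    // Static, because the class is abstract and is only ever called as MemoryLeakLogger::Start().
    public static function Start($id, $action)
    {
        self::$_logs[$id] = array(
            'action' => $action,
            'start_ts' => microtime(true), // float seconds, so durations can be subtracted
            'memory_start' => memory_get_usage()
        );
    }

    public static function End($id)
    {
        self::$_logs[$id]['end_ts'] = microtime(true);
        self::$_logs[$id]['memory_end'] = memory_get_usage();
    }

    public static function GetInformation(){return self::$_logs;}
}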
and then within your application do the following:
MemoryLeakLogger::Start(":xml_parse_links_set_2", "parsing set two of links");
/*
* Here you would do the relative code
*/
MemoryLeakLogger::End(":xml_parse_links_set_2");
Do this throughout your application. You will then need to calculate the memory and time differences per action; once your script has completed, print the information in a readable form and look for peaks.
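The per-action calculation described above might look like the following minimal sketch. The summarizeLogs() helper and the sample values are hypothetical, and it assumes the timestamps were recorded as floats with microtime(true) rather than as the default string form.

```php
<?php
// Hypothetical helper: turns raw start/end samples into per-action deltas.
// Assumes each entry was recorded with microtime(true) and memory_get_usage().
function summarizeLogs(array $logs)
{
    $summary = array();
    foreach ($logs as $id => $entry) {
        if (!isset($entry['end_ts'], $entry['memory_end'])) {
            continue; // End() was never called for this id
        }
        $summary[$id] = array(
            'action'       => $entry['action'],
            'seconds'      => $entry['end_ts'] - $entry['start_ts'],
            'memory_delta' => $entry['memory_end'] - $entry['memory_start'],
        );
    }
    // Largest memory growth first, so the peaks end up at the top.
    uasort($summary, function ($a, $b) {
        return $b['memory_delta'] - $a['memory_delta'];
    });
    return $summary;
}

// Sample data in the same shape as MemoryLeakLogger::$_logs (values are made up).
$logs = array(
    ':xml_parse' => array('action' => 'parse XML', 'start_ts' => 10.0, 'end_ts' => 10.5,
                          'memory_start' => 1000000, 'memory_end' => 1020000),
    ':db_insert' => array('action' => 'insert rows', 'start_ts' => 10.5, 'end_ts' => 11.5,
                          'memory_start' => 1020000, 'memory_end' => 6020000),
);

foreach (summarizeLogs($logs) as $id => $row) {
    printf("%-12s %-12s %.3fs %+d bytes\n", $id, $row['action'], $row['seconds'], $row['memory_delta']);
}
```

An action whose memory delta keeps growing across iterations is the first place to look for a leak.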
You can also use xdebug to trace your application.
Hope this helps
I want to be able to call the CakeS3 plugin from the Cake Shell. However, as I understand it components cannot be loaded from the shell. I have read this post outlining strategies for overcoming it: using components in Cakephp 2+ Shell - however, I have had no success. The CakeS3 code here is similar to perfectly functioning cake S3 code in the rest of my app.
<?php
App::uses('Folder','Utility');
App::uses('File','Utility');
App::uses('CakeS3.CakeS3','Controller/Component');
class S3Shell extends AppShell {
public $uses = array('Upload', 'User', 'Comment');
public function main() {
$this->CakeS3 = new CakeS3.CakeS3(
array(
's3Key' => 'key',
's3Secret' => 'key',
'bucket' => 'bucket')
);
$this->out('Hello world.');
$this->CakeS3->permission('private');
$response = $this->CakeS3->putObject(WWW_ROOT . '/file.type' , 'file.type', $this->CakeS3->permission('private'));
if ($response == false){
echo "it failed";
} else {
echo "it worked";
}
}
}
This returns the error "Fatal error: Class 'CakeS3' not found in /home/app/Console/Command/S3Shell.php". The main reason I am trying to get this to work is so that I can automate some uploads with cron. Of course, if there is a better way, I am all ears.
Forgive me this "advertising"... ;) but my plugin is probably better written and has a better architecture than this CakeS3 plugin if it is using a component which should be a model or behaviour task. Also it was made for exactly the use case you have. Plus it supports a few more storage systems than only S3.
You could do that for example in your shell:
StorageManager::adapter('S3')->write($key, StorageManager::adapter('Local')->read($key));
A file should be handled as an entity on its own that is associated to whatever it needs to be associated to. Every uploaded file (if you use or extend the models that come with the plugin, if not you have to take care of that) is stored as a single database entry that contains the name of the config that was used and some meta data for that file. If you do the line of code above in your shell you will have to keep record in the table if you want to access it this way later. Just check the examples in the readme.md out. You don't have to use the database table as a reference to your files but I really recommend the system the plugin implements.
Also, you might not be aware that WWW_ROOT is publicly accessible, so if you store sensitive data there it can be accessed by anyone.
And finally in a shell you should not use echo but $this->out() for proper shell output.
I think the App::uses call should look like:
App::uses('CakeS3', 'CakeS3.Controller/Component');
I'm the author of CakeS3, and no I'm afraid there is no "supported" way to do this as when we built this plugin, we didn't need to run uploads from shell and just needed a simple interface to S3 from our controllers. We then open sourced the plugin as a simple S3 connector.
If you'd like to have a go at modifying it to support shell access, I'd welcome a PR.
I don't have a particular road map for the plugin, so I've tagged your issue on github as an enhancement and will certainly consider it in future development, but I can't guarantee that it would fit your time requirements so that's why I mention you doing a PR.
I have a simple controller file StudentController.php
<?php
$data = array();
$data["firstName"] = $_GET["firstName"];
loadView("StudentView.php", $data);
?>
I have an even simpler view file called StudentView.php
<?php
echo $firstName;
?>
I have absolutely no idea how to implement the loadView($view, $data) function. I want the variables in $data from the controller to become available in the view (so $data["foo"] in the controller becomes $foo in the view).
I want to achieve what is very easy to do in CodeIgniter, but I have no idea how it is implemented. I tried looking into Controller.php and Loader.php in its source files, but they were too messy for me to understand.
I don't want to use CodeIgniter or any other framework; I want to do this natively in PHP.
If you're going to be building a large website to be used by the general public, a framework is generally a good idea for multiple reasons:
Properly unit tested - generally speaking, you can be confident that the core of your website is functioning as it's intended to
Community support - have questions and you can easily get answers; generally these frameworks are open sourced and usually actively developed by the PHP community
Secure - Framework developers are, well, framework developers; and they have been trained to write code with security as a top priority
However, this question has nothing to do with frameworks vs not, so I'll answer the question you asked with a very simple function:
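function loadView($view, $data) {
    // EXTR_SKIP keeps a 'view' or 'data' key from overwriting the function's own variables.
    extract($data, EXTR_SKIP);
    ob_start();
    // require, not require_once: require_once would skip the file on a second call and return nothing.
    require $view;
    // Return the buffered output and discard the buffer in one step.
    return ob_get_clean();
}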
You can choose to return the contents or print them directly, but that function should do what you need. I'm making no guarantees about the security of this code, and I have obviously done no error checking. But it should serve as a good foundation to get you started.
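As a usage sketch, the controller/view pair from the question can be exercised like this. The temporary view file and the htmlspecialchars() call are assumptions added for the demo, not part of the original files. This variant uses require, so the same view can be rendered more than once.

```php
<?php
// Same idea as the function above, written out so the demo is self-contained.
function loadView($view, array $data)
{
    extract($data, EXTR_SKIP); // $data['firstName'] becomes $firstName inside the view
    ob_start();
    require $view;
    return ob_get_clean();
}

// Hypothetical view file written to a temp location just for this demo.
$viewFile = tempnam(sys_get_temp_dir(), 'view');
file_put_contents($viewFile, 'Hello, <?php echo htmlspecialchars($firstName); ?>!');

$first  = loadView($viewFile, array('firstName' => 'Ann'));
$second = loadView($viewFile, array('firstName' => 'Bob & Co'));
unlink($viewFile);

echo $first, "\n", $second, "\n";
```

Escaping inside the view matters here because the controller in the question passes $_GET["firstName"] straight through.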
Firstly, I will say that I come from the Java world (this is important, really).
I have been coding PHP for a while, and one of the problems I have encountered is that, because there is no compilation step, errors that could easily be detected at compile time (for example, the wrong number of parameters for a given function) can pass silently.
Such errors could be detected by increasing code coverage with unit tests. The question is, does it make sense, for example, to test constructors in order to check that the passed parameters are correct? I do not mean only the number of parameters, but also their content (for example, if a parameter is null, certain objects should throw an exception to avoid creating a "dirty" object).
So, am I too contaminated by years of Java code? After all, increasing code coverage to "discover" misused functions feels like a (really) primitive way of compiling.
Also, I would like to note that I already use a development environment (PHPStorm), we are also using tools like PHPCodeSniffer.
Any ideas/suggestions?
This is a good question that can be answered on a number of levels:
Language characteristics
Test coverage
CASE tools
1. Language characteristics
As you have pointed out the characteristics of the PHP language differ markedly from the more strongly-typed languages such as Java. This raises a serious issue where programmers coming from the more strongly-typed languages such as Java and C# may not be aware of the implications of PHP's behaviour (such as those you have described). This introduces the possibility of mistakes on the part of the programmer (for example, a programmer who may have been less careful using Java because they know the compiler will catch incorrect parameters may not apply the appropriate care when developing in PHP).
Consequently, better programmer education/supervision is needed to address this issue (such as in-house company coding standards, pair programming, code review). It also (as you have pointed out) raises the question of whether test coverage should be increased to check for such mistakes as would have been caught by a compiler.
2. Test Coverage
The argument for test coverage is very project-specific. In the real world, the level of test coverage is primarily dictated by the error tolerance of the customer (which is dictated by the consequences of an error occurring in your system). If you are developing software that is to run on a real-time control system, then obviously you will test more. In your question you identify PHP as the language of choice; this could apply equally to the ever-increasing number of web-enabled frontends for critical systems infrastructure. On the other side of the coin, if you are developing a simple website for a model railroad club and are just building a newsletter app, then your customer may not care about the possibility of a bug in the constructor.
3. CASE Tools
Ultimately it would be desirable for a CASE tool to be available which can detect these errors, such as missing parameters. If there are no suitable tools out there, why not create one of your own? The creation of a CASE tool is not out of reach of most programmers, particularly if you can hook into an open-source parsing engine for your language. If you are open-source inclined this may be a good project to kick-start, or perhaps your company could market such a solution.
Conclusion
In your case whether or not to test the constructors basically comes down to the question: what will the consequences of a failure in my system be? If it makes financial sense to expend extra resources on testing your constructors in order to avoid such failures, then you should do so. Otherwise it may be possible to get by with lesser testing such as pair programming or code reviews.
Do you want the constructor to throw an exception if invalid parameters are passed? Do you want it to behave the same way tomorrow and next week and next year? Then you write a test to verify that it does.
Tests verify that your code behaves as you want it to. Failing on invalid parameters is code behavior just as much as calculating sales tax or displaying a user's profile page.
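A minimal sketch of that idea, without any test framework: the Category class and the rejects() helper are hypothetical names chosen for illustration. In a real suite the same checks would normally live in PHPUnit test methods.

```php
<?php
// Hypothetical class: refuses to build a "dirty" object.
class Category
{
    public $no;
    public $name;

    public function __construct($no, $name)
    {
        if (!is_int($no) || $no <= 0) {
            throw new InvalidArgumentException('Category number must be a positive integer');
        }
        if (!is_string($name) || $name === '') {
            throw new InvalidArgumentException('Category name must be a non-empty string');
        }
        $this->no = $no;
        $this->name = $name;
    }
}

// Hypothetical helper: true if calling $fn throws InvalidArgumentException.
function rejects($fn)
{
    try {
        $fn();
    } catch (InvalidArgumentException $e) {
        return true;
    }
    return false;
}

var_dump(rejects(function () { new Category(null, 'Books'); })); // bool(true)
var_dump(rejects(function () { new Category(1500, ''); }));      // bool(true)
var_dump(rejects(function () { new Category(1500, 'Books'); })); // bool(false)
```

If someone later removes one of the guards, the corresponding check flips and the regression is caught, which is the job a compiler cannot do for content rules anyway.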
We test constructors, as well as the order of the parameters, the defaults when not provided, and then some actual settings. For instance:
class UTIL_CATEGORY_SCOPE extends UTIL_DEPARTMENT_SCOPE
{
function __construct($CategoryNo = NULL, $CategoryName = NULL)
{
parent::__construct(); // Do Not Pass fields to ensure that the array is checked when all fields are defined.
$this->DeclareClassFields_();
$this->CategoryName = $CategoryName;
$this->CategoryNo = $CategoryNo;
}
private function DeclareClassFields_()
{
$this->Fields['CategoryNo'] = new UTIL_ICAP_FIELD_PAIR_FIRST('CCL', 6, ML('Category'), 8);
$this->Fields['CategoryName'] = new UTIL_ICAP_FIELD_PAIR_SECOND('CCL', 32, ML('Name'), 15, array(), array(), NULL, UTIL_ICAP_FIELD::EDIT_DENY, UTIL_ICAP_FIELD::UPDATE_DENY, 'DES');
}
}
We then create our tests to check not only the constructor and its parameter order, but also that the class and its inheritance have not changed.
public function testObjectCreation()
{
$CategoryInfo = new UTIL_CATEGORY_SCOPE();
$this->assertInstanceOf('UTIL_CATEGORY_SCOPE', $CategoryInfo);
$this->assertInstanceOf('UTIL_DEPARTMENT_SCOPE', $CategoryInfo);
$this->assertInstanceOf('UTIL_DATA_STRUCTURE', $CategoryInfo); // Inherited from UTIL_DEPARTMENT_SCOPE
}
public function testConstructFieldOrder()
{
$CategoryInfo = new UTIL_CATEGORY_SCOPE(1500, 'Category Name');
$this->assertEquals(1500, $CategoryInfo->CategoryNo);
$this->assertEquals('Category Name', $CategoryInfo->CategoryName);
}
public function testConstructDefaults()
{
$CategoryInfo = new UTIL_CATEGORY_SCOPE();
$this->assertNull($CategoryInfo->CategoryNo);
$this->assertNull($CategoryInfo->CategoryName);
}
public function testFieldsCreated()
{
$CategoryInfo = new UTIL_CATEGORY_SCOPE();
$this->assertArrayHasKey('CategoryNo', $CategoryInfo->Fields);
$this->assertArrayHasKey('CategoryName', $CategoryInfo->Fields);
$this->assertArrayHasKey('DeptNo', $CategoryInfo->Fields); // Inherited from Parent
$this->assertArrayHasKey('DeptName', $CategoryInfo->Fields); // Inherited from Parent
}
I am trying to reduce my memory usage on a large loop script so I made this little test. Using Doctrine I run this code:
$new_user_entry = getById($new_user_entries[0]['id']);
unset($new_user_entry);
$new_user_entry = getById($new_user_entries[1]['id']);
unset($new_user_entry);
function getById($holding_id)
{
return Doctrine_Core::getTable('UserHoldingTable')->findOneById($holding_id);
}
But each getById and unset leaves about another 50 KB in memory, and I don't know why or how to change it. I have a loop that goes through thousands of these, plus a couple of other functions, and this is creating an issue.
I was unable to find a better solution, so I gave up on Doctrine for this function and ran a manual query with mysqli. This sidestepped the issue and everything worked, though it is not ideal.
My latest idea for handling settings across the PHP project I am building is to store them all in a config PHP file that simply returns an array, like this...
<?php
/**
* #Filename app-config.php
* #description Array to return to our config class
*/
return array(
'db_host' => 'localhost',
'db_name' => 'socialnetwork',
'db_user' => 'root',
'db_password' => '',
'site_path' => 'C:/webserver/htdocs/project/',
'site_url' => 'http://localhost/project/',
'image_path' => 'C:/webserver/htdocs/fproject/images/',
'image_url' => 'http://localhost/project/images/',
'site_name' => 'test site',
'admin_id' => 1,
'UTC_TIME' => gmdate('U', time()),
'ip' => $_SERVER['REMOTE_ADDR'],
'testtttt' => array(
'testtttt' => false
)
);
?>
Please note the actual config array is MUCH MUCH larger, many more items in it...
Then I would have a Config.class.php file that would load my array file and use the magic method __get($key). I can then autoload my config class file and access any site settings like this...
$config->ip;
$config->db_host;
$config->db_name;
$config->db_user;
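For reference, a class of the kind described might look like the following sketch. The class body is an assumption (the question does not show Config.class.php); only the __get($key) approach and the setting names come from the question.

```php
<?php
// Hypothetical Config class: wraps the array returned by the config file.
class Config
{
    private $settings;

    public function __construct(array $settings)
    {
        $this->settings = $settings;
    }

    // Called for $config->db_host because no real property has that name.
    public function __get($key)
    {
        if (!array_key_exists($key, $this->settings)) {
            throw new OutOfBoundsException("Unknown config key: $key");
        }
        return $this->settings[$key];
    }

    // Lets isset($config->db_host) work as expected.
    public function __isset($key)
    {
        return isset($this->settings[$key]);
    }
}

// In the real project this array would come from: require 'app-config.php';
$config = new Config(array(
    'db_host' => 'localhost',
    'db_name' => 'socialnetwork',
));

echo $config->db_host, "\n";
```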
I realize this works well and is very flexible: the class can read a PHP file that returns an array, as it does now, or read an INI, XML, or JSON file into an array. That flexibility is useful for future projects, but for this particular project I am more concerned about performance. It will be a social network site like Facebook/MySpace. I have run one before, and once it reached around 100,000 users, performance became very important. So I am not "micro-optimizing" or "prematurely optimizing"; I am strictly looking to do this the best way with performance in mind. It does not need to be flexible, as I will only need it on this project.
With that in mind: I always read about people trying to eliminate function calls as much as possible because function calls add overhead, so I would like to know what more experienced people think about this. I am new to using classes and objects in PHP. Is calling $config->db_user; as costly as calling a procedural function such as getOption('db_user');? I am guessing it is about the same, since every time I read a setting it goes through the __get() method.
So for best performance should I go about this a different way? Like just loading my config array into a bootstrap file and accessing items when I need them like this...
$config['db_host'];
$config['db_username'];
$config['db_password'];
$config['ip'];
Please give me your thoughts on this so that I don't have to run a bunch of benchmark tests.
From tests I've seen, I believe Alix Axel's response is correct with respect to the relative speed of the four methods. Direct access is the fastest, and any sort of magic method is usually slower.
Also, in terms of optimization, the biggest performance hit for any single request in the system you describe will probably be parsing the XML/INI/JSON, rather than accessing it via whichever syntax you decide to go with. If you want to fix this, store the loaded data in APC once you have parsed it. The one caveat is that you should store only static data there, not dynamic things like the UTC date.
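A rough sketch of that caching idea follows. It swaps in APCu (apcu_fetch/apcu_store), the user-cache successor to APC, and falls back to parsing on every call when the extension is not loaded. The loadConfigCached() helper and the cache key are hypothetical.

```php
<?php
// Hypothetical helper: parse once, then reuse the parsed array from shared memory.
// Uses APCu when the extension is loaded; otherwise it simply parses on every call.
function loadConfigCached($cacheKey, $parser)
{
    $useCache = function_exists('apcu_fetch');
    if ($useCache) {
        $cached = apcu_fetch($cacheKey, $hit);
        if ($hit) {
            return $cached;
        }
    }
    $settings = $parser(); // the expensive XML/INI/JSON parse
    if ($useCache) {
        apcu_store($cacheKey, $settings);
    }
    return $settings;
}

// Only static settings go through the cache.
$static = loadConfigCached('app_config_static', function () {
    return parse_ini_string("db_host = localhost\ndb_name = socialnetwork");
});

// Per-request values are added after the cached part is loaded.
$config = $static + array('UTC_TIME' => time());
```

Keeping UTC_TIME out of the cached array is what the caveat about dynamic values amounts to in practice.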
Firstly, instead of an included file that returns an array I would instead use an .ini file and then use PHP's parse_ini_file() to load the settings.
Secondly, you shouldn't worry about function calls in this case. Why? Because you might have 100,000 users but if all 100,000 execute a script and need some config values then your 100,000 function calls are distributed over 100,000 scripts, which will be completely irrelevant as far as performance goes.
Function calls are only an issue if a single script execution, for example, executes 100,000 of them.
So pick whichever is the most natural implementation. Either an object or an array will work equally well. Actually an object has an advantage in that you can do:
$db = $config->database->hostname;
where $config->database can implicitly load just the database section of the INI file and create another config object that returns the hostname entry, if you want to segment your config file this way.
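One way the sectioned lookup could work is sketched below. IniConfig is a hypothetical class, and parse_ini_string() stands in for parse_ini_file() so that the example needs no file on disk; both accept the same second argument for processing sections.

```php
<?php
// Hypothetical sketch: each INI section becomes its own nested config object.
class IniConfig
{
    private $data;
    private $children = array();

    public function __construct(array $data)
    {
        $this->data = $data;
    }

    public static function fromString($ini)
    {
        // Second argument true => sections become nested arrays.
        return new self(parse_ini_string($ini, true));
    }

    public function __get($key)
    {
        if (!array_key_exists($key, $this->data)) {
            throw new OutOfBoundsException("Unknown config key: $key");
        }
        $value = $this->data[$key];
        if (!is_array($value)) {
            return $value;
        }
        // Build the section object only the first time it is asked for.
        if (!isset($this->children[$key])) {
            $this->children[$key] = new self($value);
        }
        return $this->children[$key];
    }
}

$config = IniConfig::fromString("[database]\nhostname = localhost\nname = socialnetwork\n");
$db = $config->database->hostname;
echo $db, "\n";
```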
IMO these are the fastest methods (in order):
$config['db_user']
$config->db_user directly
$config->db_user via __get()
getOption('db_user') via __get()
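If you do want to check the ordering on your own setup, a small timing harness along these lines is enough. Everything here (class names, the timeIt() helper, the iteration count) is hypothetical, and the closure call adds the same fixed overhead to every variant, so only the relative differences mean anything.

```php
<?php
// Hypothetical micro-benchmark; absolute numbers vary by machine and PHP version.
class PlainConfig { public $db_user = 'root'; }

class MagicConfig
{
    private $data = array('db_user' => 'root');
    public function __get($key) { return $this->data[$key]; }
}

function getOption($key)
{
    static $data = array('db_user' => 'root');
    return $data[$key];
}

// Runs $fn $iterations times and returns the elapsed wall-clock seconds.
function timeIt($fn, $iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$array = array('db_user' => 'root');
$plain = new PlainConfig();
$magic = new MagicConfig();

$timings = array(
    'array index'     => timeIt(function () use ($array) { return $array['db_user']; }, 100000),
    'public property' => timeIt(function () use ($plain) { return $plain->db_user; }, 100000),
    '__get()'         => timeIt(function () use ($magic) { return $magic->db_user; }, 100000),
    'function call'   => timeIt(function () { return getOption('db_user'); }, 100000),
);

foreach ($timings as $label => $seconds) {
    printf("%-16s %.4f s\n", $label, $seconds);
}
```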
Also, you've already asked a lot of questions about your config system. Not that I mind, but I specifically remember that you asked whether you should use parse_ini_file() or not.
Why are you repeating basically the same questions over and over again?
I think you're taking premature optimization to a whole new level. You should worry about the performance of 100,000 users if and when you get 50,000 users or so, not now.