READ FURTHER BELOW at "CLI:" for the CLI question, which was just added to the conversation! Thx!
I have written a script which processes an XML file of around 160'000 entries (48.1 MB) and a text file of 150'000 entries (31.1 MB), including some directory searches for external files, heavy interlinking and recursive checks, with the result formatted and saved into HTML files.
I have reviewed the program a couple of times and ended up with the most efficient code I could think of. This is a local program and the generator doesn't need to run regularly. One could argue that I should use another language than PHP, but PHP with SimpleXML etc. just works best for me and for this purpose. Also, a set_time_limit(70000) doesn't bother me.
So here is my question: is it possible to make Apache2 on my Linux system use all 4 of my CPU cores when running my PHP script?
Even if I split the process and make several requests simultaneously, the CPU usage won't go above one core at a time.
I googled the topic but couldn't find a solution, so I may just have to run it overnight; still, I would appreciate some help to speed that thing up!!!
ADDED INFO - a picture of my processes: (screenshot not reproduced here)
CLI:
I need to call my index.php from the Linux terminal to execute it. But I also want to send four POST variables ($_POST['example']) to the script. On top of that, I'd like my echo output written to some output file. Could anyone help quickly with the terminal command and the PHP code to read those 4 POST variables inside:
if (PHP_SAPI === 'cli')
{
    // ...
}
? ...sorry, but this is my first PHP-CLI interaction. Thx!
No, a single PHP script will never use multiple threads and will thus always run on a single core.
And depending on how much the parts of your workload depend on each other, you couldn't easily split them across multiple threads anyway.
EDIT: Author's response
This is not a solution but a nice workaround: I clone my virtual machine with the Linux/Apache2 install and have each VM work on a different part of the file/process. The host system gives each virtual machine its own core, which cut the processing time down by roughly a factor of 4. Thanks for your posts!
===============
If it's local and you only want to run it every now and then, you should probably just invoke it from a cron job. That way you can spawn a process for each task you are doing. If you really do want to use PHP for it, you can even invoke PHP from the cron line.
Nonetheless, it sounds like you're doing an inherently single-threaded process anyway, and if you want it faster, you should probably use something other than PHP for this.
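For example, a crontab entry along these lines would launch the generator nightly (the paths and schedule here are placeholders, not taken from the question):
# run the generator every night at 02:00, logging all output
0 2 * * * /usr/bin/php /var/www/generator/index.php >> /var/log/generator.log 2>&1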
Maybe you can use Spork! It's a PHP library that allows you to fork the PHP process into multiple ones.
<?php

use Spork\Deferred\DeferredFactory;
use Spork\ProcessManager;

$manager = new ProcessManager(new DeferredFactory());

$manager->fork(function() {
    // do something in another process!
})->then(function($output, $status) {
    // do something in the parent process when it's done!
});
https://github.com/kriswallsmith/spork
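If pulling in a library is overkill, the pcntl extension offers the same idea directly (CLI only). A minimal sketch, assuming the work can be split into independent entry ranges; processEntries() is a hypothetical placeholder for the existing per-range logic:
<?php
// split 160'000 entries across 4 worker processes
$ranges = array(array(0, 40000), array(40000, 80000),
                array(80000, 120000), array(120000, 160000));
$pids = array();
foreach ($ranges as $range) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die('fork failed');
    } elseif ($pid === 0) {
        processEntries($range[0], $range[1]); // child: work on its own slice
        exit(0);
    }
    $pids[] = $pid; // parent: remember the child
}
foreach ($pids as $pid) {
    pcntl_waitpid($pid, $status); // wait until all children are done
}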
SOLUTION - thanks to ThiefMaster and Zebediah49 for recommending the CLI, and to my friend who supported me with these links: http://ch.php.net/manual/en/reserved.variables.argv.php and http://ch.php.net/manual/en/function.getopt.php
And here is how I call the PHP script through the CLI:
// when run from the CLI, call it like:
// php index.php './data/xyfullFile1.xml' './data/xxfullFile2.utf' 0 60000
// php index.php './data/xyfullFile1.xml' './data/xxfullFile2.utf' 60000 120000
// php index.php './data/xyfullFile1.xml' './data/xxfullFile2.utf' 120000 all
if (PHP_SAPI === 'cli') {
    $_POST['xml']       = $argv[1];
    $_POST['example']   = $argv[2];
    $_POST['rangeFrom'] = $argv[3];
    $_POST['rangeTo']   = $argv[4];
}
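To get the echo output into a file, which the question above also asked for, plain shell redirection on the same call is enough (the log file name here is just an example):
php index.php './data/xyfullFile1.xml' './data/xxfullFile2.utf' 0 60000 > output_0_60000.log 2>&1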
And the result of calling the PHP file in three terminals: (screenshot not reproduced here)
I know, I should give some more RAM to my virtual machine; lucky that I still have 8 GB spare ;-)
Cheers and peace!
Related
I have read the other questions on SO with a similar title, but that's not what this question is about. I know HOW to execute a PHP script from another PHP script. The problem is, when I do so, it uses far too much CPU. I would like to know how to reduce this.
I have a simple front-controller-like script called index.php. It processes GET requests from a client and depending on the "action" parameter passed, it sends the request to the appropriate file to handle it. For example, this is a client request:
xhttp.open("GET", serverURL + "?action=doSomething" + "&userID=" + user.ID + "&time=" + lastServerTime, true);
index.php has an array that maps the "action" parameter to the appropriate file:
exec('php ' . $url_map[$action] . ' "' . $parameter1 . '"' . ' "' . $parameter2 . '" 2>&1', $output, $return_value);
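Since $action and the parameters come from a GET request, a hardened variant of that call would wrap each piece in escapeshellarg(); this is a sketch of the same line, not the poster's actual code:
exec('php ' . escapeshellarg($url_map[$action]) . ' '
    . escapeshellarg($parameter1) . ' '
    . escapeshellarg($parameter2) . ' 2>&1', $output, $return_value);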
For testing purposes, I have created a PHP script that does nothing except measure CPU utilisation and dump it to a log file:
<?php
function varDumpToFile($parameter1) {
    $file = 'log.txt';
    $output = print_r($parameter1, true);
    file_put_contents($file, $output, FILE_APPEND | LOCK_EX);
}

varDumpToFile(`ps -eo pcpu,pid,user,args --no-headers | sort -t. -nk1,2 -k4,4 -r | head -n 5`);
?>
This produces a log file that looks like this:
9.0 3123052 user /opt/cpanel/ea-php56/root/usr/bin/php cputest.php 10 147424 1537625595
Clearly, a PHP script shouldn't take 9% of CPU to execute. For comparison, I've run the same script directly, accessing it via a GET request:
0.1 3186198 user lsphp:ic_html/dev/php/cputest.php
0.1% is more like it. But why does calling this PHP script from another PHP script use so much CPU? Is it because I have to execute a "new instance" of PHP when I exec PHP, which has a lot of overhead? If so, is there a way to exec a PHP script using an "already running" instance of PHP? Or is there another way of doing this?
I always say "when in doubt, look at the PHP source code" - the implementation of exec(), for instance. While doing exec, you have to fork the process, create a new stream, read from the input buffer, etc.
Also, although PHP compiles scripts to opcodes (instructions similar to Java bytecode), the newly forked process has to run the opcode compiler again to generate them before executing anything. In the end, the compiler runs separately for each fork.
Is it worth 9% of your CPU? I have no idea. Maybe. Maybe not. Who knows.
"Better solution"? Upgrade to latest version of PHP. PHP 5.6 is not supported anymore and security updates will cease in 3 months. Even better solution - keep a normal object-oriented and maintainable code without using exec. IMO, it's okay to play around with exec like you are. But if it's your production code, I pray for the souls of those, who would maintain your code after you.
Whichever way you run your application, be it mod_php or FPM, it relies on having worker processes ready to manage your request. Process management is built in: they do their best to keep as many workers idle as you specify and to reuse them, avoiding exactly this problem of having to fork processes at the least desirable moment.
Not only is there overhead in executing new processes, but the execution environment will be completely different too. If you look into your PHP configuration there will be several php.ini files, one for each specific environment. This means that one environment could have different modules enabled or an outright different configuration. It's not uncommon to have a CLI script's max_execution_time or memory_limit set to unlimited. This can affect resource usage on your server, and it's also a pain to maintain.
Also, since your scripts will be running in a brand-new process in a different execution environment, they won't have access to some variables (like $_SERVER or $_POST) or to capabilities like sending headers.
And there's this thing called shared memory. As @Alex mentions, scripts have to be compiled. If you have an opcode cache enabled (which you should), the bytecode gets cached at compile time, and compilation can be skipped if the resulting bytecode is already there. For this to work you need a persistent running process that keeps this memory around. If you are creating a new process, it can't access this shared area and has to do the compilation all by itself.
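You can observe this difference yourself; a small sketch, assuming the OPcache extension is available (by default opcache.enable_cli is off, so a fresh CLI process gets no cached bytecode):
<?php
// report whether OPcache is active in the current SAPI
if (function_exists('opcache_get_status')) {
    $status = opcache_get_status(false); // false = skip per-script details
    var_dump($status ? $status['opcache_enabled'] : false);
} else {
    echo "OPcache extension not loaded\n";
}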
I'd use the following code:
$SERVER_PATH = dirname(__FILE__);
shell_exec($PHP_LOCATION.' '.$SERVER_PATH."/script.php?k1=v1&k2=v2 > /dev/null 2>/dev/null &");
Where:
$PHP_LOCATION should contain the path to PHP,
$SERVER_PATH is the current directory (fortunately the script to run is in the same directory),
> /dev/null 2>/dev/null & is added to make this call asynchronous (taken from the Asynchronous shell exec in PHP question).
This code has two problems:
As far as I remember, ?k1=v1&k2=v2 works for web calls only, so in this particular case the parameters will not be passed to the script.
I don't really know how to initialise the $PHP_LOCATION variable so that it is flexible and works on most hosts.
I conducted some research regarding both problems:
To solve 1, it was suggested to use -- 'parameters_string', but it is also recommended to modify the script to parse that parameter string, which looks a bit clumsy. Is there a better solution?
To solve 2, I found the PHP_BINARY constant, but that requires PHP 5.4+ and I'm using 5.3. And the original point was to run the same PHP version as the calling script's. So, staying on PHP 5.3, is there a solution for me?
EDIT 0
Let me explain why I am stuck with this (for PHP) weird approach:
Those PHP scripts should be separate from each other:
one of those will analyze the data and
the second will generate PNG graphs as a final result.
Those scripts aren't intended to run simultaneously; the second can run on its own schedule, it only needs to run once its data is ready (which the first script produces). So no data has to be passed back from the second script (child) to the first (parent).
EDIT 1
As most of the comments show, the main discussion has gone in the forking direction. However, I'd like to stress points 1 and 2 of the original question. I have reasons to solve the task the way I described and I have tried to state them all. If some of my points look weird, please post a comment and I will make them clearer or change the main question.
Thank you in advance!
How to get executable
Assuming you're using Linux, you can use:
function getBinaryRunner($binary)
{
    return trim(shell_exec('which '.$binary));
}
For instance, the same approach can be used to check whether needed tools are installed:
function checkIfCommandExists($command)
{
    $result = shell_exec('which '.$command);
    return !empty($result);
}
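Usage would then look something like this:
$php = getBinaryRunner('php'); // e.g. "/usr/bin/php"
if (!checkIfCommandExists('wget')) {
    die('wget is not installed');
}
shell_exec($php.' /path/to/script.php');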
Some points:
Yes, it will work only for Linux
You should be careful with user input if it is allowed to be passed to shell commands: escapeshellarg() and company
Indeed, PHP should normally not be used for stuff like this; for asynchronous requests it is better to either implement forking or run commands from external workers.
How to pass parameters
When doing shell_exec() with a file-system path you're accessing the file directly, and all "GET" parameters just become part of the file name; it is no longer a "URI", as there is no web server to process it. So you have two options:
Invoke the call by going through your web server. That would look like:
// yes, you would use wget or, better, curl to make the web request from the CLI
shell_exec('wget http://your.web-server.domain/script.php?foo=bar');
Downside here: if you access your web server via public DNS, this adds a network round-trip and all the request-processing overhead. Benefit: obviously, you need nothing special in your script and make no distinction between CLI and non-CLI calls.
Use the $_SERVER array in your script and pass the parameters the usual CLI way:
shell_exec('/usr/bin/php /path/to/script.php foo bar');
//inside your script.php you will see:
//$_SERVER['argv'][0] is "script.php"
//$_SERVER['argv'][1] is "foo"
//$_SERVER['argv'][2] is "bar"
Yes, this requires modifying the script and, probably, some logic to map "regular" web requests onto CLI ones (a sketch of such a mapping follows below). I would even suggest separating the CLI-related stuff into a different script bundle so as not to tangle that logic.
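As mentioned, a small sketch of such a mapping; the parameter names here are just examples, not from the question:
// normalize both entry points into one array the rest of the code reads
if (PHP_SAPI === 'cli') {
    $params = array(
        'foo' => isset($_SERVER['argv'][1]) ? $_SERVER['argv'][1] : null,
        'bar' => isset($_SERVER['argv'][2]) ? $_SERVER['argv'][2] : null,
    );
} else {
    $params = $_GET;
}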
More about "asynchronous run"
When you do php script.php & you just run it in background mode. That, however, still keeps the parent-child relation for your process, which means that if the parent process dies, its children will be removed as well. To be precise, SIGHUP will be sent, and to avoid that you should use the nohup command. It emulates "detaching" the process, making its run reliable and independent of whatever happens to the parent.
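In practice that means prefixing the command with nohup when launching it from PHP:
// detach the child so it survives the parent's death
shell_exec('nohup /usr/bin/php /path/to/script.php foo bar > /dev/null 2>&1 &');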
Is it possible to write a non-stopping program in PHP? For example, one using 2% of the processor and some memory all of the time. If it's not possible, can you tell me what direction I should look in for a non-stopping C++ program (on a UNIX server) and how to pass variables from PHP to C++?
EDIT:
First: I have a max execution time that stops it (and I need that limit for other scripts in case of bugs).
Second: I don't want to burn the server, so a bare while (true) is not the best idea (it has to have some cap on memory and processor usage).
You can use the CLI.
Create your PHP file and run it on the command line; it won't stop unless the code ends.
You can limit the memory usage: php -d memory_limit=128M my_script.php. This overrides the php.ini directive for that run, so you can set it in php.ini instead of defining it every time.
You can do something like this:
// run-forever.php
while (true) {
    // your executive code
    usleep(500); // time in microseconds - something like a yield, so as not to occupy the CPU
}
and then you can run: php run-forever.php
By the way, if you tend to use web-based PHP, you'll have to call set_time_limit(0); before the while loop.
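If it really has to run under a web SAPI, the loop above would be prefixed like this (ignore_user_abort() is an extra safeguard I'd add so a closed browser tab doesn't kill it):
set_time_limit(0);        // remove the max_execution_time cap
ignore_user_abort(true);  // keep running after the client disconnects
while (true) {
    // your executive code
    usleep(500);
}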
In PHP I create an XLS file using a COM object.
$excel = new COM("Excel.Application") or die("Excel is not installed!");
$excel->WorkBooks->Add();
...
$excel->WorkBooks[1]->SaveAs($path);
$excel->Quit();
$excel->Release();
$excel = null;
After this code has run, the Excel.exe process stays in memory for about 6 minutes. How can I force the process to quit immediately?
You are using COM objects, so you must be talking about PHP on Windows. Basically, you can kill tasks using the command-line tool taskkill, which you can call through PHP's shell_exec, like:
shell_exec('taskkill /F /IM "excel.exe"');
If you don't have the rights (though I believe you should, since you started the process), run the command line using runas.
BUT! You are using PHP, so if you are running this in a web environment and people access your website simultaneously, you might have multiple instances of Excel running. In that case you'd have to figure out which one you want to kill.
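A sketch of how that could look: list the running excel.exe instances first and kill a specific PID instead of the whole image (the selection logic in the middle is application-specific and left as a comment):
// list all excel.exe instances as CSV lines: "excel.exe","1234",...
$list = shell_exec('tasklist /FI "IMAGENAME eq excel.exe" /FO CSV /NH');
foreach (array_filter(explode("\n", trim($list))) as $line) {
    $cols = str_getcsv($line);
    $pid  = (int) $cols[1];
    // decide here whether this PID is the instance you started, then:
    // shell_exec('taskkill /F /PID ' . $pid);
}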
The process sticks around for some time in case another "Excel.Application" object is requested, in which case we can avoid the cost of starting a new process and loading all the DLLs.
I'd just see it as a feature.
There is another way...
Try uninitializing the COM objects. Instead of doing it just once, do it 4 times. This was established through trial and error, so I have no valid reason for it except that it works. It does not work all the time, but it might just work in your case.
We are running PHP on a Windows server (a source of many problems indeed, but migrating is not an option currently). There are a few points where a user-initiated action will need to kick off a few things that take a while and about which the user doesn't need to know if they succeed or fail, such as sending off an email or making sure some third-party accounts are updated. If I could just fork with pcntl_fork(), this would be very simple, but the PCNTL functions are not available in Windows.
It seems the closest I can get is to do something of this nature:
exec( 'php-cgi.exe somescript.php' );
However, this would be far more complicated. The actions I need to kick off rely on a lot of context that already will exist in the running process; to use the above example, I'd need to figure out the essential data and supply it to the new script in some way. If I could fork, it'd just be a matter of letting the parent process return early, leaving the child to work on a few more things.
I've found a few people talking about their own work in getting various PCNTL functions compiled on Windows, but none seemed to have anything available (broken links, etc).
Despite this question having practically the same name as mine, it seems the problem was more execution timeout than needing to fork. So, is my best option to just refactor a bit to deal with calling php-cgi, or are there other options?
Edit: It seems exec() won't work for this, at least not without me figuring out some other aspect of it, as it waits until the call returns. I figured I could use START, as in exec('start php-cgi.exe somescript.php');, but it still waits until the other script finishes.
How about installing PsExec and using the -d (don't wait) option:
exec('psexec -d php-cgi.exe somescript.php');
Get PSExec and run the command:
exec("psexec -d php-cgi.exe myfile.php");
PsTools are a good patch-in, but I'll leave this here:
If your server runs Windows 10 with the latest updates, you can install a Linux subsystem (WSL), which has its own kernel that supports native forking.
This is supported by Microsoft officially.
Here's a good guide on how to do it.
Once you've installed the subsystem itself, you need to install php on the subsystem.
Your windows "c:\" drive can be found under "/mnt/c", so you can run your php from the subsystem, which supports forking (and by extension the subsystem's php can use pcntl_fork).
Example: php /mnt/c/xampp/htdocs/test.php
If you want to run the subsystem's php directly from a windows command line you can simply use the "wsl" command.
Assuming you're running this from under "C:\xampp\htdocs\"
Example: wsl php main.php
The "wsl" command will resolve the path for you, so you don't need to do any dark magic, if you call the command under c:\xampp\htdocs, the subsystem will resolve it as "/mnt/c/xampp/htdocs/".
If you're running your server as an apache server, you don't really need to do anything extra, just stop the windows apache server and start the linux one and you're done.
Obviously you'll need to install all the missing php modules that you need on the subsystem.
You can create a daemon/background process to run the code (e.g. sending emails); the request then only has to add items to the queue, letting the daemon do the heavy lifting.
For example, a file send_emails.bat:
cls
C:\PHP533\php.exe D:\web\server.php
exit
Open the Windows Task Scheduler and have the above send_emails.bat run every 30 minutes. Make sure only one instance runs at a time, or you might run tasks in multiples and send each email twice. I say 30 minutes so that if something breaks temporarily (memory issues, database unavailable, etc.), the job restarts within 30 minutes instead of depending on a never-ending process that just stops. The following is a skeleton daemon; it is not complete or tested, I am just typing out an example:
<?php
set_time_limit(60 * 30);     // don't run longer than 30 minutes
$timeout = time() + 60 * 29; // stop the loop after 29 minutes

while (time() < $timeout)
{
    // grab emails from the database
    $result = $db->query('SELECT subject, body, to_email FROM email_queue');
    if ($result->num_rows == 0)
    {
        sleep(10); // so we are not taxing the database
    }
    else
    {
        while ($row = $result->fetch_assoc())
        {
            // send the email, then delete/mark it as sent (omitted in this skeleton)
        }
    }
}
exit;
?>
Finally, you just need the request to add the item to the queue in the database and let the daemon handle the heavy lifting:
$db->query("INSERT INTO email_queue (to_email, subject, body) VALUES ('customer@email.com', 'important email', '<b>html body!</b>')");
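Quoting values by hand like that is fragile; assuming $db is a mysqli instance (as the skeleton's fetch_assoc() suggests), a prepared statement is the safer way to do the same insert:
// same insert via a prepared statement - no manual quoting needed
$to      = 'customer@email.com';
$subject = 'important email';
$body    = '<b>html body!</b>';
$stmt = $db->prepare('INSERT INTO email_queue (to_email, subject, body) VALUES (?, ?, ?)');
$stmt->bind_param('sss', $to, $subject, $body);
$stmt->execute();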