Symfony 1.4 functional test - reduce memory usage - php

I have a csv file that defines routes to test, and the expected status code each route should return.
I am working on a functional test that iterates over the csv file and makes a request to each route, then checks to see if the proper status code is returned.
$browser = new sfTestFunctional(new sfBrowser());
foreach ($routes as $route)
{
$browser->
get($route['path'])->
with('response')->begin()->
isStatusCode($route['code'])->
end()
;
print(memory_get_usage());
}
/*************** OUTPUT: *************************
ok 1 - status code is 200
97953280# get /first_path
ok 2 - status code is 200
109607536# get /second_path
ok 3 - status code is 403
119152936# get /third_path
ok 4 - status code is 200
130283760# get /fourth_path
ok 5 - status code is 200
140082888# get /fifth_path
...
/***************************************************/
This continues until I hit an "Allowed memory size exhausted" error.
I have increased the amount of allowed memory, which temporarily solved the problem. That is not a permanent solution since more routes will be added to the csv file over time.
Is there a way to reduce the amount of memory this test is using?

I faced the same out-of-memory problem. I needed to crawl a very long list of URIs (around 30K) to generate the HTML cache. Thanks to Marek, I tried forking processes. There is still a small leak, but it is insignificant.
As input, I had a text file with one URI per line. Of course, you can easily adapt the following script to read a CSV.
const NUMBER_OF_PROCESS = 4;
const SIZE_OF_GROUPS = 5;
require_once(dirname(__FILE__).'/../../config/ProjectConfiguration.class.php');
$configuration = ProjectConfiguration::getApplicationConfiguration('frontend', 'prod', false);
sfContext::createInstance($configuration);
$file = new SplFileObject(dirname(__FILE__).'/list-of-uri.txt');
while($file->valid())
{
$count = 0;
$uris = array();
while($file->valid() && $count < NUMBER_OF_PROCESS * SIZE_OF_GROUPS) {
$uris[] = trim($file->current());
$file->next();
$count++;
}
$urisGroups = array_chunk($uris, SIZE_OF_GROUPS);
$childs = array();
echo "---\t\t\t Forking ".sizeof($urisGroups)." process \t\t\t ---\n";
foreach($urisGroups as $uriGroup) {
$pid = pcntl_fork();
if($pid == -1)
die('Could not fork');
if(!$pid) {
$b = new sfBrowser();
foreach($uriGroup as $key => $uri) {
$starttime = microtime(true);
$b->get($uri);
$time = microtime(true) - $starttime;
echo 'Mem: '.memory_get_peak_usage().' - '.$time.'s - URI N°'.($key + 1).' PID '.getmypid().' - Status: '.$b->getResponse()->getStatusCode().' - URI: '.$uri."\n";
}
exit();
}
if($pid) {
$childs[] = $pid;
}
}
while(count($childs) > 0) {
foreach($childs as $key => $pid) {
$res = pcntl_waitpid($pid, $status, WNOHANG);
// If the process has already exited
if($res == -1 || $res > 0)
unset($childs[$key]);
}
sleep(1);
}
}
const NUMBER_OF_PROCESS defines the number of parallel worker processes (so you save time if you have a multi-core processor).
const SIZE_OF_GROUPS defines the number of URIs that sfBrowser will crawl in each process. You can decrease it if you still run into out-of-memory problems.
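Since the original question reads its routes and expected status codes from a CSV, the script above only needs two small changes: build the URI groups from CSV rows instead of plain lines, and compare the returned status code inside each forked child. Here is a minimal sketch of those two pieces; the file name routes.csv and the column order (path, then expected code) are assumptions, not something from the original post.
// Build the work list from a CSV of "path,expected_code" rows (assumed layout).
$file = new SplFileObject(dirname(__FILE__).'/routes.csv');
$file->setFlags(SplFileObject::READ_CSV | SplFileObject::SKIP_EMPTY);
$routes = array();
foreach ($file as $row) {
    if (is_array($row) && isset($row[0], $row[1])) {
        $routes[] = array('path' => trim($row[0]), 'code' => (int) trim($row[1]));
    }
}

// ...chunk $routes with array_chunk() exactly like $uris above, then inside each forked child:
$b = new sfBrowser();
foreach ($routeGroup as $route) {
    $b->get($route['path']);
    $actual = $b->getResponse()->getStatusCode();
    echo ($actual == $route['code'] ? 'ok' : 'not ok')
        .' - '.$route['path'].' expected '.$route['code'].' got '.$actual
        .' - Mem: '.memory_get_peak_usage()."\n";
}
exit();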

Related

How to cache remote video

I want to play video from a remote server, so I wrote this function.
$remoteFile = 'blabla.com/video_5GB.mp4';
play($remoteFile);
function play($url){
ini_set('memory_limit', '1024M');
set_time_limit(3600);
ob_start();
if (isset($_SERVER['HTTP_RANGE'])) $opts['http']['header'] = "Range: " . $_SERVER['HTTP_RANGE'];
$opts['http']['method'] = "HEAD";
$conh = stream_context_create($opts);
$opts['http']['method'] = "GET";
$cong = stream_context_create($opts);
$out[] = file_get_contents($url, false, $conh);
$out[] = $http_response_header;
ob_end_clean();
array_map("header", $http_response_header);
readfile($url, false, $cong);
}
The above function works very well for playing videos, but I don't want to burden the remote server.
My question is: how can I cache the video files on my server, refreshing them every 5 hours? If possible, the cache folder should contain small chunks (5 MB / 10 MB) of the remote video.
As mentioned in my comment, the following code has been tested only on a small selection of MP4 files. It could probably do with some more work but it does fill your immediate needs as it is.
It uses exec() to spawn a separate process that generates the cache files when they are needed, i.e. on the first request or after 5 hours. Each video must have its own cache folder because the cached chunks are simply called 1, 2, 3, etc. Please see additional comments in the code.
play.php - This is the script that will be called by the users from the browser
<?php
ini_set('memory_limit', '1024M');
set_time_limit(3600);
$remoteFile = 'blabla.com/video_5GB.mp4';
play($remoteFile);
/**
* @param string $url
*
* This will serve the video from the remote url
*/
function playFromRemote($url)
{
ob_start();
$opts = array();
if(isset($_SERVER['HTTP_RANGE']))
{
$opts['http']['header'] = "Range: ".$_SERVER['HTTP_RANGE'];
}
$opts['http']['method'] = "HEAD";
$conh = stream_context_create($opts);
$opts['http']['method'] = "GET";
$cong = stream_context_create($opts);
$out[] = file_get_contents($url, false, $conh);
$out[] = $http_response_header;
ob_end_clean();
$fh = fopen('response.log', 'a');
if($fh !== false)
{
fwrite($fh, print_r($http_response_header, true)."\n\n\n\n");
fclose($fh);
}
array_map("header", $http_response_header);
readfile($url, false, $cong);
}
/**
* @param string $cacheFolder Directory in which to find the cached chunk files
* @param string $url
*
* This will serve the video from the cache, it uses a "completed.log" file which holds the byte ranges of each chunk
* this makes it easier to locate the first chunk of a range request. The file is generated by the cache script
*/
function playFromCache($cacheFolder, $url)
{
$bytesFrom = 0;
$bytesTo = 0;
if(isset($_SERVER['HTTP_RANGE']))
{
//the client asked for a specific range, extract those from the http_range server var
//can take the form "bytes=123-567" or just a from "bytes=123-"
$matches = array();
if(preg_match('/^bytes=(\d+)-(\d+)?$/', $_SERVER['HTTP_RANGE'], $matches))
{
$bytesFrom = intval($matches[1]);
if(!empty($matches[2]))
{
$bytesTo = intval($matches[2]);
}
}
}
//completed log is a json_encoded file containing an array or byte ranges that directly
//correspond with the chunk files generated by the cache script
$log = json_decode(file_get_contents($cacheFolder.DIRECTORY_SEPARATOR.'completed.log'));
$totalBytes = 0;
$chunk = 0;
foreach($log as $ind => $bytes)
{
//find the first chunk file we need to open
if($bytes[0] <= $bytesFrom && $bytes[1] > $bytesFrom)
{
$chunk = $ind + 1;
}
//and while we are at it save the last byte range "to" which is the total number of bytes of all the chunk files
$totalBytes = $bytes[1];
}
if($bytesTo === 0)
{
if($totalBytes === 0)
{
//if we get here then something is wrong with the cache, revert to serving from the remote
playFromRemote($url);
return;
}
$bytesTo = $totalBytes - 1;
}
//calculate how many bytes will be returned in this request
$contentLength = $bytesTo - $bytesFrom + 1;
//send some headers - I have hardcoded MP4 here because that is all I have developed with
//if you are using different video formats then testing and changes will no doubt be required
header('Content-Type: video/mp4');
header('Content-Length: '.$contentLength);
header('Accept-Ranges: bytes');
//Send a header so we can recognise that the content was indeed served by the cache
header('X-Cached-Date: '.(date('Y-m-d H:i:s', filemtime($cacheFolder.DIRECTORY_SEPARATOR.'completed.log'))));
if($bytesFrom > 0)
{
//We are sending back a range so it needs a header and the http response must be 206: Partial Content
header(sprintf('content-range: bytes %s-%s/%s', $bytesFrom, $bytesTo, $totalBytes));
http_response_code(206);
}
$bytesSent = 0;
while(is_file($cacheFolder.DIRECTORY_SEPARATOR.$chunk) && $bytesSent < $contentLength)
{
$cfh = fopen($cacheFolder.DIRECTORY_SEPARATOR.$chunk, 'rb');
if($cfh !== false)
{
//if we are fetching a range then we might need to seek the correct starting point in the first chunk we look at
//this check will be performed on all chunks but only the first one should need seeking so no harm done
if($log[$chunk - 1][0] < $bytesFrom)
{
fseek($cfh, $bytesFrom - $log[$chunk - 1][0]);
}
//read and send data until the end of the file or we have sent what was requested
while(!feof($cfh) && $bytesSent < $contentLength)
{
$data = fread($cfh, 1024);
//check we are not going to be sending too much back and if we are then truncate the data to the correct length
if($bytesSent + strlen($data) > $contentLength)
{
$data = substr($data, 0, $contentLength - $bytesSent);
}
$bytesSent += strlen($data);
echo $data;
}
fclose($cfh);
}
//move to the next chunk
$chunk ++;
}
}
function play($url)
{
//I have chosen a simple way to make a folder name, this can be improved any way you need
//IMPORTANT: Each video must have its own cache folder
$cacheFolder = sha1($url);
if(!is_dir($cacheFolder))
{
mkdir($cacheFolder, 0755, true);
}
//First check if we are currently in the process of generating the cache and so just play from remote
if(is_file($cacheFolder.DIRECTORY_SEPARATOR.'caching.log'))
{
playFromRemote($url);
}
//Otherwise check if we have never completed the cache or it was completed 5 hours ago and if so spawn a process to generate the cache
elseif(!is_file($cacheFolder.DIRECTORY_SEPARATOR.'completed.log') || filemtime($cacheFolder.DIRECTORY_SEPARATOR.'completed.log') + (5 * 60 * 60) < time())
{
//fork the caching to a separate process - the & echo $! at the end causes the process to run as a background task
//and print the process ID returning immediately
//The cache script can be anywhere, pass the location to sprintf in the first position
//A base64 encoded url is passed in as argument 1, sprintf second position
$cmd = sprintf('php %scache.php %s & echo $!', __DIR__.DIRECTORY_SEPARATOR, base64_encode($url));
$pid = exec($cmd);
//with that started we need to serve the request from the remote url
playFromRemote($url);
}
else
{
//if we got this far then we have a completed cache so serve from there
playFromCache($cacheFolder, $url);
}
}
cache.php - This script will be called by play.php via exec()
<?php
//This script expects as argument 1 a base64 encoded url
if(count($argv)!==2)
{
die('Invalid Request!');
}
$url = base64_decode($argv[1]);
//make sure to use the same method of obtaining the cache folder name as the main play script
//or change the code to pass it in as an argument
$cacheFolder = sha1($url);
if(!is_dir($cacheFolder))
{
die('Invalid Arguments!');
}
//double check it is not already running
if(is_file($cacheFolder.DIRECTORY_SEPARATOR.'caching.log'))
{
die('Already Running');
}
//create a file so we know this has started, the file will be removed at the end of the script
file_put_contents($cacheFolder.DIRECTORY_SEPARATOR.'caching.log', date('d/m/Y H:i:s'));
//get rid of the old completed log
if(is_file($cacheFolder.DIRECTORY_SEPARATOR.'completed.log'))
{
unlink($cacheFolder.DIRECTORY_SEPARATOR.'completed.log');
}
$bytesFrom = 0;
$bytesWritten = 0;
$totalBytes = 0;
//this is the size of the chunk files, currently 10MB
$maxSizeInBytes = 10 * 1024 * 1024;
$chunk = 1;
//open the url for binary reading and first chunk for binary writing
$fh = fopen($url, 'rb');
$cfh = fopen($cacheFolder.DIRECTORY_SEPARATOR.$chunk, 'wb');
if($fh !== false && $cfh!==false)
{
$log = array();
while(!feof($fh))
{
$data = fread($fh, 1024);
fwrite($cfh, $data);
$totalBytes += strlen($data); //use actual length here
$bytesWritten += strlen($data);
//if we are on or passed the chunk size then close the chunk and open a new one
//keeping a log of the byte range of the chunk
if($bytesWritten>=$maxSizeInBytes)
{
$log[$chunk-1] = array($bytesFrom,$totalBytes);
$bytesFrom = $totalBytes;
fclose($cfh);
$chunk++;
$bytesWritten = 0;
$cfh = fopen($cacheFolder.DIRECTORY_SEPARATOR.$chunk, 'wb');
}
}
fclose($fh);
$log[$chunk-1] = array($bytesFrom,$totalBytes);
fclose($cfh);
//write the completed log. This is a json encoded string of the chunk byte ranges and will be used
//by the play script to quickly locate the starting chunk of a range request
file_put_contents($cacheFolder.DIRECTORY_SEPARATOR.'completed.log', json_encode($log));
//finally remove the caching log so the play script doesn't think the process is still running
unlink($cacheFolder.DIRECTORY_SEPARATOR.'caching.log');
}

Connection errors in forked PHP processes

I have a PHP script that takes N documents from MongoDB and forks the process into K child PHP processes; each process does some work on each document and tries to update the document's info (see the code below).
On my local environment (Docker) everything is fine, but on the server (no Docker there), strange things sometimes happen during the loop...
Randomly, all forked processes fail to connect to MongoDB. The updateOne command returns an error:
"Failed to send "update" command with database "databasename": Invalid reply from server. in /vendor/mongodb/mongodb/src/Operation/Update.php on line 158".
This happens to all processes at the same time, and only for one (or several) random loop iterations. When each process moves on to the next iteration (takes the next document), everything is fine again. I retry the write to MongoDB up to 5 times.
Each retry is delayed one second longer than the previous one: the first try runs immediately; if an exception is caught, I wait a second and try again; the next retry waits two seconds, and so on. But this does not help: all 5 tries fail.
This is not a MongoDB problem: its log is empty, and it does not even receive anything from PHP when the error happens.
I have also noticed that the more simultaneous processes I run, the more frequently the errors occur.
It is also not a server resource problem: when the error occurs, half of the RAM (4 GB) is free and the CPU is running at about half capacity.
Maybe PHP has some configuration for this? Some memory limits or something...
I use PHP v 7.1.30
MongoDB v 3.2.16
PHP Package mongodb/mongodb v 1.1.2
<?php
$processesAmount = 5;
$documents = $this->mongoResource->getDocuments();
for ($processNumber = 0; $processNumber < $processesAmount; $processNumber++) {
// create child process
$pid = pcntl_fork();
// do not create new processes in child processes
if ($pid === 0) {
break;
}
if ($pid === -1) {
// some errors catching staff here...
}
else if ($pid === 0) {
// create new MongoDB connection
}
else {
// Protect against Zombie children
// main process waits before all child processes end
for ($i = 0; $i < $processesAmount; $i++) {
pcntl_wait($status);
}
return null;
}
// spread documents to each process without duplicates
for ($i = $processNumber; $i < count($documents); $i += $processesAmount) {
$newDocumentData = $this->doSomeStaffWithDocument($documents[$i]);
$this->mongoResource->updateDocument($documents[$i], $newDocumentData);
}
}
There could be many issues here, one being that all processes share one DB connection, so the first one to disconnect kills the connection for all of them (see the reconnection sketch after the code below). Check the second example in the docs here: https://www.php.net/manual/en/ref.pcntl.php
If that doesn't help, the way I read your code the "spreading" part is happening in every process, when it should be happening once. Shouldn't you be putting the "work" in the child section like below?
$processesAmount = 5;
$documents = $this->mongoResource->getDocuments();
$numDocs = count($documents);
$i = 0;
$children = [];
for ($processNumber = 0; $processNumber < $processesAmount; $processNumber++) {
// create child
$pid = pcntl_fork();
if ($pid === -1) {
// some errors catching staff here...
} else if ($pid) {
//parent
$children[] = $pid;
} else {
//child
//each child handles documents $processNumber, $processNumber + $processesAmount, ... so there are no duplicates
for ($i = $processNumber; $i < $numDocs; $i += $processesAmount) {
$doc = $documents[$i] ?? null;
if ($doc === null) {
continue;
}
$newDocumentData = $this->doSomeStaffWithDocument($doc);
$this->mongoResource->updateDocument($doc, $newDocumentData);
}
exit(0); //make sure the child never falls through into the parent's code
}
}
//protect against zombies: the parent waits for all its children
//$children is always empty unless we are in the parent
while (!empty($children)) {
foreach ($children as $key => $pid) {
$status = null;
$res = pcntl_waitpid($pid, $status, WNOHANG);
if ($res == -1 || $res > 0) { //if the process has already exited
unset($children[$key]);
}
}
}
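Regarding the shared-connection point above: the exact wrapper around MongoDB isn't shown in the question, but with the mongodb/mongodb library a per-child reconnection could look roughly like the sketch below. The connection string, database and collection names are placeholders.
// Inside the child branch, before doing any work: open a fresh client so the
// child does not reuse a socket inherited from the parent process.
$client = new MongoDB\Client('mongodb://127.0.0.1:27017');
$collection = $client->selectCollection('databasename', 'documents');
// ...then update through this connection instead of the one created before the fork
// (assuming each document carries its _id):
$collection->updateOne(['_id' => $doc['_id']], ['$set' => $newDocumentData]);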

PHP: Writing a lot of small files the fastest or/and most efficient way

Imagine that a campaign will have 10,000 to 30,000 files of about 4 KB each that need to be written to disk.
And there will be a couple of campaigns running at the same time, 10 at most.
Currently, I'm going with the usual way: file_put_contents.
It gets the job done, but slowly, and its PHP process sits at 100% CPU usage the whole time.
With fopen, fwrite and fclose, the result is similar to file_put_contents.
I've tried some async I/O options such as PHP eio and Swoole.
They are faster, but they yield "too many open files" errors after some time.
php -r 'echo exec("ulimit -n");' reports 800000.
Any help would be appreciated!
Well, this is sort of embarrassing... you guys are correct, the bottleneck is how the file content is generated...
I am assuming that you cannot follow SomeDude's very good advice on using databases instead, and you already have performed what hardware tuning could be performed (e.g. increasing cache, increasing RAM to avoid swap thrashing, purchasing SSD drives).
I'd try and offload the file generation to a different process.
You could, e.g., install Redis and store the file content in its key-value store, which is very fast. Then a different, parallel process could extract the data from the store, delete it, and write it to a disk file.
This removes all disk I/O from the main PHP process, and lets you monitor the backlog (how many keys are still unflushed: ideally zero) and concentrate on the bottleneck in content generation. You'll possibly need some extra RAM.
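A minimal sketch of that idea, assuming the phpredis extension and hypothetical key/queue names; the main process only pushes content into Redis, and a separate worker script pops it and does the actual disk writes:
// Producer side (the main PHP process): store the payload and queue the file name.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
$redis->set('filedata:'.$filename, $content);   // store the payload
$redis->rPush('filequeue', $filename);          // remember it still needs flushing

// Consumer side (a separate worker script): pop names and write them out.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
while (true) {
    $item = $redis->blPop('filequeue', 5);       // block up to 5 s waiting for work
    if (empty($item)) {
        continue;                                // nothing queued, keep waiting
    }
    $filename = $item[1];
    $content = $redis->get('filedata:'.$filename);
    if ($content !== false) {
        file_put_contents('/path/to/output/'.$filename, $content); // placeholder target dir
        $redis->del('filedata:'.$filename);      // keep the backlog (and RAM usage) small
    }
}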
On the other hand, this is not too different from writing to a RAM disk. You could also output the data to a RAM disk, and it would probably be even faster:
# As root
mkdir /mnt/ramdisk
mount -t tmpfs -o size=512m tmpfs /mnt/ramdisk
mkdir /mnt/ramdisk/temp
mkdir /mnt/ramdisk/ready
# Change ownership and permissions as appropriate
and in PHP:
$fp = fopen("/mnt/ramdisk/temp/{$file}", "w");
fwrite($fp, $data);
fclose($fp);
rename("/mnt/ramdisk/temp/{$file}", "/mnt/ramdisk/ready/{$file}");
and then have a different process (cron job? Or continuously running daemon?) move files from the "ready" directory of the RAM disk to the real disk, deleting the RAM copy afterwards.
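The mover itself could be as simple as the sketch below, run from cron or wrapped in an endless loop as a daemon; the destination directory is a placeholder:
// Move finished files from the RAM disk "ready" directory to the real disk.
$readyDir = '/mnt/ramdisk/ready';
$targetDir = '/var/campaign-output'; // placeholder destination on the real disk

foreach (glob($readyDir.'/*') as $ramFile) {
    $target = $targetDir.'/'.basename($ramFile);
    // copy + unlink, since a move across filesystems cannot be a simple atomic rename anyway
    if (copy($ramFile, $target)) {
        unlink($ramFile); // free the RAM disk space immediately
    }
}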
File System
The time required to create a file depends on the number of files already in the directory, with growth functions that themselves depend on the file system: ext4, ext3, zfs, btrfs, etc. will exhibit different behaviour. Specifically, you might experience significant slowdowns once the number of files exceeds some threshold.
So you might want to time the creation of a large number of sample files in one directory, and see how this time grows as the number grows. Keep in mind that there is a performance penalty for accessing different directories, so jumping straight to a very large number of subdirectories is not recommended either.
<?php
$payload = str_repeat("Squeamish ossifrage. \n", 253);
$time = microtime(true);
for ($i = 0; $i < 10000; $i++) {
$fp = fopen("file-{$i}.txt", "w");
fwrite($fp, $payload);
fclose($fp);
}
$time = microtime(true) - $time;
for ($i = 0; $i < 10000; $i++) {
unlink("file-{$i}.txt");
}
print "Elapsed time: {$time} s\n";
Creation of 10000 files takes 0.42 seconds on my system, but creation of 100000 files (10x) takes 5.9 seconds, not 4.2. On the other hand, creating one eighth of those files in 8 separate directories (the best compromise I found) takes 6.1 seconds, so it's not worthwhile.
But suppose that creating 300000 files took 25 seconds instead of 17.7; dividing those files in ten directories might take 22 seconds, and make the directory split worthwhile.
Parallel processing: r strategy
TL;DR this doesn't work so well on my system, though your mileage may vary. If the operations to be done are lengthy (here they are not) and bound differently from the main process, then it can be advantageous to offload each of them to a different child process, provided you don't spawn too many of them.
You will need pcntl functions installed.
$payload = str_repeat("Squeamish ossifrage. \n", 253);
$time = microtime(true);
for ($i = 0; $i < 100000; $i++) {
$pid = pcntl_fork();
switch ($pid) {
case 0:
// Parallel execution.
$fp = fopen("file-{$i}.txt", "w");
fwrite($fp, $payload);
fclose($fp);
exit();
case -1:
echo 'Could not fork Process.';
exit();
default:
break;
}
}
$time = microtime(true) - $time;
print "Elapsed time: {$time} s\n";
(The fancy name r strategy is taken from biology).
In this example, spawning times are catastrophic if compared to what each child needs to do. Therefore, overall processing time skyrockets. With more complex children things would go better, but you must be careful not to turn the script into a fork bomb.
One possibility, if feasible, would be to divide the files to be created into, say, chunks of 10% each. Each child would then change its working directory with chdir() and create its files in a different directory (see the sketch below). This would negate the penalty for writing files in different subdirectories (each child writes in its current directory), while benefiting from writing fewer files per directory. In this case, with very lightweight and I/O-bound operations in the child, the strategy again isn't worthwhile (I get double the execution time).
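A rough sketch of that chunked variant, using the same payload as the benchmarks above; each child creates its share of the files in its own subdirectory:
$payload = str_repeat("Squeamish ossifrage. \n", 253);
$chunks = 10;           // 10% of the files per child
$filesPerChunk = 10000;
for ($c = 0; $c < $chunks; $c++) {
    $pid = pcntl_fork();
    if ($pid == -1) {
        die('Could not fork');
    }
    if ($pid == 0) {
        // Child: write into its own subdirectory to keep the per-directory file count low.
        if (!is_dir("chunk-{$c}")) {
            mkdir("chunk-{$c}", 0755);
        }
        chdir("chunk-{$c}");
        for ($i = 0; $i < $filesPerChunk; $i++) {
            file_put_contents("file-{$i}.txt", $payload);
        }
        exit(0);
    }
}
// Parent: reap all children so none are left as zombies.
while (pcntl_wait($status) > 0) {
    // nothing to do here, just collect exit statuses
}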
Parallel processing: K strategy
TL;DR this is more complex but works well... on my system. Your mileage may vary.
While the r strategy involves lots of fire-and-forget children, the K strategy calls for a limited number of children (possibly just one) which are nurtured carefully. Here we offload the creation of all the files to one parallel process and communicate with it via sockets.
$payload = str_repeat("Squeamish ossifrage. \n", 253);
$sockets = array();
$domain = (strtoupper(substr(PHP_OS, 0, 3)) == 'WIN' ? AF_INET : AF_UNIX);
if (socket_create_pair($domain, SOCK_STREAM, 0, $sockets) === false) {
echo "socket_create_pair failed. Reason: ".socket_strerror(socket_last_error());
}
$pid = pcntl_fork();
if ($pid == -1) {
echo 'Could not fork Process.';
} elseif ($pid) {
/*parent*/
socket_close($sockets[0]);
} else {
/*child*/
socket_close($sockets[1]);
for (;;) {
$cmd = trim(socket_read($sockets[0], 5, PHP_BINARY_READ));
if (false === $cmd) {
die("ERROR\n");
}
if ('QUIT' === $cmd) {
socket_write($sockets[0], "OK", 2);
socket_close($sockets[0]);
exit(0);
}
if ('FILE' === $cmd) {
$file = trim(socket_read($sockets[0], 20, PHP_BINARY_READ));
$len = trim(socket_read($sockets[0], 8, PHP_BINARY_READ));
$data = socket_read($sockets[0], $len, PHP_BINARY_READ);
$fp = fopen($file, "w");
fwrite($fp, $data);
fclose($fp);
continue;
}
die("UNKNOWN COMMAND: {$cmd}");
}
}
$time = microtime(true);
for ($i = 0; $i < 100000; $i++) {
socket_write($sockets[1], sprintf("FILE %20.20s%08.08s", "file-{$i}.txt", strlen($payload)));
socket_write($sockets[1], $payload, strlen($payload));
//$fp = fopen("file-{$i}.txt", "w");
//fwrite($fp, $payload);
//fclose($fp);
}
$time = microtime(true) - $time;
print "Elapsed time: {$time} s\n";
socket_write($sockets[1], "QUIT\n", 5);
$ok = socket_read($sockets[1], 2, PHP_BINARY_READ);
socket_close($sockets[1]);
THIS IS HUGELY DEPENDENT ON THE SYSTEM CONFIGURATION. For example on a mono-processor, mono-core, non-threading CPU, this is madness - you'll at least double the total runtime, but more likely it will go from three to ten times as slow.
So this is definitely not the way to pimp up something running on an old system.
On a modern multithreading CPU and supposing the main content creation loop is CPU bound, you may experience the reverse - the script might go ten times faster.
On my system, the "forking" solution above runs a bit less than three times faster. I expected more, but there you are.
Of course, whether the performance is worth the added complexity and maintenance, remains to be evaluated.
The bad news
While experimenting above, I came to the conclusion that file creation on a reasonably configured and performant Linux machine is fast as hell, so not only is it difficult to squeeze out more performance, but if you're experiencing slowness it's very likely not file related. Try giving some more detail about how you create that content.
Having read your description, I understand you're writing many files that are each rather small. The way PHP usually works (at least under the Apache server), there is overhead for each filesystem access: a file pointer and buffer are opened and maintained for each file. Since there are no code samples to review here, it's hard to see where the inefficiencies are.
However, using file_put_contents() for 300,000+ files appears to be slightly less efficient than using fopen() and fwrite() or fflush() directly, then fclose() when you're done. I'm saying that based on a benchmark done by a fellow in the comments of the PHP documentation for file_put_contents() at http://php.net/manual/en/function.file-put-contents.php#105421
Next, when dealing with such small file sizes, it sounds like there's a great opportunity to use a database instead of flat files (I'm sure you've considered that before). A database, whether MySQL or PostgreSQL, is highly optimized for simultaneous access to many records, and can internally balance CPU workload in ways that filesystem access never can (and binary data in records is possible too). Unless you need access to real files directly from your server hard drives, a database can simulate many files by allowing PHP to return individual records as file data over the web (i.e., by using the header() function). Again, I'm assuming this PHP is running as a web interface on a server.
Overall, what I am reading suggests that there may be an inefficiency somewhere else besides filesystem access. How is the file content generated? How does the operating system handle file access? Is there compression or encryption involved? Are these images or text data? Is the OS writing to one hard drive, a software RAID array, or some other layout? Those are some of the questions I can think of just glancing over your problem. Hopefully my answer helped. Cheers.
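To make the database suggestion concrete, here is a hedged sketch using PDO with MySQL. The table layout, DSN and credentials are assumptions purely for illustration:
// Assumed schema:
// CREATE TABLE campaign_files (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255),
//                              body MEDIUMBLOB, created_at DATETIME);
$pdo = new PDO('mysql:host=127.0.0.1;dbname=campaigns', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Writing: one INSERT per "file" instead of one filesystem write per file.
$insert = $pdo->prepare('INSERT INTO campaign_files (name, body, created_at) VALUES (?, ?, NOW())');
$insert->execute(array($fileName, $fileBody));

// Serving: return a record as if it were a file, using header() as described above.
$select = $pdo->prepare('SELECT body FROM campaign_files WHERE name = ?');
$select->execute(array($_GET['name']));
$body = $select->fetchColumn();
if ($body !== false) {
    header('Content-Type: application/octet-stream');
    header('Content-Length: '.strlen($body));
    echo $body;
}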
The main idea is to have fewer files.
For example, 1,000 files can be combined into 100 files (each containing 10 files) and parsed with explode; you get about 5x faster writes and 14x faster read+parse.
Even with file_put_contents and fwrite optimized, you will not get more than a 1.x speedup. This solution can be useful for both reading and writing. Another option may be MySQL or some other database.
On my computer, creating 30k files with a small string takes 96.38 seconds, while appending the same string 30k times to one file takes 0.075 sec.
I can offer an unusual solution, in which you call the file_put_contents function fewer times. Below is a simple piece of code to show how it works.
$start = microtime(true);
$str = "Aaaaaaaaaaaaaaaaaaaaaaaaa";
if( !file_exists("test/") ) mkdir("test/");
foreach( range(1,1000) as $i ) {
file_put_contents("test/".$i.".txt",$str);
}
$end = microtime(true);
echo "elapsed_file_put_contents_1: ".substr(($end - $start),0,5)." sec\n";
$start = microtime(true);
$out = '';
foreach( range(1,1000) as $i ) {
$out .= $str;
}
file_put_contents("out.txt",$out);
$end = microtime(true);
echo "elapsed_file_put_contents_2: ".substr(($end - $start),0,5)." sec\n";
Below is a full example with 1000 files and the elapsed times:
with 1000 files:
writing file_put_contents: elapsed: 194.4 sec
writing file_put_contents APPEND: elapsed: 37.83 sec ( 5x faster )
............
reading file_put_contents: elapsed: 2.401 sec
reading append: elapsed: 0.170 sec ( 14x faster )
$start = microtime(true);
$allow_argvs = array("gen_all","gen_few","read_all","read_few");
$arg = isset($argv[1]) ? $argv[1] : die("php ".$argv[0]." gen_all ( ".implode(", ",$allow_argvs).")");
if( !in_array($arg,$allow_argvs) ) {
die("php ".$argv[0]." gen_all ( ".implode(", ",$allow_argvs).")");
}
if( $arg=='gen_all' ) {
$dir_campain_all_files = "campain_all_files/";
if( !file_exists($dir_campain_all_files) ) die("\nFolder ".$dir_campain_all_files." does not exist!\n");
$exists_campaings = false;
foreach( range(1,10) as $i ) { if( file_exists($dir_campain_all_files.$i) ) { $exists_campaings = true; } }
if( $exists_campaings ) {
die("\nDelete manualy all subfolders from ".$dir_campain_all_files." !\n");
}
build_campain_dirs($dir_campain_all_files);
// foreach in campaigns
foreach( range(1,10) as $i ) {
$campain_dir = $dir_campain_all_files.$i."/";
$nr_of_files = 1000;
foreach( range(1,$nr_of_files) as $f ) {
$file_name = $f.".txt";
$data_file = generateRandomString(4*1024);
$dir_file_name = $campain_dir.$file_name;
file_put_contents($dir_file_name,$data_file);
}
echo "campaing #".$i." done! ( ".$nr_of_files." files writen ).\n";
}
}
if( $arg=='gen_few' ) {
$delim_file = "###FILE###";
$delim_contents = "###CONTENT###"; // the record separator must differ from the file-name delimiter
$dir_campain = "campain_few_files/";
if( !file_exists($dir_campain) ) die("\nFolder ".$dir_campain." does not exist!\n");
$exists_campaings = false;
foreach( range(1,10) as $i ) { if( file_exists($dir_campain.$i) ) { $exists_campaings = true; } }
if( $exists_campaings ) {
die("\nDelete manualy all files from ".$dir_campain." !\n");
}
$amount = 100; // nr_of_files_to_append
$out = ''; // here will be appended
build_campain_dirs($dir_campain);
// foreach in campaigns
foreach( range(1,10) as $i ) {
$campain_dir = $dir_campain.$i."/";
$nr_of_files = 1000;
$cnt_few=1;
foreach( range(1,$nr_of_files) as $f ) {
$file_name = $f.".txt";
$data_file = generateRandomString(4*1024);
$my_file_and_data = $file_name.$delim_file.$data_file;
$out .= $my_file_and_data.$delim_contents;
// append in a new file
if( $f%$amount==0 ) {
$dir_file_name = $campain_dir.$cnt_few.".txt";
file_put_contents($dir_file_name,$out,FILE_APPEND);
$out = '';
$cnt_few++;
}
}
// append remaning files
if( !empty($out) ) {
$dir_file_name = $campain_dir.$cnt_few.".txt";
file_put_contents($dir_file_name,$out,FILE_APPEND);
$out = '';
}
echo "campaing #".$i." done! ( ".$nr_of_files." files writen ).\n";
}
}
if( $arg=='read_all' ) {
$dir_campain = "campain_all_files/";
$exists_campaings = false;
foreach( range(1,10) as $i ) {
if( file_exists($dir_campain.$i) ) {
$exists_campaings = true;
}
}
foreach( range(1,10) as $i ) {
$campain_dir = $dir_campain.$i."/";
$files = getFiles($campain_dir);
foreach( $files as $file ) {
$data = file_get_contents($file);
$substr = substr($data, 100, 5); // read 5 chars after char100
}
echo "campaing #".$i." done! ( ".count($files)." files readed ).\n";
}
}
if( $arg=='read_few' ) {
$dir_campain = "campain_few_files/";
$exists_campaings = false;
foreach( range(1,10) as $i ) {
if( file_exists($dir_campain.$i) ) {
$exists_campaings = true;
}
}
foreach( range(1,10) as $i ) {
$campain_dir = $dir_campain.$i."/";
$files = getFiles($campain_dir);
foreach( $files as $file ) {
$data_temp = file_get_contents($file);
$explode = explode("###CONTENT###",$data_temp); // split into name+data records first
//#mkdir("test/".$i);
foreach( $explode as $exp ) {
$temp_exp = explode("###FILE###",$exp);
if( count($temp_exp)==2 ) {
$file_name = $temp_exp[0];
$file_data = $temp_exp[1];
$substr = substr($file_data, 100, 5); // read 5 chars after char100
//file_put_contents("test/".$i."/".$file_name,$file_data); // test if files are recreated correctly
}
}
//echo $file." has ".strlen($data_temp)." chars!\n";
}
echo "campaing #".$i." done! ( ".count($files)." files readed ).\n";
}
}
$end = microtime(true);
echo "elapsed: ".substr(($end - $start),0,5)." sec\n";
echo "\n\nALL DONE!\n\n";
/*************** FUNCTIONS ******************/
function generateRandomString($length = 10) {
$characters = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
$charactersLength = strlen($characters);
$randomString = '';
for ($i = 0; $i < $length; $i++) {
$randomString .= $characters[rand(0, $charactersLength - 1)];
}
return $randomString;
}
function build_campain_dirs($dir_campain) {
foreach( range(1,10) as $i ) {
$dir = $dir_campain.$i;
if( !file_exists($dir) ) {
mkdir($dir);
}
}
}
function getFiles($dir) {
$arr = array();
if ($handle = opendir($dir)) {
while (false !== ($file = readdir($handle))) {
if ($file != "." && $file != "..") {
$arr[] = $dir.$file;
}
}
closedir($handle);
}
return $arr;
}

interrupt a function by another function

I have a Symfony job that has two functions:
launch and stop.
My launch function imports contacts from the database, for example 4 at a time, and sends messages to all of them.
public function launchAction()
{
$offset = 0;
$limit = 4;
$sizeData /= $limit;
for( $i = 0; $i < $sizeData; $i++)
{
$contacts = $repository->getListByLimit($offset, $limit);
$sender->setContacts($contacts);
$sender->send();
$offset += $limit;
}
}
When I launch my launch function, it takes for example 20 seconds to import and send the message to all contacts.
But if I want to stop it, how can the stop function interrupt the launch function?
public function stopAction()
{
}
I will not give a full answer, but here are two hints on how it could work (a sketch of the first approach follows the list):
1:
save a file containing the process id on launch()
on stop() you can check for its existence and kill the process by id
2:
on launch() you can check for a specific DB entry inside the loop, so it breaks if the value is present
on stop() you set this DB entry
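A minimal sketch of the first hint, assuming both actions run on the same machine, the posix and pcntl extensions are available, and the pid file path is a placeholder:
public function launchAction()
{
    $pidFile = '/tmp/contact_sender.pid';    // placeholder location
    file_put_contents($pidFile, getmypid()); // remember which process is sending

    // ...same batching loop as in the question ($sizeData, $repository and $sender
    // come from the surrounding class)...
    $offset = 0;
    $limit = 4;
    $sizeData /= $limit;
    for ($i = 0; $i < $sizeData; $i++) {
        $contacts = $repository->getListByLimit($offset, $limit);
        $sender->setContacts($contacts);
        $sender->send();
        $offset += $limit;
    }

    unlink($pidFile); // finished normally
}

public function stopAction()
{
    $pidFile = '/tmp/contact_sender.pid';
    if (is_file($pidFile)) {
        posix_kill((int) file_get_contents($pidFile), SIGTERM); // SIGTERM constant is provided by ext-pcntl
        unlink($pidFile);
    }
}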
If your only purpose is to be able to stop the script, you don't need a full event loop implementation, I think. You can listen on a local socket and break when you receive data.
You could for example run this in launchAction
public function launchAction()
{
$offset = 0;
$limit = 4;
$sizeData /= $limit;
// Init IPC connection
$server = stream_socket_server("tcp://127.0.0.1:1337", $errno, $errorMessage);
if ($server === false) {
throw new UnexpectedValueException("Could not bind to socket: $errorMessage");
}
for( $i = 0; $i < $sizeData; $i++)
{
// Check our socket for data
$client = @stream_socket_accept($server, 0); // zero timeout so the loop is not blocked waiting for a client
if ($client) {
// Read sent data
$data = fread($client, 1024);
// Probably break
if ($data === 'whatever') {
break;
}
}
$contacts = $repository->getListByLimit($offset, $limit);
$sender->setContacts($contacts);
$sender->send();
$offset += $limit;
}
// Close socket after sending all messages
fclose($client);
}
And stopAction could hit the socket to terminate the connection like so:
public function stopAction()
{
$socket = stream_socket_client('tcp://127.0.0.1:1337');
fwrite($socket, 'whatever');
fclose($socket);
}
This should work if you run both functions on the same machine. Also note that PHP can only bind to ports that are not already occupied, so you might need to change the port number. And if you start a second process to send messages in parallel, the new one will not be able to bind to the same socket.
A great blogpost explaining some detail is https://www.christophh.net/2012/07/24/php-socket-programming/
If however you wish to start a long running process, I suggest you take a look at ReactPHP, which is an excellent event loop implementation that runs on several different setups. It also includes timers, and other useful libs.
You might want to take a look at this blogpost series, to get an idea https://blog.wyrihaximus.net/2015/01/reactphp-introduction/

PHP Function Timeout

I'm not an expert with PHP. I have a function which uses exec to run WINRS, which then runs commands on remote servers. The problem is that this function is placed inside a loop which calls the getservicestatus function dozens of times. Sometimes the WINRS command can get stuck or take longer than expected, causing the PHP script to time out and throw a 500 error.
Temporarily I've lowered the timeout value in PHP and created a custom 500 page in IIS: if the referring page is equal to the script name, the page is reloaded (otherwise an error is thrown). But this is messy, and obviously it doesn't apply to each individual call of the function, since the timeout is global. So it only avoids the page stopping at the HTTP 500 error.
What I'd really like to do is set a timeout of 5 seconds on the function itself. I've been searching quite a bit and have been unable to find an answer, even on Stack Overflow. Yes, there are similar questions, but I have not been able to find any that relate to my function. Perhaps there's a way to do this when executing the command, such as an alternative to exec()? I don't know. Ideally I'd like the function to time out after 5 seconds and return $servicestate as 0.
Code is commented to explain my spaghetti mess. And I'm sorry you have to see it...
function getservicestatus($servername, $servicename, $username, $password)
{
//define start so that if an invalid result is reached the function can be restarted using goto.
start:
//Define command to use to get service status.
$command = 'winrs /r:' . $servername . ' /u:' . $username . ' /p:' . $password . ' sc query ' . $servicename . ' 2>&1';
exec($command, $output);
//Defines the server status as $servicestate which is stored in the fourth part of the command array.
//Then the string "STATE" and any number is stripped from $servicestate. This will leave only the status of the service (e.g. RUNNING or STOPPED).
$servicestate = $output[3];
$strremove = array('/STATE/','/:/','/[0-9]+/','/\s+/');
$servicestate = preg_replace($strremove, '', $servicestate);
//Define an invalid output. Sometimes the array is invalid. Catch this issue and restart the function for valid output.
//Typically this can be caught when the string "SERVICE_NAME" is found in $output[3].
$badservicestate = "SERVICE_NAME" . $servicename;
if($servicestate == $badservicestate) {
goto start;
}
//Service status (e.g. Running, Stopped Disabled) is returned as $servicestate.
return $servicestate;
}
The most straightforward solution, since you are calling an external process, and you actually need its output in your script, is to rewrite exec in terms of proc_open and non-blocking I/O:
function exec_timeout($cmd, $timeout, &$output = '') {
$fdSpec = [
0 => ['file', '/dev/null', 'r'], //nothing to send to child process
1 => ['pipe', 'w'], //child process's stdout
2 => ['file', '/dev/null', 'a'], //don't care about child process stderr
];
$pipes = [];
$proc = proc_open($cmd, $fdSpec, $pipes);
stream_set_blocking($pipes[1], false);
$stop = time() + $timeout;
while(1) {
$in = [$pipes[1]];
$out = [];
$err = [];
stream_select($in, $out, $err, min(1, $stop - time()));
if($in) {
while(!feof($in[0])) {
$output .= stream_get_contents($in[0]);
break;
}
if(feof($in[0])) {
break;
}
} else if($stop <= time()) {
break;
}
}
fclose($pipes[1]); //close process's stdout, since we're done with it
$status = proc_get_status($proc);
if($status['running']) {
proc_terminate($proc); //terminate, since close will block until the process exits itself
return -1;
} else {
proc_close($proc);
return $status['exitcode'];
}
}
$returnValue = exec_timeout('YOUR COMMAND HERE', $timeout, $output);
This code:
uses proc_open to open a child process. We only specify the pipe for the child's stdout, since we have nothing to send to it and don't care about its stderr output. If you do, you'll have to adjust the code accordingly.
Loops on stream_select(), which will block for a period of up to the $timeout set ($stop - time()).
If there is input, it appends the contents of the input buffer to $output. This won't block, because we have stream_set_blocking($pipes[1], false) on the pipe.
When we have read the entire file, or we have exceeded our timeout, stop.
Cleanup by closing the process we have opened.
Output is stored in the pass-by-reference string $output. The process's exit code is returned, or -1 in the case of a timeout.
