Echoing large string in PHP results in no output at all - php

I am helping to build a Joomla site (using Joomla 1.5.26). One of the pages is really big. As a result, PHP just stops without any error, and everything printed so far is discarded: there is no output at all. We have display_errors set to TRUE and error_reporting set to E_ALL.
I found the exact line where PHP breaks. It's in libraries/joomla/application/component/view.php:196
function display($tpl = null)
{
    $result = $this->loadTemplate($tpl);
    if (JError::isError($result)) {
        return $result;
    }
    echo $result;
}
Some information:
Replacing echo $result; with echo strlen($result); works. The length of the string is 257759.
echo substr($result, 0, 103396); is printing partial content.
echo substr($result, 0, 103397); results in no output at all.
echo substr($result, 0, 103396) . "A"; results in no output at all, so splitting the string into chunks is not a solution.
I have checked server performance during the execution of the script. CPU usage is 100%, but there's plenty of memory left. The PHP memory limit is 1024M. output_buffering is 4096, but I tried setting it to an unreasonably high number and it dies at the exact same position. The server runs Apache 2.2.14-5ubuntu8.10 and PHP 5.3.2-1ubuntu4.18. PHP runs as a fast_cgi module.
I have never experienced anything like this, and a Google search turns up nothing either. Have any of you seen something like this and know the solution?
Thanks for reading!

Maybe try exploding the string and looping through each line.
You could also try this, found on php.net - echo:
<?php
function echobig($string, $bufferSize = 8192)
{
    // You might want to check first that $bufferSize is a positive integer.
    for ($chars = strlen($string) - 1, $start = 0; $start <= $chars; $start += $bufferSize) {
        echo substr($string, $start, $bufferSize);
    }
}
?>
Basically, it seems echo can't handle such large data in one call. Breaking it up somehow should get you where you need to go.

What about trying print_r rather than echo?
function display($tpl = null)
{
    $result = $this->loadTemplate($tpl);
    if (JError::isError($result)) {
        return $result;
    }
    print_r($result);
}

I have tested this on the CLI and it works fine with PHP 5.4.11 and 5.3.15:
$str = '';
for ($i = 0; $i < 257759; $i++) {
    $str .= 'a';
}
echo $str;
It seems a reasonable assumption that PHP itself works fine, but that the output buffer is too large for Apache/fast_cgi. I would investigate the Apache config further. Do you have any special Apache settings?
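If the hand-off from PHP to Apache is the suspect, one experiment worth trying (a sketch, not a confirmed fix for this fast_cgi setup) is to echo the string in chunks and flush between them, so the SAPI receives many small writes instead of one ~250 KB write:

```php
<?php
// Sketch: emit a large string in chunks, flushing after each one so the
// SAPI pushes data toward the client incrementally. If output_buffering
// is enabled, you would call ob_flush() before flush() as well.
function echoFlushed($string, $chunkSize = 8192)
{
    foreach (str_split($string, $chunkSize) as $chunk) {
        echo $chunk;
        flush(); // hand this chunk to the web server before the next one
    }
}
```

If the script survives with this in place of the plain echo, that points at the SAPI/server buffer rather than PHP itself.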

Maybe it's the output buffering? Try something like this:
php_flag output_buffering On
Or try turning on gzip compression in Joomla!
Or use nginx as a reverse proxy or standalone server :^)

It seems I solved the problem myself. The cause was somewhat unexpected: faulty HTML formatting. We use a template for the order page, and inside it there is a loop which shows all ordered products. With a few products everything worked great, but when I tried the same with 40 products, the page broke.
However, I still don't understand why the server response would be empty with status code 200.
Thanks for answers, everybody!

Related

Validating 1 million+ values with PHP and a client side programming language on an online form

I've been trying to validate over 1 million randomly generated values (strings) with PHP and a client side programming language on an online form, but there are a few challenges I'm facing:
PHP
Link to the (editable) PHP code: https://3v4l.org/AtTkO
The PHP code:
<?php
function generateRandomString($length = 10) {
    $characters = '0123456789abcdefghijklmnopqrstuvwxyz-_.';
    $charactersLength = strlen($characters);
    $randomString = '';
    for ($i = 0; $i < $length; $i++) {
        $randomString .= $characters[rand(0, $charactersLength - 1)];
    }
    return $randomString;
}

$unique = array();
for ($i = 0; $i < 9000000; $i++) {
    $u = $i + 1;
    $random = generateRandomString(5);
    if (!in_array($random, $unique)) {
        echo $u . ".m" . $random . "#[server]\n";
        $unique[] = $random;
        gc_collect_cycles();
    } else {
        echo "duplicate detected";
        $i--;
    }
}
echo memory_get_peak_usage();
What should happen:
New 5 character value gets randomly generated
Value gets checked if it already exists in the array
Value gets added to array
All randomly generated values are exported to a .txt file to be used for validating. (Not in the script yet)
What actually happens:
I hit either a memory usage limit or a server timeout for the execution time.
What I've tried
I've tried using sleep(3) during the for loop.
Setting Memory limit to -1 and timeout to 0. The unlimited memory doesn't make a difference and is too dangerous in a working environment.
Using gc_collect_cycles() during the for loop
Using echo memory_get_peak_usage(); -> I don't really understand how I could use this for debugging.
What I need help with:
Memory management in PHP
Having pauses in the script that will reset the PHP execution timer
Client Side Programming language
This is where I have absolutely no clue which way I should go or which programming language I should use for this.
What I want to achieve
Load a webpage that has a form
Load the .txt with all randomly generated strings
fill in the form with the first string
submit the form:
If positive response from form > save string in special .txt file or array, go to the next value
If negative response from form > delete string from file, go to the next value | or just go to the next value
All values with a positive response are filtered out and easily accessible at the end.
I don't know which programming language I should use for this function. I've been thinking about Javascript and Python but I'm not sure how I could combine that with PHP. A nudge in the right direction would be appreciated.
I might be completely wrong for trying to achieve this with PHP, if so, please let me know what would be the better and easier option.
Thanks!
Interesting question. First of all, whenever you think about a solution like this, one of the first things you need to consider is: can it be async? If the answer is yes, your implementation will likely be simple; otherwise, you will likely have to pay huge server costs or serve random cached results.
NB: remove gc_collect_cycles(). It does the opposite of what you want, and you hardly ever need to call it manually.
That being said, the approach I would recommend in your case is as follows:
Use a websocket which will be opened only once in the client browser, and then forward results in realtime from the server to the browser. Of course, this code itself can run completely client-side via JavaScript, so if it's not just a PoC, you can convert the PHP code to JavaScript.
Change your code to yield items or forward results via the websocket once a generated code has been confirmed as unique.
However, if you're really doing only what the PHP code says, you can do all of it in JavaScript and save your server resources. See this answer for example code to replace your generateRandomString function.
Assuming you have the ability to edit php.ini:
For the memory limit, increase the memory_limit directive as described in the PHP manual.
And for the execution timeout, add:
set_time_limit(0);
at the top of the PHP file.
Have you tried using sets? https://www.php.net/manual/en/class.ds-set.php
Sets are very efficient whenever you want to ensure a value isn't present twice.
Checking the presence of a value in a set is way, way faster than looping across all entries of an array.
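If installing the Ds extension isn't an option, plain PHP gives you the same O(1) membership test by using array keys as a set: isset() on a key is a hash lookup, while in_array() scans the whole array on every call. A sketch (reusing the generator from the question, with a small target so it finishes quickly):

```php
<?php
function generateRandomString($length = 10) {
    $characters = '0123456789abcdefghijklmnopqrstuvwxyz-_.';
    $max = strlen($characters) - 1;
    $randomString = '';
    for ($i = 0; $i < $length; $i++) {
        $randomString .= $characters[rand(0, $max)];
    }
    return $randomString;
}

// Array keys as a set: isset() is a constant-time hash lookup,
// so the loop does not slow down as the collection grows.
$unique = array();
while (count($unique) < 1000) {
    $candidate = generateRandomString(5);
    if (!isset($unique[$candidate])) {
        $unique[$candidate] = true;
    }
}
echo count($unique), "\n"; // 1000 distinct strings
```

The values themselves are then available via array_keys($unique), e.g. for writing out to the .txt file.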
I'm not an expert with PHP, but it would look something like this in Ruby:
require 'set'

CHARS = '0123456789abcdefghijklmnopqrstuvwxyz-_.'.split('')
unique = Set.new

def generateRandomString(l = 10)
  Array.new(l) { CHARS.sample }.join
end

while unique.length < 1_000_000
  random_string = generateRandomString
  if !unique.include?(random_string)
    unique.add(random_string)
  end
end
hope it helps

Running out of memory always on the same line

First of all, I am not looking for an answer saying "Check your PHP memory limit" or "You need to add more memory" or that kind of thing... I am on a dedicated machine with 8 GB of RAM; 512 MB of it is the memory limit. I always get an out-of-memory error on one single line:
To clarify: This part of the code belongs to Joomla! CMS.
function get($id, $group, $checkTime)
{
    $data = false;
    $path = $this->_getFilePath($id, $group);
    $this->_setExpire($id, $group);
    if (file_exists($path)) {
        $data = file_get_contents($path);
        if ($data) {
            // Remove the initial die() statement
            $data = preg_replace('/^.*\n/', '', $data); // Out of memory here
        }
    }
    return $data;
}
This is part of Joomla's caching... This function reads the cache file, removes the first line (which blocks direct access to the files), and returns the rest of the data.
As you can see the line uses a preg_replace to remove the first line in the cache file which is always :
<?php die("Access Denied"); ?>
My question is: this seems like a simple operation to me (removing the first line from the file content), so could it consume a lot of memory if the initial $data is huge? If so, what's the best way to work around the issue? I don't mind having the cache files without that die() line; I can take care of security and block direct access to the cache files myself.
Am I wrong?
UPDATE
As suggested by the posts below, regex seems to create more problems than it solves. I've tried:
echo memory_get_usage() . "\n";
after the regex then tried the same statement using substr(). The difference is very slight in memory usage. Almost nothing.
Thanks for your contributions; I am still trying to find out why this happens.
Use substr to avoid the memory-hungry preg_replace(), like this:
$data = substr($data, strpos($data, '?>') + 3);
As general advice, don't use regular expressions if you can do the same task with other string/array functions; regular expression functions are slower and consume more memory than the core string/array functions.
This is explicitly warned in PHP docs too, see some examples:
http://www.php.net/manual/en/function.preg-match.php#refsect1-function.preg-match-notes
http://www.php.net/manual/en/function.preg-split.php#refsect1-function.preg-split-notes
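For this particular case, the difference can be measured directly. A rough measurement sketch (synthetic data; absolute numbers vary by PHP version, which may be why the update above saw almost no difference):

```php
<?php
// Build a fake cache payload: the die() guard line plus ~5 MB of content.
$data = "<?php die(\"Access Denied\"); ?>\n" . str_repeat('x', 5000000);

$before = memory_get_usage();
$viaSubstr = substr($data, strpos($data, "\n") + 1);
$afterSubstr = memory_get_usage();
$viaPreg = preg_replace('/^.*\n/', '', $data);
$afterPreg = memory_get_usage();

var_dump($viaSubstr === $viaPreg); // both strip exactly the first line
printf("substr grew memory by %d bytes, preg_replace by %d bytes\n",
       $afterSubstr - $before, $afterPreg - $afterSubstr);
```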
Don't use a string function to replace something in a huge string. You can cycle through the lines of a file and just break after you have found what you're looking for.
check the PHP docs here:
http://php.net/manual/en/function.fgets.php
basically what #cbuckley just said :p
If you just want to remove the first line of a file and return the rest, you should make use of file():
$lines = file($path);
array_shift($lines);
$data = implode('', $lines);
(file() keeps the newline at the end of each line, so imploding with an empty string preserves the original line endings.)
Instead of using file_get_contents(), which reads the entire file at once (which can be too big to run regexps on), you should use fopen() in combination with fgets() (http://php.net/fgets). That function reads the file line by line.
You can then choose to run a regexp on a specific line, or, in your case, just skip the first line entirely.
So instead of:
$data = file_get_contents($path);
if ($data) {
    // Remove the initial die() statement
    $data = preg_replace('/^.*\n/', '', $data); // Out of memory here
}
try this:
$data = '';
$fileHandler = fopen($path, 'r');
$lineNumber = 0;
while (($line = fgets($fileHandler)) !== false) {
    if ($lineNumber++ != 0) { // Skip the initial die() statement
        $data .= $line; // or echo $line directly so $data doesn't take up too much memory as well
    }
}
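If the cleaned-up contents are only ever echoed (as in the display() path of the first question), a variation on this idea avoids building $data entirely. This is a sketch under that assumption, not Joomla's actual code:

```php
<?php
// Sketch: discard the first line of a file and stream the remainder
// straight to output, never holding the whole file in a PHP string.
function emitCacheBody($path)
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        return false;
    }
    fgets($fh);     // consume and throw away the die() guard line
    fpassthru($fh); // stream the rest to the output buffer
    fclose($fh);
    return true;
}
```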
I suggest you use this in the file including the cached files:
define('INCLUDESALLOW', 1);
and in the file that will be included:
if( !defined('INCLUDESALLOW') ) die("Access Denied");
Then just use include instead of file_get_contents. Note that this would execute PHP code in the included file, though; I'm not 100% sure that is what you need.
There are times when you will use more memory than the 8 MB PHP has allotted by default. If you're unable to use less memory by making your code more efficient, you might have to increase the available memory. This can be done in two ways.
The limit can be set to a global default in php.ini:
memory_limit = 32M
Or you can override it in your script like this:
<?php
ini_set('memory_limit', '64M');
...
For more on the PHP memory limit, see this SO question or the ini.memory-limit documentation.

Am I using preg_replace correctly (PHP)?

I think I have this right but I would like someone to verify that.
function storageAmmount()
{
    system('stat /web/'.$user."/ | grep Size: > /web/".$user."/log/storage");
    $storage = fopen('/web/'.$user.'/log/storage');
    $storage = str_replace('Size:', " ", $storage);
    $storage = preg_replace('Blocks:(((A-Z), a-z), 1-9)', '', $storage);
}
This is the line in the text file:
Size: 4096 Blocks: 8 IO Block: 4096 directory
I am trying to get just the numeric value that follows "Size: "; the word Size: and everything else is useless to me.
I am mainly looking at the preg_replace. Is it just me, or is regex a tad bit confusing? Any thoughts? Thanks for any help in advance.
Cheers!,
Phill
Ok,
Here is what the function looks like now:
function storageAmmount()
{
$storage = filesize('/web/'.$user.'/');
$final = $storage/(1024*1024*1024);
return $final;
}
Where would I put the number_format()? I am not really sure whether it should go in the calculation or in the return statement. I have tried it in both, and all it returns is "0.00".
V1:
function storageAmmount()
{
    $storage = filesize('/web/'.$user.'/');
    $final = number_format($storage/(1024*1024*1024), 2);
    return $final;
}
or V2:
function storageAmmount()
{
    $storage = filesize('/web/'.$user.'/');
    $final = $storage/(1024*1024*1024);
    return number_format($final, 2);
}
Neither works; they both return "0.00". Any thoughts?
Looks like you are trying to get the size of the file in bytes. If so, why not just use PHP's filesize() function, which takes the file name as its argument and returns the size of the file in bytes?
function storageAmmount() {
    $storage = filesize('/web/'.$user);
}
No, you're not using preg_replace properly.
There's a lot of misunderstanding in your code; to correct it would mean I'd have to explain the basics of how Regex works. I really recommend reading a few primers on the subject. There's a good one here: http://www.regular-expressions.info/
In fact, what you're trying to do here with the str_replace and the preg_replace together would be better achieved using a single preg_match.
Something like this would do the trick:
preg_match('/Size: (\d+)/', $storage, $matches);
$storage = $matches[1];
(Note that preg_match returns the match count; the captured groups come back through the third argument.)
The (\d+) picks up any number of digits and puts them into an element in the $matches array. Putting Size: in front of that forces it to only recognise the digits that are immediately after Size: in your input string.
If your input string is consistently formatted in the way you described, you could also do it without using any preg_ functions at all; just explode() on a space character and pick up the second element. No regex required.
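For the sample line shown above, the explode() version might look like this (assuming single-space separation, as in the example):

```php
<?php
$line = 'Size: 4096 Blocks: 8 IO Block: 4096 directory';
$parts = explode(' ', $line);
$size = $parts[1]; // the field right after "Size:"
echo $size, "\n";  // prints 4096
```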
The best usage of regex here is:
// preg_match solution
$storage = 'Size: 4096 Blocks: 8 IO Block: 4096 directory';
if (preg_match('/Size: (?P<size>\d+)/', $storage, $matches)) {
    return $matches['size'];
}
But if you are doing it locally, you can use the PHP function stat():
// php stat solution
$f = escapeshellcmd($user);
if ($stat = @stat('/web/'.$f.'/log/storage')) {
    return $stat['size'];
}
Bearing in mind the fact that you're already using string manipulation (don't quite get why - a single regular expression could handle it all), I don't know why you don't proceed down this path.
For example using explode:
function storageAmount($user) {
    system(escapeshellcmd('stat /web/'.$user.'/ | grep Size: > /web/'.$user.'/log/storage'));
    $storageChunks = explode(' ', file_get_contents('/web/'.$user.'/log/storage'));
    return $storageChunks[1];
}
Incidentally:
The $user variable doesn't exist within the scope of your function - you either need to pass it in as an argument as I've done, or make it a global.
You should really use escapeshellcmd on all commands passed to system/exec, etc.
You're using fopen incorrectly. fopen returns a resource which you then need to read from via fread, etc. prior to using fclose. That said, I've replaced this with file_get_contents which does the open/read/close for you.
I'm not really sure what you're attempting to do with the system command, but I've left it as-is. However, you could just get the result of the grep back directly as the last string output rather than having to output it to a file. (This is what the system command returns.) You're also mixing ' and " as string delimiters - this won't work. (Just use one consistently.)
I suspect you actually want the final line of the df --si /web/$user/ command, as otherwise you'll always be returning the value 4096.
Have you tried
$storage = preg_replace('/Blocks:.*/', '', $storage);
? (preg_replace patterns need delimiters, such as the slashes here.)
Even better would be to use:
function storageAmmount()
{
    exec('stat /web/'.$user.'/ | grep Size:', $output);
    preg_match("/Size: ([0-9]*).*/", $output[0], $matches);
    return $matches[1];
}
(tested on my machine)

PHP code scraping a URL suddenly stopped working

$url = 'the web address I want to get the first and second numbers close to $';
$str = file_get_contents($url);
preg_match_all('/ ([$]) *(\d+(:?.\d+)?)/', $str, $matches, PREG_SET_ORDER);
$i = 0;
foreach ($matches as $val) {
    if ($i == 0) $first = $val[2];
    if ($i == 3) $second = $val[2];
    $i++;
}
$bad_symbols = array(",", ".");
$first = str_replace($bad_symbols, "", $first);
$second = str_replace($bad_symbols, "", $second);
echo $first . "</br>";
echo $second;
It worked fine until yesterday. What could be the problem?
I see at least two possible explanations :
The HTML of the site has changed ; maybe only a little bit -- but enough to get you in trouble.
You could test for the return value of preg_match_all
if it's false, it means your regex didn't match -- which may indicate the content of the HTML page is not the same...
Then, you might have to modify your regex
The admin of the server (or it can be done in the code generating the page) has banned you
Maybe the website has detected it was scraped by you (either because you were going too hard on their server, or they saw their content on your site)
And they banned your IP (for instance)
To detect that, try to get the return value of file_get_contents ; if it's false, it might be the cause of the problem
Can you try getting that HTML page from your server, using wget in command-line ?
A third one, as suggested by others: maybe the configuration of your server has changed, and you can't use file_get_contents over HTTP anymore...
A solution would be to use curl, for instance
Check in your configuration the allow_url_fopen directive
If you activate error_reporting, you might also get some information that could prove useful...
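If allow_url_fopen turns out to be disabled, a curl-based stand-in for file_get_contents() might look like this (a sketch; the function name and the particular options are just illustrative):

```php
<?php
// Sketch: fetch a URL with curl instead of file_get_contents(), which
// keeps working even when allow_url_fopen is turned off.
function fetchUrl($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $body = curl_exec($ch);
    if ($body === false) {
        // Unlike a silent file_get_contents() failure, curl_error()
        // tells you exactly why the request failed.
        error_log('curl failed: ' . curl_error($ch));
    }
    curl_close($ch);
    return $body;
}
```

curl_exec() returning false also makes the "have I been banned?" check from the first suggestion easy to automate.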
Maybe the system administrator has changed the allow_url_fopen directive, which means you can't access files that are not on your server. Check what file_get_contents() returns, because you gave us very little information about the error.
Another problem, as mentioned above, could be that the remote site has changed :)

PHP echo performance

I always use an output variable in PHP where I gather all the content before I echo it. Then I read somewhere (I don't remember where, though) that you get better performance if you split the output variable into packets and echo each packet instead of the whole variable.
How is it really?
If you are outputting really big strings with echo, it is better to use multiple echo statements.
This is because of the way Nagle's algorithm causes data to be buffered over TCP/IP.
Found a note about it on the PHP bug tracker:
http://bugs.php.net/bug.php?id=18029
This will automatically break up big strings into smaller chunks and echo them out:
function echobig($string, $bufferSize = 8192) {
    $splitString = str_split($string, $bufferSize);
    foreach ($splitString as $chunk) {
        echo $chunk;
    }
}
Source: http://wonko.com/post/seeing_poor_performance_using_phps_echo_statement_heres_why
I think a better solution is presented in this comment on the same post:
http://wonko.com/post/seeing_poor_performance_using_phps_echo_statement_heres_why#comment-5606
Guys, I think I narrowed it down even further!
As previously said, PHP buffering will let PHP race to the end of your script, but after that it will still "hang" while trying to pass all that data to Apache.
Now I was able not only to measure this (see previous comment) but to actually eliminate the waiting period inside of PHP. I did that by increasing Apache's send buffer with the SendBufferSize directive.
This pushes the data out of PHP faster. I guess the next step would be to get it out of Apache faster, but I'm not sure if there is actually another configurable layer between Apache and the raw network bandwidth.
This is my version of the solution; it echoes only while the connection is not aborted. If the user disconnects, the function exits.
<?php
function _echo() {
    $args = func_get_args();
    foreach ($args as $arg) {
        $strs = str_split($arg, 8192);
        foreach ($strs as $str) {
            if (connection_aborted()) {
                break 2;
            }
            echo $str;
        }
    }
}

_echo('this', 'and that', 'and more', 'and even more');
_echo('or just a big long text here, make it as long as you want');
