xml_parse "No memory" error PHP

I've got a strange bug when using xml_parse. My script returns a "No memory" error from the xml_parse function on the last line of the XML file. This only happens when the file is bigger than 10 MB; smaller files are fine. But I have 3 GB available for the PHP script and 32 GB of total memory.
This script used to work on another server (with 2 GB for PHP and 16 GB total), even with bigger files. But that was FreeBSD; now it is CentOS 6.4.
Has anybody run into the same situation?

There is a limit hardcoded in libxml, which the LIBXML_PARSEHUGE constant can lift.
Check http://php.net/manual/en/libxml.constants.php for details.
But you don't need to downgrade libxml. Just change the way you call xml_parse.
For example, with a file which exceeds 10 MB, this approach doesn't work:
$fileContent = file_get_contents("/tmp/myFile.xml");
if (!xml_parse($this->xmlParser, $fileContent, true))
{
    $xmlErreurString = xml_error_string(xml_get_error_code($this->xmlParser));
}
But if you read the file in 5 MB chunks, it's OK:
$xmlParser = xml_parser_create();
$fp = fopen("/tmp/myFile.xml", "r");
while ($fileContent = fread($fp, 1024*1024*5))
{
    if (!xml_parse($xmlParser, $fileContent, feof($fp)))
    {
        $xmlErreurString = xml_error_string(xml_get_error_code($xmlParser));
    }
}
fclose($fp);
xml_parser_free($xmlParser);

The problem was solved by downgrading libxml. Because of our framework (Symfony 1.4) we have to use PHP 5.2.17, and libxml was at the latest version. After the downgrade everything is OK.

There is a better answer to this, apparently, as outlined here: XML_PARSE_HUGE on function simplexml_load_string()
You need to pass the LIBXML_PARSEHUGE constant to bypass the restriction:
$xmlDoc->loadXML( $xml , LIBXML_PARSEHUGE );
Thanks to @Vaclav Kohout for this usage note.
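For context, a minimal end-to-end sketch of that call with DOMDocument (the file path is hypothetical, and this still reads the whole document into a string, so memory_limit must allow it; the flag only lifts libxml's own restriction):
// Minimal sketch, assuming a hypothetical /tmp/huge.xml
libxml_use_internal_errors(true);
$xmlDoc = new DOMDocument();
if (!$xmlDoc->loadXML(file_get_contents('/tmp/huge.xml'), LIBXML_PARSEHUGE)) {
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), PHP_EOL;
    }
    libxml_clear_errors();
}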

Related

file_put_contents truncates content to max int on 32-bit php

I have Nextcloud running on my Raspberry Pi 4, which uses a 32-bit architecture.
When trying to upload a file larger than 2147483647 bytes, the file is uploaded completely and is accessible through SSH. However, when I try to access it in any way through the web client, it fails. The error seen in the web client's logging is the following:
file_put_contents(): content truncated from 4118394086 to 2147483647 bytes at /var/www/html/nextcloud/lib/private/Files/Storage/Local.php#556
When I try to access the file this error message is logged:
Sabre\DAV\Exception\RequestedRangeNotSatisfiable: The start offset (0) exceeded the size of the entity (-176573210)
The file in question here is a .mp4 file; however, I have been able to replicate the issue with other file types.
I have read that the 2 GB upload limit for 32-bit architectures has been fixed, but I don't know why it still fails in my case.
Problem
Well, you can't get around this by tweaking any config, since it's a hard limit set by PHP (PHP_INT_MAX on a 32-bit architecture is 2 GB, i.e. 2^31 - 1).
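A quick illustration of that ceiling, using the byte count from the log above (a sketch, not Nextcloud code):
// On a 32-bit PHP build the signed integer ceiling is 2^31 - 1.
var_dump(PHP_INT_MAX);              // int(2147483647) on 32-bit
var_dump(4118394086 > PHP_INT_MAX); // true on 32-bit, so the write gets truncated there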
There is hope
You can patch manually or, even better, override the responsible Nextcloud code:
Patch manually (since you are not using Composer, this is what you probably want to do):
// this one is pretty memory expensive, but works with both a resource and a string
// Test: 4 GB file, 2 GB chunks (at 32 bits)
// 12 GB memory usage! - hell no
public function file_put_contents($path, $data) {
    $bytesWritten = 0;
    foreach (explode(PHP_EOL, chunk_split($data, PHP_INT_MAX, PHP_EOL)) as $chunk) {
        $bytesWritten += file_put_contents($this->getSourcePath($path), $chunk, FILE_APPEND | LOCK_EX);
    }
    return $bytesWritten;
}
or
// better use this in case $data is a resource - I don't know, you have to test it!
// Test: 4 GB file, 1 MB chunks
// 2 MB memory usage - much better :)
public function file_put_contents($path, $data) {
    $bytesWritten = 0;
    while ($chunk = fread($data, 2 ** 20)) {
        $bytesWritten += file_put_contents($this->getSourcePath($path), $chunk, FILE_APPEND | LOCK_EX);
    }
    return $bytesWritten;
}
In case you want to override (Composer):
class PatchedLocal extends \OC\Files\Storage\Local {
    public function file_put_contents($path, $data) {
        // same as above ...
    }
}
To force the autoloader to use your PatchedLocal, you want to use Composer's PSR-4 implementation, configured via composer.json, as mentioned.

xml_parse huge file PHP

I have an issue with the PHP function xml_parse: it does not work with huge files. I have an XML file 10 MB in size.
The problem is that I use the old XML-RPC library from Zend, and it configures other things as well (element handlers and case folding).
$parser_resource = xml_parser_create('utf-8');
xml_parser_set_option($parser_resource, XML_OPTION_CASE_FOLDING, true);
xml_set_element_handler($parser_resource, 'XML_RPC_se', 'XML_RPC_ee');
xml_set_character_data_handler($parser_resource, 'XML_RPC_cd');
if (!xml_parse($parser_resource, $data, 1)) {
    // ends here with 10MB file
}
Elsewhere I simply use simplexml_load_file with the LIBXML_PARSEHUGE option, but in this case I don't know what I can do.
The best solution would be if xml_parse had some parameter for huge files too.
Thank you for your advice.
Error is:
XML error: No memory at line ...
The chunk of the file that you pass to xml_parse in a single call may be too large.
If you use fread, for example:
while ($data = fread($fp, 1024*1024)) {...}
use a smaller length (in my case it had to be smaller than 10 MB), e.g. 1 MB, and put the xml_parse call inside the while loop.
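Putting that together with the parser setup from the question, a minimal sketch might look like this (the file path is hypothetical):
// Same parser setup as in the question, but fed in 1 MB chunks instead of
// one big string, so libxml never sees an oversized buffer.
$parser_resource = xml_parser_create('utf-8');
xml_parser_set_option($parser_resource, XML_OPTION_CASE_FOLDING, true);
xml_set_element_handler($parser_resource, 'XML_RPC_se', 'XML_RPC_ee');
xml_set_character_data_handler($parser_resource, 'XML_RPC_cd');
$fp = fopen('/tmp/request.xml', 'r'); // hypothetical path
while ($data = fread($fp, 1024 * 1024)) {
    if (!xml_parse($parser_resource, $data, feof($fp))) {
        echo xml_error_string(xml_get_error_code($parser_resource)),
            ' at line ', xml_get_current_line_number($parser_resource), PHP_EOL;
        break;
    }
}
fclose($fp);
xml_parser_free($parser_resource);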

Validating a large XML file ~400MB in PHP

I have a large XML file (around 400 MB) that I need to ensure is well-formed before I start processing it.
The first thing I tried was something similar to the code below, which is great, as I can find out whether the XML is not well formed and which parts of it are 'bad':
$doc = simplexml_load_string($xmlstr);
if (!$doc) {
    $errors = libxml_get_errors();
    foreach ($errors as $error) {
        echo display_xml_error($error);
    }
    libxml_clear_errors();
}
Also tried...
$doc->load( $tempFileName, LIBXML_DTDLOAD|LIBXML_DTDVALID )
I tested this with a file of about 60 MB, but anything a lot larger (~400 MB) causes something new to me, the "OOM killer", to kick in and terminate the script after what always seems like 30 seconds.
I thought I might need to increase the memory available to the script, so I figured out the peak usage when processing 60 MB, adjusted it accordingly for the larger file, and also turned the script time limit off just in case that was the cause.
set_time_limit(0);
ini_set('memory_limit', '512M');
Unfortunately this didn't work, as the OOM killer appears to be a Linux thing that kicks in if memory load (is that even the right term?) is consistently high.
It would be great if I could load the XML in chunks somehow, as I imagine this would reduce the memory load so that the OOM killer doesn't stick its fat nose in and kill my process.
Does anyone have any experience validating a large XML file and capturing errors about where it's badly formed? A lot of posts I've read point to SAX and XMLReader, which might solve my problem.
UPDATE
So @chiborg pretty much solved this issue for me... the only downside to this method is that I don't get to see all of the errors in the file, just the first one that failed, which I guess makes sense, as I think it can't parse past the first point that fails.
When using simplexml, it was able to capture most of the issues in the file and show them to me at the end, which was nice.
Since the SimpleXML and DOM APIs will always load the document into memory, using a streaming parser like SAX or XMLReader is the better approach.
Adapting the code from the example page, it could look like this:
$xml_parser = xml_parser_create();
if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}
while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        $errors[] = array(
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)
        );
    }
}
fclose($fp);
xml_parser_free($xml_parser);
For big files, XMLReader is the perfect class to use.
But if you like the simplexml syntax: https://github.com/dkrnl/SimpleXMLReader/blob/master/library/SimpleXMLReader.php
Usage example: http://github.com/dkrnl/SimpleXMLReader/blob/master/examples/example1.php
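For the well-formedness check alone, a minimal XMLReader sketch might look like this (the file path is hypothetical):
// Stream the document with XMLReader so only a small window is kept in memory,
// and collect libxml errors instead of letting them surface as warnings.
libxml_use_internal_errors(true);
$reader = new XMLReader();
$reader->open('/tmp/big.xml'); // hypothetical path
while ($reader->read()) {
    // walking every node is enough to check well-formedness
}
$reader->close();
foreach (libxml_get_errors() as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}
libxml_clear_errors();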

Issue determining the size of a currently downloading file?

I have an interesting problem. I need to make a progress bar for a file that is being downloaded asynchronously by a PHP script. I thought the best way to do it is: before the download starts, the script creates a txt file which includes the file name and the original file size.
Now we have an AJAX function which calls a PHP script that is intended to check the local file size. I have two main problems:
files are bigger than 2 GB, so the filesize() function is out of business;
I tried to find a different way to determine the local file size, like this:
function getSize($filename) {
    $a = fopen($filename, 'r');
    fseek($a, 0, SEEK_END);
    $filesize = ftell($a);
    fclose($a);
    return $filesize;
}
Unfortunately the second way gives me tons of errors; I assume I cannot open a file which is currently downloading.
Is there any way I can check the size of a file which is currently downloading, when the file size will be bigger than 2 GB?
Any help is greatly appreciated.
I found the solution by using the exec() function:
exec("ls -s -k /path/to/your/file/".$file_name,$out);
Just change your OS and PHP to 64-bit, and you can still use filesize().
From filesize() manual:
Return Values
Returns the size of the file in bytes, or FALSE (and generates an error of level E_WARNING) in case of an error.
Note: Because PHP's integer type is signed and many platforms use 32bit integers, some filesystem functions may return unexpected results for files which are larger than 2GB.

Is PHP's file_get_contents() enough for downloading movies?

Is file_get_contents() enough for downloading remote movie files located on a server?
I just think that perhaps storing large movie files in a string is harmful, according to the PHP docs.
Or do I need to use cURL? I don't know cURL.
UPDATE: these are big movie files, around 200 MB each.
file_get_contents() is a problem because it's going to load the entire file into memory in one go. If you have enough memory to support the operation (taking into account that if this is a web server, you may have multiple hits that generate this behavior simultaneously, each needing that much memory), then file_get_contents() should be fine. However, it's not the right way to do it - you should use a library specifically intended for this sort of operation. As mentioned by others, cURL will do the trick, or wget. You might also have good luck using fopen('http://someurl', 'r'), reading blocks from the file, and then dumping them straight to a local file that's been opened for write privileges.
As @mopoke suggested, it could depend on the size of the file. For a small movie it may suffice. In general, though, I think cURL would be a better fit. You have much more flexibility with it than with file_get_contents().
For the best performance you may find it makes sense to just use a standard Unix utility like wget. You should be able to call it with system("wget ...") or exec().
http://www.php.net/manual/en/function.system.php
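A minimal sketch of that approach (source URL and target path are placeholders):
// Shell out to wget with escaped arguments and check its exit code.
$url = 'http://somewhere/test.avi'; // placeholder source
$dst = '/tmp/test.avi';             // placeholder target
exec('wget -O ' . escapeshellarg($dst) . ' ' . escapeshellarg($url), $output, $exitCode);
if ($exitCode !== 0) {
    echo "wget failed with exit code $exitCode\n";
}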
You can read a few bytes at a time using fread():
$src="http://somewhere/test.avi";
$dst="test.avi";
$f = fopen($src, 'rb');
$o = fopen($dst, 'wb');
while (!feof($f)) {
if (fwrite($o, fread($f, 2048)) === FALSE) {
return 1;
}
}
fclose($f);
fclose($o);
