PHP ZipArchive fails to extract CSV files properly

A real head scratcher, this one - any help would be gratefully received.
I have been using the ZipArchive class to extract CSV files from a zip.
Oddly, it will only extract 40 files properly. Files with index 40 or greater appear as empty files; files 0-39 extract perfectly.
This is the case regardless of the combination of files and their sizes. I have tried removing the 39th file and the 40th file from the zip and the problem just moves. No matter what combination of files I use, it extracts 40 files properly and then just dies.
Thanks to this forum, I have tried using shell_exec with exactly the same outcome.
I have also tried extracting the files one at a time, using a zip with only the CSV files and zips with multiple different file types. Always only 40 are extracted.
This is such a suspiciously round number that it must surely be a setting somewhere that I cannot find, or otherwise a bug.
For what it is worth, the unzipping code is below:
$zip = new ZipArchive;
if ($zip->open('Directory/zipname.zip') == TRUE) {
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $filename = $zip->getNameIndex($i);
        if (substr(strrchr($filename, '.'), 1, 3) == "csv") {
            $zip->extractTo('Directory/', $filename);
        }
    }
}
I have also tried the following, which uses a different method, with the same results :-(
$zip2 = new ZipArchive;
if ($zip2->open('Directory/zipname.zip') == TRUE) {
    for ($i = 0; $i < $zip2->numFiles; $i++) {
        $filename = $zip2->getNameIndex($i);
        if (substr(strrchr($filename, '.'), 1, 3) == "csv") {
            $content = $zip2->getFromIndex($i);
            $thefile = fopen("directory/filename", "w");
            fwrite($thefile, $content);
            fclose($thefile);
        }
    }
}

FINALLY found the answer. Thanks to all who tried to help.
For others suffering in the same way, the problem was solved by increasing the server disk allocation. I was on a rather old plan which had served well until the advent of a new national database increased my storage needs ten-fold.
A measly 100MB allowance meant that the server would only do so much before spitting the dummy.
Interestingly, a similar problem occurred with trying other file operations - it seemed to be limited to 40 file operations per script, regardless of the size of each file.
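For anyone landing here with the same symptoms, a guard like the sketch below (paths are just the placeholders from above) would have surfaced the problem much sooner: extractTo() returns FALSE when it cannot write a file, and disk_free_space() shows how close the quota is.
$zip = new ZipArchive;
if ($zip->open('Directory/zipname.zip') === TRUE) {
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $filename = $zip->getNameIndex($i);
        if (strtolower(pathinfo($filename, PATHINFO_EXTENSION)) === 'csv') {
            // extractTo() returns FALSE when the file cannot be written,
            // e.g. because the disk allowance is exhausted
            if (!$zip->extractTo('Directory/', $filename)) {
                echo 'Extraction failed for ' . $filename . ' (free space: '
                    . disk_free_space('Directory/') . ' bytes)' . PHP_EOL;
                break;
            }
        }
    }
    $zip->close();
}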

Related

PHP cannot see many files included in a zip file

I'm reading files that are located inside a .zip archive (in this case an XBRL taxonomy) using PHP 7 and its ZipArchive functions, but a whole lot of files that I am sure are inside are simply skipped, ignored as if they did not exist.
This is the result of running zipinfo www.eba.europa.eu.zip on the file
https://www.dropbox.com/s/336njdmfg8uaho8/output-zipinfo.txt?dl=0
This is the result of reading the content of the zip using the following code:
$zip = new \ZipArchive();
$zip->open("www.eba.europa.eu.zip");
for ($i = 0; $i < $zip->numFiles; $i++) {
    echo 'Filename: ' . $zip->getNameIndex($i) . PHP_EOL;
}
https://www.dropbox.com/s/7njkp5i92d68fxs/output-ziparchive.txt?dl=0
As you can see, all the files with finrep in their name are simply not present in the second test.
What could it be? Missing permissions on something? File size/number limit? Sorry for the Dropbox links, but the logs are both quite big considering the number of files.
Thanks in advance for the help!
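One way to narrow this down (purely a diagnostic sketch, reusing the archive name from above) is to check whether ZipArchive even sees the same number of entries that zipinfo reports, and what status it reports after opening:
$zip = new \ZipArchive();
$result = $zip->open("www.eba.europa.eu.zip");
if ($result !== TRUE) {
    // open() returns an error code instead of TRUE on failure
    die('Could not open archive, error code ' . $result);
}
echo 'Entries seen by ZipArchive: ' . $zip->numFiles . PHP_EOL;
echo 'Archive status: ' . $zip->getStatusString() . PHP_EOL;
$zip->close();
If the entry count differs from what zipinfo shows, the entries are being lost while the archive is parsed (for example by an older zip extension / libzip build) rather than by permissions, since permissions would not hide entries listed in the central directory.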

Apache/PHP using 100% CPU while trying to free cache space

I created a script for use with my website that is supposed to erase the oldest entry in cache when a new item needs to be cached. My website is very large with 500,000 photos on it and the cache space is set to 2 GB.
These functions are what cause the trouble:
function cache_tofile($fullf, $c)
{
    error_reporting(0);
    if(strpos($fullf, "/") === FALSE)
    {
        $fullf = "./".$fullf;
    }
    $lp = strrpos($fullf, "/");
    $fp = substr($fullf, $lp + 1);
    $dp = substr($fullf, 0, $lp);
    $sz = strlen($c);
    cache_space_make($sz);
    mkdir($dp, 0755, true);
    cache_space_make($sz);
    if(!file_exists($fullf))
    {
        $h = @fopen($fullf, "w");
        if(flock($h, LOCK_EX))
        {
            ftruncate($h, 0);
            rewind($h);
            $tmo = 1000;
            $cc = 1;
            $i = fputs($h, $c);
            while($i < strlen($c) || $tmo-- > 1)
            {
                $c = substr($c, $i);
                $i = fwrite($h, $c);
            }
            flock($h, LOCK_UN);
            fclose($h);
        }
    }
    error_reporting(7);
}
function cache_space_make($sz)
{
    $ct = 0;
    $cf = cachefolder();
    clearstatcache();
    $fi = shell_exec("df -i ".$cf." | tail -1 | awk -F\" \" '{print \$4}'");
    if($fi < 1)
    {
        return;
    }
    if(($old = disk_free_space($cf)) === false)
    {
        return;
    }
    while($old < $sz)
    {
        $ct++;
        if($ct > 10000)
        {
            error_log("Deleted over 10,000 files. Is disk screwed up?");
            break;
        }
        $fi = shell_exec("rm \$(find ".$cf."cache -type f -printf '%T+ %p\n' | sort | head -1 | awk -F\" \" '{print \$2}');");
        clearstatcache();
        $old = disk_free_space($cf);
    }
}
cachefolder() is a function that returns the correct folder name with a / appended to it.
When the functions are executed, the CPU usage for Apache is between 95% and 100%, and other services on the server are extremely slow to access during that time. I also noticed in WHM that cache disk usage is at 100% and refuses to drop until I clear the cache. I was expecting more like maybe 90-ish%.
What I am trying to do with the cache_tofile function is attempt to free disk space in order to create a folder then free disk space to make the cache file. The cache_space_make function takes one parameter representing the amount of disk space to free up.
In that function I use system calls to try to find the oldest file in the directory tree of the entire cache and I was unable to find native php functions to do so.
The cache file format is as follows:
/cacherootfolder/requestedurl
For example, if one requests http://www.example.com/abc/def then from both functions, the folder that is supposed to be created is abc and the file is then def so the entire file in the system will be:
/cacherootfolder/abc/def
If one requests http://www.example.com/111/222 then the folder 111 is created and the file 222 will be created
/cacherootfolder/111/222
Each file in both cases contains the same content as what the user requests based on the URL (for example, /cacherootfolder/111/222 contains the same content as what one would see when viewing source from http://www.example.com/111/222).
The intent of the caching system is to deliver all web pages at optimal speed.
My question then is: how do I prevent the system from locking up when the cache is full? Is there better code I can use than what I provided?
I would start by replacing the || in your code by &&, which was most likely the intention.
Currently, the loop will always run at least 1000 times - I very much hope the intention was to stop trying after 1000 times.
Also, drop the ftruncate and rewind.
From the PHP Manual on fopen (emphasis mine):
'w' Open for writing only; place the file pointer at the beginning of the file and truncate the
file to zero length. If the file does not exist, attempt to create it.
So your truncate is redundant, as is your rewind.
Next, review your shell_exec's.
The one outside the loop doesn't seem too much of a bottleneck to me, but the one inside the loop...
Let's say you have 1'000'000 files in that cache folder.
find will happily list all of them for you, no matter how long it takes.
Then you sort that list.
And then you flush 999'999 entries of that list down the toilet, and only keep the first one.
Then you do some stuff with awk that I don't really care about, and then you delete the file.
On the next iteration, you'll only have to go through 999'999 files, of which you discard only 999'998.
See where I'm going?
I consider calling shell scripts out of pure convenience bad practice anyway, but if you do it, do it as efficiently as possible, at least!
Do one shell_exec without head -1, store the resulting list in a variable, and iterate over it.
Although it might be better to abandon shell_exec altogether and instead program the corresponding routines in PHP (one could argue that find and rm are machine code, and therefore faster than code written in PHP to do the same task, but there sure is a lot of overhead for all that IO redirection).
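A minimal sketch of that pure-PHP route (the cache subfolder name is taken from the shell command above; treat this as an untested outline, not a drop-in replacement):
function cache_space_make($sz)
{
    $cf = cachefolder();
    // Walk the cache tree once and record each file's modification time
    $files = array();
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($cf."cache", FilesystemIterator::SKIP_DOTS)
    );
    foreach($it as $file)
    {
        if($file->isFile())
        {
            $files[$file->getPathname()] = $file->getMTime();
        }
    }
    asort($files); // oldest first
    // Delete the oldest files until enough space is free
    foreach(array_keys($files) as $path)
    {
        if(disk_free_space($cf) >= $sz)
        {
            break;
        }
        unlink($path);
    }
}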
Please do all that, and then see how bad it still performs.
If the results are still unacceptable, I suggest you put in some code to measure the time certain parts of those functions require (tip: microtime(true)) or use a profiler, like XDebug, to see where exactly most of your time is spent.
Also, why did you turn off error reporting for that block? Looks more than suspicious to me.
And as a little bonus, you can get rid of $cc since you're not using it anywhere.

PHP ZipArchive ExtractTo Hanging Issue

I am trying to unzip one file that has two CSV files in it. Every variation of my code has the same results. Using the code I have currently, it gets only one file partially out and then hangs. The unzipped file shows that it is 38,480 KB and gets stuck toward the end of row 214410. The true file in the archive is 38,487 KB and has a total of 214442 rows. Any idea what could be causing this to hang at the last minute? I am doing all my testing with XAMPP on localhost on a Windows 7 machine. This is in PHP, and it is the only code in the file. The required zip file is in the same folder with it.
<?php
ini_set('memory_limit', '-1');
$zip = new ZipArchive;
if ($zip->open('ItemExport.zip') === TRUE) {
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $filename = $zip->getNameIndex($i);
        $zip->extractTo('.', array($filename));
    }
    $zip->close();
} else {
    echo 'failed';
}
?>
Thanks in advance for any help!
EDIT**
The file shows up almost immediately in the correct directory, and a few seconds later it reaches a file size of 38,480 KB. At that point it doesn't do anything else. After waiting on it for more than long enough (5-10 minutes+), I opened the file. It is "locked for editing by 'another user'", as it is still being held open by the HTTP process. The writing to the CSV file just stops mid-word, mid-sentence, on column 9 of 11 in row 214,442.
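If extractTo() keeps stalling like this, one thing worth trying (a sketch only) is to stream each entry out in small chunks instead of letting extractTo() write the whole file in one go, which also rules out the default execution-time limit:
<?php
set_time_limit(0);                 // rule out max_execution_time as the cause
ini_set('memory_limit', '-1');
$zip = new ZipArchive;
if ($zip->open('ItemExport.zip') === TRUE) {
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        $in = $zip->getStream($name);          // read the entry as a stream
        if ($in === false) {
            echo 'Could not open stream for ' . $name . PHP_EOL;
            continue;
        }
        $out = fopen('./' . basename($name), 'wb');
        while (!feof($in)) {
            fwrite($out, fread($in, 8192));    // copy in 8 KB chunks
        }
        fclose($in);
        fclose($out);
    }
    $zip->close();
} else {
    echo 'failed';
}
?>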

PHP - Chunked file copy (via FTP) has missing bytes?

So, I'm writing a chunked file transfer script that is intended to copy files--small and large--to a remote server. It almost works fantastically (and did with a 26 byte file I tested, haha) but when I start to do larger files, I notice it isn't quite working. For example, I uploaded a 96,489,231 byte file, but the final file was 95,504,152 bytes. I tested it with a 928,670,754 byte file, and the copied file only had 927,902,792 bytes.
Has anyone else ever experienced this? I'm guessing feof() may be doing something wonky, but I have no idea how to replace it, or test that. I commented the code, for your convenience. :)
<?php

// FTP credentials
$server = CENSORED;
$username = CENSORED;
$password = CENSORED;

// Destination file (where the copied file should go)
$destination = "ftp://$username:$password@$server/ftp/final.mp4";

// The file on my server that we're copying (in chunks) to $destination.
$read = 'grr.mp4';

// If the file we're trying to copy exists...
if (file_exists($read))
{
    // Set a chunk size
    $chunk_size = 4194304;

    // For reading through the file we want to copy to the FTP server.
    $read_handle = fopen($read, 'rb');

    // For appending to the destination file.
    $destination_handle = fopen($destination, 'ab');

    echo '<span style="font-size:20px;">';
    echo 'Uploading.....';

    // Loop through $read until we reach the end of the file.
    while (!feof($read_handle))
    {
        // So Rackspace doesn't think nothing's happening.
        echo PHP_EOL;
        flush();

        // Read a chunk of the file we're copying.
        $chunk = fread($read_handle, $chunk_size);

        // Write the chunk to the destination file.
        fwrite($destination_handle, $chunk);
        sleep(1);
    }

    echo 'Done!';
    echo '</span>';
}
fclose($read_handle);
fclose($destination_handle);
?>
EDIT
I (may have) confirmed that the script is dying at the end somehow, and not corrupting the files. I created a simple file with each line corresponding to the line number, up to 10000, then ran my script. It stopped at line 6253. However, the script is still returning "Done!" at the end, so I can't imagine it's a timeout issue. Strange!
EDIT 2
I have confirmed that the problem exists somewhere in fwrite(). By echoing $chunk inside the loop, the complete file is returned without fail. However, the written file still does not match.
EDIT 3
It appears to work if I add sleep(1) immediately after the fwrite(). However, that makes the script take a million years to run. Is it possible that PHP's append has some inherent flaw?
EDIT 4
Alright, I've further isolated the problem to being an FTP problem, somehow. When I run this file copy locally, it works fine. However, when I use the file transfer protocol (line 9), the bytes are missing. This is occurring despite the binary flags in the two cases of fopen(). What could possibly be causing this?
EDIT 5
I found a fix. The modified code is above--I'll post an answer on my own as soon as I'm able.
I found a fix, though I'm not sure exactly why it works. Simply sleeping after writing each chunk fixes the problem. I upped the chunk size quite a bit to speed things up. Though this is an arguably bad solution, it should work for my uses. Thanks anyway, guys!
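For what it's worth, the more conventional fix for partial writes to a network stream (offered only as an untested alternative to the sleep) is to check how many bytes fwrite() actually accepted and keep writing the remainder, in place of the single fwrite()/sleep(1) pair inside the loop:
// Write the chunk to the destination stream, retrying until
// every byte has been accepted (fwrite() may write less than
// the full chunk on a network stream).
$written = 0;
while ($written < strlen($chunk))
{
    $bytes = fwrite($destination_handle, substr($chunk, $written));
    if ($bytes === false || $bytes === 0)
    {
        die('Write to FTP stream failed.');
    }
    $written += $bytes;
}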

PHP file uploading trouble

I'm having an extremely weird problem with a PHP script of mine.
I'm uploading a couple of files and having PHP put them all in one folder.
I've had trouble with random files being sent and random ones not being sent. So I debugged it and I got a very weird result from the $_FILES[] array.
I tried it with 3 files.
$_FILES["addFile"]["name"] Holds the names of the 3 files.
You'd expect $_FILES["addFile"]["tmp_name"] to hold the 3 temporary names that PHP uses to copy the files, but it doesn't. It holds just one name. The other 2 are empty strings, which generate an error whilst uploading (which I suppress from being displayed).
This is very odd. I've tried multiple situations and it just keeps on happening.
This must be something in my settings or perhaps even my code.
Here's my code:
$i = 0;
if (!empty($_FILES['addFile'])) {
    foreach ($_FILES['addFile'] as $addFile) {
        $fileToCopy = $_FILES["addFile"]["tmp_name"][$i];
        $fileName = $_FILES["addFile"]["name"][$i];
        $i++;
        if (!empty($fileToCopy)) {
            $copyTo = $baseDir."/".$fileName;
            @copy($fileToCopy, $copyTo) or die("cannot copy ".$fileToCopy." to ".$copyTo);
        }
    }
    exit(0);
}
Since the tmp_name is empty, the if-value will be false so it's gonna skip the die() function.
Does anybody know what might be causing this?
Further info: I'm using Windows XP, running WAMP server. I never had this problem before, and I can access all the folders from which I've tried to upload. Windows security settings can't be the issue, I think.
I'm sorry, but it seems to me that you are trying to upload all 3 files under the same variable name? Is that right?
But this will not work, because they will overwrite each other.
I think the better and cleaner way would be to use something like:
$i = 0;
foreach ($_FILES['addFile'.$i] as $addFile) {
    if (!empty($addFile)) {
        move_uploaded_file($addFile['tmp_name'], 'YOUR DIRECTORY');
    }
    $i++;
}
Relevant, but probably not going to help: move_uploaded_file is a (slightly) better way to handle uploaded files than copy.
Are any of the files large? PHP has limits on the file size and the time it can take to upload them ...
Better to send you here than attempt to write up what it says:
http://uk3.php.net/manual/en/features.file-upload.common-pitfalls.php
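Those limits can be inspected straight from PHP (a quick diagnostic, nothing specific to this WAMP setup); max_file_uploads in particular caps how many files a single request may contain:
// Print the upload-related limits currently in effect
foreach (array('file_uploads', 'upload_max_filesize', 'post_max_size',
               'max_file_uploads', 'max_execution_time') as $directive) {
    echo $directive . ' = ' . ini_get($directive) . PHP_EOL;
}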
Your loop logic is incorrect. You are using a foreach loop on the file input name directly, which stores several properties that are of no interest to you ('type','size', etc).
You should get the file count from the first file and use it as the loop length:
if (!empty($_FILES['addFile']) && is_array($_FILES['addFile']['name'])) {
    $length = count($_FILES['addFile']['name']);
    for ($i = 0; $i < $length; $i++) {
        $result = move_uploaded_file($_FILES['addFile']['tmp_name'][$i], $baseDir."/" . $_FILES['addFile']['name'][$i]);
        if ($result === false) {
            echo 'File upload failed. The following error has occurred: ' . $_FILES['addFile']['error'][$i];
        }
    }
}
Check the error code if you are still having problems, it should provide all the information you need to debug it.
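For reference, that error member holds one of PHP's UPLOAD_ERR_* constants, so a small lookup table (sketch only) turns it into something readable:
// Map the numeric code in $_FILES['addFile']['error'][$i] to a readable message
$uploadErrors = array(
    UPLOAD_ERR_OK         => 'No error',
    UPLOAD_ERR_INI_SIZE   => 'File exceeds upload_max_filesize',
    UPLOAD_ERR_FORM_SIZE  => 'File exceeds the form MAX_FILE_SIZE',
    UPLOAD_ERR_PARTIAL    => 'File was only partially uploaded',
    UPLOAD_ERR_NO_FILE    => 'No file was uploaded',
    UPLOAD_ERR_NO_TMP_DIR => 'Missing temporary folder',
    UPLOAD_ERR_CANT_WRITE => 'Failed to write file to disk',
    UPLOAD_ERR_EXTENSION  => 'A PHP extension stopped the upload',
);
echo $uploadErrors[$_FILES['addFile']['error'][$i]];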
