I'm reading files located inside a .zip archive (in this case an XBRL taxonomy) using PHP 7 and its ZipArchive functions, but a whole lot of files that I am sure are inside are simply skipped, as if they did not exist.
This is the result of running zipinfo www.eba.europa.eu.zip on the file:
https://www.dropbox.com/s/336njdmfg8uaho8/output-zipinfo.txt?dl=0
This is the result of reading the content of the zip using the following code:
$zip = new \ZipArchive();
$zip->open("www.eba.europa.eu.zip");
for ($i = 0; $i < $zip->numFiles; $i++) {
    echo 'Filename: ' . $zip->getNameIndex($i) . PHP_EOL;
}
https://www.dropbox.com/s/7njkp5i92d68fxs/output-ziparchive.txt?dl=0
As you can see, none of the files with finrep in their name are present in the second listing.
What could it be? Missing permissions on something? File size/number limit? Sorry for the Dropbox links, but the logs are both quite big considering the number of files.
Thanks in advance for the help!
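One way to narrow this down is to confirm that open() actually succeeded and that every index returns a name. The following is only a minimal sketch: listZipEntries is a hypothetical helper name, not part of ZipArchive, and it does not by itself explain why entries would be missing.

```php
<?php
// Hypothetical helper: return every entry name, failing loudly if open() fails.
function listZipEntries(string $path): array
{
    $zip = new ZipArchive();
    $result = $zip->open($path);
    if ($result !== true) {
        // On failure open() returns an integer ZipArchive::ER_* code, not false.
        throw new RuntimeException("Cannot open $path (ZipArchive error code $result)");
    }

    $names = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        // getNameIndex() returns false when the entry cannot be read.
        $names[] = ($name === false) ? "<unreadable entry at index $i>" : $name;
    }
    $zip->close();

    return $names;
}
```

Comparing count(listZipEntries("www.eba.europa.eu.zip")) with the number of entries reported by zipinfo would show whether the two tools even agree on the size of the archive.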
Related
I am trying to extract files from a zip file, but it's failing with the following error:
Warning: copy(zip://upload/myzip-file.zip#myzip-file/file_001.csv): Failed to open stream: operation failed in {code line}
My file myzip-file.zip is placed inside the upload folder. My code is able to read the contents of this file, but it's unable to extract the files one by one (I want to extract particular files only, and I also want to avoid creating a subfolder).
$zip = new ZipArchive;
if ($zip->open($zipPath) === true) {
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $filename = $zip->getNameIndex($i);
        $fileinfo = pathinfo($filename);
        copy("zip://".$zipPath."#".$filename, "my-path/".$fileinfo['basename']);
    }
    $zip->close();
}
I suspect that the copy function is not able to understand zip://
I found this sample on the net, where people have achieved the same thing using the copy command, but it's not working for me any more.
Please note:
My PHP script is at the same location as upload and my-path (all three are in the same directory).
My zip does contain an extra folder myzip-file. This is confirmed by extracting the full zip contents, and the snippet $zip->getNameIndex($i); also revealed that.
Please note you don't have to fix it; if you have any sample that extracts one single file from a zip, that will work for me.
I have tested your PHP script. It will work if:
you use relative paths (so use $zipPath="./upload/myzip-file.zip"; and "./my-path/")
my-path is writable
during the iteration, you skip the entry when $filename is actually a directory
So the directory structure is like the attached picture (myzip-file.zip is placed inside the upload folder, and process.php is the PHP script that does the job).
So use the following code (I tested it on a Linux server and it works):
<?php
$zipPath = "./upload/myzip-file.zip";
$zip = new ZipArchive;
if ($zip->open($zipPath) === true) {
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $filename = $zip->getNameIndex($i);
        $fileinfo = pathinfo($filename);
        // Directory entries end with "/", so skip them
        if (substr($filename, -1) != "/") {
            copy("zip://".$zipPath."#".$filename, "./my-path/".$fileinfo['basename']);
        }
    }
    $zip->close();
}
?>
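Since the question also asks for a sample that extracts one single file, here is a minimal sketch that avoids the zip:// wrapper altogether by reading the entry with ZipArchive::getFromName(). The helper name extractSingleEntry and its parameters are assumptions made for illustration. Note that getFromName() loads the whole entry into memory, so this suits small files best.

```php
<?php
// Hypothetical helper: write one archive entry into $destDir without its folder path.
function extractSingleEntry(string $zipPath, string $entry, string $destDir): string
{
    $zip = new ZipArchive();
    if ($zip->open($zipPath) !== true) {
        throw new RuntimeException("Cannot open $zipPath");
    }

    // getFromName() returns the entry contents, or false if the entry is missing.
    $data = $zip->getFromName($entry);
    $zip->close();
    if ($data === false) {
        throw new RuntimeException("Entry $entry not found in $zipPath");
    }

    // basename() drops the folder part, so no subfolder is created.
    $target = rtrim($destDir, '/') . '/' . basename($entry);
    if (file_put_contents($target, $data) === false) {
        throw new RuntimeException("Cannot write $target");
    }

    return $target;
}
```

With the layout from the question, a call might look like extractSingleEntry("./upload/myzip-file.zip", "myzip-file/file_001.csv", "./my-path").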
I am using ZipArchive class to unzip a file and put its contents somewhere useful. Using information derived from the comments on php.net, I ended up writing this function:
function unzip(string $zipFile, string $destination) {
    $zip = new ZipArchive();
    $zip->open($zipFile);
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $file = $zip->getNameIndex($i);
        if (substr($file, -1) == '/') continue; // skip containing folder
        $name = basename($file);
        copy("zip://$zipFile#$file", "$destination/$name");
    }
    $zip->close();
}
This is to copy the individual files without the folder structure.
I can understand most of the code, but I cannot get any information on the following expression:
"zip://$zipFile#$file"
I know what it is doing (obviously it is extracting one of the files from the Zip archive), but can anyone tell me more about the zip:// protocol, and why it uses the # to reference a particular file?
Check out the source of the Zip extension:
https://github.com/php/php-src/blob/master/ext/zip/zip_stream.c
line 135. fragment = strchr(path, '#');
This gets a pointer to the entry part of path/to/zip#entry, so everything after the # is treated as the entry name.
line 141.
if (strncasecmp("zip://", path, 6) == 0)
{
    path += 6;
}
This checks whether the first 6 characters of path are zip:// and, if so, skips past them.
I won't go into the logic of this code. It just exists here (zip_stream.c) and maybe somewhere else.
It seems that this extension registers the zip:// protocol as a stream wrapper inside PHP itself, so it is handled by PHP rather than by the Apache server that PHP runs under.
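To see the wrapper in action from the PHP side, here is a small self-contained sketch. It builds a throwaway archive in the system temp directory (the file and entry names are made up for the example), checks that "zip" appears in stream_get_wrappers(), and reads one entry through the zip://archive#entry form.

```php
<?php
// Build a small throwaway archive so the example is self-contained.
$zipFile = sys_get_temp_dir() . '/wrapper_demo_' . uniqid() . '.zip';
$zip = new ZipArchive();
$zip->open($zipFile, ZipArchive::CREATE);
$zip->addFromString('folder/hello.txt', 'hello from inside the archive');
$zip->close();

// When the zip extension is loaded, "zip" is listed among the stream wrappers.
$hasWrapper = in_array('zip', stream_get_wrappers(), true);

// Everything before "#" is the archive path, everything after it is the entry name.
$contents = file_get_contents("zip://$zipFile#folder/hello.txt");

echo $hasWrapper ? "zip wrapper registered\n" : "zip wrapper missing\n";
echo $contents, "\n";

unlink($zipFile);
```

Because it is an ordinary stream wrapper, the same zip://archive#entry string can be passed to other stream-aware functions such as fopen() or copy().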
I'm using the ZipArchive class in PHP for the first time, and I'm having a bit of trouble. All I'm trying to do is iterate through the files in the ZIP archive, and it all works, except there doesn't seem to be any way to get subdirectory names inside the ZIP file, and I need that information. If there are files inside the subdirectory, I suppose I could parse the directory names out of the file names, but if there are empty directories, they're completely lost.
For instance, if I have this directory tree in a ZIP:
root_folder
root_folder -> test_file.ext
root_folder -> empty_dir
Now I try to read that ZIP file's entries into memory like this in PHP:
<?php
$zip = new ZipArchive;
// open() returns an integer error code on failure, so compare strictly with TRUE
if ($zip->open('test.zip') === TRUE) {
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $filename = $zip->getNameIndex($i);
        echo $filename."<br />";
    }
    $zip->close();
}
?>
If I do this, then root_folder\test_file.ext is found correctly, but root_folder\empty_dir is just lost entirely.
So how do I use the ZipArchive class to find all the files and directories, including empty ones inside the archive?
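For reference, a directory shows up in the index only if the tool that created the archive stored an explicit entry for it, and such entries have names ending in "/". The sketch below separates the two kinds of entry; listZipTree is a hypothetical helper name, and whether empty_dir appears depends on how test.zip was built.

```php
<?php
// Hypothetical helper: split archive entries into directories and files.
function listZipTree(string $zipPath): array
{
    $zip = new ZipArchive();
    if ($zip->open($zipPath) !== true) {
        throw new RuntimeException("Cannot open $zipPath");
    }

    $tree = ['dirs' => [], 'files' => []];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        // Directory entries are stored with a trailing slash.
        if (substr($name, -1) === '/') {
            $tree['dirs'][] = $name;
        } else {
            $tree['files'][] = $name;
        }
    }
    $zip->close();

    return $tree;
}
```

An archive created with ZipArchive::addEmptyDir() does contain such directory entries, so an empty folder added that way would be reported under 'dirs'.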
A real head scratcher this one - any help would be gratefully received.
I have been using the zipArchive library to extract csv files from a zip.
Oddly, it will only extract 40 files properly. Files with an index 40 or greater appear as empty files, files 0-39 extract perfectly.
This is the case regardless of the combination of files and the size of the files. I have tried removing the 39th file and the 40th file from the zip and the problem just moves. No matter what combination of files I use, it extracts 40 files properly and then just dies.
Thanks to this forum, I have tried using Shell Exec with exactly the same outcome.
I have also tried extracting the files one at a time, using a zip with only the csv files and zips with multiple different file types. Always only 40 are extracted.
This is such a suspiciously round number that it must surely be a setting somewhere that I cannot find or otherwise a bug.
For what it is worth, the unzipping code is below:
$zip = new ZipArchive;
// open() returns an integer error code on failure, so compare strictly with TRUE
if ($zip->open('Directory/zipname.zip') === TRUE) {
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $filename = $zip->getNameIndex($i);
        if (substr(strrchr($filename, '.'), 1, 3) == "csv") {
            $zip->extractTo('Directory/', $filename);
        }
    }
}
I have also tried the following which uses a different method with the same results :-(
$zip2 = new ZipArchive;
// open() returns an integer error code on failure, so compare strictly with TRUE
if ($zip2->open('Directory/zipname.zip') === TRUE) {
    for ($i = 0; $i < $zip2->numFiles; $i++) {
        $filename = $zip2->getNameIndex($i);
        if (substr(strrchr($filename, '.'), 1, 3) == "csv") {
            $content = $zip2->getFromIndex($i);
            $thefile = fopen("directory/filename", "w");
            fwrite($thefile, $content);
            fclose($thefile);
        }
    }
}
FINALLY found the answer. Thanks to all who tried to help.
For others suffering in the same way, the problem was solved by increasing the server disk allocation. I was on a rather old plan which had served well until the advent of a new national database that increased the amount of storage 10 fold.
A measly 100MB allowance meant that the server would only do so much before spitting the dummy.
Interestingly, a similar problem occurred with trying other file operations - it seemed to be limited to 40 file operations per script, regardless of the size of each file.
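A failure like this is easier to spot when the script checks each write instead of assuming it worked. The following is a sketch only: extractCsvEntries is a hypothetical helper that extracts every .csv entry and compares the size of the written file with the uncompressed size recorded in the archive, so empty or truncated files end up in a 'failed' list.

```php
<?php
// Hypothetical helper: extract every .csv entry and verify each written file.
function extractCsvEntries(string $zipPath, string $destDir): array
{
    $zip = new ZipArchive();
    if ($zip->open($zipPath) !== true) {
        throw new RuntimeException("Cannot open $zipPath");
    }

    $report = ['ok' => [], 'failed' => []];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        if (strtolower(pathinfo($name, PATHINFO_EXTENSION)) !== 'csv') {
            continue;
        }

        // Uncompressed size as recorded in the archive.
        $expected = $zip->statIndex($i)['size'];
        $extracted = $zip->extractTo($destDir, $name);

        // extractTo() keeps the entry's path below $destDir.
        $target = rtrim($destDir, '/') . '/' . $name;
        clearstatcache(true, $target);
        if ($extracted && is_file($target) && filesize($target) === $expected) {
            $report['ok'][] = $name;
        } else {
            $report['failed'][] = $name;
        }
    }
    $zip->close();

    return $report;
}
```

If 'failed' starts filling up after a fixed number of files, disk_free_space($destDir) is one more thing worth logging, although a hosting quota may not be reflected in that value.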
I am trying to unzip one file that has two csv files in it. Every variation of my code has the same results. Using the code I have currently, it gets only one file partially out and then hangs. The file unzipped shows that it is 38,480kb and gets stuck toward the end of row 214410. The true file in the archive is 38,487kb and has a total of 214442 rows. Any idea what could be causing this to hang at the last minute? I am doing all my testing with xampp on localhost on a windows 7 machine. This is in php, and is the only code in the file. The required zip file is in the same folder with it.
<?php
ini_set('memory_limit','-1');
$zip = new ZipArchive;
if ($zip->open('ItemExport.zip') === TRUE) {
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $filename = $zip->getNameIndex($i);
        $zip->extractTo('.', array($filename));
    }
    $zip->close();
} else {
    echo 'failed';
}
?>
Thanks in advance for any help!
EDIT:
The file shows up almost immediately in the correct directory, and a few seconds later it shows a file size of 38,480kb. At that point it doesn't do anything else. After waiting more than long enough (5-10+ minutes), I opened the file. It is "locked for editing by 'another user'" as it is still being held by http. The writing to the csv file just stops mid-word, mid-sentence on column 9 of 11 in row 214,442.
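One way to investigate a truncated extraction like this is to copy the entry through a stream and compare the byte count with the size recorded in the archive. This is a diagnostic sketch under assumptions, not a known fix for the hang: streamEntryToFile is a hypothetical helper built on ZipArchive::getStream() and stream_copy_to_stream().

```php
<?php
// Hypothetical helper: stream one entry to disk and verify the byte count.
function streamEntryToFile(string $zipPath, string $entry, string $target): int
{
    $zip = new ZipArchive();
    if ($zip->open($zipPath) !== true) {
        throw new RuntimeException("Cannot open $zipPath");
    }

    $stat = $zip->statName($entry);  // false if the entry does not exist
    $in = $zip->getStream($entry);   // read-only stream of the uncompressed data
    if ($stat === false || $in === false) {
        $zip->close();
        throw new RuntimeException("Entry $entry not found in $zipPath");
    }

    $out = fopen($target, 'wb');
    if ($out === false) {
        fclose($in);
        $zip->close();
        throw new RuntimeException("Cannot open $target for writing");
    }

    // Copies in chunks, so the whole file never has to fit in memory.
    $copied = stream_copy_to_stream($in, $out);
    fclose($in);
    fclose($out); // closing the handle releases the file for other programs
    $zip->close();

    if ($copied !== $stat['size']) {
        throw new RuntimeException("Copied $copied bytes, expected {$stat['size']}");
    }

    return $copied;
}
```

If the exception fires, the byte count in the message shows how far the copy got before the data stopped.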