PHP | Get file that is most similar to string - php

Currently I have a zip archive containing files whose names I do not know. The only thing I know is that one filename is very similar to a string I have: it is literally one character off.
What I am trying to do right now is to extract only the file that is the most similar to the string I have. To extract only one file from a zip I use the following code that works:
$zip = new ZipArchive;
if ($zip->open('directory/to/zipfile') === TRUE)
{
$zip->extractTo('directory/where/to/extract', array('the/filename/that/is/most/similar/must/go/here'));
$zip->close();
echo 'ok';
}
else
{
echo 'failed';
}
I know that to check the similarity of strings I can use the following code:
$var_1 = 'PHP IS GREAT';
$var_2 = 'WITH MYSQL';
similar_text($var_1, $var_2, $percent);
Based on the percentage, I can tell which file is most similar to the string I have. The only thing I am worried about is that ZipArchive doesn't have a function to retrieve files from a zip without knowing the exact filename.
So I was wondering if there is a way to retrieve a single file from a zip based on a string that is most similar to the filename.

This comment in the docs shows how to list the files in a zip archive, so all you have to do is loop through all the file names, find the one that most closely matches the string you have, and then extract it.
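$search = 'Closefilename.doc';
$za = new ZipArchive();
if ($za->open('theZip.zip') !== true) {
    die('Could not open archive');
}
$similarity = 0;
$filename = null;
for ($i = 0; $i < $za->numFiles; $i++) {
    $stat = $za->statIndex($i);
    // $sim receives the similarity as a percentage
    similar_text($stat['name'], $search, $sim);
    if ($sim > $similarity) {
        $similarity = $sim;
        $filename = $stat['name'];
    }
}
// Now extract $filename (the closest match), if one was found
if ($filename !== null) {
    $za->extractTo('directory/where/to/extract', $filename);
}
$za->close();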
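Because the target name is only one character off, an edit-distance measure such as PHP's levenshtein() may be a more direct fit than the similar_text() percentage. The sketch below is a hypothetical helper (closestEntryName is not part of ZipArchive); it assumes you have already collected the entry names, for example with getNameIndex(), and it compares base names only so folder prefixes do not skew the distance.

```php
<?php
/**
 * Return the entry name whose base name has the smallest edit distance
 * to $search, or null when there are no candidate entries.
 * Hypothetical helper for illustration; not part of ZipArchive.
 */
function closestEntryName(array $names, string $search): ?string
{
    $best = null;
    $bestDistance = PHP_INT_MAX;
    foreach ($names as $name) {
        if (substr($name, -1) === '/') {
            continue; // skip folder entries inside the archive
        }
        // Compare only the base name so "folder/" prefixes do not add to the distance
        $distance = levenshtein(basename($name), $search);
        if ($distance < $bestDistance) {
            $bestDistance = $distance;
            $best = $name;
        }
    }
    return $best;
}

// Usage, with $za an open ZipArchive:
// $names = [];
// for ($i = 0; $i < $za->numFiles; $i++) { $names[] = $za->getNameIndex($i); }
// $za->extractTo('directory/where/to/extract', closestEntryName($names, 'Closefilename.doc'));
```

Because the comparison uses a strict `<`, ties keep the first entry found; passing the returned name to extractTo() as its second argument extracts just that entry.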

Try this code:
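<?php
// Note: the procedural zip_*() functions are deprecated as of PHP 8.0; ZipArchive is the recommended API
// Your zip file path
$zip = zip_open($fileName);
if (is_resource($zip)) {
    while ($zip_entry = zip_read($zip)) {
        $entry_name = zip_entry_name($zip_entry);
        // Compare $entry_name (the file name, not its contents) with similar_text()
        // If it is the best match, read the whole entry and write the string to a file;
        // zip_entry_read() reads only 1024 bytes unless a length is given:
        // zip_entry_open($zip, $zip_entry);
        // $zip_entry_string = zip_entry_read($zip_entry, zip_entry_filesize($zip_entry));
        // zip_entry_close($zip_entry);
    }
    zip_close($zip);
}
?>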

Related

How to save the image fetched from DOCX by PHP using ZipArchive

Brief description: I have a .docx file. I have written some simple PHP code that extracts the images in that file and displays them on the page.
What I want to achieve: I want these images to be saved next to my PHP file, with the same name and format.
My folder contains sample.docx (which has images), extract.php (which extracts the images from the .docx) and display.php.
Below is the code of extract.php
<?php
/*Name of the document file*/
$document = 'sample.docx';
/*Function to extract images*/
function readZippedImages($filename) {
/*Create a new ZIP archive object*/
$zip = new ZipArchive;
/*Open the received archive file*/
if (true === $zip->open($filename)) {
for ($i=0; $i<$zip->numFiles;$i++) {
/*Loop via all the files to check for image files*/
$zip_element = $zip->statIndex($i);
/*Check for images*/
if(preg_match("([^\s]+(\.(?i)(jpg|jpeg|png|gif|bmp))$)",$zip_element['name'])) {
/*Display images if present by using display.php*/
echo "<img src='display.php?filename=".urlencode($filename)."&index=".$i."' /><hr />";
}
}
$zip->close();
}
}
readZippedImages($document);
?>
display.php
<?php
/*Tell the browser that we want to display an image*/
header('Content-Type: image/jpeg');
/*Create a new ZIP archive object*/
$zip = new ZipArchive;
/*Open the received archive file*/
if (true === $zip->open($_GET['filename'])) {
/*Get the content of the specified index of ZIP archive*/
echo $zip->getFromIndex((int) $_GET['index']);
/*Close the archive only if it was actually opened*/
$zip->close();
}
?>
How can I do that?
I'm not sure that you need to open the zip archive multiple times like this, especially when another instance is already open, but I'd be tempted to try something like the following. I should stress that it is totally untested, though.
Updated after testing:
There is no need to use display.php if you do it like this; it seems to work fine on different .docx files. The data returned by $zip->getFromIndex() is the raw image data (as I discovered), so passing it in a query string is not possible because of its length. I tried to avoid opening and closing the zip archive unnecessarily, hence the approach below: it adds each image's data, base64-encoded, to the output array, and each image is then displayed inline using that base64 data.
<?php
#extract.php
$document = 'sample.docx';

/* Return an array of image entry name => base64-encoded image data */
function readZippedImages($filename) {
    $paths = [];
    $zip = new ZipArchive;
    if (true === $zip->open($filename)) {
        for ($i = 0; $i < $zip->numFiles; $i++) {
            $zip_element = $zip->statIndex($i);
            if (preg_match("([^\s]+(\.(?i)(jpg|jpeg|png|gif|bmp))$)", $zip_element['name'])) {
                $paths[$zip_element['name']] = base64_encode($zip->getFromIndex($i));
            }
        }
        $zip->close(); // close only an archive that was actually opened
    }
    return $paths;
}

$paths = readZippedImages($document);

/* to display & save the images */
foreach ($paths as $name => $data) {
    // The archive's sub-folders (e.g. word/media/) are recreated next to this script
    $filepath = __DIR__ . '/' . $name;
    $dirpath = pathinfo($filepath, PATHINFO_DIRNAME);
    $ext = strtolower(pathinfo($name, PATHINFO_EXTENSION));
    if (!file_exists($dirpath)) mkdir($dirpath, 0777, true);
    if (!file_exists($filepath)) file_put_contents($filepath, base64_decode($data));
    // "jpg" is not a registered MIME subtype, so use image/jpeg for it
    $mime = ($ext === 'jpg') ? 'jpeg' : $ext;
    printf('<img src="data:image/%s;base64,%s" />', $mime, $data);
}
?>
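The code above recreates the archive's folder structure (a .docx usually keeps its images under word/media/), so the files end up in sub-folders rather than directly beside the script. If you want them directly beside the PHP file, as the question asks, a variant that keeps only each entry's base name could look like this. saveDocxImagesFlat is a hypothetical name, and the sketch has not been checked against every .docx layout.

```php
<?php
/**
 * Save every image entry of a .docx (a ZIP archive) into $targetDir under
 * its own base name, e.g. word/media/image1.png -> $targetDir/image1.png.
 * Returns the list of written paths. Hypothetical helper for illustration.
 */
function saveDocxImagesFlat(string $docx, string $targetDir): array
{
    $saved = [];
    $zip = new ZipArchive();
    if ($zip->open($docx) !== true) {
        return $saved; // could not open the archive
    }
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        // Same extensions as the question's check, as a case-insensitive regex
        if (!preg_match('/\.(jpe?g|png|gif|bmp)$/i', $name)) {
            continue;
        }
        // basename() drops the archive's folders (and any "../" components)
        $target = rtrim($targetDir, '/') . '/' . basename($name);
        if (file_put_contents($target, $zip->getFromIndex($i)) !== false) {
            $saved[] = $target;
        }
    }
    $zip->close();
    return $saved;
}

// Usage: saveDocxImagesFlat('sample.docx', __DIR__);
```

Using basename() also keeps a crafted entry name such as ../../x.png from escaping the target folder, but two images with the same base name in different folders would overwrite each other.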

Move file to directory

So with your help I have been able to assemble the below.
$dir = "./reporting/live-metrics/";
$des = "./reporting/historic-metrics/";
$ctime = time();
foreach (glob($dir."*") as $file) {
$live = file_get_contents($file);
if (strpos($live, 'CORO') === false && filemtime($file) < time() - 1 * 10) {
$exclude[] = $live;
$lines = file( $file , FILE_IGNORE_NEW_LINES );
$lines[3] = 'Taken Down';
$lines[5] = $ctime;
file_put_contents( $file , implode( "\n", $lines ) );
if (!file_exists($des.basename($file).PHP_EOL)) {
mkdir($des.basename($file), 0777, true);
}
rename($file,$des.$ctime);
}
}
My issue is that I am trying to move the file into the newly created directory. No matter what I do, I can only get it to move to $des; I can't seem to get it to move into the dynamically created directory for each specific file. I assume it has to do with the fact that I am not passing the correct parameters to rename(). Below are some of the combinations I have tried to get it to rename and move:
rename($file,$des.basename($file).PHP_EOL.$ctime); //doesn't move or rename
rename($file,$des.basename($file).$ctime); //adds to historic-metrics/ as jason1465519298
I also tried storing the path in a variable and passing that to rename(), e.g.:
$path = $des.basename($file).PHP_EOL;
rename($file,$path.$ctime);
Currently the script works fine up until moving the file. It moves the file to ./reporting/historic-metrics/, but I would like it to move to the directory that was just created. E.g., if the file it is currently handling is called 'Jason', it creates ./reporting/historic-metrics/Jason but moves the file to ./reporting/historic-metrics/.
There seem to be two possibilities:
The source or destination file paths may be wrong; you can print them and check.
The newly created destination directory may not be getting the correct permissions and ownership.
Otherwise your script looks OK.
I finally got it. My main issue was building the path to send the file to. After a few interruptions and some rethinking of my approach, I came up with the code below. I know it isn't pretty or as slim as it could be, but it does the job perfectly.
$dir = "./reporting/live-metrics/";
$des = "./reporting/historic-metrics/";
$ctime = time();
foreach (glob($dir."*") as $file) {
$live = file_get_contents($file);
if (strpos($live, 'CORO') === false && filemtime($file) < time() - 1 * 10) {
$exclude[] = $live;
$lines = file( $file , FILE_IGNORE_NEW_LINES );
$lines[3] = 'Taken Down';
$lines[5] = $ctime;
file_put_contents( $file , implode( "\n", $lines ) );
if (!is_dir($des.basename($file))) {
mkdir($des.basename($file), 0777, true);
}
$user = basename($file); //Gets file name that was used in mkdir
$path = $des.$user."/"; //Compiles variables into string, e.g. ./reporting/historic-metrics/Jason/
rename($file,$path.$ctime);
}
}
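What makes this work is that the destination is built from the per-file folder ($des.basename($file)."/") rather than from $des alone. A small hypothetical helper wrapping that logic, assuming the same $des and $ctime values as in the script, might look like:

```php
<?php
/**
 * Move $file into a folder named after it below $des, naming the moved file
 * after $ctime, e.g. ./reporting/live-metrics/Jason ->
 * ./reporting/historic-metrics/Jason/1465519298.
 * Returns the new path, or null if the move failed. Hypothetical helper.
 */
function moveToOwnFolder(string $file, string $des, int $ctime): ?string
{
    $folder = rtrim($des, '/') . '/' . basename($file);
    // Check the exact folder name; nothing such as PHP_EOL is appended to it
    if (!is_dir($folder) && !mkdir($folder, 0777, true)) {
        return null;
    }
    $target = $folder . '/' . $ctime;
    return rename($file, $target) ? $target : null;
}

// Usage inside the loop: moveToOwnFolder($file, $des, $ctime);
```

A note on the existence check: file_exists($des.basename($file).PHP_EOL) looks for a name ending in a newline, which never exists, so mkdir() is attempted on every pass; is_dir() on the plain folder path avoids that.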

Cannot extract zip file in PHP, no feedback or error

I am retrieving my Google map in KMZ format like this:
file_put_contents($_SERVER['DOCUMENT_ROOT'].'/temp/map.kmz', file_get_contents('https://mapsengine.google.com/map/kml?mid=zLucZBnh_ipg.kS906psI1W9k') );
$zip = new ZipArchive;
$res = $zip->open($_SERVER['DOCUMENT_ROOT'].'/temp/map.kmz');
if ($res === true)
{
trace("Number of files: $res->numFiles".PHP_EOL);
for( $i = 0; $i < $res->numFiles; $i++ )
{
$stat = $res->statIndex( $i );
print_r( basename( $stat['name'] ) . PHP_EOL );
}
}
But no files are showing, and $zip->extractTo() is not working either. The file is downloaded to the server, and I can extract it manually. I have tried renaming the file to .zip or .kmz, but it still doesn't work. I have opened map.kmz in WinRAR, and it does indeed report a ZIP file format.
Any idea why it's not working? Do I need some special permissions to read the number of files or extract?
Check your file extensions: the code below writes map.mkz but opens map.kmz.
file_put_contents($_SERVER['DOCUMENT_ROOT'].'/temp/map.mkz',
file_get_contents('https://mapsengine.google.com/map/kml?mid=zLucZBnh_ipg.kS906psI1W9k') );
$zip = new ZipArchive;
$res = $zip->open($_SERVER['DOCUMENT_ROOT'].'/temp/map.kmz');
Got tired of the damn class not working, tried this method instead and it works:
$data = file_get_contents("https://mapsengine.google.com/map/kml?mid=zLucZBnh_ipg.kS906psI1W9k");
file_put_contents($_SERVER['DOCUMENT_ROOT'].'/temp/kmz_temp', $data);
ob_start();
passthru('unzip -p ' . escapeshellarg($_SERVER['DOCUMENT_ROOT'] . '/temp/kmz_temp'));
$xml_data = ob_get_clean();
header("Content-type: text/xml");
echo $xml_data;
exit();
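One likely cause in the question's code is worth checking: ZipArchive::open() returns true on success (or an integer error code), not the archive object, so $res->numFiles and $res->statIndex() are read from a boolean and the loop never runs. The archive's properties and methods belong to $zip. A sketch that stays with ZipArchive and returns the first .kml entry, assuming the KMZ stores its KML as a file ending in .kml (readKmlFromKmz is a hypothetical name), could be:

```php
<?php
/**
 * Return the contents of the first .kml entry in a KMZ file, or null.
 * Hypothetical helper for illustration.
 */
function readKmlFromKmz(string $path): ?string
{
    $zip = new ZipArchive();
    $res = $zip->open($path);
    if ($res !== true) {
        return null; // $res holds a ZipArchive error code here, not an archive
    }
    $kml = null;
    // numFiles, statIndex() and getFromIndex() belong to $zip, not $res
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $stat = $zip->statIndex($i);
        if (strtolower(pathinfo($stat['name'], PATHINFO_EXTENSION)) === 'kml') {
            $kml = $zip->getFromIndex($i);
            break;
        }
    }
    $zip->close();
    return $kml === false ? null : $kml;
}

// Usage: $xml_data = readKmlFromKmz($_SERVER['DOCUMENT_ROOT'] . '/temp/map.kmz');
```

Unlike `unzip -p`, which prints every entry, this returns only the KML text, so bundled icons or images do not end up in the XML output.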

How to add a recent file with a .zip file extension, in a specified folder to a php variable?

I would like to write PHP code that searches a specific folder for a recently added file with a .zip extension and stores it in a variable to be manipulated later.
Use scandir() to list the files in that folder, then isolate the zip files using strpos() or a regular expression on the returned filenames.
If needed, test the last modification time of the zip files found.
Edit: using glob() to match *.zip files directly will be even faster.
[Edit]
I managed to come up with this code, but I think it's messy. Is there any way to clean it up?
$show = 2; // Change to 0 for listing all found file types
$dir = ''; // Blank if the folder/directory to be scanned is the current one (with the script)
if($dir) chdir($dir);
$files = glob( '*.zip');
usort( $files, 'filemtime_compare' );
function filemtime_compare( $a, $b )
{
return filemtime( $b ) - filemtime( $a );
}
$i = 0;
foreach ( $files as $file )
{
++$i;
if ( $i == $show ) break;
$value = $file;
}
echo "This is the file name in the variable: " . $value;
?>
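Since the list is already sorted newest first, the counter loop is not needed: the first element is the newest zip file. A tidier sketch, with newestZip as a hypothetical helper name and arrow-function syntax that assumes PHP 7.4 or later:

```php
<?php
/**
 * Return the path of the most recently modified *.zip file in $dir,
 * or null when the folder contains no zip files. Hypothetical helper.
 */
function newestZip(string $dir): ?string
{
    $files = glob(rtrim($dir, '/') . '/*.zip');
    if (!$files) {
        return null; // glob() returned an empty array or false
    }
    // Sort newest first by modification time
    usort($files, fn($a, $b) => filemtime($b) <=> filemtime($a));
    return $files[0];
}

// Usage: $value = newestZip(__DIR__);
```

It returns null when no zip file is found, instead of leaving $value undefined as the loop version does.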

How to get ZipArchive to not overwrite certain files and folders

I would like to extract a zip archive to a location, replacing all files and folders except a few. How can I do this?
I currently do the following.
$backup = realpath('./backup/backup.zip');
$zip = new ZipArchive();
// Open the existing archive for reading; ZipArchive::OVERWRITE would start a new, empty archive
if ($backup === false || $zip->open($backup) !== TRUE) {
die ('Could not open archive');
}
$zip->extractTo('minus/');
$zip->close();
How can I put conditions in for what files and folders should NOT be replaced? It would be great if some sort of loop could be used.
Thanks all for any help
You could do something like this; I tested it and it works for me:
// make a list of all the files in the archive
$entries = array();
for ($idx = 0; $idx < $zip->numFiles; $idx++) {
$entries[] = $zip->getNameIndex($idx);
}
// remove $entries for the files you don't want to overwrite
// only extract the remaining $entries
$zip->extractTo('minus/', $entries);
This solution is based on the numFiles property and the getNameIndex() method, and it works even when the archive is structured into subfolders (the entries will look like folder/subfolder/file.ext).
Also, the extractTo() method takes an optional second parameter that holds the list of files to be extracted.
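To make the "remove $entries for the files you don't want to overwrite" step concrete, here is one way to filter the list. entriesToExtract and the protected paths are hypothetical examples; an exact entry name protects one file, and a name ending in "/" protects that folder and everything inside it.

```php
<?php
/**
 * Drop archive entries that must not be overwritten.
 * $protected: exact entry names ("config.php") or folder prefixes
 * ending in "/" ("uploads/"). Hypothetical helper for illustration.
 */
function entriesToExtract(array $entries, array $protected): array
{
    return array_values(array_filter($entries, function (string $entry) use ($protected): bool {
        foreach ($protected as $p) {
            $isFolderRule = substr($p, -1) === '/';
            if ($entry === $p || ($isFolderRule && strpos($entry, $p) === 0)) {
                return false; // protected file, or something inside a protected folder
            }
        }
        return true;
    }));
}

// Usage: $zip->extractTo('minus/', entriesToExtract($entries, ['config.php', 'uploads/']));
```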
If you just want to extract specific files from the archive (and you know what they are) then use the second parameter (entries).
$zip->extractTo('minus/', array('file1.ext', 'newfile2.xml'));
If you want to extract only the files that do not already exist at the destination, you can try one of the following:
$files = array();
for($i = 0; $i < $zip->numFiles; $i++) {
$filename = $zip->getNameIndex($i);
// only keep entries that do not already exist in the destination (adjust this logic as needed)
if (!file_exists($path . '/' . $filename)) {
$files[] = $filename;
}
}
$zip->extractTo($path, $files);
$zip->close();
You can also use $zip->getStream( $filename ) to read a stream that you then write to the destination file.
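A sketch of the getStream() approach, which copies an entry to disk through a stream instead of loading it into a string with getFromName(). extractEntryViaStream is a hypothetical helper, and it assumes the archive is already open:

```php
<?php
/**
 * Copy one archive entry to $destination via ZipArchive::getStream().
 * Returns true on success. Hypothetical helper for illustration.
 */
function extractEntryViaStream(ZipArchive $zip, string $entry, string $destination): bool
{
    $in = $zip->getStream($entry);
    if ($in === false) {
        return false; // entry not found in the archive
    }
    $dir = dirname($destination);
    if (!is_dir($dir) && !mkdir($dir, 0777, true)) {
        fclose($in);
        return false;
    }
    $out = fopen($destination, 'wb');
    if ($out === false) {
        fclose($in);
        return false;
    }
    $copied = stream_copy_to_stream($in, $out); // number of bytes copied, or false
    fclose($in);
    fclose($out);
    return $copied !== false;
}

// Usage: extractEntryViaStream($zip, 'folder/file.txt', 'minus/folder/file.txt');
```

Because you choose the destination path yourself, this is also a natural place to skip or rename files that should not be overwritten.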
