I have a slightly unconventional task I am trying to accomplish with .zip archives in PHP. I have a zip archive used for an automation task (it's a startup package for Amazon EC2 instances) which contains a number of text and XML files. What I need to do is find/replace a few pieces of text within those files and output a Base64-encoded string (not write a new .zip file) using PHP on the fly.
I have no problem getting the file contents and Base64-encoding them with file_get_contents() and base64_encode(), or with the find/replace; it's the unzipping and zipping to and from strings that I can't seem to figure out.
I would like to avoid unzipping the archive, copying the files, editing the files, writing a new .zip to disk, and then getting the contents and encoding that. I was hoping there might be a solution that looks more like this:
Get the contents of the zip file into a string.
$originalZipFile = file_get_contents('Path/To/ZipFile');
"Unzip" the data in that string, to a new string to expose the bits of text I want to find/replace.
$unzippedFile = someFunction($originalZipFile);
Find and replace bits of text.
$processedString = str_replace($find, $replace, $unzippedFile);
"Rezip" the processed string into a new string.
$rezippedFile = someOtherFunction($processedString);
Base64 encode the "rezipped" string.
$desiredOutputString = base64_encode($rezippedFile);
I have looked at the PHP ZipArchive class, but it doesn't seem to have the functions I'm looking for.
Any insights are greatly appreciated!
-Oliver
Well, I believe I found a pretty good solution to this. For any others looking for similar solutions, I would recommend looking at the ZipStream-PHP class by Paul Duncan.
With this class, you are able to dynamically write files, contents, and directories to a zip file, which is then streamed without writing a file to disk.
Pretty automagical.
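For anyone who wants to see the overall shape, here is a rough sketch under a couple of assumptions: ZipArchive is used to read the entries of the original archive from disk, and the class exposes the add_file()/finish() methods of Paul Duncan's original version (newer forks rename these). Output buffering captures the stream as a string:

<?php
// Rough sketch only; method names follow Paul Duncan's original ZipStream-PHP.
// Depending on the version you may also need an option to stop the class from
// sending HTTP headers while you are buffering its output.
require_once 'zipstream.php';

$find    = 'OLD_VALUE';
$replace = 'NEW_VALUE';

// Read each entry of the original archive.
$in = new ZipArchive();
$in->open('Path/To/ZipFile');

// Capture ZipStream's output in a buffer instead of streaming it to the browser.
ob_start();
$out = new ZipStream('startup.zip');

for ($i = 0; $i < $in->numFiles; $i++) {
    $name     = $in->getNameIndex($i);
    $contents = $in->getFromIndex($i);
    // Apply the find/replace, then add the rewritten entry to the new archive.
    $out->add_file($name, str_replace($find, $replace, $contents));
}

$out->finish();
$in->close();

$desiredOutputString = base64_encode(ob_get_clean());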
I'm trying to write a PHP file on a server and to bypass the extension in the end.
This is the PHP file - 1.php:
<?php
file_put_contents("folder\\".$GET['file'].".PNG",$_GET['content']);
?>
I'm trying to bypass the PNG extension and to write a PHP file.
like this:
1.php?file=attack.php%00&content=blabla
but it's not working
I tried:
Null char (%00,%u0000)
Long filename
CRLF chars
space char
?,&,|,>,<,(,),{,},[,],\,!,~,:,; chars
backspace char
../
php protocol
php://filter/write=convert.base64-decode/resource=1.php
(will not work because of the folder prefix at the beginning)
Anyone have any idea?
Thanks!
There are several fundamental problems here:
This code is very unsafe; I could set the file parameter to ../../1.php and overwrite this script to do whatever I want. It appears that you're doing some security testing, however, so I guess that may be the point.
php is not a protocol, it's a language, so php://anything should not work.
folder\\ doesn't make sense; what is this supposed to be/do?
That said, for educational purposes, prepending ../../ should allow you to escape out of the folder/ directory.
For example, if this is in /home/Zak/mytest/ with the expectation of a directory within that called folder, designated to store these PNG files, then a file value of ../../zak_homedir should put a file at /home/Zak/zak_homedir.PNG due to relative path resolution.
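To make that relative-path resolution concrete, a hypothetical run (assuming the script lives in /home/Zak/mytest/ and uses a forward slash as the separator) resolves like this:

// Attacker-supplied "file" value and the path the script ends up writing to.
$file = '../../zak_homedir';
$path = "folder/" . $file . ".PNG";   // "folder/../../zak_homedir.PNG"
// "folder/../.." climbs back out of folder/ and then out of mytest/,
// so the write lands at /home/Zak/zak_homedir.PNG.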
I am trying to extract user email addresses from files on my server. The problem is that most of the files are .txt, but some are CSV files with a .txt extension. When I try to read and extract, I am not able to read the CSV files that have the .txt extension. Here is my code:
<?php
$handle = fopen('2.txt', "r");
while (!feof($handle)) {
    $string  = fgets($handle);
    $pattern = '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}/i';
    preg_match_all($pattern, $string, $matches);
    foreach ($matches[0] as $match) {
        echo $match;
        echo '<br><br>';
    }
}
?>
I have tried to use this code for that. The program reads the files that are really CSV in one complete block, and the true text files line by line. There are thousands of files, so it is difficult to identify which is which.
Please suggest what I should do to resolve this problem. If there is a solution that can read either format, that would be awesome.
Well, your files are different. Because of that, you will have to take a different approach for each of them. In more general terms this is usually called adapting, and it is mostly provided using the Adapter design pattern.
Should you use the Adapter design pattern, you would have code inspecting the extension of the file to be opened and a switch with either txt or csv. Based on the value you would retrieve a TxtParser or a CsvParser respectively.
However, before diving deep into this territory you might want to have a look at the files first. I cannot say this for sure without seeing the structures, but you can. If the contents of both the text and the CSV files are the same, then a very simple approach is to change the extension to either txt or csv for all files and then process them using the same logic, knowing files with the same extension will now be processed in the same manner.
But from what I understood, the file structures actually differ. So to keep your code concise, use the adapter pattern: have two separate classes/functions for parsing and another one on top for choosing the right parsing function (this top function would actually be a form of a strategy) and running it. A minimal sketch is shown below.
Either way, I very much doubt there is a ready-made solution for the problem you are facing, as a file structure is mostly your own.
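A minimal sketch of that idea, assuming the files have first been renamed so the extension reflects the structure (the TxtParser/CsvParser names are illustrative, not an existing library):

<?php
interface LineSource {
    public function lines(string $path): iterable;
}

class TxtParser implements LineSource {
    public function lines(string $path): iterable {
        $handle = fopen($path, 'r');
        while (($line = fgets($handle)) !== false) {
            yield $line;
        }
        fclose($handle);
    }
}

class CsvParser implements LineSource {
    public function lines(string $path): iterable {
        $handle = fopen($path, 'r');
        while (($row = fgetcsv($handle)) !== false) {
            yield implode(' ', $row);   // flatten each CSV row into one searchable string
        }
        fclose($handle);
    }
}

// The "strategy" on top: pick the parser from the extension.
function parserFor(string $path): LineSource {
    return pathinfo($path, PATHINFO_EXTENSION) === 'csv' ? new CsvParser() : new TxtParser();
}

$pattern = '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}/i';
foreach (parserFor('emails.csv')->lines('emails.csv') as $line) {
    preg_match_all($pattern, $line, $matches);
    foreach ($matches[0] as $match) {
        echo $match . '<br><br>';
    }
}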
OK, so the problem is that a CSV file can have a very long line. Given that restriction, I suggest you use the example from php.net. Here it is:
$handle = @fopen("/tmp/inputfile.txt", "r");
if ($handle) {
    while (($buffer = fgets($handle, 4096)) !== false) {
        echo $buffer;
        // do your operation for searching here
    }
    if (!feof($handle)) {
        echo "Error: unexpected fgets() fail\n";
    }
    fclose($handle);
}
I'm planning to run a PHP program from the Mac Terminal. I have a folder on my desktop with around 800 .csv files, and I need to write a PHP program that reads each one so that I can run some transformations on the data it stores. I know how to parse a .csv once it's loaded, but I'm wondering if there is a way to load each file without having to name it explicitly. I don't have a list of the 800 file names, but I feel like there has to be a way to just read in all the files in a folder in a loop or something without listing the title of each file -- I don't have much coding experience, so forgive me if there's an obvious answer I'm oblivious to.
Thank you!
There are a few ways to do this, but glob'ing is very straightforward:
<?php
foreach (glob("*.csv") as $filename) {
    // do something with $filename
}
?>
You can loop through all files in a directory using readdir(): http://php.net/manual/en/function.readdir.php.
Once you get the file name from readdir(), you can parse the file either by breaking its content into an array and looping through the cells with str_getcsv() (requires at least PHP 5.3), or with the old-fashioned fgetcsv() to read through the file one line at a time. For each file create a string variable, and after each line you read and transform, simply append the modified line to this string with an end-of-line character appended as well. After reading through the entire file, simply replace the original file's contents using file_put_contents(). A rough sketch follows.
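As a rough sketch of that flow (the directory path and the transform() helper are placeholders for your own logic):

<?php
$dir = '/path/to/csv/folder';
$handle = opendir($dir);

while (($entry = readdir($handle)) !== false) {
    if (pathinfo($entry, PATHINFO_EXTENSION) !== 'csv') {
        continue;                        // skip ".", ".." and anything that isn't a .csv
    }

    $output = '';
    $file = fopen("$dir/$entry", 'r');
    while (($row = fgetcsv($file)) !== false) {
        $row = transform($row);          // your transformation goes here (placeholder)
        $output .= implode(',', $row) . "\n";
    }
    fclose($file);

    // Replace the original file's contents with the transformed lines.
    file_put_contents("$dir/$entry", $output);
}
closedir($handle);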
I have a PHP script that generates a dynamic sitemap from my site's database and writes it to an XML file using fopen() and fwrite().
How can I compress this file with gzip compression dynamically as I write it?
I tried fwrite()-ing strings that I ran through gzcompress() into the file and renaming it ".xml.gz", but the file it creates doesn't seem to be a well-formed archive.
Using gzopen() and gzwrite() instead of fopen() and fwrite() should do the trick for you.
From the manual:
# Example #1 gzwrite() example
<?php
$string = 'Some information to compress';
$gz = gzopen('somefile.gz','w9');
gzwrite($gz, $string);
gzclose($gz);
?>
If I understood correctly, this is a quote from the PHP site that we should all keep in mind:
Take Heed 07-Nov-2010 08:50 Read the description of gzwrite() very
carefully. If the 'length' option is not specified, then the input
data will have slashes stripped on systems where magic quotes are
enabled. This is important information to know when compressing
files.
http://php.net/manual/en/function.gzwrite.php
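In practice that just means always passing the length argument explicitly:

gzwrite($gz, $string, strlen($string));   // supplying the length avoids the magic-quotes slash-stripping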
How can I search for text in files like PDF, doc, docx or txt using PHP?
I want something similar to the full-text search in MySQL, but this time searching directly through files, not a database.
The search should look through many files located in a folder.
Any suggestions, tips or solutions for this problem?
I also noticed that Google can search through such files as well.
For searching PDFs you'll need a program like pdftotext, which converts the content of a PDF to text. For Word documents a similar tool may be available (because of all the styling and encryption in Word files).
Here is an example of searching through PDFs, copied from one of my scripts (it's a snippet, not the entire code, but it should give you some understanding), where I extract keywords and store matches in a PDF-results array:
foreach ($keywords as $keyword)
{
    $keyword = strtolower($keyword);
    $file = ABSOLUTE_PATH_SITE."_uploaded/files/Transcripties/".$pdfFiles[$i];
    // Convert the PDF to text on stdout and count how often the keyword occurs.
    $content = addslashes(shell_exec('/usr/bin/pdftotext \''.$file.'\' -'));
    $result = substr_count(strtolower($content), $keyword);
    if ($result > 0)
    {
        if (!in_array($pdfFiles[$i], $matchesOnPDF))
        {
            array_push($matchesOnPDF, array(
                "matches" => $result,
                "type" => "PDF",
                "pdfFile" => $pdfFiles[$i]));
        }
    }
}
Depending on the file type, you should convert the file to text and then search through it using, e.g., file_get_contents() and strpos(). To convert files to text, you have - beside others - the following tools available (a rough sketch combining them follows the list):
catdoc for word files
xlhtml for excel files
ppthtml for powerpoint files
unrtf for RTF files
pdftotext for pdf files
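A rough sketch of the convert-then-search idea (the exact command-line flags are assumptions and depend on what is installed on your server):

<?php
// Map each extension to a shell command that writes plain text to stdout.
$converters = [
    'doc' => 'catdoc %s',
    'xls' => 'xlhtml %s',
    'ppt' => 'ppthtml %s',
    'rtf' => 'unrtf --text %s',
    'pdf' => 'pdftotext %s -',
];

function searchFile(string $file, string $needle, array $converters): bool {
    $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
    if (isset($converters[$ext])) {
        // Convert the document to text, then search the converted output.
        $text = shell_exec(sprintf($converters[$ext], escapeshellarg($file)));
    } else {
        $text = file_get_contents($file);   // plain text files need no conversion
    }
    return $text !== false && $text !== null && stripos($text, $needle) !== false;
}

foreach (glob('/path/to/folder/*') as $file) {
    if (searchFile($file, 'text to be searched for', $converters)) {
        echo $file . "\n";
    }
}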
If you are on a Linux server you may use
grep -R "text to be searched for" ./   # searches everything under the current directory
called from PHP using exec(), resulting in
$cmd = 'grep -R "text to be searched for" ./';
exec($cmd, $output);   // $output collects every matching line
print_r($output);
2021: I came across this and found something, so I figured I would link to it...
Note: docx, PDFs and others are not regular text files, and each type requires more scripting and/or a different library to read and/or edit, unless you can find an all-in-one library. This means you would have to script each file type you want to search through, including normal text files. If you don't want to script it all yourself, you still have to install the library you need for each file type you want to read, and write the code that calls its functions.
I found the basic answer here on the stack.