shell_exec() statement to pdftotext entire directory? - php

I'm at a loss as to how I could build a loop to run pdftotext on an entire directory through a shell_exec() statement.
Something like :
$pdfs = glob("*.pdf");
foreach($pdfs as $pdfs) {
shell_exec('pdftotext '.$pdfs.' '.$pdfs'.txt');
}
But I'm unsure how I can drop the .pdf extension the 2nd time I call $pdfs in my shell_exec() statement and replace that with .txt
Not really sure this loop is correct either....

Try
foreach(glob("*.pdf") as $src) {
// Manually remove file extension because glob() may return a dir path component
$parts = explode('.', $src);
$parts[count($parts) - 1] = 'txt';
$dest = implode('.', $parts);
// Escape shell arguments, just in case
shell_exec('pdftotext '.escapeshellarg($src).' '.escapeshellarg($dest));
}
Basically, loop over the PDF files in the directory and execute the command for each one, swapping the .pdf extension for .txt in the output file name (so test.pdf becomes test.txt). See the edit below for how the output name is built.
Using the result of glob() directly in foreach easily avoids the variable naming collision you had in the code above.
EDIT
I have changed the above code to manually remove the file extension when generating the output file name. This is because glob() may return a directory component in the path strings, as well as just a file name. Using pathinfo() or basename() would strip this off, and since we know that a . will be present in the file name (the pattern passed to glob() dictates this), we can safely replace everything after the last one. I have also added escapeshellarg() for good measure: it is highly unlikely (if not impossible) that a file name that already exists would fall foul of this, but it is best to be safe.
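The extension swap described in the edit can also be factored into a small helper. This is only a minimal sketch: the function name replaceExtension() is an assumption made for illustration (it is not part of the answer above), and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: swap whatever follows the last dot in the file
// name for a new extension, keeping any directory component intact.
function replaceExtension(string $path, string $newExt): string
{
    $dot = strrpos($path, '.');
    $slash = strrpos($path, '/');
    // No dot at all, or the last dot belongs to a directory name:
    // there is no extension to replace, so just append the new one.
    if ($dot === false || ($slash !== false && $dot < $slash)) {
        return $path . '.' . $newExt;
    }
    return substr($path, 0, $dot) . '.' . $newExt;
}

echo replaceExtension('docs/test.pdf', 'txt'), "\n";   // docs/test.txt
echo replaceExtension('my.report.pdf', 'txt'), "\n";   // my.report.txt
```

Inside the loop, the result would still be passed through escapeshellarg() before it reaches shell_exec().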

$pdfs = glob("*.pdf");
$fmt='/path/to/pdftotext "%s" "%s.txt"';
foreach($pdfs as $thispdf) {
shell_exec(sprintf($fmt, $thispdf, basename($thispdf, ".pdf")));
}

Related

comma separated in text file in php

if(isset($_POST['submit']))
{
$file = $_FILES['file']['name'];
$fh = fopen($file,'r+');
// string to put username and passwords
$users = '';
while(!feof($fh)) {
$user = explode(' ',fgets($fh));
foreach ($user as $value)
{
$number= rand(1000,10000);
$final_number[] = $value .','. $number;
}
}
//print_r($final_number);
file_put_contents($_FILES['file']['name'], $final_number);
}
This is my code for appending a random number to each string, separated by a comma, and saving the result in a text file. It does not save properly: the text after the comma ends up on the next line, which should not happen. Please help.
Your code starts with a very big issue: you try to open and read from a file that, most probably, doesn't exist.
$file = $_FILES['file']['name'];
$fh = fopen($file,'r+');
As you can read in the documentation, assuming that your form contains an input element of type file having the name file, $_FILES['file']['name'] is the original name of the uploaded file, on the user's computer. It is only the name and it is not the name of the file on the server. It is provided just as a hint for the file's content (check the filename extension) but you cannot rely on it.
The content of the file is temporarily stored on the webserver in a file whose path can be found in $_FILES['file']['tmp_name']. You should pass it to the PHP function is_uploaded_file() to be sure the file was uploaded and your script is not the victim of an injection attempt then, if you need to keep it, use move_uploaded_file() to move it where you need. If you don't move it, when your script ends the temporary file is deleted.
Another problem of your code is on the lines:
$user = explode(' ',fgets($fh));
foreach ($user as $value)
As explained in the documentation, the function fgets() called without a second argument reads a line from the input file, including the newline character that ends it. Since you split the line into words I think you don't need the newline character. You can remove it by using trim() with the string returned by fgets() before passing it to explode().
The last issue of the code is:
file_put_contents($_FILES['file']['name'], $final_number);
Because $final_number is an array [1], file_put_contents() joins its elements to get a string and writes the string into the file. This operation concatenates the random value generated for a $value with the next $value, and there is no way to tell which is which after the data is stored in the file. You probably need to keep them on separate lines. Use the function implode() on $final_number, with "\n" as its first argument, and write the generated string into the file instead.
One more thing: don't write the generated content to $_FILES['file']['name']. It is not safe! It contains a string received from the browser; a malicious user can put whatever path they want there and your script will overwrite a file that it shouldn't change.
Create a directory dedicated to storing the files generated by your code, and generate the names of the files you store there from an always-incrementing value (the current time() or microtime(), for example). Never trust the data you receive from the browser.
[1] $final_number is used as $final_number[] = ... and, because it is not defined when this line of code is executed for the first time, PHP creates an empty array for you and stores it in $final_number. Don't rely on this feature. Always initialize your variables before their first use. Put $final_number = array(); before the while().
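Putting the points of this answer together, a minimal sketch might look like the following. The function name buildNumberedLines() and the output directory in the commented usage are assumptions made for illustration; the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: turn the lines of the uploaded file into
// "word,number" entries, one entry per output line.
function buildNumberedLines(array $lines): string
{
    $final = array();                    // initialise before first use
    foreach ($lines as $line) {
        $line = trim($line);             // drop the trailing newline
        foreach (explode(' ', $line) as $word) {
            if ($word === '') {
                continue;                // skip blanks from empty lines or double spaces
            }
            $final[] = $word . ',' . rand(1000, 10000);
        }
    }
    return implode("\n", $final);        // one entry per line
}

// Usage sketch inside the upload handler (the target path is an assumption):
// if (is_uploaded_file($_FILES['file']['tmp_name'])) {
//     $lines = file($_FILES['file']['tmp_name']);
//     file_put_contents('/path/to/generated/' . time() . '.txt',
//                       buildNumberedLines($lines));
// }
```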
I am going to use a different approach. Let's say the data you want to save to the file is stored in the variable $data.
To append this data to the file, preceded by a comma, two lines of code are enough:
$previousFileContent = file_get_contents("filename.txt");
file_put_contents("filename.txt", trim($previousFileContent . "," . $data));

Read a path stored in a text file and use it in opendir of PHP

I want to store a path (pointing to a directory) in a text file and open the path when required in PHP. Here's what I have done, which is quite simple but doesn't really work.
$dir = file_get_contents('./dir_file');
$dir_content = get_fname($dir);
function get_fname($dir) {
$dirhandle = opendir($dir);
if (!$dirhandle) { exit; }
.........
}
The value of $dir is what it is in the text file. The code doesn't work. The function exits in the if statement.
I tried to replace the first line with
$dir = '/home/user/work'; //which is the path stored in the text file.
It works. So I suspect it's the problem of opendir. I can't figure out what causes this problem.
Any help will be appreciated. Many thanks.
Check if the file you're reading from has any line breaks, spaces, etc... after the actual path part. If you pass those in to opendir, it's going to look for a directory which has those literal characters in it, and most likely fail.
Adding a trim() call may help:
$dir = trim(file_get_contents('./dir_file'));
which will remove any such whitespace characters.
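To make the failure easier to spot, the read and the check can be wrapped together. This is a minimal sketch, assuming a hypothetical helper name readDirPath() and PHP 7.1 or later for the nullable return type:

```php
<?php
// Hypothetical helper: read a directory path from a text file and
// return it only if it points at an existing directory.
function readDirPath(string $file): ?string
{
    $raw = file_get_contents($file);
    if ($raw === false) {
        return null;              // the file itself could not be read
    }
    $dir = trim($raw);            // strip the trailing newline and spaces
    return is_dir($dir) ? $dir : null;
}

// Usage sketch:
// $dir = readDirPath('./dir_file');
// if ($dir === null) { /* report the bad path instead of exiting silently */ }
```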

How to check an executable's path is correct in PHP?

I'm writing a setup/installer script for my application, basically just a nice front end to the configuration file. One of the configuration variables is the executable path for mysql. After the user has typed it in (for example: /path/to/mysql-5.0/bin/mysql or just mysql if it is in their system PATH), I want to verify that it is correct. My initial reaction would be to try running it with "--version" to see what comes back. However, I quickly realised this would lead to me writing this line of code:
shell_exec($somethingAUserHasEntered . " --version");
...which is obviously a Very Bad Thing. Now, this is a setup script which is designed for trusted users only, and ones which probably already have relatively high level access to the system, but still I don't think the above solution is something I want to write.
Is there a better way to verify the executable path? Perhaps one which doesn't expose a massive security hole?
Running arbitrary user commands is like running queries based on user input... Escaping is the key.
First, validate that it is an executable using is_executable().
For the escaping itself, PHP exposes two functions: escapeshellarg() and escapeshellcmd().
escapeshellarg() adds single quotes around a string and quotes/escapes any existing single quotes allowing you to pass a string directly to a shell function and having it be treated as a single safe argument.
escapeshellcmd() escapes any characters in a string that might be used to trick a shell command into executing arbitrary commands.
This should limit the amount of risk.
if(is_executable($somethingAUserHasEntered)) {
shell_exec(escapeshellarg($somethingAUserHasEntered) . " --version");
}
After all, doing rm --version isn't very harmful, and "rm -rf / &&" --version won't get you anywhere very fast.
EDIT: Since you mentioned PATH... Here is a quick function to validate if the file is an executable according to PATH rules:
function is_exec($file) {
if(is_executable($file)) return true;
if(realpath($file) == $file) return false; // Absolute Path
$paths = explode(PATH_SEPARATOR, (string) getenv('PATH')); // $_ENV can be empty, depending on variables_order
foreach($paths as $path) {
// Make sure it has a trailing slash
$path = rtrim($path, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
if(is_executable($path . $file)) return true;
}
return false;
}
You could try a simple file_exists call to determine if something exists at that location, along with an is_executable to confirm that it's something you can run.
Have you looked at is_dir(), is_link(), is_file() or is_readable()?
Hope these help.
system('which '.escapeshellarg($input)) will give you the absolute path to the executable, regardless if it's just the name or an absolute path.

PHP - Open or copy a file when knowing only part of its name?

I have a huge repository of files that are organized in numbered folders. Each folder contains a file whose name starts with a unique number followed by an unknown string of characters. Given the unique number, how can I open or copy this file?
For example:
I have been given the number '7656875' and nothing more.
I need to interact with a file called '\server\7656800\7656875 foobar 2x4'.
How can I achieve this using PHP?
If you know the directory name, consider using glob()
$matches = glob('./server/dir/'.$num.'*');
Then if there is only one file that should start with the number, take the first (and only) match.
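Taking "the first (and only) match" could look roughly like this. It is a sketch only: the helper name findByNumber() is an assumption, the nullable return type assumes PHP 7.1 or later, and glob metacharacters (such as * or [) in the directory or number would be interpreted as part of the pattern.

```php
<?php
// Hypothetical helper: return the first file in $dir whose name starts
// with $num, or null if nothing matches.
function findByNumber(string $dir, string $num): ?string
{
    $matches = glob(rtrim($dir, '/') . '/' . $num . '*');
    // glob() returns an empty array when nothing matches and false on error.
    if ($matches === false || count($matches) === 0) {
        return null;
    }
    return $matches[0];
}

// Usage sketch (the directory is taken from the question's example):
// $file = findByNumber('./server/7656800', '7656875');
// if ($file !== null) { copy($file, '/somewhere/else'); }
```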
Like Yacoby suggested, glob should do the trick. You can have multiple placeholders in it as well, so if you know the depth, but not the correct naming, you can do:
$matchingFiles = glob('/server/*/7656875*');
which would match
"/server/12345/7656875 foo.txt"
"/server/56789/7656875 bar.jpg"
but not
"/server/12345/subdir/7656875 foo.txt"
If you do not know the depth, glob() won't help. Instead, you can use a RecursiveDirectoryIterator, passing in the topmost folder path, e.g.
$iterator = new RecursiveIteratorIterator(
new RecursiveDirectoryIterator('/server'));
foreach($iterator as $fileObject) {
// assuming the filename begins with the number
if(strpos($fileObject->getFilename(), '7656875') === 0) {
// do something with the $fileObject, e.g.
copy($fileObject->getPathname(), '/somewhere/else');
echo $fileObject->openFile()->fpassthru();
}
}
* Note: code is untested but should work
DirectoryIterator returns SplFileInfo objects, so you can use them to directly access the files through a high-level API.
$dir = '\\server\\' . $specialNumber . '\\';
$result = system('ls ' . escapeshellarg($dir)); // system() returns the last line of output
$fh = fopen($dir . $result, 'r');
If it's hidden somewhere below in subdirectories of variable depth, use find
echo `find . -name "*$input*"`;
Explode and trim each result, then hope you found the correct one.

How to conduct File path operation in PHP?

Something like:
/directory/a/b - /directory/ = a/b
Is it possible to do this easily?
Since you're working with paths, platform sensitivity is important; Windows has a different path separator than most other platforms, and to write reusable code you can't snub a platform.
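PHP has a few functions to deal with paths. If you're handed a really strange path like /home/foo/bar//bitty/../index.php, use realpath to clean that up for you. Note that realpath only resolves paths that exist (it returns false otherwise) and does not expand a shell-style ~.
$path = realpath("/home/foo/bar//bitty/../index.php");
/* output: /home/foo/bar/index.php */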
Other functions will aid you -- for example, to get the path part of a filename by itself, use dirname:
print dirname($path);
/* output: /home/foo/bar */
Once you have that, split on the separators and do whatever work you want. The real trick is having PHP worry about all the weirdness in paths for you, and then just working with each part separately. Look into pathinfo and basename as well. I think this is what you were asking for, not how to do dumb string replacements.
Don't forget to guard your application against injection! Working with paths from Web input is dangerous. Never trust user input.
echo str_replace("/directory/","","/directory/a/b");
And to use this on other types of strings, your full string goes in the third parameter, and whatever you're "subtracting" goes as the first parameter.
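One caveat with str_replace() is that it removes every occurrence of the search string, not only a leading one. If only the prefix should be "subtracted", a minimal sketch could look like this (the helper name stripPrefix() is an assumption for illustration, and the type declarations assume PHP 7 or later):

```php
<?php
// Hypothetical helper: remove $prefix only when $path starts with it.
function stripPrefix(string $path, string $prefix): string
{
    if ($prefix !== '' && strncmp($path, $prefix, strlen($prefix)) === 0) {
        return substr($path, strlen($prefix));
    }
    return $path;   // prefix not at the start: leave the path untouched
}

echo stripPrefix('/directory/a/b', '/directory/'), "\n";   // a/b
echo stripPrefix('/x/directory/a', '/directory/'), "\n";   // /x/directory/a
```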
Using the dirname() function and some string functions, you can cut the original path up and get the pieces.
<?php
// from: http://php.net/manual/en/function.dirname.php
$path = "/dirname/a/b";
$dir = dirname(dirname($path));
echo "dir at front=$dir\n";
$len = strlen($dir);
$dirname = substr ( $path, 0, $len+1 );
echo "dirname=$dirname\n";
$last_2 = substr ( $path, $len+1 );
echo "last_2=$last_2\n";
?>
results in
$ php x.php
dir at front=/dirname
dirname=/dirname/
last_2=a/b
