REAL basename vs PHP basename (pathinfo) - php

I've got for example a watermark file: ROOT.'/media/watermarks/1.jpg'.
In the future, users will be able to use names such as 'watermark-filename', 'watermark-basename', 'watermark-directory', etc. (in a custom PHP template system) to get the data they need.
I'm trying to come up with reasonable global variable names.
The question is, what does really 'basename' mean?
Terminal:
basename /path/to/source/file.ext -> "file"
PHP:
<?php
echo basename('/path/to/source/file.ext'); // file.ext
$path_parts = pathinfo('/path/to/source/file.ext');
echo $path_parts['basename']; // file.ext
?>
Wikipedia:
Many file systems, including FAT, NTFS, and VMS systems, allow a filename extension that consists of one or more characters following the last period in the filename, dividing the filename into two parts: a basename or stem and an extension or suffix used by some applications to indicate the file type.
I know Wikipedia is not a source, but according to my best knowledge, in operating systems
filename = file.ext
basename = file
extension = ext
While in php:
filename = file
basename = file.ext
extension = ext
Why?
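For reference, here is a minimal sketch of what PHP itself reports for the path used in the question. The variable names are illustrative only. The optional second argument of basename() is the closest PHP equivalent of the "stem".

```php
<?php
$path = '/path/to/source/file.ext';

// pathinfo() splits the path purely as a string; the file need not exist.
$parts = pathinfo($path);
echo $parts['dirname'], "\n";   // /path/to/source
echo $parts['basename'], "\n";  // file.ext
echo $parts['extension'], "\n"; // ext
echo $parts['filename'], "\n";  // file

// Passing a suffix as the second argument strips it from the result.
$stem = basename($path, '.ext');
echo $stem, "\n";               // file
```

The POSIX basename utility takes an optional suffix operand in the same way: `basename /path/to/source/file.ext .ext` prints `file`, while the one-argument form prints `file.ext`.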

Related

Use PHP to write a file to Windows that contains Japanese characters in the filename

I want to save a file to Windows using Japanese characters in the filename.
The PHP file is saved with UTF-8 encoding
<?php
$oldfile = "test.txt";
$newfile = "日本語.txt";
copy($oldfile,$newfile);
?>
The file copies, but its name appears garbled in Windows (the UTF-8 bytes are displayed as mojibake, e.g. æ—¥æœ¬èªž.txt).
How do I make it save as
日本語.txt
?
I have ended up using the php-wfio extension from https://github.com/kenjiuno/php-wfio
After putting php_wfio.dll into php\ext folder and enabling the extension, I prefixed the filenames with wfio:// (both need to be prefixed or you get a Cannot rename a file across wrapper types error)
My test code ends up looking like
<?php
$oldfile = "wfio://test.txt";
$newfile = "wfio://日本語.txt";
copy($oldfile,$newfile);
?>
and the file gets saved in Windows as 日本語.txt which is what I was looking for
Starting with PHP 7.1, I would point you to this answer: https://stackoverflow.com/a/38466772/3358424. Unfortunately, most of the recommendations listed in the answer that claims to be the only correct one are not valid. Suggestions like "just urlencode the filename" or "the FS expects iso-8859-1" are wrong assumptions that misinform people. They can work by luck, but only for US or other Western codepages; otherwise they are simply wrong. PHP 7.1+ with default_charset=UTF-8 is what you want. With earlier PHP versions, wfio or wrappers around ext/com_dotnet might indeed be helpful.
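A minimal sketch of the PHP 7.1+ approach described above, assuming the script itself is saved as UTF-8. Writing into the system temp directory is an assumption made only to keep the example self-contained.

```php
<?php
// Assumption: PHP 7.1 or later. UTF-8 is already the default value of
// default_charset; it is set explicitly here only to make the dependency visible.
ini_set('default_charset', 'UTF-8');

$dir = sys_get_temp_dir();
$oldfile = $dir . DIRECTORY_SEPARATOR . 'test.txt';
$newfile = $dir . DIRECTORY_SEPARATOR . '日本語.txt';

// Create a source file so the copy has something to work with.
file_put_contents($oldfile, "hello\n");

if (!copy($oldfile, $newfile)) {
    fwrite(STDERR, "copy failed\n");
}
```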
Thanks.

Finding file without knowing the extension in PHP

I have a bunch of uniquely named images with different extensions, if I have one of the unique names, but I don't know the extension (it's an image extension), how can I find the image extension as fast as possible? I've seen other people doing this by searching all possible file extensions on that file name, but it seems too slow to try and load 6 different possible combinations before bringing up the original image.
Does anyone know an easier way?
You could use glob() for this. It might not be the best solution, but it is simple:
The glob() function searches for all the pathnames matching pattern
according to the rules used by the libc glob() function, which is
similar to the rules used by common shells.
$files = glob('filenamewithoutextension.*');
if ($files) { // glob() returns false on error and an empty array when nothing matches
    $file = $files[0]; // There might be more than one hit, but we are only interested in the first one
}
After getting the filename you can use pathinfo to get the specific extension.
$extension = pathinfo($file, PATHINFO_EXTENSION);
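The two steps above can be combined into one small helper. This is only a sketch: findByStem() is a hypothetical name, and it assumes the stem contains no glob metacharacters such as * ? or [.

```php
<?php
/**
 * Return the first file in $dir named "<stem>.<anything>", or null if none exists.
 * findByStem() is a hypothetical helper, not a PHP built-in.
 */
function findByStem(string $dir, string $stem): ?string
{
    $matches = glob($dir . '/' . $stem . '.*');
    // glob() returns false on error and an empty array when nothing matches.
    if (!$matches) {
        return null;
    }
    return $matches[0];
}

// Usage: look up the image, then read its extension with pathinfo().
$file = findByStem(__DIR__, 'filenamewithoutextension');
if ($file !== null) {
    $extension = pathinfo($file, PATHINFO_EXTENSION);
}
```

This costs a single directory lookup instead of one file_exists() call per candidate extension.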

php better way to handle file extension [duplicate]

Possible Duplicate:
How to extract a file extension in PHP?
Get the file extension (basename?)
Trying to learn from other people's code, I see a lot of methods to strip the extension from a filename, but most of them seem too localized, as they assume a certain condition. For example:
This will assume only 3-character extension (like .txt, .jpg, .pdf)
substr($fileName, 0, -4);
or
substr($fileName, 0, strrpos($fileName, '.'));
But this can cause problems with extensions like .jpeg, .tiff or .html, or with two-character extensions like .js or .pl
(browsing this list shows that some extensions have only 1 character, and some as many as 10 (!) )
Some other methods I have seen rely on the dot (.)
for example :
return key(explode(".", $filename));
Can cause problems with filenames like 20121029.my.file.name.txt.jpg
same here :
return preg_replace('/\.[^.]*$/', '', $filename);
Some people use pathinfo($file) and/or basename() (is it ALWAYS safe?)
basename($filename);
and many, many other methods.
So my question has several parts:
What is the best way to "strip" a file extension (including the dot)?
What is the best way to "get" the file extension (without the dot) and/or check it?
Will PHP's own functions (basename) recognize ALL extensions, regardless of how exotic they might be or how the filename is constructed?
What influence, if any, does the OS have on the matter (Windows, Linux, Unix...)?
All those small sub-questions, which I would like to have answered, can be summed up in a single overall question:
Is there a bullet-proof, overall, always-work, fail-proof, best-practice über-function that will work under any and all conditions?
EDIT I - another file extension list
Quoting from the duplicate question's top answer:
$ext = pathinfo($filename, PATHINFO_EXTENSION);
This is the best available way to go. It is built into PHP, and it is the best you can do. I know of no cases where it doesn't work.
One exception would be a file extension that itself contains a dot (.). But no sane person would introduce a file extension like that, because it would break everywhere and it would also break the implicit convention.
for example in a file 20121021.my.file.name.txt.tar.gz - tar.gz would be the extension.
Nope, it's much simpler - and maybe that is the root of your worries. The extension of 20121021.my.file.name.txt.tar.gz is .gz. It is a gzipped .gz file for all intents and purposes. Only when you unzip it, it becomes a .tar file. Until then, the .tar in the file name is meaningless and serves only as information for the gunzip tool. There is no file extension named .tar.gz.
That said, detecting the file extension will not help you determine whether a file is actually of the type it claims. But I'm sure you know that, just putting this here for future readers.
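A short sketch of both directions (getting the extension and stripping it) with pathinfo(), using the file name from the discussion above. The allow-list at the end is purely illustrative.

```php
<?php
$filename = '20121021.my.file.name.txt.tar.gz';

// Everything after the last dot is the extension.
$ext = pathinfo($filename, PATHINFO_EXTENSION);  // "gz"

// PATHINFO_FILENAME is the name with that last extension (and its dot) removed.
$stem = pathinfo($filename, PATHINFO_FILENAME);  // "20121021.my.file.name.txt.tar"

// A name without a dot yields an empty string rather than an error.
$none = pathinfo('README', PATHINFO_EXTENSION);  // ""

// Checking an extension: normalise the case first. The list is an assumption.
$allowed = ['jpg', 'jpeg', 'png', 'gif'];
$isImage = in_array(strtolower(pathinfo('Photo.JPG', PATHINFO_EXTENSION)), $allowed, true);
```

Both calls work on the string alone, so they behave the same on every operating system as long as the path uses separators PHP recognises on that platform.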

shell_exec() statement to pdftotext entire directory?

I'm at a loss as to how I could build a loop to run pdftotext on an entire directory through a shell_exec() statement.
Something like :
$pdfs = glob("*.pdf");
foreach($pdfs as $pdfs) {
shell_exec('pdftotext '.$pdfs.' '.$pdfs'.txt');
}
But I'm unsure how I can drop the .pdf extension the 2nd time I call $pdfs in my shell_exec() statement and replace that with .txt
Not really sure this loop is correct either....
Try
foreach (glob("*.pdf") as $src) {
    // Manually remove file extension because glob() may return a dir path component
    $parts = explode('.', $src);
    $parts[count($parts) - 1] = 'txt';
    $dest = implode('.', $parts);
    // Escape shell arguments, just in case
    shell_exec('pdftotext '.escapeshellarg($src).' '.escapeshellarg($dest));
}
Basically, loop over the PDF files in the directory and execute the command for each one, replacing the extension to build the output file name (so test.pdf becomes test.txt). An earlier version of this answer extracted the name with pathinfo(); see the edit below for why that changed.
Using the result of glob() directly in foreach easily avoids the variable naming collision you had in the code above.
EDIT
I have changed the above code to manually remove the file extension when generating the output file name. This is because glob() may return a directory component in the path strings, as well as just a file name. Using pathinfo() or basename() would strip this off, and since we know that a . will be present in the file name (the pattern passed to glob() dictates this), we can safely replace everything after the last one. I have also added escapeshellarg() for good measure: it is unlikely that an existing file name would fall foul of this, but it is best to be safe.
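$pdfs = glob("*.pdf") ?: [];
$fmt = '/path/to/pdftotext %s %s';
foreach ($pdfs as $thispdf) {
    // Escape both arguments so spaces or quotes in a file name cannot break the command
    shell_exec(sprintf($fmt, escapeshellarg($thispdf), escapeshellarg(basename($thispdf, ".pdf") . ".txt")));
}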
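Another way to keep the directory component while swapping the extension is to rebuild the path from the pieces pathinfo() returns. This is a sketch, not either answerer's method: pdftotextCommand() is a hypothetical helper, and pdftotext is assumed to be on the PATH.

```php
<?php
/**
 * Build the pdftotext command for one PDF, writing the .txt file next to the source.
 * pdftotextCommand() is a hypothetical helper, not part of PHP or of pdftotext.
 */
function pdftotextCommand(string $src): string
{
    $info = pathinfo($src);
    // dirname is "." for a bare file name, so the result stays a valid relative path.
    $dest = $info['dirname'] . '/' . $info['filename'] . '.txt';
    return 'pdftotext ' . escapeshellarg($src) . ' ' . escapeshellarg($dest);
}

// glob() can return false on error, hence the fallback to an empty array.
foreach (glob('*.pdf') ?: [] as $src) {
    shell_exec(pdftotextCommand($src));
}
```

Separating command construction from execution also makes it easy to print the commands first and check them before anything is run.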

File naming convention and allowed characters between different OS

I'm writing a piece of code in PHP for saving email attachments. Can I assume that this will never fail because of differences in allowed characters between operating systems?
foreach ($message->attachments as $a)
{
    // Make dir if not exists
    $dir = __DIR__ . "/saved/$uid"; // Message id
    if (!file_exists($dir)) mkdir($dir) or die("Cannot create: $dir");
    // Save the attachment using original!!! filename as found in email
    $fp = fopen($dir . '/' . $a->filename, 'w+');
    fwrite($fp, $a->data);
    fclose($fp);
}
You should never use a name that you have no control over, it can contain all sorts of characters, like ../../...
You can use a function like basename to clean it up and a constant like DIRECTORY_SEPARATOR to separate directories.
Personally I would rename the file but you can also filter the variables before using them.
It is good practice to replace certain characters that may occur in filenames on Windows.
Unix can handle almost any character in a file name (except "/" and 0x00, the null character), but to prevent encoding problems and difficulties when downloading a file I would suggest replacing anything that does not match
/[A-Za-z0-9._-]/, which corresponds to the POSIX fully portable filename character set.
So preg_replace('/[^A-Za-z0-9._-]/', '_', $filename); will do a good job.
A more generous approach would be to replace only |\?*<":>+[]/ and \x00, which leaves language-specific characters like öäü untouched and is compatible with FAT32, NTFS, any Unix and Mac OS X.
In that case use preg_replace('/[|\\\\?*<":>+\[\]\/\x00]/', '_', $filename);
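Putting the two suggestions together (drop any directory part, then restrict the character set) could look like the sketch below. sanitizeFilename() and the fallback name "unnamed" are assumptions for illustration, not an established API.

```php
<?php
/**
 * Reduce an untrusted file name to the POSIX portable filename character set.
 * sanitizeFilename() is a hypothetical helper, not a PHP built-in.
 */
function sanitizeFilename(string $name): string
{
    // Treat backslashes as separators too, then drop any directory part.
    $name = basename(str_replace('\\', '/', $name));
    // Replace every byte outside A-Z a-z 0-9 . _ - with an underscore.
    $name = preg_replace('/[^A-Za-z0-9._-]/', '_', $name);
    // Avoid empty names and the special directory entries "." and "..".
    if ($name === '' || $name === '.' || $name === '..') {
        $name = 'unnamed';
    }
    return $name;
}

echo sanitizeFilename('../../etc/passwd'), "\n";       // passwd
echo sanitizeFilename('my report (final).pdf'), "\n";  // my_report__final_.pdf
```

Because the pattern has no /u modifier, a multi-byte character such as ö is replaced by one underscore per byte. Use the more generous pattern above if such characters should survive.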
No, you should assume that this has a high probability of failing, for two reasons:
What if two emails have files with the same name (selfie.jpg, for example)?
What if the filename contains unacceptable characters?
You should use an internal naming convention (user+datetime+sequential, for example) and save the names in a MySQL table with at least 3 fields:
Id - auto-incremented
filename - as saved by your PHP code
original name - as in the original email
optional username - or user code or email address of whoever sent the email
optional datetime stamp
Save the original filename as a VARCHAR and you will be able to keep track of the original name and even show it, search for it, etc.
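A sketch of generating the stored name independently of the sender's file name. It swaps the user+datetime+sequential scheme for a random name, which is an assumption rather than what the answer prescribes. storageRecord() and its array keys are hypothetical and simply mirror the columns listed above.

```php
<?php
/**
 * Build the row to store for one attachment.
 * storageRecord() and its array keys are hypothetical; adapt them to your table.
 */
function storageRecord(string $originalName, string $sender): array
{
    // Keep only a harmless, lower-cased version of the original extension.
    $ext = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));
    $ext = preg_replace('/[^a-z0-9]/', '', $ext);

    // 16 random bytes give a 32-character hex name that will not collide in practice.
    $stored = bin2hex(random_bytes(16)) . ($ext !== '' ? '.' . $ext : '');

    return [
        'filename'      => $stored,        // name used on disk
        'original_name' => $originalName,  // kept only for display and search
        'username'      => $sender,
        'created_at'    => date('Y-m-d H:i:s'),
    ];
}
```

Two attachments that are both called selfie.jpg then get different names on disk, while the original name stays available in the table.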
