I'm trying to make a file-based article system where the folders are my categories and the files inside them are the actual articles. But sometimes I need special characters in my folder/file names (\ / : * ? " < > |; actually I'm only interested in double quotes and the question mark). Is there a way to do the trick, something like the &amp; entity in HTML? Thanks.
Short answer: an operating system could in principle support such file names, but yours doesn't seem to.
There is no simple escape like &amp; for this. You could store the real file name, or make a conversion table so that something like _____questionmark_____ is converted into that symbol (or something silly like that), but then you run into problems whenever that particular string appears in a real name.
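A conversion table only becomes unambiguous if the escape marker itself is also escaped. The sketch below swaps in a percent-style scheme (the same idea as URL encoding, not what this answer proposes): every forbidden character, plus "%" itself, becomes "%XX", so rawurldecode() can restore the title exactly. The function names and the character list are assumptions for illustration.

```php
<?php
// Sketch (assumption): reversible escaping for file names using "%XX" hex codes.
// "%" is escaped too, so a literal "%3F" in a title becomes "%253F" and can never
// be mistaken for an escaped question mark when decoding.
const FORBIDDEN_CHARS = ['%', '\\', '/', ':', '*', '?', '"', '<', '>', '|'];

function encode_title(string $title): string
{
    $map = [];
    foreach (FORBIDDEN_CHARS as $ch) {
        $map[$ch] = sprintf('%%%02X', ord($ch)); // e.g. "?" becomes "%3F"
    }
    // strtr() with an array does not re-scan replaced text, so "%3F" stays intact.
    return strtr($title, $map);
}

function decode_title(string $name): string
{
    // Every "%" in an encoded name starts a "%XX" code, so this is unambiguous.
    return rawurldecode($name);
}
```

Usage: encode_title('Why "PHP"?') gives Why %22PHP%22%3F, and decode_title() turns that back into the original title. Windows has further restrictions (reserved names such as CON, trailing dots and spaces) that this sketch does not handle.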
Fundamentally though, you should store the title separately from the file itself. A database would be an appropriate location.
At a deeper level, if you're asking a question like this, I think it's safe to say that allowing users to specify filenames on your system is likely to be a large security risk.
There are only a few special characters that aren't allowed in file names, so define your own unusual sequence for each of them. For instance, replace every '?' with '#quest' before creating files, and do the reverse when you read them. Isn't that good enough? By an unusual sequence I mean a combination of characters we don't normally type, like '#quest'.
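A minimal sketch of this approach, using strtr() in both directions. '#quest' is the answer's example; '#quot' for the double quote and the function names are hypothetical.

```php
<?php
// Sketch: swap each forbidden character for a marker before creating the file,
// and swap the markers back when reading file names.
// '#quest' is from the answer; '#quot' is a hypothetical marker for the double quote.
const MARKERS = ['?' => '#quest', '"' => '#quot'];

function to_file_name(string $title): string
{
    return strtr($title, MARKERS);
}

function from_file_name(string $name): string
{
    // array_flip() gives the reverse table: '#quest' => '?', '#quot' => '"'.
    return strtr($name, array_flip(MARKERS));
}
```

The weak spot is the one raised in the previous answer: a title that already contains "#quest" comes back as "?", so the round trip is only safe if such titles never occur.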
I would recommend using a .htaccess file to pass the "filename" as an argument to your PHP script. Subsequently have the script look up the article in a database lookup table that points to the article file.
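A rough sketch of the lookup side, assuming the requested name has already reached the script (for example as a query parameter produced by a rewrite rule in .htaccess). The array stands in for the database table, and all names and paths are hypothetical.

```php
<?php
// Sketch: map a requested article title to a stored file through a lookup table.
// The array stands in for a database table; titles and paths are hypothetical.
function find_article_file(array $table, string $requested): ?string
{
    // Only names present in the table can be served, so no path built from
    // user input ever reaches the filesystem.
    return $table[$requested] ?? null;
}

$articles = [
    'Why "PHP"?'   => 'articles/php/0001.txt',
    'What is XFA?' => 'articles/pdf/0002.txt',
];
```

Because the real file names are plain identifiers, the titles can contain any characters at all, including ? and ".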
I have a site where users can change their location.
I have all of the available countries stored in a DB and an image for each of these in a folder in the same directory.
However, some countries have special characters and don't display properly or else can't find the image.
The countries in question are:
Côte d'Ivoire
Česká republika
I tried url encoding them so it was like this: %C4%8Cesk%C3%A1+republika
I need a way to store these in the DB so that the names display correctly on the site and the image with the same name can still be found.
First of all, see UTF-8 all the way through for all the things you need to do correctly to make non-ASCII characters work in your app in general.
Secondly, it's… tricky… to serve files with non-ASCII file names over the web. 1) You need to ensure that you encode all URLs for these files with percent encoding, as you already seem to do. 2) The web server will take that URL, percent-decode it to a byte string, and then ask the underlying operating/file system to look for a file whose name is that string. This is the tricky part: you won't know exactly which byte string your OS/file system uses to represent that file name. You would need to figure that out first, then encode the URL so that it decodes to exactly that string.
And when you move to a different server, especially from Windows to *NIX or vice versa, you may have to do it all over again, since those systems do things very differently.
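For step 1, a small sketch assuming the images live under a hypothetical /flags/ path. It uses rawurlencode() rather than urlencode(): the latter turns the space into "+", as in the URL from the question, and in a URL path a "+" is normally taken literally rather than as a space. This only covers step 1; it does nothing about how the file system stores the name.

```php
<?php
// Sketch: build an image URL from a UTF-8 country name.
// rawurlencode() follows RFC 3986 and encodes a space as %20 (urlencode() uses "+").
// The "/flags/" prefix and ".png" suffix are hypothetical.
function flag_url(string $countryName): string
{
    return '/flags/' . rawurlencode($countryName) . '.png';
}
```

For example, flag_url('Česká republika') gives /flags/%C4%8Cesk%C3%A1%20republika.png.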
In a nutshell, it's often more hassle than it's worth, and you should store your images with ASCII-only names to avoid all that. Specifically for countries, it'd make a whole lot of sense to use the two-character country codes for the image name (e.g. "cz.jpg").
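A sketch of the country-code approach: the display name stays in the database in UTF-8, and the image is named after the ISO 3166-1 alpha-2 code. The array stands in for the countries table, and the images/flags/ directory is an assumption.

```php
<?php
// Sketch: ASCII-only image names derived from the country code.
// The array stands in for a "countries" table; the directory is hypothetical.
$countries = [
    'CI' => "Côte d'Ivoire",
    'CZ' => 'Česká republika',
];

function flag_path(string $code): string
{
    // Lower-case the code for the file name, e.g. "CZ" gives "cz.jpg".
    return 'images/flags/' . strtolower($code) . '.jpg';
}

foreach ($countries as $code => $name) {
    // htmlspecialchars() keeps the apostrophe in "Côte d'Ivoire" from breaking the attribute.
    printf("<img src=\"%s\" alt=\"%s\">\n", flag_path($code), htmlspecialchars($name, ENT_QUOTES, 'UTF-8'));
}
```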
I can't seem to find a reference. I am assuming the PHP function file_exists uses system calls on Linux and that these are safe for any string that does not contain a \0 character, but I would like to be sure.
Does anyone have (preferably non-anecdotal) information about this? Is it vulnerable to injection if I don't check the strings first?
I think you do need to, because the user may enter something like:
../../../somewhere_else/some_file and access a file that they are not allowed to access.
I suggest that you build the absolute path of the file independently in your PHP code and take only the file name from the user, via basename()
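or reject any input containing .. outright (deleting "../" with str_replace() is not enough, because "....//" turns back into "../" after the replacement), for example:
if (strpos($input, '..') !== false) { exit('Invalid file name'); }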
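A minimal sketch of the basename() suggestion. Note that basename() can still return "", "." or "..", so those are rejected as well; the helper name and the base directory in the usage are hypothetical.

```php
<?php
// Sketch: build a path inside $baseDir from a user-supplied name.
function safe_path(string $baseDir, string $userInput): ?string
{
    // basename() drops any directory part, so "../../etc/passwd" becomes "passwd".
    $name = basename($userInput);
    // basename() can still return "", "." or "..", none of which is a usable file name.
    if ($name === '' || $name === '.' || $name === '..') {
        return null;
    }
    return rtrim($baseDir, '/') . '/' . $name;
}
```

For example, safe_path('/var/articles', '../../etc/passwd') gives /var/articles/passwd, which stays inside the base directory.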
It depends on what you're trying to protect against.
file_exists doesn't do any writing to disk, which means that the worst that can happen is that someone gains some information about your file system or the existence of files that you have.
In practice, however, if you later do something else with the same file that was checked with file_exists, such as including it, you may wish to perform more stringent checks.
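One way to do such a stricter check, sketched under the assumption that the file must live inside a known directory: resolve both paths with realpath(), which removes "..", "." and symlinks, and confirm the result starts with the base directory. realpath() returns false for files that do not exist, so this only works once the file exists. The helper name is hypothetical.

```php
<?php
// Sketch: return the canonical path only if it lies inside $baseDir.
function path_inside(string $baseDir, string $candidate): ?string
{
    $base = realpath($baseDir);
    $real = realpath($candidate); // resolves "..", "." and symlinks; false if missing
    if ($base === false || $real === false) {
        return null;
    }
    // Compare with a trailing separator so "/srv/app2" does not match "/srv/app".
    $prefix = rtrim($base, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    return strncmp($real, $prefix, strlen($prefix)) === 0 ? $real : null;
}
```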
I'm assuming that you may be passing arbitrary values, possibly sourced from user input, into this function.
If that is the case, it somewhat depends on why you need to use file_exists in the first place. In general, for any filesystem function that users can pass values directly into, I'd filter the string as strictly as possible. This is really just being pedantic and on the safe side, and may be unnecessary in practice.
So, for example, if you only ever need to check the existence of a file in a single directory, you should probably strip out directory delimiters of all sorts.
From personal experience, the only time I've passed user input into a file_exists call was for mapping to a controller file, in which case I'd just strip out every character that isn't alphanumeric or an underscore.
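A sketch of that whitelist approach; the controller directory and the ".php" suffix are assumptions.

```php
<?php
// Sketch: keep only letters, digits and underscores, then build the controller path.
// The directory layout and ".php" suffix are hypothetical.
function controller_file(string $controllerDir, string $input): ?string
{
    $name = preg_replace('/[^A-Za-z0-9_]/', '', $input);
    if ($name === '') {
        return null; // nothing usable left after filtering
    }
    return $controllerDir . '/' . $name . '.php';
}
```

Filtering rather than rejecting means "../../etc/passwd" quietly becomes etcpasswd.php inside the controller directory, which is harmless but may be surprising.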
UPDATE: having read the comments you recently added: no, there aren't any special characters, since this isn't executed in a shell. Even \0 should be fine, at least on newer PHP versions, which reject paths containing \0 (I believe older ones would cut the string off at the \0 when passing it to the underlying filesystem calls).
I'm stuck on a crazy project that has me looking for a strange solution. I've got an XFA PDF document generated by an outside party. There are several checkmark characters '✓' in the PDFs that I simply need to change to 'X'. The reason for this is beyond my control. I'm just looking for a way to change the ✓s into Xs. Can anyone point me in the right direction? Is it possible?
Currently we use PHP and TCPDF to create "our" server PDFs, but this particular PDF is generated outside of my control by a third party that doesn't want to change the way they do things. To make things worse, I don't know how many checkmarks there are or where they may be. It's just one very specific character that needs changing. Does anyone know a way of hacking the document to change the character?
Character U+2713 (✓)
http://www.fileformat.info/info/unicode/char/2713/index.htm
Yes, I think you can. To my (rather limited) knowledge of the PDF format, you can only reliably search and replace strings one character long, since text is created by placing strings of variable length at specific coordinates, in an arbitrary order. The string 'hello' could therefore be one string of five letters, five strings of one letter each, or some combination thereof, all placed in the correct positions (and in whatever order the print driver decided upon).
I'm afraid I don't know of any libraries that will do this, but I'd be surprised if none exist. You'll need to read the PDF objects in, do the replacement, and write them out to a new file. I'd start by researching the answers to this question.
Edit: this looks like it might be useful.
I have a PHP page that accepts input from a form post, but instead of directing that input to a database it is used to retrieve a file from the file system. What is a good method for escaping a string destined for the file system rather than a database? Is mysql_real_escape_string() appropriate?
If you're using user-provided input to specify a file name or directory, you'll have to make sure that the provided filename/path isn't trying to break "out" of your site's playground.
e.g. having something like
readfile($_GET['filepath']);
will send out ANYTHING on your server whose path the attacker knows. Even something like
readfile('/path/to/your/site/download/' . $_GET['filepath']);
accomplishes the same, if the user specifies enough '../../../' to get to whatever file they want.
mysql_real_escape_string() is NOT appropriate for this, as you're not doing a database operation. Use appropriate tools for appropriate jobs. In a goofy way, m_r_e_s() is a banana, and you need a giraffe. Something like
readfile('/path/to/your/site/download/' . basename($_GET['filepath']));
would be relatively safe, as basename() will extract only the filename portion of the user-provided path, so even if they pass in ../../../../../etc/passwd, basename() will return only passwd.
You only ever need to escape characters that would otherwise be interpreted by your target system. For databases you usually make sure to escape quotes, so you use mysql_real_escape_string or similar. If your target is HTML, you usually use htmlspecialchars to get rid of HTML special characters (namely <, > and &). If your target is CSV, you basically only need to make sure that fields containing line breaks, the separator, or quote characters are enclosed in quotes, with any embedded quotes doubled.
So depending on your target, you can reuse an existing escape function, define your own, or even go without one. If all you do is dump the input into a single file, there is not much you need to take care of, as long as you specify the file name yourself and that file is never used (or interpreted) by anything other than your application.
So think of what kind of special characters your target format requires for it to work, and simply escape those. You can usually ignore the rest.
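To make "escape for the target" concrete, a sketch of two of the targets mentioned above using standard PHP functions. The explicit arguments to fputcsv() are a choice made here so the output does not depend on version-specific defaults; the wrapper names are hypothetical.

```php
<?php
// Sketch: escape for the target format, not for "everything".

// HTML target: htmlspecialchars() handles &, <, > and, with ENT_QUOTES, both quote types.
function for_html(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

// CSV target: fputcsv() encloses fields that need it and doubles embedded quotes.
// Delimiter, enclosure and an empty escape character are passed explicitly.
function for_csv(array $fields): string
{
    $fh = fopen('php://memory', 'r+');
    fputcsv($fh, $fields, ',', '"', '');
    rewind($fh);
    $line = stream_get_contents($fh);
    fclose($fh);
    return $line;
}
```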
edit:
If you want to use the input as the file path or file name, you can simply decide for yourself how permissive you want to be and which characters you want to support. A simple method is to replace everything except Latin letters and digits (and maybe a few special characters like _ and -) with something else. For example:
preg_replace( '/[^A-Za-z0-9_-]/', '_', $text );
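The one-liner can be wrapped in a function to see what it does to typical input. The name safe_name() is hypothetical. Two properties are worth knowing: the dot in an extension is replaced too, and different titles can map to the same file name. Because the pattern has no /u modifier, each byte of a multi-byte UTF-8 character becomes its own underscore.

```php
<?php
// Sketch wrapping the regex from the answer; the function name is hypothetical.
// Without the /u modifier the pattern works on bytes, not on UTF-8 characters.
function safe_name(string $text): string
{
    return preg_replace('/[^A-Za-z0-9_-]/', '_', $text);
}
```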
Is there any way to check, using PHP, whether a PHP file has been obfuscated? I was thinking of a regex, maybe (for instance, ionCube's encoded files contain a very long alphabetic string, etc.).
One idea is to check for whitespace. The first thing that an obfuscator will do is to remove extra whitespace. Another thing you can look for is the number of characters per line, as obfuscators will put all the code into few (one?) lines.
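A sketch of the whitespace and line-length heuristic; it only computes the numbers, and any threshold for calling a file obfuscated is left to you. The function name is hypothetical.

```php
<?php
// Sketch: layout statistics for a chunk of PHP source.
// Obfuscated code tends to have few, long lines and little whitespace.
function layout_stats(string $source): array
{
    $lines = preg_split('/\R/', $source);        // split on any line ending
    $length = strlen($source);
    $whitespace = preg_match_all('/\s/', $source); // number of whitespace bytes
    return [
        'lines' => count($lines),
        'avg_line_length' => $length / max(1, count($lines)),
        'whitespace_ratio' => $length > 0 ? $whitespace / $length : 0.0,
    ];
}
```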
Often, obfuscators initialize very large arrays to translate variables into less meaningful names (e.g. see obfuscator article).
One technique may be to search for these very large arrays near the top of the class/file. You may be able to hook up Xdebug to examine or look for them. The whole thing, of course, depends on the obfuscation technique used. Check the obfuscator's source code; there may be patterns it uses that you can search for.
I think you can use token_get_all() to parse the file and then compute some statistics. For example, count function calls (in case the obfuscator uses an eval() of some string and nothing else) and calculate the average length of function and variable names: for obfuscated code it will usually be about 3-5 characters, while for normal PHP code it should be much longer. You can also use a dictionary lookup for function/variable names, check for comments, etc. I think if you know all the obfuscator formats you want to detect, it will be easy.
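A sketch of the token_get_all() idea, assuming the tokenizer extension is available (it usually is). It reports the average length of variable and identifier names and counts eval() calls; how to weigh these numbers is left open, as in the answer. The function name is hypothetical.

```php
<?php
// Sketch: token statistics that tend to differ between normal and obfuscated code.
function token_stats(string $source): array
{
    $names = [];
    $evals = 0;
    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            continue; // single-character tokens such as ";" or "("
        }
        [$id, $text] = $token;
        if ($id === T_VARIABLE) {
            $names[] = ltrim($text, '$'); // "$orderItems" counts as "orderItems"
        } elseif ($id === T_STRING) {
            $names[] = $text; // function names, calls and constants
        } elseif ($id === T_EVAL) {
            $evals++;
        }
    }
    $avg = $names ? array_sum(array_map('strlen', $names)) / count($names) : 0.0;
    return ['avg_name_length' => $avg, 'eval_calls' => $evals];
}
```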