PHP SimpleXML and escaped strings - php

I have a script that pulls some data out of a database and then sends it in XML format using SimpleXML. Clients that use the service load the XML using file_get_contents(url-to-script) and then use SimpleXML to find the relevant data and display it on a webpage.
The problem I am having is that the output on the webpage is always escaped. It doesn't matter how many times I run the strings I am getting back through stripslashes, it is still escaped.
Any ideas how to strip the slashes?
EDIT:
The data is put into the database through a 3rd party application, we never enter it programmatically. However, we extract from it in a number of other applications and it is not escaped.
We are not using addslashes on the data anywhere.
magic_quotes_gpc is on, all the other magic_quotes options are off

Please try to answer the questions below and update your question; it will help figure out where your problem is.
How is the data being input into the database?
Are you using addslashes on it?
Is magic_quotes turned on on your server?
Do you have an example of the XML data before entry into the database and when it comes out?
Can you post some of your code of where the xml data is entered into the database?
This is necessary information to help you without blindly guessing.
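One way to narrow this down is to check whether SimpleXML itself ever introduces a backslash. The following is a minimal sketch, not the asker's code: the sample value and element names are made up, and it assumes the SimpleXML extension is available. It builds a document, parses it back, and compares the result with the original string.

```php
<?php
// Hypothetical sample value; the real data would come from the database.
$original = 'It\'s a "quoted" value';

// Server side: build the XML document.
$doc = new SimpleXMLElement('<items/>');
$doc->addChild('item', $original);
$xml = $doc->asXML();

// Client side: parse it again and read the value back.
$parsed = simplexml_load_string($xml);
$roundTripped = (string) $parsed->item;

var_dump(strpos($xml, '\\') !== false);   // does the XML contain a backslash?
var_dump($roundTripped === $original);    // did the value survive unchanged?
```

If the value survives this round trip unchanged, the backslashes are more likely to be in the stored data or added somewhere between the database and the XML output.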

Related

PHP htmlspecialchars with MySQL - before or after storage?

I have always been told to "sanitize" input to a database and one of the ways to do this (as well as using prepared statements) is using htmlspecialchars() and htmlentities().
This stores quotes as &quot; so printing the output of the database to a page "naked" has never been a problem for XSS attacks etc.
However, I have been asked to have part of my application export certain values as pure data in .csv format and now it's full of said HTML entities.
It seems that I have two options:
Decode all values before exporting the data and leave everything else the way it is.
Exclude "sanitation" before input to the database and make sure to sanitize on the output instead (except for data exports).
As much information as there is out there, I can't find the generally accepted way to do this - is it best to do this process on the way in or way out of the database? Obviously, doing both gives me silly values like &amp;amp;
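The second option (store raw, encode on output) can be sketched as follows. This is only an illustration under assumptions: the function names html_cell and csv_export and the sample row are hypothetical, and the data is assumed to be stored unencoded.

```php
<?php
// Encode for HTML only at the moment of output.
function html_cell(string $raw): string
{
    return htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
}

// Write raw values to CSV; fputcsv() applies CSV quoting, not HTML encoding.
function csv_export(array $rows): string
{
    $fh = fopen('php://memory', 'w+');
    foreach ($rows as $row) {
        fputcsv($fh, $row, ',', '"', '\\');
    }
    rewind($fh);
    $csv = stream_get_contents($fh);
    fclose($fh);
    return $csv;
}

$rows = [['Tom & "Jerry"', 'R&D']];   // hypothetical raw values as stored
echo html_cell($rows[0][0]), "\n";    // HTML-encoded for a page
echo csv_export($rows);               // same data, no HTML entities
```

With this split, the CSV export never sees HTML entities, and encoding an already encoded value (the "doing both" case) is avoided because encoding happens in exactly one place.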

Using POST information in mySQL queries

Simple question, probably with a rather simple answer, but I haven't found anything online about it. I'm working on an Android app that needs to access a MySQL cloud database. To this end, I'm creating a PHP web service to interface the two (as well as provide an extra layer of protection for the MySQL credentials). The issue is that I'm using a POST method to send variables to the queries to tell them what operations to do, such as:
$db->query("DROP TABLE {$_POST[table]}");
The issue is that when I try to send values for something like an INSERT statement, I need to use quotes around the strings. Unfortunately, the UTF-8 encoding turns all the quotes into escaped quotes. What should I do to fix this?
Use stripslashes($str) to remove the slashes. I don't know where the slashes are coming from, but putting this into the code removes them.
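Separately from the slashes, building SQL directly from $_POST is open to SQL injection. A hedged sketch of a safer shape is below: values go through a prepared statement, and table names (which cannot be bound as parameters) are checked against a fixed list. The function names, the table names notes and drafts, and the body column are all assumptions made up for illustration; the PDO connection is assumed to exist elsewhere.

```php
<?php
// Accept a table name only if it is on a fixed allow-list.
function pick_table(string $requested, array $allowed): string
{
    if (!in_array($requested, $allowed, true)) {
        throw new InvalidArgumentException('Unknown table requested');
    }
    return $requested;
}

// Insert a value using a bound parameter, so no manual quoting is needed.
function insert_note(PDO $pdo, string $table, string $body): void
{
    $table = pick_table($table, ['notes', 'drafts']);   // hypothetical tables
    $stmt = $pdo->prepare("INSERT INTO `{$table}` (body) VALUES (:body)");
    $stmt->execute([':body' => $body]);
}

// Example: a well-formed request passes, anything else is rejected.
echo pick_table('notes', ['notes', 'drafts']), "\n";
```

Because the string value is bound rather than concatenated, quotes inside it need neither escaping nor stripslashes().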

Best practices about parsing multi language feed

I'm having a problem parsing data from different feeds, some of them in English, others in Italian and others in Spanish. I'm parsing using a PHP script and saving the parsed data into my MySQL database.
The problem is that when I parse items that contain less common characters, like "Strage di Viareggio Più", the phrase is stored in my database as "Strage di Viareggio PiÃ¹".
My database can store that kind of character, because when I input it manually it works fine. In the original feed (RSS file) the phrase is also fine, so I think it is my PHP server that is changing the letter. How can I solve this? Thanks!
Make sure that the database uses UTF-8 (as you say it does) and that the PHP script has its internal encoding set to UTF-8, which you can achieve with iconv_set_encoding. If you're reading data from an HTTP request that should be all you need, as long as the request tags its own encoding correctly.
It looks like the input data is in UTF-8 but the charset/collation of the DB table is ASCII. I would suggest using UTF-8 everywhere.
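The garbled form is what appears when UTF-8 bytes are read as if they were ISO-8859-1 somewhere along the way, for example on a database connection that was never declared as UTF-8 (with mysqli, set_charset() declares it). The sketch below only reproduces the symptom and assumes the mbstring extension is installed; it is not a fix for already damaged rows.

```php
<?php
// "Più" written with an escape so the example does not depend on file encoding.
$original = "Strage di Viareggio Pi\u{00F9}";

// Misread the UTF-8 bytes as ISO-8859-1 and convert them to UTF-8 again.
$garbled = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// Reversing the same mistake recovers the text in this simple case.
$repaired = mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');

echo $garbled, "\n";
echo $repaired, "\n";
```

If the stored text matches $garbled, the data was UTF-8 all along and one step in the chain treated it as a single-byte charset.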
What you need to implement, before saving to MySQL is:
http://php.net/manual/en/function.htmlentities.php
Check these different threads for more information
Best practices in PHP and MySQL with international strings
htmlentities() makes Chinese characters unusable
What I find incredible is that this question has received -2 in the past 24 hours without any comments.
From the question posted:
I'm parsing using a PHP script and saving the parsed data into my MySQL database.
and
I think it is my PHP server that is changing the letter. How can I solve this? Thanks!
The answers posted so far are related to the encoding and settings of MySQL. The person asking the question has clearly stated that he can insert special characters manually and is having no problems:
My database can store that kind of character, because when I input it manually it works fine
My answer was to help him convert the characters into an html entity which will circumvent the problem he is having with the RSS feed and answering the question posted.

Does converting input from HTML forms into htmlentities protect against attacks involving JavaScript insertion?

I was wondering if converting POST input from an HTML form into html entities, (via the PHP function htmlentities() or using the FILTER_SANITIZE_SPECIAL_CHARS constant in tandem with the filter_input() PHP function ), will help defend against any attacks where a user attempts to insert any JavaScript code inside the form field or if there's any other PHP based function or tactic I should employ to create a safe HTML form experience?
Sorry for the loaded run-on sentence question but that's the best I could word it in a hurry.
Any responses would be greatly appreciated and thanks to all in advance.
racl101
It would turn the following:
<script>alert("Muhahaha");</script>
into
&lt;script&gt;alert(&quot;Muhahaha&quot;);&lt;/script&gt;
So if you're printing out this data into HTML later, you would be protected. It wouldn't protect you from:
"; alert("Muhahaha");
just in case you were echoing into a script like so:
var t = "Hello there <?php echo $str;?>";
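For that case you need escaping suited to a JavaScript string, such as addslashes() or, more robustly, json_encode(). For database queries, use a database string escaping method like mysql_real_escape_string() or prepared statements.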
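The difference between the two output contexts can be sketched like this. The helper names html_text and js_literal are hypothetical, and json_encode() with the JSON_HEX_* flags is one common choice for the script case, not something the answer above prescribes; JSON_THROW_ON_ERROR assumes PHP 7.3 or later.

```php
<?php
// Escape a value for use as text inside HTML.
function html_text(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Produce a complete JavaScript string literal, including its quotes.
function js_literal(string $s): string
{
    return json_encode(
        $s,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

$attack = '"; alert("Muhahaha");';
echo '<p>', html_text('<script>alert("Muhahaha");</script>'), "</p>\n";
echo 'var t = ', js_literal('Hello there ' . $attack), ";\n";
```

In the script case the value is emitted as the whole literal, so the template no longer wraps it in its own quotes.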
yes, that is one way to sanitise. it has the benefit that you can always display the database contents without fear of xss attacks. however, a 'purer' approach is to store the raw data in the database and sanitise in the view - so every time you want to show the text, use htmlentities() on it.
however, your approach does not take into account sql injection attacks. you might want to look at http://php.net/manual/en/function.mysql-real-escape-string.php to guard against that.
Yes, do this when you want to display data on a webpage, but I recommend you don't store the text HTML-encoded in the database. That may seem fine for large text fields, but with shorter fields, say a 32-character title column, a normal 30-character string that contains an & would have it become &amp; (34 characters), and this would cause either a SQL error or the data to be cut off.
So the rule of thumb is: store everything raw (obviously prevent SQL injection) and treat EVERYTHING as tainted, no matter where it comes from: the database, user forms, RSS feeds, flat files, XML, etc. This is how you build good security without worrying about the data overflowing, or about the fact that you might one day need to export the data to a non-web consumer for whom the HTML encoding is a problem.

Store html entities in database? Or convert when retrieved?

Quick question, is it a better idea to call htmlentities() (or htmlspecialchars()) before or after inserting data into the database?
Before: The new, longer string will force me to change the database to hold longer values in the field. (maxlength="800" could become an 804-character string)
After: This will require a lot more server processing, and hundreds of calls to htmlspecialchars() could be made on every page load or AJAX load.
SOOO. Will converting when results are retrieved slow my code significantly? Should I change the DB?
I'd recommend storing the most raw form of the data in the database. That gives you the most flexibility when choosing how and where to output that data.
If you find that performance is a problem, you could cache the HTML-formatted version of this data somehow. Remember that premature optimization is a bad thing.
I have no experience of php but generally I always convert or escape nearest to output. You don't know when your output requirements will change, for example you may want to spit out data as XML, or JSON arrays and so escaping for HTML and then storing means you're limited to using the data as HTML alone.
In a php/MySQL web app, data flows in two ways
Database -> scripting language (php) -> HTML output -> browser -> screen
and
Keyboard -> browser -> $_POST -> php -> SQL statement -> database.
Data is defined as everything provided by the user.
ALWAYS ALWAYS ALWAYS....
A) process data through mysql_real_escape_string (removed in PHP 7; use mysqli_real_escape_string or a prepared statement there) as you move it into an SQL statement, and
B) process data through htmlspecialchars as you move it into the HTML output.
This will protect you from sql injection attacks, and enable html characters and entities to display properly (unless you manage to forget one place, and then you have opened up a security hole).
Did I mention that this has to be done for every single piece of data any user could ever have touched, altered or provided via a script?
p.s. For performance reasons, use UTF-8 encoding everywhere.
It's best to store text raw and encode it as needed. To be honest, you always need to HTML-encode your data anyway when you output it to the web page, to prevent XSS.
You shouldn't encode your data before you put it in the database. The main reasons are:
If such data is near the column size limit, say 32 chars, and the title is "Steve & Fred blah blah", you might go over that limit because a 1-char & becomes the 5-char &amp;.
You are assuming the data will always be displayed in a web page, in the future you never know where you'll be looking at the data and you might not want it encoded, now you have to decode it and it's possible you might not have access to PHP's decode function
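The size growth in the first point is easy to check. A small sketch with a made-up title follows; the column limit itself is not modelled.

```php
<?php
$title = 'Steve & Fred';   // hypothetical raw title

// Each & grows from 1 byte to the 5 bytes of &amp; when encoded.
$encoded = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
printf("raw: %d bytes, encoded: %d bytes\n", strlen($title), strlen($encoded));

// Decoding is possible, but only where an HTML decoder is available.
$decoded = htmlspecialchars_decode($encoded, ENT_QUOTES);
var_dump($decoded === $title);
```

Storing the raw form keeps the column sized for what the user actually typed.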
It is the way of the craftsman to "measure twice, optimize once".
If you don't need high performance for your website, store it as raw data and when you output it do what you want.
If you need performance then consider storing it twice: raw data to do what you want with it and another field with the filtered data. It could be seen as redundant, but CPU is expensive, while data storage is really cheap.
The easiest way is store the data "as is" and then convert to htmlentities wherever it is needed.
The safest solution is to filter the data before it goes into the database, as this prevents possible attacks on your server and database from missing security measures, and then convert it however you need when needed. Also, if you are using PDO with prepared statements, this will happen automatically for you.
http://php.net/PDO
We had this debate at work recently. We decided to store the escaped values in the database, because before (when we were storing it unescaped) there were corner cases where data was being displayed without being escaped. This can lead to XSS. So we decided to store it escaped to be safe, and if you want it unescaped you have to do the work yourself.
Edit: So to everyone who disagrees, let me add some backstory for my case. Let's say you're working in a team of 50+ people... and data from the database is not guaranteed to be HTML-Encoded on the way out - there's no built-in mechanism for it so the developer has to write the code to do it. And this data is shown all over the place so it's not going through 1 developer's code it's going through 30's - most of whom have no clue about this data (or that it could even contain angle brackets which is rare) and merely want to get it shown on the page, move on, and forget about it.
Do you still think it's better to put the data, in HTML, into the database and rely on random people who are not-you to do things properly? Because frankly, while it certainly may not seem warm-fuzzy-best-practicey, I prefer to fail closed (meaning when the data comes through in a Word Doc it looks like Value&lt;Stock rather than Value<Stock) rather than open (so the Word Doc looks right with no work, but some corner of the platform may/likely-is vulnerable to XSS). You can't have both.
