PHP function to securely encode characters for multiple purposes [duplicate] - php

This question already has answers here:
Enclosing the string with double quotes
(3 answers)
Closed 8 years ago.
I am trying to build a function (unless there is already one, I was not able to find one) that satisfies:
being saved in a MySQL database → mysqli_real_escape_string
being saved in a serialized array in a MySQL database (I had issues when unserialize failed)
as for output:
doesn't interfere with HTML → utf8_encode(htmlentities($source, ENT_QUOTES | ENT_HTML401, 'UTF-8'));
doesn't interfere with being used as a query in a URL, thus encoding the '&','%'
Please give me any advice if there is an idea on how to improve secure encoding.
I am also not sure whether the functions given above are the best ones to use.
I also had issues with non-printable characters and tried
PHP: How to remove all non printable characters in a string?
$s = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x80-\x9F]/u', '', $s);
EDIT
Because this question is so broad, I want to narrow it down: how do I clean a string that is an element of an array which is stored in a database with serialize()?
For instance, unserialize failed after I had put a string containing a newline (\n or \r) into a string element of an array that had been serialized successfully...
EDIT_2
The reason I tried encoding HTML entities before saving to the DB with mysqli_real_escape_string() is that the data comes back changed when I load the object from the DB. For example, a user wants to store the string test'test, which mysqli_real_escape_string() encodes to test\'test, and when loaded from the DB it is still test\'test, which is NOT what the user wants or what he sent. Please let me know if you can find a solution for this. Mine was to HTML-encode the quotes first, so that mysqli_real_escape_string() had no effect.

Off the top of my head, I feel you should try json_encode and json_decode.
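To illustrate the suggestion, here is a minimal sketch of a JSON round trip for an array whose elements contain a quote, a newline, and a non-ASCII character. The sample data and variable names are made up for illustration, and the storage step is only indicated in a comment: it assumes the encoded string is bound as a parameter of a prepared statement, so no manual escaping is applied to it.

```php
<?php
// Hypothetical sample data: a quote, a newline, and a non-ASCII character.
$data = [
    'name' => "test'test",
    'note' => "line one\nline two",
    'city' => 'Köln',
];

// Encode once for storage. JSON writes the newline as the two characters \n.
$stored = json_encode($data, JSON_UNESCAPED_UNICODE);
if ($stored === false) {
    throw new RuntimeException('Encoding failed: ' . json_last_error_msg());
}

// Assumption: $stored is saved by binding it to a prepared statement,
// so it reaches the database unchanged and comes back unchanged.

// Decode after loading; true asks for associative arrays.
$restored = json_decode($stored, true);
```

Encoding for HTML or for a URL would then happen at output time only, on the decoded values, rather than before saving.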

Related

php filter a string so that json_encode does not error out

I'm grabbing a bunch of data from a database and putting it into a PHP array. I'm then looking to json_encode that array using $output = json_encode($out).
My issue is that from time to time, something in the array is not able to be read by json_encode and the whole thing fails. If I use print_r($out) to have a look, I can clearly see where it's failing, because the character that is screwing things up always appears as a question mark inside of a black diamond �.
First - what are these characters?
Second - Is there a function I can pass the elements through prior to adding them to the array that would strip these out, or replace 'them' with blanks?
I found the answer to this. Since the data coming FROM the database was stored with the "black diamond" character, I needed to fix this after fetching it from the database.
$x[4] = utf8_encode(odbc_result($query, 'B'));
By passing the result through utf8_encode, the string is converted from ISO-8859-1 to UTF-8, so it no longer contains bytes that json_encode rejects.
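A minimal sketch of the same idea without utf8_encode, which is deprecated as of PHP 8.2. It assumes the mbstring extension is loaded and that the driver really returns ISO-8859-1 bytes; the sample value and variable names are hypothetical.

```php
<?php
// Assumption: the driver returned ISO-8859-1 bytes; "\xE9" is "é" in that charset.
$fromDb = "caf\xE9";

// json_encode refuses the raw bytes because they are not valid UTF-8.
$failed = json_encode(['name' => $fromDb]);

// Convert only when the value is not already valid UTF-8.
$utf8 = mb_check_encoding($fromDb, 'UTF-8')
    ? $fromDb
    : mb_convert_encoding($fromDb, 'UTF-8', 'ISO-8859-1');

$json = json_encode(['name' => $utf8]);

// Fallback when the source charset is unknown (PHP 7.2+):
// invalid bytes are replaced with U+FFFD instead of failing.
$lossy = json_encode(['name' => $fromDb], JSON_INVALID_UTF8_SUBSTITUTE);
```

Converting is only correct if the source charset guess is right; checking json_last_error_msg() after a failed call shows whether malformed UTF-8 was the cause.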
Say echo json_encode($out);
This will solve your issue
Black diamonds are a browser issue. The database uses plain question marks.
It seems you are already getting wrong data from the database. But it is quite tricky to end up with incorrect UTF-8 with your settings. You need to check everything:
whether your table is marked with the utf8 charset
whether your data is indeed encoded in UTF-8 (not just marked, but actually encoded)
whether your server is sending the correct charset in the Content-Type header.
It is also useful to view the page while choosing different charsets from your browser menu.
But first of all you have to wipe every trace of the random actions you tried, all these various encode and decode calls and so on. Just plain and direct output from the database. Otherwise you will never get to the problem.

htmlspecialchars() x htmlentities() [duplicate]

This question already has answers here:
htmlentities() vs. htmlspecialchars()
(12 answers)
Closed 8 years ago.
I have read their documentation, but I still don't get when to use each of them and their difference.
Let's consider the situation of having a general string in a variable and needing to echo it inside HTML code. If it has any HTML markup in it, I want it converted to HTML code (< replaced by &lt;, & replaced by &amp;). If it has UTF special chars that aren't available in HTML code, they are replaced by the HTML number (• replaced by &#8226;).
What's the best function for that?
A harder need: unprintable chars, like \n, char(10), char(13), etc., should be replaced by their numeric codes when the string is printed inside <pre> or a textarea, so that the string can be dumped.
htmlentities is a workaround for not having set the character type of the document properly. htmlspecialchars is the correct function to use for merely writing text into an HTML document.
As to your second question, I think you're looking for addcslashes.
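A short sketch of the three functions mentioned, assuming the document is served as UTF-8. The sample strings and variable names are invented for illustration.

```php
<?php
// Hypothetical input containing markup, an ampersand, a bullet and double quotes.
$text = '<b>Tom & Jerry</b> • "quoted"';

// htmlspecialchars only touches the characters that are special in HTML;
// the bullet stays as it is because UTF-8 can represent it directly.
$safe = htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// htmlentities additionally replaces every character that has a named entity.
$entity = htmlentities('•', ENT_QUOTES, 'UTF-8');

// addcslashes turns control characters into visible C-style escapes,
// which is handy when dumping a string inside <pre>.
$dump = addcslashes("a\tb\r\n", "\0..\37");
```

Here $safe would be passed to echo inside the HTML, while $dump is meant for debugging output only.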

JSON Encode and curly quotes

I've run into an interesting behavior in the native PHP 5 implementation of json_encode(). Apparently when serializing an object to a json string, the encoder will null out any properties that are strings containing "curly" quotes, the kind that would potentially be copy-pasted out of MS Word documents with the auto conversion enabled.
Is this an expected behavior of the function? What can I do to force these kinds of characters to convert to their basic equivalents? I've checked for character encoding mismatches between the database returning the data and the administration page that inserts it, and everything is set up correctly. It definitely seems like the encoder just refuses these values because of these characters. Has anyone else encountered this behavior?
EDIT:
To clarify;
MSWord will take standard quotation marks and apostrophes and convert them to more aesthetic "fancy" or "curly" quotes. These characters can cause problems when placed in content managers that have charset mismatches between their editing interface (in the html) and the database encoding.
That's not the problem here, though. For example, I have a json_object representing a person's profile and the string:
Jim O’Shea
(the Unicode code point for that apostrophe being \u2019)
will come out null in the json object when fetched from the database and directly json_encoded.
{"model_name":"Bio","logged":true,"BioID":"17","Name":null,"Body":"Profile stuff!","Image":"","Timestamp":"2011-09-23 11:15:24","CategoryID":"1"}
Never had this specific problem (i.e. with json_encode()) but a simple - albeit a bit ugly - solution I have used in other places is to loop through your data and pass it through this function I got from somewhere (will credit it when I find out where I got it):
// Replaces Windows-1252 curly quotes (bytes 145-148) and the em dash (151) with ASCII.
function convert_fancy_quotes($str) {
    return str_replace(
        array(chr(145), chr(146), chr(147), chr(148), chr(151)),
        array("'", "'", '"', '"', '-'),
        $str
    );
}
json_encode has the nasty habit of silently dropping strings that it finds invalid (i.e. non-UTF8) characters in. (See here for background: How to keep json_encode() from dropping strings with invalid characters)
My guess is the curly quotes are in the wrong character set, or get converted along the way. For example, it could be that your database connection is ISO-8859-1 encoded.
Can you clarify where the data comes from in what format?
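If the guess above is right, a sketch along these lines would keep the curly quote instead of losing the whole string. It assumes the mbstring extension and assumes the connection delivers Windows-1252 bytes, where 0x92 is the right single quotation mark; the sample value is hypothetical.

```php
<?php
// Assumption: the database connection returns Windows-1252 bytes.
// "\x92" is the curly apostrophe (U+2019) in that charset.
$fromDb = "Jim O\x92Shea";

// Convert to UTF-8 only if the bytes are not valid UTF-8 already.
$name = mb_check_encoding($fromDb, 'UTF-8')
    ? $fromDb
    : mb_convert_encoding($fromDb, 'UTF-8', 'Windows-1252');

// The apostrophe now survives and is written as a \u escape.
$json = json_encode(['Name' => $name]);
```

Setting the connection charset to UTF-8 would make the conversion unnecessary, since the driver would then deliver valid UTF-8 in the first place.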
If I ever need to do that, I first copy the text into Notepad and then copy it from there. Notepad forces it to be normal quotes. Never had to do it through code though...

How do I make the ö character appear properly in an XML file created via PHP? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
php: using DomDocument whenever I try to write UTF-8 it writes the hexadecimal notation of it.
Here is a snippet of code:
echo "<char>".$row{'char'}."</char>";
row{'char'} is pulling back the german character ö from the database. In PHP, how can I convert that to the correct encoding to be used properly in XML?
Is there a PHP function that can convert everything as needed to the correct format for XML? Or do I need to do it character by character, like so?
echo "<format>".str_replace("&", "&amp;", $row{'format'})."</format>";
Thanks for the help!
Without knowing what encoding you have in your database and what encoding you want in your XML output it's hard to be specific, but the iconv function could be useful to do the conversion.
Also, you should really consider using an XML DOM instead of outputting xml-as-plaintext with echo. Check out for example Reading and writing the XML DOM with PHP. If you don't, you will most likely end up with other strange problems with your xml output down the road.
Trust me, I've been there. :-)
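A minimal sketch of the DOM approach, assuming the dom extension is available and that the values coming from the database are already UTF-8 (otherwise they would need converting first, for example with iconv). The $row array stands in for a fetched database row.

```php
<?php
// Hypothetical row; values are assumed to be valid UTF-8.
$row = ['char' => 'ö', 'format' => 'A & B'];

$doc = new DOMDocument('1.0', 'UTF-8');
$root = $doc->appendChild($doc->createElement('row'));

foreach ($row as $name => $value) {
    $element = $root->appendChild($doc->createElement($name));
    // Text nodes are escaped on output, so "&" becomes "&amp;" automatically.
    $element->appendChild($doc->createTextNode($value));
}

// With a UTF-8 document encoding, "ö" is written as a literal character.
$xml = $doc->saveXML();
```

Because the serializer does the escaping, no per-character str_replace calls are needed.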
Passing the data pulled from the database through htmlentities() should do that. It changes "ö" to "&ouml;".
echo "<char>".htmlentities($row{'char'})."</char>";
The PHP Manual

php functions binary safe? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
In PHP what does it mean by a function being binary-safe ?
What exactly does it mean when a function (example: dirname) is binary safe?
It means two things. First the function works on strings that contain \0 the NUL byte. This is not a given, because functions are often implemented in C which would treat that as string terminator. PHP however uses length-denominated strings.
Second, in some contexts it means that a particular string function ignores the character set and does not try to interpret UTF-8 sequences. For raw binary data the UTF-8 sequencing would be wrong, thus making functions fail if they try to treat it as text.
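Both points can be seen in a small sketch. The byte strings are made up for illustration, and mb_strlen assumes the mbstring extension is loaded.

```php
<?php
// A string with an embedded NUL byte: PHP tracks the length itself,
// so nothing is cut off at "\0".
$bytes = "ab\0cd";
$length = strlen($bytes);          // counts all 5 bytes
$position = strpos($bytes, 'cd');  // finds the text after the NUL byte

// strlen ignores the character set: "ö" is two bytes in UTF-8.
$umlaut = "\xC3\xB6";
$byteCount = strlen($umlaut);
// mb_strlen interprets the bytes as UTF-8 text instead.
$charCount = mb_strlen($umlaut, 'UTF-8');
```

The difference between $byteCount and $charCount shows why a function that interprets the bytes as text should not be pointed at raw binary data.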
It means that the data will not be interpreted as text.
It means that binary data can pass through the function, and it won't be treated as text. Sometimes if you have string functions and you try to use them for raw binary data (such as a string replace function in other languages), they will garble your data.
Perhaps a better description at http://en.wikipedia.org/wiki/Binary-safe
