How Can I get NSStrings containing any character into my database?

How Can I get NSStrings containing any character into my database? - php

I let users write and then post what they have written to my MYSQL database, using PHP. I have been sending the strings as URLs and then $_GET['string'] in the php and then putting them in the database. I always have to take care of the spaces in the string by replacing them with %20. And then I had to replace all kinds of different characters on top of that in order for the URLs to work. This is a losing battle and the users expect their strings to be saved but if they contain a character I have not thought of, this will not be the case. I have even tried sending along the strings as NSData in a POST but that did not seem to save the strings either.
How can I be sure the users' strings will save, no matter what crazy characters they type?
Thanks,
R

Encode your data using NSUTF8StringEncoding before sending it to the server, and always use POST to send data to the server instead of GET. Also, it's a good idea to stop using ASCII altogether and to replace it with Unicode wherever you use strings. UTF-8 is a very convenient and compact Unicode encoding.

Related

real_escape_string not cleaning up entered text

I thought the proper way to "sanitize" incoming data from an HTML form before entering it into a mySQL database was to use real_escape_string on it in the PHP script, like this:
$newsStoryHeadline = $_POST['newsStoryHeadline'];
$newsStoryHeadline = $mysqli->real_escape_string($newsStoryHeadline);
$storyDate = $_POST['storyDate'];
$storyDate = $mysqli->real_escape_string($storyDate);
$storySource = $_POST['storySource'];
$storySource = $mysqli->real_escape_string($storySource);
// etc.
And once that's done you could just insert the data to the DB like this:
$mysqli->query("INSERT INTO NewsStoriesTable (Headline, Date, DateAdded, Source, StoryCopy) VALUES ('".$newsStoryHeadline."', '".$storyDate."', '".$dateAdded."', '".$storySource."', '".$storyText."')");
So I thought doing this would take care of cleaning up all the invisible "junk" characters that may be coming in with your submitted text.
However, I just pasted some text I copied from a web-page into my HTML form, clicked "submit" - which ran the above script and inserted that text into my DB - but when I read that text back from the DB, I discovered that this piece of text did still have junk characters in it, such as â€“.
And those junk characters of course caused the PHP script I wrote that retrieves the information from the DB to crash.
So what am I doing wrong?
Is using real_escape_string not the way to go here? Or should I be using it in conjunction with something else?
OR, is there something I should be doing (like more escaping) when reading reading data back out from the the mySQL database?
(I should mention that I'm an Objective-C developer, not a PHP/mySQL developer, but I've unfortunately been given this task to do some DB stuff - hence my question...)
thanks!

Your assumption is wrong. mysqli_real_escape_string’s only intention is to escape certain characters so that the resulting string can be safely used in a MySQL string literal. That’s it, nothing more, nothing less.
The result should be that exactly the passed data is retained, including ‘junk’. If you don’t want that ‘junk’ in your database, you need to detect, validate, or filter it before passing to to MySQL.
In your case, the ‘junk’ seems to be due to different character encodings: You input data seems to be encoded with UTF-8 while it’s later displayed using Windows-1250. In this scenario, the character – (U+2013) would be encoded with 0xE28093 in UTF-8 which would represent the three characters â, €, and “ in Windows-1250. Properly declaring the document’s encoding would probably fix this.

Sanitization is a tricky subject, because it never means the same thing depending on the context. :)
real_escape_string just makes sure your data can be included in a request (inside quotes, of course) without having the possibility to change the "meaning" of the request.
The manual page explains what the function really does: it escapes nul characters, line feeds, carriage returns, simple quotes, double quotes, and "Control-Z" (probably the SUBSTITUTE character). So it just inserts a backslash before those characters.
That's it. It "sanitizes" the string so it can be passed unchanged in a request. But it doesn't sanitize it under any other point of view: users can still pass for instance HTML markers, or "strange" characters. You need to make rules depending on what your output format is (most of the time HTML, but HTTP isn't restricted to HTML documents), and what you want to let your users do.
If your code can't handle some characters, or if they have a special meaning in the output format, or if they cause your output to appear "corrupted" in some way, you need to escape or remove them yourself.
You will probably be interested in htmlspecialchars. Control characters generally aren't a problem with HTML. If your output encoding is the same as your input encoding, they won't be displayed and thus won't be an issue for your users (well, maybe for the W3C validator). If you think it is, make your own function to check and remove them.

Store special character in mysql database that can be read by JavaScript and HTML

I'm storing data in a MySQL database that may have some special characters. I'm wondering how to store it so that these characters are preserved if they're either output to HTML via PHP OR via JavaScript, e.g. createTextNode.
For example, the division symbol (÷) has the html code ÷, and when I store it as that it shows up fine when put directly into HTML by PHP, but when I pull it into JavaScript using $.getJSON and then insert it with createTextNode it shows up looking like ÷.
I also tried storing the symbol in the SQL directly, but my understanding is that the column would need to be changed from VARCHAR to NVARCHAR and that would cause a performance hit that doesn't seem necessary.
Given that I can modify the SQL, the PHP, or the JavaScript, is there an easy fix here? Maybe a way to unescape the HTML entity in JavaScript?

As answered by Yogesh, you should switch your collation of the DB to utf8_general_ci

So there's probably two things going on:
JSON escapes special characters.
Somewhere, something in your code flow is URL encoding the strings too.
So you just need to decode the string in your JavaScript, or you need to find what part of your code is URL encoding those strings and fix it.

Do I need to use HTML entities when storing data in the database?

I need to store special characters and symbols into mysql database. So either I can store it as it is like 'ü' or convert it to html code such as 'ü'
I am not sure which would be better.
Also I am having symbols like '♥', '„' .
Please suggest which one is better? Also suggest if there is any alternative method.
Thanks.

HTML entities have been introduced years ago to transport character information over the wire when transportation was not binary safe and for the case that the user-agent (browser) did not support the charset encoding of the transport-layer or server.
As a HTML entity contains only very basic characters (&, ;, a-z and 0-9) and those characters have the same binary encoding in most character sets, this is and was very safe from those side-effects.
However when you store something in the database, you don't have these issues because you're normally in control and you know what and how you can store text into the database.
For example, if you allow Unicode for text inside the database, you can store all characters, none is actually special. Note that you need to know your database here, there are some technical details you can run into. Like you don't know the charset encoding for your database connection so you can't exactly tell your database which text you want to store in there. But generally, you just store the text and retrieve it later. Nothing special to deal with.
In fact there are downsides when you use HTML entities instead of the plain character:
HTML entities consume more space: ü is much larger than ü in LATIN-1, UTF-8, UTF-16 or UTF-32.
HTML entities need further processing. They need to be created, and when read, they need to be parsed. Imagine you need to search for a specific text in your database, or any other action would need additional handling. That's just overhead.
The real fun starts when you mix both concepts. You come to a place you really don't want to go into. So just don't do it because you ain't gonna need it.

Leave your data raw in the database. Don't use HTML entities for these until you need them for HTML. You never know when you may want to use your data elsewhere, not on a web page.

My suggestion would mirror the other contributors, don't convert the special entities when saving them to your database.
Some reasons against conversion:
K.I.S.S principle (my biggest reason not to do it)
most entities will end up consuming more space then prior to being converted
loose the ability to search for the entities ü in a word, would be [word]+ü+[/word], and you would have to do a string comparison of the html equivalent of ü => [word]+ü+[/word].
your ouput may change from HTML to say an API for mobile, etc which makes conversion very unnecessary.
need to convert on input of data, and on output (again if your output changes from plain HTML to something else).

encoding/decoding textarea value using rawurlencode/rawurldecode

When encoding newline of textarea before storing into mysql using PHP with rawurlencode function encodes newline as %0D%0A.
For Example:
textarea text entered by user:
a
b
encoding using rawurlencode and store into database will store value as a%0D%0Ab
When retrieving from database and decoding using rawurldecode does not work and code gives error. How to overcome this situation and what is the best way to store and retrieve and display textarea values.

can you first encode this textarea string using base64_encode and then perform a base64_decode on the same, if the above does not work for you.
If the textarea does not contain URLs, you should rather use base64_encode then rawurlencode and then store as normal.

You simply should not use rawurlencode for escaping data for your database.
Each target format has it's own escaping method which in general terms makes sure it is stored/display/transferred safely from one place to another, and it doesn't need decoding at the other end.
For instance:
displaying text in HTML, use htmlentities or htmlspecialchars
storing in database, use mysqli_real_escape_string, pg_escape_string, etc...
transferring variablename, use urlencode
transferring variablecontent, use rawurlencode
etc...
You should notice that decoding these things is often done by the browser/database. So no data is actually stored escaped. And decoding doesn't need te be done by your code.
The problem is probably because you escape a sequence with rawurlencode, but your database expected the escaped format for the specific brand of database. And de-escaped it using that assumption, which was wrong, which messed up your string.
Conclusion: find out what brand database you are using, look up the specific escape function for that database, and use the proper escaping function on all your content "transferral".
P.S.: some definition may not be correct, please comment on that. I wanted to make the idea stick but am probably not using all the right terms.

First of all it is very uncommon to run textarea through urlencode()
urlencode was not designed for this purpose.
Second, if you still want to do this, then maybe the problem comes from database. First you need to tell us what database you using and what TYPE you using for storing this data: do you store it as TEXT or as BINARY data? Have you setup the correct charset in database?

HTML Purifier selectively eating special characters

Using PHP against a UTF-8 compliant database. Here's how input goes in.
user types input into textarea
textarea encoded with javascript escape()
passed via HTTP post
decoded with PHP rawurldecode()
passed through HTMLPurifier with default settings
escaped for MySQL and stored in database
And it comes out in the usual way and I run unescape() on page load. This is to allow people to, say, copy and paste directly from a word document and have the smart quotes show up.
But HTMLPurifier seems to be clobbering non-UTF-8 special characters, ones that escape() to a simple % expression, like Ö, which escapes to %D6, whereas smartquotes escape to %u2024 or something and go into the database that way. It takes out both the special character and the one immediately following.
I need to change something in this process. Perhaps I need to change multiple things.
What can I do to not get special characters clobbered?

textarea encoded with javascript escape()
escape isn't safe for non-ascii. Use escapeURIComponent
passed via HTTP post
I assume that you use XmlHttpRequest? If not, make sure that the page containing the form is served as utf-8.
decoded with PHP rawurldecode()
If you access the value through $_POST, you should not decode it, since that has already been done. Doing so will mess up data.
escaped for MySQL and stored in database
Make sure you don't have magic quotes turned on. Make sure that the database stores tables as utf-8 (The encoding and the collation must be both utf-8). Make sure that the connection between php and MySql is utf-8 (Use set names utf8, if you don't use PDO).
Finally, make sure that the page is served as utf-8 when you output the string again.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.