Decoding html entities from database - php

I'm having some problems displaying strings that contain HTML entities in the web browser.
I'm currently working on a system where a string (e.g. a file name) containing HTML special characters such as single quotes may be stored in the database encoded as HTML entities.
When I fetch that record from the database and display the file name on a page, it shows the encoded text exactly as stored.
For example:
filename in database: exampleFileName' (with the single quote stored as an HTML entity)
i want to display on page: exampleFileName'
I know there are PHP functions such as html_entity_decode().
But I think it would be silly to call one everywhere I pass this string to an HTML view.
The application is built with Zend Framework 2 and uses the Doctrine 2 ORM.
Can't Doctrine handle this? I mean, is it possible to get the string from the database already decoded as normal text?
Or is something wrong with the database's default collation (utf8_swedish_ci)?
Finally, what are the best practices for storing such strings (with single quotes) in a database?
I would be really grateful for some answers.
Thank you.
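
A minimal sketch of one way to avoid calling html_entity_decode() in every view, assuming the stored values really are entity-encoded: decode once where the value is read (for example in a getter or a one-off migration), keep plain text from then on, and escape only at output time. The helper names decodeStored and escapeForHtml are hypothetical; html_entity_decode() and htmlspecialchars() are standard PHP functions.

```php
<?php
// Hypothetical helper: turn a stored value such as "exampleFileName&#039;"
// back into plain text. ENT_QUOTES is needed so single-quote entities decode too.
function decodeStored(string $stored): string
{
    return html_entity_decode($stored, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Hypothetical helper: escape plain text at the moment it is written into HTML.
function escapeForHtml(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');
}

$plain = decodeStored('exampleFileName&#039;'); // plain text with a real single quote
echo escapeForHtml($plain), PHP_EOL;            // safe to place inside an HTML page
```

If the view layer already escapes its output, decoding at read time also avoids escaping the same value twice.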

Related

Store special character in mysql database that can be read by JavaScript and HTML

I'm storing data in a MySQL database that may have some special characters. I'm wondering how to store it so that these characters are preserved if they're either output to HTML via PHP OR via JavaScript, e.g. createTextNode.
For example, the division symbol (÷) has the HTML code &divide;, and when I store it as that it shows up fine when put directly into HTML by PHP, but when I pull it into JavaScript using $.getJSON and then insert it with createTextNode it shows up as the literal text &divide;.
I also tried storing the symbol in the SQL directly, but my understanding is that the column would need to be changed from VARCHAR to NVARCHAR and that would cause a performance hit that doesn't seem necessary.
Given that I can modify the SQL, the PHP, or the JavaScript, is there an easy fix here? Maybe a way to unescape the HTML entity in JavaScript?
As answered by Yogesh, you should switch the collation of your DB to utf8_general_ci.
So there are probably two things going on:
JSON escapes special characters.
Somewhere, something in your code flow is URL encoding the strings too.
So you just need to decode the string in your JavaScript, or you need to find what part of your code is URL encoding those strings and fix it.
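
Another option is to do the decoding on the PHP side before the JSON is sent, so that createTextNode receives the real character. This is a sketch only, assuming the column holds an entity such as &divide; and the response is served as UTF-8; the helper name toJsonText and the "text" key are hypothetical.

```php
<?php
// Hypothetical helper: decode HTML entities server-side, then emit JSON
// that carries the actual character instead of the entity text.
function toJsonText(string $stored): string
{
    $plain = html_entity_decode($stored, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // JSON_UNESCAPED_UNICODE keeps "÷" readable in the payload; without it
    // json_encode() emits the equivalent \u00f7 escape, which JavaScript
    // parses back to the same character.
    return json_encode(['text' => $plain], JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR);
}

echo toJsonText('6 &divide; 2'), PHP_EOL;
```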

Zend Db Results escaped and html url encoded

When I use Zend DB (PDO Mysql Adapter), I'm getting back results that are not only escaped, but also HTML url-encoded.
I'm inserting the rows into the database as is, not escaped or html encoded. I'm curious to know:
How I can get back results that aren't escaped and html encoded,
If I should be doing something to treat the data before inserts,
And if it isn't possible to get back results that aren't escaped and html url encoded, how to do it myself.
I'd like to know if retrieving the results as escaped and html encoded is actually the proper way to do things?
Actually, I think you are just wondering why the strings are encoded.
AFAIK ZF does not do what you describe, so the "error" must be somewhere in your data processing.
Additional note: to improve your questions in the future, ask more directly and include some code showing what you actually do; otherwise the question is an invitation to guess. A concrete example usually works best.

Database contains #39 instead of #039

-- System is MySQL, PHP, Apache, and the code is built around the CodeIgniter framework
EDIT FOR CLARITY: I am not storing data, I am trying to retrieve data that was stored some years ago (badly, as escaped data). In the database the name Fred' is stored as Fred&#39, yet when I convert Fred' using htmlspecialchars it comes out as Fred&#039. My question is: what do I need to do to make Fred' convert to Fred&#39, and likewise for any other equivalents?
Original Question
I've inherited a database from another system (Invision Power Board to be exact). The site is now custom coded using Codeigniter but is using the same member database from the old Invision Power Board site.
I've now discovered a problem whereby if a user has an apostrophe in their name, e.g. "Fred'", CodeIgniter's built-in html_escape function (which just uses htmlspecialchars) converts it to Fred&#039
Yet in the database the name is saved as: Fred&#39 and thus the lookup fails.
I'm not sure what Invision Power Board was doing to the string before inserting it into the database, but does anyone have any idea how I could ensure that it is converted to &#39 instead of &#039 ?
Simply saying "do a str_replace" or "change the data in the db" is not useful, as there are hundreds of possibilities for what could be in a user's name. A quick search for users with a # in their name (presumably a special char) shows 440 users who are currently unable to log in due to this bug in our site.
EDIT: Fixed some formatting to remove ";" so it doesn't just display an apostrophe
You can use preg_replace() to strip the leading zeros from the PHP-generated entity before the comparison:
$string = 'Fred&#039';
$string = preg_replace('/&#0+([1-9]+)/', '&#$1', $string);
var_dump(str_split($string));
// str_split to show real result
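
The same idea can be wrapped in a helper that is applied to the user's input before the lookup. This is a sketch under assumptions: the function name legacyEscape is hypothetical, and it assumes the stored entities end with a semicolon (the question removed the ";" only for display purposes).

```php
<?php
// Hypothetical helper: escape like htmlspecialchars(), then drop leading zeros
// from numeric entities so that "&#039;" becomes "&#39;".
function legacyEscape(string $name): string
{
    // ENT_QUOTES with ENT_HTML401 encodes a single quote as "&#039;".
    $escaped = htmlspecialchars($name, ENT_QUOTES | ENT_HTML401, 'UTF-8');

    // Match only decimal entities that start with one or more zeros.
    $normalized = preg_replace('/&#0+([1-9][0-9]*);/', '&#$1;', $escaped);

    // preg_replace() returns null on a regex error; fall back to the escaped value.
    return $normalized ?? $escaped;
}

var_dump(str_split(legacyEscape("Fred'"))); // str_split to show the real result
```

Whether this reproduces every transformation the old system applied is not known from the question, so it is worth checking against a sample of the affected user names.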

Best practices about parsing multi language feed

I'm having a problem parsing data from different feeds, some of them in English, others in Italian and others in Spanish. I'm parsing using a PHP script and saving the parsed data into my MySQL database.
The problem is that when I parse items that contain "uncommon" characters, such as "Strage di Viareggio Più", the phrase ends up in my database with the accented letter garbled.
My database can store that kind of character, because when I input it manually it works fine. In the original feed (RSS file) the phrase is also fine, so I think it is my PHP server that is changing the letter. How can I solve this? Thanks!
Make sure that the database uses UTF-8 (as you say it does) and that the PHP script has its internal encoding set to UTF-8, which you can achieve with iconv_set_encoding. If you're reading data from an HTTP request that should be all you need, as long as the request tags its own encoding correctly.
It looks like the input data is in UTF-8, but the charset/collation of the DB table is ASCII. I would suggest using UTF-8 everywhere.
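
As a rough sketch of "UTF-8 everywhere" on the PHP side, the feed text can be normalized before it is saved. This assumes the mbstring extension is available and that any non-UTF-8 feed is ISO-8859-1, which is only a guess; the function name toUtf8 and its fallback parameter are hypothetical. The database connection also has to be opened with a UTF-8 charset, which is not shown here.

```php
<?php
// Hypothetical helper: return the input unchanged when it is already valid
// UTF-8, otherwise convert it from an assumed fallback encoding.
function toUtf8(string $raw, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }

    return mb_convert_encoding($raw, 'UTF-8', $fallback);
}

// "Pi\xF9" is "Più" encoded as ISO-8859-1; the result is the UTF-8 form.
echo toUtf8("Strage di Viareggio Pi\xF9"), PHP_EOL;
```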
What you need to apply before saving to MySQL is htmlentities():
http://php.net/manual/en/function.htmlentities.php
Check these threads for more information:
Best practices in PHP and MySQL with international strings
htmlentities() makes Chinese characters unusable
What I find incredible is that this question has received -2 in the past 24 hours without any comments.
From the question posted:
I'm parsing using a PHP script and saving the parsed data into my MySQL database.
and
I think it is my PHP server that is changing the letter. How can I solve this? Thanks!
The answers posted so far are related to the encoding and settings of MySQL. The person asking the question has clearly stated that he can insert special characters manually without any problems:
My database can store that kind of character, because when I input it manually it works fine
My answer was to help him convert the characters into an html entity which will circumvent the problem he is having with the RSS feed and answering the question posted.

What's the best practice method for storing raw php, javascript, html or similar in a mysql database?

The example web page has 2 fields and allows a user to enter a title and code. Both fields would later be embedded and displayed in an HTML page for viewing and/or editing, but not execution. In other words, any PHP or JavaScript or similar should not run but be displayed for editing and copying.
In this case, what is the best way to escape these fields before database insertion and after (for HTML display)?
You need to use the function htmlspecialchars() in PHP,
which will change any special characters (e.g. < and >) into their HTML-encoded equivalents (e.g. &lt; and &gt;). When you get these from the database and output them as HTML they will display as code, but won't harm your script or execute.
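
A minimal sketch of that approach, assuming the code is stored unmodified and escaped only when the page is rendered; the function name renderSnippet and the pre/code wrapper are hypothetical choices.

```php
<?php
// Hypothetical helper: wrap user-supplied code for display only.
// htmlspecialchars() converts &, <, > and both quote types to entities,
// so the browser shows the code as text instead of running it.
function renderSnippet(string $code): string
{
    $escaped = htmlspecialchars($code, ENT_QUOTES | ENT_HTML401, 'UTF-8');

    return '<pre><code>' . $escaped . '</code></pre>';
}

echo renderSnippet('<script>alert("x")</script>'), PHP_EOL;
```

The same escaped string can be placed inside a textarea when the code needs to be editable.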
I faced the same problem a few days back. To put the code (JavaScript or PHP) into the HTML in a non-executable way, I used a textarea, which served the purpose.
The problem, however, was with the database. I could not use the typical escape functions on the data because they altered it; for example, the tags got messed up.
To solve this, I encoded the data as Base64 before putting it in the database. The encoded JavaScript is no longer JavaScript code, so I can run the escape functions on it and store it in the database.
I am open to suggestions, feel free to comment.
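
The Base64 round trip described above can be sketched as follows. The function names packSnippet and unpackSnippet are hypothetical, and the database calls themselves are left out.

```php
<?php
// Hypothetical helper: encode code so the stored value contains only
// A-Z, a-z, 0-9, "+", "/" and "=".
function packSnippet(string $code): string
{
    return base64_encode($code);
}

// Hypothetical helper: decode a stored value, rejecting anything that is
// not valid Base64 (strict mode makes base64_decode() return false).
function unpackSnippet(string $stored): string
{
    $decoded = base64_decode($stored, true);
    if ($decoded === false) {
        throw new InvalidArgumentException('Stored value is not valid Base64.');
    }

    return $decoded;
}

$original = "<script>alert('hi');</script>";
$stored   = packSnippet($original);
var_dump(unpackSnippet($stored) === $original);
```

The trade-off is that Base64 makes the stored value roughly a third larger and prevents searching the column for the original text.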
