I need to serialize an array that contains URLs:
Array(
'url1' => 'http://www.example.com',
'url2' => 'http://www.example1.com'
)
and store it in DB.
When I serialize it the standard way, it doesn't work because it contains special characters. I found a solution: encode it with base64_encode. Then it works, but the string is unreadable to me in my DB manager program. Is there a way to make this work without Base64 encoding?
It should always set off a red flag when you're trying to store serialized data in a relational database. Normalize your schema so you don't have to serialize.
Storing your data in a poor format just so it is readable in the DB is not a good idea. Store it in the format that is most efficient for the database system, then update your manager to unserialize it when you are ready to display it.
json_encode is popular these days, and helps make your data portable.
If you're using PHP 5+, try using JSON instead of the native PHP serializer. JSON is a lot more portable.
But your problem could be with automatic escaping of quotes. It would be helpful if you can show examples of your input & output to/from the DB.
There's no reason why serialize shouldn't work on this example, so it may have more to do with adequate escaping of inputs in your SQL query than with the kind of serialisation you're doing. If you're using MySQL, try running the serialised data through mysql_real_escape_string() before you concatenate it into your SQL statement (that function was removed in PHP 7; on current versions use mysqli_real_escape_string() or prepared statements).
Separately, I tend to prefer json_encode() for serialisation of values to a DB field, because serialize() tends to produce data that is very hard to read manually and extremely difficult to edit.
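As a minimal sketch of the two options discussed in the answers above, using the array from the question. The variable names are only for illustration, and JSON_UNESCAPED_SLASHES assumes PHP 5.4 or later:

```php
<?php
// The array from the question.
$urls = array(
    'url1' => 'http://www.example.com',
    'url2' => 'http://www.example1.com',
);

// Native PHP format: round-trips fine, but is awkward to read or edit by hand.
$serialized = serialize($urls);
$fromSerialize = unserialize($serialized);

// JSON: JSON_UNESCAPED_SLASHES keeps "/" from being written as "\/",
// so the URLs stay readable in a DB manager.
$json = json_encode($urls, JSON_UNESCAPED_SLASHES);
$fromJson = json_decode($json, true); // true => associative array

echo $serialized, "\n";
echo $json, "\n";
```

Either string still has to be passed to the database through an escaping function or a bound parameter; the encoding itself does not make it safe for SQL.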
I need to export data in selected tables from a MySQL database and import it into another MySQL database with a slightly different structure. IOW, I need to modify the data between the export and import (and not just the field names).
I've tried using json_encode and json_decode and it almost works, but if the data is not all pure UTF-8, json_encode falls over, and utf8_encode doesn't solve this.
I'm considering CSV, serialize, and generating SQL in PHP. Which of those options will give me the most reliable transfer?
You could probably use PDO to query and create an array of objects that can be directly modified in code. Then use that data to import into the other database.
The better question is why UTF-8 isn't working for you. Make sure you are using a recent version of PHP.
Also make sure you are using the mb_ prefixed functions to maintain UTF-8 (or other multibyte) encoding.
List of PHP MultiByte String Functions
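A rough sketch of checking and repairing encoding before calling json_encode. It assumes the mbstring extension is loaded and that any non-UTF-8 values are ISO-8859-1; that source encoding is a guess for illustration and has to match your real data. The helper name toUtf8 is hypothetical:

```php
<?php
// Hypothetical helper: leave valid UTF-8 alone, convert everything else
// from an ASSUMED source encoding.
function toUtf8($value, $assumedSource = 'ISO-8859-1')
{
    if (!is_string($value) || mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    return mb_convert_encoding($value, 'UTF-8', $assumedSource);
}

// "\xE9" is "é" in ISO-8859-1 and is not valid UTF-8 on its own.
$row = array('name' => "Caf\xE9");

// On current PHP versions this returns false, and json_last_error()
// reports JSON_ERROR_UTF8.
$failed = json_encode($row);
$error = json_last_error();

// array_map keeps the string keys when given a single array.
$clean = array_map('toUtf8', $row);
$json = json_encode($clean);
```

Checking json_last_error() after a failed call is usually the quickest way to confirm that encoding, rather than the data structure, is the problem.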
You can also shape the data during the export, depending on how you're dumping the data, using the SELECT statement to re-order columns, selectively omit some, or use conditionals to change the data. The result will still be a set of rows and columns. As long as they match the structure of your destination, you can do a blind import. Chances are you're using mysqldump or something similar, though, which isn't quite as easy. Depending on the changes needed, you could always use SQL to modify the data after re-import as well.
Since SQL is made for juggling data, that could possibly be the easiest way to deal with things rather than trying to parse a ton of stuff with PHP while it's in its dumped format.
My problem was not caused by an issue with json_encode as I thought. Instead, the issue was a faulty left join.
So the answer is that all the listed methods are reliable as long as your query is formed properly.
I have an object with some data being posted to a PHP script from JavaScript. The data comes from a form: the user fills out the form, and when they hit Enter an Ajax script takes the form data, puts it into an object, then posts it to my PHP script, encoding it as JSON.
Now, I'm new to JSON, so I'm not 100% sure what it's doing. I've read a bit online, and my conclusion is that it encodes the data with a sort of universal encoding that all programming languages understand. Maybe not the best description of it, but hey. So this isn't doing the same thing as escaping the data, is it?
Anyway, before I process the data and put it into a database I want to escape it, but I'm not sure of the best way to go about this. Is there a way I could escape the whole object? Any tips or tricks for this sort of thing?
No, JSON isn't escaped at all.
On the PHP side you can use json_decode to retrieve a decoded form of the data; pass true as the second argument and you can access all of the original object's properties as a PHP array.
JSON indeed is "universal" in that it is UTF-8 by default, and multi-byte characters are escaped in \uXXXX format (PHP's json_encode does this by default).
However, if you want to store the entire JSON object in the database as-is, that doesn't take away the need to escape the entire string before you insert it into the database, using the string escaping function of your database (or parametrized queries if your library supports them).
Encoding something in JSON is not the same as escaping it. Basically, JSON is a serialization format based on JavaScript object literals. So on the PHP side you need to:
Decode the JSON to PHP
Validate the values
Escape the values
Insert the values into your db
After you decode the JSON you will be left with an array (see json_decode, and pass true as the second argument to make sure it's an array and not a mix of stdClass objects and arrays).
So then you can pull out the data you need and escape it as you normally would any array passed to you through $_POST before insertion.
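The four steps above can be sketched as follows. This is only an illustration: the payload, the users table, and its columns are made up, and an in-memory SQLite database (which needs the pdo_sqlite extension) stands in for MySQL so the snippet is self-contained. A bound parameter takes the place of a manual escaping call:

```php
<?php
// Hypothetical JSON payload as it might arrive from the Ajax call.
$payload = '{"name":"O\'Reilly","email":"tim@example.com"}';

// 1. Decode the JSON into a PHP array.
$data = json_decode($payload, true);
if (!is_array($data)) {
    throw new InvalidArgumentException('Payload is not valid JSON');
}

// 2. Validate the values.
$name = isset($data['name']) ? trim($data['name']) : '';
$email = filter_var(isset($data['email']) ? $data['email'] : '', FILTER_VALIDATE_EMAIL);
if ($name === '' || $email === false) {
    throw new InvalidArgumentException('Missing name or invalid email');
}

// 3 + 4. "Escape" and insert: with a prepared statement the driver keeps
// the values separate from the SQL text, so the quote in O'Reilly is harmless.
$pdo = new PDO('sqlite::memory:'); // stand-in for a MySQL DSN
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (name TEXT, email TEXT)');

$stmt = $pdo->prepare('INSERT INTO users (name, email) VALUES (:name, :email)');
$stmt->execute(array(':name' => $name, ':email' => $email));

$stored = $pdo->query('SELECT name FROM users')->fetchColumn();
```

If your library has no parametrized queries, the same structure applies with the database's own string escaping function in step 3.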
When I encode textarea input with PHP's rawurlencode function before storing it in MySQL, each newline is encoded as %0D%0A.
For Example:
textarea text entered by user:
a
b
Encoding this with rawurlencode and storing it in the database stores the value as a%0D%0Ab
When I retrieve it from the database, decoding with rawurldecode does not work and the code gives an error. How can I overcome this, and what is the best way to store, retrieve, and display textarea values?
Can you first encode this textarea string using base64_encode and then run base64_decode on it, if the above does not work for you?
If the textarea does not contain URLs, you should use base64_encode rather than rawurlencode and then store it as normal.
You simply should not use rawurlencode for escaping data for your database.
Each target format has its own escaping method, which in general terms makes sure the data is stored, displayed, or transferred safely from one place to another, and it doesn't need decoding at the other end.
For instance:
displaying text in HTML, use htmlentities or htmlspecialchars
storing in database, use mysqli_real_escape_string, pg_escape_string, etc...
transferring variablename, use urlencode
transferring variablecontent, use rawurlencode
etc...
Notice that decoding these things is often done by the browser or database, so no data is actually stored escaped, and decoding doesn't need to be done by your code.
The problem is probably that you escaped the string with rawurlencode, while your database expected the escaping format for its specific brand of database. It then de-escaped the string under that wrong assumption, which messed it up.
Conclusion: find out what brand database you are using, look up the specific escape function for that database, and use the proper escaping function on all your content "transferral".
P.S.: some definition may not be correct, please comment on that. I wanted to make the idea stick but am probably not using all the right terms.
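As a small sketch of the idea in this answer: keep the textarea text raw, and escape only for the format it is going into. The sample input is made up; htmlspecialchars and nl2br handle the HTML side, while the database side would use the driver's escaping function or a bound parameter:

```php
<?php
// Raw textarea input: "a", a Windows-style line break, then "b" plus some markup.
$input = "a\r\nb <b>";

// What the question describes: the line break becomes %0D%0A.
$urlEncoded = rawurlencode("a\r\nb");

// For HTML output: escape special characters first, then turn line breaks
// into <br /> tags so they are visible in the browser.
$html = nl2br(htmlspecialchars($input, ENT_QUOTES, 'UTF-8'));

echo $urlEncoded, "\n";
echo $html, "\n";
```

The stored value stays exactly what the user typed, so nothing has to be decoded when it is read back.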
First of all, it is very uncommon to run textarea input through urlencode().
urlencode was not designed for this purpose.
Second, if you still want to do this, then maybe the problem comes from the database. First you need to tell us what database you are using and what type you are using to store this data: do you store it as TEXT or as BINARY data? Have you set up the correct charset in the database?
So I need to encode an array in PHP and store it in plain text in MySQL database, my question is should I use serialize() or json_encode()? What are the advantages and disadvantages of each of them?
I think either of them would do in this situation. But which one would you prefer and why? If it is for something other than an array?
Main advantage of serialize : it's specific to PHP, which means it can represent PHP types, including instances of your own classes -- and you'll get your objects back, still instances of your classes, when unserializing your data.
Main advantage of json_encode : JSON is not specific to PHP : there are libraries to read/write it in several languages -- which means it's better if you want something that can be manipulated with another language than PHP.
A JSON string is also easier to read/write/modify by hand than a serialized one.
On the other hand, as JSON is not specific to PHP, it's not aware of the stuff that's specific to PHP -- like data-types.
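A short sketch of that difference in type handling, using a hypothetical Point class: serialize gives back an instance of the class, while JSON only carries the public data.

```php
<?php
// Hypothetical class, used only to illustrate the round trip.
class Point
{
    public $x;
    public $y;

    public function __construct($x, $y)
    {
        $this->x = $x;
        $this->y = $y;
    }
}

$point = new Point(1, 2);

// serialize/unserialize: the result is a Point again.
$viaSerialize = unserialize(serialize($point));

// JSON: the class name is lost; you get a stdClass object...
$viaJsonObject = json_decode(json_encode($point));

// ...or a plain associative array when the second argument is true.
$viaJsonArray = json_decode(json_encode($point), true);

var_dump(get_class($viaSerialize), get_class($viaJsonObject), $viaJsonArray);
```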
As a couple of sidenotes :
Even if there is a small difference in speed between those two, it shouldn't matter much : you will probably not serialize/unserialize a lot of data
Are you sure this is the best way to store data in a database ?
You won't be able to do many queries on serialized strings in a DB: you will not be able to use your data in where clauses, nor update it without the intervention of PHP...
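To illustrate the last point, here is a rough sketch of the alternative: one row per value instead of one serialized string. The user_urls table and its columns are hypothetical, and an in-memory SQLite database (pdo_sqlite extension) stands in for MySQL so the snippet runs on its own:

```php
<?php
$pdo = new PDO('sqlite::memory:'); // stand-in for a MySQL DSN
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical normalized table: each array entry becomes a row.
$pdo->exec('CREATE TABLE user_urls (user_id INTEGER, label TEXT, url TEXT)');

$urls = array(
    'url1' => 'http://www.example.com',
    'url2' => 'http://www.example1.com',
);

$insert = $pdo->prepare('INSERT INTO user_urls (user_id, label, url) VALUES (?, ?, ?)');
foreach ($urls as $label => $url) {
    $insert->execute(array(1, $label, $url));
}

// The value can now be used directly in a WHERE clause,
// which is not possible with a serialized blob.
$find = $pdo->prepare('SELECT label FROM user_urls WHERE url = ?');
$find->execute(array('http://www.example1.com'));
$label = $find->fetchColumn();
```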
I did some analysis on Json Encoding vs Serialization in PHP. And I found that Json is best for plain and simple data like array.
See the results of my experiments at https://www.shozab.com/php-serialization-vs-json-encoding-for-an-array/
Another advantage of json_encode over serialize is the size. I noticed this while trying to figure out why our memcache memory usage was getting so big and looking for ways to reduce it:
<?php
$myarray = array();
$myarray["a"]="b";
$serialize=serialize($myarray);
$json=json_encode($myarray);
$serialize_size=strlen($serialize);
$json_size=strlen($json);
var_dump($serialize);
var_dump($json);
echo "Size of serialized array: $serialize_size\n";
echo "Size of json encoded array: $json_size\n";
echo "JSON is " . round(($serialize_size-$json_size)/$serialize_size*100) . "% smaller\n";
Which gives you:
string(22) "a:1:{s:1:"a";s:1:"b";}"
string(9) "{"a":"b"}"
Size of serialized array: 22
Size of json encoded array: 9
JSON is 59% smaller
Obviously I've taken the most extreme example: the shorter the array, the more the overhead of serialize matters relative to the initial object size, because its formatting imposes a minimum number of characters no matter how small the content. Still, on a production website I see serialized arrays that are 20% bigger than their JSON equivalents.
Well firstly serializing an array or object and storing it in a database is typically a code smell. Sometimes people end up putting a comma separated list into a column and then get into all sorts of trouble when they later find out they need to query on it.
So think very carefully about that if this is that kind of situation.
As for the differences. PHP serialize is probably more compact but only usable with PHP. JSON is cross-platform and possibly slower to encode and decode (although I doubt meaningfully so).
If your data never has to leave your PHP application, I recommend serialize() because it offers a lot of extra functionality, like the __sleep() and __wakeup() methods for your objects. It also restores objects as instances of the correct classes.
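A minimal sketch of what __sleep() and __wakeup() are for, using a hypothetical Connection class whose handle is a plain string standing in for a real resource:

```php
<?php
// Hypothetical class: $handle stands in for something that cannot be
// serialized meaningfully, such as an open database connection.
class Connection
{
    public $dsn;
    public $handle;

    public function __construct($dsn)
    {
        $this->dsn = $dsn;
        $this->connect();
    }

    private function connect()
    {
        $this->handle = 'open:' . $this->dsn;
    }

    // Called by serialize(): list only the properties worth storing.
    public function __sleep()
    {
        return array('dsn');
    }

    // Called by unserialize(): rebuild whatever was left out.
    public function __wakeup()
    {
        $this->connect();
    }
}

$original = new Connection('db1');
$stored = serialize($original);
$restored = unserialize($stored);

echo $stored, "\n";
```

JSON has no equivalent hook on decode, which is part of why serialize is the more natural fit for PHP-only object storage.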
If you will pass the serialized data to another application, you should use JSON or XML for compatibility.
But storing a serialized object in a database? Maybe you should think about that again. It can be real trouble later.
First, thanks to Shozab Hasan and user359650 for these tests. I was wondering which choice was best, and now I know:
To encode a simple array, use JSON, which works with both PHP and JavaScript, and possibly other languages.
To encode a PHP object, serialize is a better choice, because PHP objects can only be instantiated by PHP.
To store data, either store the encoded data in a file or use MySQL with a standard format. It will be much easier to get your data back: MySQL has great functions for retrieving data the way you want without PHP processing.
I've never tested it, but I think file storage is the best way to store your data if the file system's sorting is enough to get your files back in alphabetical or numerical order.
MySQL is too greedy for this kind of processing, and it uses the file system too...
Quick question, is it a better idea to call htmlentities() (or htmlspecialchars()) before or after inserting data into the database?
Before: The new longer string will cause me to have to change the database to hold longer values in the field. (maxlength="800" could change to a 804 char string)
After: This will require a lot more server processing, and hundreds of calls to htmlspecialchars() could be made on every page load or AJAX load.
SOOO. Will converting when results are retrieved slow my code significantly? Should I change the DB?
I'd recommend storing the most raw form of the data in the database. That gives you the most flexibility when choosing how and where to output that data.
If you find that performance is a problem, you could cache the HTML-formatted version of this data somehow. Remember that premature optimization is a bad thing.
I have no experience of php but generally I always convert or escape nearest to output. You don't know when your output requirements will change, for example you may want to spit out data as XML, or JSON arrays and so escaping for HTML and then storing means you're limited to using the data as HTML alone.
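A brief sketch of escaping nearest to output: the same raw value (a made-up title here) is prepared separately for HTML and for JSON at the moment it is emitted.

```php
<?php
// Raw value as it would be stored in the database.
$raw = 'Steve & Fred <b>';

// HTML output: escape for HTML when writing the page.
$forHtml = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

// JSON output: json_encode applies JSON's own rules instead.
$forJson = json_encode(array('title' => $raw));

echo $forHtml, "\n";
echo $forJson, "\n";
```

Had the HTML-escaped form been stored, the JSON consumer would receive the entities as literal text and would have to undo them first.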
In a php/MySQL web app, data flows in two ways
Database -> scripting language (php) -> HTML output -> browser ->screen
and
Keyboard-> browser-> $_POST -> php -> SQL statement -> database.
Data is defined as everything provided by the user.
ALWAYS ALWAYS ALWAYS....
A) process data through mysql_real_escape_string as you move it into an SQL statement, and
B) process data through htmlspecialchars as you move it into the HTML output.
This will protect you from sql injection attacks, and enable html characters and entities to display properly (unless you manage to forget one place, and then you have opened up a security hole).
Did I mention that this has to be done for every single piece of data any user could ever have touched, altered or provided via a script?
p.s. For performance reasons, use UTF-8 encoding everywhere.
It's best to store text raw and encode it as needed. To be honest, you always need to HTML-encode your data anyway when you're outputting it to the web page, to prevent XSS attacks.
You shouldn't encode your data before you put it in the database. The main reasons are:
If such data is near the column size limit, say 32 chars, and the title is "Steve & Fred blah blah", then you might go over that column limit because a 1-char & becomes a 5-char &amp;
You are assuming the data will always be displayed in a web page, in the future you never know where you'll be looking at the data and you might not want it encoded, now you have to decode it and it's possible you might not have access to PHP's decode function
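Both reasons can be seen in a few lines. This is only a sketch with a made-up title: it shows the length growth from encoding and the extra decode step a non-HTML consumer would need.

```php
<?php
$title = 'Steve & Fred';

// Encoding before storage makes the value longer: "&" becomes "&amp;".
$encoded = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
$growth = strlen($encoded) - strlen($title);

// Anything that is not a web page has to undo the encoding again.
$decoded = htmlspecialchars_decode($encoded, ENT_QUOTES);

echo "$title (", strlen($title), " bytes) -> $encoded (", strlen($encoded), " bytes)\n";
```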
It is the way of the craftsman to "measure twice, optimize once".
If you don't need high performance for your website, store it as raw data and when you output it do what you want.
If you need performance then consider storing it twice: raw data to do what you want with it and another field with the filtered data. It could be seen as redundant, but CPU is expensive, while data storage is really cheap.
The easiest way is store the data "as is" and then convert to htmlentities wherever it is needed.
The safest solution is to filter the data before it goes in into the Database as this prevents possible attacks on your server and database from the lack of security implementation, and then convert it however you need when needed. Also if you are using PDO this will happen automatically for you using prepared statements.
http://php.net/PDO
We had this debate at work recently. We decided to store the escaped values in the database, because before (when we were storing it unescaped) there were corner cases where data was being displayed without being escaped. This can lead to XSS. So we decided to store it escaped to be safe, and if you want it unescaped you have to do the work yourself.
Edit: So to everyone who disagrees, let me add some backstory for my case. Let's say you're working in a team of 50+ people... and data from the database is not guaranteed to be HTML-Encoded on the way out - there's no built-in mechanism for it so the developer has to write the code to do it. And this data is shown all over the place so it's not going through 1 developer's code it's going through 30's - most of whom have no clue about this data (or that it could even contain angle brackets which is rare) and merely want to get it shown on the page, move on, and forget about it.
Do you still think it's better to put the data, in HTML, into the database and rely on random people who are not-you to do things properly? Because frankly, while it certainly may not seem warm-fuzzy-best-practicey, I prefer to fail closed (meaning when the data comes through in a Word Doc it looks like Value&lt;Stock rather than Value<Stock) rather than open (so the Word Doc looks right with no work, but some corner of the platform may/likely-is vulnerable to XSS). You can't have both.