I am trying to store temporary data (such as cart products and session data) in a DB. I chose YAML for this instead of the serialize() function, because YAML data is easily readable by humans and portable between programming languages.
Will I run into trouble if I store my temporary data in the database as YAML?
Personally I would use serialize for two reasons:
It's included in PHP by default.
What you put in is what you get out.
Regarding the second point: serialize() doesn't just convert to a string. It records the type as well, and PHP calls magic methods on objects so you can choose what to serialise and what to do with the data when you unserialise it.
See: __sleep and __wakeup
It may not be easy to read directly from the database but it wouldn't take two minutes to write a script that could pull it out, unserialise it and do a print_r on the data to view what's stored.
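As a rough illustration of the points above, here is a minimal sketch of a round trip through serialize() and unserialize(). The Cart class and its properties are hypothetical, invented only to show where __sleep() and __wakeup() fit in.

```php
<?php
// Hypothetical cart class, used only to illustrate __sleep() and __wakeup().
class Cart
{
    public $items = [];
    public $loadedAt;   // runtime-only value, not worth storing

    public function __sleep()
    {
        // Only the properties named here end up in the serialized string.
        return ['items'];
    }

    public function __wakeup()
    {
        // Called by unserialize(); rebuild whatever was left out.
        $this->loadedAt = time();
    }
}

$cart = new Cart();
$cart->items = ['sku-1' => 2, 'sku-2' => 1];

$stored   = serialize($cart);      // this string is what would go in the DB column
$restored = unserialize($stored);  // comes back as a Cart, with array and int types intact

print_r($restored->items);         // quick way to inspect what was stored
```

The same unserialize() plus print_r() pair is all the "viewer script" mentioned above would need.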
Personally, I wouldn't use YAML. It's too format-dependent (requiring new lines, whitespace, etc.) and there's no native parser in PHP. Instead, I'd use JSON for this. It's trivial to handle natively, and is quite human readable (not as much as YAML, but much more so than serialized data). It's the best of both worlds.
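For comparison, a minimal sketch of the JSON route using the built-in json_encode() and json_decode(). The cart layout and key names here are made up for the example.

```php
<?php
// Hypothetical cart data; the keys are invented for illustration.
$cart = [
    'user_id' => 42,
    'items'   => [
        ['sku' => 'sku-1', 'qty' => 2],
        ['sku' => 'sku-2', 'qty' => 1],
    ],
];

$json = json_encode($cart);        // compact string for the DB column
$back = json_decode($json, true);  // true => associative arrays instead of stdClass

// A readable form for debugging, straight from the stored value.
echo json_encode($back, JSON_PRETTY_PRINT), "\n";
```

Note that, unlike serialize(), JSON does not remember PHP class names, so objects come back as plain arrays (or stdClass).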
But, with that said, you really should ask yourself the question as to why you want to store a serialized representation of a complex data structure in a field in the DB... For most cases, it might be better to store a normalized representation of the data (so it's searchable easily, etc). It's not "bad" to store serialized data, but it might not be optimal or the right choice depending on what you're trying to do. It's generally far better than using an Entity-Attribute-Value store, but you need to really think about what you're doing to decide if it's the right thing.
Just make sure you are escaping everything potentially dangerous (i.e. user input) and you are fine.
Related
Here's a scenario, I'm importing data from multiple different sources, and some do encode special chars, some don't.
For example, some will send it like this: 6.67&quot; and others will send the same data as 6.67".
Is there any possible downside (I don't care about any potential performance hit), if I simply run all strings through html_entity_decode?
If there's some downside, what would be the best way to ensure, that ultimately I end up with uniform values?
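A small sketch of what decoding everything would do, assuming UTF-8 input. The normalizeText() helper is a hypothetical name; html_entity_decode() and its flags are the standard PHP ones.

```php
<?php
// Hypothetical helper: decode HTML entities so both feed styles end up identical.
function normalizeText(string $value): string
{
    // ENT_QUOTES decodes both double- and single-quote entities; UTF-8 is assumed.
    return html_entity_decode($value, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

$encoded = '6.67&quot;';   // source that encodes special characters
$plain   = '6.67"';        // source that does not

$same = normalizeText($encoded) === normalizeText($plain);

// One possible downside: text that legitimately contains an entity-like
// sequence is altered as well.
$literal = 'Use &amp; to write an ampersand';
$changed = normalizeText($literal);

var_dump($same, $changed);
```

So the two spellings do converge, but a source that intentionally sends literal text such as `&amp;` would be changed too. Whether that matters depends on the feeds involved.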
I'm working with a number of XML feeds to retrieve data (from an external source). I will be retrieving the data, then sending this to my own MySQL database, so that I can then manipulate it how I wish.
I'm just hoping for some advice on best practice in terms of this process please. I'd like to make this as automated as possible, but I'm cautious of sending unvalidated XML data from an external source straight to my own database.
I will be putting in place a few standard validations to escape strings, etc, but should I be looking to 'cleanse' every piece of data (automatically) before committing to my own DB?
Should I perhaps validate each piece of data against its own set of rules before it makes its way to my database?
I hope that's clear enough. I'd love to hear some opinions if possible please.
There are two things you should worry about: (1) SQL injection and (2) cross-site scripting.
The first one is simple: just use prepared statements with mysqli or PDO.
For cross-site scripting you can either clean the data before you put it in the database or when you retrieve it. Personally, I like to do the second. Just use the htmlspecialchars() function before you echo something and you should be safe.
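A minimal sketch of both halves of that advice. The $pdo connection, the feed_items table, and the helper names saveTitle() and escapeHtml() are assumptions made for the example; PDO::prepare(), execute(), and htmlspecialchars() are the standard calls.

```php
<?php
// Assumption: $pdo is an already-connected PDO instance and the
// `feed_items` table is hypothetical.
function saveTitle(PDO $pdo, string $title): void
{
    $stmt = $pdo->prepare('INSERT INTO feed_items (title) VALUES (:title)');
    $stmt->execute([':title' => $title]);  // the value is bound, never concatenated into the SQL
}

// Escape at output time, just before echoing into an HTML page.
function escapeHtml(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

echo escapeHtml('<script>alert("x")</script>'), "\n";
```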
I'm passing urlencode()d serialize()d arrays around my webpages, via $_GET[].
Is it safe to unserialize() a value from $_GET? The unserialized array will sometimes be shown to the user. Would it be possible for a user to expose or reference variables, functions, etc. within my code? In other words, when unserializing the value, does PHP treat it as data or code?
Update:
I see the documentation says:
"Circular references inside the array/object you are serializing will also be stored. Any other reference will be lost."
So that means I'm safe? :-)
Absolutely, positively, no.
You shouldn't blindly trust anything from the client side, however there is a way you can give yourself more confidence.
I'm assuming that if you've got PHP serialized data coming from the client side, that client obtained that from a server at some point? If that's the case, and the client doesn't modify the data, you could include a hash along with the data to verify it hasn't been tampered with.
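One way to sketch the "include a hash along with the data" idea is a keyed hash (HMAC), so that the client cannot simply recompute it. Everything named here is an assumption for illustration: the $secret value, the signPayload() and verifyPayload() helpers, and the choice of JSON rather than serialize() for the payload.

```php
<?php
// Assumption: $secret is known only to the server (placeholder value here).
$secret = 'replace-with-a-long-random-server-side-key';

function signPayload(array $data, string $secret): string
{
    $payload = base64_encode(json_encode($data));
    $mac     = hash_hmac('sha256', $payload, $secret);
    return $payload . '.' . $mac;          // still needs urlencode() before going in a URL
}

function verifyPayload(string $token, string $secret): ?array
{
    $parts = explode('.', $token, 2);
    if (count($parts) !== 2) {
        return null;                        // malformed token
    }
    [$payload, $mac] = $parts;
    $expected = hash_hmac('sha256', $payload, $secret);
    if (!hash_equals($expected, $mac)) {    // constant-time comparison
        return null;                        // tampered with
    }
    $decoded = base64_decode($payload, true);
    $data    = $decoded === false ? null : json_decode($decoded, true);
    return is_array($data) ? $data : null;
}

$token = signPayload(['page' => 2, 'sort' => 'price'], $secret);
var_dump(verifyPayload($token, $secret));
```

A plain unkeyed hash would not help here, because anyone who edits the data can also recompute the hash.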
The other alternative would be to unserialize the object, but regard it as 'tainted', then copy and re-verify the unserialized data into a 'clean' object.
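The "tainted, then copy into something clean" approach might look roughly like this. The cleanFilters() function and the page/sort keys are hypothetical; the idea is only that nothing reaches the clean array without passing an explicit check.

```php
<?php
// Hypothetical whitelist: the only keys and values this page expects.
function cleanFilters(array $tainted): array
{
    $clean = [];

    if (isset($tainted['page']) && is_numeric($tainted['page'])) {
        $clean['page'] = max(1, (int) $tainted['page']);
    }

    if (isset($tainted['sort']) && in_array($tainted['sort'], ['name', 'price'], true)) {
        $clean['sort'] = $tainted['sort'];
    }

    return $clean;   // unknown keys and unexpected values are silently dropped
}

// Tainted input as it might arrive after decoding a request parameter.
$tainted = ['page' => '3', 'sort' => 'price; DROP TABLE x', 'admin' => 1];
print_r(cleanFilters($tainted));
```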
This method is as "safe" as any other kind of incoming GET or POST data - you will always need to sanitize the data before working with it! But there are additional issues with unserializing user data.
When unserializing an object, PHP will look whether the class has a __wakeup magic method. That method will get executed if present.
Now this is not a massive security hole in itself, because the class definition is never transmitted in the serialized data. Any malicious code would have to be present in the system already. However, there are conceivable scenarios where this could be a problem (e.g. a plug-in system that can install third party code) and I would be very wary with this.
Also, theoretically, this allows an attacker to create an object of any class inside your script. While not a security problem straight away, it is surely not good practice to do.
JSON encoding would be a safer way, because it can contain only "dumb" data.
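To make the __wakeup concern concrete, here is a small sketch. The Logger class is hypothetical and stands in for any class already loaded in the application. The `allowed_classes` option of unserialize() has been available since PHP 7.0.

```php
<?php
// Hypothetical class standing in for any class the application already defines.
class Logger
{
    public static $woken = 0;

    public function __wakeup()
    {
        self::$woken++;   // runs as a side effect of unserialize()
    }
}

$fromClient = 'O:6:"Logger":0:{}';   // a string an attacker could place in the URL

unserialize($fromClient);            // creates a Logger and calls __wakeup()

// With allowed_classes => false, objects become __PHP_Incomplete_Class
// and no __wakeup() is called.
$safe = unserialize($fromClient, ['allowed_classes' => false]);

// JSON can only ever yield scalars, arrays and (optionally) stdClass objects.
$data = json_decode('{"page":2,"tags":["a","b"]}', true);

var_dump(Logger::$woken, get_class($safe), $data);
```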
You are serializing only the data part of objects, arrays, and variables. The actual executable code is not serialized, and there would be no point in doing that: serialization helps to transfer your data between two different worlds, and the code executed on each side can be the same or different. For the data it does not matter.
Hacks are still possible, but only ones based on data: classes, types, and values might differ from what you expect, and it's up to your code how it copes with errors during deserialization.
Yes, it's safe. You are asking whether it is safe to serialize the value of the $_GET array. Yes, it is safe. Nothing gets executed during the serialization of an array. Since the $_GET array does not contain any objects, only the parameters from the query string, it cannot do any harm during serialization/unserialization.
You mentioned something you saw on documentation about circular references. Don't worry about that, it does not apply in your case because there are no objects inside the $_GET array.
As for using the actual data from the $_GET array, that's a different question and the answer would be no: it's not safe to use data from the $_GET array without applying some type of filter or validation first.
I wish to be able to generate URL variables like this:
http://example.com/195yq
http://example.com/195yp
http://example.com/195yg
http://example.com/195yf
The variables will match to a MySQL record set so that I can pull it out. At the time of creation of that record set, I wish to create this key for it.
How can I do this? What is the best way to do this? Are there any existing classes that I can make use of?
Thanks all
Basically, you need a hash function of some sort.
This can be a SHA-1 function, an MD5 function (as altCogito suggested), or using other data you have in the record and encoding it.
How many records do you think you will have? Be careful on selecting the solution as a hash function has to be big enough to cover you well, but too big and you create a larger database than you need. I've used SHA-1 and truncate to 64 or 32 bits, depending on need.
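A minimal sketch of the truncated-hash idea, assuming the key is derived from some seed string for the record. The shortKey() helper and the seed format are hypothetical; sha1() and substr() are standard.

```php
<?php
// Hypothetical helper: derive a short key by truncating a SHA-1 hex digest.
function shortKey(string $seed, int $hexChars = 8): string
{
    // Each hex character carries 4 bits, so 8 characters = 32 bits, 16 = 64 bits.
    return substr(sha1($seed), 0, $hexChars);
}

$key32 = shortKey('record-1001');
$key64 = shortKey('record-1001', 16);

echo $key32, "\n", $key64, "\n";
```

Truncation makes collisions more likely, so a sketch like this would normally be paired with a UNIQUE index on the key column and a retry with a different seed when the insert fails.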
Also, look at REST. Consider that the URL may not have to be so mysterious...
This can be done in a couple of fairly straightforward ways:
Use a hash, such as MD5. (would be long)
Base-64 encode a particular ID or piece of data
If you're asking whether or not there is anything that does this already with MySQL records, I'm not sure that you'll find something as design-wise, data is really, really conceptually far away from the URLs in your website, even in frameworks like Grails. Most don't even attempt to wrap up front-end and data-access functionality into a single piece.
In what context will you be using this? Why not just pass post variables? It is far more secure and neat. You will still accomplish the number in the url such as id=195yq, and there is a way to hide them by configuring your php.ini file.
I hope this can be of help to you.
Please keep this in mind. When you pass variables in the address bar it is easy for someone to change the variable, and access information you may not want them to access.
In the examples you gave, it looks like you're base-64 encoding the numeric primary key. That's how I would do it and have done it in the past, but I'm not sure it's any better than passing the ID in the clear because base-64 decoding is trivial.
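As a variation on encoding the numeric primary key, here is a sketch that uses base 36 (digits plus lowercase letters) via PHP's base_convert() rather than base-64. Whether the keys in the question were produced this way is only a guess; the idToKey() and keyToId() names are hypothetical.

```php
<?php
// Hypothetical helpers: map an auto-increment ID to a short URL key and back.
function idToKey(int $id): string
{
    return base_convert((string) $id, 10, 36);
}

function keyToId(string $key): int
{
    return (int) base_convert($key, 36, 10);
}

echo idToKey(2107250), "\n";   // prints 195yq
echo keyToId('195yq'), "\n";   // prints 2107250
```

As noted above, this is an encoding and not a secret: anyone can decode the key or guess neighbouring ones, so access checks still belong on the server.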
In the MVC way of doing things, where is the best place to run, for example htmlspecialchars() on any input? Should it happen in the view (it sort of makes sense to do it here, as I should be dealing with the raw input throughout the controller and model?)
I'm not quite sure... What are the benefits of doing it in the view or the controller? This is just regarding output to a page, to minimize potential XSS exploits.
Well, that depends, doesn't it? You should sanitize everything you OUTPUT in the view. First, because sanitization depends on the format of your output. A JSON sanitized output is different than an HTML sanitized output, right? Second, because you never want to trust the data you have. It might have been compromised through any number of ways.
That won't protect against SQL injections and such, though. Now, you never want to do that in client-side JavaScript, because an attacker may easily replace that. Again, my advice is sanitization at the point of usage. If you are just writing to a file, it might not be needed. And some database access libraries do not need it either. Others do.
At any rate, do it at the point of usage, and all the source code becomes more reliable against attacks and bugs (or attacks through bugs).
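To illustrate why the escaping depends on the output format, here is a small sketch that takes one raw value and prepares it for three different contexts. The sample string is made up; the functions are standard PHP.

```php
<?php
// The raw value is stored and passed around untouched.
$comment = '<b>Tom & Jerry</b>';

// HTML context: neutralise markup characters.
$forHtml = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

// JSON context: json_encode() handles quoting and escaping for JSON.
$forJson = json_encode(['comment' => $comment]);

// URL context: percent-encode for use in a path segment or query value.
$forUrl = rawurlencode($comment);

echo $forHtml, "\n", $forJson, "\n", $forUrl, "\n";
```

Each result is only correct for its own context, which is the argument for escaping at the point of use and not once up front.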
This is why thinking in design patterns sucks. What you should be asking is: where is the most efficient place to do this? If the data is write-once/read-many, then sanitising it every time it's output (on page view) is going to put unnecessary load on the server. Make your decision based on how the data will be used, where you can set up caching, how you do searches, etc., not on the merits of a pattern.
From what you've said I'd perform the sanitation just ahead of writing it to the DB. Then you're not only ensuring the data is safe to insert but you're also ensuring that no future mistakes can result in unsanitised data being sent. If you ever want the original text for some reason you just invert your original transformation.
You should not be concerned about storing html encoded text in your DB since ALL text is encoded in one form or another. If you need to search the text you just encode the search string as well. If you need another format then that's another story but then you would have to evaluate your options based on your needs.
I think the best way is to escape the view - output, and store everything in original in your database.
Why ? With this method you're able to use the db records for every use case.
You can do it in the view (via javascript validation), but data coming from the rendered view to the controller is still considered untrusted, so you will still have to sanitize it in the controller.
In the examples I've seen (such as nerddinner), the sanitizing code is part of the model classes. Some people use validation libraries.
I don't think there's any 'best' place to sanitize. Depending on the use case, we may need to implement sanitizing logic in more than one tier.
The general rule is : fat model, thin controller.
Now, how you apply that rule is a different story :)
The way I think of it, your controller should really just be controlling the flow, redirecting to pages, etc. Any validation should take place in your model. If you want to do client-side validation, you'd probably put it in the view. Any developer concerned about security would do validation on the client and on the server.
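A rough sketch of what "validation in the model" could look like. The Comment class, its fields, and its rules are all invented for illustration and are not tied to any particular framework.

```php
<?php
// Hypothetical model: it owns its validation rules; the controller only
// asks for the result and decides where to go next.
class Comment
{
    public $author;
    public $body;

    public function __construct(string $author, string $body)
    {
        $this->author = $author;
        $this->body   = $body;
    }

    /** Returns an array of error messages keyed by field; empty means valid. */
    public function validate(): array
    {
        $errors = [];

        if (trim($this->author) === '') {
            $errors['author'] = 'Author is required.';
        }

        $length = strlen($this->body);
        if ($length < 1 || $length > 500) {
            $errors['body'] = 'Body must be between 1 and 500 bytes.';
        }

        return $errors;
    }
}

// Controller-side usage: just flow control.
$comment = new Comment('', 'Nice post');
$errors  = $comment->validate();
print_r($errors);
```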
I put it in the "controller" as most of today's frameworks define it. (Not getting into the discussion of how pure that is) It is not something that belongs directly in a view template, but it also does not necessarily need to be in the model, as you may want the original data sometimes and not others.
So when I'm loading data from the model in the controller and assigning it to a view (smarty template, in my case), I run it through the HTML Purifier first.
I'm going to buck the answering trend here and give this advice:
Untrusted input should be confined as rigidly as possible - by reducing the number of places that you interact with input before its safety has been evaluated, you reduce your threat exposure when someone who is thinking about a bug fix or functionality improvement rather than security changes the system under discussion.
It depends on the type of user input and what validation you're running on it.
If you want to clean the input, I'd put the logic in the controller, and also in the view when you output data that comes from the database (or any source really).
If you are doing data validation, I'd do it both on the client side with javascript, as well as in the model.