Regex to match PHP serialized data inside a string - php

I'm working with Zend Framework 2's session manager in PHP, and want to unserialize the session data so I can change the way the data is stored. I thought regex was the way to do it, but I can't figure out how to make sure the regex is right for this type of string.
Sample input:
__ZF|a:2:{s:20:"_REQUEST_ACCESS_TIME";d:1099999999.9999999999999999999999;s:6:"_VALID";a:1:{s:25:"Zend\Session\Validator\Id";s:26:"xxxxxxxxxxxxxxxxxxxxxxxxxx";}}initialized|C:23:"Zend\Stdlib\ArrayObject":403:{a:4:{s:7:"storage";a:3:{s:4:"init";i:1;s:10:"remoteAddr";s:13:"127.000.00.01";s:13:"httpUserAgent";s:114:"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";}s:4:"flag";i:2;s:13:"iteratorClass";s:13:"ArrayIterator";s:19:"protectedProperties";a:4:{i:0;s:7:"storage";i:1;s:4:"flag";i:2;s:13:"iteratorClass";i:3;s:19:"protectedProperties";}}}
Expected output:
'__ZF|a:2:{s:20:"_REQUEST_ACCESS_TIME";d:1099999999.9999999999999999999999;s:6:"_VALID";a:1:{s:25:"Zend\Session\Validator\Id";s:26:"xxxxxxxxxxxxxxxxxxxxxxxxxx";}}'
'initialized|C:23:"Zend\Stdlib\ArrayObject":403:{a:4:{s:7:"storage";a:3:{s:4:"init";i:1;s:10:"remoteAddr";s:13:"127.000.00.01";s:13:"httpUserAgent";s:114:"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";}s:4:"flag";i:2;s:13:"iteratorClass";s:13:"ArrayIterator";s:19:"protectedProperties";a:4:{i:0;s:7:"storage";i:1;s:4:"flag";i:2;s:13:"iteratorClass";i:3;s:19:"protectedProperties";}}}'
What I tried:
$pattern = '/\w+\|.*?}}+/'; // this works for the sample input, but may be too general and certainly won't work for serialized data without a nested array
$pattern = '/\w+\|(a:\d+:{.*?}|o:\d+:\"[a-z0-9_]+\":\d+:{.*?})/'; // doesn't capture the `initialized` data
Where I am stuck:
Put generally, I can't figure out the best way to split apart the __ZF data from the initialized data (especially when there are other non-Zend variables in the session). Specifically, I can't figure out what regex to use to get serialized data.
I tried to put an example on RegexPlanet, but couldn't figure out the interface, and it only seemed to produce bizarre results. If it helps, I'm fairly sure ZF PHP produces its serialized session data like this:
$text = "";
foreach ($_SESSION as $key => $value) {
    $text .= $key . "|" . serialize($value);
}
...but I haven't found the source code for that.

I found out about ini_set('session.serialize_handler', 'php_serialize'). It switches session serialization to PHP's regular serialize() format instead of the default handler's key|value format, which solves the problem. – Miryafa
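If switching handlers is not an option, one way to split the default handler's format without a regex is to walk the string: read the key up to the next |, unserialize the value that follows, and use the length of the re-serialized value to find where the next key starts. The sketch below assumes the format is simply key, |, serialized value, repeated; the function name decode_php_session is hypothetical. It also assumes serialize() reproduces each value byte for byte, which may not hold for floats written with extra digits (such as the _REQUEST_ACCESS_TIME value in the sample) or for classes that are not loaded, so treat it as a starting point rather than a complete parser.

```php
<?php
// Hypothetical helper: splits a session string in the default "php" handler
// format (key|serialized-value, repeated) into an associative array.
function decode_php_session(string $raw): array
{
    $result = [];
    $offset = 0;
    $length = strlen($raw);

    while ($offset < $length) {
        // The key runs up to the next pipe character.
        $pipe = strpos($raw, '|', $offset);
        if ($pipe === false) {
            throw new UnexpectedValueException("No key separator found at offset $offset");
        }
        $key = substr($raw, $offset, $pipe - $offset);
        $rest = substr($raw, $pipe + 1);

        // unserialize() reads one value from the front of the string; the "@"
        // hides the trailing-data warning that newer PHP versions emit.
        $value = @unserialize($rest);
        if ($value === false && strncmp($rest, 'b:0;', 4) !== 0) {
            throw new UnexpectedValueException("Could not unserialize value for key '$key'");
        }

        $result[$key] = $value;
        // Assumption: re-serializing yields the same number of bytes as the input.
        $offset = $pipe + 1 + strlen(serialize($value));
    }

    return $result;
}

$session = decode_php_session('a|i:1;b|s:3:"x|y";c|a:1:{s:1:"k";b:1;}');
print_r($session);
```

Note that a | inside a string value (as in key b above) is not a problem here, because the walker never searches for a separator inside a value; it only skips over it by length.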

Related

I need to add fields to a JSON string in PHP, but I'm having problems

So... I need to save a large-ish amount of data from a platform with an excruciatingly limited amount of memory.
Because of this, I'm basically storing the data on my webserver, using a php script to just write JSON to a flat file, because I'm lazy af.
I could go to the trouble of having it store the data in my mysql server, but frankly the flat file thing should have been trivial, and I've run up against a problem. There are several quick and dirty workarounds that would fix it, but I've been trying to fix it the "right" way. (I know, I know, the right way would be to just store the data in mysql, but I actually need to be able to take the json file this produces and send it back to the platform that needs the data, in a ridiculously roundabout fashion, so it made sense to just have the php save it as a flat file in the first place. And it's already working, aside from this one issue, so I hate to reimplement.)
See... Because of the low memory on the platform I'm sending the json to my server from... I'm sending things one field at a time. Each call to the php script is only setting ONE field.
So basically what I'm doing is loading the file from disk if it exists, and running it through json_decode to get my storage object, and then the php file gets a key argument and a value argument, and if the key is something like "object1,object2", it explodes that, gets the length of the resulting array, and then stores the value in $data->$key[0]->$key[1].
Then it's saved back to disk with fwrite($file, json_encode($data));
This is all working perfectly. Except when $value is a simple string. If it's an array, it works perfectly. If it's a number, it works fine. If it's a string, I get null from json_decode. I have tried every way I can think of to force quotes on to the ends of the $value variable in the hopes of getting json_decode to recognize it. Nothing works.
I've tried setting $data->$key[0]->$key[1] = $value in cases where value is a string, and not an array or number. No dice, php just complains that I'm trying to set an object that doesn't exist. It's fine if I'm using the output of json_decode to set the field, but it simply will not accept a string on its own.
So I have no idea.
Does anyone know how I can either get json_decode to not choke on a string that's just a string, or add a new field to an existing php object without using the output of json_decode?
I'm sure there's something obvious I'm missing. It should be clear I'm no php guru. I've never really used arrays and objects in php, so their vagaries are not something I'm familiar with.
Solutions I'm already aware of, but would prefer to avoid, are: I could have the platform that's sending the post requests wrap single, non-numeric values with square braces, creating a single item array, but this shouldn't be necessary, as far as I'm aware, so doing this bothers me (And ends up costing me something like half a kilobyte of storage that shouldn't need to be used).
I could also change some of my json from objects to arrays in order to get php to let me add items more readily, but it seems like there should be a solution that doesn't require that, so I'd really prefer not to...
I skimmed through your post.
I know this works for stdClass:
$yourClass->newField = $string;
Is this what you wanted?
OK so... ultimately, as succinctly as possible, the problem was this:
Assuming we have this JSON in $data:
{
"key1":
{
"key2":["somedata","someotherdata"]
}
}
And we want it to be:
{
"key1":
{
"key2":["somedata","someotherdata"],
"key3":"key3data"
}
}
The php script has received "key=key1,key3&value=key3data" as its post data, and is initialized thusly:
$key = $_POST["key"];
$key = explode(",", $key); // explode() takes the separator first, then the string
$value = $_POST["value"];
...which provides us with an array ($key) representing the nested json key we want to set as a field, and a variable ($value) holding the value we want to set it to.
Approach #1:
$data->$key[0]->$key[1] = json_decode($value);
...fails. It creates this JSON when we re-encode $data:
{
"key1":
{
"key2":["somedata","someotherdata"],
"key3":null
}
}
Approach #2:
$data->$key[0]->$key[1] = $value;
...also fails. It fails to insert the field into $data at all.
But then I realized... the problem with #2 is that it won't let me set the nonexistent field, and the problem with approach #1 is that it sets the field wrong.
So all I have to do is brute force it thusly:
$data->$key[0]->$key[1] = json_decode($value);
// Strict comparison, so values like 0, false or [] are not mistaken for a failed decode
if (json_decode($value) === null)
{
    $data->$key[0]->$key[1] = $value;
}
This works! Since Approach #1 has created the field (Albeit with the incorrect value), PHP now allows me to set the value of that field without complaint.
It's a very brute force sort of means of fixing the problem, and I'm sure there are better ones, if I understood PHP objects better. But this works, so at least I have my code working.
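A less brute-force variant is sketched below. json_decode() returns null for a bare string like key3data because that is not valid JSON on its own, and json_last_error() can tell that case apart from a real JSON null. The braces in $node->{$segment} make the property lookup explicit, which matters because $data->$key[0] is parsed differently in PHP 5 and in PHP 7 and later. The helper names decode_or_raw and set_nested are hypothetical, and this is only a sketch under the assumption that every intermediate key should be an object.

```php
<?php
// Hypothetical helper: decode $value as JSON if it is valid JSON,
// otherwise keep it as the raw string that was posted.
function decode_or_raw(string $value)
{
    $decoded = json_decode($value);
    return json_last_error() === JSON_ERROR_NONE ? $decoded : $value;
}

// Hypothetical helper: walk (and create, if missing) nested objects along
// $path, then set the final property to $value.
function set_nested(stdClass $data, array $path, $value): void
{
    $last = array_pop($path);
    $node = $data;
    foreach ($path as $segment) {
        if (!isset($node->{$segment}) || !is_object($node->{$segment})) {
            $node->{$segment} = new stdClass();
        }
        $node = $node->{$segment};
    }
    $node->{$last} = $value;
}

$data = json_decode('{"key1":{"key2":["somedata","someotherdata"]}}');
set_nested($data, explode(',', 'key1,key3'), decode_or_raw('key3data'));
echo json_encode($data), "\n";
```

Because objects are passed by handle in PHP, the changes made through $node show up in $data without returning anything.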

PHP Unserialize data for use in array - sub standard characters in string

I am using a jQuery plugin of nestable forms and storing the order of these in a database using serialize (achieved through JS). Once I retrieve this data from the database I need to be able to unserialize it so that each piece of data can be used.
An example of the data serialized and stored is
[{"id":"H592736029375"},{"id":"K235098273598"},{"id":"B039571208517"}]
The number of ID's stored in each serialized data varies and the JS plugin adds the [ and ] brackets around the serialization.
I have used http://www.unserialize.com/ to test an unserialization of the data and it proves successful using print_r. I have tried replicating this with the following code:
<?php
print_r(unserialize('[{"id":"H592736029375"},{"id":"K235098273598"},{"id":"B039571208517"}]'));
?>
but I get an error. I am guessing that I need to use something similar to strip_tags to remove the brackets, but am unsure. The error given is as follows
Notice: unserialize(): Error at offset 0 of 70 bytes
Once I have the unserialized data I need to be able to use each ID as a variable and I am assuming to do so I need to do something as:
<?php
$array = unserialize('[{"id":"H592736029375"},{"id":"K235098273598"},{"id":"B039571208517"}]');
foreach($array as $key => $val)
{
// Do something here, use each individual ID however
// e.g database insert using $val['id']; to get H592736029375 then K235098273598 and finally B039571208517
}
?>
Is anyone able to offer any help as to how to strip the serialized data correctly to have the ID's ready in an array to then be used in the foreach function?
Much appreciated.
PHP's serialize() and unserialize() functions are PHP specific, not for communicating with other languages.
It looks like your JS serialize function is actually generating JSON though, so on the PHP side, use json_decode() rather than unserialize.
Here's an example:
$data = '[{"id":"H592736029375"},{"id":"K235098273598"},{"id":"B039571208517"}]';
$array = json_decode($data, true);
foreach ($array as $index => $data) {
    echo "$index) {$data['id']}\n";
}
Outputs:
0) H592736029375
1) K235098273598
2) B039571208517

Find and concatenate result in a pattern

I have a long PHP file and I want to copy only the variable names and build an INSERT SQL query from them. Is there a way to search for a pattern using a regular expression and concatenate the matches until I have collected all the variables, then output them in one statement?
I am using TextMate and am familiar with regular expression search. Regex search results give $0, $1 and so forth as arguments. I don't know if this is possible, though. A solution in any editor will do, not just TextMate.
I have too many variables (100+) and don't feel like copying every single one. Here is my sample file:
$ID = $_POST['id'];
$TXN_TYPE = $_POST['txn_type'];
$CHARSET = $_POST['charset'];
$CUSTOM = $_POST['custom'];
You could try something with get_defined_vars(). However this function also lists GLOBAL vars. You can use this snippet to remove them if you don't want them and display only the vars you defined
$variables = array_diff(get_defined_vars(), array(array()));
However this snippet generates Notices and I haven't found a way to solve them yet.
If you've only got $_POST variables you can loop through the $_POST array itself
You create the SQL programmatically while looping through the array.
My own solution is to do the inverse, since what you're asking for is probably not possible directly.
Leave only the variable names and remove all the rest. Use the regex
[space].+ to remove everything that comes after the variable name.
Clean the file so that only variable names are left, then do a couple more find-and-replace passes to bring the variable names into the form you want.
If you're looking to match only the variable names (not the $_POST array indices), then the regular expression is pretty much provided in the PHP documentation:
\$[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*
This will, of course, include $_POST, but that should be easy enough to remove. If not, you could do it with negative lookahead (if TextMate supports it):
\$(?!_POST($|[^a-zA-Z0-9_\x7f-\xff]))[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*
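If running a short script is acceptable instead of an editor search, the same pattern can collect the names and assemble the query in one pass. This is a rough sketch: the function name build_insert, the table name my_table, and the choice of lowercased named placeholders are all assumptions for illustration, and the pattern is anchored to line starts so only assigned variables are picked up.

```php
<?php
// Hypothetical helper: collects variable names assigned at the start of a line
// and turns them into an INSERT statement with named placeholders.
function build_insert(string $source, string $table): string
{
    // Variable-name pattern from the PHP manual, anchored to line starts (/m)
    // so that $_POST on the right-hand side is never matched.
    $pattern = '/^\s*\$([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)\s*=/m';
    preg_match_all($pattern, $source, $matches);

    $columns = $matches[1];
    $placeholders = [];
    foreach ($columns as $column) {
        $placeholders[] = ':' . strtolower($column);
    }

    return sprintf(
        'INSERT INTO %s (%s) VALUES (%s)',
        $table,
        implode(', ', $columns),
        implode(', ', $placeholders)
    );
}

// Sample input in a nowdoc, so the dollar signs are not interpolated.
$source = <<<'PHP'
$ID = $_POST['id'];
$TXN_TYPE = $_POST['txn_type'];
$CHARSET = $_POST['charset'];
$CUSTOM = $_POST['custom'];
PHP;

echo build_insert($source, 'my_table'), "\n";
// prints: INSERT INTO my_table (ID, TXN_TYPE, CHARSET, CUSTOM) VALUES (:id, :txn_type, :charset, :custom)
```

In real use the file contents would come from file_get_contents() on the long PHP file rather than an inline string.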

PHP - Exploding on character(s) that can NEVER be user-defined... How?

OK, I'm trying to find a character or group of characters that I can explode on. Since the text is user-defined, I need to explode on a value that can never appear within the text.
How can I do this?
An example of what I'm trying to do...
$value = 'text|0||#fd9||right';
Ok,
text is something that should never change in here.
0, again not changeable
#fd9 is a user-defined string that can be anything that the user inputs...
and right sets the orientation (either left or right).
So, the problem I'm facing is this: How to explode("||", $value) so that if there is a || within the user-defined part... Example:
$value = 'text|0||Just some || text in here||right';
So, if the user places the || in the user-defined part of the string, then this breaks the split. How can I do this no matter what the user inputs into the string, so that it returns the following array:
array('text|0', 'Just some || text in here', 'right');
Should I be using different character(s) to explode from? If so, what can I use that the user will not be able to input into the string, or how can I check for this, and fix it? I probably shouldn't be using || in this case, but what can I use to fix this?
Also, the value will come from a string at first, and then from the database afterwards (once saved).
Any Ideas?
The problem of how to represent arbitrary data types as strings always runs up against exactly the problem you're describing and it has been solved in many ways already. This process is called serialization and there are many serialization formats, anything from PHP's native serialize to JSON to XML. All these formats specify how to present complex data structures as strings, including escaping rules for how to use characters that have a special meaning in the serialization format in the serialized values themselves.
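As a small illustration of that idea, the three parts from the question can be stored as a JSON array instead of a hand-rolled delimited string. JSON is just one of the formats mentioned above, picked here for brevity; the variable names are illustrative.

```php
<?php
// The user-defined middle part may contain any characters, including "||".
$fields = ['text|0', 'Just some || text in here', 'right'];

// Serialize to a single string that is safe to store in the database.
$stored = json_encode($fields);

// Later: restore the original array; no delimiter guessing is involved.
$restored = json_decode($stored, true);

echo $stored, "\n";
print_r($restored);
```

The format's own quoting rules take care of the delimiter problem, so no character has to be banned from user input.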
From the comments:
Ok, well, basically, it's straight forward. I already outlined 13 of the other parameters and how they work in Dream Portal located here: http://dream-portal.net/topic_122.0.html so, you can see how they fit in. I'm working on a fieldset parameter that basically uses all of these parameters and than some to include multiple parameters into 1. Anyways, hope that link helps you, for an idea of what an XML file looks like for a module: http://dream-portal.net/topic_98.0.html look at the info.xml section, pay attention to the <param> tag in there, at the bottom, 2 of them.
It seems to me that a more sensible use of XML would make this a lot easier. I haven't read the whole thing in detail, but an XML element like
<param name="test_param" type="select">0:opt1;opt2;opt3</param>
would make much more sense written as
<select name="test_param">
<option default>opt1</option>
<option>opt2</option>
<option>opt3</option>
</select>
Each unique configuration option can have its own unique element namespace with custom sub-elements depending on the type of parameter you need to represent. Then there's no need to invent a custom mini-format for each possible parameter. It also allows you to create a formal XML schema (whether this will do you any good or not is a different topic, but at least you're using XML as it was meant to be used).
You can encode any user input to base64 and then use it with explode or however you wish.
print base64_encode("abcdefghijklmnopqrstuvwxyz1234567890`~!##$%^&*()_+-=[];,./?>:}{<");
Serialized arrays are also not a bad idea at all. They're probably better than using a comma-separated string and explode. Drupal makes good use of serialized arrays.
Take a look at the PHP manual on how to use them:
serialize()
unserialize()
EDIT: New Solution
Is it a guarantee that text doesn't contain || itself?
If it doesn't, you can use substr() in combination with strpos() and strrpos() instead of explode
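A minimal sketch of that substr()/strpos()/strrpos() idea follows. It assumes the first and last parts are fixed values that never contain || themselves, so everything between the first and the last separator belongs to the user; the function name split_outer is hypothetical.

```php
<?php
// Hypothetical helper: split on the first and the last separator only, so the
// middle part can contain the separator any number of times.
function split_outer(string $value, string $sep = '||'): array
{
    $first = strpos($value, $sep);
    $last = strrpos($value, $sep);

    // Require two separate, non-overlapping separators.
    if ($first === false || $last < $first + strlen($sep)) {
        throw new InvalidArgumentException('Expected at least two separators');
    }

    return [
        substr($value, 0, $first),
        substr($value, $first + strlen($sep), $last - $first - strlen($sep)),
        substr($value, $last + strlen($sep)),
    ];
}

print_r(split_outer('text|0||Just some || text in here||right'));
```

For the second example string from the question this yields 'text|0', 'Just some || text in here' and 'right'.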
Here's what I usually do to get around this problem.
1) capture user's text and save it in a var $user_text;
2) run an str_replace() on $user_text to replace the characters you want to split by:
//replace with some random string the user would hopefully never enter
$modified = str_replace('||','{%^#',$user_text);
3) now you can safely explode your text using ||
4) now run an str_replace on each part of the explode, to set it back to the original user entered text
foreach ($parts as &$part) {
    $part = str_replace('{%^#', '||', $part);
}

Getting one value out of a serialized array in PHP

What would you say is the most efficient way to get a single value out of an Array. I know what it is, I know where it is. Currently I'm doing it with:
$array = unserialize($storedArray);
$var = $array['keyOne'];
Wondering if there is a better way.
You are doing it fine, I can't think of a better way than what you are doing.
You unserialize
You get an array
You get value by specifying index
That's the way it can be done.
Wondering if there is a better way.
For the example you give with the array, I think you're fine.
If the serialized string contains data and objects you don't want to unserialize (e.g. creating objects you really don't want to have), you can use the Serialized PHP library which is a complete parser for serialized data.
It offers low-level access to serialized data statically, so you can only extract a subset of data and/or manipulate the serialized data w/o unserializing it. However that looks too much for your example as you only have an array and you don't need to filter/differ too much I guess.
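If the concern is only about creating objects, and not about parsing cost, the built-in unserialize() also accepts an options array in PHP 7.0 and later. Setting allowed_classes to false means no objects of your own classes are instantiated; they come back as __PHP_Incomplete_Class instead. The sketch below reuses the array from the question's setup with an illustrative stored string.

```php
<?php
// Illustrative stored value: a plain array with two entries.
$storedArray = 'a:2:{s:4:"test";s:2:"ja";s:6:"keyOne";i:5;}';

// allowed_classes => false prevents instantiating any user-defined classes.
$array = unserialize($storedArray, ['allowed_classes' => false]);

// Null-coalescing guards against a missing key.
$var = $array['keyOne'] ?? null;

var_dump($var); // int(5)
```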
It's the most efficient way you can do it: unserialize and get the data. If you need to optimize, don't store all the variables serialized.
Also, there is always a way to parse it with a regexp :)
If you don't want to unserialize the whole thing (which can be costly, especially for more complex objects), you can just do a strpos, look for the features you want, and extract them.
Sure.
If you need a better way - DO NOT USE serialized arrays.
Serialization is just a transport format, of VERY limited use.
If you need some optimized variant - there are hundreds of them.
For example, you can pass some single scalar variable instead of whole array. And access it immediately
I, too, think the right way is to un-serialize.
But another way could be to use string operations, when you know what you want from the array:
$storedArray = 'a:2:{s:4:"test";s:2:"ja";s:6:"keyOne";i:5;}';
# another: a:2:{s:4:"test";s:2:"ja";s:6:"keyOne";s:3:"sdf";}
$split = explode('keyOne', $storedArray, 2);
# $split[1] contains the value and junk before and after the value
$splitagain = explode(';', $split[1], 3);
# $splitagain[1] should be the value with type information
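# array_pop() takes its argument by reference, so store the pieces in a variable first
$pieces = explode(':', $splitagain[1], 3);
$value = array_pop($pieces);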
# $value contains the value
Now, someone up for a benchmark? ;)
Another way might be RegEx ?
