How to format var_export to php5.4 array syntax - php

There are lots of questions and answers about producing valid PHP syntax from variable dumps. What I am looking for is a quick and clean way to get the output of var_export to use valid PHP 5.4 array syntax.
Given
$arr = [
'key' => 'value',
'mushroom' => [
'badger' => 1
]
];
var_export($arr);
outputs
array (
'key' => 'value',
'mushroom' =>
array (
'badger' => 1,
),
)
Is there any quick and easy way to have it output the array as defined, using square bracket syntax?
[
'key' => 'value',
'mushroom' => [
'badger' => 1
]
]
Is the general consensus to use regex parsing? If so, has anyone come across a decent regular expression? The value level contents of the arrays I will use will all be scalar and array, no objects or classes.

I had something similar lying around.
function var_export54($var, $indent="") {
switch (gettype($var)) {
case "string":
return '"' . addcslashes($var, "\\\$\"\r\n\t\v\f") . '"';
case "array":
$indexed = array_keys($var) === range(0, count($var) - 1);
$r = [];
foreach ($var as $key => $value) {
$r[] = "$indent "
. ($indexed ? "" : var_export54($key) . " => ")
. var_export54($value, "$indent ");
}
return "[\n" . implode(",\n", $r) . "\n" . $indent . "]";
case "boolean":
return $var ? "TRUE" : "FALSE";
default:
return var_export($var, TRUE);
}
}
It's not overly pretty, but maybe sufficient for your case.
Any types other than those specified are handled by the regular var_export. So for single-quoted strings, just comment out the string case.

For anyone looking for a more modern solution: use the Symfony VarExporter component. It is included in Symfony by default and is also available as a standalone library via Composer.
composer require symfony/var-exporter
use Symfony\Component\VarExporter\VarExporter;
// ...
echo VarExporter::export($arr);

I realize this question is ancient, but search led me here. I didn't care for full iterations or using json_decode, so here's a preg_replace-based var_export twister that gets the job done.
function var_export_short($data, $return=true)
{
$dump = var_export($data, true);
$dump = preg_replace('#(?:\A|\n)([ ]*)array \(#i', '[', $dump); // Starts
$dump = preg_replace('#\n([ ]*)\),#', "\n$1],", $dump); // Ends
$dump = preg_replace('#=> \[\n\s+\],\n#', "=> [],\n", $dump); // Empties
if (gettype($data) == 'object') { // Deal with object states
$dump = str_replace('__set_state(array(', '__set_state([', $dump);
$dump = preg_replace('#\)\)$#', "])", $dump);
} else {
$dump = preg_replace('#\)$#', "]", $dump);
}
if ($return===true) {
return $dump;
} else {
echo $dump;
}
}
I've tested it on several arrays and objects. Not exhaustively by any measure, but it seems to be working fine. I've made the output "tight" by also compacting extra line-breaks and empty arrays. If you run into any inadvertent data corruption using this, please let me know. I haven't benchmarked this against the above solutions yet, but I suspect it'll be a good deal faster. Enjoy reading your arrays!

With https://github.com/zendframework/zend-code :
<?php
use Zend\Code\Generator\ValueGenerator;
$generator = new ValueGenerator($myArray, ValueGenerator::TYPE_ARRAY_SHORT);
$generator->setIndentation('  '); // 2 spaces
echo $generator->generate();

As the comments have pointed out, this is just an additional syntax. To convert the var_export output to bracket style, str_replace works well as long as no key or value contains a ). Handling that case is still simple if you use JSON as an intermediate:
$output = json_decode(str_replace(array('(',')'), array('&#40','&#41'), json_encode($arr)), true);
$output = var_export($output, true);
$output = str_replace(array('array (',')','&#40','&#41'), array('[',']','(',')'), $output);
I used the HTML entities for ( and ). You can use the escape sequence or whatever.
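To see why the plain str_replace route carries the caveat about ) in keys or values, here is a minimal sketch. The sample arrays and the helper names naive_short_export and round_trips are made up for illustration, and eval() is used only to round-trip a string this script generated itself.

```php
<?php
// Hypothetical helper: the naive var_export + str_replace conversion.
function naive_short_export(array $arr): string
{
    return str_replace(['array (', ')'], ['[', ']'], var_export($arr, true));
}

// Hypothetical helper: evaluate the generated code and compare it with the input.
function round_trips(array $arr): bool
{
    $code = naive_short_export($arr);
    return eval('return ' . $code . ';') === $arr;
}

$safe  = ['key' => 'value', 'mushroom' => ['badger' => 1]];
$risky = ['key' => 'a (b) c'];

var_dump(round_trips($safe));  // bool(true)
var_dump(round_trips($risky)); // bool(false): the ")" inside the value became "]"
```

A round-trip check like this is a cheap way to test any of the converters on this page against your own data before trusting their output.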

Related

Why doesn't this array and foreach work with mysqli?

I got this code
$a = "I love Steve Jobs.";
$targets = array('bill gates','steve jobs');
foreach($targets as $t)
{
if (preg_match_all("/\b" . $t . "\b/i", $a)) {
$b[] = $t;
}
}
This code finds ALL matches in the array, which is great.
Now I have a database table called "tags" containing around 500 keywords. I want to make it work together with the script above.
By my logic, all I have to do is build an array that replaces $targets.
The code I wrote here does not work:
$result = $con->query("SELECT tag FROM tags");
$targets = $result->fetch_all(MYSQLI_ASSOC);
foreach($targets as $t)
{
if (preg_match_all("/\b" . $t . "\b/i", $a)) {
$b[] = $t;
}
}
I get this notice:
Notice: Array to string conversion
I don't know what to do - 2 hours passed by and I can't figure it out.
Thank you :)
In your code, you have:
if (preg_match_all("/\b" . $t . "\b/i", $a)) {
but $t is a row from the result set, that is, an array with the key tag.
You should have:
if (preg_match_all("/\b" . $t['tag'] . "\b/i", $a)) {
The same applies to the other line, which should be:
$b[] = $t['tag'];
Summary:
fetch_all returns an array of arrays, each holding one row from the database.
You can always inspect a value with print_r or var_dump.
mysqli_fetch_all/mysqli_result::fetch_all returns an array of arrays representing result rows. In your example it would return a structure like this:
array(
array('tag' => 'bill gates'),
array('tag' => 'steve jobs')
)
You should be using $t['tag'] instead of $t inside the loop.
By the way, the easiest way to debug situations like this is to output the query results (or any other variable) with print_r() or var_dump() and see how they're structured.
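As a minimal sketch of that fix without a database: the $rows array below is a hand-written stand-in for what $result->fetch_all(MYSQLI_ASSOC) would return, and array_column() flattens it into the plain list of strings the original loop expects. The preg_quote() call is an addition of mine, not part of the question; it keeps tags containing regex metacharacters from breaking the pattern.

```php
<?php
$a = "I love Steve Jobs.";

// Stand-in for $result->fetch_all(MYSQLI_ASSOC): one array per row.
$rows = [
    ['tag' => 'bill gates'],
    ['tag' => 'steve jobs'],
];

// Flatten the rows into a simple list of tag strings.
$targets = array_column($rows, 'tag');

$b = [];
foreach ($targets as $t) {
    // preg_quote escapes regex metacharacters that a tag might contain.
    if (preg_match('/\b' . preg_quote($t, '/') . '\b/i', $a)) {
        $b[] = $t;
    }
}

print_r($b); // contains only 'steve jobs'
```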
You can also try this, which fetches one row at a time:
mysqli_fetch_assoc($result)

Difference between "==" and "strcmp" in PHP

I was asked to write a simulation of the array_keys function, but the equality check "==" returns false.
But "strcmp ($a, $b) == 0" returns true.
class Utility
{
public function convertArrayKeys(array $array)
{
$i = 0;
$elements = [];
foreach ($array as $element=>$value) {
$elements[] = ' ' . $i++ . " => '" . $element . "'";
}
return 'array ( ' . implode(', ', $elements) . ', )';
}
public function testStrings(array $array)
{
$arrayOur = $this->convertArrayKeys($array);
$arrayPhp = var_export(array_keys($array),true);
if ($arrayOur == $arrayPhp){
echo 'They are identical :)';
} else {
echo 'They are not identical :(';
echo '<br>';
print_r(str_split($arrayOur));
echo '<br>';
print_r(str_split($arrayPhp));
}
}
}
View:
$repository = array('box'=>'blue', 'cube'=>'red', 'ball'=>'green');
$utility = new Utility();
echo "OUr array_keys: ";
echo $utility->convertArrayKeys($repository);
echo "<br />";
echo "PHP array_keys: ";
print_r (var_export(array_keys($repository)));
echo "<hr >";
echo "<br />";
echo $utility->testStrings($repository);
I would appreciate knowing why.
Edit: The reason the two don't match in THIS instance is that your functions don't produce identical output. Yours produces:
array ( 0 => 'box', 1 => 'cube', 2 => 'ball', )
where the php function produces:
array (
0 => 'box',
1 => 'cube',
2 => 'ball',
)
If you view that in a web browser, the renderer collapses the whitespace, so the two look the same. Try putting <pre> tags around the output (or run it on the command line) to check.
Basically == does more than compare the two values: the documentation says the comparison happens "after type juggling". You can get some weird results by comparing strings using ==. One good example is '1e3' == '1000', which is true. == is useful at times, but possibly not with strings.
Also, strcmp doesn't return a true/false answer, but a negative number, 0, or a positive number indicating which string sorts before the other.
You should also look at ===, which can also be helpful, but personally I would stick with strcmp for strings.
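The three comparisons mentioned here can be put side by side in a short sketch. The variable names are arbitrary; the values are the '1e3' and '1000' pair from the paragraph above.

```php
<?php
$a = '1e3';
$b = '1000';

// Both operands are numeric strings, so == compares them as numbers.
var_dump($a == $b);              // bool(true)

// === compares type and content, with no numeric conversion.
var_dump($a === $b);             // bool(false)

// strcmp compares byte by byte and returns 0 only for identical strings.
var_dump(strcmp($a, $b) === 0);  // bool(false)
```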
Never use == in PHP; it will not do what you expect. Even if you are comparing strings to strings, PHP will implicitly cast them to numbers and do a numerical comparison if both appear numerical.
Try these and you will see the reason:
$something = 0;
echo ('password' == $something) ? 'true' : 'false'; // true before PHP 8, false since PHP 8.0
$something = 0;
echo ('password' === $something) ? 'true' : 'false'; // false
echo strcmp('password123',$something); // a positive number (exactly 1 since PHP 8.2)
Because they are not arrays; they are strings. Arrays are not created like this. You are doing it wrong. Were they arrays, then
if ($arrayOur == $arrayPhp)
would have evaluated to true. But they are just strings, and
"strcmp ($a, $b) == 0"
does not return true, because there is extra whitespace in the first string:
return 'array ( ' . implode(', ', $elements) . ', )';
You need to correct your approach.

Why won't PHP recognize two equal strings?

I'm working on a PHP function that will compare the components of two arrays. Each value in the arrays is a single English word. No spaces. No special characters.
Array #1: a list of the most commonly used words in the english
language. $common_words_array
Array #2: a user-generated sentence, converted to lowercase, stripped
of punctuation, and exploded() using the space (" ") as a delimiter.
$study_array
There's also a $com_study array, which is used in this case to keep
track of the order of commonly used words which get replaced in the
$study_array by a "_" character.
Using nested for loops, what SHOULD happen is that the script should compare each value in Array #2 to each value in Array #1. When it finds a match (aka. a commonly used english word), it will do some other magic that's irrelevant to the current problem.
As of right now, PHP doesn't recognize when two array string values are equivalent. I'm adding in the code to the problematic function here for reference. I've added in a lot of unnecessary echo commands in order to localize the problem to the if statement.
Can anybody see something that I've missed? The same algorithm worked perfectly in Python.
function create_question($study_array, $com_study, $common_words_array)
{
for ($study=0; $study<count($study_array); $study++)
{
echo count($study_array)." total in study_array<br>";
echo "study is ".$study."<br>";
for ($common=0; $common<count($common_words_array); $common++)
{
echo count($common_words_array)." total in common_words_array<br>";
echo "common is ".$common."<br>";
echo "-----<br>";
echo $study_array[$study]." is the study list word<br>";
echo $common_words_array[$common]." is the common word<br>";
echo "-----<br>";
// The issue happens right here.
if ($study_array[$study] == $common_words_array[$common])
{
array_push($com_study, $study_array[$study]);
$study_array[$study] = "_";
print_r($com_study);
print_r($study_array);
}
}
}
$create_question_return_array = array();
$create_question_return_array[0] = $study_array;
$create_question_return_array[1] = $com_study;
return $create_question_return_array;
}
EDIT: At the suggestion of you amazing coders, I've updated the if statement to be much more simple for purposes of debugging. See below. Still having the same issue of not activating the if statement.
if (strcmp($study_array[$study],$common_words_array[$common])==0)
{
echo "match found";
//array_push($com_study, $study_array[$study]);
//$study_array[$study] = "_";
//print_r($com_study);
//print_r($study_array);
}
EDIT: At bansi's request, here's the main interface snippet where I'm calling the function.
$testarray = array();
$string = "This is a string";
$testarray = create_study_string_array($string);
$testarray = create_question($testarray, $matching, $common_words_array);
As for the result, I'm just getting a blank screen. I would expect to have the simplified echo statement output "match found" to the screen, but that's not happening.
(EDIT) Make sure that your splitting function removes excess whitespace (e.g. preg_split('/\s+/', $input)) and that the input is normalized properly (lowercased, special characters stripped out, etc.).
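A minimal sketch of that normalization, assuming plain ASCII English input. The function name normalize_words is hypothetical; it is not the asker's create_study_string_array, whose source was never posted.

```php
<?php
// Hypothetical helper: lowercase, strip everything except letters and
// whitespace, then split on runs of whitespace.
function normalize_words(string $input): array
{
    $clean = strtolower(trim($input));
    $clean = preg_replace('/[^a-z\s]/', '', $clean);
    // PREG_SPLIT_NO_EMPTY drops empty pieces, e.g. when the input is blank.
    return preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);
}

print_r(normalize_words("This  is a\tstring. "));
// this, is, a, string
```

With the words reduced to this form, a stray trailing space or newline can no longer make two visually equal words compare as different.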
On mobile and can't seem to copy text. You forgot a dollar sign when accessing the study array in your push command.
change
array_push($com_study, $study_array[study]);
to
array_push($com_study, $study_array[$study]);
// You missed a $ ^ here
Edit:
The following code outputs 'match found' three times. I don't know the values of $common_words_array and $matching, so I used some arbitrary values; also, instead of the function create_study_string_array, I just used explode. I'm still confused about what exactly you are trying to achieve.
<?php
$testarray = array ();
$string = "this is a string";
$testarray = explode ( ' ', $string );
$common_words_array = array (
'is',
'a',
'this'
);
$matching = array (
'a',
'and',
'this'
);
$testarray = create_question ( $testarray, $matching, $common_words_array );
function create_question($study_array, $com_study, $common_words_array) {
echo count ( $study_array ) . " total in study_array<br>";
echo count ( $common_words_array ) . " total in common_words_array<br>";
for($study = 0; $study < count ( $study_array ); $study ++) {
// echo "study is " . $study . "<br>";
for($common = 0; $common < count ( $common_words_array ); $common ++) {
// The issue happens right here.
if (strcmp ( $study_array [$study], $common_words_array [$common] ) == 0) {
echo "match found";
}
}
}
$create_question_return_array = array ();
$create_question_return_array [0] = $study_array;
$create_question_return_array [1] = $com_study;
return $create_question_return_array;
}
?>
Output:
4 total in study_array
3 total in common_words_array
match foundmatch foundmatch found
Use === instead of ==
if ($study_array[$study] === $common_words_array[$common])
OR even better use strcmp
if (strcmp($study_array[$study],$common_words_array[$common])==0)
Use built-in functions wherever possible to avoid unnecessary code and typos. Also, providing sample inputs would be helpful too.
$study_array = array("a", "cat", "sat", "on","the","mat");
$common_words_array = array('the','a');
$matching_words = array();
foreach($study_array as $study_word_index=>$study_word){
if(in_array($study_word, $common_words_array)){
$matching_words[] = $study_word;
$study_array[$study_word_index] = "_";
//do something with matching words
}
}
print_r($study_array);
print_r($matching_words);

How to repair a serialized string which has been corrupted by an incorrect byte count length?

I am using Hotaru CMS with the Image Upload plugin, I get this error if I try to attach an image to a post, otherwise there is no error:
unserialize() [function.unserialize]: Error at offset
The offending code (error points to line with **):
/**
* Retrieve submission step data
*
* @param $key - empty when setting
* @return bool
*/
public function loadSubmitData($h, $key = '')
{
// delete everything in this table older than 30 minutes:
$this->deleteTempData($h->db);
if (!$key) { return false; }
$cleanKey = preg_replace('/[^a-z0-9]+/','',$key);
if (strcmp($key,$cleanKey) != 0) {
return false;
} else {
$sql = "SELECT tempdata_value FROM " . TABLE_TEMPDATA . " WHERE tempdata_key = %s ORDER BY tempdata_updatedts DESC LIMIT 1";
$submitted_data = $h->db->get_var($h->db->prepare($sql, $key));
**if ($submitted_data) { return unserialize($submitted_data); } else { return false; }**
}
}
Data from the table, notice the end bit has the image info, I am not an expert in PHP so I was wondering what you guys/gals might think?
tempdata_value:
a:10:{s:16:"submit_editorial";b:0;s:15:"submit_orig_url";s:13:"www.bbc.co.uk";s:12:"submit_title";s:14:"No title found";s:14:"submit_content";s:12:"dnfsdkfjdfdf";s:15:"submit_category";i:2;s:11:"submit_tags";s:3:"bbc";s:9:"submit_id";b:0;s:16:"submit_subscribe";i:0;s:15:"submit_comments";s:4:"open";s:5:"image";s:19:"C:fakepath100.jpg";}
Edit: I think I've found the serialize bit...
/**
* Save submission step data
*
* @return bool
*/
public function saveSubmitData($h)
{
// delete everything in this table older than 30 minutes:
$this->deleteTempData($h->db);
$sid = preg_replace('/[^a-z0-9]+/i', '', session_id());
$key = md5(microtime() . $sid . rand());
$sql = "INSERT INTO " . TABLE_TEMPDATA . " (tempdata_key, tempdata_value, tempdata_updateby) VALUES (%s,%s, %d)";
$h->db->query($h->db->prepare($sql, $key, serialize($h->vars['submitted_data']), $h->currentUser->id));
return $key;
}
The "unserialize() [function.unserialize]: Error at offset" notice was caused by invalid serialized data: one of the stored string lengths is wrong.
Quick Fix
What you can do is recalculate the length of the elements in the serialized array.
Your current serialized data:
$data = 'a:10:{s:16:"submit_editorial";b:0;s:15:"submit_orig_url";s:13:"www.bbc.co.uk";s:12:"submit_title";s:14:"No title found";s:14:"submit_content";s:12:"dnfsdkfjdfdf";s:15:"submit_category";i:2;s:11:"submit_tags";s:3:"bbc";s:9:"submit_id";b:0;s:16:"submit_subscribe";i:0;s:15:"submit_comments";s:4:"open";s:5:"image";s:19:"C:fakepath100.jpg";}';
Example without recalculation
var_dump(unserialize($data));
Output
Notice: unserialize() [function.unserialize]: Error at offset 337 of 338 bytes
Recalculating
$data = preg_replace('!s:(\d+):"(.*?)";!e', "'s:'.strlen('$2').':\"$2\";'", $data);
var_dump(unserialize($data));
Output
array
'submit_editorial' => boolean false
'submit_orig_url' => string 'www.bbc.co.uk' (length=13)
'submit_title' => string 'No title found' (length=14)
'submit_content' => string 'dnfsdkfjdfdf' (length=12)
'submit_category' => int 2
'submit_tags' => string 'bbc' (length=3)
'submit_id' => boolean false
'submit_subscribe' => int 0
'submit_comments' => string 'open' (length=4)
'image' => string 'C:fakepath100.jpg' (length=17)
Recommendation
Instead of using this kind of quick fix, I'd advise you to update the question with:
How you are serializing your data
How you are saving it
================================ EDIT 1 ===============================
The Error
The error was caused by using double quotes " instead of single quotes ', which is why C:\fakepath\100.jpg was converted to C:fakepath100.jpg
To fix the error
You need to change how $h->vars['submitted_data'] is assigned (note the single quotes ').
Replace
$h->vars['submitted_data']['image'] = "C:\fakepath\100.jpg" ;
With
$h->vars['submitted_data']['image'] = 'C:\fakepath\100.jpg' ;
Additional Filter
You can also add this simple filter before you call serialize
function sanitize(&$value, $key)
{
$value = addslashes($value);
}
array_walk($h->vars['submitted_data'], "sanitize");
If you have UTF Characters you can also run
$h->vars['submitted_data'] = array_map("utf8_encode",$h->vars['submitted_data']);
How to detect the problem in future serialized data
findSerializeError ( $data1 ) ;
Output
Diffrence 9 != 7
-> ORD number 57 != 55
-> Line Number = 315
-> Section Data1 = pen";s:5:"image";s:19:"C:fakepath100.jpg
-> Section Data2 = pen";s:5:"image";s:17:"C:fakepath100.jpg
^------- The Error (Element Length)
findSerializeError Function
function findSerializeError($data1) {
echo "<pre>";
$data2 = preg_replace ( '!s:(\d+):"(.*?)";!e', "'s:'.strlen('$2').':\"$2\";'",$data1 );
$max = (strlen ( $data1 ) > strlen ( $data2 )) ? strlen ( $data1 ) : strlen ( $data2 );
echo $data1 . PHP_EOL;
echo $data2 . PHP_EOL;
for($i = 0; $i < $max; $i ++) {
if (@$data1 {$i} !== @$data2 {$i}) {
echo "Diffrence ", @$data1 {$i}, " != ", @$data2 {$i}, PHP_EOL;
echo "\t-> ORD number ", ord ( @$data1 {$i} ), " != ", ord ( @$data2 {$i} ), PHP_EOL;
echo "\t-> Line Number = $i" . PHP_EOL;
$start = ($i - 20);
$start = ($start < 0) ? 0 : $start;
$length = 40;
$point = $max - $i;
if ($point < 20) {
$rlength = 1;
$rpoint = - $point;
} else {
$rpoint = $length - 20;
$rlength = 1;
}
echo "\t-> Section Data1 = ", substr_replace ( substr ( $data1, $start, $length ), "<b style=\"color:green\">{$data1 {$i}}</b>", $rpoint, $rlength ), PHP_EOL;
echo "\t-> Section Data2 = ", substr_replace ( substr ( $data2, $start, $length ), "<b style=\"color:red\">{$data2 {$i}}</b>", $rpoint, $rlength ), PHP_EOL;
}
}
}
A better way to save to Database
$toDatabase = base64_encode(serialize($data)); // Save to database
$fromDatabase = unserialize(base64_decode($toDatabase)); // Restore the saved value
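A short sketch of why the base64 wrapper helps. The failure mode simulated here, backslashes being stripped somewhere between serialize() and the database, is an assumption on my part; the thread does not confirm what actually mangled the data, although it does reproduce the 19 versus 17 mismatch from the question.

```php
<?php
$data = ['image' => 'C:\fakepath\100.jpg']; // 19 bytes, backslashes included

// Assumed failure mode: something strips backslashes after serialize().
$plain = stripslashes(serialize($data));
var_dump(@unserialize($plain)); // bool(false): the declared length 19 no longer matches

// The base64 alphabet contains no backslashes or quotes, so the same step is harmless.
$encoded = stripslashes(base64_encode(serialize($data)));
var_dump(unserialize(base64_decode($encoded)) === $data); // bool(true)
```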
I don't have enough reputation to comment, so I hope this is seen by people using the above "correct" answer:
Since PHP 5.5 the /e modifier in preg_replace() has been deprecated (it was removed entirely in PHP 7.0), so the preg_replace above will error out. The PHP documentation recommends using preg_replace_callback in its place.
Please find the following solution as an alternative to the preg_replace proposed above.
$fixed_data = preg_replace_callback ( '!s:(\d+):"(.*?)";!', function($match) {
return ($match[1] == strlen($match[2])) ? $match[0] : 's:' . strlen($match[2]) . ':"' . $match[2] . '";';
},$bad_data );
Quick Fix
Recalculate the length of the elements in the serialized array, but don't use preg_replace with the /e modifier, which is deprecated; use preg_replace_callback instead:
Edit: The new version not only fixes wrong lengths but also handles line breaks and counts characters with accents correctly (thanks to mickmackusa)
// New Version
$data = preg_replace_callback('!s:\d+:"(.*?)";!s',
function($m) {
return "s:" . strlen($m[1]) . ':"'.$m[1].'";';
}, $data
);
Another reason unserialize() can fail is that the serialized data was stored improperly in the database (see the official explanation here). serialize() returns binary data, and PHP strings carry no encoding information, so putting the result into a TEXT or VARCHAR column can cause this error.
Solution: store serialized data into BLOB in your table.
$badData = 'a:2:{i:0;s:16:"as:45:"d";
Is \n";i:1;s:19:"as:45:"d";
Is \r\n";}';
You can not fix a broken serialize string using the proposed regexes:
$data = preg_replace('!s:(\d+):"(.*?)";!e', "'s:'.strlen('$2').':\"$2\";'", $badData);
var_dump(@unserialize($data)); // Output: bool(false)
// or
$data = preg_replace_callback(
'/s:(\d+):"(.*?)";/',
function($m){
return 's:' . strlen($m[2]) . ':"' . $m[2] . '";';
},
$badData
);
var_dump(@unserialize($data)); // Output: bool(false)
You can fix broken serialize string using following regex:
$data = preg_replace_callback(
'/(?<=^|\{|;)s:(\d+):\"(.*?)\";(?=[asbdiO]\:\d|N;|\}|$)/s',
function($m){
return 's:' . strlen($m[2]) . ':"' . $m[2] . '";';
},
$badData
);
var_dump(@unserialize($data));
Output
array(2) {
[0] =>
string(17) "as:45:"d";
Is \n"
[1] =>
string(19) "as:45:"d";
Is \r\n"
}
or
array(2) {
[0] =>
string(16) "as:45:"d";
Is \n"
[1] =>
string(18) "as:45:"d";
Is \r\n"
}
This error occurs because your charset is wrong.
Set the charset after the opening tag:
header('Content-Type: text/html; charset=utf-8');
And set charset utf8 in your database :
mysql_query("SET NAMES 'utf8'");
public function unserializeKeySkills($string) {
$output = array();
$string = trim(preg_replace('/\s\s+/', ' ',$string));
$string = preg_replace_callback('!s:(\d+):"(.*?)";!', function($m) { return 's:'.strlen($m[2]).':"'.$m[2].'";'; }, utf8_encode( trim(preg_replace('/\s\s+/', ' ',$string)) ));
try {
$output = unserialize($string);
} catch (\Exception $e) {
\Log::error("unserialize Data : " .print_r($string,true));
}
return $output;
}
You can fix a broken serialized string using the following function, with multibyte character handling.
function repairSerializeString($value)
{
$regex = '/s:([0-9]+):"(.*?)"/';
return preg_replace_callback(
$regex, function($match) {
return "s:".mb_strlen($match[2]).":\"".$match[2]."\"";
},
$value
);
}
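One thing worth checking before relying on a character count here: serialize() stores the length of a string in bytes, which differs from mb_strlen() as soon as a multibyte character is involved. A minimal sketch, assuming UTF-8 data (the sample word is written with a Unicode escape so the file encoding does not matter):

```php
<?php
$word = "gar\u{00E7}on"; // "garçon" in UTF-8

echo strlen($word), "\n";    // 7: byte count, the c-cedilla takes two bytes
echo serialize($word), "\n"; // s:7:"garçon";

if (function_exists('mb_strlen')) {
    echo mb_strlen($word, 'UTF-8'), "\n"; // 6: character count
}

// A declared length of 6 does not unserialize.
var_dump(@unserialize('s:6:"' . $word . '";')); // bool(false)
```

So for UTF-8 data a repair callback built on strlen() reproduces what serialize() itself would write, while one built on mb_strlen() may not.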
Here is an Online Tool for fixing a corrupted serialized string.
I'd like to add that this mostly happens when a search and replace is done on the DB and the serialized data (especially the string lengths) doesn't get updated to match the replacement, which causes the "corruption".
Nonetheless, the above tool uses the following logic to fix the serialized data (copied from here).
function error_correction_serialise($string){
// First check whether "fixing" is needed at all, then sanity-check the leading type marker.
if ( @unserialize($string) === false && preg_match('/^[aOs]:/', $string) ) {
$string = preg_replace_callback( '/s\:(\d+)\:\"(.*?)\";/s', function($matches){return 's:'.strlen($matches[2]).':"'.$matches[2].'";'; }, $string );
}
return $string;
}
The official docs say unserialize() should return false and raise an E_NOTICE.
Since you see an error message, your error reporting is set to display E_NOTICE.
Here is a way to detect the false returned by unserialize without the notice:
$old_err=error_reporting();
error_reporting($old_err & ~E_NOTICE);
$object = unserialize($serialized_data);
error_reporting($old_err);
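One wrinkle with checking for false is that a legitimately serialized boolean false also unserializes to false. A minimal sketch of a wrapper that tells the two apart; the helper name try_unserialize is hypothetical, and it uses the @ operator instead of toggling error_reporting():

```php
<?php
// Hypothetical helper: returns [ok, value] so that a stored boolean false
// is not mistaken for a failed unserialize().
function try_unserialize(string $data): array
{
    $value = @unserialize($data); // @ hides the notice/warning on corrupt input
    $ok = ($value !== false) || ($data === serialize(false));
    return [$ok, $value];
}

[$ok1, $v1] = try_unserialize(serialize(['a' => 1]));
[$ok2, $v2] = try_unserialize('s:19:"C:fakepath100.jpg";'); // wrong length
[$ok3, $v3] = try_unserialize(serialize(false));

var_dump($ok1, $ok2, $ok3); // bool(true) bool(false) bool(true)
```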
You might also want to consider base64 encoding/decoding:
$string=base64_encode(serialize($obj));
unserialize(base64_decode($string));
The corruption in this question is isolated to a single substring at the end of the serialized string, which was probably manually replaced by someone who lazily wanted to update the image filename. This fact will be apparent in my demonstration link below using the OP's posted data: in short, C:fakepath100.jpg does not have a length of 19; it should be 17.
Since the serialized string corruption is limited to an incorrect byte/character count number, the following will do a fine job of updating the corrupted string with the correct byte count value.
The following regex based replacement will only be effective in remedying byte counts, nothing more.
It looks like many of the earlier posts are just copy-pasting a regex pattern from someone else. There is no reason to capture the potentially corrupted byte count number if it isn't going to be used in the replacement. Also, adding the s pattern modifier is a reasonable inclusion in case a string value contains newlines/line returns.
*For those that are not aware of the treatment of multibyte characters with serializing, you must not use mb_strlen() in the custom callback because it is the byte count that is stored not the character count, see my output...
Code: (Demo with OP's data) (Demo with arbitrary sample data) (Demo with condition replacing)
$corrupted = <<<STRING
a:4:{i:0;s:3:"three";i:1;s:5:"five";i:2;s:2:"newline1
newline2";i:3;s:6:"garçon";}
STRING;
$repaired = preg_replace_callback(
'/s:\d+:"(.*?)";/s',
// ^^^- matched/consumed but not captured because not used in replacement
function ($m) {
return "s:" . strlen($m[1]) . ":\"{$m[1]}\";";
},
$corrupted
);
echo $corrupted , "\n" , $repaired;
echo "\n---\n";
var_export(unserialize($repaired));
Output:
a:4:{i:0;s:3:"three";i:1;s:5:"five";i:2;s:2:"newline1
newline2";i:3;s:6:"garçon";}
a:4:{i:0;s:5:"three";i:1;s:4:"five";i:2;s:17:"newline1
newline2";i:3;s:7:"garçon";}
---
array (
0 => 'three',
1 => 'five',
2 => 'newline1
newline2',
3 => 'garçon',
)
One leg down the rabbit hole... The above works fine even if double quotes occur in a string value, but if a string value contains "; or some other monkeywrenching substring, you'll need to go a little further and implement "lookarounds". My new pattern
checks that the leading s is:
the start of the entire input string or
preceded by ;
and checks that the "; is:
at the end of the entire input string or
followed by } or
followed by a string or integer declaration s: or i:
I haven't tested each and every possibility; in fact, I am relatively unfamiliar with all of the possibilities in a serialized string because I never elect to work with serialized data (always JSON in modern applications). If there are additional possible leading or trailing characters, leave a comment and I'll extend the lookarounds.
Extended snippet: (Demo)
$corrupted_byte_counts = <<<STRING
a:12:{i:0;s:3:"three";i:1;s:5:"five";i:2;s:2:"newline1
newline2";i:3;s:6:"garçon";i:4;s:111:"double " quote \"escaped";i:5;s:1:"a,comma";i:6;s:9:"a:colon";i:7;s:0:"single 'quote";i:8;s:999:"semi;colon";s:5:"assoc";s:3:"yes";i:9;s:1:"monkey";wrenching doublequote-semicolon";s:3:"s:";s:9:"val s: val";}
STRING;
$repaired = preg_replace_callback(
'/(?<=^|;)s:\d+:"(.*?)";(?=$|}|[si]:)/s',
//^^^^^^^^--------------^^^^^^^^^^^^^-- some additional validation
function ($m) {
return 's:' . strlen($m[1]) . ":\"{$m[1]}\";";
},
$corrupted_byte_counts
);
echo "corrupted serialized array:\n$corrupted_byte_counts";
echo "\n---\n";
echo "repaired serialized array:\n$repaired";
echo "\n---\n";
print_r(unserialize($repaired));
Output:
corrupted serialized array:
a:12:{i:0;s:3:"three";i:1;s:5:"five";i:2;s:2:"newline1
newline2";i:3;s:6:"garçon";i:4;s:111:"double " quote \"escaped";i:5;s:1:"a,comma";i:6;s:9:"a:colon";i:7;s:0:"single 'quote";i:8;s:999:"semi;colon";s:5:"assoc";s:3:"yes";i:9;s:1:"monkey";wrenching doublequote-semicolon";s:3:"s:";s:9:"val s: val";}
---
repaired serialized array:
a:12:{i:0;s:5:"three";i:1;s:4:"five";i:2;s:17:"newline1
newline2";i:3;s:7:"garçon";i:4;s:24:"double " quote \"escaped";i:5;s:7:"a,comma";i:6;s:7:"a:colon";i:7;s:13:"single 'quote";i:8;s:10:"semi;colon";s:5:"assoc";s:3:"yes";i:9;s:39:"monkey";wrenching doublequote-semicolon";s:2:"s:";s:10:"val s: val";}
---
Array
(
[0] => three
[1] => five
[2] => newline1
newline2
[3] => garçon
[4] => double " quote \"escaped
[5] => a,comma
[6] => a:colon
[7] => single 'quote
[8] => semi;colon
[assoc] => yes
[9] => monkey";wrenching doublequote-semicolon
[s:] => val s: val
)
You will have to alter the collation type to utf8_unicode_ci and the problem will be fixed.
In my case I was storing serialized data in BLOB field of MySQL DB which apparently wasn't big enough to contain the whole value and truncated it. Such a string obviously could not be unserialized.
Once I converted that field to MEDIUMBLOB, the problem went away.
It may also be necessary to switch the table's ROW_FORMAT option to DYNAMIC or COMPRESSED.
After trying some things on this page without success, I looked at the page source and noticed that all quotes in the serialized string had been replaced by HTML entities.
Decoding these entities avoids a lot of headache:
$myVar = html_entity_decode($myVar);
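A minimal sketch of that situation. Here htmlspecialchars() stands in for whatever step escaped the quotes; that choice is an assumption for illustration, since the answer does not say what did the escaping.

```php
<?php
$original = serialize(['k' => 'v']);     // a:1:{s:1:"k";s:1:"v";}
$mangled  = htmlspecialchars($original); // every " becomes &quot;

// The entity-encoded string is no longer valid serialized data.
var_dump(@unserialize($mangled)); // bool(false)

// Decoding the entities restores the original string.
var_dump(unserialize(html_entity_decode($mangled)) === ['k' => 'v']); // bool(true)
```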
Change the column type of the particular field to LONGTEXT.
Another cause of this problem can be the type of the "payload" column in the sessions table. If you store a lot of data in the session, a TEXT column won't be enough; you will need MEDIUMTEXT or even LONGTEXT.
You can use this for all cases:
$newdata = preg_replace_callback(
'/(?<=^|\{|;)s:(\d+):\"(.*?)\";(?=[asbdiO]\:\d|N;|\}|$)/s',
function($m){
return 's:' . strlen($m[2]) . ':"' . $m[2] . '";';
},
$badData
);

parsing string into array (['class': 'navigation', 'id': 'navigation'])

Can anyone suggest an alternative method to parse a string such as (['class': 'navigation', 'id': 'navigation'])?
I used to use:
if(strpos($match[1], '[') === 0)
{
$render = array();
foreach(explode(',', substr($match[1], 1, -1)) as $arr)
{
$parts = explode(':', $arr);
if(count($parts) == 2)
{
$render[substr(trim($parts[0]), 1, -1)] = substr(trim($parts[1]), 1, -1);
}
else
{
$render[] = substr(trim($parts[0]), 1, -1);
}
}
$args[] = $render;
}
elseif(strpos($match[1], "'") === 0)
{
$args[] = substr($match[1], 1, -1);
}
However, it doesn't take long to see the drawbacks of this method, e.g. ['title': 'Tom's diary'] would completely break the code.
It would also be nice to be able to identify erroneous entries and leave them out. At the moment all I do is use: |{{([^}]+)}}| to catch all functions, which look like: {{foo}}, {{foo('test', 'best')}} or {{foo(['array': 'bar'])}}. If you have a quick solution, I'd appreciate it if you shared it.
Your string's format seems not too far from JSON, even if it isn't real JSON (you are using [] instead of {} around what seems to be an object).
Maybe you could change your format in order to use JSON?
That would allow you to use the json_encode() and json_decode() functions.
And more people would be able to work with your data, as you'd be using a standard format.
This is close to a JSON string; once it is valid JSON, just use json_decode to parse it into an array.
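Both routes can be sketched briefly. The first part shows json_decode() on the same data rewritten as valid JSON. The second is a regex-based sketch for the original bracket format; the function name parse_pairs is hypothetical, it handles only 'key': 'value' pairs, and it assumes a value never contains the sequence quote, comma, quote.

```php
<?php
// Route 1: the same data written as valid JSON.
$fromJson = json_decode('{"class": "navigation", "id": "navigation"}', true);

// Route 2 (hypothetical helper): parse "['key': 'value', ...]" directly.
function parse_pairs(string $input): array
{
    $body = trim($input);
    if ($body === '' || $body[0] !== '[' || substr($body, -1) !== ']') {
        return []; // not in the expected bracket form
    }
    $body = substr($body, 1, -1);

    // A value ends at a quote followed by ", '" (the next key) or by the end of the body,
    // so an apostrophe inside a value does not terminate it.
    $pattern = "/'([^']+)'\s*:\s*'(.*?)'(?=\s*,\s*'|\s*\$)/";
    preg_match_all($pattern, $body, $matches, PREG_SET_ORDER);

    $result = [];
    foreach ($matches as $m) {
        $result[$m[1]] = $m[2];
    }
    return $result;
}

print_r(parse_pairs("['class': 'navigation', 'id': 'navigation']"));
print_r(parse_pairs("['title': 'Tom's diary']"));
```

Switching the template syntax to real JSON remains the more robust choice, since the regex only guesses where a value ends.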
