PHPUnit: expect method call with array as argument

I have a PHPUnit test case, in which I am puzzled by the following snippet. I want to check that the method actionUpload calls the function exposeAndSaveDataLines correctly, i.e. that the first argument is an array as I expect it to be.
public function test_actionUpload()
{
$sut = $this->getMockBuilder('MasterdataController')
->setMethods(array('exposeAndSaveDataLines', 'render'))
->disableOriginalConstructor()
->getMock();
$expectedLines = require_once ($this->dataDir . 'expectedLines.php');
$sut->expects($this->once())
->method('exposeAndSaveDataLines')
->with($this->equalTo($expectedLines),
$this->anything(),
$this->anything(),
$this->anything(),
$this->anything());
$sut->actionUpload();
}
The expected data is a printout of the current array, made by temporarily adding print_r(var_export($lines)) to the actual code. The file expectedLines.php returns it, and when I print it manually, it is correct.
Now, when I run the test case with a single character deliberately misspelled in expectedLines, I get the following error (as expected).
Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
3 => 'Colour Group Code'
- 4 => '{2F30E832-D3DB-447E-B733-7BC5125CBCCc}'
+ 4 => '{2F30E832-D3DB-447E-B733-7BC5125CBCCC}'
)
)
)
However, when I correct the mistake, it still says that the two arrays are not equal. This time it prints the entire array (at least the start of it; it is a long array), but it doesn't show any differences (no - or + in front of any line). Why doesn't the expects method recognize that the two arrays are the same? How can I test this properly?
EDIT 1
I have shortened the array, such that it prints the entire array when they are not equal. Still no + or - signs in the comparison.
This is the end of my expectation PHP file.
'RetTarget Area' => array(
0 => array(
0 => '',
1 => '',
2 => '{C19D52BC-834C-45DA-B17F-74D73A2EC0BB}
'
),
1 => array(
0 => '1',
1 => '1',
2 => '{5E25C44F-C18A-4F54-B6B1-248955A82E59}'
)
)
);
This is the end of my comparison output in the console.
'RetTarget Area' => Array (
0 => Array (
0 => ''
1 => ''
2 => '{C19D52BC-834C-45DA-B17F-74D73A2EC0BB}
'
)
1 => Array (...)
)
)
I find it suspicious that the last Array is not fully shown in the comparison.
EDIT 2
I found here that the order of the arrays matters. I am fairly sure, though, that all elements are in the same order, unless PHP is doing something secret under the hood. I cannot copy the solution mentioned there, since I don't use $this->assertEquals but the ->with($this->equalTo(...)) syntax.
EDIT 3
I read here about an undocumented parameter $canonicalize that orders arrays before comparing. When I use it like this:
$sut->expects($this->once())
->method('exposeAndSaveDataLines')
->with($this->equalTo($expectedLines, $delta = 0.0, $maxDepth = 10, $canonicalize = true, $ignoreCase = false),
$this->anything(),
$this->anything(),
$this->anything(),
$this->anything());
I see that the order of the arrays is indeed changed, but I still get the same error. Also, one array is still 'collapsed', which I suspect causes this failure. Besides, I don't want to sort all my subarrays; they should be in the same order in the real and the expected result.
--- Expected
+++ Actual
@@ @@
Array (
0 => Array (
0 => Array (
0 => ''
1 => ''
2 => '{C19D52BC-834C-45DA-B17F-74D73A2EC0BB}
'
)
1 => Array (...)
)
EDIT 4
When I use identicalTo instead of equalTo, I get a more elaborate error message saying that one array is not identical to the other, and it prints both of them. I copy-pasted both into text files and ran diff on them, but there were no differences. Still, PHPUnit claims that the two arrays are not equal/identical.
EDIT 5
When I use greaterThanOrEqual or even greaterThan instead of equalTo, then the test passes. This does not happen for lessThanOrEqual. This implies that there is a difference between the two arrays.
If I manually change the expected outcome to use a string that sorts alphabetically before the correct string, I can make lessThan pass as well, but then of course greaterThanOrEqual fails.
EDIT 6
I am becoming convinced that the line endings of the strings in my array make this comparison fail, and that they don't show up in every comparison output.
I now have the following assertion.
public function test_actionUpload_v10MasterdataFile()
{
....
$sut->expects($this->once())
->method('exposeAndSaveDataLines')
->will($this->returnCallback(function($lines) {
$expectedLines = include ($this->dataDir . 'ExpectedLines.php');
$arrays_similar = $this->similar_arrays($lines, $expectedLines);
PHPUnit_Framework_Assert::assertTrue($arrays_similar);
}));
$sut->actionUpload();
}
private function similar_arrays($a, $b)
{
if(is_array($a) && is_array($b))
{
if(count(array_diff(array_keys($a), array_keys($b))) > 0)
{
print_r(array_diff(array_keys($a), array_keys($b)));
return false;
}
foreach($a as $k => $v)
{
if(!$this->similar_arrays($v, $b[$k]))
{
return false;
}
}
return true;
}
else
{
if ($a !== $b)
{
print_r(PHP_EOL . 'A: '. $a. PHP_EOL . 'Type: ' . gettype($a) . PHP_EOL);
print_r(PHP_EOL . 'B: '. $b. PHP_EOL . 'Type: ' . gettype($b) . PHP_EOL);
}
return $a === $b;
}
}
With the following result.
A: {72C2F175-9F50-4C9C-AF82-9E3FB875EA82}
Type: string
B: {72C2F175-9F50-4C9C-AF82-9E3FB875EA82}
Type: string
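Both strings above print identically, which fits the suspicion from EDIT 6 that a hidden line ending differs. A minimal sketch for making such a difference visible, assuming (not confirmed by the output) that one value ends in "\r\n" and the other in "\n":

```php
<?php
// Assumption: the actual value ends in "\r\n" and the expected value in "\n";
// both look the same when printed to a terminal.
$a = "{72C2F175-9F50-4C9C-AF82-9E3FB875EA82}\r\n";
$b = "{72C2F175-9F50-4C9C-AF82-9E3FB875EA82}\n";

var_dump($a === $b);                      // bool(false)

// addcslashes() rewrites control characters (bytes 0-31) in C-style notation,
// so the hidden carriage return becomes visible.
echo addcslashes($a, "\0..\37"), PHP_EOL; // {72C2F175-9F50-4C9C-AF82-9E3FB875EA82}\r\n
echo addcslashes($b, "\0..\37"), PHP_EOL; // {72C2F175-9F50-4C9C-AF82-9E3FB875EA82}\n

// Stripping trailing whitespace, as the final solution below does, makes them equal.
var_dump(rtrim($a) === rtrim($b));        // bool(true)
```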

I finally got it to work, although it is a bit of a compromise. I now strip trailing whitespace (including newlines) before comparing the arrays. This cannot be done in the with method, so I used the following construction.
public function test_actionUpload_v10MasterdataFile()
{
/*
* Create a stub to disable the original constructor.
* Exposing data and rendering are stubbed.
* All other methods behave exactly the same as in the real Controller.
*/
$sut = $this->getMockBuilder('MasterdataController')
->setMethods(array('exposeAndSaveDataLines', 'render'))
->disableOriginalConstructor()
->getMock();
$sut->expects($this->once())
->method('exposeAndSaveDataLines')
->will($this->returnCallback(function($lines) {
$expectedLines = include ($this->dataDir . 'ExpectedLines.php');
PHPUnit_Framework_Assert::assertTrue($this->similar_arrays($lines, $expectedLines));
}));
// Execute the test
$sut->actionUpload();
}
...
private function similar_arrays($a, $b)
{
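/**
* Check if two arrays have equal keys and values associated with them, without
* looking at the order of elements, and discarding trailing whitespace
* (including newlines) from scalar values.
*/
if(is_array($a) && is_array($b))
{
// Both arrays must have exactly the same set of keys, in either direction.
if(count($a) !== count($b) || count(array_diff(array_keys($a), array_keys($b))) > 0)
{
return false;
}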
foreach($a as $k => $v)
{
if(!$this->similar_arrays($v, $b[$k]))
{
return false;
}
}
return true;
}
else
{
$a = rtrim($a);
$b = rtrim($b);
$extended_output = false;
if ($extended_output && ($a !== $b))
{
print_r(PHP_EOL . 'A: '. $a. PHP_EOL . 'Type: ' . gettype($a) . PHP_EOL);
print_r(PHP_EOL . 'B: '. $b. PHP_EOL . 'Type: ' . gettype($b) . PHP_EOL);
}
return $a === $b;
}
}
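An alternative to rtrim() that keeps key order significant is to normalise line endings on both sides and then compare strictly. A minimal sketch; normalize_line_endings() is a hypothetical helper, and it assumes the only mismatch is "\r\n" or "\r" versus "\n":

```php
<?php
/**
 * Hypothetical helper: recursively convert "\r\n" and lone "\r" to "\n" in
 * every string of a (nested) array. Non-string values are left untouched.
 */
function normalize_line_endings(array $data)
{
    array_walk_recursive($data, function (&$value) {
        if (is_string($value)) {
            $value = str_replace(array("\r\n", "\r"), "\n", $value);
        }
    });
    return $data;
}

$actual   = array('RetTarget Area' => array(array('', '', "{C19D52BC-834C-45DA-B17F-74D73A2EC0BB}\r\n")));
$expected = array('RetTarget Area' => array(array('', '', "{C19D52BC-834C-45DA-B17F-74D73A2EC0BB}\n")));

var_dump($actual === $expected);                                                 // bool(false)
var_dump(normalize_line_endings($actual) === normalize_line_endings($expected)); // bool(true)
```

Inside the test, the same comparison could be fed to PHPUnit's $this->callback() constraint, which with() accepts, e.g. ->with($this->callback(function ($lines) use ($expectedLines) { return normalize_line_endings($lines) === normalize_line_endings($expectedLines); }), $this->anything(), ...). Unlike similar_arrays(), === also checks element order and types.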

Related

Difference between "==" and "strcmp" in PHP

I was asked to write a function that simulates array_keys, but the equality check with "==" returns false,
while "strcmp ($a, $b) == 0" returns true.
class Utility
{
public function convertArrayKeys(array $array)
{
$i = 0;
$elements = [];
foreach ($array as $element=>$value) {
$elements[] = ' ' . $i++ . " => '" . $element . "'";
}
return 'array ( ' . implode(', ', $elements) . ', )';
}
public function testStrings(array $array)
{
$arrayOur = $this->convertArrayKeys($array);
$arrayPhp = var_export(array_keys($array),true);
if ($arrayOur == $arrayPhp){
echo 'They are identical :)';
} else {
echo 'They are not identical :(';
echo '<br>';
print_r(str_split($arrayOur));
echo '<br>';
print_r(str_split($arrayPhp));
}
}
}
View:
$repository = array('box'=>'blue', 'cube'=>'red', 'ball'=>'green');
$utility = new Utility();
echo "Our array_keys: ";
echo $utility->convertArrayKeys($repository);
echo "<br />";
echo "PHP array_keys: ";
print_r (var_export(array_keys($repository)));
echo "<hr >";
echo "<br />";
echo $utility->testStrings($repository);
I would appreciate knowing why.
Edit: The reason the two don't match in this instance is that your function doesn't produce the same output as PHP's. Yours produces:
array ( 0 => 'box', 1 => 'cube', 2 => 'ball', )
where the php function produces:
array (
0 => 'box',
1 => 'cube',
2 => 'ball',
)
If you view that in a web browser, the renderer collapses the whitespace, so the two look identical. Put <pre> tags around the output (or run the script on the command line) to see the difference.
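To see the difference without a browser, the string returned by var_export(..., true) can be inspected directly. A small sketch using the $repository array from the question; collapsing whitespace here is only a diagnostic, not a suggested fix:

```php
<?php
$repository = array('box' => 'blue', 'cube' => 'red', 'ball' => 'green');

// var_export(..., true) returns the export as a string containing newlines
// and indentation, which an HTML page collapses when rendering.
$phpVersion = var_export(array_keys($repository), true);

// json_encode() escapes the newline characters, making them visible.
echo json_encode($phpVersion), PHP_EOL;

// Collapsing every whitespace run to a single space leaves the one-line layout.
$collapsed = preg_replace('/\s+/', ' ', $phpVersion);
echo $collapsed, PHP_EOL; // array ( 0 => 'box', 1 => 'cube', 2 => 'ball', )
```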
Basically, == does more than compare the two values: the documentation says the comparison happens "after type juggling". You can get some weird results by comparing strings with ==. One good example is '1e3' == '1000', which is true. == is useful at times, but probably not with strings.
strcmp(), on the other hand, doesn't return true/false but a negative number, 0, or a positive number, indicating which string sorts first byte by byte.
You should also look at ===, which has its uses too, but personally I would stick with strcmp for strings.
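The '1e3' == '1000' example above side by side with === and strcmp(), as a quick check:

```php
<?php
// Both operands are numeric strings, so == compares them as numbers.
$loose  = ('1e3' == '1000');    // true: 1e3 and 1000 are numerically equal
$strict = ('1e3' === '1000');   // false: the strings differ
$cmp    = strcmp('1e3', '1000'); // positive: 'e' sorts after '0' byte-wise

var_dump($loose, $strict, $cmp === 0);
```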
Never use == in PHP for this. It will not do what you expect. Even when you are comparing strings to strings, PHP converts them to numbers and does a numerical comparison if they look numeric.
Try these and you will see why:
$something = 0;
echo ('password' == $something) ? 'true' : 'false'; // true before PHP 8.0, false since PHP 8.0
$something = 0;
echo ('password' === $something) ? 'true' : 'false'; // false
echo strcmp('password123',$something); // a positive number: 'p' sorts after '0'
This is because they are not arrays but strings. Arrays are not created like this; you are doing it wrong. If they were arrays, then
if ($arrayOur == $arrayPhp)
would have evaluated to true. But they are just strings, and
"strcmp ($a, $b) == 0"
does not return true, because the first string contains different whitespace:
return 'array ( ' . implode(', ', $elements) . ', )';
You are doing it completely wrong. You need to correct your approach.

How can I efficiently search a field for flipped word order?

I have the following array.
$arr = array('foo','bar','foo-bar','abc','def','abc-def','ghi','abc-def-ghi');
I'm given a new string and must decide whether to add it to the array. If the string is already in the array, don't add it. If it is not in the array in its current form but is found with its words flipped, don't add it either.
How should I accomplish this?
Examples:
'foo' -> N: do NOT add, already found
'xyz' -> Y: add, this is new
'bar-foo' -> N: do NOT add, already found in the flipped form 'foo-bar'
'ghi-jkl' -> Y: add, this is new
What do you recommend?
If you want to exclude items whose elements ('abc','ghi', etc.) are contained in another order and not only reversed, you could do:
$arr = array('foo','bar','foo-bar','abc','def','abc-def','ghi','abc-def-ghi');
function split_and_sort($str) {
$partsA = explode('-', $str);
sort($partsA);
return $partsA;
}
$arr_parts = array_map('split_and_sort', $arr);
$tests = array('foo','xyz','bar-foo','ghi-jkl');
$tests_parts = array_map('split_and_sort', $tests);
foreach($tests_parts as $test) {
if( !in_array($test, $arr_parts)) {
echo "adding: " . join('-', $test) . "\n";
$arr[] = join('-', $test);
}
else {
echo "skipping: " . join('-', $test) . "\n";
}
}
var_export($arr);
which outputs:
skipping: foo
adding: xyz
skipping: bar-foo
adding: ghi-jkl
array (
0 => 'foo',
1 => 'bar',
2 => 'foo-bar',
3 => 'abc',
4 => 'def',
5 => 'abc-def',
6 => 'ghi',
7 => 'abc-def-ghi',
8 => 'xyz',
9 => 'ghi-jkl',
)
Here's a suggestion for one way you could try:
for each string in $arr, reverse its word order (split on '-', reverse, join again) and push the result into another array called $rev_arr
then...
$new_array = array();
foreach ($arr as $arr_1) $new_array[$arr_1] = true; // just set something
foreach ($rev_arr as $arr_2) $new_array[$arr_2] = true; // do also for reverse
now you can check what you want to do based on
if ( isset($new_array[ $YOUR_TEST_VARIABLE_HERE ]) ) { // match found
}
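The lookup-set suggestion above, fleshed out as a sketch that also keeps the set up to date as items are added. add_if_new() is a hypothetical helper; unlike the sort-based answer, it only treats an exact word reversal as a duplicate (so 'c-a-b' would still be added next to 'a-b-c'):

```php
<?php
// Reverse the word order of a '-'-separated string: 'foo-bar' -> 'bar-foo'.
function reverse_words($str)
{
    return implode('-', array_reverse(explode('-', $str)));
}

// Add $candidate unless it, or its reversed form, is already known.
function add_if_new(array &$arr, array &$seen, $candidate)
{
    if (isset($seen[$candidate])) {
        return false;
    }
    $arr[] = $candidate;
    $seen[$candidate] = true;
    $seen[reverse_words($candidate)] = true;
    return true;
}

$arr = array('foo','bar','foo-bar','abc','def','abc-def','ghi','abc-def-ghi');

// Build the lookup set from each entry and its word-reversed form.
$seen = array();
foreach ($arr as $item) {
    $seen[$item] = true;
    $seen[reverse_words($item)] = true;
}

foreach (array('foo', 'xyz', 'bar-foo', 'ghi-jkl') as $test) {
    echo $test, ': ', add_if_new($arr, $seen, $test) ? 'added' : 'skipped', PHP_EOL;
}
```

isset() on array keys is a hash lookup, so each check stays fast even for a large $arr.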

What is wrong with my PHP filtering system?

I have a simple filtering function for data that I'm receiving from POST, and for another variable (it will be part of the SESSION array, but in development is in an array of its own). The POST data is handled by the function exactly as expected, whereas the other variable, $sess['iid'], always fails. Why?
I can work around this, but I hate not to understand why it's happening.
The filtering function:
function filterNumber($fin) {
if( ctype_digit( $fin ) ) {
$fout = $fin;
} else {
$fout = 0;
}
return $fout;
}
I am strict about naming variables, so the POST array is transferred into $dirty[], and then $clean[] (for entry into the database) is produced by applying the appropriate filter to $dirty[]. The exact same sequence is applied to $sess['iid'].
Examples of each stage:
$dirty['iid'] = $sess['iid'];
$dirty['liverpool'] = $_POST['liverpool'];
$clean['iid'] = filterNumber($dirty['iid']);
$clean['liverpool'] = filterNumber($dirty['liverpool']);
The first step, $sess['iid'] to $dirty['iid'], works, just as it does for the POST variables.
But the second, $dirty['iid'] to $clean['iid'] via filterNumber(), results in a value of 0, regardless of what I put into $sess['iid'].
This also happens if I eliminate the $dirty['iid'] step.
function filterNumber($fin) {
if( ctype_digit( $fin ) ) {
$fout = $fin;
} else {
$fout = 0;
}
return $fout;
}
$tests = array(
1,
'1',
'123',
123,
1.2,
'1.2',
'abc',
true,
false,
null,
new stdClass()
);
foreach ($tests as $test) {
echo 'Testing: ' . var_export($test, true) . ' - result: ' . filterNumber($test) . PHP_EOL;
}
Prints
Testing: 1 - result: 0
Testing: '1' - result: 1
Testing: '123' - result: 123
Testing: 123 - result: 0
Testing: 1.2 - result: 0
Testing: '1.2' - result: 0
Testing: 'abc' - result: 0
Testing: true - result: 0
Testing: false - result: 0
Testing: NULL - result: 0
Testing: stdClass::__set_state(array(
)) - result: 0
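Judging by that output, ctype_digit() only behaves as expected for strings: an int between -128 and 255 is interpreted as an ASCII code (1 is a control character, 123 is '{'), so an integer $sess['iid'] is rejected. A minimal sketch of a variant that casts first, assuming $sess['iid'] holds an int; filterNumberStrict() is a hypothetical name:

```php
<?php
// Variant of filterNumber() that casts ints to strings before calling
// ctype_digit(), so 1 and '1' are treated the same way.
function filterNumberStrict($fin)
{
    // Only ints and strings can be whole numbers here; floats, bools, null
    // and objects are rejected outright.
    if ((is_int($fin) || is_string($fin)) && ctype_digit((string) $fin)) {
        return $fin;
    }
    return 0;
}

foreach (array(1, '1', 123, '123', 1.2, '1.2', 'abc', -5, true, null) as $test) {
    echo 'Testing: ', var_export($test, true), ' - result: ', filterNumberStrict($test), PHP_EOL;
}
```

Casting also avoids relying on passing ints to ctype_digit(), which is deprecated as of PHP 8.1.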
Resources
var_export
var_dump
gettype

How to repair a serialized string which has been corrupted by an incorrect byte count length?

I am using Hotaru CMS with the Image Upload plugin. I get this error when I try to attach an image to a post; otherwise there is no error:
unserialize() [function.unserialize]: Error at offset
The offending code (error points to line with **):
/**
* Retrieve submission step data
*
* @param $key - empty when setting
* @return bool
*/
public function loadSubmitData($h, $key = '')
{
// delete everything in this table older than 30 minutes:
$this->deleteTempData($h->db);
if (!$key) { return false; }
$cleanKey = preg_replace('/[^a-z0-9]+/','',$key);
if (strcmp($key,$cleanKey) != 0) {
return false;
} else {
$sql = "SELECT tempdata_value FROM " . TABLE_TEMPDATA . " WHERE tempdata_key = %s ORDER BY tempdata_updatedts DESC LIMIT 1";
$submitted_data = $h->db->get_var($h->db->prepare($sql, $key));
**if ($submitted_data) { return unserialize($submitted_data); } else { return false; }**
}
}
Here is the data from the table. Notice that the end has the image info. I am not an expert in PHP, so I was wondering what you guys/gals think?
tempdata_value:
a:10:{s:16:"submit_editorial";b:0;s:15:"submit_orig_url";s:13:"www.bbc.co.uk";s:12:"submit_title";s:14:"No title found";s:14:"submit_content";s:12:"dnfsdkfjdfdf";s:15:"submit_category";i:2;s:11:"submit_tags";s:3:"bbc";s:9:"submit_id";b:0;s:16:"submit_subscribe";i:0;s:15:"submit_comments";s:4:"open";s:5:"image";s:19:"C:fakepath100.jpg";}
Edit: I think I've found the serialize bit...
/**
* Save submission step data
*
* @return bool
*/
public function saveSubmitData($h)
{
// delete everything in this table older than 30 minutes:
$this->deleteTempData($h->db);
$sid = preg_replace('/[^a-z0-9]+/i', '', session_id());
$key = md5(microtime() . $sid . rand());
$sql = "INSERT INTO " . TABLE_TEMPDATA . " (tempdata_key, tempdata_value, tempdata_updateby) VALUES (%s,%s, %d)";
$h->db->query($h->db->prepare($sql, $key, serialize($h->vars['submitted_data']), $h->currentUser->id));
return $key;
}
The error unserialize() [function.unserialize]: Error at offset is due to invalid serialization data: one of the stored string lengths is wrong.
Quick Fix
What you can do is recalculate the length of the elements in the serialized array.
Your current serialized data:
$data = 'a:10:{s:16:"submit_editorial";b:0;s:15:"submit_orig_url";s:13:"www.bbc.co.uk";s:12:"submit_title";s:14:"No title found";s:14:"submit_content";s:12:"dnfsdkfjdfdf";s:15:"submit_category";i:2;s:11:"submit_tags";s:3:"bbc";s:9:"submit_id";b:0;s:16:"submit_subscribe";i:0;s:15:"submit_comments";s:4:"open";s:5:"image";s:19:"C:fakepath100.jpg";}';
Example without recalculation
var_dump(unserialize($data));
Output
Notice: unserialize() [function.unserialize]: Error at offset 337 of 338 bytes
Recalculating
$data = preg_replace('!s:(\d+):"(.*?)";!e', "'s:'.strlen('$2').':\"$2\";'", $data);
var_dump(unserialize($data));
Output
array
'submit_editorial' => boolean false
'submit_orig_url' => string 'www.bbc.co.uk' (length=13)
'submit_title' => string 'No title found' (length=14)
'submit_content' => string 'dnfsdkfjdfdf' (length=12)
'submit_category' => int 2
'submit_tags' => string 'bbc' (length=3)
'submit_id' => boolean false
'submit_subscribe' => int 0
'submit_comments' => string 'open' (length=4)
'image' => string 'C:fakepath100.jpg' (length=17)
Recommendation
Instead of using this kind of quick fix, I'd advise you to update the question with:
How you are serializing your data
How you are saving it
================================ EDIT 1 ===============================
The Error
The error was generated by the use of double quotes " instead of single quotes ', which is why C:\fakepath\100.png was converted to C:fakepath100.jpg.
To fix the error
You need to change how $h->vars['submitted_data'] is set (note the single quotes '):
Replace
$h->vars['submitted_data']['image'] = "C:\fakepath\100.png" ;
With
$h->vars['submitted_data']['image'] = 'C:\fakepath\100.png' ;
Additional Filter
You can also add this simple filter before you call serialize
function sanitize(&$value, $key)
{
$value = addslashes($value);
}
array_walk($h->vars['submitted_data'], "sanitize");
If you have UTF-8 characters, you can also run
$h->vars['submitted_data'] = array_map("utf8_encode",$h->vars['submitted_data']);
How to detect the problem in future serialized data
findSerializeError ( $data1 ) ;
Output
Difference 9 != 7
-> ORD number 57 != 55
-> Line Number = 315
-> Section Data1 = pen";s:5:"image";s:19:"C:fakepath100.jpg
-> Section Data2 = pen";s:5:"image";s:17:"C:fakepath100.jpg
^------- The Error (Element Length)
findSerializeError Function
function findSerializeError($data1) {
echo "<pre>";
$data2 = preg_replace ( '!s:(\d+):"(.*?)";!e', "'s:'.strlen('$2').':\"$2\";'",$data1 );
$max = (strlen ( $data1 ) > strlen ( $data2 )) ? strlen ( $data1 ) : strlen ( $data2 );
echo $data1 . PHP_EOL;
echo $data2 . PHP_EOL;
for($i = 0; $i < $max; $i ++) {
if (@$data1[$i] !== @$data2[$i]) {
echo "Difference ", @$data1[$i], " != ", @$data2[$i], PHP_EOL;
echo "\t-> ORD number ", ord ( @$data1[$i] ), " != ", ord ( @$data2[$i] ), PHP_EOL;
echo "\t-> Line Number = $i" . PHP_EOL;
$start = ($i - 20);
$start = ($start < 0) ? 0 : $start;
$length = 40;
$point = $max - $i;
if ($point < 20) {
$rlength = 1;
$rpoint = - $point;
} else {
$rpoint = $length - 20;
$rlength = 1;
}
echo "\t-> Section Data1 = ", substr_replace ( substr ( $data1, $start, $length ), "<b style=\"color:green\">{$data1[$i]}</b>", $rpoint, $rlength ), PHP_EOL;
echo "\t-> Section Data2 = ", substr_replace ( substr ( $data2, $start, $length ), "<b style=\"color:red\">{$data2[$i]}</b>", $rpoint, $rlength ), PHP_EOL;
}
}
}
A better way to save to Database
$toDatabase = base64_encode(serialize($data)); // Save to database
$fromDatabase = unserialize(base64_decode($toDatabase)); // Read back in the saved format
I don't have enough reputation to comment, so I hope this is seen by people using the above "correct" answer:
Since PHP 5.5, the /e modifier in preg_replace() has been deprecated (it was removed entirely in PHP 7.0), so the preg_replace() calls above will error out. The PHP documentation recommends using preg_replace_callback() in its place.
Please find the following solution as an alternative to the preg_replace() proposed above.
$fixed_data = preg_replace_callback ( '!s:(\d+):"(.*?)";!', function($match) {
return ($match[1] == strlen($match[2])) ? $match[0] : 's:' . strlen($match[2]) . ':"' . $match[2] . '";';
},$bad_data );
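Applied to the data posted in the question, that callback should rewrite only the final length (19 becomes 17) and leave every correct entry untouched. A quick check, assuming the posted string is exactly what is stored in the table:

```php
<?php
// The corrupted string from the question: the last value claims 19 bytes,
// but "C:fakepath100.jpg" is only 17 bytes long.
$bad_data = 'a:10:{s:16:"submit_editorial";b:0;s:15:"submit_orig_url";s:13:"www.bbc.co.uk";s:12:"submit_title";s:14:"No title found";s:14:"submit_content";s:12:"dnfsdkfjdfdf";s:15:"submit_category";i:2;s:11:"submit_tags";s:3:"bbc";s:9:"submit_id";b:0;s:16:"submit_subscribe";i:0;s:15:"submit_comments";s:4:"open";s:5:"image";s:19:"C:fakepath100.jpg";}';

$fixed_data = preg_replace_callback('!s:(\d+):"(.*?)";!', function ($match) {
    // Keep entries whose byte count is already right; recompute the others.
    return ($match[1] == strlen($match[2]))
        ? $match[0]
        : 's:' . strlen($match[2]) . ':"' . $match[2] . '";';
}, $bad_data);

var_dump(@unserialize($bad_data) === false); // bool(true): the original is rejected
print_r(unserialize($fixed_data));           // all 10 entries, image => C:fakepath100.jpg
```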
Quick Fix
Recalculate the length of the elements in the serialized array, but don't use preg_replace() with the /e modifier, which is deprecated; use preg_replace_callback() instead:
Edit: The new version not only fixes wrong lengths, it also handles line breaks and counts accented characters correctly (thanks to mickmackusa).
// New Version
$data = preg_replace_callback('!s:\d+:"(.*?)";!s',
function($m) {
return "s:" . strlen($m[1]) . ':"'.$m[1].'";';
}, $data
);
Another reason unserialize() can fail is that the serialized data was put into the database improperly; see the official explanation here. Since serialize() returns binary data and PHP strings don't care about encodings, putting it into a TEXT or VARCHAR column can cause this error.
Solution: store serialized data into BLOB in your table.
$badData = 'a:2:{i:0;s:16:"as:45:"d";
Is \n";i:1;s:19:"as:45:"d";
Is \r\n";}';
You cannot fix a broken serialized string like this one using the proposed regexes:
$data = preg_replace('!s:(\d+):"(.*?)";!e', "'s:'.strlen('$2').':\"$2\";'", $badData);
var_dump(@unserialize($data)); // Output: bool(false)
// or
$data = preg_replace_callback(
'/s:(\d+):"(.*?)";/',
function($m){
return 's:' . strlen($m[2]) . ':"' . $m[2] . '";';
},
$badData
);
var_dump(@unserialize($data)); // Output: bool(false)
You can fix the broken serialized string using the following regex:
$data = preg_replace_callback(
'/(?<=^|\{|;)s:(\d+):\"(.*?)\";(?=[asbdiO]\:\d|N;|\}|$)/s',
function($m){
return 's:' . strlen($m[2]) . ':"' . $m[2] . '";';
},
$badData
);
var_dump(@unserialize($data));
Output
array(2) {
[0] =>
string(17) "as:45:"d";
Is \n"
[1] =>
string(19) "as:45:"d";
Is \r\n"
}
or
array(2) {
[0] =>
string(16) "as:45:"d";
Is \n"
[1] =>
string(18) "as:45:"d";
Is \r\n"
}
This error is caused by a wrong charset.
Set the charset after the opening tag:
header('Content-Type: text/html; charset=utf-8');
And set the utf8 charset for your database connection:
mysql_query("SET NAMES 'utf8'");
public function unserializeKeySkills($string) {
$output = array();
$string = trim(preg_replace('/\s\s+/', ' ',$string));
$string = preg_replace_callback('!s:(\d+):"(.*?)";!', function($m) { return 's:'.strlen($m[2]).':"'.$m[2].'";'; }, utf8_encode( trim(preg_replace('/\s\s+/', ' ',$string)) ));
try {
$output = unserialize($string);
} catch (\Exception $e) {
\Log::error("unserialize Data : " .print_r($string,true));
}
return $output;
}
You can fix a broken serialized string using the following function, which takes multibyte characters into account.
function repairSerializeString($value)
{
$regex = '/s:([0-9]+):"(.*?)"/';
return preg_replace_callback(
$regex, function($match) {
return "s:".mb_strlen($match[2]).":\"".$match[2]."\"";
},
$value
);
}
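One caveat, echoed by a later answer here: serialize() records byte lengths, not character counts, so a repair that counts characters (as mb_strlen() does) under-counts multibyte text. A small check, written with an escaped "ç" so it does not depend on the file's encoding; preg_match_all('/./us', ...) stands in for mb_strlen() so the sketch runs without the mbstring extension:

```php
<?php
$word = "gar\xc3\xa7on"; // "garçon" in UTF-8: 6 characters, 7 bytes

echo serialize($word), PHP_EOL;              // s:7:"garçon";
echo strlen($word), PHP_EOL;                 // 7 (bytes)
echo preg_match_all('/./us', $word), PHP_EOL; // 6 (characters, like mb_strlen)

// A repaired entry built from the byte count round-trips...
$byteFixed = 's:' . strlen($word) . ':"' . $word . '";';
var_dump(unserialize($byteFixed));           // string(7) "garçon"

// ...while one built from the character count is rejected.
$charFixed = 's:' . preg_match_all('/./us', $word) . ':"' . $word . '";';
var_dump(@unserialize($charFixed));          // bool(false)
```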
Here is an Online Tool for fixing a corrupted serialized string.
I'd like to add that this mostly happens after a search-and-replace on the database: the serialization data (especially the string lengths) doesn't get updated to match the replacement, and that causes the "corruption".
Nonetheless, the above tool uses the following logic to fix the serialization data (copied from here).
function error_correction_serialise($string){
// at first, check if "fixing" is really needed at all. After that, security checkup.
if ( @unserialize($string) === false && preg_match('/^[aOs]:/', $string) ) {
$string = preg_replace_callback( '/s\:(\d+)\:\"(.*?)\";/s', function($matches){return 's:'.strlen($matches[2]).':"'.$matches[2].'";'; }, $string );
}
return $string;
}
The official docs say unserialize() returns false and issues an E_NOTICE (an E_WARNING as of PHP 8.3) when it fails.
Since you got an error, your error reporting includes E_NOTICE.
Here is a fix that lets you detect the false returned by unserialize():
$old_err=error_reporting();
error_reporting($old_err & ~E_NOTICE);
$object = unserialize($serialized_data);
error_reporting($old_err);
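One wrinkle with checking for false: a payload that really is a serialized false ('b:0;') also yields false. A sketch that tells the two apart, swapping the error_reporting() juggling for the @ operator (which also covers the E_WARNING raised since PHP 8.3); try_unserialize() is a hypothetical helper:

```php
<?php
/**
 * Hypothetical helper: unserialize $data quietly and report success in $ok.
 * A false result only means failure if $data was not itself serialize(false).
 */
function try_unserialize($data, &$ok)
{
    $result = @unserialize($data);
    $ok = ($result !== false || $data === serialize(false));
    return $result;
}

try_unserialize('a:1:{i:0;s:5:"abc";}', $ok); // wrong byte count for "abc"
var_dump($ok);                                // bool(false)

try_unserialize(serialize(false), $ok);       // a genuine false
var_dump($ok);                                // bool(true)
```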
You might want to consider using base64 encoding/decoding:
$string=base64_encode(serialize($obj));
unserialize(base64_decode($string));
The corruption in this question is isolated to a single substring at the end of the serialized string, which was probably manually replaced by someone who lazily wanted to update the image filename. This fact will be apparent in my demonstration link below using the OP's posted data -- in short, C:fakepath100.jpg does not have a length of 19, it should be 17.
Since the serialized string corruption is limited to an incorrect byte/character count number, the following will do a fine job of updating the corrupted string with the correct byte count value.
The following regex based replacement will only be effective in remedying byte counts, nothing more.
It looks like many of the earlier posts are just copy-pasting a regex pattern from someone else. There is no reason to capture the potentially corrupted byte count number if it isn't going to be used in the replacement. Also, adding the s pattern modifier is a reasonable inclusion in case a string value contains newlines/line returns.
*For those who are not aware of how multibyte characters are treated when serializing: you must not use mb_strlen() in the custom callback, because it is the byte count that is stored, not the character count. See my output...
Code: (Demo with OP's data) (Demo with arbitrary sample data) (Demo with condition replacing)
$corrupted = <<<STRING
a:4:{i:0;s:3:"three";i:1;s:5:"five";i:2;s:2:"newline1
newline2";i:3;s:6:"garçon";}
STRING;
$repaired = preg_replace_callback(
'/s:\d+:"(.*?)";/s',
// ^^^- matched/consumed but not captured because not used in replacement
function ($m) {
return "s:" . strlen($m[1]) . ":\"{$m[1]}\";";
},
$corrupted
);
echo $corrupted , "\n" , $repaired;
echo "\n---\n";
var_export(unserialize($repaired));
Output:
a:4:{i:0;s:3:"three";i:1;s:5:"five";i:2;s:2:"newline1
newline2";i:3;s:6:"garçon";}
a:4:{i:0;s:5:"three";i:1;s:4:"five";i:2;s:17:"newline1
newline2";i:3;s:7:"garçon";}
---
array (
0 => 'three',
1 => 'five',
2 => 'newline1
newline2',
3 => 'garçon',
)
One leg down the rabbit hole... The above works fine even if double quotes occur in a string value, but if a string value contains "; or some other monkey-wrenching substring, you'll need to go a little further and implement lookarounds. My new pattern
checks that the leading s is:
the start of the entire input string or
preceded by ;
and checks that the "; is:
at the end of the entire input string or
followed by } or
followed by a string or integer declaration s: or i:
I haven't tested each and every possibility; in fact, I am relatively unfamiliar with all of the possibilities in a serialized string because I never elect to work with serialized data -- always json in modern applications. If there are additional possible leading or trailing characters, leave a comment and I'll extend the lookarounds.
Extended snippet: (Demo)
$corrupted_byte_counts = <<<STRING
a:12:{i:0;s:3:"three";i:1;s:5:"five";i:2;s:2:"newline1
newline2";i:3;s:6:"garçon";i:4;s:111:"double " quote \"escaped";i:5;s:1:"a,comma";i:6;s:9:"a:colon";i:7;s:0:"single 'quote";i:8;s:999:"semi;colon";s:5:"assoc";s:3:"yes";i:9;s:1:"monkey";wrenching doublequote-semicolon";s:3:"s:";s:9:"val s: val";}
STRING;
$repaired = preg_replace_callback(
'/(?<=^|;)s:\d+:"(.*?)";(?=$|}|[si]:)/s',
//^^^^^^^^--------------^^^^^^^^^^^^^-- some additional validation
function ($m) {
return 's:' . strlen($m[1]) . ":\"{$m[1]}\";";
},
$corrupted_byte_counts
);
echo "corrupted serialized array:\n$corrupted_byte_counts";
echo "\n---\n";
echo "repaired serialized array:\n$repaired";
echo "\n---\n";
print_r(unserialize($repaired));
Output:
corrupted serialized array:
a:12:{i:0;s:3:"three";i:1;s:5:"five";i:2;s:2:"newline1
newline2";i:3;s:6:"garçon";i:4;s:111:"double " quote \"escaped";i:5;s:1:"a,comma";i:6;s:9:"a:colon";i:7;s:0:"single 'quote";i:8;s:999:"semi;colon";s:5:"assoc";s:3:"yes";i:9;s:1:"monkey";wrenching doublequote-semicolon";s:3:"s:";s:9:"val s: val";}
---
repaired serialized array:
a:12:{i:0;s:5:"three";i:1;s:4:"five";i:2;s:17:"newline1
newline2";i:3;s:7:"garçon";i:4;s:24:"double " quote \"escaped";i:5;s:7:"a,comma";i:6;s:7:"a:colon";i:7;s:13:"single 'quote";i:8;s:10:"semi;colon";s:5:"assoc";s:3:"yes";i:9;s:39:"monkey";wrenching doublequote-semicolon";s:2:"s:";s:10:"val s: val";}
---
Array
(
[0] => three
[1] => five
[2] => newline1
newline2
[3] => garçon
[4] => double " quote \"escaped
[5] => a,comma
[6] => a:colon
[7] => single 'quote
[8] => semi;colon
[assoc] => yes
[9] => monkey";wrenching doublequote-semicolon
[s:] => val s: val
)
You will have to alter the collation type to utf8_unicode_ci and the problem will be fixed.
In my case I was storing serialized data in a BLOB field of a MySQL DB, which apparently wasn't big enough to hold the whole value, so it was truncated. Such a string obviously cannot be unserialized.
Once I converted that field to MEDIUMBLOB, the problem disappeared.
Also it may be needed to switch in table options ROW_FORMAT to DYNAMIC or COMPRESSED.
After trying some things on this page without success, I looked at the page source and noticed that all quotes in the serialized string had been replaced by HTML entities.
Decoding these entities helps avoiding much headache:
$myVar = html_entity_decode($myVar);
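A quick illustration of that failure mode, assuming the entities were introduced by htmlspecialchars() somewhere between storage and reading:

```php
<?php
$original = serialize(array('title' => 'No title found'));
$escaped  = htmlspecialchars($original);    // every " becomes &quot;

var_dump(@unserialize($escaped) === false); // bool(true): entities break the format

$decoded = html_entity_decode($escaped);    // back to plain double quotes
var_dump(unserialize($decoded));            // the original array again
```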
Change the column type of that field to a larger one (e.g. LONGTEXT).
Another cause of this problem can be the type of the "payload" column in the sessions table. If you store a lot of data in the session, a TEXT column won't be big enough; you will need MEDIUMTEXT or even LONGTEXT.
You can use this for all cases:
$newdata = preg_replace_callback(
'/(?<=^|\{|;)s:(\d+):\"(.*?)\";(?=[asbdiO]\:\d|N;|\}|$)/s',
function($m){
return 's:' . strlen($m[2]) . ':"' . $m[2] . '";';
},
$badData
);

How to find leaf arrays in nested arrays?

I have a nested array in PHP:
array (
'0' => "+5x",
'1' => array (
'0' => "+",
'1' => "(",
'2' => "+3",
'3' => array (
'0' => "+",
'1' => "(",
'2' => array ( // I want to find this one.
'0' => "+",
'1' => "(",
'2' => "+5",
'3' => "-3",
'4' => ")"
),
'3' => "-3",
'4' => ")"
),
'4' => ")"
)
);
I need to process the innermost arrays, ones that themselves contain no arrays. In the example, it's the one with the comment: "I want to find this one." Is there a function for that?
I have thought about doing (written as an idea, not as correct PHP):
foreach ($array as $id => $value) {
if ($value is array) {
$name = $id;
foreach ($array[$id] as $id_2 => $value_2) {
if ($value_2 is array) {
$name .= "." . $id_2;
foreach ($array[$id][$id_2] as $id_3 => $value_3) {
if ($value_3 is array) {
$name .= "." . $id_3;
foreach ($array[$id][$id_2][$id_3] as $id_4 => $value_4) {
if ($value_4 is array) {
$name .= "." . $id_4;
foreach [and so it goes on];
} else {
$listOfInnerArrays[] = $name;
break;
}
}
} else {
$listOfInnerArrays[] = $name;
break;
}
}
} else {
$listOfInnerArrays[] = $name;
break;
}
}
}
}
So what it does is make $name the current key in the array. If the value is an array, it goes into it with foreach and appends "." and the id of that array. So with the example array we would end up with:
array (
'0' => "1.3.2",
)
Then I can process those values to access the inner arrays.
The problem is that the array that I'm trying to find the inner arrays of is dynamic and made of a user input. (It splits an input string where it finds + or -, and puts it in a separate nested array if it contains brackets. So if the user types a lot of brackets, there will be a lot of nested arrays.)
Therefore I would need to repeat this pattern 20 levels deep, and even then it would only catch 20 levels of nested arrays.
Is there a function for that, again? Or is there a way to make it do this without my long code? Maybe make a loop make the necessary number of the foreach pattern and run it through eval()?
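The nested-foreach idea described above generalizes to any depth with recursion, so neither 20 hand-written levels nor eval() is needed. A minimal sketch that builds the same "1.3.2"-style paths; find_leaf_array_paths() and get_by_path() are hypothetical helper names:

```php
<?php
/**
 * Hypothetical helper: return the dot-separated key paths of every "leaf"
 * array, i.e. a nested array that contains no arrays itself.
 */
function find_leaf_array_paths(array $array, $prefix = '')
{
    $paths = array();
    foreach ($array as $key => $value) {
        if (!is_array($value)) {
            continue; // scalars are never reported
        }
        $path = ($prefix === '') ? (string) $key : $prefix . '.' . $key;

        $hasChildArray = false;
        foreach ($value as $child) {
            if (is_array($child)) {
                $hasChildArray = true;
                break;
            }
        }

        if ($hasChildArray) {
            // Recurse to any depth instead of hand-writing nested foreach loops.
            $paths = array_merge($paths, find_leaf_array_paths($value, $path));
        } else {
            $paths[] = $path; // no nested arrays inside: this is a leaf
        }
    }
    return $paths;
}

/** Hypothetical helper: follow a path such as "1.3.2" back to the array. */
function get_by_path(array $array, $path)
{
    foreach (explode('.', $path) as $key) {
        $array = $array[$key];
    }
    return $array;
}

// The example array from the question.
$expr = array('+5x', array('+', '(', '+3', array('+', '(', array('+', '(', '+5', '-3', ')'), '-3', ')'), ')'));

foreach (find_leaf_array_paths($expr) as $path) {
    echo $path, ': ', implode(' ', get_by_path($expr, $path)), PHP_EOL; // 1.3.2: + ( +5 -3 )
}
```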
Definitions
simple:
Describes expressions without sub-expressions (e.g. "5", "x").
compound:
Describes expressions that have sub-expressions (e.g. "3+x", "1+2").
constness:
Whether an expression has a constant value (e.g. "5", "1+2") or not (e.g. "x", "3+x").
outer node:
In an expression tree, a node reachable by always traversing left or always traversing right. "Outer" is always relative to a given node; a node might be "outer" relative to one node, but "inner" relative to that node's parent.
inner node:
In an expression tree, a node that isn't an outer node.
For an illustration of "inner" and "outer" nodes, consider:
      __1__
     /     \
    2       5
   / \     / \
  3   4   6   7
3 and 7 are always outer nodes. 6 is outer relative to 5, but inner relative to 1.
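To make the inner/outer definitions concrete, here is a small sketch that lists the outer nodes relative to a given node. The node format (value, left and right keys) and the function names outer_nodes and leaf are assumptions made for illustration only; they are not part of the answer.

```php
<?php
// Hypothetical node format (an assumption, not from the answer):
// array('value' => ..., 'left' => node-or-null, 'right' => node-or-null)

// Returns the values of the nodes that are "outer" relative to $node,
// i.e. reachable from it by always going left or by always going right.
function outer_nodes(array $node): array
{
    $outer = array();
    for ($n = $node['left']; $n !== null; $n = $n['left']) {
        $outer[] = $n['value'];
    }
    for ($n = $node['right']; $n !== null; $n = $n['right']) {
        $outer[] = $n['value'];
    }
    return $outer;
}

// Helper for building childless nodes.
function leaf($value): array
{
    return array('value' => $value, 'left' => null, 'right' => null);
}

// The seven-node tree from the illustration.
$tree = array(
    'value' => 1,
    'left'  => array('value' => 2, 'left' => leaf(3), 'right' => leaf(4)),
    'right' => array('value' => 5, 'left' => leaf(6), 'right' => leaf(7)),
);

print_r(outer_nodes($tree));          // values 2, 3, 5, 7 (6 is inner relative to 1)
print_r(outer_nodes($tree['right'])); // values 6, 7 (6 is outer relative to 5)
```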
Answer
The difficulty here lies more in the uneven expression format than in the nesting. If you use expression trees, the example equation 5x+3=(x+(3+(5-3))) would parse to:
array(
    '=' => array(
        '+' => array( // 5x + 3
            '*' => array(
                5, 'x'
            ),
            3
        ),
        '+' => array( // (x+(3+(5-3)))
            'x',
            '+' => array( // (3+(5-3))
                3,
                '-' => array(
                    5, 3
                )
            )
        )
    )
)
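Note that nodes for binary operations are binary, and unary operations would have unary nodes. Read these arrays as sketches of the tree's shape rather than literal PHP: in a real PHP array the two '+' keys under '=' would collide, because a repeated key in an array literal overwrites the earlier entry. If the nodes for binary commutative operations could be combined into n-ary nodes, 5x+3=x+3+5-3 could be parsed to: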
array(
    '=' => array(
        '+' => array( // 5x + 3
            '*' => array(
                5, 'x'
            ),
            3
        ),
        '+' => array( // x+3+5-3
            'x',
            3,
            '-' => array(
                5, 3
            )
        )
    )
)
Then, you'd write a post-order recursive function that would simplify nodes. "Post-order" means node processing happens after processing its children; there's also pre-order (process a node before its children) and in-order (process some children before a node, and the rest after). What follows is a rough outline. In it, "thing : Type" means "thing" has type "Type", and "&" indicates pass-by-reference.
simplify_expr(expression : Expression&, operation : Token) : Expression {
if (is_array(expression)) {
foreach expression as key => child {
Replace child with simplify_expr(child, key);
key will also need to be replaced if new child is a constant
and old was not.
}
return simplify_node(expression, operation);
} else {
return expression;
}
}
simplify_node(expression : Expression&, operation : Token) : Expression;
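The outline can be turned into a minimal runnable PHP sketch. It makes some labeled assumptions: nodes use a hypothetical format array('op' => ..., 'args' => array(...)) instead of operator keys (so two '+' children cannot collide), functions return new values instead of using pass-by-reference, and simplify_node implements only the rule "if all children are constant, evaluate the node". A full simplify_node would also do the swapping and rotation described next.

```php
<?php
// Assumed node format: array('op' => '+', 'args' => array(...)).
// Leaves are ints/floats (constants) or strings (variables such as 'x').
function is_const_leaf($e): bool
{
    return is_int($e) || is_float($e);
}

// Post-order traversal: children are simplified before their parent.
function simplify_expr($expr)
{
    if (!is_array($expr)) {
        return $expr; // simple expression: nothing to simplify
    }
    foreach ($expr['args'] as $i => $child) {
        $expr['args'][$i] = simplify_expr($child);
    }
    return simplify_node($expr);
}

// Only the "all children constant" rule: evaluate the node to a single constant.
function simplify_node(array $expr)
{
    if (count($expr['args']) < 2) {
        return $expr; // unary nodes are left alone in this sketch
    }
    foreach ($expr['args'] as $arg) {
        if (!is_const_leaf($arg)) {
            return $expr; // a non-constant child: keep the node
        }
    }
    $args = $expr['args'];
    $acc = array_shift($args);
    foreach ($args as $arg) {
        switch ($expr['op']) {
            case '+': $acc += $arg; break;
            case '-': $acc -= $arg; break;
            case '*': $acc *= $arg; break;
            default: return $expr; // unknown operator: leave unevaluated
        }
    }
    return $acc;
}

// (x+(3+(5-3))) from the example equation
$e = array('op' => '+', 'args' => array(
    'x',
    array('op' => '+', 'args' => array(3, array('op' => '-', 'args' => array(5, 3)))),
));
print_r(simplify_expr($e)); // array('op' => '+', 'args' => array('x', 5))
```

The inner nodes fold to 2 and then 5, while the outer node keeps the variable x, which is exactly where the swapping and rotation rules would take over.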
In a way, the real challenge is writing simplify_node. It could perform a number of operations on expression nodes:
If an inner grand-child doesn't match the constness of the other child but its sibling does, swap the siblings. In other words, make the odd-man-out an outer node. This step is in preparation for the next.
    +            +            +            +
   / \          / \          / \          / \
  +   2  --->  +   2        +   y  --->  +   y
 / \          / \          / \          / \
1   x        x   1        x   1        1   x
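A sketch of this swap step, assuming a hypothetical node format array('op' => ..., 'args' => array(left, right)) with ints/floats as constants and strings as variables. The function name swap_odd_one_out is made up, and only the shape drawn above (compound child on the left, simple child on the right) is handled. Swapping the grand-children changes operand order, so the sketch only does it when the compound child's operation is commutative (+ or * here).

```php
<?php
// Assumed node format: array('op' => '+', 'args' => array(left, right)).
function is_constant_leaf($e): bool
{
    return is_int($e) || is_float($e);
}

// Handles only a compound child on the left and a simple child on the right.
function swap_odd_one_out(array $node): array
{
    list($compound, $simple) = $node['args'];
    if (!is_array($compound) || is_array($simple) || !in_array($compound['op'], array('+', '*'), true)) {
        return $node; // rule does not apply
    }
    list($outer, $inner) = $compound['args']; // left grand-child is outer, right one is inner
    $simpleConst = is_constant_leaf($simple);
    if (is_constant_leaf($inner) !== $simpleConst && is_constant_leaf($outer) === $simpleConst) {
        $node['args'][0]['args'] = array($inner, $outer); // odd one out becomes the outer node
    }
    return $node;
}

$left  = array('op' => '+', 'args' => array(array('op' => '+', 'args' => array(1, 'x')), 2));   // (1+x)+2
$right = array('op' => '+', 'args' => array(array('op' => '+', 'args' => array('x', 1)), 'y')); // (x+1)+y
print_r(swap_odd_one_out($left));  // (x+1)+2
print_r(swap_odd_one_out($right)); // (1+x)+y
```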
If a node and a child are the same commutative operation, the nodes could be rearranged. For example, there's rotation:
    +            +
   / \          / \
  +   c  --->  a   +
 / \              / \
a   b            b   c
This corresponds to changing "(a+b)+c" to "a+(b+c)". You'll want to rotate when "a" doesn't match the constness of "b" and "c". It allows the next transformation to be applied to the tree. For example, this step would convert "(x+3)+1" to "x+(3+1)", so the next step could then convert it to "x+4".
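Rotation can be sketched the same way, again under the assumed array('op' => ..., 'args' => array(left, right)) node format. Strictly speaking, regrouping (a+b)+c as a+(b+c) relies on the operation being associative, which + and * are; the function name rotate is hypothetical.

```php
<?php
// Assumed node format: array('op' => '+', 'args' => array(left, right)).
// Rewrites (a op b) op c as a op (b op c) for associative operations.
function rotate(array $node): array
{
    $left = $node['args'][0];
    if (!is_array($left) || $left['op'] !== $node['op'] || !in_array($node['op'], array('+', '*'), true)) {
        return $node; // rotation does not apply
    }
    list($a, $b) = $left['args'];
    $c = $node['args'][1];
    return array('op' => $node['op'], 'args' => array($a, array('op' => $node['op'], 'args' => array($b, $c))));
}

$e = array('op' => '+', 'args' => array(array('op' => '+', 'args' => array('x', 3)), 1)); // (x+3)+1
$rotated = rotate($e); // x+(3+1)
print_r($rotated);
```

After the rotation, 3 and 1 are siblings, so folding constant nodes turns the right child into 4 and the whole expression into x+4.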
The overall goal is to make a tree with const children as siblings. If a commutative node has two const descendants, they can be rotated next to each other. If a node has only one const descendant, make it a child so that a node further up in the hierarchy can potentially combine the const node with another of the ancestor's const children (i.e. const nodes float up until they're siblings, at which point they combine like bubbles in soda).
If all children are constant, evaluate the node and replace it with the result.
Handling nodes with more than one compound child, and n-ary nodes, is left as an exercise for the reader.
Object-Oriented Alternative
An OO approach (using objects rather than arrays to build expression trees) would have a number of advantages. For one, operations would be more closely associated with nodes: they'd be a property of a node object rather than the node's key. It would also be easier to associate ancillary data with expression nodes, which would be useful for optimizations. You probably wouldn't need to get too deep into the OOP paradigm to implement this. The following simple type hierarchy could be made to work:
             Expression
            /          \
      SimpleExpr    CompoundExpr
       /       \
ConstantExpr  VariableExpr
Existing free functions that manipulate trees would become methods. The interfaces could look something like the following pseudocode. In it:
Child < Parent means "Child" is a subclass of "Parent".
Properties (such as isConstant) can be methods or fields; in PHP, you can implement this using overloading.
(...){...} indicates a function, with the parameters between the parentheses and the body between the braces (much like function (...){...} in JavaScript). This syntax is used for properties that are methods. Plain methods simply use braces for the method body.
Now for the sample:
Expression {
isConstant:Boolean
simplify():Expression
}
SimpleExpr < Expression {
value:Varies
/* simplify() returns an expression so that an expression of one type can
be replaced with an expression of another type. An alternative is
to use the envelope/letter pattern:
http://users.rcn.com/jcoplien/Patterns/C++Idioms/EuroPLoP98.html#EnvelopeLetter
http://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Envelope_Letter
*/
simplify():Expression { return this }
}
ConstantExpr < SimpleExpr {
isConstant:Boolean = true
}
VariableExpr < SimpleExpr {
isConstant:Boolean = false
}
CompoundExpr < Expression {
operation:Token
children:Expression[]
commutesWith(op:Expression):Boolean
isCommutative:Boolean
isConstant:Boolean = (){
for each child in this.children:
if not child.isConstant, return false
return true
}
simplify():Expression {
for each child& in this.children {
child = child.simplify()
}
return this.simplify_node()
}
simplify_node(): Expression {
if this.isConstant {
evaluate this, returning new ConstantExpr
} else {
if one child is simple {
if this.commutesWith(compound child)
and one grand-child doesn't match the constness of the simple child
and the other grand-child matches the constness of the simple child
{
if (compound child.isCommutative):
make odd-man-out among grand-children the outer child
rotate so that grand-children are both const or not
if grand-children are const:
set compound child to compound child.simplify_node()
}
} else {
...
}
}
return this
}
}
The PHP implementation for SimpleExpr and ConstantExpr, for example, could be:
class SimpleExpr extends Expression {
public $value;
function __construct($value) {
$this->value = $value;
}
function simplify() {
return $this;
}
}
class ConstantExpr extends SimpleExpr {
// Overloading
function __get($name) {
switch ($name) {
case 'isConstant':
return True;
}
}
}
An alternate implementation of ConstantExpr:
class Expression {
protected $_properties = array();
// Overloading
function __get($name) {
if (isset($this->_properties[$name])) {
return $this->_properties[$name];
} else {
// handle undefined property
...
}
}
...
}
class ConstantExpr extends SimpleExpr {
function __construct($value) {
parent::__construct($value);
$this->_properties['isConstant'] = True;
}
}
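Putting the hierarchy together, a minimal runnable sketch might look like the following. It deviates from the pseudocode in labeled ways: isConstant is a plain method rather than an overloaded property, SimpleExpr is abstract, and simplifyNode only folds nodes whose children are all ConstantExpr instances; the swapping and rotation rules are left out.

```php
<?php
// Sketch: isConstant is an ordinary method here, not an overloaded property.
abstract class Expression
{
    abstract public function isConstant(): bool;
    abstract public function simplify(): Expression;
}

abstract class SimpleExpr extends Expression
{
    public $value;
    public function __construct($value) { $this->value = $value; }
    public function simplify(): Expression { return $this; } // leaves stay as they are
}

class ConstantExpr extends SimpleExpr
{
    public function isConstant(): bool { return true; }
}

class VariableExpr extends SimpleExpr
{
    public function isConstant(): bool { return false; }
}

class CompoundExpr extends Expression
{
    public $operation;
    public $children;

    public function __construct(string $operation, array $children)
    {
        $this->operation = $operation;
        $this->children = $children;
    }

    public function isConstant(): bool
    {
        foreach ($this->children as $child) {
            if (!$child->isConstant()) {
                return false;
            }
        }
        return true;
    }

    public function simplify(): Expression
    {
        foreach ($this->children as $i => $child) {
            $this->children[$i] = $child->simplify(); // post-order: children first
        }
        return $this->simplifyNode();
    }

    // Only constant folding: evaluate when every child is a literal constant.
    private function simplifyNode(): Expression
    {
        if (count($this->children) < 2) {
            return $this; // unary nodes are not handled in this sketch
        }
        foreach ($this->children as $child) {
            if (!($child instanceof ConstantExpr)) {
                return $this;
            }
        }
        $values = array_map(function ($c) { return $c->value; }, $this->children);
        $acc = array_shift($values);
        foreach ($values as $v) {
            switch ($this->operation) {
                case '+': $acc += $v; break;
                case '-': $acc -= $v; break;
                case '*': $acc *= $v; break;
                default: return $this; // unknown operation: leave as is
            }
        }
        return new ConstantExpr($acc);
    }
}

// 3+(5-3) simplifies to a single ConstantExpr.
$e = new CompoundExpr('+', array(
    new ConstantExpr(3),
    new CompoundExpr('-', array(new ConstantExpr(5), new ConstantExpr(3))),
));
$s = $e->simplify();
echo get_class($s), ' ', $s->value, "\n"; // prints "ConstantExpr 5"
```

Returning an Expression from simplify() is what lets a CompoundExpr replace itself with a ConstantExpr, as the comment in the pseudocode describes.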
Recursive foreach function, from comments at: http://php.net/manual/en/control-structures.foreach.php
/* Grab any values from a multidimensional array using infinite recursion. --Kris */
function RecurseArray($inarray, $result) {
foreach ($inarray as $inkey => $inval) {
if (is_array($inval)) {
$result = RecurseArray($inval, $result);
} else {
$result[] = $inval;
}
}
return $result;
}
Note that the above implementation produces a flattened array. To preserve nesting:
function RecurseArray($inarray) {
$result = array();
foreach ( $inarray as $inkey => $inval ) {
if ( is_array($inval) ) {
$result[] = RecurseArray($inval);
} else {
// process $inval, store in result array
$result[] = $inval;
}
}
return $result;
}
To modify an array in-place:
function RecurseArray(&$inarray) {
foreach ( $inarray as $inkey => &$inval ) {
if ( is_array($inval) ) {
RecurseArray($inval);
} else {
// process $inval
// ... (changes to $inval are written back through the reference)
}
}
}
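Applied to the original question, the same recursion removes the need for one hand-written foreach per level. The sketch below collects the dotted key paths (like "1.3.2") of arrays that contain no arrays. innermost_paths and get_by_path are hypothetical helpers, and the sample input only mimics the shape of the question's array, whose beginning is not shown.

```php
<?php
// Hypothetical helper (not a built-in): dotted key paths of every nested array
// that contains no arrays itself, however deep the nesting goes.
function innermost_paths(array $arr, string $prefix = ''): array
{
    $paths = array();
    foreach ($arr as $key => $value) {
        if (!is_array($value)) {
            continue; // scalars cannot be innermost arrays
        }
        $path = ($prefix === '') ? (string) $key : $prefix . '.' . $key;

        // Does this sub-array contain any arrays of its own?
        $hasSubArray = false;
        foreach ($value as $child) {
            if (is_array($child)) {
                $hasSubArray = true;
                break;
            }
        }

        if ($hasSubArray) {
            // Recurse instead of writing one foreach per level.
            $paths = array_merge($paths, innermost_paths($value, $path));
        } else {
            $paths[] = $path;
        }
    }
    return $paths;
}

// Hypothetical helper: follows a dotted path back to the array it names.
function get_by_path(array $arr, string $path)
{
    foreach (explode('.', $path) as $key) {
        $arr = $arr[$key];
    }
    return $arr;
}

// Assumed input with the same shape as the end of the question's array.
$example = array(
    '5x',
    array(
        '(', 'x', '+',
        array('(', '3', array('+', '(', '+5', '-3', ')'), '-3', ')'),
        ')',
    ),
);

foreach (innermost_paths($example) as $path) {
    echo $path, ': ', implode(' ', get_by_path($example, $path)), "\n"; // prints "1.3.2: + ( +5 -3 )"
}
```

Because the path uses "." as a separator, it becomes ambiguous if a key itself contains a dot; collecting arrays of keys instead of strings would avoid that.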
RecursiveIteratorIterator knows the current depth of every element it visits. As you're interested only in elements that have children (the arrays), filter out those without children and look for the maximum depth.
Then filter again by depth, keeping only the elements at that maximum depth:
$ritit = new RecursiveIteratorIterator(new RecursiveArrayIterator($arr), RecursiveIteratorIterator::SELF_FIRST);
$cf = new ChildrenFilter($ritit);
$maxdepth = -1; // depths start at 0, so -1 means "no array children found yet"
foreach($cf as $v)
{
    $maxdepth = max($maxdepth, $cf->getDepth());
}
if ($maxdepth < 0)
{
    throw new Exception('No children found.');
}
$df = new DepthFilter($cf, $maxdepth);
foreach($df as $v)
{
    echo "Array with highest depth:\n";
    var_dump($v);
}
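The snippet above relies on ChildrenFilter and DepthFilter, whose source was only linked. One plausible way to write them with FilterIterator is sketched below; these definitions are assumptions and may differ from the original demo. ChildrenFilter exposes getDepth() explicitly instead of relying on the call being passed through to the inner iterator.

```php
<?php
// Assumed definitions, not the original demo's code.
// ChildrenFilter keeps only elements that are arrays (elements that have children).
class ChildrenFilter extends FilterIterator
{
    public function accept(): bool
    {
        return is_array($this->getInnerIterator()->current());
    }

    // Depth of the current element in the underlying RecursiveIteratorIterator.
    public function getDepth(): int
    {
        return $this->getInnerIterator()->getDepth();
    }
}

// DepthFilter keeps only the elements found at one particular depth.
class DepthFilter extends FilterIterator
{
    private $depth;

    public function __construct(ChildrenFilter $iterator, int $depth)
    {
        parent::__construct($iterator);
        $this->depth = $depth;
    }

    public function accept(): bool
    {
        return $this->getInnerIterator()->getDepth() === $this->depth;
    }
}

// Assumed input shaped like the question's array.
$arr = array('5x', array('(', 'x', '+', array('(', '3', array('+', '(', '+5', '-3', ')'), '-3', ')'), ')'));

$ritit = new RecursiveIteratorIterator(new RecursiveArrayIterator($arr), RecursiveIteratorIterator::SELF_FIRST);
$cf = new ChildrenFilter($ritit);
$maxdepth = -1;
foreach ($cf as $v) {
    $maxdepth = max($maxdepth, $cf->getDepth());
}

$deepest = array();
foreach (new DepthFilter($cf, $maxdepth) as $v) {
    $deepest[] = $v;
}
print_r($deepest); // one entry: the array at depth 2
```

Note that this approach reports only the arrays at the greatest depth; an innermost array at a shallower level elsewhere in the input would not be returned.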
Please try the following code and let me know the results.
You just need to pass your array to the find_deepest function.
function find_deepest( $array )
{
$index = ''; // this variable stores the current position (1.2, 1.3.2, etc.)
$include = true; // this variable indicates if the current position should be added in the result or not
$result = array(); // this is the result of the function, containing the deepest indexes
$array_stack = array(); // this is a STACK (or LIFO) to temporarily store the sub-arrays - see http://en.wikipedia.org/wiki/LIFO_%28computing%29
reset( $array ); // here we make the array internal POINTER move to the first position
// each loop iteration moves the $array internal pointer one step forward - see http://php.net/each
// if each() returns false, we reached the end of $array; in that case we still continue the loop if the stack is not empty, i.e. there is a parent level to return to
while ( ( $current = each( $array ) ) || ( count( $array_stack ) > 0 ) )
{
// we are looping over $array elements... if we find an array (a sub-array), then we will "enter" it...
if ( is_array( $current['value'] ) )
{
// update the index string
$index .= ( $index === '' ? '' : '.' ) . $current['key'];
// we are entering a sub-array; by default, we will include it
$include = true;
// we will store our current $array in the stack, so we can move BACK to it later
array_push( $array_stack, $array );
// ATTENTION! Here we CHANGE the $array we are looping; here we "enter" the sub-array!
// with the command below, we start to LOOP the sub-array (whichever level it is)
$array = $current['value'];
}
// this condition means we reached the END of a sub-array (because in this case each() has returned false)
// we will "move out" of it; we will return to the previous level
elseif ( empty( $current ) )
{
// if we reached this point and $include is still true, it means that the current array has NO sub-arrays inside it (otherwise, $include would be false, due to the following lines)
if ( $include )
$result[] = $index;
// ATTENTION! With the command below, we RESTORE $array to its previous level... we entered a sub-array before, now we are going OUT of the sub-array and returning to the previous level, where the interrupted loop will continue
$array = array_pop( $array_stack );
// doing the same thing with the $index string (returning one level UP)
$index = substr( $index, 0, strrpos( $index, '.' ) );
// we are RETURNING one level up, so we do NOT want the parent array to be included in the results (it contains a sub-array)
$include = false;
}
// try uncommenting the following two lines! you will see the array contents, because we always enter this "else" if we are neither entering a sub-array nor moving out of it
// else
// echo $index . ' => ' . $current['value'] . '<br>';
}
return $result;
}
$result = find_deepest( $my_array );
print_r( $result );
The most important parts of the code are:
the each() call inside the while loop (each() was deprecated in PHP 7.2 and removed in PHP 8.0, so this code only runs on PHP 7.x and earlier)
the array_push call, where we store the current array in the "array stack" in order to return to it later
the array_pop function call, where we return one level back by restoring the current array from the "array stack"
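Since each() no longer exists in PHP 8, here is a sketch of the same explicit-stack idea that walks array_keys() with a position counter instead of the internal pointer. find_deepest_stack is a hypothetical name, and the input only imitates the question's array.

```php
<?php
// Sketch: explicit stack (LIFO) of frames, no each() and no recursion.
// Each frame holds an array, its keys, the position reached, its dotted path,
// and whether a sub-array was found in it.
function find_deepest_stack(array $array): array
{
    $result = array();
    $stack = array(
        array('arr' => $array, 'keys' => array_keys($array), 'pos' => 0, 'path' => '', 'hasSub' => false),
    );

    while ($stack) {
        $top = count($stack) - 1;
        if ($stack[$top]['pos'] < count($stack[$top]['keys'])) {
            // Take the next element of the array on top of the stack.
            $key = $stack[$top]['keys'][$stack[$top]['pos']];
            $stack[$top]['pos']++;
            $value = $stack[$top]['arr'][$key];

            if (is_array($value)) {
                // "Enter" the sub-array; its parent can no longer be innermost.
                $stack[$top]['hasSub'] = true;
                $parentPath = $stack[$top]['path'];
                $path = ($parentPath === '') ? (string) $key : $parentPath . '.' . $key;
                $stack[] = array('arr' => $value, 'keys' => array_keys($value), 'pos' => 0, 'path' => $path, 'hasSub' => false);
            }
        } else {
            // End of this array: "move out" one level, like array_pop in find_deepest.
            $frame = array_pop($stack);
            // Record it if it had no sub-arrays and is not the top-level array itself.
            if (!$frame['hasSub'] && $stack) {
                $result[] = $frame['path'];
            }
        }
    }
    return $result;
}

$my_array = array('5x', array('(', 'x', '+', array('(', '3', array('+', '(', '+5', '-3', ')'), '-3', ')'), ')'));
print_r(find_deepest_stack($my_array)); // one entry: "1.3.2"
```

Keeping the position in each frame, rather than in the array's internal pointer, means nothing depends on how PHP copies arrays when they are pushed onto the stack.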
