Converting Base64 encoded tab delimited file in PHP

I am writing a reporting app that needs to consume logs which have been stored in the DB as base 64 encoded strings. I am able to decode them no problem, however, I am having some trouble getting them to be fed into str_getcsv() properly.
Below is the data I am working with, the code and the outputs. It seems to me that once decoded, the files are not recognizable as tab-delimited. However, if I decode the string with this URL and save the result as a text file, I can open it properly in Excel.
https://www.base64decode.org/
In PHP however, it seems to be an issue with recognizing some of the tabs and the line breaks seem to completely go away. I think it has to do with the encoding, the DB table and column are both UTF-8. They are being recognized as ASCII - which is a subset of UTF-8, but I am not sure if they need to be explicitly UTF-8 for it to work (the site that works uses UTF-8).
The code: very simple (though at this point I may be going overboard with the encoding)
// get the stored result (laravel eloquent)
$media_result = MediaResult::where("video_id", "=", $media_benchmark->id)->firstOrFail();
# decode the access_log stored as b64 string
$tab_file = base64_decode(mb_convert_encoding($media_result->access_log, "UTF-8"));
$encoding = mb_detect_encoding($tab_file); // I was using iconv() so I grabbed this - it is always ASCII
$new_file = mb_convert_encoding($tab_file,'UTF-8');
$encoding_new = mb_detect_encoding($new_file);
#if I were to echo both encoding variables, it would be ASCII - no matter what I do.
# convert the supposed tab-delimited file into an array
$full_stats = str_getcsv($new_file, 0, "\t");
Here is a sample base64 encoded log:
VVJJCXNlcnZlckFkZHJlc3MJbnVtYmVyT2ZTZXJ2ZXJBZGRyZXNzQ2hhbmdlcwltZWRpYVJlcXVlc3RzV1dBTgl0cmFuc2ZlckR1cmF0aW9uCW51bWJlck9mQnl0ZXNUcmFuc2ZlcnJlZAludW1iZXJPZk1lZGlhUmVxdWVzdHMJcGxheWJhY2tTdGFydERhdGUJcGxheWJhY2tTZXNzaW9uSUQJcGxheWJhY2tTdGFydE9mZnNldAlwbGF5YmFja1R5cGUJc3RhcnR1cFRpbWUJZHVyYXRpb25XYXRjaGVkCW51bWJlck9mRHJvcHBlZFZpZGVvRnJhbWVzCW51bWJlck9mU3RhbGxzCW51bWJlck9mU2VnbWVudHNEb3dubG9hZGVkCXNlZ21lbnRzRG93bmxvYWRlZER1cmF0aW9uCWRvd25sb2FkT3ZlcmR1ZQlvYnNlcnZlZEJpdHJhdGVTdGFuZGFyZERldmlhdGlvbglvYnNlcnZlZE1heEJpdHJhdGUJb2JzZXJ2ZWRNaW5CaXRyYXRlCXN3aXRjaEJpdHJhdGUJaW5kaWNhdGVkQml0cmF0ZQlvYnNlcnZlZEJpdHJhdGUKaHR0cDovL3Zldm9wbGF5bGlzdC1saXZlLmhscy5hZGFwdGl2ZS5sZXZlbDMubmV0L3Zldm8vY2gxLzAxL3Byb2dfaW5kZXgubTN1OAk4LjI1NC4yMy4yNTQJMAkwCTAuNjc4MjgwNzA5CTEwOTk2MTIJMwkyMDE2LTA1LTEwIDE5OjIxOjE4ICswMDAwCTdBMTI5MERDLTE2MzAtNDlGQy1BQTY0LUNDNzZDMTgxQzcyQQk0MglMSVZFCTAuMjUzMjk3OTg0NjAwMDY3MQkxNi4wODMyNjU5NjAyMTY1MgkwCTAJMwkxOAkwCS0xCTI1NTcyOTAxLjM4MzMwNzg3CTE4MjA3OTg3LjMyODUyNTkJMTAxMTU1NDguNzgzODE4MjUJNDkyMDAwCTIxMDI1OTU1LjA1Mzg4OTI0Cmh0dHA6Ly92ZXZvcGxheWxpc3QtbGl2ZS5obHMuYWRhcHRpdmUubGV2ZWwzLm5ldC92ZXZvL2NoMS8wNi9wcm9nX2luZGV4Lm0zdTgJOC4yNTMuMzIuMTI2CTgJMAkzNS43NDAxNjM2MjIJMTIzNDgxOTcyCTQzCTIwMTYtMDUtMTAgMTk6MjE6MzQgKzAwMDAJN0ExMjkwREMtMTYzMC00OUZDLUFBNjQtQ0M3NkMxODFDNzJBCTU4LjAyODk5NDM1OAlMSVZFCTAJMjQxLjkyNjk3NTk2NTQ5OTkJMAkwCTQzCTI1OAkwCS0xCTQ2ODg1OTAzLjAzNTk4OTkzCTEwODA3NDU3LjM4MjQwNjY3CS0xCTQwMDAwMDAJMzE3ODIzNjAuNjE0NTI4NjM=
Here is the same string decoded:
URI serverAddress numberOfServerAddressChanges mediaRequestsWWAN transferDuration numberOfBytesTransferred numberOfMediaRequests playbackStartDate playbackSessionID playbackStartOffset playbackType startupTime durationWatched numberOfDroppedVideoFrames numberOfStalls numberOfSegmentsDownloaded segmentsDownloadedDuration downloadOverdue observedBitrateStandardDeviation observedMaxBitrate observedMinBitrate switchBitrate indicatedBitrate observedBitrate http://vevoplaylist-live.hls.adaptive.level3.net/vevo/ch1/01/prog_index.m3u8 8.254.23.254 0 0 0.678280709 1099612 3 2016-05-10 19:21:18 +0000 7A1290DC-1630-49FC-AA64-CC76C181C72A 42 LIVE 0.2532979846000671 16.08326596021652 0 0 3 18 0 -1 25572901.38330787 18207987.3285259 10115548.78381825 492000 21025955.05388924 http://vevoplaylist-live.hls.adaptive.level3.net/vevo/ch1/06/prog_index.m3u8 8.253.32.126 8 0 35.740163622 123481972 43 2016-05-10 19:21:34 +0000 7A1290DC-1630-49FC-AA64-CC76C181C72A 58.028994358 LIVE 0 241.9269759654999 0 0 43 258 0 -1 46885903.03598993 10807457.38240667 -1 4000000 31782360.61452863
Finally, here is the resulting array:
Array ( [0] => URI serverAddress numberOfServerAddressChanges mediaRequestsWWAN transferDuration numberOfBytesTransferred numberOfMediaRequests playbackStartDate playbackSessionID playbackStartOffset playbackType startupTime durationWatched numberOfDroppedVideoFrames numberOfStalls numberOfSegmentsDownloaded segmentsDownloadedDuration downloadOverdue observedBitrateStandardDeviation observedMaxBitrate observedMinBitrate switchBitrate indicatedBitrate observedBitrate http://vevoplaylist-live.hls.adaptive.level3.net/vevo/ch1/ [1] => 1/prog_index.m3u8 8.254.23.254 [2] => 0 [3] => .67828 [4] => 7 [5] => 9 1 [6] => 99612 3 2 [7] => 16- [8] => 5-1 [9] => 19:21:18 + [10] => [11] => [12] => [13] => 7A1290DC-1630-49FC-AA64-CC76C181C72A42 LIVE [14] => .2532979846 [15] => [16] => [17] => 671 16. [18] => 8326596 [19] => 21652 [20] => 03 18 [21] => -1255729 [22] => 1.3833 [23] => 787 182 [24] => 7987.3285259 1 [25] => 115548.78381825 492 [26] => [27] => [28] => 21025955.05388924 http://vevoplaylist-live.hls.adaptive.level3.net/vevo/ch1/06/prog_index.m3u88.253.32.126 8 [29] => 35.740163622123481972 43 2 [30] => 16- [31] => 5-1 [32] => 19:21:34 + [33] => [34] => [35] => [36] => 7A1290DC-1630-49FC-AA64-CC76C181C72A58. [37] => 28994358 LIVE [38] => 241.9269759654999 [39] => 043 258 [40] => -1468859 [41] => 3. [42] => 3598993 1 [43] => 8 [44] => 7457.3824 [45] => 667 -1 4 [46] => [47] => [48] => [49] => [50] => [51] => 31782360.61452863 )

Keep in mind that str_getcsv() parses only one line of a CSV file, and that it expects the delimiter "\t" to be the second parameter, not the third.
You probably want something like:
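// $decoded holds the base64-decoded log string
$full_stats = [];
foreach (explode("\n", $decoded) as $line) {
    // parse one line at a time; the delimiter is the second argument
    $full_stats[] = str_getcsv($line, "\t");
}
var_dump($full_stats);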
This will output an array containing 3 arrays (aka rows) containing 24 items (aka columns) each.
See http://sandbox.onlinephpfunctions.com/code/1ccf5115df6f8c342ff7c7e451f3ea26e081197e for working example and generated output.
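Putting both points together, the whole pipeline can be sketched roughly as follows. This is only an illustration: the function name parse_tab_log and the two-column sample payload are hypothetical, and the real input would be the access_log value from the question. The enclosure and escape arguments of str_getcsv() are passed explicitly.

```php
<?php
// Hypothetical helper: decode a base64-encoded, tab-delimited log and
// return one associative array per data row, keyed by the header names.
function parse_tab_log(string $base64): array
{
    // Strict mode returns false instead of silently skipping invalid characters.
    $decoded = base64_decode($base64, true);
    if ($decoded === false) {
        throw new InvalidArgumentException('Input is not valid base64');
    }

    // Split on any newline sequence; str_getcsv() handles one line at a time.
    $lines = preg_split('/\R/', rtrim($decoded, "\r\n"));
    $header = str_getcsv(array_shift($lines), "\t", '"', '\\');

    $rows = [];
    foreach ($lines as $line) {
        $fields = str_getcsv($line, "\t", '"', '\\');
        // Skip rows whose column count does not match the header.
        if (count($fields) === count($header)) {
            $rows[] = array_combine($header, $fields);
        }
    }
    return $rows;
}

// Made-up sample in the same shape as the log above (assumption, not real data).
$sample = base64_encode("URI\tserverAddress\nhttp://example.test/a.m3u8\t8.254.23.254\n");
print_r(parse_tab_log($sample));
```

Keying each row by the header makes later lookups such as $row['observedBitrate'] independent of column order.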
Regarding the import of data that contains line breaks, you should switch to fgetcsv(), which handles line breaks inside quoted fields correctly:
$csv = <<< eot
"first","my data
with line breaks"
"second", "simple data"
eot;
// We need to "convert" the string to a file handle
$fp = fopen('data://text/plain,' . $csv, 'r');
// fgetcsv() returns false at end of input
while (($data = fgetcsv($fp)) !== false) {
    var_dump($data);
}
fclose($fp);

Related

PHP How to get rid of new line character in multiline string declaration, EOD method doesn't work

In PHP, I have 2 strings coming from an external source, where each contain 32x32 1s or 0s.
Because it's a requirement, I need it to be formatted as 32x32, because each one of those bits represent a pixel in a frame.
The problem I'm facing is that I need to do bitwise logic on every single one of those 1s or 0s in parallel, because one affects the other.
In PHP, as far as I know, my only option is to make them into an array where each bit is one index, by adding commas after every bit and then exploding the string into an array. (But of course feel free to prove me wrong :) )
However, when I try to explode the string, the newline characters get included on every 32nd (last bit of the row) bit in the array that was created by the explosion.
I have tried the $image = <<<EOD ... EOD; method of getting around it, but it's no bueno.
My question would be, is there a neat way to get access to each bit of data bit by bit at the same time on both images, so that I can do some bitwise math?
My current EOD method: ($image1 and $image2 are the same format, let me only paste in image 1 declaration).
$image1 = <<<EOD
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
EOD;
Then the explosion:
$image_array = explode(',', $image1);
Creates this output (section taken out, so I don't need to list all 1024 bits here, but the point is, every 32nd bit has a newline character at the end).
[33] => 0
[34] => 0
[35] => 0
[36] => 0
[37] => 0
[38] => 0
[39] => 0
[40] => 0
[41] => 0
[42] => 0
[43] => 0
[44] => 0
[45] => 0
[46] => 0
[47] => 0
[48] => 0
[49] => 0
[50] => 0
[51] => 0
[52] => 0
[53] => 0
[54] => 0
[55] => 0
[56] => 0
[57] => 0
[58] => 0
[59] => 0
[60] => 0
[61] => 0
[62] => 0
[63] => 0
[64] =>
0
Also, as a bonus question: Is there an, in my use case, even better solution, where I don't need to insert commas after every single bit in the declaration? That would be amazing if there was.
Thank you in advance guys, looking forward to your answers. :)
I'd say that the input format actually is more than questionable. If it is meant to describe a 32*32 matrix, then those trailing commas do not make any sense.
But if you cannot fix that issue, then just implement a primitive parser which does what you need:
<?php
$input = <<<EOD
0,0,0,
0,1,1,
1,0,0
EOD;
$output = [];
foreach (explode("\n", $input) as $y => $line) {
    $output[$y] = [];
    foreach (explode(",", $line) as $x => $cell) {
        if ($cell != "") {
            $output[$y][$x] = $cell;
        }
    }
}
print_r($output);
The output obviously is:
Array
(
[0] => Array
(
[0] => 0
[1] => 0
[2] => 0
)
[1] => Array
(
[0] => 0
[1] => 1
[2] => 1
)
[2] => Array
(
[0] => 1
[1] => 0
[2] => 0
)
)
Certainly other approaches exist, but none is better than the other, except if you can name a reason...
Can you just map the array to be only numeric?
function cleanup($a) {
    // strip everything that is not a digit, including newlines
    return preg_replace('/\D+/', '', $a);
}
$image_array = array_map('cleanup', explode(",", $image1));
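As for the bonus question, one possible approach that needs no commas at all is to strip every character that is not a 0 or 1 and then split the remaining string into single characters. The sketch below is only an illustration; the helper name bits_from_text and the tiny 3-wide sample frames are made up, and XOR stands in for whatever bitwise operation is actually needed.

```php
<?php
// Hypothetical helper: turn a block of 0/1 text into a flat list of ints,
// ignoring commas, spaces and newlines.
function bits_from_text(string $text): array
{
    $digits = preg_replace('/[^01]/', '', $text);
    if ($digits === '') {
        return [];
    }
    return array_map('intval', str_split($digits));
}

// Two made-up frames: one with commas, one without.
$image1 = bits_from_text("0,1,1,\n1,0,0");
$image2 = bits_from_text("011\n110");

// Walk both frames in step and combine each pair of bits (XOR as an example).
$combined = array_map(function ($a, $b) {
    return $a ^ $b;
}, $image1, $image2);

// Restore the row structure; use 32 instead of 3 for the real frames.
$rows = array_chunk($combined, 3);
print_r($rows);
```

Because array_map() receives both arrays, each callback invocation sees the bit at the same position in both images.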

php base64_decode part corrupt using xampp on windows 7

I have a problem with some third party php scripts not decoding properly using xampp on windows 7. The process is that the script sends an encrypted payload to an external server where it is decrypted and returned. We have no access or control of the external server. The relevant code is:
$response = $provider->request('package/decode/' . $endpoint, 'POST', $params);
$data = $response->toXml();
if (isset($data->object)) {
    $object = (string)$data->object;
    $object = base64_decode($object);
    $object = unserialize($object);
}
if (isset($data->related_objects)) {
    $relatedObjects = (string)$data->related_objects;
    $relatedObjects = base64_decode($relatedObjects);
    $relatedObjects = unserialize($relatedObjects);
}
The $object decodes with no problem. The $relatedObjects does not and looks corrupt part way through:
YToxOntzOjEyOiJQbHVnaW5FdmVudHMiO2E6MTA6e3M6MzI6ImQ0NzM2NzJjM2YyNzk2NjVhYTNmNWJlNjBjMzU2MjYyIjthOjg6e3M6MTM6InByZXNlcnZlX2tleXMiO2I6MTtzOjEzOiJ1cGRhdGVfb2JqZWN0IjtiOjA7czoxMDoidW5pcXVlX2tleSI7YToyOntpOjA7czo4OiJwbHVnaW5pZCI7aToxO3M6NToiZXZlbnQiO31zOjU6ImNsYXNzIjtzOjE0OiJtb2RQbHVnaW5FdmVudCI7czo2OiJvYmplY3QiO3M6NzU6InsicGx1Z2luaWQiOjAsImV2ZW50IjoiT25SaWNoVGV4dEJyb3dzZXJJbml0IiwicHJpb3JpdHkiOjAsInByb3BlcnR5c2V0IjowfSI7czo0OiJndWlkIjtzOjMyOiIwMDMzMTExNjBkMDkwYzUxY2FjN2E5Mzg4ZDQzMDNiNyI7czoxMDoibmF0aXZlX2tleSI7YToyOntpOjA7aTowO2k6MTtzOjIxOiJPblJpY2hUZXh0QnJvd3NlckluaXQiO31zOjk6InNpZ25hdHVyZSI7czozMjoiZDVhYjlmMTVjOWY3ODNkZTJhNDFiMzZjMTlkMTUwNGQiO31zOjMyOiI3YzczZTg3OGI5MzZiOTRmMGQzZmVhNjJhNzIzMGM0MyI7YTo4OntzOjEzOiJwcmVzZXJ2ZV9rZXlzIjtiOjE7czoxMzoidXBkYXRlX29iamVjdCI7YjowO3M6MTA6InVuaXF1ZV9rZXkiO2E6Mjp7aTowO3M6ODoicGx1Z2luaWQiO2k6MTtzOjU6ImV2ZW50Ijt9czo1OiJjbGFzcyI7czoxNDoibW9kUGx1Z2luRXZlbnQiO3M6Njoib2JqZWN0IjtzOjc5OiJ7InBsdWdpbmlkIjowLCJldmVudCI6Ik9uTWFuYWdlclBhZ2VCZWZvcmVSZW5kZXIiLCJwcmlvcml0eSI6MCwicHJvcGVydHlzZXQiOjB9IjtzOjQ6Imd1aWQiO3M6MzI6IjZkMTdjMDRhZTkyNjQ4ZTUzNTk0MjViODI5NWUzOWQ1IjtzOjEwOiJuYXRpdmVfa2V5IjthOjI6e2k6MDtpOjA7aToxO3M6MjU6Ik9uTWFuYWdlclBhZ2VCZWZvcmVSZW5kZXIiO31zOjk6InNpZ25hdHVyZSI7czozMjoiZDYyYzc1NGM3OTEzYzhlNTE3NWNkMzVhNTJmMWUzMGQiO31zOjMyOiI2ZTUxNGI1MTYwNGNmMzdmODQ2ZTY3N2U5YzVhYzA5MCI7YTo4OntzOjEzOiJwcmVzZXJ2ZV9rZXlzIjtiOjE7czoxMzoidXBkYXRlX29iamVjdCI7YjowO3M6MTA6InVuaXF1ZV9rZXkiO2E6Mjp7aTowO3M6ODoicGx1Z2luaWQiO2k6MTtzOjU6ImV2ZW50Ijt9czo1OiJjbGFzcyI7czoxNDoibW9kUGx1Z2luRXZlbnQiO3M6Njoib2JqZWN0IjtzOjcyOiJ7InBsdWdpbmlkIjowLCJldmVudCI6Ik9uRG9jRm9ybVByZXJlbmRlciIsInByaW9yaXR5IjowLCJwcm9wZXJ0eXNldCI6MH0iO3M6NDoiZ3VpZCI7czozMjoiZTY0MGZjM2Y5NDUyNzQ5MWRkZjc1NzM3ZTEwMDI2NjAiO3M6MTA6Im5hdGl2ZV9rZXkiO2E6Mjp7aTowO2k6MDtpOjE7czoxODoiT25Eb2NGb3JtUHJlcmVuZGVyIjt9czo5OiJzaWduYXR1cmUiO3M6MzI6IjljNjBhY2FiODQ0OGI4ZTEzMDI4MDE1Njg1MzE1YzZiIjt9czozMjoiMDg2ZWUyMjUxODljMmVmN2Y2ODBiZDVhZTY0YjQ5NjQiO2E6ODp7czoxMzoicHJlc2VydmVfa2V5cyI7YjoxO3M6MTM6InVwZGF0ZV9vYmplY3QiO2I6MDtz
OjEwOiJ1bmlxdWVfa2V5IjthOjI6e2k6MDtzOjg6InBsdWdpbmlkIjtpOjE7czo1OiJldmVudCI7fXM6NToiY2xhc3MiO3M6MTQ6Im1vZFBsdWdpbkV2ZW50IjtzOjY6Im9iamVjdCI7czo3ODoieyJwbHVnaW5pZCI6MCwiZXZlbnQiOiJPblJpY2hUZXh0RWRpdG9yUmVnaXN0ZXIiLCJwcmlvcml0eSI6MCwicHJvcGVydHlzZXQiOjB9IjtzOjQ6Imd1aWQiO3M6MzI6ImM0NWQ1OGU5ZmVkNWQ5NDcyZTlmMDExMDU2YjNhMDg3IjtzOjEwOiJuYXRpdmVfa2V5IjthOjI6e2k6MDtpOjA7aToxO3M6MjQ6Ik9uUmljaFRleHRFZGl0b3JSZWdpc3RlciI7fXM6OToic2lnbmF0dXJlIjtzOjMyOiIxMWJlZDhiOGYyZWQxOWEyZTVkZmEwYWY2MjU1NDJmOSI7fXM6MzI6ImIyOGFkZTdjNDQ4ZmRjMmJkMzdlZTBlNGQ0ODEyMTE1IjthOjg6e3M6MTM6InByZXNlcnZlX2tleXMiO2I6MTtzOjEzOiJ1cGRhdGVfb2JqZWN0IjtiOjA7czoxMDoidW5pcXVlX2tleSI7YToyOntpOjA7czo4OiJwbHVnaW5pZCI7aToxO3M6NToiZXZlbnQiO31zOjU6ImNsYXNzIjtzOjE0OiJtb2RQbHVnaW5FdmVudCI7czo2OiJvYmplY3QiO3M6NzM6InsicGx1Z2luaWQiOjAsImV2ZW50IjoiT25UVklucHV0UmVuZGVyTGlzdCIsInByaW9yaXR5IjowLCJwcm9wZXJ0eXNldCI6MH0iO3M6NDoiZ3VpZCI7czozMjoiZjg1MWU0OTdmZDg5ZDBiY2Y4MDRmNGVmZDRmOTVlMmMiO3M6MTA6Im5hdGl2ZV9rZXkiO2E6Mjp7aTowO2k6MDtpOjE7czoxOToiT25UVklucHV0UmVuZGVyTGlzdCI7fXM6OToic2lnbmF0dXJlIjtzOjMyOiJjN2M5YTA4MWUyY2E4ZTI4ZGU2OTViMmUwYzM3MzhhMCI7fXM6MzI6IjIwOTFkOGFlMjQ3ZmY0NzQwMjI3NmU4NGUwNDY3ZDMwIjthOjg6e3M6MTM6InByZXNlcnZlX2tleXMiO2I6MTtzOjEzOiJ1cGRhdGVfb2JqZWN0IjtiOjA7czoxMDoidW5pcXVlX2tleSI7YToyOntpOjA7czo4OiJwbHVnaW5pZCI7aToxO3M6NToiZXZlbnQiO31zOjU6ImNsYXNzIjtzOjE0OiJtb2RQbHVnaW5FdmVudCI7czo2OiJvYmplY3QiO3M6NzQ6InsicGx1Z2luaWQiOjAsImV2ZW50IjoiT25UVk91dHB1dGUG2ApmKwV9XbLEl8hy9QJ6yTZ2NnvVRfTjhvx3hq3ZXLDI21gcc0/UMBXxXTkbVOlOtapxOTH1eK3xIXwKkPy38Y5zLjHFaNyK5cPoQpP5FbP1qjUNUrgrbQGKeEfRGsIAbtzHT2qTxzgY7z/epePkvclKqWa194ZevVGKQN0z0jLLZ2REY/ZSadUQVP/LcIew8P//MvY/8YeFF5mxyutlTdRJ9oMwRZ5pMI2wT/mmsx2aeeVh1WcFP7/OqEzhHT8XIUwV96DYv/aFmetP6avi1ygnWTVZpnuXWmoSorfgwnAUzU76j3iv4f0szvn8BG6IvvXxMOuTCazD6H0lteI4KKSW6VAIj205AxFOuEqUp9iHcb4O2e/vkDW8Rn0ayfVIOagnMcU1oshb41KNZezuJBdv99IpbKOKnSy3SnPf+RzdkLWuFWek7I51O9xYzxdMiwot0alVPjZcepC2k9NAH/oQ0Uj2ks0djW6lwDXfYUAJkzem3qTwUj0eEac/fIKZAd84GBy3Bgef+migngUrnqRWyTqpJ8S0Jt61XzQogNrSrB8ZuqT+vbrkaLDBtTI49MmvlMcN
0bf2PAhkoaMSwISL3d4iv4FRPaNUhtHe3XyWomnZwjckCOUEHdPC7ptUBkYGlatqNw259eQi2QubaCR87JX8FPUo1U5pQiVqK74QHBc8gVTfoybY2cMtMZ3qmzBymsn6ugeeGEoW6C+0nAqqhJOp7ZNJ4Y30rnMyjrcqX1L1dGClu1HMAhZ1HR7iYMlB2EDSlkxMzrBWdeUqXon0yuKltLLr9BlFtY0in/6NKRy6CQnIdsQvp0ZhBiHVUsdPNmTX7OI/arvkvKT7Pw8UcRkraDJgCJm+GQ5lLi7E22diyxwtPCYaR0936XIGWqzcJFUGSaWApH3y8q8c8+3FOlmpdxali1mDfO4zQT05D+B7EADFxnPoXdXaUSd/enArOOJO8NFqMTwgJ2Z2ujaBLpkojPE4aJLdzihqEvPA3gd8dwZBw5vsSDDpajiz9/vCh7YgDJwotKCXw41qRWDD6hKeyfkosVEqUOMZn0nxG9RHVdrKdH2KJquaLdndgTzOOCnPcU753TcG6sjVnSokexmx2/KA2P5+wjNe8G8cO61OhfuXGFLE7LslNc+urnVeLYAsN+SnBZYeFF+XooXK1JiZtIUo3JHKi037T1qevfvy6EcHaALwGBUPmR7J9H+YckekfEaMK4TC6dCb5fwNaDKSpYk9GNj5mVzQ676PzvWffcS8o2ZdqFryUM0GNr0sSVRhlo7rYjjPGUOu1hIhrBNGEcOePBRutK3V8HstKmTDdyIB6dGtS3D+8MhZOz6DBu8DlNerqsYXxKA7qlQ5cfVrM+CpO4yUobiqQfCQaTkXMp2352ORppa+9VG0tVa74eZnyHQgugz8d4XRxBo4VA5vMOCA3yg34+q2/CayXEPE6VAlIhf1agZ3vfRBxVZEUSWR0SV1ut5Y9crHkBoEjF9m/NEO/Os7mD69tSsJelsO3gfhuzZX8FgZSpOX3mwrdjoS0h/w3s1BKkg6srOZ9zbFx/34xTZBIwYBwfV4i3GzSZiFY0NiqT/3hXkaWZqzv4IuT4J9AZOUFjZxpdkdDbCo3nXEhDcg/DdISg1WbPuFjOHed50GD2el/Y+cS3cKrbwzVWom/Jt8dwTSPfU+j+w5IpEDOC85VYl+G4nqs9pxnbNw9eB9yPCUrCg9p5EJzwdOrvXX6DmkmLZ9+1I3+9T0TUQ9vsi10CnCdIF2tjdWR+vbxJkBDfIL9Ji0dfw4TtSZm+NtqER4GvbhcIq9sA1Lnf7Sy8jbd/ifhR0804bCvcvRTCa+zHDwauGYyV9jXd4fKLI7CAai8ot29x1hTypyzDcGCU0xKZgHAHRSxHd2tiEYUmvUJzhg7prdXwgdSEexe/hLRAHVNxo3G7JPV/D86YUdex8Ya+lbnwF1+V7k2Yz7uUehLkt6r1LpbMCdFwFvXvhQkd8QWoM3qlFNtkV0KeyI99nOPOAQqRJ5tlqmKNYHgVGs29y1DGltSQP5dN4LTbOFKsHl0kSKQFEqTqguU+qP8sLe/ID3Vvn1HslwqqWIvq5PLpjJtoS39lMQdWhxAuZNSccFP/2Rdrzx9IWuRbIQuRTa48Gtf3RJBZysslo6DrfYnEdvReE1aGMXV/tEiOFYyZsrKgSzvpaVVmN7uQRwPqOsNakBPjdz4ibEN0ZYpuoYDlJos/VxhlNi/KLbFosufeEXbqDx5eSWYl6BNRgnB2+qX/AO+hNuanN4Y711Sx8AJReInSUlgCG1QPrej7ZAYp+v1j1+i3oQw45Qtm2gBmFmWlMU4moDXZHwYNK0+OUfggZ8IESROYL85bQft+IvQMqGodgdEFY0ztWB3EItOVW6wS21HpMJ8CKJdqGFRSPuq9qoewjqRFkduG/Vnn1xib1tWz3OpILecoCCLehiaFN+rWX5ZN7cw3acJRq29oILLaw/I7teu119ASNo3J2OjJ1Ct6QYJB0FSa5Jsr4EhnZaf0r6VI7MM7aOAe1XMOXP0F2ks1Ash057XMxLdkdB8EKGQPm4w0aY
xRbRoY4UK8nxERZHyxb4XpxIfN1mHXMV2TalbyyxB3dMLK80csIPzRRPdpvUnFRNynw1ZaJUrypTZfWBnf2eNbT6ynWrWmrBR5+jKuOwevGidfh5tE6U4prLU8as+/ARq/E4TfVfjUesfDSbyRqd/FDxpHeI3pVTwW4gsA==
Result:
a:1:{s:12:"PluginEvents";a:10:{s:32:"d473672c3f279665aa3f5be60c356262";a:8:{s:13:"preserve_keys";b:1;s:13:"update_object";b:0;s:10:"unique_key";a:2:{i:0;s:8:"pluginid";i:1;s:5:"event";}s:5:"class";s:14:"modPluginEvent";s:6:"object";s:75:"{"pluginid":0,"event":"OnRichTextBrowserInit","priority":0,"propertyset":0}";s:4:"guid";s:32:"003311160d090c51cac7a9388d4303b7";s:10:"native_key";a:2:{i:0;i:0;i:1;s:21:"OnRichTextBrowserInit";}s:9:"signature";s:32:"d5ab9f15c9f783de2a41b36c19d1504d";}s:32:"7c73e878b936b94f0d3fea62a7230c43";a:8:{s:13:"preserve_keys";b:1;s:13:"update_object";b:0;s:10:"unique_key";a:2:{i:0;s:8:"pluginid";i:1;s:5:"event";}s:5:"class";s:14:"modPluginEvent";s:6:"object";s:79:"{"pluginid":0,"event":"OnManagerPageBeforeRender","priority":0,"propertyset":0}";s:4:"guid";s:32:"6d17c04ae92648e5359425b8295e39d5";s:10:"native_key";a:2:{i:0;i:0;i:1;s:25:"OnManagerPageBeforeRender";}s:9:"signature";s:32:"d62c754c7913c8e5175cd35a52f1e30d";}s:32:"6e514b51604cf37f846e677e9c5ac090";a:8:{s:13:"preserve_keys";b:1;s:13:"update_object";b:0;s:10:"unique_key";a:2:{i:0;s:8:"pluginid";i:1;s:5:"event";}s:5:"class";s:14:"modPluginEvent";s:6:"object";s:72:"{"pluginid":0,"event":"OnDocFormPrerender","priority":0,"propertyset":0}";s:4:"guid";s:32:"e640fc3f94527491ddf75737e1002660";s:10:"native_key";a:2:{i:0;i:0;i:1;s:18:"OnDocFormPrerender";}s:9:"signature";s:32:"9c60acab8448b8e13028015685315c6b";}s:32:"086ee225189c2ef7f680bd5ae64b4964";a:8:{s:13:"preserve_keys";b:1;s:13:"update_object";b:0;s:10:"unique_key";a:2:{i:0;s:8:"pluginid";i:1;s:5:"event";}s:5:"class";s:14:"modPluginEvent";s:6:"object";s:78:"{"pluginid":0,"event":"OnRichTextEditorRegister","priority":0,"propertyset":0}";s:4:"guid";s:32:"c45d58e9fed5d9472e9f011056b3a087";s:10:"native_key";a:2:{i:0;i:0;i:1;s:24:"OnRichTextEditorRegister";}s:9:"signature";s:32:"11bed8b8f2ed19a2e5dfa0af625542f9";}s:32:"b28ade7c448fdc2bd37ee0e4d4812115";a:8:{s:13:"preserve_keys";b:1;s:13:"update_object";b:0;s:10:"unique_key";a:2:{i:0;s:8:"plugin
id";i:1;s:5:"event";}s:5:"class";s:14:"modPluginEvent";s:6:"object";s:73:"{"pluginid":0,"event":"OnTVInputRenderList","priority":0,"propertyset":0}";s:4:"guid";s:32:"f851e497fd89d0bcf804f4efd4f95e2c";s:10:"native_key";a:2:{i:0;i:0;i:1;s:19:"OnTVInputRenderList";}s:9:"signature";s:32:"c7c9a081e2ca8e28de695b2e0c3738a0";}s:32:"2091d8ae247ff47402276e84e0467d30";a:8:{s:13:"preserve_keys";b:1;s:13:"update_object";b:0;s:10:"unique_key";a:2:{i:0;s:8:"pluginid";i:1;s:5:"event";}s:5:"class";s:14:"modPluginEvent";s:6:"object";s:74:"{"pluginid":0,"event":"OnTVOutputeØ
f+}]²ÄÈrõzÉ6v6{ÕEôãüw­Ù\°ÈÛXsOÔ0ñ]9TéNµªq91õx­ñ!|
ü·ñs.1ÅhÜåÃèBù³õª5
R¸+mxGÑÂnÜÇOjÇ8ï?Þ¥ãä½ÉJ©fµ÷^½Q#Ý3Ò2ËgdDcöRiÕTÿËp°ðÿÿ2ö?ñ±ÊëeMÔIö0Ei0°Où¦³yåaÕg?¿Î¨Lá?!L÷ Ø¿öëOé«â×('Y5Y¦{Zj¢·àÂpÍNúx¯áý,Îùün¾õñ0ë ¬Ãè}%µâ8(¤éPm9N¸J§Øq¾Ùïï5¼F}ÉõH9¨'1Å5¢È[ãReìî$o÷Ò)l£,·Jsßùݵ®g¤ìu;ÜXÏL
-Ñ©U>6\z¶Ó#úÑHöÍn¥À5ßa# 7¦Þ¤ðR=§?|ß8·úh +¤VÉ:©'Ä´&Þµ_4(ÚÒ¬º¤þ½ºäh°Áµ28ôɯÇ
Ñ·ö<d¡£ÀÝÞ"¿Q=£TÑÞÝ|¢iÙÂ7$åÓÂîTF«j7
¹õä"Ùh$|ìüõ(ÕNiB%j+¾<Tߣ&ØÙÃ-1ê0rÉúºJè/´
ª©íIáô®s2·*_Rõt¥»QÌuâÉAØ#ÒLLΰVuå*^ôÊ⥴²ëôEµ"þ)º ÈvÄ/§Fa!ÕRÇO6d×ìâ?j»ä¼¤û?q+h2¾e..ÄÛgbË-<&GOwérZ¬Ü$UI¥¤}òò¯óíÅ:Y©w¥Y|î3A=9à{ÅÆsè]ÕÚQ'zp+8âNðÑj1< 'fvº6.(ñ8hÝÎ(jóÀÞ|wAÃìH0éj8³÷û¶ (´ ÃjEÃêÉù(±Q*PãIñÔGUÚÊt}&«-ÙÝ<Î8)ÏqNùÝ7êÈÕ*${±ÛòØþ~Â3^ðo;­NûRÄì»%5Ï®®u^-,7ä§_¢ÊÔ´(ÜÊMûOZ½ûòèGhðÉôrG¤|F+ÂéÐåü
h2¥=Øù\Ðë¾Îõ}ļ£f]¨ZòPÍ6½,ITaëb8ÏC®Ö!¬FÃ<n´­Õð{-*dÃw"éÑ­KpþðÈY;>ï׫ªÆÄ ;ªT9qõk3à©;¡¸ªAði92·çc¦¾õQ´µV»áægÈt ºüwÑÄ8To0àß(7ãê¶ü&²\CÄéP%"õjw½ôAÅVDQ%Ñ%uºÞXõÊÇ_füÑüë;>½µ+ z[Þá»6WðXJÞl+v:ÒðÞÍA*H:²³÷6ÅÇýøÅ6A#Áõxq³IcCb©?÷yY³¿.O}6q¥Ù
°¨ÞuÄ7 ü7HJ
VlûáÞwg¥ýKw
­¼3Uj&ü|wÒ=õ>ì9"8/9U~ê³Úq³põà}Èð¬(=§ ÏN®õ×è9¤¶}ûR7ûÔôMD=¾ÈµÐ)Âtv¶7VGëÛÄ
òô´uü8NÔãm¨Dxöáp½°
KþÒËÈÛwø<Ó½ËÑL&¾ÌpðjáÉ_c]Þ(²;¢òv÷aO*rÌ7 M1)tRÄwv¶!RkÔ'8îÝ_HG±{øKDÕ77²OWðüé{ké[uù^äÙû¹G¡.Kz¯RélÀo^øPßZ7ªQM¶Et)ì÷ÙÎ<à©y¶Z¦(ÖQ¬ÛܵimIùtÞM³ÁåÒD#QN¨.SêòÂÞü÷VùõÉpª¥¾®O.ɶ·öSuhqæMIÇ?ýv¼ñô®E²¹ÚãÁ­tI¬²Z:·ØGoEá5hcWûDáXÉ+*³¾Vc{¹p>£¬5©>7sâ&Ä7FX¦êRh³õqSbü¢Û.}án ñåäb^5'oª_ðúnjsxc½uK%%%!µ#úÞ¶#b¯Ö=~zÃP¶m afZSâj]ðÒ´øå| D9üå´·â/#Ê¡ØV4ÎÕÜB-9UºÁ-µ ð"v¡E#î«Ú¨{êDY¸oÕ}q½m[=ΤÞr-èbhS~­eùdÞÜÃv%¶ö-¬?#»^»]}#hÜB·¤$I®I²¾vZJúTÌ3¶íW0åÏÐ]¤³P,N{\ÌKvGAðB#ù¸ÃFÅÑ¡+ÉñGËø^H|ÝfsÙ6¥o,±wL,¯4rÂÍOvÔTMÊ|5e¢T¯*Seõý5´úÊu«ZjÁG£*ã°zñ¢uøy´NâËSƬûð«ñ8Mõ_G¬|4ÉüPñ¤wÞSÁn °
There are no PHP errors, nor any errors at the software provider's end. I have set Apache's and PHP's charsets to UTF-8. I have the same script working on an external CentOS 7 setup, and the vendor says other customers use XAMPP with no problem. So I'm guessing this is a local character encoding problem with Windows or PHP? How can I narrow it down so I can decode the string correctly? Thanks.
My first guess was that a piece of the data is missing, and because of the cyclic inner structure of base64 if we'd cut off at a different offset we could get the data back.
So I've tried:
echo base64_decode(substr($payload, $offset));
for different offsets around 3400 (where the corruption starts), and had no luck! At any offset it was still gibberish.
Then I tried collecting byte statistics to see whether they are consistent with the distribution usually found in text formats such as JSON.
$pivot = 3400;
$decodeJson = base64_decode(substr($payload, 0, $pivot));
$decodeGibberish = base64_decode(substr($payload, $pivot));
print_r(getByteDistribution($decodeJson));
print str_repeat('-', 100) . "\n";
print_r(getByteDistribution($decodeGibberish));
function getByteDistribution(string $input): array {
    $distribution = [];
    for ($i = 0; $i < strlen($input); ++$i) {
        $ord = ord($input[$i]);
        if (!isset($distribution[$ord])) {
            $distribution[$ord] = 0;
        }
        $distribution[$ord]++;
    }
    arsort($distribution, SORT_NUMERIC);
    return $distribution;
}
And this is what I got:
Array
(
[58] => 281
[34] => 236
[101] => 181
[115] => 132
[59] => 129
[105] => 101
[48] => 88
[49] => 83
[116] => 76
[110] => 76
[97] => 70
[100] => 69
[50] => 66
[114] => 61
[51] => 60
[99] => 55
[53] => 52
[52] => 50
[117] => 49
[56] => 47
...skipped for brevity...
)
---------------------------------------------------------------------
Array
(
[245] => 21
[222] => 19
[240] => 18
[6] => 18
[252] => 17
[55] => 16
[106] => 15
[201] => 15
[79] => 14
[133] => 14
[56] => 14
[119] => 14
[29] => 14
[117] => 14
[164] => 13
[118] => 13
[157] => 13
[104] => 13
[196] => 13
[241] => 13
...skipped for brevity...
)
I advise you to run it and see for yourself: while the distribution in the JSON part shows high peaks for frequently repeated characters, the distribution for the gibberish is much more even.
Which is a strong indicator of encrypted data.
So the problem is not with the payload, but with deciphering it.
And it occurred to me that the same common problem arises when uploading a zip file over FTP. If you happen to use text mode in your FTP client, then line-ending characters that occur in the encrypted data may be irreversibly lost due to conversion. This most often happens on Windows because it uses different line endings than Unix/Linux.
So my suggestion is that you check the code that downloads the encrypted data and see whether a binary mode can be enabled somewhere. For example, if you use fopen, then the read mode should be given as 'rb' ('b' for binary-safe), and not just 'r'. If you use a different transport, it's up to you to inspect it.
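One way to check whether a given read/write path is binary-safe is a small round-trip test like the sketch below. It is only an illustration under assumptions: the byte sequence is arbitrary, the temporary file stands in for whatever transport the third-party script really uses, and a passing result here says nothing about that transport.

```php
<?php
// Arbitrary bytes that a text-mode translation could alter (LF, CR LF, 0x1A).
$payload = "\x00\x0A\x0D\x0A\xFF\x1A\x0A";

// Temporary file used as a stand-in for the real transport (assumption).
$path = tempnam(sys_get_temp_dir(), 'bin');

// Write and read with an explicit 'b' flag so no line-ending translation applies.
$out = fopen($path, 'wb');
fwrite($out, $payload);
fclose($out);

$in = fopen($path, 'rb');
$read = stream_get_contents($in);
fclose($in);
unlink($path);

// A length or hash mismatch would mean bytes were changed along the way.
var_dump(strlen($payload), strlen($read));
var_dump(hash('sha256', $read) === hash('sha256', $payload));
```

Comparing the length and a hash of the data on both sides of a transfer is a quick way to find the step at which bytes start to differ.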

preg match all in array searching for [ ]

I have found the solution myself using:
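foreach ($output as $value) {
    // strpos() returns false when "]:" is absent, so compare strictly
    $pos = strpos($value, "]:");
    if ($pos !== false) {
        // skip past "]: " (three characters) to reach the value
        $tal = substr($value, $pos + 3) . "<br>";
        echo $tal;
    }
}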
this returns:
-210
-212
Thanks in advance.
I want to use a regex so that I only get the line: [10] => [147]: -210
or both [10] => [147]: -210 and [21] => [148]: -212
How can I match [147]: with preg, or is there a better way to get this specific information?
My array $output contains:
Array
(
[0] => modpoll 3.4 - FieldTalk(tm) Modbus(R) Master Simulator
[1] => Copyright (c) 2002-2013 proconX Pty Ltd
[2] => Visit http://www.modbusdriver.com for Modbus libraries and tools.
[3] =>
[4] => Protocol configuration: MODBUS/TCP
[5] => Slave configuration...: address = 1, start reference = 147, count = 1
[6] => Communication.........: 10.234.6.11, port 502, t/o 1.00 s, poll rate 1000 ms
[7] => Data type.............: 16-bit register, output (holding) register table
[8] =>
[9] => -- Polling slave...
[10] => [147]: -210
[11] => modpoll 3.4 - FieldTalk(tm) Modbus(R) Master Simulator
[12] => Copyright (c) 2002-2013 proconX Pty Ltd
[13] => Visit http://www.modbusdriver.com for Modbus libraries and tools.
[14] =>
[15] => Protocol configuration: MODBUS/TCP
[16] => Slave configuration...: address = 1, start reference = 148, count = 1
[17] => Communication.........: 10.234.6.11, port 502, t/o 1.00 s, poll rate 1000 ms
[18] => Data type.............: 16-bit register, output (holding) register table
[19] =>
[20] => -- Polling slave...
[21] => [148]: -212
)
$matches = preg_grep ('/^[147] (\w+)/i', $output);
print_r ($matches);
//only returns Array()
You need to escape the opening square bracket because [147] is seen as a character class that contains 1, 4 and 7
You can do this with:
$result = preg_grep('~^\[14(?:7|8)]:~', $output);
print_r($result);
You can find all that you want to know about escaping (or not) square brackets here
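If the register number and the value are needed separately, preg_match() with capture groups is one option. The following is a minimal sketch; the function name extract_registers is hypothetical and the sample lines are a shortened copy of the $output array from the question.

```php
<?php
// Hypothetical helper: collect "[register]: value" lines into register => value.
function extract_registers(array $lines): array
{
    $values = [];
    foreach ($lines as $line) {
        // \[ and \] match literal brackets; the groups capture register and value.
        if (preg_match('/^\[(\d+)\]:\s*(-?\d+)/', $line, $m)) {
            $values[(int)$m[1]] = (int)$m[2];
        }
    }
    return $values;
}

// Shortened sample of the modpoll output shown in the question.
$output = [
    '-- Polling slave...',
    '[147]: -210',
    'Protocol configuration: MODBUS/TCP',
    '[148]: -212',
];
print_r(extract_registers($output));
```

Capturing the digits with \d+ instead of hard-coding 147 and 148 keeps the pattern usable when other registers are polled.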

Can't read csv(Tab delimited) properly

I have a simple CSV file which is tab-delimited. I have to use it as it is, because it comes from an external source, and I have to read it and insert it into my DB. I have used simple PHP code to read it:
if (($handle = fopen("var/import/MMT29DEC.csv", "r")) !== FALSE) {
    /*Skip the first row*/
    fgetcsv($handle, 1000, chr(9));
    while (($data = fgetcsv($handle, 1000, chr(9))) !== FALSE) {
        print_r($data[0]);
    }
}
When I print_r the data, it shows:
Array ( [0] => 01SATAPC [1] => 40ATAPC [2] => [3] => 21P [4] => SERIAL ATA POWER CABLE [5] => 0.00 [6] => 2.00 [7] => 0 [8] => Power Supplies [9] => SERIAL ATA POWER CABLE [10] =>
4 TO 15 PIN 160MM
[11] => [12] => [13] => [14] => MELBHO [15] => 0.000 [16] => [17] => Order to Order [18] => 4 [19] => 2013-01-18 )
That is the desired result, but when I access a particular column value using $data[index], e.g. $data[8] or $data[1], it weirdly gives me garbage values. For some iterations it gives me the right values, but after 10-15 rows it starts giving me some numbers and values from other columns. I don't know what is going on; as far as I can tell it should be a formatting issue. I have tried opening the file in Excel and it displays fine.
@ravisoni are you sure that the second parameter to fgetcsv of 1000 is longer than the longest line in your file? Try setting it to 0, as the docs say [php.net/fgetcsv], and see if that makes a difference.
if (($handle = fopen("var/import/MMT29DEC.csv", "r")) !== FALSE) {
    /*Skip the first row*/
    fgetcsv($handle, 0, chr(9));
    while (($data = fgetcsv($handle, 0, chr(9))) !== FALSE) {
        print_r($data[0]);
    }
}
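To narrow down which rows are being misparsed, it can help to compare each record's column count with the expected one. The sketch below is only an illustration: read_tab_rows is a hypothetical helper, and the in-memory stream with three made-up records stands in for the real import file.

```php
<?php
// Hypothetical helper: read tab-delimited records with no line-length limit and
// report the record numbers whose column count differs from the expected one.
function read_tab_rows($handle, int $expectedColumns): array
{
    $rows = [];
    $suspect = [];
    $record = 0;
    // A length of 0 means "no limit", so long lines are not cut into pieces.
    while (($data = fgetcsv($handle, 0, "\t", '"', '\\')) !== false) {
        $record++;
        if ($data === [null]) {
            continue; // blank line
        }
        if (count($data) === $expectedColumns) {
            $rows[] = $data;
        } else {
            $suspect[] = $record;
        }
    }
    return [$rows, $suspect];
}

// In-memory stand-in for the import file (made-up data, not the real CSV).
$handle = fopen('php://memory', 'w+b');
fwrite($handle, "sku\tname\tnote\nA1\tCable\t\"4 TO 15 PIN\n160MM\"\nB2\tbroken row\n");
rewind($handle);
[$rows, $suspect] = read_tab_rows($handle, 3);
fclose($handle);

print_r($suspect); // record numbers that need a closer look
```

A quoted field that spans two physical lines still counts as one record here, so the reported numbers refer to records, not to lines in the file.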

PHP and Unicode: Weirdness between Windows and Linux

Look at IBM's Unicode for the working PHP programmer, especially listings 3 and 4.
On Ubuntu Lucid I get the same output from the code as IBM does, viz:
Здравсствуйте
Array
(
[1] => 65279
[2] => 1047
[3] => 1076
[4] => 1088
[5] => 1072
[6] => 1074
[7] => 1089
[8] => 1089
[9] => 1090
[10] => 1074
[11] => 1091
[12] => 1081
[13] => 1090
[14] => 1077
)
Здравсствуйте
However, on Windows I get a completely different response.
ðùð┤ÐÇð░ð▓ÐüÐüÐéð▓Ðâð╣ÐéðÁ
Array
(
[1] => -131072
[2] => 386138112
[3] => 872677376
[4] => 1074003968
[5] => 805568512
[6] => 839122944
[7] => 1090781184
[8] => 1090781184
[9] => 1107558400
[10] => 839122944
[11] => 1124335616
[12] => 956563456
[13] => 1107558400
[14] => 889454592
)
ðùð┤ÐÇð░ð▓ÐüÐüÐéð▓Ðâð╣ÐéðÁ
Aside from the fact that the Russian characters (which are in UTF-32) don't render in a CMD.EXE shell (because they're in UTF-32 not Windows' own UTF-16), why do the character values differ so significantly?
function utf8_to_unicode_code($utf8_string)
{
    $expanded = iconv("UTF-8", "UTF-32", $utf8_string);
    return unpack("L*", $expanded);
}
This does two things wrong:
First, it uses “UTF-32”, which will put an unwanted BOM at the start of the string, which is why you get 65279 (0xFEFF BOM). You don't want stray BOMs hanging around the place causing trouble.
Second, it uses machine-specific byte endianness (capital L) which iconv may well not agree with. To be honest I wouldn't have expected it to clash on a Windows box (as i386 is little-endian regardless of OS), but clearly it has, as the values you've got are all what would result from a reversed byte order.
Better to state both byte orderings explicitly, and avoid the BOM. Use UCS-4LE as the encoding, and unpack with V*. The same goes for unicode_code_to_utf8.
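A minimal sketch of that suggestion, assuming the iconv build in use supports the UCS-4LE encoding name. The function names below are illustrative and are not the ones from the IBM listings.

```php
<?php
// Convert a UTF-8 string to a list of Unicode code points.
// UCS-4LE fixes the byte order and emits no BOM; 'V' unpacks
// unsigned 32-bit little-endian integers regardless of platform.
function utf8_to_code_points(string $utf8): array
{
    $ucs4 = iconv('UTF-8', 'UCS-4LE', $utf8);
    if ($ucs4 === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    // unpack() returns a 1-based array; reindex it from 0.
    return array_values(unpack('V*', $ucs4));
}

// The inverse: pack the code points as little-endian and convert back to UTF-8.
function code_points_to_utf8(array $codePoints): string
{
    return iconv('UCS-4LE', 'UTF-8', pack('V*', ...$codePoints));
}

// First two letters of the sample word, written as escapes to avoid
// depending on the source file's own encoding.
$points = utf8_to_code_points("\u{0417}\u{0434}");
print_r($points);
```

Since both the iconv target and the unpack format name the byte order explicitly, the result does not depend on the machine's endianness or on which byte order iconv picks for plain "UTF-32".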
Also ignore listing 6. The ellipsis character—like the fi-ligature and others—is a ‘compatibility character’ which we wouldn't use in the modern Unicode-and-OpenType world. It's up to the font to provide contextual alternatives for fi or ... if it wants to, instead of requiring us to mangle the text.
