PHP and Unicode: Weirdness between Windows and Linux - php

Look at IBM's Unicode for the working PHP programmer, especially listings 3 and 4.
On Ubuntu Lucid I get the same output from the code as IBM does, viz:
Здравсствуйте
Array
(
[1] => 65279
[2] => 1047
[3] => 1076
[4] => 1088
[5] => 1072
[6] => 1074
[7] => 1089
[8] => 1089
[9] => 1090
[10] => 1074
[11] => 1091
[12] => 1081
[13] => 1090
[14] => 1077
)
Здравсствуйте
However, on Windows I get a completely different response.
ðùð┤ÐÇð░ð▓ÐüÐüÐéð▓Ðâð╣ÐéðÁ
Array
(
[1] => -131072
[2] => 386138112
[3] => 872677376
[4] => 1074003968
[5] => 805568512
[6] => 839122944
[7] => 1090781184
[8] => 1090781184
[9] => 1107558400
[10] => 839122944
[11] => 1124335616
[12] => 956563456
[13] => 1107558400
[14] => 889454592
)
ðùð┤ÐÇð░ð▓ÐüÐüÐéð▓Ðâð╣ÐéðÁ
Aside from the fact that the Russian characters (which are in UTF-32) don't render in a CMD.EXE shell (because they're in UTF-32 not Windows' own UTF-16), why do the character values differ so significantly?

function utf8_to_unicode_code($utf8_string)
{
$expanded = iconv("UTF-8", "UTF-32", $utf8_string);
return unpack("L*", $expanded);
}
This does two things wrong:
It uses “UTF-32”, which will drop an unwanted BOM at the start of the string, which is why you get 65279 (0xFEFF BOM). You don't want stray BOMs hanging around the place causing trouble.
It uses machine-specific byte endianness (capital L) which iconv may well not agree with. To be honest I wouldn't have expected it to clash on a Windows box (as i386 is little-endian regardless of OS), but clearly it has, as the values you've got are all what would result from a reversed byte order.
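The reversed byte order can be checked by hand. As a minimal sketch (swapBytes32 is a hypothetical helper, not part of the IBM listings), packing a code point big-endian and reading the same bytes back little-endian reproduces the Windows values from the question:

```php
<?php
// Hypothetical helper: reverse the four bytes of a 32-bit value.
function swapBytes32(int $value): int
{
    // 'N' packs as big-endian; 'V' reads the same bytes as little-endian.
    return unpack('V', pack('N', $value))[1];
}

echo swapBytes32(1047), "\n"; // U+0417 -> 386138112, as in the Windows output
echo swapBytes32(1076), "\n"; // U+0434 -> 872677376
```

The BOM behaves the same way: 0x0000FEFF reversed is 0xFFFE0000, which a signed 32-bit integer shows as -131072.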
Better to state both byte orderings explicitly, and avoid the BOM. Use UCS-4LE as the encoding, and unpack with V*. The same goes for unicode_code_to_utf8.
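A minimal sketch of that fix, assuming PHP 7+ and the iconv extension. Re-indexing the result from 0 with array_values is my own choice (unpack itself returns 1-based keys), and the body of unicode_code_to_utf8 here is an illustration rather than the IBM listing:

```php
<?php
// Sketch of the fix: name the byte order on both sides and skip the BOM.
function utf8_to_unicode_code(string $utf8_string): array
{
    // "UCS-4LE" is explicitly little-endian and does not write a BOM.
    $expanded = iconv('UTF-8', 'UCS-4LE', $utf8_string);
    // 'V*' reads unsigned 32-bit little-endian values on every platform.
    return array_values(unpack('V*', $expanded));
}

function unicode_code_to_utf8(array $code_points): string
{
    // Pack with the same explicit byte order, then convert back.
    return iconv('UCS-4LE', 'UTF-8', pack('V*', ...$code_points));
}

$codes = utf8_to_unicode_code("\u{0417}\u{0434}"); // the first two letters of the sample word
print_r($codes);
echo unicode_code_to_utf8($codes), "\n";
```

With both sides pinned to little-endian, the result no longer depends on which iconv implementation the platform ships.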
Also ignore listing 6. The ellipsis character—like the fi-ligature and others—is a ‘compatibility character’ which we wouldn't use in the modern Unicode-and-OpenType world. It's up to the font to provide contextual alternatives for fi or ... if it wants to, instead of requiring us to mangle the text.

Related

php base64_decode part corrupt using xampp on windows 7

I have a problem with some third party php scripts not decoding properly using xampp on windows 7. The process is that the script sends an encrypted payload to an external server where it is decrypted and returned. We have no access or control of the external server. The relevant code is:
$response = $provider->request('package/decode/' . $endpoint, 'POST', $params);
$data = $response->toXml();
if (isset($data->object)) {
$object = (string)$data->object;
$object = base64_decode($object);
$object = unserialize($object);
}
if (isset($data->related_objects)) {
$relatedObjects = (string)$data->related_objects;
$relatedObjects = base64_decode($relatedObjects);
$relatedObjects = unserialize($relatedObjects);
}
The $object decodes with no problem. The $relatedObjects does not and looks corrupt part way through:
YToxOntzOjEyOiJQbHVnaW5FdmVudHMiO2E6MTA6e3M6MzI6ImQ0NzM2NzJjM2YyNzk2NjVhYTNmNWJlNjBjMzU2MjYyIjthOjg6e3M6MTM6InByZXNlcnZlX2tleXMiO2I6MTtzOjEzOiJ1cGRhdGVfb2JqZWN0IjtiOjA7czoxMDoidW5pcXVlX2tleSI7YToyOntpOjA7czo4OiJwbHVnaW5pZCI7aToxO3M6NToiZXZlbnQiO31zOjU6ImNsYXNzIjtzOjE0OiJtb2RQbHVnaW5FdmVudCI7czo2OiJvYmplY3QiO3M6NzU6InsicGx1Z2luaWQiOjAsImV2ZW50IjoiT25SaWNoVGV4dEJyb3dzZXJJbml0IiwicHJpb3JpdHkiOjAsInByb3BlcnR5c2V0IjowfSI7czo0OiJndWlkIjtzOjMyOiIwMDMzMTExNjBkMDkwYzUxY2FjN2E5Mzg4ZDQzMDNiNyI7czoxMDoibmF0aXZlX2tleSI7YToyOntpOjA7aTowO2k6MTtzOjIxOiJPblJpY2hUZXh0QnJvd3NlckluaXQiO31zOjk6InNpZ25hdHVyZSI7czozMjoiZDVhYjlmMTVjOWY3ODNkZTJhNDFiMzZjMTlkMTUwNGQiO31zOjMyOiI3YzczZTg3OGI5MzZiOTRmMGQzZmVhNjJhNzIzMGM0MyI7YTo4OntzOjEzOiJwcmVzZXJ2ZV9rZXlzIjtiOjE7czoxMzoidXBkYXRlX29iamVjdCI7YjowO3M6MTA6InVuaXF1ZV9rZXkiO2E6Mjp7aTowO3M6ODoicGx1Z2luaWQiO2k6MTtzOjU6ImV2ZW50Ijt9czo1OiJjbGFzcyI7czoxNDoibW9kUGx1Z2luRXZlbnQiO3M6Njoib2JqZWN0IjtzOjc5OiJ7InBsdWdpbmlkIjowLCJldmVudCI6Ik9uTWFuYWdlclBhZ2VCZWZvcmVSZW5kZXIiLCJwcmlvcml0eSI6MCwicHJvcGVydHlzZXQiOjB9IjtzOjQ6Imd1aWQiO3M6MzI6IjZkMTdjMDRhZTkyNjQ4ZTUzNTk0MjViODI5NWUzOWQ1IjtzOjEwOiJuYXRpdmVfa2V5IjthOjI6e2k6MDtpOjA7aToxO3M6MjU6Ik9uTWFuYWdlclBhZ2VCZWZvcmVSZW5kZXIiO31zOjk6InNpZ25hdHVyZSI7czozMjoiZDYyYzc1NGM3OTEzYzhlNTE3NWNkMzVhNTJmMWUzMGQiO31zOjMyOiI2ZTUxNGI1MTYwNGNmMzdmODQ2ZTY3N2U5YzVhYzA5MCI7YTo4OntzOjEzOiJwcmVzZXJ2ZV9rZXlzIjtiOjE7czoxMzoidXBkYXRlX29iamVjdCI7YjowO3M6MTA6InVuaXF1ZV9rZXkiO2E6Mjp7aTowO3M6ODoicGx1Z2luaWQiO2k6MTtzOjU6ImV2ZW50Ijt9czo1OiJjbGFzcyI7czoxNDoibW9kUGx1Z2luRXZlbnQiO3M6Njoib2JqZWN0IjtzOjcyOiJ7InBsdWdpbmlkIjowLCJldmVudCI6Ik9uRG9jRm9ybVByZXJlbmRlciIsInByaW9yaXR5IjowLCJwcm9wZXJ0eXNldCI6MH0iO3M6NDoiZ3VpZCI7czozMjoiZTY0MGZjM2Y5NDUyNzQ5MWRkZjc1NzM3ZTEwMDI2NjAiO3M6MTA6Im5hdGl2ZV9rZXkiO2E6Mjp7aTowO2k6MDtpOjE7czoxODoiT25Eb2NGb3JtUHJlcmVuZGVyIjt9czo5OiJzaWduYXR1cmUiO3M6MzI6IjljNjBhY2FiODQ0OGI4ZTEzMDI4MDE1Njg1MzE1YzZiIjt9czozMjoiMDg2ZWUyMjUxODljMmVmN2Y2ODBiZDVhZTY0YjQ5NjQiO2E6ODp7czoxMzoicHJlc2VydmVfa2V5cyI7YjoxO3M6MTM6InVwZGF0ZV9vYmplY3QiO2I6MDtz
OjEwOiJ1bmlxdWVfa2V5IjthOjI6e2k6MDtzOjg6InBsdWdpbmlkIjtpOjE7czo1OiJldmVudCI7fXM6NToiY2xhc3MiO3M6MTQ6Im1vZFBsdWdpbkV2ZW50IjtzOjY6Im9iamVjdCI7czo3ODoieyJwbHVnaW5pZCI6MCwiZXZlbnQiOiJPblJpY2hUZXh0RWRpdG9yUmVnaXN0ZXIiLCJwcmlvcml0eSI6MCwicHJvcGVydHlzZXQiOjB9IjtzOjQ6Imd1aWQiO3M6MzI6ImM0NWQ1OGU5ZmVkNWQ5NDcyZTlmMDExMDU2YjNhMDg3IjtzOjEwOiJuYXRpdmVfa2V5IjthOjI6e2k6MDtpOjA7aToxO3M6MjQ6Ik9uUmljaFRleHRFZGl0b3JSZWdpc3RlciI7fXM6OToic2lnbmF0dXJlIjtzOjMyOiIxMWJlZDhiOGYyZWQxOWEyZTVkZmEwYWY2MjU1NDJmOSI7fXM6MzI6ImIyOGFkZTdjNDQ4ZmRjMmJkMzdlZTBlNGQ0ODEyMTE1IjthOjg6e3M6MTM6InByZXNlcnZlX2tleXMiO2I6MTtzOjEzOiJ1cGRhdGVfb2JqZWN0IjtiOjA7czoxMDoidW5pcXVlX2tleSI7YToyOntpOjA7czo4OiJwbHVnaW5pZCI7aToxO3M6NToiZXZlbnQiO31zOjU6ImNsYXNzIjtzOjE0OiJtb2RQbHVnaW5FdmVudCI7czo2OiJvYmplY3QiO3M6NzM6InsicGx1Z2luaWQiOjAsImV2ZW50IjoiT25UVklucHV0UmVuZGVyTGlzdCIsInByaW9yaXR5IjowLCJwcm9wZXJ0eXNldCI6MH0iO3M6NDoiZ3VpZCI7czozMjoiZjg1MWU0OTdmZDg5ZDBiY2Y4MDRmNGVmZDRmOTVlMmMiO3M6MTA6Im5hdGl2ZV9rZXkiO2E6Mjp7aTowO2k6MDtpOjE7czoxOToiT25UVklucHV0UmVuZGVyTGlzdCI7fXM6OToic2lnbmF0dXJlIjtzOjMyOiJjN2M5YTA4MWUyY2E4ZTI4ZGU2OTViMmUwYzM3MzhhMCI7fXM6MzI6IjIwOTFkOGFlMjQ3ZmY0NzQwMjI3NmU4NGUwNDY3ZDMwIjthOjg6e3M6MTM6InByZXNlcnZlX2tleXMiO2I6MTtzOjEzOiJ1cGRhdGVfb2JqZWN0IjtiOjA7czoxMDoidW5pcXVlX2tleSI7YToyOntpOjA7czo4OiJwbHVnaW5pZCI7aToxO3M6NToiZXZlbnQiO31zOjU6ImNsYXNzIjtzOjE0OiJtb2RQbHVnaW5FdmVudCI7czo2OiJvYmplY3QiO3M6NzQ6InsicGx1Z2luaWQiOjAsImV2ZW50IjoiT25UVk91dHB1dGUG2ApmKwV9XbLEl8hy9QJ6yTZ2NnvVRfTjhvx3hq3ZXLDI21gcc0/UMBXxXTkbVOlOtapxOTH1eK3xIXwKkPy38Y5zLjHFaNyK5cPoQpP5FbP1qjUNUrgrbQGKeEfRGsIAbtzHT2qTxzgY7z/epePkvclKqWa194ZevVGKQN0z0jLLZ2REY/ZSadUQVP/LcIew8P//MvY/8YeFF5mxyutlTdRJ9oMwRZ5pMI2wT/mmsx2aeeVh1WcFP7/OqEzhHT8XIUwV96DYv/aFmetP6avi1ygnWTVZpnuXWmoSorfgwnAUzU76j3iv4f0szvn8BG6IvvXxMOuTCazD6H0lteI4KKSW6VAIj205AxFOuEqUp9iHcb4O2e/vkDW8Rn0ayfVIOagnMcU1oshb41KNZezuJBdv99IpbKOKnSy3SnPf+RzdkLWuFWek7I51O9xYzxdMiwot0alVPjZcepC2k9NAH/oQ0Uj2ks0djW6lwDXfYUAJkzem3qTwUj0eEac/fIKZAd84GBy3Bgef+migngUrnqRWyTqpJ8S0Jt61XzQogNrSrB8ZuqT+vbrkaLDBtTI49MmvlMcN
0bf2PAhkoaMSwISL3d4iv4FRPaNUhtHe3XyWomnZwjckCOUEHdPC7ptUBkYGlatqNw259eQi2QubaCR87JX8FPUo1U5pQiVqK74QHBc8gVTfoybY2cMtMZ3qmzBymsn6ugeeGEoW6C+0nAqqhJOp7ZNJ4Y30rnMyjrcqX1L1dGClu1HMAhZ1HR7iYMlB2EDSlkxMzrBWdeUqXon0yuKltLLr9BlFtY0in/6NKRy6CQnIdsQvp0ZhBiHVUsdPNmTX7OI/arvkvKT7Pw8UcRkraDJgCJm+GQ5lLi7E22diyxwtPCYaR0936XIGWqzcJFUGSaWApH3y8q8c8+3FOlmpdxali1mDfO4zQT05D+B7EADFxnPoXdXaUSd/enArOOJO8NFqMTwgJ2Z2ujaBLpkojPE4aJLdzihqEvPA3gd8dwZBw5vsSDDpajiz9/vCh7YgDJwotKCXw41qRWDD6hKeyfkosVEqUOMZn0nxG9RHVdrKdH2KJquaLdndgTzOOCnPcU753TcG6sjVnSokexmx2/KA2P5+wjNe8G8cO61OhfuXGFLE7LslNc+urnVeLYAsN+SnBZYeFF+XooXK1JiZtIUo3JHKi037T1qevfvy6EcHaALwGBUPmR7J9H+YckekfEaMK4TC6dCb5fwNaDKSpYk9GNj5mVzQ676PzvWffcS8o2ZdqFryUM0GNr0sSVRhlo7rYjjPGUOu1hIhrBNGEcOePBRutK3V8HstKmTDdyIB6dGtS3D+8MhZOz6DBu8DlNerqsYXxKA7qlQ5cfVrM+CpO4yUobiqQfCQaTkXMp2352ORppa+9VG0tVa74eZnyHQgugz8d4XRxBo4VA5vMOCA3yg34+q2/CayXEPE6VAlIhf1agZ3vfRBxVZEUSWR0SV1ut5Y9crHkBoEjF9m/NEO/Os7mD69tSsJelsO3gfhuzZX8FgZSpOX3mwrdjoS0h/w3s1BKkg6srOZ9zbFx/34xTZBIwYBwfV4i3GzSZiFY0NiqT/3hXkaWZqzv4IuT4J9AZOUFjZxpdkdDbCo3nXEhDcg/DdISg1WbPuFjOHed50GD2el/Y+cS3cKrbwzVWom/Jt8dwTSPfU+j+w5IpEDOC85VYl+G4nqs9pxnbNw9eB9yPCUrCg9p5EJzwdOrvXX6DmkmLZ9+1I3+9T0TUQ9vsi10CnCdIF2tjdWR+vbxJkBDfIL9Ji0dfw4TtSZm+NtqER4GvbhcIq9sA1Lnf7Sy8jbd/ifhR0804bCvcvRTCa+zHDwauGYyV9jXd4fKLI7CAai8ot29x1hTypyzDcGCU0xKZgHAHRSxHd2tiEYUmvUJzhg7prdXwgdSEexe/hLRAHVNxo3G7JPV/D86YUdex8Ya+lbnwF1+V7k2Yz7uUehLkt6r1LpbMCdFwFvXvhQkd8QWoM3qlFNtkV0KeyI99nOPOAQqRJ5tlqmKNYHgVGs29y1DGltSQP5dN4LTbOFKsHl0kSKQFEqTqguU+qP8sLe/ID3Vvn1HslwqqWIvq5PLpjJtoS39lMQdWhxAuZNSccFP/2Rdrzx9IWuRbIQuRTa48Gtf3RJBZysslo6DrfYnEdvReE1aGMXV/tEiOFYyZsrKgSzvpaVVmN7uQRwPqOsNakBPjdz4ibEN0ZYpuoYDlJos/VxhlNi/KLbFosufeEXbqDx5eSWYl6BNRgnB2+qX/AO+hNuanN4Y711Sx8AJReInSUlgCG1QPrej7ZAYp+v1j1+i3oQw45Qtm2gBmFmWlMU4moDXZHwYNK0+OUfggZ8IESROYL85bQft+IvQMqGodgdEFY0ztWB3EItOVW6wS21HpMJ8CKJdqGFRSPuq9qoewjqRFkduG/Vnn1xib1tWz3OpILecoCCLehiaFN+rWX5ZN7cw3acJRq29oILLaw/I7teu119ASNo3J2OjJ1Ct6QYJB0FSa5Jsr4EhnZaf0r6VI7MM7aOAe1XMOXP0F2ks1Ash057XMxLdkdB8EKGQPm4w0aY
xRbRoY4UK8nxERZHyxb4XpxIfN1mHXMV2TalbyyxB3dMLK80csIPzRRPdpvUnFRNynw1ZaJUrypTZfWBnf2eNbT6ynWrWmrBR5+jKuOwevGidfh5tE6U4prLU8as+/ARq/E4TfVfjUesfDSbyRqd/FDxpHeI3pVTwW4gsA==
Result:
a:1:{s:12:"PluginEvents";a:10:{s:32:"d473672c3f279665aa3f5be60c356262";a:8:{s:13:"preserve_keys";b:1;s:13:"update_object";b:0;s:10:"unique_key";a:2:{i:0;s:8:"pluginid";i:1;s:5:"event";}s:5:"class";s:14:"modPluginEvent";s:6:"object";s:75:"{"pluginid":0,"event":"OnRichTextBrowserInit","priority":0,"propertyset":0}";s:4:"guid";s:32:"003311160d090c51cac7a9388d4303b7";s:10:"native_key";a:2:{i:0;i:0;i:1;s:21:"OnRichTextBrowserInit";}s:9:"signature";s:32:"d5ab9f15c9f783de2a41b36c19d1504d";}s:32:"7c73e878b936b94f0d3fea62a7230c43";a:8:{s:13:"preserve_keys";b:1;s:13:"update_object";b:0;s:10:"unique_key";a:2:{i:0;s:8:"pluginid";i:1;s:5:"event";}s:5:"class";s:14:"modPluginEvent";s:6:"object";s:79:"{"pluginid":0,"event":"OnManagerPageBeforeRender","priority":0,"propertyset":0}";s:4:"guid";s:32:"6d17c04ae92648e5359425b8295e39d5";s:10:"native_key";a:2:{i:0;i:0;i:1;s:25:"OnManagerPageBeforeRender";}s:9:"signature";s:32:"d62c754c7913c8e5175cd35a52f1e30d";}s:32:"6e514b51604cf37f846e677e9c5ac090";a:8:{s:13:"preserve_keys";b:1;s:13:"update_object";b:0;s:10:"unique_key";a:2:{i:0;s:8:"pluginid";i:1;s:5:"event";}s:5:"class";s:14:"modPluginEvent";s:6:"object";s:72:"{"pluginid":0,"event":"OnDocFormPrerender","priority":0,"propertyset":0}";s:4:"guid";s:32:"e640fc3f94527491ddf75737e1002660";s:10:"native_key";a:2:{i:0;i:0;i:1;s:18:"OnDocFormPrerender";}s:9:"signature";s:32:"9c60acab8448b8e13028015685315c6b";}s:32:"086ee225189c2ef7f680bd5ae64b4964";a:8:{s:13:"preserve_keys";b:1;s:13:"update_object";b:0;s:10:"unique_key";a:2:{i:0;s:8:"pluginid";i:1;s:5:"event";}s:5:"class";s:14:"modPluginEvent";s:6:"object";s:78:"{"pluginid":0,"event":"OnRichTextEditorRegister","priority":0,"propertyset":0}";s:4:"guid";s:32:"c45d58e9fed5d9472e9f011056b3a087";s:10:"native_key";a:2:{i:0;i:0;i:1;s:24:"OnRichTextEditorRegister";}s:9:"signature";s:32:"11bed8b8f2ed19a2e5dfa0af625542f9";}s:32:"b28ade7c448fdc2bd37ee0e4d4812115";a:8:{s:13:"preserve_keys";b:1;s:13:"update_object";b:0;s:10:"unique_key";a:2:{i:0;s:8:"plugin
id";i:1;s:5:"event";}s:5:"class";s:14:"modPluginEvent";s:6:"object";s:73:"{"pluginid":0,"event":"OnTVInputRenderList","priority":0,"propertyset":0}";s:4:"guid";s:32:"f851e497fd89d0bcf804f4efd4f95e2c";s:10:"native_key";a:2:{i:0;i:0;i:1;s:19:"OnTVInputRenderList";}s:9:"signature";s:32:"c7c9a081e2ca8e28de695b2e0c3738a0";}s:32:"2091d8ae247ff47402276e84e0467d30";a:8:{s:13:"preserve_keys";b:1;s:13:"update_object";b:0;s:10:"unique_key";a:2:{i:0;s:8:"pluginid";i:1;s:5:"event";}s:5:"class";s:14:"modPluginEvent";s:6:"object";s:74:"{"pluginid":0,"event":"OnTVOutputeØ
f+}]²ÄÈrõzÉ6v6{ÕEôãüw­Ù\°ÈÛXsOÔ0ñ]9TéNµªq91õx­ñ!|
ü·ñs.1ÅhÜåÃèBù³õª5
R¸+mxGÑÂnÜÇOjÇ8ï?Þ¥ãä½ÉJ©fµ÷^½Q#Ý3Ò2ËgdDcöRiÕTÿËp°ðÿÿ2ö?ñ±ÊëeMÔIö0Ei0°Où¦³yåaÕg?¿Î¨Lá?!L÷ Ø¿öëOé«â×('Y5Y¦{Zj¢·àÂpÍNúx¯áý,Îùün¾õñ0ë ¬Ãè}%µâ8(¤éPm9N¸J§Øq¾Ùïï5¼F}ÉõH9¨'1Å5¢È[ãReìî$o÷Ò)l£,·Jsßùݵ®g¤ìu;ÜXÏL
-Ñ©U>6\z¶Ó#úÑHöÍn¥À5ßa# 7¦Þ¤ðR=§?|ß8·úh +¤VÉ:©'Ä´&Þµ_4(ÚÒ¬º¤þ½ºäh°Áµ28ôɯÇ
Ñ·ö<d¡£ÀÝÞ"¿Q=£TÑÞÝ|¢iÙÂ7$åÓÂîTF«j7
¹õä"Ùh$|ìüõ(ÕNiB%j+¾<Tߣ&ØÙÃ-1ê0rÉúºJè/´
ª©íIáô®s2·*_Rõt¥»QÌuâÉAØ#ÒLLΰVuå*^ôÊ⥴²ëôEµ"þ)º ÈvÄ/§Fa!ÕRÇO6d×ìâ?j»ä¼¤û?q+h2¾e..ÄÛgbË-<&GOwérZ¬Ü$UI¥¤}òò¯óíÅ:Y©w¥Y|î3A=9à{ÅÆsè]ÕÚQ'zp+8âNðÑj1< 'fvº6.(ñ8hÝÎ(jóÀÞ|wAÃìH0éj8³÷û¶ (´ ÃjEÃêÉù(±Q*PãIñÔGUÚÊt}&«-ÙÝ<Î8)ÏqNùÝ7êÈÕ*${±ÛòØþ~Â3^ðo;­NûRÄì»%5Ï®®u^-,7ä§_¢ÊÔ´(ÜÊMûOZ½ûòèGhðÉôrG¤|F+ÂéÐåü
h2¥=Øù\Ðë¾Îõ}ļ£f]¨ZòPÍ6½,ITaëb8ÏC®Ö!¬FÃ<n´­Õð{-*dÃw"éÑ­KpþðÈY;>ï׫ªÆÄ ;ªT9qõk3à©;¡¸ªAði92·çc¦¾õQ´µV»áægÈt ºüwÑÄ8To0àß(7ãê¶ü&²\CÄéP%"õjw½ôAÅVDQ%Ñ%uºÞXõÊÇ_füÑüë;>½µ+ z[Þá»6WðXJÞl+v:ÒðÞÍA*H:²³÷6ÅÇýøÅ6A#Áõxq³IcCb©?÷yY³¿.O}6q¥Ù
°¨ÞuÄ7 ü7HJ
VlûáÞwg¥ýKw
­¼3Uj&ü|wÒ=õ>ì9"8/9U~ê³Úq³põà}Èð¬(=§ ÏN®õ×è9¤¶}ûR7ûÔôMD=¾ÈµÐ)Âtv¶7VGëÛÄ
òô´uü8NÔãm¨Dxöáp½°
KþÒËÈÛwø<Ó½ËÑL&¾ÌpðjáÉ_c]Þ(²;¢òv÷aO*rÌ7 M1)tRÄwv¶!RkÔ'8îÝ_HG±{øKDÕ77²OWðüé{ké[uù^äÙû¹G¡.Kz¯RélÀo^øPßZ7ªQM¶Et)ì÷ÙÎ<à©y¶Z¦(ÖQ¬ÛܵimIùtÞM³ÁåÒD#QN¨.SêòÂÞü÷VùõÉpª¥¾®O.ɶ·öSuhqæMIÇ?ýv¼ñô®E²¹ÚãÁ­tI¬²Z:·ØGoEá5hcWûDáXÉ+*³¾Vc{¹p>£¬5©>7sâ&Ä7FX¦êRh³õqSbü¢Û.}án ñåäb^5'oª_ðúnjsxc½uK%%%!µ#úÞ¶#b¯Ö=~zÃP¶m afZSâj]ðÒ´øå| D9üå´·â/#Ê¡ØV4ÎÕÜB-9UºÁ-µ ð"v¡E#î«Ú¨{êDY¸oÕ}q½m[=ΤÞr-èbhS~­eùdÞÜÃv%¶ö-¬?#»^»]}#hÜB·¤$I®I²¾vZJúTÌ3¶íW0åÏÐ]¤³P,N{\ÌKvGAðB#ù¸ÃFÅÑ¡+ÉñGËø^H|ÝfsÙ6¥o,±wL,¯4rÂÍOvÔTMÊ|5e¢T¯*Seõý5´úÊu«ZjÁG£*ã°zñ¢uøy´NâËSƬûð«ñ8Mõ_G¬|4ÉüPñ¤wÞSÁn °
There are no PHP errors, and no errors at the software provider's end either. I have set Apache's and PHP's charsets to UTF-8. I have the same script working on an external CentOS 7 setup, and the vendor says other customers use XAMPP with no problem. So I'm guessing this is a local character encoding problem with Windows or PHP? How can I narrow it down so I can decode the string correctly? Thanks.
My first guess was that a piece of the data was missing. Base64 decodes in groups of four characters, so if a chunk had been dropped, cutting the string at a different offset might realign the groups and bring the data back.
So I've tried:
echo base64_decode(substr($payload, $offset));
for different offsets around 3400 (where the corruption starts), and had no luck. At every offset the result was still gibberish.
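To illustrate why the offset matters, here is a small sketch with a made-up serialized string (not the vendor's payload). It only shows the alignment effect; it does not explain the corruption itself:

```php
<?php
$plain = 'a:1:{s:5:"event";s:4:"test";}';   // made-up serialized sample
$encoded = base64_encode($plain);

// An offset that is a multiple of 4 drops whole groups: every 4 characters
// removed take exactly 3 decoded bytes with them.
$aligned = base64_decode(substr($encoded, 4));
var_dump($aligned === substr($plain, 3)); // bool(true)

// Any other offset shifts the 6-bit boundaries, so the tail decodes to noise.
$shifted = base64_decode(substr($encoded, 5));
var_dump(bin2hex($shifted));
```

If a chunk had simply gone missing, one of four consecutive offsets should have produced readable text again.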
Then I tried collecting byte statistics to see whether they are consistent with the distribution usually found in text files, to which JSON belongs.
$pivot = 3400;
$decodeJson = base64_decode(substr($payload, 0, $pivot));
$decodeGibberish = base64_decode(substr($payload, $pivot));
print_r(getByteDistribution($decodeJson));
print str_repeat('-', 100) . "\n";
print_r(getByteDistribution($decodeGibberish));
function getByteDistribution(string $input): array {
$distribution = [];
for ($i = 0; $i < strlen($input); ++$i) {
$ord = ord($input[$i]);
if (!isset($distribution[$ord])) {
$distribution[$ord] = 0;
}
$distribution[$ord]++;
}
arsort($distribution, SORT_NUMERIC);
return $distribution;
}
And this is what I got:
Array
(
[58] => 281
[34] => 236
[101] => 181
[115] => 132
[59] => 129
[105] => 101
[48] => 88
[49] => 83
[116] => 76
[110] => 76
[97] => 70
[100] => 69
[50] => 66
[114] => 61
[51] => 60
[99] => 55
[53] => 52
[52] => 50
[117] => 49
[56] => 47
...skipped for brevity...
)
---------------------------------------------------------------------
Array
(
[245] => 21
[222] => 19
[240] => 18
[6] => 18
[252] => 17
[55] => 16
[106] => 15
[201] => 15
[79] => 14
[133] => 14
[56] => 14
[119] => 14
[29] => 14
[117] => 14
[164] => 13
[118] => 13
[157] => 13
[104] => 13
[196] => 13
[241] => 13
...skipped for brevity...
)
I advise you to run it and see for yourself: while the distribution in the JSON part shows high peaks for frequently repeated characters, the distribution for the gibberish is much more even.
That is a strong indicator of encrypted data.
So the problem is not with the payload, but with deciphering it.
It occurred to me that the same common problem arises when uploading a zip file over FTP. If you use text mode in your FTP client, line-ending characters that happen to occur in the encrypted data may be irreversibly altered by conversion. This most often happens on Windows because it uses different line endings than Unix/Linux.
So my suggestion is to check the code that downloads the encrypted data and see whether a binary mode can be enabled somewhere. For example, if you use fopen, the read mode should be 'rb' ('b' for binary-safe), not just 'r'. If you use a different transport, it's up to you to inspect it.
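As a minimal sketch of the binary-mode suggestion (readBinary and the sample bytes are hypothetical, and the vendor's script may not use fopen at all), a round trip like this can confirm that no byte is altered on the way in:

```php
<?php
// Hypothetical helper: read a file with binary mode requested explicitly.
function readBinary(string $path): string
{
    $handle = fopen($path, 'rb'); // 'b' asks for no newline translation
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $bytes = stream_get_contents($handle);
    fclose($handle);
    return $bytes;
}

// Made-up sample containing CR, LF and other bytes that text mode may touch.
$sample = "\x00\x0D\x0A\x1A\xFF";
$path = tempnam(sys_get_temp_dir(), 'bin');
file_put_contents($path, $sample);
$roundTrip = readBinary($path);
unlink($path);

var_dump($roundTrip === $sample);
```

Comparing strlen() or md5() of the received data on the Windows box and on the working CentOS box is another quick way to see whether the bytes already differ before decryption.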

Converting Base64 encoded tab delimited file in PHP

I am writing a reporting app that needs to consume logs which have been stored in the DB as base 64 encoded strings. I am able to decode them no problem, however, I am having some trouble getting them to be fed into str_getcsv() properly.
Below are the data I am working with, the code, and the outputs. It seems to me that once decoded the files are not recognizable as tab-delimited. However, if I decode the data with this URL and save it as a text file, I can open it properly in Excel.
https://www.base64decode.org/
In PHP however, it seems to be an issue with recognizing some of the tabs and the line breaks seem to completely go away. I think it has to do with the encoding, the DB table and column are both UTF-8. They are being recognized as ASCII - which is a subset of UTF-8, but I am not sure if they need to be explicitly UTF-8 for it to work (the site that works uses UTF-8).
The code: very simple (though at this point I may be going overboard with the encoding)
// get the stored result (laravel eloquent)
$media_result = MediaResult::where("video_id", "=", $media_benchmark->id)->firstOrFail();
# decode the access_log stored as b64 string
$tab_file = base64_decode(mb_convert_encoding($media_result->access_log, "UTF-8"));
$encoding = mb_detect_encoding($tab_file); // I was using iconv() so I grabbed this - it is always ASCII
$new_file = mb_convert_encoding($tab_file,'UTF-8');
$encoding_new = mb_detect_encoding($new_file);
#if I were to echo both encoding variables, it would be ASCII - no matter what I do.
# convert the supposed tab-delimited file into an array
$full_stats = str_getcsv($new_file, 0, "\t");
Here is a sample base64 encoded log:
VVJJCXNlcnZlckFkZHJlc3MJbnVtYmVyT2ZTZXJ2ZXJBZGRyZXNzQ2hhbmdlcwltZWRpYVJlcXVlc3RzV1dBTgl0cmFuc2ZlckR1cmF0aW9uCW51bWJlck9mQnl0ZXNUcmFuc2ZlcnJlZAludW1iZXJPZk1lZGlhUmVxdWVzdHMJcGxheWJhY2tTdGFydERhdGUJcGxheWJhY2tTZXNzaW9uSUQJcGxheWJhY2tTdGFydE9mZnNldAlwbGF5YmFja1R5cGUJc3RhcnR1cFRpbWUJZHVyYXRpb25XYXRjaGVkCW51bWJlck9mRHJvcHBlZFZpZGVvRnJhbWVzCW51bWJlck9mU3RhbGxzCW51bWJlck9mU2VnbWVudHNEb3dubG9hZGVkCXNlZ21lbnRzRG93bmxvYWRlZER1cmF0aW9uCWRvd25sb2FkT3ZlcmR1ZQlvYnNlcnZlZEJpdHJhdGVTdGFuZGFyZERldmlhdGlvbglvYnNlcnZlZE1heEJpdHJhdGUJb2JzZXJ2ZWRNaW5CaXRyYXRlCXN3aXRjaEJpdHJhdGUJaW5kaWNhdGVkQml0cmF0ZQlvYnNlcnZlZEJpdHJhdGUKaHR0cDovL3Zldm9wbGF5bGlzdC1saXZlLmhscy5hZGFwdGl2ZS5sZXZlbDMubmV0L3Zldm8vY2gxLzAxL3Byb2dfaW5kZXgubTN1OAk4LjI1NC4yMy4yNTQJMAkwCTAuNjc4MjgwNzA5CTEwOTk2MTIJMwkyMDE2LTA1LTEwIDE5OjIxOjE4ICswMDAwCTdBMTI5MERDLTE2MzAtNDlGQy1BQTY0LUNDNzZDMTgxQzcyQQk0MglMSVZFCTAuMjUzMjk3OTg0NjAwMDY3MQkxNi4wODMyNjU5NjAyMTY1MgkwCTAJMwkxOAkwCS0xCTI1NTcyOTAxLjM4MzMwNzg3CTE4MjA3OTg3LjMyODUyNTkJMTAxMTU1NDguNzgzODE4MjUJNDkyMDAwCTIxMDI1OTU1LjA1Mzg4OTI0Cmh0dHA6Ly92ZXZvcGxheWxpc3QtbGl2ZS5obHMuYWRhcHRpdmUubGV2ZWwzLm5ldC92ZXZvL2NoMS8wNi9wcm9nX2luZGV4Lm0zdTgJOC4yNTMuMzIuMTI2CTgJMAkzNS43NDAxNjM2MjIJMTIzNDgxOTcyCTQzCTIwMTYtMDUtMTAgMTk6MjE6MzQgKzAwMDAJN0ExMjkwREMtMTYzMC00OUZDLUFBNjQtQ0M3NkMxODFDNzJBCTU4LjAyODk5NDM1OAlMSVZFCTAJMjQxLjkyNjk3NTk2NTQ5OTkJMAkwCTQzCTI1OAkwCS0xCTQ2ODg1OTAzLjAzNTk4OTkzCTEwODA3NDU3LjM4MjQwNjY3CS0xCTQwMDAwMDAJMzE3ODIzNjAuNjE0NTI4NjM=
Here is the same string decoded:
URI serverAddress numberOfServerAddressChanges mediaRequestsWWAN transferDuration numberOfBytesTransferred numberOfMediaRequests playbackStartDate playbackSessionID playbackStartOffset playbackType startupTime durationWatched numberOfDroppedVideoFrames numberOfStalls numberOfSegmentsDownloaded segmentsDownloadedDuration downloadOverdue observedBitrateStandardDeviation observedMaxBitrate observedMinBitrate switchBitrate indicatedBitrate observedBitrate http://vevoplaylist-live.hls.adaptive.level3.net/vevo/ch1/01/prog_index.m3u8 8.254.23.254 0 0 0.678280709 1099612 3 2016-05-10 19:21:18 +0000 7A1290DC-1630-49FC-AA64-CC76C181C72A 42 LIVE 0.2532979846000671 16.08326596021652 0 0 3 18 0 -1 25572901.38330787 18207987.3285259 10115548.78381825 492000 21025955.05388924 http://vevoplaylist-live.hls.adaptive.level3.net/vevo/ch1/06/prog_index.m3u8 8.253.32.126 8 0 35.740163622 123481972 43 2016-05-10 19:21:34 +0000 7A1290DC-1630-49FC-AA64-CC76C181C72A 58.028994358 LIVE 0 241.9269759654999 0 0 43 258 0 -1 46885903.03598993 10807457.38240667 -1 4000000 31782360.61452863
Finally, here is the resulting array:
Array ( [0] => URI serverAddress numberOfServerAddressChanges mediaRequestsWWAN transferDuration numberOfBytesTransferred numberOfMediaRequests playbackStartDate playbackSessionID playbackStartOffset playbackType startupTime durationWatched numberOfDroppedVideoFrames numberOfStalls numberOfSegmentsDownloaded segmentsDownloadedDuration downloadOverdue observedBitrateStandardDeviation observedMaxBitrate observedMinBitrate switchBitrate indicatedBitrate observedBitrate http://vevoplaylist-live.hls.adaptive.level3.net/vevo/ch1/ [1] => 1/prog_index.m3u8 8.254.23.254 [2] => 0 [3] => .67828 [4] => 7 [5] => 9 1 [6] => 99612 3 2 [7] => 16- [8] => 5-1 [9] => 19:21:18 + [10] => [11] => [12] => [13] => 7A1290DC-1630-49FC-AA64-CC76C181C72A42 LIVE [14] => .2532979846 [15] => [16] => [17] => 671 16. [18] => 8326596 [19] => 21652 [20] => 03 18 [21] => -1255729 [22] => 1.3833 [23] => 787 182 [24] => 7987.3285259 1 [25] => 115548.78381825 492 [26] => [27] => [28] => 21025955.05388924 http://vevoplaylist-live.hls.adaptive.level3.net/vevo/ch1/06/prog_index.m3u88.253.32.126 8 [29] => 35.740163622123481972 43 2 [30] => 16- [31] => 5-1 [32] => 19:21:34 + [33] => [34] => [35] => [36] => 7A1290DC-1630-49FC-AA64-CC76C181C72A58. [37] => 28994358 LIVE [38] => 241.9269759654999 [39] => 043 258 [40] => -1468859 [41] => 3. [42] => 3598993 1 [43] => 8 [44] => 7457.3824 [45] => 667 -1 4 [46] => [47] => [48] => [49] => [50] => [51] => 31782360.61452863 )
Keep in mind that str_getcsv()
parses only one line of a csv file
expects the delimiter "\t" to be the second parameter, not the third
You probably want something like:
$full_stats = [];
foreach(explode("\n", $decoded) as $line) {
$full_stats[] = str_getcsv($line, "\t");
}
var_dump($full_stats);
This will output an array containing 3 arrays (aka rows) containing 24 items (aka columns) each.
See http://sandbox.onlinephpfunctions.com/code/1ccf5115df6f8c342ff7c7e451f3ea26e081197e for working example and generated output.
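Building on the loop above, here is a hedged sketch that also keys each row by the header names. The function name parseTabDelimited, the shortened three-column sample, and the example.com URLs are assumptions for illustration, not the real log:

```php
<?php
// Hypothetical, shortened stand-in for the access_log column (three columns only).
$stored = base64_encode(
    "URI\tserverAddress\tplaybackType\n" .
    "http://example.com/01/prog_index.m3u8\t8.254.23.254\tLIVE\n" .
    "http://example.com/06/prog_index.m3u8\t8.253.32.126\tLIVE\n"
);

function parseTabDelimited(string $text): array
{
    // Split on any line-break style, then parse each line separately.
    $lines = preg_split('/\R/', rtrim($text, "\r\n"));
    $header = str_getcsv(array_shift($lines), "\t", '"', '\\');
    $rows = [];
    foreach ($lines as $line) {
        // Key each value by its column name from the header row.
        $rows[] = array_combine($header, str_getcsv($line, "\t", '"', '\\'));
    }
    return $rows;
}

$rows = parseTabDelimited(base64_decode($stored));
echo $rows[1]['serverAddress'], "\n"; // 8.253.32.126
```

Note that array_combine assumes every line has as many fields as the header; a malformed line would need separate handling.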
Regarding the import of data that contains line breaks, you should switch to fgetcsv(), which handles line breaks inside quoted fields correctly:
$csv = <<< eot
"first","my data
with line breaks"
"second", "simple data"
eot;
// We need to "convert" the string to a file handle
$fp = fopen('data://text/plain,' . $csv,'r');
while ($data = fgetcsv($fp)) {
var_dump($data);
}

preg match all in array searching for [ ]

I have found the solution myself using:
foreach ($output as $value) {
if (strpos($value, "]:") !== false) {
$tal = substr($value, strpos($value, "]:") +3) . "<br>";
echo $tal;
}
}
this returns:
-210
-212
Thanks in advance.
I want to use preg so that I only get the line: [10] => [147]: -210
or both [10] => [147]: -210 and [21] => [148]: -212
How can I match [147]: with preg, or is there a better way to get the specific information?
My array $output contains:
Array
(
[0] => modpoll 3.4 - FieldTalk(tm) Modbus(R) Master Simulator
[1] => Copyright (c) 2002-2013 proconX Pty Ltd
[2] => Visit http://www.modbusdriver.com for Modbus libraries and tools.
[3] =>
[4] => Protocol configuration: MODBUS/TCP
[5] => Slave configuration...: address = 1, start reference = 147, count = 1
[6] => Communication.........: 10.234.6.11, port 502, t/o 1.00 s, poll rate 1000 ms
[7] => Data type.............: 16-bit register, output (holding) register table
[8] =>
[9] => -- Polling slave...
[10] => [147]: -210
[11] => modpoll 3.4 - FieldTalk(tm) Modbus(R) Master Simulator
[12] => Copyright (c) 2002-2013 proconX Pty Ltd
[13] => Visit http://www.modbusdriver.com for Modbus libraries and tools.
[14] =>
[15] => Protocol configuration: MODBUS/TCP
[16] => Slave configuration...: address = 1, start reference = 148, count = 1
[17] => Communication.........: 10.234.6.11, port 502, t/o 1.00 s, poll rate 1000 ms
[18] => Data type.............: 16-bit register, output (holding) register table
[19] =>
[20] => -- Polling slave...
[21] => [148]: -212
)
$matches = preg_grep ('/^[147] (\w+)/i', $output);
print_r ($matches);
//only returns Array()
You need to escape the opening square bracket, because [147] is seen as a character class that contains 1, 4 and 7.
You can do this with:
$result=preg_grep('~^\[14(?:7|8)]:~',$output);
print_r($result);
You can find all that you want to know about escaping (or not) square brackets here
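If the register number and the value are both needed, a capture-based variant may be handy. This is a minimal sketch; the shortened $output array and the $readings name are made up for the example:

```php
<?php
// Shortened, made-up version of the modpoll output from the question.
$output = [
    '-- Polling slave...',
    '[147]: -210',
    'Slave configuration...: address = 1, start reference = 148, count = 1',
    '[148]: -212',
];

$readings = [];
foreach ($output as $line) {
    // \[ and \] match literal brackets; the two groups capture register and value.
    if (preg_match('/^\[(\d+)\]:\s*(-?\d+)$/', $line, $m)) {
        $readings[(int) $m[1]] = (int) $m[2];
    }
}

print_r($readings);
```

This yields an array keyed by register (147 and 148 here), which avoids the substr/strpos arithmetic.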

Can't read csv(Tab delimited) properly

I have a simple CSV file which is tab-delimited. I have to use it as it is because it comes from elsewhere, and I have to read it and insert it into my DB. I have used some simple PHP code to read it:
if(($handle = fopen("var/import/MMT29DEC.csv","r"))!==FALSE){
/*Skip the first row*/
fgetcsv($handle, 1000,chr(9));
while(($data = fgetcsv($handle,1000,chr(9)))!==FALSE){
print_r($data[0]);
}
}
When I print_r the data, it shows:
Array ( [0] => 01SATAPC [1] => 40ATAPC [2] => [3] => 21P [4] => SERIAL ATA POWER CABLE [5] => 0.00 [6] => 2.00 [7] => 0 [8] => Power Supplies [9] => SERIAL ATA POWER CABLE [10] =>
4 TO 15 PIN 160MM
[11] => [12] => [13] => [14] => MELBHO [15] => 0.000 [16] => [17] => Order to Order [18] => 4 [19] => 2013-01-18 )
This is the desired result, but when I access a particular column value using $data[index], e.g. $data[8] or $data[1], it weirdly gives me garbage values. For some iterations it gives me the right values, but after 10-15 rows it starts giving me numbers and other columns' values. I don't know what is going on. As far as I know it should be a formatting issue; I have tried opening the file in Excel and it comes out fine.
@ravisoni are you sure that the second parameter to fgetcsv, 1000, is longer than the longest line in your file? Try setting it to 0, as the docs say [php.net/fgetcsv], and see if that makes a difference.
if(($handle = fopen("var/import/MMT29DEC.csv","r"))!==FALSE){
/*Skip the first row*/
fgetcsv($handle, 0,chr(9));
while(($data = fgetcsv($handle,0,chr(9)))!==FALSE){
print_r($data[0]);
}
}
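One way to narrow this down, as a hedged sketch: read with no length limit and flag every record whose column count differs from the header. The function name readTabFile and the sample data are made up for illustration.

```php
<?php
// Hypothetical helper: collect well-formed rows and report the rest.
function readTabFile(string $path): array
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    // Length 0 means "no limit", so long lines are never split.
    $header = fgetcsv($handle, 0, "\t", '"', '\\');
    $rows = [];
    $badRecords = [];
    $recordNo = 1; // the header is record 1
    while (($data = fgetcsv($handle, 0, "\t", '"', '\\')) !== false) {
        $recordNo++;
        if (count($data) === count($header)) {
            $rows[] = $data;
        } else {
            $badRecords[] = $recordNo; // column count does not match the header
        }
    }
    fclose($handle);
    return ['rows' => $rows, 'bad' => $badRecords];
}

// Made-up sample: the last record is one column short.
$path = tempnam(sys_get_temp_dir(), 'tsv');
file_put_contents($path, "code\tname\tqty\nA1\tCable\t2\nB2\tBroken\n");
$result = readTabFile($path);
unlink($path);

print_r($result['bad']);
```

Records that land in the 'bad' list are the ones to inspect by hand, for example for an unquoted line break inside a field.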

How to check is timezone identifier valid from code?

I'll try to explain what the problem is.
From the list of supported timezones in the PHP manual, I can see all valid TZ identifiers in PHP.
My first question is how to get that list from code, but that's not what I really need.
My final goal is to write a function isValidTimezoneId() that returns TRUE if the timezone is valid and FALSE otherwise.
function isValidTimezoneId($timezoneId) {
# ...function body...
return ?; # TRUE or FALSE
}
So, when I pass a TZ identifier (string) to the function as $timezoneId, I need a boolean result.
Well, here is what I have so far...
1) Solution using the @ operator
The first solution I've got is something like this:
function isValidTimezoneId($timezoneId) {
$savedZone = date_default_timezone_get(); # save current zone
$res = $savedZone == $timezoneId; # it's TRUE if param matches current zone
if (!$res) { # or...
@date_default_timezone_set($timezoneId); # try to set new timezone
$res = date_default_timezone_get() == $timezoneId; # it's true if new timezone set matches param string.
}
date_default_timezone_set($savedZone); # restore back old timezone
return $res; # set result
}
That works perfectly, but I want another solution (to avoid trying to set a wrong timezone).
2) Solution using timezone_identifiers_list()
Then I tried to get the list of valid timezone identifiers and check the parameter against it using in_array(). At first sight, timezone_identifiers_list() (an alias of DateTimeZone::listIdentifiers()) was exactly what I was looking for, but it turned out not to be good enough because a lot of timezones were missing from the array it returns.
function isValidTimezoneId($timezoneId) {
$zoneList = timezone_identifiers_list(); # list of (all) valid timezones
return in_array($timezoneId, $zoneList); # set result
}
This code looks nice and easy, but then I found that the $zoneList array contains ~400 elements. According to my calculations it should contain 550+ elements, so 150+ elements are missing. That's not good enough as a solution to my problem.
3) Solution based on DateTimeZone::listAbbreviations()
This is the last step on my way to finding the perfect solution. Using the array returned by this method, I can extract all timezone identifiers supported by PHP.
function createTZlist() {
$tza = DateTimeZone::listAbbreviations();
$tzlist = array();
foreach ($tza as $zone)
foreach ($zone as $item)
if (is_string($item['timezone_id']) && $item['timezone_id'] != '')
$tzlist[] = $item['timezone_id'];
$tzlist = array_unique($tzlist);
asort($tzlist);
return array_values($tzlist);
}
This function returns 563 elements (in Example #2 I've got just 407).
I've tried to find differences between those two arrays:
$a1 = timezone_identifiers_list();
$a2 = createTZlist();
print_r(array_values(array_diff($a2, $a1)));
Result is:
Array
(
[0] => Africa/Asmera
[1] => Africa/Timbuktu
[2] => America/Argentina/ComodRivadavia
[3] => America/Atka
[4] => America/Buenos_Aires
[5] => America/Catamarca
[6] => America/Coral_Harbour
[7] => America/Cordoba
[8] => America/Ensenada
[9] => America/Fort_Wayne
[10] => America/Indianapolis
[11] => America/Jujuy
[12] => America/Knox_IN
[13] => America/Louisville
[14] => America/Mendoza
[15] => America/Porto_Acre
[16] => America/Rosario
[17] => America/Virgin
[18] => Asia/Ashkhabad
[19] => Asia/Calcutta
[20] => Asia/Chungking
[21] => Asia/Dacca
[22] => Asia/Istanbul
[23] => Asia/Katmandu
[24] => Asia/Macao
[25] => Asia/Saigon
[26] => Asia/Tel_Aviv
[27] => Asia/Thimbu
[28] => Asia/Ujung_Pandang
[29] => Asia/Ulan_Bator
[30] => Atlantic/Faeroe
[31] => Atlantic/Jan_Mayen
[32] => Australia/ACT
[33] => Australia/Canberra
[34] => Australia/LHI
[35] => Australia/NSW
[36] => Australia/North
[37] => Australia/Queensland
[38] => Australia/South
[39] => Australia/Tasmania
[40] => Australia/Victoria
[41] => Australia/West
[42] => Australia/Yancowinna
[43] => Brazil/Acre
[44] => Brazil/DeNoronha
[45] => Brazil/East
[46] => Brazil/West
[47] => CET
[48] => CST6CDT
[49] => Canada/Atlantic
[50] => Canada/Central
[51] => Canada/East-Saskatchewan
[52] => Canada/Eastern
[53] => Canada/Mountain
[54] => Canada/Newfoundland
[55] => Canada/Pacific
[56] => Canada/Saskatchewan
[57] => Canada/Yukon
[58] => Chile/Continental
[59] => Chile/EasterIsland
[60] => Cuba
[61] => EET
[62] => EST
[63] => EST5EDT
[64] => Egypt
[65] => Eire
[66] => Etc/GMT
[67] => Etc/GMT+0
[68] => Etc/GMT+1
[69] => Etc/GMT+10
[70] => Etc/GMT+11
[71] => Etc/GMT+12
[72] => Etc/GMT+2
[73] => Etc/GMT+3
[74] => Etc/GMT+4
[75] => Etc/GMT+5
[76] => Etc/GMT+6
[77] => Etc/GMT+7
[78] => Etc/GMT+8
[79] => Etc/GMT+9
[80] => Etc/GMT-0
[81] => Etc/GMT-1
[82] => Etc/GMT-10
[83] => Etc/GMT-11
[84] => Etc/GMT-12
[85] => Etc/GMT-13
[86] => Etc/GMT-14
[87] => Etc/GMT-2
[88] => Etc/GMT-3
[89] => Etc/GMT-4
[90] => Etc/GMT-5
[91] => Etc/GMT-6
[92] => Etc/GMT-7
[93] => Etc/GMT-8
[94] => Etc/GMT-9
[95] => Etc/GMT0
[96] => Etc/Greenwich
[97] => Etc/UCT
[98] => Etc/UTC
[99] => Etc/Universal
[100] => Etc/Zulu
[101] => Europe/Belfast
[102] => Europe/Nicosia
[103] => Europe/Tiraspol
[104] => Factory
[105] => GB
[106] => GB-Eire
[107] => GMT
[108] => GMT+0
[109] => GMT-0
[110] => GMT0
[111] => Greenwich
[112] => HST
[113] => Hongkong
[114] => Iceland
[115] => Iran
[116] => Israel
[117] => Jamaica
[118] => Japan
[119] => Kwajalein
[120] => Libya
[121] => MET
[122] => MST
[123] => MST7MDT
[124] => Mexico/BajaNorte
[125] => Mexico/BajaSur
[126] => Mexico/General
[127] => NZ
[128] => NZ-CHAT
[129] => Navajo
[130] => PRC
[131] => PST8PDT
[132] => Pacific/Ponape
[133] => Pacific/Samoa
[134] => Pacific/Truk
[135] => Pacific/Yap
[136] => Poland
[137] => Portugal
[138] => ROC
[139] => ROK
[140] => Singapore
[141] => Turkey
[142] => UCT
[143] => US/Alaska
[144] => US/Aleutian
[145] => US/Arizona
[146] => US/Central
[147] => US/East-Indiana
[148] => US/Eastern
[149] => US/Hawaii
[150] => US/Indiana-Starke
[151] => US/Michigan
[152] => US/Mountain
[153] => US/Pacific
[154] => US/Pacific-New
[155] => US/Samoa
[156] => Universal
[157] => W-SU
[158] => WET
[159] => Zulu
)
This list contains all valid TZ identifiers that Example #2 failed to match.
There are four more TZ identifiers, present only in $a1:
print_r(array_values(array_diff($a1, $a2)));
Output
Array
(
[0] => America/Bahia_Banderas
[1] => Antarctica/Macquarie
[2] => Pacific/Chuuk
[3] => Pacific/Pohnpei
)
So now I have an almost perfect solution:
function isValidTimezoneId($timezoneId) {
$zoneList = createTZlist(); # list of all valid timezones (last 4 are not included)
return in_array($timezoneId, $zoneList); # set result
}
That's my solution and I can use it. Of course, I use this function as part of a class, so I don't need to generate $zoneList on every method call.
What do I really need here?
I'm wondering whether there is an easier (quicker) way to get a list of all valid timezone identifiers as an array (I want to avoid extracting that list from DateTimeZone::listAbbreviations() if possible). Or, if you know another way to check whether a timezone parameter is valid, please let me know (I repeat, the @ operator can't be part of the solution).
P.S. If you need more details and examples, let me know. I guess you don't.
I'm using PHP 5.3.5 (I think that's not important).
Update
Any code that fails on an invalid timezone string, whether the error is hidden using @ or caught using a try..catch block, is not the solution I'm looking for.
Another update
I've put a small bounty on this question!
Now I'm looking for the easiest way to extract a list of all timezone identifiers into a PHP array.
Why not use the @ operator?
This code works pretty well, and you don't change the default timezone:
function isValidTimezoneId($timezoneId) {
    // @ suppresses the warning that timezone_open() raises for an unknown ID
    $tz = @timezone_open($timezoneId);
    return $tz !== FALSE;
}
If you don't want @, you can do:
function isValidTimezoneId($timezoneId) {
try{
new DateTimeZone($timezoneId);
}catch(Exception $e){
return FALSE;
}
return TRUE;
}
Your solution works fine, so if it's speed you're looking for, I would look more closely at what you're doing with your arrays. I've timed a few thousand trials to get reasonable average times, and these are the results:
createTZlist : 20,713 microseconds per run
createTZlist2 : 13,848 microseconds per run
Here's the faster function:
function createTZList2()
{
$out = array();
$tza = timezone_abbreviations_list();
foreach ($tza as $zone)
{
foreach ($zone as $item)
{
$out[$item['timezone_id']] = 1;
}
}
unset($out['']);
ksort($out);
return array_keys($out);
}
The if test is faster if you reduce it to just if ($item['timezone_id']), but rather than running it 489 times to catch a single case, it's quicker to unset the empty key afterwards.
Setting hash keys allows us to skip the array_unique() call, which is more expensive. Sorting the keys and then extracting them is a tiny bit faster than extracting them and then sorting the extracted list.
If you drop the sorting (which is not needed unless you're comparing the list), it gets down to 12,339 microseconds.
But really, you don't need to return the keys anyway. Looking at isValidTimezoneId() as a whole, you'd be better off doing this:
function isValidTimezoneId2($tzid)
{
$valid = array();
$tza = timezone_abbreviations_list();
foreach ($tza as $zone)
{
foreach ($zone as $item)
{
$valid[$item['timezone_id']] = true;
}
}
unset($valid['']);
// isset() avoids an "undefined index" notice when $tzid is not a known ID
return isset($valid[$tzid]);
}
That is, assuming you only need to test once per execution; otherwise you'd want to save $valid after the first run. This approach avoids having to do a sort or convert the keys to values, key lookups are faster than in_array() searches, and there's no extra function call.
This brings it reliably down to under 12ms on my test machine, almost half the time of your example. A fun experiment in micro-optimizations!
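If the check runs more than once per request, the "save $valid after the first run" idea can be sketched with a static variable. This is only an illustration: the function name isValidTimezoneIdCached is made up here, and unlike the version above it skips empty or null IDs while building the map instead of unsetting the empty key afterwards.

```php
<?php
// Hypothetical memoized variant: the map is built once per request.
function isValidTimezoneIdCached($tzid)
{
    static $valid = null;

    if ($valid === null) {
        $valid = array();
        foreach (timezone_abbreviations_list() as $zone) {
            foreach ($zone as $item) {
                // Some entries carry an empty or null timezone_id; skip them.
                if (is_string($item['timezone_id']) && $item['timezone_id'] !== '') {
                    $valid[$item['timezone_id']] = true;
                }
            }
        }
    }

    // isset() is a plain key lookup and never raises a notice.
    return is_string($tzid) && isset($valid[$tzid]);
}
```

Only the first call pays for the nested loops; later calls are a single hash lookup.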
When I tried this on a Linux system running 5.3.6, your Example #2 gave me 411 zones and Example #3 gave 496. The following slight modification to Example #2 gives me 591:
$zoneList = timezone_identifiers_list(DateTimeZone::ALL_WITH_BC);
There are no zones returned by Example #3 that are not returned with that modified Example #2.
On an OS X system running 5.3.3, Example #2 gives 407, Example #3 gives 564, and the modified Example #2 gives 565. Again, there are no zones returned by Example #3 that are not returned with that modified Example #2.
On a Linux system running 5.2.6 with the timezonedb PECL extension installed, Example #2 gives me 571 zones and Example #3 gives me only 488. There are no zones returned by Example #3 that are not returned by Example #2 on this system. The constant DateTimeZone::ALL_WITH_BC does not seem to exist in 5.2.6; it was probably added in 5.3.0.
So it seems the simplest way to get a list of all time zones in 5.3.x is
timezone_identifiers_list(DateTimeZone::ALL_WITH_BC), and in 5.2.x is timezone_identifiers_list(). The simplest (if not fastest) way to check if a particular string is a valid time zone is still probably @timezone_open($timezoneId) !== false.
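The two cases can be folded into one helper. The following is a rough sketch, not a tested drop-in: the names allTimezoneIdMap and isKnownTimezoneId are invented for this example, and the exact set of identifiers still depends on the timezone database bundled with your PHP build.

```php
<?php
// Hypothetical helper: returns a lookup map (identifier => index) of every
// identifier this PHP build knows about.
function allTimezoneIdMap()
{
    static $map = null;

    if ($map === null) {
        // ALL_WITH_BC is not available on 5.2.x, so fall back to the
        // plain list when the constant is missing.
        $ids = defined('DateTimeZone::ALL_WITH_BC')
            ? timezone_identifiers_list(DateTimeZone::ALL_WITH_BC)
            : timezone_identifiers_list();
        $map = array_flip($ids);
    }

    return $map;
}

// Hypothetical validity check that needs neither @ nor try..catch.
function isKnownTimezoneId($timezoneId)
{
    $map = allTimezoneIdMap();
    return is_string($timezoneId) && isset($map[$timezoneId]);
}
```

Flipping the list once means each later check is a key lookup rather than an in_array() scan.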
If you're on Linux, most if not all timezone information is stored in /usr/share/zoneinfo/. You can walk over the files there using is_file() and related functions.
You can also run zdump on those files to get the codes, or fetch the sources for these files and grep/cut out the info you need. Again, you are not obliged to use built-in functions to accomplish the task. There is no reason why anyone should force you to use only the built-in date functions.
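Walking the directory could look roughly like the sketch below. The function name listZoneinfoIds is hypothetical, and the approach assumes that compiled zone files start with the magic bytes "TZif", which is how it tells them apart from text files such as zone.tab. On some systems the directory also holds posix/ and right/ subtrees, which would show up with those prefixes unless you filter them out.

```php
<?php
// Hypothetical helper: collects identifiers by walking a zoneinfo directory.
function listZoneinfoIds($baseDir = '/usr/share/zoneinfo')
{
    $ids = array();
    if (!is_dir($baseDir)) {
        return $ids;
    }

    $baseDir = rtrim($baseDir, '/');
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($baseDir, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($files as $file) {
        if (!$file->isFile()) {
            continue;
        }
        // Read only the first four bytes and keep files with the TZif magic.
        $magic = file_get_contents($file->getPathname(), false, null, 0, 4);
        if ($magic !== 'TZif') {
            continue;
        }
        // The identifier is the path relative to the base directory.
        $ids[] = substr($file->getPathname(), strlen($baseDir) + 1);
    }

    sort($ids);
    return $ids;
}
```

The list reflects the operating system's timezone data, which is not necessarily the same version as the database compiled into PHP.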
See php_date.c and timezonemap.h in the PHP sources; that's why I can say this information is effectively 100% static (but per PHP build).
If you want to get it dynamically, use timezone_abbreviations_list(), as DateTimeZone::listAbbreviations() maps to it.
As you can see, all these values are just a list that is filled in once for the current PHP version.
So a much faster solution is simple: prepare a static file with the retrieved IDs once per server, during installation of your app, and use it.
For example:
function isValidTZ($zone) {
static $zones = null;
if (null === $zones) {
include $YOUR_APP_STORAGE . '/tz_list.php';
}
// isset is muuuch faster than array_key_exists and also than in_array
// so you should work with structure like [key => 1]
return isset($zones[$zone]);
}
tz_list.php should be like this:
<?php
$zones = array(
'Africa/Abidjan' => 1,
'Africa/Accra' => 1,
'Africa/Addis_Ababa' => 1,
// ...
);
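Generating that file at install time could be sketched as follows. The function name writeTimezoneListFile is made up for this example, and it uses timezone_identifiers_list() as the source of IDs; swap in whichever list you settled on.

```php
<?php
// Hypothetical generator for tz_list.php; run it once at install time.
function writeTimezoneListFile($path)
{
    $zones = array();
    foreach (timezone_identifiers_list() as $id) {
        $zones[$id] = 1; // [key => 1] structure, so lookups can use isset()
    }

    // var_export() produces valid PHP source for the array.
    $code = "<?php\n\$zones = " . var_export($zones, true) . ";\n";

    return file_put_contents($path, $code) !== false;
}
```

Including the generated file defines $zones in the including scope, which is what isValidTZ() above relies on.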
I would research what causes the complete array to change and use a basic caching mechanism (for example, store the array in a file that you include and update when needed). You're currently optimizing the building of an array that is static for 99.9999% of all requests.
Edit:
Ok, static/dynamic.
if( !function_exists('timezone_version_get') )
{
function timezone_version_get() { return '2009.6'; }
}
include 'tz_list_' . PHP_VERSION . '_' . timezone_version_get() . '.php';
Now each time the php version is updated, the file should be regenerated automatically by your code.
In the case of PHP < 5.3, how about this?
public static function is_valid_timezone($timezone)
{
$now_timezone = @date_default_timezone_get();
$result = @date_default_timezone_set($timezone);
if( $now_timezone ){
// set back to current timezone
date_default_timezone_set($now_timezone);
}
return $result;
}
Just an addendum to Cal's excellent answer. I think the following might be even faster...
function isValidTimezoneID($tzid) {
if (empty($tzid)) {
return false;
}
foreach (timezone_abbreviations_list() as $zone) {
foreach ($zone as $item) {
if ($item["timezone_id"] == $tzid) {
return true;
}
}
}
return false;
}
