Text to Hex conversion in php is inaccurate

I'm trying to convert a text string to hexadecimal in PHP (which sounds trivial enough), but every conversion I have tried outputs incorrect data.
The string I need to convert is:
RTP1 •. • A ¥;¥9ÈKJ| %¯ : E~WF 3HxI#Y¥
The correct result is:
525450310120209501022e2095204120030503040ba53b03040ba539c84b041f4a7c1120202025af032020203a20457e0357462033487849230459a52020202020
But I consistently get:
52545031012020e280a201022e20e280a2204120030503040bc2a53b03040bc2a539c3884b041f4a7c1120202025c2af032020203a20457e0357462033487849230459c2a52020202020
The online calculator at http://www.swingnote.com/tools/texttohex.php handles this string perfectly. I have emailed the author to request the PHP source code but have had no answer.
I've tried the following functions without success:
bin2hex($data);

function strToHex($string)
{
    $hex = '';
    for ($i = 0; $i < strlen($string); $i++)
    {
        $hex .= dechex(ord($string[$i]));
    }
    return $hex;
}

for ($i = 0; $i < strlen($string); $i++) {
    echo dechex(ord($string[$i]));
}
and a few others I can no longer find... I'm really at a loss with this so any help will be greatly appreciated!
Thanks!
Matthew

The input string appears to contain UTF-8 encoded characters (I say this based on the output: for example, ¥ shows up as the two bytes c2a5 instead of a5). Try converting these characters back into a single-byte encoding such as ISO-8859-1 before taking the hex.
$indat = utf8_decode("...");
$hexdata = bin2hex($indat);
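A minimal sketch of the same idea with mbstring, assuming the source string is UTF-8 and the target is Windows-1252. That target is a guess based on the expected output, which has 95 for the bullet: 0x95 is the bullet in Windows-1252, while ISO-8859-1 has no bullet, so utf8_decode() would emit ? (3f) there. utf8_decode() is also deprecated as of PHP 8.2. The function name is made up for illustration.

```php
<?php
// Hypothetical helper: convert UTF-8 text to single-byte Windows-1252,
// then hex-encode the resulting bytes.
function utf8ToCp1252Hex(string $utf8): string
{
    // Assumes the mbstring extension is loaded.
    $bytes = mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8');
    return bin2hex($bytes);
}

// The bullet (U+2022) and the yen sign (U+00A5) each become one byte.
echo utf8ToCp1252Hex("\u{2022} \u{00A5}"), "\n"; // 9520a5
```

If the real input is in some other single-byte code page, only the second argument of mb_convert_encoding() would change.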

I usually just process it one char at a time.
$str = 'My Cool String!';
$hex = '';
$str_ary = str_split($str);
foreach ($str_ary as $char)
{
    // %02x zero-pads each byte; plain dechex() would turn 0x01 into "1"
    $hex .= sprintf('%02x', ord($char));
}
echo $hex;
Edit:
Looking at it again, it looks like our code is very similar (didn't notice the code :\ ). I believe Jeff Parker has the right idea in the comment, it might just be a display issue.

Related

Parse UTF-8 string char-by-char in PHP

I'm sorry if I'm asking the obvious, but I can't seem to find a working solution for a simple task. As input I have a string, provided by a user, encoded as UTF-8. I need to sanitize it by removing all characters below 0x20 (space), except 0x9 (tab).
The following works for ANSI strings, but not for UTF-8:
$newName = "";
$ln = strlen($name);
for ($i = 0; $i < $ln; $i++)
{
    $ch = substr($name, $i, 1);
    $och = ord($ch);
    if ($och >= 0x20 ||
        $och == 0x9)
    {
        $newName .= $ch;
    }
}
It completely misses UTF-8 encoded characters and treats them as bytes. I keep finding posts where people suggest using mb_ functions, but that still doesn't help me. (For instance, I tried calling mb_strlen($name, "utf-8"); instead of strlen, but it still returns the length of the string in bytes instead of characters.)
Any idea how to do this in PHP?
PS. Sorry, my PHP is somewhat rusty.

If you use multibyte functions (mb_) then you have to use them for everything. In this example you should use mb_strlen() and mb_substr().
The reason it is not working is probably because you are using ord(). It only works with ASCII values:
ord
(PHP 4, PHP 5)
ord — Return ASCII value of character
...
Returns the ASCII value of the first character of string.
In other words, if you throw a multibyte character into ord() it will only use the first byte, and throw away the rest.
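As a rough sketch of what "use mb_ functions for everything" could look like for the filter in the question: the loop below counts, slices, and inspects characters with mb_strlen(), mb_substr(), and mb_ord() (available since PHP 7.2). The function name is hypothetical, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper: drop characters below U+0020 except tab (U+0009),
// walking the string one code point at a time.
function stripControlChars(string $name): string
{
    $out = '';
    $len = mb_strlen($name, 'UTF-8');           // length in characters, not bytes
    for ($i = 0; $i < $len; $i++) {
        $ch = mb_substr($name, $i, 1, 'UTF-8'); // one whole character
        $cp = mb_ord($ch, 'UTF-8');             // its Unicode code point
        if ($cp >= 0x20 || $cp === 0x09) {
            $out .= $ch;
        }
    }
    return $out;
}
```

For long strings a single preg_replace() with the /u modifier would likely be faster than a per-character loop; the loop is kept here because it mirrors the original code.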

Wow, PHP is one messed up language. Here's what worked for me (but how much slower will this run for a longer chunk of text...):
function normalizeName($name, $encoding_2_use, $encoding_used)
{
    //'$name' = string to normalize
    //          INFO: Must be encoded with '$encoding_used' encoding
    //'$encoding_2_use' = encoding to use for return string (example: "utf-8")
    //'$encoding_used' = encoding used to encode '$name' (can be also "utf-8")
    //RETURN:
    //      = Name normalized, or
    //      = "" if error
    $resName = "";
    $ln = mb_strlen($name, $encoding_used);
    if ($ln !== false)
    {
        for ($i = 0; $i < $ln; $i++)
        {
            $ch = mb_substr($name, $i, 1, $encoding_used);
            $arp = unpack('N', mb_convert_encoding($ch, 'UCS-4BE', $encoding_used));
            if (count($arp) >= 1)
            {
                $och = intval($arp[1]); //Index 1?! I don't understand why, but it works...
                if ($och >= 0x20 || $och == 0x9)
                {
                    $ch2 = mb_convert_encoding('&#'.$och.';', $encoding_2_use, 'HTML-ENTITIES');
                    $resName .= $ch2;
                }
            }
        }
    }
    return $resName;
}

How does youtube encode their urls?

Quick question.
How does YouTube encode its URLs? Take the one below:
http://www.youtube.com/watch?v=MhWyAL2hKlk
What are they doing to get the value MhWyAL2hKlk?
Are they using some kind of encryption and then decrypting at their end?
I want to do something similar on a website I am working on; the URL below looks horrible.
http://localhost:8888/example/account_player/?playlist=drum+and+bass+music
I would like to encode the URLs the way YouTube does, but I don't know how they do it.
Any advice?

Well, technically speaking, YouTube generates video IDs by using an algorithm. Honestly, I have no idea. It could be a hashsum of the entire video file + a salt using the current UNIX time, or it could be a base64 encoding of something unique to the video. But I do know that it's most likely not random, because if it were, the risk of collision would be too high.
For the sake of example, though, we'll assume that YouTube does generate random ID's. Keep in mind that when using randomly generated values to store something, it is generally a good idea to implement collision checking to ensure that a new object doesn't overwrite the existing one. In practice, though, I would recommend using a hashing algorithm, since they are one-way and very effective at preventing collisions.
So, I'm not very familiar with PHP. I had to write it in JavaScript first. Then, I ported it to PHP, which turned out to be relatively simple:
function randch($charset){
    return $charset[rand() % strlen($charset)];
}
function randstr($len, $charset = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-"){
    $out = [];
    for($i = 0; $i < $len; $i++){
        array_push($out, randch($charset));
    }
    return join("", $out);
}
What this does is generate a random string len characters long via the given charset.
Here's some sample output:
randstr(5) -> 1EWHd
randstr(30) -> atcUVgfhAmM5bXz-3jgyRoaVnnY2jD
randstr(30, "asdfASDF") -> aFSdSAfsfSdAsSSddFFSSsdasDDaDa
Though it's not a good idea to use such a short charset.
randstr(30, "asdf")
sdadfaafsdsdfsaffsddaaafdddfad
adaaaaaafdfaadsadsdafdsfdfsadd
dfaffafaaddfdddadasaaafsfssssf
randstr(30)
r5BbvJ45HEN6dWtNZc5ZvHGLCg4Qyq
50vKb1rh66WWf9RLZQY2QrMucoNicl
Mklh3zjuRqDOnVYeEY3B0V3Moia9Dn
Now let's say you have told the page to use this function to generate a random id for a video that was just uploaded, now you want to store this key in a table with a link to the relevant data to display the right page. If an id is requested via $_GET (e.g. /watch?v=02R0-1PWdEf), you can tell the page to check this key against the database containing the video ids, and if it finds a match, grab the data from that key, else give a 404.
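The collision check mentioned earlier could be sketched roughly as follows. Everything here is illustrative: the function names are made up, random_int() (PHP 7+) is swapped in for rand() because it draws from a cryptographically secure source without the slight bias of the modulo, and the $taken array stands in for a lookup against the real table of video IDs.

```php
<?php
const ID_CHARSET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-';

// Hypothetical helper: build a random ID of $len characters from $charset.
function randomId(int $len, string $charset = ID_CHARSET): string
{
    $max = strlen($charset) - 1;
    $out = '';
    for ($i = 0; $i < $len; $i++) {
        $out .= $charset[random_int(0, $max)];
    }
    return $out;
}

// Hypothetical helper: keep drawing until the ID is not already in use.
// $taken is an array keyed by existing IDs (a stand-in for a database query).
function uniqueId(array $taken, int $len = 11): string
{
    do {
        $id = randomId($len);
    } while (isset($taken[$id]));
    return $id;
}
```

With a 64-character alphabet and 11 characters the retry loop should almost never run twice, but the check keeps an existing row from being overwritten if it does.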
You can also encode directly to a base 64 string if you don't want it to be random. This can be done with base64_encode() and base64_decode(). For example, say you have the data for the video in one string $str="filename=apples.avi;owner=coolpixlol124", for whatever reason. base64_encode($str) will give you ZmlsZW5hbWU9YXBwbGVzLmF2aTtvd25lcj1jb29scGl4bG9sMTI0.
To decode it later use base64_decode($new_str), which will give back the original string.
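One caveat when the result goes into a URL: standard base64 output can contain +, / and =, which all have special meaning there. A common workaround, sketched below with hypothetical function names, is to swap those two characters for - and _ and drop the padding, then reverse the steps before decoding.

```php
<?php
// Hypothetical helper: base64 with the URL-safe alphabet and no padding.
function base64UrlEncode(string $data): string
{
    return rtrim(strtr(base64_encode($data), '+/', '-_'), '=');
}

// Hypothetical helper: restore padding and the standard alphabet, then decode.
function base64UrlDecode(string $text): string
{
    $padLen = (4 - strlen($text) % 4) % 4;
    $padded = $text . str_repeat('=', $padLen);
    return base64_decode(strtr($padded, '-_', '+/'));
}

echo base64UrlEncode("\xfb\xff\xfe"), "\n"; // -__- (plain base64 would give +//+)
```

Note that this only makes the string safe to transport; anyone can decode it, so it should not carry data the user is not supposed to see.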
Though, as I said before, it's probably a better idea to use a hashing algorithm like SHA.
I hope this helped.
EDIT: I forgot to mention, YouTube's video IDs as of now are 11 characters long, so if you want to use the same kind of thing, you would use randstr(11) to generate an 11-character random string, like this sample ID I got: 6AMx8N5r6cg
EDIT 2 (2015.12.17): Completely re-wrote answer. Original was crap, I don't know what I was thinking when I wrote it.

Your question is similar to this other SO question which contains some optimised generator functions along with a clear description of the problem you're trying to solve:
php - help improve the efficiency of this youtube style url generator
It will provide you with code, a better understanding of performance issues, and a better understanding of the problem domain all at once.

I don't know exactly how Google generates its strings, but the idea is really simple. Create a table like:
+----------+------------------------------+
| code     | url                          |
+----------+------------------------------+
| asdlkasd | playlist=drum+and+bass+music |
+----------+------------------------------+
Now, create your URL like:
http://localhost:8888/example/account_player/asdlkasd
After that, look up the code from the URL in the database, read the stored value, and load your image, video or whatever you intend to.
PS: This is just a quick example. It can of course be done in many other ways.

If you don't want to use decimal numbers, you can encode them into base36:
echo base_convert(123456789, 10, 36); // => "21i3v9"
And decode back:
echo base_convert("21i3v9", 36, 10); // => "123456789"
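base_convert() works on strings and goes through floating point internally, so it can lose precision for very large numbers. For IDs that fit in a PHP integer, an integer-only version is easy to sketch; the function names below are made up for illustration.

```php
<?php
const BASE36_DIGITS = '0123456789abcdefghijklmnopqrstuvwxyz';

// Hypothetical helper: non-negative integer to base-36 text.
function toBase36(int $n): string
{
    if ($n < 0) {
        throw new InvalidArgumentException('Negative IDs are not supported');
    }
    $out = '';
    do {
        $out = BASE36_DIGITS[$n % 36] . $out; // prepend the lowest digit
        $n = intdiv($n, 36);
    } while ($n > 0);
    return $out;
}

// Hypothetical helper: base-36 text back to an integer, rejecting bad input.
function fromBase36(string $s): int
{
    if ($s === '') {
        throw new InvalidArgumentException('Empty input');
    }
    $n = 0;
    foreach (str_split(strtolower($s)) as $ch) {
        $pos = strpos(BASE36_DIGITS, $ch);
        if ($pos === false) {
            throw new InvalidArgumentException('Invalid base-36 digit: ' . $ch);
        }
        $n = $n * 36 + $pos;
    }
    return $n;
}

echo toBase36(123456789), "\n"; // 21i3v9
```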

<?php
function alphaID($in, $to_num = false, $pad_up = false, $pass_key = null)
{
    $out = '';
    $index = 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $base = strlen($index);
    if ($pass_key !== null) {
        // Reorder the alphabet deterministically, keyed on a hash of $pass_key
        for ($n = 0; $n < strlen($index); $n++) {
            $i[] = substr($index, $n, 1);
        }
        $pass_hash = hash('sha256', $pass_key);
        $pass_hash = (strlen($pass_hash) < strlen($index) ? hash('sha512', $pass_key) : $pass_hash);
        for ($n = 0; $n < strlen($index); $n++) {
            $p[] = substr($pass_hash, $n, 1);
        }
        array_multisort($p, SORT_DESC, $i);
        $index = implode($i);
    }
    if ($to_num) {
        // Digital number <<-- alphabet letter code
        $out = 0; // start from a number: adding an int to '' fails on PHP 8
        $len = strlen($in) - 1;
        for ($t = $len; $t >= 0; $t--) {
            $bcp = bcpow($base, $len - $t);
            $out = $out + strpos($index, substr($in, $t, 1)) * $bcp;
        }
        if (is_numeric($pad_up)) {
            $pad_up--;
            if ($pad_up > 0) {
                $out -= pow($base, $pad_up);
            }
        }
    } else {
        // Digital number -->> alphabet letter code
        if (is_numeric($pad_up)) {
            $pad_up--;
            if ($pad_up > 0) {
                $in += pow($base, $pad_up);
            }
        }
        for ($t = ($in != 0 ? floor(log($in, $base)) : 0); $t >= 0; $t--) {
            $bcp = bcpow($base, $t);
            $a = floor($in / $bcp) % $base;
            $out = $out . substr($index, $a, 1);
            $in = $in - ($a * $bcp);
        }
    }
    return $out;
}
?>
You can encode or decode an ID using this function.
<?php
$random_id=57256;
$encode=alphaID($random_id);
$decode=alphaID($encode,true); //where boolean true reverse the string back to original
echo "Encode : {$encode} <br> Decode : {$decode}";
?>
Visit the link below for more info:
http://kvz.io/blog/2009/06/10/create-short-ids-with-php-like-youtube-or-tinyurl/

Just use an auto-increment ID value (from a database). Although I personally like the long URLs.

Convert inline specified UTF-8 mail subject

I want to convert the following raw mail subject to normal UTF-8 text:
=?utf-8?Q?Schuker_hat_sich_vom_=C3=9Cbungsabend_(01.01.2012)_abgem?= =?utf-8?Q?eldet?=
The real text for that is:
Schuker hat sich vom Übungsabend (01.01.2012) abgemeldet
My first approach to convert this:
$mime = '=?utf-8?Q?Schuker_hat_sich_vom_=C3=9Cbungsabend_(01.01.2012)_abgem?= =?utf-8?Q?eldet?=';
mb_internal_encoding("UTF-8");
echo mb_decode_mimeheader($mime);
This gives me the following result:
Schuker_hat_sich_vom_Übungsabend_(01.01.2012)_abgemeldet
(Questions here: What am I doing wrong? Why do those underscores occur?)
My second approach to convert this:
$mime = '=?utf-8?Q?Schuker_hat_sich_vom_=C3=9Cbungsabend_(01.01.2012)_abgem?= =?utf-8?Q?eldet?=';
echo imap_utf8($mime);
This gives me the following (correct) result:
Schuker hat sich vom Übungsabend (01.01.2012) abgemeldet
Why does this work? Which method should I rely on?
The reason I ask is that I previously asked another question about decoding mail subjects where mb_decode_mimeheader was the solution, whereas here imap_utf8 would be the way to go. How can I make sure everything is decoded correctly for both of these examples:
=?utf-8?Q?Schuker_hat_sich_vom_=C3=9Cbungsabend_(01.01.2012)_abgem?= =?utf-8?Q?eldet?=
and
=?UTF-8?B?UmU6ICMyLUZpbmFsIEFjY2VwdGFuY2UgdGVzdCB3aXRoIG5ldyB0ZXh0IHdpdGggU2xvdg==?=
=?UTF-8?B?YWsgaW50ZXJwdW5jdGlvbnMgIivEvsWhxI3FpcW+w73DocOtw6khxYgi?=
Should give me the expected results:
Schuker hat sich vom Übungsabend (01.01.2012) abgemeldet
and
Re: #2-Final Acceptance test with new text with Slovak interpunctions "+ľščťžýáíé!ň"

Based on the hbit response, I've improved the imapUtf8() function to convert the subject text to UTF-8 using the charset information. The result is something like:
function imapUtf8($str){
    $convStr = '';
    $subLines = preg_split('/[\r\n]+/', $str);
    for ($i=0; $i < count($subLines); $i++) {
        $convLine = '';
        $linePartArr = imap_mime_header_decode($subLines[$i]);
        for ($j=0; $j < count($linePartArr); $j++) {
            if ($linePartArr[$j]->charset === 'default') {
                if ($linePartArr[$j]->text != " ") {
                    $convLine .= ($linePartArr[$j]->text);
                }
            } else {
                $convLine .= iconv($linePartArr[$j]->charset, 'UTF-8', $linePartArr[$j]->text);
            }
        }
        $convStr .= $convLine;
    }
    return $convStr;
}

This function works for both examples:
function imapUtf8($str){
    $convStr = '';
    $subLines = preg_split('/[\r\n]+/',$str); // split multi-line subjects
    for($i=0; $i < count($subLines); $i++){ // go through lines
        $convLine = '';
        $linePartArr = imap_mime_header_decode(trim($subLines[$i])); // split and decode by charset
        for($j=0; $j < count($linePartArr); $j++){
            $convLine .= ($linePartArr[$j]->text); // append sub-parts of line together
        }
        $convStr .= $convLine; // append to whole subject
    }
    return $convStr; // return converted subject
}
Tests:
$sub1 = '=?utf-8?Q?Schuker_hat_sich_vom_=C3=9Cbungsabend_(01.01.2012)_abgem?= =?utf-8?Q?eldet?=';
$sub2 = '=?UTF-8?B?UmU6ICMyLUZpbmFsIEFjY2VwdGFuY2UgdGVzdCB3aXRoIG5ldyB0ZXh0IHdpdGggU2xvdg==?= =?UTF-8?B?YWsgaW50ZXJwdW5jdGlvbnMgIivEvsWhxI3FpcW+w73DocOtw6khxYgi?=';
echo imapUtf8($sub1);
echo imapUtf8($sub2);
Result:
Schuker hat sich vom Übungsabend (01.01.2012) abgemeldet
Re: #2-Final Acceptance test with new text with Slovak interpunctions "+ľščťžýáíé!ň"

It's also in the comments in the manual for mb_decode_mimeheader, and I actually assume it is a bug. None in the database, so I'd file it as a new one.
However, AFAIK imap_mime_header_decode will cope with both your encodings without a problem, so that will keep your code going.

About the mysterious underscore in the Subject header field:
RFC2047 4.2(2) states explicitly:
The 8-bit hexadecimal value 20 (e.g., ISO-8859-1 SPACE) may be
represented as "_" (underscore, ASCII 95.). (This character may
not pass through some internetwork mail gateways, but its use
will greatly enhance readability of "Q" encoded data with mail
readers that do not support this encoding.) Note that the "_"
always represents hexadecimal 20, even if the SPACE character
occupies a different code position in the character set in use.
The encoding rules for the Subject line are documented in RFC 2047 itself.
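To make the underscore rule concrete, here is a rough sketch of an RFC 2047 decoder that does not need the imap extension. It is not what imap_utf8() or mb_decode_mimeheader() do internally; the function name is invented, the output is assumed to be wanted as UTF-8, and only well-formed encoded-words with a charset that mbstring knows are handled.

```php
<?php
// Hypothetical helper: decode "=?charset?Q?...?=" and "=?charset?B?...?=" words.
function decodeMimeHeader(string $header): string
{
    // Whitespace between two adjacent encoded-words is not part of the text.
    $header = preg_replace('/(\?=)\s+(=\?)/', '$1$2', $header);

    return preg_replace_callback(
        '/=\?([^?]+)\?([QqBb])\?([^?]*)\?=/',
        function (array $m): string {
            if (strtoupper($m[2]) === 'B') {
                $bytes = base64_decode($m[3]);
            } else {
                // In "Q" encoding "_" always means byte 0x20 (space);
                // the rest is ordinary quoted-printable.
                $bytes = quoted_printable_decode(str_replace('_', ' ', $m[3]));
            }
            return mb_convert_encoding($bytes, 'UTF-8', $m[1]);
        },
        $header
    );
}

$sub1 = '=?utf-8?Q?Schuker_hat_sich_vom_=C3=9Cbungsabend_(01.01.2012)_abgem?= =?utf-8?Q?eldet?=';
echo decodeMimeHeader($sub1), "\n";
```

Skipping the str_replace() step is enough to reproduce the underscores seen in the question's first attempt.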

PHP: make random string URL safe and undo whatever made it safe

Given a randomly generated string, how do I convert it to make it URL safe -- and then "un convert" it?
PHP's bin2hex function (see: http://www.php.net/manual/en/function.bin2hex.php) seems to safely convert strings into URL safe characters. The hex2bin function (see: http://www.php.net/manual/en/function.hex2bin.php) is probably not ready yet. The following custom hex2bin function works sometimes:
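// Note: PHP 5.4+ ships a built-in hex2bin(), so there this custom version needs a different name.
function hex2bin($hexadecimal_data)
{
    $binary_representation = '';
    for ($i = 0; $i < strlen($hexadecimal_data); $i += 2)
    {
        // Square brackets: the {$i} string-offset syntax was removed in PHP 8.0
        $binary_representation .= chr(hexdec($hexadecimal_data[$i] . $hexadecimal_data[$i + 1]));
    }
    return $binary_representation;
}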
It only works right if the input to the function is a valid bin2hex string. If I send it something that was not a result of bin2hex, it dies. I can't seem to get it to throw an exception in case something is wrong.
Any suggestions for what I can do? I'm not set on using hex2bin/bin2hex. All I need is to be able to convert a random string into a URL-safe string, then reverse the process.

What you want to do is URL encode/decode the string:
$randomString = ...;
$urlSafe = urlencode($randomString);
$urlNotSafe = urldecode($urlSafe); // == $randomString

You can use urlencode()/urldecode().
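If the hex route is still preferred, the "throw an exception on bad input" part of the question could be handled by validating before decoding. This is only a sketch: the function name is invented, and it relies on the built-in hex2bin() (PHP 5.4+) and ctype_xdigit() from the ctype extension.

```php
<?php
// Hypothetical helper: decode a bin2hex() string, failing loudly on anything else.
function hexToBytes(string $hex): string
{
    if ($hex === '') {
        return ''; // bin2hex('') is '', so treat empty input as valid
    }
    // Valid hex has an even length and only the characters 0-9, a-f, A-F.
    if (strlen($hex) % 2 !== 0 || !ctype_xdigit($hex)) {
        throw new InvalidArgumentException('Not a valid hex string');
    }
    return hex2bin($hex);
}

$token = bin2hex("a b/c?&=");   // safe to put in a URL as-is
$back  = hexToBytes($token);    // the original bytes again
```

Hex doubles the length of the data, whereas urlencode() only expands the characters that actually need escaping.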

PHP read binary file in real binary

I searched google for my problem but found no solution.
I want to read a file and convert the buffer to binary like 10001011001011001.
If I have something like this from the file
bmoov���lmvhd�����(tF�(tF�_�
K�T��������������������������������������������#���������������������������������trak���\tkh
d����(tF�(tF������� K������������������������������������������������#������������$edts��
How can I convert all characters (including also this stuff ��) to 101010101000110010 representation??
I hope someone can help me :)

Use ord() on each byte to get its decimal value and then sprintf to print it in binary form (and force each byte to include 8 bits by padding with 0 on front).
<?php
$buffer = file_get_contents(__FILE__);
$length = filesize(__FILE__);
if (!$buffer || !$length) {
  die("Reading error\n");
}
$_buffer = '';
for ($i = 0; $i < $length; $i++) {
  $_buffer .= sprintf("%08b", ord($buffer[$i]));
}
var_dump($_buffer);
$ php test.php
string(2096) "001111000011111101110000011010000111000000001010001001000110001001110101011001100110011001100101011100100010000000111101001000000110011001101001011011000110010101011111011001110110010101110100010111110110001101101111011011100111010001100101011011100111010001110011001010000101111101011111010001100100100101001100010001010101111101011111001010010011101100001010001001000110110001100101011011100110011101110100011010000010000000111101001000000110011001101001011011000110010101110011011010010111101001100101001010000101111101011111010001100100100101001100010001010101111101011111001010010011101100001010000010100110100101100110001000000010100000100001001001000110001001110101011001100110011001100101011100100010000001111100011111000010000000100001001001000110110001100101011011100110011101110100011010000010100100100000011110110000101000100000001000000110010001101001011001010010100000100010010100100110010101100001011001000110100101101110011001110010000001100101011100100111001001101111011100100101110001101110001000100010100100111011000010100111110100001010000010100010010001011111011000100111010101100110011001100110010101110010001000000011110100100000001001110010011100111011000010100110011001101111011100100010000000101000001001000110100100100000001111010010000000110000001110110010000000100100011010010010000000111100001000000010010001101100011001010110111001100111011101000110100000111011001000000010010001101001001010110010101100101001001000000111101100001010001000000010000000100100010111110110001001110101011001100110011001100101011100100010000000101110001111010010000001110011011100000111001001101001011011100111010001100110001010000010001000100101001100000011100001100100001000100010110000100000011001000110010101100011011000100110100101101110001010000110111101110010011001000010100000100100011000100111010101100110011001100110010101110010010110110010010001101001010111010010100100101001001010010011101100001010011111010000101000001010011101100110000101110010010111110110010001
11010101101101011100000010100000100100010111110110001001110101011001100110011001100101011100100010100100111011"

One thing you could do is read the file into a string variable, then print the string in binary representation using sprintf-style formatting:
$string = file_get_contents($file);
for ($l = strlen($string), $i = 0; $i < $l; $i++)
{
    printf('%08b', ord($string[$i]));
}
If you're just looking for a hexadecimal representation, you can use bin2hex:
echo bin2hex($string);
If you're looking for a nicer form of hexdump, please see the related question:
How can I get a hex dump of a string in PHP?

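Reading the file word-wise (32 bits at once) would be faster than byte-wise:
$s = file_get_contents("filename");
$buf = [];
// "N" is big-endian, so the bits come out in file order on any machine;
// "L" would use the machine's own byte order. Trailing bytes that do not
// fill a whole 32-bit word are not converted.
foreach (unpack("N*", $s) as $n) {
    $buf[] = sprintf("%032b", $n);
}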
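Two details matter when unpacking word-wise: the byte order of the format code, and what happens to a tail shorter than four bytes. The sketch below (hypothetical function name, 64-bit PHP assumed so every 32-bit word stays a positive integer) uses the big-endian "N" code so the bit order matches a byte-by-byte dump, and converts any leftover bytes one at a time.

```php
<?php
// Hypothetical helper: bit string for arbitrary data, 32 bits at a time.
function bitsWordWise(string $data): string
{
    $bits  = '';
    $whole = strlen($data) - (strlen($data) % 4); // bytes that form full words

    if ($whole > 0) {
        // "N" = unsigned 32-bit, big-endian: first byte of the file comes first.
        foreach (unpack('N*', substr($data, 0, $whole)) as $word) {
            $bits .= sprintf('%032b', $word);
        }
    }
    // Remaining 1 to 3 bytes, handled byte-wise.
    for ($i = $whole; $i < strlen($data); $i++) {
        $bits .= sprintf('%08b', ord($data[$i]));
    }
    return $bits;
}

echo bitsWordWise("ABCDE"), "\n";
// 0100000101000010010000110100010001000101
```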
