How to write non-ASCII data to a file with PHP

I need to write a binary file with PHP, but I don't think I'm using the correct method.
I use this code:
$ptr = fopen("file.txt", 'wb');
fwrite($ptr, $Str);
fclose($ptr);
I write this string (an arbitrary sequence of 0s and 1s):
$Str="00000001001110000000000100000100000000000000001000000010000000010000001100000101000000000000010000000001000000000000010100000";
I expected that, on opening the file in OpenOffice, I would see a chaotic sequence of characters rather than the text of zeros and ones.
Why do I see the zeros and ones in OpenOffice? How can I write the raw data with PHP?

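If you write text to a file, even one opened in binary mode, you get text in the file. Your 0 is not stored as a zero bit, but as the ASCII code of the character 0 (the byte 0x30).
Use numeric values rather than digit characters, for instance:
$var = 0xFF; // equal to 1111 1111 in binary
Note that fwrite($ptr, $var) would still write the text 255; convert the number to a byte first, for example with chr($var) or pack('C', $var).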
More in the manual

You can write any character with the chr() function.
Alternatively, you can do something like "\x0A" (that's a newline character).
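As a rough sketch of the chr() and "\x.." suggestions above, the snippet below builds the same four bytes in three ways and writes them in binary mode. The helper name writeRawBytes and the file name raw.bin are made up for illustration; pack() with the C format (one unsigned byte per argument) is a standard PHP function that the answers do not mention.

```php
<?php
// Hypothetical helper: write a list of byte values (0-255) as raw bytes.
function writeRawBytes(string $path, array $byteValues): int
{
    // pack('C*', ...) turns each integer into one unsigned byte.
    $data = pack('C*', ...$byteValues);
    $ptr = fopen($path, 'wb');
    if ($ptr === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    $written = fwrite($ptr, $data);
    fclose($ptr);
    return (int) $written;
}

// Three ways to build the same four bytes
// (the first 32 bits of the string in the question: 1, 0x38, 1, 4).
$viaChr    = chr(0x01) . chr(0x38) . chr(0x01) . chr(0x04);
$viaEscape = "\x01\x38\x01\x04";
$viaPack   = pack('C*', 0x01, 0x38, 0x01, 0x04);

$path = sys_get_temp_dir() . '/raw.bin'; // assumed location, adjust as needed
$count = writeRawBytes($path, [0x01, 0x38, 0x01, 0x04]);
```

Opening the resulting file in a hex viewer rather than a word processor should show the four bytes 01 38 01 04.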

Related

How can I put double quotes around hex read from a file

I have a file named test.log that contains many lines of hex, but when I try to convert a line with
hex2bin($hex)
I receive this warning:
Warning: hex2bin(): Hexadecimal input string must have an even length
But if I do it manually, for example
$hex = "hexhere";
hex2bin($hex)
it works fine, because I surround the value with double quotes.
Thank you in advance.
From the hex2bin() changelog: a warning is thrown if the input string is of odd length. In PHP 5.4.0 the string was silently accepted, but the last byte was truncated.
So the string read from your file has an odd length, and I think you're using PHP 5.4.1 or later. Check the length of each line in your file.
hex2bin: decodes a hexadecimally encoded binary string. Docs: http://php.net/manual/en/function.hex2bin.php
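One likely cause of an odd length, assuming the lines are read with fgets() or file(), is that each line still carries its trailing newline; the double quotes themselves make no difference. The sketch below trims each line and only decodes strings made of complete hex pairs. The function name decodeHexLines is hypothetical.

```php
<?php
// Hypothetical helper: decode every well-formed hex line, skip the rest.
function decodeHexLines(array $rawLines): array
{
    $decoded = [];
    foreach ($rawLines as $raw) {
        // Remove the trailing newline and any surrounding whitespace.
        $hex = trim($raw);
        // Accept only one or more complete pairs of hex digits,
        // which also guarantees an even length.
        if (preg_match('/^(?:[0-9a-fA-F]{2})+$/', $hex) !== 1) {
            continue;
        }
        $decoded[] = hex2bin($hex);
    }
    return $decoded;
}

// "48656c6c6f" is the hex encoding of "Hello"; the other entries are skipped.
$result = decodeHexLines(["48656c6c6f\n", "abc\n", "\n", "zz"]);
```

With the file from the question, the input could come from file('test.log'), which returns the lines with their newlines still attached.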

In PHP, How to convert unicode number strings into numbers correctly?

I have a CSV file encoded in Unicode, and when I read it with either fgetcsv or fgets and try to use the number strings as integers in PHP, only the first character of the string is converted to a number, i.e.
$str='2012';
$num=$str + 0; // or: $num=(int)$str;
echo $num;
Result: 2
How can I convert these Unicode number strings correctly?
I was not successful using PHP's conversion functions from Unicode to other charsets.
The only way I know is to use a simple text editor like Notepad or Notepad++ and convert the file to an ANSI CSV.
Thanks for your help.
Convert it to some other encoding, like UTF-8.
$str = mb_convert_encoding( $str, "UTF-8", "UTF-16LE");
Your string actually looks like this (manually constructed UTF-16LE):
$str = "2\x000\x001\x002\x00";
So PHP reads the first 2, then sees a NUL byte, which is not a digit, and you get 2.
BTW, the LE BOM (\xFF\xFE) isn't handled here, so show your full code and I will take a look.
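A minimal sketch of that conversion, including the BOM case mentioned above. It assumes the mbstring extension is available and that the data really is UTF-16LE; the function name utf16leToUtf8 is made up for illustration.

```php
<?php
// Hypothetical helper: convert UTF-16LE bytes to UTF-8, dropping a leading BOM.
function utf16leToUtf8(string $raw): string
{
    // A UTF-16LE byte order mark is the two bytes FF FE.
    if (strncmp($raw, "\xFF\xFE", 2) === 0) {
        $raw = substr($raw, 2);
    }
    return mb_convert_encoding($raw, 'UTF-8', 'UTF-16LE');
}

// "2012" as UTF-16LE, preceded by a BOM as it might appear at the start of a file.
$field = "\xFF\xFE" . "2\x000\x001\x002\x00";

$broken = (int) "2\x000\x001\x002\x00"; // 2: the cast stops at the first NUL byte
$num = (int) utf16leToUtf8($field);     // 2012
```

Converting the whole file contents before splitting it into lines is probably safer than converting line by line, because fgets() splits on the single byte "\n" and so leaves the second byte of each UTF-16 newline at the start of the next line.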

How to write "pseudo" binary into file using php?

I have a string containing something like "01001010" and I want to write it to a file as binary. In other words, the file should contain not the characters 0 and 1 but the corresponding bits.
How can I do that?
So you mean you want to convert a string of 0s and 1s (e.g. $bitString = '01010101...';) into binary data (0x55...) and then write that to a file? You need to do this in two steps.
First, convert your string of zeros and ones into binary; see "Converting string of 1s and 0s into binary value, then compress afterwards, PHP".
Note that strings in PHP can store binary data.
Then just write the output to a file, e.g. using file_put_contents().
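The two steps could be sketched as follows. This is one possible approach, not the linked answer's code: the function name bitStringToBytes is hypothetical, and padding an incomplete final group with zero bits on the right is an assumption that should be adapted to whatever the reader of the file expects.

```php
<?php
// Hypothetical helper: turn a string of '0'/'1' characters into raw bytes.
function bitStringToBytes(string $bits): string
{
    if (preg_match('/^[01]*$/', $bits) !== 1) {
        throw new InvalidArgumentException('Input may contain only 0 and 1');
    }
    if ($bits === '') {
        return '';
    }
    // Assumption: pad the last group with zero bits on the right.
    $paddedLength = intdiv(strlen($bits) + 7, 8) * 8;
    $padded = str_pad($bits, $paddedLength, '0');

    $bytes = '';
    foreach (str_split($padded, 8) as $octet) {
        // bindec() reads the 8 characters as a base-2 number (0-255),
        // chr() turns that number into a single byte.
        $bytes .= chr(bindec($octet));
    }
    return $bytes;
}

// Step 1: convert. "01001010" is 0x4A, the letter J.
$binary = bitStringToBytes('01001010');

// Step 2: write the bytes (assumed output location).
$path = sys_get_temp_dir() . '/bits.bin';
file_put_contents($path, $binary);
```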

Working with files and utf8 in PHP

Let's say I have a file called foo.txt encoded in UTF-8:
aoeu
qjkx
ñpyf
And I want to get an array that contains all the lines in that file (one line per index) that have the letters aoeuñpyf, and only the lines with these letters.
I wrote the following code (also encoded as utf8):
$allowed_letters=array("a","o","e","u","ñ","p","y","f");
$lines=array();
$f=fopen("foo.txt","r");
while(!feof($f)){
    $line=fgets($f);
    foreach(preg_split("//",$line,-1,PREG_SPLIT_NO_EMPTY) as $letter){
        if(!in_array($letter,$allowed_letters)){
            $line="";
        }
    }
    if($line!=""){
        $lines[]=$line;
    }
}
fclose($f);
However, after that, the $lines array just has the aoeu line in it.
This seems to be because somehow, the "ñ" in $allowed_letters is not the same as the "ñ" in foo.txt.
Also if I print a "ñ" of the file, a question mark appears, but if I print it like this print "ñ";, it works.
How can I make it work?
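If you are running Windows, text editors often do not save files as UTF-8 by default, but in a legacy code page such as Windows-1252 (depending on the system locale). You need to either save the file as UTF-8 explicitly or run each line through utf8_encode() before performing your check (note that utf8_encode() assumes ISO-8859-1 input and is deprecated as of PHP 8.2). I.e.: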
$line=utf8_encode(fgets($f));
If you are sure that the file is UTF-8 encoded, is your PHP file also UTF-8 encoded?
If everything is UTF-8, then this is what you need :
foreach(preg_split("//u",$line,-1,PREG_SPLIT_NO_EMPTY) as $letter){
// ...
}
(append u for unicode chars)
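However, let me suggest a more compact way to perform your check. Note that the line must be split with the u modifier, because str_split() works on bytes and would break ñ in two:
$allowed_letters=array("a","o","e","u","ñ","p","y","f");
$lines=array();
$f=fopen("foo.txt","r");
while(($line=fgets($f))!==false){
    // split into UTF-8 characters rather than bytes
    $chars=preg_split("//u",rtrim($line),-1,PREG_SPLIT_NO_EMPTY);
    // keep the line only if it is non-empty and every character is allowed
    if($chars && count(array_intersect($chars,$allowed_letters))==count($chars)){
        $lines[]=$line;
    }
}
fclose($f);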
(add space chars to allow space characters as well, and remove the rtrim($line))
In UTF-8, ñ is encoded as two bytes. Normally in PHP all string operations are byte-based, so when you preg_split the input it splits up the first byte and the second byte into separate array items. Neither the first byte on its own nor the second byte on its own will match both bytes together as found in $allowed_letters, so it'll never match ñ.
As Yanick posted, the solution is to add the u modifier. This makes PHP's regex engine treat both the pattern and the input line as Unicode characters instead of bytes. It's lucky that PHP has special Unicode support here; elsewhere PHP's Unicode support is extremely spotty.
A simpler and quicker way than splitting would be to compare each line against a character-group regex. Again, this must be a u regex.
if(preg_match('/^[aoeuñpyf]+$/u', $line))
$lines[]= $line;
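Put together as a self-contained sketch, assuming the PHP source file itself is saved as UTF-8 (the function name filterAllowedLines is hypothetical):

```php
<?php
// Hypothetical helper: keep only lines made up entirely of the allowed letters.
function filterAllowedLines(array $rawLines): array
{
    $kept = [];
    foreach ($rawLines as $raw) {
        // Strip the line ending so it is not stored with the result.
        $line = rtrim($raw, "\r\n");
        // The u modifier makes PCRE treat pattern and subject as UTF-8,
        // so ñ counts as one character rather than two bytes.
        if (preg_match('/^[aoeuñpyf]+$/u', $line) === 1) {
            $kept[] = $line;
        }
    }
    return $kept;
}

// The three lines of foo.txt from the question, as fgets() would return them.
$lines = filterAllowedLines(["aoeu\n", "qjkx\n", "ñpyf\n"]);
// $lines now holds 'aoeu' and 'ñpyf'
```

For the real file, the input array could come from file('foo.txt').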
It sounds like you've already got your answer, but it is important to recognize that Unicode characters can be stored in multiple ways. Unicode normalization is a process which can help ensure comparisons work as expected. See:
http://en.wikipedia.org/wiki/Unicode_equivalence

In PHP what does it mean by a function being binary-safe?

In PHP, what does it mean for a function to be binary-safe?
What makes such functions special, and where are they typically used?
It means the function will work correctly when you pass it arbitrary binary data (i.e. strings containing non-ASCII bytes and/or null bytes).
For example, a non-binary-safe function might be based on a C function which expects null-terminated strings, so if the string contains a null character, the function would ignore anything after it.
This is relevant because PHP does not cleanly separate string and binary data.
The other users already mentioned what binary safe means in general.
In PHP, the meaning is more specific, referring only to what Michael gives as an example.
All strings in PHP have an associated length, which is the number of bytes that compose them. When a function manipulates a string, it can either:
1. Rely on that length metadata.
2. Rely on the string being null-terminated, i.e., that a byte with value 0 follows the data that is actually part of the string.
It's also true that all PHP string variables manipulated by the engine are null-terminated. The problem with functions that rely on 2. is that, if the string itself contains a byte with value 0, the function will think the string has ended at that point and will ignore everything after it.
For instance, if PHP's strlen function worked like the C standard library's strlen, the result here would be wrong:
$str = "abc\x00abc";
echo strlen($str); //gives 7, not 3!
More examples:
<?php
$string1 = "Hello";
$string2 = "Hello\x00World";
// strcoll() is NOT binary safe
echo strcoll($string1, $string2); // gives 0, the strings are treated as equal.
// strcmp() is binary safe
echo strcmp($string1, $string2); // gives <0, $string1 is less than $string2.
?>
\x indicates hexadecimal notation. See: PHP strings
0x00 = NUL (null character)
0x04 = EOT (End of Transmission)
See an ASCII table for the full list of ASCII characters.
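As a small illustration of the length-based behaviour described above, the following sketch passes a string with an embedded NUL byte through a few functions and through a file. The variable names are arbitrary, and the temporary file is only there to show the round trip.

```php
<?php
$data = "abc\x00def";

// The length comes from the stored byte count, so the NUL byte is counted.
$length = strlen($data);              // 7

// substr() and strpos() also see the bytes after the NUL.
$tail = substr($data, 4);             // "def"
$nulPosition = strpos($data, "\x00"); // 3

// strcmp() compares the full byte sequences, so the shorter prefix sorts first.
$order = strcmp("Hello", "Hello\x00World"); // negative

// Round trip through a file: the NUL byte survives.
$path = tempnam(sys_get_temp_dir(), 'bin');
file_put_contents($path, $data);
$roundTrip = file_get_contents($path);
unlink($path);
```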
