Laravel Storage file encoding - php

I'm trying to save a text file as UTF-8 using Laravel's Storage facade. Unfortunately, I couldn't find a way to do it, and the file is saved as us-ascii. How can I save it as UTF-8?
Currently I'm using the following code to save the file:
Storage::disk('public')->put('files/test.txt', $fileData);

You should be able to prepend "\xEF\xBB\xBF" (the BOM, which marks the file as UTF-8) to your $fileData. So:
Storage::disk('public')->put('files/test.txt', "\xEF\xBB\xBF" . $fileData);
There are other ways to convert your text before writing it to the file, but this is the simplest and easiest to read and execute. As far as I know, there are also no character encoding methods within Illuminate\Filesystem\Filesystem.
For more information: https://stackoverflow.com/a/9047876/823549 and What's different between UTF-8 and UTF-8 without BOM?.
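A minimal sketch of the BOM approach in plain PHP, without Laravel. The helper name `withUtf8Bom` is hypothetical; it prepends the three BOM bytes only when they are not already there, so the marker is never written twice. Its result could then be passed to `Storage::disk('public')->put(...)` in place of `$fileData`.

```php
<?php
// Hypothetical helper: prepend the UTF-8 BOM unless the data already starts with it.
function withUtf8Bom(string $data): string
{
    $bom = "\xEF\xBB\xBF";
    return strncmp($data, $bom, 3) === 0 ? $data : $bom . $data;
}

$fileData = "Hello";
$marked = withUtf8Bom($fileData);
echo bin2hex(substr($marked, 0, 3)), PHP_EOL; // efbbbf
```

Note that the BOM only labels the file; it does not change the bytes of `$fileData` itself.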

ASCII is a subset of UTF-8, so all ASCII files are already UTF-8 encoded. The bytes in the ASCII file and the bytes that would result from "encoding it to UTF-8" would be exactly the same bytes. There's no difference between them, so there's no need to do anything.
It looks like your problem is that the files are not actually ASCII. You need to determine what encoding they are using, and transcode them properly.
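The point can be checked directly. The sketch below assumes the mbstring extension is available; the sample strings are made up for illustration.

```php
<?php
$ascii = "plain ASCII text 123";

// Every byte is below 0x80, so the string is valid ASCII and valid UTF-8 at once.
var_dump(mb_check_encoding($ascii, 'ASCII')); // bool(true)
var_dump(mb_check_encoding($ascii, 'UTF-8')); // bool(true)

// "Converting" it changes nothing: the bytes are identical.
var_dump(mb_convert_encoding($ascii, 'UTF-8', 'ASCII') === $ascii); // bool(true)

// A single Windows-1252 byte (0xE9, "é") makes the data neither ASCII nor valid UTF-8.
$notAscii = "caf\xE9";
var_dump(mb_check_encoding($notAscii, 'ASCII')); // bool(false)
var_dump(mb_check_encoding($notAscii, 'UTF-8')); // bool(false)
```

A tool that reports "us-ascii" for a file is therefore also reporting a valid UTF-8 file; it just contains no characters outside the ASCII range.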

I recommend using mb_convert_encoding instead:
$fileData = mb_convert_encoding($fileData, "UTF-8", "auto");
Storage::disk('public')->put('files/test.txt', $fileData);
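Because "auto" relies on detection, which can guess wrong, a variant that names the source encoding explicitly may be safer. The sketch below is an illustration only: the helper `toUtf8` is hypothetical, and Windows-1252 as the default source encoding is an assumption that has to match the real data.

```php
<?php
// Hypothetical helper: leave valid UTF-8 alone, otherwise convert from an
// encoding the caller states explicitly instead of relying on detection.
function toUtf8(string $data, string $assumedSource = 'Windows-1252'): string
{
    if (mb_check_encoding($data, 'UTF-8')) {
        return $data;
    }
    return mb_convert_encoding($data, 'UTF-8', $assumedSource);
}

$fileData = "caf\xE9";                    // "café" as Windows-1252 bytes
echo bin2hex(toUtf8($fileData)), PHP_EOL; // 636166c3a9
```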

Related

Unable to convert file from ANSI to UTF-8, using PHP

I have a file, which contains some cyrillic characters. When I open this file in Notepad++ I see, that it has ANSI encoding. If I manually encode it into UTF-8 using Notepad++, then everything is absolutely ok - I can use this file in my parsers and get results. But what I want is to do it programmatically, using PHP. This is what I tried after searching through SO and documentation:
file_put_contents($file, utf8_encode(file_get_contents($file)));
In this case when my algorithm parses the resulting files, it meets such letters as "è", "í" , "â". In other words, in this case I get some rubbish. I also tried this:
file_put_contents($file, iconv('WINDOWS-1252', 'UTF-8', file_get_contents($file)));
But it produces the very same rubbish. So I really wonder how I can achieve programmatically what Notepad++ does. Thanks!
Notepad++ may report your encoding as ANSI but this does not necessarily equate to Windows-1252. 1252 is an encoding for the Latin alphabet, whereas 1251 is designed to encode Cyrillic script. So use
file_put_contents($file, iconv('WINDOWS-1251', 'UTF-8', file_get_contents($file)));
to convert from 1251 to utf-8 with iconv.
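The effect of picking the wrong source code page can be reproduced with a few bytes. This sketch assumes the iconv extension knows both code page names; the sample word is made up.

```php
<?php
// The word "Привет" encoded as Windows-1251 bytes.
$cp1251 = "\xCF\xF0\xE8\xE2\xE5\xF2";

// Correct source encoding: the Cyrillic text comes out intact.
$right = iconv('WINDOWS-1251', 'UTF-8', $cp1251);
echo $right, PHP_EOL; // Привет

// Wrong source encoding: the same bytes are read as accented Latin letters.
$wrong = iconv('WINDOWS-1252', 'UTF-8', $cp1251);
echo $wrong, PHP_EOL; // Ïðèâåò
```

The second output contains the same kind of "è" and "â" characters described in the question, which is a strong hint that the data is really Windows-1251.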

How to convert CSV's to UTF-8 with PHP

I have looked all over the internet and I cannot find an answer.
I am scraping thousands of CSVs from a source outside my control. The CSVs can be in ANY character encoding, so I need to convert them all to UTF-8.
I have read online that if you convert UTF-8 to UTF-8 the data gets scrambled, so what I am trying to do is detect the character encoding of the file and, if it's not UTF-8, convert it to UTF-8 (I plan to use iconv).
I have tried everything on Stack Overflow (and other sites), but I cannot seem to get the current encoding of the file.
If i use
mb_detect_encoding(file_get_contents($csvPath), mb_detect_order(), TRUE);
or
mb_detect_encoding(file_get_contents($csvPath),'auto');
Has anyone got any suggestions on how I can detect the encoding of the CSV, or a better way to convert files without knowing the original encoding?
I've figured it out after hours of trial and error. Forget mb_detect_encoding, it's useless.
Go to the shell instead and use iconv (installed by default on OSX and Linux).
$output = shell_exec("file --mime-encoding GBP_AUD_Week1.csv");
// Strip the "filename: " prefix and the trailing newline, leaving only the encoding name
$output = trim(str_replace("GBP_AUD_Week1.csv: ", '', $output));
This gives the current file encoding
shell_exec("iconv -f $output -t utf-8 GBP_AUD_Week1.csv > GBP_AUD_Week1Converted.csv");
Note:
I tried to overwrite the file instead of creating a new one, but when I did this the file was blank and the encoding was reported as binary.
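The blank file is consistent with how shell redirection works: `>` truncates the target before iconv gets to read it. A way around this, sketched here in pure PHP, is to write the converted bytes to a temporary file and rename it over the original. The helper name `convertFileToUtf8` and the `.tmp` suffix are hypothetical, and the source encoding still has to be known or detected beforehand.

```php
<?php
// Hypothetical helper: convert a file to UTF-8 "in place" by writing the
// converted bytes to a temporary file first, then renaming it over the original.
function convertFileToUtf8(string $path, string $sourceEncoding): bool
{
    $data = file_get_contents($path);
    if ($data === false) {
        return false;
    }
    // Already valid UTF-8: nothing to do, and re-converting could corrupt it.
    if (mb_check_encoding($data, 'UTF-8')) {
        return true;
    }
    $converted = iconv($sourceEncoding, 'UTF-8', $data);
    if ($converted === false) {
        return false;
    }
    $tmp = $path . '.tmp';
    if (file_put_contents($tmp, $converted) === false) {
        return false;
    }
    return rename($tmp, $path);
}

// Usage with a throwaway file containing Windows-1252 bytes.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "caf\xE9,1\n");
$ok = convertFileToUtf8($path, 'WINDOWS-1252');
$result = file_get_contents($path); // the é is now the two bytes C3 A9
unlink($path);
```

The UTF-8 validity check also addresses the worry about converting files that are already UTF-8: those are left untouched.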

PHP script to convert an ANSI file to UTF-8

As part of a PHP project, I have to process a CSV file and put its data into a database.
However, the CSV file is encoded in ANSI, and I would like to treat the data as UTF-8 so that it appears correctly in my database. Do you know a way to automate this conversion?
I have already read about the mb_convert_encoding function, but it works on a $string parameter rather than on a file.
If you know for sure that your current encoding is pure ASCII, then you don't have to do anything, because ASCII is already valid UTF-8.
But if you still want to convert just to be sure, then you can use iconv:
$string = iconv('ASCII', 'UTF-8//IGNORE', $string);
The IGNORE will discard any invalid characters just in case some were not valid ASCII
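Since the question is about a file rather than a string, one option is PHP's built-in `convert.iconv.*` stream filter, which transcodes data as it is read. The sketch below assumes the "ANSI" file is really Windows-1252 (this must be verified for the actual data); the file contents are made up for the demonstration.

```php
<?php
// Create a small Windows-1252 encoded CSV for the demonstration.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "name,city\nRen\xE9,Orl\xE9ans\n");

$handle = fopen($path, 'rb');
// Built-in iconv stream filter: bytes are transcoded as they are read.
stream_filter_append($handle, 'convert.iconv.WINDOWS-1252/UTF-8', STREAM_FILTER_READ);

$rows = [];
while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
    $rows[] = $row; // each field is now a UTF-8 string
}
fclose($handle);
unlink($path);
```

Each row in `$rows` can then be inserted into the database as UTF-8 without a separate conversion pass over the file.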

PHP: Use (or not) 'utf8_encode' in combination with setting BOM to \xEF\xBB\xBF

When using the following code:
$myString = 'some contents';
$fh=fopen('newfile.txt',"w");
fwrite($fh, "\xEF\xBB\xBF" . $myString);
Is there any point of using PHP functions to first encode the text ($myString in the example) e.g. like running utf8_encode($myString); or similar iconv() commands?
Assuming that the BOM \xEF\xBB\xBF is written first into the file and that UTF-8 represents practically all characters in the world, I don't see any potential failure scenario when creating a file this way. In other words, I don't see any case where a major text editor wouldn't be able to interpret the newly created file correctly, displaying all characters as intended. This holds even if $myString were a PHP $_POST variable from an HTML form. Am I right?
If your source file is UTF-8 encoded, then the string $myString is also UTF-8 encoded and you don't need to convert it. Otherwise, you need to use iconv() to convert the encoding before writing it to the file.
And note utf8_encode() is used to encode an ISO-8859-1 string to UTF-8.
Note that utf8_encode will only convert ISO-8859-1 encoded strings.
In general, since PHP strings are plain byte sequences with no built-in notion of character encoding, any string that contains non-ASCII characters in another encoding has to be converted to UTF-8 before it is written to a UTF-8 file.
The BOM is optional (most text file readers now will scan the file for its encoding).
From Wikipedia
The Unicode Standard permits the BOM in UTF-8, but does not require
or recommend for or against its use
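To make the distinction concrete: the BOM only labels a file, it does not transcode the bytes that follow it. A minimal sketch, assuming mbstring is available and that the incoming text is ISO-8859-1 (for example, from a form posted in that charset):

```php
<?php
$bom = "\xEF\xBB\xBF";
$latin1 = "caf\xE9"; // "café" in ISO-8859-1

// Prefixing the BOM does not change the payload bytes, so the result is
// still not valid UTF-8.
var_dump(mb_check_encoding($bom . $latin1, 'UTF-8')); // bool(false)

// Converting first produces valid UTF-8; the BOM is then just an optional marker.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
var_dump(mb_check_encoding($bom . $utf8, 'UTF-8')); // bool(true)
```

So a failure scenario does exist: a BOM followed by non-UTF-8 bytes yields a file that claims to be UTF-8 but is not.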

Problem writing UTF-8 encoded file in PHP

I have a large file that contains world countries/regions that I'm separating into smaller files based on individual countries/regions. The original file contains entries like:
EE.04 Järvamaa
EE.05 Jõgevamaa
EE.07 Läänemaa
However when I extract that and write it to a new file, the text becomes:
EE.04 JÃ¤rvamaa
EE.05 JÃµgevamaa
EE.07 LÃ¤Ã¤nemaa
To save my files I'm using the following code:
mb_detect_encoding($text, "UTF-8") == "UTF-8" ? : $text = utf8_encode($text);
$fp = fopen(MY_LOCATION,'wb');
fwrite($fp,$text);
fclose($fp);
I tried saving the files with and without utf8_encode() and neither seems to work. How would I go about saving the original encoding (which is UTF8)?
Thank you!
First off, don't depend on mb_detect_encoding. It's not great at figuring out what the encoding is unless there's a bunch of encoding specific entities (meaning entities that are invalid in other encodings).
Try just getting rid of the mb_detect_encoding line altogether.
Oh, and utf8_encode turns a Latin-1 string into a UTF-8 string (not from an arbitrary charset to UTF-8, which is what you really want)... You want iconv, but you need to know the source encoding (and since you can't really trust mb_detect_encoding, you'll need to figure it out some other way).
Or you can try using iconv with an empty input encoding, $str = iconv('', 'UTF-8', $str); (which may or may not work)...
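The symptom in the question matches what happens when text that is already UTF-8 gets encoded a second time. The sketch below reproduces it with `mb_convert_encoding` from ISO-8859-1, which treats every byte as a Latin-1 character just as utf8_encode does; the sample value is taken from the question's data.

```php
<?php
$utf8 = "J\xC3\xA4rvamaa"; // "Järvamaa", already UTF-8

// Same effect as utf8_encode(): every byte is treated as a Latin-1 character.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo $double, PHP_EOL; // JÃ¤rvamaa

// Writing the original bytes untouched preserves the text.
$path = tempnam(sys_get_temp_dir(), 'txt');
file_put_contents($path, $utf8);
$roundTrip = file_get_contents($path);
var_dump($roundTrip === $utf8); // bool(true)
unlink($path);
```

If the source really is UTF-8, writing it out with no conversion at all keeps it intact.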
It doesn't work like that. Even if you utf8_encode($theString) you will not CREATE a UTF8 file.
The correct answer has something to do with the UTF-8 byte-order mark.
Read these to understand the issue:
- http://en.wikipedia.org/wiki/Byte_order_mark
- http://unicode.org/faq/utf_bom.html
The solution is the following:
As the UTF-8 byte-order mark is "\xEF\xBB\xBF", we should add it at the very beginning of the file.
<?php
function writeStringToFile($file, $string){
    $f = fopen($file, "wb");
    $string = "\xEF\xBB\xBF" . $string; // prepend the UTF-8 BOM to the content
    fputs($f, $string);
    fclose($f);
}
?>
The $file can be any kind of file, plain text or XML...
The $string is your UTF8 encoded string.
Try it now and it will write a UTF8 encoded file with your UTF8 content (string).
writeStringToFile('test.xml', 'éèàç');
Maybe you want to call htmlentities($text) before writing it into file and html_entity_decode($fetchedData) before output. It'll work with Scandinavian letters.
It appears that your source file is not, in fact, in UTF-8. You might want to try using the same approach you've been using, but with a different encoding, such as UTF-16 perhaps.
You can do it as follows:
<?php
$s = "This is a string éèàç and it is in utf-8";
$f = fopen('myFile',"w");
fwrite($f, utf8_encode($s));
fclose($f);
?>
