Reading and parsing English with Japanese characters from a CSV file in PHP - php

I have a CSV file which has lots of lines like this:
I Want It All (Tribute to Queen);Dancer (おもしろ♪ Ver.)
Hijo De La Luna (Tribute to Mecano);Perfect (おもしろ♪ Ver.)
You've Got A Friend In Me (おもしろ♪英語 Ver.) [映画『トイ·ストーリー』より]
The CSV file has two columns. The first one contains only English strings, but the second one contains a mix of English and Japanese characters. My code to read this CSV file:
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
<?php
header('Content-Encoding: UTF-8');
$string = file_get_contents('myfile.csv');
echo $string;
?>
// My output
��I Want It All (Tribute to Queen);Dancer (J0�0W0�0j& Ver.)
Hijo De La Luna (Tribute to Mecano);Perfect (J0�0W0�0j& Ver.)
��You've Got A Friend In Me (J0�0W0�0j&񂞊 Ver.) [ f;u0�0�0��0�0�0�0�00�0�0]
If I try:
echo "Losing My Religion (Tribute to R.E.M.);I Love It (オモシロ♪ヴォイス ver.)"
it displays the text with Japanese characters correctly. I tried all the solutions I found on this site, but I am unable to parse the CSV file correctly.
I need help to parse this file correctly. Thanks in advance for your help!

I got the issue solved by this:
$f = file_get_contents('myfile.csv'); // Get the whole file as string
$f = mb_convert_encoding($f, 'UTF8', 'UTF-16LE'); // Convert the file to UTF8
$f = preg_split("/\R/", $f); // Split it by line breaks
$f = array_map('str_getcsv', $f);
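The sample lines above are separated by ";", while str_getcsv splits on "," unless told otherwise, and the converted text can still start with a byte order mark. A minimal sketch that handles both, assuming the file really is UTF-16LE; parseUtf16Csv is a hypothetical helper name, not something from the original answer:

```php
<?php
// Sketch: decode a UTF-16LE, semicolon-delimited CSV string into rows of UTF-8 fields.
// parseUtf16Csv() is a hypothetical helper name, not part of PHP.
function parseUtf16Csv(string $raw, string $delimiter = ';'): array
{
    // Convert the raw bytes from UTF-16LE to UTF-8.
    $utf8 = mb_convert_encoding($raw, 'UTF-8', 'UTF-16LE');

    // Drop the byte order mark (U+FEFF) if the file started with one.
    $utf8 = preg_replace('/^\x{FEFF}/u', '', $utf8);

    $rows = [];
    foreach (preg_split('/\R/u', $utf8) as $line) {
        if ($line === '') {
            continue; // skip blank lines, e.g. after the final line break
        }
        // Pass the delimiter explicitly; the default would be ",".
        $rows[] = str_getcsv($line, $delimiter, '"', '\\');
    }
    return $rows;
}
```

It would be called as $rows = parseUtf16Csv(file_get_contents('myfile.csv')); each row is then an array of UTF-8 strings, one per column.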

Related

Trying to read a CSV file with Thai characters in it using PHP, but after reading it the characters are changed to unidentified characters

I have a CSV file that has data like this:
Sub District District
A Hi อาฮี Tha Li District ท่าลี่
A Phon อาโพน Buachet District บัวเชด
I tried to read it with PHP code, following this SO question:
<?php
//set internal encoding to utf8
mb_internal_encoding('utf8');
$fileContent = file_get_contents('thai_unicode.csv');
//convert content from unicode to utf
$fileContentUtf = mb_convert_encoding($fileContent, 'utf8', 'unicode');
echo "parse utf8 string:\n";
var_dump(str_getcsv($fileContentUtf, ';'));
But it didn't work at all. Someone please let me know what I am doing wrong here.
Thanks in advance.
There are 2 issues with your code:
Your code applies str_getcsv to whole file contents (instead of individual line)
Your code example is using delimiter ";" but there is no such symbol in your input file.
Your data is in either fixed field length format (which is actually not a csv file) or in tab delimited csv file format.
If it is tab delimited file format then you can use 2 ways to read your file:
$lines = file('thai_unicode.csv');
foreach($lines as $line){
$data = str_getcsv($line,"\t");
echo "sub_district: ". $data[0].", district: ".$data[1]."\n";
}
or
$f = fopen('thai_unicode.csv',"r");
while($data = fgetcsv($f,0,"\t")){
echo "sub_district: ". $data[0].", district: ".$data[1]."\n";
}
fclose($f);
And if your data is in fixed-length field format, you need to split each line yourself, because the CSV-related PHP functions are not suitable for this purpose.
So you will end up with something like this:
$f = fopen('thai_unicode.csv',"r");
while($line = fgets($f)){
$sub_district = mb_substr($line,0,20);
$district = mb_substr($line,20);
echo "sub_district: $sub_district, district: $district\n";
}
fclose($f);

Extract information from some lines of a CSV file with PHP

I am trying to extract some information from a CSV file. I only want to extract information from the lines that contain the email and ID.
Previously I had no problems extracting information from csv files via php, however this csv file that I am using has the information in a very particular way.
Here I show part of the contents of the csv file:
As you can see, the information entered is not in a conventional format.
Using this small code:
$file = fopen("mails.csv","rb");
while(feof($file) == false)
{
echo fgets($file). "<br />";
}
fclose($file);
I get this on screen:
The file is longer and has more data. What I have shown is only a small part of it, used as an example.
What I want to do is extract information from the lines that contain the radio-type input, because that is where the email is, and the email is the information I really want to extract.
I have tried the conventional PHP functions for extracting information from CSV files, but they do not work for me.
They throw multiple errors, and I cannot get the information from only the lines that have the emails and save those emails in an array.
Any ideas on how to get only the lines that have the radio inputs, extract the email address from each of those lines, and save it in an array?
I would appreciate any help you can give me.
Assuming that your format never changes (including for the invalid lines that you wish to ignore), the below would work.
Note: I have not tested this code so use this as a pointer and make adjustments as required.
$file = fopen("mails.csv","rb");
// fgets() returns false at end of file, which ends the loop cleanly
while(($contents = fgets($file)) !== false){
// skip lines that start with "#"
if (substr($contents,0,1) != "#"){
$val = explode(",", $contents);
echo $val[0]. "<br />";
}
}
fclose($file);

How to properly read .nfo file with PHP

I'm trying to open and show a nfo file with a php script.
Everything is working but the result isn't like in the NFO file. I got special chars like that :
�������������������������
When I open the source code of the result, I can see the NFO file as it really is!
Do I need to use some special tricks for HTML or something like that?
You can convert the character encoding of your NFO text (for example, to output UTF-8):
$nfoContent = file_get_contents('foo.nfo');
$nfoContent = mb_convert_encoding($nfoContent, 'UTF-8', 'ASCII');
Thanks for your reply!
I found it. In PHP you just need to do:
header('Content-Type: text/plain; charset=ansi');
<?php
header('Content-Type: text/html; charset=UTF-8');
$file = 'CORE.NFO';
$nfo = file_get_contents($file);
echo '<pre>';
echo iconv('CP850', 'UTF-8', $nfo).PHP_EOL;
echo '</pre>';
http://pastebin.com/uqxg4yYC
CP850 west europe
CP866 russia
https://de.wikipedia.org/wiki/NFO

PHP: Issues reading and writing data(with special characters) to file

I am having issues reading and writing data (with special characters) to a file.
I am doing something like this:
//Writing data..
<?php
header('Content-Type: text/html; charset=utf-8');
$file = 'filename.db';
$data = 'Some string with special characters';
//Writing to the file..
#file_put_contents($file, json_encode($data));
?>
This works fine.
When I open the db file in Notepad ++, the data is proper.
Special characters are also stored properly:
//Reading data..
<?php
header('Content-Type: text/html; charset=utf-8');
$file = 'filename.db';
//Reading from the file..
$data = file_get_contents($file);
$data = json_decode(utf8_encode(stripslashes($data)));
echo $data;
?>
This displays the special characters as "????" or sometimes like "u00cf" or some other characters.
What is going wrong, and where?
Any help would be appreciated,
Thanks.
If you're not storing arrays or other complex data structures you do not need JSON.
When reading from the file, why in god's name do you mistreat the data by stripping slashes and running it through utf8_encode? That's what's destroying the JSON format and thereby your data.
Just write the raw string into a file and read it back as is, done!
$string = 'ユーティーエッフエイト';
file_put_contents('file.txt', $string);
$string = file_get_contents('file.txt');
Nothing more you need to do.
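If the file does have to stay JSON (for instance because arrays will be stored in it later), the same principle applies: decode exactly what was encoded, with no stripslashes or utf8_encode in between. A minimal sketch, assuming the data is already valid UTF-8; the temporary file is a hypothetical stand-in for filename.db:

```php
<?php
// Sketch: round-trip a UTF-8 string through JSON with no extra escaping or re-encoding.
$original = 'ユーティーエッフエイト';

// JSON_UNESCAPED_UNICODE keeps the characters readable in the file
// instead of writing \uXXXX escapes; either form decodes to the same string.
$json = json_encode($original, JSON_UNESCAPED_UNICODE);

// Hypothetical temp file standing in for filename.db.
$path = tempnam(sys_get_temp_dir(), 'json');
file_put_contents($path, $json);

// Decode exactly what was written: no stripslashes(), no utf8_encode().
$restored = json_decode(file_get_contents($path));
unlink($path);

var_dump($restored === $original);
```

Without the JSON_UNESCAPED_UNICODE flag the file would contain \uXXXX escapes, which is where output like "u00cf" comes from once stripslashes has removed the backslashes.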
In data reading script, try:
$data = json_decode(mb_convert_encoding(stripslashes($data),
"UTF-8"));

PHPWord: Creating an Arabic right to left word document

I'm trying to use PHPWord to create a word document that will include dynamic data pulled out from a MySQL database. The database has MySQL charset: UTF-8 Unicode (utf8)
MySQL connection collation: utf8_unicode_ci and so does the table fields.
Data is stored and previewed fine in HTML, however when creating the document with the arabic variables, the output in Word looks like أحÙد Ùبار٠اÙÙرÙ.
$PHPWord = new PHPWord();
$document = $PHPWord->loadTemplate('templates/.../wtvr.docx');
$document->setValue('name', $name);
$document->setValue('overall_percent_100', $overall_percent_100);
$document->save('Individual Report - ' . $name . '.docx');
Is there any way to fix that?
Well, yes. But you must unfortunately modify the library. The author of the library uses utf8_encode/utf8_decode obviously without understanding what they do at all.
On line 150, of Shared/String.php:
Replace
public static function IsUTF8($value = '') {
return utf8_encode(utf8_decode($value)) === $value;
}
With
public static function IsUTF8($value = '') {
return mb_check_encoding($value, "UTF-8");
}
Then, if you do
$ grep -rn "utf8_encode" .
On the project root, you will find all lines where utf8_encode is used. You will see lines like
$linkSrc = utf8_encode($linkSrc); //$linkSrc = $linkSrc;
$givenText = utf8_encode($text); //$givenText = $text;
You can simply remove the utf8_encode as shown in the comments.
Why is utf8_encode/utf8_decode wrong? First of all, because that's not what they do. They do from_iso88591_to_utf8 and from_utf8_to_iso88591. Secondly, ISO-8859-1 is almost never used, and usually when someone claims they use it, they are actually using Windows-1252. ISO-8859-1 is a very tiny character set, not even capable of encoding €, let alone Arabic letters.
You can do fast reviews of a library by doing:
$ grep -rn "utf8_\(en\|de\)code" .
If you get matches, you should move on and look for some other library. These functions simply do the wrong thing every time, and even if someone needed some edge case to use these functions, it's far better to be explicit about it when you really need ISO-8859-1, because you normally never do.
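To make the failure concrete, here is a minimal sketch of what happens when text that is already UTF-8 goes through that ISO-8859-1-to-UTF-8 conversion. It uses mb_convert_encoding with an explicit source encoding to perform the same conversion as utf8_encode; the single Arabic letter is just an illustrative input:

```php
<?php
// Sketch: what happens when UTF-8 text is "encoded to UTF-8" a second time.
$name = 'أ'; // U+0623, already valid UTF-8 (bytes D8 A3)

// Same conversion utf8_encode() performs: treat each byte as an ISO-8859-1 character.
$double = mb_convert_encoding($name, 'UTF-8', 'ISO-8859-1');

// Each original byte has become its own two-byte character, so the letter is lost.
echo $double, PHP_EOL;

// The replacement check suggested above: true for the untouched string.
var_dump(mb_check_encoding($name, 'UTF-8'));

// Byte lengths before and after the double encoding.
var_dump(strlen($name), strlen($double));
```

The two bytes of the original letter come out as two separate Latin characters, which is the same kind of garbling shown in the Word output in the question.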
Follow these steps to insert any kind of UTF-8 right-to-left data into a PHPWord template.
In the setValue function (line #95) in Template.php, comment out the following portion of code:
//if(!is_array($replace)) {
// $replace = utf8_encode($replace);
//}
If you have a problem with right-to-left text, where in some languages it gets mixed up with left-to-right text, add the following code in the same setValue function:
$replace = "<w:rPr><w:rtl/></w:rPr>".$replace;
//==== here is a working example of how data can be written into the Word template
//--- load phpword libraries ----
$this->load->library("phpword/PHPWord");
$PHPWord = new PHPWord();
$document = $PHPWord->loadTemplate('./forms/data.docx');
$document->setValue('NAME', 'شراف الدين');
$document->setValue('SURNAME', 'مشرف');
$document->setValue('FNAME', 'ظهرالدين');
$document->setValue('MYVALUE', '15 / سنبله / 1363');
$document->setValue('PROVINCE', 'سمنگان');
$document->setValue('DNAME', 'عبدالله');
$document->setValue('DMOBILE', '0775060701');
$document->setValue('BOX','<w:sym w:font="Wingdings" w:char="F06F"/>');
$document->setValue('NO','<w:sym w:font="Wingdings" w:char="F06F"/>');
//$document->setValue('BOX2','<w:sectPr w:rsidR="00000000"><w:pgSz w:w="12240" w:h="15840"/><w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0"/><w:cols w:space="720"/><w:docGrid w:linePitch="360"/>');
$document->setValue('YES','<w:sym w:font="Wingdings" w:char="F0FE"/>');
$document->setValue('CLASS1','<w:sym w:font="Wingdings" w:char="F06F"/>');
$document->setValue('CLASS2','<w:sym w:font="Wingdings" w:char="F0FE"/>');
$document->setValue('DNAME','يما شاه رخي');
$document->setValue('TEL','0799852369');
$document->setValue('ENTITY','مشاور حقوقي و نهادي');
$document->setValue('ENTITY','مشاور حقوقي و نهادي');
$document->setValue('REMARKS','در مسابقات سال 2012 میلادی در میدان Judo بر علاوه به تعداد 39 نفر در تاریخ 4/میزان/ سال 1391 قرار ذیل اند.');
$file = "./forms/data2.docx";
$document->save($file);
header("Cache-Control: public");
header("Content-Description: File Transfer");
header("Content-Disposition: attachment; filename=data2.docx");
header("Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document");
header("Content-Transfer-Encoding: binary");
ob_clean();
flush();
readfile($file);
//need how design can change the looking.
colr #E4EDF9
Find
$objWriter->startElement('w:t');
$objWriter->writeAttribute('xml:space', 'preserve'); // needed because of drawing spaces before and after text
$objWriter->writeRaw($strText);
$objWriter->endElement();
In Writer/Word2007/Base.php
replace with
$objWriter->startElement('w:textDirection');
$objWriter->writeAttribute('w:val', 'rlTb');
$objWriter->startElement('w:t');
$objWriter->writeAttribute('xml:space', 'preserve'); // needed because of drawing spaces before and after text
$objWriter->writeRaw($strText);
$objWriter->endElement();
$objWriter->endElement();
Also, make sure you don't use any styles to make it work, or else you will have to repeat this step in every function you use.
I had to fix it in two places, differently from Naser's way:
1- in Section.php addText function:
I did this:
//$givenText = utf8_encode($text);
$givenText = $text;
2- in cell.php addText function
I did this:
// $text = utf8_encode($text);
Now your Word file will display Unicode characters correctly.
Then I had a problem with text direction.
I found the solution by using this code:
$section->addText($val['notetitle'],array('textDirection'=>PHPWord_Style_Cell::TEXT_DIR_TBRL));
You can see the two constants in the cell.php file:
const TEXT_DIR_TBRL = 'tbRl';
const TEXT_DIR_BTLR = 'btLr';
Note that you cannot apply other array-combined styles such as Paragraph before 'textDirection', because those styles disable 'textDirection'.
Open PHPWord\Template.php
Change in setValue function (line no 89.) as below.
Change $replace = utf8_encode($replace);
to
$replace = $replace;
