cURL font encoding-error - php

I want to get contents via cURL from this page.
Here is my code:
$url = $_GET["url"];
$url = str_replace(" ", "%20", $url);
$curlSession = curl_init();
curl_setopt($curlSession, CURLOPT_URL, $url);
curl_setopt($curlSession, CURLOPT_BINARYTRANSFER, true);
curl_setopt($curlSession, CURLOPT_RETURNTRANSFER, true);
$jsonData = curl_exec($curlSession);
curl_close($curlSession);
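// strpos() returns false when nothing is found, and false >= 0 is true in PHP,
// so compare against false explicitly.
if (strpos($url, "toomva.com") !== false) {
    $jsonData = str_replace("toomva.com", "http://av.bsquochoai.ga ⇔ ", $jsonData);
}
if (strpos($url, "Toomva -") !== false) {
    $jsonData = str_replace("toomva.com", "http://av.bsquochoai.ga ⇔ ", $jsonData);
}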
echo($jsonData);
Here you can find a live demo.
My problem is that the returned text is not as I expect. It has a lot of �����:
��1� � �0�0�:�0�0�:�2�4�,�4�0�0� �-�-�>� �0�0�:�0�0�:�3�3�,�1�4�0� �
�M��i� �k�h�i� �a�n�h� �t�r���n�g� �t�h��y� �k�h�u���n� �m��t� �e�m�,�
�t�h�� �g�i�a�n� �n���y� �n�h�� �c�h��t� �t�a�n� �b�i��n� � �
Can you please help me with this?

Here are the first few bytes of the file you're trying to access:
$ curl -s 'http://toomva.com/Data/subtitle/Duncan%20James%20ft.%20Keedie%20-%20I%20Believe%20My%20Heart.Vie_Syned.srt' | xxd | head
0000000: fffe 3100 0d00 0a00 3000 3000 3a00 3000 ..1.....0.0.:.0.
0000010: 3000 3a00 3200 3400 2c00 3400 3000 3000 0.:.2.4.,.4.0.0.
0000020: 2000 2d00 2d00 3e00 2000 3000 3000 3a00 .-.-.>. .0.0.:.
0000030: 3000 3000 3a00 3300 3300 2c00 3100 3400 0.0.:.3.3.,.1.4.
0000040: 3000 0d00 0a00 4d00 d71e 6900 2000 6b00 0.....M...i. .k.
0000050: 6800 6900 2000 6100 6e00 6800 2000 7400 h.i. .a.n.h. .t.
0000060: 7200 f400 6e00 6700 2000 7400 6800 a51e r...n.g. .t.h...
0000070: 7900 2000 6b00 6800 7500 f400 6e00 2000 y. .k.h.u...n. .
0000080: 6d00 b71e 7400 2000 6500 6d00 2c00 2000 m...t. .e.m.,. .
0000090: 7400 6800 bf1e 2000 6700 6900 6100 6e00 t.h... .g.i.a.n.
It starts with 0xff 0xfe, which is the byte order mark for UTF-16 Little Endian. This information should really be provided in the file's HTTP headers, but apparently not in this case.
You can use PHP's mb_convert_encoding() function to convert the file's content into whatever character set your website uses. For example, this converts it to UTF-8:
$src = file_get_contents('http://toomva.com/Data/subtitle/Duncan%20James%20ft.%20Keedie%20-%20I%20Believe%20My%20Heart.Vie_Syned.srt');
$utf8src = mb_convert_encoding($src,'UTF-8','UTF-16LE');
header('Content-Type: text/plain; charset=utf-8');
die($utf8src);
However, the file doesn't contain JSON data. Here are the first few lines:
1
00:00:24,400 --> 00:00:33,140
Mỗi khi anh trông thấy khuôn mặt em, thế gian này như chợt tan biến
2
00:00:33,140 --> 00:00:42,700
Tất cả đều phơi bày trong một ánh nhìn thoáng qua

Use utf8_encode when you echo your $jsonData (note that utf8_encode assumes ISO-8859-1 input, so it does not repair UTF-16 data):
echo(utf8_encode($jsonData));

Related

Save data to CSV and encode to utf-8 [duplicate]

I have a database encoded as utf8mb4. I connect with this database and I set utf8 charset:
$dbHandler = new PDO("mysql:host=$dbHost;dbname=$dbName;charset=utf8mb4", $dbUsername, $dbPassword);
All data is properly encoded in the DB. I want to fetch the data and save it as CSV:
$fp = fopen('data.csv', 'w+');
foreach ($result as $row) {
...
fputcsv($fp, $csvData, ';');
}
But then all the encoding is broken:
groÃ<9f>e,
Zubehör. etc.
I've tried adding a BOM (didn't help) and converting with array_map("utf8_encode", $csvData); (some characters are displayed correctly: große, Zubehör, but some are not: Kabelverl?ng, F?r). Any ideas?
EDIT:
Hexdump output beginning of file:
00000000: efbb bf70 726f 6475 6374 3b61 7274 6963 ...product;artic
00000010: 6c65 3b73 6b75 3b64 6174 653b 6e61 6d65 le;sku;date;name
00000020: 0a30 3030 3239 3039 3530 3030 3b3b 3b3b .00028151000;;;;
00000030: 2242 7265 616b 6f75 742d 626f 7820 4b70 "Breakout-box Kp
00000040: 6c2e 223b 223c 7374 726f 6e67 3e42 7265 l.";"<strong>Bre
00000050: 616b 6f75 742d 626f 7820 4b70 6c2e 3c2f akout-box Kpl.</
Hexdump output of file with 1 record where we can see the issue (F..r instead of Für). By the way - original string was modified by ucwords and strtolower:
00000000: 3030 3032 3930 3936 3030 333b 3b3b 3b22 00028151000;;;;"
00000010: 4e65 747a 7465 696c 2032 3230 762f 3132 Netzteil 220v/12
00000020: 7620 46e3 9c72 2041 766c 223b 223c 7374 v F..r Avl";"<st
00000030: 726f 6e67 3e4e 6574 7a74 6569 6c20 3232 rong>Netzteil 22
00000040: 3076 2f31 3276 2046 e39c 7220 4176 6c3c 0v/12v F..r Avl<
00000050: 2f73 7472 6f6e 673e 3c62 723e 3c62 723e /strong><br><br>
00000060: 4f45 4d20 4e75 6d6d 6572 3a20 3030 3032 OEM Nummer: 0002
00000070: 3930 3936 3030 3322 3b31 3038 2e34 363b 9096003";108.46;
00000080: 3030 3032 3930 3936 3030 332d 6e65 747a 00028151000-netz
00000090: 7465 696c 2d32 3230 762d 3132 762d 6675 teil-220v-12v-fu
000000a0: 722d 6176 6c3b 4875 7371 7661 726e 613b r-avl;Husqvarna;
000000b0: 4452 4f50 444f 574e 3b59 3b4e 3b68 7474 DROPDOWN;Y;N;htt
000000c0: 7073 3a2f 2f73 7061 7265 7061 7274 7366 ps://sparepartsf
000000d0: 696e 6465 722e 6b74 6d2e 636f 6d2f 5350 inder.fha.com/SP
000000e0: 462f 496d 6167 6573 2f6d 6170 732f 3130 F/Images/maps/10
000000f0: 3030 3032 3932 302e 6769 663b 313b 4154 0002920.gif;1;AT
00000100: 3b57 6964 6765 743b 224b 544d 204f 7269 ;Ponret;"KTM Ori
00000110: 6769 6e61 6c20 4572 7361 747a 7465 696c ginal Ersatzteil
00000120: 6522 3b22 4875 7371 7661 726e 6120 4e65 e";"Husqvarna Ne
00000130: 747a 7465 696c 2032 3230 762f 3132 7620 tzteil 220v/12v
00000140: 46e3 9c72 2041 766c 202d 204f 454d 204e F..r Avl - OEM N
00000150: 756d 6d65 723a 2030 3030 3239 3039 3630 ummer: 000290960
00000160: 3033 223b 3b22 4b61 7566 656e 2053 6965 03";;"Kaufen Sie
00000170: 2048 7573 7176 6172 6e61 204e 6574 7a74 Husqvarna Netzt
00000180: 6569 6c20 3232 3076 2f31 3276 2046 e39c eil 220v/12v F..
00000190: 7220 4176 6c20 6d69 7420 4f45 4d2d 4e75 r Avl mit OEM-Nu
000001a0: 6d6d 6572 2030 3030 3239 3039 3630 3033 mmer 00028151000
000001b0: 2062 6569 2065 696e 656d 2048 7573 7176 bei einem Husqv
000001c0: 6172 6e61 2d56 6572 7472 6167 7368 c3a4 arna-Vertragsh..
000001d0: 6e64 6c65 722e 2057 6972 2068 6162 656e ndler. Wir haben
000001e0: 2065 696e 6520 6772 6fc3 9f65 2041 7573 eine gro..e Aus
000001f0: 7761 686c 2061 6e20 4875 7371 7661 726e wahl an Husqvarn
00000200: 612d 4572 7361 747a 7465 696c 656e 2c20 a-Ersatzteilen,
00000210: 4163 6365 7373 6f72 6965 732c 2043 6c6f Accessories, Clo
00000220: 7468 696e 672c 204d 5820 4265 6b6c 6569 thing, MX Beklei
00000230: 6475 6e67 2075 6e64 205a 7562 6568 c3b6 dung und Zubeh..
00000240: 722e 220a r.".
file data.csv output:
data.csv: Non-ISO extended-ASCII text, with very long lines
The problem was that I was using strtolower and ucfirst. I changed it to
$name = mb_convert_case($name, MB_CASE_LOWER, "UTF-8");
$name = mb_convert_case($name, MB_CASE_TITLE, "UTF-8");
and it works.
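The hexdump above is consistent with this: ü is c3 bc in UTF-8, but the file contains e3 9c, which is what byte-oriented case functions can produce when they change individual bytes of a multibyte character (in older PHP versions this depended on the locale). A minimal sketch of the multibyte-safe version, using a made-up sample string and assuming the mbstring extension is loaded:

```php
<?php
// Hypothetical sample value; "\u{00DC}" is "Ü" written as an escape so the
// example does not depend on how this source file is saved.
$name = "NETZTEIL F\u{00DC}R AVL";

// mb_convert_case works on characters, not bytes, when told the encoding.
$lower = mb_convert_case($name, MB_CASE_LOWER, "UTF-8");
$title = mb_convert_case($lower, MB_CASE_TITLE, "UTF-8");

// The result is still valid UTF-8, so it can be passed to fputcsv as-is.
var_dump(mb_check_encoding($title, "UTF-8"));
echo $title, PHP_EOL;
```

MB_CASE_TITLE already lowercases the rest of each word, so the separate MB_CASE_LOWER step is kept here only to mirror the two calls above.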

Issue with file_get_contents encoding

I'm calling file_get_contents(uri) and getting back JSON that I'm unable to decode.
I tried several encodings and str_replace, but I don't quite understand what the issue is.
This is the start of my json with file_get_contents:
string(67702) "��{"localidades"
I know it's finding unknown characters and that's what the ? are for, but I don't understand how to solve it.
I've tried this but to no avail
if(substr($s, 0, 2) == chr(0xFF).chr(0xFE)){
return substr($s,3);
}
else{
return $s;
}
}
This is xxd | head from terminal
00000000: fffe 7b00 2200 6c00 6f00 6300 6100 6c00 ..{.".l.o.c.a.l.
00000010: 6900 6400 6100 6400 6500 7300 2200 3a00 i.d.a.d.e.s.".:.
00000020: 2000 5b00 7b00 2200 6900 6400 4c00 6f00 .[.{.".i.d.L.o.
00000030: 6300 6100 6c00 6900 6400 6100 6400 2200 c.a.l.i.d.a.d.".
00000040: 3a00 2000 3300 2c00 2200 6c00 6f00 6300 :. .3.,.".l.o.c.
00000050: 6100 6c00 6900 6400 6100 6400 2200 3a00 a.l.i.d.a.d.".:.
00000060: 2000 2200 4200 7500 6500 6e00 6f00 7300 .".B.u.e.n.o.s.
00000070: 2000 4100 6900 7200 6500 7300 2200 2c00 .A.i.r.e.s.".,.
00000080: 2200 6900 6400 5000 7200 6f00 7600 6900 ".i.d.P.r.o.v.i.
00000090: 6e00 6300 6900 6100 2200 3a00 2000 2200 n.c.i.a.".:. .".
What you have there is UTF-16LE, in which each code point is encoded as at least two bytes, even basic ASCII. The first two bytes of the document are the byte order mark (BOM), which declares the byte order (endianness) in which the data is encoded:
$input = "\xff\xfe{\x00}\x00"; // UTF-16-LE with BOM
function convert_utf16($input, $charset=NULL) {
// if your data has no BOM you must explicitly define the charset.
if( is_null($charset) ) {
$bom = substr($input, 0, 2);
switch($bom) {
case "\xff\xfe":
$charset = "UTF-16LE";
break;
case "\xfe\xff":
$charset = "UTF-16BE";
break;
default:
throw new \Exception("No encoding specified, and no BOM detected");
break;
}
$input = substr($input, 2);
}
return mb_convert_encoding($input, "UTF-8", $charset);
}
$output = convert_utf16($input);
var_dump(
$output,
bin2hex($output),
json_decode($output, true)
);
Output:
string(2) "{}"
string(4) "7b7d"
array(0) {
}
It's also worth noting that the current JSON specification (RFC 8259) requires JSON exchanged between systems to be encoded as UTF-8, so you should tell whoever is giving you this data to fix their app.
What you are getting is UTF-16 LE. The fffe at the beginning is called a BOM. You can use iconv:
$data = iconv( 'UTF-16', 'UTF-8', $data);
Now you have UTF-8. Depending on the iconv implementation, the result may still begin with a UTF-8 BOM (the bytes EF BB BF). json_decode does not skip a leading BOM, so you should remove it if it is there (see @Sammitch's comment):
$data = preg_replace('/^\xEF\xBB\xBF/', '', $data);
I recreated a part of your file and i get this:
$data = file_get_contents('/var/www/html/utf16le.json');
$data = preg_replace('/^\xEF\xBB\xBF/', '', iconv('UTF-16', 'UTF-8', $data));
print_r(json_decode($data));
Output:
stdClass Object
(
[localidades] => Array
(
[0] => stdClass Object
(
[idLocalidad] => 3
[localidad] => Buenos Aires
)
)
)
And from xxd:
The file you are trying to process is encoded in UTF-16, which PHP's string functions do not handle natively. To process it, remove the BOM (the first two bytes) and then convert the encoding to UTF-8 using iconv or mbstring.

Merging Different PDF formats with PHP?

I am trying to merge a few PDF files with Setasign FPDI. The package works fine for some PDF formats but fails for others.
There are three different PDF formats that I could find.
Format 1:
%PDF-1.4
%´µ¶·
%
1 0 obj
<<
/Type /Catalog
/PageMode /UseNone
/ViewerPreferences 2 0 R
/Pages 3 0 R
/PageLayout /OneColumn
>>
Format 2:
--uuid:3c4caf6a-2a7e-4ca5-9e0a-63346610deae
Content-Type: application/octet-stream
Content-Transfer-Encoding: binary
Content-ID: <1>
%PDF-1.4
%âãÏÓ
1 0 obj
<</ColorSpace/DeviceGray/Subtype/Image
Format 3:
2550 4446 2d31 2e34 0a25 aaab acad 0a34
2030 206f 626a 0a3c 3c0a 2f43 7265 6174
6f72 2028 4170 6163 6865 2046 4f50 2056
6572 7369 6f6e 2031 2e30 290a 2f50 726f
6475 6365 7220 2841 7061 6368 6520 464f
5020 5665 7273 696f 6e20 312e 3029 0a2f
4372 6561 7469 6f6e 4461 7465 2028 443a
3230 3136 3131 3130 3135 3437 3532 5a29
0a3e 3e0a 656e 646f 626a 0a35 2030 206f
FPDI works great with Format 1 but fails for Format 2.
When I tried to merge two Format 2 files using another PDF merging website, I got a combined PDF in Format 3.
My question is how I can merge two Format 2 files into any format in PHP.
And if anyone can explain these formats, that would be great too.
"Format 2" is a corrupted file, because it includes invalid header data which will corrupt the byte offset positions in the PDF (FPDI will not repair such files but requires valid PDFs).
"Format 3" is only a bunch of hex values not a PDF file.
Thanks to Setasign's Answer, I have cleaned the invalid format to a valid one.
I am using simple content splitting.
public function parseRawResponse($raw, $from)
{
$positionMap = [
'PDF' => [ 'init' => "%PDF-1.4\n", 'end' => "\n%%EOF"]
];
$initPos = strpos($raw,$positionMap[$from]['init']);
$endPos = strrpos($raw, $positionMap[$from]['end']) + strlen($positionMap[$from]['end']);
$content = substr($raw, $initPos, ($endPos - $initPos));
return $content;
}
Where $raw is format 2 and $content is actual content for PDF.
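The same splitting idea can be sketched without hard-coding the PDF version and with a check for missing markers. This is only an illustration under stated assumptions: extractPdf() is a made-up name, and it assumes the wrapped data contains exactly one PDF that starts at the first %PDF- and ends at the last %%EOF.

```php
<?php
/**
 * Return the bytes from the first "%PDF-" up to and including the last
 * "%%EOF", or null if either marker is missing.
 * extractPdf() is a hypothetical helper, not part of FPDI.
 */
function extractPdf(string $raw): ?string
{
    $start = strpos($raw, "%PDF-");
    $end = strrpos($raw, "%%EOF");

    // strpos/strrpos return false when the marker is absent.
    if ($start === false || $end === false || $end < $start) {
        return null;
    }

    return substr($raw, $start, $end + strlen("%%EOF") - $start);
}

// Usage with a tiny made-up multipart wrapper around a fake PDF body:
$raw = "--uuid:example\r\nContent-Type: application/octet-stream\r\n\r\n"
     . "%PDF-1.4\nbody\n%%EOF\r\n--uuid:example--";
var_dump(extractPdf($raw) === "%PDF-1.4\nbody\n%%EOF");
```

Returning null instead of a partial string makes it easier to skip a response that contains no PDF at all before handing it to FPDI.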

Adjust cell data in CSV file, PHP

I am writing the header row to a CSV file using this:
fputcsv($fp, $columns);
It writes the header to the CSV as follows:
AED AFN ALL AMD ANG AOA ARS AUD AWG AZN BAM BBD BDT BGN BHD BIF BMD BND BOB BRL BSD BTC BTN BWP BYR BZD CAD CDF CHF CLF CLP CNY COP CRC CUC CUP CVE CZK DJF DKK DOP DZD EEK EGP ERN ETB EUR FJD FKP GBP GEL GGP GHS GIP GMD GNF GTQ GYD HKD HNL HRK HTG HUF IDR ILS IMP INR IQD IRR ISK JEP JMD JOD JPY KES KGS KHR KMF KPW KRW KWD KYD KZT LAK LBP LKR LRD LSL LTL LVL LYD MAD MDL MGA MKD MMK MNT MOP MRO MUR MVR MWK MXN MYR MZN NAD NGN NIO NOK NPR NZD OMR PAB PEN PGK PHP PKR PLN PYG QAR RON RSD RUB RWF SAR SBD SCR SDG SEK SGD SHP SLL SOS SRD STD SVC SYP SZL THB TJS TMT TND TOP TRY TTD TWD TZS UAH UGX USD UYU UZS VEF VND VUV WST XAF XAG XAU XCD XDR XOF XPF YER ZAR ZMK ZMW ZWL
The above data is in the $columns array.
With the above data there are 168 columns.
Now I am using a for loop to fetch the data:
$num_records=count($columns);
for($z=1;$z<$num_records;$z++){
$rowData=fetchData($columns[$z] ,$columns[$z],1);
fputcsv($fp, $rowData);
}
Here is fetchData function
function fetchData($from,$to,$amount){
$access_key = 'MYKEY';
$endpoint = 'live';
// initialize CURL:
if($from=="GNF"){
$url='http://apilayer.net/api/'.$endpoint.'?access_key='.$access_key.'&from='.$from.'&to='.$to.'&amount='.$amount.'&source='.$from."";
$ch = curl_init($url);
//echo $url;exit;
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// get the (still encoded) JSON data:
$json = curl_exec($ch);
curl_close($ch);
// Decode JSON response:
$conversionResult = json_decode($json, true);
$tmpArray=$conversionResult['quotes'];
if($from=="")
$from="USD";
array_unshift($tmpArray, $from);
}
return $tmpArray;
}
For GNF it returns 166 values, which is fewer than the header has.
My problem is that the GNF array has 166 conversions, while we have 168 columns.
So the data does not line up with the header in the CSV.
Please tell me how I can fix this so that the data lines up with the header.
Here is the generated CSV. You can see that the data is not shown correctly.
https://www.dropbox.com/s/oo0a6ni4xkebat9/currency.csv?dl=0&s=sl
Your for loop, for($z=1; $z < $num_records; $z++), starts at index 1, so it skips element [0]. It does reach the last element, [167], because the condition is $z < $num_records. Start at 0 and keep the strict comparison (<= would read one index past the end of the array):
$num_records=count($columns);
for($z=0; $z < $num_records; $z++){
$rowData=fetchData($columns[$z] ,$columns[$z],1);
fputcsv($fp, $rowData);
}
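If the API really returns fewer quotes than there are header columns, one option is to build each row from the header rather than from whatever the API returned, leaving a blank cell for any missing currency. The sketch below assumes that the 'quotes' array is keyed by source code plus target code (for example GNFAED); that key format is an assumption and should be checked against an actual response. alignRow() is a made-up name.

```php
<?php
/**
 * Build one CSV row in header order, with '' for currencies that are
 * missing from the API response.
 * alignRow() is a hypothetical helper; the "FROM+TO" key format is assumed.
 */
function alignRow(string $from, array $columns, array $quotes): array
{
    $row = [$from]; // leading label, as array_unshift() does in the question
    foreach ($columns as $code) {
        $key = $from . $code;
        $row[] = array_key_exists($key, $quotes) ? $quotes[$key] : '';
    }
    return $row;
}

// Usage with made-up numbers: AFN is missing, so its cell stays empty.
$row = alignRow('GNF', ['AED', 'AFN', 'ALL'], ['GNFAED' => 0.0004, 'GNFALL' => 0.01]);
print_r($row);
```

Because each row starts with the source currency, the header row would also need a leading empty cell for the columns to line up.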

Text line into an separated array with PHP [closed]

I have a text file with 30 space-separated values per line.
12.09.11 0:01 16.2 16.2 16.2 72 11.2 3.1 SE 0.19 3.6 SE 15.9 15.9 15.7 761.8 0.00 0.0 0.001 0.000 20.9 46 8.9 19.8 8.65 1.1902 13 1 56.5 1
12.09.11 0:02 16.2 16.2 16.2 72 11.2 3.1 SE 0.19 4.0 SE 15.9 15.9 15.7 761.8 0.00 0.0 0.001 0.000 20.9 46 8.9 19.8 8.65 1.1903 23 1 100.0 1
12.09.11 0:03 16.2 16.2 16.2 72 11.1 3.6 SE 0.21 4.9 SE 15.4 15.9 15.2 761.8 0.00 0.0 0.002 0.000 20.9 46 8.8 19.8 8.65 1.1905 23 1 100.0 1
I'm not so good with PHP arrays, so I'm stuck with this:
<?PHP
$file_handle = fopen("data.txt", "rb");
while (!feof($file_handle) ) {
$textline = fgets($file_handle);
print $textline[0] . $textline[1]. $textline[2] . "<BR>";
}
fclose($file_handle);
?>
It outputs the first 3 characters of each line; in this case it looks like:
12.
12.
12.
But I need the data values fully separated into arrays, so the output should look like this:
12.09.11 0:01 16.2
12.09.11 0:02 16.2
12.09.11 0:03 16.2
P.S. I also need to skip the first 3 lines; it should always start reading from the 4th line.
Any tips or advice on how to script this properly? Thanks!
The below code reads a text file line by line.
<?php
$file = "/tmp/file1.txt";
$f = fopen($file, "r");
while ( $line = fgets($f, 1000) ) {
print $line;
}
?>
So, the same way, you can use:
<?php
$file = "data.txt";
$f = fopen($file, "r");
$myArray = array();
while ( $line = fgets($f) ) {
$myArray[] = explode(" ", $line);
}
print_r($myArray);
?>
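The question also asks to skip the first three lines, and real data files often separate values with more than one space. A sketch that handles both, where parseDataLines() is a made-up helper name and the file data.txt is assumed to exist as in the question:

```php
<?php
/**
 * Split whitespace-separated lines into arrays of values, skipping the
 * first $skip lines. parseDataLines() is a hypothetical helper, not part of PHP.
 */
function parseDataLines(array $lines, int $skip = 3): array
{
    $rows = [];
    foreach (array_slice($lines, $skip) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // ignore blank lines, e.g. a trailing newline
        }
        // One or more whitespace characters count as a single separator.
        $rows[] = preg_split('/\s+/', $line);
    }
    return $rows;
}

// Usage with the file from the question (assumed to exist):
// $rows = parseDataLines(file("data.txt"));
// foreach ($rows as $r) {
//     print $r[0] . " " . $r[1] . " " . $r[2] . "<BR>";
// }
```

Trimming each line first also removes the newline that explode(" ", $line) would otherwise leave attached to the last value.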
