Hungarian/Bulgarian characters from CSV file end up garbled in PHP

I'm trying to import a CSV file which looks something like this:
"source "," destination "
férfi-/ruházat-Öltöny," férfi-/ruházat-blézer_zakó",
Note that this is just a sample of the CSV, not the whole CSV.
The way I'm reading the file is pretty straightforward:
$line = fgets($this->fileHandle) ;
$line = mb_convert_encoding($line , 'UTF-8', mb_detect_encoding($line));
Where $this->fileHandle is just a resource pointing to the file opened using fopen. So nothing too special there.
I want to do some string manipulation on the strings inside the CSV. I can import it just fine.
When I read from the file, whether using fgets, fread, or any other function I can think of, I end up with garbled text.
Something along the lines of this:
So far I've tried setting mb_internal_encoding() to "UTF-8", to "ISO-8859-2", and to a few other encodings. Nothing worked.
I've also tried mb_convert_encoding($line , 'UTF-8', mb_detect_encoding($line)) where $line is the line read from the csv.
Again, nothing. Still garbled text.
Next I assumed it may be something from my OS. I'm using a Mac with a Docker instance running Ubuntu.
The Mac is on High Sierra v10.13.4.
A locale command in the terminal gives me:
LANG="C.UTF-8"
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=
As for the Docker instance:
Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty
# locale
LANG=C.UTF-8
LANGUAGE=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=
So everything seems to be fine in that regard.
I've also tried an online PHP interpreter and that works fine. So clearly the issue is on my side.
To be honest I have no idea where the issue lies.
Any pointing in the right direction is greatly appreciated.

To answer my own question:
I had to call ini_set("default_charset", "UTF-8"). The default was an empty string.
I have no idea how it worked without it so far; I assume it has some sort of fallback encoding.
Either way, I hope this helps anybody else who gets stuck on this.
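A minimal sketch of how that setting could be combined with the reading code from the question. It is not the exact code from the project: readLineAsUtf8 is a hypothetical helper name, and the candidate list (UTF-8, then ISO-8859-2) is an assumption about what a Hungarian CSV is likely to contain. Passing an explicit candidate list and strict mode to mb_detect_encoding makes the detection much less of a guess than calling it with one argument.

```php
<?php
// The fix from the answer: make sure PHP's default charset is not empty.
ini_set('default_charset', 'UTF-8');

/**
 * Hypothetical helper: read one line and return it as UTF-8.
 * Assumption: the file is either UTF-8 or one of the listed legacy encodings.
 */
function readLineAsUtf8($handle, array $candidates = ['UTF-8', 'ISO-8859-2'])
{
    $line = fgets($handle);
    if ($line === false) {
        return false; // end of file or read error
    }

    // Strict mode rejects encodings in which the bytes are not valid.
    $encoding = mb_detect_encoding($line, $candidates, true);
    if ($encoding === false || $encoding === 'UTF-8') {
        return $line; // already UTF-8, or unknown: leave the bytes untouched
    }

    return mb_convert_encoding($line, 'UTF-8', $encoding);
}
```

Called in a loop as `while (($line = readLineAsUtf8($this->fileHandle)) !== false) { ... }`, this leaves UTF-8 input unchanged and only converts lines that are not valid UTF-8.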

Related

Nothing has worked to code PHP in Visual Studio Code via Windows 10

I am taking a Udemy class for WordPress. I can create a function just fine. However, when I started using variables and an array, it stopped working. I get a message saying that "php.validate.executablePath" and "php.executablePath" are not set in settings.json.
So I downloaded the zip file of PHP 7.3 (7.3.17) VC15 x64 Non Thread Safe and extracted it to C:\php7.3. I then added the paths to settings.json: "php.validate.executablePath": "C:\php7.3\php.exe", "php.executablePath": "C:\php7.3\php.exe"
I then restarted VS Code and nothing happened. I even restarted my computer. The notification never popped up again, but I still cannot run PHP. In place of the \, I tried one \, then one /, then two /, but it did not change anything. I looked this issue up, found others who had the same issue, and have not found anything that has worked for me so far.
I also tried installing XAMPP, but there was an issue with port 443. That's when I downloaded PHP directly and uninstalled XAMPP.
EDIT: Do I have to have something like xampp or wamp to execute the PHP? If so, I would just need to figure out how to fix that error.
Here is a screenshot of the code I used: http://prntscr.com/sahb1f
Here is a screenshot of the settings.json: http://prntscr.com/sahbk0
I am on Windows 10 x64. A tool extension for VSC costs money, which I do not want to spend.
Any help?
Thanks!
You are missing the echo in front of the $names[0] array element access.
Change your line to this: <p>Hi, my name is <?php echo $names[0]; ?></p> and it should work. However, your executablePath error might have a different cause.
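To illustrate why the echo matters, here is a small sketch. The contents of $names are hypothetical (the real array is only visible in the screenshot); output buffering is used purely to capture what each variant would print.

```php
<?php
// Hypothetical data standing in for the array from the course.
$names = ['Alice', 'Bob'];

// Variant 1: the expression is evaluated, but its value is discarded.
ob_start();
$names[0];
$withoutEcho = ob_get_clean();

// Variant 2: echo sends the value to the output.
ob_start();
echo $names[0];
$withEcho = ob_get_clean();

var_dump($withoutEcho); // empty string: nothing was printed
var_dump($withEcho);    // the first element of $names
```

Inside a template, the short echo tag <?= $names[0] ?> is equivalent to <?php echo $names[0]; ?>.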

Emoji from android to web

I have an Android app whose messages work with emoji. A saved message with emoji is displayed fine on Android after being fetched from MySQL via JSON.
Now I want to display the same message with emoji in a web script.
I found the JS library https://github.com/iamcal/js-emoji but can't make it work.
Does anyone have a ready-to-use implementation of it?
A sample db record looks like this:
Unii \uD83D\uDE02\uD83D\uDE03\uD83D\uDE2E\uD83D\uDE25\uD83D\uDE23\uD83D\uDE0F
These are Android emoji. How do I make them work on the web?
First of all, copying the files will not make it work ;) You also need to do some configuration:
first of all, download that repo
run npm install in the main directory
run bower install in the main directory
now we need to run a grunt task, but before that make sure that you have copied this - https://github.com/iamcal/emoji-data/tree/6daffc10d8e8fd06b80ec24c9bdcb65218f71563 to the emoji-data folder in downloaded-repo-location/build/emoji-data
also copy the content of that whole emoji-data (https://github.com/iamcal/emoji-data/tree/6daffc10d8e8fd06b80ec24c9bdcb65218f71563) to C:\js-emoji\build\emoji-data
now in demo.htm (which is placed in mainfolder/demo/demo.htm change jquery linkage to an also make sure that this line is placed above ""
run "grunt" from console.
check if in downloaded-repo-root/lib/emoji.js in line 520 you have listed emojis ;)
run demo.htm in browser
Basically, check the browser console for errors. The most common error is an empty emoji.prototype.data on line 519 of emoji.js, so make sure the grunt task finishes without errors.
Figured it out. The basic configuration from https://github.com/iamcal/js-emoji is enough to make the JS script work. The problem was the string encoding. Android uses "Unicode escape sequences" to store special characters in strings. It works great on mobile, but PHP has issues with it. Therefore we need to convert the Unicode escape sequences into a version PHP can work with. The converted version of the previous db record:
Unii \ud83d\ude02\ud83d\ude03\ud83d\ude2e\ud83d\ude25\ud83d\ude23\ud83d\ude0f
PHP conversion functions can be found at: How to decode Unicode escape sequences like "\u00ed" to proper UTF-8 encoded characters?
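One possible way to do that decoding, as a sketch rather than the code from the linked question: decodeUnicodeEscapes is a hypothetical helper that finds runs of \uXXXX escapes and lets json_decode turn them into UTF-8, which also combines surrogate pairs such as \ud83d\ude02 into a single emoji. It assumes the escapes are stored literally in the string, as in the sample record.

```php
<?php
/**
 * Hypothetical helper: replace literal \uXXXX escapes with UTF-8 characters.
 * Runs of escapes are decoded together so surrogate pairs stay intact.
 */
function decodeUnicodeEscapes($text)
{
    return preg_replace_callback(
        '/(?:\\\\u[0-9a-fA-F]{4})+/',
        function (array $match) {
            // The match contains only \uXXXX sequences, so wrapping it in
            // quotes yields a valid JSON string literal.
            $decoded = json_decode('"' . $match[0] . '"');

            // A lone surrogate cannot be decoded; keep the original text then.
            return is_string($decoded) ? $decoded : $match[0];
        },
        $text
    );
}
```

Because the pattern is case-insensitive on the hex digits, both the \uD83D and \ud83d forms shown above are handled the same way, for example `echo decodeUnicodeEscapes('Unii \ud83d\ude02');`.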

CSV read issue between Mac and Win - missing \n char

I have basic CSV-reading code, but it can't read a CSV saved on a Mac.
I understand that the issue arises because different operating system families have different line-ending conventions, but I cannot fix it.
I found a suggestion to open the file in binary mode, but it didn't work.
The code is pretty basic:
file opening:
$this->fileHandler = fopen($this->filename, 'rb');
read line:
$columns = fgetcsv(
$this->fileHandler,
$this->length,
$this->delimiter,
$this->enclosure
);
I've opened both files with Notepad++ and it seems that the Mac file lacks the \n characters at the end of rows, but the \r is there.
Set the auto_detect_line_endings option to true before using fgetcsv():
ini_set("auto_detect_line_endings", true);
// rest of your code
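Note that auto_detect_line_endings is deprecated as of PHP 8.1. A sketch of an alternative that does not depend on it: normalise the line endings yourself and hand fgetcsv() an in-memory stream. openWithNormalizedLineEndings is a hypothetical helper name. The approach reads the whole file into memory, and it also rewrites any \r that occurs inside a quoted field, so it assumes modest file sizes and no meaningful carriage returns in the data.

```php
<?php
/**
 * Hypothetical helper: return a read handle whose content uses \n line endings,
 * regardless of whether the source file used \r\n, \r, or \n.
 */
function openWithNormalizedLineEndings($filename)
{
    $contents = file_get_contents($filename);
    if ($contents === false) {
        return false;
    }

    // Replace \r\n first so that it does not become two line breaks.
    $normalized = str_replace(["\r\n", "\r"], "\n", $contents);

    $handle = fopen('php://temp', 'r+b');
    fwrite($handle, $normalized);
    rewind($handle);

    return $handle;
}
```

The returned handle can replace the result of fopen($this->filename, 'rb') and be passed to the existing fgetcsv() call unchanged.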
I don't know if this applies to Mac, but I know that when moving a text file from Windows to some Linux flavors, I can run the dos2unix filename command on the file and it'll fix up the formatting for me. Maybe Mac has a similar functionality?
EDIT: Maybe this can help: http://schmeits.wordpress.com/2010/08/26/dos2unix-alternative-those-darn-m-characters/

Ubuntu encoding of new files

I've been searching for a long time without any helpful result.
I'm developing a PHP project using Eclipse on an Ubuntu 11.04 VM. Everything works fine, and I've never needed to look at the file encoding. But after deploying the project to my server, all contents were shown with the wrong encoding. After a manual conversion to UTF-8 with Notepad++, my problems were solved.
Now I want to change it in my Ubuntu VM, too. And there's the problem. I've checked the preferences in Eclipse, but every property is set to UTF-8: general content types, workspace, project settings, everything ...
If I check the encoding in the terminal, it says "test_new.dat: text/plain; charset=us-ascii". All files are saved as ASCII. If I create a new file in the terminal ("touch"), it's the same.
Then I've tried to convert the files with iconv:
iconv -f US-ASCII -t UTF8 -o test.dat test_new.dat
But the encoding doesn't change. PHP files in particular seem to be resistant, whereas the conversion works for some *.ini files in my project?!
Any idea what to do?
Here are my locale settings of Ubuntu:
LANG=de_DE.UTF-8
LANGUAGE=de_DE:en
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=
I was also wondering about character encoding and found something that might be useful here.
When I create a new empty .txt file on my Ubuntu 12.04 and ask for its character encoding with "file -bi filename.txt", it shows me charset=binary. After opening it and writing something like "haha" inside, I saved it using "save as" and explicitly chose UTF-8 as the character encoding. Strangely, asking again did not show charset=UTF-8 but charset=us-ascii. It got even stranger when I did the whole thing again, but this time included some German-specific characters (ä in this case) in the file and saved again (this time without "save as", I just pressed save). Now it said charset=UTF-8.
It therefore seems that at least gedit checks the file and downgrades from UTF-8 to us-ascii when there is no need for UTF-8, since the file can be encoded using us-ascii.
Hope this helped a bit even though it is not php related.
Greetings
UTF-8 is compatible with ASCII. An ASCII text file is therefore also valid UTF-8, and a conversion from ASCII to UTF-8 is a no-op.
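This can be checked from PHP with a small sketch. describeCharset is a hypothetical function that only mimics the labels reported in the posts above; it is not how the file command is implemented. The point it illustrates is that a byte string with no bytes above 0x7F is simultaneously valid ASCII and valid UTF-8, so there is nothing for a converter to change.

```php
<?php
/**
 * Hypothetical helper: label a byte string roughly the way the posts above
 * describe the output of `file -bi`.
 */
function describeCharset($bytes)
{
    if ($bytes === '') {
        return 'binary';   // an empty file carries no text to classify
    }
    if (preg_match('/\A[\x00-\x7F]*\z/', $bytes) === 1) {
        return 'us-ascii'; // only 7-bit bytes: valid ASCII and valid UTF-8
    }
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return 'utf-8';    // contains multi-byte sequences, all well formed
    }
    return 'unknown';
}

$ascii = 'haha';
$umlaut = "h\xC3\xA4h\xC3\xA4"; // "hähä" written out as UTF-8 bytes

// Converting pure ASCII to UTF-8 returns exactly the same bytes.
$converted = mb_convert_encoding($ascii, 'UTF-8', 'ASCII');
```

Here describeCharset($ascii) reports us-ascii and describeCharset($umlaut) reports utf-8, which matches the behaviour observed with gedit, and $converted is byte-for-byte identical to $ascii.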

Any suggestions for an issue with a long PHP variable causing the script to bomb out in Mac OSX?

Ok, this is a strange one. I've recently moved over to Mac OS running Lion and set up the PHP version that comes with OSX. Everything's running as I expect except for one thing and I can't understand why!?
As part of our CMS, menu data is cached into a php file as:
$menuData = unserialize( $menuString );
where $menuString is a long string of serialized data. I've used the same thing successfully on a PC running WAMP and on numerous linux boxes without problems, but since I've moved to the Mac OS, every time I include the file, it prints a long string of question marks (even if the above line is commented out in the file!!). Initially, the $menuString was around 280k, but I've also tried this with a menu string of less than 6k without success.
Is there a PHP setting somewhere that might exhibit this type of behaviour? I'm baffled and have tried numerous things!??
Please help!
UPDATE: I've gone through the php.ini line by line on my Mac and the one I was using in WAMP and see no differences, so I don't expect it's anything directly set in there. Everything else in the setup is working exactly as I expect and all other site features and functions are working!? Is there something obvious in terms of native setup that I'm missing?
If it happens with that line commented out, then it probably isn't that line... Try dos2unix to fix up your line endings... Aside from that, grab a hex editor and poke around the area for strange non-printing chars...
Well, long and short of it is I used a workaround in the end. At the point of creating the big long serialized menuString, I saved that to a txt file with no php in it and then did a
$menuData = unserialize( file_get_contents( [Textfile] ) );
Seems to have solved the problem!! I'm still confused as to why it occurs, but at least it's working!
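A minimal sketch of that workaround. The helper names writeMenuCache and readMenuCache and the idea of passing the cache path as a parameter are assumptions for illustration; the original post only shows the unserialize( file_get_contents( ... ) ) call. Keeping the serialized data in a plain file means the PHP parser never has to read the long string literal.

```php
<?php
/**
 * Hypothetical helper: store the menu as serialized text with no PHP in it.
 */
function writeMenuCache($path, array $menuData)
{
    return file_put_contents($path, serialize($menuData), LOCK_EX) !== false;
}

/**
 * Hypothetical helper: load the menu back, or return null if the cache
 * file is missing or does not contain a serialized array.
 */
function readMenuCache($path)
{
    if (!is_readable($path)) {
        return null;
    }

    $raw = file_get_contents($path);
    if ($raw === false || $raw === '') {
        return null;
    }

    $menuData = unserialize($raw);

    return is_array($menuData) ? $menuData : null;
}
```

With these, `$menuData = readMenuCache($cacheFile);` takes the place of including the generated PHP file.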
