For the last few years I have used Notepad++ on Windows XP SP2.
As I have just seen, Notepad++ is set to encode new files as "ANSI" in "Windows Format". So basically all files on my hard disk should be ANSI files, but I'm not sure.
Most .html files have a charset tag of "text/html; charset=iso-8859-1", but some have none.
Other files, especially text files (for example keyword lists) that I stored via Firefox's XPCOM system, I don't know how they are currently encoded.
On the server side I have Apache with PHP and MySQL.
For uploads I use FileZilla.
Now the problem is: I want to use Japanese characters (or Arabic, etc.). This only partly works.
I can get my self-made Firefox application to consistently read and write UTF-8. But I can't check every time which encoding each of the old files has.
Having just read Joel Spolsky's old article about UTF-8 strengthens my view that I simply have to switch my whole system over to UTF-8 as far as possible.
Once I have it running that way locally on my hard disk, I could just re-upload everything to the server.
So: how do I get all my local files converted to UTF-8?
And: is it possible at all to have Windows XP SP2 consistently use UTF-8 everywhere? Or do I have to check with every program, or even worse with every file, that the right encoding is used?
What about files I get, for example, in e-mails or via a USB stick, or that I download in zip files? (Or a thousand other possibilities.)
Update:
Steps 1.-4. went OK so far. I tried with a BOM first, but without one seems to be better.
So on to 5.): something has to change there too. As in 3.), I changed the charset in the HTML template file, and the text coming from the template is displayed correctly. But the text coming from MySQL/PHP currently shows the unknown-character sign in some places, i.e. where there should be umlauts (äöü).
I have changed all collations for text fields in the MySQL database via phpMyAdmin to "utf8_unicode_ci", but that didn't do the trick.
Is it a PHP issue, or do I just have to somehow convert the data in the MySQL database once?
The beauty of UTF-8 is that it's a superset of ASCII, so if your HTML and PHP files only contain the Latin alphabet (i.e. English and programming/HTML syntax), you don't need to convert them at all. You can leave most of your files unchanged.
Should you find a few exceptions that you want to convert manually, you can open them in Notepad++ and do 'Encoding' - 'Convert to UTF-8 (No BOM)'. If there are many of them, you could also script the conversion, as sketched below.
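A rough sketch of such a batch conversion in PHP (assuming the old files are ISO-8859-1, as your charset tags suggest; test on copies first):

foreach (glob('*.html') as $file) {
    $text = file_get_contents($file);
    // Skip files that are already valid UTF-8 (pure-ASCII files fall into this group).
    if (!mb_check_encoding($text, 'UTF-8')) {
        file_put_contents($file, mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1'));
    }
}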
Yes, you do need to change/add the <meta> charset tag in all the HTML files to make sure the browser renders your files as UTF-8.
In Notepad++ you can set new files to always open as 'UTF-8 (No BOM), Unix'. Also, tick "Apply to ANSI files" so old files are correctly saved in the new encoding. I suggest this format because even though you are working on a Windows machine, web servers usually run Linux/BSD, so this is the native form there (keeping files in native form is especially important when you are using a version control system).
Migrating a live site with a database is a different issue. Data in MySQL comes with its own encoding, and from your question I cannot tell whether you need to convert it, or how. More specifics are needed on that (if you do).
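Regarding the umlauts from your update: a common cause is that the connection between PHP and MySQL is not set to UTF-8, regardless of the column collations. A minimal sketch, assuming mysqli (the credentials and table name are placeholders, and the one-time conversion should only be run after a backup):

$db = new mysqli('localhost', 'user', 'password', 'mydb');   // placeholder credentials

// Tell MySQL the connection itself uses UTF-8; changing collations alone does not do this.
$db->set_charset('utf8');

// One-time conversion of a table whose data is still stored as latin1.
$db->query("ALTER TABLE my_table CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci");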
Related
I am developing a PHP project in HTML5. The following is the meta tag used on all pages of my website.
<meta charset="utf-8">
I am coding on a Windows machine using NetBeans. I was not really aware of the encoding of the files. Since the code was working fine, I did not give it much importance.
However, based on some of the questions on Stack Overflow, I now understand more about encoding. I noticed that many PHP/JS/CSS files of my project are saved in UTF-8 encoding whereas some are saved in ANSI encoding. (To find this out, I opened the file in Notepad, clicked Save As and checked the default encoding shown.)
It seems the files into which I pasted some Unicode characters were automatically saved in UTF-8, and all other files were saved in ANSI encoding (I guess it might be Windows-1252). All this happened even though I set the project preference to UTF-8 in NetBeans.
Is it necessary to also save those files (the ones that do not use Unicode) as UTF-8, since my HTML meta says UTF-8? (Note that there were no issues when I tested my website, but my testing was from a Windows machine.)
I am also curious to know how the browser renders the web page correctly even though some of the PHP files are saved as ANSI but served with a UTF-8 meta tag.
(To find this out, I opened the file in Notepad, clicked Save As and checked the default encoding shown.)
This isn't an accurate way of checking the encoding of a file.
Files which contain only ASCII characters -- like most CSS and JavaScript source files! -- are valid in most text encodings. Notepad will call them "ANSI" because that's its default, but they're also perfectly valid as UTF-8. No conversion is necessary.
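If you want to double-check a particular file, a quick test (just a sketch; the file name is a placeholder) is to ask PHP whether its bytes form valid UTF-8:

$bytes = file_get_contents('style.css');        // placeholder file name
var_dump(mb_check_encoding($bytes, 'UTF-8'));   // true for pure-ASCII files as well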
I am running a PHP CLI script on my local machine which calls an API and receives back a UTF-encoded string. This string is basically a simple word in Portuguese, and I can see it properly when it is printed to the screen (terminal). However, on a different machine, some characters are not shown properly. The question is: which php.ini option would I need to set in order to see the string properly on the other machine? I haven't touched my php.ini with regard to encoding and everything works fine locally, so I'm not sure what I need to enable for UTF-8 to work in a PHP CLI app.
Edit: this should not be a terminal issue. To be more specific, the API returns a string (UTF-encoded) which is supposed to be found as a key inside an array. However, on the other machine PHP issues a warning saying that it can't find that string in the array. It is a "key doesn't exist" error that I'm getting, and the key/string that is shown doesn't look right.
It is basically:
$stringReturnedFromApi = $apiCall();      // returns e.g. a Portuguese word
$this->myArray[$stringReturnedFromApi];   // on the other machine: "undefined index" warning
It works correctly on my machine; on the other one it complains that the key doesn't exist, and when I looked at $stringReturnedFromApi, it doesn't look UTF-encoded.
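A rough sketch of how I could compare the two machines (the normalization part is only a guess at the cause and requires the intl extension):

echo bin2hex($stringReturnedFromApi), "\n";   // raw bytes of the incoming key
foreach (array_keys($this->myArray) as $key) {
    echo bin2hex($key), "\n";                 // raw bytes of the existing keys
}
// If the bytes differ only in how accented characters are composed,
// normalizing both sides might make the lookup succeed:
$normalized = Normalizer::normalize($stringReturnedFromApi, Normalizer::FORM_C);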
The other terminals need to be set to UTF-8 encoding. For instance, PuTTY needs to be told which encoding to use, or it will (on Windows) use ANSI, if I remember correctly.
If the other host is not on Windows, just verify the locales to be sure they are UTF-8. The error is not on your side; it is a client-side encoding issue.
If the result is printed out to a web page, be sure the charset encoding is set to UTF-8 there as well.
I am not sure that this is actually a PHP problem.
Without seeing your script I can't say anything with 100% certainty, but I am guessing that the PHP script is actually outputting the UTF characters correctly. The reason it looks odd is that the terminal doesn't understand the UTF encoding and is unable to display the characters.
You would need to be a little more specific about the terminal application you are using to determine where the actual problem lies. I doubt it is something you can fix in php.ini.
Here is the situation:
I'm using UTF-8 to input Japanese characters into a MySQL database, using a PHP form. When done from my PC it works perfectly and the script records the characters correctly in the DB, but from other PCs the script inserts raw symbols. I've declared everything regarding UTF-8: the header, meta tag, etc. I'm sure this is not a PHP/SQL issue (because it works perfectly from one PC) but something in the Windows configuration that I cannot understand.
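By "declared everything" I mean roughly the following (a simplified sketch, not the exact code; the credentials and form action are placeholders):

header('Content-Type: text/html; charset=utf-8');                      // on every page

echo '<form method="post" action="save.php" accept-charset="UTF-8">';  // the input form

$db = new mysqli('localhost', 'user', 'password', 'mydb');             // in the script that stores the data
$db->set_charset('utf8');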
Does anyone know anything about this issue?
My pages work fine when viewed in a browser. I have a language file (with a .php extension), but without any HTML tags. I am using Dreamweaver CS5.5 to edit these files.
I have a variable $lang["label_name"] which holds the value. In Dreamweaver's code view, all the Devanagari Unicode characters appear as boxes. But on another machine, with the same version of Dreamweaver, when the same file is opened, the Unicode characters show correctly.
Are there any settings that I missed on my machine? How do I make Unicode characters appear correctly in Dreamweaver (not talking about the browser)?
Thanks
Are there any settings that I missed on my machine?
Probably. Just compare the setup of the two Dreamweaver installations and add the differences to your question. It might then be easier to say.
How do I make Unicode characters appear correctly in Dreamweaver (not talking about the browser)?
By configuring it properly. Also ensure that you are using the same font on both systems, that the files have the same encoding, and that Dreamweaver is aware of the correct encoding.
When I upload/download an HTML/CSS/… file to an FTP server, sometimes something puts all the lines of code onto one line, making it completely unreadable. This happens every now and then and I’m still looking for an explanation for this behaviour. What could cause this?
It has to do with text file line endings and FTP transfer modes.
Text files on Windows use a combination of carriage return and line feed (CR LF) at the end of each line.
Text files on Mac OS 9 and earlier use a carriage return only.
Text files on UNIX and its clones (including OS X) use a line feed only.
It sounds like you're pulling a UNIX-style text file to a Windows system in binary mode and then trying to view it in an editor that doesn't understand these differences.
FTP clients have an ASCII (or ASC) transfer mode to do these conversions for you. It's not usually turned on by default, though, as it messes up binary files.
Two solutions are to use an editor that understands the differences, or to use an FTP client that lets you specify that certain file extensions should be transferred in ASCII mode.
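If the upload is scripted with PHP's FTP extension, the same distinction appears as FTP_ASCII versus FTP_BINARY (a sketch; host, credentials and paths are placeholders):

$conn = ftp_connect('ftp.example.com');
ftp_login($conn, 'user', 'password');
ftp_pasv($conn, true);

// Text files: ASCII mode lets the server translate line endings.
ftp_put($conn, 'public_html/index.html', 'index.html', FTP_ASCII);

// Binary files (images, archives): never use ASCII mode here.
ftp_put($conn, 'public_html/logo.png', 'logo.png', FTP_BINARY);

ftp_close($conn);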
P.S. This is probably better asked on superuser or serverfault.
This problem arises when working with editors such as Notepad++ or Notepad and, most importantly, with hosting servers. I had the same issue and fixed it by opening the file again in Dreamweaver, setting the content properly, and uploading it to the server again. Works fine.