Setting encoding, newline, linebreaks, end-of-line (EOL) in PHP

For example, when I create a new file:
$message = "Hello!";
$fh = fopen('index.html', 'w');
fwrite($fh, $message);
fclose($fh);
How can I set its encoding (UTF-8, Shift-JIS or EUC-JP) and line breaks (LF, CR+LF or CR) in PHP?

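A string literal has the same encoding as the source file it is written in, and PHP writes those bytes to the file unchanged, so the file ends up in whatever encoding the string is in. To convert between encodings, you could use iconv.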
$utf8=iconv("ISO-8859-1", "UTF-8", $message);
Line breaks are entirely up to you. You could use the PHP_EOL constant (which is "\r\n" on Windows and "\n" elsewhere), or, if you think you might need to vary the type of line break, store the desired sequence in a variable and configure it at runtime.
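Putting both points together: since the file's encoding and line breaks are fixed before fwrite() is called, a minimal sketch can normalise the line breaks, convert with iconv() and write in binary mode. The helper name write_text_file() is hypothetical, and which encoding names (e.g. SHIFT_JIS, EUC-JP) are accepted depends on the system's iconv implementation.

```php
<?php
// Hypothetical helper (not a PHP built-in): writes UTF-8 $text to $path,
// re-encoded to $encoding and with every line break set to $eol.
function write_text_file($path, $text, $encoding = 'UTF-8', $eol = "\n")
{
    // Normalise CRLF and lone CR to LF, then expand LF to the wanted sequence.
    $text = str_replace(array("\r\n", "\r"), "\n", $text);
    $text = str_replace("\n", $eol, $text);

    if (strcasecmp($encoding, 'UTF-8') !== 0) {
        // iconv() returns false if a character has no equivalent in $encoding.
        $text = iconv('UTF-8', $encoding, $text);
        if ($text === false) {
            return false;
        }
    }

    // 'b' (binary) means no newline translation happens on any platform.
    $fh = fopen($path, 'wb');
    if ($fh === false) {
        return false;
    }
    $written = fwrite($fh, $text);
    fclose($fh);
    return $written; // number of bytes written
}
```

Calling write_text_file('index.html', $message, 'SHIFT_JIS', "\r\n") would then produce a Shift_JIS file with CR+LF line breaks, provided the local iconv build knows that encoding name.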

To add carriage returns and line feeds, use the escape sequences \r and \n inside a double-quoted string. So:
$message = "Hello!\r\n";

Related

fopen 't' compatibility mode in php

It is said that fopen can use 't' mode to convert \n to \r\n. So, my questions:
1) How should I use 't' mode when I need to read and write (r+)? Should it be r+t, rt+ or tr+? Same question for 'b': should I write r+b, or something else?
2) I've tried all variants on Debian Linux to convert a file that contains only \n to \r\n using the magic 't' mode (I want to understand how it works), but it does not work. What am I doing wrong? When does 't' mode work?
Here is my code:
// Write string with \n symbols
$h = fopen('test.file', 'wt');
fwrite($h, "test \ntest \ntest \n"); // I've checked, after file is being created
fclose($h); // \n symbols are not substituted to \r\n
// Open file, that contains rows only with \n symbols
$h = fopen('test.file', 'rt');
$data = fread($h, filesize('test.file'));
fclose($h);
// I want to see what's inside
$data = str_replace("\n", '[n]', $data);
$data = str_replace("\r", '[r]', $data);
// Finally I have only \n symbols; no \r symbols were added
var_dump($data);
From: http://php.net/fopen
Windows offers a text-mode translation flag ('t') which will transparently translate \n to \r\n when working with the file. In contrast, you can also use 'b' to force binary mode, which will not translate your data. To use these flags, specify either 'b' or 't' as the last character of the mode parameter.
So 't' mode has no effect on Linux. Also, according to the documentation, r+t or r+b would be the correct form (the 't' translation only happens on Windows).
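Since 't' does nothing outside Windows, one portable alternative is to do the translation yourself. The sketch below swaps in a different technique from the question's: a user-defined stream filter (a php_user_filter subclass) that rewrites each "\n" as "\r\n" on the way to disk. The class name LfToCrlfFilter and filter name lf_to_crlf are made up for this example, and it assumes the data contains only bare "\n" line breaks (an existing "\r\n" would become "\r\r\n").

```php
<?php
// Sketch: emulate Windows' 't' write mode on any OS with a user stream filter.
// Assumption: the data written through the filter uses bare "\n" line breaks.
class LfToCrlfFilter extends php_user_filter
{
    public function filter($in, $out, &$consumed, $closing): int
    {
        while ($bucket = stream_bucket_make_writeable($in)) {
            $consumed += $bucket->datalen; // count input bytes before rewriting
            // Matching a single byte means a bucket boundary can never split a match.
            $bucket->data = str_replace("\n", "\r\n", $bucket->data);
            stream_bucket_append($out, $bucket);
        }
        return PSFS_PASS_ON;
    }
}

// 'lf_to_crlf' is an arbitrary name chosen for this example.
stream_filter_register('lf_to_crlf', 'LfToCrlfFilter');

$path = sys_get_temp_dir() . '/test.file';
$h = fopen($path, 'wb');
stream_filter_append($h, 'lf_to_crlf', STREAM_FILTER_WRITE);
fwrite($h, "test \ntest \ntest \n");
fclose($h);

// Same visual check as in the question.
echo str_replace(array("\r", "\n"), array('[r]', '[n]'), file_get_contents($path)), "\n";
// prints: test [r][n]test [r][n]test [r][n]
```

The same filter could be attached with STREAM_FILTER_READ (or a mirror-image filter) if the translation is needed when reading.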

Converting ^M Character to Whitespace without Line Break using PHP

I'm trying to convert the ^M character to a whitespace character, but I'm having a hard time doing this.
In PHP, I used the "wb" option so that it wouldn't write DOS characters into the file: fopen("file.csv", "wb")
That worked, but the file still has line breaks instead of ^M:
$fp = fopen("file.csv", "wb");
$description =nl2br( $product->getShortDescription());
$line .= $description . $other_variables . "\n";
fputs($fp, $line);
But I still see line breaks within the description. Is there any way to remove ^M and replace it with, for example, a space?
I also used dos2unix when the file was written in regular "w" mode. It removes all ^M characters, but the file still has line breaks where the ^M characters were. I really need each record on one line for my CSV file to work.
Thank you.
I think you're asking how to remove all the newline/line feed/carriage return characters from the description. If so:
$description = str_replace(array("\r", "\n"), '', nl2br($product->getShortDescription()));
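If a space is wanted instead of nothing (as the question asks), a variant could replace each run of line-break characters with one space. This sketch also drops the nl2br() call, on the assumption that HTML <br /> tags are not wanted in the CSV, and uses fputcsv() for quoting. The helper single_line(), the sample description and the 'SKU-123' field are hypothetical stand-ins.

```php
<?php
// Sketch: keep each CSV record on one physical line by turning any run of
// CR/LF characters inside a field into a single space.
function single_line($text)
{
    return trim(preg_replace('/[\r\n]+/', ' ', $text));
}

// Stand-in for $product->getShortDescription() (hypothetical sample text).
$description = "Soft cotton shirt.\r\nMachine washable.\r\n";

$fp = fopen(sys_get_temp_dir() . '/file.csv', 'wb');
// fputcsv() handles quoting; all optional arguments are passed explicitly.
fputcsv($fp, array('SKU-123', single_line($description)), ',', '"', '\\');
fclose($fp);
```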

Whitespace in a database field is not removed by trim()

I have some whitespace at the beginning of a paragraph in a text field in MySQL.
Using trim($var_text_field) in PHP or TRIM(text_field) in MySQL statements does absolutely nothing. What could this whitespace be and how do I remove it by code?
If I go into the database and backspace it out, it saves properly. It's just not being removed via the trim() functions.
function UberTrim($s) {
$s = preg_replace('/\xA0/u', ' ', $s); // replaces the UTF-8 NBSP "\xC2\xA0" with a plain space
$s = trim($s);
return $s;
}
The UTF-8 encoding of the no-break space, Unicode U+00A0, is the 2-byte sequence C2 A0. I tried to use the second parameter of trim(), but that didn't do the trick. Example use:
assert("abc" === UberTrim(" \r\n \xc2\xa0 abc \t \xc2\xa0 "));
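If only leading and trailing no-break spaces should go (UberTrim() also turns those in the middle of the text into plain spaces), a regular expression can do the trimming itself. This is an alternative technique rather than the answer's; unicode_trim() is a hypothetical name, and it assumes the input is valid UTF-8.

```php
<?php
// Sketch: a Unicode-aware trim based on a regex instead of trim().
// \pZ matches Unicode separators (including U+00A0 NO-BREAK SPACE);
// \s covers ASCII whitespace such as \t, \n and \r.
function unicode_trim($s)
{
    // The /u modifier makes PCRE treat $s as UTF-8; preg_replace()
    // returns null if $s is not valid UTF-8.
    return preg_replace('/^[\pZ\s]+|[\pZ\s]+$/u', '', $s);
}

var_dump(unicode_trim(" \r\n \xc2\xa0 abc \t \xc2\xa0 ")); // string(3) "abc"
```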
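A MySQL replacement for TRIM(text_field) that also removes UTF-8 no-break spaces, thanks to @RudolfRein's comment (MySQL does not understand \x escapes in string literals, so the bytes are given with UNHEX()):
TRIM(REPLACE(text_field, UNHEX('C2A0'), ' '))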
UTF-8 checklist:
Make sure your PHP source code editor saves files as UTF-8 without a BOM (usually a setting in the editor's preferences).
Make sure your MySQL client connection uses the UTF-8 character encoding, e.g.
$pdo = new PDO('mysql:host=...;dbname=...;charset=utf8',$userid,$password);
$pdo->exec("SET CHARACTER SET utf8");
Make sure your HTTP server is set for UTF-8, e.g. for Apache:
AddDefaultCharset UTF-8
Make sure the browser expects UTF-8.
header('Content-Type: text/html; charset=utf-8');
or
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
If the problem is with UTF-8 NBSP, another simple option is:
REPLACE(the_field, UNHEX('C2A0'), ' ')
The best solution is a combination of a few things mentioned to you already.
First run ORD() on the string in question. In my case I had to run a reverse first because my problem character was at the end of the string.
ORD(REVERSE([col_name]))
Once you have identified the problematic character, run:
REPLACE([col_name], CHAR([char_value_returned]), CHAR(32))
Finally, call:
TRIM([col_name])
This will completely eradicate the problem character from all aspects of the string, and trim off the leading (in my case trailing) character.
Try using the MySQL ORD() function on text_field to check the character code of the left-most character. It may be a non-whitespace character that merely looks like whitespace.
You have to identify these "whitespace" characters first. If it's some HTML entity (like &nbsp;), no trimming function will help, of course.
I'd suggest printing it out like this:
echo urlencode($row['field']);
and seeing what it says.
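urlencode() works for this, but it leaves letters and digits alone and turns spaces into +. A plain hex dump built from bin2hex() and str_split() shows every byte instead; dump_bytes() is a hypothetical helper, and in practice you would pass it $row['field'].

```php
<?php
// Sketch: show every byte of a string as two hex digits, so invisible
// characters (NBSP = c2 a0 in UTF-8, tab = 09, CR = 0d ...) stand out.
function dump_bytes($s)
{
    return implode(' ', str_split(bin2hex($s), 2));
}

echo dump_bytes("\xC2\xA0abc\t"), "\n"; // c2 a0 61 62 63 09
```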
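Well, as it's the A0 (160 decimal) non-breaking space character, you can convert it to an ordinary space first (this works for single-byte Latin-1 data; in UTF-8 text the NBSP is the two-byte sequence C2 A0, so replacing chr(160) alone would leave a stray C2 byte):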
<pre><?php
$str = urldecode("%A0")."bla";
var_dump(trim($str));
$str = str_replace(chr(160)," ",$str);
$str = trim($str);
var_dump($str);
and, ta-dam! -
string(4) " bla"
string(3) "bla"
Try to check what each "whitespace" character is by printing its character code: it might be a non-visible character type that trim() does not remove.
trim() only removes space, tab (\t), newline (\n), carriage return (\r), NUL (\0) and vertical tab (\x0B), but there are other non-visible characters that might cause this problem.
Try:
str_ireplace(array("\r", "\n", "\t"), '', $var_text_field);

Is replacing a line break UTF-8 safe?

If I have a UTF-8 string and want to replace line breaks with the HTML <br> , is this safe?
$var = str_replace("\r\n", "<br>", $var);
I know str_replace isn't UTF-8 safe but maybe I can get away with this. I ask because there isn't an mb_strreplace function.
UTF-8 is designed so that multi-byte sequences never contain anything that looks like an ASCII character. That is, any time you encounter a byte with a value in the range 0-127, you can safely assume it is an ASCII character.
And that means that as long as you only try to replace ASCII characters with ASCII characters, str_replace should be safe.
str_replace() is safe for any ASCII character.
By the way, you could also look at nl2br().
1st: Use the code-sample markup for code in your questions.
2nd: Yes, it is safe.
3rd: It may not be what you want to achieve. This could be better:
$var = str_replace(array("\r\n", "\n", "\r"), "<br/>", $var);
Don't forget that different operating systems handle line breaks differently. The code above should replace all line breaks, no matter where they come from.
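A quick check of the points in these answers: replacing only ASCII line-break characters leaves the multi-byte characters intact, the order of the search array matters, and nl2br() behaves differently (it keeps the original line break). The sample strings are made up for illustration.

```php
<?php
// The replacement order matters: "\r\n" must come first, otherwise it would
// be turned into two <br/> tags ("\r" and "\n" separately).
$text = "日本\r\nEnglish\nmixed\rend";

$html = str_replace(array("\r\n", "\n", "\r"), "<br/>", $text);
echo $html, "\n"; // 日本<br/>English<br/>mixed<br/>end

// nl2br() keeps the original line break and inserts "<br />" before it.
echo json_encode(nl2br("a\r\nb"), JSON_UNESCAPED_SLASHES), "\n"; // "a<br />\r\nb"
```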

PHP equivalent of VB.net character codes

So I am calling an API written in VB.NET from PHP and passing it some text. I want to insert into that text two linebreaks.
I understand that in VB.NET, the character codes for a linebreak are Chr(10) and Chr(13). How can I represent those in PHP?
TIA.
The chr function exists in PHP too.
But generally we use "\n" (newline; chr(10)) and "\r" (carriage return; chr(13)). Note the double quotes: do not use single quotes here if you want those characters.
For more information, and a list of the escape sequences for special characters, you can take a look at the manual page about strings.
CR or Carriage Return, Chr(13), is represented by \r in a string
LF or Line Feed, Chr(10), is represented by \n in a string
e.g.
echo "This is\r\na broken line";
This might look more familiar, using the PHP chr() function, but you'd rarely see it done like this:
echo "This is".chr(13).chr(10)."a broken line";
There is also a constant called PHP_EOL which contains the most appropriate line break sequence for the system PHP is running on.
$break = "\n";
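To tie the VB.NET character codes to PHP: the sketch below checks that "\r\n" is exactly chr(13) followed by chr(10), shows why single quotes do not work, and builds text with two line breaks as the question asks. It assumes the API expects the Windows-style CR+LF pair; the paragraph text is made up.

```php
<?php
// "\r" is CR (chr(13)) and "\n" is LF (chr(10)); the Windows pair is CR then LF.
$crlf = "\r\n";
var_dump($crlf === (chr(13) . chr(10))); // bool(true)

// Escape sequences are only interpreted inside double quotes.
var_dump(strlen('\r\n')); // int(4): backslash, r, backslash, n
var_dump(strlen("\r\n")); // int(2)

// Two line breaks between paragraphs.
$text = "First paragraph" . $crlf . $crlf . "Second paragraph";
echo bin2hex($crlf . $crlf), "\n"; // 0d0a0d0a
```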
