PHP urlencode() and whitespace - php

I have this string to encode (with line breaks):
Sender ID
Sender ID
Sender ID
When I use this urlencode generator, I get the desired output:
Sender%20ID%0ASender%20ID%0ASender%20ID
However, when I use PHP's urlencode(), I get this output:
Sender+ID%0D%0ASender+ID%0D%0ASender+ID
When I use PHP's rawurlencode(), I get this output:
Sender%20ID%0D%0ASender%20ID%0D%0ASender%20ID
How can I get the same output as the generator? I need it to match, since a BlackBerry phone will only show line breaks properly if the line break is encoded as %0A (I am working on an SMS system).
Right now the only solution I can think of is to search for %0D%0A and replace it with %0A.

Your input contains Windows line endings, which PHP encodes literally and which your generator tool ignores. The easy way to get rid of them is to normalize the input before encoding:
$input = str_replace("\r\n", "\n", $input);
%0D is the URL-encoded form of ASCII character 13, the carriage return (\r). Since it is immediately followed by %0A (the \n), it is clear that you have the MS line ending (\r\n) instead of the *nix line ending (\n), and that the urlencode generator is using the *nix approach.
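Putting the two steps together, here is a minimal sketch that assumes the input arrives with \r\n line endings, as in the question. The variable names are illustrative only.

```php
<?php
// Sample input with Windows (\r\n) line endings, as in the question.
$input = "Sender ID\r\nSender ID\r\nSender ID";

// Normalize \r\n to \n before encoding.
$normalized = str_replace("\r\n", "\n", $input);

// rawurlencode() encodes spaces as %20; urlencode() would use + instead.
$encoded = rawurlencode($normalized);

echo $encoded, PHP_EOL;
// Sender%20ID%0ASender%20ID%0ASender%20ID
```

rawurlencode() is used here rather than urlencode() because the desired output encodes the space as %20, not +.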

Related

PHP - preg_match() - matching substitution character black diamond with question mark

I have a problem with the substitution character (the black diamond with a question mark, �) in text I'm reading with SplFileObject. This character is already present in my text file, so nothing can be done to convert it to some other encoding. I decided to search for it with preg_match(), but the problem is that PHP can't find any occurrence of it. PHP probably sees it as a different character than �. I don't want to just remove this character from the text, which is why I want to search for it with preg_match(). Is there any way to match this character in PHP?
I tried the regex /.�./i, but without success.
Try this code. The hexadecimal code point of the � character is FFFD.
$line = "�";
if (preg_match("/\x{FFFD}/u", $line, $match))
print "Match found!";
PHP with SplFileObject seems to read the file a little differently, and instead of U+FFFD it detects U+0093 and U+0094. If you have the same problem as I had, I suggest using hexdump to find out how the unrecognized character is encoded in the file. Afterwards, use the snippet recommended by @stribizhev in the comments to get the hex code as PHP sees it. Once you have figured out the correct hex code of the unrecognized character (use the conversion tool suggested by @stribizhev in the comments to get the correct value), you can use the preg_...() functions. Here's the solution to my problem:
$text = preg_replace("/[\x93\x94]/", "'", $text);
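The difference between the two answers can be illustrated with a small sketch. It assumes one string that really contains U+FFFD encoded as UTF-8 and another that contains the raw bytes 0x93 and 0x94; both sample strings are made up for illustration. bin2hex() plays the role of hexdump here.

```php
<?php
// U+FFFD encoded as UTF-8 is the three bytes EF BF BD.
$utf8 = "bad \xEF\xBF\xBD char";
// Hypothetical text holding the raw bytes 0x93 and 0x94 instead.
$raw = "\x93quoted\x94";

// Inspect the bytes PHP actually sees.
echo bin2hex("\xEF\xBF\xBD"), PHP_EOL; // efbfbd
echo bin2hex($raw), PHP_EOL;

// With the /u modifier, \x{FFFD} matches the real replacement character.
$foundInUtf8 = preg_match('/\x{FFFD}/u', $utf8);

// The raw bytes are not valid UTF-8, so preg_match() with /u fails and returns false.
$foundInRaw = preg_match('/\x{FFFD}/u', $raw);

// Without /u the pattern works on bytes, so the stray bytes can be replaced directly.
$fixed = preg_replace('/[\x93\x94]/', "'", $raw);

var_dump($foundInUtf8, $foundInRaw, $fixed);
```

If preg_match() returns false rather than 0, preg_last_error() can be checked to confirm that the subject was rejected as invalid UTF-8.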

php and newlines: what I need to know?

I have some questions about \r\n:
Are newlines browser dependent? (Not how they are displayed in a browser, but how a <textarea> sends them to PHP via the HTTP request.)
Are newlines system dependent? (Where PHP runs.)
Will PHP apply some implicit conversion?
Will MySQL apply some implicit conversion?
Thanks in advance!
Are newlines browser dependent?
No. Use <br> to get a newline in a browser.
Are newlines system dependent? (Where PHP runs.)
Yes: \n on OS X, \n on Unix/Linux, \r\n on Windows.
Will PHP apply some implicit conversion?
No.
Will MySQL apply some implicit conversion?
No.
Generally, a browser treats \r and \n as whitespace characters, like ' ' (space) or \t (tab). Inside some tags (script, pre, etc.) they are treated as line breaks. In that case the browser understands any of the common line break sequences (\r, \r\n, \n).
When data comes from a textarea, line breaks will always be represented as \r\n.
Line breaks in PHP files don't depend on the system the files run on. They depend on the settings of the editor used to create the files. When you copy a PHP file to another system, its line break format does not change.
For example, look at this code:
var_dump("
" === "\r\n");
Its result depends on the settings of the editor used to create the file, not on the current system.
But if you're reading other files stored on your system (text files, for example), those files will most probably use the system's usual line break format.
No, PHP and MySQL don't apply implicit conversions.
The system-independent way is to use the PHP_EOL constant.
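Since textarea data arrives with \r\n while other sources may use \n or \r, one common approach is to normalize line breaks explicitly before storing or comparing text. The following is only a sketch; the $submitted value is a made-up stand-in for form input.

```php
<?php
// Hypothetical input mixing all three common line break styles.
$submitted = "first line\r\nsecond line\rthird line\nfourth line";

// Replace \r\n first, then any remaining lone \r, so every break becomes a single \n.
$normalized = str_replace(["\r\n", "\r"], "\n", $submitted);

$lines = explode("\n", $normalized);
print_r($lines);

// PHP_EOL holds the line break of the platform PHP is running on.
echo bin2hex(PHP_EOL), PHP_EOL;
```

The order of the search array matters: replacing \r before \r\n would turn each Windows line break into two \n characters.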
Newlines are not browser dependent. Outside a tag styled with CSS white-space:pre, you must call the PHP function nl2br() to convert newlines to BR tags.
You may be interested in nl2br(), which takes newline characters like the ones you described and inserts an HTML line break (<br />) before each of them.
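A short sketch of nl2br() in use, assuming the text is user input that should also be escaped for HTML. The $comment value is invented for illustration.

```php
<?php
// Hypothetical user-supplied text containing a newline and HTML-special characters.
$comment = "Line one\nLine <two>";

// Escape first, then insert <br /> before each newline.
$html = nl2br(htmlspecialchars($comment));

echo $html, PHP_EOL;
```

Note that nl2br() keeps the original newline and inserts the tag in front of it, so the result here is "Line one<br />" followed by a newline and "Line &lt;two&gt;".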
A big gotcha for me was that in single quoted strings 'like\nthis' escape sequences (like \n) will not be interpreted. You have to use double quotes "like\nthis" to get an actual newline.
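The quoting difference is easy to verify by comparing string lengths, as in this small sketch:

```php
<?php
// Single quotes: backslash and n stay two separate characters.
$single = 'like\nthis';
// Double quotes: \n becomes one newline character.
$double = "like\nthis";

var_dump(strlen($single)); // int(10)
var_dump(strlen($double)); // int(9)
var_dump($single === $double); // bool(false)
```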
<br> is browser independent, \n should be too.
Don't know about \r
MySQL won't convert it

Get iconv to convert my string

I have the following string:
ᴰᴶ Bagi
Is it possible to let iconv make it into DJ Bagi?
First I tried with:
$text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
Which resulted in the following notice:
Notice: iconv() [function.iconv]: Detected an illegal character in input string
On the PHP site I saw someone using:
//IGNORE//TRANSLIT
While this prevents the notice I only get:
Bagi
I initially thought that this is an encoding problem on your end, but if I copy + paste those characters locally from the soundcloud source page:
ᴰᴶ Bagi
and try to iconv them, I get the same result as you do. That means that the data is UTF-8, but iconv does not recognize ᴰ as a "child" of D. Unable to convert the character, it complains (a bit misleadingly IMO) about an illegal character.
Edit: This does indeed seem to be true. Superscript D is not in the Unicode Superscripts and Subscripts range; it is a phonetic character. That's probably why it can't be mapped back to its "parent" letter. Here is more info on ᴰ
As far as I can see, your only choice is to replace the characters manually.
The most primitive example of a replace is
$string = str_replace("ᴰ", "D", $string);
(note that your source file needs to be stored as UTF-8 for this to work)
For an elegant solution, you could build an array out of the source and replacement characters, and pass that to the str_replace call.
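The array-based variant could look roughly like this. The $map contents are an assumption covering only the two characters from the question, and the file must be saved as UTF-8.

```php
<?php
// Illustrative mapping; extend it with whatever characters your data contains.
$map = [
    "ᴰ" => "D",
    "ᴶ" => "J",
];

$text = "ᴰᴶ Bagi";

// str_replace() accepts arrays of search and replacement strings.
$ascii = str_replace(array_keys($map), array_values($map), $text);

echo $ascii, PHP_EOL; // DJ Bagi
```

Running this replacement before iconv() with //TRANSLIT leaves iconv to handle only the characters it knows how to transliterate.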
Or call DJ Bagi and tell him to get the damn letters straight. You will notice that Soundcloud's URL builder encountered exactly the same problem.
soundcloud.com/bagi

How to ignore PHP break lines with POEdit parser?

I am currently translating my PHP application using gettext with POEdit. Since I respect the print margin in my source code, I am used to writing strings like this:
print $this->translate("A long string of text
that needs to follow the print margin and since
php outputs whitespaces for every break line I do
my sites renders correctly.");
However, in POEdit, as expected, the line breaks are not converted to spaces:
A long string of text\n
that needs to follow the print margin and since\n
php outputs whitespaces for every break line I do\n
my websites render correctly.\n
I know one approach would be to close the strings at each line change in the source code, like this:
print $this->translate("A long string of text " .
"that needs to follow the print margin and since " .
"php outputs whitespaces for every break line I do " .
"my sites renders correctly. ");
But that approach does not scale for me when texts need to change while the print margin is still respected, unless NetBeans (the IDE I use) can do that for me automatically, as Eclipse does for Java.
So, in conclusion: is there a way, in the preferences, to tell the POEdit parser to convert line breaks to spaces?
I know that the strings are still translatable even though the line breaks are not converted. I'm asking so that my translator (sometimes even the customer or user) is not confused into thinking he needs to duplicate the line breaks while translating in POEdit.
You have to make sure that you're using the right line breaks in your script and your app:
LF: Line Feed, U+000A
FF: Form Feed, U+000C
CR: Carriage Return, U+000D
CR+LF: CR (U+000D) followed by LF (U+000A)
NEL: Next Line, U+0085
LS: Line Separator, U+2028
PS: Paragraph Separator, U+2029
On Windows systems (MS-DOS) the line break is CR+LF, on Unix-like systems it is LF, and on 8-bit Commodore machines it is CR.
You have to make sure that the source location uses the same type of line breaks as your editing location.
Your server handles line breaks differently from the host the editor is running on. Double-check this and develop some means of automatically replacing the line break characters depending on your OS.
Since you say that you are "translating my PHP application using gettext with POEdit", I would create a script that goes through all your files via shell/DOS/PHP and automatically converts the line break characters to the type used by the system you're running on.
So if you're working on Windows, you would search for every U+000A and replace it with U+000D U+000A.
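A conversion helper along those lines might be sketched as follows. The function name convertEol() is hypothetical, and the sketch works on a string rather than walking files. Matching \r\n first avoids doubling the CR in text that already uses Windows line breaks, which a plain U+000A search would do.

```php
<?php
// Hypothetical helper: rewrite every line break (\r\n, \r or \n) as $eol.
function convertEol(string $text, string $eol): string
{
    // \r\n is listed first so it is treated as one break, not two.
    return preg_replace('/\r\n|\r|\n/', $eol, $text);
}

$source = "line one\nline two\r\nline three";

// Convert to Windows line breaks.
$windows = convertEol($source, "\r\n");

echo bin2hex($windows), PHP_EOL;
```

Passing PHP_EOL as the second argument would convert to the line break of the system the script runs on.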

PHP equivalent of VB.net character codes

So I am calling an API written in VB.NET from PHP and passing it some text. I want to insert into that text two linebreaks.
I understand that in VB.NET, the character codes for a linebreak are Chr(10) and Chr(13). How can I represent those in PHP?
TIA.
The chr() function exists in PHP too.
But generally we use "\n" (newline, chr(10)) and "\r" (carriage return, chr(13)). Note the double quotes: do not use single quotes here if you want those characters.
For more information, and a list of the escape sequences for special characters, take a look at the manual page about strings.
CR or Carriage Return, Chr(13), is represented by \r in a string
LF or Line Feed, Chr(10), is represented by \n in a string
e.g.
echo "This is\r\na broken line";
This might look more familiar, using the PHP chr() function, but you'd rarely see it done like this:
echo "This is".chr(13).chr(10)."a broken line";
There is also a constant called PHP_EOL which contains the most appropriate line break sequence for the system PHP is running on.
$break = "\n";
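As a quick check of the equivalences above, here is a sketch that builds the two line breaks the question asks for. It assumes the API expects Windows-style CR+LF pairs; the sample text is invented.

```php
<?php
// chr(13) is the carriage return, chr(10) is the line feed.
var_dump(chr(13) === "\r"); // bool(true)
var_dump(chr(10) === "\n"); // bool(true)

// One CR+LF line break, written with chr().
$lineBreak = chr(13) . chr(10);

// Two line breaks between the paragraphs, as asked in the question.
$text = "First paragraph" . $lineBreak . $lineBreak . "Second paragraph";

echo bin2hex($lineBreak . $lineBreak), PHP_EOL; // 0d0a0d0a
```

Writing "\r\n\r\n" in a double-quoted string produces exactly the same bytes.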
