Defining a simple string in PHP won't work

I have been a PHP programmer for 12 years now, but I've run out of ideas.
I have never had an issue like this, and I don't know what's going wrong.
It is really simple: I want to declare the number 84367 as a variable.
I reduced my script to a single line in a new PHP file, but... what is going wrong?!
<?php
$x = "84367"‬;
?>
results in
Parse error: syntax error, unexpected '‬' (T_STRING) in C:\xampp\htdocs\me7dtc\test.php on line 2
Why?

Simple: your code contains a hidden Unicode character.
Copy and paste this exactly as shown:
<?php
// $x = "84367"‬;
//             ^ hidden Unicode character between the last quote and the semicolon.
$x = "84367";
?>
The commented-out line is the one that contains the Unicode character.
To be more specific, it is the invisible U+202C character between the last quote and the semicolon, also known as "POP DIRECTIONAL FORMATTING".
For more on this character, see:
http://www.fileformat.info/info/unicode/char/202c/index.htm
http://www.codetable.net/decimal/8236
Your editor most likely did not display it because the file was being edited as UTF-8; it would have shown up under ANSI.
In an ANSI-encoded view (typically Windows-1252), it would have appeared as â€¬ immediately after the last quote.
More precisely:
<?php
$x = "84367"â€¬;
?>
You were most likely working in a UTF-8 environment, which is why you could not see it. You could temporarily switch your file to ANSI and then back to UTF-8 to spot hidden characters such as these.
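If you can't switch encodings in your editor, PHP itself can point at the hidden character. The following is a minimal sketch, assuming the file is valid UTF-8; findNonAscii() and utf8CodePoint() are hypothetical helper names, not built-in functions. It reports each non-ASCII character's line, byte offset and code point, so U+202C (or any other invisible character) shows up by name.

```php
<?php
// Hypothetical helper: decode one UTF-8 character (1-4 bytes) to its code point.
function utf8CodePoint(string $char): int
{
    $bytes = array_values(unpack('C*', $char));
    $count = count($bytes);
    if ($count === 1) {
        return $bytes[0];
    }
    // The lead byte carries (7 - $count) payload bits; each continuation byte carries 6.
    $codePoint = $bytes[0] & (0xFF >> ($count + 1));
    for ($i = 1; $i < $count; $i++) {
        $codePoint = ($codePoint << 6) | ($bytes[$i] & 0x3F);
    }
    return $codePoint;
}

// Hypothetical helper: list every non-ASCII character with its line, byte offset and code point.
function findNonAscii(string $source): array
{
    // The /u modifier makes PCRE match whole UTF-8 characters; it fails on invalid UTF-8.
    if (preg_match_all('/[^\x00-\x7F]/u', $source, $matches, PREG_OFFSET_CAPTURE) === false) {
        throw new RuntimeException('Input is not valid UTF-8');
    }
    $found = [];
    foreach ($matches[0] as [$char, $offset]) {
        $found[] = [
            'line'      => substr_count(substr($source, 0, $offset), "\n") + 1,
            'offset'    => $offset, // byte offset, not character offset
            'codepoint' => sprintf('U+%04X', utf8CodePoint($char)),
        ];
    }
    return $found;
}

// The one-line script from the question, with U+202C after the closing quote.
print_r(findNonAscii("<?php\n\$x = \"84367\"\u{202C};\n"));
// For a real file: print_r(findNonAscii(file_get_contents('test.php')));
```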

Related

Having en-dash at the end of the string doesn't allow json_encode

I am trying to extract n characters from a string using
substr($originalText,0,250);
The nth character is an en-dash. So I get the last character as â€ when I view it in Notepad. In my editor, Brackets, I can't even open the log file, since it only supports UTF-8 encoding.
I also cannot run json_encode on this string.
However, when I use substr($originalText,0,251), it works just fine. I can open the log file and it shows an en-dash instead of â€. json_encode also works fine.
I can use mb_convert_encoding($mystring, "UTF-8", "Windows-1252") to circumvent the problem, but could anyone tell me why having these characters at the end specifically causes an error?
Moreover, after doing this, my log file shows â€ in Brackets, which is confusing too.
My question is why having the en-dash at the end of the string is different from having it anywhere else (followed by other characters).
Hopefully my question is clear, if not I can try to explain further.
Thanks.
Pid's answer gives an explanation for why this is happening, this answer just looks at what you can do about it...
Use mb_substr()
The multibyte string module was designed for exactly this situation, and provides a number of string functions that handle multibyte characters correctly. I suggest having a look through there as there are likely other ones that you will need in other places of your application.
You may need to install or enable this module if you get a function not found error. Instructions for this are platform dependent and out-of-scope for this question.
The function you want for the case in your question is mb_substr(). It takes the same arguments as substr(), plus an optional $encoding argument.
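A minimal sketch of the difference, using a made-up sample string rather than the question's $originalText and assuming the mbstring extension is enabled: substr() cuts by bytes and can split the en-dash, while mb_substr() cuts by characters.

```php
<?php
// Hypothetical sample: "abcd", an en-dash (U+2013, 3 bytes in UTF-8), then "ef".
$text = "abcd\u{2013}ef";

// substr() counts bytes: 6 bytes = "abcd" plus only 2 of the en-dash's 3 bytes.
$cutBytes = substr($text, 0, 6);
var_dump(json_encode($cutBytes));   // bool(false): the result is not valid UTF-8
echo json_last_error_msg(), "\n";   // Malformed UTF-8 characters, possibly incorrectly encoded

// mb_substr() counts characters: 5 characters = "abcd" plus the whole en-dash.
$cutChars = mb_substr($text, 0, 5, 'UTF-8');
echo json_encode($cutChars), "\n";  // "abcd\u2013"
```

For the question's case, the equivalent call would be mb_substr($originalText, 0, 250, 'UTF-8'), which returns up to 250 characters rather than 250 bytes.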
UTF-8 uses multi-byte sequences to extend the encoding beyond ASCII and accommodate many more characters.
A single UTF-8 character may be coded into one, two, three or four bytes, depending on the character.
You cut the string right in the middle of a multi-byte character:
[<-character->]
[byte-0|byte-1]
       ^
       You cut the string right here in the middle!
[<-----character---->]
[byte-0|byte-1|byte-2]
       ^      ^
       Or anywhere here if it's 3 bytes long.
So the decoder has the first byte(s) but can't read the entire character because the string ends prematurely.
This causes all the effects you are witnessing.
The solution to this problem is here in Dezza's answer.
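The cut described above can be observed directly. In this sketch, isValidUtf8() is a hypothetical helper built on preg_match() with the u modifier, which returns false for invalid UTF-8; the code prints the en-dash's bytes and checks each truncation:

```php
<?php
$dash = "\u{2013}"; // EN DASH

echo strlen($dash), "\n";  // 3 (bytes, not characters)
echo bin2hex($dash), "\n"; // e28093

// Hypothetical helper: true if the string is valid UTF-8.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Cutting after 1 or 2 bytes leaves an incomplete sequence; all 3 bytes are needed.
foreach ([1, 2, 3] as $n) {
    $piece = substr($dash, 0, $n);
    printf("%d byte(s): %s -> %s\n", $n, bin2hex($piece), isValidUtf8($piece) ? 'valid' : 'invalid');
}
```

Only the full three-byte sequence is valid. Read as Windows-1252, the two leftover bytes e2 80 display as â€, which is the â€ mentioned in the question.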

PHP echoes odd characters ("“helloâ€") on OS X installation

This is my first time using PHP on a Mac; previously I used Ubuntu machines for PHP.
I have successfully installed MAMP, and now I have a file (index.php) in htdocs:
<?php
echo “hello”;
and its output in Safari is:
“helloâ€
I have tried with several different texts, all of them generate absurd output on browser.
Where is the problem: the Mac, Safari, or MAMP?
Update:
Without curly quotes the output is ‘hello’
Don't use special quotes like “ and ”; use " or ' instead. Be sure to use a plain text editor when you write code, not something like Word, which will replace the simple quote characters with fancier ones.
PHP doesn't understand the fancy quotes and won't substitute them with " or ', which have a special meaning in the language.
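If code has already picked up typographic quotes, they can be replaced mechanically. This is a sketch with a hypothetical helper, asciiQuotes(); it blindly replaces every occurrence, including ones inside string literals and comments (an apostrophe in "don’t" would become a quote that ends a single-quoted string), so review the result before using it.

```php
<?php
// Hypothetical helper: replace typographic quotes with the ASCII quotes PHP understands.
function asciiQuotes(string $code): string
{
    return strtr($code, [
        "\u{201C}" => '"', // “ LEFT DOUBLE QUOTATION MARK
        "\u{201D}" => '"', // ” RIGHT DOUBLE QUOTATION MARK
        "\u{2018}" => "'", // ‘ LEFT SINGLE QUOTATION MARK
        "\u{2019}" => "'", // ’ RIGHT SINGLE QUOTATION MARK
    ]);
}

echo asciiQuotes('echo “hello”;'), "\n"; // echo "hello";
```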
So, why didn't it break? PHP is incredibly forgiving, which has advantages and disadvantages. Consider the following code, which uses a constant:
define('HELLO', 'Hello world!');
echo HELLO;
This works and will output "Hello world!"
Now, if we pass what looks like a constant to PHP but don't define it, PHP will just output the (nonexistent) constant's name instead:
echo HELLOWORLD;
This will output "HELLOWORLD".
The same happens with the text “hello”: PHP looks for a constant with that name, doesn't find one, and so just outputs “hello”. In older PHP versions this only raises an E_NOTICE, which may be hidden by default (PHP 7.2 raised it to a warning, and PHP 8.0 throws an Error instead). It is recommended to display all errors during development to catch mistakes such as this:
error_reporting(E_ALL);
echo “hello”;
This will output:
Notice: Use of undefined constant “helloâ€ - assumed '“helloâ€'
And indeed, if we tried to add a space between those special quotes, it would fail as constants can't have a space in their name, and so the text can't be interpreted at all:
echo “hello world”;
Parse error: syntax error, unexpected 'worldâ€' (T_STRING), expecting ',' or ';'
As to why the special quotes like “ aren't being displayed properly, this is because of an encoding problem. “ is not an ASCII character (its code point is above 127), so the bytes that represent it can be interpreted in different ways depending on the encoding. Your file is saved in one encoding, but your server and browser may assume another, yielding garbled characters.
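The mismatch can be reproduced in a few lines. This sketch assumes the mbstring extension is available and uses Windows-1252 as the wrong encoding a browser might guess; it shows the bytes behind “ and what they look like when misread:

```php
<?php
$quote = "\u{201C}"; // “ as stored in a UTF-8 file

echo bin2hex($quote), "\n"; // e2809c: three bytes, none of them ASCII

// Reinterpret those UTF-8 bytes as Windows-1252, as a browser guessing the wrong charset would.
$misread = mb_convert_encoding($quote, 'UTF-8', 'Windows-1252');
echo $misread, "\n"; // “
```

The closing ” is e2 80 9d, and 0x9d has no printable character in Windows-1252, which is why the output in the question ends with â€ and nothing visible after it.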
You must declare your document as UTF-8.
Method 1: add the following HTML to your page:
<meta charset="utf-8" />
Method 2: send the header from PHP, like below:
<?php
header("Content-Type: text/html; charset=utf-8");
echo '“hello”'; // the curly quotes are plain text inside an ordinary string literal
?>
header() must be called before any output is sent, so put it at the top of your script.
It will output “hello”. If you want to output just hello, change your quotes to 'hello' or "hello": your current quotes are typographic ones, not the ASCII quotes PHP expects.
You can quote strings with single quotes (') or double quotes (").
Your double quotes are supposed to be " not “ or ”.
Your problem is with the encoding; you must make it UTF-8 (utf8_encode):
string utf8_encode ( string $data )
Note: if this does not work, the problem is with your PC or browser; check the language settings in Control Panel.
Hope to hear your answer.
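Note that utf8_encode() converts from ISO-8859-1 only and is deprecated as of PHP 8.2; running it on text that is already UTF-8 (such as a script saved as UTF-8) double-encodes it into mojibake. A guarded conversion might look like this sketch, where toUtf8() is a hypothetical helper and the mbstring extension is assumed:

```php
<?php
// Hypothetical helper: convert to UTF-8 only when the input is not already UTF-8.
// Assumption: any non-UTF-8 input is ISO-8859-1, the same assumption utf8_encode() makes.
function toUtf8(string $s): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s; // already UTF-8: converting again would double-encode it
    }
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

echo toUtf8("caf\xE9"), "\n"; // café
echo toUtf8("“hello”"), "\n"; // “hello” (unchanged)
```

mb_check_encoding() can only tell whether the bytes happen to be valid UTF-8, so a few ISO-8859-1 strings could slip through unconverted.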

Why does pasting a line break the code if writing out the same line by hand works fine?

Here are two versions of a line in a php file:
First version:
if ($projet['sourceDonnees'] === (string)$CONSTANTS['sourceDonnees_saisie']) {
Second version:
if ($projet['sourceDonnees'] === (string)$CONSTANTS['sourceDonnees_saisie']) {
Although they look identical, the first version results in a PHP Parse error: syntax error, unexpected T_STRING, whereas the second version works fine. The difference between the two is that the first version was pasted in and modified whereas the second version was written out by hand entirely. What's going on here?
Notes: The line was copied from a text file encoded in UTF-8 and pasted into another UTF-8 text file. All operations done within gedit, both files written by me in gedit.
You've copied UTF-8 quote marks, which are not parse-able by PHP. Remove the quote marks and replace them with ASCII equivalents (i.e. by typing them).
For more information on ASCII vs UTF-8 quote marks see http://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html
When I copied & pasted your first line into my text editor and turned on the "show invisible characters" option, it looked like this:
if ($projet['sourceDonnees']•=== (string)$CONSTANTS['sourceDonnees_saisie']) {
Notice the • between the ] and the ===.
Your second line of code showed perfectly clean.
Many times you will pick up stray invisible characters when you copy & paste text from websites. However, I do not know what keyboard combination will reproduce this from scratch.
Further experimentation reveals this invisible character as "non-ASCII"... the BBEdit text editor simply calls them "gremlins", and even has a "zap gremlins" function.
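A rough equivalent of BBEdit's "zap gremlins" can be written in PHP. This is a sketch under two assumptions: that the • BBEdit displayed was a non-breaking space (a common culprit, though the answer above does not name the character), and that zero-width and bidi formatting characters should simply be deleted. zapGremlins() is a hypothetical name.

```php
<?php
// Hypothetical "zap gremlins" helper; the set of characters treated as gremlins is an assumption.
function zapGremlins(string $code): string
{
    // Non-breaking space (U+00A0) becomes an ordinary space.
    $code = str_replace("\u{00A0}", ' ', $code);
    // Remove zero-width characters and bidi formatting marks:
    // U+200B-U+200F, U+202A-U+202E, U+2060 and U+FEFF.
    $clean = preg_replace('/[\x{200B}-\x{200F}\x{202A}-\x{202E}\x{2060}\x{FEFF}]/u', '', $code);
    if ($clean === null) {
        throw new RuntimeException('Input is not valid UTF-8');
    }
    return $clean;
}

// The pasted line, assuming a non-breaking space between ] and ===.
$pasted = "if (\$projet['sourceDonnees']\u{00A0}=== (string)\$CONSTANTS['sourceDonnees_saisie']) {";
echo zapGremlins($pasted), "\n";
```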
unexpected T_STRING is usually an issue with your quotes or tick marks. The 's in the code that you copied are UTF-8 and probably a version of the quote that PHP cannot parse, like the backticks we use for inline code here on SO. Try changing them to a regular single quote and it'll likely solve your issue.
If that's not the case, make sure you didn't miss the semicolon at the end of your function. This can cause the same error.

PHP adds extra whitespace on require

Consider the following code:
<div id="sidebar">
<?php
require_once('/components/search.php');
require_once('/components/categories.php');
?>
</div>
search.php and categories.php have essentially the same structure: a div container with some specific contents. Nothing special here, pure HTML:
<div class="component">
<!-- blah -->
</div>
However, when inserted with require_once (or require / include etc), PHP adds whitespace above each element, pushing it down, identifiable as an empty text node in Chrome's Inspect Element tool (the whitespace disappears when this node is deleted)
Deleting all unnecessary whitespace from the sidebar script (making it a single line of code) doesn't fix it. And if I just replace the require_once lines with the contents of the components, the whitespace doesn't appear. So not sure why PHP is adding it on require. Any ideas?
Update
This one's still proving to be a weird one. I agree now that require_once does not seem to be the root cause as such. I decided to ignore the problem for a while and hoped it would go away after I'd worked on it further. Alas, it remains, so I did a bit more investigating. Checking the page source in the browser confirms that the code in question is indeed returned as a single long unbroken line (http://pastebin.com/dtp7QNbs): there's no whitespace or carriage return between any of the tags, yet space appears in the browser, identifiable in the Inspect Element tool as empty lines between each <div class="component">.
Does this help shed any more light on the issue?
I had the same problem and verified Kai's solution to change the format to ANSI but also found that "Encode in UTF-8 without BOM" also works.
This comes up as the default format for new Notepad++ PHP files so one less conversion step.
It seems that use of the byte order mark file header in UTF-8 is not generally recommended. I verified that my installation of VS2010 was adding BOM when saving PHP files.
The following Stack Overflow question explains nicely where the extra whitespace got inserted:
What's different between utf-8 and utf-8 without BOM?
Problem solved! This took forever to figure out. The short answer is that my php files were UTF-8 encoded. Changing this in Notepad++ to ANSI fixed it.
I only found the real cause of the problem by doing a character-by-character comparison of the output HTML - one output from where 'require_once' was used and one where the code was manually pasted in place.
In a visual comparison of the output, both appeared identical - same length, no extra/different characters. But when pushed through preg_split('//', $string), and looped through character by character, 3 extra "invisible" characters were revealed at the start of each require_once insert point. I idenitified these as the ASCII characters ï, » and ¿ (a double-dotted i, a right chevron and an upside-down question mark).
Changed the encoding to ANSI (I discovered this as the cause when I recreated one of the scripts in Notepad word-for-word and it did not suffer the same issue), and the extra lines have gone.
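The character-by-character comparison described above can also be done at the byte level, where invisible characters show up as hex. A minimal sketch; firstDifference() is a hypothetical helper, and the two sample strings stand in for the real output:

```php
<?php
// Hypothetical helper: byte offset of the first difference between two strings, or null if identical.
function firstDifference(string $a, string $b): ?int
{
    $len = min(strlen($a), strlen($b));
    for ($i = 0; $i < $len; $i++) {
        if ($a[$i] !== $b[$i]) {
            return $i;
        }
    }
    // One string is a prefix of the other: they differ where the shorter one ends.
    return strlen($a) === strlen($b) ? null : $len;
}

$pasted   = '<div class="component">';
$included = "\xEF\xBB\xBF" . '<div class="component">'; // what a UTF-8 file with a BOM yields

$at = firstDifference($included, $pasted);
echo $at, ': ', bin2hex(substr($included, $at, 3)), "\n"; // 0: efbbbf
```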
The extra characters are the BOM (Byte Order Mark). So converting to UTF-8 without BOM was the real trick here. More info here: http://en.wikipedia.org/wiki/Byte_order_mark
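If many files are affected, the BOM can also be stripped with a short PHP script instead of re-saving each file in an editor. A sketch, assuming you have backups; stripBom() and stripBomFromFile() are hypothetical helpers:

```php
<?php
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: remove a leading UTF-8 BOM from a string, if present.
function stripBom(string $s): string
{
    return substr($s, 0, 3) === UTF8_BOM ? substr($s, 3) : $s;
}

// Hypothetical helper: rewrite a file in place without its BOM; returns true if one was removed.
function stripBomFromFile(string $path): bool
{
    $contents = file_get_contents($path);
    if ($contents === false) {
        throw new RuntimeException("Cannot read $path");
    }
    $clean = stripBom($contents);
    if ($clean === $contents) {
        return false; // no BOM, leave the file untouched
    }
    if (file_put_contents($path, $clean) === false) {
        throw new RuntimeException("Cannot write $path");
    }
    return true;
}
```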
Sometimes this happens because of whitespace after ?> in the class file, or before <?php.
It's easy to save a PHP file as UTF-8 without the BOM in the Programmer's Notepad editor using:
File -> Encoding -> UTF-8 No Mark
File -> Save
After that require command will include PHP script without adding whitespaces.
First, put <?php on the same line as the DIV:
<div id="sidebar"><?php
This gets rid of the whitespace before search.php.
Then make sure that search.php has no newline at the end, that's causing the whitespace between search.php and categories.php. Some text editors add a trailing newline by default, you may need to override this.
I just tried this, the output of php main.php is:
<div id="sidebar"><div class="component">
<!-- search.php -->
</div><div class="component">
<!-- categories.php -->
</div></div>
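A different workaround, not taken from the answers above: capture each component's output with output buffering and trim it. renderTrimmed() is a hypothetical helper; note that trim() removes only ordinary whitespace, so it does not replace saving the files without a BOM.

```php
<?php
// Hypothetical helper: include a component and return its output with surrounding whitespace trimmed.
function renderTrimmed(string $path): string
{
    ob_start();
    try {
        require $path; // runs in this function's scope, not the caller's
    } finally {
        $output = ob_get_clean();
    }
    return trim($output); // trims " \t\n\r\0\x0B" only; a UTF-8 BOM is NOT removed
}

// Usage (paths are illustrative):
// echo '<div id="sidebar">',
//     renderTrimmed(__DIR__ . '/components/search.php'),
//     renderTrimmed(__DIR__ . '/components/categories.php'),
//     '</div>';
```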

PHP concatenate Characters with HTML encoding of the Unicode characters

PROBLEM
I am trying to make a string where the values are normal characters as well as HTML encoding.
How can I create a string that is part Character and part Encoding?
FOR EXAMPLE
I want to make an array of the cards A,2,3,4,5,6,7,8,9,10,J,Q,K of Hearts, or &#9829; in HTML encoding.
I have tried the following in various forms to no avail...
$hearts = array("A&#9829;", "2&#9829;" /* , etc. */);
I have also tried to use the HTML encoding of the letters themselves but it returns an error as unexpected code.
RESOLVED
The code above works as is. The error was due to incorrect quote symbols in the original PHP. But see the selected answer and comments for information on UTF-8 usage in PHP.
Just include the UTF-8 characters directly, for example ♥ (U+2665, the character that &#9829; refers to).
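A sketch of building the thirteen cards both ways, assuming the file is saved as UTF-8 and the page is served as UTF-8 (arrow functions need PHP 7.4 or later):

```php
<?php
$ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K'];

// Option 1: the literal character U+2665 BLACK HEART SUIT.
$hearts = array_map(fn(string $rank): string => $rank . "\u{2665}", $ranks);

// Option 2: the numeric HTML entity; note the trailing semicolon.
$heartsHtml = array_map(fn(string $rank): string => $rank . '&#9829;', $ranks);

echo $hearts[0], "\n";     // A♥
echo $heartsHtml[0], "\n"; // A&#9829;
```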
