How does PHP naturally anticipate apostrophes in variables?


I just noticed that I could use a variable as an argument, like this: $variable = "This's a string."; function('$variable'), but not a literal like this: function('This's a string');. I can see why I can't do the latter, but I don't understand what's happening behind the scenes that makes the first example work.

Have you heard about formal languages? The parser keeps track of the context, and so, it knows what the expected characters are and what not.
In the moment you close the already opened string, you're going back to the context before the opening of the string (that is, in the context of a function call in this case).
The relevant PHP-internal pieces of code are:
the scanner turns the sequence between ' and ' into an indivisible TOKEN.
the parser puts the individual indivisible tokens into a semantic context.
These are the relevant chunks of C code that make it work. They are part of the inner workings of PHP (particularly, the Zend Engine).
PHP does not anticipate anything, it really reads everything char by char and it issues a parsing error as soon as it finds an unexpected TOKEN in a semantic context where it's not allowed to be.
In your case, it reads the token 'This' and the scanner matches a new string. Then it goes on reading s and when it finds a space, it turns the s into a constant. As the constant and the previously found token 'This' together don't form any known reduction (the possible reductions are described in the parser-link I've given you above), the parser issues an error like
Unexpected T_STRING
As you can deduce from this message, it is really referring to what it has found (or what it hopes it has found), so there's really no anticipation of anything.
Your question itself is wrong in the sense that there's no apostrophe in the variable (in the variable's identifier). You may have an apostrophe in the variable's value. Do not confuse them. A value can stand alone, without a variable:
<?php
'That\'s fine';
42;
(This is valid PHP code that just loads those values into memory.)
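To see the scanner's view for yourself, here is a minimal sketch using PHP's built-in tokenizer (token_get_all and token_name). The helper name tokenNames and the function name f are made up for illustration. It only lists the named tokens, so single characters like ( and ; are skipped.

```php
<?php
// Hypothetical helper: return the names of the named tokens in a snippet.
function tokenNames(string $code): array
{
    $names = [];
    foreach (token_get_all($code) as $token) {
        // Named tokens are arrays; single characters such as "(" are plain strings.
        if (is_array($token)) {
            $names[] = token_name($token[0]);
        }
    }
    return $names;
}

// The escaped apostrophe stays inside one string token.
$valid = "<?php f('That\\'s fine');";
// The unescaped apostrophe closes the string early, so "s" becomes its own token.
$broken = "<?php f('This's a string');";

echo implode(' ', array_slice(tokenNames($valid), 0, 3)), "\n";
echo implode(' ', array_slice(tokenNames($broken), 0, 4)), "\n";
```

In the second snippet the stray s shows up as a T_STRING right after the string token, which is what the parser complains about.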

function('$variable') shouldn't be working correctly: inside single quotes, $variable is passed as the literal text $variable, not as the variable's value.
Inside " ", a single quote needs no escaping.
Inside ' ', a single quote ends the string unless it is escaped with a backslash (\').
Using " " also lets you use variables as part of a string, so:
$pet = 'cat';
$myStory = "the $pet walked down the street";
function($pet) is the way the variable should be passed to the function.
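A small sketch of the difference, reusing $pet from above (the other variable names are just for illustration):

```php
<?php
$pet = 'cat';

// Single quotes: no interpolation, the text "$pet" is kept literally.
$single = 'the $pet walked down the street';
// Double quotes: $pet is replaced by its value.
$double = "the $pet walked down the street";
// An apostrophe inside a value is just data; it is never re-parsed as code.
$variable = "This's a string.";

echo $single, "\n";   // the $pet walked down the street
echo $double, "\n";   // the cat walked down the street
echo $variable, "\n"; // This's a string.
```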

use it like this
function('This\'s a string');

Related

How to print "that's it!" in PHP? With double and single quotations? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I want to print "that's it!" exactly like that, with both the double and the single quotation marks, in PHP. Does anyone have a solution?

You just have to escape the single quote with a backslash:
echo '"that\'s it!"';
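For comparison, a quick sketch showing that you can also go the other way round and escape the double quotes inside a double-quoted string. Both spellings give the same value (the variable names are only for illustration):

```php
<?php
// Single-quoted: only the apostrophe needs a backslash.
$escapedSingle = '"that\'s it!"';
// Double-quoted: only the double quotes need a backslash.
$escapedDouble = "\"that's it!\"";

var_dump($escapedSingle === $escapedDouble); // bool(true)
echo $escapedSingle, "\n";                   // "that's it!"
```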

I'll post something different because I like them and I think they're often missed or unknown by less experienced programmers.
HereDoc/NowDoc
//HereDoc works like " for variables
echo <<<TXT
"that's it!"
TXT;
The TXT tag can be anything that follows the same rules as variable names (\w+, and must start with a letter or underscore), and the ending one must be on its own line with absolutely nothing else besides ; (which can be omitted in arrays), not even whitespace.
//NowDoc works like ' for variables
echo <<<'TXT'
"that's it!"
TXT;
It's often overlooked, as it's not documented in any really visible place: it's the third example on the string type page on PHP.net. Then, after a bunch of scrolling, you'll find the Nowdoc part. The only real difference is how they treat variables, as I mentioned.
http://php.net/language.types.string#language.types.string.syntax.heredoc
A third way to delimit strings is the heredoc syntax: <<<. After this operator, an identifier is provided, then a newline. The string itself follows, and then the same identifier again to close the quotation.
The closing identifier must begin in the first column of the line. Also, the identifier must follow the same naming rules as any other label in PHP: it must contain only alphanumeric characters and underscores, and must start with a non-digit character or underscore.
Warning
It is very important to note that the line with the closing identifier must contain no other characters, except a semicolon (;). That means especially that the identifier may not be indented, and there may not be any spaces or tabs before or after the semicolon. It's also important to realize that the first character before the closing identifier must be a newline as defined by the local operating system. This is \n on UNIX systems, including macOS. The closing delimiter must also be followed by a newline.
If this rule is broken and the closing identifier is not "clean", it will not be considered a closing identifier, and PHP will continue looking for one. If a proper closing identifier is not found before the end of the current file, a parse error will result at the last line.
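A minimal sketch of the heredoc/nowdoc difference described above; $name is a made-up variable. The closing markers are kept in the first column so the rule quoted above is satisfied. As far as I know, PHP 7.3 and later relax that rule and allow the closing marker to be indented, but writing it this way works on older versions too.

```php
<?php
$name = 'world';

// Heredoc: behaves like a double-quoted string, so $name is interpolated.
$here = <<<TXT
"that's it, $name!"
TXT;

// Nowdoc: behaves like a single-quoted string, so $name stays literal.
$now = <<<'TXT'
"that's it, $name!"
TXT;

echo $here, "\n"; // "that's it, world!"
echo $now, "\n";  // "that's it, $name!"
```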
This is also common in other languages besides PHP (with minor variations), and it's the cleanest way (for larger amounts of text, think HTML, JavaScript and a mix of those and PHP).
$selector = 'a.foobar';
echo <<<HTML
<script type="text/javascript">
;( function( $, window, document, undefined ) {
"use strict";
$(document).ready(function(){
$('tr.foobar').css('display', 'none');//single quotes are fine
$("div.foobar").css("display", "none");//double quotes are too
$('{$selector}').css('display', "none");//we can even mix them up if we want.
//we can use PHP variable like {$selector}, even these
//comments become comments in the JS.
//if that wasn't enough, most IDE's treat them like HTML
//so they are not greyed out, but nicely colored!
});
} ) ( jQuery, window, document );
</script>
HTML;
And this would work just fine; even the {$selector} would be replaced by PHP. The {} are optional except for method calls (unless they changed that). I put them in by habit because it colors them better in my IDE. This is exactly how PHP treats variables in "normal" double-quoted strings (variable interpolation), except here we can use both types of quotes any way we want to...
If you do ever put one in an array it will only work this way (without the ;):
$a = [
<<<TXT
sometext
TXT
, "something else",
1,
2,
'etc..'
];
Other languages that use them (linked to the PHP section)
https://en.wikipedia.org/wiki/Here_document#PHP
In computing, a here document (here-document, here-text, heredoc, hereis, here-string or here-script) is a file literal or input stream literal: it is a section of a source code file that is treated as if it were a separate file. The term is also used for a form of multiline string literals that use similar syntax, preserving line breaks and other whitespace (including indentation) in the text.
The important thing is it does not use the quotes to define the string, so you are free to use them however you want, with no escaping.
One last thing I happened to notice in the PHP documentation that I never really read before:
the first character before the closing identifier must be a newline as defined by the local operating system. This is \n on UNIX systems, including macOS
Maybe someone else knows, but I am not sure how important this bit really is. I program on a Windows desktop (\r\n) and then use the exact same files on a Linux server (\n), and I have never once had an issue with what that says. I do use editors like Eclipse PDT, though, so it may default to \n even on Windows. But I have never had an issue on either one.
Enjoy!!

Probably one of the ways that works without escaping is as follows:
echo('"'."that'".'s it!"');

How to stop PHP from assuming string contains variable

I've been trying to find a solution for this possibly simple problem but, surprisingly, I haven't been able to.
How is it possible to stop PHP from assuming a variable is a part of a string? E.g.
The line of code is $string = "slfnnwnfkw49828323$dgjkt^7ktlskegjejke";
How do you stop PHP from thinking '$dgjkt' is a variable within the string when it's really a part of the full string as characters? Thanks

Use this string like $string = 'slfnnwnfkw49828323$dgjkt^7ktlskegjejke';

You have to use ' instead of ", otherwise PHP tries to find variables inside your string.

Read the manual.
The most important feature of double-quoted strings is the fact that
variable names will be expanded. See string parsing for details:
When a string is specified in double quotes or with heredoc, variables are parsed within it.
There are two types of syntax: a simple one and a complex one. The
simple syntax is the most common and convenient. It provides a way to
embed a variable, an array value, or an object property in a string
with a minimum of effort.
The complex syntax can be recognised by the curly braces surrounding
the expression.
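A short sketch of the two usual ways around it, using the string from the question: single quotes, or a backslash in front of the $ if you need to stay in double quotes. The variable names are only for illustration.

```php
<?php
// Single quotes: nothing is expanded.
$single = 'slfnnwnfkw49828323$dgjkt^7ktlskegjejke';
// Double quotes: escape the dollar sign so no variable lookup happens.
$escaped = "slfnnwnfkw49828323\$dgjkt^7ktlskegjejke";

var_dump($single === $escaped); // bool(true)
```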

Hypothetical concatenation predicament

So I am working on a simple micro language/alternative syntax for PHP.
Its syntax takes a lot from JavaScript and CoffeeScript including a few of my own concepts. I have hand written the parser (no parser generator used) in PHP to convert the code into PHP then execute it. It is more of a proof of concept/learning tool rather than anything else but I'd be lying if I said I didn't want to see it used on an actual project one day.
Anyway here is a little problem I have come across that I thought I would impose on you great intellects:
As you know in PHP the period ( . ) is used for string concatenation. However in JavaScript it is used for method chaining.
Now one thing that annoys me in PHP is having to use that bloody arrow (->) for my method chains, so I went the JavaScript way and implemented the period (.) for use with objects.
(I think you can see the problem already)
Because I'm currently only writing a 'dumb' parser that merely does a huge search and replace, there is no way to distinguish whether a period (.) is being used for concatenation or for method chaining.
"So if you are trying to be like JavaScript, just use the addition (+) operator Franky!", I hear you scream. Well I would but because the addition (+) operator is used for math in PHP I would merely be putting myself in the same situation.
Unless I can make my parser smart enough (with a crap load of work) to know that when the addition (+) operator is working with integers then don't convert it into a period (.) for concatenation I am pretty much screwed.
But here is the cool thing. Because this is pretty much a new language. I don't have to use the period or addition operator for concatenation.
So my question is: If I was to decide to introduce a new method of string concatenation, what character would make the most sense?

Does it have to be one character? .. could work!
Any myriad of combinations, like ~~ or >: even!

If you don't want to use + or ., then I would recommend ^ because that's used in some other languages for string concatenation and I don't believe that it's used for anything in PHP.
Edit: It's been pointed out that it's used for XOR. One option would be to use ^ anyway since bitwise XOR is not commonly used and then to map something else like ^^ to XOR. Another option would be to use .. for concatenation. The problem is that the single characters are mostly taken.
Another option would be to use +, but map it to a function which concatenates when one argument is a string and adds otherwise. In order to not break things which rely on strings which are numbers being treated as their values, we should probably treat numeric strings as numbers for these purposes. Here's the function that I would use.
function smart_add($arg1, $arg2) {
    // is_numeric() is a plain function in PHP, not a method on the value
    if (is_numeric($arg1) && is_numeric($arg2)) {
        return $arg1 + $arg2;
    } else {
        return $arg1 . $arg2;
    }
}
Then a + b + c + d just gets turned into smart_add(smart_add(smart_add(a,b),c),d)
This may not be perfect in all cases, but it should work pretty well most of the time and has clear rules for use.
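To make the nesting concrete, here is a rough sketch of how a chain a + b + c could be folded from left to right. smart_add is written out in valid PHP (is_numeric is called as a plain function), and smart_sum is a made-up helper name, not something from the question.

```php
<?php
// Add when both operands look numeric, concatenate otherwise.
function smart_add($arg1, $arg2)
{
    if (is_numeric($arg1) && is_numeric($arg2)) {
        return $arg1 + $arg2;
    }
    return $arg1 . $arg2;
}

// Hypothetical helper: left fold, i.e. smart_add(smart_add(a, b), c) ...
function smart_sum(array $operands)
{
    $first = array_shift($operands);
    return array_reduce($operands, 'smart_add', $first);
}

var_dump(smart_sum([1, 2, 'px']));  // string(3) "3px"
var_dump(smart_sum(['2', '3', 4])); // int(9)
var_dump(smart_sum(['a', 1, 2]));   // string(3) "a12"
```

The last call shows one of the "not perfect" cases: once the left side has become a non-numeric string, everything after it is concatenated.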

So my question is: If I was to decide to introduce a new method of
string concatenation, what character would make the most sense?
As you're well aware, you'll need to choose a character that is not being used as one of PHP's operators. Since string concatenation is a common technique, I would try to avoid using characters that you need to press SHIFT to type, as those characters will be a hindrance.
Instead of trying to assign one character for string concatenation (as most are already in use), perhaps you should define your own syntax for string concatenation (or any other operation you need to overwrite with a different operator), as a shorthand operator (sort of). Something like:
[$string, $string]
Should be easy to pick up by a parser and form the resulting concatenated string.
Edit: I should also note that whether you're using literal strings or variables, there's no way (as far as I know) to confuse this syntax with any other PHP functionality, since the comma in the middle is invalid for array manipulations. So, all of the following would still be recognized as string concatenation and not something else in PHP.
["stack", "overflow"]
["stack", $overflow]
[$stack, $overflow]
Edit: Since this conflicts with JSON notation, the following alternative variations exist:
Changing the delimiter
Omitting the delimiter
Example:
[$stack $overflow $string $concatenation] // Use nothing (but really need space)

Regular expression to match ">", "<", "&" chars that appear inside XML nodes

I'm trying to write a regular expression using the PCRE library in PHP.
I need a regex to match only &, > and < chars that exist within string part of any XML node and not the tag declaration themselves.
Input XML:
<pnode>
<cnode>This string contains > and < and & chars.</cnode>
</pnode>
The idea is to do a search and replace on these chars and convert them to their XML entity equivalents.
If I was to convert the entire XML to entities the XML would look like this:
Entire XML converted to entities
&lt;pnode&gt;
&lt;cnode&gt;This string contains &gt; and &lt; and &amp; chars.&lt;/cnode&gt;
&lt;/pnode&gt;
I need it to look like this:
Correct XML
<pnode>
<cnode>This string contains &gt; and &lt; and &amp; chars.</cnode>
</pnode>
I have tried to write a regular expression to match these chars using look-ahead but I don't know enough to get this to work. My attempt (currently only attempting to match > symbols):
/>(?=[^<]*<)/g
Just to make it clear, the XML I'm trying to fix comes from a 3rd party and they seem unable to fix it on their end, hence my attempt to fix it.

In the end I've opted to use the Tidy library in PHP. The code I used is shown below:
// Specify configuration
$config = array(
'input-xml' => true,
'show-warnings' => false,
'numeric-entities' => true,
'output-xml' => true);
$tidy = new tidy();
$tidy->parseFile('feed.xml', $config, 'latin1');
$tidy->cleanRepair();
This works perfectly correcting all the encoding errors and converting invalid characters to XML entities.

Classic example of garbage in, garbage out. The real solution is to fix the broken XML exporter, but obviously that's out of the scope of your problem. Sounds like you might have to manually parse the XML, run htmlentities() on the contents, then put the XML tags back.
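A rough sketch of that idea, under two big assumptions: you know which elements hold plain text (here cnode, from the question), and those elements never contain child tags. It uses htmlspecialchars rather than htmlentities so that only &, < and > are touched. Note that anything already escaped in the input would be escaped a second time.

```php
<?php
$xml = "<pnode>\n<cnode>This string contains > and < and & chars.</cnode>\n</pnode>";

// Escape only the text between <cnode> and </cnode>, then put the tags back.
$fixed = preg_replace_callback(
    '~<cnode>(.*?)</cnode>~s',
    function (array $m) {
        $text = htmlspecialchars($m[1], ENT_XML1 | ENT_NOQUOTES, 'UTF-8');
        return '<cnode>' . $text . '</cnode>';
    },
    $xml
);

echo $fixed, "\n";
```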

I'm reasonably certain it's simply not possible. You need something that keeps track of nesting, and there's no way to get a regular expression to track nesting. Your choices are to fix the text first (when you probably can use an RE) or use something that's at least vaguely like an XML parser, specifically to the extent of keeping track of how the tags are nested.
There's a reason XML demands that these characters be escaped though -- without that, you can only guess about whether something is really a tag or not. For example, given something like:
<tag>Text containing < and > characters</tag>
you and I can probably guess that the result should be: ...containing &lt; and &gt;... but I'm pretty sure the XML specification allows the extra whitespace, so officially "< and >" should be treated as a tag. You could, I suppose, assume that anything that looks like an un-matched tag really isn't intended to be a tag, but that's going to take some work too.

Would it be possible to intercept the text before it tries to become part of your XML? A few ounces of prevention might be worth pounds of cure.

This should do it for ampersands:
/(\s+)(&)(\s+)/gim
This means you're only looking for those characters when they have whitespace characters on both sides.
Just make sure the replacement expression is "$1$2amp;$3";
The others would go like this, with their replacement expressions on the right
/(\s+)(>)(\s+)/gim "$1&gt;$3"
/(\s+)(<)(\s+)/gim "$1&lt;$3"

As stated by others, regular expressions don't do well with hierarchical data. Besides, if the data is improperly formatted, you can't guarantee that you'll get it right. Consider:
<xml>
<tag>Something<br/>Something Else</tag>
</xml>
Is that <br/> supposed to read &lt;br/&gt;? There's no way to know because it's validly formatted XML.
If you have arbitrary data that you wish to include in your XML tree, consider using a <![CDATA[ ... ]]> block instead. It's treated the same as a text node, and the only thing you have to watch out for is the character sequence ]]>, which cannot appear inside the block.

What you have there is not, of course, XML. In XML, the characters '<' and '&' may not occur (unescaped) inside text: only inside a comment, CDATA section, or processing instruction. Actually, '>' can occur in text, except as part of the string ']]>'. In well-formed XML, literal '<' and '&' characters signal the start of markup: '<' signals the start of a start tag, end tag, or empty element tag, and '&' signals the start of an entity reference. In both these cases, the next character may NOT be whitespace. So using an RE like Robusto's suggestion would find all such occurrences. You might also need to catch corner cases like '<<', '<\', or '&<'. In this case you don't need to try to parse your input, an RE will work fine.
If the source contains strings like '<something ' where 'something' matches the production for a Name:
Name ::= NameStartChar (NameChar)*
Then you have more of a problem. You are going to have to (try to) parse your input as if it were real XML, and detect the error cases of malformed Names, non-matching start & end tags, malformed attributes, and undefined entity references (to name a few). Unfortunately the error condition isn't guaranteed to happen at the location of the error.
Your best bet may be to use an RE to catch 90% of the error and fix the rest manually. You need to look for a '<' or '&' followed by anything other than a NameStartChar
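A minimal sketch of that "catch most of it" regex approach. It assumes an ASCII-only approximation of NameStartChar (letters, _ and :) and treats </, <! and <? as markup as well. The function name escapeStrayMarkup is made up. Ampersands go first so the & in the inserted &lt; is not escaped again.

```php
<?php
// Hypothetical helper: escape '&' and '<' that cannot be the start of markup.
function escapeStrayMarkup(string $text): string
{
    // '&' not followed by something shaped like an entity reference (&name; or &#38;).
    $text = preg_replace('/&(?!#?\w+;)/', '&amp;', $text);
    // '<' not followed by a name start character, '/', '!' or '?'.
    $text = preg_replace('~<(?![A-Za-z_:/!?])~', '&lt;', $text);
    return $text;
}

echo escapeStrayMarkup('a < b & c &amp; d <tag>'), "\n";
// a &lt; b &amp; c &amp; d <tag>
```

Cases like '<something ' in running text still slip through, which is the part you would have to fix by hand.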

How to use $_SERVER['REQUEST_URI']

Is there any difference between typing:
<?php echo $_SERVER[REQUEST_URI] ?>
or
<?php echo $_SERVER['REQUEST_URI'] ?>
or
<?php echo $_SERVER["REQUEST_URI"] ?>
?
They all work... I use the first one.
Maybe one is faster than the other?

Without quotes, PHP interprets REQUEST_URI as a constant; if there is no such constant, it falls back to treating the name as a string.
When error_reporting includes E_NOTICE, you would probably get an error such as:
Notice: Use of undefined constant REQUEST_URI - assumed 'REQUEST_URI' in <file path> on line <line number>
But if there is a constant with this name, PHP will use the constant’s value instead. (See also Array do's and don'ts)
So always use quotes when you mean a string. Otherwise it can have unwanted side effects.
And for the difference of single and double quoted strings, see the PHP manual about strings.
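Here is a small sketch of the "unwanted side effects" part. It uses a made-up array $server instead of the real $_SERVER so it runs anywhere, and it defines the constant on purpose. Without the define(), older PHP versions emit the notice quoted above, and as far as I know PHP 8 throws an Error for the undefined constant instead.

```php
<?php
// Stand-in for $_SERVER, with made-up values.
$server = [
    'REQUEST_URI' => '/index.php',
    'HTTP_HOST'   => 'example.test',
];

// Somewhere else in the code base, someone defines a constant with that name.
define('REQUEST_URI', 'HTTP_HOST');

$unquoted = $server[REQUEST_URI];   // uses the constant's value as the key
$quoted   = $server['REQUEST_URI']; // always the literal key

echo $unquoted, "\n"; // example.test
echo $quoted, "\n";   // /index.php
```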

The first one is wrong - you're actually looking for a constant REQUEST_URI that doesn't exist. This will generate a notice-level warning.
There's no difference between the other two.

There is a difference between single and double quotes in PHP string handling. A string enclosed in double quotes will be evaluated for embedded variables and escape characters (e.g. \n); a string enclosed in single quotes won't (or not as much).
So, for example,
$hello = "world";
echo "Hello $hello!\n";
echo 'Hello $hello!\n';
echo 'Done';
will output
Hello world!
Hello $hello!\nDone
In situations where you have no escape characters or embedded variables, it is slightly more efficient to use single quotes as it requires less processing of the string by the runtime. However, many people (me included) prefer to use double quotes for all strings to save confusion.

As a caveat to Gumbo's answer the third representation - double quotes - actually makes PHP look for variables inside that string. Thus that method might be a little slower (although in a string of 11 characters it'll be negligible - it's better practice not to make PHP do that however).

When PHP comes across plain strings being used as array keys it checks if there is a constant with that name and if there isn't it defaults it back to an array key. Therefore, not using quote marks causes a slight performance hit and there is a possibility that the result will not be what you expect.

$_SERVER[REQUEST_URI]
is syntactically incorrect and AFAIK will not run on a default installation of PHP5. The array index is a string so it needs to be passed as a string. I know PHP4 converted undefined constants to strings inside the square brackets but it's still not good practice.
EDIT: Well unless you define a constant called REQUEST_URI, which you haven't in your example script.
$_SERVER['REQUEST_URI']
is the standard method and what you should be using.
$_SERVER["REQUEST_URI"]
also works and, while not wrong, is slightly more work for the PHP interpreter, so unless you need to parse it for variables it should not be used (and if you need to do so, you need to rethink that part of your program).
