Why do PHP Array Examples Leave a Trailing Comma? - php

I have seen examples like the following:
$data = array(
    'username' => $user->getUsername(),
    'userpass' => $user->getPassword(),
    'email' => $user->getEmail(),
);
However, in practice I have never left the trailing comma. Am I doing something wrong, or is this just another way of doing it? If I were using a framework, would omitting the trailing comma affect code generation negatively? I have seen trailing commas in array declarations in other languages (Java, C++) as well, so I assume the reasons for leaving them are not specific to PHP, but this has piqued my interest.

Why do PHP Array Examples Leave a Trailing Comma?
Because they can. :) The PHP Manual entry for array states:
Having a trailing comma after the last defined array entry, while unusual, is a valid syntax.
Seriously, this is entirely for convenience so you can easily add another element to the array without having to first add the trailing comma to the last entry.
Speaking of other languages: Be careful with this in JavaScript. Some older browsers will throw an error, though newer ones generally allow it.
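For the PHP case, here is a minimal sketch showing that the trailing comma does not create an extra element. The values are placeholders standing in for the $user getters from the question.

```php
<?php
// Placeholder values; the question's $user object is not available here.
$withComma = array(
    'username' => 'alice',
    'email'    => 'alice@example.com',
);

$withoutComma = array(
    'username' => 'alice',
    'email'    => 'alice@example.com'
);

var_dump($withComma === $withoutComma); // bool(true)
var_dump(count($withComma));            // int(2)
```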

This is good practice when defining an array across multiple lines. It's also encouraged by Zend Framework's coding standards:
When using this latter declaration, we encourage using a trailing comma for the last item in the array; this minimizes the impact of adding new items on successive lines, and helps to ensure no parse errors occur due to a missing comma.

I noticed when working with version control (git) that if we add 1 thing to an array and we don't have the trailing comma, it will look like we modified 2 lines because the comma had to be added to the previous line. I find this looks bad and can be misleading when looking at the file changes, and for this reason I think a trailing comma is a good thing.

Because it keeps entries uniform.
If you've had to swap the order, or add or delete entries, you know being able to leave a trailing comma is very convenient.
If the last element cannot have a comma, then you end up having to maintain the last comma by modifying entries. It's a pointless exercise and a waste of time and finger strokes because the intent of swapping or modifying entries is already accomplished.
By allowing a trailing comma on the last element, it frees the programmer from having to tend to this annoying and fruitless detail.

The reason is cleaner commits.
If you have to add the trailing comma when adding a new element, you are changing 1 line and adding 1 line, which a diff shows as one removal and two additions (-++).
When the line above already ends with a comma, adding a new element produces only 1 added line and no changed ones (+).
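The two cases can be sketched with a crude line comparison. This is not a real diff algorithm, only a set difference on lines, and the helper name changedLines and the sample array are made up for illustration.

```php
<?php
// Crude comparison (an assumption, not how git computes diffs): list the lines
// that exist only in the old version and the lines that exist only in the new one.
function changedLines(array $old, array $new): array
{
    return [
        'removed' => array_values(array_diff($old, $new)),
        'added'   => array_values(array_diff($new, $old)),
    ];
}

// Without a trailing comma, appending 'c' also rewrites the 'b' line.
$without = changedLines(
    ['$list = array(', "    'a',", "    'b'", ');'],
    ['$list = array(', "    'a',", "    'b',", "    'c'", ');']
);

// With a trailing comma, only the new line shows up.
$with = changedLines(
    ['$list = array(', "    'a',", "    'b',", ');'],
    ['$list = array(', "    'a',", "    'b',", "    'c',", ');']
);

printf("without: -%d +%d\n", count($without['removed']), count($without['added'])); // without: -1 +2
printf("with:    -%d +%d\n", count($with['removed']), count($with['added']));       // with:    -0 +1
```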

I can't speak for other people, but I usually leave a trailing comma in my code. I do so because if I later add to the array, I do not have to worry about forgetting to add a comma to what was previously the last line.

I always use a trailing comma because it helps to avoid syntax errors when adding new array elements... it's just good practice.

I feel that even though it is allowed, it is bad practice; it's like leaving out the last semicolon of your functions and loops.

If you look at the example Roundcube config file (config.inc.php), it has arrays both with and without a trailing comma.
This array defines which plugins are enabled:
...
// List of active plugins (in plugins/ directory)
$config['plugins'] = array(
    'managesieve',
    'password',
    'archive',
    'zipdownload',
);
...
Normally this is written one entry per line, so if somebody wants to add something to the array, they can do this:
...
// List of active plugins (in plugins/ directory)
$config['plugins'] = array(
    'managesieve', //code by personA
    'password', //code by personA
    'archive', //code by personA
    'zipdownload', //code by personA
    'newplugin', //new code by personB
);
...
So when they commit this code, only the one new line shows as changed, which makes it easier to see who changed which line.
Elsewhere in the same file you can see an array without a trailing comma:
...
$config['default_folders'] = array('INBOX', 'Drafts', 'Sent', 'INBOX.spam', 'Trash');
...
Normally this would be a single line of code that nobody expects to change frequently.
In other words:
1) Use a trailing comma if the array holds options or configuration that might need to change in the future. If the array is edited programmatically, a trailing comma also means only one line has to change, whereas without it two lines must be touched, which makes the edit more complex.
2) You don't need a trailing comma if the array is constant and you don't expect it to change. As the accepted answer mentions, you can still add one, but it serves no purpose there.

This surprised me recently, but it makes sense. I have long tried to adhere to an earlier convention that accomplishes the same thing, which is to put the separating comma in front of each entry rather than at the end.
$data = array(
    'username' => $user->getUsername()
    , 'userpass' => $user->getPassword()
    , 'email' => $user->getEmail()
);
The commas also all line up that way, which looks nice, but it can make the indenting a little awkward. Maybe for that reason, it doesn't seem to have caught on much over the years, and I've had others ask me why I do it. I guess PHP's solution is a good compromise, and in any case, it's apparently the accepted solution now.

I've always added commas at the start of the new entry.
Compilers see the comma as the single-token look-ahead that says "there is another one coming". I don't know if modern compilers use LR(1) (left-to-right scan, rightmost derivation, single-token look-ahead), but I think that's where the syntax error originates when a comma has nothing after it.
It is rare that another developer has agreed with me, but it looks like JohnBrooking does!

Related

Recursive Regex in PHP with variable names

I'm trying to make a BBCode-ish engine for my website. The catch is that the set of available codes is not fixed, because the codes are defined by the users. On top of that, the whole thing has to be recursive.
For example:
Hello my name is [name user-id="1"]
I [bold]really[/bold] like cheeseburgers
These are the easy ones, and I got them working.
Now the problem is what happens when two of those codes follow each other:
I [bold]really[/bold] like [bold]cheeseburgers[/bold]
Or inside each other
I [bold]really like [italic]cheeseburgers[/italic][/bold]
These codes can also have attributes
I [bold strengh="600"]really like [text font-size="24px"]cheeseburgers[/text][/bold]
The following one worked quite well, but lacks in the recursive part (?R)
(?P<code>\[(?P<code_open>\w+)\s?(?P<attributes>[a-zA-Z-0-1-_=" .]*?)](?:(?P<content>.*?)\[\/(?P<code_close>\w+)\])?)
I just don't know where to put the (?R) recursion token.
Also the system has to know that in this string here
I [bold]really like [italic]cheeseburgers[/italic][/bold] and [bold]football[/bold]
are 2 "code-objects":
1. [bold]really like [italic]cheeseburgers[/italic][/bold]
and
2. [bold]football[/bold]
... and the content of the first one is
really like [italic]cheeseburgers[/italic]
which again has a code in it
[italic]cheeseburgers[/italic]
which content is
cheeseburgers
I have searched the web for two days now and I can't figure it out.
I thought of something like this:
1. Look for something like [**** attr="foo"], where the attributes are optional, and store it in a capturing group.
2. Look up whether there is a closing tag somewhere (this can be optional too).
3. If a closing tag exists, everything between the two tags should be stored as a "content" capturing group, which then has to go through the same procedure again.
I hope there are some regex specialists who are willing to help me. :(
Thank you!
EDIT
As this might be difficult to understand, here is an input and an expected output:
Input:
[heading icon="rocket"]I'm a cool heading[/heading][textrow][text]<p>Hi!</p>[/text][/textrow]
I'd like to have an array like
array[0][name] = heading
array[0][attributes][icon] = rocket
array[0][content] = I'm a cool heading
array[1][name] = textrow
array[1][content] = [text]<p>Hi!</p>[/text]
array[1][0][name] = text
array[1][0][content] = <p>Hi!</p>

Having written multiple BBCode parsing systems, I can suggest NOT using regexes only. Instead, you should actually parse the text.
How you do this is up to you, but as a general idea you would want to use something like strpos to locate the first [ in your string, then check what comes after it to see if it looks like a BBCode tag and process it if so. Then, search for [ again starting from where you ended up.
This has certain advantages, such as being able to examine each code and skip it if it's invalid, as well as enforcing proper tag closing order ([bold][italic]Nesting![/bold][/italic] should be considered invalid) and being able to provide meaningful error messages to the user if something is wrong (invalid parameter, perhaps) because the parser knows exactly what is going on, whereas a regex would output something unexpected and potentially harmful.
It might be more work (or less, depending on your skill with regex), but it's worth it.
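The scanning idea described above could look roughly like the following sketch. It is only one possible shape, not the answerer's actual parser: the function names (parseAttributes, findClosingTag, parseBBCode) and the node layout with a children key are assumptions, attribute values containing a ] are not handled, and plain text between tags is dropped rather than kept as nodes.

```php
<?php
// Hypothetical helper: collect key="value" pairs from the raw attribute text.
function parseAttributes(string $raw): array
{
    $attributes = [];
    preg_match_all('/([\w-]+)="([^"]*)"/', $raw, $pairs, PREG_SET_ORDER);
    foreach ($pairs as $pair) {
        $attributes[$pair[1]] = $pair[2];
    }
    return $attributes;
}

// Find the closing tag that balances an already-opened $name tag.
// Returns [start, end] offsets of that closing tag, or null if there is none.
function findClosingTag(string $text, string $name, int $from): ?array
{
    $pattern = '/\[(\/?)' . preg_quote($name, '/') . '(?=[\s\]])[^\]]*\]/';
    $depth = 1;
    while (preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE, $from)) {
        $from = $m[0][1] + strlen($m[0][0]);
        $depth += ($m[1][0] === '/') ? -1 : 1; // same-name tags may nest
        if ($depth === 0) {
            return [$m[0][1], $from];
        }
    }
    return null;
}

// Walk the text from "[" to "[", building a nested list of code objects.
function parseBBCode(string $text): array
{
    $nodes = [];
    $pos = 0;
    while (($open = strpos($text, '[', $pos)) !== false) {
        // Does the text at this "[" look like an opening tag?
        if (!preg_match('/\G\[(\w+)((?:\s+[\w-]+="[^"]*")*)\s*\]/', $text, $m, 0, $open)) {
            $pos = $open + 1; // not a tag: skip this bracket and keep scanning
            continue;
        }
        $bodyStart = $open + strlen($m[0]);
        $node = ['name' => $m[1], 'attributes' => parseAttributes($m[2]), 'content' => null, 'children' => []];
        $close = findClosingTag($text, $m[1], $bodyStart);
        if ($close === null) {
            $pos = $bodyStart; // standalone tag such as [name user-id="1"]
        } else {
            $node['content'] = substr($text, $bodyStart, $close[0] - $bodyStart);
            $node['children'] = parseBBCode($node['content']); // recurse into the content
            $pos = $close[1];
        }
        $nodes[] = $node;
    }
    return $nodes;
}

$tree = parseBBCode('[heading icon="rocket"]I\'m a cool heading[/heading][textrow][text]<p>Hi!</p>[/text][/textrow]');
print_r($tree);
```

Because every tag passes through ordinary PHP code, this is also the place where unknown codes could be rejected or error messages produced; the sketch does neither.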

Matching a complicated route with a regular expression

I'm currently working on a request router for a large PHP-based website, but I'm getting stuck trying to use a custom form of expression for my routes.
While I know there are pre-made alternatives and routers that could make my life easier, and would have the same features (in fact, I've been looking at their source code to try and solve this), I'm still a programming student and learning how to create my own can only be a good thing!
Examples:
Here's an example of one of my route expressions:
<protocol (https?)>://<wildcard>.example.com/<controller>/{<lang=en (en|de|pl)>/}<name ([a-zA-Z0-9_-]{8})>
This could match either of these equally well:
http://www.example.com/test/en/hello_123
https://subdomain.example.com/another_test/hello_45
Returning me a nice, handy array like this (for the latter):
array(
    'protocol' => 'https',
    'wildcard' => 'subdomain',
    'controller' => 'another_test',
    'lang' => 'en',
    'name' => 'hello_45'
)
I can also include an array in the first place, with default values that would be overridden by the values found by the router. So, for example, I could leave out the <controller> variable and just write test instead, and then use the array, adding "controller"=>"test".
Here are the rules:
If there's no match, there's no match. Variables have to exist, and if they don't, the route is skipped. Goodbye. Optional sections don't have to exist, luckily.
Anything between <> is a variable. Escaped \<\> are ignored, even when between. The area matching in the URL should be saved to the result array, with the variable name as the key.
Curly braces {} mark a section as optional, and can never be inside a variable <>. Anything between them can be ignored in the target - however, if there is a default value specified for any variables in between, that variable must be added to the result array, using the name as the key, and with the default value as the value. Escaped braces are ignored.
A variable doesn't have to have a default value, but if you add one, it needs to be after an =, like <name=default>.
Regex rules can be added, separated by a space after the name or default value, and encased in brackets (). Escaped brackets are ignored, of course.
Lastly, you can just put Regex rules, in brackets, anywhere, if you don't mind matching anything and not getting a result. So, I could just replace <controller> with ([\/]+), but then I'd have to use the array to set a value for it instead.
What I've Tried:
I've been reading the source code of every Router I can find.
So far, I've done a couple of nasty little regular expressions, but I realised I was confused completely about how to conglomerate them and extend them.
This matches the curly-brace sections, ignoring escaped ones: {([^{\\]*(?:\\.[^}\\]*)*)}
This matches a variable, with or without the default value: <([^<\\]*(?:\\.[^>\\]*)*)(?:=?([^<>\\]*))>
This is a kind of unholy hell, the like of which made me write this post: <([^<\\]*(?:\\.[^>\\]*)*)(?:=?([^<>\\]*))(?: ?)(\([^{}<>\(\)\\]+\))?>
(It does, however, match the variables and the Regex sections.)
Can anybody give me any hints, or even example source code from libraries that offer similar functionality? And if this is really near impossible to code myself, is there a library good enough to use?

If you are trying to match the domain, this regex101 demo should match those portions with the individual sections named.
On the other hand, if you are trying to match the route expression, this other regex101 demo is able to parse the tokens you specified so far.
I may have missed some specifications, but you can always leave feedback and explain where it falls short (or even update the regex on that site itself and save a newer version).
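As a rough illustration of the rules listed in the question, a route expression can be translated into an ordinary PCRE pattern with named groups. This is a simplified sketch, not the content of the linked demos: compileRoute and matchRoute are hypothetical names, escaped \< \{ \( characters are not handled, and variables without a rule are assumed to match one path or host segment ([^/]+).

```php
<?php
// Hypothetical helper: translate a route expression into [regex, defaults].
function compileRoute(string $route): array
{
    $defaults = [];
    // Alternatives: <name=default (rule)> | a brace | literal text.
    $token = '/<(\w+)(?:=([^\s>]*))?(?:\s+\((.+?)\))?>|([{}])|([^<{}]+)/';
    $regex = preg_replace_callback($token, function (array $m) use (&$defaults): string {
        if ($m[1] !== '') { // a variable
            if (($m[2] ?? '') !== '') {
                $defaults[$m[1]] = $m[2];
            }
            $rule = ($m[3] ?? '') !== '' ? $m[3] : '[^/]+'; // assumed fallback rule
            return '(?P<' . $m[1] . '>' . $rule . ')';
        }
        if (($m[4] ?? '') !== '') { // { opens an optional section, } closes it
            return $m[4] === '{' ? '(?:' : ')?';
        }
        return preg_quote($m[5], '#'); // literal text
    }, $route);

    return ['#^' . $regex . '$#', $defaults];
}

// Returns the variables found in $url (defaults filled in), or null on no match.
function matchRoute(string $route, string $url): ?array
{
    [$regex, $defaults] = compileRoute($route);
    if (!preg_match($regex, $url, $m)) {
        return null;
    }
    // Keep only named groups that actually captured something.
    $found = array_filter($m, fn($v, $k) => is_string($k) && $v !== '', ARRAY_FILTER_USE_BOTH);
    return array_merge($defaults, $found);
}

$route = '<protocol (https?)>://<wildcard>.example.com/<controller>/{<lang=en (en|de|pl)>/}<name ([a-zA-Z0-9_-]{8})>';
print_r(matchRoute($route, 'https://subdomain.example.com/another_test/hello_45'));
```

Note that hello_123 from the first example URL is nine characters long, so under the {8} rule as written that URL would not match this sketch.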

MySQL: Remove part of string up to the third to last occurrence of forward slash

I've seen JavaScript parsers for this, but nothing suitable for MySQL ... I have a column in my database that contains a string like this:
http://localhost/mysite/wp-content/uploads/2012/10/huge_eye-150x150.jpg
I need to be able to remove every part of that string except 2012/10/huge_eye-150x150.jpg
So I need to remove http://localhost/mysite/wp-content/uploads/ ... but keep in mind that not ALL of the rows will contain exactly http://localhost/mysite/wp-content/uploads/ ... some may contain a slightly different string because of a legacy system ... that's why I thought it might be most appropriate to find the third to last occurrence of /
Perhaps you have a better solution? Thank you!
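
Something like:
SUBSTRING_INDEX(url,'/',-3)
With a negative count, SUBSTRING_INDEX returns everything to the right of that many delimiters counted from the end of the string, so for the example URL this yields 2012/10/huge_eye-150x150.jpg.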

What is the proper New Line Character in Outlook Contact Export?

I have a CSV parser, that takes Outlook 2010 Contact Export .CSV file, and produces an array of values.
I break each row on the new line symbol, and each column on the comma. It works fine, until someone puts a new line inside a field (typically Address). This new line, which I assume is "\n" or "\r\n", explodes the row where it shouldn't, and the whole file becomes messed up from there on.
In my case, it happens when Business Street is written in two lines:
123 Apple Dr. Unit A
My code:
$file = file_get_contents("outlook.csv");
$rows = explode("\r\n",$file);
foreach($rows as $row)
{
    $columns = explode(",",$row);
    // Further manipulation here.
}
I have tried both "\n" and "\r\n", same result.
I figured I could calculate the number of columns in the first row (keys), and then find a way to not allow a new line until this many columns have been parsed, but it feels shady.
Is there another character for the new line that I can try, that would not be inside the data fields themselves?

The most common way of handling newlines in CSV files is to "quote" fields which contain significant characters such as newlines or commas. It may be worth looking into whether your CSV generator does this.
I recommend using PHP's fgetcsv() function, which is intended for this purpose. As you've discovered, splitting strings on commas works only in the most trivial cases.
In cases where that doesn't work, a more sophisticated, reportedly RFC 4180-compliant parser is available here.

I also recommend fgetcsv().
fgetcsv() will also take care of commas inside strings (between quotes).
Interesting parsing tutorial
+1 to the previous answer ;)
PS: fgetcsv() is a bit slower than reading the whole file and exploding the contents, but in my opinion it's worth it.
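A minimal sketch of the fgetcsv() approach recommended above. It assumes the export quotes fields that contain commas or line breaks; the function name readCsvRows is made up for illustration.

```php
<?php
// Read a CSV file row by row. fgetcsv() keeps a quoted field together even
// when it contains commas or line breaks, which explode() cannot do.
function readCsvRows(string $path): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $rows = [];
    // Length 0 means no line-length limit; the other arguments are the
    // separator, the enclosure and the escape character.
    while (($columns = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($columns === [null]) {
            continue; // fgetcsv() reports a blank line as [null]
        }
        $rows[] = $columns;
    }

    fclose($handle);
    return $rows;
}
```

With the file from the question, readCsvRows("outlook.csv") would return one array of columns per contact, with a two-line Business Street kept inside a single column.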

PHP: How are comments skipped?

Well, if I comment something out it's skipped in all languages, but how are comments skipped, and what is read?
Example:
// This is commented out
Now, does PHP read the whole comment to get to the next line, or does it just read the //?

The script is parsed and split into tokens.
You can actually try this out yourself on any valid PHP source code using token_get_all(), which uses PHP's native tokenizer.
The example from the manual shows how a comment is dealt with:
<?php
$tokens = token_get_all('<?php echo; ?>'); /* => array(
array(T_OPEN_TAG, '<?php'),
array(T_ECHO, 'echo'),
';',
array(T_CLOSE_TAG, '?>') ); */
/* Note in the following example that the string is parsed as T_INLINE_HTML
rather than the otherwise expected T_COMMENT (T_ML_COMMENT in PHP <5).
This is because no open/close tags were used in the "code" provided.
This would be equivalent to putting a comment outside of <?php ?>
tags in a normal file. */
$tokens = token_get_all('/* comment */');
// => array(array(T_INLINE_HTML, '/* comment */'));
?>
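To see what happens to a comment that sits inside PHP tags, the token list can be filtered for comment tokens. This is a small sketch with a made-up helper name (commentTokens); it shows that the tokenizer reads the whole comment and hands it over as a single token.

```php
<?php
// Return the text of every comment token found in $source.
function commentTokens(string $source): array
{
    $comments = [];
    foreach (token_get_all($source) as $token) {
        // Single characters such as ";" come back as plain strings; everything
        // else is an array of [token id, text, line number].
        if (is_array($token) && in_array($token[0], [T_COMMENT, T_DOC_COMMENT], true)) {
            $comments[] = trim($token[1]);
        }
    }
    return $comments;
}

print_r(commentTokens("<?php\n// This is commented out\necho 'hi';\n"));
```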

There is a tokenization phase while compiling. During this phase, the compiler sees the // and then just ignores everything to the end of the line. Compilers CAN get complicated, but for the most part they are pretty straightforward.
http://compilers.iecc.com/crenshaw/

Your question doesn't make sense. Having read the '//', the compiler then has to keep reading up to the newline in order to find it. There's no choice about this. There is no other way to find the newline.
Conceptually, compiling has several phases that are logically prior to parsing:
1. Scanning.
2. Screening.
3. Tokenization.
(1) basically means reading the file character by character from left to right.
(2) means throwing things away of no interest, e.g. collapsing multiple newline/whitespace sequences to a single space.
(3) means combining what's left into tokens, e.g. identifiers, keywords, literals, punctuation.
Comments are screened out during (2). In modern compilers this is all done at once by a deterministic automaton.
