Can you explain Perl's hash system to a PHP guy? - php

How do Perl hashes work?
Are they like arrays in PHP or some completely different beast?
From what I understand all it is is an associative array right? This is what I thought until I began
to talk to a Perl programmer who told me I was completely wrong, but couldn't explain it in a way
that didn't make my eyes cross.
Anyway, the way that I thought it worked was like this
PHP's:
$argv['dog_name'] = 'missy';
$argv[0] = 'tree';
same as Perl's:
my %argv{'dog_name'} = 'missy';
my $argv[0] = 'tree';
Right? But you cannot print(%argv{'dog_name'}), you have to (revert?) to print($argv{'dog_name'}) which is confusing?
Is it trying to print as a variable now, like you would in PHP, echo $argv['dog_name']; ? Does this mean (again) that a hash is
just a PHP associative array with a % to declare but a $ to access?
I don't know, I'm hoping some PHP/Perl Guru can explain how hashes work, and how similar they are
to PHP's arrays.

To write
$argv['dog_name'] = 'missy';
$argv[0] = 'tree';
in Perl, you would write it as follows:
$argv{dog_name} = 'missy';
$argv{0} = 'tree';
if you had strict on, which you should, then you will need to predeclare the variable:
my %argv;
$argv{dog_name} = 'missy';
$argv{0} = 'tree';
If the above is a bit repetitive for you, you could write it:
my %argv = (
dog_name => 'missy',
0 => 'tree',
);
You can find more detail on the perldata manpage.
In short, the reasons why the sigils change from % to $ is that %hash refers to a plural hash (a list of key value pairs), and $hash{foo} refers to a single element of a hash. This is the same with arrays, where # refers to the full array, and $ refers to a single element. (for both arrays and hashes a leading # sigil with a subscript means a slice of the data, where multiple keys are passed and a list of values are returned)

To elaborate slightly on Ambrose's answer, the reason for your confusion is the difference between the philosophy of using sigils in Perl and PHP.
In PHP, the sigil is attached to the identifyer. E.g. a hash identifyer will ALWAYS have a hash sigil around it.
In Perl, a sigil is attached to the way you are accessing the data structure (are you accessing 1 value, a list of values, or a whole hash of values) - for details see other excellent answers such as Eric's.

%argv{'dog_name'} is a syntax error. You need $argv{'dog_name'} instead.
But you are correct that a perl hash is just an associative array (why perl chose to use a different terminology, I don't know).
For a complete understanding of hashes, I recommend reading any of the vast number of perl tutorials or books that cover the topic. Programming Perl is an excellent choice, or here's a random online tutorial I found as well.

I would, as Flimzy, also recommend Programming Perl. As a recent PHP to Perl convert myself, it has taught me a great deal about the language.
The % symbol is used to create a full 'associative array', as we would think of it. For example, I could create an associative array by doing the following:
%hash = ('key1' => 'value1', 'key2' => 'value2');
I could then print it out like so:
print %hash;
The output would be something like:
'key2value2key1value1'
This is, I believe, known as 'list context', since the % indicates that we are talking about a range of values.
On the other hand, if I wanted to access a single value, we would have to use the $ sigil. This, as 'Programming Perl' tells us, can be thought of as an 'S' for 'Scalar'. We have to use the $ sign whenever we are talking about a singular value.
So, to access an individual item in the array, I would have to use the following syntax:
print $hash{'key1'};
The same is true of arrays. A full array can be created like so:
#array = ('abc', '123');
and then printed like so:
print #array;
But, to access a single element of the array I would type instead:
print $array[0];
There are lots of basic principles here. You should read about 'list context' and 'scalar context' in some detail. Before long you will also want to look at references, which are the things you use to create multimensional structures in Perl. I really would recommend 'Programming Perl'! It was a difficult read in chapters, but it certainly does cover everything you need to know (and more).

The sigil changing really isn't as complicated as you make it sound. You already do this in English without thinking about it.
If you have a set of cars, then you would talk about "these cars" (or "those cars"). That's like an array.
my #cars = ('Vauxhall', 'Ford', 'Rolls Royce');
If you're talking about just one car from that set, you switch to using "this car". That's like a single element from an array.
say $car[1]; # prints 'Ford';
Similar rules also apply to hashes.

I would say your confusion is partly caused by one simple fact. Perl has different sigils for different things. PHP has one sigil for everything.
So whether you're putting something into an array/hash, or getting something out, or declaring a simple scalar variable, in PHP you always use the dollar sign.
With perl you need to be more specific, that's all.

The "sigil", i.e. the character before the variable name, denotes the amount of data being accessed, as follows:
If you say $hash{key}, you are using scalar context, i.e. one value.
For plural or list context, the sigil changes to #, therefore #hash{('key1', 'key2')} returns a list of two values associated with the two keys respectivelly (might be written as #hash{qw(key1 key2)}, too).
%hash is used to acces the hash as a whole.
The same applies to arrays: $arr[0] = 1, but #arr[1 .. 10] = (10) x 10.

I hope that you are not expecting to get a full tutorial regarding perl hashes here. You don't need a Perl guru to explain you hashes, just a simple google search.
http://www.perl.com/pub/2006/11/02/all-about-hashes.html
PS: please increase your accept ratio - 62% is pretty low

Related

Converting pseudocode into usable (using regular expression?)

As part of the system I am writing, users can create their own custom Rules, to be run when certain events happen.
There are a set number of Objects they can use to create these rules, all of which have a set number of properties and methods:
So as an example of a rule, we could say:
“if this unit award is ‘Distinction’ then set all the criteria on this unit to award ‘Achieved’”
IF UNIT.award equals “Distinction”
THEN UNIT.criteria.set_award(‘A’)
“else if this unit award is ‘Merit’ then set the award of any criteria on this unit whose name starts with either ‘P’ or ‘M’ to ‘Achieved’”
IF UNIT.award equals “Merit”
THEN UNIT.criteria.filter(‘starts’, ‘name’, ‘P’, ‘M’).set_award(‘A’)
“else if this unit award is ‘Pass then set the award of any criteria on this unit whose name starts with ‘P’ to ‘Achieved’”
IF UNIT.award equals “Merit”
THEN UNIT.criteria.filter(‘starts’, ‘name’, ‘P’).set_award(‘A’)
The problem I am having, is I am just not sure how to take that string of object, properties & methods, e.g. “UNIT.criteria.filter(‘starts’, ‘name’, ‘P’).set_award(‘A’)” and convert it into something usable.
The end result I’d like to convert the string to would be something along the lines of:
So I can then convert that into the actual proper objects and return the relevant values or run the relevant methods.
Since there is only a set number of things I need to support (for now at least) and I don’t need anything complex like calculation support or variables, it seems overkill to create a Lexer system, so I was thinking of just using a regular expression to split all the sections.
So using the examples above, I could do a simple split on the “.” character, but if that character is used in a method parameter, e.g. “CRITERION.filter(‘is’, ‘name’, ‘P.1’)” then that screws it up completely.
I could use a less common character to split them, for example a double colon or something “::” but if for whatever reason someone puts that into a parameter it will still cause the same problem. I’ve tried creating a regular expression that splits on the character, only if it’s not between quotes, but I haven’t been able to get it to work.
So basically my question is: would a regular expression be the best way to do this? (If so, could anyone help me with getting it to ignore the specified character if it’s in a method). Or is there another way I could do this that would be easier/better?
Thanks.
I'd think an ORM language like eloquent could do this for you.
But if I had to do this then first I'd split the IF THEN ELSE parts.
Leaving:
UNIT.award equals “Distinction”
UNIT.criteria.filter(‘starts’, ‘name’, ‘P’, ‘M’).set_award(‘A’)
I'm guessing the "equals" could also be "not equals" or "greater" so...
I'd split the first bit around that.
/(?'ident'[a-z.]*?) (?'expression'equals|greater) (?'compare'[0-9a-z\“\”]+)/gi
But an explode around 'equals' will do the same.
Then I'd explode the second part around the dots.
Giving:
UNIT
criteria
filter(a,b,c,d)
set_ward(e)
Pop off the first 2 to get object and property and then a list of possible filters and actions.
But frankly I'd would develop a language that would not mix properties with actions and filters.
Something like:
IF object.prop EQUALS const|var
THEN UPDATE object.prop
WITH const|var [WHERE object.prop filter const|var [AND|OR const|var]]
Eloquent does it straight in php:
DB::table('users')
->where('id', 1)
->update(['votes' => 1]);
So maybe I'd do something like:
THEN object.prop->filter(a,b,c,d)->set('award','A')
This makes it easy to split actions around -> and properties around .
Anyway...
I do my Regex on https://regex101.com/
Hope this helps.

PHP regex parsing - splitting tokens in my own language. Is there a better way?

I am creating my own language.
The goal is to "compile" it to PHP or Javascript, and, ultimately, to interpret and run it on the same language, to make it look like a "middle-level" language.
Right now, I'm focusing on the aspect of interpreting it in PHP and run it.
At the moment, I'm using regex to split the string and extract the multiple tokens.
This is the regex I have:
/\:((?:cons#(?:\d+(?:\.\d+)?|(?:"(?:(?:\\\\)+"|[^"]|(?:\r\n|\r|\n))*")))|(?:[a-z]+(?:#[a-z]+)?|\^?[\~\&](?:[a-z]+|\d+|\-1)))/g
This is quite hard to read and maintain, even though it works.
Is there a better way of doing this?
Here is an example of the code for my language:
:define:&0:factorial
:param:~0:static
:case
:lower#equal:cons#1
:case:end
:scope
:return:cons#1
:scope:end
:scope
:define:~0:static
:define:~1:static
:require:static
:call:static#sub:^~0:~1 :store:~0
:call:&-1:~0 :store:~1
:call:static#sum:^~0:~1 :store:~0
:return:~0
:scope:end
:define:end
This defines a recursive function to calculate the factorial (not so well written, that isn't important).
The goal is to get what is after the :, including the #. :static#sub is a whole token, saving it without the :.
Everything is the same, except for the token :cons, which can take a value after. The value is a numerical value (integer or float, called static or dynamic in the language, respectively) or a string, which must start and end with ", supporting escaping like \". Multi-line strings aren't supported.
Variables are the ones with ~0, using ^ before will get the value to the above :scope.
Functions are similar, being used &0 instead and &-1 points to the current function (no need for ^&-1 here).
Said this, Is there a better way to get the tokens?
Here you can see it in action: http://regex101.com/r/nF7oF9/2
[Update] To issue the pattern being complicated and maintainability, you can split it using PCRE_EXTENDED, and comments:
preg_match('/
# read constant (?)
\:((?:cons#(?:\d+(?:\.\d+)?|
# read a string (?)
(?:"(?:(?:\\\\)+"|[^"]|(?:\r\n|\r|\n))*")))|
# read an identifier (?)
(?:[a-z]+(?:#[a-z]+)?|
# read whatever
\^?[\~\&](?:[a-z]+|\d+|\-1)))
/gx
', $input)
Beware that all space are ignored, except under certain conditions (\n is normally "safe").
Now, if you want to pimp you lexer and parser, then read that:
What does (f)lex [GNU equivalent of LEX] is simply let you pass a list of regexp, and eventually a "group". You can also try ANTLR and PHP Target Runtime to get the work done.
As for you request, I've made a lexer in the past, following the principle of FLEX. The idea is to cycle through the regexp like FLEX does:
$regexp = [reg1 => STRING, reg2 => ID, reg3 => WS];
$input = ...;
$tokens = [];
while ($input) {
$best = null;
$k = null;
for ($regexp as $re => $kind) {
if (preg_match($re, $input, $match)) {
$best = $match[0];
$k = $kind;
break;
}
}
if (null === $best) {
throw new Exception("could not analyze input, invalid token");
}
$tokens[] = ['kind' => $kind, 'value' => $best];
$input = substr($input, strlen($best)); // move.
}
Since FLEX and Yacc/Bison integrates, the usual pattern is to read until next token (that is, they don't do a loop that read all input before parsing).
The $regexp array can be anything, I expected it to be a "regexp" => "kind" key/value, but you can also an array like that:
$regexp = [['reg' => '...', 'kind' => STRING], ...]
You can also enable/disable regexp using groups (like FLEX groups works): for example, consider the following code:
class Foobar {
const FOOBAR = "arg";
function x() {...}
}
There is no need to activate the string regexp until you need to read an expression (here, the expression is what come after the "="). And there is no need to activate the class identifier when you are actually in a class.
FLEX's group permits to read comments, using a first regexp, activating some group that would ignore other regexp, until some matches is done (like "*/").
Note that this approach is a naïve approach: a lexer like FLEX will actually generate an automaton, which use different state to represent your need (the regexp is itself an automaton).
This use an algorithm of packed indexes or something alike (I used the naïve "for each" because I did not understand the algorithm enough) which is memory and speed efficient.
As I said, it was something I made in the past - something like 6/7 years ago.
It was on Windows.
It was not particularly quick (well it is O(N²) because of the two loops).
I think also that PHP was compiling the regexp each times. Now that I do Java, I use the Pattern implementation which compile the regexp once, and let you reuse it. I don't know PHP does the same by first looking into a regexp cache if there was already a compiled regexp.
I was using preg_match with an offset, to avoid doing the substr($input, ...) at the end.
You should try to use the ANTLR3 PHP Code Generation Target, since the ANTLR grammar editor is pretty easy to use, and you will have a really more readable/maintainable code :)

Array merging: Keeping literal variables

Let me just jump directly into my problem.
The Build-up
I have some simple language files which just return an associative array containing the language strings (This is Laravel if that's of any help). Don't worry about the variable, that's just for demonstrational purposes.
lang/en/common.php:
<?php return [
"yes"=>"Yes",
"no"=>"No",
"hello"=>"Hello {$name}!",
"newstring"=>"This string does not exist in the other language file",
"random_number"=>"Random number: ".rand(1,10)
];
lang/da/common.php:
<?php return [
"yes"=>"Ja",
"no"=>"Nej",
"hello"=>"Hej {$name}!"
];
Now, as you can see, the index newstring doesn't exist in the danish language file. Instead of having to remember to add all the indexes all the language files instead of just one, I wrote a script, which basically does this:
$base_lang = require('lang/en/common.php');
$language_to_merge = require('lang/da/common.php');
$merged_lang = array_replace_recursive($base_lang, $language_to_merge);
file_put_contents('lang/da/common.php', var_export($merged_lang, true));
The problem
So far, so good. Now, lets say that $name = "John Doe";. By the very nature of PHP, after running this script the lang/da/common.php will now be
<?php return [
"yes"=>"Ja",
"no"=>"Nej",
"hello"=>"Hej John Doe!",
"newstring"=>"This string does not exist in the other language file",
"random_number"=>"Random number: 4"
];
As you might have guessed, the unwanted result is in the hello and random_number-indexes. Preferably it should still be "hello"=>"Hej {$name}!" and "random_number"=>"Random number: ".rand(1,10), but obviously that ain't happening due to PHP parsing the array values, which basically tells me that this is the wrong strategy.
Wanted result:
<?php return [
"yes"=>"Ja",
"no"=>"Nej",
"hello"=>"Hej {$name}!",
"newstring"=>"This string does not exist in the other language file",
"random_number"=>"Random number: ".rand(1,10)
];
The "how the h.. do I do that?"
Any idea how to get around this? I could file_get_contents() and do some regex, but i worry that there's too many error sources involved in that.
Thanks in advance!
EDIT
Some people have recommended using single quotes. While that actually answers my question, I became aware that I wasn't precise enough; when the Language class treats the files, I want the values to be parsed (the normal behaviour) - but only when I run my merge-script I want the actual literal variable references to remain intact.
EDIT 2 - Temporary workaround
Until I find a proper solution for this, I'm just running through the arrays of the base language, checking for key existences in the language I'm trying to fill with the missing keys - and appending a comment to the bottom of those files.
Edit: See OP's comment. This is true, but not the answer to his question :)
You should use single quotes instead of double quotes:
<?php return [
"yes"=>'Yes',
"no"=>'No',
"hello"=>'Hello {$name}!'
"newstring"=>'This string does not exist in the other language file'
];
PHP only parses variables inside double-quotes.

Echoing variable construct instead of content

I'm merging two different versions of a translations array. The more recent version has a lot of changes, over several thousand lines of code.
To do this, I loaded and evaluated the new file (which uses the same structure and key names), then loaded and evaluated the older version, overwriting the new untranslated values in the array with the values we already have translated.
So far so good!
However, I want to be able to echo out the constructor for this new merged array so I can cut and paste it into the new translation file, and have job done (apart from completing the rest of the translations..).
The code looks like this (lots of different keys, not just index):
$lang["index"]["chart1_label1"] = "Subscribed";
$lang["index"]["chart1_label2"] = "Unsubscribed";
And the old..
$lang["index"]["chart1_label1"] = "Subscrito";
$lang["index"]["chart1_label2"] = "Não subscrito";
After loading the two files, I end up with a merged $lang array, which I then want to echo out in the same form, so it can be used by the project.
However, when I do something like this..
foreach ($lang as $key => $value) {
if (is_array($value)) {
foreach ($value as $key2 => $value2) {
echo "$lang['".$key."']"; // ... etc etc
}
}
}
..obviously I get "ArrayIndex" etc etc instead of "$lang". How to echo out $lang without it being evaluated..? Once this is working, can add in the rest of the brackets etc (I realise they are missing), but just want to make this part work first.
If there's a better way to do this, all ears too!
Thanks.
edit:
What I was really looking for was this:
"Note: Unlike the three other syntaxes, variables and escape sequences for special characters will not be expanded when they occur in single quoted strings."
So no need even to escape. May come in handy one day for someone... about merging the arrays, there's a great recursive array merging function on the PHP manual page for array_merge(), which also came in handy.
Thanks to whoever downvoted! Love you too.
Either escape the $: "\$lang" or use single quotes: '$lang'
Note that you really don't need to post a whole page of backstory, if your entire question just boils down to "how do I output a dollar sign in PHP"
there is var_export() function, but I do not understand the purpose of this terrible mess.
It's all too... manual
why not to use a gettext - an industry standard for multi-language site?
or at least some other format, more reliable than plain PHP one?
Programming code is not intended to be written automatically.

Problem with PHP Array

Why is it not possible to do something equivalent to this in PHP:
(Array(0))[0];
This is just for sake of argument, but it seems strange it does not allow access of anonymous objects. I would have to do something like the following:
$array = Array(0);
$array[0];
Any ideas why this is the behavior of PHP?
I read something somewhat detailed about this once and I regret not bookmarking it because it was quite insightful. However, it's something along the lines of
"Because the array does not exist in memory until the current statement (line) executes in full (a semicolon is reached)"
So, basically, you're only defining the array - it's not actually created and readable/accessible until the next line.
I hope this somewhat accurately sums up what I only vaguely remember reading many months ago.
This language feature hasn’t been inplemented yet but will come in PHP 6.
I guess the short answer is: nobody has coded it yet. I've used (and loved) that syntax in both Python and Javascript, but still we wait for PHP.
The main reason is because unlike some languages like Python and JavaScript, Array() (or in fact array()) is not an object, but an language construct which creates an inbuilt data type.
Inbuilt datatypes themselves aren't objects either, and the array() construct doesn't return a reference to the "object" but the actual value itself when can then be assigned to a variable.

Categories