Associative Array: PHP/C vs Flex/Flash - php

In PHP an Associative Array keeps its order.
// this will keep its insertion order in PHP
$a = array();
$a['kiwis'] = 1;
$a['bananas'] = 2;
$a['potatoes'] = 3;
$a['peaches'] = 4;
However, in Flex it doesn't, and there is a perfectly valid explanation for that. I really can't remember how C treats this problem, but I am inclined to believe it works like PHP, since the array has its space pre-reserved in memory and we can just walk the memory. Am I right?
The real question here is why. Why does the C/PHP interpretation of this vary from Flash/Flex, and what is the main reason Adobe made Flash work this way?
Thank you.

There is no associative array in the C language itself; you roll your own as needed, or choose from a pre-existing implementation. As such, a given C implementation may be ordered or unordered.
As to why, the reason is that the advantages are different. Ordered allows you (obviously enough) to depend on that ordering. However, it's wasteful when you don't need that ordering.
Different people will consider the advantage of ordering more or less important than the advantage of not ordering.
The greatest flexibility comes from not ordering: if you also have some sort of ordered structure (a list, linked list, or vector would all do), you can easily build an ordered hashmap out of the two (not the optimal solution, but an easy one, so you can't complain you weren't given one). That makes the unordered variant the obvious choice for something intended, from early in its design, to be general purpose.
On the other hand, the disadvantage of ordering is generally only a matter of performance, so the ordered variant is the obvious choice for something intended to give relatively wide-ranging support with a small number of types for a new developer to learn.
The march of history sometimes makes these decisions optimal and sometimes sub-optimal, in ways that no developer can really plan for.

For PHP arrays: these beasts are unique constructs and are somewhat complicated. An overview is given in a slashdot response from Kendall Hopkins (scroll down to his answer):
Ken: The PHP array is a chained hash table (lookup of O(c) and O(n) on key collisions)
that allows for int and string keys. It uses 2 different hashing algorithms
to fit the two types into the same hash key space. Also each value stored in
the hash is linked to the value stored before it and the value stored after
(linked list). It also has a temporary pointer which is used to hold the
current item so the hash can be iterated.
In C/C++ there is, as has been said, no "associative array" in the core language. C++ has a map (ordered) in the STL, will have hash_map/unordered_map (unordered) in the new standard library, and there was a GNU hash_map (unordered) on some implementations (which was very good, imho).
Furthermore, the "order" of elements in an "ordered" C/C++ map is usually not the "insertion order" (as in PHP), it's the "key sort order" or "string hash value sort order".
To answer your question: your assumed equivalence of PHP and C/C++ associative arrays does not hold. In PHP, they made a design decision to provide maximum comfort under a single interface (and failed or succeeded, whatever). In C/C++, there are many different implementations available, each with its own advantages and tradeoffs.
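To make the insertion-order vs. key-sort-order distinction concrete, here is a small PHP sketch (the keys are made up for illustration); ksort() gives the key-sorted view that an ordered C++ map maintains automatically:

```php
$a = [];
$a['cherry'] = 3;
$a['apple']  = 1;
$a['banana'] = 2;

// PHP preserves insertion order
print_r(array_keys($a));   // cherry, apple, banana

// key sort order, like iterating a C++ std::map
ksort($a);
print_r(array_keys($a));   // apple, banana, cherry
```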
Regards
rbo

Related

What is the shortest way to represent all php types in string?

I'm writing a universal system that will hopefully one day apply to medicine, etc. (i.e. it's "scientific").
I figure the best way to go about this is to represent all data in php with string (true would be "true", false would be "false", so on and so forth). The reason for this is that there is exactly one string representation of any value in php (e.g. php code itself).
I am posting this question in an attempt to accelerate the design process of this program.
Some values are easily translated to string: numbers, booleans, etc.
Some are not: objects, arrays, resources.
I figured the format for transmitting objects and arrays is basically JSON, but I'm not sure it's a tight fit. It's better than what I currently have (nothing), but at some point I would like to refine it.
Any ideas?
I'm writing a universal system
This is an ambitious goal indeed; so ambitious as to be foolish to attempt.
Now, probably you don't really mean "can do absolutely anything for anyone", but it's relevant to your question that you don't place any limits on what you're trying to represent. That's making your search for a serialization format unnecessarily difficult.
For instance, you mention resources, which PHP uses for things like database connections, open file handles, etc. They are transient pointers to something that exists briefly and then is gone, and serializing them is not only unsupported by PHP, it's close to meaningless.
Instead of trying to cover "everything", you need to think about what types of data you actually need to handle. Maybe you'll mostly be working with classes defined within the system, so you can define whatever format for those you want. Maybe you want to work with arbitrary bags of key-value pairs, in the form of PHP arrays. You might want to leave the way open for future expansion, but that's about flexibility in the format, not having a specific answer right now.
From there, you can look for what properties you want, and shop around:
JSON is a hugely popular "lowest-common denominator" format. Its main downside is it has no representation of specific custom types, everything has to be composed of lists and key-value pairs (I like to say "JSON has no class").
XML is less fashionable than it used to be, but very powerful for defining custom languages and types. It's quite verbose, but compresses well - a lot of modern file formats are actually zip archives containing compressed XML files.
PHP's serialization format is really only intended for short-term, in-application purposes, like cache data. It's fairly concise and closely tied to PHP's type system, but has security problems if users have influence over the data, as noted on the unserialize manual page.
There are even more concise formats that don't limit themselves to human-readable representations, if that was a relevant factor for you.
Obviously, the list is endless...
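As a quick illustration of the first and third options, a round trip through json_encode() and serialize() (the sample array is made up):

```php
$data = ['name' => 'Ada', 'active' => true, 'scores' => [90, 85]];

// JSON: lowest-common-denominator, composed of lists and key-value pairs
$json = json_encode($data);
var_dump(json_decode($json, true) === $data);  // bool(true)

// PHP serialization: concise, tied to PHP's type system
$str = serialize($data);
var_dump(unserialize($str) === $data);         // bool(true)
```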
I've programmed a solution to this problem. It's a simple class that converts a string to int | float | bool | null | string. The idea is that any value that is not a "relativistic" value (e.g. an array, something that simply holds other values) is represented by a single string. The implications are broad; I'll do my best to simplify.
Imagine you're making a website, which is basically (and in fact) made of webpages. If a webpage consists of inputs (typically GET and POST form data), and those inputs are strings (GET and POST elements are strings), all that stands between us and raw PHP is interpretation of said strings.
Or think of it this way: if you model the total potential of PHP in strings, it may not ultimately be how you do things, but it works, right here right now. What THAT means is that we can implement it immediately.
The rest of it is left blank, as that is what I mean by "relativistic".
Now, just to cap it all off: if you think about what this implies in form, in the actual PHP code itself, everything reduces to a point at which there is exactly one string per "non-relativistic" value.
So basically what you have is a bunch of PHP. The idea is that it is designed to be semantically AND syntactically as simple and functional as possible (or, at least, completely factorialized). So basically we have one way to represent any potential data in PHP.
Anyways, you can find it here: https://github.com/cinder-brent/Cinder
Cheers:)
-- edit --
Lo and behold, I moved the project. It is now at https://github.com/cinder-brent/Leaf

processing in php vs c++

I need to design a function which uses a hashtable. It basically inserts data into the hashtable and searches for items. Typically the function will take 15 seconds to 10 minutes to execute. Should I implement this function in C++ and invoke it from PHP via a system call, or should I implement it in PHP using associative arrays? Which would be more efficient? What are the advantages and disadvantages of each?
The key will be a string. The value will be a structure which contains two other structures: the first contains an array of integers, and the second an array of integer pairs.
Apparently, PHP arrays are implemented as a linked hash table. See How is the PHP array implemented on the C level?.
In any case, for 300 items there would probably be little speed difference in the type of container you used. I would stay in PHP if possible for simplicity.
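Staying in PHP, the layout from the question maps directly onto nested associative arrays (the key and field names here are hypothetical):

```php
$table = [];

// insert: string key -> structure holding an int array and an array of int pairs
$table['some-key'] = [
    'ints'  => [1, 2, 3],
    'pairs' => [[1, 2], [3, 4]],
];

// search: isset() is an O(1) hash lookup on average
if (isset($table['some-key'])) {
    $entry = $table['some-key'];
}
```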
PHP is well known for its fast associative array implementation, but in my experience C++ is still faster. A few months ago I needed to implement fast prefix matching; there were thousands of prefixes in a hash table and millions of strings to be matched. I made both PHP and C++ implementations, and as I remember, C++ was more than 10 times faster and consumed much less memory. But of course, it also depends heavily on your algorithm, not only on the hash table implementation.

What is the best strategy to compare two Paragraphs in PHP & MySQL?

I have already developed typing software to capture text typed by candidates in my institutes, using PHP & MySQL. In the continuation process, I am stuck with a strategic issue: how should I compare the similarity of the texts typed by the candidates with the standard paragraph which I had given them to type (in the form of a hard copy, though the same copy is also stored in the MySQL database)? My dilemma is whether I should use the Levenshtein distance algorithm in PHP or directly in MySQL, so that performance is optimized. Actually, I am afraid that programming it in PHP might turn out erroneous while evaluating the texts. It is worthwhile to mention here that the texts would be compared to rank candidates on the basis of words typed per minute.
The simplest solution would be to utilize PHP's built-in levenshtein() function to compare the two blocks of text. If you wanted to push the processing off to the MySQL database, you could implement the solution described in Levenshtein: MySQL + PHP on Stack Overflow.
Another PHP option might be the similar_text() function.
The unfortunate drawback of the PHP levenshtein() function is that it cannot handle strings longer than 255 characters. As per the PHP manual:
This function returns the Levenshtein-Distance between the two
argument strings or -1, if one of the argument strings is longer than
the limit of 255 characters.
So, if your paragraphs are longer than that, you may be forced to implement a MySQL solution. I suppose you could break the paragraphs up into 255-character blocks for comparison (though I can't say definitively that this won't "break" the Levenshtein algorithm).
I'm not an expert in linguistics parsing and processing, so I can't speak to whether these are the best solutions (as you mention in your question). They are, however, very straightforward and simple to implement.
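A minimal sketch of both built-ins (the sample strings are made up; remember the 255-character limit mentioned above):

```php
$reference = "The quick brown fox jumps over the lazy dog.";
$typed     = "The quick brown fox jumped over the lazy dog";

// edit distance: one substitution, one insertion, one deletion
echo levenshtein($reference, $typed), "\n";   // 3

// percentage similarity, an alternative metric
similar_text($reference, $typed, $percent);
printf("%.1f%%\n", $percent);
```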

Memcached scaling: key "grouping"

As it is best practice to group related keys that are frequently retrieved together (using multiGet) on a single server for optimum performance, I have a couple questions regarding the implicit mechanics employed by the client functions built for doing this.
I have seen two different approaches for serving what I assume is the same purpose using libmemcached (php-memcached specifically). The first and most obvious approach is to use getByKey/setByKey to map keys to servers, and the second is to use the option OPT_PREFIX_KEY (there is a simple example posted in the PHP documentation under Memcached::__construct), which according to the documentation is "used to create a 'domain' for your item keys". The caveat of the second approach is that it can only be set on a per-instance basis, which may or may not be a good thing.
So unless I am completely mistaken and these two approaches don't actually serve the same purpose: is there any clear benefit to going with one approach over the other?
And while I'm on this topic, my other question would be: what are the implications, if any, of mapping keys to servers in a consistent-hashing scenario? I'm assuming that if a node were to fail, the freeform key would simply be remapped to a new server without any issue.
Thanks!
If these keys are really almost always retrieved together you probably want to cache them together in a single key/value pair, for example by sorting and concatenating keys and storing values serialized as a dictionary in JSON or similar format.
Returning to your question:
OPT_PREFIX_KEY has almost nothing to do with grouping values by key; it just prefixes all keys used by that particular client, so "1" becomes "foo1" and is distributed by consistent hashing using this new value, without any grouping by "foo".
getByKey/setByKey is the closest thing to what you want, since it can pass different keys to libketama (used to choose the server) and to the memcached server. If you specify the same first key and different second keys, they will end up on the same memcached server, but won't overwrite each other.
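A sketch with php-memcached (the server addresses, the server key 'user42', and the item keys are hypothetical; all items sharing the server key land on the same server):

```php
$mc = new Memcached();
$mc->addServers([
    ['cache1.example.com', 11211],
    ['cache2.example.com', 11211],
]);

// 'user42' is hashed to pick the server; 'profile'/'settings' are the item keys
$mc->setByKey('user42', 'profile',  ['name' => 'Ada']);
$mc->setByKey('user42', 'settings', ['theme' => 'dark']);

// one multi-get, guaranteed to hit a single server
$items = $mc->getMultiByKey('user42', ['profile', 'settings']);
```

Note this requires the pecl memcached extension and live servers, so it is a structural sketch rather than something to paste verbatim.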
Premature optimization is the root of all evil

Why would I ever use a DoublyLinkedList in PHP?

I've recently come across some of the PHP SPL data structures, and I've been looking over the first one, the doubly linked list. I have a rough idea what a linked list is, and now I can see what a doubly linked list is, but my question is: what in the world would I do with this?
It seems like it would be just as easy to use an array. Can some computer-science type enlighten me?
Unlike a singly-linked list, a doubly linked list can be walked in either direction, and supports insertion and deletion in the middle of the list in O(1) (provided you already have access to the spot in the list where it's going to happen, unlike with a singly linked list). That said, doubly linked lists are inferior in other ways and are definitely not something you'll come across that often in practice.
Choosing an appropriate data structure is not necessarily about what is easy for you, but what uses less memory and is faster for the machine. In the case of a doubly linked list, it would be useful whenever you need to iterate in either direction, insert anywhere in constant speed, but don't need random access.
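A minimal SplDoublyLinkedList sketch showing iteration in both directions:

```php
$list = new SplDoublyLinkedList();
$list->push('a');
$list->push('b');
$list->push('c');

// walk front to back
$list->setIteratorMode(SplDoublyLinkedList::IT_MODE_FIFO);
foreach ($list as $v) { echo $v; }   // abc
echo "\n";

// walk back to front
$list->setIteratorMode(SplDoublyLinkedList::IT_MODE_LIFO);
foreach ($list as $v) { echo $v; }   // cba
echo "\n";
```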
Now given that in PHP you are usually working with small datasets, you don't have to worry very much about that sort of thing. And if you are working with large datasets, you may be better off writing the code in C. So it's unlikely that you'll ever benefit enough from such structures in PHP to ever need to use them.
But there could be an "in between" area where using one of the SPL data structures lowers memory usage sufficiently to be worth it. (I did a simple test: 1M integers in an array took 200MB, while the doubly linked list took 150MB. The time to iterate over them was very comparable.)
IMHO, the chances of coming across something like this in the wild are unlikely, unless you're working for a company like Google or Facebook, where they're dealing with insane amounts of data and have a need to optimize list traversal to allow for node removal and addition. As a rule of thumb, if your application is that slow, you're most likely doing something wrong elsewhere (I know that's not your question, but I thought I'd just throw that in ;)).
For small to medium sized sites with small to medium sized data requirements, I'd say that an array would suffice (not to mention being more readable and understandable to the average web developer ;)).
