I have a group of simple formulas for calculating some values.
I need to implement them in a web interface (I'm writing it in PHP).
To store the formulas I am using a simple format like this: "X1+X2+X3". When I need to do the calculations, I call preg_replace in a loop to replace X1, X2, ... with real data (entered by the user or saved earlier; it doesn't matter which).
Then I use eval('$calculation_result ='. $trans_formula .';'), where $trans_formula holds the text of the formula with the data substituted in.
This mechanism looks primitive, and I have a feeling that I'm trying to reinvent the wheel. Are there ready-made algorithms, techniques or methods for this? It doesn't have to be PHP code; I'd appreciate even a simple description of an algorithm.
The first thought that hit me: eval is bad!
How I would approach this problem:
1. I would store the formulae in postfix (Reverse Polish Notation).
2. Then I'd write a simple program to evaluate the expression. It's fairly easy to write a postfix evaluator.
This approach will also allow you to check things like value data types and range constraints, if need be. It also eliminates the huge risk of eval.
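A minimal sketch of such an evaluator in PHP, assuming the postfix formula is stored with tokens separated by single spaces (e.g. "X1 X2 + X3 +") and that only numbers, variable names and the four basic operators occur. Variables are looked up in an array instead of being substituted into the string, so nothing is ever executed as code. The function name and token format are assumptions for illustration:

```php
<?php
// Postfix (Reverse Polish Notation) evaluator sketch. Assumes tokens are
// separated by single spaces; anything that is not a number, a known
// variable or one of + - * / is rejected.
function evaluate_postfix(string $expr, array $vars = []): float
{
    $stack = [];
    foreach (explode(' ', trim($expr)) as $token) {
        if (is_numeric($token)) {
            $stack[] = (float) $token;            // literal operand
            continue;
        }
        if (array_key_exists($token, $vars)) {
            $stack[] = (float) $vars[$token];     // variable operand, e.g. X1
            continue;
        }
        if (count($stack) < 2) {
            throw new InvalidArgumentException("Not enough operands for '$token'");
        }
        $b = array_pop($stack);                   // right operand is on top
        $a = array_pop($stack);
        switch ($token) {
            case '+': $stack[] = $a + $b; break;
            case '-': $stack[] = $a - $b; break;
            case '*': $stack[] = $a * $b; break;
            case '/':
                if ($b == 0) {
                    throw new DivisionByZeroError('Division by zero');
                }
                $stack[] = $a / $b;
                break;
            default:
                throw new InvalidArgumentException("Unknown token '$token'");
        }
    }
    if (count($stack) !== 1) {
        throw new InvalidArgumentException('Malformed expression');
    }
    return $stack[0];
}
```

For example, evaluate_postfix("X1 X2 + X3 +", ['X1' => 1, 'X2' => 2, 'X3' => 3]) returns 6.0, and a malformed formula raises an exception rather than running anything.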
Cheers!
EDIT in response to your comment to the question:
If your users will be entering their own expressions, you will want to convert them to postfix too. Check out infix to postfix conversion.
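The usual conversion is Dijkstra's shunting-yard algorithm. Below is a sketch in PHP, assuming only + - * /, parentheses, numbers and identifiers such as X1 (no unary minus and no functions), that outputs space-separated postfix tokens. The function name is hypothetical:

```php
<?php
// Shunting-yard sketch: converts an infix formula such as "X1+X2*(X3-2)"
// into space-separated postfix tokens. All four operators are treated as
// left-associative; unary minus and functions are not handled.
function infix_to_postfix(string $infix): string
{
    $precedence = ['+' => 1, '-' => 1, '*' => 2, '/' => 2];
    preg_match_all('/\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*\/()]|\S/', $infix, $m);
    $output = [];
    $ops = [];
    foreach ($m[0] as $token) {
        if (is_numeric($token) || preg_match('/^[A-Za-z_]\w*$/', $token)) {
            $output[] = $token;                       // operands go straight to output
        } elseif (isset($precedence[$token])) {
            // Pop operators of higher or equal precedence before pushing this one.
            while ($ops && end($ops) !== '(' && $precedence[end($ops)] >= $precedence[$token]) {
                $output[] = array_pop($ops);
            }
            $ops[] = $token;
        } elseif ($token === '(') {
            $ops[] = $token;
        } elseif ($token === ')') {
            while ($ops && end($ops) !== '(') {
                $output[] = array_pop($ops);
            }
            if (!$ops) {
                throw new InvalidArgumentException('Mismatched parenthesis');
            }
            array_pop($ops);                           // discard the matching '('
        } else {
            throw new InvalidArgumentException("Unexpected token '$token'");
        }
    }
    while ($ops) {
        $op = array_pop($ops);
        if ($op === '(') {
            throw new InvalidArgumentException('Mismatched parenthesis');
        }
        $output[] = $op;
    }
    return implode(' ', $output);
}
```

With these assumptions, infix_to_postfix("X1+X2*(X3-2)") gives "X1 X2 X3 2 - * +", which can then be stored and evaluated with a postfix evaluator.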
Take a look at the evalMath class on PHPClasses.
If the formulas are predetermined, as I assume from reading your question, using eval to evaluate them is not useful (worse, it is dangerous).
Create simple functions and call them, passing the appropriate parameters (after checking the input).
For example, your formula would become:
<?php
function sumOfThree($x1, $x2, $x3) {
    return $x1 + $x2 + $x3;
}
// and you can call it as usual:
$calculation_result = sumOfThree($first, $second, $third);
?>
You will gain a lot in:
speed: eval is very slow to execute (even for simple functions);
debugging: you can debug your function (and get correct error messages);
security: eval is easily exploitable.
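If there are many predefined formulas, one way to keep this approach manageable is a lookup table of closures, so that a stored formula name selects code you wrote yourself. This is a sketch; the formula names, the run_formula() helper and the is_numeric() check are hypothetical stand-ins for whatever validation your application needs:

```php
<?php
// Predefined formulas as closures keyed by name, instead of strings for eval().
$formulas = [
    'sumOfThree' => function ($x1, $x2, $x3) { return $x1 + $x2 + $x3; },
    'ratio'      => function ($a, $b) { return $b == 0 ? null : $a / $b; },
];

// Look up a formula by name, validate the inputs, then call it.
function run_formula(array $formulas, string $name, array $args)
{
    if (!isset($formulas[$name])) {
        throw new InvalidArgumentException("Unknown formula '$name'");
    }
    foreach ($args as $arg) {
        if (!is_numeric($arg)) {                  // input checking before the call
            throw new InvalidArgumentException('Non-numeric argument');
        }
    }
    // Convert numeric strings (e.g. from $_POST) to numbers; array_values()
    // avoids string keys being treated as named arguments in PHP 8.
    $numbers = array_map(function ($v) { return $v + 0; }, array_values($args));
    return call_user_func_array($formulas[$name], $numbers);
}
```

For example, run_formula($formulas, 'sumOfThree', ['1', '2', '3']) returns 6, while an unknown name or a non-numeric argument is rejected before any formula runs.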
I'm writing a universal system that will hopefully one day apply to medicine, etc. (i.e. it's "scientific").
I figure the best way to go about this is to represent all data in PHP as strings (true would be "true", false would be "false", and so on). The reason for this is that there is exactly one string representation of any value in PHP (e.g. the PHP code itself).
I am posting this question in an attempt to accelerate the design process of this program.
Some values are easily translated to string: numbers, booleans, etc.
Some are not: objects, arrays, resources.
I figured the format for transmitting objects and arrays is basically JSON, but I'm not sure it's a tight fit. It's better than what I currently have (nothing), but at some point I would like to refine it.
Any ideas?
I'm writing a universal system
This is an ambitious goal indeed; so ambitious as to be foolish to attempt.
Now, probably you don't really mean "can do absolutely anything for anyone", but it's relevant to your question that you don't place any limits on what you're trying to represent. That's making your search for a serialization format unnecessarily difficult.
For instance, you mention resources, which PHP uses for things like database connections, open file handles, etc. They are transient pointers to something that exists briefly and then is gone, and serializing them is not only unsupported by PHP, it's close to meaningless.
Instead of trying to cover "everything", you need to think about what types of data you actually need to handle. Maybe you'll mostly be working with classes defined within the system, so you can define whatever format for those you want. Maybe you want to work with arbitrary bags of key-value pairs, in the form of PHP arrays. You might want to leave the way open for future expansion, but that's about flexibility in the format, not having a specific answer right now.
From there, you can look for what properties you want, and shop around:
JSON is a hugely popular "lowest common denominator" format. Its main downside is that it has no representation of specific custom types; everything has to be composed of lists and key-value pairs (I like to say "JSON has no class").
XML is less fashionable than it used to be, but very powerful for defining custom languages and types. It's quite verbose, but compresses well: a lot of modern file formats are actually zip archives containing compressed XML files.
PHP's serialization format is really only intended for short-term, in-application purposes, like cache data. It's fairly concise and closely tied to PHP's type system, but it has security problems if users have influence over the data, as noted on the unserialize manual page.
There are even more concise formats that don't limit themselves to human-readable representations, if that was a relevant factor for you.
Obviously, the list is endless...
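To make the "JSON has no class" point concrete, here is a small round trip. The Reading class is made up for illustration. The allowed_classes option of unserialize() (available since PHP 7.0) limits which classes may be instantiated, which addresses part of the security concern about the PHP serialization format:

```php
<?php
// A made-up value class for the illustration.
class Reading
{
    public $unit = 'mmHg';
    public $value = 120;
}

$json = json_encode(new Reading());        // {"unit":"mmHg","value":120}
$fromJson = json_decode($json);            // comes back as a plain stdClass

$ser = serialize(new Reading());
// Only allow the classes we expect to be instantiated from serialized data.
$fromSer = unserialize($ser, ['allowed_classes' => [Reading::class]]);

echo get_class($fromJson), "\n";           // stdClass
echo get_class($fromSer), "\n";            // Reading
```

The JSON payload keeps the data but not the type; restoring a Reading from it would need extra information that you define yourself.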
I've programmed a solution to this problem. It's a simple class that converts a string to int | float | bool | null | string. The idea is that any value that is not a "relativistic" value (e.g. an array, something that simply holds other values) is represented by a single string. The implications are broad; I'll do my best to simplify.
Imagine you're making a website, which is basically (and in fact) made of webpages. If a webpage consists of inputs (typically GET and POST form data), and those inputs are strings (GET and POST elements are strings), all that stands between us and raw PHP is the interpretation of said strings.
Or think of it this way: if you model the total potential of PHP in strings, it may not ultimately be how you do things, but it works, right here, right now. What THAT means is that we can implement it immediately.
The rest of it is left blank, as that is what I mean by "relativistic".
Now, just to cap it all off: if you think about what this implies for the form of the actual PHP code itself, everything reaches a point at which there is exactly one string per "non-relativistic" value.
So basically what you have is a bunch of PHP. The idea is designed to be semantically AND syntactically as simple and functional as possible (or, at least, completely factorialized). So basically we have one way to represent any potential data in PHP.
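The linked project isn't reproduced here. As a rough, hypothetical illustration of the core idea (one canonical string per scalar value), a parser for the int | float | bool | null | string case might look like the following; the accepted spellings ("true", "null", numbers without leading zeros) are assumptions:

```php
<?php
// Hypothetical sketch, not the code from the linked repository: map one
// canonical string to a PHP scalar, leaving anything unrecognised as a string.
function parse_scalar(string $s)
{
    if ($s === 'true')  { return true; }
    if ($s === 'false') { return false; }
    if ($s === 'null')  { return null; }
    // Canonical integers only: no leading zeros, no "+", no whitespace.
    if (preg_match('/^-?(0|[1-9]\d*)$/', $s)) {
        $int = (int) $s;
        // Only accept it as an int if it round-trips (i.e. did not overflow).
        if ((string) $int === $s) {
            return $int;
        }
    }
    // Decimal or exponent notation, e.g. "0.5" or "-1.2e3".
    if (preg_match('/^-?(0|[1-9]\d*)(\.\d+)?([eE][-+]?\d+)?$/', $s)) {
        return (float) $s;
    }
    return $s;
}
```

A string like "007" stays a string under these rules, because it is not the canonical spelling of any number.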
Anyways, you can find it here: https://github.com/cinder-brent/Cinder
Cheers:)
-- edit --
Lo and behold, I moved the project. It is now at https://github.com/cinder-brent/Leaf
I have the following problem. I find it difficult to explain, as I am a hobby coder, so please forgive me if I commit any major faux pas:
I am working on a baseball database that deals with baseball-specific and non-specific metrics/stats. Most of the data output is pretty simple when the metrics are cumulative, or when I only want to display the dataset of one day that has been entered manually or imported from a CSV. (All percentages are calculated and fed into the DB as numbers.)
For example, if I put in the stats of
{ Hits:1,
Walks:2,
AB:2,
Bavg:0.500 }
for day one, and
{ Hits:2,
Walks:2,
AB:6,
Bavg:0.333 }
for day two, and then try to get the totals, Hits, Walks and AB are simple: SUM. But Bavg has to be a formula (Hits/AB). Other (non-baseball-specific) metrics, like vertical jump or 60-yard times, are pretty straightforward too: MAX or MIN.
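To illustrate why Bavg can't simply be summed or averaged: a calculated metric has to be recomputed from the totals of its inputs. A sketch in PHP, where the $metrics definitions and the closure standing in for the stored Hits/AB formula are assumptions for illustration:

```php
<?php
// Daily raw stats from the example (the daily Bavg values are derived, so omitted).
$days = [
    ['Hits' => 1, 'Walks' => 2, 'AB' => 2],
    ['Hits' => 2, 'Walks' => 2, 'AB' => 6],
];
// Hypothetical metric definitions: type plus, for calculated metrics, a formula.
$metrics = [
    'Hits'  => ['type' => 'cumulative'],
    'Walks' => ['type' => 'cumulative'],
    'AB'    => ['type' => 'cumulative'],
    'Bavg'  => ['type' => 'calculated',
                'calc' => function (array $t) { return $t['AB'] > 0 ? $t['Hits'] / $t['AB'] : null; }],
];

$totals = [];
// First pass: cumulative/max/min metrics aggregate the daily values directly.
foreach ($metrics as $name => $def) {
    $column = array_column($days, $name);
    switch ($def['type']) {
        case 'cumulative': $totals[$name] = array_sum($column); break;
        case 'max':        $totals[$name] = max($column); break;
        case 'min':        $totals[$name] = min($column); break;
    }
}
// Second pass: calculated metrics are recomputed from the totals.
foreach ($metrics as $name => $def) {
    if ($def['type'] === 'calculated') {
        $totals[$name] = $def['calc']($totals);
    }
}
printf("Bavg = %.3f\n", $totals['Bavg']);   // Bavg = 0.375
```

With the two days above this gives Bavg = 3/8 = 0.375, whereas averaging the daily values would give about 0.417.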
The user should be able to add his own metrics. So he has to be able to input a formula for the calculation. This calculation is stored in the database table as a column next to the metric name, id, type (cumulative, max, min, calculated).
The PHP script that produces the HTML table is set up to be dynamic with respect to whatever metrics, and however many metrics, the query sends (the metrics can be part of several categories).
In the end result, I want to replace the values of all metrics of the calculated type with the result of their formula.
My approach is to get the formula from the MySQL table as a string. Then, in PHP, convert a string like Strikes/Pitches*100 into $strikes/$pitches*100, assuming that is something I could put into a PHP/SQL query. However, before it is put into the $strikes/$pitches*100 format, I need to have those variables available to define them. That I'm sure I can do, but I'll cross that bridge when I get there.
Could you point me in the right direction of either how to accomplish that or tell where or what to search for? I'm sure this has been done before somewhere...
I highly appreciate any help!
Clemens
The correct solution has already been given by Vilx-. So I will give you a not-so-correct, dirty solution.
As the correct solution states, eval is evil. But, it is also easy and powerful (as evil often is -- but I'll spare you my "hhh join the Dark Side, Luke hhh" spiel).
And since what you need to do is a very small and simple subset of SQL, you actually can use eval() - or even better its SQL equivalent, plugging user supplied code into a SQL query - as long as you do it safely; and with small requirements, this is possible.
(In the general case it absolutely is not. So keep it in mind - this solution is easy, quick, but does not scale. If the program grows beyond a certain complexity, you'll have to adopt Vilx-'s solution anyway).
You can verify the user-supplied string to ensure that, while it might not be syntactically or logically correct, at least it won't execute arbitrary code.
This is okay:
SELECT SUM(pitch)+AVG(runs)-5*(MIN(balls)) /* or whatever */
and this, while wrong, is harmless too:
SELECT SUM(pitch +
but this absolutely is not (mandatory XKCD reference):
SELECT "Robert'); DROP TABLE Students;--
and this is even worse, since the above would not work on a standard MySQL (that doesn't allow multiple statements by default), while this would:
SELECT SLEEP(3600)
So how do we tell the harmless from the harmful? We start by defining placeholder variables that you can use in your formula. Let us say they will always be in the form {name}. So, get those - which we know to be safe - out of the formula:
$verify = preg_replace('#{[a-z]+}#', '', $formula);
Then, arithmetic operators are also removed; they are safe too.
$verify = preg_replace('#[+*/-]#', '', $verify);
Then numbers and things that look like numbers:
$verify = preg_replace('#[0-9.]+#', '', $verify);
Finally, remove a certain set of functions you trust. Their arguments may have been variables or combinations of variables, which have already been deleted, so each function now has no arguments (say, SUM()), or you may have nested functions or nested parentheses, like SUM (SUM( ()())).
You keep replacing () (with any spaces inside) with a single space until the replacement no longer finds anything:
for ($old = ''; $old !== $verify; $verify = preg_replace('#\\s*\\(\\s*\\)\\s*#', ' ', $verify)) {
$old = $verify;
}
Now you remove from the result the occurrences of any function you trust, as an entire word:
for ($old = ''; $old !== $verify; $verify = preg_replace('#\\b(SUM|AVG|MIN|MAX)\\b#', ' ', $verify)) {
$old = $verify;
}
The last two steps have to be merged because you might have both nested parentheses and functions, interfering with one another:
for ($old = ''; $old !== $verify; $verify = preg_replace('#\\s*(\\b(SUM|AVG|MIN|MAX)\\b|\\(\\s*\\))\\s*#', ' ', $verify)) {
$old = $verify;
}
And at this point, if you are left with nothing (apart from the spaces the replacements inserted), it means the original string was harmless (at worst it could have triggered a division by 0, or a SQL exception if it was syntactically wrong). If instead you're left with anything else, the formula is rejected and never saved in the database.
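Collected into one function, the checks above might look like the following sketch. It uses the same {name} placeholder syntax and function whitelist as the steps above (with the braces in the first pattern escaped for clarity) and compares against an empty string only after trimming, since the replacements leave spaces behind:

```php
<?php
// Returns true if, after stripping everything known to be harmless,
// nothing but whitespace remains.
function is_safe_formula(string $formula): bool
{
    $verify = preg_replace('#\{[a-z]+\}#', '', $formula);   // {name} placeholders
    $verify = preg_replace('#[+*/-]#', '', $verify);        // arithmetic operators
    $verify = preg_replace('#[0-9.]+#', '', $verify);       // numbers
    // Repeatedly strip empty "()" pairs and whitelisted function names
    // until nothing changes (handles nesting in any order).
    do {
        $old = $verify;
        $verify = preg_replace('#\s*(\b(SUM|AVG|MIN|MAX)\b|\(\s*\))\s*#', ' ', $verify);
    } while ($verify !== $old);
    return trim($verify) === '';
}
```

With this, 'SUM({pitch})+AVG({runs})-5*(MIN({balls}))' is accepted, while 'SLEEP(3600)' leaves "SLEEP" behind and is rejected.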
When you have a valid formula, you can replace variables using preg_replace_callback() so that they become numbers (or names of columns). You're left with what is either valid, innocuous SQL code, or incorrect SQL code. You can plug this directly into the query, after wrapping it in try/catch to intercept any PDOException or division by zero.
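A sketch of that substitution step using preg_replace_callback(). Here each {name} placeholder becomes a backtick-quoted column name taken from a whitelist; the $columns map, the table and the query shown in the comment are hypothetical:

```php
<?php
// Replace {name} placeholders with whitelisted, backtick-quoted MySQL column
// names; unknown placeholders are rejected instead of passed through.
function formula_to_sql(string $formula, array $columns): string
{
    return preg_replace_callback('#\{([a-z]+)\}#', function (array $m) use ($columns) {
        if (!isset($columns[$m[1]])) {
            throw new InvalidArgumentException("Unknown metric '{$m[1]}'");
        }
        return '`' . $columns[$m[1]] . '`';
    }, $formula);
}

$columns = ['strikes' => 'strikes', 'pitches' => 'pitches'];
$expr = formula_to_sql('SUM({strikes})/SUM({pitches})*100', $columns);
echo $expr, "\n";   // SUM(`strikes`)/SUM(`pitches`)*100

// The expression could then be used in a query, e.g. with PDO:
// try {
//     $stmt = $pdo->query("SELECT $expr AS value FROM stats WHERE player_id = 7");
// } catch (PDOException $e) {
//     // report the formula as invalid
// }
```

Mapping placeholders through a whitelist means user text never becomes a column name directly, even if the validation step were skipped.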
I'll assume that the requirement is indeed to allow the user to enter arbitrary formulas. As noted in the comments, this is indeed no small task, so if you can settle for something less, I'd advise doing so. But, assuming that nothing less will do, let's see what can be done.
The simplest idea is, of course, to use PHP's eval() function. You can execute arbitrary PHP code from a string. All you need to do is to set up all necessary variables beforehand and grab the return value.
It does have drawbacks however. The biggest one is security. You're essentially executing arbitrary user-supplied code on your server. And it can do ANYTHING that your own code can. Access files, use your database connections, change variables, whatever. Unless you completely trust your users, this is a security disaster.
Also syntax or runtime errors can throw off the rest of your script. And eval() is pretty slow too, since it has to parse the code every time. Maybe not a big deal in your particular case, but worth keeping an eye on.
All in all, in every language that has an eval() function, it is almost universally considered evil and to be avoided at all costs.
So what's the alternative? Well, a dedicated formula parser/executor would be nice. I've written one a few times, but it's far from trivial. The job is easier if the formula is written in Polish notation or Reverse Polish Notation, but those are a pain to write unless you've practiced. For normal formulas, take a look at the Shunting Yard Algorithm. It's straightforward enough and can be easily adapted to functions and whatnot. But it's still fairly tedious.
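For comparison with the Shunting Yard Algorithm, another common way to write such a parser is recursive descent, with one method per precedence level. The sketch below uses that technique, not one of the libraries mentioned; the grammar (numbers, variable names, + - * /, parentheses, unary minus) and the class name are assumptions:

```php
<?php
// Recursive-descent evaluator sketch. Grammar:
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := number | name | '(' expr ')' | '-' factor
final class FormulaEvaluator
{
    private $tokens = [];
    private $pos = 0;
    private $vars = [];

    public function evaluate(string $formula, array $vars): float
    {
        preg_match_all('/\s*(\d+(?:\.\d+)?|[A-Za-z_]\w*|\S)/', $formula, $m);
        $this->tokens = $m[1];
        $this->pos = 0;
        $this->vars = $vars;
        $value = $this->expr();
        if ($this->pos < count($this->tokens)) {
            throw new InvalidArgumentException('Unexpected ' . $this->tokens[$this->pos]);
        }
        return $value;
    }

    private function peek()
    {
        return $this->tokens[$this->pos] ?? null;
    }

    private function expr(): float
    {
        $value = $this->term();
        while (in_array($this->peek(), ['+', '-'], true)) {
            $op = $this->tokens[$this->pos++];
            $rhs = $this->term();
            $value = $op === '+' ? $value + $rhs : $value - $rhs;
        }
        return $value;
    }

    private function term(): float
    {
        $value = $this->factor();
        while (in_array($this->peek(), ['*', '/'], true)) {
            $op = $this->tokens[$this->pos++];
            $rhs = $this->factor();
            if ($op === '/' && $rhs == 0) {
                throw new DivisionByZeroError('Division by zero');
            }
            $value = $op === '*' ? $value * $rhs : $value / $rhs;
        }
        return $value;
    }

    private function factor(): float
    {
        $token = $this->peek();
        if ($token === null) {
            throw new InvalidArgumentException('Unexpected end of formula');
        }
        $this->pos++;
        if ($token === '-') {
            return -$this->factor();                   // unary minus
        }
        if ($token === '(') {
            $value = $this->expr();
            if ($this->peek() !== ')') {
                throw new InvalidArgumentException('Missing )');
            }
            $this->pos++;
            return $value;
        }
        if (is_numeric($token)) {
            return (float) $token;
        }
        if (preg_match('/^[A-Za-z_]\w*$/', $token)) {
            if (!array_key_exists($token, $this->vars)) {
                throw new InvalidArgumentException("Unknown variable $token");
            }
            return (float) $this->vars[$token];
        }
        throw new InvalidArgumentException("Unexpected $token");
    }
}
```

For example, (new FormulaEvaluator())->evaluate('Strikes/Pitches*100', ['Strikes' => 60, 'Pitches' => 80]) returns 75.0. Unknown variables or stray characters raise an exception instead of being executed.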
So, unless you want to do it as a fun challenge, look for a library that has already done it. There seem to be a bunch of them out there. Search for something along the lines of "arithmetic expression parser library php".
I have a group of objects I need to serialize, but the class names are long, for example:
"\Namespace1\Subnamespace\dataobjectA"
"\Namespace1\Subnamespace\dataobjectB"
"\Namespace1\Subnamespace\dataobjectC"
"\Namespace1\Subnamespace\dataobjectD"
using the serialize function on the objects, I get:
"O:41:\"Namespace1\Subnamespace\dataobjectC\":1:{s:4:"data";s:9:"some data";}"
the serialization string contains the full class name which is sometimes bigger than the data :)
I'm already familiar with the __sleep() and __wakeup() functions; they're not useful here.
I understand that some kind of lookup table is required.
My question is:
Is there a simple PHP way to shorten the class name in serialization?
Any suggestions are welcome.
If you are worried about the length of your data you may compress it with some good compression function.
This is a working example:
class Tester {
public $name;
public $age;
}
$a = new Tester();
$a->name = "Harald the old Capttttttttttttain is going to live very long.";
$a->age = 999999999;
$ser = serialize($a);
var_dump($ser);
$comp = gzcompress($ser,9);
var_dump($comp);
Result:
string(133) "O:6:"Tester":2:{s:4:"name";s:75:"Harald the old Capttttttttttttain is going to live very long.";s:3:"age";i:999999999;}"
string(108) "x..." (binary zlib data, not printable)
Of course the latter is not human readable anymore and will be useless in database searches but it is way shorter.
PHP has other compression functions too; bzcompress (http://php.net/manual/en/function.bzcompress.php) may do better than gzcompress.
I have a good answer, a bad answer, and then an answer that addresses your question.
Good Answer
If I can take this somewhere else entirely: you probably don't really want to do this at all. You mention that the class names are sometimes longer than the actual data. If that is the case, then overall you have almost no data in your serialization. Unless you have some ridiculously long namespaces/class names (in which case you might want to reconsider your application structure), I imagine your serialized strings will very easily fit into, for instance, a MySQL text field. The point is, if you only have a little bit of data, I really doubt it is worth the effort to muck with a standard format to trim off what amounts to less than a kilobyte of data. Any reasonable database and server will be able to handle these things without trouble, even if you have millions and millions of such records. So unless this is some kind of low-memory embedded device, I would be curious to hear why you think you need to do this (of course that is rhetorical: I seriously doubt you are running PHP on an embedded device).
If you do try to do something like this, you're going to add code that you are going to have to maintain that everyone will look at after you and say "what in the world is going on here?". It does depend on your need, but I'm suspicious that you are more likely to introduce problems via the code that will make this feature happen, than you will by simply letting your serialized data be long.
Bad Answer
I really don't think you want to make any changes to the serialized data. To answer one of your questions directly: no, there is no way to shorten the namespace and still use the unserialize() method of PHP, except by ditching namespaces altogether in your application. I really doubt you want to do that.
Your other option is to manually adjust the serialized string yourself. You could then store this "modified serialized format" (let's call it modSerialized). Then, when you need to unserialize, you have to reverse your modSerialized transformation, after which you can run unserialize() normally. The trouble with this is that the output of PHP's serialize method is a standard, well-established encoding. Modifying it is inherently error-prone and, by definition, goes against standard best practices. You can do this without errors if you are very careful and write lots of code, which, again, I don't think you want to do. For instance, you could imagine finding \Namespace1\Subnamespace\dataobjectA and replacing it with some short token, taking care that the token doesn't already occur somewhere in your string. You then have to remember which token you used and what it represents, so you can reverse it later. If you manage to do that successfully, then good news! You just reinvented the wheel and built an ad-hoc compression algorithm!
So really, you don't want to do that either. Or if you do, just take the answer @Blackbam gave and compress your data with a normal compression algorithm. That would be less weird.
Another option
Finally, if you don't like any of the above suggestions, there is one more: ditch PHP's serialize() altogether. Obviously it is not a good fit for your needs, so it is better to come up with a solution that does fit them than to try to modify a well-established standard to fit your problem. Going down that route will give you a chimera that doesn't work for anyone.
What would this look like? It depends on your problem. For instance, you could establish some abbreviations for the class names in your system. Then when it comes time to serialize you could make an array that contains the abbreviated class name, and a string representation of the objects data that needs to be persisted so that it can be rebuilt. Then find some encoding for that: it could just be JSON, or even php serialize, or some other format. Then, manually build your own unserialize() method to reconstruct the object from your own serialization representation. Basically, make your own serialize() and unserialize().
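A sketch of that idea, with a hand-maintained table of short codes and JSON as the encoding. The codes, the ShortSerializer name and the restriction to public, JSON-encodable properties are all assumptions:

```php
<?php
namespace Namespace1\Subnamespace;

class dataobjectA { public $data; }
class dataobjectB { public $data; }

// Custom serializer: a short code replaces the full class name.
final class ShortSerializer
{
    private const CODES = [
        'A' => dataobjectA::class,
        'B' => dataobjectB::class,
    ];

    public static function serialize(object $obj): string
    {
        $code = array_search(get_class($obj), self::CODES, true);
        if ($code === false) {
            throw new \InvalidArgumentException('No short code for ' . get_class($obj));
        }
        // Only public properties are visible to get_object_vars() from here.
        return json_encode(['c' => $code, 'd' => get_object_vars($obj)]);
    }

    public static function unserialize(string $str): object
    {
        $payload = json_decode($str, true);
        $code = is_array($payload) ? ($payload['c'] ?? null) : null;
        if (!is_string($code) || !array_key_exists($code, self::CODES)) {
            throw new \InvalidArgumentException('Unknown or malformed payload');
        }
        $class = self::CODES[$code];
        $obj = new $class();
        foreach ((array) ($payload['d'] ?? []) as $prop => $value) {
            $obj->$prop = $value;   // assumes public properties only
        }
        return $obj;
    }
}
```

For a dataobjectA holding "some data", this produces {"c":"A","d":{"data":"some data"}} instead of a string containing the full class name. Adding a class means adding a code to the table, and a code should never be reused once data has been stored with it.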
I have a couple of functions that manipulate data in an array. For example, to unset_data() you pass the array and an unlimited number of string arguments, like:
unset_data( $my_array, "firstname", "password" );
and it can handle multi-dimensional arrays etc.; quite simple.
But should this function take a reference to the array and change it directly?
Or should I return the new array with the values unset?
I can never decide whether a function should use a reference or not.
Are there specific cases or examples of when to use them and when not to?
Thanks
I'd ask myself what the expected use case of the function is. Does the typical use case involve keeping the original data intact and deriving new data from it, or is the explicit use case of this function to modify data in place?
Suppose md5 modified data in place; that would be pretty inconvenient, since I usually want to keep the original data intact. So I'd always have to do this:
$hash = $data;
md5($hash);
instead of:
$hash = md5($data);
That's pretty ugly code, forced on you by the API of the function.
For unset, though, I don't think the typical use case is deriving new data. If unset() returned the modified array instead, you'd have to write:
$arr = unset($arr['foo']);
That seems pretty clunky as well as possibly a performance hit.
Generally speaking, it's better to return by value instead of taking a reference because:
It's the most common usage pattern (there's one less thing to keep in mind about this particular function)
You can create call chains freely, e.g. you can write array_filter(unset_data(...))
Generally speaking, code without side effects (I'm calling the mutation of an argument in a manner visible to the caller a side effect) is easier to reason about
Most of the time, these advantages come at the cost of using up additional memory. Unless you have good reason (or better yet, proof) to believe that the additional memory consumption is going to be an issue, my advice is to just return the mutated value.
I feel that there is not a general you should/shouldn't answer to this question - it depends entirely on the usage case.
My personal feeling leans towards passing by reference, to keep its behaviour more in line with the native unset(), but if you are likely to end up regularly having to make copies of the array before you call the function, then go with a return value. Another advantage of the by-reference approach is that you can return some other information as well as modifying the array; for example, you could return an integer describing how many values were removed from the array based on the arguments.
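The two styles side by side, as a hypothetical sketch (top-level keys only, unlike the question's multi-dimensional version; variadic parameters need PHP 5.6+):

```php
<?php
// By value: the caller's array is untouched, and calls can be chained.
function unset_data(array $data, string ...$keys): array
{
    foreach ($keys as $key) {
        unset($data[$key]);
    }
    return $data;
}

// By reference: mirrors native unset() and frees the return value for
// something else, here the number of keys actually removed.
function unset_data_in_place(array &$data, string ...$keys): int
{
    $removed = 0;
    foreach ($keys as $key) {
        if (array_key_exists($key, $data)) {
            unset($data[$key]);
            $removed++;
        }
    }
    return $removed;
}

$user = ['firstname' => 'Ann', 'password' => 'secret', 'email' => 'ann@example.com'];
$public = unset_data($user, 'firstname', 'password');       // $user still has all 3 keys
$count = unset_data_in_place($user, 'password', 'missing'); // $user loses 'password'
```

Which one reads better at the call site is exactly the trade-off discussed above.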
I don't think there is a solid argument for "best practice" with either option here, so the short answer would be:
Do whatever you are most comfortable with and whatever allows you to write the most concise, readable and self-documenting code.
I would like to implement Singular Value Decomposition (SVD) in PHP. I know that there are several external libraries which could do this for me. But I have two questions concerning PHP, though:
1) Do you think it's possible and/or reasonable to code the SVD in PHP?
2) If (1) is yes: Can you help me to code it in PHP?
I've already coded some parts of SVD myself. Here's the code, with comments describing each step. Some parts of this code aren't completely correct.
It would be great if you could help me. Thank you very much in advance!
SVD-python is a very clear, parsimonious implementation of the SVD. It's practically pseudocode and should be fairly easy to understand and to compare against or draw on for your PHP implementation, even if you don't know much Python.
That said, as others have mentioned, I wouldn't expect to be able to do very heavy-duty LSA with a PHP implementation on what sounds like a pretty limited web host.
Cheers
Edit:
The module above doesn't do anything by itself, but there is an example included in the opening comments. Assuming you downloaded the Python module and it is accessible (e.g. in the same folder), you could implement a trivial example as follows:
#!/usr/bin/python
import svd
import math
a = [[22.,10., 2., 3., 7.],
[14., 7.,10., 0., 8.],
[-1.,13.,-1.,-11., 3.],
[-3.,-2.,13., -2., 4.],
[ 9., 8., 1., -2., 4.],
[ 9., 1.,-7., 5.,-1.],
[ 2.,-6., 6., 5., 1.],
[ 4., 5., 0., -2., 2.]]
u,w,vt = svd.svd(a)
print(w)
Here 'w' contains your list of singular values.
Of course this only gets you part of the way to latent semantic analysis and its relatives. You usually want to reduce the number of singular values, then employ some appropriate distance metric to measure the similarity between your documents, or words, or documents and words, etc. The cosine of the angle between your resultant vectors is pretty popular.
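In PHP, cosine similarity between two vectors (for instance, document or term vectors taken from the reduced SVD) is only a few lines. A sketch, assuming plain numeric arrays of equal length:

```php
<?php
// Cosine similarity: cos(theta) = (a . b) / (|a| |b|).
function cosine_similarity(array $a, array $b): float
{
    $a = array_values($a);
    $b = array_values($b);
    if (count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must have the same length');
    }
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $x) {
        $dot   += $x * $b[$i];
        $normA += $x * $x;
        $normB += $b[$i] * $b[$i];
    }
    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0;   // convention chosen here for zero vectors
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}
```

Values near 1 mean the vectors point in nearly the same direction; orthogonal vectors give 0.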
Latent Semantic Mapping (pdf) is by far the clearest, most concise and informative paper I've read on the remaining steps you need to work out following the SVD.
Edit2: also note that if you're working with very large term-document matrices (I'm assuming this is what you are doing), it is almost certainly going to be far more efficient to perform the decomposition offline and then perform only the comparisons live, in response to requests. While svd-python is great for learning, svdlibc is more what you would want for such heavy computation.
Finally, as mentioned in the Bellegarda paper above, remember that you don't have to recompute the SVD every single time you get a new document or request. Depending on what you are trying to do, you could probably get away with performing the SVD once every week or so, offline on a local machine, and then uploading the results (size/bandwidth concerns notwithstanding).
Anyway, good luck!
Be careful when you say "I don't care what the time limits are". SVD is an O(N^3) operation (or O(mn^2) for a rectangular m*n matrix with m >= n), which means that your problem could very easily take a very long time. If the 100*100 case takes one minute, the 1000*1000 case would take 10^3 minutes, or nearly 17 hours (and probably worse, realistically, as you're likely to be out of cache). With something like PHP, the prefactor (the constant multiplying N^3 in the running time) could be very, very large.
Having said that, of course it's possible to code it in PHP; the language has the required data structures and operations.
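As a small illustration of what such PHP code looks like, here is power iteration on A^T A, which estimates only the largest singular value, not a full SVD. It's a sketch: convergence slows when the top two singular values are close, and the all-ones start vector is an arbitrary choice that fails if A maps it to zero:

```php
<?php
// Power iteration on A^T A: estimates only the largest singular value of A.
// A is assumed to be a non-empty list of equal-length, 0-indexed numeric rows.
function largest_singular_value(array $a, int $iterations = 200): float
{
    $a = array_values($a);
    $n = count($a[0]);

    // u = A v
    $multiply = function (array $v) use ($a): array {
        $u = [];
        foreach ($a as $row) {
            $s = 0.0;
            foreach ($row as $j => $x) {
                $s += $x * $v[$j];
            }
            $u[] = $s;
        }
        return $u;
    };
    // Euclidean length of a vector
    $norm = function (array $x): float {
        $sum = 0.0;
        foreach ($x as $y) {
            $sum += $y * $y;
        }
        return sqrt($sum);
    };

    $v = array_fill(0, $n, 1.0);              // arbitrary start vector
    for ($k = 0; $k < $iterations; $k++) {
        $u = $multiply($v);
        $w = array_fill(0, $n, 0.0);          // w = A^T u = A^T A v
        foreach ($a as $i => $row) {
            foreach ($row as $j => $x) {
                $w[$j] += $x * $u[$i];
            }
        }
        $len = $norm($w);
        if ($len == 0.0) {
            throw new RuntimeException('A v = 0 for the start vector; try a different one');
        }
        foreach ($w as $j => $x) {
            $v[$j] = $x / $len;               // normalise to unit length
        }
    }
    // For a unit v close to the top right singular vector, |A v| ~ sigma_max.
    return $norm($multiply($v));
}
```

For example, for [[3, 0], [0, 1]] it returns approximately 3.0.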
I know this is an old question, but here are my two bits:
1) A true SVD is much slower than the calculus-inspired approximations used, eg, in the Netflix prize. See: http://www.sifter.org/~simon/journal/20061211.html
There's an implementation (in C) here:
http://www.timelydevelopment.com/demos/NetflixPrize.aspx
2) C would be faster but PHP can certainly do it.
PHP Architect author Cal Evans: "PHP is a web scripting language... [but] I’ve used PHP as a scripting language for writing the DOS equivalent of BATCH files or the Linux equivalent of shell scripts. I’ve found that most of what I need to do can be accomplished from within PHP. There is even a project to allow you to build desktop applications via PHP, the PHP-GTK project."
Regarding question 1: It definitely is possible. Whether it's reasonable depends on your scenario: How big are your matrices? How often do you intend to run the code? Is it run in a web site or from the command line?
If you do care about speed, I would suggest writing a simple extension that wraps calls to the GNU Scientific Library.
Yes, it's possible, but implementing SVD in PHP isn't the optimal approach. As you can see here, PHP is slower than C and also slower than C++, so it would probably be better to do it in one of these languages and call the program from PHP to get your results. You can find an implementation of the algorithm here, to guide you through it.
To call the external program, you can use:
The exec() Function
The system function is quite useful and powerful, but one of the biggest problems with it is that all resulting text from the program goes directly to the output stream. There will be situations where you might like to format the resulting text and display it in some different way, or not display it at all. exec() handles this case: instead of printing the output, it stores each line in the array passed as its second argument.
The system() Function
The system function in PHP takes a string argument with the command to execute as well as any arguments you wish passed to that command. This function executes the specified command, and dumps any resulting text to the output stream (either the HTTP output in a web server situation, or the console if you are running PHP as a command line tool). The return of this function is the last line of output from the program, if it emits text output.
The passthru() Function
One fascinating function that PHP provides similar to those we have seen so far is the passthru function. This function, like the others, executes the program you tell it to. However, it then proceeds to immediately send the raw output from this program to the output stream with which PHP is currently working (i.e. either HTTP in a web server scenario, or the shell in a command line version of PHP).
Yes, this is perfectly possible to implement in PHP.
I don't know what a reasonable execution time would be, or how large a matrix it could handle.
I would probably have to implement the algorithm to get a rough idea.
Yes, I can help you code it. But why do you need help? Doesn't the code you wrote work?
Just as an aside: what version of PHP do you use?