I should first apologize for what is probably a rookie question, but as a complete newbie to regex I have no clue how to achieve this relatively complex task. What I need is to specify a validation pattern for a string input and perform separate checks on the separate segments of that pattern. So let's begin with the task itself. I'm working with PHP 7.0 on Laravel 5.4 (which should not make any difference) and I need to produce a matching pattern for a string input that has the following structure:
header1: expression1; header2: expression2; header3: expression3 //etc...
What I'd need here is to check if each header is present and if it's present in a special validation list of available headers. So I'd need to extract each header.
Furthermore the expressions are built as follows
expression1 = (a1 + a2)*(a3-a1)
expression2 = b1*(b2 - b3)/b4
//etc...
The point is that each expression contains some numeric parameters which should form a valid arithmetic calculation. Those parameters should also be contained in a special list of available parameter placeholders, so I'd need to check them too. So, is there a simple efficient way (using regex and string analysis in pure php) to specify that strict structure or should I do everything step by step with exploding and try-catching?
An optimal solution would be a shorthand logic (or regex expression?) of a kind like:
$value->match("^n(header: expression)")
->delimitedBy(';')
->where(in_array($header, $allowed_headers))
->where(strtr($expression, array_fill_keys($available_param_placeholders, 0))->isValidArithmeticExpression())
I hope you can follow my logic. The code above would read as: match N repetitions of the pattern "header: expression", delimited by ';', where 'header' (given that $header is its value) is in an array and where 'expression' (given that $expression is its value) forms a valid arithmetic expression when all available parameter placeholders have been replaced by 0. That's all. Any deviation from that strict pattern should return false.
As an alternative, I'm currently thinking of first exploding the string by the main delimiter (the semicolon) and then analysing each part separately. I would then have to check whether a colon is present, whether everything to the left of the colon matches a valid header name, and whether everything to the right of the colon forms a valid arithmetic expression when all param names from the list are replaced by some value (like 0, just to check that the code executes, which I also don't know how to do). Anyway, that way seems like overkill and I'm sure there should be a smoother way to specify the needed pattern.
I hope I've explained everything well enough, and sorry if I'm being too messy in explaining my problem. Thanks in advance for any advice or help! Greatly appreciated!
Using eval() must always be Plan Z. With my understanding of your input string, this method may sufficiently validate the headers and expressions (if not, I think it should sufficiently sanitize the string for arithmetic parsing). I don't code in Laravel, so if this can be converted to Laravel syntax I'll leave that job for you.
Code: (Demo)
$test = "header1: (a1 + a2)*(a3-a1); header2: b1*(b2 - b3)/b4; header3: c1 * (((c2); header4: ((a1 * (a2 - b1))/(a3-a1))+b2";
$allowed_headers=['header1','header3','header4'];
$pairs=explode('; ',$test);
foreach($pairs as $pair){
list($header,$expression)=explode(': ',$pair,2);
if(!in_array($header,$allowed_headers)){
echo "$header is not permitted.";
}elseif(!preg_match('~^((?:[-+*/ ]+|[a-z]\d+|\((?1)\))*)$~',$expression)){ // based on https://stackoverflow.com/a/562729/2943403
echo "Invalid expression # $header: $expression";
}else{
echo "$header passed.";
}
echo "\n---\n";
}
Output:
header1 passed.
---
header2 is not permitted.
---
Invalid expression # header3: c1 * (((c2)
---
header4 passed.
---
I will admit the above pattern will match (+ )( +), so it is not the best pattern. So perhaps your question may be a candidate for using eval(), although you may want to first research some of the GitHub projects / plugins / parsers that can parse or tokenize an arithmetic expression.
Perhaps:
calculate math expression from a string using eval
How to evaluate formula passed as string in PHP?
Parse math operations with PHP
How to mathematically evaluate a string like "2-1" to produce "1"?
Any $pair that gets past the if and the elseif can move onto the evaluation process in the else.
I'll give you a headstart/hint about some general handling, but I'll shy away from giving any direct instruction to avoid the wrath of a certain population of critics.
}else{
// replace all variables with 0
//$expression=preg_replace('/[a-z]\d+/','0',$expression);
// or replace each unique variable with a whole number
$expression=preg_match_all('/[a-z]\d+/',$expression,$out)?strtr($expression,array_flip($out[0])):$expression; // variables become incremented whole numbers
// ... from here use $expression with eval() in a style/intent of your choosing.
// ... set a battery of try and catch statements to handle unsavory outcomes.
// https://www.sitepoint.com/a-crash-course-of-changes-to-exception-handling-in-php-7/
}
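Since the pattern above still accepts inputs such as (+ )( +), a stricter check that avoids eval() is a small recursive-descent validator. The following is only a sketch under stated assumptions: the function name isValidExpression() is hypothetical, placeholders are assumed to look like a1 or b2, and unary minus is deliberately not supported. Operator precedence is ignored because the code only validates structure and never computes a value.

```php
<?php
// Hypothetical helper (not from the answers above): returns true when $expr
// is a well-formed arithmetic expression whose operands are all listed in
// $allowedParams. Grammar: expr := factor (op factor)*, factor := param | "(" expr ")".
function isValidExpression(string $expr, array $allowedParams): bool
{
    // Split into placeholder tokens (a1, b2, ...) and single non-space characters.
    preg_match_all('~[a-z]\d+|\S~', $expr, $m);
    $tokens = $m[0];
    $pos = 0;
    $parseExpr = null;

    $parseFactor = function () use (&$pos, $tokens, $allowedParams, &$parseExpr): bool {
        $tok = $tokens[$pos] ?? null;
        if ($tok === '(') {
            $pos++;
            if (!$parseExpr() || ($tokens[$pos] ?? null) !== ')') {
                return false;
            }
            $pos++;
            return true;
        }
        if ($tok !== null && in_array($tok, $allowedParams, true)) {
            $pos++;
            return true;
        }
        return false; // operator, unknown placeholder, or end of input
    };

    $parseExpr = function () use (&$pos, $tokens, $parseFactor): bool {
        if (!$parseFactor()) {
            return false;
        }
        while (in_array($tokens[$pos] ?? null, ['+', '-', '*', '/'], true)) {
            $pos++;
            if (!$parseFactor()) {
                return false;
            }
        }
        return true;
    };

    // Valid only if one expression consumed every token.
    return $parseExpr() && $pos === count($tokens);
}
```

Inside the foreach loop, a call such as isValidExpression($expression, ['a1', 'a2', 'a3']) could stand in for the preg_match() test, which also covers the check against the list of allowed placeholders.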
$test = "header1: (a1 + a2)*(a3-a1); header2: b1*(b2 - b3)/b4; header3: expression3";
$pairs = explode(';', $test);
$headers = [];
$expressions = [];
foreach ($pairs as $p) {
$he = explode(':', $p);
$headers[] = trim($he[0]);
$expressions[] = trim($he[1]);
}
foreach ($headers as $h) {
if (!in_array($h, $allowed_headers)) {
return false;
}
}
foreach ($expressions as $e) {
preg_match_all('/[a-z0-9]+/', $e, $matches);
foreach ($matches[0] as $m) { // $matches[0] holds the full-pattern matches
if (param_fails($m)) {
echo "Expression $e contains forbidden param $m.";
}
}
}
Regex turned out to be less complicated than I thought when posting the question, so I've managed to build the complete pattern myself, with the initial head start owed to @mickmackusa. What I finally came up with is here, explained by regex101 itself: https://regex101.com/r/UHMrqL/1
The logic on which it's based is described in the initial question. The only thing missing is the verification of the header values and the param names, but that's easy to match afterwards with preg_match_all and verify with pure PHP checks. Thanks again for the attention and the help! :)
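The follow-up check on param names described here could look roughly like the sketch below. The helper name unknownParams() is hypothetical, and placeholders are assumed to be one lowercase letter followed by digits, as in the question's examples.

```php
<?php
// Hypothetical helper: returns the placeholders used in $expression that are
// not part of $allowedParams (an empty array means every name is allowed).
function unknownParams(string $expression, array $allowedParams): array
{
    // Collect every placeholder-looking token, e.g. a1, b2.
    preg_match_all('/[a-z]\d+/', $expression, $matches);

    // Drop duplicates, then keep only the names missing from the allow-list.
    return array_values(array_diff(array_unique($matches[0]), $allowedParams));
}
```

The same idea works for headers: in_array($header, $allowed_headers, true) on each extracted header name.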
I am creating my own language.
The goal is to "compile" it to PHP or JavaScript and, ultimately, to interpret and run it in the same language, so that it looks like a "middle-level" language.
Right now, I'm focusing on interpreting it in PHP and running it.
At the moment, I'm using regex to split the string and extract the multiple tokens.
This is the regex I have:
/\:((?:cons#(?:\d+(?:\.\d+)?|(?:"(?:(?:\\\\)+"|[^"]|(?:\r\n|\r|\n))*")))|(?:[a-z]+(?:#[a-z]+)?|\^?[\~\&](?:[a-z]+|\d+|\-1)))/g
This is quite hard to read and maintain, even though it works.
Is there a better way of doing this?
Here is an example of the code for my language:
:define:&0:factorial
:param:~0:static
:case
:lower#equal:cons#1
:case:end
:scope
:return:cons#1
:scope:end
:scope
:define:~0:static
:define:~1:static
:require:static
:call:static#sub:^~0:~1 :store:~0
:call:&-1:~0 :store:~1
:call:static#sum:^~0:~1 :store:~0
:return:~0
:scope:end
:define:end
This defines a recursive function to calculate the factorial (not so well written, but that isn't important).
The goal is to get what comes after the :, including the #. For example, :static#sub is a single token, stored without the leading :.
Every token follows the same rule, except for :cons, which can take a value after it. The value is a number (integer or float, called static or dynamic in the language, respectively) or a string, which must start and end with " and supports escaping like \". Multi-line strings aren't supported.
Variables are written like ~0; a ^ prefix refers to the value in the :scope above.
Functions are similar, using &0 instead; &-1 points to the current function (no need for ^&-1 here).
That said, is there a better way to get the tokens?
Here you can see it in action: http://regex101.com/r/nF7oF9/2
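[Update] To address the pattern's complexity and maintainability, you can split it over several lines with the PCRE_EXTENDED (x) modifier and add comments. Note that in extended mode a literal # must be escaped, and that PHP has no g modifier (use preg_match_all instead):
preg_match_all('/
# read constant (?)
\:((?:cons\#(?:\d+(?:\.\d+)?|
# read a string (?)
(?:"(?:(?:\\\\)+"|[^"]|(?:\r\n|\r|\n))*")))|
# read an identifier (?)
(?:[a-z]+(?:\#[a-z]+)?|
# read whatever
\^?[\~\&](?:[a-z]+|\d+|\-1)))
/x', $input, $matches);
Beware that in extended mode all whitespace in the pattern is ignored, except under certain conditions such as inside a character class or when escaped (\n is normally "safe").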
Now, if you want to improve your lexer and parser, read on:
What (f)lex [the GNU equivalent of LEX] does is simply let you pass a list of regexps, each optionally tied to a "group". You can also try ANTLR and its PHP Target Runtime to get the work done.
As for your request, I wrote a lexer in the past following the principle of FLEX. The idea is to cycle through the regexps like FLEX does:
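$regexp = [reg1 => STRING, reg2 => ID, reg3 => WS]; // each pattern should be anchored at the start (^)
$input = ...;
$tokens = [];
while ($input !== '') {
    $best = null;
    $k = null;
    // try each pattern in order; the first one that matches wins
    foreach ($regexp as $re => $kind) {
        if (preg_match($re, $input, $match)) {
            $best = $match[0];
            $k = $kind;
            break;
        }
    }
    // an empty match would never advance the input, so treat it as an error too
    if (null === $best || '' === $best) {
        throw new Exception("could not analyze input, invalid token");
    }
    $tokens[] = ['kind' => $k, 'value' => $best];
    $input = substr($input, strlen($best)); // move.
}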
Since FLEX and Yacc/Bison integrate, the usual pattern is to read only until the next token (that is, they don't loop over the whole input before parsing).
The $regexp array can be anything. I expected it to be a "regexp" => "kind" key/value map, but you can also use an array like this:
$regexp = [['reg' => '...', 'kind' => STRING], ...]
You can also enable/disable regexps using groups (like FLEX groups work). For example, consider the following code:
class Foobar {
const FOOBAR = "arg";
function x() {...}
}
There is no need to activate the string regexp until you need to read an expression (here, the expression is what comes after the "="). And there is no need to activate the class identifier when you are actually in a class.
FLEX's groups make it possible to read comments: a first regexp activates a group that ignores the other regexps until some match is made (like "*/").
Note that this is a naïve approach: a lexer like FLEX actually generates an automaton, which uses different states to represent your needs (the regexp is itself an automaton).
This uses an algorithm of packed indexes or something similar (I used the naïve "for each" because I did not understand the algorithm well enough), which is memory and speed efficient.
As I said, it was something I made in the past, something like 6 or 7 years ago.
It was on Windows.
It was not particularly quick (well it is O(N²) because of the two loops).
I also think that PHP was compiling the regexp each time. Now that I do Java, I use the Pattern implementation, which compiles the regexp once and lets you reuse it. I don't know whether PHP does the same by first looking in a regexp cache for an already compiled regexp.
I was using preg_match with an offset, to avoid doing the substr($input, ...) at the end.
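The offset-based variant mentioned above can be sketched as follows. This is a minimal illustration, not the original lexer: tokenize(), the rule table, and the kind names (WS, CONS, NAME, REF) are assumptions made up for the question's sample language, and string constants are left out. The A modifier anchors each pattern at the offset passed to preg_match(), so the input is never copied with substr().

```php
<?php
// Hypothetical rule table: pattern => kind. Order matters: CONS must come
// before NAME, otherwise ":cons#1" would be cut after ":cons".
$rules = [
    '/\s+/A'                        => 'WS',
    '/:cons#\d+(?:\.\d+)?/A'        => 'CONS',
    '/:[a-z]+(?:#[a-z]+)?/A'        => 'NAME',
    '/:\^?[~&](?:-1|\d+|[a-z]+)/A'  => 'REF',
];

// Walks the input once, trying each rule at the current offset.
function tokenize(string $input, array $rules): array
{
    $tokens = [];
    $offset = 0;
    $length = strlen($input);
    while ($offset < $length) {
        $matched = false;
        foreach ($rules as $pattern => $kind) {
            // Fifth argument: start matching at $offset instead of slicing the string.
            if (preg_match($pattern, $input, $m, 0, $offset) && $m[0] !== '') {
                if ($kind !== 'WS') { // whitespace is skipped (an assumption)
                    $tokens[] = ['kind' => $kind, 'value' => $m[0]];
                }
                $offset += strlen($m[0]);
                $matched = true;
                break;
            }
        }
        if (!$matched) {
            throw new Exception("could not analyze input at offset $offset");
        }
    }
    return $tokens;
}
```

For example, tokenize(':return:cons#1', $rules) yields a NAME token ":return" followed by a CONS token ":cons#1". Each rule stays small and readable, which is the maintainability gain over one large alternation.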
You should try the ANTLR3 PHP Code Generation Target, since the ANTLR grammar editor is pretty easy to use and you will end up with much more readable and maintainable code :)
Here is the $source example
/**
* These functions can be replaced via plugins. If plugins do not redefine these
* functions, then these will be used instead.
*/
if ( !function_exists('wp_set_current_user') ) :
/**
* Changes the current user by ID or name.
*
*/
function wp_set_current_user($id, $name = '') {
Attention: some don't have the function_exists line.
For my special purpose, I'm trying to parse the docblock with regular expression.
Here is the regex
$t = preg_match_all("#(/\*\*.*?\*/\nfunction\s.*?\(.*?\))\s{#mis",$source,$m);
I expect to get:
/**
* Changes the current user by ID or name.
*
*/
function wp_set_current_user($id, $name = '') {
but instead, it returns me the whole code example.
Any help would be appreciated.
Some people asked about my purpose; I don't think it's important here, though.
I'm using Geany and found that the existing WordPress code hints aren't complete.
And the docblock parsers I found don't parse the function name and function arguments.
So I'm trying to parse them on my own.
The code hint format of Geany is:
wp_set_current_user|Changes the current user by ID or name.|($id, $name = '')|
However, the point of this question is: how do I make the regex take the second "/**" as its starting point?
I'm sorry that my poor English confused you all.
You can parse the comment out with a regexp like this (check out a regex lookaround tutorial; it needs the s modifier so that . matches newlines):
/\*\*(?:(?!\*/).)*\*/
Then any number of white spaces can occur:
[\s]*
What keywords can a function have in PHP? static, abstract, final, public, private, protected; correct me if I'm forgetting something.
(?:(?:static|abstract|final|public|private|protected)\s+)*
Okay, now function header and braces:
function\s+(?P<name>[\w\d_]+)\s*\(...\)
The ... part gets complicated because it can contain a default value, which can be a complicated PHP string ($remove_characters = '\'"\n\r '), so here is value parsing (string, string, number, constant):
"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"
\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\'
[\d.]+
\w+
Resulting in one large value regexp:
("[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\'|[\d.]+|\w+)
And every function argument has the format $var or $var = data (allowing any amount of whitespace, and leaving out forms like array $input = array()), so this is simplified variable name matching:
\\$[\w_][\w\d_]*
Type matching:
([\w_]+\s+)?
So function arguments can be:
\s*([\w_]+\s+)?(\\$[\w_][\w\d_]*|\\$[\w_][\w\d_]*\s*=\s*<value>)
And the complete regexp for a function would look like:
function\s+(?P<name>[\w\d_]+)\s*\(\s*(?:<argument>(?:\s*,<argument>)*)?\s*\)
I won't be testing these regexps for you; that's your job at this point. My goal was to show what you need if you want to do this really correctly (but feel free to edit my answer if you find a mistake). You may also use a much simpler version (like just one regexp for the function arguments that eats everything).
If you want the easy dirty trick, use a lookbehind assertion:
(?<=if\ \(\ !function_exists\('wp_set_current_user'\)\ \)\ :)
Prepending this to your search should do the trick. (You might have to escape the single quotes.)
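To make the regex start at the docblock closest to the function, one option is to forbid the comment body from crossing a closing "*/". The sketch below assumes simple signatures with no ")" inside default values (so array() defaults are not handled), and the helper name extractHints() is hypothetical. It builds lines in the Geany format quoted in the question.

```php
<?php
// Hypothetical helper: returns lines such as "name|summary|(args)|".
function extractHints(string $source): array
{
    // (?:(?!\*/).)* matches the comment body one character at a time and
    // refuses to step over "*/", so a match cannot begin at an earlier docblock.
    $pattern = '#/\*\*((?:(?!\*/).)*)\*/\s*function\s+(\w+)\s*\(([^)]*)\)#s';
    preg_match_all($pattern, $source, $matches, PREG_SET_ORDER);

    $hints = [];
    foreach ($matches as $m) {
        // Summary: first non-empty docblock line, with the leading " * " stripped.
        $summary = '';
        foreach (preg_split('/\R/', $m[1]) as $line) {
            $line = trim($line, " \t*");
            if ($line !== '') {
                $summary = $line;
                break;
            }
        }
        $hints[] = $m[2] . '|' . $summary . '|(' . trim($m[3]) . ')|';
    }
    return $hints;
}
```

Because the pattern requires "function" right after the closing "*/" (apart from whitespace), the file-level docblock followed by the function_exists line is skipped whether or not that line is present.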
Just a simple question. I have a contact form stored in a function because it's just easier to call it on the pages I want it to have.
Now to extend usability, I want to search for {contactform} using str_replace.
Example:
function contactform(){
// bunch of inputs
}
$wysiwyg = str_replace('{contactform}', contactform(), $wysiwyg);
So basically, if {contactform} is found, replace it with the output of contactform().
Now I know that I can run the function before the replace and store its output in a variable, and then replace it with that same variable. But I'm interested to know if there is a better method than the one I have in mind.
Thanks
To answer your question, you could use PCRE and preg_replace_callback and then either modify your contactform() function or create a wrapper that accepts the matches.
I think your idea of running the function once and storing it in a variable makes more sense though.
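A minimal sketch of the preg_replace_callback() approach, assuming placeholders of the form {name} and a hypothetical allow-list of handlers keyed by placeholder name (renderPlaceholders() is not an existing function):

```php
<?php
// Hypothetical helper: replaces each {name} with the output of its handler.
// Unknown placeholders are left untouched, and a handler only runs when its
// placeholder actually appears in the text.
function renderPlaceholders(string $text, array $handlers): string
{
    return preg_replace_callback('/\{(\w+)\}/', function (array $m) use ($handlers) {
        // $m[0] is the full placeholder, $m[1] the name between the braces.
        return isset($handlers[$m[1]]) ? (string) $handlers[$m[1]]() : $m[0];
    }, $text);
}
```

Usage could look like renderPlaceholders($wysiwyg, ['contactform' => 'contactform']), provided contactform() returns its markup rather than echoing it.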
Your method is fine, I would set it as a $var if you are planning to use the contents of contactform() more than once.
It might pay to use http://php.net/strpos to check whether {contactform} exists before running the str_replace function.
You could try both ways, and if your server support it, benchmark:
<?php echo 'Memory Usage: '. (!function_exists('memory_get_usage') ? '0' : round(memory_get_usage()/1024/1024, 2)) .'MB'; ?>
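You may want to have a look at PHP's call_user_func(); more information here: http://php.net/call_user_func
$wysiwyg = 'Some string and {contactform}';
$find = '{contactform}';
// strpos() returns 0 for a match at the start, so compare against false; strip the braces to get the function name
$output = strpos($wysiwyg, $find) !== false ? call_user_func(trim($find, '{}')) : '';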
Yes, there is: Write one yourself. (Unless there already is one, which is always hard to be sure in PHP; see my next point.)
Ah, there it is: preg_replace_callback(). Of course, it's one of the three regex libraries and as such, does not do simple string manipulation.
Anyway, my point is: Do not follow PHP's [non-]design guidelines. Write your own multibyte-safe string substitution function with a callback, and do not use call_user_func().
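One way such a hand-written helper might look is sketched below. The name replaceLazily() is hypothetical. It relies on plain byte-wise search, which is safe for valid UTF-8 input because a complete UTF-8 sequence can never match in the middle of another character; other encodings are outside the scope of this sketch.

```php
<?php
// Hypothetical helper: replaces $needle with the callback's result, calling
// the callback only if $needle actually occurs in $haystack.
function replaceLazily(string $haystack, string $needle, callable $producer): string
{
    if ($needle === '' || strpos($haystack, $needle) === false) {
        return $haystack; // nothing to replace, so the callback never runs
    }
    // The callback runs once, even if the needle occurs several times.
    return str_replace($needle, (string) $producer(), $haystack);
}
```

Passing the callable directly avoids both call_user_func() and running contactform() on pages that do not contain the placeholder.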
How do I convert the value of a PHP variable to string?
I was looking for something better than concatenating with an empty string:
$myText = $myVar . '';
Like the ToString() method in Java or .NET.
You can use the casting operators:
$myText = (string)$myVar;
There are more details for string casting and conversion in the Strings section of the PHP manual, including special handling for booleans and nulls.
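As a quick illustration of that special handling, here is what the cast produces for a few scalar values and null (variable names are arbitrary):

```php
<?php
$fromTrue  = (string) true;   // "1"
$fromFalse = (string) false;  // "" (empty string, not "0")
$fromNull  = (string) null;   // ""
$fromInt   = (string) 42;     // "42"
$fromFloat = (string) 1.5;    // "1.5"
```

The empty string for false and null is the case that most often surprises people coming from Java or .NET.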
This is done with typecasting:
$strvar = (string) $var; // Casts to string
echo $var; // Will cast to string implicitly
var_dump($var); // Will show the true type of the variable
In a class you can define what is output by using the magic method __toString. An example is below:
class Bottles {
public function __toString()
{
return 'Ninety nine green bottles';
}
}
$ex = new Bottles;
var_dump($ex, (string) $ex);
// Returns: instance of Bottles and "Ninety nine green bottles"
Some more type casting examples:
$i = 1;
// int 1
var_dump((int) $i);
// bool true
var_dump((bool) $i);
// string "1"
var_dump((string) 1);
Use print_r:
$myText = print_r($myVar,true);
You can also use it like:
$myText = print_r($myVar,true)."foo bar";
This will set $myText to a string, like:
array (
0 => '11',
)foo bar
Use var_export to get a little bit more info (with types of variable,...):
$myText = var_export($myVar,true);
You can either use typecasting:
$var = (string)$varname;
or StringValue:
$var = strval($varname);
or SetType:
$success = settype($varname, 'string');
// $varname itself becomes a string
They all work for the same thing in terms of Type-Juggling.
How do I convert the value of a PHP variable to string?
A value can be converted to a string using the (string) cast or the strval() function. (Edit: As Thomas also stated).
It should also be cast automatically for you when you use it as a string.
You are looking for strval:
string strval ( mixed $var )
Get the string value of a variable.
See the documentation on string for more information on converting to string.
This function performs no formatting on the returned value. If you are looking for a way to format a numeric value as a string, please see sprintf() or number_format().
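To make the difference concrete, here is a short comparison of strval() with the two formatting functions named above (the sample values are arbitrary):

```php
<?php
$price = 1234.5;

$plain   = strval($price);            // "1234.5"   (no formatting applied)
$fixed   = sprintf('%.2f', $price);   // "1234.50"  (two decimal places)
$padded  = sprintf('%05d', 42);       // "00042"    (zero-padded to width 5)
$grouped = number_format($price, 2);  // "1,234.50" (thousands separator)
```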
For primitives just use (string)$var or print the variable straight away. PHP is a dynamically typed language and the variable will be cast to a string on the fly.
If you want to convert objects to strings, you need to define a __toString() method that returns a string. Before PHP 7.4, this method was not allowed to throw exceptions.
Putting it in double quotes should work:
$myText = "$myVar";
I think it is worth mentioning that you can catch any output (like print_r, var_dump) in a variable by using output buffering:
<?php
ob_start();
var_dump($someVar);
$result = ob_get_clean();
?>
Thanks to:
How can I capture the result of var_dump to a string?
Another option is to use the built in settype function:
<?php
$foo = "5bar"; // string
$bar = true; // boolean
settype($foo, "integer"); // $foo is now 5 (integer)
settype($bar, "string"); // $bar is now "1" (string)
?>
This actually performs a conversion on the variable unlike typecasting and allows you to have a general way of converting to multiple types.
In addition to the answer given by Thomas G. Mayfield:
If you follow the link to the string casting manual, there is a special case which is quite important to understand:
(string) cast is preferable especially if your variable $a is an object, because PHP will follow the casting protocol according to its object model by calling __toString() magic method (if such is defined in the class of which $a is instantiated from).
PHP does something similar to
function castToString($instance)
{
if (is_object($instance) && method_exists($instance, '__toString')) {
return call_user_func(array($instance, '__toString'));
}
}
The (string) casting operation is a recommended technique for PHP5+ programming, making code more object-oriented. IMO this is a nice example of design similarity (and difference) to other OOP languages like Java/C#/etc., i.e. in its own special PHP way (whether it's for better or for worse).
As others have mentioned, objects need a __toString method to be cast to a string. An object that doesn't define that method can still produce a string representation using the spl_object_hash function.
This function returns a unique identifier for the object. This id can be used as a hash key for storing objects, or for identifying an object, as long as the object is not destroyed. Once the object is destroyed, its hash may be reused for other objects.
I have a base Object class with a __toString method that defaults to calling md5(spl_object_hash($this)) to make the output clearly unique, since the output from spl_object_hash can look very similar between objects.
This is particularly helpful for debugging code where a variable initializes as an Object and later in the code it is suspected to have changed to a different Object. Simply echoing the variables to the log can reveal the change from the object hash (or not).
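A base class along those lines might look like the sketch below. The class names BaseObject and Widget are made up for illustration.

```php
<?php
// Hypothetical base class: any subclass without its own __toString()
// falls back to an md5 of its object hash.
abstract class BaseObject
{
    public function __toString()
    {
        // spl_object_hash() identifies this instance for as long as it is alive.
        return md5(spl_object_hash($this));
    }
}

class Widget extends BaseObject
{
}

$first  = new Widget();
$second = new Widget();
$firstId  = (string) $first;   // 32 hex characters
$secondId = (string) $second;  // differs from $firstId while both objects exist
```

Logging (string) $var at two points in the code then shows whether the variable still refers to the same instance.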
I think this question is a bit misleading since,
toString() in Java isn't just a way to cast something to a String. That is what casting via (string) does, and it works as well in PHP.
// Java
String myText = (String) myVar;
// PHP
$myText = (string) $myVar;
Note that in Java this cast only works when myVar actually is a String; otherwise it throws a ClassCastException.
But as I said, this is casting and therefore not the equivalent of Java's toString().
toString in Java doesn't just cast an object to a String. It instead will give you the String representation. And that's what __toString() in PHP does.
// Java
class SomeClass{
public String toString(){
return "some string representation";
}
}
// PHP
class SomeClass{
public function __toString()
{
return "some string representation";
}
}
And from the other side:
// Java
new SomeClass().toString(); // "some string representation"
// PHP
strval(new SomeClass); // "some string representation"
What do I mean by "giving the String representation"?
Imagine a class for a library with millions of books.
Casting that class to a String would (by default) convert the data, here all books, into a string so the String would be very long and most of the time not very useful.
toString() instead will give you the String representation, i.e., only the library's name. This is shorter and therefore gives you less, but more important, information.
These are both valid approaches but with very different goals, neither is a perfect solution for every case, and you have to choose wisely which fits your needs better.
Sure, there are even more options:
$no = 421337; // A number in PHP
$str = "$no"; // In PHP, variables inside "" are interpolated
$str = print_r($no, true); // With true as second argument, print_r() returns its output as a string
settype($no, 'string'); // Converts $no itself to the string type (returns a bool, so don't assign the result)
$str = strval($no); // Get the string value of $no
$str = $no . ''; // As you said, concatenating an empty string works too
All of these methods will return a String, some of them using __toString internally and some others will fail on Objects. Take a look at the PHP documentation for more details.
Some, if not all, of the methods in the previous answers fail when the intended string variable has a leading zero, for example, 077543.
An attempt to convert such a variable fails to produce the intended string, because PHP interprets a numeric literal with a leading zero as a base 8 (octal) number.
All these will make $str have a value of 32611:
$no = 077543;
$str = (string)$no;
$str = "$no";
$str = print_r($no,true);
$str = strval($no);
settype($no, "string"); $str = $no; // settype() returns a bool, so read $no afterwards
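The conversion happens when the literal is parsed, not during the cast, so the usual ways around it are to keep the value as a string from the start or to pad a decimal integer afterwards. A short sketch (variable names are arbitrary):

```php
<?php
$fromLiteral = (string) 077543;    // literal parsed as octal: "32611"
$fromString  = (string) '077543';  // already a string: "077543"

// Restoring a leading zero on a decimal integer:
$viaSprintf = sprintf('%06d', 77543);                          // "077543"
$viaStrPad  = str_pad((string) 77543, 6, '0', STR_PAD_LEFT);   // "077543"
```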
The documentation says that you can also do:
$str = "$foo";
It's the same as cast, but I think it looks prettier.
Double quotes should work too... it should create a string, then it should APPEND/INSERT the casted STRING value of $myVar in between 2 empty strings.
You can always create a method named .ToString($in) that returns
$in . '';
If you're converting anything other than simple types like integers or booleans, you'd need to write your own function/method for the type that you're trying to convert, otherwise PHP will just print the type (such as array, GoogleSniffer, or Bidet).
PHP is dynamically typed, so like Chris Fournier said, "If you use it like a string it becomes a string". If you're looking for more control over the format of the string then printf is your answer.
You can also use the var_export PHP function.
$parent_category_name = "new clothes & shoes";
// To make it to string option one
$parent_category = strval($parent_category_name);
// Or wrap it in single quotes by concatenation, giving 'new clothes & shoes'
// It is useful for database queries
$parent_category = "'" . strval($parent_category_name) . "'";
For objects, you may not be able to use the cast operator. Instead, I use the json_encode() method.
For example, the following will output contents to the error log:
error_log(json_encode($args));
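A slightly fuller sketch of the same idea, using a stdClass with made-up properties. Note that json_encode() returns false rather than a string when encoding fails, for example on malformed UTF-8.

```php
<?php
$args = new stdClass();
$args->id = 7;
$args->tags = ['a', 'b'];

$json = json_encode($args);       // {"id":7,"tags":["a","b"]}

// Failure case: invalid UTF-8 cannot be encoded, so the result is false.
$bad = json_encode("\xB1\x31");
$message = $bad === false ? json_last_error_msg() : $bad;
```

Checking for false before logging avoids writing an empty line when the data cannot be encoded.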
Try this slightly strange, but working, approach to convert the textual part of a stdClass to the string type:
$my_std_obj_result = $SomeResponse->return->data; // Specific to object/implementation
$my_string_result = implode ((array)$my_std_obj_result); // Do conversion
__toString method or (string) cast
$string=(string)$variable; //force make string
you can treat an object as a string
class Foo
{
public function __toString()
{
return "foo";
}
}
echo new Foo(); //foo
Also, here is another trick: assuming I have an int variable that I want to turn into a string:
$string=''.$intvariable;
This can be difficult in PHP because of the way data types are handled internally. Assuming that you don't mean complex types such as objects or resources, generic casting to strings may still result in incorrect conversion. In some cases pack/unpack may even be required, and then you still have the possibility of problems with string encoding. I know this might sound like a stretch but these are the type of cases where standard type juggling such as $myText = $my_var .''; and $myText = (string)$my_var; (and similar) may not work. Otherwise I would suggest a generic cast, or using serialize() or json_encode(), but again it depends on what you plan on doing with the string.
The primary difference is that Java and .NET have better facilities with handling binary data and primitive types, and converting to/from specific types and then to string from there, even if a specific case is abstracted away from the user. It's a different story with PHP where even handling hex can leave you scratching your head until you get the hang of it.
I can't think of a better way to answer this that is comparable to Java/.NET, where toString() and similar methods are usually implemented in a way that's specific to the object or data type. In that way the magic methods __toString() and __serialize()/__unserialize() may be the best comparison.
Also keep in mind that PHP doesn't have the same concept of primitive data types. In essence every data type in PHP can be considered an object, and their internal handlers try to make them somewhat universal, even if it means losing accuracy, such as when converting a float to int. You can't deal with types as you can in Java unless you're working with their zvals within a native extension.
While PHP userspace doesn't define int, char, bool, or float as objects, everything is stored in a zval structure, which is as close to an object as you can find in C, with generic functions for handling the data within the zval. Every possible way to access data within PHP goes down to the zval structure and the way the Zend VM allows you to handle them without converting them to native types and structures. With Java types you have finer-grained access to their data and more ways to manipulate them, but also greater complexity, hence the strong type vs weak type argument.
These links may be helpful:
https://www.php.net/manual/en/language.types.type-juggling.php
https://www.php.net/manual/en/language.oop5.magic.php
I use variableToString. It handles every PHP type and is flexible (you can extend it if you want).