Find all instances of $variables in single-quoted strings?


Someone has made a change to the repo, replacing all or most instances of " with ' in string assignments. This has had the unintended effect of breaking many strings that relied on variable interpolation. Examples:
$query = 'ALTER TABLE ' . $items . ' ADD `user_$name`';
$query .= '($length)';
etc.
Obviously this is breaking SQL queries, but it may not be limited to just strings assigned to $query.
Is there a regex or some function of PhpStorm that I can use to find all instances of this and fix them, either by reverting back to " or using ' with concatenation?

So you want to find single-quoted string literals containing a $. The obvious thing to try would be this:
/'[^'$]*[$][^']*'/
That should find any single-quoted string containing a $. Unfortunately that doesn't quite work, since there may be backslash-escaped characters. So this is slightly better:
/'(?:[^'$\\]|\\.)*[$](?:[^'\\]|\\.)*'/
But another problem is that you might have something like this:
'foo' . $bar . 'baz'
and that regex would match ' . $bar . '. To get round this, we could write a regex that matches a substring from the start of a line up to a single-quoted string literal that contains $ (because the leading group is greedy, this will be the last such literal on the line):
/^(?:[^']|'(?:[^\\']|\\.)*')*'(?:[^'$\\]|\\.)*[$](?:[^'\\]|\\.)*'/
Caveats:
I haven't tested these, so they may not be quite right.
My PHP is a little rusty, so I'm not sure exactly what string literals are supposed to look like.
I don't know how you're running the regex; if you're using a regex constructor from a string literal you need an extra layer of character escaping (e.g. in javascript, /\\/ is equivalent to new RegExp('\\\\')).
You probably have to set a flag, possibly called m (again, I don't know what regex engine you're using), so that the initial ^ will match at the start of every line, not just the start of the string; likewise, unset the flag which may be called s, since you don't want . to match newlines.
There may be some obscure case I haven't thought of. It's theoretically possible there might be a code comment containing a case you don't want to replace.
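A minimal PHP sketch of the line-anchored approach, with some assumptions: each line is checked separately (so single-quoted strings spanning several lines are missed), backslash is excluded from the "ordinary character" classes so escapes are only consumed by \\., and a capture group is added around the offending literal. findSuspectLiterals() is a hypothetical helper, not a PhpStorm or PHP feature.

```php
<?php
// Anchored pattern with an added capture group (group 1) around the
// single-quoted literal that contains '$'. A nowdoc is used so PHP passes
// the backslashes to PCRE unchanged.
const SUSPECT_LITERAL = <<<'RE'
/^(?:[^']|'(?:[^\\']|\\.)*')*('(?:[^'$\\]|\\.)*[$](?:[^'\\]|\\.)*')/
RE;

/**
 * Hypothetical helper: returns [lineNumber => literal] for each line
 * containing a single-quoted literal with a '$' in it.
 */
function findSuspectLiterals(string $code): array
{
    $hits = [];
    // Work line by line so ^ anchors at each line start and [^'] cannot
    // wander across line breaks.
    foreach (preg_split('/\R/', $code) as $i => $line) {
        if (preg_match(SUSPECT_LITERAL, $line, $m)) {
            $hits[$i + 1] = $m[1];
        }
    }
    return $hits;
}

$sample = <<<'CODE'
$query = 'ALTER TABLE ' . $items . ' ADD `user_$name`';
$query .= '($length)';
$ok = 'foo' . $bar . 'baz';
CODE;

// Lines 1 and 2 are reported; on line 3, ' . $bar . ' is never
// treated as a literal, so nothing is flagged.
print_r(findSuspectLiterals($sample));
```

Since the greedy prefix backtracks from the end of the line, a line with several offending literals reports only the last one; fixing it and re-running finds the next.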

Related

PHP extend regex to accept round brackets

I have an existing regex that checks whether the string is wrapped in quotes or square brackets; if it isn't, I wrap the string in quotes.
My existing regex is as follows:
if (!preg_match('/^["\[].*["\]]$/', $filter)) {
$filter = '%22' . $filter . '%22';
}
Now I want to extend this regex so that strings already wrapped in parentheses are also accepted, as well as quotes and square brackets.
For parentheses, my $filter value would be something like (123 456).
Can anyone help to extended this regex?

I think regular expressions are good for complex string matching, but why use one for a simple match like this? A pattern like this has to scan the whole string, which takes O(n) time where n is the string length, and a backtracking engine can even take exponential time on some badly written patterns. If you just check the first and last characters of the string with a simple if, it takes O(1). Here is the regex solution anyway; the alternatives are grouped so that ^ and $ apply to all three of them:
/^(?:".*"|\[.*\]|\(.*\))$/
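A PHP sketch of the question's wrapping code using a simplified pattern, /^(?:".*"|\[.*\]|\(.*\))$/D, instead of a conditional one. Grouping the three alternatives is what makes ^ and $ apply to each of them; the D modifier is an added assumption so that $ does not also match before a trailing newline. wrapFilter() is a hypothetical name.

```php
<?php
// Leave $filter alone if it is already enclosed in "...", [...] or (...);
// otherwise wrap it in %22 (a URL-encoded double quote), as in the question.
function wrapFilter(string $filter): string
{
    // One group around the alternatives so ^ and $ anchor every form;
    // D stops $ from also matching before a trailing newline.
    if (!preg_match('/^(?:".*"|\[.*\]|\(.*\))$/D', $filter)) {
        $filter = '%22' . $filter . '%22';
    }
    return $filter;
}

foreach (['(123 456)', '"abc"', '[x]', 'plain', 'a(b)', '"'] as $f) {
    echo wrapFilter($f), "\n";
}
// (123 456), "abc" and [x] are unchanged; plain, a(b) and " get wrapped
```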

The regular expressions are a powerful tool but they are not a Swiss army knife. There are problems that simply cannot be resolved using regex and there are problems that can be resolved using regex but a simpler approach produces code that is easier to read and understand. This is such a problem.
Let's reformulate the problem. If the first character of the string is " and also its last character is " then the string is wrapped in quotes and it does not need other processing. The same when the first character is [ and the last one is ]. Or ( and ).
The first character of a string stored in the variable $filter is $filter[0]. The last one is $filter[-1]. Concatenate them into a new string and look it up in a list of matching quote and bracket pairs:
if (! in_array($filter[0].$filter[-1], ['""', '[]', '()'])) {
// the string is not enclosed in quotes, square brackets or parentheses
// do something with it (enclose it, etc)
}
If you are using PHP 5 (any version) or PHP 7.0, then you are out of luck (and out of PHP updates, by the way): you cannot use $filter[-1], because negative string offsets were introduced in PHP 7.1.
The PHP function substr() comes to the rescue.
substr($filter, -1) does the same thing as $filter[-1] (it returns the last character of $filter) and works in all PHP versions.
There are two corner cases to consider:
When $filter is '"' (a string of exactly one character that is a double quote), the code above will report it as enclosed in quotes when, in fact, it is not.
When $filter is '' (the empty string), the code produces two warnings (but does not report it as being enclosed in quotes).
Both cases can be easily solved by adding a check of the string's length to avoid running the other test if the string is too short:
if (strlen($filter) < 2 || ! in_array($filter[0].$filter[-1], ['""', '[]', '()'])) {
// the string is not enclosed in quotes, square brackets or parentheses
// do something with it (enclose it, etc)
}
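Putting the pieces of this answer together, a sketch that uses substr() so it also runs on PHP versions before 7.1. isEnclosed() is a hypothetical helper name, and the strict in_array() comparison is an added choice; type declarations are left out to keep it PHP 5 compatible.

```php
<?php
// Returns true when $filter starts and ends with a matching pair:
// "...", [...] or (...). substr($filter, -1) replaces $filter[-1].
function isEnclosed($filter)
{
    // '' and one-character strings such as '"' can never be enclosed.
    if (strlen($filter) < 2) {
        return false;
    }
    $ends = $filter[0] . substr($filter, -1);
    return in_array($ends, array('""', '[]', '()'), true);
}

$filter = '(123 456)';
if (!isEnclosed($filter)) {
    $filter = '%22' . $filter . '%22';
}
echo $filter, "\n"; // (123 456)
```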

Regex that will match each specific tag that contains ../

I'm trying to find a regex that will match each specific tag that contains ../.
I had it matching when each element was on its own line. But then there was an instance where my HTML rendered on one line, causing the regex to match the whole line:
<body><img src="../../../img.png"><img src="../../img.png"><img src="../../img.png"><img src="..//../img.png"><img src="..../../img.png">
Here was the regex that I was using
<.*[\.]{2}[\/].*>

You need to make sure to match only one tag per match.
Using a negative character class like below will accomplish that.
<[^>]*\.\./[^>]*>
< = start of tag
[^>]* = any number of characters that aren't >, since > would end the tag
\.\./ = "../" with escapes for the . characters
[^>]* = same as above
> = end of tag
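A quick PHP check of the difference on the one-line HTML from the question, assuming preg_match_all() and # as the pattern delimiter (so the / in ../ needs no escaping). The greedy .* version returns a single match spanning the whole line, while the negated class returns one match per tag.

```php
<?php
// The one-line HTML from the question.
$html = '<body><img src="../../../img.png"><img src="../../img.png">'
      . '<img src="../../img.png"><img src="..//../img.png"><img src="..../../img.png">';

// Question's pattern: .* can run across '>' and swallows everything.
preg_match_all('#<.*[\.]{2}[\/].*>#', $html, $greedy);
// Negated class: [^>]* stops at the '>' that closes the current tag.
preg_match_all('#<[^>]*\.\./[^>]*>#', $html, $perTag);

echo count($greedy[0]), "\n"; // 1: the whole line, starting at <body>
foreach ($perTag[0] as $tag) {
    echo $tag, "\n";          // each <img> tag on its own
}
```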
It appears you might be doing this to prevent parent-directory references (path traversal). You should know that for a URL attribute in an HTML tag, the following tags are considered "equivalent":
<img src="../foo.jpg">
<img src="%2e%2e%2ffoo.jpg">
<img src="&#46;&#46;&#47;foo.jpg">
That's because the src attribute goes through HTML entity un-escaping, and then URL un-escaping (in that order) before being used. As a result, there are 5,832 different ways to write '../' into an HTML tag's path attribute (18 ways to write each of the 3 characters, and 18 × 18 × 18 = 5,832).
Making a regex to match any of these encodings of ../ is more difficult, but still possible.
(\.|&#46;|(%|&#37;)(2|&#50;)([Ee]|&#69;|&#101;)){2}(/|&#47;|(%|&#37;)(2|&#50;)([Ff]|&#70;|&#102;))
For reference:
&#46; = . HTML escape sequence
&#47; = / HTML escape sequence
%2E or %2e = . URL escape sequence
%2F or %2f = / URL escape sequence
&#37; = % HTML escape sequence
&#50; = 2 HTML escape sequence
&#69; = E HTML escape sequence
&#101; = e HTML escape sequence
&#70; = F HTML escape sequence
&#102; = f HTML escape sequence
You can see why people usually say it's better to use a real HTML parser, instead of regex!
Anyway, assuming you need this and a full HTML parser isn't feasible, here's the version of <[^>]*[="'/]\.\./[^>]*> that also catches HTML and URL escaping:
<[^>]*[="'/](\.|&#46;|(%|&#37;)(2|&#50;)([Ee]|&#69;|&#101;)){2}(/|&#47;|(%|&#37;)(2|&#50;)([Ff]|&#70;|&#102;))[^>]*>
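A PHP sketch of the encoding-aware pattern, assuming the HTML escapes are decimal numeric entities (&#46;, &#47;, &#37; and so on). It uses non-capturing groups instead of capturing ones and ~ as the delimiter, since the entities themselves contain #. The $dot and $slash pieces are only a readability split, not part of the answer.

```php
<?php
// Each piece accepts the literal character, its decimal HTML entity, or a
// %2E / %2F URL escape whose own characters may also be entity-encoded.
$dot   = '(?:\.|&#46;|(?:%|&#37;)(?:2|&#50;)(?:[Ee]|&#69;|&#101;))';
$slash = '(?:/|&#47;|(?:%|&#37;)(?:2|&#50;)(?:[Ff]|&#70;|&#102;))';
$pattern = '~<[^>]*[="\'/]' . $dot . '{2}' . $slash . '[^>]*>~';

$tags = [
    '<img src="../foo.jpg">',
    '<img src="&#46;&#46;&#47;foo.jpg">',
    '<img src="%2e%2e%2ffoo.jpg">',
    '<img src=".&#37;2E%2F">',   // mixed: '.', entity-encoded %2E, then %2F
    '<img src="foo.jpg">',
];
foreach ($tags as $tag) {
    echo preg_match($pattern, $tag) ? 'match:    ' : 'no match: ', $tag, "\n";
}
```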

Your regex matches the whole line because it is greedy; try it the way @Avinash Raj suggested in the comments.

To get the regexp you want I will try to follow a step by step approach:
First, we need a regex that matches from the beginning to the end of the tag. But we must be careful, as the tag end character > is allowed inside single- and double-quoted strings. So we first construct the regexp that matches the tag contents, including these quoted strings: ([^"'>]|"[^"]*"|'[^']*')* (a sequence of: a character that is neither a quote (single or double) nor the tag end, or a single-quoted string, or a double-quoted string).
Now, modify it to match a single- or double-quoted string that includes ../: ([^"'>]|"[^"]*\.\.\/[^"]*"|'[^']*\.\.\/[^']*')* (we can simplify it by removing the trailing * operator, as we only need to match one quoted string with a ../ inside, and we can drop the first alternative, as the ../ sequence will be inside a quoted string). We get: ("[^"]*\.\.\/[^"]*"|'[^']*\.\.\/[^']*')
To match a sequence that includes at least one such string, we put the first regex at the beginning and at the end, and the second one in the middle. We get: ([^"'>]|"[^"]*"|'[^']*')*("[^"]*\.\.\/[^"]*"|'[^']*\.\.\/[^']*')([^"'>]|"[^"]*"|'[^']*')*
Now we only need to prefix this regexp with <[iI][mM][gG][ \t\n] and append >, getting:
<[iI][mM][gG][ \t\n]([^"'>]|"[^"]*"|'[^']*')*("[^"]*\.\.\/[^"]*"|'[^']*\.\.\/[^']*')([^"'>]|"[^"]*"|'[^']*')*>
This is the regexp we need. If we extract the content of the second group ($2, \2, etc.), we get the attribute value (quotes included) that contains the ../ string.
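A PHP sketch running the quote-aware pattern with preg_match_all(), assuming ~ as the delimiter and a nowdoc to keep the backslashes intact. The hypothetical sample includes a > inside a quoted attribute and a tag split across two lines; group 2 holds the quoted values that contain ../.

```php
<?php
// The quote-aware <img> pattern, kept in a nowdoc so PHP does not
// reinterpret any backslashes.
$pattern = <<<'RE'
~<[iI][mM][gG][ \t\n]([^"'>]|"[^"]*"|'[^']*')*("[^"]*\.\.\/[^"]*"|'[^']*\.\.\/[^']*')([^"'>]|"[^"]*"|'[^']*')*>~
RE;

// A '>' inside a quoted attribute, a tag without ../, and a tag whose
// attribute starts on the next line.
$html = <<<'HTML'
<body><img alt="a > b" src="../x.png"><img src='ok.png'><img
  src='../../y.png'>
HTML;

preg_match_all($pattern, $html, $m);
print_r($m[2]); // group 2: the quoted attribute values containing ../
```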
Don't try to simplify this further: > characters are allowed inside single- and double-quoted strings, " is allowed in single-quoted strings, and ' is allowed in double-quoted strings. As someone explained in another answer to this question, you cannot be greedy (using .* inside, as you'll eat as much input as possible before matching). This regexp also needs to match multiline tags, as these could be part of your input file. If you have a well-formed HTML file, then you'll have no problem with this regexp.
And some final remarks: an HTML tag is defined by a grammar that is regular (it is only a regular subset of the full HTML syntax), so it is perfectly parseable with a regex (the same is not true for the complete HTML language). A regex is far more efficient and less resource-consuming than a full HTML parser. The caveats are that you have to write it (and write it well), and that HTML parsers, which save you that work, are easily found with some googling; but you only have to write it once. Regexp parsing is a one-pass process whose cost grows (for this example, at least) linearly with input text length. You'll be advised against this by people who simply don't know how to write the right regexp or don't know how to determine whether some grammar is regular.
Note:
This regexp will match commented-out tags. If you don't want to match commented <img> tags, you'll have to extend your regexp a little, or do two passes: eliminate comments first, and then parse tags (the regexp that only recognizes uncommented tags is far more complicated than this). Also, see below for more difficulties you may face in eliminating parent directory references.
Note 2:
As I have read in your comments to some answers, the problem you want to solve (eliminating .. references in HTML/XML sources) is not regular. The reason is that you can have . and .. references embedded in the path strings. Normally, one must first eliminate the /. or ./ components of the path, getting a path without . (current directory) references. Once you have this, you have to eliminate a/.. references, where a is anything other than .. itself. This amounts to eliminating occurrences of a/.., a/b/../.., etc. But the language a^i b^i is not regular (as demonstrated by the pumping lemma) and you'll need a context-free grammar.
Note 3:
If you limit the number of a/b/c/../../.. levels to some maximum bound, you're still able to find a regexp to match this kind of string, but someone can always produce an example that exceeds the bound and breaks your regexp. Remember, you first have to eliminate the single-dot path components, as you can have something like a/b/./././c/./d/.././e/f/.././../../.. in the input. Eliminating the single-dot components first leads to a/b/c/d/../e/f/../../../.. and then you proceed by pairs of <non ..>/.., going from a/b/c/[d/..]/e/f/../../../.. to a/b/c/e/[f/..]/../../.. -> a/b/c/[e/..]/../.. -> a/b/[c/..]/.. -> a/[b/..] -> a (to be precise, you ought to check that the first component of each pair exists before eliminating it). If you get to an empty path, you will have to change it to . to be usable.
I have code to do this process, but it's embedded in some bigger program. If you are interested, you can access this code. (look at the rel_path() routine here)
You cannot eliminate a .. element at the beginning of a path (more precisely, one that has no <non ..> counterpart), as it refers to outside of the tree, making the reference dependent on the external structure of the tree.
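The two-step procedure above can be sketched in PHP with a stack, which does the <non ..>/.. pairing in a single left-to-right pass. This is an illustration, not the rel_path() routine the answer mentions: normalizeRelativePath() is a hypothetical name, it only handles relative paths, and it skips the check that each cancelled directory actually exists.

```php
<?php
function normalizeRelativePath(string $path): string
{
    $stack = [];
    foreach (explode('/', $path) as $part) {
        if ($part === '' || $part === '.') {
            continue; // step 1: single-dot (and empty) components vanish
        }
        if ($part === '..' && $stack !== [] && end($stack) !== '..') {
            array_pop($stack); // step 2: a <non ..>/.. pair cancels out
        } else {
            $stack[] = $part; // includes leading '..' with no counterpart
        }
    }
    // An empty result is written as '.' so it stays usable.
    return $stack === [] ? '.' : implode('/', $stack);
}

echo normalizeRelativePath('a/b/c/d/../e/f/../../../..'), "\n"; // a
echo normalizeRelativePath('../x/./y/..'), "\n";                 // ../x
```

Leading .. components stay on the stack because they have nothing to cancel against, which matches the rule in the paragraph above.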

Regex to match the first semi colon in a php code

I have a regular expression that is used to redefine a constant in a PHP file using preg_match; the input file has been passed through htmlspecialchars.
E.g. for
define('MEMBERSHIP', 'GOLD');
the following regex works
/define.*[&quote\']' . $constant . '[&quote;\'].*;/i
However, it matches up to the last semicolon. That works in most scenarios, but fails in a case like the following:
define("MEMBERSHIP", 'GOLD'); // membership subscription; empty means not in use.
Notice the semicolon inside the comment; the replaced code ends up as:
define("MEMBERSHIP", 'SILVER'); empty means not in use.
which breaks the code. I tried the regex below, but it didn't work for definitions that use double quotes:
/define.*[&quote;\']' . $constant . '[&quote;\'][^;]*;/i
any idea how to fix this?

If you add a ? after the *, it becomes lazy (non-greedy) and takes the smallest possible number of characters. So try
/define.*?[&quote\']' . $constant . '[&quote;\'].*?;/i
to see if it does what you want.
Generally speaking, you should avoid using . if you don't actually mean "any character".

You can match corresponding quotes by using a backreference:
'/define[^"\']*(["\'])' . $constant . '\1[^;]*;/i'
Otherwise, the negated character class you have at the end is definitely the way to go.
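A PHP sketch comparing the greedy pattern, the lazy one and the backreference one on the problem line. It assumes the source is raw PHP (not passed through htmlspecialchars), so only " and ' need to be recognised as quotes; preg_quote() is an added safeguard for constant names that contain regex metacharacters.

```php
<?php
$line = 'define("MEMBERSHIP", \'GOLD\'); // membership subscription; empty means not in use.';
$c = preg_quote('MEMBERSHIP', '/');
$replacement = "define('MEMBERSHIP', 'SILVER');";

$patterns = [
    // .* runs to the LAST ';' on the line, inside the comment.
    'greedy .*'     => '/define.*["\']' . $c . '["\'].*;/i',
    // .*? stops at the first ';' after the constant name.
    'lazy .*?'      => '/define.*?["\']' . $c . '["\'].*?;/i',
    // \1 requires the closing quote to match the opening one.
    'backreference' => '/define[^"\']*(["\'])' . $c . '\1[^;]*;/i',
];

foreach ($patterns as $name => $pattern) {
    echo str_pad($name, 14), preg_replace($pattern, $replacement, $line), "\n";
}
// The greedy version loses "// membership subscription;"; the other two
// keep the comment intact.
```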

Can't get Regex working in PHP, works in RegEXP program

Here is the input I am searching:
\u003cspan class=\"prs\">email_address@me.com\u003c\/span>
I'm trying to return just email_address@me.com.
My regex class=\\"prs\\">(.*?)\\ returns "class=\"prs\">email_address@me.com\" in RegExp, which is OK; I can work with that result.
But I can't get it to work in PHP.
$regex = "/class=\\\"prs\\\">(.*?)\\/";
Gives me an error "No ending delimiter"
Can someone please help?

Your original code:
$regex = "/class=\\\"prs\\\">(.*?)\\/";
The reason you get No ending delimiter is that although you are escaping the backslash prior to the closing forward slash, what you have done is escaped it in the context of the PHP string, not in the context of the regex engine.
So the PHP string escaping mechanism does its thing, and by the time the regex engine gets it, it will look like this:
/class=\"prs\">(.*?)\/
This means that the regular expression engine will see the backslash at the end of the expression as escaping the forward slash that you are intending to use to close the expression.
The usual PHP solution to this kind of thing is to switch to a single-quoted string instead of a double-quoted one, but this still won't work, as \\ is an escaped backslash in both single- and double-quoted strings.
What you need to do is double up the number of backslash characters at the end of your string, so your code needs to look like this:
$regex = "/class=\\\"prs\\\">(.*?)\\\\/";
The way to prove what it's doing is to print the contents of the $regex variable, so you can see what the string will look like to the regex engine. These kinds of errors are actually very hard to spot, but looking at the actual content of the string will help you spot them.
Hope that helps.
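One way to sidestep the double layer of escaping, assuming the input really contains the backslashes shown in the question, is a nowdoc (PHP 5.3+): PHP does no escape processing inside it, so the pattern that worked in the regex tester can be pasted between / delimiters as-is. The sketch also shows the equivalent double-quoted spelling, where each regex backslash becomes \\\\ and each quote \".

```php
<?php
// Input as shown in the question, with literal backslashes.
$input = <<<'TXT'
\u003cspan class=\"prs\">email_address@me.com\u003c\/span>
TXT;

// Nowdoc: exactly the tester's pattern, no PHP escaping involved.
$regex = <<<'RE'
/class=\\"prs\\">(.*?)\\/
RE;

// The same pattern written as a double-quoted PHP string.
var_dump($regex === "/class=\\\\\"prs\\\\\">(.*?)\\\\/"); // bool(true)

if (preg_match($regex, $input, $m)) {
    echo $m[1], "\n"; // email_address@me.com
}
```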

If you change to single quotes, you still have to double the backslashes, because \\ is an escape in single-quoted strings too:
$regex = '/class=\\\\"prs\\\\">(.*?)\\\\/';

How do I match a square bracket literal using RegEx?

What's the regex to match a square bracket? I'm using \\] in a pattern in eregi_replace, but it doesn't seem to be able to find a ]...

\] is correct, but note that PHP itself ALSO has \ as an escape character, so you might have to use \\[ (or a different kind of string literal).

Works flawlessly:
<?php
$hay = "ab]cd";
echo eregi_replace("\]", "e", $hay);
?>
Output:
abecd

There are two ways of doing this:
/ [\]] /x;
/ \] /x;
While you may consider the latter the better option, and indeed I would use it in simpler regexps, I would consider the former the better option for larger regexps. Consider the following:
/ (\w*) ( [\d\]] ) /x;
/ (\w*) ( \d | \] ) /x;
In this example, the former is my preferred solution. It does a better job of combining the separate entities, which may each match at the given location. It may also have some speed benefits, depending on implementation.
Note: This is in Perl syntax, partly to ensure proper highlighting.
In PHP you may need to double up on the back-slashes.
"[\\]]" and "\\]"

You don't need to escape it: if isolated, a ] is treated as a regular character.
Tested with eregi_replace and preg_replace.
[ is another beast: you have to escape it. It looks like single or double quotes, with a single or double backslash, are all treated the same by PHP, for both regex families.
Perhaps your problem is elsewhere in your expression, you should give it in full.
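Since the ereg/eregi functions were removed in PHP 7.0, here is a sketch of the same checks with preg_replace(), assuming the sample string ab]cd[ef: an escaped ], a ] inside a class, a lone ], and an escaped [ written in a double-quoted string.

```php
<?php
$hay = 'ab]cd[ef';

echo preg_replace('/\]/', 'e', $hay), "\n";   // escaped ]          -> abecd[ef
echo preg_replace('/[\]]/', 'e', $hay), "\n"; // ] in a class       -> abecd[ef
echo preg_replace('/]/', 'e', $hay), "\n";    // lone ] is literal  -> abecd[ef
echo preg_replace("/\\[/", 'x', $hay), "\n";  // [ must be escaped  -> ab]cdxef
```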

In .NET you escape special characters by adding a backslash, so "[" would become "\["...
Though since you normally do this in string literals, you would have to do either something like this:
@"\["
or something like this:
"\\["

Your problem may come from the fact that you are using eregi_replace with the first parameter enclosed in single quotes:
'\['
In double quotes, though, it could work well depending on the context, since that changes the way the parameter is passed to the function (single quotes pass the string with almost no interpretation, hence the need to double the "\" character).
Here, if "\[" is interpreted as an escape character, you still need to double the "\".
Note: based on your comment, you may try the regex
<\s*(?:br|p)\s*\/?\s*\>\s*\[
in order to detect a [ right after a <br> or a <p>
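A quick PHP check of that pattern on a few hypothetical sample strings; note that it is case-sensitive as written, so <BR> would need the i modifier.

```php
<?php
// Optional whitespace around the tag name, an optional '/', then '[' after '>'.
$pattern = '/<\s*(?:br|p)\s*\/?\s*\>\s*\[/';

$samples = ['line one<br /> [note]', '<p>[x]', '<b>[x]', '<br>text ['];
foreach ($samples as $s) {
    echo preg_match($pattern, $s) ? 'match:    ' : 'no match: ', $s, "\n";
}
```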
