Python regex to find PHP functions in files

I'm looking for a regular expression to match all function blocks (from start to end) in PHP files. For example:
function test_function($var) {
if ($var == 'somethin') {
print 'hi';
}
etc.
}
I need the start offset and end offset of the block. What regex can I use?

This is very complicated and can't be done with a single regular expression.
You may think that you can easily match the beginning of a function like this:
\bfunction\b\s+\S+[^\(](\s+)?\(.*?\)\s+\{
But you can't, because what if the code contains something like this?
$string = "function myfunction() {}";
So you need to search only the text that isn't quoted. To exclude quoted strings you can use this regular expression:
(?:(?=(?:(?:\\.|"(?:\\.|[^"\\])*"|[^\\'"])*'(?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*')*(?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*$)(?=(?:(?:\\.|'(?:\\.|[^'\\])*'|[^\\'"])*"(?:\\.|'(?:\\.|[^'"\\])*'|[^\\"])*")*(?:\\.|'(?:\\.|[^'\\])*'|[^\\"])*$)(?:\\.|[^\\'"]))+
The next step is to count every { and } so that you know where the function ends. I can't think of any regular expression that can do this, so you need to loop through the text.
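The brace-counting loop described above can be sketched in Python, since the question asks for a Python solution. This is a minimal sketch, not a PHP parser: the names HEADER, skip_string, find_block_end and find_functions are made up for illustration, and it assumes the source contains no comments or heredocs with stray quotes or braces and that parameter lists contain no ")".

```python
import re

# Hypothetical header pattern: "function name(params) {"
HEADER = re.compile(r'\bfunction\b\s+&?\s*\w+\s*\([^)]*\)\s*\{')


def skip_string(src, i):
    """Return the index just past the quoted string that starts at src[i]."""
    quote = src[i]
    i += 1
    while i < len(src):
        if src[i] == '\\':      # skip the escaped character
            i += 2
        elif src[i] == quote:   # closing quote found
            return i + 1
        else:
            i += 1
    return i                    # unterminated string: stop at end of input


def find_block_end(src, open_idx):
    """Given the index of '{', return the index just past its matching '}'."""
    depth = 0
    i = open_idx
    while i < len(src):
        ch = src[i]
        if ch in '\'"':
            i = skip_string(src, i)
            continue
        if ch == '{':
            depth += 1
        elif ch == '}':
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return None                 # braces never balanced


def find_functions(src):
    """Return (start, end) offsets of each top-level function block found."""
    results = []
    i = 0
    while i < len(src):
        if src[i] in '\'"':     # never look for headers inside strings
            i = skip_string(src, i)
            continue
        m = HEADER.match(src, i)
        if m:
            end = find_block_end(src, m.end() - 1)
            if end is None:
                break
            results.append((m.start(), end))
            i = end             # nested functions inside the block are skipped
            continue
        i += 1
    return results
```

Each returned pair can be used directly as a slice, so src[start:end] is the full text of one function block.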

Look at this: https://github.com/ramen/phply

Related

How to parse a string of function parameters with REGEX in PHP

I am trying to handle parameters like Java or PHP natively handle them, using Regex to parse variable numbers (and types) of arguments. For example, a function might be:
util.echo(5, "Hello, world!");
In this instance, I would want to separate 5 as the first argument and "Hello, world!" as the second (without quotes). What I currently do is explode by commas, but that runs into issues if the string parameters include a comma. I don't have much experience with Regex, but I think it has some way of ignoring commas that are within quotes.
The Regex from this question (",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)") seems like it could work, but I'm confused on how to implement it with PHP.
To test a regular expression against a string, you can use the preg_match() function in PHP.
See the manual
// $matches will hold the matches
preg_match("/,(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)/", $to_be_checked, $matches);
if($matches){
var_dump($matches);
// there was a match!
} else {
// the regular expression did not find any pattern matches
}
If you don't need the matches themselves, only whether there was at least one match, you can simply do this:
if(preg_match("/,(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)/", $to_be_checked)){
// there was a match!
} else {
// the regular expression did not find any pattern matches
}
Thank you to Zac and The fourth bird! Example solution that works, for future reference:
$parameters = 'util.echo(5, "Hello, world!");';
preg_match_all('/(?:[^\s(]+\(|\G(?!^),\s*)\K(?:"[^"]*"|[^,()]+)(?=[^()]*\);)/', $parameters, $matches);
if($matches){
var_dump($matches[0]);
} else {
echo('No matches.');
}
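For comparison, the comma-outside-quotes lookahead from the question can also be used for splitting rather than matching. The sketch below does this in Python instead of PHP; parse_args and SPLIT are hypothetical names, and it assumes double-quoted arguments with no escaped quotes and no nested calls.

```python
import re

# A comma followed by an even number of double quotes up to the end of
# the input, i.e. a comma that is not inside a quoted string.
SPLIT = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')


def parse_args(call):
    """Split the argument list of a call such as util.echo(5, "a, b");"""
    inner = call[call.index('(') + 1:call.rindex(')')]
    # Trim whitespace, then drop the surrounding quotes of string arguments.
    return [part.strip().strip('"') for part in SPLIT.split(inner)]


print(parse_args('util.echo(5, "Hello, world!");'))
# ['5', 'Hello, world!']
```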

Regex formatting and design for a query

I'm having some trouble formatting my regular expression for my PHP code using preg_match().
I have a simple string usually looking like this:
"q?=%23asdf".
I want my regular expression to pass only if the string begins with "q?=%23" and at least one character follows the 3. One problem so far is that the ? is interpreted by the regex engine as a quantifier, so something like
^q?=23 doesn't work. I also can't figure out how to require more characters after the 3.
So for clarification: "q?=%23asd" should PASS and "q?=%23" should FAIL
I'm no good with Regex so sorry if this seems like a beginner question and thanks in advance.
Just use a lookahead to check whether the character following the 3 is a letter:
^q\?=%23(?=[a-zA-Z])
Use . instead of [a-zA-Z] if you want to accept any character after the 3:
^q\?=%23(?=.)
Code would be,
$theregex = '~^q\?=%23(?=[a-z])~i';
if (preg_match($theregex, $yourstring)) {
// Yes! It matches!
}
else { // nah, no luck...
}
So the requirement is: Start with q?=%23, followed by at least one [a-z], the pattern could look like:
$pattern = '/^q\?=%23[a-z]+/i';
Used i (PCRE_CASELESS) modifier. Also see example at regex101.
$string = "q?=%23asdf";
var_dump(figureOut($string));
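function figureOut($string){
    // === 0 is required: strpos() returns false when nothing is found, and false == 0 is true
    if(strpos($string, 'q?=%23') === 0){
        // the prefix is 6 characters long, so a longer string has something after the 3
        return strlen($string) > 6;
    }
    return false;
}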
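The same check reads almost identically in other regex flavours. Here is a small Python sketch of the escaped-question-mark pattern from the answers above; has_query_value is a hypothetical name, and letters only are assumed after the 3, as in the first answer.

```python
import re

# \? matches a literal question mark; [a-z]+ requires at least one
# letter after "%23". re.match only matches at the start of the string.
QUERY = re.compile(r'q\?=%23[a-z]+', re.IGNORECASE)


def has_query_value(s):
    return QUERY.match(s) is not None


print(has_query_value('q?=%23asd'))  # True
print(has_query_value('q?=%23'))     # False
```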

Regex to detect everything not between {} and then search within what's matched

I am very new to both stackoverflow and Regexes so please forgive mistakes.
I have been searching thoroughly for a Regex to match all text that is not between curly brackets {} and from that text find certain words. For example from the following string:
$content = 'Hello world, { this } is a string with { curly brackets } and this is for testing';
I would like a search for the word this to return only the second occurrence, because it's in the area that is not inside curly brackets.
Even a Regex that just matches the substrings outside the curly brackets would simplify things for me. I found this Regex /(}([^}]*){)/ but it cannot select the parts "Hello world," and "and this is for testing" because these are not between } and {; it only selects the "is a string with" part.
Also I would like to ask if it is possible to combine two Regex for a single purpose like mine. For example the first Regex finds strings outside {} and second finds specific words that are searched for.
I want to use this Regex in PHP; for now I am using a function that is more of a hack. The purpose is to find specific words that are not in {}, replace them reliably, and write the result to text files.
Thanks in advance for your help.
(*SKIP)(*F)
You're in luck, as PHP's PCRE regex engine has a syntax that is wonderful for this kind of task. This tidy regex works like a charm (see demo):
{[^{}]*}(*SKIP)(*F)|\bthis\b
Okay, but how does it work?
Glad you asked. The left side of the alternation | matches complete {braces} then deliberately fails, after which the engine skips to the next position in the string. The right side matches the this words you want, and we know they're the right ones because they weren't matched by the expression on the left...
How to use it in PHP
Just the usual, something like:
$regex = "~{[^{}]*}(*SKIP)(*F)|\bthis\b~";
$count = preg_match_all($regex,$string,$matches);
You'll want to have a look at $matches[0]
Further reading about this and similar exclusion techniques
This situation is very similar to this question about "regex-matching a pattern unless...", which, if you're interested and enjoyed (*SKIP) power, you might like to read to fully understand the technique and how it can be extended.
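Python's built-in re module has no (*SKIP)(*F) verbs, so outside PCRE a different technique is needed. The sketch below swaps in the related capture-group trick: match the {braces} without capturing, capture the word in group 1, and act only when group 1 took part in the match. The function names are hypothetical, and nested braces are assumed not to occur.

```python
import re


def _pattern(word):
    # Left side consumes a whole {...} block; right side captures the word.
    return re.compile(r'\{[^{}]*\}|\b(' + re.escape(word) + r')\b')


def offsets_outside_braces(text, word):
    """Start offsets of every occurrence of word that is outside {...}."""
    return [m.start(1) for m in _pattern(word).finditer(text)
            if m.group(1) is not None]


def replace_outside_braces(text, word, replacement):
    """Replace word only where it appears outside {...}."""
    def keep_or_swap(m):
        # Brace blocks are put back unchanged; captured words are replaced.
        return replacement if m.group(1) is not None else m.group(0)
    return _pattern(word).sub(keep_or_swap, text)


content = ('Hello world, { this } is a string with { curly brackets } '
           'and this is for testing')
print(replace_outside_braces(content, 'this', 'that'))
# Hello world, { this } is a string with { curly brackets } and that is for testing
```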
For strings that are not very long, I'd use simple string manipulation functions to make them searchable:
$content = 'Hello world, { this } is a string with { curly brackets } and this is for testing';
function searchify($stack,$charStart='{',$charEnd='}') {
$searchArea = '';
$first = explode($charStart,$stack);
foreach ($first as $string) {
// !== false is needed because strpos() returns 0 when $charEnd is the first character
list($void,$ok) = (strpos($string,$charEnd) !== false ? explode($charEnd,$string,2) : array('',$string));
$searchArea.= $ok;
}
return $searchArea;
}
This returns the string with the bracketed parts removed. Then use strtr...
$replacing = array
('with'=>'this',
"\n"=>'<br>',
' '=>"<br>",);
$raw = searchify($content);
$replaced = strtr($raw,$replacing);
var_dump($replaced);
...to replace values in it.

Converting Perl Regex to PHP

I have the following Regex in Perl which I need to convert to PHP:
if ($referrer_url =~ /\.$domain/) { }
I currently have the following in PHP to match it, but I'm not getting the same results:
if (preg_match("/\.$domain/", $referrer_url)) { }
Can anyone tell me if what I have is the same or if I'm mistaken? Thanks!
I'm guessing that your $domain contains dots, as in mysite.com. If so, you need to use preg_quote on the variable:
if (preg_match("/\.".preg_quote($domain, "/")."/", $referrer_url)) { }
If $domain is a plain string, you may prefer strpos, which finds the position of the first occurrence of a substring in a string. This achieves the same result as the preg_quote version and is easier to read:
if (strpos($referrer_url, ".$domain") !== false) {
}
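The effect of quoting is easy to demonstrate. The sketch below uses Python, where re.escape plays the role of preg_quote; referrer_matches and the sample URLs are made up for illustration.

```python
import re


def referrer_matches(referrer_url, domain):
    """True if the URL contains a literal dot followed by the literal domain."""
    return re.search(r'\.' + re.escape(domain), referrer_url) is not None


domain = 'mysite.com'
# Unescaped, the dot inside the domain matches any character:
print(re.search(r'\.' + domain, 'http://www.mysiteXcom/') is not None)  # True
# Escaped, only a real dot is accepted:
print(referrer_matches('http://www.mysiteXcom/', domain))   # False
print(referrer_matches('http://www.mysite.com/', domain))   # True
```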

Google Style Regular Expression Search

It's been several years since I have used regular expressions, and I was hoping I could get some help on something I'm working on. You know how Google's search is quite powerful: it treats text inside quotes as a literal phrase and excludes terms that have a minus sign in front of them.
Example: "this is literal" -donotfindme site:examplesite.com
This example would search the website examplesite.com for the phrase "this is literal" on pages that don't include the word donotfindme.
Obviously I'm not looking for something as complex as Google; I just wanted to show where my project is heading.
Anyway, I first wanted to start with the basics which is the literal phrases inside quotes. With the help of another question on this site I was able to do the following:
(this is php)
$search = 'hello "this" is regular expressions';
$pattern = '/".*"/';
$regex = preg_match($pattern, $search, $matches);
print_r($matches);
But this outputs "this" instead of the desired this, and doesn't work at all for multiple phrases in quotes. Could someone point me in the right direction?
I don't necessarily need code; even a good place with tutorials would probably do the job.
Thanks!
Well, for this example at least, if you want to match only the text inside the quotes you'll need to use a capturing group. Write it like this:
$pattern = '/"(.*)"/';
and then $matches will be an array of length 2 that contains the text between the quotes in element 1. (It'll still contain the full text matched in element 0) In general, you can have more than one set of these parentheses; they're numbered from the left starting at 1, and there will be a corresponding element in $matches for the text that each group matched. Example:
$pattern = '/"([a-z]+) ([a-z]+) (.*)"/';
will select all quoted strings which have two lowercase words separated by a single space, followed by anything. Then $matches[1] will be the first word, $matches[2] the second word, and $matches[3] the "anything".
For finding multiple phrases, you'll need to pick out one at a time with preg_match(). There's an optional "offset" parameter you can pass, which indicates where in the string it should start searching, and to find multiple matches you should give the position right after the previous match as the offset. See the documentation for details.
You could also try searching Google for "regular expression tutorial" or something like that, there are plenty of good ones out there.
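To illustrate capturing groups with several quoted phrases, here is a short Python sketch (quoted_phrases and greedy_phrase are hypothetical names). It also shows why the greedy .* is a problem once there is more than one phrase: it runs from the first quote to the last one. Using [^"]* keeps each match inside one pair of quotes.

```python
import re


def quoted_phrases(search):
    """Return the text of every double-quoted phrase, without the quotes."""
    # With exactly one group, findall returns that group's text per match.
    return re.findall(r'"([^"]*)"', search)


def greedy_phrase(search):
    """The same idea with the greedy pattern, for comparison."""
    m = re.search(r'"(.*)"', search)
    return m.group(1) if m else None


s = 'hello "this" is "regular expressions"'
print(quoted_phrases(s))  # ['this', 'regular expressions']
print(greedy_phrase(s))   # this" is "regular expressions
```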
Sorry, my PHP is a bit rusty, but this code will probably do what you want:
$search = 'hello "this" is regular expressions';
$pattern = '/"(.*)"/';
$regex = preg_match($pattern, $search, $matches);
print_r($matches[1]);
$matches[1] will contain the first captured subexpression; $matches[0] contains the full matched pattern.
See preg_match in the PHP documentation for specifics about subexpressions.
I'm not quite sure what you mean by "multiple phrases in quotes", but if you're trying to match balanced quotes, it's a bit more involved and tricky to understand. I'd pick up a reference manual. I highly recommend Mastering Regular Expressions, by Jeffrey E. F. Friedl. It is, by far, the best aid to understanding and using regular expressions. It's also an excellent reference.
Here is a complete answer covering all kinds of search terms (literal, minus, quoted, ...) with replacements (for Google visitors at least).
That said, maybe it should not be done with regular expressions alone.
A single huge, complex regular expression would be hard for you or other developers to maintain and extend,
and the approach below may even be faster.
It might still need a lot of improvement, but here is a complete working solution in a class. There is a bit more in here than the question asked for, but it illustrates the reasons behind some of the choices.
class mySearchToSql extends mysqli {
    // Collected WHERE fragments
    protected $W = array();
    // Assumption: the original snippet used an undefined constant L, presumably a line break
    const L = PHP_EOL;

    protected function filter($what) {
        if (isset($what)) {
            //echo '<pre>Search string: '.var_export($what,1).'</pre>';//debug
            //Split into plain terms [1], quoted phrases [2] and minus terms [3]
            preg_match_all('/([^"\-\s]+)|(?:"([^"]+)")|-(\S+)/i',$what,$split);
            //echo '<pre>'.var_export($split,1).'</pre>';//debug
            //Surround with SQL
            array_walk($split[1],array($this,'sur'),array('`Field` LIKE "%','%"'));
            array_walk($split[2],array($this,'sur'),array('`Desc` REGEXP "[[:<:]]','[[:>:]]"'));
            array_walk($split[3],array($this,'sur'),array('`Desc` NOT LIKE "%','%"'));
            //echo '<pre>'.var_export($split,1).'</pre>';//debug
            //Add AND or OR
            $this->where($split[3])
                 ->where(array_merge($split[1],$split[2]), true);
        }
    }
    // Escape a non-empty value and wrap it in the given SQL prefix and suffix
    protected function sur(&$v,$k,$sur) {
        if (!empty($v))
            $v=$sur[0].$this->real_escape_string($v).$sur[1];
    }
    function where($s,$OR=false) {
        if (empty($s)) return $this;
        if (is_array($s)) {
            $s=(array_filter($s));
            if (empty($s)) return $this;
            if($OR==true)
                $this->W[]='('.implode(' OR ',$s).')';
            else
                $this->W[]='('.implode(' AND ',$s).')';
        } else
            $this->W[]=$s;
        return $this;
    }
    function showSQL() {
        echo $this->W ? 'WHERE '.implode(self::L.' AND ',$this->W).self::L : '';
    }
}
Thanks for all stackoverflow answers to get here!
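The splitting step of the class above can be tried on its own. This Python sketch reuses the same three-way alternation to sort a query into plain terms, quoted phrases and excluded words; parse_query and the dictionary keys are hypothetical, and no SQL is generated. Note that with this pattern a hyphen inside a word (as in well-known) starts an exclusion.

```python
import re

# Group 1: plain term, group 2: quoted phrase, group 3: word after a minus.
TOKEN = re.compile(r'([^"\-\s]+)|"([^"]+)"|-(\S+)')
KINDS = {1: 'terms', 2: 'phrases', 3: 'excluded'}


def parse_query(query):
    """Sort the parts of a Google-style query into three lists."""
    result = {'terms': [], 'phrases': [], 'excluded': []}
    for m in TOKEN.finditer(query):
        # lastindex is the number of the group that matched (1, 2 or 3).
        result[KINDS[m.lastindex]].append(m.group(m.lastindex))
    return result


print(parse_query('"this is literal" -donotfindme site:examplesite.com'))
# {'terms': ['site:examplesite.com'], 'phrases': ['this is literal'], 'excluded': ['donotfindme']}
```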
You're in luck because I asked a similar question regarding string literals recently. You can find it here: Regex for managing escaped characters for items like string literals
I ended up using the following for searching for them and it worked perfectly:
(?<!\\)(?:\\\\)*(\"|')((?:\\.|(?!\1)[^\\])*)\1
This regex differs from the others as it properly handles escaped quotation marks inside the string.
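A quick way to see the escaped-quote handling is to run the pattern over a sample. The sketch below uses Python, whose re module accepts this pattern unchanged; literal_bodies and the sample text are made up for illustration.

```python
import re

# Group 1 is the opening quote, group 2 the body. Inside the body a
# backslash always consumes the next character, so \" cannot end the string.
STRING_LITERAL = re.compile(
    r'''(?<!\\)(?:\\\\)*("|')((?:\\.|(?!\1)[^\\])*)\1''')


def literal_bodies(source):
    """Return the body of every quoted string literal found in source."""
    return [m.group(2) for m in STRING_LITERAL.finditer(source)]


sample = r'x = "he said \"hi\"" + ' + "'ok'"
print(literal_bodies(sample))
```

For this sample the function returns two bodies: the double-quoted one, with its escaped quotes still in place, and ok.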
