RegExp - Finding words not starting with a certain char - php

I'm trying to extract words (read: functions) from a string with RegExp and pass them to a PHP function.
The following works pretty well already:
$func = preg_replace("/(\b.+\b)/Ue", 'extract_functions(\'\\1\')', $oneliner);
While it extracts existing functions from the string, it also extracts variables with the same name, minus the leading $ character.
So if the string contains an existing function named get_function, it also extracts a variable named $get_function, but without the leading $, so I can't tell whether I extracted a function or a variable.
My idea was to exclude words starting with $, but that doesn't seem to work:
$func = preg_replace("/[(\b[^\$].+\b)/Ue", 'extract_functions(\'\\1\')', $oneliner);
I'm out of ideas...

You can use a negative lookbehind to make sure that there's no $ preceding your function/variable:
$func = preg_replace('/(?<!\$)(\b.+\b)/Ue', 'extract_functions(\'\\1\')', $oneliner);
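By the way, [(\b[^\$] is malformed. It is a single character class containing (, \b (which means backspace inside a class), [, ^ and $, so it matches any one of those characters instead of excluding a $ character. Because the ( that was meant to open your group ends up inside the class, the later ) is unbalanced and the pattern does not compile at all.
It would have been a little closer with /[^$](\b.+\b)/, but [^$] has to consume one character before the word, so it will not match a word at the very beginning of the string.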
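The /e modifier used in the snippets above was deprecated in PHP 5.5 and removed in PHP 7.0, so on current PHP the same lookbehind idea needs preg_replace_callback (or simply preg_match_all if you only want to collect names). A minimal sketch, assuming function names consist only of word characters (so \w+ stands in for the ungreedy .+); the $found array is a hypothetical stand-in for whatever the asker's extract_functions() does, which isn't shown:

```php
<?php
$oneliner = '$get_function = get_function($x);';
$found = [];

// preg_replace_callback calls a function for every match instead of
// evaluating the replacement string as PHP code the way /e did.
// Single quotes keep "\$" intact, so PCRE sees an escaped (literal) dollar
// and (?<!\$) rejects any word that directly follows a "$".
$result = preg_replace_callback(
    '/(?<!\$)\b\w+\b/',
    function (array $m) use (&$found): string {
        $found[] = $m[0]; // e.g. call the asker's extract_functions($m[0]) here
        return $m[0];     // return the match unchanged so the string is kept as-is
    },
    $oneliner
);

// $found now holds ['get_function']; $result is identical to $oneliner
print_r($found);
```

Note the quoting: in a double-quoted PHP string "\$" collapses to a bare $, which PCRE reads as an end-of-string anchor rather than a literal dollar sign.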

Thanks to Jeff, here's the solution that works for me:
$filecontent = file_get_contents($file); // Parsing the file's contents into a string
$re = '/(?<!\$)(\b\S+?\b)(?=\()/'; // The pattern
preg_match_all($re, $filecontent, $out, PREG_PATTERN_ORDER);
print_r($out[0]);
I'm using a negative lookbehind as suggested by Jeff as well as a positive lookahead checking for a ( after each word, but without making the ( part of the match.
I went for the ( because, as far as I'm concerned, that is what marks a PHP function call.
I'm open to improvements! :-) Thanks to Jeff!
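One possible improvement, offered as a sketch: \S+? also matches punctuation, so on a line such as echo 'Greetings'.greet("I'm Johan"); the lookahead only succeeds after the match has run from Greetings across '. to greet. Restricting the name to word characters avoids that. The sample $source string below is made up for illustration:

```php
<?php
$source = 'echo \'Greetings\'.greet("I\'m Johan"); $f = $get_function(); get_function();';

// Pattern from the solution above: \S+? can run across punctuation.
preg_match('/(?<!\$)(\b\S+?\b)(?=\()/', $source, $loose);
// $loose[1] is Greetings'.greet rather than greet

// Word characters only, plus \b so a match cannot start mid-word
// (e.g. "et_function" right after "$g").
preg_match_all('/(?<!\$)\b\w+(?=\()/', $source, $strict);
// $strict[0] is ['greet', 'get_function']; the variable call $get_function() is skipped

print_r($strict[0]);
```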

Related

How to get a number from an HTML source page?

I'm trying to retrieve the "followed by" count on my Instagram page. I can't seem to get the regex right and would very much appreciate some help.
Here's what I'm looking for:
y":{"count":
That's the beginning of the string, and I want the 4 digits after it.
$string = preg_replace("{y"\"count":([0-9]+)\}","",$code);
Someone suggested this ^ but I can't get the formatting right...
You haven't posted your strings, so the exact regex is a guess; instead I'll explain why your attempts fail.
preg_replace('"followed_by":{"count":\d')
This is very far from the correct preg_replace usage. You need to give it the replacement string and the string to search on. See http://php.net/manual/en/function.preg-replace.php
Your second usage:
$string = preg_replace(/^y":{"count[0-9]/","",$code);
This is closer, but preg_replace replaces every match, so it searches your whole file (or would, if not for the ^ anchor) and replaces whatever it finds with nothing. What you really want (I think) is preg_match, which hands the match back to you instead.
$string = preg_match('/y":\{"count(\d{4})/', $code, $match);
$counted = $match[1];
This presumes your regex was kind of correct already.
Per your update:
Demo: https://regex101.com/r/aR2iU2/1
$code = 'y":{"count:1234';
$string = preg_match('/y":\{"count:(\d{4})/', $code, $match);
$counted = $match[1];
echo $counted;
PHP Demo: https://eval.in/489436
I removed the ^, which requires the match to begin at the start of your string, escaped the {, and made the \d match exactly 4 digits. The () is a capture group that stores whatever is matched inside it, in this case the 4 digits.
Also if this isn't just for learning you should be prepared for this to stop working at some point as the service provider may change the format. The API is a safer route to go.
This regexp should capture the value you're looking for in the first group:
\{"count":([0-9]+)\}
Use it with preg_match_all to capture what you want into an array (you're using preg_replace, which isn't for retrieving data but for... well, replacing it).
Your regexp isn't working because you didn't escape the curly brackets. You also didn't add a quantifier (the plus sign in my example), so it would only capture the first digit anyway.
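A sketch combining both answers with preg_match, assuming the count sits in a JSON-like fragment shaped like the sample $code below (the real Instagram markup is not shown in the question, so this shape is an assumption). Anchoring on "followed_by" keeps a neighbouring count from being picked up, and \d+ accepts counts of any length:

```php
<?php
// Assumed shape of the relevant part of the page source (not taken from Instagram).
$code = '{"user":{"followed_by":{"count":1234},"follows":{"count":56}}}';

$counted = null;
// Escape the braces and capture one or more digits in group 1.
if (preg_match('/"followed_by":\{"count":(\d+)\}/', $code, $match)) {
    $counted = (int) $match[1];
}

echo $counted, "\n"; // 1234
```

If the fragment you extract is valid JSON on its own, json_decode($code, true) is sturdier than a regex, since it does not depend on key order or whitespace.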

Regex to extract substring

Really struggling with this... hopefully someone can put me on the right path to a solution.
My input string is structured like this:
66-2141-A-AC107-7
I'm interested in extracting the string 'AC107' using a single regular expression. I know how to do this with other PHP string functions, but I have to do this with a regular expression.
What I need is to extract all data between the third and fourth hyphens. The structure of each section is not fixed (e.g., 66 may be 8798709 and 2141 may be 38). The number of hyphens is guaranteed (i.e., there will always be a total of four (4) hyphens).
Any help/guidance is greatly appreciated!
This will do what you need:
(?:[^-]*-){3}([^-]+)
Explanation:
(?:[^-]*-) Look for zero or more non-hyphen characters followed by a hyphen
{3} Look for three of the blocks just described
([^-]+) Capture all the consecutive non-hyphen characters from that point forward (will automatically cut off before the next hyphen)
You can use it in PHP like this:
$str = '66-2141-A-AC107-7';
preg_match('/^(?:[^-]*-){3}([^-]+)/', $str, $matches);
echo $matches[1]; // prints AC107
This looks for anything followed by a hyphen, three times; group 2 (the second set of parentheses) then holds your value, followed by another hyphen and anything else.
/^(.*-){3}(.*)-(.*)/
You can access it by using $2. In php, it would be like this:
$string = '66-2141-A-AC107-7';
preg_match('/^(.*-){3}(.*)-(.*)/', $string, $matches);
$special_id = $matches[2];
print $special_id;
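If you also want malformed input to be rejected rather than partially matched, a variant of the first answer can anchor both ends so the whole string must contain exactly four hyphens. A sketch; the second and third inputs are made-up samples:

```php
<?php
// (?:[^-]*-){3} skips three hyphen-terminated sections, ([^-]+) captures the
// fourth section, and -[^-]*$ requires exactly one more hyphen before the end.
$pattern = '/^(?:[^-]*-){3}([^-]+)-[^-]*$/';

foreach (['66-2141-A-AC107-7', '8798709-38-B-XY9-12', '1-2-3'] as $str) {
    if (preg_match($pattern, $str, $m)) {
        echo $str, ' => ', $m[1], "\n";
    } else {
        echo $str, " => no match\n";
    }
}
```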

Finding used functions from php source file

Basically, I want to find function calls in PHP source code. I'm reading a PHP file and loading its contents into a string. I want to find something like:
function_name(params) or function_name() or function_name(params, params)
for example if source contains:
echo 'Greetings'.greet("I'm Johan");
$age = date_of_birth(date());
echo 'I am ' .$age . 'years old';
It would then find greet, date_of_birth and date, because these are the functions used.
If you want to get the parameters including nested brackets, like your date_of_birth(date()), it may not be impossible with regex, but it is very difficult.
If it's enough to find the name of the function, then you can try this:
\w+(?=\()
That will match one or more word characters followed by an opening parenthesis.
\w contains letters, digits and the underscore
(?=\() is a positive lookahead that checks that a ( follows, without making it part of the match
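A quick way to try this in PHP is preg_match_all on the example source from the question. Note that the pattern only looks for a name directly followed by (, so it would also pick up things like if( or array( or a function's own declaration; filtering those out is left to you:

```php
<?php
// The example source from the question, as a nowdoc so nothing is interpolated.
$source = <<<'PHP'
echo 'Greetings'.greet("I'm Johan");
$age = date_of_birth(date());
echo 'I am ' .$age . 'years old';
PHP;

// \w+ grabs a name; (?=\() requires a "(" after it without consuming it.
preg_match_all('/\w+(?=\()/', $source, $matches);
print_r($matches[0]); // greet, date_of_birth, date
```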
You need a regular expression, maybe something like this:
preg_match('/[a-zA-Z_][a-zA-Z_0-9]*\(.*\)/', $stringToLookIn);
preg_match_all('/([a-zA-Z_]\w*)\s*\(/', $source, $match);
var_dump($match[1]);

php PCRE regex to get only the file name that terminates in .txt

So I am trying to form a PCRE regex in PHP, specifically for use with preg_replace, that will match the characters that make up a text (.txt) file name; from this I will derive the directory of the file.
My initial approach was to define the terminating .txt string, then match every character except / or \, so I ended up with something like:
'/[^\\\\/]*\.txt$/'
But this didn't seem to work at all. I assumed it might be interpreting the negation in its De Morgan form, i.e.:
(A+B)' <=> A'B'
But after attempting this test:
'/[^\\\\]\|[^/]*\.txt$/'
I got the same result, which made me think that I shouldn't escape the OR operator (|), but that also failed to match. Does anyone know what I'm doing wrong?
The following regular expression should work for getting the filename of .txt files:
$regex = "#.*[\\\\/](.*?\.txt)$#";
How it works:
.* is greedy and thus pushes the match as far to the right as possible, so the separator matched next is the last \ or /.
[\\\\/] ensures that we have a \ or / in front of the filename.
(.*?\.txt) uses non-greedy matching to ensure that the filename is as small as possible, followed by .txt, capturing it into group 1.
$ forces match to be at end of string.
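Try this pattern: '/\b(?P<files>[\w.-]+\.txt)\b/mi' (keep the - at the end of the character class; [\w-.] is rejected as an invalid range by PCRE2, which PHP uses since 7.3)
$PATTERN = '/\b(?P<files>[\w.-]+\.txt)\b/mi';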
$subject = 'foo.bar.txt plop foo.bar.txtbaz foo.txt';
preg_match_all($PATTERN, $subject, $matches);
var_dump($matches["files"]);
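As for why the original '/[^\\\\/]*\.txt$/' failed: most likely it is the delimiter, not De Morgan. PHP finds the closing delimiter by scanning for the next unescaped /, without regard to character classes, so the / inside [^\\/] ends the pattern early and the rest is read as (invalid) modifiers. A sketch with a different delimiter, which also derives the directory with preg_replace as the question intended; the sample path is made up:

```php
<?php
// Mixed separators on purpose; in single quotes '\\' is one backslash,
// so this is the path C:\dir/sub\notes.txt
$path = 'C:\\dir/sub\\notes.txt';

// "#" as the delimiter lets "/" appear literally in the class [^\\/].
$fileRegex = '#[^\\\\/]*\.txt$#';

preg_match($fileRegex, $path, $m);
$fileName = $m[0];                                // notes.txt
$directory = preg_replace($fileRegex, '', $path); // C:\dir/sub\

// Equivalent pattern that keeps "/" as the delimiter by escaping it in the class.
$escapedRegex = '/[^\\\\\/]*\.txt$/';
```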

How to strip this part of my string?

$string = "Hot_Chicks_call_me_at_123456789";
How can I strip it so that I only have the numbers after the last letter in the string above?
For example, I need a way to check a string and remove everything in front of the last underscore followed by the numbers.
Any smart solutions for this?
Thanks
BTW, it's PHP!
Without using a regular expression
$string = "Hot_Chicks_call_me_at_123456789";
$parts = explode('_', $string); // end() takes its argument by reference, so it needs a variable
echo end($parts);
If it always ends in a number, you can just match /(\d+)$/ with a regex. Is the formatting consistent? Is there anything between the numbers, like dashes or spaces?
You can use preg_match for the regex part.
<?php
$subject = "abcdef_sdlfjk_kjdf_39843489328";
preg_match('/(\d+)$/', $subject, $matches);
if ( count( $matches ) > 1 ) {
echo $matches[1];
}
I only recommend this solution if speed isn't an issue, and if the formatting is completely consistent.
PHP's PCRE regular expression engine was built for this kind of task:
$string = "Hot_Chicks_call_me_at_123456789";
$new_string = preg_replace('{^.*_(\d+)$}x','$1',$string);
//same thing, but with whitespace ignoring and comments turned on for explanations
$new_string = preg_replace('{
^.* #match any character at start of string
_ #up to the last underscore
(\d+) #followed by all digits repeating at least once
$ #up to the end of the string
}x','$1',$string);
echo $new_string . "\n";
To be a bit churlish, your stated specification would suggest the following algorithm:
def trailing_number(s):
    results = list()
    for char in reversed(s):
        if char.isalpha(): break
        if char.isdigit(): results.append(char)
    return ''.join(reversed(results))
It returns only the digits from the end of the string up to the first letter it encounters.
Of course this example is in Python, since I don't know PHP nearly as well. However, it should translate easily, as the concept is simple: reverse the string (or iterate from the end towards the beginning) and accumulate digits until you find a letter and break (or fall out of the loop at the beginning of the string).
In C it would be more efficient to use something a bit like for (p = s + strlen(s); p > s; p--) to walk backwards through the string (examining p[-1] at each step), saving a pointer to the most recently encountered digit until we break or drop out of the loop at the beginning of the string. Then return the pointer into the middle of the string where our most recent (leftmost) digit was found.
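Since the question is about PHP, here is a sketch of the same algorithm translated to PHP. The function name trailing_number is carried over from the Python version, and ctype_alpha/ctype_digit assume the ctype extension, which is enabled by default in standard builds:

```php
<?php
// Walk the string from the end, collecting digits until the first letter.
// Characters that are neither (such as "_") are skipped, as in the Python version.
function trailing_number(string $s): string
{
    $digits = '';
    for ($i = strlen($s) - 1; $i >= 0; $i--) {
        $c = $s[$i];
        if (ctype_alpha($c)) {
            break;
        }
        if (ctype_digit($c)) {
            $digits = $c . $digits; // prepend, so no final reversal is needed
        }
    }
    return $digits;
}

echo trailing_number('Hot_Chicks_call_me_at_123456789'), "\n"; // 123456789
```

Prepending each digit replaces the two reversed() calls in the Python version; the loop still stops at the first letter found from the right.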
