Ending regex with a specific character on first appearance

I have several source code files that I'm running preg_match_all on.
This is what I tried:
$lazy = file_get_contents("Some_Source_code.txt");
if(!preg_match("#method_(.*)\(int var0, int var1, int var2\)#", $lazy, $function_name))
die("nothing here");
preg_match_all("#method_".$function_name[1]."\(.*\){1}#", $lazy, $matches);
print_r($matches);
But the output looks like this:
Array
(
[0] => Array
(
[0] => method_2393(int var0, int var1, int var2)
[1] => method_2393(0, 0, 0)).equals(this.field_1351.getText().toString()))
)
)
OK, what I want is $matches[0][1], but how can I make the match stop at the first closing parenthesis ')', as it does in the first match?
I could process the line after I extract it, but how can I do it with regex?
I searched the answers to similar problems, but they were too specific.

Modify the regex to:
#method_".$function_name[1]."\([^)]*\){1}#
Where it went wrong:
#method_".$function_name[1]."\(.*\){1}#
Here you used \(.*\), where .* matches anything, including ), so the match runs on to the last ) on the line.
Changes made:
\([^)]*\) Here [^)]* matches anything other than ), so the match ends at the first occurrence of ).
You can also use lazy matching with .*? instead of .*, which is greedy and consumes as many characters as it can.
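A minimal runnable sketch of both fixes, assuming the two lines from the question's output as sample input (the names $source, $negated and $lazyMatches are illustrative). It captures the method name with \w+ rather than the question's (.*), and passes it through preg_quote() as a precaution in case a name ever contains regex metacharacters:

```php
<?php
// Sample input modelled on the two matches shown in the question.
$source = "method_2393(int var0, int var1, int var2)\n"
        . "if (method_2393(0, 0, 0)).equals(this.field_1351.getText().toString()))";

if (!preg_match('#method_(\w+)\(int var0, int var1, int var2\)#', $source, $fn)) {
    die("nothing here");
}
$name = preg_quote($fn[1], '#');

// Negated class: [^)]* can never step past the first ')'.
preg_match_all('#method_' . $name . '\([^)]*\)#', $source, $negated);

// Lazy quantifier: .*? stops at the first ')' that lets the match succeed.
preg_match_all('#method_' . $name . '\(.*?\)#', $source, $lazyMatches);

print_r($negated[0]);
```

Both variants give method_2393(0, 0, 0) as the second match; the negated class is usually the safer choice because it cannot be forced to backtrack across a ).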

Related

preg_match first string after string in parentheses

I've looked around for solutions close to this but haven't found one. I'm looking to clean up some legacy code with php_codesniffer, but the fixer doesn't fix comments or arrays that go past 80 columns; it just reports them. I have a solution that works for the comments, but I am stuck on the regex for the arrays.
A sample line I would like to fix is:
$line = "drupal_add_js(array('my_common' => array('my_code_validate' => variable_get('my_code_validate', FALSE), 'inner_index2 => 'inner_value2'), 'another_item' => 'another_value'), 'setting');";
$solution = preg_match('/array.*(\(.*?\))/', $line);
echo $solution;
I'd like:
$solution = "'my_common' => array('my_code_validate' => variable_get('my_code_validate', FALSE), 'inner_index2 => 'inner_value2'), 'another_item' => 'another_value'";
but I am getting 1 instead. Notice that there is another array in there, which is fairly common. I only want to capture the first array's values, and then I can split them onto separate lines from there. Ultimately I'd like to share my solutions with the php codesniffer project, so bonus points for showing how to code a new fixer for squizlabs.

You can use:
if (preg_match('~array(\(((?:[^()]++|(?1))*)\))~', $line, $matches)) {
    echo $matches[2];
}
Details
array - a literal substring
(\(((?:[^()]++|(?1))*)\)) - Group 1:
\( - a ( char
((?:[^()]++|(?1))*) - Group 2 (the required value):
(?:[^()]++|(?1))* - zero or more repetitions of either one or more chars other than ( and ) (matched possessively), or the whole Group 1 pattern recursed, i.e. a nested, balanced (...) block
\) - a ) char
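A self-contained sketch that runs the recursive pattern on the question's $line and then, as one possible next step, splits the captured contents at top-level commas so each entry could go on its own line. It also shows why the question printed 1: preg_match() returns the number of matches, and the captured text lands in $matches. The split_top_level() helper is hypothetical (not part of PHP or php_codesniffer) and, for brevity, ignores parentheses or commas inside quoted strings:

```php
<?php
// The question's sample line, kept verbatim (including its unbalanced quote).
$line = "drupal_add_js(array('my_common' => array('my_code_validate' => variable_get('my_code_validate', FALSE), 'inner_index2 => 'inner_value2'), 'another_item' => 'another_value'), 'setting');";

// preg_match() returns 1 or 0; the captures go into $matches.
$found = preg_match('~array(\(((?:[^()]++|(?1))*)\))~', $line, $matches);
$inner = $found ? $matches[2] : '';

// Hypothetical helper: split on commas at parenthesis depth 0 only,
// so nested array(...) and function calls stay intact. Ignores quoting.
function split_top_level(string $s): array {
    $parts = [];
    $depth = 0;
    $current = '';
    foreach (str_split($s) as $ch) {
        if ($ch === '(') {
            $depth++;
        } elseif ($ch === ')') {
            $depth--;
        }
        if ($ch === ',' && $depth === 0) {
            $parts[] = trim($current);
            $current = '';
            continue;
        }
        $current .= $ch;
    }
    $parts[] = trim($current);
    return $parts;
}

print_r(split_top_level($inner));
```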

Try this solution:
.*?array\(('.*?)\), [^\)]+'\);.*
Replace with:
$1
Demo: https://regex101.com/r/oV4nvT/4/
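In PHP, that regex101 search-and-replace corresponds to a single preg_replace() call. A sketch, assuming the same $line as in the question; note that this pattern relies on the trailing , 'setting'); argument being present, so it is more brittle than the recursive approach:

```php
<?php
$line = "drupal_add_js(array('my_common' => array('my_code_validate' => variable_get('my_code_validate', FALSE), 'inner_index2 => 'inner_value2'), 'another_item' => 'another_value'), 'setting');";

// Double-quoted PHP string so the pattern's single quotes need no escaping;
// "\\(" becomes \( in the regex. The whole line is replaced by group 1.
$solution = preg_replace("/.*?array\\(('.*?)\\), [^\\)]+'\\);.*/", '$1', $line);
echo $solution;
```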

RegEx for hashtag separated string

I have a bunch of strings like this:
a#aax1aay222b#bbx4bby555bbz6c#mmm1d#ara1e#abc
And what I need to do is to split them up based on the hashtag position to something like this:
Array
(
[0] => A
[1] => AAX1AAY222
[2] => B
[3] => BBX4BBY555BBZ6
[4] => C
[5] => MMM1
[6] => D
[7] => ARA1
[8] => E
[9] => ABC
)
As you can see, the character right before the hashtag is captured, plus everything after the hashtag up to the next char+hashtag.
I have the following regex, which works fine, but only when each part ends with a numeric value.
Here is the regex setup:
preg_split('/([A-Z])+#/', $text, 0, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
And it works fine with something like this:
C#mmm1D#ara1
But, if I change it to this (removing the numbers):
C#mmmD#ara
Then this is the result, which is not what I want:
Array
(
[0] => C
[1] => D
)
I've looked at this question and this one, which are similar, but neither of them worked for me.
So my question is: why does it work only when each part ends with a number, and how can I solve it?
Here are some of the sample strings I have:
a#123b#abcc#def456 // A:123, B:ABC, C:DEF456
a#abc1def2efg3b#abcdefc#8 // A:ABC1DEF2EFG3, B:ABCDEF, C:8
a#abcdef123b#5c#xyz789 // A:ABCDEF123, B:5, C:XYZ789
P.S. Strings are case-insensitive.
P.P.S. If you're wondering what these strings are: they are user-submitted answers to a questionnaire. They are already stored, so I can't refactor them; they just need to be processed.
Why not use explode()?
If you look at my examples, you will see that I need to capture the character right before the # as well. If you think it's possible with explode(), please post the output as well, thanks!
Update
Should we focus on why /([A-Z])+#/ works only when numbers are included? Thanks.

Instead of using preg_split(), decide what you want to match:
A set of "words" if followed by either <any-char># or <end-of-string>.
A character if immediately followed by #.
$str = 'a#aax1aay222b#bbx4bby555bbz6c#mmm1d#ara1e#abc';
preg_match_all('/\w+(?=.#|$)|\w(?=#)/', $str, $matches);
This expression uses two look-ahead assertions. The results are in $matches[0].
Update
Another way of looking at it would be this:
preg_match_all('/(\w)#(\w+)(?=\w#|$)/', $str, $matches);
print_r(array_combine($matches[1], $matches[2]));
Each entry starts with a single character, followed by a hash, followed by X characters until either the end of the string is encountered or the start of a next entry.
The output is this:
Array
(
[a] => aax1aay222
[b] => bbx4bby555bbz6
[c] => mmm1
[d] => ara1
[e] => abc
)
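To check the lookahead pattern against the other sample strings from the question, here is a small sketch. parse_answers() is a hypothetical helper name, and it upper-cases keys and values only because the question says the strings are case-insensitive and shows upper-case output. Note that array_combine() would silently keep only the last value if a key letter repeated:

```php
<?php
// Hypothetical helper: "k#value" runs -> ['K' => 'VALUE', ...].
function parse_answers(string $str): array {
    // (\w)# is the key; (\w+) is the value, cut off by the lookahead
    // just before the next "<char>#" or at the end of the string.
    preg_match_all('/(\w)#(\w+)(?=\w#|$)/', $str, $m);
    return array_combine(
        array_map('strtoupper', $m[1]),
        array_map('strtoupper', $m[2])
    );
}

print_r(parse_answers('a#123b#abcc#def456'));
```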

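If you still want to use preg_split(), you can remove the + and it should work as expected:
'/([A-Z])#/i'
That way you only match the hashtag and the ONE letter before it, not all the letters before it. With the +, ([A-Z])+# consumes the whole run of letters in front of the #, so (matching case-insensitively) in C#mmmD#ara the delimiter becomes mmmD# and the value mmm is swallowed. A digit breaks that run of letters, which is why the original pattern appeared to work only when each value ended in a number.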
Example: http://codepad.viper-7.com/z1kFDb
Edit: added the case-insensitive flag i to the pattern.
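A sketch of the preg_split() variant applied to the question's samples. split_answers() is a hypothetical name, and the input is upper-cased only to match the output format shown in the question. PREG_SPLIT_DELIM_CAPTURE keeps the captured key letter, and PREG_SPLIT_NO_EMPTY drops the empty piece before the first key:

```php
<?php
// Hypothetical helper: split on "<one letter>#", keeping the letter.
function split_answers(string $text): array {
    return preg_split('/([A-Z])#/i', strtoupper($text), -1,
        PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
}

print_r(split_answers('C#mmmD#ara'));
```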

Use explode() rather than Regexp
$tmpArray = explode("#", "a#aax1aay222b#bbx4bby555bbz6c#mmm1d#ara1e#abc");
$myArray = array();
$last = count($tmpArray) - 1;
for ($i = 0; $i < $last; $i++) {
    $value = (string) substr($tmpArray[$i], 0, -1); // everything before the key letter
    $key = (string) substr($tmpArray[$i], -1);      // the key letter right before the #
    if ($value !== '') $myArray[] = $value;          // !== '' keeps values such as "0"
    if ($key !== '') $myArray[] = $key;
}
if ($tmpArray[$last] !== '') $myArray[] = $tmpArray[$last];
Edit: I updated my answer after reading the question more carefully.

You can use the explode() function, which splits the string on the hash signs and drops them, as stated in the earlier answers.
$myArray = explode("#",$string);
For the string 'a#aax1aay222b#bbx4bby555bbz6c#mmm1d#ara1e#abc' this returns something like
$myArray = array('a', 'aax1aay222b', 'bbx4bby555bbz6c' ....);
All you need now is to split off the last character of each string in the array as a separate item (except for the final string, which has no key letter after it).
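$copy = array();
$lastIndex = count($myArray) - 1;
foreach ($myArray as $index => $item) {
    if ($index === $lastIndex) {
        $copy[] = $item; // the final chunk is a value with no key letter after it
        break;
    }
    $beginning = (string) substr($item, 0, strlen($item) - 1); // this takes all characters except the last one
    $ending = substr($item, -1); // this takes the last one
    if ($beginning !== '') $copy[] = $beginning; // the first chunk ('a') has no value part
    $copy[] = $ending;
} // end foreach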
This is an example, not tested.
EDIT
Instead of substr($item,0,strlen($item)-1); you might use substr($item,0,-1);.

Regex - How to match one pattern at a time

I have this function that parses some content to find a homemade link tag and convert it to a normal link tag.
Possible input:
<p>blabalblahhh <moolinkx pageid="121">text to click</moolinkx> blablabah</p>
Output:
<p>blabalblahhh text to click blablabah</p>
Here is my code:
$regex = '/\<moolinkx pageid="(.{1,})"\>(.{1,})\<\/moolinkx\>/';
preg_match_all( $regex, $string, $matches );
It works perfectly well if there is only one such tag in the string, but as soon as there is a second one, it doesn't work.
Input:
<p>blabalblahhh <moolinkx pageid="121">text to click</moolinkx> blablabah.</p>
<p>Another <moolinkx pageid="128">text to clickclick</moolinkx> again blablablah.</p>
That's what I got when I print_r($matches):
Array
(
[0] => Array
(
[0] => <moolinkx pageid="121">text to click</moolinkx> blablabah.</p><p>Another <moolinkx pageid="128">text to clickclick</moolinkx>
)
[1] => Array
(
[0] => 121">text to click</moolinkx> blablabah.</p><p>Another <moolinkx pageid="128
)
[2] => Array
(
[0] => text to clickclick
)
)
I'm not at ease with regex, so it must be something very trivial... but I can't pinpoint what it is :(
Thank you very much in advance!
NB: This is my first post here, though I've been using this terrific Q&A for ages!

Use negated character classes:
$regex = '/<moolinkx pageid="([^"]+)">([^<]+)<\/moolinkx>/';
Explained demo here: http://regex101.com/r/sI3wK5

You are using a greedy quantifier, which treats everything between the first opening tag and the last closing tag as the content between the tags. Change your regex to:
$regex = '/\<moolinkx pageid="(.+?)"\>(.+?)\<\/moolinkx\>/';
preg_match_all( $regex, $string, $matches );
Notice the .{1,} has changed to .+?. The + means one or more instances, and the ? tells the regex to select the fewest characters it can to fulfil the expression.
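A sketch comparing both answers on the two-paragraph input from the question (joined on one line here), followed by an actual conversion with preg_replace_callback(). The href format page.php?id=... is purely hypothetical, since the question does not show what the generated link should look like:

```php
<?php
$string = '<p>blabalblahhh <moolinkx pageid="121">text to click</moolinkx> blablabah.</p>'
        . '<p>Another <moolinkx pageid="128">text to clickclick</moolinkx> again blablablah.</p>';

// Negated classes: each capture stops at the first " or < respectively.
$negated = '/<moolinkx pageid="([^"]+)">([^<]+)<\/moolinkx>/';
// Lazy quantifiers: each capture takes as few characters as possible.
$lazy = '/<moolinkx pageid="(.+?)">(.+?)<\/moolinkx>/';

preg_match_all($negated, $string, $matches);
preg_match_all($lazy, $string, $lazyMatches);

// Hypothetical URL format: the real link target is not shown in the question.
$html = preg_replace_callback($negated, function (array $m): string {
    return '<a href="page.php?id=' . rawurlencode($m[1]) . '">' . $m[2] . '</a>';
}, $string);

echo $html;
```

The negated-class version has one extra property: ([^<]+) refuses link text that contains other tags, whereas the lazy version would accept it.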

Removing parentheses from a string

I'd like to remove all parentheses from a set of strings running through a loop. The best way that I've seen this done is with the use of preg_replace(). However, I am having a hard time understanding the pattern parameter.
The following is the loop
$coords= explode (')(', $this->input->post('hide'));
foreach ($coords as $row)
{
$row = trim(preg_replace('/\*\([^)]*\)/', '', $row));
$row = explode(',',$row);
$lat = $row[0];
$lng = $row[1];
}
And this is the value of 'hide'.
(1.4956873362063747, 103.875732421875)(1.4862491569669245, 103.85856628417969)(1.4773257504016037, 103.87968063354492)
That pattern is wrong as far as I know. I got it from another thread; I tried to read about patterns but couldn't understand them. I am rather short on time, so I posted this here while also searching for other ways elsewhere on the net. Can someone please supply me with the correct pattern for what I am trying to do? Or is there an easier way of doing this?
EDIT: Ah, just got how preg_replace() works. Apparently I misunderstood how it worked, thanks for the info.

I see you actually want to extract all the coordinates.
If so, you'd be better off using preg_match_all:
$ php -r '
preg_match_all("~\(([\d\.]+), ?([\d\.]+)\)~", "(654,654)(654.321, 654.12)", $matches, PREG_SET_ORDER);
print_r($matches);
'
Array
(
[0] => Array
(
[0] => (654,654)
[1] => 654
[2] => 654
)
[1] => Array
(
[0] => (654.321, 654.12)
[1] => 654.321
[2] => 654.12
)
)
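Applied to the actual 'hide' value from the question, that approach yields one latitude/longitude pair per match. A sketch, assuming the value is inlined rather than read via $this->input->post('hide'); the optional -? in front of each number is a small addition to the answer's pattern so that negative coordinates would also match:

```php
<?php
$hide = '(1.4956873362063747, 103.875732421875)(1.4862491569669245, 103.85856628417969)(1.4773257504016037, 103.87968063354492)';

// Each match is one "(lat, lng)" group; PREG_SET_ORDER groups the captures per match.
preg_match_all('~\((-?[\d.]+), ?(-?[\d.]+)\)~', $hide, $matches, PREG_SET_ORDER);

$coords = [];
foreach ($matches as $m) {
    // $m[1] is the latitude string, $m[2] the longitude string.
    $coords[] = ['lat' => (float) $m[1], 'lng' => (float) $m[2]];
}
print_r($coords);
```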

I don't entirely understand why you would need preg_replace. explode() removes the delimiters, so all you have to do is remove the opening and closing parentheses from the first and last strings respectively. You can use substr() for that.
Get first and last elements of array:
$first = reset($array);
$last = end($array);
Hope that helps.
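A sketch of that approach with the sample 'hide' value inlined (in the question it comes from $this->input->post('hide')); $pairs is an illustrative name. After explode(), only the first chunk still has its leading ( and only the last chunk its trailing ), so two substr() calls clean them up:

```php
<?php
$hide = '(1.4956873362063747, 103.875732421875)(1.4862491569669245, 103.85856628417969)(1.4773257504016037, 103.87968063354492)';

$coords = explode(')(', $hide);
$last = count($coords) - 1;
$coords[0] = substr($coords[0], 1);             // strip the leading "("
$coords[$last] = substr($coords[$last], 0, -1); // strip the trailing ")"

$pairs = [];
foreach ($coords as $row) {
    [$lat, $lng] = array_map('trim', explode(',', $row));
    $pairs[] = ['lat' => $lat, 'lng' => $lng];
}
print_r($pairs);
```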

"And this is the value of $coords."
If $coords is a string, your foreach makes no sense. If that string is your input, then:
$coords= explode (')(', $this->input->post('hide'));
This line removes the inner parentheses from your string, so your $coords array will be:
(1.4956873362063747, 103.875732421875
1.4862491569669245, 103.85856628417969
1.4773257504016037, 103.87968063354492)

The pattern parameter accepts a regular expression. The function returns a new string where all parts of the original that match the regex are replaced by the second argument, i.e. the replacement.
How about just using preg_replace on the original string?
preg_replace('#[()]#',"",$this->input->post('hide'))
To dissect your current regex, you are matching:
an asterisk character,
followed by an opening parenthesis,
followed by zero or more instances of
any character but a closing parenthesis
followed by a closing parenthesis
Of course, this will never match: there is no asterisk in your input, and exploding the string already removed the inner closing and opening parentheses from the chunks.

PHP RegExp revert assertion to get the opposite match

This is my first question, so please be nice :). I'm trying to build a regexp to get an array of IPs that are both valid (OK, at least in the proper IPv4 format) and NOT private IPs according to RFC 1918. So far, I've figured out a way to get exactly the opposite, i.e. successfully matching private IPs, so all I need is a way to invert the assertion. This is the code so far:
// This is an example string
$ips = '10.0.1.23, 192.168.1.2, 172.24.2.189, 200.52.85.20, 200.44.85.20';
preg_match_all('/(?:10\.\d{1,3}|172\.(?:1[6-9]|2\d|3[01])|192\.168)\.\d{1,3}\.\d{1,3}/', $ips, $matches);
print_r($matches);
// Prints:
Array
(
[0] => Array
(
[0] => 10.0.1.23
[1] => 192.168.1.2
[2] => 172.24.2.189
)
)
And what I want as result is:
Array
(
[0] => Array
(
[0] => 200.52.85.20
[1] => 200.44.85.20
)
)
I've tried changing the first part of the expression (the lookahead) to a negative one, (?!), but this messes up the results and doesn't even invert them.
If you need any more information, please feel free to ask. Many thanks in advance.

There is a PHP function for that: filter_var(). Use it with the FILTER_VALIDATE_IP filter and check these flags: FILTER_FLAG_IPV4, FILTER_FLAG_NO_PRIV_RANGE.
However if you still want to solve this with regular expressions, I suggest you split your problem in two parts: first you extract all the IP addresses, then you filter out the private ones.
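A sketch of that two-step approach on the question's sample string: a deliberately loose regex extracts anything shaped like a dotted quad, then filter_var() with FILTER_VALIDATE_IP, FILTER_FLAG_IPV4 and FILTER_FLAG_NO_PRIV_RANGE discards invalid and RFC 1918 addresses. The variable name $public is illustrative:

```php
<?php
$ips = '10.0.1.23, 192.168.1.2, 172.24.2.189, 200.52.85.20, 200.44.85.20';

// Step 1: extract every dotted quad (format only, no range checks).
preg_match_all('/\b\d{1,3}(?:\.\d{1,3}){3}\b/', $ips, $matches);

// Step 2: keep valid IPv4 addresses outside the private ranges.
// filter_var() returns the address on success and false otherwise.
$public = array_values(array_filter($matches[0], function (string $ip): bool {
    return filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4 | FILTER_FLAG_NO_PRIV_RANGE) !== false;
}));

print_r($public);
```

Splitting the work this way also rejects out-of-range octets such as 300.1.1.1, which a pure \d{1,3} regex would accept.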

If all you want to do is exclude a relatively small range of IPs, you could do this
(if I didn't make any typos):
/(?!\b(?:10\.\d{1,3}|172\.(?:1[6-9]|2\d|3[01])|192\.168)\.\d{1,3}\.\d{1,3}\b)\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b)/
Example in Perl:
use strict;
use warnings;

my @found = '10.0.1.23, 192.168.1.2, 172.24.2.189, 200.52.85.20, 200.44.85.20' =~
    /
        (?!
            \b
            (?:
                10\.\d{1,3}
              |
                172\.
                (?:
                    1[6-9]
                  | 2\d
                  | 3[01]
                )
              |
                192\.168
            )
            \.\d{1,3}
            \.\d{1,3}
            \b
        )
        \b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b)
    /xg;

for (@found) {
    print "$_\n";
}
Output:
200.52.85.20
200.44.85.20
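The same pattern works unchanged in PHP's preg_match_all(). A sketch with the question's sample string inlined; because the negative lookahead is zero-width, it only vetoes private addresses at each starting position, and the public ones end up in capture group 1:

```php
<?php
$ips = '10.0.1.23, 192.168.1.2, 172.24.2.189, 200.52.85.20, 200.44.85.20';

// (?!...) rejects an RFC 1918 address at the current position;
// \b(...) then captures any other dotted quad.
$pattern = '/(?!\b(?:10\.\d{1,3}|172\.(?:1[6-9]|2\d|3[01])|192\.168)\.\d{1,3}\.\d{1,3}\b)\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b)/';
preg_match_all($pattern, $ips, $matches);

print_r($matches[1]);
```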
