PHP RegExp: invert an assertion to get the opposite match

This is my first question, so please be nice :). I'm trying to build a regexp to get an array of IPs that are both valid (OK, at least in the proper IPv4 format) and NOT private according to RFC 1918. So far I've figured out a way to get exactly the opposite, that is, to successfully match private IPs, so all I need is a way to invert the assertion. This is the code so far:
// This is an example string
$ips = '10.0.1.23, 192.168.1.2, 172.24.2.189, 200.52.85.20, 200.44.85.20';
preg_match_all('/(?:10\.\d{1,3}|172\.(?:1[6-9]|2\d|3[01])|192\.168)\.\d{1,3}\.\d{1,3}/', $ips, $matches);
print_r($matches);
// Prints:
Array
(
[0] => Array
(
[0] => 10.0.1.23
[1] => 192.168.1.2
[2] => 172.24.2.189
)
)
And what I want as result is:
Array
(
[0] => Array
(
[0] => 200.52.85.20
[1] => 200.44.85.20
)
)
I've tried changing the first part of the expression (the lookahead) to be negative (?!), but this messes up the results and doesn't even invert them.
If you need any more information, please feel free to ask. Many thanks in advance.

There is a PHP function for that: filter_var(). Check these constants: FILTER_FLAG_IPV4, FILTER_FLAG_NO_PRIV_RANGE.
However, if you still want to solve this with regular expressions, I suggest you split your problem into two parts: first extract all the IP addresses, then filter out the private ones.
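A minimal sketch of that two-step idea, assuming the filter meant here is FILTER_VALIDATE_IP combined with the two flags named above. The extraction pattern and the variable names are illustrative choices, not part of the original answer.

```php
<?php
$ips = '10.0.1.23, 192.168.1.2, 172.24.2.189, 200.52.85.20, 200.44.85.20';

// Step 1: extract everything that looks like a dotted quad.
preg_match_all('/\b\d{1,3}(?:\.\d{1,3}){3}\b/', $ips, $matches);

// Step 2: keep only valid IPv4 addresses outside the private ranges.
// filter_var() returns false when the address is rejected.
$public = array_values(array_filter($matches[0], function ($ip) {
    return filter_var(
        $ip,
        FILTER_VALIDATE_IP,
        FILTER_FLAG_IPV4 | FILTER_FLAG_NO_PRIV_RANGE
    ) !== false;
}));

print_r($public);
```

Unlike the pure regex approach, this also rejects out-of-range octets such as 999.1.1.1, because filter_var() validates the address itself.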

If all you want to do is exclude a relatively small range of IPs, you could do this
(if I didn't make any typos):
/(?!\b(?:10\.\d{1,3}|172\.(?:1[6-9]|2\d|3[01])|192\.168)\.\d{1,3}\.\d{1,3}\b)\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b)/
Example in Perl:
use strict;
use warnings;
my @found = '10.0.1.23, 192.168.1.2, 172.24.2.189, 200.52.85.20, 200.44.85.20' =~
    /
        (?!
            \b
            (?:
                10\.\d{1,3}
              |
                172\.
                (?:
                    1[6-9]
                  | 2\d
                  | 3[01]
                )
              |
                192\.168
            )
            \.\d{1,3}
            \.\d{1,3}
            \b
        )
        \b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b)
    /xg;
for (@found) {
    print "$_\n";
}
Output:
200.52.85.20
200.44.85.20
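Since the question is about PHP, here is a sketch of the same negative-lookahead pattern with preg_match_all. It is a direct port of the Perl example above; the variable names are illustrative.

```php
<?php
$ips = '10.0.1.23, 192.168.1.2, 172.24.2.189, 200.52.85.20, 200.44.85.20';

// The x flag allows whitespace and comments inside the pattern.
$pattern = '/
    (?!                          # fail if a private address starts here
        \b
        (?: 10\.\d{1,3}
          | 172\.(?:1[6-9]|2\d|3[01])
          | 192\.168
        )
        \.\d{1,3}\.\d{1,3}\b
    )
    \b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b
/x';

preg_match_all($pattern, $ips, $matches);

// Group 1 holds the addresses that are not private.
print_r($matches[1]);
```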

Related

How to numerically sort an array like this: ['11--2017 name.png','1--2016 name.png','2--1999 name.png']

Am I correct that character precedence would order these like this:
1--2016 name.png, 11--2017 name.png, 2--1999 name.png
Numerically, however, they would be like this:
1--2016 name.png, 2--1999 name.png, 11--2017 name.png
That is, if I'm looking at the first numbers alone. How do you numerically sort an array with strings like this? Namely, integers appended with "--".
It's important to note that these "strings" are actually pathnames which cannot be renamed. See glob for more information.
Edit, after modified question:
After your edit, obviously all answers in this thread are wrong. Also, you shouldn't just copy and paste a piece of code; read the entire answer. Sure enough, in my original answer, I say:
if you have a value like “12--3”, it will be sorted like “123”
So you could see right away that your real case is not consistent with the provided sample.
This second solution sorts an array by the number at the start of each path's basename, when that number is followed by two dashes. It applies to the following cases:
String                          Will be sorted by
------------------------------  -----------------
/Absolute/Path/12--             12
/Absolute/Path/12--2001.png     12
/12--2001.png                   12
12--2001.png                    12
a12--2001.png                   a12--2001.png
-12--2001.png                   -12--2001.png
Having this array:
[
'/path/to/image/1--2016 name.png',
'/path/to/image/11--2017.png',
'/path/to/image/2--1999.png'
]
You can replace the regular expression pattern of the original solution (below) with this pattern:
~^(.*/)?(\d+)--[^/]*$~
And the above array will be sorted this way:
Array
(
[0] => /path/to/image/1--2016 name.png
[1] => /path/to/image/2--1999.png
[2] => /path/to/image/11--2017.png
)
Pattern explanation:
~
^ # Start of string
(.*/)? # Group 1 (optional): zero-or-more characters followed by a slash
(\d+) # Group 2: one-or-more digits
-- # two dashes
[^/]* # zero-or-more characters, except slash
$ # End of string
~
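A sketch of how this pattern could be plugged into the usort approach from the original answer. The answer does not spell out the replacement string; '$2' (keep only the leading number) is an assumption that agrees with the table above. The (int) cast and the $key helper are also illustrative choices.

```php
<?php
$paths = [
    '/path/to/image/1--2016 name.png',
    '/path/to/image/11--2017.png',
    '/path/to/image/2--1999.png',
];

// Reduce a path to the number at the start of its basename.
// Paths that do not match the pattern are returned unchanged.
$key = function ($path) {
    return preg_replace('~^(.*/)?(\d+)--[^/]*$~', '$2', $path);
};

usort($paths, function ($a, $b) use ($key) {
    // Compare as integers; non-matching paths cast to 0.
    return (int) $key($a) <=> (int) $key($b);
});

print_r($paths);
```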
In the future, take a look at How to create a Minimal, Complete, and Verifiable example
Original answer (for original question):
There are surely many ways to obtain your result. Using usort and preg_replace:
$array = ['11--','23--','1--'];
usort
(
$array,
function( $a, $b )
{
return preg_replace( '~[^\d]~', '', $a ) - preg_replace( '~[^\d]~', '', $b );
}
);
$array now is:
Array
(
[0] => 1--
[1] => 11--
[2] => 23--
)
The above solution sorts your array by deleting [1] all non-digit characters.
So, if you have a value like 12--3, it will be sorted like 123. Consequently, it doesn't work on non-integer or negative numbers.
[1] Actually, the original array values are not changed.
If you want a quick fix to get this done, you could do this:
$strings = array('5--', '2--', '11--');
$newStrings = array();
foreach ($strings as $string) {
$stringNew = str_replace('--', '', $string);
array_push($newStrings, $stringNew);
}
sort($newStrings);
$doneArray = array();
foreach ($newStrings as $newString) {
array_push($doneArray, $newString.'--');
}
// $doneArray is the new array full of the sorted strings.
I didn't really bother with the variable names, but that's a nice way to do it.
Use natsort.
I'm not sure how glob orders its results, and I would have thought that sort would order them correctly, but natsort will do the trick.
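A short sketch of the natsort suggestion, using the file names from the question title. The array_values() call is an addition here, since natsort keeps the original keys.

```php
<?php
$files = ['11--2017 name.png', '1--2016 name.png', '2--1999 name.png'];

natsort($files);               // natural order, sorts in place, keeps keys
$files = array_values($files); // reindex from 0

print_r($files);
```

This relies on the leading number being the first thing that differs between the strings, which also holds for paths returned by glob from a single directory.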

Ending regex with a specific character on first appearance

I have several source files that I'm running preg_match_all on.
This is what I tried:
$lazy = file_get_contents("Some_Source_code.txt");
if(!preg_match("#method_(.*)\(int var0, int var1, int var2\)#", $lazy, $function_name))
die("nothing here");
preg_match_all("#method_".$function_name[1]."\(.*\){1}#", $lazy, $matches);
print_r($matches);
but the output comes like this:
Array
(
[0] => Array
(
[0] => method_2393(int var0, int var1, int var2)
[1] => method_2393(0, 0, 0)).equals(this.field_1351.getText().toString()))
)
)
OK, what I want is $matches[0][1], but
how can I make the match stop as soon as it reaches the closing parenthesis ')', just like the first one does?
I could process the line after extracting it, but how can I do it with the regex itself?
I searched the answers of similar problems but they were too specific.
Modify the regex as follows:
#method_".$function_name[1]."\([^)]*\){1}#
Where you went wrong:
#method_".$function_name[1]."\(.*\){1}#
Here you used \(.*\), where .* matches anything, including the ).
Changes made:
\([^)]*\): here [^)]* matches anything other than ), so the match ends at the first occurrence of ).
You can also use lazy matching with .*? instead of .*, which is greedy and consumes as many characters as it can.
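A small sketch comparing the two fixes. The $source string is a made-up stand-in for the file contents, and the preg_quote() call and the dropped {1} (which has no effect) are additions, not part of the answer above.

```php
<?php
// Hypothetical source text standing in for Some_Source_code.txt.
$source = 'void method_2393(int var0, int var1, int var2) {}' . "\n"
        . 'if (x.valueOf(method_2393(0, 0, 0)).equals(this.field_1351.getText().toString())) {}' . "\n";

preg_match('#method_(.*)\(int var0, int var1, int var2\)#', $source, $function_name);

// Escape the captured name before reusing it inside another pattern.
$name = preg_quote($function_name[1], '#');

// Negated character class: stop at the first closing parenthesis.
preg_match_all('#method_' . $name . '\([^)]*\)#', $source, $negated);

// Lazy quantifier: match as few characters as possible.
preg_match_all('#method_' . $name . '\(.*?\)#', $source, $lazy);

print_r($negated[0]);
```

Both variants give the same result here. The negated class does not need to backtrack, while .*? also stops at a line break unless the s flag is set.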

preg match to get text after # symbol and before next space using php

I need help finding the strings in a text that start with # and run until the next space, using preg_match in PHP.
Ex : I want to get #string from this line as separate.
In this example, I need to extract "#string" alone from this line.
Could anybody help me find a solution for this?
Thanks in advance!
PHP and Python are not the same in regard to searches. If you've already used a function like strip_tags on your capture, then something like this might work better than the Python example provided in one of the other answers since we can also use look-around assertions.
<?php
$string = <<<EOT
I want to get #string from this line as separate.
In this example, I need to extract "#string" alone from this line.
#maybe the username is at the front.
Or it could be at the end #whynot, right!
dog#cat.com would be an e-mail address and should not match.
EOT;
echo $string."<br>";
preg_match_all('~(?<=[\s])#[^\s.,!?]+~',$string,$matches);
print_r($matches);
?>
Output results
Array
(
[0] => Array
(
[0] => #string
[1] => #maybe
[2] => #whynot
)
)
Update
If you're pulling straight from the HTML stream itself, however, the Twitter HTML is formatted like this:
<s>#</s><b>UserName</b>
So to match a username from the HTML stream, you would use the following:
<?php
$string = <<<EOT
<s>#</s><b>Nancy</b> what are you on about?
I want to get <s>#</s><b>string</b> from this line as separate. In this example, I need to extract "#string" alone from this line.
<s>#</s><b>maybe</b> the username is at the front.
Or it could be at the end <s>#</s><b>WhyNot</b>, right!
dog#cat.com would be an e-mail address and should not match.
EOT;
$matchpattern = '~(<s>(#)</s><b\>([^<]+)</b>)~';
preg_match_all($matchpattern,$string,$matches);
$users = array();
foreach ($matches[0] as $username){
$cleanUsername = strip_tags($username);
$users[]=$cleanUsername;
}
print_r($users);
Output
Array
(
[0] => #Nancy
[1] => #string
[2] => #maybe
[3] => #WhyNot
)
Simply do:
preg_match('/#\S+/', $string, $matches);
The result is in $matches[0]
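A quick sketch of how that short pattern behaves. Two things are worth knowing: \S+ runs up to the next whitespace, so trailing punctuation is included, and preg_match returns only the first hit. The sample sentences are partly taken from the other answer and partly made up.

```php
<?php
$line = 'Or it could be at the end #whynot, right!';

// \S+ stops only at whitespace, so the comma is part of the match.
preg_match('/#\S+/', $line, $first);

// Excluding common punctuation (the character class from the other answer).
preg_match('/#[^\s.,!?]+/', $line, $trimmed);

// preg_match_all collects every occurrence instead of just the first.
preg_match_all('/#\S+/', 'Ask #alice and #bob about it', $all);

print_r([$first[0], $trimmed[0], $all[0]]);
```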

RegEx for hashtag separated string

I have a bunch of strings like this:
a#aax1aay222b#bbx4bby555bbz6c#mmm1d#ara1e#abc
And what I need to do is to split them up based on the hashtag position to something like this:
Array
(
[0] => A
[1] => AAX1AAY222
[2] => B
[3] => BBX4BBY555BBZ6
[4] => C
[5] => MMM1
[6] => D
[7] => ARA1
[8] => E
[9] => ABC
)
So, as you can see, the character right before the hashtag is captured, plus everything after the hashtag up to just before the next char+hashtag.
I have the following RegEx, which works fine only when there is a numeric value at the end of each part.
Here is the RegEx set up:
preg_split('/([A-Z])+#/', $text, 0, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
And it works fine with something like this:
C#mmm1D#ara1
But, if I change it to this (removing the numbers):
C#mmmD#ara
Then this is the result, which is not good:
Array
(
[0] => C
[1] => D
)
I've looked at this question and this one also, which are similar but none of them worked for me.
So, my question is: why does it work only if each part ends with a number, and how can I solve it?
Here are some of the sample strings I have:
a#123b#abcc#def456 // A:123, B:ABC, C:DEF456
a#abc1def2efg3b#abcdefc#8 // A:ABC1DEF2EFG3, B:ABCDEF, C:8
a#abcdef123b#5c#xyz789 // A:ABCDEF123, B:5, C:XYZ789
P.S. Strings are case-insensitive.
P.P.S. If you're wondering what the hell these strings are, they are user-submitted answers to a questionnaire, and I can't refactor them, as they are already stored and just need to be processed.
Why not use explode()?
If you look at my examples you will see that I need to capture the character right before the # as well. If you think it's possible with explode() please post the output as well, thanks!
Update
Should we focus on why /([A-Z])+#/ works only if numbers are included? Thanks.
Instead of using preg_split(), decide what you want to match:
A set of "words" if followed by either <any-char># or <end-of-string>.
A character if immediately followed by #.
$str = 'a#aax1aay222b#bbx4bby555bbz6c#mmm1d#ara1e#abc';
preg_match_all('/\w+(?=.#|$)|\w(?=#)/', $str, $matches);
This expression uses two look-ahead assertions. The results are in $matches[0].
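For reference, a runnable sketch of that expression on the sample string. The strtoupper step is an addition, included only because the question shows the wanted result in upper case.

```php
<?php
$str = 'a#aax1aay222b#bbx4bby555bbz6c#mmm1d#ara1e#abc';

// First alternative: a run of word characters followed by "<char>#" or the end.
// Second alternative: a single character directly followed by "#".
preg_match_all('/\w+(?=.#|$)|\w(?=#)/', $str, $matches);

$parts = array_map('strtoupper', $matches[0]);

print_r($parts);
```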
Update
Another way of looking at it would be this:
preg_match_all('/(\w)#(\w+)(?=\w#|$)/', $str, $matches);
print_r(array_combine($matches[1], $matches[2]));
Each entry starts with a single character, followed by a hash, followed by X characters until either the end of the string is encountered or the start of a next entry.
The output is this:
Array
(
[a] => aax1aay222
[b] => bbx4bby555bbz6
[c] => mmm1
[d] => ara1
[e] => abc
)
If you still want to use preg_split, you can remove the + and it should work as expected:
'/([A-Z])#/i'
That way you match only the hash and ONE alphabetic character before it, not all of them.
Example: http://codepad.viper-7.com/z1kFDb
Edit: Added a case-insensitive flag i in the pattern.
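A sketch of that fix with the flags from the question. As for the "why": with ([A-Z])+# the repeated group also swallows the letters at the end of the preceding value, and a repeated capture group keeps only its last iteration, so a value made only of letters disappears into the delimiter. A trailing digit stops the repetition, which is why those cases worked.

```php
<?php
$flags = PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE;

// Letters-only values, the case that failed in the question.
$short = preg_split('/([A-Z])#/i', 'C#mmmD#ara', 0, $flags);

// The full sample string from the question.
$text  = 'a#aax1aay222b#bbx4bby555bbz6c#mmm1d#ara1e#abc';
$parts = preg_split('/([A-Z])#/i', $text, 0, $flags);

print_r($short);
print_r($parts);
```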
Use explode() rather than Regexp
$tmpArray = explode("#","a#aax1aay222b#bbx4bby555bbz6c#mmm1d#ara1e#abc");
$myArray = array();
for($i = 0; $i < count($tmpArray) - 1; $i++) {
if (substr($tmpArray[$i],0,-1)) $myArray[] = substr($tmpArray[$i],0,-1);
if (substr($tmpArray[$i],-1)) $myArray[] = substr($tmpArray[$i],-1);
}
if (count($tmpArray) && $tmpArray[count($tmpArray) - 1]) $myArray[] = $tmpArray[count($tmpArray) - 1];
Edit: I updated my answer to better reflect the question.
You can use the explode() function, which splits the string on the hash signs, as stated in the earlier answers.
$myArray = explode("#",$string);
For the string 'a#aax1aay222b#bbx4bby555bbz6c#mmm1d#ara1e#abc' this returns something like
$myArray = array('a', 'aax1aay222b', 'bbx4bby555bbz6c', ...);
All you need now is to take the last character of each string in the array as a separate item.
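$copy = array();
$last = count($myArray) - 1;
foreach ($myArray as $i => $item) {
    if ($i === $last) { // the final piece is a value only, with no key letter attached
        $copy[] = $item;
        break;
    }
    $beginning = substr($item, 0, strlen($item) - 1); // all characters except the last one
    $ending = substr($item, -1); // the last one, i.e. the key of the next entry
    if ($beginning !== '') $copy[] = $beginning; // the first piece has no value part
    $copy[] = $ending;
} // end foreach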
This is an example, not tested.
EDIT
Instead of substr($item,0,strlen($item)-1); you might use substr($item,0,-1);.

Is there a way to match recursively/nested with regex? (PHP, preg_match_all)

How can I match both (http://[^"]+)'s?:
(I know it's an illegal URL, but same idea)
I want the regex to give me these two matches:
1 http://yoursite.com/goto/http://aredirectURL.com/extraqueries
2 http://aredirectURL.com/extraqueries
Without running multiple preg_match_all's
Really stumped, thanks for any light you can shed.
This regular expression will get you the output you want: ((?:http://[^"]+)(http://[^"]+)). Note the usage of the non-capturing group (?:regex). To read more about non-capturing groups, see Regular Expression Advanced Syntax Reference.
<?php
preg_match_all(
'((?:http://[^"]+)(http://[^"]+))',
'',
$out);
echo "<pre>";
print_r($out);
echo "</pre>";
?>
The above code outputs the following:
Array
(
[0] => Array
(
[0] => http://yoursite.com/goto/http://aredirectURL.com/extraqueries
)
[1] => Array
(
[0] => http://aredirectURL.com/extraqueries
)
)
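A self-contained sketch of the same idea. The sample markup did not survive in the post, so the $html string below is a made-up input, not the asker's original, and ~ is used as the delimiter instead of the parentheses.

```php
<?php
// Hypothetical input: a redirect URL wrapped around a second URL.
$html = '<a href="http://yoursite.com/goto/http://aredirectURL.com/extraqueries">link</a>';

// The non-capturing group consumes the outer prefix; group 1 captures the inner URL.
preg_match_all('~(?:http://[^"]+)(http://[^"]+)~', $html, $out);

echo $out[0][0], "\n"; // full match: the outer URL
echo $out[1][0], "\n"; // group 1: the inner URL
```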
You can split the string with this function:
http://de.php.net/preg_split
Each part can then contain, e.g., one of the URLs in the resulting array.
If there is more content, you could call preg_split from a callback while the full text is being processed.
$str = '';
preg_match("/\"(http:\/\/.*?)(http:\/\/.*?)\"/i", $str, $match);
echo "{$match[1]}{$match[2]}\n"; // the full URL: both captured parts together
echo "{$match[2]}\n"; // the inner URL only
