Split by whitespace only if not surrounded by [,<,{ or ],>,} - php

I have a string like this one:
traceroute <ip-address|dns-name> [ttl <ttl>] [wait <milli-seconds>] [no-dns] [source <ip-address>] [tos <type-of-service>] {router <router-instance>] | all}
I'd like to create an array like this:
$params = array(
'<ip-address|dns-name>',
'[ttl <ttl>]',
'[wait <milli-seconds>]',
'[no-dns]',
'[source <ip-address>]',
'[tos <type-of-service>]',
'{router <router-instance>] | all}'
);
Should I use preg_split('/someregex/', $mystring) ?
Or is there any better solution?

Use negative lookarounds. This one uses a negative lookahead for a <. This means it will not split if it finds a < ahead of the whitespace.
$regex='/\s(?!<)/';
$mystring='traceroute <192.168.1.1> [ttl <120>] [wait <1500>] [no-dns] [source <192.168.1.11>] [tos <service>] {router <instance>] | all}';
$array=array();
$array = preg_split($regex, $mystring);
var_dump($array);
And my output is
array
0 => string 'traceroute <192.168.1.1>' (length=24)
1 => string '[ttl <120>]' (length=11)
2 => string '[wait <1500>]' (length=13)
3 => string '[no-dns]' (length=8)
4 => string '[source <192.168.1.11>]' (length=23)
5 => string '[tos <service>]' (length=15)
6 => string '{router <instance>]' (length=19)
7 => string '|' (length=1)
8 => string 'all}' (length=4)
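If the goal is to keep every bracketed group in one piece, the lookahead can be made stricter. This is only a sketch under one assumption: a [] or {} group never contains another [] or {} group. The pattern and the $tokens name are mine, not part of the answer above. It splits on whitespace only when the next bracket character ahead is not a closing ] or }.

```php
<?php
$mystring = 'traceroute <ip-address|dns-name> [ttl <ttl>] [wait <milli-seconds>] [no-dns] [source <ip-address>] [tos <type-of-service>] {router <router-instance>] | all}';

// Do not split when a closing ] or } appears before any other [, ], { or }:
// that means the whitespace sits inside a group.
$tokens = preg_split('/\s+(?![^\[\]{}]*[\]}])/', $mystring);

array_shift($tokens); // drop the leading command name "traceroute"
print_r($tokens);
```

With the string from the question this should leave seven entries, from <ip-address|dns-name> to the whole {router ...} group.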

You could use preg_match_all such as:
preg_match_all("/\\[[^]]*]|<[^>]*>|{[^}]*}/", $str, $matches);
The full matches end up in $matches[0].
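To make that concrete, here is a small sketch that runs the same idea on the string from the question. The brackets and braces are escaped here only for readability, and $str and $tokens are placeholder names.

```php
<?php
$str = 'traceroute <ip-address|dns-name> [ttl <ttl>] [wait <milli-seconds>] [no-dns] [source <ip-address>] [tos <type-of-service>] {router <router-instance>] | all}';

// Each alternative matches one kind of group: [...], <...> or {...}.
preg_match_all('/\[[^\]]*\]|<[^>]*>|\{[^}]*\}/', $str, $matches);

$tokens = $matches[0]; // full matches, in order of appearance
print_r($tokens);
```

Because matching starts at the leftmost position, the inner <...> parts of a [...] or {...} group are consumed as part of the outer group and are not returned separately.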

Yes, preg_split makes sense and is probably the most efficient way to do this.
Try:
preg_split('/[\{\[<](.*?)[>\]\}]/', $mystring);
Or if you want to match rather than split, you may want to try:
$matches=array();
preg_match('/[\{\[<](.*?)[>\]\}]/',$mystring,$matches);
print_r($matches);
Updated
I missed that you're trying to get the tokens, not the content of the tokens. I think you are going to need preg_match_all rather than preg_split. Try something like this as a starting point:
$matches = array();
preg_match_all('/(\{.*?[\}])|(\[.*?\])|(<.*?>)/', $mystring,$matches);
var_dump($matches);
I get (only $matches[0], the list of full matches, is shown here):
Array
(
[0] => Array
(
[0] => <ip-address|dns-name>
[1] => [ttl <ttl>]
[2] => [wait <milli-seconds>]
[3] => [no-dns]
[4] => [source <ip-address>]
[5] => [tos <type-of-service>]
[6] => {router <router-instance>] | all}
)

Related

PHP Regex Match getting unexpected output

I'm trying to create a simple PHP script that retrieves info from a string and puts it into an array. I've looked around on some sites for multi-capture regexes for one pattern, but I can't seem to get the output I'm looking for.
Currently this is my script.
$input = "username: jack number: 20";
//$input = file_get_contents("test.txt");
preg_match_all("/username: ([^\s]+)|number: ([^\s]+)/", $input, $data);
var_dump($data);
Which produces this output:
0 =>
array (size=2)
0 => string 'username: jack' (length=14)
1 => string 'number: 20' (length=10)
1 =>
array (size=2)
0 => string 'jack' (length=4)
1 => string '' (length=0)
2 =>
array (size=2)
0 => string '' (length=0)
1 => string '20' (length=2)
I'm looking to get the data into the form of:
0 =>
array (size=x)
0 => string 'jack'
1 =>
array (size=x)
0 => string '20'
Or two different arrays where the keys correspond to the same user/number combo
You can use match-reset \K:
preg_match_all('/\b(?:username|number):\h*\K\S+/', $input, $data);
print_r($data[0]);
Array
(
[0] => jack
[1] => 20
)
RegEx Breakup:
\b => a word boundary
(?:username|number) => matches username or number. (?:..) is non-capturing group
:\h* => matches a colon followed by optional horizontal whitespace
\K => match reset, causes regex engine to forget matched data
\S+ => match 1 or more non-space chars
Or else you can use a capturing group to get your matched data like this:
preg_match_all('/\b(?:username|number):\h*(\S+)/', $input, $data);
print_r($data[1]);
Array
(
[0] => jack
[1] => 20
)
You can use a lookbehind here:
(?<=username:|number:)\s*(\S+)
See the demo:
https://regex101.com/r/mG8kZ9/10
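For the "two different arrays where the keys correspond" variant from the question, one option is to match each username/number pair in a single pattern. This is a rough sketch that assumes every username is directly followed by its number; the second pair in $input is made up for illustration.

```php
<?php
// Hypothetical input with two username/number pairs.
$input = 'username: jack number: 20 username: jill number: 31';

// One match per pair: group 1 is the username, group 2 the number.
preg_match_all('/\busername:\h*(\S+)\h+number:\h*(\S+)/', $input, $m);

$usernames = $m[1]; // same index in both arrays = same user
$numbers   = $m[2];
print_r(array_combine($usernames, $numbers));
```

If a username can appear without a number, this pattern skips it, so the separate-match approaches above may fit better in that case.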

PHP parsing string to array with regular expressions

I have a string like this:
$msg,array('goo','gle'),000,"face",'book',['twi'=>'ter','link'=>'edin']
I want to use preg_match_all to convert this to an array that could look like this:
array(
0 => $msg,
1 => array('goo','gle'),
2 => 000,
3 => "face",
4 => 'book',
5 => ['twi'=>'ter','link'=>'edin']
);
Note that all the values are strings.
I am not very good at regular expressions, so I have just been unable to create a Pattern for this. Multiple preg calls will also do.
I suggest using preg_split with the following regex:
$re = "/([a-z]*(?:\\[[^]]*\\]|\\([^()]*\\)),?)|(?<=,)/";
$str = "\$msg,array('goo','gle'),000,\"face\",'book',['twi'=>'ter','link'=>'edin']";
print_r(preg_split($re, $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY));
Output of the sample program:
Array
(
[0] => $msg,
[1] => array('goo','gle'),
[2] => 000,
[3] => "face",
[4] => 'book',
[5] => ['twi'=>'ter','link'=>'edin']
)
I know you asked for a regular expression solution, however I'm on an eval() kick today:
eval('$array = array('.$string.');');
print_r($array);
Also note that 000 is NOT a string and will be converted to 0.
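If neither a regex nor eval() feels right, a different technique is a small hand-written scanner that splits on commas only at nesting depth 0. This is a sketch, not a full PHP parser: splitTopLevel is a hypothetical helper name, and it assumes quotes are never backslash-escaped inside the values.

```php
<?php
// Hypothetical helper: split on commas that are outside (), [] and quotes.
function splitTopLevel(string $s): array
{
    $parts = [];
    $buf = '';
    $depth = 0;     // current nesting level of ( and [
    $quote = null;  // the open quote character, or null when outside quotes
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $c = $s[$i];
        if ($quote !== null) {
            if ($c === $quote) {
                $quote = null;          // closing quote
            }
        } elseif ($c === '"' || $c === "'") {
            $quote = $c;                // opening quote
        } elseif ($c === '(' || $c === '[') {
            $depth++;
        } elseif ($c === ')' || $c === ']') {
            $depth--;
        } elseif ($c === ',' && $depth === 0) {
            $parts[] = $buf;            // top-level comma ends the current value
            $buf = '';
            continue;
        }
        $buf .= $c;
    }
    if ($buf !== '') {
        $parts[] = $buf;
    }
    return $parts;
}

// Nowdoc keeps $msg and the quotes literal.
$str = <<<'TXT'
$msg,array('goo','gle'),000,"face",'book',['twi'=>'ter','link'=>'edin']
TXT;

$parts = splitTopLevel($str);
print_r($parts);
```

Every value stays a string here, so 000 is kept as '000', and the separating commas are dropped rather than kept at the end of each piece.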

Get content from html file

I have a list of html files. Each file repeatedly has the strings onClick="rpd(SOME_NUMBER)" . I know how to get the content from the html files, what I would want to do is get a list of the "SOME_NUMBER" . I saw that I might need to do a preg_match, but I'm horrible at regular expressions. I tried
$file_content = file_get_contents($url);
$pattern= 'onClick="rpd(#);"';
preg_match($pattern, $file_content);
As you could imagine... it didn't work. What would be the best way to get this done? Thanks!
This should get it done:
$file_content ='234=fdf donClick="rpd(5);"as23 f2 onClick="rpd(7);" dff fonClick="rpd(8);"';
$pattern= '/onClick="rpd\((\d+)\);"/';
preg_match_all($pattern, $file_content,$matches);
var_dump( $matches);
The output is like this:
array (size=2)
0 =>
array (size=3)
0 => string 'onClick="rpd(5);"' (length=17)
1 => string 'onClick="rpd(7);"' (length=17)
2 => string 'onClick="rpd(8);"' (length=17)
1 =>
array (size=3)
0 => string '5' (length=1)
1 => string '7' (length=1)
2 => string '8' (length=1)
Maybe something like this?
preg_match('/onClick="rpd\((\d+)\);"/', $file_content,$matches);
print $matches[1];
I don't know PHP, but the regular expression to match that would be:
'onClick="rpd\(([0-9]+)\)"'
Note that we need to escape those parentheses with \ because of their special meaning. We also wrapped the digits in one pair of regular parentheses (a capturing group) so they can be extracted separately.
If preg_match also supports lookahead/lookbehind expressions:
'(?<=onClick="rpd\()[0-9]+(?=\)")'
will also work.
$file_content='blah blah onClick="rpd(56)"; blah blah\nblah blah onClick="rpd(43)"; blah blah\nblah blah onClick="rpd(11)"; blah blah\n';
$pattern= '/onClick="rpd\((\d+)\)";/';
preg_match_all($pattern, $file_content, $matches);
print_r($matches);
That outputs:
Array
(
[0] => Array
(
[0] => onClick="rpd(56)";
[1] => onClick="rpd(43)";
[2] => onClick="rpd(11)";
)
[1] => Array
(
[0] => 56
[1] => 43
[2] => 11
)
)
You can play around with my example here: http://ideone.com/TzShPG
A clean way to do this is to use DOMDocument and XPath:
$doc = new DOMDocument();
@$doc->loadHTMLFile($url);
$xpath = new DOMXPath($doc);
$ress= $xpath->query("//*[contains(@onclick,'rpd(')]/attribute::onclick");
foreach ($ress as $res) {
echo substr($res->value,4,-1) . "\n";
}
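The DOM and regex approaches can also be combined, so the number is pulled out of the attribute value with a pattern instead of fixed substr() offsets. This is a sketch with an inline, made-up HTML string in place of the real files; it assumes the DOM extension is available and relies on the HTML parser lowercasing attribute names.

```php
<?php
// Made-up markup standing in for one of the html files.
$html = '<div><a onClick="rpd(5);">a</a><span onclick="rpd(42)">b</span></div>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$numbers = [];
foreach ($xpath->query('//@onclick') as $attr) {
    // Works with or without the trailing semicolon.
    if (preg_match('/rpd\((\d+)\)/', $attr->value, $m)) {
        $numbers[] = (int) $m[1];
    }
}
print_r($numbers);
```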

the fastest way to replace (and store in array) links in the text with their order numbers

There is a $str string that may contain html text including <a >link</a> tags.
I want to store links in array and set the proper changes in the $str.
For example, with this string:
$str="some text <a href='/review/'>review</a> here <a class='abc' href='/about/'>link2</a> hahaha";
we get:
linkArray[0]="<a href='/review/'>review</a>";
positionArray[0] = 10;//position of the first link in the string
linkArray[1]="<a class='abc' href='/about/'>link2</a>";
positionArray[1]=45;//position of the second link in the string
$changedStr="some text [[0]] here [[1]] hahaha";
Is there a faster way (in terms of performance) to do that than running through the whole string with a for loop?
This can be done with preg_match_all and the PREG_OFFSET_CAPTURE flag.
e.g.
$str="some text <a href='/review/'>review</a> here <a class='abc' href='/about/'>link2</a> hahaha";
preg_match_all("|<[^>]+>(.*)</[^>]+>|U",$str,$out,PREG_OFFSET_CAPTURE);
var_dump($out);
Here the output array is $out. With PREG_OFFSET_CAPTURE, every match is returned as a pair: the matched text and the byte offset in the string where that match starts.
The above code will output:
array (size=2)
0 =>
array (size=2)
0 =>
array (size=2)
0 => string '<a href='/review/'>review</a>' (length=29)
1 => int 10
1 =>
array (size=2)
0 => string '<a class='abc' href='/about/'>link2</a>' (length=39)
1 => int 45
1 =>
array (size=2)
0 =>
array (size=2)
0 => string 'review' (length=6)
1 => int 29
1 =>
array (size=2)
0 => string 'link2' (length=5)
1 => int 75
For more information, see http://php.net/manual/en/function.preg-match-all.php
For $changedStr:
Let $out be the output array from preg_match_all.
$count= 0;
foreach($out[0] as $result) {
$temp=preg_quote($result[0],'/');
$temp ="/".$temp."/";
$str =preg_replace($temp, "[[".$count."]]", $str,1);
$count++;
}
var_dump($str);
This gives the output :
string 'some text [[0]] here [[1]] hahaha' (length=33)
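The collecting and the replacing can also happen in one pass with preg_replace_callback, which accepts PREG_OFFSET_CAPTURE as its flags argument from PHP 7.4 on. This is a sketch, not a benchmark: the pattern only looks for simple, non-nested <a>...</a> pairs, and the array names are taken from the question.

```php
<?php
$str = "some text <a href='/review/'>review</a> here <a class='abc' href='/about/'>link2</a> hahaha";

$linkArray = [];
$positionArray = [];

$changedStr = preg_replace_callback(
    '~<a\b[^>]*>.*?</a>~is',
    function (array $m) use (&$linkArray, &$positionArray) {
        $linkArray[] = $m[0][0];      // the matched link
        $positionArray[] = $m[0][1];  // its offset in the original string
        return '[[' . (count($linkArray) - 1) . ']]';
    },
    $str,
    -1,
    $count,
    PREG_OFFSET_CAPTURE
);

var_dump($changedStr, $linkArray, $positionArray);
```

The offsets refer to positions in the original $str, not in $changedStr.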
I would use a regular expression for this. Check this:
http://weblogtoolscollection.com/regex/regex.php
try them here:
http://www.solmetra.com/scripts/regex/index.php
And use this:
http://php.net/manual/en/function.preg-match-all.php
Find your best regular expression to solve every case you may find: preg_match_all, if you set the pattern correctly, will return you an array containing every link you desire.
Edit:
In your case, assuming you want to keep the "<a>", this may work:
$arr = array();
preg_match_all('/<a.*.a>/', '{{your data}}', $arr, PREG_PATTERN_ORDER);
Input example:
test
Lkdlasdk
llkdla
xx
Output with the above regexp:
Array
(
[0] => Array
(
[0] => test
[1] => Lkdlasdk
[2] => xx
)
)
Hope this helps

php regex split string by [%%%]

Hi I need a preg_split regex that will split a string at substrings in square brackets.
This example input:
$string = 'I have a string containing [substrings] in [brackets].';
should provide this array output:
[0]= 'I have a string containing '
[1]= '[substrings]'
[2]= ' in '
[3]= '[brackets]'
[4]= '.'
After reading your revised question:
This might be what you want:
$string = 'I have a string containing [substrings] in [brackets].';
preg_split('/(\[.*?\])/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
You should get:
Array
(
[0] => I have a string containing
[1] => [substrings]
[2] => in
[3] => [brackets]
[4] => .
)
Original answer:
preg_split('/%+/i', 'Not limited to 3 %%% so it can be %%%% or % or %%%%%, etc Tha');
You should get:
Array
(
[0] => Not limited to 3
[1] => so it can be
[2] => or
[3] => or
[4] => , etc Tha
)
Or if you want a minimum of 3 then try:
preg_split('/%%%+/i', 'Not limited to 3 %%% so it can be %%%% or % or %%%%%, etc Tha');
Have a go at http://regex.larsolavtorvik.com/
I think this is what you are looking for:
$array = preg_split('/(\[.*?\])/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
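One detail worth checking is what happens when the string starts or ends with a bracketed part: preg_split then returns empty strings at the edges. A small sketch, with a made-up $edge string, that adds PREG_SPLIT_NO_EMPTY to drop them:

```php
<?php
$string = 'I have a string containing [substrings] in [brackets].';
$edge   = '[a] and [b]'; // hypothetical input that starts and ends with a bracketed part

$flags = PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY;

// The capturing group keeps the bracketed delimiters in the result.
$parts     = preg_split('/(\[[^\]]*\])/', $string, -1, $flags);
$edgeParts = preg_split('/(\[[^\]]*\])/', $edge, -1, $flags);

print_r($parts);
print_r($edgeParts);
```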
