PHP parsing string to array with regular expressions - php

I have a string like this:
$msg,array('goo','gle'),000,"face",'book',['twi'=>'ter','link'=>'edin']
I want to use preg_match_all to convert this to an array that could look like this:
array(
0 => $msg,
1 => array('goo','gle'),
2 => 000,
3 => "face",
4 => 'book',
5 => ['twi'=>'ter','link'=>'edin']
);
Note that all the values are strings.
I am not very good at regular expressions, so I have been unable to create a pattern for this. Multiple preg calls would also be fine.

I suggest using preg_split with the following regex:
$re = "/([a-z]*(?:\\[[^]]*\\]|\\([^()]*\\)),?)|(?<=,)/";
$str = "\$msg,array('goo','gle'),000,\"face\",'book',['twi'=>'ter','link'=>'edin']";
print_r(preg_split($re, $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY));
Output of the sample program:
Array
(
[0] => $msg,
[1] => array('goo','gle'),
[2] => 000,
[3] => "face",
[4] => 'book',
[5] => ['twi'=>'ter','link'=>'edin']
)
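Note that every piece except the last keeps its trailing comma, because the split happens just after each comma. A minimal sketch of stripping them, assuming the same $re and $str as above (the $parts and $clean names are only illustrative):

```php
<?php
// Same pattern and input as in the answer above.
$re = "/([a-z]*(?:\\[[^]]*\\]|\\([^()]*\\)),?)|(?<=,)/";
$str = "\$msg,array('goo','gle'),000,\"face\",'book',['twi'=>'ter','link'=>'edin']";

$parts = preg_split($re, $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

// Drop the comma that the split leaves at the end of each piece.
$clean = array_map(function ($part) {
    return rtrim($part, ',');
}, $parts);

print_r($clean);
```

Each element is still a string, so 000 stays '000' here.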

I know you asked for a regular expression solution; however, I'm on an eval() kick today:
eval('$array = array('.$string.');');
print_r($array);
Also note that 000 is NOT a string: eval() parses it as an integer literal, so it becomes 0. $msg must also already be defined in the scope where eval() runs.

Related

Flatten array of regular expressions

I have an array of regular expressions, $toks:
Array
(
[0] => /(?=\D*\d)/
[1] => /\b(waiting)\b/i
[2] => /^(\w+)/
[3] => /\b(responce)\b/i
[4] => /\b(from)\b/i
[5] => /\|/
[6] => /\b(to)\b/i
)
When I'm trying to flatten it:
$patterns_flattened = implode('|', $toks);
I get a regex:
/(?=\D*\d)/|/\b(waiting)\b/i|/^(\w+)/|/\b(responce)\b/i|/\b(from)\b/i|/\|/|/\b(to)\b/i
When I'm trying to:
if (preg_match('/'. $patterns_flattened .'/', "I'm waiting for a response from", $matches)) {
print_r($matches);
}
I get an error:
Warning: preg_match(): Unknown modifier '(' in ...index.php on line
Where is my mistake?
Thanks.
You need to remove the opening and closing slashes, like this:
$toks = [
'(?=\D*\d)',
'\b(waiting)\b',
'^(\w+)',
'\b(response)\b',
'\b(from)\b',
'\|',
'\b(to)\b',
];
And then, I think you'll want to use preg_match_all instead of preg_match:
$patterns_flattened = implode('|', $toks);
if (preg_match_all("/$patterns_flattened/i", "I'm waiting for a response from", $matches)) {
print_r($matches[0]);
}
Printing only $matches[0], instead of the whole $matches array, gives the full text of each match without the per-group captures:
Array
(
[0] => I
[1] => waiting
[2] => response
[3] => from
)
Try it on 3v4l.org
<?php
$data = Array
(
0 => '/(?=\D*\d)/',
1 => '/\b(waiting)\b/i',
2 => '/^(\w+)/',
3 => '/\b(responce)\b/i',
4 => '/\b(from)\b/i',
5 => '/\|/',
6 => '/\b(to)\b/i'
);
$patterns_flattened = implode('|', $data);
$regex = str_replace("/i",'',$patterns_flattened);
$regex = str_replace('/','',$regex);
if (preg_match_all( '/'.$regex.'/', "I'm waiting for a responce from", $matches)) {
echo '<pre>';
print_r($matches[0]);
}
You have to remove the delimiting slashes from each pattern, and also the i modifier, to make it work; leaving them in is what caused the error.
A really nice tool for validating a regex is:
https://regexr.com/
I always use it when I have to write a larger-than-usual regular expression.
The output of the above code is:
Array
(
[0] => I
[1] => waiting
[2] => responce
[3] => from
)
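Blanket str_replace calls would also remove any "/i" or "/" that appears inside a pattern. A minimal sketch of a narrower approach, assuming every pattern uses / as its delimiter and carries only trailing letter modifiers (the $bare and $regex names are illustrative, and "response" is spelled to match the input):

```php
<?php
$toks = [
    '/(?=\D*\d)/',
    '/\b(waiting)\b/i',
    '/^(\w+)/',
    '/\b(response)\b/i',
    '/\b(from)\b/i',
    '/\|/',
    '/\b(to)\b/i',
];

// Remove only the leading "/" and the trailing "/" plus any modifiers.
$bare = array_map(function ($pattern) {
    return preg_replace('~^/(.*)/[a-zA-Z]*$~s', '$1', $pattern);
}, $toks);

// Re-wrap the alternation in a single pair of delimiters.
$regex = '/' . implode('|', $bare) . '/i';

preg_match_all($regex, "I'm waiting for a response from", $matches);
print_r($matches[0]);
```

Using ~ as the delimiter of the helper pattern avoids having to escape the slashes it looks for.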
There are a few adjustments to make to your $toks array.
To remove the error, you need to remove the pattern delimiters and pattern modifiers from each array element.
None of the capture groups is necessary; in fact, they lead to a higher step count and bloat the output array.
Whatever your intention is with (?=\D*\d), it needs a rethink. If there is a number anywhere in your input string, you are potentially going to generate lots of empty elements, which surely can't benefit your project. Look at what happens when I put a space then 1 after from in your input string.
Here is my recommendation:
$toks = [
'\bwaiting\b',
'^\w+',
'\bresponse\b',
'\bfrom\b',
'\|',
'\bto\b',
];
$pattern = '/' . implode('|', $toks) . '/i';
var_export(preg_match_all($pattern, "I'm waiting for a response from", $out) ? $out[0] : null);
Output:
array (
0 => 'I',
1 => 'waiting',
2 => 'response',
3 => 'from',
)
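The empty-element problem mentioned above can be reproduced with a short sketch. It assumes the same tokens with the lookahead put back, and a space and 1 appended to the input (the $empty and $words names are illustrative):

```php
<?php
$toks = [
    '(?=\D*\d)',   // zero-width: succeeds wherever a digit lies ahead
    '\bwaiting\b',
    '^\w+',
    '\bresponse\b',
    '\bfrom\b',
    '\|',
    '\bto\b',
];
$pattern = '/' . implode('|', $toks) . '/i';

preg_match_all($pattern, "I'm waiting for a response from 1", $out);

// Count the zero-length matches produced by the lookahead.
$empty = count(array_filter($out[0], function ($m) {
    return $m === '';
}));
echo "empty matches: $empty\n";

// The useful matches are still there once the empty ones are filtered out.
$words = array_values(array_filter($out[0], 'strlen'));
print_r($words);
```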

How can I use preg_match_all to separate this string in PHP?

I'm wondering how you can use preg_match_all to separate this string:
2:18 textextextextextext,sdfdsfd:,fdg
So it will return an array that looks like this:
array(
0 => 2
1 => 18
2 => textextextextextext,sdfdsfd:,fdg
)
Basically removing the first colon.
You can use sscanf() with a format string:
print_r(sscanf("2:18 textextextextextext,sdfdsfd:,fdg", "%d:%d %s"));
First of all, what you want to use is preg_match() and not preg_match_all() (based on your desired output).
You could then use a regex like:
(\d+):(\d+)\s*(.*)
Which in PHP using preg_match() would look like this:
$pattern = "/(\d+):(\d+)\s*(.*)/";
$string = "2:18 textextextextextext,sdfdsfd:,fdg";
preg_match($pattern, $string, $matches);
Doing print_r($matches) would output:
Array
(
[0] => 2:18 textextextextextext,sdfdsfd:,fdg
[1] => 2
[2] => 18
[3] => textextextextextext,sdfdsfd:,fdg
)
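To get exactly the three-element array from the question, the full match at index 0 can be dropped. A minimal sketch, assuming the input always starts with two colon-separated numbers followed by whitespace (the anchors and the $parts name are additions for illustration):

```php
<?php
$string = "2:18 textextextextextext,sdfdsfd:,fdg";

if (preg_match('/^(\d+):(\d+)\s+(.*)$/', $string, $matches)) {
    // Index 0 is the whole match; keep only the captured groups.
    $parts = array_slice($matches, 1);
    print_r($parts);
}
```

Unlike sscanf() with %d, the two numbers come back as strings here.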

Preg_match_all split multiple occurrences

I have a string like this:
$string="59|https://site59.com20|https://site20.com30|https://site30.com16|https://site15.com66|https://site66.com29|https://site29.com";
This is just one example; the real data contains more than this.
I wrote this regular expression:
preg_match_all("/[0-9][0-9](?:\|)(?:https\:\/\/)(.*?)/", $string, $string2);
But it only matches the number|https:// part.
How do I make each match continue until the next occurrence of the pattern, so that every entry ends up as a separate array element?
Try this:
$string="59|https://site59.com20|https://site20.com30|https://site30.com16|https://site15.com66|https://site66.com29|https://site29.com";
preg_match_all("/(?:[0-9][0-9](?:\|)(?:https\:\/\/)(.*?)(?=[\d][\d]\||$))|([\d][\d]\|.*)/", $string, $matches);
The full matches end up in $matches[0]:
[0] => 59|https://site59.com
[1] => 20|https://site20.com
[2] => 30|https://site30.com
[3] => 16|https://site15.com
[4] => 66|https://site66.com
[5] => 29|https://site29.com
Try using preg_split:
<?php
$string="59|https://site59.com20|https://site20.com30|https://site30.com16|https://site15.com66|https://site66.com29|https://site29.com";
// PREG_SPLIT_NO_EMPTY drops the empty piece before the first delimiter
$sites = preg_split("/[0-9][0-9]\|https:\/\//", $string, -1, PREG_SPLIT_NO_EMPTY);
foreach($sites as $site){
echo "https://$site\n";
}
Output:
https://site59.com
https://site20.com
https://site30.com
https://site15.com
https://site66.com
https://site29.com
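If the two-digit number should stay associated with its URL, both can be captured in one pass. A minimal sketch, assuming every entry has the form NN|https://... and that no URL itself contains two digits followed by | (the $m and $byId names are illustrative):

```php
<?php
$string = "59|https://site59.com20|https://site20.com30|https://site30.com16|https://site15.com66|https://site66.com29|https://site29.com";

// Group 1: the two-digit id. Group 2: the URL, lazily extended until
// the next "NN|" or the end of the string (checked by the lookahead).
preg_match_all('/(\d{2})\|(https:\/\/.*?)(?=\d{2}\||$)/', $string, $m);

// Map each id to its URL.
$byId = array_combine($m[1], $m[2]);
print_r($byId);
```

PHP converts the numeric string keys to integers, so the entries are read back as $byId[59], $byId[20] and so on.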

Split a string while keeping delimiters and string outside

I'm trying to do something that must be really simple, but I'm fairly new to PHP and I'm struggling with this one. What I want is to split a string containing 0, 1 or more delimiters (braces), while keeping the delimiters AND the string between AND the string outside.
ex: 'Hello {F}{N}, how are you?' would output :
Array ( [0] => Hello
[1] => {F}
[2] => {N}
[3] => , how are you? )
Here's my code so far:
$value = 'Hello {F}{N}, how are you?';
$array= preg_split('/[\{\}]/', $value,-1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($array);
which outputs (missing braces) :
Array ( [0] => Hello
[1] => F
[2] => N
[3] => , how are you? )
I also tried :
preg_match_all('/\{[^}]+\}/', $value, $array);
Which outputs (braces are there, but the text outside is flushed) :
Array ( [0] => {F}
[1] => {N} )
I'm pretty sure I'm on the right track with preg_split, but with the wrong regex. Can anyone help me with this? Or tell me if I'm way off?
You aren't capturing the delimiters. Add them to a capturing group:
/(\{.*?\})/
You need parentheses around the part of the expression to be captured, and PREG_SPLIT_NO_EMPTY to drop the empty piece between adjacent delimiters:
preg_split('/(\{[^}]+\})/', $value, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
See the documentation for preg_split().
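Putting the pieces together, a minimal runnable sketch using the sample string from the question; PREG_SPLIT_NO_EMPTY is included so that the empty piece between {F} and {N} is dropped:

```php
<?php
$value = 'Hello {F}{N}, how are you?';

// The parentheses make preg_split return the delimiters as well.
$array = preg_split(
    '/(\{[^}]+\})/',
    $value,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

print_r($array);
```

The first element keeps the space after "Hello", since only the braces and their contents act as delimiters.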

Split by whitespace only if not surrounded by [,<,{ or ],>,}

I have a string like this one:
traceroute <ip-address|dns-name> [ttl <ttl>] [wait <milli-seconds>] [no-dns] [source <ip-address>] [tos <type-of-service>] {router <router-instance>] | all}
I'd like to create an array like this:
$params = array(
<ip-address|dns-name>
[ttl <ttl>]
[wait <milli-seconds>]
[no-dns]
[source <ip-address>]
[tos <type-of-service>]
{router <router-instance>] | all}
);
Should I use preg_split('/someregex/', $mystring) ?
Or is there any better solution?
Use negative lookarounds. This one uses a negative lookahead for a <. This means it will not split if it finds a < ahead of the whitespace.
$regex='/\s(?!<)/';
$mystring='traceroute <192.168.1.1> [ttl <120>] [wait <1500>] [no-dns] [source <192.168.1.11>] [tos <service>] {router <instance>] | all}';
$array=array();
$array = preg_split($regex, $mystring);
var_dump($array);
And my output is
array
0 => string 'traceroute <192.168.1.1>' (length=24)
1 => string '[ttl <120>]' (length=11)
2 => string '[wait <1500>]' (length=13)
3 => string '[no-dns]' (length=8)
4 => string '[source <192.168.1.11>]' (length=23)
5 => string '[tos <service>]' (length=15)
6 => string '{router <instance>]' (length=19)
7 => string '|' (length=1)
8 => string 'all}' (length=4)
You could use preg_match_all, for example:
preg_match_all("/\\[[^]]*]|<[^>]*>|{[^}]*}/", $mystring, $matches);
And get your result from the $matches array.
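A minimal runnable sketch of this approach on the string from the question (the stray ] inside the {router ...} group is kept as given; $params is the name used in the question, and the opening brace is escaped here only for readability):

```php
<?php
$mystring = 'traceroute <ip-address|dns-name> [ttl <ttl>] [wait <milli-seconds>] '
    . '[no-dns] [source <ip-address>] [tos <type-of-service>] '
    . '{router <router-instance>] | all}';

// Match a whole [...] group, a whole <...> group, or a whole {...} group.
// Angle brackets nested inside [...] or {...} are consumed by the outer group.
preg_match_all('/\[[^]]*]|<[^>]*>|\{[^}]*}/', $mystring, $matches);

$params = $matches[0];
print_r($params);
```

The leading traceroute word is not wrapped in any bracket, so it is simply skipped.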
Yes, preg_split makes sense and is probably the most efficient way to do this.
Try:
preg_split('/[\{\[<](.*?)[>\]\}]/', $mystring);
Or if you want to match rather than split, you may want to try:
$matches=array();
preg_match('/[\{\[<](.*?)[>\]\}]/',$mystring,$matches);
print_r($matches);
Updated
I missed that you're trying to get the tokens, not the content of the tokens. I think you are going to need to use preg_match_all. Try something like this for a good start:
$matches = array();
preg_match_all('/(\{.*?[\}])|(\[.*?\])|(<.*?>)/', $mystring,$matches);
var_dump($matches);
I get:
Array
(
[0] => Array
(
[0] => <ip-address|dns-name>
[1] => [ttl <ttl>]
[2] => [wait <milli-seconds>]
[3] => [no-dns]
[4] => [source <ip-address>]
[5] => [tos <type-of-service>]
[6] => {router <router-instance>] | all}
)
