PHP regex to get all int or decimal values from a pattern

The purpose of my code is to get every take-profit value, which can be an integer or a decimal. The way "Takeprofit" is written will not always be the same.
Problem:
I want $m[3][4] or $m[4][4] to be 1.0870, but I get only 0870. This happens when the number starts with 1.xxxx, because the leading 1 is consumed by the "TP 1" alternative. The two conflict and I cannot solve it.
TP-----1.0870 and TP=1.0870 are not detected.
My Code:
<?php
$s = 'SS 1.0140 SL 1.0670 TP1 1.0870 TP 1 1.0870 TP 2 1.0870 Takeprofit1 1.0870 Take profit 1 1.0870 TP 1.0870 TP-----1.0870 TP=1.0870 TP1=1.0870 TP Open';
$p = '#\b(TP1|TP 1|TP2|TP 2|TP3|TP 3|TAKE PROFIT 1|TAKE PROFIT 2|TAKE PROFIT 3|TAKEPROFIT 1|TAKEPROFIT 2|TAKEPROFIT 3|TAKEPROFIT\|TP)(.*?)(\bOpen\b|\b(\d+(?:\.\d+)?)\b)\b#i';
preg_match_all($p , $s , $m);
Result of $m:
Array
(
[0] => Array
(
[0] => TP1 1.0870
[1] => TP 1 1.0870
[2] => TP 2 1.0870
[3] => Take profit 1 1.0870
[4] => TP 1.0870
[5] => TP1=1.0870
)
[1] => Array
(
[0] => TP1
[1] => TP 1
[2] => TP 2
[3] => Take profit 1
[4] => TP 1
[5] => TP1
)
[2] => Array
(
[0] =>
[1] =>
[2] =>
[3] =>
[4] => .
[5] => =
)
[3] => Array
(
[0] => 1.0870
[1] => 1.0870
[2] => 1.0870
[3] => 1.0870
[4] => 0870
[5] => 1.0870
)
[4] => Array
(
[0] => 1.0870
[1] => 1.0870
[2] => 1.0870
[3] => 1.0870
[4] => 0870
[5] => 1.0870
)
)

You may use
'~\b(TAKE ?PROFIT ?(?:[1-3]|\|TP)|TP ?(?:[1-3](?!\.\d))?)\b(.*?)\b(Open|(\d+(?:\.\d+)?))\b~i'
Details
\b - word boundary
(TAKE ?PROFIT ?(?:[1-3]|\|TP)|TP ?(?:[1-3](?!\.\d))?) - Group 1: TAKE, an optional space, PROFIT, an optional space, then a digit from 1 to 3 or |TP substring, or TP with an optional space after it that is optionally followed with 1, 2 or 3 that are not followed with . and a digit
\b - word boundary
(.*?) - Group 2: any 0+ chars other than line break chars as few as possible
\b - word boundary
(Open|(\d+(?:\.\d+)?)) - Group 3: Open or Group 4: 1+ digits followed with an optional sequence of . and 1+ digits
\b - word boundary.
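As a rough check, the suggested pattern can be run against the sample string from the question. This is only a sketch: the function name extractTakeProfits is a made-up helper, not part of the answer above, and it simply returns Group 3 for every match.

```php
<?php
// Hypothetical helper: returns the Group 3 values (a number or "Open")
// found after each TP / Take profit label.
function extractTakeProfits(string $text): array
{
    $pattern = '~\b(TAKE ?PROFIT ?(?:[1-3]|\|TP)|TP ?(?:[1-3](?!\.\d))?)\b(.*?)\b(Open|(\d+(?:\.\d+)?))\b~i';
    preg_match_all($pattern, $text, $m);
    return $m[3];
}

// Sample string copied from the question.
$s = 'SS 1.0140 SL 1.0670 TP1 1.0870 TP 1 1.0870 TP 2 1.0870 Takeprofit1 1.0870 Take profit 1 1.0870 TP 1.0870 TP-----1.0870 TP=1.0870 TP1=1.0870 TP Open';
print_r(extractTakeProfits($s));
```

The lookahead (?!\.\d) is what keeps the 1 of "TP 1.0870" from being taken as a TP index, so the full 1.0870 ends up in Group 3.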

Related

Reg Exp - preg_match_all reduce array result

This is my Reg Exp "[c]?[\d+|\D+]\s*". My input is this "c7=c4/c5*100" and the result is :
Array
(
[0] => Array
(
[0] => c7
[1] => =
[2] => c5
[3] => +
[4] => c3
[5] => *
[6] => 1
[7] => 0
[8] => 0
)
)
But what I want is:
Array
(
[0] => Array
(
[0] => c7
[1] => =
[2] => c5
[3] => +
[4] => c3
[5] => *
[6] => 100
)
)
I can't seem to get the last part working, I'm lost as what to do next - Can anyone help?
Thanks,
Paul
You specified a character class, [\d+|\D+], which matches any single one of the listed characters. I think you meant to use alternation (|) inside a grouping construct, c?(?:\d+|\D+)\s*, but in that case \D+ would also consume the c that follows an operator, giving =c and /c as matches.
Instead, try matching an optional c (c?) followed by one or more digits, or (|) a single non-digit (\D):
c?\d+|\D
$re = '/c?\d+|\D/m';
$str = 'c7=c4/c5*100';
preg_match_all($re, $str, $matches);
print_r($matches);
That will result in:
Array
(
[0] => Array
(
[0] => c7
[1] => =
[2] => c4
[3] => /
[4] => c5
[5] => *
[6] => 100
)
)

Match numbers separated with colons, semicolons optionally

I have the following string:
objectsA=38155,54,1;38155,53,1;38155,45,1;38155,47,1;38155,46,1;2000,55,1;38155,50,1;38155,49,1;38155,48,1;38155,40,1;38155,41,1;38155,42,1;38155,43,1;38155,51,1;38155,52,1;38155,44,1;38155,35,1;38155,33,1;38155,32,1;38155,34,1;38155,36,1;38155,38,1;38155,39,1;:objectsB=
I want to know how I can optionally match the numbers inside objectsA and objectsB, taking into account that either one may be empty. For example:
objectsA can be:
objectsA=38155,54,1;38155,53,1;38155,45,1;38155,47,1;38155,46,1;2000,55,1;38155,50,1;38155,49,1;38155,48,1;38155,40,1;38155,41,1;38155,42,1;38155,43,1;38155,51,1;38155,52,1;38155,44,1;38155,35,1;38155,33,1;38155,32,1;38155,34,1;38155,36,1;38155,38,1;38155,39,1;
But also can be
objectsA=:objectsB=38155,54,1;38155,53,1;38155,45,1;38155,47,1;38155,46,1;2000,55,1;38155,50,1;38155,49,1;38155,48,1;38155,40,1;38155,41,1;38155,42,1;38155,43,1;38155,51,1;38155,52,1;38155,44,1;38155,35,1;38155,33,1;38155,32,1;38155,34,1;38155,36,1;38155,38,1;38155,39,1;
Or even
objectsA=38155,54,1;38155,53,1;38155,45,1;38155,47,1;38155,46,1;2000,55,1;38155,50,1;38155,49,1;38155,48,1;38155,40,1;38155,41,1;38155,42,1;38155,43,1;38155,51,1;38155,52,1;38155,44,1;38155,35,1;38155,33,1;38155,32,1;38155,34,1;38155,36,1;38155,38,1;38155,39,1;:objectsB=objectsA=38155,54,1;38155,53,1;38155,45,1;38155,47,1;38155,46,1;2000,55,1;38155,50,1;38155,49,1;38155,48,1;38155,40,1;38155,41,1;38155,42,1;38155,43,1;38155,51,1;38155,52,1;38155,44,1;38155,35,1;38155,33,1;38155,32,1;38155,34,1;38155,36,1;38155,38,1;38155,39,1;
The current code:
$line2 = "
2016-07-31 00:39:00 debian-8gb-sfo2-01 gdeliveryd: notice : formatlog:trade:roleidA=3328:roleidB=2161:moneyA=0:moneyB=0:objectsA=38155,54,1;38155,53,1;38155,45,1;38155,47,1;38155,46,1;2000,55,1;38155,50,1;38155,49,1;38155,48,1;38155,40,1;38155,41,1;38155,42,1;38155,43,1;38155,51,1;38155,52,1;38155,44,1;38155,35,1;38155,33,1;38155,32,1;38155,34,1;38155,36,1;38155,38,1;38155,39,1;:objectsB=";
if (strpos($line2, ':trade:roleidA=3328') > 0) {
if (!preg_match('/([\d-: ]+)\s*.*\sformatlog:trade:roleidA=(\d+):(.*)roleidB=(\d+):moneyA=(\d+):moneyB=(\d+):objectsA=(regexhere):objectsB=(regexhere).*$/', $line2, $c)) {
// error occurred
}
echo '<pre>';
print_r($c);
}
The problem is that the regex I currently use in place of regexhere, ((\d+\,\d+\,\d\;)+|), behaves in a strange way that should not happen.
Output:
Array
(
[0] => 2016-07-31 00:39:00 debian-8gb-sfo2-01 gdeliveryd: notice : formatlog:trade:roleidA=3328:roleidB=2161:moneyA=0:moneyB=0:objectsA=38155,54,1;38155,53,1;38155,45,1;38155,47,1;38155,46,1;2000,55,1;38155,50,1;38155,49,1;38155,48,1;38155,40,1;38155,41,1;38155,42,1;38155,43,1;38155,51,1;38155,52,1;38155,44,1;38155,35,1;38155,33,1;38155,32,1;38155,34,1;38155,36,1;38155,38,1;38155,39,1;:objectsB=38155,54,1;38155,53,1;38155,45,1;38155,47,1;38155,46,1;2000,55,1;38155,50,1;38155,49,1;38155,48,1;38155,40,1;38155,41,1;38155,42,1;38155,43,1;38155,51,1;38155,52,1;38155,44,1;38155,35,1;38155,33,1;38155,32,1;38155,34,1;38155,36,1;38155,38,1;38155,39,1;
[1] => 2016-07-31 00:39:00
[2] => 3328
[3] =>
[4] => 2161
[5] => 0
[6] => 0
[7] => 38155,54,1;38155,53,1;38155,45,1;38155,47,1;38155,46,1;2000,55,1;38155,50,1;38155,49,1;38155,48,1;38155,40,1;38155,41,1;38155,42,1;38155,43,1;38155,51,1;38155,52,1;38155,44,1;38155,35,1;38155,33,1;38155,32,1;38155,34,1;38155,36,1;38155,38,1;38155,39,1;
[8] => 38155,39,1;
[9] => 38155,54,1;38155,53,1;38155,45,1;38155,47,1;38155,46,1;2000,55,1;38155,50,1;38155,49,1;38155,48,1;38155,40,1;38155,41,1;38155,42,1;38155,43,1;38155,51,1;38155,52,1;38155,44,1;38155,35,1;38155,33,1;38155,32,1;38155,34,1;38155,36,1;38155,38,1;38155,39,1;
[10] => 38155,39,1;
)
For some reason, if the objects have the same size, the regex creates a new array index, which shouldn't happen.
The expected result:
Array
(
[0] => 2016-07-31 00:39:00 debian-8gb-sfo2-01 gdeliveryd: notice : formatlog:trade:roleidA=3328:roleidB=2161:moneyA=0:moneyB=0:objectsA=38155,54,1;38155,53,1;38155,45,1;38155,47,1;38155,46,1;2000,55,1;38155,50,1;38155,49,1;38155,48,1;38155,40,1;38155,41,1;38155,42,1;38155,43,1;38155,51,1;38155,52,1;38155,44,1;38155,35,1;38155,33,1;38155,32,1;38155,34,1;38155,36,1;38155,38,1;38155,39,1;:objectsB=38155,54,1;38155,53,1;38155,45,1;38155,47,1;38155,46,1;2000,55,1;38155,50,1;38155,49,1;38155,48,1;38155,40,1;38155,41,1;38155,42,1;38155,43,1;38155,51,1;38155,52,1;38155,44,1;38155,35,1;38155,33,1;38155,32,1;38155,34,1;38155,36,1;38155,38,1;38155,39,1;
[1] => 2016-07-31 00:39:00
[2] => 3328
[4] => 2161
[5] => 0
[6] => 0
[7] => 38155,54,1;38155,53,1;38155,45,1;38155,47,1;38155,46,1;2000,55,1;38155,50,1;38155,49,1;38155,48,1;38155,40,1;38155,41,1;38155,42,1;38155,43,1;38155,51,1;38155,52,1;38155,44,1;38155,35,1;38155,33,1;38155,32,1;38155,34,1;38155,36,1;38155,38,1;38155,39,1;
[8] => 38155,54,1;38155,53,1;38155,45,1;38155,47,1;38155,46,1;2000,55,1;38155,50,1;38155,49,1;38155,48,1;38155,40,1;38155,41,1;38155,42,1;38155,43,1;38155,51,1;38155,52,1;38155,44,1;38155,35,1;38155,33,1;38155,32,1;38155,34,1;38155,36,1;38155,38,1;38155,39,1;
)
Regex: ^(?:\s?\d+(?:[-:]\d+){2}){2}|\w+=\K[^:]+
Details:
(?:) Non-capturing group
[] Match a single character present in the list
\K Resets the starting point of the reported match
+ Matches between one and unlimited times
| Or
PHP code:
$string = "2016-07-31 00:39:00 debian-8gb-sfo2-01 gdeliveryd: notice : formatlog:trade:roleidA=3328:roleidB=2161:moneyA=0:moneyB=0:objectsA=38155,54,1;38155,53,1;38155,45,1;38155,47,1;38155,46,1;2000,55,1;38155,50,1;38155,49,1;38155,48,1;38155,40,1;38155,41,1;38155,42,1;38155,43,1;38155,51,1;38155,52,1;38155,44,1;38155,35,1;38155,33,1;38155,32,1;38155,34,1;38155,36,1;38155,38,1;38155,39,1;:objectsB=38155,54,1;38155,53,1;38155,45,1;38155,47,1;38155,46,1;2000,55,1;38155,50,1;38155,49,1;38155,48,1;38155,40,1;38155,41,1;38155,42,1;38155,43,1;38155,51,1;38155,52,1;38155,44,1;38155,35,1;38155,33,1;38155,32,1;38155,34,1;38155,36,1;38155,38,1;38155,39,1;";
preg_match_all('~^(?:\s?\d+(?:[-:]\d+){2}){2}|\w+=\K[^:]+~', $string, $matches);
print_r($matches[0]);
Output:
Array
(
[0] => 2016-07-31 00:39:00
[1] => 3328
[2] => 2161
[3] => 0
[4] => 0
[5] => 38155,54,1;38155,53,1;38155,45,1;38155,47,1;38155,46,1;2000,55,1;38155,50,1;38155,49,1;38155,48,1;38155,40,1;38155,41,1;38155,42,1;38155,43,1;38155,51,1;38155,52,1;38155,44,1;38155,35,1;38155,33,1;38155,32,1;38155,34,1;38155,36,1;38155,38,1;38155,39,1;
[6] => 38155,54,1;38155,53,1;38155,45,1;38155,47,1;38155,46,1;2000,55,1;38155,50,1;38155,49,1;38155,48,1;38155,40,1;38155,41,1;38155,42,1;38155,43,1;38155,51,1;38155,52,1;38155,44,1;38155,35,1;38155,33,1;38155,32,1;38155,34,1;38155,36,1;38155,38,1;38155,39,1;
)
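One thing to keep in mind with the pattern above: [^:]+ needs at least one character, so an empty field such as objectsB= produces no entry and the positions in the result can shift. A keyed variant avoids relying on positions. The sketch below is an assumption of mine rather than part of the answer; parseFields is a hypothetical helper name.

```php
<?php
// Hypothetical helper: collects key=value fields separated by ':' into an
// associative array, keeping fields whose value is empty.
function parseFields(string $line): array
{
    // Group 1: field name; Group 2: value up to the next ':' (may be empty).
    preg_match_all('~(\w+)=([^:]*)~', $line, $m);
    return array_combine($m[1], $m[2]);
}

// Shortened version of the log line from the question.
$line = 'formatlog:trade:roleidA=3328:roleidB=2161:moneyA=0:moneyB=0:objectsA=38155,54,1;2000,55,1;:objectsB=';
print_r(parseFields($line));
```

Each value can then be read by name, for example parseFields($line)['objectsA'], whether or not the other list is empty.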
For those navigating through this question: sometimes less is more. The pattern (.*) will do the trick.
([\d-: ]+)\s*.*\sformatlog:trade:roleidA=(\d+):roleidB=(\d+):moneyA=(\d+):moneyB=(\d+):objectsA=(.*):objectsB=(.*).*$
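The extra indexes 8 and 10 in the question's output come from the inner capturing group in ((\d+\,\d+\,\d\;)+|): a repeated capturing group keeps only its last repetition, here 38155,39,1;. If (.*) feels too permissive, one stricter option is to make the inner group non-capturing. The following is a sketch under my own assumptions: parseTradeLine, the array keys, and the escaped hyphen in the first character class are not taken from the answers above.

```php
<?php
// Hypothetical helper; returns null when the line is not a trade log entry.
function parseTradeLine(string $line): ?array
{
    // (?:...)* repeats "n,n,n;" without creating an extra capture,
    // and * allows either list to be empty.
    $objects = '((?:\d+,\d+,\d+;)*)';
    $pattern = '/^([\d\-: ]+)\s.*\sformatlog:trade:roleidA=(\d+):roleidB=(\d+)'
             . ':moneyA=(\d+):moneyB=(\d+):objectsA=' . $objects
             . ':objectsB=' . $objects . '\s*$/';

    if (!preg_match($pattern, trim($line), $m)) {
        return null;
    }

    return [
        'time'     => $m[1],
        'roleidA'  => $m[2],
        'roleidB'  => $m[3],
        'moneyA'   => $m[4],
        'moneyB'   => $m[5],
        'objectsA' => $m[6] ?? '',
        'objectsB' => $m[7] ?? '',
    ];
}

// Shortened version of the log line from the question.
$log = '2016-07-31 00:39:00 debian-8gb-sfo2-01 gdeliveryd: notice : formatlog:trade:roleidA=3328:roleidB=2161:moneyA=0:moneyB=0:objectsA=38155,54,1;2000,55,1;:objectsB=';
print_r(parseTradeLine($log));
```

Compared with (.*), this version rejects lines whose object lists do not follow the n,n,n; shape, which may or may not be what you want.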

How to make this weird string explode in PHP?

I have a string like the following
DAS-1111[DR-Helpfull-R]-RUN--[121668688374]-N-[+helpfull_+string]
The above string is a kind of formatted in groups that looks like the following:
A-B[C]-D-E-[F]-G-[H]
The thing is that I would like to process some of those groups, so I want to do something like explode.
I say "like" because I have tried this code:
$string = 'DAS-1111[DR-Helpfull-R]-RUN--[121668688374]-N-[+helpfull_+string]';
$parts = explode( '-', $string );
print_r( $parts );
and I get the following result:
Array
(
[0] => DAS
[1] => 1111[DR
[2] => Helpfull
[3] => R]
[4] => RUN
[5] =>
[6] => [121668688374]
[7] => N
[8] => [+helpfull_+string]
)
which is not what I need.
What I need is the following output:
Array
(
[0] => DAS
[1] => 1111[DR-Helpfull-R]
[2] => RUN
[3] =>
[4] => [121668688374]
[5] => N
[6] => [+helpfull_+string]
)
Can someone please suggest a nice and elegant way to explode this string the way I need?
What I forgot to mention is that the string can have more or fewer groups. Examples:
DAS-1111[DR-Helpfull-R]-RUN--[121668688374]-N-[+helpfull_+string]
DAS-1111[DR-Helpfull-R]-RUN--[121668688374]
DAS-1111[DR-Helpfull-R]-RUN--[121668688374]-N-[+helpfull_+string]-anotherPart
Update 1
As mentioned by @axiac, preg_split can do the work. But can you please help with the regex now?
I have tried this, but it seems to be incorrect:
(?!\]\-)\-
The code:
$str = 'DAS-1111[DR-Helpfull-R]-RUN--[121668688374]-N-[+helpfull_+string]';
$re = '/([^-[]*(?:\[[^\]]*\])?[^-]*)-?/';
$matches = array();
preg_match_all($re, $str, $matches);
print_r($matches[1]);
Its output:
Array
(
[0] => DAS
[1] => 1111[DR-Helpfull-R]
[2] => RUN
[3] =>
[4] => [121668688374]
[5] => N
[6] => [+helpfull_+string]
[7] =>
)
There is an extra empty value at position 7 in the output. It appears because of the zero-or-one quantifier (?) placed at the end of the regex. The quantifier is needed because without it the last piece (at index 6) is not matched.
You can remove the ? after the last - so that the dash must always match. In that case you must append an extra - to your input string.
The regex
( # start of the 1st subpattern
# the captured value is returned in $matches[1]
[^-[]* # match any character but '-' and '[', zero or more times
(?: # start of a non-capturing subpattern
\[ # match an opening square bracket ('[')
[^\]]* # match any character but ']', zero or more times
\] # match a closing square bracket (']')
)? # end of the subpattern; it is optional (can appear 0 or 1 times)
[^-]* # match any character but '-', zero or more times
) # end of the 1st subpattern
-? # match an optional dash ('-')
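The "remove the ? and append a dash" variant mentioned above can be sketched as follows. The wrapper function splitGroups is a hypothetical name introduced only for this illustration.

```php
<?php
// Hypothetical helper illustrating the "append a dash" variant.
function splitGroups(string $input): array
{
    // Same pattern as above, but the trailing dash is now mandatory,
    // so no empty match can occur at the very end of the string.
    $re = '/([^-[]*(?:\[[^\]]*\])?[^-]*)-/';
    preg_match_all($re, $input . '-', $matches);
    return $matches[1];
}

print_r(splitGroups('DAS-1111[DR-Helpfull-R]-RUN--[121668688374]-N-[+helpfull_+string]'));
```

Appending the dash inside the helper keeps the caller's string untouched.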
Instead of exploding you should try to match the following pattern:
(?:^|-)([^-\[]*(?:\[[^\]]+\])?)
Here is an example:
$regex = '/(?:^|-)([^-\[]*(?:\[[^\]]+\])?)/';
$tests = array(
    'DAS-1111[DR-Helpfull-R]-RUN--[121668688374]-N-[+helpfull_+string]',
    'DAS-1111[DR-Helpfull-R]-RUN--[121668688374]',
    'DAS-1111[DR-Helpfull-R]-RUN--[121668688374]-N-[+helpfull_+string]-anotherPart'
);
foreach ($tests as $test) {
    preg_match_all($regex, $test, $result);
    print_r($result[1]);
}
Output:
// DAS-1111[DR-Helpfull-R]-RUN--[121668688374]-N-[+helpfull_+string]
Array
(
[0] => DAS
[1] => 1111[DR-Helpfull-R]
[2] => RUN
[3] =>
[4] => [121668688374]
[5] => N
[6] => [+helpfull_+string]
)
// DAS-1111[DR-Helpfull-R]-RUN--[121668688374]
Array
(
[0] => DAS
[1] => 1111[DR-Helpfull-R]
[2] => RUN
[3] =>
[4] => [121668688374]
)
// DAS-1111[DR-Helpfull-R]-RUN--[121668688374]-N-[+helpfull_+string]-anotherPart
Array
(
[0] => DAS
[1] => 1111[DR-Helpfull-R]
[2] => RUN
[3] =>
[4] => [121668688374]
[5] => N
[6] => [+helpfull_+string]
[7] => anotherPart
)
This case is perfect for the (*SKIP)(*FAIL) method. You want to split your string on the hyphens, so long as they aren't inside of square brackets.
Easy. Just disqualify these hyphens as delimiters like so:
Pattern: ~\[[^]]+\](*SKIP)(*FAIL)|-~
Code:
$strings = [
    'DAS-1111[DR-Helpfull-R]-RUN--[121668688374]-N-[+helpfull_+string]',
    'DAS-1111[DR-Helpfull-R]-RUN--[121668688374]',
    'DAS-1111[DR-Helpfull-R]-RUN--[121668688374]-N-[+helpfull_+string]-anotherPart',
];
foreach ($strings as $string) {
    var_export(preg_split('~\[[^]]+\](*SKIP)(*FAIL)|-~', $string));
    echo "\n\n";
}
Output:
array (
0 => 'DAS',
1 => '1111[DR-Helpfull-R]',
2 => 'RUN',
3 => '',
4 => '[121668688374]',
5 => 'N',
6 => '[+helpfull_+string]',
)
array (
0 => 'DAS',
1 => '1111[DR-Helpfull-R]',
2 => 'RUN',
3 => '',
4 => '[121668688374]',
)
array (
0 => 'DAS',
1 => '1111[DR-Helpfull-R]',
2 => 'RUN',
3 => '',
4 => '[121668688374]',
5 => 'N',
6 => '[+helpfull_+string]',
7 => 'anotherPart',
)

return specific values from array using php

I have an array that is produced from a mysql query:
Array ( [0] => AA [alleles] => AA [1] => 6 [total] => 6 [2] => 25.00 [percentage] => 25.00 )
Array ( [0] => AG [alleles] => AG [1] => 11 [total] => 11 [2] => 45.83 [percentage] => 45.83 )
Array ( [0] => GG [alleles] => GG [1] => 7 [total] => 7 [2] => 29.17 [percentage] => 29.17 )
How do I parse this data with php to show:
AA 25%
AG 45.83%
GG 29.17%
It looks like you want to return two elements per array: elements 0 and 2.
Depending on what language you are using, it would be something like this (in PHP):
$answer_string = $row[0] . ' ' . $row[2] . '%'; // $row is one of the arrays above
PS. In the future, always tag your question with the appropriate language to get more helpful answers (it wouldn't hurt to edit your question accordingly).
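Since the question asks for 25% rather than 25.00%, a slightly fuller PHP sketch could cast the percentage to a float before printing. The $rows array below is a stand-in for the rows fetched from MySQL (only the associative keys shown in the question are used), and formatRow is a hypothetical helper name.

```php
<?php
// Hypothetical helper: formats one fetched row as "<alleles> <percentage>%".
function formatRow(array $row): string
{
    // Casting to float drops trailing zeros: "25.00" is printed as 25.
    return $row['alleles'] . ' ' . (float) $row['percentage'] . '%';
}

// Stand-in for the rows returned by the query in the question.
$rows = [
    ['alleles' => 'AA', 'total' => '6',  'percentage' => '25.00'],
    ['alleles' => 'AG', 'total' => '11', 'percentage' => '45.83'],
    ['alleles' => 'GG', 'total' => '7',  'percentage' => '29.17'],
];

foreach ($rows as $row) {
    echo formatRow($row), "\n";
}
```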

How to match an optional subpattern in the middle or end of an url depending on the existence of a filename and extension

I am trying to preg_match a URL consisting of a category slug, an optional subcategory slug and an optional item slug.
It works in all cases except case 4 ($urls[4]).
$urls[0] = '/main_cat_slug';
$urls[1] = '/main_cat_slug/';
$urls[2] = '/main_cat_slug/sub_cat_slug';
$urls[3] = '/main_cat_slug/sub_cat_slug/';
$urls[4] = '/main_cat_slug/item.html';
$urls[5] = '/main_cat_slug/sub_cat_slug/item.html';
$regexp = array();
$regexp[] = '/(?:(?<category>[\w]+)/?)'; // Find the main category (is always available)
$regexp[] = '(?:(?<subcategory>[\w]+)/?)?'; // Find an optional sub-category, is not always available
$regexp[] = '(?:(?<item>[\w]+)\.html)?'; // Find an optional item, is not always available (don't catch the extension)
$regexp = implode('', $regexp);
foreach ($urls as $index => $url) {
    preg_match("#{$regexp}#i", $url, $matches);
    echo '<pre><h1>', $index, '</h1>';
    echo $url, '<br />';
    echo '<br />';
    print_r($matches);
}
In case 4, the category is found, but the item is empty and the subcategory gets the value "item".
Could someone help me out, so that case 4 only gets a category and an item?
This is the output for above code:
0
/main_cat_slug
Array
(
[0] => /main_cat_slug
[category] => main_cat_slug
[1] => main_cat_slug
)
1
/main_cat_slug/
Array
(
[0] => /main_cat_slug/
[category] => main_cat_slug
[1] => main_cat_slug
)
2
/main_cat_slug/sub_cat_slug
Array
(
[0] => /main_cat_slug/sub_cat_slug
[category] => main_cat_slug
[1] => main_cat_slug
[subcategory] => sub_cat_slug
[2] => sub_cat_slug
)
3
/main_cat_slug/sub_cat_slug/
Array
(
[0] => /main_cat_slug/sub_cat_slug/
[category] => main_cat_slug
[1] => main_cat_slug
[subcategory] => sub_cat_slug
[2] => sub_cat_slug
)
4
/main_cat_slug/item.html
Array
(
[0] => /main_cat_slug/item
[category] => main_cat_slug
[1] => main_cat_slug
[subcategory] => item
[2] => item
)
5
/main_cat_slug/sub_cat_slug/item.html
Array
(
[0] => /main_cat_slug/sub_cat_slug/item.html
[category] => main_cat_slug
[1] => main_cat_slug
[subcategory] => sub_cat_slug
[2] => sub_cat_slug
[item] => item
[3] => item
)
Kind regards!
Patrick
Description
This regex will pick out the three types of data, using the following rules:
The / is always the first character in the string
The Main_Cat is always first; it follows the first / and continues until the next /
If the first string ends in .html/ then this is a Main_Cat
If the first string ends in .html followed by the end of the string, then this is an Item
The Sub_Cat is always second; it follows the second / and continues until the next /
If the second string ends in .html/ then this is a Sub_Cat
If the second string ends in .html followed by the end of the string, then this is an Item
The Item type always has an .html suffix
There will never be a / after the Item
The Item type will always be the last field
^\/(?:(?<Main_Cat>(?![^\/\r\n]*\.html\s*$)[^\/\r\n]*)\/)?(?:(?<Sub_Cat>(?![^\/\r\n]*\.html\s*$)[^\/\r\n]*)\/)?(?:(?<Item>[^\/\r\n]*?)(?:\.html|$))?
If you're using this expression against individual strings, then you can remove the newline characters \r\n. The resulting expression would look like:
^\/(?<Main_Cat>[^\/]*)(?:(?:\/(?![^\/]*\.html)(?<Sub_Cat>[^\/]*))?(?:\/(?<Item>[^\/]*)\.html)?)?.*?$
It follows the same rules as above. Note that the end-of-line anchor $ forces the test to match your entire string.
PHP Code Example:
Source String
/category0.html/subcat/item.html
/item1.html
/category2.html/subcat2.html/item2.html
/category3.html/subcat3.html/
/category4.html/item4.html
/main_cat_slug5.html/
/main_cat_slug6/item6
/main_cat_slug7/sub_cat_slug7.html/
/main_cat_slug8/item8.html
/main_cat_slug9/sub_cat_slug9/item9.html
Code
<?php
$sourcestring="your source string";
preg_match_all('/^\/(?:(?<Main_Cat>(?![^\/\r\n]*\.html\s*$)[^\/\r\n]*)\/)?(?:(?<Sub_Cat>(?![^\/\r\n]*\.html\s*$)[^\/\r\n]*)\/)?(?:(?<Item>[^\/\r\n]*?)(?:\.html|$))?/imx',$sourcestring,$matches);
echo "<pre>".print_r($matches,true);
?>
Matches
$matches Array:
(
[0] => Array
(
[0] => /category0.html/subcat/item.html
[1] => /item1.html
[2] => /category2.html/subcat2.html/item2.html
[3] => /category3.html/subcat3.html
[4] => /category4.html/item4.html
[5] => /main_cat_slug5.html
[6] => /main_cat_slug6
[7] => /main_cat_slug7/sub_cat_slug7.html
[8] => /main_cat_slug8/item8.html
[9] => /main_cat_slug9/sub_cat_slug9/item9.html
)
[Main_Cat] => Array
(
[0] => category0.html
[1] =>
[2] => category2.html
[3] => category3.html
[4] => category4.html
[5] => main_cat_slug5.html
[6] => main_cat_slug6
[7] => main_cat_slug7
[8] => main_cat_slug8
[9] => main_cat_slug9
)
[Sub_Cat] => Array
(
[0] => subcat
[1] =>
[2] => subcat2.html
[3] => subcat3.html
[4] =>
[5] =>
[6] =>
[7] => sub_cat_slug7.html
[8] =>
[9] => sub_cat_slug9
)
[Item] => Array
(
[0] => item
[1] => item1
[2] => item2
[3] =>
[4] => item4
[5] =>
[6] =>
[7] =>
[8] => item8
[9] => item9
)
)
You can try this:
preg_match('~/(?<main_cat>[^/\s]++/?+)(?<sub_cat>[^/\s]++/?+)?'
. '(?>(?<filename>\S+?)\.html)?~', $url, $match);
print_r($match);
Note that you can easily access the different parts through the named captures, which is also useful for testing whether a subpattern matched.
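For the six URLs in the question, another way to keep item.html out of the subcategory is a negative lookahead after the subcategory slug. This is a sketch under my own assumptions, not one of the answers above: parseSlugUrl is a hypothetical helper, and slugs are assumed to consist of \w characters only, as in the question's pattern.

```php
<?php
// Hypothetical helper: splits a URL into category, optional subcategory
// and optional item (without the .html extension).
function parseSlugUrl(string $url): ?array
{
    $pattern = '#^/(?<category>\w+)/?'
             // A slug followed by a word character or a dot is not a
             // subcategory, so "item.html" is left for the item group.
             . '(?:(?<subcategory>\w+)(?![\w.])/?)?'
             . '(?:(?<item>\w+)\.html)?$#i';

    if (!preg_match($pattern, $url, $m)) {
        return null;
    }

    // Groups that did not take part in the match are missing or empty.
    $get = function (string $name) use ($m) {
        return isset($m[$name]) && $m[$name] !== '' ? $m[$name] : null;
    };

    return [
        'category'    => $m['category'],
        'subcategory' => $get('subcategory'),
        'item'        => $get('item'),
    ];
}

print_r(parseSlugUrl('/main_cat_slug/item.html'));
```

Returning null for missing parts makes it straightforward to test which parts of the URL were present.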
