I have a very long list in a text file, something like this:
001.Porus.2017.S01E01.The.Epic.Story.Of.A.Warrior.720.x264.mp4
002.Porus.2017.S01E01.Welcome.With.A.Fight.720.x264.mp4
003.Porus.2017.S01E01.Anusuya.Stays.in.Poravs.720.x264.mp4
004.Porus.2017.S01E01.Olympia.Prays.For.A.Child.720.x264.mp4
.................
I want to replace the E01 in S01E01 on each line with the number at the front of that line. The output I want:
001.Porus.2017.S01E001.The.Epic.Story.Of.A.Warrior.720.x264.mp4
002.Porus.2017.S01E002.Welcome.With.A.Fight.720.x264.mp4
003.Porus.2017.S01E003.Anusuya.Stays.in.Poravs.720.x264.mp4
004.Porus.2017.S01E004.Olympia.Prays.For.A.Child.720.x264.mp4
......................
By the way, I'm using the following code:
$list = file("list.txt", FILE_SKIP_EMPTY_LINES | FILE_IGNORE_NEW_LINES);
$string = "";
foreach($list as $index => $entry)
{
$string .= str_pad($index + 1, 3, '0', STR_PAD_LEFT) . "." . $entry . ", ";
}
$string = substr($string, 0 , -2);
$get = explode(",", $string);
$phr = implode("<br>", array_values(array_unique($get)));
print_r($phr);
<pre>
<?php
$arr = [
'001.Porus.2017.S01E01.The.Epic.Story.Of.A.Warrior.720.x264.mp4',
'002.Porus.2017.S01E01.Welcome.With.A.Fight.720.x264.mp4',
'003.Porus.2017.S01E01.Anusuya.Stays.in.Poravs.720.x264.mp4',
'004.Porus.2017.S01E01.Olympia.Prays.For.A.Child.720.x264.mp4'
];
$your_next_number = "some_number";
$modified_values = [];
foreach($arr as $each_value){
$modified_values[] = str_replace("S01E01","S01".$your_next_number,$each_value);
//$your_next_number = something;some change that you want to make to your next iterable number.
}
print_r($modified_values);
OUTPUT
Array
(
[0] => 001.Porus.2017.S01some_number.The.Epic.Story.Of.A.Warrior.720.x264.mp4
[1] => 002.Porus.2017.S01some_number.Welcome.With.A.Fight.720.x264.mp4
[2] => 003.Porus.2017.S01some_number.Anusuya.Stays.in.Poravs.720.x264.mp4
[3] => 004.Porus.2017.S01some_number.Olympia.Prays.For.A.Child.720.x264.mp4
)
UPDATE
You can replace the code inside the foreach with the code below.
Credit to @ArtisticPhoenix for this improvement and its explanation.
foreach($arr as $each_value){
$modified_values[] = preg_replace('/^(\d+)([^S]+)(S01E)(\d+)/', '\1\2\3\1', $each_value);
}
^(\d+) => This is group 1. Anchored at the start of the string (^), it captures one or more (+) digits (\d).
([^S]+) => This is group 2. It captures one or more (+) characters that are not a capital S ([^S]).
(S01E) => This is group 3. It captures S01E as it is.
(\d+) => This is group 4. It captures the digits that follow S01E.
Note that group numbers use 1-based indexing, since group 0 is the entire match.
Replacement Part:
The replacement is \1\2\3\1.
In a replacement string, \ followed by an integer (a group number) refers to the text captured by that group.
So the 1st, 2nd and 3rd captures are put back where they came from, and the 4th capture is replaced with the 1st: the initial digits replace the last digits in the match.
So, let's take 001.Porus.2017.S01E01.The.Epic.Story.Of.A.Warrior.720.x264.mp4 as an example-
(\d+)([^S]+)(S01E)(\d+) matches this way => (001)(.Porus.2017.)(S01E)(01)
Hence, the replacement produces (001)(.Porus.2017.)(S01E)(001). Notice that the last group changed because of the \1 at the end of the replacement \1\2\3\1. The rest of the string remains the same.
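As a minimal sketch of the same idea, the replacement can be wrapped in a small function. The name renumberEpisode is hypothetical, and the pattern is a variation on the one above (an assumption, not the original answer's regex): it uses a lazy .*? instead of [^S]+ so that a capital S earlier in the name does not stop the match, and it accepts any season number.

```php
<?php
// Hypothetical helper: copy the leading counter into the episode slot.
function renumberEpisode(string $name): string
{
    // Group 1: the leading digits.
    // Group 2: everything up to and including the first "S<digits>E".
    // The old episode digits that follow are matched but not captured,
    // so they are dropped and replaced by group 1.
    return preg_replace('/^(\d+)(.*?S\d+E)\d+/', '$1$2$1', $name) ?? $name;
}

$names = [
    '001.Porus.2017.S01E01.The.Epic.Story.Of.A.Warrior.720.x264.mp4',
    '002.Porus.2017.S01E01.Welcome.With.A.Fight.720.x264.mp4',
];

foreach ($names as $name) {
    echo renumberEpisode($name), PHP_EOL;
}
```

Names that do not start with digits or contain no S..E.. marker are returned unchanged, because preg_replace leaves non-matching subjects alone.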
I don't know what the title of this question should be. The problem is one of logic, and logic is what I lack.
I have a string in the format [CONST att1="value1" att2="value2"] and created a regex that was working fine, but certain inputs break it.
$data = preg_split('/(?<=\")\s/', $replace_str[1]);
foreach ($data as $index_val => $exp_data) {
if(!empty($exp_data)){
$attributes = explode('=',$exp_data);
if(count($attributes) > 0){
$index = strtolower(str_replace(array("'", "\""), "", trim($attributes[0])));
$item_value = str_replace(array("'", "\""), "", trim($attributes[1]));
$item_value = $attributes[1];
$array_data[$index] = $item_value;
}
}
}
I then use the array to look up values by key. But in some instances, say if the format is like the one below,
[CONST att1="value1" att2= "value2"]
the exploded variable contains ' "value2"' (notice the leading space). What I want is "value2".
Since my format is similar to that of a WordPress shortcode, I looked at the shortcode.php file in WordPress and found #[<>&/\[\]\x00-\x20=]# inside it, but I am unable to understand it or make it work.
I need to access value1 and value2 as clean data, i.e. without spaces or single and double quotes at the start and end. It should also work if the order of att1 and att2 is changed:
[CONST att2="value2" att1="value1"]
Should output:
array(att1=>value1,att2=>value2)
I suggest collecting keys and values in the shortcode string using a matching regex with preg_match_all like
'~(?:\G(?!\A)\s+|^\[\w+\s+)\K([^\s=]*)\s*=\s*"([^"]*)"~'
See the regex demo.
Details
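(?:\G(?!\A)\s+|^\[\w+\s+) - either the end of the previous match followed by 1+ whitespaces, or the start of the string followed by [, 1+ word chars and 1+ whitespaces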
\K - match reset operator
([^\s=]*) - Group 1 (attribute name): 0+ chars other than whitespace and =
\s*=\s* - a = enclosed with 0+ whitespaces
" - a double quotation mark
([^"]*) - Group 2 (attribute value inside quotes): any 0+ chars other than "
" - a double quotation mark
After you get an array of matches you will have to build your associative array "manually" like
$s = '[CONST att1="value1" att2="value2"]';
preg_match_all('/(?:\G(?!\A)\s+|^\[\w+\s+)\K(\w+)\s*=\s*"([^"]*)"/', $s, $array_data, PREG_SET_ORDER, 0);
$res = [];
foreach ($array_data as $kvp) {
$res[$kvp[1]] = $kvp[2];
}
print_r($res);
// -> Array ( [att1] => value1 [att2] => value2 )
See the PHP demo.
Another way of processing the matches (demo):
if (preg_match_all('/(?:\G(?!\A)\s+|^\[\w+\s+)\K(\w+)\s*=\s*"([^"]*)"/', $s, $array_data)) {
array_shift($array_data);
print_r(array_combine($array_data[0], $array_data[1]));
}
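To check the two cases from the question (a space after the = sign, and attributes in a different order), the loop above can be wrapped in a function. This is only a sketch: the name parseShortcode is hypothetical, and the ksort call is an added assumption so the result does not depend on attribute order.

```php
<?php
// Hypothetical wrapper around the preg_match_all approach shown above.
function parseShortcode(string $shortcode): array
{
    $pattern = '/(?:\G(?!\A)\s+|^\[\w+\s+)\K(\w+)\s*=\s*"([^"]*)"/';
    $attrs = [];

    if (preg_match_all($pattern, $shortcode, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // $m[1] is the attribute name, $m[2] the value without quotes.
            $attrs[$m[1]] = $m[2];
        }
    }

    // Sort by key so att1 always comes before att2.
    ksort($attrs);

    return $attrs;
}

print_r(parseShortcode('[CONST att2= "value2" att1="value1"]'));
```

Because \s* sits on both sides of the = sign and the quotes are outside the capturing group, the values come back without leading spaces or quotes.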
A user from another thread helped me figure out how to get the numbers from an array, but now I can't get the numbers after the "-" dash. Let me show you what I have and explain the situation.
I've got an array with the following content:
Array(
[0] => <tr><td>29/06/2015</td><td>19:35</td><td>12345 Column information</td><td>67899 Column information - 12</td><td>Information</td><td>More information</td></tr>
[1] => <tr><td>12/03/2015</td><td>10:12</td><td>98545 Column information</td><td>67659 Column information - 32</td><td>Information</td><td>More information</td></tr>
[2] => <tr><td>11/02/2015</td><td>12:40</td><td>59675 Column information</td><td>94859 Column information - 11</td><td>Information</td><td>More information</td></tr>
[3] => <tr><td>01/01/2015</td><td>20:12</td><td>69365 Column information</td><td>78464 Column information - 63</td><td>Information</td><td>More information</td></tr>
)
Finally I know how to get every number (except the number after dash "-"):
$re = "/.*?(\\d+)\\s.*?(\\d+)\\s.*/m";
$str = "<tr><td>29/06/2015</td><td>19:35</td><td>12345 Column information</td><td>67899 Column information - 12</td><td>Information</td><td>More information</td></tr>";
$subst = "$1, $2";
$result = preg_replace($re, $subst, $str);
Here's the $result output:
foreach($result as $finalresult) echo $finalresult.'<br>';
12345,67899
98545,67659
59675,94859
69365,78464
What I expected from all this and cannot figure out is how to get the number after the dash "-" too:
12345,67899-12
98545,67659-32
59675,94859-11
69365,78464-63
But it does not end there: when the number after the dash "-" is lower than 50, I need to transform the $result output. See the example below.
If the number after "-" is < 50, it needs to be transformed by taking its first digit and putting it in the units position, so the tens position becomes zero.
When it is 50 or above, the number remains as it is. Example:
12345,67899-12 ------> 12345,67899-01
98545,67659-32 ------> 98545,67659-03
59675,94859-11 ------> 59675,94859-01
52375,53259-49 ------> 52375,53259-04
69365,73464-63 ------> 69365,73464-63
89765,12332-51 ------> 89765,12332-51
38545,54213-70 ------> 38545,54213-70
And now is when my head explodes!
Beforehand thanks a lot for your help.
This may be what you are looking for. I modified your regular expression slightly. The (.*?<td>){3} will match anything up to the third <td>. The ?P<first> in the subpattern (?P<first>\d+) etc. is called a named subpattern, which makes their value easy to access from the $matches array.
$a = [
'<tr><td>29/06/2015</td><td>19:35</td><td>12345 Column information</td><td>67899 Column information - 12</td><td>Information</td><td>More information</td></tr>',
'<tr><td>12/03/2015</td><td>10:12</td><td>98545 Column information</td><td>67659 Column information - 32</td><td>Information</td><td>More information</td></tr>',
'<tr><td>11/02/2015</td><td>12:40</td><td>59675 Column information</td><td>94859 Column information - 11</td><td>Information</td><td>More information</td></tr>',
'<tr><td>01/01/2015</td><td>20:12</td><td>69365 Column information</td><td>78464 Column information - 63</td><td>Information</td><td>More information</td></tr>',
];
$result = [];
foreach ($a as $row) {
$p = '#(.*?<td>){3}(?P<first>\d+).*?</td><td>(?P<second>\d+).*?(?P<third>\d+)#';
if (preg_match($p, $row, $matches)) {
if ($matches['third'] < 50) {
$matches['third'] = '0'.$matches['third'][0];
}
$result[] =
$matches['first'] . ',' .
$matches['second'] . '-' .
$matches['third'];
}
}
print_r($result);
Output:
Array
(
[0] => 12345,67899-01
[1] => 98545,67659-03
[2] => 59675,94859-01
[3] => 69365,78464-63
)
This will do the trick for you:
$re = '/.*?(\d+)\s.*?(\d+)\s.*?-\s(\d+).*/';
$str = "<tr><td>29/06/2015</td><td>19:35</td><td>12345 Column information</td><td>67899 Column information - 12</td><td>Information</td><td>More information</td></tr>";
preg_match($re, $str, $matches);
if ($matches[3]<50) $matches[3] = floor($matches[3]/10);
$format = '%d,%d-%02d';
$result = sprintf($format, $matches[1], $matches[2], $matches[3]);
echo $result;
Note that I changed your $re to be single quoted instead of double quoted for readability, and I'm using preg_match instead of preg_replace so I can work with the matched patterns.
To explain the regex to you, there are a few things going on:
/ is the regex delimiter.
.*?: The . tells the regex to match any character. The * says to do it zero or more times, and the ? says to do it in a "lazy" fashion. The plain .* at the end of $re matches the whole rest of the string.
(\d+): The \d is a shorthand character class that matches any digit. The + says "one or more times", and the () says to capture this. The first () surrounded group is $matches[1].
\s: Is a shorthand character class for any whitespace character.
-: Is the literal - character.
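To apply this to every row rather than a single string, the match and the formatting can go into a function. This is a sketch under a few assumptions: the name formatRow is hypothetical, the leading and trailing .* are dropped because preg_match searches the string anyway, and intdiv replaces floor to keep the value an integer.

```php
<?php
// Hypothetical helper: returns "first,second-NN" for one table row,
// or null when the row does not have the expected shape.
function formatRow(string $row): ?string
{
    // Two numbers that are each followed by whitespace, then the
    // number that comes after "- ".
    if (!preg_match('/(\d+)\s.*?(\d+)\s.*?-\s(\d+)/', $row, $m)) {
        return null;
    }

    $last = (int) $m[3];
    if ($last < 50) {
        // Keep only the tens digit, e.g. 32 becomes 3 (printed as 03).
        $last = intdiv($last, 10);
    }

    return sprintf('%d,%d-%02d', $m[1], $m[2], $last);
}

$rows = [
    '<tr><td>29/06/2015</td><td>19:35</td><td>12345 Column information</td><td>67899 Column information - 12</td><td>Information</td><td>More information</td></tr>',
    '<tr><td>01/01/2015</td><td>20:12</td><td>69365 Column information</td><td>78464 Column information - 63</td><td>Information</td><td>More information</td></tr>',
];

foreach ($rows as $row) {
    echo formatRow($row), PHP_EOL;
}
```

Note that a single-digit value such as 7 becomes 00 with this arithmetic, whereas taking the first character of the string would give 07; which one is wanted depends on the data.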
Well... I don't know if it will help, but I made this with RegExr and it fits properly:
(([0-9]+){5})|(- [0-9]{2})
I hope you might find it some use!
I've been trying to "parse" some data using a regex, and I feel as if I'm close, but I just can't seem to bring it all home.
The data that needs parsing generally looks like this: <param>: <value>\n. The number of params can vary, just as the value can. Still, here's an example:
FooID: 123456
Name: Chuck
When: 01/02/2013 01:23:45
InternalID: 789654
User Message: Hello,
this is nillable, but can be quite long. Text can be spread out over many lines
And can start with any number of \n's. It can be empty, too.
What's worse, though is that this CAN contain colons (but they're _"escaped"_ using `\`), and even basic markup!
To push this text into an object, I put together this little expression:
if (preg_match_all('/^([^:\n\\]+):\s*(.+)/m', $this->structuredMessage, $data))
{
$data = array_combine($data[1], $data[2]);
//$data is assoc array FooID => 123456, Name => Chuck, ...
$report = new Report($data);
}
Now, this works all right most of the time, except for the User Message bit: . doesn't match newlines, and if I were to use the s flag, the second group would match everything after FooID: till the very end of the string.
I'm having to use a dirty workaround for that:
$msg = explode(end($data[1]), $string);
$data[2][count($data[2])-1] = array_pop($msg);
After some testing, I've come to understand that sometimes, one or two of the parameters aren't filled in (for example the InternalID can be empty). In that case, my expression doesn't fail, but rather results in:
[1] => Array
(
[0] => FooID
[1] => Name
[2] => When
[3] => InternalID
)
[2] => Array
(
[0] => 123456
[1] => Chuck
[2] => 01/02/2013 01:23:45
[3] => User Message: Hello,
)
I've been trying various other expressions, and came up with this:
/^([^:\n\\]++)\s{0,}:(.*+)(?!^[^:\n\\]++\s{0,}:)/m
//or:
/^([^:\n\\]+)\s{0,}:(.*)(?!^[^:\\\n]+\s{0,}:)/m
The second version being slightly slower.
That solves the issues I had with InternalID: <void>, but still leaves me with the final obstacle: User Message: <multi-line>. Using the s flag doesn't do the trick with my expression ATM.
I can only think of this:
^([^:\n\\]++)\s{0,}:((\n(?![^\n:\\]++\s{0,}:)|.)*+)
Which is, to my eye at least, too complex to be the only option. Ideas, suggestions, links, ... anything would be greatly appreciated
The following regex should work, but I'm not so sure anymore if it is the right tool for this:
preg_match_all(
'%^ # Start of line
([^:]*) # Match anything until a colon, capture in group 1
:\s* # Match a colon plus optional whitespace
( # Match and capture in group 2:
(?: # Start of non-capturing group (used for alternation)
.*$ # Either match the rest of the line
(?= # only if one of the following follows here:
\Z # The end of the string
| # or
\r?\n # a newline
[^:\n\\\\]* # followed by anything except colon, backslash or newline
: # then a colon
) # End of lookahead
| # or match
(?: # Start of non-capturing group (used for alternation/repetition)
[^:\\\\] # Either match a character except colon or backslash
| # or
\\\\. # match any escaped character
)* # Repeat as needed (end of inner non-capturing group)
) # End of outer non-capturing group
) # End of capturing group 2
$ # Match the end of the line%mx',
$subject, $result, PREG_PATTERN_ORDER);
See it live on regex101.
I'm pretty new to PHP, so maybe this is totally out of whack, but maybe you could use something like
$data = <<<EOT
FooID: 123456
Name: Chuck
When: 01/02/2013 01:23:45
InternalID: 789654
User Message: Hello,
this is nillable, but can be quite long. Text can be spread out over many lines
And can start with any number of \n's. It can be empty, too
EOT;
if ($key = preg_match_all('~^[^:\n]+?:~m', $data, $match)) {
$val = explode('¬', preg_filter('~^[^:\n]+?:~m', '¬', $data));
array_shift($val);
$res = array_combine($match[0], $val);
}
print_r($res);
yields
Array
(
[FooID:] => 123456
[Name:] => Chuck
[When:] => 01/02/2013 01:23:45
[InternalID:] => 789654
[User Message:] => Hello,
this is nillable, but can be quite long. Text can be spread out over many lines
And can start with any number of
's. It can be empty, too
)
So here's what I came up with using a tricky preg_replace_callback():
$string ='FooID: 123456
Name: Chuck
When: 01/02/2013 01:23:45
InternalID: 789654
User Message: Hello,
this is nillable, but can be quite long. Text can be spread out over many lines
And can start with any number of \n\'s. It can be empty, too
Yellow:cool';
$array = array();
preg_replace_callback('#^(.*?):(.*)|.*$#m', function($m)use(&$array){
static $last_key = ''; // We are going to use this as a reference
if(isset($m[1])){// If there is a normal match (key : value)
$array[$m[1]] = $m[2]; // Then add to array
$last_key = $m[1]; // define the new last key
}else{ // else
$array[$last_key] .= PHP_EOL . $m[0]; // add the whole line to the last entry
}
}, $string); // Anonymous function used thus PHP 5.3+ is required
print_r($array); // print
Online demo
Downside: I'm using PHP_EOL to add newlines which is OS related.
I think I'd avoid using regex to do this task, instead split it into sub-tasks.
Basic algorithm outline
Split the string on \n using explode
Loop over the resulting array
Split the resulting strings on : also using explode with a limit of 2.
If the produced array's length is less than 2, add the entirety of the data to the previous key's value
Else, use the first array index as your key, the second as the value unless the split colon was escaped (in which case, instead add the key + split + value to the previous key's value)
This algorithm does assume there are no keys with escaped colons. Escaped colons in values will be dealt with just fine (i.e. user input).
Code
$str = <<<EOT
FooID: 123456
Name: Chuck
When: 01/02/2013 01:23:45
InternalID:
User Message: Hello,
this is nillable, but can be quite long. Text can be spread out over many lines
This\: works too. And can start with any number of \\n's. It can be empty, too.
What's worse, though is that this CAN contain colons (but they're _"escaped"_
using `\`) like so `\:`, and even basic markup!
EOT;
$arr = explode("\n", $str);
$prevKey = '';
$split = ': ';
$output = array();
for ($i = 0, $arrlen = sizeof($arr); $i < $arrlen; $i++) {
$keyValuePair = explode($split, $arr[$i], 2);
// ?: Is this a valid key/value pair
if (sizeof($keyValuePair) < 2 && $i > 0) {
// -> Nope, append the value to the previous key's value
$output[$prevKey] .= "\n" . $keyValuePair[0];
}
else {
// -> Maybe
// ?: Did we miss an escaped colon
if (substr($keyValuePair[0], -1) === '\\') {
// -> Yep, this means this is a value, not a key/value pair append both key and
// value (including the split between) to the previous key's value ignoring
// any colons in the rest of the string (allowing dates to pass through)
$output[$prevKey] .= "\n" . $keyValuePair[0] . $split . $keyValuePair[1];
}
else {
// -> Nope, create a new key with a value
$output[$keyValuePair[0]] = $keyValuePair[1];
$prevKey = $keyValuePair[0];
}
}
}
var_dump($output);
Output
array(5) {
["FooID"]=>
string(6) "123456"
["Name"]=>
string(5) "Chuck"
["When"]=>
string(19) "01/02/2013 01:23:45"
["InternalID"]=>
string(0) ""
["User Message"]=>
string(293) "Hello,
this is nillable, but can be quite long. Text can be spread out over many lines
This\: works too. And can start with any number of \n's. It can be empty, too.
What's worse, though is that this CAN contain colons (but they're _"escaped"_
using `\`) like so `\:`, and even basic markup!"
}
Online demo
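The same split-first idea can also be sketched more compactly with preg_split. This is an alternative under stated assumptions, not the code above: the name parseReport is hypothetical, and a new entry is assumed to start at any line break followed by text that contains no colon or backslash before its first colon.

```php
<?php
// Hypothetical helper: split the text into "key: value" chunks, then
// split each chunk once on its first colon.
function parseReport(string $text): array
{
    $output = [];

    // Split only at line breaks that are followed by "key:", where the
    // key contains no colon, backslash or newline. Continuation lines
    // and lines with an escaped colon (\:) therefore stay attached to
    // the previous entry.
    $chunks = preg_split('/\R(?=[^:\\\\\r\n]+:)/', $text);

    foreach ($chunks as $chunk) {
        // array_pad guards against a chunk without any colon.
        [$key, $value] = array_pad(explode(':', $chunk, 2), 2, '');
        $output[trim($key)] = trim($value);
    }

    return $output;
}

$text = "FooID: 123456\nName: Chuck\nWhen: 01/02/2013 01:23:45\n"
      . "InternalID:\nUser Message: Hello,\nsecond line\nescaped\\: colon";

var_dump(parseReport($text));
```

The limitation is the same as in the loop version: a continuation line that contains an unescaped colon with no backslash before it would be treated as a new key.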
I have bunch of strings like this:
a#aax1aay222b#bbx4bby555bbz6c#mmm1d#ara1e#abc
And what I need to do is to split them up based on the hashtag position to something like this:
Array
(
[0] => A
[1] => AAX1AAY222
[2] => B
[3] => BBX4BBY555BBZ6
[4] => C
[5] => MMM1
[6] => D
[7] => ARA1
[8] => E
[9] => ABC
)
So, as you can see, the character right before the hashtag is captured, plus everything after the hashtag up to the next character-plus-hashtag pair.
I have the following regex, which works fine only when each part ends with a numeric value.
Here is the RegEx set up:
preg_split('/([A-Z])+#/', $text, 0, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
And it works fine with something like this:
C#mmm1D#ara1
But, if I change it to this (removing the numbers):
C#mmmD#ara
Then it will be the result, which is not good:
Array
(
[0] => C
[1] => D
)
I've looked at this question and this one also, which are similar but none of them worked for me.
So, my question is: why does it work only if each part is followed by a number, and how can I solve it?
Here are some of the sample strings I have:
a#123b#abcc#def456 // A:123, B:ABC, C:DEF456
a#abc1def2efg3b#abcdefc#8 // A:ABC1DEF2EFG3, B:ABCDEF, C:8
a#abcdef123b#5c#xyz789 // A:ABCDEF123, B:5, C:XYZ789
P.S. Strings are case-insensitive.
P.P.S. If you are wondering what on earth these strings are: they are user-submitted answers to a questionnaire. I can't refactor them, as they are already stored and just need to be processed.
Why not use explode()?
If you look at my examples, you will see that I need to capture the character right before the # as well. If you think it's possible with explode(), please post the output as well, thanks!
Update
Should we focus on why /([A-Z])+#/ works only if numbers are included? Thanks.
Instead of using preg_split(), decide what you want to match instead:
A set of "words" if followed by either <any-char># or <end-of-string>.
A character if immediately followed by #.
$str = 'a#aax1aay222b#bbx4bby555bbz6c#mmm1d#ara1e#abc';
preg_match_all('/\w+(?=.#|$)|\w(?=#)/', $str, $matches);
Demo
This expression uses two look-ahead assertions. The results are in $matches[0].
Update
Another way of looking at it would be this:
preg_match_all('/(\w)#(\w+)(?=\w#|$)/', $str, $matches);
print_r(array_combine($matches[1], $matches[2]));
Each entry starts with a single character, followed by a hash, followed by X characters until either the end of the string is encountered or the start of a next entry.
The output is this:
Array
(
[a] => aax1aay222
[b] => bbx4bby555bbz6
[c] => mmm1
[d] => ara1
[e] => abc
)
If you still want to use preg_split you can remove the + and it might work as expected:
'/([A-Z])#/i'
That way you only match the hashtag and ONE alpha character before it, rather than all of them.
Example: http://codepad.viper-7.com/z1kFDb
Edit: Added a case-insensitive flag i in the pattern.
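A small sketch makes the difference visible. With ([A-Z])+# the repeated group consumes every letter in front of the #, so in a letters-only string such as C#MMMD#ARA the whole run MMMD# is treated as the delimiter and MMM is lost; a digit in front of the key letter stops the repetition, which is why the numeric cases worked. The function name splitAnswers is hypothetical, and uppercasing the input first is an assumption taken from the expected output in the question.

```php
<?php
// Hypothetical helper: split on "<letter>#" and keep the letter.
function splitAnswers(string $text): array
{
    return preg_split(
        '/([A-Z])#/',          // exactly ONE letter before the hash
        strtoupper($text),
        -1,
        PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
    );
}

print_r(splitAnswers('C#mmmD#ara'));
print_r(splitAnswers('a#aax1aay222b#bbx4bby555bbz6c#mmm1d#ara1e#abc'));
```

PREG_SPLIT_DELIM_CAPTURE puts the captured key letter into the result, and PREG_SPLIT_NO_EMPTY drops the empty piece in front of the first key.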
Use explode() rather than Regexp
$tmpArray = explode("#","a#aax1aay222b#bbx4bby555bbz6c#mmm1d#ara1e#abc");
$myArray = array();
for($i = 0; $i < count($tmpArray) - 1; $i++) {
if (substr($tmpArray[$i],0,-1)) $myArray[] = substr($tmpArray[$i],0,-1);
if (substr($tmpArray[$i],-1)) $myArray[] = substr($tmpArray[$i],-1);
}
if (count($tmpArray) && $tmpArray[count($tmpArray) - 1]) $myArray[] = $tmpArray[count($tmpArray) - 1];
Edit: I updated my answer after reading the question more carefully.
You can use the explode() function, which splits the string on the hash signs, as stated in the answers given before.
$myArray = explode("#",$string);
For the string 'a#aax1aay222b#bbx4bby555bbz6c#mmm1d#ara1e#abc' this returns something like
$myArray = array('a', 'aax1aay222b', 'bbx4bby555bbz6c' ....);
All you need now is to take the last character of each string in the array as a separate item.
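$copy = array();
$last = count($myArray) - 1;
foreach($myArray as $i => $item){
if ($i === $last) { // the final piece has no trailing key character
$copy[] = $item;
break;
}
$beginning = substr($item,0,strlen($item)-1); // this takes all characters except the last one
$ending = substr($item,-1); // this takes the last one (the key of the next entry)
if ($beginning !== '') $copy[] = $beginning; // skip the empty piece before the first key
$copy[] = $ending;
} // end foreach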
This is an example, not tested.
EDIT
Instead of substr($item,0,strlen($item)-1); you might use substr($item,0,-1);.
I am trying to match a semi-dynamically generated string, so I can check that it's in the correct format and then extract the information I need from it. My problem is that no matter how hard I try to grasp regex, I can't fathom it for the life of me, even with the help of so-called generators.
What I have is a couple of different strings like the following: [#img:1234567890], [#user:1234567890] and [#file:file_name-with.ext]. Strings like these are meant to pass through a filter so they can be replaced with links and/or more readable names. But try as I might, I can't come up with a regex for any of them.
I am looking for the format [#word:], from which I will strip the [, ], # and word so I can then query my DB for whatever it is and work with it accordingly. Just the regex bit is holding me back.
Not sure what you mean by generators. I always use online matchers to see that my test cases work. @Virendra almost had it, except they forgot to escape the [] characters.
/\[#(\w+):(.*)\]/
You need to start and end with a regex delimiter, in this case the '/' character.
Then we escape the '[', which regex otherwise uses to open a character class, hence the '\['.
Next we match a literal '#' symbol.
Now we want to save this next match so we can use it later so we surround it with ().
\w matches a single word character (a letter, digit or underscore), and the + makes it match one or more of them.
Again match a literal :.
Maybe useful to have the second part in a match group as well so (.*) will match any character any number of times, and save it for you.
Then we escape the closing ] as we did earlier.
Since it sounds like you want to use the matches later in a query we can use preg_match to save the matches to an array.
$pattern = '/\[#(\w+):(.*)\]/';
$subject = '[#user:1234567890]';
preg_match($pattern, $subject, $matches);
print_r($matches);
Would output
array(
[0] => '[#user:1234567890]', // Full match
[1] => 'user', // First match
[2] => '1234567890' // Second match
)
An especially helpful tool I've found is txt2re
Here's what I would do.
<pre>
<?php
$subj = 'An image:[#img:1234567890], a user:[#user:1234567890] and a file:[#file:file_name-with.ext]';
preg_match_all('~(?<match>\[#(?<type>[^:]+):(?<value>[^\]]+)\])~',$subj,$matches,PREG_SET_ORDER);
foreach ($matches as &$arr) unset($arr[0],$arr[1],$arr[2],$arr[3]);
print_r($matches);
?>
</pre>
This will output
Array
(
[0] => Array
(
[match] => [#img:1234567890]
[type] => img
[value] => 1234567890
)
[1] => Array
(
[match] => [#user:1234567890]
[type] => user
[value] => 1234567890
)
[2] => Array
(
[match] => [#file:file_name-with.ext]
[type] => file
[value] => file_name-with.ext
)
)
And here's a pseudo version of how I would use the preg_replace_callback() function:
function replace_shortcut($matches) {
global $users;
switch (strtolower($matches['type'])) {
case 'img' : return '<img src="images/img_'.$matches['value'].'.jpg" />';
case 'file' : return ''.$matches['value'].'';
// add id of each user in array
case 'user' : $users[] = (int) $matches['value']; return '%s';
default : return $matches['match'];
}
}
$users = array();
$replaceArr = array();
$subj = 'An image:[#img:1234567890], a user:[#user:1234567890] and a file:[#file:file_name-with.ext]';
// escape percentage signs to avoid complications in the vsprintf function call later
$subj = strtr($subj,array('%'=>'%%'));
$subj = preg_replace_callback('~(?<match>\[#(?<type>[^:]+):(?<value>[^\]]+)\])~','replace_shortcut',$subj);
if (!empty($users)) {
// connect to DB and check users
$query = " SELECT `id`,`nick`,`date_deleted` IS NOT NULL AS 'deleted'
FROM `users` WHERE `id` IN ('".implode("','",$users)."')";
// query
// ...
// and catch results
while ($row = $con->fetch_array()) {
// position of this id in users array:
$idx = array_search($row['id'],$users);
$nick = htmlspecialchars($row['nick']);
$replaceArr[$idx] = $row['deleted'] ?
"<span class=\"user_deleted\">{$nick}</span>" :
"{$nick}";
// delete this key so that we can check id's not found later...
unset($users[$idx]);
}
// in here:
foreach ($users as $key => $value) {
$replaceArr[$key] = '<span class="user_unknown">User'.$value.'</span>';
}
// replace each user reference marked with %s in $subj
$subj = vsprintf($subj,$replaceArr);
} else {
// remove extra percentage signs we added for vsprintf function
$subj = preg_replace('~%{2}~','%',$subj);
}
unset($query,$row,$nick,$idx,$key,$value,$users,$replaceArr);
echo $subj;
You can try something like this:
/\[#(\w+):([^]]*)\]/
\[ escapes the [ character (otherwise interpreted as a character set); \w means any "word" character, and [^]]* means any non-] character (to avoid matching past the end of the tag, as .* might). The parens group the various matched parts so that you can use $1 and $2 in preg_replace to generate the replacement text:
echo preg_replace('/\[#(\w+):([^]]*)\]/', '$1 $2', '[#link:abcdef]');
prints link abcdef
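When the replacement depends on the tag type, the same pattern can be used with preg_replace_callback. The following is a minimal sketch: the function name renderTags and the images/ and files/ paths are hypothetical placeholders, not part of the question.

```php
<?php
// Hypothetical helper: replace [#type:value] tags based on their type.
function renderTags(string $text): string
{
    return preg_replace_callback(
        '/\[#(\w+):([^\]]*)\]/',
        function (array $m): string {
            // $m[1] is the type, $m[2] the value between ":" and "]".
            $value = htmlspecialchars($m[2]);
            switch ($m[1]) {
                case 'img':
                    return '<img src="images/' . $value . '.jpg">';
                case 'file':
                    return '<a href="files/' . $value . '">' . $value . '</a>';
                default:
                    // Unknown types are left untouched.
                    return $m[0];
            }
        },
        $text
    ) ?? $text;
}

echo renderTags('See [#img:123] and [#file:a-b.ext] by [#user:42]'), PHP_EOL;
```

A user tag would typically need a database lookup before it can be rendered, so it is left as-is here and falls through to the default branch.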