PHP Regular Expression to break apart a Serialized String [closed] - php

I have a string of the format:
$15?1?2/:1$16E/:2$17?6?7/:6$19E/:7$3E/
I want to use preg_split() to break this down into an array but I can't seem to get the regex right. Specifically I want to get an array with all the numerical values directly following each $.
So in this case:
[0] => 15
[1] => 16
[2] => 17
[3] => 19
[4] => 3
If someone could explain the regex to me that would produce this that would be amazing.

Split vs. Match All
Splitting and matching are two sides of the same coin. You don't even need to split: this returns the exact array you are looking for (see PHP demo).
$regex = '~\$\K\d+~';
$count = preg_match_all($regex, $yourstring, $matches);
print_r($matches[0]);
Output
Array
(
[0] => 15
[1] => 16
[2] => 17
[3] => 19
[4] => 3
)
Explanation
\$ matches a $
The \K tells the engine to drop what was matched so far from the final match it returns
\d+ matches your digits
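As a quick check of the \K explanation, the sketch below compares it with a lookbehind, which is another common way to require a $ without including it in the match. The variable names $input, $withK and $withLookbehind are made up for this illustration.

```php
<?php
// Sample input copied from the question.
$input = '$15?1?2/:1$16E/:2$17?6?7/:6$19E/:7$3E/';

// \K: match the "$", then drop it from the match that is reported.
preg_match_all('~\$\K\d+~', $input, $withK);

// Lookbehind: assert that a "$" sits just before the digits without consuming it.
preg_match_all('~(?<=\$)\d+~', $input, $withLookbehind);

print_r($withK[0]);
var_dump($withK[0] === $withLookbehind[0]);
```

Both patterns should return the same five numbers as strings; cast them with intval() if integers are needed.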

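Or this, with a capture group. The pattern has to be in single quotes: inside a double-quoted PHP string, \$ collapses to a bare $, which the regex engine then reads as an end-of-line anchor, so nothing matches.
$preg = preg_match_all('/\$(\d+)/', $input, $output);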
print_r($output[1]);
http://www.phpliveregex.com/p/5Rc

Here is a non-regex example:
$string = '$15?1?2/:1$16E/:2$17?6?7/:6$19E/:7$3E/';
$array = array_map( function( $item ) {
return intval( $item );
}, array_filter( explode( '$', $string ) ) );
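The idea is to explode the string on the $ character and then map intval() over the pieces, which keeps only the leading integer of each piece. array_filter() drops the empty first piece, so the resulting keys start at 1.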
Here is preg_split() example that captures the delimiter:
$string = '$15?1?2/:1$16E/:2$17?6?7/:6$19E/:7$3';
$array = preg_split( '/(?<=\$)(\d+)(?=\D|$)/', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE );
/*
(?<=\$) look behind to see if there is: '$'
( group and capture to \1:
\d+ digits (0-9) (1 or more times (greedy))
) end of \1
(?=\D|$) look ahead to see if there is: non-digit (all but 0-9) OR the end of the string
*/
With the help of this post, here is an interesting way to get every second value from the resulting array:
$array = array_intersect_key( $array, array_flip( range( 1, count( $array ), 2 ) ) );
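To make the intermediate result visible, here is a minimal end-to-end sketch of the preg_split() approach. The names $parts and $numbers are chosen for this illustration, and array_values() is an addition that reindexes the result from 0.

```php
<?php
$string = '$15?1?2/:1$16E/:2$17?6?7/:6$19E/:7$3';

// Split on the digits that follow a "$" and keep them as captured delimiters.
$parts = preg_split(
    '/(?<=\$)(\d+)(?=\D|$)/',
    $string,
    -1,
    PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
);
// $parts alternates filler and number: '$', '15', '?1?2/:1$', '16', ...

// Keep the odd keys (1, 3, 5, ...), which hold the captured numbers.
$oddKeys = array_flip(range(1, count($parts), 2));
$numbers = array_values(array_intersect_key($parts, $oddKeys));

print_r($numbers);
```

This relies on every number being preceded by at least a $, so the filler pieces are never empty and the numbers always land on odd keys.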

Related

PHP split string into integer, string and special character

I need to split this format of strings CF12:10 into array like below,
[0] => CF, [1] => 12, [2] => 10
Numbers and String of the provided string can be any length. I have found the php preg_match function but don't know how to make regular expression for my case. Any solution would be highly appreciated.
You could use this regex to match the individual parts:
^(\D+)(\d+):(.*)$
It matches start of string, some number of non-digit characters (\D+), followed by some number of digits (\d+), a colon and some number of characters after the : and before end-of-line. In PHP you can use preg_match to then find all the matching groups:
$input = 'CF12:10';
preg_match('/^(\D+)(\d+):(.*)$/', $input, $matches);
array_shift($matches);
print_r($matches);
Output:
Array
(
[0] => CF
[1] => 12
[2] => 10
)
Demo on 3v4l.org
Try the following code if it helps you
$str = 'C12:10';
$arr = preg_match('~^(.*?)(\d+):(.*)~m', $str, $matches);
array_shift($matches);
echo '<pre>';print_r($matches);

preg_split to separate input [duplicate]

I'm building a website using PHP.
I am using a preg_split() to separate a given string which looks like +word1 -word2 -"word word".
But I need them in the following form +word1, -word2, -"word word".
Currently, I have this one:
$words = preg_split("/[\s\"]*\"([^\"]+)\"[\s\"]*|" . "[\s\"]*'([^']+)'[\s\"]*|" . "[\s\"]+/", $search_expression, 0, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
but it doesn't work the way I want. To get the expected result I currently have to write the input like this:
+word1 -word2 '-"word word"'.
Does someone have a better regex or idea?
One option is to match from a double quote till a double quote and don't split on that match using SKIP FAIL. Then match 1+ horizontal whitespace chars to split on.
"[^"]+"(*SKIP)(*FAIL)|\h+
For example
$search_expression = '+word1 -word2 -"word word"';
$words = preg_split("~\"[^\"]+\"(*SKIP)(*FAIL)|\h+~", $search_expression);
print_r($words);
Output
Array
(
[0] => +word1
[1] => -word2
[2] => -"word word"
)
A simpler expression with a greedy ? works for matching your examples (each match keeps the space that follows it, so trim() the results if that matters):
preg_match_all('/[+-][^+-]+ ?/', $search_expression, $matches);
print_r($matches[0]);
Yields:
Array
(
[0] => +word1
[1] => -word2
[2] => -"word word"
)
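One detail print_r() hides: [^+-]+ also consumes the space after each term, so the first two matches end in a space. A small sketch of cleaning that up, where $terms is a name made up for this example:

```php
<?php
$search_expression = '+word1 -word2 -"word word"';

preg_match_all('/[+-][^+-]+ ?/', $search_expression, $matches);

// The raw match includes the trailing space: "+word1 " is 7 characters long.
var_dump($matches[0][0]);

// Strip surrounding whitespace from every match.
$terms = array_map('trim', $matches[0]);
print_r($terms);
```

This pattern treats every + or - as the start of a new term, so it would also split inside a hyphenated word.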
See example.

How to split a string into an array using a given regex expression

I am trying to explode / preg_split a string so that I get an array of all the values that are enclosed in ( ). I've tried the following code but I always get an empty array. I have tried many things but I can't seem to get it right.
Could anyone spot what am I missing to get my desired output?
$pattern = "/^\(.*\)$/";
$string = "(y3,x3),(r4,t4)";
$output = preg_split($pattern, $string);
print_r($output);
Current output Array ( [0] => [1] => )
Desired output Array ( [0] => "(y3,x3)," [1] => "(r4,t4)" )
With preg_split() your regex should be matching the delimiters within the string to split the string into an array. Your regex is currently matching the values, and for that, you can use preg_match_all(), like so:
$pattern = "/\(.*?\)/";
$string = "(y3,x3),(r4,t4)";
preg_match_all($pattern, $string, $output);
print_r($output[0]);
This outputs:
Array
(
[0] => (y3,x3)
[1] => (r4,t4)
)
If you want to use preg_split(), you would want to match the , between ) and (, but without consuming the parentheses, like so:
$pattern = "/(?<=\)),(?=\()/";
$string = "(y3,x3),(r4,t4)";
$output = preg_split($pattern, $string);
print_r($output);
This uses a positive lookbehind and a positive lookahead to find the , between the two parenthesized groups and splits on it. It outputs the same array as above.
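The difference between matching the values and matching the delimiter can be seen side by side. This is a small sketch; $whole and $parts are names made up for the illustration.

```php
<?php
$string = '(y3,x3),(r4,t4)';

// The original pattern matches the entire string, so preg_split() is left
// with only the empty text before and after that single "delimiter".
$whole = preg_split('/^\(.*\)$/', $string);
var_dump($whole);

// Matching just the comma between ")" and "(" leaves both groups intact.
$parts = preg_split('/(?<=\)),(?=\()/', $string);
print_r($parts);
```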
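You can also use a simple regex like \B,\B to split the string, which avoids the lookahead and lookbehind.
\B is a non-word boundary, so it matches only the , between ) and (. The commas inside the parentheses touch word characters (3 and x, 4 and t), so \B fails there. This depends on the values starting and ending with word characters.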
Here is a working example:
http://regex101.com/r/cV7bO7/1
$pattern = "/\B,\B/";
$string = "(y3,x3),(r4,t4),(r5,t5)";
$result = preg_split($pattern, $string);
$result will contain:
Array
(
[0] => (y3,x3)
[1] => (r4,t4)
[2] => (r5,t5)
)

Two or more matches in expression

Is it possible to make two matches of text - /123/123/123?edit
I need to match 123, 123 ,123 and edit
For the first(123,123,123): pattern is - ([^\/]+)
For the second(edit): pattern is - ([^\?=]*$)
Is it possible to match in one preg_match_all function, or I need to do it twice - one time for one pattern, second one for second?
Thanks !
You can do this with a single preg_match_all call:
$string = '/123/123/123?edit';
$matches = array();
preg_match_all('#(?<=[/?])\w+#', $string, $matches);
/* $matches will be:
Array
(
[0] => Array
(
[0] => 123
[1] => 123
[2] => 123
[3] => edit
)
)
*/
See this in action at http://www.ideone.com/eb2dy
The pattern ((?<=[/?])\w+) uses a lookbehind to assert that either a slash or a question mark must precede a sequence of word characters (\w is a shorthand class equivalent to [A-Za-z0-9_]).
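As a quick check of what \w covers, the same pattern can be run on a path with uppercase letters and an underscore. The input string below is invented for this illustration.

```php
<?php
// Hypothetical input with mixed case and an underscore.
$string = '/Abc_1/XYZ?Edit';

preg_match_all('#(?<=[/?])\w+#', $string, $matches);

// \w matches letters in either case, digits, and the underscore.
print_r($matches[0]);
```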

Why does this regex have 3 matches, not 5?

I wrote a pretty simple preg_match_all call in PHP:
$fileName = 'A_DATED_FILE_091410.txt';
$matches = array();
preg_match_all('/[0-9][0-9]/',$fileName,$matches);
print_r($matches);
My Expected Output:
$matches = array(
[0] => array(
[0] => 09,
[1] => 91,
[2] => 14,
[3] => 41,
[4] => 10
)
)
What I got instead:
$matches = array(
[0] => array(
[0] => 09,
[1] => 14,
[2] => 10
)
)
Now, in this particular use case this was preferable, but I'm wondering why it didn't match the other substrings? Also, is a regex possible that would give me my expected output, and if so, what is it?
With a global regex (which is what preg_match_all uses), once a match is made, the regex engine continues searching the string from the end of the previous match.
In your case, the regular expression engine starts at the beginning of the string, and advances until the 0, since that is the first character that matches [0-9]. It then advances to the next position (9), and since that matches the second [0-9], it takes 09 as a match. When the engine continues matching (since it has not yet reached the end of the string), it advances its position again (to 1) (and then the above repeats).
See also: First Look at How a Regex Engine Works Internally
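The way the engine resumes after each match can be made visible with PREG_OFFSET_CAPTURE, which reports where each match starts. A minimal sketch:

```php
<?php
$fileName = 'A_DATED_FILE_091410.txt';

preg_match_all('/[0-9][0-9]/', $fileName, $matches, PREG_OFFSET_CAPTURE);

// Each entry is a [matched text, start offset] pair.
foreach ($matches[0] as [$text, $offset]) {
    echo "$text at offset $offset\n";
}
```

The offsets go up in steps of two (13, 15, 17), which shows that each search resumes right after the two digits just consumed.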
If you must get every 2 digit sequence, you can use preg_match and use offsets to determine where to start capturing from:
$fileName = 'A_DATED_FILE_091410.txt';
$allSequences = array();
$matches = array();
$offset = 0;
while (preg_match('/[0-9][0-9]/', $fileName, $matches, PREG_OFFSET_CAPTURE, $offset))
{
list($match, $offset) = $matches[0];
$allSequences[] = $match;
$offset++; // the match is 2 digits wide, so resume the search at its second digit
}
Note that the offset returned with the PREG_OFFSET_CAPTURE flag is the start of the match.
I've got another solution that will get five matches without having to use offsets, but I'm adding it here just for curiosity, and I probably wouldn't use it myself in production code (it's a somewhat complex regex too). You can use a regex that uses a lookbehind to look for a number before the current position, and captures the number in the lookbehind (in general, lookarounds are non-capturing):
(?<=([0-9]))[0-9]
Let's walk through this regex:
(?<= # open a positive lookbehind
( # open a capturing group
[0-9] # match 0-9
) # close the capturing group
) # close the lookbehind
[0-9] # match 0-9
Because lookarounds are zero-width and do not move the regex position, this regular expression will match 5 times. The engine advances until the 9, because that is the first position which satisfies the lookbehind assertion. Since 9 matches [0-9], the engine takes 9 as the match (and because we're capturing inside the lookbehind, it also captures the 0 as group 1). The engine then moves to the 1. Again the lookbehind succeeds, this time capturing the 9, and the 1 becomes the full match (and so on, until the engine hits the end of the string).
When we give this pattern to preg_match_all, we'll end up with an array that looks like (using the PREG_SET_ORDER flag to group capturing groups along with the full match):
Array
(
[0] => Array
(
[0] => 9
[1] => 0
)
[1] => Array
(
[0] => 1
[1] => 9
)
[2] => Array
(
[0] => 4
[1] => 1
)
[3] => Array
(
[0] => 1
[1] => 4
)
[4] => Array
(
[0] => 0
[1] => 1
)
)
Note that each "match" has its digits out of order! This is because the capture group in the lookbehind becomes backreference 1 while the whole match is backreference 0. We can put it back together in the correct order though:
preg_match_all('/(?<=([0-9]))[0-9]/', $fileName, $matches, PREG_SET_ORDER);
$allSequences = array();
foreach ($matches as $match)
{
$allSequences[] = $match[1] . $match[0];
}
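A related variant, not part of the answer above, moves the capture into a lookahead instead, so each two-digit pair arrives already in reading order. This is a sketch of that swapped-in technique:

```php
<?php
$fileName = 'A_DATED_FILE_091410.txt';

// The lookahead consumes nothing, so the engine retries at every position;
// group 1 captures the two digits that start there.
preg_match_all('/(?=([0-9]{2}))/', $fileName, $matches);

// $matches[0] holds only empty strings; the pairs are in group 1.
print_r($matches[1]);
```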
The search for the next match starts at the first character after the previous match. So when 09 is matched in 091410, the search for the next match starts at 1410.
"Also, is a regex possible that would give me my expected output, and if so, what is it?"
No single consuming pattern will work, because preg_match_all won't match the same characters twice. But you could do something like this:
$i = 0;
while (preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE, $i))
{
    // $matches[0][1] is the offset where this match starts; resume one character later
    $i = $matches[0][1] + 1;
}
The above is not safe for the general case. Without the + 1 the loop finds the same match forever, and how far to advance depends on the pattern. Also, you may not want [0][1], but instead something like [1][1] etc, again, depending on the pattern.
For this particular case, I think it would be much simpler to do it yourself:
$l = strlen($s);
$prev_digit = false;
for ($i = 0; $i < $l; ++$i)
{
if ($s[$i] >= '0' && $s[$i] <= '9')
{
if ($prev_digit) { /* found match */ }
$prev_digit = true;
}
else
$prev_digit = false;
}
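Filled in, the manual scan might look like the sketch below. The function name digitPairs is hypothetical, and collecting the pairs into an array is one possible way to handle the "found match" step.

```php
<?php
// Hypothetical helper: returns every overlapping pair of adjacent digits.
function digitPairs(string $s): array
{
    $pairs = [];
    $prevDigit = false;
    for ($i = 0, $l = strlen($s); $i < $l; ++$i) {
        $isDigit = $s[$i] >= '0' && $s[$i] <= '9';
        if ($isDigit && $prevDigit) {
            // The previous and current characters are both digits.
            $pairs[] = $s[$i - 1] . $s[$i];
        }
        $prevDigit = $isDigit;
    }
    return $pairs;
}

print_r(digitPairs('A_DATED_FILE_091410.txt'));
```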
Just for fun, another way to do it:
<?php
$fileName = 'A_DATED_FILE_091410.txt';
$matches = array();
preg_match_all('/(?<=([0-9]))[0-9]/',$fileName,$matches);
$result = array();
foreach($matches[1] as $i => $behind)
{
$result[] = $behind . $matches[0][$i];
}
print_r($result);
?>
