Split string into associative array (while maintaining characters) - php

I'm trying to figure out how to split a string that looks like this:
a20r51fx500fy3000
into an associative array that will look like this:
array(
'a' => 20,
'r' => 51,
'fx' => 500,
'fy' => 3000,
);
I don't think I can use preg_split as this will drop the character I'm splitting on (I tried /[a-zA-Z]/ but obviously that didn't do what I wanted it to). I'd prefer if I could do it using some kind of built-in function, but I don't really mind looping if that's required.
Any help would be much appreciated!

Multiple Matches and PREG_SET_ORDER
Do this:
$yourstring = "a20r51fx500fy3000";
$regex = '~([a-z]+)(\d+)~';
preg_match_all($regex,$yourstring,$matches,PREG_SET_ORDER);
$yourarray=array();
foreach($matches as $m) {
$yourarray[$m[1]] = $m[2];
}
print_r($yourarray);
Output:
Array ( [a] => 20 [r] => 51 [fx] => 500 [fy] => 3000 )
If your string can contain upper-case letters, make the regex case-insensitive by adding the i flag after the closing delimiter: $regex = '~([a-z]+)(\d+)~i';
Explanation
([a-z]+) captures letters to Group 1
(\d+) captures digits to Group 2
$yourarray[$m[1]] = $m[2]; creates an index for the letters and assigns the digits to it
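If you would rather avoid the explicit loop, here is a minimal sketch using only built-in functions. It relies on the default PREG_PATTERN_ORDER layout together with array_combine(). The array_map('intval', ...) step is an optional addition that is not part of the answer above: it turns the captured digit strings into integers.

```php
<?php
$yourstring = "a20r51fx500fy3000";

// With the default PREG_PATTERN_ORDER, $matches[1] holds every letter run
// and $matches[2] holds every digit run, in the same order.
preg_match_all('~([a-z]+)(\d+)~i', $yourstring, $matches);

// Pair each key with its value; intval turns "20" into 20, and so on.
$yourarray = array_combine($matches[1], array_map('intval', $matches[2]));

print_r($yourarray);
```

As with the loop, a key that appears twice keeps only its last value.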

Related

PHP split string into integer, string and special character

I need to split this format of strings CF12:10 into array like below,
[0] => CF, [1] => 12, [2] => 10
The numbers and the string part of the input can be any length. I have found PHP's preg_match function but don't know how to write a regular expression for my case. Any solution would be highly appreciated.
You could use this regex to match the individual parts:
^(\D+)(\d+):(.*)$
It matches the start of the string, one or more non-digit characters (\D+), one or more digits (\d+), a colon, and then any characters after the : up to the end of the string. In PHP you can use preg_match to find all the matching groups:
$input = 'CF12:10';
preg_match('/^(\D+)(\d+):(.*)$/', $input, $matches);
array_shift($matches);
print_r($matches);
Output:
Array
(
[0] => CF
[1] => 12
[2] => 10
)
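Since the input has a fixed layout, sscanf() is a possible alternative that avoids the regex and the array_shift() step. This is a sketch that assumes the part after the colon is always an integer. The regex above uses (.*) there and accepts anything. Note that the numeric parts come back as integers rather than strings.

```php
<?php
$input = 'CF12:10';

// %[^0-9] reads the leading run of non-digits, %d reads an integer,
// the literal ':' must come next, and the final %d reads the second number.
$parts = sscanf($input, '%[^0-9]%d:%d');

var_export($parts);
```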
Try the following code if it helps you
$str = 'CF12:10';
preg_match('~^(.*?)(\d+):(.*)~m', $str, $matches);
array_shift($matches);
echo '<pre>';print_r($matches);

Regex for matching phrase between '' php

I want to match the content inside the ' and ' (single quotes). For example, 'for example' should return for and example. It's only part of the sentence I have to analyze; I used preg_split(\s) on the whole sentence, so 'for example' becomes the two tokens 'for and example'.
Right now I've tried /^'(.*)|(.*)'$/, and it only returns for but not example; if I write it as /^(.*)'|'(.*)$/, it only returns example but not for. How should I fix this?
You can avoid double handling of the string by leveraging the \G metacharacter to continue matching an unlimited number of space-delimited strings inside of single quotes.
Code:
$string = "text 'for an example of the \"continue\" metacharacter' text";
var_export(preg_match_all("~(?|'|\G(?!^) )\K[^ ']+~", $string, $out) ? $out[0] : []);
Output:
array (
0 => 'for',
1 => 'an',
2 => 'example',
3 => 'of',
4 => 'the',
5 => '"continue"',
6 => 'metacharacter',
)
To get the single sentences (which you then want to split) you can use preg_match_all() to capture anything between two single quotes.
preg_match_all("~'([^']+)'~", $text, $matches);
$string = $matches[1][0]; // first quoted phrase; $matches[1] holds all of them
$string now contains something like "example string with words".
Now if you want to split a string according to a specific sequence / character, you can make use of explode():
$string = "example string with words";
$result = explode(" ", $string);
print_r($result);
gives you:
Array
(
[0] => example
[1] => string
[2] => with
[3] => words
)
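If the text can contain several quoted phrases, the two steps can be combined. The sketch below loops over every capture in $matches[1] and explodes each one. The sample $text is made up for illustration.

```php
<?php
// Hypothetical input with two quoted phrases.
$text = "say 'for example' and then 'two more words' here";

preg_match_all("~'([^']+)'~", $text, $matches);

$words = [];
foreach ($matches[1] as $phrase) {
    // explode() splits on each single space, as in the answer above.
    $words = array_merge($words, explode(' ', $phrase));
}

print_r($words);
```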

Regular expression to extract a numeric value on a changing position within a variable string

How can I extract the numeric part of a string when most of the string can change? /data/ is always present and is followed by the relevant, variable numeric part (in this case 123456).
differentcontentLocationhttps://example.com/api/result/13548/data/123456differentstuffincludingwhitespacesandnewlines8484
$str = "differentcontentLocationhttps://example.com/api/result/13548/data/123456differentstuffincludingwhitespacesandnewlines8484";
$str2 = "differentcontentLocationhttps://example.com/api/result/13548/data/123456";
In this example I need 123456. The only constant parts in the string are /data/ and maybe the first part of the URL, like https://.
preg_match("#/data/([0-9]+)([^0-9]+)#siU", $str, $matches);
This results in Array ( [0] => /data/123456d [1] => 123456 [2] => d ), which would be acceptable. But if nothing follows the relevant numeric part, as in $str2, this expression fails. I've tried to make the trailing part optional with preg_match("#/ads/([0-9]+)(([^0-9]+)?)#siU", $x, $matches);, but it fails too, returning only the first digit of the numeric part.
The U (ungreedy) modifier swaps greediness, making every greedy subpattern lazy. In your second pattern, the lazy ([0-9]+) stops after one digit because the lazy optional group after it is happy to match nothing; in the first pattern, the mandatory ([^0-9]+) forced it to keep expanding. Remove the U modifier together with ([^0-9]+). You also do not need the DOTALL (s) modifier, because there is no . in your pattern whose behavior that flag could change.
preg_match("#/data/([0-9]+)#i", $str, $matches);
Now, the pattern will match:
/data/ - a sequence of literal chars
([0-9]+) - Group 1 capturing 1+ digits (same as (\d+))
$str = "differentcontentLocationhttps://example.com/api/result/13548/data/123456differentstuffincludingwhitespacesandnewlines8484";
$str2 = "differentcontentLocationhttps://example.com/api/result/13548/data/123456";
preg_match("#/data/([0-9]+)#i", $str, $matches);
print_r($matches); // Array ( [0] => /data/123456 [1] => 123456 )
preg_match("#/data/([0-9]+)#i", $str2, $matches2);
print_r($matches2); // Array ( [0] => /data/123456 [1] => 123456 )
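A variant worth knowing is the \K escape, which resets the start of the reported match. With it, $m[0] holds only the digits and no capturing group is needed. This is a sketch run on the two sample strings from the question.

```php
<?php
$str  = "differentcontentLocationhttps://example.com/api/result/13548/data/123456differentstuffincludingwhitespacesandnewlines8484";
$str2 = "differentcontentLocationhttps://example.com/api/result/13548/data/123456";

$ids = [];
foreach ([$str, $str2] as $s) {
    // \K drops "/data/" from the reported match, so $m[0] is just the number.
    if (preg_match('#/data/\K\d+#', $s, $m)) {
        $ids[] = $m[0];
    }
}

print_r($ids);
```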

preg_split and multiple delimiters

Let me start by saying that the number before the first - is the ID I need to extract. Everything from the first - to the first / is the 'name' I need to extract. I do not care about anything after that.
Test String:
1-gc-communications/edit/profile_picture
Expected Output:
Array ( [0] => 1 [1] => gc-communications [2] => /edit/profile_picture )
The best I could come up with was the following patterns (along with their results - with a limit of 3)
Pattern: /-|edit\/profile_picture/
Result: Array ( [0] => 1 [1] => gc [2] => communications/edit/profile_picture )
^ This one is flawed because it does both dashes.
Pattern: /~-~|edit\/profile_picture/
Result: Array ( [0] => 1-gc-communications/ [1] => )
^ major fail.
I know I can do a 2-element limit and just break on the first / and then do a preg_split on the result array, but I would love a way to make this work with one line.
If this is a no-go I am open to other "one liner" solutions.
Try this one
$str = '1-gc-communications/edit/profile_picture';
$match = preg_split('#([^-]+)-([^/]+)/(.*)#', $str, 0, PREG_SPLIT_DELIM_CAPTURE);
var_export($match);
This returns:
array (
0 => '',
1 => '1',
2 => 'gc-communications',
3 => 'edit/profile_picture',
4 => '',
)
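If you want exactly the array shown under Expected Output in the question, with the leading slash kept on the last element, a single preg_match() can be used instead of preg_split(). This is a sketch; the pattern assumes the ID is all digits.

```php
<?php
$str = '1-gc-communications/edit/profile_picture';

// Group 1: digits before the first '-'; group 2: everything up to the first '/';
// group 3: the remainder, including the leading '/' (empty if there is none).
$parts = preg_match('~^(\d+)-([^/]+)(.*)$~', $str, $m) ? array_slice($m, 1) : [];

print_r($parts);
```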
This task seems a great candidate for sscanf(): it is specifically designed for parsing (scanning) a formatted string. The syntax is brief, and there is no need to make repeated matches with a pattern. The output, in case it matters, can be pre-cast as an integer or string for convenience. The rest of the string, from the first slash onward, is simply ignored.
Code:
$str = '1-gc-communications/edit/profile_picture';
var_export(
sscanf($str, '%d-%[^/]')
# ^^ ^^^^^- greedily match one or more non-slash characters
# ^^------- greedily match one or more numeric characters
);
Output:
array (
0 => 1, #<-- integer-typed
1 => 'gc-communications', #<-- string-typed
)
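sscanf() can also write directly into variables. When extra arguments are passed, it assigns the parsed values to them by reference and returns the number of values it assigned, which makes a quick success check easy. The variable names $id and $name below are illustrative.

```php
<?php
$str = '1-gc-communications/edit/profile_picture';

// Values are assigned to $id and $name; the return value is how many were filled.
$assigned = sscanf($str, '%d-%[^/]', $id, $name);

var_dump($assigned, $id, $name);
```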

Why does this regex have 3 matches, not 5?

I wrote a pretty simple preg_match_all file in PHP:
$fileName = 'A_DATED_FILE_091410.txt';
$matches = array();
preg_match_all('/[0-9][0-9]/',$fileName,$matches);
print_r($matches);
My Expected Output:
$matches = array(
[0] => array(
[0] => 09,
[1] => 91,
[2] => 14,
[3] => 41,
[4] => 10
)
)
What I got instead:
$matches = array(
[0] => array(
[0] => 09,
[1] => 14,
[2] => 10
)
)
Now, in this particular use case this was preferable, but I'm wondering why it didn't match the other substrings? Also, is a regex possible that would give me my expected output, and if so, what is it?
With a global regex (which is what preg_match_all uses), once a match is made, the regex engine continues searching the string from the end of the previous match.
In your case, the regular expression engine starts at the beginning of the string and advances until the 0, since that is the first character that matches [0-9]. It then advances to the next position (9), and since that matches the second [0-9], it takes 09 as a match. When the engine continues matching (since it has not yet reached the end of the string), it advances its position again (to 1), and the process above repeats.
See also: First Look at How a Regex Engine Works Internally
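You can watch this resumption happen by asking preg_match_all() for offsets with PREG_OFFSET_CAPTURE. The minimal sketch below prints each match alongside its starting offset. Each match should start two characters after the previous one (13, 15, 17), so the overlapping pairs 91 and 41 are never tried.

```php
<?php
$fileName = 'A_DATED_FILE_091410.txt';

preg_match_all('/[0-9][0-9]/', $fileName, $matches, PREG_OFFSET_CAPTURE);

// Each entry of $matches[0] is [matched text, byte offset].
foreach ($matches[0] as [$text, $offset]) {
    echo "$text at offset $offset\n";
}
```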
If you must get every two-digit sequence, including overlapping ones, you can use preg_match and use offsets to determine where to start capturing from:
$fileName = 'A_DATED_FILE_091410.txt';
$allSequences = array();
$matches = array();
$offset = 0;
while (preg_match('/[0-9][0-9]/', $fileName, $matches, PREG_OFFSET_CAPTURE, $offset))
{
list($match, $offset) = $matches[0];
$allSequences[] = $match;
$offset++; // resume one character after this match's start, so overlapping pairs are found
}
Note that the offset returned with the PREG_OFFSET_CAPTURE flag is the start of the match.
I've got another solution that will get five matches without having to use offsets, but I'm adding it here just for curiosity, and I probably wouldn't use it myself in production code (it's a somewhat complex regex too). You can use a regex that uses a lookbehind to look for a number before the current position, and captures the number in the lookbehind (in general, lookarounds are non-capturing):
(?<=([0-9]))[0-9]
Let's walk through this regex:
(?<= # open a positive lookbehind
( # open a capturing group
[0-9] # match 0-9
) # close the capturing group
) # close the lookbehind
[0-9] # match 0-9
Because lookarounds are zero-width and do not move the regex position, this regular expression will match 5 times. The engine advances until the 9, because that is the first position that satisfies the lookbehind assertion. Since 9 matches [0-9], the engine takes 9 as the match (and, because we're capturing inside the lookaround, it also captures the 0 in group 1). The engine then moves to the 1. Again the lookbehind succeeds and captures the 9, and the 1 becomes the full match (and so on, until the engine hits the end of the string).
When we give this pattern to preg_match_all, we'll end up with an array that looks like (using the PREG_SET_ORDER flag to group capturing groups along with the full match):
Array
(
[0] => Array
(
[0] => 9
[1] => 0
)
[1] => Array
(
[0] => 1
[1] => 9
)
[2] => Array
(
[0] => 4
[1] => 1
)
[3] => Array
(
[0] => 1
[1] => 4
)
[4] => Array
(
[0] => 0
[1] => 1
)
)
Note that each "match" has its digits out of order! This is because the capture group in the lookbehind becomes backreference 1 while the whole match is backreference 0. We can put it back together in the correct order though:
preg_match_all('/(?<=([0-9]))[0-9]/', $fileName, $matches, PREG_SET_ORDER);
$allSequences = array();
foreach ($matches as $match)
{
$allSequences[] = $match[1] . $match[0];
}
The search for the next match starts at the first character after the previous match. So when 09 is matched in 091410, the search for the next match starts at 1410.
Also, is a regex possible that would give me my expected output, and if so, what is it?
No single pattern that consumes both digits will work, because the engine never matches the same characters twice. But you could do something like this:
$i = 0;
while (preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE, $i))
{
$i = $matches[0][1]; /* + 1 in many cases */
}
The above is not safe for the general case. You could get stuck in an infinite loop, depending on the pattern. Also, you may not want [0][1], but instead something like [1][1], again depending on the pattern.
For this particular case, I think it would be much simpler to do it yourself:
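$s = 'A_DATED_FILE_091410.txt';
$l = strlen($s);
$prev_digit = false;
$pairs = array();
for ($i = 0; $i < $l; ++$i)
{
if ($s[$i] >= '0' && $s[$i] <= '9')
{
if ($prev_digit) { $pairs[] = $s[$i - 1] . $s[$i]; } // found a two-digit pair
$prev_digit = true;
}
else
$prev_digit = false;
}
print_r($pairs);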
Just for fun, another way to do it:
<?php
$fileName = 'A_DATED_FILE_091410.txt';
$matches = array();
preg_match_all('/(?<=([0-9]))[0-9]/',$fileName,$matches);
$result = array();
foreach($matches[1] as $i => $behind)
{
$result[] = $behind . $matches[0][$i];
}
print_r($result);
?>
