Undesired output for php array creation

I am trying to create an array from input in a textarea.
I have a file with a textarea. The textarea might contain values like this (exactly the way it looks):
CREATE this.
DO that.
STOP it.
Basically, I want to use PHP to create an array out of the values given by the textarea. For example, the array made from the values of the textarea above is meant to be:
Array
(
[0] => create this
[1] => do that
[2] => stop it
)
I've tried the following code:
<?php
$wholecode=$_POST['code'];
$code=explode('.',trim(strtolower($wholecode)));//convert code to array
$words=explode(' ', $code);
print_r($code);
I get
Array
(
[0] => create this
[1] =>
do this
[2] =>
stop it
[3] =>
)
As it clearly shows, that is not what I want. Please help.

You just need to tidy up the array contents after creating it. You have things like new lines and potentially other whitespace around the content.
This uses array_map() to trim() each entry in the array, then array_filter() to remove any empty elements (calling it with no callback does this).
$wholecode=$_POST['code'];
$code=explode('.',trim(strtolower($wholecode)));//convert code to array
$code=array_map("trim", $code );
$code = array_filter($code);
print_r($code);
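One thing to watch with the approach above, shown as a minimal sketch: array_filter() without a callback drops every falsy value (so an entry of "0" goes too) and it keeps the original keys. The sample input and the variable names $parts, $loose and $strict are made up for illustration.

```php
<?php
// Hypothetical input standing in for $_POST['code'].
$wholecode = "CREATE this.\nDO that.\n0.\nSTOP it.";

$parts = array_map('trim', explode('.', strtolower($wholecode)));

// Default array_filter() removes all falsy values, so "0" is lost as well,
// and the surviving keys are not renumbered.
$loose = array_filter($parts);

// A strict callback removes only empty strings;
// array_values() renumbers the keys from 0.
$strict = array_values(array_filter($parts, function ($part) {
    return $part !== '';
}));

print_r($loose);
print_r($strict);
```

If the textarea can never contain a bare "0", the shorter form in the answer is fine; wrapping it in array_values() is only needed when sequential keys matter.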

Here's a direct approach using a simple regex pattern to explode on dots followed by zero or more whitespace characters. No mopping up after splitting.
Code (Demo)
$_POST['code'] = 'CREATE this.
DO that.
STOP it.';
var_export(preg_split('~\.\s*~', strtolower($_POST['code']), -1, PREG_SPLIT_NO_EMPTY));
Output:
array (
0 => 'create this',
1 => 'do that',
2 => 'stop it',
)
As a variable...
$array = preg_split('~\.\s*~', strtolower($_POST['code']), -1, PREG_SPLIT_NO_EMPTY);
Explanation transferred from my comment:
Pattern Demo: https://regex101.com/r/jygaQ1/1
There are 3 dots in your sample data. The first two have whitespace characters that immediately follow. The final dot has no trailing whitespace characters.
The \s* means "match zero or more whitespace characters".
-1 means perform unlimited explosions.
PREG_SPLIT_NO_EMPTY means that on the final explosion (the last dot) there will be an empty element generated, but preg_split() will disregard it in the output array.

Related

Get all matches of repeating subgroup [duplicate]

I'm trying to get all substrings matched with a multiplier:
$list = '1,2,3,4';
preg_match_all('|\d+(,\d+)*|', $list, $matches);
print_r($matches);
This example returns, as expected, the last match in [1]:
Array
(
[0] => Array
(
[0] => 1,2,3,4
)
[1] => Array
(
[0] => ,4
)
)
However, I would like to get all strings matched by (,\d+), to get something like:
Array
(
[0] => ,2
[1] => ,3
[2] => ,4
)
Is there a way to do this with a single function such as preg_match_all()?

According to Kobi (see comments above):
PHP has no support for captures of the same group
Therefore this question has no solution.

It's true that PHP (or better to say PCRE) doesn't store values of repeated capturing groups for later access (see PCRE docs):
If a capturing subpattern is matched repeatedly, it is the last portion of the string that it matched that is returned.
But in most cases the \G token does the job. \G either 1) matches at the beginning of the input string (like \A, or ^ when the m modifier is not set) or 2) matches at the position where the previous match ended. With that in mind, you can use it like this:
preg_match_all('/^\d+|\G(?!^)(,?\d+)\K/', $list, $matches);
or if capturing group doesn't matter:
preg_match_all('/\G,?\d+/', $list, $matches);
by which $matches will hold this (see live demo):
Array
(
[0] => Array
(
[0] => 1
[1] => ,2
[2] => ,3
[3] => ,4
)
)
Note: the benefit of using \G over the other answers (like explode(), the lookbehind solution, or just preg_match_all('/,?\d+/', ...)) is that you can validate that the input string is in the desired format ^\d+(,\d+)*$ while extracting the matches:
preg_match_all('/(?:^(?=\d+(?:,\d+)*$)|\G(?!^),)\d+/', $list, $matches);
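To see the validating behaviour of that last pattern, here is a small sketch run against one well-formed and one malformed list. The helper name extractNumbers() and the malformed sample are assumptions added for illustration. Note that this pattern returns the numbers without their leading commas.

```php
<?php
// Pattern from the answer above: validate the whole list and extract each number.
$pattern = '/(?:^(?=\d+(?:,\d+)*$)|\G(?!^),)\d+/';

// Hypothetical helper name, used only for this illustration.
function extractNumbers(string $pattern, string $list): array
{
    preg_match_all($pattern, $list, $matches);
    return $matches[0];
}

$valid   = extractNumbers($pattern, '1,2,3,4');
$invalid = extractNumbers($pattern, '1,2,x,4');

print_r($valid);   // the four numbers, commas not included
print_r($invalid); // empty: the lookahead rejects the malformed string up front
```

Because the lookahead is checked once at the start and every later match must continue from the previous one via \G, a single bad token means no matches at all rather than a partial list.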

Using lookbehind is a way to do the job:
$list = '1,2,3,4';
preg_match_all('|(?<=\d),\d+|', $list, $matches);
print_r($matches);
All the ,\d+ are in group 0.
output:
Array
(
[0] => Array
(
[0] => ,2
[1] => ,3
[2] => ,4
)
)

Splitting is only an option when the character to split isn't used in the patterns to match itself.
I had a situation where a badly formatted comma-separated line had to be parsed into any of a number of known options.
i.e. options '1,2', '2', '2,3'
subject '1,2,3'.
Splitting on ',' will result in '1', '2', and '3', only one of which ('2') is a valid match. This happens because the separator is also part of the options.
The naïve regex would be something like '~^(1,2|2|2,3)(?:,(1,2|2|2,3))*$~i', but this runs into the problem of same-group captures.
My "solution" was to just expand the regex to match the maximum number of matches possible:
'~^(1,2|2|2,3)(?:,(1,2|2|2,3))?(?:,(1,2|2|2,3))?$~i'
(If more options were available, just repeat the '(?:,(1,2|2|2,3))?' bit.)
This does result in empty string results for "unused" matches.
It's not the cleanest solution, but works when you have to deal with badly formatted input data.

Why not just:
$ar = explode(',', $list);
print_r($ar);

From http://www.php.net/manual/en/regexp.reference.repetition.php :
When a capturing subpattern is repeated, the value captured is the substring that matched the final iteration.
Also similar thread:
How to get all captures of subgroup matches with preg_match_all()?

PHP preg_split adds a blank array key that can't be cleared by array_filter because there's a 'space' in it

I'm trying to use preg_split to split a text that has an odd number of new lines between paragraphs. Some of those new lines (also an odd number) contain a few spaces, and the regular expression I'm using cannot skip them; instead it includes them in my array:
Array
(
[0] => Dummy text
[2] =>
[3] => more dummy text after some lines
[5] =>
[7] => even more dummy text
)
Here is the regular expression example: https://3v4l.org/2aMNN
preg_split('/(\r\n|\n|\r)/', $p)
So far I've used a foreach loop to clean that up:
foreach ($arr as $v) {
    if (!empty($v)) {
        // do something
    }
}
But I'm pretty sure there's a better solution to this X_X :-s

You can use preg_split with the PREG_SPLIT_NO_EMPTY flag to remove completely empty values from the output, but you also need to include whitespace adjacent to newlines in your regex to avoid getting lines which just have spaces in them in your output. This will work ($p is copied from your demo):
$arr = preg_split('/[\r\n]+\s*/', $p, -1, PREG_SPLIT_NO_EMPTY);
print_r($arr);
Output:
Array (
[0] => Dummy text
[1] => more dummy text after some lines
[2] => even more dummy text
)
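Since $p only lives in the linked demo, here is a self-contained sketch of the same call. The sample text is an assumption made up to mimic paragraphs separated by blank and space-only lines.

```php
<?php
// Hypothetical stand-in for $p: paragraphs separated by blank lines,
// some of which contain only spaces.
$p = "Dummy text\n \n\nmore dummy text after some lines\n\n   \n\neven more dummy text\n";

// [\r\n]+ consumes the line breaks; \s* then swallows any whitespace that
// follows, including space-only lines and further line breaks.
$arr = preg_split('/[\r\n]+\s*/', $p, -1, PREG_SPLIT_NO_EMPTY);

print_r($arr);
```

The delimiter always starts at a line break, so spaces sitting at the end of a line of text (before the break) stay in that piece. If the input can contain those, an extra array_map('trim', $arr) pass is probably still worth it.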

Use the PREG_SPLIT_NO_EMPTY flag.
$p ='
foo
bar
biz
';
print_r(preg_split('/(\r\n|\n|\r)/', $p, 0, PREG_SPLIT_NO_EMPTY));
Output:
Array
(
[0] => foo
[1] => bar
[2] => biz
)
For reference
http://php.net/manual/en/function.preg-split.php
PREG_SPLIT_NO_EMPTY
If this flag is set, only non-empty pieces will be returned by preg_split().
As a Bonus
A regex such as '/[\r\n]/' is sufficient for what you want. The character class matches \r and \n individually, so it also covers \r\n. You might be thinking "well, on Windows it's \r\n, won't that split twice?" Sure it will, but it doesn't matter because of the PREG_SPLIT_NO_EMPTY flag.
Even if that worries you, you can just add a + to the end, like '/[\r\n]+/', which, now that I think of it, might even be a bit faster, but I digress.
P.S. If you use the last one with the +, you don't even need the flag (as long as you trim the input first). So there are two answers.
Simple!

Using regex to not match periods between numbers

I have a regex code that splits strings between [.!?], and it works, but I'm trying to add something else to the regex code. I'm trying to make it so that it doesn't match [.] that's between numbers. Is that possible? So, like the example below:
$input = "one.two!three?4.000.";
$inputX = preg_split("~(?>[.!?]+)\K(?!$)~", $input);
print_r($inputX);
Result:
Array ( [0] => one. [1] => two! [2] => three? [3] => 4. [4] => 000. )
Need Result:
Array ( [0] => one. [1] => two! [2] => three? [3] => 4.000. )

You should be able to split on this:
(?<=(?<!\d(?=[.!?]+\d))[.!?])(?![.!?]|$)
https://regex101.com/r/kQ6zO4/1
It uses lookarounds to determine where to split. It looks behind to try to match anything in the set [.!?] one or more times as long as it isn't preceded by and succeeded by a digit.
It also won't return the last empty match by ensuring the last set isn't the end of the string.
UPDATE:
This should be much more efficient actually:
(?!\d+\.\d+).+?[.!?]+\K(?!$)
https://regex101.com/r/eN7rS8/1
Here is another possibility using regex flags:
$input = "one.two!three???4.000.";
$inputX = preg_split("~(\d+\.\d+[.!?]+|.*?[.!?]+)~", $input, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($inputX);
It includes the delimiter in the split and ignores empty matches. The regex can be simplified to ((?:\d+\.\d+|.*?)[.!?]+), but I think what is in the code sample above is more efficient.
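For reference, this is what the flag-based version produces, plus one case where it does not help. The second input string and the variable names $pieces and $midSentence are assumptions added for illustration.

```php
<?php
$pattern = '~(\d+\.\d+[.!?]+|.*?[.!?]+)~';
$flags   = PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY;

// The number alternative is tried first, so when a piece starts with a number
// its inner dot is consumed instead of ending the piece.
$pieces = preg_split($pattern, "one.two!three???4.000.", -1, $flags);

// Hypothetical counter-example: the number is not at the start of the piece,
// so the lazy second alternative stops at the first dot it finds.
$midSentence = preg_split($pattern, "costs 4.000.", -1, $flags);

print_r($pieces);
print_r($midSentence);
```

So this variant seems to fit input where a number always begins its own segment, as in the question's sample. For numbers in the middle of a sentence, the lookaround pattern at the top of this answer is the one to test.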

php preg_match s and m modifiers not working for multiple lines

I have the following input string which consists of multiple lines:
BYTE $66,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$13,$14,$01,$19,$20,$01,$20,$17,$08,$09,$0C,$05,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$66 // comment
BYTE $66,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$66
I use the following preg_match statement to match the data part (so only the hexadecimal values) and not the preceding white space and text, nor the trailing white space and comment sections:
preg_match('/(\$.*?) /s', $sFileContents, $aResult);
The output is this:
output: Array
(
[0] => $66,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$13,$14,$01,$19,$20,$01,$20,$17,$08,$09,$0C,$05,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$66
[1] => $66,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$13,$14,$01,$19,$20,$01,$20,$17,$08,$09,$0C,$05,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$66
)
As you may be able to see, the match appears to be correct but the first input line is repeated twice. The 's' modifier should help me get past the end of line, but I cannot seem to get past the first line.
Does anyone have an idea of how to proceed?

You can match data from all lines easily:
preg_match_all('/\$[\dA-Fa-f,\$]+/', $sFileContents, $aResult);
echo "<pre>".print_r($aResult,true);
Output:
Array
(
[0] => Array
(
[0] => $66,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$13,$14,$01,$19,$20,$01,$20,$17,$08,$09,$0C,$05,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$66
[1] => $66,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$66
)
)

You don't need s (DOTALL) flag for this. You can use:
preg_match_all('/(\$[0-9A-Fa-f]{2}(?:,\$[0-9A-Fa-f]{2})+)/', $input, $m);
print_r($m[1]);
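A note on why the question's output repeated the first line: preg_match() stops after the first match, with index 0 holding the whole match and index 1 the first capture group, so both entries came from line one. The sketch below uses preg_match_all() on a shortened, made-up sample and then converts each run into integers. The $rows step is an assumption about what you might want next, not part of the original answer.

```php
<?php
// Hypothetical two-line sample in the same shape as the question's data.
$input = '    BYTE $66,$20,$13 // comment' . "\n" . '    BYTE $66,$20,$66';

// preg_match_all() keeps searching after the first hit, so each line yields one match.
preg_match_all('/(\$[0-9A-Fa-f]{2}(?:,\$[0-9A-Fa-f]{2})+)/', $input, $m);

// Turn each matched run into integers: strip the "$" and convert from hex.
$rows = [];
foreach ($m[1] as $run) {
    $rows[] = array_map(function ($byte) {
        return hexdec(ltrim($byte, '$'));
    }, explode(',', $run));
}

print_r($m[1]);
print_r($rows);
```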

Splitting GET string

I need to split my GET string into some array. The string looks like this:
ident[0]=<IDENT_0>&value[0]=<VALUE_0>&version[0]=<VERSION_0>&....&ident[N]=<IDENT_N>&value[N]=<VALUE_N>&version[N]=<VERSION_N>
So, I need to split this string by every third ampersand character, like this:
ident[0]=<IDENT_0>&value[0]=<VALUE_0>&version[0]=<VERSION_0>
ident[1]=<IDENT_1>&value[1]=<VALUE_1>&version[1]=<VERSION_1> and so on...
How can I do it? What regular expression should I use? Or is there a better way to do it?

There is a better way (assuming this is data being sent to your PHP page, not some other thing you're dealing with).
PHP provides a "magic" array called $_GET which already has the values parsed out for you.
For example:
one=1&two=2&three=3
Would result in this array:
Array ( [one] => 1 [two] => 2 [three] => 3 )
So you could access the variables like so:
$oneValue = $_GET['one']; // answer is 1
$twoValue = $_GET['two']; // and so on
If you provide array indexes, which your example does, it'll sort those out for you as well. So, to use your example above $_GET would look like:
Array
(
[ident] => Array
(
[0] => <IDENT_0>
[N] => <IDENT_N>
)
[value] => Array
(
[0] => <VALUE_0>
[N] => <VALUE_N>
)
[version] => Array
(
[0] => <VERSION_0>
[N] => <VERSION_N>
)
)
I'd assume your N keys will actually be numbers, so you'll be able to look them up like so:
$_GET['ident'][0] // => <IDENT_0>
$_GET['value'][0] // => <VALUE_0>
$_GET['version'][0] // => <VERSION_0>
You could loop across them all or whatever, and you will never have to worry about splitting them all out yourself.
Hope it helps you.
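If you want one record per index rather than three parallel arrays, a loop like the following would do it. This is only a sketch: the query string and its values are made up, and parse_str() is used here to imitate what PHP does when it fills $_GET, which as far as I know follows the same parsing rules.

```php
<?php
// Hypothetical query string; in a real request you would read $_GET directly.
$query = 'ident[0]=a&value[0]=1&version[0]=v1&ident[1]=b&value[1]=2&version[1]=v2';

// parse_str() parses the string into a nested array, like $_GET.
parse_str($query, $params);

// Regroup the three parallel arrays into one record per index.
$records = [];
foreach ($params['ident'] as $i => $ident) {
    $records[$i] = [
        'ident'   => $ident,
        'value'   => $params['value'][$i] ?? null,
        'version' => $params['version'][$i] ?? null,
    ];
}

print_r($records);
```

The ?? null fallback just guards against a request where one of the three parameters is missing for some index.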

You can use preg_split with this pattern: &(?=ident)
$result = preg_split('~&(?=ident)~', $yourstring);
regex detail: &(?=ident) means & followed by ident
(?=..) is a lookahead assertion that only performs a check and matches nothing.
Or using preg_match_all:
preg_match_all('~(?<=^|&)[^&]+&[^&]+&[^&]+(?=&|$)~', $yourstring, $matches);
$result = $matches[0];
pattern detail: (?<=..) is a lookbehind assertion
(?<=^|&) means preceded by the beginning of the string ^ or an ampersand.
[^&]+ means all characters except the ampersand one or more times.
(?=&|$) means followed by an ampersand or the end of the string $.
Or you can use explode, and then a for loop:
$items = explode('&', $yourstring);
$result = []; // initialise so the variable exists even for empty input
for ($i = 0; $i < count($items); $i += 3) {
    $result[] = implode('&', array_slice($items, $i, 3));
}
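The explode-and-loop idea can also be written with array_chunk(), which does the grouping by three for you. A minimal sketch, assuming a made-up input string:

```php
<?php
// Hypothetical input with two groups of three parameters.
$yourstring = 'ident[0]=a&value[0]=1&version[0]=v1&ident[1]=b&value[1]=2&version[1]=v2';

// array_chunk() cuts the exploded list into groups of three;
// each group is then glued back together with "&".
$result = array_map(
    function (array $group) {
        return implode('&', $group);
    },
    array_chunk(explode('&', $yourstring), 3)
);

print_r($result);
```

Like the loop, this relies on the parameters always arriving in complete groups of three and in the same order.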
