php preg split data [closed] - php

I want to split text into sections, where the section names are enclosed in === ===. The data looks like this:
===A===
a
===B===
b
===C===
c
My preg_split call looks like this:
$sections = preg_split('/===([^=]+)===(?!=)/', $text, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
but if the data looks like this:
===A===
a
====0====
0
===B===
b
===C===
c
It goes wrong (I need to split only on section headers with exactly three = signs and ignore everything else); that's why there is the negative lookahead.
Edit: it turned out that the problem was that the split took the last === of ====0==== as the start of a fake section name running up to the first === of ===B===, so from
====0====
0
===B===
it made a fake section like this (ignore the parts in parentheses):
(====0=)===
0
===(B===)

Here is one approach using preg_match_all, with the following regex pattern:
(?<!=)={3,}[^=]+={3}(?!=).*?(?=[^=]={3}[^=]+={3}[^=]|$)
This pattern matches a section header, defined as three = signs on each side with some other character(s) in the middle, followed by all content until reaching either another section header or the end of the entire input.
$input = "===A===
a
====0====
0
===B===
b
===C===
c";
preg_match_all("/(?<!=)={3,}[^=]+={3}(?!=).*?(?=[^=]={3}[^=]+={3}[^=]|$)/s", $input, $sections);
print_r($sections[0]);
This prints:
Array
(
[0] => ===A===
a
====0====
0
[1] => ===B===
b
[2] => ===C===
c
)
Note that we use the /s modifier in the PHP regex pattern for dot-all mode. This ensures that the .* used in the pattern matches across newlines.
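If a name-to-content map is more convenient than a flat list of matches, the same idea can be sketched with two capturing groups. This is only an illustration under assumptions: the pattern below is a variation written for this example (not the one from the answer), and it assumes each header sits on its own line.

```php
<?php
// Sample input from the question; ====0==== must stay inside section A.
$input = "===A===\na\n====0====\n0\n===B===\nb\n===C===\nc";

// Group 1: the header name between exactly three '=' on each side.
// Group 2: everything up to the next such header, or the end of the input.
$pattern = '/(?<!=)===([^=]+)===(?!=)\R?(.*?)(?=\R===[^=]+===(?!=)|\z)/s';

$sections = [];
if (preg_match_all($pattern, $input, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $sections[$m[1]] = $m[2]; // section name => section body
    }
}
print_r($sections);
```

Here the ====0==== line ends up inside the body of section A, because a run of four = signs never satisfies the three-sign header test.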

A simple approach (although my regex is rusty) would be
preg_match_all("/(?:\s|^)===(\w*)===\s/", $input, $sections);
So the pattern is just (?:\s|^)===(\w*)===\s: whitespace or the start of the string, then ===, the section name, === again, and finally whitespace.
Gives...
Array
(
[0] => ===A===
[1] => ===B===
[2] => ===C===
)
Using...
$sections = preg_split("/(?:\s|^)===(\w*)===\s/", $input, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
gives...
Array
(
[0] => A
[1] => a
====0====
0
[2] => B
[3] => b
[4] => C
[5] => c
)
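Another option that stays close to the question's original preg_split call is to add a negative lookbehind in front of the opening ===, so that the tail of ====0==== can no longer start a header. This is a minimal sketch; the trim step is an addition for this example, used to drop the newlines left around each body.

```php
<?php
$text = "===A===\na\n====0====\n0\n===B===\nb\n===C===\nc";

// (?<!=) rejects a '===' preceded by another '=';
// (?!=) rejects a closing '===' followed by another '='.
$parts = preg_split(
    '/(?<!=)===([^=]+)===(?!=)/',
    $text,
    -1,
    PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
);

// Strip the newlines left around each section body.
$parts = array_map('trim', $parts);
print_r($parts);
```

The result alternates between section names and section bodies, with ====0==== kept inside the body of A.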


REGEX Pattern for Validation that check all string is integer and split into single integers

I have tried several times to write a pattern that validates that a given string is a natural number and splits it into single digits.
Given my limited understanding of regex, the closest I could come up with is
^([1-9])([0-9])*$ or ^([1-9])([0-9])([0-9])*$, or something like that.
It only captures the first digit, the last digit, and the second or second-to-last digit.
I wonder what I need to know to solve this problem. Thanks.
You may use a two step solution like
if (preg_match('~\A\d+\z~', $s)) { // if a string is all digits
print_r(str_split($s)); // Split it into chars
}
A one step regex solution:
(?:\G(?!\A)|\A(?=\d+\z))\d
Details
(?:\G(?!\A)|\A(?=\d+\z)) - either the end of the previous match (\G(?!\A)) or (|) the start of the string (\A) that is followed by one or more digits up to the end of the string ((?=\d+\z))
\d - a digit.
PHP demo:
$re = '/(?:\G(?!\A)|\A(?=\d+\z))\d/';
$str = '1234567890';
if (preg_match_all($re, $str, $matches)) {
print_r($matches[0]);
}
Output:
Array
(
[0] => 1
[1] => 2
[2] => 3
[3] => 4
[4] => 5
[5] => 6
[6] => 7
[7] => 8
[8] => 9
[9] => 0
)
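The question asks for a natural number, which the patterns in it express as a first digit of 1 to 9. A sketch of the two-step approach with that stricter check could look like this; the function name splitNaturalNumber is hypothetical and chosen only for this example.

```php
<?php
// Hypothetical helper: returns the digits of a natural number, or null if invalid.
function splitNaturalNumber(string $s): ?array
{
    // \A and \z anchor to the very start and end; [1-9] forbids a leading zero.
    if (preg_match('/\A[1-9]\d*\z/', $s) !== 1) {
        return null;
    }
    return str_split($s); // one array element per digit
}

print_r(splitNaturalNumber('1234'));
var_dump(splitNaturalNumber('0123'));
```

Using \z rather than $ means a string with a trailing newline is rejected as well.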

php string replace array and create new nested array [closed]

I have an array that I can print with
print_r($show_arr);
which gives me this output (HTML source):
Array
(
[0] => Marvel's Daredevil.S01E01 - Into the Ring.mp4
[1] => Marvel's Daredevil.S01E02 - Cut Man.mp4
[2] => Marvel's Daredevil.S02E05 - Kinbaku.mp4
[3] => Marvel's Daredevil.S02E06 - Regrets Only.mp4
)
How would I go about getting the array to look like this?
Array
(
Season[1] => Array
(
Array(
episode => "01 - Into the Ring",
file => "Marvel's Daredevil.S01E01 - Into the Ring.mp4",
)
Array(
episode => "02 - Cut Man",
file => "Marvel's Daredevil.S01E02 - Cut Man.mp4",
)
)
Season[2] => Array
(
Array(
episode => "05 - Kinbaku",
file => "Marvel's Daredevil.S02E05 - Kinbaku.mp4",
)
Array(
episode => "06 - Regrets Only",
file => "Marvel's Daredevil.S02E06 - Regrets Only.mp4",
)
)
I was bored. Just loop your array and use preg_match() to build the array using the matched groups:
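$result = array(); // initialize so the first assignment does not rely on an undefined variable
foreach ($show_arr as $val) {
    // $m[1] = season number, $m[2] = episode number and title, $m[0] = the whole file name
    preg_match('/[^.]+\.S([\d]+)E([0-9]+[^.]+).*/', $val, $m);
    $result['Season'][(int)$m[1]][(int)$m[2]] = array('episode' => $m[2],
                                                      'file' => $m[0]);
}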
[^.]+ is 1 or more NOT dot . characters
\.S([\d]+) is a dot . then S followed by 1 or more digits (capture as group 1)
E([0-9]+[^.]+) is E followed by 1 or more digits followed by 1 or more NOT dot . characters (capture as group 2)
Additionally, this indexes the subarray by the episode. If you don't want that, remove the [(int)$m[2]] and just use [].
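The [] variant mentioned above can be sketched as follows. Two things here are additions for illustration rather than part of the answer: the pattern is slightly shortened, and file names that don't match it are skipped.

```php
<?php
$show_arr = [
    "Marvel's Daredevil.S01E01 - Into the Ring.mp4",
    "Marvel's Daredevil.S01E02 - Cut Man.mp4",
    "Marvel's Daredevil.S02E05 - Kinbaku.mp4",
    "Marvel's Daredevil.S02E06 - Regrets Only.mp4",
];

$result = ['Season' => []];
foreach ($show_arr as $file) {
    // Skip names that do not contain ".S<season>E<episode and title>.<ext>".
    if (preg_match('/\.S(\d+)E(\d+[^.]+)\./', $file, $m) !== 1) {
        continue;
    }
    // [] appends, so episodes are indexed 0, 1, 2... within each season.
    $result['Season'][(int) $m[1]][] = ['episode' => $m[2], 'file' => $file];
}
print_r($result);
```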

Catching ids and its values from a string with preg_match

I was wondering how I can write a preg_match pattern for catching:
id=4
where 4 can be any number, and how I can search for it in a string.
Could /^id=[0-9]/ be correct? I'm asking because I'm not really good with preg_match.
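For 4 to be any number, we must allow a range of digits:
/^id\=[0-9]+/
The backslash escapes the equals sign (optional, since = is not a special character in a regex), and the plus after the character class means one or more digits. Note that ^ anchors the pattern to the start of the string.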
You should go with the following:
/id=(\d+)/g
Explanations:
id= - the literal text id=
(\d+) - a capturing group; \d matches any digit from 0 to 9, and + repeats it one or more times
/g - modifier: global. Finds all matches (doesn't return after the first match)
If you want to grab all ids and its values in PHP you could go with:
$string = "There are three ids: id=10 and id=12 and id=100";
preg_match_all("/id=(\d+)/", $string, $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => id=10
[1] => id=12
[2] => id=100
)
[1] => Array
(
[0] => 10
[1] => 12
[2] => 100
)
)
Note: To match all occurrences you would normally use the /g modifier. PHP doesn't support it, but offers a separate function for that purpose, preg_match_all. All you need to do is remove the g from the regex.
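When only a single id is needed, preg_match with a capturing group may be enough. The following is a small sketch; extractId is a hypothetical helper name, and the \b word boundary is an addition for this example.

```php
<?php
// Hypothetical helper: returns the first id found in $s as an int, or null if there is none.
function extractId(string $s): ?int
{
    // \b keeps "id" from matching inside a longer word such as "grid=5".
    if (preg_match('/\bid=(\d+)/', $s, $m) === 1) {
        return (int) $m[1];
    }
    return null;
}

var_dump(extractId('page.php?id=42&sort=asc'));
var_dump(extractId('grid=5'));
```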

Separating a few things with preg_split

For the life of me, I can't figure out how to write the regex to split this.
Let's say we have the sample text:
15HGH(Whatever)ASD
I would like to break it down into the following groups (numbers, letters by themselves, and parenthesis contents)
15
H
G
H
Whatever
A
S
D
It can have any combination of the above such as:
15HGH
12ABCD
ABCD(Whatever)(test)
So far, I have gotten it to break apart either the numbers/letters or just the parenthesis part broken away. For example, in this case:
<?php print_r(preg_split( "/(\(|\))/", "5(Test)(testing)")); ?>
It will give me
Array
(
[0] => 5
[1] => Test
[2] => testing
)
I am not really sure what to put in the regex to match on only numbers and individual characters when combined. Any suggestions?
I don't know whether preg_match_all would satisfy you:
$text = '15HGH(Whatever)ASD';
preg_match_all("/([a-z]+)(?=\))|[0-9]+|([a-z])/i", $text, $out);
echo '<pre>';
print_r($out[0]);
Array
(
[0] => 15
[1] => H
[2] => G
[3] => H
[4] => Whatever
[5] => A
[6] => S
[7] => D
)
I've got this example (I don't know how the \n should be written, but the substitution works):
(\d+|\w|\([^)]++\)) There is not much to explain: it first tries to match a number, then a single character, and if neither is there, a whole word between parentheses (which can't be nested).
Check this out using preg_match_all():
$string = '15HGH(Whatever)(Whatever)ASD';
preg_match_all('/\(([^\)]+)\)|(\d+)|([a-z])/i', $string, $matches);
$results = array_merge(array_filter($matches[1]),array_filter($matches[2]),array_filter($matches[3]));
print_r($results);
\(([^\)]+)\) --> Matches everything between parentheses
\d+ --> Numbers only
[a-z] --> Single letters only
i --> Case insensitive
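Merging the three filtered groups with array_merge puts all parenthesized parts first, then the numbers, then the letters, so the original order of the tokens is lost. If the order matters, one possible sketch is to walk the matches in sequence with PREG_SET_ORDER; the variable names below are chosen for this example.

```php
<?php
$string = '15HGH(Whatever)(test)ASD';

// PREG_SET_ORDER returns one array per match, in the order found in the string.
preg_match_all('/\(([^)]+)\)|\d+|[a-z]/i', $string, $matches, PREG_SET_ORDER);

$tokens = [];
foreach ($matches as $m) {
    // Group 1 is non-empty only for a parenthesized match; otherwise use the whole match.
    $tokens[] = (isset($m[1]) && $m[1] !== '') ? $m[1] : $m[0];
}
print_r($tokens);
```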

$ not matching position immediately before a newline that is the last character

$ is not matching a position immediately before a newline that is the last character.
I would expect /1...$/ to match, but a match only happens with the pattern /1....$/, which seems wrong.
What could be the reason?
The PHP doc also says: "A dollar character ($) is an assertion which is TRUE only if the current matching point is at the end of the subject string, or immediately before a newline character that is the last character in the string (by default)."
$subject = 'abc#
123#
';
$pattern = '/1...$/';
preg_match_all($pattern,$subject,$matches); // no match
Update:
I suspect the extra dot is needed because of the \r\n newline format.
I did the following experiment and found a hint.
$pattern = '/1...(.)$/';
echo bin2hex($matches[1][0]); // 0d
0d is the hex code for \r (CR), so $ is matching before \n, not before \r\n; that may be the cause of my problem.
The issue was due to the different newline representations in Windows and Linux files.
Why this issue occurred:
I created the PHP file on Windows and transferred it to Linux, where PHP was installed.
Windows uses \r\n to represent a newline and Linux uses \n; that's why it initially took an extra dot to match.
The experiment below confirmed this:
$subject = 'abc#
123#
';
$pattern = '/1...(.)$/';
preg_match_all($pattern,$subject,$matches);
echo bin2hex($matches[1][0]); // 0d
// 0d is the hex code of \r, i.e. CR (carriage return)
After creating a new file on the Linux system, /1...$/ matches :)
I hope this saves someone time if they are stuck with the same problem.
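The effect can be reproduced without depending on how the source file was saved by writing the line endings explicitly. The sketch below also shows one possible workaround, normalizing the line endings before matching; that normalization step is an assumption of this example and not something the experiment above did.

```php
<?php
// Same subject as in the question, but with explicit Windows (CRLF) line endings.
$subject = "abc#\r\n123#\r\n";

// (.) captures the character sitting between '#' and the position where $ matches.
preg_match('/1...(.)$/', $subject, $m);
$hex = bin2hex($m[1]);

// Normalizing CRLF and lone CR to LF lets the original pattern match.
$normalized = str_replace(["\r\n", "\r"], "\n", $subject);
$before = preg_match('/1...$/', $subject);
$after = preg_match('/1...$/', $normalized);

echo "$hex $before $after\n"; // prints "0d 0 1"
```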
Your string is multi-line. By default, $ does not match at the end of each line, only at the end of the string (or just before a final newline). You have to add the m modifier to make it match at every line end.
For example:
/1...$/m
I have been stuck on this issue for two days. I did a lot of testing to find any logic that lies behind this because it all depends on where your data comes from (internal and controlled vs. external and uncontrolled). In my case it was input field (<textarea>) on my website available from various browsers (and various OS-es) and there were no such problems in JavaScript with pattern testing/matching/checking. Here is a hint for those of you who are trying to fight off (or work around at least) the problem of matching a pattern correctly at the end ($) of any line in multiple lines mode (/m).
<?php
// Various OS-es have various end line (a.k.a line break) chars:
// - Windows uses CR+LF (\r\n);
// - Linux LF (\n);
// - classic Mac OS (before OS X) CR (\r).
// And that's why single dollar meta assertion ($) sometimes fails with multiline modifier (/m) mode - possible bug in PHP 5.3.8 or just a "feature"(?).
$str="ABC ABC\n\n123 123\r\ndef def\rnop nop\r\n890 890\nQRS QRS\r\r~-_ ~-_";
// C 3 p 0 _
$pat1='/\w$/mi'; // This works excellent in JavaScript (Firefox 7.0.1+)
$pat2='/\w\r?$/mi'; // Slightly better
$pat3='/\w\R?$/mi'; // Somehow disappointing according to php.net and pcre.org when used improperly
$pat4='/\w(?=\R)/i'; // Much better with allowed lookahead assertion (just to detect without capture) without multiline (/m) mode; note that with alternative for end of string ((?=\R|$)) it would grab all 7 elements as expected
$pat5='/\w\v?$/mi';
$pat6='/(*ANYCRLF)\w$/mi'; // Excellent but undocumented on php.net at the moment (described on pcre.org and en.wikipedia.org)
$n=preg_match_all($pat1, $str, $m1);
$o=preg_match_all($pat2, $str, $m2);
$p=preg_match_all($pat3, $str, $m3);
$r=preg_match_all($pat4, $str, $m4);
$s=preg_match_all($pat5, $str, $m5);
$t=preg_match_all($pat6, $str, $m6);
echo $str."\n1 !!! $pat1 ($n): ".print_r($m1[0], true)
."\n2 !!! $pat2 ($o): ".print_r($m2[0], true)
."\n3 !!! $pat3 ($p): ".print_r($m3[0], true)
."\n4 !!! $pat4 ($r): ".print_r($m4[0], true)
."\n5 !!! $pat5 ($s): ".print_r($m5[0], true)
."\n6 !!! $pat6 ($t): ".print_r($m6[0], true);
// Note the difference among the three very helpful escape sequences in $pat2 (\r), $pat3 and $pat4 (\R), $pat5 (\v) and altered newline option in $pat6 ((*ANYCRLF)) - for some applications at least.
/* The code above results in the following output:
ABC ABC
123 123
def def
nop nop
890 890
QRS QRS
~-_ ~-_
1 !!! /\w$/mi (3): Array
(
[0] => C
[1] => 0
[2] => _
)
2 !!! /\w\r?$/mi (5): Array
(
[0] => C
[1] => 3
[2] => p
[3] => 0
[4] => _
)
3 !!! /\w\R?$/mi (5): Array
(
[0] => C
[1] => 3
[2] => p
[3] => 0
[4] => _
)
4 !!! /\w(?=\R)/i (6): Array
(
[0] => C
[1] => 3
[2] => f
[3] => p
[4] => 0
[5] => S
)
5 !!! /\w\v?$/mi (5): Array
(
[0] => C
[1] => 3
[2] => p
[3] => 0
[4] => _
)
6 !!! /(*ANYCRLF)\w$/mi (7): Array
(
[0] => C
[1] => 3
[2] => f
[3] => p
[4] => 0
[5] => S
[6] => _
)
*/
?>
Unfortunately, I haven't got any access to a server with the latest PHP version - my local PHP is 5.3.8 and my public host's PHP is version 5.2.17.
