Regex with multiple newlines in sequence - php

I'm trying to use PHP's split() (preg_split() is also an option if your answer works with it) to split up a string on 2 or more \r\n's. My current effort is:
split("(\r\n){2,}",$nb);
The problem with this is it matches every time there is 2 or 3 \r\n's, then goes on and finds the next one. This is ineffective with 4 or more \r\n's.
I need all instances of two or more \r\n's to be treated the same as two \r\n's. For example, I'd need
Hello\r\n\r\nMy\r\n\r\n\r\n\r\n\r\n\r\nName is\r\nShadow
to become
array('Hello','My','Name is\r\nShadow');

preg_split() should do it with
$pattern = "/(\\r\\n){2,}/";
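A minimal sketch of that suggestion applied to the sample string from the question. It uses a non-capturing group and a single-quoted pattern, both of which are choices made for this illustration rather than part of the answer above. Note that split() was deprecated in PHP 5.3 and removed in PHP 7.0, so preg_split() is the option that still runs on current versions.

```php
<?php
// Sample input from the question (double quotes so \r\n are real CR LF bytes).
$nb = "Hello\r\n\r\nMy\r\n\r\n\r\n\r\n\r\n\r\nName is\r\nShadow";

// Two or more consecutive CRLF pairs act as a single delimiter.
// (?: ... ) is a non-capturing group; inside single quotes PHP leaves
// \r and \n alone, and the regex engine interprets them itself.
$parts = preg_split('/(?:\r\n){2,}/', $nb);

var_export($parts);
// Three elements: 'Hello', 'My', and "Name is\r\nShadow"
```

The lone \r\n between "Name is" and "Shadow" is left in place because the quantifier requires at least two repetitions.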

What about the following suggestion:
$nb = implode("\r\n", array_filter(explode("\r\n", $nb)));
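To see what this one-liner actually produces, here is a small sketch run against the question's sample string. It appears to normalize the string rather than split it: every run of \r\n collapses to a single \r\n, so the difference between a blank-line separator and an ordinary line break is lost. Also worth keeping in mind: array_filter() without a callback drops every falsy value, so a line containing only "0" would disappear as well.

```php
<?php
$nb = "Hello\r\n\r\nMy\r\n\r\n\r\n\r\n\r\n\r\nName is\r\nShadow";

// explode() yields an empty string for every blank line.
$pieces = explode("\r\n", $nb);

// array_filter() without a callback removes the empty strings
// (and any other falsy value such as "0").
$normalized = implode("\r\n", array_filter($pieces));

var_export($normalized);
// "Hello\r\nMy\r\nName is\r\nShadow" (one string, not the requested array)
```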

It works for me:
$nb = "Hello\r\n\r\nMy\r\n\r\n\r\n\r\n\r\n\r\nName is\r\nShadow";
$parts = split("(\r\n){2,}",$nb);
var_dump($parts);
var_dump($parts === array('Hello','My',"Name is\r\nShadow"));
Prints:
array(3) {
[0]=>
string(5) "Hello"
[1]=>
string(2) "My"
[2]=>
string(15) "Name is
Shadow"
}
bool(true)
Note the double quotes in the second test to get the characters represented by \r\n.

Adding the PREG_SPLIT_NO_EMPTY flag to preg_split() with Tomalak's pattern of "/(\\r\\n){2,}/" accomplished this for me.

\R is shorthand for matching newline sequences across different operating systems. You can prevent empty elements being created at the start and end of your output array by using the PREG_SPLIT_NO_EMPTY flag or you could call trim() on the string before splitting.
Code: (Demo)
$string = "\r\n\r\nHello\r\n\r\nMy\r\n\r\n\r\n\r\n\r\n\r\nName is\r\nShadow\r\n\r\n\r\n\r\n";
var_export(preg_split('~\R{2,}~', $string, 0, PREG_SPLIT_NO_EMPTY));
echo "\n---\n";
var_export(preg_split('~\R{2,}~', trim($string)));
Output from either technique:
array (
0 => 'Hello',
1 => 'My',
2 => 'Name is
Shadow',
)
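As a further illustration of why \R can be convenient, the sketch below feeds it a string with mixed line endings. The input string is made up for this example. \R treats \r\n as one unit, so a single CRLF is never counted as two newlines.

```php
<?php
// Hypothetical input mixing Unix (\n), Windows (\r\n) and old Mac (\r) endings.
$mixed = "a\n\nb\r\n\r\nc\r\rd\ne";

// Two or more newline sequences of any flavour act as one delimiter.
$parts = preg_split('~\R{2,}~', $mixed);

var_export($parts);
// Four elements: 'a', 'b', 'c', and "d\ne" (the single \n is kept)

// A lone CRLF is one newline sequence, so it does not trigger a split.
$single = preg_split('~\R{2,}~', "x\r\ny");
```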

Related

How to get numbers between a long space (PHP Regex)

I'd like to extract the numbers with a PHP regex. I don't understand regex very well yet, although I'm currently experimenting on the regex101 website. The thing is, I have this:
66
28006 MadridVer teléfono
(Literally that; the original has many more spaces, and "28006 MadridVer teléfono" actually appears on the next line.) I'd like to extract the number 28006, or at least split the matches so that 28006 ends up on its own in one of the groups. What would my PHP regex look like? Maybe, apart from capturing spaces, I should capture a newline or something. I am totally lost here (yes, I'm still an absolute regex novice).
I don't see a need for regex.
Remove the new line and explode on space.
Then use array_filter to remove empty values from the array and rearrange the array with array_values.
$str = "66
28006 MadridVer teléfono";
$str = str_replace("\n", " ", $str);
$arr = explode(" ", $str);
$arr = array_values(array_filter($arr));
var_dump($arr);
Returns:
array(4) {
[0]=>
string(2) "66"
[1]=>
string(5) "28006"
[2]=>
string(9) "MadridVer"
[3]=>
string(9) "teléfono"
}
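For comparison, a regex-based sketch of the same idea. It assumes the source file is UTF-8 encoded and, for the second part, that the wanted number is always exactly five digits long; neither assumption comes from the question. Splitting on \s+ handles spaces and newlines in one step, and PREG_SPLIT_NO_EMPTY avoids array_filter(), which would also discard a token that is just "0".

```php
<?php
// Sample input with extra padding, similar to what the question describes.
$str = "66      \n   28006 MadridVer teléfono";

// Split on any run of whitespace (spaces, tabs, newlines).
$tokens = preg_split('/\s+/u', $str, -1, PREG_SPLIT_NO_EMPTY);
var_export($tokens);
// '66', '28006', 'MadridVer', 'teléfono'

// Assumption: the target is a standalone five-digit number.
if (preg_match('/\b\d{5}\b/', $str, $m)) {
    echo "\n", $m[0], "\n"; // 28006
}
```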

PHP preg_split curly brackets

Who can help me out?
I have a string like this:
$string = '<p>{titleInformation}<p>';
I want to split this string so that I get the following array:
array (
0 => '<p>',
1 => '{titleInformation}',
2 => '<p>',
)
I'm new to regular expressions and I tried multiple patterns with the preg_match_all() function, but I can't find the correct one. I also looked at the question "PHP preg_split if not inside curly brackets", but I don't have spaces in my string.
Thank you in advance.
Use preg_match() with capture groups. You need to escape the curly braces because they have special meaning in regular expressions.
preg_match('/(.*?)(\\{[^}]*\\})(.*)/', $string, $match);
var_dump($match);
Result:
array(4) {
[0]=>
string(24) "<p>{titleInformation}<p>"
[1]=>
string(3) "<p>"
[2]=>
string(18) "{titleInformation}"
[3]=>
string(3) "<p>"
}
$match[0] contains the match for the entire regexp, elements 1-3 contain the parts that you want.
In my opinion, the best function to call for your task is: preg_split(). It has a flag called PREG_SPLIT_DELIM_CAPTURE which allows you to retain your chosen delimiter in the output array. It is a very simple technique to follow and using negated character classes ([^}]*) is a great way to speed up your code. Further benefits of using preg_split() versus preg_match() include:
improved efficiency due to fewer capture groups
shorter pattern which is easier to read
no useless "fullstring" match in the output array
Code: (PHP Demo) (Pattern Demo)
$string = '<p>{titleInformation}<p>';
var_export(
preg_split('/({[^}]*})/', $string, 0, PREG_SPLIT_DELIM_CAPTURE)
);
Output:
array (
0 => '<p>',
1 => '{titleInformation}',
2 => '<p>',
)
If this answer doesn't work for all of your use cases, please edit your question to include the sample input strings and ping me -- I will update my answer.
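With preg_split() it can be done this way, although note that this pattern discards the braces themselves, so the middle element comes back as 'titleInformation' rather than '{titleInformation}':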
preg_split('/[{}]+/', $myString);
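The two preg_split() approaches in this thread behave differently, which a short side-by-side sketch makes visible. The second input string with two placeholders is hypothetical and only there to show how the capture-based pattern might behave on longer templates.

```php
<?php
$string = '<p>{titleInformation}<p>';

// Splitting on the brace characters themselves throws the braces away.
$withoutBraces = preg_split('/[{}]+/', $string);
// '<p>', 'titleInformation', '<p>'

// Capturing the whole placeholder keeps it, braces included.
$withBraces = preg_split('/(\{[^}]*\})/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
// '<p>', '{titleInformation}', '<p>'

// Hypothetical input with two placeholders, one at the very end.
// PREG_SPLIT_NO_EMPTY drops the empty string after the last delimiter.
$multi = preg_split(
    '/(\{[^}]*\})/',
    '<p>{a}</p>{b}',
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);
// '<p>', '{a}', '</p>', '{b}'

var_export([$withoutBraces, $withBraces, $multi]);
```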

split by special char and remove empty elements in php and javascript array

I have a string built by concatenating numbers, where each number has the & character before and after it.
Actual string: &1&&3&&5&
If you add 6 to this string, the final string will be &1&&3&&5&&6&
The problem is that when I try to get the numbers out of this string as an array, I get many empty elements that I don't need.
When I split with explode('&', $actualstr) the array is ["","1","","3","","5","","6",""] but I need ["1","3","5","6"].
I will do this many times, so I need the most efficient way.
There is a similar scenario in JS too; if there is a special way to do it there I'd like to know, otherwise a manual check is fine.
Remove the leading and trailing &, then explode by double &&.
$array = explode('&&',trim($str,'&'));
print_r($array);
Array
(
[0] => 1
[1] => 3
[2] => 5
[3] => 6
)
One quick fix is to use a regex, but only if you are 100% sure about the data you are working with (this pattern matches one digit at a time, so a multi-digit number would be split into separate digits):
preg_match_all("/[0-9]/", "&1&&3&&5&&6&", $numbers);
var_dump($numbers);
array(1) {
[0]=>
array(4) {
[0]=>
string(1) "1"
[1]=>
string(1) "3"
[2]=>
string(1) "5"
[3]=>
string(1) "6"
}
}
Another way would be to use array_filter(), if the data between the '&' characters is not suited to matching by regex:
array_filter(explode("&", "&1&&3&&5&&6&"))
You can use the trim() function to remove a special character (or whitespace) from both ends of the string.
$str = "&1&&3&&5&&6&";
$str_clear = trim($str, '&');
$array = explode('&&',$str_clear);
print_r($array);
I don't know how you extract the numbers from this string, but if you do it like this you get an array of the numbers:
preg_match_all('/&([0-9])&/','&1&&3&&5&',$matches);
var_dump($matches);
In JavaScript you can do something like what @AlexAndrei has done in PHP:
var str='&1&&3&&5&';
var result=str.substr(0,str.length-1).substr(1).split('&&');
console.log(result);
I think this is what you're trying to do.
$str = "&1&&3&&5&&6&";
$numArray = preg_split('/&/', $str, -1, PREG_SPLIT_NO_EMPTY);
print_r($numArray);
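The answers in this thread agree on the question's sample but can diverge on other data. The sketch below runs them against a made-up string containing a zero and a two-digit number; that input is an assumption chosen to expose the edge cases, not data from the question.

```php
<?php
// Hypothetical input: includes "0" and a multi-digit number.
$str = '&0&&10&&5&';

// trim + explode on the double delimiter.
$a = explode('&&', trim($str, '&'));
// '0', '10', '5'

// preg_split on a single & with empty pieces discarded.
$b = preg_split('/&/', $str, -1, PREG_SPLIT_NO_EMPTY);
// '0', '10', '5'

// array_filter() without a callback also removes the string "0".
$c = array_values(array_filter(explode('&', $str)));
// '10', '5'

// /[0-9]/ matches one digit per match, so 10 is split in two.
preg_match_all('/[0-9]/', $str, $d);
// $d[0] is '0', '1', '0', '5'

// Adding + keeps multi-digit numbers together.
preg_match_all('/[0-9]+/', $str, $e);
// $e[0] is '0', '10', '5'

var_export([$a, $b, $c, $d[0], $e[0]]);
```

On this input only the trim/explode, preg_split and /[0-9]+/ variants return all three numbers intact.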

Why is preg_match_all returning two matches?

I am trying to identify whether a string has any words between double quotes using preg_match_all(). However, it's duplicating results, and the first result has two sets of double quotes on either side, whereas the string being searched only has one set.
Here is my code:
$str = 'Test start. "Test match this". Test end.';
$groups = array();
preg_match_all('/"([^"]+)"/', $str, $groups);
var_dump($groups);
And the var dump produces:
array(2) {
[0]=>
array(1) {
[0]=>
string(17) ""Test match this""
}
[1]=>
array(1) {
[0]=>
string(15) "Test match this"
}
}
As you can see the first array is wrong, why is preg_match_all returning this?
It returns 2 elements because:
Element 0 holds the whole matched string (quotes included).
Elements 1..N hold the text matched by each capture group.
PS: another way of expressing the same could be
(?<=")[^"]+(?=")
which would capture exactly the same but in that case you don't need additional capturing group.
Demo: http://regex101.com/r/lF3kP7/1
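One caveat about the lookaround variant, sketched below with a made-up string that contains two quoted phrases: because the quotes are never consumed, a closing quote can serve as the "opening" quote of the next match, so the text between two quoted phrases may be reported as well. With the single quoted phrase from the question both patterns give the same text.

```php
<?php
// Hypothetical input with two quoted phrases.
$str = 'a "b" c "d"';

// Capture-group version: the quotes are consumed by each match.
preg_match_all('/"([^"]+)"/', $str, $groups);
// $groups[0] is '"b"', '"d"'   (full matches)
// $groups[1] is 'b', 'd'       (capture group 1)

// Lookaround version: the quotes are only asserted, not consumed.
preg_match_all('/(?<=")[^"]+(?=")/', $str, $look);
// $look[0] is 'b', ' c ', 'd'  (the gap between the phrases matches too)

var_export([$groups, $look]);
```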
Hi, if you use print_r() instead of var_dump() you will see the difference more clearly.
Array
(
[0] => Array
(
[0] => "Test match this"
)
[1] => Array
(
[0] => Test match this
)
)
The first contains the whole matched string and the second is your capture group.
Remove the parentheses.
You can write the pattern as '/"[^"]+"/'
This is because you're using a capture group. Take the parentheses out of your pattern and you'll get one array back. Something like:
preg_match_all('/\"[^"]+\"/', $str, $groups);

How preg_match_all() processes strings?

I'm still learning a lot about PHP and string alteration is something that is of interest to me. I've used preg_match before for things like validating an email address or just searching for inquiries.
I just came from this post What's wrong in my regular expression? and was curious as to why the preg_match_all function produces 2 strings, 1 w/ some of the characters stripped and then the other w/ the desired output.
From what I understand, the function goes over the string character by character, using the RegEx to evaluate what to do with it. Could this RegEx have been structured in such a way as to bypass the first array entry and just produce the desired result?
So that you don't have to go to the other thread, here is the code:
$str = 'text^name1^Jony~text^secondname1^Smith~text^email1^example-
free#wpdevelop.com~';
preg_match_all('/\^([^^]*?)\~/', $str, $newStr);
for($i=0;$i<count($newStr[0]);$i++)
{
echo $newStr[0][$i].'<br>';
}
echo '<br><br><br>';
for($i=0;$i<count($newStr[1]);$i++)
{
echo $newStr[1][$i].'<br>';
}
This will output
^Jony~^Smith~^example-free#wpdevelop.com~JonySmithexample-free#wpdevelop.com
I'm curious whether the reason for the 2 array entries is the original syntax of the string or whether it is the normal behavior of the function. Sorry if this shouldn't be here, but I'm really curious as to how this works.
thanks,
Brodie
It's standard behavior for preg_match and preg_match_all - the first string in the "matched values" array is the FULL string that was caught by the regex pattern. The subsequent array values are the 'capture groups', whose existence depends on the placement/position of () pairs in the regex pattern.
In your regex's case, /\^([^^]*?)\~/, the full matching string would be
^     Jony      ~
|       |       |
^   ([^^]*?)    ~    -> $newStr[0] = ^Jony~
                     -> $newStr[1] = Jony (due to the `()` capture group).
Could this RegEx have been structured in such a way as to bypass the first array entry and just produce the desired result?
Absolutely. Use assertions. This regex:
preg_match_all('/(?<=\^)[^^]*?(?=~)/', $str, $newStr);
Results in:
Array
(
[0] => Array
(
[0] => Jony
[1] => Smith
[2] => example-free#wpdevelop.com
)
)
As the manual states, this is the expected result (for the default PREG_PATTERN_ORDER flag). The first entry of $newStr contains all full pattern matches, the next result all matches for the first subpattern (in parentheses) and so on.
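To illustrate the flag mentioned above, this sketch contrasts the default PREG_PATTERN_ORDER with PREG_SET_ORDER, which groups the results per match instead of per subpattern. The input is a shortened, single-line version of the question's string, trimmed for this example.

```php
<?php
// Shortened version of the question's data, kept on one line.
$str = 'text^name1^Jony~text^secondname1^Smith~';
$pattern = '/\^([^^]*?)\~/';

// Default ordering: one array per subpattern.
preg_match_all($pattern, $str, $byPattern);
// $byPattern[0] is '^Jony~', '^Smith~'   (full matches)
// $byPattern[1] is 'Jony', 'Smith'       (capture group 1)

// Set ordering: one array per match.
preg_match_all($pattern, $str, $bySet, PREG_SET_ORDER);
// $bySet[0] is '^Jony~', 'Jony'
// $bySet[1] is '^Smith~', 'Smith'

// Either way, the captured values alone are easy to pull out.
$names = array_column($bySet, 1);

var_export([$byPattern, $bySet, $names]);
```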
The first array in the result of preg_match_all returns the strings that match the whole pattern you passed to the preg_match_all() function, in your case /\^([^^]*?)\~/. Subsequent arrays in the result contain the matches for the parentheses in your pattern. Maybe it is easier to understand with an example:
$string = 'abcdefg';
preg_match_all('/ab(cd)e(fg)/', $string, $matches);
The $matches array will be
array(3) {
[0]=>
array(1) {
[0]=>
string(7) "abcdefg"
}
[1]=>
array(1) {
[0]=>
string(2) "cd"
}
[2]=>
array(1) {
[0]=>
string(2) "fg"
}
}
The first array will contain the match of the entire pattern, in this case 'abcdefg'. The second array will contain the match for the first set of parentheses, in this case 'cd'. The third array will contain the match for the second set of parentheses, in this case 'fg'.
[0] contains entire match, while [1] only a portion (the part you want to extract)...
You can do var_dump($newStr) to see the array structure, you'll figure it out.
$str = 'text^name1^Jony~text^secondname1^Smith~text^email1^example-
free#wpdevelop.com~';
preg_match_all('/\^([^^]*?)\~/', $str, $newStr);
$newStr = $newStr[1];
foreach($newStr as $key => $value)
{
echo $value."\n";
}
This will result in... (weird result, haven't modified expression)
Jony
Smith
example-
free#wpdevelop.com
Whenever you have trouble picturing what preg_match_all() does, you can use an evaluator such as the preg_match_all tester at regextester.net.
It shows you the result in real time, and you can configure things like the result order, modifiers, offset capturing and more.
