Parse string with regex and get desired output - php

I want to parse this string
[[delay-4]]Welcome! [[delay-2]]Do you have some questions for us?[[delay-1]] Please fill input field!
I need to get something like this:
[
[0] => '[[delay-4]]Welcome!',
[1] => '[[delay-2]]Do you have some questions for us?',
[2] => '[[delay-1]] Please fill input field!'
];
String can also be something like this (without [[delay-4]] on beginning):
Welcome! [[delay-2]]Do you have some questions for us?[[delay-1]] Please fill input field!
Expected output should be something like this:
[
[0] => 'Welcome!',
[1] => '[[delay-2]]Do you have some questions for us?',
[2] => '[[delay-1]] Please fill input field!'
];
I tried with this regex (https://regex101.com/r/Eqztl1/1/)
(?:\[\[delay-\d+]])?([\w \\,?!.##$%^&*()|`\]~\-='\"{}]+)
But this regex fails if someone writes a single [ in the text, and if I add [ to the character class I get wrong results.
Can anyone help me with this?

Two simpler actions might be the route to get the result:
$result = preg_replace('/\s*(\[\[delay-\d+]])/i', "\n$1", $subject);
$result = preg_split('/\r?\n/i', $result, -1, PREG_SPLIT_NO_EMPTY);
Can be seen running here:
https://ideone.com/Z5tZI3
and here:
https://ideone.com/vnSNYI
This assumes that newline characters don't have special meaning and are OK to split on.
UPDATE: As noted in the comments below it's possible with a single split.
$result = preg_split('/(?=\[\[delay-\d+]])/i', $subject, -1, PREG_SPLIT_NO_EMPTY);
But zero-length matches can cause issues in regular expressions, so you would have to do your own research on that.
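A minimal sketch of the single-split variant, run against both inputs from the question plus one containing a lone [. The helper name splitOnDelay and the rtrim step (which drops the space left in front of each tag) are additions for illustration, not part of the answer above.

```php
<?php
// Split in front of every [[delay-N]] tag using a zero-width lookahead.
function splitOnDelay(string $subject): array
{
    $parts = preg_split('/(?=\[\[delay-\d+]])/', $subject, -1, PREG_SPLIT_NO_EMPTY);
    // Remove the whitespace that preceded each tag; leading spaces are kept.
    return array_map('rtrim', $parts);
}

print_r(splitOnDelay('[[delay-4]]Welcome! [[delay-2]]Do you have some questions for us?[[delay-1]] Please fill input field!'));
print_r(splitOnDelay('Welcome! [[delay-2]]Do you have some questions for us?[[delay-1]] Please fill input field!'));
print_r(splitOnDelay('A single [ bracket [[delay-3]]is fine'));
```

Because the split only happens where a complete [[delay-N]] tag follows, a stray [ in the text stays inside its chunk.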

In your pattern (?:\[\[delay-\d+]])?([\w \\,?!.##$%^&*()|`\]~\-='\"{}]+)
there is no opening [ in the character class. The problem is that if you add it, you get, as you say, wrong results.
That is because, after matching the delay part, the character class in the next part, which now contains the [, can match the rest of the characters, including those of the following delay parts.
What you could do is add [ and make the match non-greedy, in combination with a positive lookahead that asserts either the next delay part or the end of the string, so that the last instance is matched as well.
If you are not using the capturing group and only want the result you can omit it.
(?:\[\[delay-\d+]])?[\w \\,?!.##$%^&*()|`[\]~\-='\"{}]+?(?=\[\[delay-\d+]]|$)
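As a rough sketch of the same idea (optional tag, lazy match, lookahead for the next tag or the end of the string), the version below swaps the long character class for .+? with the s flag, which is a simplification rather than the exact pattern above. The function name matchDelayChunks and the rtrim step are assumptions made for the example.

```php
<?php
// Optional [[delay-N]] prefix, then as few characters as possible,
// stopping right before the next tag or at the end of the string.
function matchDelayChunks(string $subject): array
{
    preg_match_all('/(?:\[\[delay-\d+]])?.+?(?=\[\[delay-\d+]]|$)/s', $subject, $m);
    // $m[0] holds the full matches; trim the space left before each tag.
    return array_map('rtrim', $m[0]);
}

print_r(matchDelayChunks('Welcome! [[delay-2]]Do you have some questions for us?[[delay-1]] Please fill input field!'));
print_r(matchDelayChunks('Type [ here [[delay-1]]ok'));
```

Using .+? accepts any character, so it is only appropriate if you do not need the character class to restrict what a chunk may contain.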

You can do that without regex too.
Explode on [[ and loop over the array. If an item starts with "delay", prepend the [[ again.
$str = '[[delay-4]]Welcome! [[delay-2]]Do you have some questions for us?[[delay-1]] Please fill input field!';
$arr = array_filter(explode("[[", $str));
foreach($arr as &$val){
if(substr($val,0,5) == "delay") $val = "[[" . $val;
}
var_dump($arr);
https://3v4l.org/sIui1

Related

preg_match first string after string in parentheses

I've looked around for solutions close to this but have not been successful in finding a solution. I'm looking to clean up some legacy code via php_codesniffer but the fixer doesn't fix comments or arrays that go past 80 cols just lets you know about them. I have a solution that works for the comments but I am getting stuck on the regex for the arrays.
A sample line I would like to fix is:
$line = "drupal_add_js(array('my_common' => array('my_code_validate' => variable_get('my_code_validate', FALSE), 'inner_index2 => 'inner_value2'), 'another_item' => 'another_value'), 'setting');";
$solution = preg_match('/array.*(\(.*?\))/', $line);
echo $solution;
I'd like
$solution = "'my_common' => array('my_code_validate' => variable_get('my_code_validate', FALSE), 'inner_index2 => 'inner_value2'), 'another_item' => 'another_value'";
but I am getting 1 instead. Notice that there is another array in there which is fairly common. I only want to capture the first array's values, and then I can split them up on separate lines from there. Ultimately I'd like to share my solutions to the php codesniffer project so bonus points for showing how to code a new fixer for squizlabs.
You may use
if (preg_match('~array(\(((?:[^()]++|(?1))*)\))~', $line, $matches)) {
echo $matches[2];
}
Details
array - a literal substring
(\(((?:[^()]++|(?1))*)\)) - Group 1:
    \( - a ( char
    ((?:[^()]++|(?1))*) - Group 2 (the required value):
        (?:[^()]++|(?1))* - zero or more repetitions of 1+ chars other than ( and ), or the whole Group 1 pattern recursed
    \) - a ) char
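To see the recursion at work, here is a small sketch on a shortened, made-up input (the $line value below is hypothetical; its nested calls only mimic the shape of the sample line in the question):

```php
<?php
// Hypothetical, shortened input with nested parentheses.
$line = "foo(array('a' => array('b' => bar('c', 1)), 'd' => 'e'), 'x');";

$inner = null;
// Group 1 matches one balanced (...) pair; (?1) recurses into Group 1
// whenever a nested opening parenthesis is met.
if (preg_match('~array(\(((?:[^()]++|(?1))*)\))~', $line, $matches)) {
    $inner = $matches[2]; // contents of the first array(...) only
}
echo $inner, PHP_EOL;
```

Note that the pattern counts every parenthesis, including ones inside quoted strings, so an unbalanced ( or ) in a string literal would throw the match off.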
Try this solution:
.*?array\(('.*?)\), [^\)]+'\);.*
Replace with:
$1
Demo: https://regex101.com/r/oV4nvT/4/

PHP Regular Expression to Match Function Name and Parameters with string like Needle(needle|needle)

I am filtering database results with a query string that looks like this:
attribute=operator(value|optional value)
I'll use
$_GET['attribute'];
to get the value.
I believe the right approach is using regex to get matches on the rest.
The preferred output would be
print_r($matches);
array(
1 => operator
2 => value
3 => optional value
)
The operator will always be one word and consist of letters: like(), between(), in().
The values can be many different things including letters, numbers, spaces, commas, quotation marks, etc...
I was asked where my code was failing and didn't include much code because of how poorly it worked. Based on the accepted answer, I was able to whip up a regex that almost works.
EDIT 1
$pattern = "^([^\|(]+)\(([^\|()]+)(\|*)([^\|()]*)";
Edit 2
$pattern = "^([^\|(]+)\(([^\|()]+)(\|*)([^\|()]*)"; // I thought this would work.
Edit 3
$pattern = "^([^\|(]+)\(([^\|()]+)(\|+)?([^\|()]+)?"; // this does work!
Edit 4
$pattern = "^([^\|(]+)\(([^\|()]+)(?:\|)?([^\|()]+)?"; // this gets rid of the middle matching group.
The only remaining problem is when the 2nd optional parameter does not exist, there is still an empty $matches array.
This script, with the input "operator(value|optional value)", returns the array you expect:
<?php
$attribute = $_GET['attribute'];
$result = preg_match("/^([\w ]+)\(([\w ]+)\|([\w ]*)\)$/", $attribute, $matches);
print($matches[1] . "\n");
print($matches[2] . "\n");
print($matches[3] . "\n");
?>
This assumes your "values" match [\w ] regexp (all word characters plus space), and that the | you specify is a literal |...
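If the second value really is optional, one possible sketch is to wrap the pipe and the second value together in an optional non-capturing group. The function name parseFilter and the returned array shape are assumptions for illustration; the value character class [^|()] is borrowed from the question's own attempts.

```php
<?php
// Parse "operator(value)" or "operator(value|optional value)".
// Returns null when the expression does not have that shape.
function parseFilter(string $expr): ?array
{
    if (!preg_match('/^([a-z]+)\(([^|()]+)(?:\|([^|()]+))?\)$/i', $expr, $m)) {
        return null;
    }
    $values = [$m[2]];
    // Group 3 is absent from $m when the optional part did not match.
    if (isset($m[3])) {
        $values[] = $m[3];
    }
    return ['operator' => $m[1], 'values' => $values];
}

print_r(parseFilter('between(1|10)'));
print_r(parseFilter('like(foo bar)'));
```

preg_match drops trailing groups that did not participate in the match, which is why isset() is enough to tell the one-value and two-value cases apart.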

quick pattern explain

I want this string:
value="1,'goahead'" your='56' so='"<br />"'
I want php regex to return result array as following :
value="1,'goahead'"
your='56'
so='"<br />"'
I tried this regex :
preg_match_all("#([\d\w_]+)\s*=\s*(\"|')([^'\"]*)(\"|')*#isx")
but it failed to fetch this value: value="1,'goahead'"
I think that it's because of single quotation inside the value. Please help me with improved pattern.
I'd suggest looking at DOMDocument:
If your input is a complete tag...
<p value="1,'goahead'" your='56' so='"<br />"'>
...then you can do this:
$DOM = new DOMDocument;
$DOM->loadHTML($str);
foreach ($DOM->getElementsByTagName('p')->item(0)->attributes as $attr) {
$attributes[$attr->nodeName] = $attr->nodeValue;
}
This gives you the array you're looking for:
Array
(
[value] => 1,'goahead'
[your] => 56
[so] => "<br />"
)
Working example: http://3v4l.org/TIIZ2
You would be better off with this regex:
/(\w+)\s*=\s*(["'])(.*?)\2/
This will give the attribute name in the first subpattern, the type of quote used in the second, and the attribute value in the third.
Of particular importance are the .*?, which matches lazily (i.e. as little as possible), and the \2, which matches the second subpattern (in this case, the quote used). This does not allow for escaping with \" or \', though. That'd be a bit more involved.
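A short sketch of that pattern applied to the string from the question. The nowdoc is only there to avoid escaping the mixed quotes, and the $attributes array is an illustrative way of collecting the results.

```php
<?php
// The input from the question, kept verbatim in a nowdoc.
$str = <<<'TXT'
value="1,'goahead'" your='56' so='"<br />"'
TXT;

$attributes = [];
// \2 forces the closing quote to be the same character as the opening one.
preg_match_all('/(\w+)\s*=\s*(["\'])(.*?)\2/', $str, $matches, PREG_SET_ORDER);
foreach ($matches as $m) {
    $attributes[$m[1]] = $m[3]; // name => value without the outer quotes
}
print_r($attributes);
```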
I'm afraid to ask how you ended up needing this and why; anyway, this might help you:
if (preg_match('%(value="\d+,(\s+)?\'[a-z]+\'"(\s+)?)?(your=\'\d+\'(\s+)?)?(so=\'"<br(\s+)?\/>"\')?%six', $subject, $matches)) { }

Removing parentheses from a string

I'd like to remove all parentheses from a set of strings running through a loop. The best way that I've seen this done is with the use of preg_replace(). However, I am having a hard time understanding the pattern parameter.
The following is the loop
$coords= explode (')(', $this->input->post('hide'));
foreach ($coords as $row)
{
$row = trim(preg_replace('/\*\([^)]*\)/', '', $row));
$row = explode(',',$row);
$lat = $row[0];
$lng = $row[1];
}
And this is the value of 'hide'.
(1.4956873362063747, 103.875732421875)(1.4862491569669245, 103.85856628417969)(1.4773257504016037, 103.87968063354492)
That pattern is wrong as far as I know. I got it from another thread; I tried to read about patterns but couldn't get it. I am rather short on time, so I posted this here while also searching for other ways in other parts of the net. Can someone please supply me with the correct pattern for what I am trying to do? Or is there an easier way of doing this?
EDIT: Ah, just got how preg_replace() works. Apparently I misunderstood how it worked, thanks for the info.
I see you actually want to extract all the coordinates
If so, better use preg_match_all:
$ php -r '
preg_match_all("~\(([\d\.]+), ?([\d\.]+)\)~", "(654,654)(654.321, 654.12)", $matches, PREG_SET_ORDER);
print_r($matches);
'
Array
(
[0] => Array
(
[0] => (654,654)
[1] => 654
[2] => 654
)
[1] => Array
(
[0] => (654.321, 654.12)
[1] => 654.321
[2] => 654.12
)
)
I don't understand entirely why you would need preg_replace. explode() removes the delimiters, so all you have to do is remove the opening and closing parentheses on the first and last string respectively. You can use substr() for that.
Get first and last elements of array:
$first = reset($array);
$last = end($array);
Hope that helps.
"And this is the value of $coords."
If $coords is a string, your foreach makes no sense. If that string is your input, then:
$coords= explode (')(', $this->input->post('hide'));
This line removes the inner parentheses from your string, so your $coords array will be:
(1.4956873362063747, 103.875732421875
1.4862491569669245, 103.85856628417969
1.4773257504016037, 103.87968063354492)
The pattern parameter accepts a regular expression. The function returns a new string where all parts of the original that match the regex are replaced by the second argument, i.e. the replacement.
How about just using preg_replace on the original string?
preg_replace('#[()]#',"",$this->input->post('hide'))
To dissect your current regex, you are matching:
an asterisk character,
followed by an opening parenthesis,
followed by zero or more instances of
any character but a closing parenthesis
followed by a closing parenthesis
Of course, this will never match, since exploding the string removed the closing and opening parentheses from the chunks.
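Putting the explode-based suggestions above together, a regex-free sketch could look like this. The $hide variable stands in for $this->input->post('hide') and is shortened to two pairs; the lat/lng array keys are made up for the example.

```php
<?php
// Stand-in for $this->input->post('hide'), shortened to two pairs.
$hide = '(1.4956873362063747, 103.875732421875)(1.4862491569669245, 103.85856628417969)';

$coords = [];
// Strip the outermost parentheses first, then split on the ")(" between pairs.
foreach (explode(')(', trim($hide, '()')) as $pair) {
    [$lat, $lng] = array_map('trim', explode(',', $pair));
    $coords[] = ['lat' => $lat, 'lng' => $lng];
}
print_r($coords);
```

The values are kept as strings here so no precision is lost; cast them to float where you actually need numbers.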

How to write regex to return only certain parts of this string?

So I'm working on a project that will allow users to enter poker hand histories from sites like PokerStars and then display the hand to them.
It seems that regex would be a great tool for this, however I rank my regex knowledge at "slim to none".
So I'm using PHP and looping through this block of text line by line and on lines like this:
Seat 1: fabulous29 (835 in chips)
Seat 2: Nioreh_21 (6465 in chips)
Seat 3: Big Loads (3465 in chips)
Seat 4: Sauchie (2060 in chips)
I want to extract seat number, name, & chip count so the format is
Seat [number]: [letters&numbers&characters] ([number] in chips)
I have NO IDEA where to start or what commands I should even be using to optimize this.
Any advice is greatly appreciated - even if it is just a link to a tutorial on PHP regex or the name of the command(s) I should be using.
I'm not entirely sure what exactly to use for that without trying it, but a great tool I use all the time to validate my RegEx is RegExr which gives a great flash interface for trying out your regex, including real time matching and a library of predefined snippets to use. Definitely a great time saver :)
Something like this might do the trick:
/Seat (\d+): ([^\(]+) \((\d+) in chips\)/
And some basic explanation on how Regex works:
\d = digit.
\<character> = escapes the character, if not part of any character class or subexpression. For example, \t would render a tab, while \\t would render "\t" (since the backslash is escaped).
+ = one or more of the preceding element.
* = zero or more of the preceding element.
[ ] = bracket expression. Matches any of the characters within the bracket. Also works with ranges (ex. A-Z).
[^ ] = Matches any character that is NOT within the bracket.
( ) = Marked subexpression. The data matched within this can be recalled later.
Anyway, I chose to use
([^\(]+)
since the example provides a name containing spaces (Seat 3 in the example). What this does is match any character up to the point where it encounters an opening parenthesis.
Used on its own, this leaves you with a blank space at the end of the subexpression (using the data provided in the example); the literal space before \( in the full pattern above avoids that. Either way, trailing whitespace can easily be stripped away using the trim() function in PHP.
If you do not want to match spaces, only alphanumerical characters, you could do something like this:
([A-Za-z0-9-_]+)
Which would match any letter (within A-Z, both upper- & lower-case), number as well as hyphens and underscores.
Or the same variant, with spaces:
([A-Za-z0-9-_\s]+)
Where "\s" matches any whitespace character, including a space.
Hope this helps :)
Look at the PCRE section in the PHP Manual. Also, http://www.regular-expressions.info/ is a great site for learning regex. Disclaimer: Regex is very addictive once you learn it.
I always use the preg_ family of functions for regex in PHP because the Perl-compatible expressions have much more capability. That extra capability doesn't necessarily come into play here, but they are also supposed to be faster, so why not use them anyway, right?
For an expression, try this:
/Seat (\d+): ([^ ]+) \((\d+)/
You can use preg_match() on each line, storing the results in an array. You can then get at those results and manipulate them as you like.
EDIT:
Btw, you could also run preg_match_all on the entire block of text (instead of looping through line-by-line) and get the results that way, too.
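As a rough sketch of the preg_match_all approach on the whole block, assuming the lines are separated by "\n": the pattern below uses a lazy (.+?) for the name so that "Big Loads" is accepted, which is a variation on the expression above rather than the same one. The $text and $seats names are made up for the example.

```php
<?php
// The four sample lines from the question, joined into one block of text.
$text = implode("\n", [
    'Seat 1: fabulous29 (835 in chips)',
    'Seat 2: Nioreh_21 (6465 in chips)',
    'Seat 3: Big Loads (3465 in chips)',
    'Seat 4: Sauchie (2060 in chips)',
]);

// m flag: ^ and $ match at line boundaries; (.+?) is lazy, so names may contain spaces.
preg_match_all('/^Seat (\d+): (.+?) \((\d+) in chips\)$/m', $text, $matches, PREG_SET_ORDER);

$seats = [];
foreach ($matches as $m) {
    $seats[] = ['seat' => (int) $m[1], 'name' => $m[2], 'chips' => (int) $m[3]];
}
print_r($seats);
```

PREG_SET_ORDER groups the captures per line, which tends to be easier to loop over than the default per-group layout.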
Check out preg_match.
Probably looking for something like...
<?php
$str = 'Seat 1: fabulous29 (835 in chips)';
preg_match('/Seat (?<seatNo>\d+): (?<name>\w+) \((?<chipCnt>\d+) in chips\)/', $str, $matches);
print_r($matches);
?>
*It's been a while since I did php, so this could be a little or a lot off.*
Maybe this is a very late answer, but I am interested in answering:
Seat\s(\d):\s([\w\s]+)\s\((\d+).*\)
http://regex101.com/r/cU7yD7/1
Here's what I'm currently using:
preg_match("/(Seat \d+: [A-Za-z0-9 _-]+) \((\d+) in chips\)/",$line)
To process the whole input string at once, use preg_match_all()
preg_match_all('/Seat (\d+): \w+ \((\d+) in chips\)/', $input, $matches);
For your input string, var_dump of $matches will look like this (Seat 3 is missing because \w+ does not match the space in "Big Loads"):
array
0 =>
array
0 => string 'Seat 1: fabulous29 (835 in chips)' (length=33)
1 => string 'Seat 2: Nioreh_21 (6465 in chips)' (length=33)
2 => string 'Seat 4: Sauchie (2060 in chips)' (length=31)
1 =>
array
0 => string '1' (length=1)
1 => string '2' (length=1)
2 => string '4' (length=1)
2 =>
array
0 => string '835' (length=3)
1 => string '6465' (length=4)
2 => string '2060' (length=4)
On learning regex: Get Mastering Regular Expressions, 3rd Edition. Nothing else comes close to this book if you really want to learn regex. Despite being the definitive guide to regex, the book is very beginner friendly.
Try this code. It works for me.
Let's say that you have the lines of strings below:
$string1 = "Seat 1: fabulous29 (835 in chips)";
$string2 = "Seat 2: Nioreh_21 (6465 in chips)";
$string3 = "Seat 3: Big Loads (3465 in chips)";
$string4 = "Seat 4: Sauchie (2060 in chips)";
Add to array
$lines = array($string1,$string2,$string3,$string4);
foreach($lines as $line )
{
$seatArray = explode(":", $line);
$seat = explode(" ",$seatArray[0]);
$seatNumber = $seat[1];
$usernameArray = explode("(",$seatArray[1]);
$username = trim($usernameArray[0]);
$chipArray = explode(" ",$usernameArray[1]);
$chipNumber = $chipArray[0];
echo "<br>"."Seat [".$seatNumber."]: [". $username."] ([".$chipNumber."] in chips)";
}
You'll have to split the file by line breaks,
then loop through each line and apply the following logic:
// Indexes of the capture groups ($matches[0] holds the whole match)
$seat = 1;
$name = 2;
$chips = 3;
// $file is assumed to be an array of lines
foreach ($file as $string) {
    if (preg_match('/Seat ([0-9]+): ([A-Za-z_0-9]*) \(([0-9]*) in chips\)/', $string, $matches)) {
        echo "Seat: " . $matches[$seat] . "<br>";
        echo "Name: " . $matches[$name] . "<br>";
        echo "Chips: " . $matches[$chips] . "<br>";
    }
}
I haven't run this code, so you may have to fix some errors...
Seat [number]: [letters&numbers&characters] ([number] in chips)
Your Regex should look something like this
Seat (\d+): ([a-zA-Z0-9]+) \((\d+) in chips\)
The brackets will let you capture the seat number, name and number of chips in groups.
