PHP regex to interpret a string as command-line attributes/options

Let's say I have this string:
"Insert Post -title Some PostTitle -category 2 -date-posted 2013-02:02 10:10:10"
I'm trying to convert this string into actions. The string is very readable, and the goal is to make posting a little easier instead of navigating to new pages every time. I'm fine with how the actions are going to work, but I've had many failed attempts at processing the string the way I want. I simply want the values after the attributes (options) to be put into an array, or just extracted so that I can deal with them myself.
The string above should give me an array of key => value pairs, e.g.:
$Processed = [
'title'=> 'Some PostTitle',
'category'=> '2',
....
];
Getting processed data like this is what I'm looking for.
I've been trying to write a regex for this, but with no luck.
For example, this:
/\-(\w*)\=?(.+)?/
That should be close enough to what I want.
Note the spaces in the title and date values, and that some values can contain dashes as well. Maybe I could also add a list of allowed attributes:
$AllowedOptions = ['-title','-category',...];
I'm just not good at this and would appreciate your help!

You can use this lookahead based regex to match your name-value pairs:
/-(\S+)\h+(.*?(?=\h+-|$))/
RegEx Breakup:
- # match a literal hyphen
(\S+) # match 1 or more of any non-whitespace char and capture it as group #1
\h+ # match 1 or more of any horizontal whitespace char
( # capture group #2 start
.*? # match 0 or more of any char (non-greedy)
(?=\h+-|$) # lookahead to assert that what follows is 1+ horizontal whitespace and a hyphen, or the end of the string
) # capture group #2 end
PHP Code:
$str = 'Insert Post -title Some PostTitle -category 2 -date-posted 2013-02:02 10:10:10';
if (preg_match_all('/-(\S+)\h+(.*?(?=\h+-|$))/', $str, $m)) {
$output = array_combine ( $m[1], $m[2] );
print_r($output);
}
Output:
Array
(
[title] => Some PostTitle
[category] => 2
[date-posted] => 2013-02:02 10:10:10
)
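If values may themselves contain something like " -word", one option is to build the list of allowed option names into the pattern, so that only a known option can end a value. This is a minimal sketch, not part of the answer above: parse_options and its parameters are hypothetical names, and it assumes the allowed names are passed without the leading hyphen.

```php
<?php
// Hypothetical helper: parse "-name value" pairs, recognising only the names in $allowed.
function parse_options(string $command, array $allowed): array
{
    // Escape each name so it is safe to embed in the pattern.
    $quoted = array_map(function ($name) {
        return preg_quote($name, '/');
    }, $allowed);
    $names = implode('|', $quoted);

    // A value ends only where whitespace is followed by "-<allowed name> ",
    // or at the end of the string.
    $pattern = '/(?<!\S)-(' . $names . ')\h+(.*?)(?=\h+-(?:' . $names . ')\h|$)/';

    if (!preg_match_all($pattern, $command, $m)) {
        return [];
    }
    return array_combine($m[1], $m[2]);
}

$str = 'Insert Post -title Some -cool- Post -category 2 -date-posted 2013-02:02 10:10:10';
$options = parse_options($str, ['title', 'category', 'date-posted']);
print_r($options);
```

Here "-cool-" stays inside the title value because "cool" is not in the allowed list.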

Related

Match regex pattern that isn't within a bbcode tag

I am attempting to create a regex pattern that will match words in a string that begin with #.
A regex that solves this initial problem is '~(#\w+)~'.
A second requirement is that it must also ignore any matches that occur between [quote] and [/quote] tags.
A couple of attempts that have failed are:
(?:[0-9]+|~(#\w+)~)(?![0-9a-z]*\[\/[a-z]+\])
/[quote[\s\]][\s\S]*?\/quote](*SKIP)(*F)|~(#\w+)~/i
Example: the following string should have an array output as displayed:
$results = [];
$string = "#friends #john [quote]#and #jane[/quote] #doe";
//run regex match
preg_match_all('regex', $string, $results);
//dump results
var_dump($results[1]);
//results: array consisting of:
[1]=>"#friends"
[2]=>"#john"
[3]=>"#doe"
You may use the following regex (based on another related question):
'~(\[quote](?:(?1)|.)*?\[/quote])(*SKIP)(*F)|#\w+~s'
The regex accounts for nested [quote] tags.
Details
(\[quote](?:(?1)|.)*?\[/quote])(*SKIP)(*F) - matches the pattern inside the capturing parentheses, and then (*SKIP)(*F) makes the regex engine discard the matched text and continue searching after it:
\[quote] - a literal [quote] string
(?:(?1)|.)*? - any 0+ (but as few as possible) occurrences of the whole Group 1 pattern ((?1)) or any char (.)
\[/quote] - a literal [/quote] string
| - or
#\w+ - a # followed with 1+ word chars.
PHP demo:
$results = [];
$string = "#friends #john [quote]#and #jane[/quote] #doe";
$rx = '~(\[quote\](?:(?1)|.)*?\[/quote])(*SKIP)(*F)|#\w+~s';
preg_match_all($rx, $string, $results);
print_r($results[0]);
// => Array ( [0] => #friends [1] => #john [2] => #doe )
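To illustrate the nesting claim, here is a small sketch that runs the same pattern on a made-up string in which one [quote] block contains another. The input string is an assumption for this example only.

```php
<?php
$rx = '~(\[quote](?:(?1)|.)*?\[/quote])(*SKIP)(*F)|#\w+~s';
// The outer [quote] block contains a second, nested [quote] block.
$nested = '#a [quote]#b [quote]#c[/quote] #d[/quote] #e';
preg_match_all($rx, $nested, $results);
// Only the tags outside the outermost [quote]...[/quote] pair are returned.
print_r($results[0]);
```

The (?1) recursion consumes the inner block as a unit, so #d, which sits after the inner [/quote] but still inside the outer block, is skipped as well.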

Regex match that exclude certain pattern

I want to split a string on commas (,) or ampersands (&). That part is simple, but I don't want to split on anything between parentheses.
For example, if I run it on
sleeping , waking(sit,stop)
there should be only one split, giving two elements.
Thanks in advance.
This is a perfect example for the (*SKIP)(*FAIL) mechanism PCRE (and thus PHP) offers.
You could come up with the following code:
<?php
$string = 'sleeping , waking(sit,stop)';
$regex = '~\([^)]*\)(*SKIP)(*FAIL)|[,&]~';
# match anything between ( and ) and discard it afterwards
# otherwise match one of the characters in the square brackets, i.e. , or &
$parts = preg_split($regex, $string);
print_r($parts);
/*
Array
(
[0] => sleeping
[1] => waking(sit,stop)
)
*/
?>
This splits on any , or & that is not inside parentheses.
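Note that the pieces keep the whitespace around the delimiter ("sleeping " and " waking(sit,stop)"). If that is unwanted, one possible variation is to let the delimiter swallow the surrounding whitespace. The sketch below is an assumption about what is wanted, with an extra "& eating" part added to the input to show both delimiters:

```php
<?php
$string = 'sleeping , waking(sit,stop) & eating';
// Same (*SKIP)(*FAIL) idea, but the delimiter also consumes surrounding whitespace.
$regex = '~\([^)]*\)(*SKIP)(*FAIL)|\s*[,&]\s*~';
$parts = preg_split($regex, $string);
print_r($parts);
```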

PHP: Parse comma-delimited string outside single and double quotes and parentheses

I've found several partial answers to this question, but none that cover all my needs...
I am trying to parse a user generated string as if it were a series of php function arguments to determine the number of arguments:
This string:
$arg1,$arg2='ABC,DEF',$arg3="GHI\",JKL",$arg4=array(1,'2)',"3\"),")
will be inserted as the arguments of a function:
function my_function( [insert string here] ){ ... }
I need to parse the string on the commas, taking into account single- and double-quotes, parentheses, and escaped quotes and parentheses to create an array:
array(4) {
[0] => $arg1
[1] => $arg2='ABC,DEF'
[2] => $arg3="GHI\",JKL"
[3] => $arg4=array(1,'2)',"3\"),")
}
Any help with a regular expression or parser function to accomplish this is appreciated!
It isn't possible to solve this problem with a classical CSV tool, since more than one character can protect parts of the string.
Using preg_split is possible but results in a very complicated and inefficient pattern, so the better option is preg_match_all. There are, however, several problems to solve:
a comma enclosed in quotes or parentheses must be ignored (treated as an ordinary character, not as a delimiter)
you need to extract the params, but you also need to check that the whole string is well-formed, otherwise the match results may be completely wrong!
For the first point, you can define subpatterns to describe each case: the quoted parts, the parts enclosed in parentheses, and a more general subpattern that matches a complete param and uses the two previous subpatterns when needed.
Note that the parentheses subpattern needs to refer to the general subpattern too, since it can contain anything (including commas).
The second point can be solved using the \G anchor, which ensures that all matches are contiguous. But you also need to be sure that the end of the string has been reached. To do that, you can add an optional empty capture group at the end of the main pattern that is only set if the end-of-string anchor \z succeeds.
$subject = <<<'EOD'
$arg1,$arg2='ABC,DEF',$arg3="GHI\",JKL",$arg4=array(1,'2)',"3\"),")
EOD;
$pattern = <<<'EOD'
~
# named groups definitions
(?(DEFINE) # this definition group lets you define the subpatterns you want
# without matching anything
(?<quotes>
' [^'\\]*+ (?s:\\.[^'\\]*)*+ ' | " [^"\\]*+ (?s:\\.[^"\\]*)*+ "
)
(?<brackets> \( \g<content> (?: ,+ \g<content> )*+ \) )
(?<content> [^,'"()]*+ # ' # (<-- comment for SO syntax highlighting)
(?:
(?: \g<brackets> | \g<quotes> )
[^,'"()]* # ' #
)*+
)
)
# the main pattern
(?: # two possible beginnings
\G(?!\A) , # a comma contiguous to a previous match
| # OR
\A # the start of the string
)
(?<param> \g<content> )
(?: \z (?<check>) )? # create an item "check" when the end is reached
~x
EOD;
$result = false;
if ( preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER) &&
isset(end($matches)['check']) )
$result = array_map(function ($i) { return $i['param']; }, $matches);
else
echo 'bad format' . PHP_EOL;
var_dump($result);
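You could split the argument string at ,$ and then prepend the $ back onto each array value. This assumes that every argument starts with $ and that the sequence ,$ never occurs inside a quoted value: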
$args_array = explode(',$', $arg_str);
foreach($args_array as $key => $arg_raw) {
$args_array[$key] = '$'.ltrim($arg_raw, '$');
}
print_r($args_array);
Output:
(
[0] => $arg1
[1] => $arg2='ABC,DEF'
[2] => $arg3="GHI\",JKL"
[3] => $arg4=array(1,'2)',"3\"),")
)
If you want to use a regex, you can use something like this:
(.+?)(?:,(?=\$)|$)
PHP code:
$re = '/(.+?)(?:,(?=\$)|$)/';
$str = "\$arg1,\$arg2='ABC,DEF',\$arg3=\"GHI\",JKL\",\$arg4=array(1,'2)',\"3\"),\")\n";
preg_match_all($re, $str, $matches);
Match information:
MATCH 1
1. [0-5] `$arg1`
MATCH 2
1. [6-21] `$arg2='ABC,DEF'`
MATCH 3
1. [22-39] `$arg3="GHI\",JKL"`
MATCH 4
1. [40-67] `$arg4=array(1,'2)',"3\"),")`
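Since the question also asks about a parser function, here is a minimal sketch of a character-by-character scanner as an alternative to the regex answers. split_top_level_args is a hypothetical name, and the sketch assumes balanced parentheses and that a backslash inside quotes escapes the next character. It does not validate the input.

```php
<?php
// Hypothetical helper: split on commas that are outside quotes and parentheses.
function split_top_level_args(string $input): array
{
    $parts = [];
    $current = '';
    $depth = 0;      // current parenthesis nesting level
    $quote = null;   // the open quote character, or null when outside quotes
    $length = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];

        if ($quote !== null) {
            $current .= $char;
            if ($char === '\\' && $i + 1 < $length) {
                // Copy the escaped character verbatim and skip over it.
                $current .= $input[++$i];
            } elseif ($char === $quote) {
                $quote = null;
            }
            continue;
        }

        if ($char === "'" || $char === '"') {
            $quote = $char;
        } elseif ($char === '(') {
            $depth++;
        } elseif ($char === ')') {
            $depth--;
        } elseif ($char === ',' && $depth === 0) {
            // A top-level comma ends the current argument.
            $parts[] = $current;
            $current = '';
            continue;
        }
        $current .= $char;
    }
    $parts[] = $current;
    return $parts;
}

$args = <<<'EOD'
$arg1,$arg2='ABC,DEF',$arg3="GHI\",JKL",$arg4=array(1,'2)',"3\"),")
EOD;
$parts = split_top_level_args($args);
print_r($parts);
```

count($parts) then gives the number of arguments the question asks for.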

WordPress: get the parameter of the first shortcode in the content

I am writing a script to find the first occurrence of the following shortcode in the content and then get the url parameter of the shortcode.
The shortcode looks like this:
[soundcloud url="http://api.soundcloud.com/tracks/106046968"]
What I have done so far is:
$pattern = get_shortcode_regex();
$matches = array();
preg_match("/$pattern/s", get_the_content(), $matches);
print_r($matches);
The result looks like this:
Array (
[0] => [soundcloud url="http://api.soundcloud.com/tracks/106046968"]
[1] =>
[2] => soundcloud
[3] => url="http://api.soundcloud.com/tracks/106046968"
[4] =>
[5] =>
[6] =>
)
Here is the string from which I need the url parameter of the shortcode:
$html = 'Our good homies DJ Skeet Skeet aka Yung Skeeter & Wax Motif have teamed up to do a colossal 2-track EP and we\'re getting the exclusive sneak-premiere of the EP\'s diabolical techno b-side called "Hush Hush" before its released tomorrow on Dim Mak Records!
[soundcloud url="http://api.soundcloud.com/tracks/104477594"]
Wax Motif have teamed up to do a colossal 2-track EP and we\'re getting the exclusive sneak-premiere of the EP\'s diabolical techno b-side called "Hush Hush" before its released tomorrow on Dim Mak Records!
';
I guess this is not the best way to do it. If someone can guide me on how to do this, that would be great. Basically, I want to extract the first occurrence of the soundcloud url from the content.
So here's what I came up with:
preg_match('~\[soundcloud\s+url\s*=\s*("|\')(?<url>.*?)\1\s*\]~i', $html, $m); // match against the $html string from the question
print_r($m); // print matches (groups) ...
$url = isset($m['url']) ? $m['url']:''; // if the url doesn't exist then return empty string
echo 'The url is : ' . $url; // Just some output
Let's explain the regex:
~ # set ~ as delimiter
\[soundcloud # match [soundcloud
\s+ # match a whitespace 1 or more times
url # match url
\s* # match a whitespace 0 or more times
= # match =
\s* # match a whitespace 0 or more times
("|\') # match either a double quote or a single quote and put it in group 1
(?<url>.*?) # match everything ungreedy until group 1 is found and put it in a named group "url"
\1 # match what was matched in group 1
\s* # match a whitespace 0 or more times
\] # match ]
~ # delimiter (end expression)
i # set the i modifier, which means match case-insensitive
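As a rough sketch of how the pattern explained above could be wrapped for reuse: first_soundcloud_url is a hypothetical function name, and the test content is made up from the two URLs in the question. Because preg_match() stops at the first match, only the first shortcode is returned.

```php
<?php
// Hypothetical helper: return the url attribute of the first [soundcloud] shortcode, or ''.
function first_soundcloud_url(string $content): string
{
    $pattern = '~\[soundcloud\s+url\s*=\s*("|\')(?<url>.*?)\1\s*\]~i';
    // preg_match() returns 1 on the first match, so later shortcodes are ignored.
    return preg_match($pattern, $content, $m) ? $m['url'] : '';
}

$content = 'Intro text [soundcloud url="http://api.soundcloud.com/tracks/104477594"] more text '
         . "[soundcloud url='http://api.soundcloud.com/tracks/106046968']";
$url = first_soundcloud_url($content);
echo $url, PHP_EOL;
```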

Regex problem (PHP)

[quote=Username here]quoted text here[/quote]
Reply text here
I need a regular expression that stores "Username here", "quoted text here" and "Reply text here" in an array.
The expression needs to support nesting as well, e.g.:
[quote=Username2 here][quote=Username here]quoted text here[/quote]
Reply text here[/quote]
Reply text here
This regex matches a nested quote block (in group 1) along with the trailing reply (in group 2):
(\[quote=[^]]*](?:(?R)|.)*\[/quote])(.*)
A little demo:
$text = '[quote=Username2 here][quote=Username here]quoted text[/quote]Reply text[/quote]More text';
preg_match('#(\[quote=[^]]*](?:(?R)|.)*\[/quote])(.*)#is', $text, $match);
print_r($match);
produces:
Array
(
[0] => [quote=Username2 here][quote=Username here]quoted text[/quote]Reply text[/quote]More text
[1] => [quote=Username2 here][quote=Username here]quoted text[/quote]Reply text[/quote]
[2] => More text
)
A little explanation:
( # open group 1
\[quote=[^]]*] # match '[quote= ... ]'
(?:(?R)|.)* # recursively match the entire pattern or any character and repeat it zero or more times
\[/quote] # match '[/quote]'
) # close group 1
( # open group 2
.* # match zero or more trailing chars after the last '[/quote]'
) # close group 2
But using these recursive regex constructs supported by PHP might make one's head spin... I'd opt for a little parser like John Kugelman suggested.
Assuming you do not need the values returned in nested form or with the quotes paired up (which a regex alone cannot give you), you can just split on the parts you do not need:
preg_split('~(\[quote=|\[quote]|]|\[/quote])~', $yourstring);
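As a rough illustration of the "little parser" idea mentioned above, the sketch below splits the text into tags and plain text and then builds a nested array recursively. parse_quote_tokens and the 'user'/'content' keys are hypothetical names chosen for this example, and it assumes every [quote=...] has a matching [/quote].

```php
<?php
// Hypothetical helper: turn a flat token list into nested quote nodes.
// $pos is advanced as tokens are consumed.
function parse_quote_tokens(array $tokens, int &$pos): array
{
    $nodes = [];
    while ($pos < count($tokens)) {
        $token = $tokens[$pos];
        if ($token === '[/quote]') {
            // Leave the closing tag for the caller to skip.
            return $nodes;
        }
        $pos++;
        if (preg_match('~^\[quote=([^\]]*)\]$~', $token, $m)) {
            $children = parse_quote_tokens($tokens, $pos);
            $pos++; // skip the matching [/quote]
            $nodes[] = ['user' => $m[1], 'content' => $children];
        } else {
            $nodes[] = $token; // plain text
        }
    }
    return $nodes;
}

$text = '[quote=Username2 here][quote=Username here]quoted text here[/quote]'
      . 'Reply text here[/quote]Reply text here';

// Split into tags and text, keeping the tags as tokens and dropping empty strings.
$tokens = preg_split('~(\[quote=[^\]]*\]|\[/quote\])~', $text, -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

$pos = 0;
$tree = parse_quote_tokens($tokens, $pos);
print_r($tree);
```

Each quote becomes an array with its user name and its content, where the content can again hold text and further quote arrays.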
