exploding a string using a regular expression - php

I have a string as below (the letters in the example could be numbers or text, and could be uppercase, lowercase or both. If a value is a sentence, it should be between single quotes):
$string="a,b,c,(d,e,f),g,'h, i j.',k";
How can I explode that to get the following result?
Array([0]=>"a",[1]=>"b",[2]=>"c",[3]=>"(d,e,f)",[4]=>"g",[5]=>"'h,i j'",[6]=>"k")
I think using regular expressions will be a fast as well as clean solution. Any idea?
EDIT:
This is what I have done so far, which is very slow for strings that have a long part between parentheses:
$separator="*"; // whatever which is not used in the string
$Pattern="'[^,]([^']+),([^']+)[^,]'";
while(ereg($Pattern,$String,$Regs)){
$String=ereg_replace($Pattern,"'\\1$separator\\2'",$String);
}
$Pattern="\(([^(^']+),([^)^']+)\)";
while(ereg($Pattern,$String,$Regs)){
$String=ereg_replace($Pattern,"(\\1$separator\\2)",$String);
}
return $String;
This will replace all the commas between the parentheses. Then I can explode the string by commas and replace the $separator with the original comma.

You can do the job using preg_match_all
$string="a,b,c,(d,e,f),g,'h, i j.',k";
preg_match_all("~'[^']+'|\([^)]+\)|[^,]+~", $string, $result);
print_r($result[0]);
Explanation:
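The trick is to try the quoted and parenthesized alternatives before the generic [^,]+ one, so that the commas inside them are never treated as separators.
~ Pattern delimiter
' a literal single quote
[^'] any character except a single quote
+ one or more times
' the closing single quote
| or
\([^)]+\) the same with parentheses
| or
[^,]+ any characters except commas, one or more times
~ Pattern delimiter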
Note that the quantifiers in [^']+', in [^)]+\) and also in [^,]+ are all automatically converted to possessive quantifiers at compile time by "auto-possessification": the first two because the character class cannot match the character that follows it, and the last because it is at the end of the pattern. In each case backtracking could never help, so it is skipped.
If you have several possible delimiters that, like quotes, use the same character to open and close, you can write your pattern like this, using a capture group:
$string="a,b,c,(d,e,f),g,'h, i j.',k,°l,m°,#o,p#,#q,r#,s";
preg_match_all('~([\'#°]).*?\1|\([^)]+\)|[^,]+~u', $string, $result);
print_r($result[0]);
Explanation:
(['#°]) one character in the class is captured in group 1
.*? any character zero or more times in lazy mode
\1 group 1 content
u the u modifier makes PCRE read the pattern and the subject as UTF-8, so the multibyte ° counts as a single character (both must then be valid UTF-8)
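With nested parentheses:
$string="a,b,(c,(d,(e),f),t),g,'h, i j.',k,°l,m°,#o,p#,#q,r#,s";
preg_match_all('~([\'#°]).*?\1|(\((?:[^()]+|(?-1))*+\))|[^,]+~u', $string, $result);
print_r($result[0]);
Here (?-1) recurses into the most recently opened group, which is the one that matches a parenthesized block, and that is what handles the nesting.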
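To make the nested variant easier to reuse, it can be wrapped in a small function. This is only a sketch: the name split_top_level is made up for illustration, and the delimiter class is reduced to ' and # so that no multibyte character (and hence no u modifier) is involved.

```php
<?php
// Hypothetical helper: returns the top-level, comma-separated values.
function split_top_level(string $input): array
{
    $pattern = '~([\'#]).*?\1'               // value quoted with ' or #
             . '|(\((?:[^()]+|(?-1))*+\))'   // balanced parentheses, recursive
             . '|[^,]+~';                    // plain value without commas
    preg_match_all($pattern, $input, $matches);

    return $matches[0]; // index 0 holds the full matches
}

print_r(split_top_level("a,(b,(c,d)),'e, f',g"));
// The four values are: a | (b,(c,d)) | 'e, f' | g
```

Two things to keep in mind with this approach: empty values (as in "a,,b") are skipped because every alternative needs at least one character, and unbalanced parentheses simply fall through to the plain-value alternative.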

Related

Why regex with lookaheads doesn't match?

I need (in PHP) to split a sentence by the word that cannot be the first or the last one in the sentence. Say the word is "pression" and here is my regex
/^.+?(?=[\s\.\,\:\;])pression(?=[\s\.\,\:\;]).+$/i
Live here: https://regex101.com/r/CHAhKj/1/
First, it doesn't match.
Next, is it possible at all to split that way? I tried a simplified example
print_r(preg_split('/^.+pizza.+$/', 'my pizza is cool'));
live here http://sandbox.onlinephpfunctions.com/code/10b674900fc1ef44ec79bfaf80e83fe1f4248d02
and it prints an array of 2 empty strings, when I expect
['my ', ' is cool']
I need (in PHP) to split a sentence by the word that cannot be the first or the last one in the sentence
You may use this regex:
(?<=[^\s.?]\h)pression(?=\h[^\s.?])
RegEx Demo
RegEx Details:
(?<=[^\s.?]\h): Lookbehind asserting that the current position is preceded by a character that is not whitespace, a dot or a ?, followed by a horizontal space.
pression: Match the word pression
(?=\h[^\s.?]): Lookahead asserting that the current position is followed by a horizontal space and then a character that is not whitespace, a dot or a ?.
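Because the lookarounds consume nothing, the pattern above can be passed straight to preg_split. A minimal sketch using the simplified pizza sentence from the question; the helper name split_on_inner_word is an assumption made for this example.

```php
<?php
// Hypothetical helper: splits $sentence on $word only when $word has a
// neighbouring word on both sides.
function split_on_inner_word(string $sentence, string $word): array
{
    $w = preg_quote($word, '/'); // escape regex metacharacters in the word

    return preg_split('/(?<=[^\s.?]\h)' . $w . '(?=\h[^\s.?])/', $sentence);
}

print_r(split_on_inner_word('my pizza is cool', 'pizza'));
// Two parts: "my " and " is cool"

print_r(split_on_inner_word('pizza is cool', 'pizza'));
// No split, the word is first: one part, "pizza is cool"
```

The surrounding spaces stay in the parts because the lookarounds only check for them; trim() each part if they are not wanted.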
First, ^.+?(?=[\s\.\,\:\;])pression(?=[\s\.\,\:\;]).+$ can't match any string at all because the (?=[\s\.\,\:\;])p part requires p to be also either a whitespace char, or a ., ,, : or ;, which invalidates the whole match at once.
Second, the ^.+pizza.+$ pattern does not ensure that the pizza it matches is not the first or last word in a sentence, as . matches whitespace, too. It also does not return anything meaningful: preg_split uses each match as a separator, and since this pattern matches the whole string, all that is left are the two empty strings before and after the match.
That said, all you need is:
preg_match('~^(.*?\w\W+)pression(\W+\w.*)$~is', $text, $m)
See the regex demo. Details:
^ - start of string
(.*?\w\W+) - Capturing group 1: any zero or more chars, as few as possible, then a word char and then one or more non-word chars
pression - a word
(\W+\w.*) - Capturing group 2: one or more non-word chars, a word char, and then any zero or more chars as many as possible
$ - end of string.
s makes the . match across lines and i flag makes the pattern match in a case insensitive way.
See the PHP demo:
$text = "You can use any regular expression pression inside the lookahead ";
if (preg_match('~^(.*?\w\W+)pression(\W+\w.*)$~is', $text, $m)) {
echo $m[1] . " << | >> " . $m[2];
}
// => You can use any regular expression << | >> inside the lookahead

How to extract the last 2 delimitered numbers using regex

I have to extract the first instance of a number-number. For example I want to extract 8236497-234783 from the string bnjdfg/dfg.vom/fdgd3-8236497-234783/dfg8jfg.vofg. The string has no apparent structure besides the number followed by a dash and followed by a number which is the thing I want to extract.
The thing I want to extract may be at the very start of the string, or the middle, or the end, or maybe the entire string itself is just a number-number.
$b = "bnjdfg/dfg.vom/fdgd3-8236497-234783/dfg8jfg.vofg";
preg_match('\d-\d', $b, $matches);
echo($matches[0]);
// Expecting to print 8236497-234783
You're missing the delimiter around the regexp. PHP's preg functions require that the regex begin with a punctuation character, and it looks for the matching character at the end of the regexp (because flags can be put after the second delimiter).
\d just matches a single digit. If you want to match a string of digits, you should write \d+.
You should require that the numbers be surrounded by word boundaries with \b; otherwise the match will start at the 3 at the end of fdgd3 and return 3-8236497.
preg_match('/\b\d+-\d+\b/', $b, $matches);
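Putting the three points together, here is a small sketch that checks the corrected pattern against the positions mentioned in the question (start, middle, end, whole string). The function name first_number_pair is hypothetical.

```php
<?php
// Hypothetical helper: returns the first number-number pair, or null.
function first_number_pair(string $s): ?string
{
    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match('/\b\d+-\d+\b/', $s, $m) === 1 ? $m[0] : null;
}

$b = "bnjdfg/dfg.vom/fdgd3-8236497-234783/dfg8jfg.vofg";
echo first_number_pair($b), PHP_EOL;          // 8236497-234783
echo first_number_pair('12-34/tail'), PHP_EOL; // 12-34
var_dump(first_number_pair('no digits here')); // NULL
```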

PHP/Laravel trim all but last word in a namespace

Trying to trim a fully qualified namespace so as to use just the last word. The example namespace is App\Models\FruitTypes\Apple, where that final word could be any number of fruit types. Shouldn't this...
$fruitName = 'App\Models\FruitTypes\Apple';
trim($fruitName, "App\\Models\\FruitTypes\\");
...do the trick? It is returning an empty string. If I try to trim just App\\Models\\ it returns FruitTypes\Apples as expected. I know the backslash is an escape character, but doubling should treat those as actual backslashes.
If you want to use native functionality for this rather than string manipulation, then ReflectionClass::getShortName will do the job:
$reflection = new ReflectionClass('App\\Models\\FruitTypes\\Apple');
echo $reflection->getShortName();
Apple
See https://3v4l.org/eVl9v
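preg_match() with the regex pattern \\([[:alpha:]]*)$ should do the trick. Inside a single-quoted PHP string each backslash of \\ has to be doubled again, and the matches are returned through the third argument:
preg_match('/\\\\([[:alpha:]]*)$/', $fruitName, $trimmed);
Your result will then live in $trimmed[1]. If you don't mind the pattern being a bit less explicit, you could do: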
preg_match('/([[:alpha:]]*)$/', $fruitName, $trimmed);
And your result would then be in $trimmed[0].
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
preg_match - php.net
(matches is the third parameter that I named $trimmed, see documentation for full explanation)
An explanation for the regex pattern
\\ matches the character \ literally to establish the start of the match.
The parentheses () create a capturing group to return the match or a substring of the match.
In the capturing group ([[:alpha:]]*):
[:alpha:] matches an alphabetic character [a-zA-Z]
The * quantifier means match between zero and unlimited times, as many times as possible
Then $ asserts position at the end of the string.
So basically, "Find the last \ then return all letters between this and the end of the string".
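As for why the trim() call in the question returns an empty string: the second argument of trim() is a list of characters to strip from both ends, not a prefix, and every character of the example string occurs in that list. A minimal sketch of this, together with a plain string alternative based on strrpos() and substr(); the helper name short_class_name is hypothetical and not taken from the answers above.

```php
<?php
$fruitName = 'App\Models\FruitTypes\Apple';

// trim() treats its second argument as a set of characters, so every
// character of $fruitName is stripped and nothing remains.
var_dump(trim($fruitName, "App\\Models\\FruitTypes\\")); // string(0) ""

// Hypothetical helper: keep whatever follows the last backslash.
function short_class_name(string $fqcn): string
{
    $pos = strrpos($fqcn, '\\');

    return $pos === false ? $fqcn : substr($fqcn, $pos + 1);
}

echo short_class_name($fruitName), PHP_EOL; // Apple
```

Unlike ReflectionClass, this works on the string alone, so it does not require the class to exist or be autoloadable.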

preg match between two strings

I need help with this preg match. I tried this from other post but did not get the result. So finally posting it.
I am trying to extract z,a,b from first and a from second example.
1) Write a function operations with parameter z,a,b and returns b.
2) write a function factorial with parameter a.
This is what I tried so far:
preg_match_all('/\parameter(.*?)\and?/', $question, $match);
$questionVars = $match[1];
print $questionVars;
Thank you so much!
Your solution can be different depending on actual requirements.
If you need a string after parameter as a whole word that can consist of word and comma chars you may use
preg_match('~\bparameter\s+\K\w+(?:\s*,\s*\w+)*~', $s, $m)
See the regex demo. The \bparameter\s+ matches a word boundary, parameter and 1+ whitespace chars, and all this text is omitted with the help of \K, the match reset operator. \w+(?:\s*,\s*\w+)* matches and returns the 1+ word chars followed with 0+ repetitions of a comma enclosed with optional whitespace chars and again 1+ word chars.
If you plan to get those comma-separated chunks separately, use
preg_match_all('~(?:\G(?!^)\s*,\s*|\bparameter\s+)\K\w+~', $s, $m)
See another regex demo. Here, (?:\G(?!^)\s*,\s*|\bparameter\s+) will either match the whole word parameter with 1+ whitespace chars after it (\bparameter\s+, as in the previous solution) or the end of the previous successful match followed by a comma enclosed with optional whitespace chars (\G(?!^)\s*,\s*). The \K will omit the text matched so far and \w+ will grab the value. You may replace \w+ with [^,]* to grab 0+ chars other than a comma.
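Applied to the two sentences from the question, the \G-based pattern can be tried like this. It is a sketch only; the helper name extract_parameters is made up for the example.

```php
<?php
$questions = [
    'Write a function operations with parameter z,a,b and returns b.',
    'write a function factorial with parameter a.',
];

// Hypothetical helper: returns the parameter names as an array.
function extract_parameters(string $s): array
{
    // \K drops "parameter " (or the separating comma) from each match,
    // so $m[0] contains only the names.
    preg_match_all('~(?:\G(?!^)\s*,\s*|\bparameter\s+)\K\w+~', $s, $m);

    return $m[0];
}

foreach ($questions as $q) {
    echo implode(',', extract_parameters($q)), PHP_EOL;
}
// z,a,b
// a
```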

Explode and/or regex text to HTML link in PHP

I have a database of texts that contains this kind of syntax in the middle of English sentences that I need to turn into HTML links using PHP
"text1(text1)":http://www.example.com/mypage
Notes:
text1 is always identical to the text in parenthesis
The whole string always has the quotation marks, parentheses and colon, so the syntax is the same for each.
Sometimes there is a space at the end of the string, but other times there is a question mark or comma or other punctuation mark.
I need to turn these into basic links, like
<a href="http://www.example.com/mypage">text1</a>
How do I do this? Do I need explode or regex or both?
"(.*?)\(\1\)":(.*\/[a-zA-Z0-9]+)(?=\?|\,|\.|$)
You can use this.
See Demo.
http://regex101.com/r/zF6xM2/2
You can use this replacement:
$pattern = '~"([^("]+)\(\1\)":(http://\S+)(?=[\s\pP]|\z)~';
$replacement = '<a href="\2">\1</a>';
$result = preg_replace($pattern, $replacement, $text);
pattern details:
([^("]+) this part captures text1 in group 1. Using a negated character class (one that excludes the double quote and the opening parenthesis) has two advantages:
it allows the use of a greedy quantifier, which is faster
since the class excludes the opening parenthesis and is immediately followed by a parenthesis in the pattern, if another part of the text contains content between double quotes but without parentheses inside, the regex engine will not go backward to test other possibilities; it skips that substring without backtracking. (This is because the PCRE regex engine automatically converts [^a]+a into [^a]++a before processing the string.)
\S+ means all that is not a whitespace one or more times
(?=[\s\pP]|\z) is a lookahead assertion that checks that the url is followed by a whitespace, a punctuation character (\pP) or the end of the string.
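A variation on this idea, as a sketch rather than the answer's exact code: preg_replace_callback lets the link text and URL be escaped with htmlspecialchars before they go into the HTML. The helper name linkify is hypothetical, and the URL part is changed to \S*[^\s\pP], which is an assumption about the desired behaviour: it forces the URL to end on a non-punctuation character so a trailing comma or question mark stays outside the link.

```php
<?php
// Hypothetical helper: turns "text(text)":http://... into an HTML link.
function linkify(string $text): string
{
    // Group 1: link text (repeated in parentheses), group 2: URL.
    $pattern = '~"([^("]+)\(\1\)":(https?://\S*[^\s\pP])~';

    return preg_replace_callback($pattern, function (array $m): string {
        return '<a href="' . htmlspecialchars($m[2]) . '">'
            . htmlspecialchars($m[1]) . '</a>';
    }, $text);
}

$text = 'See "text1(text1)":http://www.example.com/mypage, then continue.';
echo linkify($text), PHP_EOL;
// See <a href="http://www.example.com/mypage">text1</a>, then continue.
```

One side effect of ending on a non-punctuation character is that a URL with a trailing slash loses that slash, since / counts as punctuation for \pP.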
You can use this regex:
"(.*?)\(.*?:(.*)
Working demo
An appropriate Regular Expression could be:
$str = '"text1(text1)":http://www.example.com/mypage';
preg_match('#^"([^\(]+)' .
'\(([^\)]+)\)[^"]*":(.+)#', $str, $m);
print '<a href="'.$m[3].'">'.$m[2].'</a>' . PHP_EOL;
