What's the best/most efficient way to extract text set between parenthesis? Say I wanted to get the string "text" from the string "ignore everything except this (text)" in the most efficient manner possible.
So far, the best I've come up with is this:
$fullString = "ignore everything except this (text)";
$start = strpos('(', $fullString);
$end = strlen($fullString) - strpos(')', $fullString);
$shortString = substr($fullString, $start, $end);
Is there a better way to do this? I know in general using regex tends to be less efficient, but unless I can reduce the number of function calls, perhaps this would be the best approach? Thoughts?
i'd just do a regex and get it over with. unless you are doing enough iterations that it becomes a huge performance issue, it's just easier to code (and understand when you look back on it)
$text = 'ignore everything except this (text)';
preg_match('#\((.*?)\)#', $text, $match);
print $match[1];
So, actually, the code you posted doesn't work: substr()'s parameters are $string, $start and $length, and strpos()'s parameters are $haystack, $needle. Slightly modified:
$str = "ignore everything except this (text)";
$start = strpos($str, '(');
$end = strpos($str, ')', $start + 1);
$length = $end - $start;
$result = substr($str, $start + 1, $length - 1);
Some subtleties: I used $start + 1 in the offset parameter in order to help PHP out while doing the strpos() search on the second parenthesis; we increment $start one and reduce $length to exclude the parentheses from the match.
Also, there's no error checking in this code: you'll want to make sure $start and $end do not === false before performing the substr.
As for using strpos/substr versus regex; performance-wise, this code will beat a regular expression hands down. It's a little wordier though. I eat and breathe strpos/substr, so I don't mind this too much, but someone else may prefer the compactness of a regex.
Use a regular expression:
if( preg_match( '!\(([^\)]+)\)!', $text, $match ) )
$text = $match[1];
i think this is the fastest way to get the words between the first parenthesis in a string.
$string = 'ignore everything except this (text)';
$string = explode(')', (explode('(', $string)[1]))[0];
echo $string;
The already posted regex solutions - \((.*?)\) and \(([^\)]+)\) - do not return the innermost strings between an open and close brackets. If a string is Text (abc(xyz 123) they both return a (abc(xyz 123) as a whole match, and not (xyz 123).
The pattern that matches substrings (use with preg_match to fetch the first and preg_match_all to fetch all occurrences) in parentheses without other open and close parentheses in between is, if the match should include parentheses:
\([^()]*\)
Or, you want to get values without parentheses:
\(([^()]*)\) // get Group 1 values after a successful call to preg_match_all, see code below
\(\K[^()]*(?=\)) // this and the one below get the values without parentheses as whole matches
(?<=\()[^()]*(?=\)) // less efficient, not recommended
Replace * with + if there must be at least 1 char between ( and ).
Details:
\( - an opening round bracket (must be escaped to denote a literal parenthesis as it is used outside a character class)
[^()]* - zero or more characters other than ( and ) (note these ( and ) do not have to be escaped inside a character class as inside it, ( and ) cannot be used to specify a grouping and are treated as literal parentheses)
\) - a closing round bracket (must be escaped to denote a literal parenthesis as it is used outside a character class).
The \(\K part in an alternative regex matches ( and omits from the match value (with the \K match reset operator). (?<=\() is a positive lookbehind that requires a ( to appear immediately to the left of the current location, but the ( is not added to the match value since lookbehind (lookaround) patterns are not consuming. (?=\() is a positive lookahead that requires a ) char to appear immediately to the right of the current location.
PHP code:
$fullString = 'ignore everything except this (text) and (that (text here))';
if (preg_match_all('~\(([^()]*)\)~', $fullString, $matches)) {
print_r($matches[0]); // Get whole match values
print_r($matches[1]); // Get Group 1 values
}
Output:
Array ( [0] => (text) [1] => (text here) )
Array ( [0] => text [1] => text here )
This is a sample code to extract all the text between '[' and ']' and store it 2 separate arrays(ie text inside parentheses in one array and text outside parentheses in another array)
function extract_text($string)
{
$text_outside=array();
$text_inside=array();
$t="";
for($i=0;$i<strlen($string);$i++)
{
if($string[$i]=='[')
{
$text_outside[]=$t;
$t="";
$t1="";
$i++;
while($string[$i]!=']')
{
$t1.=$string[$i];
$i++;
}
$text_inside[] = $t1;
}
else {
if($string[$i]!=']')
$t.=$string[$i];
else {
continue;
}
}
}
if($t!="")
$text_outside[]=$t;
var_dump($text_outside);
echo "\n\n";
var_dump($text_inside);
}
Output:
extract_text("hello how are you?");
will produce:
array(1) {
[0]=>
string(18) "hello how are you?"
}
array(0) {
}
extract_text("hello [http://www.google.com/test.mp3] how are you?");
will produce
array(2) {
[0]=>
string(6) "hello "
[1]=>
string(13) " how are you?"
}
array(1) {
[0]=>
string(30) "http://www.google.com/test.mp3"
}
This function may be useful.
public static function getStringBetween($str,$from,$to, $withFromAndTo = false)
{
$sub = substr($str, strpos($str,$from)+strlen($from),strlen($str));
if ($withFromAndTo)
return $from . substr($sub,0, strrpos($sub,$to)) . $to;
else
return substr($sub,0, strrpos($sub,$to));
}
$inputString = "ignore everything except this (text)";
$outputString = getStringBetween($inputString, '(', ')'));
echo $outputString;
//output will be test
$outputString = getStringBetween($inputString, '(', ')', true));
echo $outputString;
//output will be (test)
strpos() => which is used to find the position of first occurance in a string.
strrpos() => which is used to find the position of first occurance in a string.
function getStringsBetween($str, $start='[', $end=']', $with_from_to=true){
$arr = [];
$last_pos = 0;
$last_pos = strpos($str, $start, $last_pos);
while ($last_pos !== false) {
$t = strpos($str, $end, $last_pos);
$arr[] = ($with_from_to ? $start : '').substr($str, $last_pos + 1, $t - $last_pos - 1).($with_from_to ? $end : '');
$last_pos = strpos($str, $start, $last_pos+1);
}
return $arr; }
this is a little improvement to the previous answer that will return all patterns in array form:
getStringsBetween('[T]his[] is [test] string [pattern]') will return:
Related
I've spent my last 4 hours figuring out how to ... I got to ask for your help now.
I'm trying to extract from a text multiple substring match my starting_words_array and ending_words_array.
$str = "Do you see that ? Indeed, I can see that, as well as this." ;
$starting_words_array = array('do','I');
$ending_words_array = array('?',',');
expected output : array ([0] => 'Do you see that ?' [1] => 'I can see that,')
I manage to write a first function that can find the first substring matching one of both arrays items. But i'm not able to find how to loop it in order to get all the substring matching my requirement.
function SearchString($str, $starting_words_array, $ending_words_array ) {
forEach($starting_words_array as $test) {
$pos = strpos($str, $test);
if ($pos===false) continue;
$found = [];
forEach($ending_words_array as $test2) {
$posStart = $pos+strlen($test);
$pos2 = strpos($str, $test2, $posStart);
$found[] = ($pos2!==false) ? $pos2 : INF;
}
$min = min($found);
if ($min !== INF)
return substr($str,$pos,$min-$pos) .$str[$min];
}
return '';
}
Do you guys have any idea about how to achieve such thing ?
I use preg_match for my solution. However, the start and end strings must be escaped with preg_quote. Without that, the solution will be wrong.
function searchString($str, $starting_words_array, $ending_words_array ) {
$resArr = [];
forEach($starting_words_array as $i => $start) {
$end = $ending_words_array[$i] ?? "";
$regEx = '~'.preg_quote($start,"~").".*".preg_quote($end,"~").'~iu';
if(preg_match_all($regEx,$str,$match)){
$resArr[] = $match[0];
}
}
return $resArr;
}
The result is what the questioner expects.
If the expressions can occur more than once, preg_match_all must also be used. The regex must be modify.
function searchString($str, $starting_words_array, $ending_words_array ) {
$resArr = [];
forEach($starting_words_array as $i => $start) {
$end = $ending_words_array[$i] ?? "";
$regEx = '~'.preg_quote($start,"~").".*?".preg_quote($end,"~").'~iu';
if(preg_match_all($regEx,$str,$match)){
$resArr = array_merge($resArr,$match[0]);
}
}
return $resArr;
}
The resut for the second variant:
array (
0 => "Do you see that ?",
1 => "Indeed,",
2 => "I can see that,",
)
I would definitely use regex and preg_match_all(). I won't give you a full working code example here but I will outline the necessary steps.
First, build a regex from your start-end-pairs like that:
$parts = array_map(
function($start, $end) {
return $start . '.+' . $end;
},
$starting_words_array,
$ending_words_array
);
$regex = '/' . join('|', $parts) . '/i';
The /i part means case insensitive search. Some characters like the ? have a special purpose in regex, so you need to extend above function in order to escape it properly.
You can test your final regex here
Then use preg_match_all() to extract your substrings:
preg_match_all($regex, $str, $matches); // $matches is passed by reference, no need to declare it first
print_r($matches);
The exact structure of your $matches array will be slightly different from what you asked for but you will be able to extract your desired data from it
Benni answer is best way to go - but let just point out the problem in your code if you want to fix those:
strpos is not case sensitive and find also part of words so you need to changes your $starting_words_array = array('do','I'); to $starting_words_array = array('Do','I ');
When finding a substring you use return which exit the function so you want find any other substring. In order to fix that you can define $res = []; at the beginning of the function and replace return substr($str,$pos,... with $res[] = substr($str,$pos,... and at the end return the $res var.
You can see example in 3v4l - in that example you get the output you wanted
Can you think of any regular expression that resolves these similarities in PHP? The idea is to get a match without considering the last letters.
<?php
$word1 = 'happyness';
$word2 = 'happys';
if (substr($word1, 0, -4) == substr($word2, 0, -1))
{
echo 'same word1';
}
$word1 = 'kisses';
$word2 = 'kiss';
if (substr($word1, 0, -2) == $word2)
{
echo 'same word2';
}
$word1 = 'consonant';
$word2 = 'consonan';
if (substr($word1, 0, -1) == $word2)
{
echo 'same word3';
}
By putting the words together like happys happyness and capturing as many word characters from word 1 as word 2 matches. See this demo at regex101. Use it with the i flag for casless matching.
^(\w+)\w* \1
To use this in PHP with preg_match see this PHP demo at tio.run
preg_match('/^(\w+)\w* \1/i', preg_quote($word1,'/')." ".preg_quote($word2,'/'), $out);
where $out[1] holds the captures or $out would be an empty array if there wasn't a match.
You could use a small helper function, the first function just matches up to the length of the second string, so doesn't care how many characters it truncates. The main code works similar to your code except it uses the length of the second value as the length of the substring to take...
function match( string $a, string $b ) {
return substr($a, 0, strlen($b)) === $b;
}
This function is slightly more complicated as it takes into account a maximum gap length...
function match( string $a, string $b, int $length = 3 ) {
$len = max(strlen($a)-$length, strlen($b));
return substr($a, 0, $len) === $b;
}
So call it something along the lines of
$word1 = 'happyness';
$word2 = 'happys';
if (match($word1,$word2))
{
echo 'same word1';
}
You can use preg_match to match these data with regex as /^word2/ against word1. So regex would check if word1 starts with word2 or not, because of ^ symbol at the start.
It's always better to preg_quote() before matching to escape regex meta characters for accurate results.
<?php
$tests = [
[
'happyness',
'happys'
],
[
'kisses',
'kiss'
],
[
'consonant',
'consonan'
]
];
$filtered = array_filter($tests,function($values){
$values[1] = preg_quote($values[1]);
return preg_match("/^$values[1]/",$values[0]) === 1;
});
print_r($filtered);
Demo: https://3v4l.org/SLf15
You could also do a small function to find the similarity between the given 2 words. It could look like:
function similarity($word1, $word2)
{
$splittedWord1 = str_split($word1);
$splittedWord2 = str_split($word2);
$similarChars = array_intersect_assoc($splittedWord1, $splittedWord2);
return count($similarChars) / max(count($splittedWord1), count($splittedWord2));
}
var_dump(similarity('happyness', 'happys'));
var_dump(similarity('happyness', 'testhappys'));
var_dump(similarity('kisses', 'kiss'));
var_dump(similarity('consonant', 'consonan'));
The result would look like:
float(0.55555555555556)
int(0)
float(0.66666666666667)
float(0.88888888888889)
Based on the resulted percentage you could decide if the given words should be considered the same or not.
I'm not sure regex is the answer here.
You could try similar_text(), which returns the number of similar characters (and optionally sets a percentage value to a variable). Maybe if you consider the last two letters as non-important, you can see if the strlen() - $skippedCharacters is the same as what is matched. For example:
$skippedCharacters = 2;
$word1 = 'kisses';
$word2 = 'kiss';
$match = similar_text($word1, $word2);
if ($match + $skippedCharacters >= strlen($word1))
{
echo 'same word2';
}
You could use the PHP levenshtein function.
The levenshtein() function returns the Levenshtein distance between two strings. The Levenshtein distance is the number of characters you have to replace, insert or delete to transform string1 into string2.
$lev = levenshtein($word1, $word2);
The lower the number the bigger the similarity.
I'm having a really hard time understanding RegEx in general, so I have no clue how is it possible to use it in such an issue.
So here we have a tuple
$tuple = "(12342,43244)";
And what I try to do is get:
$value_one = 12342;
So from (value_one,value_two) get value_one.
I know it can be possible with explode( ',', $tuple ) and then delete the 1st character '(' out of the 1st element in exploded array, but that seems super sloppy, is there a way to pattern match in this manner in PHP?
Here is the simplest preg_match example with the \(([0-9]+) regex that matches a (, and captures into Group 1 one or more digits from 0 to 9 range:
$tuple = "(12342,43244)";
if (preg_match('~\(([0-9]+)~', $tuple, $m))
{
echo $m[1];
}
See the IDEONE demo
Wrapped into a function:
function retFirstDigitChunk($input) {
if (preg_match('~\(([0-9]+)~', $input, $m)) {
return $m[1];
} else {
return "";
}
}
See another demo
Or, to get both as an array:
function retValues($input) {
if (preg_match('~\((-?[0-9]+)\s*,\s*(-?[0-9]+)~', $input, $m)) {
return array('left'=>$m[1], 'right'=>$m[2]);
} else {
return "";
}
}
$tuple = "(12342,43244)";
print_r(retValues($tuple));
Output: Array( [left] => 12342 [right] => 43244 )
You have to search the number preceeded by an open brace and followed by a comma. The pattern is:
$value_one = preg_replace('/\((\d+),.*/', '$1', $tuple);
If you are looking for something efficient, try to avoid the use of regex when possible:
$result = explode(',', ltrim($tuple, '('))[0];
or
sscanf($tuple, '(%[^,]', $result);
I am trying to get the integer on the left and right for an input from the $str variable using REGEX. But I keep getting the commas back along with the integer. I only want integers not the commas. I have also tried replacing the wildcard . with \d but still no resolution.
$str = "1,2,3,4,5,6";
function pagination()
{
global $str;
// Using number 4 as an input from the string
preg_match('/(.{2})(4)(.{2})/', $str, $matches);
echo $matches[0]."\n".$matches[1]."\n".$matches[1]."\n".$matches[1]."\n";
}
pagination();
How about using a CSV parser?
$str = "1,2,3,4,5,6";
$line = str_getcsv($str);
$target = 4;
foreach($line as $key => $value) {
if($value == $target) {
echo $line[($key-1)] . '<--low high-->' . $line[($key+1)];
}
}
Output:
3<--low high-->5
or a regex could be
$str = "1,2,3,4,5,6";
preg_match('/(\d+),4,(\d+)/', $str, $matches);
echo $matches[1]."<--low high->".$matches[2];
Output:
3<--low high->5
The only flaw with these approaches is if the number is the start or end of range. Would that ever be the case?
I believe you're looking for Regex Non Capture Group
Here's what I did:
$regStr = "1,2,3,4,5,6";
$regex = "/(\d)(?:,)(4)(?:,)(\d)/";
preg_match($regex, $regStr, $results);
print_r($results);
Gives me the results:
Array ( [0] => 3,4,5 [1] => 3 [2] => 4 [3] => 5 )
Hope this helps!
Given your function name I am going to assume you need this for pagination.
The following solution might be easier:
$str = "1,2,3,4,5,6,7,8,9,10";
$str_parts = explode(',', $str);
// reset and end return the first and last element of an array respectively
$start = reset($str_parts);
$end = end($str_parts);
This prevents your regex from having to deal with your numbers getting into the double digits.
this is what I try to get:
My longest text to test When I search for e.g. My I should get My longest
I tried it with this function to get first the complete length of the input and then I search for the ' ' to cut it.
$length = strripos($text, $input) + strlen($input)+2;
$stringpos = strripos($text, ' ', $length);
$newstring = substr($text, 0, strpos($text, ' ', $length));
But this only works first time and then it cuts after the current input, means
My lon is My longest and not My longest text.
How I must change this to get the right result, always getting the next word. Maybe I need a break, but I cannot find the right solution.
UPDATE
Here is my workaround till I find a better solution. As I said working with array functions does not work, since part words should work. So I extended my previous idea a bit. Basic idea is to differ between first time and the next. I improved the code a bit.
function get_title($input, $text) {
$length = strripos($text, $input) + strlen($input);
$stringpos = stripos($text, ' ', $length);
// Find next ' '
$stringpos2 = stripos($text, ' ', $stringpos+1);
if (!$stringpos) {
$newstring = $text;
} else if ($stringpos2) {
$newstring = substr($text, 0, $stringpos2);
} }
Not pretty, but hey it seems to work ^^. Anyway maybe someone of you have a better solution.
You can try using explode
$string = explode(" ", "My longest text to test");
$key = array_search("My", $string);
echo $string[$key] , " " , $string[$key + 1] ;
You can take i to the next level using case insensitive with preg_match_all
$string = "My longest text to test in my school that is very close to mY village" ;
var_dump(__search("My",$string));
Output
array
0 => string 'My longest' (length=10)
1 => string 'my school' (length=9)
2 => string 'mY village' (length=10)
Function used
function __search($search,$string)
{
$result = array();
preg_match_all('/' . preg_quote($search) . '\s+\w+/i', $string, $result);
return $result[0];
}
There are simpler ways to do that. String functions are useful if you don't want to look for something specific, but cut out a pre-defined length of something. Else use a regular expression:
preg_match('/My\s+\w+/', $string, $result);
print $result[0];
Here the My looks for the literal first word. And \s+ for some spaces. While \w+ matches word characters.
This adds some new syntax to learn. But less brittle than workarounds and lengthier string function code to accomplish the same.
An easy method would be to split it on whitespace and grab the current array index plus the next one:
// Word to search for:
$findme = "text";
// Using preg_split() to split on any amount of whitespace
// lowercasing the words, to make the search case-insensitive
$words = preg_split('/\s+/', "My longest text to test");
// Find the word in the array with array_search()
// calling strtolower() with array_map() to search case-insensitively
$idx = array_search(strtolower($findme), array_map('strtolower', $words));
if ($idx !== FALSE) {
// If found, print the word and the following word from the array
// as long as the following one exists.
echo $words[$idx];
if (isset($words[$idx + 1])) {
echo " " . $words[$idx + 1];
}
}
// Prints:
// "text to"