I have these arrays ($array and $array2):
$urls = array("http://piggington.com/pb_cash_flow_positive")
I have this regular expression:
preg_match("/\/{2}.*?\./", $array[$i], $matches);
It matches the two slashes and everything up to and including the first dot, so it finds
//piggington.
Now I want to concatenate a variable into the following regular expression so that it searches for a specific string.
I tried:
$matches_imploded = implode($matches);
$matches_imploded = preg_quote($matches_imploded, '/');
$match_with_other_array = preg_grep("/\/{2}".$matches_imploded."\./", $array2);
But it's not finding any matches. What am I doing wrong? It should search $array2 and match the entries where the text between the second slash and the first dot equals $matches_imploded.
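Your first pattern matches //piggington., including both slashes and the dot, so after preg_quote the second pattern looks for ////piggington.., which never occurs. To match only what comes after // and before the first dot, use \K (or a positive lookbehind), and escape the dot in the lookahead so it means a literal dot:
preg_match("~/{2}\K[^.]*(?=\.)~", $array[$i], $matches);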
$matches_imploded = implode($matches);
$matches_imploded = preg_quote($matches_imploded, '/');
$match_with_other_array = preg_grep("/\/{2}".$matches_imploded."\./", $array2);
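Put together, the fix can be sketched as a small loop. This is a minimal sketch: the contents of $array2 are hypothetical sample data, and collecting all hits into $found is our own addition, not part of the answer.

```php
<?php
// Hypothetical sample data: $array holds the source URLs, $array2 the URLs to search.
$array  = array("http://piggington.com/pb_cash_flow_positive");
$array2 = array("https://piggington.com/other", "http://example.org/page");

$found = array();
for ($i = 0; $i < count($array); $i++) {
    // \K drops the leading "//" from the match; (?=\.) requires a literal dot next.
    if (preg_match("~/{2}\K[^.]*(?=\.)~", $array[$i], $matches)) {
        $matches_imploded = preg_quote(implode($matches), '/');
        // Keep every entry of $array2 that contains "//<name>.".
        $hits  = preg_grep("/\/{2}" . $matches_imploded . "\./", $array2);
        $found = array_merge($found, $hits);
    }
}
print_r($found);
```

Here $matches_imploded is just "piggington", so the second pattern becomes /\/{2}piggington\./ and only the first entry of $array2 is kept.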
Related
I have an array of words and a string, and I want to add a hashtag to every word in the string that matches a word in the array. I use this loop to find and replace the words:
foreach($testArray as $tag){
$str = preg_replace("~\b".$tag."~i","#\$0",$str);
}
Problem: let's say I have the words "is" and "isolated" in my array. Then I get ##isolated in the output, which means "isolated" is matched once for "is" and once for "isolated". The pattern ignores the fact that "#isolated" no longer starts with "is"; it starts with "#".
Here is an example, but it is only an example: I don't want to solve just this case, but every other possibility too:
$str = "this is isolated is an example of this and that";
$testArray = array('is','isolated','somethingElse');
Output will be:
this #is ##isolated #is an example of this and that
You may build a regex with an alternation group enclosed with word boundaries on both ends and replace all the matches in one pass:
$str = "this is isolated is an example of this and that";
$testArray = array('is','isolated','somethingElse');
echo preg_replace('~\b(?:' . implode('|', $testArray) . ')\b~i', '#$0', $str);
// => this #is #isolated #is an example of this and that
The resulting regex looks like this:
~\b(?:is|isolated|somethingElse)\b~i
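If the words can contain regex metacharacters such as . or |, it is safer to escape each one with preg_quote() before joining them. A sketch of that variant, assuming the same sample data (the $words, $pattern and $result names are ours):

```php
<?php
$str = "this is isolated is an example of this and that";
$testArray = array('is', 'isolated', 'somethingElse');

// Escape each word for use inside a ~-delimited pattern.
$words = array_map(function ($w) {
    return preg_quote($w, '~');
}, $testArray);

// One pass: the alternation with \b on both sides only tags whole words.
$pattern = '~\b(?:' . implode('|', $words) . ')\b~i';
$result  = preg_replace($pattern, '#$0', $str);
echo $result, "\n"; // this #is #isolated #is an example of this and that
```

At "isolated", the alternative is fails on the trailing \b, so the engine backtracks and matches isolated instead; the order of the words therefore does not matter here.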
If you want to keep your loop approach, add a negative lookbehind after \b: "~\b(?<!#)".$tag."~i","#\$0". The lookbehind fails every match that is preceded by #.
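A minimal sketch of the loop with the lookbehind in place. Wrapping $tag in preg_quote() is an extra precaution we added, not part of the suggestion above.

```php
<?php
$str = "this is isolated is an example of this and that";
$testArray = array('is', 'isolated', 'somethingElse');

foreach ($testArray as $tag) {
    // (?<!#) rejects a match when the word already has a '#' right before it.
    $str = preg_replace('~\b(?<!#)' . preg_quote($tag, '~') . '~i', '#$0', $str);
}
echo $str, "\n"; // this #is #isolated #is an example of this and that
```

Note that, like the original loop, this pattern has no trailing \b, so a tag "is" would still turn a word such as "island" into "#island".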
Another way is to split your string into words and build an associative array from your original array of words (to avoid calling in_array):
$str = "this is isolated is an example of this and that";
$testArray = array('is','isolated','somethingElse');
$hash = array_flip(array_map('strtolower', $testArray));
$parts = preg_split('~\b~', $str);
for ($i=1; $i<count($parts); $i+=2) {
$low = strtolower($parts[$i]);
if (isset($hash[$low])) $parts[$i-1] .= '#';
}
$result = implode('', $parts);
echo $result;
This way, your string is processed only once, whatever the number of words in your array.
I have some string, for example:
cats, e.g. Barsik, are funny. And it is true. So,
And I want to get as result:
cats, e.g. Barsik, are funny.
My try:
mb_ereg_search_init($text, '((?!e\.g\.).)*\.[^\.]');
$match = mb_ereg_search_pos();
But it gets the position of the second dot (after the word "true").
How can I get the desired result?
Since a naive approach works for you, I am posting an answer. However, please note that detecting a sentence end is a very difficult task for a regex, and although it is possible to some degree, an NLP package should be used for that.
Having said that, I suggested using
'~(?<!\be\.g)\.(?=\s+\p{Lu})~ui'
The regex matches any dot (\.) that is not preceded by the whole word e.g (see the negative lookbehind (?<!\be\.g)) but is followed by one or more whitespace characters (\s+) and then one uppercase Unicode letter (\p{Lu}).
The case insensitive i modifier does not impact what \p{Lu} matches.
The ~u modifier is required since you are working with Unicode texts (like Russian).
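To see the pattern on Unicode input, here is a sketch that lists every dot it treats as a sentence end. The Russian sample sentence is a hypothetical stand-in. PREG_OFFSET_CAPTURE reports byte offsets, so the cut is done with substr() rather than mb_substr().

```php
<?php
$re = '~(?<!\be\.g)\.(?=\s+\p{Lu})~ui';
// Hypothetical Russian sample text (an assumption for illustration).
$text = "Коты, e.g. Барсик, смешные. И это правда. Так,";

// Find every dot the pattern treats as a sentence end.
preg_match_all($re, $text, $m, PREG_OFFSET_CAPTURE);

foreach ($m[0] as $hit) {
    // $hit[1] is a byte offset; print the text up to and including the dot.
    echo substr($text, 0, $hit[1] + 1), "\n";
}
```

The two dots in "e.g." are skipped (the first is followed by "g", the second is preceded by e.g), while the dots before "И" and "Так" are matched.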
To get the index of the first occurrence, use preg_match with the PREG_OFFSET_CAPTURE flag. Here is a slightly simplified version of the regex you supplied in the comments:
preg_match('~(?<!т\.н)(?<!т\.к)(?<!e\.g)\.(?=\s+\p{L})~iu', $text, $match, PREG_OFFSET_CAPTURE);
Note that the lookbehinds are executed one after another at the same position in the string, so you do not need to group them inside an additional positive lookahead.
A complete example (it prints 28, the byte offset of the dot after "funny"):
$re = '~(?<!т\.н)(?<!т\.к)(?<!e\.g)\.(?=\s+\p{L})~iu';
$str = "cats, e.g. Barsik, are funny. And it is true. So,";
preg_match($re, $str, $match, PREG_OFFSET_CAPTURE);
echo $match[0][1];
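To get the desired substring rather than just the index, cut the string right after the matched dot. A minimal sketch, assuming that returning the whole string when no sentence end is found is acceptable:

```php
<?php
$re  = '~(?<!т\.н)(?<!т\.к)(?<!e\.g)\.(?=\s+\p{L})~iu';
$str = "cats, e.g. Barsik, are funny. And it is true. So,";

if (preg_match($re, $str, $match, PREG_OFFSET_CAPTURE)) {
    // The offset is in bytes, so cut with substr(), not mb_substr(); +1 keeps the dot.
    $firstSentence = substr($str, 0, $match[0][1] + 1);
} else {
    $firstSentence = $str; // assumption: no sentence end found, keep everything
}
echo $firstSentence, "\n"; // cats, e.g. Barsik, are funny.
```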
Here are two approaches to get the substring from the start of the initial string up to and including its second-to-last dot:
using strrpos and substr functions:
$str = 'cats, e.g. Barsik, and e.g. Lusya are funny. And it is true. So,';
$len = strlen($str);
$str = substr($str, 0, strrpos($str, '.', strrpos($str, '.') - $len - 1) + 1);
print_r($str); // "cats, e.g. Barsik, and e.g. Lusya are funny."
using array_reverse, str_split and array_search functions:
$str = 'cats, e.g. Barsik, and e.g. Lusya are funny. And it is true. So,';
$parts = array_reverse(str_split($str));
$pos = array_search('.', $parts) + 1;
$str = implode("", array_reverse(array_slice($parts, array_search('.', array_slice($parts, $pos)) + $pos)));
print_r($str); // "cats, e.g. Barsik, and e.g. Lusya are funny."
I'm trying to use PHP regular expressions. I've tried this code:
$regex = "c:(.+),";
$input = "otherStuff094322f98c:THIS,OtherStuffHeree129j12dls";
$match = Array();
preg_match_all($regex, $input, $match);
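It should return the substring THIS ("c" and ":" followed by any characters, followed by ",") from $input. But it returns an empty array. What am I doing wrong?
PHP regex patterns need delimiters, such as slashes, around them. Without them, preg_match_all() emits a warning about the delimiter and returns false, so $match stays empty.
Also, the greedy .+ would run to the last comma in the input, not the first one, if there were more than one. Use the lazy .+? or [^,]+ instead: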
$regex = "/c:(.+?),/";
or
$regex = "/c:([^,]+),/";
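A quick check with the input from the question, using the [^,]+ variant. preg_match_all() puts the full matches in $match[0] and the captured text in $match[1]:

```php
<?php
$regex = "/c:([^,]+),/";
$input = "otherStuff094322f98c:THIS,OtherStuffHeree129j12dls";

preg_match_all($regex, $input, $match);
// $match[0] is array("c:THIS,"), $match[1] is array("THIS").
print_r($match[1]);
```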
If I have a string like: 10/10/12/12
I'm using:
$string = '10/10/12/12';
preg_match_all('/[0-9]+\/[0-9]+/', $string, $results);
This only seems to match 10/10 and 12/12. I also want to match 10/12. Is it because once 10/10 is matched, it is removed from the picture, so after the first match it only matches things from /12/12?
If I want to match all 10/10, 10/12, 12/12, what should my regex look like? Thanks.
Edit: I did this
$arr = explode('/', $string);
$count = count($arr) - 1;
$newarr = array();
for ($i = 0; $i < $count; $i++)
{
$newarr[] = $arr[$i].'/'.$arr[$i+1];
}
I'd advise not using regular expressions. Instead, you could first split on the slash using explode, then iterate over the parts, checking for two consecutive parts that both consist only of digits.
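The explode-based approach described above could look like this; using ctype_digit() for the "only digits" check is our own choice.

```php
<?php
$string = '10/10/12/12';
$parts  = explode('/', $string);

$pairs = array();
for ($i = 0, $n = count($parts) - 1; $i < $n; $i++) {
    // Keep a pair only when both neighbouring parts are made of digits.
    if (ctype_digit($parts[$i]) && ctype_digit($parts[$i + 1])) {
        $pairs[] = $parts[$i] . '/' . $parts[$i + 1];
    }
}
print_r($pairs); // 10/10, 10/12, 12/12
```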
The reason why your regular expression doesn't work is because the match consumes the characters it matches. Searching for the next match starts from just after where the previous match ended.
If you really want to use regular expressions you can use a zero-width match such as a lookahead to avoid consuming the characters, and put a capturing match inside the lookahead.
'#[0-9]+/(?=([0-9]+))#'
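The lookahead pattern returns each pair in two pieces: the consumed part in $m[0] (for example "10/") and the captured number in group 1 (for example "10"). A sketch that glues them back together with array_map(); this last step is our addition.

```php
<?php
$string = '10/10/12/12';

// The lookahead captures the second number without consuming it,
// so the next match can start at that number.
preg_match_all('#[0-9]+/(?=([0-9]+))#', $string, $m);

// $m[0] is "10/", "10/", "12/" and $m[1] is "10", "12", "12".
$pairs = array_map(function ($left, $right) {
    return $left . $right;
}, $m[0], $m[1]);

print_r($pairs);
```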
I have a string:
$uri = "start/test/go/";
Basically, I need to know which regular expression and PHP function I can use to match the first item followed by a forward slash ("/") and remove it from the string. It should also work if the first item is not "start" but anything else, possibly containing spaces.
So all these combinations should work:
$uri = "start_my_test/test/go/";
$uri2 = "start my test/test/go/";
Then after the RegEx it should always return:
$newUri = "test/go/";
Oh, and the rest of the string could be anything as well. Basically, I want to delete everything up to and including the first forward slash.
Cheers
Use strstr to find the first occurrence of the slash in PHP.
It returns the remainder of the string starting at that slash, so you still need to drop the slash itself.
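A minimal sketch of the strstr() route; keeping the string unchanged when it contains no slash is an assumption about the desired behaviour.

```php
<?php
$uris = array("start/test/go/", "start_my_test/test/go/", "start my test/test/go/");

$results = array();
foreach ($uris as $uri) {
    $rest = strstr($uri, '/'); // "/test/go/", or false if there is no slash
    $results[] = ($rest === false) ? $uri : substr($rest, 1);
}
print_r($results); // three times "test/go/"
```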
$result = preg_replace('/^[^\/]*\//', '', $subject);
This says "start at the beginning of the string" ^, "match any number of characters that are not a forward slash" [^\/]*, then match a single forward slash \/ -- and "replace the whole matched thing with nothing" ''.
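Applied to the sample inputs from the question, the pattern behaves as described. The fourth input is a hypothetical string without any slash: it is returned unchanged because the pattern does not match.

```php
<?php
$inputs = array("start/test/go/", "start_my_test/test/go/", "start my test/test/go/", "no-slash-here");

// Strip everything up to and including the first "/"; preg_replace accepts an array subject.
$results = preg_replace('/^[^\/]*\//', '', $inputs);
print_r($results);
```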
A regex is too expensive an operation for what you need. Use strpos and substr instead (note that strpos takes the haystack first and the needle second):
$position = strpos($uri, '/');
if ($position !== false) {
    $result = substr($uri, $position + 1);
}