Parse text between 2 words - php

For sure this has already been asked by someone else, however I've searched here on SO and found nothing https://stackoverflow.com/search?q=php+parse+between+words
I have a string and want to get an array with all the words contained between 2 delimiters (2 words). I am not confident with regex so I ended up with this solution, but it is not appropriate because I need to get all the words that match those requirements and not only the first one.
$start_limiter = 'First';
$end_limiter = 'Second';
$haystack = $string;
# Step 1. Find the start limiter's position
$start_pos = strpos($haystack,$start_limiter);
if ($start_pos === FALSE)
{
die("Starting limiter ".$start_limiter." not found in ".$haystack);
}
# Step 2. Find the ending limiters position, relative to the start position
$end_pos = strpos($haystack,$end_limiter,$start_pos);
if ($end_pos === FALSE)
{
die("Ending limiter ".$end_limiter." not found in ".$haystack);
}
# Step 3. Extract the string between the starting position and ending position
# Our starting is the position of the start limiter. To find the string we must take
# the ending position of our end limiter and subtract that from the start limiter
$needle = substr($haystack, $start_pos+1, ($end_pos-1)-$start_pos);
echo "Found $needle";
I thought also about using explode() but I think a regex could be better and faster.

I'm not very familiar with PHP, but it seems to me that you can use something like:
if (preg_match("/(?<=First).*?(?=Second)/s", $haystack, $result))
print_r($result[0]);
(?<=First) looks behind for First but doesn't consume it,
.*? lazily matches everything in between First and Second,
(?=Second) looks ahead for Second but doesn't consume it,
The s at the end is to make the dot . match newlines if any.
To get all the text between those delimiters, you use preg_match_all and you can use a loop to get each element:
if (preg_match_all("/(?<=First)(.*?)(?=Second)/s", $haystack, $result)) {
    // $result[1] holds the group-1 capture of every match, one entry per match
    foreach ($result[1] as $match) {
        print_r($match);
    }
}

Not sure that the result will be faster than your code, but you can do it like this with regex:
$pattern = '~(?<=' . preg_quote($start, '~')
. ').+?(?=' . preg_quote($end, '~') . ')~si';
if (preg_match($pattern, $subject, $match))
print_r($match[0]);
I use preg_quote to escape all characters that have a special meaning in a regex (like +*|()[]{}.? and the pattern delimiter ~).
(?<=..) is a lookbehind assertion that checks for a substring before what you want to find.
(?=..) is a lookahead assertion (the same thing, for what comes after).
.+? means any character, one or more times, but as few as possible (the question mark makes the quantifier lazy).
s allows the dot to match newlines (not the default behavior).
i makes the search case insensitive (you can remove it if you don't need it).
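Since the question asks for every match rather than only the first, here is a minimal sketch that combines preg_quote with preg_match_all. The function name regex_all_between and the sample text are my own, not from the question, and it uses a capture group instead of lookarounds, so the delimiters are consumed by each match.

```php
<?php
// Hypothetical helper (name is an assumption): returns every substring
// found between $start and $end, in order of appearance.
function regex_all_between(string $subject, string $start, string $end): array
{
    // preg_quote() escapes regex metacharacters and the "~" pattern delimiter.
    $pattern = '~' . preg_quote($start, '~') . '(.*?)' . preg_quote($end, '~') . '~s';
    preg_match_all($pattern, $subject, $matches);
    // With the default PREG_PATTERN_ORDER, $matches[1] lists the group-1 captures.
    return $matches[1];
}

// Sample input invented for illustration.
$text = 'First apple Second, then First banana Second.';
print_r(regex_all_between($text, 'First', 'Second'));
```

The surrounding spaces are kept in each result; wrap the return value in array_map('trim', ...) if you don't want them.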

This allows you to run the same function with different parameters, just so you don't have to rewrite this bit of code all of the time. Also uses the strpos which you used. Has been working great for me.
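function get_string_between($string, $start, $end){
    // Locate the start delimiter; === false tells "not found" apart from position 0
    $ini = strpos($string, $start);
    if ($ini === false) return "";
    $ini += strlen($start);
    // Locate the end delimiter, searching only after the start delimiter
    $endPos = strpos($string, $end, $ini);
    if ($endPos === false) return "";
    return substr($string, $ini, $endPos - $ini);
}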
$fullstring = 'This is a long set of words that I am going to use.';
$parsed = get_string_between($fullstring, 'This', "use");
echo $parsed;
Will output:
is a long set of words that I am going to
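To collect every occurrence with the same strpos approach, you can keep searching from where the previous match ended. This is only a sketch: the function name strpos_all_between and the sample string are assumptions of mine, not part of the answer above.

```php
<?php
// Hypothetical helper (name is an assumption): collects every substring
// that sits between $start and $end, scanning left to right.
function strpos_all_between(string $string, string $start, string $end): array
{
    $found = [];
    // Empty delimiters would never advance the search, so bail out early.
    if ($start === '' || $end === '') {
        return $found;
    }
    $offset = 0;
    while (($ini = strpos($string, $start, $offset)) !== false) {
        $ini += strlen($start);
        $endPos = strpos($string, $end, $ini);
        if ($endPos === false) {
            break; // no closing delimiter left
        }
        $found[] = substr($string, $ini, $endPos - $ini);
        // Continue after the end delimiter that was just used.
        $offset = $endPos + strlen($end);
    }
    return $found;
}

// Sample input invented for illustration.
print_r(strpos_all_between('First a Second, First b Second', 'First', 'Second'));
```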

Here's a simple example for finding everything between the words 'mega' and 'yo' for the string $t.
PHP Example
$t = "I am super mega awesome-sauce, yo!";
$arr = [];
preg_match("/mega\ (.*?)\ yo/ims", $t, $arr);
echo $arr[1];
PHP Output
awesome-sauce,

You can also use two explode statements.
For example, say you want to get "z" in y=mx^z+b. To get z:
$formula="y=mx^z+b";
$z=explode("+",explode("^",$formula)[1])[0];
First I get everything after ^: explode("^",$formula)[1]
Then I get everything before +: explode("+",$previousExplode)[0]

Related

Remove s or 's from all words in a string with PHP

I have a string in PHP
$string = "Dogs are Jonny's favorite pet";
I want to use regex or some method to remove s or 's from the end of all words in the string.
The desired output would be:
$revisedString = "Dog are Jonny favorite pet";
Here is my current approach:
<?php
$string = "Dogs are Jonny's favorite pet";
$stringWords = explode(" ", $string);
$counter = 0;
foreach($stringWords as $string) {
if(substr($string, -1) == "s"){
$stringWords[$counter] = trim($string, "s");
}
if(strpos($string, "'s") !== false){
$stringWords[$counter] = trim($string, "'s");
}
$counter = $counter + 1;
}
print_r($stringWords);
$newString = "";
foreach($stringWords as $string){
$newString = $newString . $string . " ";
}
echo $newString;
?>
How would this be achieved with REGEX?
For general use, you must leverage a much more sophisticated technique than an English-ignorant regex pattern. There may be fringe cases where the following pattern fails by removing an s that it shouldn't. It could be a name, an acronym, or something else.
As an unreliable solution, you can optionally match an apostrophe then match a literal s if it is not immediately preceded by another s. Adding a word boundary (\b) on the end improves the accuracy that you are matching the end of words.
Code: (Demo)
$string = "The bass can access the river's delta from the ocean. The fishermen, assassins, and their friends are happy on the banks";
var_export(preg_replace("~'?(?<!s)s\b~", '', $string));
Output:
'The bass can access the river delta from the ocean. The fishermen, assassin, and their friend are happy on the bank'
PHP Live Regex has always helped me a lot in such moments. Even though I already know how REGEX works, I still use it sometimes just to be sure.
To make use of REGEX in your case, you can use preg_replace().
<?php
// Your string.
$string = "Dogs are Jonny's favorite pet";
// The vertical bar means "or" and the backslash
// before the apostrophe is needed so you don't end
// your pattern string since we're using single quotes
// to delimit it. "\s" matches a single whitespace character.
$regex_pattern = '/\'s\s|s\s|s$/';
// Fill the preg_replace() with the pattern, the replacement
// (a single space in this case), your string, -1 (so preg_replace()
// will replace all the matches) and a variable of your desire
// to be the "counter" (preg_replace() will automatically
// fill it).
$newString = preg_replace($regex_pattern, ' ', $string, -1, $counter);
// Use the rtrim() to remove spaces at the right of the sentence.
$newString = rtrim($newString, " ");
echo "New string: " . $newString . ". ";
echo "Replacements: " . $counter . ".";
?>
In this case, the function will identify any "'s" or "s" followed by a whitespace character (\s), or an "s" at the very end of the string, and then replace it with a single space.
The preg_replace() will also count all the replacements and register them automatically on $counter or any variable you place there instead.
Edit:
Phil's comment is right: my previous REGEX would indeed miss an "s" at the end of the string. Adding "|s$" solves it. Again, "|" means "or" and the "$" means that the "s" must be at the end of the string.
In response to mickmackusa's comment, my solution is meant only to remove "s" characters at the end of words inside the string, as this was Sparky Johnson's request here. Removing plurals would require complex code, since we would need not only to remove "s" characters from plural-only words but also to change verbs and other things.
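As a variation on the pattern above, a lookahead can check for the following whitespace without consuming it, so nothing has to be replaced with a space or trimmed afterwards. This is only a sketch with the same limitations as the other patterns here (it knows nothing about English), and the variable names are my own.

```php
<?php
$string = "Dogs are Jonny's favorite pet";
// Optional apostrophe, then "s", only when followed by whitespace or the end
// of the string. The lookahead (?=...) checks this without consuming anything.
$pattern = '/\'?s(?=\s|$)/';
$newString = preg_replace($pattern, '', $string, -1, $count);
echo "New string: " . $newString . PHP_EOL;  // New string: Dog are Jonny favorite pet
echo "Replacements: " . $count . PHP_EOL;    // Replacements: 2
```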

How to cut string from start to second last dot of the string?

I have some string, for example:
cats, e.g. Barsik, are funny. And it is true. So,
And I want to get as result:
cats, e.g. Barsik, are funny.
My try:
mb_ereg_search_init($text, '((?!e\.g\.).)*\.[^\.]');
$match = mb_ereg_search_pos();
But it gets position of second dot (after word "true").
How to get desired result?
Since a naive approach works for you, I am posting an answer. However, please note that detecting a sentence end is a very difficult task for a regex, and although it is possible to some degree, an NLP package should be used for that.
Having said that, I suggest using
'~(?<!\be\.g)\.(?=\s+\p{Lu})~ui'
The regex matches any dot (\.) that is not preceded by the whole word e.g (see the negative lookbehind (?<!\be\.g)), but that is followed by 1 or more whitespace characters (\s+) and then 1 uppercase Unicode letter (\p{Lu}).
See the regex demo
The case insensitive i modifier does not impact what \p{Lu} matches.
The u modifier is required since you are working with Unicode texts (like Russian).
To get the index of the first occurrence, use the preg_match function with the PREG_OFFSET_CAPTURE flag. Here is a slightly simplified version of the regex you supplied in the comments:
preg_match('~(?<!т\.н)(?<!т\.к)(?<!e\.g)\.(?=\s+\p{L})~iu', $text, $match, PREG_OFFSET_CAPTURE);
Note that the lookbehinds are checked one by one, at the same location in the string, so you do not have to additionally group them inside a positive lookahead. See the regex demo.
IDEONE demo:
$re = '~(?<!т\.н)(?<!т\.к)(?<!e\.g)\.(?=\s+\p{L})~iu';
$str = "cats, e.g. Barsik, are funny. And it is true. So,";
preg_match($re, $str, $match, PREG_OFFSET_CAPTURE);
echo $match[0][1];
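The demo above only prints the offset of the dot. To get the text the question asks for, that offset can be passed to substr. A minimal sketch, reusing the same pattern and sample string; the $cut variable is my own name. PREG_OFFSET_CAPTURE reports byte offsets even with the u modifier, which is what the byte-based substr expects.

```php
<?php
$re = '~(?<!т\.н)(?<!т\.к)(?<!e\.g)\.(?=\s+\p{L})~iu';
$str = "cats, e.g. Barsik, are funny. And it is true. So,";
$cut = $str; // fall back to the whole string when no sentence end is found
if (preg_match($re, $str, $match, PREG_OFFSET_CAPTURE)) {
    // $match[0][1] is the byte offset of the matched dot; +1 keeps the dot.
    $cut = substr($str, 0, $match[0][1] + 1);
}
echo $cut; // cats, e.g. Barsik, are funny.
```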
Here are two approaches to get substring from start to second last . position of the initial string:
using strrpos and substr functions:
$str = 'cats, e.g. Barsik, and e.g. Lusya are funny. And it is true. So,';
$len = strlen($str);
$str = substr($str, 0, (strrpos($str, '.', strrpos($str, '.') - $len - 1) - $len) + 1);
print_r($str); // "cats, e.g. Barsik, and e.g. Lusya are funny."
using array_reverse, str_split and array_search functions:
$str = 'cats, e.g. Barsik, and e.g. Lusya are funny. And it is true. So,';
$parts = array_reverse(str_split($str));
$pos = array_search('.', $parts) + 1;
$str = implode("", array_reverse(array_slice($parts, array_search('.', array_slice($parts, $pos)) + $pos)));
print_r($str); // "cats, e.g. Barsik, and e.g. Lusya are funny."
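If the nested offsets in the strrpos version are hard to follow, the same idea can be written in two steps: find the last dot, then find the last dot in whatever comes before it. A sketch only; the function name cut_at_second_last_dot is an assumption of mine.

```php
<?php
// Hypothetical helper (name is an assumption): keeps everything up to and
// including the second-to-last "." and returns the input unchanged when
// there are fewer than two dots.
function cut_at_second_last_dot(string $str): string
{
    $last = strrpos($str, '.');
    if ($last === false) {
        return $str;
    }
    // Search again, but only in the part that precedes the last dot.
    $secondLast = strrpos(substr($str, 0, $last), '.');
    if ($secondLast === false) {
        return $str;
    }
    return substr($str, 0, $secondLast + 1);
}

$str = 'cats, e.g. Barsik, and e.g. Lusya are funny. And it is true. So,';
print_r(cut_at_second_last_dot($str)); // "cats, e.g. Barsik, and e.g. Lusya are funny."
```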

php regex replace each character with asterisk

I am trying to do something like this:
hiding usernames except for the first 3 characters.
EX)
apple -> app**
google -> goo***
abc12345 -> abc*****
I am currently using php like this:
$string = "abcd1234";
$regex = '/(?<=^(.{3}))(.*)$/';
$replacement = '*';
$changed = preg_replace($regex,$replacement,$string);
echo $changed;
and the result is:
abc*
But I want to replace every single character except for the first 3, like:
abc*****
How should I do this?
Don't use regex, use substr_replace:
$var = "abcdef";
$charToKeep = 3;
echo strlen($var) > $charToKeep ? substr_replace($var, str_repeat ( '*' , strlen($var) - $charToKeep), $charToKeep) : $var;
Keep in mind that regexes are good for matching patterns in strings, but there are a lot of functions already designed for string manipulation.
Will output:
abc***
Try this function. You can specify how many characters should be visible and which character should be used as the mask:
$string = "abcd1234";
echo hideCharacters($string, 3, "*");
function hideCharacters($string, $visibleCharactersCount, $mask)
{
if(strlen($string) < $visibleCharactersCount)
return $string;
$part = substr($string, 0, $visibleCharactersCount);
return str_pad($part, strlen($string), $mask, STR_PAD_RIGHT);
}
Output:
abc*****
Your regex matches all the characters after the first 3 as a single match, so the whole run is replaced with one hard-coded *.
You can use
'~(^.{3}|(?!^)\G)\K.~'
And replace with *. See the regex demo
This regex matches the first 3 characters (with ^.{3}) or the end of the previous successful match, but not the start of the string (with (?!^)\G), then omits the characters matched so far from the match value (with \K) and matches any character but a newline with the final . (dot).
See IDEONE demo
$re = '~(^.{3}|(?!^)\G)\K.~';
$strs = array("aa","apple", "google", "abc12345", "asdddd");
foreach ($strs as $s) {
$result = preg_replace($re, "*", $s);
echo $result . PHP_EOL;
}
Another possible solution is to concatenate the first three characters with a string of * repeated the correct number of times:
$text = substr($string, 0, 3).str_repeat('*', max(0, strlen($string) - 3));
max() is needed to keep str_repeat() from receiving a negative argument, which happens when the length of $string is less than 3. A negative count raises a warning in PHP 7 and throws a ValueError in PHP 8.
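Wrapped in a function, the substr plus str_repeat approach can be checked against the examples from the question. This is just a sketch; the name mask_tail and its default arguments are my own choices. It counts bytes, so it assumes single-byte (ASCII) usernames.

```php
<?php
// Hypothetical helper (name and defaults are assumptions): keeps the first
// $visible characters and replaces every remaining character with $mask.
function mask_tail(string $string, int $visible = 3, string $mask = '*'): string
{
    // max() keeps the repeat count at zero for inputs shorter than $visible.
    $hidden = max(0, strlen($string) - $visible);
    return substr($string, 0, $visible) . str_repeat($mask, $hidden);
}

foreach (['apple', 'google', 'abc12345', 'ab'] as $name) {
    echo $name . ' -> ' . mask_tail($name) . PHP_EOL;
}
// apple -> app**
// google -> goo***
// abc12345 -> abc*****
// ab -> ab
```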

Find and replace starting from the end [duplicate]

Closed 10 years ago.
Possible Duplicate:
Preg Replace - replace second occurance of a match
I have a string that includes the word rules twice. I need to find and replace the 2nd word. Tried fooling around with str_replace() but couldn't get anything, the 4th parameter wasn't what I expected.
Here is an example string:
http://localhost/proj1/modstart/admin/index.php?i=rules&sid=397ab1f6b8eb8a17787438a7e2e60ea3&mode=rules
After my replace it should look like this:
http://localhost/proj1/modstart/admin/index.php?i=rules&sid=397ab1f6b8eb8a17787438a7e2e60ea3&mode=manage
I read that preg_replace() could help, but I don't know how to write patterns.
Ideas?
P.S: Don't suggest splitting the string into two variables, that wouldn't serve my needs.
It would be a very good idea to learn about regular expressions. In PHP, you can accomplish your find/replace like this:
$result = preg_replace('/(rules.*?)rules/','$1manage',$str,1);
It basically finds "rules" once, then anything, then rules a second time, then puts it all back before the second match and replaces the word.
To not use a regular expression, and also to not store anything into a second variable, you could use str_replace() with a little magic from strpos():
$string = substr($string, 0, strpos($string, 'rules') + 5) . str_replace('rules', 'whatever', substr($string, strpos($string, 'rules') + 5));
This will take the full string up-to the end of the first instance of rules and then do the string-replacement on the second-part of the string which will contain any other instance of the word.
The same thing, but a little cleaner (yes, by using a second variable):
$pos = strpos($string, 'rules') + 5;
$string = substr($string, 0, $pos) . str_replace('rules', 'whatever', substr($string, $pos));
If the word to find+replace is dynamic or you want to use a different word on different pages, you could make that a variable, like this:
$findMe = 'rules';
$replaceWith = 'whatever';
$pos = strpos($string, $findMe) + strlen($findMe);
$string = substr($string, 0, $pos) . str_replace($findMe, $replaceWith, substr($string, $pos));
You should use a regex:
$new = preg_replace('/\brules\b(?!.*\brules\b)/', 'manage', $old);
It is a good idea to use word boundaries \b, so it will not match larger strings that contain "rules", such as "pseudorules".
The negative lookahead (?!.*\brules\b) ensures there is no other word "rules" later in the string, so the one you are replacing is the last one.
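If all you need is the last occurrence and word boundaries don't matter, strrpos with substr_replace does the same job without a regex. A minimal sketch using the URL from the question; the variable names are mine.

```php
<?php
$url = 'http://localhost/proj1/modstart/admin/index.php?i=rules&sid=397ab1f6b8eb8a17787438a7e2e60ea3&mode=rules';
$find = 'rules';
$replaceWith = 'manage';

// strrpos() returns the position of the LAST occurrence, or false if absent.
$pos = strrpos($url, $find);
if ($pos !== false) {
    // Replace exactly strlen($find) bytes starting at that position.
    $url = substr_replace($url, $replaceWith, $pos, strlen($find));
}
echo $url;
```

Unlike the \b version, this also matches "rules" inside a longer word, so pick whichever fits your data.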

Replace after a needle in a string?

I have a string, something like
bbbbabbbbbccccc
Is there any way to replace all the letters "b" after the single letter "a" with "c" without having to split the string, using PHP?
bbbbacccccccccc
odd question.
echo preg_replace('/a(.*)$/e', "'a'.strtr($1, 'b', 'c')", 'bbbabbbbbccccc');
preg_replace matches everything to the right of 'a' with the regex. The e modifier in the regex evaluates the replacement string as code. The code in the replacement string uses strtr() to replace 'b's with 'c's.
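Note that the e modifier was deprecated in PHP 5.5 and removed in PHP 7, so the one-liner above no longer runs on current versions. A rough equivalent with preg_replace_callback, assuming the same input as the question (variable names are mine):

```php
<?php
$input = 'bbbbabbbbbccccc';
// Match the first "a" plus everything after it; the callback rewrites the tail.
$output = preg_replace_callback(
    '/a(.*)$/s',
    function (array $m): string {
        // $m[1] is the text after the "a"; strtr() swaps each "b" for "c".
        return 'a' . strtr($m[1], 'b', 'c');
    },
    $input
);
echo $output; // bbbbacccccccccc
```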
Here are three options.
First, a split. Yes, I know you want to do it without a split.
$string = 'bbbbabbbbbccccc';
$parts = preg_split('/(a)/', $string, 2, PREG_SPLIT_DELIM_CAPTURE);
// Parts now looks like:
// array('bbbb', 'a', 'bbbbbccccc');
$parts[2] = str_replace('b', 'c', $parts[2]);
$correct_string = join('', $parts);
Second, a position search and a substring replacement.
$string = 'bbbbabbbbbccccc';
$first_a_index = strpos($string, 'a');
if($first_a_index !== false) {
// Now, grab everything from that first 'a' to the end of the string.
$replaceable = substr($string, $first_a_index);
// Replace it.
$replaced = str_replace('b', 'c', $replaceable );
// Now splice it back in
$string = substr_replace($string, $replaced, $first_a_index);
}
Third, I was going to post a regex, but the one dqhendricks posted is just as good.
These code examples are verbose for clarity, and can be reduced to one-or-two-liners.
$s = 'bbbbabbbbbccccc';
echo preg_replace('/((?:(?!\A)\G|(?<!a)a(?!a))[^b]*)b/', '$1c', $s);
\G matches the position where the previous match ended. On the first match attempt, \G matches the beginning of the string like \A. We don't want that, so we use (?!\A) to prevent it.
(?<!a)a(?!a) matches an a that's neither preceded nor followed by an a. The a is captured in group #1 so we can plug it back into the replacement with $1.
This is a "pure regex" solution, meaning it does the whole job in one call to preg_replace and doesn't rely on embedded code and the /e modifier. It's good to know in case you ever find yourself working within those constraints, but it definitely shouldn't be your first resort.
