Regex to match sentences with at least n words

I'm trying to pull all sentences from a text that consist of, say, at least 5 words in PHP. Assuming sentences end with a full stop, question mark or exclamation mark, I came up with this:
/[\w]{5,*}[\.|\?|\!]/
Any idea what's wrong?
Also, what needs to be done for this to work with UTF-8?

\w only matches a single character. A single word would be \w+. If you need at least 5 words, you could do something like:
/(\w+\s){4,}\w+[.?!]/
i.e. at least 4 words followed by spaces, followed by another word followed by a sentence delimiter.

I agree with the solution posted here. If you're using the preg functions in PHP, you can add the 'u' pattern modifier to make this work with UTF-8, for example /(\w+\s){4,}\w+[.?!]/u
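Putting the two answers together, a minimal sketch of extracting the matching sentences with preg_match_all might look like the following. The function name sentences_with_min_words and the sample texts are made up for illustration. It assumes words are separated by single whitespace characters and contain no commas or apostrophes, since the pattern stops at those.

```php
<?php
// Hypothetical helper (not from the answers above): returns every sentence
// with at least $minWords words. Assumes sentences end in ., ? or ! and
// that words are plain runs of \w characters.
function sentences_with_min_words(string $text, int $minWords = 5): array
{
    // ($minWords - 1) words each followed by whitespace, then one more word
    // and a sentence delimiter. The u modifier treats the input as UTF-8.
    $pattern = '/(\w+\s){' . ($minWords - 1) . ',}\w+[.?!]/u';
    preg_match_all($pattern, $text, $matches);
    return $matches[0]; // index 0 holds the full matches
}

$ascii = sentences_with_min_words(
    'this is a more than five word sentence. But this is not. Is this one long enough then? No!'
);
$utf8 = sentences_with_min_words('Zu kurz. Größe und Länge sind hier völlig egal.');

print_r($ascii);
print_r($utf8);
```

The source file has to be saved as UTF-8 for the second call to behave as intended.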

A method without regex:
$str = "this is a more than five word sentence. But this is not. Neither this. NO";
$sentences = explode(".", $str);
foreach ($sentences as $s) {
    $words = explode(' ', $s);
    // Drop the empty strings produced by leading or repeated spaces, then
    // require at least 5 words.
    if (count(array_filter($words, 'is_notempty')) >= 5) {
        echo "Found matching sentence : $s" . "<br/>";
    }
}
function is_notempty($x)
{
    // Compare against '' so that a word such as "0" still counts.
    return $x !== '';
}
This outputs:
Found matching sentence : this is a more than five word sentence

Related

Replace whole word only with or without regex

I am trying to replace occurrences of whole words in a string. There are similar questions here at SO like this and this.
Answers to all these questions recommend using regex like this:
$needle = "a";
$haystack = "oh my dear a";
$haystack = preg_replace("/$needle\b/", 'God', $haystack);
echo $haystack;
This works well for whole words: it echoes oh my dear God
But if I replace a with a. in both needle and haystack, i.e.
$needle = "a.";
$haystack = "oh my dear a.";
the output becomes oh my deGod a. because . gets evaluated as regex.
I want a. to be replaced by God, with or without regex.

\b only recognises word boundaries in the ASCII sense. Also, . has a special meaning in regular expressions: it matches any single character (except newline).
If the "needle" may contain special characters, use preg_quote() and build your own boundaries.
preg_quote() takes a string and puts a backslash in front of every character that is part of the regular expression syntax. This is useful if you have a run-time string that you need to match in some text and the string may contain special regex characters.
$str = preg_replace("~(?<=^| )" . preg_quote($needle, "~") . "(?= |$)~", 'God', $str);
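As a rough sketch, the same pattern can be wrapped in a small function and run against the strings from the question. The name replace_whole_word is hypothetical, and the sketch assumes that "whole word" means delimited by a single space or by the start or end of the string.

```php
<?php
// Hypothetical helper: replaces $needle only where it is delimited by
// spaces or the string boundaries, treating $needle as literal text.
function replace_whole_word(string $needle, string $replacement, string $haystack): string
{
    // preg_quote() escapes regex metacharacters and the "~" delimiter.
    $pattern = '~(?<=^| )' . preg_quote($needle, '~') . '(?= |$)~';
    return preg_replace($pattern, $replacement, $haystack);
}

$result = replace_whole_word('a.', 'God', 'oh my dear a.');
echo $result, "\n";
```

Single-quoted strings are used for the pattern pieces so that the $ anchor cannot be mistaken for the start of a PHP variable.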

Maybe this will give you an inspiration...
$haystack="oh my dear a." ;
$needle="a" ;
$hout=$haystack ;
$hout=~ s/\b$needle\b\./God/g ;
print "$haystack $needle $hout\n";
...produces this output...
oh my dear a. a oh my dear God
This works in Perl. Sorry, my PHP is way rusty.

Replace/regex: Put span tags around numbers that are followed by %

I'm trying to surround all numbers in a string (numbers that are followed by a % sign) with span tags
Here are some regex noob attempts at solving it:
$str = preg_replace("/(1-9+)%/", "<span>$1</span>", $str);
or
$str = preg_replace("/([1-9]+)%/", "<span>$1</span>", $str);
Nothing is replaced
I bet I have got it all wrong.. I need to learn regex more, I know
But can you help me further with this?

Your regex is almost correct but there is one crucial glitch. You are matching [1-9] instead of [0-9].
EDIT: Using preg_replace_callback to generate a random number. The following should work:
echo preg_replace_callback('/([0-9]+)%/', function ($m) {
    return '<span>' . rand($m[1] - 5, $m[1] + 5) . '%</span>';
}, 'hello 100% hello');

After trying some more, this works:
$str = preg_replace('/[0-9]{1,3}%/', '<span>$0</span>', $str);
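The actual mistake was using 1-9 instead of 0-9, so a number ending in zero, such as 100, never matched. Using () and $1 was not wrong in itself, but $1 holds only the digits, while $0 is the whole match, so with $0 the % ends up inside the span.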
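To make the difference between $1 and $0 concrete, here is a small sketch that runs both variants on a made-up sample string:

```php
<?php
$str = 'Sales rose 100% and costs fell 5%.';

// $1 is the first capture group (digits only), so the % stays outside the span.
$withGroup = preg_replace('/([0-9]+)%/', '<span>$1</span>%', $str);

// $0 is the entire match, so the % is wrapped together with the digits.
$wholeMatch = preg_replace('/[0-9]+%/', '<span>$0</span>', $str);

echo $withGroup, "\n", $wholeMatch, "\n";
```

Both forms are valid; which one fits depends on whether the % sign should be inside the tag.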

Regular Expression - php - getting spaces not preceded and not followed by a word

Having something like this:
'This or is or some or information or stuff or attention here or testing'
I want to capture all the [spaces] that are neither preceded nor followed by the word or.
I reached this, I think I'm on the right track.
/\s(?<!(\bor\b))\s(?!(\bor\b))/
or this
/(?=\s(?<!(\bor\b))(?=\s(?!(\bor\b))))/
I'm not getting all the spaces, though. What is wrong with this? (The second one was an attempt to get the "and" condition working.)

Try this:
<?php
$str = 'This or is or some or information or stuff or attention is not here or testing';
$matches = null;
preg_match_all('/(?<!\bor\b)[\s]+(?!\bor\b)/', $str, $matches);
var_dump($matches);
?>

How about (?<!or)\s(?!or):
$str='This or is or some or information or stuff or attention here or testing';
echo preg_replace('/(?<!or)\s(?!or)/','+',$str);
>>> This or is or some or information or stuff or attention+here or testing
This uses negative lookbehind and lookahead. Because it only checks for the letters or, it also skips the space in Tor operator, for example. If you want to test for the whole word or only, put the surrounding spaces inside the lookarounds:
$str='Tor operator';
echo preg_replace('/(?<!\sor)\s(?!or\s)/','+',$str);
>>> Tor+operator

Code:
$string = "You may organize to find or seek a neighbor or a pastor in a harbor or orchard.";
echo preg_replace('~(?<!\bor) (?!or\b)~', '_', $string);
Output:
You_may_organize_to_find or seek_a_neighbor or a_pastor_in_a_harbor or orchard.
Effectively the pattern says:
Match every space IF:
the space is not preceded by the full word "or" (a word that ends in "or" doesn't count), and
the space is not followed by the full word "or" (a word that begins with "or" doesn't count)
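As a quick check of those two rules, the sketch below applies the same pattern to two short strings taken from the other examples in this thread (the variable names are arbitrary):

```php
<?php
$pattern = '~(?<!\bor) (?!or\b)~';

// "Tor" ends in "or" and "operator" starts with "o", but neither is the
// whole word "or", so the space between them is replaced.
$a = preg_replace($pattern, '_', 'Tor operator');

// The spaces on either side of the standalone "or" are left alone.
$b = preg_replace($pattern, '_', 'attention here or testing');

echo $a, "\n", $b, "\n";
```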

Turn first character of str capitalized but ignore one or two letter words

I am a newbie and this is a tough one for me.
I have a text inside a variable:
$bio = 'text, text, tex ...';
I can use the ucfirst PHP function to make a word in the text start with an uppercase letter.
The problem is I don't want words with one, two or three letters to be capitalized because it would look unprofessional.
IMPORTANT: I also want to keep the letter "I" capitalized since that is proper English grammar.
So a text like:
this is a text without ucfirst function and i think it needs some capitalizing
Would look like:
This is a Text Without Ucfirst Function and I Think it Needs Some Capitalizing
Any ideas?

This will capitalize any word (sequence of English letters) that is 4 or more letters long:
$bio = preg_replace_callback('/[a-z]{4,}|\bi\b/i', function($match){
return ucfirst($match[0]);
}, $bio);
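For PHP versions before 5.3 (note that create_function() was deprecated in PHP 7.2 and removed in PHP 8.0, so this variant is only for legacy code):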
$bio = preg_replace_callback('/[a-z]{4,}|\bi\b/i',
create_function('$match', 'return ucfirst($match[0]);'), $bio);
It leaves shorter words such as "and" or "add" unchanged, keeps I as is, and capitalizes a standalone i.
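For reference, here is a sketch of the closure version applied to the sample sentence from the question:

```php
<?php
$bio = 'this is a text without ucfirst function and i think it needs some capitalizing';

// Capitalize runs of 4 or more letters, plus a standalone "i".
$result = preg_replace_callback('/[a-z]{4,}|\bi\b/i', function ($match) {
    return ucfirst($match[0]);
}, $bio);

echo $result, "\n";
```

Note that [a-z] covers English letters only, so accented words would need a different character class.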

I would use regex. If you don't want to, you could use split, then iterate over the tokens and use a bunch of if-else, but regex is cooler. ;)
There is a very similar question here: Regex capitalize first letter every word, also after a special character like a dash

Mmmm so?
$bio_x = explode(" ", $bio);
if (strlen($bio_x[0]) > 3) $bio = ucfirst($bio);
$bio = str_replace(" i ", " I ", $bio);

Regex replace one or two letter words

I am trying to remove one- or two-letter words from a string. Please consider this regex
$str = 'I haven\'t got much time to spend!';
echo preg_replace('/\b([a-z0-9]{1,2})\b/i','',$str);
returns: haven' got much time spend!
expected output: haven't got much time spend!
My goal is to remove any words that are one or two characters long from a string. They can consist of alphanumeric or special characters.

Use lookarounds:
preg_replace('/(?<!\S)\S{1,2}(?!\S)/', '', $str)
However, this leaves double whitespace where words are removed. To also remove the spaces you could try something like:
preg_replace('/\s+\S{1,2}(?!\S)|(?<!\S)\S{1,2}\s+/', '', $str)
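A short sketch comparing the two patterns on the string from the question (the variable names are only for illustration):

```php
<?php
$str = "I haven't got much time to spend!";

// Removes only the short tokens, leaving the surrounding spaces behind.
$tokensOnly = preg_replace('/(?<!\S)\S{1,2}(?!\S)/', '', $str);

// Removes each short token together with the whitespace on one side of it.
$withSpaces = preg_replace('/\s+\S{1,2}(?!\S)|(?<!\S)\S{1,2}\s+/', '', $str);

var_dump($tokensOnly, $withSpaces);
```

The first result keeps a leading space and a double space where "I" and "to" used to be, while the second matches the expected output from the question.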

Just use:
echo preg_replace('/(?<!\S)\S{1,2}(?!\S)/i', '', 'a dljlj-b2 adl xy zq a');
The output is as wanted:
dljlj-b2 adl
So don't forget to handle the beginning and end of the string with negative assertions.
