Match beginning of string in full words


I am using the startsWith() function from the question "startsWith() and endsWith() functions in PHP", but I want it to match only full words.
Currently it will match all of the following:
hi
high
hiho
But I want it to match only "hi", not the other two words, when the input is for example:
hi there

You can match it with this regular expression: /^hi$|^hi\s|\shi\s|\shi$/
$test = ['hi', 'hi there', 'high', 'hiho'];
$pattern = '/^hi$|^hi\s|\shi\s|\shi$/';
$matches = [];
foreach ($test as $t) {
    var_dump($t);
    preg_match($pattern, $t, $matches);
    var_dump($matches);
}
Parts explained:
^hi$ - your string is exactly "hi"
^hi\s - your string starts with "hi" followed by whitespace: "hi "
\shi\s - there is a " hi " somewhere in your string
\shi$ - your string ends with " hi"
Those parts are joined with the pipe "|", which in regex means "or", so the entire expression matches if any one of the parts matches.
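Since the question is only about the start of the string, the first two alternatives are the ones that matter there. A minimal sketch of a whole-word variant of startsWith() follows, assuming a word ends at whitespace or at the end of the string. The helper name startsWithWord is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper: true when $haystack begins with $needle as a whole word.
function startsWithWord(string $haystack, string $needle): bool
{
    // preg_quote() escapes any regex metacharacters in the needle.
    // (?!\S) requires whitespace or the end of the string right after the needle.
    $pattern = '/^' . preg_quote($needle, '/') . '(?!\S)/';
    return preg_match($pattern, $haystack) === 1;
}

var_dump(startsWithWord('hi there', 'hi')); // bool(true)
var_dump(startsWithWord('high', 'hi'));     // bool(false)
```

If punctuation such as "hi, there" should also count as a word end, replacing (?!\S) with \b is one option, provided the needle itself ends in a word character.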

If you are testing a whole text for the word "hi", try this:
<?php
preg_match_all('#hi\s#i',
"hi me
hi there
high
highlander
historic
hire",
$matches);
var_dump($matches);
Test it - modify here: https://regex101.com/r/tV3jR6/1

Related

preg replace would ignore non-letter characters when detecting words

I have an array of words and a string, and I want to add a hashtag to each word in the string that has a match in the array. I use this loop to find and replace the words:
foreach($testArray as $tag){
$str = preg_replace("~\b".$tag."~i","#\$0",$str);
}
Problem: let's say I have the words "is" and "isolated" in my array. I get ##isolated in the output, which means the word "isolated" is matched once for "is" and once for "isolated". The pattern ignores the fact that "#isolated" no longer starts with "is"; it now starts with "#".
The following is only an example; I don't want to solve just this one case but every other possibility as well:
$str = "this is isolated is an example of this and that";
$testArray = array('is','isolated','somethingElse');
Output will be:
this #is ##isolated #is an example of this and that

You may build a regex with an alternation group enclosed with word boundaries on both ends and replace all the matches in one pass:
$str = "this is isolated is an example of this and that";
$testArray = array('is','isolated','somethingElse');
echo preg_replace('~\b(?:' . implode('|', $testArray) . ')\b~i', '#$0', $str);
// => this #is #isolated #is an example of this and that
The regex will look like
~\b(?:is|isolated|somethingElse)\b~
If you want to make your own approach work, you can add a negative lookbehind after \b: "~\b(?<!#)".$tag."~i","#\$0". The lookbehind rejects every match that is preceded by #.
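If the words in the array can contain regex metacharacters, the single-pass alternation is safer with each word escaped first. This is a minimal sketch under that assumption; the function name hashtagWords is hypothetical.

```php
<?php
// Hypothetical helper: prefix every whole-word occurrence of a tag with "#".
function hashtagWords(string $text, array $tags): string
{
    if ($tags === []) {
        return $text; // nothing to look for
    }
    // Escape each tag so that characters such as "." or "+" are matched literally.
    $escaped = array_map(function ($tag) {
        return preg_quote($tag, '~');
    }, $tags);
    // One alternation, word boundaries on both ends, one pass over the text.
    $pattern = '~\b(?:' . implode('|', $escaped) . ')\b~i';
    return preg_replace($pattern, '#$0', $text);
}

$str = "this is isolated is an example of this and that";
echo hashtagWords($str, array('is', 'isolated', 'somethingElse'));
// => this #is #isolated #is an example of this and that
```

Note that \b only behaves as expected when a tag starts and ends with a word character.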

Another way to do that is to split your string into words and to build an associative array from your original array of words (to avoid calling in_array):
$str = "this is isolated is an example of this and that";
$testArray = array('is','isolated','somethingElse');
$hash = array_flip(array_map('strtolower', $testArray));
$parts = preg_split('~\b~', $str);
for ($i=1; $i<count($parts); $i+=2) {
$low = strtolower($parts[$i]);
if (isset($hash[$low])) $parts[$i-1] .= '#';
}
$result = implode('', $parts);
echo $result;
This way, your string is processed only once, whatever the number of words in your array.

Regex matches new lines contains text after line contains only specific characters

I have the text of book pages that may have footnotes at the end of the string, as in the following example:
والخاتِم بكسر التاء اسم فاعل، فكأنه قد جاء آخر الرسل، والخاتَم بفتح التاء اسم آلة، كأنه قد ختمت به الرسالة.
__________
(1) - سورة الأحزاب آية : 43.
(2) - سورة البقرة آية : 157.
(3) - سورة الأنعام آية : 17.
(4) - سورة الكهف آية : 19.
The separator line in the sample is made of underscores _ (what I call Kashidas; they are not dashes -). What I need is to match the four lines, or however many lines there are, under that line.
What I have tried matches only the first line under that line: /_.*\n*(.*)/gum. The only way to get them all is to repeat the pattern portion \n*(.*) n times, where n is the number of footnote lines (four in this example), and that is not a practical solution.

You can utilize the \G anchor here:
preg_match_all('~(?:\G(?!^)|_)\R+\K[^\n]+~', $str, $matches);
print_r($matches[0]);
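To make the pieces of that pattern easier to follow, here is the same call on a short stand-in string with each part commented. The sample text is an assumption made for illustration, not the asker's data.

```php
<?php
// Stand-in input: body text, a separator line of underscores, then two footnotes.
$str = "Body text.\n__________\n(1) - first note.\n(2) - second note.";

// \G(?!^)  continue exactly where the previous match ended (but not at offset 0)
// |_       or start from an underscore; only the last one is followed by a line break
// \R+      one or more line breaks
// \K       discard everything matched so far from the reported match
// [^\n]+   the rest of the current line
preg_match_all('~(?:\G(?!^)|_)\R+\K[^\n]+~', $str, $matches);

$footnotes = $matches[0];
print_r($footnotes);
```

Each match ends at the end of a footnote line, so the \G branch lets the next match start there and chain on to the following line until no line break follows.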

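It is not easy to capture the separator line and then every footnote line in a single pass. What you can do instead is capture everything after the separator line first, and then match each footnote within that capture.
You can do that with these two patterns, applied in that order:
/_{4,}.+/gums
/(\(.*?\.)*/gums
(The g flag is regex tester notation; PHP has no g modifier, so use preg_match_all for the second step.)
I hope that is good enough for you.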

I just tested this successfully:
$text = "_________\r\n\r\nLine 1\r\nLine 2\r\nLine 3\r\n";
$matches = array();
$pattern = '/_+\r\n\r\n(.+)/s'; // s to have . match newlines.
// Change \r\n to \n if appropriate
// Extract all footnotes
preg_match($pattern, $text, $matches);
$footnotes = $matches[1]; // $matches[0] is the whole matched string,
// $matches[1] is the part within ()
$matches = array();
$pattern = '/(.+)/'; // Don't match newlines here
// Extract individual footnotes
preg_match_all($pattern, $footnotes, $matches);
foreach ($matches[0] as $match) { // preg_match_all returns multi-dimensional array
// Do something with each footnote
}

Using regex in php to find string values between characters multiple times

Here I have a string, "Hello World! I am trying out regex in PHP!". What I want to do is retrieve string values between a set of characters. In this example, the characters are ** **
$str = "**Hello World!** I am trying out regex in PHP!";
preg_match('#\*\*(.*)\*\*#Us', $str, $match);
echo $match[1];
This will echo out "Hello World!", but I want to echo out several matches:
$str = "**Hello World!** I am trying out **regex in PHP!**";
How would I be able to do so? I tried using preg_match_all() but I don't think I was using it properly, or that it would work at all in this case.

You can use:
$str = "**Hello World!** I am trying out **regex in PHP!**";
preg_match_all('/\*{2}([^*]*)\*{2}/', $str, $m);
print_r($m[1]);
Array
(
[0] => Hello World!
[1] => regex in PHP!
)
Even your regex #\*\*(.*)\*\*#Us should work with preg_match_all, but my suggested regex is a little more efficient because of the negated character class [^*]*.

You got only one match because you used preg_match; you should use preg_match_all. Here is another pattern. It uses [\w\W] (a word or non-word character, that is, any character at all) between the delimiters:
<?php
$str = "**Hello World!** I am trying out **regex in PHP!**";
$regex='/\*\*([\w\W]*)\*\*/iU';
preg_match_all($regex, $str, $m);
print_r($m[1]);

I suggest using a non-greedy quantifier, because I think you also want to match contents (the text inside ** **) that contain a single *.
$str = "**Hello World!** I am trying out **regex in PHP!**";
preg_match_all('~\*\*(.*?)\*\*~', $str, $matches);
print_r($matches[1]);
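The difference between the negated class and the non-greedy quantifier shows up as soon as the delimited text contains a single *. A small comparison follows; the input string is made up for illustration.

```php
<?php
// Made-up input: the first delimited section contains a single asterisk.
$str = "**a*b** and **c**";

// Negated class: the captured text may not contain any "*".
preg_match_all('/\*{2}([^*]*)\*{2}/', $str, $m1);
$strict = $m1[1];

// Non-greedy dot: captures up to the nearest closing "**".
preg_match_all('~\*\*(.*?)\*\*~', $str, $m2);
$lazy = $m2[1];

print_r($strict); // one element: " and " (the text between the two sections)
print_r($lazy);   // two elements: "a*b" and "c"
```

With the negated class, the first section cannot match, so the closing ** of the first section is paired with the opening ** of the second. That is worth keeping in mind when choosing between the two.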

regex for matching three specific character

While attempting a question on SO, I tried to write a regular expression that matches words containing all three of the given characters.
I am following the answer to "Regular Expressions: Is there an AND operator?"
<?php
$words = "systematic,gear,synthesis,mysterious";
$words=explode(",",$words);
$your_array = preg_grep("/^(^s|^m|^e)/", $words);
print_r($your_array);
?>
The output should be systematic and mysterious, but I am getting synthesis as well.
Why is that? What am I doing wrong?
** I don't want a new solution :)

You can do this:
$wordlist = 'systematic,gear,synthesis,mysterious';
$words = explode(',', $wordlist);
foreach($words as $word) {
if (preg_match('~(?=[^s]*s)(?=[^m]*m)(?=[^e]*e)~', $word))
echo '<br/>' . $word;
}
//or
$res = preg_grep('~(?=[^s]*s)(?=[^m]*m)(?=[^e]*e)~', $words);
print_r($res);
To test for the presence of a character in the string, I use (?=[^s]*s).
[^s]*s means: anything that is not an "s", zero or more times, followed by an "s".
(?=..) is a lookahead assertion and means "followed by". It is only a check: a lookahead contributes no characters to the match result, and the main benefit of this feature is that you can check the same substring several times.
What is wrong with your pattern?
/^(^s|^m|^e)/ gives you only the words that begin with "s" or "m" or "e", because ^ is an anchor that means "start of the string". In other words, your pattern is the same as /^([sme])/.
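The same lookahead idea can be generated for any list of required characters. This is a rough sketch, assuming single-byte characters; the function name wordsContainingAll is hypothetical. It anchors the lookaheads at the start of the string so that each one is tried only once per word.

```php
<?php
// Hypothetical helper: keep only the words that contain every character in $chars.
function wordsContainingAll(array $words, array $chars): array
{
    $pattern = '~^';
    foreach ($chars as $c) {
        // preg_quote() keeps characters such as "]" or "^" literal inside the class.
        $q = preg_quote($c, '~');
        // One lookahead per required character: (?=[^c]*c)
        $pattern .= '(?=[^' . $q . ']*' . $q . ')';
    }
    $pattern .= '~';
    // preg_grep() preserves the original keys; array_values() reindexes the result.
    return array_values(preg_grep($pattern, $words));
}

$words = explode(',', 'systematic,gear,synthesis,mysterious');
print_r(wordsContainingAll($words, array('s', 'm', 'e')));
// Array ( [0] => systematic [1] => mysterious )
```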

How to use regex to delete everything except some words? [duplicate]

This question already has an answer here:
Closed 10 years ago.
Possible Duplicate:
Regular expression: match all words except
I need your help for using Regex in PHP to negate a selection. So I have a string like this :
"Hello my name is tom"
What I need to do is delete everything from this string which is not "tom" or "jack" or "alex", so I tried:
$MyString = "Hello my name is tom";
print_r(preg_replace('#^tom|^jack|^alex#i', '', $MyString));
But it's not working...
Can you help me with that ?
Thanks

If you want to delete everything except something, maybe it is better done the other way around: capture only that something. For example:
$testString = 'Hello my name is tom or jack';
$matches = array();
preg_match_all('/\b(tom|jack|alex)\b/i', $testString, $matches);
$result = implode('', $matches[0]);
echo $result; // tomjack
What you tried to do is use ^ the way it works in character class syntax ([^s] matches any character except s). But that negation does not work on a series of characters; there is no such thing as a "word class".

If you want to remove everything that is not "tom" or "jack" or "alex" you can use the following:
$MyString = "Hello my name is jack";
print_r(preg_replace('#.*(tom|jack|alex)#i', '$1', $MyString));
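This replaces everything up to and including the last matching name with just that name. Any text after the name is left in place, and if the string contains several names, only the last one survives.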

A regex that matches each word that does not start with tom, jack, or alex:
\b(?!tom|jack|alex)[^\s]+\b
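As written, the lookahead (?!tom|jack|alex) also spares words that merely start with one of the names, such as "tomato". A possible variant puts a word boundary inside the lookahead and feeds the pattern to preg_replace. This is a sketch only: the function name removeOtherWords is hypothetical, and it assumes that words consist of \w characters and that the keep list is not empty.

```php
<?php
// Hypothetical helper: delete every word that is not in $keep (assumed non-empty).
function removeOtherWords(string $text, array $keep): string
{
    $names = implode('|', array_map(function ($w) {
        return preg_quote($w, '~');
    }, $keep));
    // (?!(?:tom|jack|alex)\b) rejects a word only when it is exactly one of the names.
    // \s* also removes the whitespace that follows a deleted word.
    $pattern = '~\b(?!(?:' . $names . ')\b)\w+\b\s*~i';
    return trim(preg_replace($pattern, '', $text));
}

echo removeOtherWords('Hello my name is tom or jack', array('tom', 'jack', 'alex'));
// => tom jack
```

Unlike the capture-and-implode answers, this keeps the whitespace that already followed each surviving name, so the names stay separated.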

You could match what you want and then reconstruct the string:
$s = 'hello my name is tom, jack and alex';
if (preg_match_all('/(?:tom|jack|alex)/', $s, $matches)) {
print_r($matches);
$s = join('', $matches[0]);
} else {
$s = '';
}
echo $s;
Output:
tomjackalex

We Keep Coding
