I found this solution on stackoverflow for getting the first word from a sentence.
$myvalue = 'Test me more';
$arr = explode(' ',trim($myvalue));
echo $arr[0]; // will print Test
However, this case takes ' ' (a space) as the divider. Does anyone know how to get the first word from a string if you do not know what the divider is? It can be ' ' (space), '.' (full stop), '.' (or comma). Basically, how do you take anything that is a letter from a string up to the point where there is no letter?
E.g.:
'House, rest of sentence here' would give 'House'
'House.' would also give 'House'
'House thing' would also give 'House'
Thanks!
There is a string function (strtok) which can be used to split a string into smaller strings (tokens) based on some separator(s). For the purposes of this thread, the first word (defined as anything before the first space character) of Test me more can be obtained by tokenizing the string on the space character.
<?php
$value = "Test me more";
echo strtok($value, " "); // Test
?>
For more details and examples, see the strtok PHP manual page.
preg_split is what you're looking for.
$str = "bla1 bla2,bla3";
$words = preg_split("/[\s,]+/", $str);
This snippet splits the $str by space, \t, comma, \n.
Use the preg_match() function with a regular expression:
if (preg_match('/^\w*/', 'Your text here', $matches) > 0) {
echo $matches[0]; // $matches[0] will contain the first word of your sentence
} else {
// no match found
}
Related
I have a string in PHP
$string = "Dogs are Jonny's favorite pet";
I want to use regex or some method to remove s or 's from the end of all words in the string.
The desired output would be:
$revisedString = "Dog are Jonny favorite pet";
Here is my current approach:
<?php
$string = "Dogs are Jonny's favorite pet";
$stringWords = explode(" ", $string);
$counter = 0;
foreach($stringWords as $string) {
if(substr($string, -1) == s){
$stringWords[$counter] = trim($string, "s");
}
if(strpos($string, "'s") !== false){
$stringWords[$counter] = trim($string, "'s");
}
$counter = $counter + 1;
}
print_r($stringWords);
$newString = "";
foreach($stringWords as $string){
$newString = $newString . $string . " ";
}
echo $newString;
}
?>
How would this be achieved with REGEX?
For general use, you must leverage much more sophisticated technique than an English-ignorant regex pattern. There may be fringe cases where the following pattern fails by removing an s that it shouldn't. It could be a name, an acronym, or something else.
As an unreliable solution, you can optionally match an apostrophe then match a literal s if it is not immediately preceded by another s. Adding a word boundary (\b) on the end improves the accuracy that you are matching the end of words.
Code: (Demo)
$string = "The bass can access the river's delta from the ocean. The fishermen, assassins, and their friends are happy on the banks";
var_export(preg_replace("~'?(?<!s)s\b~", '', $string));
Output:
'The bass can access the river delta from the ocean. The fishermen, assassin, and their friend are happy on the bank'
PHP Live Regex always helped me a lot in such moments. Even already knowing how REGEX works, I still use it just to be sure some times.
To make use of REGEX in your case, you can use preg_replace().
<?php
// Your string.
$string = "Dogs are Jonny's favorite pet";
// The vertical bar means "or" and the backslash
// before the apostrophe is needed so you don't end
// your pattern string since we're using single quotes
// to delimit it. "\s" means a single space.
$regex_pattern = '/\'s\s|s\s|s$/';
// Fill the preg_replace() with the pattern, the replacement
// (a single space in this case), your string, -1 (so preg_replace()
// will replace all the matches) and a variable of your desire
// to be the "counter" (preg_replace() will automatically
// fill it).
$newString = preg_replace($regex_pattern, ' ', $string, -1, $counter);
// Use the rtrim() to remove spaces at the right of the sentence.
$newString = rtrim($newString, " ");
echo "New string: " . $newString . ". ";
echo "Replacements: " . $counter . ".";
?>
In this case, the function will identify any "'s" or "s" with spaces (\s) after them and then replace them with a single space.
The preg_replace() will also count all the replacements and register them automatically on $counter or any variable you place there instead.
Edit:
Phil's comment is right and indeed my previous REGEX would lose a "s" at the end of the string. Adding "|s$" will solve it. Again, "|" means "or" and the "$" means that the "s" must be at the end of the string.
In attention to mickmackusa's comment, my solution is meant only to remove "s" characters at the end of words inside the string as this was Sparky Johnson' request here. Removing plurals would require a complex code since not only we need to remove "s" characters from plural only words but also change verbs and other stuff.
i'm not very firm with regular Expressions, so i have to ask you:
How to find out with PHP if a string contains a word starting with # ??
e.g. i have a string like "This is for #codeworxx" ???
I'm so sorry, but i have NO starting point for that :(
Hope you can help.
Thanks,
Sascha
okay thanks for the results - but i did a mistake - how to implement in eregi_replace ???
$text = eregi_replace('/\B#[^\B]+/','\\1', $text);
does not work??!?
why? do i not have to enter the same expression as pattern?
Match anything with has some whitespace in front of a # followed by something else than whitespace:
$ cat 1812901.php
<?php
echo preg_match("/\B#[^\B]+/", "This should #match it");
echo preg_match("/\B#[^\B]+/", "This should not# match");
echo preg_match("/\B#[^\B]+/", "This should match nothing and return 0");
echo "\n";
?>
$ php 1812901.php
100
break your string up like this:
$string = 'simple sentence with five words';
$words = explode(' ', $string );
Then you can loop trough the array and check if the first character of each word equals "#":
if ($stringInTheArray[0] == "#")
Assuming you define a word a sequence of letters with no white spaces between them, then this should be a good starting point for you:
$subject = "This is for #codeworxx";
$pattern = '/\s*#(.+?)\s/';
preg_match($pattern, $subject, $matches);
print_r($matches);
Explanation:
\s*#(.+?)\s - look for anything starting with #, group all the following letters, numbers, and anything which is not a whitespace (space, tab, newline), till the closest whitespace.
See the output of the $matches array for accessing the inner groups and the regex results.
#OP, no need regex. Just PHP string methods
$mystr='This is for #codeworxx';
$str = explode(" ",$mystr);
foreach($str as $k=>$word){
if(substr($word,0,1)=="#"){
print $word;
}
}
Just incase this is helpful to someone in the future
/((?<!\S)#\w+(?!\S))/
This will match any word containing alphanumeric characters, starting with "#." It will not match words with "#" anywhere but the start of the word.
Matching cases:
#username
foo #username bar
foo #username1 bar #username2
Failing cases:
foo#username
#username$
##username
Can you please help assemble a regex to be used in preg_split which will split a string on it's first word - case insensitive (up until the first space).
This should work
$result = preg_split('/\s/', trim($subject));
$firstword = $result[0]
If sentence has space as word separators you can do:
list($firstWord) = explode(' ',trim($input));
If you just need to split up until the first space character, your regex is essentially just a space character:
$output = preg_split('/ /', 'My name is Mansoor', 2);
echo $output[0]; // Will return 'My';
echo $output[1]; // will return 'name is Mansoor';
If you only need the first word, make sure you pass the optional argument (the 2) to specify that you want only two results in your $output array -- the first word, and the rest of the sentence. Otherwise, you'll spend time parsing text that you don't care about.
For example, if my sentence is $sent = 'how are you'; and if I search for $key = 'ho' using strstr($sent, $key) it will return true because my sentence has ho in it.
What I'm looking for is a way to return true if I only search for how, are or you. How can I do this?
You can use the function preg-match that uses a regex with word boundaries:
if(preg_match('/\byou\b/', $input)) {
echo $input.' has the word you';
}
If you want to check for multiple words in the same string, and you're dealing with large strings, then this is faster:
$text = explode(' ',$text);
$text = array_flip($text);
Then you can check for words with:
if (isset($text[$word])) doSomething();
This method is lightning fast.
But for checking for a couple of words in short strings then use preg_match.
UPDATE:
If you're actually going to use this I suggest you implement it like this to avoid problems:
$text = preg_replace('/[^a-z\s]/', '', strtolower($text));
$text = preg_split('/\s+/', $text, NULL, PREG_SPLIT_NO_EMPTY);
$text = array_flip($text);
$word = strtolower($word);
if (isset($text[$word])) doSomething();
Then double spaces, linebreaks, punctuation and capitals won't produce false negatives.
This method is much faster in checking for multiple words in large strings (i.e. entire documents of text), but it is more efficient to use preg_match if all you want to do is find if a single word exists in a normal size string.
One thing you can do is breaking up your sentence by spaces into an array.
Firstly, you would need to remove any unwanted punctuation marks.
The following code removes anything that isn't a letter, number, or space:
$sent = preg_replace("/[^a-zA-Z 0-9]+/", " ", $sent);
Now, all you have are the words, separated by spaces. To create an array that splits by space...
$sent_split = explode(" ", $sent);
Finally, you can do your check. Here are all the steps combined.
// The information you give
$sent = 'how are you';
$key = 'ho';
// Isolate only words and spaces
$sent = preg_replace("/[^a-zA-Z 0-9]+/", " ", $sent);
$sent_split = explode(" ", $sent);
// Do the check
if (in_array($key, $sent))
{
echo "Word found";
}
else
{
echo "Word not found";
}
// Outputs: Word not found
// because 'ho' isn't a word in 'how are you'
#codaddict's answer is technically correct but if the word you are searching for is provided by the user, you need to escape any characters with special regular expression meaning in the search word. For example:
$searchWord = $_GET['search'];
$searchWord = preg_quote($searchWord);
if (preg_match("/\b$searchWord\b", $input) {
echo "$input has the word $searchWord";
}
With recognition to Abhi's answer, a couple of suggestions:
I added /i to the regex since sentence-words are probably treated case-insensitively
I added explicit === 1 to the comparison based on the documented preg_match return values
$needle = preg_quote($needle);
return preg_match("/\b$needle\b/i", $haystack) === 1;
i'm not very firm with regular Expressions, so i have to ask you:
How to find out with PHP if a string contains a word starting with # ??
e.g. i have a string like "This is for #codeworxx" ???
I'm so sorry, but i have NO starting point for that :(
Hope you can help.
Thanks,
Sascha
okay thanks for the results - but i did a mistake - how to implement in eregi_replace ???
$text = eregi_replace('/\B#[^\B]+/','\\1', $text);
does not work??!?
why? do i not have to enter the same expression as pattern?
Match anything with has some whitespace in front of a # followed by something else than whitespace:
$ cat 1812901.php
<?php
echo preg_match("/\B#[^\B]+/", "This should #match it");
echo preg_match("/\B#[^\B]+/", "This should not# match");
echo preg_match("/\B#[^\B]+/", "This should match nothing and return 0");
echo "\n";
?>
$ php 1812901.php
100
break your string up like this:
$string = 'simple sentence with five words';
$words = explode(' ', $string );
Then you can loop trough the array and check if the first character of each word equals "#":
if ($stringInTheArray[0] == "#")
Assuming you define a word a sequence of letters with no white spaces between them, then this should be a good starting point for you:
$subject = "This is for #codeworxx";
$pattern = '/\s*#(.+?)\s/';
preg_match($pattern, $subject, $matches);
print_r($matches);
Explanation:
\s*#(.+?)\s - look for anything starting with #, group all the following letters, numbers, and anything which is not a whitespace (space, tab, newline), till the closest whitespace.
See the output of the $matches array for accessing the inner groups and the regex results.
#OP, no need regex. Just PHP string methods
$mystr='This is for #codeworxx';
$str = explode(" ",$mystr);
foreach($str as $k=>$word){
if(substr($word,0,1)=="#"){
print $word;
}
}
Just incase this is helpful to someone in the future
/((?<!\S)#\w+(?!\S))/
This will match any word containing alphanumeric characters, starting with "#." It will not match words with "#" anywhere but the start of the word.
Matching cases:
#username
foo #username bar
foo #username1 bar #username2
Failing cases:
foo#username
#username$
##username