I am doing text analysis. I have a table of positive words. The records are fetched one by one with mysqli_fetch_array, and each row is imploded into a string.
while(($rowx = mysqli_fetch_array($resultx,MYSQLI_NUM)))
{
$wordx = implode("", $rowx);
if(strpos($text, $wordx) !== FALSE)
{
$count1 = substr_count($text, $wordx);
$pos_prob += .2 * $count1;
echo "pos prob is".$pos_prob;
}
}
But strpos is not able to match the string that is fetched from the table, i.e. if the text is "It's an excellent book", the if condition is never true, even though the word excellent is present in the table. And if I hard-code the value of $wordx as
$wordx='excellent';
Only then does it work. Does anyone have any idea why this is happening? :( Any help would be much appreciated :)
I don't understand the need to implode each row. My assumption is that each row has one word.
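One possible cause, offered only as an assumption because the table contents are not shown, is that the stored words carry stray whitespace, line endings, or different letter case, so the fetched value is not byte-for-byte equal to 'excellent'. A minimal sketch that normalizes each fetched word before counting; the $rows array is a hypothetical stand-in for what mysqli_fetch_array($resultx, MYSQLI_NUM) returns:

```php
<?php
// Hypothetical stand-ins for rows returned by mysqli_fetch_array($resultx, MYSQLI_NUM).
$rows = [["excellent\r\n"], [" good "], ["Great"]];
$text = "It's an excellent book";
$pos_prob = 0.0;

foreach ($rows as $rowx) {
    // Each row is assumed to hold exactly one word; strip surrounding whitespace.
    $wordx = trim($rowx[0]);
    if ($wordx === '') {
        continue; // substr_count() rejects an empty needle
    }
    // Compare in lower case so "Excellent" and "excellent" count as the same word.
    $count1 = substr_count(strtolower($text), strtolower($wordx));
    $pos_prob += 0.2 * $count1;
}

echo "pos prob is " . $pos_prob . PHP_EOL;
```

Running var_dump($wordx) on the fetched value is a quick way to check whether its reported length matches the visible characters.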
Simple strpos text matching example:
<?php
$words = array(
'big',
'fat',
'mamma'
);
$text = 'One day fat foo walked to the bar';
$matches = array();
foreach($words as $word) {
if(strpos($text, $word) !== false)
$matches[] = $word;
}
var_dump($matches);
Output:
array (size=1)
0 => string 'fat' (length=3)
Note that this would also match word parts and be case sensitive, so not ideal. For example: 'fat' is contained in the words: 'father', 'infatuated' and 'marrowfat'.
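If whole-word, case-insensitive matching is wanted instead, one option is a regular expression with word boundaries. This is a sketch rather than part of the answer above; findWholeWords is a hypothetical helper name:

```php
<?php
// Hypothetical helper: returns the words that occur in $text as whole words,
// ignoring letter case.
function findWholeWords(string $text, array $words): array
{
    $matches = [];
    foreach ($words as $word) {
        // preg_quote() escapes regex metacharacters; \b anchors at word boundaries.
        $pattern = '/\b' . preg_quote($word, '/') . '\b/iu';
        if (preg_match($pattern, $text) === 1) {
            $matches[] = $word;
        }
    }
    return $matches;
}

$found = findWholeWords('My father met Fat foo at the bar', ['big', 'fat', 'mamma']);
var_dump($found);
```

Here 'father' no longer counts as a hit for 'fat', while 'Fat' does.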
Related
I've spent my last 4 hours figuring out how to ... I got to ask for your help now.
I'm trying to extract from a text all the substrings that start with an item of my starting_words_array and end with an item of my ending_words_array.
$str = "Do you see that ? Indeed, I can see that, as well as this." ;
$starting_words_array = array('do','I');
$ending_words_array = array('?',',');
expected output : array ([0] => 'Do you see that ?' [1] => 'I can see that,')
I managed to write a first function that finds the first substring matching items from both arrays, but I'm not able to work out how to loop it in order to get all the substrings matching my requirement.
function SearchString($str, $starting_words_array, $ending_words_array ) {
forEach($starting_words_array as $test) {
$pos = strpos($str, $test);
if ($pos===false) continue;
$found = [];
forEach($ending_words_array as $test2) {
$posStart = $pos+strlen($test);
$pos2 = strpos($str, $test2, $posStart);
$found[] = ($pos2!==false) ? $pos2 : INF;
}
$min = min($found);
if ($min !== INF)
return substr($str,$pos,$min-$pos) .$str[$min];
}
return '';
}
Do you guys have any idea about how to achieve such thing ?
I use preg_match for my solution. However, the start and end strings must be escaped with preg_quote. Without that, the solution will be wrong.
function searchString($str, $starting_words_array, $ending_words_array ) {
$resArr = [];
forEach($starting_words_array as $i => $start) {
$end = $ending_words_array[$i] ?? "";
$regEx = '~'.preg_quote($start,"~").".*".preg_quote($end,"~").'~iu';
if(preg_match($regEx,$str,$match)){
$resArr[] = $match[0];
}
}
return $resArr;
}
The result is what the questioner expects.
If the expressions can occur more than once, preg_match_all must be used instead. The regex must also be modified to use the lazy quantifier .*? so that each match stops at the first end string.
function searchString($str, $starting_words_array, $ending_words_array ) {
$resArr = [];
forEach($starting_words_array as $i => $start) {
$end = $ending_words_array[$i] ?? "";
$regEx = '~'.preg_quote($start,"~").".*?".preg_quote($end,"~").'~iu';
if(preg_match_all($regEx,$str,$match)){
$resArr = array_merge($resArr,$match[0]);
}
}
return $resArr;
}
The result for the second variant:
array (
0 => "Do you see that ?",
1 => "Indeed,",
2 => "I can see that,",
)
I would definitely use regex and preg_match_all(). I won't give you a full working code example here but I will outline the necessary steps.
First, build a regex from your start-end-pairs like that:
$parts = array_map(
function($start, $end) {
return $start . '.+' . $end;
},
$starting_words_array,
$ending_words_array
);
$regex = '/' . join('|', $parts) . '/i';
The /i part means case insensitive search. Some characters like the ? have a special purpose in regex, so you need to extend above function in order to escape it properly.
You can test your final regex here
Then use preg_match_all() to extract your substrings:
preg_match_all($regex, $str, $matches); // $matches is passed by reference, no need to declare it first
print_r($matches);
The exact structure of your $matches array will be slightly different from what you asked for but you will be able to extract your desired data from it
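The outline above could be completed roughly as follows. This is a sketch under a few assumptions that are not in the answer: buildPairRegex is a hypothetical helper name, the start tokens are treated as whole words (hence the \b anchors), and the lazy .+? replaces .+ so that each match stops at the first end token.

```php
<?php
// Hypothetical helper: builds one alternation from the start/end pairs,
// escaping characters such as "?" with preg_quote().
function buildPairRegex(array $starts, array $ends): string
{
    $parts = array_map(
        function ($start, $end) {
            return '\b' . preg_quote($start, '/') . '\b.+?' . preg_quote($end, '/');
        },
        $starts,
        $ends
    );
    return '/' . implode('|', $parts) . '/i';
}

$str = "Do you see that ? Indeed, I can see that, as well as this.";
$regex = buildPairRegex(['do', 'I'], ['?', ',']);

preg_match_all($regex, $str, $matches);
print_r($matches[0]); // full matches are in $matches[0]
```

Without the trailing \b after the start token, the 'I' pair would also pick up "Indeed,".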
Benni's answer is the best way to go, but let me point out the problems in your code in case you want to fix them:
strpos is case sensitive and also matches parts of words, so you need to change $starting_words_array = array('do','I'); to $starting_words_array = array('Do','I ');
When a substring is found you use return, which exits the function, so you won't find any other substring. To fix that, define $res = []; at the beginning of the function, replace return substr($str,$pos,... with $res[] = substr($str,$pos,... and return $res at the end.
You can see an example on 3v4l; it produces the output you wanted.
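Putting those two fixes together might look like the sketch below. It is one reading of the description above rather than the linked 3v4l code; searchAllStrings is a hypothetical name, and the end tokens are assumed to be single-byte characters, as in the question.

```php
<?php
// Hypothetical reworking of the asker's function: collects every hit
// instead of returning after the first one.
function searchAllStrings(string $str, array $startingWords, array $endingWords): array
{
    $res = [];
    foreach ($startingWords as $start) {
        $pos = strpos($str, $start); // case sensitive, first occurrence only
        if ($pos === false) {
            continue;
        }
        $found = [];
        foreach ($endingWords as $end) {
            $pos2 = strpos($str, $end, $pos + strlen($start));
            if ($pos2 !== false) {
                $found[] = $pos2;
            }
        }
        if ($found) {
            // Cut from the start word up to and including the nearest end token.
            $res[] = substr($str, $pos, min($found) - $pos + 1);
        }
    }
    return $res;
}

$str = "Do you see that ? Indeed, I can see that, as well as this.";
$result = searchAllStrings($str, ['Do', 'I '], ['?', ',']);
print_r($result);
```

Each start word is still matched only at its first occurrence, so repeated sentences would need an extra loop over the offset.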
Here is my issue:
$array = array(
"1" => array("fruit", "salad", "vegetable"),
"2" => array("beef", "meat", "sausage"),
"3" => array("chocolate", "cake", "bread")
);
$sentence = "I love big sausage";
$sentence could also be I love big sausageS.
I need to associate a sentence to a category, so I need to analyze the sentence and to return the ID of the subarray matching with the sentence. For example "2" in my example.
I'm looking for the solution with the best performance. I guess I have no other choice than "explode" the sentence and "foreach" it at a minimum.
The project uses PHP7 and if it can use amazing native functions it'll be great.
I think this is best I can do.
Foreach the array and use preg_grep to find matches.
I use str_replace to replace spaces with | that is used as "or" in regex.
foreach($array as $key => $sub){
if(preg_grep("/" . str_replace(" ", "|", $sentence) . "/" ,$sub )){
echo "Match in ". $key . "\n";
}
}
https://3v4l.org/BqkW2
To match your sausageS example you can reverse the search and add .*? in the grep.
$arrSent = explode(" ", $sentence);
foreach($array as $key => $sub){
if(preg_grep("/" . implode(".*?|", $sub) . ".*?/" , $arrSent))
{
echo "Match in ". $key . "\n";
}
}
https://3v4l.org/MJqrv
But this will also accept sausage_and_beans. If you only want to match the plural form (an s added at the end), change .*? to s.
That is case sensitive, though, so sausageS as in your example will not match.
Adding the i modifier makes it case insensitive:
if(preg_grep("/" . implode("s|", $sub) . "s/i" , $arrSent))
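A variant that accepts both the singular and the plural form, and rejects sausage_and_beans, could use an optional s between word boundaries. This is a sketch and not the answerer's code; findCategory is a hypothetical helper name:

```php
<?php
// Hypothetical helper: returns the key of the first category that has a word
// appearing in the sentence (optionally followed by "s"), or null if none does.
function findCategory(array $categories, string $sentence)
{
    foreach ($categories as $key => $words) {
        $quoted = array_map(function ($w) {
            return preg_quote($w, '/');
        }, $words);
        // \b ... s?\b : whole word, optional plural "s"; /i ignores case.
        $pattern = '/\b(?:' . implode('|', $quoted) . ')s?\b/i';
        if (preg_match($pattern, $sentence) === 1) {
            return $key;
        }
    }
    return null;
}

$array = array(
    "1" => array("fruit", "salad", "vegetable"),
    "2" => array("beef", "meat", "sausage"),
    "3" => array("chocolate", "cake", "bread")
);
var_dump(findCategory($array, "I love big sausageS"));
```

PHP converts numeric string keys such as "2" to integers, so the returned key is an int.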
If you explode your $sentence and use a whitespace as the delimiter you will get an array of words.
You could use array_filter to remove those arrays from $array by checking if the intersect contains 1 or more words using array_intersect.
Then you could return an array using array_keys to get all the IDs that contain word(s) from your sentence.
$array = array (
"1" => array("fruit","salad","vegetable"),
"2" => array("beef","meat","sausage"),
"3" => array("chocolate","cake","bread")
);
$expl = explode(' ', "I love big sausage");
$array = array_filter($array, function($x) use ($expl) {
return count(array_intersect($expl, $x)) > 0;
});
var_dump(array_keys($array));
Demo
That would give you:
array(1) {
[0]=>
int(2)
}
The earlier answers are making the mistake of trying to search the array of "needles" with the exploded or piped words in the "haystack". This will not work when you modify "sausage" in your sentence to "sausageS" -- there is no needle that has an s after sausage, so even if you use a case-insensitive approach it will still fail.
Because you are seeking a solitary qualifying key, it makes the most sense to stop searching as soon as a match is found. This task requirement eliminates array_filter() and preg_grep() as best performers -- they will both keep scanning data until the input is exhausted instead of stopping as soon as a match is found.
Code: (Demo)
$needlestack = [
"1" => ["fruit", "salad", "vegetable"],
"2" => ["beef", "meat", "sausage"],
"3" => ["chocolate", "cake", "bread"]
];
// $haystack = "I love big sausage";
$haystack = "I love big sausageS";
$found = null;
foreach ($needlestack as $id => $needles) {
foreach ($needles as $needle) {
if (stripos($haystack, $needle) !== false) {
$found = $id;
break 2;
}
}
}
var_export($found); // 2
stripos() will perform case-insensitively AND will allow for partial word matching (which is desirable for matching sausageS to sausage). By using nested loops and stripos(), this approach does not waste time preparing strings that will be unused.
I have an array called 'words' storing many words.
For example:
I have 'systematic', 'سلام','gear','synthesis','mysterious', etc.
NB: we have utf8 words too.
How to query efficiently to see which words include letters 's','m','e' (all of them) ?
The output would be:
systematic,mysterious
I have no idea how to do such a thing in PHP. It should be efficient because our server would suffer otherwise.
Use a regular expression to split each string into an array of characters, and use array_intersect() to find out whether all the characters in your search array are present in the split array:
header('Content-Type: text/plain; charset=utf8');
$words = array('systematic', 'سلام','gear','synthesis','mysterious');
$search = array('s','m','e');
foreach ($words as $word) {
$char_array = utf8_str_split($word);
$contains = array_intersect($search, $char_array) == $search;
echo sprintf('%s : %s', $word, (($contains) ? 'True' : 'False'). PHP_EOL);
}
function utf8_str_split($str) {
return preg_split('/(?!^)(?=.)/u', $str);
}
Output:
systematic : True
سلام : False
gear : False
synthesis : False
mysterious : True
Demo.
UPDATE: Or, alternatively, you could use array_filter() with preg_match():
$array = array_filter($words, function($item) {
return preg_match('~(?=[^s]*s)(?=[^m]*m)(?=[^e]*e)~u', $item);
});
Output:
Array
(
[0] => systematic
[4] => mysterious
)
Demo.
This worked for me:
$words = array('systematic', 'سلام','gear','synthesis','mysterious');
$letters=array('s','m', 'e');
foreach ($words as $w) {
//print "lets check word $w<br>";
$n=0;
foreach ($letters as $l) {
if (strpos($w, $l)!==false) $n++;
}
if ($n>=3) print "$w<br>";
}
It returns
systematic
mysterious
Explanation
It uses nested foreach: one for the words and the other one for the letters to be matched.
In case any letter is matched, the counter is incremented.
Once the letters loop is over, it checks how many matches were there and prints the word in case it is 3.
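The same idea can be written without hard-coding the number 3, so that it works for any list of letters. A minimal sketch; wordsContainingAll is a hypothetical name. strpos works on bytes, which is safe here as long as both the words and the letters are valid UTF-8.

```php
<?php
// Hypothetical helper: keeps only the words that contain every given letter.
function wordsContainingAll(array $words, array $letters): array
{
    $result = array_filter($words, function ($word) use ($letters) {
        foreach ($letters as $letter) {
            if (strpos($word, $letter) === false) {
                return false; // one letter is missing, reject the word early
            }
        }
        return true;
    });
    return array_values($result); // reindex from 0
}

$words = array('systematic', 'سلام', 'gear', 'synthesis', 'mysterious');
$hits = wordsContainingAll($words, array('s', 'm', 'e'));
echo implode(',', $hits) . PHP_EOL;
```

Returning as soon as a letter is missing avoids scanning the remaining letters for words that cannot qualify.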
Something like this:
$words = array('systematic', 'سلام','gear','synthesis','mysterious');
$result=array();
foreach($words as $word){
if(strpos($word, 's') !== false &&
strpos($word, 'm') !== false &&
strpos($word, 'e') !== false){
$result[] = $word;
}
}
echo implode(',',$result); // will output 'systematic,mysterious'
Your question is a little broad.
From what I understand of your question, the words are stored in a database table, so you could filter them before loading them into the array by using the SQL LIKE operator.
If you want to search for letters in an array of words, you could loop over the array with foreach and pass each value to the strpos function.
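For the database route, the query could be assembled along these lines. Everything about the schema is an assumption: the table name words and the column name word are hypothetical, and the letters are assumed not to contain the LIKE wildcards % or _. The sketch only builds the SQL and its parameters; they would then be passed to a prepared statement (for example with mysqli or PDO).

```php
<?php
// Hypothetical helper: builds a parameterized query that keeps only the rows
// whose `word` column contains every given letter.
function buildLetterFilter(array $letters): array
{
    // One "word LIKE ?" clause per letter, combined with AND.
    $clauses = array_fill(0, count($letters), 'word LIKE ?');
    $sql = 'SELECT word FROM words WHERE ' . implode(' AND ', $clauses);
    $params = array_map(function ($letter) {
        return '%' . $letter . '%';
    }, $letters);
    return [$sql, $params];
}

list($sql, $params) = buildLetterFilter(['s', 'm', 'e']);
echo $sql . PHP_EOL;
print_r($params);
```

A leading % wildcard generally prevents index use, so whether this is faster than filtering in PHP depends on the table size and the database.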
http://www.php.net/function.strpos
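Why not use preg_grep()? Note that a character class such as /[sme]/ matches words containing any one of the letters, so use one lookahead per letter to require all of them:
$your_array = preg_grep('/^(?=.*s)(?=.*m)(?=.*e)/u', $words);
print_r($your_array);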
WORKING DEMO
I want to check the first word of some sentences. If the first word is For, And, Nor, But, Or, etc., I want to skip the sentence.
Here's the code :
<?php
$sentence = 'For me more';
$arr = explode(' ',trim($sentence));
if(stripos($arr[0],'for') or stripos($arr[0],'but') or stripos($arr[0],'it')){
//doing something
}
?>
I get a blank result. What's wrong? Thank you :)
Here, stripos will return 0 if the word is found (found at position 0).
It returns false if the word is not found.
You should write :
if(stripos($arr[0],'for') !== false or stripos($arr[0],'but') !== false or stripos($arr[0],'it') !== false){
//skip
}
stripos returns the position of the first occurrence of the needle in the haystack.
Here the first occurrence is at position 0, which evaluates to false in a boolean context.
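A short demonstration of why the loose check fails while the strict comparison works, using the values from the question:

```php
<?php
$firstWord = 'For';

$pos = stripos($firstWord, 'for');
var_dump($pos);            // int(0): found at the very start
var_dump((bool) $pos);     // bool(false): 0 is falsy, so if ($pos) is skipped
var_dump($pos !== false);  // bool(true): the strict check detects the match

$noMatch = stripos($firstWord, 'but');
var_dump($noMatch);        // bool(false): not found at all
```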
Try this as an alternative
$sentence = 'For me more';
// make all words lowercase
$arr = explode(' ', strtolower(trim($sentence)));
if(in_array($arr[0], array('for', 'but', 'it'))) {
//doing something
echo "found: $sentence";
} else {
echo 'failed';
}
Perhaps use preg_filter if you are going to know what the string to be evaluated is (i.e. you don't need to parse out sentences).
$filter_array = array(
'/^for\s/i',
'/^and\s/i',
'/^nor\s/i',
// etc.
);
$sentence = 'For me more';
$result = preg_filter($filter_array, '', trim($sentence)); // arguments: patterns, replacement, subject
if ($result === null) {
// this sentence did not match the filters
}
This allows you to define a set of filter regex patterns and see whether any of them matches. Note that in this case I just used '' as the "replacement" value, as you don't really care about actually making a replacement; this function just gives you a nice way to pass in an array of regular expressions.
Unfortunately, for some strange reason the regex method isn't working for me with UTF-8 (preg_replace + UTF-8 doesn't work on one server but works on another).
What would be the most efficient way to accomplish my goal without using regex?
Just to make it as clear as possible, for the following set of words:
cat, dog, sky
cats would return false
the sky is blue would return true
skyrim would return false
Super short example but it's the way I'd do it without Regex.
$haystack = "cats"; //"the sky is blue"; // "skyrim";
$needles = array("cat", "dog", "sky");
$found = false;
foreach($needles as $needle)
if(strpos(" $haystack ", " $needle ") !== false) {
$found = true;
break;
}
echo $found ? "A needle was found." : "A needle was not found.";
My initial thought is to explode the text on spaces, and then check to see if your words exist in the resulting array. Of course you may have some punctuation leaking into your array that you'll have to consider as well.
Another idea would be to check the strpos of the word. If it's found, test for the next character to see if it is a letter. If it is a letter, you know that you've found a subtext of a word, and to discard this finding.
// Test online at http://writecodeonline.com/php/
$aWords = array( "I", "cat", "sky", "dog" );
$aFound = array();
$sSentence = "I have a cat. I don't have cats. I like the sky, but not skyrim.";
foreach ( $aWords as $word ) {
$pos = strpos( $sSentence, $word );
// strpos() returns false when the word is not found; compare strictly,
// because false >= 0 is true in PHP and position 0 is a valid hit
if ( $pos === false ) continue;
$nextChar = substr( $sSentence , ( $pos + strlen( $word ) ), 1 );
// If found, ensure it is not a substring
if ( ctype_alpha( $nextChar ) ) continue;
$aFound[] = $word;
}
print_r( $aFound ); // Array ( [0] => I [1] => cat [2] => sky )
Of course the better solution is to determine why you cannot use regex, as these solutions will be nowhere near as efficient as pattern-seeking would be.
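The first idea above, exploding the text on spaces and dealing with punctuation, could be sketched without regex as follows. containsWholeWord is a hypothetical helper name, and the punctuation list is an assumption that would need extending for real text:

```php
<?php
// Hypothetical helper: true if any needle appears in $text as a whole word.
function containsWholeWord(string $text, array $needles): bool
{
    foreach (explode(' ', $text) as $token) {
        // Strip common punctuation that sticks to words after exploding.
        $token = trim($token, ".,;:!?\"'()");
        if ($token !== '' && in_array($token, $needles, true)) {
            return true;
        }
    }
    return false;
}

$needles = array("cat", "dog", "sky");
var_dump(containsWholeWord("cats", $needles));
var_dump(containsWholeWord("the sky is blue", $needles));
var_dump(containsWholeWord("skyrim", $needles));
```

The comparison is case sensitive and byte based, so it behaves the same for UTF-8 words as long as the needles use the same encoding and case.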
If you are simply trying to find out whether a word is in a string, you could store the string in a variable (if you print the string, print the variable instead) and use "in". Note that this is Python, not PHP. Example:
a = 'The sky is blue'
'The' in a
True