Find a sentence that contains a word in text from database

Find a sentence that contains a word in text from database - php

I want to explain what I am trying to accomplish with an example. Let's assume I have two rows in my posts table.
It snows a lot in winter in Norway. While it barely snows where I live.
I run four miles every morning. I am trying to get fit.
When I search in winter I want to get the sentence It snows a lot in winter in Norway.
I know I can get the row with:
$posts = \App\Models\Post::where('body', 'like', "%{in winter}%")->get();
But I am not sure how to get the exact sentence.

While it might be technically possible using SQL to get the exact sentence, you are better off using PHP for getting the exact sentence from the collection.
I have created an example using collections (since you're using Laravel), starting with your provided Post query (although I did remove the curly braces from the like string).
1. Get the collection of posts with body like search query
$posts = Post::where('body', 'like', "%in winter%")->get();
2. Map collection of posts to individual sentences. Flatten to remove empty sentences.
$postSentences = $posts->map( function($post) {
// preg split makes sure the text is splitted on . or ! or ?
return preg_split('/\.|\?|!/', $post->body);
})->flatten();
3. Get the corresponding sentence(s) using a filter and Str::contains()
$matchingSentences = $postSentences->filter( function($sentence) {
return Str::contains($sentence, 'in winter');
});
$matchingSentences should return:
Illuminate\Support\Collection
all: [
"It snows a lot in winter in Norway",
],
}
The example can probably be altered / shortened to your fitting. But this should solve the aforementioned problem.

Related

PHP Questions. Loops or If statement?

I am trying to learn PHP while I write a basic application. I want a process whereby old words get put into an array $oldWords = array(); so all $words, that have been used get inserted using array_push(oldWords, $words).
Every time the code is executed, I want a process that finds a new word from $wordList = array(...). However, I don't want to select any words that have already been used or are in $oldWords.
Right now I'm thinking about how I would go about this. I've been considering finding a new word via $wordChooser = rand (1, $totalWords); I've been thinking of using an if/else statement, but the problem is if array_search($word, $doneWords) finds a word, then I would need to renew the word and check it again.
This process seems extremely inefficient, and I'm considering a loop function but, which one, and what would be a good way to solve the issue?
Thanks

I'm a bit confused, PHP dies at the end of the execution of the script. However you are generating this array, could you also not at the same time generate what words haven't been used from word list? (The array_diff from all words to used words).
Or else, if there's another reason I'm missing, why can't you just use a loop and quickly find the first word in $wordList that's not in $oldWord in O(n)?
function generate_new_word() {
foreach ($wordList as $word) {
if (in_array($word, $oldWords)) {
return $word; //Word hasn't been used
}
}
return null; //All words have been used
}
Or, just do an array difference (less efficient though, since best case is it has to go through the entire array, while for the above it only has to go to the first word)
EDIT: For random
$newWordArray = array_diff($allWords, $oldWords); //List of all words in allWords that are not in oldWords
$randomNewWord = array_rand($newWordArray, 1);//Will get a new word, if there are any
Or unless you're interested in making your own datatype, the best case for this could possibly be in O(log(n))

How to highlight search-matching text on a web page

I'm trying to write a PHP function that takes some text to be displayed on a web page, and then based on some entered search terms, highlights the corresponding parts of the text. Unfortunately, I'm having a couple of issues.
To better explain the two issues I'm having, let's imagine that the following innocuous string is being searched on and will be displayed on the web page:
My daughter was born on January 11, 2011.
My first problem is that if more than one search term is entered, any placeholder text I use to mark the start and end of any matches for the first term may then be matched by the second term.
For example, I'm currently using the following delimiting strings to mark the beginning and end of a match (upon which I use the preg_replace function at the end of the function to turn the delimiters into HTML span tags):
'#####highlightStart#####'
'#####highlightEnd#####'
The problem is, if I do a search like 2011 light, then 2011 will be matched first, giving me:
My daughter was born on January 11, #####highlightStart#####2011#####highlightEnd#####.
Upon which when light is searched for, it will match the word light within both #####highlightStart##### and #####highlightEnd#####, which I don't want.
One thought I had was to create some really obscure delimiting strings (in perhaps a foreign language) that would likely never be searched on, but I can't guarantee that any particular string will never be searched on and it just seems like a really kludgy solution. Basically, I imagine that there is a better way to do it.
Any advice on this first point would be greatly appreciated.
My second question has to do with how to handle overlapping matches.
For example, with the same string My daughter was born on January 11, 2011., if the entered search is Jan anuar, then Jan will be matched first, giving me:
My daughter was born on #####highlightStart#####Jan#####highlightEnd#####uary 11, 2011.
And because the delimiting text is now a part of the string, the second search term, anuar will never be matched.
Regarding this issue, I am quite perplexed and really have no clue how to solve it.
I feel like I need to somehow do all of the search operations on the original string separately and then somehow combine them at the end, but again, I'm lost on how to do this.
Perhaps there's a way better solution altogether, but I don't know what that would be.
Any advice or direction on how to solve either or both of these problems would be greatly appreciated.
Thank you.

In this case I think it's simpler to use str_replace (though it won't be perfect).
Assuming you've got an array of terms you want to highlight, I'll call it $aSearchTerms for the sake of argument... and that wrapping the highlighted terms in the HTML5 <mark> tag is acceptable (for the sake of legibility, you've stated it's going on a web-page and it's easy to strip_tags() from your search terms):
$aSearchTerms = ['Jan', 'anu', 'Feb', '11'];
$sinContent = "My daughter was born on January 11, 2011.";
foreach($aSearchTerms as $sinTerm) {
$sinContent = str_replace($sinTerm, "<mark>{$sinTerm}</mark>", $sinContent);
}
echo $sinContent;
// outputs: My d<mark>au</mark>ghter was born on <mark>Jan</mark>uary <mark>11</mark>, 20<mark>11</mark>.
It's not perfect since, using the data in that array, the first pass will change January to <mark>Jan</mark>uary which means anu will no longer match in January - something like this will, however, cover most usage needs.
EDIT
Oki - I'm not 100% certain this is sane but I took a totally different approach looking at the link #AlexAtNet posted:
https://stackoverflow.com/a/3631016/886824
What I've done is looked at the points in the string where the search term is found numerically (the indexes) and built an array of start and end indexes where the <mark> and </mark> tags are going to be entered.
Then using the answer above merged those start and end indexes together - this covers your overlapping matches issue.
Then I've looped that array and cut the original string up into substrings and glued it back together inserting the <mark> and </mark> tags at the relevant points (based on the indexes). This should cover your second issue so you're not having string replacements replacing string replacements.
The code in full looks like:
<?php
$sContent = "Captain's log, January 11, 2711 - Uranus";
$ainSearchTerms = array('Jan', 'asduih', 'anu', '11');
//lower-case it for substr_count
$sContentForSearching = strtolower($sContent);
//array of first and last positions of the terms within the string
$aTermPositions = array();
//loop through your search terms and build a multi-dimensional array
//of start and end indexes for each term
foreach($ainSearchTerms as $sinTerm) {
//lower-case the search term
$sinTermLower = strtolower($sinTerm);
$iTermPosition = 0;
$iTermLength = strlen($sinTermLower);
$iTermOccursCount = substr_count($sContentForSearching, $sinTermLower);
for($i=0; $i<$iTermOccursCount; $i++) {
//find the start and end positions for this term
$iStartIndex = strpos($sContentForSearching, $sinTermLower, $iTermPosition);
$iEndIndex = $iStartIndex + $iTermLength;
$aTermPositions[] = array($iStartIndex, $iEndIndex);
//update the term position
$iTermPosition = $iEndIndex + $i;
}
}
//taken directly from this answer https://stackoverflow.com/a/3631016/886824
//just replaced $data with $aTermPositions
//this sorts out the overlaps so that 'Jan' and 'anu' will merge into 'Janu'
//in January - whilst still matching 'anu' in Uranus
//
//This conveniently sorts all your start and end indexes in ascending order
usort($aTermPositions, function($a, $b)
{
return $a[0] - $b[0];
});
$n = 0; $len = count($aTermPositions);
for ($i = 1; $i < $len; ++$i)
{
if ($aTermPositions[$i][0] > $aTermPositions[$n][1] + 1)
$n = $i;
else
{
if ($aTermPositions[$n][1] < $aTermPositions[$i][1])
$aTermPositions[$n][1] = $aTermPositions[$i][1];
unset($aTermPositions[$i]);
}
}
$aTermPositions = array_values($aTermPositions);
//finally chop your original string into the bits
//where you want to insert <mark> and </mark>
if($aTermPositions) {
$iLastContentChunkIndex = 0;
$soutContent = "";
foreach($aTermPositions as $aChunkIndex) {
$soutContent .= substr($sContent, $iLastContentChunkIndex, $aChunkIndex[0] - $iLastContentChunkIndex)
. "<mark>" . substr($sContent, $aChunkIndex[0], $aChunkIndex[1] - $aChunkIndex[0]) . "</mark>";
$iLastContentChunkIndex = $aChunkIndex[1];
}
//... and the bit on the end
$soutContent .= substr($sContent, $iLastContentChunkIndex);
}
//this *should* output the following:
//Captain's log, <mark>Janu</mark>ary <mark>11</mark>, 27<mark>11</mark> - Ur<mark>anu</mark>s
echo $soutContent;
The inevitable gotcha!
Using this on content that's already HTML may fail horribly.
Given the string.
In January this year...
A search/mark of Jan will insert the <mark>/</mark> around 'Jan' which is fine. However a search mark of something like In Jan is going to fail as there's markup in the way :\
Can't think of a good way around that I'm afraid.

Do not modify original string and store the matches in the individual array, either starts in odd and ends in even elements or store them in records (arrays of two items).
After searching for the several keywords, you end up with several arrays with matches. So the task now is how to merge two lists of segments, producing the segments that covers the areas. As the lists are sorted, this is a trivial task that can be solved in O(n) time.
Then just insert highlight tokens into positions recorded in the resulting array.

Laravel - Single query then splitting Vs Two queries

I am using laravel framework and I need to get 2 arrays, one with premium themes and one with free themes.. so I would do:
$premium_themes = \App\Theme::where('premium', '=', '1')->get();
$free_themes = \App\Theme::where('premium', '=', '0')->get();
This will work Ok, but will perform two queries on the database. Since I'm an optimization geek, I think it might be better to have a single query... which I would get all themes by using:
$themes = \App\Theme::all();
And then I'd to process this in php to split based on the theme premium property.
So I have 2 questions:
1) A single query is better than 2 queries in this case, or am I over-thinking this?
2) Is there a fast simple way to split the resulting collection into two collections based on the premium property? (I know Laravel has many shortcuts but I'm still new to the framework)

Single query would be better as both of the queries will go over all the rows in the database. Except the 2 queries to split them will go over them for a second time.
You can simply filter them like so;
The simple one line solution $themes = \App\Theme::all()->groupBy('premium');.
Or into separate collections if you need to filter by another element etc just add more to the following;
$themes = \App\Theme::all();
$premium = new Collection;
$free = new Collection;
$themes->each(function ($item) use ($premium, $free){
if($item->premium == '1'){
$premium->push($item);
}
else {
$free->push($item);
}
});
And your items will be filtered into the relevant Collection. Be sure you use the Collection class at the top.

The only reason I can think to keep it as separate queries would be if you need to paginate the data - not something you can do easily if its all mixed together.
A "short cut" would be to use the collection filter() method, I put short cut in quotes because it's not short per-se, more syntatic sugar - but Larvel is nothing if not full of sugar so why not?
Code would look something like this:
$allThemes = \App\Theme::all();
$premiumThemes = $allThemes->filter(function($theme)
{
return $theme->premium;
});
$freeThemes = $allThemes->filter(function($theme)
{
return !$theme->premium;
});
Edit: I'd recommend using Matt Burrow's answer, but I'll leave mine here as the solution is different.

Mongodb like statement with array

I am trying to save some db action by compiling a looped bit of code with a single query, Before I was simply adding to the the like statements using a loop before firing off the query but i cant get the same idea going in Mongo, id appreciate any ideas....
I am basically trying to do a like, but with the value as an array
('app', replaces 'mongodb' down to my CI setup )
Here's how I was doing it pre mongofication:
foreach ($workids as $workid):
$this->ci->app->or_like('work',$workid) ;
endforeach;
$query = $this->ci->db->get("who_users");
$results = $query->result();
print_r($results);
and this is how I was hoping I could get it to work, but no joy here, that function is only designed to accept strings
$query = $this->ci->app->like('work',$workids,'.',TRUE,TRUE)->get("who_users");
print_r($query);
If anyone can think of a way any cunning methods I can get my returned array with a single call again it would be great I've not found any documentation on this sort of query, The only way i can think of is to loop over the query and push it into a new results array.... but that is really gonna hurt if my app scales up.

Are you using codeigniter-mongodb-library? Based on the existing or_like() documentation, it looks like CI wraps each match with % wildcards. The equivalent query in Mongo would be a series of regex matches in an $or clause:
db.who_users.find({
$or: [
{ work: /.*workIdA.*/ },
{ work: /.*workIdB.*/ },
...
]});
Unfortunately, this is going to be quite inefficient unless (1) the work field is indexed and (2) your regexes are anchored with some constant value (e.g. /^workId.*/). This is described in more detail in Mongo's regex documentation.
Based on your comments to the OP, it looks like you're storing multiple ID's in the work field as a comma-delimited string. To take advantage of Mongo's schema, you should model this as an array of strings. Thereafter, when you query on the work field, Mongo will consider all values in the array (documented discussed here).
db.who_users.find({
work: "workIdA"
});
This query would match a record whose work value was ["workIdA", "workIdB"]. And if we need to search for one of a set of ID's (taking this back to your OR query), we can extend this example with the $in operator:
db.who_users.find({
work: { $in: ["workIdA", "workIdB", ...] }
});
If that meets your needs, be sure to index the work field as well.

Searching numbers with Zend_Search_Lucene

So why does the first search example below return no results? And any ideas on how to modify the below code to make number searches possible would be much appreciated.
Create the index
$index = new Zend_Search_Lucene('/myindex', true);
$doc->addField(Zend_Search_Lucene_Field::Text('ssn', '123-12-1234'));
$doc->addField(Zend_Search_Lucene_Field::Text('cats', 'Fluffy'));
$index->addDocument($doc);
$index->commit();
Search - NO RESULTS
$index = new Zend_Search_Lucene('/myindex', true);
$results = $index->find('123-12-1234');
Search - WITH RESULTS
$index = new Zend_Search_Lucene('/myindex', true);
$results = $index->find('Fluffy');

First you need to change your text analizer to include numbers
Zend_Search_Lucene_Analysis_Analyzer::setDefault( new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum() );
Then for fields with numbers you want to use Zend_Search_Lucene_Field::Keyword instead of Zend_Search_Lucene_Field::Text
this will skip the the creation of tokens and saves the value 'as is' into the index. Then you can search by it. I don't know how it behaves with floats ( is probably not going to work for floats 3.0 is not going to match 3) but for natural numbers ( like ids ) works like a charm.

This is an effect of which Analyzer you have chosen.
I believe the default Analyzer will only index terms that match /[a-zA-Z]+/. This means that your SSN isn't being added to the index as a term.
Even if you switched to the text+numeric case insensitive Analyzer, what you are wanting still will not work. The expression for a term is /[a-zA-Z0-9]+/ this would mean your terms added to the index would be 12,123,1234.
If you need 123-12-1234 to be seen as a valid term, you are probably going to need to extend Zend_Search_Lucene_Analysis_Analyzer_Common and make it so that 123-12-1234 is a term.
See
http://framework.zend.com/manual/en/zend.search.lucene.extending.html#zend.search.lucene.extending.analysis
Your other choice is to store the ssn as a Zend_Search_Lucene_Field::Keyword. Since a keyword is not broken up into terms.
http://framework.zend.com/manual/en/zend.search.lucene.html#zend.search.lucene.index-creation.understanding-field-types

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Find a sentence that contains a word in text from database - php

Related

PHP Questions. Loops or If statement?

How to highlight search-matching text on a web page

Laravel - Single query then splitting Vs Two queries

Mongodb like statement with array

Searching numbers with Zend_Search_Lucene

Categories

Resources