PHP check if two keywords occur in String - php

My challenge, explained with the following example: the keyword combination "gaming notebook" is given.
I want to check whether the two keywords occur in a string. The challenge is that the string could look like this:
"Nice Gaming Notebook"
"Notebook for Gaming"
"Notebook for extreme Gaming"
I want my function to return true for all three strings. Up to 3-4 words may appear between the two keywords and, as the examples show, it should also work when the keywords appear in reverse order.
So my approach was the following, but it does not seem to work:
$keyword = strtolower("gaming notebook");
$parts = explode(" ", $keyword);
$string = strtolower("Which notebook for good gaming performance");
//point to end of the array
end($parts);
//fetch key of the last element of the array.
$lastElementKey = key($parts);
//iterate the array
$searchExpression = "";
foreach($parts as $k => $v) {
if($k != $lastElementKey) {
$searchExpression .= $v . "|";
} else {
$searchExpression .= $v;
}
}
if(preg_match_all('#\b('. $searchExpression .')\b#', $string, $matches) > 0) {
echo "Jep, keyword combination is in string";
} else {
echo "No, keyword combination is not in string";
}

You want to use something like CMU Sphinx or a natural language index in your database. (See http://dev.mysql.com/doc/refman/5.7/en/fulltext-natural-language.html) Doing a quick search of php libraries turned up "nlp-tools/nlp-tools," however, I have never used a pure php solution to accomplish what you are trying to do.

The solution using preg_match_all and array_intersect functions:
$keywordStr = "gaming notebook";
$string = "Which notebook for good gaming performance,it's my notebook";
$keywords = explode(" ", $keywordStr);
$parts = implode("|", $keywords);
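// group the alternatives so that \b applies to every keyword, not only the first and last
preg_match_all("/\b(?:$parts)\b/i", $string, $matches);
// matched items should contain all needed keywords (compared in lower case, since the match is case-insensitive)
$found = array_map('strtolower', $matches[0]);
if (count($keywords) == count(array_intersect($keywords, $found))) {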
echo "Jep, keyword combination is in string";
} else {
echo "No, keyword combination is not in string";
}

<?php
$keyword = strtolower("gaming notebook");
$string = strtolower("Which notebooks for good gaming performance");
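function check($keyword,$string){
$parts = explode(' ',$keyword);
$pattern = implode('|',$parts);
// group the alternatives so that \b applies to every keyword
preg_match_all("/\b(?:{$pattern})\b/",$string,$matches);
// $matches[0] is always set, so isset() cannot tell whether anything matched;
// return true only when every keyword was found at least once
return count(array_unique($matches[0])) === count(array_unique($parts));
}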
var_dump(check($keyword, $string));

$reg = "/(?:\b$kw1(?:\s+\w+){0,4}\s+$kw2\b)|(?:\b$kw2(?:\s+\w+){0,4}\s+$kw1\b)/";
if (preg_match($reg, $string)) {
echo "OK\n";
} else {
echo "KO\n";
}
This will echo OK when the 2 keywords occur in the string, in any order and separated by at most 4 words.
Explanation:
/
(?: : non capture group
\b$kw1 : keyword 1
(?:\s+\w+){0,4} : followed by 0 to 4 other words
\s+ : space(s)
$kw2\b : keyword 2
)
|
(?: : non capture group
\b$kw2 : keyword 2
(?:\s+\w+){0,4} : followed by 0 to 4 other words
\s+ : space(s)
$kw1\b : keyword 1
)
/
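The snippet above leaves $kw1 and $kw2 undefined. Here is a minimal sketch of how it could be wrapped up, assuming a hypothetical helper name keywordsNear() and a $maxGap parameter that are not part of the original answer. It also escapes the keywords with preg_quote() and adds the i modifier so the caller does not have to lowercase anything first.

```php
<?php
// Hypothetical helper (name and parameters are assumptions, not from the answer above):
// true when both keywords occur within $maxGap words of each other, in either order.
function keywordsNear(string $kw1, string $kw2, string $string, int $maxGap = 4): bool
{
    // Escape the keywords so characters such as "+" or "." are matched literally.
    $a = preg_quote($kw1, '/');
    $b = preg_quote($kw2, '/');

    // 0..$maxGap intermediate words, then the whitespace before the second keyword.
    $gap = '(?:\s+\w+){0,' . $maxGap . '}\s+';

    $reg = '/\b' . $a . $gap . $b . '\b|\b' . $b . $gap . $a . '\b/i';

    return preg_match($reg, $string) === 1;
}

var_dump(keywordsNear('gaming', 'notebook', 'Nice Gaming Notebook'));        // bool(true)
var_dump(keywordsNear('gaming', 'notebook', 'Notebook for extreme Gaming')); // bool(true)
var_dump(keywordsNear('gaming', 'notebook', 'notebook a b c d e gaming'));   // bool(false)
```

The last call returns false because five words sit between the keywords, one more than the default gap allows.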

Related

Pattern Matching for a value in tuple PHP ( Regular Expressions? )

I'm having a really hard time understanding RegEx in general, so I have no clue how to use it for a problem like this.
So here we have a tuple
$tuple = "(12342,43244)";
And what I try to do is get:
$value_one = 12342;
So from (value_one,value_two) get value_one.
I know it is possible with explode( ',', $tuple ) and then removing the first character '(' from the first element of the exploded array, but that seems super sloppy. Is there a way to pattern match in this manner in PHP?
Here is the simplest preg_match example with the \(([0-9]+) regex that matches a (, and captures into Group 1 one or more digits from 0 to 9 range:
$tuple = "(12342,43244)";
if (preg_match('~\(([0-9]+)~', $tuple, $m))
{
echo $m[1];
}
Wrapped into a function:
function retFirstDigitChunk($input) {
if (preg_match('~\(([0-9]+)~', $input, $m)) {
return $m[1];
} else {
return "";
}
}
Or, to get both as an array:
function retValues($input) {
if (preg_match('~\((-?[0-9]+)\s*,\s*(-?[0-9]+)~', $input, $m)) {
return array('left'=>$m[1], 'right'=>$m[2]);
} else {
return "";
}
}
$tuple = "(12342,43244)";
print_r(retValues($tuple));
Output: Array( [left] => 12342 [right] => 43244 )
You have to search for the number preceded by an opening parenthesis and followed by a comma. For example:
$value_one = preg_replace('/\((\d+),.*/', '$1', $tuple);
If you are looking for something efficient, try to avoid the use of regex when possible:
$result = explode(',', ltrim($tuple, '('))[0];
or
sscanf($tuple, '(%[^,]', $result);
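If both values are needed, sscanf() can also read them as integers in one call. This is only a sketch: the helper name parseTuple() is made up for illustration, and it assumes the tuple always holds two integers.

```php
<?php
// Hypothetical helper: parse "(a,b)" into two integers,
// or return null when the input does not have that shape.
function parseTuple(string $tuple): ?array
{
    // %d reads an optionally signed integer. When result variables are passed,
    // sscanf() returns the number of values it assigned.
    $count = sscanf($tuple, '(%d,%d)', $left, $right);

    return $count === 2 ? [$left, $right] : null;
}

print_r(parseTuple('(12342,43244)'));
var_dump(parseTuple('not a tuple')); // NULL
```

Unlike the regex versions, this returns integers rather than strings, which may or may not be what you want.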

Cannot read the first letter

I want to add a function to return whether the first letter is a capital or not from my last question.
Here's the code:
<?php
function isCapital($string) {
return $string = preg_match('/[A-Z]$/',$string{0});
}
$text = " Poetry. do you read poetry while flying? Many people find it relaxing to read on long flights. Poetry can be divided into several genres, or categories. ";
$sentences = explode(".", $text); $save = array();
foreach ($sentences as $sentence) {
if (count(preg_split('/\s+/', $sentence)) > 6) {
$save[] = $sentence. ".";
}
}
if( count( $save) > 0) {
foreach ($save as $nama){
if (isCapital($nama)){
print_r ($nama);
}
}
}
?>
The result should be...
Poetry can be divided into several genres, or categories.
...but it prints nothing. I need only the sentence that consists of more than 6 words and start with capital letter.
When you do the explode() function, you are leaving a space at the start of the string, which means that the leftmost character of $string will never be a capital letter--it will be a space. I would change the isCapital() function to the following:
function isCapital($string) {
return preg_match('/^\\s*[A-Z]/', $string) > 0;
}
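Putting that fix back into the loop from the question could look roughly like this. The wrapper function longCapitalSentences() and its $minWords parameter are my own names, added only so the snippet is self-contained. It also trims each sentence and uses PREG_SPLIT_NO_EMPTY so the leading space is not counted as a word.

```php
<?php
function isCapital(string $string): bool
{
    // Allow leading whitespace before the first letter.
    return preg_match('/^\s*[A-Z]/', $string) === 1;
}

// Hypothetical wrapper around the loop from the question.
function longCapitalSentences(string $text, int $minWords = 6): array
{
    $save = [];
    foreach (explode('.', $text) as $sentence) {
        $sentence = trim($sentence);
        // PREG_SPLIT_NO_EMPTY avoids counting empty strings as words.
        $words = preg_split('/\s+/', $sentence, -1, PREG_SPLIT_NO_EMPTY);
        if (count($words) > $minWords && isCapital($sentence)) {
            $save[] = $sentence . '.';
        }
    }
    return $save;
}

$text = " Poetry. do you read poetry while flying? Many people find it relaxing to read on long flights. Poetry can be divided into several genres, or categories. ";
print_r(longCapitalSentences($text));
```

With the text from the question, the only entry left is "Poetry can be divided into several genres, or categories."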
You should be able to accomplish all of this through one regular expression, if you're so inclined:
preg_match_all('/((?=[A-Z])([^\s.!?]+\s+){5,}[^\s.!?]+[.!?])/', $string, $matches);
http://refiddle.com/2hz
Alternatively, remove the ! and ? from the character classes to only count . as a sentence delimiter.

PHP preg_match meaning and issue

Currently I have this code:
<?php
if (isset($_GET['id'])) {
$itemid = $_GET['id'];
$search = "$itemid";
$query = ucwords($search);
$string = file_get_contents('http://clubpenguincheatsnow.com/tools/newitemdatabase/items.php');
if($itemid=="")
{
echo "Please fill out the form.";
}
else
{
$string = explode('<br>',$string);
foreach($string as $row)
{
preg_match('/^(\D+)\s=\s(\d+)\s=\s(\D+)\s=\s(\d+)/', trim($row), $matches);
if(strstr($matches[1], $query))
{
echo "<a href='http://clubpenguincheatsnow.com/tools/newitemdatabase/info.php?id=$matches[2]'>";
echo $matches[1];
echo "</a><br>";
}
}
if($matches[1]=="")
{
echo "Item does not exist!";
}
}
}
else {
echo "Item does not exist!";
}
?>
What I want to know is what does this section mean? preg_match('/^(\D+)\s=\s(\d+)\s=\s(\D+)\s=\s(\d+)/', trim($row), $matches); mainly the /^(\D+)\s=\s(\d+)\s=\s(\D+)\s=\s(\d+)/ part is what I am wondering about.
Also, how can I allow it to handle numbers too? I have another file that has the data (http://clubpenguincheatsnow.com/tools/newitemdatabase/items.php) and I want it to grab everything, even the names that contain numbers.
How do I do this though? Please help me! Any help would be VERY HIGHLY appreciated!
That is a regular expression.
The '^' matches the beginning of a string.
The '\D' matches any character that is not a digit.
The '\d' matches any digit.
The '\s' matches any whitespace.
The plus sign means that the preceding element must occur one or more times.
So basically it would match all those lines in your file, except that last comma.
Blue = 1 = No = 20
That line would match the regex.
About your last question to allow numbers too, use this:
/^(.+)\s=\s(\d+)\s=\s(\D+)\s=\s(\d+)/
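To see the difference between the two patterns, here is a small sketch. The row "Blue 2 = 1 = No = 20" is an invented example of a name containing a digit (I have not checked the real data file), and itemName() is a hypothetical helper.

```php
<?php
$strict  = '/^(\D+)\s=\s(\d+)\s=\s(\D+)\s=\s(\d+)/'; // pattern from the question
$relaxed = '/^(.+)\s=\s(\d+)\s=\s(\D+)\s=\s(\d+)/';  // pattern suggested above

// Hypothetical helper: return the item name captured by $pattern,
// or null when the row does not match at all.
function itemName(string $pattern, string $row): ?string
{
    return preg_match($pattern, trim($row), $m) === 1 ? $m[1] : null;
}

var_dump(itemName($strict, 'Blue = 1 = No = 20'));    // string(4) "Blue"
var_dump(itemName($strict, 'Blue 2 = 1 = No = 20'));  // NULL
var_dump(itemName($relaxed, 'Blue 2 = 1 = No = 20')); // string(6) "Blue 2"
```

The strict pattern rejects the second row because \D+ cannot step over the digit in the name, while .+ can.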
The code is a regular expression:
/^(\D+)\s=\s(\d+)\s=\s(\D+)\s=\s(\d+)/
The code uses the regular expression to cut the string into pieces and put them in an array ($matches):
preg_match('/^(\D+)\s=\s(\d+)\s=\s(\D+)\s=\s(\d+)/', trim($row), $matches);
To see the result more clearly, print the array:
print_r($matches)
To find by name or by item number, change the code
if(strstr($matches[1], $query))
to
if(isset($matches[1]) && (strstr($matches[1], $query) || $matches[2] == $query) )
Your code should look like this:
if (isset($_GET['id'])) {
$itemid = $_GET['id'];
$search = "$itemid";
$query = ucwords($search);
$string = file_get_contents('http://clubpenguincheatsnow.com/tools/newitemdatabase/items.php');
if($itemid=="")
{
echo "Please fill out the form.";
}
else
{
$string = explode('<br>',$string);
foreach($string as $row)
{
preg_match('/^(\D+)\s=\s(\d+)\s=\s(\D+)\s=\s(\d+)/', trim($row), $matches);
if(isset($matches[1]) && (strstr($matches[1], $query) || $matches[2] == $query) )
{
echo "<a href='http://clubpenguincheatsnow.com/tools/newitemdatabase/info.php?id=$matches[2]'>";
echo $matches[1];
echo "</a><br>";
}
}
}
}
else {
echo "Item does not exist!";
}
/^(\D+)\s=\s(\d+)\s=\s(\D+)\s=\s(\d+)/
This regular expression will match one or more non-digit characters, followed by a whitespace character, followed by an equals sign, and so on. For example, this line:
asd = 1 = yh = 23
To allow numbers in the names:
/^(\w+)\s=\s(\d+)\s=\s(\w+)\s=\s(\d+)/
To allow letters and numbers in every field:
/^(\w+)\s=\s(\w+)\s=\s(\w+)\s=\s(\w+)/
To include spaces and ' too:
/^([\w\s']+)\s=\s([\w\s']+)\s=\s([\w\s']+)\s=\s([\w\s']+)/
The code, as said by Sena, is a regular expression. It is capturing four groups with "=" in between them.
group 1: (\D+) : any character that is not a digit one or more times
group 2: (\d+) : any character that is a digit one or more times
group 3: (\D+) : same as one
group 4: (\d+) : same as two.
So, it will match something like this: a = 1 = bc = 2
So it already matches numbers in the second and fourth fields. What do you want it to do? Try print_r($matches) as suggested above.

Create acronym from a string containing only words

I'm looking for a way that I can extract the first letter of each word from an input field and place it into a variable.
Example: if the input field is "Stack-Overflow Questions Tags Users" then the output for the variable should be something like "SOQTU"
$s = 'Stack-Overflow Questions Tags Users';
echo preg_replace('/\b(\w)|./', '$1', $s);
The same as codaddict's, but shorter.
For unicode support, add the u modifier to regex: preg_replace('...../u',
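Spelled out, the u-modifier version might look like the sketch below. The function name acronym() is just for illustration. It assumes the input is valid UTF-8, because preg_replace() returns null otherwise.

```php
<?php
// Hypothetical wrapper around the preg_replace() pattern above, with the u modifier added.
function acronym(string $s): string
{
    // Keep the first word character after each word boundary and drop every other character.
    // preg_replace() returns null on invalid UTF-8 when /u is used, so fall back to ''.
    return preg_replace('/\b(\w)|./u', '$1', $s) ?? '';
}

echo acronym('Stack-Overflow Questions Tags Users'), "\n"; // SOQTU
echo acronym('Jean-Claude Vandàmme'), "\n";                // JCV
```

Note that this keeps the case of the input, so wrap the result in strtoupper() (or mb_strtoupper() for non-ASCII letters) if you need capitals.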
Something like:
$s = 'Stack-Overflow Questions Tags Users';
if(preg_match_all('/\b(\w)/',strtoupper($s),$m)) {
$v = implode('',$m[1]); // $v is now SOQTU
}
I'm using the regex \b(\w) to match the word-char immediately following the word boundary.
EDIT:
To ensure all the acronym characters are uppercase, you can use strtoupper as shown.
Just to be completely different:
$input = 'Stack-Overflow Questions Tags Users';
$acronym = implode('',array_diff_assoc(str_split(ucwords($input)),str_split(strtolower($input))));
echo $acronym;
$initialism = preg_replace('/\b(\w)\w*\W*/', '\1', $string);
If the words are separated only by spaces and nothing else, this is how you can do it:
function acronym($longname)
{
$letters=array();
$words=explode(' ', $longname);
foreach($words as $word)
{
$word = (substr($word, 0, 1));
array_push($letters, $word);
}
$shortname = strtoupper(implode($letters));
return $shortname;
}
Regular expression matching as codaddict says above, or str_word_count() with 1 as the second parameter, which returns an array of found words. See the examples in the manual. Then you can get the first letter of each word any way you like, including substr($word, 0, 1)
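The str_word_count() function might do what you are looking for. Note that it treats a hyphenated word such as Stack-Overflow as a single word, so the code below yields SQTU rather than SOQTU: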
$words = str_word_count ('Stack-Overflow Questions Tags Users', 1);
$result = "";
for ($i = 0; $i < count($words); ++$i)
$result .= $words[$i][0];
function initialism($str, $as_space = array('-'))
{
$str = str_replace($as_space, ' ', trim($str));
$ret = '';
foreach (explode(' ', $str) as $word) {
$ret .= strtoupper($word[0]);
}
return $ret;
}
$phrase = 'Stack-Overflow Questions IT Tags Users Meta Example';
echo initialism($phrase);
// SOQITTUME
$s = "Stack-Overflow Questions IT Tags Users Meta Example";
$sArr = explode(' ', ucwords(strtolower($s)));
$sAcr = "";
foreach ($sArr as $key) {
$firstAlphabet = substr($key, 0,1);
$sAcr = $sAcr.$firstAlphabet ;
}
Using the answer from @codaddict.
I also considered the case where the input is already an abbreviation, e.g. DPR rather than Development Petroleum Resources. Such a word would be reduced to just D, which doesn't make much sense.
function AbbrWords($str,$amt){
// both branches of the original if/else returned the same value,
// so the length check is not needed
return substr($str,0,$amt);
}
function AbbrSent($str,$amt){
if(preg_match_all('/\b(\w)/',strtoupper($str),$m)) {
$v = implode('',$m[1]); // $v is now SOQTU
if(strlen($v) < 2){
if(strlen($str) < 5){
return $str;
}else{
return AbbrWords($str,$amt);
}
}else{
return AbbrWords($v,$amt);
}
}
}
As an alternative to @user187291's preg_replace() pattern, here is the same functionality without needing a reference in the replacement string.
It works by matching the first word character, then forgetting it with \K, then matching zero or more word characters, then matching zero or more non-word characters. This consumes all of the unwanted characters and leaves only the first character of each word. This is ideal because there is no need to implode an array of matches. The u modifier ensures that accented/multibyte characters are treated as whole characters by the regex engine.
Code:
$tests = [
'Stack-Overflow Questions Tags Users',
'Stack Overflow Close Vote Reviewers',
'Jean-Claude Vandàmme'
];
var_export(
preg_replace('/\w\K\w*\W*/u', '', $tests)
);
Output:
array (
0 => 'SOQTU',
1 => 'SOCVR',
2 => 'JCV',
)

determine if a string contains one of a set of words in an array

I need a simple word filter that will kill a script if it detects a filtered word in a string.
Say my words are as below:
$showstopper = array('badword1', 'badword2', 'badword3', 'badword4');
$yourmouth = "im gonna badword3 you up";
if(something($yourmouth, $showstopper)){
//stop the show
}
You could implode the array of badwords into a regular expression, and see if it matches against the haystack. Or you could simply cycle through the array, and check each word individually.
From the comments:
$re = "/(" . implode("|", $showstopper) . ")/"; // '/(badword1|badword2)/'
if (preg_match($re, $yourmouth) > 0) { die("foulmouth"); }
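One thing to watch with the imploded regex: if a word in the list contains characters like "." or "+", they will be interpreted as regex syntax, and without word boundaries a short bad word also matches inside longer harmless words. A rough sketch that handles both follows. The function name containsBadWord() is hypothetical, and whether you want whole-word and case-insensitive matching is an assumption on my part.

```php
<?php
// Hypothetical helper: true when any listed word appears as a whole word in $text.
function containsBadWord(string $text, array $badWords): bool
{
    // An empty list would produce a pattern that matches almost anything, so bail out early.
    if (count($badWords) === 0) {
        return false;
    }

    // Escape each word so regex metacharacters in the list are matched literally.
    $escaped = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $badWords);

    $re = '/\b(?:' . implode('|', $escaped) . ')\b/i';

    return preg_match($re, $text) === 1;
}

$showstopper = array('badword1', 'badword2', 'badword3', 'badword4');
if (containsBadWord("im gonna badword3 you up", $showstopper)) {
    echo "foulmouth\n"; // the original calls die() here
}
```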
in_array() is your friend
$yourmouth_array = explode(' ',$yourmouth);
foreach($yourmouth_array as $key=>$w){
if (in_array($w,$showstopper)){
// stop the show, like, replace that element with '***'
$yourmouth_array[$key]= '***';
}
}
$yourmouth = implode(' ',$yourmouth_array);
You might want to benchmark this vs the foreach and preg_match approaches.
$showstopper = array('badword1', 'badword2', 'badword3', 'badword4');
$yourmouth = "im gonna badword3 you up";
$check = str_replace($showstopper, '****', $yourmouth, $count);
if($count > 0) {
//stop the show
}
A fast solution involves checking the key as this does not need to iterate over the array. It would require a modification of your bad words list, however.
$showstopper = array('badword1' => 1, 'badword2' => 1, 'badword3' => 1, 'badword4' => 1);
$yourmouth = "im gonna badword3 you up";
// split words on space
$words = explode(' ', $yourmouth);
foreach($words as $word) {
// filter extraneous characters out of the word
$word = preg_replace('/[^A-Za-z0-9]*/', '', $word);
// check for bad word match
if (isset($showstopper[$word])) {
die('game over');
}
}
The preg_replace ensures users don't abuse your filter by typing something like bad_word3. It also ensures the array key check doesn't bomb.
Not sure why you would need to do this, but here's a way to check and get the bad words that were used:
$showstopper = array('badword1', 'badword2', 'badword3', 'badword4');
$yourmouth = "im gonna badword3 you up badword1";
function badWordCheck( $var ) {
global $yourmouth;
// strpos() returns 0 for a match at the start of the string, so compare against false
return strpos($yourmouth, $var) !== false;
}
print_r(array_filter($showstopper, 'badWordCheck'));
array_filter() returns an array of the bad words that were found, so if its count() is 0, nothing bad was said.
