Magento: improving search engine (inflections, irrelevant words removal, etc.)

Magento: improving search engine (inflections, irrelevant words removal, etc.) - php

I'm interested in knowing if I can detect inflections (e.g. dogs/dog), remove non-important words ("made in the usa" -> "in" and "the" are not important), etc. in the search string entered by the user for the Magento search engine without hard-coding such many scenarios in one big PHP code block. I can process this search string to a certain degree, but it will look unsanitary and ugly.
Any suggestions or pointers for making it an "intelliegent" search engine?

Use this class:
class Inflection
{
static $plural = array(
'/(quiz)$/i' => "$1zes",
'/^(ox)$/i' => "$1en",
'/([m|l])ouse$/i' => "$1ice",
'/(matr|vert|ind)ix|ex$/i' => "$1ices",
'/(x|ch|ss|sh)$/i' => "$1es",
'/([^aeiouy]|qu)y$/i' => "$1ies",
'/(hive)$/i' => "$1s",
'/(?:([^f])fe|([lr])f)$/i' => "$1$2ves",
'/(shea|lea|loa|thie)f$/i' => "$1ves",
'/sis$/i' => "ses",
'/([ti])um$/i' => "$1a",
'/(tomat|potat|ech|her|vet)o$/i'=> "$1oes",
'/(bu)s$/i' => "$1ses",
'/(alias)$/i' => "$1es",
'/(octop)us$/i' => "$1i",
'/(ax|test)is$/i' => "$1es",
'/(us)$/i' => "$1es",
'/s$/i' => "s",
'/$/' => "s"
);
static $singular = array(
'/(quiz)zes$/i' => "$1",
'/(matr)ices$/i' => "$1ix",
'/(vert|ind)ices$/i' => "$1ex",
'/^(ox)en$/i' => "$1",
'/(alias)es$/i' => "$1",
'/(octop|vir)i$/i' => "$1us",
'/(cris|ax|test)es$/i' => "$1is",
'/(shoe)s$/i' => "$1",
'/(o)es$/i' => "$1",
'/(bus)es$/i' => "$1",
'/([m|l])ice$/i' => "$1ouse",
'/(x|ch|ss|sh)es$/i' => "$1",
'/(m)ovies$/i' => "$1ovie",
'/(s)eries$/i' => "$1eries",
'/([^aeiouy]|qu)ies$/i' => "$1y",
'/([lr])ves$/i' => "$1f",
'/(tive)s$/i' => "$1",
'/(hive)s$/i' => "$1",
'/(li|wi|kni)ves$/i' => "$1fe",
'/(shea|loa|lea|thie)ves$/i'=> "$1f",
'/(^analy)ses$/i' => "$1sis",
'/((a)naly|(b)a|(d)iagno|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$/i' => "$1$2sis",
'/([ti])a$/i' => "$1um",
'/(n)ews$/i' => "$1ews",
'/(h|bl)ouses$/i' => "$1ouse",
'/(corpse)s$/i' => "$1",
'/(us)es$/i' => "$1",
'/s$/i' => ""
);
static $irregular = array(
'move' => 'moves',
'foot' => 'feet',
'goose' => 'geese',
'sex' => 'sexes',
'child' => 'children',
'man' => 'men',
'tooth' => 'teeth',
'person' => 'people',
'admin' => 'admin'
);
static $uncountable = array(
'sheep',
'fish',
'deer',
'series',
'species',
'money',
'rice',
'information',
'equipment'
);
public static function pluralize( $string )
{
global $irregularWords;
// save some time in the case that singular and plural are the same
if ( in_array( strtolower( $string ), self::$uncountable ) )
return $string;
// check for irregular singular forms
foreach ( $irregularWords as $pattern => $result )
{
$pattern = '/' . $pattern . '$/i';
if ( preg_match( $pattern, $string ) )
return preg_replace( $pattern, $result, $string);
}
// check for irregular singular forms
foreach ( self::$irregular as $pattern => $result )
{
$pattern = '/' . $pattern . '$/i';
if ( preg_match( $pattern, $string ) )
return preg_replace( $pattern, $result, $string);
}
// check for matches using regular expressions
foreach ( self::$plural as $pattern => $result )
{
if ( preg_match( $pattern, $string ) )
return preg_replace( $pattern, $result, $string );
}
return $string;
}
public static function singularize( $string )
{
global $irregularWords;
// save some time in the case that singular and plural are the same
if ( in_array( strtolower( $string ), self::$uncountable ) )
return $string;
// check for irregular words
foreach ( $irregularWords as $result => $pattern )
{
$pattern = '/' . $pattern . '$/i';
if ( preg_match( $pattern, $string ) )
return preg_replace( $pattern, $result, $string);
}
// check for irregular plural forms
foreach ( self::$irregular as $result => $pattern )
{
$pattern = '/' . $pattern . '$/i';
if ( preg_match( $pattern, $string ) )
return preg_replace( $pattern, $result, $string);
}
// check for matches using regular expressions
foreach ( self::$singular as $pattern => $result )
{
if ( preg_match( $pattern, $string ) )
return preg_replace( $pattern, $result, $string );
}
return $string;
}
public static function pluralize_if($count, $string)
{
if ($count == 1)
return "1 $string";
else
return $count . " " . self::pluralize($string);
}
}
And if you have a time use a standard way for inflection usage: http://en.wikipedia.org/wiki/Inflection
You can as array combine with XML so put all inflections data, look at how codeigniter has inflection very friendly: http://ellislab.com/codeigniter/user-guide/helpers/inflector_helper.html
Many frameworks supports built-in inflections but it will focus only in mainly English only. For other languages you should write own... or use unicode.org with some inflections standards for other languages if you need it.

Related

PHP get array of search terms from string

Is there an easy way to parse a string for search terms including negative terms?
'this -that "the other thing" -"but not this" "-positive"'
would change to
array(
"positive" => array(
"this",
"the other thing",
"-positive"
),
"negative" => array(
"that",
"but not this"
)
)
so those terms could be used to search.

The code below will parse your query string and split it up into positive and negative search terms.
// parse the query string
$query = 'this -that "-that" "the other thing" -"but not this" ';
preg_match_all('/-*"[^"]+"|\S+/', $query, $matches);
// sort the terms
$terms = array(
'positive' => array(),
'negative' => array(),
);
foreach ($matches[0] as $match) {
if ('-' == $match[0]) {
$terms['negative'][] = trim(ltrim($match, '-'), '"');
} else {
$terms['positive'][] = trim($match, '"');
}
}
print_r($terms);
Output
Array
(
[positive] => Array
(
[0] => this
[1] => -that
[2] => the other thing
)
[negative] => Array
(
[0] => that
[1] => but not this
)
)

For those looking for the same thing I have created a gist for PHP and JavaScript
https://gist.github.com/UziTech/8877a79ebffe8b3de9a2
function getSearchTerms($search) {
$matches = null;
preg_match_all("/-?\"[^\"]+\"|-?'[^']+'|\S+/", $search, $matches);
// sort the terms
$terms = [
"positive" => [],
"negative" => []
];
foreach ($matches[0] as $i => $match) {
$negative = ("-" === $match[0]);
if ($negative) {
$match = substr($match, 1);
}
if (($match[0] === '"' && substr($match, -1) === '"') || ($match[0] === "'" && substr($match, -1) === "'")) {
$match = substr($match, 1, strlen($match) - 2);
}
if ($negative) {
$terms["negative"][] = $match;
} else {
$terms["positive"][] = $match;
}
}
return $terms;
}

Create shortcode with parameter in PHP Joomla

I've created a simple shortcode plugin on Joomla.
Actually I am trying to integrate Cleeng Video with Joomla. And will connect it's users in the future ( I hope ).
I've stack on creating shortcode's parameter. I don't know how to parse it's parameter and value.
My Shortcode is here (no parameter)
{cleengvideo}<iframe class="wistia_embed" src="http://fast.wistia.net/embed/iframe/5r8r9ib6di" name="wistia_embed" width="640" height="360" frameborder="0" scrolling="no" allowfullscreen=""></iframe>{/cleengvideo}
My code is here
public function onContentPrepare($content, $article, $params, $limit) {
preg_match_all('/{cleengvideo}(.*?){\/cleengvideo}/is', $article->text, $matches);
$i = 0;
foreach ($matches[0] as $match) {
$videoCode = $matches[1][$i];
$article->text = str_replace($match, $videoCode, $article->text);
}
I want to set height, width and 5r8r9ib6di this code from shortcode at least.
Please can anyone help me with adding and parsing it's parameter

To get a parameter, you can simply use the following code:
$params->get('param_name', 'default_value');
So for example, in your XML file, if you had a field like so:
<field name="width" type="text" label="Width" default="60px" />
you would call the parameter like so:
$params->get('width', '60px');
Note that you don't have to add the default value as the second string, however I always find it good practice.
Hope this helps

I think I could found it's solution.
It's here https://github.com/Cleeng/cleeng-wp-plugin/blob/master/php/classes/Frontend.php
Code is
$expr = '/\[cleeng_content(.*?[^\\\])\](.*?[^\\\])\[\/cleeng_content\]/is';
preg_match_all( $expr, $post->post_content, $m );
foreach ( $m[0] as $key => $content ) {
$paramLine = $m[1][$key];
$expr = '/(\w+)\s*=\s*(?:\"|")(.*?)(?<!\\\)(?:\"|")/si';
preg_match_all( $expr, $paramLine, $mm );
if ( ! isset( $mm[0] ) || ! count( $mm[0] ) ) {
continue;
}
$params = array( );
foreach ( $mm[1] as $key => $paramName ) {
$params[$paramName] = $mm[2][$key];
}
if ( ! isset( $params['id'] ) ) {
continue;
}
$content = array(
'contentId' => $params['id'],
'shortDescription' => #$params['description'],
'price' => #$params['price'],
'itemType' => 'article',
'purchased' => false,
'shortUrl' => '',
'referred' => false,
'referralProgramEnabled' => false,
'referralRate' => 0,
'rated' => false,
'publisherId' => '000000000',
'publisherName' => '',
'averageRating' => 4,
'canVote' => false,
'currencySymbol' => '',
'sync' => false
);
if ( isset( $params['referral'] ) ) {
$content['referralProgramEnabled'] = true;
$content['referralRate'] = $params['referral'];
}
if ( isset( $params['ls'] ) && isset( $params['le'] ) ) {
$content['hasLayerDates'] = true;
$content['layerStartDate'] = $params['ls'];
$content['layerEndDate'] = $params['le'];
}
$this->cleeng_content[$params['id']] = $content;
}

Hope this helps someone searching for shortcode parameters, for parameters in short code we can use preg_match_all like that
preg_match_all('/{cleengvideo(.*?)}(.*?){\/cleengvideo}/is', $article->text, $matches);
This will give a array with 3 array elements, the second array have the parameters which you can maupulate with codes.
Hope this helps.

Why does php str_replace with multiple arrays give wrong result, but for loop gives correct result?

I'm trying to replace the characters (numbers and letters) in a string. When I try the "php" way, it gives the wrong result for some of the characters. Why?
PHP-WAY:
$find = array( "0","1","2","3","4","5","6","7","8","9","a","b","c","d","e","f" );
$replace = array( "a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p" );
$haystack = "a5c9a06bfacf5f12cf01ab3f202f6c78"
//This incorrectly returns: kpmjkkglpkmppplmmpklklnpmkmpgmhi
echo str_replace( $find, $replace, $haystack );
LOOP WAY:
$find = array( "0","1","2","3","4","5","6","7","8","9","a","b","c","d","e","f" );
$replace = array( "a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p" );
$haystack = "a5c9a06bfacf5f12cf01ab3f202f6c78"
//This correctly returns: kfmjkaglpkmpfpbcmpabkldpcacpgmhi
$newStr = "";
$chars = str_split( $haystack );
for ( $i = 0, $length = count( $chars ); $i < $length; $i++ )
{
$newStr .= $replace[ array_search( $chars[ $i ], $find ) ];
}
echo $newStr;
Why is the first one incorrect? Am I using it wrong?

Order of entries in your arrays.... str_replace() will process each array entry in the order they appear in your array, so if a '1' gets replaced with 'b', then that 'b' will subsequently get replaced with 'l'; use strtr() rather than str_replace() if you want to prevent that behaviour.
$find = array( "0","1","2","3","4","5","6","7","8","9","a","b","c","d","e","f" );
$replace = array( "a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p" );
$haystack = "a5c9a06bfacf5f12cf01ab3f202f6c78" ;
echo strtr($haystack, array_combine($find, $replace));
Your own code only does a single replace because it's looping against your string, not against the from/to arrays.

Just use strtr
$haystack = "a5c9a06bfacf5f12cf01ab3f202f6c78" ;
echo strtr($haystack, implode($find), implode($replace));
Or preg_replace_callback
$find = array_flip($find);
echo preg_replace_callback('/[a-f0-9]/', function ($v) use($replace, $find) {
return $replace[$find[$v[0]]];
}, $haystack);
Output
kfmjkaglpkmpfpbcmpabkldpcacpgmhi

As specified by #MarkBaker, the answer is that str_replace does not simply move forward in the string, but instead works like a recursive .replace(). Instead, use strtr (which is equivalent to Linux tr command:
$tr = array( "0" => "a","1" => "b","2" => "c","3" => "d","4" => "e","5" => "f","6" => "g","7" => "h","8" => "i","9" => "j","a" => "k","b" => "l","c" => "m","d" => "n","e" => "o","f" => "p" );
$haystack = "a5c9a06bfacf5f12cf01ab3f202f6c78"
echo strtr( $haystack, $tr );

Parse RFC 822 compliant addresses in a TO header

I would like to parse an email address list (like the one in a TO header) with preg_match_all to get the user name (if exists) and the E-mail. Something similar to mailparse_rfc822_parse_addresses or Mail_RFC822::parseAddressList() from Pear, but in plain PHP.
Input :
"DOE, John \(ACME\)" <john.doe#somewhere.com>, "DOE, Jane" <jane.doe#somewhere.com>
Output :
array(
array(
'name' => 'DOE, John (ACME)',
'email' => 'john.doe#somewhere.com'
),
array(
'name' => 'DOE, Jane',
'email' => 'jane.doe#somewhere.com'
)
)
Don't need to support strange E-mail format (/[a-z0-9._%-]+#[a-z0-9.-]+.[a-z]{2,4}/i for email part is OK).
I can't use explode because the comma can appear in the name. str_getcsv doesn't work, because I can have:
DOE, John \(ACME\) <john.doe#somewhere.com>
as input.
Update:
For the moment, I've got this :
public static function parseAddressList($addressList)
{
$pattern = '/^(?:"?([^<"]+)"?\s)?<?([^>]+#[^>]+)>?$/';
if (preg_match($pattern, $addressList, $matches)) {
return array(
array(
'name' => stripcslashes($matches[1]),
'email' => $matches[2]
)
);
} else {
$parts = str_getcsv($addressList);
$result = array();
foreach($parts as $part) {
if (preg_match($pattern, $part, $matches)) {
$result[] = array(
'name' => stripcslashes($matches[1]),
'email' => $matches[2]
);
}
}
return $result;
}
}
but it fails on:
"DOE, \"John\"" <john.doe#somewhere.com>
I need to test on back reference the \" but I don't remember how to do this.

Finally I did it:
public static function parseAddressList($addressList)
{
$pattern = '/^(?:"?((?:[^"\\\\]|\\\\.)+)"?\s)?<?([a-z0-9._%-]+#[a-z0-9.-]+\\.[a-z]{2,4})>?$/i';
if (($addressList[0] != '<') and preg_match($pattern, $addressList, $matches)) {
return array(
array(
'name' => stripcslashes($matches[1]),
'email' => $matches[2]
)
);
} else {
$parts = str_getcsv($addressList);
$result = array();
foreach($parts as $part) {
if (preg_match($pattern, $part, $matches)) {
$item = array();
if ($matches[1] != '') $item['name'] = stripcslashes($matches[1]);
$item['email'] = $matches[2];
$result[] = $item;
}
}
return $result;
}
}
But I'm not sure it works for all cases.

I don't know that RFC, but if the format is always as you showed then you can try something like:
preg_match_all("/\"([^\"]*)\"\\s+<([^<>]*)>/", $string, $matches);
print_r($matches);

str_replace() with associative array

You can use arrays with str_replace():
$array_from = array ('from1', 'from2');
$array_to = array ('to1', 'to2');
$text = str_replace ($array_from, $array_to, $text);
But what if you have associative array?
$array_from_to = array (
'from1' => 'to1';
'from2' => 'to2';
);
How can you use it with str_replace()?
Speed matters - array is big enough.

$text = strtr($text, $array_from_to)
By the way, that is still a one dimensional "array."

$array_from_to = array (
'from1' => 'to1',
'from2' => 'to2'
);
$text = str_replace(array_keys($array_from_to), $array_from_to, $text);
The to field will ignore the keys in your array. The key function here is array_keys.

$text='yadav+RAHUL(from2';
$array_from_to = array('+' => 'Z1',
'-' => 'Z2',
'&' => 'Z3',
'&&' => 'Z4',
'||' => 'Z5',
'!' => 'Z6',
'(' => 'Z7',
')' => 'Z8',
'[' => 'Z9',
']' => 'Zx1',
'^' => 'Zx2',
'"' => 'Zx3',
'*' => 'Zx4',
'~' => 'Zx5',
'?' => 'Zx6',
':' => 'Zx7',
"'" => 'Zx8');
$text = strtr($text,$array_from_to);
echo $text;
//output is
yadavZ1RAHULZ7from2

$search = array('{user}', '{site}');
$replace = array('Qiao', 'stackoverflow');
$subject = 'Hello {user}, welcome to {site}.';
echo str_replace ($search, $replace, $subject);
Results in Hello Qiao, welcome to stackoverflow..
$array_from_to = array (
'from1' => 'to1',
'from2' => 'to2',
);
This is not a two-dimensional array, it's an associative array.
Expanding on the first example, where we place the $search as the keys of the array, and the $replace as it's values, the code would look like this.
$searchAndReplace = array(
'{user}' => 'Qiao',
'{site}' => 'stackoverflow'
);
$search = array_keys($searchAndReplace);
$replace = array_value($searchAndReplace);
# Our subject is the same as our first example.
echo str_replace ($search, $replace, $subject);
Results in Hello Qiao, welcome to stackoverflow..

$keys = array_keys($array);
$values = array_values($array);
$text = str_replace($key, $values, $string);

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Magento: improving search engine (inflections, irrelevant words removal, etc.) - php

Related

PHP get array of search terms from string

Create shortcode with parameter in PHP Joomla

Why does php str_replace with multiple arrays give wrong result, but for loop gives correct result?

Parse RFC 822 compliant addresses in a TO header

str_replace() with associative array

Categories

Resources