elastica scoring based on regular expression using mvel - php

I am new to Elasticsearch, and here is the scenario I am trying to solve.
I have a search input box that supports autosuggestion.
The results are fetched from an Elasticsearch index that uses an ngram filter.
I want to introduce scoring so that the results are ordered from the most important to the least important (depending on the score).
The score must be based on the following cases:
If there is a match that starts with the given string, set the score to 100.
If there is a match that contains the given string but does not start with it, set the score to 10.
For this purpose I implemented an Elastica script with MVEL statements to support regular expression matching. In other words, it checks whether the value on the left matches the regular expression on the right, and only then increments a variable accordingly. Unfortunately, it fails when the search string is language-specific (for example Greek), even though the value on the left is in that language too. The other problem is the second case above, which I cannot get to work.
The script works just fine when a value ('one example', from the name field) starts with the given word ('one'):
$testParam = mb_strtolower('one', 'utf-8');
$regexStart = '^' . $testParam . '.*$';
$ElasticaScript = new Elastica_Script(" total = 1; if(doc['name'].value ~= '{$regexStart}'){ total += 100; } return total; ");
The script does not work when a value ('one example', from the name field) contains the given word ('example'): the total score remains 1 instead of being incremented to 11 as it should be.
$testParam = mb_strtolower('example', 'utf-8');
$regexStart = '^.*' . $testParam . '.*$';
$ElasticaScript = new Elastica_Script(" total = 1; if(doc['name'].value ~= '{$regexStart}'){ total += 10; } return total; ");
Finally, with the same logic, when I try to match a Greek word against a value of the name field that contains Greek letters, the increment of the total score is ignored as well.
All the work has been done using Elastica and PHP.
Could you please help me solve my problem?
If there is another approach or solution, feel free to share it with me.
Thank you in advance

doc['name'].value loads the analyzed version of the field. Unless your field is set to not analyzed, this will likely be very different than the original content of the field, and not useful for doing regex matches. The Elasticsearch docs on script fields say this only makes sense for non-analyzed or single term fields. For example, if your content is indexed as ngrams, this value will consist of ngrams.
You can access the original text of the field using _source.field_name, and then compute your score based on that. You can still do your search as usual against the ngrams, and use the _source just for scoring.
Here's a sample function_score query that defaults the score to _score, adds 100 if the name field starts with one, else adds 10 if the name field contains one anywhere else. It uses _source.name to access the contents of the name field, so it's doing the regex against the original text of the name field, not the ngrams calculated from the name field.
{
  "query": {
    "function_score": {
      "boost_mode": "replace",
      "script_score": {
        "script": "total = _score; if (_source.name ~= '^one.*') { total += 100 } else if (_source.name ~= '.*?one.*?') { total += 10 } return total"
      }
    }
  }
}
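To make the scoring rule itself easier to check, here is a minimal PHP sketch that applies the same rule to a plain string. It is not Elasticsearch or Elastica code: scoreName() and its $base parameter are hypothetical names used only for illustration, and the sketch assumes the mbstring extension is available.

```php
<?php
// Hypothetical helper (not part of Elastica): mirrors the scoring rule on a plain string.
function scoreName(string $name, string $term, float $base = 1.0): float
{
    // mb_stripos is multibyte-aware and case-insensitive, so Greek input is handled too.
    $pos = mb_stripos($name, $term, 0, 'UTF-8');
    if ($pos === false) {
        return $base;       // no match: score unchanged
    }
    return $pos === 0
        ? $base + 100       // the name starts with the term
        : $base + 10;       // the term appears later in the name
}

echo scoreName('one example', 'one'), "\n";
echo scoreName('one example', 'example'), "\n";
echo scoreName('Ένα παράδειγμα', 'παρά'), "\n";
```

Running the rule against the original text of the field, rather than against its ngrams, is the same idea as using _source.name in the script above.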

Related

dynamic calculation of simple math expression

We take simple math expressions as input from the user and want to evaluate them. The total number of fields is dynamic. Each field has a specific CSS class according to its index. For example, the 1st field has the CSS class "col1", the 2nd field has "col2", and so on.
The user gives us input in the form of
"col5 = col4 * col3"
We are converting it to
jQuery(".col5").val(jQuery(".col4").val() * jQuery(".col3").val())
using the str_replace function. To do so, we need to loop over the total number of fields (PHP code example below):
for ($colLoop = 0; $colLoop < $total_cols; $colLoop++) {
    $formula = str_replace("col$colLoop", "parseFloat(jQuery('.col$colLoop input').val())", $formula);
}
This works, but we are looking for a proper solution, as it loops unnecessarily over all fields. Is it possible using some other method? Let us know.
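One possible approach, sketched here under assumptions, is a single preg_replace call that rewrites every colN token in one pass, so no loop over the fields is needed. The function name formulaToJs() is hypothetical. The sketch reproduces the replacement from the loop above (including on the left-hand side of the formula) and does not validate the user's expression.

```php
<?php
// Hypothetical helper: rewrites every "colN" token in a single regex pass.
function formulaToJs(string $formula): string
{
    // \b stops "col1" from matching inside "col10"; $1 is the captured index.
    return preg_replace(
        '/\bcol(\d+)\b/',
        'parseFloat(jQuery(\'.col$1 input\').val())',
        $formula
    );
}

echo formulaToJs('col5 = col4 * col3'), "\n";
```

The word boundaries also avoid a side effect of the str_replace loop, which can rewrite "col1" inside "col10" once there are ten or more fields.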

PHP increment booking number according to the last booking number in database

I'm using PHP 7 with Phalcon PHP and I'm trying to create a method to generate a booking number. Here is my current method :
public function generateNumber($company_code) {
// Build the prefix : COMPANY20190820
$prefix = $company_code . date('Ymd');
// It's like SELECT count(*) FROM bookings WHERE number LIKE 'COMPANY20190820%'
$counter = Bookings::count(array(
"number LIKE :number:",
"bind" => array('number' => $prefix.'%')
));
// Concat prefix with bookings counter with str_pad
// COMPANY20190820 + 005 (if 4 bookings in DB)
$booking_number = $prefix . str_pad($counter + 1, 3, 0, STR_PAD_LEFT);
// Return COMPANY20190820005
return $booking_number;
}
The problem is that sometimes I have to delete one or more bookings, so I can end up with:
COMPANY20190820001
COMPANY20190820002
COMPANY20190820005
COMPANY20190820006
COMPANY20190820007
I need to continue after the last number in my DB (here 007), because counting like this can produce duplicate booking numbers.
So how can I take the last booking number of the current day and increment from it?
You need to rethink what you want to do here as it will never work that way.
As I see it you have at least two options:
Use an auto-increment id and use that in combination with the prefix
Use a random fairly unique string (e.g. UUID4)
You should never manually try to get the current maximum id as that may and most likely will at some point result in race conditions and brittle code as a result of that.
So I found a solution, maybe there is a better way to do that but my function works now:
public function generateNumber($company_code) {
    // Build the prefix: COMPANY20190820
    $prefix = $company_code . date('Ymd');
    // Get the last booking with today's prefix
    // e.g. COMPANY20190820005
    $last_booking = Bookings::maximum(array(
        "column" => "number",
        "number LIKE :number:",
        "bind" => array('number' => $prefix.'%')
    ));
    // Get the last number by removing the prefix (e.g. 005)
    $last_number = str_replace($prefix, "", $last_booking);
    // Trim leading zeros, if any, to get only the current number,
    // then cast to int to increment the counter (e.g. 5 + 1 = 6)
    $counter = intval(ltrim($last_number, "0")) + 1;
    // Concat prefix + counter padded to 3 digits: 006
    $booking_number = $prefix . str_pad($counter, 3, '0', STR_PAD_LEFT);
    // Return COMPANY20190820006
    return $booking_number;
}
I reckon that the use case you describe does not justify the hassle of writing a custom sequence generator in PHP. Additionally, in a scenario where booking deletion is expected to happen, reusing IDs feels more like a bug than a feature, so your system would have to store a permanent counter to avoid reuse, which makes it less simple. Don't get me wrong, it can be done and it isn't rocket science, but it's time and energy you don't need to spend.
Your database engine surely has a native tool to generate autoincremented primary keys, with varying names and implementations (SQL Server has identity, Oracle has sequences and identity, MySQL has auto_increment...). Use that instead.
Keep internal data and user display separated. More specifically, don't use the latter to regenerate the former. Your COMPANY20190820007 example is trivial to compose from individual fields, either in PHP:
$booking_number = sprintf('%s%s%03d',
$company_code,
$booking_date->format('Ymd'),
$booking_id
);
... or in SQL:
-- This is MySQL dialect, other engines use their own variations
SELECT CONCAT(company_code, DATE_FORMAT(booking_date, '%Y%m%d'), LPAD(booking_id, 3, '0')) AS booking_number
FROM ...
You can (and probably should) save the resulting booking_number, but you cannot use it as the source for further calculations. It's exactly the same case as dates: you don't need to store dates in plain English in order to eventually display them to the end user, and you definitely don't want to parse English dates back into actual dates in order to do anything beyond printing.
You also mention the possibility of generating long pure-digit identifiers, as Bookings.com does. There are many ways to do it and we can't know which one they use, but you may want to consider generating a numeric hash out of your auto-incremented PK via integer obfuscation.
You could split your database field into two parts, so that you hold the prefix and the counter separately.
Then you simply select the highest counter for your desired prefix and increment that one.
If you can't change the table structure, you could alternatively order by the booking number in descending order and select the first row. Then you can extract its counter manually. Keep in mind that the numbers must be padded, or you will get #9 even if #10 exists.
If padding is not an option, you can have the database strip your prefix. That way you can cast the remaining string to a number and let the database sort. This will cost some performance, though, so keep the number of records low.

Search in MySQL with permutations

I need help.
I have a table where only two columns are: ID and NAME and these data:
ID | NAME
1 HOME
2 GAME
3 LINK
I want to show, e.g., the row with the name HOME if the user searches for HOME, OMEH, EMOH, HMEO, etc., i.e. any permutation of the word HOME.
I can't save to mysql all these permutations and search in this columns, because some words will be a too big (9-10 chars) and more than 40 MB for each 9 chars words.
One way to solve this problem is to store the sorted set of characters in each name in your database as an additional column and then sort the string the user inputs before searching e.g. database has
ID NAME CHARS
1 HOME EHMO
2 GAME AEGM
3 LINK IKLN
Then when searching in PHP you would do this:
$search = 'MEHO'; // user input = MEHO
$chars = str_split($search);
sort($chars);
$search = implode('', $chars); // now contains EHMO
$sql = "SELECT ID, NAME FROM table1 WHERE CHARS = '$search'";
// perform query etc.
Output
ID NAME
1 HOME
This sounds like a "please do my homework for me" question. It is hard to conceive what real world problem this is applicable to and there is no standard solution. It is OK to ask for help with your homework here, but you should state that this is the case.
more than 40 MB for each 9 chars words
Your maths is a bit wonky, but indeed the storage does not scale well. OTOH leaving aside the amount of storage, in terms of the processing workload it does scale well as a solution.
You could simply brute-force a dynamic query:
function mkqry($word)
{
    $qry="SELECT * FROM yourtable WHERE 1 ";
    $last=strlen($word);
    // Add one LIKE condition per letter of the search word
    for ($x=0; $x<$last; $x++) {
        $qry.=" AND word LIKE '%" . substr($word, $x, 1) . "%'";
    }
    return $qry;
}
However this will always result in a full table scan (slow) and won't correctly handle cases where a letter occurs twice in a word.
The solution is to use an indexing function which is independent of the order in which the characters appear - a non-cryptographic hash. An obvious candidate would be to XOR the characters together, although this only results in a one character identifier which is not very selective. So I would suggest simply adding the character codes:
function pos_ind_hash($word)
{
    // Sum of the character codes: independent of the letter order
    $sum=0;
    $last=strlen($word);
    for ($x=0; $x<$last; $x++) {
        $sum+=ord(substr($word, $x, 1));
    }
    return $sum;
}
function mkqry($word)
{
    $qry="SELECT * FROM yourtable WHERE 1 ";
    $last=strlen($word);
    for ($x=0; $x<$last; $x++) {
        $qry.=" AND word LIKE '%" . substr($word, $x, 1) . "%'";
    }
    // Narrow the candidates with the (indexable) hash column
    $qry.=" AND yourtable.hash=" . pos_ind_hash($word);
    return $qry;
}
Note that the hash mechanism here does not uniquely identify a single word, but is specific enough to reduce the volume to the point where an index (on the hash) would be effective.
Multiplying rather than adding would create fewer collisions but at a greater risk of overflowing (which would create ambiguity between implementations).
But both the hash and the single-character LIKE only reduce the number of potential matches. To get the query to behave definitively, you need to go further. You could add an attribute to the table (and to the index with the hash) containing the string length. This would be more selective (i.e. improve the effectiveness of the index) but still not definitive.
For a definitive method you would need to specify in your query that the data does NOT contain characters which are NOT in the word you are looking for.
The wrong way to do that would be to add a loop specifying "AND NOT LIKE ...".
A valid way to do it is to add a test to the query that removes from the table attribute every letter that appears in the word you are searching for, and checks that the result is a zero-length string.
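As a rough sketch of that last test, the condition can be built from nested REPLACE() calls, one per distinct letter of the search word. The helper name onlyTheseLettersCondition() and the default column name word are assumptions for illustration. The sketch uses ? placeholders so the letters are bound rather than concatenated into the SQL, and it assumes single-byte characters, as the other snippets in this answer do.

```php
<?php
// Hypothetical helper: builds a condition that is true only when the column
// contains no characters other than the letters of $search.
function onlyTheseLettersCondition(string $search, string $column = 'word'): array
{
    $expr = $column;
    $params = [];
    foreach (array_unique(str_split($search)) as $char) {
        // Wrap the expression so that each distinct letter is stripped from the column.
        $expr = "REPLACE($expr, ?, '')";
        $params[] = $char;
    }
    // If nothing is left after stripping, the column held only letters from $search.
    return ["$expr = ''", $params];
}

[$condition, $params] = onlyTheseLettersCondition('HOME');
echo $condition, "\n";
print_r($params);
```

On its own this only checks that no foreign letters are present. It is meant to be appended to the query alongside the hash and length tests described above.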

PHP - Generating random integers within specified range from a key

I have a set of questions with unique IDs in a MySQL database.
Users also have a unique ID and are to answer these questions and their answers are saved in the database.
Now, I want users to get 5 non-repeating uniquely and randomly picked questions from the pool of available ones (let's say 50) based on users ID. So when a user with id 10 starts answering his questions, but stops and wants to return later to the same page, he will get the same questions as before. A user with id 11 will get a different random set of questions, but it will always be the same for him and different from all other users.
I found that random.org can generate exactly what I need with their sequence generator that generates a random sequence of numbers based on provided ID:
https://www.random.org/sequences/?min=1&max=50&col=1&format=plain&rnd=id.10
But I would like the generation to be done locally instead of relying on the random.org API.
So, I need to generate 'X' unique random integers, within specified range 'Y' that are generated based on supplied integer 'Z'. I should be able to call a function with 'Z' as parameter and receive back the same 'X' integers every time.
I need to know how to replicate this generation with PHP code or at least a push or hint in a direction of a PHP function, pseudo-code or code snippet that will allow me to do it myself.
Thank you in advance!
Why reinvent the wheel? mt_srand() seeds the generator, so the same seed produces the same sequence:
mt_srand(44);
for ($i=0; $i < 10; $i++) echo mt_rand(). "\n";
echo "\n\n";
mt_srand(44);
for ($i=0; $i < 10; $i++) echo mt_rand(). "\n";
result
362278652
928876241
1914830862
68235862
1599103261
790008503
1366233414
1758526812
771614145
1520717825
362278652
928876241
1914830862
68235862
1599103261
790008503
1366233414
1758526812
771614145
1520717825
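Building on the seeded generator, a minimal sketch of picking 5 distinct question IDs out of 50 could look like the following. The function name pickQuestions() and its parameters are hypothetical. It assumes $count is not larger than $poolSize, and the sequence for a given seed can differ between PHP versions, so stored answers should not depend on it surviving an upgrade.

```php
<?php
// Hypothetical helper: returns $count distinct question IDs in 1..$poolSize,
// always the same for a given $userId (on the same PHP version).
function pickQuestions(int $userId, int $count = 5, int $poolSize = 50): array
{
    mt_srand($userId);              // seed the generator with the user ID
    $pool = range(1, $poolSize);
    // Partial Fisher-Yates shuffle: fix one position per iteration.
    for ($i = 0; $i < $count; $i++) {
        $j = mt_rand($i, $poolSize - 1);
        [$pool[$i], $pool[$j]] = [$pool[$j], $pool[$i]];
    }
    mt_srand();                     // reseed randomly so later mt_rand() calls are not tied to the user ID
    return array_slice($pool, 0, $count);
}

print_r(pickQuestions(10));
```

Because the IDs come from a shuffle of the pool rather than from independent mt_rand() calls, no question can appear twice in the same set.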
Generate your random numbers at the beginning and save them in a session. That way the random numbers for that user are always known, and you can find out which question ID to go back to by looking it up in the session.
Cheers
You can pick random values from an array $w. Try this code as an example and adapt it to your logic. Note that rand() can return the same index more than once, so values may repeat.
$w = array('0'=>11,'1'=>22,'2'=>44,'3'=>55,'4'=>66,'5'=>88);
$str = '';
for($i=0;$i<5;$i++) {
$str.= $w[rand(0,5)];
}
As this article suggests, you could use a non-repeating pseudo-random number generator. The only problem would be to find a prime number p that is at least twice as big as the upper bound for IDs and satisfies p ≡ 3 (mod 4). There should be big enough prime numbers matching these conditions freely available on the net, though.
Due to my lack of experience with PHP, I can only provide pseudocode.
int[] generateUniqueRands(int id , int ct)
    int[] res
    const int prim // the prime number described above
    for int i in [0 , ct[
        res[i] = ((id + i) * (id + i)) % prim
    return res
Note that this algorithm basically works like a window:
id = x set = [a , b , c , d]
id = x + 1 set = [b , c , d , e]
...
If you wish to avoid this kind of behavior just generate a unique random-number from the id first (can be achieved in the same way the set of random numbers is generated).
When the user with ID 10 opens the page for the first time, use rand() to generate random numbers, then store them in a cell of the users table in the database. That way the user with ID 10 has the rand() numbers stored.
For example, the users table has the columns id and rand_questions.
Check whether rand_questions is empty: if so, update it with the newly generated random numbers; otherwise read the numbers from the database.

Perform partial search on MySQL table when exact match may be available

I am running the following SQL statement from a PHP script:
SELECT PHONE, COALESCE(PREFERREDNAME, POPULARNAME) FROM distilled_contacts WHERE PHONE LIKE :phone LIMIT 6
Obviously, the statement returns the first 6 matches against the table in question. The value I'm binding to the :phone variable looks something like this:
$search = '%'.$search.'%';
Where, $search could be any string of numerals. The wildcard characters ensure that a search on, say 918, would return every record where the PHONE field contains 918:
9180078961
9879189872
0098976918
918
...
My problem is what happens if there does exist an entry with the value that matches the search string exactly, in this case 918 (the 4th item in the list above). Since there's a LIMIT 6, only the first 6 entries would be retrieved which may or may not contain the one with the exact match. Is there a way to ensure the results always contain the record with the exact match, on top of the resulting list, should one be available?
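You could use an ORDER BY to ensure the exact match is always on top. Bind the raw search string, without the wildcards, to a separate parameter (called :exact here), because :phone already contains the % characters:
ORDER BY CASE WHEN PHONE = :exact THEN 1 ELSE 2 END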
Using $search = $search.'%' (no leading wildcard) will return only the results that start with the search value.
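Putting the pieces together, here is a minimal sketch of the full statement and its parameters. The helper name buildPhoneSearch() and the :exact parameter are assumptions for illustration. :exact carries the raw search string, since the value bound to :phone is wrapped in % wildcards and would not equal an exact PHONE value.

```php
<?php
// Hypothetical helper: returns the SQL text and the parameters to bind.
function buildPhoneSearch(string $search, int $limit = 6): array
{
    $sql = 'SELECT PHONE, COALESCE(PREFERREDNAME, POPULARNAME)'
         . ' FROM distilled_contacts'
         . ' WHERE PHONE LIKE :phone'
         . ' ORDER BY CASE WHEN PHONE = :exact THEN 1 ELSE 2 END'
         . ' LIMIT ' . $limit;
    $params = [
        ':phone' => '%' . $search . '%',   // partial match, as in the question
        ':exact' => $search,               // raw value, no wildcards
    ];
    return [$sql, $params];
}

[$sql, $params] = buildPhoneSearch('918');
echo $sql, "\n";
print_r($params);
```

With PDO, the result would typically be used as $stmt = $pdo->prepare($sql); $stmt->execute($params); where $pdo is an existing connection.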
