Laravel multiple orderBy() - php

I have this code:
$array = $this->morphMany('App\Models\Asset', 'assigned', 'assigned_type', 'assigned_to')
    ->withTrashed()
    ->orderByRaw('LENGTH(name) ASC')
    ->orderBy('name', 'ASC');
I am using it to perform a natural sort on strings with alphanumeric characters, because a plain alphabetical sort produces a strange order, e.g.
product1
product10
product2
product20
It seems to be working flawlessly.
I have a few questions about this. Mainly, what algorithm does orderBy use, and how does combining the two calls end up giving me a natural order? I get that combining a length check with an alphabetical check is the solution, but how does this work in Laravel? Is a specific sort algorithm used here, such as merge sort? I don't understand how it prioritizes one sort over the other.
I'm a total newbie to Laravel. Thanks.

The problem is: item123 would come before item2 in the dictionary. To overcome this you're saying "Sort according to the dictionary only when the items have the same length otherwise shorter items come first". By that combination of rules you get:
item2 comes before item11 because it's shorter (ORDER BY LENGTH(name) takes priority)
item123 comes before item234 because it precedes it in the dictionary (Items have the same length so they are ordered by their value)
Which algorithm MySQL uses for sorting is not important here; it is enough to know that it is optimised for speed and for sorting huge data sets. What is important is that every comparison-based sort relies on a compare function to compare two values and determine their order.
MySQL constructs this function from your ORDER BY clause and its own internal comparison rules. For example, ORDER BY LENGTH(name), name could result in a comparison like this:
compare(x, y) {
    // compare by length first; fall back to the name only on a tie
    byLength = default_comparer(LENGTH(x.name), LENGTH(y.name));
    if (byLength == 0) {
        return default_comparer(x.name, y.name);
    }
    return byLength;
}
where default_comparer is a mock name for the default internal comparers that MySQL uses, which (in the case of strings) take a number of things into account, such as alphabetical order, locale and case rules. (In reality MySQL probably has a general comparer that iterates through each ORDER BY expression and returns the first non-zero result.)
This is all a bit vague. I'm not a MySQL developer, so I can't provide more precise information, but this is a rough picture of how it works.
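The "iterate through each ORDER BY expression and return the first non-zero result" idea can be sketched in a few lines of Python. This is only an illustration of the principle, not MySQL's actual implementation; default_comparer and make_comparer are made-up names, and Python's built-in string comparison stands in for the database's collation rules.

```python
from functools import cmp_to_key


def default_comparer(a, b):
    """Stand-in for the database's built-in comparison: returns -1, 0 or 1."""
    return (a > b) - (a < b)


def make_comparer(sort_keys):
    """Build one compare function from an ordered list of sort-key functions."""
    def compare(x, y):
        for key in sort_keys:
            result = default_comparer(key(x), key(y))
            if result != 0:
                # The first key that tells the two values apart decides the order.
                return result
        return 0  # equal under every key
    return compare


names = ["product10", "product2", "product20", "product1"]

# Equivalent in spirit to ORDER BY LENGTH(name), name
compare = make_comparer([len, lambda name: name])
ordered = sorted(names, key=cmp_to_key(compare))
print(ordered)
```

The second key is only consulted when the first one reports a tie, which is why the length check takes priority over the alphabetical check.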

This has nothing to do with Laravel; your database decides how ordering is done.
Normally, when there are two or more ORDER BY expressions, the results are first ordered by the first one. Rows that have the same value for the first expression are then ordered by the second, and so on.
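To see that the ordering really happens in the database, here is a small sketch using Python's built-in sqlite3 module. It uses SQLite rather than MySQL purely so that it runs without a server, and the assets table is a made-up stand-in for the real one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assets (name TEXT)")
conn.executemany(
    "INSERT INTO assets (name) VALUES (?)",
    [("product10",), ("product2",), ("product20",), ("product1",)],
)

# A single sort expression: plain string comparison.
alphabetical = [row[0] for row in conn.execute(
    "SELECT name FROM assets ORDER BY name")]

# Two sort expressions: length first, name only as a tie-breaker.
natural = [row[0] for row in conn.execute(
    "SELECT name FROM assets ORDER BY LENGTH(name), name")]

print(alphabetical)
print(natural)
```

Note that the length trick only produces a natural order when all values share the same non-numeric prefix and the numbers have no leading zeros; with mixed prefixes, a short name such as b2 would sort before a longer one such as ab1.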

Related

Full Text searching functionality not working as expected

I've used full-text search functionality, but it isn't working as expected.
I have the following code in my search.php file:
$kws = $_POST['kw'];
$kws = mysql_real_escape_string($kws);
$query = "SELECT * FROM products WHERE MATCH (product_name,brand,short_desc)
AGAINST ('*".$kws."*' IN BOOLEAN MODE) limit 14" ;
$res_old = mysql_query($query);
'kw' is whatever I type in the search box. For example, if I search for 'Dove Intense', it places Dove Antihairfall on top because that row comes first in the database.
I understand that this can happen because I'm running the full-text search over two separate columns, i.e. brand and product_name. However, is there any way to make it rank a row higher when it matches against both columns? Basically, whatever the user types in should rank highest in the search results.
Can anyone give me an idea of how to achieve that?
You (or others) may still be interested in the answer.
From the MySQL documentation on MATCH ... AGAINST in boolean mode:
"They do not automatically sort rows in order of decreasing relevance. You can see this from the preceding query result:..."
You should use boolean operators to get what you expect. A search for 'Dove Intense' returns rows that contain at least one of the two words (i.e. rows with only Dove, rows with only Intense, and rows with both words in any order), simply because having no operator between the two words implies an OR.
This may not be the result you expect, since you may want an AND between the two words. But what about the order?
If you don't mind the order (i.e. any row containing both words is included in the results, e.g. 'Intense whatever whatever dove ...'), you should match against:
'+Dove +Intense'
With this search, rows containing only one of the two words are not included in the result.
If you are trying to implement a strict search, i.e. only rows containing the phrase "Dove Intense", you should match against '"Dove Intense"'.
Now about ranking:
If you want all results that contain at least "Dove", but want rows ranked higher if they also contain "Intense", you should match against '+Dove Intense'.
Hope this is useful for what you are trying to implement.
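As a rough sketch, the boolean-mode expressions described in this answer could be generated from the user's input like this. It is written in Python only for illustration; boolean_search_expression and its mode names are hypothetical, and the SQL string at the end is not executed here. It assumes the products table from the question and that the relevance score is wanted for ordering.

```python
import re


def boolean_search_expression(user_input, mode="rank"):
    """Turn free text into a MySQL boolean-mode search expression.

    Only word characters are kept, so operators typed by the user
    (+, -, *, quotes, ...) cannot change the meaning of the search.
    """
    words = re.findall(r"\w+", user_input)
    if not words:
        return ""
    if mode == "any":      # at least one of the words
        return " ".join(words)
    if mode == "all":      # every word, in any order
        return " ".join("+" + word for word in words)
    if mode == "phrase":   # the exact phrase
        return '"' + " ".join(words) + '"'
    if mode == "rank":     # first word required, the rest only raise the rank
        return " ".join(["+" + words[0]] + words[1:])
    raise ValueError("unknown mode: " + mode)


# Selecting the MATCH ... AGAINST value as a column lets you sort by it,
# since boolean mode does not sort by relevance on its own.
RANKED_QUERY = (
    "SELECT *, MATCH (product_name, brand, short_desc) "
    "AGAINST (? IN BOOLEAN MODE) AS score "
    "FROM products "
    "WHERE MATCH (product_name, brand, short_desc) AGAINST (? IN BOOLEAN MODE) "
    "ORDER BY score DESC LIMIT 14"
)

print(boolean_search_expression("Dove Intense", "rank"))
```

The same expression would be bound to both placeholders of the query, using a prepared statement rather than string concatenation.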

Algorithm for optimising compound index search in MongoDb

I have a collection X to which I have to apply a filter.
The filter is saved as a separate entity (in a filters collection), and the only data it holds are the field names and the conditions applied to each field.
Example of a filter:
Name is Stephan and Age BETWEEN 10, 20
What I have to improve is that each field in my filter becomes part of an index that is added when the filter is created.
The only structure that matches is a compound index on the filtered fields.
In short, the problem is that when I have a filter like:
Name is Stephan and Age BETWEEN 10,20
My compound index in MongoDb will be: {'Name':1,'Age':1}
But then, if I add another filter, let's say: Age is 10 and Name is Adrian and Height BETWEEN 170,180
compound index is: {'Age':1,'Name':1, 'Height':1}
{'Name':1,'Age':1} <> {'Age':1,'Name':1, 'Height':1}
What can I do to make the last index fit the first, and the other way around?
Please let me know if I haven't been explicit enough.
The cleanest solution to this problem is index intersection, which is currently in development. With it, a separate index for each of the criteria would be sufficient.
In the meantime, I see two options:
Use a separate search database that returns the relevant ids based on your criteria, then use $in in MongoDB to query the actual documents. There are a number of tools that use this approach, but it adds quite a bit of overhead because you need to code against and administer a second db, keep the data in sync, etc.
Use a smart mix of compound indexes and 'infinite range queries'. For instance, you can argue that a query for age in the range of (0, 200) won't discard anybody from the result set, neither will a height query between 0 and 400.
That might not be the cleanest approach, and its efficiency depends very much on the details of the queries, so that might require some fine-tuning.
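The second option could look roughly like the following sketch, which builds the query document in plain Python without talking to a database. INDEX_FIELDS, CATCH_ALL and pad_filter are hypothetical names, and the catch-all ranges are the ones suggested above; the idea is that every filter ends up constraining the same fields, so that one compound index such as {'Name':1,'Age':1,'Height':1} can serve all of them.

```python
# Hypothetical canonical field order, matching one shared compound index.
INDEX_FIELDS = ["Name", "Age", "Height"]

# Ranges assumed wide enough that they never exclude a document.
CATCH_ALL = {
    "Age": {"$gte": 0, "$lte": 200},
    "Height": {"$gte": 0, "$lte": 400},
}


def pad_filter(conditions):
    """Return a query that constrains every indexed field, in index order.

    Fields missing from the filter get an 'infinite' range; fields with no
    known catch-all range (such as Name here) are simply left out.
    """
    query = {}
    for field in INDEX_FIELDS:
        if field in conditions:
            query[field] = conditions[field]
        elif field in CATCH_ALL:
            query[field] = CATCH_ALL[field]
    return query


first = pad_filter({"Name": "Stephan", "Age": {"$gte": 10, "$lte": 20}})
second = pad_filter({"Age": 10, "Name": "Adrian",
                     "Height": {"$gte": 170, "$lte": 180}})
print(first)
print(second)
```

Whether the padded ranges actually help depends on the data and on how selective the real conditions are, so the query plans would still need checking.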

How to find records in a database which differ only from one character to the search string?

I have a database with a field 'clinicNo', and that field contains values like 1234A, 2343B, 9999Z ......
If by mistake I use '1234B' instead of '1234A' in the select statement, I want to get a result set containing the clinicNos that differ from the given string (i.e. 1234B above) by only one character.
E.g. the field may contain the following values:
1234A, 1235B, 5433A, 4444S, 2978C
If I use '1235A' for the select query, it should give 1234A and 1235B as the result.
You could use SUBSTRING on the column. The example below returns every clinicNo that starts with '1235', whatever letter from A to Z follows:
select * from TableName WHERE SUBSTRING(clinicNo, 1, 4) = '1235'
What you're looking for is called the Levenshtein distance. While there is a levenshtein function in PHP, you really want to do this in MySQL.
There are two ways to implement a Levenshtein function in MySQL. The first is to create a stored function, which works much like a stored procedure except that it has distinct inputs and returns a value. This is fine for small datasets, but a little slow on anything approaching several thousand rows. You can find more info here: http://kristiannissen.wordpress.com/2010/07/08/mysql-levenshtein/
The second method is to implement a user-defined function (UDF) in C/C++ and link it into MySQL as a shared library (*.so file). The UDF is called from SQL in the same way as a stored function, which means the actual query for this or the first method may be identical (provided the inputs to both functions are the same). You can find out more about this method here: http://samjlevy.com/2011/03/mysql-levenshtein-and-damerau-levenshtein-udfs/
With either of these methods, your query would be something like:
SELECT clinicNo FROM words WHERE levenshtein(clinicNo, '1234A') < 2;
It's important to remember that the 'threshold' value should change in relation to the original word length. It's better to think of it in terms of a percentage value, i.e. half your word = 50%, half of 'term' = 2. In your case, you would probably be looking for a difference of < 2 (i.e. a 1 character difference), but you could go further to account for additional errors.
Also see: Wikipedia: Levenshtein Distance.
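For reference, here is a minimal sketch of the distance calculation itself, using the sample values from the question. It is written in Python for illustration and is not the code behind either of the linked MySQL implementations.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions or substitutions
    needed to turn string a into string b."""
    # previous[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


clinic_numbers = ["1234A", "1235B", "5433A", "4444S", "2978C"]

# Same condition as the SQL above: a distance below 2 means at most one edit.
matches = [n for n in clinic_numbers if levenshtein(n, "1235A") < 2]
print(matches)
```

This is also what the "fetch everything and compare in application code" approach boils down to, which is only practical while the table is small.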
SELECT * FROM TableName
WHERE clinicNo LIKE CONCAT(LEFT('1235A', 4), '%')
In general, you could use a function like Levenshtein to find the difference between two strings; it returns a number that tells you how similar they are. You then probably want the results with the most similarity.
To get Levenshtein in MySQL as well, read this post.
Or just fetch all the results and use PHP's levenshtein function.

Sphinx returns inconsistent result set depending on sorting

I'm trying to implement multilingual indexes for the web application I'm developing. At the moment, records exist in a few languages: English, Malay and Arabic (but they are not separated into different columns). Only the English stemmer is currently enabled.
Only two indexes are built, one stemmed and one non-stemmed. I'm having a problem with the stemmed index: the result set it returns is not consistent and depends on the sort column.
These two queries (against the stemmed index) each return a different total number of results, although the only difference between them is the sort order.
SELECT * FROM test1stemmed WHERE MATCH('@institution universiti') GROUP BY art_id ORDER BY art_title_ord ASC;
SELECT * FROM test1stemmed WHERE MATCH('@institution universiti') GROUP BY art_id ORDER BY art_title_ord DESC;
However, if the same queries were run on the non-stemmed index, the numbers of results are equal.
I'm also having the same problem with Sphinx PHP API:
$sp = new SphinxClient();
$sp->SetServer('localhost', 9312);
$sp->SetMatchMode(SPH_MATCH_EXTENDED);
$sp->SetGroupBy('art_id', SPH_GROUPBY_ATTR, "$sp_sort_column $sort");
$sp->SetLimits($offset, $rows_per_page, 1000);
$sp->Query("$q", 'test1stemmed');
What am I missing?
I missed something in the documentation: http://sphinxsearch.com/docs/2.0.2/clustering.html
WARNING: grouping is done in fixed memory and thus its results are only approximate; so there might be more groups reported in total_found than actually present. @count might also be underestimated. To reduce inaccuracy, one should raise max_matches. If max_matches allows to store all found groups, results will be 100% correct.
So I can work around this by increasing the value of max_matches, but since setting a very large value is undesirable, I would rather fix the query instead.

Compare/Diff Multiple (>Millions) Arrays

I'm not sure if this is possible, but I have millions of "lists" in a MySQL database, and I would like to develop a system where I take one of the lists, compare it against all of the other lists in the database, and return:
1.) Lists that closely resemble the primary list (some sort of % would be great)
2.) Given certain items in a list, a list of items that are included in the majority of all the other lists (i.e. autocomplete a list based on popular options).
I initially thought this would be possible if I could create some sort of 'loose hash' that lets me compare lists mathematically, but I haven't been able to find a solution that scales (since this is exponential when tackled head-on).
Any new ideas/solutions would be greatly appreciated. Thanks!
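A basic MD5 digest is supported by both PHP and MySQL and is quite fast for this kind of thing. Note that it is an exact hash rather than a loose one: it tells you whether two lists are identical, not how similar they are. Just compute an MD5 of whatever data you have and compare it to the others.
Do it in PHP: store the MD5 of the data as an array key and check for it with isset().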
Your part 2), "Given certain items in a list, it would return a list of items that are included in the majority of all the other lists (i.e. autocomplete a list based on popular options)",
is not very clear, but I interpret it as: given a few items, find all lists that contain all or most of them.
This should be easy once you create an index on your list elements, essentially like a hash table. The exact query will depend on your requirements and on the length of the lists (whether that is a factor in defining the specs, etc.).
If you're saying there are millions of lists, it is really not an option to load them all into a PHP script.
You could get the values of the list you are comparing the others to, and then run an SQL query similar to this:
SELECT list_id, COUNT(value) as c FROM lists WHERE value IN (a,b,c) GROUP BY list_id
ORDER BY c DESC
I'm not sure the SQL is correct, but the idea is to select the ids of the lists that share members with the original list and then sort the output by the number of list items that intersect with it. The percentage of matching items is easily obtained from that count.
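The query can be tried out with a small sketch like the one below. It uses Python's built-in sqlite3 module instead of MySQL so that it runs on its own, and it assumes a made-up lists(list_id, value) table with one row per list item, where a value appears at most once per list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lists (list_id INTEGER, value TEXT)")
# The index on value is what keeps the IN (...) lookup cheap on large tables.
conn.execute("CREATE INDEX idx_lists_value ON lists (value)")
conn.executemany("INSERT INTO lists (list_id, value) VALUES (?, ?)", [
    (1, "a"), (1, "b"), (1, "c"),
    (2, "a"), (2, "b"), (2, "x"),
    (3, "y"), (3, "c"),
])


def similar_lists(conn, items):
    """Return (list_id, shared_count, percent_of_items_found) tuples,
    best match first. Assumes items contains no duplicates."""
    placeholders = ", ".join("?" for _ in items)
    sql = (
        "SELECT list_id, COUNT(value) AS c FROM lists "
        "WHERE value IN (" + placeholders + ") "
        "GROUP BY list_id ORDER BY c DESC, list_id"
    )
    return [
        (list_id, count, 100.0 * count / len(items))
        for list_id, count in conn.execute(sql, items)
    ]


result = similar_lists(conn, ["a", "b", "c"])
for list_id, count, percent in result:
    print(list_id, count, round(percent, 1))
```

The percentage here is the share of the original list's items found in each other list; dividing by the size of the union of both lists instead would give a symmetric similarity measure.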
