Zend Searching for serial numbers (strings not integers) - php

I have some search functionality currently working in which I can search for serial numbers as long as they are integers.
For example this works fine.
Search for serials >= 352123 and <= 360000
What is the best approach for doing a similar search when serials can consist of a string?
For example:
Search for serials >= '352123/230w' and <= '352123/250w'
I have my table set up as MyISAM to make use of fulltext search in the hope that this will help in this case?

You can do string comparisons like this, but you have to be aware that they use lexical (character-by-character) ordering, not the numeric value, e.g.:
SELECT '352123/230w' <= '352123/250w';
-- returns 1
SELECT '352123/230w' <= '4';
-- also returns 1
Assuming that all of your serial numbers are formatted like this, you could use string operations to split out the numbers, convert them to integers, and sort on those. However, doing that at query time adds a lot of conversion overhead and prevents MySQL from using your indexes, so it would be very slow.
You could always add another field or two to store a numerical equivalent, i.e. '352123/230w' stripped of non-numeric chars to 352123230, and use that for your sort, but it really depends on the formatting of all tracked serial numbers being consistent.
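As a rough sketch of the "numerical equivalent" idea in PHP: the helper below splits a serial into two integers that could be written to extra indexed columns at insert time. The function name splitSerial, the array keys and the assumed format (digits, a slash, digits, optional letters) are illustrative assumptions, not something the question confirms.

```php
<?php
// Hypothetical helper: split a serial such as '352123/230w' into integers.
// Assumes the format <digits>/<digits><optional letters>; returns null otherwise.
function splitSerial(string $serial): ?array
{
    if (!preg_match('/^(\d+)\/(\d+)[a-z]*$/i', $serial, $m)) {
        return null; // format not recognised, caller decides what to do
    }
    return ['base' => (int) $m[1], 'variant' => (int) $m[2]];
}

$parts = splitSerial('352123/230w');
var_dump($parts);
```

With the two values stored in integer columns, the range search becomes an ordinary numeric comparison on those columns, which an index can serve.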

Related

Full text document similarity search

I have a big database of articles. Before adding new items to the DB, I'd like to check whether similar items already exist and, if so, group them together, so that later I can easily display them as a group of similar items.
Currently we use PHP's similar_text() function, which is very simple but surprisingly precise and fully satisfies our needs. The problem is that before we add an item to the DB, we first need to pull X items from the DB and loop through every single one to check whether our new item is at least 75% similar to any of them. This uses a lot of resources and time that we don't really have.
We use MySQL and Solr for all our queries. I've tried MySQL Full-Text Search and Solr's MoreLikeThis. Compared to PHP's implementation they are super fast and efficient, but I just can't get a robust percentage score like the one PHP's similar_text() provides. It is crucial for our grouping to be accurate.
For example using this MySQL query:
SELECT id, body, ROUND(((MATCH(body) AGAINST ('ARTICLE TEXT')) / scores.max_score) * 100) as relevance
FROM natural_text_test,
(SELECT MAX(MATCH(body) AGAINST('ARTICLE TEXT')) as max_score FROM natural_text_test LIMIT 1) scores
HAVING relevance > 75
ORDER BY relevance DESC
I get a result saying that an article with 130 words is 85% similar to another article with 4700 words. In comparison, PHP's similar_text() returns a similarity score of only 3%, which is well below our threshold and is correct in our case.
I've also looked into Levenshtein distance algorithm, but it seems that the same problem as with MySQL and Solr arises.
There has to be a better way to handle similarity checks, maybe I'm using the algorithms incorrectly?
Based on some of the comments, I might propose this...
It seems that 75%-similar documents would have a lot of the same sentences in the same order.
Break the doc into sentences
Take a crude hash of each sentence and map it to a visible ASCII character. This gives you a string that is, perhaps, 1/100th the size of the original doc.
Store that with the doc.
When searching, use levenshtein() on this string to find 'similar' documents.
Sure, hashing is imperfect, etc. But this is fast. And you could apply some other technique to double-check the few docs that are close.
For a hash, I might do
$md5 = md5($sentence);
$x = hexdec(substr($md5, 0, 2)) & 0x3F; // keep 6 bits (0..63) of the first byte
$hash = chr(ord('0') + $x);             // visible ASCII character from '0' to 'o'
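Putting the steps together, a minimal sketch might look like the following. The function names sentenceChar and docSignature and the crude sentence splitter are assumptions for illustration only. Note also that before PHP 8.0 levenshtein() rejected strings longer than 255 characters, so very long documents would need a different distance routine there.

```php
<?php
// Map one sentence to a single visible ASCII character ('0' .. 'o').
function sentenceChar(string $sentence): string
{
    $x = hexdec(substr(md5($sentence), 0, 2)) & 0x3F; // 6 bits: 0..63
    return chr(ord('0') + $x);
}

// Build a compact signature: one character per sentence.
// The splitter is deliberately crude (break after '.', '!' or '?').
function docSignature(string $text): string
{
    $sentences = preg_split('/(?<=[.!?])\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    return implode('', array_map('sentenceChar', $sentences));
}

$a = docSignature('First sentence. Second one! A third?');
$b = docSignature('First sentence. Something else entirely. A third?');

// The edit distance between signatures approximates the number of
// changed sentences (hash collisions can hide a change).
echo levenshtein($a, $b), "\n";
```

The signature would be stored alongside each document, so only the short strings need to be compared when a new article arrives.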

SQL BETWEEN prices stored as string

I'm running into a problem here. I'm storing prices in my database as a string in the following format: 14.500,00 and 199,95. Sometime later I created this range slider so the users can filter on price as you can see in the provided image. For this to work, I needed to write a new query so I was thinking of a BETWEEN in SQL but this doesn't work on strings. Any ideas to filter on price with a range slider in SQL?
BETWEEN does work on strings. It works just fine -- with the strings ordered alphabetically.
Your problem is that BETWEEN on strings doesn't follow the numeric ordering. Well, that is normal. If I'm speaking French, I wouldn't expect an English speaker to understand me. The same with types. If I use BETWEEN on strings, then I expect the comparisons to be string-based, not numeric. (The same is true of dates, by the way.)
Fix your data so the values are stored as numeric/decimal values. These are numbers with a fixed number of decimal places, exactly what is needed for monetary values.
In most databases, you will need to get rid of the dollar sign. Something like this should work:
update t
set price = replace(price, '$', '');
alter table t alter column price numeric(10, 2); -- or whatever is appropriate
The exact syntax might vary, depending on the database.
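The prices in the question use a dot as the thousands separator and a comma as the decimal mark (14.500,00), so they need converting before they can go into a numeric column. A small PHP sketch of that conversion, where normalizePrice is a hypothetical helper name and the input format is assumed to be consistent:

```php
<?php
// Hypothetical helper: convert '14.500,00' (dot = thousands, comma = decimal)
// into '14500.00', a form that a DECIMAL column accepts.
function normalizePrice(string $price): string
{
    // Replacements run in order: drop the dots first, then turn the comma into a dot.
    return str_replace(['.', ','], ['', '.'], trim($price));
}

echo normalizePrice('14.500,00'), "\n"; // 14500.00
echo normalizePrice('199,95'), "\n";    // 199.95
```

Once the column is numeric, the slider's minimum and maximum can be passed straight into a BETWEEN condition.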

How to display the column value descending when the column has special characters in MySQL

How can I display a column in descending order when the column has special characters in MySQL?
I am using the following query, but it does not display correctly:
SELECT quotation_pno FROM crm_quotation order by quotation_pno desc
My output comes out like this:
quotation_pno
PT/17/999
PT/17/1533
PT/17/1532
PT/16/1531
I want my output to look like this:
quotation_pno
PT/17/1533
PT/17/1532
PT/17/999
PT/16/1531
Please help me
I'd argue that the output is correct, but your assumptions are not. It looks to me as if quotation_pno is some kind of textual column, right?
The sorting assumes that you want to sort text, which works this way:
1. Set i to 0
2. Compare the i-th character of the two strings
3. If they are the same and the end is not reached, increase i by 1 and proceed with step 2
4. Otherwise order the two strings according to the value at the i-th position
(Some details are elided and the pseudocode is boiled down to the bare minimum needed to understand the principle.)
Applied to your example this means, when the comparison compares PT/17/999 and PT/17/1533 it looks at the characters 0 to 5 and "sees" that they are equal. When it compares the characters at position 6, they are '9' and '1'. Since the character '9' is considered to be greater than '1', PT/17/999 is placed before PT/17/1533.
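The same character-by-character principle can be checked quickly in PHP. This is only an illustration: strcmp() compares raw bytes, whereas MySQL compares according to the column's collation, but for these digits the outcome is the same.

```php
<?php
$a = 'PT/17/999';
$b = 'PT/17/1533';

// Find the first position where the two strings differ.
$i = 0;
while ($i < strlen($a) && $i < strlen($b) && $a[$i] === $b[$i]) {
    $i++;
}
echo "first difference at position $i: '{$a[$i]}' vs '{$b[$i]}'\n";

// A positive result means $a sorts after $b in ascending order,
// so $a comes first when sorting descending.
var_dump(strcmp($a, $b) > 0);
```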
How to solve the issue?
A few ways come to mind that will allow you to achieve the desired sort order.
First, you could left-pad the numbers with zeros. This lets you re-use most of your existing structure, but results either in very many zeros or in a limited system, since you will be restricted to the number of digits you decided to use (otherwise the sort will fail again).
The second possibility is to store the parts in (additional) numerical columns in the table, e.g. one for the year and one for the order number within that year. This is the more flexible approach, but involves more changes.
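A minimal sketch of the zero-padding option in PHP, assuming every value has the form PREFIX/YEAR/NUMBER. The helper name padQuotationNumber and the width of 6 digits are arbitrary choices for illustration.

```php
<?php
// Hypothetical helper: zero-pad the last part of 'PT/17/999' to a fixed width
// so that plain string ordering matches numeric ordering.
function padQuotationNumber(string $pno, int $width = 6): string
{
    $parts = explode('/', $pno);
    $last = array_pop($parts);
    $parts[] = str_pad($last, $width, '0', STR_PAD_LEFT);
    return implode('/', $parts);
}

$padded = array_map(
    'padQuotationNumber',
    ['PT/17/999', 'PT/17/1533', 'PT/17/1532', 'PT/16/1531']
);
rsort($padded, SORT_STRING); // descending string sort
print_r($padded);
```

After padding, the plain string sort puts PT/17/001533 first and PT/16/001531 last, which is the order asked for in the question.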

Searching Large Mysql Database For Exact Date Within Row ID

I have a large MySQL database, about 10 GB in size. One of the tables in the database is called clients.
In that table there is a column named case.
The date this client was created is mixed into the number within this column.
Here is an example of an entry in case
011706-0001
The 06 part means this client was created in 2006. I need to pull all the clients that were created in 2015 and 2016. So I need to query for anything that case has a 15 or 16 before the dash.
For example, 000015-0000 or 000016-0000
Is there a way to do this with only mysql? My thought process was I would have to query the whole column then use php to preg_match()
I am worried that based on the size of the database this would cause problems.
To locate rows that have a case column value that contains '06-' (the characters 0 and 6 followed by a dash ...
One option is to use a LIKE comparison operator:
SELECT ...
FROM clients t
WHERE t.case LIKE '%06-%'
ORDER BY ...
The percent sign characters are wildcards in the LIKE comparison, which match any number of characters (zero, one or more.)
MySQL will need to evaluate that condition for every row in the table. MySQL can't make use of an index range scan operation with that.
SELECT ...
FROM clients t
WHERE t.case LIKE '%15-%'
OR t.case LIKE '%16-%'
ORDER BY ...
That will evaluate to true for any values that include the sequence of three characters '15-' or '16-'.
If there's a more standard format for the values in the case column, where the value always starts with exactly six characters representing date 'mmddyy-nnnnn' and you only want to match the 5th thru 7th characters, you could use the underscore wildcard character which matches any one character (in the LIKE comparison) for example... using four underscores
t.case LIKE '____16-%'
Or you could use a SUBSTR function to extract the three characters from the case value, and perform an equality comparison...
SUBSTR(t.case,5,3) = '15-'
SUBSTR(t.case,5,3) IN ('15-','16-')
It's also possible to make use of a REGEXP comparison in place of the LIKE comparison.
In terms of performance, all of the above approaches are going to need to crank through every row in the table, to evaluate the comparison condition.
If that date value was stored as a separate column, as a DATE datatype, and there was an index with that as the leading column, then MySQL could make effective use of a range scan operation, for a query like this...
WHERE t.casedate >= '2015-01-01'
AND t.casedate < '2017-01-01'
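To populate such a date column, the leading six characters could be converted once, for example in a PHP migration script. The sketch below assumes the 'mmddyy' layout described above and that all two-digit years belong to the 2000s; caseToDate is a hypothetical helper name, and casedate is the column name used in the WHERE clause above.

```php
<?php
// Hypothetical helper: turn '011706-0001' into '2006-01-17'.
// Assumes 'mmddyy-...' and that every two-digit year means 20yy.
// Returns null when the prefix is not a valid calendar date.
function caseToDate(string $case): ?string
{
    if (!preg_match('/^(\d{2})(\d{2})(\d{2})-/', $case, $m)) {
        return null;
    }
    $month = (int) $m[1];
    $day   = (int) $m[2];
    $year  = 2000 + (int) $m[3];
    if (!checkdate($month, $day, $year)) {
        return null; // e.g. '000015-0000' has no real month or day
    }
    return sprintf('%04d-%02d-%02d', $year, $month, $day);
}

var_dump(caseToDate('011706-0001'));
```

Rows for which the helper returns null would need to be handled separately, for instance by leaving casedate empty and falling back to the LIKE condition for them.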

Comparing large numbers in php and sql

I need to compare a very large number in PHP (30 digits long) with 2 numbers in my database. What's a good way to do this? I tried using floats, but they're not precise enough, and I don't know of a good way to use large numbers in PHP.
Have you tried using string comparison? Just make sure every number is left-padded with zeroes to the same length.
mysql> select "123123123123123123456456456"<"123123123123123123456456457";
+-------------------------------------------------------------+
| "123123123123123123456456456"<"123123123123123123456456457" |
+-------------------------------------------------------------+
| 1 |
+-------------------------------------------------------------+
Just tested this with up to 200+ chars, works like a charm.
Check the bccomp function.
You could compare strings instead.
Depending on how you're fetching the data from the database, you may want to explicitly cast the integer to a string type in the SQL statement.
Other than that, there are several libraries in PHP that handle large integers, like BCMath and GMP.
Handling large numbers in PHP is done through either of two libraries: GMP or BC Math.
I haven't done this myself, so it may not be correct, but I think you'd have to take the string result from GMP or BC Math and feed that into the query. Note that BIGINT only holds about 19 digits (20 if unsigned), so a 30-digit number needs a DECIMAL(30,0) or a string column instead.
Interesting fact: within that range, BIGINT values can be stored exactly when passed as strings. From the MySQL manual:
You can always store an exact integer value in a BIGINT column by storing it using a string. In this case, MySQL performs a string-to-number conversion that involves no intermediate double-precision representation.
If they're very big, I'd even compare them as strings. First, if one is longer than the other (ignoring leading zeros), it wins. If they're the same length, compare digit by digit left to right: at the first position where the digits differ, the number with the bigger digit wins. This of course only holds for positive integers.
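That length-then-digits rule can be sketched in plain PHP without any extension. The function name compareBigIntStrings is made up for this example, and it assumes both inputs contain only the digits 0-9 (non-negative integers).

```php
<?php
// Compare two non-negative integers given as digit strings.
// Returns -1, 0 or 1, like the spaceship operator.
function compareBigIntStrings(string $a, string $b): int
{
    // Leading zeros do not change the value; keep a single '0' for zero itself.
    $a = ltrim($a, '0') ?: '0';
    $b = ltrim($b, '0') ?: '0';

    // The longer number wins; for equal lengths strcmp() decides digit by digit.
    return (strlen($a) <=> strlen($b)) ?: (strcmp($a, $b) <=> 0);
}

var_dump(compareBigIntStrings(
    '123123123123123123456456456',
    '123123123123123123456456457'
));
```

Where the BC Math extension is available, bccomp() gives the same kind of -1/0/1 answer and also handles signs and decimals.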
