Search mySQL for special unicode characters with mysqli with PHP


I have a search autocomplete feature that breaks when someone types a French character such as É.
É is stored in the MySQL table as the literal escape sequence '\u00c9' (its Unicode code point written out as text):
'id', 'term', 'count', 'words', 'locale'
'5218', '\u00c9COLORADO', '4', '1', 'fr-ca'
'5590', '\u00c9MADEUP', '1', '1', 'fr-ca'
'5511', 'EXCITE', '1', '1', 'fr-ca'
In PHP, É arrives as the UTF-8 bytes '\xc3\x89'. I wrote the code below to convert it to the \u00c9 form for the query so that it would match. On my system, json_encode() output "\\u00c9", so I had to str_replace() some of the extra characters:
$temp1 = json_encode($criteria);
$temp2 = str_replace('"', "", $temp1);
$temp3 = str_replace('\\\\', '\\', $temp2);
$data = self::all( array( 'locale' => $locale , 'term' => array('$like' => $temp3."%" ) ), array('count'=>0,'term'=>2),0,12 );
When I type É in the search and error_log() the SQL query, it is:
SELECT * FROM search_term WHERE `locale` = 'fr-ca' AND `term` LIKE '\\\\u00c9%' ORDER BY `count` DESC, `term` ASC,
When I run that SQL query in mySQL Workbench, it works (the quadruple backslashes are necessary in the case of LIKE) and the result set is:
'id', 'term', 'count', 'words', 'locale'
'5218', '\u00c9COLORADO', '4', '1', 'fr-ca'
'5590', '\u00c9MADEUP', '1', '1', 'fr-ca'
But when I run that query in PHP with mysqli:
$res = mysqli_query($conn, $query);
it doesn't return any results/matches.
How or why does mysqli_query() change the query so it fails? How do I write this so that when the search character is É it matches with that character - how its stored - in the database?

json_encode($str, JSON_UNESCAPED_UNICODE)
Add that flag so that json_encode() outputs the letter itself (É) rather than its Unicode escape sequence (\u00c9).
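If the existing rows were written with a plain json_encode() call, the search term has to be brought into the same form before it can match. The sketch below assumes the rows really contain the literal text \u00c9 and that the term is sent as a bound parameter rather than interpolated into the SQL string, so it only needs escaping once, for LIKE. The helper names toStoredForm() and likePrefix() are hypothetical. It also shows what JSON_UNESCAPED_UNICODE changes.

```php
<?php
// Hypothetical helpers; assumes the PHP source and the input are UTF-8.

// Reproduce the stored form: json_encode() without JSON_UNESCAPED_UNICODE
// turns "É" into the six characters \u00c9, wrapped in double quotes.
function toStoredForm(string $term): string
{
    $json = json_encode($term, JSON_THROW_ON_ERROR);
    return substr($json, 1, -1); // drop the surrounding quotes
}

// Escape LIKE metacharacters and append the prefix wildcard.
// Backslash is MySQL's default LIKE escape character, so a literal
// backslash must be doubled. strtr() never re-scans its own output.
function likePrefix(string $value): string
{
    return strtr($value, ['\\' => '\\\\', '%' => '\\%', '_' => '\\_']) . '%';
}

// Pattern to bind as a parameter: no extra string-literal escaping needed.
$pattern = likePrefix(toStoredForm('É'));

// With the flag, json_encode() keeps the character itself.
$plain = json_encode('É', JSON_UNESCAPED_UNICODE);
```

With mysqli, $pattern could then be bound instead of interpolated, for example through mysqli_prepare() with a query such as SELECT * FROM search_term WHERE locale = ? AND term LIKE ? followed by mysqli_stmt_bind_param($stmt, 'ss', $locale, $pattern). If the terms are instead re-stored with JSON_UNESCAPED_UNICODE (as the real letter), only likePrefix() is needed.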

Related

I am using Case for SQL select query and want to calculate price dynamically

My Controller code
$nicepay_commission = Configure::read('nicepay_commission');
$paypal_commission = Configure::read('paypal_commission');
$getQuery = $this->OrderProduct
->find('all', [
'contain' => [
'Orders' => ['PaymentMethods'],
'Products' => ['ProductType']
]
])
->distinct('Products.id')
->select([
'product_name' => 'MAX(Products.product_name)',
'count' => 'SUM(OrderProduct.qty)',
'actual_rate' => 'SUM(OrderProduct.actual_rate)',
'revenue_based_actual_rate' => '(
SUM(
CASE
WHEN PaymentMethods.payment_gateway = \'nicepay\'
THEN (OrderProduct.actual_rate-((OrderProduct.actual_rate*"'.$nicepay_commission.'")/100))
WHEN PaymentMethods.payment_gateway = \'paypal\'
THEN (OrderProduct.actual_rate-((OrderProduct.actual_rate*"'.$paypal_commission.'")/100))
ELSE (OrderProduct.actual_rate)
END
)
)'
])
->where($conditions);
But an error occurs, and I can't figure out how to fix it.
My error log shows:
2020-08-20 07:56:56 Error: [PDOException] SQLSTATE[42S22]: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Invalid column name '2'.
If I hard-code the values, there is no error:
$getQuery = $this->OrderProduct
->find('all', [
'contain' => [
'Orders' => ['PaymentMethods'],
'Products' => ['ProductType']
]
])
->distinct('Products.id')
->select([
'product_name' => 'MAX(Products.product_name)',
'count' => 'SUM(OrderProduct.qty)',
'actual_rate' => 'SUM(OrderProduct.actual_rate)',
'revenue_based_actual_rate' => '(
SUM(
CASE
WHEN PaymentMethods.payment_gateway = \'nicepay\'
THEN (OrderProduct.actual_rate-((OrderProduct.actual_rate*2)/100))
WHEN PaymentMethods.payment_gateway = \'paypal\'
THEN (OrderProduct.actual_rate-((OrderProduct.actual_rate*1)/100))
ELSE (OrderProduct.actual_rate)
END
)
)'
])
->where($conditions);

First things first: never insert data into SQL snippets directly if it can be avoided, even if you think it comes from a secure source!
That being said, look at the generated SQL query (if you're not already using DebugKit, you should install it). You are enclosing the values in double quotes, i.e. the generated SQL will look like:
OrderProduct.actual_rate * "2"
which in ISO SQL means 2 is treated as an identifier (a column name), hence SQL Server's "Invalid column name '2'" error.
Removing the quotes will fix the problem, but you're still injecting dynamic data into an SQL string, which should be avoided if possible, so you should go a step further and bind the values instead, in order to reduce the chances of creating SQL injection vulnerabilities:
// ...
->select([
'product_name' => 'MAX(Products.product_name)',
'count' => 'SUM(OrderProduct.qty)',
'actual_rate' => 'SUM(OrderProduct.actual_rate)',
'revenue_based_actual_rate' => '(
SUM(
CASE
WHEN PaymentMethods.payment_gateway = \'nicepay\'
THEN (OrderProduct.actual_rate-((OrderProduct.actual_rate * :nicepayCommission)/100))
WHEN PaymentMethods.payment_gateway = \'paypal\'
THEN (OrderProduct.actual_rate-((OrderProduct.actual_rate * :paypalCommission)/100))
ELSE (OrderProduct.actual_rate)
END
)
)'
])
->bind(':nicepayCommission', $nicepay_commission, 'integer')
->bind(':paypalCommission', $paypal_commission, 'integer')
// ...
See also
Cookbook > Database Access & ORM > Query Builder > SQL Injection Prevention
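To sanity-check the totals the query returns, the CASE expression can be mirrored in plain PHP. This is a hypothetical helper, assuming the commissions are percentages (as the `* commission / 100` arithmetic implies) and that any other gateway keeps the full rate, as in the ELSE branch; the 'bank' gateway below is made up for illustration.

```php
<?php
// Hypothetical mirror of the CASE expression: deduct the gateway's
// percentage commission; any other gateway keeps the full rate.
function revenueBasedRate(float $actualRate, string $gateway, array $commissions): float
{
    if (!isset($commissions[$gateway])) {
        return $actualRate; // ELSE (OrderProduct.actual_rate)
    }
    return $actualRate - ($actualRate * $commissions[$gateway]) / 100;
}

// Equivalent of SUM(CASE ... END) over several order lines.
function revenueBasedTotal(array $rows, array $commissions): float
{
    $total = 0.0;
    foreach ($rows as $row) {
        $total += revenueBasedRate($row['actual_rate'], $row['gateway'], $commissions);
    }
    return $total;
}

$commissions = ['nicepay' => 2, 'paypal' => 1]; // the hard-coded values from the question
$rows = [
    ['actual_rate' => 100.0, 'gateway' => 'nicepay'],
    ['actual_rate' => 200.0, 'gateway' => 'paypal'],
    ['actual_rate' => 50.0,  'gateway' => 'bank'],
];
$total = revenueBasedTotal($rows, $commissions);
```

If commissions can be fractional (say 2.5), the 'integer' type in the bind() calls would likely cast them to whole numbers, so a 'float' or 'decimal' type may be the safer choice there.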

How to use insert_string query helper with SQL function?

My database uses UUIDs as primary keys. When I insert into the DB (MariaDB), I need to do:
insert into table_name (id, parent_id, name, ...)
values (UUID_TO_BIN(UUID()), 'a UUID', 'record name', ...)
I would like to use CI's insert_string function, but this array:
$data = array(
'id' => 'UUID_TO_BIN(UUID())',
'name' => 'record name',
'parent_id' => 'UUID_TO_BIN(' . $parent_id . ')'
);
$this->db->insert_string('table_name',$data);
...which I don't think will work, because each value is escaped: CI will escape the whole text, including the function call, instead of only what is inside UUID_TO_BIN() in the parent_id value.
I am trying to figure out whether it's possible for parent_id to run the given function. Otherwise, I guess the easiest way is to do the HEX-to-BIN conversion in PHP, but will that break the SQL?

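You could use the set() method, which accepts an optional third parameter ($escape); setting it to FALSE prevents the value from being escaped, so an SQL function call is passed through unchanged. Use it for the id column, and also for parent_id, escaping the UUID string yourself:
$data = array(
// 'id' => 'UUID_TO_BIN(UUID())',
'name' => 'record name'
);
// Set id and parent_id as raw SQL expressions; FALSE disables escaping,
// so the user-supplied UUID is escaped (and quoted) with escape()
$this->db->set('id', 'UUID_TO_BIN(UUID())', FALSE);
$this->db->set('parent_id', 'UUID_TO_BIN(' . $this->db->escape($parent_id) . ')', FALSE);
// insert() combines the set() values with $data; insert_string() ignores set()
$this->db->insert('table_name', $data);
If you need the SQL string instead of running the query, CodeIgniter 3's get_compiled_insert('table_name') builds it from the same set() values. More on the set() method in the Query Builder documentation.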
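On the question's fallback idea of converting in PHP: MySQL's UUID_TO_BIN() with its default swap flag drops the dashes and packs the 32 hex digits into 16 bytes, which hex2bin() can reproduce. A binary string is safe as a bound or escaped value, so it should not break the SQL as long as it is not concatenated into the query by hand. The helper names below are hypothetical; note that MySQL's UUID() produces version 1 UUIDs, whereas uuidV4() here produces random version 4 ones.

```php
<?php
// Hypothetical helpers mirroring UUID_TO_BIN(uuid) / BIN_TO_UUID(bin)
// with the default (no byte swap) behaviour.
function uuidToBin(string $uuid): string
{
    $hex = str_replace('-', '', $uuid);
    if (!preg_match('/^[0-9a-fA-F]{32}$/', $hex)) {
        throw new InvalidArgumentException("Not a UUID: $uuid");
    }
    return hex2bin($hex); // 16 raw bytes
}

function binToUuid(string $bin): string
{
    // Split the 32 hex digits into 8-4-4-4-12 groups.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bin), 4));
}

// Random (version 4) UUID; MySQL's UUID() returns version 1 instead.
function uuidV4(): string
{
    $bytes = random_bytes(16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // version 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // RFC 4122 variant
    return binToUuid($bytes);
}
```

The result of uuidToBin($parent_id) could then go into $data like any other string value, and Query Builder escapes it as usual.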

Evaluate if string is not in English: best and easiest practices?

I have a fairly long string (5000+ characters), and I need to check whether it is in English.
After a brief web search I found several solutions:
using PEAR Text_LanguageDetect (it looks attractive, but I'm avoiding solutions when I don't understand how they work)
checking letter frequencies (I wrote the function below, with some comments)
checking the string for national characters (like č, ß and so on)
checking the string for markers like 'is', 'the' or anything similar
So the function is the following:
function is_english($str){
// Most used English chars frequencies
$chars = array(
array('e',12.702),
array('t', 9.056),
array('a', 8.167),
array('o', 7.507),
array('i', 6.966),
array('n', 6.749),
array('s', 6.327),
array('h', 6.094),
array('r', 5.987),
);
$str = strtolower($str);
$sum = 0;
foreach($chars as $key=>$char){
$i = substr_count($str,$char[0]);
$i = 100*$i/strlen($str); // Normalization
$i = $i/$char[1];
$chars[$key][2] = $i; // Store the ratio for the mean square step below
$sum += $i;
}
$avg = $sum/count($chars);
// Calculation of mean square value
$value = 0;
foreach($chars as $char)
$value += pow($char[2]-$avg,2);
// Average value
$value = $value / count($chars);
return $value;
}
Roughly, the function measures letter frequencies and compares them with the reference pattern; the closer the frequencies are to the pattern, the closer the result should be to 0.
Unfortunately, it doesn't work very well: mostly I could treat results of 0.05 and lower as English and higher values as not English, but many English strings get high values and many foreign ones (in my case mostly German) get low values.
I can't implement the third solution yet, as I haven't been able to find a comprehensive set of characters that mark foreign languages.
The fourth looks attractive, but I can't figure out which markers are best to use.
Any thoughts?
PS After some discussion, Zod suggested that this question is a duplicate of "Regular expression to match non-English characters?", which only partly answers it, so I'd like to keep this question separate.

I think the fourth solution might be your best bet, but I would expand it to include a wider dictionary.
You can find some comprehensive lists at: https://en.wikipedia.org/wiki/Most_common_words_in_English
With your current implementation, you will run into problems because many languages use the standard Latin alphabet. Even languages that go beyond it typically use mostly "English-compatible" characters, so to speak. For example, the sentence "Ich bin lustig" is German but uses only Latin letters; likewise, "Jeg er glad" is Danish but uses only Latin letters. In a string of 5000+ characters you will probably see some non-Latin characters, but that is not guaranteed. Additionally, by focusing solely on character frequency, you may find that other languages written in the Latin alphabet have similar letter frequencies, which makes your existing approach unreliable.
By using an English dictionary to find occurrences of English words, you can scan a string, count how many of its words are English, and from that compute the proportion of English words (a higher percentage indicating that the text is probably English).
The following is a potential solution:
<?php
$testString = "Some long string of text that you would like to test.";
// Words from: https://en.wikipedia.org/wiki/Most_common_words_in_English
$common_english_words = array('time', 'person', 'year', 'way', 'day', 'thing', 'man', 'world', 'life', 'hand', 'part', 'child', 'eye', 'woman', 'place', 'work', 'week', 'case', 'point', 'government', 'company', 'number', 'group', 'problem', 'fact', 'be', 'have', 'do', 'say', 'get', 'make', 'go', 'know', 'take', 'see', 'come', 'think', 'look', 'want', 'give', 'use', 'find', 'tell', 'ask', 'seem', 'feel', 'try', 'leave', 'call', 'good', 'new', 'first', 'last', 'long', 'great', 'little', 'own', 'other', 'old', 'right', 'big', 'high', 'different', 'small', 'large', 'next', 'early', 'young', 'important', 'few', 'public', 'bad', 'same', 'able', 'to', 'of', 'in', 'for', 'on', 'with', 'at', 'by', 'from', 'up', 'about', 'into', 'over', 'after', 'beneath', 'under', 'above', 'the', 'and', 'a', 'that', 'i', 'it', 'not', 'he', 'as', 'you', 'this', 'but', 'his', 'they', 'her', 'she', 'or', 'an', 'will', 'my', 'one', 'all', 'would', 'there', 'their', 'I', 'we', 'what', 'so', 'out', 'if', 'who', 'which', 'me', 'when', 'can', 'like', 'no', 'just', 'him', 'people', 'your', 'some', 'could', 'them', 'than', 'then', 'now', 'only', 'its', 'also', 'back', 'two', 'how', 'our', 'well', 'even', 'because', 'any', 'these', 'most', 'us');
/* you might also consider replacing "'s" with ' ', because 's is common in English
as a contraction and simply removing the single quote could throw off the frequency. */
$transformedTest = preg_replace('#\s+#', ' ', preg_replace("#[^a-zA-Z'\s]#", ' ', strtolower($testString)));
$splitTest = explode(' ', $transformedTest);
$matchCount = 0;
for($i=0;$i<count($splitTest);$i++){
if(in_array($splitTest[$i], $common_english_words))
$matchCount++;
}
echo "raw count: $matchCount\n<br>\nPercent: " . ($matchCount/count($common_english_words))*100 . "%\n<br>\n";
if(($matchCount/count($common_english_words)) > 0.5){
echo "More than half of the test string is English. Text is likely English.";
}else{
echo "Text is likely a foreign language.";
}
?>
You can see an example here which includes two sample strings to test (one which is German, and one which is English): https://ideone.com/lfYcs2
In the IDEOne code, when running it on the English string, you will see that the result is roughly 69.3% matching with the common English words. When running it on the German, the match percentage is only 4.57% matching with the common English words.
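One thing to watch in the code above: it divides the match count by the size of the word list rather than by the number of words in the text, so the "percentage" grows with the length of the input and can exceed 100% for long English texts. A sketch of the same idea that measures the share of the text's words found in the list is below (its numbers will therefore differ from the ones quoted above). The function name and the shortened word list are assumptions, and the cutoff for "English" would need tuning on real samples.

```php
<?php
// Hypothetical helper: fraction of the words in $text that appear in $wordList.
function englishWordRatio(string $text, array $wordList): float
{
    // Lowercase, keep only letters and apostrophes, then split on whitespace.
    $clean = preg_replace("#[^a-z'\s]#", ' ', strtolower($text));
    $words = preg_split('#\s+#', $clean, -1, PREG_SPLIT_NO_EMPTY);
    if (count($words) === 0) {
        return 0.0;
    }
    $lookup = array_flip($wordList); // constant-time membership tests
    $matches = 0;
    foreach ($words as $word) {
        if (isset($lookup[$word])) {
            $matches++;
        }
    }
    return $matches / count($words);
}

// A short excerpt of the common-word list above; the full list works the same way.
$common = ['the', 'and', 'a', 'that', 'it', 'to', 'of', 'in', 'for', 'on', 'with', 'this', 'you', 'be', 'have', 'not'];
```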

This problem is called language detection, and it is not trivial to solve with a single function. I suggest using LanguageDetector from GitHub.

I would go with the fourth solution and also search for markers of languages other than English. For example, if you find "the", there is a high probability that the text is English; if you find "el" or "la", it is probably Spanish; and if you find "der", "die" and "das", it is very likely German.
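The marker idea can score several languages at once: count how many words of the text belong to each language's marker list and pick the language with the most hits. This is a minimal sketch; guessLanguage() is a hypothetical name and the tiny marker lists are illustrative only. Real lists would need to be much longer, especially since short words such as "la" or "die" also occur in other languages, including English.

```php
<?php
// Hypothetical helper: guess the language whose marker words occur most often.
function guessLanguage(string $text, array $markers): ?string
{
    // Split on anything that is not a letter or an apostrophe (UTF-8 aware).
    $words = preg_split('#[^\p{L}\']+#u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $best = null;
    $bestScore = 0;
    foreach ($markers as $language => $list) {
        $lookup = array_flip($list);
        $score = 0;
        foreach ($words as $word) {
            if (isset($lookup[$word])) {
                $score++;
            }
        }
        if ($score > $bestScore) { // ties keep the earlier language
            $best = $language;
            $bestScore = $score;
        }
    }
    return $best; // null when no marker word was found
}

$markers = [
    'en' => ['the', 'and', 'of'],
    'es' => ['el', 'la', 'los'],
    'de' => ['der', 'die', 'das'],
];
```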

I need to pull data based on its exact sequence in an array

Below is my code. I want to pull the data in the sequence 3, 10, then 7; how can I do that? So far it returns 10 first, then 7, then 3.
$cars = $this->car->find('all', array(
'conditions' => array(
'car.id' => array(3, 10, 7)
),
'limit' => 3,
'order' => array('car.id' => 'desc')
));

The idea is to order the results by their respective positions in the array. In this case the MySQL FIND_IN_SET() function can help.
You can add the following ORDER BY clause:
ORDER BY FIND_IN_SET(car.id,'3,10,7')
Note: you need to convert this ORDER BY clause into the equivalent CakePHP query.
MySQL FIND_IN_SET() returns the position of a string if it is present as one of the items of a string list. The string list itself is a string containing items separated by the ',' (comma) character. The function returns 0 when the search string does not exist in the string list, and NULL if either argument is NULL.
Sample input:
SELECT *
FROM cars
returns the ids 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.
Output:
SELECT *
FROM cars
WHERE cars.id IN (3,10,7)
ORDER BY FIND_IN_SET(cars.id,'3,10,7')
returns the ids in the order 3, 10, 7.
Check the SQLFIDDLE DEMO here
Edit:
I don't know the CakePHP syntax for building MySQL queries, but the equivalent CakePHP query may look something like this, with the ORDER BY expression passed as an SQL string:
$cars = $this->car->find('all', array(
'conditions' => array(
'car.id' => array(3, 10, 7)
),
'limit' => 3,
'order' => array("FIND_IN_SET(car.id, '3,10,7')")
));
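If building the raw ORDER BY through the ORM is awkward, another option is to fetch the rows in any order and sort them in PHP by their position in the requested id list, which is what FIND_IN_SET() does on the MySQL side. This sketch assumes each row is an array with an 'id' key; CakePHP 2 find() results are usually nested under the model alias (e.g. $row['Car']['id']), so the lookup would need adjusting. sortByIdSequence() is a hypothetical name.

```php
<?php
// Hypothetical helper: return $rows in the order given by $ids.
// Rows whose id is not in $ids are dropped, mirroring the IN (...) filter.
function sortByIdSequence(array $rows, array $ids): array
{
    $position = array_flip($ids); // id => position in the requested sequence
    $rows = array_values(array_filter($rows, function ($row) use ($position) {
        return isset($position[$row['id']]);
    }));
    usort($rows, function ($a, $b) use ($position) {
        return $position[$a['id']] <=> $position[$b['id']];
    });
    return $rows;
}

// e.g. rows as currently returned (descending), plus one id not requested
$rows = [['id' => 10], ['id' => 5], ['id' => 7], ['id' => 3]];
$sorted = sortByIdSequence($rows, [3, 10, 7]);
```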

Optimizing MySQL Queries (Adding indexes, re-writing queries, using explain, etc) for beginners? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I'm looking at the slow query log and running explain on queries, now how do I go about interpreting the output to make improvements?
Example:
EXPLAIN SELECT corecountry, corestatus
FROM daydream_ddvalpha.propcore
WHERE corecountry = '7' AND corestatus >= '100'
Output:
# id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
'1', 'SIMPLE', 'propcore', 'ALL', NULL, NULL, NULL, NULL, '1532', 'Using where'
Show index:
SHOW INDEX FROM daydream_ddvalpha.propcore =
# Table, Non_unique, Key_name, Seq_in_index, Column_name, Collation, Cardinality, Sub_part, Packed, Null, Index_type, Comment, Index_comment
'propcore', '0', 'PRIMARY', '1', 'coreref', 'A', '1773', NULL, NULL, '', 'BTREE', '', ''
Describe:
describe daydream_ddvalpha.propcore
# Field, Type, Null, Key, Default, Extra
'coreref', 'varchar(10)', 'NO', 'PRI', '', ''
'coretitle', 'varchar(75)', 'NO', '', '', ''
'coreprice', 'int(25) unsigned', 'NO', '', '0', ''
'corecurr', 'tinyint(1)', 'NO', '', '0', ''
'coreagent', 'varchar(10)', 'NO', '', '0', ''
'corebuild', 'smallint(4)', 'NO', '', '0', ''
'coretown', 'varchar(25)', 'NO', '', '', ''
'coreregion', 'varchar(25)', 'NO', '', '', ''
'corecountry', 'smallint(4)', 'NO', '', '0', ''
'corelocation', 'smallint(4)', 'NO', '', '0', ''
'corestatus', 'smallint(4)', 'NO', '', '0', ''
'corelistsw', 'char(1)', 'NO', '', '', ''
'corepstatus', 'tinyint(4)', 'NO', '', '0', ''
'coreseq', 'mediumint(10)', 'NO', '', '0', ''
'coreviews', 'mediumint(10)', 'NO', '', '0', ''
'coreextract', 'char(1)', 'NO', '', 'n', ''
EDIT: NEW EXAMPLE
I found a more complex query:
EXPLAIN SELECT coreref, coretitle, coreprice, corecurr, corebuild, coretown, corecountry, corepstatus, corestatus FROM daydream_ddvalpha.propcore
WHERE coretown = 'Torrepacheco'
AND corestatus >= '100'
ORDER BY coreprice ASC
LIMIT 135, 10
Output:
# id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
'1', 'SIMPLE', 'propcore', 'ALL', NULL, NULL, NULL, NULL, '1579', 'Using where; Using filesort'
I understood the answers given to the first example regarding the indexes, but how about this? Should I create an index to cover coretown and corestatus and coreprice. I kind of get the impression I'll end up with lots of indexes with duplicate values, or is that normal?

This is your query:
SELECT corecountry, corestatus
FROM daydream_ddvalpha.propcore
WHERE corecountry = '7' AND corestatus >= '100'
You have two conditions in the WHERE clause: one is an equality and one is not. The index that will help is daydream_ddvalpha.propcore(corecountry, corestatus). corecountry has to go first, because equality conditions need to be the leftmost columns in the index. Then you get one inequality, which is corestatus.
You are only selecting these two fields. The above index is said to be a covering index for the query, because all columns needed for the query are in the index. In other words, only the index is read for the query, rather than the original data.
As a note: if the fields are numeric, then you don't need to put quotes around the values. Using single quotes makes them look like strings, which can sometimes confuse both SQL optimizers and people reading the code.
EDIT:
As noted in the comments, the syntax for adding an index is:
create index idx_propcore_country_status ON propcore(corecountry, corestatus);
I usually name indexes with the name of the table followed by the columns (but the name can be any valid identifier).
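To make reading EXPLAIN output a little more mechanical, a small helper can flag the warning signs that appear in both examples: type ALL (a full table scan), no key chosen, and "Using filesort" in Extra (an extra sort pass instead of reading rows in index order). This is a hypothetical sketch that assumes rows shaped like those returned by mysqli_fetch_assoc() on an EXPLAIN query; it looks at only a few columns.

```php
<?php
// Hypothetical helper: list warning signs in EXPLAIN rows (associative arrays).
function explainWarnings(array $explainRows): array
{
    $warnings = [];
    foreach ($explainRows as $row) {
        $table = $row['table'];
        if ($row['type'] === 'ALL') {
            $warnings[] = "$table: full table scan over about {$row['rows']} rows";
        }
        if ($row['key'] === null) {
            $warnings[] = "$table: no index used";
        }
        if (strpos((string) $row['Extra'], 'Using filesort') !== false) {
            $warnings[] = "$table: extra sort pass (filesort)";
        }
    }
    return $warnings;
}

// The second EXPLAIN result from the question.
$rows = [[
    'id' => 1, 'select_type' => 'SIMPLE', 'table' => 'propcore', 'type' => 'ALL',
    'possible_keys' => null, 'key' => null, 'key_len' => null, 'ref' => null,
    'rows' => 1579, 'Extra' => 'Using where; Using filesort',
]];
$warnings = explainWarnings($rows);
```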

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.
