How to count results in Sphinx? - php

I have to deal with queries that have lots of results, but I only show them in sets of 20-30 rows.
For that I use the SetLimits() method from the PHP API.
But I also need to know the total number of results, to calculate the number of pages (or sets of results).
The only way I can do this right now is to pull all the results by setting the limit to 10000000 and read the 'total' key of the array Sphinx returns, but this isn't good: I only need the count, and I don't want Sphinx to build a huge array with all the IDs.
Performing a SELECT ... COUNT() query in MySQL won't work, because the data indexed in Sphinx is always different.
Any ideas?

Isn't SphinxClient::Query() returning data about how many records matched your request?
As I understand it, 'total' is the number of entries returned by this request (affected by SetLimits), and 'total_found' is the total number of results matching the query (not affected by SetLimits).

According to the manual (SphinxClient::setLimits), this should do the trick:
$cl->SetLimits(0, 0);
I'm not a Sphinx developer, so this is just a blind guess, but it should avoid memory overflow with a large number of results.
Let me know whether it works, so I can remove this answer if it's not correct.
I've also found that SELECT ... COUNT() doesn't work in a Sphinx query, so you're right about that.
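For completeness, here is a minimal sketch of that counting idiom with the classic SphinxAPI PHP client. The index name test1 and the server settings are placeholders, and note that some client versions reject a limit of 0, in which case asking for a single row keeps the returned array tiny while total_found still reports the full match count:
require 'sphinxapi.php';                         // the classic Sphinx PHP client

$cl = new SphinxClient();
$cl->SetServer('localhost', 9312);               // default searchd API port; adjust to your setup
$cl->SetLimits(0, 1);                            // fetch at most one row

$result = $cl->Query('test|one|two', 'test1');   // placeholder query and index name
if ($result !== false) {
    $pageSize = 20;
    $total    = $result['total_found'];          // full match count, unaffected by SetLimits
    $pages    = (int) ceil($total / $pageSize);
    echo "$total results, $pages pages\n";
}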
Also, according to the Sphinx documentation, you can retrieve the number of results using a SHOW META query.
SHOW META
SHOW META shows additional meta-information about the latest query, such as query time and keyword statistics:
mysql> SELECT * FROM test1 WHERE MATCH('test|one|two');
+------+--------+----------+------------+
| id   | weight | group_id | date_added |
+------+--------+----------+------------+
|    1 |   3563 |      456 | 1231721236 |
|    2 |   2563 |      123 | 1231721236 |
|    4 |   1480 |        2 | 1231721236 |
+------+--------+----------+------------+
3 rows in set (0.01 sec)
mysql> SHOW META;
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| total         | 3     |
| total_found   | 3     |
| time          | 0.005 |
| keyword[0]    | test  |
| docs[0]       | 3     |
| hits[0]       | 5     |
| keyword[1]    | one   |
| docs[1]       | 1     |
| hits[1]       | 2     |
| keyword[2]    | two   |
| docs[2]       | 1     |
| hits[2]       | 2     |
+---------------+-------+
12 rows in set (0.00 sec)
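Since SphinxQL speaks the MySQL wire protocol, you can run the query and SHOW META straight from PHP with mysqli. A minimal sketch, assuming searchd has a SphinxQL listener on the usual default port 9306 and the test1 index from above:
// Connect to searchd's SphinxQL listener (no user/password/database needed)
$sphinx = new mysqli('127.0.0.1', '', '', '', 9306);

$sphinx->query("SELECT id FROM test1 WHERE MATCH('test|one|two') LIMIT 1");

// SHOW META describes the most recent query on this connection
$meta = [];
$res  = $sphinx->query('SHOW META');
while ($row = $res->fetch_assoc()) {
    $meta[$row['Variable_name']] = $row['Value'];
}
echo 'total_found: ' . $meta['total_found'] . "\n";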
References:
Sphinx: SHOW META syntax
SphinxClient::setLimits

If you query Sphinx through SphinxSE (the Sphinx storage engine plugin for MySQL), the counts from the last search are exposed as status variables:
SELECT VARIABLE_NAME, VARIABLE_VALUE
FROM information_schema.GLOBAL_STATUS
WHERE VARIABLE_NAME LIKE 'SPHINX_TOTAL_FOUND';
For more info:
SELECT VARIABLE_NAME, VARIABLE_VALUE
FROM information_schema.GLOBAL_STATUS
WHERE VARIABLE_NAME LIKE 'SPHINX_%';

Related

Remove duplicated records in Sphinx without doing setGroupBy?

Hey, I am new to Sphinx search.
My query retrieves course_ids. Every course belongs to a theme_id, but some courses can belong to more than one theme, so some of them come back duplicated.
I set limits on my query to display results 1-20, then 21-40, and so on, 20 at a time.
But sometimes those 20 results contain duplicates. For example, if results 21-40 contain 3 duplicates, I want to remove them and fill the 3 empty spots with the next 3 results, so the query returns 21-43 instead, then 44-64, and so on.
I tried setGroupBy(), and it worked, but I don't want the courses sorted by course_id; when I sort with setSortMode() instead, the course_ids are duplicated again.
How can I remove the duplicated records and keep the sorting?
Any help would be appreciated. Thanks.
setGroupBy() has a third, optional argument that specifies the final sort order.
So you can group by course_id but still do the final sorting by weight (or whatever you like), rather than the default '@group desc':
$client->SetSortMode( SPH_SORT_RELEVANCE );
$client->SetGroupBy( 'course_id', SPH_GROUPBY_ATTR, '@weight desc' );
Keep using the sort mode as well: it determines WHICH of the rows from each course is kept, i.e. it shows the highest-ranked one first, which mimics the overall sorting by weight.
Looks like what you are looking for is exactly what REMOVE_REPEATS() does. I'm not sure it's available in the programming-language clients; you'll probably need to use SphinxQL instead, which is recommended anyway, as the clients are outdated and missing a lot of functionality.
Here's an example:
Without REMOVE_REPEATS():
MySQL [(none)]> select * from testrt;
+------+------+
| id   | gid  |
+------+------+
|    1 |   10 |
|    2 |   10 |
|    3 |   20 |
|    4 |   30 |
|    5 |   30 |
+------+------+
5 rows in set (0.04 sec)
With REMOVE_REPEATS() by gid:
MySQL [(none)]> select remove_repeats((select * from testrt), gid, 0, 10);
+------+------+
| id   | gid  |
+------+------+
|    1 |   10 |
|    3 |   20 |
|    4 |   30 |
+------+------+
3 rows in set (0.06 sec)

How to filter in Sphinx

Sphinx data:
+----------+--------+-----------+
| id       | car_id | filter_id |
+----------+--------+-----------+
| 37280991 |   4261 |        46 |
| 37280992 |   4261 |        18 |
| 37281000 |   4261 |         1 |
| 37281002 |   4261 |        28 |
| 51056314 |   4277 |        18 |
| 51056320 |   4277 |         1 |
| 51056322 |   4277 |        28 |
+----------+--------+-----------+
I have a page that shows cars, and you can apply filters to it. I'm trying to get Sphinx to return the cars that have both filter 1 and filter 46. If you look at the table above, you will see that only one car (4261) has both filters. The problem is that I don't know how to express this in Sphinx.
$this->cs->SetFilter('filter_id', array(1, 46)); // doesn't work: it returns both cars (4261, 4277), because it behaves like an IN
$this->cs->SetGroupBy('car_id', SPH_GROUPBY_ATTR);
$this->cs->SetFilter('filter_id', array(1));
$this->cs->SetFilter('filter_id', array(46));
Both filters apply, and both need to match. In effect they are 'AND'ed.
It seems I misread the question and missed the fact that you're using group by; I thought filter_id was an MVA.
... so you have to be a bit more creative. Alas, you will probably need to use SphinxQL rather than SphinxAPI, as SphinxQL has HAVING:
SELECT id,car_id FROM index WHERE filter_id IN (1,46) GROUP BY car_id HAVING COUNT(*)>1
This only keeps cars where more than one document matched, i.e. one match for each value in the IN clause. If there can be duplicates (like two rows with filter_id=1), then perhaps use COUNT(DISTINCT filter_id) instead.
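If your Sphinx version supports COUNT(DISTINCT ...) together with HAVING, that variant would look like this (the index name is a placeholder; requiring the distinct count to equal the number of requested filters keeps the check robust against duplicate filter rows):
SELECT id, car_id FROM index WHERE filter_id IN (1,46)
GROUP BY car_id HAVING COUNT(DISTINCT filter_id) = 2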

Proper way of handling and preventing deadlocks

I understand that deadlocks occur when an SQL query tries to lock an already-locked row, and I'm currently experiencing them. Here's my SQL query:
INSERT INTO transactions (product_id, category, C, amount, date)
SELECT 'SomeProduct', 'SomeCategory', v.username, 10, '2016-3-31' FROM balance v
WHERE v.username = 'SomeUsername' AND v.balance + 10 >= 0
balance is a virtual table (a view) that sums transactions to get each user's balance.
This error usually only shows up once there is a reasonable number of concurrent users, which makes it hard to test. Any tips on how to avoid deadlocks, or any other possible solution? I'm inserting rows into the transactions table at a high rate and am looking to solve this.
I've also tried to catch the exception, but I couldn't create a loop that would redo the query until it succeeded.
General answer
Deadlocks can only occur when you have two or more resources and two or more processes, and the processes lock the resources in different orders.
Say process 1 wants to lock resource A, then B, then C, while process 2 wants to lock B, then A, then C.
This can lead to a deadlock: 1 gets A, then 2 gets B, then 1 waits for B and 2 waits for A, indefinitely.
The solution is thankfully quite simple: any time a process needs to lock two or more resources, it must do so in a sorted order. In this example, if process 2 also acquires A, then B, then C, a deadlock can never happen.
Specific answer
In your case, you seem to be locking different table rows within one transaction in a more or less random order. Try to find out how MySQL releases locks and make sure you only hold as many as you actually need. If you need to hold more than one at a time, order your lock requests in some consistent way.
It's hard to tell without knowing more about your code; the first Google hit for "mysql deadlock" shows some promising material, though: https://www.percona.com/blog/2014/10/28/how-to-deal-with-mysql-deadlocks
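On the "loop that would redo the query" part of the question: below is a minimal retry sketch using PDO, assuming the connection is set to throw exceptions. The SQL is the statement from the question; 1213 is MySQL's ER_LOCK_DEADLOCK error code, and the short random sleep lowers the chance that two retrying clients collide again:
// Hypothetical connection details; adjust to your environment
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'password');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$sql = "INSERT INTO transactions (product_id, category, C, amount, date)
        SELECT 'SomeProduct', 'SomeCategory', v.username, 10, '2016-3-31'
        FROM balance v
        WHERE v.username = 'SomeUsername' AND v.balance + 10 >= 0";

$maxAttempts = 5;
for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
    try {
        $pdo->exec($sql);
        break;                                  // success, stop retrying
    } catch (PDOException $e) {
        // errorInfo[1] carries the MySQL error code; 1213 = ER_LOCK_DEADLOCK
        if ($e->errorInfo[1] != 1213 || $attempt == $maxAttempts) {
            throw $e;                           // not a deadlock, or out of attempts
        }
        usleep(mt_rand(10000, 100000));         // back off 10-100 ms before retrying
    }
}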
One common cause: if your WHERE clause prevents index use, the server scans far more rows than necessary, and in an INSERT ... SELECT that scan also locks those rows, which makes deadlocks much more likely. To illustrate, I have created a sample table with two fields; id has a primary key:
MariaDB [who]> DESC mynum;
+-------+------------------+------+-----+---------+----------------+
| Field | Type             | Null | Key | Default | Extra          |
+-------+------------------+------+-----+---------+----------------+
| id    | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| num   | float            | YES  |     | NULL    |                |
+-------+------------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)
I have filled it with 100000 records:
MariaDB [who]> SELECT * FROM mynum LIMIT 10;
+----+----------+
| id | num      |
+----+----------+
|  1 | 0.47083  |
|  2 | 0.670773 |
|  3 | 0.941373 |
|  4 | 0.69455  |
|  5 | 0.648627 |
|  6 | 0.159488 |
|  7 | 0.851557 |
|  8 | 0.779321 |
|  9 | 0.341933 |
| 10 | 0.371704 |
+----+----------+
10 rows in set (0.00 sec)
MariaDB [who]> SELECT count(*) FROM mynum;
+----------+
| count(*) |
+----------+
|   100000 |
+----------+
1 row in set (0.02 sec)
Now I select rows while applying +10 to the id inside the WHERE clause. You can see that the server must read ALL rows, because the expression keeps it from using the index:
MariaDB [who]> EXPLAIN SELECT * FROM mynum WHERE id + 10 > 20;
+------+-------------+-------+------+---------------+------+---------+------+--------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+--------+-------------+
|    1 | SIMPLE      | mynum | ALL  | NULL          | NULL | NULL    | NULL | 100464 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+--------+-------------+
1 row in set (0.00 sec)
And now I compare the id directly with a constant. You can see that it reads only the rows it needs, using the index:
MariaDB [who]> EXPLAIN SELECT * FROM mynum WHERE id < 10;
+------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
|    1 | SIMPLE      | mynum | range | PRIMARY       | PRIMARY | 4       | NULL |    9 | Using where |
+------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)

MySQL query how to get list of all distinct values from columns that contain multiple string values?

I am trying to get a list of distinct values from a column of a table.
The column can contain multiple comma-delimited values, and I just want to eliminate duplicates and come up with a list of unique values.
I know how to do this with PHP by grabbing the entire table and then looping the rows and placing the unique values into a unique array.
But can the same thing be done with a MySQL query?
My table looks something like this:
| ID | VALUES                                  |
------------------------------------------------
|  1 | Acadian,Dart,Monarch                    |
|  2 | Cadillac,Dart,Lincoln,Uplander          |
|  3 | Acadian,Freestar,Saturn                 |
|  4 | Cadillac,Uplander                       |
|  5 | Dart                                    |
|  6 | Dart,Cadillac,Freestar,Lincoln,Uplander |
So my list of unique VALUES would then contain:
Acadian
Cadillac
Dart
Freestar
Lincoln
Monarch
Saturn
Uplander
Can this be done with a MySQL call alone, or is there a need for some PHP sorting as well?
Thanks
Why would you store your data like this in a database? You deliberately nullify all the extensive querying features you would want to use a database for in the first place. Instead, have a table like this:
| valueID | groupID | name     |
--------------------------------
|       1 |       1 | Acadian  |
|       2 |       1 | Dart     |
|       3 |       1 | Monarch  |
|       4 |       2 | Cadillac |
|       2 |       2 | Dart     |
Notice the different valueID for Dart compared to Matthew's suggestion. That's so that identical values share the same valueID (you may want to refer to these later on, and you don't want to make the same mistake of not thinking ahead again, do you?). Then make the primary key contain both the valueID and the groupID.
Then, to answer your actual question, you can retrieve all distinct values with this query:
SELECT name FROM mytable GROUP BY valueID
(GROUP BY should perform better here than DISTINCT, since it shouldn't have to do a full table scan.)
I would suggest selecting (and splitting) into a temp table and then querying against that.
First, there is apparently no split function in MySQL: http://blog.fedecarg.com/2009/02/22/mysql-split-string-function/ (that post is three years old, so someone can comment if this has changed?)
So push all of it into a temp table and select from there.
Better still would be to break the values out into a table with this structure:
| ID | VALUES   | AttachedRecordID |
------------------------------------
|  1 | Acadian  |                1 |
|  2 | Dart     |                1 |
|  3 | Monarch  |                1 |
|  4 | Cadillac |                2 |
|  5 | Dart     |                2 |
etc.
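Until the data is restructured, the PHP-side approach the asker already knows is only a few lines. A minimal sketch, assuming a mysqli connection; the table name cars is a placeholder, and VALUES is a reserved word in MySQL, hence the backticks:
// Hypothetical connection; adjust credentials and names to your schema
$db = new mysqli('localhost', 'user', 'password', 'mydb');

$unique = [];
$res = $db->query('SELECT `VALUES` FROM cars');
while ($row = $res->fetch_assoc()) {
    foreach (explode(',', $row['VALUES']) as $value) {
        $unique[trim($value)] = true;   // array keys de-duplicate for us
    }
}

$list = array_keys($unique);
sort($list);
print_r($list);   // Acadian, Cadillac, Dart, Freestar, Lincoln, Monarch, Saturn, Uplander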

Optimum MySQL Table Structure for Fastest Lookups

For a table with 100% reading (no writing), which structure is better and why?
[My table has many columns, but I've made an example here with 4 columns for simplicity]
Option 1: One table with multiple columns
ID | Length | Width | Height
-----------------------------
 1 |     10 |    20 |     30
 2 |    100 |   200 |    300
Option 2: Two tables; one storing column headers, and other storing values
Table 1:
ID | Object_ID | Attribute_ID | Attribute_Value
-------------------------------------------------
 1 |         1 |            1 |              10
 2 |         1 |            2 |              20
 3 |         1 |            3 |              30
 4 |         2 |            1 |             100
 5 |         2 |            2 |             200
 6 |         2 |            3 |             300
Table 2:
ID | Name
-----------
 1 | Length
 2 | Width
 3 | Height
Your second option is an under-optimized implementation of the EAV anti-pattern: the Entity-Attribute-Value model.
Why it's bad has already been argued to death on this site and elsewhere.
You'll get much better results from the first.
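To make the difference concrete, here is a sketch of what fetching a single object looks like under each option (the table names objects, table1 and table2 are placeholders; the CASE pivot is the usual way to reassemble an EAV row):
-- Option 1: one row, one indexed lookup
SELECT Length, Width, Height FROM objects WHERE ID = 1;

-- Option 2: three rows that must be joined and pivoted back into shape
SELECT o.Object_ID,
       MAX(CASE WHEN a.Name = 'Length' THEN o.Attribute_Value END) AS Length,
       MAX(CASE WHEN a.Name = 'Width'  THEN o.Attribute_Value END) AS Width,
       MAX(CASE WHEN a.Name = 'Height' THEN o.Attribute_Value END) AS Height
FROM table1 o
JOIN table2 a ON a.ID = o.Attribute_ID
WHERE o.Object_ID = 1
GROUP BY o.Object_ID;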
I will preface this by saying that I'm a relative novice to SQL and database tables; that, however, doesn't mean that I don't know my basics.
Unless your example is heavily oversimplified, you really should use the first example. Not only will it be faster and easier to query, but it simply makes more sense.
In this example, you don't need to split your tables at all; your 'Attribute IDs' are adequately represented by the table headers. Further, these values have no real meaning by themselves, so they really don't need to be in another table.
You would generally break out a new table and reference it, as you have, only when another object, existing separately, relates to your object in a one-to-many relationship.
Here's an example (actually from my database on an O'Reilly server) using blog entries and comments on blog entries:
mysql> select * from blog_entries;
+----+--------------+-------------+---------------------+
| id | poster       | post        | timestamp           |
+----+--------------+-------------+---------------------+
|  1 | lunchmeat317 | blah blah   | 0000-00-00 00:00:00 |
|  2 | Yongho Shin  | yadda yadda | 0000-00-00 00:00:00 |
+----+--------------+-------------+---------------------+
2 rows in set (0.00 sec)
mysql> select id, blog_id, poster, post, timestamp from blog_comments;
+----+---------+--------------+----------------+---------------------+
| id | blog_id | poster       | post           | timestamp           |
+----+---------+--------------+----------------+---------------------+
|  1 |       1 | lunchmeat317 | humina humina  | 0000-00-00 00:00:00 |
|  2 |       1 | Joe Blow     | huh?           | 0000-00-00 00:00:00 |
|  3 |       2 | lunchmeat317 | yakk yakk yakk | 0000-00-00 00:00:00 |
|  4 |       2 | Yongho Shin  | lol            | 0000-00-00 00:00:00 |
+----+---------+--------------+----------------+---------------------+
4 rows in set (0.00 sec)
Think about it from a logical perspective: there's no reason to artificially inject complexity into a design when it doesn't need to be there. In your example, length, width, and height aren't really separate objects; they're all dimensions of the object described by the table row. Further, length, width, and height each have only one value at any given time.
I hope that made some sense - if I was a bit pedantic in my pedagogy, I apologize. However, if someone else stumbles on this question, hopefully this example will help them.
Good luck.
Edit: I just realized that your question was specifically about performance. That's a little more in-depth and perhaps depends on the database engine you use. Generally, though, I would expect querying a single table without any joins to be slightly faster, considering that denormalization is a commonly cited method of improving performance.
