Sphinx problem. Filter - php

I have a problem with Sphinx. My configuration looks like this:
sql_query = \
SELECT id, product_title, product_inf, product_code, ptype_name, title, cat, value, car \
FROM Catalog_View;
sql_attr_uint = car
sql_attr_uint = cat
Catalog_View is a view that collects data from several tables. It works well and has no problems. I created an index with this configuration:
index src1
{
source = src1
path = /var/data/src1
docinfo = extern
mlock = 0
morphology = stem_en, stem_ru
min_word_len = 3
charset_type = sbcs
min_prefix_len = 0
min_infix_len = 3
enable_star = 1
}
The indexer did its job perfectly. But when I run an empty query (like this: '') and set up two filters
$cl->SetFilter('cat',array(9));
$cl->SetFilter('car',array(2));
I lose a lot of matches. For example, the equivalent SQL query against Catalog_View returns 76 rows, while the same search in Sphinx returns only 11. I can't figure out what I am doing wrong. Everything seems fine except the filters.
Actually, I have the same problem with filters when I run a non-empty query.

I stumbled upon this one too. My solution was to make the document IDs unique. If you have duplicate document IDs, you can get exactly this kind of result. I suppose Sphinx keeps only the first document for each ID and trashes all the duplicates.
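One way to check for this before indexing is to ask the database which IDs the view returns more than once. The sketch below uses Python's sqlite3 module with a made-up schema (the `product` and `fitment` tables are hypothetical stand-ins, not the asker's real tables); only the GROUP BY / HAVING query is the point, and it should carry over to MySQL unchanged.

```python
import sqlite3

# Hypothetical stand-in for Catalog_View: a view built on a join repeats the
# base row's id once per joined row, so ids are no longer unique.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER, product_title TEXT);
    CREATE TABLE fitment (product_id INTEGER, cat INTEGER, car INTEGER);
    INSERT INTO product VALUES (1, 'brake pad'), (2, 'oil filter');
    INSERT INTO fitment VALUES (1, 9, 2), (1, 9, 3), (2, 9, 2);
    CREATE VIEW Catalog_View AS
        SELECT p.id, p.product_title, f.cat, f.car
        FROM product p JOIN fitment f ON f.product_id = p.id;
""")

# Every id listed here occurs more than once in the rows the indexer would read.
duplicates = conn.execute(
    "SELECT id, COUNT(*) FROM Catalog_View "
    "GROUP BY id HAVING COUNT(*) > 1 ORDER BY id"
).fetchall()
print(duplicates)
```

If this query returns any rows for the real view, the first column of sql_query is not a unique document ID, which would be consistent with the missing matches described above.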

Execute category search Sphinx php

Hello!
I have the following Sphinx config:
source txtcontent : ru_config
{
sql_query = SELECT `id` as `txt_id`, 1 as index_id, `type_id`,`content_type_id`, `title`, `annonce`, `content` FROM `TxtContent` WHERE `status` = 1 AND `content_type_id` != 14
sql_attr_uint = index_id
sql_attr_uint = type_id
}
The entire table is indexed and stored in one large search index.
Plain searching over it works fine.
But today I was given the task of searching by category.
The categories are stored in the type_id field, which is of type int.
How can I perform such a search in PHP using the Sphinx API?
A standard search looks like this:
$sphinxClient = new SphinxClient();
$sphinxClient->SetServer("127.0.0.1", 3312 );
$sphinxClient->SetLimits( 0, 700,700 );
$sphinxClient->SetSortMode(SPH_SORT_RELEVANCE);
$sphinxClient->SetArrayResult( true );
$result = $sphinxClient->Query( $this->query, 'txtcontent provider item');
I tried adding
$sphinxClient->SetFilter('type_id','1');
to search only where type_id = 1, but it didn't help.
So how can I search for a specific category? Fetching everything and throwing away the excess results in PHP is not an option (the search would then run into the existing limit). How do I do it "properly" via the API, without placing each category in a separate search index?
SetFilter takes an array of values, and they need to be numeric (type_id is a numeric attribute):
$sphinxClient->SetFilter('type_id',array(1));
The sphinxapi class actually uses assertions to detect invalid data like this, which I guess you have disabled (otherwise you would have seen them!).

Mysql Server timing out on specific locate queries

I'm programming a search with ZF3 and the DB module.
Every time I use more than one short keyword, like "49" and "am" or "1" and "is", I get this error:
Statement could not be executed (HY000 - 2006 - MySQL server has gone away)
Using longer keywords works perfectly fine, as long as I don't use two or more short keywords.
The problem only occurs on the live server; it works fine on the local test server.
The projects table has ~2200 rows with all kinds of data. The project_search table has 17000 rows, with multiple entries for each project, each looking like:
id, projectid, searchtext
The searchtext column is fulltext. Here is the relevant part of the PHP code:
$sql = new Sql($this->db);
$select = $sql->select(['p'=>'projects']);
if(isset($filter['search'])) {
$keywords = preg_split('/\s+/', trim($filter['search']));
$join = $sql->select('project_search');
$join->columns(['projectid' => new Expression('DISTINCT(projectid)')]);
$join->group("projectid");
foreach($keywords as $keyword) {
$join->having(["LOCATE('$keyword', GROUP_CONCAT(searchtext))"]);
}
$select->join(
["m" => $join],
"m.projectid = p.id",
['projectid'],
\Zend\Db\Sql\Select::JOIN_RIGHT
);
}
Here is the resulting query:
SELECT p.*, m.projectid
FROM projects AS p
INNER JOIN (
SELECT projectid
FROM project_search
GROUP BY projectid
HAVING LOCATE('am', GROUP_CONCAT(searchtext))
AND LOCATE('49', GROUP_CONCAT(searchtext))
) AS m
ON m.projectid = p.id
GROUP BY p.id
ORDER BY createdAt DESC
I rewrote the query using "MATCH(searchtext) AGAINST('$keyword')" and "searchtext LIKE '%keyword%'", with the same result.
The problem seems to be with the live MySQL server. How can I debug this?
[EDIT]
After noticing that the error only occurred in one particular view that ran other search-related queries, each using multiple joins (one join per keyword), I merged those queries and the error was gone. The number of queries seemed to kill the server.
Try refactoring your inner query like so.
SELECT a.projectid
FROM (
SELECT DISTINCT projectid
FROM project_search
WHERE searchtext LIKE '%am%'
) a
JOIN (
SELECT DISTINCT projectid
FROM project_search
WHERE searchtext LIKE '%49%'
) b ON a.projectid = b.projectid
It should give you back the same set of projectid values as your inner query. It gives each projectid value that has matching searchtext for both search terms, even if those terms show up in different rows of project_search. That's what your query does by searching GROUP_CONCAT() output.
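The equivalence claimed above can be checked on a small data set. The sketch below runs both forms with Python's sqlite3 module on made-up rows; SQLite has no LOCATE, so `instr(haystack, needle)` stands in for it (note the swapped argument order), and the sample rows are purely illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project_search (
        id INTEGER PRIMARY KEY, projectid INTEGER, searchtext TEXT);
    -- Project 1 has 'am' and '49' in different rows; 2 and 3 have only one term.
    INSERT INTO project_search (projectid, searchtext) VALUES
        (1, 'team meeting'), (1, 'room 49'),
        (2, 'amber'), (3, 'unit 49');
""")

# Original shape: search the concatenated text of each project.
grouped = conn.execute("""
    SELECT projectid FROM project_search
    GROUP BY projectid
    HAVING instr(GROUP_CONCAT(searchtext), 'am') > 0
       AND instr(GROUP_CONCAT(searchtext), '49') > 0
    ORDER BY projectid
""").fetchall()

# Refactored shape: one DISTINCT subquery per term, joined on projectid.
joined = conn.execute("""
    SELECT a.projectid
    FROM (SELECT DISTINCT projectid FROM project_search
          WHERE searchtext LIKE '%am%') a
    JOIN (SELECT DISTINCT projectid FROM project_search
          WHERE searchtext LIKE '%49%') b ON a.projectid = b.projectid
    ORDER BY a.projectid
""").fetchall()

print(grouped, joined)
```

On this sample both queries return only project 1, the one project whose rows cover both terms.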
Try creating an index on (searchtext, projectid). The use of column LIKE '%sample' means you won't be able to random-access that index, but the two queries in the join may still be able to scan the index, which is faster than scanning the table. To add that index use this command.
ALTER TABLE project_search ADD INDEX project_search_text (searchtext, projectid);
Try to do this in a MySQL client program (phpmyadmin for example) rather than directly from your php program.
Then, using the MySQL client, test the inner query. See how long it takes. Use EXPLAIN SELECT .... to get an explanation of how MySQL is handling the query.
It's possible your short keywords are returning a ridiculously high number of matches, and somehow overwhelming your system. In that case you can put a LIMIT 1000 clause or some such thing at the end of your inner query. That's not likely, though. 17 kilorows is not a large number.
If that doesn't help, your production MySQL server is likely misconfigured or corrupt. If I were you I would call your hosting service tech support, somehow get past the front-line support agent (who won't know anything except "reboot your computer" and other such foolishness), and tell them the exact times you got the "gone away" message. They'll be able to check the logs.
Pro tip: I'm sure you know the pitfalls of using LIKE '%text%' as a search term. It's not scalable because it's not sargable: it can't random access an index. If you can possibly redesign your system, it's worth your time and effort.
You could use TRY / CATCH to check whether you get a more specific error:
BEGIN TRY
BEGIN TRANSACTION
--Insert Your Queries Here--
COMMIT
END TRY
BEGIN CATCH
DECLARE @ErrorMessage NVARCHAR(4000);
DECLARE @ErrorSeverity INT;
DECLARE @ErrorState INT;
SELECT
@ErrorMessage = ERROR_MESSAGE(),
@ErrorSeverity = ERROR_SEVERITY(),
@ErrorState = ERROR_STATE();
IF @@TRANCOUNT > 0
ROLLBACK
RAISERROR (@ErrorMessage, -- Message text.
@ErrorSeverity, -- Severity.
@ErrorState -- State.
);
END CATCH
Although, because you are talking about short words and fulltext, it seems to me it must be related to stopwords.
Try running this query from both your dev server and production server and check if there are any differences:
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;
Also check in the my.ini text file (if that is your config file) whether these are set to:
ft_stopword_file = ""
ft_min_word_len = 1
As stated in my EDIT, the problem wasn't the query from the original question but some other queries that used the search parameter as well. Every one of those queries had a part like the following:
if(isset($filter['search'])) {
$keywords = preg_split('/\s+/', trim($filter['search']));
$field = 1;
foreach($keywords as $keyword) {
$join = $sql->select('project_search');
$join->columns(["pid$field" => 'projectid']);
$join->where(["LOCATE('$keyword', searchtext)"]);
$join->group("projectid");
$select->join(
["m$field" => $join],
"m$field.pid$field = p.id"
);
$field++;
}
}
This resulted in a lot of queries with a lot of result rows, which eventually killed the MySQL server. I merged those queries into the first one and the error was gone.
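For illustration, here is a minimal sketch of the merged approach: one grouped query that covers any number of keywords, instead of one join per keyword. It is written in Python against sqlite3 rather than ZF3, `build_search_query` is a hypothetical helper name, and `instr` replaces MySQL's LOCATE. The keywords are bound as parameters rather than interpolated into the SQL string the way '$keyword' is above.

```python
import sqlite3

def build_search_query(keywords):
    """Return (sql, params) for one grouped query covering every keyword."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    # One HAVING condition per keyword: at least one row of the project
    # contains that keyword. The comparison yields 0 or 1, so SUM counts hits.
    conditions = " AND ".join(
        "SUM(instr(searchtext, ?) > 0) > 0" for _ in keywords
    )
    sql = (
        "SELECT projectid FROM project_search "
        f"GROUP BY projectid HAVING {conditions} ORDER BY projectid"
    )
    return sql, list(keywords)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project_search (
        id INTEGER PRIMARY KEY, projectid INTEGER, searchtext TEXT);
    INSERT INTO project_search (projectid, searchtext) VALUES
        (1, 'team meeting'), (1, 'room 49'),
        (2, 'amber'), (3, 'unit 49');
""")

sql, params = build_search_query(["am", "49"])
both_terms = conn.execute(sql, params).fetchall()

sql_one, params_one = build_search_query(["49"])
one_term = conn.execute(sql_one, params_one).fetchall()
print(both_terms, one_term)
```

However many keywords are entered, the table is grouped once, so the number of statements sent to the server does not grow with the number of keywords.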

Select the result where multiple columns

I need to select results by checking multiple columns.
SELECT * FROM atricle a WHERE
a.article_free_1 = 1 AND
a.article_free_2 = 1 AND
a.article_free_3 = 1 AND
a.article_free_4 = 1 AND
a.article_free_5 = 1 AND
a.article_free_6 = 1 AND
a.article_free_7 = 1 AND
a.article_free_8 = 1 AND
a.article_free_9 = 1 AND
a.article_free_10 = 1;
I want to simplify this query. It is getting very long, and I need to add 40+ columns to it.
How can I simplify my query?
If you are using this query very frequently, then it's better to create a calculated/computed column in the table. It would look like this:
(CASE WHEN article_free_1 = 1
AND article_free_2 = 1 AND ....THEN 1 ELSE 0 END)
You can then replace the query with:
SELECT * FROM atricle WHERE <computedColumn> = 1
This should simplify the query and give you an optimized result each time you execute it.
I can't simply comment yet so I'll post as an answer.
Your query is as optimal as it can get. A calculated field will just add overhead, getting in the way of the query optimizer as it tries to evaluate the query. Whatever you do, DON'T loop in SQL; it was a horrendous addition way back, meant to make people like SQL. Stick to standard queries. What you've got is good.
Edit: I just read what Jeemusu wrote. Spot on.
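If the real complaint is the length of the hand-written statement rather than its performance, one option is to generate the condition list in application code and leave the SQL itself as plain as it is. A minimal Python sketch, assuming the columns really are numbered article_free_1 to article_free_40 (the helper name and the sqlite3 test table are made up for illustration):

```python
import sqlite3

def all_flags_query(table, prefix, count):
    """Build SELECT * FROM <table> WHERE <prefix>_1 = 1 AND ... AND <prefix>_<count> = 1.

    table and prefix are interpolated as identifiers, so they must come from
    trusted code, never from user input.
    """
    conditions = " AND ".join(f"{prefix}_{i} = 1" for i in range(1, count + 1))
    return f"SELECT * FROM {table} WHERE {conditions}"

COLUMNS = 40
conn = sqlite3.connect(":memory:")
column_defs = ", ".join(f"article_free_{i} INTEGER" for i in range(1, COLUMNS + 1))
conn.execute(f"CREATE TABLE atricle (id INTEGER PRIMARY KEY, {column_defs})")

placeholders = ", ".join("?" * (COLUMNS + 1))
# Row 1 has every flag set; row 2 has the last flag cleared.
conn.execute(f"INSERT INTO atricle VALUES ({placeholders})", [1] + [1] * COLUMNS)
conn.execute(f"INSERT INTO atricle VALUES ({placeholders})", [2] + [1] * (COLUMNS - 1) + [0])

query = all_flags_query("atricle", "article_free", COLUMNS)
matching_ids = [row[0] for row in conn.execute(query)]
print(matching_ids)
```

The generated statement is the same standard query as in the question, just not typed out by hand.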

Sphinx partial word search, version 2.2.4

I'm using Sphinx to provide a search webpage for a huge set of data. I recently upgraded Sphinx from v2.1.8 to v2.2.4.
I had some trouble with the config file. One issue is that the 'enable_star' option has been removed, which changed the results on my search page: if I search for 'rea', it does not return 'real madrid' until I finish typing 'real'. The same issue occurs at word ends, e.g. with 'madrid'.
The expected results when I search for 'mad':
Real Madrid
Atlatico Madrid
Mad-Croc
Madila
mad bla
In my case I only get 'Mad-Croc' and 'mad bla'.
Here is part of my config file:
docinfo = extern
mlock = 0
morphology = stem_en
min_word_len = 1
expand_keywords = 1
dict = keywords
PHP code:
$_sphinx = new SphinxClient();
$_sphinx->SetServer('............', '....');
$_sphinx->SetMatchMode(SPH_MATCH_ANY);
$_sphinx->SetFieldWeights(array('auther_name' => 50));
$_sphinx->SetArrayResult(true);
$_sphinx->SetSortMode(SPH_SORT_EXTENDED2, 'cat_priority DESC, @weight DESC');
//////////////////
$_result = $_sphinx->Query($searchTerm . '*');
Could anybody take a look at this?
You don't seem to have min_prefix_len set up on your index; I suggest you add it.
Although I'm not sure how your index would ever have worked, as min_prefix_len would be required for enable_star=0 to have an effect.
That should allow expand_keywords to work its magic. At that point I suggest removing the * from the end of the query. It would only affect the last word entered anyway, and the * should automatically be added by the expand_keywords setting.
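To see why only 'Mad-Croc' and 'mad bla' come back without prefix indexing, here is a toy model in Python. It is not Sphinx's tokenizer or matcher; it simply assumes titles are lowercased and split on non-alphanumeric characters, then compares whole-word matching with prefix matching.

```python
import re

TITLES = ["Real Madrid", "Atlatico Madrid", "Mad-Croc", "Madila", "mad bla"]

def tokens(title):
    # Assumption: lowercase, split on anything that is not a letter or digit,
    # so "Mad-Croc" yields the words "mad" and "croc".
    return [t for t in re.split(r"[^0-9a-z]+", title.lower()) if t]

def whole_word_matches(term):
    # Without prefix indexing, a term only matches complete words.
    return [t for t in TITLES if term in tokens(t)]

def prefix_matches(term):
    # With prefix indexing, "mad" also matches any word starting with "mad".
    return [t for t in TITLES if any(tok.startswith(term) for tok in tokens(t))]

print(whole_word_matches("mad"))
print(prefix_matches("mad"))
```

In this model, whole-word matching reproduces the two results the asker actually gets, and prefix matching reproduces the five expected ones.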

Sphinx sort mode not working in PHP

I am using the following code:
include('sphinxapi.php');
$search = "John";
$s = new SphinxClient;
$s->SetServer("localhost", 9312);
$s->SetMatchMode(SPH_MATCH_EXTENDED2);
$s->SetSortMode(SPH_SORT_EXTENDED, 'name ASC');
$nameindex = $s->Query("$search");
echo $nameindex['total_found'];
This returns a blank page; however, without the SetSortMode call it works fine and returns the number of results. No matter what I set the sort mode to, it does not work. Any ideas as to why this would be?
I am indexing one column called name.
You can't sort by (normal) fields in Sphinx, only attributes, or fields marked with the sql_field_string setting (which creates an attribute of the same name). So you'll need to either add an attribute with the same column, or use sql_field_string - they're equivalent.
Also: I've removed the thinking-sphinx tag - you're not using Ruby, and thus not the Thinking Sphinx library.