I'm building an interactive search service for a client, and part of what we need is the ability to add "tags" to documents. These tags will be both pre-existing and custom-defined. The schema has been set up to support this, but I'm having issues with Solarium PHP when updating a result set.
For example, if a user searches for "Overflow" in our database and that returns anywhere from 1 to 1000+ results, they need the ability to tag the entire result set with any number of tags.
So I'm taking the result set from execute() and am currently unable to alter the documents returned -- the exception being "A readonly document cannot be altered".
Anyone have a workaround for this?
For an updateable document you should use this class: Solarium\QueryType\Update\Query\Document
Solarium uses a read-only document type as the default for select queries for two reasons:
- in most cases no update functionality is needed, so it would only be overhead
- to discourage the use of Solr as a DB, as in reading - altering - saving
Almost all schemas have index-only fields. There is no way to read the value of these fields, so this data will be lost when re-saving the document! Updates should normally be done based on your origin data (i.e. the database). If you are really sure you want to update Solr data, you can set a read-write document class as the document type for your select query, alter the documents and use them in an update query.
http://solarium.readthedocs.org/en/stable/documents/
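A minimal sketch of that flow, assuming a Solarium 3.x-style API (the exact document class path and constructor signature vary between Solarium versions, so treat the names as illustrative):

$client = new Solarium\Client($config);

// Hydrate the select results into read-write update documents.
$select = $client->createSelect();
$select->setQuery('Overflow');
$select->setDocumentClass('Solarium\QueryType\Update\Query\Document');

$result = $client->execute($select);

// Alter the (now writable) documents and send them back in an update query.
// Remember the caveat above: index-only field values cannot be read back,
// so they will be lost on re-save.
$update = $client->createUpdate();
foreach ($result as $document) {
    $document->tags = array('my-tag'); // hypothetical tag field and value
    $update->addDocument($document);
}
$update->addCommit();
$client->execute($update);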
Let's say I have the following "expenses" MySQL table:
id  amount  vendor  tag
1   100     google  foo
2   450     GitHub  bar
3   22      GitLab  fizz
4   75      AWS     buzz
I'm building an API that should return expenses based on partial "vendor" or "tag" filters, so vendor="Git" should return records 2 and 3, and tag="zz" should return records 3 and 4.
I was thinking of utilizing Elasticsearch's capabilities, but I'm not sure of the correct way.
Most articles I read suggest replicating the table records (using a Logstash pipe or other methods) to an Elastic index.
So my API doesn't even query the DB, and returns an array of documents directly from ES?
Is this considered good practice, replicating the whole table to Elastic?
What about table relations? What if I want to filter by a nested table relation?
So my API doesn't even query the DB, and returns an array of documents directly from ES?
Yes. Since you are querying Elasticsearch, you will get results only from Elasticsearch. Another option is to fetch just the id from Elasticsearch and use it to retrieve the documents from MySQL, but this might impact response time.
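For illustration, returning documents straight from ES with the official elasticsearch-php client could look like this (the "expenses" index and field names come from the example table; the client setup is the standard one for that library):

require 'vendor/autoload.php';

$client = Elasticsearch\ClientBuilder::create()->build();

$response = $client->search([
    'index' => 'expenses',
    'body'  => [
        'query' => [
            'match' => ['vendor' => 'GitHub'],
        ],
    ],
]);

// Each hit's _source is the stored expense document; return these directly.
$expenses = array_column($response['hits']['hits'], '_source');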
Is this considered good practice, replicating the whole table to Elastic? What about table relations? What if I want to filter by a nested table relation?
It is not about good practice or bad practice; it is all about what type of functionality and use case you want to implement, and based on that, the technology stack can be chosen and data can be duplicated. Lots of companies use Elasticsearch as a secondary data source, with duplicated data, just because their use case is a best fit for Elasticsearch or another NoSQL DB.
Elasticsearch is a NoSQL DB, and it does not maintain any relationships between data. Hence, you need to denormalize your data before indexing it into Elasticsearch. You can read this article for more about denormalization and why it is required.
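For instance, a denormalized expense document would embed any related vendor fields directly instead of referencing a separate vendors table (using the same elasticsearch-php client as above; the extra "country" field is purely illustrative):

$client->index([
    'index' => 'expenses',
    'id'    => 2,
    'body'  => [
        'amount' => 450,
        'vendor' => ['name' => 'GitHub', 'country' => 'US'], // embedded, not joined
        'tag'    => 'bar',
    ],
]);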
Elasticsearch provides the Nested and Join data types for parent-child relationships, but both have limitations and a performance impact.
Below is what they have mentioned for the join field type:
The join field shouldn't be used like joins in a relational database. In Elasticsearch the key to good performance is to de-normalize your data into documents. Each join field, has_child or has_parent query adds a significant tax to your query performance. It can also trigger global ordinals to be built.
Below is what they have mentioned for the nested field type:
When ingesting key-value pairs with a large, arbitrary set of keys, you might consider modeling each key-value pair as its own nested document with key and value fields. Instead, consider using the flattened data type, which maps an entire object as a single field and allows for simple searches over its contents. Nested documents and queries are typically expensive, so using the flattened data type for this use case is a better option.
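If you go the flattened route, the mapping is a single field of type flattened; a sketch with the same PHP client (index and field names illustrative, ES 7.x mapping format):

$client->indices()->create([
    'index' => 'expenses',
    'body'  => [
        'mappings' => [
            'properties' => [
                'labels' => ['type' => 'flattened'], // arbitrary key-value pairs live here
            ],
        ],
    ],
]);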
Most articles I read suggest replicating the table records (using a Logstash pipe or other methods) to an Elastic index.
Yes, you can use Logstash, or any language client (Java, Python, etc.), to sync data from the DB to Elasticsearch. You can check this SO answer for more information on this.
Your Search Requirements
If you go ahead with Elasticsearch, then you can use the N-Gram tokenizer or a regexp query to achieve your search requirements.
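As a sketch, a regexp query for a partial vendor match could look like this (with the standard analyzer the indexed terms are lowercased, hence the lowercase pattern; for large datasets an ngram analyzer at index time is usually the faster option):

$results = $client->search([
    'index' => 'expenses',
    'body'  => [
        'query' => [
            'regexp' => ['vendor' => '.*git.*'],
        ],
    ],
]);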
Maybe you can try TiDB: https://medium.com/@shenli3514/simplify-relational-database-elasticsearch-architecture-with-tidb-c19c330b7f30
If you want to scale your MySQL and have fast filtering and aggregating, TiDB could simplify the architecture and reduce development work.
I am using Bitnami Apache Solr 7.4.0 (latest).
I indexed some documents.
Now, in the admin panel, for query search I need to write the field:value format,
but I just want to search with only the value.
Example:
q=field:value (works)
q=value (gives 0 results)
So what should I configure in the schema.xml file so that I can search by only the value of the field?
In the Solr Admin --> Query page, you can add the field name to df, to which your queries will be routed; df means default search field. You don't need the dismax or edismax parsers for this: df works with the standard query parser itself. So, I hope this is what you are looking for. Thanks.
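For example, using the field name from the question:
?q=value&df=field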
You don't need to modify the schema. You can create your own request handler, which performs query operations based on your requirements, by adding a new requestHandler to the solrconfig.xml file. For more details on how to do this, see here.
That being said, I would suggest you first go through the basics of querying in Solr and understand how the different parameters like q, qf and defType work, and what query parsers (standard, dismax, etc.) are available for use. See this.
There is nothing special to configure, but you have to use the edismax or dismax query parser. These query parsers are made to support free-form user input, and you can use them with just q=value. You tell Solr to use the edismax query parser by providing defType=edismax in the query URL.
Since the field to search no longer is part of the actual query, you tell the edismax handler which field to search by giving the qf parameter. You can give multiple fields in qf, and you can give each field different weights by using the syntax field^<weight>.
So to get the same result as in your first example:
?q=value&defType=edismax&qf=field
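And with multiple weighted fields (field names illustrative; URL-encode the space as %20 or + in a raw URL):
?q=value&defType=edismax&qf=title^2 description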
I'm trying to create a table which contains an "email_address" field and an "activation_hash" field.
The default value of the "activation_hash" field should be:
sha1(microtime().email_address)
Is it possible to set this up using Laravel migrations, and how can I do this?
I should also mention that I'm using Postgres as the DB engine.
Most database engines are limited as to what values can be set as a column default. Depending on your database engine, you might be able to do it via an event/trigger, but it's easier, and arguably better, to solve this at the application level with Eloquent events.
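A minimal sketch of the Eloquent-event approach (the User model name is an assumption; the column name follows the question):

use Illuminate\Database\Eloquent\Model;

class User extends Model
{
    protected static function boot()
    {
        parent::boot();

        // Set the hash just before the row is first saved.
        static::creating(function ($user) {
            $user->activation_hash = sha1(microtime() . $user->email_address);
        });
    }
}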
When I need to do something like this, I split the code into two pieces:
1. Create the column (using the Schema Builder)
2. Set the value (in PHP, loading all models and updating them one by one in a foreach)
Note: both steps can be done in the migration file, as the sketch below shows.
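A sketch of both steps in one migration (the "users" table and User model are assumptions for illustration):

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

class AddActivationHashToUsers extends Migration
{
    public function up()
    {
        // Step 1: create the column with the Schema Builder.
        Schema::table('users', function (Blueprint $table) {
            $table->string('activation_hash')->nullable();
        });

        // Step 2: set the value in PHP, one model at a time.
        foreach (User::all() as $user) {
            $user->activation_hash = sha1(microtime() . $user->email_address);
            $user->save();
        }
    }
}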
You can't do it entirely in MySQL (if you are using MySQL), since it doesn't have a microtime() function.
But I found this UDF implementation of microtime() for MySQL:
https://github.com/JohannesMP/MySQL-udf-microtime
If you are not updating a huge dataset, I would stick with the first solution. But if you are updating a huge dataset, the MySQL implementation will be faster.
I would like to get a list of kinds from my Google App Engine Datastore using queries (GQL, maybe?), much the way one would simply show the tables of a database.
I have looked at a similar question (How to list kinds in datastore?), however it does not solve my problem, as it is Python-specific.
I am currently using a GDS library for PHP (https://github.com/tomwalder/php-gds) that helps me fetch data from GDS if I know the entity kind's name, using a "SELECT * FROM Kind" GQL query.
I am currently in a situation where I may not know the name of the entity kind from which I need to fetch data, hence the need to get the list of entity kinds, which I can look through to confirm the kind exists before running my select query.
Any pointers would be greatly appreciated.
Here's an alternate approach based on GQL... This returns a list of the Kinds available:
SELECT * FROM __kind__
In theory, you can then obtain the schema for this object by listing the associated properties for a given Kind (e.g., Person):
SELECT * FROM __property__
WHERE __key__ HAS ANCESTOR KEY(__kind__, 'Person')
If you are using Google's library to issue these queries, then you will have to set the AllowLiterals = true property in your request to avoid getting an exception with an error detail saying Disallowed literal: KEY.
Also, since the property_representation values are overloaded for things like Dates vs. Integers, you can only use the type data as a guess at the underlying type, possibly aided by conventions. As you cursor through the data, you could update the type information. It's curious that there isn't a better way, since Google's Datastore UI provides type information when you create a new instance of an entity.
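A hedged sketch of both metadata queries with Google's official PHP client (google/cloud-datastore; method names per that library, so adapt if you stay on php-gds):

use Google\Cloud\Datastore\DatastoreClient;

$datastore = new DatastoreClient();

// List the kinds: each result's key name is a kind name.
$kinds = $datastore->runQuery($datastore->gqlQuery('SELECT * FROM __kind__'));
foreach ($kinds as $entity) {
    echo $entity->key()->pathEnd()['name'], PHP_EOL;
}

// List the properties of one kind; the KEY literal requires allowLiterals.
$props = $datastore->runQuery($datastore->gqlQuery(
    "SELECT * FROM __property__ WHERE __key__ HAS ANCESTOR KEY(__kind__, 'Person')",
    ['allowLiterals' => true]
));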
You can query them using the metadata objects of ndb (Python):
https://cloud.google.com/appengine/docs/python/ndb/metadata#get_kinds
I was wondering if there is a way to perform a find() and have Mongo automatically return the associated references without having to run getDBRef() once the parent record has been returned.
I don't see it anywhere in the PHP documentation. I can easily support using getDBRef but it doesn't seem as efficient as it could be.
Also... I'm surprised there's no way to select the specific data to return in the linked reference. I may as well just perform another manual find statement so I can control what the return is... but there has to be a more performance-oriented way to do this.
Perhaps I should change my methodology and, instead of using the PHP library classes for find, generate my own JavaScript command and run it using the MongoCode class? Would that work, and if so, what would it look like? scratches head, then heads to The Google
Thanks!
MongoDB does not support joins. Database References (DBRefs) just refer to the practice of a field storing an _id that references another document. There is currently no specific server-side support for this, and hydrating the reference into a document does require another query. Some MongoDB drivers have convenience methods so you don't have to do the find manually. It is equally valid/performant to do your own find() given a DBRef to look up (or to use other criteria to find related documents).
Depending on your use case and data modelling, a more efficient alternative to the DBRef linking could be embedding related data as a subdocument. See the MongoDB wiki info on Schema Design for more examples.
As far as performance goes, it would be better to use PHP queries than MongoCode (JavaScript which needs to be eval'ed on the server). MongoCode is really intended for more limited use such as within Map/Reduce functions. Refer to Server-Side Code Execution for some of the potential limitations with that approach.
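A minimal sketch of "doing your own find()" with the legacy PHP driver the question refers to; the database, collection and field names are illustrative:

$m  = new MongoClient();
$db = $m->selectDB('blog');

$post = $db->posts->findOne(array('title' => 'Hello'));
$ref  = $post['author']; // a DBRef: array('$ref' => 'users', '$id' => ...)

// Equivalent to MongoDBRef::get($db, $ref), but the second argument is a
// projection, so you control which fields the extra query returns.
$author = $db->selectCollection($ref['$ref'])->findOne(
    array('_id' => $ref['$id']),
    array('name' => 1, 'email' => 1)
);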
Refer: http://docs.mongodb.org/manual/reference/database-references/
Manual references where you save the _id field of one document in another document as a reference. Then your application can run a second query to return the related data. These references are simple and sufficient for most use cases.
DBRefs are references from one document to another using the value of the first document’s _id field, collection name, and, optionally, its database name. By including these names, DBRefs allow documents located in multiple collections to be more easily linked with documents from a single collection.
To resolve DBRefs, your application must perform additional queries to return the referenced documents. Many drivers have helper methods that form the query for the DBRef automatically. The drivers do not automatically resolve DBRefs into documents.
So either way, no matter which type of referencing you are using, you need to do the dereferencing yourself.
Hope it helps!