I am writing an app in CakePHP that will perform scheduled searches for users and store the search results in a matches table. My question is: do I really need this matches model in CakePHP to store the results? If the answer is no, how should I store them?
Happy new year.
There are many ways to store data and the one you choose will depend on the data itself and the use to which it will be put (and when it will be used). Because you are doing scheduled searches, I assume that the user may not be around when the search is done, in which case the result needs to be stored.
In this case, I'd use the database. If you need to keep historical results this is definitely the way to go. If the results can be overwritten, you could use a text file per user, but that might get messy.
You don't need to use the main database - you could have another MySQL database, for example, or even a totally different store such as a flat-file DB.
What would I do? I'd use a table in the main database and get on with something else.
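If you do go with a table in the main database, the scheduled run could live in a console shell invoked from cron. A minimal sketch, assuming CakePHP 2.x conventions; the Match/SavedSearch models, their fields, and runSearch() are all hypothetical:

```php
<?php
// app/Console/Command/MatchSearchShell.php
// Run from cron with: Console/cake match_search
App::uses('AppShell', 'Console/Command');

class MatchSearchShell extends AppShell {
    public $uses = array('Match', 'SavedSearch');

    public function main() {
        foreach ($this->SavedSearch->find('all') as $search) {
            $resultIds = $this->runSearch($search['SavedSearch']['criteria']);
            foreach ($resultIds as $resultId) {
                $this->Match->create();
                $this->Match->save(array('Match' => array(
                    'user_id'   => $search['SavedSearch']['user_id'],
                    'search_id' => $search['SavedSearch']['id'],
                    'result_id' => $resultId,
                )));
            }
        }
    }

    protected function runSearch($criteria) {
        // Placeholder: run the actual search and return matching record IDs.
        return array();
    }
}
```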
Use Case:
I'm building a site where users can search records - with SQL. BUT - they should also be able to save their search and be notified when a new submitted record meets the criteria.
It's not a car buying site, but for example: The user searches for a 1967 Ford Mustang with a 289 V8 engine, within the 90291 ZIP code. Can't find the right one, but they want to be notified if a matching car is submitted 2 weeks later.
So of course, every time a new car is added to the DB, I can retrieve all the user search queries, and run all of them over all the cars in the DB. But that is not scalable.
Rather than search the entire "car" table with every "search" query every time a new car is submitted, I would like to just check that single "car" object/array in memory, with the existing user queries.
I'm doing this in PHP with Laravel and Eloquent, but I am implementation agnostic and welcome any theoretical approaches.
Thanks,
Chris
I would rather run the saved searches in batches at scheduled intervals and not run them every time a record is appended to the tables.
It comes down to how you structure your in-memory cache.
Whatever cache it is, it usually relies on key/value pairs, and it will be the same for the cache you are using:
http://laravel.com/docs/4.2/cache
So in the end it is all about using the right key. If you want to update the cached objects based on a car, then you would need to build the key in a way that lets you retrieve all the relevant objects from the cache using the car as (part of) the key. Usually you would concatenate multiple things for the key, like userId+carId+xyz, and then make an MD5 checksum of that.
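A minimal sketch of that composite-key idea, using the Laravel 4.2 Cache facade from the link above; the key parts and the one-hour TTL are just assumptions:

```php
<?php
// Build one deterministic key from several parts, then hash it.
// $userId / $searchId and the 60-minute TTL are hypothetical examples.
$key = md5(implode('|', array($userId, $searchId, 'results')));

// Store the matching car IDs for an hour...
Cache::put($key, $matchingCarIds, 60);

// ...and later rebuild the same key from the same parts to read them back.
$cachedIds = Cache::get($key);
```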
So that would be the answer to your question. However, generally I would not recommend this approach. It sounds like your search results are more like persisted, long-term results. So you would probably want to store them somewhere more permanent, like a simple table. Then you can use standard SQL tools to join against that table and find out what is needed.
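As a sketch of that more permanent route: with the searches persisted in a table, matching one newly submitted car means a single query against the saved criteria rather than re-running every search over the whole cars table. Table and column names below are hypothetical, and NULL means "criterion not set":

```php
<?php
// Match one new car against every saved search in a single query.
$sql = "SELECT user_id
        FROM saved_searches
        WHERE (make IS NULL OR make = :make)
          AND (model IS NULL OR model = :model)
          AND (min_year IS NULL OR min_year <= :year1)
          AND (max_year IS NULL OR max_year >= :year2)
          AND (zip IS NULL OR zip = :zip)";

$stmt = $pdo->prepare($sql);
$stmt->execute(array(
    ':make'  => $car['make'],
    ':model' => $car['model'],
    ':year1' => $car['year'],
    ':year2' => $car['year'],
    ':zip'   => $car['zip'],
));
$usersToNotify = $stmt->fetchAll(PDO::FETCH_COLUMN);
```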
My approach would be to use a MySQL stored procedure together with the event scheduler (https://dev.mysql.com/doc/refman/5.1/en/event-scheduler.html) to review the configs for possible changes, flag them with some kind of dirty indicator, and then have that indicator checked by a PHP script executed on demand or periodically from cron.
You could use a trigger to simply flag that the event scheduler has work to do. However you approach it, there are a number of state variables, which starts to get ugly; still, this use case doesn't seem to map neatly onto a queuing architecture as far as I can see.
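A sketch of the PHP half of that dirty-flag idea; the schema and connection details are hypothetical, and the trigger/event is assumed to have set is_dirty = 1:

```php
<?php
// Run periodically from cron, e.g.: */5 * * * * php process_dirty.php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// Pick up every saved search the trigger/event has flagged as dirty.
$ids = $pdo->query("SELECT id FROM saved_searches WHERE is_dirty = 1")
           ->fetchAll(PDO::FETCH_COLUMN);

foreach ($ids as $id) {
    // ...re-run the saved search here and store/notify matches...

    // Clear the flag so the next run skips it.
    $stmt = $pdo->prepare("UPDATE saved_searches SET is_dirty = 0 WHERE id = ?");
    $stmt->execute(array($id));
}
```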
A possible approach would be to use an SQL trigger to send a notification. Here is something related to it: 1st link or 2nd link.
I'm programming a search engine for my website in PHP, SQL and jQuery. I have experience adding autocomplete based on existing data in the database (i.e. searching article titles). But what if I want to use the most common search queries that users type, similar to what Google has, without having enough users to contribute to the creation of that data (the most common queries)? Is there some kind of open-source SQL table with autocomplete data in it, or something similar?
For now, use the static data that you have for autocomplete.
Create another table in your database to store the actual user queries. The schema of the table can be <queryID, query, count>, where count is incremented each time the same query is submitted by another user (a kind of rank). N-gram-index the queries (so that you can also autocomplete something like "Manchester United" when a person types just "United", i.e. not only matches on the starting string) and simply return the top N after sorting by count.
The above table will gradually keep on improving as your user base grows.
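A minimal sketch of that query table with PDO; the upsert assumes a UNIQUE index on query, and the LIKE '%...%' lookup is a simplification standing in for a real n-gram index:

```php
<?php
// Hypothetical schema: search_queries(queryID PK, query UNIQUE, `count`).
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// Record a search: insert it, or bump its count if it already exists.
$stmt = $pdo->prepare(
    "INSERT INTO search_queries (query, `count`) VALUES (:q, 1)
     ON DUPLICATE KEY UPDATE `count` = `count` + 1"
);
$stmt->execute(array(':q' => $userQuery));

// Suggest: the top 10 stored queries containing the typed fragment.
$stmt = $pdo->prepare(
    "SELECT query FROM search_queries
     WHERE query LIKE :frag ORDER BY `count` DESC LIMIT 10"
);
$stmt->execute(array(':frag' => '%' . $fragment . '%'));
$suggestions = $stmt->fetchAll(PDO::FETCH_COLUMN);
```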
One more thing: the algorithm for accomplishing your task is pretty simple. The real challenge, however, lies in returning the data to be displayed in a fraction of a second. So when your query database/store grows, you can use a search engine like Solr/Sphinx to do the searching for you, which will be pretty fast at returning the results to be rendered.
You can use the Lucene search engine for this functionality. Refer to this link
or you may also take a look at Lucene Solr autocomplete...
Google has (and keeps adding) thousands of entries, arranged according to day, time, geolocation, language, and so on, and the set grows with users' entries. Whenever a user types a word, the system checks the table of "most-used words for that location+day+time" and, if there is no answer, falls back to "general words". So you should categorize every word entered by users, or build a general word-relation table in your database, where the most suitable search answer is referenced.
Yesterday I stumbled on something that answered my question. Google draws autocomplete suggestions from this XML endpoint, so it is wise to use it if you have too few users to build your own keyword database:
http://google.com/complete/search?q=[keyword]&output=toolbar
Just replacing [keyword] with some word will give suggestions about that word; then the task is just to parse the returned XML and format the output to suit your needs.
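A minimal sketch of that fetch-and-parse step; the <CompleteSuggestion>/<suggestion data="..."> structure is what the endpoint returned at the time of writing, so treat it as an assumption and inspect the actual response:

```php
<?php
// Fetch suggestions for one keyword from the toolbar endpoint.
$keyword = urlencode('mustang');
$xml = @file_get_contents(
    'http://google.com/complete/search?q=' . $keyword . '&output=toolbar'
);

$suggestions = array();
if ($xml !== false) {
    $doc = simplexml_load_string($xml);
    foreach ($doc->CompleteSuggestion as $item) {
        // Each suggestion is carried in the "data" attribute.
        $suggestions[] = (string) $item->suggestion['data'];
    }
}
print_r($suggestions);
```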
I have a file that contains info that I need to return according to a form submission.
e.g.
The user submits a name; I look for that name in the file and return the corresponding info to the user.
It sounds silly, but I can't use an SQL database. So I need to use PHP directly, but for high load and large data (~1 million rows) the time taken is too much.
I'd like a solution, and one possibility is to run a PHP script once to sort/hash and save all the data, and then query that sorted data, but I know of no way to do this.
(Obviously I do not want to fire up the interpreter and rebuild/sort the table repeatedly; I need to build it once and then query it repeatedly.) But I have no idea how to achieve this or where to start.
If it's static, I'd suggest breaking it down into numerous smaller files, so you can just search through the one you need - i.e. a file containing all the names starting with 'A', another for 'B', and so on. – andrewsi
As @andrewsi said, the best approach seems to be to break it up into some sort of sub-records based on some logical structure and then query the sub-records.
I still haven't found out how to set up the sorted/hashed data structure and query it repeatedly without firing up the interpreter over and over (although I do realise that would essentially amount to building some sort of elementary DB/DBMS).
So for now the solution is to use a predetermined file structure/organisation.
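A sketch of that predetermined structure, assuming one "name<TAB>info" record per line: a one-off script buckets the big file by first letter, and each request then scans only one small bucket:

```php
<?php
// One-off split: bucket data.txt by the first letter of each name.
$in = fopen('data.txt', 'r');
$handles = array();
while (($line = fgets($in)) !== false) {
    if (trim($line) === '') {
        continue;
    }
    $letter = strtolower($line[0]);
    if (!ctype_alpha($letter)) {
        $letter = '_';   // catch-all bucket for odd first characters
    }
    if (!isset($handles[$letter])) {
        $handles[$letter] = fopen("buckets/$letter.txt", 'a');
    }
    fwrite($handles[$letter], $line);
}
fclose($in);
foreach ($handles as $h) {
    fclose($h);
}

// Per-request lookup: scan only the relevant bucket.
function lookup($name) {
    $letter = strtolower($name[0]);
    if (!ctype_alpha($letter)) {
        $letter = '_';
    }
    $file = "buckets/$letter.txt";
    if (!is_file($file)) {
        return null;
    }
    foreach (file($file) as $line) {
        $parts = explode("\t", rtrim($line, "\r\n"), 2);
        if (count($parts) === 2 && $parts[0] === $name) {
            return $parts[1];
        }
    }
    return null;
}
```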
I think I'm probably looking at this the complete wrong way. I have a stored procedure that returns a (potentially large, but usually not) result set. That set gets put into a table on the web via PHP. I'm going to implement some AJAX for stuff like dynamic reordering and things. The stored procedure takes one to two seconds to run, so it would be nice if I could store that final table somewhere that I can access it faster once it's been run. More specifically, the SP is a search function; so I want the user to be able to do the search, but then run an ORDER BY on the returned data without having to redo the whole search to get that data again.
What comes to mind is if there is a way to get results from the stored procedure without it terminating, so I can use a temp table. I know I could use a permanent table, but then I'd run into trouble if two people were trying to use it at the same time.
A short and simple answer to the question 'is there a way to get results from the stored procedure without it terminating?': no, there isn't. How else would the SP return the result set?
2 seconds does sound like an awfully long time; perhaps you could post the SP code, so we can look at ways to speed up the queries you use. It might also prove useful to give some more info on your tables (indices, primary keys...).
If all else fails, you might consider looking into JavaScript table sorters... but again, some code might help here.
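On the original "store that final table somewhere" question, one server-side sketch is to cache the procedure's rows in the session and serve ORDER BY requests with a PHP sort instead of re-running the two-second search; sp_search and the request parameters here are hypothetical:

```php
<?php
session_start();

// Run the (slow) stored procedure only once per session/search.
if (!isset($_SESSION['search_results'])) {
    // $pdo is an existing PDO connection; sp_search is a placeholder name.
    $stmt = $pdo->prepare("CALL sp_search(:term)");
    $stmt->execute(array(':term' => $_GET['term']));
    $_SESSION['search_results'] = $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Re-sorting is now a cheap in-memory operation.
$rows = $_SESSION['search_results'];
$col  = $_GET['sort'];   // validate against a column whitelist in real code
usort($rows, function ($a, $b) use ($col) {
    return strcmp($a[$col], $b[$col]);
});
```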
Before anything, this is not necessarily a question.. But I really want to know your opinion about the performance and possible problems of this "mode" of search.
I need to create a really complex search on multiple tables with lots of filters, ranges and rules... And I realize that I can create something like this:
Submit the search form
Internally I run every filter and logic step-by-step (this may take some seconds)
After I find all the matching records (the result that I want) I create a record in my searches table, generating a token for this search (based on the search params) like 86f7e437faa5, and save all the matching records' IDs
Redirect the visitor to a page like mysite.com/search?token=86f7e437faa5
And, on the results page, I only need to work out which search I'm talking about and paginate the result IDs (retrieved from the searches table).
This will make the refresh & pagination much faster, since I don't need to run all the search logic on every pageview. And if the user changes a filter or search criterion, I go back to step 2 and generate a new search token.
I never saw a tutorial or anything about this, but I think that's what some forums like BBForum or Invision do with search, right? After the search I'm redirected to something like search.php?id=1231 (I don't see the search params in the URL or inside the POST args).
This "token" will no last longer than 30min~1h.. So the "static search" is just for performance reasons.
What do you think about this? Will it work? Any considerations? :)
Your system may have a special token like 86f7e437faa5 and cache search requests. It's a very useful mechanism for system efficiency and scalability.
But the user must see all parameters, in accordance with usability principles.
So generating a hash of the parameters on the fly, server-side, would be a good solution. The system checks for the existence of the generated hash in the searches table and returns the result if found.
If there is no hash, the system runs the query against the base tables and saves the new result into the searches table.
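A minimal sketch of that flow; runFullSearch() and the searches table layout are hypothetical:

```php
<?php
// Same params (in any order) => same token.
ksort($params);
$token = md5(serialize($params));

// Reuse a cached result set if one exists for this token.
$stmt = $pdo->prepare("SELECT result_ids FROM searches WHERE token = :t");
$stmt->execute(array(':t' => $token));
$ids = $stmt->fetchColumn();

if ($ids === false) {
    // Cache miss: run the slow, multi-table search and store the IDs.
    $ids = implode(',', runFullSearch($params));   // placeholder function
    $stmt = $pdo->prepare(
        "INSERT INTO searches (token, result_ids, created_at)
         VALUES (:t, :ids, NOW())"
    );
    $stmt->execute(array(':t' => $token, ':ids' => $ids));
}

// Hand the visitor a short, shareable results URL.
header('Location: /search?token=' . $token);
```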
Seems logical enough to me.
Having said that, given the description of your application, have you considered using Sphinx? Regardless of the number of tables and/or filters and/or rules, all that time-consuming work is in the indexing, and is done beforehand/behind the scenes. The filtering/rules/fields/tables are all handled quickly and on the fly after the fact.
So, similar to your situation, Sphinx could give you your set of ID's very quickly, since all the hard work was pre-done.
TiuTalk,
Are you considering keeping searches saved in your "searches" table? If so, remember that your param-based generated token will remain the same for a given set of parameters, persisting over time. If your search base is frequently altered, you can't rely on saved searches, as they may return outdated results. Otherwise, it seems a good solution overall.
I'd rather base the token on the user session. What do you think?
@g0nc1n
Sphinx seems to be a nice solution if you have control of your server (in a VPS for example).
If you don't, and a simple full-text search isn't enough for you, I guess this is a nice solution. But it doesn't seem so different to me from a paginated search with caching; it does seem better than a paginated search with simple URL-keyed caching. You still have the problem of the searches remaining static, though. I recommend you flush the saved searches from time to time.
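That periodic flush can be a one-line cleanup run from cron; the searches schema matches the hypothetical sketch above:

```php
<?php
// Run from cron, e.g. hourly: drop cached searches older than an hour.
$pdo->exec("DELETE FROM searches WHERE created_at < NOW() - INTERVAL 1 HOUR");
```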