Jobs site search engine doubt - Sphinx is the solution? - php

I'm developing a web app to manage jobs, CVs, and so on.
For example, in my case I have a CV table that holds some information about... and some fields in that table are references to other tables (kind of company, kind of job sought, education, languages the person knows... an ordinary CV model).
My question is: is Sphinx a good search engine for this? I need to search for things like: a person who has X years of experience in area YYY and has completed grade XXX...
I don't know job websites outside Brazil, but I guess it's an ordinary job/CV search.
Can Sphinx be applied for this purpose? Or is building each query by hand the better choice, since I have one or more select-box filters?
Many thanks to all!
Roberto

I'd say that yes, you could use Sphinx for this kind of search (and it would surely be very fast), but the kind of fields you want to search on are really better served directly within the database, assuming you have good indexes on the tables.
The real strength of Sphinx lies in full-text search, which you don't indicate you'll need. If you do find you need to index the full content of the CVs provided, then Sphinx starts to look more appropriate.
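To make the "serve it from the database" option concrete, here is a minimal sketch of building one query from whichever select-box filters the user filled in. The table and column names (`cv`, `years_experience`, `area_id`, `education_id`) are hypothetical, and SQLite stands in for the real database only so that the example runs on its own.

```python
import sqlite3

def search_cvs(conn, min_years=None, area_id=None, education_id=None):
    """Build a WHERE clause from whichever filters were actually supplied."""
    clauses, params = [], []
    if min_years is not None:
        clauses.append("years_experience >= ?")
        params.append(min_years)
    if area_id is not None:
        clauses.append("area_id = ?")
        params.append(area_id)
    if education_id is not None:
        clauses.append("education_id = ?")
        params.append(education_id)
    sql = "SELECT id, name FROM cv"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY id"
    # Values are always bound as parameters, never pasted into the SQL text.
    return conn.execute(sql, params).fetchall()

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cv (
        id INTEGER PRIMARY KEY,
        name TEXT,
        years_experience INTEGER,
        area_id INTEGER,
        education_id INTEGER
    );
    CREATE INDEX idx_cv_filters ON cv (area_id, education_id, years_experience);
    INSERT INTO cv VALUES
        (1, 'Ana', 5, 10, 3),
        (2, 'Bruno', 2, 10, 3),
        (3, 'Carla', 7, 20, 2);
""")

results = search_cvs(conn, min_years=3, area_id=10)
print(results)
```

Each filter that is left empty simply drops out of the WHERE clause, so one function covers every combination of select boxes.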

Related

Building a fast semantic MySQL search engine for private articles from scratch

I am working on a project that will involve full-text and semantic searches of articles within the site (if it's not possible to combine it, the user can select either option). These articles are subscription-based and can only be searched after logging in; so they are not accessible to external search engines or their APIs.
I read about Sphinx for full text keywords searches (and I intend to implement it for that aspect) but I am not sure how to go about building a semantic search engine out of this. e.g. Searching for "U.S. President" should list articles that contain references to the actual names of the U.S. presidents e.g. George Washington, Bill Clinton (or William Jefferson Clinton).
I have ideas that maybe a sort of tagging system can be used to relate various keywords e.g. relate President to George Washington and President to Bill Clinton, but since the data is really huge and many such relations will exist I don't know how to further this idea.
Please advise me on how to go about building a semantic search engine from scratch (I guess Sphinx can handle the full-text keyword search). Otherwise, please point me to any internet-based resources, or to existing software in any language that I can integrate into my application.
P.S. My database of choice is MySQL (please advise if another database system is more suitable for the task), and I prefer to program in PHP, but if I need to learn Python or any other language that is more effective for this task, I am willing to.
I already searched at answers.semanticweb.com
I would use Apache Solr. I think it's more flexible than Sphinx. Solr supports full-text search, and I believe it has add-ons for semantic support (like siren). Solr is the serverized version of Lucene.
Solr supports a SynonymFilter: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#SynonymFilter
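The idea behind a synonym filter can be sketched in a few lines. This is not Solr's API, just a plain illustration of query-time expansion with a hand-made synonym map; the map contents, the sample articles, and the function names are all made up for the example.

```python
# Hypothetical synonym map: a general phrase expands to specific names.
SYNONYMS = {
    "u.s. president": [
        "george washington",
        "bill clinton",
        "william jefferson clinton",
    ],
}

def expand_query(query):
    """Return the normalized query plus any phrases mapped to it."""
    q = query.strip().lower()
    return [q] + SYNONYMS.get(q, [])

def search(articles, query):
    """Return titles of articles whose body contains any expanded term."""
    terms = expand_query(query)
    return [
        title
        for title, body in articles
        if any(term in body.lower() for term in terms)
    ]

articles = [
    ("Early republic", "George Washington took office in 1789."),
    ("Gardening tips", "Plant tomatoes after the last frost."),
    ("The nineties", "Bill Clinton served two terms."),
]

matches = search(articles, "U.S. President")
print(matches)
```

A real engine would apply the expansion during analysis and use an index rather than scanning every body, but the mapping from a general term to specific ones is the same.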
This post discusses some strategies for optimizing content retrieval http://www.lucidimagination.com/devzone/technical-articles/optimizing-findability-lucene-and-solr
This book may be useful for someone reading this thread. I just found it on Amazon.
http://www.amazon.com/E-Librarian-Service-User-Friendly-Libraries-X-media-publishing/dp/3642177425

Similar and semantic search

I have a few issues with semantic web search. I'm building an application in PHP/MySQL that will work as a "semantic" search engine. In general this problem is really hard, but my situation is a little easier: I only need to search data on my own website, and only data that I add to the database.
The idea is that when someone searches for food, the system returns not only food documents but also documents that contain the word pizza, because pizza is a food. My website will be really specific, so it should be possible to model all these relations (at least I think so), though I expect the model won't cover everything. The first problem is that I don't know how to store these relations in the database: they are N:M relations, and the storage has to be really flexible because it will be used for every search on the website. It will be tree-like, from most abstract to most specific, for example food -> pizza -> margherita but also food -> vegetarian -> margherita. My idea is to use triples, as in the semantic web, and save all relations as reasoned triples.
The next problem is user input. Let's say users can add "tags" to their documents, and my app should connect them to my triples. So if the user enters pizza, my app should first suggest all known pizzas, and if he chooses margherita, his document is connected to pizza margherita; but if he adds some unknown pizza, my app connects his document to pizza only (the higher abstraction).
Later, every search query would look for the best match in my triples model and then find the related documents. Is that a good idea?
My question is really general: how should I design this application, and what should the first idea or first push be?
Thank you for any ideas on how to solve this problem.
One quick way would be to keep phrases like
"Food pizza margherita" and "Food pizza something" somewhere, connected to a category id and/or a set of documents, so you could perform a full-text, morphology-enabled search for related categories/documents and show the upper/lower categories.
This type of query could be done using stock MySQL full-text search http://dev.mysql.com/doc/refman/5.1/en/fulltext-boolean.html or external full-text search engines like Lucene http://lucene.apache.org/ or Sphinx http://sphinxsearch.com
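A rough sketch of the category-phrase idea follows, assuming hypothetical `category` and `document` tables. A plain LIKE match on the stored path stands in for a real full-text index, and SQLite is used only so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each category stores its full path as a phrase.
    CREATE TABLE category (id INTEGER PRIMARY KEY, path TEXT);
    CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT, category_id INTEGER);
    INSERT INTO category VALUES
        (1, 'food'),
        (2, 'food pizza'),
        (3, 'food pizza margherita'),
        (4, 'food vegetarian margherita');
    INSERT INTO document VALUES
        (1, 'Best margherita in town', 3),
        (2, 'Pizza oven basics', 2),
        (3, 'Eating well', 1);
""")

def docs_for_term(conn, term):
    """Return titles of documents filed under any category whose path has the word."""
    # Padding with spaces on both sides gives a whole-word match on the path.
    # Assumption: the term contains no LIKE wildcards (% or _).
    pattern = "% " + term.lower() + " %"
    rows = conn.execute(
        """
        SELECT d.title
        FROM document AS d
        JOIN category AS c ON c.id = d.category_id
        WHERE ' ' || c.path || ' ' LIKE ?
        ORDER BY d.id
        """,
        (pattern,),
    ).fetchall()
    return [title for (title,) in rows]

pizza_docs = docs_for_term(conn, "pizza")
print(pizza_docs)
```

Searching for the abstract word "food" returns everything below it, while "margherita" returns only documents filed under a margherita path, whichever branch it hangs from.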

PostgreSQL, PHP - How to design a database search?

I need to design a search form and the code behind it.
I'm not very familiar with searches.
My table looks like this:
- Table_ads
site_name
ad_type
uri
pagetitle
name_ad
general_description_ad
age
country_ad
location_ad
zone_ad
Initially my idea was to do a Google-like search, with a single text box and a search button, but I think this will be difficult. The other option is to build a search by fields (a traditional search).
What do you think about this? What type of search should I build?
Best Regards,
PS: Sorry for my English.
For a "Google-like" search it's best to use a full-text search (FTS) solution.
PostgreSQL 8.3 and newer has a built-in FTS engine, and it will let you do all querying in SQL. Example:
SELECT uri FROM ads WHERE fts @@ plainto_tsquery('cereal');
See documentation -> http://www.postgresql.org/docs/current/static/textsearch.html and come back if you have more questions :-)
However, in-database FTS is several times slower than a dedicated FTS engine.
So if you need better performance, you will have to build an index outside of the database.
Here I would recommend Sphinx -> http://sphinxsearch.com/docs/current.html. Sphinx integrates smoothly with PHP. You feed it documents (preferably in the form of a special XML docset) and update the index on demand or with a scheduler. Then you search directly from PHP (without touching the database).
HTH.

How do I do a search on my website

I want to add a search feature to a website that would allow users to search the whole site.
The site has around 20 tables and I want the search to go through all 20 of them.
Can anyone point me to the sort of MySQL queries I need to build?
First of all, what about adding custom Google websearch to your site?
The hard way: You should probably run a query for each of your tables (with LIKE on text columns, or full-text indexing if your database software supports it) and LIMIT the result to X (e.g. ten) rows. In your code, somehow rate these results and display the X best ones.
You could also try to use a UNION of multiple queries, but then the resulting tuples all have to have the same structure (if I remember correctly).
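Here is a small sketch of the UNION variant, using two hypothetical tables (`articles` and `products`) instead of twenty. Each branch is projected onto the same three columns so that the structures line up; SQLite is used only to keep the example runnable on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, description TEXT);
    INSERT INTO articles VALUES (1, 'Coffee guide', 'How to brew coffee at home');
    INSERT INTO products VALUES
        (1, 'Coffee grinder', 'Burr grinder'),
        (2, 'Tea pot', 'Cast iron');
""")

def site_search(conn, term, limit=10):
    """Search several tables at once by giving every branch the same columns."""
    pattern = f"%{term}%"
    sql = """
        SELECT 'articles' AS source, id, title AS label FROM articles
        WHERE title LIKE ? OR body LIKE ?
        UNION ALL
        SELECT 'products' AS source, id, name AS label FROM products
        WHERE name LIKE ? OR description LIKE ?
        ORDER BY source, id
        LIMIT ?
    """
    # Four LIKE placeholders (two per branch) plus the final LIMIT.
    return conn.execute(sql, (pattern,) * 4 + (limit,)).fetchall()

hits = site_search(conn, "coffee")
print(hits)
```

The `source` column tells the calling code which table each row came from, so it can build the right link for every result.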
Search engines. My Comp Sci degree thesis. First of all you have to ask yourself what type of search you want to offer the user. If the user clearly knows what they are looking for, for example on a product-based website, then you should provide a search engine based on metadata. For example, users will be searching for a specific product, or product type. This is generally quite easy to provide.
The next type is your familiar web search engine such as Google. Google targets a completely different market. The typical user doesn't know exactly what they are looking for. They just know that they are looking for something to do with aeroplanes, for example. Now Google has to try to figure out which result is most likely to match that and be the most relevant.
I know Google has an incredibly complex and optimised system, but from memory, if you want to go this way you need to create something called an inverted index file. Then you need to start thinking about a thesaurus, because if the user types in cat you should also provide results that contain the word feline. Also word trees, because if the user typed in cat, results containing cats will also be relevant.
I am pretty sure that if you are providing a search engine for your website then it will most likely be a metadata search engine, in which case you can roll your own solution. If not, and you are looking for the second type, then why not use Google's services? They provide a custom search that will work within your own website.
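For anyone curious what an inverted index with a thesaurus looks like in practice, here is a toy sketch. The thesaurus contents, the sample documents, and the crude plural-stripping `normalize` function (a stand-in for proper stemming) are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical thesaurus: a word maps to related words that should also match.
THESAURUS = {"cat": {"feline"}}

def tokenize(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def normalize(word):
    """Very crude stand-in for stemming: strip a trailing plural 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def build_index(docs):
    """Map each normalized word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[normalize(word)].add(doc_id)
    return index

def lookup(index, query):
    """Return sorted ids of documents matching any query word or its synonyms."""
    results = set()
    for word in tokenize(query):
        base = normalize(word)
        for term in {base} | THESAURUS.get(base, set()):
            results |= index.get(term, set())
    return sorted(results)

docs = {
    1: "Cats sleep most of the day",
    2: "A feline needs regular checkups",
    3: "Dogs enjoy long walks",
}
index = build_index(docs)
found = lookup(index, "cat")
print(found)
```

Document 1 matches through the plural handling and document 2 through the thesaurus; ranking the hits by relevance would be the next step and is left out here.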
Use Sphinx or if you're using ZF — Lucene.
1.: Set a FULLTEXT index on the fields with the content and use the fulltext search mysql provides: http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
or
2.: Have a look at the lucene search the Zend Framework provides: http://framework.zend.com/manual/en/zend.search.lucene.html
Have you tried looking at Lucene? It's one of the best search modules available today. I would strongly suggest you give it a shot.

How would I implement a simple site search with php and mySQL?

I'm creating a site that allows users to submit quotes. How would I go about creating a (relatively simple?) search that returns the most relevant quotes?
For example, if the search term was "turkey" then I'd return quotes where the word "turkey" appears twice before quotes where it only appears once.
(I would add a few other rules to help filter out irrelevant results, but my main concern is that.)
Everyone is suggesting MySQL fulltext search, however you should be aware of a HUGE caveat. In MySQL versions before 5.6, the fulltext search engine is only available for the MyISAM engine (not InnoDB, which is the most commonly used engine due to its referential integrity and ACID compliance).
So you have a few options:
1. The simplest approach is outlined by Particle Tree. You can actually get ranked searches out of pure SQL (no fulltext, no nothing). The SQL query below will search a table and rank results by the number of occurrences of a string in the search fields:
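SELECT
    p.id,
    -- Removing every occurrence shortens the text by (count * term length),
    -- so dividing by the length (4 for 'term', 6 for 'search') gives the count.
    SUM(((LENGTH(p.body) - LENGTH(REPLACE(p.body, 'term', ''))) / 4) +
        ((LENGTH(p.body) - LENGTH(REPLACE(p.body, 'search', ''))) / 6))
        AS Occurrences
FROM
    posts AS p
GROUP BY
    p.id
ORDER BY
    Occurrences DESC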
(I edited their example to provide a bit more clarity.)
Variations on the above SQL query, such as adding WHERE clauses (WHERE p.body LIKE '%whatever%you%want'), will probably get you exactly what you need.
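The occurrence-counting trick above can be tried out with a single search term passed as a parameter. This is a sketch against a hypothetical `quotes` table, using SQLite so it runs on its own; it lowercases the text so that "Turkey" and "turkey" both count, and it assumes the term is non-empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE quotes (id INTEGER PRIMARY KEY, body TEXT);
    INSERT INTO quotes VALUES
        (1, 'Turkey for dinner'),
        (2, 'Cold turkey is still turkey'),
        (3, 'No birds here');
""")

def rank_by_occurrences(conn, term):
    """Rank rows by how many times the term appears in the body."""
    term = term.lower()  # assumption: term is not an empty string
    sql = """
        SELECT id,
               (LENGTH(body) - LENGTH(REPLACE(LOWER(body), ?, ''))) / LENGTH(?)
                   AS occurrences
        FROM quotes
        WHERE LOWER(body) LIKE '%' || ? || '%'
        ORDER BY occurrences DESC, id
    """
    return conn.execute(sql, (term, term, term)).fetchall()

ranked = rank_by_occurrences(conn, "turkey")
print(ranked)
```

The quote that mentions the word twice comes out ahead of the one that mentions it once, and rows without the word are filtered out by the WHERE clause.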
2. You can alter your database schema to support full text. What is often done to keep InnoDB's referential integrity, ACID compliance, and speed without having to install plugins like Sphinx Fulltext Search Engine for MySQL is to split the quote data into its own table. Basically you would have a table Quotes that is an InnoDB table and that, rather than having your TEXT field "data", has a reference "quote_data_id" pointing to the ID of a row in a Quote_Data table, which is a MyISAM table. You can do your fulltext search on the MyISAM table, join the IDs returned with your InnoDB tables, and voila, you have your results.
3. Install Sphinx. Good luck with this one.
Given what you described, I would HIGHLY recommend you take the 1st approach I presented, since you have a simple database-driven site. The 1st solution is simple and gets the job done quickly. Lucene will be a bitch to set up, especially if you want to integrate it with the database, as Lucene is designed mainly to index files, not databases. Google custom site search just makes your site lose tons of reputation (makes you look amateurish and hacked), and MySQL fulltext will most likely force you to alter your database schema.
Use Google Custom Site Search. I've heard they know a thing or two about searching.
Stackoverflow plans to use the Lucene search engine. There is a PHP port of this written for the Zend Framework but can be downloaded as a separate entity without needing all the ZF bloat. This is called Zend_Search_Lucene, documentation for which can be found here.
Your SQL for that will look something like this (where you're trying to find quotes with 'turkey' in them):
SELECT * FROM Quotes
WHERE the_quote LIKE '%turkey%';
From there you can figure out what to do with whatever it spits out at you.
Be careful to properly handle cases where a malicious user might inject malicious SQL into your database, especially if you're planning on putting this on the www. If you're doing this for fun though, I guess it's just about what you want to learn.
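One common way to handle the injection problem is to bind the user's input as a parameter instead of pasting it into the SQL string. Below is a minimal sketch using Python's built-in sqlite3 module; the `Quotes` table and `the_quote` column follow the query above, while the helper name and sample rows are made up.

```python
import sqlite3

def find_quotes(conn, user_input):
    """Search quotes with the user's text bound as a parameter."""
    # Escape LIKE wildcards so that % and _ in the input match literally.
    escaped = (
        user_input.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    sql = (
        "SELECT id, the_quote FROM Quotes "
        "WHERE the_quote LIKE ? ESCAPE '\\' ORDER BY id"
    )
    # The driver sends the value separately from the SQL text,
    # so it can never be executed as SQL.
    return conn.execute(sql, (f"%{escaped}%",)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Quotes (id INTEGER PRIMARY KEY, the_quote TEXT);
    INSERT INTO Quotes VALUES (1, 'Turkey day'), (2, 'Hello world');
""")

safe = find_quotes(conn, "turkey")
attack = find_quotes(conn, "'; DROP TABLE Quotes; --")
remaining = conn.execute("SELECT COUNT(*) FROM Quotes").fetchone()[0]
print(safe, attack, remaining)
```

The hostile input is treated as an ordinary search string that matches nothing, and the table is still there afterwards. PHP's PDO and mysqli offer the same idea through prepared statements.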
If you're new to databases and sql, I recommend sqlite over mysql. Much easier to set up and work with, as in no set up. It'll get you around the potential headaches of having to install and set up mysql for the first time.
I'd go with Full Text Search, look at it here: http://hockinson.com/fulltext-search-of-mysql-database-table.html
If you want to write your own, take a look at phpBB's implementation. They have two tables, the first is a unique list of all the words that appear in entries, and the second is a many-to-many reference between the words and the entries. You could then do a group and count to sort the entries in the manner you're looking for.
It's a lot more work than implementing a third-party search engine (or full text search), but it will allow you greater control over the results.
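The two-table word index can be sketched as follows. The table names (`words`, `word_matches`, `entries`) are hypothetical, and storing one row per occurrence is an assumption of this sketch so that COUNT reflects word frequency; it is not necessarily how phpBB itself stores the data.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entries (id INTEGER PRIMARY KEY, body TEXT);
    -- Unique list of every word seen in any entry.
    CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT UNIQUE);
    -- Many-to-many link between words and entries, one row per occurrence.
    CREATE TABLE word_matches (word_id INTEGER, entry_id INTEGER);
""")

def add_entry(conn, body):
    """Store an entry and index each of its words."""
    entry_id = conn.execute(
        "INSERT INTO entries (body) VALUES (?)", (body,)
    ).lastrowid
    for word in re.findall(r"[a-z]+", body.lower()):
        conn.execute("INSERT OR IGNORE INTO words (word) VALUES (?)", (word,))
        word_id = conn.execute(
            "SELECT id FROM words WHERE word = ?", (word,)
        ).fetchone()[0]
        conn.execute("INSERT INTO word_matches VALUES (?, ?)", (word_id, entry_id))
    return entry_id

def search(conn, term):
    """Group and count matches so the most frequent mentions come first."""
    sql = """
        SELECT m.entry_id, COUNT(*) AS hits
        FROM word_matches AS m
        JOIN words AS w ON w.id = m.word_id
        WHERE w.word = ?
        GROUP BY m.entry_id
        ORDER BY hits DESC, m.entry_id
    """
    return conn.execute(sql, (term.lower(),)).fetchall()

add_entry(conn, "Turkey for dinner")
add_entry(conn, "Cold turkey is still turkey")
top = search(conn, "turkey")
print(top)
```

Because the lookup is an exact match on an indexed word column, it avoids scanning every entry with LIKE, at the cost of maintaining the word tables on every insert.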
As an alternative to Sphinx and Lucene, a relatively simple search engine can be created using the Xapian library.
+ Supports many advanced search features (such as relevancy ranking)
+ Fast
- You would need to learn the API to create your interface
- Requires a php extension to be installed
Note also that Xapian stores its data in a separate index to mysql.
You might also be interested in Forage which is a wrapper for Solr, Xapian and Lucene.
The Xapian people also created the Omega search engine which is a frontend to Xapian, and can be called via cgi.
Here's a much simpler and easier to operate open source alternative to Solr / Lucene:
http://github.com/typesense/typesense
Google Custom Site Search is great, if you don't query it much (I think you get 1k queries/ day for free) or if you're willing to pay.
MySQL's fulltext search is also a great resource (as has been mentioned previously).
Yahoo's BOSS is an intriguing project -- I'm going to give it a shot during my next search project.
And, finally, Lucene is a great resource if you need more power than fulltext, but want to tweak your own search engine. http://lucene.apache.org
I came across the Zoom Search Engine a few days ago and think this might be the simplest search engine I have ever used.
The Windows-based tool creates a database of the site, then asks which language (PHP, ASP.NET, JavaScript, etc.) you want to use. I picked PHP and it built the PHP code for me. All I had to do then was upload the files to the server and (optionally) customize the template, and site search was working.
This is free to small sites, and the only con I can find is that the spider tool (database builder) has to run on Windows.
