PHP script to find synonyms

I'm writing a PHP script to compare the similarity of two strings. This works pretty well at the moment, but I would also like to match words when one is a synonym of the other.
Any thoughts?

You might want to try looking for a thesaurus service that allows you to query the synonyms for a word and have it return an XML list of synonyms.
Here is something to look at: http://nbii-thesaurus.ornl.gov/thesaurus/
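Whatever service supplies the synonyms, one minimal way to fold them into the string comparison is to map every word in a synonym group to a single representative word before comparing. The sketch below assumes the synonyms have already been fetched into lists of words (the groups in the example are made up). It uses Jaccard word-set similarity as a stand-in for whatever measure the existing script uses. It is shown in Python for brevity, and the logic ports directly to PHP.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def build_canonical_map(synonym_groups):
    """Map every word in a synonym group to the group's first word.

    If a word appears in several groups, the first group wins.
    """
    canonical = {}
    for group in synonym_groups:
        head = group[0].lower()
        for word in group:
            canonical.setdefault(word.lower(), head)
    return canonical


def synonym_similarity(a, b, synonym_groups):
    """Jaccard similarity of two strings after collapsing synonyms."""
    canonical = build_canonical_map(synonym_groups)
    words_a = {canonical.get(w, w) for w in tokenize(a)}
    words_b = {canonical.get(w, w) for w in tokenize(b)}
    if not words_a and not words_b:
        return 1.0  # two empty strings count as identical
    return len(words_a & words_b) / len(words_a | words_b)
```

With the groups `[["big", "large", "huge"], ["car", "automobile"]]`, "a big car" and "a large automobile" both reduce to the word set {a, big, car} and therefore score 1.0.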

I don't know whether this will help, but some time ago I worked on a PHP (CodeIgniter) library for Google Search that gets related terms by using the ~ operator in searches.
Maybe you can dig into its source code: codeigniter-googlesearch-api.
Strictly speaking, these terms aren't synonyms, but depending on the application you have in mind they could be useful (for example, for SEO purposes).
As a side note, if you search Google for ~term, it will show the related terms in bold. Try it with ~investment, for example.

Related

Generate keywords for contents through Solr

I'm integrating Solr into my new PHP application.
As I'm new to Solr, I'd like to know whether it is possible to generate some useful tags for every content page through Solr, something like an auto-tagging mechanism.
Thanks in advance.
P.S. My content is available in both Persian and English.
"something like auto-tagging mechanism"
Yes, you can build something like that.
There are two different ways to do this:
1.) Use Solr's Clustering Component to build groups of documents and let Solr label those groups. The labels are something like the tags you are looking for.
2.) Implement tagging using the MLT (MoreLikeThis) feature.
I started an auto-tagging project with method 1.) and had moderate success. Finding labels for a cluster of documents is a hard problem.
Fortunately, I already had some tagged documents. If you also have some documents with valid tags, you can use method 2.) and treat those documents as a base to start learning from:
Take a document without tags and perform an MLT search against the documents with tags. Take the tags from the documents you found and count them. Depending on the count, apply one or more tags to the untagged document. In my case, that works very well. Method 2.) is a cheap implementation of machine learning, but you will get 95% of the success with only 5% of the work.
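The counting step of method 2.) could look roughly like this. The sketch assumes the MLT query has already been run for the untagged document and the tag lists of the returned documents have been collected. The Solr request itself is not shown. `suggest_tags`, `min_votes` and `max_tags` are illustrative names and thresholds, not part of Solr. It is shown in Python for brevity.

```python
from collections import Counter


def suggest_tags(neighbour_tags, min_votes=2, max_tags=3):
    """Vote on tags for an untagged document.

    neighbour_tags holds one tag list per already-tagged document that the
    MLT query returned as similar to the untagged document.
    """
    # set() so a document that repeats a tag still casts only one vote
    votes = Counter(tag for tags in neighbour_tags for tag in set(tags))
    # Most votes first; ties broken alphabetically so the result is stable
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, count in ranked if count >= min_votes][:max_tags]
```

For neighbours tagged `["php", "solr"]`, `["solr", "search"]`, `["solr", "php"]` and `["mysql"]`, the default thresholds suggest `["solr", "php"]`. Raising `min_votes` trades coverage for precision.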
As it's a PHP application, if you're happy to generate tags in PHP and then insert/update them in Solr, here are a few options:
If using a web service is OK, check out Yahoo's Term Extractor.
If you can (or want to) host a term extraction service yourself (maybe on a local server), check out FiveFilters.
Here is a PHP function for extracting valuable words from a block of text. It's surely not as effective as Yahoo's Term Extractor, but it may work for you.
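The linked PHP function is not reproduced here. As a separate illustration of the general idea, one very simple approach is to rank words by frequency after dropping stop words and short tokens. The tiny English stop-word list is an assumption, and Persian text would need its own list. Python's `\w` is Unicode-aware, so Persian words are still tokenized.

```python
import re
from collections import Counter

# Illustrative English stop words only; a real list is much longer.
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
              "in", "is", "it", "of", "on", "or", "that", "the", "this",
              "to", "with"}


def extract_keywords(text, top_n=5, min_length=3):
    """Return the top_n most frequent non-stop-words in text."""
    words = re.findall(r"\w+", text.lower())
    candidates = [w for w in words
                  if len(w) >= min_length and w not in STOP_WORDS
                  and not w.isdigit()]
    counts = Counter(candidates)
    # Highest count first; ties broken alphabetically
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]
```

The resulting words could then be written to a tags field when the document is added or updated in Solr.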

Search Algorithm for tags and contents

I'm designing a tag system and I'm looking for a good search algorithm. It must consider both tags and text content, ideally with the option of giving more weight to tags or to content according to my needs. Is there anything similar in the literature? It's my first time working on such a system, so simple, popular solutions could fit too.
Thank you for your time.
It would be possible to implement this within MySQL, but I think it would be worth looking at dedicated full-text search engines for what you're trying to achieve. Most of them handle tags (usually referred to as attributes), as this is a common use case.
I'd recommend looking at the following:
Sphinx Search
Elasticsearch
Solr
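To make "more importance to tags or to content" concrete, here is a minimal weighted-scoring sketch. A query term that matches a tag adds `tag_weight`, and each occurrence of the term in the content adds `content_weight`. The item layout (`id`, `tags`, `content`) and the default weights are assumptions for illustration. The engines listed above offer the same idea as per-field weights or boosts, which is where this would live in practice.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase text and split it into words."""
    return re.findall(r"\w+", text.lower())


def score(query, item, tag_weight=2.0, content_weight=1.0):
    """Weighted relevance of one item for a query.

    Every query term that matches one of the item's tags adds tag_weight;
    every occurrence of a query term in the content adds content_weight.
    """
    tags = {tag.lower() for tag in item["tags"]}
    content_counts = Counter(tokenize(item["content"]))
    total = 0.0
    for term in tokenize(query):
        if term in tags:
            total += tag_weight
        total += content_weight * content_counts[term]
    return total


def search(query, items, tag_weight=2.0, content_weight=1.0):
    """Return ids of matching items, best first (ties broken by id)."""
    scored = [(score(query, item, tag_weight, content_weight), item["id"])
              for item in items]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [item_id for points, item_id in ranked if points > 0]
```

Adjusting the two weights changes the ranking. An item tagged "php" beats one that mentions "PHP" twice in its text when `tag_weight=3.0`, and loses when `tag_weight=1.0`. In Solr, for example, the edismax `qf` parameter expresses this as field boosts such as `tags^2 content`.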

Is there a tool to get all derivatives of a word in PHP?

I need to input "face" and get "facial, faces, faced, facing, facer, faceable" etc.
I've come across some ineffective programs that do the opposite, such as Snowball, and a couple of Porter stemming PHP scripts that don't seem to work.
I'm beginning to think I may have to write this script myself, but I thought I'd check whether somebody has already been there and done that.
It will be very hard to find an algorithm that produces all the different forms of a word like that.
Instead, you can use a dictionary web service that already has all the words available.
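If you do end up writing it yourself, one rough approach combines both ideas. First generate candidate forms with a handful of suffix rules, then keep only the candidates that a dictionary confirms. The suffix list and the `derivatives` helper are illustrative assumptions, and `dictionary` stands for whatever lowercase word list you load, such as a spell-checker word list. Irregular forms ("ran" for "run") will be missed.

```python
# Illustrative suffix list; far from complete.
SUFFIXES = ["s", "es", "d", "ed", "ing", "r", "er", "ial", "able",
            "ly", "ment", "ness"]


def derivatives(word, dictionary):
    """Return dictionary words formed by adding a known suffix to word."""
    word = word.lower()
    stems = {word}
    if word.endswith("e"):
        # "face" -> "fac" so that "facing" and "facial" can be formed
        stems.add(word[:-1])
    candidates = {stem + suffix for stem in stems for suffix in SUFFIXES}
    candidates.discard(word)
    # Only keep candidates that are real words
    return sorted(candidates & dictionary)
```

Given a word list containing "faces", "faced", "facing", "facer", "faceable", "facial" and "facet", `derivatives("face", ...)` returns the first six. "facet" is not returned because "t" is not in the suffix list.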

Thesaurus class or API for PHP [edited]

TL;DR Summary: I need a single command-line application which I can use to get synonyms and other related words. It needs to be multilingual and work cross-platform. Can anyone suggest a suitable program for me, or help me with the ones I've already found? Thanks.
Longer version:
I've been tasked with writing a system in PHP that can come up with alternative suggestions for words entered by the user. I need to find a thesaurus application / API or similar which I can use to generate these suggestions.
Importantly, it needs to be multilingual (English, Danish, French and German). This rules out most of the software that I managed to find using Google. It also needs to be cross-platform (it needs to work on Linux and Windows).
My research has led me to two promising candidates: WordNet and Stardict.
I've been focusing on WordNet so far, calling it from PHP using the shell_exec() function, and I've managed to use it to create a very promising prototype PHP page, but so far in English only. I'm struggling with how to make it work in multiple languages.
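Because shell_exec() hands the whole string to a shell, user-entered words must be escaped before they are passed to it (in PHP, with escapeshellarg()). The Python sketch below makes the same call but sidesteps the shell entirely by passing an argument list. It assumes WordNet's `wn` binary is on the PATH and relies on its `-syns` options (`-synsn`, `-synsv`, `-synsa`, `-synsr` for noun, verb, adjective and adverb). Parsing the output is left out because its layout is intended for human readers.

```python
import shutil
import subprocess


def wordnet_command(word, pos="n"):
    """Build the argv for `wn WORD -synsX`, where X is n, v, a or r."""
    if pos not in ("n", "v", "a", "r"):
        raise ValueError("pos must be one of n, v, a, r")
    return ["wn", word, "-syns" + pos]


def wordnet_synonyms_raw(word, pos="n"):
    """Run `wn` without a shell (so input cannot inject commands) and
    return its raw text output, or None if `wn` is not installed."""
    if shutil.which("wn") is None:
        return None
    result = subprocess.run(wordnet_command(word, pos),
                            capture_output=True, text=True)
    return result.stdout
```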
The WordNet site has external links to WordNet projects in other languages (e.g. DanNet for Danish), but although they're often called WordNet, they seem to use a variety of database formats and software, which makes them unsuitable for me. I need a consistent interface that I can call from my PHP program.
Stardict looked more promising from that perspective: it provides dictionaries in many languages in a standard database format for the one application.
But the downside of Stardict is that it's primarily a GUI app. Calling it from the command line launches the GUI. There is apparently a command-line version (SDCV), but it seems quite out of date (last updated in 2006) and only available for Linux.
Can anyone help me with my problems with either of these programs? Or else, can anyone suggest any other alternative software or API that I could use?
Many thanks.
You could try to leverage PostgreSQL's full text search functionality:
http://www.postgresql.org/docs/9.0/static/textsearch.html
You can configure it with any of the available languages and all sorts of collations to fit your needs. PostgreSQL 9.1 adds some extra collation functionality that you may want to look into if the approach seems reasonable.
The basic steps would be (for each language):
1. Create the needed table (collated appropriately). For our purposes, a single column is enough, e.g.:
create table dict_en (
word text check (word = lower(word)) primary key
);
2. Fetch the needed dictionary/thesaurus files (those from aspell/OpenOffice should work).
3. Configure text search (see the link above, specifically section 12.6) using the relevant files.
4. Insert the whole dictionary into the table. (Surely there's a CSV file somewhere...)
5. Finally, index the vector, e.g.:
create index on dict_en using gin (to_tsvector('english', word));
You can now run queries that use this index:
-- Find words related to `:word`
select word
from dict_en
where to_tsvector('english', word) @@ plainto_tsquery('english', :word)
and word <> :word;
You might need to create a separate database or schema for each language, and add an additional field (tsvector) if Postgres refuses to index the expression because of the language parameter. (I read the full-text search docs a long time ago.) The details on this would be in section 12.2, and I'm sure you'll know how to adjust the above if this is the case.
Whatever the implementation details, though, I believe the approach should work.
There is a PHP example of using a thesaurus API here:
http://thesaurus.altervista.org/testphp
It is available in Italian, English, French, German, Spanish and Portuguese.
This seems to be an option, though I'm not sure whether it's multilingual:
http://developer.dictionary.com/products/synonyms
I also found the following site which does something similar to your end goal, maybe you could try contacting the owner and ask him how he did it:
http://www.synonymlab.com/

Search for contents inside a website

Could anybody please give me some useful links on this topic? I need to create a content search for my website. I have tried Google but haven't found any useful material on this topic. Please help.
While Google Custom Search is a good solution, and you didn't give much information, a simple Google search does turn up some good results:
Sphider, which I think I used years ago:
Sphider is a lightweight web spider and search engine written in PHP, using MySQL as its back end database. It is a great tool for adding search functionality to your web site or building your custom search engine. Sphider is small, easy to set up and modify, and is used in thousands of websites across the world.
PhpDig (on the second page of results, so it was hard to find), which I know I've used before, is another 'installable' PHP-based search engine:
PhpDig is a web spider and search engine written in PHP, using a MySQL database and flat file support. PhpDig builds a glossary with words found in indexed pages. On a search query, it displays a result page containing the search keys, ranked by occurrence.
Sphinx + PHP, an older article. I can't really speak to how well it fits your needs, but it might be a good place to start if you don't want to use a ready-made script:
While Google and its ilk are virtually omniscient, the Web's mighty search engines aren't well suited to every site. If your site content is highly specialized or distinctly categorized, use Sphinx and PHP to create a finely tuned local search system.
About's PHP Search Tutorial. It's certainly nothing special (it's quite a simplification of a search engine), but it's another place to start if you want to write it yourself:
Our search engine tutorial assumes that all the data you want to be searchable is stored in your MySQL database. It will not have any fancy algorithms - just a simple LIKE query, but it will work for basic searching and give you a jumping off point to make a more complex searching system.
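The "simple LIKE query" approach can be sketched as follows. The example uses an in-memory SQLite database through Python's sqlite3 module rather than MySQL so that it is self-contained. The table and column names (`pages`, `title`, `body`) are hypothetical. Two details matter even in a basic search. First, the user's term is passed as a bound parameter rather than pasted into the SQL. Second, `%` and `_` in the term are escaped, so that they are matched literally instead of acting as wildcards.

```python
import sqlite3


def escape_like(term, escape="\\"):
    """Escape LIKE wildcards so user input is matched literally."""
    return (term.replace(escape, escape + escape)
                .replace("%", escape + "%")
                .replace("_", escape + "_"))


def search_pages(conn, term):
    """Return ids of pages whose title or body contains term."""
    pattern = "%" + escape_like(term) + "%"
    rows = conn.execute(
        "SELECT id FROM pages "
        "WHERE title LIKE ? ESCAPE '\\' OR body LIKE ? ESCAPE '\\' "
        "ORDER BY id",
        (pattern, pattern),
    )
    return [row[0] for row in rows]
```

SQLite's LIKE is case-insensitive for ASCII letters, so "search" and "SEARCH" return the same pages. Without the escaping step, a search for "%" would match every row.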
Of course, more information would mean better answers.
"have tried google but not get useful materials on this topic"
Have you tried Google?
Seriously, Google Custom Search is very easy to set up and does not require any PHP programming. It doesn't integrate 100% with your site's design, but it works well.
