I want to build an in-site search engine with PHP. Users must log in to see the information, so I can't use the Google or Yahoo search engine code.
I want the engine to search the text of the pages, not the tables in the MySQL database, at least for now.
Has anyone ever done this? Could you give me some pointers to help me get started?
You'll need a spider that harvests pages from your site (in a cron job, for example), strips the HTML, and saves the text in a database.
You might want to have a look at Sphinx (http://sphinxsearch.com/). It is a search engine that can easily be accessed from PHP scripts.
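A minimal sketch of the "strip HTML and save" step, written in Python only to keep it short and self-contained. The `pages` table, the `strip_html` and `save_page` names, and the use of SQLite instead of MySQL are all assumptions made for illustration. Fetching the pages (and logging in first) is left out.

```python
import sqlite3
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def strip_html(html):
    """Return the text content of an HTML string, separated by spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def save_page(conn, url, html):
    """Store the stripped text of one page (hypothetical 'pages' table)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, body TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, body) VALUES (?, ?)",
        (url, strip_html(html)),
    )
    conn.commit()
```

A cron job would call `save_page` once per harvested URL; the search page then only has to query the `pages` table.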
You can cheat a little bit, the way the much-hated Experts-Exchange web site does. It is a for-profit programmers' Q&A site, much like StackOverflow. To see answers you have to pay, but sometimes the answers come up in Google search results. It is rather clear that E-E presents one page to web crawlers and a different one to humans. You could use the same trick, then add Google Custom Search to your site. Users who are logged in would then see the results; everyone else would be bounced to the login screen.
Do you have control over your server? Then I would recommend that you install Solr/Lucene for indexing and SolPHP for interacting with it from PHP. That way you can have facets and other nice full-text search features.
I would not spider the actual pages. Instead, I would spider versions of the pages without navigation and other things that are not content-related.
Solr requires Java on the server.
In the end I used Sphider, which is a free tool, and it works well with PHP.
Thanks all.
If the content and the titles of your pages are already managed by a database, you will just need to write your search engine in PHP. There are plenty of solutions for querying your database, for example:
http://www.webreference.com/programming/php/search/
If the content is only in HTML files and not in the database, you might want to write a spider.
You may also be interested in caching the results to improve performance.
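Caching search results can be sketched roughly like this (in Python for brevity; the same idea ports directly to PHP). The `SearchCache` class, its five-minute default lifetime, and the `search_fn` callback are assumptions for illustration, not part of any tool mentioned here.

```python
import time


class SearchCache:
    """Keeps the results of recent queries for a limited time."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # normalized query -> (timestamp, results)

    def get_or_compute(self, query, search_fn):
        # Normalize case and whitespace so "Foo  bar" and "foo bar" share an entry.
        key = " ".join(query.lower().split())
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        results = search_fn(query)  # the expensive part, e.g. a database query
        self._store[key] = (now, results)
        return results
```

The lifetime is a trade-off: a longer one saves more queries but shows stale results for longer after the content changes.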
I would say that everything depends on the size and the complexity of your website/web application.
Related
I am a relatively novice programmer with a good understanding of PHP, but more in the sense of reading, understanding and copying the bits I need than of developing from scratch.
I have a list of over 1000 URLs I would like to search. I would like to search those pages for content on demand and return only results containing the text query I provide. I have looked at Google Custom Search Engine as an easy option, and it works well, but it limits the number of pages I can add.
I've looked into cURL, but it doesn't seem to offer what I'm looking for, unless I'm missing something?
Or are there other options like Google CSE that are free and easy to use?
You can write a crawler for the pages you need and use the Sphinx engine (http://sphinxsearch.com/) to search them. In my opinion, writing the crawler with the HTTP extension is better than using the plain cURL library.
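To show what a search engine does with crawled pages, here is a toy inverted index in Python. It is not how Sphinx works internally, only a sketch of the general idea; the function names are made up, and the page text is assumed to have been fetched and stripped of HTML already.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """pages: dict mapping URL -> plain text. Returns word -> set of URLs."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query, sorted."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)
```

Building the index once and querying it many times is what makes "on demand" search fast; fetching 1000 pages for every query would not be.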
What I would like to accomplish is to integrate a search feature into my website that is capable of searching my web pages, which are static (the content does not change). I need the search engine to be free to use, and it must operate using JavaScript or PHP (and MySQL if needed). I have tried looking on Google (if anyone is wondering), but maybe I'm just not searching for the right thing. If anyone could point me in the right direction I would greatly appreciate it.
Thanks
Why reinvent the wheel - use Google Custom Search: http://www.google.com/cse/
I found something today, so I'm updating this for other users.
Google Internal Site Search script (JavaScript, free)
Need a powerful internal search engine script to allow visitors to search the contents of your site? This script uses Google to enable comprehensive search on your site. Cut and paste installation that works on any type of sites.
Sphider (PHP, free)
Sphider is a lightweight web spider and search engine written in PHP, using MySQL as its back end database. It is suitable for adding search functionality to small or medium sites (up to around 20,000 pages). It also works great as a tool for site analysis - finding broken links, gathering statistics about the site etc.
TSEP (PHP, free)
TSEP is a search engine for a website for your website! You can put a "Search this site" anywhere on your website and let people quickly find what they are looking for.
Zoom Search Engine (PHP, commercial $49-$99)
Zoom is a robust PHP script for adding powerful custom search engine to your website, intranet, or CD/DVD.
Perlfect Search (Perl, free)
An integrated, general purpose, site indexer and search engine. It comes as a pair of distinct scripts. The indexer, that automatically, scans and indexes a web site, and the search engine, a cgi script that serves search queries for keywords over the index, and displays results pages in html, in a standard format including title, description and relevance ranking for each matching document.
CGIWorld Site Search (Perl, commercial $25)
SiteSearch gives you the ability to search your website quickly & easily by the use of the password protected browser based administration area. Set the path of the directory you want searched, set the files & directories you want searched, and also the directories & files you do not want searched. SiteSearch is a great tool for the average website of around or below 500 pages.
Fluid Dynamics Search Engine (Perl, free and commercial versions)
FDSE is an easy-to-install search engine for local and remote sites. It returns fast, accurate results from a template-driven architecture. Freeware and shareware versions are available with Perl source.
ASP Site Search (ASP, free)
This ASP Site Search application is commented on each line of code to make it easier for a beginner to follow or to customise. The Site Search application comes in two versions the Advanced version has more functions but requires that the web server has the VB Scripting Engine 5 or above installed.
Site Search Pro (ASP, commercial)
Site Search Pro 2.0 is a comprehensive search script for ASP or PHP sites.
Refer : http://www.javascriptkit.com/howto/search2.shtml
You might want to look at this. (For anyone who struggles their way through this problem)
JSE internal search engine
http://www.javascriptkit.com/script/script2/jse/
Uses regular expressions to efficiently and rapidly search the index for matches based on the entered keywords. Supports basic logic (ie: negation).
Returns the results on a separate page from the search form itself, neatly formatted. Uses session cookies to transmit the query between the two pages.
Stores the index (url, keywords and description for each page you wish to be "crawled") in the "results" page. This means the index is loaded only when a search has actually been performed, saving on bandwidth and download time.
Searches title, description and designated keywords within the index for a match.
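The approach described above (a hand-written index of url, title, description and keywords, matched with regular expressions, with basic negation) can be sketched as follows. This is Python rather than JSE's JavaScript, and it is not JSE's actual code: the index entries are invented, and the leading `-` for negation is an assumed syntax.

```python
import re

# Hypothetical hand-maintained index, one entry per page to be searchable.
INDEX = [
    {"url": "/install.html", "title": "Install guide",
     "description": "How to install the script", "keywords": "setup install"},
    {"url": "/faq.html", "title": "FAQ",
     "description": "Common install problems", "keywords": "help questions"},
    {"url": "/contact.html", "title": "Contact",
     "description": "Get in touch", "keywords": "email"},
]


def _matches(term, text):
    """Case-insensitive literal match of one term."""
    return re.search(re.escape(term), text, re.IGNORECASE) is not None


def search_index(index, query):
    """Return URLs whose entry has every plain term and no '-term'."""
    required, excluded = [], []
    for term in query.split():
        if term.startswith("-") and len(term) > 1:
            excluded.append(term[1:])
        else:
            required.append(term)
    results = []
    for entry in index:
        text = " ".join((entry["title"], entry["description"], entry["keywords"]))
        if (required
                and all(_matches(t, text) for t in required)
                and not any(_matches(t, text) for t in excluded)):
            results.append(entry["url"])
    return results
```

Because the whole index ships to the browser in the JavaScript version, this style only suits sites with a modest number of pages.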
"Sphider is a lightweight web spider and search engine written in PHP, using MySQL as its back end database. It is a great tool for adding search functionality to your web site or building your custom search engine. Sphider is small, easy to set up and modify, and is used in thousands of websites across the world."
http://www.sphider.eu/
A bit late, but I would suggest Tipue-search.
It's pure JavaScript and can be integrated with any page.
https://github.com/Tipue/Tipue-Search
Swiftype is another, more recent addition to the market: https://swiftype.com/
Let's say I have a plain HTML website. More than 80% of my visitors are usually from search engines like Google, Yahoo, etc. What I want to do is to make my whole website in Flash.
However, search engines can't read information from Flash or JavaScript. That means my web page would lose more than half of the visitors.
So how do I show HTML pages instead of Flash to the search engines?
Note: you could reach a specific page/category/etc. in Flash by using PHP GET parameters. For example, you could surf through all the web pages from the homepage, and link to a specific web page by typing page?id=1234.
Short answer: don't make your whole site in Flash.
Longer answer: If you show humans one view and the googlebot another, you are potentially guilty of "cloaking". If the Google Gods find you guilty, you will be banned to the Supplemental Index, never to be heard from again.
Also, doing an entire site in Flash breaks the basic contract of the web, namely that you can link to specific content from other sites or in emails. If your site has just one URL and everything else is handled inside of Flash ... well, I don't know what you have, but it isn't a website anymore. Adobe may like you, but many people will not. Oh, and Flash is very unfriendly to people with handicaps.
I recommend using Flash where it is needed (videos, animations, etc.), but make it part of an honest-to-God website.
What I want to do is to make my whole website in Flash
So how to accomplish this: show HTML pages instead of Flash?
These two seem a bit contradictory.
It is important to understand the reasoning behind choosing Flash to build your entire website.
More than 80 percent of my visitors are usually from search engines
You did some analysis but did you look at how many visitors access your website via a mobile device? Because apart from SEO, Flash won't serve on the majority of these devices.
Have you considered HTML5 as an alternative for anything you want to do with Flash?
Facebook requires you to build applications in Flash, among other options, rather than HTML. Why? I do not know, but that is their policy and there has got to be a reason.
I have recently been developing simple social applications in Flash (*.swf), and my latest app is a website in Flash that will be displayed in a tab of my company's page on Facebook. At the same time, I also want to use that website as a regular web page on the internet for my company. The only way I could find to display HTML text within a Flash file is to change the text properties, wherever I can, under CHARACTER to "Render text as HTML" (look for the "<>" symbol). I think that way the search engines will be able to read your content and process your website accordingly. Good luck.
You say that you can reach a Flash page through a GET variable, using a page ID or some other variable, so that's good. I hope you will embed the Flash in each HTML page. Besides this, you can add all the other HTML content in hidden form, so the crawlers can reach the content while your site still shows up in Flash. Isn't that right?
Since no one actually gave you a straight answer (probably because your question is absolutely face-palm-esque), I'll try:
Consider using the web-development approach called progressive enhancement. Now, it's fair to say that it probably wasn't intended for Flashification of a website, but you can make use of its principles.
1. Start with your standard HTML version of your website
2. Introduce swfobject to dynamically (important bit) swap out the HTML content for its Flash equivalent
3. Introduce swfaddress to allow for deep linking into your Flash movies (pseudo-URLs)
Granted, steps 2 and 3 are a little more advanced than how I've described them, and your site size/structure/design may not suit this approach, but at least it's an answer.
All that being said, I agree with the other answers/comments questioning the need to use Flash to display your entire site. There are very, very few reasons anyone would do that, and there are more reasons not to than have already been mentioned (iOS devices etc.).
I'm launching a big database-driven website (1.5+ million records) and I want to know some SEO tips before I do.
1. Which links do I need to tag as rel="nofollow", rel="me", etc.?
2. How do I prevent search engines from following links that are meant for users only, like 'login', 'post message', 'search', etc.?
3. Do I need to prevent search engines from entering the 'search' section of the site? How do I prevent it?
4. The site is basically a database of movies and actors. How do I create a good sitemap?
5. Do I need to prevent search engines from reading user comments and reviews?
6. Is any other robots.txt or .htaccess configuration needed?
7. How do I use noindex the right way?
8. Any additional tips?
Thanks!
1. If you just have internal links, no reason to make them nofollow
2. Make them buttons on forms with method="post" (that's the correct way to do it anyway)
3. Don't think you need to do that.
4. Perhaps see how IMDb does it? I'd consider just listing all actors and all movies in some sort of a sensible manner or something like that.
5. Why would you need to do that?
6. Depending on whether you want to block something (via robots.txt) or need .htaccess for something else
7. No idea
8. Remember to use semantic HTML - use h1's for page titles and so on.
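If you do decide to block something via robots.txt, you can check your rules before deploying them. The sketch below uses Python's standard `urllib.robotparser`; the `/search` and `/login` paths are assumed examples, not paths from the site in question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep crawlers out of the search and login pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /login
"""


def is_crawlable(path, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if the given rules let `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)
```

Keep in mind that robots.txt is only a request to well-behaved crawlers. It does not protect anything, so pages that must stay private still need a real login check.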
1. Use nofollow when you don't want your link to a page to give it additional weight in Google's PageRank. So, for example, you'd use it on links to user homepages in comments or signatures. Use me when you are linking to your other "identities", e.g. your Facebook page, your MySpace account, etc.
2. robots.txt allows you to give webcrawlers a set of rules on what they can or can't crawl and how to crawl it. nofollow also supposedly tells Google not to crawl a link. Additionally, if you have application queries that are non-idempotent (cannot be safely called multiple times), then they should be POST requests. These include things like news/message/page deletions.
3. Unless your searches are incredibly database-intensive (in which case perhaps they should be cached), you probably don't need to worry about this.
4. Google is intelligent enough to figure out a sitemap that you've created for your users. And that's the way you ought to be thinking instead of SEO, i.e. how can I make my site more usable/accessible/user-friendly? All of that will indirectly optimize your site for search engines. But if you want to go the distance, there are semantic sitemap technologies you can use, like RDF sitemaps or XML sitemaps. Also, Google Webmaster Tools offers sitemap creation.
5. No, why would you want to hide content from the search engine? Probably 90% of StackOverflow's search engine referrals are from user-generated content.
6. What? Configure your web server for people, not search engines.
7. This is easy to find the answer to.
8. Don't make your site spammy, such as overloading it with banners or using popup ads; use semantic markup (H1, H2, P, etc.); use good spelling/grammar; use REST-style URLs (even if it's not a RESTful application); use slugs to hide ugly URI-encoding; observe accessibility standards and guidelines; and, most importantly, make your site useful to encourage return visits and backlinks. That is the most surefire way of attaining a good search ranking.
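The slugs and XML sitemaps mentioned above can be sketched together. This is Python for brevity; `slugify`, `build_sitemap`, and the `/movies/...` URL layout are assumptions for illustration, and only the `urlset`/`url`/`loc` elements of the sitemap format are used.

```python
import re
import unicodedata
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def slugify(title):
    """Turn a title into a lowercase, hyphen-separated ASCII slug."""
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def build_sitemap(base_url, titles_by_section):
    """titles_by_section: dict like {"movies": [...titles...]}. Returns XML text."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for section, titles in titles_by_section.items():
        for title in titles:
            url = ET.SubElement(urlset, "url")
            loc = ET.SubElement(url, "loc")
            loc.text = f"{base_url}/{section}/{slugify(title)}"
    return ET.tostring(urlset, encoding="unicode")
```

The sitemap protocol caps a single file at 50,000 URLs, so a site with 1.5 million records would have to split its URLs across several files listed in a sitemap index. Real slugs also need a way to stay unique, for example by appending the record ID.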
Can anybody please give me some useful links on this topic? I need to create a content search for my website. I have tried Google but did not find useful material on this topic. Please help me.
While google custom search is a good solution, and you didn't give much information, a simple google search does turn up some good results:
Sphider, which I think I used years ago:
Sphider is a lightweight web spider and search engine written in PHP, using MySQL as its back end database. It is a great tool for adding search functionality to your web site or building your custom search engine. Sphider is small, easy to set up and modify, and is used in thousands of websites across the world.
PhpDig (on the 2nd page of results, so it was hard to find). I know I've used this before; it is another 'installable' PHP-based search engine:
PhpDig is a web spider and search engine written in PHP, using a MySQL database and flat file support. PhpDig builds a glossary with words found in indexed pages. On a search query, it displays a result page containing the search keys, ranked by occurrence.
Sphinx + PHP, an older article, I can't really speak to how well it fits your needs, but it might be a good place to start if you don't want to use a ready made script:
While Google and its ilk are virtually omniscient, the Web's mighty search engines aren't well suited to every site. If your site content is highly specialized or distinctly categorized, use Sphinx and PHP to create a finely tuned local search system.
About's PHP Search Tutorial, certainly nothing special (it's quite a simplification of a search engine), but another place to start if you want to write it yourself:
Our search engine tutorial assumes that all the data you want to be searchable is stored in your MySQL database. It will not have any fancy algorithms - just a simple LIKE query, but it will work for basic searching and give you a jumping off point to make a more complex searching system.
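The "simple LIKE query" approach from the last tutorial looks roughly like the sketch below. It uses Python with SQLite so that it is self-contained; the `pages` table and its `title` and `body` columns are assumptions, and with PHP and MySQL the equivalent would be a prepared statement with the same placeholders.

```python
import sqlite3


def escape_like(term):
    """Escape LIKE wildcards so user input is matched literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def search_pages(conn, term):
    """Return titles of pages whose title or body contains `term`."""
    pattern = f"%{escape_like(term)}%"
    rows = conn.execute(
        "SELECT title FROM pages "
        "WHERE title LIKE ? ESCAPE '\\' OR body LIKE ? ESCAPE '\\' "
        "ORDER BY title",
        (pattern, pattern),  # placeholders keep user input out of the SQL text
    )
    return [row[0] for row in rows]
```

A leading `%` wildcard usually prevents the database from using an ordinary index, so this approach tends to slow down as the table grows. That is one reason the answers here point to full-text tools such as Sphinx for larger sites.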
Of course, more information would mean better answers.
have tried google but not get useful materials on this topic
Have you tried Google?
Seriously, Google Custom Search is very easy to set up and does not require any PHP programming. It doesn't integrate 100% in your site's design but works well.