Trying to learn some more PHP. Here is what I'm after.
Essentially, I would like to search a website and return data to my own website.
Add a few keywords to a form.
Use those keywords to query a website such as monster.com for results that match the keywords entered.
Grab that data and return it to my own website.
How hard is something like this? I acknowledge the above outline is oversimplified but any tips you can offer are much appreciated.
If you're querying a site that has an API designated for this kind of functionality, you're on easy street. Just call the API's appropriate search function and you're all set.
If the site you're querying doesn't have an API, you still might be able to search the site with an HTTP GET using the right parameters. Then you just need to scrape through the file for the search results with your script and a few regex functions.
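A minimal PHP sketch of that idea. Everything site-specific here (the search URL, the q parameter, and the result markup) is invented for the example, so inspect the real site's search form and HTML for the actual names:

```php
<?php
// Placeholder URL, parameter name, and markup -- check the target site's
// search form to find the real ones before using this.
function build_search_url(array $keywords): string {
    // Most search forms reduce to an HTTP GET with a query parameter.
    return 'https://example.com/jobs/search?' . http_build_query([
        'q' => implode(' ', $keywords),
    ]);
}

function extract_titles(string $html): array {
    // Scrape the response with a regex; here we assume each hit is
    // wrapped in <h2 class="result">...</h2>.
    preg_match_all('#<h2 class="result">(.*?)</h2>#s', $html, $m);
    return $m[1];
}

// $html = file_get_contents(build_search_url(['php', 'developer']));
// print_r(extract_titles($html));
```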
Here's a little tutorial on screen scraping with PHP. Hopefully that will be of some help to you. The trouble with this is that in general if the site hasn't made it easy to access their data, they might not want you to do this.
Enter Yahoo Query Language (YQL). It's a service that lets you use things like XPath to get data from websites and put it into an easy-to-use XML or JSON format. The language is similar in structure to SQL (hence the name).
I've used it to build RSS feeds for sites that didn't have them, and it was pretty easy to learn.
http://developer.yahoo.com/yql/
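For example, a YQL statement like the following pulls matching nodes out of a live page and returns them as XML or JSON (the URL and XPath here are placeholders; you can try statements like this in the YQL console):

```sql
select * from html
where url = "http://www.example.com/jobs"
and xpath = '//div[@class="result"]/h2'
```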
Related
Is it possible to scrape the web based on keywords using search engines in PHP?
Like when someone enters a keyword, the script will search Google, render the results, then fetch the pages and scrape/extract the lines that include the matched keywords?
Any ideas or libraries to refer to?
You can do that using the Google Custom Search API https://developers.google.com/custom-search/json-api/v1/overview and the related PHP client https://github.com/google/google-api-php-client.
Later on you need to write a web scraper to download the websites (e.g. with cURL) and parse the HTML (e.g. with https://github.com/paquettg/php-html-parser).
I would, however, not recommend PHP for the latter task. There are much more sophisticated scraping tools available for Python (e.g. BeautifulSoup or Scrapy) that will make your life much, MUCH easier than PHP.
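Without the client library, the raw call is just an HTTP GET against the Custom Search JSON API endpoint. A minimal sketch; the endpoint and parameters are the documented ones, but the key and cx values are placeholders you obtain from the Google developer console:

```php
<?php
// $key and $cx are placeholders for your API key and search engine id.
function search_google(string $query, string $key, string $cx): string {
    $url = 'https://www.googleapis.com/customsearch/v1?' . http_build_query([
        'key' => $key,
        'cx'  => $cx,
        'q'   => $query,
    ]);
    return file_get_contents($url); // raw JSON response
}

function parse_results(string $json): array {
    $data = json_decode($json, true);
    // Each entry in 'items' carries the page title, URL and snippet.
    return array_map(function ($item) {
        return ['title' => $item['title'], 'url' => $item['link']];
    }, $data['items'] ?? []);
}

// $hits = parse_results(search_google('php scraping', $key, $cx));
```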
You can use the PHP function
file_get_contents('web url goes here');
for example: file_get_contents('http://www.google.com');
That function returns the HTML from the URL; you can then use XPath to extract the HTML elements that hold the data you want.
You can see an example and more explanation at the URL below.
https://gist.github.com/anchetaWern/6150297
I personally have done something similar to your question, but in Ruby on Rails; you can explore the project here.
https://github.com/dvarun/gextract
The XPath that I used is here:
https://github.com/dvarun/gextract/blob/master/app/jobs/fetch_keyword_job.rb
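Back in PHP, the same file_get_contents + XPath approach can be sketched like this; the XPath expression and sample markup are placeholders for whatever the real page uses:

```php
<?php
// Fetch the page, then query it with XPath via the DOM extension.
function extract_by_xpath(string $html, string $query): array {
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid XML
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);
    $out = [];
    foreach ($xpath->query($query) as $node) {
        $out[] = trim($node->textContent);
    }
    return $out;
}

// $html = file_get_contents('http://www.example.com');
// print_r(extract_by_xpath($html, '//h2[@class="title"]'));
```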
I am very entry-level at PHP. I am fairly proficient in WordPress and I have created some simple plugins. My situation is, I would like to recreate Google's way of displaying a Wikipedia article. I would like to be able to use a shortcode with a person's name and have it return the same results Google would return, styled the same way, just on my WordPress website. I know Wikipedia has an API that allows for search and display; I am just trying to wrap my head around the process. If someone could point me in the right direction of how this would be achieved in PHP or WordPress, I would really appreciate it. I know there are some similar questions on here about the wiki API, but I would like to hear some different approaches to finding the best way to achieve what Google is doing.
If you don't know what I am talking about, try googling someone famous who has a wiki article: it will display on the right-hand side of the screen with their photo and their info in a very nicely displayed box. Can this overall page info be queried all at once, or does each piece of information have to be queried from the wiki separately and then displayed that way with CSS?
Forgive me if this is too vague or has already been covered. I am very interested in people's logic in approaching this situation. Any information on this would be very helpful.
You'll be lucky if this doesn't get downvoted to hell, but I'll try and give you at least a basic run-through.
There are two ways to approach this. AJAX or PHP. The PHP method is a little more complicated to wrap your head around (at least for me anyway).
PHP
First, you should sit down and REALLY read the Wikipedia API manual and sources. The people behind Wikipedia have put a ton of effort into it and wouldn't have done so without giving you information on how to use their system. Don't be intimidated--it's really not that hard.
Second, after you've read the API docs, you'll know what this URL means.
http://en.wikipedia.org/w/api.php?format=json&action=query&titles=Main%20Page&prop=revisions&rvprop=content
A method that might work without using CURL, which can be very confusing, is file_get_contents().
So set the query string parameters for the api, and use them like so:
$api_call = file_get_contents('http://en.wikipedia.org/w/api.php?format=json&action=query&titles=Main%20Page&prop=revisions&rvprop=content');
$api_data = json_decode( $api_call, true ); // decode the JSON into an associative array
Now you should have an array that you can easily manipulate and place into your site.
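To actually reach the article text, note that an action=query&prop=revisions response nests pages under query → pages, keyed by numeric page id, with the wikitext under the revision's '*' key. A small helper, assuming the response was decoded with json_decode(..., true):

```php
<?php
// Walks the standard query/pages/revisions nesting of the API response.
// Returns the wikitext of the first page found, or null if absent.
function first_page_content(array $api_data): ?string {
    foreach ($api_data['query']['pages'] as $page) {
        // With rvprop=content the wikitext sits under the '*' key.
        return $page['revisions'][0]['*'] ?? null;
    }
    return null;
}

// $api_data = json_decode($api_call, true);
// echo first_page_content($api_data);
```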
jQuery/AJAX
Way easier in my opinion, and you have the added bonus of some user manipulation:
$.ajax({
type: 'GET',
// origin=* asks the API to send CORS headers for cross-domain requests
url: 'http://en.wikipedia.org/w/api.php?format=json&action=query&titles=Main%20Page&prop=revisions&rvprop=content&origin=*',
dataType: 'json'
}).done(function(data){
// jQuery has already parsed the JSON, so no $.parseJSON needed
$('div.data-holder').html( 'foo' + data.bar );
});
None of this is tested. Just meant for a general idea. Hope this helps.
I am looking for advice on which web programming languages can achieve the following objective:
I am building a website which will allow users to initially search a mysql database I have. The results of this initial search will then be displayed on a results page. I would then like the ability to dynamically create filters based on data in the search results which the user can (de)select to further filter the results. The results should update in real time.
The best example of this I can see is Skyscanner - you make an initial search on a static web form. This then takes you to a results page with the initial search results and it also creates a dynamic filter on the LHS allowing you to filter out certain airlines (for example). This list of airlines is taken from the results dataset (and therefore must be generated dynamically).
How is this best achieved? Is JavaScript the way to go, or can ASP and PHP also do this?
Many thanks
JavaScript is definitely the way to go.
You will need AJAX. A front-end JS templating library would help to display the results.
I would create a JSON web service. Fetch the search results over AJAX and use a front-end templating engine. There are many out there; this LinkedIn article may help you choose one.
Edit: What does a templating library do?
It will allow you to define a set of tags to display each search result. When you fetch the data as JSON, you convert it to JavaScript objects. Your templating framework will generate the HTML using your result template to display all the values.
If you are not familiar with JavaScript templating, definitely read about it. Once you know it, you will find yourself applying it in a lot of solutions. I am not recommending any particular engine here, because each has its benefits and you should decide after considering the features you want to provide.
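The server side of such a JSON web service can be sketched in PHP. The flights/airline schema below is invented for the example (substitute your own tables); the key idea is that the response also carries the distinct values found in the result set, so the page can build its filter checkboxes dynamically:

```php
<?php
// Table/column names (flights, destination, airline) are made up.
function build_response(array $rows): array {
    return [
        'results'  => $rows,
        // Distinct airlines in this result set drive the dynamic filter list.
        'airlines' => array_values(array_unique(array_column($rows, 'airline'))),
    ];
}

// $pdo    = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
// $sql    = 'SELECT id, airline, price FROM flights WHERE destination LIKE ?';
// $params = ['%' . ($_GET['q'] ?? '') . '%'];
// if (isset($_GET['airline'])) {          // filter picked on the results page
//     $sql     .= ' AND airline = ?';
//     $params[] = $_GET['airline'];
// }
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
// header('Content-Type: application/json');
// echo json_encode(build_response($stmt->fetchAll(PDO::FETCH_ASSOC)));
```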
I'm trying to traverse the whole PhoneGap thing to get a native app up and running. I am completely fine with creating the HTML5 markup for the actual app; what I need help with is pulling in dynamic content from a website. In particular, there is some content on our website that also needs to be in the app. We use a program called ExpressionEngine that handles all of our content. The content that I would need to pull over would be:
Sermon Videos
Sermon Series
Locations
Plain text content
The majority of the app will be local, but there are some dynamic needs as you can see. I've read a couple of things that say "JSON" is the way to go, but it looks pretty complicated, as I'm not quite familiar with AJAX. Is this the only way, or are there other options or resources anyone can point me to that might help? I'm not even sure if that method would work for our website. I appreciate any help you can provide.
They are correct. What you need to look into is AJAX/JSON and how to present your data to your app using these technologies.
ExpressionEngine would actually be quite a good choice for this, as its template system is quite flexible. There are even add-on modules for delivering your content as JSON if you want to go that route.
A quick google led me to: http://samcroft.co.uk/2011/updated-loading-data-in-phonegap-using-jquery-1-5/
It's a bit more than you need since you will have your content in an existing CMS instead of creating a new database to store the data, but the concepts will hold true and I am sure you will be able to use it to find more tutorials that suit you better.
I want to build an in-site search engine with PHP. Users must log in to see the information, so I can't use the Google or Yahoo search engine code.
I want to make the engine search the text and pages, not the tables in the MySQL database, for now.
Has anyone ever done this? Could you give me some pointers to help me get started?
You'll need a spider that harvests pages from your site (in a cron job, for example), strips the HTML and saves them in a database.
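The spider's strip-and-store step might look like this in PHP; the search_index table is a made-up name, and the fetch loop over your site's URLs (driven by cron) is left out:

```php
<?php
// Turn a fetched page into plain text suitable for indexing.
function page_to_text(string $html): string {
    // Drop script/style blocks first so their contents don't pollute the index.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);
    // Strip the remaining tags and collapse whitespace.
    return trim(preg_replace('/\s+/', ' ', strip_tags($html)));
}

// foreach ($urls as $url) {
//     $pdo->prepare('REPLACE INTO search_index (url, body) VALUES (?, ?)')
//         ->execute([$url, page_to_text(file_get_contents($url))]);
// }
```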
You might want to have a look at Sphinx (http://sphinxsearch.com/); it is a search engine that can easily be accessed from PHP scripts.
You can cheat a little, the way the much-hated Experts-Exchange site does. It is a for-profit programmers' Q&A site much like Stack Overflow. In order to see answers you have to pay, but sometimes the answers come up in Google search results. It is rather clear that E-E presents a different page to web crawlers than to humans. You could use the same trick, then add Google Custom Search to your site. Users who are logged in would then see the results; otherwise they'd be bounced to the login screen.
Do you have control over your server? Then I would recommend that you install Solr/Lucene for indexing and SolPHP for interacting with PHP. That way you can have facets and other nice full-text search features.
I would not spider the actual pages; instead I would spider versions of the pages without navigation and other elements that are not content-related.
Solr requires Java on the server.
I finally used Sphider, which is a free tool, and it works well with PHP.
Thanks all.
If the content and the titles of your pages are already managed by a database, you will just need to write your search engine in PHP. There are plenty of solutions for querying your database, for example:
http://www.webreference.com/programming/php/search/
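One such solution, sketched with made-up table and column names (pages, title, body): a MySQL FULLTEXT index plus a MATCH ... AGAINST query, ranked by relevance.

```php
<?php
// Setup, run once against the database:
//   ALTER TABLE pages ADD FULLTEXT (title, body);
function search_sql(): string {
    // Same search term is bound to both placeholders: once for ranking,
    // once for filtering.
    return 'SELECT url, title, MATCH(title, body) AGAINST (?) AS score
            FROM pages
            WHERE MATCH(title, body) AGAINST (?)
            ORDER BY score DESC
            LIMIT 20';
}

// $stmt = $pdo->prepare(search_sql());
// $stmt->execute([$q, $q]);
// $results = $stmt->fetchAll(PDO::FETCH_ASSOC);
```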
If the content is just contained in html files and not in the db, you might want to write a spider.
You may also be interested in caching the results to improve performance.
I would say that everything depends on the size and the complexity of your website/web application.