I would like to use the GoogleNews XML Feed and use some PHP to style it differently for creating morning news summaries.
QUESTIONS
Is it possible to search for a series of phrases in one XML address? Only one phrase needs to match for a result to be returned, but all of them should be part of the search.
e.g. Fiscal+Cliff,US+Debt
The feed URL should only fetch the last 24 hours, but my query does not. The problem is with the last two parameters. What needs to be done to fix it?
xml = http://news.google.com/news?output=rss&num=100&q=fiscal+cliff&as_drrb=q&as_qdr=d
I then want to fetch the <title>, <url> and, if possible, <author> of each article.
Then I want the PHP to use each URL to fetch a caption and an image.
$item[title], $item[url], $item[author], $item[image_src], $item[caption]
I would then echo this information how I want it set up on the page. How do I do this?
http://www.queness.com/post/8743/learn-how-to-read-parse-and-display-xml-data-in-random-order-with-jquery
Use Google, as this question is not unique; you can use the XML DOM to fetch, parse and display the data.
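A minimal sketch of the XML DOM approach mentioned above, using PHP's built-in DOMDocument. The sample feed, the example.com URLs, and the helper names parseFeedItems() and childText() are assumptions made for illustration; they are not taken from the Google News feed. Note that RSS 2.0 items carry the article address in <link> rather than <url>, and that <author> is optional, so the sketch returns null when it is missing.

```php
<?php
// Sample feed for illustration only; a real script would download the XML first.
$rssXml = <<<XML
<rss version="2.0">
  <channel>
    <title>Sample news feed</title>
    <item>
      <title>Fiscal cliff talks resume</title>
      <link>http://example.com/fiscal-cliff</link>
      <author>reporter@example.com (A. Reporter)</author>
    </item>
    <item>
      <title>US debt ceiling debate</title>
      <link>http://example.com/us-debt</link>
    </item>
  </channel>
</rss>
XML;

// Hypothetical helper: text of the first descendant element named $tag, or null.
function childText(DOMElement $parent, string $tag): ?string
{
    $node = $parent->getElementsByTagName($tag)->item(0);
    return $node === null ? null : trim($node->textContent);
}

// Hypothetical helper: turn every <item> into an associative array.
function parseFeedItems(string $rssXml): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($rssXml)) {
        return []; // not well-formed XML
    }
    $items = [];
    foreach ($doc->getElementsByTagName('item') as $node) {
        $items[] = [
            'title'  => childText($node, 'title'),
            'url'    => childText($node, 'link'),   // RSS 2.0 uses <link>
            'author' => childText($node, 'author'), // optional in RSS 2.0
        ];
    }
    return $items;
}

// Echo the items; escape the values because they come from an external source.
foreach (parseFeedItems($rssXml) as $item) {
    echo htmlspecialchars($item['title']), ' | ',
         htmlspecialchars($item['url']), ' | ',
         htmlspecialchars($item['author'] ?? 'unknown author'), PHP_EOL;
}
```

Fetching a caption and an image for each URL would be a second step: download each article page and parse its HTML in the same way.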
Related
I'm looking for a simple way to scrape any webpage for the presence of certain keywords. I have a list of words such as {Apple, Banana, Pear, Pineapple} and I have a list of links. I need to search each page for the presence of my list of words and return which ones are present on each link. For example, for a link:
http://www.xyz.com
I should search that page and return a vector of binary variables, 0 1 1 0, where each binary variable corresponds to the presence or absence of the corresponding search key in the list. I am having trouble finding a way to search a webpage since I am new to PHP. What is the best way to scrape a webpage to get back only the relevant text on the page (i.e. no HTML tags, CSS, JavaScript, metadata, etc.)? I have tried curl and file_get_contents, but they returned pretty ugly representations of the webpage. Can anyone please provide a snippet that returns the text on a page so I can search that returned text?
Thanks in advance!
One of the main examples of curl not working is for the page https://plus.google.com/107630561301274451844/about?gl=us&hl=en
I am trying to find the keyword IL on it, and it returns irrelevant text for me to search within.
Look into using something pre-built
This will do what you're looking for: http://simplehtmldom.sourceforge.net/
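As a rough sketch of the same idea without a third-party library, PHP's built-in DOMDocument can be swapped in for Simple HTML DOM to reduce a page to its text and then build the 0/1 vector. The helper names pageText() and keywordVector() and the sample HTML are assumptions for illustration. The match uses word boundaries so that "Apple" does not match inside "Pineapple". Pages that build their content with JavaScript will still come back without that content, because neither curl nor file_get_contents runs scripts.

```php
<?php
// Hypothetical helper: reduce an HTML document to its text, without tags,
// scripts, or styles. loadHTML() assumes ISO-8859-1 unless the page declares
// a charset, which is fine for this ASCII sample.
function pageText(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();

    foreach (['script', 'style'] as $tag) {
        // Copy to an array first: the node list is live and shrinks on removal.
        foreach (iterator_to_array($doc->getElementsByTagName($tag)) as $node) {
            $node->parentNode->removeChild($node);
        }
    }
    $root = $doc->documentElement;
    $text = $root === null ? '' : $root->textContent;
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Hypothetical helper: one 0/1 entry per keyword, in the order given.
function keywordVector(string $text, array $keywords): array
{
    $vector = [];
    foreach ($keywords as $keyword) {
        // \b keeps "Apple" from matching inside "Pineapple"; /i ignores case.
        $pattern = '/\b' . preg_quote($keyword, '/') . '\b/i';
        $vector[] = preg_match($pattern, $text) === 1 ? 1 : 0;
    }
    return $vector;
}

// Sample page for illustration; a real script would fetch it with curl.
$html = '<html><head><style>.pear{}</style><script>var apple = 1;</script></head>'
      . '<body><p>Banana and <b>pear</b> smoothie</p></body></html>';

$keywords = ['Apple', 'Banana', 'Pear', 'Pineapple'];
echo implode(' ', keywordVector(pageText($html), $keywords)), PHP_EOL; // 0 1 1 0
```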
There is something I am trying to accomplish although I'm not really sure where to start.
I currently have a MySQL database with a list of articles. The DB contains the article title, content, and some other info like dates, etc.
There is an RSS feed that we monitor for new articles; it's a Google Alert feed that just contains the latest news on certain subjects. I want to be able to automatically monitor this feed and record any feed items that are similar to stories currently in our DB.
I know how to set a script to run automatically, and I know how to parse the RSS feed with SimplePie.
What I need to figure out is how to take the description of the rss feed items, run a check on our DB to see if the feed item is similar to something we have in our DB, and return a numerical score of some sort, sort of like a "similarity rating" or something.
After that I can have the info I need recorded to the DB if the "similarity rating" is above a set limit, which I know how to do.
So my only issue is how to compare each feed item to our current articles, and return a score based on how similar it is.
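The Levenshtein function (native in PHP as levenshtein(), and available for MySQL through user-defined implementations) is a good way to handle this. It calculates the minimum number of single-character edits (insertions, deletions, and substitutions) required to convert one string into another. That distance would be the basis of your "similarity rating": the lower the distance, the more similar the two strings are.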
EDIT: the Levenshtein function is not available natively in MySQL but there are SQL implementations of it that you can use such as: http://kristiannissen.wordpress.com/2010/07/08/mysql-levenshtein/
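A minimal sketch of turning the Levenshtein distance into a score between 0 and 1, assuming a hypothetical helper similarityScore() and a hypothetical threshold of 0.8. One caveat: before PHP 8.0, levenshtein() only accepted strings of up to 255 characters, so on older versions this would only suit titles or short descriptions, not full article bodies.

```php
<?php
// Hypothetical helper: 1.0 means identical, 0.0 means nothing in common.
// The distance is divided by the longer length so that scores are
// comparable between short and long strings.
function similarityScore(string $a, string $b): float
{
    $a = strtolower(trim($a));
    $b = strtolower(trim($b));
    $longest = max(strlen($a), strlen($b));
    if ($longest === 0) {
        return 1.0; // two empty strings are identical
    }
    return 1.0 - levenshtein($a, $b) / $longest;
}

echo round(similarityScore('kitten', 'sitting'), 2), PHP_EOL; // 0.57

// Hypothetical threshold; tune it against real feed items and articles.
$threshold = 0.8;
$feedTitle = 'Budget talks stall in Congress';
$dbTitle   = 'Budget talks stall in congress.';
if (similarityScore($feedTitle, $dbTitle) >= $threshold) {
    echo 'Similar enough to record', PHP_EOL;
}
```

PHP also ships similar_text(), which can report a percentage directly and may be worth comparing against this score.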
I want to get the url of an image (where it's stored) from the description of a RSS feed, so then I can put that url inside a variable.
I know that to get the link of an RSS feed post I have to use $feed->channel->item->link; where $feed is $feed = simplexml_load_file("link_of_the_feed");.
But what if I have to get the image URL of the post? Do I have to use something like $feed->channel->item->image;?
I really don't know. Maybe an RSS parser like MagpieRSS, which I tried without results?
Thanks in advance.
If the image node is at the top level of the item node, then yes. If it's deeper than the item node, you'll have to traverse it accordingly. It would be helpful if you posted your XML.
EDIT: you can also check out my answer here on how to parse through an XML file with PHP.
You're on the right track! But it all depends on the format that the RSS Feed is set up in.
The item node actually contains a whole bunch of different fields, of which link is only one. Take a look here for information on the other fields that the item node contains.
Now, if the RSS feed points directly to the image file, then you can just use item->link. More likely, however, the link points to a blog post or something that has the image embedded in it. In this case, you can process $feed->channel->item->description to find what you need. The description node contains an escaped HTML summary of the post, and from there you can use a regular expression to find the source of the image. Also remember: you might need to decode the description using htmlspecialchars_decode() before processing it with regular expressions. In my own experience, descriptions often come with special characters escaped.
I know that's a lot of information, but once you get started it's really not as hard as it sounds. Good luck!
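A minimal sketch of the description approach described above, assuming a made-up feed and a hypothetical helper firstImageUrl(). SimpleXML already unescapes XML entities when the node is cast to a string, so the htmlspecialchars_decode() call only matters for feeds that escape their HTML twice; it is left in because it is harmless here.

```php
<?php
// Sample feed for illustration only; the example.com URLs are made up.
$rssXml = <<<XML
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>Post with a picture</title>
      <link>http://example.com/post-1</link>
      <description>&lt;p&gt;Hello&lt;/p&gt;&lt;img src="http://example.com/images/photo.jpg" alt="photo" /&gt;</description>
    </item>
  </channel>
</rss>
XML;

// Hypothetical helper: src of the first <img> tag in an HTML fragment, or null.
function firstImageUrl(string $descriptionHtml): ?string
{
    // Decode once more in case the feed escaped its HTML twice.
    $html = htmlspecialchars_decode($descriptionHtml);
    if (preg_match('/<img[^>]+src=["\']([^"\']+)["\']/i', $html, $match) === 1) {
        return $match[1];
    }
    return null;
}

$feed = simplexml_load_string($rssXml);
foreach ($feed->channel->item as $item) {
    // Casting to string yields the unescaped HTML of the description.
    $imageUrl = firstImageUrl((string) $item->description);
    echo $imageUrl ?? 'no image found', PHP_EOL;
}
```

A regular expression is good enough for a quick extraction; for messy markup, loading the description into DOMDocument and reading the src attribute would be the sturdier choice.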
What I'm trying to do:
Fetch X number of RSS feeds from my blogs and echo only new entries. My problem is: how do I know which items have already been parsed?
Solution so far:
Fetch the feed every 5 hours and store all titles inside a database table or flat file. On the next run, check whether the title is already in the database; if not, print it and save it to the database.
But I am not sure if this is best practice.
If someone knows a fast way, it would be great. Sorry for my poor English.
If the blog entries you are parsing have some date indicator, just add a field called CREATED of type DATETIME to your database and save this date value there. Then, when you parse, select the latest DATETIME with SELECT MAX(CREATED) FROM posts LIMIT 1 and don't insert anything that has a date earlier than that one.
This solution might have a slight drawback if you expect some of your blogs to update their rss with delay, but keep the past date as their timestamp.
I think you should store the date of the last post you fetched. The next time you fetch, you can collect only the ones that are newer than the date you stored.
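A small sketch of the date-based approach, assuming each item has already been parsed into an array with a pubDate string in the usual RSS date format. The helper name itemsNewerThan() and the sample data are assumptions; in a real script the stored timestamp would come from the database (for example, the MAX(CREATED) value).

```php
<?php
// Hypothetical helper: keep only items published after $lastSeen (a Unix timestamp).
function itemsNewerThan(array $items, int $lastSeen): array
{
    return array_values(array_filter(
        $items,
        function (array $item) use ($lastSeen): bool {
            $timestamp = strtotime($item['pubDate']);
            // Skip items whose date cannot be parsed rather than guessing.
            return $timestamp !== false && $timestamp > $lastSeen;
        }
    ));
}

// Sample data for illustration only.
$items = [
    ['title' => 'Auto Load Your PHP Classes', 'pubDate' => 'Wed, 07 Apr 2010 09:00:00 +0000'],
    ['title' => 'Jquery Plugin Infinite Scroller With AJAX', 'pubDate' => 'Fri, 09 Apr 2010 09:00:00 +0000'],
];

// Stand-in for the date stored after the previous run.
$lastSeen = strtotime('Thu, 08 Apr 2010 00:00:00 +0000');

foreach (itemsNewerThan($items, $lastSeen) as $item) {
    echo $item['title'], PHP_EOL; // only the 9th April entry is printed
}
```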
I believe that the usual practice is to work off of the guid element in the RSS feed. This is sometimes the URI of the source article, sometimes a number, sometimes a traditional GUID.
Using this element to see if you have already received an article will negate the need to parse for a date and this is how Google Reader usually determines if an item has already been collected.
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<atom:link href="http://www.stevefenton.co.uk/RSS/Blog/" rel="self" type="application/rss+xml" />
<title>Steve Fenton Blog</title>
<link>http://www.stevefenton.co.uk/RSS/Blog/</link>
<description>Blog</description>
<language>en</language>
<copyright>Copyright 2008 - 2010 Steve Fenton</copyright>
<category>Blog</category>
<generator>Swift Point Content Management System</generator>
<ttl>60</ttl>
<managingEditor>info#stevefenton.co.uk (Site Admin)</managingEditor>
<item>
<title><![CDATA[Jquery Plugin Infinite Scroller With AJAX]]></title>
<link>http://www.stevefenton.co.uk/Content/Blog/Date/201004/Blog/Jquery-Plugin-Infinite-Scroller-With-AJAX/</link>
<description><![CDATA[Friday, 9th April 2010 - Jquery Plugin Infinite Scroller With AJAX <p>I have just finished a new plugin for the jQuery framework.</p><p>The jQuery Infinite Scroller is a great way to deliver a really long list of things, in smaller chunks. For example, if you were displaying articles you could load a page with the first 10 results, then dynamically add more results to the bottom of the list when people start scrolling down. The further they scroll, the more articles you add - thus making it theoretically infinite.</p><p>When the plugin detects that no more results are available, it stops trying to get more items to add.]]> <a href="http://www.stevefenton.co.uk/Content/Blog/Date/201004/Blog/Jquery-Plugin-Infinite-Scroller-With-AJAX">View Details</a>.</description>
<guid>http://www.stevefenton.co.uk/Content/Blog/Date/201004/Blog/Jquery-Plugin-Infinite-Scroller-With-AJAX</guid>
</item>
<item>
<title><![CDATA[Auto Load Your PHP Classes]]></title>
<link>http://www.stevefenton.co.uk/Content/Blog/Date/201004/Blog/Auto-Load-Your-PHP-Classes/</link>
<description><![CDATA[Wednesday, 7th April 2010 - Auto Load Your PHP Classes <p>In PHP5 you can create classes to organise your code and represent objects that you want to pass around. This has long been a feature of other languages and was a fundamentally important step forward for PHP.</p><p>There was one thing, though, that I didn't like about PHP classes. If I wanted to instantiate a new "Customer" or "Product", I had to make sure that I included the PHP file that contained the "Customer" or "Product" class. This meant doing this:</p><p>[[#CODE:php:<br>include_once 'classes/Customer.php';</p>]]> <a href="http://www.stevefenton.co.uk/Content/Blog/Date/201004/Blog/Auto-Load-Your-PHP-Classes">View Details</a>.</description>
<guid>http://www.stevefenton.co.uk/Content/Blog/Date/201004/Blog/Auto-Load-Your-PHP-Classes</guid>
</item>
</channel>
</rss>
Every feed item has a unique ID associated with it. You can check that ID and store it in the database instead of storing the title.
Try reading the PubSubHubbub docs: http://superfeedr.com/documentation#pubsubhubbub
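A minimal sketch of deduplicating by guid, assuming a hypothetical helper newItems() and an in-memory array standing in for the database table. In a real setup the guid would go into a column with a unique index. Since guid is optional in RSS 2.0, the sketch falls back to the link when it is missing.

```php
<?php
// Hypothetical helper: return titles of items not seen before and remember
// their IDs. $seenIds stands in for a database table of stored guids.
function newItems(SimpleXMLElement $feed, array &$seenIds): array
{
    $fresh = [];
    foreach ($feed->channel->item as $item) {
        $id = (string) $item->guid;
        if ($id === '') {
            $id = (string) $item->link; // guid is optional, so fall back to link
        }
        if (isset($seenIds[$id])) {
            continue; // already collected on an earlier run
        }
        $seenIds[$id] = true;
        $fresh[] = (string) $item->title;
    }
    return $fresh;
}

// Sample feed for illustration only.
$rssXml = <<<XML
<rss version="2.0">
  <channel>
    <title>Sample blog</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first/</link>
      <guid>http://example.com/first</guid>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second/</link>
      <guid>http://example.com/second</guid>
    </item>
  </channel>
</rss>
XML;

$seenIds = [];
$feed = simplexml_load_string($rssXml);

echo count(newItems($feed, $seenIds)), PHP_EOL; // 2 on the first run
echo count(newItems($feed, $seenIds)), PHP_EOL; // 0 on the second run
```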
I have about 50 feeds (and growing) that I would like to filter before adding them to Google Reader. Each of the feeds will be filtered for the same keywords. If a keyword match is found, that item will be removed from the feed. Basically I'm just trying to eliminate some noise.
I know I can do this with Yahoo Pipes, but I'm looking for a self-hosted solution.
I'd like to pass a feed to a script on my server. That script will filter out unwanted feed items based on a list of defined keywords. The filtered feed will be the result. I plan to then add the feed to Google Reader.
(BTW, why doesn't Google Reader have filters like Gmail?)
Try using an RSS library like SimplePie. From there, writing the filter logic should be easy-peasy.
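The filter logic could look roughly like this. The sketch swaps SimplePie for PHP's built-in DOMDocument so that it runs without extra libraries; the helper name filterFeed(), the sample feed, and the keyword list are assumptions for illustration. It removes every item whose title or description contains a blocked keyword and returns the remaining feed as XML, which the script could then serve to the feed reader.

```php
<?php
// Hypothetical helper: drop <item> elements that mention any blocked keyword
// in their title or description (case-insensitive substring match).
function filterFeed(string $rssXml, array $blockedKeywords): string
{
    $doc = new DOMDocument();
    $doc->loadXML($rssXml);

    // Copy to an array first: the node list is live and shrinks on removal.
    foreach (iterator_to_array($doc->getElementsByTagName('item')) as $item) {
        $haystack = '';
        foreach (['title', 'description'] as $tag) {
            $node = $item->getElementsByTagName($tag)->item(0);
            if ($node !== null) {
                $haystack .= ' ' . $node->textContent;
            }
        }
        foreach ($blockedKeywords as $keyword) {
            if (stripos($haystack, $keyword) !== false) {
                $item->parentNode->removeChild($item);
                break; // one match is enough to drop the item
            }
        }
    }
    return $doc->saveXML();
}

// Sample feed and keyword list for illustration only.
$rssXml = <<<XML
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>Celebrity gossip roundup</title>
      <description>Noise</description>
    </item>
    <item>
      <title>New PHP release announced</title>
      <description>Signal</description>
    </item>
  </channel>
</rss>
XML;

header('Content-Type: application/rss+xml; charset=UTF-8');
echo filterFeed($rssXml, ['celebrity', 'gossip']);
```

With about 50 feeds, the same function could be called once per feed URL passed to the script, with the keyword list kept in one shared place.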
Try ReFilter. Looks nice.