I already made a simple RSS reader, but it only gets me about 25 articles. How do I make it work like feedly.com or digg.com, so that it retrieves many more items, not just 25?
The PHP code I have:
$rss = simplexml_load_file('http://www.elespectador.com/rss.xml');
I already know how to retrieve the title, description, etc. of each item.
Pagination in feeds is arbitrary, and you'll have trouble finding a consistent pattern. You should store the data yourself: today the feed gives you 25 elements, and as new ones are added you keep appending them, so your own archive grows over time.
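A minimal sketch of that approach, assuming a SQLite archive keyed by each item's GUID (the table and file names here are illustrative, not from the original answer):
// Poll the feed and keep every item we haven't seen yet.
$db = new PDO('sqlite:archive.db');
$db->exec('CREATE TABLE IF NOT EXISTS items (guid TEXT PRIMARY KEY, title TEXT, description TEXT, pubdate TEXT)');
$rss = simplexml_load_file('http://www.elespectador.com/rss.xml');
$stmt = $db->prepare('INSERT OR IGNORE INTO items VALUES (?, ?, ?, ?)');
foreach ($rss->channel->item as $item) {
    // Dedupe on the GUID, falling back to the link when the feed omits it.
    $guid = (string)$item->guid !== '' ? (string)$item->guid : (string)$item->link;
    $stmt->execute(array($guid, (string)$item->title, (string)$item->description, (string)$item->pubDate));
}
Run that from cron every few minutes and your archive keeps growing past the 25 items the feed itself exposes.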
Another solution is to use the data from a service like Superfeedr (I created it!), which stores past content for millions of feeds.
There is something I am trying to accomplish, although I'm not really sure where to start.
I currently have a MySQL database with a list of articles. The DB contains the article title, content, and some other info like dates.
There is an RSS feed that we monitor for new articles; it's a Google Alert feed that just contains the latest news on certain subjects. I want to be able to automatically monitor this feed and record any feed items that are similar to stories currently in our DB.
I know how to set a script to run automatically, and I know how to parse the RSS feed with SimplePie.
What I need to figure out is how to take the description of each RSS feed item, check it against our DB to see whether the item is similar to something we already have, and return a numerical score, a sort of "similarity rating".
After that I can have the info I need recorded to the DB if the "similarity rating" is above a set limit, which I know how to do.
So my only issue is how to compare each feed item to our current articles, and return a score based on how similar it is.
The Levenshtein function (available for both PHP and MySQL) is a good way to handle this. It calculates the number of single-character edits (insertions, deletions, and substitutions) required to convert one string into another. That count would be your "similarity rating": the lower it is, the more similar the two strings are.
EDIT: the Levenshtein function is not available natively in MySQL, but there are SQL implementations of it that you can use, such as: http://kristiannissen.wordpress.com/2010/07/08/mysql-levenshtein/
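For example, a rough sketch using PHP's built-in functions. Note that in older PHP versions the native levenshtein() is limited to strings of 255 characters or fewer, so similar_text(), which reports a 0-100 similarity percentage, can be more convenient for article-length text. Here $itemDescription and $articles are assumed to come from your feed parsing and your DB:
// Score a feed item's description against each stored article.
foreach ($articles as $article) {
    similar_text(strtolower($itemDescription), strtolower($article['content']), $percent);
    if ($percent >= 80) { // the threshold is a guess; tune it against real data
        // similar enough: record the match in the DB
    }
}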
I am trying to create multiple landing pages populated dynamically with data from a feed.
My initial thought was to create a generic PHP page as a template that can be used to generate other pages dynamically and populate them with data from a feed. For instance, the generic page could be called landing.php, and it (and other pages created on the fly) would be populated with feed data depending on an id, keyword, or string in the URL. E.g. http://www.example.com/landing.php?page=cars or http://www.example.com/landing.php?page=bikes would show content only about cars or bikes, as the case may be.
My question is: how feasible is this approach, and is there a better way to create multiple dynamic pages populated with feed data based on the URL query string or some sort of id?
Many thanks for your help in advance.
I use this quite extensively. For example, where I work, we often have education oriented landing pages, but target each landing page to different types of visitors. A good example may be arts oriented schools looking for a diverse array of potential students who may be interested in a variety of programs for any number of reasons.
Well, who likes 3D modelling? Creative types (generic lander => ?type=generic) from all sorts of social circles. Also, probably gamers (gamer-centric lander => ?type=gamer). And so on.
I apply that variable to the body's class, which can be used to completely reorganize the layout. Then, I select different images for key parts of the layout based on that variable as well. The entire site changes. Different fonts can be loaded, different layout, different content.
I keep this organized via extensive includes. This sounds ugly, but it's not if you stick to a convention. You have to know the limitations of your foundation HTML, and you can't make too many exceptions. Sure, you could output extra crap based on whether the type was gamer or generic, but then you're going down the road to a product that should probably be contained in its own landing page if it needs to be that different.
I have a few landing pages which can be switched between several contents and styles (5 or 6 'themes'), but the primary purpose of keeping them grouped under the same URL is to keep focus on the fact that that's where a certain type of traffic goes in order to convert for one specific thing. Overlapping the purposes of these landing pages is a terrible idea.
Anyway, dream up a great template, outline a rigid convention for development, keep each theme very separate, and go to town on it. I find doing it right saves a load of time, but be careful: doing it wrong can cost a lot of time too.
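A bare-bones sketch of that pattern (the type names and include files are made up for illustration):
// landing.php: whitelist the type, then let it drive the body class and the includes.
$allowed = array('generic', 'gamer');
$type = (isset($_GET['type']) && in_array($_GET['type'], $allowed)) ? $_GET['type'] : 'generic';
?>
<body class="lander lander-<?php echo $type; ?>">
<?php include "includes/hero-{$type}.php"; ?>
<?php include "includes/content-{$type}.php"; ?>
</body>
Because $type is checked against a whitelist, it is safe to echo and to splice into include paths.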
Have a look at .htaccess URL rewriting. Then your users (and Google) can use a URL like domain.com/landing/cars, but on your server the script will be executed as if someone had entered domain.com/landing.php?page=cars.
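For example, something along these lines in .htaccess (assuming mod_rewrite is enabled; adjust the pattern to your URL scheme):
RewriteEngine On
# Map /landing/cars to /landing.php?page=cars internally
RewriteRule ^landing/([a-z0-9-]+)/?$ landing.php?page=$1 [L,QSA]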
If you use feed content to populate the pages, you should use some kind of caching to make sure you do NOT re-fetch the whole feed on every request/page load.
Checking the feeds every 1 to 5 minutes should be enough, and the very structure of feeds lets you identify new items easily.
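A minimal file-based cache sketch (the feed URL, cache path, and 5-minute TTL are placeholders):
// Only re-download the feed when the cached copy is older than 5 minutes.
$cacheFile = 'cache/feed.xml';
if (!file_exists($cacheFile) || time() - filemtime($cacheFile) > 300) {
    $fresh = file_get_contents('http://www.example.com/feed.xml');
    if ($fresh !== false) {
        file_put_contents($cacheFile, $fresh);
    }
}
$rss = simplexml_load_file($cacheFile);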
About URL rewrite: http://www.workingwith.me.uk/articles/scripting/mod_rewrite
A nice template engine for generating pages from feeds is PHPTAL (http://phptal.org).
You can load the feed as XML and use it directly in your template.
test.xml:
<foo><bar>baz!!!</bar></foo>
template.html:
<html><head /><body> ${xml/foo/bar}</body></html>
sample.php:
// PHPTAL needs to be on your include path for this to run.
require_once 'PHPTAL.php';
$xml = simplexml_load_file('test.xml'); // load the feed as SimpleXML
$tal = new PHPTAL('template.html');
$tal->xml = $xml;                       // exposed to the template as "xml"
echo $tal->execute();
And it does support loops and conditional elements.
If you don't need real-time data, you can do this in two parts:
A script, run by something like cron, which pulls data from your RSS feeds and stores it somewhere (a SQL DB?). It could also tag the entries into categories.
A PHP template that takes the URL arguments, pulls in the requested data, and displays it for the user (sketched below). This is quite easy to do with PHP, and probably a good project to learn with if you're that way inclined.
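A sketch of that second part (the database schema, credentials, and column names are invented):
// landing.php: show stored feed items for the requested category.
$pdo = new PDO('mysql:host=localhost;dbname=feeds', 'user', 'pass');
$page = isset($_GET['page']) ? $_GET['page'] : 'default';
$stmt = $pdo->prepare('SELECT title, description FROM items WHERE category = ? ORDER BY pubdate DESC LIMIT 20');
$stmt->execute(array($page));
foreach ($stmt as $row) {
    echo '<h2>' . htmlspecialchars($row['title']) . '</h2>';
    echo '<p>' . htmlspecialchars($row['description']) . '</p>';
}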
I am creating a PHP app that will display some classifieds/listings based on user location. For example:
Our classifieds from Chicago:
Classified 1
Classified 2
Classified 3
now, I also want to display "classifieds" from some other classified sites into my own page. Like this:
More Classifieds from Chicago (courtesy of XYZ.com)
Classified 1
Classified 2
Classified 3
Classified 4
More Classifieds from Chicago (courtesy of ABC.com)
Classified 1
Classified 2
Classified 3
This way, users can see classifieds hosted on my server as well as classifieds from other common classified sites.
Is this possible? Note that 1) there are no RSS feeds available for importing these classifieds; and 2) if possible, I'd like to show these lists in widget format, that is, display an iframe/widget box (not sure what the technical term is) and show all the external classifieds in that box.
See a rough mockup here: http://i.imgur.com/O19MR.jpg
I was thinking I could load the other classified sites into iframes, but then I'd get their whole site (including header, footer, logo, etc.). I just want the relevant classifieds section of their site.
You want to look into screen scraping with a spider-and-parser setup. You can use cURL or file_get_contents to bring in the web page, then use regular expressions and string operations to filter out the data you want, then build a page to display it. This is an overly simplified version of the full answer, but if I gave you the hundreds of lines of code to complete this, that would be cheating!
Given the lack of API or feed, the only thing I can think of is to have to pull the relevant URLs and scrape the data from them. It should be pretty simple with a mix of file_get_contents and DOMDocument to parse the data, as long as the markup is tidy.
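A rough sketch of that approach (the URL and the XPath query are hypothetical; you'd tailor the selector to whatever wraps each classified in the target site's markup):
// Fetch the page and pull out the listing nodes with DOMXPath.
$html = file_get_contents('http://www.example-classifieds.com/chicago');
$doc = new DOMDocument();
@$doc->loadHTML($html); // suppress warnings from messy real-world HTML
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//div[@class="classified"]') as $node) {
    echo htmlspecialchars(trim($node->textContent)) . "<br>";
}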
The best option I can think of is to set up an asynchronous web crawler that fetches the data from those sites.
You could set it up to crawl every day at 00:00 and store the content in your database, in a table something like:
external_classified
    id
    site_source
    city_id
    extra_data
After that you could get it from your PHP app with no problems.
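In MySQL terms, that table might look something like this (the column types are guesses):
CREATE TABLE external_classified (
    id INT AUTO_INCREMENT PRIMARY KEY,
    site_source VARCHAR(100), -- e.g. 'XYZ.com'
    city_id INT,
    extra_data TEXT
);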
EDIT: Note that the solution I'm describing is asynchronous! The other answers use a synchronous action to get the data, and I think it's a waste of time to fetch the same classifieds over and over again. Although, to be fair, those solutions are simpler to implement.
I am writing some code to fetch news from an RSS feed and publish n items at a time, every m hours, to another site.
Using PHP, I compare the updated XML file with the previous one saved on the server.
I load the two XML files into PHP arrays, and the latest posts are filtered out using array_diff_assoc().
If the number of new posts is greater than n, the older ones will be published first and the rest handled next time. Therefore I need some way to record which items have been published and which have not.
What is the simplest way to do this? I don't want to set up MySQL for such a simple task.
Can't you just store the ones that haven't been published? Then each time, pull up the old stored ones and append the new ones identified by array_diff_assoc(). Publish n of them, and if there are more than n, store the new list of unpublished ones.
As to how to store them, I'm not a PHP programmer, but what about PHP's serialize and unserialize functions? In Python I'd use the pickle module if I had to store data objects of some type, and I understand these are the PHP equivalent.
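That translates to a few lines of PHP (a sketch; the queue file name is arbitrary, and publish_item() stands in for your publishing code):
// Load the stored queue of unpublished items, if there is one.
$queue = file_exists('unpublished.dat') ? unserialize(file_get_contents('unpublished.dat')) : array();
$queue = array_merge($queue, $newItems); // $newItems from array_diff_assoc()
foreach (array_splice($queue, 0, $n) as $item) {
    publish_item($item); // hypothetical: your publishing routine
}
file_put_contents('unpublished.dat', serialize($queue)); // keep the leftovers for next run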
I have about 50 feeds (and growing) that I would like to filter before adding them to Google Reader. Each of the feeds will be filtered for the same keywords. If a keyword match is found, that item will be removed from the feed. Basically I'm just trying to eliminate some noise.
I know I can do this with Yahoo Pipes, but I'm looking for a self-hosted solution.
I'd like to pass a feed to a script on my server. That script will filter out unwanted feed items based on a list of defined keywords. The filtered feed will be the result. I plan to then add the feed to Google Reader.
(BTW, why doesn't Google Reader have filters like Gmail?)
Try using an RSS library like SimplePie. From there, writing the filter logic should be easy-peasy.
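A hedged sketch of such a filter script (assuming the classic simplepie.inc include; the keyword list is illustrative, and a real version would emit a complete RSS envelope with full channel metadata):
// filter.php?url=... : re-emit the feed minus items matching any keyword.
require_once 'simplepie.inc';
$keywords = array('sponsored', 'giveaway');
$feed = new SimplePie();
$feed->set_feed_url($_GET['url']);
$feed->init();
header('Content-Type: application/rss+xml');
echo '<rss version="2.0"><channel><title>Filtered feed</title>';
foreach ($feed->get_items() as $item) {
    $text = strtolower($item->get_title() . ' ' . $item->get_description());
    foreach ($keywords as $kw) {
        if (strpos($text, $kw) !== false) {
            continue 2; // a keyword matched: drop this item
        }
    }
    echo '<item><title>' . htmlspecialchars($item->get_title()) . '</title>'
       . '<link>' . htmlspecialchars($item->get_permalink()) . '</link></item>';
}
echo '</channel></rss>';
Point Google Reader at filter.php?url=... instead of the raw feed and the noisy items never reach it.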
Try ReFilter. Looks nice.