What RSS parser should I use in PHP? - php

I am looking for an RSS parser written in PHP. The problem is not that I cannot find one. The problem is that there are too many, and it's hard to decide which one to use (especially since I have no experience with any of them and trying each one is too time-consuming).
Can anybody recommend a "good" RSS parser?
The following requirements are important to me (given in order of importance):
It should be able to extract all the information given in the feed (not only title, description and link but everything that is there, for example the feed's author, the feed's icon, the items' tags and so on).
It should be able to read not only RSS feeds but also Atom feeds.
It should be tolerant of "broken" RSS (Atom) feeds.
It should be simple to use.

My de facto answer would be "have you tried SimplePie?". It's a very good feed parser, but you'll have to have a look at their demo to see how it handles broken feeds :-)

In addition to SimplePie already mentioned, there is Zend_Feed (which can be used standalone) and since this is XML anyway, you can also use any of the native XML extensions, like DOM or XMLReader.
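As a rough illustration of the "native XML extensions" route, here is a minimal sketch using DOM that reads both RSS and Atom. The names parseFeed() and firstText() are made up for this example, and it only pulls out title and link. A real library such as SimplePie extracts far more and recovers from broken feeds, which this sketch does not attempt (it just returns an empty array).

```php
<?php
// Sketch only: parseFeed() and firstText() are illustrative names, not a library API.

const ATOM_NS = 'http://www.w3.org/2005/Atom';

/** Text of the first matching descendant element, or '' if there is none. */
function firstText(DOMElement $parent, string $name, ?string $ns = null): string
{
    $list = $ns === null
        ? $parent->getElementsByTagName($name)
        : $parent->getElementsByTagNameNS($ns, $name);
    $node = $list->item(0);
    return $node === null ? '' : trim($node->textContent);
}

/** Returns a list of ['title' => ..., 'link' => ...] for an RSS or Atom document. */
function parseFeed(string $xml): array
{
    if (trim($xml) === '') {
        return [];
    }

    // Collect libxml errors quietly instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return []; // not well-formed; no recovery is attempted here
    }

    $items = [];
    $root = $doc->documentElement;

    if ($root->namespaceURI === ATOM_NS && $root->localName === 'feed') {
        // Atom: entries are namespaced and the link lives in an href attribute.
        foreach ($doc->getElementsByTagNameNS(ATOM_NS, 'entry') as $entry) {
            $link = $entry->getElementsByTagNameNS(ATOM_NS, 'link')->item(0);
            $items[] = [
                'title' => firstText($entry, 'title', ATOM_NS),
                'link'  => $link === null ? '' : $link->getAttribute('href'),
            ];
        }
        return $items;
    }

    // RSS: every <item> carries <title> and <link> child elements.
    foreach ($doc->getElementsByTagName('item') as $item) {
        $items[] = [
            'title' => firstText($item, 'title'),
            'link'  => firstText($item, 'link'),
        ];
    }
    return $items;
}
```

Calling parseFeed() on the string you fetched gives you a plain array to loop over. Anything beyond title and link (author, icon, tags) would need more of the same code for each format, which is the main argument for using a library.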

Related

Good, solid documentation of PHP DOM

I've been trying to do some simple DOM parsing of HTML documents and am really shocked at how difficult it is to do.
I've looked into some of the many alternatives to PHP's DOM classes (like simple xml parser and simple HTML DOM). I found a very effective dom2array function too, which is useful for extremely basic parsing where you just want raw values of elements.
None of these alternatives is really compelling though.
PHP documentation of the DOM is typically lacking in detail and largely useless. A lot of the comments are actually really helpful though.
The tutorials I've found online typically cover only the very basics, like writing a 20-line XML document or parsing all the p tags in a document. Meh.
Are there any sites (or books) that go into detail specifically on working with the DOM using PHP's DOM libraries?
The DOM is a language-independent interface and documented in detail by the W3C.
That being said, if your aim is extremely simple parsing of (typically) structured information, XML may not be the correct format in the first place; XML includes a variety of advanced features (namespaces, DTDs, XSLT, distinction between attributes and text, markup instead of structured information). If that's the case, consider JSON, which is extremely easy to parse and generate.
Anything that says "DOM" in the name or claims to support it should support the DOM API as defined by the W3C, and you should consider their documentation normative for everything but the language-specific parts.
I should have titled my post, "Easiest way to parse HTML DOM in PHP". 'Easiest' is not a very good word, I know. It's all relative to what you're trying to do. What I'm doing is pretty straight-forward. I want to parse standalone HTML documents and present the content in a different context.
These are the things I wanted to do:
Parse basic properties like title and body
Alter all file references (images, links, css, js) to point to a valid location
Add/remove attributes from tags (dealing with 1995 HTML here)
Strip inline styles
I ended up going with Simple HTML DOM Parser.
It has a very small learning curve and gives easy read/write access to the DOM. End of story. It does seem to choke on nested elements sometimes, though.
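For comparison, the tasks listed above can also be done with PHP's built-in DOM classes. The following is only a sketch under assumptions: rewriteHtml() is a made-up name, the base URL is a placeholder, and "pointing references at a valid location" is simplified to prefixing every relative src/href with that base URL.

```php
<?php
// Sketch only: rewriteHtml() is an illustrative name, not part of any library.

/**
 * Parses an HTML document, strips inline styles, prefixes relative
 * src/href values with $baseUrl, and returns the title and body markup.
 */
function rewriteHtml(string $html, string $baseUrl): array
{
    // Old or sloppy HTML produces many libxml warnings; collect them quietly.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Strip inline styles.
    foreach ($xpath->query('//*[@style]') as $el) {
        $el->removeAttribute('style');
    }

    // Rewrite relative file references (images, links, css, js).
    foreach ($xpath->query('//*[@src or @href]') as $el) {
        foreach (['src', 'href'] as $name) {
            if (!$el->hasAttribute($name)) {
                continue;
            }
            $value = $el->getAttribute($name);
            // Leave absolute URLs, protocol-relative URLs and fragments alone.
            if (preg_match('~^(?:[a-z][a-z0-9+.\-]*:|//|#)~i', $value)) {
                continue;
            }
            $el->setAttribute($name, rtrim($baseUrl, '/') . '/' . ltrim($value, '/'));
        }
    }

    // Basic properties: title text and the markup inside <body>.
    $titleEl = $doc->getElementsByTagName('title')->item(0);
    $bodyEl  = $doc->getElementsByTagName('body')->item(0);
    $body = '';
    if ($bodyEl !== null) {
        foreach ($bodyEl->childNodes as $child) {
            $body .= $doc->saveHTML($child);
        }
    }

    return [
        'title' => $titleEl === null ? '' : trim($titleEl->textContent),
        'body'  => $body,
    ];
}
```

Note that loadHTML() guesses the character encoding from the document itself, so non-ASCII pages without a charset declaration may need extra handling that is left out here.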

Automatically generate RSS feeds

I have information stored in a database that I want to use to create RSS feeds.
What is the best way to do this?
Also, are there any PHP library/functions that I can pass the data to and they will take care of ensuring that any characters that need to be encoded/stripped are dealt with?
PHP Universal Feed Generator is the one you are looking for.
It supports RSS 1.0, RSS 2.0 and Atom.
If you know how to create XML dynamically, it's pretty much the same: you just need to look up how an RSS feed is structured, and off you go.
After you have created the RSS, you can validate it here:
http://validator.w3.org/feed/
Here is a short wiki article on how it's supposed to be formatted: http://en.wikipedia.org/wiki/Rss
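To make the "create the XML dynamically" approach concrete, here is a minimal sketch using DOMDocument. The function name buildRss() and the row keys (title, link, description, timestamp) are assumptions standing in for whatever your database query returns. Because the values are added as text nodes, DOM escapes characters such as & and < for you, which covers the encoding concern in the question.

```php
<?php
// Sketch only: buildRss() and the row keys are illustrative assumptions.

/**
 * Builds an RSS 2.0 document from channel details and a list of rows,
 * each row being ['title', 'link', 'description', 'timestamp'].
 */
function buildRss(string $title, string $link, string $description, array $rows): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;

    $rss = $doc->createElement('rss');
    $rss->setAttribute('version', '2.0');
    $doc->appendChild($rss);
    $channel = $rss->appendChild($doc->createElement('channel'));

    // Adding values as text nodes lets DOM escape special characters.
    $addText = function (DOMNode $parent, string $name, string $value) use ($doc): void {
        $el = $doc->createElement($name);
        $el->appendChild($doc->createTextNode($value));
        $parent->appendChild($el);
    };

    $addText($channel, 'title', $title);
    $addText($channel, 'link', $link);
    $addText($channel, 'description', $description);

    foreach ($rows as $row) {
        $item = $channel->appendChild($doc->createElement('item'));
        $addText($item, 'title', $row['title']);
        $addText($item, 'link', $row['link']);
        $addText($item, 'description', $row['description']);
        // RSS 2.0 expects RFC 822 style dates; DATE_RSS is PHP's format for that.
        $addText($item, 'pubDate', gmdate(DATE_RSS, $row['timestamp']));
    }

    return $doc->saveXML();
}
```

Send the result with a Content-Type of application/rss+xml and run it through the validator to check it.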
I prefer the Zend_Feed component, which is part of Zend Framework. Just have a look at Zend_Feed_Writer in the Reference Guide, to see how to export data as a feed.
http://careers.stackoverflow.com/jobs/feed
Just look at this RSS example (right-click for the source code). It's a functional RSS feed that is actually in use, and all you really need to do is create an HTML-like page with dynamic data yourself.
EDIT:
I personally don't see the point of using a plugin for this. It's so similar to HTML that you may as well just create it with the tags shown in the example above.

Importing/scraping page content from other sites?

I've been playing with PHP and also http://www.alchemyapi.com/ and embed.ly,
but I was wondering if there are other options out there to import and parse a webpage, any page, whether it is a news site or a blog...
thanks
To fetch the data: curl, file_get_contents (there may be others; those are the two common ones)
To parse the data: PHP: DOM, SimpleXML, preg_match**
Since it was tagged with PHP, I only gave working information for PHP. There are tons of ways to do this; if you can narrow your question down to what you are trying to do, it would help. The better way to parse any site is through its RSS feed if it has one, or through its API, assuming it offers the content you want via RSS/API.
** preg_match is not a great alternative. It does "work", but it is better to use the DOM / SimpleXML functions if possible.
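A minimal sketch of the curl + DOM combination could look like this. fetchPage() and extractLinks() are made-up names, the timeout is an arbitrary choice, and pulling out the links is just one example of "parsing"; what you actually extract depends on the page.

```php
<?php
// Sketch only: fetchPage() and extractLinks() are illustrative names.

/** Fetches a URL with cURL; returns the body, or null on failure. */
function fetchPage(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 10,   // seconds; an arbitrary choice
    ]);
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}

/** Returns every link in the HTML as ['text' => ..., 'href' => ...]. */
function extractLinks(string $html): array
{
    // Real-world HTML is rarely valid; keep libxml's complaints quiet.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $links[] = [
                'text' => trim($a->textContent),
                'href' => $a->getAttribute('href'),
            ];
        }
    }
    return $links;
}
```

Typical use would be to call fetchPage() with the page URL, check the result for null, and pass the string to extractLinks().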
I wrote a crawler at work using cURL and preg_match.
Before I chose to do it that way, I had looked at the DOM parsers: http://php.net/manual/en/book.dom.php

PHP: RSS parsing

I want to parse a website's news section. It has an RSS subscribe button, but the output looks odd and I'm not sure how to parse it.
http://www.networkroi.co.uk/DesktopModules/ArticleManager/ArticleRss.aspx?id=324&pid=0
It's not in XML, which would have been a lot easier.
Here is the news page with that link on it - http://www.networkroi.co.uk/News/tabid/99/Default.aspx
I would like to parse it with PHP if possible, though I really just want to display the info as it looks there.
Any help is much appreciated
Jonesy
Take a look at the excellent SimplePie class for parsing RSS feeds with PHP.
SimplePie is a very fast and easy-to-use class, written in PHP, that puts the 'simple' back into 'really simple syndication'. Flexible enough to suit beginners and veterans alike, SimplePie is focused on speed, ease of use, compatibility and standards compliance.

Multiple REST Feeds to MYSQL Database - Using PHP

I've got a number of REST feeds I'd like to store in a MySQL database. Can anyone suggest a solution for this? Something PHP-related would be appreciated...
It's not PHP-related, but Perl has both a REST interface and a DBI interface (for interfacing with MySQL).
http://metacpan.org/pod/WWW::REST
There are many other REST interfaces for Google, Twitter, etc. Just search CPAN modules at search.cpan.org
To my knowledge there is no such thing as a REST feed. There are RSS feeds and Atom feeds, so I will assume you are talking about one of those.
Both are based on XML, so I suggest you find an XML parser for PHP, do an HTTP request to get the feed contents, parse the XML into a DOM and then copy the DOM data into MySQL.
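The steps above could be sketched roughly as follows, assuming an RSS 2.0 feed and a hypothetical table feed_items with columns guid, title and link. The function names feedToRows() and storeRows() are made up for this example, and SimpleXML is used instead of a full DOM to keep it short.

```php
<?php
// Sketch only: feedToRows(), storeRows() and the feed_items table are
// illustrative assumptions, not an existing API or schema.

/** Turns an RSS 2.0 document into rows of ['guid', 'title', 'link']. */
function feedToRows(string $xml): array
{
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($doc === false) {
        return []; // not well-formed XML
    }

    $rows = [];
    foreach ($doc->xpath('/rss/channel/item') ?: [] as $item) {
        $link = trim((string) $item->link);
        $guid = trim((string) $item->guid);
        $rows[] = [
            'guid'  => $guid !== '' ? $guid : $link, // fall back to the link
            'title' => trim((string) $item->title),
            'link'  => $link,
        ];
    }
    return $rows;
}

/** Inserts the rows with a prepared statement; returns the number inserted. */
function storeRows(PDO $pdo, array $rows): int
{
    $stmt = $pdo->prepare(
        'INSERT INTO feed_items (guid, title, link) VALUES (:guid, :title, :link)'
    );
    $count = 0;
    foreach ($rows as $row) {
        $stmt->execute($row); // values are bound, so no manual escaping is needed
        $count += $stmt->rowCount();
    }
    return $count;
}
```

You would fetch the feed body over HTTP first (for example with file_get_contents or cURL), create the PDO connection with your own MySQL DSN and credentials, and then call storeRows($pdo, feedToRows($xml)). Handling of duplicate guids is left out of this sketch.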
I'm not sure how to be more precise.
Are you looking for someone to write the code?
OK, I'm assuming you are talking about "RSS" feeds. Here's a great open-source library that makes it easy: http://simplepie.org/ . Point it at an RSS or Atom feed and it will give you back PHP arrays and objects. From there you can interpret them and save them any way you want.
Depending on what you actually want to do with the database, you could store the RSS itself as an XML CLOB. Not fast, but easy. Again, it totally depends on what you want to do with the database.
