I need to get data from a web site written in PHP. My boss wants me to provide an RSS feed to get updated content.
The problem is that I need to provide several pieces of information (at least a dozen different fields). Is returning the data as plain XML a better approach than RSS?
RSS is a form of XML.
If you find yourself outputting the same sort of data that the RSS specification describes, it definitely doesn't hurt to output it as RSS. That way, you can syndicate your content.
It's really going to depend on what the data is and how it's going to be consumed.
RSS is XML, but it's XML meant to syndicate data in a consistent format (there's a pretty good overview here: http://cyber.law.harvard.edu/rss/rss.html), and that consistency lets feed readers and other consumers know how to process and display it. So if your boss wants to look at this data in his or her feed reader of choice, then go for RSS.
If the data is more varied or arbitrary, and is going to be consumed by some sort of application or other processor on the other end, then XML is probably a better solution.
RSS is an XML schema that's good for publishing articles, news, bulletins. RSS will be very consumable - every device and app, it seems, knows how to consume RSS.
A custom XML schema may fit you better based on your requirements for all the different fields. But you will not have a vast audience of ready consumers for that schema.
Some questions you might ask yourself:
Who do I want to consume this information? A wide variety, basically "everybody"? Or a limited audience, say 3 or 4 partner companies? If the former, that tends to recommend RSS. If the latter, it is neutral.
What information do I really need to convey? Will it fit nicely in the RSS schema? If not, that recommends a custom XML schema. If it does fit in RSS, why not use it?
You have to weigh the importance of your answers to get to a conclusion.
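One way to make the second question concrete is to check which of your fields map onto the item elements that RSS 2.0 defines. The following is only a rough sketch, written in Python for brevity; the field names in `record` are invented for illustration and are not taken from the question.

```python
# Sub-elements that the RSS 2.0 specification defines for an <item>.
RSS_ITEM_ELEMENTS = {
    "title", "link", "description", "author", "category",
    "comments", "enclosure", "guid", "pubDate", "source",
}

def split_fields(record):
    """Return (fields that fit an RSS item, fields that do not)."""
    fits = {k: v for k, v in record.items() if k in RSS_ITEM_ELEMENTS}
    leftover = {k: v for k, v in record.items() if k not in RSS_ITEM_ELEMENTS}
    return fits, leftover

# Hypothetical record: these field names are assumptions for the example.
record = {
    "title": "Price update",
    "link": "http://example.com/items/42",
    "description": "Weekly price change",
    "pubDate": "Mon, 06 Sep 2010 00:01:00 +0000",
    "sku": "A-42",
    "warehouse": "Lyon",
    "stock_level": 17,
}

fits, leftover = split_fields(record)
print("fits RSS:", sorted(fits))
print("needs custom XML:", sorted(leftover))
```

If most of your dozen fields end up in the second group, that points toward a custom XML format; if nearly all of them fit, RSS is probably enough.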
I was wondering whether there is any need to use XML for large web projects, say a social networking site.
Currently I am just coding in normal PHP and HTML files. If I use XML files, will that provide any benefit, like faster document processing or less code to write?
I don't know XML yet. Also, is it very different from HTML?
Where HTML has a fixed set of tags with defined meaning, mostly relating to presentation, in XML you can define your own set of tags with meaning particular to your application or domain.
You probably don't need XML to get started building your social networking site, but down the road you could use it to export a user's social graph in a standard and readily processable form.
Do not look to XML for "enhancing processing speed or reducing coding weight." Look to it for standardized data exchange, especially for document-based data. (JSON will tend to work better for purely performance and coding weight goals; XML will work better for document-based data or where industry standard formats can be leveraged.)
XML looks much like XHTML (which is basically HTML rewritten to follow XML's stricter syntax rules), but it is not used to render web pages, if that's what you're asking (unless you use XSLT).
XML is often used in service-oriented applications, where it carries your data to other services. Apart from that, it is used for configuration. Think of XML as a counterpart to JSON.
XML/JSON = Computer to Computer
HTML = Computer to Human
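To illustrate the "computer to computer" role, here is a small sketch that writes the same record as XML and as JSON and reads both back. It is in Python for brevity, and the `user` record and its field names are assumptions made up for the example.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record: the field names are assumptions for illustration.
user = {"id": "7", "name": "Ada", "city": "London"}

# XML: you pick the tag names yourself (here <user>, <name>, <city>).
root = ET.Element("user", id=user["id"])
ET.SubElement(root, "name").text = user["name"]
ET.SubElement(root, "city").text = user["city"]
xml_text = ET.tostring(root, encoding="unicode")

# JSON: the same data as a plain object.
json_text = json.dumps(user)

print(xml_text)
print(json_text)

# Either form can be parsed back by the program on the other end.
from_xml = ET.fromstring(xml_text)
from_json = json.loads(json_text)
print(from_xml.findtext("name"), from_json["name"])
```

Neither output is meant to be shown to a visitor directly; a page for humans would still be HTML generated from this data.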
There are many websites and blogs that provide RSS feeds, but there are also many that do not. I want to turn those web pages into RSS feeds.
I found some solutions through Google, like Feed43, Page2rss, Dapper, etc., but I want an open source project that can perform this task, or a tutorial explaining how to do it.
Suggestions are welcome, and explanations even more so.
My preferred language is PHP.
There's nothing magic about RSS. I suggest you read this tutorial to understand how to build an RSS feed from scratch:
http://www.xul.fr/en-xml-rss.html
Then use your PHP skills to build one from your content. A generic HTML-to-RSS scraper can be found online by searching for "html to rss converter" or whatever, but most of these will be hosted solutions and the RSS feeds they produce aren't that great. A good RSS feed requires understanding the content that you're syndicating, not just the raw HTML. IMHO.
In general there is not going to be any "one size fits all" solution to something like this. You'll have to examine the HTML structure of the blog you want to build an RSS feed from, then parse out the content you are interested in, and stick it into an RSS feed.
Here are some PHP tools to help get you started:
Parsing HTML:
DOMDocument (swiss-army-knife of HTML/XML parsing)
SimpleXML (easy to use, but requires valid XML)
Tidy (can be used to clean up bad HTML)
Understanding RSS Feeds:
http://en.wikipedia.org/wiki/RSS
To construct them with PHP, you can once again use DOMDocument or SimpleXML. Another option is, depending on the format of the HTML you want to convert into RSS, you may be able to create an XSLT stylesheet to transform it.
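As a rough sketch of the construction step: once the content has been parsed out of the HTML, building the feed is ordinary DOM work. The example below is in Python for brevity (PHP's DOMDocument has equivalent create-and-append calls); `build_rss` and the sample item are hypothetical names and data, not part of any library.

```python
import xml.etree.ElementTree as ET

def build_rss(channel_title, channel_link, channel_description, items):
    """Build a minimal RSS 2.0 document from a list of item dicts."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires these three channel elements.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_description
    for entry in items:
        item = ET.SubElement(channel, "item")
        # Only emit the sub-elements that the scraped entry actually has.
        for tag in ("title", "link", "description", "pubDate"):
            if tag in entry:
                ET.SubElement(item, tag).text = entry[tag]
    return ET.tostring(rss, encoding="unicode")

# Hypothetical scraped content.
items = [
    {
        "title": "First post",
        "link": "http://example.com/blog/first",
        "description": "Text scraped from the page & cleaned up.",
    }
]
feed_xml = build_rss(
    "Example blog", "http://example.com/blog", "Feed generated from HTML", items
)
print(feed_xml)
```

Building the document through an XML API rather than string concatenation means characters such as `&` in the scraped text are escaped for you.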
There is no simple or concrete answer to this question, but I will get you started.
First, you need to build a crawler of sorts. Typically, you are going to want this to be multi-threaded and run in the background on your server. This might be as simple as forking PHP processes on the server, but you might find a more efficient way, depending on how much traffic you expect.
Now probably the best way to start would be to read the DOM (see http://php.net/manual/en/class.domdocument.php ). Look for headings and try to associate them with the paragraphs below them. Beware, though, that probably fewer than half the sites out there (and likely far fewer of the ones that don't already have a feed) structure their pages in an organized way. But it is a place to start.
There are also plenty of element attributes you can use, such as alt text. In time you may also find that a lot of sites use a particular template, which you can write code to handle directly.
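The heading-and-paragraph heuristic can be sketched roughly as follows. This is a simplified illustration in Python using the standard library's `html.parser`; the class name and the sample markup are made up, and real pages will need more cases than this handles.

```python
from html.parser import HTMLParser

class HeadingParagraphParser(HTMLParser):
    """Collect each heading together with the paragraphs that follow it."""

    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.sections = []    # list of {"heading": str, "paragraphs": [str]}
        self._current = None  # tag whose text is being collected, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS or tag == "p":
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return  # closing an inline tag such as </b>; keep collecting
        text = "".join(self._buffer).strip()
        if tag in self.HEADINGS:
            self.sections.append({"heading": text, "paragraphs": []})
        elif self.sections and text:
            # Attach the paragraph to the most recent heading.
            # Paragraphs that appear before any heading are skipped.
            self.sections[-1]["paragraphs"].append(text)
        self._current = None

# Made-up markup standing in for a fetched page.
page = """
<h2>Release 1.2</h2>
<p>Fixes two bugs.</p>
<p>Adds <b>search</b>.</p>
<h2>Release 1.1</h2>
<p>Initial import.</p>
"""

parser = HeadingParagraphParser()
parser.feed(page)
sections = parser.sections
for section in sections:
    print(section["heading"], "->", section["paragraphs"])
```

Each entry in `sections` is a natural candidate for one feed item, with the heading as the title and the paragraphs as the description.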
You should also have something to read existing feeds. If a site has a feed, no sense in generating one for it, right? Use SimplePie to get started, but there are alternatives if you don't like it. http://simplepie.org/
Once you have parsed the page, you'll want a database backend to track each page and its changes.
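One simple way to track changes is to store a hash of each page's extracted content and compare it on the next crawl. The sketch below is in Python and uses SQLite only so that it runs without a server; the table layout and the function names are assumptions, not a prescribed design.

```python
import hashlib
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) the tracking database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        " url TEXT PRIMARY KEY,"
        " content_hash TEXT NOT NULL,"
        " last_changed TEXT NOT NULL)"
    )
    return conn

def record_page(conn, url, content, seen_at):
    """Store the page's hash; return True if the page is new or has changed."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    row = conn.execute(
        "SELECT content_hash FROM pages WHERE url = ?", (url,)
    ).fetchone()
    if row is not None and row[0] == digest:
        return False  # same content as last time, nothing to publish
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, content_hash, last_changed)"
        " VALUES (?, ?, ?)",
        (url, digest, seen_at),
    )
    conn.commit()
    return True

conn = open_store()
url = "http://example.com/news"  # hypothetical page
first = record_page(conn, url, "version one", "2010-09-06")
repeat = record_page(conn, url, "version one", "2010-09-07")
changed = record_page(conn, url, "version two", "2010-09-08")
print(first, repeat, changed)
```

Only the crawls that return True need to produce a new feed item.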
From there, you need something to generate the feed. There are plenty of OOP classes for doing this. Often times, I just write my own, but that is up to you.
If you build sites with the simple Symphony CMS, then yes, it's very easy. See this snippet of a tutorial.
I have been making HTML/PHP/MySQL database apps for quite a while now. I have avoided using XML/XSLT in any application since I just pull the data out and format it within my PHP script, and display it.
Assuming I don't need my data to be portable to other people's applications (via XML), is there any reason to implement an XML/XSLT-based web app, or is it a matter of preference?
Thanks.
I use XML/XSLT as a template engine.
Throughout my script, I gather my data as nodes and put them in an XML object. When I need to display data, I feed this XML object to an XSLT stylesheet and display the result.
It is a matter of preference.
XML/XSLT are useful when transforming XML to multiple other XML formats (RSS, XHTML, etc.), so if you don't need this kind of functionality, don't go with it.
They also add a cost in complexity and processing power. Again, if you don't need it, don't use them.
Just a quick question: I know how I would build a CMS using a database, but why would you want to create a CMS with XML?
What are the pros and cons of using XML? Also, if I were to build a CMS with XML, would I still need a database, or does XML remove the need for one?
I haven't seen a CMS without a database in a while.
I think most of those were developed because "a long time ago" you didn't always get access to a database when purchasing/renting webspace.
You might be interested in storing your data in a changing format. XML definitely allows that - being able to define your own tags at will is somewhat akin to being able to add and remove columns without migrating data.
XML can remove the usage of a database - but as the size of the XML file grows, lookup and search become ever more costly. For a personal content management system - especially one where you are looking at the beginning of a file in your most common use case - it could be an acceptable solution.
Making a CMS like this would be something like using TiddlyWiki, which is a single HTML file that hosts an entire wiki.
For an even slightly larger-scale CMS, I would immediately opt for a database, probably SQLite at the smaller end, because it's the thing to do nowadays.
I've got a number of REST feeds I'd like to store in a MySQL database. Can anyone suggest a solution for this? Something PHP-related would be appreciated.
It's not PHP-related, but Perl has both a REST interface and a DBI interface (for talking to MySQL).
http://metacpan.org/pod/WWW::REST
There are many other REST interfaces for Google, Twitter, etc. Just search CPAN modules at search.cpan.org
To my knowledge there is no such thing as a REST feed. There are RSS feeds and Atom feeds, so I will assume you are talking about one of those.
Both are based on XML, so I suggest you find an XML parser for PHP, make an HTTP request to get the feed contents, parse the XML into a DOM, and then copy the DOM data into MySQL.
I'm not sure how to be more precise.
Are you looking for someone to write the code?
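For what it's worth, the parse-and-store steps can be sketched in a few lines. This is a rough illustration in Python, not PHP, and it makes several assumptions: the feed is RSS 2.0, the HTTP fetch is replaced by a hard-coded string, SQLite stands in for MySQL so the example runs without a server, and the table and function names are invented.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Stand-in for the body of an HTTP response; a real script would fetch this.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <description>Sample</description>
    <item>
      <title>First</title>
      <link>http://example.com/1</link>
      <guid>http://example.com/1</guid>
    </item>
    <item>
      <title>Second</title>
      <link>http://example.com/2</link>
      <guid>http://example.com/2</guid>
    </item>
  </channel>
</rss>"""

def store_feed(conn, feed_xml):
    """Insert each <item> of an RSS 2.0 document; return how many rows were added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feed_items ("
        " guid TEXT PRIMARY KEY, title TEXT, link TEXT)"
    )
    before = conn.execute("SELECT COUNT(*) FROM feed_items").fetchone()[0]
    for item in ET.fromstring(feed_xml).iter("item"):
        # Fall back to the link when an item carries no <guid>.
        guid = item.findtext("guid") or item.findtext("link")
        # SQLite syntax; the MySQL spelling is INSERT IGNORE.
        conn.execute(
            "INSERT OR IGNORE INTO feed_items (guid, title, link) VALUES (?, ?, ?)",
            (guid, item.findtext("title"), item.findtext("link")),
        )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM feed_items").fetchone()[0]
    return after - before

conn = sqlite3.connect(":memory:")
added_first = store_feed(conn, FEED)
added_again = store_feed(conn, FEED)  # same feed again: nothing new to add
print(added_first, added_again)
```

Keying the table on the item's guid means the script can be run repeatedly against the same feed without creating duplicate rows.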
OK, I'm assuming you are talking about "RSS" feeds. Here's a great open source library that makes it easy: http://simplepie.org/ . Point it at an RSS or Atom feed, and it will give you back PHP arrays and objects. From there you can interpret them and save them any way you want.
Depending on what you actually want to do with the database, you could store the RSS itself as an XML CLOB. Not fast, but easy. Again, it totally depends on what you want to do with the database.