What is the use of XML in web-based applications? - php

I was wondering whether there is any need to use XML for large web projects, say a social networking site.
Currently I'm just coding in plain PHP and HTML files. If I use XML files, will that provide any convenience, like faster document processing or less code to write?
I don't know XML yet. Also, is it very different from HTML?

Where HTML has a fixed set of tags with defined meaning, mostly relating to presentation, in XML you can define your own set of tags with meaning particular to your application or domain.
You probably don't need XML to get started building your social networking site, but down the road you could use it to export a user's social graph in a standard and readily processable form.
Do not look to XML for "enhancing processing speed or reducing coding weight." Look to it for standardized data exchange, especially for document-based data. (JSON will tend to work better for purely performance and coding weight goals; XML will work better for document-based data or where industry standard formats can be leveraged.)
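As a rough illustration of the data-exchange use described above, here is a minimal sketch that exports the same data as XML and as JSON. The data, the `socialGraph` and `follows` element names, and the variable names are all made up for the example; they are not any standard format.

```php
<?php
// Hypothetical data: a user and the people they follow.
$graph = [
    'user'    => 'alice',
    'follows' => ['bob', 'carol'],
];

// XML export using PHP's bundled DOM extension.
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true;
$root = $doc->createElement('socialGraph'); // our own tag, defined for this application
$root->setAttribute('user', $graph['user']);
$doc->appendChild($root);
foreach ($graph['follows'] as $name) {
    $follows = $doc->createElement('follows');
    // Text nodes are escaped automatically when the document is serialized.
    $follows->appendChild($doc->createTextNode($name));
    $root->appendChild($follows);
}
$xml = $doc->saveXML();

// JSON export of the same data.
$json = json_encode($graph);

echo $xml, "\n", $json, "\n";
```

The JSON version is a single function call, which is the "coding weight" point made above; the XML version takes more code but lets you define a vocabulary of your own.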

XML and XHTML share the same syntax: XHTML is essentially HTML rewritten to follow XML's stricter rules. XML itself is not used to render web pages, if that's what you're asking (unless you use XSLT).
XML is often used in service-oriented applications to provide your data to other services; apart from that, it's used for configuration. Think of XML as a counterpart to JSON.
XML/JSON = Computer to Computer
HTML = Computer to Human

Related

how to get a script tag value with php [duplicate]

I'm looking for a way to make a small preview of another page from a URL given by the user in PHP.
I'd like to retrieve only the title of the page, an image (like the logo of the website) and a bit of text or a description if it's available. Is there any simple way to do this without any external libraries/classes? Thanks
So far I've tried using the DOMDocument class, loading the HTML and displaying it on the screen, but I don't think that's the proper way to do it.
I recommend you consider simple_html_dom for this. It will make it very easy.
Here is a working example of how to pull the title, and first image.
<?php
require 'simple_html_dom.php';
$html = file_get_html('http://www.google.com/');
$title = $html->find('title', 0);
$image = $html->find('img', 0);
echo $title->plaintext."<br>\n";
echo $image->src;
?>
Here is a second example that will do the same without an external library. I should note that using regex on HTML is NOT a good idea.
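<?php
$data = file_get_contents('http://www.google.com/');
if ($data === false) {
    exit('Could not fetch the page');
}
// Fall back to an empty string when a pattern does not match,
// instead of reading an undefined $matches[1].
$title = preg_match('/<title[^>]*>([^<]+)<\/title>/i', $data, $matches) ? $matches[1] : '';
$img = preg_match('/<img[^>]*src=[\'"]([^\'"]+)[\'"][^>]*>/i', $data, $matches) ? $matches[1] : '';
echo $title."<br>\n";
echo $img;
?>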
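A third option sits between the two examples above: PHP's bundled DOM extension (DOMDocument plus DOMXPath) parses the markup properly without any third-party library. The sketch below is only an illustration; the `pagePreview` function name and the inline sample markup are assumptions, and in practice the string would come from something like `file_get_contents($url)`.

```php
<?php
// Extract a title, first image and meta description using only bundled extensions.
function pagePreview(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    return [
        // string(...) yields the value of the first matching node, or '' if none.
        'title'       => trim($xpath->evaluate('string(//title)')),
        'image'       => $xpath->evaluate('string(//img/@src)'),
        'description' => $xpath->evaluate('string(//meta[@name="description"]/@content)'),
    ];
}

// Inline sample markup stands in for a fetched page.
$sample = '<html><head><title> Example page </title>'
        . '<meta name="description" content="A short summary."></head>'
        . '<body><img src="/logo.png" alt="logo"><p>Hello</p></body></html>';

$preview = pagePreview($sample);
print_r($preview);
```

This will not cope with every malformed page, and it picks the first image rather than a real logo, so treat it as a starting point.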
You may use any of these libraries. Each one has pros and cons, so consult the notes on each or take the time to try them yourself:
Guzzle: A standalone HTTP client; it can use cURL when available but does not require it.
Goutte: Built on Guzzle and some Symfony components by the developer of Symfony.
hQuery: A fast scraper with caching capabilities; performs well when scraping large documents.
Requests: Known for its user-friendly API.
Buzz: A lightweight client, ideal for beginners.
ReactPHP: An event-driven library for asynchronous requests, with comprehensive tutorials and examples.
It's best to check them all and use each one where it fits best.
This question is fairly old but still ranks very highly on Google Search results for web scraping tools in PHP. Web scraping in PHP has advanced considerably in the intervening years since the question was asked. I actively maintain the Ultimate Web Scraper Toolkit, which hasn't been mentioned yet but predates many of the other tools listed here except for Simple HTML DOM.
The toolkit includes TagFilter, which I actually prefer over other parsing options because it uses a state engine to process HTML with a continuous streaming tokenizer for precise data extraction.
To answer the original question of, "Is there any simple way to do this without any external libraries/classes?" The answer is no. HTML is rather complex and there's nothing built into PHP that's particularly suitable for the task. You really need a reusable library to parse generic HTML correctly and consistently. Plus you'll find plenty of uses for such a library.
Also, a really good web scraper toolkit will have three major, highly-polished components/capabilities:
1. Data retrieval. This is making an HTTP(S) request to a server and pulling down data. A good web scraping library will also allow for large binary data blobs to be written directly to disk as they come down off the network instead of loading the whole thing into RAM. The ability to do dynamic form extraction and submission is also very handy. A really good library will let you fine-tune every aspect of each request to each server as well as look at the raw data it sent and received on the wire. Some web servers are extremely picky about input, so being able to accurately replicate a browser is handy.
2. Data extraction. This is finding pieces of content inside retrieved HTML and pulling them out, usually to store them in a database for future lookups. A good web scraping library will also be able to correctly parse any semi-valid HTML thrown at it, including Microsoft Word HTML and ASP.NET output where odd things show up like a single HTML tag that spans several lines. The ability to easily extract all the data from poorly designed, complex, classless tags like ASP.NET HTML table elements that some overpaid government employees made is also very nice to have (i.e. the extraction tool has more than just a DOM or CSS3-style selection engine available). Also, in your case, the ability to early-terminate both the data retrieval and data extraction after reading in 50KB or as soon as you find what you are looking for is a plus, which could be useful if someone submits a URL to a 500MB file.
3. Data manipulation. This is the inverse of #2. A really good library will be able to modify the input HTML document several times without negatively impacting performance. When would you want to do this? Sanitizing user-submitted HTML, transforming content for a newsletter or sending other email, downloading content for offline viewing, or preparing content for transport to another service that's finicky about input (e.g. sending to Apple News or Amazon Alexa). The ability to create a custom HTML-style template language is also a nice bonus.
Obviously, Ultimate Web Scraper Toolkit does all of the above...and more:
I also like my toolkit because it comes with a WebSocket client class, which makes scraping WebSocket content easier. I've had to do that a couple of times.
It was also relatively simple to turn the clients on their heads and make WebServer and WebSocketServer classes. You know you've got a good library when you can turn the client into a server....but then I went and made PHP App Server with those classes. I think it's becoming a monster!
You can use SimpleHtmlDom for this, and then look for the title and img tags or whatever else you need.
I like the Dom Crawler library. Very easy to use, has lots of options like:
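use Symfony\Component\DomCrawler\Crawler;

// $crawler is assumed to be an existing Crawler instance
$crawler = $crawler
    ->filter('body > p')
    ->reduce(function (Crawler $node, $i) {
        // keeps every other node (those at even indexes)
        return ($i % 2) == 0;
    });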

What is the relation between PHP and XML?

I'm learning about PHP and web coding.
Specifically, the PHP book I'm using that covers PHP 5.3 (by Matt Doyle and published by Wrox), says:
XML ... lets you create text documents that can hold data in a structured way...
XML isn't really a language but rather a specification for creating your own markup languages...
Wikipedia says of XML:
As of 2009, hundreds of XML-based languages have been developed,[8] including RSS, Atom, SOAP, and XHTML. XML-based formats have become the default for many office-productivity tools, including Microsoft Office (Office Open XML), OpenOffice.org and LibreOffice (OpenDocument), and Apple's iWork.[9] XML has also been employed as the base language for communication protocols, such as XMPP.
It sounds like XML is more like a protocol, a standard for allowing computers to communicate and share information.
So XML is like a grammar I can use to create a markup language, but the language I create only formats data?
I want help defining the relationship between PHP and XML.
At what point during the processing of PHP and HTML does XML get parsed?
XML is not a grammar (that's another thing entirely). XML (as the name suggests) is a markup language that essentially defines a set of rules that describe something. The "something" could be a protocol, the structure of a document, or any kind of data. XML is designed to be machine readable and human readable (although in my opinion, with bias towards the former ;)).
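XML documents can be described by a schema, which defines the structure the XML is expected to have. Every XML document must be well-formed (syntactically correct); you can additionally validate it against a schema to make sure that it is valid, that is, that it follows the declared structure.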
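To make the schema idea concrete, here is a small sketch using PHP's DOM extension. Parsing checks only that a document is syntactically correct XML; validating it against a schema is a separate step that checks its structure. The `note` schema and the `describeXml` helper are invented for this example.

```php
<?php
libxml_use_internal_errors(true); // keep parser messages out of the output

// A tiny hypothetical schema: <note> must contain one <to> followed by one <body>.
$xsd = <<<'XSD'
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

function describeXml(string $xml, string $xsd): string
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {           // syntax check only
        libxml_clear_errors();
        return 'not well-formed';
    }
    $valid = $doc->schemaValidateSource($xsd); // structure check against the schema
    libxml_clear_errors();
    return $valid ? 'well-formed and valid' : 'well-formed but invalid';
}

echo describeXml('<note><to>Bob</to><body>Hi</body></note>', $xsd), "\n";
echo describeXml('<note><to>Bob</to></note>', $xsd), "\n";   // <body> is missing
echo describeXml('<note><to>Bob</note>', $xsd), "\n";        // <to> is never closed
```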
There is no relation between PHP and XML. XML is something that PHP can consume and produce. There is nowhere during processing that PHP consumes or produces XML unless you explicitly tell PHP to do so.
XML is sometimes used as a sort of "glue" that allows dissimilar or disparate systems to communicate with each other, but even that is just one of its functions. For example, PHP can consume XML produced by a program written in another language entirely, or XML produced by some website. PHP can also produce XML which can then be consumed by a program written in another language, or by some other source. As you found from the Wikipedia article, SOAP uses XML and this allows clients written in different languages to consume data exposed by a SOAP service.
XML gives a starting point for a lot of technologies, particularly web technologies.
While PHP can be used for other things, its origins are in the web and it is still most heavily used there. As such, it would be sorely lacking if it couldn't deal with such a core web technology as XML. Likewise, it has support for other key web technologies like URIs, and those heavily used with the web like streams and database connections.
XML is only a data format specification. It was once heavily hyped but has somewhat faded in favor of JSON. It is still VERY popular, because many protocols use it as their data interchange format.
PHP is generic enough (as a programming language) to generate XML as well as any other data format.
Since XML is such an important data format, every respectable programming language is expected to easily consume XML as well, and PHP is no exception.
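A short sketch of what "consume" and "produce" can look like in practice, using the bundled SimpleXML extension. The `users` and `ack` documents are invented for illustration; the point is that XML is only parsed or generated where the script explicitly asks for it.

```php
<?php
// XML as it might arrive from another system (hypothetical list of users).
$incoming = '<users><user id="1"><name>Ann</name></user>'
          . '<user id="2"><name>Raj</name></user></users>';

// Consume: parse the string and read elements and attributes.
$users = simplexml_load_string($incoming);
$names = [];
foreach ($users->user as $user) {
    $names[(int) $user['id']] = (string) $user->name;
}

// Produce: build a reply document for some other consumer.
$reply = new SimpleXMLElement('<ack/>');
$reply->addChild('count', (string) count($names));
$outgoing = $reply->asXML();

print_r($names);
echo $outgoing;
```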

Is there any reason to use XML/XSLT when using a PHP/MySQL App?

I have been making HTML/PHP/MySQL database apps for quite a while now. I have avoided using XML/XSLT in any application since I just pull the data out and format it within my PHP script, and display it.
Assuming I am not wanting my data to be portable to other people's applications (via XML), is there any reason to implement an XML/XSLT based web app or is it a matter of preference?
Thanks.
I use XML/XSLT as a template engine.
Throughout my script, I gather my data as nodes and put them in an XML object. When I need to display data, I feed this XML object to an XSLT stylesheet and display the result.
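The approach can be sketched roughly as follows. This is not the answerer's actual code: the `page`, `title` and `item` element names, the stylesheet, and the `renderWithXslt` helper are assumptions made for the example, and it needs PHP's optional xsl extension.

```php
<?php
// Render a DOMDocument through an XSLT stylesheet and return the result as a string.
function renderWithXslt(DOMDocument $data, string $stylesheet): string
{
    $xsl = new DOMDocument();
    $xsl->loadXML($stylesheet);
    $processor = new XSLTProcessor();
    $processor->importStylesheet($xsl);
    return (string) $processor->transformToXml($data);
}

// 1. Gather data as nodes (hypothetical element names).
$data = new DOMDocument();
$page = $data->appendChild($data->createElement('page'));
$page->appendChild($data->createElement('title', 'Hello'));
foreach (['first', 'second'] as $text) {
    $page->appendChild($data->createElement('item', $text));
}

// 2. The "template": a stylesheet that turns <page> into HTML.
$stylesheet = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="no"/>
  <xsl:template match="/page">
    <div><h1><xsl:value-of select="title"/></h1>
      <ul><xsl:for-each select="item"><li><xsl:value-of select="."/></li></xsl:for-each></ul>
    </div>
  </xsl:template>
</xsl:stylesheet>
XSL;

// 3. Render, but only when the xsl extension is actually loaded.
$html = class_exists('XSLTProcessor') ? renderWithXslt($data, $stylesheet) : null;
echo $html ?? "The xsl extension is not enabled\n";
```

The data-gathering code never touches HTML, and the stylesheet never touches PHP, which is the separation a template engine is meant to give you.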
It is a matter of preference.
XML/XSLT are useful when transforming XML to multiple other XML formats (rss, xhtml etc...), so if you don't need this kind of functionality, don't go with it.
They also add a cost in complexity and processing power. Again, if you don't need it, don't use them.

XML Content Management System

Just a quick question: I know how I would build a CMS using a database, but why would you want to create a CMS with XML?
What are the pros and cons of using XML? Also, if I were to build a CMS with XML, would I still need a database, or does XML remove the need for one?
I haven't seen a CMS without a database in a while.
I think most of those were developed because "a long time ago" you didn't always get access to a database when purchasing/renting webspace.
You might be interested in storing your data in a changing format. XML definitely allows that - being able to define your own tags at will is somewhat akin to being able to add and remove columns without migrating data.
XML can remove the usage of a database - but as the size of the XML file grows, lookup and search become ever more costly. For a personal content management system - especially one where you are looking at the beginning of a file in your most common use case - it could be an acceptable solution.
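To show where the lookup cost comes from, here is a minimal sketch of an XML-backed page store. The `pages` layout, the `slug` attribute and the `findPage` function are hypothetical, and a real CMS would read the XML from a file rather than an inline string.

```php
<?php
// A hypothetical flat-file store: every page lives in one XML document.
$store = <<<'XML'
<pages>
  <page slug="home"><title>Welcome</title><body>Hello there.</body></page>
  <page slug="about"><title>About</title><body>Who we are.</body></page>
</pages>
XML;

// Look up a page by slug. The whole document is parsed and scanned on
// every call, so the cost grows with the size of the store; a database
// would answer the same question from an index.
function findPage(string $storeXml, string $slug): ?array
{
    $pages = simplexml_load_string($storeXml);
    foreach ($pages->page as $page) {
        if ((string) $page['slug'] === $slug) {
            return [
                'title' => (string) $page->title,
                'body'  => (string) $page->body,
            ];
        }
    }
    return null; // no page with that slug
}

print_r(findPage($store, 'about'));
var_dump(findPage($store, 'missing'));
```

Pages near the top of the document are found sooner, which matches the "beginning of a file" use case mentioned above.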
Making a CMS like this would be something like using TiddlyWiki, which is a single html file that hosts an entire wiki.
For even slightly larger scale CMS, I would immediately opt for a database - probably SQLite for smaller scale, because it's the thing to do nowadays.

RSS or XML

I need to get data from a web site written in PHP. My boss wants me to provide an RSS feed to get updated content.
The problem is that I need to provide several pieces of information (at least a dozen different fields). Is returning the data as plain XML better than using RSS?
RSS is a form of XML.
If you find yourself outputting the same sorts of data as the RSS specification covers, it definitely doesn't hurt to output in the RSS format. That way, you can syndicate your content.
It's really going to depend on what the data is and how it's going to be consumed.
RSS is XML, but it's XML meant to syndicate data with a consistent format (there's a pretty good overview here: http://cyber.law.harvard.edu/rss/rss.html), and that allows feed readers and other consumers to know how to process and display them. So if your boss wants to look at this data in his or her feed reader of choice, then go for RSS.
If the data is more varied or arbitrary, and is going to be consumed by some sort of application or other processor on the other end, then XML is probably a better solution.
RSS is an XML schema that's good for publishing articles, news, bulletins. RSS will be very consumable - every device and app, it seems, knows how to consume RSS.
A custom XML schema may fit you better based on your requirements for all the different fields. But you will not have a vast audience of ready consumers for that schema.
Some questions you might ask yourself:
Who do I want to consume this information? A wide variety, basically "everybody"? Or is it going to be consumed by a limited audience, let's say 3 or 4 partner companies? If the former, that tends to recommend RSS. If the latter, it is neutral.
What information do I really need to convey? Will it fit nicely in the RSS schema? If not, that recommends a custom XML schema. If it does fit in RSS, why not use it?
You have to weigh the importance of your answers to get to a conclusion.
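The two options are not mutually exclusive: RSS 2.0 allows extra elements in a feed as long as they are defined in an XML namespace, so fields that RSS does not cover can ride along in an otherwise ordinary feed. The sketch below is one possible way to do that; the `inv` prefix, the namespace URI and the `sku` and `stock` fields are invented for the example.

```php
<?php
// Hypothetical record with more fields than core RSS 2.0 items define.
$record = [
    'title' => 'Widget restocked',
    'link'  => 'http://example.com/widget',
    'sku'   => 'W-100',
    'stock' => '42',
];
$ns = 'http://example.com/ns/inventory'; // made-up namespace URI

$doc = new DOMDocument('1.0', 'UTF-8');
$rss = $doc->appendChild($doc->createElement('rss'));
$rss->setAttribute('version', '2.0');
// Declare the custom namespace once on the root element.
$rss->setAttributeNS('http://www.w3.org/2000/xmlns/', 'xmlns:inv', $ns);

// A channel needs a title, link and description.
$channel = $rss->appendChild($doc->createElement('channel'));
$channelInfo = [
    'title'       => 'Inventory updates',
    'link'        => 'http://example.com/',
    'description' => 'Stock changes',
];
foreach ($channelInfo as $tag => $text) {
    $channel->appendChild($doc->createElement($tag))
            ->appendChild($doc->createTextNode($text));
}

$item = $channel->appendChild($doc->createElement('item'));
foreach (['title', 'link'] as $tag) {       // standard RSS fields
    $item->appendChild($doc->createElement($tag))
         ->appendChild($doc->createTextNode($record[$tag]));
}
foreach (['sku', 'stock'] as $tag) {        // custom, namespaced fields
    $item->appendChild($doc->createElementNS($ns, "inv:$tag"))
         ->appendChild($doc->createTextNode($record[$tag]));
}

$feed = $doc->saveXML();
echo $feed;
```

Ordinary feed readers should simply ignore the namespaced elements, while your own consumer can read them.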
