I want to make a news site that gets its content from other news sites: open the RSS feed, fetch each URL, open the HTML DOM of the page, and then get just the text of the news.
I think I have to use PHP's DOMDocument class?
<?php
$doc = new DOMDocument();
$doc->loadHTML("<html><body>Test<br></body></html>");
echo $doc->saveHTML();
?>
http://www.php.net/manual/en/class.domdocument.php
RSS feeds are XML. To get the links here I would use simpleXML. To load the page you can use cURL or HttpRequest.
To analyse the returned code I would use DOMDocument, too! Alternatively you could use simpleHtmlDom.
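A rough sketch of that pipeline, assuming the feed URL and the id of the article container ('article-body') are placeholders you would replace for each target site:
<?php
// Read the RSS feed with SimpleXML
$feed = simplexml_load_file('http://example.com/rss.xml');

foreach ($feed->channel->item as $item) {
    // Fetch the linked article with cURL
    $ch = curl_init((string) $item->link);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $html = curl_exec($ch);
    curl_close($ch);

    // Parse the article HTML; real-world pages are rarely valid, so silence warnings
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    // 'article-body' is an assumed id -- adjust it to the target site's markup
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query("//div[@id='article-body']") as $node) {
        echo trim($node->textContent), "\n";
    }
}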
Related
With PHP's file_get_contents() I only want the post and image, but it fetches the whole page. (I know there are other ways to do this.)
Example:
$homepage = file_get_contents('http://www.bdnews24.com/details.php?cid=2&id=221107&hb=5',
true);
echo $homepage;
It shows the full page. Is there any way to show only the post for cid=2&id=221107&hb=5?
Thanks a lot.
Use PHP's DomDocument to parse the page. You can filter it more if you wish, but this is the general idea.
$url = 'http://www.bdnews24.com/details.php?cid=2&id=221107&hb=5';
// Create new DomDocument
$doc = new DomDocument();
// Real pages are rarely valid HTML, so keep libxml warnings quiet
libxml_use_internal_errors(true);
$doc->loadHTMLFile($url);
// Get the post
$post = $doc->getElementById('opage_mid_left');
var_dump($post);
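If you want the markup of just that element rather than an object dump, you can hand the node to saveHTML() (available since PHP 5.3.6); a minimal follow-up sketch:
// Print only the matched element's HTML instead of var_dump()'ing the object
if ($post !== null) {
    echo $doc->saveHTML($post);
}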
Update:
Unless the image is a requirement, I'd use the printer-friendly version: http://www.bdnews24.com/pdetails.php?id=221107, it's much cleaner.
You will need to parse the resulting HTML using a DOM parser to get the HTML of only the part you want. I like PHP Simple HTML DOM Parser, but as Paul pointed out, PHP also has its own.
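For comparison, a hedged sketch of the same extraction with Simple HTML DOM (the include path is an assumption; opage_mid_left is the container id used in the answer above):
include 'simple_html_dom.php'; // adjust to wherever the library lives

$html = file_get_html('http://www.bdnews24.com/details.php?cid=2&id=221107&hb=5');
$post = $html->find('#opage_mid_left', 0); // first element with that id
echo $post ? $post->innertext : 'post container not found';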
You can extract the
<div id="page">
//POST AND IMAGE EXIST HERE
</div>
part from the fetched contents using a regex and push it onto your page; a rough sketch follows...
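A minimal sketch of that regex approach, with the caveat that it breaks as soon as the div contains nested div tags, which is why a DOM parser is usually the safer choice:
// Fragile: only matches up to the first closing </div>, so nested divs break it
if (preg_match('/<div id="page">(.*?)<\/div>/s', $homepage, $matches)) {
    echo $matches[1]; // the post and image markup
}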
I'm not sure whether this is possible or not.
I want a PHP script that, when executed, goes to a page (on a different domain), gets its HTML contents, and extracts the href of each link it finds in the HTML.
html code:
<div id="somediv">
Yahoo
Google
Facebook
</div>
The output (which PHP will echo out) should be
http://yahoo.com
http://google.com
http://facebook.com
I have heard that cURL in PHP can do something like this, but not exactly this; I'm a bit confused, and I hope someone can guide me on this.
Thanks.
Have a look at something like http://simplehtmldom.sourceforge.net/
Using DOM and XPath:
<?php
$doc = new DOMDocument();
$doc->loadHTMLFile("http://www.example.com/"); // or you could load from a string using loadHTML();
$xpath = new DOMXpath($doc);
$elements = $xpath->query("//div[@id='somediv']//a");
foreach ($elements as $elem) {
    echo $elem->getAttribute('href'), "\n";
}
BTW: you should read up on DOM and XPath.
I have a problem converting an HTML page to XML so that I can address a specific tag name and access the data inside that tag. I tried XMLHttpRequest, but it doesn't work. Then I tried XMLHttpRequest's responseText and converting the string to XML with a DOM parser, but that doesn't work either (parsing errors). I will need to use a PHP proxy which will convert the text to XML, and that is where I need help...
Thanks for answers!
If I got it right, you can retrieve the HTML with file_get_contents() and then traverse it with DOMDocument.
Example:
<?php
$doc = new DOMDocument();
// Silence warnings from badly-formed real-world HTML
libxml_use_internal_errors(true);
$doc->loadHTML(file_get_contents($file));
libxml_clear_errors();
// Grab every element; pass a specific tag name (e.g. 'p') to narrow it down
$elements = $doc->getElementsByTagName('*');
?>
Browsers do a very good job of converting non-XML markup to a DOM:
1. Load the desired document with XMLHttpRequest.
2. Insert responseText into an HTML div element with elem.innerHTML = xhr.responseText.
3. Access the children using the DOM API.
I need to create a PHP script.
The idea is very simple:
When I send a link to a blog post to this PHP script, the webpage is crawled and the first image and the page title are saved on my server.
Which PHP functions do I have to use for this crawler?
Use PHP Simple HTML DOM Parser
// Simple HTML DOM has to be included first (adjust the path as needed)
include 'simple_html_dom.php';
// Create DOM from URL
$html = file_get_html('http://www.example.com/');
// Find all images
$images = array();
foreach($html->find('img') as $element) {
$images[] = $element->src;
}
Now the $images array holds the image links from the given webpage, and you can store the image you want in your database.
HTML Parser: HTMLSQL
Features: you can fetch an external HTML file over an HTTP or FTP link and parse its content.
Well, you'll have to use quite a few functions :)
But I'm going to assume that you're asking specifically about finding the image, and say that you should use a DOM parser like Simple HTML DOM Parser, then curl to grab the src of the first img element.
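A possible sketch of that approach, assuming the Simple HTML DOM include path and the post URL are placeholders, and that the page actually contains an img element:
include 'simple_html_dom.php'; // adjust to wherever the library lives

// Fetch the blog post with cURL
$ch = curl_init('http://example.com/blog-post');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$page = curl_exec($ch);
curl_close($ch);

// Parse it and pull out the page title and the first image's src
$html  = str_get_html($page);
$title = $html->find('title', 0)->plaintext;
$img   = $html->find('img', 0)->src;   // download this URL to save the image itself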
I would use file_get_contents() and a regular expression to extract the first image tag's src attribute.
cURL or an HTML parser seems like overkill in this case, but you are welcome to check them out.
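Something along these lines, with the caveat that a regex like this only copes with straightforward markup (the URL is a placeholder):
$html = file_get_contents('http://example.com/blog-post');

// Grab the src of the first <img> tag; unusual quoting or attribute order can break this
if (preg_match('/<img[^>]+src=["\']([^"\']+)["\']/i', $html, $m)) {
    $firstImageSrc = $m[1];
}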
I have a URL I want to grab. I only want a short piece of content from it. The content in question is in a div that has an ID of sample.
<div id="sample">
Content
</div>
I can grab the file like so:
$url= file_get_contents('http://www.example.com/');
But how do I select just that sample div?
Any ideas?
I'd recommend using the PHP Simple HTML DOM Parser.
Then you can do:
$html = file_get_html('http://www.example.com/');
echo $html->find('div#sample', 0)->innertext;
I would recommend something like Simple HTML DOM, although if you are very sure of the format, you may wish to look at using regex to extract the data you want.
A while ago, I released an open source library named PHPPowertools/DOM-Query, which allows you to (1) load an HTML file and then (2) select or change parts of your HTML much like you'd do it with jQuery.
Using that library, here's how you'd select the sample div for your example:
use \PowerTools\DOM_Query;
// Get file content
$htmlcode = file_get_contents('http://www.example.com/');
// Create a new DOM_Query object
$H = new DOM_Query($htmlcode);
// Find the elements that match selector "div#sample"
$s = $H->select('div#sample');