Get contents of a div from a URL [duplicate] - php

This question already has answers here:
Possible Duplicate:
How to implement a web scraper in PHP?
How to parse and process HTML with PHP?
Closed 11 years ago.
I need to crawl through a page and get the contents of a particular div. I have PHP and JavaScript as my two main options. How can it be done?

There are many ways to get the contents of a URL:
First method: Simple HTML DOM Parser
http://simplehtmldom.sourceforge.net/
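A minimal sketch of that first method (the URL and the selector are placeholders, not from the original question):
<?php
// Hedged sketch using Simple HTML DOM; adjust URL and selector to taste.
require_once('simple_html_dom.php');

$html = file_get_html('http://www.example.com/');

// find() takes CSS-style selectors; the 0 returns the first match only
$div = $html->find('div#content', 0);
if ($div) {
    echo $div->innertext;
}
?>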
Second method:
<?php
$contents = file_get_contents("http://www.url.com");
// Keep only <div> tags, then capture each div's text content
$contents = strip_tags($contents, "<div>");
preg_match_all("/<div[^>]*>([^<]*)<\/div>/is", $contents, $file_contents);
?>
Third method:
You can use jQuery-like selectors:
http://api.jquery.com/category/selectors/

This is quite a basic way to do it in PHP, and it returns the content as plain text. However, you might consider revising the regex for your particular need.
<?php
$link = file_get_contents("http://www.domain.com");
// Keep only <div> tags, then capture each div's text content
$file = strip_tags($link, "<div>");
preg_match_all("/<div[^>]*>([^<]*)<\/div>/is", $file, $content);
print_r($content);
?>

You can use Simple HTML DOM Parser as documented here: http://simplehtmldom.sourceforge.net/manual.htm
It requires PHP 5+, but the nice thing is that you can find tags on an HTML page with selectors, just like jQuery.

Specifically with jQuery, if you have a div like the following:
<div id="cool_div">Some content here</div>
You could use jQuery to get the contents of the div like this:
$('#cool_div').text(); // will return text version of contents...
$('#cool_div').html(); // will return HTML version of contents...
If you're using PHP to generate the content of the page, then you should be able to get a decent handle on the content and manipulate it before it's ever returned and displayed. Hope this helps!
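A small sketch of that idea, assuming PHP itself produces the markup (the div and the replacement text are made up for illustration):
<?php
// Sketch: capture PHP's own output and adjust it before it is sent.
ob_start();

echo '<div id="cool_div">Some content here</div>';

$page = ob_get_clean();                          // everything buffered so far
$page = str_replace('Some content here', 'New content', $page);
echo $page;                                      // send the modified page
?>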

Using PHP, you can try the DOMDocument class and its getElementById() / getElementsByTagName() methods.
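A minimal DOMDocument sketch (the URL and the id "cool_div" are placeholders):
<?php
// Sketch: DOMDocument ships with PHP, so no external library is needed.
libxml_use_internal_errors(true);    // real-world HTML is rarely valid

$doc = new DOMDocument();
$doc->loadHTML(file_get_contents('http://www.example.com/'));

$div = $doc->getElementById('cool_div');
if ($div !== null) {
    echo $doc->saveHTML($div);       // the div and its contents as HTML
}
?>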

Related

How to get desired innertext from an HTML tag in Simple HTML DOM

I have some text that contains HTML links. I want to get the text of the last link. Here is an example:
Some text<a href="http://beezfeed.cu.ma">Beezfeed.cu.ma</a><br>
another text<a href="http://google.com">Google.com</a><br>
I want to get the Google.com text from the above code. I have tried to use Simple HTML DOM. Anyway, here is my code:
<?php
require_once('simple_html_dom.php');

function tags($ddd) {
    // find('a') with no index returns ALL anchors; end() takes the last one
    $anchors = $ddd->find('a');
    $last = end($anchors);
    return $last ? $last->innertext : '';
}

$html = str_get_html('Some text<a href="http://beezfeed.cu.ma">Beezfeed.cu.ma</a><br>
another text<a href="http://google.com">Google.com</a><br>');

echo tags($html); // Google.com
?>
I want to get Google.com. How can I do that? Please help me.
I strongly recommend you use an external library to parse HTML, whatever your needs are today or in the future.
Some very good tools are named in these Stack Overflow posts.
I personally have used simplehtmldom.sourceforge.net for ages, with very good results.

Simple HTML DOM can't see all hrefs

I'm trying to retrieve the YouTube link from a certain site, but when using the Simple HTML DOM parser it can't find the links I'm looking for.
$new_html = file_get_html("https://www.bia2.com/video/Amir-Shamloo/Delam-Tange/");
foreach ($new_html->find('href') as $youtube) {
    echo $youtube;
}
It should find the link https://www.youtube.com/watch?v=vJ2aNG0aJPU.
Does someone know what the problem is here?
That particular link is inserted by JavaScript, via onYouTubeIframeAPIReady("vJ2aNG0aJPU"), during the onload event.
SimpleHtmlDom (or any other PHP based HTML parser for that matter) will not execute any JavaScript. They just parse the markup returned by the webserver.
You'd need a scraper capable of executing JavaScript before you can scrape it. Or you can match the argument to that function and assemble the link yourself.
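A hedged sketch of that second option, assuming the function call appears verbatim in the page source:
<?php
// Sketch: pull the video id out of the inline script and build the URL
// yourself, since no PHP-based parser will execute the JavaScript.
$html = file_get_contents('https://www.bia2.com/video/Amir-Shamloo/Delam-Tange/');

if (preg_match('/onYouTubeIframeAPIReady\("([^"]+)"\)/', $html, $m)) {
    echo 'https://www.youtube.com/watch?v=' . $m[1];
    // e.g. https://www.youtube.com/watch?v=vJ2aNG0aJPU
}
?>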
On a side note: $new_html->find('href') will try to find any elements named "href", which is obviously wrong. To get all href attributes for any element, you'd have to use *[href] instead.
On another side note: SimpleHtmlDom is a crap library. Consider your options:
How do you parse and process HTML/XML in PHP?

I want to get data from another website and display it on mine, but with my style.css

So my school has this very annoying way to view my timetable:
you have to click through 5 links to get to it.
This is the link for my class (it updates weekly without the link changing):
https://webuntis.a12.nl/WebUntis/?school=roc%20a12#Timetable?type=1&departmentId=0&id=2147
I want to display the content from that page on my website, but with my own stylesheet.
I don't mean this:
<?php
$homepage = file_get_contents('http://www.example.com/');
echo $homepage;
?>
or an iframe....
I think this can be done better using jQuery and AJAX. You can get jQuery to load the target page, use selectors to strip out what you need, then attach it to your document tree. You should then be able to style it any way you like.
I would recommend you use the cURL library: http://www.php.net/manual/en/curl.examples.php
But you will have to extract the part of the page you want to display, because you will get the whole HTML document.
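A minimal cURL sketch along those lines (assuming the timetable page is publicly reachable without a login):
<?php
// Sketch: fetch the raw HTML with cURL; parsing it is a separate step.
$ch = curl_init('https://webuntis.a12.nl/WebUntis/?school=roc%20a12');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
$page = curl_exec($ch);
curl_close($ch);

// $page now holds the whole document; extract the fragment you need next.
?>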
You'd probably read the whole page into a string variable (using file_get_contents as you mentioned, for example) and then parse the content. Here you have some possibilities:
Regular expressions
Walking the DOM tree (e.g. using PHP's DOMDocument classes)
After that, you'd most likely replace all the style="..." or class="..." information with your own.
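A sketch of the DOM route (untested against the real page; the class name is made up):
<?php
// Sketch: load the fetched HTML, then swap class/style attributes so
// your own stylesheet takes over.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML(file_get_contents('https://webuntis.a12.nl/WebUntis/?school=roc%20a12'));

$xpath = new DOMXPath($doc);
foreach ($xpath->query('//*[@style]') as $node) {
    $node->removeAttribute('style');              // drop inline styles
}
foreach ($xpath->query('//*[@class]') as $node) {
    $node->setAttribute('class', 'my-timetable'); // hypothetical class
}

echo $doc->saveHTML();
?>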

PHP file_get_contents() issue

With PHP's file_get_contents() I want only the post and image, but it gets the whole page. (I know there are other ways to do this.)
Example:
$homepage = file_get_contents('http://www.bdnews24.com/details.php?cid=2&id=221107&hb=5', true);
echo $homepage;
It shows the full page. Is there any way to show only the post for cid=2&id=221107&hb=5?
Thanks a lot.
Use PHP's DomDocument to parse the page. You can filter it more if you wish, but this is the general idea.
$url = 'http://www.bdnews24.com/details.php?cid=2&id=221107&hb=5';

// Create a new DomDocument (suppress warnings from malformed real-world HTML)
libxml_use_internal_errors(true);
$doc = new DomDocument();
$doc->loadHTMLFile($url);

// Get the post
$post = $doc->getElementById('opage_mid_left');
var_dump($post);
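To print the extracted markup itself rather than inspect the object, saveHTML() accepts a node (PHP 5.3.6+):
echo $doc->saveHTML($post); // render just that div and its contents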
Update:
Unless the image is a requirement, I'd use the printer-friendly version: http://www.bdnews24.com/pdetails.php?id=221107, it's much cleaner.
You will need to parse the resulting HTML using a DOM parser to get the HTML of only the part you want. I like PHP Simple HTML DOM Parser, but as Paul pointed out, PHP also has its own.
You can extract the
<div id="page">
//POST AND IMAGE EXIST HERE
</div>
part from the fetched contents using a regex and push it onto your page...
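A hedged sketch of that regex route (fragile: the non-greedy match stops at the first closing </div>, so nested divs will truncate it; prefer a DOM parser beyond the simplest cases):
<?php
$homepage = file_get_contents('http://www.bdnews24.com/details.php?cid=2&id=221107&hb=5');

// Assumes the id "page" appears exactly once and contains no nested divs
if (preg_match('/<div id="page">(.*?)<\/div>/is', $homepage, $m)) {
    echo $m[1]; // the post and image markup
}
?>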

Creating a personalization engine with PHP

I am new to PHP and I want to create a PHP engine which changes the web content of a webpage using data in MySQL, for example, changing the order of navigation links on a webpage to match the order of highest click count. I am not sure how PHP would read the HTML file, change the elements in it, and then output the HTML file with the changes. Is this possible?
I am not quite sure why you would want to generate the HTML, read it, change it, and then output it. It seems a lot easier to just generate it the way you want in the first place.
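A minimal sketch of generating it that way up front (the table and column names are made up):
<?php
// Sketch: emit the navigation already ordered by click count.
$pdo = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');

$links = $pdo->query('SELECT label, url FROM nav_links ORDER BY click_count DESC');

echo "<ul>\n";
foreach ($links as $row) {
    printf("  <li><a href=\"%s\">%s</a></li>\n",
        htmlspecialchars($row['url']),
        htmlspecialchars($row['label']));
}
echo "</ul>\n";
?>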
I am not sure how PHP would read the HTML file, change the elements in it, and then output the HTML file with the changes. Is this possible?
You could use file_get_contents:
$html = file_get_contents($url);
Then use an HTML parser like Simple HTML DOM Parser, change what you want, and output it.
If you want to modify the HTML structure, use ganon, an HTML DOM parser for PHP:
include('path/ganon.php');

// Parse the Google Code website into a DOM
$html = file_get_dom('http://code.google.com/');
foreach ($html('p[class]') as $element) {
    echo $element->class, "<br>\n";
}
