How to analyze an HTML page using PHP? - php

Hi, I want to analyze an HTML page, as in the following example.
I enter the page URL in a text field and press Enter.
I then get some text from the HTML page, such as the title, the h1, or a div with id="do".
How can I do this using PHP?
Thanks!

With file_get_contents() or the like and an HTML parser.
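As a minimal sketch of the parsing half of that answer, the snippet below uses PHP's built-in DOMDocument and DOMXPath on an inline sample string. The function name extractPageParts and the sample markup are made up for illustration, and the sketch assumes the id passed in contains no double quotes. In practice the string would come from file_get_contents() or similar.

```php
<?php
// Hypothetical helper: returns the text of <title>, the first <h1>,
// and the <div> with the given id (null where nothing matches).
function extractPageParts(string $html, string $divId): array
{
    $doc = new DOMDocument();
    // Real-world HTML is often invalid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $text = function (string $query) use ($xpath): ?string {
        $node = $xpath->query($query)->item(0);
        return $node === null ? null : trim($node->textContent);
    };

    return [
        'title' => $text('//title'),
        'h1'    => $text('//h1'),
        // Assumption: $divId contains no double quotes.
        'div'   => $text('//div[@id="' . $divId . '"]'),
    ];
}

// Sample markup standing in for a fetched page.
$sample = '<html><head><title>Demo page</title></head>'
        . '<body><h1>Hello</h1><div id="do">Some text</div></body></html>';

print_r(extractPageParts($sample, 'do'));
```

The URL typed into the text field would reach the script through a form submission; validating it before fetching is left out of this sketch.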

You want to use PHP's cURL functions; here's a quick tutorial.
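For reference, a fetch with the cURL extension might look roughly like this. The helper name fetchHtml and the ten-second timeout are assumptions for the sketch, not something the answer specifies, and it requires the cURL extension to be enabled.

```php
<?php
// Hypothetical helper: download a page body as a string with cURL.
function fetchHtml(string $url, int $timeoutSeconds = 10): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow HTTP redirects
        CURLOPT_TIMEOUT        => $timeoutSeconds, // assumed timeout for the whole request
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }

    return $body;
}

// Usage (performs a real HTTP request, so it is left commented out):
// echo strlen(fetchHtml('http://www.example.com/'));
```

The returned string still has to be handed to an HTML parser to pick out individual elements.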

Use file_get_contents or cURL to fetch the page. Then either split the string with explode() to pull the data out of the tags, or use a regular expression for the particular HTML tag you want.

Related

How to get the DOM with headless-chromium-php

I'm using headless-chromium-php, but the getHtml function seems to return the HTML source code.
https://github.com/chrome-php/chrome#get-the-page-html
Instead, I want to get the DOM as displayed in the Chrome browser, that is, the HTML after the browser has rendered the page.
How can I do that?
As you surmise, you need to wait for the page to finish loading, including any JavaScript rendering. Have a look at the example earlier in that documentation
[https://github.com/chrome-php/chrome#evaluate-script-on-the-page] to get the inner HTML.
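A rough sketch of that approach with chrome-php/chrome follows. It assumes the library is installed through Composer, that its autoloader has already been required, and that a Chrome or Chromium binary is available. The function name getRenderedHtml is hypothetical. Waiting for navigation covers the load event only, so content injected later by scripts may need extra waiting.

```php
<?php
// Assumes: composer require chrome-php/chrome, and before calling:
// require __DIR__ . '/vendor/autoload.php';

use HeadlessChromium\BrowserFactory;

// Hypothetical helper: returns the serialized DOM after the page has loaded.
function getRenderedHtml(string $url): string
{
    $browser = (new BrowserFactory())->createBrowser();

    try {
        $page = $browser->createPage();
        // Wait until the page reports that navigation has finished.
        $page->navigate($url)->waitForNavigation();

        // Evaluate a script in the page and read back the current DOM as HTML.
        return $page
            ->evaluate('document.documentElement.outerHTML')
            ->getReturnValue();
    } finally {
        $browser->close();
    }
}

// Usage (needs Chrome installed, so it is left commented out):
// echo getRenderedHtml('https://example.com/');
```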

Simple HTML DOM can't see all hrefs

I'm trying to retrieve the YouTube link from a certain site, but the Simple HTML DOM parser can't find the link I'm looking for.
$new_html = file_get_html("https://www.bia2.com/video/Amir-Shamloo/Delam-Tange/");
foreach ($new_html->find('href') as $youtube) {
echo $youtube;
}
It should find the link https://www.youtube.com/watch?v=vJ2aNG0aJPU.
Does anyone know what the problem is here?
That particular link is inserted by JavaScript, through a call to onYouTubeIframeAPIReady("vJ2aNG0aJPU") during the onload event.
SimpleHtmlDom (or any other PHP-based HTML parser, for that matter) will not execute any JavaScript. These parsers only parse the markup returned by the web server.
You'd need a scraper capable of executing JavaScript before you can scrape it. Or you can match the argument to that function and assemble the link yourself.
On a side note: $new_html->find('href') will try to find any elements named "href", which is obviously wrong. To get all href attributes for any element, you'd have to use *[href] instead.
On another side note: SimpleHtmlDom is a crap library. Consider your options:
How do you parse and process HTML/XML in PHP?
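The "match the argument and assemble the link yourself" option could look something like the sketch below. The function name extractYoutubeUrl and the sample markup are assumptions; the real page may format the call differently, in which case the pattern would need adjusting.

```php
<?php
// Hypothetical helper: find the video id passed to onYouTubeIframeAPIReady()
// in raw markup and build the watch URL from it.
function extractYoutubeUrl(string $html): ?string
{
    // Assumed call shape: onYouTubeIframeAPIReady("VIDEO_ID") with single or double quotes.
    $pattern = '/onYouTubeIframeAPIReady\(\s*["\']([\w-]+)["\']\s*\)/';

    if (preg_match($pattern, $html, $matches) === 1) {
        return 'https://www.youtube.com/watch?v=' . $matches[1];
    }

    return null;
}

// Sample markup standing in for the fetched page source.
$sample = '<script>window.onload = function () {'
        . ' onYouTubeIframeAPIReady("vJ2aNG0aJPU"); };</script>';

echo extractYoutubeUrl($sample), "\n";
```

This only reads the markup as text, so it works without executing any JavaScript.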

I want to get data from another website and display it on mine, but with my style.css

My school has a very annoying way of showing my timetable.
You have to click through five links to get to it.
This is the link for my class (it updates weekly without the link changing):
https://webuntis.a12.nl/WebUntis/?school=roc%20a12#Timetable?type=1&departmentId=0&id=2147
I want to display the content from that page on my website, but with my own stylesheet.
I don't mean this:
<?php
$homepage = file_get_contents('http://www.example.com/');
echo $homepage;
?>
or an iframe.
I think this can be done better using jQuery and Ajax. You can get jQuery to load the target page, use selectors to strip out what you need, then attach it to your document tree. You should then be able to style it any way you like.
I would recommend using the cURL library: http://www.php.net/manual/en/curl.examples.php
But you will have to extract the part of the page you want to display, because you get the whole HTML document.
You'd probably read the whole page into a string variable (for example using file_get_contents, as you mentioned) and parse the content. Here you have some possibilities:
Regular expressions
Walking the DOM tree (e.g. using PHP's DOMDocument classes)
After that, you'd most likely replace all the style="..." or class="..." information with your own.
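The DOM-walking option can be sketched as follows, using DOMDocument and DOMXPath to drop every style and class attribute from a fragment so your own stylesheet can take over. The function name stripStyling and the sample table are made up for illustration. Note that loadHTML() assumes ISO-8859-1 unless the markup declares a charset, so non-ASCII pages need extra care.

```php
<?php
// Hypothetical helper: remove inline styles and class names from an HTML fragment.
function stripStyling(string $html): string
{
    if (trim($html) === '') {
        return '';
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate invalid markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Visit every element that carries a style or class attribute.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@style or @class]') as $element) {
        $element->removeAttribute('style');
        $element->removeAttribute('class');
    }

    // loadHTML() wraps fragments in <html><body>; return only the body's children.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $output = '';
    foreach ($body->childNodes as $child) {
        $output .= $doc->saveHTML($child);
    }

    return $output;
}

// Sample fragment standing in for the part of the page you extracted.
$sample = '<table class="grid" style="color:red"><tr><td class="cell">Math</td></tr></table>';
echo stripStyling($sample), "\n";
```

After this step you can add your own class names, or simply style the bare elements from your stylesheet.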

Save contents of iFrame to html file?

I am building a rich text editor using JavaScript and an editable iFrame, and I am wondering how I can use php to save the contents of the iframe to an html file.
WARNING The link http://simple.procoding.net/2008..... provided by iWantSimpleLife generates a Javascript Obfuscation alert. That site may be infected.
Correct me if I'm wrong, but isn't an iframe basically a page you're retrieving?
If so, then file_get_contents (http://php.net/manual/fr/function.file-get-contents.php) should do the trick.
First, use JavaScript to copy the HTML content of the iframe into a textarea.
Then submit the textarea contents to a PHP script that saves them to a file with an .html extension on your web server.
See the filesystem section of the PHP manual:
http://www.php.net/manual/en/ref.filesystem.php
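The PHP side of those steps might look roughly like this. The form field names content and name, the saved directory, and the helper saveEditorHtml are all assumptions for the sketch. It writes whatever markup it receives, so a real editor would also need to decide who is allowed to post and how the saved HTML is served.

```php
<?php
// Hypothetical helper: write submitted editor HTML to "<dir>/<name>.html".
function saveEditorHtml(string $html, string $dir, string $name): string
{
    // Keep only safe characters so the name cannot escape the target directory.
    $safeName = preg_replace('/[^A-Za-z0-9_-]/', '', $name);
    if ($safeName === '') {
        throw new InvalidArgumentException('File name is empty after sanitizing.');
    }

    $path = rtrim($dir, '/') . '/' . $safeName . '.html';
    if (file_put_contents($path, $html, LOCK_EX) === false) {
        throw new RuntimeException("Could not write $path");
    }

    return $path;
}

// Handle the form post. Assumed fields: "content" (the textarea) and "name".
// Assumes the "saved" directory exists and is writable.
if (($_SERVER['REQUEST_METHOD'] ?? '') === 'POST' && isset($_POST['content'])) {
    $path = saveEditorHtml(
        (string) $_POST['content'],
        __DIR__ . '/saved',
        (string) ($_POST['name'] ?? 'document')
    );
    echo 'Saved to ' . htmlspecialchars(basename($path));
}
```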
Use JavaScript to get the content: http://simple.procoding.net/2008/03/21/how-to-access-iframe-in-jquery/
This took me about 5 minutes to Google.

Getting the first URL of an image search result with the Google Image API in PHP

Do you know of a PHP script (a class would be nice) that gets the URL of the first image result of a Google API image search? Thanks
Example.
<?php echo(geturl("searchterm")) ?>
I have found a solution for getting the first image from a Google Images result using Simple HTML DOM, as Sarfraz suggested.
Please check the code below. It is currently working fine for me.
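include 'simple_html_dom.php'; // assumption: the Simple HTML DOM library file is in the include path
$search_keyword = urlencode($search_keyword); // encodes spaces as '+' and escapes other special characters
$newhtml = file_get_html("https://www.google.com/search?q=" . $search_keyword . "&tbm=isch");
$result_image_source = $newhtml->find('img', 0)->src; // src of the first <img> on the results page
echo '<img src="' . htmlspecialchars($result_image_source) . '">';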
You should be able to do that easily with Simple HTML DOM.
Note: See the examples on their site for more information.
An HTML DOM parser written in PHP5+ that lets you manipulate HTML in a very easy way!
Find tags on an HTML page with selectors, just like jQuery.
