PHP: convert a string to HTML and parse the HTML file

I would like to parse an HTML file in order to extract some information.
My code is:
$url = 'http://localhost/myFiles/';
$response = file_get_contents($url);
$html = new simple_html_dom();
$html->load_file($response);
if (!empty($html)) {
foreach($html->find('tr td a') as $a) {
echo $a->href.", ";
}
}
As far as I can see, $response is a string and not an HTML file, which is why I get the error message: Call to a member function find() on a non-object.

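load_file() expects a URL or file path rather than the page contents. You can let the library fetch and parse the page in one step with file_get_html(); if you already have the markup in a string, $html->load($response) or str_get_html($response) parses it directly.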
$url = 'http://localhost/myFiles/';
$html = file_get_html($url);
foreach($html->find('tr td a') as $a) {
echo $a->href.", ";
}
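If the markup is already in a string, as $response is in the question, it can also be parsed without fetching it again. This is a minimal sketch using PHP's built-in DOMDocument and DOMXPath rather than simple_html_dom. The inline table and file names are made up for illustration, and the XPath //tr//td//a is assumed to be a reasonable stand-in for the 'tr td a' selector:

```php
<?php
// Hypothetical markup standing in for the string returned by file_get_contents().
$response = '<table><tr><td><a href="a.txt">a</a></td><td><a href="b.txt">b</a></td></tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($response);          // parse the string, not a file or URL
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$hrefs = [];
// Every <a> nested in a <td> nested in a <tr>
foreach ($xpath->query('//tr//td//a') as $a) {
    $hrefs[] = $a->getAttribute('href');
}
echo implode(', ', $hrefs);
```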

Related

Xpath Contains value or not PHP if / else condition

I am trying to check whether some content is present in a website. If the content is present, I need to do one thing; otherwise, something else.
Xpath | PHP
<?php
//Load the HTML page
$html = file_get_contents('http://www.example.com/');
//Parse it. Here we use loadHTML as a static method
//to parse the HTML and create the DOM object in one go.
#$dom = DOMDocument::loadHTML($html);
//Init the XPath object
$xpath = new DOMXpath($dom);
$vals = $xpath->query( '//script[not[contains(text(), "sample")]]' );
if (($vals) > 0) {
echo 'finding text, not in the website';
} else {
echo "Test";
}
?>
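I can't work out how to write the if/else condition for this.
query() returns a DOMNodeList, so test how many nodes it holds rather than comparing the object itself. You can use count() (DOMNodeList is countable as of PHP 7.2) or $vals->length. Note also that not is an XPath function and takes parentheses: not(contains(text(), "sample")).
if (count($vals) > 0) {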
echo 'finding text, not in the website';
} else {
echo "Test";
}
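Putting the pieces together, the check might look like the sketch below. The inline HTML is a hypothetical stand-in for the fetched page, the echoed messages are placeholders, and contains(., "sample") is used in place of text() so the test covers the whole string value of each script element:

```php
<?php
// Hypothetical page content; the question fetches it with file_get_contents().
$html = '<html><body><script>var a = "sample";</script><script>var b = 1;</script></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Select <script> elements whose content does NOT contain "sample"
$vals = $xpath->query('//script[not(contains(., "sample"))]');

// query() returns false for an invalid expression, so guard before reading length
if ($vals !== false && $vals->length > 0) {
    echo 'At least one script does not contain the text';
} else {
    echo 'Every script contains the text';
}
```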

Redirect output intended for web page to file

I want to save the html code for a webpage generated in php. The basic idea I am trying is:
ob_start (); $buffered = true;
fclose (STDOUT);
STDOUT = fopen ("Test/test.htm","wb");
$site_name = 'Cory';
chdir ('..');
include ($site_name.'_V'.$v_defs["sv"].'/Begin.php'); // Draws website
fclose (STDOUT);
I have tried all the variations I can think of for the fopen command, but I always get a parse error on it.
I use these functions to get the HTML of a page element:
function get_inner_html( $node ) {
    $innerHTML = '';
    foreach ($node->childNodes as $child) {
        // serialize each child node and append it
        $innerHTML .= $child->ownerDocument->saveXML( $child );
    }
    return $innerHTML;
}
function get_html_table($link, $element, $class) {
    $html = file_get_contents($link); // get the HTML returned from the URL
    $poke_doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect libxml errors instead of emitting warnings
    if (!empty($html)) { // if any HTML is actually returned
        $poke_doc->loadHTML($html);
        libxml_clear_errors(); // discard the errors caused by messy HTML
        $poke_xpath = new DOMXPath($poke_doc);
        // select every $element whose class attribute equals $class
        $poke_type = $poke_xpath->query("//".$element."[@class='".$class."']");
        $table = "<table>";
        foreach ($poke_type as $type) {
            $table .= get_inner_html($type);
        }
        $table .= "</table>";
        return $table;
    }
}
Step 2:
echo get_html_table('https://www.link.com','table','class');
// first parameter is the link, second is the type of DOM element, last is the class
$content=get_html_table('https://www.link.com','table','class');
After this you can save $content to a file, for example with file_put_contents().
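As for the original code, STDOUT is a constant and cannot be reassigned, which is why the assignment is a parse error. One approach that may work instead is to capture the output with the buffer that ob_start() already opens and write it out afterwards. A rough sketch; the echo stands in for the include of Begin.php, and the temp-file path is a placeholder for Test/test.htm:

```php
<?php
// Hypothetical output path; the question writes to "Test/test.htm".
$outFile = sys_get_temp_dir() . '/test.htm';

ob_start();                          // start capturing everything that is echoed
// Stand-in for: include ($site_name.'_V'.$v_defs["sv"].'/Begin.php');
echo '<html><body><h1>Cory</h1></body></html>';
$page = ob_get_clean();              // return the captured output and stop buffering

file_put_contents($outFile, $page);  // save the generated HTML to the file
```

Because the page is held in $page, it can still be echoed to the browser afterwards if both the file and the normal response are wanted.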

Get all values of h1 tags

I want to get the values of all h1 tags, and I found this article: getting all values from h1 tags using php
Then I tried using:
<?php
include "functions/simple_html_dom.php";
function getTextBetweenTags($string, $tagname) {
// Create DOM from string
$html = str_get_html($string);
$titles = array();
// Find all tags
foreach($html->find($tagname) as $element) {
$titles[] = $element->plaintext;
}
return $titles;
}
echo getTextBetweenTags("http://mydomain.com/","h1");
?>
But it doesn't run, and I get:
Notice: Array to string conversion in C:\xampp\htdocs\checker\abc.php
on line 14 Array
Please help me fix it. I want to get all the h1 tags of a website, given the website's URL as input. Thanks so much!
You're trying to echo an array, which triggers the notice; use print_r() or loop over the values instead. The function is also slightly off: str_get_html() parses an HTML string, but you are passing a URL, so use file_get_html(). Example:
include 'functions/simple_html_dom.php';
function getTextBetweenTags($url, $tagname) {
$values = array();
$html = file_get_html($url);
foreach($html->find($tagname) as $tag) {
$values[] = trim($tag->innertext);
}
return $values;
}
$output = getTextBetweenTags('http://www.stackoverflow.com/', 'h1');
echo '<pre>';
print_r($output);
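For comparison, the same idea can be sketched with PHP's built-in DOMDocument instead of simple_html_dom. The function name getTagTexts and the inline markup are hypothetical; for a live site the string would come from something like file_get_contents($url):

```php
<?php
// Return the trimmed text of every $tagname element found in an HTML string.
function getTagTexts(string $html, string $tagname): array {
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate imperfect real-world HTML
    $dom->loadHTML($html);
    libxml_clear_errors();

    $values = [];
    foreach ($dom->getElementsByTagName($tagname) as $node) {
        $values[] = trim($node->textContent);
    }
    return $values;
}

// Hypothetical markup used only for illustration.
$output = getTagTexts('<body><h1> First </h1><p>x</p><h1>Second</h1></body>', 'h1');
print_r($output);   // print_r() handles arrays; echo does not
```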

Simple HTML Dom PHP RECURSION Error in return value

I am using Simple HTML DOM to get strings from a website. When I print $title[0] inside the function, it shows just one string, but when I save it in the returned array and print the return value, I get never-ending output containing RECURSION.
I don't understand why it works for the second variable, $oTitle.
<?php
include 'scripts/simple_html_dom.php';
function getDetails($id) {
$url = "http://www.something.com";
$html = file_get_html ( $url );
$title = $html->find('span[itemprop=name]');
print_r($title[0] . PHP_EOL); //prints out the correct title
$oTitle = "Something"; //there is also code for this variable but it works as it should
$details = array("Title" => $title[0], "Original Title" => $oTitle);
return $details;
flush ();
}
$values = getDetails($number);
print_r($values); //code breaks here
?>
Take a look at this page: http://simplehtmldom.sourceforge.net/
As I can see, you're using this parser.
In order to get HTML content you should use something like this:
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find all images
foreach($html->find('img') as $element)
echo $element->src . '<br>';
// Find all links
foreach($html->find('a') as $element)
echo $element->href . '<br>';
In order to dump the content, you should use something like this:
// Dump contents (without tags) from HTML
echo file_get_html('http://www.google.com/')->plaintext;
Try this code:
<?php
function getDetails() {
    $url = "http://www.godaddy.com";
    $title = getTitle($url);
    echo $title; // prints out the correct title
    $oTitle = "Something"; // there is also code for this variable but it works as it should
    // $title is a plain string here, so print_r() on the result stays readable
    $details = array("Title" => $title, "Original Title" => $oTitle);
    return $details;
}
function getTitle($Url) {
    $str = file_get_contents($Url);
    // return the text between <title> and </title>, or null if there is none
    if ($str !== false && preg_match("/<title>(.*?)<\/title>/is", $str, $title)) {
        return $title[1];
    }
    return null;
}
$values = getDetails();
print_r($values);
?>
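The likely reason for the RECURSION output is that find() returns node objects, not strings. Concatenating $title[0] with a string converts it through __toString(), while storing $title[0] in the array keeps the whole object, whose links back to its parent and document form a cycle that print_r() reports. With Simple HTML DOM, storing $title[0]->plaintext should avoid it. The sketch below reproduces the effect with a hypothetical Node class, so it runs without the library:

```php
<?php
// Hypothetical stand-in for a parser node: child and parent reference each
// other, so the object graph is circular.
class Node {
    public $parent = null;
    public $children = [];
    public $text;
    public function __construct(string $text) { $this->text = $text; }
    public function __toString(): string { return $this->text; }
}

$root = new Node('root');
$child = new Node('My Title');
$child->parent = $root;
$root->children[] = $child;

// Storing the object: print_r() walks the cycle and marks it *RECURSION*.
$asObject = print_r(['Title' => $child], true);

// Storing the string value: only plain text ends up in the array.
$asString = print_r(['Title' => (string) $child], true);

echo $asString;
```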

Html in a DOMElement with ZF

I'm using Zend Framework with Zend_Dom_Query to fetch a page and find its paragraphs.
Here is my source code:
$dom = new Zend_Dom_Query($newsData);
$content = '';
$results = $dom->query('p');
foreach ($results as $result) {
$content .= $result->nodeValue;
}
With that, if a paragraph contains other HTML elements, they are stripped.
For example, if the markup is <p>My <a>link</a></p>, the nodeValue (or textContent) is My link, without the <a> tag.
How can I keep the HTML in the content of a DOMElement?
Thank you
class IndexController extends Zend_Controller_Action
{
function getInnerHTML($Node)
{
$Document = new DOMDocument();
$Document->appendChild($Document->importNode($Node,true));
return $Document->saveHTML();
}
function domAction ()
{
$this->_helper->ViewRenderer->setNoRender ();
$newsData = '<body><p>My <a>link</a></p></body>';
$dom = new Zend_Dom_Query($newsData);
$content = '';
$results = $dom->query('p/*');
foreach ($results as $result) {
$content .= $this->getInnerHtml ($result);
}
echo htmlentities ($content);
}
}
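Outside Zend Framework, the same result can be sketched with plain DOMDocument by serializing each child of the paragraph, which keeps both the text and the nested tags. The helper name innerHtml and the sample markup (including the href value) are assumptions for illustration:

```php
<?php
// Concatenate the HTML of every child node, i.e. the element's inner HTML.
function innerHtml(DOMNode $node): string {
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$dom = new DOMDocument();
// Hypothetical markup: a paragraph that contains a link.
$dom->loadHTML('<body><p>My <a href="#">link</a></p></body>');

$content = '';
foreach ($dom->getElementsByTagName('p') as $p) {
    $content .= innerHtml($p);
}
echo htmlentities($content);   // escaped so the tags are visible in a browser
```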
