How to pull a value from a website (using cURL in PHP) - php

I am trying the following script to pull a value from a certain website; however, I think the page is not a valid DOM document. Is there an alternative way?
<?php
$curl_handle = curl_init();
curl_setopt($curl_handle, CURLOPT_URL, 'http://www.indiagoldrate.com/gold-rate-in-mumbai-today.htm');
curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
$buffer = curl_exec($curl_handle);
curl_close($curl_handle);
if (empty($buffer)) {
    print "Sorry, the page could not be retrieved.<p>";
} else {
    print $buffer;
}
?>

Have you tried file_get_contents()? Try this:
$str = htmlentities(file_get_contents('http://www.indiagoldrate.com/gold-rate-in-mumbai-today.htm'));
Read the documentation for file_get_contents.
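If you also want a timeout like the cURL version above, file_get_contents() accepts a stream context; here is a minimal sketch (the 5-second timeout is just an illustrative value):
// Build a stream context so the HTTP request gives up after a few seconds.
$context = stream_context_create([
    'http' => ['timeout' => 5], // seconds, adjust as needed
]);

$str = @file_get_contents(
    'http://www.indiagoldrate.com/gold-rate-in-mumbai-today.htm',
    false,
    $context
);

if ($str === false) {
    echo 'Request failed.';
} else {
    echo htmlentities($str);
}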

Though the page in your code (http://www.indiagoldrate.com/gold-rate-in-mumbai-today.htm) is not valid HTML, you can still parse it with PHP's DOMDocument.
For example, here we get today's price of 1g of 22k gold in Mumbai:
libxml_use_internal_errors(true); //get rid of the warnings
$dom = new DOMDocument;
$dom->loadHTML($buffer);
$xp = new DOMXPath($dom);
$price = $xp->query('//*[@id="right_center"]/table[1]/tr[3]/td[2]/table/tr[1]/td[2]')->item(0)->nodeValue;
libxml_clear_errors();
libxml_use_internal_errors(false);
var_dump($price);
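Note that this XPath is tied to the page's current table layout, so it may return nothing if the markup changes. A small defensive variant of the same query, with a check before dereferencing:
$nodes = $xp->query('//*[@id="right_center"]/table[1]/tr[3]/td[2]/table/tr[1]/td[2]');
if ($nodes !== false && $nodes->length > 0) {
    var_dump($nodes->item(0)->nodeValue);
} else {
    echo 'Price cell not found - the page layout may have changed.';
}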

Related

How to get all existing images src in this url with PHP?

There are 6 images at this URL whose src attributes I want to get. My goal is to get all of the image srcs with PHP, but only one image src is coming back.
<?php
require_once ('simple_html_dom/simple_html_dom.php');
$html = file_get_html('https://www.zara.com/tr/en/flatform-derby-shoes-with-reversible-fringe-p15318201.html?v1=5276035&v2=734142');
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}
?>
After looking at the Simple HTML DOM bug tracker, it seems they have some issues fetching values that aren't real URLs.
Looking at the source of the page you're trying to fetch, only one image actually has a URL. The rest have inline images: src="data:image/png;base64,...".
I would suggest using PHP's own DOMDocument for this.
Here's a working solution (with comments):
<?php
// Get the HTML from the URL
$data = file_get_contents("https://www.zara.com/tr/en/flatform-derby-shoes-with-reversible-fringe-p15318201.html?v1=5276035&v2=734142");
$doc = new DOMDocument;
// DOMDocument throws a bunch of errors since the HTML isn't 100% valid
// (and for all HTML5-tags) but it will sort them out.
// Let's just tell it to fix it in silence.
libxml_use_internal_errors(true);
$doc->loadHTML($data);
libxml_clear_errors();
// Fetch all img-tags and get the 'src' attributes.
foreach ($doc->getElementsByTagName('img') as $img) {
    echo $img->getAttribute('src') . '<br />';
}
Demo: https://www.tehplayground.com/sh4yJ8CqIwypwkCa
Actually, those data: URIs are the images themselves, base64-encoded. As for the page you want to parse, although the images are base64-encoded, the a tags that are the parents of the images do contain the image URLs.
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch,CURLOPT_URL,"https://www.zara.com/tr/en/flatform-derby-shoes-with-reversible-fringe-p15318201.html?v1=5276035&v2=734142");
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
$data = curl_exec($ch);
curl_close($ch);
And now the data manipulation:
libxml_use_internal_errors(true);
$siteData = new DOMDocument();
$siteData->loadHTML($data);
$a = $siteData->getElementsByTagName("a"); // get the a tags
for ($i = 0; $i < $a->length; $i++) {
    if ($a->item($i)->getAttribute("class") == "_seoImg") { // _seoImg is the image link class
        echo $a->item($i)->getAttribute("href") . '<br/>';
    }
}
And the result is:
//static.zara.net/photos///2017/I/1/1/p/5318/201/040/3/w/560/5318201040_2_1_1.jpg?ts=1508311623896
//static.zara.net/photos///2017/I/1/1/p/5318/201/040/3/w/560/5318201040_1_1_1.jpg?ts=1508311816920
//static.zara.net/photos///2017/I/1/1/p/5318/201/040/3/w/560/5318201040_2_3_1.jpg?ts=1508311715728
//static.zara.net/photos///2017/I/1/1/p/5318/201/040/3/w/560/5318201040_2_10_1.jpg?ts=1508315639664
//static.zara.net/photos///2017/I/1/1/p/5318/201/040/3/w/560/5318201040_2_2_1.jpg?ts=1508311682567

How to get the HTML from a URL in PHP?

I want the HTML code from the URL.
Actually, I want the following things from the data at one URL:
1. blog title
2. blog image
3. blog posted date
4. blog description or actual blog text
I tried the code below but had no success.
<?php
$c = curl_init('http://54.174.50.242/blog/');
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
//curl_setopt(... other options you want...)
$html = curl_exec($c);
if (curl_error($c))
die(curl_error($c));
// Get the status code
$status = curl_getinfo($c, CURLINFO_HTTP_CODE);
curl_close($c);
echo "Status :".$status; die;
?>
Please help me out to get the necessary data from the URL (http://54.174.50.242/blog/).
Thanks in advance.
You are halfway there. Your cURL request is working and the $html variable contains the blog page's source code. Now you need to extract the data you need from that HTML string. One way to do it is by using the DOMDocument class.
Here is something you could start with:
$c = curl_init('http://54.174.50.242/blog/');
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($c);
$dom = new DOMDocument;
// disable errors on invalid html
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$list = $dom->getElementsByTagName('title');
$title = $list->length ? $list->item(0)->textContent : '';
// and so on ...
You can also simplify that by using the loadHTMLFile method of the DOMDocument class; that way you don't have to worry about all the cURL boilerplate:
$dom = new DOMDocument;
// disable errors on invalid html
libxml_use_internal_errors(true);
$dom->loadHTMLFile('http://54.174.50.242/blog/');
$list = $dom->getElementsByTagName('title');
$title = $list->length ? $list->item(0)->textContent : '';
echo $title;
// and so on ...
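To pull the blog title, image, posted date and text, you can build on the same $dom with DOMXPath. The selectors below are only assumptions (a typical article / h2 / img / time layout); inspect the blog's actual markup and adjust them:
$xpath = new DOMXPath($dom);
// Assumed markup: one <article> per post containing an <h2>, an <img>,
// a <time> element and at least one <p>. Adjust to the real structure.
foreach ($xpath->query('//article') as $post) {
    $titleNode = $xpath->query('.//h2', $post)->item(0);
    $imgNode   = $xpath->query('.//img/@src', $post)->item(0);
    $dateNode  = $xpath->query('.//time', $post)->item(0);
    $textNode  = $xpath->query('.//p', $post)->item(0);

    echo 'Title: ' . ($titleNode ? trim($titleNode->textContent) : '') . "\n";
    echo 'Image: ' . ($imgNode ? $imgNode->nodeValue : '') . "\n";
    echo 'Date:  ' . ($dateNode ? trim($dateNode->textContent) : '') . "\n";
    echo 'Text:  ' . ($textNode ? trim($textNode->textContent) : '') . "\n\n";
}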
You should use the Simple HTML DOM parser and extract the HTML using something like:
$html = @file_get_html($url);
foreach ($html->find('article') as $element) {
    $title = $element->find('h2', 0)->plaintext;
    // ...
}
I am also using this; hope it works for you.

Parsing only text content from url

I am trying to parse the text content from a given URL. Here is the code:
<?php
$url = 'http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page';
$content = file_get_contents($url);
echo $content; // This outputs everything on the page, including images and all the markup
$text=escapeshellarg(strip_tags($content));
echo "</br>";
echo $text; // This still gives source code as well, not only the text content of the page
?>
I want to get only the text shown on the page, not the page source code. Any ideas? I have already googled, but only the method above comes up everywhere.
You can use DOMDocument and DOMNode:
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
$xpath = new DOMXPath($doc);
foreach ($xpath->query("//script") as $script) {
    $script->parentNode->removeChild($script);
}
$textContent = $doc->textContent; //inherited from DOMNode
Instead of using xpath, you can also do:
$doc = new DOMDocument();
$doc->loadHTMLFile($url); // Load the HTML
foreach ($doc->getElementsByTagName('script') as $script) { // for all scripts
    // remove the script and its content so it will not appear in the text
    $script->parentNode->removeChild($script);
}
$textContent = $doc->textContent; //inherited from DOMNode, get the text.
$content = strip_tags(file_get_contents($url));
This will remove the HTML tags coming from the page.
To remove HTML tags use:
$text = strip_tags($text);
A simple cURL call will solve the issue. [TESTED]
<?php
$ch = curl_init("http://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); //Sorry forgot to add this
echo strip_tags(curl_exec($ch));
curl_close($ch);
?>

PHP retrieve inner HTML as string from URL using DOMDocument [duplicate]

This question already has answers here:
How to get innerHTML of DOMNode?
(9 answers)
Closed 9 years ago.
I've been picking up bits and pieces of code; you can see roughly what I'm trying to do. Obviously this doesn't work and is utterly wrong:
<?php
$dom= new DOMDocument();
$dom->loadHTMLFile('http://example.com/');
$data = $dom->getElementById("profile_section_container");
$html = $data->saveHTML();
echo $html;
?>
Using a CURL call, I am able to retrieve the document URL source:
function curl_get_file_contents($URL)
{
    $c = curl_init();
    curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($c, CURLOPT_URL, $URL);
    $contents = curl_exec($c);
    curl_close($c);
    if ($contents) return $contents;
    else return FALSE;
}
$f = curl_get_file_contents('http://example.com/');
echo $f;
So how can I now use this to instantiate a DOMDocument object in PHP and extract a node using getElementById?
This is the code you will need to avoid any malformed HTML errors:
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTMLFile('http://example.com/');
$data = $dom->getElementById("banner");
echo $data->nodeValue . "\n";
To dump whole HTML source you can call:
echo $dom->saveHTML();
<?php
$f = curl_get_file_contents('http://example.com/');
$dom = new DOMDocument();
@$dom->loadHTML($f); // suppress warnings from malformed HTML
$data = $dom->getElementById("profile_section_container");
$html = $dom->saveHTML($data);
echo $html;
?>
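Note that $dom->saveHTML($data) returns the element including its own tag (the outer HTML). If you only want the inner HTML, a small helper like the following is a common approach (a sketch; the function name getInnerHTML is my own):
function getInnerHTML(DOMNode $node)
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument serializes just that subtree
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

echo getInnerHTML($data);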
It would help if you provided the example html.
I'm not sure, but I remember that once when I wanted to use this I was unable to load some external URL as a file, because the php.ini directive allow_url_fopen was set to off...
So check your php.ini, or try to read the URL as a file to see whether it works:
<?php
$f = file_get_contents('http://example.com/');
var_dump($f); // just to see the content
?>
Regards;
mimiz
Try this:
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTMLFile('http://example.com/');
$data = $dom->getElementById("profile_section_container");
$html = $dom->saveHTML($data);
echo $html;
I think that now you can use DOMDocument::loadHTML.
Maybe you should check whether a doctype is present (with a regexp) and add one if necessary, to be sure it is declared; see the sketch below.
Regards
Mimiz
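A minimal sketch of that doctype check (the prepended DOCTYPE string is just an example):
// Prepend a doctype if the fetched HTML does not already declare one.
if (!preg_match('/^\s*<!DOCTYPE/i', $f)) {
    $f = "<!DOCTYPE html>\n" . $f;
}
$dom = new DOMDocument();
@$dom->loadHTML($f);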

Parsing HTML through file_get_contents()

I have been told that the best way to parse HTML is through DOM, like this:
<?php
$html = "<span>Text</span>";
$doc = new DOMDocument();
$doc->loadHTML($html);
$elements = $doc->getElementsByTagName("span");
foreach ($elements as $el) {
    echo $el->nodeValue . "\n";
}
?>
But in the above, the variable $html can't be a URL, or can it?
Wouldn't I have to use the function file_get_contents() to get the HTML of a page?
You have to use DOMDocument::loadHTMLFile to load HTML from a URL.
$doc = new DOMDocument();
$doc->loadHTMLFile($path);
DOMDocument::loadHTML parses a string of HTML.
$doc = new DOMDocument();
$doc->loadHTML(file_get_contents($path));
It can be, but it depends on allow_url_fopen being enabled in your PHP install. Basically all of the PHP file-based functions can accept a URL as a source (or destination). Whether such a URL makes sense is up to what you're trying to do.
e.g. doing file_put_contents('http://google.com') is not going to work, as you'd be attempting to do an HTTP upload to Google, and they're not going to allow you to replace their homepage...
but doing $dom->loadHTMLFile('http://google.com'); would work, and would pull Google's homepage into the DOM for processing.
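If you are not sure whether allow_url_fopen is enabled, you can check it at runtime and fall back to cURL; a rough sketch (the helper name fetch_html is my own):
function fetch_html($url)
{
    if (ini_get('allow_url_fopen')) {
        // URL wrappers are allowed, so file_get_contents() works directly
        return file_get_contents($url);
    }
    // Fall back to cURL when URL wrappers are disabled
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}

$doc = new DOMDocument();
@$doc->loadHTML(fetch_html('http://google.com'));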
If you're having trouble using DOM, you could use cURL and a regular expression instead. For example:
$url = "http://www.davesdaily.com/";
$curl = curl_init();
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_URL, $url);
$input = curl_exec($curl);
$regexp = "<span class=comment>([^<]*)<\/span>";
if(preg_match_all("/$regexp/siU", $input, $matches, PREG_SET_ORDER)) {
foreach($matches as $match);
}
echo $match[0];
The script grabs the text between <span class=comment> and </span> into $matches and echoes each captured value; the first one should be Entertainment.
