How to get data from a table on a webpage in PHP - php

I am trying to get data from four rows, each with two columns, from a table on a webpage. After some reading around I have tried the following code:
<?PHP
require('simple_html_dom.php');
$ch = curl_init();
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
$target_url = 'http://www.boz.zm/(S(0m5hxtuuoex4xqjkzrpbsh55))/Startpage.aspx';
curl_setopt($ch, CURLOPT_URL,$target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
$html = curl_exec($ch);
if (!$html)
{
echo "<br />cURL error number:" .curl_errno($ch);
echo "<br />cURL error:" . curl_error($ch);
exit;
}
else
{
echo "<br> Think the page was nabbed";
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
$tableData = array();
foreach($xpath->query('//table[@id="_ctl0_zmain_Dg_ExchangeRates"]/tr[position()<5]') as $node)
{
$rowData = array();
foreach($xpath->query('td', $node) as $cell)
{
$rowdat = $cell->textContent;
$rowData[] = $rowdat;
}
$tableDate[]=$rowData;
}
print_r($tableData);
}
?>
This only returns an empty array.
I would like to put the values of each row in a multidimensional array so I can easily work with them. Any ideas on how I can achieve this? I don't mind if it is a different approach from the one I am trying. Thanks in advance.

It is only a typo: you wrote $tableDate[]=$rowData; instead of $tableData[]=$rowData;
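With the variable name corrected, the same nested loop fills the array. The following is only a minimal sketch: the inline table and its values are made up to stand in for the fetched page (only the table id comes from the question), and libxml_use_internal_errors() is used here to keep parser warnings quiet.

```php
<?php
// Hypothetical stand-in for the HTML that cURL would return.
$html = '<table id="_ctl0_zmain_Dg_ExchangeRates">'
      . '<tr><td>USD</td><td>5.10</td></tr>'
      . '<tr><td>GBP</td><td>8.20</td></tr>'
      . '</table>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$tableData = [];

// Take at most the first four rows of the table.
foreach ($xpath->query('//table[@id="_ctl0_zmain_Dg_ExchangeRates"]//tr[position()<5]') as $row) {
    $rowData = [];
    // Query the cells relative to the current row.
    foreach ($xpath->query('td', $row) as $cell) {
        $rowData[] = trim($cell->textContent);
    }
    $tableData[] = $rowData; // same name as the array declared above
}

print_r($tableData);
```

Each element of $tableData is then one row, so $tableData[0][1] would be the second cell of the first row.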

Related

Can't use loadHTMLFile or file_get_contents for an external URL

I want to list the active Groupon deals, so I wrote a scraper like this:
libxml_use_internal_errors(true);
$dom = new DOMDocument();
@$dom->loadHTMLFile('https://www.groupon.com/browse/new-york?category=food-and-drink&minPrice=1&maxPrice=999');
$xpath = new DOMXPath($dom);
$entries = $xpath->query("//li[@class='slot']//a/@href");
foreach($entries as $e) {
echo $e->textContent . '<br />';
}
but when I run it, the browser just keeps loading and never shows an error.
How can I fix it? It is not just Groupon: I also tried other websites and they don't work either. Why?
What about using cURL to load the page data?
"Not just case with Groupon - I also try other websites but also don't work"
I think this code will help, but you should expect site-specific surprises from every website you want to scrape.
<?php
$dom = new DOMDocument();
$data = get_url_content('https://www.groupon.com', true);
@$dom->loadHTML($data);
$xpath = new DOMXPath($dom);
$entries = $xpath->query("//label");
foreach($entries as $e) {
echo $e->textContent . '<br />';
}
function get_url_content($url = null, $justBody = true)
{
/* Init CURL */
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_HTTPHEADER, []);
$data = curl_exec($ch);
if ($justBody)
$data = @(explode("\r\n\r\n", $data, 2))[1];
var_dump($data);
return $data;
}
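Once the page body is available as a string, the XPath part can be checked offline before worrying about the network side. A minimal sketch, assuming made-up markup in place of a real listing page (the class name "slot" comes from the question; the links are hypothetical):

```php
<?php
// Hypothetical markup standing in for a fetched listing page.
$html = '<ul>'
      . '<li class="slot"><a href="/deals/one">One</a></li>'
      . '<li class="slot"><a href="/deals/two">Two</a></li>'
      . '<li class="other"><a href="/ignored">Other</a></li>'
      . '</ul>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$links = [];

// Selecting @href returns attribute nodes; nodeValue holds the URL text.
foreach ($xpath->query("//li[@class='slot']//a/@href") as $attr) {
    $links[] = $attr->nodeValue;
}

print_r($links);
```

If this works on a saved copy of the page but not on the live request, the problem is more likely in the fetch (blocked user agent, redirects, content rendered by JavaScript) than in the query.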

JSON array foreach loop showing Invalid argument supplied for foreach()

I want to display the array data returned by http://services.xyz.com/tours.asmx/Sample?type=2&CityID=1146&days=4 in HTML.
Here is my code:
ini_set("display_errors", 1);
$url = "http://services.xyz.com/tours.asmx/Sample?type=2&CityID=1146&days=4";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
//curl_setopt($ch, CURLOPT_POSTFIELDS, $fields); // set the fields to post
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // make sure we get the response back
$xml = curl_exec($ch); // execute the post
curl_close($ch); // close our session
include "XMLParser.php";
$parser = new XMLParser();
$tour = $parser->parseString($xml);
print_r($tour);
//$errors = array_filter($tour);
if (empty($tour)) {
echo 'Error';
}else{
echo 'No Error';
}
//print_r($xml);
//var_dump( $xml);
//echo $xml;
//$xmlfile = file_get_contents($tour);
$ob= simplexml_load_string($tour);
//$json = json_encode($ob);
$configData = json_decode($ob, true);
foreach($configData as $result){
echo '<tr>';
echo '<td>'.$result->name.'</td>';
echo '<td>'.$result->phone.'</td>';
echo '<td>'.$result->email.'</td>';
echo '</tr>';
}
I am getting this error: Warning: Invalid argument supplied for foreach() in C:
Please help me.
Thanks
Here is my final working code, in case it helps someone else who needs it.
<?php
ini_set("display_errors", 1);
//$days = "http://services.xyz.com/tours.asmx/Days?type=2&cityID=1146";
$url = "http://services.xyz.com/tours.asmx/Sample?type=2&CityID=1146&days=4";//Replace this with your own Web services Link
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // make sure we get the response back
$xml = curl_exec($ch); // execute the post
curl_close($ch); // close our session
$var = simplexml_load_string($xml);
$configData = json_decode($var, true);
/*echo '<pre>';
print_r($configData);
echo '</pre>';*/
foreach ($configData as $row) {
echo 'Day:' . $row['Day'].'<br>';
echo 'Time:' .$row['Time'].'<br>';
echo 'Description:' .$row['Description'].'<br>';
}?>
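The working version suggests that the service returns an XML element whose text content is a JSON string; that is an assumption here, since the actual response is not shown. Under that assumption, a minimal sketch with a made-up payload looks like this. Note that json_decode(..., true) yields arrays, so the fields are read with $row['Day'] rather than $row->Day.

```php
<?php
// Hypothetical response: an XML <string> element whose text is JSON.
$xml = '<?xml version="1.0" encoding="utf-8"?>'
     . '<string>[{"Day":"1","Time":"09:00","Description":"Arrival"},'
     . '{"Day":"2","Time":"10:30","Description":"City tour"}]</string>';

$element = simplexml_load_string($xml);
if ($element === false) {
    exit('Could not parse XML');
}

// Cast the element to a string to get the JSON text, then decode to arrays.
$rows = json_decode((string) $element, true);
if (!is_array($rows)) {
    // Checking first avoids "Invalid argument supplied for foreach()".
    exit('Could not decode JSON: ' . json_last_error_msg());
}

$lines = [];
foreach ($rows as $row) {
    $lines[] = 'Day ' . $row['Day'] . ' at ' . $row['Time'] . ': ' . $row['Description'];
}

echo implode("\n", $lines), "\n";
```

The is_array() check is the important part: foreach raises the warning from the question whenever json_decode() returns null.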

Getting site title in unknown format using Php Curl and Dom-Document

I want to get a site's title from its URL. It works for most sites, but for Japanese and Chinese sites I get unreadable text.
Here is my function:
function file_get_contents_curl($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
Usage:
$html = $this->file_get_contents_curl($url);
Parsing
$doc = new DOMDocument();
@$doc->loadHTML($html);
$nodes = $doc->getElementsByTagName('title');
$title = $nodes->item(0)->nodeValue;
I am getting this ouput "ã¢ã¡ã¼ãIDç»é² ã¡ã¼ã«ã®ç¢ºèªï½Ameba(ã¢ã¡ã¼ã)"
Site URL : https://user.ameba.jp/regist/registerIntro.do?campaignId=0053&frmid=3051
Please suggest a way to get the exact site title in any language.
//example
/* MEthod----------4 */
function file_get_contents_curl($url){
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$uurl="http://www.piaohua.com/html/xuannian/index.html";
$html = file_get_contents_curl($uurl);
//parsing begins here:
$doc = new DOMDocument();
@$doc->loadHTML($html);
$nodes = $doc->getElementsByTagName('title');
//get and display what you need:
if(!empty($nodes->item(0)->nodeValue)){
$title = utf8_decode($nodes->item(0)->nodeValue);
}else{
$title =$uurl;
}
echo $title;
Make sure your script is using UTF-8 encoding by adding the following line to the beginning of the file:
mb_internal_encoding('UTF-8');
After doing so, remove the utf8_decode function from your code. Everything should work fine without it.
The DOMDocument::loadHTML function takes the encoding from the page's meta tag, so you could have problems if the page does not explicitly specify its encoding.
Simply add this line at the top of your PHP code:
header('Content-Type: text/html;charset=utf-8');
The code:
<?php
header('Content-Type: text/html;charset=utf-8');
function file_get_contents_curl($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$html = file_get_contents_curl('http://www.piaohua.com/html/lianxuju/2013/1108/27730.html');
$doc = new DOMDocument();
@$doc->loadHTML($html);
$nodes = $doc->getElementsByTagName('title');
echo $title = $nodes->item(0)->nodeValue;
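Because loadHTML relies on the page's own encoding declaration, a page that omits it can come out garbled. One commonly used workaround is to prepend a meta tag before parsing. This is a sketch under the assumption that the fetched bytes really are UTF-8; the sample page and the $hint variable are made up for illustration.

```php
<?php
// Hypothetical UTF-8 page that does not declare its encoding.
$html = '<html><head><title>アメーバ 登録</title></head><body></body></html>';

// Assumption: the bytes are UTF-8. The prepended meta tag tells the
// parser which encoding to use instead of falling back to a default.
$hint = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($hint . $html);
libxml_clear_errors();

// Guard against pages without a <title> element.
$nodes = $doc->getElementsByTagName('title');
$title = $nodes->length > 0 ? $nodes->item(0)->nodeValue : '';

echo $title, "\n";
```

For pages served in another encoding (Shift_JIS or GBK, for example), the string would first need converting to UTF-8, for instance with mb_convert_encoding(), using the charset from the HTTP response headers.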

Problem with curl, xpath query

I need some help with my xpath query. I can get this code to work with just about every site I need to scrape except this small part of a particular site... I just get a blank page... Does anyone have an idea on how I can do this better?
//
$target_url = "http://www.teambuy.ca/vancouver/";
$userAgent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)';
// make the cURL request to $target_url
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT,$userAgent);
curl_setopt($ch, CURLOPT_URL,$target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html= curl_exec($ch);
if (!$html) {
echo "<br />cURL error number:" .curl_errno($ch);
echo "<br />cURL error:" . curl_error($ch);
exit;
}
// parse the html into a DOMDocument
$dom = new DOMDocument();
@$dom->loadHTML($html);
// grab the matching nodes on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body/div[@id='pagewrap']/div[@id='content']/div[@id='bottomSection']/div[@id='bottomRight']/div[@id='sideDeal']/div[2]/div/a/center/span");
foreach ($hrefs as $e) {
$e->nodeValue;
}
$insert = $e->nodeValue;
echo "$insert";
--EDIT--
No luck...
Fatal error: Call to a member function loadHTMLfile() on a non-object in ... Line 4
//
$xpath_query = $dom->loadHTMLfile("http://www.teambuy.ca/vancouver/");
$hrefs = $xpath_query->evaluate("/html/body/div[7]/div[4]/div[3]/div[2]/div[1]/div[2]/div/a/center/span");
foreach ($hrefs as $e) {
echo $e->nodeValue;
}
$insert = $e->nodeValue;
echo "$insert";
Don't use cURL. Just use
$dom->loadHTMLFile("http://www.teambuy.ca/calgary/");
Don't use
$xpath = new DOMXPath($dom);
Just use
$href = $dom->xpath($xpath_query);
I imagine your XPath query could be simplified as well.
Also,
foreach ($hrefs as $e) {
$e->nodeValue;
}
does nothing. You might want to try this instead:
foreach ($hrefs as $e) {
echo $e->nodeValue;
}
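A note on the simplification: DOMDocument does not provide an xpath() method (that method belongs to SimpleXMLElement), so the sketch below keeps DOMXPath and anchors the query on the element id instead of the full path from /html. The markup is a made-up fragment loosely shaped like the nesting in the question, not the real page.

```php
<?php
// Hypothetical fragment loosely shaped like the nested divs in the question.
$html = '<div id="sideDeal"><div>first</div>'
      . '<div><div><a href="/deal"><span>$25 for $50</span></a></div></div>'
      . '</div>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$values = [];

// Start from the id and take any span under the second child div.
foreach ($xpath->query("//div[@id='sideDeal']/div[2]//span") as $span) {
    $values[] = trim($span->nodeValue);
}

echo implode("\n", $values), "\n";
```

A query anchored on an id tends to survive small layout changes better than an absolute path with positional steps such as div[7]/div[4].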

xpath with curl gives empty result?

This code gives me an empty result. I expect it to print out the titles from the XML-file. I need to use Curl to get the file.
<?php
function get_url($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$xml_content = get_url("http://www.e24.se/?service=rss&type=latest");
$dom = new DOMDocument();
@$dom->loadXML($xml_content);
$xpath = new DomXPath($dom);
$results = $xpath->query('//channel//title/text()');
foreach ($results as $result)
{
echo $result->title . "<br />";
}
?>
I found it already. The loop is wrong. It should be...
foreach ($results as $result)
{
echo $result->nodeValue . "<br />";
}
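The reason is that DOMXPath::query() returns DOM nodes, which expose their text through nodeValue (or textContent) and have no title property. A minimal sketch, assuming a small made-up feed in place of the downloaded one:

```php
<?php
// Hypothetical feed standing in for the downloaded RSS document.
$xml = '<?xml version="1.0" encoding="UTF-8"?>'
     . '<rss version="2.0"><channel><title>Feed</title>'
     . '<item><title>First story</title></item>'
     . '<item><title>Second story</title></item>'
     . '</channel></rss>';

$dom = new DOMDocument();
if (!$dom->loadXML($xml)) {
    exit('Could not parse XML');
}

$xpath = new DOMXPath($dom);
$titles = [];

// Select only the item titles, skipping the channel's own title.
foreach ($xpath->query('//channel/item/title') as $node) {
    $titles[] = $node->nodeValue;
}

echo implode("\n", $titles), "\n";
```

Checking the return value of loadXML() also makes a failed download visible, whereas suppressing errors would leave an empty document and an empty result.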
