I am a beginner in PHP programming. I have a script that tries to fetch a string from an external website several times, each time with different "login" data. I am using PHP, cURL, DOM and XPath. My code seems to work only if I don't wrap the whole operation in a foreach loop, but I don't know how else I could repeat it with different data each time.
The situation is: I have just logged in, and now the site asks me to fill in two more fields before I can proceed to the next page, where I can get the string that I need. The next portion of code is contained in an if block.
// A function to automatically select the form fields:
function form_fields($xpath, $query) {
$inputs = $xpath->query($query);
$fields = array();
foreach ($inputs as $input) {
$key = $input->attributes->getNamedItem('name')->nodeValue;
$type = $input->nodeName;
$value = $input->attributes->getNamedItem('value')->nodeValue;
$fields[$key] = $value;
}
return $fields;
}
// Executing the XPath queries to fill the fields:
$opzutenza = 'incarichi';
$action = $xpath->query("//form[@name='fm_$opzutenza']")->item(0)->attributes->getNamedItem('action')->nodeValue;
curl_setopt($ch, CURLOPT_URL, $action);
$fields = form_fields($xpath, "//form[@name='fm_$opzutenza']/input");
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
$html = curl_exec($ch);
$dom = new DomDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// The strings that I need to get depend on each value contained in this select element:
$options = $xpath->query("//select[@name='sceltaincarico']/option");
$partiteiva = array();
foreach($options as $option){
$partiteiva[] = $option->nodeValue;
unset($partiteiva[0]);
}
} // -----------> END OF 'IF' BLOCK
$queriesNA = array();
foreach ($partiteiva as $piv) {
$queryNA = ".//select[@name='sceltaincarico']/option[text()='$piv']";
$queriesNA[] = $queryNA;
}
// And this is the problematic loop:
foreach($queriesNA as $querypiv){
$form = $xpath->query("//form[@name='fm_scelta_tipo_incarico']")->item(0);
$action = $form->attributes->getNamedItem('action')->nodeValue;
@$option = $xpath->query($querypiv, $form);
curl_setopt($ch, CURLOPT_URL, $action);
$fields = [
'sceltaincarico' => $option->item(0)->attributes->getNamedItem('value')->nodeValue,
'tipoincaricante' => 'incDiretto'
];
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields)); // ----> Filling the last field
curl_exec($ch);
curl_setopt($ch, CURLOPT_URL, 'https://website.com/dp/api');
curl_exec($ch);
curl_setopt($ch, CURLOPT_URL, 'https://website.com/cons/cons-services/sc/tokenB2BCookie/get');
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, true);
$http = curl_exec($ch);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_NOBODY, false);
function parse_headers($http) {
$headers = explode("\r\n", $http);
$hdrs = array();
foreach($headers as $h) {
@list($k, $v) = explode(':', $h);
$hdrs[trim($k)] = trim($v);
}
return $hdrs;
}
$hdrs = parse_headers($http);
$tokens = array(
"x-token: ".$hdrs['x-token'],
"x-b2bcookie: ".$hdrs['x-b2bcookie']
);
curl_setopt($ch, CURLOPT_HTTPHEADER, $tokens);
curl_setopt($ch, CURLOPT_URL, "https://website.com/cons/cons-services/rs/disclaimer/accetta"); // Accepting the disclaimer...
curl_exec($ch);
curl_setopt($ch, CURLOPT_URL, "https://website.com/portale/web/guest/home");
$html = curl_exec($ch); // Finally got to the page that I need
$dom = new DomDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// Selecting the string:
$string = $xpath->query("//div[@class='informativa']/strong[2]");
$nomeazienda = array();
foreach ($string as $str) {
$nomeazienda[] = $str->childNodes->item(0)->nodeValue;
}
// Going back to the initial page so the loop can start again from the beginning:
$piva_page = 'https://website.com/portale/scelta-utenza-lavoro?....';
curl_setopt($ch, CURLOPT_URL, $piva_page);
$html = curl_exec($ch);
$dom = new DomDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
}
curl_close($ch);
These are the error messages:
Notice: Trying to get property 'attributes' of non-object...
Fatal error: Uncaught Error: Call to a member function getNamedItem() on null...
Error: Call to a member function getNamedItem() on null...
The getNamedItem() call is the first one inside the malfunctioning loop, and so is the access to 'attributes'.
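One way to narrow down errors like these is to check whether an XPath query matched anything before reading attributes from the result. The following is a minimal sketch, not a fix for the script above: first_attr() is a hypothetical helper, and the inline HTML is a stand-in for the page that curl_exec() would return.

```php
<?php
// Hypothetical helper (not part of the original script): returns the value
// of $attr on the first node matched by $query, or null if nothing matched.
function first_attr(DOMXPath $xpath, string $query, string $attr): ?string
{
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null; // invalid expression or no matching node
    }
    $node = $nodes->item(0);
    if (!($node instanceof DOMElement) || !$node->hasAttribute($attr)) {
        return null; // matched something without that attribute
    }
    return $node->getAttribute($attr);
}

// Stand-in markup; the real page would come from curl_exec().
$html = '<html><body><form name="fm_scelta_tipo_incarico" action="/next"></form></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$found   = first_attr($xpath, "//form[@name='fm_scelta_tipo_incarico']", 'action');
$missing = first_attr($xpath, "//form[@name='no_such_form']", 'action');

var_dump($found);
var_dump($missing);
```

If a null like $missing shows up on the second iteration of the loop, the page fetched at the end of the previous iteration is probably not the form page the first iteration started from; dumping $html at that point would show what the server actually returned.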
I'm trying to get the HTML code of Instagram's embed pages for my API, but it returns a strange error and I don't know what to do now, because I'm new to PHP. The code works on other websites.
I have already tried it on other websites like apple.com. The strange thing is that the function works when I call it on the 'normal' post page; the error only appears when I call it on the '/embed' URL.
This is my PHP Code:
<?php
if (isset($_GET['url'])) {
$filename = $_GET['url'];
$file = file_get_contents($filename);
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($file);
libxml_use_internal_errors(false);
$bodies = $dom->getElementsByTagName('body');
assert($bodies->length === 1);
$body = $bodies->item(0);
for ($i = 0; $i < $body->children->length; $i++) {
$body->remove($body->children->item($i));
}
$stringbody = $dom->saveHTML($body);
echo $stringbody;
}
?>
I call the API like this:
https://api.com/get-website-body.php?url=http://instagr.am/p/BoLVWplBVFb/embed
My goal is to get the body of the website, like it is when I call this code on the https://apple.com URL for example.
You can scrape the data from the URL directly if you use cURL, and it is faster than file_get_contents. Here is the cURL code for different URLs; it extracts only the body data.
if (isset($_GET['url'])) {
// $website_url = 'https://www.instagram.com/instagram/?__a=1';
// $website_url = 'https://apple.com';
// $website_url = $_GET['url'];
$website_url = 'http://instagr.am/p/BoLVWplBVFb/embed';
$curl = curl_init();
//curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_URL, $website_url);
curl_setopt($curl, CURLOPT_REFERER, $website_url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0(Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/66.0');
$str = curl_exec($curl);
curl_close($curl);
$json = json_decode($str, true);
print_r($str); // Just printing the page as it is
// Take the body part alone and use it as you wish
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($str);
libxml_use_internal_errors(false);
$bodies = $dom->getElementsByTagName('body');
foreach ($bodies as $key => $value) {
print_r($value); // You will get all the content of the body here
}
}
NOTE: Here you don't need to use https://api.com/get-website-body.php?url=....
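Once the HTML is in a string, the body markup can be serialized by passing the body node to DOMDocument::saveHTML(). A small sketch, using an inline HTML string as a stand-in for the cURL response:

```php
<?php
// Stand-in for the string returned by curl_exec().
$str = '<html><head><title>Demo</title></head><body><p>Hello</p></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);  // collect parser warnings instead of printing them
$dom->loadHTML($str);
libxml_clear_errors();
libxml_use_internal_errors(false);

$body = $dom->getElementsByTagName('body')->item(0);

// saveHTML() with a node argument serializes only that subtree.
$bodyHtml = $body !== null ? $dom->saveHTML($body) : '';

echo $bodyHtml, PHP_EOL;
```

This returns the markup as a string, whereas print_r() on the node dumps the DOMElement object itself.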
I've written a script in PHP to fetch links from the main page of Wikipedia and write them to a CSV file. The script does fetch the links correctly. However, I can't write the results to a CSV file. When I execute my script, it does nothing and shows no error either. Any help will be highly appreciated.
My try so far:
<?php
include "simple_html_dom.php";
$url = "https://en.wikipedia.org/wiki/Main_Page";
function fetch_content($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
$htmlContent = curl_exec($ch);
curl_close($ch);
$dom = new simple_html_dom();
$dom->load($htmlContent);
$links = array();
foreach ($dom->find('a') as $link) {
$links[]= $link->href . '<br>';
}
return implode("\n", $links);
$file = fopen("itemfile.csv","w");
foreach ($links as $item) {
fputcsv($file,$item);
}
fclose($file);
}
fetch_content($url);
?>
1. You are using return in your function; that's why nothing gets written to the file, as the code stops executing after that.
2. Simplify your logic with the code below:-
$file = fopen("itemfile.csv","w");
foreach ($dom->find('a') as $link) {
fputcsv($file,array($link->href));
}
fclose($file);
So the full code needs to be:-
<?php
// comment out these two lines once the script works properly
error_reporting(E_ALL);
ini_set('display_errors',1); // these two lines check for and display all errors
include "simple_html_dom.php";
$url = "https://en.wikipedia.org/wiki/Main_Page";
function fetch_content($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
$htmlContent = curl_exec($ch);
curl_close($ch);
$dom = new simple_html_dom();
$dom->load($htmlContent);
$links = array();
$file = fopen("itemfile.csv","w");
foreach ($dom->find('a') as $link) {
fputcsv($file,array($link->href));
}
fclose($file);
}
fetch_content($url);
?>
The reason the file does not get written is because you return out of the function before that code can even be executed.
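The same idea can be tried without a network request. The sketch below assumes a hypothetical write_links_csv() helper and a temporary file instead of itemfile.csv. It writes one link per row and only returns after the file has been closed. The delimiter, enclosure and escape arguments are passed explicitly and match the long-standing defaults.

```php
<?php
// Hypothetical helper: writes each link as a one-column CSV row
// and returns the number of rows written.
function write_links_csv(array $links, string $path): int
{
    $file = fopen($path, 'w');
    if ($file === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    $written = 0;
    foreach ($links as $href) {
        // fputcsv() expects an array of fields, not a bare string.
        fputcsv($file, [$href], ',', '"', '\\');
        $written++;
    }
    fclose($file);
    return $written; // return only after all writing is done
}

$path  = tempnam(sys_get_temp_dir(), 'links');
$count = write_links_csv(['/wiki/A', '/wiki/B'], $path);

// Read the file back to confirm what was written.
$lines = file($path, FILE_IGNORE_NEW_LINES);
unlink($path);

print_r($lines);
```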
I have created a bot, and I want to use it to send a file (document) to my clients. After sending a document using the following code, the title is the full path of the file on my own device (my PC). How can I change the title to the file name only? Is that even possible?
Sending code:
protected function perform($method, $params) {
$url = new Url(TELEGRAM_API_URL . $this->bot->tokken . "/" . $method);
$fields = [];
foreach($params as $param => $val)
if($val != NULL && !cnull::is($val) && substr($param, 0, 1) != '_')
$fields[$param] = $val;
#
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url->getUrl());
curl_setopt($ch, CURLOPT_POST, count($fields));
curl_setopt($ch, CURLOPT_POSTFIELDS, $fields);
curl_setopt($ch, CURLOPT_HTTPHEADER, ['Content-Type:multipart/form-data']);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$out = curl_exec($ch);
curl_close($ch);
#
$content = json_decode($out);
return $content;
}
public function sendDocument($chat_id,$_document,$_is_file_id=false,$reply_to_message_id = NULL, $reply_markup = NULL) {
if($_is_file_id)
$document = $_document;
else
$document = new CURLFile(realpath($_document));
return self::perform(__FUNCTION__, get_defined_vars());
}
// ......
$tg->sendDocument(USER_CHAT_ID,"filename.mp4");
This is the result:
I've found a solution using ->setPostFilename() on CURLFile.
Here it is:
Change this method:
public function sendDocument($chat_id,$_document,$_is_file_id=false,$reply_to_message_id = NULL, $reply_markup = NULL) {
if($_is_file_id)
$document = $_document;
else
$document = new CURLFile(realpath($_document));
return self::perform(__FUNCTION__, get_defined_vars());
}
to:
public function sendDocument($chat_id,$_document,$_title=null,$_is_file_id=false,$reply_to_message_id = NULL, $reply_markup = NULL) {
if($_is_file_id)
$document = $_document;
else{
$document = new CURLFile(realpath($_document));
$document->setPostFilename($_title ?? basename($_document)); // fall back to the bare file name when no title is given
}
return self::perform(__FUNCTION__, get_defined_vars());
}
// ......
$tg->sendDocument(USER_CHAT_ID,"filename.mp4","title of file");
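If the goal is simply to show the bare file name, the posted name can be derived from the path with basename(). A minimal sketch, assuming the cURL extension is loaded and using a throwaway file in the system temp directory rather than a real document:

```php
<?php
// Create a throwaway file so the example has a real path to point at.
$path = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'filename.mp4';
file_put_contents($path, 'dummy');

$document = new CURLFile(realpath($path));

// Set the name reported to the server to the file name without directories.
$document->setPostFilename(basename($path));

echo $document->getFilename(), PHP_EOL;     // full path on disk
echo $document->getPostFilename(), PHP_EOL; // name used in the upload

unlink($path);
```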
I sometimes have a problem getting URL data with cURL, especially when the website content is in another language such as Arabic.
My cURL function is:
function file_get_contents_curl($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
$data = curl_exec($ch);
$info = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
//checking mime types
if(strstr($info,'text/html')) {
curl_close($ch);
return $data;
} else {
return false;
}
}
And this is how I am getting the data:
$html = file_get_contents_curl($checkurl);
$grid ='';
if($html)
{
$doc = new DOMDocument();
@$doc->loadHTML($html);
$nodes = $doc->getElementsByTagName('title');
@$title = $nodes->item(0)->nodeValue;
@$metas = $doc->getElementsByTagName('meta');
for ($i = 0; $i < $metas->length; $i++)
{
$meta = $metas->item($i);
if($meta->getAttribute('name') == 'description')
$description = $meta->getAttribute('content');
}
I am getting all data correctly from some Arabic websites like
http://www.emaratalyoum.com/multimedia/videos/2012-04-08-1.474873
but when I give this YouTube URL
http://www.youtube.com/watch?v=Eyxljw31TtU&feature=g-logo&context=G2c4f841FOAAAAAAAFAA
it shows symbols.
What setting do I need to show exactly the same title and description?
Introduction
Getting Arabic text can be very tricky, but there are some basic things you need to ensure:
Your document must output UTF-8
Your DOMDocument must read in UTF-8 format
Problem
When getting YouTube information, the data is already in UTF-8, and the retrieval process adds an additional layer of UTF-8 encoding. I'm not sure why this occurs, but a simple utf8_decode would fix the issue.
Example
header('Content-Type: text/html; charset=UTF-8');
echo displayMeta("http://www.emaratalyoum.com/multimedia/videos/2012-04-08-1.474873");
echo displayMeta("http://www.youtube.com/watch?v=Eyxljw31TtU&feature=g-logo&context=G2c4f841FOAAAAAAAFAA");
Output
emaratalyoum.com
التقطت عدسات الكاميرا حارس مرمى ريال مدريد إيكر كاسياس في موقف محرج قبل لحظات من بداية مباراة النادي الملكي مع أبويل القبرصي في ذهاب دور الثمانية لدوري أبطال أوروبا.ففي النفق المؤدي إلى الملعب، قام كاسياس بوضع إصبعه في أنفه، وبعدها قام بمسح يده في وجه أحد
youtube.com
بنات سعوديات: أريد "شايب يدللني ولا شاب يعللني"
Function Used
displayMeta
function displayMeta($checkurl) {
$html = file_get_contents_curl($checkurl);
$grid = '';
if ($html) {
$doc = new DOMDocument("1.0","UTF-8");
@$doc->loadHTML($html);
$nodes = $doc->getElementsByTagName('title');
$title = $nodes->item(0)->nodeValue;
$metas = $doc->getElementsByTagName('meta');
for($i = 0; $i < $metas->length; $i ++) {
$meta = $metas->item($i);
if ($meta->getAttribute('name') == 'description') {
$description = $meta->getAttribute('content');
if (stripos(parse_url($checkurl, PHP_URL_HOST), "youtube") !== false)
return utf8_decode($description);
else {
return $description;
}
}
}
}
}
file_get_contents_curl
function file_get_contents_curl($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
$data = curl_exec($ch);
$info = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
// checking mime types
if (strstr($info, 'text/html')) {
curl_close($ch);
return $data;
} else {
return false;
}
}
I believe this will work: apply utf8_decode() to your attribute.
function file_get_contents_curl($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
$data = curl_exec($ch);
$info = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
//checking mime types
if(strstr($info,'text/html')) {
curl_close($ch);
return $data;
} else {
return false;
}
}
$html = file_get_contents_curl($checkurl);
$grid ='';
if($html)
{
$doc = new DOMDocument();
@$doc->loadHTML($html);
$nodes = $doc->getElementsByTagName('title');
@$title = $nodes->item(0)->nodeValue;
@$metas = $doc->getElementsByTagName('meta');
for ($i = 0; $i < $metas->length; $i++)
{
$meta = $metas->item($i);
if($meta->getAttribute('name') == 'description')
$description = utf8_decode($meta->getAttribute('content'));
}
What happens here is that you're discarding the found Content-Type header that cURL returned in your file_get_contents_curl() function; DOMDocument needs that information to understand the character set that was used on the page.
A somewhat ugly hack, but most generic, is to prefix the returned page with a <meta> tag containing the returned character set from the response headers:
if (strstr($info, 'text/html')) {
curl_close($ch);
return '<meta http-equiv="Content-Type" content="' . $info . '" />' . $data;
}
DOMDocument will accept the misplaced meta tag and do the respective conversions automatically.
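The effect of the prefix can be checked offline. The sketch below uses an inline UTF-8 page and a hard-coded header value as stand-ins for $data and $info from the cURL call. The sample title is an assumption chosen for illustration.

```php
<?php
// Stand-ins for the page body and the Content-Type header cURL reported.
$data = '<html><head><title>مرحبا بالعالم</title></head><body></body></html>';
$info = 'text/html; charset=UTF-8';

// Prefix the page with a <meta> tag carrying the charset from the header.
$html = '<meta http-equiv="Content-Type" content="' . $info . '" />' . $data;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // the misplaced tag produces parser warnings
$doc->loadHTML($html);
libxml_clear_errors();

// With the charset hint in place the title should come back unchanged.
$title = $doc->getElementsByTagName('title')->item(0)->nodeValue;
echo $title, PHP_EOL;
```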
I am working with the following weather site,
and I need to query it with XPath, but my query returns nothing.
I'm using this XPath, which should return 2 rows:
$xpath->query('/html/body/table[3]/tbody/tr/td/table/tbody/tr/td[2]/p/table/tbody/tr/td/font/div/center/table/tbody/tr[1]/td[1]/font/font/b');
but it doesn't return anything.
Please help me complete this XPath.
I'm using this code, but it shows an error:
Catchable fatal error: Object of class DOMNodeList could not be converted to string in /home/mysite/curl.php on line 23
<?php
$url="http://www.irimo.ir/farsi/current/index.asp?station=40770";
function file_get_contents_curl($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$dom = new DomDocument();
@$dom->loadHTML($allcont);
$xpath = new DomXPath($dom);
$return = $xpath->query('/html/body/table[3]/tbody/tr/td/table/tbody/tr/td[2]/p/table/tbody/tr/td/font/div/center/table/tbody/tr[3]/td/font/b');
echo $return;
echo $xpath;
?>
Try
$xpath->query('/html/body/table[3]/tbody/tr/td/table/tbody/tr/td[2]/p/table/tbody/tr/td/font/div/center/table/tbody/tr[3]/td/font/b');
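The "could not be converted to string" error comes from echoing the DOMNodeList that query() returns; each node in the list has to be read individually. A minimal sketch follows, with an inline table standing in for the weather page. The markup, the cell contents and the shorter XPath are assumptions for illustration, not the real page structure.

```php
<?php
// Stand-in markup; the real page would come from file_get_contents_curl($url).
$html = '<html><body><table>'
      . '<tr><td><font><b>Row 1</b></font></td></tr>'
      . '<tr><td><font><b>Row 2</b></font></td></tr>'
      . '</table></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//table//tr/td/font/b');

// query() returns a DOMNodeList; iterate it and read each node's text.
$values = [];
foreach ($nodes as $node) {
    $values[] = trim($node->nodeValue);
}

echo implode("\n", $values), PHP_EOL;
```

Paths copied from a browser's inspector often contain tbody elements that the browser inserted itself. If the raw HTML has none, a query that includes /tbody/ may not match anything, so a shorter relative path like the one above can be more robust.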