Simple html dom div downloading issue

Here's some PHP code that I wrote, based mainly on the docs.
It uses Simple HTML DOM.
The problem is that it doesn't work and I don't know why.
<?php
include("simple_html_dom.php");
$context = stream_context_create();
stream_context_set_params($context, array('user_agent' => "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36"));
$html = file_get_html('http://www.ask.fm', 0, $context);
$elem = $html->find('div[id=heads]', 0);
var_dump($elem);
?>
What I want is to set the user agent, which I tried to do in the code above. Then I want to download the div with id "heads". That's not much, but I couldn't figure it out.

<?php
include "simplehtmldom_1_5/simple_html_dom.php";
function curl($url)
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    // Return the response as a string instead of printing it
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    // You may set this option if you need to follow redirects, though I didn't get any in your case
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    $content = curl_exec($curl);
    curl_close($curl);
    return $content;
}
$html = str_get_html(curl("http://www.ask.fm"));
echo $elem = $html->find('div[id=heads]', 0);
?>
I think this will be useful for you: it fetches the page with cURL and passes the result to str_get_html().
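On the user-agent part of the question: a likely reason the original attempt has no effect is that stream_context_set_params() expects keys such as "options" or "notification", not "user_agent". Below is a minimal sketch of setting it through the "http" wrapper options instead. The name make_ua_context is a hypothetical helper, and the shortened user-agent string is only an example.

```php
<?php
// Hypothetical helper: build an HTTP stream context that carries a User-Agent.
function make_ua_context(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'user_agent'      => $userAgent, // sent as the User-Agent request header
            'follow_location' => 1,          // follow HTTP redirects
        ],
    ]);
}

// Example value only; any browser-like string can be used.
$context = make_ua_context('Mozilla/5.0 (Windows NT 6.3; Win64; x64)');

// Inspect the context to confirm the option was stored.
$options = stream_context_get_options($context);
echo $options['http']['user_agent'], PHP_EOL;
```

With Simple HTML DOM loaded, the resulting context would be passed as the third argument, for example file_get_html('http://www.ask.fm', false, $context).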

Related

PHP Setting custom header starting with ':'

I need to set some custom headers that start with ":".
$option['headers'][] = ":authority: example.com"; //<-- Here is the problem
$option['headers'][] = "accept-encoding: gzip, deflate, br";
$option['post'] = json_encode(array("Domain"=>"example.com"));
$url = "https://www.google.com";
$ch = curl_init($url);
curl_setopt($ch,CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.62 Safari/537.36");
curl_setopt($ch,CURLOPT_FOLLOWLOCATION,1);
curl_setopt($ch,CURLOPT_COOKIEFILE,"file.cookie");
curl_setopt($ch,CURLOPT_COOKIEJAR,"file.cookie");
curl_setopt($ch,CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch,CURLOPT_HEADER,0);
curl_setopt($ch,CURLOPT_VERBOSE, true);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $option['post']);
curl_setopt($ch, CURLOPT_HTTPHEADER, $option['headers']);
$getdata = curl_exec($ch);
I tried replacing the ":" with chr(58), but I get the same problem: error 55, and the log shows "* Failed sending HTTP POST request". If I comment out the first line it works, but I really need that header. I'm stuck here. Any solutions?

:authority: looks like an HTTP/2 pseudo-header, and you can't set those like this with curl. curl will, however, send it on its own, using the same value it would set for Host:, so the request works the same way regardless of which HTTP version is eventually used (it also works with HTTP/3).
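In practice this means leaving the pseudo-header out of the list given to CURLOPT_HTTPHEADER. Here is a minimal sketch, assuming the header list is built elsewhere. The function strip_pseudo_headers is a hypothetical helper, not part of cURL.

```php
<?php
// Hypothetical helper: drop HTTP/2 pseudo-headers (names starting with ':'),
// which curl generates itself and does not accept as custom headers.
function strip_pseudo_headers(array $headers): array
{
    return array_values(array_filter($headers, function (string $line): bool {
        return strncmp($line, ':', 1) !== 0;
    }));
}

$headers = strip_pseudo_headers([
    ':authority: example.com',
    'accept-encoding: gzip, deflate, br',
]);

print_r($headers);
```

The filtered list can then be passed with curl_setopt($ch, CURLOPT_HTTPHEADER, $headers); as the answer says, curl supplies :authority itself from the same value it would use for Host:.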

Scraping a website for price data using PHP returns zero (== $0); maybe the website is blocking me. How do I overcome it?

This is the code that I have used:
$curl = curl_init("https://www.flipkart.com/curren-cu2-345656-analog-watch-boys-men/p/itmeax4wh4ujcfft?pid=WATEAX4WGYNYWVCM&srno=b_1_1&otracker=hp_omu_Deals%20of%20the%20Day_5_15c7e867-d35a-4431-a4a0-da39f043bc1f_0&lid=LSTWATEAX4WGYNYWVCMHVLY32");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.224 Safari/534.10');
$html = curl_exec($curl);
curl_close($curl);
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$tableRows = $xpath->query('//*[@id="container"]/div/div[2]/div[2]/div/div/div[1]/div/div[2]/div/div[2]/div[2]/div[1]/div/div[1]');
echo $tableRows[0];
echo $tableRows[1];
echo $tableRows[2];
foreach ($tableRows as $row) {
echo $row . "<br>";
}
It shows zero. When I open the source in the F12 developer tools, it shows "== $0" next to the div. How do I overcome this?

Flipkart is served over HTTPS, so the request may be failing SSL certificate verification. To work around this, add the following two lines to your cURL request (the first one turns off peer certificate verification):
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 2);
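Separately from the transport question, the parsing half of the script can be checked offline. The sketch below runs an XPath query with an @id predicate against a hypothetical inline HTML snippet (not Flipkart's real markup) and reads each node's textContent, since a DOMElement cannot be echoed directly.

```php
<?php
// Hypothetical markup standing in for the downloaded page.
$html = '<div id="container"><div class="price">Rs. 499</div></div>';

$dom = new DOMDocument();
// Collect libxml warnings internally instead of printing them for imperfect markup.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Attribute predicates in XPath use @, e.g. @id and @class.
$nodes = $xpath->query('//*[@id="container"]//div[@class="price"]');

$prices = [];
foreach ($nodes as $node) {
    // Read the text of the node rather than echoing the object itself.
    $prices[] = trim($node->textContent);
}

print_r($prices);
```

The "== $0" shown in Chrome DevTools only marks the element currently selected in the Elements panel (available as $0 in the console); it is not part of the page content. If the query returns no nodes for the real page, it may be that the price is filled in by JavaScript and is absent from the HTML that cURL receives.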

Non-empty string empty after return

I'm writing a PHP script that deals with page processing via cURL, so I have a function to get and return pages by URL:
function get_url($Url){
    if (!function_exists('curl_init')){
        die('Sorry cURL is not installed!');
    }
    set_time_limit(20);
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $Url);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array("Cookie: age_gate_birthday=19901101"));
    curl_setopt($ch, CURLOPT_REFERER, "http://www.facebook.com");
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36");
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $output = curl_exec($ch);
    return $output;
}
Echoing $output inside this function always prints a string of HTML. However, if I call this function from another function
function get_vid ($sql, $url) {
$data = get_url($url);
...
the returned value is an empty string, despite the fact that $output had value when get_url() was doing its thing.
Weirdly enough, the error only exists with specific URLs, but works fine with others.
Thank you for trying to help!
UPDATE: It seems cURL randomly returns FALSE on specific links, which seems to be the culprit here. However, curl_error is empty, so I'm unable to identify the cause.

I think it's because you are getting an HTTP redirect.
Try checking the HTTP status code like this:
if (curl_getinfo($ch, CURLINFO_HTTP_CODE) == 302) {
    // Handle the HTTP redirect here
}
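A minimal sketch of how that check could sit inside the fetch function, alongside a check for curl_exec() returning false. The names classify_response and get_url_checked are hypothetical, and following redirects with CURLOPT_FOLLOWLOCATION is an assumption about what the caller wants.

```php
<?php
// Hypothetical helper: turn the raw cURL outcome into a short label.
function classify_response($body, int $httpCode, int $errno): string
{
    if ($body === false || $errno !== 0) {
        return 'transport-error'; // curl_exec() failed; see curl_error()
    }
    if ($httpCode >= 300 && $httpCode < 400) {
        return 'redirect';        // e.g. 301 or 302 that was not followed
    }
    if ($httpCode >= 400) {
        return 'http-error';
    }
    return $body === '' ? 'empty-body' : 'ok';
}

// Sketch of the fetch with the checks added; not called here because it needs network access.
function get_url_checked(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow 3xx responses
    $body = curl_exec($ch);
    // Read the status and error details before closing the handle.
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $errno = curl_errno($ch);
    curl_close($ch);
    return [classify_response($body, $code, $errno), $body];
}

// Offline demonstration with sample values (28 is cURL's timeout error code).
$labels = [
    classify_response('<html></html>', 200, 0),
    classify_response('', 302, 0),
    classify_response(false, 0, 28),
];
print_r($labels);
```

Returning the label together with the body lets the calling function tell an empty page apart from a failed transfer.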

PHP cURL, file_get_contents blank page

I'm trying to get a page's content with cURL or file_get_contents. It works on many websites, but it does not work on a friend's server.
I think there is some protection based on headers or something like that. I get the following error code: 401 forbidden. If I open the same page in a normal browser, it works.
Here is my code for the file_get_contents function:
$homepage = file_get_contents('http://192.168.1.3');
echo $homepage; // just a test to see if the page is loaded, it's not.
if (preg_match("/my regex/", $homepage)) {
// ... some code
}
I also tried with cURL:
$url = urlencode('http://192.168.1.3');
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0');
$result = curl_exec($ch) or die("Not working");
curl_close($ch);
echo $result; // not working ..
Nothing works. Maybe I should add more options to curl_setopt...
Thanks.
PS: If I try on Linux with wget I get an error, but with aria2c it works.

HTTP status 401 means Unauthorized: you need to send the server a username and password.
With file_get_contents, you can pass a stream context as the third parameter and set the header information there.
You'd be better off using cURL, since file_get_contents is mainly intended for reading local files and is a blocking call. Add the following option for basic authentication:
curl_setopt($ch,CURLOPT_USERPWD,"my_username:my_password");
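For the file_get_contents route mentioned above, here is a minimal sketch of a stream context that sends HTTP Basic credentials. The name basic_auth_context is a hypothetical helper, the credentials are the placeholders from the answer, and it assumes the server uses Basic authentication.

```php
<?php
// Hypothetical helper: build a stream context that sends HTTP Basic credentials.
function basic_auth_context(string $user, string $password)
{
    // Basic auth is "user:password", base64-encoded, in the Authorization header.
    $header = 'Authorization: Basic ' . base64_encode($user . ':' . $password);
    return stream_context_create(['http' => ['header' => $header]]);
}

// Placeholder credentials, as in the answer above.
$context = basic_auth_context('my_username', 'my_password');

// Inspect the context to confirm the header was stored.
$options = stream_context_get_options($context);
echo $options['http']['header'], PHP_EOL;
```

The context would then be used as file_get_contents('http://192.168.1.3', false, $context).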

Try this updated version, which sets a user agent:
<?php
$curlSession = curl_init();
curl_setopt($curlSession, CURLOPT_URL, 'http://192.168.1.3/');
curl_setopt($curlSession,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($curlSession, CURLOPT_BINARYTRANSFER, true);
curl_setopt($curlSession, CURLOPT_RETURNTRANSFER, true);
$homepage = curl_exec($curlSession);
curl_close($curlSession);
echo $homepage ;
?>
If you are still getting a blank page, install this add-on in Firefox and inspect the request headers and response headers.

Download the source code of web pages using cURL in PHP

I am trying to download the source code of web pages using cURL in PHP, but it only works for a few pages; for the rest, the file is empty.
I googled it but I'm not finding a solution.
My source code is:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $strurl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch,CURLOPT_USERAGENT, 'CURL via PHP');
$out = curl_exec($ch);
$fp = fopen('f1.html', 'w');
fwrite($fp, $out);
fclose($fp);
curl_close($ch);
What options should I add? Where am I going wrong?
Please help.

Try setting a user-agent that suggests you're a browser. Some servers will block curl/wget/etc.
For example: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.152 Safari/537.22
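A minimal sketch of the download with a browser-like user agent, plus a guard so that an empty file is never written. The names save_body and download_page are hypothetical, and following redirects is an assumption rather than something the question's code does.

```php
<?php
// Hypothetical helper: write the body only if the download actually produced one.
function save_body(string $path, $body): bool
{
    if ($body === false || $body === '') {
        return false; // nothing to save; inspect curl_error() instead
    }
    return file_put_contents($path, $body) !== false;
}

// Sketch of the download step; not called here because it needs network access.
function download_page(string $url, string $userAgent)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,       // assumption: redirects should be followed
        CURLOPT_USERAGENT      => $userAgent, // browser-like string, as suggested above
    ]);
    $out = curl_exec($ch);
    curl_close($ch);
    return $out;
}

// Offline demonstration of the guard using a temporary file.
$path = tempnam(sys_get_temp_dir(), 'page');
$savedOk = save_body($path, '<html>demo</html>');
$savedEmpty = save_body($path, false);
$contents = file_get_contents($path);
unlink($path);

var_dump($savedOk, $savedEmpty);
```

In the question's script this would read save_body('f1.html', download_page($strurl, $userAgent)), where $userAgent holds a string like the example above.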
