php download all images of webpage from link

When I download the HTML using curl or file_get_contents, I don't get the <img src=...> tags.
Is it because the images appear after some delay? Here is the site: https://www.tumbex.com/memes.tumblr/posts?page=2
Here is the code (first try):
$html = file_get_contents('https://www.tumbex.com/memes.tumblr/posts?page=2');
And the code (second try):
$html = get_dataa('https://www.tumbex.com/memes.tumblr/posts?page=2');
echo($html);
function get_dataa($url) {
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST,false);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER,false);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}

Press F12 in Chrome and open the Network tab.
Find the tumbex API URL there and copy it together with its request headers.
Once you have that, you can use curl against the API URL to get the response.
Here is my code:
<?php
$page = 1; //change number of page here
$url = "https://api.1.tumbex.com/api/tumblr/posts?tumblr=memes&type=posts&page=$page&tag=";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "GET");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$headers = array(
'Authorization: Bearer 0fae0f237b33e781a6884295b39c6e903484ef1ee3190bd51f07dd9881bdccbd',
'content-type: application/json; charset=UTF-8',
'Referer: https://www.tumbex.com/',
'Accept: application/json, text/plain, */*',
'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36',
'x-csrf-token: MVMyb2hLTWtQdEJEYjJ0SER1dEwvZz09',
'x-requested-with: XMLHttpRequest'
);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$res = curl_exec($ch); //result is json
$json = json_decode($res, true);
$edan = $json['response']['posts'];
for($i=0; $i<count($edan); $i++){
$get_post = $edan[$i];
$type = $get_post['detected_type']; //get type
//$get_photo = $get_post['blocks'][0]['content'][0]['hd']; -> get image url
//$get_video = $get_post['blocks'][0]['content'][0]['media']['url']; -> get video url
//$get_text1 = $get_post['blocks'][0]['content'][0]['text']; -> get text 1
//$get_text2 = $get_post['blocks'][0]['content'][1]['text']; -> get text 2
if($type == 'photo'){
$get_photo = $get_post['blocks'][0]['content'][0]['hd'];
echo "<img src='".$get_photo."' height='120' width='160'><br>";
}
}
Note that the Bearer token and x-csrf-token change regularly. If the result is blank, the Bearer token and x-csrf-token have expired. You can update them manually, or use another curl request to fetch the Bearer token and x-csrf-token automatically.
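
As a small sketch of the extraction step above, the loop can be wrapped in a function that collects the photo URLs instead of echoing them. The function name extract_photo_urls and the sample data are hypothetical; the array layout (detected_type, blocks, content, hd) is assumed to match what the answer's code reads from the API response.

```php
<?php
declare(strict_types=1);

/**
 * Collect photo URLs from the decoded "posts" array.
 * Assumes the layout used in the answer above:
 * $post['detected_type'] and $post['blocks'][0]['content'][0]['hd'].
 */
function extract_photo_urls(array $posts): array
{
    $urls = [];
    foreach ($posts as $post) {
        // Skip anything that is not marked as a photo post.
        if (($post['detected_type'] ?? null) !== 'photo') {
            continue;
        }
        // Missing keys yield null here instead of raising a warning.
        $url = $post['blocks'][0]['content'][0]['hd'] ?? null;
        if (is_string($url) && $url !== '') {
            $urls[] = $url;
        }
    }
    return $urls;
}

// Hypothetical sample data shaped like $json['response']['posts'].
$samplePosts = [
    ['detected_type' => 'photo', 'blocks' => [['content' => [['hd' => 'https://example.com/a.jpg']]]]],
    ['detected_type' => 'text',  'blocks' => [['content' => [['text' => 'hello']]]]],
    ['detected_type' => 'photo', 'blocks' => []], // photo post without an image URL
];

print_r(extract_photo_urls($samplePosts));
```

Each returned URL could then be downloaded with a separate curl request or printed as an <img> tag, as in the original loop.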

Related

Whatsapp Business Cloud API returning empty string when trying to download media

I'm using the following PHP code after successfully retrieving the media URL and storing it in the $mediaURL variable for the file request, but it returns an empty string. I already tried with Postman, and it returns a 500 Internal Server Error.
** Edited **
self::writeLog('Media URL: '.$mediaURL);
self::writeLog('Preparing to download media - id: '.$media_id);
$curl = curl_init($mediaURL);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$headers = array(
"Authorization: Bearer ".self::$auth_token,
);
curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
if (($resp = curl_exec($curl)) === false) {
self::writeLog('cURL Error: '.curl_error($curl));
} else if ($resp == '') {
self::writeLog('Empty string.');
self::writeLog('URL: '.$mediaURL);
self::writeLog('Headers: '.$headers[0]);
} else {
self::writeLog($resp);
}
writeLog is just a method that I use to write these messages to a text file.

I faced this issue before. The cause was not passing a User-Agent to the API, which led to wrong values being returned.
I used the following method to download WhatsApp Cloud media, and it works fine for me:
public function grabImage(string $url,string $saveto) : bool {
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 400);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch,CURLOPT_CUSTOMREQUEST , "GET");
curl_setopt($ch,CURLOPT_ENCODING , "");
$headers = [];
$headers[] = "Authorization: Bearer " . self::$adminToken;
$headers[] = "Accept-Language:en-US,en;q=0.5";
$headers[] = "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36";
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$raw = curl_exec($ch);
$httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
if ((int)$httpcode == 200) {
// save the downloaded content ($raw) to the target file
return file_put_contents($saveto, $raw) !== false;
}
return false;
}
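
Before writing the response to disk, it can help to check that it actually looks like media, since the question above describes an empty string and a 500 error. The following is a minimal sketch; the function name looks_like_media is hypothetical, and treating JSON or HTML bodies as error payloads is an assumption, not documented API behaviour.

```php
<?php
declare(strict_types=1);

/**
 * Rough check that a response is a media file rather than an error payload.
 * Assumption: error responses arrive as JSON or HTML, media as anything else.
 */
function looks_like_media(int $httpCode, string $contentType, string $body): bool
{
    if ($httpCode !== 200 || $body === '') {
        return false;
    }
    // Drop parameters such as "; charset=utf-8" before comparing.
    $type = strtolower(trim(explode(';', $contentType)[0]));
    return !in_array($type, ['application/json', 'text/html'], true);
}

// With curl, the inputs could come from:
//   $httpCode    = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
//   $contentType = (string) curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
var_dump(looks_like_media(200, 'image/jpeg', "\xFF\xD8\xFF"));
var_dump(looks_like_media(500, 'application/json', '{"error":{}}'));
```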

Curl in PHP like a Real Browser Still Detected as a Bot

I'm trying to just get the HTML from a page. I have added every header and SSL option I could think of to curl, but the site still detects the request as a curl bot. How can I bypass this, and how do they detect it?
When I visit other pages on the site I am not detected as a bot; it only happens on the search page.
$url = "https://suchen.mobile.de/fahrzeuge/search.html?damageUnrepaired=NO_DAMAGE_UNREPAIRED&isSearchRequest=true&maxPowerAsArray=PS&minPowerAsArray=PS&scopeId=C";
$data = curl($url);
echo $data;
function curl($url, $post = "") {
$cookie = "cookie.txt";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_ENCODING, '');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36');
curl_setopt($ch, CURLOPT_HTTPHEADER, array('authority: suchen.mobile.de', 'path: /fahrzeuge/search.html?damageUnrepaired=NO_DAMAGE_UNREPAIRED&isSearchRequest=true&maxPowerAsArray=PS&minPowerAsArray=PS&scopeId=C', 'scheme: https', 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8', 'accept-encoding: gzip, deflate, br', 'accept-language: en-US,en;q=0.9', 'upgrade-insecure-requests: 1'));
$data = curl_exec ($ch);
if (curl_error($ch))
return "Bad";
if (curl_getinfo($ch)["http_code"] == 200)
return $data;
}

PHP Curl request response 403 - not authorized

I'm trying to run a PHP curl GET request against https://www.submarino.com.br, but I'm getting 403 (Forbidden). I then used the Chrome developer tools to copy the request as a curl command and ran it in bash, and got the same response.
My PHP code
public function GetStatusHost(){
$headers = Array();
$headers[] = 'Host:www.submarino.com.br';
$headers[] = 'Connection:keep-alive';
$headers[] = 'Cache-Control:max-age=0';
$headers[] = 'Upgrade-Insecure-Requests: 1';
$headers[] = 'Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8';
$headers[] = 'Accept-Language:pt-BR,pt;q=0.9,en-US;q=0.8,en;q=0.7,es;q=0.6,pt-PT;q=0.5';
$headers[] = 'Accept-Encoding:gzip, deflate, br';
$headers[] = "If-None-Match:W/'5beff-TgqN41ZtNiOTyAH2bpA4lvYiKTE'";
$ch = curl_init($this->GetKeywordUrl());
$tmp = "/home/leilao/public_html/tmp/curl_cookie/cookie.txt";
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36");
curl_setopt($ch, CURLOPT_HTTPHEADER,$headers);
//curl_setopt($ch, CURLOPT_COOKIEJAR, $tmp);
//curl_setopt($ch, CURLOPT_COOKIEFILE,$tmp);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
$result = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch); echo $info['http_code']; exit;
return $info['http_code'];
}
Bash
Any ideas on how to fix this problem?
Response using header SomeHugeOAuthaccess_tokenThatIReceivedAsAString:

Maybe the site (Host: www.submarino.com.br) requires authentication credentials to grant access.
Try adding these lines with your own credentials:
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
curl_setopt($ch, CURLOPT_USERPWD, "username:password"); // your credentials go here
or, if an access token is sent in the Authorization header, you can use this instead:
$headers[] = "Authorization: Bearer $access_token";
or
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Authorization: OAuth ' . $atoken));
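
To make the header formats above concrete, here is a small sketch that builds the Authorization header lines by hand. The helper names basic_auth_header and bearer_auth_header are hypothetical. Basic authentication is the base64 encoding of "username:password", which is what CURLOPT_USERPWD with CURLAUTH_BASIC sends for you.

```php
<?php
declare(strict_types=1);

// Build a Basic auth header line: base64("user:password").
function basic_auth_header(string $user, string $password): string
{
    return 'Authorization: Basic ' . base64_encode($user . ':' . $password);
}

// Build a Bearer token header line.
function bearer_auth_header(string $token): string
{
    return 'Authorization: Bearer ' . $token;
}

// Each header must be a single "Name: value" string in the array
// passed to CURLOPT_HTTPHEADER.
$headers = [
    basic_auth_header('user', 'pass'),
];
echo $headers[0], PHP_EOL; // prints "Authorization: Basic dXNlcjpwYXNz"
```

Whether the target site accepts any of these schemes is an assumption; a 403 can also come from checks unrelated to credentials.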

file_get_contents not working in wayfair page

I am having a problem with PHP file_get_contents. I am trying to fetch information from the following URL, but I get a captcha page instead.
$link = 'http://www.wayfair.com/a/product_review_page/get_update_reviews_json?_format=json&product_sku=KUS1523&page_number=5&sort_order=relevance&filter_rating=&filter_tag=&item_per_page=5';
$Page_information = file_get_contents($link);
print_r($Page_information);
I also tried to get the page using PHP curl, but the same captcha page is displayed.
$cookie='cookie.txt';
if(!file_exists($cookie)){
$fh = fopen($cookie, "w");
fwrite($fh, "");
fclose($fh);
}
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_URL, "http://www.wayfair.com/a/product_review_page/get_update_reviews_json?_format=json&product_sku=KUS1523&page_number=5&sort_order=relevance&filter_rating=&filter_tag=&item_per_page=5");
curl_setopt($ch, CURLOPT_BINARYTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_COOKIE,1);
curl_setopt($ch, CURLOPT_COOKIEJAR,$cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE,$cookie);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
$result11 = curl_exec($ch);
print_r($result11);

If you analyze the headers sent by a browser with cookies and JavaScript disabled, you should see the bare minimum that is sent. Some, perhaps all, of these headers might be required; they can be set with the context argument.
/* set the options for the stream context */
$args=array(
'http'=>array(
'method' => "GET",
'header' => array(
'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0',
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Host: www.wayfair.com',
'Accept-Encoding: gzip, deflate'
)
)
);
/* create the context */
$context=stream_context_create( $args );
$link = 'http://www.wayfair.com/a/product_review_page/get_update_reviews_json?_format=json&product_sku=KUS1523&page_number=5&sort_order=relevance&filter_rating=&filter_tag=&item_per_page=5';
/* Get the response from remote url */
$res = file_get_contents( $link, false, $context );
/* process the response */
print_r( $res );
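
When a request made through a stream context still returns a captcha page, checking the HTTP status code can help narrow things down. After file_get_contents() on an http URL, PHP fills the local variable $http_response_header with the raw response header lines. The sketch below parses the last status line from such an array; the function name last_status_code and the sample headers are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Return the status code of the final response in a header list,
 * or null if no status line is present. With redirects the list
 * contains several status lines, so the last one wins.
 */
function last_status_code(array $responseHeaders): ?int
{
    $code = null;
    foreach ($responseHeaders as $line) {
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m) === 1) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Hypothetical header lines, shaped like $http_response_header.
$sampleHeaders = [
    'HTTP/1.1 301 Moved Permanently',
    'Location: https://www.example.com/',
    'HTTP/1.1 200 OK',
    'Content-Type: application/json',
];

var_dump(last_status_code($sampleHeaders)); // int(200)
```

In real use this would be called as last_status_code($http_response_header) right after file_get_contents(). Setting 'ignore_errors' => true in the 'http' context options makes file_get_contents() return the body even for 4xx and 5xx responses, so the status code and the body can be inspected together.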

$url = "http://www.wayfair.com/a/product_review_page/get_update_reviews_json?_format=json&product_sku=KUS1523&page_number=5&sort_order=relevance&filter_rating=&filter_tag=&item_per_page=5";
$cookie = getcwd().DIRECTORY_SEPARATOR.'cookie.txt';
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_COOKIE,1);
curl_setopt($ch, CURLOPT_COOKIEJAR,$cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE,$cookie);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
//added
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36");
$result11 = curl_exec($ch);
print_r($result11);
Try this; the only change is the added User-Agent option.

How to do a GET request on PHP using CURL

I have a simple GET request that I am trying to make and get the results back. I have tried it in Postman without any headers or body and it works just fine. I have even put it in my browser and it returns a good result. But, when I do it in PHP I am not getting anything. This is what my code looks like. What am I doing wrong?
$curl = curl_init();
curl_setopt($curl,CURLOPT_URL,'http://********/vizportal/api/web/v1/auth/kerberosLogin');
curl_setopt($curl,CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_POST, 0);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, '20');
$resp = curl_exec($curl);
echo $resp;

Use these headers so that the request looks like a browser request to the server:
$curl = curl_init('http://********/vizportal/api/web/v1/auth/kerberosLogin');
curl_setopt($curl, CURLOPT_POST, 0);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, '20');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
// curl_setopt($curl, CURLOPT_HEADER, true);
// curl_setopt($curl, CURLINFO_HEADER_OUT, true); // enable tracking
curl_setopt($curl, CURLOPT_HTTPHEADER, array(
'Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Accept-Encoding:gzip, deflate, sdch',
'Accept-Language:en-US,en;q=0.6',
'Cache-Control:max-age=0',
'Connection:keep-alive',
'Host:www.********.tld', // for example: www.google.com
'Referer: http://********/vizportal/api/web/v1/auth/kerberosLogin',
'Upgrade-Insecure-Requests:1',
'User-Agent:Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36',
));
$response = curl_exec($curl);
curl_close($curl);
