cURL in PHP returns a weird string - php

Recently I wanted to scrape a website using cURL in PHP, and ran into a problem: it returns a weird combination of characters and symbols. I am really confused by it. I have set the encoding both in the request headers and with a curl option.
Here is the code I used to scrape:
$ch = curl_init();
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
//curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate,br');
curl_exec($ch);
curl_close($ch);
And these are the headers I sent:
$header = [
':authority: www.airpaz.com',
':method: GET',
':path: ' . $path,
':scheme: https',
'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'accept-encoding: gzip, deflate, br',
'accept-language: en-US,en;q=0.9',
'cache-control: max-age=0',
'referer: ' . $referer,
'upgrade-insecure-requests: 1',
'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
];
When I run it, the output looks exactly like the image below:
Can anyone tell me what the problem is? Thanks for your time. It would help me a lot.

Related

Instagram curl PHP 2023 post media url: /?__a=1&__d=dis

I am trying to fetch the item at the following link (example):
https://www.instagram.com/p/D?__a=1&__d=dis
After a few sessions I got the following message:
{"message":
"Please wait a few minutes before you try again.",
"require_login":true,
"status":"fail"
}
I am using cURL to try to fix this problem, but with no results. Any ideas?
<?php
header("Access-Control-Allow-Origin: *");
$arrSetHeaders = array(
'origin: https://www.instagram.com',
'authority: www.instagram.com',
'upgrade-insecure-requests: 1',
'Host: www.instagram.com',
'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
'accept-encoding: gzip, deflate, br',
'accept-language: en-GB,en;q=0.9',
'cache-control: max-age=0',
'sec-ch-ua: "Not_A Brand";v="99", "Google Chrome";v="109", "Chromium";v="109"',
'sec-fetch-dest: document',
'sec-fetch-mode: navigate',
'sec-fetch-site: same-origin',
);
$ch = curl_init("https://www.instagram.com/p/D?__a=1&__d=dis");
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36");
curl_setopt($ch, CURLOPT_COOKIEJAR, dirname(__FILE__) . "/temp/cookies.txt" );
curl_setopt($ch, CURLOPT_COOKIEFILE, dirname(__FILE__) . "/temp/cookies.txt" );
curl_setopt($ch, CURLOPT_HTTPHEADER, $arrSetHeaders); // curl_setopt() takes exactly three arguments
curl_setopt($ch, CURLOPT_REFERER, 'https://www.instagram.com/accounts/login/');
curl_setopt($ch, CURLOPT_COOKIE, "csrftoken=???; ds_user_id=???; sessionid=???"); // name=value pairs separated by "; "
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$data = curl_exec($ch);
echo '<pre style="word-wrap: break-word; white-space: pre-wrap;">' . $data . '</pre>';

CURL Returning Strange Characters

I'm trying to grab the source code of a website so I can parse out football fixtures. My code is:
<?php
$url = "https://www.bbc.co.uk/sport/football/scores-fixtures/2019-03-06";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2',
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language: en-gb,en;q=0.5',
'Accept-Encoding: gzip, deflate',
'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7',
'Proxy-Connection: Close',
'Cookie: PREF=ID=2bb051bfbf00e95b:U=c0bb6046a0ce0334:',
'Cache-Control: max-age=0',
'Connection: Close'
));
$output = curl_exec($ch);
curl_close($ch);
echo substr($output, 0, 12);
?>
Output of the substring shown is:
���
I need the output in standard text, is that compressed or something?
How do I fix this please?
Thanks.
Regarding "I need the output in standard text, is that compressed or something?":
Yes, exactly that: it's gzip-compressed. Your options are a) decompress it yourself, e.g. with gzdecode(), or b) tell the server you don't want a gzip-encoded response. The easiest way is to let curl handle this for you:
Delete 'Accept-Encoding: gzip, deflate', from your header array.
Add curl_setopt($ch, CURLOPT_ENCODING, 'identity'); somewhere before you call curl_exec().
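Option a) could be sketched roughly as follows, assuming PHP's zlib extension is available and the body really is gzip (not deflate or brotli). The helper name maybe_gunzip() is made up for illustration, and the sample body is generated locally with gzencode() as a stand-in for what curl_exec() returned.

```php
<?php
// Hypothetical helper: returns the decoded body if it looks like gzip,
// otherwise returns the input unchanged.
function maybe_gunzip(string $body): string
{
    // Every gzip stream starts with the two magic bytes 0x1f 0x8b.
    if (strncmp($body, "\x1f\x8b", 2) !== 0) {
        return $body;
    }
    $decoded = gzdecode($body);
    // gzdecode() returns false on corrupt input; fall back to the raw body.
    return $decoded === false ? $body : $decoded;
}

// Stand-in for a compressed response body returned by curl_exec().
$compressed = gzencode('<html><body>fixtures</body></html>');

echo maybe_gunzip($compressed), "\n";
echo maybe_gunzip('already plain text'), "\n";
```

Checking the magic bytes first means the same helper can be applied to every response, whether or not the server chose to compress it.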

Why does my sequence of curl requests through PHP return me a 504 gateway error?

I extract a set of almost 1500 records from a database table, and for each record I need to call an endpoint through cURL, like this:
for($i=0; $i <1500; $i++) {
$headers = [
'Host: www.hostname.it',
'Accept: application/json, text/javascript, */*; q=0.01',
'X-Requested-With: XMLHttpRequest',
'Accept-Language: it-it',
'Content-Type: application/x-www-form-urlencoded; charset=UTF-8',
'Origin: https://www.desiderimagazine.it',
'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.1 Safari/603.1.30',
'Connection: close',
'Referer: https://www.hpstname.it/page/registrazione?utm_source=gate2000&utm_medium=display&utm_campaign=ghh_gen18_regist&utm_content=leadcampaign2',
'Content-Length: '.mb_strlen($post_fields, '8bit')
];
$ch = curl_init($endpoint);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_ENCODING, "");
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/601.6.17 (KHTML, like Gecko) Version/9.1.1 Safari/601.6.17");
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 24000); // overall timeout, in seconds
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 24000); // connection-phase timeout, in seconds
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_fields); // POST fields
$result = curl_exec($ch);
// .. here I save the result on database
}
I run the script by entering its URL in the browser, and it works fine (I correctly see the results in the database and the endpoint response) for roughly the first 20-30 records. After that I systematically get a
504 error - Gateway error timeout
I suspect it could be the way I execute it, but there must be some configuration I can change on my code in order to fix it.
Thanks

php curl to instagram returns odd result

include_once('simple_html_dom.php');
$usuario = "username";
$password = "password";
$url = 'https://www.instagram.com/';
$url_login = 'https://www.instagram.com/accounts/login/ajax/';
$user_agent = array("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 ",
"(KHTML, like Gecko) Chrome/48.0.2564.103 Safari/537.36");
$ch = curl_init();
$headers = [
'Accept-Encoding: gzip, deflate',
'Accept-Language: en-US;q=0.6,en;q=0.4',
'Connection: keep-alive',
'Content-Length: 0',
'Host: www.instagram.com',
'Origin: https://www.instagram.com',
'Referer: https://www.instagram.com/',
'User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.103 Safari/537.36',
'X-Instagram-AJAX: 1',
'X-Requested-With: XMLHttpRequest'
];
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_COOKIEFILE, "/tmp/cookie/pruebalogininsta2.txt");
curl_setopt($ch, CURLOPT_REFERER, $sTarget);
curl_setopt($ch, CURLOPT_HEADER, TRUE);
$html = curl_exec($ch);
preg_match_all('/^Set-Cookie:\s*([^;]*)/mi', $html, $matches);
$cookies = array();
foreach($matches[1] as $item) {
parse_str($item, $cookie);
$cookies = array_merge($cookies, $cookie);
}
$headers = [
'Accept-Encoding: gzip, deflate',
//'Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4',
'Accept-Language: en-US;q=0.6,en;q=0.4',
'Connection: keep-alive',
'Content-Length: 0',
'Host: www.instagram.com',
'Origin: https://www.instagram.com',
'Referer: https://www.instagram.com/',
'User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.103 Safari/537.36',
'X-Instagram-AJAX: 1',
'X-Requested-With: XMLHttpRequest'
];
$cadena_agregar_vector = 'X-CSRFToken:'. $cookies["csrftoken"];
$headers[] = $cadena_agregar_vector ;
$sPost = "username=".$usuario . "&password=". $password ;
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_POSTFIELDS, $sPost);
curl_setopt($ch, CURLOPT_URL, $url_login);
$html2 = curl_exec($ch);
curl_setopt($ch, CURLOPT_URL, "http://www.instagram.com/");
$html4 = curl_exec($ch);
echo $html4;
This is what I get:
The problem is the way you hardcode Accept-Encoding: gzip, deflate. This does make curl send the encoding header, but it does not turn on curl's decoding feature, so you get the raw compressed data without curl decoding it for you.
Remove 'Accept-Encoding: gzip, deflate', and add curl_setopt($ch, CURLOPT_ENCODING, 'gzip, deflate'); and curl will decode it for you (provided that curl is compiled with gzip & deflate support). Or better yet, just do curl_setopt($ch, CURLOPT_ENCODING, ''); and curl will automatically list all the encodings it supports, so you don't run into the problem where curl isn't compiled with gzip support.
On an unrelated note, you probably want to use CURLOPT_USERAGENT rather than setting the User-Agent header manually. A UA string in the header list is lost as soon as you replace that list with another CURLOPT_HTTPHEADER call, while CURLOPT_USERAGENT is kept until curl_close($ch).
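As a rough sketch of both suggestions together (assuming the curl extension is loaded; the helper name make_handle() and the timeout values are made up for illustration), the handle could be configured like this:

```php
<?php
// Hypothetical helper: builds a handle that advertises compressed
// responses and lets curl decode them transparently.
function make_handle(string $url, string $userAgent)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        // Empty string: send every encoding this curl build supports
        // and decode the response automatically.
        CURLOPT_ENCODING       => '',
        // Set on the handle itself, independent of the header list.
        CURLOPT_USERAGENT      => $userAgent,
        // Illustrative timeouts so a dead connection does not hang the script.
        CURLOPT_CONNECTTIMEOUT => 5,
        CURLOPT_TIMEOUT        => 10,
        // Note: no Accept-Encoding and no User-Agent entries here.
        CURLOPT_HTTPHEADER     => [
            'Accept-Language: en-US;q=0.6,en;q=0.4',
        ],
    ]);
    return $ch;
}

$ch = make_handle(
    'https://www.instagram.com/',
    'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.103 Safari/537.36'
);
$html = curl_exec($ch);
if ($html === false) {
    echo 'curl error: ', curl_error($ch), "\n";
} else {
    // Already decoded by curl, so this prints readable markup.
    echo substr($html, 0, 200), "\n";
}
```

Later requests on the same handle can swap in a different CURLOPT_HTTPHEADER list without losing the user agent or the automatic decoding.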
Edit: in my first revision of this post I wrote CURLOPT_POSTFIELDS instead of CURLOPT_ENCODING; sorry, fixed that.
Edit 2: on another unrelated note, you're encoding the username/password incorrectly. Instead of $sPost = "username=".$usuario . "&password=". $password ;, do
$sPost=http_build_query(array('username'=>$usuario,'password'=>$password));, otherwise accounts with & or = or NUL bytes in the password or username won't work properly.
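To see why this matters, here is a small self-contained comparison. The credentials are made up and deliberately contain & and =, the characters that break naive concatenation.

```php
<?php
// Made-up credentials containing characters that are special in a query string.
$usuario  = 'user&name';
$password = 'p=ss&word';

// Naive concatenation: the server would see extra, bogus fields.
$naive = 'username=' . $usuario . '&password=' . $password;

// http_build_query() percent-encodes each key and value.
$sPost = http_build_query(['username' => $usuario, 'password' => $password]);

echo $naive, "\n"; // username=user&name&password=p=ss&word
echo $sPost, "\n"; // username=user%26name&password=p%3Dss%26word

// Round trip: parse_str() recovers the original values from the encoded form.
parse_str($sPost, $parsed);
var_dump($parsed['password'] === $password); // bool(true)
```

The encoded string can be passed to CURLOPT_POSTFIELDS as-is.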
The answer posted by @hanshenrik should really be accepted. But if you just want an easy solution that works and is not incorrect, remove 'Accept-Encoding: gzip, deflate' from your headers array.

videoslasher video download script php

I have a problem downloading a video from videoslasher.com. I wrote a script to get the source link, and it works fine.
Example link:
http://storage2.videoslasher.com/free/K/K1/K1Z6TBW88QCM.flv?h=XeahiwUFe_yHiduYx5EzBg&e=1377200810
But when I download it with the script, I get a 403 (Forbidden).
Code that downloads the video:
$headers[] = 'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0';
$headers[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8';
$headers[] = 'Accept-Language: pl,en-us;q=0.7,en;q=0.3';
$headers[] = 'Accept-Encoding: gzip, deflate';
$headers[] = 'Connection: keep-alive';
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_REFERER, 'http://www.videoslasher.com/static/player/flowplayer.commercial-3.2.7.swf');
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt( $ch, CURLOPT_HTTPHEADER, $headers );
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE,'cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt($ch, CURLOPT_BUFFERSIZE, 4096000);
echo curl_exec($ch);
curl_close($ch);
Does anyone know what could be the problem?
