PHP check download link without downloading the file - php

On my site I have a couple of links for downloading files, and I want to make a PHP script that checks whether a download link is still online.
This is the code I'm using:
$cl = curl_init($url);
curl_setopt($cl, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($cl, CURLOPT_HEADER, true);
curl_setopt($cl, CURLOPT_NOBODY, true);
curl_setopt($cl, CURLOPT_RETURNTRANSFER, true);
if (!curl_exec($cl)) {
    echo 'The download link is offline';
    die();
}
$code = curl_getinfo($cl, CURLINFO_HTTP_CODE);
if ($code != 200) {
    echo 'The download link is offline';
} else {
    echo 'The download link is online!';
}
The problem is that it downloads the whole file, which makes it really slow, and I only need to check the headers. I saw that cURL has a CURLOPT_CONNECT_ONLY option, but the web host I'm using runs PHP 5.4, which doesn't have that option. Is there any other way I can do this?

CURLOPT_CONNECT_ONLY would be good, but it's only available in PHP 5.5 and above. So instead, try using get_headers. Or use another method with fopen, stream_context_create & stream_get_meta_data. First, the get_headers method:
// Set a test URL.
$url = "https://www.google.com/";

// Get the headers.
$headers = get_headers($url);

// Check if the headers are empty.
if (empty($headers)) {
    echo 'The download link is offline';
    die();
}

// Use a regex to see if the response code is 200.
preg_match('/\b200\b/', $headers[0], $matches);

// Act on whether the matches are empty or not.
if (empty($matches)) {
    echo 'The download link is offline';
} else {
    echo 'The download link is online!';
}

// Dump the array of headers for debugging.
echo '<pre>';
print_r($headers);
echo '</pre>';

// Dump the array of matches for debugging.
echo '<pre>';
print_r($matches);
echo '</pre>';
And the output of this—including the dumps used for debugging—would be:
The download link is online!
Array
(
[0] => HTTP/1.0 200 OK
[1] => Date: Sat, 14 Jun 2014 15:56:28 GMT
[2] => Expires: -1
[3] => Cache-Control: private, max-age=0
[4] => Content-Type: text/html; charset=ISO-8859-1
[5] => Set-Cookie: PREF=ID=6e3e1a0d528b0941:FF=0:TM=1402761388:LM=1402761388:S=4YKP2U9qC6aMgxpo; expires=Mon, 13-Jun-2016 15:56:28 GMT; path=/; domain=.google.com
[6] => Set-Cookie: NID=67=Wun72OJYmuA_TQO95WXtbFOK5g-xU53PQZ7dAIBtzCaBWxhXzduHQZfBVPf4LpaK3MVH8ZKbrBIc3-vTKuMlEnMdpWH0mcft5pA_0kCoe4qolDmednpPJqezZF_HyfXD; expires=Sun, 14-Dec-2014 15:56:28 GMT; path=/; domain=.google.com; HttpOnly
[7] => P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
[8] => Server: gws
[9] => X-XSS-Protection: 1; mode=block
[10] => X-Frame-Options: SAMEORIGIN
[11] => Alternate-Protocol: 443:quic
)
Array
(
[0] => 200
)
And here is another method using fopen, stream_context_create & stream_get_meta_data. The benefit of this method is that it gives you a bit more information about what actions were taken to fetch the URL, in addition to the headers:
// Set a test URL.
$url = "https://www.google.com/";

// Set the stream_context_create options.
$opts = array(
    'http' => array(
        'method' => 'HEAD'
    )
);

// Create a context stream with stream_context_create.
$context = stream_context_create($opts);

// Use fopen with rb (read binary) set and the context set above.
$handle = fopen($url, 'rb', false, $context);

// Bail out if the URL could not be opened at all.
if ($handle === false) {
    echo 'The download link is offline';
    die();
}

// Get the headers with stream_get_meta_data.
$headers = stream_get_meta_data($handle);

// Close the fopen handle.
fclose($handle);

// Use a regex to see if the response code is 200.
preg_match('/\b200\b/', $headers['wrapper_data'][0], $matches);

// Act on whether the matches are empty or not.
if (empty($matches)) {
    echo 'The download link is offline';
} else {
    echo 'The download link is online!';
}

// Dump the array of headers for debugging.
echo '<pre>';
print_r($headers);
echo '</pre>';
And here is the output of that:
The download link is online!
Array
(
[wrapper_data] => Array
(
[0] => HTTP/1.0 200 OK
[1] => Date: Sat, 14 Jun 2014 16:14:58 GMT
[2] => Expires: -1
[3] => Cache-Control: private, max-age=0
[4] => Content-Type: text/html; charset=ISO-8859-1
[5] => Set-Cookie: PREF=ID=32f21aea66dcfd5c:FF=0:TM=1402762498:LM=1402762498:S=NVP-y-kW9DktZPAG; expires=Mon, 13-Jun-2016 16:14:58 GMT; path=/; domain=.google.com
[6] => Set-Cookie: NID=67=mO_Ihg4TgCTizpySHRPnxuTp514Hou5STn2UBdjvkzMn4GPZ4e9GHhqyIbwap8XuB8SuhjpaY9ZkVinO4vVOmnk_esKKTDBreIZ1sTCsz2yusNLKA9ht56gRO4uq3B9I; expires=Sun, 14-Dec-2014 16:14:58 GMT; path=/; domain=.google.com; HttpOnly
[7] => P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
[8] => Server: gws
[9] => X-XSS-Protection: 1; mode=block
[10] => X-Frame-Options: SAMEORIGIN
[11] => Alternate-Protocol: 443:quic
)
[wrapper_type] => http
[stream_type] => tcp_socket/ssl
[mode] => rb
[unread_bytes] => 0
[seekable] =>
[uri] => https://www.google.com/
[timed_out] =>
[blocked] => 1
[eof] =>
)

Try adding curl_setopt($cl, CURLOPT_CUSTOMREQUEST, 'HEAD'); to send a HEAD request.
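A minimal sketch of that suggestion (the helper name linkIsOnline is mine, not from the answer): combining CURLOPT_NOBODY with an explicit HEAD verb means only headers come back over the wire.

```php
<?php
// Sketch: issue a HEAD request with cURL so no response body is
// transferred, then inspect the status code. Helper name is hypothetical.
function linkIsOnline($url)
{
    $cl = curl_init($url);
    curl_setopt($cl, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($cl, CURLOPT_NOBODY, true);          // skip the body
    curl_setopt($cl, CURLOPT_CUSTOMREQUEST, 'HEAD'); // explicit HEAD verb
    curl_setopt($cl, CURLOPT_RETURNTRANSFER, true);
    if (curl_exec($cl) === false) {
        curl_close($cl);
        return false; // connection failed entirely
    }
    $code = curl_getinfo($cl, CURLINFO_HTTP_CODE);
    curl_close($cl);
    return $code == 200;
}
```

Note that some servers answer HEAD requests differently from GET (or not at all), so a 200-on-GET link can still fail this check.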

Related

get_headers() used on live site is not returning any array but on localhost it is

When I use the function get_headers($url) with $url = "https://www.example.com/product.php?id=15" on my live site, it does not return any array from the given URL; I get nothing. But when the same code is used on my localhost, I get the following:
Array
(
[0] => HTTP/1.1 200 OK
[1] => Cache-Control: private
[2] => Content-Type: text/html; charset=utf-8
[3] => Server: Microsoft-IIS/8.5
[4] => Set-Cookie: ASP.NET_SessionId=wumg0dyscw3c4pmaliwehwew; path=/; HttpOnly
[5] => X-AspNetMvc-Version: 4.0
[6] => X-AspNet-Version: 4.0.30319
[7] => X-Powered-By: ASP.NET
[8] => Date: Fri, 18 Aug 2017 13:06:18 GMT
[9] => Connection: close
[10] => Content-Length: 73867
)
So, why is the function not working on live?
EDIT
<?php
if (isset($_POST['prdurl'])) {
    $url = $_POST['prdurl'];
    print_r(get_headers($url)); // not getting any array on live, but it works on localhost
    if (is_array(@get_headers($url))) {
        // some code goes here...
    } else {
        echo "URL doesn't exist!";
    }
}
?>
One more thing to note here is that I'm using file_get_html to retrieve the HTML page from the remote URL. It works on my localhost but not on live, either.
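No answer to this question appears in this excerpt, but one common cause worth ruling out (an assumption on my part, not from the original thread) is that the live host disables allow_url_fopen, which get_headers() on a URL (and file_get_html, via file_get_contents) depends on. A quick diagnostic:

```php
<?php
// Hypothetical diagnostic: URL-based get_headers()/file_get_contents()
// require allow_url_fopen; many shared hosts disable it in production
// even though typical local installs have it enabled.
if (ini_get('allow_url_fopen')) {
    echo "allow_url_fopen is enabled\n";
} else {
    echo "allow_url_fopen is disabled; get_headers() on URLs will fail\n";
}
```

If it is disabled and cannot be changed in php.ini or .htaccess, the cURL extension is the usual workaround.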

preg_match() expects parameter 2 to be string, array Given not working [duplicate]

This question already has answers here:
Reference - What does this error mean in PHP?
(38 answers)
Closed 3 years ago.
<?php
$a = "https://sayat.me/chitmarike";
$html = file_get_contents($a);
$headers = get_headers($a);
preg_match('~id="bar" value="([^"]*)"~', $html, $img);
$img1 = $img[1];
echo $img1;
preg_match('/(?<=csam=).*?(?=;)/', $headers, $cook);
$cook1 = $cook[1];
echo $cook1;
?>
I want to extract the value of csam from the cookie header.
This is what it looks like:
Array
(
[0] => HTTP/1.1 200 OK
[1] => Date: Fri, 07 Apr 2017 19:05:03 GMT
[2] => Content-Type: text/html; charset=UTF-8
[3] => Connection: close
[4] => Set-Cookie: __cfduid=d6dea25f00686a7cef5f0a3d21195207c1491599902; expires=Sat, 07-Apr-18 19:05:23 GMT; path=/; domain=.sayat.me; HttpOnly
[5] => Set-Cookie: PHPSESSID=m3hvgquu2vtcp9ingqmkttqgs2; path=/
[6] => Expires: Thu, 19 Nov 1981 08:52:00 GMT
[7] => Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
[8] => Pragma: no-cache
[9] => Set-Cookie: csam=5844bc1d44; expires=Fri, 07-Apr-2017 19:35:36 GMT; Max-Age=1800; path=/
[10] => X-CSRF-Protection: SAM v2.0
[11] => Set-Cookie: country=IN; expires=Sun, 07-May-2017 19:05:36 GMT; Max-Age=2592000; path=/
[12] => Vary: Accept-Encoding
[13] => McID: sam-web4
[14] => Server: cloudflare-nginx
[15] => CF-RAY: 34bf420dae9069fb-LHR
)
But I am getting this error:
Warning: preg_match() expects parameter 2 to be string, array given in
C:\xampp\htdocs\sayat\index.php on line 9
What am I doing wrong?
Spending time reading and properly understanding an error message isn't wasted time. Error messages are simple and clear: preg_match() expects parameter 2 to be string, array given. Conclusion: in preg_match('/(?<=csam=).*?(?=;)/', $headers, $cook);, $headers is an array, while preg_match expects the subject (the second parameter) to be a string, nothing more, nothing less.
The problem: $headers is filled by get_headers, which returns an array. There are two possible ways to solve it:
implode the array and search the resulting string with your pattern, or rewrite the pattern like this: /csam=\K[^;]+/
set the second parameter of get_headers to 1 and use the array structure to find the information you want.
Example:
$a = "https://sayat.me/chitmarike";
$headers = get_headers($a, 1);
foreach ($headers['Set-Cookie'] as $v) {
    if (strpos($v, 'csam=') === 0) {
        $cook = substr($v, 5, strpos($v, ';') - 5);
        break;
    }
}
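The first option (implode plus the /csam=\K[^;]+/ pattern mentioned above) could look like this sketch, run here against a hard-coded header array copied from the question rather than a live request:

```php
<?php
// Sketch of the implode approach: join the header lines into one string,
// then use \K to drop the "csam=" prefix from the match. Header lines
// are copied from the question, not fetched live.
$headers = array(
    'HTTP/1.1 200 OK',
    'Set-Cookie: PHPSESSID=m3hvgquu2vtcp9ingqmkttqgs2; path=/',
    'Set-Cookie: csam=5844bc1d44; expires=Fri, 07-Apr-2017 19:35:36 GMT; Max-Age=1800; path=/',
);
if (preg_match('/csam=\K[^;]+/', implode("\n", $headers), $m)) {
    echo $m[0], "\n"; // 5844bc1d44
}
```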
The get_headers function returns an array, but preg_match requires a string, as the error says.
Concatenate the result of get_headers before calling preg_match.

how to identify the web server name of remote host

According to this solution (link), it shows how to get the web server name for a local web server, but how do I do the same for a remote server by URL?
i.e. $_SERVER['SERVER_SOFTWARE'] returns a name like Apache/2.2.21 (Win32) PHP/5.3.10.
How can I apply this solution to a remote server? Example here: http://browserspy.dk/webserver.php
I want to be able to specify the host name of the remote server, i.e. $url = 'www.domain.com';, and get the web server name as shown above for the host specified in $url.
I am only interested in the web server name.
One method of doing this is using PHP's get_headers() function, which returns the web server's response headers:
$url = 'http://php.net';
print_r(get_headers($url));
which will return
Array
(
[0] => HTTP/1.1 200 OK
[1] => Server: nginx/1.6.2
[2] => Date: Fri, 08 May 2015 13:21:44 GMT
[3] => Content-Type: text/html; charset=utf-8
[4] => Connection: close
[5] => X-Powered-By: PHP/5.6.7-1
[6] => Last-Modified: Fri, 08 May 2015 13:10:12 GMT
[7] => Content-language: en
[8] => X-Frame-Options: SAMEORIGIN
[9] => Set-Cookie: COUNTRY=NA%2C95.77.98.186; expires=Fri, 15-May-2015 13:21:44 GMT; Max-Age=604800; path=/; domain=.php.net
[10] => Set-Cookie: LAST_NEWS=1431091304; expires=Sat, 07-May-2016 13:21:44 GMT; Max-Age=31536000; path=/; domain=.php.net
[11] => Link: <http://php.net/index>; rel=shorturl
[12] => Vary: Accept-Encoding
)
As you can see, you have the Server header, which tells you that they are running nginx/1.6.2.
Or you can add the second parameter to the function, which returns the already-parsed headers:
$url = 'http://php.net';
$headers = get_headers($url, true);
echo $headers['Server']; // nginx/1.6.2
trainoasis is right, you can use :
$_SERVER['SERVER_SOFTWARE']
OR
$_SERVER['SERVER_SIGNATURE']
OR
gethostbyaddr($_SERVER['REMOTE_ADDR']);

Google Play links validation via PHP

I want to check via a script whether a Google Play link for an app is valid:
https://play.google.com/store/apps/details?id=com.ketchapp.zigzaggame - valid
https://play.google.com/store/apps/details?id=com.ketchapp.zigzaggamessdasd - invalid
but every script I bought or found for free gives me a 404 or 303 response. There is probably some redirect involved.
How can I validate links like that? I need to check some 1,000 links in my ad system to see whether the apps exist in the Google Play store.
I will write the loops, database reads, etc. myself, but please, someone familiar with PHP, help with the check. I spent some $300 on this and got cheated by two people whose scripts were "checking" the links. Always 404 or 303.
Try this:
<?php
/**
 * Check a Google Play app URL.
 *
 * @param string $url Url to check
 *
 * @return boolean True if it exists, false otherwise
 * @throws \Exception On a cURL error, an exception is thrown
 */
function checkGooglePlayApp($url)
{
    $curlOptions = array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CUSTOMREQUEST  => 'GET',
        CURLOPT_URL            => $url
    );
    $ch = curl_init();
    curl_setopt_array($ch, $curlOptions);
    $result = curl_exec($ch);
    $http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($curl_error = curl_error($ch)) {
        curl_close($ch);
        throw new \Exception($curl_error);
    }
    curl_close($ch);
    return $http_code == '200';
}
$url = 'https://play.google.com/store/apps/details?id=com.ketchapp.zigzaggameERRORERROR';
$result = checkGooglePlayApp($url);
var_dump($result); // Should return false
$url = 'https://play.google.com/store/apps/details?id=com.ketchapp.zigzaggame';
$result = checkGooglePlayApp($url);
var_dump($result); // Should return true
It will return:
bool(false)
bool(true)
This can be easily done with the get_headers function. For example:
Incorrect URL
$file = 'https://play.google.com/store/apps/details?id=com.ketchapp.zigzaggamessdasd';
$file_headers = get_headers($file);
print_r($file_headers);
Will return:
Array
(
[0] => HTTP/1.0 404 Not Found
[1] => Cache-Control: no-cache, no-store, max-age=0, must-revalidate
[2] => Pragma: no-cache
[3] => Expires: Fri, 01 Jan 1990 00:00:00 GMT
[4] => Date: Tue, 03 Mar 2015 04:23:31 GMT
[5] => Content-Type: text/html; charset=utf-8
[6] => Set-Cookie: NID=67=QFThy03gh34QypYfoLFTz7bJDI-qzXvuzI05DtrF3aVs1L7NJO9byV6kemHRVVkViz-sodx3Z0GuCQTu9a_1JvToen6ZtjfhNy8MH6DDgH6zix2I4Gm9mauBPCxipnlG;Domain=.google.com;Path=/;Expires=Wed, 02-Sep-2015 04:23:31 GMT;HttpOnly
[7] => P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
[8] => X-Content-Type-Options: nosniff
[9] => X-Frame-Options: SAMEORIGIN
[10] => X-XSS-Protection: 1; mode=block
[11] => Server: GSE
[12] => Alternate-Protocol: 443:quic,p=0.08
[13] => Accept-Ranges: none
[14] => Vary: Accept-Encoding
)
If the file does exist, will return:
Array
(
[0] => HTTP/1.0 200 OK
[1] => Content-Type: text/html; charset=utf-8
[2] => Set-Cookie: PLAY_PREFS=CgJVUxC6uYnvvSkourmJ770p:S:ANO1ljKvPst7-nSw; Path=/; Secure; HttpOnly
[3] => Set-Cookie: NID=67=iFUl_Ls8EhAJE7STIJD7Wdq6NF-y4i6Xrlb78My75ZaruVWlAKObDRDNGDddGxD0hSsLRpvrQK7Tp5nuKCgGg2jF1GUf9_4H_zYsUDQ548Be2n8EDjp9clDfXKLYjmSg;Domain=.google.com;Path=/;Expires=Wed, 02-Sep-2015 04:26:14 GMT;HttpOnly
[4] => Cache-Control: no-cache, no-store, max-age=0, must-revalidate
[5] => Pragma: no-cache
[6] => Expires: Fri, 01 Jan 1990 00:00:00 GMT
[7] => Date: Tue, 03 Mar 2015 04:26:14 GMT
[8] => P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
[9] => X-Content-Type-Options: nosniff
[10] => X-Frame-Options: SAMEORIGIN
[11] => X-XSS-Protection: 1; mode=block
[12] => Server: GSE
[13] => Alternate-Protocol: 443:quic,p=0.08
[14] => Accept-Ranges: none
[15] => Vary: Accept-Encoding
)
So you can create a script like:
<?php
$files = array(
    'https://play.google.com/store/apps/details?id=com.ketchapp.zigzaggame',
    'https://play.google.com/store/apps/details?id=com.ketchapp.zigzaggamesadasd'
);
foreach ($files as $file) {
    $headers = get_headers($file);
    if ($headers[0] == 'HTTP/1.0 404 Not Found') {
        echo "$file is invalid\n";
    } else {
        echo "$file is valid\n";
    }
}
?>
?>
You can simply do:
function checkGooglePlayApp($url)
{
    $headers = get_headers($url);
    return $headers[0] == 'HTTP/1.0 404 Not Found';
}

$inValid = checkGooglePlayApp("https://play.google.com/store/apps/details?id=com.ketchapp.zigzaggame");
if (!$inValid) {
    echo "URL Valid";
} else {
    echo "URL Invalid";
}

Why is this returning a "Not Found" with PHP and cURL?

My script works with all other links I tried, and I get the same response with cURL as well (and this version is a lot smaller, so I like this code):
<?php
$url = $_GET['url'];
$header = get_headers($url, 1);
print_r($header);

function get_url($u, $h) {
    if (preg_match('/200/', $h[0])) {
        echo file_get_contents($u);
    } elseif (preg_match('/301/', $h[0])) {
        $nh = get_headers($h['Location']);
        get_url($h['Location'], $nh);
    }
}

get_url($url, $header);
?>
But for:
http://www.anthropologie.com/anthro/catalog/productdetail.jsp?subCategoryId=HOME-TABLETOP-UTENSILS&id=78110&catId=HOME-TABLETOP&pushId=HOME-TABLETOP&popId=HOME&sortProperties=&navCount=355&navAction=top&fromCategoryPage=true&selectedProductSize=&selectedProductSize1=&color=sil&colorName=SILVER&isProduct=true&isBigImage=&templateType=
And:
http://www.urbanoutfitters.com/urban/catalog/productdetail.jsp?itemdescription=true&itemCount=80&startValue=1&selectedProductColor=&sortby=&id=14135412&parentid=A_FURN_BATH&sortProperties=+subCategoryPosition,&navCount=56&navAction=poppushpush&color=&pushId=A_FURN_BATH&popId=A_DECORATE&prepushId=&selectedProductSize=
(and all Anthropologie product links). I'm assuming other sites I have not yet found act this way as well. Here is my header response:
Array
(
[0] => HTTP/1.1 200 OK
[Server] => Apache
[X-Powered-By] => Servlet 2.4; JBoss-4.2.0.GA_CP05 (build: SVNTag=JBPAPP_4_2_0_GA_CP05 date=200810231548)/JBossWeb-2.0
[X-ATG-Version] => version=RENTLUFEQyxBVEdQbGF0Zm9ybS85LjFwMSxBREMgWyBEUFNMaWNlbnNlLzAgIF0=
[Content-Type] => text/html;charset=ISO-8859-1
[Date] => Sat, 24 Jul 2010 23:47:47 GMT
[Content-Length] => 21669
[Connection] => keep-alive
[Set-Cookie] => Array
(
[0] => JSESSIONID=65CA111ADBF267A3B405C69A325576F8.app46-node2; Path=/
[1] => visitCount=1; Expires=Fri, 29-May-2026 00:41:07 GMT; Path=/
[2] => UOCCII:=; Expires=Mon, 23-Aug-2010 23:47:47 GMT; Path=/
[3] => LastVisited=2010-07-24; Expires=Fri, 29-May-2026 00:41:07 GMT; Path=/
)
)
I'm guessing maybe it has to do with the cookies? Any ideas?
Install Fiddler and see what is actually being sent.
You can also try setting your user agent to that of a real browser; sometimes sites try to prevent scraping by checking it.
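For the get_headers() version, the user-agent suggestion could be tried with a default stream context; a sketch (the UA string below is just an example, not a recommendation):

```php
<?php
// Sketch: give PHP's HTTP stream wrapper a browser-like User-Agent so
// that get_headers()/file_get_contents() send it with every request
// (example UA string only).
stream_context_set_default(array(
    'http' => array(
        'user_agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    ),
));

// Verify the option took effect before making any request.
$opts = stream_context_get_options(stream_context_get_default());
echo $opts['http']['user_agent'], "\n";

// From here on, e.g.: $headers = get_headers($url, 1);
```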
