PHP check download link without downloading the file - php

On my site I have a couple of links for downloading files, and I want to make a PHP script that checks whether a download link is still online.
This is the code I'm using:
$cl = curl_init($url);
curl_setopt($cl, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($cl, CURLOPT_HEADER, true);
curl_setopt($cl, CURLOPT_NOBODY, true);
curl_setopt($cl, CURLOPT_RETURNTRANSFER, true);
if (!curl_exec($cl)) {
    echo 'The download link is offline';
    die();
}
$code = curl_getinfo($cl, CURLINFO_HTTP_CODE);
if ($code != 200) {
    echo 'The download link is offline';
} else {
    echo 'The download link is online!';
}
The problem is that it downloads the whole file, which makes it really slow, and I only need to check the headers. I saw that cURL has a CURLOPT_CONNECT_ONLY option, but the web host I'm using runs PHP 5.4, which doesn't have that option. Is there any other way I can do this?

CURLOPT_CONNECT_ONLY would be good, but it's only available in PHP 5.5 and above. So instead, try using get_headers. Or use another method with fopen, stream_context_create & stream_get_meta_data. First, the get_headers method:
// Set a test URL.
$url = "https://www.google.com/";

// Get the headers.
$headers = get_headers($url);

// Check if the headers are empty.
if (empty($headers)) {
    echo 'The download link is offline';
    die();
}

// Use a regex to see if the response code is 200.
preg_match('/\b200\b/', $headers[0], $matches);

// Act on whether the matches are empty or not.
if (empty($matches)) {
    echo 'The download link is offline';
} else {
    echo 'The download link is online!';
}

// Dump the array of headers for debugging.
echo '<pre>';
print_r($headers);
echo '</pre>';

// Dump the array of matches for debugging.
echo '<pre>';
print_r($matches);
echo '</pre>';
And the output of this—including the dumps used for debugging—would be:
The download link is online!
Array
(
[0] => HTTP/1.0 200 OK
[1] => Date: Sat, 14 Jun 2014 15:56:28 GMT
[2] => Expires: -1
[3] => Cache-Control: private, max-age=0
[4] => Content-Type: text/html; charset=ISO-8859-1
[5] => Set-Cookie: PREF=ID=6e3e1a0d528b0941:FF=0:TM=1402761388:LM=1402761388:S=4YKP2U9qC6aMgxpo; expires=Mon, 13-Jun-2016 15:56:28 GMT; path=/; domain=.google.com
[6] => Set-Cookie: NID=67=Wun72OJYmuA_TQO95WXtbFOK5g-xU53PQZ7dAIBtzCaBWxhXzduHQZfBVPf4LpaK3MVH8ZKbrBIc3-vTKuMlEnMdpWH0mcft5pA_0kCoe4qolDmednpPJqezZF_HyfXD; expires=Sun, 14-Dec-2014 15:56:28 GMT; path=/; domain=.google.com; HttpOnly
[7] => P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
[8] => Server: gws
[9] => X-XSS-Protection: 1; mode=block
[10] => X-Frame-Options: SAMEORIGIN
[11] => Alternate-Protocol: 443:quic
)
Array
(
[0] => 200
)
And here is another method using fopen, stream_context_create & stream_get_meta_data. The benefit of this method is that it gives you a bit more information about what actions were taken to fetch the URL, in addition to the headers:
// Set a test URL.
$url = "https://www.google.com/";

// Set the stream_context_create options.
$opts = array(
    'http' => array(
        'method' => 'HEAD'
    )
);

// Create a context stream with stream_context_create.
$context = stream_context_create($opts);

// Use fopen with rb (read binary) set and the context set above.
$handle = fopen($url, 'rb', false, $context);

// Bail out if the URL could not be opened at all.
if ($handle === false) {
    echo 'The download link is offline';
    die();
}

// Get the headers with stream_get_meta_data.
$headers = stream_get_meta_data($handle);

// Close the fopen handle.
fclose($handle);

// Use a regex to see if the response code is 200.
preg_match('/\b200\b/', $headers['wrapper_data'][0], $matches);

// Act on whether the matches are empty or not.
if (empty($matches)) {
    echo 'The download link is offline';
} else {
    echo 'The download link is online!';
}

// Dump the array of headers for debugging.
echo '<pre>';
print_r($headers);
echo '</pre>';
And here is the output of that:
The download link is online!
Array
(
[wrapper_data] => Array
(
[0] => HTTP/1.0 200 OK
[1] => Date: Sat, 14 Jun 2014 16:14:58 GMT
[2] => Expires: -1
[3] => Cache-Control: private, max-age=0
[4] => Content-Type: text/html; charset=ISO-8859-1
[5] => Set-Cookie: PREF=ID=32f21aea66dcfd5c:FF=0:TM=1402762498:LM=1402762498:S=NVP-y-kW9DktZPAG; expires=Mon, 13-Jun-2016 16:14:58 GMT; path=/; domain=.google.com
[6] => Set-Cookie: NID=67=mO_Ihg4TgCTizpySHRPnxuTp514Hou5STn2UBdjvkzMn4GPZ4e9GHhqyIbwap8XuB8SuhjpaY9ZkVinO4vVOmnk_esKKTDBreIZ1sTCsz2yusNLKA9ht56gRO4uq3B9I; expires=Sun, 14-Dec-2014 16:14:58 GMT; path=/; domain=.google.com; HttpOnly
[7] => P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
[8] => Server: gws
[9] => X-XSS-Protection: 1; mode=block
[10] => X-Frame-Options: SAMEORIGIN
[11] => Alternate-Protocol: 443:quic
)
[wrapper_type] => http
[stream_type] => tcp_socket/ssl
[mode] => rb
[unread_bytes] => 0
[seekable] =>
[uri] => https://www.google.com/
[timed_out] =>
[blocked] => 1
[eof] =>
)

Try adding curl_setopt($cl, CURLOPT_CUSTOMREQUEST, 'HEAD'); to send a HEAD request.
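A minimal sketch of that suggestion (the helper name linkIsOnline is mine, not from the answer): combining CURLOPT_NOBODY with an explicit HEAD verb means only headers come back over the wire.

```php
<?php
// Sketch: issue a HEAD request with cURL so no response body is
// transferred, then inspect the status code. Helper name is hypothetical.
function linkIsOnline($url)
{
    $cl = curl_init($url);
    curl_setopt($cl, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($cl, CURLOPT_NOBODY, true);          // skip the body
    curl_setopt($cl, CURLOPT_CUSTOMREQUEST, 'HEAD'); // explicit HEAD verb
    curl_setopt($cl, CURLOPT_RETURNTRANSFER, true);
    if (curl_exec($cl) === false) {
        curl_close($cl);
        return false; // connection failed entirely
    }
    $code = curl_getinfo($cl, CURLINFO_HTTP_CODE);
    curl_close($cl);
    return $code == 200;
}
```

Note that some servers answer HEAD requests differently from GET (or not at all), so a 200-on-GET link can still fail this check.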

Related

get_headers() used on live site is not returning any array but on localhost it is

When I use the function get_headers($url) with $url = "https://www.example.com/product.php?id=15" on my live site, it does not return any array from the given URL; I get nothing. But when the same code is used on my localhost, I get the following:
Array
(
[0] => HTTP/1.1 200 OK
[1] => Cache-Control: private
[2] => Content-Type: text/html; charset=utf-8
[3] => Server: Microsoft-IIS/8.5
[4] => Set-Cookie: ASP.NET_SessionId=wumg0dyscw3c4pmaliwehwew; path=/; HttpOnly
[5] => X-AspNetMvc-Version: 4.0
[6] => X-AspNet-Version: 4.0.30319
[7] => X-Powered-By: ASP.NET
[8] => Date: Fri, 18 Aug 2017 13:06:18 GMT
[9] => Connection: close
[10] => Content-Length: 73867
)
So, why is the function not working on live?
EDIT
<?php
if (isset($_POST['prdurl'])) {
    $url = $_POST['prdurl'];
    print_r(get_headers($url)); // not getting any array on live, but it works on localhost
    if (is_array(@get_headers($url))) {
        // some code goes here...
    } else {
        echo "URL doesn't exist!";
    }
}
?>
One more thing to note here is that I'm using file_get_html to retrieve the HTML page from the remote URL. It works on my localhost but not on live, either.
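No answer to this question appears in this excerpt, but one common cause worth ruling out (an assumption on my part, not from the original thread) is that the live host disables allow_url_fopen, which get_headers() on a URL (and file_get_html, via file_get_contents) depends on. A quick diagnostic:

```php
<?php
// Hypothetical diagnostic: URL-based get_headers()/file_get_contents()
// require allow_url_fopen; many shared hosts disable it in production
// even though typical local installs have it enabled.
if (ini_get('allow_url_fopen')) {
    echo "allow_url_fopen is enabled\n";
} else {
    echo "allow_url_fopen is disabled; get_headers() on URLs will fail\n";
}
```

If it is disabled and cannot be changed in php.ini or .htaccess, the cURL extension is the usual workaround.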

preg_match() expects parameter 2 to be string, array Given not working [duplicate]

This question already has answers here:
Reference - What does this error mean in PHP?
(38 answers)
Closed 3 years ago.
<?php
$a = "https://sayat.me/chitmarike";
$html = file_get_contents($a);
$headers = get_headers($a);
preg_match('~id="bar" value="([^"]*)"~', $html, $img);
$img1 = $img[1];
echo $img1;
preg_match('/(?<=csam=).*?(?=;)/', $headers, $cook);
$cook1 = $cook[1];
echo $cook1;
?>
I want to extract the value of csam from the cookie header.
This is what it looks like:
Array
(
[0] => HTTP/1.1 200 OK
[1] => Date: Fri, 07 Apr 2017 19:05:03 GMT
[2] => Content-Type: text/html; charset=UTF-8
[3] => Connection: close
[4] => Set-Cookie: __cfduid=d6dea25f00686a7cef5f0a3d21195207c1491599902; expires=Sat, 07-Apr-18 19:05:23 GMT; path=/; domain=.sayat.me; HttpOnly
[5] => Set-Cookie: PHPSESSID=m3hvgquu2vtcp9ingqmkttqgs2; path=/
[6] => Expires: Thu, 19 Nov 1981 08:52:00 GMT
[7] => Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
[8] => Pragma: no-cache
[9] => Set-Cookie: csam=5844bc1d44; expires=Fri, 07-Apr-2017 19:35:36 GMT; Max-Age=1800; path=/
[10] => X-CSRF-Protection: SAM v2.0
[11] => Set-Cookie: country=IN; expires=Sun, 07-May-2017 19:05:36 GMT; Max-Age=2592000; path=/
[12] => Vary: Accept-Encoding
[13] => McID: sam-web4
[14] => Server: cloudflare-nginx
[15] => CF-RAY: 34bf420dae9069fb-LHR
)
But I am getting this error:
Warning: preg_match() expects parameter 2 to be string, array given in
C:\xampp\htdocs\sayat\index.php on line 9
What am I doing wrong?
Spending time reading and properly understanding an error message isn't wasted time. Error messages are simple and clear: preg_match() expects parameter 2 to be string, array given. Conclusion: in preg_match('/(?<=csam=).*?(?=;)/', $headers, $cook);, $headers is an array, while preg_match expects the subject (the second parameter) to be a string, nothing more, nothing less.
The problem: $headers is filled by get_headers, which returns an array. There are two possible ways to solve it:
implode the array and search the resulting string with your pattern, or rewrite the pattern like this: /csam=\K[^;]+/
set the second parameter of get_headers to 1 and use the array structure to find the information you want.
Example:
$a = "https://sayat.me/chitmarike";
$headers = get_headers($a, 1);
foreach ($headers['Set-Cookie'] as $v) {
    if (strpos($v, 'csam=') === 0) {
        $cook = substr($v, 5, strpos($v, ';') - 5);
        break;
    }
}
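The first option (implode plus the /csam=\K[^;]+/ pattern mentioned above) could look like this sketch, run here against a hard-coded header array copied from the question rather than a live request:

```php
<?php
// Sketch of the implode approach: join the header lines into one string,
// then use \K to drop the "csam=" prefix from the match. Header lines
// are copied from the question, not fetched live.
$headers = array(
    'HTTP/1.1 200 OK',
    'Set-Cookie: PHPSESSID=m3hvgquu2vtcp9ingqmkttqgs2; path=/',
    'Set-Cookie: csam=5844bc1d44; expires=Fri, 07-Apr-2017 19:35:36 GMT; Max-Age=1800; path=/',
);
if (preg_match('/csam=\K[^;]+/', implode("\n", $headers), $m)) {
    echo $m[0], "\n"; // 5844bc1d44
}
```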
The get_headers function returns an array, but preg_match requires a string, as the error says.
Concatenate the result of get_headers before calling preg_match.

how to identify the web server name of remote host

According to this solution (link), it shows how to get the web server name for a local web server, but how do I do the same for a remote server by URL?
i.e. $_SERVER['SERVER_SOFTWARE'] returns a name like Apache/2.2.21 (Win32) PHP/5.3.10.
How can I apply this solution to a remote server? Example here: http://browserspy.dk/webserver.php
I want to be able to specify the host name of the remote server, i.e. $url = 'www.domain.com';, and get the web server name as shown above for the host specified in $url.
I am only interested in the web server name.
One method of doing this is using PHP's get_headers() function, which returns the web server's response headers:
$url = 'http://php.net';
print_r(get_headers($url));
which will return
Array
(
[0] => HTTP/1.1 200 OK
[1] => Server: nginx/1.6.2
[2] => Date: Fri, 08 May 2015 13:21:44 GMT
[3] => Content-Type: text/html; charset=utf-8
[4] => Connection: close
[5] => X-Powered-By: PHP/5.6.7-1
[6] => Last-Modified: Fri, 08 May 2015 13:10:12 GMT
[7] => Content-language: en
[8] => X-Frame-Options: SAMEORIGIN
[9] => Set-Cookie: COUNTRY=NA%2C95.77.98.186; expires=Fri, 15-May-2015 13:21:44 GMT; Max-Age=604800; path=/; domain=.php.net
[10] => Set-Cookie: LAST_NEWS=1431091304; expires=Sat, 07-May-2016 13:21:44 GMT; Max-Age=31536000; path=/; domain=.php.net
[11] => Link: <http://php.net/index>; rel=shorturl
[12] => Vary: Accept-Encoding
)
As you can see, you have the Server header, which tells you that they are running nginx/1.6.2.
Or you can add the second parameter to the function, which returns the already-parsed headers:
$url = 'http://php.net';
$headers = get_headers($url, true);
echo $headers['Server']; // nginx/1.6.2
trainoasis is right, you can use :
$_SERVER['SERVER_SOFTWARE']
OR
$_SERVER['SERVER_SIGNATURE']
OR
gethostbyaddr($_SERVER['REMOTE_ADDR']);

Google Play links validation via PHP

I want to check via a script whether a Google Play link for an app is valid:
https://play.google.com/store/apps/details?id=com.ketchapp.zigzaggame - valid
https://play.google.com/store/apps/details?id=com.ketchapp.zigzaggamessdasd - invalid
but every script I bought or found for free gives me a 404 or 303 response. There is probably some redirect involved.
How can I validate links like that? I need to check some 1,000 links in my ad system to see whether the apps exist in the Google Play store.
I will write the loops, database reads, etc. myself, but please, someone familiar with PHP, help with the check. I spent some $300 on this and got cheated by two people whose scripts were "checking" the links. Always 404 or 303.
Try this:
<?php
/**
 * Check a Google Play app URL.
 *
 * @param string $url Url to check
 *
 * @return boolean True if it exists, false otherwise
 * @throws \Exception On a cURL error, an exception is thrown
 */
function checkGooglePlayApp($url)
{
    $curlOptions = array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CUSTOMREQUEST  => 'GET',
        CURLOPT_URL            => $url
    );
    $ch = curl_init();
    curl_setopt_array($ch, $curlOptions);
    $result = curl_exec($ch);
    $http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($curl_error = curl_error($ch)) {
        curl_close($ch);
        throw new \Exception($curl_error);
    }
    curl_close($ch);
    return $http_code == '200';
}
$url = 'https://play.google.com/store/apps/details?id=com.ketchapp.zigzaggameERRORERROR';
$result = checkGooglePlayApp($url);
var_dump($result); // Should return false
$url = 'https://play.google.com/store/apps/details?id=com.ketchapp.zigzaggame';
$result = checkGooglePlayApp($url);
var_dump($result); // Should return true
It will return:
bool(false)
bool(true)
This can be easily done with the get_headers function. For example:
Incorrect URL
$file = 'https://play.google.com/store/apps/details?id=com.ketchapp.zigzaggamessdasd';
$file_headers = get_headers($file);
print_r($file_headers);
Will return:
Array
(
[0] => HTTP/1.0 404 Not Found
[1] => Cache-Control: no-cache, no-store, max-age=0, must-revalidate
[2] => Pragma: no-cache
[3] => Expires: Fri, 01 Jan 1990 00:00:00 GMT
[4] => Date: Tue, 03 Mar 2015 04:23:31 GMT
[5] => Content-Type: text/html; charset=utf-8
[6] => Set-Cookie: NID=67=QFThy03gh34QypYfoLFTz7bJDI-qzXvuzI05DtrF3aVs1L7NJO9byV6kemHRVVkViz-sodx3Z0GuCQTu9a_1JvToen6ZtjfhNy8MH6DDgH6zix2I4Gm9mauBPCxipnlG;Domain=.google.com;Path=/;Expires=Wed, 02-Sep-2015 04:23:31 GMT;HttpOnly
[7] => P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
[8] => X-Content-Type-Options: nosniff
[9] => X-Frame-Options: SAMEORIGIN
[10] => X-XSS-Protection: 1; mode=block
[11] => Server: GSE
[12] => Alternate-Protocol: 443:quic,p=0.08
[13] => Accept-Ranges: none
[14] => Vary: Accept-Encoding
)
If the file does exist, will return:
Array
(
[0] => HTTP/1.0 200 OK
[1] => Content-Type: text/html; charset=utf-8
[2] => Set-Cookie: PLAY_PREFS=CgJVUxC6uYnvvSkourmJ770p:S:ANO1ljKvPst7-nSw; Path=/; Secure; HttpOnly
[3] => Set-Cookie: NID=67=iFUl_Ls8EhAJE7STIJD7Wdq6NF-y4i6Xrlb78My75ZaruVWlAKObDRDNGDddGxD0hSsLRpvrQK7Tp5nuKCgGg2jF1GUf9_4H_zYsUDQ548Be2n8EDjp9clDfXKLYjmSg;Domain=.google.com;Path=/;Expires=Wed, 02-Sep-2015 04:26:14 GMT;HttpOnly
[4] => Cache-Control: no-cache, no-store, max-age=0, must-revalidate
[5] => Pragma: no-cache
[6] => Expires: Fri, 01 Jan 1990 00:00:00 GMT
[7] => Date: Tue, 03 Mar 2015 04:26:14 GMT
[8] => P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
[9] => X-Content-Type-Options: nosniff
[10] => X-Frame-Options: SAMEORIGIN
[11] => X-XSS-Protection: 1; mode=block
[12] => Server: GSE
[13] => Alternate-Protocol: 443:quic,p=0.08
[14] => Accept-Ranges: none
[15] => Vary: Accept-Encoding
)
So you can create a script like:
<?php
$files = array(
    'https://play.google.com/store/apps/details?id=com.ketchapp.zigzaggame',
    'https://play.google.com/store/apps/details?id=com.ketchapp.zigzaggamesadasd'
);
foreach ($files as $file) {
    $headers = get_headers($file);
    if ($headers[0] == 'HTTP/1.0 404 Not Found') {
        echo "$file is invalid\n";
    } else {
        echo "$file is valid\n";
    }
}
?>
?>
You can simply do:
function checkGooglePlayApp($url)
{
    $headers = get_headers($url);
    return $headers[0] == 'HTTP/1.0 404 Not Found';
}

$inValid = checkGooglePlayApp("https://play.google.com/store/apps/details?id=com.ketchapp.zigzaggame");
if (!$inValid) {
    echo "URL Valid";
} else {
    echo "URL Invalid";
}

Why is this returning a "Not Found" with PHP and cURL?

My script works with all other links I tried, and I get the same response with cURL as well (and this version is a lot smaller, so I like this code):
<?php
$url = $_GET['url'];
$header = get_headers($url, 1);
print_r($header);

function get_url($u, $h) {
    if (preg_match('/200/', $h[0])) {
        echo file_get_contents($u);
    } elseif (preg_match('/301/', $h[0])) {
        $nh = get_headers($h['Location']);
        get_url($h['Location'], $nh);
    }
}

get_url($url, $header);
?>
But for:
http://www.anthropologie.com/anthro/catalog/productdetail.jsp?subCategoryId=HOME-TABLETOP-UTENSILS&id=78110&catId=HOME-TABLETOP&pushId=HOME-TABLETOP&popId=HOME&sortProperties=&navCount=355&navAction=top&fromCategoryPage=true&selectedProductSize=&selectedProductSize1=&color=sil&colorName=SILVER&isProduct=true&isBigImage=&templateType=
And:
http://www.urbanoutfitters.com/urban/catalog/productdetail.jsp?itemdescription=true&itemCount=80&startValue=1&selectedProductColor=&sortby=&id=14135412&parentid=A_FURN_BATH&sortProperties=+subCategoryPosition,&navCount=56&navAction=poppushpush&color=&pushId=A_FURN_BATH&popId=A_DECORATE&prepushId=&selectedProductSize=
(and all Anthropologie product links). I'm assuming other sites I have not yet found act this way as well. Here is my header response:
Array
(
[0] => HTTP/1.1 200 OK
[Server] => Apache
[X-Powered-By] => Servlet 2.4; JBoss-4.2.0.GA_CP05 (build: SVNTag=JBPAPP_4_2_0_GA_CP05 date=200810231548)/JBossWeb-2.0
[X-ATG-Version] => version=RENTLUFEQyxBVEdQbGF0Zm9ybS85LjFwMSxBREMgWyBEUFNMaWNlbnNlLzAgIF0=
[Content-Type] => text/html;charset=ISO-8859-1
[Date] => Sat, 24 Jul 2010 23:47:47 GMT
[Content-Length] => 21669
[Connection] => keep-alive
[Set-Cookie] => Array
(
[0] => JSESSIONID=65CA111ADBF267A3B405C69A325576F8.app46-node2; Path=/
[1] => visitCount=1; Expires=Fri, 29-May-2026 00:41:07 GMT; Path=/
[2] => UOCCII:=; Expires=Mon, 23-Aug-2010 23:47:47 GMT; Path=/
[3] => LastVisited=2010-07-24; Expires=Fri, 29-May-2026 00:41:07 GMT; Path=/
)
)
I'm guessing maybe it has to do with the cookies? Any ideas?
Install Fiddler and see what is actually being sent.
You can also try setting your user agent to that of a real browser; sometimes sites try to prevent scraping by checking it.
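For the get_headers() version, the user-agent suggestion could be tried with a default stream context; a sketch (the UA string below is just an example, not a recommendation):

```php
<?php
// Sketch: give PHP's HTTP stream wrapper a browser-like User-Agent so
// that get_headers()/file_get_contents() send it with every request
// (example UA string only).
stream_context_set_default(array(
    'http' => array(
        'user_agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    ),
));

// Verify the option took effect before making any request.
$opts = stream_context_get_options(stream_context_get_default());
echo $opts['http']['user_agent'], "\n";

// From here on, e.g.: $headers = get_headers($url, 1);
```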
