Check if File Exists on Amazon s3 Signed URL - php

I have create a signed $URL for Amazon s3 and it opens perfectly in the browser.
http://testbucket.com.s3.amazonaws.com/100-game-play-intro-1.m4v?AWSAccessKeyId=AKIAJUAjhkhkjhMO73BF5Q&Expires=1378465934&Signature=ttmsAUDgJjCXepwEXvl8JdFu%2F60%3D
**Bucket name and accesskey changed in this example
I am however trying to then use the function below to check (using curl) that the file exists. It fails the CURL connection. If I replace $URL above with the url of an image outside of s3 then this code works perfectly.
I know the file exists in amazon but can't work out why this code fails if using a signed url as above
Any ideas?
Thanks
Here is my code.
function remoteFileExists($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, false);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
//don't fetch the actual file, only get header to check if file exists
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_NOBODY, true);
$result = curl_exec($ch);
curl_close($ch);
if ($result !== false) {
$statusCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
if ($statusCode == 200) {
$ret = true;
} else {
$ret = false;
}
} else {
$ret='connection failed';
}
return $ret;
}

When using CURLOPT_NOBODY, libcurl sends an HTTP HEAD request, not a GET request.
...the string to be signed is formed by appending the REST verb, content-md5 value, content-type value, expires parameter value, canonicalized x-amz headers (see recipe below), and the resource; all separated by newlines.
— http://s3.amazonaws.com/doc/s3-developer-guide/RESTAuthentication.html
The "REST verb" -- e.g., GET vs HEAD -- must be consistent between the signature you generate, and the request that make, so a signature that is valid for GET will not be valid for HEAD and vice versa.
You will need to sign a HEAD request instead of a GET request in order to validate a file in this way.

You can check by the header part.
$full_url = 'https://www.example.com/image.jpg';
$file_headers = #get_headers($full_url);
if($file_headers && strpos($file_headers[0], '200 OK')){
// enter code here
}
Or If you are using AWS S3 then you can also use this one.
if(!class_exists('S3')){
require('../includes/s3/S3.php');
}
S3::setAuth(awsAccessKey, awsSecretKey);
$info = S3::getObjectInfo($bucketName, $s3_furl);
// check for $info value and apply your condition.

Related

cURL taking long time to get the final URL of redirect URL

Below code snippet is to get the final URL(which has media/zip/rar file) from redirect URL by using cURL. It gets the final URL, no doubt about it, but what it does is according to the size of file it varies in time to get URL.
Suppose file at final URL is 1MB, it will take around 5sec to retrieve. But if the file is about 35MB, it takes time about 150 sec. I think cURL is downloading result and finally fetching the URL from result.
<?php
echo get_rurl("x_url");//1.2MB -> 5-10sec
//echo get_rurl("y_url");//31.6MB -> 150sec
function get_rurl($url){
// initialize cURL
$curl = curl_init($url);
curl_setopt_array($curl, array(
CURLOPT_RETURNTRANSFER => true,
CURLOPT_FOLLOWLOCATION => true,
));
// execute the request
$result = curl_exec($curl);
// fail if the request was not successful
if ($result === false) {
curl_close($curl);
return null;
}
// extract the target url
$redirectUrl = curl_getinfo($curl, CURLINFO_EFFECTIVE_URL);
curl_close($curl);
return $redirectUrl;
}
?>
i cant use file_get_content() because i just want to get the final URL from given redirect URL.
So in short - how to get the final URL from redirect URL without downloading results.
Hope i make it clear. Any help will be appreciated.
This works fine with CURLINFO_EFFECTIVE_URL, but for it the option CURLOPT_FOLLOWLOCATION must set to TRUE. This is on the grounds that CURLINFO_EFFECTIVE_URL returns precisely what it says, the effective url that ends up getting loaded. If the CURLOPT_FOLLOWLOCATION=False then the effective url will be requested url, else it will be final url that is redirected to.
I did this using curl_getinfo. which gives me information regarding the last transfer
<?php
echo get_rurl("xurl");
//echo get_rurl("yurl");
function get_rurl($url){
// initialize cURL
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); //specify your URL
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); //disable follow redirects
$http_data = curl_exec($ch); //hit the $url
$curl_info = curl_getinfo($ch);
return $curl_info['redirect_url'];// extract final url
}
?>
or
Even you can use CURLINFO_REDIRECT_URL or CURLINFO_EFFECTIVE_URL depending upon your use cases. refer here
<?php
echo get_rurl("xurl");
//echo get_rurl("yurl");
function get_rurl($url){
// initialize cURL
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); //specify your URL
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); //disable follow redirects
$http_data = curl_exec($ch); //hit the $url
return curl_getinfo($ch, CURLINFO_REDIRECT_URL);
}
?>
Hope this helps to others users too.
According to the documentation of libcurl (https://curl.haxx.se/libcurl/c/CURLOPT_FOLLOWLOCATION.html), this is exactly as is expected when using CURLOPT_FOLLOWLOCATION => true,. You probably want to change this to false.

How can I get HTTP headers (Location) from some URL?

I have some address (for example: http://example.com/b-out/3456/3212/).This address i must pass through curl. I know that this URL redirects to another URL (like http://sdss.co/go/36a7fe71189fec14c85636f33501f6d2/?...). And this another URL located in the headers (Location) of first URL. How can I get second URL in some variable?
Perform a request to the first URL, confirm a redirect takes place and read the Location header. From PHP cURL retrieving response headers AND body in a single request? and Check headers in PHP cURL server response:
$curlHandle = curl_init();
curl_setopt($curlHandle, CURLOPT_URL, $url);
curl_setopt($curlHandle, CURLOPT_HEADER, 1);
curl_setopt($curlHandle, CURLOPT_NOBODY, 1);
curl_setopt($curlHandle, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt($curlHandle, CURLOPT_RETURNTRANSFER, 1);
$redirectResponse = curl_exec($curlHandle);
The options being set there mean: return the response headers, don't return the response body, don't automatically follow redirects and return the result in the exec-call.
Now you've got the HTTP response headers, without body, in $redirectResponse. You'll now need to verify that it's a redirect:
$statusCode = curl_getinfo($curlHandle, CURLINFO_HTTP_CODE);
if ($statusCode == 301 || $statusCode == 302 || $statusCode == 303)
{
$headerLength = curl_getinfo($curlHandle, CURLINFO_HEADER_SIZE);
$responseHeaders = substr($redirectResponse, 0, $headerLength);
$redirectUrl = getLocationHeader($responseHeaders);
}
Then create a function to do that:
function getLocationHeader($responseHeaders)
{
}
In there you'll want to explode() the $responseHeaders on HTTP newline (\r\n) and find the header starting with location.
Alternatively, you can use a more abstract HTTP client library like Zend_Http_Client, where it is a little easier to obtain the headers.
I did it like CodeCaster said. This is my function 'getLocationHeader':
function getLocationHeader($responseHeaders)
{
if (preg_match('/Location:(.+)Vary/is', $redirectResponse, $loc))
{
$location = trim($loc[1]);
return $location;
}
return FALSE;
}

Save copy of a photo from Google Places API Place Photos using curl

I'm trying to grab a photo from Google Place Photos using curl and save it on my server.
The request format as per the Google API documentation is like this:
https://maps.googleapis.com/maps/api/place/photo?maxwidth=400&photoreference=CoQBegAAAFg5U0y-iQEtUVMfqw4KpXYe60QwJC-wl59NZlcaxSQZNgAhGrjmUKD2NkXatfQF1QRap-PQCx3kMfsKQCcxtkZqQ&sensor=true&key=AddYourOwnKeyHere
So I tried this function:
function download_image1($image_url, $image_file){
$fp = fopen ($image_file, 'w+');
$ch = curl_init($image_url);
// curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // enable if you want
curl_setopt($ch, CURLOPT_FILE, $fp); // output to file
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 1000); // some large value to allow curl to run for a long time
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0');
// curl_setopt($ch, CURLOPT_VERBOSE, true); // Enable this line to see debug prints
curl_exec($ch);
curl_close($ch); // closing curl handle
fclose($fp); // closing file handle
}
download_image1($photo, "test.jpg");
..where $photo holds the request url.
This is not working, it saves an empty image with header errors, it probably is because the request is not the actual url of the photo. Also, in the request url, it's not possible to know which image extension I'm going to get (jpg, png, gif, etc) so that's another problem.
Any help on how to save the photos appreciated.
EDIT: I get the header errors "Can't read file header" in my image viewer software when I try to open the image. The script itself doesn't show any errors.
I found a solution here:
http://kyleyu.com/?q=node/356
It gives a very useful function to return the actual URL after redirection:
function get_furl($url)
{
$furl = false;
// First check response headers
$headers = get_headers($url);
// Test for 301 or 302
if(preg_match('/^HTTP\/\d\.\d\s+(301|302)/',$headers[0]))
{
foreach($headers as $value)
{
if(substr(strtolower($value), 0, 9) == "location:")
{
$furl = trim(substr($value, 9, strlen($value)));
}
}
}
// Set final URL
$furl = ($furl) ? $furl : $url;
return $furl;
}
So you pass the Google Place Photo request uRL to this function and it returns the actual URL of the photo after the redirection which then can be used with CURL. It also explains that sometimes, the curl option curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); doesn't always work.
https://stackoverflow.com/a/23540352/2979237
We can minimize the above code to change as adding 1 as the second parameter of get_header() function like $headers = get_headers($url, 1);. that will return the associate values.

Why are the Twitter api calls so slow?

When I execute the following code it takes between 10-12 seconds to respond.
Is the problem with Twitter or with our server?
I really need to know as this is part of the code to display tweets on our website and a 12 second load time is just not acceptable!
function get_latest_tweets($username)
{
print "<font color=red>**". time()."**</font><br>";
$path = 'http://api.twitter.com/1/statuses/user_timeline/' . $username.'.json?include_rts=true&count=2';
$jason = file_get_contents($path);
print "<font color=red>**". time()."**</font><br>";
}
Thanks
When you put the URL into your browser (http://api.twitter.com/1/statuses/user_timeline/username.json?include_rts=true&count=2) how long does it take for the page to appear? If it's quick then you need to start the search at your server.
use curl instead of file_get_contents() to request, so that response will be compressed. Here is the curl function which iam using.
function curl_file_get_contents($url)
{
$curl = curl_init();
curl_setopt($curl,CURLOPT_URL,$url); //The URL to fetch. This can also be set when initializing a session with curl_init().
curl_setopt($curl,CURLOPT_RETURNTRANSFER,TRUE); //TRUE to return the transfer as a string of the return value of curl_exec() instead of outputting it out directly.
curl_setopt($curl,CURLOPT_ENCODING , "gzip");
curl_setopt($curl, CURLOPT_FAILONERROR, TRUE); //To fail silently if the HTTP code returned is greater than or equal to 400.
curl_setopt($curl,CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE);
$contents = curl_exec($curl);
curl_close($curl);
return $contents;
}

PHP cURL: Get target of redirect, without following it

The curl_getinfo function returns a lot of metadata about the result of an HTTP request. However, for some reason it doesn't include the bit of information I want at the moment, which is the target URL if the request returns an HTTP redirection code.
I'm not using CURLOPT_FOLLOWLOCATION because I want to handle specific redirect codes as special cases.
If cURL can follow redirects, why can't it tell me what they redirect to when it isn't following them?
Of course, I could set the CURLOPT_HEADER flag and pick out the Location header. But is there a more efficient way?
This can be done in 4 steps:
Step 1. Initialise curl
curl_init($ch); //initialise the curl handle
//COOKIESESSION is optional, use if you want to keep cookies in memory
curl_setopt($this->ch, CURLOPT_COOKIESESSION, true);
Step 2. Get the headers for $url
curl_setopt($ch, CURLOPT_URL, $url); //specify your URL
curl_setopt($ch, CURLOPT_HEADER, true); //include headers in http data
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); //don't follow redirects
$http_data = curl_exec($ch); //hit the $url
$curl_info = curl_getinfo($ch);
$headers = substr($http_data, 0, $curl_info['header_size']); //split out header
Step 3. Check if you have the correct response code
if (!($curl_info['http_code']>299 && $curl_info['http_code']<309)) {
//return, echo, die, whatever you like
return 'Error - http code'.$curl_info['http_code'].' received.';
}
Step 4. Parse the headers to get the new URL
preg_match("!\r\n(?:Location|URI): *(.*?) *\r\n!", $headers, $matches);
$url = $matches[1];
Once you have the new URL you can then repeat steps 2-4 as often as you like.
You can simply use it: (CURLINFO_REDIRECT_URL)
$info = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
echo $info; // the redirect URL without following it
as you mentioned, disable the CURLOPT_FOLLOWLOCATION option (before executing) and place my code after executing.
CURLINFO_REDIRECT_URL - With the CURLOPT_FOLLOWLOCATION option
disabled: redirect URL found in the last transaction, that should be
requested manually next. With the CURLOPT_FOLLOWLOCATION option
enabled: this is empty. The redirect URL in this case is available in
CURLINFO_EFFECTIVE_URL
Refrence
curl doesn't seem to have a function or option to get the redirect target, it can be extracted using various techniques:
From the response:
Apache can respond with a HTML page in case of a 301 redirect (Doesn't seem to be the case with 302's).
If the response has a format similar to:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved here.</p>
<hr>
<address>Apache/2.2.16 (Debian) Server at www.xxx.yyy Port 80</address>
</body></html>
You can extract the redirect URL using DOMXPath:
$i = 0;
foreach($urls as $url) {
if(substr($url,0,4) == "http") {
$c = curl_init($url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
$result = #curl_exec($c);
$status = curl_getinfo($c,CURLINFO_HTTP_CODE);
curl_close($c);
$results[$i]['code'] = $status;
$results[$i]['url'] = $url;
if($status === 301) {
$xml = new DOMDocument();
$xml->loadHTML($result);
$xpath = new DOMXPath($xml);
$href = $xpath->query("//*[#href]")->item(0);
$results[$i]['target'] = $href->attributes->getNamedItem('href')->nodeValue;
}
$i++;
}
}
Using CURLOPT_NOBODY
There is a faster way however, as #gAMBOOKa points out; Using CURLOPT_NOBODY. This approach just sends a HEAD request instead of GET (not downloading the actual content, so it should be faster and more efficient) and stores the response header.
Using a regex the target URL can be extracted from the header:
foreach($urls as $url) {
if(substr($url,0,4) == "http") {
$c = curl_init($url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
curl_setopt($c, CURLOPT_NOBODY,true);
curl_setopt($c, CURLOPT_HEADER, true);
$result = #curl_exec($c);
$status = curl_getinfo($c,CURLINFO_HTTP_CODE);
curl_close($c);
$results[$i]['code'] = $status;
$results[$i]['url'] = $url;
if($status === 301 || $status === 302) {
preg_match("#https?://([-\w\.]+)+(:\d+)?(/([\w/_\-\.]*(\?\S+)?)?)?#",$result,$m);
$results[$i]['target'] = $m[0];
}
$i++;
}
}
No there is no more efficient way
Your can use CURLOPT_WRITEHEADER + VariableStream
So.. you could write headers to variable and parse it
I had the same problem and curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); was of any help.
So, I decided not to use CURL but file_get_contents instead:
$data = file_get_contents($url);
$data = str_replace("<meta http-equiv=\"Refresh\" content=\"0;","<meta",$data);
The last line helped me to block the redirection although the product is not a clean html code.
I parsed the data and could retrieve the redirection URL I wanted to get.

Categories