Why can I not download files from some sites like this? - php

This is my php source code:
<?php
$path = '/images/one.jpg';
$imm = 'http://www.allamoda.eu/wp-content/uploads/2012/05/calzedonia_290x435.jpg';
if ($content = file_get_contents($imm)) {
    file_put_contents($path, $content);
    echo "Yes";
} else {
    echo "No";
}
?>
and I get this error:
Warning: file_get_contents(http://www.allamoda.eu/wp-content/uploads/2012/05/calzedonia_290x435.jpg) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden in /opt/lampp/htdocs/test/down.php on line 4
No
Why?

The server expects some headers (especially Accept and User-Agent). Use the context argument of file_get_contents(), created with stream_context_create(), to provide them:
<?php
$path = '/images/one.jpg';
$opts = array(
    'http' => array(
        'method' => "GET",
        'header' => "Accept-language: en\r\n" .
                    "Accept: image/png,image/*;q=0.8,*/*;q=0.5\r\n" .
                    "Host: www.allamoda.eu\r\n" .
                    "User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0\r\n"
    )
);
$context = stream_context_create($opts);
$imm = 'http://www.allamoda.eu/wp-content/uploads/2012/05/calzedonia_290x435.jpg';
if ($content = file_get_contents($imm, false, $context)) {
    file_put_contents($path, $content);
    echo "Yes";
} else {
    echo "No";
}
?>

The server at allamoda.eu is telling you that you are not allowed to download this file (HTTP 403 Forbidden).

Nothing is wrong with the code. The server simply is not letting you in (either you have sent too many requests to it, or it just blocks all scripts scraping it).

You're not allowed to open the file directly, but you can try to fetch its content using sockets:
function getRemoteFile($url)
{
    // get the host name and url path
    $parsedUrl = parse_url($url);
    $host = $parsedUrl['host'];
    if (isset($parsedUrl['path'])) {
        $path = $parsedUrl['path'];
    } else {
        // the url is pointing to the host like http://www.mysite.com
        $path = '/';
    }
    if (isset($parsedUrl['query'])) {
        $path .= '?' . $parsedUrl['query'];
    }
    if (isset($parsedUrl['port'])) {
        $port = $parsedUrl['port'];
    } else {
        // most sites use port 80
        $port = 80;
    }
    $timeout = 10;
    $response = '';
    // connect to the remote server (suppress the warning; $fp is checked below)
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if (!$fp) {
        echo "Cannot retrieve $url";
    } else {
        // send the necessary headers to get the file;
        // Connection: close makes the read-until-EOF loop below terminate
        fputs($fp, "GET $path HTTP/1.0\r\n" .
                   "Host: $host\r\n" .
                   "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.3) Gecko/20060426 Firefox/1.5.0.3\r\n" .
                   "Accept: */*\r\n" .
                   "Accept-Language: en-us,en;q=0.5\r\n" .
                   "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n" .
                   "Referer: http://$host\r\n" .
                   "Connection: close\r\n\r\n");
        // retrieve the response from the remote server
        while ($line = fread($fp, 4096)) {
            $response .= $line;
        }
        fclose($fp);
        // strip the headers
        $pos = strpos($response, "\r\n\r\n");
        $response = substr($response, $pos + 4);
    }
    // return the file content
    return $response;
}
Example:
$content = getRemoteFile('http://www.allamoda.eu/wp-content/uploads/2012/05/calzedonia_290x435.jpg');

Related

Writing multiple post requests using single connection - PHP

I am writing to a server using the following snippet.
$fp = connect();
$sent_requests = 0;

function connect() {
    $addr = gethostbyname("example.com");
    $fp = fsockopen("$addr", 80, $errno, $errstr);
    socket_set_blocking($fp, false);
    if (!$fp) {
        echo "$errstr ($errno)<br />\n";
        exit(1);
    } else {
        echo "Connected\n";
        return $fp;
    }
}

function sendTestCalls($load) {
    global $fp, $sent_requests;
    if (!$fp) {
        echo "reconnecting";
        $sent_requests = 0;
        //echo stream_get_contents($fp) . "\n";
        fclose($fp);
        $fp = connect();
    }
    $data = "POST /test HTTP/2.0\r\n";
    $data .= "Host: example.com\r\n";
    $data .= "Content-Type: application/json\r\n";
    $data .= "Content-Length: " . strlen($load) . "\r\n";
    $data .= "Connection: Keep-Alive\r\n";
    $data .= "xYtU87BVFluc6: 1\r\n";
    $data .= "\r\n" . $load;
    $bytesToWrite = strlen($data);
    $totalBytesWritten = 0;
    while ($totalBytesWritten < $bytesToWrite) {
        $bytes = fwrite($fp, substr($data, $totalBytesWritten));
        $totalBytesWritten += $bytes;
    }
    $sent_requests++;
}

$time = time();
for ($i = 0; $i < 1000; $i++) {
    sendTestCalls('{"justtesting": "somevalue"}');
}
fclose($fp);
$time_taken = time() - $time; // might be a bit inaccurate
echo "Time Taken: " . $time_taken . "\n";
When I check the access logs on my server, fewer than 1000 POST requests are received (anywhere from 0 to 900). What am I doing wrong here?
EDIT1
I suppose my socket is timing out. How can I check whether it has disconnected, so that I can reconnect in that scenario? I tried using stream_get_meta_data($fp) but it had no effect.
Try to insert this before each request:
$info = stream_get_meta_data($fp);
if ($info['timed_out']) {
    fclose($fp);
    $fp = connect();
}
I eventually found a solution to this using php_curl with keep-alive connections.
Here is my updated version:
function sendCall(&$curl_handle, $data) {
    curl_setopt($curl_handle, CURLOPT_CUSTOMREQUEST, "POST");
    curl_setopt($curl_handle, CURLOPT_POSTFIELDS, $data);
    curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, true);
    curl_setopt(
        $curl_handle,
        CURLOPT_HTTPHEADER,
        array(
            'Content-Type: application/json',
            'Connection: Keep-Alive',
            'Content-Length: ' . strlen($data)
        )
    );
    $response = curl_exec($curl_handle); // Check response?
    if ($err = curl_error($curl_handle)) {
        error_log("Error - $err Status - Reconnecting");
        $curl_handle = curl_init(curl_getinfo($curl_handle, CURLINFO_EFFECTIVE_URL));
        sendCall($curl_handle, $data);
    }
}
This function gives me an almost-always-alive connection (I never hit the error log in more than a week of running). Hope it helps anyone looking for the same.

How to POST to an XML API?

We are integrating the travel API of www.transhotel-dev.com.
The code is like this:
<?php
$servletHOST = "www.transhotel-dev.com";
$servletPATH = "/interfaces/SController";
$pXML = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?><Login><Username>Username</Username><Password>Password</Password></Login>";
$pCall = "Login";
$postdata = "pXML=" . urlencode($pXML) . "&pCall=" . urlencode($pCall);
$fp = pfsockopen($servletHOST, 1184);
if ($fp) {
    fputs($fp, "POST $servletPATH HTTP/1.0\n");
    fputs($fp, "Accept: */*\n");
    $strlength = strlen($postdata);
    fputs($fp, "Content-length: " . $strlength . "\n\n");
    fputs($fp, $postdata . "\n");
    $output = "";
    while (!feof($fp)) {
        $output .= fgets($fp, 1024);
    }
    fclose($fp);
    echo $output;
}
?>
The server responds with "HTTP compression and POST method are required to go beyond this point". Can anybody help?
The following calls require the use of https secure protocol
(https://www.transhotel-dev.com:1449/interfaces/SController):
Login
AddAmountCardHPlus
GetNifInvoices
NifAgencyReservations
NifHotelReservations
ConfirmReservation (When contain the data of a credit card)
BuildSearchForm
LoginRQ
LoginB2B
CreateAgency
NifActivitiesReservations
GetActivitiesProvider
NifTransfersReservations
GetTransfersProvider
LoginHPlus
UserLogInHPlus
so you should use the SSL protocol for the login action:
$fp = pfsockopen("ssl://www.transhotel-dev.com", 1449);
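Separately, the hand-rolled request in the question uses bare \n line endings and omits a Content-Type header, which many servlet containers require for form posts. This is a minimal sketch (not from the original thread) of a well-formed HTTP/1.0 POST preamble, built as a string so it can be inspected before writing it to the socket; the host and path are taken from the question, the XML payload here is a placeholder:

```php
<?php
// Build a well-formed HTTP/1.0 POST request: CRLF line endings,
// a Content-Type for the urlencoded body, and a blank line before it.
function buildPostRequest($host, $path, $postdata)
{
    return "POST $path HTTP/1.0\r\n"
         . "Host: $host\r\n"
         . "Accept: */*\r\n"
         . "Content-Type: application/x-www-form-urlencoded\r\n"
         . "Content-Length: " . strlen($postdata) . "\r\n"
         . "Connection: close\r\n"
         . "\r\n"
         . $postdata;
}

$postdata = "pXML=" . urlencode("<Login/>") . "&pCall=" . urlencode("Login");
$request  = buildPostRequest("www.transhotel-dev.com", "/interfaces/SController", $postdata);
// fputs($fp, $request);  // then write it to the (ssl://) socket as above
```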

The best way for decoding gzip compressed response

Example Code:
<?php
set_time_limit(0);
$authorization = ""; // OAuth authorization credentials
$fp = fsockopen("ssl://userstream.twitter.com", 443);
$headers = array(
    "GET /1.1/user HTTP/1.1",
    "Host: userstream.twitter.com",
    "Accept-Encoding: deflate, gzip",
    "Authorization: OAuth {$authorization}",
    "",
    "",
);
fwrite($fp, implode("\r\n", $headers));
while (!feof($fp)) {
    if (!$size = hexdec(fgets($fp))) {
        break;
    }
    echo DECODE_FUNCTION(fread($fp, $size));
    fgets($fp); // SKIP CRLF
}
This example works if I implement DECODE_FUNCTION as:
function DECODE_FUNCTION($str) {
    $filename = stream_get_meta_data($fp = tmpfile())['uri'];
    fwrite($fp, $str);
    ob_start();
    readgzfile($filename);
    return ob_get_clean();
}
However, these cases fail:
function DECODE_FUNCTION($str) {
    return gzuncompress($str);
}
or
function DECODE_FUNCTION($str) {
    return gzinflate($str);
}
or
function DECODE_FUNCTION($str) {
    return gzdecode($str);
}
Creating temporary files seems to add a lot of overhead.
What is the best way?
Thank you.
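For what it's worth, the per-chunk variants fail because each HTTP chunk is only a fragment of one continuous gzip stream, while gzdecode()/gzinflate()/gzuncompress() each expect a complete stream. On PHP >= 7.0, the incremental zlib API (inflate_init()/inflate_add()) can feed the fragments through one shared context with no temporary files. A minimal sketch (not from the original thread; the demonstration data is made up):

```php
<?php
// Incremental gzip decoding with PHP >= 7.0's streaming zlib API.
// One inflate context persists across calls, so the gzip stream may be
// split at arbitrary boundaries (e.g. HTTP chunk boundaries).
function makeGzipDecoder()
{
    $ctx = inflate_init(ZLIB_ENCODING_GZIP);
    return function ($chunk) use ($ctx) {
        // Returns whatever plaintext this fragment yields (possibly '').
        return inflate_add($ctx, $chunk);
    };
}

// Demonstration: compress a string, then decode it piecewise.
$plain  = str_repeat("hello gzip ", 200);
$packed = gzencode($plain);
$decode = makeGzipDecoder();
$out = '';
foreach (str_split($packed, 64) as $piece) {
    $out .= $decode($piece);
}
var_dump($out === $plain); // bool(true)
```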

Making a POST request to a web service

Simply put, I need to make a POST request to a web service from a PHP script. The problem is that the PHP version on the server is 4.4.x and cURL is disabled. Any ideas how I can make the call and read the response?
You can use fopen and stream_context_create, as per the example on the stream_context_create page:
$context = stream_context_create(array(
    'http' => array(
        'method' => 'GET'
    )
));
$fp = fopen('http://www.example.com', 'r', false, $context);
$text = '';
while (!feof($fp)) {
    $text .= fread($fp, 8192);
}
fclose($fp);
Also, see HTTP context options and Socket context options to see the options you can set.
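Since the question asks for a POST, the same context approach carries over: set the method, a Content-Type header, and the body via the 'content' option. A minimal sketch (the endpoint URL and parameters are placeholders; the request line is left commented so it can be pointed at a real service; the body is urlencoded by hand since http_build_query() only arrived in PHP 5):

```php
<?php
// POST over the http:// stream wrapper - no cURL required.
$postdata = 'param=' . urlencode('value') . '&other=' . urlencode('data');
$context = stream_context_create(array(
    'http' => array(
        'method'  => 'POST',
        'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
        'content' => $postdata,
    ),
));
// $response = file_get_contents('http://www.example.com/service', false, $context);
```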
You could also use a socket (fsockopen) and fputs like this:
$port = 80;
$server = "domain.com";
$valuesInPost = 'param=value&ahah=ohoho';
$lengthOfThePost = strlen($valuesInPost);
if ($fsock = fsockopen($server, $port, $errno, $errstr)) {
    fputs($fsock, "POST /path/to/resource HTTP/1.1\r\n");
    fputs($fsock, "Host: $server\r\n");
    fputs($fsock, "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13\r\n");
    fputs($fsock, "Accept-Language: fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3\r\n");
    fputs($fsock, "Connection: close\r\n");
    fputs($fsock, "Referer: http://refererYou.want\r\n");
    fputs($fsock, "Content-Type: application/x-www-form-urlencoded\r\n");
    fputs($fsock, "Content-Length: $lengthOfThePost\r\n\r\n");
    fputs($fsock, "$valuesInPost\r\n\r\n");
    $pcontent = "";
    // read the response
    while (!feof($fsock))
        $pcontent .= fgets($fsock, 1024);
    // echo the response
    echo $pcontent;
}
There might be some small syntax errors, as this was rewritten by hand. Note that you can use whatever port you want.
Do you have access to PEAR's HTTP_Request?

Improve HTTP GET PHP scripts

This code gets the headers and content from $url and prints them to the browser. It is really slow, and it's not because of the server. How can I improve this?
$headers = get_headers($url);
foreach ($headers as $value)
    header($value);
$fh = fopen($url, "r");
fpassthru($fh);
Thanks
Why make two requests when one will do?
$fh = fopen($url, 'r');
foreach ($http_response_header as $value) {
    header($value);
}
fpassthru($fh);
Or:
$content = file_get_contents($url);
foreach ($http_response_header as $value) {
    header($value);
}
echo $content;
I'm not sure why you're opening a connection on line 6 if you have already retrieved and printed the headers. Is this doing more than printing out headers?
If you are really looking to just proxy a page, the cURL functions are much more efficient:
<?php
$curl = curl_init("http://www.google.com");
curl_setopt($curl, CURLOPT_HEADER, true);
curl_exec($curl);
curl_close($curl);
?>
Of course, cURL has to be enabled on your server, but it's not uncommon.
Are you trying to make a proxy? If so, here is a recipe, in proxy.php:
<?php
$host = 'example.com';
$port = 80;
$page = $_SERVER['REQUEST_URI'];
$hbase = array();
$hbase[] = 'User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)';
if (!empty($_SERVER['HTTP_REFERER'])) $hbase[] = 'Referer: ' . str_ireplace($_SERVER['HTTP_HOST'], $host, $_SERVER['HTTP_REFERER']);
if (!empty($_SERVER['HTTP_COOKIE'])) $hbase[] = 'Cookie: ' . $_SERVER['HTTP_COOKIE'];
$hbase = implode("\r\n", $hbase);
if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    $post = file_get_contents("php://input");
    $length = strlen($post);
    $request = "POST $page HTTP/1.0\r\nHost: $host\r\n$hbase\r\nContent-Type: application/x-www-form-urlencoded\r\nContent-Length: $length\r\n\r\n$post";
} else {
    $request = "GET $page HTTP/1.0\r\nHost: $host\r\n$hbase\r\n\r\n";
}
do {
    $conn = fsockopen($host, $port, $errno, $errstr, 180);
    if (!$conn) throw new Exception("$errstr ($errno)");
    fputs($conn, $request);
    $header = '';
    $body = false;
    stream_set_blocking($conn, false);
    $info = stream_get_meta_data($conn);
    while (!feof($conn) && !$info['timed_out']) {
        $str = fgets($conn);
        if (!$str) {
            usleep(50000);
            continue;
        }
        if ($body !== false) $body .= $str;
        else $header .= $str;
        if ($body === false && $str == "\r\n") $body = '';
        $info = stream_get_meta_data($conn);
    }
    fclose($conn);
} while ($info['timed_out']);
$header = str_ireplace($host, $_SERVER['HTTP_HOST'], $header);
if (stripos($body, $host) !== false) $body = str_ireplace($host, $_SERVER['HTTP_HOST'], $body);
$header = str_replace('domain=.example.com; ', '', $header);
$header_array = explode("\r\n", $header);
foreach ($header_array as $line) header($line);
if (strpos($header, 'Content-Type: text') !== false) {
    $body = str_replace('something', '', $body);
}
echo $body;
In .htaccess:
Options +FollowSymlinks
RewriteEngine on
RewriteBase /
RewriteRule ^(.*)$ proxy.php [QSA,L]
You may be able to pinpoint the slowness by changing $url to a known fast site, or even a local webserver. The only other likely cause is a slow response from the server.
Of course as suggested by GZipp, if you're going to output the file contents as well, just do it with a single request. That would make the server you're requesting from happier.
