curl, how to download files with special characters?

My download function:
public static function download($a)
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $a);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_FAILONERROR, 1);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($curl, CURLOPT_TIMEOUT, 15);
    $a = curl_exec($curl);
    curl_close($curl);
    return $a;
}
Given this link:
http://example.com/x.txt
it works well. But in a special case:
http://example.com/fájl/név/with ékezetek.txt
it fails with "400 - Bad Request" and curl_errno is 22. How can I download it? urlencode is not an option, since it would also encode the scheme and the hostname.
EDIT: those URLs come from outside; I have no control over them!

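cURL and file_get_contents do not work with URLs containing special characters (accented letters, quotes, ...).
Solution:
$data = file_get_contents(urlEncodePath('http://www.test.com/file/file_planta_1ª.jpg'));
// Named urlEncodePath because PHP function names are case-insensitive:
// declaring urlEncode() would clash with the built-in urlencode().
// Note: the rebuilt URL keeps only scheme, host and path (no port or query).
function urlEncodePath($url)
{
    $parts = parse_url($url);
    $parts['path'] = implode('/', array_map('urlencode', explode('/', $parts['path'])));
    return $parts['scheme'].'://'.$parts['host'].$parts['path'];
}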

Try to base64_encode the data, and base64_decode it to get the original data back.

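An improved version of Andrey's answer: you get a better result with the rawurlencode() function. urlencode() encodes spaces as "+", the HTML form convention, which is not read as a space inside a URL path, while rawurlencode() produces "%20" as RFC 3986 expects.
function fileUrlEncode($url)
{
    $parts = parse_url($url);
    // Encode each path segment separately so the "/" separators are preserved.
    $parts['path'] = implode('/', array_map('rawurlencode', explode('/', $parts['path'])));
    return $parts['scheme'].'://'.$parts['host'].$parts['path'];
}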
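A minimal sketch extending the idea above, assuming you also want to keep any user info, port, query string and fragment that parse_url() returns (the functions above drop them). The name encodeUrlPath is hypothetical. Only the path segments go through rawurlencode(); the query and fragment are copied as-is, so they are assumed to be encoded already.

```php
<?php
// Hypothetical helper: percent-encode each path segment with rawurlencode()
// while keeping scheme, user info, host, port, query and fragment intact.
function encodeUrlPath(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        // Not an absolute URL we know how to rebuild; return it unchanged.
        return $url;
    }

    // Encode segment by segment so the "/" separators survive.
    $path = $parts['path'] ?? '';
    $path = implode('/', array_map('rawurlencode', explode('/', $path)));

    $result = $parts['scheme'] . '://';
    if (isset($parts['user'])) {
        $result .= $parts['user'];
        if (isset($parts['pass'])) {
            $result .= ':' . $parts['pass'];
        }
        $result .= '@';
    }
    $result .= $parts['host'];
    if (isset($parts['port'])) {
        $result .= ':' . $parts['port'];
    }
    $result .= $path;
    if (isset($parts['query'])) {
        $result .= '?' . $parts['query']; // assumed to be encoded already
    }
    if (isset($parts['fragment'])) {
        $result .= '#' . $parts['fragment'];
    }
    return $result;
}

echo encodeUrlPath('http://example.com/fájl/név/with ékezetek.txt'), "\n";
// http://example.com/f%C3%A1jl/n%C3%A9v/with%20%C3%A9kezetek.txt
```

The encoded URL can then be passed to CURLOPT_URL in the download() function from the question; the hostname is left untouched, which was the objection to calling urlencode() on the whole URL.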

Related

How to grab data from plain text response of API Calls in PHP?

I'm getting an API response in plain text. I need to extract data from that response and store it in variables.
API call:
$url="http://91.101.61.111:99/SendRequest/?mobile=9999999999&id=11011&reqref=501";
$request_timeout = 60;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_TIMEOUT, $request_timeout);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $request_timeout);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
$curl_error = curl_errno($ch);
curl_close($ch);
API response in plain text:
REQUEST ACCEPTED your ref=501 system_reference=BA01562
I need to grab data from the above plain text response as variables, like below:
$status = "REQUEST ACCEPTED";
$myref = "501";
$sysref = "BA01562";
I have tried:
$explode1 = explode(" ", $output);
$explode2 = explode("=", $explode1[3]);
$explode3 = explode("=", $explode1[4]);
$status = $explode1[0]." ".$explode1[1];
$myref = $explode2[1];
$sysref = $explode3[1];
I know this is not the proper way to do it, but I'm a newbie and can't figure out the right way.
Please help! Thank you!

You can use preg_match, something like this (with $plain_response holding the response text, i.e. $output in the question):
$rc = preg_match('/([\w\s]+) your ref=([\d]+) system_reference=([\w]+)/', $plain_response, $matches);
if ($rc)
{
    $status = $matches[1];
    $myref = $matches[2];
    $sysref = $matches[3];
}
But of course, as @Don't panic said, you need to know a bit more about the API to be sure the parsing is right. The example I gave is a bit simplistic. Anyway, once you are sure about the format, use a regular expression with preg_match.
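A slightly more defensive variant of the preg_match approach, as a sketch. It assumes every response has exactly the layout of the sample line; parseApiResponse is a hypothetical name. Named groups keep the captures self-documenting, and returning null on an unexpected format avoids leaving $status, $myref and $sysref undefined.

```php
<?php
// Sketch: parse "REQUEST ACCEPTED your ref=501 system_reference=BA01562".
// Assumes the status text is always followed by " your ref=" as in the sample.
function parseApiResponse(string $output): ?array
{
    $pattern = '/^(?<status>.+?) your ref=(?<ref>\d+) system_reference=(?<sysref>\w+)$/';
    if (preg_match($pattern, trim($output), $m) !== 1) {
        return null; // unexpected format: let the caller decide what to do
    }
    return [
        'status' => $m['status'],
        'myref'  => $m['ref'],
        'sysref' => $m['sysref'],
    ];
}

$parsed = parseApiResponse('REQUEST ACCEPTED your ref=501 system_reference=BA01562');
if ($parsed !== null) {
    $status = $parsed['status']; // "REQUEST ACCEPTED"
    $myref  = $parsed['myref'];  // "501"
    $sysref = $parsed['sysref']; // "BA01562"
}
```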

Regex unit test passes but doesn't appear to work properly actually trying to use it

This is a link to the string in a linter.
And this is the Expression itself:
(?i)\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))
I'm trying to validate almost ANY web URL with this expression.
We can see here that it passes the unit tests as expected:
Yet, as I said, when I run my code it seems to ignore the validation, which has me scratching my head.
This is the relevant portion of the code:
//kindly taken from here: http://stackoverflow.com/a/34589895/2226328
function checkPageSpeed($url){
    if (function_exists('file_get_contents')) {
        $result = @file_get_contents($url);
    }
    if ($result == '') {
        $ch = curl_init();
        $timeout = 60;
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_HEADER,1);//get the header
        curl_setopt($ch, CURLOPT_NOBODY,1);//and *only* get the header
        curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);//get the response as a string from curl_exec(), rather than echoing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        curl_setopt($ch, CURLOPT_FRESH_CONNECT,1);//don't use a cached version of the url
        $result = curl_exec($ch);
        curl_close($ch);
    }
    return $result;
}
function pingGoogle($url){
    echo "<h1>".$url."</h1>";
    if(strtolower(substr($url, 0, 4)) !== "http") {
        echo "adding http:// to $url <br/>";
        $url = "http://".$url;
        echo "URL is now $url <br/>";
    }
    //original idea from https://gist.github.com/dperini/729294
    $re = "/(?i)\\b((?:https?:\\/\\/|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}\\/)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))+(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:'\\\".,<>?«»“”‘’]))/";
    $test = preg_match($re, $url);
    var_export($test);
    if( $test === 1) {
        echo "$url passes pattern Test...let's check if it's actually valid ...";
        pingGoogle("hjm.google.cm/");
        pingGoogle("gamefaqs.com");
    }
    else
    {
        echo "URL formatted proper but isn't an active URL! <br/>";
    }
}

Holy moly that's a regex and a half...
Consider using parse_url to let PHP do the processing for you. Since you're only interested in the domain name, try:
$host = parse_url($url, PHP_URL_HOST);
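// parse_url() returns null when there is no host, and may return false
// for seriously malformed URLs, so check for both.
if ($host === null || $host === false) {
    echo "Failed to parse, no host found";
}
else {
    // do something with supposed host here
}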
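A sketch of the parse_url idea applied to the question's inputs (extractHost is a hypothetical name). One detail matters here: without a scheme, parse_url() treats "gamefaqs.com" as a path, so the host comes back as null. The helper therefore prepends http:// first, just as pingGoogle() does.

```php
<?php
// Sketch: extract the host, prepending "http://" when no scheme is present,
// because parse_url() treats a scheme-less "example.com" as a path.
function extractHost(string $url): ?string
{
    if (!preg_match('#^[a-z][a-z0-9+.\-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    // null = no host component, false = seriously malformed URL
    return is_string($host) && $host !== '' ? $host : null;
}

var_dump(parse_url('gamefaqs.com', PHP_URL_HOST)); // NULL: parsed as a path
var_dump(extractHost('gamefaqs.com'));             // string(12) "gamefaqs.com"
```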

Have you considered simply using PHP's built-in validation filter, FILTER_VALIDATE_URL, with filter_var() for this? It is probably better than rolling your own regex-based solution, both for simplicity of code and for performance.
http://php.net/manual/en/function.filter-var.php
http://php.net/manual/en/filter.filters.validate.php
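A short sketch of the filter_var() suggestion (isValidUrl is a hypothetical wrapper). Two behaviours are worth knowing: FILTER_VALIDATE_URL requires a scheme, so the question's "gamefaqs.com" only passes after http:// is prepended, and per the PHP manual it only accepts ASCII URLs, so a URL with accented characters fails. It checks the format only, not whether the site actually responds.

```php
<?php
// Sketch: validate with filter_var() instead of a hand-written regex.
// FILTER_VALIDATE_URL requires a scheme, so "gamefaqs.com" alone fails.
function isValidUrl(string $url): bool
{
    return filter_var($url, FILTER_VALIDATE_URL) !== false;
}

var_dump(isValidUrl('http://gamefaqs.com')); // bool(true)
var_dump(isValidUrl('gamefaqs.com'));        // bool(false)
```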

PHP curl - having a bit of trouble with special/unique/rare characters

I have the following code on my server, which runs PHP 5.2.*:
$curl = curl_init();
//$sumName = curl_escape($curl, $sumNameWeb);
$summonerName = urlencode($summonerName);
$url = "https://euw.api.pvp.net/api/lol/euw/v1.4/summoner/by-name/{$summonerName}?api_key=".$key;
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_URL, $url);
$result = curl_exec($curl);
$result = utf8_encode($result);
$obj = json_decode($result, true);
$statusCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
curl_close($curl);
It works fine; however, with special characters such as ë, Ö, å, í, etc. it fails to connect. I have tried different approaches to find a fix, but without success.
OK, I have found my error! However, this is my situation now: it connects to the server and gets the data, and I use $sumNameWeb to access the decoded JSON, but the special characters in the returned $sumNameWeb key have changed. Here is the code that accesses the JSON:
$sumID = $obj[$sumNameWeb]["id"];
$sumLvl = $obj[$sumNameWeb]["summonerLevel"];
For example, I enter ë and get Ã« back from the server.

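Try this:
Set one more cURL option in your request. An empty string makes cURL send an Accept-Encoding header listing every encoding it supports and transparently decompress the response, which could otherwise look like garbage data.
curl_setopt($curl, CURLOPT_ENCODING, "");
I hope this helps!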

urlencode percent-encodes the raw bytes of the string, so non-ASCII characters come out as UTF-8 sequences only if the string itself is UTF-8. So most likely your problem is that your text (source code) is in another encoding (not UTF-8). You have to make sure it is UTF-8 encoded.
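A small illustration of that point, written as a sketch with raw byte escapes so it does not depend on the source file's encoding. It also shows one possible (unconfirmed) explanation for the Ã« in the question: the code calls utf8_encode() on the response, and re-encoding bytes that are already UTF-8 as if they were ISO-8859-1 turns "ë" into "Ã«". latin1ToUtf8 is a hypothetical helper that mimics utf8_encode() (deprecated since PHP 8.2).

```php
<?php
// urlencode() works on bytes, so the same letter "ë" encodes differently
// depending on how the string is stored.
$utf8   = "\xC3\xAB"; // "ë" as UTF-8
$latin1 = "\xEB";     // "ë" as ISO-8859-1
echo urlencode($utf8), "\n";   // %C3%AB
echo urlencode($latin1), "\n"; // %EB

// Hypothetical helper mirroring utf8_encode(): treat every byte as
// ISO-8859-1 and re-encode it as a UTF-8 sequence.
function latin1ToUtf8(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        $out .= $b < 0x80
            ? chr($b)                                          // ASCII unchanged
            : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F)); // 2-byte UTF-8
    }
    return $out;
}

// Re-encoding data that is already UTF-8 doubles it: "ë" becomes "Ã«".
echo latin1ToUtf8($utf8), "\n"; // Ã«
```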

Add this header to the page before sending any cURL request:
header('Content-Type: text/html; charset=utf-8');

I faced the same problem: urlencode would not work with these links, so I had to replace the characters myself.
$curl = curl_init();
//$sumName = curl_escape($curl, $sumNameWeb);
$summonerName = urlencode($summonerName);
$url = "https://euw.api.pvp.net/api/lol/euw/v1.4/summoner/by-name/{$summonerName}?api_key=".$key;
$str = $url;
$str = str_replace("{", "%7B", $str);
$str = str_replace("$", "%24", $str);
$str = str_replace("}", "%7D", $str);
$url = $str;
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_URL, $url);
$result = curl_exec($curl);
$result = utf8_encode($result);
$obj = json_decode($result, true);
$statusCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
curl_close($curl);
This should work. If additional characters need to be replaced, you can find their percent-encoded equivalents with a URL encoder.
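An alternative sketch that avoids hand-maintained str_replace() lists: encode only the summoner name as a path segment and let http_build_query() build the query string. It assumes the source file is saved as UTF-8 and that the API expects UTF-8 percent-encoding; buildSummonerUrl and the sample name are hypothetical, and the endpoint is copied from the question.

```php
<?php
// Sketch: percent-encode the name as one path segment, then append the
// query string built by http_build_query().
function buildSummonerUrl(string $summonerName, string $key): string
{
    $base = 'https://euw.api.pvp.net/api/lol/euw/v1.4/summoner/by-name/';
    // rawurlencode() turns a space into %20; urlencode() would produce "+",
    // which is not read as a space inside a path.
    return $base . rawurlencode($summonerName)
        . '?' . http_build_query(['api_key' => $key]);
}

echo buildSummonerUrl('Björn Ö', 'MY_KEY'), "\n";
// https://euw.api.pvp.net/api/lol/euw/v1.4/summoner/by-name/Bj%C3%B6rn%20%C3%96?api_key=MY_KEY
```

The result can be passed straight to CURLOPT_URL in place of $url.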

Get Final URL From Double Shortened URL (t.co -> bit.ly -> final)

I couldn't successfully expand a double-shortened URL using the function below, which I got from here:
function doShortURLDecode($url) {
    $ch = @curl_init($url);
    @curl_setopt($ch, CURLOPT_HEADER, TRUE);
    @curl_setopt($ch, CURLOPT_NOBODY, TRUE);
    @curl_setopt($ch, CURLOPT_FOLLOWLOCATION, FALSE);
    @curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    $response = @curl_exec($ch);
    preg_match('/Location: (.*)\n/', $response, $a);
    if (!isset($a[1])) return $url;
    return $a[1];
}
I ran into trouble when the expanded URL was itself another shortened URL with its own expansion.
How do I get final expanded URL after it has run through both URL shortening services?

Since t.co uses HTML redirection through JavaScript and/or a <meta> redirect, we need to grab its contents first, then extract the bit.ly URL from it and send an HTTP HEAD request to get the final location. This method does not require cURL to be enabled on the server and uses only native PHP 5 functions:
Tested and working!
function large_url($url)
{
    $data = file_get_contents($url); // t.co uses HTML redirection
    $url = strtok(strstr($data, 'http://bit.ly/'), '"'); // grab bit.ly URL
    stream_context_set_default(array('http' => array('method' => 'HEAD')));
    $headers = get_headers($url, 1); // get HTTP headers
    return (isset($headers['Location'])) // check if Location header set
        ? $headers['Location'] // return Location header value
        : $url; // return bit.ly URL instead
}
// DEMO
$url = 'http://t.co/dd4b3kOz';
echo large_url($url);
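The strstr() call above only works when the target starts with "http://bit.ly/". A more general sketch, assuming the intermediate page carries a <meta http-equiv="refresh"> tag (a JavaScript-only redirect would not be caught), reads the target from that tag instead. metaRefreshTarget is a hypothetical name, and the regex assumes http-equiv appears before content; DOMDocument would be more robust for arbitrary markup.

```php
<?php
// Sketch: pull the target out of a <meta http-equiv="refresh"> tag instead of
// searching for a hard-coded "http://bit.ly/" prefix.
function metaRefreshTarget(string $html): ?string
{
    $pattern = '/<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*content=["\']?\d+\s*;\s*url=([^"\'>]+)/i';
    if (preg_match($pattern, $html, $m) === 1) {
        // Decode entities such as &amp; that may appear inside the attribute.
        return html_entity_decode(trim($m[1]), ENT_QUOTES);
    }
    return null; // no meta refresh found
}

$html = '<meta http-equiv="refresh" content="0;URL=http://bit.ly/example">';
echo metaRefreshTarget($html), "\n"; // http://bit.ly/example
```

The returned URL could then go through the same get_headers() step as in large_url().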

I finally found a way to get the final URL of a double-shortened URL: the best way is to use the LongURL API for it.
I am not sure it is the correct way, but I am at last getting the final URL I need as output :)
Here's what i did:
<?php
function TextAfterTag($input, $tag)
{
    $result = '';
    $tagPos = strpos($input, $tag);
    if (!($tagPos === false))
    {
        $length = strlen($input);
        $substrLength = $length - $tagPos + 1;
        $result = substr($input, $tagPos + 1, $substrLength);
    }
    return trim($result);
}
function expandUrlLongApi($url)
{
    $format = 'json';
    // urlencode() the target so its own "?", "&" and "#" don't break the query string
    $api_query = "http://api.longurl.org/v2/expand?" .
        "url=" . urlencode($url) . "&response-code=1&format={$format}";
    $ch = curl_init();
    curl_setopt ($ch, CURLOPT_URL, $api_query );
    curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, 0);
    curl_setopt($ch, CURLOPT_HEADER, false);
    $fileContents = curl_exec($ch);
    curl_close($ch);
    $s1=str_replace("{"," ","$fileContents");
    $s2=str_replace("}"," ","$s1");
    $s2=trim($s2);
    $s3=array();
    $s3=explode(",",$s2);
    $s4=TextAfterTag($s3[0],(':'));
    $s4=stripslashes($s4);
    return $s4;
}
echo expandUrlLongApi('http://t.co/dd4b3kOz');
?>
The output I get is:
"http://changeordie.therepublik.net/?p=371#proliferation"
The above code works.
The code that @cryptic shared is also correct, but I could not get the result on my server (maybe because of a configuration issue).
If anyone thinks that it could be done by some other way, please feel free to share it.

Perhaps you should just use CURLOPT_FOLLOWLOCATION = true and then determine the final URL you were directed to.
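A sketch of that suggestion, assuming the curl extension is loaded (resolveFinalUrl is a hypothetical name). curl_getinfo() with CURLINFO_EFFECTIVE_URL reports the last URL cURL actually requested after following HTTP redirects. It will not see JavaScript or <meta> redirects such as the one t.co is described as using, and some servers answer HEAD-style (CURLOPT_NOBODY) requests differently from GET.

```php
<?php
// Sketch: let cURL follow every HTTP redirect hop and report the last URL.
function resolveFinalUrl(string $url, int $maxRedirects = 10): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,          // headers only, no body
        CURLOPT_FOLLOWLOCATION => true,          // follow Location headers
        CURLOPT_MAXREDIRS      => $maxRedirects, // guard against loops
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 15,
    ]);
    $ok = curl_exec($ch);
    $final = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);
    return $ok === false ? null : $final;
}

// Usage: echo resolveFinalUrl('http://t.co/dd4b3kOz');
```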

If the problem is not a JavaScript redirect (as with t.co) or a <META http-equiv="refresh"...> redirect, this resolves Stack Exchange URLs like https://stackoverflow.com/q/62317 fine:
public function doShortURLDecode($url) {
    $ch = @curl_init($url);
    @curl_setopt($ch, CURLOPT_HEADER, TRUE);
    @curl_setopt($ch, CURLOPT_NOBODY, TRUE);
    @curl_setopt($ch, CURLOPT_FOLLOWLOCATION, FALSE);
    @curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    $response = @curl_exec($ch);
    $cleanresponse= preg_replace('/[^A-Za-z0-9\- _,.:\n\/]/', '', $response);
    preg_match('/Location: (.*)[\n\r]/', $cleanresponse, $a);
    if (!isset($a[1])) return $url;
    return parse_url($url, PHP_URL_SCHEME).'://'.parse_url($url, PHP_URL_HOST).$a[1];
}
It strips special characters that can occur in the cURL output before cutting out the result URL (I ran into this problem on a PHP 7.3 server).
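The whitelist above also strips characters such as ?, =, & and % that are legal in URLs, and the function always prefixes scheme and host even when Location is already absolute. A sketch of an alternative, assuming the raw header block is available as a string: match Location case-insensitively, stop at the trailing "\r", and resolve only root-relative values. locationFromHeaders and the sample headers are hypothetical.

```php
<?php
// Sketch: read the Location value from a raw header block and resolve it
// against the request URL when it is root- or protocol-relative.
function locationFromHeaders(string $headers, string $requestUrl): ?string
{
    // One match per "Location:" line; \S+ stops before any trailing "\r".
    if (!preg_match_all('/^location:[ \t]*(\S+)/mi', $headers, $m)) {
        return null; // no redirect header found
    }
    $location = end($m[1]); // last hop if several responses were captured

    if (preg_match('#^[a-z][a-z0-9+.\-]*://#i', $location)) {
        return $location; // already an absolute URL
    }
    $scheme = parse_url($requestUrl, PHP_URL_SCHEME);
    if (strpos($location, '//') === 0) {
        return $scheme . ':' . $location; // protocol-relative
    }
    if (strpos($location, '/') === 0) {
        $host = parse_url($requestUrl, PHP_URL_HOST);
        $port = parse_url($requestUrl, PHP_URL_PORT);
        return $scheme . '://' . $host . ($port ? ':' . $port : '') . $location;
    }
    return null; // other relative forms (e.g. "../x") are left to the caller
}

$h = "HTTP/1.1 301 Moved Permanently\r\nLocation: /questions/62317\r\nContent-Length: 0\r\n\r\n";
echo locationFromHeaders($h, 'https://stackoverflow.com/q/62317'), "\n";
// https://stackoverflow.com/questions/62317
```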

Barcode image storing in local folder not working

I use barcode generation on my site. When I use it within HTML tags, it works fine.
<div class="barcode_img"><img src="<?php echo AT::getUrl(); ?>/barcode/image.php?code=code39&o=1&dpi=150&t=30&r=1&rot=0&text=TEST NAME WITH SPACE&f1=Arial.ttf&f2=10&a1=&a2=&a3=" class="barcode fr"/></div>
I need to fetch that image and store it in a local folder called "media/barcode/". For that, I used the code below:
$valid_barcodename="testimage";
$barcodeurl = AT::getUrl() . "barcode/image.php?code=code39&o=1&dpi=150&t=30&r=1&rot=0&text=TEST NAME WITH SPACE&f1=Arial.ttf&f2=10&a1=&a2=&a3=";
$barcode_img = 'media/barcode/testing_' .$valid_barcodename . '.png';
file_put_contents($barcode_img, file_get_contents($barcodeurl));
The image stored in that folder is not empty. When I investigated, I found that if I give the name "TEST NAME WITH SPACE" without spaces (TESTNAMEWITHSPACE), it works.
However, if I give it with spaces, it doesn't work. What is the issue?
Note: AT::getUrl() is used to get my base URL.

Spaces have special meaning in URLs, so you have to encode them:
$text = urlencode("TEST NAME WITH SPACE");
$barcodeurl = AT::getUrl() . "barcode/image.php?code=code39&o=1&dpi=150&t=30&r=1&rot=0&text=". $text ."&f1=Arial.ttf&f2=10&a1=&a2=&a3=";
In the above code, $text now contains your text, encoded and ready to be used in your URL (you will notice the spaces have been replaced with + signs, which PHP decodes back to spaces in $_GET; rawurlencode() would produce %20 instead).
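A sketch of the same fix that encodes every parameter at once with http_build_query(), so no value can slip through unencoded. Parameter names and values are copied from the question; the base URL is a placeholder standing in for AT::getUrl(). The separator is passed explicitly so the result does not depend on the arg_separator.output ini setting.

```php
<?php
// Sketch: build the whole barcode query string with http_build_query(),
// which URL-encodes every value (spaces become "+").
$params = [
    'code' => 'code39', 'o' => 1, 'dpi' => 150, 't' => 30, 'r' => 1, 'rot' => 0,
    'text' => 'TEST NAME WITH SPACE',
    'f1' => 'Arial.ttf', 'f2' => 10, 'a1' => '', 'a2' => '', 'a3' => '',
];
$query = http_build_query($params, '', '&');
$barcodeurl = 'http://localhost/barcode/image.php?' . $query; // placeholder base URL

echo $query, "\n";
// code=code39&o=1&dpi=150&t=30&r=1&rot=0&text=TEST+NAME+WITH+SPACE&f1=Arial.ttf&f2=10&a1=&a2=&a3=
```

If the receiving script expects %20 rather than "+", passing PHP_QUERY_RFC3986 as the fourth argument to http_build_query() switches to that form.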

Note: If you're opening a URI with special characters, such as spaces, you need to encode the URI with urlencode().
http://docs.php.net/file_get_contents
Alternatively, you can use cURL if it is enabled on your server.
function curl($url, $setopt = array(), $post = array())
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 2);
    if( ! empty($post))
    {
        curl_setopt($curl, CURLOPT_POST, 1);
        curl_setopt($curl, CURLOPT_POSTFIELDS, $post);
    }
    if( ! empty($setopt))
    {
        // Keys are option names as strings, e.g. array('CURLOPT_TIMEOUT' => 30),
        // resolved to their integer values with constant().
        foreach($setopt as $key => $value)
        {
            curl_setopt($curl, constant($key), $value);
        }
    }
    $data = curl_exec($curl);
    curl_close($curl);
    return $data;
}
Usage:
file_put_contents($barcode_img, curl($barcodeurl));
