I've read tons of cURL tutorials (I'm using PHP) and they always show the same basic code, which doesn't work for me! No specific errors, just no result.
I want to make an HTTP request to the Wikipedia API and get the result in JSON format.
Here's the code:
$handle = curl_init();
$url = "http://fr.wikipedia.org/w/api.php?action=query&titles=Albert%20Einstein&prop=info&format=json";
curl_setopt_array($handle, array(
    CURLOPT_URL => $url,
    CURLOPT_RETURNTRANSFER => true
));
$output = curl_exec($handle);
if (!$output) {
    exit('cURL Error: ' . curl_error($handle));
}
$result = json_decode($output, true);
print_r($result);
curl_close($handle);
I'd like to know what I'm doing wrong.
Your code is correct, but it seems Wikipedia doesn't send the data back when using PHP cURL (maybe some headers or other options must be set for it to work).
If all you need is to retrieve some data, though, you can simply use file_get_contents, which works fine:
$output = file_get_contents("http://fr.wikipedia.org/w/api.php?action=query&titles=Albert%20Einstein&prop=info&format=json");
echo $output;
Edit:
Just for information, I found what the issue is. When running curl -v on that URL, the following comes up:
* Trying 91.198.174.192...
* Connected to fr.wikipedia.org (91.198.174.192) port 80 (#0)
> GET /w/api.php?action=query&titles=Albert%20Einstein&prop=info&format=json HTTP/1.1
> Host: fr.wikipedia.org
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Date: Wed, 17 May 2017 13:54:31 GMT
< Server: Varnish
< X-Varnish: 852298595
< X-Cache: cp3031 int
< X-Cache-Status: int
< Set-Cookie: WMF-Last-Access=17-May-2017;Path=/;HttpOnly;secure;Expires=Sun, 18 Jun 2017 12:00:00 GMT
< Set-Cookie: WMF-Last-Access-Global=17-May-2017;Path=/;Domain=.wikipedia.org;HttpOnly;secure;Expires=Sun, 18 Jun 2017 12:00:00 GMT
< X-Client-IP: 86.214.172.57
< Location: https://fr.wikipedia.org/w/api.php?action=query&titles=Albert%20Einstein&prop=info&format=json
< Content-Length: 0
< Connection: keep-alive
<
* Connection #0 to host fr.wikipedia.org left intact
So what's happening is that the actual content is served over https, not http, and the http URL only answers with a 301 redirect. By requesting https://fr.wikipedia.org/w/api.php?action=query&titles=Albert%20Einstein&prop=info&format=json directly, it should work.
The reason it works with file_get_contents is that it follows the redirection automatically, whereas cURL does not by default.
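A minimal sketch of the cURL-side fix, assuming nothing else is blocking the request: either target the https:// URL directly, or tell cURL to follow the redirect:
$handle = curl_init();
curl_setopt_array($handle, array(
    CURLOPT_URL => "http://fr.wikipedia.org/w/api.php?action=query&titles=Albert%20Einstein&prop=info&format=json",
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true // follow the 301 redirect to the https:// URL
));
$output = curl_exec($handle);
curl_close($handle);
print_r(json_decode($output, true));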
Related
I started a project with Laravel 8. I was adjusting the seeders to generate fake data in the database with Faker. I have a table of images that I fill with random images from Faker; my ImageFactory.php file looks like this:
namespace Database\Factories;

use App\Models\Image;
use Illuminate\Database\Eloquent\Factories\Factory;

class ImageFactory extends Factory
{
    /**
     * The name of the factory's corresponding model.
     *
     * @var string
     */
    protected $model = Image::class;

    /**
     * Define the model's default state.
     *
     * @return array
     */
    public function definition()
    {
        return [
            'url' => 'posts/' . $this->faker->image('./public/storage/posts', 640, 480, null, false)
        ];
    }
}
When I execute the command:
php artisan migrate:fresh --seed
it generates this error in the console:
copy(https://via.placeholder.com/640x480.png/001144?text=corrupti): Failed to open stream: Connection timed out at vendor/fakerphp/faker/src/Faker/Provider/Image.php:121
Looking for a solution, I found a suggestion to edit the Image.php file that throws the error: adding these two lines after the line where CURLOPT_FILE is set would supposedly solve the problem, so that it would look like this:
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false); // added line
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // added line
$success = curl_exec($ch) && curl_getinfo($ch, CURLINFO_HTTP_CODE) === 200; // existing line
However, this did not solve my problem and the error persists. I am using Laravel 8 and PHP 8.
Update
I'm using Ubuntu 18.04. Running the command curl -vo /dev/null "via.placeholder.com/640x480.png/001144?text=corrupti", I got this:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 2600:3c00::f03c:91ff:fe60:d792...
* TCP_NODELAY set
* Trying 45.33.24.119...
* TCP_NODELAY set
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Connected to via.placeholder.com (45.33.24.119) port 80 (#0)
> GET /640x480.png/001144?text=corrupti HTTP/1.1
> Host: via.placeholder.com
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.6.2
< Date: Sat, 13 Feb 2021 14:36:37 GMT
< Content-Type: image/png
< Content-Length: 1558
< Last-Modified: Sat, 09 Jan 2021 14:00:02 GMT
< Connection: keep-alive
< ETag: "5ff9b6e2-616"
< Expires: Sat, 20 Feb 2021 14:36:37 GMT
< Cache-Control: max-age=604800
< X-Cache: L1
< Accept-Ranges: bytes
<
{ [1126 bytes data]
100 1558 100 1558 0 0 3386 0 --:--:-- --:--:-- --:--:-- 3386
* Connection #0 to host via.placeholder.com left intact
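(Not an answer from the thread, just a hedged sketch: one way to sidestep the download entirely is to copy a local placeholder image inside the factory. The path storage/app/placeholder.png below is hypothetical; you would have to provide such a file yourself.)
public function definition()
{
    // Copy a bundled local placeholder instead of downloading one from via.placeholder.com.
    // storage/app/placeholder.png is a hypothetical file you must supply yourself.
    $filename = 'posts/' . $this->faker->uuid . '.png';
    copy(storage_path('app/placeholder.png'), public_path('storage/' . $filename));

    return [
        'url' => $filename,
    ];
}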
I have to get the URL and the image name from a returned Facebook API response. I have the response results and tried to extract the image URL and image name as shown below. Please help me get the location URL and the image name.
preg_match('/Location: (.*?)\n/', $header, $matches);
output:
HTTP/2 302
x-app-usage: {"call_count":16,"total_cputime":0,"total_time":4}
x-fb-rlafr: 0
location: https://xxxxx.net/v/cccc/cccc/130282202_3518020318246580_4104659942029629494_o.jpg?_nc_cat=104&ccb=2&_nc_sid=9e2e56&_nc_ohc=pErMyD3PYFkAX8b7JiO&_nc_ht=scontent-ort2-1.xx&tp=6&oh=db3843917c53f747c3c3f860ca9144d1&oe=6040C6ED
expires: Sat, 01 Jan 2000 00:00:00 GMT
x-fb-request-id: dddddd
strict-transport-security: max-age=15552000; preload
x-fb-trace-id: dddddd
facebook-api-version: v3.2
content-type: image/jpeg
x-fb-rev: 1003270116
cache-control: private, no-cache, no-store, must-revalidate
pragma: no-cache
access-control-allow-origin: *
x-fb-debug: cvvvvvvvvvvvvvvvvvvvvvvvvvvv
content-length: 0
date: Fri, 05 Feb 2021 06:41:05 GMT
alt-svc: h3-29=":443"; ma=3600,h3-27=":443"; ma=3600
$img_array[$key]['url'] = trim(substr($matches['0'],10)); // to get the location url
// print_r($img_array[$key]['url']);
$img_array[$key]['name'] = substr($b['name'],0,-16); // to get the image name
preg_match('/location: (.*?)\n/', $header, $matches);
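For reference, a minimal sketch (mine, not from the post) that matches the header name case-insensitively (HTTP/2 lowercases it) and derives the file name from the URL path:
preg_match('/^location:\s*(.*)$/mi', $header, $matches); // /i makes 'Location:' and 'location:' both match
$img_url = trim($matches[1]);
$img_name = basename(parse_url($img_url, PHP_URL_PATH)); // 130282202_3518020318246580_4104659942029629494_o.jpg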
The PHP code below fails to retrieve the correct characters when used:
echo $html = file_get_contents("http://www.tsetmc.com/tsev2/data/instinfofast.aspx?i=65883838195688438&c=34+");
The result is:
���\�%PKJDA��ۈ�0�o'�z��W�"�7o�E��J:�%�+�=o�h#Ĥ�T�Jv�L�$��IT��1҈IY �B L�g�Mt����� �S]>>�����������j#�Tu97������#"jD��C�3x0�����I"("D�W��Bd��9������J�^ȑ���T��[e��K����r�ZB����r�Z#�w��4G� � �C�b�%8��PR�/���ع���a=�o��s���H�G�
This is because the output is gzipped; you need to decompress it (see the Content-Encoding header):
D:\Temp>curl -v "http://www.tsetmc.com/tsev2/data/instinfofast.aspx?i=65883838195688438&c=34+" -o output.data
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 79.175.151.173...
* TCP_NODELAY set
* Connected to www.tsetmc.com (79.175.151.173) port 80 (#0)
> GET /tsev2/data/instinfofast.aspx?i=65883838195688438&c=34+ HTTP/1.1
> Host: www.tsetmc.com
> User-Agent: curl/7.55.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Cache-Control: public, max-age=1
< Content-Type: text/html; charset=utf-8
< Content-Encoding: gzip
< Expires: Sat, 21 Dec 2019 09:43:48 GMT
< Last-Modified: Sat, 21 Dec 2019 09:43:47 GMT
< Vary: *
< Server: Microsoft-IIS/10.0
< X-Powered-By: ASP.NET
< X-Powered-By: ARR/3.0
< X-Powered-By: ASP.NET
< Date: Sat, 21 Dec 2019 09:42:59 GMT
< Content-Length: 155
<
{ [155 bytes data]
100 155 100 155 0 0 155 0 0:00:01 --:--:-- 0:00:01 662
* Connection #0 to host www.tsetmc.com left intact
D:\Temp>
unzipping (on Windows):
D:\Temp>"c:\Program Files\7-Zip\7z.exe" x output.data output
7-Zip 18.05 (x64) : Copyright (c) 1999-2018 Igor Pavlov : 2018-04-30
Scanning the drive for archives:
1 file, 155 bytes (1 KiB)
Extracting archive: output.data
--
Path = output.data
Type = gzip
Headers Size = 10
Everything is Ok
Size: 239
Compressed: 155
D:\Temp>type output
12:29:59,A ,9055,9098,9131,9072,9217,9000,3582,17432646,158598409673,0,20191221,122959;;2#100400#9055#9055#20091#1,2#60000#9050#9058#554#1,1#1000#9040#9059#993#2,;66660,417193,674167;13450748,3981898,0,13913408,3519238,1255,9,0,899,11;;;1;
D:\Temp>
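The same fix in PHP, as a minimal sketch (assuming the server always gzips the body, as the Content-Encoding header above indicates):
$raw = file_get_contents("http://www.tsetmc.com/tsev2/data/instinfofast.aspx?i=65883838195688438&c=34+");
echo gzdecode($raw); // decompress the gzip body before using it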
This is an example script from a larger application, but shows the general process of what I'm trying to do. If I have the following script:
<?php
ob_start();
setcookie('test1', 'first');
setcookie('test1', 'second');
setcookie('test1', 'third');
setcookie('test2', 'keep');
//TODO remove duplicate test1 from headers
ob_end_clean();
die('end test');
I get the following response (as viewed via Fiddler):
HTTP/1.1 200 OK
Date: Tue, 25 Apr 2017 21:54:45 GMT
Server: Apache/2.4.17 (Win32) OpenSSL/1.0.2d PHP/5.5.30
X-Powered-By: PHP/5.5.30
Set-Cookie: test1=first
Set-Cookie: test1=second
Set-Cookie: test1=third
Set-Cookie: test2=keep
Content-Length: 8
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html
end test
The problem is that Set-Cookie: test1... appears 3 different times, needlessly increasing the header size. (Again, this is a simplified example; in reality, I'm dealing with ~10 duplicate cookies in the ~800-byte range.)
Is there anything I can write in place of the TODO that would get rid of the duplicate header, either completely or so it only shows once? I.e., the following is my end goal:
HTTP/1.1 200 OK
Date: Tue, 25 Apr 2017 21:54:45 GMT
Server: Apache/2.4.17 (Win32) OpenSSL/1.0.2d PHP/5.5.30
X-Powered-By: PHP/5.5.30
Set-Cookie: test1=third
Set-Cookie: test2=keep
Content-Length: 8
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html
end test
though it would also be fine if Set-Cookie: test1=third didn't exist at all; Set-Cookie: test2=keep, however, needs to remain. When I try setcookie('test1', '', 1); to delete the cookie, it just adds yet another header to mark the cookie as expired:
Set-Cookie: test1=first
Set-Cookie: test1=second
Set-Cookie: test1=third
Set-Cookie: test2=keep
Set-Cookie: test1=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0
And if I try removing the header like:
if (!headers_sent()) {
    foreach (headers_list() as $header) {
        if (stripos($header, 'Set-Cookie: test1') !== false) {
            header_remove('Set-Cookie');
        }
    }
}
it removes all Set-Cookie headers when I only want test1 removed.
As you suggested in that last block of code, the headers_list() function can be used to check which headers have been set. Using that, the last value for each cookie can be stored in an associative array. The names and values can be extracted with explode() (along with trim()).
When multiple cookies with the same name are detected, we can use the header_remove() call like you had, but then set the cookies back to their final values. See the example below.
if (!headers_sent()) {
    $cookiesSet = []; // associative array to store the last value for each cookie
    $rectifyCookies = false; // multiple values detected for the same cookie name
    foreach (headers_list() as $header) {
        if (stripos($header, 'Set-Cookie:') === 0) {
            // limit both explodes so values containing ':' or '=' stay intact
            list($setCookie, $cookieValue) = explode(':', $header, 2);
            list($cookieName, $cookieValue) = explode('=', trim($cookieValue), 2);
            if (array_key_exists($cookieName, $cookiesSet)) {
                $rectifyCookies = true;
            }
            $cookiesSet[$cookieName] = $cookieValue;
        }
    }
    if ($rectifyCookies) {
        header_remove('Set-Cookie');
        foreach ($cookiesSet as $cookieName => $cookieValue) {
            // might need to consider the optional 3rd - 8th parameters
            setcookie($cookieName, $cookieValue);
        }
    }
}
Output:
Cache-Control max-age=0, no-cache, no-store, must-revalidate
Connection keep-alive
Content-Encoding gzip
Content-Type text/html; charset=utf-8
Date Wed, 26 Apr 2017 15:31:33 GMT
Expires Wed, 11 Jan 1984 05:00:00 GMT
Pragma no-cache
Server nginx
Set-Cookie test1=third
test2=keep
Transfer-Encoding chunked
Vary Accept-Encoding
I don't understand why you think that the cookie removing code you showed us would remove the setcookie for test2.
If your code is setting the same cookie multiple times then you need to change your code so it stops setting the cookie multiple times! Anything else is a sloppy workaround.
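A hedged sketch of that idea (mine, not from this answer): route every cookie write through one helper, so each name is only ever emitted once and later writes win:
function queue_cookie($name, $value) {
    static $queue = [];
    $queue[$name] = $value; // a later write for the same name overwrites the earlier one
    header_remove('Set-Cookie'); // drop all pending Set-Cookie headers...
    foreach ($queue as $n => $v) {
        setcookie($n, $v); // ...then re-emit the deduplicated set
    }
}

queue_cookie('test1', 'first');
queue_cookie('test1', 'second');
queue_cookie('test1', 'third');
queue_cookie('test2', 'keep'); // final headers: test1=third, test2=keep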
I'm having issues parsing a CSV file in PHP, using fopen() to take in API data.
My code works when I use a URL that displays the CSV file in the browser, as stated in 1) below. But I get random characters outputted from the URL that ends in format=csv, as seen in 2) below.
1) Working URL: returned the expected values
https://www.kimonolabs.com/api/csv/duo2mkw2?apikey=yjEl780lSQ8IcVHkItiHzzUZxd1wqSJv
2) Not working URL: returns random characters
https://www.parsehub.com/api/v2/projects/tM9MwgKrh0c4b81WDT_4FkaC/last_ready_run/data?api_key=tD3djFMGmyWmDUdcgmBVFCd3&format=csv
Here is my code, using URL (2) above:
<?php
$f_pointer = fopen("https://www.parsehub.com/api/v2/projects/tM9MwgKrh0c4b81WDT_4FkaC/last_ready_run/data?api_key=tD3djFMGmyWmDUdcgmBVFCd3&format=csv", "r");
while (!feof($f_pointer)) {
    $ar = fgetcsv($f_pointer);
    echo $ar[1];
    echo "<br>";
}
?>
Output for the URL mentioned in (2) above:
root#MorryServer:/# php testing.php
?IU?Q?JL?.?/Q?R??/)?J-.?))VH?/OM?K-NI?T0?P?*ͩT0204jzԴ?H???X???# D??K
Correct output if I use the URL type as stated in (1):
root#MorryServer:/# php testing.php
PHP Notice: Undefined offset: 1 in /testing.php on line 24
jackpot€2,893,210
This is an encoding problem.
The given file contains UTF-8 characters. These are read by the fgetcsv function, which is binary safe. Line endings are in Unix format ("\n").
The output on the terminal is scrambled. Looking at the headers sent, we see:
GET https://www.parsehub.com/api/v2/projects/tM9MwgKrh0c4b81WDT_4FkaC/last_ready_run/data?api_key=tD3djFMGmyWmDUdcgmBVFCd3&format=csv --> 200 OK
Connection: close
Date: Sat, 11 Jul 2015 13:15:24 GMT
Server: nginx/1.6.2
Content-Encoding: gzip
Content-Length: 123
Content-Type: text/csv; charset=UTF-8
Last-Modified: Fri, 10 Jul 2015 11:43:49 GMT
Client-Date: Sat, 11 Jul 2015 13:15:23 GMT
Client-Peer: 107.170.197.156:443
Client-Response-Num: 1
Client-SSL-Cert-Issuer: /C=GB/ST=Greater Manchester/L=Salford/O=COMODO CA Limited/CN=COMODO RSA Domain Validation Secure Server CA
Client-SSL-Cert-Subject: /OU=Domain Control Validated/OU=PositiveSSL/CN=www.parsehub.com
Mind the Content-Encoding: gzip: fgetcsv() working on a URL obviously doesn't handle gzip encoding on its own. The scrambled string is just the gzipped content of the "file".
Look at PHP's zlib functions to decompress that first, before parsing it.
Proof:
srv:~ # lwp-download 'https://www.parsehub.com/api/v2/projects/tM9MwgKrh0c4b81WDT_4FkaC/last_ready_run/data?api_key=tD3djFMGmyWmDUdcgmBVFCd3&format=csv' data
123 bytes received
srv:~ # file data
data: gzip compressed data, was "tcW80-EcI6Oj2TYPXI-47XwK.csv", from Unix, last modified: Fri Jul 10 11:43:48 2015, max compression
srv:~ # gzip -d < data
"title","jackpot"
"Lotto Results for Wednesday 08 July 2015","€2,893,210"
To get the proper output, only minimal changes are needed: just add a stream wrapper:
<?php
$f_pointer = fopen("compress.zlib://https://www.parsehub.com/api/v2/projects/tM9MwgKrh0c4b81WDT_4FkaC/last_ready_run/data?api_key=tD3djFMGmyWmDUdcgmBVFCd3&format=csv", "r");
if ($f_pointer === false)
    die("invalid URL");
$ar = array();
while (!feof($f_pointer)) {
    $ar[] = fgetcsv($f_pointer);
}
print_r($ar);
?>
Outputs:
Array
(
    [0] => Array
        (
            [0] => title
            [1] => jackpot
        )

    [1] => Array
        (
            [0] => Lotto Results for Wednesday 08 July 2015
            [1] => €2,893,210
        )

)