Varnish seems not to be caching pages at all - php

I have a PHP Symfony application which is served by nginx.
* << BeReq >> 492062
- Begin bereq 492061 fetch
- Timestamp Start: 1572337898.474535 0.000000 0.000000
- BereqMethod GET
- BereqURL /
- BereqProtocol HTTP/1.0
- BereqHeader Host: xxx
- BereqHeader X-Forwarded-Host: xxx
- BereqHeader X-Real-IP: xxx
- BereqHeader X-Forwarded-Proto: https
- BereqHeader HTTPS: on
- BereqHeader User-Agent: Wget/1.19.4 (linux-gnu)
- BereqHeader Accept: */*
- BereqHeader X-Forwarded-For: 127.0.0.1
- BereqProtocol HTTP/1.1
- BereqHeader Accept-Encoding: gzip
- BereqHeader X-Varnish: 492062
- VCL_call BACKEND_FETCH
- VCL_return fetch
- BackendOpen 26 boot.default 127.0.0.1 8080 127.0.0.1 43676
- BackendStart 127.0.0.1 8080
- Timestamp Bereq: 1572337898.474685 0.000150 0.000150
- Timestamp Beresp: 1572337903.642006 5.167471 5.167321
- BerespProtocol HTTP/1.1
- BerespStatus 200
- BerespReason OK
- BerespHeader Server: nginx/1.14.0 (Ubuntu)
- BerespHeader Content-Type: text/html; charset=UTF-8
- BerespHeader Transfer-Encoding: chunked
- BerespHeader Connection: keep-alive
- BerespHeader Set-Cookie: PHPSESSID=slaurqvo3msh9uklerbht0nd2h; path=/; domain=.xxx; HttpOnly
- BerespHeader Cache-Control: max-age=3600, public
- BerespHeader Date: Tue, 29 Oct 2019 08:31:39 GMT
- BerespHeader Age: 20
- BerespHeader Content-Encoding: gzip
- TTL RFC 3600 10 0 1572337904 1572337884 1572337899 0 3600
- VCL_call BACKEND_RESPONSE
- TTL VCL 86420 10 0 1572337884
- TTL VCL 86420 3600 0 1572337884
- TTL VCL 140 3600 0 1572337884
- VCL_return deliver
- BerespHeader Vary: Accept-Encoding
- Storage malloc Transient
- ObjProtocol HTTP/1.1
- ObjStatus 200
- ObjReason OK
- ObjHeader Server: nginx/1.14.0 (Ubuntu)
- ObjHeader Content-Type: text/html; charset=UTF-8
- ObjHeader Set-Cookie: PHPSESSID=slaurqvo3msh9uklerbht0nd2h; path=/; domain=.xxx; HttpOnly
- ObjHeader Cache-Control: max-age=3600, public
- ObjHeader Date: Tue, 29 Oct 2019 08:31:39 GMT
- ObjHeader Content-Encoding: gzip
- ObjHeader Vary: Accept-Encoding
- Fetch_Body 2 chunked stream
- Gzip u F - 24261 118266 80 80 194017
- BackendReuse 26 boot.default
- Timestamp BerespBody: 1572337903.644744 5.170209 0.002738
- Length 24261
- BereqAcct 275 0 275 342 24261 24603
- End
* << Request >> 492061
- Begin req 492060 rxreq
- Timestamp Start: 1572337898.474380 0.000000 0.000000
- Timestamp Req: 1572337898.474380 0.000000 0.000000
- ReqStart 127.0.0.1 57354
- ReqMethod GET
- ReqURL /
- ReqProtocol HTTP/1.0
- ReqHeader Host: xxx
- ReqHeader X-Forwarded-Host: xxx
- ReqHeader X-Real-IP: xxx
- ReqHeader X-Forwarded-For: xxx
- ReqHeader X-Forwarded-Proto: https
- ReqHeader HTTPS: on
- ReqHeader Cache-Control: max-age=15000
- ReqHeader Connection: close
- ReqHeader User-Agent: Wget/1.19.4 (linux-gnu)
- ReqHeader Accept: */*
- ReqHeader Accept-Encoding: identity
- ReqUnset X-Forwarded-For: xxx
- ReqHeader X-Forwarded-For: xxx, 127.0.0.1
- VCL_call RECV
- ReqUnset X-Forwarded-For: xxx, 127.0.0.1
- ReqHeader X-Forwarded-For: 127.0.0.1
- VCL_return hash
- ReqUnset Accept-Encoding: identity
- VCL_call HASH
- VCL_return lookup
- HitMiss 492059 104.991348
- VCL_call MISS
- VCL_return fetch
- Link bereq 492062 fetch
- Timestamp Fetch: 1572337903.643494 5.169113 5.169113
- RespProtocol HTTP/1.1
- RespStatus 200
- RespReason OK
- RespHeader Server: nginx/1.14.0 (Ubuntu)
- RespHeader Content-Type: text/html; charset=UTF-8
- RespHeader Set-Cookie: PHPSESSID=slaurqvo3msh9uklerbht0nd2h; path=/; domain=.xxx; HttpOnly
- RespHeader Cache-Control: max-age=3600, public
- RespHeader Date: Tue, 29 Oct 2019 08:31:39 GMT
- RespHeader Content-Encoding: gzip
- RespHeader Vary: Accept-Encoding
- RespHeader X-Varnish: 492061
- RespHeader Age: 20
- RespHeader Via: 1.1 varnish (Varnish/5.2)
- VCL_call DELIVER
- VCL_return deliver
- Timestamp Process: 1572337903.643520 5.169140 0.000026
- RespUnset Content-Encoding: gzip
- RespHeader Accept-Ranges: bytes
- RespHeader Connection: close
- Gzip U D - 24261 118266 80 80 194017
- Timestamp Resp: 1572337903.645192 5.170812 0.001672
- ReqAcct 313 0 313 381 118266 118647
- End
* << Session >> 492060
- Begin sess 0 HTTP/1
- SessOpen 127.0.0.1 57354 a0 127.0.0.1 80 1572337898.474305 24
- Link req 492061 rxreq
- SessClose TX_EOF 5.171
- End
Somehow, Varnish seems to save the website in cache:
- VCL_call DELIVER
- VCL_return deliver
VCL Configuration:
vcl 4.0;
# Default backend definition. Set this to point to your content server.
backend default {
.host = "127.0.0.1";
.port = "8080";
}
sub vcl_recv {
// Remove all cookies except the session ID.
if (req.http.Cookie) {
set req.http.Cookie = ";" + req.http.Cookie;
set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
set req.http.Cookie = regsuball(req.http.Cookie, ";(PHPSESSID)=", "; \1=");
set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");
if (req.http.Cookie == "") {
// If there are no more cookies, remove the header to get page cached.
unset req.http.Cookie;
}
}
}
sub vcl_backend_response {
# Happens after we have read the response headers from the backend.
#
# Here you clean the response headers, removing silly Set-Cookie headers
# and other mistakes your backend does.
if (beresp.http.Surrogate-Control ~ "ESI/1.0") {
unset beresp.http.Surrogate-Control;
set beresp.do_esi = true;
}
}
sub vcl_deliver
{
# Insert Diagnostic header to show Hit or Miss
if (obj.hits > 0) {
set resp.http.X-Cache = "HIT";
set resp.http.X-Cache-Hits = obj.hits;
}
else {
set resp.http.X-Cache = "MISS";
}
}
What is wrong there?

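Varnish does not cache responses that carry a Set-Cookie header. Your backend sends Set-Cookie: PHPSESSID=... on this page, so the built-in VCL, which runs after your own vcl_backend_response, marks the object as uncacheable (note the "Storage malloc Transient" and "HitMiss" records in your log).
Also, you are quoting the wrong lines to determine whether the page was cached. VCL_call DELIVER and VCL_return deliver appear for every delivered response, cached or not. The line that matters is:
VCL_call MISS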
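To make it easier to read the right line, here is a minimal PHP sketch that scans varnishlog text for the lookup outcome. The function name `varnishCacheResult` is hypothetical, and it assumes the log uses the `VCL_call HIT`, `VCL_call MISS` and `VCL_call PASS` records as in the output quoted in the question.

```php
<?php
// Hypothetical helper: report which lookup outcome (HIT, MISS or PASS)
// appears in a varnishlog request transaction.
function varnishCacheResult(string $log): ?string
{
    // Match lines such as "- VCL_call MISS" and capture the subroutine name.
    if (preg_match('/^-?\s*VCL_call\s+(HIT|MISS|PASS)\s*$/m', $log, $m)) {
        return $m[1];
    }
    return null; // no lookup outcome found (e.g. a backend-only transaction)
}

$requestLog = <<<LOG
- VCL_call HASH
- VCL_return lookup
- HitMiss 492059 104.991348
- VCL_call MISS
- VCL_return fetch
- VCL_call DELIVER
- VCL_return deliver
LOG;

echo varnishCacheResult($requestLog), PHP_EOL; // prints "MISS"
```

VCL_call DELIVER is deliberately not in the pattern, because it says nothing about whether the object came from the cache.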

Related

How to get the matched records from graph facebook API?

I need to get the URL and image name from a Facebook API response. I already have the response headers, and I tried to extract the image URL and image name as shown below. How can I get the location URL and the image name?
preg_match('/Location: (.*?)\n/', $header, $matches);
output:
HTTP/2 302
x-app-usage: {"call_count":16,"total_cputime":0,"total_time":4}
x-fb-rlafr: 0
location: https://xxxxx.net/v/cccc/cccc/130282202_3518020318246580_4104659942029629494_o.jpg?_nc_cat=104&ccb=2&_nc_sid=9e2e56&_nc_ohc=pErMyD3PYFkAX8b7JiO&_nc_ht=scontent-ort2-1.xx&tp=6&oh=db3843917c53f747c3c3f860ca9144d1&oe=6040C6ED
expires: Sat, 01 Jan 2000 00:00:00 GMT
x-fb-request-id: dddddd
strict-transport-security: max-age=15552000; preload
x-fb-trace-id: dddddd
facebook-api-version: v3.2
content-type: image/jpeg
x-fb-rev: 1003270116
cache-control: private, no-cache, no-store, must-revalidate
pragma: no-cache
access-control-allow-origin: *
x-fb-debug: cvvvvvvvvvvvvvvvvvvvvvvvvvvv
content-length: 0
date: Fri, 05 Feb 2021 06:41:05 GMT
alt-svc: h3-29=":443"; ma=3600,h3-27=":443"; ma=3600
$img_array[$key]['url'] = trim(substr($matches['0'],10)); // to get the location url
// print_r($img_array[$key]['url']);
$img_array[$key]['name'] = substr($b['name'],0,-16); // to get the image name
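The header name in this response is lowercase (location:), so the pattern has to match that spelling:
preg_match('/location: (.*?)\n/', $header, $matches);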
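A case-insensitive match avoids depending on how the server spells the header name. The sketch below is only an illustration: `extractLocation` is a hypothetical helper, the sample header is shortened, and taking the file name from the URL path with `basename()` is an assumption about what "image name" means here (the question itself reads it from `$b['name']`).

```php
<?php
// Hypothetical helper: return the value of the Location header from a raw
// header block, or null when there is none.
function extractLocation(string $rawHeaders): ?string
{
    // "i" makes the header name case-insensitive, "m" lets ^ and $ match per line.
    if (preg_match('/^location:[ \t]*(.*?)[ \t\r]*$/mi', $rawHeaders, $m)) {
        return $m[1];
    }
    return null;
}

// Shortened sample in the same shape as the response in the question.
$header = "HTTP/2 302\r\nlocation: https://example.com/v/photo_o.jpg?oe=6040C6ED\r\ncontent-length: 0\r\n";

$url = extractLocation($header);
echo $url, PHP_EOL;

// File name taken from the URL path, without the query string.
echo basename(parse_url($url, PHP_URL_PATH)), PHP_EOL;
```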

encoding language fails

The PHP code below fails to retrieve the correct characters:
echo $html = file_get_contents("http://www.tsetmc.com/tsev2/data/instinfofast.aspx?i=65883838195688438&c=34+");
The result is:
���\�%PKJDA��ۈ�0�o'�z��W�"�7o�E��J:�%�+�=o�h#Ĥ�T�Jv�L�$��IT��1҈IY �B L�g�Mt����� �S]>>�����������j#�Tu97������#"jD��C�3x0�����I"("D�W��Bd��9������J�^ȑ���T��[e��K����r�ZB����r�Z޼#�w��4G� � �C�b�%8��PR�/���ع���a=�o��s���H�G�
This is because the response is gzip-compressed (see the Content-Encoding header in the output below), so you need to decompress it:
D:\Temp>curl -v "http://www.tsetmc.com/tsev2/data/instinfofast.aspx?i=65883838195688438&c=34+" -o output.data
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 79.175.151.173...
* TCP_NODELAY set
* Connected to www.tsetmc.com (79.175.151.173) port 80 (#0)
> GET /tsev2/data/instinfofast.aspx?i=65883838195688438&c=34+ HTTP/1.1
> Host: www.tsetmc.com
> User-Agent: curl/7.55.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Cache-Control: public, max-age=1
< Content-Type: text/html; charset=utf-8
< Content-Encoding: gzip
< Expires: Sat, 21 Dec 2019 09:43:48 GMT
< Last-Modified: Sat, 21 Dec 2019 09:43:47 GMT
< Vary: *
< Server: Microsoft-IIS/10.0
< X-Powered-By: ASP.NET
< X-Powered-By: ARR/3.0
< X-Powered-By: ASP.NET
< Date: Sat, 21 Dec 2019 09:42:59 GMT
< Content-Length: 155
<
{ [155 bytes data]
100 155 100 155 0 0 155 0 0:00:01 --:--:-- 0:00:01 662
* Connection #0 to host www.tsetmc.com left intact
D:\Temp>
Unzipping (on Windows):
D:\Temp>"c:\Program Files\7-Zip\7z.exe" x output.data output
7-Zip 18.05 (x64) : Copyright (c) 1999-2018 Igor Pavlov : 2018-04-30
Scanning the drive for archives:
1 file, 155 bytes (1 KiB)
Extracting archive: output.data
--
Path = output.data
Type = gzip
Headers Size = 10
Everything is Ok
Size: 239
Compressed: 155
D:\Temp>type output
12:29:59,A ,9055,9098,9131,9072,9217,9000,3582,17432646,158598409673,0,20191221,122959;;2#100400#9055#9055#20091#1,2#60000#9050#9058#554#1,1#1000#9040#9059#993#2,;66660,417193,674167;13450748,3981898,0,13913408,3519238,1255,9,0,899,11;;;1;
D:\Temp>
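The same decompression can be done in PHP itself. This is a minimal sketch, assuming the zlib extension is available; `decodeBody` is a hypothetical helper, and the demonstration compresses a local string instead of calling the real URL.

```php
<?php
// Hypothetical helper: return the body as text, decompressing it when it
// starts with the gzip magic bytes 0x1f 0x8b. Requires the zlib extension.
function decodeBody(string $body): string
{
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($body);
        if ($decoded === false) {
            throw new RuntimeException('Body looks like gzip but could not be decoded');
        }
        return $decoded;
    }
    return $body; // not compressed, return unchanged
}

// Offline demonstration with a locally compressed string.
$compressed = gzencode('12:29:59,A ,9055,9098');
echo decodeBody($compressed), PHP_EOL; // prints "12:29:59,A ,9055,9098"

// With the real endpoint this would be (needs network access):
// echo decodeBody(file_get_contents($url));
```

Checking the magic bytes keeps the helper safe to call on responses that are not compressed.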

cURL not working as expected on php shell_exec()

I'm trying to download a file from an external server: I submit a form and the server returns a file to download (a PDF).
Copying the request as cURL from the Network tab in Chrome works fine in the Terminal (it downloads the PDF) but not in shell_exec() (I get the submit form page as output).
Here is the verbose output from both curl runs.
This one works fine:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 107.180.12.118...
* TCP_NODELAY set
* Connected to operaciones.ahmex.com.mx (107.180.12.118) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
} [5 bytes data]
* (304) (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* (304) (IN), TLS handshake, Server hello (2):
{ [102 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [2881 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [333 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [70 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server accepted to use h2
* Server certificate:
* subject: OU=Domain Control Validated; CN=operaciones.ahmex.com.mx
* start date: May 25 17:30:17 2019 GMT
* expire date: Jul 24 17:04:13 2020 GMT
* subjectAltName: host "operaciones.ahmex.com.mx" matched cert's "operaciones.ahmex.com.mx"
* issuer: C=US; ST=Arizona; L=Scottsdale; O=GoDaddy.com, Inc.; OU=http://certs.godaddy.com/repository/; CN=Go Daddy Secure Certificate Authority - G2
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
} [5 bytes data]
* Using Stream ID: 1 (easy handle 0x55f8419d44c0)
} [5 bytes data]
> POST /potentials/generar_certificado HTTP/2
> Host: operaciones.ahmex.com.mx
> authority: operaciones.ahmex.com.mx
> cache-control: max-age=0
> origin: https://operaciones.ahmex.com.mx
> upgrade-insecure-requests: 1
> content-type: multipart/form-data; boundary=----WebKitFormBoundary1IwLan7m4erUfeoh
> user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36
> sec-fetch-mode: navigate
> sec-fetch-user: ?1
> accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
> sec-fetch-site: same-origin
> referer: https://operaciones.ahmex.com.mx/potentials/
> accept-encoding: gzip, deflate, br
> accept-language: en-US,en;q=0.9
> cookie: ci_session=(I removed this)
> Content-Length: 971
>
{ [5 bytes data]
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
} [5 bytes data]
* We are completely uploaded and fine
{ [5 bytes data]
100 971 0 0 100 971 0 701 0:00:01 0:00:01 --:--:-- 700
100 971 0 0 100 971 0 407 0:00:02 0:00:02 --:--:-- 406< HTTP/2 200
< date: Tue, 13 Aug 2019 14:57:41 GMT
< server: Apache
< x-powered-by: PHP/5.6.40
< content-disposition: attachment; filename="CER-0196084.pdf"
< cache-control: private, max-age=0, must-revalidate
< pragma: public
< vary: Accept-Encoding,User-Agent
< content-encoding: gzip
< content-type: application/x-download
<
{ [5 bytes data]
100 12590 0 11619 100 971 3823 319 0:00:03 0:00:03 --:--:-- 4141
100 65749 0 64778 100 971 15423 231 0:00:04 0:00:04 --:--:-- 15650
100 65749 0 64778 100 971 12452 186 0:00:05 0:00:05 --:--:-- 12710
100 65749 0 64778 100 971 10443 156 0:00:06 0:00:06 --:--:-- 13442
100 65749 0 64778 100 971 8990 134 0:00:07 0:00:07 --:--:-- 13439
100 65749 0 64778 100 971 7893 118 0:00:08 0:00:08 --:--:-- 10288
100 78659 0 77688 100 971 8609 107 0:00:09 0:00:09 --:--:-- 2675
100 104k 0 103k 100 971 11625 106 0:00:09 0:00:09 --:--:-- 10514
* Connection #0 to host operaciones.ahmex.com.mx left intact
This one does not:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 107.180.12.118...
* TCP_NODELAY set
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Connected to operaciones.ahmex.com.mx (107.180.12.118) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
} [5 bytes data]
* (304) (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* (304) (IN), TLS handshake, Server hello (2):
{ [102 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [2881 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [333 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [70 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server accepted to use h2
* Server certificate:
* subject: OU=Domain Control Validated; CN=operaciones.ahmex.com.mx
* start date: May 25 17:30:17 2019 GMT
* expire date: Jul 24 17:04:13 2020 GMT
* subjectAltName: host "operaciones.ahmex.com.mx" matched cert's "operaciones.ahmex.com.mx"
* issuer: C=US; ST=Arizona; L=Scottsdale; O=GoDaddy.com, Inc.; OU=http://certs.godaddy.com/repository/; CN=Go Daddy Secure Certificate Authority - G2
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
} [5 bytes data]
* Using Stream ID: 1 (easy handle 0x55a28da8e4c0)
} [5 bytes data]
> POST /potentials/generar_certificado HTTP/2
> Host: operaciones.ahmex.com.mx
> authority: operaciones.ahmex.com.mx
> cache-control: max-age=0
> origin: https://operaciones.ahmex.com.mx
> upgrade-insecure-requests: 1
> content-type: multipart/form-data; boundary=----WebKitFormBoundaryWbbB3yby9oTZCuNV
> user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36
> sec-fetch-mode: navigate
> sec-fetch-user: ?1
> accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
> sec-fetch-site: same-origin
> referer: https://operaciones.ahmex.com.mx/potentials/
> accept-encoding: gzip, deflate, br
> accept-language: en-US,en;q=0.9
> cookie: ci_session=(I removed this)
> Content-Length: 972
>
{ [5 bytes data]
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
} [5 bytes data]
* We are completely uploaded and fine
{ [5 bytes data]
< HTTP/2 200
< date: Tue, 13 Aug 2019 15:01:02 GMT
< server: Apache
< x-powered-by: PHP/5.6.40
< vary: Accept-Encoding,User-Agent
< content-encoding: gzip
< content-length: 3120
< content-type: text/html; charset=UTF-8
<
{ [5 bytes data]
100 4092 100 3120 100 972 3312 1031 --:--:-- --:--:-- --:--:-- 4343
* Connection #0 to host operaciones.ahmex.com.mx left intact
Here's the cURL command; I only removed the session cookie:
curl 'https://operaciones.ahmex.com.mx/potentials/generar_certificado' -H 'authority: operaciones.ahmex.com.mx' -H 'cache-control: max-age=0' -H 'origin: https://operaciones.ahmex.com.mx' -H 'upgrade-insecure-requests: 1' -H 'content-type: multipart/form-data; boundary=----WebKitFormBoundaryWbbB3yby9oTZCuNV' -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.87 Safari/537.36' -H 'sec-fetch-mode: navigate' -H 'sec-fetch-user: ?1' -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3' -H 'sec-fetch-site: same-origin' -H 'referer: https://operaciones.ahmex.com.mx/potentials/' -H 'accept-encoding: gzip, deflate, br' -H 'accept-language: en-US,en;q=0.9' --data-binary $'------WebKitFormBoundaryWbbB3yby9oTZCuNV\r\nContent-Disposition: form-data; name="nombre2"\r\n\r\nSergio\r\n------WebKitFormBoundaryWbbB3yby9oTZCuNV\r\nContent-Disposition: form-data; name="apellido_paterno"\r\n\r\nMendoza\r\n------WebKitFormBoundaryWbbB3yby9oTZCuNV\r\nContent-Disposition: form-data; name="apellido_materno"\r\n\r\nNegrete\r\n------WebKitFormBoundaryWbbB3yby9oTZCuNV\r\nContent-Disposition: form-data; name="rfc"\r\n\r\nMENS8804144J4\r\n------WebKitFormBoundaryWbbB3yby9oTZCuNV\r\nContent-Disposition: form-data; name="importe"\r\n\r\n1500000\r\n------WebKitFormBoundaryWbbB3yby9oTZCuNV\r\nContent-Disposition: form-data; name="banco"\r\n\r\nSantander\r\n------WebKitFormBoundaryWbbB3yby9oTZCuNV\r\nContent-Disposition: form-data; name="estado"\r\n\r\nJalisco\r\n------WebKitFormBoundaryWbbB3yby9oTZCuNV\r\nContent-Disposition: form-data; name="municipio"\r\n\r\nZapopan\r\n------WebKitFormBoundaryWbbB3yby9oTZCuNV\r\nContent-Disposition: form-data; name="action"\r\n\r\nEnviar\r\n------WebKitFormBoundaryWbbB3yby9oTZCuNV--\r\n' --compressed -o cert.pdf
If you are using PHP, why not try the readfile() function?
If you want to use the console, wget is better than cURL for downloads.
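Another way to stay inside PHP, not suggested in the thread itself, is the cURL extension, which avoids shell quoting altogether. The sketch below assumes that extension is loaded; `buildFormPostOptions` and `downloadCertificate` are hypothetical names. As a side note, the two traces differ by one byte in Content-Length (971 vs 972). One possible explanation, which the thread does not confirm, is that `$'...'` quoting is a bash feature and the shell used by shell_exec() passes the `$` through literally.

```php
<?php
// Hypothetical helper: build cURL options for the form POST. Field names and
// the cookie value are supplied by the caller.
function buildFormPostOptions(string $url, array $fields, string $cookie): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_POST           => true,
        // An array makes cURL encode the body as multipart/form-data and
        // generate the boundary itself.
        CURLOPT_POSTFIELDS     => $fields,
        CURLOPT_COOKIE         => $cookie,
        CURLOPT_ENCODING       => '',   // accept and decode any supported encoding
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
    ];
}

// Hypothetical helper: perform the POST and write the response body to a file.
function downloadCertificate(string $url, array $fields, string $cookie, string $target): void
{
    $ch = curl_init();
    curl_setopt_array($ch, buildFormPostOptions($url, $fields, $cookie));
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    file_put_contents($target, $body);
}

// Example call (performs a real request, so it is left commented out):
// downloadCertificate(
//     'https://operaciones.ahmex.com.mx/potentials/generar_certificado',
//     ['nombre2' => 'Sergio', 'apellido_paterno' => 'Mendoza', /* remaining form fields */],
//     'ci_session=...',
//     'cert.pdf'
// );
```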

cURL HTTP request from WikiMedia API not working

I've read tons of cURL tutorials (I'm using PHP) and there's always the same basic code, which doesn't work for me! No specific errors, just no result.
I want to make an HTTP request to Wikipedia and get the result in JSON format.
Here's the code :
$handle = curl_init();
$url = "http://fr.wikipedia.org/w/api.php?action=query&titles=Albert%20Einstein&prop=info&format=json";
curl_setopt_array($handle,
array(
CURLOPT_URL => $url,
CURLOPT_RETURNTRANSFER => true
)
);
$output = curl_exec($handle);
if (!$output) {
exit('cURL Error: '.curl_error($handle));
}
$result= json_decode($output,true);
print_r($result);
curl_close($handle);
Would like to know what I'm doing wrong.
Your code is correct but it seems Wikipedia doesn't send back the data when using PHP curl (maybe some headers or other parameters must be set for it to work).
If all you need is to retrieve some data though, you can simply use file_get_contents which works fine:
$output = file_get_contents("http://fr.wikipedia.org/w/api.php?action=query&titles=Albert%20Einstein&prop=info&format=json");
echo $output;
Edit:
Just for information, I found what the issue is. When running curl -v on that URL, the following comes up:
* Trying 91.198.174.192...
* Connected to fr.wikipedia.org (91.198.174.192) port 80 (#0)
> GET /w/api.php?action=query&titles=Albert%20Einstein&prop=info&format=json HTTP/1.1
> Host: fr.wikipedia.org
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Date: Wed, 17 May 2017 13:54:31 GMT
< Server: Varnish
< X-Varnish: 852298595
< X-Cache: cp3031 int
< X-Cache-Status: int
< Set-Cookie: WMF-Last-Access=17-May-2017;Path=/;HttpOnly;secure;Expires=Sun, 18 Jun 2017 12:00:00 GMT
< Set-Cookie: WMF-Last-Access-Global=17-May-2017;Path=/;Domain=.wikipedia.org;HttpOnly;secure;Expires=Sun, 18 Jun 2017 12:00:00 GMT
< X-Client-IP: 86.214.172.57
< Location: https://fr.wikipedia.org/w/api.php?action=query&titles=Albert%20Einstein&prop=info&format=json
< Content-Length: 0
< Connection: keep-alive
<
* Connection #0 to host fr.wikipedia.org left intact
So the actual content is served from the https URL, not the http one, and the http URL only answers with a 301 redirect and an empty body. Requesting https://fr.wikipedia.org/w/api.php?action=query&titles=Albert%20Einstein&prop=info&format=json directly should work.
It works with file_get_contents because that function follows the redirect automatically.
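cURL can be told to follow the redirect as well. The following is a minimal sketch, assuming the cURL extension is loaded; `redirectFollowingOptions` and `fetchJson` are hypothetical names and the redirect limit of 5 is arbitrary. Comparing the result with `=== false` also distinguishes a transport error from an empty body such as the one the 301 response carries.

```php
<?php
// Hypothetical helper: cURL options for a GET request that follows redirects.
function redirectFollowingOptions(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true, // follow the 301 from http to https
        CURLOPT_MAXREDIRS      => 5,    // arbitrary safety limit
    ];
}

// Hypothetical helper: fetch a URL and decode the JSON body into an array.
function fetchJson(string $url): array
{
    $handle = curl_init();
    curl_setopt_array($handle, redirectFollowingOptions($url));
    $output = curl_exec($handle);
    if ($output === false) {
        throw new RuntimeException('cURL error: ' . curl_error($handle));
    }
    $data = json_decode($output, true);
    if (!is_array($data)) {
        throw new RuntimeException('Response was not a JSON object or array');
    }
    return $data;
}

// Example call (needs network access, so it is left commented out):
// print_r(fetchJson('http://fr.wikipedia.org/w/api.php?action=query&titles=Albert%20Einstein&prop=info&format=json'));
```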

Remove duplicate "Set-Cookie" header from PHP response

This is an example script from a larger application, but shows the general process of what I'm trying to do. If I have the following script:
<?php
ob_start();
setcookie('test1', 'first');
setcookie('test1', 'second');
setcookie('test1', 'third');
setcookie('test2', 'keep');
//TODO remove duplicate test1 from headers
ob_end_clean();
die('end test');
I get the following response (as viewed via Fiddler):
HTTP/1.1 200 OK
Date: Tue, 25 Apr 2017 21:54:45 GMT
Server: Apache/2.4.17 (Win32) OpenSSL/1.0.2d PHP/5.5.30
X-Powered-By: PHP/5.5.30
Set-Cookie: test1=first
Set-Cookie: test1=second
Set-Cookie: test1=third
Set-Cookie: test2=keep
Content-Length: 8
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html
end test
The problem is that Set-Cookie: test1... exists 3 different times, therefore increasing the header size unnecessarily. (Again, this is a simplified example; in reality, I'm dealing with ~10 duplicate cookies in the ~800-byte range.)
Is there anything I can write in place of the TODO that would get rid of the header either completely or so it only shows once? i.e. the following is my end goal:
HTTP/1.1 200 OK
Date: Tue, 25 Apr 2017 21:54:45 GMT
Server: Apache/2.4.17 (Win32) OpenSSL/1.0.2d PHP/5.5.30
X-Powered-By: PHP/5.5.30
Set-Cookie: test1=third
Set-Cookie: test2=keep
Content-Length: 8
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html
end test
It would also be fine if Set-Cookie: test1=third were absent entirely, but Set-Cookie: test2=keep needs to remain. When I try setcookie('test1', '', 1); to delete the cookie, it adds an additional header to mark it as expired:
Set-Cookie: test1=first
Set-Cookie: test1=second
Set-Cookie: test1=third
Set-Cookie: test2=keep
Set-Cookie: test1=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0
And if I try removing the header like:
if (!headers_sent()) {
foreach (headers_list() as $header) {
if (stripos($header, 'Set-Cookie: test1') !== false) {
header_remove('Set-Cookie');
}
}
}
it removes all Set-Cookie headers when I only want test1 removed.
As you suggested in that last block of code, the headers_list() function could be used to check what headers have been sent. Using that, the last values for each cookie could be stored in an associative array. The names and values can be extracted using explode() (along with trim()).
When multiple cookies with the same name are detected, we can call header_remove() as you did, and then set the cookies again with their final values. See the example below.
if (!headers_sent()) {
    $cookiesSet = []; // associative array to store the last value for each cookie
    $rectifyCookies = false; // set to true when a cookie name occurs more than once
    foreach (headers_list() as $header) {
        // Only look at headers whose name is Set-Cookie
        if (stripos($header, 'Set-Cookie:') === 0) {
            // Limit of 2: the value itself may contain ':' or '='
            list($setCookie, $cookieValue) = explode(':', $header, 2);
            list($cookieName, $cookieValue) = explode('=', trim($cookieValue), 2);
            if (array_key_exists($cookieName, $cookiesSet)) {
                $rectifyCookies = true;
            }
            $cookiesSet[$cookieName] = $cookieValue;
        }
    }
    if ($rectifyCookies) {
        header_remove('Set-Cookie');
        foreach ($cookiesSet as $cookieName => $cookieValue) {
            // might need to consider optional 3rd - 8th parameters
            setcookie($cookieName, $cookieValue);
        }
    }
}
Output:
Cache-Control: max-age=0, no-cache, no-store, must-revalidate
Connection: keep-alive
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
Date: Wed, 26 Apr 2017 15:31:33 GMT
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Pragma: no-cache
Server: nginx
Set-Cookie: test1=third
Set-Cookie: test2=keep
Transfer-Encoding: chunked
Vary: Accept-Encoding
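A variation on the approach above keeps each cookie's full header line, so attributes such as expires or path survive without being re-parsed. This is only a sketch: `lastSetCookiePerName` is a hypothetical helper, and the part that touches the real response is left commented out because it has to run before any output is sent.

```php
<?php
// Hypothetical helper: given header lines as returned by headers_list(),
// keep only the last Set-Cookie line for each cookie name.
function lastSetCookiePerName(array $headerLines): array
{
    $last = [];
    foreach ($headerLines as $line) {
        if (stripos($line, 'Set-Cookie:') !== 0) {
            continue; // not a Set-Cookie header
        }
        $cookie = ltrim(substr($line, strlen('Set-Cookie:')));
        // The cookie name is everything before the first "=".
        $name = explode('=', $cookie, 2)[0];
        // A later line for the same name overwrites the earlier one.
        $last[$name] = 'Set-Cookie: ' . $cookie;
    }
    return array_values($last);
}

$lines = [
    'X-Powered-By: PHP/5.5.30',
    'Set-Cookie: test1=first',
    'Set-Cookie: test1=second',
    'Set-Cookie: test1=third; path=/',
    'Set-Cookie: test2=keep',
];
print_r(lastSetCookiePerName($lines));

// Applying it to the real response (before any output is sent):
// $keep = lastSetCookiePerName(headers_list());
// header_remove('Set-Cookie');
// foreach ($keep as $line) {
//     header($line, false); // false: add the header instead of replacing
// }
```

Passing false as the second argument of header() matters, since otherwise each call would replace the previous Set-Cookie header.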
I don't understand why you think that the cookie removing code you showed us would remove the setcookie for test2.
If your code is setting the same cookie multiple times then you need to change your code so it stops setting the cookie multiple times! Anything else is a sloppy workaround.
