I'm trying to get this CrunchBase API page as a string in PHP. When I visit that page in a browser, I get the full response (some 230K characters); however, when I try to get the page in a script, the response is much shorter (24341 characters on a server and 36629 characters locally, with exactly the same number of characters for other long CrunchBase pages). To get the page, I am using a function almost identical to drupal_http_request() although I'm not using Drupal. (I have also tried using cURL and file_get_contents() and got the same result. And now that I'm thinking about it I have experienced the same from CrunchBase in Python in the past.)
What could be causing this and how can I fix it? PHP 5.3.2, Apache 2.2.14, Ubuntu 10.04. Here are additional details on the response:
[protocol] => HTTP/1.1
[headers] => Array
(
[content-type] => text/javascript; charset=utf-8
[connection] => close
[status] => 200 OK
[x-powered-by] =>
[etag] => "d809fc56a529054e613cd13e48d75931"
[x-runtime] => 0.00453
[content-length] => 230310
[cache-control] => private, max-age=0, must-revalidate
[server] => nginx/1.0.10 + Phusion Passenger 3.0.11 (mod_rails/mod_rack)
)
I don't think it's a user agent issue as I used User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6 in the request.
UPDATE
According to this thread, I needed to add the Accept-Encoding: gzip, deflate header to the request. That does result in a longer response, but now I have to figure out how to inflate it. The gzinflate() function fails with Warning: Data error. Any thoughts on how to inflate the response?
See the comments in the PHP docs about gzinflate(), specifically the remarks about stripping the initial bytes. The last comment did the trick for me:
<?php $dec = gzinflate(substr($enc,10)); ?>
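The 10 bytes are the fixed part of the gzip header. The header is longer when the encoder adds optional fields (such as a file name), so the number of bytes to strip depends on the original encoder. Another comment has a more thorough solution, and a reference to RFC 1952 for further reading.
Evidently gzdecode() is meant to address this issue, but it is only available as of PHP 5.4, so it is not an option on PHP 5.3.2.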
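As a rough sketch of the approach described in those comments, the helper below strips the gzip wrapper by hand when gzdecode() is missing. The function names gunzip_fallback() and gunzip_body() are made up for this example, and the fallback assumes the simplest RFC 1952 layout (no optional header fields); it is not a complete gzip parser.

```php
<?php
// Fallback for PHP versions without gzdecode(): strip the gzip wrapper by hand.
// Assumes the simplest RFC 1952 layout: a 10-byte header with no optional
// fields and an 8-byte trailer (CRC32 + length). Returns false otherwise.
function gunzip_fallback($data)
{
    // A gzip stream starts with the magic bytes 0x1f 0x8b and method 8 (deflate).
    if (strlen($data) < 18 || substr($data, 0, 3) !== "\x1f\x8b\x08") {
        return false;
    }
    // A non-zero FLG byte means optional header fields follow, so the header
    // is longer than 10 bytes; this sketch does not handle that case.
    if (ord($data[3]) !== 0) {
        return false;
    }
    // What remains between header and trailer is a raw deflate stream.
    return gzinflate(substr($data, 10, -8));
}

// Prefer the built-in gzdecode() when it exists (PHP 5.4+).
function gunzip_body($data)
{
    if (function_exists('gzdecode')) {
        return gzdecode($data);
    }
    return gunzip_fallback($data);
}
```

If you fetch the page with cURL, setting CURLOPT_ENCODING to an empty string asks cURL to send an Accept-Encoding header for every encoding it supports and to decode the response for you, which avoids manual inflating altogether.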
ps -- I deleted my comment about the returned data being plain text. I was wrong.
How can I validate a Shopify store's URL? Given a URL how can I know whether it is a valid URL or 404 page not found? I'm using PHP. I've tried using PHP get_headers().
<?php
$getheadersvalidurlresponse= get_headers('https://test072519.myshopify.com/products/test-product1'); // VALID URL
print_r($getheadersvalidurlresponse);
$getheadersinvalidurlresponse= get_headers('https://test072519.myshopify.com/products/test-product1451'); // INVALID URL
print_r($getheadersinvalidurlresponse);
?>
But for both valid and invalid URLs, I got the same response.
Array
(
[0] => HTTP/1.1 403 Forbidden
[1] => Date: Wed, 08 Jul 2020 13:27:52 GMT
[2] => Content-Type: text/html
[3] => Connection: close
..............
)
I'm expecting a 200 OK status code for a valid URL and a 404 for an invalid one.
Can anyone please help me check whether a given Shopify URL is valid using PHP?
Thanks in advance.
This happens because Shopify distinguishes bot requests from genuine browser requests, in part to mitigate denial-of-service attacks. To get a proper HTTP response, you have to send a User-Agent header that mimics a browser request.
As an improvement, you can make a HEAD request instead of a GET request (get_headers() uses GET by default, as mentioned in the examples in the docs), because here we only care about the response metadata and not the response body.
Snippet:
<?php
$opts = array(
    'http' => array(
        'method' => "HEAD",
        'header' => "User-agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36"
    )
);
// Build the stream context once and reuse it for both requests.
$context = stream_context_create($opts);
$headers1 = get_headers('https://test072519.myshopify.com/products/test-product1', 0, $context);
$headers2 = get_headers('https://test072519.myshopify.com/products/test-product1451', 0, $context);
echo "<pre>";
print_r($headers1);
print_r($headers2);
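To turn the printed arrays into a yes/no answer, one option is to read the status code from the status line. The sketch below is an illustration only: final_status_code() and is_valid_page() are hypothetical helper names, and treating "200" as the only valid result is an assumption you may want to relax.

```php
<?php
// Extract the final HTTP status code from the array get_headers() returns.
// When redirects are followed, the array holds one "HTTP/..." status line per
// hop, so the last one describes the page that was finally served.
function final_status_code(array $headers)
{
    $code = null;
    foreach ($headers as $line) {
        if (is_string($line) && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Hypothetical helper: treat anything other than 200 as "not a valid page".
function is_valid_page(array $headers)
{
    return final_status_code($headers) === 200;
}
```

With the snippet above, is_valid_page($headers1) and is_valid_page($headers2) would give the check the question asks for.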
I am trying to use the Google Flight RPC but am having trouble building a json array to post to it. The only documentation I've found is here. It describes what needs to be sent but doesn't explain how to construct the json array. Specifically, using PHP, I'm not sure how to build and post a json array using the following example from part 1 of the linked documentation.
[,[[,"fs","[,[,[\"SJC\"]\n,\"2012-04-05\",[\"EWR\",\"JFK\",\"LGA\"]\n,\"2012-04-12\"]\n]\n"]
]
,[,[[,"b_ca","54"]
,[,"f_ut","search;f=SJC;t=EWR,JFK,LGA;d=2012-04-05;r=2012-04-12"]
,[,"b_lr","11:36"]
,[,"b_lr","1:1528"]
,[,"b_lr","2:1827"]
,[,"b_qu","3"]
,[,"b_qc","1"]
]
]
]
The above does not appear to be valid JSON, so I don't see how to produce it as a JSON array. Additionally, HTTP request headers are needed. I assume these are set via cURL? The documentation isn't clear on how to do this.
I've made several attempts and I don't get back any of the responses that are shown in the documentation.
Edit: Awesome, I got a Tumbleweed badge for this question! I would appreciate help. Thanks.
I was not able to find any documentation either.
However, if you go to https://www.google.com/flights/, open the Chrome console or Firebug, click the Network tab, and then run a search, you will see the headers and body it sends for the RPC POST.
Sample Headers from the transfer:
Request URL:(I BROKE THE LINK) GOOGLE(dot)com /flights/rpc
Request Method:POST
Status Code:200 OK
Request Headers
:host:(I BROKE THE LINK) GOOGLE (dot) com
:method:POST
:path:/flights/rpc
:scheme:https
:version:HTTP/1.1
accept:*/*
accept-encoding:gzip,deflate,sdch
accept-language:en-US,en;q=0.8
content-length:169
content-type:application/json; charset=UTF-8
cookie:PREF=ID=f472fc4bbb95bc2b:U=9da5b7e4c1d04bda:FF=0:LD=en:TM=1390684154:LM=1390749713:GM=1:S=orUAMb3qaxBh99PJ; HSID=AHlw351sj7B7Om0t_; SSID=AKycPxLzyXkc4_tZJ; APISID=xKH5zAdc9vfBtiDy/Ab5TlD_Z4w2nP64Wl; SAPISID=7awo9qDssc3wr-fN/AQYOdvCN-I-UwtXQ1; NID=67=XnUn_DGdQDaeczlvXe-qTy9vy8gnQwhFwfRi52TRFS-_Dg-J58CgTGUY6Tkn3cCJYCcVJhK8unOrdffpgzeKed2jPqSazVI4Xplo5fW8-6wXoNi97L2gdoaOms0dKj4iOODoZpzd4DG_8YdQQcH6fl5xY__N929CJr8pdcAUwgnKf8X_mI8sLSB7CKVyS4ZvbGMCAiMLwIs1gJJz-UbppSj; S=travel-flights=5OJmMrbJoqLfOFzkZy285A; SID=DQAAAM0AAAAIGD56aXyxAxrRCSROmPy8AEtV3DaEwKT48aaZ98S35Nss09ishDZ3RxNT6ksikfAOJo-MLYVodF3jr-6imwzC8tRd7cxe-OoyafCZiGaf0qhp-yza4VZlAMInxGPhVae7wSXCRXlqb-wbYHBCHUSz_K5kYpvKwqC8pWuQ_6AUZa3WWqB6OmYpxuihxn3UxSve95zpkziyaDX0JFzUjyWX-0O_iIWZiEztywwyKVWCVv27ByGjIYTYV1G2byExt5M9-kEFpE_v0x8KgU7vleT
dnt:1
origin:(I BROKE THE LINK) SSL GOOGLE(dot)com
referer:(I BROKE THE LINK) SSL GOOGLE(dot)com flights
user-agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36
x-gwt-cctoken:ADS25WMm8S7W0MlpX1-Lf_yNzQCrke7t6OvH2kFLkBJIH_Q-YTuu8VSHmgIxzFtGaL87SsM5PcZECRBP7IqMCbM5QKFVdWrw9hRIkHoL5oiyCzEu2ZCnKuhqvv2sUKcg4Z_HnajCZmM7aQ9nYsVMQnFxqrkgB2Cz7rAIP47zPJ_rakoyxlGE4yJvcuUeiQ
x-gwt-module-base:https://www.google.com/flights/static/
x-gwt-permutation:C8210E5F468630F84E578D8EDE10A1A0
Request Payload
[,[[,"no","[]","1531191655318648",11]],[,[[,"b_al","no:74"],[,"b_ahr","no:s"],[,"b_ca","103:34541"],[,"b_pe","4F2F79B9E3FB0.A40E22A.71A7"],[,"b_qu","0"],[,"b_qc","1"]]]]
Response Headers
alternate-protocol:443:quic
cache-control:no-cache, no-store, max-age=0, must-revalidate
content-encoding:gzip
content-length:75
content-type:application/json; charset=utf-8
date:Sat, 22 Feb 2014 05:00:17 GMT
expires:Fri, 01 Jan 1990 00:00:00 GMT
pragma:no-cache
server:GSE
status:200 OK
version:HTTP/1.1
x-content-type-options:nosniff
x-frame-options:SAMEORIGIN
x-xss-protection:1; mode=block
So, to get this working you will likely have to make a GET request to the path /flights, read the response headers, and then send the relevant values back as headers in your POST request. (I have had to do similar things in the past.) To figure out which field is which, I would try changing one search field at a time and watching what changes in the JSON data that gets posted.
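A minimal sketch of that two-step idea in PHP with cURL follows. Both function names are invented for this example, and which headers the RPC endpoint actually requires is unknown; the x-gwt-* values in the capture above, for instance, would still have to be copied from a browser session.

```php
<?php
// Collect cookies from the response header lines of the initial GET and
// build a single "Cookie:" request header for the follow-up POST.
function cookie_header_from_response(array $headerLines)
{
    $pairs = array();
    foreach ($headerLines as $line) {
        // Keep only the leading name=value pair; attributes such as
        // Path or Expires are dropped.
        if (preg_match('/^Set-Cookie:\s*([^=;\s]+)=([^;]*)/i', $line, $m)) {
            $pairs[$m[1]] = $m[1] . '=' . $m[2];
        }
    }
    return $pairs ? 'Cookie: ' . implode('; ', $pairs) : null;
}

// POST a raw payload with extra request headers using cURL.
// The payload is sent as-is, since the RPC body is not strict JSON.
function post_payload($url, $payload, array $extraHeaders)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $payload);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array_merge(
        array('Content-Type: application/json; charset=UTF-8'),
        $extraHeaders
    ));
    curl_setopt($ch, CURLOPT_ENCODING, '');         // let cURL decode gzip responses
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    return curl_exec($ch);
}
```

The header lines for the first function could come from get_headers() on the /flights page; the result is then passed in the $extraHeaders array of post_payload().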
I have a pretty simple captcha, something like this:
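<?php
session_start();

// Build a random string of $length characters from digits and lowercase letters.
function randomText($length) {
    $pattern = "1234567890abcdefghijklmnopqrstuvwxyz";
    $key = ""; // initialise before appending to avoid an undefined-variable notice
    for ($i = 0; $i < $length; $i++) {
        $key .= $pattern[rand(0, 35)]; // square brackets: the {} offset syntax was removed in PHP 8
    }
    return $key;
}

$textCaptcha = randomText(8);
$_SESSION['tmptxt'] = $textCaptcha;

// Draw the text onto the background image and send it as a GIF.
$captcha = imagecreatefromgif("bgcaptcha.gif");
$colText = imagecolorallocate($captcha, 0, 0, 0);
imagestring($captcha, 5, 16, 7, $textCaptcha, $colText);
header("Content-type: image/gif");
imagegif($captcha);
?>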
The problem is that if the user has YSlow installed, the image is requested twice, so the captcha is regenerated and never matches the text entered by the user.
I noticed that the second request only happens if I send the Content-type header as GIF; if I output it as a normal PHP page, this doesn't happen.
Does anyone have a clue about this? How can I prevent it, or identify that the second request is made by YSlow so that I don't generate the captcha again?
Regards,
Shadow.
YSlow does request the page components when run, so it sounds like your problem is cases where the user has YSlow installed and it's set to run automatically at each page load.
The best solution may be to adjust your captcha code to not recreate new values within the same session, or if it does to make sure the session variable matches the image sent.
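A minimal sketch of the first option, assuming the same 'tmptxt' session key as in the question. The function names are invented here, and the session array is passed in explicitly so the logic can be tried outside a web request.

```php
<?php
// Same generator idea as in the question, with $key initialised.
function random_captcha_text($length)
{
    $pattern = '1234567890abcdefghijklmnopqrstuvwxyz';
    $key = '';
    for ($i = 0; $i < $length; $i++) {
        $key .= $pattern[mt_rand(0, strlen($pattern) - 1)];
    }
    return $key;
}

// Return the captcha text already stored in the session, creating it only
// once, so a repeated image request draws the same text again.
function captcha_text_for_session(array &$session, $length = 8)
{
    if (empty($session['tmptxt'])) {
        $session['tmptxt'] = random_captcha_text($length);
    }
    return $session['tmptxt'];
}
```

In the image script this would be called as captcha_text_for_session($_SESSION). The form handler would then need to unset $_SESSION['tmptxt'] after checking the user's answer, otherwise the same captcha is shown for the whole session.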
But to your original question about detecting the second query made by YSlow, it's possible if you look at the HTTP headers received.
I just ran a test and found these headers sent with the YSlow request. The User-Agent is set to match the browser (Firefox in my case), but you could check for the presence of X-YQL-Depth as a signal. (YSlow uses YQL for all of its requests.)
Array
(
[Client-IP] => 1.2.3.4
[X-Forwarded-For] => 1.2.3.4, 5.6.7.8
[X-YQL-Depth] => 1
[User-Agent] => Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:8.0.1) Gecko/20100101 Firefox/8.0.1
[Accept-Encoding] => gzip
[Host] => www.example.com
[Connection] => keep-alive
[Via] => HTTP/1.1 htproxy1.ops.sp1.yahoo.net[D1832930] (YahooTrafficServer/1.19.5 [uScM])
)
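In PHP, a check for that header could look like the sketch below. The function name is made up for this example; it relies on PHP exposing request headers in $_SERVER under HTTP_* keys, and on X-YQL-Depth continuing to be sent, which is only what this one test showed.

```php
<?php
// Report whether a request carries the X-YQL-Depth header seen in the test above.
// PHP exposes request headers in $_SERVER as HTTP_* keys, upper-cased and with
// dashes turned into underscores.
function is_yslow_request(array $server)
{
    return isset($server['HTTP_X_YQL_DEPTH']);
}
```

The captcha script could call is_yslow_request($_SERVER) and skip regenerating the text when it returns true.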
First, here is what my current system looks like:
CouchDB 1.0.2
PHP 5.3.6
Apache httpd 2.2.19
PECL http 1.7.1
CouchDB-Lucene 0.6.1
I am building a mini search engine with CouchDB and CouchDB-Lucene. When the user enters a query I POST to my PHP script which then queries couchdb-lucene. Couchdb-lucene will then return a list of matching document keys to the PHP script. Then, I POST data (with http_post_data) to a List Function with that list of keys (detailed here, under "Querying Options"). This List Function returns HTML formatted results. This is the part that works.
My needs are now changing and I would like to query only the view and get back JSON. However, when I do, this is the response from the http_post_data call:
HTTP/1.1 415 Unsupported Media Type
Server: CouchDB/1.0.2 (Erlang OTP/R13B)
Date: Sat, 09 Jul 2011 22:22:51 GMT
Content-Type: text/plain;charset=utf-8
Content-Length: 78
Cache-Control: must-revalidate
{"error":"bad_content_type","reason":"Content-Type must be application/json"}
The URL that I generate for this view is correct. I can change my POST call to
http_post_data(url/of/view, $key_string, "Content-Type:application/json");
but nothing will actually be returned (I am looking at output in Firebug). To send back my results, here is the relevant PHP:
HttpResponse::setContentType("application/json");
HttpResponse::setData($response);
$response contains the response from the http_post_data call to CouchDB.
Any suggestions? This has been driving me mad for a day and a bit now.
Thanks.
http_post_data() expects an associative array (not a string) for its options argument.
You should use array('headers' => array('content-type' => 'application/json')) instead of "Content-Type:application/json".
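Put together, the call might look like the sketch below. The helper names are hypothetical, $viewUrl stands for whatever view URL you already generate, and the {"keys": [...]} body is the form CouchDB accepts for multi-key view queries. It assumes PECL http 1.x, where http_post_data() returns the whole HTTP message as a string.

```php
<?php
// Build the options array for http_post_data() (PECL http 1.x).
function couch_json_options()
{
    return array('headers' => array('Content-Type' => 'application/json'));
}

// Encode document keys as the JSON body for a CouchDB view query.
function couch_keys_body(array $keys)
{
    return json_encode(array('keys' => array_values($keys)));
}

// Hypothetical wrapper around the call from the question.
function query_view($viewUrl, array $keys)
{
    $raw = http_post_data($viewUrl, couch_keys_body($keys), couch_json_options());
    // The returned string includes the status line and headers;
    // http_parse_message() splits it so only the JSON body is kept.
    $message = http_parse_message($raw);
    return $message->body;
}
```

The string returned by query_view() is what you would then hand to HttpResponse::setData().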
I've been stuck on this problem for a while and I'm pretty sure it must be something quite simple that hopefully someone out there can shed some light on.
So, I'm currently using jQuery UI's Autocomplete plugin to reference an external PHP file which gets information from a database (in an array) and outputs it as JSON.
From my PHP file (search.php) when I do this:
echo json_encode($items);
My output (when looking at the search.php file) is this:
["Example 1","Example 2","Example 3","Example 4","Example 5"]
Which is valid JSON according to jsonlint.com
The problem is that when I use jQuery UI's Autocomplete script to reference the external search.php file, Chrome just gives me the following error:
GET http://www.example.com/search.php?term=my+search+term 404 (Not Found)
I have tried inputting the JSON code straight into the 'Source:' declaration in my jQuery, and this works fine, but it will not read the JSON from the external PHP file.
Please can someone help?
Here's my code:
HTML
<p class="my-input">
<label for="input">Enter your input</label>
<textarea id="input" name="input"
class="validate[required]"
placeholder="Enter your input here.">
</textarea>
</p>
jQuery
$(function() {
$( "#input" ).autocomplete({
source: "http://www.example.com/search.php",
minLength: 2
});
});
PHP
header("Content-type: application/json");
// no term passed - just exit early with no response
if (empty($_GET['term'])) exit ;
$q = strtolower($_GET["term"]);
// remove slashes if they were magically added
if (get_magic_quotes_gpc()) $q = stripslashes($q);
include '../../../my-include.php';
global $globalvariable;
$items = array();
// Get info from WordPress Database and put into array
$items = $wpdb->get_col("SELECT column FROM $wpdb->comments WHERE comment_approved = '1' ORDER BY column ASC");
// echo out the items array in JSON format to be read by my jQuery Autocomplete plugin
echo json_encode($items);
Result
In browser, when information is typed into #input
GET http://www.example.com/search.php?term=Example+1 404 (Not Found)
Update: the real PHP url is here: http://www.qwota.co.uk/wp/wp-content/themes/qwota/list-comments.php?term=Your
Please help!
UPDATE: ANSWER
The answer to my problem has been pointed out by Majid Fouladpour
The problem wasn't with my code but rather with trying to use WordPress' $wpdb global variable: as far as I understand, WordPress sends its own headers, and anything outside of its usual layout results in a 404 error, even if the file is actually there.
I'm currently trying to get around the problem by creating my own MySQL requests and not using WordPress's global variables / headers.
PS. Majid, I'll come back and give you a 'helpful tick' once StackOverflow lets me! (I'm still a n00b.)
Are you sure the path source: "http://www.example.com/search.php" is correct?
You have to make sure that the target URL exists. If you are really using http://www.example.com/search.php then, well, it simply does not exist, which is why it does not work.
Update
Since you have a real URL that's working (I tested it!), here are a few steps you can take:
Make sure there's no typo. If there's one, fix it.
Make sure you can open that URL from your browser. If you cannot, then you might be having network access problems (firewall, proxy, server permission issues, etc.)
Try pointing it at another known URL, just to make sure. The 404 error is really a "not found" error. It cannot be anything else.
I think the include is the issue. As Majid pointed out... use the below include instead.
include("../../../wp-load.php");
Good luck!
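For illustration, here is a sketch of how the endpoint could look once WordPress is loaded through wp-load.php. The function names are made up, using comment_content as the column is an assumption (the question only says "column"), and the explicit status_header(200) is a precaution rather than something this thread confirms is needed.

```php
<?php
// Keep only the items that contain the typed term (case-insensitive).
function filter_suggestions(array $items, $term)
{
    $term = (string) $term;
    $matches = array();
    foreach ($items as $item) {
        if ($term !== '' && stripos($item, $term) !== false) {
            $matches[] = $item;
        }
    }
    return $matches;
}

// Hypothetical endpoint body; expects WordPress to be loaded already,
// so that $wpdb and status_header() are available.
function send_suggestions($wpdb, $term)
{
    // Assumption: the suggestions are the texts of approved comments.
    $items = $wpdb->get_col(
        "SELECT comment_content FROM $wpdb->comments WHERE comment_approved = '1' ORDER BY comment_content ASC"
    );
    status_header(200); // WordPress helper that sets the HTTP status line
    header('Content-Type: application/json');
    echo json_encode(filter_suggestions($items, $term));
}
```

In list-comments.php this would be called at the top level, after include("../../../wp-load.php");, as send_suggestions($wpdb, $_GET['term']).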
Your Apache server is sending the wrong headers. Here is a request/response pair:
Request
GET /wp/wp-content/themes/qwota/list-comments.php?term=this HTTP/1.1
Host: www.qwota.co.uk
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cookie: __utma=142729525.1341149814.1305551961.1305551961.1305551961.1; __utmb=142729525.3.10.1305551961; __utmc=142729525; __utmz=142729525.1305551961.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
Response headers
HTTP/1.1 404 Not Found
Date: Mon, 16 May 2011 13:28:31 GMT
Server: Apache
X-Powered-By: PHP/5.2.14
X-Pingback: http://www.qwota.co.uk/wp/xmlrpc.php
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Pragma: no-cache
Last-Modified: Mon, 16 May 2011 13:28:31 GMT
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
Response body
["Bake 'em away... toys.","Content precedes design. Design in the absence of content is not design, it\u2019s decoration.","Hanging on in quiet desperation is the English way.","I'm a reasonable man, get off my case.","Look at me, Damien! It's all for you!","Never get out of the boat... absolutely god damn right.","That gum you like is going to come back in style.","The secret to creativity is knowing how to hide your sources.","Things could be different... but they're not.","Your eyes... they turn me."]
So, even though you receive back response from the server, it has HTTP/1.1 404 Not Found in the headers. Someone may be able to investigate this and provide a potential reason and solution.