HTTP headers printed on top of page (Microsoft Edge) - php

The following 'code' is sometimes (at random) printed at the top of a webpage after a refresh.
>HTTP/1.1 200 OK
>Date: Fri, 18 Mar 2016 09:05:03 GMT
>Server: Apache
>X-Powered-By: PHP/5.3.6-pl0-gentoo
>X-Frame-Options: DENY
>X-XSS-Protection: 1; mode=block
>X-Content-Type-Options: nosniff
>Expires: Thu, 19 Nov 1981 08:52:00 GMT
>Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
>Pragma: no-cache
>Keep-Alive: timeout=15, max=86
>Connection: Keep-Alive
>Transfer-Encoding: chunked
>Content-Type: text/html
>5
(The last number, 5 in this case, is random; the rest is constant.)
This is what I tried to solve this annoying 'bug?':
Removing HTML <head> contents
Removing HTML <body> contents
Removing AJAX (XHR) calls
Updating Smarty (engine that parses the templates)
PHP trim() around the output to prevent unnecessary whitespace before or after the <doctype> and <html> tags
Killing almost all PHP code (too much to explain here, but since I stripped it down completely I am 99% sure it is not the server-side (PHP) code)
Looking for PHP functions that are able to print these headers (grepped for headers_list, getallheaders, apache_request_headers, etc.)
Tried multiple pages, same results, no matter their contents.
My customer sees the same results in the Microsoft Edge browser.
Updated other components, like browser detection
Added PHP ob_start();
Validated HTML
Made sure to clean Javascript console errors (now clean)
Gave WireShark for Windows a go, to look at what headers are received, but this was too difficult for me. (Should I retry? A simpler cURL-based check is sketched after this list.)
This problem sounds a lot like mine, but didn't help me fix it: https://bugzilla.mozilla.org/show_bug.cgi?id=229710
Checked other Stack Overflow questions. Could not find a matching question/solution.
More, which I forgot :)
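As mentioned in the WireShark item above, a simpler way to capture the raw response might be PHP's cURL extension. This is only a debugging sketch, using the site link from the notes below:

$ch = curl_init('https://www.10voordeleraar.nl');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);  // include the raw headers in the output
$response = curl_exec($ch);
// print only the header block, exactly as received on the wire
echo substr($response, 0, curl_getinfo($ch, CURLINFO_HEADER_SIZE));
curl_close($ch);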
Notes:
The site is served over HTTPS with a valid certificate.
Here is the site link: https://www.10voordeleraar.nl
Attached screenshot links below.
The funny thing is, this only happens in Microsoft Edge, and only sometimes. The site behaves properly in all other browsers, as do my other sites.
Regards,
Laird
Screenshots:
Printed HTTP headers example on site top
Printed HTTP headers example in DOM inspect

Related

PHP Simple HTML DOM Parser returns gibberish

$html = file_get_html('http://www.livelifedrive.com/');
echo $html->plaintext;
I've no problem scraping other websites but this particular one returns gibberish.
Is it encrypted or something?
Actually, the gibberish you see is GZIP-compressed content.
When I fetch the content with hurl.it, for instance, here are the headers returned by the server:
GET http://www.livelifedrive.com/malaysia/ (the URL http://www.livelifedrive.com/ resolves to http://www.livelifedrive.com/malaysia/)
Connection: keep-alive
Content-Encoding: gzip <--- The content is gzipped
Content-Length: 18202
Content-Type: text/html; charset=UTF-8
Date: Tue, 31 Dec 2013 10:35:42 GMT
P3p: CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM"
Server: nginx/1.4.2
Vary: Accept-Encoding,User-Agent
X-Powered-By: PHP/5.2.17
So once you have scraped the content, unzip it. Here is some sample code:
if ( ! function_exists('gzdecode'))
{
    /**
     * Decode gzip-encoded data.
     *
     * http://php.net/manual/en/function.gzdecode.php
     *
     * Alternative: http://digitalpbk.com/php/file_get_contents-garbled-gzip-encoding-website-scraping
     *
     * @param string $data gzencoded data
     * @return string inflated data
     */
    function gzdecode($data)
    {
        // strip the 10-byte gzip header and 8-byte footer, then inflate
        return gzinflate(substr($data, 10, -8));
    }
}
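A minimal usage sketch, assuming the Simple HTML DOM Parser is loaded (its str_get_html() builds a DOM from a string) and that this server gzips the body regardless of the request's Accept-Encoding header, as it appears to here:

$raw  = file_get_contents('http://www.livelifedrive.com/');
$html = str_get_html(gzdecode($raw));  // decode first, then parse as usual
echo $html->plaintext;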
References:
http://www.php.net/manual/en/function.gzdecode.php#106397
http://digitalpbk.com/php/file_get_contents-garbled-gzip-encoding-website-scraping
There's no such thing as site "encryption" in this sense: if the content can reach your browser and is HTML, it can be scraped.
It's probably because the site uses a lot of JavaScript and Flash, which cannot be scraped by an HTML parser. Even Google itself is just beginning to make inroads into accurately scraping Flash and JavaScript.
To scrape a site in its full browser glory, try Selenium.
Links:
https://code.google.com/p/php-webdriver-bindings/
https://groups.google.com/forum/#!topic/selenium-users/Rj6BYEkz9Q0
A neat tip for finding out what you can scrape with an HTML scraper: try disabling JavaScript and Flash in your browser and loading the website. The content you can still view is easily scrapable; for the rest you have to be a little more clever in your methods.
Maybe the files on their servers aren't saved as UTF-8?
I've tried your function on several sites, and sometimes it works (on servers I know save their files as UTF-8, rather than just declaring them as UTF-8) and other times it gives gibberish.
Try testing it yourself on your local machine, parsing files saved as UTF-8 and in other encodings, and see what comes up...
$html->plaintext;
This will give you only the text. If you need to fetch the HTML as well, use
$html->innertext;
For more information you can refer to http://simplehtmldom.sourceforge.net/manual.htm

Curl, submitting form with __multiselect parameter

I'm trying to submit a (Java servlet) form using cURL in PHP, but it seems like there is a problem with the parameters. I can't really understand why it's happening, since I'm testing cURL with a parameter string identical to the one the browser uses.
After some research in diverse forums I wasn't able to find a solution to my particular problem.
This is the POSTFIELDS string generated by the browser (and working):
submissionType=pd&__multiselect_PostCodeList=&selectedPostCode=01&selectedPostCode=02&selectedPostCode=03&__multiselect_selectedPostCodes=
I'm using an identical string (for testing) in the PHP script, but I'm getting an HTML page as the answer, telling me "Missing parameters in search query".
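For reference, a minimal sketch of the PHP side (the endpoint URL is a placeholder). Note the string is passed to CURLOPT_POSTFIELDS verbatim: an associative array would collapse the duplicate selectedPostCode keys and also switch the request to multipart/form-data.

$postFields = 'submissionType=pd&__multiselect_PostCodeList='
    . '&selectedPostCode=01&selectedPostCode=02&selectedPostCode=03'
    . '&__multiselect_selectedPostCodes=';

$ch = curl_init('http://example.com/postcode-search');  // placeholder URL
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postFields);      // raw string keeps duplicate keys
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);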
I believe that the form
__multiselect_PostCodeList=
&selectedPostCode=01
&selectedPostCode=02
&selectedPostCode=03
&__multiselect_selectedPostCodes=
is quite weird (I've never seen this before), and I'm wondering whether it could be the reason the POST is not working from cURL.
The form seems to be successfully submitted, since I'm getting this header:
HTTP/1.1 200 OK
Date: Wed, 07 Aug 2013 08:02:56 GMT
Content-length: 1791
Content-type: text/html;charset=UTF-8
X-Powered-By: Servlet/2.4 JSP/2.0
Vary: Accept-Encoding
Content-Encoding: gzip
Connection: Keep-Alive
Note: I tried submitting the same form from Lynx and got the same result ("Missing parameters in search query"). So it seems like it only works with browsers like Mozilla or Chrome.
Some help would be really appreciated; I don't have any more ideas at this point.
Thanks!
Oscar

Error 404 with jQuery Autocomplete JSON referencing external PHP file

I've been stuck on this problem for a while and I'm pretty sure it must be something quite simple that hopefully someone out there can shed some light on.
I'm currently using jQuery UI's Autocomplete plugin to reference an external PHP file which gets information from a database (in an array) and outputs it as JSON.
From my PHP file (search.php) when I do this:
echo json_encode($items);
My output (when looking at the search.php file) is this:
["Example 1","Example 2","Example 3","Example 4","Example 5"]
Which is valid JSON according to jsonlint.com
The problem is that when I use jQuery UI's Autocomplete script to reference the external search.php file, Chrome just gives me the following error:
GET http://www.example.com/search.php?term=my+search+term 404 (Not Found)
I have tried inputting the JSON code straight into the 'source:' option in my jQuery, and this works fine, but it will not read the JSON from the external PHP file.
Please can someone help?
Here's my code:
HTML
<p class="my-input">
    <label for="input">Enter your input</label>
    <textarea id="input" name="input"
              class="validate[required]"
              placeholder="Enter your input here.">
    </textarea>
</p>
jQuery
$(function() {
    $( "#input" ).autocomplete({
        source: "http://www.example.com/search.php",
        minLength: 2
    });
});
PHP
header("Content-type: application/json");
// no term passed - just exit early with no response
if (empty($_GET['term'])) exit ;
$q = strtolower($_GET["term"]);
// remove slashes if they were magically added
if (get_magic_quotes_gpc()) $q = stripslashes($q);
include '../../../my-include.php';
global $globalvariable;
$items = array();
// Get info from WordPress Database and put into array
$items = $wpdb->get_col("SELECT column FROM $wpdb->comments WHERE comment_approved = '1' ORDER BY column ASC");
// echo out the items array in JSON format to be read by my jQuery Autocomplete plugin
echo json_encode($items);
Result
In browser, when information is typed into #input
GET http://www.example.com/search.php?term=Example+1 404 (Not Found)
Update: the real PHP URL is here: http://www.qwota.co.uk/wp/wp-content/themes/qwota/list-comments.php?term=Your
Please help!
UPDATE: ANSWER
The answer to my problem has been pointed out by Majid Fouladpour
The problem wasn't with my code but rather with trying to use WordPress' $wpdb global variable, as (as far as I understand) it includes its own headers, and anything outside of its usual layout will result in a 404 error, even if the file is actually there.
I'm currently trying to get around the problem by creating my own MySQL requests and not using WordPress's global variables / headers.
PS. Majid, I'll come back and give you a 'helpful tick' once StackOverflow lets me! (I'm still a n00b.)
Are you sure the path source: "http://www.example.com/search.php" is correct?
You have to make sure that the target URL exists. If you are really using http://www.example.com/search.php then, well, it simply does not exist, so that is why it does not work.
Update
Since you have a real URL that's working (I tested it!), here are a few steps you can take:
Make sure there's no typo. If there's one, fix it.
Make sure you can open that URL from your browser. If you cannot, then you might be having network access problems (firewall, proxy, server permission issues, etc.)
Try redirecting to another known URL, just to make sure. The 404 error really is a "not found" error. It cannot be anything else.
I think the include is the issue. As Majid pointed out, use the include below instead:
include("../../../wp-load.php");
Good luck!
Your Apache server is sending the wrong headers. Here is a pair of request and response:
Request
GET /wp/wp-content/themes/qwota/list-comments.php?term=this HTTP/1.1
Host: www.qwota.co.uk
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cookie: __utma=142729525.1341149814.1305551961.1305551961.1305551961.1; __utmb=142729525.3.10.1305551961; __utmc=142729525; __utmz=142729525.1305551961.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
Response headers
HTTP/1.1 404 Not Found
Date: Mon, 16 May 2011 13:28:31 GMT
Server: Apache
X-Powered-By: PHP/5.2.14
X-Pingback: http://www.qwota.co.uk/wp/xmlrpc.php
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Pragma: no-cache
Last-Modified: Mon, 16 May 2011 13:28:31 GMT
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
Response body
["Bake 'em away... toys.","Content precedes design. Design in the absence of content is not design, it\u2019s decoration.","Hanging on in quiet desperation is the English way.","I'm a reasonable man, get off my case.","Look at me, Damien! It's all for you!","Never get out of the boat... absolutely god damn right.","That gum you like is going to come back in style.","The secret to creativity is knowing how to hide your sources.","Things could be different... but they're not.","Your eyes... they turn me."]
So even though you receive a response body from the server, its headers say HTTP/1.1 404 Not Found. Someone may be able to investigate this and provide a potential reason and solution.

CURL response different than response to request sent from browser

I'm attempting to submit a form with cURL, both via PHP and the command line. The response from the server consists of an empty body (headers posted below).
When the same URL is submitted via a browser, the response is a proper webpage.
I have tried submitting the cURL request parameters via POST and GET, using each of the following command-line curl flags: "-d", "-F" and "-G".
If the query string parameters are posted with the "-d" flag, the resulting header is:
HTTP/1.1 302 Moved Temporarily
Date: Thu, 02 Jun 2011 21:41:54 GMT
Server: Apache
Set-Cookie: JSESSIONID=DC5F435A96A353289F58593D54B89570; Path=/XXXXXXX
P3P: CP="CAO PSA OUR"
Location: http://www.XXXXXXXX.com/
Content-Length: 0
Connection: close
Content-Type: text/html;charset=UTF-8
Set-Cookie: XXXXXXXXXXXXXXXX=1318103232.20480.0000; path=/
If the query string parameters are posted with the "-F" flag, the resulting header is:
HTTP/1.1 100 Continue
HTTP/1.1 500 Internal Server Error
Date: Thu, 02 Jun 2011 21:52:54 GMT
Server: Apache
Content-Length: 1677
Connection: close
Content-Type: text/html;charset=utf-8
Set-Cookie: XXXXXXXXXXXXXX=1318103232.20480.0000; path=/
Vary: Accept-Encoding
<html><head><title>Apache Tomcat/5.5.26 - Error report</title><style><!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}HR {color : #525D76;}--></style> </head><body><h1>HTTP Status 500 - </h1><HR size="1" noshade="noshade"><p><b>type</b> Exception report</p><p><b>message</b> <u></u></p><p><b>description</b> <u>The server encountered an internal error () that prevented it from fulfilling this request.</u></p><p><b>exception</b> <pre>javax.servlet.ServletException: Servlet execution threw an exception<br>
</pre></p><p><b>root cause</b> <pre>java.lang.NoClassDefFoundError: com/oreilly/servlet/multipart/MultipartParser<br>
com.corsis.tuesday.servlet.mp.MPRequest.<init>(MPRequest.java:27)<br>
com.corsis.tuesday.servlet.mp.MPRequest.<init>(MPRequest.java:21)<br>
com.corsis.tuesday.servlet.TuesdayServlet.doPost(TuesdayServlet.java:494)<br>
javax.servlet.http.HttpServlet.service(HttpServlet.java:710)<br>
javax.servlet.http.HttpServlet.service(HttpServlet.java:803)<br>
</pre></p><p><b>note</b> <u>The full stack trace of the root cause is available in the Apache Tomcat/5.5.26 logs.</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/5.5.26</h3></body></html>
Questions:
What might cause a server to respond differently depending on the nature of the cURL request?
How can I successfully submit the request via cURL?
HTTP/1.1 100 Continue
I have had problems associated with this header before; cURL adds it automatically for larger POST bodies, and some servers simply do not understand it. Try this option to override the Expect header:
curl_setopt( $curl_handle, CURLOPT_HTTPHEADER, array( 'Expect:' ) );
To add to what Richard said, I have seen cases where servers check the User-Agent string and behave differently based on its value.
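If that turns out to be the case, overriding it from PHP is a one-liner (the UA string below is just an example value):

curl_setopt($curl_handle, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1');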
I have just had an experience with this, and what fixed it was surprising. In my situation I was logging into a server so I could upload a file, have the server do work on it, and then download the new file. I did this in Chrome first and used the dev tools to capture over 100 HTTP requests in this simple transaction. Most were simply grabbing resources I wouldn't need from the command line, so I filtered out only the ones I knew, at a minimum, I should need.
Initially this boiled down to a GET to set the cookie and log in with a username and password, a POST to upload the file, a POST to execute the work on the file, and a GET to retrieve the new file. I could not get the first POST to actually work, though. The response from that POST is supposed to contain the upload ID, time uploaded, etc., but instead I was getting empty JSON lists even though the status was 200 OK.
I used cURL to spoof the requests from the browser exactly (copying the User-Agent, overriding Expect, etc.) and was still getting nothing. Then I started arbitrarily adding in some of the requests I had captured from Chrome between the first GET and the POST, and lo and behold, after adding a GET request for the JSON history before the POST, the POST actually returned what it was supposed to.
TL;DR: Some websites require more requests after the initial login before you can POST. I would try to capture a successful exchange between the server and browser and look at all of the requests. Some requests might not be as superfluous as they seem.
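A rough sketch of that kind of replay in PHP (all URLs and fields are placeholders); the essential parts are the shared cookie jar and the extra GET before the POST:

$ch = curl_init();
curl_setopt_array($ch, array(
    CURLOPT_COOKIEJAR      => '/tmp/cookies.txt',  // persist session cookies
    CURLOPT_COOKIEFILE     => '/tmp/cookies.txt',  // send them on later requests
    CURLOPT_RETURNTRANSFER => true,
));

// 1. log in (sets the session cookie)
curl_setopt($ch, CURLOPT_URL, 'https://example.com/login');
curl_setopt($ch, CURLOPT_POSTFIELDS, 'user=me&pass=secret');
curl_exec($ch);

// 2. the seemingly superfluous GET the server expects before the upload
curl_setopt($ch, CURLOPT_URL, 'https://example.com/history.json');
curl_setopt($ch, CURLOPT_HTTPGET, true);
curl_exec($ch);

// 3. the POST that previously returned empty JSON
curl_setopt($ch, CURLOPT_URL, 'https://example.com/upload');
curl_setopt($ch, CURLOPT_POSTFIELDS, array('file' => new CURLFile('/path/to/file')));  // CURLFile needs PHP 5.5+
$response = curl_exec($ch);
curl_close($ch);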

What HTTP header RESPONSES should I be explicitly setting when I output a webpage?

So I just now learned of the X-Robots-Tag header, which can be set as part of a server response. Now that I know about this particular field, I am wondering if there are any other specific fields I should be setting when I output a webpage via PHP. I did see this list of responses, but which ones should I be setting manually? Which do you like to set manually?
Restated, in addition to...
header('X-Robots-Tag: noindex, nofollow, noarchive, nosnippet', true);
...what else should I be setting?
Thanks in advance!
You don't necessarily need to set any of them manually, and I don't send any unless absolutely necessary: most response headers are the web server's job, not the application's (give or take Location & situational cache-related headers).
As for the "X-*" headers, the X implies they aren't "official", so browsers may or may not interpret them to mean anything. For example, you can add an arbitrary "X-My-App-Version" header to a public project to get a rough idea of where people are using it, but it's just extra info unless the requester knows what to do with it.
I think most X-headers are more commonly delivered via HTML meta tags anyway. For example, <meta name="robots" content="noindex, nofollow, (etc)" /> does the same as X-Robots-Tag. That's arguably better handled with the meta tag version, since it won't trip over output buffering the way header() can, and it will be naturally cached since it's part of the page.
These are the headers from Stack Overflow (this page), so the answer is: probably none.
Are you sure you don't want your site indexed (noindex)?
Status: 200 OK
Cache-Control: public, max-age=60
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
Expires: Tue, 28 Sep 2010 01:23:00 GMT
Last-Modified: Tue, 28 Sep 2010 01:22:00 GMT
Vary: *
Set-Cookie: usr=t=&s=; domain=.stackoverflow.com; expires=Mon, 28-Mar-2011 01:22:00 GMT; path=/; HttpOnly
Date: Tue, 28 Sep 2010 01:21:59 GMT
Content-Length: 6929
This header comes in handy for me: characters are displayed correctly even if the meta tag is missing.
Content-Type: text/html; charset=utf-8
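In PHP it has to be sent before any output, e.g.:

header('Content-Type: text/html; charset=utf-8');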
