How to check if a webpage exists. jQuery and/or PHP


I want to be able to validate a form to check if a website/webpage exists. If it returns a 404 error then that definitely shouldn't validate. If there is a redirect...I'm open to suggestions, sometimes redirects go to an error page or homepage, sometimes they go to the page you were looking for, so I don't know. Perhaps for a redirect there could be a special notice that suggests the destination address to the user.
The best thing I found so far was like this:
$.ajax({url: webpage ,type:'HEAD',error:function(){
alert('No go.');
}});
That has no problem with 404's and 200's but if you do something like 'http://xyz' for the url it just hangs. Also 302 and the like trigger the error handler too.
This is a generic enough question I would like a complete working code example if somebody can make one. This could be handy for lots of people to use.

It sounds like you don't care about the web page's contents; you just want to see if it exists. Here's how I'd do it in PHP. Requesting a maximum length of 0 stops PHP from taking up memory with the page's contents.
/*
* Returns false if the page could not be retrieved (i.e., no 2xx or 3xx HTTP
* status code). On success, if $includeContents is false (the default), we
* return true; if $includeContents is true, we return file_get_contents()'s
* result (a string of page content).
*/
function getURL($url, $includeContents = false)
{
    if ($includeContents)
        return @file_get_contents($url);
    // Offset 0, maximum length 0: open the URL but read no content.
    return (@file_get_contents($url, false, null, 0, 0) !== false);
}
For less verbosity, replace the above function's contents with this.
return ($includeContents) ?
    @file_get_contents($url) :
    (@file_get_contents($url, false, null, 0, 0) !== false);
See http://www.php.net/file_get_contents for details on how to specify HTTP headers using a stream context.
Cheers.

First you need to check that the host name resolves via DNS. That's why you say it "just hangs": it's waiting for the DNS query to time out. It's not actually hung.
After checking DNS, check that you can connect to the server. This is another long timeout if you're not careful.
Finally, perform the HTTP HEAD and check the status code. There are many, many, many special cases you have to consider here: what does a "temporary internal server error" mean for the page existing? What about "permanently moved"? Look into HTTP status codes.
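The three steps above could be sketched roughly like this. It is only a sketch under assumptions: classifyStatus() and checkUrl() are hypothetical names, the verdict strings are made up for illustration, the connect step and the HEAD request are folded into one call with a timeout, and how to treat codes such as 401, 403 or 5xx is a policy decision the code deliberately leaves as "unknown". Note that gethostbyname() only covers IPv4 and its own timeout is up to the system resolver.

```php
<?php
// Hypothetical helper: turn an HTTP status code into a validation verdict.
function classifyStatus(int $status): string
{
    if ($status >= 200 && $status < 300) {
        return 'exists';
    }
    if ($status >= 300 && $status < 400) {
        return 'redirect'; // caller can show the Location target to the user
    }
    if ($status === 404 || $status === 410) {
        return 'missing';
    }
    return 'unknown'; // e.g. 401, 403, 5xx: the page may exist, we cannot tell
}

// Hypothetical helper: DNS check first, then a HEAD request with a timeout.
function checkUrl(string $url, int $timeoutSeconds = 5): array
{
    $result = ['verdict' => 'invalid', 'status' => null, 'location' => null];
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return $result;
    }
    // gethostbyname() returns its argument unchanged when the lookup fails.
    if (filter_var($host, FILTER_VALIDATE_IP) === false && gethostbyname($host) === $host) {
        $result['verdict'] = 'no-dns';
        return $result;
    }
    $context = stream_context_create(['http' => [
        'method'          => 'HEAD',
        'timeout'         => $timeoutSeconds,
        'follow_location' => 0, // report redirects instead of following them
    ]]);
    $headers = @get_headers($url, false, $context);
    if ($headers === false || !preg_match('#^HTTP/\S+\s+(\d{3})#', $headers[0], $m)) {
        $result['verdict'] = 'unreachable';
        return $result;
    }
    $result['status'] = (int) $m[1];
    $result['verdict'] = classifyStatus($result['status']);
    foreach ($headers as $line) {
        if (stripos($line, 'Location:') === 0) {
            $result['location'] = trim(substr($line, 9));
        }
    }
    return $result;
}
```

A form handler could then reject 'missing', 'no-dns' and 'unreachable', accept 'exists', and for 'redirect' show the returned location to the user as a suggested address.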

I've just written a simpler version using PHP:
function url_check($url) {
    // @ suppresses the warning fopen() emits when the URL cannot be opened
    $x = @fopen($url,"r");
    if ($x) {
        $reply = 1;
        fclose($x);
    } else {
        $reply = 0;
    }
    return $reply;
}
Here $url is the URL to test; the function returns true (1) or false (0) depending on whether the URL could be opened.

Maybe you could combine a domain checker with jQuery: the domain checker (PHP) can respond with 1 for an existing domain or 0 for a non-existent one.
E.g. http://webarto.com/snajper.php?domena=stackoverflow.com will return 1; you can call it from the input's blur handler to check instantly.

Related

How do you echo a SQL SELECT statement from a PHP file called by AJAX?

There's a lot of code in each file, too much to post, so I'm giving you a general idea of what's happening in each file.
index.php
[html dropdown menu code etc.]
scripts.js
[AJAX detects user selection from dropdown, grabs fetch.php which pulls database to generate html code for secondary dropdown selections to put in index.php]
fetch.php
[Generates secondary dropdown code based on user selection and query of the database]
I need to see what exactly is being queried to debug, so I'd like to echo the sql select statement:
$query = "SELECT * FROM databasename WHERE.."
That is in fetch.php when user makes a selection from index.php - How do I do this?

When I deal with AJAX that returns JSON, one trick I use is to take advantage of output buffering. You can't just echo or output anything you want because it will mess up the JSON data. For example:
ob_start(); //turn on buffering at beginning of script.
.... other code ...
print_r($somevar);
.... other code ...
$debug = ob_get_clean(); //put output in a var
$data['debug'] = $debug;
header('Content-Type: application/json');
echo json_encode($data); //echo JSON data.
What this does is wrap any output from your script into your JSON data so that its format is not messed up.
Then on the javascript side you can use console.log
$.post(url, input, function(data){
if(data.debug) console.log(data.debug);
});
If you are not used to debugging with console.log(), you can usually hit F12 to open the debugger in most browsers. The output will be sent to the "console" there. IE9 had a bit of an issue with console.log() if I recall, but I don't want to go too far off track.
NOTE: Just make sure not to leave this stuff in the code when you move it to production. It's very simple to just comment this line out:
//$data['debug'] = $debug;
And then your debug information won't be exposed in production. There are other ways to do this automatically, but they depend on whether you develop locally and then publish to the server. For example, you can switch on $_SERVER['SERVER_ADDR'], which will be ::1 or 127.0.0.1 when the site is local. This has a few drawbacks, mainly that the server address is not available from the command-line interface (CLI). So I typically tie it to a global constant that says what "mode" the site is in (defined in the common entry point, typically index.php).
if(!defined('ENV_DEVELOPMENT')) define('ENV_DEVELOPMENT','DEVELOPMENT');
if(!defined('ENV_PRODUCTION')) define('ENV_PRODUCTION','PRODUCTION');
//uncomment for production (must come before the development default below)
//if(!defined('ENVIRONMENT')) define('ENVIRONMENT',ENV_PRODUCTION);
//site is in Development mode by default
if(!defined('ENVIRONMENT')) define('ENVIRONMENT',ENV_DEVELOPMENT);
Then it is a simple matter to check it:
if(ENVIRONMENT == ENV_DEVELOPMENT) $data['debug'] = $debug;
If you know how to use error reporting you can even tie into that using
if(ini_get('display_errors') == 1) $data['debug'] = $debug;
Which will only show the debug when display errors is on.
Hope that helps.
UPDATE
Because I mentioned it in the comments, here is an example of it wrapped in a class (this is a simplified version, so I didn't test this)
class LibAjax{
public static function respond($callback, $options=0, $depth=32){
$result = ['userdata' => [
'debug' => false,
'error' => false
]];
ob_start();
try{
if(!is_callable($callback)){
//I have better exception in mine, this is just more portable
throw new Exception('Callback is not callable');
}
$callback($result);
}catch(\Exception $e){
//example 'Exception[code:401]'
$result['userdata']['error'] = get_class($e).'[code:'.$e->getCode().']';
//if(ENVIRONMENT == ENV_DEVELOPMENT){
//prevents leaking data in production
$result['userdata']['error'] .= ' '.$e->getMessage();
$result['userdata']['error'] .= PHP_EOL.$e->getTraceAsString();
//}
}
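$debug = '';
//clear any nested output buffers; a for loop counting up against
//ob_get_level() stops early because the level drops on every pass
while(ob_get_level() > 0 && false !== ($chunk = ob_get_clean())){
//inner buffers hold the later output, so prepend to keep the order
$debug = $chunk.$debug;
}
//if(ENVIRONMENT == ENV_DEVELOPMENT){
//prevents leaking data in production
$result['userdata']['debug'] = $debug;
//}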
header('Content-Type: application/json');
echo self::jsonEncode($result, $options, $depth);
}
public static function jsonEncode($result, $options=0, $depth=32){
$json = json_encode($result, $options, $depth);
if(JSON_ERROR_NONE !== json_last_error()){
//debug is not passed in this case, because you cannot be sure that, that was not what caused the error. Such as non-valid UTF-8 in the debug string, depth limit, etc...
$json = json_encode(['userdata' => [
'debug' => false,
'error' => json_last_error_msg()
]],$options);
}
return $json;
}
}
Then when you make an AJAX response you just wrap it like this (note that $result is passed by reference; this way we don't have to return it, and in the case of an exception $result has already been updated in "real time" rather than on completion)
LibAjax::respond( function(&$result){
$result['data'] = 'foo';
});
If you need to pass additional data into the closure don't forget you can use the use statement, like this.
$otherdata = 'bar';
LibAjax::respond( function(&$result) use($otherdata){
$result['data'][] = 'foo';
$result['data'][] = $otherdata;
});
This catches any output and puts it in debug, if the environment is correct (that check is commented out above). Please, please make sure to implement some kind of protection so that the output is not sent to clients in production; I can't stress that enough. It also catches any exceptions and puts them in error. And it handles the header and the encoding.
One big benefit of this is a consistent structure for your JSON: you will know (on the client side) that if data.userdata.error is set, then you have an exception on the back end. It also gives you one place to tweak your headers, JSON encoding, etc.
One note: in PHP 7 and later you should catch the Throwable interface (instead of Exception) if you want to catch both Error and Exception classes, or use two catch blocks.
Let's just say I do a lot of AJAX and got sick of rewriting this all the time. My actual class is more extensive than this, but that's the gist of it.
Cheers.
UPDATE1
"One thing I had to do for things to display was to parse the data variable before I console.log() it"
This is typically because you are not passing the correct header back to the browser. Send this just before calling json_encode:
header('Content-Type: application/json');
This just lets the browser know what type of data it is getting back. One thing most people forget is that on the web every response, whether an image, a file download, or a web page, arrives as the same kind of byte stream. What makes it something special is the Content-Type that the browser is told it is.
One thing to note about header is you cannot output anything before sending the headers. However this plays well with the code I posted because that code will capture all the output and send it after the header is sent.
I updated the original code to have the header, I had it in the more complex class one I posted later. But if you add that in it should get rid of the need to manually parse the JSON.
One last thing I should mention I do is check if I got JSON back or text, you could still get text in the event that some error occurs before the output buffering is started.
There are 2 ways to do this.
If Data is a string that needs to be parsed
$.post(url, {}, function(data){
if( typeof data == 'string'){
try{
data = $.parseJSON(data);
}catch(err){
data = {userdata : {error : data}};
}
}
if(data.userdata){
if( data.userdata.error){
//...etc.
}
}
//....
});
Or, if you have the header and it's always JSON, then it's a bit simpler:
$.post(url, {}, function(data){
if( typeof data == 'string'){
data = {userdata : {error : data}};
}
if(data.userdata){
if( data.userdata.error){
//...etc.
}
}
//....
});
Hope that helps!
UPDATE2
Because this topic comes up a lot, I put a modified version of the above code on my GitHub. You can find it here:
https://github.com/ArtisticPhoenix/MISC/blob/master/AjaxWrapper/AjaxWrapper.php

Echo the contents and call die() or exit afterwards. Then open the Network tab of your browser, start recording, and run the AJAX request (it will fail). Select the request by its resource name and view the Response; it will show you what was echoed by the script.
Taken from: Request Monitoring in Chrome
Chrome currently has a solution built in.
Use CTRL+SHIFT+I to enable the Developer Tools (or navigate to Current Page Control > Developer > Developer Tools; in the newer versions of Chrome, click the Wrench icon > Tools > Developer Tools).
From within the developer tools click on the Network button. If it isn't already, enable it for the session or always.
Click the "XHR" sub-button.
Initiate an AJAX call.
You will see items begin to show up in the left column under "Resources".
Click the resource and there are 2 tabs showing the headers and return content.
Other browsers also have a Network tab, but you will still need the echo and die() approach described above to get the string value of the query.
ArtisticPhoenix's solution above is delightful.

socket_recv not receiving full data

I'm sending image data of around 5000 bytes from the browser through a WebSocket, but the following loop receives only 1394 bytes in total:
while ($bytes = socket_recv($socket, $r_data, 4000, MSG_DONTWAIT)) {
$data .= $r_data;
}
This is after handshake is done which is correctly being received. The json data is being cutoff after 1394 bytes. What could be the reason?
In the browser interface it is sending image as JSON:
websocket.send(JSON.stringify(request));
The browser interface is fine as it is working with other PHP websocket free programs I've tested.
Here is the full source code.

You have made the call non-blocking by specifying MSG_DONTWAIT, so once it has read the first chunk of data it fails with EAGAIN rather than waiting for more. Remove the MSG_DONTWAIT flag and use MSG_WAITALL instead, so that the call blocks until the requested number of bytes has been received (or the connection is closed).
There are a few ways of knowing if you have received all the data you are expecting:
1. Send the length of the data. This is useful if you want to send multiple blocks of variable-length content. For example, if I want to send three strings, I might first send a "3" to tell the receiver how many strings to expect, then for each one I would send the length of the string, followed by the string data.
2. Use fixed-length messages. If you are expecting multiple messages but each one is the same size, then you can just read from the socket until you have at least that many bytes and then process the message. Note that you may receive more than one message (including partial messages) in a single recv() call.
3. Close the connection. If you are sending only one message, then you can half-close the connection. This works because TCP connections maintain separate states for sending and receiving, so the server can close the sending direction yet leave the receiving one open for the client's reply. In this case, the server sends all its data to the client and then calls socket_shutdown($socket, 1).
Options 1 and 2 are useful if you want to process the data while receiving it, for example if you are writing a game, a chat application, or something else where the socket stays open and multiple messages are passed back and forth. Option 3 is the easiest one, and is useful when you just want to receive all the data in one go, for example a file download.
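To make the length-prefix approach concrete, here is a minimal sketch for a plain TCP byte stream. frameMessage() and extractMessages() are hypothetical names, and the 4-byte big-endian length header is an assumed wire format, not something from the question. WebSocket frames already carry their own length field as part of the protocol, so this only illustrates the idea of buffering until a whole message has arrived.

```php
<?php
// Hypothetical sender side: prefix the payload with its length
// as a 4-byte big-endian unsigned integer.
function frameMessage(string $payload): string
{
    return pack('N', strlen($payload)) . $payload;
}

// Hypothetical receiver side: pull every complete message out of the
// buffer and leave any partial message in place for the next recv().
function extractMessages(string &$buffer): array
{
    $messages = [];
    while (strlen($buffer) >= 4) {
        $length = unpack('N', substr($buffer, 0, 4))[1];
        if (strlen($buffer) < 4 + $length) {
            break; // the rest of this message has not arrived yet
        }
        $messages[] = (string) substr($buffer, 4, $length);
        $buffer = (string) substr($buffer, 4 + $length);
    }
    return $messages;
}
```

The receive loop would append each chunk returned by socket_recv() to the buffer and call extractMessages() after every read, so it does not matter how the network splits the data.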

1394 is around the common size of an MTU, especially if you are tunnelled through a VPN (are you?).
You can't expect to read all the bytes in one call, the packets may be fragmented according to the network MTU.

Just my 2 cents on this.
socket_recv returns false on an error. With non-blocking I/O it can also legitimately return zero (0), which a loose truthiness check would mistake for failure.
Your check in your loop should be:
while(($bytes = socket_recv($resource, $r_data, 4000, MSG_DONTWAIT)) !== false) {}
Although I would also check the socket for errors and add a usleep call to prevent "CPU burn".
$data = '';
$done = false;
while(!$done) {
    socket_clear_error($resource);
    // @ hides the warning raised when no data is available yet
    $bytes = @socket_recv($resource, $r_data, 4000, MSG_DONTWAIT);
    $lastError = socket_last_error($resource);
    if ($bytes === false && $lastError == 11) {
        // 11 is EAGAIN on Linux: nothing to read yet, try again shortly
        usleep(2000); // prevent "CPU burn"
    }
    else if ($bytes === false) {
        // something went wrong! do something
        $done = true;
    }
    else if ($bytes === 0) {
        // the peer closed the connection
        $done = true;
    }
    else {
        $data .= $r_data;
    }
}

I'm wondering if you are having issues with your websockets connection. The while-loop you quote above looks to me to reside in a part of the code where the client handshake has failed, it's in the else of if($client->getHandshake()) { ... } else { ... }.
As far as I can tell the $client is a separate class, so I can't see what the class looks like or what Client::getHandshake() does, but I'm guessing it is the getter of a boolean that holds the success or failure of the websocket upgrade handshake.
If I'm correct, the handshake fails and the connection is closed by the client. From the code I can see that the server code you are using requires version 13 of the spec. You do not mention which client-side library you are using, but other servers may accept versions that this server does not.
Please make sure your client-library supports the latest version.
Posting the verbose output from the server when it gets an incoming connection and the transfer fails will be of help if what I'm suggesting is wrong.

BUT, isn't the portion of the code that you pasted contained in the else block? The else block that to me looks like the hand shake did not go through?
Could you print the received bytes as string?

I don't think your question is correct. According to the source code, if the handshake has succeeded then this section of code is executed:
$data = '';
while (true) {
$ret = socket_recv($socket, $r_data, 4000, MSG_DONTWAIT);
if ($ret === false) {
$this->console("$myidentity socket_recv error");
exit(0);
}
$data .= $r_data;
if (strlen($data) > 4000) {
print "breaking as data len is more than 4000\n";
break;
} else {
print "curr datalen=" . strlen($data) . "\n";
}
}
If the program really reaches the code section that you provided, then it would be worth looking into why the handshake failed.
The Server Class has a third parameter verboseMode which when set to true will provide you with detailed debug logs on what exactly is happening.
We will just be speculating without the debug log, but if the debug log is provided we can come up with a better suggestion.

Attempting to load again a URL when it fails

The following function receives a string parameter representing a URL and then loads the URL into a simple_html_dom object. If the loading fails, it attempts to load the URL again.
public function getSimpleHtmlDomLoaded($url)
{
$ret = false;
$count = 1;
$max_attemps = 10;
while ($ret === false) {
$html = new simple_html_dom();
$ret = $html->load_file($url);
if ($ret === false) {
echo "Error loading url: $url\n";
sleep(5);
$count++;
$html->clear();
unset($html);
if ($count > $max_attemps)
return false;
}
}
return $html;
}
However, if loading a URL fails once, it keeps failing for that URL, and after the maximum number of attempts is used up, it also keeps failing in subsequent calls to the function for the rest of the URLs it has to process.
It would make sense to keep failing if the URLs were temporarily offline, but they are not (I've checked while the script was running).
Any ideas why this is not working properly?
I would also like to point out that when it starts failing to load the URLs, it gives only one warning (instead of multiple ones), with the following message:
PHP Warning: file_get_contents(http://www.foo.com/resource): failed to open stream: HTTP request failed! in simple_html_dom.php on line 1081
Which is prompted by this line of code:
$ret = $html->load_file($url);

I have tested your code and it works perfectly for me; every time I call the function it returns a valid result on the first attempt.
So even if you load the pages from the same domain, there may be some protection on the page or server.
For example, the page can look for certain cookies, or the server can inspect your user agent and, if it sees you as a bot, refuse to serve the correct content.
I had similar problems while parsing some websites.
The answer for me was to find out what the page/server expects and make my code simulate that: everything from faking the user agent to generating cookies and such.
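As a rough illustration of simulating what the server expects, a stream context can carry a user agent and cookies. buildBrowserLikeContext() is a hypothetical helper, and the Accept header, user agent string and cookie names are placeholders you would replace with whatever the target site actually checks for.

```php
<?php
// Hypothetical helper: build a stream context that sends a chosen
// User-Agent and, optionally, a Cookie header with each HTTP request.
function buildBrowserLikeContext(string $userAgent, array $cookies = [], int $timeoutSeconds = 10)
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . rawurlencode((string) $value);
    }
    $headers = ['Accept: text/html,application/xhtml+xml'];
    if ($pairs) {
        $headers[] = 'Cookie: ' . implode('; ', $pairs);
    }
    return stream_context_create(['http' => [
        'method'     => 'GET',
        'user_agent' => $userAgent,
        'header'     => implode("\r\n", $headers),
        'timeout'    => $timeoutSeconds,
    ]]);
}
```

The context is passed as the third argument, for example file_get_contents($url, false, $context), and the downloaded string can then be handed to the parser.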
By the way, have you tried creating a simple PHP script just to test that the 'simple html dom' parser runs on your server with no errors? That is the first thing I would check.
Finally, I must add that in one case I failed in numerous attempts to parse a page and could not win the masking game. In the end I wrote a script that loads the page in the Linux command-line text browser lynx and saves the whole page locally; I then parsed that local file, which worked perfectly.

Maybe it is a problem with the load_file() function itself.
The problem was that error_get_last() also returns previous errors; I don't know whether that depends on the PHP version.
I solved the problem by changing it to check whether the error changed, not whether it is null
(or use the non-object function file_get_html()):
function load_file()
{
$preerror=error_get_last();
$args = func_get_args();
$this->load(call_user_func_array('file_get_contents', $args), true);
// Throw an error if we can't properly load the dom.
if (($error=error_get_last())!==$preerror) {
$this->clear();
return false;
}
}
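A related sketch, assuming PHP 7 or later (for error_clear_last()): instead of comparing against the previous error, the error state can be cleared before every attempt so that an old warning cannot poison later loads. loadWithRetries() and its parameters are hypothetical names, and the loader callable stands in for whatever actually fetches the page.

```php
<?php
// Hypothetical retry wrapper: calls $loader until it succeeds or the
// attempts run out. An attempt counts as failed when the loader returns
// false or leaves a new entry in error_get_last().
function loadWithRetries(callable $loader, int $maxAttempts = 10, int $delaySeconds = 5)
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        error_clear_last();   // forget errors from earlier attempts
        $result = @$loader(); // warnings are hidden but still recorded
        if ($result !== false && error_get_last() === null) {
            return $result;
        }
        if ($attempt < $maxAttempts) {
            sleep($delaySeconds);
        }
    }
    return false;
}
```

With simple_html_dom this might be called as loadWithRetries(function () use ($url) { return file_get_html($url); }), though whether that fits depends on how the library version in use reports failures.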

Why is this HTTP request continually looping?

I'm probably overlooking something really obvious here.
Comments are in to help explain any library specific code.
public function areCookiesEnabled() {
$random = 'cx67ds';
// set cookie
cookie::set('test_cookie', $random);
// try and get cookie, if not set to false
$testCookie = cookie::get('test_cookie', false);
$cookiesAppend = '?cookies=false';
// were we able to get the cookie equal ?
$cookiesEnabled = ($testCookie === $random);
// if $_GET['cookies'] === false , etc try and remove $_GET portion
if ($this->input->get('cookies', false) === 'false' AND $cookiesEnabled) {
url::redirect(str_replace($cookiesAppend, '', url::current())); // redirect
return false;
}
// all else fails, add a $_GET[]
if ( ! $cookiesEnabled) {
url::redirect(url::current().$cookiesAppend);
}
return $cookiesEnabled;
}
Firstly, I wanted an easy way to check if cookies were enabled. I achieved this, but in the event of no cookies, there was an ugly ?cookies=false in the URL.
That was OK, but then if you reloaded the page and did have cookies enabled again, I wanted to redirect the user so it stripped off ?cookies=false in the URL (allowing the method to recheck and learn that cookies now are enabled.).

After $cookiesEnabled = ($testCookie === $random);, there are 4 cases:
$cookiesEnabled is true and $_GET['cookies'] === 'false' is true
$cookiesEnabled is true and $_GET['cookies'] === 'false' is false
$cookiesEnabled is false and $_GET['cookies'] === 'false' is true
$cookiesEnabled is false and $_GET['cookies'] === 'false' is false
Case 1 is handled by the first if block. The return statement is intended to handle cases 2 and 3; the second if block is intended to handle only case 4, but it catches both case 3 and 4. In case 3, the URL already has ?cookies=false, but since $cookiesEnabled is false, we redirect to add ?cookies=false, and cycle back into case 3.
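The four cases can be written down as a small decision function, which makes the intended handling of case 3 explicit. This is only a sketch: cookieRedirectAction() and its return values are hypothetical and not part of the framework used in the question.

```php
<?php
// Hypothetical decision table for the four cases listed above.
// $flagInUrl is true when the URL already contains ?cookies=false.
function cookieRedirectAction(bool $cookiesEnabled, bool $flagInUrl): string
{
    if ($cookiesEnabled && $flagInUrl) {
        return 'strip';  // case 1: cookies work again, remove ?cookies=false
    }
    if (!$cookiesEnabled && !$flagInUrl) {
        return 'append'; // case 4: first detection, add ?cookies=false
    }
    return 'none';       // cases 2 and 3: no redirect, so no loop
}
```

In the original method, the second if block would then redirect only when the result is 'append', i.e. when cookies are disabled and the URL does not yet carry the flag.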

You must be leaving something out, since there is no loop in that code. If you meant that the browser is looping (e.g. getting continuous redirects), then I recommend installing the Live HTTP Headers extension for Firefox and watching what the browser and server are actually saying to each other. Putting some logging code in the snippet above might also be instructive.
Update for comment:
Then I really recommend putting in print statements inside the ifs so you can see which ones you're going through and what the various values are. Clearly something is not getting set the way you thought it would be, so now you need to find out what it actually is.
One thing I have encountered several times is that the code itself is OK, but there is a .htaccess file that is working against you, so go double check any .htaccess files in any of the directories, starting from DOCUMENT_ROOT.

weird problem with file_get_contents and twitter

I made this function to verify a user's Twitter credentials. It's running on two different web servers.
<?php
function twitauth($username, $password){
    if(@file_get_contents("http://".$username.":".$password."@twitter.com//account/verify_credentials.xml")){
        return "1";}
    else {
        return "0";}
}
?>
On my webserver, it works fine. On the other one, it ALWAYS returns 1! Even when password is intentionally wrong.
What in the world would cause one server to do one thing, and the other to do something else?

When I visit that url with any combination of username/password it always returns something, whether it's auth successful or failure. file_get_contents() only returns FALSE when it fails to open the requested url.
It seems to me for your function to be successful you would have to parse the return value to determine whether or not the auth was successful.

Remove the '@' sign from the function call to see the error message (if there is one).
Some PHP configurations don't allow opening files over the HTTP protocol, so look into cURL, or try looking up the official Twitter API to see if they have authentication functions for you to use.

I came up with an alternative solution.
<?php
function twitauth($username, $password){
    $xml = @simplexml_load_file("http://".$username.":".$password."@twitter.com/statuses/friends_timeline.xml");
    $noway = $xml->error;
    $errorcheck = "Could not authenticate you.";
    if($noway == $errorcheck){
        return "0";
    } else {
        return "1";
    }
}
?>

The @ symbol (error suppression) in front of file_get_contents might be suppressing an error. Try removing it and see what error you get. Also, you might be seeing different behavior on different servers due to PHP configuration. Specifically, the allow_url_fopen setting determines whether file_get_contents can work with URLs. Check this setting on both servers (maybe with ini_get(), or find the setting in the output of phpinfo()).

Here is an updated response that returns real booleans instead of strings. It also checks directly that the response is not the error message, rather than testing for the error message first.
<?php
function twitauth($username, $password){
    $xml = @simplexml_load_file("http://". urlencode($username) .":". urlencode($password) ."@twitter.com/statuses/friends_timeline.xml");
    // simplexml_load_file() returns false when the request or the parsing fails
    if ($xml === false) return false;
    return ($xml->error != "Could not authenticate you.");
}
?>
file_get_contents() only returns the body of the response, which can describe an authenticated user or be an error response. You need SimpleXML or a similar parser to read the response and determine whether the user was authenticated. A successful response looks like:
<?xml version="1.0" encoding="UTF-8"?>
<user>
<id>800316</id>
<name>Garrett</name>
<screen_name>garrettb</screen_name>
<location>WHER>!, CA, USA</location>
<description>Build websites, wants to be rich, and loves my Mac. You?</description>
<profile_image_url>http://a1.twimg.com/profile_images/185221952/pic_normal.png</profile_image_url>
<url></url>
<protected>false</protected>
<followers_count>158</followers_count>
<profile_background_color>352726</profile_background_color>
<profile_text_color>3E4415</profile_text_color>
<profile_link_color>D02B55</profile_link_color>
<profile_sidebar_fill_color>99CC33</profile_sidebar_fill_color>
<profile_sidebar_border_color>829D5E</profile_sidebar_border_color>
<friends_count>139</friends_count>
<created_at>Wed Feb 28 06:03:17 +0000 2007</created_at>
<favourites_count>18</favourites_count>
<utc_offset>-28800</utc_offset>
<time_zone>Pacific Time (US &amp; Canada)</time_zone>
<profile_background_image_url>http://s.twimg.com/a/1251845223/images/themes/theme5/bg.gif</profile_background_image_url>
<profile_background_tile>false</profile_background_tile>
<statuses_count>1781</statuses_count>
<notifications></notifications>
<verified>false</verified>
<following></following>
<status>
<created_at>Wed Sep 02 19:07:59 +0000 2009</created_at>
<id>3716655439</id>
<text>@lucaspatton09 take a picture, I want to see.</text>
<source><a href="http://www.atebits.com/" rel="nofollow">Tweetie</a></source>
<truncated>false</truncated>
<in_reply_to_status_id>3716512637</in_reply_to_status_id>
<in_reply_to_user_id>59230940</in_reply_to_user_id>
<favorited>false</favorited>
<in_reply_to_screen_name>lucaspatton09</in_reply_to_screen_name>
</status>
</user>
If the request is denied (bad access), it will have an authentication dialog drop down, which is probably causing you problems.
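Parsing the body rather than testing it for truthiness could look like the sketch below. It rests on assumptions taken from this thread only: that a successful response has a <user> root element and that a failure carries an <error> element. isAuthenticatedResponse() is a hypothetical name, and the error document in the usage line is made up for illustration.

```php
<?php
// Hypothetical check: decide from the response body whether the
// credentials were accepted. Anything that is not a parsable <user>
// document without an <error> element counts as a failure.
function isAuthenticatedResponse($body): bool
{
    if (!is_string($body) || $body === '') {
        return false; // the request itself failed or returned nothing
    }
    $previous = libxml_use_internal_errors(true); // keep parse warnings quiet
    $xml = simplexml_load_string($body);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($xml === false) {
        return false; // not XML at all, e.g. an HTML error page
    }
    return $xml->getName() === 'user' && !isset($xml->error);
}
```

For example, isAuthenticatedResponse('<hash><error>Could not authenticate you.</error></hash>') is false, while a <user> document like the one shown above would be accepted.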

file_get_contents usually emits a warning and returns false when it encounters an HTTP error code, but on your other server it probably returns the body of the error page (this may be controlled by some configuration).
The code below should work for both cases:
$response = @file_get_contents("http://".$username.":".$password."@twitter.com//account/verify_credentials.xml");
// false means the request itself failed, so treat it as not authenticated
if($response !== false && strpos($response, "Could not authenticate you.") === false) {
    echo "credentials ok";
} else {
    echo "credentials not ok";
}
