Getting HTTP response of a URL in PHP

I am trying to retrieve the status of a URL. I am writing PHP code to retrieve it, but I am not getting any output; nothing is displayed.
I am reading the URLs from an XML file and storing each one in a variable. I am doing
file_get_contents($url);
echo $http_respone_header[0];
$url contains the URL which I have read from the XML file.

What you are doing there is not getting the URL status but the content of the site. For a wrong/invalid URL, file_get_contents() returns false, as described in the documentation here.
If you are trying to get the status, you can simply use the solution described in another topic on this site.
<?php
$url = 'http://www.wp.pl';
$handle = curl_init($url);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, TRUE);
/* Get the HTML or whatever is linked in $url. */
$response = curl_exec($handle);
/* Check for 404 (file not found). */
$httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
echo $httpCode;
curl_close($handle);
?>
Link to the mentioned topic: here
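As a side note, the variable in the question is misspelled: PHP populates $http_response_header (not $http_respone_header), and only after the file_get_contents() call returns. A minimal sketch of that approach, assuming allow_url_fopen is enabled:
<?php
$url = 'http://www.wp.pl';
// Suppress the warning for unreachable hosts; false means the request failed outright.
$content = @file_get_contents($url);
if ($content === false) {
    echo 'Request failed';
} else {
    // $http_response_header is filled in by the HTTP wrapper after the call.
    echo $http_response_header[0]; // e.g. "HTTP/1.1 200 OK"
}
?>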

Related

A better way to test whether a URL is a video or a 404 page?

I am using the following code to see if a video is ready to embed on my site. It exits if the video is a 404 page and continues if it is anything else, including a video of course. Well, in theory.
I was confused for about an hour about why it stopped working, but it must be because the video is ready now and my code is trying to include the whole video rather than just fetching a header or something.
Is there a better way to do it?
$url=$videourl;
$handle = curl_init($url);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, TRUE);
/* Get the HTML or whatever is linked in $url. */
$response = curl_exec($handle);
/* Check for 404 (file not found). */
$httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
if($httpCode == 404) {
exit("video not ready!");
}
curl_close($handle);
It sounds like you want to make a HEAD request. Use:
curl_setopt($handle, CURLOPT_NOBODY, true);
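A rough sketch of the whole check as a HEAD request, assuming $videourl is set as in your snippet:
$handle = curl_init($videourl);
// HEAD request: ask only for the response headers, never the body.
curl_setopt($handle, CURLOPT_NOBODY, true);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
curl_exec($handle);
/* Check for 404 (file not found) without ever downloading the video. */
$httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
curl_close($handle);
if ($httpCode == 404) {
    exit("video not ready!");
}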

Grabbing Destination Link of a Redirect

Hopefully I am just overlooking this.
I am trying to grab the destination URL of a redirect link using PHP, to get the site URL behind an affiliate/cloaked link.
Best example: http://tinyurl.com/2tx goes to google.com
NOTE: This is an example; the links are created dynamically.
Right now I pass the URL through
www.mysite.com/redirect.php?link=http://tinyurl.com/2tx
Here is the code from the site. NOTE: since the URL has ampersands in it, I had to go this route instead of reading it straight from $_GET.
<?php
$name = http_build_query($_GET);
// you may then want to strip away the leading 'name='
$name = substr($name, strlen('name='));
//change link to a nice URL
$url = rawurldecode($name);
?>
I have a simple script that grabs the URL; how could I process it to get the destination URL?
Hopefully that's not too confusing.
Cheers,
Robb
You should post some of your code next time. I assume you are using cURL to do this. It's fairly simple:
// sanitize/validate $_GET['link'] before passing it to cURL
$ch = curl_init($_GET['link']);
//follow redirects
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_exec($ch);
$url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
EDIT: per Dagon, you just want to "know the url but not go there." It is more efficient to use this setting if you only need to know the url but not get its contents:
curl_setopt($ch, CURLOPT_NOBODY, true);
Here is how I would do it (Read the comments):
<?php
// Connect to the page:
$ch = curl_init("http://tinyurl.com/2tx");
// Don't get the body (remove if you want the body):
curl_setopt($ch, CURLOPT_NOBODY, true);
// Follow the page redirects:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
// Return the data as a string (remove to echo it to the page):
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Execute:
curl_exec($ch);
// Get data:
print_r($data = curl_getinfo($ch));
// Get just the url:
echo $data["url"];
Make an HTTP HEAD request to the URL you have. You will get back an HTTP 301 or 302 response with the destination URL in the Location header.
Example: put your URL here to see the response returned when making an HTTP HEAD request.
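If you prefer to do it in PHP rather than an online tool, here is a minimal sketch with cURL: issue a HEAD request, keep the headers, and read the Location header yourself instead of following the redirect:
$ch = curl_init('http://tinyurl.com/2tx');
// Headers only, no body, and don't follow the redirect.
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$headers = curl_exec($ch);
curl_close($ch);
// The destination URL is in the Location header of the 301/302 response.
if (preg_match('/^Location:\s*(.+)$/mi', $headers, $m)) {
    echo trim($m[1]);
}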
This might be an encoding issue. The parameter in your URL is not encoded, so it's probably damaged when trying to get it using $_GET.
You want to use this URL:
www.mysite.com/redirect.php?link=http%3A%2F%2Ftinyurl.com%2F2tx
You can encode URL variables in PHP using the urlencode() function. The variable that (I think) you want can now be accessed like this:
echo $_GET['link']; // http://tinyurl.com/2tx
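A small sketch of both ends of that, assuming you control the page that generates the redirect link:
// When generating the link, encode the parameter value:
$link = 'http://www.mysite.com/redirect.php?link=' . urlencode('http://tinyurl.com/2tx');
// In redirect.php, PHP has already decoded it for you:
$target = $_GET['link']; // http://tinyurl.com/2tx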

Validate link href attribute

I need to periodically loop through links stored in my database and check, in PHP, whether each link leads to a valid page. If the link has expired or is invalid, I don't want to output it. How can I efficiently check that the href value leads to a valid page?
Thanks for any pointers.
You can also run multiple cURL requests in parallel (curl_multi) to check the whole list faster. Check here.
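A rough sketch of the parallel check with curl_multi, assuming $urls is your array of links (HEAD requests only, so no bodies are downloaded):
$mh = curl_multi_init();
$handles = array();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD request, no body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}
// Run all transfers until every handle has finished.
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);
// Collect the status codes and clean up.
$status = array();
foreach ($handles as $url => $ch) {
    $status[$url] = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);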
Look into cURL. It allows you to pull a site in PHP: http://www.php.net/manual/en/function.curl-exec.php. Then just check for either a status code on the response or something like a title tag.
I'm kind of a noob myself, but I would suggest using cURL. A quick Google search revealed the following code (which I haven't tested):
<?php
$statusCode = validateurl($_REQUEST['url']);
if ($statusCode == '200')
    echo 'Voila! URL ' . $_REQUEST['url'] .
        ' exists, returned code is: ' . $statusCode;
else
    echo 'Oops! URL ' . $_REQUEST['url'] .
        ' does NOT exist, returned code is: ' . $statusCode;
function validateurl($url)
{
    // Initialize the handle
    $ch = curl_init();
    // Set the URL to be executed
    curl_setopt($ch, CURLOPT_URL, $url);
    // Set the curl option to include the header in the output
    curl_setopt($ch, CURLOPT_HEADER, true);
    // Set the curl option NOT to output the body content
    curl_setopt($ch, CURLOPT_NOBODY, true);
    /* Set to TRUE to return the transfer
       as a string of the return value of curl_exec(),
       instead of outputting it directly */
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Execute it
    $data = curl_exec($ch);
    // Finally close the handle
    curl_close($ch);
    /* We're interested only in the HTTP status code returned,
       so use preg_match to extract it; the captured group
       in the returned array is the status code */
    preg_match("/HTTP\/1\.[01]\s(\d{3})/", $data, $matches);
    return $matches[1];
}
?>
Source: http://www.ajaxapp.com/2009/03/23/to-validate-if-an-url-exists-use-php-curl/

Check if a URL in an anchor tag exists on a remote page

My project requires a free user to put a URL to the project site on their own website, to get SEO and backlinks for the project site. So I want to check whether
<a href='http://examplesite.com'>example site</a>
exists on the page at the URL given by the registering user.
I will have to run this check multiple times, so I want a solution that is not too resource-hungry.
I don't think cURL is that much more resource-hungry (if at all) than using some other PHP function to fetch the remote resource; they all use the same basic principles.
But file_get_contents() is a viable alternative if cURL isn't available. You can use stream contexts to mimic most of cURL's capabilities, like sending an appropriate user-agent header, etc.
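A minimal sketch of the stream-context approach, with a user-agent and a timeout, plus a naive substring check for the anchor from the question (the page URL here is just a placeholder):
$context = stream_context_create(array(
    'http' => array(
        'method'     => 'GET',
        'timeout'    => 10,
        'user_agent' => 'Mozilla/5.0 (compatible; LinkChecker/1.0)',
    ),
));
$html = @file_get_contents('http://example.com/page-to-check', false, $context);
if ($html !== false && strpos($html, "href='http://examplesite.com'") !== false) {
    // The backlink is present on the page.
}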
There is nothing wrong with this code resource-wise.
// Initializes a new session and return a cURL handle
$handle = curl_init($url);
// Sets an option on the given cURL session handle
curl_setopt($handle, CURLOPT_RETURNTRANSFER, TRUE);
// Execute the given cURL session
$response = curl_exec($handle);
// Gets information about the last transfer.
// CURLINFO_HTTP_CODE - Last received HTTP code
$httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
if($httpCode == 404) {
// your code here
} else {
// your code here
}
curl_close($handle);
Thanks everyone. This function did what I wanted:
function checkurl($url, $urltocheckfor){
    $input = @file_get_contents($url);
    $regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";
    $matches = array();
    if (preg_match_all("/$regexp/siU", $input, $matches)) {
        // $matches[2] holds the href values of all anchors found on the page
        if (in_array($urltocheckfor, $matches[2]))
            return true;
        else
            return false;
    }
    return false;
}

PHP - How to check URLs for 404/Timeout?

Here is my structure:
MySQL: table toys ---> columns: id, url. How do I get my PHP script to check all of those URLs to see if they are alive or return a 404? Try not to echo or display the results on the page; I need to record them in MySQL in an extra column "checks".
Results will be in this format:
http://asdasd.adas --- up --- 404
It will be in PHP/cURL if possible. I have been trying for ages and gave up, so decided to ask here.
The URLs are all located in my database.
In cURL, there's the curl_getinfo() function, which returns some info about the current handle:
<?php
// Create a curl handle
$ch = curl_init('http://www.yahoo.com/');
// Execute
curl_exec($ch);
// fill in the error/timeout checks here (see the sketch below).
$http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
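A sketch of filling in that error/timeout part: set timeouts on the handle, and treat a cURL error (which includes a timeout) separately from the HTTP status code:
$ch = curl_init('http://www.yahoo.com/');
curl_setopt($ch, CURLOPT_NOBODY, true);          // a HEAD request is enough for a status check
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);     // give up connecting after 5 seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 10);           // give up on the whole request after 10 seconds
curl_exec($ch);
if (curl_errno($ch)) {
    $result = 'timeout/error: ' . curl_error($ch);
} else {
    $result = curl_getinfo($ch, CURLINFO_HTTP_CODE); // e.g. 200 or 404
}
curl_close($ch);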
I trust you're able to run a SQL query and enumerate through the results, so here's the cURL part. For each URL, send it a HEAD request, and check the result code.
<?php
$handle = curl_init($yourURL);
curl_setopt($handle, CURLOPT_NOBODY, true);
curl_exec($handle);
$result = curl_getinfo($handle, CURLINFO_HTTP_CODE);
// $result now contains the HTTP result code the page sent
?>
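And a rough sketch of wiring that into the table from the question (toys, with id, url and the extra checks column), assuming an existing PDO connection in $pdo:
<?php
$rows = $pdo->query('SELECT id, url FROM toys')->fetchAll(PDO::FETCH_ASSOC);
$update = $pdo->prepare('UPDATE toys SET checks = ? WHERE id = ?');
foreach ($rows as $row) {
    $handle = curl_init($row['url']);
    curl_setopt($handle, CURLOPT_NOBODY, true);
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($handle, CURLOPT_TIMEOUT, 10);
    curl_exec($handle);
    // "timeout" when cURL reports an error, otherwise the HTTP code (200, 404, ...).
    $checks = curl_errno($handle) ? 'timeout' : curl_getinfo($handle, CURLINFO_HTTP_CODE);
    curl_close($handle);
    // Record the result in the extra column instead of echoing it.
    $update->execute(array($checks, $row['id']));
}
?>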
