Validate link href attribute - php

I need to periodically loop through the links in my PHP database to check whether each link leads to a valid page. If a link has expired or is invalid, I don't want to output it. How can I efficiently check that the href value leads to a valid page?
Thanks for any pointers.

You can also issue multiple cURL requests at the same time (for example with the curl_multi_* functions) to check the whole list faster.
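A rough sketch of the parallel approach, in case it helps. The function names isAliveStatus() and checkLinks() are made up for illustration, and treating every 2xx/3xx response as "valid" is an assumption you may want to tighten. Some servers also refuse HEAD requests, so a non-2xx result is not always proof that a page is gone.

```php
<?php
// Hypothetical helper: treat 2xx and 3xx responses as "alive" (an assumption).
function isAliveStatus(int $status): bool
{
    return $status >= 200 && $status < 400;
}

// Hypothetical helper: check many URLs in parallel.
// Returns an array of [url => HTTP status code]; 0 means no response was received.
function checkLinks(array $urls, int $timeout = 10): array
{
    $multi = curl_multi_init();
    $handles = [];

    foreach (array_unique($urls) as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_NOBODY, true);          // send a HEAD request, skip the body
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not echo anything
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // report the status after redirects
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0);              // wait for activity instead of busy-looping
        }
    } while ($running > 0 && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $url => $ch) {
        $results[$url] = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_multi_remove_handle($multi, $ch);           // the handle is freed when it goes out of scope
    }
    curl_multi_close($multi);

    return $results;
}
```

You would then loop over the result and only output the links for which isAliveStatus() returns true.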

Look into cURL. It lets you fetch a page from PHP: http://www.php.net/manual/en/function.curl-exec.php Then just check either the status code of the response or something in the body, such as the title tag.

I'm kind of a noob myself, but I would suggest using cURL. A quick Google search on using it revealed the following code (which I haven't tested):
<?php
$url = $_REQUEST['url'];
$statusCode = validateurl($url);
// Escape the URL before echoing it back, since it comes from user input
if ($statusCode == '200')
    echo 'Voila! URL ' . htmlspecialchars($url) .
        ' exists, returned code is: ' . $statusCode;
else
    echo 'Oops! URL ' . htmlspecialchars($url) .
        ' does NOT exist, returned code is: ' . $statusCode;

function validateurl($url)
{
    // Initialize the handle
    $ch = curl_init();
    // Set the URL to be requested
    curl_setopt($ch, CURLOPT_URL, $url);
    // Include the response headers in the output
    curl_setopt($ch, CURLOPT_HEADER, true);
    // Do NOT download the body content (this sends a HEAD request)
    curl_setopt($ch, CURLOPT_NOBODY, true);
    /* Return the transfer as a string from curl_exec()
    instead of outputting it directly */
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Execute it
    $data = curl_exec($ch);
    // Finally close the handle
    curl_close($ch);
    /* We are only interested in the HTTP status code, so we use
    preg_match to extract it from the status line; it ends up in
    the second element of $matches. curl_exec() returns false on
    failure, in which case nothing matches and null is returned. */
    preg_match('/HTTP\/[\d.]+\s(\d{3})/', (string) $data, $matches);
    return isset($matches[1]) ? $matches[1] : null;
}
?>
Source: http://www.ajaxapp.com/2009/03/23/to-validate-if-an-url-exists-use-php-curl/

Related

PHP cURL returns encrypted html page

I'm trying to fetch simple HTML with a cURL GET request in PHP.
A plain GET request to a URL like http://example.com/ (not exactly this domain) returns the HTML I need, but a GET request to a page on the same domain, like http://example.com/something, returns what looks like gzip-compressed data.
What I have already tried to fix this issue:
curl_setopt($ch, CURLOPT_ENCODING, ''); // returns ''
curl_setopt($ch, CURLOPT_ENCODING, 'gzip'); // returns ''
curl_setopt($ch, CURLOPT_ENCODING, 'gzip,compressed'); // returns ''
$html = gzdecode($data); // data error
By the way, in an inspector like Fiddler this page shows similar weird symbols, but that is fixed with one click: 'Click to decrypt'. How can I decode my data programmatically in PHP?
If I understood you correctly, you need to get the HTML content from a URL.
Please, check this link:
Get HTML from URL using curl in PHP
You don't need to use CURLOPT_ENCODING in curl_setopt.
EDIT
I tried this and it works:
<?php
function get_data($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$html_content = get_data('https://stackoverflow.com/questions/61548866/php-curl-returns-encrypted-html-page/61549219?noredirect=1#comment108875034_61549219');
echo "You are getting HTML code from an url <br>".$html_content;
?>
Thank you, I hope it helps you.
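If the body still comes back compressed, one thing to try is checking for the gzip signature before decoding. This is only a sketch: decodeIfGzipped() is a made-up name, and it assumes the server really used gzip. If it used a different encoding, gzdecode() will fail and the body is returned unchanged. Per the PHP manual, setting CURLOPT_ENCODING to an empty string asks cURL to advertise every encoding it supports and to decode the response itself, so this helper is only a fallback.

```php
<?php
// Hypothetical helper: decode the body only if it starts with the gzip magic bytes.
function decodeIfGzipped(string $body): string
{
    // Every gzip stream begins with the two bytes 0x1f 0x8b.
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = @gzdecode($body);   // suppress the warning on corrupt data
        if ($decoded !== false) {
            return $decoded;
        }
    }
    // Not gzip (or not decodable): hand the body back untouched.
    return $body;
}
```

You would call it on the string returned by curl_exec(), for example decodeIfGzipped($data).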

Grabbing Destination Link of a Redirect

Hopefully I am just overlooking this.
I am trying to grab the destination URL of a redirect link using PHP. It's to get the site URL of an affiliate/cloaked link.
Best example: http://tinyurl.com/2tx goes to google.com
NOTE: This is an example, the links are created dynamically
Right now I pass the URL through
www.mysite.com/redirect.php?link=http://tinyurl.com/2tx
Here is the code from the site. NOTE: since the URLs have ampersands in them, I had to go this route instead of reading a single $_GET value.
<?php
$name = http_build_query($_GET);
// you may then want to strip away the leading 'name='
$name = substr($name, strlen('name='));
//change link to a nice URL
$url = rawurldecode($name);
?>
I have a simple script that grabs the URL, how could I process the URL to get the destination URL?
Hopefully that's not too confusing.
Cheers,
Robb
You should post some of your code next time. I assume you are using cURL to do this. It's fairly simple:
// NOTE: validate/sanitize $_GET['link'] before using it
$ch = curl_init($_GET['link']);
//follow redirects
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_exec($ch);
$url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
EDIT: per Dagon, you just want to "know the url but not go there." It is more efficient to use this setting if you only need to know the url but not get its contents:
curl_setopt($ch, CURLOPT_NOBODY, true);
Here is how I would do it (Read the comments):
<?php
// Connect to the page:
$ch = curl_init("http://tinyurl.com/2tx");
// Don't get the body (remove if you want the body):
curl_setopt($ch, CURLOPT_NOBODY, true);
// Follow the page redirects:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
// Return the data as a string (remove to echo to the page):
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Execute:
curl_exec($ch);
// Get data:
print_r($data = curl_getinfo($ch));
// Get just the url:
echo $data["url"];
Make an HTTP HEAD request to the URL you have. You will get back an HTTP 301 or 302 response with the destination URL in the Location header.
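To read the destination out of such a response yourself, something like the following might work. It is a sketch only: extractLocation() is a made-up name, and it assumes you fetched the raw headers as a string (for instance with CURLOPT_HEADER and CURLOPT_NOBODY enabled and CURLOPT_FOLLOWLOCATION left off).

```php
<?php
// Hypothetical helper: return the last Location header found in a raw header block,
// or null when the response contains no redirect.
function extractLocation(string $rawHeaders): ?string
{
    // Header names are case-insensitive, hence the /i flag; /m makes ^ and $ work per line.
    if (preg_match_all('/^Location:\s*(.+?)\s*$/mi', $rawHeaders, $matches) > 0) {
        return end($matches[1]);
    }
    return null;
}
```

Keep in mind that a relative Location value would still need to be resolved against the original URL.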
This might be an encoding issue. The parameter in your URL is not encoded, so it's probably damaged when trying to get it using $_GET.
You want to use this URL:
www.mysite.com/redirect.php?link=http%3A%2F%2Ftinyurl.com%2F2tx
You can encode URL variables in PHP using the urlencode() function. The variable that (I think) you want can now be accessed like this:
echo $_GET['link']; // http://tinyurl.com/2tx
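A small sketch of the encoding step, assuming the redirect script lives at the www.mysite.com/redirect.php address from the question. buildRedirectLink() is a made-up name.

```php
<?php
// Hypothetical helper: build the link to redirect.php with the target safely encoded.
function buildRedirectLink(string $target): string
{
    // urlencode() escapes ':', '/', '?', '&' and '=' so the whole target
    // survives as one query parameter, ampersands included.
    return 'http://www.mysite.com/redirect.php?link=' . urlencode($target);
}
```

Because the ampersands are escaped, PHP puts the complete target URL back into $_GET['link'] on the receiving side, so the http_build_query() workaround should no longer be needed.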

Header() substitute

Hi, I am new to PHP and want to know whether there is an alternative to header('location:mysit.php');
I am in a scenario where I am sending the request like this:
header('Location: http://localhost/(some external site).php?var=test');
What I want is to send the values of some variables to the external site, but without that page showing up in the browser.
I mean the variables should be sent to the external site/page, while on screen the user is redirected to my login page. I don't know of any alternative, so please guide me. Thanks.
You are looking for PHP cURL:
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);
// grab URL and pass it to the browser
curl_exec($ch);
// close cURL resource, and free up system resources
curl_close($ch);
Set the location header to the place you actually want to redirect the browser to and use something like cURL to make an HTTP request to the remote site.
The way you usually would do that is by sending those parameters by cURL, parse the return values and use them however you need.
By using cURL you can pass POST and GET variables to any URL.
Like so:
$ch = curl_init('http://example.org/?aVariable=theValue');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
curl_close($ch);
Now, in $result you have the response from the URL passed to curl_init().
If you need to post data, the code needs a little more:
$ch = curl_init('http://example.org/page_to_post_to.php');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, 'variable1=value1&variable2=value2');
$result = curl_exec($ch);
curl_close($ch);
Again, the result of your POST request is saved to $result.
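Putting the pieces together for the original scenario, a sketch might look like this. encodeFields() and postInBackground() are made-up names, and login.php in the usage comment is only a placeholder for your own login page.

```php
<?php
// Hypothetical helper: turn an array into a properly escaped POST body.
function encodeFields(array $fields): string
{
    // http_build_query() URL-encodes keys and values, so you do not
    // have to assemble 'a=1&b=2' strings by hand.
    return http_build_query($fields, '', '&');
}

// Hypothetical helper: POST the fields to a remote URL without involving the browser.
// Returns the response body as a string, or false if the request failed.
function postInBackground(string $url, array $fields)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, encodeFields($fields));
    return curl_exec($ch);
}

// Usage (left commented out so that nothing is sent when this file is included):
// postInBackground('http://example.org/page_to_post_to.php', ['var' => 'test']);
// header('Location: login.php');   // the browser only ever sees this redirect
// exit;
```

The remote request happens on the server, so the visitor never sees the external page.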
You could connect to another URL in the background in numerous ways. There's cURL ( http://php.net/curl - already mentioned here in previous comments ), there's fopen ( http://php.net/manual/en/function.fopen.php ), there's fsockopen ( http://php.net/manual/en/function.fsockopen.php - little more advanced )

setting cookie through curl

I am trying to set a cookie through cURL in PHP, but it fails. My PHP code looks like this:
<?php
$ch = curl_init();
$url="http://localhost/javascript%20cookies/test_cookies.html";
curl_setopt($ch, CURLOPT_COOKIE, 'user=1');
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$contents = curl_exec($ch);
curl_close($ch);
echo $contents;
?>
The file test_cookies.html contains JavaScript that checks for a cookie and, if it finds it, displays the content with additional user content.
But when I use the above script, it displays the contents of test_cookies.html without the additional user content, which means the cookie is not being set.
I tried writing another script like this:
<?php
header("Set-Cookie:user=1");
header("Location:test_cookies.html");
?>
This works: it sets the cookie and shows the additional user content too.
I also tried using
curl_setopt($ch,CURLOPT_COOKIEFILE,"cookie.txt");
curl_setopt($ch,CURLOPT_COOKIEJAR,"cookie.txt");
This writes the cookie information to the file, but does not read it when fetching the page.
Can somebody help?
Since javascript conducts the check in the browser, you should set the cookie before sending the output to the browser. So you need to combine both scripts:
$ch=curl_init();
$url="http://localhost/javascript%20cookies/test_cookies.html";
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$contents = curl_exec($ch);
curl_close($ch);
header("Set-Cookie:user=1");
echo $contents;
Explanation:
Please note that we are looking at two transfers here:
data is fetched by curl, and then
it is sent to the browser.
This is a special case where you are using curl to get the content from localhost, but in real-life uses you'd use curl to get content from a 3rd host.
If you receive different content based on whether a cookie is sent in the request or not, then you should set the cookie with curl. In most cases you can then send the content with no additional tweaking to the browser. But here, you make the decision with checking for a cookie in the browser, so you don't need the first cookie setting, and you do need the second one.
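For the first case (the remote server decides based on the cookie), the value passed to CURLOPT_COOKIE is a string of name=value pairs separated by a semicolon and a space. A small sketch for building it; buildCookieString() is a made-up name, and URL-encoding the values is an assumption that suits most servers but not necessarily all.

```php
<?php
// Hypothetical helper: build the string expected by CURLOPT_COOKIE,
// e.g. "user=1; lang=en".
function buildCookieString(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        // Encode the value so that ';' or spaces inside it cannot break the header.
        $pairs[] = $name . '=' . rawurlencode((string) $value);
    }
    return implode('; ', $pairs);
}

// Usage (commented out, $ch would be an existing cURL handle):
// curl_setopt($ch, CURLOPT_COOKIE, buildCookieString(['user' => 1]));
```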
JavaScript will not run when the page is fetched with cURL; it only runs once the content reaches a browser.

PHP Proxy for getting other domain content

Can I write a PHP file (index.php) so that when someone points their browser to
http://www.domain.org/some?params=a&b=1
it returns the content of
http://www.OTHERdomain.org/some?params=a&b=1
Should I use cURL?
From http://www.php.net/manual/en/curl.examples-basic.php#88055, this is the code you need:
<?php
// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, "example.com");
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// $output contains the output string
$output = curl_exec($ch);
// close curl resource to free up system resources
curl_close($ch);
?>
You could create a page called proxy.php or something like that that takes a URL as the parameter. Then, you can replace domain.org with otherdomain.org in the URL. Then use CURL to get the contents and return it.
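A minimal sketch of that idea, assuming the target host is fixed and only the path and query string are carried over. buildTargetUrl() and proxyRequest() are made-up names, and the sketch ignores request headers, POST bodies and the remote content type.

```php
<?php
// Hypothetical helper: map the incoming request URI onto the other domain.
// $requestUri is the path plus query string, e.g. "/some?params=a&b=1".
function buildTargetUrl(string $requestUri, string $targetHost): string
{
    return 'http://' . $targetHost . $requestUri;
}

// Hypothetical helper: fetch the remote page.
// Returns the body as a string, or false if the request failed.
function proxyRequest(string $requestUri, string $targetHost)
{
    $ch = curl_init(buildTargetUrl($requestUri, $targetHost));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    return curl_exec($ch);
}

// Usage in index.php (commented out so that nothing is fetched on include):
// echo proxyRequest($_SERVER['REQUEST_URI'], 'www.OTHERdomain.org');
```

Hard-coding the target host, rather than accepting an arbitrary URL as a parameter, keeps the script from turning into an open proxy.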
