This is the complete code for a single-domain crawler I am building. It has a big problem: when I checked the database, I found it had saved some links again and again, because the crawler revisits pages in an infinite loop. I want to fix this without querying the database, because checking every link against the database would slow the crawler down. How can I do that? Also, do you have any suggestions for making it faster?
<?php
include_once('ganon.php');
ini_set('display_errors', '1');
function gethost($link)
{
$link = trim($link, '/');
if (!preg_match('#^http(s)?://#', $link))
{
$link = 'http://' . $link;
}
$urlParts = parse_url($link);
$domain = preg_replace('/^www\./', '', $urlParts['host']);
return $domain;
}
function store($raw, $link)
{
$html = str_get_dom($raw);
$title = $html('title', 0)->getPlainText();
$con = @mysqli_connect('somehost', 'someuser', 'somepassword', 'somedatabase');
if (!$con)
{
echo "Error: " . mysqli_connect_error();
exit();
}
$query = "INSERT INTO `somedatabase`.`sometable` (`title`, `url`) VALUES ('$title', '$link');";
mysqli_query($con, $query);
mysqli_close($con);
echo $title."<br>";
}
function crawl_save_crawl($target)
{
$curl = curl_init($target);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
$result = curl_exec($curl);
if(curl_errno($curl))
{
echo 'Curl error: ' . curl_error($curl);
}
curl_close($curl);
$dom = str_get_dom($result);
foreach($dom('a') as $element)
{
$href = $element->href;
if (0 !== strpos($href, 'http'))
{
$path = '/' . ltrim($href, '/');
if (extension_loaded('http'))
{
$href = http_build_url("http://www.".gethost($target), array('path' => $path));
}
else
{
$parts = parse_url("http://www.".gethost($target));
$href = $parts['scheme'] . '://';
if (isset($parts['user']) && isset($parts['pass']))
{
$href .= $parts['user'] . ':' . $parts['pass'] . '@';
}
$href .= $parts['host'];
if (isset($parts['port']))
{
$href .= ':' . $parts['port'];
}
$href .= dirname($parts['path'], 1).$path;
}
}
if (gethost($target) == gethost($href))
{
crawl_save_crawl($href);
}
}
store($result, $target);
}
$url=$_GET['u'];
crawl_save_crawl($url);
?>
In your crawl_save_crawl() function, you could store the links you've already visited and so stop the code from going back to them. Using a static variable isn't ideal, but in such a limited piece of code it serves its purpose (holding its value across calls).
This doesn't stop it searching things other pages have searched, but it does stop it looping back on itself.
function crawl_save_crawl($target)
{
static $alreadyDone = null;
if ( $alreadyDone == null ) {
$alreadyDone = [$target];
}
On the first call, this adds the starting URL to the list, which would otherwise be missing.
Then, before you call the same routine again, you can test whether the link has already been checked...
$visit = trim(str_replace(["http://","https://"], "", $href), '/');
if (in_array($visit, $alreadyDone) === false &&
gethost($target) == gethost($href))
{
$alreadyDone[] = $visit;
crawl_save_crawl($href);
}
This may still look as though it's visiting the same page twice, but because your logic sometimes creates an href with 'www.' at the start, the URL may differ from the one without it. So when crawling stackoverflow, both stackoverflow.com and www.stackoverflow.com appear.
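The 'www.' mismatch described above can be avoided by reducing every URL to one canonical key before it is remembered. The following is only a sketch under stated assumptions: normalize_url() and mark_visited() are hypothetical helper names that do not appear in the question, and the normalization rules (drop the scheme, a leading 'www.' and a trailing slash) are one possible choice, not the only one.

```php
<?php
// Hypothetical helper: reduce a URL to a canonical key so that
// http://www.example.com/a/ and https://example.com/a count as one page.
function normalize_url(string $url): string
{
    $parts = parse_url(trim($url));
    if ($parts === false || !isset($parts['host'])) {
        return ''; // not an absolute URL, nothing to normalize
    }
    $host  = preg_replace('/^www\./', '', strtolower($parts['host']));
    $path  = rtrim($parts['path'] ?? '', '/');
    $query = isset($parts['query']) ? '?' . $parts['query'] : '';
    return $host . $path . $query;
}

// Hypothetical helper: returns true only the first time a URL is seen.
// The normalized URL is used as an array key, so the check is a hash
// lookup rather than the linear scan that in_array() performs.
function mark_visited(array &$visited, string $url): bool
{
    $key = normalize_url($url);
    if ($key === '' || isset($visited[$key])) {
        return false;
    }
    $visited[$key] = true;
    return true;
}

$visited = [];
var_dump(mark_visited($visited, 'http://www.stackoverflow.com/questions/'));
var_dump(mark_visited($visited, 'https://stackoverflow.com/questions'));
```

Inside crawl_save_crawl() the recursive call would then be guarded by mark_visited($alreadyDone, $href), keeping the list in memory as the question asks.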
In the code below, execution stops when Kiyoh is unreachable. This is not acceptable in production. What is the best way to replace the die() call so that the rest of the page still renders even when Kiyoh is unreachable?
The code looks like:
<?php
// Get KiyOh rating
$readdir = $_SERVER['DOCUMENT_ROOT'] . '/kiyoh/';
$file = 'kiyohdata.dat';
// Open the file to get existing content
$kiyohdata = file_get_contents($readdir . $file);
if( $kiyohdata === false ) { // NOT CACHED
$xml = simplexml_load_file('https://www.kiyoh.nl/widgetfeed.php?company=YYYYY') or die("Error: Cannot create object");
$kiyohdata = explode(",", $xml->channel->description);
if(!empty($kiyohdata)) {
file_put_contents($readdir . $file, serialize($kiyohdata));
}
} else { // IS CACHED
$kiyohdata = unserialize($kiyohdata);
}
$cijfer = str_replace('Average score ', '', $kiyohdata[0]);
$cijfer = str_replace('.', ',', $cijfer);
$aantal = str_replace(' Total reviews ', '', $kiyohdata[2]);
?>
I tried with placing an exception: throw new Exception("Kiyoh is not available at the moment");, but the page crashes anyway (Magento 1.9.4.3 webshop).
This will send you the error by email:
<?php
$to_email_address = 'webmaster@example.com';
// Get KiyOh rating
$readdir = $_SERVER['DOCUMENT_ROOT'] . '/kiyoh/';
$file = 'kiyohdata.dat';
// Open the file to get existing content
$kiyohdata = file_get_contents($readdir . $file);
if( $kiyohdata === false ) { // NOT CACHED
$xml = simplexml_load_file('https://www.kiyoh.nl/widgetfeed.php?company=YYYYY') or mail($to_email_address,"GOOGOO happened!","Error: Cannot create object");
$kiyohdata = explode(",", $xml->channel->description);
if(!empty($kiyohdata)) {
file_put_contents($readdir . $file, serialize($kiyohdata));
}
} else { // IS CACHED
$kiyohdata = unserialize($kiyohdata);
}
$cijfer = str_replace('Average score ', '', $kiyohdata[0]);
$cijfer = str_replace('.', ',', $cijfer);
$aantal = str_replace(' Total reviews ', '', $kiyohdata[2]);
?>
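One thing to watch in the snippet above: when the feed cannot be loaded, $xml is false and the following lines still read from it and may cache an empty result. A possible way around this is to parse the feed in a small function that returns null on any failure, so the caller can skip the rating block and keep rendering. This is a sketch, not Kiyoh's documented API: parse_kiyoh_feed() is a hypothetical name, and the requirement of at least three comma-separated parts is an assumption taken from the question's use of indices 0 and 2.

```php
<?php
// Hypothetical helper: turn the raw feed body into its comma-separated
// parts, or return null when the feed is missing or malformed.
function parse_kiyoh_feed($raw): ?array
{
    if (!is_string($raw) || $raw === '') {
        return null; // the request failed, nothing to parse
    }
    $previous = libxml_use_internal_errors(true); // collect XML errors quietly
    $xml = simplexml_load_string($raw);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($xml === false || !isset($xml->channel->description)) {
        return null;
    }
    $parts = explode(',', (string) $xml->channel->description);
    // Assumption: the page reads indices 0 and 2, so require three parts.
    return count($parts) >= 3 ? $parts : null;
}

// Usage sketch: @file_get_contents() returns false when the host is down.
// $raw = @file_get_contents('https://www.kiyoh.nl/widgetfeed.php?company=YYYYY');
// $kiyohdata = parse_kiyoh_feed($raw);
// if ($kiyohdata === null) { /* skip the rating block, keep rendering */ }
```

Only a non-null result would be written to the cache file, so a temporary outage does not get cached.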
I want to check whether a Quake 3 game server is online or offline: if it is offline, echo 'Server is offline'; if it is online, echo 'Server is online'.
I'm using this library:
As you can see, the library already has an isOnline function, which I think tells whether the server is online, but I don't know how to output its result.
Calling the game server data:
<?php
include 'test/GameServerQuery.php';
$data = GameServerQuery::queryQuake3('1.1.1.1', 28960);
echo 'Hostname: ' . $data['sv_hostname'] . '<br />';
echo 'Players online: ' . $data['sv_maxclients'] . '<br />'; /// How can I count online players / maxclients? ex.: 0/20
echo 'Punkbuster: ' . $data['sv_punkbuster'] . '<br />';
?>
Here is relevant code from the library (in case the link should die or change):
public static function isOnline ($host, $port, $type)
{
if ($type == 'minecraft') { // No need for the full ping
return @fclose (@fsockopen ( $host , $port , $err , $errstr , 2 ));
}
if (method_exists('GameServerQuery', 'query'.$type)) {
return self::{'query'.$type}($host , $port);
}
return @fclose (@fsockopen ( $host , $port , $err , $errstr , 2 ));
}
public static function queryQuake3($host, $port)
{
$reponse = self::ping($host, $port, "\xFF\xFF\xFF\xFFgetstatus\x00");
if ($reponse === false || substr($reponse, 0, 5) !== "\xFF\xFF\xFF\xFFs") {
return false;
}
$reponse = substr($reponse, strpos($reponse, chr(10))+2);
$info = array();
$joueurs = substr($reponse, strpos($reponse,chr(10))+2);
$reponse = substr($reponse, 0, strpos($reponse, chr(10)));
while($reponse != ''){
$info[self::getString($reponse, '\\')] = self::getString($reponse, '\\');
}
if (!empty($joueurs)) {
$info['players'] = array();
while ($joueurs != ''){
$details = self::getString($joueurs, chr(10));
$info['players'][] = array('frag' => self::getString($details, ' '),
'ping' => self::getString($details, ' '),
'name' => $details);
}
}
return $info;
}
private static function ping($host, $port, $command)
{
$socket = @stream_socket_client('udp://'.$host.':'.$port, $errno, $errstr, 2);
if (!$errno && $socket) {
stream_set_timeout($socket, 2);
fwrite($socket, $command);
$buffer = @fread($socket, 1500);
fclose($socket);
return $buffer;
}
return false;
}
private static function getString(&$chaine, $chr = "\x00")
{
$data = strstr($chaine, $chr, true);
$chaine = substr($chaine, strlen($data) + 1);
return $data;
}
It's a static function, just like the one you're already calling. Something like this would do the job, I think:
$result = GameServerQuery::isOnline('1.1.1.1', 28960, "Quake3");
print_r($result);
That will show you what result you get back. I suspect it will be the same as the queryQuake3 function actually, because if you specify "Quake3" as the last parameter, the isOnline function will simply call the "queryQuake3" function and pass the result back directly.
So, the function should return either false if the server is offline or otherwise unresponsive, and either true, or a more complex dataset if it's online.
So in fact I think you could write:
$result = GameServerQuery::isOnline('1.1.1.1', 28960, "Quake3");
if ($result === false) {
echo "Server is offline";
}
else {
echo "Server is online";
}
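The question's code comment also asks how to show online players against the maximum (for example 0/20). Judging from the library code quoted in the question, queryQuake3() returns false on failure and otherwise an array in which 'players' is only set when someone is connected. Under that assumption, a sketch could look like the following; format_player_count() is a hypothetical helper, not part of GameServerQuery.

```php
<?php
// Hypothetical helper built on the array shape returned by queryQuake3():
// 'players' is only present when at least one player is connected.
function format_player_count($data): string
{
    if (!is_array($data)) {
        return 'Server is offline'; // the query returned false
    }
    $online = isset($data['players']) ? count($data['players']) : 0;
    $max    = $data['sv_maxclients'] ?? '?'; // '?' if the server omits it
    return $online . '/' . $max;
}

// Usage sketch:
// $data = GameServerQuery::queryQuake3('1.1.1.1', 28960);
// echo 'Players online: ' . format_player_count($data) . '<br />';
```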
Right now there is a page redirection using
header("Location: {$_SERVER['HTTP_REFERER']}");
but the URL that the page is redirecting to is something like:
http://localhost:5110/page.php?1st=2&2nd=140413&3rd=547859
how can I remove a part of the URL of the redirection?
the URL should be like :
http://localhost:5110/page.php?1st=2&3rd=547859
If you have any other suggestions for this, let me know...
thanks.
$referer = parse_url($_SERVER['HTTP_REFERER']);
parse_str($referer['query'], $query);
unset($query['2nd']); // unset the desired element
$referer['query'] = http_build_query($query);
$url = '';
if (array_key_exists('scheme', $referer)) { $url .= "{$referer['scheme']}://"; }
if (array_key_exists('host', $referer)) { $url .= $referer['host']; }
if (array_key_exists('port', $referer)) { $url .= ":{$referer['port']}"; }
if (array_key_exists('path', $referer)) { $url .= $referer['path']; }
if (array_key_exists('query', $referer)) { $url .= "?{$referer['query']}"; }
if (array_key_exists('fragment', $referer)) { $url .= "#{$referer['fragment']}"; }
header("Location: $url");
Try this:
$str = 'http://localhost:5110/page.php?1st=2&2nd=140413&3rd=547859';
echo remove_qs_key($str,"2nd");
function remove_qs_key($url, $key) {
$url = preg_replace('/(?:&|(\?))' . preg_quote($key, '/') . '=[^&]*(?(1)&|)?/i', "$1", $url);
return $url;
}
Result: http://localhost:5110/page.php?1st=2&3rd=547859
$ref = explode("?",$_SERVER['HTTP_REFERER']);
parse_str($ref[1], $qs);
unset($qs['query param to remove']);
$qs = http_build_query($qs);
$ref = $ref[0].'?'.$qs;
$server = $_SERVER['HTTP_HOST']; // Host name and port as sent by the client (localhost:5110)
$file = $_SERVER['SCRIPT_NAME']; // Script path without the query string (/page.php)
echo $server.$file; // Outputs localhost:5110/page.php
I'm using a PHP script (using cURL) to check whether:
The links in my database are correct (ie return HTTP status 200)
The links are in fact redirected and redirect to an appropriate/similar page (using the contents of the page's title element)
The results of this are saved to a log file and emailed to me as an attachment.
This is all fine and working, however it is slow as all hell and half the time it times out and aborts itself early. Of note, I have about 16,000 links to check.
Was wondering how best to make this run quicker, and what I'm doing wrong?
Code below:
function echoappend ($file,$tobewritten) {
fwrite($file,$tobewritten);
echo $tobewritten;
}
error_reporting(E_ALL);
ini_set('display_errors', '1');
$filename=date('YmdHis') . "linkcheck.htm";
echo $filename;
$file = fopen($filename,"w+");
try {
$conn = new PDO('mysql:host=localhost;dbname=databasename',$un,$pw);
$conn->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
echo '<b>connected to db</b><br /><br />';
$sitearray = array("medical.posterous","ebm.posterous","behavenet","guidance.nice","www.rch","emedicine","www.chw","www.rxlist","www.cks.nhs.uk");
foreach ($sitearray as $key => $value) {
$site=$value;
echoappend ($file, "<h1>" . $site . "</h1>");
$q="SELECT * FROM link WHERE url LIKE :site";
$stmt = $conn->prepare($q);
$stmt->execute(array(':site' => 'http://' . $site . '%'));
$result = $stmt->fetchAll();
$totallinks = 0;
$workinglinks = 0;
foreach($result as $row)
{
$ch = curl_init();
$originalurl = $row['url'];
curl_setopt($ch, CURLOPT_URL, $originalurl);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
$output = curl_exec($ch);
if ($output === FALSE) {
echo "cURL Error: " . curl_error($ch);
}
$urlinfo = curl_getinfo($ch);
if ($urlinfo['http_code'] == 200)
{
echoappend($file, $row['name'] . ": <b>working!</b><br />");
$workinglinks++;
}
else if ($urlinfo['http_code'] == 301 || 302)
{
$redirectch = curl_init();
curl_setopt($redirectch, CURLOPT_URL, $originalurl);
curl_setopt($redirectch, CURLOPT_HEADER, 1);
curl_setopt($redirectch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($redirectch, CURLOPT_NOBODY, false);
curl_setopt($redirectch, CURLOPT_FOLLOWLOCATION, true);
$redirectoutput = curl_exec($redirectch);
$doc = new DOMDocument();
@$doc->loadHTML($redirectoutput);
$nodes = $doc->getElementsByTagName('title');
$title = $nodes->item(0)->nodeValue;
echoappend ($file, $row['name'] . ": <b>redirect ... </b>" . $title . " ... ");
if (strpos(strtolower($title),strtolower($row['name']))===false) {
echoappend ($file, "FAIL<br />");
}
else {
$header = curl_getinfo($redirectch);
echoappend ($file, $header['url']);
echoappend ($file, "SUCCESS<br />");
}
curl_close($redirectch);
}
else
{
echoappend ($file, $row['name'] . ": <b>FAIL code</b>" . $urlinfo['http_code'] . "<br />");
}
curl_close($ch);
$totallinks++;
}
echoappend ($file, '<br />');
echoappend ($file, $site . ": " . $workinglinks . "/" . $totallinks . " links working. <br /><br />");
}
$conn = null;
echo '<br /><b>connection closed</b><br /><br />';
} catch(PDOException $e) {
echo 'ERROR: ' . $e->getMessage();
}
Short answer is use the curl_multi_* methods to parallelize your requests.
The reason for the slowness is that web requests are comparatively slow. Sometimes VERY slow. Using the curl_multi_* functions lets you run multiple requests simultaneously.
One thing to be careful about is to limit the number of requests you run at once. In other words, don't run 16,000 requests at once. Maybe start at 16 and see how that goes.
The following example should help you get started:
<?php
//
// Fetch a bunch of URLs in parallel. Returns an array of results indexed
// by URL.
//
function fetch_urls($urls, $curl_options = array()) {
$curl_multi = curl_multi_init();
$handles = array();
$options = $curl_options + array(
CURLOPT_HEADER => true,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_NOBODY => true,
CURLOPT_FOLLOWLOCATION => true);
foreach($urls as $url) {
$handles[$url] = curl_init($url);
curl_setopt_array($handles[$url], $options);
curl_multi_add_handle($curl_multi, $handles[$url]);
}
$active = null;
do {
$status = curl_multi_exec($curl_multi, $active);
} while ($status == CURLM_CALL_MULTI_PERFORM);
while ($active && ($status == CURLM_OK)) {
if (curl_multi_select($curl_multi) != -1) {
do {
$status = curl_multi_exec($curl_multi, $active);
} while ($status == CURLM_CALL_MULTI_PERFORM);
}
}
if ($status != CURLM_OK) {
trigger_error("Curl multi read error $status\n", E_USER_WARNING);
}
$results = array();
foreach($handles as $url => $handle) {
$results[$url] = curl_getinfo($handle);
curl_multi_remove_handle($curl_multi, $handle);
curl_close($handle);
}
curl_multi_close($curl_multi);
return $results;
}
//
// The urls to test
//
$urls = array("http://google.com", "http://yahoo.com", "http://google.com/probably-bogus", "http://www.google.com.au");
//
// The number of URLs to test simultaneously
//
$request_limit = 2;
//
// Test URLs in batches
//
$redirected_urls = array();
for ($i = 0 ; $i < count($urls) ; $i += $request_limit) {
$results = fetch_urls(array_slice($urls, $i, $request_limit));
foreach($results as $url => $result) {
if ($result['http_code'] == 200) {
$status = "Worked!";
} else {
$status = "FAILED with {$result['http_code']}";
}
if ($result["redirect_count"] > 0) {
array_push($redirected_urls, $url);
echo "{$url}: redirected to {$result['url']} and {$status}\n";
} else {
echo "{$url}: {$status}\n";
}
}
}
//
// Handle redirected URLs
//
echo "Processing redirected URLs...\n";
for ($i = 0 ; $i < count($redirected_urls) ; $i += $request_limit) {
$results = fetch_urls(array_slice($redirected_urls, $i, $request_limit), array(CURLOPT_FOLLOWLOCATION => false));
foreach($results as $url => $result) {
if ($result['http_code'] == 301) {
echo "{$url} permanently redirected to {$result['redirect_url']}\n";
} else if ($result['http_code'] == 302) {
echo "{$url} temporarily redirected to {$result['redirect_url']}\n";
} else {
echo "{$url}: FAILED with {$result['http_code']}\n";
}
}
}
The above code processes a list of URLs in batches. It works in two passes. In the first pass, each request is configured to follow redirects and simply reports whether each URL ultimately led to a successful request or a failure.
The second pass processes any redirected URLs detected in the first pass and reports whether the redirect was a permanent redirection (meaning you can update your database with the new URL), or temporary (meaning you should NOT update your database).
NOTE:
In your original code, you have the following line, which will not work the way you expect it to:
else if ($urlinfo['http_code'] == 301 || 302)
The expression ALWAYS evaluates to TRUE, because it is read as ($urlinfo['http_code'] == 301) || 302, and 302 on its own is truthy. The correct expression is:
else if ($urlinfo['http_code'] == 301 || $urlinfo['http_code'] == 302)
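If more status codes need to be treated as redirects later, the comparison can also be written as a membership test. This is just an alternative sketch; is_redirect_code() is a hypothetical helper name, and the list contains only the two codes the question checks.

```php
<?php
// Hypothetical helper: true for the two redirect codes the question tests.
function is_redirect_code(int $code): bool
{
    // Strict comparison, so the integer codes are matched exactly.
    return in_array($code, [301, 302], true);
}

// Usage sketch:
// else if (is_redirect_code($urlinfo['http_code'])) { ... }
```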
Also, put
set_time_limit(0);
at the top of your script to stop it aborting when it hits 30 seconds.
I am trying to study class.openid.php because it is simpler and smaller than lightopenid; for my purposes, 200 lines do matter. But class.openid.php does not work with the Google OpenID https://www.google.com/accounts/o8/id and prints this error:
ERROR CODE: OPENID_NOSERVERSFOUND
ERROR DESCRIPTION: Cannot find OpenID Server TAG on Identity page.
Is it possible to make class.openid.php (any version) work with Google OpenID, and if so, how?
class.openid.php can be taken here, but it did not work for me out of the box, so I had to find every <? and replace it with <?php. In case someone would like to see it, here is the code I've got:
html interface page:
<?php
require('class.openid.v3.php');
if ($_POST['openid_action'] == "login"){ // Get identity from user and redirect browser to OpenID Server
$openid = new SimpleOpenID;
$openid->SetIdentity($_POST['openid_url']);
$openid->SetTrustRoot('http://' . $_SERVER["HTTP_HOST"]);
$openid->SetRequiredFields(array('email','fullname'));
$openid->SetOptionalFields(array('dob','gender','postcode','country','language','timezone'));
if ($openid->GetOpenIDServer()){
$openid->SetApprovedURL('http://' . $_SERVER["HTTP_HOST"] . $_SERVER["PATH_INFO"]); // Send Response from OpenID server to this script
$openid->Redirect(); // This will redirect user to OpenID Server
}else{
$error = $openid->GetError();
echo "ERROR CODE: " . $error['code'] . "<br>";
echo "ERROR DESCRIPTION: " . $error['description'] . "<br>";
}
exit;
}
else if($_GET['openid_mode'] == 'id_res'){ // Perform HTTP Request to OpenID server to validate key
$openid = new SimpleOpenID;
$openid->SetIdentity($_GET['openid_identity']);
$openid_validation_result = $openid->ValidateWithServer();
if ($openid_validation_result == true){ // OK HERE KEY IS VALID
echo "VALID";
}else if($openid->IsError() == true){ // ON THE WAY, WE GOT SOME ERROR
$error = $openid->GetError();
echo "ERROR CODE: " . $error['code'] . "<br>";
echo "ERROR DESCRIPTION: " . $error['description'] . "<br>";
}else{ // Signature Verification Failed
echo "INVALID AUTHORIZATION";
}
}else if ($_GET['openid_mode'] == 'cancel'){ // User Canceled your Request
echo "USER CANCELED REQUEST";
}
?>
<html>
<head>
<title>OpenID Example</title>
</head>
<body>
<div>
<fieldset id="openid">
<legend>OpenID Login</legend>
<form action="<?php echo 'http://' . $_SERVER["HTTP_HOST"] . $_SERVER["PATH_INFO"]; ?>" method="post" onsubmit="this.login.disabled=true;">
<input type="hidden" name="openid_action" value="login">
<div><input type="text" name="openid_url" class="openid_login"><input type="submit" name="login" value="login >>"></div>
<div><a href="http://www.myopenid.com/" class="link" >Get an OpenID</a></div>
</form>
</fieldset>
</div>
<div style="margin-top: 2em; font-family: arial; font-size: 0.8em; border-top:1px solid gray; padding: 4px;">Sponsored by: FiveStores - get your free online store; includes extensive API for developers; <i style="color: gray;">integrated with OpenID</i></div>
</body>
</html>
and php class
<?php
/*
FREE TO USE Under License: GPLv3
Simple OpenID PHP Class
Some modifications by Eddie Roosenmaallen, eddie@roosenmaallen.com
*/
class SimpleOpenID{
var $openid_url_identity;
var $URLs = array();
var $error = array();
var $fields = array(
'required' => array(),
'optional' => array(),
);
function SimpleOpenID(){
if (!function_exists('curl_exec')) {
die('Error: Class SimpleOpenID requires curl extension to work');
}
}
function SetOpenIDServer($a){
$this->URLs['openid_server'] = $a;
}
function SetTrustRoot($a){
$this->URLs['trust_root'] = $a;
}
function SetCancelURL($a){
$this->URLs['cancel'] = $a;
}
function SetApprovedURL($a){
$this->URLs['approved'] = $a;
}
function SetRequiredFields($a){
if (is_array($a)){
$this->fields['required'] = $a;
}else{
$this->fields['required'][] = $a;
}
}
function SetOptionalFields($a){
if (is_array($a)){
$this->fields['optional'] = $a;
}else{
$this->fields['optional'][] = $a;
}
}
function SetIdentity($a){ // Set Identity URL
if ((stripos($a, 'http://') === false)
&& (stripos($a, 'https://') === false)){
$a = 'http://'.$a;
}
$this->openid_url_identity = $a;
}
function GetIdentity(){ // Get Identity
return $this->openid_url_identity;
}
function GetError(){
$e = $this->error;
return array('code'=>$e[0],'description'=>$e[1]);
}
function ErrorStore($code, $desc = null){
$errs['OPENID_NOSERVERSFOUND'] = 'Cannot find OpenID Server TAG on Identity page.';
if ($desc == null){
$desc = $errs[$code];
}
$this->error = array($code,$desc);
}
function IsError(){
if (count($this->error) > 0){
return true;
}else{
return false;
}
}
function splitResponse($response) {
$r = array();
$response = explode("\n", $response);
foreach($response as $line) {
$line = trim($line);
if ($line != "") {
list($key, $value) = explode(":", $line, 2);
$r[trim($key)] = trim($value);
}
}
return $r;
}
function OpenID_Standarize($openid_identity = null){
if ($openid_identity === null)
$openid_identity = $this->openid_url_identity;
$u = parse_url(strtolower(trim($openid_identity)));
if (!isset($u['path']) || ($u['path'] == '/')) {
$u['path'] = '';
}
if(substr($u['path'],-1,1) == '/'){
$u['path'] = substr($u['path'], 0, strlen($u['path'])-1);
}
if (isset($u['query'])){ // If there is a query string, then use identity as is
return $u['host'] . $u['path'] . '?' . $u['query'];
}else{
return $u['host'] . $u['path'];
}
}
function array2url($arr){ // converts associated array to URL Query String
if (!is_array($arr)){
return false;
}
$query = '';
foreach($arr as $key => $value){
$query .= $key . "=" . $value . "&";
}
return $query;
}
function CURL_Request($url, $method="GET", $params = "") { // Remember, SSL MUST BE SUPPORTED
if (is_array($params)) $params = $this->array2url($params);
$curl = curl_init($url . ($method == "GET" && $params != "" ? "?" . $params : ""));
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_HTTPGET, ($method == "GET"));
curl_setopt($curl, CURLOPT_POST, ($method == "POST"));
if ($method == "POST") curl_setopt($curl, CURLOPT_POSTFIELDS, $params);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($curl);
if (curl_errno($curl) == 0){
$response;
}else{
$this->ErrorStore('OPENID_CURL', curl_error($curl));
}
return $response;
}
function HTML2OpenIDServer($content) {
$get = array();
// Get details of their OpenID server and (optional) delegate
preg_match_all('/<link[^>]*rel=[\'"]openid.server[\'"][^>]*href=[\'"]([^\'"]+)[\'"][^>]*\/?>/i', $content, $matches1);
preg_match_all('/<link[^>]*href=[\'"]([^\'"]+)[\'"][^>]*rel=[\'"]openid.server[\'"][^>]*\/?>/i', $content, $matches2);
$servers = array_merge($matches1[1], $matches2[1]);
preg_match_all('/<link[^>]*rel=[\'"]openid.delegate[\'"][^>]*href=[\'"]([^\'"]+)[\'"][^>]*\/?>/i', $content, $matches1);
preg_match_all('/<link[^>]*href=[\'"]([^\'"]+)[\'"][^>]*rel=[\'"]openid.delegate[\'"][^>]*\/?>/i', $content, $matches2);
$delegates = array_merge($matches1[1], $matches2[1]);
$ret = array($servers, $delegates);
return $ret;
}
function GetOpenIDServer(){
$response = $this->CURL_Request($this->openid_url_identity);
list($servers, $delegates) = $this->HTML2OpenIDServer($response);
if (count($servers) == 0){
$this->ErrorStore('OPENID_NOSERVERSFOUND');
return false;
}
if (isset($delegates[0])
&& ($delegates[0] != "")){
$this->SetIdentity($delegates[0]);
}
$this->SetOpenIDServer($servers[0]);
return $servers[0];
}
function GetRedirectURL(){
$params = array();
$params['openid.return_to'] = urlencode($this->URLs['approved']);
$params['openid.mode'] = 'checkid_setup';
$params['openid.identity'] = urlencode($this->openid_url_identity);
$params['openid.trust_root'] = urlencode($this->URLs['trust_root']);
if (isset($this->fields['required'])
&& (count($this->fields['required']) > 0)) {
$params['openid.sreg.required'] = implode(',',$this->fields['required']);
}
if (isset($this->fields['optional'])
&& (count($this->fields['optional']) > 0)) {
$params['openid.sreg.optional'] = implode(',',$this->fields['optional']);
}
return $this->URLs['openid_server'] . "?". $this->array2url($params);
}
function Redirect(){
$redirect_to = $this->GetRedirectURL();
if (headers_sent()){ // Use JavaScript to redirect if content has been previously sent (not recommended, but safe)
echo '<script language="JavaScript" type="text/javascript">window.location=\'';
echo $redirect_to;
echo '\';</script>';
}else{ // Default Header Redirect
header('Location: ' . $redirect_to);
}
}
function ValidateWithServer(){
$params = array(
'openid.assoc_handle' => urlencode($_GET['openid_assoc_handle']),
'openid.signed' => urlencode($_GET['openid_signed']),
'openid.sig' => urlencode($_GET['openid_sig'])
);
// Send only required parameters to confirm validity
$arr_signed = explode(",",str_replace('sreg.','sreg_',$_GET['openid_signed']));
for ($i=0; $i<count($arr_signed); $i++){
$s = str_replace('sreg_','sreg.', $arr_signed[$i]);
$c = $_GET['openid_' . $arr_signed[$i]];
// if ($c != ""){
$params['openid.' . $s] = urlencode($c);
// }
}
$params['openid.mode'] = "check_authentication";
$openid_server = $this->GetOpenIDServer();
if ($openid_server == false){
return false;
}
$response = $this->CURL_Request($openid_server,'POST',$params);
$data = $this->splitResponse($response);
if ($data['is_valid'] == "true") {
return true;
}else{
return false;
}
}
}
?>
The problem is that Google doesn't just supply an OpenID endpoint.
OpenID endpoints include an identifier for the user.
What we have here is called a discovery URL.
This is a static URL that you can direct any user to; the service itself will recognise the user and return a unique identifying URL for that user.
This however is NOT implemented correctly by most openid client libraries, including the majority linked on the official openid website.
Even the Zend Framework libraries are incapable of handling that.
However I found a class that I analysed from various perspectives and that I am very satisfied with. At the company I work at we already integrated it successfully in several production environments and have not experienced any problems.
You may also be interested in another post of mine dealing with the issue of making Facebook an openid Provider. The class I am using, that also supports Google, can also be found there:
Best way to implement Single-Sign-On with all major providers?
The class in your question does not support OpenID 2.0 at all. Therefore, it will not work with Google without adding a lot of code.
Are you looking for something like this?
http://wiki.openid.net/w/page/12995176/Libraries
There is a PHP section on that page.