I'm trying to make a script that will load a URL entered by a user and check whether that page links back to my domain before their site is published on mine. I'm not very experienced with regular expressions, and this is what I have so far:
$loaded = file_get_contents('http://localhost/small_script/page.php');
// $loaded will be equal to the user's site they have submitted
$current_site = 'site2.com';
// $current_site is the domain of my site; this is the URL that must be found on the target site
$matches = Array();
$find = preg_match_all('/<a(.*?)href=[\'"](.*?)[\'"](.*?)\b[^>]*>(.*?)<\/a>/i', $loaded, $matches);
$c = count($matches[0]);
$z = 0;
while($z<$c){
$full_link = $matches[0][$z];
$href = $matches[2][$z];
$z++;
$check = strpos($href,$current_site);
if($check === false) {
}else{
// The link cannot have the "no follow" tag, this is to check if it does and if so, return a specific error
$pos = strpos($full_link,'no follow');
if($pos === false) {
echo $href;
}
else {
//echo "rel=no follow FOUND";
}
}
}
As you can see, it's pretty messy and I'm not entirely sure where it's headed. I was hoping someone could give me a small, fast and concise script that would do exactly what I've attempted.
Load specified URL as entered by user
Check if specified URL links back to my site (if not, return error code #1)
If link is there, check for 'no follow', if found return error code #2
If everything is OK, set a variable to true, so I can continue with other functions (like displaying their link on my page)
this is the code :)
helped by http://www.merchantos.com/makebeta/php/scraping-links-with-php/
<?php
$my_url = 'http://online.bulsam.net';
$target_url = 'http://www.bulsam.net';
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';
// make the cURL request to $target_url
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL,$target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html= curl_exec($ch);
if (!$html) {
echo "<br />cURL error number:" .curl_errno($ch);
echo "<br />cURL error:" . curl_error($ch);
exit;
}
// parse the html into a DOMDocument
$dom = new DOMDocument();
@$dom->loadHTML($html);
// grab all the <a> elements on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");
// find result
$result = is_my_link_there($hrefs, $my_url);
if ($result == 1) {
echo 'There is no link!!!';
} elseif ($result == 2) {
echo 'There is, but it is NO FOLLOW !!!';
} else {
// blah blah blah
}
// used functions
function is_my_link_there($hrefs, $my_url) {
for ($i = 0; $i < $hrefs->length; $i++) {
$href = $hrefs->item($i);
$url = $href->getAttribute('href');
if ($my_url == $url) {
$rel = $href->getAttribute('rel');
if ($rel == 'nofollow') {
return 2;
}
return 3;
}
}
return 1;
}
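One caveat with the comparison above: $my_url == $url only matches when the href is byte-for-byte identical, so http://online.bulsam.net/ (trailing slash) or any deeper page on that host would count as "no link", and rel can hold several tokens at once. A looser, host-based variant (my own sketch; has_backlink is a hypothetical name, and it keeps the same 1/2/3 return codes):
function has_backlink($hrefs, $my_url) {
    $my_host = parse_url($my_url, PHP_URL_HOST);      // e.g. online.bulsam.net
    foreach ($hrefs as $href) {
        $host = parse_url($href->getAttribute('href'), PHP_URL_HOST);
        if (!$host || strcasecmp($host, $my_host) !== 0) {
            continue;                                  // link points somewhere else
        }
        // rel can hold several space-separated tokens, e.g. "nofollow noopener"
        $rel = preg_split('/\s+/', strtolower($href->getAttribute('rel')));
        return in_array('nofollow', $rel) ? 2 : 3;
    }
    return 1;                                          // no backlink found
}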
Related
I am working with the NCBI BLAST API. I created logic to get the result in XML format, but the result comes back as HTML, and it redirects again instead of displaying the XML. When I check the original URL in a browser it shows the XML data directly. Can anyone give me a solution?
You can see the problem at peerscientist.net. This is the code:
class BlastAPI{
public function blastdata(){
$url = "https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Put&QUERY=SSWWAHVEMGPPDPILGVTEAYKRDTNSKK&PROGRAM=blastp&FILTER=L&DATABASE=nr&FORMAT_TYPE=XML";
$ncontent = $this->rid($url);
echo '<br/>RID :'.$ncontent.'<br/>';
$geturl = $this->geturl($ncontent);
echo $geturl;
$fulldata = $this->getfullData($geturl);
if($fulldata != ''){
echo '<pre>';
var_dump($fulldata);
echo '</pre>';
}else{
echo "Code Once";
}
}
public function rid($url){
$surl = $url;
echo $surl;
//header('Content-type: text/html');
$content = file_get_contents($surl);
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHtml($content);
libxml_use_internal_errors(false);
$data = $doc->getElementById("rid");
$rid = $data->getAttribute('value');
return $rid;
}
public function geturl($rid){
$srid = $rid;
$mainurl = "https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Get&RID=$srid&FORMAT_TYPE=XML&DESCRIPTIONS=200&ALIGNMENTS=200&NOHEADER=true";
return $mainurl;
}
public function getfullData($url){
$murl = $url;
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $murl);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$str = curl_exec($curl);
if ($str === FALSE) {
echo "cURL Error: " . curl_error($curl);
}
curl_close($curl);
return $str;
}
}
$nblast = new BlastAPI();
$nblast->blastdata();
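An aside that may explain the HTML you are seeing (my assumption, not confirmed by the question): BLAST searches run asynchronously, so a CMD=Get issued straight after CMD=Put usually returns an HTML "please wait" page instead of XML. If I recall the URL API correctly, you can poll the search status with FORMAT_OBJECT=SearchInfo before calling getfullData(); a rough sketch (the 10-second interval and 30-try cap are arbitrary):
function waitForRid($rid) {
    $statusUrl = "https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Get&FORMAT_OBJECT=SearchInfo&RID=" . urlencode($rid);
    for ($try = 0; $try < 30; $try++) {
        $status = file_get_contents($statusUrl);
        if (strpos($status, 'Status=READY') !== false) {
            return true;               // results can now be fetched as XML
        }
        if (strpos($status, 'Status=UNKNOWN') !== false) {
            return false;              // RID expired or was never valid
        }
        sleep(10);                     // Status=WAITING: poll again shortly
    }
    return false;                      // gave up waiting
}
Calling something like this between $this->rid($url) and $this->getfullData($geturl) should at least tell you whether the search has finished before you ask for the XML.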
I want to get all the information (date, location, price, etc.) from a remote page by parsing its HTML. I tried the script below, but nothing appears to happen:
<?php
$path = 'https://www.airbnb.com/s/Fukuoka-Prefecture--Japan?checkin=10%2F26%2F2015&checkout=11%2F03%2F2015&guests=&ss_id=xyn63dgs&page=1';
$html = file_get_contents($path);
$dom = new DOMDocument;
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('div') as $tag) {
if ($tag->getAttribute('class') === 'col-sm-12 row-space-2 col-md-6') {
echo $tag->nodeValue;
}
}
If you look at the error returned by file_get_contents(), it's giving you an error:
HTTP request failed! HTTP/1.0 403 Forbidden in yada/yada/yada.php
Try using a cURL library (or script) that emulates a browser, similar to the one below; this is what I use when file_get_contents() fails. If it still fails, remember that cURL is basically stateless, so if the destination site relies on sessions or cookies, you might be S.O.L.:
<?php
class cURL
{
public $response;
protected $sendHeader;
protected $PostFields;
private $query;
public function __construct($query = '')
{
$this->sendHeader = false;
$this->query = $query;
if(!empty($this->query)) {
if(!is_array($this->query))
$this->response = $this->Connect($this->query);
else
$this->encode();
}
}
public function SendPost($array = array())
{
$this->PostFields['payload'] = $array;
$this->PostFields['query'] = http_build_query($array);
return $this;
}
public function Connect($_url,$deJSON = true)
{
// Remote Connect
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $_url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
if(strpos($_url,"https://") !== false) {
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
}
if(!empty($this->PostFields['payload'])) {
curl_setopt($ch, CURLOPT_POST, count($this->PostFields['payload']));
curl_setopt($ch, CURLOPT_POSTFIELDS, $this->PostFields['query']);
}
if(!empty($this->sendHeader))
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11) AppleWebKit/601.1.56 (KHTML, like Gecko) Version/9.0 Safari/601.1.56');
$decode = curl_exec($ch);
$_response = ($deJSON)? json_decode($decode, true) : $decode;
$error = curl_error($ch);
curl_close($ch);
return (empty($error))? $_response: $error;
}
public function emulateBrowser()
{
$this->sendHeader = true;
return $this;
}
public function encode($_filter = 0)
{
foreach($this->query as $key => $value) {
$string[] = urlencode($key).'='.urlencode($value);
}
if($_filter == true)
$string = array_filter($string);
return implode("&",$string);
}
}
To use:
$path = 'https://www.airbnb.com/s/Fukuoka-Prefecture--Japan?checkin=10%2F26%2F2015&checkout=11%2F03%2F2015&guests=&ss_id=xyn63dgs&page=1';
$cURL = new cURL();
$html = $cURL->emulateBrowser()->connect($path,false);
$dom = new DOMDocument;
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('div') as $tag) {
if ($tag->getAttribute('class') === 'col-sm-12 row-space-2 col-md-6') {
echo $tag->nodeValue;
}
}
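As a lighter alternative (my own sketch, not part of the answer above): if all the site objects to is PHP's default User-Agent, a stream context on file_get_contents() can be enough before reaching for the full cURL class; ignore_errors lets you read the body even when the server answers with a 4xx status:
$path = 'https://www.airbnb.com/s/Fukuoka-Prefecture--Japan?checkin=10%2F26%2F2015&checkout=11%2F03%2F2015&guests=&ss_id=xyn63dgs&page=1';
$context = stream_context_create(array(
    'http' => array(
        'method'        => 'GET',
        'header'        => "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11) AppleWebKit/601.1.56 (KHTML, like Gecko) Version/9.0 Safari/601.1.56\r\n",
        'ignore_errors' => true,   // return the body even on a 4xx/5xx response
    ),
));
$html = file_get_contents($path, false, $context);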
I'm quite new to PHP and am creating a web scraper for a project. From this website, https://www.bloglovin.com/en/blogs/1/2/all, I am scraping the blog title, blog url, image url and concatenating a follow through link for later use. As you can see on the page, there are several fields with information for each blogger.
Here is my PHP code so far;
<?php
// Function to make GET request using cURL
function curlGet($url) {
$ch = curl_init(); // Initialising cURL session
// Setting cURL options
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_URL, $url);
$results = curl_exec($ch); // Executing cURL session
curl_close($ch); // Closing cURL session
return $results; // Return the results
}
$blogStats = array();
function returnXPathObject($item) {
$xmlPageDom = new DomDocument();
@$xmlPageDom->loadHTML($item);
$xmlPageXPath = new DOMXPath($xmlPageDom);
return $xmlPageXPath;
}
$blPage = curlGet('https://www.bloglovin.com/en/blogs/1/2/all');
$blPageXpath = returnXPathObject($blPage);
$title = $blPageXpath->query('//*[@id="content"]//div/a/h2/span[1]');
if ($title->length > 0) {
$blogStats['title'] = $title->item(0)->nodeValue;
}
$url = $blPageXpath->query('//*[@id="content"]//div/a/h2/span[2]');
if ($url->length > 0) {
$blogStats['url'] = $url->item(0)->nodeValue;
}
$img = $blPageXpath->query('//*[@id="content"]//div/a/div/@href');
if ($img->length > 0) {
$blogStats['img'] = $img->item(0)->nodeValue;
}
$followLink = $blPageXpath->query('//*[@id="content"]/div[1]/div/a/@href');
if ($followLink->length > 0) {
$blogStats['followLink'] = 'http://www.bloglovin.com' . $followLink->item(0)->nodeValue;
}
print_r($blogStats);
/*$data = $blogStats;
header('Content-Type: application/json');
echo json_encode($data);*/
?>
Currently, this only returns:
Array ( [title] => Fashion Toast [url] => fashiontoast.com [followLink] => http://www.bloglovin.com/blog/4735/fashion-toast )
My question is, what is the best way to loop through each of the results? I've been looking through Stack Overflow and am struggling to find an answer, and my head's going a bit loopy! If anyone could advise me or point me in the right direction, that would be fantastic.
Thank you.
Update:
I'm pretty sure this is wrong; I'm receiving errors!
<?php
// Function to make GET request using cURL
function curlGet($url) {
$ch = curl_init(); // Initialising cURL session
// Setting cURL options
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_URL, $url);
$results = curl_exec($ch); // Executing cURL session
curl_close($ch); // Closing cURL session
return $results; // Return the results
}
$blogStats = array();
function returnXPathObject($item) {
$xmlPageDom = new DomDocument();
@$xmlPageDom->loadHTML($item);
$xmlPageXPath = new DOMXPath($xmlPageDom);
return $xmlPageXPath;
}
$blogPage = curlGet('https://www.bloglovin.com/en/blogs/1/2/all');
$blogPageXpath = returnXPathObject($blogPage);
$blogger = $blogPageXpath->query('//*[@id="content"]/div/@data-blog-id');
if ($blogger->length > 0) {
$blogStats[] = $blogger->item(0)->nodeValue;
}
foreach($blogger as $id) {
$blPage = curlGet('https://www.bloglovin.com/en/blogs/1/2/all');
$blPageXpath = returnXPathObject($blPage);
$title = $blPageXpath->query('//*[@id="content"]//div/a/h2/span[1]');
if ($title->length > 0) {
$blogStats[$id]['title'] = $title->item(0)->nodeValue;
}
$url = $blPageXpath->query('//*[@id="content"]//div/a/h2/span[2]');
if ($url->length > 0) {
$blogStats[$id]['url'] = $url->item(0)->nodeValue;
}
$img = $blPageXpath->query('//*[@id="content"]//div/a/div/@href');
if ($img->length > 0) {
$blogStats[$id]['img'] = $img->item(0)->nodeValue;
}
$followLink = $blPageXpath->query('//*[@id="content"]/div[1]/div/a/@href');
if ($followLink->length > 0) {
$blogStats[$id]['followLink'] = 'http://www.bloglovin.com' . $followLink->item(0)->nodeValue;
}
}
print_r($blogStats);
/*$data = $blogStats;
header('Content-Type: application/json');
echo json_encode($data);*/ ?>
Maybe you want to actually add a dimension to your array; I guess bloggers have a unique ID, or some such identifier.
Moreover, your code seems to execute only once; it probably needs to be inside something like a foreach.
I can't do that part for you, but you need an array containing each blogger, or a way to do a while or for loop; you have to work out how to iterate over your different bloggers yourself :)
Here is an example of an array of bloggers:
[14]['bloggerOne']
[15]['bloggerTwo']
[16]['bloggerThree']
foreach ($blogger as $id => $name)
{
$blPage = curlGet('https://www.bloglovin.com/en/blogs/1/2/' . $name);
// here you have something to do so that $blPage is actually different with each iteration, like changing the url
$blPageXpath = returnXPathObject($blPage);
$title = $blPageXpath->query('//*[@id="content"]//div/a/h2/span[1]');
if ($title->length > 0) {
$blogStats[$id]['title'] = $title->item(0)->nodeValue;
}
$url = $blPageXpath->query('//*[@id="content"]//div/a/h2/span[2]');
if ($url->length > 0) {
$blogStats[$id]['url'] = $url->item(0)->nodeValue;
}
$img = $blPageXpath->query('//*[@id="content"]//div/a/div/@href');
if ($img->length > 0) {
$blogStats[$id]['img'] = $img->item(0)->nodeValue;
}
$followLink = $blPageXpath->query('//*[@id="content"]/div[1]/div/a/@href');
if ($followLink->length > 0) {
$blogStats[$id]['followLink'] = 'http://www.bloglovin.com' . $followLink->item(0)->nodeValue;
}
}
So after the foreach, your array could look like:
['12345']['title'] = whatever
['url'] = url
['img'] = foo
['followLink'] = bar
['4141']['title'] = other
['url'] = urlss
['img'] = foo
['followLink'] = bar
['7415']['title'] = still
['url'] = url4
['img'] = foo
['followLink'] = bar
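To make that concrete for the Bloglovin page (a sketch of mine, not tested against the live markup): rather than re-fetching the page for every blogger, you can select each listing container once and run the field XPaths relative to it by passing the container node as the second argument to DOMXPath::query(). I'm assuming each listing div carries the data-blog-id attribute your own XPath already targets:
$blPage = curlGet('https://www.bloglovin.com/en/blogs/1/2/all');
$blPageXpath = returnXPathObject($blPage);
// one node per blogger; assumes each listing div has a data-blog-id attribute
$containers = $blPageXpath->query('//*[@id="content"]/div[@data-blog-id]');
$blogStats = array();
foreach ($containers as $container) {
    $id = $container->getAttribute('data-blog-id');
    // the '.' prefix makes each query relative to the current container
    $title = $blPageXpath->query('.//a/h2/span[1]', $container);
    $url   = $blPageXpath->query('.//a/h2/span[2]', $container);
    $link  = $blPageXpath->query('.//a/@href', $container);
    if ($title->length) { $blogStats[$id]['title'] = $title->item(0)->nodeValue; }
    if ($url->length)   { $blogStats[$id]['url']   = $url->item(0)->nodeValue; }
    if ($link->length)  { $blogStats[$id]['followLink'] = 'http://www.bloglovin.com' . $link->item(0)->nodeValue; }
}
print_r($blogStats);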
I'm using a PHP script (using cURL) to check whether:
The links in my database are correct (ie return HTTP status 200)
Links that are redirected in fact go to an appropriate/similar page (checked using the contents of the page)
The results of this are saved to a log file and emailed to me as an attachment.
This is all fine and working; however, it is slow as all hell, and half the time it times out and aborts early. Of note, I have about 16,000 links to check.
I was wondering how best to make this run quicker, and what I'm doing wrong?
Code below:
function echoappend ($file,$tobewritten) {
fwrite($file,$tobewritten);
echo $tobewritten;
}
error_reporting(E_ALL);
ini_set('display_errors', '1');
$filename=date('YmdHis') . "linkcheck.htm";
echo $filename;
$file = fopen($filename,"w+");
try {
$conn = new PDO('mysql:host=localhost;dbname=databasename',$un,$pw);
$conn->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
echo '<b>connected to db</b><br /><br />';
$sitearray = array("medical.posterous","ebm.posterous","behavenet","guidance.nice","www.rch","emedicine","www.chw","www.rxlist","www.cks.nhs.uk");
foreach ($sitearray as $key => $value) {
$site=$value;
echoappend ($file, "<h1>" . $site . "</h1>");
$q="SELECT * FROM link WHERE url LIKE :site";
$stmt = $conn->prepare($q);
$stmt->execute(array(':site' => 'http://' . $site . '%'));
$result = $stmt->fetchAll();
$totallinks = 0;
$workinglinks = 0;
foreach($result as $row)
{
$ch = curl_init();
$originalurl = $row['url'];
curl_setopt($ch, CURLOPT_URL, $originalurl);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
$output = curl_exec($ch);
if ($output === FALSE) {
echo "cURL Error: " . curl_error($ch);
}
$urlinfo = curl_getinfo($ch);
if ($urlinfo['http_code'] == 200)
{
echoappend($file, $row['name'] . ": <b>working!</b><br />");
$workinglinks++;
}
else if ($urlinfo['http_code'] == 301 || 302)
{
$redirectch = curl_init();
curl_setopt($redirectch, CURLOPT_URL, $originalurl);
curl_setopt($redirectch, CURLOPT_HEADER, 1);
curl_setopt($redirectch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($redirectch, CURLOPT_NOBODY, false);
curl_setopt($redirectch, CURLOPT_FOLLOWLOCATION, true);
$redirectoutput = curl_exec($redirectch);
$doc = new DOMDocument();
@$doc->loadHTML($redirectoutput);
$nodes = $doc->getElementsByTagName('title');
$title = $nodes->item(0)->nodeValue;
echoappend ($file, $row['name'] . ": <b>redirect ... </b>" . $title . " ... ");
if (strpos(strtolower($title),strtolower($row['name']))===false) {
echoappend ($file, "FAIL<br />");
}
else {
$header = curl_getinfo($redirectch);
echoappend ($file, $header['url']);
echoappend ($file, "SUCCESS<br />");
}
curl_close($redirectch);
}
else
{
echoappend ($file, $row['name'] . ": <b>FAIL code</b>" . $urlinfo['http_code'] . "<br />");
}
curl_close($ch);
$totallinks++;
}
echoappend ($file, '<br />');
echoappend ($file, $site . ": " . $workinglinks . "/" . $totallinks . " links working. <br /><br />");
}
$conn = null;
echo '<br /><b>connection closed</b><br /><br />';
} catch(PDOException $e) {
echo 'ERROR: ' . $e->getMessage();
}
Short answer is use the curl_multi_* methods to parallelize your requests.
The reason for the slowness is that web requests are comparatively slow. Sometimes VERY slow. Using the curl_multi_* functions lets you run multiple requests simultaneously.
One thing to be careful about is to limit the number of requests you run at once. In other words, don't run 16,000 requests at once. Maybe start at 16 and see how that goes.
The following example should help you get started:
<?php
//
// Fetch a bunch of URLs in parallel. Returns an array of results indexed
// by URL.
//
function fetch_urls($urls, $curl_options = array()) {
$curl_multi = curl_multi_init();
$handles = array();
$options = $curl_options + array(
CURLOPT_HEADER => true,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_NOBODY => true,
CURLOPT_FOLLOWLOCATION => true);
foreach($urls as $url) {
$handles[$url] = curl_init($url);
curl_setopt_array($handles[$url], $options);
curl_multi_add_handle($curl_multi, $handles[$url]);
}
$active = null;
do {
$status = curl_multi_exec($curl_multi, $active);
} while ($status == CURLM_CALL_MULTI_PERFORM);
while ($active && ($status == CURLM_OK)) {
if (curl_multi_select($curl_multi) != -1) {
do {
$status = curl_multi_exec($curl_multi, $active);
} while ($status == CURLM_CALL_MULTI_PERFORM);
}
}
if ($status != CURLM_OK) {
trigger_error("Curl multi read error $status\n", E_USER_WARNING);
}
$results = array();
foreach($handles as $url => $handle) {
$results[$url] = curl_getinfo($handle);
curl_multi_remove_handle($curl_multi, $handle);
curl_close($handle);
}
curl_multi_close($curl_multi);
return $results;
}
//
// The urls to test
//
$urls = array("http://google.com", "http://yahoo.com", "http://google.com/probably-bogus", "http://www.google.com.au");
//
// The number of URLs to test simultaneously
//
$request_limit = 2;
//
// Test URLs in batches
//
$redirected_urls = array();
for ($i = 0 ; $i < count($urls) ; $i += $request_limit) {
$results = fetch_urls(array_slice($urls, $i, $request_limit));
foreach($results as $url => $result) {
if ($result['http_code'] == 200) {
$status = "Worked!";
} else {
$status = "FAILED with {$result['http_code']}";
}
if ($result["redirect_count"] > 0) {
array_push($redirected_urls, $url);
echo "{$url}: ${status}\n";
} else {
echo "{$url}: redirected to {$result['url']} and {$status}\n";
}
}
}
//
// Handle redirected URLs
//
echo "Processing redirected URLs...\n";
for ($i = 0 ; $i < count($redirected_urls) ; $i += $request_limit) {
$results = fetch_urls(array_slice($redirected_urls, $i, $request_limit), array(CURLOPT_FOLLOWLOCATION => false));
foreach($results as $url => $result) {
if ($result['http_code'] == 301) {
echo "{$url} permanently redirected to {$result['url']}\n";
} else if ($result['http_code'] == 302) {
echo "{$url} termporarily redirected to {$result['url']}\n";
} else {
echo "{$url}: FAILED with {$result['http_code']}\n";
}
}
}
The above code processes a list of URLs in batches. It works in two passes. In the first pass, each request is configured to follow redirects and simply reports whether each URL ultimately led to a successful request or a failure.
The second pass processes any redirected URLs detected in the first pass and reports whether the redirect was a permanent redirection (meaning you can update your database with the new URL), or temporary (meaning you should NOT update your database).
NOTE:
In your original code, you have the following line, which will not work the way you expect it to:
else if ($urlinfo['http_code'] == 301 || 302)
The expression will ALWAYS return TRUE. The correct expression is:
else if ($urlinfo['http_code'] == 301 || $urlinfo['http_code'] == 302)
Also, put
set_time_limit(0);
at the top of your script to stop it aborting when it hits 30 seconds.
I'm using simplexml_load_file to pull album information from the LastFM API and having no problems when the requested album matches.
However, when the album is not found, LastFM returns an error, which causes the code below to output a "failed to open stream" error.
I can see that LastFM is giving me exactly what I need, but am unsure how to proceed. What is the proper way to update the code so that this error/error code is correctly handled?
Code:
$feed = simplexml_load_file("http://ws.audioscrobbler.com/2.0/?method=album.getinfo&api_key=".$apikey."&artist=".$artist."&album=".$album."&autocorrect=".$autocorrect);
$albums = $feed->album;
foreach($albums as $album) {
$name = $album->name;
$img = $album->children();
$img_big = $img->image[4];
$img_small = $img->image[2];
$releasedate = $album->releasedate;
$newdate = date("F j, Y", strtotime($releasedate));
if ($img == "") {
$img = $emptyart; }
}
?>
if ($headers[0] == 'HTTP/1.0 400 Bad Request') {
$img_big = $emptyart;
$img_small = $emptyart;
}
That will break with a 403 error ...
Method 1 (not practical):
Basically you can mute the error with @ and check whether your $feed is empty. In that case, "some error" has happened: either a problem with your URL or a failed status from Last.fm.
$feed = @simplexml_load_file("http://ws.audioscrobbler.com/2.0/?method=album.getinfo&api_key=".$apikey."&artist=".$artist."&album=".$album."&autocorrect=".$autocorrect);
if ($feed)
{
// fetch the album info
}
else
{
echo "No results found.";
}
Either way, you get no results, so you can probably satisfy your needs by showing "No results found" instead of the Last.fm error status code.
Method 2 (recommended) - Use curl:
function load_file_from_url($url) {
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_REFERER, 'http://www.test.com/');
$str = curl_exec($curl);
if($str === false)
{
echo 'Error loading feed, please try again later.';
}
curl_close($curl);
return $str;
}
function load_xml_from_url($url) {
return @simplexml_load_string(load_file_from_url($url));
}
function getFeed($feedURL)
{
$feed = load_xml_from_url($feedURL);
if ($feed)
{
if ($feed['status'] == "failed")
{
echo "FAIL";
}
else
{
echo "WIN";
}
}
else {echo "Last.fm error";}
}
/* Example:
Album: Heritage
Artist: Opeth
*/
$url = "http://ws.audioscrobbler.com/2.0/?method=album.getinfo&api_key=273f3dc84c2d55501f3dc25a3d61eb39&artist=opeth&album=hwwXCvuiweitage&autocorrect=0";
$feed = getFeed ($url);
// will echo FAIL
$url2 = "http://ws.audioscrobbler.com/2.0/?method=album.getinfo&api_key=273f3dc84c2d55501f3dc25a3d61eb39&artist=opeth&album=heritage&autocorrect=0";
$feed2 = getFeed ($url2);
// will echo WIN
Wouldn't it be better to first get the url, check if it doesn't contain the error, and then use simplexml_load_string instead?
Working along the lines of evaluating the state of the URL prior to passing it to simplexml_load_file, I tried get_headers, and that, coupled with an if/else statement, seems to have gotten things working.
$url = "http://ws.audioscrobbler.com/2.0/?method=album.getinfo&api_key=".$apikey."&artist=".$artist."&album=".$album."&autocorrect=".$autocorrect;
$headers = get_headers($url, 1);
if ($headers[0] == 'HTTP/1.0 400 Bad Request') {
$img_big = $emptyart;
$img_small = $emptyart;
}
else {
$feed = simplexml_load_file($url);
$albums = $feed->album;
foreach($albums as $album) {
$name = $album->name;
$img = $album->children();
$img_big = $img->image[4];
$img_small = $img->image[2];
$releasedate = $album->releasedate;
$newdate = date("F j, Y", strtotime($releasedate));
if ($img == "") {
$img_big = $emptyart;
$img_small = $emptyart;
}
}
}
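One further aside (mine, not from the answers above): get_headers() makes a separate HTTP request just to inspect the status, so every album lookup now hits Last.fm twice. A sketch that fetches once and relies on the same status attribute Method 2 already checks:
$url = "http://ws.audioscrobbler.com/2.0/?method=album.getinfo&api_key=".$apikey."&artist=".$artist."&album=".$album."&autocorrect=".$autocorrect;
libxml_use_internal_errors(true);          // collect XML parse errors instead of emitting warnings
$feed = @simplexml_load_file($url);        // @ hides the HTTP-wrapper warning on a 400 response
if ($feed === false || (string) $feed['status'] === 'failed') {
    $img_big = $emptyart;
    $img_small = $emptyart;
} else {
    // ...same foreach ($feed->album as $album) loop as above...
}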