Moz api request is being blocked by Incapsula?


When I try to access the Moz API using the code below:
$accessID = 'mozscape-key';
$secretKey = 'secret key';
// Set your expires times for several minutes into the future.
// An expires time excessively far in the future will not be honored by the Mozscape API.
$expires = time() + 300;
// Put each parameter on a new line.
$stringToSign = $accessID."\n".$expires;
// Get the "raw" or binary output of the hmac hash.
$binarySignature = hash_hmac('sha1', $stringToSign, $secretKey, true);
// Base64-encode it and then url-encode that.
$urlSafeSignature = urlencode(base64_encode($binarySignature));
// Specify the URL that you want link metrics for.
$objectURL = "www.seomoz.org";
// Add up all the bit flags you want returned.
// Learn more here: https://moz.com/help/guides/moz-api/mozscape/api-reference/url-metrics
$cols = "103079215108";
// Put it all together and you get your request URL.
// This example uses the Mozscape URL Metrics API.
$requestUrl = "http://lsapi.seomoz.com/linkscape/url-metrics/".urlencode($objectURL)."?Cols=".$cols."&AccessID=".$accessID."&Expires=".$expires."&Signature=".$urlSafeSignature;
echo $requestUrl;
die;
// Use Curl to send off your request.
$options = array(
CURLOPT_RETURNTRANSFER => true
);
$ch = curl_init($requestUrl);
curl_setopt_array($ch, $options);
$content = curl_exec($ch);
curl_close($ch);
$f = fopen('tte.txt','a');
fwrite($f,$content);
fclose($f);
print_r($content);
The output it returns is below:
<html style="height:100%">
<head>
<meta content="NOINDEX, NOFOLLOW" name="ROBOTS">
<meta content="telephone=no" name="format-detection">
<meta content="initial-scale=1.0" name="viewport">
<meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible">
<title></title>
</head>
<body style="margin:0px;height:100%">
<iframe frameborder="0" height="100%" marginheight="0px" marginwidth="0px"
src="/_Incapsula_Resource?CWUDNSAI=9&xinfo=10-113037580-0%200NNN%20RT(1470041335360%200)%20q(0%20-1%20-1%20-1)%20r(0%20-1)%20B12(8,811001,0)%20U5&incident_id=220010400174850153-812164000562037002&edet=12&cinfo=08000000"
width="100%">Request unsuccessful. Incapsula incident ID:
220010400174850153-812164000562037002</iframe>
<meta content="NOINDEX, NOFOLLOW" name="ROBOTS">
<meta content="telephone=no" name="format-detection">
<meta content="initial-scale=1.0" name="viewport">
<meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible">
<iframe frameborder="0" height="100%" marginheight="0px" marginwidth="0px"
src="/_Incapsula_Resource?CWUDNSAI=9&xinfo=6-31536099-0%200NNN%20RT(1470041496215%200)%20q(0%20-1%20-1%20-1)%20r(0%20-1)%20B12(8,811001,0)%20U5&incident_id=220010400174850153-224923142338658566&edet=12&cinfo=08000000"
width="100%">Request unsuccessful. Incapsula incident ID:
220010400174850153-224923142338658566</iframe>
</body>
</html>
It seems Incapsula is treating the request as coming from a robot. Can anyone please help me fix it?

If requesting $requestUrl with a GET in the browser works fine, try combining everything into your options array and adding a user agent.
It should look like this:
$ch = curl_init();
curl_setopt_array($ch, array(
CURLOPT_RETURNTRANSFER => 1,
CURLOPT_URL => $requestUrl,
CURLOPT_USERAGENT => 'Maybe You Need Agent?'
));
Note about agent (taken from web):
cURL is a behemoth, and has many many possibilities. Some sites might
only serve pages to some user agents, and when working with APIs, some
might require you to send a specific user agent; this is something to be
aware of.
Also worth checking: you have the failure ID from Incapsula, 220010400174850153-224923142338658566.
Can you check the logs and see what is there?
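One way to notice this kind of block in code is to inspect the response body before treating it as API data. The sketch below is only an illustration: the helper name looksLikeIncapsulaBlock is made up, and the marker strings are taken from the single HTML sample in the question, not from any documented Incapsula or Moz behavior.

```php
<?php
// Hypothetical helper: returns true when a response body resembles the
// Incapsula block page shown in the question. The marker strings are an
// assumption based on that one sample, not a documented contract.
function looksLikeIncapsulaBlock(string $body): bool
{
    return stripos($body, '_Incapsula_Resource') !== false
        || stripos($body, 'Incapsula incident ID') !== false;
}

// A fragment of the block page from the question, and an illustrative JSON body.
$blocked = '<iframe src="/_Incapsula_Resource?CWUDNSAI=9">Request unsuccessful. Incapsula incident ID: ...</iframe>';
$json    = '{"example":"value"}';

var_dump(looksLikeIncapsulaBlock($blocked)); // bool(true)
var_dump(looksLikeIncapsulaBlock($json));    // bool(false)
```

Calling such a check on the result of curl_exec() makes it possible to log the incident ID instead of passing the HTML on to later code.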

Related

Scraping meta data on Japanese websites with some character encoding problems

For a small WordPress project, I am trying to scrape some information (namely a thumbnail and the publisher) from a site given its URL. I know there are a few plugins doing similar things, but they usually inject the result into the article itself, which is not my goal. Furthermore, the one I use tends to have the same issue I have.
My overall goal is to display a thumbnail and the publisher name given a URL in a post custom field. I get my data from the Open Graph meta tags for the moment (I'm a lazy guy).
The overall code works, but I get the usual mangled text when dealing with non-Latin characters (and that's 100% of the cases). Even stranger for me: it depends on the site.
I have tried to use ForceUTF8 and gzip compression in curl as recommended in various answers here, but the result is still the same (or gets worse).
My only clue for the moment is how the encoding is declared on each page.
For example, for the 3 URLs I was given:
https://www.jomo-news.co.jp/life/oricon/25919
<meta charset="UTF-8" />
<meta property="og:site_name" content="上毛新聞" />
Result > ä¸Šæ¯›æ–°è ž
Not OK
https://entabe.jp/21552/rl-waffle-chocolat-corocoro
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta property="og:site_name" content="えん食べ [グルメニュース]" />
Result > えん食べ [グルメニュース]
OK
https://prtimes.jp/main/html/rd/p/000000008.000026619.html
<meta charset="utf-8">
<meta property="og:site_name" content="プレスリリース・ニュースリリース配信シェアNo.1｜PR TIMES" />
Result > ãƒ—ãƒ¬ã‚¹ãƒªãƒªãƒ¼ã‚¹ãƒ»ãƒ‹ãƒ¥ãƒ¼ã‚¹ãƒªãƒªãƒ¼ã‚¹é… ä¿¡ã‚·ã‚§ã‚¢No.1ï½œPR TIMES
Not OK
For reference, the curl declaration I use
function file_get_contents_curl($url)
{
header('Content-type: text/html; charset=UTF-8');
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
And the scraping function:
function get_news_header_info($url){
//parsing begins here:
$news_result = array("news_img_url" => "", "news_name" => "");
$html = file_get_contents_curl($url);
$doc = new DOMDocument();
@$doc->loadHTML($html);
$metas = $doc->getElementsByTagName('meta');
for ($i = 0; $i < $metas->length; $i++)
{
$meta = $metas->item($i);
if($meta->getAttribute('property') == 'og:site_name')
{
if(! $news_name)
$news_name = $meta->getAttribute('content');
}
//Script continues
}
Does anyone know what differs between these three cases and how I could deal with it?
EDIT
It looks like, even though all the websites declared a UTF-8 charset, a conversion to ISO-8859-1 was necessary. I found this after looking at curl_getinfo() and testing a bunch of charset conversion combinations.
So just adding
$scraped_text = iconv("UTF-8", "ISO-8859-1", $scraped_text);
was enough to solve the problem.
For the sake of giving a complete answer, here is the snippet of code to test conversion pairs, from this answer by rid-iculous:
$charsets = array(
"UTF-8",
"ASCII",
"Windows-1252",
"ISO-8859-15",
"ISO-8859-1",
"ISO-8859-6",
"CP1256"
);
foreach ($charsets as $ch1) {
foreach ($charsets as $ch2){
echo "<h1>Combination $ch1 to $ch2 produces: </h1>".iconv($ch1, $ch2, $text_2_convert);
}
}
Problem solved, have fun!

It looks like, even though all pages declared UTF-8, some ISO-8859-1 was hidden in places. Using iconv solved the issue.
I edited the question with all the details. Case closed!
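A likely explanation, offered here as an assumption rather than something confirmed in the thread, is double encoding: when DOMDocument::loadHTML() does not recognize a page's charset declaration, it tends to read the UTF-8 bytes as ISO-8859-1 and hand back text re-encoded as UTF-8. The sketch below simulates that damage with iconv and shows why converting from UTF-8 to ISO-8859-1 undoes it. The variable names are illustrative.

```php
<?php
// Illustration of the double-encoding effect, using the site name from the question.
$original = '上毛新聞'; // UTF-8 bytes (assuming this file is saved as UTF-8)

// Simulate the damage: the UTF-8 bytes are misread as ISO-8859-1
// and re-encoded as UTF-8.
$mangled = iconv('ISO-8859-1', 'UTF-8', $original);

// The fix from the question: converting UTF-8 to ISO-8859-1 restores
// the original byte sequence.
$repaired = iconv('UTF-8', 'ISO-8859-1', $mangled);

var_dump($mangled === $original);  // bool(false)
var_dump($repaired === $original); // bool(true)
```

This only repairs text that was actually double-encoded. Applying the same conversion to a string that is already correct UTF-8 (as with the entabe.jp page) would fail or damage it, so the conversion should be conditional.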

Unable to pull orderbook, bitfinex api

I am experimenting with the bitfinex API.
I program in PHP and all the docs are JS or Ruby (I really need to learn more Ruby).
Bitfinex API docs #orderbook
I can pull user info, but I am unable to pull the orderbook
code:
<?php
function bitfinex_query($path, array $req = Array())
{
global $config;
// API settings, add your Key and Secret at here
$key = "xxxxxxxxx";
$secret = "xxxxxxxx";
// generate a nonce to avoid problems with 32bits systems
$mt = explode(' ', microtime());
$req['request'] = "/v1".$path;
$req['nonce'] = $mt[1].substr($mt[0], 2, 6);
// generate the POST data string
$post_data = base64_encode(json_encode($req));
$sign = hash_hmac('sha384', $post_data, $secret);
// generate the extra headers
$headers = array(
'X-BFX-APIKEY: '.$key,
'X-BFX-PAYLOAD: '.$post_data,
'X-BFX-SIGNATURE: '.$sign,
);
// curl handle (initialize if required)
static $ch = null;
if (is_null($ch)) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT,
'Mozilla/4.0 (compatible; Bter PHP bot; '.php_uname('a').'; PHP/'.phpversion().')'
);
}
curl_setopt($ch, CURLOPT_URL, 'https://api.bitfinex.com/v1'.$path);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
// run the query
$res = curl_exec($ch);
if ($res === false) throw new Exception('Curl error: '.curl_error($ch));
//echo $res;
$dec = json_decode($res, true);
if (!$dec) throw new Exception('Invalid data: '.$res);
return $dec;
}
//this works
$api_name = '/orders';
$openorders = bitfinex_query($api_name);
var_dump($openorders);
//broken
$api_name = '/book/BTCUSD';
$orderbook = bitfinex_query($api_name);
//$orderbook = bitfinex_query($api_name, array("limit_asks"=> 1, "group"=> 0));
?>
output:
array(1) {
[0]=>
array(16) {
["id"]=>
int(880337054)
["symbol"]=>
string(6) "btcusd"
["exchange"]=>
NULL
["price"]=>
string(6) "759.02"
["avg_execution_price"]=>
string(3) "0.0"
["side"]=>
string(4) "sell"
["type"]=>
string(14) "exchange limit"
["timestamp"]=>
string(12) "1467544183.0"
["is_live"]=>
bool(true)
["is_cancelled"]=>
bool(false)
["is_hidden"]=>
bool(false)
["oco_order"]=>
NULL
["was_forced"]=>
bool(false)
["original_amount"]=>
string(4) "0.05"
["remaining_amount"]=>
string(4) "0.05"
["executed_amount"]=>
string(3) "0.0"
}
}
PHP Fatal error: Uncaught exception 'Exception' with message 'Invalid data: <!DOCTYPE html>
<!--[if IE 8]>
<html class="no-js lt-ie9" lang="en"> <![endif]-->
<!--[if gt IE 8]><!-->
<html class="no-js" lang="en"> <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width">
<meta name="description"
content="The largest and most advanced cryptocurrencies exchange">
<meta name="keywords"
content="bitcoin,exchange,bitcoin exchange,litecoin,ethereum,margin,trade">
<meta property="og:title" content="Bitfinex">
<meta property="og:description"
content="The largest and most advanced cryptocurrencies exchange">
<meta property="og:image" content="https://bitfinex.com/assets/bfx-stacked.png">
<meta name="twitter:card" content="summary">
<meta name="twitter:site" content="@bitfinex">
<meta name="twitter:title" content="Bitfinex">
<meta name="twitter:description"
content="The largest and most advanced cryptocurrencies exchange">
<meta name="twitter:image" con in /home/bitfinex/buycoinz.php on line 49
How do I correctly access the bitfinex api orderbook?

It doesn't look like you are hitting the right endpoint.
Also the orderbook is public. Your example is for authenticated endpoints, like making a trade. You don't need to SHA384 or base64 anything for the public endpoints, you can just do a simple curl or even a file_get_contents.
Here's the endpoint for the orderbook: https://api.bitfinex.com/v1/book/BTCUSD
Now you can do what you'd like with it from there.
cURL
$curl = "https://api.bitfinex.com/v1/book/BTCUSD";
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_URL, $curl);
$ccc = curl_exec($ch);
print_r($ccc);
This would just dump everything.
Or you can parse all the asks and bids into a table using foreach loops. Below is an example with file_get_contents, you could do this with cURL as well of course.
file_get_contents Fiddle example
$fgc = json_decode(file_get_contents("https://api.bitfinex.com/v1/book/BTCUSD"), true);
echo "<table><tr><td>Bids</td><td>Asks</td></tr>";
$bids = $fgc["bids"];
echo "<tr><td valign='top'>";
foreach($bids as $details){
echo "$".$details["price"]." - ".$details["amount"];
echo "<br>";
}
echo "</td><td valign='top'>";
$asks = $fgc["asks"];
foreach($asks as $askDetails){
echo "$".$askDetails["price"]." - ".$askDetails["amount"];
echo "<br>";
}
echo "</td></tr></table>";
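The same response can also be validated before it is used, which avoids the situation in the question where an HTML page reached json_decode(). The following is a minimal sketch: the helper name firstBidAndAsk is hypothetical, the "bids"/"asks" lists with "price"/"amount" keys follow the code above, and the sample values are invented for illustration.

```php
<?php
// Hypothetical helper: decode an order book response and return the first
// bid and the first ask. The key names follow the answer above; nothing
// here is taken from the official API documentation.
function firstBidAndAsk(string $json): array
{
    $book = json_decode($json, true);
    if (!is_array($book) || !isset($book['bids'], $book['asks'])) {
        // An HTML error page (as in the question) ends up here.
        throw new RuntimeException('Response is not an order book');
    }
    return [
        'bid' => $book['bids'][0] ?? null,
        'ask' => $book['asks'][0] ?? null,
    ];
}

// Invented sample data with the same shape as the parsed response above.
$sample = '{"bids":[{"price":"100.0","amount":"1.5"}],"asks":[{"price":"101.0","amount":"2.0"}]}';
$top = firstBidAndAsk($sample);
echo $top['bid']['price'], ' / ', $top['ask']['price'], "\n"; // 100.0 / 101.0
```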

I cant save some facebook images to my server as it does not understand the file

I am using the Facebook Graph API, and it was working well until I realised that some of the jpg URLs have a query string at the end that makes them unusable.
e.g.
https://scontent.xx.fbcdn.net/hphotos-xaf1/v/t1.0-9/487872_451835128174833_1613257199_n.jpg?oh=621bed79f5436e81c3e219c86db8f0d9&oe=560F3D0D
I have tried stripping off everything after .jpg in the hope that it would still load the image, but unfortunately it doesn't.
In the following code, take $facebook_image_url to be the URL above. This works fine when the URL ends in .jpg but fails on the one above. As a note, I am converting the name to a random number.
$File_Name = $facebook_image_url;
$File_Ext = '.jpg';
$Random_Number = rand(0, 9999999999); //Random number to be added to name.
$NewFileName = $Random_Number.$File_Ext; //new file name
$local_file = $UploadDirectory.$NewFileName;
$remote_file = $File_Name;
$fp = fopen($local_file, 'w+');
$ch = curl_init($remote_file);
curl_setopt($ch, CURLOPT_TIMEOUT, 50);
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_ENCODING, "");
curl_exec($ch);
curl_close($ch);
fclose($fp);
$image = new Imagick(DIR_TEMPORARY_IMAGES.$NewFileName);
The error I'm getting is
Fatal error: Uncaught exception 'ImagickException' with message 'Not a
JPEG file: starts with 0x3c 0x21 `/mysite/temp-images/1849974705.jpg'
@ jpeg.c/EmitMessage/232'
I can confirm the image isn't saving as a proper jpg, just a small 3KB file with the name 1849974705.jpg (or other random numbers).
Is there either
A: A way of getting those images from Facebook as raw jpg
or
B: A way of converting them successfully to jpgs

You could always download the image using file_get_contents()
This code works for me...
file_put_contents("image.jpg", file_get_contents("https://scontent.xx.fbcdn.net/hphotos-xaf1/v/t1.0-9/522827_10152235166655545_26514444_n.jpg?oh=1d52a86082c7904da8f12920e28d3687&oe=5659D5BB"));

Just because something has .jpg in the URI doesn't mean it's an image.
Getting that URL via wget gives the result:
<!DOCTYPE html>
<html lang="en" id="facebook">
<head>
<title>Facebook | Error</title>
<meta charset="utf-8">
<meta http-equiv="cache-control" content="no-cache">
<meta http-equiv="cache-control" content="no-store">
<meta http-equiv="cache-control" content="max-age=0">
<meta http-equiv="expires" content="-1">
<meta http-equiv="pragma" content="no-cache">
<meta name="robots" content="noindex,nofollow">
<style>
....
....
i.e. it's not an image, exactly as the error message is telling you.
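The bytes in the Imagick message (0x3c 0x21) are the characters "<!", the start of an HTML document, whereas JPEG data starts with 0xFF 0xD8. A small check like the one below, run on the downloaded data before it is passed to Imagick, makes the failure explicit. The helper name looksLikeJpeg is made up for this sketch.

```php
<?php
// Hypothetical helper: JPEG data begins with the two bytes 0xFF 0xD8.
function looksLikeJpeg(string $bytes): bool
{
    return strncmp($bytes, "\xFF\xD8", 2) === 0;
}

// Start of the error page returned for the URL in the question.
$html = '<!DOCTYPE html><html lang="en" id="facebook">';

echo bin2hex(substr($html, 0, 2)), "\n";    // 3c21, the bytes named in the Imagick error
var_dump(looksLikeJpeg($html));              // bool(false)
var_dump(looksLikeJpeg("\xFF\xD8\xFF\xE0")); // bool(true)
```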

cURL with HTML content

I need to post to a URL, and I am doing this with curl. The problem is with the HTML content I am posting. The page I am requesting sends an HTML email, so the content has inline styles. When I urlencode() or rawurlencode() it, the style attributes are stripped, so the mail does not look correct. How can I avoid this and post the HTML as it is?
This is my code:
$mail_url = "to=".$email->uEmail;
$mail_url .= "&from=info@domain.com";
$mail_url .= "&subject=".$email_campaign[0]->email_subject;
$mail_url .= "&type=signleOffer";
$mail_url .= "&html=".rawurlencode($email_campaign[0]->email_content);
//open curl request to send the mail
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_POST,true);
curl_setopt($ch,CURLOPT_POSTFIELDS,$mail_url);
//execute post
$result = curl_exec($ch);

Here is an example, use http_build_query() to build your post data from an array of values:
<?php
//Receiver debug
if($_SERVER['REQUEST_METHOD']=='POST'){
file_put_contents('test.POST.values.txt',print_r($_POST,true));
/*
Array
(
[to] => example@example.com
[from] => info@domain.com
[subject] => subject
[type] => signleOffer
[html] =>
<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Mail Template</title>
<style>.yada{color:green;}</style>
</head>
<body>
<p style="color:red">Red</p>
<p class="yada">Green</p>
</body>
</html>
)
*/
die;
}
$curl_to_post_parameters = array(
'to'=>'example@example.com',
'from'=>'info@domain.com',
'subject'=>'subject',
'type'=>'signleOffer',
'html'=>'
<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Mail Template</title>
<style>.yada{color:green;}</style>
</head>
<body>
<p style="color:red">Red</p>
<p class="yada">Green</p>
</body>
</html>
'
);
$curl_options = array(
CURLOPT_URL => "http://localhost/test.php",
CURLOPT_POST => true,
CURLOPT_POSTFIELDS => http_build_query( $curl_to_post_parameters ), //<<<
CURLOPT_RETURNTRANSFER => true,
CURLOPT_HEADER => false
);
$curl = curl_init();
curl_setopt_array($curl, $curl_options);
$result = curl_exec($curl);
curl_close($curl);
?>
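To see that the encoding itself does not lose the style attributes, the round trip can be checked locally without sending anything. This sketch encodes two fields with http_build_query() and decodes them with parse_str(), which mimics how a receiving PHP script fills $_POST. The field values are illustrative.

```php
<?php
// Round-trip check: encode the fields, then decode them the way a
// receiving PHP script would, and compare with the input.
$fields = [
    'subject' => 'Offer & news',
    'html'    => '<p style="color:red">Red</p>',
];

// The separator is passed explicitly so the result does not depend on
// the arg_separator.output ini setting.
$body = http_build_query($fields, '', '&');
parse_str($body, $decoded);

echo $body, "\n";
var_dump($decoded === $fields); // bool(true)
```

Because every value is encoded, characters such as & or = inside the subject or the HTML cannot be mistaken for field separators, which is a risk when the query string is concatenated by hand with only some fields encoded.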

Do a POST as described in this post:
Passing $_POST values with cURL
It should solve your problem.

Facebook API : Cannot register an achievement with facebook

I spent a lot of time trying to register an achievement with Facebook using the Graph API, but
I am always getting this:
{"error":{"type":"OAuthException","message":"(#2) Object at achievement URL is not of type game.achievement"}}
The code is :
$achievementUrl = 'http://www.dappergames.com/test.html';
$url = 'https://graph.facebook.com/' . $appID . '/achievements?achievement=' .
$achievementUrl . '&display_order=' . $order . '&access_token=' . $accessToken . '';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST,0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER,0);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $fields);
$data = curl_exec($ch);
print_r($data);
Any help will be greatly appreciated.
Thanks

Looking at your test page, you're missing the most important tag, og:url.
Facebook uses that tag to find the location to parse the information from. I was having a similar problem adding achievements to my app until I figured that out. Here's how mine looks; it passes the debugger and currently works on Facebook.
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:og="http://ogp.me/ns#" xmlns:fb="http://www.facebook.com/2008/fbml">
<head>
<title>Game Loader</title>
<meta property="og:type" content="game.achievement"/>
<meta property="og:title" content="Game Loader"/>
<meta property="og:url" content="FULL LINK TO THIS PAGE"/>
<meta property="og:description" content="You loaded the game!"/>
<meta property="og:image" content="Link to 50x50 image"/>
<meta property="og:points" content="10"/>
<meta property="fb:app_id" content="YOUR_APP_ID"/>
</head>
<body>
</body>
</html>
After updating your HTML to this, run it through the debugger linked above so that it is updated on Facebook's end (and to see if you receive any warnings) and then create it and give it to users.
Here's how I created it:
curl -d "achievement=[URL TO ACHIEVEMENT]&access_token=[APPLICATION ACCESS TOKEN]" https://graph.facebook.com/APP_ID/achievements
and I then rewarded it to users using the Facebook Javascript API on my canvas application like so:
FB.api('/'+user_id+'/achievements', 'POST', { 'access_token': [APPLICATION ACCESS TOKEN], 'achievement':[FULL URL]},
function(response) {
if(console)
console.log(response);
});

Make sure that you have the correct Open Graph meta tags in your achievement URL $achievementUrl:
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:og="http://ogp.me/ns#">
<head>
<title>ACHIEVEMENT_TITLE</title>
<meta property="og:type" content="game.achievement"/>
<meta property="og:url" content="URL_FOR_THIS_PAGE"/>
<meta property="og:title" content="ACHIEVEMENT_TITLE"/>
<meta property="og:description" content="ACHIEVEMENT_DESCRIPTON"/>
<meta property="og:image" content="URL_FOR_ACHIEVEMENT_IMAGE"/>
<meta property="og:points" content="POINTS_FOR_ACHIEVEMENT"/>
<meta property="fb:app_id" content="YOUR_APP_ID"/>
</head>
<body>
Promotional content for the Achievement.
This is the landing page where a user will be directed after
clicking on the achievement story.
</body>
</html>
The above is just an example from this post. Note the first meta tag in the example? I suppose it's missing in your page.

<meta property="og:title" content="Game Loader"/>
<meta property="og:url" content="FULL LINK TO THIS PAGE"/>
These are required fields. Provide all the required fields and try again with the debugger tool. If the debugger tool passes your page, you are OK to register the achievement.
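Before submitting the page to the debugger, a quick local check can list which Open Graph properties are absent. This is only a sketch: the helper name missingOgTags is hypothetical, and the list of properties to check comes from the answers here, not from Facebook's documentation.

```php
<?php
// Hypothetical helper: return the required Open Graph properties that
// do not appear in any <meta property="..."> tag of the page.
function missingOgTags(string $html, array $required): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc->loadHTML($html);
    libxml_clear_errors();

    $found = [];
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $found[] = $meta->getAttribute('property');
    }
    return array_values(array_diff($required, $found));
}

// A test page that has og:type and og:title but no og:url.
$page = '<html><head><title>Test</title>'
      . '<meta property="og:type" content="game.achievement"/>'
      . '<meta property="og:title" content="Game Loader"/>'
      . '</head><body></body></html>';

print_r(missingOgTags($page, ['og:type', 'og:url', 'og:title'])); // lists og:url only
```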
