Facebook page scraping - php

I am trying to scrap a facebook page ( https://www.facebook.com/pages/PTSD/455847705426 )
I found this script to login to facebook.
<?php
$EMAIL = "me#mail.com";
$PASSWORD = "facebookPassword";
function cURL($url, $header=NULL, $cookie=NULL, $p=NULL)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, $header);
curl_setopt($ch, CURLOPT_NOBODY, $header);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_COOKIE, $cookie);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
if ($p) {
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $p);
}
$result = curl_exec($ch);
if ($result) {
return $result;
} else {
return curl_error($ch);
}
curl_close($ch);
}
$a = cURL("https://login.facebook.com/login.php?login_attempt=1",true,null,"email=$EMAIL&pass=$PASSWORD");
preg_match('%Set-Cookie: ([^;]+);%',$a,$b);
$c = cURL("https://login.facebook.com/login.php?login_attempt=1",true,$b[1],"email=$EMAIL&pass=$PASSWORD");
preg_match_all('%Set-Cookie: ([^;]+);%',$c,$d);
for($i=0;$i<count($d[0]);$i++)
$cookie.=$d[1][$i].";";
/*
NOW TO JUST OPEN ANOTHER URL EDIT THE FIRST ARGUMENT OF THE FOLLOWING FUNCTION.
TO SEND SOME DATA EDIT THE LAST ARGUMENT.
*/
$page_html = cURL("https://www.facebook.com/pages/PTSD/455847705426",null,$cookie,null);
?>
now variable $page_html have only few posts, moreover they are in very complex code
my questions are
how can I get all posts.
is there some other approach which return me complete and clear data.
is there some way to have all posts in json format.
please tell me if there is some useful tutorial or articles regarding this.
Regards

Spend some time reading the developer documentation. You can get all the posts as a JSON object from a page by setting up an app, then querying the graph api with a page access token.

Related

PHP - cURL get's old version of google

When getting the Google site in cURL PHP I get the old Google page (The one with the blackbar and weird buttons). So is there a way to get the Google as it appears on google.com? My code:
function file_get_contents_curl($url, $proxy) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_PROXY, $proxy);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
echo file_get_contents_curl('https://www.google.com/', 'proxy:8080');
Thanks, if there is no way I am ok with the weird looking Google.

Ajax log in using CURL

I am trying to login to a web page and then retrieve data from a page that is only visible to logged in members. I seem to now be able to log in using CURL, however I'm unable to retrieve the page that I want.
function login($url,$postData){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_REFERER, "http://domain.com/".$jobUrl."");
curl_setopt($ch, CURLOPT_USERAGENT, "MozillaXYZ/1.0");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$output = curl_exec($ch);
curl_close($ch);
return $output;
}
$html=login('https://www.domain.com/users/ajaxonlogin.php',
'username=myemailaddress&passwd=mypassword');
echo $html;
I am able to get the following response:
{"status":"success","goto_url":"https:\/\/www.domain.com\/dashboard\/"}
I have tried adding to the bottom of the login function:
$url = "http://www.domain.com/page-that-i-want-but-requires-log-in.php";
$results_page = curl($url);
echo $results_page;
But I only receive the page as if I was NOT logged in...
Any pointers please?
Thanks.
Bruce

Get username from page

<?php
/* EDIT EMAIL AND PASSWORD */
$EMAIL = "MY EMAIL"; // here email
$PASSWORD = "MY PASS"; // here password
function cURL($url, $header=NULL, $cookie=NULL, $p=NULL)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, $header);
curl_setopt($ch, CURLOPT_NOBODY, $header);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_COOKIE, $cookie);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
if ($p) {
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $p);
}
$result = curl_exec($ch);
if ($result) {
return $result;
} else {
return curl_error($ch);
}
curl_close($ch);
}
$a = cURL("https://login.facebook.com/login.php?login_attempt=1",true,null,"email=$EMAIL&pass=$PASSWORD");
preg_match('%Set-Cookie: ([^;]+);%',$a,$b);
$c = cURL("https://login.facebook.com/login.php?login_attempt=1",true,$b[1],"email=$EMAIL&pass=$PASSWORD");
preg_match_all('%Set-Cookie: ([^;]+);%',$c,$d);
for($i=0;$i<count($d[0]);$i++)
$cookie.=$d[1][$i].";";
/*
NOW TO JUST OPEN ANOTHER URL EDIT THE FIRST ARGUMENT OF THE FOLLOWING FUNCTION.
TO SEND SOME DATA EDIT THE LAST ARGUMENT.
*/
echo cURL("https://www.facebook.com/search/results.php?q=Funny",null,$cookie,null);
?>
I want to be able to get the usernames on that page.
I can access the page i want to access, i cant get the second step.
I want to get the usernames listed on that page, i don't know where to start.
Can anybody help me out with this ?
Your method of accessing Facebook is against Facebook's Terms of Service. You should use the Facebook Graph API to get this kind of data.

How to post to a facebook users wall using the php api with the access token and curl?

Here is what I am currently trying:
if (function_exists('curl_init')) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://graph.facebook.com/me/feed');
curl_setopt($ch, CURLOPT_POST, 2);
curl_setopt($ch, CURLOPT_POSTFIELDS, "access_token=$facebook_access_token&message=testing api.");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
curl_close($ch);
}
where $facebook_access_token is a variable containing the access token.
Am I doing something wrong? I have to use the php api and I do not explicitly know the users facebook page url, only the access token.
Thanks.
Add one more option to cURL:
CURLOPT_SSL_VERIFYPEER : false
if (function_exists('curl_init')) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://graph.facebook.com/me/feed');
curl_setopt($ch, CURLOPT_POST, 2);
curl_setopt($ch, CURLOPT_POSTFIELDS, "access_token=$facebook_access_token&message=testing api.");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
$result = curl_exec($ch);
curl_close($ch);
}

How to use CURL instead of file_get_contents?

I use file_get_contents function to get and show external links on my specific page.
In my local file everything is okay, but my server doesn't support the file_get_contents function, so I tried to use cURL with the below code:
function file_get_contents_curl($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
echo file_get_contents_curl('http://google.com');
But it returns a blank page. What is wrong?
try this:
function file_get_contents_curl($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
This should work
function curl_load($url){
curl_setopt($ch=curl_init(), CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec($ch);
curl_close($ch);
return $response;
}
$url = "http://www.google.com";
echo curl_load($url);
I encountered such a problem accessing Google Drive content via the direct link.
After calling file_get_contents returned 302 Moved temporarily
//Any google url. This example is fake for Google Drive direct link.
$url = "https://drive.google.com/uc?id=0BxQKKJYjuNElbFBNUlBndmVHHAj";
$html = file_get_contents($url);
echo $html; //print none because error 302.
With the code below it worked again:
//Any google url. This example is fake for Google Drive direct link.
$url = "https://drive.google.com/uc?id=0BxQKKJYjuNElbFBNUlBndmVHHAj";
$ch = curl_init();
curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 3);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
$html = curl_exec($ch);
curl_close($ch);
echo $html;
I tested it today, 03/19/2018
//You can try this . It should work fine.
function curl_tt($url){
$ch = curl_init();
curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 3);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
echo curl_tt("https://google.com");

Categories