How to scrape a webpage using PHP? - php

I am trying to scrape a webpage and parse some data from it, but every time I try, I get only the HTTP response headers. Here is the code I use to get the data from the website:
$host = 'Host: dealnews.com';
$user_agent = 'User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0';
$accept = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8';
$accept_language = 'Accept-Language: en-US,en;q=0.5';
$accept_encoding = 'Accept-Encoding: gzip, deflate';
$connection = 'Connection: keep-alive';
$cookie = 'Cookie: front_page_sort=hotness; dnvta=%7B%22uid%22%3A%22VkA1VlBBb0tNcXdBQVF6UlJrTUFBQUJN%22%2C%22vid%22%3A%22VkA1bGx3b0tNcXdBQVF6bW53QUFBQUEt%22%2C%22fvts%22%3A1475237180%2C%22lvts%22%3A1475241453%2C%22ref%22%3A%22%22%2C%22usid%22%3A0%2C%22ct%22%3A2%2C%22cr%22%3A1475237180%7D; last_visit=1475241457; _ceg.s=oebjle; _ceg.u=oebjle; _ga=GA1.2.185245695.1475237222; __gads=ID=1921ec3c3fe54b1b:T=1475237222:S=ALNI_MZJZEuNpmg3Aq5e007E7iFjwuQ0nw; original_eref=DIRECT; _gat=1; mp_dealnews_mixpanel=%7B%22distinct_id%22%3A%20%221577afe52c549-01b1cfdcc8ca548-13666c4a-100200-1577afe52c620c%22%7D';
$requestHeaders = array ( $host, $user_agent, $accept, $accept_encoding, $accept_language, $connection, $cookie );
$ch = curl_init ( 'http://dealnews.com/2-LED-Window-Candles-w-Color-Changing-Bulbs-for-4-2-s-h/1797165.html?iref=rss-dealnews-todays-edition' );
curl_setopt ( $ch, CURLOPT_TIMEOUT, 100 );
curl_setopt ( $ch, CURLOPT_CONNECTTIMEOUT, 100 );
curl_setopt ( $ch, CURLOPT_RETURNTRANSFER, true );
curl_setopt ( $ch, CURLOPT_SSL_VERIFYHOST, false );
curl_setopt ( $ch, CURLOPT_SSL_VERIFYPEER, false );
curl_setopt ( $ch, CURLOPT_HEADER, TRUE );
curl_setopt ( $ch, CURLOPT_ENCODING, "gzip" );
curl_setopt ( $ch, CURLOPT_HTTPHEADER, $requestHeaders );
$data = curl_exec ( $ch );
if (! $data) {
die ( "Error: " . curl_error ( $ch ) . " Error no: " . curl_errno ( $ch ) );
}
curl_close ( $ch );
$htmlContent = str_get_html ( $data );
echo $htmlContent;
But instead of the page, I only get this response:
HTTP/1.1 302 Found
Date: Fri, 30 Sep 2016 13:50:44 GMT
Server: Apache
X-Powered-By: PHP/5.5.9-1ubuntu4.19
Status: 302 Found
Location: /lw/landing.html?uri=%2F2-LED-Window-Candles-w-Color-Changing-Bulbs-for-4-2-s-h%2F1797165.html%3Firef%3Drss-dealnews-todays-edition
Content-Encoding: gzip
Vary: Accept-Encoding
Content-Length: 20
X-Cnection: close
Content-Type: text/html; charset=utf-8
Can someone help me figure out where I am going wrong?

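You need to tell cURL to follow redirects:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
A 302 status code is a redirect. The content lives at the URL given in the Location header, and without this option cURL stops at the redirect response. You see headers in your output because CURLOPT_HEADER is set to TRUE, which makes curl_exec() return the response headers along with the body.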
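Here is a minimal sketch of the fix. It assumes you want the final page body but still want to be able to inspect the headers. The function names `fetch_following_redirects` and `split_headers_and_body` are hypothetical; only the cURL options and `curl_getinfo()` constants are real. `CURLINFO_HEADER_SIZE` counts the headers of every response in the redirect chain, so cutting the raw output at that offset should leave only the final body.

```php
<?php
// Sketch: fetch a page, follow redirects, and separate headers from body.
// Option values are illustrative; $url is whatever page you want to scrape.
function fetch_following_redirects(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true, // follow the Location header of 3xx responses
        CURLOPT_MAXREDIRS      => 10,   // stop redirect loops
        CURLOPT_HEADER         => true, // keep headers so they can be inspected
        CURLOPT_ENCODING       => '',   // advertise all supported encodings and decode the body
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 30,
    ]);
    $raw = curl_exec($ch);
    if ($raw === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    // Header size covers the headers of every response in the redirect chain.
    $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    [$headers, $body] = split_headers_and_body($raw, $headerSize);
    return [
        'url'     => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL), // final URL after redirects
        'headers' => $headers,
        'body'    => $body,
    ];
}

// Split a raw response into [headers, body] at the header size cURL reports.
function split_headers_and_body(string $raw, int $headerSize): array
{
    return [substr($raw, 0, $headerSize), (string) substr($raw, $headerSize)];
}
```

If you keep using Simple HTML DOM, usage might look like `$page = fetch_following_redirects($url); $html = str_get_html($page['body']);`.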

If you want to screen-scrape with PHP, I have done it successfully with the PHP Simple HTML DOM Parser library. It's very straightforward and easy to use. I know the site looks a bit antiquated, but my code from last year is still running strong, and I haven't had a cron error yet.
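If you'd rather not add a library, PHP's built-in DOM extension can do similar extraction. This sketch swaps in a different technique (DOMDocument and DOMXPath, not Simple HTML DOM). `extract_texts` is a hypothetical helper name, and the XPath query in the usage line is only an example.

```php
<?php
// Built-in DOM extension instead of Simple HTML DOM (a swapped-in technique).
// Returns the trimmed text content of every node matching an XPath query.
function extract_texts(string $html, string $xpathQuery): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Scraped HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html); // treated as Latin-1 unless the page declares a charset
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($doc))->query($xpathQuery);
    if ($nodes === false) {
        return []; // invalid XPath expression
    }
    $texts = [];
    foreach ($nodes as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}
```

For example, `extract_texts($data, '//h1')` would return the text of every `<h1>` on the fetched page.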

Related

Instagram API retrieve the code using PHP

I am trying to use the Instagram API, but it's really not easy.
According to the API documentation, you must first retrieve a code in order to get an access token, and only then make requests to the Instagram API.
But after several tries, I haven't succeeded.
I have already configured the settings at https://www.instagram.com/developer
I call the URL api.instagram.com/oauth/authorize/?client_id=[CLIENT_ID]&redirect_uri=[REDIRECT_URI]&response_type=code with curl, but the response does not contain the redirect URI with the code.
Can you help me please ;)!
I recommend using one of the existing PHP Instagram client libraries, such as https://github.com/cosenary/Instagram-PHP-API
I did this not too long ago, here's a good reference:
https://auth0.com/docs/connections/social/instagram
Let me know if it helps!
I wrote this code for a use case like yours, and I hope it has no errors. Here is the code; I explain how it works below.
$authorization_url = "https://api.instagram.com/oauth/authorize/?client_id=".$instagram_client_id."&redirect_uri=".$your_website_redirect_uri."&response_type=code";
$username='ig_username';
$password='ig_password';
$_defaultHeaders = array(
'User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0',
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language: en-US,en;q=0.5',
'Accept-Encoding: ',
'Connection: keep-alive',
'Upgrade-Insecure-Requests: 1',
'Cache-Control: max-age=0'
);
$ch = curl_init();
$cookie='/application/'.strtoupper(VERSI)."instagram_cookie/instagram.txt";
curl_setopt( $ch, CURLOPT_POST, 0 );
curl_setopt( $ch, CURLOPT_HTTPGET, 1 );
$token = null; // optional value for an "Authorization" header
if($token!==null){
array_push($_defaultHeaders,"Authorization: ".$token);
}
curl_setopt( $ch, CURLOPT_HTTPHEADER, $_defaultHeaders);
curl_setopt( $ch, CURLOPT_HEADER, true);
curl_setopt( $ch, CURLOPT_SSL_VERIFYPEER, false );
curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
curl_setopt( $ch, CURLOPT_COOKIEFILE,getcwd().$cookie );
curl_setopt( $ch, CURLOPT_COOKIEJAR, getcwd().$cookie );
curl_setopt($ch,CURLOPT_URL,$authorization_url);
curl_setopt($ch,CURLOPT_FOLLOWLOCATION,true);
$result = curl_exec($ch);
$redirect_uri = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
$form = explode('login-form',$result)[1];
$form = explode("action=\"",$form)[1];
// vd('asd',$form);
$action = substr($form,0,strpos($form,"\""));
// vd('action',$action);
$csrfmiddlewaretoken = explode("csrfmiddlewaretoken\" value=\"",$form);
$csrfmiddlewaretoken = substr($csrfmiddlewaretoken[1],0,strpos($csrfmiddlewaretoken[1],"\""));
//finish getting parameter
$post_param['csrfmiddlewaretoken']=$csrfmiddlewaretoken;
$post_param['username']=$username;
$post_param['password']=$password;
//format instagram cookie from vaha's answer https://stackoverflow.com/questions/26003063/instagram-login-programatically
preg_match_all('/^Set-Cookie:\s*([^;]*)/mi', $result, $matches);
$cookieFileContent = '';
foreach($matches[1] as $item)
{
$cookieFileContent .= "$item; ";
}
$cookieFileContent = rtrim($cookieFileContent, '; ');
$cookieFileContent = str_replace('sessionid=; ', '', $cookieFileContent);
$cookie=getcwd().'/application/'.strtoupper(VERSI)."instagram_cookie/instagram.txt";
$oldContent = file_get_contents($cookie);
$oldContArr = explode("\n", $oldContent);
if(count($oldContArr))
{
foreach($oldContArr as $k => $line)
{
if(strstr($line, '# '))
{
unset($oldContArr[$k]);
}
}
$newContent = implode("\n", $oldContArr);
$newContent = trim($newContent, "\n");
file_put_contents(
$cookie,
$newContent
);
}
// end format
$useragent = "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0";
$arrSetHeaders = array(
'origin: https://www.instagram.com',
'authority: www.instagram.com',
'upgrade-insecure-requests: 1',
'Host: www.instagram.com',
"User-Agent: $useragent",
'content-type: application/x-www-form-urlencoded',
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language: en-US,en;q=0.5',
'Accept-Encoding: deflate, br',
"Referer: $redirect_uri",
"Cookie: $cookieFileContent",
'Connection: keep-alive',
'cache-control: max-age=0',
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie); // $cookie already holds the absolute path built above
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, $arrSetHeaders);
curl_setopt($ch, CURLOPT_URL, 'https://www.instagram.com'.$action); // assumes the form action is a path on www.instagram.com
curl_setopt($ch, CURLOPT_REFERER, $redirect_uri);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($post_param));
sleep(5);
$page = curl_exec($ch);
preg_match_all('/^Set-Cookie:\s*([^;]*)/mi', $page, $matches);
$cookies = array();
foreach($matches[1] as $item) {
parse_str($item, $cookie1);
$cookies = array_merge($cookies, $cookie1);
}
var_dump($page);
Step 1:
We need to get to the login page first.
We can reach it with a cURL GET request that has CURLOPT_FOLLOWLOCATION set to true. Requesting our application's Instagram authorization URL then redirects us to the login page:
$authorization_url = "https://api.instagram.com/oauth/authorize/?client_id=".$instagram_client_id."&redirect_uri=".$your_website_redirect_uri."&response_type=code";
This is step one of the Instagram authentication documentation.
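The step-1 line concatenates `$your_website_redirect_uri` into the URL without URL-encoding it. If the redirect URI has its own query string, its `&` and `=` would be read as parameters of the authorize URL. Here is a sketch that encodes it with `http_build_query()`. The endpoint and parameter names are copied from the code above (legacy Instagram API), and `instagram_authorize_url` is a hypothetical helper. In the usual OAuth flow you send the user's browser to this URL rather than fetching it with cURL, so that the user logs in and Instagram redirects back to you with the code.

```php
<?php
// Sketch: build the authorization URL with every parameter URL-encoded.
// Endpoint and parameter names come from the snippet above (legacy API).
function instagram_authorize_url(string $clientId, string $redirectUri): string
{
    return 'https://api.instagram.com/oauth/authorize/?' . http_build_query([
        'client_id'     => $clientId,
        'redirect_uri'  => $redirectUri, // encoded, so "https://..." stays one value
        'response_type' => 'code',
    ]);
}

// Typical browser-based flow (not executed here):
// header('Location: ' . instagram_authorize_url($instagram_client_id, $your_website_redirect_uri));
```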
The first GET request returns the login page, and we store its final URL in $redirect_uri. We need this URL later: it goes in the Referer header of the login POST.
After getting the login page, we need to format the cookies. For this I reused some code from vaha's answer (linked in the code comment above).
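The cookie formatting above can also be written as a reusable helper. This sketch collects the `name=value` pairs from the `Set-Cookie` lines of a raw header block, which you get when `CURLOPT_HEADER` is true. It treats a cookie sent with an empty value as deleted, which is roughly why the code above strips `sessionid=`. `parse_set_cookies` and `cookie_header_value` are hypothetical names.

```php
<?php
// Sketch: collect name => value pairs from Set-Cookie lines in raw headers.
// A later Set-Cookie for the same name overwrites an earlier one;
// an empty value is treated as a deleted cookie.
function parse_set_cookies(string $rawHeaders): array
{
    $cookies = [];
    preg_match_all('/^Set-Cookie:\s*([^=;\s]+)=([^;\r\n]*)/mi', $rawHeaders, $m, PREG_SET_ORDER);
    foreach ($m as [, $name, $value]) {
        if ($value === '') {
            unset($cookies[$name]);
            continue;
        }
        $cookies[$name] = $value;
    }
    return $cookies;
}

// Build the value for a "Cookie:" request header.
function cookie_header_value(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = "$name=$value";
    }
    return implode('; ', $pairs);
}
```

With this, the header line in the login request would become `"Cookie: " . cookie_header_value(parse_set_cookies($result))`.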
Step 2:
From the login page, we extract the form's action URL and the value of the csrfmiddlewaretoken hidden input.
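The `explode()` calls in the code above break as soon as the markup shifts slightly. Here is a sketch of the same extraction with PHP's DOM extension, which swaps in a different technique. It assumes `login-form` is the form's `id` and that the token is an `<input>` named `csrfmiddlewaretoken`, as the code above suggests; the real page may differ. `extract_login_form` is a hypothetical helper.

```php
<?php
// Sketch: read a form's action and one hidden input's value with DOMDocument.
// Assumes the form has id="$formId" and the token input has name="$tokenName".
function extract_login_form(string $html, string $formId, string $tokenName): ?array
{
    if ($html === '') {
        return null;
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings about invalid HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $form = $xpath->query(sprintf('//form[@id="%s"]', $formId))->item(0);
    if ($form === null) {
        return null; // form not found: the page layout probably changed
    }
    // Search for the token input only inside the form.
    $input = $xpath->query(sprintf('.//input[@name="%s"]', $tokenName), $form)->item(0);
    return [
        'action' => $form->getAttribute('action'),
        'token'  => $input ? $input->getAttribute('value') : null,
    ];
}
```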
Then we POST these parameters, together with the username and password, to log in.
We must set the Referer to $redirect_uri and not forget the cookie jar and the other headers shown in the code above. After the login POST succeeds, Instagram redirects to your redirect URI, for example https://www.yourwebsite.com/save_instagram_code. There you must use or save the Instagram code and exchange it for an access token, again with cURL (I only explain how to get the code here).
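Below is a sketch of the part this answer leaves out: the handler at your redirect URI that exchanges the code for a token. The endpoint (`https://api.instagram.com/oauth/access_token`) and field names follow the legacy Instagram API, which has since been retired, so treat them as assumptions and check the current documentation. The function names are hypothetical.

```php
<?php
// Sketch of the redirect-URI handler (e.g. save_instagram_code).
// Endpoint and field names follow the legacy Instagram API: assumptions.
function access_token_request_fields(string $clientId, string $clientSecret, string $redirectUri, string $code): array
{
    return [
        'client_id'     => $clientId,
        'client_secret' => $clientSecret,
        'grant_type'    => 'authorization_code',
        'redirect_uri'  => $redirectUri, // must match the URI used in the authorize URL
        'code'          => $code,
    ];
}

function exchange_code_for_token(array $fields): ?array
{
    $ch = curl_init('https://api.instagram.com/oauth/access_token');
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($fields),
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $json = curl_exec($ch);
    if ($json === false) {
        return null;
    }
    $decoded = json_decode($json, true);
    return is_array($decoded) ? $decoded : null;
}

// In the handler (not executed here):
// $code = $_GET['code'] ?? null;
// $token = exchange_code_for_token(access_token_request_fields($id, $secret, $redirect, $code));
```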
I wrote this quickly. I'll update it with tested, working code when I have time. Feel free to suggest an edit with working code or a better explanation.

php with Curl: follow redirect with POST

I have a script that sends POST data to several pages. However, I run into difficulties sending requests to some servers, and the reason is redirection. Here's what happens:
I send a POST request to the server.
The server responds: 301 Moved Permanently.
Then curl_setopt ( $ch, CURLOPT_FOLLOWLOCATION, TRUE) kicks in and follows the redirect (but with a GET request).
To solve this, I'm using curl_setopt ( $ch, CURLOPT_CUSTOMREQUEST, "POST"), and now it redirects with POST, but without the POST body I sent in the first request. How can I force curl to send the POST body when it is redirected? Thanks!
Here's the example:
<?php
function curlPost($url, $postData = "")
{
$ch = curl_init () or exit ( "curl error: Can't init curl" );
$url = trim ( $url );
curl_setopt ( $ch, CURLOPT_URL, $url );
//curl_setopt ( $ch, CURLOPT_POST, 1 );
curl_setopt ( $ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt ( $ch, CURLOPT_POSTFIELDS, $postData );
curl_setopt ( $ch, CURLOPT_RETURNTRANSFER, true );
curl_setopt ( $ch, CURLOPT_CONNECTTIMEOUT, 30 );
curl_setopt ( $ch, CURLOPT_TIMEOUT, 30 );
curl_setopt ( $ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.109 Safari/537.36");
curl_setopt ( $ch, CURLOPT_FOLLOWLOCATION, TRUE);
$response = curl_exec ( $ch );
if (! $response) {
echo "Curl errno: " . curl_errno ( $ch ) . " (" . $url . " postdata = $postData )\n";
echo "Curl error: " . curl_error ( $ch ) . " (" . $url . " postdata = $postData )\n";
$info = curl_getinfo($ch);
echo "HTTP code: ".$info["http_code"]."\n";
// exit();
}
curl_close ( $ch );
// echo $response;
return $response;
}
?>
curl is following what RFC 7231 suggests, which also is what browsers typically do for 301 responses:
Note: For historical reasons, a user agent MAY change the request
method from POST to GET for the subsequent request. If this
behavior is undesired, the 307 (Temporary Redirect) status code
can be used instead.
If that behaviour is undesirable, you can change it with the CURLOPT_POSTREDIR option. It seems to be only sparsely documented in PHP, but the libcurl documentation explains it. By setting the right bitmask there, you make curl keep the POST method when it follows the redirect.
If you control the server end for this, an easier fix would be to make sure a 307 response code is returned instead of a 301.
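Here is a sketch of the `CURLOPT_POSTREDIR` approach. The bit values are libcurl's (`CURL_REDIR_POST_301` = 1, `CURL_REDIR_POST_302` = 2, `CURL_REDIR_POST_303` = 4). They are written as literals so the sketch does not depend on which PHP version defines the named constants. It uses `CURLOPT_POST` rather than `CURLOPT_CUSTOMREQUEST`; as far as the libcurl docs describe it, a custom request string only replaces the method name and is also applied to requests made while following redirects. With libcurl, 307 and 308 redirects keep the method and body regardless of this option. `postredir_mask` and `curl_post_keep_method` are hypothetical names.

```php
<?php
// Sketch: follow redirects while keeping the POST method and body.
// Bit values from the libcurl docs: 1 = 301, 2 = 302, 4 = 303.
function postredir_mask(array $statusCodes): int
{
    $bits = [301 => 1, 302 => 2, 303 => 4];
    $mask = 0;
    foreach ($statusCodes as $code) {
        if (!isset($bits[$code])) {
            throw new InvalidArgumentException("No POSTREDIR bit for status $code");
        }
        $mask |= $bits[$code];
    }
    return $mask;
}

function curl_post_keep_method(string $url, string $postData)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,   // a real POST instead of CURLOPT_CUSTOMREQUEST
        CURLOPT_POSTFIELDS     => $postData,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_POSTREDIR      => postredir_mask([301, 302, 303]),
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 30,
    ]);
    return curl_exec($ch); // response body on success, false on failure
}
```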

Pinterest login with PHP and cURL not working

I have been trying to make cURL log in to pinterest.com for the last 17 hours straight. I have tried countless different approaches with cURL, but none of them work.
My current code only loads the page. The data is not posted, so instead of logging in, it just takes me to the login page.
This first version uses CURLOPT_USERPWD. It takes me to the login page but does not log in.
error_reporting(E_ALL);
ini_set("display_errors", 1);
$url = "https://www.pinterest.com/login/";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt($ch, CURLOPT_AUTOREFERER, 1);
curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // allow https verification if true
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2); // check common name and verify with host name
curl_setopt($ch, CURLOPT_SSLVERSION,3); //
curl_setopt($ch, CURLOPT_CAINFO, getcwd() . "pin.pem"); // allow ssl cert direct comparison
curl_setopt($ch, CURLOPT_COOKIESESSION, TRUE); // set new cookie session
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookies.txt");
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookies.txt");
curl_setopt($ch, CURLOPT_USERPWD, "email:password");
curl_setopt($ch, CURLOPT_SSLVERSION,3);
// grab URL and pass it to the browser
curl_exec($ch);
// close cURL connection, save cookie file, free up system resources
curl_close($ch);
and if I switch it from CURLOPT_USERPWD to
curl_setopt($ch, CURLOPT_POSTFIELDS, 'username_or_email=$email&password=$password');
it just displays a blank page.
The pin.pem is the X.509 Certificate (PEM) file.
Any direction to make this work would be greatly appreciated.
Edit
New code. It still leaves a blank page, and printing the output and curl_getinfo() displays this:
Array ( [url] => https://www.pinterest.com/login/ [content_type] => [http_code] => 0 [header_size] => 0 [request_size] => 0 [filetime] => -1 [ssl_verify_result] => 0 [redirect_count] => 0 [total_time] => 0.036169 [namelookup_time] => 3.3E-5 [connect_time] => 0.036186 [pretransfer_time] => 0 [size_upload] => 0 [size_download] => 0 [speed_download] => 0 [speed_upload] => 0 [download_content_length] => -1 [upload_content_length] => -1 [starttransfer_time] => 0 [redirect_time] => 0 [certinfo] => Array ( ) [redirect_url] => )
error_reporting(E_ALL);
ini_set("display_errors", 1);
$email = 'email';
$password = 'password';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.pinterest.com/login/');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt($ch, CURLOPT_AUTOREFERER, 1);
curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($ch, CURLOPT_SSLVERSION,3); //
curl_setopt($ch, CURLOPT_CAINFO, getcwd() . 'pin.pem');
curl_setopt($ch, CURLOPT_COOKIESESSION, TRUE);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt');
curl_setopt($ch, CURLOPT_POSTFIELDS, 'username_or_email=$email&password=$password');
curl_setopt($ch, CURLOPT_SSLVERSION,3);
curl_exec($ch);
$output=@curl_exec($ch);
$info = @curl_getinfo($ch);
echo $output;
print_r($info);
curl_close($ch);
The Pinterest login process isn't quite that simple. They use a CSRF token which you must extract and send with your login, along with the username and password in the POST body.
Here is what an actual login request to Pinterest looks like, so you will need to emulate this with cURL.
POST /resource/UserSessionResource/create/ HTTP/1.1
Host: www.pinterest.com
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:28.0) Gecko/20100101 Firefox/28.0
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
X-CSRFToken: 2rjgN4Qv67pN4wX91kTr4eIkgF54CzJH
X-NEW-APP: 1
X-APP-VERSION: 737af79
X-Requested-With: XMLHttpRequest
Referer: https://www.pinterest.com/login/
Content-Length: 300
Cookie: csrftoken=2rjgN4Qv67pN4wX91kTr4eIkgF54CzJH; _pinterest_sess="aPgJnrIBzvSKLUY/4H5UocshliA47GkkGtHLQwo1H4IcQv58vrdazclonByOb4fWCzb3a3nycKjQzDc6SkCB9eBKoejaLiCjkKLk/QAFRn2x1pvHFlFM+1EoD01/yFxmeQKlvULYU9+qf4D6Mkj8A=="; _track_cm=1;
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
source_url=%2Flogin%2F&data=%7B%22options%22%3A%7B%22username_or_email%22%3A%22YOU%40YOURDOMAIN.COM%22%2C%22password%22%3A%22YOURPASSWORD%22%7D%2C%22context%22%3A%7B%7D%7D&module_path=App()%3ELoginPage()%3ELogin()%3EButton(class_name%3Dprimary%2C+text%3DLog+In%2C+type%3Dsubmit%2C+size%3Dlarge)
The line starting with source_url is the POST body (URL-encoded). Note that username_or_email is your login (I put YOU%40YOURDOMAIN.COM) and password is your password.
What you have to do is make a GET request to /login/ to establish a session and cookies in the cURL session. Then, using the same cURL handle, switch to a POST request and set CURLOPT_POSTFIELDS to the data from the source_url=... line.
You will probably also need to set the headers X-CSRFToken, X-NEW-APP, X-APP-VERSION, and X-Requested-With to match the above (except you will need to figure out how to get the correct CSRF Token value).
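Here is a sketch of building that POST body in PHP. `json_encode()` builds the `data` field, so quotes or backslashes in the password are escaped correctly. `http_build_query()` then URL-encodes the whole body. The field names and the `module_path` value are copied from the captured request above. `pinterest_login_body` is a hypothetical helper, and Pinterest may have changed any of this since.

```php
<?php
// Sketch: build the URL-encoded login body shown in the captured request.
// Field names and module_path are copied from that capture (may be outdated).
function pinterest_login_body(string $user, string $pass): string
{
    return http_build_query([
        'source_url'  => '/login/',
        'data'        => json_encode([
            'options' => ['username_or_email' => $user, 'password' => $pass],
            'context' => new stdClass(), // encodes as {} rather than []
        ]),
        'module_path' => 'App()>LoginPage()>Login()>Button(class_name=primary, text=Log In, type=submit, size=large)',
    ]);
}
```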
Unfortunately, I don't have time right now to make a working example, but the next paragraph may help. Use your browser to debug the HTTP requests and figure out every request you need to make to collect the data for the login request.
If you check out this answer it shows curl login with PHP and links to a number of useful other related answers with examples.
EDIT:
Here is a working example of using PHP and cURL to log in to Pinterest.
This code is a Pinterest PHP login example (it works as of 2014-05-11). Before using it, ask yourself whether what you want to do can be done with the API instead of this hackish code, which could break at any time.
As you can see, I parse the CSRF token out of the headers. You should probably do the same for APP-VERSION, since it can change almost daily; right now it's hard-coded.
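Instead of matching `csrftoken` in the raw headers (the code splits off only the first response's headers, which may be a redirect), you could ask cURL's cookie engine. This sketch assumes the handle has `CURLOPT_COOKIEFILE` set, as in the code below. `curl_getinfo($ch, CURLINFO_COOKIELIST)` returns the cookies in Netscape cookie-file format, one tab-separated line per cookie. `cookie_from_list` is a hypothetical helper.

```php
<?php
// Sketch: find a cookie value in the lines returned by
// curl_getinfo($ch, CURLINFO_COOKIELIST). Netscape format, tab-separated:
// domain, include-subdomains, path, secure, expiry, name, value.
// HttpOnly cookies have "#HttpOnly_" prefixed to the domain; that doesn't affect the name column.
function cookie_from_list(array $lines, string $name): ?string
{
    foreach ($lines as $line) {
        $fields = explode("\t", $line);
        if (count($fields) === 7 && $fields[5] === $name) {
            return $fields[6];
        }
    }
    return null;
}

// Usage after the first curl_exec() (not executed here):
// $csrf = cookie_from_list(curl_getinfo($ch, CURLINFO_COOKIELIST), 'csrftoken');
```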
<?php
error_reporting(E_ALL);
ini_set('display_errors', 1);
$username = 'you@yoursite.com'; // your username
$password = 'yourpassword'; // your password
// this is the http post data for logging in - username & password are substituted in later
$login_post = array(
'source_url' => '/login/',
'data' => '{"options":{"username_or_email":"%s","password":"%s"},"context":{}}',
'module_path' => 'App()>LoginPage()>Login()>Button(class_name=primary, text=Log In, type=submit, size=large)',
);
$pinterest_url = 'https://www.pinterest.com/'; // pinterest home url
$login_url = $pinterest_url . 'login/'; // pinterest login page url
$login_post_url = $pinterest_url . 'resource/UserSessionResource/create/'; // pinterest login post url
// http headers to send with requests
$httpheaders = array(
'Connection: keep-alive',
'Pragma: no-cache',
'Cache-Control: no-cache',
'Accept-Language: en-US,en;q=0.5',
);
// http headers to send when logging in
$login_header = array(
'X-NEW-APP: 1',
'X-APP-VERSION: d2bb370', // THIS WILL UPDATE FREQUENTLY, CHANGE IT!!!
'X-Requested-With: XMLHttpRequest',
'Accept: application/json, text/javascript, */*; q=0.01');
// ----------------------------------------------------------------------------
// request home page to establish cookies and a session, set curl options
$ch = curl_init($pinterest_url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_AUTOREFERER, 1);
curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Iron/31.0.1700.0 Chrome/31.0.1700.0');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_STDERR, fopen('/tmp/debug.txt', 'w+'));
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, $httpheaders);
$data = curl_exec($ch);
// ----------------------------------------------------------------------------
// parse the csrf token out of the cookies to set later when logging in
list($headers, $body) = explode("\r\n\r\n", $data, 2);
preg_match('/csrftoken=(.*?)[\b;\s]/i', $headers, $csrf_token);
// next request the login page
curl_setopt($ch, CURLOPT_URL, $login_url);
$data = curl_exec($ch);
// ----------------------------------------------------------------------------
// perform login post
$login_header[] = 'X-CSRFToken: ' . $csrf_token[1];
$login_post['data'] = sprintf($login_post['data'], $username, $password);
$post = http_build_query($login_post);
curl_setopt($ch, CURLOPT_URL, $login_post_url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
curl_setopt($ch, CURLOPT_HTTPHEADER, array_merge($httpheaders, $login_header));
curl_setopt($ch, CURLOPT_REFERER, $login_url);
curl_setopt($ch, CURLOPT_HEADER, 0);
$data = curl_exec($ch);
// check response and output status
if (curl_getinfo($ch, CURLINFO_HTTP_CODE) != 200) {
echo "Error logging in.<br />";
var_dump(curl_getinfo($ch));
} else {
$response = json_decode($data, true);
if ($response === null) {
echo "Failed to decode JSON response.<br /><br />";
var_dump($response);
} else if ($response['resource_response']['error'] === null) {
echo "Login successful, " . $response['resource_response']['data']['username'] . "<br /><br />";
echo "You have {$response['resource_response']['data']['follower_count']} followers, are following {$response['resource_response']['data']['following_count']} users. You have liked {$response['resource_response']['data']['like_count']} pins.";
}
}
My output:
Login successful, drew010
You have 0 followers, are following 0 users. You have liked 0 pins.
FYI, Pinterest rate-limits logins, so don't run this before every request.
Here is my Ruby implementation of the Pinterest login/session mechanism.
Run this once a day to save the headers (including csrftoken). Then use the saved headers to do requests that are not (yet) supported by the api (like ads reports).
class PinterestHeadersScheduler
include Sidekiq::Worker
sidekiq_options queue: :recurring, retry: 0
HOMEPAGE = 'https://ads.pinterest.com/'
LOGIN_URL = "#{HOMEPAGE}login/"
SESSION_URL = "#{HOMEPAGE}resource/UserSessionResource/create/"
LOGIN_DATA = {
source_url: '/login/',
data: { options: { username_or_email: ENV['PI_USERNAME'], password: ENV['PI_PASSWORD'] }, context: {} }.to_json
}
HEADERS = {
'Accept': 'application/json,text/html,image/webp,image/apng,*/*;q=0.8',
'Origin': 'https://ads.pinterest.com',
'Referer': 'https://ads.pinterest.com/',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36',
'Connection': 'keep-alive',
'Cache-Control': 'no-cache',
'Accept-Charset': 'utf-8;ISO-8859-1q=0.7,*;q=0.7',
'Accept-Encoding': 'gzip, deflate',
'Accept-Language': 'en-US,en;q=0.8'
}
SESSION_HEADERS = HEADERS.merge({
'Accept': 'application/json',
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
'X-Requested-With': 'XMLHttpRequest'
})
def perform
login = HTTParty.get(LOGIN_URL, { headers: HEADERS })
cjar = login.get_fields('Set-Cookie').each_with_object(HTTParty::CookieHash.new) { |cookie, jar| jar.add_cookies(cookie) }
headers = SESSION_HEADERS.merge({ 'Cookie': cjar.to_cookie_string, 'X-CSRFToken': cjar[:csrftoken] })
res = HTTParty.post(SESSION_URL, { headers: headers, body: LOGIN_DATA.to_param })
session = JSON.parse(ActiveSupport::Gzip.decompress(res.body))
raise "login error #{session['resource_response']['error']}" if session['resource_response']['error']
cjar = res.headers.get_fields('Set-Cookie').each_with_object(HTTParty::CookieHash.new) { |cookie, jar| jar.add_cookies(cookie) }
save_session_headers(HEADERS.merge({ 'Cookie' => cjar.to_cookie_string }))
end
def save_session_headers(headers)
# replace this with your cache/db
Utils::RedisUtil.set(:pinterest_session_headers, headers.to_json)
end
end

PHP Curl with a file attachment

I am trying to simulate a PHP cURL POST that requires a file upload.
Here is the HTML form from the website I am trying to POST to: http://pastebin.com/X6Y0mmfP
The file I need to upload is "domains.txt", which is in the same directory as the script.
Using Live HTTP headers (firefox addon) I've retrieved this information:
POST to: http://www.majesticseo.com/reports/bulk-backlinks-upload
HTTP Headers:
Host: www.majesticseo.com
User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: https://www.majesticseo.com/reports/bulk-backlink-checker
Cookie: _pk_id.2.d6bc=a607157d494109d4.1382175578.4.1388174858.1384073229.; RURI=reports%2Fbulk-backlink-checker; _pk_ses.2.d6bc=*; STOK=Ox09WRWBeFCU3l3TAim86efmBa
Connection: keep-alive
Content-Type: multipart/form-data; boundary=---------------------------210646678590
Content-Length: 1106
POST Content:
-----------------------------210646678590\r\n
Content-Disposition: form-data; name="fileType"\r\n
\r\n
SingleColumn\r\n
-----------------------------210646678590\r\n
Content-Disposition: form-data; name="indexType"\r\n
\r\n
F\r\n
-----------------------------210646678590\r\n
Content-Disposition: form-data; name="ajaxLoadUrl"\r\n
\r\n
/reports/downloads/confirm-file-upload/backlinksAjax\r\n
-----------------------------210646678590\r\n
Content-Disposition: form-data; name="file"; filename="domains.txt"\r\n
Content-Type: text/plain\r\n
\r\n
facebook.com\n
twitter.com\n
google.com\n
youtube.com\n
wordpress.org\n
adobe.com\n
blogspot.com\n
wikipedia.org\n
wordpress.com\n
linkedin.com\n
yahoo.com\n
amazon.com\n
flickr.com\n
w3.org\n
pinterest.com\n
apple.com\n
tumblr.com\n
myspace.com\n
microsoft.com\n
vimeo.com\n
digg.com\n
qq.com\n
stumbleupon.com\n
baidu.com\n
addthis.com\n
miibeian.gov.cn\n
statcounter.com\n
bit.ly\n
feedburner.com\n
nytimes.com\n
reddit.com\n
delicious.com\n
msn.com\n
macromedia.com\n
bbc.co.uk\n
weebly.com\n
blogger.com\n
icio.us\n
goo.gl\n
gov.uk\n
cnn.com\n
yandex.ru\n
webs.com\n
google.de\n
mail.ru\n
livejournal.com\n
sourceforge.net\n
go.com\n
imdb.com\n
jimdo.com\n
\r\n
-----------------------------210646678590--\r\n
In the manual browser upload, I am using domains.txt - which is also the file on the server (in the same directory as the script).
My script first logs in and then attempts to make this request.
This is what I have tried to do so far, however it is not being accepted:
$ch = curl_init();
$post = array('fileType' => 'SingleColumn',
'indexType' => 'F',
'ajaxLoadUrl' => '/reports/downloads/confirm-file-upload/backlinksAjax',
'file'=>'@'.realpath('./domains.txt') . ';filename=domains.txt'
);
$post = http_build_query($post);
curl_setopt($ch, CURLOPT_URL, "https://www.majesticseo.com/reports/bulk-backlinks-upload");
curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5 );
curl_setopt($ch, CURLOPT_COOKIEJAR, 'majestic.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'majestic.txt');
curl_setopt($ch, CURLOPT_REFERER, 'https://www.majesticseo.com/reports/bulk-backlink-checker');
curl_setopt ($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
$result = curl_exec($ch);
curl_close($ch);
cURL doesn't work well with relative paths, so provide the full path.
For example:
realpath('/home/user/public_html/domains.txt')
Here is the function I use, and how I build the data for the request:
function send_curl_request_with_attachment($method, $headers, $url, $post_fields) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
if($headers != "" && count($headers) > 0){
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
}
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, $method);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_fields);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_VERBOSE,true);
$result = curl_exec($ch);
curl_close($ch);
return $result;
}
$token_slams = "Authorization: Bearer " . $access_token;
$authHeader = array(
$token_slams,
'Accept: application/form-data');
$schedule_path ='../../documents/' . $docs_record["document"];
$cFile = curl_file_create($schedule_path);
$post = array(
'old_record' => $old_record,
'employer_number' => $employer_number,
'payment_date' => $payment_date,
'fund_year' => $fund_year,
'fund_month' => $fund_month,
'employer_schedule'=> $cFile
);
send_curl_request_with_attachment("POST", $authHeader, $my_url, $post);
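Here is a sketch for the original Majestic SEO form using `CURLFile`, the class behind `curl_file_create()`. It has been available since PHP 5.5; the `'@'` prefix was deprecated then and no longer works in PHP 7. Two things matter. First, pass the array straight to `CURLOPT_POSTFIELDS` so that cURL builds a multipart/form-data body. Running it through `http_build_query()`, as the question does, turns it into a URL-encoded string, and no file is sent. Second, give `CURLFile` an absolute path. The field names come from the captured request in the question; `majestic_upload_fields` is a hypothetical helper.

```php
<?php
// Sketch: multipart fields for the upload form captured in the question.
function majestic_upload_fields(string $path): array
{
    $real = realpath($path); // absolute path; false if the file does not exist
    if ($real === false) {
        throw new InvalidArgumentException("File not found: $path");
    }
    return [
        'fileType'    => 'SingleColumn',
        'indexType'   => 'F',
        'ajaxLoadUrl' => '/reports/downloads/confirm-file-upload/backlinksAjax',
        // CURLFile(path on disk, MIME type, filename reported to the server)
        'file'        => new CURLFile($real, 'text/plain', 'domains.txt'),
    ];
}

// Usage (not executed here): pass the array itself, not http_build_query() of it.
// curl_setopt($ch, CURLOPT_POSTFIELDS, majestic_upload_fields(__DIR__ . '/domains.txt'));
```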

PHP CURL: Complete manipulation of HTTP Headers not allowed?

I've been working on writing a script which automatically logs me into my school's network, checks if the classes I'm trying to get into are no longer completely full, and if a spot has opened up, registers the class for me. However, I've hit a big snag in just the logging-in process.
Basically, I've been looking at the headers that are sent when I log in and try to replicate them. The problem is I keep getting an error saying "HTTP/1.1 400 Bad Request Content-Type: text/html Date: Sat, 23 Oct 2010 18:42:20 GMT Connection: close Content-Length: 42
Bad Request (Invalid Header Name)".
I'm guessing it has something to do with the Host parameter I'm setting being different from the real one. I set it to elion.psu.edu, but when I look at the headers from my script, it has changed back to grantbachman.com, where the script is hosted. It's probably best to just show you.
The beginning of the header I'm trying to create:
https://elion.psu.edu/cgi-bin/elion-student.exe/submit
POST /cgi-bin/elion-student.exe/submit HTTP/1.1
Host: elion.psu.edu
The beginning of the header which shows up when I run my script:
http://myDomain.com/myScriptName.php
GET /elionScript.php HTTP/1.1
Host: myDomain.com
Basically, the first line is different, the Host name is different, and it says I'm sending my info with GET instead of POST (even though I set CURLOPT_POST to true). I'm looking for any help with altering this info so that the server accepts my script. I'm fresh out of ideas. Thanks.
Oh here's the code I'm using:
$data = array(
"$userIDName" => '********',
"$passName" => '********',
"$submitName" => 'Login+to+eLion',
'submitController' => '',
'forceUnicode' => '%D0%B4%D0%B0',
'sessionKey' => "$sessionValue",
'pageKey' => "$pageKeyValue",
'shopperID' => '');
$contentLength = strlen($userIDName . '=*********&' . $passName . '=********&' . $submitName .'=Login+to+eLion&submitController=&forceUnicode=%D0%B4%D0%B0&sessionKey=' . $sessionValue . '&pageKey=' . $pageKeyValue . '&shopperID=');
$ch = curl_init("https://elion.psu.edu/cgi-bin/elion-student.exe/submit");
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_COOKIE,'sessionKey="$sessionValue";pageKey="$pageKeyValue";BIGipServerelion_prod_pool="$prodPoolValue"');
curl_setopt($ch,CURLOPT_HTTPHEADER,array(
'Host: elion.psu.edu',
'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.10) Gecko/20100914 Firefox/3.6.10',
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language: en-us,en;q=0.5','Accept-Encoding: gzip,deflate',
'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7',
'Keep-Alive: 115',
'Referer: https://elion.psu.edu/cgi-bin/elion-student.exe/launch/ELionMainGUI/Student',
"Cookie: sessionKey=$sessionValue; pageKey=$pageKeyValue; BIGipServerelion_prod_pool=$prodPoolValue", 'Content-Type: application/x-www-form-urlencoded',
"Content-Length: $contentLength"));
$contents2 = curl_exec ($ch);
It's also probably important to note that when I run the script, none of the information below the 'Keep-Alive: 115' line is displayed when I view the header.
It seems some code is missing from your question, but try this:
1- Save your certificate on your server
2- Try this code
$pg = curl_init();
// Set the form data for posting the login information
$postData = array();
$postData["username"] = $username;
$postData["password"] = $password;
// http_build_query() URL-encodes each value and joins the pairs with "&"
$postText = http_build_query( $postData );
curl_setopt( $pg, CURLOPT_URL, $YOUR_URL );
curl_setopt( $pg, CURLOPT_POST, true );
curl_setopt( $pg, CURLOPT_POSTFIELDS, $postText );
curl_setopt( $pg, CURLOPT_SSL_VERIFYPEER, true );
curl_setopt( $pg, CURLOPT_SSL_VERIFYHOST, 2 );
curl_setopt( $pg, CURLOPT_CAINFO, getcwd() . '/web'); //web is the exported certificate
//curl_setopt( $pg, CURLOPT_VERBOSE, true ); // for debug
//curl_setopt( $pg, CURLOPT_RETURNTRANSFER, true); // if you want an output
curl_setopt( $pg, CURLOPT_COOKIEJAR, "cookies.txt" );
curl_setopt( $pg, CURLOPT_COOKIEFILE, "cookies.txt" );
curl_setopt( $pg, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; MSIE 5.01; Windows NT 5.0)" );
curl_setopt( $pg, CURLOPT_FOLLOWLOCATION, true );
curl_setopt( $pg, CURLOPT_COOKIE, session_name() . '=' . session_id() );
if( ( $response = curl_exec( $pg ) ) === false ) {
echo 'cURL error: ' . curl_error($pg) . "\n";
}
curl_close($pg);
$YOUR_URL: https://elion.psu.edu/cgi-bin/elion-student.exe/launch/ELionMainGUI/Student
The form uses dynamic field names, so it's not as simple as "username" and "password". Check the website to find the right ones.
Don't forget to add the other hidden fields to the postData array, as you did in your own code, and update the cookie section too.
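Two details in the question are worth checking, though this is a guess since we can't see the server. The headers you saw (`GET /elionScript.php`, `Host: myDomain.com`) are probably your browser's request to your own script, not the request cURL sends. Setting `CURLINFO_HEADER_OUT` lets you read the latter. Also, passing an array to `CURLOPT_POSTFIELDS` makes cURL send multipart/form-data, so a hand-computed `Content-Length` for a URL-encoded string likely won't match the body. The sketch below lets cURL set the request line, `Host`, `Content-Type` and `Content-Length` itself. The helper names are hypothetical and the field values are placeholders. Note that `http_build_query()` expects raw values: `'да'` rather than `'%D0%B4%D0%B0'`, and `'Login to eLion'` rather than `'Login+to+eLion'`.

```php
<?php
// Sketch: let cURL build the request line and the Host, Content-Type and
// Content-Length headers. Field names and values are placeholders.
function build_form_body(array $fields): string
{
    // Values are percent-encoded here, so pass them unencoded.
    return http_build_query($fields);
}

function post_form(string $url, array $fields, string $cookieFile): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        // A string body is sent as application/x-www-form-urlencoded;
        // an array would be sent as multipart/form-data instead.
        CURLOPT_POSTFIELDS     => build_form_body($fields),
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_COOKIEJAR      => $cookieFile,
        CURLOPT_COOKIEFILE     => $cookieFile,
        CURLINFO_HEADER_OUT    => true, // record the request headers actually sent
    ]);
    $body = curl_exec($ch);
    // The request headers cURL really sent to the remote server:
    $sentHeaders = curl_getinfo($ch, CURLINFO_HEADER_OUT);
    return [$body, $sentHeaders];
}
```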
