I wrote a script that iterates through a few pages of a third-party website looking for data, and I need to run it from cron once a day. As currently written (I tested it in a browser), the script reloads itself with JavaScript to go to the next page if the data it seeks isn't found, so it won't work under cron. The problem with simply looping is that I can't run this function multiple times: http_get(), as defined by
function http_get($target, $ref)
{
return http($target, $ref, $method="GET", $data_array="", EXCL_HEAD);
}
function http($target, $ref, $method, $data_array, $incl_head)
{
# Initialize PHP/CURL handle
$ch = curl_init();
# Process data, if presented
if(is_array($data_array))
{
# Convert data array into a query string (ie animal=dog&sport=baseball)
foreach ($data_array as $key => $value)
{
if(strlen(trim($value))>0)
$temp_string[] = $key . "=" . urlencode($value);
else
$temp_string[] = $key;
}
$query_string = join('&', $temp_string);
}
# HEAD method configuration
if($method == HEAD)
{
curl_setopt($ch, CURLOPT_HEADER, TRUE); // Include the HTTP head
curl_setopt($ch, CURLOPT_NOBODY, TRUE); // Do not return the body
}
else
{
# GET method configuration
if($method == GET)
{
if(isset($query_string))
$target = $target . "?" . $query_string;
curl_setopt ($ch, CURLOPT_HTTPGET, TRUE);
curl_setopt ($ch, CURLOPT_POST, FALSE);
}
# POST method configuration
if($method == POST)
{
if(isset($query_string))
curl_setopt ($ch, CURLOPT_POSTFIELDS, $query_string);
curl_setopt ($ch, CURLOPT_POST, TRUE);
curl_setopt ($ch, CURLOPT_HTTPGET, FALSE);
}
curl_setopt($ch, CURLOPT_HEADER, $incl_head); // Include head as needed
curl_setopt($ch, CURLOPT_NOBODY, FALSE); // Return body
}
curl_setopt($ch, CURLOPT_COOKIEJAR, COOKIE_FILE); // Cookie management.
curl_setopt($ch, CURLOPT_COOKIEFILE, COOKIE_FILE);
curl_setopt($ch, CURLOPT_TIMEOUT, CURL_TIMEOUT); // Timeout
curl_setopt($ch, CURLOPT_USERAGENT, WEBBOT_NAME); // Webbot name
curl_setopt($ch, CURLOPT_URL, $target); // Target site
curl_setopt($ch, CURLOPT_REFERER, $ref); // Referer value
curl_setopt($ch, CURLOPT_VERBOSE, FALSE); // Minimize logs
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE); // No certificate
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow redirects
curl_setopt($ch, CURLOPT_MAXREDIRS, 4); // Limit redirections to four
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Return in string
$return_array['FILE'] = curl_exec($ch);
# Create return array
$return_array['STATUS'] = curl_getinfo($ch);
$return_array['ERROR'] = curl_error($ch);
# Close PHP/CURL handle
curl_close($ch);
# Return results
return $return_array;
}
Any ways I can get around this? Thanks
I'm not sure what your issue is - you could just have the loop contain the code that calls the functions, not the functions themselves.
Alternatively, use function_exists to test if you've already defined the functions.
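A minimal sketch of both suggestions. The names fetch_page and find_data, the example URLs, and the canned page contents are hypothetical stand-ins for the asker's http_get() and parsing code; a real fetch_page would use cURL.

```php
<?php
// Guard the definition so re-running or re-including this code never redeclares it.
if (!function_exists('fetch_page')) {
    // Hypothetical stand-in for http_get(): returns canned HTML per URL.
    function fetch_page(string $url): string
    {
        $pages = [
            'http://example.com/?page=1' => 'nothing here',
            'http://example.com/?page=2' => 'the data: 42',
        ];
        return $pages[$url] ?? '';
    }
}

// Loop over the calls, not the definitions: stop at the first page with the data.
function find_data(array $urls, string $needle): ?string
{
    foreach ($urls as $url) {
        $html = fetch_page($url);
        if (strpos($html, $needle) !== false) {
            return $url; // found, no need to visit further pages
        }
    }
    return null; // not found on any page
}

$urls = ['http://example.com/?page=1', 'http://example.com/?page=2'];
$found = find_data($urls, 'the data');
echo $found === null ? "not found\n" : "found on $found\n";
```

Because the loop runs inside one process, the script no longer needs a JavaScript reload to move to the next page, which is what makes it usable from cron.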
I have this script:
$ch = curl_init($url_path.'admin/');
$cookiefile = $srv_path."admin/cookie.txt" ;
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookiefile);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec ($ch);
//
curl_setopt($ch, CURLOPT_PROXY, "http://127.0.0.1.:8888");
curl_setopt($ch, CURLOPT_PROXYPORT, 8888);
curl_setopt($ch, CURLOPT_HEADER, 1);
preg_match('/^Set-Cookie: (.*?);/m', curl_exec($ch), $m);
$xid = substr($m[1], 10);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url_path.'admin/login.php');
// ENABLE HTTP POST
curl_setopt($ch, CURLOPT_POST, 1);
//SET POST PARAMETERS : FORM VALUES FOR EACH FIELD
curl_setopt($ch, CURLOPT_POSTFIELDS, 'xid_7d781='.$xid.'$username=*****&password=******&mode=login&usertype=P&redirect=admin');
# Setting CURLOPT_RETURNTRANSFER variable to 1 will force cURL # not to print out the results of its query.
# Instead, it will return the results as a string return value # from curl_exec() instead of the usual true/false.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // Allow redirection
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookiefile);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookiefile);
//
//echo('EXECUTE 1st REQUEST (FORM LOGIN)');
$page = curl_exec ($ch);
curl_close ($ch);
echo $page;
But when I call it, the form fields are blank. Furthermore, the request method that comes back is GET and not POST. Why is it GET?
The same code runs correctly when I try it against another domain, which is also very strange. Is it possible that the POST method is being blocked by something?
I tried to echo any possible errors with var_dump(curl_error($ch)); but the string is empty.
Hey, I am trying to log in to the NJIT site to check whether a username and password are correct. For some reason I keep getting rejected even if I use correct credentials. Also, how do I search $result to check whether it contains "Fail", which would mean the credentials were wrong? Here is my code.
Main:
<?PHP
session_start();
require_once('functions.php');
//$UCID=$_POST['UCID'];
//$Pass=$_POST['Pass'];
$UCID="jko328";
$Pass="password";
$credentialsNJIT="user=".$UCID."&pass=".$Pass;
$njit_url="https://cp4.njit.edu/cp/home/login";
$njit_result=goCurlNJIT($credentialsNJIT, $njit_url);
echo $result;
?>
Here is the cURL function:
function goCurlNJIT($postdata, $url){
session_start();
$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt ($ch, CURLOPT_TIMEOUT, 60);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_PROTOCOLS, CURLPROTO_HTTPS);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt ($ch, CURLOPT_REFERER, $url);
curl_setopt ($ch, CURLOPT_POSTFIELDS, $postdata);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt ($ch, CURLOPT_POST, true);
$result = curl_exec($ch);
curl_close($ch);
if(strpos($result, "Failed") === false){
$response = "NJIT did not like credentials";
}
else{
$response = "NJIT liked your credentials";
}
echo $response;
}
When a browser loads a page, it saves the cookies the site sets and sends them back on later requests. So to sign in, you first need to access the page without credentials and save the cookies, then send those cookies with the next request. To block bot logins, websites often use dynamic hidden fields and other protections; in that case you can't log in this way.
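The two-step flow described above could be sketched as follows. The helper names request_with_cookies and login are hypothetical, and the sketch assumes the site only needs its session cookies sent back; any dynamic hidden fields would have to be parsed from the first response and added to $fields.

```php
<?php
// Hypothetical helper: one request that reads from and writes to the same cookie jar.
function request_with_cookies(string $url, string $cookieFile, ?string $postBody = null)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);   // follow redirects
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile); // save cookies when the handle is released
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // send previously saved cookies
    if ($postBody !== null) {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $postBody);
    }
    $body = curl_exec($ch); // string on success, false on failure
    curl_close($ch);
    return $body;
}

// Step 1: plain GET to collect the cookies. Step 2: POST the credentials with them.
function login(string $loginUrl, array $fields, string $cookieFile)
{
    request_with_cookies($loginUrl, $cookieFile);
    return request_with_cookies($loginUrl, $cookieFile, http_build_query($fields));
}
```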
I've reworked the function to make it more flexible; you can extend it further if you want.
First and most importantly, you need to create a text file named cookie.txt in the directory where your scraping script is.
function goCurlNJIT($header, $url, $post = false) // pass array() as $header when no extra headers are needed
{
$cookie = "cookie.txt";
$ch = curl_init();
curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate,sdch');
if (isset($header) && !empty($header))
{
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
}
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 200);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36");
curl_setopt($ch, CURLOPT_COOKIEJAR, realpath($cookie));
curl_setopt($ch, CURLOPT_COOKIEFILE, realpath($cookie));
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_REFERER, $url);
//if it's a POST request instead of GET
if (isset($post) && !empty($post) && $post)
{
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
} //endif
$data = curl_exec($ch);
$info = curl_getinfo($ch); // read the status before closing the handle
curl_close($ch);
if($info['http_code'] == 200){
return ($data); //this will return page on successful, false otherwise
}else{
return false;
}
}
If you look closely at the request headers, you'll notice one unusual header: Upgrade-Insecure-Requests: 1. (I personally haven't seen this before, so it's wise to send it along with the request.)
Next, the request you're posting is incomplete; you're missing fields.
$UCID="jko328";
$Pass="password";
$credentialsNJIT="user=".$UCID."&pass=".$Pass; // where is uuid?
It should be something like this; you're leaving uuid out of the POST string.
pass=password&user=jko328&uuid=0xACA021
So, putting it all together:
$post['user'] = "jko328";
$post['pass'] = "password";
$post['uuid'] = '0xACA021';
$urlToPost = "https://cp4.njit.edu/cp/home/login";
$header[] = 'Upgrade-Insecure-Requests: 1'; // CURLOPT_HTTPHEADER expects "Name: value" strings
//Now make call to function, and it'll work fine.
echo goCurlNJIT($header, $urlToPost, http_build_query($post));
This should work. Make sure you've created the cookie.txt file in the same directory as this script.
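For the second part of the question (checking whether the response contains "Fail"), a small helper might look like this. The function name login_failed is hypothetical, and the marker text is taken from the question; whether the site actually prints it is an assumption. Note that strpos() and stripos() return false when the marker is absent, so an absent marker is the success case.

```php
<?php
// Returns true when the response looks like a rejected login.
function login_failed($result, string $marker = 'Fail'): bool
{
    // curl_exec() returns false on transport errors; treat that as a failure too.
    if ($result === false) {
        return true;
    }
    // stripos() is case-insensitive; compare with !== false because a match
    // at offset 0 would otherwise be read as "not found".
    return stripos($result, $marker) !== false;
}

var_dump(login_failed('Login Failed: bad password')); // bool(true)
var_dump(login_failed('Welcome back'));               // bool(false)
```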
I am calling the Twitter API to grab tweets (this already works on page load), but now I want the page to update automatically, loading new tweets without a reload or user interaction.
I know this type of functionality is possible (monitter.com does it), but what technology is used to do so? Can I do it with PHP?
Thanks
As @suresh.g said, you can use AJAX. The simplest way: use jQuery.
Also, you can use an iframe that reloads every 10 seconds with the setInterval() JavaScript function. The user's whole page will not reload, only the Twitter iframe.
Another type of technology is COMET or PUSH technology, but I don't think you need it right now, but it's good to know about it ;)
use curl
function curl_grab_page($url,$data,$secure="false",$ref_url="",$login = "false",$proxy = "null",$proxystatus = "false")
{
if($login == 'true') {
$fp = fopen("cookie.txt", "w");
fclose($fp);
}
$ch = curl_init();
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
if ($proxystatus == 'true') {
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, TRUE);
curl_setopt($ch, CURLOPT_PROXY, $proxy);
}
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
if($secure=='true')
{
curl_setopt($ch, CURLOPT_SSLVERSION,3);
}
curl_setopt( $ch, CURLOPT_HTTPHEADER, array( 'Expect:' ) );
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, $ref_url);
curl_setopt($ch, CURLOPT_HEADER, TRUE);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_POST, TRUE);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
$result = curl_exec($ch); // execute the request; the body comes back as a string
curl_close($ch);
unset($ch);
return $result; // return last so the cleanup above actually runs
}
Just call this function with the URL and the data you want to send. Don't forget to enable cURL in your php.ini.
thanks
I am trying to scrape a Facebook page ( https://www.facebook.com/pages/PTSD/455847705426 )
I found this script to login to facebook.
<?php
$EMAIL = "me@mail.com";
$PASSWORD = "facebookPassword";
function cURL($url, $header=NULL, $cookie=NULL, $p=NULL)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, $header);
curl_setopt($ch, CURLOPT_NOBODY, $header);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_COOKIE, $cookie);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
if ($p) {
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $p);
}
$result = curl_exec($ch);
if ($result) {
return $result;
} else {
return curl_error($ch);
}
curl_close($ch);
}
$a = cURL("https://login.facebook.com/login.php?login_attempt=1",true,null,"email=$EMAIL&pass=$PASSWORD");
preg_match('%Set-Cookie: ([^;]+);%',$a,$b);
$c = cURL("https://login.facebook.com/login.php?login_attempt=1",true,$b[1],"email=$EMAIL&pass=$PASSWORD");
preg_match_all('%Set-Cookie: ([^;]+);%',$c,$d);
for($i=0;$i<count($d[0]);$i++)
$cookie.=$d[1][$i].";";
/*
NOW TO JUST OPEN ANOTHER URL EDIT THE FIRST ARGUMENT OF THE FOLLOWING FUNCTION.
TO SEND SOME DATA EDIT THE LAST ARGUMENT.
*/
$page_html = cURL("https://www.facebook.com/pages/PTSD/455847705426",null,$cookie,null);
?>
Now the variable $page_html contains only a few posts, and they are buried in very complex markup.
My questions are:
How can I get all posts?
Is there some other approach that returns complete, clean data?
Is there some way to get all posts in JSON format?
Please tell me if there are useful tutorials or articles on this.
Regards
Spend some time reading the developer documentation. You can get all the posts as a JSON object from a page by setting up an app, then querying the graph api with a page access token.
Assume the captcha key is invalid: the script needs to download a new captcha image and re-validate the captcha key. How can that be done?
I have included a short example; is this the way to do it?
while (1) {
$postData = http_build_query($data);
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_CAINFO, getcwd() . "\**********************.crt");
curl_setopt($ch, CURLOPT_URL, "https://domain.com/test" . $form_link);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_COOKIESESSION, TRUE);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookiesPath . "/cookiefile.txt");
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookiesPath . "/cookiefile.txt");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$page = curl_exec($ch);
//Just a quick example
if ($page == "Sucess") {
break;
} else {
$ch = curl_init();
//Some curl code here to Re-download Captcha Image (new image)
$data['captchaText'] = CaptchaToText::Scan("images/captcha.jpg");
}
}
Yes, you are doing it right, but only in the first part.
You already have an initialized cURL resource ($ch).
So you only need to execute the request again with curl_exec($ch) and you will get a new page.
All the options set by curl_setopt() are saved in the resource.
Here is the code:
if ($page == "Sucess") {
break;
} else {
$page = curl_exec($ch);
//Some curl code here to Re-download Captcha Image (new image)
$data['captchaText'] = CaptchaToText::Scan("images/captcha.jpg");
}
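One caveat when reusing the handle: CURLOPT_POSTFIELDS also keeps the body that was set earlier, so if captchaText changes, the POST body has to be set again before the next curl_exec(). A minimal sketch of a bounded retry loop follows; submit_with_retries and make_curl_submit are hypothetical names, the two callables stand in for the POST and for the CaptchaToText::Scan() step, and the sketch assumes the handle has CURLOPT_RETURNTRANSFER on and CURLOPT_HEADER off so the response is exactly the body.

```php
<?php
// Retry a submission up to $maxTries times, refreshing the captcha answer between tries.
function submit_with_retries(callable $submit, callable $solveCaptcha, array $data, int $maxTries = 5): bool
{
    for ($try = 1; $try <= $maxTries; $try++) {
        if ($submit($data) === 'Sucess') { // marker string spelled as in the question
            return true;
        }
        // Rejected: fetch and solve a fresh captcha, then try again.
        $data['captchaText'] = $solveCaptcha();
    }
    return false; // give up instead of looping forever
}

// With cURL, the submit step can reuse one handle and only refresh the POST body.
function make_curl_submit($ch): callable
{
    return function (array $data) use ($ch) {
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($data));
        return curl_exec($ch);
    };
}
```

A cap on the number of tries replaces while (1) so that a captcha that never validates cannot hang the script.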