PHP Curl request not working but works fine in POSTMAN - php

I am trying to login to MCA portal ( POST URL : http://www.mca.gov.in/mcafoportal/loginValidateUser.do )
I tried logging in with POSTMAN app on Google Chrome which works fine. However, it doesnt work either in PHP/Python. I am not able to login through PHP/Python
Here is the PHP Code :
$url="http://www.mca.gov.in/mcafoportal/loginValidateUser.do";
$post_fields = array();
$post_fields['userNamedenc']='hGJfsdnk`1t';
$post_fields['passwordenc']='675894242fa9c66939d9fcf4d5c39d1830f4ddb9';
$post_fields['accessCode'] = ""
$str = call_post_mca($url, $post_fields);
$str = str_replace(" ","",$str);
$dom = new DOMDocument();
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);
$input_id = '//input[#id="login_accessCode"]/#value';
$input_val = $xpath->query($input_id)->item(0);
$input_val1 = $input_val->nodeValue;
$url="http://www.mca.gov.in/mcafoportal/loginValidateUser.do";
$post_fields['userNamedenc']='hGJfsdnk`1t';
$post_fields['passwordenc']='675894242fa9c66939d9fcf4d5c39d1830f4ddb9';
$post_fields['accessCode'] = $input_val1; //New Accesscode
function call_post_mca($url, $params)
{
#$user_agent = getRandomUserAgent();
$user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36";
$str = "";
foreach($params as $key=>$value)
{
$str = $str . "$key=$value" . "&";
}
$postData = rtrim($str, "&");
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch,CURLOPT_HEADER, false);
#curl_setopt($ch, CURLOPT_CAINFO, DOC_ROOT . '/includes/cacert.pem');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
curl_setopt($ch,CURLOPT_USERAGENT, $user_agent);
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt ($ch, CURLOPT_REFERER, $url);
$cookie= DOC_ROOT . "/cookie.txt";
curl_setopt ($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt ($ch, CURLOPT_COOKIEFILE, $cookie);
$output=curl_exec($ch);
curl_close($ch);
return $output;
}
Any idea what is missing ?

The site does a redirect, so you need to add
CURLOPT_FOLLOWLOCATION => 1
to your options array. When in doubt with cURL, try
$status = curl_getinfo($curl);
echo json_encode($status, JSON_PRETTY_PRINT);
giving :
{
"url": "http:\/\/www.mca.gov.in\/mcafoportal\/loginValidateUser.do?userNamedenc=hGJfsdnk%601t&passwordenc=675894242fa9c66939d9fcf4d5c39d1830f4ddb9&accessCode=-825374456",
"content_type": "text\/plain",
"http_code": 302,
"header_size": 1560,
"request_size": 245,
"filetime": -1,
"ssl_verify_result": 0,
"redirect_count": 0,
"total_time": 1.298891,
"namelookup_time": 0.526375,
"connect_time": 0.999786,
"pretransfer_time": 0.999844,
"size_upload": 0,
"size_download": 0,
"speed_download": 0,
"speed_upload": 0,
"download_content_length": 0,
"upload_content_length": -1,
"starttransfer_time": 1.298875,
"redirect_time": 0,
"redirect_url": "http:\/\/www.mca.gov.in\/mcafoportal\/login.do",
"primary_ip": "115.114.108.120",
"certinfo": [],
"primary_port": 80,
"local_ip": "192.168.1.54",
"local_port": 62524
}
As you can see, you got a 302 redirect status, but a redirect_count was 0. After adding the option, i get:
{
"url": "http:\/\/www.mca.gov.in\/mcafoportal\/login.do",
"content_type": "text\/html;charset=ISO-8859-1",
"http_code": 200,
"header_size": 3131,
"request_size": 376,
"filetime": -1,
"ssl_verify_result": 0,
"redirect_count": 1,
"total_time": 2.383609,
"namelookup_time": 1.7e-5,
"connect_time": 1.7e-5,
"pretransfer_time": 4.4e-5,
"size_upload": 0,
"size_download": 42380,
"speed_download": 17779,
"speed_upload": 0,
"download_content_length": 42380,
"upload_content_length": -1,
"starttransfer_time": 0.30734,
"redirect_time": 0.915858,
"redirect_url": "",
"primary_ip": "14.140.191.120",
"certinfo": [],
"primary_port": 80,
"local_ip": "192.168.1.54",
"local_port": 62642
}
EDIT url encode the request parameters , and follow redirects
$str = urlencode("userNamedenc=hGJfsdnk%601t&passwordenc=675894242fa9c66939d9fcf4d5c39d1830f4ddb9&accessCode=-825374456");
curl_setopt_array(
$curl , array (
CURLOPT_URL => "http://www.mca.gov.in/mcafoportal/loginValidateUser.do" , // <- removed parameters here
CURLOPT_RETURNTRANSFER => true ,
CURLOPT_ENCODING => "" ,
CURLOPT_FOLLOWLOCATION => 1 ,
CURLOPT_MAXREDIRS => 10 ,
CURLOPT_TIMEOUT => 30 ,
CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1 ,
CURLOPT_CUSTOMREQUEST => "POST" ,
CURLOPT_POSTFIELDS => $str, // <- added this here
CURLOPT_HTTPHEADER => array (
"cache-control: no-cache"
) ,
)
);

The simplest thing you can do, since you already have it working in POSTMAN, is to render out the PHP code in POSTMAN. Here is link about to get PHP code from POSTMAN. Then you can compare the POSTMAN example to your code.
<?php
$curl = curl_init();
curl_setopt_array($curl, array(
CURLOPT_URL => "http://www.mca.gov.in/mcafoportal/loginValidateUser.do?userNamedenc=hGJfsdnk%601t&passwordenc=675894242fa9c66939d9fcf4d5c39d1830f4ddb9&accessCode=",
CURLOPT_RETURNTRANSFER => true,
CURLOPT_ENCODING => "",
CURLOPT_MAXREDIRS => 10,
CURLOPT_TIMEOUT => 30,
CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
CURLOPT_CUSTOMREQUEST => "POST",
CURLOPT_HTTPHEADER => array(
"cache-control: no-cache",
"postman-token: b54abdc0-17be-f38f-9aba-dbf8f007de99"
),
));
$response = curl_exec($curl);
$err = curl_error($curl);
curl_close($curl);
if ($err) {
echo "cURL Error #:" . $err;
} else {
echo $response;
}
What is immediately popping out to me is this 'hGJfsdnk`1t'. The backward quote can be an escape character '`'. This could very well be throwing an error where error handling redirects back to the login page.
POSTMAN likely has something built in to render out the escape character to 'hGJfsdnk%601t'. Thus, this works in POSTMAN, but not in your code.
Here is the status of this request:
{
"url": "http:\/\/www.mca.gov.in\/mcafoportal\/login.do",
"content_type": "text\/html;charset=ISO-8859-1",
"http_code": 200,
"header_size": 3020,
"request_size": 821,
"filetime": -1,
"ssl_verify_result": 0,
"redirect_count": 1,
"total_time": 2.920125,
"namelookup_time": 8.2e-5,
"connect_time": 8.7e-5,
"pretransfer_time": 0.000181,
"size_upload": 0,
"size_download": 42381,
"speed_download": 14513,
"speed_upload": 0,
"download_content_length": -1,
"upload_content_length": -1,
"starttransfer_time": 0.320995,
"redirect_time": 2.084554,
"redirect_url": "",
"primary_ip": "115.114.108.120",
"certinfo": [],
"primary_port": 80,
"local_ip": "192.168.1.3",
"local_port": 45086
}
Here is shows the successful login.

This is honestly one of weird sites I have seen in a long time. First thing was to know how it works. So I decided to use chrome and see what happens when we login with wrong data
Observations:
Blanks username and passwords fields
Generates SHA1 hashes of username and password fields and sets then in userNamedenc and respectively
We can override username and password directly in JavaScript and login to your account just by overriding the details from console.
There are lot of different request which generates cookies but none of them look any useful
So the approach to solve the issue was to follow below steps
Get the login url login.do
Fetch the form details from the response for the access code
Submit the form to loginValidateUser.do
The form sends below parameters
Now one interesting part of the same is below post data
displayCaptcha:true
userEnteredCaptcha:strrty
If we override the displayCaptcha to false then captcha is no more needed. So a wonderful bypass
displayCaptcha: false
Next was to code all of the above in PHP, but the site seemed so weird that many of the attempts failed. So finally I realized that we need to take it a bit closer to the browser login and also I felt delays between calls are needed
<?php
require_once("curl.php");
$curl = new CURL();
$default_headers = Array(
"Accept" => "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
"Accept-Encoding" => "deflate",
"Accept-Language" => "en-US,en;q=0.8",
"Cache-Control" => "no-cache",
"Connection" => "keep-alive",
"DNT" => "1",
"Pragma" => "no-cache",
"Referer" => "http://www.mca.gov.in/mcafoportal/login.do",
"Upgrade-Insecure-Requests" => "1"
);
// Get the login page
$curl
->followlocation(0)
->cookieejar("")
->verbose(1)
->get("http://www.mca.gov.in/mcafoportal/login.do")
->header($default_headers)
->useragent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36")
->execute();
// Save the postfileds and access code as we would need them later for the POST field
$post = $curl->loadInputFieldsFromResponse()
->updatePostParameter(array(
"displayCaptcha" => "false",
"userNamedenc" => "hGJfsdnk`1t",
"passwordenc" => "675894242fa9c66939d9fcf4d5c39d1830f4ddb9",
"userName" => "",
"Cert" => ""))
->referrer("http://www.mca.gov.in/mcafoportal/login.do")
->removePostParameters(
Array("dscBasedLoginFlag", "maxresults", "fe", "query", "SelectCert", "newUserRegistration")
);
$postfields = $curl->getPostFields();
var_dump($postfields);
// Access some dummy URLs to make it look like browser
$curl
->get("http://www.mca.gov.in/mcafoportal/js/global.js")->header($default_headers)->execute()->sleep(2)
->get("http://www.mca.gov.in/mcafoportal/js/loginValidations.js")->header($default_headers)->execute()->sleep(2)
->get("http://www.mca.gov.in/mcafoportal/css/layout.css")->header($default_headers)->execute()->sleep(2)
->get("http://www.mca.gov.in/mcafoportal/img/bullet.png")->header($default_headers)->execute()->sleep(2)
->get("http://www.mca.gov.in/mcafoportal/getCapchaImage.do")->header($default_headers)->execute()->sleep(2);
// POST to the login form the postfields saved earlier
$curl
->sleep(20)
->header($default_headers)
->postfield($postfields)
->referrer("http://www.mca.gov.in/mcafoportal/login.do")
->post("http://www.mca.gov.in/mcafoportal/loginValidateUser.do")
->execute(false)
->sleep(3)
->get("http://www.mca.gov.in/mcafoportal/login.do")
->header($default_headers)
->execute(true);
// Get the response from last GET of login.do
$curl->getResponseText($output);
//Check if user name is present in the output or not
if (stripos($output, "Kiran") > 0) {
echo "Hurray!!!! Login succeeded";
} else {
echo "Login failed please retry after sometime";
}
After running the code it works few times and few times it doesn't. My observations
Only one login is allowed at a time. So not sure if others were using the login when I was testing
Without delays it would fail most of the time
There is no obvious reason when it fails to login except the site doing something on server side to block the request
The reusable curl.php I created and used for chaining methods is below
<?php
class CURL
{
protected $ch;
protected $postfields;
public function getPostFields() {
return $this->postfields;
}
public function newpost()
{
$this->postfields = array();
return $this;
}
public function addPostFields($key, $value)
{
$this->postfields[$key]=$value;
return $this;
}
public function __construct()
{
$ch = curl_init();
$this->ch = $ch;
$this->get()->followlocation()->retuntransfer(); //->connectiontimeout(20)->timeout(10);
}
function url($url)
{
curl_setopt($this->ch, CURLOPT_URL, $url);
return $this;
}
function verbose($value = true)
{
curl_setopt($this->ch, CURLOPT_VERBOSE, $value);
return $this;
}
function post($url='')
{
if ($url !== '')
$this->url($url);
curl_setopt($this->ch, CURLOPT_POST, count($this->postfields));
curl_setopt($this->ch, CURLOPT_POSTFIELDS, http_build_query($this->postfields));
return $this;
}
function postfield($fields)
{
if (is_array($fields)){
$this->postfields = $fields;
}
return $this;
}
function close()
{
curl_close($this->ch);
return $this;
}
function cookieejar($cjar)
{
curl_setopt($this->ch, CURLOPT_COOKIEJAR, $cjar);
return $this;
}
function cookieefile($cfile)
{
curl_setopt($this->ch, CURLOPT_COOKIEFILE, $cfile);
return $this;
}
function followlocation($follow = 1)
{
curl_setopt($this->ch, CURLOPT_FOLLOWLOCATION, $follow);
return $this;
}
function loadInputFieldsFromResponse($response ='')
{
if ($response)
$doc = $response;
else
$doc = $this->lastCurlRes;
/* #var $doc DOMDocument */
//simplexml_load_string($data)
$this->getResponseDoc($doc);
$this->postfields = array();
foreach ($doc->getElementsByTagName('input') as $elem) {
/* #var $elem DomNode */
$name = $elem->getAttribute('name');
// if (!$name)
// $name = $elem->getAttribute('id');
if ($name)
$this->postfields[$name] = $elem->getAttribute("value");
}
return $this;
}
function retuntransfer($transfer=1)
{
curl_setopt($this->ch, CURLOPT_RETURNTRANSFER, $transfer);
return $this;
}
function connectiontimeout($connectiontimeout)
{
curl_setopt($this->ch, CURLOPT_CONNECTTIMEOUT, $connectiontimeout);
return $this;
}
function timeout($timeout)
{
curl_setopt($this->ch, CURLOPT_TIMEOUT, $timeout);
return $this;
}
function useragent($useragent)
{
curl_setopt($this->ch, CURLOPT_USERAGENT, $useragent);
return $this;
}
function referrer($referrer)
{
curl_setopt($this->ch, CURLOPT_REFERER, $referrer);
return $this;
}
function getCURL()
{
return $this->ch;
}
protected $lastCurlRes;
protected $lastCurlResInfo;
function get($url = '')
{
if ($url !== '')
$this->url($url);
curl_setopt($this->ch, CURLOPT_POST, 0);
curl_setopt($this->ch, CURLOPT_HTTPGET, true);
return $this;
}
function sleep($seconds){
sleep($seconds);
return $this;
}
function execute($output=false)
{
$this->lastCurlRes = curl_exec($this->ch);
if ($output == true)
{
echo "Response is \n " . $this->lastCurlRes;
file_put_contents("out.html", $this->lastCurlRes);
}
$this->lastCurlResInfo = curl_getinfo($this->ch);
$this->postfields = array();
return $this;
}
function header($headers)
{
//curl_setopt($this->ch, CURLOPT_HEADER, true);
curl_setopt($this->ch, CURLOPT_HTTPHEADER, $headers);
return $this;
}
function getResponseText(&$text){
$text = $this->lastCurlRes;
return $this;
}
/*
*
* #param DOMDocument $doc
*
*
*/
function getResponseDoc(&$doc){
$doc = new DOMDocument();
libxml_use_internal_errors(false);
libxml_disable_entity_loader();
#$doc->loadHTML($this->lastCurlRes);
return $this;
}
function removePostParameters($keys) {
if (!is_array($keys))
$keys = Array($keys);
foreach ($keys as $key){
if (array_key_exists($key, $this->postfields))
unset($this->postfields[$key]);
}
return $this;
}
function keepPostParameters($keys) {
$delete = Array();
foreach ($this->postfields as $key=>$value){
if (!in_array($key, $keys)){
array_push($delete, $key);
}
}
foreach ($delete as $key) {
unset($this->postfields[$key]);
}
return $this;
}
function updatePostParameter($postarray, $encoded=false)
{
if (is_array($postarray))
{
foreach ($postarray as $key => $value) {
if (is_null($value))
unset($this->postfields[$key]);
else
$this->postfields[$key] = $value;
}}
elseif (is_string($postarray))
{
$parr = preg_split("/&/",$postarray);
foreach ($parr as $postvalue) {
if (($index = strpos($postvalue, "=")) != false)
{
$key = substr($postvalue, 0,$index);
$value = substr($postvalue, $index + 1);
if ($encoded)
$this->postfields[$key]=urldecode($value);
else
$this->postfields[$key]=$value;
}
else
$this->postfields[$postvalue] = "";
}
}
return $this;
}
function getResponseXml(){
//SimpleXMLElement('<INPUT/>')->asXML();
}
function SSLVerifyPeer($verify=false)
{
curl_setopt($this->ch, CURLOPT_SSL_VERIFYPEER, $verify);
return $this;
}
}
?>

#yvesleborg and #tarun-lalwani gave the right hints. You need to take care of the cookies and the redirects. But nevertheless it was not working always for me. I guess the site operator requires some timeout between the two requests.
I rewrote your code a little bit to play around with it.
mycurl.php:
function my_curl_init() {
$url="http://www.mca.gov.in/mcafoportal/loginValidateUser.do";
$user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36";
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
return $ch;
}
/*
* first call in order to get accessCode and sessionCookie
*/
$ch = my_curl_init();
curl_setopt($ch, CURLOPT_COOKIEJAR, __DIR__ . "/cookie.txt"); // else cookielist is empty
$output = curl_exec($ch);
file_put_contents(__DIR__ . '/loginValidateUser.html', $output);
// save cookie info
$cookielist = curl_getinfo($ch, CURLINFO_COOKIELIST);
//print_r($cookielist);
curl_close($ch);
// parse accessCode from output
$re = '/\<input.*name="accessCode".*value="([-0-9]+)"/';
preg_match_all($re, $output, $matches, PREG_SET_ORDER, 0);
if ($matches) {
$accessCode = $matches[0][1];
// debug
echo "accessCode: $accessCode" . PHP_EOL;
/*
* second call in order to login
*/
$post_fields = array(
'userNamedenc' => 'hGJfsdnk`1t',
'passwordenc' => '675894242fa9c66939d9fcf4d5c39d1830f4ddb9',
'accessCode' => $accessCode
);
$cookiedata = preg_split('/\s+/', $cookielist[0]);
$session_cookie = $cookiedata[5] . '=' . $cookiedata[6];
// debug
echo "sessionCookie: $session_cookie" . PHP_EOL;
file_put_contents(__DIR__ . '/cookie2.txt', $session_cookie);
/*
* !!! pause !!!
*/
sleep(20);
// debug
echo "curl -v -L -X POST -b '$session_cookie;' --data 'userNamedenc=hGJfsdnk`1t&passwordenc=675894242fa9c66939d9fcf4d5c39d1830f4ddb9&accessCode=$accessCode' http://www.mca.gov.in/mcafoportal/loginValidateUser.do > loginValidateUser2.html";
$ch = my_curl_init();
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_fields);
curl_setopt($ch, CURLOPT_COOKIE, $session_cookie);
$output = curl_exec($ch);
file_put_contents(__DIR__ . '/loginValidateUser2.html', $output);
curl_close($ch);
}
The script issues two request to the website. The output of the first one is used to read the accessCode and to store the session cookie. Then after a little break the second one is issued using the accessCode and session information together with the login credentials.
I tested it with PHP5.6 from a terminal (php -f mycurl.php). The script debugs all necessary information, outputs a curl command you could use in a terminal and logs the HTML and cookie information to some files in the same folder like the script.
Running the script too often doesn't work. The login won't work. So take your time and wait some minutes between your tries. Or change your IP ;)
Hope it helps.

Replication of the Problem
I did the same thing in Postman as you did your screenshot but was not able to log in:
The only difference I can see is that your request had cookies, which I suspect is why you were able to log in without all the other input fields. And it seems like there are quite a number of input fields:
Using Postman
So, I used postman intercept to capture all the fields used during a login, including the captcha and access code and I was able to login:
Update 1
I've found out that once you've solved a captcha to login, after you've logged out you may login again without displayCaptcha and userEnteredCaptcha in your form data, provided that you use the same cookie as the one you've used to login successfully. You just need to get a valid accessCode from the login page.

it doesnt work either in PHP/Python that is (as others have already pointed out) because you were using your browser's existing cookie session,
which had already solved the captcha. clear your browser cookies, get a fresh cookie session, and DO NOT SOLVE THE CAPTCHA, and Postman won't be able to log in either.
Any idea what is missing ? several things, among them, several post login parameters (browserFlag, loginType,__checkbox_dscBasedLoginFlag, and many more),
also your encoding loop here is bugged $str = $str . "$key=$value" . "&"; ,
it pretty much only works as long as both keys and values only contain [a-zA-Z0-9] characters,
and since your userNamedenc contains a grave accent character, your encoding loop is insufficient.
a fixed loop would be
foreach($params as $key=>$value){
$str = $str . urlencode($key)."=".urlencode($value) . "&";
}
$str=substr($str,0,-1);
, but
this is exactly why we have the http_build_query function, that entire loop and the following trim can be replaced with this single line:
$str=http_build_query($params);
, also, seems you're trying to login without a pre-existing cookie session,
that's not going to work. when you do a GET request to the login page, you get a cookie, and a unique captcha,
the captcha answer is tied to your cookie session, and needs to be solved before you attempt to login,
you also provide no code to deal with the captcha. also, when parsing the "userName" input element, it will default to "Enter Username", which is emtied with javascript and replaced with userNamedenc, you must replicate this in PHP,
also, it will have an input element named "dscBasedLoginFlag", which is removed with javascript, you must also do this part in php,
also it has an input element named "Cert", which has a default value, but this value is cleared with javascript, do the same in php,
and an input element named "newUserRegistration", which is removed with javascript, do that,
here's what you should do: make a GET request to the login page, save the cookie session and make sure to provide it for all further requests, and parse out all the login form's elements and add them to your login request (but be careful, there is 2x form inputs, 1 belong to the search bar, only parse the children of the login form, don't mix the 2), and remember to clear/remove the special input tags to emulate the javascript, as described above,
then make a GET request to the captcha url, make sure to provide the session cookie, solve the captcha,
then make the final login request, with the captcha answer, and userNamedenc and passwordenc and all the other elements
parsed out from the login page... that should work. now, solving the captcha programmatically,
the captha doesn't look too hard, cracking it probably can be automated, but until someone actually does that,
you can use Deathbycaptcha to do it for you, however note that it isn't a free service.
here's a fully tested, working example implementation, using my hhb_curl library (from https://github.com/divinity76/hhb_.inc.php/blob/master/hhb_.inc.php ), and the Deathbycaptcha api:
<?php
declare(strict_types = 1);
header ( "content-type: text/plain;charset=utf8" );
require_once ('hhb_.inc.php');
const DEATHBYCATPCHA_USERNAME = '?';
const DEATHBYCAPTCHA_PASSWORD = '?';
$hc = new hhb_curl ( '', true );
$hc->setopt(CURLOPT_TIMEOUT,20);// im on a really slow net atm :(
$html = $hc->exec ( 'http://www.mca.gov.in/mcafoportal/login.do' )->getResponseBody (); // cookie session etc
$domd = #DOMDocument::loadHTML ( $html );
$inputs = getDOMDocumentFormInputs ( $domd, true ) ['login'];
$params = [ ];
foreach ( $inputs as $tmp ) {
$params [$tmp->getAttribute ( "name" )] = $tmp->getAttribute ( "value" );
}
assert ( isset ( $params ['userNamedenc'] ), 'username input not found??' );
assert ( isset ( $params ['passwordenc'] ), 'passwordenc input not found??' );
$params ['userName'] = ''; // defaults to "Enter Username", cleared with javascript
unset ( $params ['dscBasedLoginFlag'] ); // removed with javascript
$params ['Cert'] = ''; // cleared to emptystring with javascript
unset ( $params ['newUserRegistration'] ); // removed with javascript
unset ( $params ['SelectCert'] ); // removed with javascript
$params ['userNamedenc'] = 'hGJfsdnk`1t';
$params ['passwordenc'] = '675894242fa9c66939d9fcf4d5c39d1830f4ddb9';
echo 'parsed login parameters: ';
var_dump ( $params );
$captchaRaw = $hc->exec ( 'http://www.mca.gov.in/mcafoportal/getCapchaImage.do' )->getResponseBody ();
$params ['userEnteredCaptcha'] = solve_captcha2 ( $captchaRaw );
// now actually logging in.
$html = $hc->setopt_array ( array (
CURLOPT_POST => true,
CURLOPT_POSTFIELDS => http_build_query ( $params )
) )->exec ( 'http://www.mca.gov.in/mcafoportal/loginValidateUser.do' )->getResponseBody ();
var_dump ( $hc->getStdErr (), $hc->getStdOut () ); // printing debug data
$domd = #DOMDocument::loadHTML ( $html );
$xp = new DOMXPath ( $domd );
$loginErrors = $xp->query ( '//ul[#class="errorMessage"]' );
if ($loginErrors->length > 0) {
echo 'encountered following error(s) logging in: ';
foreach ( $loginErrors as $err ) {
echo $err->textContent, PHP_EOL;
}
die ();
}
echo "logged in successfully!";
/**
* solves the captcha manually, by doing: echo ANSWER>captcha.txt
*
* #param string $raw_image
* raw image bytes
* #return string answer
*/
function solve_captcha2(string $raw_image): string {
$imagepath = getcwd () . DIRECTORY_SEPARATOR . 'captcha.png';
$answerpath = getcwd () . DIRECTORY_SEPARATOR . 'captcha.txt';
#unlink ( $imagepath );
#unlink ( 'captcha.txt' );
file_put_contents ( $imagepath, $raw_image );
echo 'the captcha is saved in ' . $imagepath . PHP_EOL;
echo ' waiting for you to solve it by doing: echo ANSWER>' . $answerpath, PHP_EOL;
while ( true ) {
sleep ( 1 );
if (file_exists ( $answerpath )) {
$answer = trim ( file_get_contents ( $answerpath ) );
echo 'solved: ' . $answer, PHP_EOL;
return $answer;
}
}
}
function solve_captcha(string $raw_image): string {
echo 'solving captcha, hang on, with DEATBYCAPTCHA this usually takes between 10 and 20 seconds.';
{
// unfortunately, CURLFile requires a filename, it wont accept a string, so make a file of it
$tmpfileh = tmpfile ();
fwrite ( $tmpfileh, $raw_image ); // TODO: error checking (incomplete write or whatever)
$tmpfile = stream_get_meta_data ( $tmpfileh ) ['uri'];
}
$hc = new hhb_curl ( '', true );
$hc->setopt_array ( array (
CURLOPT_URL => 'http://api.dbcapi.me/api/captcha',
CURLOPT_POSTFIELDS => array (
'username' => DEATHBYCATPCHA_USERNAME,
'password' => DEATHBYCAPTCHA_PASSWORD,
'captchafile' => new CURLFile ( $tmpfile, 'image/png', 'captcha.png' )
)
) )->exec ();
fclose ( $tmpfileh ); // when tmpfile() is fclosed(), its also implicitly deleted.
$statusurl = $hc->getinfo ( CURLINFO_EFFECTIVE_URL ); // status url is given in a http 300x redirect, which hhb_curl auto-follows
while ( true ) {
// wait for captcha to be solved.
sleep ( 10 );
echo '.';
$json = $hc->setopt_array ( array (
CURLOPT_HTTPHEADER => array (
'Accept: application/json'
),
CURLOPT_HTTPGET => true
) )->exec ()->getResponseBody ();
$parsed = json_decode ( $json, false );
if (! empty ( $parsed->captcha )) {
echo 'captcha solved!: ' . $parsed->captcha, PHP_EOL;
return $parsed->captcha;
}
}
}
function getDOMDocumentFormInputs(\DOMDocument $domd, bool $getOnlyFirstMatches = false): array {
// :DOMNodeList?
$forms = $domd->getElementsByTagName ( 'form' );
$parsedForms = array ();
$isDescendantOf = function (\DOMNode $decendant, \DOMNode $ele): bool {
$parent = $decendant;
while ( NULL !== ($parent = $parent->parentNode) ) {
if ($parent === $ele) {
return true;
}
}
return false;
};
// i can't use array_merge on DOMNodeLists :(
$merged = function () use (&$domd): array {
$ret = array ();
foreach ( $domd->getElementsByTagName ( "input" ) as $input ) {
$ret [] = $input;
}
foreach ( $domd->getElementsByTagName ( "textarea" ) as $textarea ) {
$ret [] = $textarea;
}
foreach ( $domd->getElementsByTagName ( "button" ) as $button ) {
$ret [] = $button;
}
return $ret;
};
$merged = $merged ();
foreach ( $forms as $form ) {
$inputs = function () use (&$domd, &$form, &$isDescendantOf, &$merged): array {
$ret = array ();
foreach ( $merged as $input ) {
// hhb_var_dump ( $input->getAttribute ( "name" ), $input->getAttribute ( "id" ) );
if ($input->hasAttribute ( "disabled" )) {
// ignore disabled elements?
continue;
}
$name = $input->getAttribute ( "name" );
if ($name === '') {
// echo "inputs with no name are ignored when submitted by mainstream browsers (presumably because of specs)... follow suite?", PHP_EOL;
continue;
}
if (! $isDescendantOf ( $input, $form ) && $form->getAttribute ( "id" ) !== '' && $input->getAttribute ( "form" ) !== $form->getAttribute ( "id" )) {
// echo "this input does not belong to this form.", PHP_EOL;
continue;
}
if (! array_key_exists ( $name, $ret )) {
$ret [$name] = array (
$input
);
} else {
$ret [$name] [] = $input;
}
}
return $ret;
};
$inputs = $inputs (); // sorry about that, Eclipse gets unstable on IIFE syntax.
$hasName = true;
$name = $form->getAttribute ( "id" );
if ($name === '') {
$name = $form->getAttribute ( "name" );
if ($name === '') {
$hasName = false;
}
}
if (! $hasName) {
$parsedForms [] = array (
$inputs
);
} else {
if (! array_key_exists ( $name, $parsedForms )) {
$parsedForms [$name] = array (
$inputs
);
} else {
$parsedForms [$name] [] = $tmp;
}
}
}
unset ( $form, $tmp, $hasName, $name, $i, $input );
if ($getOnlyFirstMatches) {
foreach ( $parsedForms as $key => $val ) {
$parsedForms [$key] = $val [0];
}
unset ( $key, $val );
foreach ( $parsedForms as $key1 => $val1 ) {
foreach ( $val1 as $key2 => $val2 ) {
$parsedForms [$key1] [$key2] = $val2 [0];
}
}
}
return $parsedForms;
}
example usage: in terminal, write php foo.php | tee test.html, after a few seconds it will say something like:
the captcha is saved in /home/captcha.png
waiting for you to solve it by doing: echo ANSWER>/home/captcha.txt
then look at the captcha in /home/captcha.png , solve it, and write in another terminal: echo ANSWER>/home/captcha.txt, now the script will log in, and dump the logged in html in test.html, which you can open in your browser, to confirm that it actually logged in, screenshot when i run it: https://image.prntscr.com/image/_AsB_0J6TLOFSZuvQdjyNg.png
also note that i made 2 captcha solver functions, 1 use the deathbycaptcha api, and wont work until you provide a valid and credited deathbycaptcha account on line 5 and 6, which is not free, the other 1, solve_captcha2, asks you to solve the captcha yourself, and tells you where the captcha image is saved (so you can go have a look at it), and what command line argument to write, to provide it with the answer. just replace solve_captcha with solve_captcha2 on line 28, to solve it manually, and vise-versa. the script is fully tested with solve_captcha2, but the deathbycaptcha solver is untested, as my deathbycatpcha account is empty (if you would like to make a donation so i can actually test it, send 7 dollars to paypal account divinity76#gmail.com with a link to this thread, and i will buy the cheapest deathbycaptcha credit pack and actually test it)
disclaimer: i am not associated with deathbycaptcha in any way (except that i was a customer of theirs a couple of years back), and this post was not sponsored.

Related

How can I deactivate cookies used for authentication in php?

I would like to apologize in advance if this is a stupid question, but I am a junior developer starting a new job (and yes, very afraid of making a huge mistake). Most of my expertise is in Python, SQL, JavaScript, CSS and HTML. However, in my job I've been tasked with deactivating cookies in their website (they have to because of privacy laws in Europe). Some of the pages' backends are written in javascript and I was able to find the cookies and deactivate them, but some are written in php. I can tell what the code is and what it does, but since I've never dealt with php before, I'm not sure if I should just delete the script or if I should modify it in any way. Any help or advice will be greatly appreciated. This is the code (it is in its own file):
<?php
// Real-time Data Aggregation (RDA)
// error_reporting( E_ALL );
// ini_set('display_errors', 1);
class RDA {
private $session_cookie = '';
private $log_site = '';
private $config = array();
private $raw_payload = '';
private $payload = array();
private $publish_path_map = array();
public function __construct($config){
$this->config = $config;
}
public function process(){
$this->raw_payload = file_get_contents('php://input');
if(!$this->is_json($this->raw_payload)){
echo 'Expected payload was not provided. Script has been aborted.';
return;
}
$this->payload = json_decode($this->raw_payload);
if(array_key_exists('passed_through_rda', $this->payload) && $this->payload->passed_through_rda == 'true') return; // If this had previously passed through a RDA script so let's abort to prevent recursion.
if($this->is_test_payload()) return; // When the Test button is clicked from account settings simply echo back the payload and abort.
$this->send_next_webhook_request(); // forward payload to another webhook listener.
if($this->payload->finished != 'true') return; // we only want to react when the event has finished and not when it has been started.
$this->set_publish_path_map(); // sets up an index of publish paths to use as reference to prevent publish recursion.
foreach($this->config['actions'] as $action){
if(!$this->payload_contains_trigger_path($action)) continue; // payload does not contain trigger path so end execution.
$this->authenicate();
$this->publish($action);
}
$this->log_request();
}
private function authenicate(){
if($session_cookie != '') return; // session cookie was already created so exit authenication.
$endpoint = $this->config['ouc_base_url'] . '/authentication/login';
$config = array(
'skin' => $this->config['skin'],
'account' => $this->config['account'],
'username' => $this->config['username'],
'password' => $this->config['password']
);
$post_fields = http_build_query($config);
$cURLConnection = curl_init($endpoint);
curl_setopt($cURLConnection, CURLOPT_POSTFIELDS, $post_fields);
curl_setopt($cURLConnection, CURLOPT_RETURNTRANSFER, true);
curl_setopt($cURLConnection, CURLOPT_HEADER, true);
$api_response = curl_exec($cURLConnection);
$header = curl_getinfo( $cURLConnection );
curl_close($cURLConnection);
$header_content = substr($api_response, 0, $header['header_size']);
$pattern = "#Set-Cookie:\\s+(?<cookie>[^=]+=[^;]+)#m";
preg_match_all($pattern, $header_content, $matches);
$this->session_cookie = implode("; ", $matches['cookie']);
}
private function publish($action){
$endpoint = '/files/publish';
$config = array(
'site' => $action['site'],
'path' => $action['publish_path'],
'include_scheduled_publish' => 'true',
'include_checked_out' => 'true'
);
$this->log_site = $action['site']; // set a site to use to create log files if logging is turned on.
$this->send($endpoint, $config);
}
private function set_publish_path_map(){
foreach($this->config['actions'] as $action){
$this->publish_path_map[$action['site'] . $action['publish_path']] = 1;
}
}
private function log_request(){
if($this->config['log'] != 'true' || $this->log_site == '') return; // don't log when logging turned or if log_site not set
$log_id = uniqid();
$endpoint = '/files/save';
$config = array(
'site' => $this->log_site,
'path' => $this->config['config_file'], // uses the config PCF to do a "save as" to a log file
'new_path' => $this->get_root_relative_folderpath() . '_log/' . $log_id . '.txt',
'text' => $this->raw_payload
);
$this->send($endpoint, $config);
}
private function send_next_webhook_request(){
$next_webhook_url = trim($this->config['next_webhook_url']);
if($next_webhook_url == '') return; // next_webhook_url not entered so just return.
$this->payload->passed_through_rda = 'true';
$connection = curl_init($next_webhook_url);
curl_setopt($connection, CURLOPT_POSTFIELDS, json_encode($this->payload, JSON_UNESCAPED_SLASHES));
curl_setopt($connection, CURLOPT_RETURNTRANSFER, true);
$api_response = curl_exec($connection);
curl_close($connection);
}
private function send($endpoint, $config){
$endpoint = $this->config['ouc_base_url'] . $endpoint;
$post_fields = http_build_query($config);
$connection = curl_init($endpoint);
curl_setopt($connection, CURLOPT_POSTFIELDS, $post_fields);
curl_setopt($connection, CURLOPT_RETURNTRANSFER, true);
curl_setopt($connection, CURLOPT_COOKIE, $this->session_cookie);
$api_response = curl_exec($connection);
curl_close($connection);
}
private function payload_contains_trigger_path($action){
$site = $action['site'];
$success = array(); // the success node in the webhook payload contains files that were published.
if(!array_key_exists($site, $this->payload->success)) return false; // no success array so just return false.
$success = $this->payload->success->{$site};
$published_paths = array();
foreach($success as $i){
if(!array_key_exists($site . $i->path, $this->publish_path_map)) $published_paths[] = $i->path; // only include paths that aren't also publish targets configured in this script to avoid publish recursion.
}
$trigger_paths = $action['trigger_path'];
$trigger_paths = explode(',', $trigger_paths);
foreach($trigger_paths as $trigger_path){
$trigger_path = trim($trigger_path);
$trigger_path = preg_replace('/(.)[\/]+$/', '$1', $trigger_path); // removes trailing slash unless the value is the string length is 1, for instance: '/'
if($trigger_path == '') continue;
foreach($published_paths as $path){
if($this->starts_with($path, $trigger_path)) return true;
}
}
return false;
}
private function is_test_payload(){
$account = $this->payload->account;
if($account == '<account name>'){ // This is the account name value used by the test http request.
echo $this->raw_payload;
return true;
}
return false;
}
private function is_json($string){
if(trim($string) == '') return false;
json_decode($string);
return (json_last_error() == JSON_ERROR_NONE);
}
private function starts_with($string, $startString){
$len = strlen($startString);
return (substr($string, 0, $len) === $startString);
}
private function get_root_relative_folderpath(){
$result = $this->get_root_relative_filepath();
$result = str_replace('\\', '/', $result);
$result = preg_replace('/[^\/]+$/', '', $result);
return $result;
}
private function get_root_relative_filepath(){
$result = str_replace($_SERVER['DOCUMENT_ROOT'], '', $_SERVER['SCRIPT_FILENAME']);
return $result;
}
}
?>
For clarification: they have a service that manages cookies and they were able to turn those off, but there are a number of cookies that are persisting, and they are being generated by scripts leftover from years ago (I have no idea who wrote this code, or how old it is) and they need to be deleted. I just want to make sure that if I delete something it won't cause other bugs on the website
Method to close all the cookies and sessions
i think you have start the sessions session_start()
session_start();
you can read the documentation here
//http://php.net/manual/en/function.setcookie.php#73484
To destory and close the sessions try the code below
the below method will help to unset the cookies serving in your php program
if (isset($_SERVER['HTTP_COOKIE'])) {
$cookies = explode(';', $_SERVER['HTTP_COOKIE']);
foreach($cookies as $cookie) {
$parts = explode('=', $cookie);
$name = trim($parts[0]);
setcookie($name, '', time()-1000);
setcookie($name, '', time()-1000, '/');
}
}
session_destroy();

How to send an XML-file with PHP cURL?

This question is market as answered in this thread:
How to POST an XML file using cURL on php?
But that answer wasn't really the correct answer in my opinion since it just show how to send XML code with cURL. I need to send an XML file.
Basically, I need this C# code to be converted to PHP:
public Guid? UploadXmlFile()
{
var fileUploadClient = new WebClient();
fileUploadClient.Headers.Add("Content-Type", "application/xml");
fileUploadClient.Headers.Add("Authorization", "api " + ApiKey);
var rawResponse = fileUploadClient.UploadFile(Url, FilePath);
var stringResponse = Encoding.ASCII.GetString(rawResponse);
var jsonResponse = JObject.Parse(stringResponse);
if (jsonResponse != null)
{
var importFileId = jsonResponse.GetValue("ImportId");
if (importFileId != null)
{
return new Guid(importFileId.ToString());
}
}
return null;
}
I have tried in several ways and this is my latest try.
The cURL call:
/**
* CDON API Call
*
*/
function cdon_api($way, $method, $postfields=false, $contenttype=false)
{
global $api_key;
$contenttype = (!$contenttype) ? 'application/x-www-form-urlencoded' : $contenttype;
$curlOpts = array(
CURLOPT_URL => 'https://admin.marketplace.cdon.com/api/'.$method,
CURLOPT_RETURNTRANSFER => TRUE,
CURLOPT_TIMEOUT => 60,
CURLOPT_HTTPHEADER => array('Authorization: api '.$api_key, 'Content-type: '.$contenttype, 'Accept: application/xml')
);
if ($way == 'post')
{
$curlOpts[CURLOPT_POST] = TRUE;
}
elseif ($way == 'put')
{
$curlOpts[CURLOPT_PUT] = TRUE;
}
if ($postfields !== false)
{
$curlOpts[CURLOPT_POSTFIELDS] = $postfields;
}
# make the call
$ch = curl_init();
curl_setopt_array($ch, $curlOpts);
$response = curl_exec($ch);
curl_close($ch);
return $response;
}
The File Export:
/**
* Export products
*
*/
function cdon_export()
{
global $api_key;
$upload_dir = wp_upload_dir();
$filepath = $upload_dir['basedir'] . '/cdon-feed.xml';
$response = cdon_api('post', 'importfile', array('uploaded_file' => '#/'.realpath($filepath).';type=text/xml'), 'multipart/form-data');
echo '<br>Response 1: <pre>'.print_r(json_decode($response), true).'</pre><br>';
$data = json_decode($response, true);
if (!empty($data['ImportId']))
{
$response = cdon_api('put', 'importfile?importFileId='.$data['ImportId'], false, 'text/xml');
echo 'Response 2: <pre>'.print_r(json_decode($response), true).'</pre><br>';
$data = json_decode($response, true);
}
}
But the output I get is this:
Response 1:
stdClass Object
(
[Message] => The request does not contain a valid media type.
)
I have experimented around with different types at the different places, application/xml, multipart/form-data and text/xml, but nothing works.
How do I do to make it work? How do I manage to send the XML file with cURL?
to me, it looks like the C# code just does the equivalent of
function UploadXmlFile(): ?string {
$ch = curl_init ( $url );
curl_setopt_array ( $ch, array (
CURLOPT_POST => 1,
CURLOPT_HTTPHEADER => array (
"Content-Type: application/xml",
"Authorization: api " . $ApiKey
),
CURLOPT_POSTFIELDS => file_get_contents ( $filepath ),
CURLOPT_RETURNTRANSFER => true
) );
$jsonResponse = json_decode ( ($response = curl_exec ( $ch )) );
curl_close ( $ch );
return $jsonResponse->importId ?? NULL;
}
but at least 1 difference, your PHP code adds the header 'Accept: application/xml', your C# code does not

Scraping ASP website using cURL with manual redirection

I need to scrape an ASP website using cURL. My hosting does not allow me to turn off safe_mode or open_basedir. That's why CURLOPT_FOLLOWLOCATION cannot be activated (it throws an error "CURLOPT_FOLLOWLOCATION cannot be activated when an open_basedir is set").
I tried to implement some workaround but after several unlucky days starting to be desperate. I am wondering how to change the code below to contain manual redirection instead of CURLOPT_FOLLOWLOCATION:
include_once __DIR__.'/simple_html_dom.php';
define('COOKIE_FILE', __DIR__.'/cookie.txt');
#unlink(COOKIE_FILE); //clear cookies before we start
define('CURL_LOG_FILE', __DIR__.'/request.txt');
#unlink(CURL_LOG_FILE);//clear curl log
class ASPBrowser {
public $exclude = array();
public $lastUrl = '';
public $dom = false;
/**Get simplehtmldom object from url
* #param $url
* #param $post
* #return bool|simple_html_dom
*/
public function getDom($url, $post = false) {
$f = fopen(CURL_LOG_FILE, 'a+'); // curl session log file
if($this->lastUrl) $header[] = "Referer: {$this->lastUrl}";
$curlOptions = array(
CURLOPT_ENCODING => 'gzip,deflate',
CURLOPT_AUTOREFERER => 1,
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_URL => $url,
CURLOPT_SSL_VERIFYPEER => false,
CURLOPT_SSL_VERIFYHOST => false,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_MAXREDIRS => 9,
CURLOPT_RETURNTRANSFER => 1,
CURLOPT_HEADER => 0,
CURLOPT_USERAGENT => "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36",
CURLOPT_COOKIEFILE => COOKIE_FILE,
CURLOPT_COOKIEJAR => COOKIE_FILE,
CURLOPT_STDERR => $f, // log session
CURLOPT_VERBOSE => true,
);
if($post) { // add post options
$curlOptions[CURLOPT_POSTFIELDS] = $post;
$curlOptions[CURLOPT_POST] = true;
}
$curl = curl_init();
curl_setopt_array($curl, $curlOptions);
$data = curl_exec($curl);
$this->lastUrl = curl_getinfo($curl, CURLINFO_EFFECTIVE_URL); // get url we've been redirected to
curl_close($curl);
if($this->dom) {
$this->dom->clear();
$this->dom = false;
}
$dom = $this->dom = str_get_html($data);
fwrite($f, "{$post}\n\n");
fwrite($f, "-----------------------------------------------------------\n\n");
fclose($f);
return $dom;
}
function createASPPostParams($dom, array $params) {
$postData = $dom->find('input,select,textarea');
$postFields = array();
foreach($postData as $d) {
$name = $d->name;
if(trim($name) == '' || in_array($name, $this->exclude)) continue;
$value = isset($params[$name]) ? $params[$name] : $d->value;
$postFields[] = rawurlencode($name).'='.rawurlencode($value);
}
$postFields = implode('&', $postFields);
return $postFields;
}
function doPostRequest($url, array $params) {
$post = $this->createASPPostParams($this->dom, $params);
return $this->getDom($url, $post);
}
function doPostBack($url, $eventTarget, $eventArgument = '') {
return $this->doPostRequest($url, array(
'__EVENTTARGET' => $eventTarget,
'__EVENTARGUMENT' => $eventArgument
));
}
function doGetRequest($url) {
return $this->getDom($url);
}
}
(Credits: Andrey http://256cats.com/scraping-asp-websites-php-dopostback-ajax-emulation/)
You're probably looking for the CURLINFO_REDIRECT_URL info variable, as that returns the URL that it would otherwise had redirected to if you'd allowed it. Added in PHP 5.3.7.
Note that the exact response code 3xx also affects how the HTTP request method is supposed to change or not change when you follow a redirect. See details in the HTTP spec, RFC 7231 section 6.4.
The libcurl docs for CURLINFO_REDIRECT_URL.

Extract string inside multi-quoted variable

I've a variable with multiple single quotes and want to extract a string of this.
My Code is:
$image['src'] = addslashes($image['src']);
preg_match('~src=["|\'](.*?)["|\']~', $image['src'], $matches);
$image['src'] = $matches[1];
$image['src'] contains this string:
tooltip_html(this, '<div style="display: block; width: 262px"><img src="https://url.com/var/galerie/15773_262.jpg"/></div>');
I thought all would be right but $image['src'] returns null. The addslashes method works fine and returns this:
tooltip_html(this, \'<div style="display: block; width: 262px"><img src="https://url.com/var/galerie/15773_262.jpg"/></div>\');
I don't get the problem in here, did I miss something?
=====UPDATE======
The whole code:
<?php
error_reporting(E_ALL);
header("Content-Type: application/json", true);
define('SITE', 'https://akipa-autohandel.autrado.de/');
include_once('simple_html_dom.php');
/**
* Create CDATA-Method for XML Output
*/
class SimpleXMLExtended extends SimpleXMLElement {
public function addCData($cdata_text) {
$node = dom_import_simplexml($this);
$no = $node->ownerDocument;
$node->appendChild($no->createCDATASection($cdata_text));
}
}
/**
* Get a web file (HTML, XHTML, XML, image, etc.) from a URL. Return an
* array containing the HTTP server response header fields and content.
*/
function get_web_page( $url ) {
$user_agent='Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';
$options = array(
CURLOPT_CUSTOMREQUEST =>"GET", //set request type post or get
CURLOPT_POST =>false, //set to GET
CURLOPT_USERAGENT => $user_agent, //set user agent
CURLOPT_COOKIEFILE =>"cookie.txt", //set cookie file
CURLOPT_COOKIEJAR =>"cookie.txt", //set cookie jar
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
if($content === FALSE) {
// when output is false it can't be used in str_get_html()
// output a proper error message in such cases
echo 'output error';
die(curl_error($ch));
}
curl_close( $ch );
$header['errno'] = $err;
$header['errmsg'] = $errmsg;
$header['content'] = $content;
return $header;
}
function renderPage( $uri ) {
$rendering = get_web_page( $uri );
if ( $rendering['errno'] != 0 )
echo 'bad url, timeout, redirect loop';
if ( $rendering['http_code'] != 200 )
echo 'no page, no permissions, no service';
$content = $rendering['content'];
if(!empty($content)) {
$parsing = str_get_html($content);
}
return $parsing;
}
/**
* Get all current car data of the selected autrado site
*/
function models() {
$paramURI = SITE . 'schnellsuche.php?suche_hersteller=14&suche_modell=&suche_from=form&suche_action=suche&itemsperpage=500';
$content = renderPage($paramURI);
foreach ($content->find('tr[class*=fahrzeugliste]') as $auto) {
$item['src'] = $auto->find('a[onmouseover]', 0)->onmouseover;
preg_match('~src=["\'](.*?)["\']~', $item['src'], $matches);
echo $matches[1];
}
}
if(isset($_POST['action']) && !empty($_POST['action'])) {
$action = $_POST['action'];
if((string) $action == 'test') {
$output = models();
json_encode($output);
}
}
?>
The content of $image['src'] is not as you wrote above. I've run now your script and the content is:
tooltip_html(this, '<div style="display: block; width: 262px"><img src="http://server12.autrado.de/autradogalerie_copy/var/galerie/127915_262.jpg" /></div>');
It will work if you add the following line before the preg_match:
$item['src']= html_entity_decode($item['src']);

HTTP response code after redirect

There is a redirect to server for information and once response comes from server, I want to check HTTP code to throw an exception if there is any code starting with 4XX. For that I need to know how can I get only HTTP code from header? Also here redirection to server is involved so I afraid curl will not be useful to me.
So far I have tried this solution but it's very slow and creates script time out in my case. I don't want to increase script time out period and wait longer just to get an HTTP code.
Thanks in advance for any suggestion.
Your method with get_headers and requesting the first response line will return the status code of the redirect (if any) and more importantly, it will do a GET request which will transfer the whole file.
You need only a HEAD request and then to parse the headers and return the last status code. Following is a code example that does this, it's using $http_response_header instead of get_headers, but the format of the array is the same:
$url = 'http://example.com/';
$options['http'] = array(
'method' => "HEAD",
'ignore_errors' => 1,
);
$context = stream_context_create($options);
$body = file_get_contents($url, NULL, $context);
$responses = parse_http_response_header($http_response_header);
$code = $responses[0]['status']['code']; // last status code
echo "Status code (after all redirects): $code<br>\n";
$number = count($responses);
$redirects = $number - 1;
echo "Number of responses: $number ($redirects Redirect(s))<br>\n";
if ($redirects)
{
$from = $url;
foreach (array_reverse($responses) as $response)
{
if (!isset($response['fields']['LOCATION']))
break;
$location = $response['fields']['LOCATION'];
$code = $response['status']['code'];
echo " * $from -- $code --> $location<br>\n";
$from = $location;
}
echo "<br>\n";
}
/**
* parse_http_response_header
*
* #param array $headers as in $http_response_header
* #return array status and headers grouped by response, last first
*/
function parse_http_response_header(array $headers)
{
$responses = array();
$buffer = NULL;
foreach ($headers as $header)
{
if ('HTTP/' === substr($header, 0, 5))
{
// add buffer on top of all responses
if ($buffer) array_unshift($responses, $buffer);
$buffer = array();
list($version, $code, $phrase) = explode(' ', $header, 3) + array('', FALSE, '');
$buffer['status'] = array(
'line' => $header,
'version' => $version,
'code' => (int) $code,
'phrase' => $phrase
);
$fields = &$buffer['fields'];
$fields = array();
continue;
}
list($name, $value) = explode(': ', $header, 2) + array('', '');
// header-names are case insensitive
$name = strtoupper($name);
// values of multiple fields with the same name are normalized into
// a comma separated list (HTTP/1.0+1.1)
if (isset($fields[$name]))
{
$value = $fields[$name].','.$value;
}
$fields[$name] = $value;
}
unset($fields); // remove reference
array_unshift($responses, $buffer);
return $responses;
}
For more information see: HEAD first with PHP Streams, at the end it contains example code how you can do the HEAD request with get_headers as well.
Related: How can one check to see if a remote file exists using PHP?
Something like:
$ch = curl_init();
$httpcode = curl_getinfo ($ch, CURLINFO_HTTP_CODE );
You should try the HttpEngine Class.
Hope this helps.
--
EDIT
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $your_agent_variable);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, $your_referer);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
$output = curl_exec($ch);
$httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
if ($httpcode ...)
The solution you found looks good. If the server is not able to send you the http headers in time your problem is that the other server is broken or under very heavy load.

Categories