I am trying to scrape a website using PHP and cURL, with a POST request that submits a form before I scrape the resulting page. The problem is with the POST request: no data is submitted to the server, so the scraped page doesn't contain what I am looking for.
I'm quite sure the problem is connected with the form type: enctype="multipart/form-data".
How can I manage this POST request, considering that the form is multipart/form-data?
Do I have to encode the post_string in a special way?
Here's the code I'm using:
function curl($url) {
//POST string
$post_string="XXXX";
$options = Array(
CURLOPT_RETURNTRANSFER => TRUE,
CURLOPT_FOLLOWLOCATION => TRUE,
CURLOPT_AUTOREFERER => TRUE,
CURLOPT_CONNECTTIMEOUT => 120,
CURLOPT_TIMEOUT => 120,
CURLOPT_MAXREDIRS => 10,
CURLOPT_USERAGENT => "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073000 Shredder/3.0a2pre ThunderBrowse/3.2.1.8",
CURLOPT_URL => $url,
CURLOPT_CAINFO => dirname(__FILE__)."/cacert.pem",
CURLOPT_POSTFIELDS => $post_string,
);
$ch = curl_init();
curl_setopt_array($ch, $options);
$data = curl_exec($ch);
curl_error($ch);
curl_close($ch);
return $data;
}
$scraped_page = curl("XXXURLXXX");
echo $scraped_page;
Thank you!
Set CURLOPT_POST to true in your options array:
CURLOPT_POST => true
Then fill your post fields like this:
$postfields = array();
$postfields['field1'] = 'value1';
$postfields['field2'] = 'value2';
CURLOPT_POSTFIELDS => $postfields
From the PHP manual: "If value is an array, the Content-Type header will be set to multipart/form-data."
Yes, $post_string needs to be an array.
Also set CURLOPT_POST to true.
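To make the difference concrete, here is a minimal sketch of the two ways CURLOPT_POSTFIELDS can be filled. The field names are placeholders (the original post only shows "XXXX"), so substitute the real form inputs. Passing an array asks cURL for a multipart/form-data body, while passing a string built with http_build_query() gives a regular urlencoded body.

```php
<?php
// Hypothetical field names; replace them with the inputs of the real form.
$fields = array('field1' => 'value1', 'field2' => 'value2');

// Array value: cURL sends the body as multipart/form-data.
$multipartOptions = array(
    CURLOPT_POST       => true,
    CURLOPT_POSTFIELDS => $fields,
);

// String value: cURL sends application/x-www-form-urlencoded.
$urlencodedOptions = array(
    CURLOPT_POST       => true,
    CURLOPT_POSTFIELDS => http_build_query($fields),
);

// Show what the urlencoded variant looks like on the wire.
echo $urlencodedOptions[CURLOPT_POSTFIELDS], "\n";
```

Either array can be merged into the $options array from the question before calling curl_setopt_array().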
I have a simple GET request in Postman and it works fine:
https://www.instagram.com/p/CVhuRABqnAI/?__a=1&__d=dis
When I use PHP cURL, it does not respond with JSON data but shows the logo instead:
$curl = curl_init();
curl_setopt_array($curl, array(
CURLOPT_USERAGENT => "Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0",
CURLOPT_URL => "https://www.instagram.com/p/CVhuRABqnAI/?__a=1&__d=dis",
CURLOPT_RETURNTRANSFER => true,
CURLOPT_ENCODING => "",
CURLOPT_MAXREDIRS => 10,
CURLOPT_TIMEOUT => 0,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
CURLOPT_CUSTOMREQUEST => "GET",
));
$response = curl_exec($curl);
$error = curl_error($curl);
curl_close($curl);
if($error) {
echo "cURL Error:" . $error;
} else {
echo $response;
}
Can someone help me with this?
I think I finally know what to do, as I just faced this problem myself.
Usually there is some difference between the request Postman sends and the one cURL sends. Or there is "something" else, because my solution was simply to not send any headers at all...
I am using this code to get the contents of a URL with a POST request using PHP cURL.
The code looks like this:
// Get cURL resource
$curl = curl_init();
// Set some options - we are passing in a useragent too here
curl_setopt_array($curl, array(
CURLOPT_RETURNTRANSFER => 1,
CURLOPT_URL => 'http://www1.ptt.gov.tr/tr/interaktif/sonuc-yd.php',
CURLOPT_USERAGENT => 'Codular Sample cURL Request',
CURLOPT_POST => 1,
CURLOPT_POSTFIELDS => array(
'barcode' => 'CP021325078TR',
'security_code' => $capcha2
)
));
// Send the request & save response to $resp
$resp = curl_exec($curl);
// Close request to clear up some resources
curl_close($curl);
echo "<pre>";
var_dump($resp);
echo "</pre>";
The result doesn’t seem to return anything at all.
What is wrong with this code?
Try this:
$url = 'http://www1.ptt.gov.tr/tr/interaktif/sonuc-yd.php';
$postvals = array(
'barcode' => 'CP021325078TR',
'security_code' => $capcha2
);
$resp = Request($url,$postvals);
echo "<pre>"; var_dump($resp); exit;
...
function Request($url,$params=array()){
$ch = curl_init();
$curlOpts = array(
CURLOPT_URL => $url,
CURLOPT_USERAGENT => 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0',
CURLOPT_SSL_VERIFYPEER => false,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_FOLLOWLOCATION => true
);
if(!empty($params)){
$curlOpts[CURLOPT_POST] = true;
$curlOpts[CURLOPT_POSTFIELDS] = $params;
}
curl_setopt_array($ch,$curlOpts);
$answer = curl_exec($ch);
if (curl_error($ch)) {
echo curl_error($ch); exit;
}
curl_close($ch);
return $answer;
}
EDIT:
I tested this and got:
Could not resolve host: www1.ptt.gov.tr
So make sure you're calling the right endpoint.
Actually, you need to set this variable first:
$capcha2
before you use it here:
'security_code' => $capcha2
I am using curl and setting all the parameters correctly (as far as I know) but CURLOPT_TIMEOUT is being ignored and allowing for an infinite loop. Here is the configuration for my Curl Request:
$user_agent = 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';
$options = array(
CURLOPT_CUSTOMREQUEST => "GET", //set request type post or get
CURLOPT_POST => false, //set to GET
CURLOPT_USERAGENT => $user_agent, //set user agent
CURLOPT_COOKIEFILE => dirname(__FILE__)."/cookie.txt", //set cookie file
CURLOPT_COOKIEJAR => dirname(__FILE__)."/cookie.txt", //set cookie jar
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_SSL_VERIFYPEER, FALSE, //ignore ssl
CURLOPT_PROXY =>$proxy['ip'],
CURLOPT_PROXYPORT =>$proxy['port'],
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_AUTOREFERER => true, // set referrer on redirect
CURLOPT_CONNECTTIMEOUT => 20, // timeout on connect
CURLOPT_TIMEOUT => 10, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
);
$ch = curl_init($url);
curl_setopt_array($ch, $options);
$content = curl_exec($ch);
$err = curl_errno($ch);
$errmsg = curl_error($ch);
$header = curl_getinfo($ch);
curl_close($ch);
I am not the best at debugging so I'm not sure what the problem could be. Please help me.
curl_setopt($connection, CURLOPT_TIMEOUT, $seconds) should be called right before curl_exec().
It wasn't working for me when I called it earlier.
If you are using Cloudflare, note that:
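One more thing that may be worth checking in the question's options array: the CURLOPT_SSL_VERIFYPEER line uses a comma instead of =>, so the constant and FALSE are appended as two separate values under auto-generated integer keys rather than as one option/value pair. According to the PHP manual, curl_setopt_array() returns false as soon as an option cannot be set and ignores the rest of the array, which could explain why a later CURLOPT_TIMEOUT entry never takes effect. The sketch below is a reduced version of that array (not the full original) that shows the effect and checks the return value.

```php
<?php
// Reduced copy of the question's array, keeping the comma typo.
$broken = array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_SSL_VERIFYPEER, false,    // comma: two stray values, no option set
    CURLOPT_TIMEOUT        => 10,
);

// Same array with the typo corrected.
$fixed = array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_SSL_VERIFYPEER => false,
    CURLOPT_TIMEOUT        => 10,
);

// The broken array has one extra entry and no CURLOPT_SSL_VERIFYPEER key.
var_dump(count($broken), array_key_exists(CURLOPT_SSL_VERIFYPEER, $broken));

// Always check whether every option was accepted. Recent PHP versions
// may throw instead of returning false, so both cases are handled here.
$ch = curl_init();
try {
    $applied = curl_setopt_array($ch, $fixed);
} catch (\Throwable $e) {
    $applied = false;
}
if (!$applied) {
    echo "Some cURL options were rejected; later options may be missing.\n";
}
```

Checking the return value of curl_setopt_array() is cheap and makes this kind of silent failure visible.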
Enterprise customers can increase the 524 timeout up to 6000 seconds
using the proxy_read_timeout API endpoint. If you regularly run HTTP
requests that take over 100 seconds to complete (for example large
data exports), move those processes behind a subdomain not proxied
(grey clouded) in the Cloudflare DNS app.
I have read many similarly titled questions, but none of the answers worked for me...
The problem is that when I send a cURL request to a website, all I get is a blank page.
Here is my code:
<?php
$action = "http://www.website.com/index.php?section=login&do=process";
$fields = array(
'username' => $user,
'rememberMe' => '1'
);
$login = curl_post($action, $fields);
var_dump($login);
function curl_post($url, array $post = NULL, array $options = array())
{
$defaults = array(
CURLOPT_POST => 1,
CURLOPT_HEADER => 0,
CURLOPT_HTTPHEADER => array('Accept-Language: pl,en-us;q=0.7,en;q=0.3', 'Accept-Charset: ISO-8859-2,utf-8;q=0.7,*;q=0.7'),
CURLOPT_USERAGENT => "Mozilla/5.0 (Windows; U; Windows NT 5.1; pl; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3",
CURLOPT_URL => $url,
CURLOPT_FRESH_CONNECT => 1,
CURLOPT_RETURNTRANSFER => 1,
CURLOPT_FORBID_REUSE => 1,
CURLOPT_TIMEOUT => 4,
CURLOPT_NOBODY => false,
CURLOPT_POSTFIELDS => http_build_query($post)
);
$ch = curl_init();
curl_setopt_array($ch, ($options + $defaults));
if( !$result = curl_exec($ch))
{
return(curl_error($ch));
}
curl_close($ch);
return $result;
}
?>
Of course I have cURL enabled in PHP, so it's not about that.
If you have any ideas, please share!
Update
I have also been trying adding the following lines at the top of my PHP file:
ini_set("display_errors", 1);
error_reporting(E_ALL);
But the problem still appears: the result is a blank page. When I use file_get_contents("http://website.com/"); I can see the page content, so it is only cURL that doesn't work.
Running this locally and pointing it at Google, I see two things immediately:
PHP Notice: Undefined variable: user
Google returns a 'Error 405 (Method Not Allowed)!!1' error page but probably because I'm trying to post to it
What happens when you define $user and try again?
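It may also help to tell a failed transfer apart from a response with an empty body, since the question's `!$result` check treats both the same way. The helper below is a hypothetical sketch (the name describe_curl_result is made up for illustration). An empty body together with a 3xx status would suggest the login endpoint answers with a redirect that is not being followed, but that is only a guess without seeing the real site.

```php
<?php
// Hypothetical helper: classify the outcome of curl_exec().
// $result is curl_exec()'s return value, $error is curl_error(),
// $httpCode is curl_getinfo($ch, CURLINFO_HTTP_CODE).
function describe_curl_result($result, string $error, int $httpCode): string
{
    if ($result === false) {
        return "transfer failed: $error";
    }
    if ($result === '') {
        return "empty body, HTTP status $httpCode";
    }
    return "received " . strlen($result) . " bytes, HTTP status $httpCode";
}

// Usage with a live handle (the URL is the placeholder from the question):
// $ch = curl_init('http://www.website.com/index.php?section=login&do=process');
// curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// $result = curl_exec($ch);
// echo describe_curl_result($result, curl_error($ch),
//     curl_getinfo($ch, CURLINFO_HTTP_CODE));

echo describe_curl_result('', '', 302), "\n";
```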
I wish to mimic, using CURL with PHP, the operation of a website that retrieves data using an AJAX POST.
Normally when I view POST requests using Firebug I see variable/value pairs, but in this case all I see is a single JSON string. E.g.
{"refId":"14536"}
Is there a way to mimic this request using CURL? I've looked at CURL, but as far as I can see the CURLOPT_POSTFIELDS parameter has to be a query string made up of one or more name/value pairs.
Here is my test code with a normal POST request using a single name/value pair. I'd like to modify it to do the above.
$curlOptions = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3",
CURLOPT_CONNECTTIMEOUT => 600, // timeout on connect
CURLOPT_TIMEOUT => 600, // timeout on response
CURLOPT_POST => true,
CURLOPT_POSTFIELDS => 'var1=113',
CURLOPT_URL => "http://localhost/t4.php"
);
$curlCh = curl_init();
curl_setopt_array( $curlCh, $curlOptions );
$fileContents = curl_exec( $curlCh );
$curlErr = curl_errno( $curlCh );
$curlErrmsg = curl_error( $curlCh );
if( $curlErr ) echo "CURL ERROR: $curlErr $curlErrmsg";
echo $fileContents; //check worked
curl_close( $curlCh );
How about something like:
$postData = json_encode(array('refId' => '14536'));
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
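Put together with the question's option array, that could look roughly like the sketch below. The Content-Type header is an assumption: many endpoints that accept a raw JSON body expect it, but the original request's headers are not shown, so check them in Firebug first.

```php
<?php
// Encode the payload exactly as it appears in the captured request.
$payload = json_encode(array('refId' => '14536'));

$curlOptions = array(
    CURLOPT_URL            => 'http://localhost/t4.php', // test URL from the question
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => $payload,                  // raw string, sent as-is
    // Assumed header; adjust to match what the real AJAX call sends.
    CURLOPT_HTTPHEADER     => array('Content-Type: application/json'),
);

echo $payload, "\n";

// To send it:
// $curlCh = curl_init();
// curl_setopt_array($curlCh, $curlOptions);
// $fileContents = curl_exec($curlCh);
//
// On the receiving side, a JSON body does not populate $_POST; a test
// script would read it with:
// $data = json_decode(file_get_contents('php://input'), true);
```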