PHP web scraper with post request for pagination - php
I have a site "https://www.hellowork.mhlw.go.jp/kensaku/GECA110010.do?action=initDisp&screenId=GECA110010"
It is a Japanese site that lists jobs in a paginated table. Each page requires submitting the form again, and the site always sets a JSESSIONID cookie.
You have to click this button to get the data.
I got the data for the first page only. I have tried everything I can think of but could not get the data for the other pages.
Please help.
This is the code that gives the data for the first page:
echo "working";
$mainUrl = "https://www.hellowork.mhlw.go.jp/kensaku/GECA110010.do";
// first parameter for sending request
$firstParam = array(
'kSNoJo'=> '',
'kSNoGe'=> '',
'kjKbnRadioBtn'=> 1,
'nenreiInput'=> '',
'tDFK1CmbBox'=> '',
'tDFK2CmbBox'=> '',
'tDFK3CmbBox'=> '',
'sKGYBRUIJo1'=> '' ,
'sKGYBRUIGe1'=> '' ,
'sKGYBRUIJo2'=> '' ,
'sKGYBRUIGe2'=> '' ,
'sKGYBRUIJo3'=> '' ,
'sKGYBRUIGe3'=> '' ,
'freeWordInput'=> '' ,
'nOTKNSKFreeWordInput'=> '' ,
'searchBtn'=> '検索',
'kJNoJo1'=> '' ,
'kJNoGe1'=> '' ,
'kJNoJo2'=> '' ,
'kJNoGe2'=> '' ,
'kJNoJo3'=> '' ,
'kJNoGe3'=> '' ,
'fwListNaviDisp'=> 30,
'kJNoJo4'=> '' ,
'kJNoGe4'=> '' ,
'kJNoJo5'=> '' ,
'kJNoGe5'=> '' ,
'jGSHNoJo'=> '' ,
'jGSHNoChuu'=> '' ,
'jGSHNoGe'=> '' ,
'kyujinkensu'=> 0,
'iNFTeikyoRiyoDantaiID'=> '',
'fwListNaviBtn4'=> 4,
'searchClear'=> 0,
'siku1Hidden'=> '',
'siku2Hidden'=> '',
'siku3Hidden'=> '',
'kiboSuruSKSU1Hidden'=> '',
'kiboSuruSKSU2Hidden'=> '',
'kiboSuruSKSU3Hidden'=> '',
'summaryDisp'=> false,
'searchInitDisp'=> 0,
'screenId'=> 'GECA110010',
'action'=> '',
'codeAssistType'=> '',
'codeAssistKind'=> '',
'codeAssistCode'=> '',
'codeAssistItemCode'=> '',
'codeAssistItemName'=> '',
'codeAssistDivide'=> '',
'maba_vrbs'=> 'infTkRiyoDantaiBtn,searchShosaiBtn,searchBtn,searchNoBtn,searchClearBtn,dispDetailBtn,kyujinhyoBtn',
'preCheckFlg'=> false,
);
// parameters for requesting the next page
$nextPageParam = array(
'kSNoJo'=> '',
'kSNoGe'=> '',
'kjKbnRadioBtn'=> 1,
'nenreiInput'=> '',
'tDFK1CmbBox'=> '',
'tDFK2CmbBox'=> '',
'tDFK3CmbBox'=> '',
'sKGYBRUIJo1'=> '',
'sKGYBRUIGe1'=> '',
'sKGYBRUIJo2'=> '',
'sKGYBRUIGe2'=> '',
'sKGYBRUIJo3'=> '',
'sKGYBRUIGe3'=> '',
'freeWordInput'=> '',
'nOTKNSKFreeWordInput'=> '',
'searchBtn'=> '検索',
'kJNoJo1'=> '',
'kJNoGe1'=> '',
'kJNoJo2'=> '',
'kJNoGe2'=> '',
'kJNoJo3'=> '',
'kJNoGe3'=> '',
'kJNoJo4'=> '',
'kJNoGe4'=> '',
'kJNoJo5'=> '',
'kJNoGe5'=> '',
'jGSHNoJo'=> '',
'jGSHNoChuu'=> '',
'jGSHNoGe'=> '',
'kyujinkensu'=> '0',
'fwListNaviSortTop'=> 1,
'fwListNaviDispTop'=> 30,
'fwListNaviBtn1'=> '',
'fwListNaviBtn2'=> '',
'fwListNaviBtn3'=> '',
'fwListNaviBtn4'=> 4,
'fwListNaviBtn5'=> '',
'fwListNaviBtn6'=> '',
'fwListNaviSortBtm'=> 1,
'fwListNaviDispBtm'=> 30,
'fwListNowPage'=> 1,
'fwListLeftPage'=> 1,
'fwListNaviCount'=> 7,
'fwListNaviDisp'=> 30,
'fwListNaviSort'=> 1,
'iNFTeikyoRiyoDantaiID'=> '',
'searchClear'=> 0,
'siku1Hidden'=> '',
'siku2Hidden'=> '',
'siku3Hidden'=> '',
'kiboSuruSKSU1Hidden'=> '',
'kiboSuruSKSU2Hidden'=> '',
'kiboSuruSKSU3Hidden'=> '',
'summaryDisp'=> true,
'searchInitDisp'=> 1,
'screenId'=> 'GECA110010',
'action'=> '',
'codeAssistType'=> '',
'codeAssistKind'=> '',
'codeAssistCode'=> '',
'codeAssistItemCode'=> '',
'codeAssistItemName'=> '',
'codeAssistDivide'=> '',
'maba_vrbs'=> 'infTkRiyoDantaiBtn,searchShosaiBtn,searchBtn,searchNoBtn,searchClearBtn,dispDetailBtn,kyujinhyoBtn',
'preCheckFlg'=> false,
);
$sendParam = http_build_query($firstParam); // generate URL-encoded query string
$nextPage = http_build_query($nextPageParam); // generate URL-encoded query string
$cookies = array(); // to store the cookie data that we get from the response
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $mainUrl);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $sendParam);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADERFUNCTION, 'readCookie'); // get the cookie data for readCookie function
curl_setopt($ch, CURLOPT_COOKIE, $cookies['set-cookie']); // send the cookie data
echo "<pre>";
$html = curl_exec($ch);
curl_setopt($ch, CURLOPT_POSTFIELDS, $nextPage);
curl_setopt($ch, CURLOPT_HEADERFUNCTION, 'readCookie');
curl_setopt($ch, CURLOPT_COOKIE, $cookies['set-cookie']);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
echo $html;
// store the current cookie data in the $cookies array
function readCookie($curl, $header) {
global $cookies;
$len = strlen($header);
$header = explode(':', $header, 2);
if (count($header) < 2) // ignore invalid headers
return $len;
$name = strtolower(trim($header[0]));
if ($name == 'set-cookie') {
$parts = explode(';', $header[1]);
$value = trim($parts[0]);
$cookies[$name] = $value;
}
return $len;
}
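One thing worth ruling out in the code above: `readCookie` keeps only the last `Set-Cookie` header it sees, so if the server sets more than one cookie (or sets one during a redirect) the others are lost. A minimal sketch of an alternative, assuming the site only needs its cookies echoed back: pass an empty string as CURLOPT_COOKIEFILE, which switches on cURL's in-memory cookie engine for the handle. The helper names `createSessionHandle` and `postForm` are made up for this example.

```php
<?php
// Sketch only: helper names are hypothetical, not part of any library.

/** Create a cURL handle with the in-memory cookie engine switched on. */
function createSessionHandle()
{
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_AUTOREFERER    => true,
        // Empty string: read no file, but store and resend cookies
        // (including JSESSIONID) for every request on this handle.
        CURLOPT_COOKIEFILE     => '',
    ));
    return $ch;
}

/** POST a form on the shared handle and return the response body. */
function postForm($ch, $url, array $fields)
{
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    return $html;
}
```

With the arrays from the question this would be used as `$ch = createSessionHandle(); $page1 = postForm($ch, $mainUrl, $firstParam); $page2 = postForm($ch, $mainUrl, $nextPageParam);`. If page 2 still repeats page 1, compare the posted fields with the request the browser sends when the paging button is clicked. For instance, `$nextPageParam` still sends `searchBtn`, which might make the server start a fresh search, but that is only a guess.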
Related
Bybit - Place Simple Order... (php)
I am using the following 'suggested' code to post a test order to Bybit (https://github.com/bybit-exchange/api-usage-examples/blob/master/api_demo/futures/Encryption.php):
<?php
function get_signed_params($public_key, $secret_key, $params) {
    $params = array_merge(['api_key' => $public_key], $params);
    ksort($params);
    //decode return value of http_build_query to make sure signing by plain parameter string
    $signature = hash_hmac('sha256', urldecode(http_build_query($params)), $secret_key);
    return http_build_query($params) . "&sign=$signature";
}
$params = [
    'symbol' => 'BTCUSDT',
    'side' => 'Buy',
    'order_type' => 'Limit',
    'qty' => '1',
    'price' => '30000',
    'time_in_force' => 'GoodTillCancel',
    'reduce_only' => false,
    'close_on_trigger' => false,
    'timestamp' => time() * 1000,
    'position_idx' => 0
];
//$url = 'https://api-testnet.bybit.com/private/linear/order/create';
$url = 'https://api.bybit.com/v2/private/order/create';
$public_key = 'my_key_is_here_in_my_code';
$secret_key = 'my_secret_key_is_here_in_my_code';
$qs=get_signed_params($public_key, $secret_key, $params);
$curl_url=$url."?".$qs;
$curl=curl_init($curl_url);
echo $curl_url;
curl_setopt($curl, CURLOPT_URL, $curl_url);
#curl_setopt($curl, CURLOPT_POSTFIELDS, $qs);
curl_setopt($curl, CURLOPT_POST, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);
#curl_setopt($curl, CURLOPT_PROXY,"127.0.0.1:1087");
$response=curl_exec($curl);
echo $response;
However, I receive the following within the response: "validation for 'symbol' failed on the 'symbol' tag"
https://api.bybit.com/v2/private/order/create?api_key=XXXXXX_my_key_XXXXX&close_on_trigger=0&order_type=Limit&position_idx=0&price=30000&qty=1&reduce_only=0&side=Buy&symbol=BTCUSDT&time_in_force=GoodTillCancel&timestamp=1647644020000&sign=0e08e9f9be4cf5e4d7b1294d769ab4bf3b5b79ae9f92bab717670b3d95be0672
{"ret_code":10001,"ret_msg":"Param validation for 'symbol' failed on the 'symbol' tag","ext_code":"","ext_info":"","result":null,"time_now":"1647644020.389465","rate_limit_status":99,"rate_limit_reset_ms":1647644020387,"rate_limit":100}
Could somebody please suggest why Bybit is not recognising the 'BTCUSDT' symbol as expected? Everything seems to be set up on the exchange. Many thanks for your help.
BTCUSDT is not a valid symbol as per the documentation, which lists the valid symbols that you can use.
get cookies from curl
In the Adobe Connect API there are two steps for getting data:
Step 1) From http://87.107.152.107/api/xml?action=login&login=username&password=password I have to get a token to verify myself with the other API calls.
Step 2) From http://87.107.152.107/api/xml?action=report-my-meetings I can get the meetings report with the token from step 1.
The problem is that when I call these APIs from Postman, Postman stores the cookie set by step 1, and step 2 needs this cookie. I want to use the cookie with cURL in PHP, but I don't know how to get it.
My code for step 1:
$curl = curl_init();
curl_setopt_array($curl, array(
    CURLOPT_URL => 'http://87.107.152.107/api/xml?action=login&login=username&password=password',
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_ENCODING => '',
    CURLOPT_MAXREDIRS => 10,
    CURLOPT_TIMEOUT => 0,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
    CURLOPT_CUSTOMREQUEST => 'GET',
));
$response = curl_exec($curl);
I used this and it worked for me:
curl_setopt($ch, CURLOPT_HEADER, 1);
//Return everything
$res = curl_exec($ch);
//Split into lines
$lines = explode("\n", $res);
$headers = array();
$body = "";
foreach($lines as $num => $line){
    $l = str_replace("\r", "", $line);
    //Empty line indicates the start of the message body and end of headers
    if(trim($l) == ""){
        $headers = array_slice($lines, 0, $num);
        //Everything after the blank line is the body, not just the next line
        $body = implode("\n", array_slice($lines, $num + 1));
        //Pull only cookies out of the headers (header names are case-insensitive)
        $cookies = preg_grep('/^Set-Cookie:/i', $headers);
        break;
    }
}
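The cookies collected above are still full `Set-Cookie:` header lines. To send them on the step 2 request they have to be reduced to `name=value` pairs joined by `; `, which is the format CURLOPT_COOKIE takes. A minimal sketch, where `cookieHeaderFromLines` is a made-up helper name and the sample cookie values are invented:

```php
<?php
/**
 * Turn raw response header lines into a value for CURLOPT_COOKIE.
 * Attributes after the first ';' (path, HttpOnly, ...) are dropped.
 */
function cookieHeaderFromLines(array $headerLines)
{
    $pairs = array();
    foreach ($headerLines as $line) {
        // Header names are case-insensitive, hence the /i flag.
        if (preg_match('/^Set-Cookie:\s*([^;\r\n]+)/i', $line, $m)) {
            $pairs[] = trim($m[1]);
        }
    }
    return implode('; ', $pairs);
}

// Invented sample headers, only to show the shape of the result.
$sample = array(
    "HTTP/1.1 200 OK",
    "Set-Cookie: SESSIONID=abc123; HttpOnly; path=/",
    "set-cookie: lang=en",
    "Content-Type: text/xml",
);
echo cookieHeaderFromLines($sample), "\n";
```

The result can then be passed to the second request with `curl_setopt($ch, CURLOPT_COOKIE, cookieHeaderFromLines($headers));`.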
php CURL request using array with key - values
I need to make an API request using cURL. The request is as follows:
{
    "source":"0",
    "custom":true,
    "credit":"2",
    "filetype":"2",
    "ustate":"jkghlyt",
    "service":"1234",
    "templateid":"632",
    "retryatmpt": 1,
    "retryduration": 1,
    "custretry": 1,
    "refno":true,
    "mslist":[{"phone1":"XXXXXXXXXXX","phone2":"XXXXXXXXXX"}]
}
I have the following PHP cURL code:
$phone1 = "XXXXXX";
$phone2 ="XXXXXXX";
//CURL
$url = 'http://api-ip-XXXX';
$param = array('source' => '0',
    'custom' => true,
    'credit' => '2',
    'filetype' => '2',
    'ustate' => 'jkghlyt',
    'service' => '1234',
    'templateid' => '632',
    'retryatmpt' => '1',
    'retryduration' => '1',
    'custretryatmpt' => '1',
    'refno' => true,
    'mslist' => array(
        urlencode($phone1),
        urlencode($phone2)
    )
);
$url = $url . "?" . http_build_query($param, '&');
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
$result = curl_exec($ch);
curl_close($ch);
I am confused about how to add $phone1 and $phone2; the way I have done it is not correct. Requesting help from experts.
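The request shown is a JSON document, and `mslist` in it is a list holding one object with the keys `phone1` and `phone2`, not a flat list of two strings. A minimal sketch of building that structure and sending it as a JSON body follows. It assumes the API accepts a POST with `Content-Type: application/json`, which the question does not state; `buildPayload` and `postJson` are made-up helper names, and the key names follow the JSON sample rather than the PHP array.

```php
<?php
/** Build the request structure; nested arrays become JSON objects/lists. */
function buildPayload($phone1, $phone2)
{
    return array(
        'source'        => '0',
        'custom'        => true,
        'credit'        => '2',
        'filetype'      => '2',
        'ustate'        => 'jkghlyt',
        'service'       => '1234',
        'templateid'    => '632',
        'retryatmpt'    => 1,
        'retryduration' => 1,
        'custretry'     => 1,
        'refno'         => true,
        // A list containing one associative array: [{"phone1":..,"phone2":..}]
        'mslist'        => array(
            array('phone1' => $phone1, 'phone2' => $phone2),
        ),
    );
}

/** Assumption: the endpoint expects the JSON in a POST body. */
function postJson($url, array $payload)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => json_encode($payload),
        CURLOPT_HTTPHEADER     => array('Content-Type: application/json'),
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 30,
    ));
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}
```

No `urlencode()` is needed for the phone numbers here, because `json_encode()` handles the escaping for a JSON body.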
How can i use fsockopen on my site?
I'm using currency codes on my site, with the following:
<?php
$contents = file_get_contents('http://www.tcmb.gov.tr/kurlar/today.html');
$contents = iconv("windows-1254" , "utf8" , $contents);
$dollar = preg_match('~ABD DOLARI\s+(.+?)\s+(.+?)\s+~i', $contents, $matches) ? array('buying' => $matches[1], 'selling' => $matches[2]) : '';
$euro = preg_match('~EURO\s+(.+?)\s+(.+?)\s+~i', $contents, $matches) ? array('buying' => $matches[1], 'selling' => $matches[2]) : '';
$gbp = preg_match('~İNGİLİZ STERLİNİ\s+(.+?)\s+(.+?)\s+~i', $contents, $matches) ? array('buying' => $matches[1], 'selling' => $matches[2]) : '';
$chf = preg_match('~İSVİÇRE FRANGI\s+(.+?)\s+(.+?)\s+~i', $contents, $matches) ? array('buying' => $matches[1], 'selling' => $matches[2]) : '';
echo '
<table class="form" style="background:#fff;width:300px;margin-left:14px;">
<tr style="border-bottom:1px solid #e4e4e4;">
..
But today my site gives this error:
Warning: file_get_contents() [function.file-get-contents]: http:// wrapper is disabled in the server configuration by allow_url_fopen=0 in eval() (line 2 of /var/www/vhosts/mysite.com/httpdocs/modules/php/php.module(80) : eval()'d code).
I asked my hosting support about this problem and they said: "Don't use fopen option, please use 'fsockopen'". But I don't know how to do this. Please help me. Thanks.
Use curl instead then. A function to replace file_get_contents from a remote server is:
function get_web_page( $url ) {
    $options = array(
        CURLOPT_RETURNTRANSFER => true,     // return web page
        CURLOPT_HEADER         => false,    // don't return headers
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects
        CURLOPT_ENCODING       => "",       // handle all encodings
        CURLOPT_USERAGENT      => "spider", // who am i
        CURLOPT_AUTOREFERER    => true,     // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,      // timeout on connect
        CURLOPT_TIMEOUT        => 120,      // timeout on response
        CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects
    );
    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err = curl_errno( $ch );
    $errmsg = curl_error( $ch );
    $header = curl_getinfo( $ch );
    curl_close( $ch );
    $header['errno'] = $err;
    $header['errmsg'] = $errmsg;
    $header['content'] = $content;
    //If you want error information, 'return $header;' instead.
    return $content;
}
From there change
$contents = file_get_contents('http://www.tcmb.gov.tr/kurlar/today.html');
to
$contents = get_web_page('http://www.tcmb.gov.tr/kurlar/today.html');
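Since the hosting support specifically mentioned fsockopen, here is a rough sketch of the same fetch over a raw socket, in case curl is not available either. It assumes a plain `http://` URL on port 80, sends an HTTP/1.0 request so the server closes the connection when done, and does not follow redirects. `buildGetRequest` and `fetchViaSocket` are made-up helper names.

```php
<?php
/** Build a minimal HTTP/1.0 GET request for the given host and path. */
function buildGetRequest($host, $path)
{
    return "GET {$path} HTTP/1.0\r\n"
         . "Host: {$host}\r\n"
         . "Connection: close\r\n\r\n";
}

/** Fetch a plain-HTTP URL with fsockopen and return only the body. */
function fetchViaSocket($url, $timeout = 30)
{
    $parts = parse_url($url);
    $host  = $parts['host'];
    $port  = isset($parts['port']) ? $parts['port'] : 80;
    $path  = isset($parts['path']) ? $parts['path'] : '/';
    if (isset($parts['query'])) {
        $path .= '?' . $parts['query'];
    }

    $fp = fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        throw new RuntimeException("Connection failed: $errstr ($errno)");
    }
    fwrite($fp, buildGetRequest($host, $path));
    $response = stream_get_contents($fp); // read until the server closes
    fclose($fp);

    // Headers and body are separated by the first blank line.
    $split = explode("\r\n\r\n", $response, 2);
    return isset($split[1]) ? $split[1] : '';
}
```

It would be used as `$contents = fetchViaSocket('http://www.tcmb.gov.tr/kurlar/today.html');`. For an HTTPS URL the host passed to `fsockopen()` would need the `ssl://` prefix and port 443, which this sketch does not handle.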
How to get the real URL after file_get_contents if redirection happens?
I'm using file_get_contents() to grab content from a site, and amazingly it works even if the URL I pass as argument redirects to another URL. The problem is I need to know the new URL, is there a way to do that?
If you need to use file_get_contents() instead of curl, don't follow redirects automatically:
$context = stream_context_create(
    array(
        'http' => array(
            'follow_location' => false
        )
    )
);
$html = file_get_contents('http://www.example.com/', false, $context);
var_dump($http_response_header);
Answer inspired by: How do I ignore a moved-header with file_get_contents in PHP?
You might make a request with cURL instead of file_get_contents(). Something like this should work...
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, FALSE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$a = curl_exec($ch);
if(preg_match('#Location: (.*)#', $a, $r))
    $l = trim($r[1]);
Everything in one function:
function get_web_page( $url ) {
    $res = array();
    $options = array(
        CURLOPT_RETURNTRANSFER => true,     // return web page
        CURLOPT_HEADER         => false,    // do not return headers
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects
        CURLOPT_USERAGENT      => "spider", // who am i
        CURLOPT_AUTOREFERER    => true,     // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,      // timeout on connect
        CURLOPT_TIMEOUT        => 120,      // timeout on response
        CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects
    );
    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err = curl_errno( $ch );
    $errmsg = curl_error( $ch );
    $header = curl_getinfo( $ch );
    curl_close( $ch );
    $res['content'] = $content;
    $res['url'] = $header['url'];
    return $res;
}
print_r(get_web_page("http://www.example.com/redirectfrom"));
A complete solution using the bare file_get_contents (note the in-out $url parameter):
function get_url_contents_and_final_url(&$url) {
    do {
        $context = stream_context_create(
            array(
                "http" => array(
                    "follow_location" => false,
                ),
            )
        );
        $result = file_get_contents($url, false, $context);
        $pattern = "/^Location:\s*(.*)$/i";
        $location_headers = preg_grep($pattern, $http_response_header);
        if (!empty($location_headers) && preg_match($pattern, array_values($location_headers)[0], $matches)) {
            $url = $matches[1];
            $repeat = true;
        } else {
            $repeat = false;
        }
    } while ($repeat);
    return $result;
}
Note that this works only with an absolute URL in the Location header. If you need to support relative URLs, see PHP: How to resolve a relative url. For example, if you use the solution from the answer by @Joyce Babu, replace:
$url = $matches[1];
with:
$url = getAbsoluteURL($matches[1], $url);
I use get_headers($url, 1); in my case the redirect URL is in get_headers($url, 1)['Location'][1];
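With the associative form of get_headers(), `Location` is a plain string when there was a single redirect and an array when there were several, so a fixed index like `[1]` only fits one particular redirect chain. A small sketch that handles both cases, where `lastRedirectTarget` is a made-up helper name:

```php
<?php
/**
 * Return the last Location value from an associative get_headers() result,
 * or null when the response contained no redirect.
 */
function lastRedirectTarget(array $headers)
{
    foreach ($headers as $name => $value) {
        // Status lines have integer keys; header names are case-insensitive.
        if (is_string($name) && strcasecmp($name, 'Location') === 0) {
            return is_array($value) ? end($value) : $value;
        }
    }
    return null;
}
```

It would be called as `lastRedirectTarget(get_headers($url, true))`. Note that get_headers() returns false on failure, so the result should be checked before it is passed in.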