PHP web scraper with post request for pagination - php

I am scraping the site https://www.hellowork.mhlw.go.jp/kensaku/GECA110010.do?action=initDisp&screenId=GECA110010
It is a Japanese site that shows a table of job listings with pagination. Each page requires submitting the form again, and the site always sets a JSESSIONID cookie.
You have to click a button on the form to get the data.
I can get the data for the first page only. I have tried everything I could think of, but I cannot get the data for the other pages.
Please help.
This is the code that returns the data for the first page:
echo "working";
$mainUrl = "https://www.hellowork.mhlw.go.jp/kensaku/GECA110010.do";
// parameters for the first (search) request
$firstParam = array(
'kSNoJo'=> '',
'kSNoGe'=> '',
'kjKbnRadioBtn'=> 1,
'nenreiInput'=> '',
'tDFK1CmbBox'=> '',
'tDFK2CmbBox'=> '',
'tDFK3CmbBox'=> '',
'sKGYBRUIJo1'=> '' ,
'sKGYBRUIGe1'=> '' ,
'sKGYBRUIJo2'=> '' ,
'sKGYBRUIGe2'=> '' ,
'sKGYBRUIJo3'=> '' ,
'sKGYBRUIGe3'=> '' ,
'freeWordInput'=> '' ,
'nOTKNSKFreeWordInput'=> '' ,
'searchBtn'=> '検索',
'kJNoJo1'=> '' ,
'kJNoGe1'=> '' ,
'kJNoJo2'=> '' ,
'kJNoGe2'=> '' ,
'kJNoJo3'=> '' ,
'kJNoGe3'=> '' ,
'fwListNaviDisp'=> 30,
'kJNoJo4'=> '' ,
'kJNoGe4'=> '' ,
'kJNoJo5'=> '' ,
'kJNoGe5'=> '' ,
'jGSHNoJo'=> '' ,
'jGSHNoChuu'=> '' ,
'jGSHNoGe'=> '' ,
'kyujinkensu'=> 0,
'iNFTeikyoRiyoDantaiID'=> '',
'fwListNaviBtn4'=> 4,
'searchClear'=> 0,
'siku1Hidden'=> '',
'siku2Hidden'=> '',
'siku3Hidden'=> '',
'kiboSuruSKSU1Hidden'=> '',
'kiboSuruSKSU2Hidden'=> '',
'kiboSuruSKSU3Hidden'=> '',
'summaryDisp'=> false,
'searchInitDisp'=> 0,
'screenId'=> 'GECA110010',
'action'=> '',
'codeAssistType'=> '',
'codeAssistKind'=> '',
'codeAssistCode'=> '',
'codeAssistItemCode'=> '',
'codeAssistItemName'=> '',
'codeAssistDivide'=> '',
'maba_vrbs'=> 'infTkRiyoDantaiBtn,searchShosaiBtn,searchBtn,searchNoBtn,searchClearBtn,dispDetailBtn,kyujinhyoBtn',
'preCheckFlg'=> false,
);
// parameters for the next-page request
$nextPageParam = array(
'kSNoJo'=> '',
'kSNoGe'=> '',
'kjKbnRadioBtn'=> 1,
'nenreiInput'=> '',
'tDFK1CmbBox'=> '',
'tDFK2CmbBox'=> '',
'tDFK3CmbBox'=> '',
'sKGYBRUIJo1'=> '',
'sKGYBRUIGe1'=> '',
'sKGYBRUIJo2'=> '',
'sKGYBRUIGe2'=> '',
'sKGYBRUIJo3'=> '',
'sKGYBRUIGe3'=> '',
'freeWordInput'=> '',
'nOTKNSKFreeWordInput'=> '',
'searchBtn'=> '検索',
'kJNoJo1'=> '',
'kJNoGe1'=> '',
'kJNoJo2'=> '',
'kJNoGe2'=> '',
'kJNoJo3'=> '',
'kJNoGe3'=> '',
'kJNoJo4'=> '',
'kJNoGe4'=> '',
'kJNoJo5'=> '',
'kJNoGe5'=> '',
'jGSHNoJo'=> '',
'jGSHNoChuu'=> '',
'jGSHNoGe'=> '',
'kyujinkensu'=> '0',
'fwListNaviSortTop'=> 1,
'fwListNaviDispTop'=> 30,
'fwListNaviBtn1'=> '',
'fwListNaviBtn2'=> '',
'fwListNaviBtn3'=> '',
'fwListNaviBtn4'=> 4,
'fwListNaviBtn5'=> '',
'fwListNaviBtn6'=> '',
'fwListNaviSortBtm'=> 1,
'fwListNaviDispBtm'=> 30,
'fwListNowPage'=> 1,
'fwListLeftPage'=> 1,
'fwListNaviCount'=> 7,
'fwListNaviDisp'=> 30,
'fwListNaviSort'=> 1,
'iNFTeikyoRiyoDantaiID'=> '',
'searchClear'=> 0,
'siku1Hidden'=> '',
'siku2Hidden'=> '',
'siku3Hidden'=> '',
'kiboSuruSKSU1Hidden'=> '',
'kiboSuruSKSU2Hidden'=> '',
'kiboSuruSKSU3Hidden'=> '',
'summaryDisp'=> true,
'searchInitDisp'=> 1,
'screenId'=> 'GECA110010',
'action'=> '',
'codeAssistType'=> '',
'codeAssistKind'=> '',
'codeAssistCode'=> '',
'codeAssistItemCode'=> '',
'codeAssistItemName'=> '',
'codeAssistDivide'=> '',
'maba_vrbs'=> 'infTkRiyoDantaiBtn,searchShosaiBtn,searchBtn,searchNoBtn,searchClearBtn,dispDetailBtn,kyujinhyoBtn',
'preCheckFlg'=> false,
);
$sendParam = http_build_query($firstParam); // generate URL-encoded query string
$nextPage = http_build_query($nextPageParam); // generate URL-encoded query string
$cookies = array(); // to store the cookie data that we get from the response
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $mainUrl);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $sendParam);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADERFUNCTION, 'readCookie'); // get the cookie data for readCookie function
curl_setopt($ch, CURLOPT_COOKIE, $cookies['set-cookie']); // send the cookie data
echo "<pre>";
$html = curl_exec($ch);
curl_setopt($ch, CURLOPT_POSTFIELDS, $nextPage);
curl_setopt($ch, CURLOPT_HEADERFUNCTION, 'readCookie');
curl_setopt($ch, CURLOPT_COOKIE, $cookies['set-cookie']);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
echo $html;
// store the current cookie data in the $cookies array
function readCookie($curl, $header) {
global $cookies;
$len = strlen($header);
$header = explode(':', $header, 2);
if (count($header) < 2) // ignore invalid headers
return $len;
$name = strtolower(trim($header[0]));
if ($name == 'set-cookie') {
$parts = explode(';', $header[1]);
$value = trim($parts[0]);
$cookies[$name] = $value;
}
return $len;
}
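One thing that may be going wrong in the code above: readCookie() keeps only the last Set-Cookie header under a single key, and $cookies['set-cookie'] is read before the first response has arrived. A minimal sketch of a header callback that accumulates every cookie by name is shown below. CookieCollector and its method names are hypothetical, not part of cURL, and the header lines are simulated. Alternatively, setting CURLOPT_COOKIEFILE to an empty string turns on cURL's built-in cookie engine, so the same handle resends JSESSIONID by itself. Whether cookie handling alone is enough for this site's pagination is an assumption.

```php
<?php
// Hypothetical helper: collects cookies from response header lines.
final class CookieCollector
{
    /** @var array<string,string> cookie name => value */
    private array $jar = [];

    // Same shape as a CURLOPT_HEADERFUNCTION callback: it receives one
    // header line and returns the number of bytes it handled.
    public function onHeader($curl, string $header): int
    {
        $parts = explode(':', $header, 2);
        if (count($parts) === 2 && strtolower(trim($parts[0])) === 'set-cookie') {
            // Keep only "name=value"; drop attributes such as Path or HttpOnly.
            $pair = trim(explode(';', $parts[1], 2)[0]);
            $nameValue = explode('=', $pair, 2);
            if (count($nameValue) === 2 && $nameValue[0] !== '') {
                $this->jar[$nameValue[0]] = $nameValue[1];
            }
        }
        return strlen($header);
    }

    // Builds a value suitable for CURLOPT_COOKIE, e.g. "a=1; b=2".
    public function cookieHeader(): string
    {
        $pairs = [];
        foreach ($this->jar as $name => $value) {
            $pairs[] = $name . '=' . $value;
        }
        return implode('; ', $pairs);
    }
}

// Simulated header lines instead of a live request.
$collector = new CookieCollector();
foreach ([
    "HTTP/1.1 200 OK\r\n",
    "Set-Cookie: JSESSIONID=abc123; Path=/kensaku; HttpOnly\r\n",
    "Set-Cookie: lang=ja; Path=/\r\n",
    "\r\n",
] as $line) {
    $collector->onHeader(null, $line);
}
echo $collector->cookieHeader(), "\n"; // JSESSIONID=abc123; lang=ja
```

With a real handle this would be wired up as curl_setopt($ch, CURLOPT_HEADERFUNCTION, [$collector, 'onHeader']) before the first curl_exec(), and curl_setopt($ch, CURLOPT_COOKIE, $collector->cookieHeader()) before the second.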

Related

Bybit - Place Simple Order... (php)

I am using the following 'suggested' code to post a test order to Bybit.
(https://github.com/bybit-exchange/api-usage-examples/blob/master/api_demo/futures/Encryption.php)
<?php
function get_signed_params($public_key, $secret_key, $params) {
$params = array_merge(['api_key' => $public_key], $params);
ksort($params);
//decode return value of http_build_query to make sure signing by plain parameter string
$signature = hash_hmac('sha256', urldecode(http_build_query($params)), $secret_key);
return http_build_query($params) . "&sign=$signature";
}
$params = [
'symbol' => 'BTCUSDT',
'side' => 'Buy',
'order_type' => 'Limit',
'qty' => '1',
'price' => '30000',
'time_in_force' => 'GoodTillCancel',
'reduce_only' => false,
'close_on_trigger' => false,
'timestamp' => time() * 1000,
'position_idx' => 0
];
//$url = 'https://api-testnet.bybit.com/private/linear/order/create';
$url = 'https://api.bybit.com/v2/private/order/create';
$public_key = 'my_key_is_here_in_my_code';
$secret_key = 'my_secret_key_is_here_in_my_code';
$qs=get_signed_params($public_key, $secret_key, $params);
$curl_url=$url."?".$qs;
$curl=curl_init($curl_url);
echo $curl_url;
curl_setopt($curl, CURLOPT_URL, $curl_url);
#curl_setopt($curl, CURLOPT_POSTFIELDS, $qs);
curl_setopt($curl, CURLOPT_POST, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);
#curl_setopt($curl, CURLOPT_PROXY,"127.0.0.1:1087");
$response=curl_exec($curl);
echo $response;
However, I receive the following in the response: "Param validation for 'symbol' failed on the 'symbol' tag"
https://api.bybit.com/v2/private/order/create?api_key=XXXXXX_my_key_XXXXX&close_on_trigger=0&order_type=Limit&position_idx=0&price=30000&qty=1&reduce_only=0&side=Buy&symbol=BTCUSDT&time_in_force=GoodTillCancel&timestamp=1647644020000&sign=0e08e9f9be4cf5e4d7b1294d769ab4bf3b5b79ae9f92bab717670b3d95be0672
{"ret_code":10001,"ret_msg":"Param validation for 'symbol' failed on the 'symbol' tag","ext_code":"","ext_info":"","result":null,"time_now":"1647644020.389465","rate_limit_status":99,"rate_limit_reset_ms":1647644020387,"rate_limit":100}
Could somebody please suggest why Bybit is not recognising the 'BTCUSDT' symbol as expected? Everything seems to be set up on the exchange. Many thanks for your help.
BTCUSDT is not a valid symbol as per the documentation, which lists the valid symbols that you can use.

get cookies from curl

The Adobe Connect API requires two steps to get data:
Step 1) From http://87.107.152.107/api/xml?action=login&login=username&password=password I have to get a token that authenticates the other API calls.
Step 2) From http://87.107.152.107/api/xml?action=report-my-meetings I can get the meetings report using the token from step 1.
The problem is that when I call these APIs from Postman, Postman stores the cookie set by step 1 and sends it in step 2, which requires it.
I want to use the cookie with cURL in PHP, but I don't know how to get it. My code for step 1:
$curl = curl_init();
curl_setopt_array($curl, array(
CURLOPT_URL => 'http://87.107.152.107/api/xml?action=login&login=username&password=password',
CURLOPT_RETURNTRANSFER => true,
CURLOPT_ENCODING => '',
CURLOPT_MAXREDIRS => 10,
CURLOPT_TIMEOUT => 0,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
CURLOPT_CUSTOMREQUEST => 'GET',
));
$response = curl_exec($curl);
I used this and it worked for me:
curl_setopt($ch, CURLOPT_HEADER, 1);
// Return everything (headers and body)
$res = curl_exec($ch);
// Split into lines
$lines = explode("\n", $res);
$headers = array();
$cookies = array();
$body = "";
foreach ($lines as $num => $line) {
    $l = str_replace("\r", "", $line);
    // An empty line marks the end of the headers and the start of the message body
    if (trim($l) == "") {
        $headers = array_slice($lines, 0, $num);
        // Keep the whole body, not just its first line
        $body = implode("\n", array_slice($lines, $num + 1));
        // Pull only the cookies out of the headers (header names are case-insensitive)
        $cookies = preg_grep('/^Set-Cookie:/i', $headers);
        break;
    }
}
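The same idea as a small reusable function, for illustration: it splits a raw response (as returned with CURLOPT_HEADER enabled) at the first blank line and turns the Set-Cookie headers into a value that could be passed to CURLOPT_COOKIE in step 2. The function name splitResponse and the sample cookie are made up for the example. The sketch assumes a single header block, so it would need adjusting if CURLOPT_FOLLOWLOCATION produces several.

```php
<?php
// Hypothetical helper: split a raw HTTP response into headers, body and cookies.
function splitResponse(string $raw): array
{
    // Headers end at the first blank line (CRLF CRLF).
    $parts = explode("\r\n\r\n", $raw, 2);
    $headerLines = explode("\r\n", $parts[0]);
    $body = $parts[1] ?? '';

    $cookies = [];
    foreach ($headerLines as $line) {
        // Header names are case-insensitive.
        if (stripos($line, 'Set-Cookie:') === 0) {
            // Keep "name=value", drop attributes after the first ';'.
            $value = substr($line, strlen('Set-Cookie:'));
            $cookies[] = trim(explode(';', $value, 2)[0]);
        }
    }

    return [
        'headers' => $headerLines,
        'body'    => $body,
        'cookie'  => implode('; ', $cookies), // ready for CURLOPT_COOKIE
    ];
}

// Simulated response; the cookie name and value are invented.
$raw = "HTTP/1.1 200 OK\r\n"
     . "Content-Type: text/xml\r\n"
     . "Set-Cookie: session=xyz789; HttpOnly; path=/\r\n"
     . "\r\n"
     . "<results>\n<status code=\"ok\"/>\n</results>";

$result = splitResponse($raw);
echo $result['cookie'], "\n"; // session=xyz789
```

For step 2 the value would then be passed as curl_setopt($curl, CURLOPT_COOKIE, $result['cookie']) before calling curl_exec().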

php CURL request using array with key - values

I need to make an API request using cURL. The request is as follows:
{
"source":"0",
"custom":true,
"credit":"2",
"filetype":"2",
"ustate":"jkghlyt",
"service":"1234",
"templateid":"632",
"retryatmpt": 1,
"retryduration": 1,
"custretry": 1,
"refno":true,
"mslist":[{"phone1":"XXXXXXXXXXX","phone2":"XXXXXXXXXX"}]
}
I have the following PHP CURL code:
$phone1 = "XXXXXX";
$phone2 ="XXXXXXX";
//CURL
$url = 'http://api-ip-XXXX';
$param = array('source' => '0',
'custom' => true,
'credit' => '2',
'filetype' => '2',
'ustate' => 'jkghlyt',
'service' => '1234',
'templateid' => '632',
'retryatmpt' => '1',
'retryduration' => '1',
'custretryatmpt' => '1',
'refno' => true,
'mslist' => array(
urlencode($phone1),
urlencode($phone2)
)
);
$url = $url . "?" . http_build_query($param, '&');
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
$result = curl_exec($ch);
curl_close($ch);
I am confused about how to add $phone1 and $phone2; what I have is not correct.
I would appreciate help from experts.
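Since the expected request is a JSON document, one reading is that the API wants it as a JSON POST body rather than as query-string parameters. That is an assumption, as the API itself is not documented here. Under that assumption, mslist becomes a list holding one associative array with the keys phone1 and phone2, and json_encode() produces the nested structure. buildPayload and postJson are hypothetical names; the field values are copied from the question.

```php
<?php
// Hypothetical helper: build the JSON body shown in the question.
function buildPayload(string $phone1, string $phone2): string
{
    $param = [
        'source'        => '0',
        'custom'        => true,
        'credit'        => '2',
        'filetype'      => '2',
        'ustate'        => 'jkghlyt',
        'service'       => '1234',
        'templateid'    => '632',
        'retryatmpt'    => 1,
        'retryduration' => 1,
        'custretry'     => 1,
        'refno'         => true,
        // A list containing one object with two keys.
        'mslist'        => [
            ['phone1' => $phone1, 'phone2' => $phone2],
        ],
    ];
    return json_encode($param);
}

// Hypothetical helper: POST a JSON string (assumes the API accepts JSON bodies).
function postJson(string $url, string $json)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $json);
    curl_setopt($ch, CURLOPT_HTTPHEADER, ['Content-Type: application/json']);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}

$json = buildPayload('XXXXXX', 'XXXXXXX');
echo $json, "\n";
```

The phone numbers are not passed through urlencode() here, because JSON encoding handles the escaping of string values. The request itself would be sent with postJson($url, $json).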

How can i use fsockopen on my site?

I'm showing currency rates on my site with the following code:
<?php
$contents = file_get_contents('http://www.tcmb.gov.tr/kurlar/today.html');
$contents = iconv("windows-1254" , "utf8" , $contents);
$dollar = preg_match('~ABD DOLARI\s+(.+?)\s+(.+?)\s+~i', $contents, $matches) ? array('buying' => $matches[1], 'selling' => $matches[2]) : '';
$euro = preg_match('~EURO\s+(.+?)\s+(.+?)\s+~i', $contents, $matches) ? array('buying' => $matches[1], 'selling' => $matches[2]) : '';
$gbp = preg_match('~İNGİLİZ STERLİNİ\s+(.+?)\s+(.+?)\s+~i', $contents, $matches) ? array('buying' => $matches[1], 'selling' => $matches[2]) : '';
$chf = preg_match('~İSVİÇRE FRANGI\s+(.+?)\s+(.+?)\s+~i', $contents, $matches) ? array('buying' => $matches[1], 'selling' => $matches[2]) : '';
echo '
<table class="form" style="background:#fff;width:300px;margin-left:14px;">
<tr style="border-bottom:1px solid #e4e4e4;">
..
But today my site gives this error:
Warning: file_get_contents() [function.file-get-contents]: http:// wrapper is disabled in the server configuration by allow_url_fopen=0 in eval() (line 2 of /var/www/vhosts/mysite.com/httpdocs/modules/php/php.module(80) : eval()'d code).
I asked my hosting support about this problem and they said:
"Don't use the fopen option, please use 'fsockopen'." But I don't know how to do this.
Please help me. Thanks.
Use curl instead then. A function to replace file_get_contents from a remote server is:
function get_web_page( $url ) {
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "spider", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$header['errno'] = $err;
$header['errmsg'] = $errmsg;
$header['content'] = $content;
//If you want error information, 'return $header;' instead.
return $content;
}
From there change $contents = file_get_contents('http://www.tcmb.gov.tr/kurlar/today.html'); to $contents = get_web_page('http://www.tcmb.gov.tr/kurlar/today.html');
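If the host really only allows fsockopen(), a plain HTTP GET can also be written by hand. The following is a rough sketch, assuming the page is served over plain HTTP on port 80; buildGetRequest and fetchViaSocket are hypothetical names. It sends an HTTP/1.0 request so that the server does not answer with chunked encoding, and it does not follow redirects.

```php
<?php
// Hypothetical helper: build a minimal HTTP/1.0 GET request.
function buildGetRequest(string $host, string $path): string
{
    return "GET {$path} HTTP/1.0\r\n"
         . "Host: {$host}\r\n"
         . "Connection: close\r\n"
         . "\r\n";
}

// Hypothetical helper: fetch a page over a raw socket and return the body,
// or false when the connection fails.
function fetchViaSocket(string $host, string $path, int $port = 80, float $timeout = 30.0)
{
    $fp = fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        return false;
    }
    fwrite($fp, buildGetRequest($host, $path));
    $response = stream_get_contents($fp);
    fclose($fp);

    // The body starts after the first blank line.
    $parts = explode("\r\n\r\n", $response, 2);
    return $parts[1] ?? '';
}

// Only the request text is printed here; no connection is opened.
echo buildGetRequest('www.tcmb.gov.tr', '/kurlar/today.html');
```

In the original script, $contents = fetchViaSocket('www.tcmb.gov.tr', '/kurlar/today.html'); would then take the place of the file_get_contents() call.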

How to get the real URL after file_get_contents if redirection happens?

I'm using file_get_contents() to grab content from a site, and amazingly it works even if the URL I pass as argument redirects to another URL.
The problem is I need to know the new URL, is there a way to do that?
If you need to use file_get_contents() instead of curl, don't follow redirects automatically:
$context = stream_context_create(
array(
'http' => array(
'follow_location' => false
)
)
);
$html = file_get_contents('http://www.example.com/', false, $context);
var_dump($http_response_header);
Answer inspired by: How do I ignore a moved-header with file_get_contents in PHP?
You might make a request with cURL instead of file_get_contents().
Something like this should work...
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, FALSE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$a = curl_exec($ch);
if(preg_match('#Location: (.*)#', $a, $r))
$l = trim($r[1]);
Everything in one function:
function get_web_page( $url ) {
$res = array();
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // do not return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_USERAGENT => "spider", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$res['content'] = $content;
$res['url'] = $header['url'];
return $res;
}
print_r(get_web_page("http://www.example.com/redirectfrom"));
A complete solution using the bare file_get_contents (note the in-out $url parameter):
function get_url_contents_and_final_url(&$url)
{
do
{
$context = stream_context_create(
array(
"http" => array(
"follow_location" => false,
),
)
);
$result = file_get_contents($url, false, $context);
$pattern = "/^Location:\s*(.*)$/i";
$location_headers = preg_grep($pattern, $http_response_header);
if (!empty($location_headers) &&
preg_match($pattern, array_values($location_headers)[0], $matches))
{
$url = $matches[1];
$repeat = true;
}
else
{
$repeat = false;
}
}
while ($repeat);
return $result;
}
Note that this works only with an absolute URL in the Location header. If you need to support relative URLs, see
PHP: How to resolve a relative url.
For example, if you use the solution from the answer by @Joyce Babu, replace:
$url = $matches[1];
with:
$url = getAbsoluteURL($matches[1], $url);
I use get_headers($url, 1).
In my case the redirect URL was in get_headers($url, 1)['Location'][1].
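With get_headers($url, 1), a header that occurs more than once is returned as an array, so Location is a string after one redirect and an array after several. A small sketch that picks the last entry in either case follows; finalLocation is a hypothetical name, and the sample arrays stand in for real get_headers() output.

```php
<?php
// Hypothetical helper: return the last Location value from get_headers($url, 1) output.
function finalLocation(array $headers): ?string
{
    foreach ($headers as $name => $value) {
        // Numeric keys hold status lines; header names may differ in case.
        if (is_string($name) && strcasecmp($name, 'Location') === 0) {
            if (is_array($value)) {
                return $value === [] ? null : (string) end($value);
            }
            return (string) $value;
        }
    }
    return null; // no Location header, so no redirect was reported
}

$oneRedirect = [
    0 => 'HTTP/1.1 301 Moved Permanently',
    'Location' => 'http://www.example.com/a',
    1 => 'HTTP/1.1 200 OK',
];
$twoRedirects = [
    0 => 'HTTP/1.1 301 Moved Permanently',
    'Location' => ['http://www.example.com/a', 'http://www.example.com/b'],
    1 => 'HTTP/1.1 302 Found',
    2 => 'HTTP/1.1 200 OK',
];

echo finalLocation($oneRedirect), "\n";  // http://www.example.com/a
echo finalLocation($twoRedirects), "\n"; // http://www.example.com/b
```

Against a live URL the call would be finalLocation(get_headers($url, 1)).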
