Here's my problem. A few months ago, I wrote a PHP script to log in to my account on a website. I was using cURL to log in and everything was fine. Then the website was updated, and now I can no longer log in. The problem is not with cURL itself, as I do not get any error from it; it is the website that tells me the login failed.
Here's my script:
<?php
require('simple_html_dom.php');
//Getting the website main page
$url = "http://www.kijiji.ca/h-ville-de-quebec/1700124";
$main = file_get_html($url);
$links = $main -> find('a');
//Finding the login page
foreach($links as $link){
if($link -> innertext == "Ouvrir une session"){
$page = $link;
}
}
$to_go = "http://www.kijiji.ca/".$page->href;
//Getting the login page
$main = file_get_html($to_go);
$form = $main -> find("form");
//Parsing the page for the login form
foreach($form as $f){
if($f -> id == "login-form"){
$cform = $f;
}
}
$form = str_get_html($cform);
//Getting my post data ready
$postdata = "";
$tot = count($form->find("input"));
$count = 0;
/*I've got here a foreach loop to find all the inputs in the form. As there are hidden input for security, I make my script look for all the input and get the value of each, and then add them in my post data. When the name of the input is emailOrNickname or password, I enter my own info there, then it gets added to the post data*/
foreach($form -> find("input") as $input){
$count++;
$postdata .= $input -> name;
$postdata .= "=";
if($input->name == "emailOrNickname"){
$postdata.= "my email address ";
}else if($input->name == "password"){
$postdata.= "my password";
}else{
$postdata .= $input -> value;
}
if($count<$tot){
$postdata .= "&";
}
}
//Getting my curl session
$ch = curl_init();
curl_setopt_array($ch, array(
CURLOPT_URL => $to_go,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_POST => true,
CURLOPT_POSTFIELDS => $postdata,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_COOKIESESSION => true,
CURLOPT_COOKIEJAR => 'cookie.txt'
));
$result = curl_exec ($ch);
curl_close ($ch);
echo $result;
?>
Neither cURL nor PHP returns any error. In fact, I get a web page back from the site, but that page tells me an error occurred, as if some POST data were missing.
What do you think could cause that? Could it be some missing curl_setopt options? I have no idea; do you?
$form = $main->find("form") matches every form on the page,
and the first one is <form id="SearchForm" action="/b-search.html">, not the login form.
Select the login form directly by its id instead: $form = $main->find('#login-form', 0)
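Once the login form is selected, the POST body still has to be URL-encoded (an email address contains "@", and a password may contain "&"). A minimal sketch of one way to do that with http_build_query; the helper name build_login_postdata and the field name "token" are made up for illustration, while emailOrNickname and password come from the question:

```php
<?php
// Hypothetical helper: $inputs is a list of ['name' => ..., 'value' => ...] pairs
// (for example collected from the form's <input> elements);
// $overrides holds the values you want to send yourself.
function build_login_postdata(array $inputs, array $overrides): string
{
    $fields = [];
    foreach ($inputs as $input) {
        if ($input['name'] === '') {
            continue; // inputs without a name are not submitted by browsers
        }
        $fields[$input['name']] = $overrides[$input['name']] ?? $input['value'];
    }
    // http_build_query URL-encodes names and values and joins them with "&"
    return http_build_query($fields);
}

$postdata = build_login_postdata(
    [
        ['name' => 'token', 'value' => 'a b'],           // hypothetical hidden field
        ['name' => 'emailOrNickname', 'value' => ''],
        ['name' => 'password', 'value' => ''],
    ],
    ['emailOrNickname' => 'me@example.com', 'password' => 'p&ss']
);
echo $postdata, "\n";
```

This also removes the need for the manual "&" counter in the original loop.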
Most likely the problem is that the site (server) checks cookies. This process mainly consists of two phases:
1) When you visit the site for the first time on some page, e.g. the login page, the server sets cookies with some data.
2) On each subsequent page visit or POST request, the server checks the cookies it has set.
So you have to reproduce this process in your script, which means you have to use cURL to fetch every page from the site, including the login page, which should be fetched with cURL rather than file_get_html.
Furthermore, you have to set both CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE to the same absolute path ('cookies.txt' is a relative path) on each request. This is necessary to enable automatic cookie handling (session maintenance) across the entire series of requests (including redirects) the script performs.
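The two phases could be wired up roughly as follows. This is only a sketch: the helper session_curl_options and the URL https://example.com/login are placeholders, not something taken from the site in question, and the network calls are left commented out.

```php
<?php
// Hypothetical helper: builds cURL options that share one cookie file across requests.
function session_curl_options(string $url, string $cookieFile, ?string $postData = null): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are written here when the handle is closed
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies are read from here for the request
    ];
    if ($postData !== null) {
        $options[CURLOPT_POST]       = true;
        $options[CURLOPT_POSTFIELDS] = $postData;
    }
    return $options;
}

$cookieFile = __DIR__ . '/cookies.txt';      // absolute path
$loginUrl   = 'https://example.com/login';   // placeholder URL

$getOptions  = session_curl_options($loginUrl, $cookieFile);
$postOptions = session_curl_options($loginUrl, $cookieFile, 'emailOrNickname=me%40example.com');

// Phase 1: fetch the login page so the server can set its cookies.
// $ch = curl_init(); curl_setopt_array($ch, $getOptions); $html = curl_exec($ch); curl_close($ch);
// Phase 2: parse $html for the hidden inputs, then POST with the same cookie file.
// $ch = curl_init(); curl_setopt_array($ch, $postOptions); $result = curl_exec($ch); curl_close($ch);
```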
Context
I have the following POST pipeline:
index.php -> submit.php -> list/item/new/index.php
index.php has a normal form with an action="submit.php" property.
submit.php decides where to send the following post request by some logic based on the POST variable content.
The problem is that I haven't found a successful way to debug this pipeline. Somewhere, something is failing and I would appreciate a fresh pair of eyes.
What I have tried
I have tried running list/item/new/index.php with dummy parameters through a regular GET request. DB updates successfully.
I have tried running submit.php (below) with dummy parameters through a regular GET request. The value of $result is not FALSE, indicating the file_get_contents request was successful, but its value is the literal content of list/new/index.php instead of the generated content, which I expect to be the result of
echo $db->new($hash,$content) && $db->update_content_key($hash);
Here is submit.php
$url = 'list/new/index.php';
if($test){
$content = $_GET["i"];
$hash = $_GET["h"];
}else{
$content = $_POST["item"]["content"];
$hash = $_POST["list"]["hash"];
}
$data = array(
'item'=>array('content' => $content),
'list'=>array('hash' => $hash)
);
$post_content = http_build_query($data);
$options = array(
'http' => array(
'header' => "Content-type: application/x-www-form-urlencoded\r\n".
"Content-Length: " . strlen($post_content) . "\r\n",
'method' => 'POST',
'content' => $post_content
)
);
$context = stream_context_create($options);
$result = file_get_contents($url, false, $context);
if ($result === FALSE) {
echo "error";
//commenting out for testing. should go back to index.php when it's done
//header('Location: '.$root_folder_path.'list/?h='.$hash.'&f='.$result);
}
else{
var_dump($result);
//commenting out for testing. should go back to index.php when it's done
//header('Location: '.$root_folder_path.'list/?h='.$hash.'&f='.str_replace($root_folder_path,"\n",""));
}
And here is list/item/new/index.php
$db = new sql($root_folder_path."connection_details.php");
if($test){
$content = $_GET["i"];
$hash = $_GET["h"];
}else{
$content = $_POST["item"]["content"];
$hash = $_POST["list"]["hash"];
}
// insert into DB, use preformatted queries to prevent sqlinjection
echo $db->new($hash,$content) && $db->update_content_key($hash);
The worst thing about this is that I don't know enough PHP to debug this effectively (I actually had it working at some point today, but I did not commit right then...).
All comments and suggestions are welcome. I appreciate your time.
Got it.
I'm not sure what to call the error I was making, but it was the following: in the POST request I was using a relative path,
$url='list/item/new/index.php'
Given a path without a scheme, file_get_contents reads the file from the local filesystem, which is why I got the literal source back instead of the generated output. Using the full URL fixed it:
$url = 'https://example.com/list/item/new/index.php';
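The difference can be reproduced locally. The sketch below uses a temporary file as a stand-in for list/item/new/index.php; is_http_url is a hypothetical helper added only for illustration.

```php
<?php
// Write a tiny PHP script to a temporary file (stand-in for the target script).
$script = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($script, '<?php echo 1 + 1;');

// A filesystem path is read as a plain file: the source comes back and nothing
// is executed. The 'http' options of a stream context are not used in this case.
$raw = file_get_contents($script);
echo $raw, "\n";

// Only a URL with an http(s) scheme goes through the HTTP wrapper,
// where the web server runs the script and returns its output.
function is_http_url(string $url): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    return in_array($scheme, ['http', 'https'], true);
}

var_dump(is_http_url('list/item/new/index.php'));
var_dump(is_http_url('https://example.com/list/item/new/index.php'));

unlink($script);
```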
I am trying to scrape torrentz2.eu search results using old (dead) torrentz.eu scraper code.
When I run http://localhost/jits/torz/api.php?key=kabali
it shows a notice and a null value:
Notice: Undefined variable: results_urls in /Applications/XAMPP/xamppfiles/htdocs/jits/torz/api.php on line 59
null
Why?
Can anybody tell me what's wrong with the code?
Here is the code:
<?php
$t= $_GET['key'];
// Defining the basic cURL function
function curl($url) {
// Assigning cURL options to an array
$options = Array(
CURLOPT_RETURNTRANSFER => TRUE, // Setting cURL's option to return the webpage data
CURLOPT_FOLLOWLOCATION => TRUE, // Setting cURL to follow 'location' HTTP headers
CURLOPT_AUTOREFERER => TRUE, // Automatically set the referer where following 'location' HTTP headers
CURLOPT_CONNECTTIMEOUT => 120, // Setting the amount of time (in seconds) before the request times out
CURLOPT_TIMEOUT => 120, // Setting the maximum amount of time for cURL to execute queries
CURLOPT_MAXREDIRS => 10, // Setting the maximum number of redirections to follow
CURLOPT_USERAGENT => "Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0", // Setting the useragent
CURLOPT_URL => $url, // Setting cURL's URL option with the $url variable passed into the function
);
$ch = curl_init(); // Initialising cURL
curl_setopt_array($ch, $options); // Setting cURL's options using the previously assigned array data in $options
$data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
curl_close($ch); // Closing cURL
return $data; // Returning the data from the function
}
?>
<?php
// Defining the basic scraping function
function scrape_between($data, $start, $end){
$data = stristr($data, $start); // Stripping all data from before $start
$data = substr($data, strlen($start)); // Stripping $start
$stop = stripos($data, $end); // Getting the position of the $end of the data to scrape
$data = substr($data, 0, $stop); // Stripping all data from after and including the $end of the data to scrape
return $data; // Returning the scraped data from the function
}
?>
<?php
$url = "https://torrentz2.eu/search?f=$t"; // Assigning the URL we want to scrape to the variable $url
$results_page = curl($url); // Downloading the results page using our curl() funtion
//var_dump($results_page);
//die();
$results_page = scrape_between($results_page, "<dl><dt>", "<a href=\"http://www.viewme.com/search?q=$t\" title=\"Web search results on ViewMe\">"); // Scraping out only the middle section of the results page that contains our results
$separate_results = explode("</dd></dl>", $results_page); // Expploding the results into separate parts into an array
// For each separate result, scrape the URL
foreach ($separate_results as $separate_result) {
if ($separate_result != "") {
$results_urls[] = scrape_between($separate_result, "\">", "<b>"); // Scraping the page ID number and appending to the IMDb URL - Adding this URL to our URL array
}
}
//print_r($results_urls); // Printing out our array of URLs we've just scraped
if($_GET["key"] === null) {
echo "Keyword Missing ";
} else if(isset($_GET["key"])) {
echo json_encode($results_urls);
}
?>
For reference, the old torrentz.eu scraper code: GIT repo
First, you get the notice "Undefined variable: results_urls" because $results_urls is only ever assigned inside the loop. When nothing is appended, it is never defined before json_encode uses it. Define it first and then use it.
Do something like this:
// $results_urls defined here:-
$results_urls = [];
// For each separate result, scrape the URL
foreach ($separate_results as $separate_result) {
if ($separate_result != "") {
$results_urls[] = scrape_between($separate_result, "\">", "<b>"); // Scraping the page ID number and appending to the IMDb URL - Adding this URL to our URL array
}
}
Secondly, null is printed because $results_urls is never populated: $separate_results is not being filled correctly and contains just one empty value.
I debugged further and found that $results_page is false, so whatever you are trying to do in scrape_between is not working as expected. Fix your function.
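One way to make that failure visible is a variant of scrape_between that returns null when a marker is missing, so the caller can tell "page layout changed" apart from "empty result". This is a sketch, not the original author's function; the name scrape_between_or_null is made up here.

```php
<?php
// Variant of scrape_between() that reports a missing marker with null.
function scrape_between_or_null(string $data, string $start, string $end): ?string
{
    $from = stripos($data, $start);
    if ($from === false) {
        return null; // start marker not found
    }
    $from += strlen($start);
    $to = stripos($data, $end, $from);
    if ($to === false) {
        return null; // end marker not found
    }
    return substr($data, $from, $to - $from);
}

$found   = scrape_between_or_null('<dl><dt>result</dt></dl>', '<dl><dt>', '</dt>');
$missing = scrape_between_or_null('<p>layout changed</p>', '<dl><dt>', '</dt>');

var_dump($found);   // string(6) "result"
var_dump($missing); // NULL
```

If the call on the real results page returns null, the start or end marker no longer exists in the torrentz2.eu markup and has to be updated.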
I have two sites, a.com and b.com. I am passing data from a.com to b.com using cURL, and that works, but I want to make it more secure so that b.com only responds after verifying that the POST came from a.com. How can I achieve this?
Code on a.com
<?php
$some_data = array(
'message' => 'Hello World',
'name' => 'Chad'
);
$curl = curl_init();
// You can also set the URL you want to communicate with by doing this:
// $curl = curl_init('http://localhost/echoservice');
// We POST the data
curl_setopt($curl, CURLOPT_POST, 1);
// Set the url path we want to call
curl_setopt($curl, CURLOPT_URL, 'http://localhost/b.com');
// Make it so the data coming back is put into a string
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
// Insert the data
curl_setopt($curl, CURLOPT_POSTFIELDS, $some_data);
// You can also bunch the above commands into an array if you choose using: curl_setopt_array
// Send the request
$result = curl_exec($curl);
// Free up the resources $curl is using
curl_close($curl);
echo $result;
?>
Code on b.com
<?php
// I want to check here that the request came from a.com; if so, do the rest of the work
echo 'Your message was: ' . $_REQUEST["message"] . ' and your name is: ' . $_REQUEST["name"];
?>
You could check $_SERVER['HTTP_REFERER'], but it is very unreliable and unsafe: the header is sent by the client and can be omitted or forged.
A better approach would be to set up the B site with Basic Auth, or something similar, that you can authenticate against when you make the request from site A. Then you can add basic auth to your curl request from A to B. B checks the authentication, and if correct proceeds with the rest of the processing.
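A minimal sketch of the Basic Auth idea, assuming PHP itself checks the credentials on B (the web server could do it instead). The function name is_authorized and the credentials "site-a" / "secret" are placeholders. Basic Auth only base64-encodes the credentials, so this should run over HTTPS.

```php
<?php
// Site B: hypothetical check of the Basic Auth credentials PHP exposes in $_SERVER.
function is_authorized(array $server, string $expectedUser, string $expectedPass): bool
{
    $user = $server['PHP_AUTH_USER'] ?? '';
    $pass = $server['PHP_AUTH_PW'] ?? '';
    // hash_equals compares the strings in constant time
    return hash_equals($expectedUser, $user) && hash_equals($expectedPass, $pass);
}

// Site A would send the credentials with the request, for example:
// curl_setopt($curl, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
// curl_setopt($curl, CURLOPT_USERPWD, 'site-a:secret');

$ok  = is_authorized(['PHP_AUTH_USER' => 'site-a', 'PHP_AUTH_PW' => 'secret'], 'site-a', 'secret');
$bad = is_authorized([], 'site-a', 'secret');

var_dump($ok);  // bool(true)
var_dump($bad); // bool(false)
```

On B the real call would be is_authorized($_SERVER, ...), answering with a 401 status when it returns false.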
$_SERVER['REMOTE_ADDR'] would be the solution:
if($_SERVER['REMOTE_ADDR']=="IP OF A.com"){
//exec code
}else{
//for example, log the rejected attempt
error_log($_SERVER['REMOTE_ADDR'] . ' tried to access b.com at ' . date('c'));
}
The simplest way to achieve this would be to create a key that both a.com and b.com know.
Then you can pass the key from one server to the other via cURL, and as long as no one else knows the key, they won't be able to access it (assuming you program it that way).
This is how most APIs work, such as Facebook, Twitter, LinkedIn, etc.
Your POST data would then look like this, for example (a.com):
$some_data = array(
'message' => 'Hello World',
'name' => 'Chad',
'key' => '4h9rj8wj49tj0wgj0ejwrkw0jt0ekv0ijspxodxk9rje0rg9tskvep9rrgt9wkrgte'
);
Then on b.com you would just do this:
if(!isset($_POST['key']) || $_POST['key'] != '4h9rj8wj49tj0wgj0ejwrkw0jt0ekv0ijspxodxk9rje0rg9tskvep9rrgt9wkrgte'){
die("Invalid Key");
}
You can use a public/private pair system. A simple version would be like this:
//a.com
$keys = array(
'publicKey1' => 'privateKey1',
'publicKey2' => 'privateKey2',
//...
'ksjdlfksjdlf' => '989384kjd90903#kjskdjdsd'
);
$publicKeys = array_keys($keys);
//get a random key from pool (the last valid index is count - 1)
$publicKey = $publicKeys[rand(0, count($publicKeys) - 1)];
$privateKey = $keys[$publicKey];
//your data...
$some_data = array(
'message' => 'Hello World',
'name' => 'Chad'
);
/*generate a verification code from data...*/
//add public key to data
$some_data['key'] = $publicKey;
//sort data by key (to always generate same verification code regardless of params order)
ksort($some_data);
//generate code with your private key
$verificationKey = sha1($privateKey . http_build_query($some_data) . $privateKey);
//add verification code to sent data
$some_data['verification_code'] = $verificationKey;
//send data
curl_exec(...);
and on b.com:
$keys = "same keys that exist on a.com";
if (!isset($_POST['key']) || !isset($_POST['verification_code']) || !isset($keys[$_POST['key']])) {
//do something to handle invalid request
}
$verificationKey = $_POST['verification_code'];
$privateKey = $keys[$_POST['key']];
//remove verification code from data
unset($_POST['verification_code']);
//sort data by key
ksort($_POST);
$checkKey = sha1($privateKey . http_build_query($_POST) . $privateKey);
//validate key (hash_equals compares in constant time)
if (!hash_equals($checkKey, $verificationKey)) {
//handle invalid data
}
//verified. do something with $_POST
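The same signing idea can be written with PHP's built-in HMAC support instead of concatenating the key around the data. This swaps in hash_hmac with SHA-256 for the sha1 construction above; the function names and the secret are illustrative only.

```php
<?php
// Sender (a.com): sign the sorted payload with a shared secret.
function sign_payload(array $data, string $secret): string
{
    ksort($data); // same order on both sides, regardless of how the array was built
    return hash_hmac('sha256', http_build_query($data), $secret);
}

// Receiver (b.com): recompute the signature and compare in constant time.
function verify_payload(array $post, string $secret): bool
{
    if (!isset($post['verification_code']) || !is_string($post['verification_code'])) {
        return false;
    }
    $received = $post['verification_code'];
    unset($post['verification_code']);
    return hash_equals(sign_payload($post, $secret), $received);
}

$secret = 'hypothetical-shared-secret';
$data = ['message' => 'Hello World', 'name' => 'Chad'];
$data['verification_code'] = sign_payload($data, $secret);

$valid = verify_payload($data, $secret);
$data['name'] = 'Mallory';               // tamper with the payload
$tampered = verify_payload($data, $secret);

var_dump($valid);    // bool(true)
var_dump($tampered); // bool(false)
```

On b.com the call would be verify_payload($_POST, $secret). Note that a captured request can still be replayed unchanged; adding a timestamp to the signed data is one common way to limit that.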
I wrote some code in PHP.
I want to get the final redirect address on this site:
fluege.de
I am sending this:
$dep= "sFlightInput[accDep]=ZRH";
$arr= "sFlightInput[accArr]=VIE";
$depregion= "sFlightInput[accDepRegion]=";
$arrregion= "sFlightInput[accArrRegion]=";
$multidep= "sFlightInput[accMultiAirportDep]=ZRH";
$multiarr= "sFlightInput[accMultiAirportArr]=ZRH";
$ftype = "sFlightInput[flightType]=RT";
$depcity = "sFlightInput[depCity]=Zürich+-+Flughafen+(ZRH)+-+Schweiz";
$arrcity = "sFlightInput[arrCity]=Wien+-+Internationaler+Flughafen+(VIE)+-+Österreich";
$sdate = "sFlightInput[departureDate]=29.03.2014";
$srange = "sFlightInput[departureTimeRange]=2";
$rdate ="sFlightInput[returnDate]=05.04.2014";
$rrange = "sFlightInput[returnTimeRange]=2";
$adt = "sFlightInput[paxAdt]=1";
$chd ="sFlightInput[paxChd]=0";
$inf = "sFlightInput[paxInf]=0";
$cabin = "sFlightInput[cabinClass]=Y";
$airline = "sFlightInput[depAirline]=";
$send = $dep.$arr.$depregion.$arrregion.$multidep.$multiarr.$ftype.$depcity.$arrcity.$sdate.$srange.$rdate.$rrange.$adt.$chd.$inf.$cabin.$airline;
I am using this:
echo getLastEffectiveUrl("http://www.fluege.de/flight/wait/".$send);
And here is the function:
function getLastEffectiveUrl($url)
{
// initialize cURL
$curl = curl_init($url);
curl_setopt_array($curl, array(
CURLOPT_RETURNTRANSFER => true,
CURLOPT_FOLLOWLOCATION => true,
));
// execute the request
$result = curl_exec($curl);
// fail if the request was not successful
if ($result === false) {
curl_close($curl);
return null;
}
// extract the target url
$redirectUrl = curl_getinfo($curl, CURLINFO_EFFECTIVE_URL);
curl_close($curl);
return $redirectUrl;
}
The code gives this URL:
www.fluege.de/wait/?accDep=&accArr=&accDepRegion=&accArrRegion=&accMultiAirportDep=&accMultiAirportArr=&flightType=RT&depCity=Z%FCrich+-+Flughafen+%28ZRH%29+-+Schweiz&arrCity=Wien+-+Internationaler+Flughafen+%28VIE%29+-+%D6sterreich&departureDate=04.04.2014&departureTimeRange=2&returnDate=20.04.2014&returnTimeRange=2&paxAdt=1&paxChd=0&paxInf=0&cabinClass=Y&depAirline=
But I need:
http://www.fluege.de/flight/encodes/sFlightInput/5f8ccad612bafb69e7693f04cfaf1458/ (etc)
The code you provided does not handle cookies, so if the site you are querying requires them, your code won't work.
I checked http://php.net/manual/en/function.curl-setopt.php. cURL only handles cookies once its cookie engine is enabled; adding the following line to the options array passed to curl_setopt_array enables it and keeps the cookies in a temporary file:
CURLOPT_COOKIEJAR => tempnam(sys_get_temp_dir(), 'cookiejar'),
However, I did not get your specific case to work. I noticed that the URL you create does not contain a question mark, and that the URL that your script creates does not redirect at all; it returns with 200 OK. I checked this using the following shell command:
curl -LI 'http://www.fluege.de/flight/wait/sFlightInput\[accDep\]=ZRHsFlightInput\[accArr\]=VIEsFlightInput\[accDepRegion\]=sFlightInput\[accArrRegion\]=sFlightInput\[accMultiAirportDep\]=ZRHsFlightInput\[accMultiAirportArr\]=ZRHsFlightInput\[flightType\]=RTsFlightInput\[depCity\]=Zürich+-+Flughafen+(ZRH)+-+SchweizsFlightInput\[arrCity\]=Wien+-+Internationaler+Flughafen+(VIE)+-+ÖsterreichsFlightInput\[departureDate\]=29.03.2014sFlightInput\[departureTimeRange\]=2sFlightInput\[returnDate\]=05.04.2014sFlightInput\[returnTimeRange\]=2sFlightInput\[paxAdt\]=1sFlightInput\[paxChd\]=0sFlightInput\[paxInf\]=0sFlightInput\[cabinClass\]=YsFlightInput\[depAirline\]='
If it's unclear what the URL should look like, you should contact fluege.de to ask them how to use their API.
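Independent of what fluege.de actually expects, the hand-concatenated string in the question has no "&" separators and no encoding. A sketch of building the same parameters with http_build_query (only a few of the question's fields are shown); placing them after a "?" on the /flight/wait/ path is an assumption, not a documented fluege.de format.

```php
<?php
// Search parameters from the question, as a nested array instead of hand-built strings.
$params = [
    'sFlightInput' => [
        'accDep'        => 'ZRH',
        'accArr'        => 'VIE',
        'flightType'    => 'RT',
        'departureDate' => '29.03.2014',
        'returnDate'    => '05.04.2014',
        'paxAdt'        => 1,
    ],
];

// http_build_query adds the "&" separators and percent-encodes the brackets
// and any special characters in the values.
$query = http_build_query($params);

// Assumption: the endpoint takes the parameters as a query string after "?".
$url = 'http://www.fluege.de/flight/wait/?' . $query;
echo $url, "\n";
```

The resulting $url can then be passed to getLastEffectiveUrl() as before.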
I'm testing something. I would like to redirect my website (http://mywebsite.com) to a landing page (http://landingpage.com) ONLY when a GET value is passed, for example id=23. When it exists, the page redirects.
My question: is there any script with which I can capture the link I was redirected from? I want the full link, like http://mywebsite.com/index.php?id=23
Is there any way to capture that?
Also, is it possible to do the same with POST and capture the POST value on the landing page?
FOR GET:
You can simply attach $_SERVER['QUERY_STRING'] to the landingpage:
<?php
$get = !empty($_SERVER['QUERY_STRING']) ? "?".$_SERVER['QUERY_STRING'] : "";
$url = 'http://landingpage.com'.$get;
header("Location: $url");
exit; // stop the script so nothing else runs after the redirect header
?>
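To let the landing page see the full link the visitor came from, the original URL can be passed along as an extra parameter. A sketch under stated assumptions: the helper build_landing_url and the parameter name "from" are made up for illustration, and the source site is assumed to be served over plain http.

```php
<?php
// Hypothetical helper: builds the landing-page URL and appends the full original
// URL in a "from" parameter (the parameter name is an assumption).
function build_landing_url(string $landingBase, array $server): string
{
    $query  = $server['QUERY_STRING'] ?? '';
    $source = 'http://' . ($server['HTTP_HOST'] ?? 'localhost') . ($server['REQUEST_URI'] ?? '/');
    $extra  = http_build_query(['from' => $source]); // percent-encodes the original URL
    return $landingBase . '?' . ($query !== '' ? $query . '&' : '') . $extra;
}

$url = build_landing_url('http://landingpage.com', [
    'HTTP_HOST'    => 'mywebsite.com',
    'REQUEST_URI'  => '/index.php?id=23',
    'QUERY_STRING' => 'id=23',
]);
echo $url, "\n";

// In the real script: header("Location: " . build_landing_url('http://landingpage.com', $_SERVER)); exit;
// On the landing page, $_GET['from'] then holds http://mywebsite.com/index.php?id=23
```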
FOR POST: use CURL
if($_POST){
$ch = curl_init();
$curlConfig = array(
CURLOPT_URL => "http://landingpage.com",
CURLOPT_POST => true,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_POSTFIELDS => $_POST,
);
curl_setopt_array($ch, $curlConfig);
$result = curl_exec($ch);
curl_close($ch);
}
Why not just store the data in a session variable? All you have to do is start the session again on the redirected page, and retrieve the session variables there.
Tutorial if you're not already aware of it: http://www.tizag.com/phpT/phpsessions.php
For example, on mywebsite.com:
session_start();
if (isset($_REQUEST['id'])) {
$_SESSION['id'] = $_REQUEST['id'];
$_SESSION['url'] = "http://" . $_SERVER['SERVER_NAME'] . $_SERVER['PHP_SELF'];
}
On landingpage.com:
session_start();
$url = $_SESSION['url'] . '?id=' . $_SESSION['id'];
echo $url;
That said, I'm pretty sure this won't work if the pages are hosted on two different servers, so I may be misunderstanding the question. If it's between servers, you could use a cookie instead.