Good evening!
I have a PHP script which makes a cURL call to a remote host's login page.
After logging in and keeping the session via the CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE options, I use the same cURL handle to log in on the immediately following page, which requires an upload.
When that's done, I have the full session and I can call any page I want on the site, but only inside cURL!
The idea is that this script, which uses cURL, finally needs to redirect the browser to one of those pages on the remote host using the cURL session. But this is not possible, because you cannot show the results of a cURL request as a redirected page.
So I've tried a lot of options. None of them works.
Schema:
PHP script on a local server.
Call to domain.com/loggin.php (creates the cURL handle $ch).
Keep the cURL session in the cookie.txt file.
Call to domain.com/loggin_2.php with the same $ch (the previous one, not closed).
Fully logged in on the remote site.
Back in the PHP script: I need to redirect to domain.com/index.php, which needs the session variables filled in by the full login process.
What to do, then? Here is what I tried:
1) After fully logging in, read the cookie.txt file to get the PHPSESSID.
Then tried setcookie(), or header("Set-Cookie: ..."), immediately followed by header("Location: domain.com/index.php").
Doesn't work.
2) Tried the same thing via an AJAX call and finally document.cookie = ...
Doesn't work.
3) Added a third cURL call to a file on my remote host which prints a JSON-encoded $_SESSION.
Fetched it in my PHP script, decoded it, and loaded it into my local session with a foreach over every array value (foreach()...$_SESSION[$c] = $v).
Added a session_start() before this foreach and, immediately after it, a header("Location: domain.com/index.php").
Doesn't work.
4) Added a session_write_close() before the header("Location: domain.com/index.php").
Doesn't work.
So I don't really know how to use the cURL session.
I took note of the session ID that cURL generated and set it as the PHPSESSID manually with the Web Developer Firefox add-on, and it works perfectly. So it should be possible to do this from my PHP script! But I can't!
Give me a hand, please!
Thanks!
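Reading the cURL-generated session ID out of the cookie jar (the first step of attempt 1, and the value used in the manual Firefox test) can be sketched as below. It assumes the jar is in the Netscape cookie-file format cURL normally writes (seven tab-separated fields: domain, include-subdomains flag, path, secure flag, expiry, name, value); the function name cookieFromJar is hypothetical. Note that this only recovers the value: browsers ignore a Set-Cookie for domain.com sent by a different host, which is a likely reason attempts 1 and 2 have no effect on the remote site.

```php
<?php
// Hypothetical helper: return the value of cookie $name from the contents of a
// cURL cookie jar in Netscape format, or null if it is not present.
function cookieFromJar(string $jarContents, string $name): ?string
{
    foreach (preg_split('/\r?\n/', $jarContents) as $line) {
        // cURL prefixes HttpOnly cookies with "#HttpOnly_"; drop the prefix so
        // those lines are not mistaken for comments.
        if (strncmp($line, '#HttpOnly_', 10) === 0) {
            $line = substr($line, 10);
        }
        // Skip blank lines and ordinary comment lines.
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        // Fields: domain, subdomains flag, path, secure, expiry, name, value
        $fields = explode("\t", $line);
        if (count($fields) === 7 && $fields[5] === $name) {
            return $fields[6];
        }
    }
    return null;
}

// Usage with the jar from the question:
// $sid = cookieFromJar(file_get_contents('cookie.txt'), 'PHPSESSID');
```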
I may have gotten lost a bit, but I think I understand.
You can use
CURLOPT_HEADER for debugging (the output will include the response headers, so you can see which page you were redirected to)
and CURLOPT_FOLLOWLOCATION, like so:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://domain.com/login.php');
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
I also use
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
so that curl_exec() returns the response as a string instead of printing it, which is much more useful for debugging or parsing.
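With CURLOPT_HEADER and CURLOPT_FOLLOWLOCATION both enabled, the string curl_exec() returns holds one header block per hop, followed by the final body. A minimal sketch for pulling the redirect chain out of that string is below; redirectChain is a hypothetical name, and it assumes CRLF header line endings. If you only need the final address, curl_getinfo($ch, CURLINFO_EFFECTIVE_URL) gives it directly.

```php
<?php
// Hypothetical helper: list the Location headers from a response fetched with
// CURLOPT_HEADER and CURLOPT_FOLLOWLOCATION, where each redirect hop adds its
// own header block ahead of the final body.
function redirectChain(string $response): array
{
    $locations = [];
    $rest = $response;
    // Every header block starts with a status line such as "HTTP/1.1 302 Found".
    while (strncmp($rest, 'HTTP/', 5) === 0) {
        // Split off one header block; a blank line ends it.
        $parts = explode("\r\n\r\n", $rest, 2);
        foreach (explode("\r\n", $parts[0]) as $line) {
            if (stripos($line, 'Location:') === 0) {
                $locations[] = trim(substr($line, strlen('Location:')));
            }
        }
        $rest = $parts[1] ?? '';
    }
    return $locations;
}
```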
Related
I am trying to get the content of a page. I use the Google Chrome DevTools Network tab and "Copy as cURL", which gives me:
curl 'http://www.example.com/default.aspx/GetAnnonces' -H 'Cookie: ASP.NET_SessionId=eolrcogrk1owhmpbsogwd0mf; EPC_alerte=;'
This works fine for a while, I guess because of the session lifetime.
My question is:
Where does the session ID "eolrcogrk1owhmpbsogwd0mf" come from, and how can I generate it so I can access the page at any time?
It comes from the Set-Cookie HTTP response header of the page you visited. If you're using cURL in PHP, it can handle cookies for you once its cookie engine is enabled (setting CURLOPT_COOKIEFILE, even to an empty string, does this), and you can set CURLOPT_COOKIEJAR with curl_setopt to retain the cookies in a file after the request is complete.
If you just want to see the response headers, you can also use curl_setopt($handle, CURLOPT_HEADER, true) and look at the Set-Cookie response headers. For most typical use cases, though, there's no practical reason to do this, since cURL will handle the cookies for you just like your browser would.
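If you do want to inspect them, a small sketch that collects name/value pairs from the Set-Cookie lines of a raw header block (for example the header part of a response fetched with CURLOPT_HEADER) follows. The function name parseSetCookies is hypothetical, and it deliberately ignores cookie attributes such as path and expiry.

```php
<?php
// Hypothetical helper: map cookie names to values from the Set-Cookie lines in
// a block of raw response headers. Attributes (path, expires, ...) are ignored.
function parseSetCookies(string $headers): array
{
    $cookies = [];
    // Capture the "name=value" part that precedes the first ";" on each line.
    preg_match_all('/^Set-Cookie:\s*([^;\r\n]*)/mi', $headers, $matches);
    foreach ($matches[1] as $pair) {
        // Split only on the first "=", since values may themselves contain "=".
        $parts = explode('=', $pair, 2);
        if (count($parts) === 2) {
            $cookies[trim($parts[0])] = trim($parts[1]);
        }
    }
    return $cookies;
}
```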
As I understand it, cURL uses the CURLOPT_COOKIEFILE option to read cookies and CURLOPT_COOKIEJAR to save them once the cURL session is completed.
Typical examples of this use a file, but I don't want to have to clean up the leftovers manually.
For example, if I set the cookie jar to a file created with tempnam, I will inevitably end up with a directory full of little cookie jars that I will need to clean up.
If the user properly logs out, I can of course delete this temp file, but I expect most users to just close the browser window and let the session (eventually) expire, leaving me with no way to delete the cookie jar automatically.
My best idea so far is to write a cookie jar into the temp folder, read it into a session variable, and then delete the cookie jar file every time cURL is used.
Other people's implementations avoid the cookie jar by parsing the header information, but that is a little more involved than I want to get.
I decided to go with the temp-file workaround. Assuming your cURL handle is named $c:
//Put down the cookieJar
$cookieJar = tempnam(sys_get_temp_dir(),"cookie-");
if (isset($_SESSION['c_Cookies'])) file_put_contents($cookieJar, $_SESSION['c_Cookies']);
curl_setopt($c, CURLOPT_COOKIEJAR, $cookieJar);
curl_setopt($c, CURLOPT_COOKIEFILE, $cookieJar);
and at the end of the script:
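//Destroy the handle first: cURL writes the cookie jar when the handle is freed
//(curl_close() does this before PHP 8; from PHP 8 on, the unset() does)
curl_close ($c);
unset($c);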
//And pickup the cookieJar
$_SESSION['c_Cookies'] = file_get_contents($cookieJar);
unlink($cookieJar);
This of course assumes the system temp directory is writable by the user PHP runs as. It should ensure that the cookie jar is always deleted at the end of the script, as long as the script does not terminate prematurely.
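The same round trip can be wrapped so the temp file is removed even if an exception is thrown between the two halves. This is a sketch, not the answer's code: withCookieJar is a hypothetical name, the session array is passed by reference (pass $_SESSION in practice), and the callback is expected to create, use, and release its own cURL handle so the jar is written before the wrapper reads it back. A finally block does not run on exit() or a fatal error, so those cases can still leave a file behind.

```php
<?php
// Hypothetical helper: restore the cookie jar from $session into a temp file,
// run $useJar with that file's path, then save the jar back and delete it.
function withCookieJar(array &$session, callable $useJar)
{
    $cookieJar = tempnam(sys_get_temp_dir(), 'cookie-');
    if (isset($session['c_Cookies'])) {
        file_put_contents($cookieJar, $session['c_Cookies']);
    }
    try {
        // The callback should let its cURL handle be freed before it returns,
        // because that is when cURL writes CURLOPT_COOKIEJAR.
        return $useJar($cookieJar);
    } finally {
        // Runs on a normal return and also when $useJar throws.
        $session['c_Cookies'] = file_get_contents($cookieJar);
        unlink($cookieJar);
    }
}

// Usage (network request shown only as a comment):
// $page = withCookieJar($_SESSION, function (string $jar) {
//     $c = curl_init('http://domain.com/index.php');
//     curl_setopt($c, CURLOPT_COOKIEJAR, $jar);
//     curl_setopt($c, CURLOPT_COOKIEFILE, $jar);
//     curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
//     return curl_exec($c); // $c is released when this closure returns
// });
```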
Instead of parsing them, you could just pass the headers through between your clients and the other server. Just remember to add a regex replace for the "domain=[^;]+" part of Set-Cookie headers.
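A minimal sketch of that replacement, applied to each header line as it arrives through CURLOPT_HEADERFUNCTION, is below. Dropping the Domain attribute lets the browser scope the cookie to your own host instead of rejecting it. Both function names are hypothetical, and only Set-Cookie lines are forwarded here.

```php
<?php
// Hypothetical helper: remove the Domain attribute from a Set-Cookie header so
// the browser stores the cookie for the host that is proxying the request.
function stripCookieDomain(string $headerLine): string
{
    return preg_replace('/;\s*domain=[^;]+/i', '', $headerLine);
}

// Hypothetical CURLOPT_HEADERFUNCTION callback: forward each Set-Cookie header
// to the client (rewritten) and tell cURL the whole line was handled.
function forwardSetCookie($ch, string $headerLine): int
{
    if (stripos($headerLine, 'Set-Cookie:') === 0) {
        // rtrim() drops the trailing CRLF, which header() would reject;
        // false appends instead of replacing earlier Set-Cookie headers.
        header(rtrim(stripCookieDomain($headerLine)), false);
    }
    return strlen($headerLine);
}

// Usage: curl_setopt($ch, CURLOPT_HEADERFUNCTION, 'forwardSetCookie');
```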
I have been searching for an answer for 3 days and I cannot find one, because I always hit some obstacle.
I need to load a web page (in order to accept a cookie) and, at the same time, read the source code of that page without requesting it again. The reason is that the page is dynamic, so its content changes.
I have tried to do this using an iframe (document.body.innerHTML), but because these pages run on different servers, I hit cross-domain (same-origin) restrictions.
I have also tried writing a PHP script using get_contents, but this doesn't let the cookie be stored locally.
This is driving me crazy... Any suggestion will be helpful! I need to use PHP or JavaScript for this, but any other suggestion is welcome as well.
When you are on the page, document.body.innerHTML will give you the page source.
Edit: I didn't realize you were loading it like that. See this SO question.
It can be done using cURL in PHP.
A rough implementation:
$ch = curl_init('http://www.google.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
$data = curl_exec($ch);
// Collect every Set-Cookie header (preg_match would only find the first one)
preg_match_all('/^Set-Cookie:\s*([^;\r\n]*)/mi', $data, $matches);
$cookies = $matches[1];
var_dump($cookies);
var_dump($data);
$data will contain the entire response (headers and body), so we need to parse out the cookie headers ourselves.
If available on your system, HttpRequest would make this easier.
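Since the question also needs the page source from that same request, one way to separate the two parts is by the header size cURL reports. A sketch, assuming $ch and $data from the snippet above and a hypothetical splitResponse helper:

```php
<?php
// Hypothetical helper: split a response fetched with CURLOPT_HEADER into its
// header part and its body, given the total header length in bytes.
function splitResponse(string $raw, int $headerSize): array
{
    return [
        'headers' => substr($raw, 0, $headerSize),
        // Cast guards against older PHP versions returning false for an empty tail.
        'body'    => (string) substr($raw, $headerSize),
    ];
}

// Usage after curl_exec():
// $parts = splitResponse($data, curl_getinfo($ch, CURLINFO_HEADER_SIZE));
// $parts['body'] is the page source; the cookies are in $parts['headers'].
```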
URL1: https://duapp3.drexel.edu/webtms_du/
URL2: https://duapp3.drexel.edu/webtms_du/Colleges.asp?Term=201125&univ=DREX
URL3: https://duapp3.drexel.edu/webtms_du/Courses.asp?SubjCode=CS&CollCode=E&univ=DREX
As a personal programming project, I want to scrape my University's course catalog and provide it as a RESTful API.
However, I'm running into the following issue.
The page that I need to scrape is URL3. But URL3 only returns meaningful information after I visit URL2 (which sets the term: Colleges.asp?Term=201125), and URL2 can only be visited after visiting URL1.
I tried monitoring the HTTP traffic with Fiddler, and I don't think they are using cookies. Closing the browser instantly resets everything, so I suspect they are using server-side sessions.
How can I scrape URL3? I tried programmatically visiting URLs 1 and 2 first and then calling file_get_contents(url3), but that doesn't work (probably because it registers as three different sessions).
A session also needs a mechanism to identify you. Popular methods include cookies and a session ID in the URL.
Running curl -v on URL1 reveals that a session cookie is indeed being set:
Set-Cookie: ASPSESSIONIDASBRRCCS=LKLLPGGDFBGGNFJBKKHMPCDA; path=/
You need to send this cookie back to the server on any subsequent requests to keep your session alive.
If you want to use file_get_contents, you need to manually create a context for it with stream_context_create to include cookies with the request.
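A sketch of such a context follows, assuming you have already captured the cookie from the first response (for example from $http_response_header after calling file_get_contents on URL1). cookieContext is a hypothetical name; the "http" options also apply to https:// URLs.

```php
<?php
// Hypothetical helper: build a stream context that sends the given cookies
// with every http:// or https:// request made through it.
function cookieContext(array $cookies)
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return stream_context_create([
        'http' => [
            'header' => 'Cookie: ' . implode('; ', $pairs) . "\r\n",
        ],
    ]);
}

// Usage with the cookie from the curl -v output:
// $ctx = cookieContext(['ASPSESSIONIDASBRRCCS' => 'LKLLPGGDFBGGNFJBKKHMPCDA']);
// file_get_contents('https://duapp3.drexel.edu/webtms_du/Colleges.asp?Term=201125&univ=DREX', false, $ctx);
// $html = file_get_contents('https://duapp3.drexel.edu/webtms_du/Courses.asp?SubjCode=CS&CollCode=E&univ=DREX', false, $ctx);
```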
An alternative (which I would personally prefer) would be to use curl functions conveniently provided by PHP. (It can even take care of the cookie traffic for you!) But that's just my preference.
Edit:
Here's a working example to scrape the path in your question.
$scrape = array(
"https://duapp3.drexel.edu/webtms_du/",
"https://duapp3.drexel.edu/webtms_du/Colleges.asp?Term=201125&univ=DREX",
"https://duapp3.drexel.edu/webtms_du/Courses.asp?SubjCode=CS&CollCode=E&univ=DREX"
);
$data = '';
$ch = curl_init();
// Enable cURL's cookie engine so the session cookie set by each response is
// sent with the next request. An empty CURLOPT_COOKIEFILE does this in memory,
// without leaving a temporary cookie file behind.
curl_setopt($ch, CURLOPT_COOKIEFILE, '');
// We don't want direct output by curl
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Then run along the scrape path
foreach ($scrape as $url) {
curl_setopt($ch, CURLOPT_URL, $url);
$data = curl_exec($ch);
}
curl_close($ch);
echo $data;
I need to make a POST in JSON format to an HTTPS web page in a remote server and receive an answer in JSON format.
The data to send to the remote server is taken from the URL bar (this part is done in PHP).
My problem is to send this data and receive an answer.
I tried doing it in PHP with cURL, and in HTML with a form submit.
The results: in PHP I can't send anything.
In HTML I can submit the data and get an answer, but I can't capture it in my code.
I can see the answer using Wireshark: the POST is made after a negotiation protocol, and as I said I receive an answer (encrypted because of HTTPS, I think).
Now I need to receive that answer in my code to generate a URL link, so I'm considering using JavaScript.
I have never done anything like this before.
Any suggestion will be appreciated, thanks.
I'm using the following code, with no result except a 20-second delay followed by a blank page.
<?php
$url = 'https://www.google.com/loc/json';
$body = '{"version":"1.1.0","cell_towers":[{"cell_id":"48","location_area_code":1158,"mobile_country_code":752,"mobile_network_code":7,"age":0,"signal_strength":-71,"timing_advance":2255}]}';
$c = curl_init();
curl_setopt($c, CURLOPT_URL, $url);
curl_setopt($c, CURLOPT_POST, true);
curl_setopt($c, CURLOPT_POSTFIELDS, $body);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
// The option is CURLOPT_HTTPHEADER (no trailing S) and it takes an array
curl_setopt($c, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
$page = curl_exec($c);
if ($page === false) {
    // Show why the request failed instead of a blank page
    echo 'cURL error: ' . curl_error($c);
} else {
    echo $page;
}
curl_close($c);
?>
New info
I just got some very important new info:
"The Gears Terms of Service prohibits direct use of the Google location server (http://www.google.com/loc/json) via HTTP requests. This service may only be accessed through the Geolocation API."
So I was going about this the wrong way, and from now on I will start learning about Gears in order to use the Gears API.
Cheers!
There's no real reason PHP couldn't do the POST for you, if you set things up properly.
For instance, the remote server may require a cookie that it set in the client's browser at some point, which your PHP/cURL request doesn't have.
To debug properly, use HTTPFox or Firebug in Firefox, which monitor requests from within the browser itself and can show the actual data, not the encrypted garbage that Wireshark captures.
Of course, you could use the client browser as a sort of proxy for your server. Browser posts to the HTTPS server, gets a response, then sends that response to your server. But if that data is "important" and shouldn't be exposed, then the client-side solution is a bad one.
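For the PHP route, a sketch of a JSON POST that reports failures instead of producing a blank page is below. It assumes an endpoint that accepts and returns JSON (the Google location URL in the question is off-limits per the Gears terms quoted in the question); postJson, decodeJsonResponse, and the 10-second timeout are illustrative choices, not part of the original code.

```php
<?php
// Hypothetical helper: POST $payload as JSON and return the decoded JSON reply.
function postJson(string $url, array $payload, int $timeout = 10): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => json_encode($payload),
        CURLOPT_HTTPHEADER     => ['Content-Type: application/json', 'Accept: application/json'],
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => $timeout, // fail fast instead of hanging
    ]);
    $raw = curl_exec($ch);
    if ($raw === false) {
        // Surface the cURL error (DNS, TLS, timeout, ...) rather than printing nothing.
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return decodeJsonResponse($raw);
}

// Decode a JSON reply into an array, rejecting anything that is not JSON.
function decodeJsonResponse(string $raw): array
{
    $data = json_decode($raw, true);
    if (!is_array($data)) {
        throw new RuntimeException('Response was not a JSON object or array');
    }
    return $data;
}
```

Calling it would look like $reply = postJson('https://example.com/api', ['version' => '1.1.0']); where the URL is a placeholder. Building the body with json_encode() avoids hand-written JSON typos, and the exception message tells you whether the problem is the connection or the response format.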