I know that using cURL I can see the final destination URL by pointing cURL at a URL with CURLOPT_FOLLOWLOCATION set to true.
Example:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.example1.com/");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response instead of printing it
curl_setopt($ch, CURLOPT_HEADER, true);         // include response headers in $result
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow Location: redirects
$result = curl_exec($ch);
$info = curl_getinfo($ch); // some information on the fetch, including the final URL
curl_close($ch);
$info['url'] will hold the URL of the final destination, which could be www.example2.com.
I hope my understanding above is correct. Please let me know if not!
My main question is: which types of redirection will cURL be able to detect?
Apache redirects, JavaScript redirects, form-submission redirects, meta-refresh redirects?
Update
Thanks for your answers, @ceejayoz and @Josso. So is there a way I can follow all of these redirects programmatically through PHP?
cURL will not follow JS or meta tag redirects.
I know this answer is a little late, but I ran into a similar issue and needed more than just following the HTTP 301/302 status redirects. So I wrote a small library that will also follow rel=canonical and og:url meta tags.
https://github.com/mattwright/URLResolver.php
I found that meta refresh tags don't provide much benefit, but they are used if no head or body HTML tag is returned.
As far as I know, it only follows HTTP header redirects (301 and 302).
curl is a multi-protocol library which provides just a little HTTP support, but not much more that will help in your case. You could manually scan for the meta refresh tag as a workaround.
A better idea would be to check out PEAR HTTP_Request or the Zend_Http class, which more likely already provide something like this. phpQuery might also be relevant, as it comes with its own HTTP functions, but could easily ->find("meta[refresh]") if there's a need. Or look for a Mechanize-like browser class: Is there a PHP equivalent of Perl's WWW::Mechanize?
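As a rough sketch of that scan-and-follow workaround (the hop limit and the regex below are my own illustrative choices; a real implementation would use a proper HTML parser and resolve relative URLs):

// Follow HTTP-level redirects with cURL, then keep scanning the body
// for <meta http-equiv="refresh"> tags and follow those too.
function followAllRedirects($url, $maxHops = 5) {
    for ($i = 0; $i < $maxHops; $i++) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // handles 301/302
        $body = curl_exec($ch);
        $url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
        curl_close($ch);
        // Look for e.g. <meta http-equiv="refresh" content="0;url=http://example.com/">
        if ($body !== false && preg_match('/<meta[^>]+http-equiv=["\']?refresh["\']?[^>]+content=["\'][^"\']*url=([^"\']+)/i', $body, $m)) {
            $url = trim($m[1]); // follow the meta refresh target
            continue;
        }
        break; // no further redirects found
    }
    return $url;
}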
I just found this on the PHP site. It parses the response to find redirects and follows them. I don't think it handles every type of redirect, but it's pretty close:
http://www.php.net/manual/en/ref.curl.php#93163
I'd copy it here, but I don't want to plagiarize.
Related
From everything I've read, it seems that this is impossible. But here is my scenario:
I need to scrape the contents of a table containing for-sale housing information. The page is not password protected or anything, but you first have to click an "I Agree" link on the previous page so that a cookie gets set saying you agree that the content may not be 100% accurate. Only then are you shown the data. Is there any way at all to accomplish this using PHP/jQuery/JavaScript? I know I cannot use an iframe because it is cross-domain, and I do not have access to the other website.
Thanks for any answers, as I'm not really expecting anything positive. :) And many thanks if you can tell me how to do this. :D
Use a server-side script (PHP using cURL) to crawl the website and return the information you need. Make sure you send the appropriate Cookie header with your request to represent the "I agree" cookie.
Sample:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com/');
curl_setopt($ch, CURLOPT_COOKIE, 'I_Agree=1');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$responseBody = curl_exec($ch);
curl_close($ch);
// Parse the information you need out of $responseBody and return it as your own response body
?>
Now you can access the information from your website by calling your server side script above. For details about how to use cURL take a look at the documentation.
CURL can store or recall cookies from a file depending on the options you set. Here is the "cookiejar" example:
http://curl.haxx.se/libcurl/php/examples/cookiejar.html
Check out the CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE options.
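As a hedged sketch of that flow for the scenario above (both URLs are placeholders for the real site's pages):

// Step 1: hit the "I Agree" page so the site sets its cookie,
// saving whatever it sends into a local cookie jar file.
$jar = tempnam(sys_get_temp_dir(), 'cookies');
$ch = curl_init('http://www.example.com/agree'); // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, $jar); // write received cookies here
curl_exec($ch);
curl_close($ch);

// Step 2: request the data page, sending the stored cookies back.
$ch = curl_init('http://www.example.com/listings'); // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, $jar); // read cookies from the jar
$html = curl_exec($ch);
curl_close($ch);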
I have been searching for an answer for three days and cannot find one, because I keep running into obstacles.
I need to load a web page (in order to accept a cookie) and then, at the same time, read the source code of the new page without hitting it again. The reason for this is that the page is dynamic, so the content will change.
I have tried to do this using an iframe (document.body.innerHTML), but because these pages run on different servers I hit cross-site scripting issues.
I have also tried writing a PHP script using file_get_contents, but this doesn't allow the cookie to be stored locally.
This is driving me crazy... Any suggestion will be helpful! I need to use PHP or JavaScript for this, but any other suggestion would be welcome as well.
When you are on the page document.body.innerHTML will give you the page source.
Edit: I didn't realize you were loading it like that. See this SO question.
It can be done using cURL in PHP.
A rough implementation:
$ch = curl_init('http://www.google.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1); // keep the headers in $data so we can parse cookies out
$data = curl_exec($ch);
curl_close($ch);
// Pull every Set-Cookie header out of the raw response.
preg_match_all('/^Set-Cookie: (.*?);/m', $data, $cookies);
var_dump($cookies);
var_dump($data);
$data will contain the entire response, so we need to parse out the cookie headers ourselves.
If available on your system, HttpRequest would make this easier.
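If HttpRequest isn't available, another option (my own sketch, not part of the answer above) is to let cURL hand you each header line via CURLOPT_HEADERFUNCTION, which avoids regexing the whole response:

$cookies = array();
$ch = curl_init('http://www.google.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// cURL invokes this callback once per response header line.
curl_setopt($ch, CURLOPT_HEADERFUNCTION, function ($ch, $line) use (&$cookies) {
    if (stripos($line, 'Set-Cookie:') === 0) {
        $cookies[] = trim(substr($line, strlen('Set-Cookie:')));
    }
    return strlen($line); // required: report how many bytes were handled
});
$body = curl_exec($ch);
curl_close($ch);
var_dump($cookies);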
I'm trying to perform a redirect using cURL. I can load the page fine; that's not a problem. But if I load, say, google.com, none of the images load and the site does not work (obviously, because it's just printing the HTML and not actually doing a redirect).
Is there any way to perform a redirect using cURL? Sort of similar to how ...
header("Location: http://google.com");
... works?
Any help would be much appreciated.
Well, from my understanding, it seems like the OP wants to redirect the user to the search results URL.
Using the Google API would be a first choice, and to achieve something like that, I would do this:
<?php
$query = "firefox";
$apiUrl = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=".urlencode($query);
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $apiUrl);
$content = curl_exec($ch);
curl_close($ch);
$content = json_decode($content);
$luckyUrl = $content->responseData->results[0]->unescapedUrl;
header("Location: ".$luckyUrl);
?>
The code above works like Google's 'I'm Feeling Lucky' button.
Use curl with -L
-L/--location
(HTTP/HTTPS) If the server reports that the requested page has moved to a different location (indicated with a Location: header and a 3XX response code), this option will make curl redo the request on the new place. If used together with -i/--include or -I/--head, headers from all requested pages will be shown. When authentication is used, curl only sends its credentials to the initial host. If a redirect takes curl to a different host, it won't be able to intercept the user+password. See also --location-trusted on how to change this. You can limit the amount of redirects to follow by using the --max-redirs option.

When curl follows a redirect and the request is not a plain GET (for example POST or PUT), it will do the following request with a GET if the HTTP response was 301, 302, or 303. If the response code was any other 3xx code, curl will re-send the following request using the same unmodified method.
So when using cURL from PHP, add:
curl_setopt($process, CURLOPT_FOLLOWLOCATION, 1);
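A slightly fuller sketch of the PHP side (the redirect cap of 10 is an arbitrary choice mirroring --max-redirs):

$ch = curl_init('http://google.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // like curl -L
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);        // like --max-redirs 10
$html = curl_exec($ch);
// Where we actually ended up after all redirects:
$finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
curl_close($ch);
// To send the visitor there instead of printing the HTML:
header('Location: ' . $finalUrl);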
I'm afraid it is impossible to force the client's browser to send certain POST values and referers; you can only force it to go somewhere, hence header().
Does this answer your question?
It should work. Please try this: header('Location: http://www.google.com'). Use single quotes (') instead of double quotes (").
I wish to include JSP include files, which contain Java code, in a PHP template. The two includes in question are a header file and a footer file. Does anyone have any experience doing this? We are considering just doing an HTTP request to grab the resulting HTML from the JSP files independently, but aren't sure if there will be slight performance issues with doing so.
Is there any better solution using some of the tools within Apache to perform this?
echo file_get_contents('http://full/link/to/jsp/page');
If your JSP page echoes a full head/body structure, you'll need to strip it out. You can do that on the JSP side or in PHP.
URL access for file_get_contents is disabled on some systems (allow_url_fopen), so you might need to use cURL (it also allows you to POST back, which you might need if you're working with forms).
$userAgent = 'Mozilla/5.0 (compatible; MyApp/1.0)'; // placeholder UA string
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL, 'http://full/link/to/jsp/page');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
echo curl_exec($ch);
curl_close($ch);
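If the JSP output arrives wrapped in its own <html>/<body> shell, here is a minimal sketch for stripping it on the PHP side (regex-based, assumes reasonably well-formed output; $jspOutput is a hypothetical variable holding the curl_exec() result):

// Return only what's inside <body>...</body>, or the input unchanged
// if no body wrapper is found.
function extractBody($html) {
    if (preg_match('#<body[^>]*>(.*?)</body>#is', $html, $m)) {
        return $m[1];
    }
    return $html;
}
echo extractBody($jspOutput);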
You can't include a JSP page into a PHP page.
You can do what you are thinking of, though: making an HTTP request to get the HTML content from the JSP and embedding that into the PHP result. Not pretty, but it will work.
There is the Java / PHP Integration extension, but it doesn't let you compile Java code. I don't think there is a way to compile Java from PHP, short of executing command-line commands.
Depending on your requirements, if you don't want to impact page loading, you could also perform an AJAX request to grab the content once the HTML page is loaded, and inject it into the page: this moves the problem to the client.
Does this JSP page change frequently, or depend on the PHP page's parameters (some kind of advertisement)?
You could also cache the output of your JSP (even per set of parameters) for a few hours or a whole day, to avoid calling the page on every request.
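A minimal file-based cache along those lines (the TTL and cache location are arbitrary placeholders):

// Re-fetch the JSP output only when the cached copy is older than $ttl.
function cachedFetch($url, $ttl = 3600) {
    $cacheFile = sys_get_temp_dir() . '/jsp_' . md5($url) . '.html';
    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $ttl) {
        return file_get_contents($cacheFile); // still fresh: serve from cache
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $html = curl_exec($ch);
    curl_close($ch);
    if ($html !== false) {
        file_put_contents($cacheFile, $html); // refresh the cache
    }
    return $html;
}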
What is the best way to check if a given url points to a valid file (i.e. not return a 404/301/etc.)? I've got a script that will load certain .js files on a page, but I need a way to verify each URL it receives points to a valid file.
I'm still poking around the PHP manual to see which file functions (if any) will actually work with remote URLs. I'll edit my post as I find more details, but if anyone has already been down this path feel free to chime in.
file_get_contents overshoots the purpose here, since the HTTP headers alone are enough to make the decision, so you'll want to use cURL:
<?php
// create a new cURL resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_NOBODY, 1);         // HEAD-style request: headers only
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // don't dump the headers to output

// execute the request and read the status code
curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

// close cURL resource, and free up system resources
curl_close($ch);

$isValid = ($status == 200);
?>
One such way would be to request the URL and check for a response with a status code of 200. Aside from that, there's really no good way, because the server can handle the request however it likes (including giving you other status codes for files that exist but that you don't have access to, for any number of reasons).
If your server doesn't have URL fopen wrappers enabled (any server with decent security won't), then you'll have to use the cURL functions.
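Conversely, if URL fopen wrappers are enabled, here is a quick sketch without cURL (get_headers() makes the request for you and returns the raw header lines):

// $headers[0] is the status line of the first response, e.g. "HTTP/1.1 200 OK";
// a redirect would show up here as a 3xx status instead.
$headers = @get_headers('http://www.example.com/script.js');
$isValid = $headers && strpos($headers[0], '200') !== false;
var_dump($isValid);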