I have an application that includes a file (FileX.php) which, under certain conditions, echoes an iFrame to the screen that loads a tracking URL (FileY.php). In a production environment where I include FileX.php directly in the main page (FileA.php), the iFrame gets echoed to the screen and FileY.php is successfully called.
In testing, though, I need to call multiple versions of FileA.php, each of which includes FileX.php, which outputs the iFrame to call FileY.php. I am automating this large number of requests using cURL.
When I load FileA.php through a cURL request, it successfully does the include() of FileX.php, but because it is happening through cURL, the iFrame never loads its destination (FileY.php).
The cURL request for fileA looks something like this:
TestFile.php:
// URL
$url = "http://www.example.com/FileA.php";
// New Cookie file
$ckfile = tempnam("/tmp", "CURLCOOKIE");
// New Connection
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_COOKIEJAR, $ckfile);
curl_setopt ($ch, CURLOPT_COOKIEFILE, $ckfile);
curl_setopt($ch, CURLOPT_COOKIESESSION, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_exec($ch);
curl_close($ch);
FileA.php:
include_once('FileX.php');
FileX.php:
echo("<iframe src='http://www.example.com/FileY.php' width='0' height='0'></iframe>");
FileY.php:
// Contains logging stuff to log the fact that FileY.php was called.
Like I said, if I call FileA.php directly in my browser, FileX.php is included and FileY.php is loaded in the iFrame successfully. When I call FileA.php via cURL the iFrame doesn't load and FileY.php is never called.
I've tried wrapping the echo() in FileX.php with ob_start() and ob_end_flush() to force the output but that didn't work. I've tried adding a sleep(1) in case maybe the request was happening too fast, no luck.
Is there a cURL option I can change to allow this to occur? I can't figure out why it won't load the iFrame src.
Ah, so it turns out I was using an option incorrectly.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
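This should actually be false or 0. With true or 1, curl_exec() returns the response as a string instead of printing it, so the iFrame markup never reaches the browser. With 0, the response is printed, and the browser viewing TestFile.php renders the iFrame and requests FileY.php. So it should be: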
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0);
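cURL itself never renders HTML, so it will not request an iFrame's src on its own. For test runs where no browser is involved, one option is to keep CURLOPT_RETURNTRANSFER at 1, pull the iFrame URLs out of the returned markup and request them explicitly. A minimal sketch, assuming the DOM extension is available; extractIframeSources() is a hypothetical helper name, not part of the code above:

```php
<?php
// Hypothetical helper: returns the src attribute of every <iframe> found in $html.
function extractIframeSources(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }

    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('iframe') as $iframe) {
        $src = $iframe->getAttribute('src');
        if ($src !== '') {
            $sources[] = $src;
        }
    }
    return $sources;
}

// Usage sketch: $html is the string returned by curl_exec() with
// CURLOPT_RETURNTRANSFER set to 1.
// foreach (extractIframeSources($html) as $src) {
//     $frame = curl_init($src);
//     curl_setopt($frame, CURLOPT_RETURNTRANSFER, 1);
//     curl_exec($frame);   // this second request is what reaches FileY.php
//     curl_close($frame);
// }
```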
Related
How does one download a file from a web page without a direct path to the file? For example, a URL with GET information instead of the path. The code below seems to be downloading the actual page HTML instead of the file...
Not sure what I'm doing wrong. I also would like to augment this to also perform on sites that require logins but I think I would just have to add
curl_setopt($ch, CURLOPT_USERPWD, "$username:$password")
to the code?
$output_filename = "advanced.exe";
$host = "http://download.cnet.com/Advanced-SystemCare-Free/3001-2086_4-10407614.html?hlndr=1";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $host);
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_AUTOREFERER, false);
curl_setopt($ch, CURLOPT_REFERER, "http://download.cnet.com");
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
curl_setopt($ch, CURLOPT_HEADER, 0);
$result = curl_exec($ch);
curl_close($ch);
$fp = fopen($output_filename, 'w');
fwrite($fp, $result);
fclose($fp);
The link you have there isn't the actual link to the file, only the page that initiates the download. By the looks of it, the page uses JavaScript to trigger the download, so you would want to dig through their code to find out exactly how they do it. Then you can find the real URL to the file.
A simple way, if you are doing this only for one file, would be to download the file in your browser, and then access the URL it used from the browser's download manager. (In Firefox, for example, right click the file and choose "Copy Download Link")
I also would like to augment this to also perform on sites that require logins but I think I would just have to add ...
That would work only for HTTP-based authentication. If the site uses a traditional login form, this will not work. You'd have to submit several sequential HTTP requests via cURL, using cookies to store the session state.
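The sequential requests can share one cookie file, so the session cookie set by the login response is sent along with the download request. A rough sketch only: the URLs, the form field names and the sessionRequestOptions() helper are placeholders, not taken from any real site.

```php
<?php
// Hypothetical helper: builds cURL options for one request in a cookie-based session.
// Passing $postFields makes it a form POST; omitting it leaves a plain GET.
function sessionRequestOptions(string $url, string $cookieFile, ?array $postFields = null): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are saved here when the handle is closed or freed
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies are read from here and sent with requests
    ];
    if ($postFields !== null) {
        $options[CURLOPT_POST]       = true;
        $options[CURLOPT_POSTFIELDS] = http_build_query($postFields);
    }
    return $options;
}

// Usage sketch (placeholder URLs and field names):
// $jar = tempnam(sys_get_temp_dir(), 'CURLCOOKIE');
// $ch  = curl_init();
// curl_setopt_array($ch, sessionRequestOptions('http://www.example.com/login', $jar,
//     ['username' => $username, 'password' => $password]));
// curl_exec($ch);                              // step 1: log in, session cookie is captured
// curl_setopt_array($ch, sessionRequestOptions('http://www.example.com/file', $jar));
// curl_setopt($ch, CURLOPT_HTTPGET, true);     // switch the reused handle back to GET
// $file = curl_exec($ch);                      // step 2: download with the session cookie
// curl_close($ch);
```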
I have code that sends a request to a PHP page to get its headers. The thing is, a copy() call is executed on that page, and cURL either waits for the whole page to finish (that is, for the copy to complete) or returns false if I set the timeout to 2-3 seconds.
How do I get the page headers without waiting for the copy() function to finish its job?
My code so far is:
$req='page_with_copy_function_in_it.php';
$ch=curl_init($req);
curl_setopt($ch,CURLOPT_NOBODY,true);
curl_setopt($ch,CURLOPT_HEADER,true);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch,CURLOPT_TIMEOUT,2);
$data=curl_exec($ch);
curl_close($ch);
You should use a HEAD request if you don't want to load the page content.
From the PHP documentation:
CURLOPT_NOBODY: Set TRUE to exclude the body from the output. Request method is then set to HEAD. Changing this to FALSE does not change it to GET.
$ch = curl_init();
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, 20);
curl_setopt ($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
// Only calling the head
curl_setopt($ch, CURLOPT_HEADER, true); // header will be at output
//HERE IS THE MAGIC LINE
curl_setopt($ch, CURLOPT_NOBODY, true); // HTTP request is 'HEAD'
$content = curl_exec ($ch);
curl_close ($ch);
See the curl_setopt documentation.
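With CURLOPT_HEADER and CURLOPT_NOBODY both enabled, $content holds only the raw header block. A small sketch of turning that string into an array follows; parseHeaderBlock() is a hypothetical helper, and it assumes a single response with no redirects followed, so only one status line is present.

```php
<?php
// Hypothetical helper: splits a raw HTTP header block into a status line
// and a map of lower-cased header names to values.
function parseHeaderBlock(string $raw): array
{
    $parsed = ['status' => null, 'headers' => []];

    foreach (preg_split('/\r\n|\n|\r/', trim($raw)) as $line) {
        if (strncmp($line, 'HTTP/', 5) === 0) {
            $parsed['status'] = $line; // e.g. "HTTP/1.1 200 OK"
            continue;
        }
        if (strpos($line, ':') === false) {
            continue; // blank or malformed line
        }
        [$name, $value] = explode(':', $line, 2);
        $parsed['headers'][strtolower(trim($name))] = trim($value);
    }
    return $parsed;
}

// Usage sketch: $parsed = parseHeaderBlock($content);
```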
When you use cURL to access the headers of a page, the whole PHP file is executed, even if there are long-running tasks inside. That's because HTTP headers may be overridden by the header() function.
If you don't want the request to hang, my suggestion is to use a shell command instead of a PHP function to copy your file: instead of copy($source, $target), run the following if you're on a Linux system:
$source = escapeshellarg($source);
$target = escapeshellarg($target);
exec("cp $source $target > /dev/null 2>&1 &");
The & symbol runs the command in the background, and redirecting its output keeps exec() from waiting for the command to finish (so if the copy takes 3 seconds, it runs in the background and does not hang your PHP file).
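One caveat from the PHP manual: exec() keeps waiting for a background program unless that program's output is redirected to a file or another stream. A minimal sketch of building such a command, assuming a Unix-like system with cp and a POSIX shell; backgroundCopyCommand() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: builds a shell command that copies a file in the background.
// Output is redirected so that exec() returns immediately instead of waiting.
function backgroundCopyCommand(string $source, string $target): string
{
    return sprintf(
        'cp %s %s > /dev/null 2>&1 &',
        escapeshellarg($source), // quote the paths so spaces and metacharacters are safe
        escapeshellarg($target)
    );
}

// Usage sketch: exec(backgroundCopyCommand($source, $target));
```

Because the copy is detached, the PHP script has no direct way to know whether it succeeded; checking for the target file later is left to the caller.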
I am trying to write a web bot using PHP/cURL, but I have a problem handling a specific page that loads some of its content dynamically. To explain:
When I download the page using PHP/cURL, some of the content is missing. I then discovered that this content is loaded after the page itself has loaded, which is why cURL does not pick it up.
Can anyone help?
My sample code is:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, $reffer);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, $redirect);
curl_setopt($ch, CURLOPT_COOKIEFILE, ABSOLUTE_PATH."Cookies/cookies.txt");
curl_setopt($ch, CURLOPT_COOKIEJAR, ABSOLUTE_PATH."Cookies/cookies.txt");
$result = curl_exec($ch);
What URL are you trying to load? It could be that the page you're requesting has one or more AJAX requests that load content in after the fact. I don't think that cURL can accommodate information loaded at runtime via AJAX or other XHR requests.
You might want to look at something like PhantomJS, which is a headless WebKit browser which will execute the page fully and return the dynamically assembled DOM.
Because the page uses JavaScript to load the content, you are not going to be able to do this via cURL. Check out this page for more information on the problem: http://googlewebmastercentral.blogspot.com/2007/11/spiders-view-of-web-20.html
I have a site that uses cURL to access some pages, stores the returned results in variables, and then uses these variables within its own page. The script works well except where the target cURL page has a header('Location: ...') command inside it. It seems to just ignore this header command.
The cURL command is as follows...
//Load result page into variable so portions can be allocated to correct variables
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); # URL to post to
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1 ); # return into a variable
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields);
$loaded_result = curl_exec( $ch ); # run!
curl_close($ch);
I've tried changing the CURLOPT_HEADER to 1 but it doesn't do anything.
So how can I allow script redirection within the target urls using cURL to grab the results? By the way, the pages work fine if accessed other than via cURL but iFrames are not an option in this instance.
If you want cURL to follow redirections add this:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow redirects
You'll want the options CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS. See the manual.
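The two options can be combined as follows. This is only a sketch: redirectFollowingOptions() is a hypothetical helper, and it assumes the POST fields are available as an array rather than a pre-built string.

```php
<?php
// Hypothetical helper: options for a POST request whose redirects should be followed.
function redirectFollowingOptions(string $url, array $postFields, int $maxRedirects = 5): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($postFields),
        CURLOPT_FOLLOWLOCATION => true,          // act on Location: headers
        CURLOPT_MAXREDIRS      => $maxRedirects, // give up after this many hops to avoid loops
    ];
}

// Usage sketch:
// $ch = curl_init();
// curl_setopt_array($ch, redirectFollowingOptions($url, ['field' => 'value']));
// $loaded_result = curl_exec($ch);
// $final_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL); // where the redirects ended up
// curl_close($ch);
```

Note that after a 301, 302 or 303 response cURL normally re-issues the request as a GET, so the target of the redirect should not expect the original POST data.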
Try:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
I'm connecting to a website daily to collect some statistics; the website runs .NET, to make things extra difficult. What I would like to do is automate this process.
I go to http://www.thesite.com:8080/statistics/Login.aspx?ReturnUrl=%2Fstatistics%2Fdataexport.ashx%3FReport%3D99, (the return url is /statistics/dataexport.ashx?Report=99 decoded).
The Login.aspx displays a form, in which I enter my user/pass and when the form is submitted the dataexport.ashx starts to download the file directly. The filename delivered is always statistics.csv.
I have been experimenting with this for a few days now. Are there any resources, or does anyone have a hint about what I should try next?
Here is some of my code.
<?php
// INIT CURL
$ch = curl_init();
// SET URL FOR THE POST FORM LOGIN
curl_setopt($ch, CURLOPT_URL, $url);
// ENABLE HTTP POST
curl_setopt ($ch, CURLOPT_POST, 1);
// SET POST PARAMETERS : FORM VALUES FOR EACH FIELD
$viewstate = urlencode('/wEPDwUKM123123daE2MGQYAQUeX19Db250cm9sc1JlcXVpcmVQb3N0QmFja0tleV9fFgEFGG1fTG9naW4kTG9naW5JbWFnZUJ1dHASdasdRvbij2MVoasdasdYibEXm/eSdad4hS');
$eventval = urlencode('/wEWBAKMasd123LKJJKfdAvD8gd8KAoCt878OED00uk0pShTQHkXmZszVXtBJtVc=');
curl_setopt ($ch, CURLOPT_POSTFIELDS, "__VIEWSTATE=$viewstate&__EVENTVALIDATION=$eventval&UserName=myuser&Password=mypassword");
// IMITATE CLASSIC BROWSER'S BEHAVIOUR : HANDLE COOKIES
curl_setopt ($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
# Setting CURLOPT_RETURNTRANSFER variable to 1 will force cURL
# not to print out the results of its query.
# Instead, it will return the results as a string return value
# from curl_exec() instead of the usual true/false.
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
// FOLLOW REDIRECTS AND READ THE HEADER
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_HEADER, true);
// EXECUTE REQUEST (FORM LOGIN)
$store = curl_exec ($ch);
// print the result
print_r($store);
// CLOSE CURL
curl_close ($ch);
?>
Thanks
Trikks
You also need to use CURLOPT_COOKIEFILE to send the cookies along with the next request. Another thing, if I remember correctly, is that ASP.NET sets a unique value each time for variables like __VIEWSTATE, so hard-coded values may be rejected. See if these two pointers help.
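Since those hidden values change between page loads, one approach is to GET Login.aspx first, read the current tokens out of the form and post them back with the credentials. A rough sketch, assuming the DOM extension is available; extractHiddenFields() is a hypothetical helper, and the real form may need fields beyond the ones shown in the question:

```php
<?php
// Hypothetical helper: collects name => value for every <input type="hidden"> in $html.
function extractHiddenFields(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }

    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    foreach ($doc->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '' && strtolower($input->getAttribute('type')) === 'hidden') {
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return $fields;
}

// Usage sketch: $loginPageHtml comes from a first GET of Login.aspx made with
// both CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE set on the handle.
// $fields = extractHiddenFields($loginPageHtml);
// $fields['UserName'] = 'myuser';
// $fields['Password'] = 'mypassword';
// curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields)); // URL-encodes and joins with &
```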