I have a script called cronStats.php that pulls data from Apple's autoingest tool (data on app downloads, regions, etc.) and populates a database for later access.
When the script is executed in the browser, everything works correctly, but when a scheduled cron job executes the script there's an error.
Here is the relevant code:
function pullITCData($username, $password, $VND, $date) {
    $fields_string = "USERNAME=" . urlencode($username);
    $fields_string .= "&PASSWORD=" . urlencode($password);
    $fields_string .= "&VNDNUMBER=" . $VND;
    $fields_string .= "&TYPEOFREPORT=Sales";
    $fields_string .= "&DATETYPE=Daily";
    $fields_string .= "&REPORTTYPE=Summary";
    $fields_string .= "&REPORTDATE=$date";
    $fn = "dailyStat_" . $date . "_" . $VND;
    $filename = $fn . ".gz";
    //$abFN = __DIR__ . "/" . $fn;
    //$abFilename = __DIR__ . "/" . $filename;
    $abFN = $fn;
    $abFilename = $filename;
    $ch = curl_init();
    echo("<br>abFN url is $abFN");
    echo("<br>abFilename url is $abFilename");
    $fp = fopen($abFilename, "w+");
    // set the URL, the POST flag, and the POST data
    curl_setopt($ch, CURLOPT_URL, 'https://reportingitc.apple.com/autoingestion.tft');
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $fields_string);
    curl_setopt($ch, CURLOPT_FILE, $fp);
    // execute the POST
    $contents = curl_exec($ch);
    $fileCreated = false;
    if ($contents === false) {
        echo("<br> no contents");
    } else {
        echo("<br>contents found");
    }
    fclose($fp);
    curl_close($ch);
    // delete the opened file ($fileCreated is set by the parsing code removed for brevity)
    if ($fileCreated == true) {
        unlink($abFN);
    }
}
I've removed some code for brevity. Basically, I pass a string of variables to Apple's autoingest tool, which returns a .txt.gz file (which is then opened and parsed, not shown here).
When executed from the browser I get the "contents found" message, but via the cron job I get "no contents", meaning that curl_exec() is failing for some reason.
My thinking is that the .gz file cannot be created because the cron job is executing from another directory (?). I tried setting an absolute URL to write the .txt.gz to, but this also failed:
[function.fopen]: failed to open stream: HTTP wrapper does not support writeable connections in
I'm at a loss as to how to proceed. Any ideas?
EDIT:
Thank you for your feedback, it's appreciated.
Based on BojanT's cron command:
/web/cgi-bin/php5 cd "$HOME/html/path/to/php/cron/ && php cronStats.php > cron.log 2>&1
I'm getting the following error:
/bin/sh: -c: line 0: unexpected EOF while looking for matching `"'
/bin/sh: -c: line 1: syntax error: unexpected end of file
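For what it's worth, that error comes from the unmatched double quote in the command above; a balanced version of the same line (a sketch, keeping the original paths) would be:
cd "$HOME/html/path/to/php/cron/" && php cronStats.php > cron.log 2>&1
Note also that prefixing the line with /web/cgi-bin/php5 makes cron hand "cd ..." to the PHP binary as a script argument instead of running it in a shell; the cd && php form on its own is what the shell expects.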
EDIT 2 - Workaround:
Here is a workaround that is working, for anyone in a similar situation. It's not elegant or efficient, and it doesn't actually solve the issue, but for what it's worth: set up a cron job on another file, e.g. runCron.php:
<?php
executeScript();

function executeScript() {
    file_get_contents("http://website.com/path/to/php/cron/cronStats.php");
    informOfExecution();
}
This executes the actual file at its absolute URL. The informOfExecution() method sends me an email notifying me of the update. It would be nice to cron the file directly, but done is better than perfect. Thanks all.
Change to your file's directory with 'cd' and then run your file.
Example:
* * * * * cd /f1/f2/f3/f4/ && php thing.php >> /home/output/outputs.txt
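A directory-independent variant (a sketch of the same idea, not from the original answer): have the script switch to its own directory at startup, so the relative .gz path resolves identically in the browser and under cron. Note that the "HTTP wrapper does not support writeable connections" error above means an http:// URL was passed to fopen(); fopen() can only write to a local filesystem path, not to a URL.
<?php
// At the very top of cronStats.php:
chdir(__DIR__); // __DIR__ needs PHP 5.3+; use dirname(__FILE__) on older versions

// ...or build absolute filesystem paths (not URLs) explicitly,
// as in the commented-out lines in the question:
$abFilename = __DIR__ . "/" . $filename;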
Related
I'm building a PHP application which makes a curl request to a number of different URLs. It then attempts to parse the string of data returned by curl to extract everything between the <body> </body> tags. This works absolutely fine for 99% of URLs.
However, one such URL is a page which takes some time to load in a browser. Upon inspection I realised that the markup for the page is 16 MB.
The settings I have for curl are as follows:
$ch = curl_init();
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
$data = curl_exec($ch);
if (!$data) {
    echo 'ERROR: Curl has reported an error: ' . curl_error($ch) . "\n";
}
return $data;
The error message I added for the !$data condition is not output, so my assumption is that there are no errors from curl itself. I tried raising CURLOPT_CONNECTTIMEOUT to 120 seconds (as opposed to 5), but this doesn't fix the issue.
When $data is returned to my script:
if ($data) {
    $body = '';
    preg_match("/<body[^>]*>(.*?)<\/body>/is", $data, $body);
    if (empty($body)) {
        echo 'WARNING: nothing found in <body> tag: ' . "\n";
        var_dump($body);
    } else {
        // Writing to file occurs here...
        // This bit works ok when $body is available.
    }
}
It shows me the warning message "WARNING: nothing found in <body> tag:" and the output from var_dump($body) is an empty array:
array(0) {
}
Does anyone know how I can debug this further, as I'm not sure where the error is originating? I have manually saved a copy of the web page and there are indeed opening and closing <body> tags with lots of HTML in between.
My assumption is that this is some problem due to the file size. The "average" file size in this application is about 1 MB, and my script works perfectly with those files.
I am running this on a server from the CLI, i.e. php download.php, not through a browser.
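One thing worth checking (a debugging sketch, not a confirmed diagnosis): on a 16 MB subject, preg_match() can fail silently by hitting PCRE's backtracking limit, returning false and leaving the matches array empty. preg_last_error() reveals this, and either raising pcre.backtrack_limit or extracting the body without a regex avoids it:
$rc = preg_match("/<body[^>]*>(.*?)<\/body>/is", $data, $body);
if ($rc === false) {
    // PREG_BACKTRACK_LIMIT_ERROR (2) means pcre.backtrack_limit was exceeded
    echo 'preg error code: ' . preg_last_error() . "\n";
    // Option 1: raise the limit well above the document size
    ini_set('pcre.backtrack_limit', '100000000');
    // Option 2: skip the regex for huge documents
    // (assumes the tags are present, as verified manually)
    $start = stripos($data, '<body');
    $start = strpos($data, '>', $start) + 1;
    $end   = stripos($data, '</body>', $start);
    $body  = substr($data, $start, $end - $start);
}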
For some automated tests that I did, I had to record requests from Chrome and then repeat them as curl commands.
I started checking how to do it...
The way I did it was:
Access the website with the developer tools open.
Issue requests and make sure they are logged in the network panel.
Right-click on the requests, select 'Save as HAR with content', and save to a file.
Then run the following PHP script to parse the HAR file and output the corresponding curl commands:
<?php
$contents = file_get_contents('/home/elyashivl/har.har');
$json = json_decode($contents);
$entries = $json->log->entries;
foreach ($entries as $entry) {
    $req = $entry->request;
    $curl = 'curl -X ' . $req->method;
    foreach ($req->headers as $header) {
        $curl .= " -H '$header->name: $header->value'";
    }
    if (property_exists($req, 'postData')) {
        # JSON-encode to convert newlines to literal '\n'
        $data = json_encode((string)$req->postData->text);
        $curl .= " -d '$data'";
    }
    $curl .= " '$req->url'";
    echo $curl . "\n";
}
I don't know in which version they added this feature, but Chrome now offers a "Copy as cURL" option:
You can access this under the Network tab of the Developer Tools by right-clicking on an XHR request.
Building upon the code by ElyashivLavi, I added a file-name argument, error checking when reading the file, verbose mode for curl, removal of the Accept-Encoding request header (which usually results in compressed output that is hard to debug), and automatic execution of the curl commands:
<?php
function bail($msg)
{
    fprintf(STDERR, "Fatal error: $msg\n");
    exit(1);
}

global $argv;
if (count($argv) < 2)
    bail("Missing HAR file name");
$fname = $argv[1];
$contents = file_get_contents($fname);
if ($contents === false)
    bail("Could not read file $fname");
$json = json_decode($contents);
$entries = $json->log->entries;
foreach ($entries as $entry)
{
    $req = $entry->request;
    $curl = 'curl --verbose -X ' . $req->method;
    foreach ($req->headers as $header)
    {
        if (strtolower($header->name) === "accept-encoding")
            continue; // avoid a gzip-compressed response
        $curl .= " -H '$header->name: $header->value'";
    }
    if (property_exists($req, 'postData'))
    {
        # JSON-encode to convert newlines to literal '\n'
        $data = json_encode((string)$req->postData->text);
        $curl .= " -d '$data'";
    }
    $curl .= " '$req->url'";
    echo $curl . "\n";
    system($curl);
}
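Assuming the improved script above is saved as har2curl.php (the file names here are hypothetical), a typical run looks like:
php har2curl.php session.har
Each generated command is echoed before system() runs it, so you can also redirect stdout to a file to keep a replayable list of the requests.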
PHP code:
$fields_string = "";
foreach ($postingfields as $key=>$value) {
$fields_string .= $key . '=' . urlencode($value) . '&';
}
$fields_string = rtrim($fields_string,'&');
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$client_url);
curl_setopt($ch,CURLOPT_POST,true);
curl_setopt($ch,CURLOPT_POSTFIELDS,$fields_string);
curl_setopt($ch,CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
$response = curl_getinfo( $ch );
curl_close($ch);
The $client_url PHP variable holds the value: https://pcloudtest.com/Default.aspx?cid=99938
The $fields_string PHP variable holds the value: &sid=30&title=Mr&firstname=Charles&surname=Smith
The destination server has been set up to respond with HTML echoing back what was received. When I debug (sending info to a separate txt file in Linux), the value of $result is:
<URL>https://pcloudtest.com/Default.aspx?cid=99938</URL>
i.e. this is what the destination server claims has been sent to it from my end.
In other words, only the $client_url is being posted, not the rest of it (i.e. the $fields_string); the full URL that should have been posted reads:
https://pcloudtest.com/Default.aspx?cid=99938&sid=30&title=Mr&firstname=Charles&surname=Smith
I have tried everything I can think of to figure out why the PHP curl functions are apparently sending a shortened URL, i.e. only up to the first occurrence of an ampersand. The code logic above has not changed in months and works for other destination servers.
I might add that the other destination servers where this logic has no issues are http: sites, not https:. But I have been reassured by the tech guys on the other end that it definitely has nothing to do with posting to an https site.
Please help. I hope I have outlined my issue clearly enough, and if not, please advise as to more info I can provide.
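One way to narrow this down (a debugging sketch using standard PHP/curl options, not a confirmed fix): build the POST string with http_build_query(), which takes care of encoding and separators, and enable CURLOPT_VERBOSE to see the exact request curl sends over the wire:
$fields_string = http_build_query($postingfields); // e.g. "sid=30&title=Mr&..."

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $client_url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $fields_string);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_VERBOSE, true);      // log the raw request/response chatter
$log = fopen('/tmp/curl_debug.txt', 'w');     // path is illustrative
curl_setopt($ch, CURLOPT_STDERR, $log);
$result = curl_exec($ch);
curl_close($ch);
fclose($log);
Note also that the leading & in the $fields_string value quoted above means the first POST field has an empty name, which some servers reject; http_build_query() cannot produce that.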
How do I get the filesize of a js file on another website? I am trying to create a monitor that checks that a js file exists and that it is more than 0 bytes.
For example on bar.com I would have the following code:
$filename = 'http://www.foo.com/foo.js';
echo $filename . ': ' . filesize($filename) . ' bytes';
You can use an HTTP HEAD request.
<?php
$url = "http://www.neti.ee/img/neti-logo.gif";
$head = get_headers($url, 1);
echo $head['Content-Length'];
?>
Notice: this is not a real HEAD request, but a GET request that PHP parses for its Content-Length header. Unfortunately the PHP function name is quite misleading. This might be sufficient for small js files, but for bigger files use a real HEAD request with curl, because then the server only sends the headers rather than the whole file.
For that case, use the code provided by Jakub.
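For reference, a sketch of forcing get_headers() to issue a real HEAD request via a stream context (the third get_headers() argument requires PHP 7.1+; older versions can use stream_context_set_default() instead):
<?php
$url = "http://www.neti.ee/img/neti-logo.gif";
$context = stream_context_create(['http' => ['method' => 'HEAD']]);
$head = get_headers($url, 1, $context); // PHP 7.1+
echo $head['Content-Length'];
?>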
Just use curl; here is a perfectly good example from the manual comments:
Ref: http://www.php.net/manual/en/function.filesize.php#92462
<?php
$remoteFile = 'http://us.php.net/get/php-5.2.10.tar.bz2/from/this/mirror';
$ch = curl_init($remoteFile);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // not necessary unless the file redirects (like the PHP example we're using here)
$data = curl_exec($ch);
curl_close($ch);

if ($data === false) {
    echo 'cURL failed';
    exit;
}

$contentLength = 'unknown';
$status = 'unknown';
if (preg_match('/^HTTP\/1\.[01] (\d\d\d)/', $data, $matches)) {
    $status = (int)$matches[1];
}
if (preg_match('/Content-Length: (\d+)/', $data, $matches)) {
    $contentLength = (int)$matches[1];
}
echo 'HTTP Status: ' . $status . "\n";
echo 'Content-Length: ' . $contentLength;
?>
Result:
HTTP Status: 302
Content-Length: 8808759
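A variant of the same idea (my sketch, not part of the quoted manual example): skip the header regexes and let curl report the values via curl_getinfo():
$ch = curl_init($remoteFile);
curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD-style request, no body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
$length = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD); // -1 if the server sent no Content-Length
curl_close($ch);
echo "HTTP Status: $status\nContent-Length: $length";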
Another solution: http://www.php.net/manual/en/function.filesize.php#90913
This is just a two-step process:
Crawl the js file and store it in a variable.
Check whether the length of the js file is greater than 0.
That's it!
Here is how you can do it in PHP:
<?php
$data = file_get_contents('http://www.foo.com/foo.js');
if (strlen($data) > 0):
    echo "yay";
else:
    echo "nay";
endif;
?>
Note: You can use an HTTP HEAD request as suggested by Uku, but if you also need the js file's content, you would then have to crawl it again anyway :(
Using PHP and curl (unless there is a better alternative than curl in this case), is it possible to have a PHP function handle the header response before downloading the file?
For example:
I have a script that downloads and processes URLs supplied by the user. I would like to add a check so that if the file is not valid for my process (not a text file, too large, etc.), the curl request is cancelled before the server wastes time downloading the file.
Update: Solution
PEAR class HTTP_Request2: http://pear.php.net/package/HTTP_Request2/
It gives you the ability to attach observers to the connection and throw exceptions to cancel at any time. Works perfectly for my needs!
Using cURL, do an HTTP HEAD request to check the headers; then, if it is valid (the status is 200), do the full HTTP GET request.
The basic option you must set is CURLOPT_NOBODY, which changes the request type to HEAD:
curl_setopt($ch, CURLOPT_NOBODY, true);
Then, after executing the query, you need to check the return status, which can be done using curl_getinfo():
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
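Put together, a minimal sketch of the two-step approach (my example; $url and the 1 MB / text-only limits are illustrative):
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_NOBODY, true);           // step 1: HEAD request
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
$type   = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
$size   = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);

if ($status == 200 && strpos((string)$type, 'text/') === 0 && $size < 1048576) {
    curl_setopt($ch, CURLOPT_NOBODY, false);
    curl_setopt($ch, CURLOPT_HTTPGET, true);      // step 2: switch back to GET
    $data = curl_exec($ch);                       // now download the body for real
}
curl_close($ch);
One caveat: some servers answer HEAD requests incorrectly (or not at all), so this check can reject files that a GET would serve fine.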
I know this is an old topic, but just in case people come here in the future:
With curl you can use CURLOPT_WRITEFUNCTION, which lets you register a callback that is invoked as soon as the response body starts arriving and needs to be written. At that moment the headers have already been received, so you can inspect them and cancel the transfer before the body is downloaded. All in one request.
For a deeper look and code examples see PHP/Curl: inspecting response headers before downloading body
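A minimal sketch of that callback approach (mine, based on the documented rule that a curl write/header callback aborts the transfer by returning a byte count different from what it was given; $url and process_chunk() are hypothetical, and the closures require PHP 5.3+):
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADERFUNCTION, function ($ch, $header) {
    // Abort as soon as a header disqualifies the file (here: bigger than 1 MB)
    if (stripos($header, 'Content-Length:') === 0
            && (int)trim(substr($header, 15)) > 1048576) {
        return -1; // "wrong" length => curl aborts, body is never downloaded
    }
    return strlen($header);
});
curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, $data) {
    process_chunk($data); // hypothetical handler; body only arrives if the headers passed
    return strlen($data);
});
if (curl_exec($ch) === false) {
    echo 'Cancelled or failed: ' . curl_error($ch);
}
curl_close($ch);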
Here is an example of how you can solve it:
// Include the Auth string in the headers
// together with the API version being used
// (assumes $auth already holds a valid GoogleLogin token)
$headers = array(
    "Authorization: GoogleLogin auth=" . $auth,
    "GData-Version: 3.0",
);

// Make the request
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, "http://docs.google.com/feeds/default/private/full");
curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // have curl_exec() return the body
$response = curl_exec($curl);
curl_close($curl);

// Parse the response
$response = simplexml_load_string($response);

// Output data
foreach ($response->entry as $file) {
    // Now you can do whatever you need if the file type is txt, e.g.:
    // if ($file->title == "txt")
    //     do something
    // else
    //     do something else
    echo "File: " . $file->title . "<br />";
    echo "Type: " . $file->content["type"] . "<br />";
    echo "Author: " . $file->author->name . "<br /><br />";
}