wkhtmltopdf failing to convert local pages to PDF - php

I've been trying to get wkhtmltopdf to convert pages on a website, and it's failing to convert pages that are on the same server. It will convert and store external pages (I tried it with Google and bbc.co.uk, both worked), so the permissions are fine, but if I try to convert a local page, either a static HTML file or one generated by a script, it takes around 3 minutes before failing.
The output says the page has failed to load; if I forcibly ignore this, I end up with a blank PDF.
I thought it might be session locking, but closing the session resulted in the same issue. I suspect it's something to do with the way the server is behaving, though.
Here's the code in question:
session_write_close ();
set_time_limit (0);
ini_set('memory_limit', '1024M');
Yii::app()->setTheme("frontend");
// Grabbing the page name
$ls_url = Yii::app()->request->getHostInfo().Yii::app()->request->url;
// Let's remove the PDF otherwise we'll be in endless loop
$ls_url = str_replace('.pdf','',$ls_url);
// Setting paths
$ls_basePath = Yii::app()->basePath."/../extras/wkhtmltopdf/";
if (PHP_OS == "Darwin")
    $ls_binary = $ls_basePath . "wkhtmltopdf-osx";
else
    $ls_binary = $ls_basePath . "wkhtmltopdf";
$ls_generatedPagesPath = $ls_basePath . "generated-pages/";
$ls_outputFileName = str_replace(array("/",":"),"-",$ls_url)."--".date("dmY-His").".pdf";
$ls_outputFile = $ls_generatedPagesPath. $ls_outputFileName;
// making sure no nasty chars are in place
$ls_command = escapeshellcmd($ls_binary ." --load-error-handling ignore " . $ls_url . " " . $ls_outputFile);
// Let's run things now
system($ls_command);
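One thing that can help when debugging this (not part of the original code) is to capture wkhtmltopdf's own output instead of discarding it, for example by swapping the system() call for something along these lines:
// Variation on the last line above: capture stdout/stderr so the reason
// for "failed to load" is visible; 2>&1 folds stderr into stdout.
exec($ls_command . " 2>&1", $la_output, $li_exitCode);
if ($li_exitCode !== 0) {
    Yii::log("wkhtmltopdf exited with code $li_exitCode: " . implode("\n", $la_output), 'error');
}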

Did you try fetching that exact URL with lynx from the server itself? Since wkhtmltopdf is essentially a small but powerful WebKit browser, it fails in the same places a normal browser would.
Check the URL you gave it, and check that any external URLs referenced within your page are accessible from your server. It loads CSS, external images, iframes, everything, before it even starts making the PDF.
Personally, I love wkhtmltopdf. Nothing beats it.
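To make that check concrete, here is a minimal sketch (assuming the cURL extension is available; $ls_url is the URL built in the question's code) that fetches the page from the server itself before handing it to wkhtmltopdf:
// Minimal sketch: confirm the page is reachable from the server itself.
// Assumes the cURL extension; $ls_url is the URL built in the question's code.
$ch = curl_init($ls_url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
$body = curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
if ($body === false || $status >= 400) {
    // If this fails, wkhtmltopdf will fail too -- fix DNS, vhost or firewall first.
    error_log("Page not reachable from this server (HTTP $status): $ls_url");
}
If the site's public hostname doesn't resolve or isn't reachable from the server itself (a common cause of exactly this "external pages work, local pages don't" symptom), this check will show it immediately.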

Related

php large file download timeout

First time posting, so sorry if I get anything wrong.
I'm trying to create a secure file download storefront. It actually works, but only with small files. I have a 1.9 GB product to download, and it keeps stopping partway through the transfer. The sizes are inconsistent too: I've had up to 1 GB, but often it is 200-500 MB.
The aim is to create a space where only users with a registered account can download the file, so a direct link is not possible.
I've read elsewhere on this site that resetting the script timeout within the file read loop should get around the script time limit.
try
{
    $num_bytes = filesize("products/" . $filename);
    $mp3content = fopen("products/" . $filename, "rb") or die("Couldn't get handle");
    $bytes_read = 0;
    if ($mp3content) {
        while (!feof($mp3content)) {
            set_time_limit(30); // reset the script timeout on every chunk
            $buffer = fread($mp3content, 4096);
            echo $buffer;
            $bytes_read += 4096;
        }
        fclose($mp3content);
    }
}
catch (Exception $e)
{
    error_log("User failed to download file: " . $row['FILENAME'] . "(" . $row['MIMETYPE'] . ")\n" . $e, 1, getErrorEmail());
}
error_log("Bytes downloaded:" . $bytes_read . " of " . $num_bytes, 1, getErrorEmail());
I don't receive the final error log email on large files that fail, but I do get the emails on smaller files that succeed, so I know the code works in principle.
It turns out my hosting is the issue. The PHP code is correct, but my shared hosting environment limits all PHP scripts to 30 seconds, and the code above takes about 15 minutes to run its course for a file this size. Unless someone can come up with a way of keeping PHP tied up in file-handling calls that don't count towards the timer, it looks like this one is stuck.
Try this one
set_time_limit(0);
I had the same problem, so I took a different approach.
When a file is requested, I make a hard link to it inside a randomly named directory under the "downloads" folder and give the user that link, valid for 4 hours.
The file URL ends up looking like this:
http://example.com/downloads/3nd83js92kj29dmcb39dj39/myfile.zip
Every call to the script scans the "downloads" folder and deletes any directories (and their contents) older than 4 hours, to keep things clean.
This is not safe against brute-force attacks, but that can be worked around.
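A rough sketch of that idea (the downloads/ layout, the 4-hour window and the helper code are assumptions, not the answerer's actual implementation; $filename is the purchased file, as in the question):
// Sketch: expire old link directories, then hard-link the product into a
// new random directory that the web server can serve directly.
$ttl = 4 * 3600; // links live for 4 hours

foreach (glob(__DIR__ . "/downloads/*", GLOB_ONLYDIR) as $dir) {
    if (filemtime($dir) < time() - $ttl) {
        array_map('unlink', glob("$dir/*"));
        rmdir($dir);
    }
}

$token = md5(uniqid(mt_rand(), true));
$dir = __DIR__ . "/downloads/$token";
mkdir($dir, 0755);
link(__DIR__ . "/products/" . $filename, "$dir/" . basename($filename));

// Hand this URL to the logged-in user; Apache serves the file, not PHP.
$url = "http://example.com/downloads/$token/" . rawurlencode(basename($filename));
This keeps PHP out of the transfer entirely, so the script timeout never comes into play.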

PHP Restriction when using image tracking

Good day everyone!
Well, here's the thing:
One .htaccess file with mod_rewrite redirects imagename.png (a non-existent file) to tracker.php (a real file). So when a user is looking at site.com/hello.png, they are actually being served the tracking script, which gathers information and stores it.
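For reference, the rewrite described here might look something like this in .htaccess (the exact image name is an assumption, not the poster's actual rule):
RewriteEngine On
# Send requests for the non-existent tracking image to the real PHP script
RewriteRule ^hello\.png$ tracker.php [L]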
tracker.php
<?php
$date = date('d-m-Y');
$time = date('H:i:s');
$ip = $_SERVER['REMOTE_ADDR'];
$ref = @$_SERVER["HTTP_REFERER"];
// Output the tracking image (a base64-encoded, deflated PNG)
header('Content-type: image/png');
echo gzinflate(base64_decode('6wzwc+flkuJiYGDg9fRwCQLSjCDMwQQkJ5QH3wNSbCVBfsEMYJC3jH0ikOLxdHEMqZiTnJCQAOSxMDB+E7cIBcl7uvq5rHNKaAIA'));
// Append the visit to the log file
$myFile = "tr.txt";
$fh = fopen($myFile, 'a');
fwrite($fh, $time . " | " . $date . " | " . $ip . " | " . $ref . " | \r\n\r\n");
fclose($fh);
?>
I am using it on my site to track visitors. Everything works fine; I can gather information about IP, web browser, referrer link, etc.
But my question is: what are the restrictions when doing this? I have been experimenting for a long time, and it seems like I can only use plain PHP (not echo markup for some other language).
I cannot redirect or echo text. Loops, if/else, variables etc. are working.
If I try to redirect, I can see the page attempting to connect to e.g. Google, but only for a second, so there is no actual redirect.
tl;dr
What are the restrictions in code when using a PHP file as an image?
I don't think there are any restrictions.
Your PHP code is executed and runs as usual.
What you output doesn't restrict PHP.
You mentioned a redirect there. You're loading an image in an HTML page (generated by PHP or not, it doesn't matter). The HTML page received its headers from the server, and your PHP-generated image also received its own headers (Content-Type: image/png). That is where a redirect would apply.
There's no reason why your HTML page would ever redirect because of a PHP header() call on your image.
If you load the PHP-generated image directly in your browser, I think you'll be able to redirect the user (directly, without displaying the image), but as an image in an HTML page it's not possible to affect the rest of the page, unless you use JavaScript.
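As a small illustration of that last point (file names here are hypothetical), the tracker itself can issue a redirect to a real image; only the image request follows it, and the HTML page embedding the image is unaffected:
<?php
// redirect-tracker.php -- hypothetical variant of the tracker above.
// Log the hit, then redirect the *image request* to a real PNG.
// An <img> tag pointing here still shows the final image; the page
// that embeds it does not navigate anywhere.
file_put_contents("tr.txt",
    date('H:i:s') . " | " . $_SERVER['REMOTE_ADDR'] . "\r\n",
    FILE_APPEND);
header("Location: /images/real-pixel.png");
exit;
?>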

Download a large XML file from an external source in the background, with the ability to resume download if incomplete

Some background information
The files I would like to download are kept on the external server for a week, and a new XML file (10-50 MB) is created there every hour with a different name. I would like the large file to be downloaded to my server chunk by chunk in the background each time my website is loaded, perhaps 0.5 MB each time, and then have the download resume the next time someone else loads the website. This would require my site to have at least 100 page loads each hour to stay updated, so perhaps a bit more of the file each time if that's possible. I have researched SimpleXML, XMLReader and SAX parsing, but whatever I do, it seems to take too long to parse the file directly, so I would like a different approach, namely downloading it as described above.
If I download a 30 MB XML file, I can parse it locally with XMLReader in only 3 seconds (250k iterations), but when I try to do the same against the external server, limited to 50k iterations, it takes 15 seconds to read just that small part, so it seems it is not possible to parse it directly from that server.
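For reference, a minimal sketch of the kind of local XMLReader loop described above (the element name "item" and the file name are placeholders, not the actual schema):
// Stream through a large local XML file node by node with XMLReader.
$reader = new XMLReader();
$reader->open("local_copy.xml");
$count = 0;
while ($reader->read()) {
    if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == "item") {
        $node = simplexml_load_string($reader->readOuterXml());
        // ... insert $node's fields into MySQL here ...
        $count++;
    }
}
$reader->close();
echo "Parsed $count items\n";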
Possible solutions
I think it's best to use cURL, but then again, perhaps fopen(), fsockopen(), copy() or file_get_contents() are the way to go. I'm looking for advice on which functions to use to make this happen, or for different solutions on how I can parse a 50 MB external XML file into a MySQL database.
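For the resume-in-chunks part specifically, one option is a cURL snippet along these lines (file names and chunk size are assumptions, and it only works if the remote server honours HTTP Range requests):
// Sketch: fetch the next ~0.5 MB of the remote file and append it locally.
$remote = "http://www.external-site-example.com/2012/01/01/12.xml"; // placeholder
$local  = "partial_download.xml";                                    // placeholder
$chunk  = 512 * 1024;

$have = file_exists($local) ? filesize($local) : 0;
$fp   = fopen($local, "ab"); // append to whatever we already have

$ch = curl_init($remote);
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_RANGE, $have . "-" . ($have + $chunk - 1));
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
curl_exec($ch);
curl_close($ch);
fclose($fp);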
I suspect a cron job every hour would be the best solution, but I am not sure how well that is supported by web hosting companies, and I have no clue how to set something like that up. But if that's the best solution, and the majority thinks so, I will have to do my research in that area too.
If a Java applet or JavaScript running in the background would be a better solution, please point me in the right direction when it comes to functions/methods/libraries there as well.
Summary
What's the best solution for downloading parts of a file in the background, and resuming the download each time my website is loaded, until it's completed?
If the above solution would be moronic to even try, what language/software would you use to achieve the same thing (download a large file every hour)?
Thanks in advance for all answers, and sorry for the long story/question.
Edit: I ended up using this solution to get the files, with a cron job scheduling a PHP script. It checks my folder for which files I already have, generates a list of the possible downloads for the last four days, then downloads the next XML file in line.
<?php
$date = new DateTime();
$current_time = $date->getTimestamp();
$four_days_ago = $current_time - 345600;

echo 'Downloading: '."\n";
for ($i = $four_days_ago; $i <= $current_time; ) {
    $date->setTimestamp($i);
    if ($date->format('H') !== '00') {
        $temp_filename = $date->format('Y_m_d_H') . "_full.xml";
        if (!glob($temp_filename)) {
            $temp_url = 'http://www.external-site-example.com/' . $date->format('Y/m/d/H') . ".xml";
            echo $temp_filename . ' --- ' . $temp_url . '<br>' . "\n";
            break; // with a break here, this loop will only return the next file you should download
        }
    }
    $i += 3600;
}

set_time_limit(300);
$Start = getTime();

$objInputStream = fopen($temp_url, "rb");
$objTempStream  = fopen($temp_filename, "w+b");
stream_copy_to_stream($objInputStream, $objTempStream, (1024*200000));
fclose($objInputStream);
fclose($objTempStream);

$End = getTime();
echo '<br>It took ' . number_format(($End - $Start), 2) . ' secs to download "' . $temp_filename . '".';

function getTime() {
    $a = explode(' ', microtime());
    return (double) $a[0] + $a[1];
}
?>
Edit 2: I just wanted to mention that there is a way to do what I asked, only it wouldn't work in my case. With the amount of data I need, the website would have to have 400+ visitors an hour for it to work properly. But with smaller amounts of data there are some options: http://www.google.no/search?q=poormanscron
You need to have a scheduled, offline task (e.g., cronjob). The solution you are pursuing is just plain wrong.
The simplest thing that could possibly work is a php script you run every hour (scheduled via cron, most likely) that downloads the file and processes it.
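For example, a crontab entry along these lines runs such a script at the top of every hour (the PHP binary and script paths are placeholders):
# m h dom mon dow  command
0 * * * * /usr/bin/php /path/to/fetch_xml.php >> /var/log/fetch_xml.log 2>&1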
You could try fopen:
<?php
$handle = fopen("http://www.example.com/test.xml", "rb");
$contents = stream_get_contents($handle);
fclose($handle);
?>

Serving php as css/js: Is it fast enough? What drawbacks are there?

I've recently started getting into the area of optimizing performance and load times client-side: compressing CSS/JS, gzipping, paying attention to YSlow, etc.
I'm wondering, while trying to achieve all these micro-optimizations, what are the pros and cons of serving php files as css or javascript?
I'm not entirely sure where the bottleneck is, if there is one. I would assume that between an identical css and php file, the "pure" css file would be slightly faster simply because it doesn't need to parse php code. However, in a php file you can have more control over headers which may be more important(?).
Currently I'm doing a filemtime() check on a "trigger" file, and with some PHP voodoo I write a single compressed CSS file from it, combined with several other files in a defined group. This creates a file like css/groupname/301469778.css, which the PHP template catches and uses to update the HTML tags with the new file name. It seemed like the safest method, but I don't really like the server cache getting filled up with junk CSS files after several edits. I also don't bother doing this for small "helper" CSS files that are only loaded for certain pages.
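A minimal sketch of that kind of regeneration step (the group name, file list and paths are assumptions, not the poster's actual code):
// Rebuild a combined, timestamp-named CSS file when the "trigger" file
// is newer than the last build. Names and paths are placeholders.
$group   = "groupname";
$files   = array("css/reset.css", "css/layout.css", "css/forms.css");
$trigger = "css/trigger.css";
$dir     = "css/$group/";

$built  = glob($dir . "*.css");
$latest = $built ? max(array_map('filemtime', $built)) : 0;

if (filemtime($trigger) > $latest) {
    $css = "";
    foreach ($files as $f) {
        $css .= file_get_contents($f) . "\n";
    }
    // Crude "compression": strip comments and collapse whitespace.
    $css = preg_replace('!/\*.*?\*/!s', '', $css);
    $css = preg_replace('/\s+/', ' ', $css);
    $name = $dir . time() . ".css"; // e.g. css/groupname/301469778.css
    file_put_contents($name, $css);
}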
If 99% of my output is generated by php anyway, what's the harm (if any) in using php to directly output css/js content? (assuming there are no php errors)
If using php, is it a good idea to mod_rewrite the files to use the css/js extension for any edge cases of browser misinterpretation? Can't hurt? Not needed?
Are there any separate guidelines/methods for css and javascript? I would assume that they would be equal.
Which is faster: A single css file with several #imports, or a php file with several readfile() calls?
What other ways does using php affect speed?
Once the file is cached in the browser, does it make a difference anymore?
I would prefer to use php with .htaccess because it is much simpler, but in the end I will use whatever method is best.
OK, so here are your direct answers:
No harm at all, as long as your code is fine. The browser won't notice any difference.
No need for mod_rewrite. Browsers usually don't care about the URL (and often not even about the MIME type).
CSS files are usually smaller, and often one file is enough, so there is no need to combine. Be aware that combining files from different directories affects images referenced in the CSS, as their URLs remain relative to the CSS URL.
readfile() will definitely be faster, as #import requires multiple HTTP requests and you want to reduce those as much as possible.
For a single HTTP request, PHP may be slightly slower, but you lose the possibility of combining files unless you do that offline.
No, but browser caches are unreliable, and an improper web server config may cause the browser to unnecessarily re-fetch the URL.
It's impossible to give you a much more concrete answer because it depends a lot on your project details.
We are developing a really large DHTML/AJAX web application with about 2+ MB of JavaScript code, and it still loads quickly with some optimizations:
Try to reduce the number of script URLs included. We use a simple PHP script that loads a bunch of .js files and sends them in one go to the browser (all concatenated). This will load your page a lot faster when you have a lot of .js files, as we do, since the overhead of setting up an HTTP connection is usually much higher than actually transferring the content itself. Note that the browser needs to download JS files synchronously.
Be cache friendly. Our HTML page is also generated via PHP, and the URL to the scripts contains a hash that depends on the file modification times. The PHP script above that combines the .js files then checks the HTTP cache headers and sets a long expiration time, so that the browser does not even have to load any external scripts the second time the user visits the page.
GZIP-compress the scripts. This will reduce your code size by about 90%. We don't even have to minify the code (which makes debugging easier).
So, yes, using PHP to send the CSS/JS files can improve the loading time of your page a lot - especially for large pages.
EDIT: You may use this code to combine your files:
function combine_files($list, $mime) {
    if (!is_array($list))
        throw new Exception("Invalid list parameter");
    ob_start();
    $lastmod = filemtime(__FILE__);
    foreach ($list as $fname) {
        $fm = @filemtime($fname);
        if ($fm === false) {
            $msg = $_SERVER["SCRIPT_NAME"].": Failed to load file '$fname'";
            if ($mime == "application/x-javascript") {
                echo 'alert("'.addcslashes($msg, "\0..\37\"\\").'");';
                exit(1);
            } else {
                die("*** ERROR: $msg");
            }
        }
        if ($fm > $lastmod)
            $lastmod = $fm;
    }
    //--
    $if_modified_since = preg_replace('/;.*$/', '',
        $_SERVER["HTTP_IF_MODIFIED_SINCE"]);
    $gmdate_mod = gmdate('D, d M Y H:i:s', $lastmod) . ' GMT';
    $etag = '"'.md5($gmdate_mod).'"';
    if (headers_sent())
        die("ABORTING - headers already sent");
    if (($if_modified_since == $gmdate_mod) or
        ($etag == $_SERVER["HTTP_IF_NONE_MATCH"])) {
        if (php_sapi_name()=='CGI') {
            Header("Status: 304 Not Modified");
        } else {
            Header("HTTP/1.0 304 Not Modified");
        }
        exit();
    }
    header("Last-Modified: $gmdate_mod");
    header("ETag: $etag");
    fc_enable_gzip();
    // Cache-Control
    $maxage = 30*24*60*60; // 30 days (versioning is handled via the hash in the HTML code!)
    $expire = gmdate('D, d M Y H:i:s', time() + $maxage) . ' GMT';
    header("Expires: $expire");
    header("Cache-Control: max-age=$maxage, must-revalidate");
    header("Content-Type: $mime");
    echo "/* ".date("r")." */\n";
    foreach ($list as $fname) {
        echo "\n\n/***** $fname *****/\n\n";
        readfile($fname);
    }
}

function files_hash($list, $basedir="") {
    $temp = array();
    $incomplete = false;
    if (!is_array($list))
        $list = array($list);
    if ($basedir!="")
        $basedir="$basedir/";
    foreach ($list as $fname) {
        $t = @filemtime($basedir.$fname);
        if ($t===false)
            $incomplete = true;
        else
            $temp[] = $t;
    }
    if (!count($temp))
        return "ERROR";
    return md5(implode(",",$temp)) . ($incomplete ? "-INCOMPLETE" : "");
}

function fc_compress_output_gzip($output) {
    $compressed = gzencode($output);
    $olen = strlen($output);
    $clen = strlen($compressed);
    if ($olen)
        header("X-Compression-Info: original $olen bytes, gzipped $clen bytes ".
            '('.round(100/$olen*$clen).'%)');
    return $compressed;
}

function fc_compress_output_deflate($output) {
    $compressed = gzdeflate($output, 9);
    $olen = strlen($output);
    $clen = strlen($compressed);
    if ($olen)
        header("X-Compression-Info: original $olen bytes, deflated $clen bytes ".
            '('.round(100/$olen*$clen).'%)');
    return $compressed;
}

function fc_enable_gzip() {
    if (isset($_SERVER['HTTP_ACCEPT_ENCODING']))
        $AE = $_SERVER['HTTP_ACCEPT_ENCODING'];
    else
        $AE = $_SERVER['HTTP_TE'];
    $support_gzip = !(strpos($AE, 'gzip')===FALSE);
    $support_deflate = !(strpos($AE, 'deflate')===FALSE);
    if ($support_gzip && $support_deflate) {
        // both supported: honour the (externally defined) $PREFER_DEFLATE setting
        $support_deflate = $PREFER_DEFLATE;
    }
    if ($support_deflate) {
        header("Content-Encoding: deflate");
        ob_start("fc_compress_output_deflate");
    } else {
        if ($support_gzip) {
            header("Content-Encoding: gzip");
            ob_start("fc_compress_output_gzip");
        } else {
            ob_start();
        }
    }
}
Use files_hash() to generate a unique hash string that changes whenever your source files change, and combine_files() to send the combined files to the browser. So, use files_hash() when generating the HTML code for the <script> tag, and combine_files() in the PHP script that is loaded via that tag. Just place the hash in the query string of the URL.
<script language="JavaScript" src="get_the_code.php?hash=<?=files_hash($list_of_js_files)?>"></script>
Make sure you specify the same $list in both cases.
You're talking about serving static files via PHP; there's really little point in doing that, since it is always going to be slower than Apache serving a normal file. A CSS #import will be quicker than PHP's readfile(), but the best performance will be gained by serving one minified CSS file that combines all the CSS you need to use.
It sounds like you're on the right track, though. I'd advise pre-processing your CSS and saving it to disk. If you need to set special headers for things like caching, just do this in your VirtualHost directive or .htaccess file.
To avoid lots of cached files you could use a simple file-naming convention for your minified CSS. For example, if your main CSS file is called main.css and it references reset.css and forms.css via #imports, the minified version could be called main.min.css.
When this file is regenerated, it simply replaces the old one. If you include a reference to that file in your HTML, you could send the request to PHP if the file doesn't exist, combine and minify the files (via something like YUI Compressor), save the result to disk, and have it served via normal HTTP for all future requests.
When you update your CSS, just delete the main.min.css version and it will automatically be regenerated.
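A rough sketch of that regenerate-on-miss idea (the file names, and the assumption that the web server rewrites requests for missing .css files to this script, are mine, not the answerer's):
// build_css.php -- regenerate main.min.css when it is missing, then serve it.
$source   = "main.css";
$minified = "main.min.css";

if (!file_exists($minified)) {
    // Inline the #imports from main.css (very naive; a real build would use a minifier).
    $css = file_get_contents($source);
    $css = preg_replace_callback('/@import\s+url\(["\']?([^"\')]+)["\']?\);/',
        function ($m) { return file_get_contents($m[1]); },
        $css);
    $css = preg_replace('/\s+/', ' ', $css); // crude minification
    file_put_contents($minified, $css);
}

header("Content-Type: text/css");
readfile($minified); // subsequent requests hit the static file directly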
You can do the preprocessing with an Ant build. Sorry, the post is in German, but I tried translate.google.com and it worked fine :-) So you can use the post as a tutorial to achieve better performance...
I would preprocess the files and save them to disk, just like simonrjones said. Caching and the like should be handled by the dedicated components: the Apache web server, HTTP headers and the browser.
While it is slower, one advantage of (or reason for) doing this is the ability to put dynamic content into the files on the server, while still having them appear to be JS or CSS from the client's perspective.
Like this, for example, passing the environment from PHP to JavaScript:
var environment = "<?= getenv('APPLICATION_ENV'); ?>";
// More JS code here ...

Creating files on a time (hourly) basis

I'm experimenting with the Twitter streaming API.
I use Phirehose to connect to Twitter and fetch the data, but I'm having problems storing it in files for further processing.
Basically what I want to do is to create a file named
date("YmdH") . ".txt"
for every hour of connection.
Here is what my code looks like right now (not handling the hourly change of files):
public function enqueueStatus($status)
{
    $data = json_decode($status, true);
    if (isset($data['text']) /*more conditions here*/) {
        $fp = fopen("/tmp/$time.txt");
        fwrite($fp, $status);
        fclose($fp);
    }
}
Help is as always much appreciated :)
You want the 'append' mode in fopen - this will either append to a file or create it.
if (isset($data['text']) /*more conditions here*/) {
    $fp = fopen("/tmp/" . date("YmdH") . ".txt", "a"); // "a" = append, creates the file if needed
    fwrite($fp, $status);
    fclose($fp);
}
From the Phirehose Google Code wiki:
As of Phirehose version 0.2.2 there is an example of a simple "ghetto queue" included in the tarball (see the files ghetto-queue-collect.php and ghetto-queue-consume.php) that shows how statuses could be easily collected onto the filesystem for processing and then picked up by a separate process (consume).
This is a complete working sample of doing what you want to do, and the rotation time interval is configurable. Additionally, there's another script to consume and process the written files.
Now if only I could find a way to stop the whole script; my log keeps filling up (the script continues executing) even after I close the browser tab :P
