I need to write a text file viewer (not the directory tree, but the actual file contents) for use in a browser. It will be used to view large files. I want to give the user the ability to actually browse the file, i.e. prev page & next page buttons, with each page showing only a portion of the file.
Two questions:
Is there any way to pass the file descriptor through POST (or something) so that on each page I can keep reading from an already open file, rather than starting all over again? (Again: huge files.)
Is there a way to read the file backwards? It would be very useful for browsing back through a file.
Any other implementation ideas are very welcome. Thanks
Keeping the file open between requests is not a good idea - you don't have to "start all over again" - just maintain an offset and use fseek() to jump to that offset. That way, you can also implement the "backwards jumping".
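A minimal sketch of that offset-based approach; the parameter name, page size, and file path below are just placeholders:

<?php
// page.php?offset=0 - returns one "page" of the file starting at a byte offset
$pageSize = 64 * 1024; // bytes per page
$offset   = isset($_GET['offset']) ? max(0, (int)$_GET['offset']) : 0;
$path     = '/var/data/big.log'; // fixed server-side path, never taken from user input

$fh = fopen($path, 'rb');
if ($fh === false) {
    header('HTTP/1.0 404 Not Found');
    exit;
}
fseek($fh, $offset);            // jump straight to the requested offset
$chunk = fread($fh, $pageSize); // read just this page
fclose($fh);

header('Content-Type: text/plain; charset=utf-8');
echo $chunk;

The "next" button then links to offset + 65536 and the "prev" button to max(0, offset - 65536), which also gives you the backwards browsing.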
Cut your huge files into smaller files once, and then serve the small files to the user.
You should consider pagination. If you're concerned about the user being frustrated by needing to click "next" too often, you could make each chunk reasonably large (so a typical reader only pages every 20 minutes or so).
Another option is chunked transfer encoding (see the Wikipedia entry on Chunked transfer encoding). This would allow your server to respond quickly and give the user something to read while it streams the rest of the file over the network (rather than the server needing to read in the whole file and send it all at once). This could dramatically improve the perceived performance compared to serving the files normally, but it still consumes a lot of bandwidth for your server.
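A rough sketch of what streaming from PHP can look like, assuming output buffering and compression are switched off so each flush() actually reaches the client (the path is only an example):

<?php
// stream.php - send the file in small pieces instead of buffering it all first
$fh = fopen('/var/data/big.log', 'rb');
header('Content-Type: text/plain; charset=utf-8');
while (!feof($fh)) {
    echo fread($fh, 8192); // send 8 KB at a time
    flush();               // push it out so the user sees content early
}
fclose($fh);

Because no Content-Length is set, an HTTP/1.1 server will typically deliver this with chunked transfer encoding on its own.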
You might be able to simulate a large document with Javascript and AJAX, but only send pieces at a time for better performance.
Consider sending a few pages' worth of your document and attaching listeners to the browser's scroll event. Over time, or as the user scrolls down, you AJAX in more chunks. This creates a few annoying UX edge cases, like:
Scroll bar indicates a much smaller document than there actually is
You might be able to avoid this by filling in the bottom of your document with many page breaks, but it'll be difficult to make the length perfect.
Scrolling past the point of currently-available content will show a blank page.
You could detect this using JavaScript and display a "loading" icon to let the user know what's going on.
Built-in "find" feature doesn't work
Hard to avoid this without the user downloading the entire document, but you could provide your own search feature for them to use instead (not as good but perhaps adequate).
Really though, you're probably best off with pagination with medium-sized pages. It's a very well understood design pattern that's relatively easy (compared to the other options, at least) to implement and make fast.
Hope that helps!
If JavaScript and CSS files were included inside of pages, it would cut down the number of HTTP requests and therefore make the page load faster. I feel like I am missing something, because it seems like any organization interested in lightning-quick pages would do this. However, as I look at the source code, I don't recall any sites inlining tons of CSS and JavaScript in their pages.
Questions:
What errors are in my statements above?
What are the drawbacks of this approach (shown in the title via pseudocode)?
If the data is in an external file it can be cached and reused on other pages (or the same page, revisited) without having to fetch it over the network again.
You get a minor performance penalty on the first page in exchange for a major performance enhancement on subsequent pages.
Modularity is a major concern:
I can pick and choose which javascript and css files I want per page: otherwise I'd have a ton of css and javascript files that have all the different configurations (which is just messy).
I can also cache a file and hand it to someone else faster
An example of this happening is when sites chuck their images together into one PNG file and then use CSS to slice out the bits they want for buttons etc.
Another aspect, and not only for inline CSS and JavaScript: when I write code I hate to repeat myself. Repetition leads to errors, is difficult to maintain (update/edit), and wastes time and space. Putting CSS or JavaScript once in a file that gets downloaded once is less error prone, easier to maintain, and less wasteful of time and space.
Here's a bit of history first: I recently finished an application that allows me to upload images and store them in a directory; it also stores the information about each file in a database. The database stores the location and name, and gives each file an ID (auto_increment).
Okay, so what I'm doing now is allowing people to insert images into posts. I'm throwing a few ideas around about the best way to do this, as the application I designed allows people to move files around, and I don't want images in posts to break if an image is moved to a different directory (hence the storing of IDs).
What I'm thinking of doing is when linking to images, instead of linking to the file directly, I link it like so:
<img src="/path/to/functions.php?method=media&id=<IMG_ID_HERE>" alt="" />
So it takes the ID, searches the database, then from there determines the mime type and what not, then spits out the image.
So really, my question is: Is this the most efficient way?
Note that on a single page there could be from 3 to 30 images, all making a call to this function.
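For concreteness, here is a rough sketch of what that functions.php handler could look like; the table and column names (media, path, mime) and the connection details are just assumptions:

<?php
// functions.php?method=media&id=123 - look up the file by ID and stream it out
$id  = isset($_GET['id']) ? (int)$_GET['id'] : 0;
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass'); // example credentials

$stmt = $pdo->prepare('SELECT path, mime FROM media WHERE id = ?');
$stmt->execute(array($id));
$row = $stmt->fetch(PDO::FETCH_ASSOC);

if (!$row || !is_file($row['path'])) {
    header('HTTP/1.0 404 Not Found');
    exit;
}

header('Content-Type: ' . $row['mime']);
header('Content-Length: ' . filesize($row['path']));
readfile($row['path']); // streams the file without loading it fully into memory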
Doing that should be fine as long as you are aware of the memory limits configured in both PHP and the web server. (Though you'll run into those limits merely by receiving the uploaded file in the first place.)
Otherwise, if you're strict about this being just for images, it could prove more efficient to go with Mike B's approach. Design a static area and just drop the images off in there, and record those locations in the records for their associated post. It's less work, and less to worry about... and I'm willing to bet your web server is better at serving files than most developers' custom application code will be.
Normally, I would recommend keeping the src of an image static (pointing at the file, instead of a PHP script). But if you're allowing users to move images around the filesystem, you need a way to track them.
Some form of caching would help reduce the number of database calls required to fetch the filesystem location of each image. It should be pretty easy to put an indefinite TTL on the cache and invalidate it when an image is moved.
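For example (assuming the APCu extension is available; memcached or any other cache would do the same job), a hypothetical lookup helper might cache like this:

<?php
// Returns the filesystem path for an image ID, hitting the database only on a cache miss.
function imagePath($id, PDO $pdo) {
    $key  = 'img_path_' . (int)$id;
    $path = apcu_fetch($key, $found);
    if (!$found) {
        $stmt = $pdo->prepare('SELECT path FROM media WHERE id = ?');
        $stmt->execute(array((int)$id));
        $path = $stmt->fetchColumn();
        apcu_store($key, $path); // no TTL: keep it until explicitly invalidated
    }
    return $path;
}

// Whenever the application moves an image, invalidate its entry:
// apcu_delete('img_path_' . $id);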
I don't think you should worry about that, what you have planned sounds fine.
But if you want to go out of your way to minimise requests or whatever, you could instead do the following: when someone embeds an image in a post, replace the anchor tag with some special character sequence, like [MYIMAGE=1234] or something. Then when a page with one or more posts is viewed, search through all the posts to find all the [MYIMAGE=] sequences, query the database to get all of the images' locations, and then output the posts with the [MYIMAGE=] sequences replaced with the appropriate anchor tags. You might or might not want to make sure users cannot directly add [MYIMAGE=] tags to their submitted content.
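A sketch of that replacement step (table and column names are assumptions): collect every [MYIMAGE=n] ID, resolve them all in one query, then swap the tags for real image tags.

<?php
function expandImageTags($html, PDO $pdo) {
    if (!preg_match_all('/\[MYIMAGE=(\d+)\]/', $html, $m)) {
        return $html; // nothing to replace
    }
    $ids = array_values(array_unique($m[1]));
    $in  = implode(',', array_fill(0, count($ids), '?'));
    $stmt = $pdo->prepare("SELECT id, path FROM media WHERE id IN ($in)");
    $stmt->execute($ids);
    $paths = $stmt->fetchAll(PDO::FETCH_KEY_PAIR); // id => path

    return preg_replace_callback('/\[MYIMAGE=(\d+)\]/', function ($match) use ($paths) {
        return isset($paths[$match[1]])
            ? '<img src="' . htmlspecialchars($paths[$match[1]]) . '" alt="" />'
            : ''; // unknown ID: drop the tag
    }, $html);
}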
The way you have suggested will work, and it's arguably the nicest solution, but I should warn you that I've tried something similar before and it completely fell apart under load. The database seemed to be keeping up, but the script would start to time out and the image wouldn't arrive. That was probably down to some particular server configuration, but it's worth bearing in mind.
Depending on how much access you have to the server it's running on, you could just create a symlink whenever the user moves a file. It's a little messy but it'll be fast and reliable, and will also handle collisions if a user moves a file to where another one used to be.
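A minimal sketch of that idea, with purely hypothetical paths; the move handler leaves a symlink behind at the old location:

<?php
$old = '/var/www/uploads/photos/cat.jpg';  // where the file used to be
$new = '/var/www/uploads/archive/cat.jpg'; // where the user moved it to

rename($old, $new);  // move the real file
symlink($new, $old); // the old path now points at the new location, so existing links keep working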
Use the format proposed by Hammerite, and use [MYIMAGE=1234] tags (or something similar).
You can then fetch the id-path mappings before display, and replace the [MYIMAGE] tags with proper tags which link to images directly. This will yield much better performance than outputting images using php.
You could even bypass the database completely, and simply use image paths like (for example) /images/hash(IMAGEID).jpg.
(If there are different file formats, use [MYIMAGE=1234.png], so you can append png/jpg/whatever without a database call)
If the need arises to change the image locations, output method, or anything else, you only need to change the method where [MYIMAGE] tags are converted to full file paths.
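A small sketch of the database-free variant described above; the directory layout and hashing scheme are only assumptions:

<?php
// Convert a [MYIMAGE=1234.png] tag into a public path without touching the database.
function imageTagToPath($tag) {
    if (!preg_match('/\[MYIMAGE=(\d+)\.(png|jpe?g|gif)\]/i', $tag, $m)) {
        return null;
    }
    // Files on disk are named after a hash of the ID, so no lookup is needed.
    return '/images/' . md5($m[1]) . '.' . strtolower($m[2]);
}

echo imageTagToPath('[MYIMAGE=1234.png]'); // /images/<md5 of "1234">.png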
I'm creating a website from scratch and I was really into this in the late 90's, but the web has changed a lot since then! And I'm more of a designer, so when I started putting this site together, I basically did a system of PHP includes to make the site more "dynamic".
When you first visit the site, you'll be presented with a logon screen if you're not already logged on (cookies). If you're not logged on, a page called access.php is introduced.
I thought I'd preload the heaviest images at this point, so that when the user is done logging on, the images are already cached. And this is working as I want. But I still notice that the biggest image isn't rendered immediately anyway, so it seems kind of pointless.
All of this has made me rethink how the site is structured and how scripts and css files are loaded. Using FireBug and YSlow with Firefox I see a few pointers like expires headers and reducing the size of each script. But is this really the culprit?
For example, would this be really really stupid in the main index.php? The entire site is basically structured like this
<?php
require("dbconnect.php");
?>
<?php
include ("head.php");
?>
And below this is basically just the body and the content of the site.
Head.php however consists of the doctype, head portions, linking of two css style sheets, jQuery library, jQuery validation engine, Cufon and Cufon font file, and then the small Cufon.Replace snippet.
The rest of the body comes with the index.php file, but at the bottom of this again is an include of a file called "footer.php", which basically consists of loading a couple of jsLoader scripts, a slide panel, and then a JS function.
All of this makes the end page source look like a typical complete webpage, but I'm wondering if any of you can see immediately that "this is really really stupid" and "don't do that, do this instead" etc. :) Are includes a bad way to go?
This site is also pretty image intensive and I can probably do a little more optimization.
But I don't think that's the primary culprit. YSlow gives me a report of what takes up the most space:
doc(1) - 5.8K
js(5) - 198.7K
css(2) - 5.6K
cssimage(8) - 634.7K
image(6) - 110.8K
I know it looks like it's cssimage(8) that weighs the most, but I've already preloaded these images from before and it doesn't really affect the rendering.
To speed things up a little, you could assemble all your images into one image sprite, so that you have only one request to download all the images. But that requires you to fine-tune your CSS to display just the small part of the sprite you need.
For a better explanation, check out: http://css-tricks.com/css-sprites/
Another answer that could seem a little stupid, but I like to think of this when I make a website: just keep it simple. I mean, does all your JS add real value? Are all these images necessary? Could you display less and make a lighter design? I'm not criticizing your work at all, just a suggestion...
I used the following approach on an extranet project:
Using jQuery and an array of file names, I AJAX in all the images, .js and .css files so that they are preloaded in the cache. As I iterate through the array, I update a progress bar on the screen that indicates that the site is loading, much like a Flash loader.
It worked well.
What I would do is show the loading page by default with pure CSS and HTML, then wait for jQuery to load and preload the images with ImageLoader. Once that is done, redirect to the normal website; since the images will already be in the cache, they won't be loaded again.
Another optimization you can do is minify all JS files and combine them all except jquery.js. Put jquery.js first in your HTML so it loads first. Also put your SCRIPT tags at the bottom of the HTML.
It sounds like you have pretty much nailed preloading: if you have loaded a resource once and the expiry header is set correctly, it is preloaded, no matter what kind of content it is.
File combination can be key to a quick website; each extra file adds load time. In the worst cases of network and server lag you might add up to an extra second for each separate file; more commonly it will be around 100-200 milliseconds per file.
If not already minified, minify the scripts, and put them in the same file, just remember to keep the order. I have no idea why Ivo Sabev wouldn't include jQuery.
Same thing with the CSS files.
How much have you done to test image compression? There can really be a gain from trying out different compression settings and comparing size vs. quality. For PNG images, IrfanView with PNGOUT can often make files 25% smaller than other programs. On top of that, a very big gain in size reduction can be achieved by reducing the image to 8-bit colour; with a lot of graphic elements you simply can't tell the difference. Right here on Stack Overflow there is a great example of well compressed and stacked images in the editor control buttons: http://sstatic.net/so/Img/wmd-buttons.png
I am currently developing a web site with PHP + MySQL and jQuery. So far I have been doing it on my local machine. I notice that when I view the page, the images on it take some time to load (not long, but it's visible). All images are small (PNGs of less than 3 KB). Now, when I load the page, there are some database connections happening in order to get some data that I will display.
I am not sure if this loading time issue has something to do with the number of images, or with the time that the PHP script + the DB connections take to execute. (I have very little data in my database, so I wouldn't assume that's the case.)
My question is: Is it a good approach to preload all the images at the beginning of each page? I tried it with jQuery and it works fine. I'm just not sure which disadvantages it brings. For example, to do so, do I need to include the jQuery library at the beginning of the page? I thought that was bad practice.
If these PNGs are stored in the database as BLOBs — not clear from your question — don't do that. Serving images from a DB through PHP is not as efficient as letting the web server serve them straight from the filesystem. If the images are tied to particular records, just name the PNG after the row ID, so you can find it in a directory dedicated to storing those images. The PHP code then just generates the URL that points to the PNG file on disk, so the web server can send them statically.
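As a rough sketch of that naming scheme (paths and field names here are only examples): store the upload under its row ID and build the URL from the ID alone.

<?php
// After inserting the row for the upload; $pdo is an existing PDO connection (assumed).
$id = (int)$pdo->lastInsertId();
move_uploaded_file($_FILES['img']['tmp_name'], "/var/www/static/img/$id.png");

// Rendering the page later needs no DB lookup and no PHP pass-through for the image itself:
echo '<img src="/static/img/' . $id . '.png" alt="" />';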
I don't think preloading the images within the same page is going to buy you anything. If anything, it might slow the apparent overall page load time because the browser can only retrieve a fixed number of resources concurrently, typically 2-4. Loading images at the top of the <body> means there are other things at the top of the page "above the fold" that have to wait for some HTTP connection slots to free up. Better to let the images load in their natural order.
Preloading makes sense in two situations:
The image isn't shown by default, but is expected to be needed as the user interacts with the page. Good examples of this are the hover and click state images for rollovers.
The image isn't used on this page, but will be needed on the next. Good examples of this are any site where there is a clear progression from one page to the next, like in a shopping cart.
Either way, do the preload at the very bottom of the <body>, so everything else loads first.
Having addressed those two issues, run YSlow on your site. It started out as a plugin for Firebug, which in turn is a plugin for Firefox, but it's since been ported to all major browsers except IE.
The beauty of YSlow is that it detects common slowdowns automatically, just by loading the page while the extension is active. It then gives you a clear grade for the page, so you can judge when you're "done" optimizing. If you're below an A, you're not done yet. :) It's not uncommon to see sites rating a D or worse, because the default configuration for web servers is conservative to avoid causing problems. Fixing YSlow warnings is generally pretty easy, but you have to be careful to avoid creating caching and other problems, which is why the default server config doesn't do these things.
Another answer recommended the Google PageSpeed offering. It's available as a plugin for Chrome and Firefox, as a server-side Apache module, and as a Google-hosted service. I have no idea how it compares to YSlow, but it looks interesting.
Also consider using the browser's debugger to get a waterfall graph of resource load times:
In Firebug you get this in the Net tab.
In Safari, you get to it via the Develop menu, which is normally disabled in Preferences. Turn it on if needed, then say Develop > Start Timeline Recording. That puts you into the Network Requests instrument. You can also get to it through Develop > Show Web Inspector.
In Chrome, say View > Developer > Developer Tools, then go to the Network tab.
IE has a very weak form of this, via Tools > Developer Tools > Profiler. It just gives a table of numbers, rather than a waterfall graph, so the information is there, but you can't just visually scan for long bars to find the slowest resources.
You should use the Page Speed plugin from Google to check what data takes the most time to load. It will show separate load times for images as well.
If you're using lots of small PNGs, I suggest combining them into one image and controlling what is displayed via the CSS background property, since they are part of the styling and not the information. That way, instead of several images, only one will be loaded and reused across all elements. Preloading a single image is also really easy.
Have you considered using CSS Sprites to combine all of your images into a single download? There are a number of tools online to help you do this, and it will significantly reduce the number of HTTP requests required by your page.
Make sure you have set the correct Expires header on your images to allow them to be cached.
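If any of the images are served through PHP anyway, the same caching headers can be set there; a minimal example (the one-year lifetime is only an illustration, and for static files the web server config is the usual place for this):

<?php
header('Cache-Control: public, max-age=31536000'); // cache for one year
header('Expires: ' . gmdate('D, d M Y H:i:s', time() + 31536000) . ' GMT');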
Finally, take a look at YSlow, which can provide you with further optimisation tips.
I want to display a large text file (100 MB log files, specifically) to a user via a web interface without requiring the user to download the entire file. Obviously returning the entire file to someone's web browser would not be sensible, so my theory was to use Ajax to fetch portions of the file depending on the user scrolling through the file, similar to the way Google Maps provides a "window" of the map.
My application server is PHP, and I'm fairly sure I can perform the appropriate seeks and reads through the file and return the results via XHR to the application, but my Ajax framework is Dojo and I can't think of any standard dijit that would work, so I'm trying to figure out how best to implement something.
Should I derive my own widget? Is there already something out there that I am not aware of? If I build my own custom widget, what sort of structure should it take and are there any good resources for developing custom widgets for dojo/dijit? Any other thoughts?
This seems to be a tutorial on what you might need. I would suggest that you use an li per line, because you will end up wanting to perform some actions on each line; most likely each line will be relevant.
Scrolling is nice, but you can also just blit the interface with pagination, meaning they click next page, previous page, and you fetch it, then update the view. That's the easiest method. With scrolling, you'll need to get more above and below the current visible lines for seamless scrolling.
For instance, if you want to show 25 lines, you'll need to fetch 25 + bottom pad on the first go, and define the lines showing in bottom pad as the threshold for signalling a new event to download an extra 25+ bottom pad items.
With a 100 MB file, that's going to get sluggish soon, so you'll have to clear out the previous entries and define a new top pad to signal a request to fetch in the reverse direction. That is to say: 1st request, fetch 25 + bottom pad; 2nd request, fetch 25 + bottom pad and remove the previous 25 minus the top pad.
One thing to note is that when you do this, in Firefox at least, it can tend to get wonky and not fire events after a few loads, so you may want to unbind/rebind your event listeners. I only say this because I have a friend who is currently working on something with similar functionality, and these are some of the issues he came across.
No one is going to complain that they have to click next page/previous page, it'll be fast and clean, but mess up your scrolling and no one will want to use your widget.
Here are some other resources on the topic: Old Ajax Scrollable Table - Twitter-like "load more" tutorial - a good scrolling example (read the source) - this Google Code project.
I recommend caching.
It should be noted that the solution to this problem should take into account that reading a sufficiently large file (100mb+) from disk is going to be disk bound and likely to outrun any timeout that your web server has set for script execution time. In order to avoid making the user wait an inordinate amount of time to load any portion of the file I would avoid hacks like changing your server's timeout limits.
Here's one possible solution that comes to mind:
1) Cache the file by chopping it up into separate files. You can easily do this in a cron job or even trigger it when the file is written; see the sketch after this answer. Use readfile_chunked (http://cn2.php.net/manual/en/function.readfile.php#48683) or similar.
2) Write a service handler script that when invoked from the browser (say './readfile?chunk=##') returns the requested chunk.
3) Use a pagination widget or a scroller as suggested by the other contributor to make the call to the service handler via AJAX.
Cons: this will inevitably increase the amount of disk space used. Pros: happy users, as disk access will be optimized and so will script execution time. It also scales well (on the order of O(n)).
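A rough sketch of the splitting step from 1), run for example from a cron job; the chunk size and paths are assumptions:

<?php
$source    = '/var/log/app/huge.log';
$chunkDir  = '/var/cache/logviewer';
$chunkSize = 512 * 1024; // 512 KB per chunk file

$in = fopen($source, 'rb');
for ($i = 0; !feof($in); $i++) {
    $data = fread($in, $chunkSize);
    if ($data === false || $data === '') {
        break;
    }
    file_put_contents("$chunkDir/chunk_$i.txt", $data);
}
fclose($in);

The service handler from 2) then reduces to something like readfile("$chunkDir/chunk_" . (int)$_GET['chunk'] . ".txt").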
Have you considered using Dojo Grid for viewing logs? It has built-in support for dynamic loading of 'pages' i.e. rows of data.
If the log file is a text file with a consistent line ending, maybe you can fetch it by line number.
I have an idea for an algorithm like this:
When the page loads, fetch the first 100 lines from the file. Put them in some container, maybe a div, a textarea, or a <ul> with <li> elements.
Add an event handler to detect that the user has scrolled to the last part of the container.
Send an AJAX request to get the next 100 lines from the file. Pass the line offset as a parameter (GET or URI parameter) so the PHP script can get the right part of the file; a sketch of this is below.
Append the AJAX response to the end of the container and update the offset for the next AJAX request.
If no more lines are left in the file, return an empty response. The AJAX handler should treat this as end of file and remove the event handler added in step 2 above.
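A possible PHP side for that AJAX request, assuming the log really does have consistent line endings; the file path and page size are placeholders:

<?php
// lines.php?offset=200 - return the next 100 lines starting at the given line number
$offset = isset($_GET['offset']) ? max(0, (int)$_GET['offset']) : 0;
$count  = 100;

$file = new SplFileObject('/var/log/app/huge.log', 'r');
$file->seek($offset); // position the pointer on the requested line

$lines = array();
for ($i = 0; $i < $count && $file->valid(); $i++) {
    $lines[] = rtrim($file->current(), "\r\n");
    $file->next();
}

header('Content-Type: application/json');
echo json_encode($lines); // an empty array signals end of file, as described above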
I don't know much about Dojo. I use jQuery Tools' Scrollable in my application. It's easy to add an event handler for when the scroller reaches the last page, then fetch the next items.