I want to display to a user a large text file (100MB log files, specifically) via a web interface without requiring the user to download the entire file. Obviously returning the entire file to someone's web browser would not be sensible, so my theory was to use Ajax to fetch portions of the file depending on how the user scrolls through it, similar to the way Google Maps provides a "window" of the map.
My application server is PHP, and I'm fairly sure I can perform the appropriate seeks and reads through the file and return the results via XHR to the application, but my Ajax framework is Dojo and I can't think of any standard dijit that would work, so I'm trying to figure out how best to implement something.
Should I derive my own widget? Is there already something out there that I am not aware of? If I build my own custom widget, what sort of structure should it take and are there any good resources for developing custom widgets for dojo/dijit? Any other thoughts?
This seems to be a tutorial on what you might need. I would suggest that you use an li for each line, because you will end up wanting to perform some actions on each line; most likely each line will be relevant.
Scrolling is nice, but you can also just blit the interface with pagination, meaning the user clicks next page / previous page, you fetch it, and then update the view. That's the easiest method. With scrolling, you'll need to fetch more lines above and below the currently visible ones so scrolling feels seamless.
For instance, if you want to show 25 lines, you'll need to fetch 25 + a bottom pad on the first go, and treat the lines in the bottom pad becoming visible as the threshold for signalling a new event to download another 25 + bottom pad items.
With a 100MB file, that's going to get sluggish soon, so you'll also have to clear out the previous entries and define a new top pad that signals a request in the reverse direction. That is to say: 1st request, fetch 25 + bottom pad; 2nd request, fetch the next 25 + bottom pad and remove the previous 25 minus the top pad.
One thing to note: when you do this, in Firefox at least, it can tend to get wonky and stop firing events after a few loads, so you may want to unbind/rebind your event listeners. I only say this because I have a friend who is currently working on something with similar functionality, and these are some of the issues he came across.
No one is going to complain that they have to click next page/previous page, it'll be fast and clean, but mess up your scrolling and no one will want to use your widget.
Here are some other resources on the topic: an old Ajax scrollable table, a Twitter-like "load more" tutorial, a good scrolling example (read the source), and a Google Code project worth checking out.
I recommend caching.
It should be noted that the solution to this problem should take into account that reading a sufficiently large file (100MB+) from disk is going to be disk-bound and likely to outrun any timeout your web server has set for script execution time. To avoid making the user wait an inordinate amount of time to load any portion of the file, I would avoid hacks like raising your server's timeout limits.
Here's one possible solution that comes to mind:
1) Cache the file by chopping it up into separate files. You can easily do this in a cron job or even trigger it when the file is written. Use readfile_chunked (http://cn2.php.net/manual/en/function.readfile.php#48683) or similar.
2) Write a service handler script that, when invoked from the browser (say './readfile?chunk=##'), returns the requested chunk; a sketch of such a handler follows below.
3) Use a pagination widget or a scroller as suggested by the other contributor to make the call to the service handler via AJAX.
Cons: this will inevitably increase the amount of disk space used. Pros: happy users, as disk access will be optimized and so will script execution time. Also, it scales well (linearly, O(n)).
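A minimal sketch of such a chunk handler, assuming the cron job from step 1 has already written the pieces as chunks/logfile.0, chunks/logfile.1, and so on (the directory layout and parameter name here are only illustrative):

<?php
// readfile.php?chunk=## -- returns one pre-cut chunk of the log file.
// Assumes a cron job has already split the log into chunks/logfile.<n>.
$chunkDir = __DIR__ . '/chunks';                          // hypothetical cache directory
$chunk    = isset($_GET['chunk']) ? (int) $_GET['chunk'] : 0;
$path     = $chunkDir . '/logfile.' . $chunk;

if (!is_file($path)) {
    http_response_code(404);
    exit('No such chunk');
}

header('Content-Type: text/plain; charset=utf-8');
readfile($path);                                          // each chunk is small, so readfile() is fine here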
Have you considered using Dojo Grid for viewing logs? It has built-in support for dynamic loading of 'pages' i.e. rows of data.
If the log file is a text file with a consistent line ending, maybe you can fetch it by line number.
I have an idea for an algorithm like this:
When the page loads, fetch the first 100 lines from the file. Put them in some container, maybe a div, a textarea, or a <ul> with <li> elements.
Attach an event handler to detect that the user has scrolled to the last part of the container.
Send an AJAX request to get the next 100 lines from the file. Pass the line offset as a parameter (GET or URI parameter) so the PHP script can get the right part of the file; a sketch of that script follows this list.
Append the AJAX response to the end of the container and update the offset for the next AJAX request.
If no more lines are left in the file, return an empty response. The AJAX handler should treat this as end-of-file and remove the event handler from step 2 above.
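A rough sketch of the PHP side of that idea (the file path, parameter name, and 100-line page size are assumptions for illustration, not part of the original suggestion):

<?php
// lines.php?offset=200 -- returns the next 100 lines starting at line "offset".
$file    = '/var/log/app.log';   // hypothetical log file
$offset  = isset($_GET['offset']) ? (int) $_GET['offset'] : 0;
$perPage = 100;

header('Content-Type: text/plain; charset=utf-8');

$fh = fopen($file, 'r');
if ($fh === false) {
    exit;                        // nothing to serve
}
// Skip lines up to the offset. This linear scan is simple but not the fastest
// option for a 100MB file; a byte-offset index would avoid re-reading.
for ($i = 0; $i < $offset && fgets($fh) !== false; $i++) {
}
for ($i = 0; $i < $perPage && ($line = fgets($fh)) !== false; $i++) {
    echo $line;
}
fclose($fh);
// An empty body tells the AJAX handler it has reached the end of the file.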
I don't know much about Dojo. I use jQuery Tools' Scrollable in my application. It's easy to attach an event handler for when the scroller reaches the last page, then fetch the next item.
Let me describe what I've made first:
I have to import a large amount of data from different XMLs into my database, and because it takes a long time I had to add a progress bar. I did it like this: I split the whole import into tiny little AJAX requests and import a little data at a time (when an AJAX request completes, the progress bar increases a bit). This whole idea is great, but the data just keeps getting bigger and bigger and I can't optimize the code any more (it's as optimized as it gets).
The problem is that every time I make an AJAX call I lose a lot of time on things specific to the framework (model initializations and stuff), on the browser handling the URL, and so on. So I was wondering if I could use the flush function from PHP.
But I've been reading that the flush function doesn't work well on all browsers (which is weird because it's a server-side function). If I used the flush function I would just write <script>increase_progressbar</script> or whatever I want, and I could do it that way.
So, any opinions on the flush function? I've been testing it on little scripts, but I want to know if someone has really used it with big scripts. Also, I'm open to any other suggestion for doing what I want to do :)
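For reference, the flush-based approach being described would look roughly like this (a minimal sketch; the import_batch() helper and the increase_progressbar() function are placeholders for the asker's own code, and some servers, proxies, and browsers buffer output regardless):

<?php
// import.php -- stream progress markers to the browser while importing.
header('Content-Type: text/html; charset=utf-8');
while (ob_get_level() > 0) {      // disable PHP-level output buffering
    ob_end_flush();
}
ob_implicit_flush(true);

$batches = 50;                    // hypothetical number of import batches
for ($i = 1; $i <= $batches; $i++) {
    import_batch($i);             // hypothetical: imports one small piece of the data
    $percent = round($i / $batches * 100);
    echo '<script>increase_progressbar(' . $percent . ');</script>';
    echo str_pad('', 4096);       // padding helps defeat some proxy/browser buffers
    flush();
}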
I won't give you direct advice, but I will tell you how I did it in one of my projects. In my case I needed to upload Excel files and then parse them. The data exceeded 3000 rows and I had to check all columns of each row for certain data. When I parsed it directly after the upload, the parser often crashed somewhere, and it was really not safe.
So how did I do it? The upload process was split into 2 parts:
Physically upload the file (a regular upload field and submit). When the button is clicked, some CSS and JS "magic" hides the form and a nice loading bar appears on the screen. When the upload is done, the page just refreshes and the form appears again for the next file.
Start parsing the data in the background using php-cli, as @Dragon suggested, with exec(); a rough sketch follows below.
In the database I had a table which stores information about the files, and there is a boolean field called "parsed". When the parser finishes the job, its last task is to update that field to true.
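Roughly, the hand-off to the background parser looked like this (the script name and column names here are only illustrative):

<?php
// After the upload has been saved and a row inserted into the files table,
// start the CLI parser in the background and return to the user immediately.
$fileId = (int) $insertedFileId;                 // id of the row just inserted (hypothetical variable)
$cmd = 'php ' . escapeshellarg(__DIR__ . '/parse.php')
     . ' ' . escapeshellarg($fileId)
     . ' > /dev/null 2>&1 &';                    // detach so the web request does not wait
exec($cmd);
// parse.php does the heavy work and, as its last step, runs something like:
//   UPDATE files SET parsed = 1 WHERE id = :id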
So here is the whole process from user point of view:
select a file and upload it.
Wait until the file has been uploaded to the server. Until then a message and loading bar appear, indicating that something is working. The upload form is hidden with CSS and JS, preventing the user from uploading another file.
When it's over, the page is refreshed (because I did a normal POST submit) and the form appears on the screen again, along with a list of recently uploaded files (I store this in the session).
Each node in that list has an indicator (an icon). At first it's a spinner (an AJAX spinning wheel).
On a regular basis (every 30 sec or 1 min) I check the file table through an AJAX call, reading the parsed field. If the background process is over, the field is set to true, and with some JS and CSS I change the icon to "Done". Otherwise the spinner remains.
In my project I didn't have a requirement to show extra details about the imports, but you can always go wild with other extra data.
Hope this helps you with your project.
I need to display a PDF with over 100 pages (yeah it's huge) in a mobile website. This is not going to work and I need to come up with an alternative solution.
I'm using PHP as the backend and jQueryMobile lib as the mobile framework.
I was thinking of converting the PDF to HTML, but then there is the question of how to load each page, or whether to load all the pages beforehand (as a multi-page layout with JQM can do).
I was thinking about loading each page with its own AJAX request, but wanted to get a fresh point of view on any alternative methods/ideas.
I'm also concerned about connection speed as mobile devices vary from Edge, 3G, 4G, wireless, etc... so making the load time(s) as fast as possible is a must.
UPDATE:
The PDF file size is around 9MB. Yeah, it's not a lot, but for a mobile browser over a slow connection this will take some time to display, and that's if it doesn't time out first.
Thanks in advance.
I would do something like this, as you're not really providing a PDF (which may require bookmarks, graphics, navigation links, and other complex content), but simply need to load a lot of content and make sure the request isn't so large that the page times out.
Concerns:
Large file size and timeouts
Must serve entire agreement
Step one
Make each page as its own raw-text component. You can do this by either making individual files with each page, or have one PHP script serve out individual pages of the content. Google is littered with PHP that translates PDF into text.
For our purposes, you have one PHP file that serves out content, in this method:
./agreement.php?page=1
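A minimal sketch of what agreement.php could look like, assuming the PDF has already been converted to one text file per page (the pages/ directory layout and the 100-page limit are assumptions):

<?php
// agreement.php?page=1 -- serve one page of the pre-converted agreement text.
$page = isset($_GET['page']) ? (int) $_GET['page'] : 1;
$path = __DIR__ . '/pages/page-' . $page . '.txt';   // hypothetical: one text file per page

if ($page < 1 || $page > 100 || !is_file($path)) {
    http_response_code(404);
    exit;
}

header('Content-Type: text/plain; charset=utf-8');
readfile($path);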
Step two
Request the content into a div, and use recursion to make it load the next page only after the previous one has successfully loaded (not before, or later; asynchronous can be messy like that).
//Untested code
function loadPages(n) {
    $.ajax({
        url: './agreement.php',
        data: {'page': n},
        error: function () { loadPages(n); },    //Try to load again
        success: function (pageData) {
            n++;
            $('#agreementBox').append(pageData); //Puts data into document
            if (n > 100) { return 0; }           //Exit recursion on 100th page being appended
            loadPages(n);                        //Recursive call, loads next page after this one has been loaded
        }
    });
}
loadPages(1); //Call function for the first time.
Concerns
Keep it to simple text, 100 pages of graphical data in the same rendering may become memory-intensive.
Your AJAX request may fail for any number of reasons; the error re-try I put in won't cover all of them, and may go on forever in cases such as the PHP script crashing.
Users might hit agree before agreement is done loading.
Style. I use 'magic numbers' in my example, where 100 pages is a hard-coded value and not a constant, don't do that when you go to implement this.
I recently wrote a PHP plugin to interface with my phpBB installation which takes my users' Steam IDs, converts them into the community IDs that Steam uses on their website, grabs the XML file for that community ID, gets the value of avatarFull (which contains the link to the full avatar), downloads it via cURL, resizes it, and sets it as the user's new avatar.
In effect it is syncing my forum's avatars with Steam's avatars (Steam is a gaming community/platform and I run a gaming clan). My issue is that whenever I read the value from the XML file it takes around a second per user, since it loads the entire XML file before searching for the value, and this causes the entire script to take a very long time to complete.
Ideally I want to have my script run several times a day to check each avatarFull value from Steam and see if it has changed (and download the file if it has), but it currently takes just too long for me to tie everything up waiting on it.
Is there any way to have the server serve up just the xml value that I am looking for without loading the entire thing?
Here is how I am calling the value currently:
$xml = @simplexml_load_file("http://steamcommunity.com/profiles/".$steamid."?xml=1");
$avatarlink = $xml->avatarFull;
And here is an example xml file: XML file
The file isn't big, and parsing it doesn't take much time. Your second is mostly spent on network communication.
Since there is no way around this, you must implement a cache. Schedule a script that will run on your server every hour or so, looking for changes. This script will take a long time: at least a second for every user, and several seconds whenever a picture has to be downloaded.
When it has the latest picture, it will store it in some predefined location on your server. The scripts that serve your webpage will use this location instead of communicating with Steam. That way they will work instantly, and the pictures will be at most 1 hour out-of-date.
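A sketch of what that scheduled script might look like; the user list, the storage path, and the change check are placeholders, and the resize step from the question is omitted for brevity:

<?php
// cron_avatars.php -- run every hour or so; refresh cached avatars for all users.
$users = get_steam_ids_from_db();                 // hypothetical: returns an array of community ids

foreach ($users as $steamid) {
    $xml = @simplexml_load_file('http://steamcommunity.com/profiles/' . $steamid . '?xml=1');
    if ($xml === false || empty($xml->avatarFull)) {
        continue;                                  // skip this user, try again next run
    }
    $url  = (string) $xml->avatarFull;
    $dest = '/var/www/avatars/' . $steamid . '.jpg';  // the predefined location the site serves from
    $data = @file_get_contents($url);
    // Only rewrite the local copy when the remote picture has actually changed.
    if ($data !== false && (!is_file($dest) || md5_file($dest) !== md5($data))) {
        file_put_contents($dest, $data);
    }
}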
Added: Here's an idea to complement this: Have your visitors perform AJAX requests to Steam and check if the picture has changed via JavaScript. Do this only for pictures that they're actually viewing. If it has, then you can immediately replace the outdated picture in their browser. Also you can notify your server who can then download the updated picture immediately. Perhaps you won't even need to schedule anything yourself.
You have to read the whole stream to get to the data you need, but it doesn't have to be kept in memory.
If I were doing this with Java, I'd use a SAX parser instead of a DOM parser. I could handle the few values I was interested in and not keep a large DOM in memory. See if there's something equivalent for you with PHP.
SimpleXml is a DOM parser. It will load and parse the entire document into memory before you can work with it. If you do not want that, use XMLReader which will allow you to process the XML while you are reading it from a stream, e.g. you could exit processing once the avatar was fetched.
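A rough, untested sketch of the XMLReader approach, stopping as soon as avatarFull has been read (only the profile URL pattern comes from the question):

<?php
$reader = new XMLReader();
$reader->open('http://steamcommunity.com/profiles/' . $steamid . '?xml=1');

$avatarlink = null;
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'avatarFull') {
        $avatarlink = trim($reader->readString());   // readString() returns the element's text/CDATA content
        break;                                       // stop parsing; we have what we need
    }
}
$reader->close();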
But like other people already pointed out elsewhere on this page, with a file as small as shown, this is likely rather a network latency issue than an XML issue.
Also see Best XML Parser for PHP
That file looks small enough; it shouldn't take that long to parse. It probably takes that long because of some sort of network problem rather than the slowness of parsing.
If the network is your issue then no amount of trickery will help you :(.
If it isn't the network, then you could try a regex match on the input. That will probably be marginally faster.
Try this expression:
/<avatarFull><!\[CDATA\[(.*?)\]\]><\/avatarFull>/
and read the link from the first group match.
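In PHP that would look something like this (a sketch; $response is assumed to already hold the raw XML fetched from Steam):

<?php
// $response holds the raw XML fetched from steamcommunity.com (assumed).
if (preg_match('/<avatarFull><!\[CDATA\[(.*?)\]\]><\/avatarFull>/', $response, $m)) {
    $avatarlink = $m[1];   // first capture group: the avatar URL
}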
You could try the SAX way of parsing (http://php.net/manual/en/book.xml.php), but as I said, since the file is small I doubt it will really make a difference.
You can take advantage of caching the results of simplexml_load_file() somewhere like memcached or the filesystem; a rough sketch follows the list. Here is the typical workflow:
check if XML file was processed during last N seconds
return processing results on success
on failure get results from simplexml
process them
resize images
store results in cache
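A simple filesystem variant of that workflow might look like this (the cache path, lifetime, and the skipped image-resize step are assumptions):

<?php
function get_avatar_link($steamid, $maxAge = 3600) {
    $cacheFile = sys_get_temp_dir() . '/steam_avatar_' . md5($steamid) . '.txt';

    // 1) Check whether the XML was processed during the last N seconds...
    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $maxAge) {
        return file_get_contents($cacheFile);        // 2) ...and return the cached result.
    }

    // 3) On a cache miss, fall back to simplexml and process the result.
    $xml = @simplexml_load_file('http://steamcommunity.com/profiles/' . $steamid . '?xml=1');
    if ($xml === false) {
        return false;
    }
    $link = (string) $xml->avatarFull;
    // (download/resize the image here if needed)

    // 4) Store the result in the cache for the next call.
    file_put_contents($cacheFile, $link);
    return $link;
}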
I have a page that's very image-intensive. This is by client request - it's a list of merchants with a logo for each one. The list is quite long (over 500), and yes - the client insists on displaying all of them. We do have an ajax typeahead search to help users find what they're looking for without scrolling, so it's not a total disaster.
Here's the issue: the client is just now realizing that it takes a long time to load this page because of all the logos. Even if each one is only a few KB, it still adds up pretty quickly. He's now decided he wants a progress bar to display while the images are loading. I've never done that before, so I started looking around, and most of the approaches I've seen rely on getting an array of img tags and looping through to check the complete property. The problem I'm having (at least I think this is what's causing the problem) is that the image tags are generated by a database query, and I think the JavaScript that gets the image array runs before the image tags have finished loading. Obviously this isn't an issue on pages where the images are hard-coded.
Can anyone point me in the right direction of how I can implement a progress bar on img tags that get loaded dynamically? My site is written in PHP, so I'm perfectly happy to do something server-side if that would work better.
As pretty much everyone here has noted, this is a nasty problem to have to solve. Accordingly, I propose sidestepping the technical components of it and addressing only the human ones.
Leave everything almost exactly as it is. All you have to do is find or make a throbber (I use http://ajaxload.info/ and it couldn't be easier), and use it as the background image for a CSS selector that only applies to the logos on the page.
Users (and clients who make unreasonable requests!) are far more frustrated by a lack of responsiveness than they are by things taking time. This quick gimmicky fix might be just enough to coax site users to perceive the problem more as the latter than as the former.
CSS sprites will definitely be a good start in your case. If you've got 500 separate images on one page, the browser will have to start 500 new connections to fetch them, and unfortunately the number of concurrent connections will be around 20 or so... it is not good.
CSS Sprites from css-tricks.com
I'd suggest pre-making and caching of the logos (or their various states), as your own hunch is that the main bottleneck is that "the image tags are generated by a database query". Is this at all possible?
It's better to store a few states or whatever with a naming scheme that makes it possible to fetch the right image than to regenerate them on the fly each time. Of course, you'll need a proper cache-handling mechanism, so it's not an easy task, but more often than not your hunch is pointing you in the right direction.
If you're able to boil it down to a few static files per logo, you could also consider using a CDN and/or multiple subdomains, as Michael Mao suggests.
I haven't tested this, but something like this might work (it's jQuery):
<?php // Do your select-image stuff from here ?>
<html>
<head></head>
<body>
<script type="text/javascript">
$(document).ready(function () {
    var images = <?php echo json_encode($images); ?>;
    $.each(images, function(i, image) {
        $(new Image()).load(function () {
            $('body').append(this);
            alert('Loaded ' + (i + 1) + ' out of ' + images.length);
        }).attr('src', image);
    });
});
</script>
</body>
</html>
Since you already have a JavaScript search to get people to specific listings faster, how about loading a static placeholder image for all logos and then replacing the placeholder with the correct logos on an as-needed basis? "As-needed" could be determined by a JavaScript calculation of window position and any typed input.
Do your just-in-time image loading from multiple subdomain masks to parallelize requests and you should be able to pop the images up somewhat quickly as-needed without bogging down the page by loading unnecessary images.
It's not pretty, but neither is the client's request.
Edit: As far as a progress bar goes, when you determine a window location (or typed-input location), determine how many listings will be in-view, and how many listings to buffer above and below the view. Then you'll have a total number of listings and you can update a JavaScript/HTML progressbar as you dynamically replace the logos within that range.
It would be much easier to answer the question if I could see the complete code, but I can remember doing something remotely similar, and I ended up using a custom JavaScript object. You could perhaps start with an object like this somewhere in the head of your app:
function Iterator() {
    var counter = 0;
    this.images = 215; // This number comes from the DB and gives you the total number of images
    this.progress = counter / this.images;
    this.hasMore = function() { return counter < this.images; };
    this.getPicture = function() {
        // send a request to the server using counter as a parameter
        // upon receiving this request the server should load only ONE image (LIMIT 1 in SQL)
        // with OFFSET equal to the value of "counter"
        // when the image loads (use whatever JS framework you need for that),
        // we increment the counter and update the progress:
        counter++;
        this.progress = counter / this.images;
    };
    this.loadPictures = function() {
        while (this.hasMore()) {
            this.getPicture();
            // You can do something with the "progress" attribute here, to be visualized by your progress bar
        }
    };
}
You'll need to instantiate the Iterator object on body load and have it execute the loadPictures function.
Please let me know if you have any problems implementing that.
Here's a javascript-only solution that shouldn't require any modification to your server-side code. Using jquery:
$(document).ready(function() {
    var loadedSoFar = 0;
    //use jquery to get all image tags you want.
    //The totalImgs is used to calculate percent and is the length of the jquery array
    var totalImgs = $("#imgHolder img").each(function(i, img) {
        //if this image has already loaded, add it to loadedSoFar.
        if (img.complete)
            loadedSoFar++;
        else {
            //otherwise add a load event for the image.
            $(img).load(function() {
                loadedSoFar++;
                console.log("complete: " + (loadedSoFar / totalImgs * 100) + "%");
            });
        }
    }).length;
});
I wrote this assuming all the images are already in the DOM when document.ready is called. If they are not, move this into a function and call it after the img tags have been loaded into the DOM from the server (via an AJAX request, for instance).
Basically all it does is find all the imgs in imgHolder (modify the selector to match your situation) and wire up the load event so it can update the loadedSoFar count. If an image has already loaded by the time this script runs, its load event will never fire, so the loadedSoFar counter is incremented right away. The total number of images that need to be loaded is the length of the jQuery object array returned by the selector.
I'll leave it to you to write a progress bar, but I do recommend this plugin: http://t.wits.sg/jquery-progress-bar/
I'd definitely try to avoid the progress bar -- you're going to really struggle with it.
If you really must do it, probably the best you can hope for is to fake it - ie have it show progress on a given amount of time, which you'd set as an approximation of the actual time it takes to load the page. Not foolproof but far easier than trying to actually time the loading progress.
Don't forget also that the progress bar itself will add to the loading time. And you'll need to have its code embedded in your actual HTML; if it's in an external JavaScript file, it could itself be subject to loading delays, which would make the whole exercise pointless.
So as I say, it's probably not worth the effort. What would be worth the effort would be to try to reduce the loading time. If it's noticeable enough that you're considering a progress bar, then there's something seriously wrong.
There are a whole stack of techniques for speeding up site loading times; it's a whole topic on its own. But I'll try to give you a few pointers. I suggest you also take some time googling for additional and follow-up information, though.
Image optimisation. If you haven't done so already, run all your images through an optimising program. You may find that you can reduce file sizes and thus load-times significantly.
CSS Sprites. One of the main causes of slow loading times is having too many separate HTTP requests. Since a browser can only load a given number of files at once, any files over that number will have to wait till others have finished before they can even begin loading. CSS sprites solves this by combining a number of images into a single file and using CSS to display only the relevant part of it in each spot. This is typically used for groups of related images.
Lazy Load. There is a jQuery plugin called LazyLoad which tells your page to load images only as they are needed. Images that are off the bottom of the viewable page are deferred until the user starts scrolling the page. This means that the images that are visible immediately are loaded first, making the page as a whole appear to load more quickly.
That'll do for now. There's loads more, of course, but all I'm really trying to say is: Optimise to get rid of the speed issue rather than trying to cobble together a band aid solution.
Hope that helps.
There are different ways to speed up image loading, especially when you have a lot of images:
Sprite - use a sprite, where you group a number of images together and use the CSS background property (one big image loads faster than many separately requested images).
Use the <img> tag properly. You can add the lowsrc="" attribute to show a low-resolution image until the real image (the src="" attribute) is fully loaded.
Use AJAX. Although you may have 500+ images, users will never see the full list at once, so you can load six at a time (or N at a time) and add the rest once the document has fully loaded. That way you speed up the initial load, since the user only sees some images, and by the time they click through to the next set of images those will already have loaded (or they'll wait less). Better yet, to save bandwidth, only use AJAX when required, i.e. only when they click "next" or scroll down, auto-loading new images (look at how Google Images loads via AJAX while scrolling).
I need to write a text file viewer (not the directory tree, but the actual file contents) for use in a browser. It will be used to view large files. I want to give the user the ability to actually, ummm, browse the file, i.e. prev page & next page buttons, while each page shows only a portion of the file.
Two question:
Is there any way to pass the file descriptor through POST (or something) so that on each page I can keep reading from an already open file, instead of starting all over again (again - huge files)?
Is there a way to read the file backwards? It would be very useful for browsing back in a file.
Any other implementation ideas are very welcome. Thanks
Keeping the file open between requests is not a good idea - you don't have to "start all over again" - just maintain an offset and use fseek() to jump to that offset. That way, you can also implement the "backwards jumping".
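A minimal sketch of that idea, passing the byte offset between requests instead of a file handle (the file path, chunk size, and parameter name are illustrative):

<?php
// viewer.php?offset=0 -- show one chunk of the file plus prev/next links.
$file   = '/path/to/huge.log';                 // hypothetical file
$chunk  = 64 * 1024;                           // bytes per "page"
$offset = isset($_GET['offset']) ? max(0, (int) $_GET['offset']) : 0;

$fh = fopen($file, 'r');
fseek($fh, $offset);                           // jump straight to the requested position
$text = fread($fh, $chunk);
fclose($fh);

header('Content-Type: text/html; charset=utf-8');
echo '<pre>' . htmlspecialchars($text) . '</pre>';
// "Backwards jumping" is just the same call with a smaller offset:
echo '<a href="?offset=' . max(0, $offset - $chunk) . '">prev</a> ';
echo '<a href="?offset=' . ($offset + $chunk) . '">next</a>';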
Cut your huge files into smaller files once, and then serve the small files to the user.
You should consider pagination. If you're concerned about the user being frustrated by needing to click "next" too often, you could make each chunk reasonably large (so a normal reader pages every 20min).
Another option is the Chunked-Encoding transfer type: Wikipedia entry. This would allow your server to respond quickly and give the user something to read while it streams the rest of the file over the network (rather than the server needing to read the whole file and send it all at once). This could dramatically improve the perceived performance compared to serving the files normally, but it still consumes a lot of bandwidth for your server.
You might be able to simulate a large document with Javascript and AJAX, but only send pieces at a time for better performance.
Consider sending a few pages' worth of your document and attaching listeners to the scroll event of your browser. Over time, or as the user scrolls down, you AJAX in more chunks. This creates a few annoying UX edge cases, like:
Scroll bar indicates a much smaller document than there actually is
You might be able to avoid this by filling in the bottom of your document with many page breaks, but it'll be difficult to make the length perfect.
Scrolling past the point of currently-available content will show a blank page.
You could detect this using JavaScript and display a "loading" icon to let the user know what's going on.
Built-in "find" feature doesn't work
Hard to avoid this without the user downloading the entire document, but you could provide your own search feature for them to use instead (not as good but perhaps adequate).
Really though, you're probably best off with pagination and medium-sized pages. It's a very well understood design pattern that's relatively easy (compared to the other options, at least) to implement and make fast.
Hope that helps!