Loading several thousand images efficiently

I am developing a website that uses Google Maps.
The map has 4000-5000 markers.
When the client enters the website, the server determines all "active" markers and sends a JSON document with information about each marker and which marker icon to use (a URL to an image on the server, e.g. icon: '/icon/xxx.png').
The website loads instantly, but it takes about 5 seconds until all markers are shown, since the client has to fetch those ~5000 images.
The images can change, so the server only knows exactly which image each marker uses at the moment the client asks for it.
How can I speed up this process?
Can I dynamically create a spritesheet of some sort, or pack all those files and let the client unpack them for faster loading?
The server backend is PHP for this part.

You can indeed create a spritesheet. Alternatively, if the icons are relatively simple in style, you can create them dynamically as SVGs within your page code: when you declare each marker, you give it a path defined in SVG within your JS (served by your main page, or by a static file containing functions for many of them).
SVG info here:
https://developer.mozilla.org/en-US/docs/Web/SVG/Tutorial/Paths
There are some good examples in this prior answer: How to use SVG markers in Google Maps API v3
A spritesheet is a good way to do this if the total number of icons is relatively stable and each client is likely to use a large subset of the sprites.
The downside of a spritesheet, in my experience, is maintenance: combining a code workflow and an art workflow can be a bit of a pain!
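As a rough illustration of the spritesheet idea, the sketch below works out where each distinct icon would sit in a grid-shaped sheet, so the server could send one image plus a small offset table instead of thousands of separate URLs. It is written in Python only to keep it short. The 32 px icon size, the column count and the name sprite_layout are assumptions made for the example, and compositing the actual sheet image (for instance with GD on the PHP side) is left out.

```python
def sprite_layout(icon_urls, icon_size=32, columns=16):
    """Map each distinct icon URL to its (x, y) pixel offset in a grid sheet."""
    # Many markers share an icon, so de-duplicate while keeping first-seen order.
    unique = list(dict.fromkeys(icon_urls))
    layout = {}
    for index, url in enumerate(unique):
        row, col = divmod(index, columns)
        layout[url] = (col * icon_size, row * icon_size)
    return layout


# Hypothetical marker list: four markers that use three distinct icons.
markers = ["/icon/a.png", "/icon/b.png", "/icon/a.png", "/icon/c.png"]
offsets = sprite_layout(markers, icon_size=32, columns=2)
print(offsets)
```

On the client, each marker would then point at the single sheet URL and use its offset as the icon origin, which turns thousands of image requests into one.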

Related

How to secure my GeoJSON data?

I am trying to create a website that displays a Google map marked up with my proprietary data (in the form of multiple polygons) on top of it.
I have been studying the Google Maps API and found that the only way to do this is to publish my proprietary data in a GeoJSON file and then use the following API function to load the data into the map: map.data.loadGeoJson(xxxxxx);
This means I need to publish my proprietary data through a web service in GeoJSON format. However, I don't want users to download my proprietary data and use it for purposes other than my site. This is similar to a website that allows video streaming but does not allow downloading the whole video for offline use.
How can I achieve this? Can I use a language like PHP to generate the map (with the markup) on the server and then send it to the web client as HTML? Or, if I cannot achieve this with the Google Maps API, does another map API support it (like Bing)?
Thanks very much for your help!!!!
Code Mon key

One option is to turn your data into a tile layer. This limits the user to seeing an image of the data without giving them access to the raw data. In a worst-case scenario they would only be able to take the images and view the data, and could not run any kind of analytics against it unless they manually traced all the data.
As an added benefit of rendering the data as a tile layer, you will be able to visualize a lot more data. I've built a few systems that can render 500M rows of polygon data on a map using this approach. The cool thing is that if you store the data in a spatial database like SQL Azure, you can easily make your data interactive by taking the point a user clicks on the map and searching the database for any shapes that intersect with that point.
I wrote a simple blog post on how to create a web service that does this many years ago here: https://rbrundritt.wordpress.com/2009/11/26/dynamic-tile-layers-in-the-bing-maps-silverlight-control/
There is also a good open source project here that uses ASP.NET: http://ajaxmapdataconnector.codeplex.com/
I have a whitepaper that is a lot more up to date than my blog post that will be published soon. If you email me at ricky_brundritt at Hotmail.com, I'll send you a draft copy.
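To make the tile-layer idea a little more concrete, here is a minimal sketch of the arithmetic a tile service typically needs: mapping a latitude/longitude to the x/y index of the tile that contains it at a given zoom level. It assumes the common Web Mercator XYZ tiling scheme with 2^zoom tiles per axis; the function name latlon_to_tile is made up for the example, and rendering the polygons into each tile image is not shown.

```python
import math


def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Return the (x, y) tile index containing a point, Web Mercator XYZ scheme."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far edge still map to a valid tile.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


print(latlon_to_tile(0.0, 0.0, 1))
```

A server-side renderer would run this in reverse for each requested tile: work out the tile's bounding box, query the spatial database for shapes that intersect it, and draw them into a 256x256 image.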

Batch PHP image upload slow, architecture change?

I've built a CMS with a photo album. Pretty simple, most stuff static: static HTML pages, no database, just (as few as possible) text files containing some JSON.
The web interface for the admin panel is all in jQuery with a PHP (Zend Framework) based backend. As much as possible is done within the browser, so the backend is pretty bare.
Now the photo album currently works like this:
Clicking link 'Media'
Fetching a JSON string from the backend containing an object with all albums and, for every album, all the photos
Rendering an unordered list with all the albums
Rendering an unordered list within each album list item with all the pictures
Uploading:
Drop one or more jpeg/png files into the browser to create a new album
Drop one or more jpeg/png files into an album to append those files to the album under the cursor
Send all dropped files (using this jQuery drag drop upload plugin) to the backend
Backend receives all files (while displaying a nice progress bar)
Backend loops through all uploaded files, while the webinterface displays a nice spinner
Each file is resized to a maximum size specified and renders a thumbnail at max 133x133 px
Each file is appended to an array with the serverside filename and thumbnail name
[Not yet implemented: rendering the (updated) static html page for the album overview and for each image]
Array with all newly uploaded files is converted to JSON and sent to client
Webinterface appends all new files as list items (displaying the thumbnail)
Uploading done
This is all going pretty well, up to roughly 600 images or roughly 900MB. That's fine by me; if the user wants to upload more files at once, well, do it in two stages. The problem is that the backend processing is painfully slow. Converting 100+ images of a good size (1.5MB each) to the maximum size and generating the thumbnails is taking way too long. I'm doing this with PHP GD. It took me hardly any time to find out that that's the problem. My guess is that there is no way I'm going to speed this up within PHP.
So here are a few questions:
Will ImageMagick be faster? I'm not a fan, so please say no. Also, I would rather not install it on my server.
Is there a really, really lightweight command-line program that does the same with just a few commands (and I know that I'm not alluding to ImageMagick)?
If the answer to the previous question is no: what would be the best way to do this? Don't say Java; I'm not that big a fan of Java either. Some C (dialect)? Preferably one with a powerful yet lightweight image library offering nearest-neighbor, bilinear and bicubic interpolation.
Could my architecture be changed? At the moment, the images start appearing in the browser once the thumbnails are inserted, which happens only after the whole JSON array is received. The entire action has to complete and all the image data has to be generated before the browser gets any kind of feedback. This means that the spinner (without any indication of how long the process is going to take or how many images have been completed) will be displayed for a long, long time. Would it be an idea to use JavaScript's FileReader to preload the images from the user's system, generate the thumbnails in the browser and display them immediately after uploading is done? And on the backend: just receive the files, write them to disk, execute a command-line command, immediately send the response to the browser and convert in the background?
How do I prevent a client-side abort of an AJAX request? While uploading and converting, a warning should be displayed when the user wants to close the page or when something tries to change the #hash.
Thanks. Hope you guys can help me. Just so you know: the client side is pretty complex, with way too much code. I'd rather change the backend.

PHP chart Libraries VS JavaScript Chart Libraries

I am a little stuck choosing between a PHP chart library and a JavaScript chart library. I do understand that PHP is for the server side and JavaScript for the client side. My question is what difference it makes when using their charting libraries. Is it a performance issue, or something else?
I want to understand the difference between using PHP chart libraries and JavaScript chart libraries. Please note that I am not looking for examples of chart libraries; I am looking for reasons to choose one over the other.
I tried to google 'php chart vs javascript chart' but didn't get any links that explain the difference.
EDIT 1
1)
If this question has been answered before, then point me there.
2)
I am developing the application for the internet.
EDIT 2
1)
I have found out about PHPChart, which has both PHP source code and JavaScript source code. If anyone has experience with that library, might it solve the problem of server-side load (bandwidth issues), etc.? I am thinking that since it has both PHP and JavaScript source, it may be the best to use. I am just assuming. :-)
Thank you very much

Both ways of creating graphs have their own pros and cons.
If you decide to do it using PHP, first you need to make sure that you have all the required graphical libraries installed (e.g. GD, which might not always be available on shared hosts).
Assuming you have them, the first negative thing, in my opinion, is that you will end up with static images. Of course, that's not always a bad thing, as it ensures compatibility with all clients, with or without JavaScript support. However, it takes away the dynamics of graphs generated on the client side using JavaScript. Your users won't be able to zoom, move, slide, go full screen or do anything else they could with the likes of Highcharts or Flot.
Another con is that images take up more bandwidth than, say, JSON. The bigger you want your graph and the more colors it contains, the longer your clients will have to wait until your page loads. And because those loads are not asynchronous, they will have to wait for the images to load before they see the rest of the page.
With JavaScript libraries everything is different, though. You only request the data required for your graph, and you only request it when your page loads. The amount of data is usually smaller than an image would be, plus you can compress your output with gzip to make it even smaller. Users will see nice spinners informing them that the graph is loading instead of some incomplete webpage.
Another thing to take into account is - what if you decide to show a nice table with data in them below each graph? If you chose to render images on the server, you would end up having to add new functionality just to get the data. With JSON, however, you just make one call, render the graph and display the table. Maybe calculate totals or do whatever you want with it. Hand it out to people as an API if you wish, after all :)
If you ask me, I would definitely go with client-side graphs as most of the devices have nice HTML5 support nowadays and being able to display a graph on an Android phone, or an iPhone or an iPad shouldn't pose a problem. If you only need images and you don't wish to expose the original data, go with PHP.

My opinion is that a server-side solution (i.e. PHP) takes away any browser compatibility issues you may have with a client-side solution (i.e. JavaScript), and hence the support issues that come with them.
A benefit of using JS is that it offloads work from your server to the client, because you may only have to generate some lightweight data (e.g. JSON, XML) and the rendering occurs on the client. You will have to investigate how many hits your server is likely to get, etc., to determine whether resources are an issue with PHP or JS.
However, when using PHP to create chart images, you can always get around the performance/resource issue by keeping a cache of the image files and serving from the cache (it's just a folder of images) instead of generating a new image each time. Whether you can use a cache will depend on your usage. If clients require up-to-the-second data that is always changing, a cache obviously may not be of use.
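The caching idea can be sketched in a few lines. The example below is in Python for brevity rather than PHP, and every name in it (cached_chart, fake_render, the cache directory) is hypothetical. It keys each image file on a hash of the chart data, so unchanged data is served from the folder and changed data produces a new file.

```python
import hashlib
import json
import os
import tempfile


def cached_chart(data, render, cache_dir):
    """Return the path of a chart image, rendering it only on a cache miss."""
    # Key the file on the chart data so changed data yields a new image.
    key = hashlib.sha256(json.dumps(data, sort_keys=True).encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, key + ".png")
    if not os.path.exists(path):
        with open(path, "wb") as fh:
            fh.write(render(data))
    return path


calls = []


def fake_render(data):
    # Stand-in for a real chart renderer; returns placeholder bytes.
    calls.append(data)
    return b"not a real PNG"


cache_dir = tempfile.mkdtemp()
first = cached_chart({"sales": [1, 2, 3]}, fake_render, cache_dir)
second = cached_chart({"sales": [1, 2, 3]}, fake_render, cache_dir)
print(first == second, len(calls))
```

For data that changes every second this buys nothing, as noted above, and a real deployment would also need some policy for deleting stale files.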

Here's what I see:
Using PHP
Increases load on the server for each request
Will work everywhere
Also, as someone else here pointed out, you can cache the image that PHP gives you, reducing bandwidth (no library to download) and reducing load (cache)
Using JavaScript
Decreases server load but increases bandwidth and adds an HTTP request (to load the JS library)
Will work where JS is available
But remember, PHP takes more load than an HTTP request.
Also, always remember that JavaScript is made for effects and the special things you need to display.

There is one advantage of PHP rendering that no one has mentioned. Sometimes you need to include a chart as an image in a PDF, DOC, XLS etc. file, or email it. In that case you have no other way except to render the chart on the server and store it as an image to be inserted.

For data manipulation you use PHP.
For visual and behavioral effects you use JavaScript.
For that reason, you should use JavaScript, as it's designed for visual behavior. Plus, it will put less load on your server, as all processing will be client side. With server-side rendering, as more people use your application simultaneously, it will start to slow down because your server will be doing a lot more than it has to.
Hope that helps :)

How do I extract the text layer and background layer from a PDF?

In my project I have to build a PDF viewer in HTML5/CSS3, and the application has to allow users to add comments and annotations. In fact, I have to build something very similar to crocodoc.com.
At first I was thinking of creating images from the PDF and allowing users to create areas and post comments associated with those areas. Unfortunately, the client also wants to navigate within the PDF and add comments only on allowed sections (for example, paragraphs or selected text).
So now I'm facing one problem: how to get the text, and the best way to do it. If anybody has some clues about how I can achieve this, I would appreciate it.
I tried pdftohtml, but the output doesn't look like the original document, which is really complex (example of document). Even this one doesn't really reflect the output, but it is much better than pdftohtml.
I'm open to any solution, with a preference for the command line under Linux.

I've been down the same road as you, with even much more complex tasks.
After trying out everything I ended up using C# under Mono (so it runs on linux) with iTextSharp.
Even with a very complete library such as iTextSharp, some tasks required a lot of trial and error :)
Extracting the text from a page is easy (check the snippet below); however, if you intend to keep the text coordinates, fonts and sizes, you will have more work to do.
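// Requires: using iTextSharp.text.pdf;
int pdf_page = 5;
string page_text = "";
PdfReader reader = new PdfReader("path/to/pdf/file.pdf");
// Tokenise the raw content stream of the chosen page.
PRTokeniser token = new PRTokeniser(reader.GetPageContent(pdf_page));
while (token.NextToken())
{
    if (token.TokenType == PRTokeniser.TokType.STRING)
    {
        // String operands carry the visible text.
        page_text += token.StringValue;
    }
    else if (token.StringValue == "Tj")
    {
        // "Tj" is the show-text operator; add a space after each shown string.
        page_text += " ";
    }
}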
Do a Console.WriteLine(token.StringValue) on all tokens to see how paragraphs of text are structured in PDFs. This way you can detect coordinates, font, font size, etc.
Addition:
Given the task you are required to do, I have a suggestion for you:
Extract the text with coordinates, font families and sizes: all the information about each paragraph. Then do a PDF-to-images conversion and, in your online viewer, apply invisible selectable text over the paragraphs on the image where needed.
This way your users can select a part of the text where needed, without the need of reconstructing the whole PDF in html :)
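Placing the invisible text over the page image requires converting each paragraph's PDF coordinates to image pixels. The sketch below, in Python for brevity, shows that arithmetic under some assumptions: the box is given by its bottom-left corner in PDF points (1/72 inch, origin at the bottom-left of the page), the page is unrotated with no crop offset, and the image was rendered at a known DPI. The function name and parameters are made up for the example.

```python
def pdf_to_pixel_box(x, y, width, height, page_height_pt, dpi=150):
    """Convert a PDF-space box (points, bottom-left origin) to image pixels (top-left origin)."""
    scale = dpi / 72.0  # one PDF point is 1/72 inch
    left = x * scale
    # PDF y grows upward; image y grows downward, so flip around the page height.
    top = (page_height_pt - (y + height)) * scale
    return round(left), round(top), round(width * scale), round(height * scale)


# A 144x12 pt text box near the top of a US Letter page (792 pt tall), rendered at 144 DPI.
print(pdf_to_pixel_box(72, 720, 144, 12, page_height_pt=792, dpi=144))
```

The returned left/top/width/height values can be used directly as absolute CSS positioning for a transparent text element layered over the page image.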

I recently researched and discovered a native PHP solution to achieve this using FOSS. The FPDI PHP class can be used to import a PDF document for use with either the TCPDF or FPDF PHP classes, both of which provide functionality for creating, reading, updating and writing PDF documents. Personally, I prefer TCPDF, as it provides a larger feature set, a richer API, more usage examples and a more active community forum than FPDF.
Choose one of the aforementioned classes, or another, to programmatically handle PDF documents. Focusing on both current and possible future deliverables, as well as the desired user experience, decide where (e.g. server - PHP, client - JavaScript, both) and to what extent (feature driven) your interactive logic should be implemented.
Personally, I would use a TCPDF instance obtained by importing a PDF document via FPDI to iteratively inspect, translate to a common format (XML, JSON, etc.) and store the resulting representation in relational tables designed to persist data pertinent to the desired level of document hierarchy and detail. The necessary level of detail is often dictated by a specifications document and its mention of both current and possible future deliverables.
Note: In this case, I strongly advise translating documents and storing them in a common format to create a layer of abstraction and transparency. For example, a possible and unforeseen future deliverable might be to provide the same application functionality for users uploading Microsoft Word documents. If the uploaded Microsoft Word document was not translated and stored in a common format then updates to the Web service API and dependent business logic would almost certainly be necessary. This ultimately results in storing bloated, sub-optimal data and inefficient use of development resources in designing, developing and supporting multiple translators. It would also be an inefficient use of server resources to translate outbound data for every request, as opposed to translating inbound data to an optimal format only once.
I would then extend the base document tables by designing and relating additional tables for persisting functionality-specific document asset data such as:
Versioned Additions / Edits / Deletions
    What
        Header / Footer
        Text
            Original Value
            New Value
        Image
            Page(s) (one, many or all)
            Location (relative - textual anchor, absolute - x/y coordinates)
            File (relative or absolute directory or url)
        Brush (drawing)
            Page(s) (one, many or all)
            Location (relative - textual anchor, absolute - x/y coordinates)
            Shape (x/y coordinates to redraw line, square, circle, user defined, etc.)
            Type (pen, pencil, marker, etc.)
            Weight (1px, 3px, 5px, etc.)
            Color
        Annotation
            Page
            Location (relative - textual anchor, absolute - x/y coordinates)
            Shape (line, square, circle, user defined, etc.)
            Value (annotation text)
        Comment
            Target (page, another text/image/brush/annotation asset, parent comment - threading)
            Value (comment text)
    When
        Date
        Time
    Who
        User
Once some, all or more, of the document and its asset data has a place to persist I would design, document and develop a PHP Web service API to expose CRUD and PDF document upload functionality to the UI consumer, while enforcing core business rules. At this point, the remaining work now lies on the Client-side. Currently, I have relational tables persisting both a document and its asset data, as well as an API exposing sufficient functionality to the consumer, in this case the Client-side JavaScript.
I can now design and develop a client-side application using the latest Web technologies such as HTML5, JavaScript and CSS3. I can upload and request PDF documents using the Web service API and easily render the returned common format out to the browser however I decide (probably HTML in this case). I can then use 100% native JavaScript and/or 3rd party libraries for DOM helper functionality, for creating vector graphics to provide drawing and annotation features, and for accessing and controlling functional and stylistic attributes of currently selected document text and/or images. I can provide a real-time collaborative experience by employing WebSockets (the aforementioned Web service API does not apply), or a semi-delayed but still fairly seamless experience using XMLHttpRequest.
From this point forward the sky is the limit and the ball is in your court!

It's a hard task you're trying to accomplish.
To read text from a PDF, have a look at PEAR's PDF_Reader proposal code.

There is also very extensive documentation for Zend_PDF(), which also allows loading and parsing a PDF document. The various elements of the PDF can be iterated over and thus transformed to HTML5 or whatever you like. You may even embed the annotations from your website into the PDFs and vice versa.
Still, you have been given no easy task. Good Luck.

pdftk is a very good tool for things like that (I don't know if it can do exactly this task).
http://www.pdflabs.com/docs/pdftk-cli-examples/

Google Maps Overlays

I'm trying to find something, preferably F/OSS, that can generate a Google Maps overlay from KML and/or KMZ data.
We've got an event site we're working on that needed to accommodate ~16,000 place markers last year and will likely have at least that many again this year. Last year, the company that had done the site just fed the KML data directly to the gMaps API and let it place all of the markers client side. Obviously, that became a performance nightmare and tended to make older browsers "freeze" (or at least appear frozen for several minutes at a time).
Ideally this server side script would take the KML, the map's lat/lon center, and the map zoom level and appropriately merge all of the visible place markers into a single GIF or PNG overlay.
Any guidance or recommendations on this would be greatly appreciated.
UPDATE 10/8/2008 - Most of the information I've come across here and other places would seem to indicate that lessening the number of points on the map is the way to go (i.e. using one marker to represent several when viewing from a higher altitude/zoom level). While that's probably a good approach in some cases, it won't work here. We're looking for the visual impact of a US map with many thousand markers on it. One option I've explored is a service called PushPin, which when fed (presumably) KML will create, server side, an overlay that has all of the visible points (based on center lat/lon and zoom level) rendered onto a single image, so instead of performing several thousand DOM manipulations client side, we merge all of those markers into a single image server side and do a single DOM manipulation on the client end. The PushPin service is really slick and would definitely work if not for the associated costs. We're really looking for something F/OSS that we could run server side to generate that overlay ourselves.

You may want to look into something like GeoServer or MapServer. They are Google Maps clones, and a lot more.
You could generate an overlay that you like, and GeoServer (I think MapServer does as well) can give you KML, PDF, PNG and other output to mix into your maps. Or you could generate the whole map yourself, but that takes time.

Not sure why you want to go to a GIF/PNG overlay; you can do this directly in KML. I'm assuming that most of your performance problem was caused by points outside the user's current view, i.e. the user is looking at New York but you have points in Los Angeles that are wasting memory because they aren't visible. If you really have 16,000 points that are all visible at once for a typical user, then yes, you'll need to pursue a different strategy.
If the above applies, the procedure would be as follows:
Determine the center & extent of the map
Given that, you should be able to calculate the lat/long of the upper left and lower right corners of the map.
Iterate through your database of points and check each location against the two corners. Longitude needs to be greater (signed!) than the upper left longitude and less than the lower right longitude. Latitude needs to be less than the upper left latitude (signed!) and greater than the lower right latitude. Just simple comparisons, no fancy calculations required here.
Output the matching points to a temporary KML for the user.
You can feed KML directly into Google Maps and let it map it, or you can use the Javascript maps API to load the points via KML.
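The corner comparison described in the steps above can be sketched as follows. This is Python rather than a server-side script tied to any particular stack, and the function and variable names are made up for the example. It follows the simple signed comparisons as stated, so it assumes the view does not cross the 180 degree meridian.

```python
def points_in_view(points, upper_left, lower_right):
    """Keep only (lat, lon) points inside the box given by its two corners."""
    top_lat, left_lon = upper_left
    bottom_lat, right_lon = lower_right
    return [
        (lat, lon)
        for lat, lon in points
        if left_lon < lon < right_lon and bottom_lat < lat < top_lat
    ]


# Hypothetical points: New York, Los Angeles, Chicago.
points = [(40.71, -74.01), (34.05, -118.24), (41.88, -87.63)]
visible = points_in_view(points, upper_left=(45.0, -80.0), lower_right=(35.0, -70.0))
print(visible)
```

With a view centred on the US north-east only the New York point survives, and only that subset would be written to the temporary KML.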
It might not solve your exact problem here, but for related issues you might also look into the Google Static Maps API. This allows you to create a static image file with placemarkers on it that will load very quickly, but won't have the interactivity of a regular Google map. Because of the way the API is designed, however, it can't handle anywhere near 16,000 points either so you'd still have to filter down to the view.

I don't know how far along you are with your project, but maybe you can take a look at GeoDjango? This modified Django release includes all kinds of tools to store locations, convert coordinates and display maps the easy way. Of course, you need some Python experience and a server to run it on, but once you've got the hang of Django it works fast and well.
If you just want a solution for your problem, try grouping your results at lower zoom levels; a good example of this implementation can be found here.
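One simple way to group results at lower zoom levels is grid clustering: snap every point to a square cell and show one marker per cell, with a count. The sketch below is a minimal version of that idea; the cell size in degrees and the function name are assumptions for the example, and a real implementation would pick the cell size from the zoom level.

```python
from collections import defaultdict


def cluster_by_grid(points, cell_deg):
    """Group (lat, lon) points into square cells and return one centroid per cell."""
    cells = defaultdict(list)
    for lat, lon in points:
        # Floor division gives a stable cell index, also for negative coordinates.
        cells[(lat // cell_deg, lon // cell_deg)].append((lat, lon))
    clusters = []
    for members in cells.values():
        count = len(members)
        clusters.append((
            sum(lat for lat, _ in members) / count,
            sum(lon for _, lon in members) / count,
            count,
        ))
    return clusters


# Hypothetical points: two close together, one far away.
points = [(40.1, -74.1), (40.3, -74.3), (34.0, -118.2)]
clusters = cluster_by_grid(points, cell_deg=1.0)
print(clusters)
```

Each returned tuple is (latitude, longitude, count), so the client places one marker per cluster instead of one per point.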

This is a tough one. You can use custom tilesets with Google Maps, but you still need some way to generate the tiles (other than manually).
I'm afraid that's all I've got =/

OpenLayers is a great JavaScript frontend to multiple mapping services or your own map servers. Version 2.7 was just released, which adds some pretty amazing features and controls.
