Pagination of text from xml file onto html page

O.K. so I'm developing a website to feature my fiction writings. I'm putting all of my documents into XML files, pulling and parsing them from the server with PHP and displaying them on the page. You can visit the page here for an example.
As the background image implies, what I would like to do is take the text and split it into two columns (with the text from the first spilling into the second), then paginate the overflow so that no scrolling is necessary. In other words, I'd like the text to read like a book, with the paging based on how long the body of the XML document is.
I would like this to be done on the server side using PHP or something similar. Is there a way I can do this with an XSL stylesheet or a server-side script? I've been looking everywhere and can't seem to find anything.
Any help is appreciated.
Mr. Mutant

This is a surprisingly hard problem in general, and it's one you'll have no end of trouble with if you try to do it on the server. The problem with paginating HTML text is that where the page breaks go is entirely contingent on the client. The server doesn't know the client's screen resolution, font selection, or window size, and apart from the text itself those are the variables the answer depends on.
I'd be surprised if at this point there weren't some jQuery library that just does this, but when I had to implement it myself about 7 years ago, here's the approach I took:
Create a div for each column. Each one contains the entirety of the document text. Style the divs with a fixed line height. Put the column divs at the bottom of the document's z-order. Now you can lay out the rest of the page, leaving holes of known size in the layout that the divs can show through, and by manipulating the vertical position of each div you can control which line is the first to appear inside a given hole.
You can then let the client manipulate the font size, and as long as you recalculate the height of the holes and then reposition the divs properly, it will all magically work.
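The positioning arithmetic behind this approach can be sketched roughly as follows. This is only an illustration, not the original implementation: the function names, the parameters, and the assumption that every hole has the same height are mine.

```javascript
// Hypothetical helpers; names and parameters are illustrative assumptions.

// For a given page, work out how far each column's full-text div must be
// shifted upward so that the right line appears at the top of its hole.
function columnShifts(holeHeightPx, lineHeightPx, columns, pageIndex) {
  // Only whole lines should show through a hole, so round down.
  const linesPerHole = Math.floor(holeHeightPx / lineHeightPx);
  const shiftsUpPx = [];
  for (let col = 0; col < columns; col++) {
    // Index of the first text line visible in this column on this page.
    const firstLine = (pageIndex * columns + col) * linesPerHole;
    // Apply in the browser as: div.style.top = -shift + "px".
    shiftsUpPx.push(firstLine * lineHeightPx);
  }
  return { linesPerHole, shiftsUpPx };
}

// Number of pages needed for a text that wraps to totalLines lines.
function pageCount(totalLines, linesPerHole, columns) {
  return Math.ceil(totalLines / (linesPerHole * columns));
}

// Example: 400px holes, 20px line height, two columns, second page.
console.log(columnShifts(400, 20, 2, 1)); // shifts of 800 and 1200 pixels
```

When the font size changes, the same functions can be called again with the new line height and hole height; setting each hole's height to linesPerHole * lineHeightPx keeps a partial line from showing at the bottom.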
There may be ways of doing this in HTML5 that are easier; I would definitely look into that.

Related

wkhtmltopdf dynamic margins

Is it possible, using wkhtmltopdf, to define dynamic margins?
I'm generating a PDF for an invoice. On the first page I have the BVR in the footer so the user can cut it out and pay with it. On pages other than the first one, I have no footer.
The problem is that, when there is enough content to fill the second page completely, the page break still occurs at the footer margin, leaving the bottom third of the page empty.
Is it possible to define (with JavaScript or some other way) a dynamic margin size for the first page, and then remove the bottom margin for all the others?

Nope. You can create a feature request at https://github.com/wkhtmltopdf/wkhtmltopdf/issues, but I don't think this is coming any time soon because the underlying technology (webkit) has stabilised (doesn't change often) and doesn't really need this feature.
As a workaround, how about examining the content and, if it is long enough to run past the first page, splitting the HTML into two files and converting them individually? Then you can join the resulting PDFs, with pdftk for example.

Converting from div to table or parsing PDF in PHP

I'm developing a WebApp in which I take an invoice converted from PDF to HTML, then parse the invoice lines.
I have a div in my main window which displays the contents.
But when I display the contents from the invoice in that div, all the contents appear overlapped.
In the converted invoice there is no table, only divs with absolute positioning. I can't do it any other way, at least with this approach, because that's the way the converter works.
So, as a solution I'm converting from "div to table", trying to decide when there is a change of row or not, based on the top parameter from the corresponding div.
However, besides the invoice data, I also have the invoice header, and I'm having difficulty deciding whether a given div belongs to the same table or not.
But so far, I think the solution passes through making 3 tables, one for the company logo, one for the header, and one for the data.
But I need all these tables to appear in the correct positions and with the correct sizes.
At the moment, I'm not allowed to paste invoice examples, and as I'm stuck at an early stage (close to the algorithm stage), I don't think any examples of my code or of the invoices would help anyone understand the situation better.
But I promise to update this with examples soon.
As an alternative solution I could parse the PDF myself, but I haven't found a way to do it so far.
I'm using PHP to make the WebApp and verypdf pdf2html to make the conversion.
I know that with so little information it is hard to get help.
Any ideas are welcome.

How about trying to cure the overlapping itself? For example, you could strip all the styling information from the DIVs after the PDF is converted into DIVs. Then you can apply your own styles.
It might be useful to know if all the invoices are in the same format/arrangement, or not.
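For reference, the row-detection idea described in the question (deciding on a new row based on each div's top value) could be sketched like this. The input shape { top, left, text } and the pixel tolerance are assumptions for illustration; the converter's real output would first have to be parsed into that shape.

```javascript
// Hypothetical input: one { top, left, text } object per absolutely
// positioned div, with top and left in pixels.
function groupIntoRows(cells, tolerancePx = 2) {
  // Sort top to bottom so that cells of the same row end up adjacent.
  const sorted = [...cells].sort((a, b) => a.top - b.top || a.left - b.left);
  const rows = [];
  for (const cell of sorted) {
    const last = rows[rows.length - 1];
    // A cell whose top is within the tolerance of the current row's first
    // cell is treated as part of that row; otherwise a new row starts.
    if (last && Math.abs(cell.top - last.top) <= tolerancePx) {
      last.cells.push(cell);
    } else {
      rows.push({ top: cell.top, cells: [cell] });
    }
  }
  // Within each row, order the cells left to right and keep only the text.
  return rows.map((row) =>
    row.cells.sort((a, b) => a.left - b.left).map((c) => c.text)
  );
}

const invoiceCells = [
  { top: 100, left: 200, text: "2" },
  { top: 101, left: 10, text: "Widget" },
  { top: 120, left: 10, text: "Gadget" },
  { top: 120, left: 200, text: "5" },
];
console.log(groupIntoRows(invoiceCells));
// [ [ 'Widget', '2' ], [ 'Gadget', '5' ] ]
```

The tolerance absorbs small vertical offsets between divs on the same line. Telling the header apart from the data table would need an extra rule, such as a large vertical gap between consecutive rows.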

alternative to display:none for mobile

I'm currently building a practice responsive website. What I am doing is taking an existing website and rebuilding it using the Twitter Bootstrap JS and CSS, meaning it will be fully responsive for mobile.
The issue is that there are some large carousels and images on the site. Ideally I would like to just completely remove certain elements, like a carousel for instance, and instead have the options within the carousel as a standard list menu.
It seems the main option is display:none based on media queries, but I am starting to foresee that I will run into big problems for loading time if the entire desktop site is still going to be loaded on the mobile, only elements hidden.
Are there ways to completely exclude html based on browser size? If anyone has any good links or articles that would be great. Or even just opinions, on whether there is actually need to exclude html or not.
Thank you

First off it is really good to see that although you're talking about display:none; you actually still want to display the content without the bells and whistles of the image. Well done you.
The next thing I would look at is if you don't want to load images for a mobile then why are you adding it for the larger sites. If the image isn't providing a function, assisting in explaining the content better, then why not just drop it for the desktop size as well?
If in fact it does help tell a story, then you can include the images and use one of the popular responsive image solutions like Adaptive Images, hiSRC, or PictureFill, which will serve the mobile version of the image first and replace it with a larger image at larger viewports (but remember, there's no bandwidth test).
Finally, if you do want to serve some different content, then take the advice of fire around including more content with ajax. The South Street toolbox from Filament group can help you out, pay particular attention to the AjaxInclude pattern (it also has a link to the picturefill).

You could consider storing heavy data JSON-encoded, and then creating elements and loading them on demand like so
var heavyImage = new Image();
heavyImage.src=imageList[id];
Then you can append the image element to the desired block. From my experience with mobiles this is more robust than requesting <img> via AJAX, since AJAX can be pretty slow sometimes.
You may also 'prefetch' images with this method (like 2-3 adjacent to visible at the moment), thus improving UX.
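A rough sketch of that prefetching idea is shown below. The helper names and the default radius of 2 are assumptions; imageList is the array of image URLs from the snippet above.

```javascript
// Hypothetical helper: which neighbouring images to prefetch around the
// one currently visible, nearest first.
function indicesToPrefetch(current, total, radius = 2) {
  const result = [];
  for (let offset = 1; offset <= radius; offset++) {
    if (current + offset < total) result.push(current + offset);
    if (current - offset >= 0) result.push(current - offset);
  }
  return result;
}

// Creating Image objects and setting src makes the browser start fetching.
function prefetch(imageList, current, radius = 2) {
  // Image only exists in browsers; do nothing elsewhere (for example Node).
  if (typeof Image === "undefined") return [];
  return indicesToPrefetch(current, imageList.length, radius).map((i) => {
    const img = new Image();
    img.src = imageList[i];
    return img;
  });
}

console.log(indicesToPrefetch(2, 5)); // [ 3, 1, 4, 0 ]
```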

You could pull in the heavy elements via AJAX so they wouldn't sit on the page initially, making it load faster. You could decide to do the AJAX call only if the screen size is larger than X.

If you want you can use visibility:hidden, or if you use jQuery you can use
$(element).remove() //to remove completely
$(element).hide() //to hide
$(element).fadeOut(1) //to fadeout

a way to implement facebook's functionality of link sharing

I am looking for a way to create functionality similar to what happens when you post a link to an existing website on Facebook. If this statement is rather ambiguous, I will try to elaborate.
When you paste your link and submit your post, Facebook shows, together with your link, a small preview of the page you are posting (text and maybe a small image).
What are the ways to achieve this?
I read the similar post, but the thing is that I do not need an image so much, text will be sufficient.
Working in PHP, but language is not important, because I am looking for a high level idea.
Previously I was thinking about parsing content of the link with cURL but the thing is that in a lot of situations the text returned by facebook is not available on the page.
Are there other ways?

From what I can tell, Facebook pulls from the meta name="description" tag's content attribute on the linked page.
If no meta description tag is available, it seems to pull from the beginning of the first paragraph <p> tag it can find on the page.
Images are pulled from available <img> tags on the page, with a carousel selection available to pick from when posting.
Finally, the link subtext is also user-editable (start a status update, include a link, and then click in the link subtext area that appears).
Personally I would go with such a route: cURL the page, parse it for a meta tag description and if not grab some likely data using a basic algorithm or just the first paragraph tag, and then allow user editing of whatever was presented (it's friendlier to the user and also solves issues with different returns on user-agent). Do the user facing control as ajax so that you don't have issues with however long it takes your site to access the link you want to preview.
I'd recommend using a DOM library (you could even use DOMDocument if you're comfortable with it and know how to handle possibly malformed html pages) instead of regex to parse the page for the <meta>, <p>, and potentially also <img> tags. Building a regex which will properly handle all of the myriad potential different cases you will encounter "in the wild" versus from a known set of sites can get very rough. QueryPath usually comes recommended, and there are stackoverflow threads covering many of the available options.
Most modern sites, especially larger ones, are good about populating the meta description tag, especially for dynamically generated pages.
You can scrape the page for <img> tags as well, but you'll want to then host the images locally: You can either host all of the images, and then delete all except the one chosen, or you can host thumbnails (assuming you have an image processing library installed and turned on). Which you choose depends on whether bandwidth and storage are more important, or the one-time processing of running an imagecopyresampled, imagecopyresized, Gmagick::thumbnailimage, etc, etc. (pick whatever you have at hand/your favorite). You don't want to hot link to the images on the page due to both the morality of it in terms of bandwidth and especially the likelihood of ending up with broken images when linking any site with hotlink prevention (referrer/etc methods), or from expiration/etc. Personally I would probably go for storing thumbnails.
You can wrap the entire link entity up as an object for handling expiration/etc if you want to eventually delete the image/thumbnail files on your own server. I'll leave particular implementation up to you since you asked for a high level idea.
"but the thing is that in a lot of situations the text returned by facebook is not available on the page."
Have you looked at the page's meta tags? I've tested with a few pages so far and this is generally where content not otherwise visible on the rendered linked pages is coming from, and seems to be the first choice for Facebook's algorithm.
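The fallback order described above (meta description first, then the first paragraph) might look something like the sketch below. To stay dependency-free it uses simple pattern matching, which is exactly what the answer advises against for real-world pages; treat it as an illustration of the order of checks only, and swap in a proper DOM parser for production. The function names and the 200-character limit are assumptions.

```javascript
// Read one quoted attribute value from a single tag string.
function attr(tag, name) {
  const pattern = new RegExp(
    "\\s" + name + "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')",
    "i"
  );
  const m = pattern.exec(tag);
  if (!m) return null;
  return m[1] !== undefined ? m[1] : m[2];
}

// Hypothetical helper: pick preview text from already-downloaded HTML.
function extractPreview(html, maxLength = 200) {
  let description = null;

  // 1. Prefer <meta name="description" content="...">.
  const metaTags = html.match(/<meta\b[^>]*>/gi) || [];
  for (const tag of metaTags) {
    const name = attr(tag, "name");
    if (name && name.toLowerCase() === "description") {
      description = attr(tag, "content");
      break;
    }
  }

  // 2. Otherwise fall back to the first <p>, with inner tags stripped.
  if (!description) {
    const p = /<p\b[^>]*>([\s\S]*?)<\/p>/i.exec(html);
    if (p) description = p[1].replace(/<[^>]+>/g, "");
  }

  if (!description) return null;

  // Collapse whitespace and cut overly long text.
  const text = description.replace(/\s+/g, " ").trim();
  return text.length > maxLength
    ? text.slice(0, maxLength).trimEnd() + "..."
    : text;
}
```

Whatever this returns would then be shown to the user as an editable default, as suggested above.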

Full disclosure upfront, I'm a developer at ThumbnailApp.com.
It's a JSON API service with an optional JavaScript SDK which I think does exactly what you're after: it will parse a string to detect any URLs and return the title, description and thumbnail of the asset. If the page has OpenGraph tags, it will use those for the image thumbnail. It's currently in private beta but we're adding more accounts each week.
If you feel that you really need a do-it-yourself solution:
Check out the Python-based Webkit2Png and the headless browser PhantomJS. They can render webpages to an image (default size is 800x600), then you'll have to write some code to resize and crop the image like taswyn mentioned. Ideally you would then upload the resized image to Amazon S3 and get it hosted on a CDN such as CloudFront.
To get the title and description, first get the URL content (cURL or whatever); you will need to check the Content-Type header to make sure it's a webpage. If it is, you can then use an HTML parser such as the SimpleHTMLDOM PHP library to grab the title and description metadata. If you want it exactly like Facebook, you will also need to check for any OpenGraph tags, specifically the og:image tag.
Also don't forget about caching. The first render and description parsing can take a long time. Even if your site is fast, the webpage you're rendering could be slow and the best approach is to render / parse it once, then just save and return the resized image and meta data for subsequent requests. Depending on what your requirements are you may need to refresh the cached data every hour or you could get away with refreshing it once a day.
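As a minimal sketch of that caching idea, assuming an in-memory store and a fixed time-to-live (a real deployment would more likely persist previews in a database or on disk), something like the following could work. The class name and the injectable clock are illustrative assumptions.

```javascript
// Hypothetical in-memory cache for parsed previews, keyed by URL.
class PreviewCache {
  // ttlMs: how long an entry stays fresh. now: injectable clock for testing.
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map();
  }

  // Store the result of an expensive render/parse.
  set(url, preview) {
    this.entries.set(url, { preview, storedAt: this.now() });
  }

  // Return the cached preview, or undefined if missing or expired.
  get(url) {
    const entry = this.entries.get(url);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt >= this.ttlMs) {
      this.entries.delete(url);
      return undefined;
    }
    return entry.preview;
  }
}

// One-hour freshness, matching the hourly refresh mentioned above.
const cache = new PreviewCache(60 * 60 * 1000);
cache.set("http://example.com/", { title: "Example", description: "..." });
```

On a cache miss (get returns undefined) the caller renders and parses the page once, then calls set so that later requests are served from the cache.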
To do it yourself takes quite a bit of work and lots of server configuration. I feel using a 3rd party service is a better way to go, but obviously I have a biased opinion :)

Can we measure height of a div using php?

Can we measure height of a div using php?

This is not possible at all: PHP serves HTML code. The browser renders it. Only after it is rendered, can height be determined reliably. Different browsers may end up with different heights. Different user settings (like font size) may end up with different heights.
The only way to find out an element's height is using JavaScript which runs in the browser. You can theoretically send the results back to a separate PHP script using Ajax, but I doubt that'll make much sense.
You could use jQuery's .height() like so:
var height = $("#elementID").height();
(There are native JavaScript approaches to this as well, but they tend to be a bit long and complicated.)
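For comparison, one native approach in current browsers is getBoundingClientRect(). The helper below is a small sketch with a hypothetical name. Note that it reports the rendered height including padding and border, whereas jQuery's .height() returns the content height only, so the two numbers can differ.

```javascript
// Hypothetical helper: rendered height of an element in CSS pixels,
// including padding and border.
function measureHeight(element) {
  return element.getBoundingClientRect().height;
}

// In a browser:
//   var height = measureHeight(document.getElementById("elementID"));
```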

As others have said here, you cannot use PHP to read the height/width of a div already rendered. However, aside from the javascript options already presented keep in mind that you can use PHP to set the height/width of a div before it is sent to the browser. This would be in the form of an inline style of course. This is not the most elegant solution and to be honest I would avoid it, but you did not state what specifically it is that you want to do, and why.
Not sure if that info will help you in your implementation but it wasn't mentioned so far and thought I would contribute it.

No, we cannot. A div is rendered by the browser based on CSS/JS, and the result can differ between browsers (IE, Firefox). It does not depend on PHP.

In case you are using text inside the div, you could use strlen() to get some kind of measurement of the height. I say "some kind" of course because you are just counting the number of characters, which can then be equated to some height depending on the font size of the text, the line height, and the width of the div.
Let's say one screen height can show 2000 characters on your website.
If you count 4000 characters, you have 2 screen heights.
954 characters = almost half of a screen height ...
I have used this method once to calculate the number of ads I could place in the side banners on a blog-style website with mainly text content on it ...
The height of a vertical ad was about one screen height. If the text that needed to be output was 7000 characters, I knew I had room for 3 ads ...
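The arithmetic in this answer can be written down as a small sketch. The 2000 characters per screen height is the figure from the example above and would have to be calibrated for a real layout; the function names are hypothetical. The answer uses PHP's strlen(), which counts bytes, while the JavaScript length property used here counts UTF-16 code units, so results differ for non-ASCII text.

```javascript
// Rough estimate of how many screen heights a text occupies.
function estimateScreenHeights(text, charsPerScreen = 2000) {
  return text.length / charsPerScreen;
}

// One vertical ad takes about one screen height, so only count whole screens.
function adSlots(text, charsPerScreen = 2000) {
  return Math.floor(estimateScreenHeights(text, charsPerScreen));
}

console.log(adSlots("x".repeat(7000))); // 3
console.log(estimateScreenHeights("x".repeat(4000))); // 2
```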
