I'm working with PDFLib (9.0.4) with PHP 5.5 to create a very large table with a lot of very small cells.
I'm aware that PDFLib uses a special algorithm to fit the table into a specified space. I would like to know how to prevent some cells from shrinking.
My current problem is that, from time to time, certain cells are not shrunk the same way as others even though those cells are empty.
I tried playing with the column widths, margins, and so on; nothing really worked.
I tried to play with the horshrinking and vertshrinking options when I call the PDF_fit_table function. But those options are too general.
I'm looking for a way to prevent only certain cells from shrinking.
Thank you for your time.
EDIT
This is not a problem of a shrinking cell but of a stretching one.
I've got the answer to the question as asked: you just have to specify the vertshrinklimit and horshrinklimit options in the PDF_fit_table call.
But I posted the wrong question. My problem is not a shrinking cell but a stretching one: my cell is empty and fit_table stretches it when that is not wanted.
I've found the solution to my problem.
By default, when the parameter was not provided, I was prepending a colwidth to every cell's options, even for cells with a colspan.
So, if one of my table chunks ended with a colspan cell that had a colwidth specified, that colwidth was applied to all the cells above it.
The solution was to test whether the cell options string contains the colspan parameter. If so, no colwidth parameter is prepended.
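The check itself is only string handling, so here is a rough, language-neutral sketch of it in Python rather than the original PHP. The helper name `with_default_colwidth` and the default value `50` are made up for illustration; only `colspan` and `colwidth` come from the post above.

```python
def with_default_colwidth(cell_options: str, default_colwidth: str = "50") -> str:
    """Prepend a default colwidth unless the cell sets one or spans columns.

    `cell_options` is assumed to be a whitespace-separated "name=value" option
    string; the default value "50" is an arbitrary placeholder.
    """
    # Collect the option names that appear in the string.
    option_names = {token.split("=", 1)[0] for token in cell_options.split()}
    if "colspan" in option_names or "colwidth" in option_names:
        # Spanning cells (and cells with an explicit width) are left alone.
        return cell_options
    return f"colwidth={default_colwidth} {cell_options}".strip()


print(with_default_colwidth("colspan=3 rowheight=12"))
print(with_default_colwidth("rowheight=12"))
```

Splitting on whitespace is a simplification: it looks at option names only and ignores nesting, which is enough for the "does this cell span columns?" test described above.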
Related
I need to generate some pretty large Excel files, and I was thinking of switching from PHPExcel to Spout, since it seems to be much more efficient. I have been able to find every feature I needed except one: how to format a cell as a date. It seems to treat everything as a string by default. For numbers I have found that using intval() or floatval() forces it to consider the value a number, but is there anything similar for dates?
The only workaround I have found so far is to convert the date to a number using (strtotime($datestr)/86400)+25569.4167, but then the column has to be formatted as a date manually after exporting the file, and the users will not accept that.
There is no way to format a cell as a date for now. You can always pass a date string (like "03/03/2017"); Excel is usually pretty good at recognizing that this is a date.
Your workaround indeed requires a manual step to configure the column as a date, so I would not recommend doing this.
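For anyone wondering where the numbers in that workaround come from: 25569 is the Excel serial number of 1970-01-01 in the default 1900 date system, and 86400 is the number of seconds in a day. Below is a small sketch of the arithmetic in Python (not Spout code; the function name is made up). The extra .4167 in the question is roughly ten hours, presumably a local timezone correction, so it is left out here.

```python
from datetime import datetime, timezone

EXCEL_SERIAL_OF_UNIX_EPOCH = 25569  # serial number of 1970-01-01 (1900 date system)
SECONDS_PER_DAY = 86400


def unix_to_excel_serial(unix_seconds: float) -> float:
    """Convert a Unix timestamp (UTC) to an Excel serial date number."""
    return unix_seconds / SECONDS_PER_DAY + EXCEL_SERIAL_OF_UNIX_EPOCH


midnight_utc = datetime(2017, 3, 3, tzinfo=timezone.utc).timestamp()
print(unix_to_excel_serial(midnight_utc))  # 42797.0
```

The result is still just a number; the cell only displays as a date once a date number format is applied, which is exactly the manual step the answer warns about.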
In the end, I found this pull request on GitHub, https://github.com/box/spout/pull/209, which adds the option to format dates and, amongst other things, to format cells individually. I know this is not an official release, so it is "use at your own risk", but it was just what I needed, so I thought I'd add the link in case someone else is in the same situation. A warning, though: it does break setting the background color for both a cell and a row, but in my case that wasn't a problem.
I am working with a fairly large, complex spreadsheet (there are 6 sheets, each with 200-400 rows) and am having trouble getting the correct values out of some cells.
My workflow is roughly:
1) User data is inputted on the front-end
2) Data is validated and then placed into certain cells on the spreadsheet
3) Calculations in other cells reference the user-input cells
4) I use getCalculatedValue on particular cells to retrieve the necessary values
5) For debug purposes I then save out the modified spreadsheet so that I can easily see that the data has been inputted and generated correctly.
PHPExcel has been working great, but I have run into an issue where the getCalculatedValue method (step 4) returns an incorrect value, yet when I inspect the spreadsheet that has been saved out (step 5) the values are correct.
The calculations consist of general mathematical equations, IF conditions, some date manipulation and multiple VLOOKUPs.
I am currently picking my way through the calculations in order to trace the issue, but was wondering if there may be a simpler solution to this that I am not aware of. Perhaps some setting that affects the outcome of various different calculations? This may even be a subtle change in calculations that is subsequently snow-balling into a bigger change further down the line.
Thanks in advance.
Turned out to be a syntax error in the spreadsheet that I was provided.
A round function was being used like so:
ROUND(NUMBER,)
Excel compensated for this by using 0 as the second parameter, whereas PHPExcel (quite correctly) didn't.
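If you have to hunt for this kind of thing across several sheets, one option is to dump the formula strings and flag any call that ends with an empty argument. A rough sketch in Python (the function name is made up, and a plain regex like this will also flag a `,)` that happens to sit inside a quoted string, so treat hits as candidates only):

```python
import re

# A comma followed, after optional whitespace, by a closing parenthesis.
TRAILING_EMPTY_ARGUMENT = re.compile(r",\s*\)")


def has_trailing_empty_argument(formula: str) -> bool:
    """Return True if some function call in the formula ends with an empty argument."""
    return bool(TRAILING_EMPTY_ARGUMENT.search(formula))


for formula in ["=ROUND(A1,)", "=ROUND(A1, 0)", "=IF(B2>0, ROUND(B2, ), 0)"]:
    print(formula, has_trailing_empty_argument(formula))
```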
How do I best choose a size for a varchar/text/... column in a (MySQL) database? Let's assume the text the user can type into a text area should be at most 500 characters, and that the user might also use formatting (HTML/BB code/...), which is not visible to the user and should not count towards the 500-character limit.
1) Theoretically, to prevent any error, the varchar size would have to be almost endless if the user, for example, uses 20 links like this (http://[huge number of chars]) or whatever. Or not?
2) Should/could you save the formatting in a separate column, so that, for example, an index (like FULLTEXT) does not get wrong values (words that are contained in the formatting but not in the real text)?
If yes, how is this best done? Do you remember the position at which the formatting was used, save that position together with the formatting, and put this information back together when outputting?
(PHP/MySQL, JavaScript, jQuery)
Thank you very much in advance!
A good solution is to factor in the amount of formatting characters.
If you do not, then to avoid data loss you need to allow much more space for the text in the database and check the length of the record before saving, or use full text.
Keeping the same data twice in one table is not a good solution. It all depends on your project, but it is usually better to filter out the formatting in PHP.
Do you know of a good solution for dealing with variable-length text entries in lists with a fixed width?
I have an unordered list with a set number of items. Each item contains a title. The title can be of variable length, not only in number of characters, but also in character set (Korean, Japanese, Roman, etc.).
One option seems to be cutting the text length down with PHP and adding "..." at the end, but since character widths can vary, the exact cutoff point can vary too. Another would be to give the items a fixed width and hide the overflow, but this seems inelegant (because characters might get cut off right through the middle...).
Do you know of a good tutorial or solution for something like this?
Using CSS: text-overflow: ellipsis. Doesn't work in all browsers though. More information: http://www.quirksmode.org/css/textoverflow.html.
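If the cut has to happen server-side after all, one rough approach is to count East Asian wide characters as two columns and everything else as one. The sketch below shows the idea in Python (the function names are made up). It is only an approximation: real glyph widths depend on the font, and combining marks are counted as one column here.

```python
import unicodedata


def char_width(ch: str) -> int:
    """Approximate width in columns: 2 for East Asian wide/fullwidth, else 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def display_width(text: str) -> int:
    return sum(char_width(ch) for ch in text)


def truncate_to_width(text: str, max_width: int, ellipsis: str = "...") -> str:
    """Cut text so that it, plus the ellipsis, fits into max_width columns."""
    if display_width(text) <= max_width:
        return text
    budget = max_width - display_width(ellipsis)
    kept, used = [], 0
    for ch in text:
        if used + char_width(ch) > budget:
            break
        kept.append(ch)
        used += char_width(ch)
    return "".join(kept) + ellipsis


print(truncate_to_width("hello world", 8))
print(truncate_to_width("한국어 제목입니다", 8))
```

The same counting could be done in PHP; the point is only that the cutoff is measured in columns rather than in characters.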
Short question: How do I automatically detect whether a CSV file has headers in the first row?
Details: I've written a small CSV parsing engine that places the data into an object that I can access as (approximately) an in-memory database. The original code was written to parse third-party CSV with a predictable format, but I'd like to be able to use this code more generally.
I'm trying to figure out a reliable way to automatically detect the presence of CSV headers, so the script can decide whether to use the first row of the CSV file as keys / column names or start parsing data immediately. Since all I need is a boolean test, I could easily specify an argument after inspecting the CSV file myself, but I'd rather not have to (go go automation).
I imagine I'd have to parse the first 3 to ? rows of the CSV file and look for a pattern of some sort to compare against the headers. I'm having nightmares of three particularly bad cases in which:
1) The headers include numeric data for some reason
2) The first few rows (or large portions of the CSV) are null
3) The headers and data look too similar to tell them apart
If I can get a "best guess" and have the parser fail with an error or spit out a warning if it can't decide, that's OK. If this is something that's going to be tremendously expensive in terms of time or computation (and take more time than it's supposed to save me) I'll happily scrap the idea and go back to working on "important things".
I'm working with PHP, but this strikes me as more of an algorithmic / computational question than something that's implementation-specific. If there's a simple algorithm I can use, great. If you can point me to some relevant theory / discussion, that'd be great, too. If there's a giant library that does natural language processing or 300 different kinds of parsing, I'm not interested.
As others have pointed out, you can't do this with 100% reliability. There are cases where getting it 'mostly right' is useful, however - for example, spreadsheet tools with CSV import functionality often try to figure this out on their own. Here are a few heuristics that would tend to indicate the first line isn't a header:
The first row has columns that are not strings or are empty
The first row's columns are not all unique
The first row appears to contain dates or other common data formats (eg, xx-xx-xx)
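A minimal sketch of those three checks, written in Python since the question is more algorithmic than PHP-specific. The function name and both regular expressions are assumptions made for illustration; the date pattern only covers simple forms like xx-xx-xx.

```python
import re

NUMBER = re.compile(r"^[+-]?(\d+(\.\d*)?|\.\d+)$")
DATE_LIKE = re.compile(r"^\d{1,4}[-/.]\d{1,2}[-/.]\d{1,4}$")


def first_row_looks_like_data(first_row):
    """Return True if any heuristic suggests the first row is NOT a header."""
    cells = [cell.strip() for cell in first_row]
    # 1. Columns that are empty or not plain strings (here: numbers).
    if any(cell == "" or NUMBER.match(cell) for cell in cells):
        return True
    # 2. Column names should all be unique.
    if len(set(cells)) != len(cells):
        return True
    # 3. Values that look like dates.
    if any(DATE_LIKE.match(cell) for cell in cells):
        return True
    return False


print(first_row_looks_like_data(["ID", "Name", "Email", "Date"]))  # False
print(first_row_looks_like_data(["1", "john", "03-03-17"]))        # True
```

A `False` result only means that nothing spoke against a header, not that one is definitely there.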
In the most general sense, this is impossible. This is a valid csv file:
Name
Jim
Tom
Bill
Most csv readers will just take hasHeader as an option, and allow you to pass in your own header if you want. Even in the case you think you can detect, that being character headers and numeric data, you can run into a catastrophic failure. What if your column is a list of BMW series?
M
3
5
7
You will process this incorrectly. Worst of all, you will lose the best car!
In the purely abstract sense, I don't think there is a foolproof algorithmic answer to your question since it boils down to: "How do I distinguish dataA from dataB if I know nothing about either of them?". There will always be the potential for dataA to be indistinguishable from dataB. That said, I would start with the simple and only add complexity as needed. For example, if examining the first five rows, for a given column (or columns) if the datatype in rows 2-5 is the same throughout but differs from the datatype in row 1, there's a good chance that a header row is present (increased sample sizes reduce the possibility of error). This would (sorta) solve #1/#3 - perhaps throw an exception if the rows are all populated but the data is indistinguishable to allow the calling program to decide what to do next. For #2, simply don't count a row as a row unless and until it pulls non-null data....that would work in all but an empty file (in which case you'd hit EOF). It would never be foolproof, but it might be "close enough".
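Here is roughly what that could look like, as a sketch in Python. The names `guess_has_header` and `HeaderUndecided` are made up, and the type detection is deliberately crude (empty, number, text). Rows without any data are skipped before sampling, and the function raises when the first row cannot be told apart from the rest so the caller can decide what to do.

```python
import csv
import io


class HeaderUndecided(Exception):
    """Raised when the sample gives no evidence either way."""


def cell_type(value):
    """Classify a cell as 'empty', 'number' or 'text'."""
    value = value.strip()
    if value == "":
        return "empty"
    try:
        float(value)
        return "number"
    except ValueError:
        return "text"


def guess_has_header(csv_text, sample_size=5):
    # Skip rows that hold no data, so leading blank rows are not counted.
    rows = [row for row in csv.reader(io.StringIO(csv_text))
            if any(cell.strip() for cell in row)][:sample_size]
    if len(rows) < 2:
        raise HeaderUndecided("need at least two non-empty rows")

    first, rest = rows[0], rows[1:]
    first_row_is_data = False
    for col, head in enumerate(first):
        body_types = {cell_type(row[col]) for row in rest if col < len(row)}
        body_types.discard("empty")
        if len(body_types) != 1:
            continue  # mixed or empty column: no evidence
        (body_type,) = body_types
        head_type = cell_type(head)
        if head_type == "empty":
            continue
        if head_type != body_type:
            return True  # row 1 differs from an otherwise uniform column
        if body_type == "number":
            first_row_is_data = True  # a number on top of a numeric column
    if first_row_is_data:
        return False
    raise HeaderUndecided("first row is indistinguishable from the data")


print(guess_has_header("id,name\n1,jim\n2,tom\n3,bill\n"))  # True
print(guess_has_header("1,jim\n2,tom\n3,bill\n"))           # False
```

A file whose columns are all text in every row ends up in the exception branch, which matches the "let the calling program decide" suggestion.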
It really depends on just how "general" you want your tool to be. If the data will always be numeric, you have it easy as long as you assume non-numeric headers (which seems like a pretty fair assumption).
But beyond that, if you don't already know what patterns are present in the data, then you can't really test for them ahead of time.
FWIW, I actually just wrote a script for parsing out some stuff from TSVs, all from the same source. The source's approach to headers/formatting was so scattered that it made sense to just make the script ask me questions from the command line while executing. (Is this a header? Which columns are important?). So no automation, but it lets me fly through the data sets I'm working on, instead of trying to anticipate each funny formatting case. Also, my answers are saved in a file, so I only have to be involved once per file. Not ideal, but efficient.
This article provides some good guidance:
Basically, you do statistical analysis on columns based on whether the first row contains a string and the rest of the rows numbers, or something like that.
http://penndsg.com/blog/detect-headers/
If your CSV has a header like this:
ID, Name, Email, Date
1, john, john#john.com, 12 jan 2020
Then doing a filter_var(str, FILTER_VALIDATE_EMAIL) on the header row will fail, since the email address is only in the row data. So check the header row for an email address (assuming your CSV has email addresses in it).
Second idea.
http://php.net/manual/en/function.is-numeric.php
Check header row for is_numeric, most likely a header row does not have numeric data in it. But most likely a data row would have numeric data.
If you know you have dates in your columns, then checking the header row for a date would also work.
Obviously you need to know what type of data you are expecting. I am "expecting" email addresses.
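Put together, the two ideas look something like the sketch below. It is in Python rather than PHP, so the regex and the float() test are only stand-ins for filter_var and is_numeric and do not behave identically. The function name and the `email_column` parameter are made up; the caller is assumed to know which column holds the addresses.

```python
import re

# Very loose email shape check, not a full validator.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def looks_numeric(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def first_row_is_header(first_row, email_column):
    """Guess that row 1 is a header if it has no email address and no numbers."""
    cells = [cell.strip() for cell in first_row]
    has_email = bool(EMAIL.match(cells[email_column]))
    has_number = any(looks_numeric(cell) for cell in cells)
    return not has_email and not has_number


print(first_row_is_header(["ID", "Name", "Email", "Date"], email_column=2))
print(first_row_is_header(["1", "john", "john@example.com", "12 jan 2020"], email_column=2))
```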