I am working with a fairly large, complex spreadsheet (there are 6 sheets, each with 200-400 rows) and am having trouble getting the correct values out of some cells.
My workflow is roughly:
1. User data is entered on the front end
2. The data is validated and then placed into certain cells of the spreadsheet
3. Calculations in other cells reference the user-input cells
4. I call getCalculatedValue on particular cells to retrieve the values I need
5. For debugging purposes I then save out the modified spreadsheet, so that I can easily check that the data has been entered and generated correctly
PHPExcel has been working great, but I have run into an issue where the getCalculatedValue method (step 4) returns an incorrect value, yet when I inspect the spreadsheet that has been saved out (step 5) the values are correct.
The calculations consist of general mathematical equations, IF conditions, some date manipulation and multiple VLOOKUPs.
I am currently picking my way through the calculations to trace the issue, but I was wondering whether there is a simpler solution that I am not aware of. Perhaps some setting that affects the outcome of various calculations? It may even be a subtle difference in one calculation that snowballs into a bigger discrepancy further down the line.
Thanks in advance.
It turned out to be a syntax error in the spreadsheet that I was provided with.
A round function was being used like so:
ROUND(NUMBER,)
Excel compensated for this by using 0 as the second parameter, whereas PHPExcel (quite correctly) didn't.
I need to generate some pretty large Excel files, and I was thinking of switching from PHPExcel to Spout, since it seems to be much more efficient. I have been able to find every feature I needed except one: how to format a cell as a date. Spout seems to treat every value as a string by default. For numbers I have found that using intval() or floatval() forces it to treat the value as a number, but is there anything similar for dates?
The only workaround I have found so far is to convert the date to a number using (strtotime($datestr)/86400)+25569.4167, but then the column has to be manually formatted as a date after the file is exported, and the users will not accept that.
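For the record, 25569 in that formula is the Excel serial day number of 1970-01-01 (the Unix epoch), so unix_seconds/86400 + 25569 converts a timestamp into an Excel day number. Here is a minimal sketch of the same conversion (Python used for illustration; the extra .4167 in the question, roughly 10/24, looks like a local-time adjustment and is left out):

```python
from datetime import datetime, timezone

def to_excel_serial(datestr: str, fmt: str = "%Y-%m-%d") -> float:
    """Convert a date string to an Excel serial day number.

    Excel's serial for the Unix epoch (1970-01-01) is 25569, so the
    Unix timestamp divided by 86400 seconds/day plus that offset
    yields the Excel value.
    """
    dt = datetime.strptime(datestr, fmt).replace(tzinfo=timezone.utc)
    return dt.timestamp() / 86400 + 25569

print(to_excel_serial("1970-01-01"))  # 25569.0
print(to_excel_serial("2017-03-03"))  # 42797.0
```

The fractional part of the result carries the time of day, which is why adding a constant like .4167 shifts every date by about ten hours.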
There is no way to format a cell as a date for now. You can always pass a date string (like "03/03/2017"); Excel is usually pretty good at recognizing that this is a date.
Your workaround indeed requires a manual step to configure the column as a date, so I would not recommend doing this.
In the end, I found this pull request on GitHub: https://github.com/box/spout/pull/209 It adds the option to format dates and, among other things, to format cells individually. I know this is not an official release, so it is "use at your own risk", but for me it was just what I needed, so I thought I would add the link in case someone else is in the same situation. A warning, though: it does break setting the background color for both a cell and a row, but in my case that wasn't a problem.
I would like to scrape a text file which is the output from Oracle AP. I don't have access to Oracle, but I need to assist in bug hunting and compare the text file against two CSV files from other systems. Importing the CSV files into a database is not a problem, but I'm struggling with this text file.
The text file is divided into two parts: what was successfully imported, and what was rejected. Each column has a specific width, set by Oracle when creating the report, and they will not change the setting for column width. If the content of a column exceeds the width, it simply continues on the row below. The columns for imported and rejected rows are not 100% the same.
For the successful imports it's simple, as there is one version of every row, but a rejected invoice might span more than one row, for different reasons.
The import file is shortened and obfuscated for obvious reasons, as it can be several thousand lines. It's best viewed in a text editor without word wrap. I cannot get it to look any good in this forum with blockquote or code samples in the forum editor, so please view/copy it from the links below.
I'm showing the successful ones on regex101.com here.
Regex finding the imported (I'm sure it could be better, but it works and that is good enough for me):
\s(\d+)\s+([\D]{2,})(\d+)\s+(\d{1,2}-[a-zA-Z]{3}-\d{2})\s+(\w+)\s+([\w+\,]*\.\d+)\s+(\d)\s+([\w+\,]*\.\d+)\s+(\d{1,2}-[a-zA-Z]{3}-\d{2})
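To illustrate what the pattern captures, here it is applied to an invented sample row (my own made-up data, since the real report is only on regex101; Python's re is close enough to PCRE for this pattern):

```python
import re

# The pattern from above, unchanged
pattern = re.compile(
    r"\s(\d+)\s+([\D]{2,})(\d+)\s+(\d{1,2}-[a-zA-Z]{3}-\d{2})\s+(\w+)"
    r"\s+([\w+\,]*\.\d+)\s+(\d)\s+([\w+\,]*\.\d+)\s+(\d{1,2}-[a-zA-Z]{3}-\d{2})"
)

# Invented sample row, roughly in the fixed-width layout described
line = " 101  ACME SUPPLIES  4711   01-Jan-17  INV001   1,234.56  1  1,234.56  02-Jan-17"

m = pattern.search(line)
print(m.groups())
```

Note that group 2 (the supplier name) keeps its trailing padding spaces because [\D] matches whitespace too, so it needs a strip() afterwards.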
I'm struggling with the rejected ones however, due to the variations.
Duplicate invoice number, if there are more than one reason (column) for not being imported.
Missing supplier number and supplier name (always shows up in pair).
Here is what I've done so far with the rejected ones.
Regex finding rejected:
^\s(\d+)\s+([\D]{2,})(\d+)\s+(\d{1,2}-[a-zA-Z]{3}-\d{2})\s+(\w+)\s+(-?[\w]{1,}\.?\d+)\s+
Clearly my regex for the rejected rows is not the final result; it's crap at the moment, and would even match a successful row.
My questions:
Is it possible to have only one regex for rejected catching the variations mentioned in bullet points above? Example would be appreciated.
Is it possible to fetch the word-wrapped parts of a column? Example would be appreciated.
I'm trying to understand the PCRE documentation regarding conditionals as it might be of help when dealing with the rejected variations, but so far I'm struggling with it.
Regards,
Bjørn
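One hedged sketch of how the missing supplier name/number variation (the second bullet point above) could be handled: rather than a full PCRE conditional, the supplier fields can be wrapped in an optional group, so one pattern matches both variants. The sample lines are invented, and Python's re is used for illustration:

```python
import re

# Supplier name + number wrapped in an optional (?: ... )? group, so the
# same pattern matches rejected rows with or without those columns.
pattern = re.compile(
    r"^\s*(\d+)\s+(?:([\D]{2,}?)\s+(\d+)\s+)?(\d{1,2}-[a-zA-Z]{3}-\d{2})"
)

with_supplier = " 201  ACME SUPPLIES  4711  05-Feb-17"
without_supplier = " 202  05-Feb-17"

print(pattern.search(with_supplier).groups())
print(pattern.search(without_supplier).groups())
```

When the supplier columns are absent, groups 2 and 3 come back as None, which is also a convenient flag for "this row was rejected for a missing supplier".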
I'm trying to export a CSV with PHP/MySQL. When an amount is in dollars, the amount is treated as text.
Is there a workaround to have these amounts treated as numbers, without additional Excel formatting, from my PHP program?
Edit:
The following code is where the amounts are given their respective currency symbols.
switch ($currency) {
    case 'Eur':
    case 'EUR': // both spellings appear in the data
        $symbol = iconv("UTF-8", "cp1252", "€");
        break;
    case 'USD':
        $symbol = iconv("UTF-8", "cp1252", "$");
        break;
    case 'GBP':
        $symbol = iconv("UTF-8", "cp1252", "£");
        break;
    default:
        $symbol = ''; // unknown currency: no symbol
        break;
}
return $symbol . $number;
The return value is fed into the CSV.
Obviously Euro and Pound amounts are working correctly but Dollar amounts aren't; I suspect this is because of the absolute-reference meaning the $ sign has in Excel.
It's Excel that converts the value to text. This all depends on the regional settings of Windows. As far as I know, there's no way around it.
For example: when I set Standards and formats to English (United States), the cell is formatted as currency (cell value = 1, with $ as the currency), and when it is set to Dutch (Netherlands), the cell is formatted as general (cell value = $1, as text).
I wouldn't really advise storing formatting in CSV files. Although it can cause more work it also leads to more flexibility if you store your data raw. You can always send raw data and convert afterwards.
Keep in mind that CSV is not an Excel format. By forcing your CSV to be Excel compliant, you could even be causing issues for someone who is, say, trying to import CSV onto the web, or something along those lines. Formatting that works for one project (or country, as dn Fer has shown in his answer) may not work for another in the same way.
I think a more important question than "how can I do this?" is "why should I do this?". If you are going to provide an "Excel-friendly" pre-formatted file, you should offer it in addition to an unformatted raw CSV file. If you are only going to choose one, let the end user handle the formatting themselves. The alternative will inevitably cause you more issues than it is worth.
My suggestion is to let the column headers do the explaining for you. Put the name of the currency and even the symbol at the top of each column head if you want, and put the totals underneath in raw number form. Your end user should have no trouble formatting it themselves in whatever program they choose to open it with, and it will have far less chance of not converting correctly.
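That suggestion can be sketched as follows (Python shown for brevity; the column names and figures are invented, and the same idea is trivial with PHP's fputcsv):

```python
import csv
import io

# Currency lives in the header; amounts stay raw numbers, so any
# spreadsheet or importer can parse them without locale surprises.
rows = [
    ["Invoice", "Amount (USD $)", "Amount (EUR €)"],
    [1001, 1234.56, 1130.25],
    [1002, 99.99, 91.50],
]

buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(buf.getvalue())
```

Because the cells contain only digits and a decimal point, Excel imports them as numbers under any regional setting, and the currency is still unambiguous from the header.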
Alternatively, you could consider creating an XML file. While it is much more heavyweight if you want it to work in multiple file formats, it will allow you to format as far as your imagination/documentation/experimentation/testing gets you. If you choose to go this route, here are some resources I have found:
xml - Foreign currency to Excel xslt (SO)
Features and limitations of Excel spreadsheet format
In the end, I would strongly recommend going the CSV route first. If you want to make something more robust, you can begin developing something different; XML is probably a better format, as modern spreadsheets support some version of the XML standard. The trade-off is time. But in the end, nothing is impossible, just expensive!
How do you best choose a size for a varchar/text/... column in a (MySQL) database (let's assume the text the user can type into a text area should be at most 500 characters), considering that the user might also use formatting (HTML/BB code/...), which is not visible to the user and should not count towards the 500-character limit?
1) Theoretically, to prevent any error, the varchar size would have to be almost unlimited, e.g. if the user uses 20 links like this (http://[huge number of chars]) or whatever. Or not?
2) Should/could you save the formatting in a separate column, e.g. so as not to give an index (like FULLTEXT) wrong values (words that are contained in the formatting but not in the real text)?
If yes, how is this best done? Do you record the position at which the formatting was applied, save that position and the formatting, and put the two back together on output?
(php/mysql, java script, jquery)
Thank you very much in advance!
A good approach is to factor in the number of formatting characters.
If you do not, then to avoid data loss you need to allow much more space for the text in the database and check the length of the record before saving, or use a TEXT column.
Keeping the same data twice in one table is not a good solution. It all depends on your project, but it is usually better to filter the formatting out in PHP.
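A sketch of that filtering idea, counting only the characters the user sees (Python for illustration; in PHP, strip_tags plus a BB-code regex would play the same role, and the regexes here are simplified assumptions, not a full parser):

```python
import re

MAX_VISIBLE_CHARS = 500

def visible_length(raw: str) -> int:
    """Length of the text the user actually sees: HTML tags and
    BB-code markers are stripped before counting."""
    no_html = re.sub(r"<[^>]+>", "", raw)
    no_bbcode = re.sub(r"\[/?\w+(?:=[^\]]*)?\]", "", no_html)
    return len(no_bbcode)

raw = '[b]Hello[/b] <a href="http://example.com/very/long/url">world</a>'
print(visible_length(raw))  # 11, counting only "Hello world"
```

The raw column can then be sized generously (or as TEXT) while validation enforces visible_length(raw) <= MAX_VISIBLE_CHARS, so long URLs in the markup never trip the limit.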
Short question: How do I automatically detect whether a CSV file has headers in the first row?
Details: I've written a small CSV parsing engine that places the data into an object that I can access as (approximately) an in-memory database. The original code was written to parse third-party CSV with a predictable format, but I'd like to be able to use this code more generally.
I'm trying to figure out a reliable way to automatically detect the presence of CSV headers, so the script can decide whether to use the first row of the CSV file as keys / column names or start parsing data immediately. Since all I need is a boolean test, I could easily specify an argument after inspecting the CSV file myself, but I'd rather not have to (go go automation).
I imagine I'd have to parse the first 3 to ? rows of the CSV file and look for a pattern of some sort to compare against the headers. I'm having nightmares about three particularly bad cases in which:
The headers include numeric data for some reason
The first few rows (or large portions of the CSV) are null
The headers and data look too similar to tell them apart
If I can get a "best guess" and have the parser fail with an error or spit out a warning if it can't decide, that's OK. If this is something that's going to be tremendously expensive in terms of time or computation (and take more time than it's supposed to save me) I'll happily scrap the idea and go back to working on "important things".
I'm working with PHP, but this strikes me as more of an algorithmic / computational question than something that's implementation-specific. If there's a simple algorithm I can use, great. If you can point me to some relevant theory / discussion, that'd be great, too. If there's a giant library that does natural language processing or 300 different kinds of parsing, I'm not interested.
As others have pointed out, you can't do this with 100% reliability. There are cases where getting it 'mostly right' is useful, however; for example, spreadsheet tools with CSV import functionality often try to figure this out on their own. Here are a few heuristics that would tend to indicate that the first line isn't a header:
The first row has columns that are not strings or are empty
The first row's columns are not all unique
The first row appears to contain dates or other common data formats (eg, xx-xx-xx)
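Those three heuristics can be sketched roughly like this (a best guess, not a guarantee; the function name and the date pattern are my own assumptions):

```python
import re

def looks_like_header(first_row):
    """Best-guess application of the heuristics above; returns False as
    soon as one of them suggests the first row is data."""
    def is_numberish(s):
        try:
            float(s)
            return True
        except ValueError:
            return False

    # Heuristic 1: header cells should be non-empty, non-numeric strings
    if any(cell == "" or is_numberish(cell) for cell in first_row):
        return False
    # Heuristic 2: header cells should all be unique
    if len(set(first_row)) != len(first_row):
        return False
    # Heuristic 3: headers should not look like dates (e.g. xx-xx-xx)
    if any(re.fullmatch(r"\d{1,4}[-/]\d{1,2}[-/]\d{1,4}", cell) for cell in first_row):
        return False
    return True

print(looks_like_header(["id", "name", "joined"]))   # True
print(looks_like_header(["1", "Jim", "03-03-17"]))   # False
```

Each check can only ever rule a header *out*; a True result still just means "nothing obviously disqualified it".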
In the most general sense, this is impossible. This is a valid csv file:
Name
Jim
Tom
Bill
Most CSV readers will just take hasHeader as an option, and allow you to pass in your own header if you want. Even in the one case you think you can detect, that of character headers over numeric data, you can run into a catastrophic failure. What if your column is a list of BMW series?
M
3
5
7
You will process this incorrectly. Worst of all, you will lose the best car!
In the purely abstract sense, I don't think there is a foolproof algorithmic answer to your question, since it boils down to: "How do I distinguish dataA from dataB if I know nothing about either of them?" There will always be the potential for dataA to be indistinguishable from dataB.

That said, I would start simple and only add complexity as needed. For example, when examining the first five rows: if, for a given column (or columns), the datatype in rows 2-5 is the same but differs from the datatype in row 1, there's a good chance that a header row is present (larger sample sizes reduce the possibility of error). This would (sort of) solve #1 and #3; perhaps throw an exception if the rows are all populated but the data is indistinguishable, to let the calling program decide what to do next.

For #2, simply don't count a row as a row unless and until it contains non-null data. That would work in all cases except an empty file (in which case you'd hit EOF). It would never be foolproof, but it might be "close enough".
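That sampling idea could look something like this sketch (a column-by-column type comparison over a small sample; all names are my own, and the "types" are deliberately crude):

```python
def guess_has_header(rows, sample_size=5):
    """Guess whether rows[0] is a header by comparing, column by column,
    the 'type' of the first row against the types seen in a small sample
    of the following rows. Entirely empty rows are skipped."""
    def kind(cell):
        try:
            float(cell)
            return "number"
        except ValueError:
            return "text" if cell else "empty"

    sample = [r for r in rows[1:] if any(c.strip() for c in r)][:sample_size]
    if not sample:
        return False  # nothing to compare against

    votes = 0
    for col, head in enumerate(rows[0]):
        kinds = {kind(r[col]) for r in sample if col < len(r)}
        kinds.discard("empty")
        if kinds and kind(head) not in kinds:
            votes += 1
    return votes > 0

data = [["id", "name"], ["1", "Jim"], ["2", "Tom"]]
print(guess_has_header(data))  # True: "id" is text but the column holds numbers
```

Note that the BMW-series example above still defeats this: ["M"], ["3"], ["5"] votes "header" for exactly the reason the other answer warns about.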
It really depends on just how "general" you want your tool to be. If the data will always be numeric, you have it easy as long as you assume non-numeric headers (which seems like a pretty fair assumption).
But beyond that, if you don't already know what patterns are present in the data, then you can't really test for them ahead of time.
FWIW, I actually just wrote a script for parsing some stuff out of TSVs, all from the same source. The source's approach to headers/formatting was so scattered that it made sense to just have the script ask me questions from the command line while executing (Is this a header? Which columns are important?). So no automation, but it lets me fly through the data sets I'm working on, instead of trying to anticipate each funny formatting case. Also, my answers are saved in a file, so I only have to be involved once per file. Not ideal, but efficient.
This article provides some good guidance: http://penndsg.com/blog/detect-headers/
Basically, you do statistical analysis on the columns, based on whether the first row contains a string while the rest of the rows contain numbers, or something like that.
If your CSV has a header like this:
ID, Name, Email, Date
1, john, john@john.com, 12 jan 2020
Then doing filter_var($str, FILTER_VALIDATE_EMAIL) on the header row will fail, since the email addresses only appear in the data rows. So check the header row for an email address (assuming your CSV has email addresses in it).
Second idea.
http://php.net/manual/en/function.is-numeric.php
Check the header row with is_numeric(): most likely a header row does not have numeric data in it, but a data row most likely would.
If you know you have dates in your columns, then checking the header row for a date would also work.
Obviously you need to know what type of data you are expecting. I am "expecting" email addresses.
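The three checks from this answer (email, numeric, date) combined into one rough sketch; Python stand-ins for PHP's filter_var and is_numeric are used, the regexes are deliberately loose assumptions, and the sample rows are the ones from the question:

```python
import re

EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")  # rough stand-in for FILTER_VALIDATE_EMAIL
DATE = re.compile(r"\d{1,2}\s+\w{3}\s+\d{4}")    # matches "12 jan 2020"-style dates

def row_looks_like_data(cells):
    """A row containing an email, a date, or a plain number is probably
    data, not a header."""
    return any(
        bool(EMAIL.fullmatch(c))
        or bool(DATE.fullmatch(c))
        or c.replace(".", "", 1).isdigit()
        for c in (cell.strip() for cell in cells)
    )

header = ["ID", "Name", "Email", "Date"]
data = ["1", "john", "john@john.com", "12 jan 2020"]
print(row_looks_like_data(header))  # False
print(row_looks_like_data(data))    # True
```

As with the other answers, this only says the first row *might* be a header; which checks make sense depends entirely on what data you are expecting.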