CSV read issue between Mac and Win - missing \n char - php

I have a basic CSV reader, but it can't read a CSV file saved on a Mac.
I understand that the issue comes from different operating system families having different line-ending conventions, but I cannot fix it.
I found a suggestion to open the file in binary mode, but it didn't work.
The code is pretty basic:
file opening:
$this->fileHandler = fopen($this->filename, 'rb');
read line:
$columns = fgetcsv(
    $this->fileHandler,
    $this->length,
    $this->delimiter,
    $this->enclosure
);
I've opened both files with Notepad++ and it seems that the Mac file lacks the \n characters at the end of rows, but the \r is there.

Set the auto_detect_line_endings option to true before using fgetcsv():
ini_set("auto_detect_line_endings", true);
// rest of your code
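Note that auto_detect_line_endings is deprecated as of PHP 8.1, so on newer versions it may be safer to normalize the line endings yourself before parsing. The following is only a minimal sketch, assuming the file is small enough to load into memory. The function name readCsvAnyEol is hypothetical and not part of the original code; the delimiter and enclosure parameters stand in for $this->delimiter and $this->enclosure.

```php
<?php
// Hypothetical helper: normalize CR and CRLF line endings to LF, then parse as CSV.
// Caveat: a CR inside a quoted field is converted to LF as well.
function readCsvAnyEol(string $filename, string $delimiter = ',', string $enclosure = '"'): array
{
    $raw = file_get_contents($filename);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $filename");
    }

    // Replace CRLF first, then any remaining lone CR (old Mac style).
    $normalized = str_replace(["\r\n", "\r"], "\n", $raw);

    // Parse from an in-memory stream so fgetcsv() can still be used.
    $stream = fopen('php://temp', 'r+');
    fwrite($stream, $normalized);
    rewind($stream);

    $rows = [];
    while (($columns = fgetcsv($stream, 0, $delimiter, $enclosure, '\\')) !== false) {
        $rows[] = $columns;
    }
    fclose($stream);

    return $rows;
}
```

Each element of the returned array is one row, in the same shape fgetcsv() would have returned it.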

I don't know if this applies to Mac, but I know that when moving a text file from Windows to some Linux flavors, I can run the dos2unix filename command on the file and it'll fix up the formatting for me. Maybe Mac has a similar functionality?
EDIT: Maybe this can help: http://schmeits.wordpress.com/2010/08/26/dos2unix-alternative-those-darn-m-characters/

Related

Hungarian/Bulgarian characters from CSV file end up garbled in PHP

I'm trying to import a CSV file which looks something like this:
"source "," destination "
férfi-/ruházat-Öltöny," férfi-/ruházat-blézer_zakó",
Note that this is just a sample of the CSV, not the whole CSV.
The way I'm reading the file is pretty straightforward:
$line = fgets($this->fileHandle) ;
$line = mb_convert_encoding($line , 'UTF-8', mb_detect_encoding($line));
Where $this->fileHandle is just a resource pointing to the file opened using fopen. So nothing too special there.
I want to do some string manipulation on the strings inside the CSV. I can import it just fine.
When I read from the file, using fgets, fread or whatever other function I can think of, I end up with garbled text.
Something along the lines of this:
So far I've tried mb_internal_encoding() with "UTF-8", ISO-8859-2 and a few other encodings. Nothing worked.
I've also tried mb_convert_encoding($line , 'UTF-8', mb_detect_encoding($line)) where $line is the line read from the csv.
Again, nothing. Still garbled text.
Next I assumed it may be something from my OS. I'm using a Mac (High Sierra v10.13.4) with a Docker instance running Ubuntu.
A locale command in the terminal gives me:
LANG="C.UTF-8"
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=
As far as the docker instance:
Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty
# locale
LANG=C.UTF-8
LANGUAGE=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=
So everything seems to be fine in that regard.
I've also tried an online PHP interpreter and that works fine. So clearly the issue is on my side.
To be honest I have no idea where the issue lies.
Any pointing in the right direction is greatly appreciated.
To answer my own question:
I had to ini_set("default_charset", "UTF-8");. The default was an empty string.
I have no idea how it worked without it so far, I assume it has some sort of fallback encoding.
Either way, I hope this helps anybody else who gets stuck on this.
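A separate point about the code in the question: mb_detect_encoding() called without a candidate list only checks the configured detection order (mbstring.detect_order), so it may never consider a Central European encoding. A minimal sketch of passing an explicit list follows. The function name lineToUtf8 and the candidate list are assumptions for illustration; the original post does not say which encoding the file actually used.

```php
<?php
// Hypothetical helper: convert a line to UTF-8 using an explicit candidate list.
// The candidate encodings are an assumption; adjust them to the files you expect.
function lineToUtf8(string $line, array $candidates = ['UTF-8', 'ISO-8859-2']): string
{
    // Strict mode rejects encodings in which the bytes are not valid.
    $from = mb_detect_encoding($line, $candidates, true);
    if ($from === false) {
        return $line; // unknown encoding: leave the bytes untouched
    }

    return mb_convert_encoding($line, 'UTF-8', $from);
}
```

It would replace the mb_convert_encoding() call in the question, for example $line = lineToUtf8(fgets($this->fileHandle));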

PHP's fgetcsv returning false at beginning of file

This is a PHP script running under Windows. It had been working but has recently stopped.
The file is opened and a valid file handle is returned: $fh = fopen($filename, 'r');
However, the very first time I call fgetcsv it returns false:
$headers = fgetcsv($fh, 6000, ',');
$line_no++;
if ($headers === false) {
    echo 'Error parsing file headers';
}
This is now happening on all csv files I try. Other changes I have tried to no avail are:
ini_set('auto_detect_line_endings', true); Right before opening the file
rewind($fh); Right after opening the file
Using both 0 or a number like 6000 for the second parameter, length.
Changing the file's line endings style from unix to Windows and Mac
It seems like something with Windows is causing this file not to parse.
Is there any way to return the actual error from fgetcsv? The documentation doesn't say there is, just that it returns false on any error. Are there other Windows settings that could be causing issues? The Windows security settings give everyone full control of the files.
The issue turned out to be that a change at the beginning of the script was using the same file as a lock file so the script wouldn't be run on the same file twice at the same time. Then, later in the script when I actually wanted to parse the file, I opened it again (which was successful), but then I couldn't actually read the contents.
The solution I used was to create a temporary lock file based on the filename instead of using the actual file. Eg: $filename.'.lock'
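A minimal sketch of that approach, assuming flock() is the locking mechanism (the original script is not shown, so this is a guess). On Windows flock() locks are mandatory rather than advisory, which may be why a second handle on the locked file could be opened but not read. The function name withLockFile is hypothetical.

```php
<?php
// Hypothetical helper: run $work while holding an exclusive lock on "<file>.lock"
// instead of locking the data file itself.
function withLockFile(string $filename, callable $work)
{
    // Mode 'c' creates the lock file if needed without truncating it.
    $lock = fopen($filename . '.lock', 'c');
    if ($lock === false) {
        throw new RuntimeException("Cannot open lock file for $filename");
    }

    // Non-blocking: give up immediately if another run holds the lock.
    if (!flock($lock, LOCK_EX | LOCK_NB)) {
        fclose($lock);
        return null;
    }

    try {
        return $work($filename);
    } finally {
        flock($lock, LOCK_UN);
        fclose($lock);
    }
}
```

The callback receives the data file name and can open and parse it with fgetcsv() as usual, since the data file itself is never locked.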
It was a silly mistake on my part, however it would have been much more helpful if PHP had returned or written an error/warning at some point.
The canonical way to debug this would be "print_r($headers)".
As fgetcsv returns an array, it must be empty or a non-array. If you can configure (or have configured) PHP to log errors to a known location (Windows with IIS would be "syslog" and should show up in the Event Viewer), you should be able to figure out what's wrong.

PHP problems when transferring code from Windows to OS X

I have recently bought a new MacBook Pro. Before I had my MacBook Pro I was working on a website on my desktop computer. And now I want to transfer this code to my new MacBook Pro.
The problem is that when I transferred the code (I put it on Dropbox and simply downloaded it on my MacBook Pro) I started to see lots of error messages in my PHP code.
The error message I'm receiving is:
Warning: Cannot modify header information - headers already sent by (output started at /some/file.php:1) in /some/file.php on line 23
I have done some research on this and it seems that this error is most frequently caused by a new line, simple whitespace or any output before the <?php sign. I have looked through all the places where I have cookies that are being sent in the HTTP request and also where I'm using the header() function. I haven’t detected any output or whitespace that possibly could interfere and cause this problem.
Noteworthy is that the error always says that the output is started at line 1. Which got me thinking if there is some kind of coding differences in the way that the Mac OS X and Windows operating systems handle new lines or white spaces? Or could the Dropbox transfer messed something up?
The code on one of the pages (login.php) which produces the error:
<?php
include "mysql_database.php";
login();
$id = $_SESSION['Loggedin'];
setcookie("login", $id, (time()+60*60*24*30));
header('Location: ' . $_SERVER['HTTP_REFERER']);
?>
login function:
function login() {
    $connection = connecttodatabase();
    $pass = "";
    $user = "";
    $query = "";
    if (isset($_POST['user']) && $_POST['user'] != null) {
        $user = $_POST['user'];
        if (isset($_POST['pass']) && $_POST['pass'] != null) {
            $pass = md5($_POST['pass']);
            $query = "SELECT ID FROM Anvandare WHERE Nickname='$user' AND Password ='$pass'";
        }
    }
    if ($query != "") {
        $id = $connection->query($query);
        $id = mysqli_fetch_assoc($id);
        $id = $id['ID'];
        $_SESSION['Loggedin'] = $id;
    }
    closeconnection($connection);
}
Complete error:
Warning: Cannot modify header information - headers already sent by (output started at /Users/name/GitHub/website/login.php:1) in /Users/namn/GitHub/website/login.php on line 9
Check if there are spaces in front of your PHP opening tag. Also try resaving the file from Notepad++ using the Windows (CRLF) line endings. (Edit > EOL Conversion > Windows format)
Noteworthy is that the error always says that the output is started at
line 1. Which got me thinking if there is some kind of coding
differences in the way that the Mac OS X and Windows operating systems
handle new lines or white spaces? Or could the Dropbox transfer messed
something up?
Don’t redo your code or worry about the header() calls or even the cookie stuff. That is not the issue.
The issue is that Windows line endings are different from Mac line endings. More details here.
Different operating systems use different characters to mark the end
of line:
Unix / Linux / OS X uses LF (line feed, '\n', 0x0A)
Macs prior to OS X use CR (carriage return, '\r', 0x0D)
Windows / DOS uses CR+LF (carriage return followed by line feed, '\r\n', 0x0D0A)
And what happens in cases like this is the formatting of the page causes the PHP parser in Apache to choke on the files. Possibly sending content to the browser before you intend to when making header() calls or setting cookies. Meaning technically the screwed up line endings force a “header” to be sent because the file itself is outputting data to the browser inadvertently.
The solution might be to avoid using Dropbox & just copy the files onto a flash drive & transfer it that way. That’s an idea but I am not convinced that Dropbox was the culprit in this. Meaning the issue might still exist even if you copy the files to a flash drive.
Or if that does not work, do as the linked-to article suggests & download a good text editing tool like TextWrangler. Just load the files into TextWrangler & then manually change the line endings so they are Unix (LF), which is what Mac OS X uses, and resave the files.
Another long-term solution to this issue might be to use a version control system like git coupled with an account on GitHub to manage your code. The benefit is that git can normalize line endings when files are committed and checked out (for example through its core.autocrlf setting or a .gitattributes file), which deals with cross-platform line ending headaches. And you don’t need to worry about inadvertent oddities caused by a straight copy of files to a service like Dropbox.
But again, pretty convinced this has nothing to do with Dropbox. It’s all about Windows line endings being different from Mac OS X line endings.
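To check which of the conventions listed above a given file actually uses, something along these lines could help. This is only a sketch; the function name detectEol and its return labels are made up for illustration.

```php
<?php
// Hypothetical helper: report which line-ending convention dominates in $data.
function detectEol(string $data): string
{
    $crlf = substr_count($data, "\r\n");
    $cr   = substr_count($data, "\r") - $crlf; // lone CR only (Macs prior to OS X)
    $lf   = substr_count($data, "\n") - $crlf; // lone LF only (Unix / Linux / OS X)

    $counts = ['CRLF' => $crlf, 'CR' => $cr, 'LF' => $lf];
    $max = max($counts);

    // On a tie, the first key in the order CRLF, CR, LF is returned.
    return $max === 0 ? 'none' : array_search($max, $counts, true);
}
```

For example, echo detectEol(file_get_contents('login.php')); would print CRLF for a file saved with Windows line endings.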
EDIT: There are some interesting ideas on how to handle the bulk conversion of Windows line endings to Mac OS X line endings on Mac OS X Hints. The most intriguing one is the use of zip and unzip to facilitate the process. I have not tried this, so caveat emptor! But it does sound like something worth testing, since it is the -a flag to unzip that converts the line endings of the ASCII files:
I've always used the following (in a file named fixascii):
#!/bin/sh
zip -qr foo.zip "$@" && unzip -aqo foo.zip && rm foo.zip
And then execute it as:
fixascii [files or directories to convert]
Which has the benefit over most of these other commands in that you
can point it with impunity at an entire directory tree and it will
process all the files in it and not corrupt any binaries that may
happen to have a string of bits in them that look like a line-ending.
I've seen too many times where someone corrupted a ton of images and
other binaries, when trying to fix line-endings on text files using
dos2unix or tr in combination with find but failed to ensure that only
text files were processed. Unzip figures out which files are ascii,
converts them, and leaves the binaries alone.
BTW, it's the "-a" flag to unzip, that is causing the ascii files to
have their lines endings converted.
The official man page for unzip describes the -a (convert text files) option as follows:
Ordinarily all files are extracted exactly as
they are stored (as ''binary'' files). The -a option causes files
identified by zip as text files (those with the 't' label in zipinfo
listings, rather than 'b') to be automatically extracted as
such, converting line endings, end-of-file characters and the
character set itself as necessary. (For example, Unix files use
line feeds (LFs) for end-of-line (EOL) and have no end-of-file (EOF)
marker; Macintoshes use carriage returns (CRs) for EOLs; and most
PC operating systems use CR+LF for EOLs and control-Z for EOF. In
addition, IBM mainframes and the Michigan Terminal System use
EBCDIC rather than the more common ASCII character set, and NT
supports Unicode.) Note that zip's identification of text files
is by no means perfect; some ''text'' files may actually be binary
and vice versa. unzip therefore prints ''[text]'' or
''[binary]'' as a visual check for each file it extracts when using
the -a option. The -aa option forces all files to be extracted
as text, regardless of the supposed file type.
EDIT: Also, if you have access to a Linux machine, you might want to checkout dos2unix. More details here as well. And found another Stack Overflow question here.
Finally found an easy way to fix this! I was looking through the php.ini file when I came across an option named auto_detect_line_endings, whose default value is Off.
The description to this option is:
; If your scripts have to deal with files from Macintosh systems,
; or you are running on a Mac and need to deal with files from
; unix or win32 systems, setting this flag will cause PHP to
; automatically detect the EOL character in those files so that
; fgets() and file() will work regardless of the source of the file.
; http://php.net/auto-detect-line-endings
Which is exactly what I was looking for!
I simply used the ini_set() function at the beginning of my database file (which I load on every PHP page) and it seems to have solved the problem for me! A value set with ini_set() only lasts for the duration of the script; the option is restored to its php.ini value when the script completes.
Full line of the ini_set() function that I used:
ini_set("auto_detect_line_endings", true);
Thanks for all your help guys!
More info on ini_set() function here: ini_set() function
More info on the auto_detect_line_endings option here: Auto detect line endings option

PHP files with extra lines inserted?

I work on a Windows 7 machine and Notepad++ for a number of tasks. I have noticed that when I work with someone on a Mac who tries to edit a file, and then I access it later, there are always extra lines, sometimes missing lines, white space is all crazy. Usually extra lines.
Sometimes there are fewer lines, or code is just collapsed as if all white space were removed.
I'm certain there isn't a prank involved, as it has happened a number of times over the years. I'm just finally curious enough to ask if anyone knows what causes this?
That happens when you download a file with Linux line endings to a Windows machine through FTP in ASCII mode. You can download the files with FileZilla by selecting:
Transfer -> Transfer Type -> Binary
That way the EOLs transfers just fine.
This is probably caused by the EOL or end-of-line character that you have selected in your editor. I also code on a Windows 7 machine, but have to push my files to UNIX where if I view the files, I will see ^M's or other strange characters in VI. If I recall correctly, go to Edit -> EOL Conversion and convert to UNIX/MAC. Just be sure to always set your EOL to UNIX and you shouldn't see the issue anymore.
Here's a link to a similar topic on SO:
https://stackoverflow.com/questions/2889163/eol-in-notepad-and-notepad
What may be happening is that the Mac user is encoding the file in a slightly different way. Notepad++ is reading the file, but is not expecting to have to handle a Mac-encoded file, so it renders oddly.
For example, the software may be converting tabs to spaces. Another example is the special characters used between systems, such as:
\n = LF (Line Feed) - used as a new line character in Unix, Linux and Mac OS X
\r = CR (Carriage Return) - used as a new line character in Mac OS before OS X
\r\n = CR + LF - used as a new line character in Windows
That's my thought.
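If a file has already ended up with the wrong or mixed line endings, it can also be converted in code rather than in the editor. A minimal sketch, with a hypothetical function name convertEol; it treats CRLF as a single line break so that no extra blank lines are introduced.

```php
<?php
// Hypothetical helper: rewrite every line ending in $text as $eol.
function convertEol(string $text, string $eol = "\n"): string
{
    // "\r\n" comes first in the alternation so CRLF counts as one line break.
    return preg_replace('/\r\n|\r|\n/', $eol, $text);
}
```

Passing "\r\n" as the second argument converts to Windows line endings instead, for example file_put_contents($path, convertEol(file_get_contents($path), "\r\n"));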

PHP File Upload corrupted JPEGS

We have a web app using Andrew Valums' ajax file uploader. If we kick off 5 - 10 image uploads at once, more often than not at least 2 or 3 will result in the same gd error, "Corrupt JPEG data":
Warning: imagecreatefromjpeg() [function.imagecreatefromjpeg]:
gd-jpeg, libjpeg: recoverable error: Corrupt JPEG data:
47 extraneous bytes before marker 0xd9 in ....
However this did not happen on our old test server or local development boxes, only on our new production server.
The file size on the server is the same as the original on my local machine, so it completes the upload but I think the data is being corrupted by the server.
I can "fix" the broken files by deleting them and uploading again, or manually uploading via FTP
We had a shared host on Godaddy and just have started to have this issue on a new box (that I set up, so probably explains a lot :) CentOS 5.5+, Apache 2.2.3, PHP 5.2.10
You can see some example good and bad picture here. http://174.127.115.220/temp/pics.zip
When I BinDiffed them I see a consistent pattern the corruption is always 64 byte blocks, and while the distance between corrupted blocks is not constant the number 4356 comes up a lot.
I really think we can rule out the Internet as error checking and retransmission with TCP is pretty reliable, further there seems to be no difference between browser versions, or if I turn anti-virus and firewalls off.
So I'm picking configuration of Apache / PHP?
Some cameras will append some data inside the file that will get interpreted incorrectly (most likely due to character encoding within the headers).
A solution I found was to read the file in binary mode like so
$fh = fopen('test.jpg', 'rb');
$str = '';
while ($fh !== false && !feof($fh)) {
    $str .= fread($fh, 1024);
}
// @ suppresses GD's warning about the extraneous bytes
$test = @imagecreatefromstring($str);
imagepng($test, 'save.png');
Well, i think the problem is jpeg-header data, and as far as i know there is nothing to do with it by PHP, i think the problem is your fileuploader, maybe there are some configuration for it that you are missing.
Hmm - a 64 byte corruption?...or did you mean 64 bit?
I'm going to suggest that the issue is in fact a result of the PHP script. The problem that regularly comes up here is that the script inserts CRLFs into the data stream being uploaded, which is caused by differences between the Windows and *nix standards.
The solution is to force the PHP script to handle the upload in binary mode (add the b flag to the mode of ALL fopen() calls in the PHP upload code, e.g. 'rb' or 'wb'). It is safe to upload a text file in binary mode as at least you can still see the data.
Read here for more information on this issue:
http://us2.php.net/manual/en/function.fopen.php
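As a rough illustration of that advice, a byte-for-byte copy with both handles opened in binary mode could look like the following. The function name copyBinary is hypothetical. Whether the uploader in question hands the file over as a path or as the raw request body (php://input) is an assumption that depends on its configuration.

```php
<?php
// Hypothetical helper: copy $source to $target byte for byte.
// The 'b' flag stops PHP from translating line endings on platforms that do so.
function copyBinary(string $source, string $target): int
{
    $in = fopen($source, 'rb');
    if ($in === false) {
        throw new RuntimeException("Cannot open $source for reading");
    }

    $out = fopen($target, 'wb');
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("Cannot open $target for writing");
    }

    $bytes = stream_copy_to_stream($in, $out);
    fclose($in);
    fclose($out);

    if ($bytes === false) {
        throw new RuntimeException("Copy from $source to $target failed");
    }

    return $bytes; // number of bytes written
}
```

Comparing the return value with the expected upload size gives a quick check that nothing was added or dropped along the way.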
This can be solved with:
ini_set ('gd.jpeg_ignore_warning', 1);
I had this problem with GoDaddy hosting.
I had created the database on GoDaddy using their cPanel interface. It was created as "latin collation" (or something like that). The database on the development server was UTF8. I've tried all solutions on this page, to no avail. Then I converted the database to UTF8, and it worked.
Database encoding shouldn't affect BLOB data (or so I would think). BLOB stands for BINARY Large Object (something...), to my knowledge!
Also, strangely, the data was copied from the dev to production server while the database was still "latin", and it was not corrupted at all. It's only when inserting new images that the problem appeared. So I guess the image data was being fed to MySQL as text data, and I think there is a way (when using SQL) of inserting binary data, and I did not follow it.
Edit: just took a look at the MySQL export script, here it is:
INSERT INTO ... VALUES (..., _binary 0xFFD8FF ...
Anyway, hope this will help someone. The OP did not indicate what solved his problem...
