How can I get PHP to parse control characters?

I have a PHP site and a cron job that runs to update the site's database. The cron job reads a CSV file uploaded by a third party. Recently the job stopped working correctly, and after some investigation I discovered that the problem is in the CSV file: the line ending has changed from the standard "\n" (LF) to "^M", which is how less and vim display a carriage return ("\r", the line ending used by classic Mac OS). PHP doesn't seem to recognise this as a new line, so instead of seeing the CSV as multiple lines, it sees one single line of data. I have only been able to see this difference in the command-line tools less and vim. Does anyone know of a way to get PHP to recognise these line endings?
By way of an example, the incorrect CSV file looks similar to this in vim:
Heading 1,Heading 2,Heading 3,^MInfo 1-1,Info 1-2,Info 1-3,^MInfo 2-1,Info 2-2,Info 2-3,^M^M
Whereas the older (correct) version displays like this in vim:
Heading 1,Heading 2,Heading 3,
Info 1-1,Info 1-2,Info 1-3,
Info 2-1,Info 2-2,Info 2-3,

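Set the auto_detect_line_endings ini option to true (for example with ini_set('auto_detect_line_endings', true)) before opening the file, so that fgets(), file() and fgetcsv() also accept a bare "\r" as a line ending. Note that this setting is deprecated as of PHP 8.1.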

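Can you not change your code to replace ^M with \n? Note that ^M is only how vim displays the carriage-return character, so search for "\r" rather than the literal string "^M". Replacing "\r\n" first keeps Windows line endings from turning into blank lines:
$input = str_replace(["\r\n", "\r"], "\n", $input);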
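To put the replacement approach together end to end, here is a minimal sketch. It assumes the file is small enough to read into memory and that no quoted field contains a line break (splitting on line endings before parsing would break such fields). parseCsvWithAnyLineEnding is a hypothetical name, and the sample string is the ^M-separated example from the question with each ^M written as "\r".

```php
<?php
// Hypothetical helper: parse CSV text whose rows may be separated by
// "\r\n" (Windows), "\n" (Unix) or a bare "\r" (classic Mac, shown as ^M in vim).
// Assumes no quoted field contains an embedded line break.
function parseCsvWithAnyLineEnding(string $input): array
{
    // Normalise every line ending to "\n". "\r\n" is replaced first so a
    // Windows line ending does not become two newlines.
    $normalised = str_replace(["\r\n", "\r"], "\n", $input);

    $rows = [];
    foreach (explode("\n", $normalised) as $line) {
        if ($line === '') {
            continue; // skip blank lines, such as those from the trailing ^M^M
        }
        // Pass the escape character explicitly to avoid relying on the default.
        $rows[] = str_getcsv($line, ',', '"', '\\');
    }
    return $rows;
}

$sample = "Heading 1,Heading 2,Heading 3,\rInfo 1-1,Info 1-2,Info 1-3,\rInfo 2-1,Info 2-2,Info 2-3,\r\r";
$rows = parseCsvWithAnyLineEnding($sample);
echo count($rows), "\n"; // 3
```

With real data you would pass file_get_contents() of the uploaded file instead of the sample string. Because every row in the example ends with a comma, each parsed row has a fourth, empty field.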

Related

CSV file line count not working in PHP

I have a webpage that needs to count the number of lines in a CSV file, but the following code isn't working:
$linecount = count(file("sample.csv"));
var_dump($linecount);
When I run this code, the code returns the number 1, but there are 8 lines in sample.csv. Does anybody know why this is happening and how to fix it?
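If sample.csv was created on a Mac, you might want to consider setting auto_detect_line_endings to On: classic Mac OS ends lines with a bare "\r", which file() does not split on by default. (This setting is deprecated as of PHP 8.1.)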
From the manual:
auto_detect_line_endings boolean
When turned on, PHP will examine the data read by fgets() and file() to see if it is using Unix, MS-Dos or Macintosh line-ending conventions.
Another option, if you don't want to use this setting, is to read the whole file and split it on every line-ending variant (\r\n, \r or \n):
$linecount = count(preg_split("/\r\n|\r|\n/", file_get_contents("sample.csv")));
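One detail of the preg_split() one-liner: if the file ends with a line ending, the split produces an extra empty element, so the count comes out one higher than the number of lines. A minimal sketch that drops that trailing element (countLines is a hypothetical helper name; blank lines in the middle of the file are still counted):

```php
<?php
// Hypothetical helper: count lines regardless of line-ending convention.
function countLines(string $contents): int
{
    if ($contents === '') {
        return 0;
    }
    // Split on Windows (\r\n), Unix (\n) or classic Mac (\r) line endings.
    // \r\n is listed first so it is matched as one line ending, not two.
    $lines = preg_split('/\r\n|\r|\n/', $contents);
    // A final line ending produces one empty trailing element; drop it.
    if (end($lines) === '') {
        array_pop($lines);
    }
    return count($lines);
}

echo countLines("a,b\rc,d\r"), "\n";  // 2
echo countLines("a,b\r\nc,d"), "\n"; // 2
```

For the file in the question this would be called as countLines(file_get_contents("sample.csv")).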

Issue with the command exec on php

I have an issue using exec() in PHP; maybe I don't understand how it works.
The problem is: I want to execute a binary named 'test.exe'. That program takes data from an input file 'input.xml' and creates a new file 'output.xml' with some modifications to that data.
The command line below works perfectly in the Windows cmd prompt:
>cd C:\Test
>test.exe C:\XML\example.xml C:\XML\example.out.xml
But a PHP script like this:
exec('C://Test//test.exe C://XML//example.xml C://XML//example.out.xml');
does not work as expected.
Each time, besides the empty generated file example.out.xml, I also get another file named gmon.out.
I don't know what kind of file it is. Could this file be the source of my problem?
Any ideas?
You're using the wrong slash; you should be using a double backslash (one to be printed, and another to escape it). In a PHP string, // is not an escape sequence, so it is passed through literally and the command contains C://Test//test.exe, which may not resolve to the file you intend.
The corrected syntax would be either
exec('C:\\Test\\test.exe C:\\XML\\example.xml C:\\XML\\example.out.xml');
or
exec('C:/Test/test.exe C:/XML/example.xml C:/XML/example.out.xml');
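Whichever path form you use, it helps to see why the program fails instead of only finding an empty output file. The sketch below uses a hypothetical helper, runCommand (not part of PHP), that quotes every argument with escapeshellarg(), appends 2>&1 so error messages are captured as well, and uses exec()'s optional second and third arguments to collect the output lines and the exit code. It was written with a Unix-like shell in mind; escapeshellarg() quotes differently on Windows, so treat it as untested there.

```php
<?php
// Hypothetical helper: run a program and return its exit code and output,
// so a failure is visible rather than silent.
function runCommand(string $program, array $args): array
{
    // Quote the program and every argument so spaces in paths are safe.
    $parts = array_map('escapeshellarg', array_merge([$program], $args));
    $command = implode(' ', $parts) . ' 2>&1'; // capture stderr too
    $output = [];
    $exitCode = 0;
    exec($command, $output, $exitCode);
    return ['exitCode' => $exitCode, 'output' => $output];
}

// Example call for the program in the question (not run here):
// $result = runCommand('C:/Test/test.exe', ['C:/XML/example.xml', 'C:/XML/example.out.xml']);
// if ($result['exitCode'] !== 0) {
//     echo implode("\n", $result['output']);
// }
```

A non-zero exit code together with the captured output usually says more than the presence or absence of the output file.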

Is there a limit on the length of command passed to exec in PHP?

I need to merge 50+ PDF files into one PDF. I am using PDFtk, following the guide at: http://www.johnboy.com/blog/merge-multiple-pdf-files-with-php
But it is not working. I have verified the following:
I have tried the command to merge 2 PDFs from my PHP script, and it works.
I echoed the final command, copied it, pasted it into a command prompt and ran it manually, and all 50 PDFs were merged successfully.
So exec in my PHP script and the command to merge 50 PDFs are both correct, but it does not work when they are combined in PHP. I have also called set_time_limit(0) to prevent any timeout, but it still doesn't work.
Any idea what's wrong?
You can try to find out yourself:
print exec(str_repeat(' ', 5000) . 'whoami');
I think it's 8192, at least on my system, because it fails with strings larger than 10K, but it still works with strings shorter than 7K
I am not sure if there is a restriction on how long a single command can be, but I am pretty sure you can split it across multiple lines with "\" just to check whether that's the problem. Again, I don't think it is... Is there any error output when you try to run the full command with PHP and exec? Also try system() instead of exec().
PDFtk versions prior to 1.45 are limited to merging 26 files when you use "handles":
/* Collate scanned pages sample */
pdftk A=even.pdf B=odd.pdf shuffle A B output collated.pdf
As you can see, "A" and "B" are "handles", and a handle must be a single upper-case letter, so only A-Z can be used. If you reach that limit, your script may output an error like:
Error: Handle can only be a single, upper-case letter
But in 1.45 this limitation was removed. From the changelog:
You can now use multi-character input handles. Prior versions were limited to a single character, imposing an arbitrary limitation on the number of input PDFs when using handles. Handles still must be all upper-case ASCII.
Maybe you only need to update PDFtk ;)
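If upgrading is not an option, PDFtk also accepts plain input files without handles, in the form pdftk in1.pdf in2.pdf cat output out.pdf. Assuming your version does not apply the handle limit when no handles are given (check this against your installed version), the merge command can be built in PHP as below. buildPdftkMergeCommand and the partN.pdf names are hypothetical; escapeshellarg() quotes each path (on Unix it wraps them in single quotes). Printing the command length also lets you compare it with the exec() length limit mentioned in the first answer.

```php
<?php
// Hypothetical helper: build a pdftk merge command from a list of files,
// passing the inputs without handles (no A=..., B=...).
function buildPdftkMergeCommand(array $inputFiles, string $outputFile): string
{
    $quoted = array_map('escapeshellarg', $inputFiles);
    return 'pdftk ' . implode(' ', $quoted) . ' cat output ' . escapeshellarg($outputFile);
}

$files = [];
for ($i = 1; $i <= 50; $i++) {
    $files[] = "part$i.pdf"; // hypothetical input file names
}
$command = buildPdftkMergeCommand($files, 'merged.pdf');

// Length of the final command line, for comparison with the shell's limit.
echo strlen($command), "\n";

// To run it and see why it fails, capture stderr and the exit code:
// exec($command . ' 2>&1', $output, $exitCode);
```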

fgetcsv returns too many entries

I have the following code:
while (!feof($file)) {
$arrayOfIdToBodyPart = fgetcsv($file,0, "\t");
if (count($arrayOfIdToBodyPart)==2){
the problem is, the contents of the file look like this:
39 ankle
40 tibia
41 Vastus Intermedius
and so on
Sometimes the test in the if sees three entries: the first is the number, the second is the name, and the third is just... empty.
This causes the if block to fail, and me to be sad. I know I can just make the if block test for >= 2, but is there any way I can get it to recognise that there are only two items? I don't like that fgetcsv is finding "mystery" characters at the end of the line.
Could this be caused by a Windows-created file on a Unix server? If so, and I'm running an Ubuntu server without dos2unix, where do I get it?
You probably have tabs at the end of a line:
value<tab>value<tab><newline>
If that's the case, dos2unix won't help you. You might have to do something like read each line into a variable, trim() the variable, and then use str_getcsv() to split it.
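The read, trim and split approach described above can be sketched as follows. readTabSeparated is a hypothetical name, and an in-memory stream stands in for the real file. trim() removes the trailing tab as well as "\r" and "\n", so a line like "39\tankle\t\r\n" yields exactly two fields; note that it also strips leading and trailing spaces or tabs, which would matter if the first or last field could legitimately be empty.

```php
<?php
// Hypothetical helper: read tab-separated lines, trimming each line before
// splitting so trailing tabs and "\r\n" do not create an extra empty field.
function readTabSeparated($handle): array
{
    $rows = [];
    while (($line = fgets($handle)) !== false) {
        $line = trim($line);
        if ($line === '') {
            continue; // ignore blank lines
        }
        $rows[] = str_getcsv($line, "\t", '"', '\\');
    }
    return $rows;
}

// Demo with an in-memory stream standing in for the real file.
$handle = fopen('php://memory', 'r+');
fwrite($handle, "39\tankle\t\r\n40\ttibia\n41\tVastus Intermedius\t\n");
rewind($handle);
$rows = readTabSeparated($handle);
fclose($handle);

// Print the number of fields and the values for each row.
foreach ($rows as $row) {
    echo count($row), ': ', implode(' | ', $row), "\n";
}
```

Looping on the return value of fgets() also avoids the extra iteration that a while (!feof(...)) loop performs at the end of the file.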
Is it possible that you have a tab at the end of those lines? Tabs are invisible and often hard to spot, so you might want to double-check.
Also, if you are working with CSV files and you run Windows locally while the server runs Unix, I found that this line:
ini_set('auto_detect_line_endings', true);
saves a lot of headaches. (Note that the auto_detect_line_endings setting is deprecated as of PHP 8.1.)

Why might my PHP log file not entirely be text?

I'm trying to debug a plugin-bloated WordPress installation, so I've added a very simple homebrew logger that records all the callbacks, which are listed in a single multidimensional array in WordPress that ultimately has 250+ rows (I can't use print_r() because I need to catch them right before they are called).
My logger line is $logger->log("\t" . $callback . "\n");
The logger produces a dandy text file in normal situations, but at two points during this particular task it adds something that causes my log file to no longer be encoded properly. Gedit (I'm on Ubuntu) won't open the file, claiming not to understand the encoding. In vim, the corrupt callback (which I could not find in the debugger when looking at the array) is about in the middle and printed as ^@lambda_546, and at the end of the file there's this cute guy: ^M. The ^M and ^@ are blue in my vim, which has no color theme set for .txt files. I don't know what they mean.
I tried adding an is_string($callback) condition, but I get the same results.
Any ideas?
^@ is a NUL character (\0) and ^M is a CR (\r). No idea why they're being generated, though. You'd have to muck through the source and database to find out. Geany should be able to open the file easily enough, though.
It seems these cute guys are a result of your callback formatting for Windows.
Mystery over. One of the callbacks was an anonymous function. Investigating the PHP create_function documentation, I saw that a commenter had noted that the created function has a name that starts with a NUL byte: chr(0) . 'lambda_' followed by a number. Thanks, PHP.
As for the \r: well, that is more embarrassing. My logger reused some code I had written previously, which ended lines in \r\n.
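To keep such names from corrupting a text log, the logger can escape control characters before writing them. A minimal sketch, assuming callbacks are strings, [class-or-object, method] arrays or closures; describeCallback is a hypothetical helper, and addcslashes() turns NUL into \000 and CR into \r so the log stays plain text.

```php
<?php
// Hypothetical helper for the logger: turn a callback into a printable,
// single-line description with control characters escaped.
function describeCallback($callback): string
{
    if (is_string($callback)) {
        $name = $callback;
    } elseif (is_array($callback) && count($callback) === 2) {
        [$target, $method] = $callback;
        $class = is_object($target) ? get_class($target) : (string) $target;
        $name = $class . '::' . $method;
    } elseif ($callback instanceof Closure) {
        $name = 'Closure';
    } else {
        $name = gettype($callback);
    }
    // Escape bytes 0-31 (NUL, CR, LF, ...) and DEL as C-style sequences.
    return addcslashes($name, "\0..\37\177");
}

echo describeCallback("\0lambda_546"), "\n"; // prints \000lambda_546
```

The logger line from the question would then become $logger->log("\t" . describeCallback($callback) . "\n");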
