PHP Removing Windows ^M Character

I have a CSV that I download from a source I don't control, and each line ends with a ^M character when printed to a bash terminal. How can I sanitize this input programmatically in PHP?

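What you're seeing is a carriage return, the control character Windows puts before the line feed at the end of each line. To get rid of it in PHP:
$file = str_replace("\x0D", "", $file);
A carriage return has no upper or lower case, so str_ireplace() is not needed. The escape itself works whether the hexadecimal digits are lowercase or uppercase (\x0d and \x0D produce the same byte).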
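Some files use a bare carriage return (old Mac style) as the line terminator. To handle those as well, a slightly safer variant converts every line ending to \n instead of deleting the \r outright, and then parses each line. This is a minimal sketch; the helper name normalize_line_endings() and the sample data are hypothetical:

```php
<?php
// Hypothetical helper: convert Windows (\r\n) and old Mac (\r) line endings to Unix (\n).
// "\r\n" is listed first so each CRLF pair becomes a single \n, not two.
function normalize_line_endings(string $text): string
{
    return str_replace(["\r\n", "\r"], "\n", $text);
}

// Sample CSV as it might arrive from a Windows source (made-up data).
$csv = "id,name\r\n1,Alice\r\n2,Bob\r\n";

$rows = [];
foreach (explode("\n", normalize_line_endings($csv)) as $line) {
    if ($line === '') {
        continue; // skip the empty string left after the final newline
    }
    // Pass the escape argument explicitly; PHP 8.4 deprecates relying on its default.
    $rows[] = str_getcsv($line, ',', '"', '');
}

print_r($rows);
```

Splitting on \n like this assumes that no quoted field contains an embedded line break.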

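You can also ask PHP to auto-detect line endings by adding this line before reading the CSV file:
ini_set('auto_detect_line_endings', true);
This only changes how fgets(), file() and fgetcsv() split the input into lines (it lets them treat a bare \r as a line ending); it does not remove any characters from the data. The setting is also deprecated as of PHP 8.1.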
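Another way to get rid of the ^M without relying on an ini setting is to read the file line by line and trim the line terminator from each line. This assumes \r\n endings, since without auto-detection fgets() only splits on \n. A minimal sketch that uses an in-memory php://memory stream as a stand-in for the downloaded file:

```php
<?php
// Stand-in for the downloaded CSV: an in-memory stream with Windows line endings.
$fp = fopen('php://memory', 'r+');
fwrite($fp, "sku,qty\r\nA-1,3\r\nB-2,5\r\n");
rewind($fp);

$lines = [];
while (($line = fgets($fp)) !== false) {
    // fgets() keeps the terminator; rtrim() strips the trailing \r (the ^M) and \n.
    $lines[] = rtrim($line, "\r\n");
}
fclose($fp);

print_r($lines);
```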

^M is a carriage return (\r); you should be able to remove it with:
$string = str_replace( "\r", "", $string);
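To confirm that the ^M really is a carriage return (byte 0x0D) and that it is gone afterwards, you can dump the raw bytes with bin2hex(). A small sketch with a made-up sample line:

```php
<?php
// A sample line as it might come from a Windows-generated CSV (made-up data).
$string = "a,b\r\n";

// bin2hex() shows every byte: 0d is the carriage return (^M), 0a the line feed.
$before = bin2hex($string);

$string = str_replace("\r", "", $string);
$after = bin2hex($string);

echo $before, PHP_EOL; // 612c620d0a
echo $after, PHP_EOL;  // 612c620a
```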

Related

Replacing smart quote with regular quote causes entire string to be erased

I have some files that were originally RTF files. They were opened with Microsoft Word 2016 and saved as .txt files. No other changes were made to the files. They were transferred to a Linux system.
When I run file myfile.txt on the Linux system, the files show up as "Non-ISO extended-ASCII text, with CRLF line terminators".
I am reading the files into PHP and processing them line by line. I am trying to replace any right smart quotes with regular single quotes, but my entire string is being erased.
My code looks like this:
$text = "I can’t go for supper";
$text = preg_replace('/\x{2019}/u', "'", $text);
echo $text;
The apostrophe here is a right smart quote, which shows up in Vim as <92>. After some research online, I found that this is the Unicode character U+2019.
However, when I try to display the new value of $text nothing is displayed.
What is wrong with my code and why is it wiping out the entire string of text?
Upon further research, I determined that the byte <92> is the right single quotation mark in the Windows-1252 character encoding, and that I first needed to convert the text to UTF-8 before I could manipulate it.
The following code works correctly:
// $text holds the raw Windows-1252 bytes read from the file
$text = "I can’t go for supper";
$text = iconv("Windows-1252", "UTF-8", $text);
$text = preg_replace('/\x{2019}/u', "'", $text);
echo $text;
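The whole string disappears because, with the u modifier, preg_replace() treats the subject as UTF-8. When it is given invalid UTF-8 (such as the lone Windows-1252 byte 0x92), it fails and returns null, and echo null prints nothing. The sketch below shows the failure and then the fix. It builds the Windows-1252 bytes with a \x92 escape instead of reading them from a file, converts from Windows-1252 (the encoding named above), and assumes the iconv extension is available:

```php
<?php
// Raw Windows-1252 bytes: 0x92 is the right single quotation mark in that encoding.
$raw = "I can\x92t go for supper";

// With /u, invalid UTF-8 input makes preg_replace() fail and return null.
$broken = preg_replace('/\x{2019}/u', "'", $raw);
var_dump($broken);                                   // NULL
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR); // bool(true)

// Convert to UTF-8 first; 0x92 becomes U+2019, which the pattern can now match.
$utf8 = iconv('Windows-1252', 'UTF-8', $raw);
$fixed = preg_replace('/\x{2019}/u', "'", $utf8);
echo $fixed, PHP_EOL; // I can't go for supper
```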

PHP trim non-standard characters from the string

I am trying to save an XML file with a string pulled out of a text file (actually a PDF converted to TXT). In CMD (php.exe), echo shows the string normally, without any extra characters, but in the XML file I get different output.
This is the string that I am trying to save.
Ponedjeljak
In CMD it shows it like this
Ponedjeljak\n
In the XML, however, the string is stored with some extra characters, like this:
Ponedjeljak
I have tried using preg_replace like this
preg_replace("/&#\\d+;|\n/", "", $dan);
But the string and the extra line are still saved in the XML. What am I doing wrong here, and why is it saving the extra characters in the XML file? Both the PHP and XML files are UTF-8 encoded.
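Your pattern removes \n but not the carriage return (\r) that precedes it in Windows-style line endings. Remove both and assign the result: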
$string = str_replace(array("\n", "\r"), '', $string);
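Note that preg_replace() and str_replace() return the modified string instead of changing the variable in place. In the snippet as posted, the return value of preg_replace() is not assigned to anything. If the unwanted characters only ever appear at the end of the value, trim() is another option, since its default character list includes \r and \n. A short comparison, using a hypothetical value read from the text file:

```php
<?php
// Hypothetical value as read from the converted text file, with a Windows line ending.
$dan = "Ponedjeljak\r\n";

// Calling preg_replace() without using its return value leaves $dan unchanged.
preg_replace('/[\r\n]+/', '', $dan);
var_dump($dan === "Ponedjeljak\r\n"); // bool(true)

// Assigning the result removes every \r and \n anywhere in the string.
$stripped = str_replace(["\n", "\r"], '', $dan);

// trim() removes whitespace (including \r and \n) from both ends only.
$trimmed = trim($dan);

echo $stripped, PHP_EOL; // Ponedjeljak
echo $trimmed, PHP_EOL;  // Ponedjeljak
```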

Converting ^M Character to Whitespace without Line Break using PHP

I'm trying to convert the ^M character to a whitespace character, but I'm having a hard time doing it.
In PHP, I opened the file in "wb" mode, fopen("file.csv", "wb"), so that it wouldn't write DOS characters into the file.
That worked, but the file still has line breaks where the ^M characters were.
$fp = fopen("file.csv", "wb");
$description =nl2br( $product->getShortDescription());
$line .= $description . $other_variables . "\n";
fputs($fp, $line);
But I still see line breaks within the description. Is there any way to remove the ^M and replace it with a space?
I also ran dos2unix on the file written in regular "w" mode. It removes all the ^M characters, but the file still has line breaks where they were. I really need each record on one line for my CSV file to work.
Thank you.
I think you're asking how to remove all the newline/line feed/carriage return characters from the description. If so:
$description = str_replace(array("\r", "\n"), '', nl2br($product->getShortDescription()));
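Since the question asks for a space rather than nothing, a variant is to replace each line-break sequence with a single space. PCRE's \R escape matches \r\n as one unit as well as a lone \r or \n, so a Windows line break becomes one space rather than two. nl2br() is left out here because it only inserts <br /> tags in front of the line breaks; it does not remove them. A minimal sketch; the description text and the SKU field are made up, and fputcsv() writes to an in-memory stream:

```php
<?php
// Made-up description containing Windows (\r\n), old Mac (\r) and Unix (\n) line breaks.
$description = "Soft cotton shirt.\r\nMachine washable.\rMade in Portugal.\nSizes S-XL.";

// \R matches any newline sequence, treating \r\n as a single break.
$oneLine = preg_replace('/\R/', ' ', $description);

// Write one CSV record; escape is passed explicitly (PHP 8.4 deprecates its default).
$fp = fopen('php://memory', 'r+');
fputcsv($fp, ['SKU-1', $oneLine], ',', '"', '');
rewind($fp);
$record = stream_get_contents($fp);
fclose($fp);

echo $oneLine, PHP_EOL; // Soft cotton shirt. Machine washable. Made in Portugal. Sizes S-XL.
```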

PHP Generate Ascii file?

I am generating a CSV file, but the people who process these files tell me it needs to be in ASCII format. How do I go about doing that?
This is what I have to generate the file:
$filename = '/logs/'.date('Ymd').'.txt';
$myfile = fopen($filename,'a');
fwrite($myfile, $data);
fclose($myfile);
The file generates fine and opens fine; everything looks OK to the naked eye, but they said it needs to be in ASCII format.
Output of file:
"","932-4","Mike","Tanner","","1234 Testing Lane","","Los Angeles","CA","90066","","(993)857-7727","","","","SALE","","","V","4111111111111111","01/14","AXLW","","ZENC","","","REG","","511.80","","07/21/11","932-359","D1234","4","","1","","","","","","","Tanner","Mike","","1234 Testing Lane","","CA","Los Angeles","90066","","CC","","","","Y","100.00","","100.00","","","","","","","","Y","11.8","info#info.com","359","001","001","(993)857-7727","(993)857-7727","","","","","","","","","","","","","","","","","","","","222","","","","","","","","","","","","","",
Anyone?
Thanks...
I'm going to play Carnac the Magnificent and say that you're just using a line feed (ASCII 10, aka \n) to terminate each line. I'll bet they want a carriage return plus line feed (ASCII 13, 10, i.e. \r\n). Just a wild guess. :)
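If that guess is right, the fix is to terminate each line with \r\n instead of \n before writing. A sketch, assuming $data is built with \n line endings (the sample data is made up). The pattern \r?\n turns both a bare \n and an existing \r\n into \r\n, so lines that already end in \r\n are not doubled:

```php
<?php
// Assumed input: records joined with Unix line endings (made-up sample data).
$data = "a,b\nc,d\n";

// Convert every line ending to CRLF (ASCII 13 followed by ASCII 10).
$crlf = preg_replace('/\r?\n/', "\r\n", $data);

echo bin2hex(substr($crlf, -2)), PHP_EOL; // 0d0a
```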
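If the data is "ANSI" (Windows-1252), you can probably convert it with:
$data = iconv("Windows-1252", "ASCII//TRANSLIT", $data);
Without //TRANSLIT (or //IGNORE), iconv() fails and returns false on any character that has no ASCII equivalent.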
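Before converting anything, it may be worth checking whether the data contains any non-ASCII bytes at all; the sample row in the question appears to be plain ASCII already. A minimal check using a byte-range pattern (the function name is_plain_ascii() is hypothetical):

```php
<?php
// Hypothetical helper: true when every byte is in the 7-bit ASCII range 0x00-0x7F.
function is_plain_ascii(string $data): bool
{
    return preg_match('/[^\x00-\x7F]/', $data) === 0;
}

var_dump(is_plain_ascii('"","932-4","Mike","Tanner"')); // bool(true)
var_dump(is_plain_ascii("caf\xC3\xA9"));                 // bool(false)
```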

strange characters ^M php can not identify

When I open a file (saved as ISO 8859-1) in the terminal (Ubuntu), I see the character ^M where the new lines should be (with XX before and after it).
Now I run this code in PHP to see how PHP handles that:
$text=str_split($text);
var_dump($text);
In the var_dump output I see only an array of size 4, with only 'X' in it.
Any idea what is going on in there?
EDIT: OpenOffice translates this ^M correctly to a new line.
ANOTHER EDIT:
The following code changes nothing. echo str_replace("\r","XXXXXX",$text);
I run this before the str_split
^M is not a newline; ^J is. ^M is the carriage return, the character Windows puts before the newline to mark the end of a line. Its escape sequence is \r.
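Because a terminal acts on \r (it moves the cursor back to the start of the line) instead of printing it, echo and var_dump output containing it can be misleading. Encoding the string first makes the control characters visible. A small sketch using the XX^MXX sample from the question:

```php
<?php
// The sample from the question: XX, a carriage return (^M), then XX.
$text = "XX\rXX";

echo bin2hex($text), PHP_EOL;     // 58580d5858  (0d is ^M)
echo json_encode($text), PHP_EOL; // "XX\rXX"     (\r shown as an escape)

// Each element of str_split() can be inspected the same way.
$chars = array_map('bin2hex', str_split($text));
echo implode(' ', $chars), PHP_EOL; // 58 58 0d 58 58
```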
