PHP - Unable to remove whitespace from string [duplicate]

I have a piece of code that reads values from a CSV file. Each record is validated and then used in an API call to an external source, and all of this has been working fine.
Today a CSV file was uploaded that, when opened, appears to have 3 spaces at the end of each entry, which causes the API call to fail.
I've tried trim(), but it doesn't do anything. I've also tried preg_replace('/^((?=^)(\s*))|((\s*)(?>$))/si', '', $pValue), and this has no effect either.
I opened the CSV file in Notepad++ with "show all symbols" enabled; there are clearly 3 spaces followed by the CR and LF.
Has anyone come across an issue like this before? Code snippets are below, along with an example on codepad.
EDIT:
Example: http://codepad.org/DfjzPcUU
Code snippets, if more is needed please let me know.
$vCSVData = self::getCSVData($vPendingFilePath);
foreach ($vCSVData AS $vKey => $vValue)
{
echo self::getNubmer($vValue);
...
}
getCSVData():
private function getCSVData($pFilePath)
{
return call_user_func_array('array_merge', array_map('str_getcsv', file($pFilePath, FILE_SKIP_EMPTY_LINES)));
}
CSV File Snippet:
Blue Table   
Green Chair  
Temp. Table   

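As it turns out, what I thought was white space was not ordinary spaces at all (thanks to MikeB for figuring this out). Each "space" is the byte pair 0xC2 0xA0 (decimal 194 and 160), which is the UTF-8 encoding of a non-breaking space; neither byte is an ASCII character. That is why trim, rtrim, preg_replace and str_replace with the normal " " (space) check didn't work. The snippet below is what worked for me.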
preg_replace("/[\xA0\xC2]/", "", "Table ")
\xA0 is byte 160 and \xC2 is byte 194.
You can also use trim in this instance with the statement below. trim is slightly faster than preg_replace, but the overhead of either is negligible, so in my case either will suffice.
trim("Table ", "\xA0\xC2")
Example: http://codepad.org/8H5Ut2KA
I was able to find out what the 'space' was by using the below code snippet:
var_dump(ord($vString[5]));
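Building on the answer above, here is a minimal sketch of a UTF-8 aware trim. Note that stripping the bytes \xA0 and \xC2 individually can damage other multi-byte characters that happen to contain one of them (for example "£" is C2 A3), so this version matches the non-breaking space as a whole code point. The helper name trimNbsp and the sample value are made up for illustration, and it assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper: trims ASCII whitespace and UTF-8 non-breaking spaces
// (U+00A0, bytes C2 A0) from both ends of a string.
function trimNbsp(string $value): string
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8,
    // so \x{00A0} matches the whole two-byte sequence C2 A0.
    $result = preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $value);

    // preg_replace() returns null on failure (e.g. invalid UTF-8 input);
    // in that case hand back the original value unchanged.
    return $result ?? $value;
}

// Sample record with three trailing non-breaking spaces (illustrative data).
$raw = "Blue Table\xC2\xA0\xC2\xA0\xC2\xA0";

// Inspect the trailing bytes to see what the "spaces" really are.
echo bin2hex(substr($raw, -6)), "\n"; // c2a0c2a0c2a0

var_dump(trimNbsp($raw));             // string(10) "Blue Table"
```

bin2hex() on a suspicious part of the string is usually quicker than checking one offset at a time with ord().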

Related

PHP - Can't Remove Carriage Return / Space [duplicate]

I am reading from a MySQL Database.
The field upc reads as:
811657019822
843018021328
I only want the first numbers; there is a space/carriage return and for some reason I cannot explode it out or trim it out. When I convert to XML it displays as:
<g:gtin>811657019822
843018021328</g:gtin>
Here is what I have tried in PHP and the result:
When I do a var_dump it shows this:
string(25) "811657019822
843018021328"
Notice how they are not all on one line?
It doesn't appear to be a line break as the XML returns a Carriage Return. Any ideas on what to try to remove everything after the first numbers?
UPDATE
As pointed out by @Don't Panic, I had my slashes the wrong way round and should have been using only \r.
This is what worked correctly:
explode("\r", $product['upc']);
Explode with '/r/n' won't work for a couple of reasons. For one you'd need to use a double quoted string, with backslashes instead of forward slashes, like "\r\n". But there isn't a \n, just an \r.
Try using
explode("\r", $yourString);
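A short sketch of keeping only the first code, assuming the value looks like the var_dump output shown in the question (two 12-digit codes joined by a single carriage return). The variable names are illustrative; the preg_split variant is an alternative that is not part of the original answer.

```php
<?php
// Sample value matching the var_dump in the question: two 12-digit codes
// joined by a lone carriage return (12 + 1 + 12 = 25 bytes).
$upc = "811657019822\r843018021328";

// Split on \r and keep the first piece. Double quotes are required so that
// PHP interprets \r as a carriage return.
$first = explode("\r", $upc)[0];

// Alternative that tolerates \r, \n or \r\n: \R matches any newline sequence.
$firstAny = preg_split('/\R/', $upc)[0];

var_dump(strlen($upc), $first, $firstAny);
```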

Opening an encoded file with PHP

I am opening a file on the server with PHP. The file seems ordinary. It opens in Notepad and Textedit on a PC. Even PHP can display it without any issue in a web browser when we echo out.
But when I try searching it with strpos(), it can't find anything except single characters. If I search for a string of 2 or more characters, it doesn't find anything.
I have tried encoding it to UTF-8, and it is detected as ASCII, so everything seems right there.
I have also isolated the part of the file that I am trying to read down to only 250 characters. They all look fine on the screen.
But strpos() can't find it. I've run tests on every part of my code and I believe the code itself is fine. I believe the problem is that the characters I see on the screen do not exactly match the bytes that are really in the file.
My last resort is to write a function that converts each character into an array of integers (if that's even possible) and then converts all of that back into a string. That way, we'd know for certain which characters are really there.
Hoping that somebody has a better approach or perhaps an idea for something I missed?
I'll post the code below:
$content = file_get_contents($file->getPathname()); // get the file contents
$content = substr($content, 30, 300); // reduce the large file to just the first few lines
$content = htmlspecialchars($content); // try to remove any special characters from the file
$content = iconv('ASCII', 'UTF-8//IGNORE', $content); // encode to a friendly format
$string = "JobName"; // this is the string i'm searching for
if (strpos($content, $string) !== false) {
echo "bingo";
}
else {
echo " not found ";
}
Just to be clear, the file I'm opening is generated by a PC program that stores its data in .DAT format. As I said, I can see and read the content very easily using any program, including PHP, but when I try to search it, it's as if it doesn't recognize the content at all.
I am not aware of how to upload a file on StackOverflow, but if someone can tell me how to do it then I will gladly post the file itself.
Thank you very much for your help, ARKASCHA. I was able to find an online hex editor, and when I looked at the bytes, it turned out there is a NUL character between every pair of visible characters in this file. That's probably why I couldn't see it in a regular view. I just had to run an additional function to remove the NUL characters from the file, and then it works as it's supposed to. Thanks again.
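A minimal sketch of the fix described above. The sample data is made up, and the second option rests on an assumption the thread does not confirm: a NUL after every ASCII character is the usual signature of UTF-16LE text, in which case converting the encoding is more robust than deleting bytes.

```php
<?php
// Simulated file contents: "JobName" with a NUL byte after every character,
// which is how ASCII text looks when stored as UTF-16LE (illustrative data).
$content = "J\0o\0b\0N\0a\0m\0e\0";

// A plain search fails because "J" is followed by NUL, not by "o".
var_dump(strpos($content, 'JobName'));   // bool(false)

// Option 1: strip the NUL bytes, as the asker ended up doing.
$stripped = str_replace("\0", '', $content);
var_dump(strpos($stripped, 'JobName'));  // int(0)

// Option 2 (assumes the file really is UTF-16LE): convert it to UTF-8.
$converted = iconv('UTF-16LE', 'UTF-8', $content);
var_dump(strpos($converted, 'JobName')); // int(0)
```

Either step has to run on the raw file contents before any substr() call, since cutting at an odd byte offset would split a two-byte character.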

Line Breaks When Using PHP to Write a Javascript Alert [duplicate]

I am using javascript to do some form validation on a php page. There are several parts to it, but a typical one looks like this:
var brek=document.forms["seasonprice"]["price_brek"].value
if (brek==0)
{
alert("<?php echo $lang["brekzero"] ?>");
return true;
}
$lang['brekzero'] is defined in an included language file. As the text is fairly long I want to break it up onto several lines for better readability. I expected to be able to do that by inserting \n in the appropriate places. Instead it stops the alert working altogether, and it took me a long time to identify that as the cause.
What should I be doing instead?
The same script has also been giving me problems with accented letters. Writing &aacute; in the lang file results in &aacute; appearing in the alert instead of á. Writing á in the lang file causes a strange symbol to appear in the alert. For the moment I've taken the very English way out, and left out the accents!
You need escaped \n characters:
echo str_replace("\n", "\\n", $lang["brekzero"]);
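As an alternative to escaping by hand (a different technique from the str_replace above, not part of the original answer), json_encode() can build the whole JavaScript string literal, which also covers quotes and accented letters. A rough sketch with a made-up language entry:

```php
<?php
// Sample language entry, made up for illustration. "\xC3\xA1" is the letter
// á written as explicit UTF-8 bytes so the example does not depend on the
// encoding of this source file.
$lang = ['brekzero' => "Breakfast price is zero.\nPlease check the \"price\" field: \xC3\xA1"];

// json_encode() returns a quoted, escaped literal: the real newline becomes
// the two characters \n, each quote becomes \", and non-ASCII characters
// become \uXXXX escapes that JavaScript decodes back to the original letter.
$literal = json_encode($lang['brekzero']);

// Note that no extra quotes are placed around the literal.
echo 'alert(' . $literal . ');', "\n";
```

json_encode() expects valid UTF-8 input, so the language file would need to be saved as UTF-8 for this to work.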

Why does PHP trim not really remove all whitespace and line breaks?

I am grabbing input from a file with the following code
$jap= str_replace("\n","",addslashes(strtolower(trim(fgets($fh), " \t\n\r"))));
I had also previously tried these while troubleshooting:
$jap= str_replace("\n","",addslashes(strtolower(trim(fgets($fh)))));
$jap= addslashes(strtolower(trim(fgets($fh), " \t\n\r")));
If I echo $jap it looks fine. Later in the code, without any other alterations, $jap is inserted into the DB. However, I noticed that a comparison test that checks whether this jap is already in the DB returned false, even though I can plainly see a seemingly identical entry in the DB. When I copy the inserted entry straight from phpMyAdmin, or from my site where it is displayed, and paste it into Notepad, it pastes like this (this is an exact paste, including the quotes):
"
バスにのって、うみへ行きました"
Obviously I need it without that white space, line break, or whatever it is.
As far as I can tell, trim is not doing what it says it will do, or I'm missing something here. If so, what is it?
UPDATE:
With regard to Jack's answer: the preg_replace did not help, but here is what I did. I used bin2hex() to determine that the part I don't want is efbbbf. I did this by passing $jap through str_replace to remove the Japanese text I was expecting to find and feeding what was left into bin2hex():
echo bin2hex(str_replace("どちらがあなたの本ですか","",$jap));
The output of the above was efbbbf.
But what is it? Can I use str_replace to remove it somehow?
The trim function doesn't know about Unicode white spaces. You could try this:
preg_replace('/^\p{Z}+|\p{Z}+$/u', '', $str);
As taken from: Trim unicode whitespace in PHP 5.2
Otherwise, you can do a bin2hex() to find out what characters are being added at the front.
Update
Your file contains a UTF-8 BOM; to remove it:
$f = fopen("file.txt", "r");
$s = fread($f, 3);
if ($s !== "\xef\xbb\xbf") {
// bom not found, rewind file
fseek($f, 0, SEEK_SET);
}
// continue reading here
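If the text is already in a string, as with the fgets() call in the question, the same check can be done without reopening the file. A minimal sketch; the helper name stripUtf8Bom is made up for illustration. The BOM (U+FEFF) is a format character rather than a separator, which would explain why the \p{Z} pattern above did not remove it.

```php
<?php
// Hypothetical helper: removes a leading UTF-8 byte order mark (EF BB BF).
function stripUtf8Bom(string $text): string
{
    $bom = "\xEF\xBB\xBF";

    // Compare only the first three bytes so the rest of the text is untouched.
    if (strncmp($text, $bom, 3) === 0) {
        return (string) substr($text, 3);
    }

    return $text;
}

// Sample first line of a file, prefixed with a BOM (illustrative data).
$line = "\xEF\xBB\xBF" . "バスにのって、うみへ行きました\n";

// Strip the BOM first, then trim the trailing newline as usual.
$jap = trim(stripUtf8Bom($line));

echo bin2hex(substr($line, 0, 3)), "\n"; // efbbbf
var_dump($jap === "バスにのって、うみへ行きました");
```

Only the first line read from the file can carry the BOM, so the helper is a no-op on every later line.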

What's throwing off my str_word_count?

I'm using PHP's str_word_count function to count the number of words in a textarea submitted via POST.
The issue is that if I post back to my file and output the word count, it is different from the count I get when I copy and paste the same text into my PHP script.
What is throwing off the number? There is a difference of 6 words; incidentally, there are also 6 double line breaks in the textarea.
How do I minimize this difference?
You could remove the line breaks and tags altogether:
str_word_count(str_replace('<br />', '', nl2br(strip_tags($data)))); // nl2br() inserts "<br />", not "<br>"
Or I guess this is better:
str_word_count(strip_tags(nl2br($data)));
If your line breaks are in HTML form, you could use something like strip_tags().
If they aren't, I suspect an encoding issue. Maybe a combination of stripslashes, utf8_encode or utf8_decode could fix the miscounted words.
As a last resort, you could use a regular expression to filter out anything but [a-zA-Z] and spaces.
