Fixing weird indentation in text file with php - php

I'm taking this file and splitting it up into sentences. The issue is that it's formatted weirdly. I need to remove all the random newlines, indentation and unneeded spaces. Is there a way to do this with PHP?
I am currently using
$test= file_get_contents("text.txt");
$stringtest = str_replace(PHP_EOL,'', $test);
But I am getting weird behavior when I try to split up the sentences. Is there a way to do this?
The weird behavior is that when I print out the text
echo $stringtest;
There are unseen characters between lines where a newline/weird_spacing used to exist.

You can use a regex to collapse every run of whitespace into a single space. You probably also want to remove whitespace at the beginning and end. Try this:
$test = trim($test);
$test = preg_replace('/\s+/s', ' ', $test);
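A minimal sketch of how the whitespace cleanup above could feed into sentence splitting. The helper names and the sample text are assumptions for illustration, and the splitter is deliberately naive. One plausible cause of the leftover invisible characters in the question is a file with "\r\n" line endings read on a system where PHP_EOL is "\n", which leaves each "\r" in place; matching \s+ covers both characters.

```php
<?php
// Hypothetical helper names; not part of the original question or answer.

// Collapse every run of whitespace (spaces, tabs, \r, \n) into one space.
function normalizeWhitespace(string $text): string
{
    return preg_replace('/\s+/', ' ', trim($text));
}

// Naive sentence splitter: break after '.', '!' or '?' followed by whitespace.
// Abbreviations such as "e.g." will be split as well.
function splitSentences(string $text): array
{
    return preg_split('/(?<=[.!?])\s+/', normalizeWhitespace($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Sample input with Windows line endings, tabs and stray indentation.
$raw = "  This is the first\r\n     sentence.   Here is\n\tthe second one!  ";
$sentences = splitSentences($raw);
print_r($sentences);
```

The lookbehind keeps the punctuation attached to each sentence instead of discarding it.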

Related

Replace spaces in all URLs with %20 using Regex

I have a large block of HTML that contains multiple URLs with spaces in them. How do I use a regex to replace any space that occurs in a URL with '%20'? The good thing is that all of the URLs end with '.pdf'.
Looking for something I could run in BBedit/Text Wrangler, or even PHP.
Example: http://www.site-name.com/dir/file name here.pdf
Need to return: http://www.site-name.com/dir/file%20name%20here.pdf
Instead of a regex, you could use rawurlencode in PHP on the file-name part of each URL, which encodes spaces as %20 (urlencode would encode them as +). This is similar to encodeURI in JavaScript.
I was faced with exactly the same problem. I solved it with this:
$text = preg_replace("/http(.*) (.*)\.pdf/U", "http$1%20$2.pdf", $text);
This looks for a space between http and pdf and then replaces the space with %20.
If your URLs have multiple spaces, then simply run the code over and over until all the spaces are gone:
while(preg_match("/http(.*) (.*)\.pdf/U", $text))
{
    // Each pass replaces one more space per matched URL.
    $text = preg_replace("/http(.*) (.*)\.pdf/U", "http$1%20$2.pdf", $text);
}
However, I've found this will overwrite text if there are two or more URLs on the same line. I haven't found a solution for this yet.
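One possible way around the multiple-URL problem is to match each URL as a whole and replace spaces only inside the match, using preg_replace_callback. This is a sketch, not a tested fix for every input: the function name is hypothetical, and it assumes every URL starts with http:// or https:// and ends with .pdf, as the question states. A link that does not end in .pdf would be swallowed into the next match.

```php
<?php
// Hypothetical helper; assumes every URL starts with http:// or https://
// and ends with ".pdf", as stated in the question.
function encodeSpacesInPdfUrls(string $text): string
{
    return preg_replace_callback(
        '~https?://.*?\.pdf~i',
        // Replace spaces only inside the matched URL, however many there are.
        function (array $match): string {
            return str_replace(' ', '%20', $match[0]);
        },
        $text
    );
}

// Two URLs on the same line, each containing spaces.
$html = 'See http://www.site-name.com/dir/file name here.pdf and http://www.site-name.com/dir/other file.pdf';
$fixed = encodeSpacesInPdfUrls($html);
echo $fixed, PHP_EOL;
```

Because the lazy quantifier stops at the first ".pdf", the text between two URLs is left untouched and no loop is needed.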

How can I strip all line breaks to generate a proper CSV?

I have a textarea submitting to my database on a website, and that part works properly. But when I generate a CSV (via PHP) from my database, line breaks in the data mess up the resulting CSV. Any CSV reader will interpret a line break from the input as the start of a new row.
I have tried the following approaches:
Encapsulating the fields in quotation marks.
This:
$field = str_replace(array('\n', '\r', '\r\n', '\n\r'), ',', $original_field);
Also this:
$field = strip_tags(nl2br($original_field));
Combining all approaches above.
Either way, the end result is still a messed-up CSV that breaks on any line break entered by a user. I have managed to block new line breaks in the textarea, but there are a lot of legacy submissions, so I need to fix this on the CSV side as well.
Why is it not working? How can I fix this issue?
The previously accepted answer (by user user1517891) is not correct: when the string contains \r\n, it replaces twice and produces two commas, one for the \r and one for the \n.
You need to use it in different order, as:
$field = str_replace(array("\r\n", "\n\r", "\n", "\r"), ',', $original_field);
Use double quotes:
$field = str_replace(array("\n", "\r", "\r\n", "\n\r"), ',', $original_field);
I'd suggest using preg_replace() for this rather than str_replace(). The reason is that there may be multiple newlines and combinations of \r and \n, and I would expect that you'd want to replace them all with just a single comma.
I'd also suggest using trim() to remove trailing blank lines.
$field = preg_replace('/[\n\r]+/', ',', trim($original_field));
You have to put \n and similar escape sequences in double quotes; otherwise they are treated as literal backslash-letter strings and not as line breaks.
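The points made in the answers above (quoting, replacement order, and collapsing runs) can be checked with a short script. The sample value is an assumption standing in for a textarea submission; the calls are the ones already shown in the answers.

```php
<?php
// Hypothetical sample value, standing in for a textarea submission.
$original_field = "first line\r\nsecond line\n\nthird line\r\n";

// Single-quoted '\n' is a backslash followed by the letter n (2 bytes);
// double-quoted "\n" is a single line-feed byte.
$singleQuotedLength = strlen('\n');
$doubleQuotedLength = strlen("\n");

// Replacing "\n" and "\r" one at a time turns each "\r\n" into two commas.
$naive = str_replace(array("\n", "\r"), ',', $original_field);

// Collapsing every run of CR/LF characters yields one comma per break.
$collapsed = preg_replace('/[\n\r]+/', ',', trim($original_field));

var_dump($singleQuotedLength, $doubleQuotedLength, $naive, $collapsed);
```

With this input, the naive version contains doubled commas while the collapsed version has exactly one comma between the three lines.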

White spaces are lost when echoing under php

I've got the following issue with PHP and PostgreSQL.
In a table I added the following value, mark the spaces.
Things: 10 POLI
When I read this out with PHP it will become
Things 10 POLI
My simplified code (for an ideal world without errors) is:
$query = "SELECT stuff, thing, planets FROM 42 WHERE answer = '-'";
$result = pg_query($connection, $query);
$resultTable = pg_fetch_all($result);
Then with
echo "Things: $result[stuff]";
My question is, which step eliminates all the white spaces? And how to get these spaces back? I know that most people want to remove them, I want to keep them.
That is not a PHP issue but an HTML issue, because when you output with echo you are in fact generating HTML code.
The HTML specification defines that multiple consecutive spaces are rendered as a single space.
If you want to avoid this, wrap a <pre> tag around the string:
echo "<pre>Things: $result[stuff]</pre>";
That's because the browser does not render more than one consecutive space. You can use this code to convert spaces to &nbsp; (a space the browser will not collapse):
$str = str_replace(' ', '&nbsp;', $origText);
Alternatively, wrap your text in a <pre> tag if that suits your requirements.
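A small sketch combining the two suggestions above with HTML escaping, since database values echoed into a page should normally be escaped first. The function name and sample value are assumptions, not part of the original answers.

```php
<?php
// Hypothetical helper: escape the value for HTML, then make every space
// non-breaking so the browser does not collapse runs of spaces.
function preserveSpacesForHtml(string $text): string
{
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    return str_replace(' ', '&nbsp;', $escaped);
}

$value = '10    POLI'; // sample value containing four consecutive spaces
echo 'Things: ' . preserveSpacesForHtml($value);
```

Escaping has to happen before the replacement; otherwise the ampersand in each &nbsp; would itself be escaped.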

PHP doesn't detect white space in string

I'm working on transferring data from one database to another. For this I have to map some values (string) to integers and this is where I run into a strange problem.
The string looks like this $string = "word anotherword"; so two words (or one space).
When I explode the string or count the amount of spaces it misses the white space. Why? I var_dumped the variable and it says it's a string.
Below is the code I'm using.
echo "<strong>Phases</strong>: ".$fases = mapPhase($lijst[DB_PREFIX.'projectPhase']);
The string that's being sent to the function is, for example, "Design Concept". This calls the following function (where the spaces get ignored):
function mapPhase($phases){
echo "Whitespace amount: ".substr_count($phases, ' ')."<br />";
}
For the example string given, this function echoes 0. What's causing this and how can I fix it? The strangest thing is that for one instance the function worked perfectly.
Multiple consecutive whitespace characters in HTML are always collapsed into one when rendered. Code indentation is one example.
If you want to print more than one, use &nbsp; for each space instead.
function mapPhase($phases){
echo 'Whitespace amount: '.substr_count($phases, ' ').'<br />';
}
It may well be that the alleged space in the string may not be a space as in ' ', but something similar, which gets rendered in the browser in the same way as ' ' would. (for a rudimentary list of possible characters: http://php.net/manual/en/function.trim.php)
Thus, checking what the whitespace exactly is may be the solution to that problem.
Maybe they are not even spaces. Try ord() for each symbol in your string.
ord(' ') is 32.
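To follow the ord() suggestion, a sketch like this lists the byte values of a string so that a look-alike character stands out. The function name is hypothetical, and the sample string assumes the culprit is a UTF-8 non-breaking space (bytes 194 and 160), which is only one of several possibilities.

```php
<?php
// Hypothetical helper: return the byte value of every byte in the string.
function describeBytes(string $text): array
{
    $codes = array();
    foreach (str_split($text) as $byte) {
        $codes[] = ord($byte);
    }
    return $codes;
}

// Assumed sample: a non-breaking space (U+00A0) instead of a regular space.
$phases = "Design\u{00A0}Concept";

$plainSpaces = substr_count($phases, ' ');   // regular spaces (byte 32) found
$codes = describeBytes($phases);             // contains 194, 160 but no 32

// If the culprit is a non-breaking space, swap it for a regular one.
$fixed = str_replace("\u{00A0}", ' ', $phases);

var_dump($plainSpaces, $codes, substr_count($fixed, ' '));
```

If the dump shows a value other than 32 where the gap appears, replacing that specific character is safer than stripping all whitespace.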
You can use:
$string = preg_replace('/\s+/', '', $string);

file_put_contents, file_append and line breaks

I'm writing a PHP script that adds numbers into a text file. I want to have one number on every line, like this:
1
5
8
12
If I use file_put_contents($filename, $commentnumber, FILE_APPEND), the result looks like:
15812
If I add a line break like file_put_contents($filename, $commentnumber . "\n", FILE_APPEND), spaces are added after each number and one empty line at the end (underscore represents spaces):
1_
5_
8_
12_
_
_
How do I get that function to add the numbers the way I want, without spaces?
Did you try the PHP_EOL constant?
file_put_contents($filename, $commentnumber . PHP_EOL, FILE_APPEND)
--- Added ---
I just realized that my file editor does the same, but don't worry: it is just a ghost character that the editor places there to signal that there is a newline.
You could try this
A file with EOL after the last number looks like:
1_
2_
3_
EOF
but a file without that last character looks like
1_
2_
3
EOF
where _ means a space character
You could try to parse the file contents using php to see what's inside
$lines = explode( PHP_EOL, file_get_contents($file));
foreach($lines as $line ) {
var_dump($line);
}
...tricky
Paul's answer has the correct approach, but it contains a mistake.
What you need is the following:
file_put_contents($filename, trim($commentnumber).PHP_EOL, FILE_APPEND);
The PHP_EOL constant uses the line ending of the platform PHP is running on.
The trim function removes any newline or whitespace on both sides of the string.
Converting to integer would be a huge mistake because
1. you might end up with zero, especially because of whitespace or special characters (wherever they come from...)
2. IDs don't necessarily need to be integers
Ohh Guys! Just Use
\r\n
instead of \n
There is nothing in the code you provided that would generate those spaces, unless $commentnumber already contains the space to begin with. If that is the case, simply use trim($commentnumber) instead.
There is also nothing in your code that would explain empty lines at the bottom of the file, unless $commentnumber can be an empty string. If that is the case, and you want it to output the number 0 instead, use intval($commentnumber).
Of course, you need only one of those two. If you want to preserve string-like content, use trim(); if you always want integers, use intval(), which already trims it automatically.
It is also possible that you accidentally wrote " \n" instead of "\n" in your actual code, but in the code you posted here it is correct.
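The difference between trim() and intval() described above can be seen on a few sample values. The values are assumptions chosen to cover the cases the answer mentions.

```php
<?php
$withSpaces = " 12 \n";  // hypothetical value with stray whitespace
$empty = '';             // hypothetical empty submission

$trimmed = trim($withSpaces);    // string "12", content preserved
$asInt = intval($withSpaces);    // int 12, surrounding whitespace ignored
$emptyTrimmed = trim($empty);    // still an empty string
$emptyAsInt = intval($empty);    // int 0
$nonNumeric = intval('abc');     // int 0, the original text is lost

var_dump($trimmed, $asInt, $emptyTrimmed, $emptyAsInt, $nonNumeric);
```

So trim() is the safer choice when the value may legitimately be non-numeric, while intval() turns anything it cannot parse into 0.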
annoyingregistration, what you have there is absolutely fine.
On Unix-like systems PHP_EOL and "\n" are the same; on Windows PHP_EOL is "\r\n".
There is nothing wrong with the code you provided, so it must be the value of $commentnumber that has a space at the end of it. As stated, run $commentnumber through the trim() function, and append the newline after trimming so it is not stripped again:
file_put_contents($filename, trim($commentnumber) . "\n", FILE_APPEND);
Good luck.
After reading your code and responses, I have come up with a theory...
Since I can't see anything wrong with your code, how did you open and read the file? Did you actually open it in a text editor? Did you use a PHP script to do it? If so, open the file with a text editor and check that there are actually spaces at the end of each line. If there are, ignore the rest of this answer. If not, just read on.
For instance, if you use something like this:
<?php
$lines = file($filename);
if($lines === false) // Error reading
    die();
foreach($lines as $line)
echo $line."<br />";
Then you would always have whitespace at the end of each line because of the way file() works: by default it keeps the newline character on every line. Make sure each $line does not have whitespace, such as a newline character, at the end.
Since HTML handles all whitespaces - spaces, tabs, newlines etc. - as spaces, if there is a whitespace at the end of $line, then those would appear as spaces in the HTML output.
Solution: use rtrim($line) to remove whitespaces at the end of the lines. Using the following code:
<?php
$lines = file($filename);
if($lines === false) // Error reading
    die();
foreach($lines as $line)
echo rtrim($line)."<br />";
wouldn't have the same problems as the first example, and all spaces at the end of the lines would be gone.
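As an alternative sketch, file() can drop the line endings itself via its FILE_IGNORE_NEW_LINES flag, which avoids the per-line rtrim(). The example below writes a few hypothetical values to a temporary file with the trim-then-append approach from the earlier answers and reads them back; the temporary file and the sample values are assumptions for illustration.

```php
<?php
// Temporary file used only for this sketch.
$filename = tempnam(sys_get_temp_dir(), 'num');

// Hypothetical inputs, some with stray whitespace or a trailing newline.
foreach (array('1', ' 5 ', "8\n", '12') as $commentnumber) {
    // Trim first, then append exactly one line ending.
    file_put_contents($filename, trim($commentnumber) . PHP_EOL, FILE_APPEND);
}

// Read the lines back without their line endings, skipping empty lines.
$lines = file($filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
if ($lines === false) {
    die('Could not read ' . $filename);
}

foreach ($lines as $line) {
    var_dump($line); // each value is the bare number as a string
}

unlink($filename); // clean up the temporary file
```

Inspecting the values with var_dump() rather than echoing them into HTML shows the exact string length, so a stray space or newline cannot hide.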
its because each time you write to the file, the file is being finished, file_put_contents inserts an extra line break at the end
