This question applies to many languages, so don't be thrown off by the fact that I'm using PHP in the terminal. An answer for, say, Python or Perl would probably also tell me what I need to know.
I'm reading a text file, and I want to know which special characters each line contains. For example, if the text file is this:
hello
world
I want the script to output "hello\nworld". My root problem is that I'm writing a PHP script that reads from a text file and should ignore blank lines, but no matter what I try it still reads them in. I think my pattern for matching a blank line is wrong, so I'm trying to find out what a blank line actually consists of: is it "\n", "\t\t", or something else?
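Just use an ordinary str_replace(), like this. The double-quoted "\n" and "\r" are the real control characters, while the single-quoted '\n' and '\r' are the literal two-character sequences, so the line breaks become visible: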
$text = str_replace( array("\n","\r"), array('\n', '\r'), $text);
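If you want to see every control character and not only \n and \r, addcslashes() with a character range can do it in one call. This is a minimal sketch; the sample string is made up and stands in for whatever file_get_contents() would return:

```php
<?php
// Hypothetical sample text standing in for the contents of the file.
$text = "hello\r\n\tworld\n";

// Escape every control character (ASCII 0-31) in C style.
$visible = addcslashes($text, "\0..\37");

echo $visible, PHP_EOL; // prints hello\r\n\tworld\n
```

Control characters without a short C-style escape come out as octal escapes instead.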
My solution is certainly more tedious, but it also removes the blank lines. To run it directly as a CLI script, it needs a shebang line at the top:
#!/usr/bin/php
<?php
//
// main test:
//
$xarr = file("MyFilename.txt");
$n = count($xarr);
$strret = "";
for($i = 0; $i < $n; $i++)
{
//
// ignore blank lines:
//
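if(trim($xarr[$i]) !== "") // also skips lines that hold only "\r\n" or spaces
{
if($strret !== "") // add a separator only after a line has been written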
{
$strret .= "\\n";
}
$strret .= rtrim($xarr[$i]);
}
}
//
echo $strret . "\n";
?>
With a text file:
# cat MyFilename.txt
hello
world
It prints:
hello\nworld
I'll assume that by "special character", you mean only \n, \t, and \r.
Text file:
hello
world
foo
bar!
baz_
PHP:
$fp = fopen('textfile.txt', 'r');
// fgetc() returns false at end of file, so test its result directly
while (($c = fgetc($fp)) !== false) {
if ($c == "\n") $c = '\n';
else if ($c == "\t") $c = '\t';
else if ($c == "\r") $c = '\r';
echo $c;
}
fclose($fp);
The script above reads the file one character at a time and replaces each \t, \r, or \n it finds with its visible escape sequence. Because every character is handled individually, there is no need to check for repeated characters.
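For the root problem (skipping blank lines), here is a rough sketch of one way to do it. The function name readNonBlankLines() and the temporary demo file are my own inventions for illustration. It treats a line as blank if nothing is left after trim(), which covers "\n", "\r\n", and lines holding only spaces or tabs:

```php
<?php
// Hypothetical helper: returns the non-blank lines of a file without line endings.
function readNonBlankLines(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        return [];
    }
    // trim() also removes a stray "\r" left over from Windows "\r\n" endings.
    $kept = array_filter($lines, function (string $line): bool {
        return trim($line) !== '';
    });
    // rtrim() drops trailing whitespace; array_values() renumbers the keys.
    return array_values(array_map('rtrim', $kept));
}

// Demo on a temporary file with mixed line endings and blank lines.
$path = tempnam(sys_get_temp_dir(), 'lines');
file_put_contents($path, "hello\r\n\r\n   \nworld\n");
$result = readNonBlankLines($path);
unlink($path);

echo implode('\n', $result), PHP_EOL; // prints hello\nworld
```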
Related
I'm trying to make text input in a textarea as easy as possible for the user, without them needing to know any code or deal with WYSIWYG formatting. It's just text.
But since the text will be displayed as HTML on the page, it would be great to have 'p' tags around each line.
So far I've tried this:
$content = mysqli_real_escape_string($connection,$_POST['content']);
$paragraphs = explode("\n", $content);
for ($i = 0; $i < count ($paragraphs); $i++)
{
$paragraphs[$i] = '<p>' . $paragraphs[$i] . '</p>';
}
$content = implode('', $paragraphs);
But this is putting the <p></p> tags around EVERYTHING - one at the beginning of the post, and one at the end, ignoring all returns.
Can anyone see what I'm doing wrong?
When you call mysqli_real_escape_string(), it escapes certain characters. From the manual:
escapestr
The string to be escaped.
Characters encoded are NUL (ASCII 0), \n, \r, \, ', ", and Control-Z.
So every newline is escaped: it becomes the two characters \ and n, and explode("\n", ...) no longer finds a real newline to split on.
You could switch the order around and do the splitting first:
$paragraphs = explode("\n", $_POST['content']);
for ($i = 0; $i < count ($paragraphs); $i++)
{
$paragraphs[$i] = '<p>' . $paragraphs[$i] . '</p>';
}
$content = implode('', $paragraphs);
But you should also be using prepared statements, in which case you don't need to call mysqli_real_escape_string() at all.
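As a rough sketch of the splitting step on its own, assuming the raw textarea value is passed in untouched: the helper name linesToParagraphs() is made up, and the htmlspecialchars() call is my own addition for safe output, not something from the question. Splitting on \R means "\r\n", "\n", and "\r" all count as line breaks:

```php
<?php
// Hypothetical helper: wraps each non-empty line of raw textarea input in <p> tags.
function linesToParagraphs(string $raw): string
{
    // \R matches "\r\n", "\n", or "\r", so the sender's line endings do not matter.
    $lines = preg_split('/\R/', $raw);
    $html = '';
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip empty lines instead of emitting <p></p>
        }
        // Assumption: the text is user input, so it is escaped for HTML output.
        $html .= '<p>' . htmlspecialchars($line, ENT_QUOTES, 'UTF-8') . '</p>';
    }
    return $html;
}

echo linesToParagraphs("First line\r\nSecond & last"), PHP_EOL;
// prints <p>First line</p><p>Second &amp; last</p>
```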
So I get these values from a form and they are then saved into a word document.
If my input (this is a textarea by the way) reads this:
"This"
&
"That"
I would expect the output to look exactly like that.
However, whenever it comes out it looks like this:
It adds those special block characters at the end...
How can I get rid of these?
These are my variables:
$multipleImports = explode("\n",$_POST['multipleImports']);
$multipleImportsInfo = explode("\n",$_POST['multipleImportsInfo']);
$multipleImportsCounts = explode("\n",$_POST['multipleImportsCounts']);
And here I concatenate them into a string.
$length = count($multipleImports);
for ($i = 0; $i < $length; $i++) {
$content = $content . $multipleImports[$i] . " " . $multipleImportsInfo[$i] . " " . $multipleImportsCounts[$i] . "\n ";
}
I tried right-trimming, and I tried encoding and decoding HTML entities, but nothing worked. Please help.
Reading @Tom Hedden's post gave me the idea to try this, and it worked!
$length = count($multipleImports);
for ($i = 0; $i < $length; $i++) {
$content = $content . $multipleImports[$i] . " " . $multipleImportsInfo[$i] . " " . $multipleImportsCounts[$i] . "\r\n ";
}
I would be curious to know what those special characters are. You should do a hex dump to see. I just glanced at your code and haven't thought seriously about it, but what immediately comes to mind is the difference in end-of-line conventions between Windows and *nix. That is, if the data comes from Windows, I think the end of a line is a carriage return AND a line feed ("\r\n") rather than just a line feed ("\n").
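A quick way to do that hex dump from PHP is bin2hex(). This is only a sketch: the sample string is a made-up stand-in for $_POST['multipleImports'] with Windows line endings. It shows the stray carriage return (0d) left behind when you explode on "\n" alone:

```php
<?php
// Hypothetical sample standing in for the textarea value with "\r\n" endings.
$input = "\"This\"\r\n&\r\n\"That\"";

// Splitting on "\n" alone leaves the "\r" at the end of each piece.
$parts = explode("\n", $input);
echo bin2hex($parts[0]), PHP_EOL; // prints 2254686973220d

// Splitting on \R (any line-break sequence) removes it.
$clean = preg_split('/\R/', $input);
echo bin2hex($clean[0]), PHP_EOL; // prints 225468697322
```

The trailing 0d in the first dump is the carriage return, which would explain the block characters in the Word document.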
My question is the exact opposite of one I asked last night.
I need to remove the last br tag from each run of consecutive br tags.
Input:
Test1 is here<br><br>Now comes Test2<br><br>Then test 3<br><br><br>Thats it.
Output
Test1 is here<br>Now comes Test2<br>Then test 3<br><br>Thats it.
My try:
preg_replace("[((?:<br>)+)]","",$posttext)
It removes all breaks.
You can replace
<br><br>(?!<br)
with <br>:
preg_replace('/<br><br>(?!<br)/', "<br>", $posttext);
The negative lookahead makes the pattern match only the last two <br> of a run, so exactly one <br> is removed from each run.
See demo at regex101
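Here is that replacement run against the input from the question, plus a made-up run of four to show that longer runs also lose exactly one tag. A single <br> on its own is left untouched:

```php
<?php
$posttext = "Test1 is here<br><br>Now comes Test2<br><br>Then test 3<br><br><br>Thats it.";

// Replace the last two <br> of each run with a single <br>.
$result = preg_replace('/<br><br>(?!<br)/', '<br>', $posttext);
echo $result, PHP_EOL;
// prints Test1 is here<br>Now comes Test2<br>Then test 3<br><br>Thats it.

// Hypothetical extra input: a run of four becomes a run of three.
$four = preg_replace('/<br><br>(?!<br)/', '<br>', 'a<br><br><br><br>b');
echo $four, PHP_EOL; // prints a<br><br><br>b
```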
Feast your eyes on this hahaha
If Preg replace doesn't work...
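// Removes one <br> from each run of up to $max consecutive <br> tags.
$string = "Test1 is here<br><br>Now comes Test2<br><br>Then test 3<br><br><br>Thats it.";
$max = 10;
// Work from the longest run down to runs of two. Shortened runs are written
// with a placeholder byte so that later, shorter passes cannot shrink them again.
for ($i = $max; $i > 1; $i--)
{
    $string = str_replace(str_repeat("<br>", $i), str_repeat("\0", $i - 1), $string);
}
// Turn the placeholders back into <br> tags.
$string = str_replace("\0", "<br>", $string);
echo $string; // Test1 is here<br>Now comes Test2<br>Then test 3<br><br>Thats it.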
PS: If there's no preg_replace for this, it is usable, albeit very inefficient.
I have the following function that I use in a PHP application to remove white space and line breaks from the source of a page.
It's based on some examples I have read on Stack Overflow, with some amendments to handle JS and HTML comments. Note: I've not used an existing library because I wanted something simple without all the additional features that others include, and with this code I have fine-grained control over what is stripped and what is not.
protected function MinifyHTML($str) {
$str = preg_replace("/(?<!\S)\/\/\s*[^\r\n]*/", "", $str); // strip JS/CSS comments
$str = preg_replace("/<!--(.*)-->/Uis", "", $str); // strip HTML comments
$protected_parts = array('<pre>,</pre>','<textarea>,</textarea>','<,>');
$extracted_values = array();
$i = 0;
foreach ($protected_parts as $part) {
$finished = false;
$search_offset = $first_offset = 0;
$end_offset = 1;
$startend = explode(',', $part);
if (count($startend) === 1) $startend[1] = $startend[0];
$len0 = strlen($startend[0]); $len1 = strlen($startend[1]);
while ($finished === false) {
$first_offset = strpos($str, $startend[0], $search_offset);
if ($first_offset === false) $finished = true;
else {
$search_offset = strpos($str, $startend[1], $first_offset + $len0);
$extracted_values[$i] = substr($str, $first_offset + $len0, $search_offset - $first_offset - $len0);
$str = substr($str, 0, $first_offset + $len0).'$$#'.$i.'$$'.substr($str, $search_offset);
$search_offset += $len1 + strlen((string)$i) + 5 - strlen($extracted_values[$i]);
++$i;
}
}
}
$str = preg_replace("/\s/", " ", $str);
$str = preg_replace("/\s{2,}/", " ", $str);
$replace = array('> <'=>'><', ' >'=>'>','< '=>'<','</ '=>'</');
$str = str_replace(array_keys($replace), array_values($replace), $str);
for ($d = 0; $d < $i; ++$d)
$str = str_replace('$$#'.$d.'$$', $extracted_values[$d], $str);
return $str;
}
However if I get a scenario like:
<a href="…">Link</a> <a href="…">Link</a>
It will remove that space between the two anchor tags.
I've added '</a> <a' to my $protected_parts in an attempt to stop this, but it still strips out the space between them. So I end up with LinkLink in the source which isn't what I want.
The same also happens with:
<p>This is <span class="">some</span> <span class="">styled</span> text.</p>
Also, it seems the $protected_parts aren't working, as my textareas are being minified too, so all the content inside them is compressed down onto one line...
Any ideas on the fixes? I've also not been able to find alternatives to use instead that don't implement caching, gzipping and other features I don't want. I purely want a simple solution that strips spaces, line breaks and comments and that's it.
UPDATED 2014/02/25 (late):
Here's another workaround. Instead of touching $protected_parts I'm just adding another replace operation at the end that adds a space after every </a> -- again a workaround, but this shouldn't screw up any of your original operability, and the penalty this time is only one space character after every anchor tag, not bad. Here it is: http://phpfiddle.org/main/code/5qj-13z
UPDATED 2014/02/25:
I added '</a> ' to $protected_parts and it does not strip the space. I threw it into phpfiddle over here, http://phpfiddle.org/lite/code/dms-cud. This is only a workaround tested on a few lines of synthetic HTML... I'm not sure what kind of real-world code you're running through your function. Obviously this workaround is not a universal fix either.
Original
I added '</a>',' <a ', to $protected_parts and it does not strip the space. I threw it into phpfiddle over here, http://phpfiddle.org/lite/code/ztz-5hf.
Your function is scary to me, but I like some of the basic functionality, like stripping HTML, JS and CSS comments. I'd still recommend using an Apache extension or a library. Using other people's open source code is the most powerful witchcraft a programmer can wield. :)
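If you do want to keep it hand-rolled, here is a rough sketch of a different approach, not a drop-in replacement for MinifyHTML(). The function name minifyHtmlSketch() and the placeholder format are my own. It swaps each <pre> and <textarea> element for a placeholder, collapses every whitespace run to a single space (never to nothing, so the space between two anchors survives), and then restores the protected elements. It deliberately skips the // comment stripping:

```php
<?php
// Hypothetical helper: collapses whitespace but leaves <pre>/<textarea> untouched.
function minifyHtmlSketch(string $html): string
{
    $saved = [];
    // Swap each protected element for a NUL-delimited placeholder token.
    $html = preg_replace_callback(
        '#<(pre|textarea)\b[^>]*>.*?</\1>#is',
        function (array $m) use (&$saved): string {
            $saved[] = $m[0];
            return "\x00" . (count($saved) - 1) . "\x00";
        },
        $html
    );
    // Strip HTML comments, then collapse each whitespace run to one space.
    $html = preg_replace('/<!--.*?-->/s', '', $html);
    $html = preg_replace('/\s+/', ' ', $html);
    // Put the protected elements back exactly as they were.
    $html = preg_replace_callback(
        '/\x00(\d+)\x00/',
        function (array $m) use ($saved): string {
            return $saved[(int) $m[1]];
        },
        $html
    );
    return trim($html);
}

$in = "<a href=\"#\">Link</a>   <a href=\"#\">Link</a>\n<textarea>a\n  b</textarea>";
echo minifyHtmlSketch($in), PHP_EOL;
```

This trades a few bytes for safety: a space between inline elements is kept because removing it changes how the page renders.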
Probably a simple problem here, but I cannot find it.
I am exploding a string that was input and stored from a textarea. I use nl2br() so that I can explode the string by the <br /> tag.
The string explodes properly, but when I try to get the first character of each line in a while loop, it only returns the expected character for the first line.
Note: The concept here is greentexting, so if you are familiar with that then you will see what I am trying to do. If you are not, I put a brief description below the code sample.
Code:
while($row = mysqli_fetch_array($r, MYSQLI_ASSOC)) {
$comment = nl2br($row['comment']);
$sepcomment = explode("<br />", $comment);
$countcomment = count($sepcomment);
$i = 0;
//BEGIN GREENTEXT COLORING LOOP
while($i < $countcomment) {
$fb = $sepcomment[$i];
$z = $fb[0]; // Check to see if first character is >
if ($z == ">") {
$tcolor = "#789922";
}
else {
$tcolor = "#000000";
}
echo '<font color="' . $tcolor . '">' . $sepcomment[$i] . '</font><br>';
$i++;
}
//END GREENTEXT COLORING LOOP
}
Greentext: If the first character of the line is '>' then the color of that entire line becomes green. If not, then the color is black.
Picture:
What I have tried:
strip_tags() - Thinking that possibly the tags were acting as the first characters.
$fb = preg_replace("/(<br\s*\/?>\s*)+/", "", $sepcomment[$i]);
str_replace()
echo $z //Shows the correct character on first line, blank on following lines.
$z = substr($fb, 0, 1);
Here is a test I just did where I returned the first 5 characters of the string.
Any ideas for getting rid of those empty characters?
Try the trim() function:
$fb = trim($sepcomment[$i]);
http://php.net/manual/en/function.trim.php
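(The line breaks are probably the problem: nl2br() inserts <br /> before each newline but keeps the newline itself, so every piece after the first starts with "\r\n" or "\n".)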
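Another option, sketched below under the assumption that the comment is stored as plain text from the textarea, is to skip nl2br() and split on the line breaks directly, so no piece starts with a leftover newline. The sample comment is made up, and the <span> and htmlspecialchars() calls are my own substitutions for the <font> tag in the question:

```php
<?php
// Hypothetical stand-in for $row['comment'] as stored from a textarea.
$comment = "normal line\r\n>green line\r\n>another green line";

$out = '';
// \R matches "\r\n", "\n", or "\r", so each piece starts with its real first character.
foreach (preg_split('/\R/', $comment) as $line) {
    $isGreen = strncmp($line, '>', 1) === 0;
    $color = $isGreen ? '#789922' : '#000000';
    $out .= '<span style="color:' . $color . '">' . htmlspecialchars($line) . '</span><br>';
}

echo $out, PHP_EOL;
```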