What's throwing off my str_word_count? - php

I'm using PHP's str_word_count() function to count the number of words in a textarea submitted via POST.
The issue is that when I post back to my file and output the word count, it differs from the count I get when I copy and paste the same text into my PHP script.
What is throwing off the number? There is a difference of 6 words; incidentally, there are also 6 double line breaks in the textarea.
How do I minimize this difference?

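You could remove the line breaks and tags altogether (passing false as the second argument makes nl2br() insert <br> rather than <br />, so the str_replace() actually matches):
str_word_count(str_replace('<br>', '', nl2br(strip_tags($data), false)));
Or, more simply, I guess this is better:
str_word_count(strip_tags(nl2br($data)));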
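A minimal sketch of a normalising counter, assuming (a guess, not confirmed by the question) that the extra words come from markup or from line breaks that were escaped into the literal sequences \r and \n somewhere before counting. Real CR/LF characters are already treated as separators by str_word_count(), so only the literal backslash sequences need handling. The function name is hypothetical.

```php
<?php
// Hypothetical helper (not a PHP built-in): strip markup, turn literal
// backslash-r / backslash-n sequences into spaces, then count.
// Caution: this also rewrites genuine text such as "C:\new".
function count_words_normalised(string $text): int
{
    $text = strip_tags($text);                     // drop <br> and other tags
    $text = str_replace(['\r', '\n'], ' ', $text); // single quotes: literal backslash sequences
    return str_word_count($text);
}

$posted  = "First paragraph.\r\n\r\nSecond paragraph.";  // real line breaks
$escaped = 'First paragraph.\r\n\r\nSecond paragraph.';  // escaped line breaks

echo str_word_count($posted), PHP_EOL;            // real CR/LF are just separators
echo str_word_count($escaped), PHP_EOL;           // stray "r"/"n" fragments get counted
echo count_words_normalised($escaped), PHP_EOL;
```

To see which case you have, print the submitted value with addcslashes($_POST['...'], "\r\n"), which shows real line breaks as \r\n.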

If your line breaks are HTML tags, you could remove them with strip_tags().
If they aren't, I suspect an encoding issue. Some combination of stripslashes(), utf8_encode() or utf8_decode() might fix the miscounted words (note that utf8_encode() and utf8_decode() are deprecated as of PHP 8.2).
As a last resort, you could use a regular expression to filter out everything except [a-zA-Z] and spaces.
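A sketch of that last-resort filter. It replaces each run of non-letter, non-space characters with a single space (rather than deleting it, which would glue words together across line breaks). The function name is hypothetical. This is lossy: apostrophes split "don't" into two words, and accented letters are treated as separators.

```php
<?php
// Hypothetical helper: keep only ASCII letters and spaces, then count.
function count_ascii_words(string $text): int
{
    // Each run of anything other than a-z, A-Z or a space becomes one space.
    $filtered = preg_replace('/[^a-zA-Z ]+/', ' ', $text);
    return str_word_count($filtered);
}

echo count_ascii_words("Hello,\r\nworld!\r\n\r\nFoo"), PHP_EOL;
```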

Related

Line break gets replaced with rn in php

So, I have a textarea where the user writes a blog post, but when the user submits it, their line breaks get replaced with 'rn' and I don't know why. I thought it was my PHP script, but even after rewriting it and removing the str_replace() calls, new lines are still replaced with 'rn'.
What is happening?
In a textarea, just as in any text editor, new lines are carriage return (\r) or line feed (\n) characters, or a combination of the two (depending on the OS). To convert these characters to HTML line breaks, use PHP's nl2br().
Check the PHP manual for reference.
Try removing any stripslashes() calls. stripslashes() removes backslashes (it does not touch forward slashes). If the line breaks have already been escaped into the literal two-character sequences \r and \n (for example by a database escaping function), stripslashes() takes away the backslashes, and that's why you see 'rn'.
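A short illustration of nl2br(): it inserts <br /> before each line break, whichever convention the break uses, and keeps the original break characters. Escaping with htmlspecialchars() first is my addition, the usual precaution when echoing user input.

```php
<?php
// Mixed line endings: Windows (\r\n), Unix (\n) and classic Mac (\r).
$fromTextarea = "first line\r\nsecond line\nthird line\rfourth line";

// <br /> goes in front of each break; the \r\n, \n and \r stay in place.
echo nl2br(htmlspecialchars($fromTextarea)), PHP_EOL;
```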
I had this very problem, and this solution helped me. Good luck!
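A sketch reproducing the 'rn' symptom described above. addcslashes() stands in for whatever escaping step ran in the real script (an assumption; the question doesn't say which function it was): it turns the real CR/LF pair into the literal characters \r\n, and stripslashes() then removes the backslashes.

```php
<?php
$posted  = "line one\r\nline two";
$escaped = addcslashes($posted, "\r\n"); // 'line one\r\nline two' (literal backslashes)
$broken  = stripslashes($escaped);       // backslashes gone: 'line onernline two'

echo $broken, PHP_EOL;
```

Leaving out the stripslashes() call on already-escaped data avoids the problem.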

replacing single break tags with two using regex

While I'm aware of the pitfalls of manipulating HTML with regex (instead of using, say, PHP's DOM extension), I'm trying to achieve something that should be fairly simple and not that risky.
Basically, I have some uncleaned HTML copy from a database that uses line break tags instead of paragraphs to produce the effect of paragraphs. Sometimes, though, the user entered only a single break, so the text moves to a new line without a blank line appearing. In such instances, and ONLY in such instances, I want to replace that single <br> with two (<br><br>).
So as an example...
This is <br>a test<br><br>example!
would become
This is <br><br>a test<br><br>example!
Note how the second set of breaks is left alone, as it already has two tags.
Simply replace every run of one or more <br> tags with two (note that this also reduces three or more consecutive tags to two) :)
Replace what:
(<br>)+
Replace with:
<br><br>
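The same replacement in PHP with preg_replace(), as a minimal sketch. It shows the side effect mentioned above: a run of three tags also becomes two.

```php
<?php
// Normalise every run of one or more <br> tags to exactly two.
$result = preg_replace('/(<br>)+/', '<br><br>', 'This is <br>a test<br><br>example!');
echo $result, PHP_EOL;

// Side effect: longer runs are shortened to two tags as well.
$collapsed = preg_replace('/(<br>)+/', '<br><br>', 'a<br><br><br>b');
```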
You can use negative lookahead and lookbehind to solve this:
(?<!<br>)<br>(?!<br>)
See the example here: http://rubular.com/r/WYjoenH1SA
(?<!NOPREFIX)
(?!NOPOSTFIX)
The first part prevents a match if NOPREFIX immediately precedes the <br>; the second prevents a match if NOPOSTFIX immediately follows it.
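A PHP sketch of the lookaround pattern with preg_replace(). The function name is hypothetical, and it assumes the tags are always written exactly as <br>; variants such as <br /> or <BR> would need a different pattern. Unlike the (<br>)+ approach, it leaves runs of three or more tags untouched.

```php
<?php
// Double only isolated <br> tags: (?<!<br>) rejects a <br> preceded by
// another <br>, and (?!<br>) rejects one followed by another.
function double_single_breaks(string $html): string
{
    return preg_replace('/(?<!<br>)<br>(?!<br>)/', '<br><br>', $html);
}

echo double_single_breaks('This is <br>a test<br><br>example!'), PHP_EOL;
// This is <br><br>a test<br><br>example!
```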

Use line-break as separator for an array input?

I've never actually used arrays before, as I've never needed to so far (a simple variable has been enough for me); however, I've now created a form with a textarea that is meant to POST multiple URLs to my PHP script.
What I want to do is use a line break in the visitor's input as the separator for an array.
For example, if the visitor enters 90 lines of text (all URLs), the input is split into a list of 90, with one array element per URL.
Any info, advice or comments would be greatly appreciated :)!
You can't be 100% sure which line breaks are used, e.g.:
Windows uses \r\n
Linux uses \n
(old) Macs used \r
However, if you know which one is used, you can simply do:
$urls = explode("\n", $_POST['urls']);
EDIT
Actually, after testing, using a regex IS faster than first doing a str_replace() and then explode().
Look at http://www.php.net/manual/en/function.preg-split.php and use a newline pattern as the delimiter,
or see PHP REGEX - text to array by preg_split at line break.
Be careful about splitting on just \r or just \n, because different operating systems define "new line" differently;
see the answer by Tgr on the SO question PHP REGEX - text to array by preg_split at line break.
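A sketch of the preg_split() approach. It splits on \r\n, \r or \n, trying \r\n first so a Windows line ending produces one split rather than two. The function name is hypothetical, and trimming each line and dropping blank ones are my additions, which suit a list of URLs.

```php
<?php
// Hypothetical helper: one array element per non-empty line.
function lines_to_array(string $text): array
{
    $lines = preg_split('/\r\n|\r|\n/', $text);
    $lines = array_map('trim', $lines);
    // array_values() re-indexes after blank lines are filtered out.
    return array_values(array_filter($lines, fn ($line) => $line !== ''));
}

// 'urls' is the textarea name used in the answer above.
$urls = lines_to_array($_POST['urls'] ?? '');
```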
Use explode
$array=explode("\n",$_POST['textarea']);
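With explode("\n", ...), input that uses \r\n line endings (as browsers typically submit textarea contents) leaves a trailing \r on every line but the last. A small sketch of one way to clean that up with rtrim():

```php
<?php
$input = "http://a.example\r\nhttp://b.example";

// explode() alone would give "http://a.example\r" as the first element;
// rtrim() strips the trailing \r (and any other trailing whitespace).
$array = array_map('rtrim', explode("\n", $input));
```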

Removing Break Lines

I've asked this question before, but I didn't seem to get the right answer. I've got a problem with new lines in text. JavaScript and jQuery don't like things like this:
alert('text
text');
When I pull information from a database table that has a line break in it, JS and jQuery can't parse it correctly. I've been told to use nl2br(), but that doesn't work when someone presses Shift+Enter or Enter while typing a message (which is where I get this problem). I still end up with separate lines when using it. It seems to correctly insert the BR tag at each line break, but it still leaves the break itself there.
Can anyone help here? I get the message data with jQuery and send it to a PHP file for storage, so I'd like to fix the problem there.
This wouldn't normally be a problem, but I want to pull all of a user's messages when they first load their inbox and then display them via jQuery when they select a certain message.
You could use a regexp to replace newlines with <br /> tags:
alert('<?php echo preg_replace("/[\n\r\f]+/", "<br />", $text); ?>');
The m modifier isn't needed here: it only changes how ^ and $ match, and this pattern uses neither.
Edit: sorry, I didn't realise you actually wanted <br /> elements, not spaces. I've updated the answer accordingly.
Edit 2: like @LainIwakura, I made a mistake in my regexp, partly due to the previous edit. My new regexp only replaces CR/LF/FF characters, not any whitespace character (\s). Note there are a bunch of Unicode line break characters that I haven't acknowledged; if you need to deal with these, you might want to read up on the regexp syntax for Unicode.
Edit: Okay, after much tripping over myself, I believe you want this (matching \r as well, since a leftover \r also breaks a JavaScript string):
$str = preg_replace('/(\r\n|\r|\n)+/', '<br />', $str);
And with that I'm going to bed...too late to be answering questions.
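A sketch that converts each line break to its own <br />, so a blank line (two breaks) stays visible as two tags instead of being collapsed into one. The function name is hypothetical, and keeping one tag per break is my choice rather than what the answers above do.

```php
<?php
// Hypothetical helper: \r\n is tried before \r and \n so a Windows
// line ending becomes one <br />, not two.
function breaks_to_br(string $text): string
{
    return preg_replace('/\r\n|\r|\n/', '<br />', $text);
}

echo breaks_to_br("Hi there,\r\n\r\nSee you soon.\nBye"), PHP_EOL;
// Hi there,<br /><br />See you soon.<br />Bye
```

This still doesn't protect against quotes inside the message, which would also end the JavaScript string early.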
I usually use json_encode() to format a string for use in JavaScript, as it does everything necessary to produce a valid JS value.
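A minimal sketch of the json_encode() approach. json_encode() returns a double-quoted string literal with double quotes, backslashes and control characters such as \r and \n escaped, so the output can be placed directly in the alert() call without adding quotes of your own. The sample text is made up.

```php
<?php
$text = "It's a message\r\nwith two lines";

// json_encode() supplies the surrounding quotes and escapes the line break.
echo '<script>alert(' . json_encode($text) . ');</script>', PHP_EOL;
```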

PHP: How to prevent unwanted line breaks

I'm using PHP to create some basic HTML. The tags are always the same, but the actual links/titles correspond to PHP variables:
$string = '<p style="..."><strong><i>'.$title[$i].'</i></strong>
<br>';
echo $string;
fwrite($outfile, $string);
The resultant html, both as echoed (when I view the page source) and in the simple txt file I'm writing to, reads as follows:
<p style="..."><a href="http://www.example.com
"><strong><i>Example Title
</i></strong></a></p>
<br>
While this works, it's not exactly what I want. It looks like PHP is adding a line break every time I interrupt the string to insert a variable. Is there a way to prevent this behavior?
Although the line breaks won't affect how your HTML page renders (unless you are using <pre> or white-space: pre), you should be able to call trim() on those variables to remove the newlines.
To find out if your variable has a newline at front or back, try this regex
var_dump(preg_match('/^\n|\n$/', $variable));
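(With single quotes, \n is passed through to the regex engine, which reads it as a newline; with double quotes, PHP turns \n into a literal newline character first, which the pattern also matches, so either works.)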
My guess is your variables are to blame. You might try cleaning them up with trim: http://us2.php.net/trim.
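A sketch of the trim() fix. The data is hypothetical: it assumes the titles were read with something like file(), which keeps each line's trailing newline unless you pass the FILE_IGNORE_NEW_LINES flag.

```php
<?php
// Hypothetical data: a title that still carries its trailing newline.
$title = ["Example Title\n"];
$i = 0;

// trim() removes leading/trailing whitespace, including "\n" and "\r".
$string = '<p style="..."><strong><i>' . trim($title[$i]) . '</i></strong></p><br>';
echo $string, PHP_EOL;
```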
The line breaks show up because of multi-byte encoding, I believe. Try:
$newstring = mb_substr($string_w_line_break,[start],[length],'UTF-8');
That worked for me when strange line breaks showed up after parsing html.
