I'm counting words in an article and removing common words such as "and" or "the".
I'm removing them with preg_replace.
After that, I clean up extra whitespace with:
$search_body = preg_replace('/\s+/',' ',$search_body);
However, some very stubborn whitespace will not go away. I've tried:
if($word == "" OR $word == " "){
//chop its head off
}
But the if statement does not treat $word as whitespace. I've also tried printing it to the screen to see its raw value, and it still just shows up blank.
Here is the full set of regex patterns I'm using:
$pattern = array(
'/\"\;/',
'/[0-9]/',
'/\,/',
'/\./',
'/\!/',
'/\#/',
'/\$/',
'/\%/',
'/\^/',
'/\&/',
'/\*/',
'/\(/',
'/\)/',
'/\_/',
'/\"/',
'/\'/',
'/\:/',
'/\;/',
'/\?/',
'/\`/',
'/\~/',
'/\[/',
'/\]/',
'/\{/',
'/\}/',
'/\|/',
'/\+/',
'/\=/',
'/\-/',
'/–/',
'/°/',
'/\bthe\b/',
'/\band\b/',
'/\bthat\b/',
'/\bhave\b/',
'/\bfor\b/',
'/\bnot\b/',
'/\bwith\b/',
'/\byou\b/',
'/\bthis\b/',
'/\bbut\b/',
'/\bhis\b/',
'/\bfrom\b/',
'/\bthey\b/',
'/\bsay\b/',
'/\bher\b/',
'/\bshe\b/',
'/\bwill\b/',
'/\bone\b/',
'/\ball\b/',
'/\bwould\b/',
'/\bthere\b/',
'/\btheir\b/',
'/\bwhat\b/',
'/\bout\b/',
'/\babout\b/',
'/\bwho\b/',
'/\bget\b/',
'/\bwhich\b/',
'/\bwhen\b/',
'/\bmake\b/',
'/\bcan\b/',
'/\blike\b/',
'/\btime\b/',
'/\bjust\b/',
'/\bhim\b/',
'/\bknow\b/',
'/\btake\b/',
'/\bpeople\b/',
'/\binto\b/',
'/\byear\b/',
'/\byour\b/',
'/\bgood\b/',
'/\bsome\b/',
'/\bcould\b/',
'/\bthem\b/',
'/\bsee\b/',
'/\bother\b/',
'/\bthan\b/',
'/\bthen\b/',
'/\bnow\b/',
'/\blook\b/',
'/\bonly\b/',
'/\bcome\b/',
'/\bits\b/', //it's?
'/\bover\b/',
'/\bthink\b/',
'/\balso\b/',
'/\bback\b/',
'/\bafter\b/',
'/\buse\b/',
'/\btwo\b/',
'/\bhow\b/',
'/\bour\b/',
'/\bwork\b/',
'/\bfirst\b/',
'/\bwell\b/',
'/\bway\b/',
'/\beven\b/',
'/\bnew\b/',
'/\bwant\b/',
'/\bbecause\b/',
'/\bany\b/',
'/\bthese\b/',
'/\bgive\b/',
'/\bday\b/',
'/\bmost\b/',
'/\bare\b/',
'/\bwas\b/',
'/\<\w+\>/', '/\<\/\w+\>/',
'/\b\w{1}\b/', //1 letter word
'/\b\w{2}\b/', //2 letter word
'/\//',
'/\</',
'/\>/'
);
$search_body = strip_tags($body);
$search_body = strtolower($search_body);
$search_body = preg_replace($pattern, ' ', $search_body);
$search_body = preg_replace('/\s+/',' ',$search_body);
$search_body = explode(" ", $search_body);
After the explode, blank values show up all over the place.
The example text I'm using is too long to post here, but I copied and pasted
this article to test it, and it showed 32 blank entries, not counting the whitespace before or after other words, even after using trim().
Here's a js.fiddle of the raw data that is being handled by PHP.
htmlentities() and htmlspecialchars() also show nothing.
Here's the code that counts the values and merges duplicates into a single entry:
$inhere = array();
$body_hold = array();
foreach($search_body as $value){
$value = trim($value);
if(in_array($value, $inhere) && $value != ""){
$key = array_search($value, $inhere);
$body_hold[$key]['count'] = $body_hold[$key]['count']+1;
}elseif($value != ""){
$inhere[] = $value;
$body_hold[] = array(
'count' => 1,
'word' => $value
);
}
}
rsort($body_hold);
A basic foreach to see the values:
foreach($body_hold as $value){
$count = $value['count'];
$word = trim($value['word']);
echo "Count: ".$count;
echo " Word: ".$word;
echo '<br>';
}
Here's a PHP example of what it returns.
Are you sure you put the exact same data you're processing in the js.fiddle? Or did you get it from a subsequent post-processed step?
It's obviously a Wikipedia article. I opened that article on Wikipedia in Edit mode and saw that there are &nbsp;s in the raw wikitext. However, those &nbsp;s don't appear in your js.fiddle data.
TL;DR: Check for &nbsp; in your processing (and convert them to spaces, etc.).
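Character 160 (the non-breaking space, U+00A0) looks like a space but isn't one, so \s doesn't match it. Replace every occurrence with a regular space (32), then collapse the repeated spaces, and the blank entries go away. In UTF-8 text the non-breaking space is the two-byte sequence "\xC2\xA0"; chr(160) on its own only matches it in single-byte encodings such as ISO-8859-1.
// UTF-8 input: the non-breaking space is the two bytes "\xC2\xA0"
// (use chr(160) instead only if the text is single-byte ISO-8859-1)
$search_body = str_replace("\xC2\xA0", ' ', $search_body);
$search_body = trim(preg_replace('/\s+/', ' ', $search_body));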
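A variation, sketched under the assumption that the article text is valid UTF-8: with the /u modifier PCRE works on code points, so a single preg_split() can treat U+00A0 as a separator without touching the \xA0 byte inside other characters such as "à" (\xC3\xA0). The counting then uses array_count_values() instead of the in_array()/array_search() loop. wordCounts() is a hypothetical helper name, and it does not apply the stop-word patterns from the question.

```php
<?php
// Sketch, assuming valid UTF-8 input. wordCounts() is a hypothetical helper.
function wordCounts(string $text): array
{
    // With /u, \h matches U+00A0 (and other horizontal spaces) as a code point,
    // so multi-byte characters whose last byte is \xA0 are left intact.
    // PREG_SPLIT_NO_EMPTY drops the blank entries that leading/trailing spaces leave.
    $words = preg_split('/[\s\h]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false) {
        return []; // preg_split() fails on invalid UTF-8 when /u is used
    }
    $counts = array_count_values($words); // word => number of occurrences
    arsort($counts);                      // most frequent words first
    return $counts;
}

print_r(wordCounts("the cat\xC2\xA0sat on the cat"));
```

Splitting on whitespace directly also removes the need for the separate trim()/!= "" checks in the counting loop.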
I'm showing a text excerpt that starts with a searched word, followed by 35 more characters.
Is there a way to show this excerpt (searched word + 35 chars) without cutting off the last word? substr doesn't do that.
$search = $url[1];
$read = $pdo->prepare("SELECT * FROM pages WHERE title LIKE ? OR content LIKE ? LIMIT ?,?");
$read->bindValue(1, "%$search%", PDO::PARAM_STR);
$read->bindValue(2, "%$search%", PDO::PARAM_STR);
$read->bindParam(3, $begin, PDO::PARAM_INT);
$read->bindParam(4, $max, PDO::PARAM_INT);
$read->execute();
while ($result = $read->fetch(PDO::FETCH_ASSOC)) {
    $searchPos = stripos($result['content'], $search);
    $searchLen = strlen($search);
    // substr() cuts at a fixed length, so the last word may be split
    $result_text = '"' . substr($result['content'], $searchPos, $searchLen + 35) . '..."';
    echo '<p>' . strip_tags($result_text) . '</p>';
}
I'm guessing you are looking for something like the following:
<?php
#Do you know some way to show this text excerpt (searched word + 35chars) without cut the last word?
$search = 'Do';
$str = explode(' ', 'you know some way to show this text excerpt (searched word + 35chars) without cut the last word?');
$len = strlen($search) + 1; # use 0 instead if the search term should not count toward the 35 characters
$ret = $search . ' ';
foreach($str as $st){
if(strlen($st) + $len < 35){
$len += strlen($st) + 1;
$ret .= $st . ' ';
}else{
break;
}
}
$ret = trim($ret);
var_dump($ret);
which gives string(33) "Do you know some way to show this"
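If the goal is "search term plus at most 35 more characters, never splitting a word", both steps can be wrapped in one helper. This is a sketch, not the answer's code: excerpt() is a hypothetical name, it works on bytes (strlen/substr), so multi-byte UTF-8 text would need the mb_* equivalents, and a first word longer than the limit is hard-cut.

```php
<?php
// Hypothetical helper: text from the first (case-insensitive) occurrence of
// $search, at most strlen($search) + $extra bytes long, cut back to the last
// space so no word is split. Returns null when $search is not found.
function excerpt(string $text, string $search, int $extra = 35): ?string
{
    $pos = stripos($text, $search);
    if ($pos === false) {
        return null;
    }
    $sub = substr($text, $pos);
    $limit = strlen($search) + $extra;
    if (strlen($sub) <= $limit) {
        return $sub; // already short enough: keep everything
    }
    // Look at one extra byte so a word ending exactly at $limit is kept.
    $cut = strrpos(substr($sub, 0, $limit + 1), ' ');
    if ($cut === false || $cut === 0) {
        return substr($sub, 0, $limit); // no space to cut at: hard cut
    }
    return rtrim(substr($sub, 0, $cut));
}

echo excerpt('This text is longer and will be definitely cut but last word survives...', 'This');
```

Unlike the loop above, the limit here is the search term length plus $extra, which matches the question's "searched word + 35 chars".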
Please use something like this:
//$orgText = "This text is exactly the same length...";
//$orgText = "This text is shorter...";
//$orgText = "This_text_has_no_spaces_and_is_shorter";
//$orgText = "This_text_has_no_spaces_and_is_the_same";
//$orgText = "This_text_has_no_spaces_and_is_really_longer";
$orgText = "This text is longer and will be definitely cut but last word survives...";
$searchedWord = "This";
$charsToShow = 35;
$desiredExcerptLength = strlen($searchedWord) + $charsToShow;
$excerpt = null; // stays NULL when the searched word is not found
$searchedWordPos = strpos($orgText, $searchedWord);
if ($searchedWordPos !== false) { // Word found
$subText = substr($orgText, $searchedWordPos); // Subtext: begins with searched word and ends at org text end
$subTextLength = strlen($subText);
if ($subTextLength > $desiredExcerptLength) { // Subtext longer than desired excerpt => cut it but keep last word
$spaceAfterLastWordPos = strpos($subText, ' ', $desiredExcerptLength);
$excerpt = substr($subText, 0, $spaceAfterLastWordPos ? $spaceAfterLastWordPos : $subTextLength);
}
else { // Subtext equal or shorter than desired excerpt => keep all text
$excerpt = $subText;
}
}
var_dump($excerpt);
It's a clear way to do it.
I hope that's the behavior you meant.
You can check it at: http://writecodeonline.com/php
There are several "kinds" of text you can pass into it:
Text where the searched word isn't present => return NULL:
input: NULL, "", "Something without searched word" => result: NULL
Text with spaces, longer than the desired excerpt length (searched word length + e.g. 35) => return the original text cut down, but keep the whole last word:
"This text is longer and will be definitely cut but last word survives..." => "This text is longer and will be definitely"
Text with spaces, equal to the desired excerpt length => return the original text:
"This text is exactly the same length..." => "This text is exactly the same length..."
Text with spaces, shorter than the desired excerpt length => return the original text:
"This text is shorter..." => "This text is shorter..."
Text without spaces, longer than the desired excerpt length => return the original text:
"This_text_has_no_spaces_and_is_really_longer" => "This_text_has_no_spaces_and_is_really_longer"
Text without spaces, equal to the desired excerpt length => return the original text:
"This_text_has_no_spaces_and_is_the_same" => "This_text_has_no_spaces_and_is_the_same"
Text without spaces, shorter than the desired excerpt length => return the original text:
"This_text_has_no_spaces_and_is_shorter" => "This_text_has_no_spaces_and_is_shorter"
I'm using the code below to highlight one word from the file_get_contents() output and jump to it with an anchor.
$file='
IAR6=1002
SHF6=1
REF6=0002
TY7=2
DATE7=20130820182357
STAT_N7=1002
SEQ7=0002110000001
STA7=000005
TY8=2
DATE8=20130820182429
STAT_N8=1002
SH8=1
OP8=S123
SEQ8=0002120000081
';
$Seq = '0002110000001'; // a string: an integer literal with a leading 0 is parsed as octal
$text = preg_replace("/\b($Seq)\b/i", '<span class="highlight"><a name="here">\1</a></span>', $file);
For now this highlights only 0002110000001.
I would like to highlight every line with the same index number.
For example:
looking for 0002110000001,
highlight only this part of the text, where the index is 7:
TY7=2
DATE7=20130820182357
STAT_N7=1002
SEQ7=0002110000001
STA7=000005
Any help will be appreciated.
EDIT:
I'll try to be more specific.
The file contains many blocks, each starting with TYx (x is an auto-incremented number).
I have the SEQ number to search for, in this example 0002110000001.
The preg_replace() call above finds 0002110000001 and highlights it.
What I need is to highlight everything between TY7 and TY8 instead of only 0002110000001.
I hope this is clear enough despite my bad English.
Thanks
You can use stripos() and explode() in PHP:
<?php
$file='
IAR6=1002
SHF6=1
REF6=0002
TY7=2
DATE7=20130820182357
STAT_N7=1002
SEQ7=0002110000001
STA7=000005
TY8=2
DATE8=20130820182429
STAT_N8=1002
SH8=1
OP8=S123
SEQ8=0002120000081
';
//$Seq = "0002110000001";
$Seq = "7";
$new_arr=explode(PHP_EOL,$file);
foreach($new_arr as $k=>$v)
{
if(stripos($v,$Seq)!==false)
{
echo "$v\n";
}
}
OUTPUT :
TY7=2
DATE7=20130820182357
STAT_N7=1002
SEQ7=0002110000001
STA7=000005
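The loop above needs the index (7) to be known already, and stripos() matches a "7" anywhere in a line, including inside values. A sketch that follows the EDIT more literally, assuming every block starts with a TYn= line and the searched value sits on a SEQn= line: look up n from the SEQ value, then wrap everything from TYn= up to the next TY line in one highlighted span with the anchor. highlightBlock() is a hypothetical name, and $seq must be passed as a string so its leading zeros survive.

```php
<?php
// Sketch: wrap the block from "TYn=" up to the next "TYm=" line, where n is the
// index of the "SEQn=<seq>" line. highlightBlock() is a hypothetical helper.
function highlightBlock(string $file, string $seq): string
{
    // Find n in "SEQn=<seq>", where <seq> is not followed by another digit.
    if (!preg_match('/^SEQ(\d+)=' . preg_quote($seq, '/') . '(?!\d)/m', $file, $m)) {
        return $file; // value not found: leave the text unchanged
    }
    $index = $m[1];

    // From "TYn=" lazily up to the next line starting with "TY<digits>=" or the end.
    $pattern = '/^TY' . $index . '=.*?(?=^TY\d+=|\z)/ms';

    return preg_replace(
        $pattern,
        '<span class="highlight"><a name="here"></a>$0</span>',
        $file,
        1 // only the first matching block
    );
}

$file = "IAR6=1002\nSHF6=1\nREF6=0002\n"
      . "TY7=2\nDATE7=20130820182357\nSTAT_N7=1002\nSEQ7=0002110000001\nSTA7=000005\n"
      . "TY8=2\nDATE8=20130820182429\nSTAT_N8=1002\nSH8=1\nOP8=S123\nSEQ8=0002120000081\n";

echo highlightBlock($file, '0002110000001');
```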
This question already has answers here:
Explode a paragraph into sentences in PHP
(8 answers)
Closed 9 years ago.
I would like to wrap every sentence that ends with "." in a span tag.
For example,
my string:
"I could not have said, ’Betrayest thou the Son of man with a kiss?’ unless I believed in betrayal. The whole message of the crucifixion was simply that I did not."
Desired output:
"<span id="s1">I could not have said, ’Betrayest thou the Son of man with a kiss?’ unless I believed in betrayal.</span> <span id="s2">The whole message of the crucifixion was simply that I did not.</span>"
How is this possible with PHP?
You could do this:
<?php
$string="I could not have said, ’Betrayest thou the Son of man with a kiss?’ unless I believed in betrayal. The whole message of the crucifixion was simply that I did not.";
$output=str_replace(". ",'. </span> <span id="s2">',$string);
echo '<span id="s1">'.$output.'</span>';
?>
Edit Based on Comments
This version will make sure every new replacement gets a new span id
<?php
$string="I could not have said, ’Betrayest thou the Son of man with a kiss?’ unless I believed in betrayal. The whole message of the crucifixion was simply that I did not. Testing 123. testing again.";
$dots=substr_count($string,". ");
for ($i = 2; $i <= $dots + 1; $i++) // one replacement per ". "
{
$string = preg_replace("/\. /", ".</span> <span id=\"s$i\">", $string, 1);
}
echo '<span id="s1">'.$string.'</span>';
?>
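If you explode the text on ". " (period plus space), you can treat a part as a new sentence only when it starts with an uppercase letter:
$sentences = array();
foreach (explode(". ", $theText) as $part) {
    // A part that doesn't start with an uppercase letter (e.g. after an
    // abbreviation) is glued back onto the previous sentence.
    if ($sentences && ($part === "" || !ctype_upper($part[0]))) {
        $sentences[count($sentences) - 1] .= ". " . $part;
    } else {
        $sentences[] = $part;
    }
}
$newText = "";
$last = count($sentences) - 1;
foreach ($sentences as $i => $sentence) {
    // explode() removed ". ", so put the period back on all but the last sentence
    $newText .= "<span id=\"s" . ($i + 1) . "\">" . ($i < $last ? "$sentence." : $sentence) . "</span> ";
}
$newText = rtrim($newText);
This should also help with abbreviations, as long as the word after the abbreviation's period starts with a lowercase letter.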
Try this code;
it gives each span tag a unique id:
<?php
$str = "I could not have said, 'Betrayest thou the Son of man with a kiss?' unless I believed in betrayal. The whole message of the crucifixion was simply that I did not.";
$exp_str = explode(". ",$str);
$sentence = "<span id=\"s1\">";
$n = 2;
foreach($exp_str as $val) {
$sentence .= $val.".</span> <span id=\"s$n\">";
$n++;
}
$n = $n-1;
$sentence = substr($sentence, 0, strpos($sentence, ".</span> <span id=\"s$n\">")) . "</span>"; // drop the dangling opener and close the last span
echo $sentence;
?>
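I'm not sure plain preg_replace can do this, because each span needs a unique ID. Otherwise it can be done manually like this:
$newText = "";
$count = 0;
foreach (explode(".", $theText) as $part) {
    $part = trim($part);
    if ($part === "") {
        continue; // skip the empty piece after the final period
    }
    $count++;
    // explode() drops the delimiter, so add the period back
    $newText .= "<span id=\"s$count\">$part.</span> ";
}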
But what about periods mid-sentence, as in abbreviations? You could probably replace explode() with preg_split() to get a better result, for instance by only splitting when the character right after the period is not a word character, a comma, or another period:
preg_split("/\.[^.,\w]/", $theText);
Or just require the next character to be whitespace:
preg_split("/\.\s/",$theText);
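Unique IDs are in fact possible in one pass with preg_replace_callback(), which calls a function for every match, so a counter can be incremented per sentence. A minimal sketch, assuming a sentence is any run of text ending in a period that is followed by whitespace or the end of the string (so an abbreviation like "e.g. x" would still be split), and that the input is UTF-8 because of the /u modifier. wrapSentences() is a hypothetical name.

```php
<?php
// Sketch: number each sentence with preg_replace_callback().
// wrapSentences() is a hypothetical helper.
function wrapSentences(string $text): string
{
    $n = 0;
    return preg_replace_callback(
        // From the first non-space character, lazily up to a period that is
        // followed by whitespace or the end of the string.
        '/\S.*?\.(?=\s|$)/su',
        function (array $m) use (&$n): string {
            $n++; // shared counter, so every span gets its own id
            return '<span id="s' . $n . '">' . $m[0] . '</span>';
        },
        $text
    );
}

echo wrapSentences("I could not have said, ’Betrayest thou the Son of man with a kiss?’ unless I believed in betrayal. The whole message of the crucifixion was simply that I did not.");
```

The "?" inside the quotation is not followed by whitespace, and only periods end a sentence here, so the question's example still yields exactly two spans; trailing text without a final period is left unwrapped.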
Probably a simple problem here, but I cannot find it.
I am exploding a string that was input and stored from a textarea. I use nl2br() so that I can explode the string by the <br /> tag.
The string explodes properly, but when I try to get the first character of each line in a while loop, it only works on the first line.
Note: The concept here is greentexting, so if you are familiar with that then you will see what I am trying to do. If you are not, I put a brief description below the code sample.
Code:
while($row = mysqli_fetch_array($r, MYSQLI_ASSOC)) {
$comment = nl2br($row['comment']);
$sepcomment = explode("<br />", $comment);
$countcomment = count($sepcomment);
$i = 0;
//BEGIN GREENTEXT COLORING LOOP
while($i < $countcomment) {
$fb = $sepcomment[$i];
$z = $fb[0]; // Check to see if first character is >
if ($z == ">") {
$tcolor = "#789922";
}
else {
$tcolor = "#000000";
}
echo '<font color="' . $tcolor . '">' . $sepcomment[$i] . '</font><br>';
$i++;
}
//END GREENTEXT COLORING LOOP
}
Greentext: If the first character of the line is '>' then the color of that entire line becomes green. If not, then the color is black.
What I have tried:
strip_tags() - Thinking that possibly the tags were acting as the first characters.
$fb = preg_replace("/(<br\s*\/?>\s*)+/", "", $sepcomment[$i]);
str_replace()
echo $z //Shows the correct character on first line, blank on following lines.
$z = substr($fb, 0, 1);
Here is a test I just did where I returned the first 5 characters of the string.
Any ideas for getting rid of those empty characters?
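Try the trim() function:
$fb = trim($sepcomment[$i]);
http://php.net/manual/en/function.trim.php
(The line breaks are probably the problem: nl2br() inserts <br /> before each newline, so after exploding on "<br />", every line except the first still starts with the \n or \r\n that followed the tag.)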
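Another option, sketched here under assumptions, is to skip nl2br() entirely and split the raw textarea value on line breaks with preg_split('/\R/', ...), so no \r or \n is left at the start of any line. greentext() is a hypothetical helper; it swaps the deprecated <font> tag for a styled <span> and escapes each line with htmlspecialchars(), which the original loop did not do.

```php
<?php
// Sketch: colour lines starting with ">" green. greentext() is a hypothetical helper.
function greentext(string $comment): string
{
    $html = '';
    // \R matches any line break sequence (\r\n, \n or \r), so no stray
    // newline characters remain at the start of a line.
    foreach (preg_split('/\R/', $comment) as $line) {
        // substr() is safe on empty lines, unlike $line[0]
        $color = (substr($line, 0, 1) === '>') ? '#789922' : '#000000';
        // Escape user input so a leading ">" or any "<" is shown literally.
        $html .= '<span style="color:' . $color . '">' . htmlspecialchars($line) . '</span><br>';
    }
    return $html;
}

echo greentext(">be me\r\nwrite PHP");
```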