We want to censor certain words on our site but each word has different censored output.
For example:
PHP => P*P, javascript => j*vascript
(However not always the second letter.)
So we want a simple "one star" censor system that keeps the original capitalization. The data coming from the database is uncensored, so we need the fastest approach possible.
$data="Javascript and php are awesome!";
$word[]="PHP";
$censor[]="H";//the letter we want to replace
$word[]="javascript";
$censor[]="a";//but only once (j*v*script would look weird)
//Of course, if needed, we can store the full censored word in the $censor variables
Expected value:
J*vascript and p*p are awesome!
Thanks for all the answers!
You can put your censored words in a key-based array, where each value is the position of the character to replace with * (see the $censor array example below).
$string = 'JavaSCRIPT and pHp are testing test-ground for TEST ŠĐČĆŽ ŠĐčćŽ!';
$censor = [
'php' => 2,
'javascript' => 2,
'test' => 3,
'šđčćž' => 4,
];
function stringCensorSlow($string, array $censor) {
foreach ($censor as $word => $position) {
while (($pos = mb_stripos($string, $word)) !== false) {
$string =
mb_substr($string, 0, $pos + $position - 1) .
'*' .
mb_substr($string, $pos + $position);
}
}
return $string;
}
function stringCensorFast($string, array $censor) {
$pattern = [];
foreach ($censor as $word => $position) {
$word = '~(' . mb_substr($word, 0, $position - 1) . ')' . mb_substr($word, $position - 1, 1) . '(' . mb_substr($word, $position) . ')~iu';
$pattern[$word] = '$1*$2';
}
return preg_replace(array_keys($pattern), array_values($pattern), $string);
}
Use example :
echo stringCensorSlow($string, $censor);
# J*vaSCRIPT and p*p are te*ting te*t-ground for TE*T ŠĐČ*Ž ŠĐč*Ž!
echo stringCensorFast($string, $censor) . "\n";
# J*vaSCRIPT and p*p are te*ting te*t-ground for TE*T ŠĐČ*Ž ŠĐč*Ž!
Speed test :
foreach (['stringCensorSlow', 'stringCensorFast'] as $func) {
$time = microtime(true);
for ($i = 0; $i < 10000; $i++) {
$func($string, $censor);
}
$time = microtime(true) - $time;
echo "{$func}() took $time\n";
}
output on my localhost was :
stringCensorSlow() took 1.9752140045166
stringCensorFast() took 0.11587309837341
Upgrade #1: made the functions multibyte-safe.
Upgrade #2: added a preg_replace example, which is faster than the mb_substr loop. Thanks to AbsoluteƵERØ.
Upgrade #3: added a speed-test loop and the results from my local machine.
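A possible variation on stringCensorFast(), sketched under a few assumptions: the keys of $censor are lowercase, the mbstring extension is loaded, and the name stringCensorSinglePass is made up for illustration. It builds one combined pattern, escaping each word with preg_quote(), and lets preg_replace_callback() star the configured position, so the text is scanned once rather than once per word.

```php
<?php
// Hypothetical helper: censor all listed words in a single pass.
// Assumes the keys of $censor are lowercase and positions are 1-based.
function stringCensorSinglePass(string $string, array $censor): string
{
    $alternatives = [];
    foreach (array_keys($censor) as $word) {
        // Escape regex metacharacters and the ~ delimiter in each word
        $alternatives[] = preg_quote($word, '~');
    }
    $pattern = '~' . implode('|', $alternatives) . '~iu';

    return preg_replace_callback($pattern, function (array $m) use ($censor): string {
        $match = $m[0];
        // Look up the star position by the lowercased match
        $position = $censor[mb_strtolower($match)];
        return mb_substr($match, 0, $position - 1) . '*' . mb_substr($match, $position);
    }, $string);
}

echo stringCensorSinglePass(
    'JavaSCRIPT and pHp are testing',
    ['php' => 2, 'javascript' => 2, 'test' => 3]
), "\n";
// J*vaSCRIPT and p*p are te*ting
```

If one listed word is a prefix of another, the alternative listed first wins, so list longer words first in that case. Whether this beats the per-word patterns would need measuring with the speed test above.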
Make an array of words and replacements. This should be your fastest option in terms of processing, but it takes a little more care to set up. When writing your patterns, remember to add the i modifier so that each pattern is case-insensitive. You could ultimately pull these from a database into the arrays. I've hard-coded the arrays here for the example.
<!DOCTYPE html>
<html>
<meta content="text/html; charset=UTF-8" http-equiv="content-type">
<?php
$word_to_alter = array(
'!(j)a(v)a(script)(s|ing|ed)?!i',
'!(p)h(p)!i',
'!(m)y(sql)!i',
'!(p)(yth)o(n)!i',
'!(r)u(by)!i',
'!(ВЗЛ)О(М)!iu',
);
$alteration = array(
'$1*$2*$3$4',
'$1*$2',
'$1*$2',
'$1$2*$3',
'$1*$2',
'$1*$2',
);
$string = "Welcome to the world of programming. You can learn PHP, MySQL, Python, Ruby, and Javascript all at your own pace. If you know someone who uses javascripting in their daily routine you can ask them about becoming a programmer who writes JavaScripts. взлом прохладно";
$newstring = preg_replace($word_to_alter,$alteration,$string);
echo $newstring;
?>
</html>
Output
Welcome to the world of programming. You can learn P*P, M*SQL, Pyth*n,
R*by, and J*v*script all at your own pace. If you know someone who
uses j*v*scripting in their daily routine you can ask them about
becoming a programmer who writes J*v*Scripts. взл*м прохладно
Update
It works the same with UTF-8 characters; note that you have to specify the u modifier so that the pattern is treated as UTF-8.
u (PCRE_UTF8)
This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern strings are treated as UTF-8. This
modifier is available from PHP 4.1.0 or greater on Unix and from PHP
4.2.3 on win32. UTF-8 validity of the pattern is checked since PHP 4.3.5.
Why not just use a little helper function and pass it a word and the desired censor?
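function censorWord($word, $censor) {
// strpos() returns 0 for a match at the start, so compare against false explicitly
if (strpos($word, $censor) !== false) {
return preg_replace('/' . preg_quote($censor, '/') . '/', '*', $word, 1);
}
return $word; // nothing to censor
}
echo censorWord("Javascript", "a"); // returns J*vascript
echo censorWord("PHP", "H"); // returns P*P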
Then you can check the word against your wordlist and if it is a word that should be censored, you can pass it to the function. Then, you also always have the original word as well as the censored one to play with or put back in your sentence.
This would also make it easy to change the number of letters censored by just changing the offset in the preg_replace. All you have to do is keep an array of words, explode the sentence on spaces or something, and then check in_array. If it is in the array, send it to censorWord().
Demo
And here's a more complete example doing exactly what you said in the OP.
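function censorWord($word, $censor) {
// strpos() returns 0 for a match at the start, so compare against false explicitly
if (strpos($word, $censor) !== false) {
return preg_replace('/' . preg_quote($censor, '/') . '/', '*', $word, 1);
}
return $word; // nothing to censor
}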
$word_list = ['php','javascript'];
$data = "Javascript and php are awesome!";
$words = explode(" ", $data);
// pass each word by reference so it can be modified inside our array
foreach($words as &$word) {
if(in_array(strtolower($word), $word_list)) {
// this just passes the second letter of the word
// as the $censor argument
$word = censorWord($word, $word[1]);
}
}
unset($word); // break the reference left over from the loop
echo implode(" ", $words); // returns J*vascript and p*p are awesome!
Another Demo
You could store a lowercase list of the censored words somewhere, and if you're okay with starring the second letter every time, do something like this:
if (in_array(strtolower($word), $censored_words)) {
$word = substr($word, 0, 1) . "*" . substr($word, 2);
}
If you want to change the first occurrence of a letter, you could do something like:
$censored_words = array('javascript' => 'a', 'php' => 'h', 'ruby' => 'b');
$lword = strtolower($word);
if (isset($censored_words[$lword])) {
$ind = strpos($lword, $censored_words[$lword]);
$word = substr($word, 0, $ind) . "*" . substr($word, $ind + 1);
}
This is what I would do:
Create a simple database (text file) and make a "table" of all your censored words and expected censored results. E.G.:
PHP --- P*P
javascript --- j*vascript
HTML --- HT*L
Write PHP code to compare the database entries against the text you want to censor. You will have to use explode() to create an array of individual words. Something like this:
/* Opening database of censored words */
$filename = "/files/censored_words.txt";
$file = fopen( $filename, "r" );
if( $file == false )
{
echo ( "Error in opening file" );
exit();
}
/* Creating an array of words from string*/
$data = explode(" ", $data); // What was "Javascript and PHP are awesome!" has
// become "Javascript", "and", "PHP", "are",
// "awesome!". This is useful.
If your script finds matching words, replace the word in your data with the censored word from your list. You would have to delimit the file first by \r\n and finally by ---. (Or whatever you choose for separating your table with.)
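The parsing and replacement steps described above could look roughly like this. It is a sketch only: the function names loadCensorMap and censorWithMap are made up, the table is passed in as a string (in practice it would come from file_get_contents() on your word file), and words are matched only when they are separated by single spaces with no attached punctuation.

```php
<?php
// Hypothetical helper: turn "WORD --- CENSORED" lines into a lowercase lookup map.
function loadCensorMap(string $contents): array
{
    $map = [];
    // \R matches any line break, including \r\n
    foreach (preg_split('/\R/', $contents, -1, PREG_SPLIT_NO_EMPTY) as $line) {
        $parts = explode('---', $line, 2);
        if (count($parts) === 2) {
            $map[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }
    return $map;
}

// Hypothetical helper: swap every listed word for its censored form.
function censorWithMap(string $data, array $map): string
{
    $words = explode(' ', $data);
    foreach ($words as $i => $word) {
        $key = strtolower($word);
        if (isset($map[$key])) {
            $words[$i] = $map[$key];
        }
    }
    return implode(' ', $words);
}

$table = "PHP --- P*P\r\njavascript --- j*vascript\r\nHTML --- HT*L";
echo censorWithMap("Javascript and PHP are awesome!", loadCensorMap($table)), "\n";
// j*vascript and P*P are awesome!
```

Note that the replacement takes its capitalization from the table, not from the original text, so "Javascript" comes out as "j*vascript".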
Hope this helped!
Related
If one is experienced in PHP, then one knows how to find whole words in a string and their position using a regex and preg_match() or preg_match_all. But, if you're looking instead for a lighter solution, you may be tempted to try with strpos(). The question emerges as to how one can use this function without it detecting substrings contained in other words. For example, how to detect "any" but not those characters occurring in "company"?
Consider a string like the following:
"Will *any* company do *any* job, (are there any)?"
How would one apply strpos() to detect each appearance of "any" in the string? Real life often involves more than merely space delimited words. Unfortunately, this sentence didn't appear with the non-alphabetical characters when I originally posted.
I think you could probably just replace all the non-word characters you care about with spaces (what about hyphenations, for example?) and then test for " word ":
var_dump(firstWordPosition('Will company any do any job, (are there any)?', 'any'));
var_dump(firstWordPosition('Will *any* company do *any* job, (are there any)?', 'any'));
function firstWordPosition($str, $word) {
// There are others, maybe also pass this in or array_merge() for more control.
// Escape sequences such as \n need double quotes; in single quotes they are literal backslashes.
$nonchars = ["'",'"','.',',','!','?','(',')','*','^','$','#',"\n","\r","\t"];
// You could also do a strpos() with an if and another argument passed in.
// Note that we're padding $str with spaces to match at the beginning/end.
$pos = stripos(str_replace($nonchars, ' ', " $str "), " $word ");
// The space prepended to $str and the leading space in the needle cancel out,
// so $pos is already the offset of the word in the original string (or false).
return $pos;
}
Gives 13 and 6 (offsets from 0)
https://3v4l.org/qh9Rb
<?php
$subject = "any";
$b = " ";
$delimited = "$b$subject$b";
$replace = array("?","*","(",")",",",".");
$str = "Will *any* company do *any* job, (are there any)?";
echo "\nThe string: \"$str\"";
$temp = str_replace($replace,$b,$str);
while ( ($pos = strpos($temp,$delimited)) !== false )
{
echo "\nThe subject \"$subject\" occurs at position ",($pos + 1);
for ($i=0,$max=$pos + 1 + strlen($subject); $i <= $max; $i++) {
$temp[$i] = $b;
}
}
See demo
The script defines a word boundary as a blank space. If the string has non-alphabetical characters, they are replaced with blank space and the result is stored in $temp. As the loop iterates and detects $subject, each of its characters changes into a space in order to locate the next appearance of the subject. Considering the amount of work involved one may wonder if such effort really pays off compared to using a regex with a preg_ function. That is something that one will have to decide themselves. My purpose was to show how this may be achieved using strpos() without resorting to the oft repeated conventional wisdom of SO which advocates using a regex.
There is an option if you are loath to create a replacement array of non-alphabetical characters, as follows:
<?php
function getAllWholeWordPos($s,$word){
$b = " ";
$delimited = "$b$word$b";
$retval = []; // start with an empty array; count(false) throws a TypeError in PHP 8
for ($i=0, $max = strlen( $s ); $i < $max; $i++) {
if ( !ctype_alpha( $s[$i] ) ){
$s[$i] = $b;
}
}
while ( ( $pos = stripos( $s, $delimited) ) !== false ) {
$retval[] = $pos + 1;
for ( $i=0, $max = $pos + 1 + strlen( $word ); $i <= $max; $i++) {
$s[$i] = $b;
}
}
return $retval;
}
$whole_word = "any";
$str = "Will *$whole_word* company do *$whole_word* job, (are there $whole_word)?";
echo "\nString: \"$str\"";
$result = getAllWholeWordPos( $str, $whole_word );
$times = count( $result );
echo "\n\nThe word \"$whole_word\" occurs $times times:\n";
foreach ($result as $pos) {
echo "\nPosition: ",$pos;
}
See demo
Note, this example with its update improves the code by providing a function which uses a variant of strpos(), namely stripos() which has the added benefit of being case insensitive. Despite the more labor-intensive coding, the performance is speedy; see performance.
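For comparison, the regex route that the answer deliberately avoids can be sketched in a few lines. The function name wholeWordPositions is made up for illustration; it relies on the \b word-boundary assertion and on PREG_OFFSET_CAPTURE to report byte offsets.

```php
<?php
// Hypothetical helper: byte offsets of every whole-word, case-insensitive match.
function wholeWordPositions(string $s, string $word): array
{
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
    preg_match_all($pattern, $s, $matches, PREG_OFFSET_CAPTURE);
    // Each entry of $matches[0] is [matched text, offset]; keep only the offsets
    return array_column($matches[0], 1);
}

print_r(wholeWordPositions('Will *any* company do *any* job, (are there any)?', 'any'));
// Offsets found: 6, 23 and 44; the "any" inside "company" is skipped
```

Note that \b treats digits and underscores as word characters, which differs slightly from the ctype_alpha() test used above.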
Try the following code
<!DOCTYPE html>
<html>
<body>
<?php
echo strpos("I love php, I love php too!","php");
?>
</body>
</html>
Output: 7
I need some help: how do I count the length of each word in a text file using PHP?
For example, there is a file test.txt whose content is "hello everyone, i need some help."
How do I output the text and then count the length of each word, like this:
array
hello => 5
everyone => 8
i => 1
need => 4
some => 4
help => 4
I have just started learning PHP, so please explain the code you write in detail.
Many thanks
This should work
$text = file_get_contents('text.txt'); // $text = 'hello everyone, i need some help.';
$words = str_word_count($text, 1);
$wordsLength = array_map(
function($word) { return mb_strlen($word, 'UTF-8'); },
$words
);
var_dump(array_combine($words, $wordsLength));
For more information about str_word_count and its parameters, see http://php.net/manual/en/function.str-word-count.php
Basically, everything is well described on php.net. The function array_map walks through the given array and applies the given (e.g. anonymous) function to every item in that array. The function array_combine creates an array by using one array for the keys and another for the values.
this is working
$stringFind="hello everyone, i need some help";
$file=file_get_contents("content.txt");/*put your file path */
$isPresent=strpos($file,$stringFind);
if($isPresent!==false){ // strpos() returns 0 for a match at the very start of the file
$countWord=explode(" ",$stringFind);
foreach($countWord as $val){
echo $val ." => ".strlen($val)."<br />";
}
}else{
echo "Not Found";
}
If you don't need to process the words' length later, try this:
// Get file contents
$text = file_get_contents('path/to/file.txt');
// break text to array of words
$words = str_word_count($text, 1);
// display text
echo $text, '<br><br>';
// and every word with its length
foreach ($words as $word) {
echo $word, ' => ', mb_strlen($word), '<br>';
}
Note, however, that str_word_count() has many issues with UTF-8 strings (e.g. Polish, Czech and similar characters). If you need those, I suggest filtering out commas, dots and other non-word characters and using explode() to get the $words array.
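One way to follow that suggestion is sketched below. It swaps explode() for preg_split() with a Unicode-aware pattern, so filtering and splitting happen in one step. The function name wordLengths is made up, and the sketch assumes UTF-8 input and the mbstring extension.

```php
<?php
// Hypothetical helper: map each word to its length in characters (not bytes).
function wordLengths(string $text): array
{
    // Split on every run of characters that are neither letters nor digits
    $words = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $lengths = [];
    foreach ($words as $word) {
        $lengths[$word] = mb_strlen($word, 'UTF-8');
    }
    return $lengths;
}

print_r(wordLengths('hello everyone, i need some help.'));
// hello => 5, everyone => 8, i => 1, need => 4, some => 4, help => 4
```

Because the words are used as array keys, a word that occurs twice appears only once in the result.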
I'm trying to do a search engine where I write in a textbox, for example, "Mi" and it selects and shows "Mike Ross". However it's not working with spaces. I write "Mike" and I get "Mike Ross", but when I write "Mike " I get "Mike Ross" (no bold).
The same is happening with accents.
So I write "Jo" and the result is "João Carlos". If I write "Joa", the result is "João Carlos" (without any bold part). I want to ignore the accents while writing but still display them in the results.
So this is my script after the SELECT:
while($row = $result->fetch_array()) {
$name = $row['name'];
$array = explode(' ',trim($name));
$array_length = count($array);
for ($i=0; $i<$array_length; $i++ ) {
$letters = substr($array[$i], 0, $q_length);
if (strtoupper($letters) == strtoupper($q)) {
$bold_name = '<strong>'.$letters.'</strong>';
$final_name = preg_replace('~'.$letters.'~i', $bold_name, $array[$i], 1);
$array[$i] = $final_name;
}
$array[$i] = $array[$i]." ";
}
foreach ($array as $t_name) {
echo $t_name;
}
}
Thank you for your help!
if (strtoupper($letters) == strtoupper($q))
This will never evaluate to true when $q contains a space, because explode(' ', trim($name)) strips the spaces out of the pieces you compare against, which makes any $q containing a space unmatchable against $letters.
Here's a quick example that does what I think you're looking for
<?php
$q = "Mike "; // User query
$name = "Mike Ross"; // Database row value
if(stripos($name, $q) !== false) // Case-insensitive match
{
// Case-insensitive replace of match with match enclosed in strong tag
$result = preg_replace('/(' . preg_quote($q, '/') . ')/i', '<strong>$1</strong>', $name);
print_r($result);
}
// Result is
// <strong>Mike </strong>Ross
From what I can tell (a quick google for "replace accented characters PHP"), you're kind of out of luck with that one. This question provides a quick solution using strtr, and this tip uses a similar method with str_replace.
Unfortunately, these rely on predefined character sets, so incoming accents you haven't prepared for will fail. You may be better off relying on users to enter the special characters when they search, or create a new column with a "searchable" name with the accented characters replaced as best as you can, and return the real name as the "matched" display field.
One more Note
I found another solution that can do most of what you want, except the returned name will not have the accent. It will, however, match the accented value in the DB with a non-accented search. Modified code is:
<?php
$q = "Joa";
$name = "João Carlos";
$searchable_name = replace_accents($name);
if(stripos($searchable_name, $q) !== false)
{
$result = preg_replace('/(' . preg_quote($q, '/') . ')/i', '<strong>$1</strong>', $searchable_name);
print_r($result);
}
function replace_accents($str) {
$str = htmlentities($str, ENT_COMPAT, "UTF-8");
$str = preg_replace('/&([a-zA-Z])(uml|acute|grave|circ|tilde);/','$1',$str);
return html_entity_decode($str);
}
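If the accent should survive in the displayed name, one option is to search in an accent-folded copy but cut the highlighted piece out of the original string. The sketch below assumes UTF-8 input and the mbstring extension; ACCENT_MAP, foldAccents and highlightMatch are made-up names, and the map is deliberately tiny and would need extending for real data.

```php
<?php
// Hypothetical map: each entry maps one character to one character, so character
// offsets stay aligned between the folded copy and the original string.
const ACCENT_MAP = [
    'á' => 'a', 'à' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'a',
    'é' => 'e', 'ê' => 'e', 'í' => 'i', 'ó' => 'o', 'ô' => 'o',
    'õ' => 'o', 'ú' => 'u', 'ü' => 'u', 'ç' => 'c',
];

function foldAccents(string $s): string
{
    // Lowercase first so that the search is case-insensitive as well
    return strtr(mb_strtolower($s), ACCENT_MAP);
}

function highlightMatch(string $name, string $q): string
{
    if ($q === '') {
        return htmlspecialchars($name);
    }
    $pos = mb_strpos(foldAccents($name), foldAccents($q));
    if ($pos === false) {
        return htmlspecialchars($name);
    }
    $len = mb_strlen($q);
    // Slice the ORIGINAL name so the accented characters are displayed
    return htmlspecialchars(mb_substr($name, 0, $pos))
        . '<strong>' . htmlspecialchars(mb_substr($name, $pos, $len)) . '</strong>'
        . htmlspecialchars(mb_substr($name, $pos + $len));
}

echo highlightMatch('João Carlos', 'Joa'), "\n"; // <strong>Joã</strong>o Carlos
echo highlightMatch('Mike Ross', 'Mike '), "\n"; // <strong>Mike </strong>Ross
```

Unlike the original script, this matches anywhere in the name rather than only at the start of a word.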
this is what I try to get:
My longest text to test
When I search for e.g. My, I should get My longest.
I tried it with this function to get first the complete length of the input and then I search for the ' ' to cut it.
$length = strripos($text, $input) + strlen($input)+2;
$stringpos = strripos($text, ' ', $length);
$newstring = substr($text, 0, strpos($text, ' ', $length));
But this only works the first time; after that it cuts right after the current input, meaning that
My lon gives My longest and not My longest text.
How must I change this to get the right result, so that I always get the next word as well? Maybe I need a break, but I cannot find the right solution.
UPDATE
Here is my workaround until I find a better solution. As I said, array functions do not work here, since partial words should match too. So I extended my previous idea a bit. The basic idea is to distinguish between the first space after the input and the next one. I improved the code a bit.
function get_title($input, $text) {
$length = strripos($text, $input) + strlen($input);
$stringpos = stripos($text, ' ', $length);
// Find next ' '
$stringpos2 = ($stringpos !== false) ? stripos($text, ' ', $stringpos + 1) : false;
if ($stringpos === false || $stringpos2 === false) {
// No complete next word follows, so keep the whole text
$newstring = $text;
} else {
$newstring = substr($text, 0, $stringpos2);
}
return $newstring;
}
Not pretty, but hey it seems to work ^^. Anyway maybe someone of you have a better solution.
You can try using explode
$string = explode(" ", "My longest text to test");
$key = array_search("My", $string);
echo $string[$key] , " " , $string[$key + 1] ;
You can take it to the next level with a case-insensitive preg_match_all
$string = "My longest text to test in my school that is very close to mY village" ;
var_dump(__search("My",$string));
Output
array
0 => string 'My longest' (length=10)
1 => string 'my school' (length=9)
2 => string 'mY village' (length=10)
Function used
function __search($search,$string)
{
$result = array();
preg_match_all('/' . preg_quote($search, '/') . '\s+\w+/i', $string, $result);
return $result[0];
}
There are simpler ways to do that. String functions are useful if you don't want to look for something specific, but cut out a pre-defined length of something. Else use a regular expression:
preg_match('/My\s+\w+/', $string, $result);
print $result[0];
Here the My looks for the literal first word. And \s+ for some spaces. While \w+ matches word characters.
This adds some new syntax to learn. But less brittle than workarounds and lengthier string function code to accomplish the same.
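The same regex idea can be stretched to cover partial input such as "My lon". The sketch below uses a made-up function name, titleThroughNextWord, and makes two assumptions: it matches the first occurrence of the input, not the last, and it returns the text from the beginning. It completes the word the input ends in and then adds one more word if there is one.

```php
<?php
// Hypothetical helper: text from the start through the word after the typed input.
function titleThroughNextWord(string $input, string $text): ?string
{
    // ^.*?         everything up to the first occurrence of the input
    // \S*          the rest of the word the input ends in
    // (?:\s+\S+)?  one following word, if any
    $pattern = '/^.*?' . preg_quote($input, '/') . '\S*(?:\s+\S+)?/isu';
    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

$text = 'My longest text to test';
echo titleThroughNextWord('My', $text), "\n";     // My longest
echo titleThroughNextWord('My lon', $text), "\n"; // My longest text
echo titleThroughNextWord('test', $text), "\n";   // My longest text to test
```

When the input does not occur in the text at all, the function returns null.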
An easy method would be to split it on whitespace and grab the current array index plus the next one:
// Word to search for:
$findme = "text";
// Using preg_split() to split on any amount of whitespace
$words = preg_split('/\s+/', "My longest text to test");
// Find the word in the array with array_search()
// calling strtolower() with array_map() to search case-insensitively
$idx = array_search(strtolower($findme), array_map('strtolower', $words));
if ($idx !== FALSE) {
// If found, print the word and the following word from the array
// as long as the following one exists.
echo $words[$idx];
if (isset($words[$idx + 1])) {
echo " " . $words[$idx + 1];
}
}
// Prints:
// "text to"
I want to CaPiTaLiZe $string in php, don't ask why :D
I made some research and found good answers here, they really helped me.
But in my case I want to capitalize every odd character (1, 3, 5, ...) in EVERY word.
For example, with my custom function I'm getting "TeSt eXaMpLe", but I want to get "TeSt ExAmPlE".
See that in second example word "example" starts with capital "E"?
So, can anyone help me? : )
Well I would just make it an array and then put it back together again.
<?php
$str = "test example";
$str_implode = str_split($str);
$caps = true;
foreach($str_implode as $key=>$letter){
if($caps){
$out = strtoupper($letter);
if($out <> " ") //not a space character
$caps = false;
}
else{
$out = strtolower($letter);
$caps = true;
}
$str_implode[$key] = $out;
}
$str = implode('',$str_implode);
echo $str;
?>
Demo: http://codepad.org/j8uXM97o
I would use regex to do this, since it is concise and easy to do:
$str = 'I made some research and found good answers here, they really helped me.';
$str = preg_replace_callback('/(\w)(.?)/', 'altcase', $str);
echo $str;
function altcase($m){
return strtoupper($m[1]).$m[2];
}
Outputs: "I MaDe SoMe ReSeArCh AnD FoUnD GoOd AnSwErS HeRe, ThEy ReAlLy HeLpEd Me."
Example
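Here's a one-liner that should work. It uses preg_replace_callback() because the /e modifier was removed in PHP 7.0.
preg_replace_callback('/(\w)(.)?/', function ($m) { return strtoupper($m[1]) . strtolower($m[2] ?? ''); }, 'test example');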
http://codepad.org/9LC3SzjC
Try:
function capitalize($string){
$return= "";
foreach(explode(" ",$string) as $w){
foreach(str_split($w) as $k=>$v) {
if(($k+1)%2!=0 && ctype_alpha($v)){
$return .= mb_strtoupper($v);
}else{
$return .= $v;
}
}
$return .= " ";
}
return substr($return, 0, -1); // drop the trailing space added after the last word
}
echo capitalize("I want to CaPiTaLiZe string in php, don't ask why :D");
//I WaNt To CaPiTaLiZe StRiNg In PhP, DoN'T AsK WhY :D
Edited: Fixed the lack of special characters in the output.
This task can be performed without using capture groups -- just use ucfirst().
This is not built to process multibyte characters.
Grab a word character then, optionally, the next character. From the fullstring match, only change the case of the first character.
Code: (Demo) (or Demo)
$strings = [
"test string",
"lado lomidze needs a solution",
"I made some research and found 'good' answers here; they really helped me."
]; // if not already all lowercase, use strtolower()
var_export(preg_replace_callback('/\w.?/', function ($m) { return ucfirst($m[0]); }, $strings));
Output:
array (
0 => 'TeSt StRiNg',
1 => 'LaDo LoMiDzE NeEdS A SoLuTiOn',
2 => 'I MaDe SoMe ReSeArCh AnD FoUnD \'GoOd\' AnSwErS HeRe; ThEy ReAlLy HeLpEd Me.',
)
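Since the snippet above is not built for multibyte characters, here is a hedged sketch of a multibyte-aware variant. It assumes UTF-8 input and the mbstring extension, and the function name altCaseWords is made up. It uses \p{L} in place of \w, so digits and underscores no longer start a pair, and it lowercases the input first.

```php
<?php
// Hypothetical helper: uppercase the 1st, 3rd, 5th... letter of every word.
function altCaseWords(string $text): string
{
    // Match a letter plus, optionally, the character after it; uppercase only the first
    return preg_replace_callback('/\p{L}.?/u', function (array $m): string {
        return mb_strtoupper(mb_substr($m[0], 0, 1)) . mb_substr($m[0], 1);
    }, mb_strtolower($text));
}

echo altCaseWords('test example'), "\n"; // TeSt ExAmPlE
echo altCaseWords('šđčćž test'), "\n";   // ŠđČćŽ TeSt
```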
For other researchers, if you (more simply) just want to convert every other character to uppercase, you could use /..?/ in your pattern, but using regex for this case would be overkill. You could more efficiently use a for() loop and double-incrementation.
Code (Demo)
$string = "test string";
for ($i = 0, $len = strlen($string); $i < $len; $i += 2) {
$string[$i] = strtoupper($string[$i]);
}
echo $string;
// TeSt sTrInG
// ^-^-^-^-^-^-- strtoupper() was called here