I need to check a string to see if any word in it has multiple occurrences. So basically I will accept:
"google makes love"
but I don't accept:
"google makes google love" or "google makes love love google" etc.
Any ideas? Really don't know any way to approach this, any help would be greatly appreciated.
Based on Wicked Flea code:
function single_use_of_words($str) {
$words = explode(' ', trim($str)); //Trim to prevent any extra blank
if (count(array_unique($words)) == count($words)) {
return true; //Same amount of words
}
return false;
}
Try this:
function single_use_of_words($str) {
$words = explode(' ', $str);
$words = array_unique($words);
return implode(' ', $words);
}
No need for loops or arrays:
<?php
$needle = 'cat';
$haystack = 'cat in the cat hat';
if ( occursMoreThanOnce($haystack, $needle) ) {
echo 'Success';
}
function occursMoreThanOnce($haystack, $needle) {
return strpos($haystack, $needle) !== strrpos($haystack, $needle);
}
?>
<?php
$words = preg_split('/\W+/', $string, -1, PREG_SPLIT_NO_EMPTY); // pattern needs delimiters; flags are the 4th argument
$wordsUnique = array_unique($words);
if (count($words) != count($wordsUnique)) {
echo 'Duplicate word found!';
}
?>
The regular expression way would definitely be my choice.
I did a little test on a string of 320 words with Veynom's function and a regular expression
function preg( $txt ) {
return !preg_match( '/\b(\w+)\b.*?\b\1\b/', $txt ); // \b around \1 so "go" does not match inside "google"
}
Here's the test
$time['preg'] = microtime( true );
for( $i = 0; $i < 1000; $i++ ) {
preg( $txt );
}
$time['preg'] = microtime( true ) - $time['preg'];
$time['veynom-thewickedflea'] = microtime( true );
for( $i = 0; $i < 1000; $i++ ) {
single_use_of_words( $txt );
}
$time['veynom-thewickedflea'] = microtime( true ) - $time['veynom-thewickedflea'];
print_r( $time );
And here's the result I got
Array
(
[preg] => 0.197616815567
[veynom-thewickedflea] => 0.487532138824
)
Which suggests that the RegExp solution, as well as being a lot more concise, is more than twice as fast (for a string of 320 words and 1000 iterations).
When I run the test over 10 000 iterations I get
Array
(
[preg] => 1.51235699654
[veynom-thewickedflea] => 4.99487900734
)
The non RegExp solution also uses a lot more memory.
So.. Regular Expressions for me cos they've got a full tank of gas
EDIT
The text I tested against has duplicate words. If it doesn't, the results may be different. I'll post another set of results.
Update
With the duplicates stripped out (now 186 words), the results for 1000 iterations are:
Array
(
[preg] => 0.235826015472
[veynom-thewickedflea] => 0.2528860569
)
About even.
function Accept($str)
{
$words = explode(" ", trim($str));
$len = count($words);
for ($i = 0; $i < $len; $i++)
{
for ($p = 0; $p < $len; $p++)
{
if ($p != $i && $words[$i] == $words[$p])
{
return false;
}
}
}
return true;
}
EDIT
Entire test script. Note: when echoing false, PHP prints nothing, whereas true is printed as "1".
<?php
function Accept($str)
{
$words = explode(" ", trim($str));
$len = count($words);
for ($i = 0; $i < $len; $i++)
{
for ($p = 0; $p < $len; $p++)
{
if ($p != $i && $words[$i] == $words[$p])
{
return false;
}
}
}
return true;
}
echo Accept("google makes love"), ", ", Accept("google makes google love"), ", ",
Accept("google makes love love google"), ", ", Accept("babe health insurance babe");
?>
Prints the correct output:
1, , ,
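If the blank entries are hard to read, var_export can print booleans as words. A small sketch, not part of the original script:

```php
<?php
// var_export() with a second argument of true returns the text instead of printing it.
echo var_export(true, true), ", ", var_export(false, true), "\n"; // true, false
```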
This seems fairly fast. It would be interesting to see (for all the answers) how the memory usage and time taken increase as you increase the length of the input string.
function check($str) {
//remove double spaces
$c = 1;
while ($c) $str = str_replace('  ', ' ', $str, $c); // collapse two spaces into one until none are left
//split into array of words
$words = explode(' ', $str);
foreach ($words as $key => $word) {
//remove current word from array
unset($words[$key]);
//if it still exists in the array it must be duplicated
if (in_array($word, $words)) {
return false;
}
}
return true;
}
Edit
Fixed issue with multiple spaces. I'm not sure whether it is better to remove these at the start (as I have) or check each word is non-empty in the foreach.
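One way to sidestep that choice is to let the split itself discard empty pieces. The sketch below assumes that any run of whitespace separates words; split_words is a hypothetical helper name, not something from the code above.

```php
<?php
// Hypothetical helper: split a string into words, ignoring extra whitespace.
function split_words(string $str): array
{
    // PREG_SPLIT_NO_EMPTY drops the empty strings that leading, trailing,
    // or repeated whitespace would otherwise produce.
    return preg_split('/\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);
}

print_r(split_words('  google   makes  love ')); // three elements: google, makes, love
```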
The simplest method is to loop through each word and check against all previous words for duplicates.
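A minimal sketch of that loop, assuming words are separated by whitespace and compared case-sensitively. The name has_repeated_word is made up for illustration. Rather than rescanning the earlier words each time, it records every word as an array key, so the "seen before?" check is a single lookup.

```php
<?php
// Hypothetical helper: returns true if any word occurs more than once.
function has_repeated_word(string $str): bool
{
    // Split on runs of whitespace and drop empty pieces.
    $words = preg_split('/\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);
    $seen = [];
    foreach ($words as $word) {
        if (isset($seen[$word])) {
            return true; // this word already appeared earlier in the string
        }
        $seen[$word] = true;
    }
    return false;
}

var_dump(has_repeated_word('google makes love'));        // bool(false)
var_dump(has_repeated_word('google makes google love')); // bool(true)
```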
Regular expression with backreferencing
http://www.regular-expressions.info/php.html
http://www.regular-expressions.info/named.html
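For illustration, a backreference pattern for this check might look like the sketch below. \1 repeats whatever the first group captured, and the \b anchors on both sides keep a short word from matching inside a longer one. The function name is hypothetical, and \w+ is assumed to be an acceptable definition of a word.

```php
<?php
// Hypothetical helper: true if some word appears again later in the text.
function has_duplicate_word(string $txt): bool
{
    // (\w+) captures a word; \b\1\b requires the same whole word again.
    // The s flag lets . cross line breaks.
    return preg_match('/\b(\w+)\b.*?\b\1\b/s', $txt) === 1;
}

var_dump(has_duplicate_word('google makes google love')); // bool(true)
var_dump(has_duplicate_word('go google'));                // bool(false)
```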
Related
This is my sample string (this one has five words; in practice, there may be more):
$str = "I want to filter it";
Output that I want:
$output[1] = array("I","want","to","filter","it");
$output[2] = array("I want","want to","to filter","filter it");
$output[3] = array("I want to","want to filter","to filter it");
$output[4] = array("I want to filter","want to filter it");
$output[5] = array("I want to filter it");
What I am trying:
$text = trim($str);
$text_exp = explode(' ',$str);
$len = count($text_exp);
$output[$len][] = $text; // last element
$output[1] = $text_exp; // first element
This gives me the first and the last arrays. How can I get all the middle arrays?
A more generic solution that works for any number of words:
$output = array();
$terms = explode(' ',$str);
for ($i = 1; $i <= count($terms); $i++ )
{
$round_output = array();
for ($j = 0; $j <= count($terms) - $i; $j++)
{
$round_output[] = implode(" ", array_slice($terms, $j, $i));
}
$output[$i] = $round_output; // key by phrase length, matching the requested $output[1], $output[2], ...
}
You can do that easily with regular expressions, which give you the most flexibility. The version below supports any number of words and multiple whitespace characters between words, and it uses only one loop, which should make it more efficient for long strings.
<?php
$str = "I want to filter it";
$count = count(preg_split("/\s+/", $str));
$results = [];
for($i = 1; $i <= $count; ++$i) {
$expr = '/(?=((^|\s+)(' . implode('\s+', array_fill(0, $i, '[^\s]+')) . ')($|\s+)))/';
preg_match_all($expr, $str, $matches);
$results[$i] = $matches[3];
}
print_r($results);
You can use a single for loop and if conditions to do this:
$str = "I want to filter it";
$text = trim($str);
$text_exp = explode(' ',$str);
$len = count($text_exp);
$output1=$text_exp;
$output2=array();
$output3=array();
$output4=array();
$output5=array();
for($i=0;$i<count($text_exp);$i++)
{
if($i+1<count($text_exp) && $text_exp[$i+1]!='')
{
$output2[]=$text_exp[$i].' '.$text_exp[$i+1];
}
if($i+2<count($text_exp) && $text_exp[$i+2]!='')
{
$output3[]=$text_exp[$i].' '.$text_exp[$i+1].' '.$text_exp[$i+2];
}
if($i+3<count($text_exp) && $text_exp[$i+3]!='')
{
$output4[]=$text_exp[$i].' '.$text_exp[$i+1].' '.$text_exp[$i+2].' '.$text_exp[$i+3];
}
if($i+4<count($text_exp) && $text_exp[$i+4]!='')
{
$output5[]=$text_exp[$i].' '.$text_exp[$i+1].' '.$text_exp[$i+2].' '.$text_exp[$i+3].' '.$text_exp[$i+4];
}
}
Is regex the only way? Is it slow?
Something like this?
preg_match("/^(\-){3,}/", $string);
Only dashes
If you want the string to be only dashes, and there must be 3 or more:
$match = (preg_match('/^-{3,}$/', $string) === 1);
Another way without regex that seems to be about 25% slower (isset beats strlen):
$match = (count_chars($string, 3) === '-' && isset($string[2]));
Adjacent dashes
If you want 3 or more dashes in a row, but there may be other characters (e.g. foo---bar):
$match = (strpos($string, '---') !== false);
Some dashes
If you want 3 or more dashes anywhere (e.g. -foo-bar-):
$match = (substr_count($string, '-') >= 3);
There is the substr_count function, which might be good for counting characters:
echo substr_count('fa-r-r', '-'); // outputs 2
You can do that:
function dashtest($str) {
$rep = str_replace('-', '', $str);
$length = strlen($str);
return ( $length>2 && $rep =='' ) ? true : false;
}
Another way:
function dashtest($str) {
for ($i=0 ; $i<strlen($str); $i++) {
if ($str[$i]!='-') return false;
}
if ($i<3) return false;
return true;
}
The regex way:
if (preg_match('~^-{3,}+$~', $str)) { /*true*/} else { /*false*/}
I ran this test, and the funny thing is that regex is the fastest:
<?php
function dashtest($str) {
$rep = str_replace( '-', '', $str );
$length = strlen( $str );
return ( $length < 3 || $rep != '' ) ? false : true;
}
function dashtest2($str) {
for ($i=0 ; $i<strlen($str); $i++) {
if ($str[$i]!='-') return false;
}
if ($i<3) return false;
return true;
}
$string = '------------';
$start = microtime(true);
for ( $i=0; $i<100000; $i++ ) {
dashtest( $string );
}
echo microtime(true) - $start;
echo "\n";
$start = microtime(true);
for ( $i=0; $i<100000; $i++ ) {
dashtest2( $string );
}
echo microtime(true) - $start;
echo "\n";
$start = microtime(true);
for ( $i=0; $i<100000; $i++ ) {
(preg_match('/^-{3,}$/', $string) === 1);
}
echo microtime(true) - $start;
echo "\n";
output:
0.38635802268982
1.5208051204681 <- haha!
0.15313696861267
another try
$string = '----------------------------------------------------------------------------';
0.52067899703979
8.7124900817871
0.17864608764648
regex wins again
I have a string of unknown length and content.
I'd like to be able to truncate the string after the x-th occurrence of a given character.
For example from:
$string = "Hello# m#y name # is Ala#n Colem#n";
$character = "#";
$x = 4;
I'd like to return:
"Hello# m#y name # is Ala#"
Hope I'm not over complicating things here!
Many thanks
I'd suggest explode-ing the string on #, then getting the 1st 4 elements in that array.
$string = "Hello# m#y name # is Ala#n Colem#n";
$character = "#";
$x = 4;
$split = explode($character, $string);
$split = array_slice($split, 0, $x);
$newString = implode($character, $split) . $character; // re-append the delimiter instead of hard-coding '#'
function posncut( $input, $delim, $x ) {
$p = 0;
for( $i = 0; $i < $x; ++ $i ) {
$p = strpos( $input, $delim, $p );
if( $p === false ) {
return "";
}
++ $p;
}
return substr( $input, 0, $p );
}
echo posncut( $string, $character, $x );
It finds each delimiter in turn (strpos) and stops after the one you're looking for. If it runs out of text first (strpos returns false), it gives an empty string.
Update: here's a benchmark I made which compares this method against explode: http://codepad.org/rxTt79PC. Seems that explode (when used with array_pop instead of array_slice) is faster.
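The linked benchmark is not reproduced here, but an explode/array_pop variant could look like the following sketch. It is an assumption about what such a variant does, not a copy of the benchmarked code, and cut_after_nth is a hypothetical name. Passing a limit of $x + 1 makes explode stop splitting after the x-th delimiter, so the last element holds the unwanted remainder. It assumes $x is at least 1 and the delimiter is non-empty.

```php
<?php
// Hypothetical helper: keep everything up to and including the $x-th $delim.
function cut_after_nth(string $input, string $delim, int $x): string
{
    // With a limit, the final element contains the rest of the string unsplit.
    $parts = explode($delim, $input, $x + 1);
    if (count($parts) <= $x) {
        return ''; // fewer than $x delimiters, mirroring posncut()
    }
    array_pop($parts); // discard the remainder after the $x-th delimiter
    return implode($delim, $parts) . $delim;
}

echo cut_after_nth('Hello# m#y name # is Ala#n Colem#n', '#', 4);
// Hello# m#y name # is Ala#
```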
Something along these lines:
$str_length = strlen($string);
$character = "#";
$target_count = 4;
$count = 0;
for ($i = 0; $i < $str_length; $i++) {
if ($string[$i] == $character) {
$count++;
if ($count == $target_count) break;
}
}
// Keep everything up to and including the matched character
$result = substr($string, 0, $i + 1);
I found some unexpected behavior in my code, so made two examples to demonstrate what was happening and couldn't figure things out from there. What I found was odd to me, and perhaps I'm missing something.
Goal: Create a random string and avoid anything specified in an array.
In the examples below, I have two methods of testing this.
First I have a function that creates a random string from specified characters ($characters) and then I have an array ($avoid) (here with double letters specified) which then loops and informs you if the code worked and it indeed found what was specified in the array.
This seems to work, however then I modified the second function to attempt to generate a new random string if the same trigger happened. This to avoid having a string with anything in the array.
This part doesn't seem to work.. I'm not sure how else to modify it from here, but I must be missing something. Running the code works, but it catches some things and misses other times.. which I wouldn't expect from code.
function getrandom($loopcount)
{
$loopcount++;
$length = 20;
$characters = 'abc';
$string = '';
for ($p = 0; $p < $length; $p++)
$string.= $characters[ mt_rand( 0,strlen($characters) ) ];
$avoid = array(
'aa',
'bb',
'cc'
);
foreach ($avoid as $word)
if ( stripos($string,$word) )
$string = 'Double '.$word.' Detected:'.$string;
return '<h1 style="color:blue;">'.$string.'<h1>';
}
echo getrandom(0);
echo getrandom(0);
echo getrandom(0);
function getrandom2($loopcount)
{
$loopcount++;
$length = 20;
$characters = 'abc';
$string = '';
for ($p = 0; $p < $length; $p++)
$string.= $characters[ mt_rand( 0,strlen($characters) ) ];
$avoid = array(
'aa',
'bb',
'cc'
);
foreach ($avoid as $word)
if ( stripos($string,$word) )
$string = getrandom2($loopcount);
return '<h1 style="color:green;">'.$string.'<h1>';
}
echo getrandom2(0);
echo getrandom2(0);
echo getrandom2(0);
I used this one
function randomToken($length)
{
srand(date("s"));
$possible_charactors = "abcdefghijklmnopqrstuvwxyz1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ";
$string = "";
while(strlen($string)<$length)
{
$string .= substr($possible_charactors, rand()%strlen($possible_charactors),1);
}
return($string);
}
You need to compare the result of stripos() with the strict operator (!==); otherwise your if will interpret an occurrence at position 0 as false (1):
foreach ($avoid as $word){
if ( stripos($string,$word) !== FALSE){
$string = getrandom2($loopcount);
}
}
(1) http://php.net/manual/en/language.operators.comparison.php
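To see why the loose check misfires, consider a match at the very start of the string. A small sketch, with sample strings chosen purely for illustration:

```php
<?php
$pos = stripos('aabca', 'aa');  // match at offset 0
var_dump($pos);                 // int(0)
var_dump((bool) $pos);          // bool(false): a plain if() treats the hit as a miss
var_dump($pos !== false);       // bool(true): the strict check reports the hit

$miss = stripos('abcab', 'aa'); // no match at all
var_dump($miss);                // bool(false)
```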
The following worked for me:
print gen_rand_str_avoid('abc', 20, array('aa', 'bb', 'cc'));
function gen_rand_str($chars, $length) {
$str = '';
for ($i = 0; $i < $length; $i++) {
$str .= $chars[mt_rand(0, strlen($chars) - 1)];
}
return $str;
}
function gen_rand_str_avoid($chars, $length, array $avoids) {
while (true) {
$str = gen_rand_str($chars, $length);
foreach ($avoids as $avoid) {
if (stripos($str, $avoid) !== false) {
continue 2;
}
}
break;
}
return $str;
}
First of all, what I need is the reverse of PHP's in_array function.
I need to search a string for every item of an array: if any of them is found, the function returns true; otherwise it returns false.
I need the fastest solution to this problem. Of course, this can be achieved by iterating over the array and using the strpos function.
Any suggestions are welcome.
Example Data:
$string = 'Alice goes to school every day';
$searchWords = array('basket','school','tree');
returns true
$string = 'Alice goes to school every day';
$searchWords = array('basket','cat','tree');
returns false
You should try with a preg_match:
if (preg_match('/' . implode('|', $searchWords) . '/', $string)) return true;
After some comments, here is a properly escaped solution:
function contains($string, Array $search, $caseInsensitive = false) {
$exp = '/'
. implode('|', array_map(function ($s) { return preg_quote($s, '/'); }, $search)) // also escape the / delimiter
. ($caseInsensitive ? '/i' : '/');
return preg_match($exp, $string) ? true : false;
}
function searchWords($string,$words)
{
foreach($words as $word)
{
if(stristr(" " . $string . " ", " " . $word . " ")) //spaces either side to force a word; the string is padded so the first and last words match too
{
return true;
}
}
return false;
}
Usage:
$string = 'Alice goes to school every day';
$searchWords = array('basket','cat','tree');
if(searchWords($string,$searchWords))
{
//matches
}
Also note that stristr is used to make the search case-insensitive.
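As per malko's example, but with the values properly escaped:
function contains( $string, array $search ) {
// preg_quote() takes a single string, so escape each term separately
$quoted = array_map( function ( $term ) { return preg_quote( $term, '/' ); }, $search );
return 1 === preg_match( '/' . implode( '|', $quoted ) . '/', $string );
}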
If the string can be split on spaces, the following will work:
var_dump(array_intersect(explode(' ', $str), $searchWords) != null);
OUTPUT: for 2 examples you've provided:
bool(true)
bool(false)
Update:
If the string cannot be split on the space character, use code like this to split it at word boundaries:
var_dump(array_intersect(preg_split('~\b~', $str), $searchWords) != null);
There is always debate over what is faster so I thought I'd run some tests using different methods.
Tests Run:
strpos
preg_match with foreach loop
preg_match with regex or
indexed search with string to explode
indexed search as array (string already exploded)
Two sets of tests were run: one on a large text document (114,350 words) and one on a small text document (120 words). Within each set, all tests were run 100 times and then an average was taken. Tests did not ignore case, which doing so would have made them all faster. Tests for which the index was searched were pre-indexed. I wrote the indexing code myself, and I'm sure it could be more efficient, but indexing took 17.92 seconds for the large file and 0.001 seconds for the small file.
Terms searched for included: gazerbeam (NOT found in the document), legally (found in the document), and target (NOT found in the document).
Results in seconds to complete a single test, sorted by speed:
Large File:
0.0000455808639526 (index without explode)
0.0009979915618897 (preg_match using regex or)
0.0011657214164734 (strpos)
0.0023632574081421 (preg_match using foreach loop)
0.0051533532142639 (index with explode)
Small File
0.000003724098205566 (strpos)
0.000005958080291748 (preg_match using regex or)
0.000012607574462891 (preg_match using foreach loop)
0.000021204948425293 (index without explode)
0.000060625076293945 (index with explode)
Notice that strpos is faster than preg_match (using regex or) for small files, but slower for large files. Other factors, such as the number of search terms will of course affect this.
Algorithms Used:
//strpos
$str = file_get_contents('text.txt');
$t = microtime(true);
foreach ($search as $word) if (strpos($str, $word) !== false) break; // strict check: a match at offset 0 still counts
$strpos += microtime(true) - $t;
//preg_match
$str = file_get_contents('text.txt');
$t = microtime(true);
foreach ($search as $word) if (preg_match('/' . preg_quote($word) . '/', $str)) break;
$pregmatch += microtime(true) - $t;
//preg_match (regex or)
$str = file_get_contents('text.txt');
$orstr = implode('|', array_map('preg_quote', $search)); // quote each term, keep | as alternation
$t = microtime(true);
if (preg_match('/' . $orstr . '/', $str)) {}
$pregmatchor += microtime(true) - $t;
//index with explode
$str = file_get_contents('textindex.txt');
$t = microtime(true);
$ar = explode(" ", $str);
foreach ($search as $word) {
$start = 0;
$end = count($ar);
do {
$diff = $end - $start;
$pos = floor($diff / 2) + $start;
$temp = $ar[$pos];
if ($word < $temp) {
$end = $pos;
} elseif ($word > $temp) {
$start = $pos + 1;
} elseif ($temp == $word) {
$found = 'true';
break;
}
} while ($diff > 0);
}
$indexwith += microtime(true) - $t;
//index without explode (already in array)
$str = file_get_contents('textindex.txt');
$found = 'false';
$ar = explode(" ", $str);
$t = microtime(true);
foreach ($search as $word) {
$start = 0;
$end = count($ar);
do {
$diff = $end - $start;
$pos = floor($diff / 2) + $start;
$temp = $ar[$pos];
if ($word < $temp) {
$end = $pos;
} elseif ($word > $temp) {
$start = $pos + 1;
} elseif ($temp == $word) {
$found = 'true';
break;
}
} while ($diff > 0);
}
$indexwithout += microtime(true) - $t;
Try this:
$string = 'Alice goes to school every day';
$words = explode(" ", $string); // split() was removed in PHP 7
$searchWords = array('basket','school','tree');
for($x = 0,$l = count($words); $x < $l;) {
if(in_array($words[$x++], $searchWords)) {
//....
}
}
The function below returns how many words of the string appear in the array (with $matches left as false, it stops at the first match):
function inString($str, $arr, $matches=false)
{
$str = explode(" ", $str);
$c = 0;
for($i = 0; $i<count($str); $i++)
{
if(in_array($str[$i], $arr) )
{$c++;if($matches == false)break;}
}
return $c;
}
The link below should help; you just need to customize it as required.
Check if array element exists in string
customized:
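function result_arrayInString($prdterms, $value = 208){
// arrayInString() is the helper from the linked answer; $value is a placeholder name for the original literal 208
if(arrayInString($prdterms, $value)){
return true;
}else{
return false;
}
}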
This may be helpful to you.