I am trying to learn PHP while I write a basic application. I want a process whereby old words get put into an array, $oldWords = array();, so that every $words value that has been used gets inserted using array_push($oldWords, $words).
Every time the code is executed, I want a process that finds a new word from $wordList = array(...). However, I don't want to select any words that have already been used or are in $oldWords.
Right now I'm thinking about how I would go about this. I've been considering picking a new word via $wordChooser = rand(1, $totalWords); together with an if/else statement, but the problem is that if array_search($word, $oldWords) finds the word, I would need to pick another word and check it again.
This process seems extremely inefficient, and I'm considering a loop, but which one, and what would be a good way to solve the issue?
Thanks
I'm a bit confused: PHP dies at the end of the script's execution. However you are generating this array, could you not also generate, at the same time, the list of words from the word list that haven't been used yet (the array_diff of all words against used words)?
Or else, if there's another reason I'm missing, why can't you just use a loop and quickly find the first word in $wordList that's not in $oldWords in O(n)?
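function generate_new_word(array $wordList, array $oldWords) {
    // Both lists are passed in: a function cannot see outside variables by default
    foreach ($wordList as $word) {
        if (!in_array($word, $oldWords)) {
            return $word; //Word hasn't been used
        }
    }
    return null; //All words have been used
}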
Or just do an array difference (less efficient, though, since even in the best case it has to go through the entire array, while the loop above can stop at the first unused word).
EDIT: For random
$newWordArray = array_diff($allWords, $oldWords); //List of all words in allWords that are not in oldWords
$randomNewWord = $newWordArray ? $newWordArray[array_rand($newWordArray)] : null; //array_rand returns a key, not the word; null if no words are left
Or unless you're interested in making your own datatype, the best case for this could possibly be in O(log(n))
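A minimal sketch of the random variant wrapped in a function. The name pick_unused_word and the sample word lists are made up for illustration; they are not from the question.

```php
<?php
// Hypothetical helper: returns a random word from $wordList that is not in
// $oldWords, or null once every word has been used.
function pick_unused_word(array $wordList, array $oldWords): ?string
{
    // array_diff keeps the values of $wordList that do not appear in $oldWords.
    $unused = array_diff($wordList, $oldWords);
    if (empty($unused)) {
        return null; // nothing left to pick
    }
    // array_rand returns a key, so look the word up by that key.
    return $unused[array_rand($unused)];
}

// Sample data, assumed for the demo.
$wordList = array('apple', 'banana', 'cherry');
$oldWords = array('banana');

$word = pick_unused_word($wordList, $oldWords);
if ($word !== null) {
    $oldWords[] = $word; // remember it so it is not picked again
}
```

Because the arrays are gone once the script ends, $oldWords would have to be stored somewhere between runs, for example in the session or in a file.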
I've looked into the similar_text() and levenshtein() functions, but they only seem to return THAT there are similarities and the percentage of those similarities.
What I am trying to do is compare 2 strings to determine WHAT is actually similar between the two.
Basically:
<?php
$string1 = "IMG_1";
$string2 = "IMG_2";
echo CompareTheseStrings($string1,$string2); // returns "IMG_";
If this wonderful function doesn't exist, what would be the best way to accomplish this?
My end game plan is to read through a list of file names and then replace the similar text with something user defined or just remove it altogether, but I don't want to replace each file's unique identifier.
Reading your 'end goal', I think you're going about this completely the wrong way. You should really be looking at str_replace:
$string1 = "IMG_1";
$string2 = "IMG_2";
// you would create a loop here to insert each string
$string1 = str_replace("IMG", "NEW_TERM", $string1); // str_replace returns the new string, so assign it
If you want to remove the text altogether then just pass an empty string in as the 2nd parameter
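If the shared part is not known in advance, it can be computed. Below is a minimal sketch of the CompareTheseStrings idea from the question, under the assumption that "similar" means a common prefix; the name common_prefix is made up, and the comparison is byte-based, so multibyte file names would need the mb_* functions.

```php
<?php
// Hypothetical helper: returns the longest prefix shared by both strings.
function common_prefix(string $a, string $b): string
{
    $max = min(strlen($a), strlen($b));
    $i = 0;
    // Advance while the characters at the same position are identical.
    while ($i < $max && $a[$i] === $b[$i]) {
        $i++;
    }
    return substr($a, 0, $i);
}

$prefix = common_prefix("IMG_1", "IMG_2");            // "IMG_"
// Replace the shared part but keep the file's unique identifier.
$renamed = "NEW_" . substr("IMG_1", strlen($prefix)); // "NEW_1"
```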
I have a loop, which takes a large amount of text in each iteration and replaces specific placeholder ('token') with some other content like so:
$string = $pageContent;
foreach($categories as $row) {
$images = $mdlGallery->getByCategory($row['id']);
if (!empty($images)) {
$plug = Plugin::get('includes/gallery', array('rows' => $images));
$string = str_replace($row['token'], $plug, $string);
}
}
The Plugin class and its get() method simply take the right file from a specific directory and return the output buffer as a string.
There might be a large number of categories, therefore I wonder whether it would be better to first check the input string for an occurrence of the specific 'token' using the strpos() function before populating all images from a given category, like so:
foreach($categories as $row) {
if (strpos($string, $row['token']) !== false) {
$images = $mdlGallery->getByCategory($row['id']);
if (!empty($images)) {
$plug = Plugin::get('includes/gallery', array('rows' => $images));
$string = str_replace($row['token'], $plug, $string);
}
}
}
My concern is the performance - would this help? - consider $string to potentially contain a large number of characters (TEXT field type in MySQL)?
To solve your problem
As per your example code it seems that the files used in Plugin::get() are small in size which means including them or reading them should not incur large performance costs, but if there are a lot of them you may need to consider those costs due to OS queuing mechanisms even if the data they contain is not big.
The getByCategory method should incur large performance costs because it implies many connect->query->read->close communication sequences to the database and each implies the transfer of a large amount of data (the TEXT fields you mentioned).
You should consider fetching the data as a batch operation with one single SQL query and storing it in a cache variable indexed by the row id so that getByCategory can fetch it from the cache.
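A minimal sketch of such a cache, assuming the single query returns rows that each carry a category_id column. The column name, the sample rows and the function name group_by_category are assumptions, since the real schema is not shown.

```php
<?php
// Hypothetical helper: index image rows by category id so that each lookup
// inside the loop is an array access rather than a database query.
function group_by_category(array $rows): array
{
    $cache = array();
    foreach ($rows as $row) {
        $cache[$row['category_id']][] = $row;
    }
    return $cache;
}

// $allImages stands in for the result of one query that fetches every image.
$allImages = array(
    array('id' => 1, 'category_id' => 10, 'file' => 'a.jpg'),
    array('id' => 2, 'category_id' => 20, 'file' => 'b.jpg'),
    array('id' => 3, 'category_id' => 10, 'file' => 'c.jpg'),
);
$imagesByCategory = group_by_category($allImages);

// Inside the loop, this lookup would take the place of getByCategory($row['id']).
$images = isset($imagesByCategory[10]) ? $imagesByCategory[10] : array();
```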
Your current problem is not a matter of simple code review, it's a matter of approach. You have used a typical technique for small datasets as an approach to handling large datasets. The notion of "wrap a foreach over the simple script" works if you have medium datasets and don't feel a performance decay; if you do, you need a different approach to handle the large dataset.
To answer your question
Using strpos means running through the entire haystack once to check if it contains the needle, and after that running through it again to do the replace with str_replace.
If the haystack does not contain the needle, strpos === str_replace (in the matter of computational complexity) because both of them have to run through the entire string to the end to make sure no needles are there.
Using both functions adds 100% more computation for any haystack that does not contain the needle, and anywhere from 1% to 100% more for any haystack that does contain it, because strpos returns right after the first needle it finds, which can be at the start of the string, the middle or the end.
In short, don't use strpos; it does not help you here. If you were using preg_replace, the regex engine might have incurred more computational cost than strpos for haystacks that do not contain the needle.
Thanks Mihai - that makes a lot of sense, however - in this particular scenario even if I get all of the records from the database first - meaning all of the images with associated categories - it would be rare that the $string would contain more than just one or two 'tokens' - meaning that using strpos() could actually save time if there were many categories ('tokens') to compare against.
Imagine we don't call the getByCategory in each iteration because we already store all possible records in earlier generated array - we still have to go through output buffering inside of the Plugin::get() method and str_replace() - meaning that if we have say 20 categories - this would occur 20 times without necessarily 'token' being included within the $string.
So your suggestion would work if there were supposed to be a large number of 'tokens' found in the $string compared to the number of categories we are looping through, but for a small number of 'tokens' I think that strpos() would still be beneficial, as it would be the only call executed for most categories, with the other two following only when strpos() finds the token - in which case strpos() is a small price to pay compared to running ob and str_replace together each time in the loop - don't you think?
I very much appreciate your explanation though.
I think it's better to benchmark things yourself if you are looking for optimization (especially micro-optimization). Any implementation usually has more than one variation, so it's better to benchmark the variation you actually use. With that in mind, you can see the benchmark results here:
with strpos: http://3v4l.org/pb4hY#v533
without strpos: http://3v4l.org/v35gT
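For a quick local comparison, a rough sketch along these lines could be used. It is not the code behind the links above; build_plug is a made-up stand-in for the expensive work (query plus Plugin::get()), and the token format and page text are invented for the demo.

```php
<?php
// Hypothetical stand-in for the expensive work done per category.
function build_plug(string $token): string
{
    usleep(100); // simulate some cost
    return '<div>gallery</div>';
}

// Replaces every token in $string; with $guard set, the expensive work is
// skipped for tokens that strpos() cannot find.
function replace_all(string $string, array $tokens, bool $guard): string
{
    foreach ($tokens as $token) {
        if ($guard && strpos($string, $token) === false) {
            continue; // token absent: skip the expensive work entirely
        }
        $string = str_replace($token, build_plug($token), $string);
    }
    return $string;
}

// 20 tokens, only one of which occurs in the page.
$tokens = array();
for ($i = 0; $i < 20; $i++) {
    $tokens[] = '{{gallery-' . $i . '}}';
}
$page = str_repeat('lorem ipsum ', 5000) . '{{gallery-3}}';

foreach (array(true, false) as $guard) {
    $start = microtime(true);
    $out = replace_all($page, $tokens, $guard);
    printf("%s strpos: %.4f s\n", $guard ? 'with' : 'without', microtime(true) - $start);
}
```

The timings depend entirely on how costly the per-category work really is, so the usleep() value should be swapped for the real calls before drawing conclusions.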
I have a text file that has item numbers in it (one per line). When an item is scanned by our barcode scanner it gets placed into this text file IF it exists in the order (which is stored in an array...item numbers only, nothing else).
What's happening is that if I have the two item numbers:
C0DB-9700-W
C0DB-9700-WP
If I scan the item C0DB-9700-W first then I can scan the second item just fine, but if I scan C0DB-9700-WP first, it thinks that I've already scanned C0DB-9700-W because that item is a prefix to the item I've already scanned.
I know that strpos only checks for the first occurrence. I was using the following code:
if (strpos($file_array, $submitted ) !==FALSE) {
I switched to using:
if (preg_match('/'.$submitted.'/', $file_array)) {
I thought that by using preg_match I could overcome the problem, but apparently not. I just want PHP to check the EXACT string I give it against items in the array (which I'm getting from the file) to see if it has already been scanned or not. This isn't that hard in my mind but obviously I'm missing something here. How can I coax PHP into looking for the entire string and not giving up when it finds something that will be good enough (or at least what it thinks is good enough)?
Thanks!
Just use in_array:
if (in_array($submitted, $file_array))
FYI, your regex was missing start/end anchors (and the second argument needs to be a string, not an array):
preg_match('/^'.preg_quote($submitted, '/').'$/', $subject)
There's nothing inexact about C0DB-9700-WP containing a match for C0DB-9700-W. What you're looking for is a regular expression that ensures the string you want is an entire word by itself:
if (preg_match('/\\b'.preg_quote($submitted, '/').'\\b/', $file_array)) {
For an array of items $file_array:
if (in_array($submitted, $file_array)) {
// Do something...
}
Although in your examples, it looks like your $file_array is a string, so you'd want to do:
$file_array = explode("\n", $file_array);
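Putting the two steps together, a minimal sketch could look like this. The function name is_already_scanned and the sample data are made up; the file contents are passed in as one string, as in the question.

```php
<?php
// Hypothetical helper: true only if $submitted equals a whole line of the
// scanned-items text, never a mere prefix or substring of one.
function is_already_scanned(string $fileContents, string $submitted): bool
{
    // Split on any line ending (\n, \r\n, ...) and drop empty lines.
    $items = preg_split('/\R/', $fileContents, -1, PREG_SPLIT_NO_EMPTY);
    // Trim stray whitespace, then compare whole strings strictly.
    $items = array_map('trim', $items);
    return in_array($submitted, $items, true);
}

$scanned = "C0DB-9700-WP\nA1B2-0001-X\n";
var_dump(is_already_scanned($scanned, 'C0DB-9700-W'));  // bool(false)
var_dump(is_already_scanned($scanned, 'C0DB-9700-WP')); // bool(true)
```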
Let's say I have text file Data.txt with:
26||jim||1990
31||Tanya||1942
19||Bruce||1612
8||Jim||1994
12||Brian||1988
56||Susan||2201
and it keeps going.
It has many different names in column 2.
Please tell me, how do I get the count of unique names, and how many times each name appears in the file using PHP?
I have tried:
$counts = array_count_values($item[1]);
echo $counts;
after exploding ||, but it does not work.
The result should be like:
jim-2,
tanya-1,
and so on.
Thanks for any help...
Read in each line, explode using the delimiter (in this case ||), and add it to an array if it does not already exist. If it does, increment the count.
I won't write the code for you, but here are a few pointers:
fgets reads in a line (fread reads a fixed number of bytes, not a line)
explode will split the line based on a delimiter
use in_array to check if the name has been found before, and to determine whether you need to add the name to the array or just increment the count.
Edit:
Following Jon's advice, you can make it even easier for you.
Read in line-by-line, explode by delimiter and dump all the names into an array (don't worry about checking if it already exists). After you're done, use array_count_values to get every unique name and its frequency.
Here's my take on this:
Use file to read the data file, producing an array where each element corresponds to a line in the input.
Use array_filter with trim as the filter function to remove blank lines from this array. This takes advantage of the fact that trim returns its argument with whitespace removed from both ends, leaving the empty string if the argument was all whitespace to begin with. The empty string converts to boolean false -- thus making array_filter disregard lines that are all whitespace.
Use array_map with a callback that involves calling explode to split each array element (line of text) into three parts and returning the second of these. This will produce an array where each element is just a name.
Use array_map again with strtoupper as the callback to convert all names to uppercase so that "jim" and "JIM" will count as the same in the next step.
Finally, use array_count_values to get the count of occurrences for each name.
Code, taking things slowly:
function extract_name($line) {
// The -1 parameter (available as of PHP 5.1.0) makes explode return all elements
// but the last one. We want to do this so that the element we are interested in
// (the second) is actually the last in the returned array, enabling us to pull it
// out with end(). This might seem strange here, but see below.
$parts = explode('||', $line, -1);
return end($parts);
}
$lines = file('data.txt'); // #1
$lines = array_filter($lines, 'trim'); // #2
$names = array_map('extract_name', $lines); // #3
$names = array_map('strtoupper', $names); // #4
$counts = array_count_values($names); // #5
print_r($counts); // to see the results
There is a reason I chose to do this in steps where each step involves a function call on the result of the previous step: it's actually possible to do it all in a single statement:
$counts = array_count_values(
array_map(function($line){return strtoupper(end(explode('||', $line, -1)));},
array_filter(file('data.txt'), 'trim')));
print_r($counts);
I should mention that this might not be the "best" way to solve the problem in the sense that if your input file is huge (in the ballpark of a few million lines) this approach will consume a lot of memory because it's reading all the input in memory at once. However, it's certainly convenient and unless you know that the input is going to be that large there's no point in making life harder.
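For the huge-file case, a line-by-line variant is one option. This is only a sketch: the name count_names is made up, it lowercases names instead of uppercasing them, and the demo uses an in-memory stream in place of the real file.

```php
<?php
// Hypothetical helper: counts names (the second '||' field) from an open
// stream, one line at a time, so memory use does not grow with the file size.
function count_names($handle): array
{
    $counts = array();
    while (($line = fgets($handle)) !== false) {
        $parts = explode('||', trim($line));
        if (!isset($parts[1]) || $parts[1] === '') {
            continue; // blank or malformed line
        }
        $name = strtolower($parts[1]);
        $counts[$name] = isset($counts[$name]) ? $counts[$name] + 1 : 1;
    }
    return $counts;
}

// Demo on an in-memory stream; for the real file use fopen('Data.txt', 'r').
$handle = fopen('php://memory', 'w+');
fwrite($handle, "26||jim||1990\n31||Tanya||1942\n\n8||Jim||1994\n");
rewind($handle);
$counts = count_names($handle);
fclose($handle);

foreach ($counts as $name => $count) {
    echo $name . '-' . $count . "\n"; // prints jim-2, then tanya-1
}
```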
Note: Senior-level PHP developers might have noticed that I'm violating strict standards here by feeding the result of explode to a function that accepts its argument by reference. That's valid criticism, but in my defense I am trying to keep the code as short as possible. In production it would indeed be better to use $a = explode(...); return $a[1]; although it makes no difference to the result.
While I do feel that this website's purpose is to answer questions and not do homework assignments, I don't acknowledge the assumption that you are doing your homework, since that fact has not been provided. I personally learned how to program by example. We all learn our own ways, so here is what I would do if I were to attempt to answer your question as accurately as possible, based on the information you have provided.
<?php
$unique_name_count = 0;
$names = array();
$filename = 'Data.txt';
$pointer = fopen($filename,'r');
$contents = fread($pointer,filesize($filename));
fclose($pointer);
$lines = explode("\n",$contents);
foreach($lines as $line)
{
$split_str = explode('||',$line); // split on the full '||' delimiter
if(isset($split_str[1]))
{
$name = strtolower($split_str[1]); // the name is the second field
if(!in_array($name,$names))
{
$names[] = $name;
$unique_name_count++;
}
}
}
echo $unique_name_count.' unique name'.($unique_name_count == 1 ? '' : 's').' found in '.$filename."\n";
?>
I have this situation in PHP. I have an array that has these keys for example, wires-1, wires-2, wires-3. I need a function or way for my program to read these keys, and find that the common word is wires? How would that be accomplished in PHP? Thanks for your help.
Take a look at how autocomplete functionality works; it is similar to what you need.
I'm sure there's plenty of source code for autocomplete on Google.
For the string value of every key in your array:
Throw away all non-alpha characters, i.e. leave only letters such that ctype_alpha($remaining_text) should return true.
Keep an array with the found words as keys, and their frequencies as values, as such:
$array = array();
function found_word($word)
{ global $array;
if(!isset($array[$word])) { $array[$word] = 1; }
else { $array[$word]++; }
}
Only nicer ;)
Sort the array in reverse by using arsort($array);
$array now contains the most found words as its first elements.
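The steps above can be sketched without the global, roughly as follows. The function name count_key_words and the sample keys are made up, and only ASCII letters are kept, which is an assumption about the keys.

```php
<?php
// Hypothetical helper: strips non-letters from each key and counts how often
// each remaining word occurs, most frequent first.
function count_key_words(array $keys): array
{
    $counts = array();
    foreach ($keys as $key) {
        $word = preg_replace('/[^A-Za-z]/', '', (string) $key);
        if ($word === '') {
            continue; // key had no letters at all
        }
        $counts[$word] = isset($counts[$word]) ? $counts[$word] + 1 : 1;
    }
    arsort($counts); // highest frequency first
    return $counts;
}

$data = array('wires-1' => 'a', 'wires-2' => 'b', 'wires-3' => 'c', 'plugs-1' => 'd');
$counts = count_key_words(array_keys($data));

reset($counts);         // make sure the pointer is on the first element
$common = key($counts); // "wires"
```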
You would have to create every possible prefix of every string you have.
Create a map with an entry for every prefix you found.
Count the occurrences of every prefix in your string array.
You can improve performance by, for example, limiting the prefix length.
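A minimal sketch of this counting idea, using prefixes because the shared part of wires-1, wires-2 and wires-3 sits at the start of each key. The function name and the $minCount and $maxLength parameters are assumptions made for illustration.

```php
<?php
// Hypothetical helper: counts, for every prefix of every string, how many
// strings start with it, then returns the longest prefix shared by at least
// $minCount strings. $maxLength caps the prefix length to bound the work.
function most_common_prefix(array $strings, int $minCount = 2, int $maxLength = 32): string
{
    $counts = array();
    foreach ($strings as $s) {
        $limit = min(strlen($s), $maxLength);
        for ($len = 1; $len <= $limit; $len++) {
            $prefix = substr($s, 0, $len);
            $counts[$prefix] = isset($counts[$prefix]) ? $counts[$prefix] + 1 : 1;
        }
    }
    $best = '';
    foreach ($counts as $prefix => $count) {
        $prefix = (string) $prefix; // numeric-looking keys come back as ints
        if ($count >= $minCount && strlen($prefix) > strlen($best)) {
            $best = $prefix;
        }
    }
    return $best;
}

$shared = most_common_prefix(array('wires-1', 'wires-2', 'wires-3')); // "wires-"
$word = rtrim($shared, '-');                                           // "wires"
```

The result includes the separator, so the rtrim() call is one way to reduce it to the bare word.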