PHP: String indexing inconsistent? - php

I have created a function which randomly generates a phrase from a hardcoded list of words. I have a function get_words() which has a string of hardcoded words, which it turns into an array then shuffles and returns.
get_words() is called by generate_random_phrase(), which loops n times over the array returned by get_words() and on each iteration appends the nth word to the final phrase that is returned to the user.
My problem is that PHP keeps giving me inconsistent results. The words are randomized, but the number of words varies: I specify 4 words as the default and get phrases ranging from 1 to 4 words instead of always 4. The program is so simple that it is almost unbelievable I can't pinpoint the exact issue. The broken link in the chain seems to be the $words array: for some reason the indexing sometimes appears to fail. I am unfamiliar with PHP, can someone explain this to me?
<?php
function generate_random_phrase() {
$words = get_words();
$number_of_words = get_word_count();
$phrase = "";
$symbols = "!##$%^&*()";
echo print_r($phrase);
for ($i = 0;$i < $number_of_words;$i++) {
$phrase .= " ".$words[$i];
}
if (isset($_POST['include_numbers']))
$phrase = $phrase.rand(0, 9);
if (isset($_POST['include_symbols']))
$phrase = $phrase.$symbols[rand(0, 9)];
return $phrase;
}
function get_word_count() {
if ($_POST['word_count'] < 1 || $_POST['word_count'] > 9)
$word_count = 4; #default
else
$word_count = $_POST['word_count'];
return $word_count;
}
function get_words() {
$BASE_WORDS = "my sentence really hope you
like narwhales bacon at midnight but only
ferver where can paper laptops spoon door knobs
head phones watches barbeque not say";
$words = explode(' ', $BASE_WORDS);
shuffle($words);
return $words;
}
?>

In $BASE_WORDS the tabs and newlines end up in the exploded array, either as empty elements or stuck to the neighbouring words, which is why you see fewer words than you asked for. Remove the newlines and tabs and it will generate the correct result, i.e.:
$BASE_WORDS = "my sentence really hope you like narwhales bacon at midnight but only ferver where can paper laptops spoon door knobs head phones watches barbeque not say";
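If you would rather keep the word list spread over several lines, another option is to split on any run of whitespace instead of on a single space. This is only a minimal sketch, assuming the same word list as in the question; the function name get_words_any_whitespace() is made up for this example:

```php
<?php
// Hypothetical variant of get_words(): split on any run of whitespace.
function get_words_any_whitespace() {
    $BASE_WORDS = "my sentence really hope you
        like narwhales bacon at midnight but only
        ferver where can paper laptops spoon door knobs
        head phones watches barbeque not say";
    // \s+ matches spaces, tabs and newlines; PREG_SPLIT_NO_EMPTY drops empty pieces
    $words = preg_split('/\s+/', $BASE_WORDS, -1, PREG_SPLIT_NO_EMPTY);
    shuffle($words);
    return $words;
}
```

With this, every element of the returned array is a real word, so taking the first n elements always yields n words.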

Your function is inconsistent because the array also contains whitespace-only entries, and your loop counts them as words. Asking for 5 words can therefore give you 4 real words plus one blank entry. You could filter out the blank entries (including any other whitespace) first.
Here is a visual representation of what I mean:
Array
(
[0] => // hello im a whitespace, i should not be in here since im not really a word
[1] => but
[2] =>
[3] => bacon
[4] => spoon
[5] => head
[6] => barbeque
[7] =>
[8] =>
[9] => sentence
[10] => door
[11] => you
[12] =>
[13] => watches
[14] => really
[15] => midnight
[16] =>
So when you loop over it, you include the blank entries. In this case, if the number of words is 5, you don't really get 5 words: from indexes 0 to 4 it will look like you only got 3 (1 => but, 3 => bacon, 4 => spoon).
Here is a modified version:
function generate_random_phrase() {
$words = get_words();
$number_of_words = get_word_count();
$phrase = "";
$symbols = "!##$%^&*()";
$words = array_filter(array_map('trim', $words)); // filter empty words
$phrase = implode(' ', array_slice($words, 0, $number_of_words)); // no need for a loop
// this simply gets the array from the first until the desired number of words (0,5 or 0,9 whatever)
// and then implode, just glues all the words with space
// so this ensure its always according to how many words you want
if (isset($_POST['include_numbers']))
$phrase = $phrase.rand(0, 9);
if (isset($_POST['include_symbols']))
$phrase = $phrase.$symbols[rand(0, 9)];
return $phrase;
}

Inconsistent spacing in your words list is the issue.
Here is a fix:
function get_words() {
// join the lines with . so that no newline ends up inside the string
$BASE_WORDS = "my|sentence|really|hope|you|"
. "like|narwhales|bacon|at|midnight|but|only|"
. "ferver|where|can|paper|laptops|spoon|door|knobs|"
. "head|phones|watches|barbeque|not|say";
$words = explode('|', $BASE_WORDS);
shuffle($words);
return $words;
}

Related

How can multiple identical values be printed from an array in PHP?

I'm trying to create a basic concordance script that will print the ten words before and after the value found inside an array. I did this by splitting the text into an array, identifying the position of the value, and then printing -10 and +10 with the searched value in the middle. However, this only presents the first such occurrence. I know I can find the others by using array_keys (found in positions 52, 78, 80), but I'm not quite sure how to cycle through the matches, since array_keys also results in an array. Thus, using $matches (with array_keys) in place of $location below doesn't work, since you cannot use the same operands on an array as an integer. Any suggestions? Thank you!!
<?php
$text = <<<EOD
The spread of a deadly new virus is accelerating, Chinese President Xi Jinping warned, after holding a special government meeting on the Lunar New Year public holiday.
The country is facing a "grave situation" Mr Xi told senior officials.
The coronavirus has killed at least 42 people and infected some 1,400 since its discovery in the city of Wuhan.
Meanwhile, UK-based researchers have warned of a real possibility that China will not be able to contain the virus.
Travel restrictions have come in place in several affected cities. From Sunday, private vehicles will be banned from central districts of Wuhan, the source of the outbreak.
EOD;
$new = explode(" ", $text);
$location = array_search("in", $new, FALSE);
$concordance = 10;
$top_range = $location + $concordance;
$bottom_range = $location - $concordance;
while($bottom_range <= $top_range) {
echo $new[$bottom_range] . " ";
$bottom_range++;
}
?>
You can just iterate over the values returned by array_keys, using array_slice to extract the $concordance words either side of the location and implode to put the sentence back together again:
$words = explode(' ', $text);
$concordance = 10;
$results = array();
foreach (array_keys($words, 'in') as $idx) {
$results[] = implode(' ', array_slice($words, max($idx - $concordance, 0), $concordance * 2 + 1));
}
print_r($results);
Output:
Array
(
[0] => least 42 people and infected some 1,400 since its discovery in the city of Wuhan.
Meanwhile, UK-based researchers have warned of a
[1] => not be able to contain the virus.
Travel restrictions have come in place in several affected cities. From Sunday, private vehicles will
[2] => able to contain the virus.
Travel restrictions have come in place in several affected cities. From Sunday, private vehicles will be banned
)
If you want to avoid generating similar phrases where a word occurs twice within $concordance words (e.g. indexes 1 and 2 in the above array), you can maintain a position for the end of the last match, and skip occurrences that occur in that match:
$words = explode(' ', $text);
$concordance = 10;
$results = array();
$last = 0;
foreach (array_keys($words, 'in') as $idx) {
if ($idx < $last) continue;
$results[] = implode(' ', array_slice($words, max($idx - $concordance, 0), $concordance * 2 + 1));
$last = $idx + $concordance;
}
print_r($results);
Output
Array
(
[0] => least 42 people and infected some 1,400 since its discovery in the city of Wuhan.
Meanwhile, UK-based researchers have warned of a
[1] => not be able to contain the virus.
Travel restrictions have come in place in several affected cities. From Sunday, private vehicles will
)
Demo on 3v4l.org
Try this:
<?php
$text = <<<EOD
The spread of a deadly new virus is accelerating, Chinese President Xi Jinping warned, after holding a special government meeting on the Lunar New Year public holiday.
The country is facing a "grave situation" Mr Xi told senior officials.
The coronavirus has killed at least 42 people and infected some 1,400 since its discovery in the city of Wuhan.
Meanwhile, UK-based researchers have warned of a real possibility that China will not be able to contain the virus.
Travel restrictions have come in place in several affected cities. From Sunday, private vehicles will be banned from central districts of Wuhan, the source of the outbreak.
EOD;
$words = explode(" ", $text);
$concordance = 10; // range -+
$result = []; // result array
$index = 0;
// array_search returns false when nothing is found, and 0 is a valid position,
// so compare strictly against false instead of relying on truthiness
while (($location = array_search("in", $words, false)) !== false) {
$count = count($words);
// check range of array indexes
$minRange = max($location - $concordance, 0); // array can't have an index less than 0
$maxRange = min($location + $concordance, $count - 1); // array can't have an index equal to or higher than the array count
for ($range = $minRange; $range <= $maxRange; $range++) { // <= so the last word of the range is included
$result[$index][] = $words[$range]; // group words by index
}
unset($words[$location]); // delete the element which contains "in"
$words = array_values($words); // reindex array
$index++;
}
print_r($result); // <--- here are your results
?>

Extracting a substring from a string of random letters

I have string that is random in nature example 'CBLBTTCCBB'. My goal is to count the occurrence of the string CTLBT in the string CBLBTTCCBB. Repetition of letters used for checking is not allowed. For example, once CTLBT is formed, the remaining random letters for the next iteration will be BCCB.
The scenario is that we have scratch card where users can win letters to form the word CTLBT. Based on the records of the user the letters that he won are in a string CBLBTTCCBB that is ordered from left to right based on the purchase of the scratch card.
I thought of using strpos but it seems inappropriate since it uses the exact arrangement of the substring from larger string.
Any thoughts on how to solve this?
Thanks!
Note:
Question is not a duplicate of How to count the number of occurrences of a substring in a string? since the solution posted in the given link is different. substr_count counts the occurrence of a substring from a string that assumes the string is in a correct order in which the substring will be formed.
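Instead of strpos you can use preg_replace with a limit of 1 to remove one occurrence of each needed letter per round, and count how many complete rounds succeed:
function rand_substr_count($haystack, $needle)
{
$result = $haystack;
for($i=0; $i<strlen($needle); $i++) {
// preg_quote keeps the pattern valid if a character is special in a regex
$result = preg_replace('/'.preg_quote($needle[$i], '/').'/', '', $result, 1);
}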
if (strlen($haystack) - strlen($result) == strlen($needle)) {
return 1 + rand_substr_count($result, $needle);
} else {
return 0;
}
}
echo rand_substr_count("CBLBTTCCBB", "CTLBT");
If I understood correctly I would do this (with prints for showing the results):
<?php
# The string to test
$string="CBLBTTCCBBTLTCC";
# The winning word
$word="CTLBT";
# Get the letters from the word to use them as unique array keys
$letters=array_unique(str_split($word));
print("Letters needed are:\n".print_r($letters,1)."\n");
# Initialize the work array
$match=array("count" => array(),"length"=> array());
# Iterate over the letters to build the array with the number of times each letter occurs
foreach ($letters as $letter) {
$match['length'][$letter] = substr_count($word,$letter); # Number of times this letter appears in the winning word
$match['count'][$letter] = floor(substr_count($string,$letter) / $match['length'][$letter]); # count the letter (will be 0 if none present) and divide by the number of times it should appear, floored to a whole number
}
print("Count of each letter divided by their appearance times:\n".print_r($match['count'],1)."\n");
# Get the minimum of all letter to know the number of times we can make the winning word
$wins = min($match['count']);
# And print the result
print("Wins: $wins\n");
?>
which outputs:
Letters needed are:
Array
(
[0] => C
[1] => T
[2] => L
[3] => B
)
Count of each letter divided by their appearance times:
Array
(
[C] => 5
[T] => 2
[L] => 2
[B] => 4
)
Wins: 2
As you wish to count the combinations regardless of order, the minimum of the per-letter counts is the number of times the user wins; if one letter is not present, it will be 0.
I'll let you turn this into a function and remove the print lines you don't need ;)
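The counting approach above could be wrapped in a function along these lines. This is only a sketch: the name count_word_formations() and its parameter names are invented for this example, and it assumes single-byte letters and PHP 7+ (for intdiv).

```php
<?php
// Hypothetical helper: how many times can $word be assembled from the letters in $letters?
function count_word_formations($letters, $word) {
    if ($word === '') {
        return 0; // nothing to form
    }
    $have = array_count_values(str_split($letters)); // letters the user owns
    $need = array_count_values(str_split($word));    // letters one copy of the word uses
    $wins = PHP_INT_MAX;
    foreach ($need as $letter => $perWord) {
        $owned = isset($have[$letter]) ? $have[$letter] : 0;
        // each copy of the word consumes $perWord of this letter
        $wins = min($wins, intdiv($owned, $perWord));
    }
    return $wins;
}

echo count_word_formations("CBLBTTCCBBTLTCC", "CTLBT"), "\n"; // 2, matching "Wins: 2" above
```

For the string from the question, count_word_formations("CBLBTTCCBB", "CTLBT") returns 1, because there is only one L.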

Splitting string by fixed length

I am looking for ways to split a Unicode alphanumeric string into fixed lengths.
For example:
992000199821376John Smith 20070603
and the array should look like this:
Array (
[0] => 99,
[1] => 2,
[2] => 00019982,
[3] => 1376,
[4] => "John Smith",
[5] => 20070603
)
array data will be split like this:
Array[0] - Account type - must be 2 characters long,
Array[1] - Account status - must be 1 character long,
Array[2] - Account ID - must be 8 characters long,
Array[3] - Account settings - must be 4 characters long,
Array[4] - User Name - must be 20 characters long,
Array[5] - Join Date - must be 8 characters long.
Or if you want to avoid preg:
$string = '992000199821376John Smith 20070603';
$intervals = array(2, 1, 8, 4, 20, 8);
$start = 0;
$parts = array();
foreach ($intervals as $i)
{
$parts[] = mb_substr($string, $start, $i);
$start += $i;
}
$s = '992000199821376Николай Шмидт 20070603';
if (preg_match('~(.{2})(.{1})(.{8})(.{4})(.{20})(.{8})~u', $s, $match))
{
list (, $type, $status, $id, $settings, $name, $date) = $match;
}
Using the substr function would do this quite easily.
$accountDetails = "992000199821376John Smith 20070603";
$accountArray = array(substr($accountDetails,0,2),substr($accountDetails,2,1),substr($accountDetails,3,8),substr($accountDetails,11,4),substr($accountDetails,15,20),substr($accountDetails,35,8));
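That should do the trick for single-byte text. Note that substr counts bytes rather than characters, so for multibyte (e.g. UTF-8) data use mb_substr with the same offsets. Other than that, regular expressions (as suggested by akond) are probably the way to go (and more flexible). (I figured this was still valid as an alternative option.)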
It is not possible to split a Unicode string the way you ask for without risking invalid parts.
Some code points cannot stand on their own. For example, שׁ is 2 code points (4 bytes in both UTF-8 and UTF-16), and splitting between them leaves a combining mark without its base letter.
When you work with unicode, "character" is a very slippery term. There are code points, glyphs, etc. See more at http://www.utf8everywhere.org, the part on "length of a string"
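To make the difference between bytes and code points concrete, here is a small sketch. It assumes PHP 7+ (for the \u{...} escape) and the mbstring extension, and it spells the example letter as U+05E9 followed by the combining mark U+05C1:

```php
<?php
// Assumes PHP 7+ (for the \u{...} escape) and the mbstring extension.
$shin = "\u{05E9}\u{05C1}"; // HEBREW LETTER SHIN followed by the combining SHIN DOT

$bytes = strlen($shin);                        // strlen counts bytes
$codePoints = mb_strlen($shin, 'UTF-8');       // mb_strlen counts code points
$firstOnly = mb_substr($shin, 0, 1, 'UTF-8');  // cuts between the letter and its mark

echo "$bytes bytes, $codePoints code points\n";
```

A fixed-length split with mb_substr is therefore safe at the code point level, but it can still separate a base letter from its combining mark, which is the problem described above.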

Reading CSV file with unescaped enclosures

I am reading a CSV file but some of the values are not escaped so PHP is reading it wrong. Here is an example of a line that is bad:
" 635"," ","AUBREY R. PHILLIPS (1920- ) - Pastel depicting cottages in
a steep sided river valley, possibly North Wales, signed and dated
2000, framed, 66cm by 48cm. another of a rural landscape, titled verso
"Harvest Time, Somerset" signed and dated '87, framed, 69cm by 49cm.
(2) NB - Aubrey Phillips is a Worcestershire artist who studied at
the Stourbridge School of Art.","40","60","WAT","Paintings, prints and
watercolours",
You can see that "Harvest Time, Somerset" has quotes around it, causing PHP to think it's a new value.
When I do print_r() on each line, the broken lines end up looking like this:
Array
(
[0] => 635
[1] =>
[2] => AUBREY R. PHILLIPS (1920- ) - Pastel depicting cottages in a steep sided river valley, possibly North Wales, signed and dated 2000, framed, 66cm by 48cm. another of a rural landscape, titled verso Harvest Time
[3] => Somerset" signed and dated '87
[4] => framed
[5] => 69cm by 49cm. (2) NB - Aubrey Phillips is a Worcestershire artist who studied at the Stourbridge School of Art."
[6] => 40
[7] => 60
[8] => WAT
[9] => Paintings, prints and watercolours
[10] =>
)
Which is obviously wrong, as it now contains many more array elements than other correct rows.
Here is the PHP i am using:
$i = 1;
if (($file = fopen($this->request->data['file']['tmp_name'], "r")) !== FALSE) {
while (($row = fgetcsv($file, 0, ',', '"')) !== FALSE) {
if ($i == 1){
$header = $row;
}else{
if (count($header) == count($row)){
$lots[] = array_combine($header, $row);
}else{
$error_rows[] = $row;
}
}
$i++;
}
fclose($file);
}
Rows with the wrong amount of values get put into $error_rows and the rest get put into a big $lots array.
What can I do to get around this? Thanks.
If you know that you'll always get entries 0 and 1, and that the last 5 entries in the array are always correct, so it's just the descriptive entry that's "corrupted" because of unescaped enclosure characters, then you could extract the first 2 and last 5 using array_slice(), implode() the remainder back into a single string (restoring the lost quotes), and rebuild the array correctly.
$testData = '" 635"," ","AUBREY R. PHILLIPS (1920- ) - Pastel depicting cottages in a steep sided river valley, possibly North Wales, signed and dated 2000, framed, 66cm by 48cm. another of a rural landscape, titled verso "Harvest Time, Somerset" signed and dated \'87, framed, 69cm by 49cm. (2) NB - Aubrey Phillips is a Worcestershire artist who studied at the Stourbridge School of Art.","40","60","WAT","Paintings, prints and watercolours",';
$result = str_getcsv($testData, ',', '"');
$hdr = array_slice($result,0,2);
$bdy = array_slice($result,2,-5);
$bdy = trim(implode('"',$bdy),'"');
$ftr = array_slice($result,-5);
$fixedResult = array_merge($hdr,array($bdy),$ftr);
var_dump($fixedResult);
result is:
array
0 => string ' 635' (length=4)
1 => string ' ' (length=1)
2 => string 'AUBREY R. PHILLIPS (1920- ) - Pastel depicting cottages in a steep sided river valley, possibly North Wales, signed and dated 2000, framed, 66cm by 48cm. another of a rural landscape, titled verso Harvest Time" Somerset" signed and dated '87" framed" 69cm by 49cm. (2) NB - Aubrey Phillips is a Worcestershire artist who studied at the Stourbridge School of Art.' (length=362)
3 => string '40' (length=2)
4 => string '60' (length=2)
5 => string 'WAT' (length=3)
6 => string 'Paintings, prints and watercolours' (length=34)
7 => string '' (length=0)
Not perfect, but possibly good enough
The alternative is to get whoever is generating the csv to properly escape their enclosures
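For reference, this is roughly what properly escaped enclosures look like. The sketch below assumes the generator could use PHP's own fputcsv(), which doubles embedded quotes; the sample fields are shortened from the question, and the in-memory stream is only there to keep the example self-contained:

```php
<?php
// Sketch: let fputcsv() do the escaping, then read the line back with fgetcsv().
$fields = array(' 635', 'titled verso "Harvest Time, Somerset" signed and dated \'87');

$fh = fopen('php://memory', 'w+');
// embedded quotes are written doubled: ""Harvest Time, Somerset""
fputcsv($fh, $fields, ',', '"', '\\');

rewind($fh);
$line = stream_get_contents($fh); // the raw CSV line as it would appear in the file

rewind($fh);
$row = fgetcsv($fh, 0, ',', '"', '\\'); // parses back into the original two fields
fclose($fh);

var_dump($row === $fields);
```

A file written this way can be read with the fgetcsv() loop from the question without any repair step.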
If you can escape the " in the text like this: \"
then specify the escape character in fgetcsv:
fgetcsv($file, 0, ',', '"', '\\');
This is a long shot, so don't take it too seriously.
I saw a pattern in the text: every ',' you want to ignore has a space after it.
Search and replace ', ' with 'FUU' or something unique.
Now parse the CSV file. It might produce the correct format. You then only need to replace 'FUU' back with ', '.
:)
You are probably reading the contents of the CSV file as an array of lines, then splitting each line on the comma. This fails since some of the fields also contain commas. One trick that could help you out is to look for ",", which would indicate a field separator which would be unlikely (but not impossible, unfortunately) to occur inside a field.
<?php
$csv = file_get_contents("yourfile.csv");
$lines = explode("\r\n", $csv); // split() was removed in PHP 7; explode() does the same for a fixed delimiter
echo "<pre>";
foreach($lines as $line)
{
$line = str_replace("\",\"", "\"###\"", $line);
$fields = explode("###", $line);
print_r($fields);
}
echo "</pre>";
?>
$csv = explode(' ', $csv);
foreach ($csv as $k => $v) if($v !== '' && $v[0] == '"' && substr($v, -1) == '"') { // skip empty pieces so $v[0] is always defined
$csv[$k] = mb_convert_encoding('“' . substr($v, 1, -1) . '”', 'UTF-8', 'HTML-ENTITIES');
}
$csv = implode(' ', $csv);
$csv = str_getcsv($csv);

Get the two most frequent words within several strings

I have a list of phrases and I want to know which two words occurred the most often in all of my phrases.
I tried playing with regex and other codes and I just cannot find the right way to do this.
Can anyone help?
eg:
I am purchasing a wallet
a wallet for 20$
purchasing a bag
I'd know that
a wallet occurred 2 times
purchasing a occurred 2 times
<?php
$string = "I am purchasing a wallet a wallet for 20$ purchasing a bag";
//split string into words
$words = explode(' ', $string);
//make chunks block ie [0,1][2,3]...
$chunks = array_chunk($words, 2);
//remove first array element
unset($words[0]);
//make chunks block ie [0,1][2,3]...
//but since first element is removed , the real block will be [1,2][3,4]...
$alternateChunks = array_chunk($words, 2);
//merge both chunks
$totalChunks = array_merge($chunks,$alternateChunks);
$finalChunks = array();
foreach($totalChunks as $t)
{
//turn each inner chunk into a phrase joined with +
//+ can be replaced with a space, if needed
//+ is used instead of whitespace to keep the associative keys working
$finalChunks[] = implode('+', $t);
}
//count the words inside array
$result = array_count_values($finalChunks);
echo "<pre>";
print_r($result);
I hesitate to suggest this, as it's an extremely brute force way to go about it:
Take your string of words, explode it using the explode(" ", $string); command, then run it through a for loop checking every two word combination against every two words in the string.
$string = "I am purchasing a wallet a wallet for 20$ purchasing a bag";
$words = explode(" ", $string);
$count = array();
$n = count($words);
for ($t=0; $t<$n-1; $t++)
{
for ($i=0; $i<$n-1; $i++)
{
if (($words[$t] . $words[$t+1]) == ($words[$i] . $words[$i+1])) {
$key = $words[$i].$words[$i+1];
$count[$key] = isset($count[$key]) ? $count[$key] + 1 : 1;
}
}
}
So the nested for loop takes each pair of consecutive words and compares it with every pair of consecutive words in the string. Every pair ends up with a count of at least 1 (it always matches itself), and a pair that occurs k times ends up with k*k, so sorting the resulting array by value still puts the most repeated pairs at the top.
Note that this will run (n-1)*(n-1) iterations, which could get unwieldy FAST.
Place them all into an array, and access them by the current word index and next word index.
I think this should do the trick. It will grab pairs of words, unless you are at the end of the string, where you'll get only one word.
$str = "I purchased a wallet because I wanted a wallet a wallet a wallet";
$words = explode(" ", $str);
$array_results = array();
for ($i = 0; $i<count($words); $i++) {
if ($i < count($words)-1) {
$pair = $words[$i] . " " . $words[$i+1]; echo $pair . "\n";
// Have to check if the key is in use yet to avoid a notice
$array_results[$pair] = isset($array_results[$pair]) ? $array_results[$pair] + 1 : 1;
}
// At the end of the array, just use a single word
else $array_results[$words[$i]] = isset($array_results[$words[$i]]) ? $array_results[$words[$i]] + 1 : 1;
}
// Sort the results
// use arsort() instead to get the highest first
asort($array_results);
print_r($array_results);
// Prints:
Array
(
[I wanted] => 1
[wanted a] => 1
[wallet] => 1
[because I] => 1
[wallet because] => 1
[I purchased] => 1
[purchased a] => 1
[wallet a] => 2
[a wallet] => 4
)
Update changed ++ to +1 above since it wasn't working when tested...
Try to put it with explode into an array and count the values with array_count_values.
<?php
$text = "whatever";
$text_array = explode( ' ', $text);
$double_words = array();
for($c = 1; $c < count($text_array); $c++)
{
$double_words[] = $text_array[$c -1] . ' ' . $text_array[$c];
}
$result = array_count_values($double_words);
?>
I updated it now to two word version. Does this work for you?
array(9) {
["I am"]=> int(1)
["am purchasing"]=> int(1)
["purchasing a"]=> int(2)
["a wallet"]=> int(2)
["wallet a"]=> int(1)
["wallet for"]=> int(1)
["for 20$"]=> int(1)
["20$ purchasing"]=> int(1)
["a bag"]=> int(1)
}
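Putting the pieces above together for a list of phrases, a sketch could look like this. The function name count_word_pairs() is invented for this example, and unlike the output above it treats each phrase separately, so a pair such as "wallet a" that only exists across a phrase boundary is not counted:

```php
<?php
// Hypothetical helper: count adjacent word pairs per phrase, most frequent first.
function count_word_pairs(array $phrases) {
    $pairs = array();
    foreach ($phrases as $phrase) {
        // split on whitespace and ignore empty pieces
        $words = preg_split('/\s+/', $phrase, -1, PREG_SPLIT_NO_EMPTY);
        for ($i = 1; $i < count($words); $i++) {
            $pairs[] = $words[$i - 1] . ' ' . $words[$i];
        }
    }
    $counts = array_count_values($pairs);
    arsort($counts); // highest count first
    return $counts;
}

$counts = count_word_pairs(array(
    'I am purchasing a wallet',
    'a wallet for 20$',
    'purchasing a bag',
));
$topTwo = array_slice($counts, 0, 2, true); // the two most frequent pairs
print_r($topTwo);
```

For the three example phrases, the two pairs left in $topTwo are "a wallet" and "purchasing a", each with a count of 2.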
Since you used the excel tag, I thought I'd give it a shot, and it's actually really easy.
1. Split the string using space as the delimiter. Data > Text to Columns... > Delimited > Delimiter: Space. Each word is now in its own cell.
2. Transpose the result (not strictly required but much easier to visualize). Copy, Edit > Paste Special... > Transpose.
3. Make cells containing consecutive word pairs. So if your words are in cells B5:B15, cell C5 should be =B5&" "&B6 (and drag down).
4. Count the occurrences of each word pair: in cell D5, =COUNTIF($C$5:$C$15,"="&C5), drag down.
5. Highlight the winner(s). Select C5:D15, Format > Conditional Formatting... > Formula Is =$D5=MAX($D$5:$D$15) and choose e.g. a yellow background.
Note that there is some inefficiency in step 4 because the count of each word pair will be calculated multiple times if that word pair occurs multiple times. If this is a concern, then you can first make a list of unique word pairs using Data > Filter > Advanced Filter... > Unique records only.
An automated VBA solution could easily be crafted by recording a macro of the above followed by some minor editing.
One way to go about it is to use SPLIT or a regex to split the sentences into words and store each into an array. Then take the array and create a dictionary object. When you add a term to the dictionary, if it's already there, add 1 to the .value to tally the count.
Here is some example code (far from perfect as it's just to show the overlying concept) that will take all the string in column A and generate a word frequency list in columns B and C. It's not exactly what you want, but should give you some ideas on how you can go about doing it I hope:
Sub FrequencyList()
Dim vArray As Variant
Dim myDict As Variant
Set myDict = CreateObject("Scripting.Dictionary")
Dim i As Long
Dim cell As range
With myDict
For Each cell In range("A1", cells(Rows.count, "A").End(xlUp))
vArray = Split(cell.Value, " ")
For i = LBound(vArray) To UBound(vArray)
If Not .exists(vArray(i)) Then
.Add vArray(i), 1
Else
.Item(vArray(i)) = .Item(vArray(i)) + 1
End If
Next
Next
range("B1").Resize(.count).Value = Application.Transpose(.keys)
range("C1").Resize(.count).Value = Application.Transpose(.items)
End With
End Sub
