PHP find n-grams in an array - php

I have a PHP array:
$excerpts = array(
'I love cheap red apples',
'Cheap red apples are what I love',
'Do you sell cheap red apples?',
'I want red apples',
'Give me my red apples',
'OK now where are my apples?'
);
I would like to find all the n-grams in these lines to get a result like this:
cheap red apples: 3
red apples: 5
apples: 6
I tried imploding the array and then parsing the result, but that doesn't work well: concatenating unrelated strings creates new n-grams that span two excerpts which have nothing to do with each other.
How would you proceed?

I want to find groups of words without knowing them in advance,
whereas your function requires me to provide them up front.
Try this:
mb_internal_encoding('UTF-8');
$joinedExcerpts = implode(".\n", $excerpts);
$sentences = preg_split('/[^\s|\pL]/umi', $joinedExcerpts, -1, PREG_SPLIT_NO_EMPTY);
$wordsSequencesCount = array();
foreach($sentences as $sentence) {
$words = array_map('mb_strtolower',
preg_split('/[^\pL+]/umi', $sentence, -1, PREG_SPLIT_NO_EMPTY));
foreach($words as $index => $word) {
$wordsSequence = '';
foreach(array_slice($words, $index) as $nextWord) {
$wordsSequence .= $wordsSequence ? (' ' . $nextWord) : $nextWord;
if( !isset($wordsSequencesCount[$wordsSequence]) ) {
$wordsSequencesCount[$wordsSequence] = 0;
}
++$wordsSequencesCount[$wordsSequence];
}
}
}
$ngramsCount = array_filter($wordsSequencesCount,
function($count) { return $count > 1; });
I'm assuming you only want repeated groups of words.
The output of var_dump($ngramsCount); is:
array (size=11)
'i' => int 3
'i love' => int 2
'love' => int 2
'cheap' => int 3
'cheap red' => int 3
'cheap red apples' => int 3
'red' => int 5
'red apples' => int 5
'apples' => int 6
'are' => int 2
'my' => int 2
The code could be optimized to, for instance, use less memory.
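One possible way to reduce memory, sketched under assumptions: cap the n-gram length so that long, one-off word sequences are never stored. The function name countNgrams and the $maxWords parameter are hypothetical and not part of the answer above. Unlike the original, this sketch treats each excerpt as a single unit and does not split it into sentences first.

```php
<?php
// Hypothetical helper: count n-grams of at most $maxWords words, one excerpt at a time.
function countNgrams(array $excerpts, $maxWords = 3)
{
    $counts = array();
    foreach ($excerpts as $excerpt) {
        // Lowercase so that "Cheap" and "cheap" are counted together
        // (falls back to strtolower when the mbstring extension is missing).
        $lower = function_exists('mb_strtolower')
            ? mb_strtolower($excerpt, 'UTF-8')
            : strtolower($excerpt);
        // Split on any run of non-letter characters.
        $words = preg_split('/[^\pL]+/u', $lower, -1, PREG_SPLIT_NO_EMPTY);
        $total = count($words);
        for ($start = 0; $start < $total; $start++) {
            $sequence = '';
            // Never build sequences longer than $maxWords words.
            $end = min($total, $start + $maxWords);
            for ($i = $start; $i < $end; $i++) {
                $sequence = ($sequence === '') ? $words[$i] : $sequence . ' ' . $words[$i];
                $counts[$sequence] = isset($counts[$sequence]) ? $counts[$sequence] + 1 : 1;
            }
        }
    }
    // Keep only the sequences that occur more than once.
    return array_filter($counts, function ($count) { return $count > 1; });
}

$excerpts = array(
    'I love cheap red apples',
    'Cheap red apples are what I love',
    'Do you sell cheap red apples?',
    'I want red apples',
    'Give me my red apples',
    'OK now where are my apples?'
);
$ngramsCount = countNgrams($excerpts, 3);
```

For the sample excerpts no repeated sequence is longer than three words, so a cap of 3 should give the same counts as the full version while storing far fewer keys.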

The code provided by Pedro Amaral Couto above is very good.
Since I use it for French, I modified the regular expression as follows:
$sentences = preg_split('/[^\s|\pL-\'’]/umi', $joinedExcerpts, -1, PREG_SPLIT_NO_EMPTY);
This way, we can analyze the words containing hyphens and apostrophes ("est-ce que", "j'ai", etc.)

Try this (using implode, since that's what you mentioned as an attempt):
$ngrams = array(
'cheap red apples',
'red apples',
'apples',
);
$joinedExcerpts = implode("\n", $excerpts);
$nGramsCount = array_fill_keys($ngrams, 0);
var_dump($ngrams, $joinedExcerpts);
foreach($ngrams as $ngram) {
$regex = '/(?:^|[^\pL])(' . preg_quote($ngram, '/') . ')(?:$|[^\pL])/umi';
$nGramsCount[$ngram] = preg_match_all($regex, $joinedExcerpts);
}

Assuming you just want to count the number of occurrences of a string:
$cheapRedAppleCount = 0;
$redAppleCount = 0;
$appleCount = 0;
for($i = 0; $i < count($excerpts); $i++)
{
// Patterns need delimiters; the i flag also matches "Cheap red apples".
$cheapRedAppleCount += preg_match_all('/cheap red apples/i', $excerpts[$i]);
$redAppleCount += preg_match_all('/red apples/i', $excerpts[$i]);
$appleCount += preg_match_all('/apples/i', $excerpts[$i]);
}
preg_match_all returns the number of matches in a given string, so you can just add the number of matches to a counter.
See the preg_match_all documentation for more information.
Apologies if I misunderstood.

Related

How to find index of an array when I know only part of string? [duplicate]

This question already has answers here:
Filter multidimensional array based on partial match of search value
(3 answers)
Closed 3 years ago.
I am trying to find a function which can find the index in an array, when I know just part of the string.
The function array_search returns the index only if I know the whole string.
How do I get the index/key when I only have a substring of the array items?
$array = array(0 => 'blue pants', 1 => 'red pants', 2 => 'green pants', 3 => 'green pants');
echo array_search('red', $array);
I need to echo 1.
Use foreach() to make an iteration over the array and strpos() for searching your needle in the elements of the array.
$array = [0 => 'blue pants', 1 => 'red pants', 2 => 'green pants', 3 => 'green pants'];
foreach ($array as $key => $value) {
if (strpos($value, 'red') !== false) {
echo "Key={$key}, Value: {$value}";
break;
}
}
I took this function from a comment by n-regen on php.net; it finds an array value from a part of it (the needle).
Then, using the value found, you can get its index with array_search.
function array_find($needle, $haystack)
{
foreach ($haystack as $item)
{
if (strpos($item, $needle) !== FALSE)
{
return $item;
}
}
}
$array = array(0 => 'blue pants', 1 => 'red pants', 2 => 'green pants', 3 => 'green pants');
$completValue = array_find('red', $array);
echo array_search($completValue , $array);
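You can also use preg_grep, which applies a regex to every item of an array.
This will return all items that contain "red" in your array (the output below is from the linked demo, whose array contains extra items such as "red t-shirt").
And since it's a regex, you can also set the pattern so that it does not match "blue redneck pants", which is harder to do with strpos.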
$return = preg_grep("/red/", $array);
var_dump($return);
/*
array(2) {
[1]=>
string(9) "red pants"
[4]=>
string(11) "red t-shirt"
}
*/
https://3v4l.org/IVUaJ
If you want to exclude the blue redneck pants then use the pattern /\bred\b/
https://3v4l.org/o9rGi
To make the pattern pick up "Red", "red", "RED", then use /\bred\b/i
I noticed you want the keys in return. Just call array_keys($return); and you will get the keys of the items where "red" is mentioned.
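The preg_grep and array_keys steps above can be combined into one small function. This is only a sketch: the name findKeyByWord is hypothetical, and it assumes you want the first matching key, or null when nothing matches.

```php
<?php
// Hypothetical helper: first key whose value contains $word as a whole word, or null.
function findKeyByWord($word, array $haystack)
{
    // preg_quote() escapes regex metacharacters in the search term;
    // \b keeps "red" from matching inside "redneck", and the i flag ignores case.
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
    // preg_grep() preserves the original keys of the matching items.
    $keys = array_keys(preg_grep($pattern, $haystack));
    return $keys ? $keys[0] : null;
}

$array = array(0 => 'blue pants', 1 => 'red pants', 2 => 'green pants', 3 => 'green pants');
echo findKeyByWord('red', $array); // 1
```

Returning null for "not found" avoids confusing a missing match with key 0.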

php explode and calculating the sum of strings

I have a string stored in my database.
Example: 1:11 1,12 2,13 3,14 4,15 5,16
I don't need the 1:11 entry,
and I don't need the numbers after the commas (12, 13, 14, 15, 16).
I just need to explode the string and keep only 1, 2, 3, 4, 5 in the array,
and then calculate their sum.
$tr_arr = $this->data['troops_intrap_num']; // this is the above array from db
$explode_arr = explode(" ", $tr_arr); // exploding the array
print_r($explode_arr); // this will print the array and it should look like this
Array
(
[0] => 1:11
[1] => 1,12
[2] => 2,13
[3] => 3,14
[4] => 4,15
[5] => 5,16
[6] => 0
)
After exploding, I need to end up with something like this:
Array
(
[0] => 1
[1] => 2
[2] => 3
[3] => 4
[4] => 5
)
// and then I need to calculate the sum of the numbers 1+2+3+4+5 = 15 and echo it as 15
The question is: how do I remove the first string (1:11) and the last one (which is 0), and then remove the parts after the commas (12, 13, 14, 15, 16),
so that only 1, 2, 3, 4, 5 are left, and then calculate their sum?
Please help.
You could simply remove the first value using unset(). The last one can be removed in the same manner, but this isn't necessary since array_sum() will simply add zero.
$array = array(
'1:14:aaaa',
'1.5',
'3',
'4.5',
'1.5',
'0',
);
// Unset the first value
unset($array[0]);
// Calculate total amount
$total = array_sum($array);
Edit
Because your array has numbers with a comma as decimal character, you will first need to convert each value.
$array = array_map(function($val) {
// Without removing decimals
// return floatval(str_replace(',', '.', str_replace('.', '', $val)));
// Remove decimals as well
return floor(floatval(str_replace(',', '.', str_replace('.', '', $val))));
}, $array);
Complete answer
$array = array(
'1:11',
'1,12',
'2,13',
'3,14',
'4,15',
'5,16',
'0',
);
// Unset the first value
unset($array[0]);
$array = array_map(function($val) {
// Remove decimals as well
return floor(floatval(str_replace(',', '.', str_replace('.', '', $val))));
}, $array);
// Calculate total amount
$total = array_sum($array);
You can make a blacklist array and check whether the value is contained in that array. If not, add it to a running total:
<?php
// The entries you don't want.
$blacklist = array( "1:11" );
$tr_arr = $this->data['troops_intrap_num']; // this is the above string from the db
$explode_arr = explode(" ", $tr_arr); // exploding the string
$array_sum = 0;
foreach( $explode_arr as $index => $value )
{
if( !in_array($value, $blacklist, true) )
{
// Casting to int keeps only the digits before the comma ("1,12" becomes 1).
$array_sum += (int) $value;
}
}
print "the sum of the array is: " . $array_sum;
?>
I'm not sure I understand exactly what you want, but here's what I would do. It's a little more complicated so that it handles a wider variety of array values, and you can keep using it even if your data changes a little.
I'd loop over the array with foreach, ignore invalid values (the first string and the zero), take the valid values, get the whole number preceding the comma and then add it to the sum.
Of course there are easier and shorter variants, but I'm not sure how much your data will vary.
$tr_arr = $this->data['troops_intrap_num']; // this is the above array from db
$explode_arr = explode(" ", $tr_arr); // exploding the array
$sum = 0;
foreach($explode_arr as $key => $value) {
if($key == 0 OR strpos($value, ',') === false) {
// omit the first string, OR alternatively omit every value without a comma (not sure if you need to remove the first one specifically)
continue;
}
if(!$value) {
// omit the zero at the end
continue;
}
$number = (int)substr($value, 0, strpos($value, ','));
$sum = $sum + $number;
}
echo $sum;
$sum will be what you are looking for, if I understood you correctly.
You want to tally all of the values that precede a comma while iterating. strstr() with a third parameter of true will extract the substring before the needle (and return false if there is no needle).
Code:
$array = array(
'1:11',
'1,12',
'2,13',
'3,14',
'4,15',
'5,16',
'0',
);
$total = 0;
foreach ($array as $value) {
$total += (int)strstr($value, ',', true);
}
echo $total; // 15

Smartest way to extract a string number and convert in to int

I need to create a function which will be able to extract string representations of numbers and return them as integers but I'm unsure about the most efficient way to do this.
I was thinking that I could possibly have a dictionary of numbers and look for matches in the string.
Or I could trim away anything that came before the word "third" and after the word "ninth" and process the results.
string
"What is the third, fifth, sixth and ninth characters to question A"
desired output
array(3,5,6,9);
Rather ugly code (because of "global"), but it works:
$dict = array('third' => 3, 'fifth' => 5, 'sixth' => 6, 'ninth' => 9);
$string = 'What is the third, fifth, sixth and ninth characters to question A';
$output = null;
if (preg_match_all('/(' . implode('|', array_keys($dict)) . ')/', $string, $output))
$output = array_map(function ($in) { global $dict; return $dict[$in]; }, $output[1]);
print_r($output);
Update
The same code without "global":
$dict = array('third' => 3, 'fifth' => 5, 'sixth' => 6, 'ninth' => 9);
$string = 'What is the third, fifth, sixth and ninth characters to question A';
$output = null;
if (preg_match_all('/(' . implode('|', array_keys($dict)) . ')/', $string, $output))
$output = array_map(function ($in) use ($dict) { return $dict[$in]; }, $output[1]);
print_r($output);
Here is a complete, working function:
<?php
function get_numbers($s) {
$str2num = array(
'first' => 1,
'second' => 2,
'third' => 3,
'fourth' => 4,
'fifth' => 5,
'sixth' => 6,
'seventh' => 7,
'eighth' => 8,
'ninth' => 9,
);
// The separator must come first; the reversed argument order was removed in PHP 8.
$pattern = "/(".implode('|', array_keys($str2num)).")/";
preg_match_all($pattern, $s, $matches);
$ans = array();
foreach($matches[1] as $key) {
array_push($ans, $str2num[$key]);
}
return $ans;
}
var_dump(get_numbers("What is the third, fifth, sixth and ninth characters to question A"));
$string = "What is the first, third, first, first, third, sixth and ninth characters to question A";
$numbers = array('first' => 1, 'second' => 2, 'third' => 3); //...
preg_match_all("(".implode('|',array_keys($numbers)).")", $string, $matches );
$result = array();
foreach($matches[0] as $match){
$result[] = $numbers[$match];
}
var_dump($result);
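The dictionary-plus-regex idea used in the answers above can be tightened a little. The following is a sketch, not a definitive version: the function name ordinalsToInts is hypothetical, and it assumes the dictionary keys are lowercase. It adds word boundaries, so that a dictionary word is not matched inside a longer word, and ignores case.

```php
<?php
// Hypothetical helper: map ordinal words found in $text to integers.
// Assumes the keys of $dict are lowercase.
function ordinalsToInts($text, array $dict)
{
    // Escape each dictionary word before building the alternation.
    $alternatives = array_map(function ($word) {
        return preg_quote($word, '/');
    }, array_keys($dict));
    // \b matches whole words only; the i flag makes "Third" match too.
    $pattern = '/\b(' . implode('|', $alternatives) . ')\b/i';
    preg_match_all($pattern, $text, $matches);
    return array_map(function ($word) use ($dict) {
        return $dict[strtolower($word)];
    }, $matches[1]);
}

$dict = array('third' => 3, 'fifth' => 5, 'sixth' => 6, 'ninth' => 9);
$string = 'What is the Third, fifth, sixth and ninth characters to question A';
print_r(ordinalsToInts($string, $dict));
```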

How to split a comma separated string into groups of 2 each and then convert all these groups to an array

I have a string like 1,2,2,3,3,4. First, I want to split it into groups such as (1,2),(2,3),(3,4). Then, how can I turn those groups into an array like {(1,2) (2,3) (3,4)}? I want this because I have an array full of these 1,2-style values, and I've put those values in $_SERVER['query_string']="&exp=".$exp. Please give me any idea for solving this. Currently I use the following to create the groups as a string, but how do I make an array out of it?
function x($value)
{
$buffer = explode(',', $value);
$result = array();
while(count($buffer))
{
$result[] = sprintf('%d,%d', array_shift($buffer), array_shift($buffer));
}
return implode(',', $result);
}
$result = x($expr);
but it's not working as I expected.
I'm not sure I completely understand. You can create pairs of numbers like:
$string = '1,2,3,4,5,6';
$arr = array_chunk(explode(',', $string), 2);
This will give you something like:
array(
array(1, 2),
array(3, 4),
array(5, 6)
)
If you wanted to turn them into a query string, you'd use http_build_query with some data massaging.
Edit: You can build the query like this (100% UNtested):
$numbers = array_map(function($pair) {
return array($pair[0] => $pair[1]);
}, $arr);
$query_string = '?' . http_build_query($numbers);
This:
echo '<pre>';
$str = '1,2,3,4,5,6,7,8';
preg_match_all('/(\d+,\d+)(?=,*)/', $str, $matches);
$pairs = $matches[0];
print_r($pairs);
Outputs:
Array
(
[0] => 1,2
[1] => 3,4
[2] => 5,6
[3] => 7,8
)
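If the goal is an array of "a,b" strings like the output above, array_chunk and implode can be combined without a regex. This is a minimal sketch; the function name pairStrings is hypothetical, and it assumes that a trailing unpaired value should be kept on its own.

```php
<?php
// Hypothetical helper: split a comma-separated list into "a,b" pair strings.
function pairStrings($csv)
{
    // array_chunk() groups the exploded values two at a time;
    // with an odd number of values the last chunk holds a single item.
    $chunks = array_chunk(explode(',', $csv), 2);
    return array_map(function ($pair) {
        return implode(',', $pair);
    }, $chunks);
}

print_r(pairStrings('1,2,2,3,3,4'));
```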

about php... how to convert string to array then show out?

I have this string:
$result = "ABCDE";
and I want to separate it into 5 parts
(like part 1 = A, part 2 = B, part 3 = C, ..., part 5 = E)
and give a name to each of them:
part 1(A) = Apple
part 2(B) = Orange
part 3(C) = Ice-cream
part 4(D) = Water
part 5(E) = Cow
Then the final output should look like
Output : You choose Apple, Orange, Ice-cream, Water, Cow
or like this
$result = "ACE";
Output : You choose Apple, Ice-cream, Cow
I have tried using an array:
$result = "ABCDE";
$showing = array(A => 'Apple , ', B => 'Orange , ', C => 'Ice-cream , ',
D => 'Water , ', E => 'Cow , ');
echo $showing[$result];
but I got no output; it seems the array lookup does not work with the whole string as the key.
I want to know how to do this.
For one line magic:
echo implode('',array_intersect_key($showing,array_flip(str_split($result))));
You can use the function str_split to split the string into individual letters, which can later be used as keys of your associative array.
Also you don't need to add comma at the end of each string. Instead you can use the implode function to join your values:
$input = "ABCDE";
$showing = array('A' => 'Apple', 'B' => 'Orange', 'C' => 'Ice-cream',
'D' => 'Water', 'E' => 'Cow');
$key_arr = str_split($input);
$val_arr = array();
foreach($key_arr as $key) {
$val_arr[] = $showing[$key];
}
echo "You choose ".implode(', ',$val_arr)."\n";
You can access characters from a string similar to elements in an array, like this:
$string = "ABCDE";
echo $string[2]; // "C"
Technically, it's not really treating it like an array, it just uses a similar syntax.
You could use
$choices= array('A' => 'Apple', 'B' => 'Orange', 'C' => 'Ice-cream',
'D' => 'Water', 'E' => 'Cow');
$selected = array();
for ($i = 0, $len = strlen($result); $i < $len; $i++) {
$selected[] = $choices[$result[$i]];
}
echo "You have selected: " . implode(', ', $selected);
Although str_split, as others have suggested, would also work.
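For completeness, here is a sketch of the str_split approach wrapped in a function that skips letters without a matching entry. The function name describeChoices is hypothetical, and silently ignoring unknown letters is an assumption about the desired behaviour.

```php
<?php
// Hypothetical helper: turn a string of letters into a sentence naming the choices.
function describeChoices($letters, array $choices)
{
    $names = array();
    foreach (str_split($letters) as $letter) {
        // Skip letters that have no entry instead of triggering a notice.
        if (isset($choices[$letter])) {
            $names[] = $choices[$letter];
        }
    }
    return 'You choose ' . implode(', ', $names);
}

$choices = array('A' => 'Apple', 'B' => 'Orange', 'C' => 'Ice-cream',
                 'D' => 'Water', 'E' => 'Cow');
echo describeChoices('ACE', $choices); // You choose Apple, Ice-cream, Cow
```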
