In PHP, I need to load a file, get all of the words, and echo each word with the number of times it shows up in the text.
(I also need them in descending order, with the most used words on top.)
Here's an example:
$text = "A very nice únÌcÕdë text. Something nice to think about if you're into Unicode.";
// $words = str_word_count($text, 1); // use this function if you only want ASCII
$words = utf8_str_word_count($text, 1); // use this function if you care about i18n
$frequency = array_count_values($words);
arsort($frequency);
echo '<pre>';
print_r($frequency);
echo '</pre>';
The output:
Array
(
[nice] => 2
[if] => 1
[about] => 1
[you're] => 1
[into] => 1
[Unicode] => 1
[think] => 1
[to] => 1
[very] => 1
[únÌcÕdë] => 1
[text] => 1
[Something] => 1
[A] => 1
)
And the utf8_str_word_count() function, if you need it:
function utf8_str_word_count($string, $format = 0, $charlist = null)
{
$result = array();
if (preg_match_all('~[\p{L}\p{Mn}\p{Pd}\'\x{2019}' . preg_quote($charlist, '~') . ']+~u', $string, $result) > 0)
{
if (array_key_exists(0, $result) === true)
{
$result = $result[0];
}
}
if ($format == 0)
{
$result = count($result);
}
return $result;
}
$words = str_word_count($text, 1);
$word_frequencies = array_count_values($words);
arsort($word_frequencies);
print_r($word_frequencies);
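Since the question starts from a file rather than a string, here is a minimal sketch that puts the pieces together. The helper name word_frequencies_from_file() and the temporary file are only illustrative assumptions; it relies on str_word_count(), so it handles ASCII words only.

```php
<?php
// Hypothetical helper: count word frequencies in a file, most frequent first.
function word_frequencies_from_file(string $path): array
{
    $text = file_get_contents($path);
    if ($text === false) {
        return array(); // file missing or unreadable
    }
    $frequency = array_count_values(str_word_count($text, 1)); // ASCII words only
    arsort($frequency); // sort by count, descending, keeping the words as keys
    return $frequency;
}

// Usage with a throwaway file so the example is self-contained.
$path = tempnam(sys_get_temp_dir(), 'words');
file_put_contents($path, "the cat and the hat and the bat");
$frequency = word_frequencies_from_file($path);
unlink($path);

foreach ($frequency as $word => $count) {
    echo $word, ': ', $count, "\n";
}
```

Swap str_word_count() for the utf8_str_word_count() function above if the file contains non-ASCII words.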
This function uses a regex to find words (you might want to change it, depending on what you define a word as)
function count_words($text)
{
$output = $words = array();
preg_match_all("/[A-Za-z'-]+/", $text, $words); // Find words in the text
foreach ($words[0] as $word)
{
if (!array_key_exists($word, $output))
$output[$word] = 0;
$output[$word]++; // Every time we find this word, we add 1 to the count
}
return $output;
}
This iterates over each word, constructing an associative array (with the word as the key) where the value is the number of occurrences of that word (e.g. $output['hello'] = 3 means hello occurred 3 times in the text).
You might want to change the function to make it case-insensitive (as written, 'hello' and 'Hello' are counted as different words).
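One possible way to handle the case issue is to lowercase the text before matching. This is just a sketch: count_words_ci() is a made-up name, and strtolower() is enough here only because the regex matches ASCII letters anyway.

```php
<?php
// Hypothetical case-insensitive variant of count_words().
function count_words_ci(string $text): array
{
    $words = array();
    // Lowercase first so 'Hello' and 'hello' end up as the same key.
    preg_match_all("/[A-Za-z'-]+/", strtolower($text), $words);
    $output = array_count_values($words[0]);
    arsort($output); // most used words first
    return $output;
}

$counts = count_words_ci("Hello world, hello PHP");
print_r($counts);
```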
echo count(explode('your_word', $your_text)) - 1; // explode() returns one more piece than there are occurrences
Related
I have a file with a lot of rows (over 32k). The rows look like this:
34 Item
5423 11Item
44 Item
The leading digits are IDs. I want to build an associative array: array("34" => "Item", "5423" => "11Item", "44" => "Item")
IDs can be 1 to 5 digits long (1 - 65366)
An item name can start with a digit
There is at least one space (but possibly more than one) between the ID and the item name
So the main separator is one or more spaces. I'm using PHP.
You can use this:
$data = <<<'LOD'
34 Item
5423 11Item
44 Item
546
65535 toto le héros
65536 belzebuth
glups glips
LOD;
$result = array();
$line = strtok($data, "\r\n");
while($line!==false) {
$tmp = preg_split('~\s+~', $line, 2, PREG_SPLIT_NO_EMPTY);
if (count($tmp)==2 && $tmp[0]==(string)(int)$tmp[0] && $tmp[0]<65536)
$result[$tmp[0]] = $tmp[1];
$line = strtok("\r\n");
}
print_r($result);
Here's a method which doesn't check the validity of the data but might work. It splits every line on whitespace and puts the results in a $res associative array.
For information, preg_split() lets you split a string with a regex.
$res = array();
foreach($lines as $line) {
$data = preg_split('/\s+/', trim($line), 2); // limit of 2 keeps names that contain spaces in one piece
$res[$data[0]] = $data[1];
}
If you really want to check your conditions you can add some if statement, with the ID limit:
$res = array();
foreach($lines as $line) {
$data = preg_split('/\s+/', trim($line), 2); // limit of 2 keeps names that contain spaces in one piece
$idx = intval($data[0]);
if($idx > 0 && $idx < 65366) // skip lines where the ID seems invalid
$res[$data[0]] = $data[1];
}
Use preg_match with named capturing groups:
preg_match('/^(?<id>\d+)\s+(?<name>[\w ]+)$/', $row, $matches);
$matches['id'] will contain the ID and $matches['name'] will contain the name.
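$arr = array();
foreach ($rows as $row) { // $rows: the lines of the file, however you read them
if (!preg_match('/^(?<id>\d+)\s+(?<name>[\w ]+)$/', $row, $matches)) {
continue; // skip rows that do not match the expected format
}
$id = (int) $matches['id'];
$name = $matches['name'];
if ($id >= 1 && $id < 65366) { // >= 1 so that ID 1 is accepted
$arr[$id] = $name;
}
}
print_r($arr);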
Example output:
Array
(
[34] => Item
[5423] => 11Item
[44] => Item
[3470] => BLABLA TEF2200
)
Use http://uk3.php.net/preg_split
i.e.
preg_split("/ +/", $line);
It will return an array of strings.
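As a rough sketch of how that could be applied to the whole file: the helper name parse_rows() and the sample rows are assumptions for illustration. It uses \s+ instead of a literal space and passes a limit of 2 so that a name containing spaces stays in one piece.

```php
<?php
// Hypothetical helper: turn "ID   name" rows into an ID => name array.
function parse_rows(array $lines): array
{
    $result = array();
    foreach ($lines as $line) {
        // Limit of 2: split only on the first run of whitespace.
        $parts = preg_split('/\s+/', trim($line), 2);
        if (count($parts) === 2) { // skip rows without a name
            $result[$parts[0]] = $parts[1];
        }
    }
    return $result;
}

$rows = parse_rows(array("34 Item", "5423   11Item", "546", "3470 BLABLA TEF2200"));
print_r($rows);
```

Note that PHP converts numeric string keys such as "34" into integer keys.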
Sorry, English is not my native language, so the question title may not be very good. I want to do something like this.
$str = array("Lincoln Crown","Crown Court","go holiday","house fire","John Hinton","Hinton Jailed");
Here is an array. "Lincoln Crown" contains "Lincoln" and "Crown", so any later element containing either of these two words should be removed; "Crown Court" (which contains "Crown") is removed.
In another case, "John Hinton" contains "John" and "Hinton", so "Hinton Jailed" (which contains "Hinton") is removed. The final output should look like this:
$output = array("Lincoln Crown","go holiday","house fire","John Hinton");
My PHP skills are not good, and this can't be done simply with array_unique() or array_diff(), so I'm asking for help. Thanks.
I think this might work :P
function cool_function($strs){
// Black list
$toExclude = array();
foreach($strs as $s){
// If it's not on blacklist, then search for it
if(!in_array($s, $toExclude)){
// Explode into blocks
foreach(explode(" ",$s) as $block){
// Search the block on array
$found = preg_grep("/" . preg_quote($block, "/") . "/", $strs); // also escape the "/" delimiter
foreach($found as $k => $f){
if($f != $s){
// Place each found item that's different from current item into blacklist
$toExclude[$k] = $f;
}
}
}
}
}
// Unset all keys that were found
foreach($toExclude as $k => $v){
unset($strs[$k]);
}
// Return the result
return $strs;
}
$strs = array("Lincoln Crown","Crown Court","go holiday","house fire","John Hinton","Hinton Jailed");
print_r(cool_function($strs));
Dump:
Array
(
[0] => Lincoln Crown
[2] => go holiday
[3] => house fire
[4] => John Hinton
)
Seems like you would need a loop and then build a list of words in the array.
Like:
<?php
// Store existing array's words; elements will compare their words to this array
// if an element's words are already in this array, the element is deleted
// else the element has its words added to this array
$arrayWords = array();
// Loop through your existing array of elements
foreach ($existingArray as $key => $phrase) {
// Get element's individual words
$words = explode(" ", $phrase);
// Assume the element will not be deleted
$keepWords = true;
// Loop through the element's words
foreach ($words as $word) {
// If one of the words is already in arrayWords (another element uses the word)
if (in_array($word, $arrayWords)) {
// Delete the element
unset($existingArray[$key]);
// Indicate we are not keeping any of the element's words
$keepWords = false;
// Stop the foreach loop
break;
}
}
// Only add the element's words to arrayWords if the entire element stays
if ($keepWords) {
$arrayWords = array_merge($arrayWords, $words);
}
}
?>
Here is how I would do it in your case:
$words = array();
foreach($str as $key =>$entry)
{
$entryWords = explode(' ', $entry);
$isDuplicated = false;
foreach($entryWords as $word)
if(in_array($word, $words))
$isDuplicated = true;
if(!$isDuplicated)
$words = array_merge($words, $entryWords);
else
unset($str[$key]);
}
var_dump($str);
Output:
array (size=4)
0 => string 'Lincoln Crown' (length=13)
2 => string 'go holiday' (length=10)
3 => string 'house fire' (length=10)
4 => string 'John Hinton' (length=11)
I can imagine quite a few techniques that can provide your desired output, but the logic that you require is poorly defined in your question. I am assuming that whole word matching is required -- so word boundaries should be used in any regex patterns. Case sensitivity isn't mentioned. I am unsure if only fully unique elements (multi-word strings) should have their words entered into the black list. I'll offer a few snippets, but choosing the appropriate technique will depend on exact logical requirements.
Demo
$output = [];
$blacklist = [];
foreach ($input as $string) {
if (!$blacklist || !preg_match('/\b(?:' . implode('|', $blacklist) . ')\b/', $string)) {
$output[] = $string;
}
foreach(explode(' ', $string) as $word) {
$blacklist[$word] = preg_quote($word);
}
}
var_export($output);
Demo
$output = [];
$blacklist = [];
foreach ($input as $string) {
$words = explode(' ', $string);
foreach ($words as $word) {
if (in_array($word, $blacklist)) {
continue 2;
}
}
array_push($blacklist, ...$words);
$output[] = $string;
}
var_export($output);
And my favorite, because it performs the fewest iterations in the parent loop, is more compact, and doesn't require the declaration or maintenance of a blacklist array.
Demo
$output = [];
while ($input) {
$output[] = $words = array_shift($input);
$input = preg_grep('~\b(?:\Q' . str_replace(' ', '\E|\Q', $words) . '\E)\b~', $input, PREG_GREP_INVERT);
}
var_export($output);
You can explode each string in the original array and then compare word by word using a loop (comparing each word from one element with each word from another, and if they match, removing the whole element).
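A minimal sketch of that idea, assuming whole-word, case-sensitive matching and that only words from kept elements count against later ones. The function name is made up for the example.

```php
<?php
// Hypothetical helper: drop every phrase that shares a word with an earlier kept phrase.
function remove_phrases_sharing_words(array $phrases): array
{
    $seen = array(); // words of the kept phrases, stored as keys for fast lookup
    $kept = array();
    foreach ($phrases as $phrase) {
        $words = array_flip(explode(' ', $phrase));
        if (array_intersect_key($words, $seen)) {
            continue; // at least one word was already used by a kept phrase
        }
        $seen += $words;
        $kept[] = $phrase;
    }
    return $kept;
}

$str = array("Lincoln Crown", "Crown Court", "go holiday", "house fire", "John Hinton", "Hinton Jailed");
$output = remove_phrases_sharing_words($str);
print_r($output);
```

Unlike the unset()-based answers, this returns a re-indexed array (keys 0 to 3).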
array_unique() example
<?php
$input = array("a" => "green", "red", "b" => "green", "blue", "red");
$result = array_unique($input);
print_r($result);
?>
output:
Array
(
[a] => green
[0] => red
[1] => blue
)
I have the following code:
<?php
$word = "aeagle";
$letter = "e";
$array = strposall($aegle, $letter);
print_r($array);
function strposall($haystack, $needle) {
$occurrence_points = array();
$pos = strpos($haystack, $needle);
if ($pos !== false) {
array_push($occurrence_points, $pos);
}
while ($pos = strpos($haystack, $needle, $pos + 1)) {
array_push($occurrence_points, $pos);
}
return $occurrence_points;
}
?>
As in the example, if I have aegle as my word and I'm searching for e within it, the function should return an array with the values 1 and 4 in it.
What's wrong with my code?
Why not try this instead:
$word = "aeagle";
$letter = "e";
$occurrence_points = array_keys(array_intersect(str_split($word), array($letter)));
var_dump($occurrence_points);
I think you're passing the wrong parameter; it should be $word instead of $aegle.
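For reference, a sketch of the function with the right variable passed in. The loop is also tightened here (my own variation, not the asker's code) so that the comparison against false is explicit and a match at position 0 needs no special case.

```php
<?php
function strposall(string $haystack, string $needle): array
{
    $positions = array();
    if ($needle === '') {
        return $positions; // an empty needle has no meaningful positions
    }
    $offset = 0;
    // Compare with false explicitly: a match at position 0 is falsy.
    while (($pos = strpos($haystack, $needle, $offset)) !== false) {
        $positions[] = $pos;
        $offset = $pos + 1; // continue searching after this match
    }
    return $positions;
}

$word = "aeagle";
$letter = "e";
$array = strposall($word, $letter); // pass $word, not $aegle
print_r($array); // positions 1 and 5
```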
Little bit more literal than the other answer:
function charpos($str, $char) {
$i = 0;
$pos = 0;
$matches = array();
if (strpos($str, $char) === false) {
return false;
}
while (!!$str) {
$pos = strpos($str, $char);
if ($pos === false) {
$str = '';
} else {
$i = $i + $pos;
$str = substr($str, $pos + 1);
array_push($matches, $i++);
}
}
return $matches;
}
https://ignite.io/code/511ff26eec221e0741000000
Using:
$str = 'abc is the place to be heard';
$positions = charpos($str, 'a');
print_r($positions);
while ($positions) {
$i = array_shift($positions);
echo "$i: $str[$i]\n";
}
Which gives:
Array (
[0] => 0
[1] => 13
[2] => 25
)
0: a
13: a
25: a
Others have pointed out that you're passing the wrong parameters. But you're also reinventing the wheel. Take a look at PHP's preg_match_all: it already returns an array of all matches with their offsets when used with the following flag.
flags
flags can be the following flag:
PREG_OFFSET_CAPTURE
If this flag is passed, for every occurring match the appendant string offset will also be returned. Note that this changes the value of matches into an array where every element is an array consisting of the matched string at offset 0 and its string offset into subject at offset 1.
Use a single letter pattern for the search term $letter = '/e/' and you should get back an array with all your positions as the second element of each result array, which you can then finagle into the output format you're looking for.
Update: Jared points out that you do get the capture of the pattern back, but with the flag set, you also get the offset. As a direct answer to the OP's question, try this code:
$word = "aeagle";
$pattern = "/e/";
$matches = array();
preg_match_all($pattern, $word, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
It has the following output:
Array
(
// Matches of the whole pattern: /e/
[0] => Array
(
// First match
[0] => Array
(
// Substring of $word that matched
[0] => e
// Offset into $word where previous substring starts
[1] => 1
)
[1] => Array
(
[0] => e
[1] => 5
)
)
)
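The result is nested three levels deep rather than two because preg_match_all groups matches by capture group: index 0 holds the matches of the whole pattern, and any parenthesized subpatterns would follow at indexes 1, 2, and so on. This pattern has no subpatterns, so all the hits are in the first array.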
And unlike the OP originally stated, 1 and 5 are the correct indexes of the letter e in the string 'aeagle'
aeagle
012345
 ^   ^
 1   5
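To get from that nested result to a flat list of positions, one option (a sketch, not the only way) is array_column(), which picks the offset out of each match/offset pair:

```php
<?php
$word = "aeagle";
$matches = array();
preg_match_all('/e/', $word, $matches, PREG_OFFSET_CAPTURE);

// Each entry of $matches[0] is array(matched text, offset); keep only the offset.
$positions = array_column($matches[0], 1);
print_r($positions); // offsets 1 and 5
```

Keep in mind that these offsets are byte offsets, which only matters for multibyte text.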
Performance-wise, the customized version of strposall would probably be faster than a regular expression match. But learning to use a built-in function is almost always faster than developing, testing, supporting and maintaining your own code. And 9 times out of 10, that's the most expensive part of programming.
There is a string variable containing a number, say $x = "OP/99/DIR";. The position of the number may change at any time, because the user can modify it inside the application, and the slash may be replaced by any other character; but the number is mandatory. How can I replace the number with a different one? For example, OP/99/DIR becomes OP/100/DIR.
$string="OP/99/DIR";
$replace_number=100;
$string = preg_replace('!\d+!', $replace_number, $string);
print $string;
Output:
OP/100/DIR
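Assuming the number only occurs once:
$content = str_replace($numberToReplace, $numberToReplaceWith, $originalText);
To change the first occurrence only (str_replace() has no limit argument; its fourth parameter is a by-reference count, so use preg_replace() with a limit of 1):
$content = preg_replace('/' . preg_quote($numberToReplace, '/') . '/', $numberToReplaceWith, $originalText, 1);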
Using regex and preg_replace
$x="OP/99/DIR";
$new = 100;
$x=preg_replace('/\d+/',$new,$x); // the /e modifier was removed in PHP 7 and is not needed here
print $x;
The most flexible solution is to use preg_replace_callback(), so you can do whatever you want with the matches. This matches a single number in the string and then replaces it with the number plus one.
root#xxx:~# more test.php
<?php
function callback($matches) {
//If there's another match, do something, if invalid
return $matches[0] + 1;
}
$d[] = "OP/9/DIR";
$d[] = "9\$OP\$DIR";
$d[] = "DIR%OP%9";
$d[] = "OP/9321/DIR";
$d[] = "9321\$OP\$DIR";
$d[] = "DIR%OP%9321";
//Change regexp to use the proper separator if needed
$d2 = preg_replace_callback("(\d+)","callback",$d);
print_r($d2);
?>
root#xxx:~# php test.php
Array
(
[0] => OP/10/DIR
[1] => 10$OP$DIR
[2] => DIR%OP%10
[3] => OP/9322/DIR
[4] => 9322$OP$DIR
[5] => DIR%OP%9322
)
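The same idea also works with an anonymous function and a limit, for instance if only the first number in the string should change. This is just a sketch; the sample string with a second number is made up for the example.

```php
<?php
$x = "OP/99/DIR-7"; // hypothetical value that contains a second number
$new = 100;

// The fourth argument (limit) of 1 means only the first match is replaced.
$result = preg_replace_callback(
    '/\d+/',
    function ($matches) use ($new) {
        return (string) $new;
    },
    $x,
    1
);

echo $result, "\n"; // OP/100/DIR-7
```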
I have a string:
ABCDEFGHIJK
And I have two arrays of positions in that string where I want to insert different things.
Array
(
[0] => 0
[1] => 5
)
Array
(
[0] => 7
[1] => 9
)
If I decided to insert the # character and the = character, that would produce:
#ABCDE=FG#HI=JK
Is there any way I can do this without a complicated set of substr?
Also, # and = need to be variables that can be of any length, not just one character.
You can treat the string as an array of characters:
$str = "ABCDEFGH";
$characters = preg_split('//', $str, -1, PREG_SPLIT_NO_EMPTY);
Afterwards, use array_splice() to insert '#' or '=' at the positions given by the arrays.
Turning the array back into a string is done by:
$str = implode("", $characters);
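A rough sketch of the array_splice() step, assuming the positions are collected into one position => text map (my own arrangement of the question's data) and processed from the highest position down so that earlier positions are not shifted:

```php
<?php
$str = "ABCDEFGHIJK";
$characters = str_split($str); // one array element per character

// Assumed layout: position in the original string => text to insert there.
$inserts = array(0 => '#', 5 => '=', 7 => '#', 9 => '=');
krsort($inserts); // highest position first

foreach ($inserts as $position => $text) {
    // A length of 0 removes nothing, so the text is purely inserted.
    array_splice($characters, $position, 0, array($text));
}

$str = implode('', $characters);
echo $str, "\n"; // #ABCDE=FG#HI=JK
```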
This works for any number of characters (I am using "#a" and "=b" as the character sequences):
function array_insert($array,$pos,$val)
{
$array2 = array_splice($array,$pos);
$array[] = $val;
$array = array_merge($array,$array2);
return $array;
}
$s = "ABCDEFGHIJK";
$arr = str_split($s);
$arr_add1 = array(0=>0, 1=>5);
$arr_add2 = array(0=>7, 1=>9);
$char1 = '#a';
$char2 = '=b';
// Each inserted string occupies a single array element, whatever its length,
// so later positions shift by the number of insertions made so far.
$arr = array_insert($arr, $arr_add1[0], $char1);
$arr = array_insert($arr, $arr_add1[1] + 1, $char2);
$arr = array_insert($arr, $arr_add2[0] + 2, $char1);
$arr = array_insert($arr, $arr_add2[1] + 3, $char2);
$s = implode("", $arr);
print_r($s);
There is an easy function for that: substr_replace. But for this to work, you would have to structure your array differently (which would be more structured anyway), e.g.:
$replacement = array(
0 => '#',
5 => '=',
7 => '#',
9 => '='
);
Then sort the array by keys descending, using krsort:
krsort($replacement);
And then you just need to loop over the array:
$str = "ABCDEFGHIJK";
foreach($replacement as $position => $rep) {
$str = substr_replace($str, $rep, $position, 0);
}
echo $str; // prints #ABCDE=FG#HI=JK
This works by inserting the replacements starting from the end of string. And it would work with any replacement string without having to determine the length of that string.
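If you want to keep the two position arrays from the question, a sketch like the following could build the map first. The helper name and the reading of each array as a (first string, second string) pair of positions are assumptions; "#a" and "=b" stand in for strings of any length.

```php
<?php
// Hypothetical helper: insert texts at the given positions, working from the end.
function insert_at_positions(string $str, array $insertions): string
{
    krsort($insertions); // highest position first, so earlier positions stay valid
    foreach ($insertions as $position => $text) {
        $str = substr_replace($str, $text, $position, 0); // length 0 means pure insert
    }
    return $str;
}

$first  = array(0, 5);
$second = array(7, 9);
$char1  = '#a';
$char2  = '=b';

// Assumption: in each pair, index 0 gets $char1 and index 1 gets $char2.
$insertions = array();
foreach (array($first, $second) as $pair) {
    $insertions[$pair[0]] = $char1;
    $insertions[$pair[1]] = $char2;
}

$result = insert_at_positions("ABCDEFGHIJK", $insertions);
echo $result, "\n"; // #aABCDE=bFG#aHI=bJK
```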