Remove lines containing a specific number of matching words - PHP

My question is easiest to explain with an example.
Suppose this is the text file, which contains these lines:
hello this is my word file and this is line number 1
hello this is second line and this is some text
hello this is third line and again some text
jhasg djgha sdgasjhgdjasgh jdkh
sdhgfkjg sdjhgf sjkdghf sdhf
s hdg fjhsgd fjhgsdj gfj ksdgh
I want to read each line into a variable,
then split that line into an array of words,
then compare that array with all the words of each of the following lines.
If more than 3 words match, that line is deleted.
So in the above example the output should be:
hello this is my word file and this is line number 1
jhasg djgha sdgasjhgdjasgh jdkh
sdhgfkjg sdjhgf sjkdghf sdhf
s hdg fjhsgd fjhgsdj gfj ksdgh
Because "hello this is line" is more than 3 matching words, the lines containing those words are deleted. Please note that the first line is not deleted because it is unique.
I tried to code this myself and created a mess that produced a 200 MB text file filled with endless copies of the first line. Anyway, here is the code; don't execute it, or you may end up with a full hard disk.
<?php
$fileA = fopen("names.txt", "r");
$fileB = fopen("anothernames.txt", "r");
$fileC = fopen("uniquenames.txt", "w");
while(!feof($fileA))
{
$line = fgets($fileA);
$words = explode(" ", $line);
$size = count($words);
while(!feof($fileA))
{
$line1 = fgets($fileB);
$words1 = explode(" ", $line1);
$size1 = count($words1);
$c=0;
for($i=0; $i<$size; $i++)
{
for($j=0; $j<$size1; $j++)
{
if($words[$i]==$words1[$j])
$c++;
}
}
if($c<3)
fwrite($fileC, $line);
}
}
fclose($fileA);
fclose($fileB);
fclose($fileC);
?>
Thanks

An easy approach would be the following:
read all the lines using file();
create an array that maps each word to the sentences containing it;
finally, build a blacklist of every sentence that appears in any of those arrays holding 3 or more entries for a word.
Then print every line except the blacklisted ones.
Example:
<?php
$lines = array("hello this is my word file and this is line number 1",
"hello this is second line and this is some text",
"hello this is third line and again some text",
"jhasg djgha sdgasjhgdjasgh jdkh",
"sdhgfkjg sdjhgf sjkdghf sdhf",
"s hdg fjhsgd fjhgsdj gfj ksdgh");
//$lines = file("path/to/file");
$result = array();
//build "count-per-word" array
foreach ($lines AS $line){
$words = explode(" ", $line);
foreach ($words AS $word){
$word = strtolower($word);
if (isset($result[$word]))
$result[$word][] = $line;
else
$result[$word] = array($line);
}
}
//Blacklist each sentence, containing a word appearing in 3 sentences.
$blacklist = array();
foreach ($result AS $word => $entries){
if (count($entries) >= 3){
foreach($entries AS $entry){
$blacklist[] = $entry;
}
}
}
//list all not blacklisted.
foreach ($lines AS $line){
if (!in_array($line, $blacklist))
echo $line."<br />";
}
?>
Output:
jhasg djgha sdgasjhgdjasgh jdkh
sdhgfkjg sdjhgf sjkdghf sdhf
s hdg fjhsgd fjhgsdj gfj ksdgh
Note that this will also blacklist a single sentence containing the same word 3 times, such as "Foo Foo Foo bar".
To avoid this, check whether the line is already "known" for a certain word before pushing it to the array:
foreach ($words AS $word){
if (isset($result[$word])){
if (!in_array($line, $result[$word])){
$result[$word][] = $line;
}
}else
$result[$word] = array($line);
}

#second
while(!feof($fileA))
#should be
while(!feof($fileB))
and
if($c<3)
fwrite($fileC, $line);
#should be
if($c<3){
fwrite($fileC, $line);
continue 2;
}
but
then compare that array which contains words of that line WITH all the words of next lines
only makes sense when you compare the file with itself!
EDIT: my post makes no sense at all; read the note in the previous post!

Why not just array_intersect?
php > $l1 = 'hello this is my word file and this is line number 1';
php > $l2 = 'hello this is second line and this is some text';
php > $a1 = explode(" ", $l1);
php > $a2 = explode(" ", $l2);
php > var_dump(array_intersect($a1, $a2));
array(7) {
[0]=>
string(5) "hello"
[1]=>
string(4) "this"
[2]=>
string(2) "is"
[6]=>
string(3) "and"
[7]=>
string(4) "this"
[8]=>
string(2) "is"
[9]=>
string(4) "line"
}
if (count(array_intersect($a1, $a2)) >= 3) {
    // skip line
}
Or am I reading your "matching" too loosely?
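Building on the array_intersect idea, a minimal sketch of the whole filter could look like the following. It assumes a line should be dropped when it shares more than 3 distinct words with any earlier line that was kept, so the first occurrence survives as in the expected output of the question. The function name removeSimilarLines and the $threshold parameter are made up for illustration.

```php
<?php
// Hypothetical helper: the name, signature and matching rule are assumptions.
function removeSimilarLines(array $lines, int $threshold = 3): array
{
    $kept = [];       // lines that survive the filter
    $keptWords = [];  // distinct lower-cased words of each kept line

    foreach ($lines as $line) {
        $words = array_unique(
            preg_split('/\s+/', strtolower(trim($line)), -1, PREG_SPLIT_NO_EMPTY)
        );

        $tooSimilar = false;
        foreach ($keptWords as $earlier) {
            // array_intersect returns the entries of $words that also occur in $earlier.
            if (count(array_intersect($words, $earlier)) > $threshold) {
                $tooSimilar = true;
                break;
            }
        }

        if (!$tooSimilar) {
            $kept[] = $line;
            $keptWords[] = $words;
        }
    }

    return $kept;
}

$lines = [
    'hello this is my word file and this is line number 1',
    'hello this is second line and this is some text',
    'hello this is third line and again some text',
    'jhasg djgha sdgasjhgdjasgh jdkh',
    'sdhgfkjg sdjhgf sjkdghf sdhf',
    's hdg fjhsgd fjhgsdj gfj ksdgh',
];
print_r(removeSimilarLines($lines));
```

To run it against a file, the lines could be loaded with file('names.txt', FILE_IGNORE_NEW_LINES). Every line is compared with every kept line, so this is quadratic in the worst case and is probably only suitable for small files.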

Related

Split a string on every nth character and ensure that all segment strings have the same length

I want to split the following string into 3-letter elements. Additionally, I want all elements to have 3 letters even when the input string cannot be split evenly.
Sample string with 10 characters:
$string = 'lognstring';
The desired output:
$output = ['log','nst','rin','ing'];
Notice how the "in" late in the input string is used a second time to make the last element "full length".
Hope this helps you.
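$str = 'lognstring';
$arr = str_split($str, 3);       // ["log","nst","rin","g"]
if (strlen($str) % 3 !== 0) {
    array_pop($arr);             // drop the short trailing chunk
    $arr[] = substr($str, -3);   // append the last 3 characters instead
}
print_r($arr);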
$str = 'lognstring';
$chunk = 3;
$arr = str_split($str, $chunk); //["log","nst","rin","g"]
if(strlen(end($arr)) < $chunk) //if last item string length is under $chunk
$arr[count($arr)-1] = substr($str, -$chunk); //replace last item to last $chunk size of $str
print_r($arr);
/**
array(4) {
[0]=>
string(3) "log"
[1]=>
string(3) "nst"
[2]=>
string(3) "rin"
[3]=>
string(3) "ing"
}
*/
Unlike the earlier answers, which blast the string with str_split() and then come back to mop up the last element if needed, I'll demonstrate a technique that populates the array of substrings in one clean pass.
To conditionally pull back the last iteration's starting point, use either a ternary condition or min(). I prefer the syntactic brevity of min().
Code:
$string = 'lognstring';
$segmentLength = 3;
$totalLength = strlen($string);
for ($i = 0; $i < $totalLength; $i += $segmentLength) {
$result[] = substr($string, min($totalLength - $segmentLength, $i), $segmentLength);
}
var_export($result);
Output:
array (
0 => 'log',
1 => 'nst',
2 => 'rin',
3 => 'ing',
)
Alternatively, you can prepare the string BEFORE splitting (instead of after).
Code:
$extra = strlen($string) % $segmentLength;
var_export(
str_split(
$extra
? substr($string, 0, -$extra) . substr($string, -$segmentLength)
: $string,
$segmentLength
)
);

Find the word in a sentence with maximum specific character count

I am new to PHP development and, with the help of SO, I was finally able to write a program that finds the word in a sentence with the highest count of any single character.
Below is what I have tried:
<?php
// Program to find the word in a sentence with maximum specific character count
// Example: "O Romeo, Romeo, wherefore art thou Romeo?"
// Solution: wherefore
// Explanation: Because "e" came three times
$content = file_get_contents($argv[1]); // Reading content of file
$max = 0;
$arr = explode(" ", $content); // entire array of strings with file contents
for($x =0; $x<count($arr); $x++) // looping through entire array
{
$array[$x] = str_split($arr[$x]); // converting each of the string into array
}
for($x = 0; $x < count($arr); $x++)
{
$count = array_count_values($array[$x]);
$curr_max = max($count);
if($curr_max > $max)
{
$max = $curr_max;
$word = $arr[$x];
}
}
echo $word;
?>
Question: Since I am new to PHP development, I don't know the optimization techniques. Is there any way I can optimize this code? Also, can I use regex to optimize it further? Kindly guide.
I love coding this type of mini-challenge in as few lines of code as possible :D. So here is my solution:
function wordsWithMaxCharFrequency($sentence) {
$words = preg_split('/\s+/', $sentence);
$maxCharsFrequency = array_map (function($word) {
return max(count_chars(strtolower($word)));
}, $words);
return array_map(function($index) use($words) {
return $words[$index];
}, array_keys($maxCharsFrequency, max($maxCharsFrequency)));
}
print_r(wordsWithMaxCharFrequency("eeee yyyy"));
//Output: Array ( [0] => eeee [1] => yyyy )
print_r(wordsWithMaxCharFrequency("xx llll x"));
//Output: Array ( [0] => llll )
Update1:
If you want only A-Za-z words, use the following code:
$matches = [];
//a word is either followed by a space or end of input
preg_match_all('/([a-z]+)(?=\s|$)/i', $sentence, $matches);
$words = $matches[1];
Just a contribution that could inspire you :D!
Good Luck.
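For comparison, here is a minimal sketch that returns a single word, as the script in the question does, instead of every tied word. It uses count_chars() in mode 1, which only reports bytes that actually occur. The function name and the decision to ignore punctuation and letter case are assumptions, not part of the question or the answer above.

```php
<?php
// Hypothetical helper: the name and the letters-only, case-insensitive rule are assumptions.
function firstWordWithMaxCharCount(string $sentence): ?string
{
    $best = null;
    $bestCount = 0;

    foreach (preg_split('/\s+/', $sentence, -1, PREG_SPLIT_NO_EMPTY) as $word) {
        // Count letters only, so "Romeo," and "Romeo?" are treated like "romeo".
        $letters = preg_replace('/[^a-z]/', '', strtolower($word));
        if ($letters === '') {
            continue; // nothing to count, and max() must not receive an empty array
        }

        // count_chars(..., 1) maps each byte that occurs to its number of occurrences.
        $count = max(count_chars($letters, 1));
        if ($count > $bestCount) {
            $bestCount = $count;
            $best = $word;
        }
    }

    return $best; // null when the sentence contains no letters
}

echo firstWordWithMaxCharCount('O Romeo, Romeo, wherefore art thou Romeo?'), "\n"; // prints: wherefore
```

Because the comparison is strictly greater-than, the first word reaching the maximum wins when several words tie.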

PHP script to go through each line of CSV and use IF statement

I am new to PHP and am trying to create a script that goes through a CSV.
For each row (excluding the header), it should check whether columns 2 and 3 (the columns being numbered 0, 1, 2, 3) add up to 1 or more; if so, display a "1" in column 1. If columns 2 and 3 add up to less than 1, display "0" in column 1.
An example of the CSV is displayed below:
sku,is_in_stock,warehouse_3,warehouse_4
AP-STYLUS,1,20,5
RC-3049,0,0,0
NFNC-FLAT-CAP,1,20,20
NFNC-HOOD14-ZIP-S,1,0,5
How can this be done?
You need to replace file.csv with the real filename.
<?php
$str = file_get_contents("file.csv");
/* Sample data for testing without a file:
$str = "sku,is_in_stock,warehouse_3,warehouse_4
AP-STYLUS,0,20,5
RC-3049,0,0,0
NFNC-FLAT-CAP,0,20,20
NFNC-HOOD14-ZIP-S,1,0,5";
*/
$arr = explode("\n", $str);
$result = array();
foreach ($arr as $line) {
    $linearr = explode(",", $line);
    // Only rows with numeric stock columns are changed;
    // the header and any blank lines pass through untouched.
    if (isset($linearr[3]) && is_numeric($linearr[2])) {
        $linearr[1] = ($linearr[2] + $linearr[3] >= 1) ? "1" : "0";
        $line = implode(",", $linearr);
    }
    $result[] = $line;
}
$newstr = implode("\n", $result);
file_put_contents("file.csv", $newstr);
?>
Edit: sorry, I forgot about the "0" part.
https://3v4l.org/63vmh
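As an alternative to splitting on commas by hand, the same rule can be sketched with PHP's CSV functions fgetcsv() and fputcsv(), which also cope with quoted fields. This is only a sketch under some assumptions: the function name updateStockFlags is invented, the first row is assumed to be the header, and the result is written to a separate output file rather than over the input.

```php
<?php
// Hypothetical helper: the name, signature and two-file design are assumptions.
function updateStockFlags(string $inPath, string $outPath): void
{
    $in = fopen($inPath, 'r');
    $out = fopen($outPath, 'w');
    if ($in === false || $out === false) {
        throw new RuntimeException('Could not open the CSV files');
    }

    $isHeader = true;
    while (($row = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv() reports a blank line as an array holding a single null
        }
        if ($isHeader) {
            $isHeader = false; // the first row is assumed to be the header and is copied as-is
        } elseif (count($row) >= 4) {
            // Column 1 becomes "1" when columns 2 and 3 add up to at least 1, else "0".
            $row[1] = ((int) $row[2] + (int) $row[3] >= 1) ? '1' : '0';
        }
        fputcsv($out, $row, ',', '"', '\\');
    }

    fclose($in);
    fclose($out);
}

// Example call (the file names are placeholders):
// updateStockFlags('file.csv', 'file_updated.csv');
```

Writing to a second file avoids reading and truncating the same file at once; the output could be renamed over the original afterwards if that is what is wanted.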

PHP break string into two parts

Note: I can't use break or next-line functions as I am using FPDF.
I am having a problem with PHP strings. I have a string of which I want to show at most 12 characters in the first row and the remainder in the second row. So basically I want to break the string into two parts and assign them to two variables so that I can print those two variables. I have tried the following code:
if($length > 12)
{
$first400 = substr($info['business_name'], 0, 12);
$theRest = substr($info['business_name'], 11);
$this->Cell(140,22,strtoupper($first400));
$this->Ln();
$this->Cell(140,22,strtoupper($theRest));
$this->Ln();
}
But using this I get the output shown below:
Original String : The Best Hotel Ever
Output :
The Best Hot
Tel Ever
It is breaking a word. I don't want to break a word; just check the length, and if all the words within 12 characters are complete, print the next word on the next line. Like this:
Desired output:
The Best
Hotel Ever
Any suggestions ?
I see no built-in function to do it; however, you could explode on spaces and rebuild your string until adding the next word would take the length over 12, with everything else going to the second part:
$string = 'The Best Hotel Ever';
$exp = explode(' ', $string);
$array = array('', '');
if (strlen($exp[0]) <= 12) {
    $tmp = $exp[0];
    $i = 1;
    // Append words while the next one still fits within 12 characters.
    while (isset($exp[$i]) && strlen($tmp . ' ' . $exp[$i]) <= 12) {
        $tmp .= ' ' . $exp[$i];
        $i++;
    }
    $array[0] = $tmp;
    // Everything that did not fit goes to the second part.
    $array[1] = implode(' ', array_slice($exp, $i));
} else {
    // The first word alone is too long, so the whole string goes to the second part.
    $array[1] = trim(implode(' ', $exp));
}
var_dump($array);
// Output : array(2) { [0]=> string(8) "The Best" [1]=> string(10) "Hotel Ever" }
// $string1 = 'The';
// array(2) { [0]=> string(3) "The" [1]=> string(0) "" }
// $string2 = 'Thebesthotelever';
// array(2) { [0]=> string(0) "" [1]=> string(16) "Thebesthotelever" }
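PHP's built-in wordwrap() may also cover this case, since it breaks a string at word boundaries for a given width. The sketch below wraps at 12 characters and treats the first wrapped line as part one and everything else as part two. The function name splitAtWidth is invented, and the input is assumed to contain no newlines of its own. Unlike the code above, a single word longer than the width ends up in the first part rather than the second.

```php
<?php
// Hypothetical helper: the name and the two-element return shape are assumptions.
function splitAtWidth(string $text, int $width = 12): array
{
    // Insert "\n" at word boundaries so that no line exceeds $width;
    // with the last argument false, over-long words are left unbroken.
    $wrapped = wordwrap($text, $width, "\n", false);

    // The first line is part one; all remaining lines are joined back into part two.
    $parts = explode("\n", $wrapped, 2);
    $first = $parts[0];
    $rest = isset($parts[1]) ? str_replace("\n", ' ', $parts[1]) : '';

    return [$first, $rest];
}

print_r(splitAtWidth('The Best Hotel Ever')); // ['The Best', 'Hotel Ever']
```

In the FPDF code from the question, the two returned strings could be passed to the two Cell() calls in place of $first400 and $theRest.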
I'm not too hot on PHP, but it seems to be a simple case of the substring offsets being further across than where you want them to be:
Try:
if($length > 12)
{
$first400 = substr($info['business_name'], 0, 8);
$theRest = substr($info['business_name'], 9);
$this->Cell(140,22,strtoupper($first400));
$this->Ln();
$this->Cell(140,22,strtoupper($theRest));
$this->Ln();
}
For further help, check out the substr() manual page, and remember that positions are counted from zero:
http://php.net/manual/en/function.substr.php

need advice with logic for coding out this php exercise

I've got a list in a text file with the top 1000 words used in the English language. Each line has a list of up to 50 words, like this:
the,stuff,is,thing,hi,bye,hello,a,stuffs
cool,free,awesome,the,pray,is,crime
etc.
I need to write code that uses that file as input and produces an output file with a list of the pairs of words that appear together in at least fifty different lists. For example, in the above example, THE & IS appear together twice, but every other pair appears only once.
I can't store all possible pairs of words, so no brute force.
I'm trying to learn the language and I'm stuck on this exercise from the book. Please help. Any logic, guidance or code for this would help me.
This is what I have so far. It doesn't do what's intended but I'm stuck:
Code:
//open the file
$handle = fopen("list.txt", 'r');
$count = 0;
$is = 0;
while(!feof($handle)) {
$line = fgets($handle);
$words = explode(',', $line);
echo $count . "<br /><br />";
print_r($words);
foreach ($words as $word) {
if ($word == "is") {
$is++;
}
}
echo "<br /><br />";
$count++;
}
echo "Is count: $is";
//close the file
fclose($handle);
$fp = fopen('output.txt', 'w');
fwrite($fp, "is count: " . $is);
fclose($fp);
This is what I came up with but I think it's too bloated:
plan:
check the first value of the $words array
store the value into $cur_word
store $cur_word as a key in an array ($compare) and
store the counter (line number) as the value of that key
it'll be 1 at this point
see if $cur_word is on each line and if it is then
put the value into $compare with the key as $cur_word
if array has at least 50 values then continue
else go to the next value of the $words array
if it has 50 values then
go to the next value and do the same thing
compare both lists to see how many values match
if it's at least 50 then append
the words to the output file
repeat this process with every word
There are probably hundreds of solutions to this problem. Here is one:
$contents = file_get_contents("list.txt");
//assuming all words are separated by a , and converting new lines to word separators as well
$all_words = explode(",", str_replace("\n", ",", $contents));
$unique_words = array();
foreach ($all_words as $word) {
$unique_words[$word] = $word;
}
This will give you all the unique words in the file in an array.
You can also use the same technique to count the words:
$word_counts = array();
foreach ($all_words as $word) {
if (array_key_exists($word, $word_counts)) {
$word_counts[$word]++;
} else {
$word_counts[$word] = 1;
}
}
Then you can loop through and save the results:
$fp = fopen("output.txt", "w");
foreach ($word_counts as $word => $count) {
fwrite($fp, $word . " occurred " . $count . " times" . PHP_EOL);
}
fclose($fp);
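The snippets above count single words; the exercise asks for pairs. One possible sketch, under stated assumptions, is a two-pass count: a pair can only appear in at least 50 lists if each of its words appears in at least 50 lists, so the first pass counts words per list and the second pass counts pairs among those frequent words only. The function name frequentPairs, the "word1,word2" key format and the $minLists parameter are invented for illustration, and the sketch assumes the whole file fits in memory.

```php
<?php
// Hypothetical helper: the name, signature and "word1,word2" key format are assumptions.
function frequentPairs(array $lists, int $minLists = 50): array
{
    // Pass 1: reduce each list to its distinct words and count,
    // for every word, the number of lists it appears in.
    $sets = [];
    $wordCounts = [];
    foreach ($lists as $list) {
        $words = array_unique(array_filter(
            array_map('trim', explode(',', strtolower($list))),
            function ($w) { return $w !== ''; }
        ));
        $sets[] = $words;
        foreach ($words as $w) {
            $wordCounts[$w] = ($wordCounts[$w] ?? 0) + 1;
        }
    }

    // Pass 2: count pairs, but only among words that are frequent on their own.
    $pairCounts = [];
    foreach ($sets as $words) {
        $frequent = [];
        foreach ($words as $w) {
            if ($wordCounts[$w] >= $minLists) {
                $frequent[] = $w;
            }
        }
        sort($frequent, SORT_STRING); // fixed order, so each pair has exactly one key
        $n = count($frequent);
        for ($i = 0; $i < $n; $i++) {
            for ($j = $i + 1; $j < $n; $j++) {
                $key = $frequent[$i] . ',' . $frequent[$j];
                $pairCounts[$key] = ($pairCounts[$key] ?? 0) + 1;
            }
        }
    }

    // Keep only the pairs that appear together in at least $minLists lists.
    return array_filter($pairCounts, function ($c) use ($minLists) {
        return $c >= $minLists;
    });
}

// The two sample lines from the question, with the threshold lowered to 2:
print_r(frequentPairs([
    'the,stuff,is,thing,hi,bye,hello,a,stuffs',
    'cool,free,awesome,the,pray,is,crime',
], 2));
```

For the real file, the lists could be read with file('list.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES), and the returned array written out with the same fwrite() loop as above.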
