Help with string parsing - php

I have a huge library file containing words and their synonyms. Here are some words and their synonyms in the format of my library:
aantarrão|1
igrejeiro|igrejeiro|aantarrão|beato
aãsolar|1
desolar|desolar|aãsolar|afligir|arrasar|arruinar|consternar|despovoar|devastar|magoar
aba|11
amparo|amparo|aba|abrigo|achego|acostamento|adminículo|agasalho|ajuda|anteparo|apadrinhamento|apoio|arrimo|asilo|assistência|auxíjlio|auxílio|baluarte|bordão|broquel|coluna|conchego|defesa|égide|encosto|escora|esteio|favor|fulcro|muro|patrocínio|proteção|proteçâo|resguardo|socorro|sustentáculo|tutela|tutoria
apoio|apoio|aba|adesão|adminículo|amparo|aprovação|arrimo|assentimento|base|bordão|coluna|conchego|descanso|eixo|encosto|escora|espeque|fé|fulcro|proteçâo|proteção|refúgio|socorro|sustentáculo
beira|beira|aba|beirada|borda|bordo|cairel|encosta|extremidade|falda|iminência|margem|orla|ourela|proximidade|rai|riba|sopé|vertente
beirada|beirada|aba|beira|encosta|falda|margem|sopé|vertente
encosta|encosta|aba|beira|beirada|clivo|falda|lomba|sopé|subida|vertente
falda|falda|aba|beira|beirada|encosta|fralda|sopé|vertente
fralda|fralda|aba|falda|raiss|raiz|sopé
prestígio|prestígio|aba|auréola|autoridade|domínio|força|halo|importância|influência|preponderância|valia|valimento|valor
proteção|proteção|aba|abrigo|agasalho|ajuda|amparo|apoio|arrimo|asilo|auspiciar|auxílio|bafejo|capa|custódia|defesa|égide|escora|fautoria|favor|fomento|garantia|paládio|patrocínio|pistolão|quartel|refúgio|socorro|tutela|tutoria
sopé|sopé|aba|base|beira|beirada|encosta|falda|fralda|raiz|vertente
vertente|vertente|aba|beira|beirada|declive|encosta|falda|sopé
As you can see, aantarrão is a word and the lines below it hold its synonyms. I can't think of a way to get the words and their synonyms into an associative array. This is what I'm trying to do:
<?
$file = file('library.txt');
$array_sinonimos = array();
foreach($file as $k)
{
$explode = explode($k, "|");
if(is_int($explode[1]))
{
$word = $explode[0];
}
}
?>
This does nothing. What can I do here? Should I loop over the lines until I find an empty line and then try to get a new word with explode? Any help is appreciated.

Here's some code I cooked up that seems to work.
See the code in action here: http://codepad.org/TVpYgW91
See the code here
UPDATED to read line by line
<?php
$filepointer = fopen("library.txt", "rb");
$words = array();
$word = null;
while (($line = fgets($filepointer)) !== false) {
    $line = trim($line);
    // explode() never returns an empty array, so test the line itself
    if ($line === '')
        continue;
    $content = explode("|", $line);
    // a "word|count" line ends in a number: remember the word and move on
    if (is_numeric(end($content))) {
        $word = reset($content);
        continue;
    }
    if (isset($words[$word]))
        $words[$word] = array_merge($words[$word], $content);
    else
        $words[$word] = $content;
}
fclose($filepointer);
print_r($words);
So what's the strategy?
fix up the line endings (trim)
run through the file line by line
ignore empty lines
split the line up on the pipes; if the last value is numeric, the first value becomes our current word
because of the continue statements, we only reach the last step if none of the earlier checks matched; in that case the pieces of the line are added to the array element for the current word, creating it if needed.
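The steps above can also be sketched as a function that works on an array of lines, which makes it easy to try without a file. This is only a sketch: parseThesaurus() is a made-up name, and it assumes a header line is exactly "word|count" and that the headword and duplicate entries should be left out of the synonym list.

```php
<?php
// Hypothetical helper: parseThesaurus() is not part of the answer above.
// It takes the lines of the library as an array (e.g. from file()).
function parseThesaurus(array $lines)
{
    $words = array();
    $word = null;
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines
        }
        $parts = explode('|', $line);
        // Assumed header format: exactly "word|count", count made of digits.
        if (count($parts) === 2 && ctype_digit($parts[1])) {
            $word = $parts[0];
            if (!isset($words[$word])) {
                $words[$word] = array();
            }
            continue;
        }
        if ($word === null) {
            continue; // synonym line before any header: ignore it
        }
        // Collect synonyms, dropping the headword itself and duplicates.
        foreach ($parts as $synonym) {
            if ($synonym !== $word && !in_array($synonym, $words[$word], true)) {
                $words[$word][] = $synonym;
            }
        }
    }
    return $words;
}

$sample = array(
    "aantarrão|1",
    "igrejeiro|igrejeiro|aantarrão|beato",
    "aãsolar|1",
    "desolar|desolar|aãsolar|afligir",
);
print_r(parseThesaurus($sample));
```

With the real file you would pass file('library.txt') instead of $sample.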

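Try this. array_merge() does not accept null (PHP 8 throws a TypeError), so the entry for each word is initialised to an empty array first. The basic idea is that $word is the key of the associative array.
<?php
$file = file('library.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$array_sinonimos = array();
$word = null;
foreach($file as $k)
{
    // explode() takes the delimiter first, then the string
    $explode = explode("|", $k);
    // values read from a file are strings, so is_int() would always be false
    if(count($explode) == 2 && is_numeric($explode[1]))
    {
        $word = $explode[0];
        $array_sinonimos[$word] = array();
    }
    else if($word !== null)
    {
        $array_sinonimos[$word] = array_merge($array_sinonimos[$word], $explode);
    }
}
?>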

Related

PHP search for string in text file & return result after specific character

I want to search for a string in a text file, find the matching line and return what comes after the : character.
The input is alex.
The text file includes these items:
alex:+123
david:+1345
john:+1456
The output should be +123.
$input = "alex";
file_get_contents("TextFilePath");
//in this step i don't know what should i do
Maybe not the best solution, but you can use file and loop on the array. explode each line to see if the needle was present.
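function findInAFile($filename, $needle) {
    // read the file into an array of lines, without the trailing newlines
    $lines = file($filename, FILE_IGNORE_NEW_LINES);
    // check each line and return the first occurrence
    foreach ($lines as $line) {
        $arr = explode($needle, $line, 2);
        // only accept lines that start with the needle
        if (isset($arr[1]) && $arr[0] === '') {
            return $arr[1];
        }
    }
    return null; // not found
}
echo findInAFile('file.txt', $input.':');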
You can use a regular expression match to locate lines beginning with the given input:
$input = "alex";
$text = file_get_contents("TextFilePath");
if (preg_match('#^' . preg_quote($input, '#') . ':(.*)#m', $text, $match)) {
// Found input
var_dump($match[1]);
}
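If you would rather avoid a regular expression, a line-by-line lookup that splits each line at the first colon could look like the sketch below. lookupValue() and its parameters are made-up names, and the sample lines stand in for what file("TextFilePath") would return.

```php
<?php
// Hypothetical helper; lookupValue() is an illustrative name.
function lookupValue(array $lines, $key)
{
    foreach ($lines as $line) {
        // Split at the first ':' only, so values may contain colons.
        $parts = explode(':', rtrim($line, "\r\n"), 2);
        // Compare the whole key so "malex" does not match "alex".
        if (count($parts) === 2 && $parts[0] === $key) {
            return $parts[1];
        }
    }
    return null; // key not present
}

// Stand-in for file("TextFilePath"), which keeps the trailing newlines.
$lines = array("alex:+123\n", "david:+1345\n", "john:+1456\n");
var_dump(lookupValue($lines, "alex"));
```

Comparing the complete part before the colon means the search only succeeds on an exact key match.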

PHP code to create a negative word dictionary and search if a post has negative words

I'm trying to develop a PHP application that takes comments from users and matches the string to check whether the comment is positive or negative. I have a list of negative words in a negative.txt file. If a word from the list is matched, I want a simple integer counter to increment by 1. I tried some links and wrote code to check whether the comment is negative or positive, but it only matches the last word of the file. Here's the code I have so far.
<?php
function teststringforbadwords($comment)
{
$file="BadWords.txt";
$fopen = fopen($file, "r");
$fread = fread($fopen,filesize("$file"));
fclose($fopen);
$newline_ele = "\n";
$data_split = explode($newline_ele, $fread);
$new_tab = "\t";
$outoutArr = array();
//process uploaded file data and push in output array
foreach ($data_split as $string)
{
$row = explode($new_tab, $string);
if(isset($row['0']) && $row['0'] != ""){
$outoutArr[] = trim($row['0']," ");
}
}
//---------------------------------------------------------------
foreach($outoutArr as $word) {
if(stristr($comment,$word)){
return false;
}
}
return true;
}
if(isset($_REQUEST["submit"]))
{
$comments = $_REQUEST["comments"];
if (teststringforbadwords($comments))
{
echo 'string is clean';
}
else
{
echo 'string contains banned words';
}
}
?>
Link Tried : Check a string for bad words?
I added the strtolower function around both your $comments and your input from the file. That way, if someone writes STUPID instead of stupid, the code will still detect the bad word.
I also added trim to remove unnecessary and disruptive whitespace (like newlines).
Finally, I changed how the words are checked. I used preg_split to split the comment on whitespace, so only full words are compared and strings that merely contain a banned word are not rejected by accident.
<?php
function teststringforbadwords($comment)
{
    $comment = strtolower($comment);
    $file="BadWords.txt";
    $fopen = fopen($file, "r");
    $fread = strtolower(fread($fopen,filesize($file)));
    fclose($fopen);
    $data_split = explode("\n", $fread);
    // split the comment into words once, on any run of whitespace
    $commentWords = preg_split('/\s+/', trim($comment));
    foreach ($data_split as $bannedWord)
    {
        $bannedWord = trim($bannedWord);
        if ($bannedWord === '') {
            continue; // skip blank lines in the word list
        }
        foreach ($commentWords as $commentWord) {
            if ($bannedWord === $commentWord) {
                return false;
            }
        }
    }
    return true;
}
1) You are only storing $row['0'] and not the words at the other indexes, so some of the words in the text file are ignored.
Some suggestions:
1) Put each word on its own line in the text file, like this, so you can split on newlines only and avoid multiple explodes and loops.
Example: sss.txt
...
bad
stupid
...
...
2) Apply trim and a lowercase function to both the comment and the bad word.
Hope it works as expected.
function teststringforbadwords($comment)
{
$file="sss.txt";
$fopen = fopen($file, "r");
$fread = fread($fopen,filesize("$file"));
fclose($fopen);
foreach(explode("\n",$fread) as $word)
{
if(stristr(strtolower(trim($comment)),strtolower(trim($word))))
{
return false;
}
}
return true;
}
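The question also asks for a counter that goes up by 1 for every negative word found, which a true/false result does not give. A minimal sketch of that, assuming one word per line in the list and plain ASCII words, might look like this. countNegativeWords() is a made-up name, and the word list is passed in as an array so the example runs without a file.

```php
<?php
// Hypothetical helper: countNegativeWords() is an illustrative name.
// $negativeWords would normally come from file('negative.txt').
function countNegativeWords($comment, array $negativeWords)
{
    // Build a lookup set of trimmed, lower-cased negative words.
    $lookup = array();
    foreach ($negativeWords as $word) {
        $word = strtolower(trim($word));
        if ($word !== '') {
            $lookup[$word] = true;
        }
    }
    // Assumption: words are ASCII; split on anything that is not
    // a letter, digit or apostrophe and drop empty pieces.
    $tokens = preg_split("/[^a-z0-9']+/", strtolower($comment), -1, PREG_SPLIT_NO_EMPTY);
    $count = 0;
    foreach ($tokens as $token) {
        if (isset($lookup[$token])) {
            $count++; // one increment per matching word
        }
    }
    return $count;
}

echo countNegativeWords("This is STUPID, really bad. Not badly done.", array("bad", "stupid"));
```

Because whole tokens are compared, "badly" is not counted as "bad"; a comment could then be treated as negative when the count is above some threshold you choose.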

How can I remove certain lines in a multiline string?

My code is receiving a string which I have no control of, which I'll call $my_string. The string is the contents of a transcript. If I echo the string, like so:
echo $my_string;
I can see the contents, which look something like this.
1
00:00:00.000 --> 00:00:04.980
[MUSIC]
2
00:00:04.980 --> 00:00:08.120
Hi, my name is holl and I am here
to write some PHP.
3
00:00:08.120 --> 00:00:10.277
You can see my screen, here.
What I'd like to do is run this through a function so it's just the actual words spoken, removing all the lines that signify time, or the order.
[MUSIC]
Hi, my name is holl and I am here
to write some php.
You can see my screen, here.
My idea is to explode the whole string by the line break, and try to detect which lines are either empty or start with a number, like so...
$lines = explode("\n", $my_string);
foreach ($lines as $line) {
if (is_numeric(line[0]) || empty(line[0]) ) {
continue;
}
$exclude[] = $line;
}
$transcript = implode("\n", $exclude);
But the result of this action is exactly the same- the output has numbers and blank lines. I clearly misunderstand something- but what is it? And is there a better way of trying to achieve my goal?
Thanks!
EDIT: Removed an echo where I wasn't actually using one in my code.
The problem is that you use indexing on $line:
$lines = explode("\n", $my_string);
foreach ($lines as $line) {
if (is_numeric(line[0]) || empty(line[0]) ) {//index usage?
continue;
}
$exclude[] = $line;
}
$transcript = echo implode("\n", $exclude); //remove echo
replace by:
$lines = explode("\n", $my_string);
foreach ($lines as $line) {
if (is_numeric($line) || empty($line) ) {//here
continue;
}
$exclude[] = $line;
}
$transcript = implode("\n", $exclude);
You also need regex matching to remove the 00:00:00.000 --> 00:00:04.980 fragments.
You can combine them by:
if(preg_match('/^(|\d+|\d+:\d+:\d+\.\d+\s+-->\s+\d+:\d+:\d+\.\d+)$/',$line)) { //regex
takes all possibilities into account:
$lines = explode("\n", $my_string);
foreach ($lines as $line) {
if(preg_match('/^(|\d+|\d+:\d+:\d+\.\d+\s+-->\s+\d+:\d+:\d+\.\d+)$/',$line)) {
continue;
}
$exclude[] = $line;
}
$transcript = implode("\n", $exclude);
echo $transcript;
Example (with php -a):
$ php -a
php > $my_string='1
php ' 00:00:00.000 --> 00:00:04.980
php ' [MUSIC]
php '
php ' 2
php ' 00:00:04.980 --> 00:00:08.120
php ' Hi, my name is holl and I am here
php ' to write some PHP.
php '
php ' 3
php ' 00:00:08.120 --> 00:00:10.277
php ' You can see my screen, here.';
php > $lines = explode("\n", $my_string);
php > foreach ($lines as $line) {
php { if(preg_match('/^(|\d+|\d+:\d+:\d+\.\d+\s+-->\s+\d+:\d+:\d+\.\d+)$/',$line)) {
php { continue;
php { }
php { $exclude[] = $line;
php { }
php > $transcript = implode("\n", $exclude);
php > echo $transcript;
[MUSIC]
Hi, my name is holl and I am here
to write some PHP.
You can see my screen, here.
Your code almost works. You just forgot the $ in line[0], and " " is not empty().
$my_string = <<< EOF
1
00:00:00.000 --> 00:00:04.980
[MUSIC]
2
00:00:04.980 --> 00:00:08.120
Hi, my name is holl and I am here
to write some PHP.
3
00:00:08.120 --> 00:00:10.277
You can see my screen, here.
EOF;
$lines = explode("\n", $my_string);
foreach ($lines as $line) {
$temp = trim($line[0]);
if (is_numeric($temp) || empty($temp) ) {
continue;
}
$exclude[] = $line;
}
$transcript = implode("\n", $exclude);
echo $transcript;
Result:
[MUSIC]
Hi, my name is holl and I am here
to write some PHP.
You can see my screen, here.
It looks like it's a pattern. That is every first and second line contain meta data, the third is text, and the fourth is empty. If that is indeed the case, it should be trivial. You don't have to check the content at all and just grab the third line of every quartet:
$lines = explode("\n", $my_string);
$texts = array();
for ($i = 0; $i < count($lines); $i++) {
    if ($i % 4 == 2) { // Index of third line is 2, of course.
        $texts[] = $lines[$i];
    }
}
$transcript = implode("\n", $texts);
With alternative logic, because as you rightfully mentioned there can be more than one line of text, you could say that blocks/entries whatever you call them, are separated by an empty line. Each block starts with two lines of meta data, followed by one (or maybe zero) or more lines of text. With that logic you could parse it like this:
$lines = explode("\n", $my_string);
$texts = array();
$linenr = 0;
foreach ($lines as $line) {
    // Count the non-empty lines seen so far in the current block.
    if (trim($line) === '')
        $linenr = 0;
    else
        $linenr++;
    // Skip the first two lines of a block.
    if ($linenr > 2)
        $texts[] = $line;
}
$transcript = implode("\n", $texts);
I don't know this particular format, but if I wanted to do this, I would rather look for a pattern like this than parse the lines themselves. It looks like a script or subtitle file, and if you want to turn it into a transcript, it would be a shame if somebody shouted '300' and it were not transcribed.
To remove these lines, try preg_replace with a regular expression.
PHP manual: http://php.net/manual/en/function.preg-replace.php
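The preg_replace suggestion could be sketched like this. The pattern is my own guess at what "signify time, or the order" means for this format: it removes lines that are empty, hold only digits, or hold a timing range, together with their line break. The sample string is a shortened copy of the transcript from the question.

```php
<?php
// Shortened sample in the format shown in the question.
$my_string = "1\n00:00:00.000 --> 00:00:04.980\n[MUSIC]\n\n"
    . "2\n00:00:04.980 --> 00:00:08.120\n"
    . "Hi, my name is holl and I am here\nto write some PHP.\n";

// Remove counter lines, timing lines and blank lines, including the
// line break that ends them. The m flag makes ^ match at every line start.
$pattern = '/^(?:\d+|\d+:\d+:\d+\.\d+\s+-->\s+\d+:\d+:\d+\.\d+)?[ \t]*\r?\n/m';
$transcript = preg_replace($pattern, '', $my_string);
echo $transcript;
```

Two caveats: a last line without a trailing line break is not matched by this pattern, and a spoken line that consists only of digits would be removed as well.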

Get value from file - php

Let's say I have this in my text file:
Author:MJMZ
Author URL:http://abc.co
Version: 1.0
How can I get the string "MJMZ" if I look for the string "Author"?
I already tried the solution from another question (Php get value from text file) but with no success.
The problem may be caused by the strpos function. In my case, the word "Author" appears twice in the file, so the strpos approach can't solve my problem.
Split each line at the : using explode, then check if the prefix matches what you're searching for:
$lines = file($filename, FILE_IGNORE_NEW_LINES);
foreach($lines as $line) {
list($prefix, $data) = explode(':', $line);
if (trim($prefix) == "Author") {
echo $data;
break;
}
}
Try the following:
$file_contents = file_get_contents('myfilename.ext');
preg_match('/^Author\s*\:\s*([^\r\n]+)/m', $file_contents, $matches);
$code = isset($matches[1]) && !empty($matches[1]) ? $matches[1] : 'no-code-found';
echo $code;
Now the $code variable should contain MJMZ.
The above searches for the first instance of Author:CODE_HERE in your file and places CODE_HERE in $matches[1].
More specifically, the regex searches for a line that starts with the word Author, followed by optional whitespace \s*, followed by a colon \:, followed by optional whitespace \s*, followed by one or more characters that are not line breaks [^\r\n]+. The m modifier lets ^ match at the start of any line, not only at the start of the file.
If items are added to your file dynamically, you can read them all into an array.
$content = file_get_contents("myfile.txt");
$line = explode("\n", $content);
$item = array();
foreach($line as $l){
    $var = explode(":", trim($l));
    $value = "";
    for($i=1; $i<sizeof($var); $i++){
        // put back the ":" that explode() removed from inside the value
        $value .= ($i > 1 ? ":" : "") . $var[$i];
    }
    $item[$var[0]] = $value;
}
// Now you can access every single item by its name:
print $item["Author"];
The for loop inside the foreach loop is needed so that values can contain ":" themselves. The program separates the name from the value at the first ":".
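A shorter way to split only at the first ":" is the third argument of explode(), which limits the number of pieces. The sketch below assumes the same Name:Value layout as the question; parseKeyValueLines() is a made-up name, and the content is passed in as a string so the example runs without a file.

```php
<?php
// Hypothetical helper: parseKeyValueLines() is an illustrative name.
function parseKeyValueLines($content)
{
    $items = array();
    // Split on \r\n, \r or \n so Windows line endings do not leak into values.
    foreach (preg_split('/\r\n|\r|\n/', $content) as $line) {
        $parts = explode(':', $line, 2); // at most two pieces
        if (count($parts) < 2) {
            continue; // no colon on this line
        }
        $items[trim($parts[0])] = trim($parts[1]);
    }
    return $items;
}

// Stand-in for file_get_contents("myfile.txt").
$item = parseKeyValueLines("Author:MJMZ\nAuthor URL:http://abc.co\nVersion: 1.0");
print $item["Author URL"];
```

Because the limit is 2, the colon inside http://abc.co stays part of the value.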
First read the lines from the file, convert them to an array, then access the values by their keys.
$array = array();
$handle = fopen("file.txt", "r");
if ($handle) {
    while (($line = fgets($handle)) !== false) {
        // split at the first ":" only and skip lines without one
        $pieces = explode(":", trim($line), 2);
        if (count($pieces) == 2) {
            $array[$pieces[0]] = $pieces[1];
        }
    }
    fclose($handle);
} else {
    // error opening the file.
}
echo $array['Author'];

PHP search text file line by line for two strings then output line

I am trying to search a text file for two values on a line. If both values are present I need to output the entire line. The values I am searching for may not be next to each other which is where I am getting stuck. I have the following code which works well but only for one search value:
<?php
$search = $_REQUEST["search"];
// Read from file
$lines = file('archive.txt');
foreach($lines as $line)
{
// Check if the line contains the string we're looking for, and print if it does
if(strpos($line, $search) !== false)
echo"<html><title>SEARCH RESULTS FOR: $search</title><font face='Arial'> $line <hr>";
}
?>
Any assistance much appreciated. Many thanks in advance.
Assuming the values you're searching for are separated by a space, and they will both always be present, explode should do the trick:
$search = explode(' ', $_REQUEST["search"]); // change ' ' to ',' if you separate the search terms with a comma, etc.
// Read from file
$lines = file('archive.txt');
foreach($lines as $line)
{
    // Check if the line contains both strings we're looking for, and print if it does
    if(strpos($line, $search[0]) !== false && strpos($line, $search[1]) !== false) {
        echo"<html><title>SEARCH RESULTS FOR: $search[0] $search[1]</title><font face='Arial'> $line <hr>";
    }
}
I'll leave it up to you to add some validation to make sure there are always two elements in the $search array, etc.
I also corrected the HTML code. The script looks for two values, $search and $search2. It is using stristr(). For the case-sensitive version of stristr, refer to strstr(). The script will return all lines containing both $search and $search2.
<?php
$search = $_REQUEST["search"];
$search2 = $_REQUEST['search2'];
// Read from file
$lines = file('archive.txt');
echo"<html><head><title>SEARCH RESULTS FOR: $search</title></head><body>";
foreach($lines as $line)
{
// Check if the line contains the string we're looking for, and print if it does
if(stristr($line,$search) && stristr($line,$search2)) // case insensitive
echo "<font face='Arial'> $line </font><hr>";
}
?>
</body></html>
Just search for your other value also and use && to check for both.
<?php
$search1 = $_REQUEST["search1"];
$search2 = $_REQUEST["search2"];
// Read from file
$lines = file('archive.txt');
foreach($lines as $line)
{
// Check if the line contains the string we're looking for, and print if it does
if(strpos($line, $search1) !== false && strpos($line, $search2) !== false)
echo"<html><title>SEARCH RESULTS FOR: $search1 and $search2</title><font face='Arial'> $line <hr>";
}
?>
This worked for me. You can put whatever you like in the $searchthis array, and every line that contains one of those values will be displayed in full.
<?php
$searchthis = array('1','2','3');
$matches = array();
$handle = fopen("file_path", "r");
if ($handle)
{
while (!feof($handle))
{
$buffer = fgets($handle);
foreach ($searchthis as $param) {
if(strpos($buffer, $param) !== FALSE)
$matches[] = $buffer;
}}
fclose($handle);
}
foreach ($matches as $parts) {
echo $parts;
}
?>
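Note that the loop above collects a line as soon as any one of the values matches, while the question asks for lines that contain both values. A sketch that requires every term to be present, for any number of terms, might look like this. linesContainingAll() is a made-up name, and the sample lines stand in for file('archive.txt').

```php
<?php
// Hypothetical helper: linesContainingAll() is an illustrative name.
function linesContainingAll(array $lines, array $terms)
{
    $matches = array();
    foreach ($lines as $line) {
        $hasAll = true;
        foreach ($terms as $term) {
            // stripos() is case-insensitive; use strpos() for a case-sensitive search.
            // An empty term is treated as "not found".
            if ($term === '' || stripos($line, $term) === false) {
                $hasAll = false;
                break;
            }
        }
        if ($hasAll) {
            $matches[] = $line;
        }
    }
    return $matches;
}

// Stand-in for file('archive.txt', FILE_IGNORE_NEW_LINES).
$lines = array("2019 red apple", "2019 green pear", "2020 red pear");
print_r(linesContainingAll($lines, array("red", "2019")));
```

The terms do not have to be next to each other or in any particular order on the line.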
