Parsing text files to extract some content - php

I have some text files, for example file1.txt and file2.txt.
The content of file1.txt is: Walk word1 in the rain
Walking in the rain is one of the most beautiful word2 experiences.
There are some conditions:
1. If both word1 AND word2 are present, I want the text between those two words as $between, so I will get: in the rain
Walking in the rain is one of the most beautiful. I also want the text after word2 as $content, so I will get: experiences.
2. If only word1 OR word2 is present (e.g. Walk in the rain
Walking in the rain is one of the most beautiful word1 experiences.), then $between = '' and $content is the whole text: Walk in the rain
Walking in the rain is one of the most beautiful word1 experiences.
3. If word2 comes before word1, for example: Walk in word2 the rain
Walking in the rain is one of the most word1 beautiful word1 experiences. then $between = '' and $content is the whole text.
here's my code :
//to get and open the text files
$txt = glob($savePath.'*.txt');
foreach ($txt as $file => $files) {
$handle = fopen($files, "r") or die ('can not open file');
$ori_content = file_get_contents($files);
//count the words of text, to reach until the last word
$words = preg_split('/\s+/',$ori_content ,-1,PREG_SPLIT_NO_EMPTY);
$count = count ($words);
$word1 ='word1';
$word2 ='word2';
if (stripos($ori_content, $word1) && stripos($ori_content, $word2)){
$between = substr($ori_content, stripos($ori_content, $word1)+ strlen($word1), stripos($ori_content, $word2) - stripos($ori_content, $word1)- strlen($word1));
$content = substr($ori_content, stripos($ori_content, $word2)+strlen($word2), stripos($ori_content, $ori_content[$count+1]) - stripos($ori_content,$word2));
}
else
$content = $ori_content;
$q0 = mysql_query("INSERT INTO tb VALUES('','$files','$content','$between')") or die(mysql_error());
But my code still cannot handle:
condition number 2 (above): I get $between = experiences, but it should be $between = ''
condition number 3 (above): I get $between = the rain
Walking in the rain is one of the most word1 beautiful word1 experiences, but it should be $between = ''
If I get $between for file1.txt but not for file2.txt, the between column in the database should be null for file2.txt. Instead it is filled with the $between of another text file.
I cannot reach the last word.
Please help me. Thanks in advance! :)

I think you're just missing one statement:
...
}
else {
$between = '';
$content = $ori_content;
}
You're probably using this in a loop, so you get the values of the previous loop if you're not explicitly setting $between to an empty string :)
Edit
You also forgot to compare the positions:
if (stripos($ori_content, $word1) && stripos($ori_content, $word2)){
Should be:
$pos1 = stripos($ori_content, $word1);
$pos2 = stripos($ori_content, $word2);
if (false !== $pos1 && false !== $pos2 && $pos1 < $pos2) {
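A minimal sketch of how the reset of $between and the position check could fit together, assuming PHP 7.1 or later. The function name split_between is hypothetical, and trimming the results is an assumption about what the question wants:

```php
<?php
// Hypothetical helper: returns [$between, $content] following the rules in the question.
function split_between(string $text, string $word1, string $word2): array
{
    $pos1 = stripos($text, $word1);
    $pos2 = stripos($text, $word2);

    // stripos() returns false when nothing is found and 0 for a match at the
    // very start, so the comparison against false has to be strict.
    if ($pos1 === false || $pos2 === false) {
        return ['', $text];
    }

    // word2 must start after the end of word1; otherwise treat it as "not between".
    $start = $pos1 + strlen($word1);
    if ($pos2 < $start) {
        return ['', $text];
    }

    $between = substr($text, $start, $pos2 - $start);
    $content = substr($text, $pos2 + strlen($word2));

    return [trim($between), trim($content)];
}

[$between, $content] = split_between('Walk word1 in the rain word2 experiences', 'word1', 'word2');
```

Because the function always returns both values, nothing from a previous file can leak into the current iteration of the loop.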
Edit 2
Another thing: your SQL is prone to injection, and you can't properly use the NULL value this way. You could use this kind of construct, but it's preferable to use PDO or mysqli (the mysql_* functions were removed in PHP 7).
$sql_between = is_null($between) ? 'NULL' : "'" . mysql_real_escape_string($between) . "'";
// apply the same treatment for `$files`, etc.
...
mysql_query("INSERT INTO tb VALUES('', $sql_files, $sql_content, $sql_between)");
In this manner you can set $between to null and have it properly get sent to MySQL.
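For comparison, here is a rough sketch of the PDO route with a prepared statement. It assumes the pdo_sqlite extension is available and uses an in-memory SQLite database as a stand-in for the MySQL table; the column names filename, content and between_text are hypothetical, since the question does not show the table definition:

```php
<?php
// Assumption: in-memory SQLite replaces the real MySQL connection for this sketch.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical schema; the real table `tb` may look different.
$pdo->exec('CREATE TABLE tb (id INTEGER PRIMARY KEY, filename TEXT, content TEXT, between_text TEXT)');

// Placeholders keep the data out of the SQL text, so quotes in the data are harmless.
$stmt = $pdo->prepare(
    'INSERT INTO tb (filename, content, between_text) VALUES (:filename, :content, :between)'
);

$rows = [
    ['file1.txt', 'experiences.', 'in the rain'],
    ['file2.txt', "It's all content", null], // a PHP null is stored as SQL NULL
];

foreach ($rows as [$filename, $content, $between]) {
    $stmt->execute([
        ':filename' => $filename,
        ':content'  => $content,
        ':between'  => $between,
    ]);
}
```

With a real MySQL server only the DSN and credentials passed to the PDO constructor would change; the prepare/execute part stays the same.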

I've wrapped the parser logic into a function parse_content.
$txt = glob($savePath.'*.txt');
foreach ($txt as $file => $files) {
$handle = fopen($files, "r") or die ('can not open file');
$ori_content = file_get_contents($files);
$word1 ='word1';
$word2 ='word2';
$result = parse_content($word1, $word2, $ori_content);
extract($result);
$q0 = mysql_query("INSERT INTO tb VALUES('','$files','$content','$between')") or die(mysql_error());
}
function parse_content($word1, $word2, $input) {
$between = '';
$content = '';
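$w1 = stripos($input, $word1);
$w2 = stripos($input, $word2);
// stripos() returns 0 for a match at the start, so compare against false explicitly
if($w1 !== false && $w2 !== false) {
if($w2 < $w1) {
// Case 3
$content = $input;
} else {
// Case 1
// "i" matches case-insensitively like stripos(); "s" lets "." span line breaks
$reg_between = '/' . preg_quote($word1, '/') . '(.*?)' . preg_quote($word2, '/') . '/is';
$reg_content = '/' . preg_quote($word2, '/') . '(.*)$/is';
preg_match($reg_between, $input, $match);
$between = trim($match[1]);
preg_match($reg_content, $input, $match);
$content = trim($match[1]);
}
} else if($w1 !== false || $w2 !== false) {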
// Case 2
$content = $input;
} else {
// Case 4
$content = $input;
}
return compact('between', 'content');
}

Related

PHP: unable to open file when reading a file and writing into other files

I am trying to read a text file line by line, and if a line contains "/" I need to write it into a separate file.
Example line:
CA,T2B,Calgary (Forest Lawn / Dover / Erin Woods),Alberta,AB,Calgary,,,,51.0209,-113.981,6
I need to write this as 4 lines, like:
CA,T2B,Calgary,Alberta,AB,Calgary,,,,51.0209,-113.981,6
CA,T2B, Forest Lawn ,Alberta,AB,Calgary,,,,51.0209,-113.981,6
CA,T2B, Dover,Alberta,AB,Calgary,,,,51.0209,-113.981,6
CA,T2B, Erin Woods,Alberta,AB,Calgary,,,,51.0209,-113.981,6
What I've tried so far is:
$file = fopen("test.txt", "r");
while (!feof($file)) {
$my_string = fgets($file);
$special_chars = array("/");
if (array_intersect(str_split($my_string), $special_chars)) {
echo fgets($file) . "<br />";
$myfile = fopen("fileWithFL.txt", "w") or die("Unable to open file!");
fwrite($myfile, fgets($file));
fclose($myfile);
}else{
echo fgets($file) . "<br />";
$myfile = fopen("fileWithoutFL.txt", "w") or die("Unable to open file!");
fwrite($myfile, fgets($file));
fclose($myfile);
}
}
fclose($file);
The file comes from "CA.zip".
How can I do this?
Thank you!
You're repeatedly opening and closing fileWithFL.txt and fileWithoutFL.txt, which is inefficient. Better to just open them once before you loop through the input file.
You're also using fgets(), which makes it difficult to parse the input file. Since the input file seems to be in CSV format, you should use fgetcsv().
As for detecting rows that contain multiple cities, I'm looking for the presence of /, splitting on ( or /), removing any trailing ), and trimming the resulting name. That should give you all the cities in a neat array.
$file = fopen("test.txt", "r");
$file_with_fl = fopen("fileWithFL.txt", "w+");
$file_without_fl = fopen("fileWithoutFL.txt", "w+");
while ($a = fgetcsv($file)) {
if ( FALSE === strpos( $a[2], '/' ) ) { // strict comparison: position 0 is a valid match
fputcsv( $file_without_fl, $a );
} else {
$cities = preg_split( '/[\(\/]/', $a[2] ); // Split on '(' and '/'
foreach ( $cities as $city ) {
$city = trim(preg_replace('/\)/', '', $city)); // Remove trailing ')' and trim leading and trailing spaces
$a[2] = $city;
fputcsv( $file_with_fl, $a );
}
}
}
Checking for failure of fopen() and fputcsv() left as an exercise for the reader.
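The splitting step on its own can be sketched as a small function, assuming PHP 7 or later. The name split_places is hypothetical; it does the split, the removal of the parentheses and the trim in one pass:

```php
<?php
// Hypothetical helper: "Calgary (Forest Lawn / Dover / Erin Woods)" -> list of place names.
function split_places(string $field): array
{
    // Split on '(', ')' and '/'.
    $pieces = preg_split('~[()/]~', $field);

    // Trim surrounding spaces, then drop the empty pieces left over by ')'.
    $pieces = array_map('trim', $pieces);

    return array_values(array_filter($pieces, 'strlen'));
}

$places = split_places('Calgary (Forest Lawn / Dover / Erin Woods)');
// $places is ['Calgary', 'Forest Lawn', 'Dover', 'Erin Woods']
```

Each returned name can then be written into column 2 of the row before calling fputcsv(), as in the loop above.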
You can use file_put_contents(file, string, FILE_APPEND) to add a line to the end of a file.
The rest is just processing the Calgary (Forest Lawn / Dover / Erin Woods) part of your string.
$string = 'CA,T2B,Calgary (Forest Lawn / Dover / Erin Woods),Alberta,AB,Calgary,,,,51.0209,-113.981,6';
//test if string needs processing
//if not, write straight to new file
if(strpos($string,'/') === false){
file_put_contents("fileWithoutFL.txt" , $string , FILE_APPEND);
}
//process
else{
//get all the parts split by comma
//$parts[2] is the one you need processing
$parts = explode(',',$string);
//clean up $part[2], replacing ( , ) with *
//then split on the *
$com=explode('*',str_replace(['(','/',')'],'*',$parts[2]));
//loop $com, creating new arrays by replacing $part[2] in the original array
foreach($com as $val){
if($val == '')continue;
//replace $part[2] cleaning up spaces
$parts[2] = trim($val);
//make a new line
$write = implode(',',$parts);
//write to the new file
file_put_contents("fileWithFL.txt" , $write , FILE_APPEND);
}
}
Now you can read every line of the original file and output to the new file. (Tip: use SplFileObject)
$file = new SplFileObject("test.txt"); // the input file from the question
while (!$file->eof()) {
$string = $file->fgets();
// ... process here with previous code
}
$file = null;
Not the best answer, but it works:
$line = file_get_contents("test.txt");
$body = "";
if(false !== strpos($line,"/")) {
$split = preg_split("/[()]+/", $line,-1, PREG_SPLIT_NO_EMPTY);
$contains = explode("/",$split[1]);
$last = explode(",",$split[0]);
$lastvalue = end($last);
$search = array_search($lastvalue,$last);
unset($last[$search]);
$merge = implode(", ", $last);
$body .= $merge . $split[2] . " ";
foreach($contains as $contain) {
$body .= $split[0] . "," . $contain . $split[2] . " ";
}
if(file_put_contents("fileWithFL.txt",$body) !== false) {
echo $body;
} else {
echo "failed";
}
} else {
if(file_put_contents("fileWithoutFL.txt",$line) !== false) {
echo $line;
} else {
echo "failed";
}
}
Output :
CA, T2B,Alberta,AB,Calgary,,,,51.0209,-113.981,6 CA,T2B,Calgary ,Forest Lawn ,Alberta,AB,Calgary,,,,51.0209,-113.981,6 CA,T2B,Calgary , Dover ,Alberta,AB,Calgary,,,,51.0209,-113.981,6 CA,T2B,Calgary , Erin Woods,Alberta,AB,Calgary,,,,51.0209,-113.981,6

How to replace a word (search_string) with the value of an array where the key is the search_string

I can't find the correct way to replace a word in a string where the word_to_be_replaced is a key and the word_to_replace_with is the corresponding value from a csv.
Example:
String: "The water is blue."
csv:
sky, ocean
colour, mood
water, painting
Expected outcome:
"The painting is blue."
I'm a beginner in PHP. I've asked a somewhat similar question, but I can't make the answer I received work.
So far I've got:
$file = fopen("mods/test.csv","r");
while (($csv = fgetcsv($file)) !== false) {
$replace[$csv[0]] = $csv[1];
}
$blub = strtr($mpref, $replace);
What am I missing?
You should use str_replace. Check the Docs
You need to build 2 arrays, $search and $replace which will contain the values to be searched and replaced respectively.
$file = fopen("mods/test.csv","r");
$search = array();
$replace = array();
while (($csv = fgetcsv($file)) !== false) {
//$replace[$csv[0]] = $csv[1];
// append to the arrays; trim() drops the space after each comma in the csv
$search[] = trim($csv[0]);
$replace[] = trim($csv[1]);
}
$mpref = "The water is blue";
echo str_replace($search, $replace, $mpref);
//prints The painting is blue
Try this:
$file = fopen("mods/test.csv","r");
$search = array();
$replace = array();
while (($csv = fgetcsv($file)) !== false) {
$search[] = trim($csv[0]);
$replace[] = trim($csv[1]);
}
$mpref = "The water is blue";
echo str_replace($search, $replace, $mpref);
The code above builds an array of words to be replaced ($search) and an array of words to replace them with ($replace). str_replace accepts arrays and replaces each word in $search with the word in $replace that has the same index.
For more info, see the str_replace() documentation.
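The key => value map from the question also works directly with strtr(). Here is a minimal sketch, with the csv rows inlined as a hypothetical $csvLines array so that it runs without mods/test.csv:

```php
<?php
// Hypothetical stand-in for the rows of mods/test.csv.
$csvLines = ['sky, ocean', 'colour, mood', 'water, painting'];

$map = [];
foreach ($csvLines as $line) {
    // Explicit delimiter, enclosure and escape arguments.
    $row = str_getcsv($line, ',', '"', '\\');
    if (count($row) < 2) {
        continue; // skip malformed rows
    }
    // trim() removes the space that follows each comma in the sample data.
    $map[trim($row[0])] = trim($row[1]);
}

$result = strtr('The water is blue.', $map);
// $result is "The painting is blue."
```

One difference worth knowing: strtr() with an array never rescans text it has already replaced, while str_replace() with arrays applies the pairs one after another, so an earlier replacement can be replaced again by a later pair.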

Get specific sentence of text files

I have the following text file :
====================================================================================
INDEXNUMARTICLE: '1997'
FILE: '###\www.kkk.com\kompas-pront\0004\25\economic\index.htm' NUMSENT: '22' DOMAIN: 'economic'
====================================================================================
2. Social change is a general term which refers to:
4. change in social structure: the nature, the social institutions.
6. When behaviour pattern changes in large numbers, and is visible and sustained, it results in a social change.
I want to get only the sentences, without the numbering, and save them in the database:
=========================================================================
= id = topic = content =
=========================================================================
= 1 = economic = Social change is a general term which refers to: =
= change in social structure: the nature, =
= the social institutions. When behaviour pattern =
= changes in large numbers, and is visible and sustained,
= it results in a social change. =
CODE
function isNumber($string) {
return preg_match('/^\\s*[0-9]/', $string) > 0;
}
$txt = "C:/Users/User/Downloads/economic.txt";
$lines = file($txt);
foreach($lines as $line_num => $line) {
$checkFirstChar = isNumber($line);
if ($checkFirstChar !== false) {
$line_parts = explode(' ', $line);
$line_number = array_shift($line_parts);
foreach ($line_parts as $part) {
if (empty($part)) continue;
$parts = array();
$string = implode(' ', $parts);
$query = mysql_query("INSERT INTO tb_file VALUES ('','economic','$string')");
}
}
}
I have a problem with the array: the data inserted into the content column ends up word by word in different rows. Please help me. Thank you :)
I think your idea is too complicated. Try this short one:
$txt = "C:/Users/User/Downloads/economic.txt";
$lines = file($txt);
foreach($lines as $line_num => $line) {
$checkFirstChar = isNumber($line);
if ($checkFirstChar !== false) {
//entire text line without number
$string = substr($line,strpos($line," ")+1); // everything after the first space, i.e. after "2. "
$query = mysql_query("INSERT INTO tb_file VALUES ('','economic','$string')");
}
}
Try this one, with regex.
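$regex = "/[0-9]+\. /"; // "+" so that multi-digit numbers such as "12. " are matched in full
$txt = "C:/Users/User/Downloads/economic.txt";
$str = file_get_contents($txt);
$index = 0; // keep the whole text if no number is found
//Find the first occurrence of a number followed by '.' and a whitespace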
if(preg_match($regex, $str, $matches, PREG_OFFSET_CAPTURE)) {
$index = $matches[0][1];
}
//Remove all the text before that first occurrence
$str = substr($str, $index);
//Replace all the occurrences of number followed by '. ' with ' '
$text = preg_replace($regex, " ", $str);
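A line-based variant could look like the sketch below. It assumes every sentence sits on its own line and starts with a number, a dot and whitespace, as in the sample file; the function name extract_sentences is hypothetical:

```php
<?php
// Hypothetical helper: keep only the numbered lines and strip their "N. " prefix.
function extract_sentences(array $lines): string
{
    $sentences = [];
    foreach ($lines as $line) {
        // Leading digits, a dot, then whitespace, e.g. "6. " or "12. ".
        // Header lines such as "INDEXNUMARTICLE: ..." do not match and are skipped.
        if (preg_match('/^\s*\d+\.\s+(.*)$/', $line, $m)) {
            $sentences[] = trim($m[1]);
        }
    }
    // One string per file, so it can go into a single row.
    return implode(' ', $sentences);
}

$lines = [
    "INDEXNUMARTICLE: '1997'\n",
    "2. Social change is a general term which refers to:\n",
    "4. change in social structure: the nature, the social institutions.\n",
];
$content = extract_sentences($lines);
```

With a real file, the array returned by file($txt) can be passed in directly, and $content is then inserted once per file instead of once per word.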

Extract part of string matching pattern

I would like to scan a large piece of text using PHP and find all matches for a pattern, but then also 2 lines above the match and 2 lines below.
My text looks like this, but with some extra unnecessary text above and below this sample:
1
Description text
123.456.12
10.00
10.00
3
Different Description text
234.567.89
10.00
30.00
#Some footer text that is not needed and will change for each text file#
15
More description text
564.238.02
4.00
60.00
15
More description text
564.238.02
4.00
60.00
#Some footer text that is not needed and will change for each text file#
15
More description text
564.238.02
4.00
60.00
15
More description text
564.238.02
4.00
60.00
Using PHP, I am looking to match each number in bold (always same format - 3 numbers, dot, 3 numbers, dot, 2 numbers) but then also return the previous 2 lines and the next 2 lines and hopefully return an array so that I can use:
$contents[$i]["qty"] = "1";
$contents[$i]["description"] = "Description text";
$contents[$i]["price"] = "10.00";
$contents[$i]["total"] = "10.00";
etc...
Is this possible and would I use regex? Any help or advice would be greatly appreciated!
Thanks
ANSWERED BY vzwick
This is my final code that I used:
$items_array = array();
$counter = 0;
if (preg_match_all('/(\d+)\n\n(\w.*)\n\n(\d{3}\.\d{3}\.\d{2})\n\n(\d.*)\n\n(\d.*)/', $text_file, $matches)) {
$items_string = $matches[0];
foreach ($items_string as $value){
$item = explode("\n\n", $value);
$items_array[$counter]["qty"] = $item[0];
$items_array[$counter]["description"] = $item[1];
$items_array[$counter]["number"] = $item[2];
$items_array[$counter]["price"] = $item[3];
$items_array[$counter]["total"] = $item[4];
$counter++;
}
}
else
{
die("No matching patterns found");
}
print_r($items_array);
$filename = "yourfile.txt";
$fp = @fopen($filename, "r");
if (!$fp) die('Could not open file ' . $filename);
$i = 0; // element counter
$n = 0; // inner element counter
$field_names = array('qty', 'description', 'some_number', 'price', 'total');
$result_arr = array();
while (($line = fgets($fp)) !== false) {
$result_arr[$i][$field_names[$n]] = trim($line);
$n++;
if ($n % count($field_names) == 0) {
$i++;
$n = 0;
}
}
fclose($fp);
print_r($result_arr);
Edit: Well, regex then.
$filename = "yourfile.txt";
$file_contents = @file_get_contents($filename);
if (!$file_contents) die("Could not open file " . $filename . " or empty file");
if (preg_match_all('/(\d+)\n\n(\w.*)\n\n(\d{3}\.\d{3}\.\d{2})\n\n(\d.*)\n\n(\d.*)/', $file_contents, $matches)) {
print_r($matches[0]);
// do your matching to field names from here ..
}
else
{
die("No matching patterns found");
}
(.+)\n+(.+)\n+(\d{3}\.\d{3}\.\d{2})\n+(.+)\n+(.+)
It might be necessary to replace \n with \r\n. Make sure the regex runs in a mode where "." does not match the newline character.
To reference groups by names, use named capturing group:
(?P<name>regex)
example of named capturing groups.
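As a rough sketch, named groups combined with PREG_SET_ORDER give the array layout asked for in the question. The sample $text is made up to mirror the question's layout (one value per line, blank lines in between), and the group names are hypothetical:

```php
<?php
// Made-up sample in the layout described in the question.
$text = "1\n\nDescription text\n\n123.456.12\n\n10.00\n\n10.00\n\n"
      . "#footer#\n\n"
      . "3\n\nDifferent Description text\n\n234.567.89\n\n10.00\n\n30.00\n";

// Each (?P<name>...) group becomes a string key in the match array.
$pattern = '/(?P<qty>\d+)\n+(?P<description>.+)\n+'
         . '(?P<number>\d{3}\.\d{3}\.\d{2})\n+'
         . '(?P<price>[\d.]+)\n+(?P<total>[\d.]+)/';

$contents = [];
// PREG_SET_ORDER returns one array per match instead of one array per group.
if (preg_match_all($pattern, $text, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $contents[] = [
            'qty'         => $m['qty'],
            'description' => $m['description'],
            'number'      => $m['number'],
            'price'       => $m['price'],
            'total'       => $m['total'],
        ];
    }
}
```

If the file uses Windows line endings, replacing each \n+ with (?:\r?\n)+ should cover both cases.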
You could load the file into an array and then use array_slice to cut it into blocks of 5 lines.
<?php
$file = file("myfile");
$finalArray = array();
for($i = 0; $i < sizeof($file); $i = $i+5)
{
$finalArray[] = array_slice($file, $i, 5);
}
print_r($finalArray);
?>

Intelligently removing excess indention from a string

I'm trying to remove some excessive indentation from a string (in this case SQL) so it can be put into a log file. So I need to find the smallest amount of indentation (aka tabs) and remove it from the front of each line, but the following code ends up printing out exactly the same string. Any ideas?
In other words, I want to take the following (NOTE: StackOverflow editor converted my tabs to spaces, in the code, a tab simulates 4 spaces, but it really is a \t character)
SELECT
blah
FROM
table
WHERE
id=1
and convert it to
SELECT
blah
FROM
table
WHERE
id=1
Here's the code I tried, which fails:
$sql = '
SELECT
blah
FROM
table
WHERE
id=1
';
// it's most likely indented SQL, remove any indentation
$lines = explode("\n", $sql);
$space_count = array();
foreach ( $lines as $line )
{
preg_match('/^(\t+)/', $line, $matches);
$space_count[] = strlen($matches[0]);
}
$min_tab_count = min($space_count);
$place = 0;
foreach ( $lines as $line )
{
$lines[$place] = preg_replace('/^\t{'. $min_tab_count .'}/', '', $line);
$place++;
}
$sql = implode("\n", $lines);
print '<pre>'. $sql .'</pre>';
It seems the problem was
strlen($matches[0])
returns 0 and 1 for the first and last line, which isn't the 3 I actually wanted as the minimum, so a quick hack was to
trim the SQL
skip counting the length if it's less than 2
Not the most elegant solution, but it'll always work because tabs are usually in the 4+ count in this code. Here's the fixed code:
$sql = '
SELECT
blah
FROM
table
WHERE
id=1
';
// it's most likely indented SQL, remove any indentation
$lines = explode("\n", $sql);
$space_count = array();
foreach ( $lines as $line )
{
preg_match('/^(\t+)/', $line, $matches);
if ( isset($matches[0]) && strlen($matches[0]) > 1 ) // $matches is empty when a line has no leading tab
{
$space_count[] = strlen($matches[0]);
}
}
$min_tab_count = min($space_count);
$place = 0;
foreach ( $lines as $line )
{
$lines[$place] = preg_replace('/^\t{'. $min_tab_count .'}/', '', $line);
$place++;
}
$sql = implode("\n", $lines);
print $sql;
private function cleanIndentation($str) {
$content = '';
foreach(preg_split("/((\r?\n)|(\r\n?))/", trim($str)) as $line) {
$content .= " " . trim($line) . PHP_EOL;
}
return $content;
}
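A sketch that avoids the "less than 2" hack by ignoring blank lines when looking for the minimum, assuming the indentation consists of tab characters only. The function name remove_common_tabs is hypothetical:

```php
<?php
// Hypothetical helper: strip the tab indentation shared by all non-blank lines.
function remove_common_tabs(string $text): string
{
    $lines = preg_split('/\r\n|\r|\n/', $text);

    $min = null;
    foreach ($lines as $line) {
        if (trim($line) === '') {
            continue; // blank or whitespace-only lines do not count
        }
        $tabs = strspn($line, "\t"); // number of leading tabs
        $min  = ($min === null) ? $tabs : min($min, $tabs);
    }

    if (!$min) {
        return $text; // nothing to remove (no content, or a line without tabs)
    }

    foreach ($lines as $i => $line) {
        // {0,n} also cleans up whitespace-only lines that have fewer tabs.
        $lines[$i] = preg_replace('/^\t{0,' . $min . '}/', '', $line);
    }

    return implode("\n", $lines);
}

$sql = "\n\t\t\tSELECT\n\t\t\t\tblah\n\t\t\tFROM\n\t\t\t\ttable\n\t\t";
$clean = remove_common_tabs($sql);
```

Unlike the trim()-per-line approach, this keeps the relative indentation between lines, so nested parts of the query stay indented.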
