Get specific sentence of text files - php

I have the following text file :
====================================================================================
INDEXNUMARTICLE: '1997'
FILE: '###\www.kkk.com\kompas-pront\0004\25\economic\index.htm' NUMSENT: '22' DOMAIN: 'economic'
====================================================================================
2. Social change is a general term which refers to:
4. change in social structure: the nature, the social institutions.
6. When behaviour pattern changes in large numbers, and is visible and sustained, it results in a social change.
I want to get only the sentences, without the numbering, and save them in a database:
=========================================================================
= id = topic = content =
=========================================================================
= 1 = economic = Social change is a general term which refers to: =
= change in social structure: the nature, =
= the social institutions. When behaviour pattern =
= changes in large numbers, and is visible and sustained,
= it results in a social change. =
CODE
function isNumber($string) {
return preg_match('/^\\s*[0-9]/', $string) > 0;
}
$txt = "C:/Users/User/Downloads/economic.txt";
$lines = file($txt);
foreach($lines as $line_num => $line) {
$checkFirstChar = isNumber($line);
if ($checkFirstChar !== false) {
$line_parts = explode(' ', $line);
$line_number = array_shift($line_parts);
foreach ($line_parts as $part) {
if (empty($part)) continue;
$parts = array();
$string = implode(' ', $parts);
$query = mysql_query("INSERT INTO tb_file VALUES ('','economic','$string')");
}
}
}
My problem is with the array: the data inserted into the content column ends up word by word, each word in a different row. Please help me. Thank you :)

I think your idea is too complicated - try this short one:
$txt = "C:/Users/User/Downloads/economic.txt";
$lines = file($txt);
foreach($lines as $line_num => $line) {
$checkFirstChar = isNumber($line);
if ($checkFirstChar !== false) {
//entire text line without number
$string = substr($line, strpos($line, " ") + 1);
$query = mysql_query("INSERT INTO tb_file VALUES ('','economic','$string')");
}
}

Try this one, with regex.
$regex = "/[0-9]+\. /";
$txt = "C:/Users/User/Downloads/economic.txt";
$str = file_get_contents($txt);
$index = -1;
//Find the first occurrence of a number followed by '.' and a whitespace
if(preg_match($regex, $str, $matches, PREG_OFFSET_CAPTURE)) {
$index = $matches[0][1];
}
//Remove all the text before that first occurrence
$str = substr($str, $index);
//Replace all the occurrences of number followed by '. ' with ' '
$text = preg_replace($regex, " ", $str);
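For comparison, here is a minimal sketch of a line-by-line variant that also picks up the topic from the header and joins all sentences into one content string, as in the table in the question. It assumes the header carries a DOMAIN: '...' field and that every sentence line starts with digits followed by a dot. The function name buildRecord and the abbreviated sample header are hypothetical, not taken from the answers above.

```php
<?php
// Hypothetical helper: turns the lines of one article into a topic and a
// single content string, dropping the leading "2. " style numbering.
function buildRecord(array $lines): array
{
    $topic = null;
    $sentences = [];
    foreach ($lines as $line) {
        // Header line: pick up the value of DOMAIN: '...'
        if (preg_match("/DOMAIN:\s*'([^']*)'/", $line, $m)) {
            $topic = $m[1];
            continue;
        }
        // Sentence line: optional whitespace, digits, a dot, then the text
        if (preg_match('/^\s*\d+\.\s*(.+?)\s*$/', $line, $m)) {
            $sentences[] = $m[1];
        }
    }
    return ['topic' => $topic, 'content' => implode(' ', $sentences)];
}

// Sample input modelled on the question (header line abbreviated).
$lines = [
    "INDEXNUMARTICLE: '1997'\n",
    "NUMSENT: '22' DOMAIN: 'economic'\n",
    "2. Social change is a general term which refers to:\n",
    "4. change in social structure: the nature, the social institutions.\n",
    "6. When behaviour pattern changes in large numbers, and is visible and sustained, it results in a social change.\n",
];
$record = buildRecord($lines);
echo $record['topic'], "\n", $record['content'], "\n";
```

The resulting array gives one row per article. Since the mysql_* functions were removed in PHP 7, the insert itself would probably be better done with a prepared statement (PDO or mysqli) than by interpolating $string into the SQL.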

Related

How to replace a word (search_string) with the value of an array where the key is the search_string

I can't find the correct way to replace a word in a string, where the word to be replaced is a key and the word to replace it with is the corresponding value from a CSV file.
Example:
String: "The water is blue."
csv:
sky, ocean
colour, mood
water, painting
Expected outcome:
"The painting is blue."
I'm a beginner in PHP. I've asked a somewhat similar question, but I can't make the answer I received work...
So far I've got:
$file = fopen("mods/test.csv","r");
while (($csv = fgetcsv($file)) !== false) {
$replace[$csv[0]] = $csv[1];
}
$blub = strtr($mpref, $replace);
What am I missing?
You should use str_replace. Check the Docs
You need to build 2 arrays, $search and $replace which will contain the values to be searched and replaced respectively.
$file = fopen("mods/test.csv","r");
$search = array();
$replace = array();
while (($csv = fgetcsv($file)) !== false) {
//$replace[$csv[0]] = $csv[1];
// Append to the arrays; trim() drops the space after each comma in the CSV
$search[] = trim($csv[0]);
$replace[] = trim($csv[1]);
}
$mpref = "The water is blue";
echo str_replace($search, $replace, $mpref);
//prints The painting is blue
Try this:
$file = fopen("mods/test.csv","r");
$search = array();
$replace = array();
while (($csv = fgetcsv($file)) !== false) {
// Append to the arrays; trim() drops the space after each comma in the CSV
$search[] = trim($csv[0]);
$replace[] = trim($csv[1]);
}
$mpref = "The water is blue";
echo str_replace($search, $replace, $mpref);
The code above builds two arrays: $search, the words to be replaced, and $replace, the words to replace them with. str_replace accepts arrays and replaces each word in $search with the word at the same index in $replace.
For more info, see the str_replace() documentation.
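As a side note, the strtr() approach from the question can also work once the fields are trimmed. Below is a minimal sketch under that assumption. The helper name loadReplacements is hypothetical, and the CSV rows are passed in as strings so the example runs without the mods/test.csv file. Keep in mind that both strtr() and str_replace() replace substrings, not whole words.

```php
<?php
// Build a search => replacement map from CSV rows, trimming each field so
// that "water, painting" yields "painting" rather than " painting".
function loadReplacements(array $csvLines): array
{
    $map = [];
    foreach ($csvLines as $csvLine) {
        if (trim($csvLine) === '') {
            continue; // skip blank rows
        }
        $fields = array_map('trim', str_getcsv($csvLine, ',', '"', '\\'));
        if (count($fields) >= 2 && $fields[0] !== '') {
            $map[$fields[0]] = $fields[1];
        }
    }
    return $map;
}

$csvLines = ['sky, ocean', 'colour, mood', 'water, painting'];
$map = loadReplacements($csvLines);
$result = strtr('The water is blue.', $map);
echo $result, "\n";
```

With a real file, the same trimming could be applied to each row returned by fgetcsv().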

PHP find tags in content text and wrap in <a> tags and set limit the number of links

I'm finding the keywords "denounce,and,demoralized" in a string and wrapping each one in an HTML <a> tag to turn it into a link, with the following function...
function link2tags($text, $tags){
$tags = preg_replace('/\s+/', ' ', trim($tags));
$words = explode(',', $tags);
$linked = array();
foreach ( $words as $word ){
$linked[] = ''.$word.'';
}
return str_replace($words, $linked, $text);
}
echo link2tags('we denounce with righteous indignation and dislike men who are so beguiled and demoralized by the charms of pleasure of the moment', 'denounce,and,demoralized');
The output of the above function is as follows...
Output:
we denounce with righteous indignation and dislike men who are so beguiled and demoralized by the charms of pleasure of the moment
Here, the word "and" is linked twice. I want to limit the number of links per word, so that repeated words are only linked once.
You need to replace only the first occurrence of each word. Check the code below:
function link2tags($text, $tags){
$tags = preg_replace('/\s+/', ' ', trim($tags));
$words = explode(',', $tags);
$linked = array();
$existingLinks = array();
foreach ( $words as $word ){
if (!in_array($word, $existingLinks)) {
$existingLinks[] = $word;
$linked[] = ''.$word.'';
}
}
foreach ($existingLinks as $key => $value) {
// Match whole words only, and escape any regex metacharacters in the tag
$text = preg_replace('/\b' . preg_quote($value, '/') . '\b/', $linked[$key], $text, 1);
}
return $text;
}
Hope it helps you.
You can check whether a word has already been used, as below:
if(!in_array($word,$alreadyusedword)) {
$linked[] = ''.$word.'';
$alreadyusedword[] = $word;
}
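Here is a minimal sketch of a single-pass alternative, assuming each tag should be linked on its first whole-word occurrence only. The function name linkFirstOccurrence and the /tag/... URL pattern are placeholders, not taken from the question. Doing everything in one preg_replace_callback() call avoids matching a later tag inside HTML that was inserted for an earlier one.

```php
<?php
// Sketch: link only the first whole-word occurrence of each tag.
// The /tag/... URL is a hypothetical placeholder.
function linkFirstOccurrence(string $text, string $tags): string
{
    $words = array_filter(array_map('trim', explode(',', $tags)), 'strlen');
    if (!$words) {
        return $text; // nothing to link
    }
    $quoted = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $words);
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/';

    $seen = [];
    return preg_replace_callback($pattern, function (array $m) use (&$seen): string {
        if (isset($seen[$m[0]])) {
            return $m[0]; // already linked once; leave later occurrences alone
        }
        $seen[$m[0]] = true;
        return '<a href="/tag/' . rawurlencode($m[0]) . '">' . $m[0] . '</a>';
    }, $text);
}

$out = linkFirstOccurrence(
    'we denounce with righteous indignation and dislike men who are so beguiled and demoralized by the charms of pleasure of the moment',
    'denounce,and,demoralized'
);
echo $out, "\n";
```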

combine string of lines become one string

I'm processing text files, for example taking only the lines that begin with a number.
$file = $_FILES['file']['tmp_name'];
$lines = file($file);
foreach ($lines as $line_num => $line) {
$checkFirstChar = isNumber($line);
if ($checkFirstChar !== false){
$line_parts = explode(' ', $line);
$line_number = array_shift($line_parts);
$string1 = mysql_real_escape_string(implode(' ', $line_parts));
$string2 = implode(' ', $line_parts);
// insert sentence $string1 in database a sentence a row
// then I wanna get the text in one string that i've filtered before to do another process
I use $string2, but it is still a collection of strings; every sentence is a separate string:
string(165) "sentence1 . " string(273) "sentence2 . " etc.
All I need is for all the sentences to become one string again. What could I do? Thanks.
Example input text file:
=========================
file : jssksksks
=========================
1. blablabla.
2. bliblibli .
3. balbalba
=========================
file : jkklkok
=========================
1.blulbulbu.
2.bleblelbl
$string2 is inside the loop. You can collect all the lines that match the criteria and only implode them after the loop:
$file = $_FILES['file']['tmp_name'];
$lines = file($file);
$newlines = array();
foreach ($lines as $line_num => $line) {
$checkFirstChar = isNumber($line);
if ($checkFirstChar !== false){
$line_parts = explode(' ', $line);
$line_number = array_shift($line_parts);
// Concat this line together and add it to another array.
$newlines[] = implode(' ', $line_parts);
}
}
// Concat all the lines together.
$newfile = implode("\n", $newlines);
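One caveat: lines such as "1.blulbulbu." in the sample have no space after the number, so explode(' ', ...) followed by array_shift() would drop the whole line. Here is a minimal sketch of a regex-based variant, assuming every wanted line starts with digits and a dot. The function name joinNumberedLines is hypothetical.

```php
<?php
// Sketch: collect the text of every numbered line into one string.
// The pattern accepts "1. text" as well as "1.text" (no space after the dot).
function joinNumberedLines(array $lines): string
{
    $sentences = [];
    foreach ($lines as $line) {
        if (preg_match('/^\s*\d+\.\s*(.*\S)\s*$/', $line, $m)) {
            $sentences[] = $m[1];
        }
    }
    return implode("\n", $sentences);
}

$lines = [
    "=========================\n",
    "file : jssksksks\n",
    "1. blablabla.\n",
    "2. bliblibli .\n",
    "3. balbalba\n",
    "1.blulbulbu.\n",
    "2.bleblelbl\n",
];
$joined = joinNumberedLines($lines);
echo $joined, "\n";
```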

filtering bad words from text

This function filters emails from the text and returns the matched patterns:
function parse($text, $words)
{
$resultSet = array();
foreach ($words as $word){
$pattern = 'regex to match emails';
preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE );
$this->pushToResultSet($matches);
}
return $resultSet;
}
In a similar way, I want to match bad words in the text and return them as $resultSet.
Here is the code to filter bad words:
$badwords = array('shit', 'fuck'); // Here we can use all bad words from database
$text = 'Man, I shot this f*ck, sh/t! fucking fu*ker sh!t f*cking sh\t ;)';
echo "filtered words <br>";
echo $text."<br/>";
$words = explode(' ', $text);
foreach ($words as $word)
{
$bad= false;
foreach ($badwords as $badword)
{
if (strlen($word) >= strlen($badword))
{
$wordOk = false;
for ($i = 0; $i < strlen($badword); $i++)
{
if ($badword[$i] !== $word[$i] && ctype_alpha($word[$i]))
{
$wordOk = true;
break;
}
}
if (!$wordOk)
{
$bad= true;
break;
}
}
}
echo $bad ? 'beep ' : ($word . ' '); // Here $bad words can be returned and replace with *.
}
This replaces bad words with "beep".
But I want to push the matched bad words to $this->pushToResultSet() and return them, as in the first code for email filtering.
Can I do this with my bad-word filtering code?
Roughly converting David Atchley's answer to PHP, does this work as you want it to?
$blocked = array('fuck','shit','damn','hell','ass');
$text = 'Man, I shot this f*ck, damn sh/t! fucking fu*ker sh!t f*cking sh\t ;)';
$matched = preg_match_all("/(".implode('|', $blocked).")/i", $text, $matches);
$filter = preg_replace("/(".implode('|', $blocked).")/i", 'beep', $text);
var_dump($filter);
var_dump($matches);
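To get something closer to the $resultSet shape from the question, here is a minimal sketch that returns each match with its offset. The function name findBlockedWords and the returned array layout are assumptions. Like the regex above, it matches plain substrings (so "hell" would also match inside "shell") and does not catch obfuscated spellings such as "sh!t".

```php
<?php
// Sketch: return every blocked word found in $text with its byte offset,
// plus a censored copy of the text.
function findBlockedWords(string $text, array $blocked): array
{
    if (!$blocked) {
        return ['matches' => [], 'filtered' => $text];
    }
    // Escape each word so regex metacharacters are treated literally
    $quoted = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $blocked);
    $pattern = '/(?:' . implode('|', $quoted) . ')/i';

    preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);
    $resultSet = [];
    foreach ($matches[0] as [$word, $offset]) {
        $resultSet[] = ['word' => $word, 'offset' => $offset];
    }
    return [
        'matches'  => $resultSet,
        'filtered' => preg_replace($pattern, 'beep', $text),
    ];
}

$result = findBlockedWords('Well damn, what the Hell is this', ['damn', 'hell']);
print_r($result);
```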
JSFiddle for working example.
Yes, you can match bad words (saving for later), replace them in the text and build the regex dynamically based on an array of bad words you're trying to filter (you might store it in DB, load from JSON, etc.). Here's the main portion of the working example:
var blocked = ['fuck','shit','damn','hell','ass'],
matchBlocked = new RegExp("("+blocked.join('|')+")", 'gi'),
text = $('.unfiltered').text(),
matched = text.match(matchBlocked),
filtered = text.replace(matchBlocked, 'beep');
Please see the JSFiddle link above for the full working example.

Intelligently removing excess indention from a string

I'm trying to remove some excessive indention from a string (in this case SQL) so it can be put into a log file. I need to find the smallest amount of indention (i.e. tabs) and remove it from the front of each line, but the following code ends up printing exactly the same string. Any ideas?
In other words, I want to take the following (NOTE: StackOverflow editor converted my tabs to spaces, in the code, a tab simulates 4 spaces, but it really is a \t character)
            SELECT
                blah
            FROM
                table
            WHERE
                id=1
and convert it to
SELECT
    blah
FROM
    table
WHERE
    id=1
Here's the code I tried, which fails:
$sql = '
SELECT
blah
FROM
table
WHERE
id=1
';
// it's most likely indented SQL, remove any indentation
$lines = explode("\n", $sql);
$space_count = array();
foreach ( $lines as $line )
{
preg_match('/^(\t+)/', $line, $matches);
$space_count[] = strlen($matches[0]);
}
$min_tab_count = min($space_count);
$place = 0;
foreach ( $lines as $line )
{
$lines[$place] = preg_replace('/^\t{'. $min_tab_count .'}/', '', $line);
$place++;
}
$sql = implode("\n", $lines);
print '<pre>'. $sql .'</pre>';
It seems the problem was that
strlen($matches[0])
returns 0 and 1 for the first and last lines, which isn't the 3 I actually wanted as the minimum, so a quick hack was to trim the SQL and skip counting the length if it's less than 2.
Not the most elegant solution, but it works here because the lines in this code are usually indented by 4 or more tabs. Here's the fixed code:
$sql = '
SELECT
blah
FROM
table
WHERE
id=1
';
// it's most likely indented SQL, remove any indentation
$lines = explode("\n", $sql);
$space_count = array();
foreach ( $lines as $line )
{
// Count only lines that actually start with tabs; skip runs shorter than 2
if ( preg_match('/^(\t+)/', $line, $matches) && strlen($matches[0]) > 1 )
{
$space_count[] = strlen($matches[0]);
}
}
$min_tab_count = min($space_count);
$place = 0;
foreach ( $lines as $line )
{
$lines[$place] = preg_replace('/^\t{'. $min_tab_count .'}/', '', $line);
$place++;
}
$sql = implode("\n", $lines);
print $sql;
private function cleanIndentation($str) {
$content = '';
foreach(preg_split("/((\r?\n)|(\r\n?))/", trim($str)) as $line) {
$content .= " " . trim($line) . PHP_EOL;
}
return $content;
}
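The function above flattens every line to the same indent. If the relative indentation should survive, here is a minimal sketch that removes only the smallest common run of leading tabs, assuming the indentation consists of tab characters as described in the question. The function name removeCommonIndent is hypothetical.

```php
<?php
// Sketch: strip the smallest common run of leading tabs from every line.
// Blank or whitespace-only lines are ignored when computing the minimum.
function removeCommonIndent(string $str): string
{
    $lines = preg_split('/\r\n|\r|\n/', $str);
    $min = null;
    foreach ($lines as $line) {
        if (trim($line) === '') {
            continue;
        }
        $tabs = strspn($line, "\t"); // number of leading tab characters
        $min = ($min === null) ? $tabs : min($min, $tabs);
    }
    if ($min === null || $min === 0) {
        return $str; // nothing to remove
    }
    $pattern = '/^\t{' . $min . '}/';
    foreach ($lines as $i => $line) {
        $lines[$i] = preg_replace($pattern, '', $line);
    }
    return implode("\n", $lines);
}

$sql = "\n\t\t\tSELECT\n\t\t\t\tblah\n\t\t\tFROM\n\t\t\t\ttable\n\t\t";
$clean = removeCommonIndent($sql);
echo trim($clean), "\n";
```

Skipping blank lines avoids the problem described above, where the first and last lines of the string pull the minimum down to 0 or 1.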
