Dealing with huge xml file using preg_replace [duplicate]

This question already has an answer here:
Why preg_replace throws me a "Unknown modifier" error? [duplicate]
(1 answer)
Closed 7 years ago.
I was trying to do a search-and-replace on a huge XML file (1 GB).
I found this great code, which works perfectly on my file when using str_replace:
<?php
function replace_file($path, $string, $replace)
{
set_time_limit(0);
if (is_file($path) === true)
{
$file = fopen($path, 'r');
$temp = tempnam('./', 'tmp');
if (is_resource($file) === true)
{
while (feof($file) === false)
{
file_put_contents($temp, str_replace($string, $replace, fgets($file)), FILE_APPEND);
}
fclose($file);
}
unlink($path);
}
return rename($temp, $path);
}
replace_file('myfile.xml', '<search>', '<replace>');
So far so good, and it works great.
Now I changed str_replace to preg_replace and the search value to '/^[^]/', so the code looks like this:
<?php
function replace_file($path, $string, $replace)
{
set_time_limit(0);
if (is_file($path) === true)
{
$file = fopen($path, 'r');
$temp = tempnam('./', 'tmp');
if (is_resource($file) === true)
{
while (feof($file) === false)
{
file_put_contents($temp, preg_replace($string, $replace, fgets($file)), FILE_APPEND);
}
fclose($file);
}
unlink($path);
}
return rename($temp, $path);
}
replace_file('myfile.xml', '/[^<search>](.*)[^</search>]/', '<replace>');
I get the error "preg_replace(): Unknown modifier 'd'" on line 16.
Line 16 is:
file_put_contents($temp, preg_replace($string, $replace, fgets($file)), FILE_APPEND);

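[] in PCRE is a character class. With [^<category>] you're actually matching the same as with [^<>acegorty]. You're matching characters (or bytes), not words. The "Unknown modifier" error itself comes from the unescaped / inside the closing tag: it ends the pattern early, and PHP then tries to read the characters after it as modifiers.
PCRE is not the best solution for this anyway. Use XMLReader and XMLWriter.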
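If a regular expression is still wanted for the simple case, here is a minimal sketch of a per-line replacement that avoids the delimiter problem. It assumes the opening and closing tags sit on the same line (the question's loop reads the file line by line) and uses # as the pattern delimiter so that the / in the closing tag needs no escaping. The sample line and the <search> tag name are placeholders taken from the question, not real data.

```php
<?php
// Hypothetical sample line; <search> is the placeholder tag name from the question.
$line = "<item><search>old value</search></item>\n";

// '#' is the delimiter here, so the '/' in </search> is an ordinary character.
// The tags are written out literally instead of being put into a [...] class.
$pattern = '#<search>(.*?)</search>#';

$result = preg_replace($pattern, '<replace>', $line);
echo $result;
```

An element that spans several lines will not be matched this way, which is one reason a streaming XML parser is the more robust choice.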

Related

Search all php files for a string in between a class

I am using a simple PHP translation class in more than 2000 PHP files. New strings keep being added, so I need an up-to-date text file containing all the translation strings.
I need to get all the translated values from each PHP file and save them into a text file without any repeated values.
Translation class
<?php $translate->__('Calendar'); ?>
So I need Calendar saved into a txt file, and this should be done for all the files in all folders.
Everything between $translate->__(' and ') should be saved.
The code below is not working for some reason.
$fn = $_SERVER['DOCUMENT_ROOT']."/apps/test/test2/calendar.php";
$handle = fopen($fn, 'r');
$valid = false;
$search = "\/\\$translate\\-\\>__\\(\\'(.*?)'\\)\/g";
while (($buffer = fgets($handle)) !== false) {
if(preg_match_all($search, $buffer, $m)) {
print $m[1];
} else {
}
}
fclose($handle);

You're extracting strings with this pattern:
/\$translate\-\>__\(\'(.*?)'\)/
PHP's preg functions have no g modifier; preg_match_all already returns every match, so the trailing g from the question must be dropped. Extract all of the matched items and save them anywhere.
Demo and details: https://regex101.com/r/LzMyJY/1
$fn = $_SERVER['DOCUMENT_ROOT']."/apps/test/test2/calendar.php";
$handle = fopen($fn, 'r');
$search = "/\\".'$'."translate\\-\\>__\\(\\'(.*?)'\\)/";
while (($buffer = fgets($handle)) !== false) {
    if (preg_match_all($search, $buffer, $m)) {
        // $m[1] is an array holding the captured strings found on this line
        print implode("\n", $m[1]) . "\n";
    }
}
fclose($handle);
Note:
When a regex pattern is written in a double-quoted string "...", remember to escape each backslash (change every \ to \\).
In a single-quoted string '...', you can leave \ as it is.
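The question also asks for every file in every folder and for no repeated values. A rough sketch of that step follows, assuming the same pattern and the standard SPL directory iterators. The function name collect_translation_keys is hypothetical and not part of the translation class.

```php
<?php
// Sketch: collect every unique string passed to $translate->__('...') below $rootDir.
// collect_translation_keys() is a hypothetical helper name.
function collect_translation_keys(string $rootDir): array
{
    // Single quotes: \$ reaches PCRE unchanged, \' is just a quote character.
    $pattern = '/\$translate->__\(\'(.*?)\'\)/';
    $keys = [];

    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($rootDir, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($files as $file) {
        // Only look at regular files with a .php extension.
        if (!$file->isFile() || $file->getExtension() !== 'php') {
            continue;
        }
        $source = file_get_contents($file->getPathname());
        if ($source !== false && preg_match_all($pattern, $source, $m)) {
            $keys = array_merge($keys, $m[1]);
        }
    }

    // Drop repeated values and reindex the list.
    return array_values(array_unique($keys));
}
```

The result could then be written out with something like file_put_contents('translations.txt', implode(PHP_EOL, collect_translation_keys($dir))), where the output file name and $dir are placeholders.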

PHP Extract Code out of File [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I have a file (in my case debug.log) that contains a lot of source code from many files. I want to extract this code into separate files.
Structure of my debug.log:
#NewFile#path/to/file.php
<?php
class ClassA {
function A() { do smth(); }
}
#NewFile#path/to/nextFile.php
<?php
class ClassA {
function A() { do smth(); }
}
#NewFile#path/to/thirdFile.php
...
Now I want to split on #NewFile# and save the content in a new .php file.
This is my code for doing this:
$handle = fopen('debug.log', 'r');
$index = 1;
$filename = '/home/myuser/folder/file';
while (($line = fgets($handle)) !== false) {
if (strpos($line, '#NewFile#') !== false) {
$content = file_get_contents($filename . $index . '.php');
file_put_contents($filename . $index . '.php', $content . $line);
} else {
$index++;
}
}
fclose($handle);
Thanks for your help :)

Apart from the fact that a file called debug.log seems to contain PHP source (which, no matter how you look at it, is really weird), it's a fairly trivial thing to do:
The simplest way to reliably parse php files in php is to use the token_get_all function. In this case, it's a matter of doing something like this:
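$tokens = token_get_all(file_get_contents('input_file.php'));
$file = null;
$contents = [];
foreach ($tokens as $token) {
    // single-character tokens such as ";" or "{" are plain strings, not arrays
    if (!is_array($token)) {
        $contents[] = $token;
        continue;
    }
    // comment with #NewFile# in there? (the very first marker sits before any
    // PHP open tag, so the tokenizer reports it as T_INLINE_HTML instead)
    if (in_array($token[0], [T_COMMENT, T_INLINE_HTML], true) && strpos($token[1], '#NewFile#') !== false) {
        if ($file) {
            // write code to file
            file_put_contents($file, implode('', $contents));
        }
        // each section in the log already begins with its own open tag
        $contents = [];
        $file = trim(str_replace('#NewFile#', '', $token[1])); // set file path
    } else {
        // token text already includes whitespace and newlines, so just append it
        $contents[] = $token[1];
    }
}
// write the last file
if ($file) {
    file_put_contents($file, implode('', $contents));
}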
I'm iterating over all the parser tokens. If I encounter a T_COMMENT token containing the string #NewFile#, I take that as a sign that I need to write my current buffer ($contents) into the file whose name I read from the previous comment. After that, I reassign $file to point to a new file (again, path and name taken from the comment), and start building the $contents buffer again.
After the loop, $file and $contents will contain all the tokens that should go in the last file, so I just do a quick check (make sure $file is set), and write whatever is in the buffer to that file.

Here is my own solution to my problem, which solved it :)
$handle = fopen(dirname(__FILE__) . '/debug.log', 'r');
$fileName = '/file';
$dir = '/home/myuser/folder';
while (($line = fgets($handle)) !== false) {
if (strpos($line, '#NewFile#') === false) {
if (file_exists($dir . $fileName)) {
file_put_contents($dir . $fileName, $line, FILE_APPEND);
} else {
preg_match("/(\/.*\/)/", $fileName, $path);
if (!is_dir($dir . $path[0])) {
mkdir($dir . $path[0], 0777, true);
}
file_put_contents($dir . $fileName, $line);
}
} else {
$fileName = str_replace(".#NewFile#", '', $line);
$fileName = trim(str_replace("#NewFile#", '', $fileName)); // drop the trailing newline kept by fgets()
}
}
fclose($handle);

PHP File handling: Remove characters [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 9 years ago.
I have a PHP function that reads the contents of a file given its path and removes leading characters until it finds a '#' (the first valid character in my file). This function works fine, but how do I reduce the execution time?
Please advise.
function foo($filepath)
{
if(($contents = file_get_contents($filepath)) !== false)
{
while ($contents[0] != '#')
$contents = substr($contents, 1);
file_put_contents($filepath, $contents);
}
return false;
}

This can be optimised in two ways: speed and memory management. Reading the entire file into memory is quite expensive and may fail entirely on large files. Instead, this will be more memory efficient, but it requires a second temporary file:
$fh = fopen($filepath, 'r');
do {
$chr = fread($fh, 1);
} while ($chr != '#' && !feof($fh));
if ($chr === '#') {
fseek($fh, -1, SEEK_CUR); // step back so the '#' itself is copied too
}
$temppath = tempnam(sys_get_temp_dir(), 'substr');
$tempfh = fopen($temppath, 'w');
stream_copy_to_stream($fh, $tempfh);
fclose($fh);
fclose($tempfh);
rename($temppath, $filepath);
Speed-wise your existing solution can be simplified to:
if (($contents = file_get_contents($filepath)) !== false) {
$index = strpos($contents, '#');
file_put_contents($filepath, substr($contents, $index));
}
But again, it's reading everything into memory, which may be an important bottleneck to begin with.

function foo($filepath)
{
if(($contents = file_get_contents($filepath)) !== false)
{
$contentExplode = explode("#",$contents );
array_shift($contentExplode);//remove chars to the first #
$contents = implode("#",$contentExplode);
file_put_contents($filepath, $contents);
}
return false;
}

print contents of file until found the word hi

I want the program to print the document contents line by line until it either reaches the end of the file or finds the word hi.
The problem is that when it finds the word hi, it prints nothing, although hi is at position 22. Why doesn't it print the preceding words, and how can I solve this?
My file contains the string "Php is a special case hi. You will use less memory using the iterative solution. Moreover, function calls in PHP are costly, so it's better to avoid function calls when you can."
Here is my code:
<?php
$contents = file_get_contents('m.txt');
$search_keyword = 'hi';
// check if word is there
$file=fopen("m.txt","r+");
while(!feof($file)&&strpos($contents, $search_keyword) == FALSE)
{
echo fgets($file)."<br>";
}
?>

Change this condition
while(!feof($file)&&strpos($contents, $search_keyword) == FALSE)
so that it tests the line that was just read rather than the whole file in $contents:
while(!feof($file)) {
    $line = fgets($file);
    echo $line."<br>";
    if(strpos($line, $search_keyword) !== FALSE) {
        break;
    }
}

You mean print the file line by line until the word 'hi' is found?
<?php
$search_keyword = 'hi';
$handle = @fopen("m.txt", "r");
if ( $handle )
{
// Read file one line at a time
while ( ($buffer = fgets($handle, 4096)) !== false )
{
echo $buffer . '<br />';
if ( preg_match('/'.preg_quote($search_keyword, '/').'/i', $buffer) )
break;
}
fclose($handle);
}
?>
You can replace preg_match with strpos if you like.

PHP- trim() function not working

I have a problem when using the trim() function with this code:
$handle = @fopen("55.txt", "r");
if ($handle) {
while (($buffer = fgets($handle, 4096)) !== false) {
$d = explode(" ", $buffer);
foreach($d as $val) {
echo '<br>'.trim($val,'.'); //why not work
}
}
if (!feof($handle)) {
echo "Error: unexpected fgets() fail\n";
}
fclose($handle);
}
The trim() doesn't trim '.'.

In addition to the line contents, fgets() also returns the line break from the file. It comes after the dot in the string, so the dot is not actually the last character and is not trimmed.
Try trimming the dots and possible line breaks at the same time:
echo '<br>'.trim($val, ".\r\n");
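A small illustration of the difference, using a made-up line in place of the contents of 55.txt (the sample text is an assumption; only the trim() calls come from the question and this answer):

```php
<?php
// A line as fgets() would return it: the text plus its trailing line break.
$buffer = "one. two.\n";

$plain = [];
$fixed = [];
foreach (explode(" ", $buffer) as $val) {
    // Stops at the "\n", so the dot in front of it survives.
    $plain[] = trim($val, '.');
    // Strips dots and line-break characters together.
    $fixed[] = trim($val, ".\r\n");
}

var_dump($plain, $fixed);
```

Only the last word of each line is affected, because it is the only piece that still carries the line break after explode().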

Do
trim(trim($val), '.');
instead of
trim($val,'.');
if you want to remove the leading and trailing '.': the inner trim() removes the surrounding whitespace, including the line break, before the dots are trimmed.
