PHP preg_replace error when used on an array

We have a web app that replaces some text with other text using str_replace().
The find strings and replace strings are stored in a template file.
We want to switch from str_replace() to preg_replace() so that we can use regexes in the find strings (set in the same template file).
The original scripts contain the following PHP code.
In one file:
class SiteConfig {
// Strings to search for in HTML before processing begins (used with $replace_string)
public $find_string = array();
// Strings to replace those found in $find_string before HTML processing begins
public $replace_string = array();
// a lot of code goes here
public function append(SiteConfig $newconfig) {
foreach (array('find_string', 'replace_string') as $var) {
// append array elements for this config variable from $newconfig to this config
//$this->$var = $this->$var + $newconfig->$var;
$this->$var = array_merge($this->$var, $newconfig->$var);
}
}
// a lot of code goes here
public static function build_from_array(array $lines) {
$config = new SiteConfig();
foreach ($lines as $line) {
$line = trim($line);
// skip comments, empty lines
if ($line == '' || $line[0] == '#') continue;
// get command
$command = explode(':', $line, 2);
// if there's no colon ':', skip this line
if (count($command) != 2) continue;
$val = trim($command[1]);
$command = trim($command[0]);
//if ($command == '' || $val == '') continue;
// $val can be empty, e.g. replace_string:
if ($command == '') continue;
// strip_attr is now an alias for strip.
// In FTR 3.8 we can strip attributes from elements, not only the elements themselves
// e.g. strip: //img/#srcset (removes srcset attribute from all img elements)
// but for backward compatibility (to avoid errors with new config files + old version of FTR)
// we've introduced strip_attr and we'll recommend using that in our public site config rep.
// strip_attr: //img/#srcset
if ($command == 'strip_attr') $command = 'strip';
// check for commands where we accept multiple statements
if (in_array($command, array('title', 'body', 'author', 'date', 'strip', 'strip_id_or_class', 'strip_image_src', 'single_page_link', 'single_page_link_in_feed', 'next_page_link', 'native_ad_clue', 'http_header', 'test_url', 'find_string', 'replace_string'))) {
array_push($config->$command, $val);
// check for single statement commands that evaluate to true or false
} elseif (in_array($command, array('tidy', 'prune', 'autodetect_on_failure', 'insert_detected_image'))) {
$config->$command = ($val == 'yes');
// check for single statement commands stored as strings
} elseif (in_array($command, array('parser'))) {
$config->$command = $val;
// special treatment for test_contains
} elseif (in_array($command, array('test_contains'))) {
$config->add_test_contains($val);
// special treatment for if_page_contains
} elseif (in_array($command, array('if_page_contains'))) {
$config->add_if_page_contains_condition($val);
// check for replace_string(find): replace
} elseif ((substr($command, -1) == ')') && preg_match('!^([a-z0-9_]+)\((.*?)\)$!i', $command, $match)) {
if (in_array($match[1], array('replace_string'))) {
array_push($config->find_string, $match[2]);
array_push($config->replace_string, $val);
} elseif (in_array($match[1], array('http_header'))) {
$_header = strtolower(trim($match[2]));
$config->http_header[$_header] = $val;
}
}
}
return $config;
}
}
In another file:
public function process($html, $url, $smart_tidy=true) {
// a lot of code goes before
// do string replacements
if (!empty($this->config->find_string)) {
if (count($this->config->find_string) == count($this->config->replace_string)) {
$html = str_replace($this->config->find_string, $this->config->replace_string, $html, $_count);
$this->debug("Strings replaced: $_count (find_string and/or replace_string)");
} else {
$this->debug('Skipped string replacement - incorrect number of find-replace strings in site config');
}
unset($_count);
}
// a lot of code goes after
}
I tried to replace str_replace() with preg_replace(), but while testing it shows an error:
Warning: preg_replace(): No ending matching delimiter '>' found in this line:
$html = preg_replace($this->config->find_string, $this->config->replace_string, $html, $_count);
Where is the error, and how do I correctly replace str_replace() with preg_replace()?
I'm a complete beginner in PHP, so any help is badly needed.
Big thanks in advance!

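The warning comes from the patterns themselves: preg_replace() treats the first character of each pattern as its delimiter, so a find string that begins with '<' (as HTML usually does) is read as a pattern delimited by '<' and '>', and fails when it does not end with '>'. Wrap each find string in delimiters and escape it with preg_quote(). Rewrite your process function like this: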
public function process($html, $url, $smart_tidy=true) {
    // a lot of code goes before
    // do string replacements
    if (!empty($this->config->find_string)) {
        if (count($this->config->find_string) == count($this->config->replace_string)) {
            // wrap each find string in '/' delimiters; preg_quote() escapes regex
            // metacharacters and, given '/' as second argument, the delimiter too
            $new_config_find_string = array_map(function($new_pattern) {
                return '/' . preg_quote($new_pattern, '/') . '/';
            }, $this->config->find_string);
            // unlike str_replace(), preg_replace() takes $limit as the 4th argument and &$count as the 5th
            $html = preg_replace($new_config_find_string, $this->config->replace_string, $html, -1, $_count);
            $this->debug("Strings replaced: $_count (find_string and/or replace_string)");
        } else {
            $this->debug('Skipped string replacement - incorrect number of find-replace strings in site config');
        }
        unset($_count);
    }
    // a lot of code goes after
}
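To see why preg_quote()'s second argument matters: without it, a find string such as a/b becomes the pattern /a/b/, PHP ends the pattern at the second slash and reports an unknown modifier 'b'. Below is a minimal standalone sketch of the escaping step (literal_patterns() is a hypothetical helper name, not part of the site config code). Keep in mind that this makes every find string match literally, as str_replace() did, and that the replacement strings are still interpreted by preg_replace(), so $1 or \1 inside them would be read as backreferences.

```php
<?php
// Sketch: turn literal find strings into safe regex patterns.
// preg_quote() escapes regex metacharacters (including '<' and '>');
// passing '/' as the second argument also escapes the delimiter.
function literal_patterns(array $find): array
{
    return array_map(function ($s) {
        return '/' . preg_quote($s, '/') . '/';
    }, $find);
}

$find    = ['<b>', 'a/b'];
$replace = ['<strong>', 'a|b'];
// Each pattern is applied in turn to the subject.
echo preg_replace(literal_patterns($find), $replace, '<b>a/b</b>'), "\n";
```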

Related

Recursive regex pattern in PHP

I'd like to use tag-style annotations in HTML text to replace sections of text/HTML depending on variable names, using PHP. The replacement itself works perfectly when the tags are not nested.
But if there are nested tags, only the outer one gets replaced.
My regex is this one:
\[\#if(not)?:([a-zA-Z0-9]+)(?:=(.*?))?\].*?\[\#endif:\2\]
You can see this regex in action with example content to parse here:
https://regex101.com/r/rE3fL1/2
I've read about (?R) but can't get it to work.
I tried replacing the .*? in the middle with (.*?|(?R)) but that doesn't even change anything.
How do I change the regex to also capture nested tags?
Code ($this->output holds the text):
public function output($dbAccess = true) {
// only translate when dbaccess is granted
if ($dbAccess)
$this->localize();
// insert values into template
foreach ( $this->values as $key => $value ) {
$tagToReplace = "[#$key]";
$this->output = str_replace ( $tagToReplace, $value, $this->output );
}
// gather conditional content sections from output
$condis = array();
$conmatches = array ();
preg_match_all ( '/\[\#if(not)?:([a-zA-Z0-9]+)(?:=(.*?))?\].*?\[\#endif:\2\]/s', $this->output, $conmatches );
if (count($conmatches) > 0) {
$c = $conmatches[0];
// if (count($c) > 0)
// echo "found " . count($c[0]) . " conditional tpl statement matches!";
for ($i=0; $i<count($c); $i++) {
$text = $c[$i];
$not = $conmatches[1][$i];
$name = $conmatches[2][$i];
$value = $conmatches[3][$i];
$condis[] = new ConditionalContent($text, $not, $name, $value);
}
// substitute conditional content sections
foreach ($condis as $cc) {
// convenience and readability vars!
$varname = $cc->name();
$vals = &$this->values;
$value = $cc->value();
// if condition is bound to value of variable and not just existence
if ($value != "") {
// del if name == exists(value)
if ($cc->not() && isset($vals[$varname])) {
if ($vals[$varname] == $value) {
$this->delContent($cc->content());
}
}
// del if not exists(value) or value != name
else {
if (!isset($vals[$varname]) || $vals[$varname] != $value) {
$this->delContent($cc->content());
}
}
}
else {
if ( isset($vals[$varname]) && $cc->not() ||
!isset($vals[$varname]) && !$cc->not()) {
$this->delContent($cc->content());
}
}
}
// delete all left over if(not) and endif statements
$this->output = preg_replace('/\[#(?:if(?:not){0,1}|endif):[a-zA-Z0-9]+(=.*?)?\]/', '', $this->output);
}
//else { echo "found no conditional tpl statements"; }
return $this->output;
}
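The outer block is the only one handled because preg_match_all() never returns overlapping matches: once the outer block has been matched, the blocks nested inside it are skipped. One alternative that avoids (?R) altogether (a different technique from the recursive pattern asked about) is to resolve blocks from the inside out: match only blocks whose body contains no further [#if, evaluate them, and repeat until nothing changes. A minimal sketch, assuming only the existence checks [#if:name] and [#ifnot:name] are needed; the =value form and the ConditionalContent class from the question are left out, and resolve_conditionals() is a hypothetical name.

```php
<?php
// Sketch: resolve nested [#if:name]...[#endif:name] blocks from the inside out.
// Assumption: only existence checks ([#if:name] / [#ifnot:name]) are supported.
function resolve_conditionals(string $text, array $values): string
{
    // The body may not contain another "[#if", so only innermost blocks match.
    $pattern = '/\[#if(not)?:([a-zA-Z0-9]+)\]((?:(?!\[#if).)*?)\[#endif:\2\]/s';
    do {
        $text = preg_replace_callback($pattern, function (array $m) use ($values) {
            $negate = $m[1] !== '';          // was "not" present?
            $exists = isset($values[$m[2]]);
            // Keep the body when the condition holds, otherwise drop the whole block.
            return ($exists xor $negate) ? $m[3] : '';
        }, $text, -1, $count);
    } while ($count > 0);                   // repeat until no innermost block is left
    return $text;
}

echo resolve_conditionals('A[#if:x]B[#if:y]C[#endif:y]D[#endif:x]E', ['x' => 1]), "\n";
```

Each pass removes at least one tag pair, so the loop terminates; any unmatched leftover tags can still be stripped with the final preg_replace() from the question.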

Data Not Being Parsed Correctly

I have a simple data format that goes as follows:
stuff/stuff/stuff
An example would be:
data/test/hello/hello2
To retrieve a certain piece of data, you use my parser, which is meant to work like this:
in data/test/hello/hello2,
you want to retrieve the data under data/test (which is hello). My parser's code is below:
function getData($data, $pattern)
{
$info = false;
$dataLineArray = explode("\n", $data);
foreach($dataLineArray as &$line)
{
if (strpos($line,$pattern) !== false) {
$lineArray = explode("/", $line);
$patternArray = explode("/", $pattern);
$iteration = 0;
foreach($lineArray as &$lineData)
{
if($patternArray[$iteration] == $lineData)
{
$iteration++;
}
else
{
$info = $lineData;
}
}
}
}
return $info;
}
However, it always seems to return the last item, which in this case is hello2:
echo getData("data/test/hello/hello2", "data/test");
Gives me:
hello2
What am I doing wrong?
If you want the first element after the pattern, put a break in the loop:
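foreach ($lineArray as $lineData)
{
    // guard the index: once the whole pattern has matched, $patternArray[$iteration] does not exist
    if ($iteration < count($patternArray) && $patternArray[$iteration] == $lineData)
    {
        $iteration++;
    }
    elseif ($iteration == count($patternArray))
    {
        $info = $lineData;
        break;
    }
}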
I also check $iteration == count($patternArray) so that it won't return intermediate elements, e.g.
/data/foo/test/hello/hello2
will return hello rather than foo.
P.S. There doesn't seem to be any reason to use references instead of ordinary variables in your loops, since you never assign to the reference variables.
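For reference, here is a sketch of the complete function with that fix folded in. It is a hypothetical rewrite, not the asker's code: it guards $patternArray[$iteration] so the index is never read past the end of the pattern, uses plain loop variables instead of references, and returns on the first matching line instead of letting later lines overwrite the result.

```php
<?php
// Sketch: return the first path element that follows $pattern, or false.
function getData(string $data, string $pattern)
{
    $patternArray = explode('/', $pattern);
    foreach (explode("\n", $data) as $line) {
        if (strpos($line, $pattern) === false) {
            continue; // this line cannot contain the pattern
        }
        $iteration = 0;
        foreach (explode('/', $line) as $lineData) {
            if ($iteration < count($patternArray) && $patternArray[$iteration] == $lineData) {
                $iteration++;          // still matching the pattern prefix
            } elseif ($iteration == count($patternArray)) {
                return $lineData;      // first element after the full pattern
            }
        }
    }
    return false;
}

var_dump(getData("data/test/hello/hello2", "data/test"));
```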

Parsing Out Code Between Comment Blocks PHP

Let's say I have the following piece of code in a PHP file:
/**
* #SomethingStart
*/
protected static $var1 = '1';
protected static $var2 = '2';
protected static $var3 = '3';
/**
* #SomethingEnd
*/
I am trying to figure out, first, how to parse out the content between the comments containing #SomethingStart and #SomethingEnd (not including the comments themselves), and second, how to replace the content between those two tags.
You can get the contents of the file with the file() function:
http://www.php.net/manual/en/function.file.php
It returns an array of lines. Then you can loop over them with foreach and match each line's content with preg_match():
$switch = false;
$lines = file('filepath');
$string = '';
foreach ($lines as $k => $v)
{
    // end marker reached: stop collecting
    if (preg_match('/#(.*)End$/', $v))
    {
        $switch = false;
        break;
    }
    if ($switch == true)
    {
        // do replacements, or anything you want with the following lines,
        // or add or remove lines, even if you might have some problems with it;
        // for this you might consider array_walk instead of foreach
    }
    // start marker found: the following lines are the content
    if (preg_match('/#(.*)Start$/', $v))
    {
        $switch = true;
    }
    $string .= $v;
}
echo $string;
For array_walk, read this http://www.php.net/manual/en/function.array-walk.php
Try it.
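As an alternative to the line-by-line loop, the whole file (as returned by file_get_contents()) can be handled with one regex. A sketch, assuming each marker sits in its own /** ... */ block exactly as in the question; extractBetween() and replaceBetween() are hypothetical helper names. The s modifier lets . match newlines, and preg_replace_callback() is used so that $ or \ in the new content are inserted literally.

```php
<?php
// Sketch: read or replace what lies between the #SomethingStart and
// #SomethingEnd doc-comments. Groups 1 and 3 are the marker comments,
// group 2 is the content between them.
$pattern = '~(/\*\*\s*\*\s*#SomethingStart\s*\*/\s*\n)(.*?)(/\*\*\s*\*\s*#SomethingEnd\s*\*/)~s';

function extractBetween(string $source, string $pattern): ?string
{
    return preg_match($pattern, $source, $m) ? $m[2] : null;
}

function replaceBetween(string $source, string $pattern, string $newBody): string
{
    // A callback inserts $newBody literally ($ and \ are not treated specially).
    return preg_replace_callback($pattern, function (array $m) use ($newBody) {
        return $m[1] . $newBody . $m[3];
    }, $source);
}

// Hypothetical usage on a real file:
// $src = file_get_contents('filepath');
// echo extractBetween($src, $pattern);
// file_put_contents('filepath', replaceBetween($src, $pattern, "protected static \$var1 = 'new';\n"));
```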

php string comparison error parsing an ini file

I have a simple ini file:
[section_one]
test = abc
[section_two]
yada = blah
#and_so=on
I wrote a parser function to update it b/c my comment char is '#' instead of ';' -- so parse_ini_file() complains. But here's my quick & dirty solution:
<?php
function edit_ini_file ($fName, $fKey, $fVal) {
print"<h4>Search: $fKey = $fVal </h4>";
// declarations
$_comntChar='#';
$_headrChar='[';
$keyMatch=FALSE;
$iniArray = array(); // new array in memory
$dataOut = ''; // datastream for output file
// temp cursor vars for looping & reporting
$verbose = 1;
$curSec = ''; // current section
$curKey = ''; // current key
$curVal = ''; // current value
$curLine=-1; // current line Number
if (isset($fName)) {
if (!is_file($fName)) return FALSE;
$lines = file($fName);
//read file as array of lines
foreach ($lines as $line) {
$curLine+=1;
if ($verbose) print '<br/>['.$curLine.'][IN:] '.$line;
//parse for k/v pairs, comments & section headings
if ( (strpos($line,$_headrChar)==1) // assume heading
|| (strpos($line,$_comntChar)==1) // assume comment
|| (!strpos($line,'=')) // also skip invalid k/v pairs
){
array_push($iniArray, $lines[$curLine] ); //stuff the entire line into array.
if ($verbose) print " - no k/v";
} else { // assume valid k/v pair
//split k/v pairs & parse for match
$pair = explode('=', $line);
$curKey = trim($pair[0]);
$curVal = trim($pair[1]);
if ($verbose) print "[KV]: k=$curKey:v=$curVal";
if (trim($curKey) === trim($fkey)) { // <=== THE BUGGER: never returns true:
$keyMatch=TRUE;
print ("MATCH: Replacing value in for key=$curKey in Section $curSec at line $curLine<br/>");
array_push ($iniArray, array($curKey => $fVal ));
} else {
array_push ($iniArray, array($curKey => $curVal ));
} //end-matcher
} //end-parser
} //end foreach
if (!$keyMatch) { //append new data to end
print "<br/>Key not Found. Appending! <br/>";
array_push ($iniArray, array($fKey => $fVal) );
}
//reformat nested array as one long string for a single bulk-write to disk.
foreach($iniArray as $curSect => $val) {
if (is_array($val)) {
foreach($val as $curKey => $curVal)
$dataOut .= "$curKey = $curVal\n";
} else { $dataOut .= "$val"; }
}
print "dataout:<pre>" .$dataOut. "</pre>";
//put file & pass return val
return (file_put_contents($filename, $dataOut)) ? TRUE : FALSE;
}//if isset
}//end-func
Basically I'm just exploding a text file line-by-line stuffing a new array and dumping it back to the disk
MY BUG: for some reason my comparison never seems to work, whether I use strcmp(), "==" or "===":
if (trim($curKey) === trim($fkey)) { doSomething.. }
That little BUGGER is driving me nuts b/c I know it's gotta be something stupid.
Any pointers in the right direction would be appreciated...
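Is it $fKey or $fkey? Variable names in PHP are case-sensitive, so inside your function $fkey is a different, undefined variable (PHP treats it as null, so trim($fkey) is an empty string that never equals a real key).
Make a decision!
;)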
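For comparison, here is a sketch of the key-matching pass with the comparison bug fixed; update_ini_lines() is a hypothetical helper, not the asker's function. It also avoids two other pitfalls in the original: strpos($line, $_headrChar) == 1 tests whether '[' is the second character (the first character is position 0), and !strpos($line, '=') treats an '=' at position 0 as missing. Note as well that the original writes to $filename, which is never defined; presumably $fName was meant, e.g. file_put_contents($fName, implode("\n", $out) . "\n").

```php
<?php
// Sketch of the key-matching pass with the comparison bug fixed.
function update_ini_lines(array $lines, string $fKey, string $fVal): array
{
    $out = [];
    $keyMatch = false;
    foreach ($lines as $line) {
        $trimmed = trim($line);
        // Headings, '#' comments, blank lines and lines without '=' are kept as-is.
        if ($trimmed === '' || $trimmed[0] === '[' || $trimmed[0] === '#'
            || strpos($trimmed, '=') === false) {
            $out[] = $trimmed;
            continue;
        }
        // Split on the first '=' only, so values may themselves contain '='.
        [$curKey, $curVal] = array_map('trim', explode('=', $trimmed, 2));
        // One consistent spelling of the parameter: $fKey, never $fkey.
        if ($curKey === trim($fKey)) {
            $keyMatch = true;
            $curVal = $fVal;
        }
        $out[] = "$curKey = $curVal";
    }
    if (!$keyMatch) {
        $out[] = "$fKey = $fVal"; // key not found: append it, like the original
    }
    return $out;
}

$lines = ["[section_one]\n", "test = abc\n", "#and_so=on\n"];
echo implode("\n", update_ini_lines($lines, 'test', 'xyz')), "\n";
```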

Regex to parse define() contents, possible?

I am very new to regex, and this is way too advanced for me. So I am asking the experts over here.
Problem
I would like to retrieve the constants / values from a php define()
DEFINE('TEXT', 'VALUE');
Basically I would like a regex that returns the name of the constant and the value of the constant from the above line: just TEXT and VALUE. Is this even possible?
Why do I need it? I am dealing with a language file, and I want to get all (name, value) pairs and put them in an array. I managed to do it with str_replace(), trim() etc., but that way is long and I am sure it could be done more easily with a single regex.
Note: the VALUE may contain escaped single quotes as well, for example:
DEFINE('TEXT', 'J\'ai');
I hope I am not asking for something too complicated. :)
Regards
For any kind of grammar-based parsing, regular expressions are usually an awful solution. Even simple grammars (like arithmetic) have nesting, and nesting in particular is where regular expressions just fall over.
Fortunately PHP provides a far, far better solution: the token_get_all() function gives you access to the same lexical analyzer the PHP interpreter uses. Give it a string of PHP code and it'll split it into tokens ("lexemes"), which you can then parse with a fairly simple finite state machine.
Save this program as test.php and run it (it reads test.php, so it parses itself). The file is deliberately formatted badly so you can see that it handles that with ease.
<?php
define('CONST1', 'value' );
define (CONST2, 'value2');
define( 'CONST3', time());
define('define', 'define');
define("test", VALUE4);
define('const5', //
'weird declaration'
) ;
define('CONST7', 3.14);
define ( /* comment */ 'foo', 'bar');
$defn = 'blah';
define($defn, 'foo');
define( 'CONST4', define('CONST5', 6));
header('Content-Type: text/plain');
$defines = array();
$state = 0;
$key = '';
$value = '';
$file = file_get_contents('test.php');
$tokens = token_get_all($file);
$token = reset($tokens);
while ($token) {
// dump($state, $token);
if (is_array($token)) {
if ($token[0] == T_WHITESPACE || $token[0] == T_COMMENT || $token[0] == T_DOC_COMMENT) {
// do nothing
} else if ($token[0] == T_STRING && strtolower($token[1]) == 'define') {
$state = 1;
} else if ($state == 2 && is_constant($token[0])) {
$key = $token[1];
$state = 3;
} else if ($state == 4 && is_constant($token[0])) {
$value = $token[1];
$state = 5;
}
} else {
$symbol = trim($token);
if ($symbol == '(' && $state == 1) {
$state = 2;
} else if ($symbol == ',' && $state == 3) {
$state = 4;
} else if ($symbol == ')' && $state == 5) {
$defines[strip($key)] = strip($value);
$state = 0;
}
}
$token = next($tokens);
}
foreach ($defines as $k => $v) {
echo "'$k' => '$v'\n";
}
function is_constant($token) {
return $token == T_CONSTANT_ENCAPSED_STRING || $token == T_STRING ||
$token == T_LNUMBER || $token == T_DNUMBER;
}
function dump($state, $token) {
if (is_array($token)) {
echo "$state: " . token_name($token[0]) . " [$token[1]] on line $token[2]\n";
} else {
echo "$state: Symbol '$token'\n";
}
}
function strip($value) {
return preg_replace('!^([\'"])(.*)\1$!', '$2', $value);
}
?>
Output:
'CONST1' => 'value'
'CONST2' => 'value2'
'CONST3' => 'time'
'define' => 'define'
'test' => 'VALUE4'
'const5' => 'weird declaration'
'CONST7' => '3.14'
'foo' => 'bar'
'CONST5' => '6'
This is basically a finite state machine that looks for the pattern:
function name ('define')
open parenthesis
constant
comma
constant
close parenthesis
in the lexical stream of a PHP source file and treats the two constants as a (name,value) pair. In doing so it handles nested define() statements (as per the results) and ignores whitespace and comments as well as working across multiple lines.
Note: I've deliberately made it ignore the cases where functions or variables are used as constant names or values, but you can extend it to handle them if you wish.
It's also worth pointing out that PHP is quite forgiving when it comes to strings. They can be declared with single quotes, double quotes or (in certain circumstances) with no quotes at all. The unquoted form can be (as pointed out by Gumbo) an ambiguous reference to a constant, and you have no way of knowing which it is (no guaranteed way, anyway), giving you the choice of:
(1) Ignoring that style of string (T_STRING);
(2) Seeing if a constant has already been declared with that name and replacing it with its value. There's no way you can know what other files have been included, though, nor can you process any defines that are conditionally created, so you can't say with any certainty whether anything is definitely a constant or what value it has; or
(3) Just living with the possibility that these might be constants (which is unlikely) and treating them as strings.
Personally I would go for (1) then (3).
This is possible, but I would rather use get_defined_constants(). Just make sure all your translations have something in common (e.g. all translation names start with T), so you can tell them apart from other constants.
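A minimal sketch of the get_defined_constants() route, assuming the language file contains nothing but define() calls and that every translation constant shares a prefix (T here, as suggested above); translationsFrom() and the path in the usage comment are hypothetical. Because it actually executes the file, the constants stay defined for the rest of the script.

```php
<?php
// Sketch: execute the language file, then collect the constants it defined.
function translationsFrom(string $file, string $prefix): array
{
    require_once $file; // runs the define() calls; constants are global afterwards
    // With true, get_defined_constants() groups constants by extension;
    // those created with define() are listed under 'user'.
    $user = get_defined_constants(true)['user'] ?? [];
    return array_filter($user, function ($name) use ($prefix) {
        return strpos($name, $prefix) === 0; // case-sensitive prefix check
    }, ARRAY_FILTER_USE_KEY);
}

// Hypothetical usage:
// print_r(translationsFrom('lang/fr.php', 'T'));
```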
Try this regular expression to find the define calls:
/\bdefine\(\s*("(?:[^"\\]+|\\(?:\\\\)*.)*"|'(?:[^'\\]+|\\(?:\\\\)*.)*')\s*,\s*("(?:[^"\\]+|\\(?:\\\\)*.)*"|'(?:[^'\\]+|\\(?:\\\\)*.)*')\s*\);/is
So:
$pattern = '/\\bdefine\\(\\s*("(?:[^"\\\\]+|\\\\(?:\\\\\\\\)*.)*"|\'(?:[^\'\\\\]+|\\\\(?:\\\\\\\\)*.)*\')\\s*,\\s*("(?:[^"\\\\]+|\\\\(?:\\\\\\\\)*.)*"|\'(?:[^\'\\\\]+|\\\\(?:\\\\\\\\)*.)*\')\\s*\\);/is';
$str = '<?php define(\'foo\', \'bar\'); define("define(\\\'foo\\\', \\\'bar\\\')", "define(\'foo\', \'bar\')"); ?>';
preg_match_all($pattern, $str, $matches, PREG_SET_ORDER);
var_dump($matches);
I know that eval is evil, but it's the best way to evaluate the string expressions:
$constants = array();
foreach ($matches as $match) {
    // $match[1] is the quoted name, $match[2] the quoted value
    eval('$constants[' . $match[1] . '] = ' . $match[2] . ';');
}
var_dump($constants);
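If, as in the question, every name and value is a single-quoted literal, eval() can be avoided: inside single quotes PHP only recognises the escapes \' and \\, so undoing those two is enough. A sketch under that assumption (double-quoted literals, with their extra escapes and interpolation, are not handled; parseSingleQuotedDefines() and decodeSingleQuoted() are hypothetical names):

```php
<?php
// Decode a single-quoted PHP string literal such as 'J\'ai' without eval().
function decodeSingleQuoted(string $literal): string
{
    $inner = substr($literal, 1, -1); // drop the surrounding quotes
    // strtr() with an array tries the longest keys first and never rescans
    // replaced text, so each \\ and \' is undone exactly once.
    return strtr($inner, ['\\\\' => '\\', "\\'" => "'"]);
}

function parseSingleQuotedDefines(string $source): array
{
    // Each literal: a quote, then non-quote/non-backslash chars or backslash escapes, then a quote.
    $pattern = '/define\(\s*(\'(?:[^\'\\\\]|\\\\.)*\')\s*,\s*(\'(?:[^\'\\\\]|\\\\.)*\')\s*\);/i';
    preg_match_all($pattern, $source, $matches, PREG_SET_ORDER);
    $constants = [];
    foreach ($matches as $match) {
        $constants[decodeSingleQuoted($match[1])] = decodeSingleQuoted($match[2]);
    }
    return $constants;
}

var_dump(parseSingleQuotedDefines("define('TEXT', 'J\\'ai'); define('PATH', 'C:\\\\tmp');"));
```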
You might not need to go overboard with the regex complexity - something like this will probably suffice
/DEFINE\('(.*?)',\s*'(.*)'\);/
Here's a PHP sample showing how you might use it
$lines=file("myconstants.php");
foreach($lines as $line) {
$matches=array();
if (preg_match('/DEFINE\(\'(.*?)\',\s*\'(.*)\'\);/i', $line, $matches)) {
$name=$matches[1];
$value=$matches[2];
echo "$name = $value\n";
}
}
Not every problem with text should be solved with a regexp, so I'd suggest you state what you want to achieve and not how.
So, instead of using PHP's parser, which is not really useful here, or a completely undebuggable regexp, why not write a simple parser?
<?php
$str = "define('nam\\'e', 'va\\\\\\'lue');\ndefine('na\\\\me2', 'value\\'2');\nDEFINE('a', 'b');";
function getDefined($str) {
    $lines = array();
    preg_match_all('#^define[(][ ]*(.*?)[ ]*[)];$#mi', $str, $lines);
    $res = array();
    foreach ($lines[1] as $cnt) {
        $p = 0;
        $key = parseString($cnt, $p);
        // Skip comma
        $p++;
        // Skip space
        while ($p < strlen($cnt) && $cnt[$p] == " ") {
            $p++;
        }
        $value = parseString($cnt, $p);
        $res[$key] = $value;
    }
    return $res;
}
function parseString($s, &$p) {
    $quotechar = $s[$p];
    if (! in_array($quotechar, array("'", '"'))) {
        throw new Exception("Invalid quote character '" . $quotechar . "', input is " . var_export($s, true) . " # " . $p);
    }
    $len = strlen($s);
    $quoted = false;
    $res = "";
    for ($p++; $p < $len; $p++) {
        if ($quoted) {
            // previous char was a backslash: take this one literally
            $quoted = false;
            $res .= $s[$p];
        } else {
            if ($s[$p] == "\\") {
                $quoted = true;
                continue;
            }
            if ($s[$p] == $quotechar) {
                // closing quote: leave $p just past it
                $p++;
                return $res;
            }
            $res .= $s[$p];
        }
    }
    throw new Exception("Premature end of line");
}
var_dump(getDefined($str));
Output:
array(3) {
["nam'e"]=>
string(7) "va\'lue"
["na\me2"]=>
string(7) "value'2"
["a"]=>
string(1) "b"
}
