I need to perform some modifications to PHP files (PHTML files to be exact, but they are still valid PHP files), from a Bash script. My original thought was to use sed or similar utility with regex, but reading some of the replies here for other HTML parsing questions it seems that there might be a better solution.
The problem I was facing with the regex was a lack of support for detecting if the string I wanted to match: (src|href|action)=["']/ was in <?php ?> tags or not, so that I could then either perform string concatenation if the match was in PHP tags, or add in new PHP tags should it not be. For example:
(1) <img id="icon-loader-small" src="/css/images/loader-small.gif" style="vertical-align:middle; display:none;"/>
(2) <li><span class="name"><?php echo $this->loggedInAs()?></span> | Logout</li>
(3) <?php echo ($watched_dir->getExistsFlag())?"":"<span class='ui-icon-alert'><img src='/css/images/warning-icon.png'></span>"?><span><?php echo $watched_dir->getDirectory();?></span></span><span class="ui-icon ui-icon-close"></span>
(EDIT: 4) <form method="post" action="/Preference/stream-setting" enctype="application/x-www-form-urlencoded" onsubmit="return confirm('<?php echo $this->confirm_pypo_restart_text ?>');">
In (1) there a src="/css, and as it is not in PHP tags I want that to become src="<?php echo $baseUrl?>/css. In (2), there is a PHP tag but it is not around the href="/Login, so it also becomes href="<?php echo $baseUrl?>/Login.
Unfortunately, (3) has src='/css but inside the PHP tags (it is an echoed string). It is also quoted by " in the PHP code, so the modification needs to pick up on that too. The final result would look something like: src='".$baseUrl."/css.
All the other modifications to my HTML and PHP files have been done using a regex (I know, I know...). If regexes could support matching everything except a certain pattern, like [^(<\?php)(\?>)]* then I would be flying through this part. Unfortunately it seems that this is Type 2 grammar territory. So - what should I use?
Ideally it needs to be installed by default with the GNU suite, but other tools like PHP itself or other interpreters are fine too, just not preferred. Of course, if someone could structure a regex that would work on the above examples, then that would be excellent.
EDIT: (4) is the nasty match, where most regexes will fail.
The way I solved this problem was by separating my file into sections that were encapsulated by . The script kept track of the 'context' it was currently in - by default set to html but switching to php when it hit those tags. An operation (not necessarily a regex) then performs on that section, which is then appended to the output buffer. When the file is completely processed the output buffer is written back into the file.
I attempted to do this with sed, but I faced the problem of not being able to control where newlines would be printed. The context based logic was also hardcoded meaning it would be tedious to add in a new context, like ASP.NET support for example. My current solution is written in Perl and mitigates both problems, although I am having a bit of trouble getting my regex to actually do something, but this might just be me coding my regex incorrectly.
Script is as follows:
#!/usr/bin/perl -w
use strict;
#Prototypes
sub readFile(;\$);
sub writeFile($);
#Constants
my $file;
my $outputBuffer;
my $holdBuffer;
# Regexes should have s and g modifiers
# Pattern is in $_
my %contexts = (
html => {
operation => ''
},
php => {
openTag => '<\?php |<\? ', closeTag => '\?>', operation => ''
},
js => {
openTag => '<script>', closeTag => '<\/script>', operation => ''
}
);
my $currentContext = 'html';
my $combinedOpenTags;
#Initialisation
unshift(#ARGV, '-') unless #ARGV;
foreach my $key (keys %contexts) {
if($contexts{$key}{openTag}) {
if($combinedOpenTags) {
$combinedOpenTags .= "|".$contexts{$key}{openTag};
} else {
$combinedOpenTags = $contexts{$key}{openTag};
}
}
}
#Main loop
while(readFile($holdBuffer)) {
$outputBuffer = '';
while($holdBuffer) {
$currentContext = "html";
foreach my $key (keys %contexts) {
if(!$contexts{$key}{openTag}) {
next;
}
if($holdBuffer =~ /\A($contexts{$key}{openTag})/) {
$currentContext = $key;
last;
}
}
if($currentContext eq "html") {
$holdBuffer =~ s/\A(.*?)($combinedOpenTags|\z)/$2/s;
$_ = $1;
} else {
$holdBuffer =~ s/\A(.*?$contexts{$currentContext}{closeTag}|\z)//s;
$_ = $1;
}
eval($contexts{$currentContext}{operation});
$outputBuffer .= $_;
}
writeFile($outputBuffer);
}
# readFile: read file into $_
sub readFile(;\$) {
my $argref = #_ ? shift() : \$_;
return 0 unless #ARGV;
$file = shift(#ARGV);
open(WORKFILE, "<$file") || die("$0: can't open $file for reading ($!)\n");
local $/;
$$argref = <WORKFILE>;
close(WORKFILE);
return 1;
}
# writeFile: write $_[0] to file
sub writeFile($) {
open(WORKFILE, ">$file") || die("$0: can't open $file for writing ($!)\n");
print WORKFILE $_[0];
close(WORKFILE);
}
I hope that this can be used and modified by others to suit their needs.
Related
So I've got a concept of how to do this - but actually implementing me is a bit of a stumper for myself; mostly due to my lack of regex experience - but let's get into it.
I'd like to 'parse' through a 'php' file that could contain something like the following:
<?php
function Something()
{
}
?>
<html>
<body>
<? Something(); ?>
</body>
</html>
<?php
// Some more code or something
?>
If interpreted exactly - the above is worthless jibberish - but it is a good example of what I'd like to be able to parse, or interpret...
The idea is that I would read the contents of the above file, and break it out into an ordered array of its respective pieces; while tracking what 'type' each 'segment' is, so that I can either simply echo it, or run an 'eval()' on it.
Effectively, I'd like to end up with an array something like this:
$FileSegments = array();
$FileSegments[0]['type'] = "PHP";
$FileSegments[0]['content'] = "
function Something()
{
}";
$FileSegments[1]['type'] = "HTML";
$FileSegments[1]['content'] = "
<html>
<body>";
$FileSegments[2]['type'] = "PHP";
$FileSegments[2]['content'] = "Something();"
And so on...
The initial idea was to simply 'include()' or 'require()' the file in question, and grab its output from the output buffer - but it dawned on me that I would like to be able to inject some 'top level' variables into each one of these files before evaluating the code. To do this, I would have to 'eval()' my injected code, with the contents of the file after said injection - but in order to do this with the ability to handle raw HTML in the file too, I would have to basically write a temporary clone of the whole file, that just had my injected code written before the actual contents... Cumbersome, and slow.
I hope you're all following here... If not I can clarify...
The only other piece I feel I should note before finalizing this question; is that I would like to retain any variables or symbols in general ( for instance the 'Something() function ) created in segments 0 and 2, for instance, and pass them down to segment '4'... I feel like this might be achievable using the extract method, and then manually writing in those pieces of data before my next segment executes - but again I'm shooting a little in the dark on that.
If anyone has a better approach, or can give me some brief code on just extracting these 'segments' out of a file, I would be ecstatic.
cheers
ETA: It dawns on me that I can probably pose this question a little more simply: If there isn't a 'simple' way to do the above, is there a way to handle a String in the exact same way that 'require()' and 'include()' handle a File?
<?php
$str = file_get_contents('filename.php');
// get values from starting characters
$php_full = array_filter(explode('<?php', $str));
$php = array_filter(explode('<?', $str));
$html = array_filter(explode('?>', $str));
// remove values after last expected characters
foreach ($php_full as $key => $value) {
$php_full_result[] = substr($value, 0, strpos($value, '?>'));
}
foreach ($php as $key => $value) {
if( strpos($value,'php') !== 0 )
{
$php_result[] = substr($value, 0, strpos($value, '?>'));
}
}
$html_result[] = substr($str, 0, strpos($str, '<?'));
foreach ($html as $key => $value) {
$html_result[] = substr($value, 0, strpos($value, '<?'));
}
$html_result = array_filter($html_result);
echo '<pre>';
print_r($php_full_result);
echo '</pre>';
echo '<pre>';
print_r($php_result);
echo '</pre>';
echo '<pre>';
var_dump($html_result);
echo '</pre>';
?>
This will give you 3 arrays of file segments you want, not the exact format you wanted but you can easily modify this arrays to your needs.
For "I'd like to break all of my '$GLOBALS' variables out into their 'simple' names" part you can use extract like
extract($GLOBALS);
I have a file like this
**buffer.php**
ob_start();
<h1>Welcome</h1>
{replace_me_with_working_php_include}
<h2>I got a problem..</h2>
ob_end_flush();
Everything inside the buffer is dynamically made with data from the database.
And inserting php into the database is not an option.
The issue is, I got my output buffer and i want to replace '{replace}' with a working php include, which includes a file that also has some html/php.
So my actual question is: How do i replace a string with working php-code in a output-buffer?
I hope you can help, have used way to much time on this.
Best regards - user2453885
EDIT - 25/11/14
I know wordpress or joomla is using some similar functions, you can write {rate} in your post, and it replaces it with a rating system(some rate-plugin). This is the secret knowledge I desire.
You can use preg_replace_callback and let the callback include the file you want to include and return the output. Or you could replace the placeholders with textual includes, save that as a file and include that file (sort of compile the thing)
For simple text you could do explode (though it's probably not the most efficient for large blocks of text):
function StringSwap($text ="", $rootdir ="", $begin = "{", $end = "}") {
// Explode beginning
$go = explode($begin,$text);
// Loop through the array
if(is_array($go)) {
foreach($go as $value) {
// Split ends if available
$value = explode($end,$value);
// If there is an end, key 0 should be the replacement
if(count($value) > 1) {
// Check if the file exists based on your root
if(is_file($rootdir . $value[0])) {
// If it is a real file, mark it and remove it
$new[]['file'] = $rootdir . $value[0];
unset($value[0]);
}
// All others set as text
$new[]['txt'] = implode($value);
}
else
// If not an array, not a file, just assign as text
$new[]['txt'] = $value;
}
}
// Loop through new array and handle each block as text or include
foreach($new as $block) {
if(isset($block['txt'])) {
echo (is_array($block['txt']))? implode(" ",$block['txt']): $block['txt']." ";
}
elseif(isset($block['file'])) {
include_once($block['file']);
}
}
}
// To use, drop your text in here as a string
// You need to set a root directory so it can map properly
StringSwap($text);
I might be misunderstanding something here, but something simple like this might work?
<?php
# Main page (retrieved from the database or wherever into a variable - output buffer example shown)
ob_start();
<h1>Welcome</h1>
{replace_me_with_working_php_include}
<h2>I got a problem..</h2>
$main = ob_get_clean();
# Replacement
ob_start();
include 'whatever.php';
$replacement = ob_get_clean();
echo str_replace('{replace_me_with_working_php_include}', $replacement, $main);
You can also use a return statement from within an include file if you wish to remove the output buffer from that task too.
Good luck!
Ty all for some lovely input.
I will try and anwser my own question as clear as I can.
problem: I first thought that I wanted to implement a php-function or include inside a buffer. This however is not what I wanted, and is not intended.
Solution: Callback function with my desired content. By using the function preg_replace_callback(), I could find the text I wanted to replace in my buffer and then replace it with whatever the callback(function) would return.
The callback then included the necessary files/.classes and used the functions with written content in it.
Tell me if you did not understand, or want to elaborate/tell more about my solution.
I have a file which reads as follows
<<row>> 1|test|20110404<</row>>
<<row>> 1|test|20110404<</row>>
<<row>><</row>> indicates start and end of line.I want to read line between this tags and also check whether this tags are present.
The first thing you need to do is locate the position of this "tag". The strpos() function does just that.
$tag_pos=strpos('<> 1|test|20110404<> <> 1|test|20110404<>', '<>');
if ($tag_pos===false) {
//The tag was not found!
} else {
//$tag_pos equals the numeric position of the first character of your tag
}
If these are truly lines, an efficient way to get them all is just to split on <>.
$lines=explode('<>', '<> 1|test|20110404<> <> 1|test|20110404<>');
$lines=array_filter($lines); //Removes blank strings from array
You could improve this by adding a callback function to the array_filter() call that uses trim() to remove any whitespace and then see if it is blank or not.
Edit: Great, I see that your "tags" were missing from your post. Since your start and end tags do not match, the code above will be of little use to you. Let me try again...
function strbetweenstrs($source, $tag1, $tag2, $casesensitive=true) {
$whatsleft=$source;
while ($whatsleft<>'') {
if ($casesensitive) {
$pos1=strpos($whatsleft, $str1);
$pos2=strpos($whatsleft, $str2, $pos1+strlen($str1));
} else {
$pos1=strpos(strtoupper($whatsleft), strtoupper($str1));
$pos2=strpos(strtoupper($whatsleft), strtoupper($str2), $pos1+strlen($str1));
}
if (($pos1===false) || ($pos2===false)) {
break;
}
array_push($results, substr($whatsleft, $pos1+strlen($str1), $pos2-($pos1_strlen($str1))));
$whatsleft=substr($whatsleft, $pos2+strlen($str2));
}
}
Note that I haven't tested this... but you get the generally idea. There is probably a much more efficient way to go about doing it.
Creating your own format is not so hard, but creating a script to read it can be difficult.
The advantage of using standardized formats is that most programming languages has support for them already. For example:
XML: You can use the simplexml_load_string() function and it can make you navigate easily through your content.
$str = "<?xml version="1.0" encoding="utf-8"?>
<data>
<row>1|test|20110404</row>
<row>1|test|20110404</row>
</data>";
$xml = simplexml_load_string($str);
Now you can access your data
echo $xml->row[0];
echo $xml->row[1];
i'm sure you get the idea,
there is also a very good support for JSON (Javascript Object Notation) using the jsondecode() function;
Check it on php.net for more details
i would suggest to use preg_match :-
preg_match( '#<< row>>(.*)<< /row>>#', $line, $matches);
if( ! empty($matches))
{
// line was found
print_r( $matches[1] ); // will contain the content between the start and end row tags
}
I have a text ($text) and an array of words ($tags). These words in the text should be replaced with links to other pages so they don't break the existing links in the text. In CakePHP there is a method in TextHelper for doing this but it is corrupted and it breaks the existing HTML links in the text. The method suppose to work like this:
$text=Text->highlight($text,$tags,'\1',1);
Below there is existing code in CakePHP TextHelper:
function highlight($text, $phrase, $highlighter = '<span class="highlight">\1</span>', $considerHtml = false) {
if (empty($phrase)) {
return $text;
}
if (is_array($phrase)) {
$replace = array();
$with = array();
foreach ($phrase as $key => $value) {
$key = $value;
$value = $highlighter;
$key = '(' . $key . ')';
if ($considerHtml) {
$key = '(?![^<]+>)' . $key . '(?![^<]+>)';
}
$replace[] = '|' . $key . '|ix';
$with[] = empty($value) ? $highlighter : $value;
}
return preg_replace($replace, $with, $text);
} else {
$phrase = '(' . $phrase . ')';
if ($considerHtml) {
$phrase = '(?![^<]+>)' . $phrase . '(?![^<]+>)';
}
return preg_replace('|'.$phrase.'|i', $highlighter, $text);
}
}
You can see (and run) this algorithm here:
http://www.exorithm.com/algorithm/view/highlight
It can be made a little better and simpler with a few changes, but it still isn't perfect. Though less efficient, I'd recommend one of Ben Doom's solutions.
Replacing text in HTML is fundamentally different than replacing plain text. To determine whether text is part of an HTML tag requires you to find all the tags in order not to consider them. Regex is not really the tool for this.
I would attempt one of the following solutions:
Find the positions of all the words. Working from last to first, determine if each is part of a tag. If not, add the anchor.
Split the string into blocks. Each block is either a tag or plain text. Run your replacement(s) on the plain text blocks, and re-assemble.
I think the first one is probably a bit more efficient, but more prone to programmer error, so I'll leave it up to you.
If you want to know why I'm not approaching this problem directly, look at all the questions on the site about regex and HTML, and how regex is not a parser.
This code works just fine. What you may need to do is check the CSS for the <span class="highlight"> and make sure it is set to some color that will allow you to distinguish that it is high lighted.
.highlight { background-color: #FFE900; }
Amorphous - I noticed Gert edited your post. Are the two code fragments exactly as you posted them?
So even though the original code was designed for highlighting, I understand you're trying to repurpose it for generating links - it should, and does work fine for that (tested as posted).
HOWEVER escaping in the first code fragment could be an issue.
$text=Text->highlight($text,$tags,'\1',1);
Works fine... but if you use speach marks rather than quote marks the backslashes disappear as escape marks - you need to escape them. If you don't you get %01 links.
The correct way with speach marks is:
$text=Text->highlight($text,$tags,"\\1",1);
(Notice the use of \1 instead of \1)
UPDATE:
Thank you all for your input. Some additional information.
It's really just a small chunk of markup (20 lines) I'm working with and had aimed to to leverage a regex to do the work.
I also do have the ability to hack up the script (an ecommerce one) to insert the classes as the navigation is built. I wanted to limit the number of hacks I have in place to keep things easier on myself when I go to update to the latest version of the software.
With that said, I'm pretty aware of my situation and the various options available to me. The first part of my regex works as expected. I posted really more or less to see if someone would say, "hey dummy, this is easy just change this....."
After coming close with a few of my efforts, it's more of the principle at this point. To just know (and learn) a solution exists for this problem. I also hate being beaten by a piece of code.
ORIGINAL:
I'm trying to leverage regular expressions to add a CSS a class to the first and last list items within an ordered list. I've tried a bunch of different ways but can't produce the results I'm looking for.
I've got a regular expression for the first list item but can't seem to figure a correct one out for the last. Here is what I'm working with:
$patterns = array('/<ul+([^<]*)<li/m', '/<([^<]*)(?<=<li)(.*)<\/ul>/s');
$replace = array('<ul$1<li class="first"','<li class="last"$2$3</ul>');
$navigation = preg_replace($patterns, $replace, $navigation);
Any help would be greatly appreciated.
Jamie Zawinski would have something to say about this...
Do you have a proper HTML parser? I don't know if there's anything like hpricot available for PHP, but that's the right way to deal with it. You could at least employ hpricot to do the first cleanup for you.
If you're actually generating the HTML -- do it there. It looks like you want to generate some navigation and have a .first and .last kind of thing on it. Take a step back and try that.
+1 to generating the right html as the best option.
But a completely different approach, which may or may not be acceptable to you: you could use javascript.
This uses jquery to make it easy ...
$(document).ready(
function() {
$('#id-of-ul:firstChild').addClass('first');
$('#id-of-ul:lastChild').addClass('last');
}
);
As I say, may or may not be any use in this case, but I think its a valid solution to the problem in some cases.
PS: You say ordered list, then give ul in your example. ol = ordered list, ul = unordered list
You wrote:
$patterns = array('/<ul+([^<]*)<li/m','/<([^<]*)(?<=<li)(.*)<\/ul>/s');
First pattern:
ul+ => you search something like ullll...
The m modifier is useless here, since you don't use ^ nor $.
Second pattern:
Using .* along with s is "dangerous", because you might select the whole document up to the last /ul of the page...
And well, I would just drop s modifier and use: (<li\s)(.*?</li>\s*</ul>) with replace: '$1class="last" $2'
In view of above remarks, I would write the first expression: <ul.*?>\s*<li
Although I am tired of seeing the Jamie Zawinski quote each time there is a regex question, Dustin is right in pointing you to a HTML parser (or just generating the right HTML from the start!): regexes and HTML doesn't mix well, because HTML syntax is complex, and unless you act on a well known machine generated output with very predictable result, you are prone to get something breaking in some cases.
I don't know if anyone cares any longer, but I have a solution that works in my simple test case (and I believe it should work in the general case).
First, let me point out two things: While PhiLho is right in that the s is "dangerous", since dots may match everything up to the final of the document, this may very well be what you want. It only becomes a problem with not well formed pages. Be careful with any such regex on large, manually written pages.
Second, php has a special meaning of backslashes, even in single quotes. Most regexen will perform well either way, but you should always double-escape them, just in case.
Now, here's my code:
<?php
$navigation='<ul>
<li>Coffee</li>
<li>Tea</li>
<li>Milk</li>
<li>Beer</li>
<li>Water</li>
</ul>';
$patterns = array('/<ul.*?>\\s*<li/',
'/<li((.(?<!<li))*?<\\/ul>)/s');
$replace = array('$0 class="first"',
'<li class="last"$1');
$navigation = preg_replace($patterns, $replace, $navigation);
echo $navigation;
?>
This will output
<ul>
<li class="first">Coffee</li>
<li>Tea</li>
<li>Milk</li>
<li>Beer</li>
<li class="last">Water</li>
</ul>
This assumes no line feeds inside the opening <ul...> tag. If there are any, use the s modifier on the first expression too.
The magic happens in (.(?<!<li))*?. This will match any character (the dot) that is not the beginning of the string <li, repeated any amount of times (the *) in a non-greedy fashion (the ?).
Of course, the whole thing would have to be expanded if there is a chance the list items already have the class attribute set. Also, if there is only one list item, it will match twice, giving it two such attributes. At least for xhtml, this would break validation.
You could load the navigation in a SimpleXML object and work with that. This prevents you from breaking your markup with some crazy regex :)
As a preface .. this is waaay over-complicating things in most use-cases. Please see other answers for more sanity :)
Here is a little PHP class I wrote to solve a similar problem. It adds 'first', 'last' and any other classes you want. It will handle li's with no "class" attribute as well as those that already have some class(es).
<?php
/**
* Modify list items in pre-rendered html.
*
* Usage Example:
* $replaced_text = ListAlter::addClasses($original_html, array('cool', 'awsome'));
*/
class ListAlter {
private $classes = array();
private $classes_found = FALSE;
private $count = 0;
private $total = 0;
// No public instances.
private function __construct() {}
/**
* Adds 'first', 'last', and any extra classes you want.
*/
static function addClasses($html, $extra_classes = array()) {
$instance = new self();
$instance->classes = $extra_classes;
$total = preg_match_all('~<li([^>]*?)>~', $html, $matches);
$instance->total = $total ? $total : 0;
return preg_replace_callback('~<li([^>]*?)>~', array($instance, 'processListItem'), $html);
}
private function processListItem($matches) {
$this->count++;
$this->classes_found = FALSE;
$processed = preg_replace_callback('~(\w+)="(.*?)"~', array($this, 'appendClasses'), $matches[0]);
if (!$this->classes_found) {
$classes = $this->classes;
if ($this->count == 1) {
$classes[] = 'first';
}
if ($this->count == $this->total) {
$classes[] = 'last';
}
if (!empty($classes)) {
$processed = rtrim($matches[0], '>') . ' class="' . implode(' ', $classes) . '">';
}
}
return $processed;
}
private function appendClasses($matches) {
array_shift($matches);
list($name, $value) = $matches;
if ($name == 'class') {
$value = array_filter(explode(' ', $value));
$value = array_merge($value, $this->classes);
if ($this->count == 1) {
$value[] = 'first';
}
if ($this->count == $this->total) {
$value[] = 'last';
}
$value = implode(' ', $value);
$this->classes_found = TRUE;
}
return sprintf('%s="%s"', $name, $value);
}
}