How do I insert html code into a php regex? - php

I am currently building forum software, and one thing I wanted for it was a custom template engine. For the most part the template engine is done, but I am having a small issue with the regex that I use for my IF, ELSEIF, and FOREACH statements.
The issue is that when the text between the tags is a chunk of HTML, nothing works. Here is an example: https://regex101.com/r/jlawz3/1.
Here is the PHP code that checks for the regex.
$isMatchedAgain = preg_match_all('/{IF:(.*?)}[\s]*?(.*?)[\s]*?{ELSE}[\s]*?(.*?)[\s]*?{ENDIF}/', $this->template, $elseifmatches);
for ($i = 0; $i < count($elseifmatches[0]); $i++) {
$condition = $elseifmatches[1][$i];
$trueval = $elseifmatches[2][$i];
$falseval = (isset($elseifmatches[3][$i])) ? $elseifmatches[3][$i] : false;
$res = eval('return ('.$condition.');');
if ($res===true) {
$this->template = str_replace($elseifmatches[0][$i],$trueval,$this->template);
} else {
$this->template = str_replace($elseifmatches[0][$i],$falseval,$this->template);
}
}

You can do it like this:
function render($content) {
$match = preg_match_all('/{IF:\((.*?)\)}(.*?){ELSE}(.*?)({ENDIF})/s', $content, $matches, PREG_OFFSET_CAPTURE);
if (!$match) {
return $content;
}
$beforeIf = substr($content, 0, $matches[0][0][1]);
$afterIf = substr($content, $matches[4][0][1] + strlen('{ENDIF}'));
$evalCondition = eval('return (' . $matches[1][0][0] . ');');
if ($evalCondition) {
$ifResult = $matches[2][0][0];
} else {
$ifResult = $matches[3][0][0];
}
return
$beforeIf .
$ifResult .
render($afterIf);
}
This is a first step. It won't work if, for example, you have an IF within an IF.
About the security risk that was mentioned: we are using eval (nicknamed "evil" for a reason). You should never, ever process user input through eval, and ideally you should not use eval at all; there is always a better solution.
To me it looks like you want to give users the ability to write "code" in their posts. If that is the case, you can have a look at something like BBCode.
Whatever you do, provide the desired functionality through dedicated tags rather than arbitrary code. Taking your example:
!isset($_SESSION['loggedin'])
You could do something like this:
{IS_LOGGED_IN}
Output whatever you want :)
{/IS_LOGGED_IN}
Your renderer would look specifically for this tag and act accordingly.
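A minimal sketch of such a tag renderer, assuming PHP 7 or later. The function name renderFlags and the $flags array (tag name mapped to a boolean that your application has already computed) are made up for illustration. Nothing from the template is ever passed to eval. Nested tags with the same name are not handled.

```php
<?php
// Hypothetical helper: keeps the body of {NAME}...{/NAME} when $flags['NAME']
// is truthy and drops the whole block otherwise.
function renderFlags(string $template, array $flags): string
{
    // \1 forces the closing tag to carry the same name as the opening tag;
    // the s modifier lets . match newlines inside the block body.
    $result = preg_replace_callback(
        '/\{([A-Z_]+)\}(.*?)\{\/\1\}/s',
        function (array $m) use ($flags): string {
            return !empty($flags[$m[1]]) ? $m[2] : '';
        },
        $template
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $template;
}

// Usage: the application decides what "logged in" means, not the template.
$page = renderFlags(
    "Hi {IS_LOGGED_IN}member{/IS_LOGGED_IN}!",
    ['IS_LOGGED_IN' => true]
);
```

The template author can only switch blocks on or off; the condition itself lives in your PHP code.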

After a bit of research, I figured out that the issue could be solved by adding the ims modifiers to the end of my regex. The s modifier is the one that matters here: it lets . match newlines, so the pattern can span a multi-line chunk of HTML. My regex now looks like this:
/{IF:(.*?)}[\s]*?(.*?)[\s]*?{ELSE}[\s]*?(.*?)[\s]*?{ENDIF}/ims
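To see the effect of the s modifier in isolation, here is a small sketch. The template string and the helper name templateMatches are made up for illustration, and the braces are escaped only to be explicit.

```php
<?php
// Hypothetical template whose IF body spans two lines.
$template = "{IF:\$loggedIn}\n<p>Welcome</p>\n<p>back</p>\n{ELSE}\n<p>Please log in</p>\n{ENDIF}";

// The same pattern, without and with the s (dot-matches-newline) modifier.
$withoutS = '/\{IF:(.*?)\}\s*(.*?)\s*\{ELSE\}\s*(.*?)\s*\{ENDIF\}/';
$withS    = '/\{IF:(.*?)\}\s*(.*?)\s*\{ELSE\}\s*(.*?)\s*\{ENDIF\}/s';

// Hypothetical helper: true when the pattern finds an IF/ELSE block.
function templateMatches(string $pattern, string $template): bool
{
    return preg_match($pattern, $template) === 1;
}

var_dump(templateMatches($withoutS, $template)); // the two-line body blocks the match
var_dump(templateMatches($withS, $template));    // . may now cross the line break
```

The i and m modifiers make no difference for this pattern as written: it contains no ^ or $ anchors, and i only matters if you want {if:...} to match as well.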

Related

In PHP is there a better way to do dynamic templating

I'm creating a module for an application which will have simple custom templating, with tags that will be replaced with data from a database. The field names will be different in each instance of this module. I want to know if there is a better way to do this.
The code below is what I've come up with, but I believe there must be a better way. I struggled with preg_split and preg_match_all and just hit my limit, so I did it the dumb person way.
<?php
$customTemplate = "
<div>
<<This>>
<<that>>
</div>
";
function process_template ($template, $begin = '<<', $end = '>>') {
$begin_exploded = explode($begin, $template);
if (is_array($begin_exploded)) {
foreach ($begin_exploded as $key1 => $value1) {
$end_exploded = explode($end, $value1);
if (is_array($end_exploded)) {
foreach ($end_exploded as $key2 => $value2) {
$tag = $begin.$value2.$end;
$variable = trim($value2);
$find_it = strpos($template,$tag);
if ($find_it !== false) {
//str_replace ($tag, $MyClass->get($variable), $template );
$template = str_replace ($tag, $variable, $template);
}
}
}
}
}
return $template;
}
echo(process_template($customTemplate));
/* Will Echo
<div>
This
that
</div>
*/
?>
In the future I will connect $MyClass->get() to replace the tag with the proper data. And the custom template will be built by the user.
Rather than preg_split or preg_match, I would use preg_replace_callback, since you are doing replacements and the replacement value will apparently come from a method in another class.
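function process_template($template, $MyClass, $begin = '<<', $end = '>>') {
    // $MyClass is passed in as a parameter so the callback can reach it.
    // preg_quote() keeps custom delimiters from being read as regex syntax.
    $pattern = '/' . preg_quote($begin, '/') . '(\w+)' . preg_quote($end, '/') . '/';
    return preg_replace_callback($pattern, function ($var) use ($MyClass) {
        return $MyClass->get($var[1]);
    }, $template);
}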
Here's an example to play with: https://3v4l.org/N1p03
I assume this is just for fun/learning. If I really needed to use a template for something, I would rather start with composer require "twig/twig:^2.0" instead. In fact, if you're interested in learning more about how it works, you could check out how a well-established system like Twig or Blade does it. (Better than I've done it in this answer.)
There are tons of templating engines around, but sometimes they just add complexity and dependencies for what may be a simple thing. This is a modified sample of what I used to make some JavaScript corrections. It works for your template.
function process_template($html,$b='<<',$e='>>'){
$replace=['this'=>'<input name="this" />','that'=>'<input name="that" />'];
if(preg_match_all('/('.$b.')(.*?)('.$e.')/is',$html,$matches,PREG_SET_ORDER|PREG_OFFSET_CAPTURE)){
$t='';$o=0;
foreach($matches as $m){
//for reference: $m[1][0] contains $b, $m[2][0] contains the tag name, $m[3][0] contains $e
$t.=substr($html,$o,$m[0][1]-$o);
$t.=$replace[$m[2][0]];
$o=$m[3][1]+strlen($m[3][0]);
}
$t.=substr($html,$o);
$html=$t;
}
return $html;
}
$html="
<div>
<<this>>
<<that>>
</div>
";
$new=process_template($html);
echo $new;
For demo purposes I used the array $replace to handle the substitutions. Replace it with your own function that handles the replacement.
Here is a working snippet: https://3v4l.org/MBnbR
I like this function because you have control over what to replace and what to put back into the final result. By the way, with PREG_OFFSET_CAPTURE the matches also contain the position where each regex group was found. The positions are in $m[x][1]; the captured text is in $m[x][0].
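A tiny sketch of what PREG_OFFSET_CAPTURE returns, in case the nested arrays are hard to picture. The sample string is made up, and the array destructuring needs PHP 7.1 or later.

```php
<?php
$html = 'ab <<this>> cd';

// With PREG_OFFSET_CAPTURE every entry of $m is a pair: [matched text, byte offset].
preg_match('/<<(\w+)>>/', $html, $m, PREG_OFFSET_CAPTURE);

[$fullText, $fullOffset] = $m[0]; // the whole match, delimiters included
[$tagName, $tagOffset]   = $m[1]; // the first capture group

// The offset plus the length of the whole match is where the rest of the string starts.
$rest = substr($html, $fullOffset + strlen($fullText));
```

The offsets are byte offsets into the subject string, which is why they can be fed straight into substr() as in the function above.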

Adding a query string to a random URL

I want to add a query string to a URL, however, the URL format is unpredictable. The URL can be
http://example.com/page/ -> http://example.com/page/?myquery=string
http://example.com/page -> http://example.com/page?myquery=string
http://example.com?p=page -> http://example.com?p=page&myquery=string
These are the URLs I'm thinking of, but it's possible that there are other formats that I'm not aware of.
I'm wondering if there is a standard, library or a common way to do this. I'm using PHP.
Edit: I'm using Cbroe's explanation and Passerby's code. There is another function by Hamza, but I guess it's better to use PHP's built-in functions and have cleaner, shorter code.
function addQuery($url,array $query)
{
$cache=parse_url($url,PHP_URL_QUERY);
if(empty($cache)) return $url."?".http_build_query($query);
else return $url."&".http_build_query($query);
}
// test
$test=array("http://example.com/page/","http://example.com/page","http://example.com/?p=page");
print_r(array_map(function($v){
return addQuery($v,array("myquery"=>"string"));
},$test));
I'm wondering if there is a standard, library or a common way to do this. I'm using PHP.
Depends on how failsafe – and thereby more complex – you want it to be.
The simplest way would be to look for whether there’s a ? in the URL – if so, append &myquery=string, else append ?myquery=string. This should cover most cases of standards-compliant URLs just fine.
If you want it more complex, you could take the URL apart using parse_url and then parse_str, then add the key myquery with value string to the array the second function returns – and then put it all back together again, using http_build_query for the new query string part.
Some spaghetti Code:
echo addToUrl('http://example.com/page/','myquery', 'string').'<br>';
echo addToUrl('http://example.com/page','myquery', 'string').'<br>';
echo addToUrl('http://example.com/page/wut/?aaa=2','myquery', 'string').'<br>';
echo addToUrl('http://example.com?p=page','myquery', 'string');
function addToUrl($url, $var, $val){
$array = parse_url($url);
if(isset($array['query'])){
parse_str($array['query'], $values);
}
$values[$var] = $val;
unset($array['query']);
$options = '';$c = count($values) - 1;
$i=0;
foreach($values as $k => $v){
if($i == $c){
$options .= $k.'='.$v;
}else{
$options .= $k.'='.$v.'&';
}
$i++;
}
$return = $array['scheme'].'://'.$array['host'].(isset($array['path']) ? $array['path']: '') . '?' . $options;
return $return;
}
Results:
http://example.com/page/?myquery=string
http://example.com/page?myquery=string
http://example.com/page/wut/?aaa=2&myquery=string
http://example.com?p=page&myquery=string
You should try the http_build_query() function, I think that's what you're looking for, and maybe a bit of parse_str(), too.
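A minimal sketch of that combination, assuming PHP 7 or later. The function name withQuery is made up for illustration. It takes the URL apart with parse_url(), merges the new parameters into the parsed query, and rebuilds the URL, keeping port, credentials and fragment if they were present.

```php
<?php
// Hypothetical helper: returns $url with $params merged into its query string.
function withQuery(string $url, array $params): string
{
    $parts = parse_url($url);
    if ($parts === false) {
        return $url; // seriously malformed URL: leave it untouched
    }

    // Existing query parameters first, new ones override on key collision.
    $query = [];
    if (isset($parts['query'])) {
        parse_str($parts['query'], $query);
    }
    $query = array_replace($query, $params);

    $result = isset($parts['scheme']) ? $parts['scheme'] . '://' : '';
    if (isset($parts['user'])) {
        $result .= $parts['user'];
        $result .= isset($parts['pass']) ? ':' . $parts['pass'] : '';
        $result .= '@';
    }
    $result .= $parts['host'] ?? '';
    $result .= isset($parts['port']) ? ':' . $parts['port'] : '';
    $result .= $parts['path'] ?? '';
    if ($query !== []) {
        // http_build_query() URL-encodes keys and values.
        $result .= '?' . http_build_query($query);
    }
    $result .= isset($parts['fragment']) ? '#' . $parts['fragment'] : '';

    return $result;
}

echo withQuery('http://example.com/page/', ['myquery' => 'string']);
```

Unlike plain string concatenation, this puts the query before a #fragment and encodes values that contain spaces or ampersands.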

How can I have a counter for PHP preg_match?

function getContent($xml,$tag,$id="") {
if ($id=="") {
$tag_regex = '/<'.$tag.'[^>]*>(.*?)<\/'.$tag.'>/si';
} else {
$tag_regex = '/<'.$tag.'[^>]*id=[\'"]'.$id.'[\'"]>(.*?)<\/'.$tag.'>/si';
}
preg_match($tag_regex,$xml,$matches);
return $matches[1];
}
$omg = file_get_contents("Generated/index.php");
$extract = getContent($omg,"div","lolz2");
echo $extract;
For example, I have something like this. The HTML contains something like this:
<div id="lolz">qwg1eqwe</div>
<div id="lolz1"><div id='lolz2'>qwdqw2cq</div>asd3qwe</div>
If we search for the id lolz we get the correct answer, but if we search for lolz1 we stop at the first </div>, which belongs to the inner <div id="lolz2">. Is it possible to keep something like a counter for preg_match that tracks how many <div>s I have passed before reaching the matching </div>?
HTML isn't a regular language, so building something like that would be overkill and is the job of an HTML parser. Please see: RegEx match open tags except XHTML self-contained tags.
The reason your code was failing however was because you were using both single and double quotes in your input but your regex didn't account for it. This works for me:
function getContent($xml,$tag,$id="") {
if ($id=="") {
$tag_regex = '/<'.$tag.'[^>]*>(.*?)<\/'.$tag.'>/si';
} else {
$tag_regex = '/<'.$tag.'[^>]*id=[\\\'"]'.$id.'[\\\'"]>(.*?)<\/'.$tag.'>/si';
}
preg_match($tag_regex,$xml,$matches);
return $matches[1];
}
$omg = '<div id="lolz">qwg1eqwe</div>
<div id="lolz1"><div id="lolz2">qwdqw2cq</div>asd3qwe</div>';
$extract = getContent($omg,"div","lolz2");
var_dump($extract);
As long as you don't have nested elements this code will work and you won't need to use a DOM parser, though you really should for anything more complicated that might be nested (e.g. you don't have control over the input).
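For the nested case, a DOM parser does the "counting" for you. Here is a minimal sketch with PHP's bundled DOM extension, assuming PHP 7.1 or later. The function name getContentById is made up, and $tag and $id are assumed to contain no quote characters, since they are placed directly into the XPath expression.

```php
<?php
// Hypothetical helper: returns the inner HTML of the first <$tag id="$id">,
// or null when no such element exists.
function getContentById(string $html, string $tag, string $id): ?string
{
    $doc = new DOMDocument();

    // Collect libxml warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(sprintf('//%s[@id="%s"]', $tag, $id));
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // Inner HTML = the serialized child nodes of the element, nested tags included.
    $inner = '';
    foreach ($nodes->item(0)->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}

$omg = '<div id="lolz">qwg1eqwe</div>
<div id="lolz1"><div id=\'lolz2\'>qwdqw2cq</div>asd3qwe</div>';
echo getContentById($omg, 'div', 'lolz1');
```

For lolz1 this returns the nested lolz2 div together with the trailing text, because the parser knows which </div> closes which element. Note that the output is re-serialized by the parser, so attribute quoting may differ from the input.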

Is using PHP's explode() for HTML scraping considered a bad practice?

I have been coding for a while now but just can't seem to get my head around regular expressions.
This brings me to my question which is the following: is it bad practice to use PHP's explode for breaking up a string of html code to select bits of text? I need to scrape a page for various bits of information and due to my horrific regex knowledge (In a full software engineering degree I had to write maybe one....) I decided upon using explode().
I have provided my code below so someone more seasoned than me can tell me if it's essential that I use regex for this or not!
public function split_between($start, $end, $blob)
{
$strip = explode($start,$blob);
$strip2 = explode($end,$strip[1]);
return $strip2[0];
}
public function get_abstract($pubmed_id)
{
$scrapehtml = file_get_contents("http://www.ncbi.nlm.nih.gov/m/pubmed/".$pubmed_id);
$data['title'] = $this->split_between('<h2>','</h2>',$scrapehtml);
$data['authors'] = $this->split_between('<div class="auth">','</div>',$scrapehtml);
$data['journal'] = $this->split_between('<p class="j">','</p>',$scrapehtml);
$data['aff'] = $this->split_between('<p class="aff">','</p>',$scrapehtml);
$data['abstract'] = str_replace('<p class="no_t_m">','',str_replace('</p>','',$this->split_between('<h3 class="no_b_m">Abstract','</div>',$scrapehtml)));
$strip = explode('<div class="ids">', $scrapehtml);
$strip2 = explode('</div>', $strip[1]);
$ids[] = $strip2[0];
if (isset($strip[2]) && strpos($strip[2], "PMCID") !== false)
{
$step = explode('</div>', $strip[2]);
$ids[] = $step[0];
}
$id_count = 0;
foreach ($ids as &$value) {
$value = str_replace("<h3>", "", $value);
$data['ids'][$id_count]['id'] = str_replace("</h3>", "", str_replace('<span>','',str_replace('</span>','',$value)));
$id_count++;
}
$jsonAbstract = json_encode($data);
echo $this->indent($jsonAbstract);
}
I highly recommend you try out the PHP Simple HTML DOM Parser library. It handles invalid HTML and has been designed to solve the same problem you're working on.
A simple example from the documentation is as follows:
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find all images
foreach($html->find('img') as $element)
echo $element->src . '<br>';
// Find all links
foreach($html->find('a') as $element)
echo $element->href . '<br>';
It's not essential to use regular expressions for anything, although it'll be useful to get comfortable with them and know when to use them.
It looks like you're scraping PubMed, which I'm guessing has fairly static mark-up. If what you have works and performs as you hope, I can't see any reason to switch over to regular expressions; they're not necessarily going to be any quicker in this example.
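If you stay with plain string functions, one thing worth hardening is the case where a marker is missing from the page, because explode() then leaves $strip[1] undefined. A minimal sketch with strpos() and substr(), assuming PHP 7.1 or later; the name splitBetween and the sample HTML are made up for illustration.

```php
<?php
// Hypothetical variant of split_between(): returns the text between the first
// $start and the next $end after it, or null when either marker is missing.
function splitBetween(string $start, string $end, string $blob): ?string
{
    $from = strpos($blob, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start); // skip past the opening marker

    $to = strpos($blob, $end, $from);
    if ($to === false) {
        return null;
    }

    return substr($blob, $from, $to - $from);
}

$sample = '<h1>Site</h1><h2>Some title</h2><p class="j">Some journal</p>';
$title  = splitBetween('<h2>', '</h2>', $sample);
$aff    = splitBetween('<p class="aff">', '</p>', $sample); // marker not present
```

The caller can then tell a missing field (null) apart from an empty one (an empty string).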
Learn regular expressions and try to use a language that has libraries for this kind of task like perl or python. It will save you a lot of time.
At first they might seem daunting but they are really easy for most of the tasks.
Try reading this: http://perldoc.perl.org/perlre.html

Need a regex to add css class to first and last list item

UPDATE:
Thank you all for your input. Some additional information.
It's really just a small chunk of markup (20 lines) I'm working with, and I had aimed to leverage a regex to do the work.
I also do have the ability to hack up the script (an ecommerce one) to insert the classes as the navigation is built. I wanted to limit the number of hacks I have in place to keep things easier on myself when I go to update to the latest version of the software.
With that said, I'm pretty aware of my situation and the various options available to me. The first part of my regex works as expected. I posted really more or less to see if someone would say, "hey dummy, this is easy just change this....."
After coming close with a few of my efforts, it's more of the principle at this point. To just know (and learn) a solution exists for this problem. I also hate being beaten by a piece of code.
ORIGINAL:
I'm trying to leverage regular expressions to add a CSS class to the first and last list items within an ordered list. I've tried a bunch of different ways but can't produce the results I'm looking for.
I've got a regular expression for the first list item but can't seem to figure a correct one out for the last. Here is what I'm working with:
$patterns = array('/<ul+([^<]*)<li/m', '/<([^<]*)(?<=<li)(.*)<\/ul>/s');
$replace = array('<ul$1<li class="first"','<li class="last"$2$3</ul>');
$navigation = preg_replace($patterns, $replace, $navigation);
Any help would be greatly appreciated.
Jamie Zawinski would have something to say about this...
Do you have a proper HTML parser? I don't know if there's anything like hpricot available for PHP, but that's the right way to deal with it. You could at least employ hpricot to do the first cleanup for you.
If you're actually generating the HTML -- do it there. It looks like you want to generate some navigation and have a .first and .last kind of thing on it. Take a step back and try that.
+1 to generating the right html as the best option.
But a completely different approach, which may or may not be acceptable to you: you could use javascript.
This uses jquery to make it easy ...
$(document).ready(
function() {
$('#id-of-ul > li:first-child').addClass('first');
$('#id-of-ul > li:last-child').addClass('last');
}
);
As I say, this may or may not be of any use in this case, but I think it's a valid solution to the problem in some cases.
PS: You say ordered list, then give ul in your example. ol = ordered list, ul = unordered list
You wrote:
$patterns = array('/<ul+([^<]*)<li/m','/<([^<]*)(?<=<li)(.*)<\/ul>/s');
First pattern:
ul+ => you search something like ullll...
The m modifier is useless here, since you don't use ^ nor $.
Second pattern:
Using .* along with s is "dangerous", because you might select the whole document up to the last /ul of the page...
And well, I would just drop s modifier and use: (<li\s)(.*?</li>\s*</ul>) with replace: '$1class="last" $2'
In view of above remarks, I would write the first expression: <ul.*?>\s*<li
Although I am tired of seeing the Jamie Zawinski quote each time there is a regex question, Dustin is right in pointing you to an HTML parser (or just generating the right HTML from the start!): regexes and HTML don't mix well, because HTML syntax is complex, and unless you act on well-known, machine-generated output with very predictable results, you are prone to get something breaking in some cases.
I don't know if anyone cares any longer, but I have a solution that works in my simple test case (and I believe it should work in the general case).
First, let me point out two things. While PhiLho is right that the s modifier is "dangerous", since dots may then match everything up to the end of the document, this may very well be what you want. It only becomes a problem with pages that are not well formed. Be careful with any such regex on large, manually written pages.
Second, PHP gives backslashes a special meaning even in single-quoted strings. Most regexen will perform well either way, but you should always double-escape them, just in case.
Now, here's my code:
<?php
$navigation='<ul>
<li>Coffee</li>
<li>Tea</li>
<li>Milk</li>
<li>Beer</li>
<li>Water</li>
</ul>';
$patterns = array('/<ul.*?>\\s*<li/',
'/<li((.(?<!<li))*?<\\/ul>)/s');
$replace = array('$0 class="first"',
'<li class="last"$1');
$navigation = preg_replace($patterns, $replace, $navigation);
echo $navigation;
?>
This will output
<ul>
<li class="first">Coffee</li>
<li>Tea</li>
<li>Milk</li>
<li>Beer</li>
<li class="last">Water</li>
</ul>
This assumes no line feeds inside the opening <ul...> tag. If there are any, use the s modifier on the first expression too.
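The magic happens in (.(?<!<li))*?. Each step matches one character (the dot), and the negative lookbehind (?<!<li) then rejects it if that character completes the string <li. This is repeated any number of times (the *) in a non-greedy fashion (the ?), so the match from <li to </ul> cannot contain another <li.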
Of course, the whole thing would have to be expanded if there is a chance the list items already have the class attribute set. Also, if there is only one list item, it will match twice, giving it two such attributes. At least for xhtml, this would break validation.
You could load the navigation in a SimpleXML object and work with that. This prevents you from breaking your markup with some crazy regex :)
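A minimal sketch of the SimpleXML route, assuming PHP 7.1 or later, the bundled SimpleXML and DOM extensions, and a navigation fragment that is well-formed XML with a single <ul> as its root. The function names markFirstAndLast and appendClass are made up for illustration.

```php
<?php
// Hypothetical helper: appends $class to the class attribute of $li,
// creating the attribute when it does not exist yet.
function appendClass(SimpleXMLElement $li, string $class): void
{
    $existing = isset($li['class']) ? (string) $li['class'] : '';
    $li['class'] = trim($existing . ' ' . $class);
}

// Hypothetical helper: tags the first and last <li> of a <ul> fragment.
function markFirstAndLast(string $navigation): string
{
    $previous = libxml_use_internal_errors(true);
    $ul = simplexml_load_string($navigation);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($ul === false) {
        return $navigation; // not well-formed XML: leave it untouched
    }

    $count = count($ul->li);
    if ($count === 0) {
        return $navigation;
    }

    appendClass($ul->li[0], 'first');
    appendClass($ul->li[$count - 1], 'last');

    // Serializing the node itself avoids the XML declaration asXML() would prepend.
    $node = dom_import_simplexml($ul);
    return $node->ownerDocument->saveXML($node);
}

$navigation = '<ul>
<li>Coffee</li>
<li class="hot">Tea</li>
<li>Water</li>
</ul>';
echo markFirstAndLast($navigation);
```

Existing class attributes are extended rather than duplicated, and a list with a single item ends up with class="first last" on that item instead of two class attributes.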
As a preface .. this is waaay over-complicating things in most use-cases. Please see other answers for more sanity :)
Here is a little PHP class I wrote to solve a similar problem. It adds 'first', 'last' and any other classes you want. It will handle li's with no "class" attribute as well as those that already have some class(es).
<?php
/**
* Modify list items in pre-rendered html.
*
* Usage Example:
* $replaced_text = ListAlter::addClasses($original_html, array('cool', 'awsome'));
*/
class ListAlter {
private $classes = array();
private $classes_found = FALSE;
private $count = 0;
private $total = 0;
// No public instances.
private function __construct() {}
/**
* Adds 'first', 'last', and any extra classes you want.
*/
static function addClasses($html, $extra_classes = array()) {
$instance = new self();
$instance->classes = $extra_classes;
$total = preg_match_all('~<li([^>]*?)>~', $html, $matches);
$instance->total = $total ? $total : 0;
return preg_replace_callback('~<li([^>]*?)>~', array($instance, 'processListItem'), $html);
}
private function processListItem($matches) {
$this->count++;
$this->classes_found = FALSE;
$processed = preg_replace_callback('~(\w+)="(.*?)"~', array($this, 'appendClasses'), $matches[0]);
if (!$this->classes_found) {
$classes = $this->classes;
if ($this->count == 1) {
$classes[] = 'first';
}
if ($this->count == $this->total) {
$classes[] = 'last';
}
if (!empty($classes)) {
$processed = rtrim($matches[0], '>') . ' class="' . implode(' ', $classes) . '">';
}
}
return $processed;
}
private function appendClasses($matches) {
array_shift($matches);
list($name, $value) = $matches;
if ($name == 'class') {
$value = array_filter(explode(' ', $value));
$value = array_merge($value, $this->classes);
if ($this->count == 1) {
$value[] = 'first';
}
if ($this->count == $this->total) {
$value[] = 'last';
}
$value = implode(' ', $value);
$this->classes_found = TRUE;
}
return sprintf('%s="%s"', $name, $value);
}
}
