Using preg_replace_callback to identify and manipulate latex code - php

I have latex + html code somewhere in the following form:
...some text1.... \[latex-code1\]....some text2....\[latex-code2\]....etc
Firstly I want to obtain the latex codes in an array codes[] to be able to send them to a server for rendering, so that
code[0]=latex-code1, code[1]=latex-code2, etc
Secondly, I want to modify this text so that it looks like:
...some text1.... <img src="root/1.png">....some text2....<img src="root/2.png">....etc
i.e, the i-th latex code fragment is replaced by the link to the i-th rendered image.
I have been trying to do this with preg_replace_callback and preg_match_all but being new to PHP haven't been able to make it work. Please advise.

If you're looking for codez:
$html = '...some text1.... \[latex-code1\]....some text2....\[latex-code2\]....etc';
$codes = array();
$count = 0;
$replace = function($matches) use (&$codes, &$count) {
list(, $codes[]) = $matches;
return sprintf('<img src="root/%d.png">', ++$count);
};
$changed = preg_replace_callback('~\\\\\\[(.+?)\\\\\\]~', $replace, $html);
echo "Original: $html\n";
echo "Changed : $changed\n\nLatex Codes: ", print_r($codes, 1), "Count: ", $count;
I don't know at which part you've got the problems, if it's the regex pattern, you use characters inside your markers that needs heavy escaping: For PHP and PCRE, that's why there are so many slashes.
Another tricky part is the callback function because it needs to collect the codes as well as having a counter. It's done in the example with an anonymous function that has variable aliases / references in it's use clause. This makes the variables $codes and $count available inside the callback.

Related

In PHP is there a better way to do dynamic templating

I'm creating module for an application which will have simple custom templating with tags that will be replaced with data from a database. The field names will be different in each instance of this module. I want to know if there is a better way to do this.
The code below is what I've come up with, But I believe there must be a better way. I struggled with preg_split and preg_match_all and just hit my limit so I did it the dumb person way.
<?php
$customTemplate = "
<div>
<<This>>
<<that>>
</div>
";
function process_template ($template, $begin = '<<', $end = '>>') {
$begin_exploded = explode($begin, $template);
if (is_array($begin_exploded)) {
foreach ($begin_exploded as $key1 => $value1) {
$end_exploded = explode($end, $value1);
if (is_array($end_exploded)) {
foreach ($end_exploded as $key2 => $value2) {
$tag = $begin.$value2.$end;
$variable = trim($value2);
$find_it = strpos($template,$tag);
if ($find_it !== false) {
//str_replace ($tag, $MyClass->get($variable), $template );
$template = str_replace ($tag, $variable, $template);
}
}
}
}
}
return $template;
}
echo(process_template($customTemplate));
/* Will Echo
<div>
This
that
</div>
*/
?>
In the future I will connect $MyClass->get() to replace the tag with the proper data. And the custom template will be built by the user.
Rather than preg_split or preg_match I would rather use preg_replace_callback, since you are doing replacements, and the replacement value is derived from what looks like will end up being a method in another class.
function process_template($template, $begin = '<<', $end = '>>') {
// get $MyClass in the function scope somehow. Maybe pass it as another parameter?
return preg_replace_callback("/$begin(\w+)$end/", function($var) use ($MyClass) {
return $MyClass->get($var[1]);
}, $template);
}
Here's an example to play with: https://3v4l.org/N1p03
I assume this is just for fun/learning. If I really needed to use a template for something I would rather start with composer require "twig/twig:^2.0" instead. In fact, if you're interested in learning more about how it works you could go check out a well-established system like twig or blade does it. (Better than I've done it in this answer.)
There are tons templating engines around, but sometimes... just add complexity and dependencies for a maybe simple thing. This a modified sample of what I used for make some javascript corrections. This works for your template.
function process_template($html,$b='<<',$e='>>'){
$replace=['this'=>'<input name="this" />','that'=>'<input name="that" />'];
if(preg_match_all('/('.$b.')(.*?)('.$e.')/is',$html,$matches,PREG_SET_ORDER|PREG_OFFSET_CAPTURE)){
$t='';$o=0;
foreach($matches as $m){
//for reference $m[1][0] contains $b, $m[2][0] contains $e
$t.=substr($html,$o,$m[0][1]-$o);
$t.=$replace[$m[2][0]];
$o=$m[3][1]+strlen($m[3][0]);
}
$t.=substr($html,$o);
$html=$t;
}
return $html;
}
$html="
<div>
<<this>>
<<that>>
</div>
";
$new=process_template($html);
echo $new;
For demo purpose I put the array $replace that handling the substitutions. You replace those with your function that will handle the replacement.
Here is a working snippet: https://3v4l.org/MBnbR
I like this function because you have control of what to replace and what to put back on the final result. By the way by using the PREG_OFFSET_CAPTURE also return on the matches the position where the regexp groups happens. Those are on the $m[x][1]. The captured text will be on $m[x][0].

file_get_contents( - Fix relative urls

I am trying to display a website to a user, having downloaded it using php.
This is the script I am using:
<?php
$url = 'http://stackoverflow.com/pagecalledjohn.php';
//Download page
$site = file_get_contents($url);
//Fix relative URLs
$site = str_replace('src="','src="' . $url,$site);
$site = str_replace('url(','url(' . $url,$site);
//Display to user
echo $site;
?>
So far this script works a treat except for a few major problems with the str_replace function. The problem comes with relative urls. If we use an image on our made up pagecalledjohn.php of a cat (Something like this: ). It is a png and as I see it it can be placed on the page using 6 different urls:
1. src="//www.stackoverflow.com/cat.png"
2. src="http://www.stackoverflow.com/cat.png"
3. src="https://www.stackoverflow.com/cat.png"
4. src="somedirectory/cat.png"
4 is not applicable in this case but added anyway!
5. src="/cat.png"
6. src="cat.png"
Is there a way, using php, I can search for src=" and replace it with the url (filename removed) of the page being downloaded, but without sticking url in there if it is options 1,2 or 3 and change procedure slightly for 4,5 and 6?
Rather than trying to change every path reference in the source code, why don't you simply inject a <base> tag in your header to specifically indicate the base URL upon which all relative URL's should be calculated?
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/base
This can be achieved using your DOM manipulation tool of choice. The example below would show how to do this using DOMDocument and related classes.
$target_domain = 'http://stackoverflow.com/';
$url = $target_domain . 'pagecalledjohn.php';
//Download page
$site = file_get_contents($url);
$dom = DOMDocument::loadHTML($site);
if($dom instanceof DOMDocument === false) {
// something went wrong in loading HTML to DOM Document
// provide error messaging and exit
}
// find <head> tag
$head_tag_list = $dom->getElementsByTagName('head');
// there should only be one <head> tag
if($head_tag_list->length !== 1) {
throw new Exception('Wow! The HTML is malformed without single head tag.');
}
$head_tag = $head_tag_list->item(0);
// find first child of head tag to later use in insertion
$head_has_children = $head_tag->hasChildNodes();
if($head_has_children) {
$head_tag_first_child = $head_tag->firstChild;
}
// create new <base> tag
$base_element = $dom->createElement('base');
$base_element->setAttribute('href', $target_domain);
// insert new base tag as first child to head tag
if($head_has_children) {
$base_node = $head_tag->insertBefore($base_element, $head_tag_first_child);
} else {
$base_node = $head_tag->appendChild($base_element);
}
echo $dom->saveHTML();
At the very minimum, it you truly want to modify all path references in the source code, I would HIGHLY recommend doing so with DOM manipulation tools (DOMDOcument, DOMXPath, etc.) rather than regex. I think you will find it a much more stable solution.
I don't know if I get your question completely right, if you want to deal with all text-sequences enclosed in src=" and ", the following pattern could make it:
~(\ssrc=")([^"]+)(")~
It has three capturing groups of which the second one contains the data you're interested in. The first and last are useful to change the whole match.
Now you can replace all instances with a callback function that is changing the places. I've created a simple string with all the 6 cases you've got:
$site = <<<BUFFER
1. src="//www.stackoverflow.com/cat.png"
2. src="http://www.stackoverflow.com/cat.png"
3. src="https://www.stackoverflow.com/cat.png"
4. src="somedirectory/cat.png"
5. src="/cat.png"
6. src="cat.png"
BUFFER;
Let's ignore for a moment that there are no surrounding HTML tags, you're not parsing HTML anyway I'm sure as you haven't asked for a HTML parser but for a regular expression. In the following example, the match in the middle (the URL) will be enclosed so that it's clear it matched:
So now to replace each of the links let's start lightly by just highlighting them in the string.
$pattern = '~(\ssrc=")([^"]+)(")~';
echo preg_replace_callback($pattern, function ($matches) {
return $matches[1] . ">>>" . $matches[2] . "<<<" . $matches[3];
}, $site);
The output for the example given then is:
1. src=">>>//www.stackoverflow.com/cat.png<<<"
2. src=">>>http://www.stackoverflow.com/cat.png<<<"
3. src=">>>https://www.stackoverflow.com/cat.png<<<"
4. src=">>>somedirectory/cat.png<<<"
5. src=">>>/cat.png<<<"
6. src=">>>cat.png<<<"
As the way of replacing the string is to be changed, it can be extracted, so it is easier to change:
$callback = function($method) {
return function ($matches) use ($method) {
return $matches[1] . $method($matches[2]) . $matches[3];
};
};
This function creates the replace callback based on a method of replacing you pass as parameter.
Such a replacement function could be:
$highlight = function($string) {
return ">>>$string<<<";
};
And it's called like the following:
$pattern = '~(\ssrc=")([^"]+)(")~';
echo preg_replace_callback($pattern, $callback($highlight), $site);
The output remains the same, this was just to illustrate how the extraction worked:
1. src=">>>//www.stackoverflow.com/cat.png<<<"
2. src=">>>http://www.stackoverflow.com/cat.png<<<"
3. src=">>>https://www.stackoverflow.com/cat.png<<<"
4. src=">>>somedirectory/cat.png<<<"
5. src=">>>/cat.png<<<"
6. src=">>>cat.png<<<"
The benefit of this is that for the replacement function, you only need to deal with the URL match as single string, not with regular expression matches array for the different groups.
Now to your second half of your question: How to replace this with the specific URL handling like removing the filename. This can be done by parsing the URL itself and remove the filename (basename) from the path component. Thanks to the extraction, you can put this into a simple function:
$removeFilename = function ($url) {
$url = new Net_URL2($url);
$base = basename($path = $url->getPath());
$url->setPath(substr($path, 0, -strlen($base)));
return $url;
};
This code makes use of Pear's Net_URL2 URL component (also available via Packagist and Github, your OS packages might have it, too). It can parse and modify URLs easily, so is nice to have for the job.
So now the replacement done with the new URL filename replacement function:
$pattern = '~(\ssrc=")([^"]+)(")~';
echo preg_replace_callback($pattern, $callback($removeFilename), $site);
And the result then is:
1. src="//www.stackoverflow.com/"
2. src="http://www.stackoverflow.com/"
3. src="https://www.stackoverflow.com/"
4. src="somedirectory/"
5. src="/"
6. src=""
Please note that this is exemplary. It shows how you can to it with regular expressions. You can however to it as well with a HTML parser. Let's make this an actual HTML fragment:
1. <img src="//www.stackoverflow.com/cat.png"/>
2. <img src="http://www.stackoverflow.com/cat.png"/>
3. <img src="https://www.stackoverflow.com/cat.png"/>
4. <img src="somedirectory/cat.png"/>
5. <img src="/cat.png"/>
6. <img src="cat.png"/>
And then process all <img> "src" attributes with the created replacement filter function:
$doc = new DOMDocument();
$saved = libxml_use_internal_errors(true);
$doc->loadHTML($site, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_use_internal_errors($saved);
$srcs = (new DOMXPath($doc))->query('//img/#hsrc') ?: [];
foreach ($srcs as $src) {
$src->nodeValue = $removeFilename($src->nodeValue);
}
echo $doc->saveHTML();
The result then again is:
1. <img src="//www.stackoverflow.com/cat.png">
2. <img src="http://www.stackoverflow.com/cat.png">
3. <img src="https://www.stackoverflow.com/cat.png">
4. <img src="somedirectory/cat.png">
5. <img src="/cat.png">
6. <img src="cat.png">
Just a different way of parsing has been used - the replacement still is the same. Just to offer two different ways that are also the same in part.
I suggest doing it in more steps.
In order to not complicate the solution, let's assume that any src value is always an image (it could as well be something else, e.g. a script).
Also, let's assume that there are no spaces, between equals sign and quotes (this can be fixed easily if there are). Finally, let's assume that the file name does not contain any escaped quotes (if it did, regexp would be more complicated).
So you'd use the following regexp to find all image references:
src="([^"]*)". (Also, this does not cover the case, where src is enclosed into single quotes. But it is easy to create a similar regexp for that.)
However, the processing logic could be done with preg_replace_callback function, instead of str_replace. You can provide a callback to this function, where each url can be processed, based on its contents.
So you could do something like this (not tested!):
$site = preg_replace_callback(
'src="([^"]*)"',
function ($src) {
$url = $src[1];
$ret = "";
if (preg_match("^//", $url)) {
// case 1.
$ret = "src='" . $url . '"';
}
else if (preg_match("^https?://", $url)) {
// case 2. and 3.
$ret = "src='" . $url . '"';
}
else {
// case 4., 5., 6.
$ret = "src='http://your.site.com.com/" . $url . '"';
}
return $ret;
},
$site
);

identify and execute php code on a string

I would like to know if it's possible to execute the php code in a string. I mean if I have:
$string = If i say <?php echo 'lala';?> I wanna get "<?php echo 'dada'; ?>";
Does anybody knows how?
[EDIT] It looks like nobody understood. I wanna save a string like
$string = If i say <?php count(array('lala'));?>
in a database and then render it. I can do it using
function render_php($string){
ob_start();
eval('?>' . $string);
$string = ob_get_contents();
ob_end_clean();
return $string;
}
The problem is that I does not reconize php code into "" (quotes) like
I say "<?php echo 'dada'; ?>"
$string = ($test === TRUE) ? 'lala' : 'falala';
There are lots of ways to do what it looks like you're trying to do (if I'm reading what you wrote correctly). The above is a ternary. If the condition evaluates to true then $string will be set to 'lala' else set to 'falala'.
If you're literally asking what you wrote, then use the eval() function. It takes a passed string and executes it as if it were php code. Don't include the <?php ?> tags.
function dropAllTables() {
// drop all tables in db
}
$string = 'dropAllTables();';
eval($string); // will execute the dropAllTables() function
[edit]
You can use the following regular expression to find all the php code:
preg_match_all('/(<\?php )(.+?)( \?>)/', $string, $php_code, PREG_OFFSET_CAPTURE);
$php_code will be an array where $php_code[0] will return an array of all the matches with the code + <?php ?> tags. $php_code[2] will be an array with just the code to execute.
So,
$string = "array has <?php count(array('lala')); ?> 1 member <?php count(array('falala')); ?>";
preg_match_all('/(<\?php )(.+?)( \?>)/', $string, $php_code, PREG_OFFSET_CAPTURE);
echo $php_code[0][0][0]; // <?php count(array('lala')); ?>
echo $php_code[2][0][0]; // count(array('lala'));
This should be helpful for what you want to do.
Looks like you are trying to concatenate. Use the concatenation operator "."
$string = "if i say " . $lala . " I wanna get " . $dada;
or
$string = "if i say {$lala} I wanna get {$dada}.";
That is what I get since your string looks to be a php variable.
EDIT:
<?php ?> is used when you want to tell the PHP interpreter that the code in those brackets should be interpreted as PHP. When working within those PHP brackets you do not need to include them again. So as you would just do this:
// You create a string:
$myString = "This is my string.";
// You decide you want to add something to it.
$myString .= getMyNameFunction(); // not $myString .= <?php getMyNameFunction() ?>;
The string is created, then the results of getMyNameFunction() are appended to it. Now if you declared the $myString variable at the top of your page, and wanted to use it later you would do this:
<span id="myString"><?php echo $myString; ?></span>
This would tell the interpreter to add the contents of the $myString variable between the tags.
Use token_get_all() on the string, then look for a T_OPEN_TAG token, start copying from there, look for a T_CLOSE_TAG token and stop there. The string between the token next to T_OPEN_TAG and until the token right before T_CLOSE_TAG is your PHP code.
This is fast and cannot fail, since it uses PHP's tokenizer to parse the string. You will always find the bits of PHP code inside the string, even if the string contains comments or other strings which might contain ?> or any other related substrings that will confuse regular expressions or a hand-written, slow, pure PHP parser.
I would consider not storing your PHP code blocks in a database and evaluating them using eval. There is usually a better solution. Read about Design Pattern, OOP, Polymorphism.
You could use the eval() function.

Need to extract special tags and replace them based upon their contents using regular expression

I'm working on a simple templating system. Basically I'm setting it up such that a user would enter text populated with special tags of the form: <== variableName ==>
When the system would display the text it would search for all tags of the form mentioned and replace the variableName with its corresponding value from a database result.
I think this would require a regular expression but I'm really messed up in REGEX here. I'm using php btw.
Thanks for the help guys.
A rather quick and dirty hack here:
<?php
$teststring = "Hello <== tag ==>";
$values = array();
$values['tag'] = "world";
function replaceTag($name)
{
global $values;
return $values[$name];
}
echo preg_replace('/<== ([a-z]*) ==>/e','replaceTag(\'$1\')',$teststring);
Output:
Hello world
Simply place your 'variables' in the variable array and they will be replaced.
The e modifier to the regular expression tells it to eval the replacement, the [a-z] lets you name the "variables" using the characters a-z (you could use [a-z0-9] if you wanted to include numbers). Other than that its pretty much standard PHP.
Very useful - Pointed me to what I was looking for...
Replacing tags in a template e.g.
<<page_title>>, <<meta_description>>
with corresponding request variables e,g,
$_REQUEST['page_title'], $_REQUEST['meta_description'],
using a modified version of the code posted:
$html_output=preg_replace('/<<(\w+)>>/e', '$_REQUEST[\'$1\']', $template);
Easy to change this to replace template tags with values from a DB etc...
If you are doing a simple replace, then you don't need to use a regexp. You can just use str_replace() which is quicker.
(I'm assuming your '<== ' and ' ==>' are delimiting your template var and are replaced with your value?)
$subject = str_replace('<== '.$varName.' ==>', $varValue, $subject);
And to cycle through all your template vars...
$tplVars = array();
$tplVars['ONE'] = 'This is One';
$tplVars['TWO'] = 'This is Two';
// etc.
// $subject is your original document
foreach ($tplVars as $varName => $varValue) {
$subject = str_replace('<== '.$varName.' ==>', $varValue, $subject);
}

Replace 'keys' in file with DB data

Currently I have a the following way of retrieving data from my DB:
$school->get('studentCount');
I required a shortcut to access these fields within the page and so came up with a format like this:
<p>Blah blah blah [[studentCount]]</p>
I have output buffering turned on but just need an easy way of replacing that key ('[[field-name]]') with its corresponding data from the DB.
If it was just one field I could do a str_replace on the output like this:
str_replace($output, '[[studentCount]]', $school->get('studentCount'))
Unfortunately that's not suitable. My ideal solution would grab whatever is between '[[' and ']]' and then run the 'get' method and replace the entire key ('[[...]]') with whatever is returned.
Well you could create two arrays, one with the field-name strings [[field-name]] and one with the responses $school->get('field-name'). Then throw those in str_replace as it supports arrays.
Example from PHP Manual:
$phrase = "You should eat fruits, vegetables, and fiber every day.";
$healthy = array("fruits", "vegetables", "fiber");
$yummy = array("pizza", "beer", "ice cream");
$newphrase = str_replace($healthy, $yummy, $phrase);
// Resulting String: "You should eat pizza, beer, and ice cream every day."
If you still wanted to implement your suggestion (finding all [[]]s and replacing them), I'll try to write up a quick function.
Edit: Here are two methods of doing it via your request:
$html = "Hello, [[FirstName]]! Welcome to [[SiteName]].";
$count = preg_match_all("/\[\[([\w]+)\]\]/", $html, $matches);
for ($x = 0; $x < $count; $x++)
$html = str_replace($matches[0][$x], $school->get($matches[1][$x]), $html);
Or using arrays:
$html = "Hello, [[FirstName]]! Welcome to [[SiteName]].";
$count = preg_match_all("/\[\[([\w]+)\]\]/", $html, $matches);
for ($x = 0; $x < $count; $x++)
$matches[1][$x] = $school->get($matches[1][$x]);
$html = str_replace($matches[0], $matches[1], $html);
I'm pretty sure this will work. :)
<?php
// $output contains the string
preg_match_all('/\[{2}([^\[]+)\]{2}/', $output, $matches);
$replaces = $matches['1'];
foreach($replaces as $replace) $str = str_replace('[['.$replace.']]', $school->get($replace), $output);
?>
You will need to use regex to find things inside of two [[ and ]] and take that an insert what is in between into your ->get() function.
Function would be preg_replace
http://us2.php.net/preg-replace
Assuming you can cache the result, a regex and file cache is a great method to do this. First you convert the file:
function cache_it($filename, $tablvar) {
$tmplt = file_get_contents($filename);
$tmplt = preg_replace('/\[\[(.+)\]\]/',
'<?php echo $' . $tablevar . '->get(\1);?>',
$tmplt);
file_put_contents($filename . '.php', $tmplt);
}
Then whenever you need to access the file.
function print_it($filename, $tablevar, $table) {
$_GLOBAL[$tablevar] = $table;
include $filename . '.php';
unset($_GLOBAL[$tablevar]);
}
You probably want to check that the cached file's create date is greater than the last modify date of the source file. Wrapping that and the two functions above in class helps avoid a lot of little pitfalls. But the general idea is sound. There are also some security issues with the cache file being a .php file that you would need to address.
I added this style template caching to the OSS CMS I work on. By caching the regex results, we sped up the original code by over 50%. The real benefit is the templates are PHP files. So anything that speeds the interpreting of PHP files (APC, eAccelerator, etc) speeds up your templates too.
You need to write a parser to look through the $output to find what is between your delimiters and then call a function if it is defined.
I assume you want to do it this way to save the calls until they are needed.
I wrote a template parser that effectively worked this way. Unfortunately it wasn't in PHP.
Thanks for the responses so far. I did think about using str/preg _replace with arrays but I wanted the key ('[[...]]') to directly tie in with the 'get' method, so it's fully expandable. I don't want to have to add to two different arrays every time a new field is added to the DB.
In JavaScript, for example, I would achieve it like this: (JavaScript allows you to pass an anonymous function as the 'replacement' parameter in its equivalent of preg_replace):
('this is the output blah blah [[studentCount]]').replace(/\[{2}([^\[]+)\]{2}/g, function($0, $1) {
get($1);
})

Categories