I am trying to use the simple BBCode parser shown below, but I am not sure how to make it work on my webpage. I previously tried a few lines that call functions which are not recognised, such as:
require_once('parser.php'); // path to Recruiting Parsers' file
$parser = new parser; // start up Recruiting Parsers
$parsed = $parser-> p($mytext); // p() is function which parses
Here the p() function is not recognised, so nothing is parsed. I am using a text editor that outputs BBCode, which I am trying to convert back into HTML. What code should I use so that it parses? I am not a developer, so this is all very strange.
Here is parser.php:
<?php
function bbcodeParser($bbcode){
/* bbCode Parser
*Syntax: bbcodeParser(bbcode)
*/
/* Matching codes */
$urlmatch = "([a-zA-Z]+[:\/\/]+[A-Za-z0-9\-_]+\\.+[A-Za-z0-9\.\/%&=\?\-_]+)";
/* Basically remove HTML tag's functionality */
$bbcode = htmlspecialchars($bbcode);
/* Replace "special character" with its Unicode equivalent */
$match["special"] = "/\�/s";
$replace["special"] = '�';
/* Bold text */
$match["b"] = "/\[b\](.*?)\[\/b\]/is";
$replace["b"] = "<b>$1</b>";
/*many other properties as before: italics, colours, fonts etc.*/
/* Parse */
$bbcode = preg_replace($match, $replace, $bbcode);
/* New line to <br> tag */
$bbcode=nl2br($bbcode);
/* Code blocks - Need to specially remove breaks */
function pre_special($matches)
{
$prep = preg_replace("/\<br \/\>/","",$matches[1]);
return "�<pre>$prep</pre>�";
}
$bbcode = preg_replace_callback("/\[code\](.*?)\[\/code\]/ism","pre_special",$bbcode);
/* Remove <br> tags before quotes and code blocks */
$bbcode=str_replace("�<br />","",$bbcode);
$bbcode=str_replace("�","",$bbcode); //Clean up any special characters that got misplaced...
/* Return parsed contents */
return $bbcode;
}
?>
Have you tried replacing your p() function with bbcodeParser()? Looks like if you do this, it should work as expected:
require_once('parser.php'); // path to Recruiting Parsers' file
$parsed = bbcodeParser($mytext); // bbcodeParser() is function which parses
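The change works because parser.php defines a plain function rather than a class with a p() method. As a minimal sketch, the snippet below uses a cut-down stand-in for parser.php that keeps only the [b] rule (the real file handles many more tags). The sample text in $mytext is an assumption for illustration.

```php
<?php
// Cut-down stand-in for parser.php: only the [b] rule from the original is kept.
function bbcodeParser($bbcode)
{
    // Neutralise any raw HTML first, as the original parser does
    $bbcode = htmlspecialchars($bbcode);
    // Bold text
    $bbcode = preg_replace("/\[b\](.*?)\[\/b\]/is", "<b>$1</b>", $bbcode);
    // New lines to <br /> tags
    return nl2br($bbcode);
}

// Hypothetical editor output
$mytext = "[b]Hello[/b] <world>";

// Call the function directly; there is no class to instantiate
$parsed = bbcodeParser($mytext);
echo $parsed; // <b>Hello</b> &lt;world&gt;
```

In a real page you would keep require_once('parser.php') and drop the stand-in function.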
I'm using Twig's markdown_to_html filter, and it works very well.
However, in some use cases, I'd want it to generate HTML, but without the paragraph tags.
For instance, from this Markdown content:
Hello, this is **some Markdown**
I want the exported HTML to be:
Hello, this is <strong>some Markdown</strong>
But the result is currently:
<p>Hello, this is <strong>some Markdown</strong></p>
I looked into the filter's source and didn't see any option to do so.
Is there a way to do this, or should I create my own Twig filter?
I'd prefer to avoid the striptags filter if possible, because I don't want to list all the tags I'll allow (unless there is a reverse striptags where you can specify the tags you want removed ?)
It looks like you're using league/commonmark which has an "Inlines Only" extension for this exact purpose! It will avoid outputting block-level elements like paragraphs, headers, etc. - only things like emphasis and links would be rendered as HTML.
To use it, construct your Markdown converter like this:
<?php
use League\CommonMark\Environment\Environment;
use League\CommonMark\Extension\InlinesOnly\InlinesOnlyExtension;
use League\CommonMark\MarkdownConverter;
// Define your configuration, if needed
$config = [];
// Create a new, empty environment
$environment = new Environment($config);
// Add this extension
$environment->addExtension(new InlinesOnlyExtension());
// Instantiate the converter engine and start converting some Markdown!
$converter = new MarkdownConverter($environment);
echo $converter->convert('Hello, this is **some Markdown**');
This will be more reliable than parsing HTML with regular expressions.
(I'm the maintainer of league/commonmark and would be happy to answer any follow-up questions you might have in the comments)
I updated my filter, following Colin O'Dell's answer. This way, it is more robust, and it will allow the usage or creation of more CommonMark extensions in the future.
# src\Twig\CustomTwigExtension.php
<?php
declare(strict_types=1);
namespace App\Twig;
use League\CommonMark\Environment\Environment;
use League\CommonMark\Extension\CommonMark\CommonMarkCoreExtension;
use League\CommonMark\Extension\ConfigurableExtensionInterface;
use League\CommonMark\Extension\InlinesOnly\InlinesOnlyExtension;
use League\CommonMark\MarkdownConverter;
use Twig\Extension\AbstractExtension;
use Twig\Markup;
use Twig\TwigFilter;
use Webmozart\Assert\Assert;
class CustomTwigExtension extends AbstractExtension
{
private const CONFIG = [
'default' => [CommonMarkCoreExtension::class],
'inline' => [InlinesOnlyExtension::class],
];
/**
* @return TwigFilter[]
*/
public function getFilters(): array
{
return [
new TwigFilter('custom_markdown', [$this, 'customMarkdown']),
];
}
public function customMarkdown(string $str, string $configName = 'default'): Markup
{
$env = new Environment();
foreach ($this->getExtensions($configName) as $extension) {
$env->addExtension($extension);
}
$converter = new MarkdownConverter($env);
$html = $converter->convert($str)->getContent();
return new Markup($html, 'UTF-8');
}
/**
* @return ConfigurableExtensionInterface[]
*/
private function getExtensions(string $configName): array
{
Assert::keyExists(self::CONFIG, $configName);
$extensions = [];
foreach (self::CONFIG[$configName] as $extension) {
$extensions[] = new $extension();
}
return $extensions;
}
}
It is called like this in the templates:
{{ markdown_content|custom_markdown }} {# to use the default markdown configuration #}
{{ markdown_content|custom_markdown('inline') }} {# to remove all paragraph tags from the result #}
I'm required to create a simple template engine; I can't use Twig or Smarty, etc. because the designer on the project needs to be able to just copy/paste her HTML into the template with no configuration, muss/fuss, whatever. It's gotta be really easy.
So I created something that will allow her to do just that, by placing her content between {{ CONTENT }} {{ !CONTENT }} tags.
My only problem is that I want to make sure that if she uses multiple spaces in the tags - or NO spaces - it won't break; i.e. {{ CONTENT }} or {{CONTENT}}
What I have below accomplishes this, but I'm afraid it may be overkill. Anybody know a way to simplify this function?
function defineContent($tag, $string) {
$offset = strlen($tag) + 6;
// add a space to our tags if none exist
$string = str_replace('{{'.$tag, '{{ '.$tag, $string);
$string = str_replace($tag.'}}', $tag.' }}', $string);
// strip consecutive spaces
$string = preg_replace('/\s+/', ' ', $string);
// now that extra spaces have been stripped, we're left with this
// {{ CONTENT }} My content goes here {{ !CONTENT }}
// remove the template tags
$return = substr($string, strpos($string, '{{ '.$tag.' }}') + $offset);
$return = substr($return, 0, strpos($return, '{{ !'.$tag.' }}'));
return $return;
}
// here's the string
$string = '{{ CONTENT }} My content goes here {{ !CONTENT }}';
// run it through the function
$content = defineContent('CONTENT', $string);
echo $content;
// gives us this...
My content goes here
EDIT
Ended up creating a repo, for anyone interested.
https://github.com/timgavin/tinyTemplate
I would suggest taking a look at variable extraction into the template scope.
It's a bit easier to maintain and has less overhead than the replace approach, and it's often easier for the designer to use. In its basic form, it's just PHP variables and short tags.
How much the designer has to write depends on which side generates the markup. If you generate, say, a table and its rows (or complete content blocks), the template could be just <?=$table?> ;) That means less work for the designer and more work for you. Alternatively, provide a few rendering examples and helpers, because copy/pasting examples should always work, even for an untrained designer.
Template
The template is just HTML mixed with <?=$variable?> - uncluttered.
src/Templates/Article.php
<html>
<body>
<h1><?=$title?></h1>
<div><?=$content?></div>
</body>
</html>
Usage
src/Controller/Article.php
...
// initialize
$view = new View;
// assign
$view->data['title'] = 'The title';
$view->data['content'] = 'The body';
// render
$view->render(dirname(__DIR__) . '/Templates/Article.php');
View / TemplateRenderer
The core function here is render(). The template file is included and the variable extraction happens in a closure to avoid any variable clashes/scope problems.
src/View.php
class View
{
/**
* Set data from controller: $view->data['variable'] = 'value';
* @var array
*/
public $data = [];
/**
* @param string $template Path to template file.
*/
function render($template)
{
if (!is_file($template)) {
throw new \RuntimeException('Template not found: ' . $template);
}
// define a closure with a scope for the variable extraction
$result = function($file, array $data = array()) {
ob_start();
extract($data, EXTR_SKIP);
try {
include $file;
} catch (\Exception $e) {
ob_end_clean();
throw $e;
}
return ob_get_clean();
};
// call the closure
echo $result($template, $this->data);
}
}
Answering specifically what you asked:
My only problem is that I want to make sure that if she uses multiple spaces in the tags - or NO spaces - it won't break
What I have below accomplishes this, but I'm afraid it may be overkill. Anybody know a way to simplify this function?
... the only "slow" part of your function is the preg_replace. Use trim instead, for a very slight increase in speed. Otherwise, don't worry about it. There's no magic PHP command to do what you're looking to do.
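If the aim is fewer moving parts rather than speed, one possible alternative is a single regular expression that tolerates any amount of whitespace inside the delimiters. This is only a sketch: the function name extractContent() is hypothetical, and it assumes each tag pair appears once in the string.

```php
<?php
// Hypothetical alternative to defineContent(): one regex tolerates any amount
// of whitespace (including none) inside the {{ ... }} delimiters.
function extractContent(string $tag, string $string): ?string
{
    $t = preg_quote($tag, '/');
    // \s* matches zero or more whitespace characters; the s flag lets . span newlines
    $pattern = '/\{\{\s*' . $t . '\s*\}\}(.*?)\{\{\s*!\s*' . $t . '\s*\}\}/s';
    if (preg_match($pattern, $string, $matches) !== 1) {
        return null; // opening or closing tag not found
    }
    // Drop the whitespace that separated the tags from the content
    return trim($matches[1]);
}

echo extractContent('CONTENT', '{{CONTENT}} My content goes here {{   !CONTENT }}');
// My content goes here
```

Unlike the original function, this leaves whitespace inside the content untouched and only trims the ends, which matters if the designer's HTML contains preformatted text.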
I have a CMS where users can create and edit their own content in their websites. I also provide the possibility to include forms and galleries by simply replacing specific Div's in their content.
In the past I simply exploded the content on these Div's to an array, replaced the whole Div's with the needed html code (by using PHP's include) to show the form or gallery at that exact position, imploded the whole array to a string again (html) and used in the website.
Now I am trying to achieve the same in Laravel 5:
// example plugins div in HTML
// ******************************
<div class="plugin form"></div>
// PageController.php
// ******************************
$page = Page::where('url', '=', "home")->first();
$page->text = Helpers::getPlugins($page->text);
// Helpers.php (a non default custom class with functions)
// ******************************
public static function getPlugins($page)
{
$dom = new DOMDocument();
$dom->loadHTML($page, LIBXML_HTML_NOIMPLIED);
$x = $dom->getElementsByTagName("div");
foreach ($x as $node)
{
if (strstr($node->getAttribute('class'), "plugin"))
{
$plugin = explode(" ",$node->getAttribute('class'));
$filename = base_path() . "/resources/views/plugins/" . trim($plugin[1]) . ".blade.php";
if (is_file($filename))
{
ob_start();
include($filename);
ob_get_contents();
$node->nodeValue = ob_get_clean();
}
else
{
$node->nodeValue = "Plugin <strong>".$node->getAttribute('class')."</strong> Not found</div>";
}
}
}
return $dom->saveHTML();
}
So far so good: the content is returned, but what I get is the raw Blade markup as plain text instead of the Laravel-generated HTML that I want to use.
I think there is a way this could work, but I cannot think of it.
Try compiling the template manually with the BladeCompiler->compile() method.
Edit: I think the Blade facade (Blade::compile()) will give you access to this function too; just add use Blade; at the top of the file.
I'm trying to pipe my incoming mails to a PHP script so I can store them in a database and other things. I'm using the class MIME E-mail message parser (registration required) although I don't think that's important.
I have a problem with email subjects. It works fine when the subject is in English, but if it uses non-Latin characters I get something like
=?UTF-8?B?2KLYstmF2KfbjNi0?=
for a title like
یک دو سه
I decode the subject like this:
$subject = str_replace('=?UTF-8?B?' , '' , $subject);
$subject = str_replace('?=' , '' , $subject);
$subject = base64_decode($subject);
It works fine with short subjects with like 10-15 characters but with a longer title I get half of the original title with something like ��� at the end.
If the title is even longer, like 30 characters, I get nothing. Am I doing this right?
You can use the mb_decode_mimeheader() function to decode your string.
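A short sketch of that call, using the encoded subject from the question and assuming the mbstring extension is installed. RFC 2047 limits each encoded-word to 75 characters, so longer subjects are usually split into several encoded-words separated by whitespace. That is a likely reason the manual str_replace/base64_decode approach breaks on longer titles. The second, split header below is a made-up example of that situation.

```php
<?php
// mb_decode_mimeheader() converts the result to the internal encoding,
// so set that to UTF-8 first
mb_internal_encoding('UTF-8');

// Encoded subject from the question
$subject = '=?UTF-8?B?2KLYstmF2KfbjNi0?=';
$decoded = mb_decode_mimeheader($subject);
echo $decoded, PHP_EOL;

// Hypothetical example: the same subject split into two encoded-words,
// the way longer headers are usually folded
$folded = '=?UTF-8?B?2KLYstmF?= =?UTF-8?B?2KfbjNi0?=';
echo mb_decode_mimeheader($folded), PHP_EOL;
```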
Despite the fact that this is almost a year old - I found this and am facing a similar problem.
I'm unsure why you're getting odd characters, but perhaps you are trying to display them somewhere your charset is unsupported.
Here's some code I wrote which should handle everything except the charset conversion, which is a large problem that many libraries handle much better. (PHP's MB library, for instance)
class mail {
/**
* If you change one of these, please check the other for fixes as well
*
* @const Pattern to match RFC 2047 charset encodings in mail headers
*/
const rfc2047header = '/=\?([^ ?]+)\?([BQbq])\?([^ ?]+)\?=/';
const rfc2047header_spaces = '/(=\?[^ ?]+\?[BQbq]\?[^ ?]+\?=)\s+(=\?[^ ?]+\?[BQbq]\?[^ ?]+\?=)/';
/**
* http://www.rfc-archive.org/getrfc.php?rfc=2047
*
* =?<charset>?<encoding>?<data>?=
*
* @param string $header
*/
public static function is_encoded_header($header) {
// e.g. =?utf-8?q?Re=3a=20Support=3a=204D09EE9A=20=2d=20Re=3a=20Support=3a=204D078032=20=2d=20Wordpress=20Plugin?=
// e.g. =?utf-8?q?Wordpress=20Plugin?=
return preg_match(self::rfc2047header, $header) !== 0;
}
public static function header_charsets($header) {
$matches = null;
if (!preg_match_all(self::rfc2047header, $header, $matches, PREG_PATTERN_ORDER)) {
return array();
}
return array_map('strtoupper', $matches[1]);
}
public static function decode_header($header) {
$matches = null;
/* Repair instances where two encodings are together and separated by a space (strip the spaces) */
$header = preg_replace(self::rfc2047header_spaces, "$1$2", $header);
/* Now see if any encodings exist and match them */
if (!preg_match_all(self::rfc2047header, $header, $matches, PREG_SET_ORDER)) {
return $header;
}
foreach ($matches as $header_match) {
list($match, $charset, $encoding, $data) = $header_match;
$encoding = strtoupper($encoding);
switch ($encoding) {
case 'B':
$data = base64_decode($data);
break;
case 'Q':
$data = quoted_printable_decode(str_replace("_", " ", $data));
break;
default:
throw new Exception("preg_match_all is busted: didn't find B or Q in encoding $header");
}
// This part needs to handle every charset
switch (strtoupper($charset)) {
case "UTF-8":
break;
default:
/* Here's where you should handle other character sets! */
throw new Exception("Unknown charset in header - time to write some code.");
}
$header = str_replace($match, $data, $header);
}
return $header;
}
}
When run through a script and displayed in a browser using UTF-8, the result is:
آزمایش
You would run it like so:
$decoded = mail::decode_header("=?UTF-8?B?2KLYstmF2KfbjNi0?=");
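The charset switch in decode_header() only accepts UTF-8. One way to fill in the default branch, assuming the mbstring extension is available, is to hand the decoded bytes to mb_convert_encoding(). The helper name to_utf8() below is hypothetical and not part of the class above.

```php
<?php
// Hypothetical helper: convert decoded header bytes from $charset to UTF-8.
function to_utf8(string $data, string $charset): string
{
    if (strcasecmp($charset, 'UTF-8') === 0) {
        return $data; // already UTF-8, nothing to do
    }
    try {
        // PHP 7 returns false (with a warning) for an unknown encoding
        $converted = @mb_convert_encoding($data, 'UTF-8', $charset);
    } catch (\ValueError $e) {
        // PHP 8 throws ValueError for an unknown encoding
        $converted = false;
    }
    if ($converted === false) {
        throw new \RuntimeException("Unsupported charset: $charset");
    }
    return $converted;
}

// "café" encoded as ISO-8859-1 (é is the single byte 0xE9)
echo to_utf8("caf\xE9", 'ISO-8859-1'), PHP_EOL;
```

Inside the class, the default branch could then become $data = to_utf8($data, $charset); instead of throwing.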
Use php native function
<?php
mb_decode_mimeheader($text);
?>
This function can handle UTF-8 as well as ISO-8859-1 strings.
I have tested it.
Use php function:
<?php
imap_utf8($text);
?>
Just to add yet one more way to do this (or if you don't have the mbstring extension installed but do have iconv):
iconv_mime_decode($str, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, 'UTF-8')
Would the imap-mime-header-decode function help here?
Found myself in a similar situation today.
http://www.php.net/manual/en/function.imap-mime-header-decode.php
I wrote a personal note-taking application in PHP to store and organize my notes, and I wanted a nice, simple format to write them in.
I first did it in Markdown but found it a little confusing, and there was no simple syntax highlighting. I have done BBCode before, so I decided to implement that.
Now for GeSHi (the syntax highlighter), which I really want to use, the required code is as simple as this:
$geshi = new GeSHi($sourcecode, $language);
$geshi->parse_code();
This is the easy part, but what I want to do is let my BBCode call it.
My current regular expression to match a made up [syntax=cpp][/syntax] bbcode is the following:
preg_replace('#\[syntax=(.*?)\](.*?)\[/syntax\]#si' , 'geshi(\\2,\\1)????', text);
You will notice I capture the language and the content, how on earth would I connect it to the GeSHi code?
preg_replace seems to just be able to replace it with a string not an 'expression', I am not sure how to use those two lines of code for GeSHi up there with the captured data..
I really am excited about this project and wish to overcome this.
I wrote this class a while back, the reason for the class was to allow easy customization / parsing. Maybe a little overkill, but works well and I needed it overkill for my application. The usage is pretty simple:
$geshiH = new Geshi_Helper();
$text = $geshiH->geshi($text); // this assumes that the text should be parsed (ie inline syntaxes)
---- OR ----
$geshiH = new Geshi_Helper();
$text = $geshiH->geshi($text, $lang); // assumes that you have the language, good for a snippets deal
I had to do some chopping from other custom items I had, but pending no syntax errors from the chopping it should work. Feel free to use it.
<?php
require_once 'Geshi/geshi.php';
class Geshi_Helper
{
/**
* @var array Array of matches from the code block.
*/
private $_codeMatches = array();
private $_token = "";
private $_count = 1;
public function __construct()
{
/* Generate a unique hash token for replacement */
$this->_token = md5(time() . rand(9999,9999999));
}
/**
* Performs syntax highlights using geshi library to the content.
*
* @param string $content - The content to parse
* @return string Syntax Highlighted content
*/
public function geshi($content, $lang=null)
{
if (!is_null($lang)) {
/* Given the returned results 0 is not set, adding the "" should make this compatible */
$content = $this->_highlightSyntax(array("", strtolower($lang), $content));
}else {
/* Need to replace this prior to the code replace for nobbc */
/* preg_replace_callback is used because the /e modifier was removed in PHP 7 */
$content = preg_replace_callback('~\[nobbc\](.+?)\[/nobbc\]~i', function ($m) {
return '[nobbc]' . strtr($m[1], array('[' => '&#91;', ']' => '&#93;', ':' => '&#58;', '@' => '&#64;')) . '[/nobbc]';
}, $content);
/* For multiple content we have to handle the br's, hence the replacement filters */
$content = $this->_preFilter($content);
/* Reverse the nobbc markup */
$content = preg_replace_callback('~\[nobbc\](.+?)\[/nobbc\]~i', function ($m) {
return strtr($m[1], array('&#91;' => '[', '&#93;' => ']', '&#58;' => ':', '&#64;' => '@'));
}, $content);
$content = $this->_postFilter($content);
}
return $content;
}
/**
* Performs syntax highlights using geshi library to the content.
* If it is unknown the number of blocks, use highlightContent
* instead.
*
* @param string $content - The code block to parse
* @param string $language - The language to highlight with
* @return string Syntax Highlighted content
* @todo Add any extra / customization styling here.
*/
private function _highlightSyntax($contentArray)
{
$codeCount = $contentArray[1];
/* If the count is 2 we are working with the filter */
if (count($contentArray) == 2) {
$contentArray = $this->_codeMatches[$contentArray[1]];
}
/* for default [syntax] */
if ($contentArray[1] == "")
$contentArray[1] = "php";
/* Grab the language */
$language = (isset($contentArray[1]))?$contentArray[1]:'text';
/* Remove leading spaces to avoid problems */
$content = ltrim($contentArray[2]);
/* Parse the code to be highlighted */
$geshi = new GeSHi($content, strtolower($language));
return $geshi->parse_code();
}
/**
* Substitute the code blocks for formatting to be done without
* messing up the code.
*
* @param array $match - Referenced array of items to substitute
* @return string Substituted content
*/
private function _substitute(&$match)
{
$index = sprintf("%02d", $this->_count++);
$this->_codeMatches[$index] = $match;
return "----" . $this->_token . $index . "----";
}
/**
* Removes the code from the rest of the content to apply other filters.
*
* @param string $content - The content to filter out the code lines
* @return string Content with code removed.
*/
private function _preFilter($content)
{
return preg_replace_callback("#\s*\[syntax=(.*?)\](.*?)\[/syntax\]\s*#siU", array($this, "_substitute"), $content);
}
/**
* Replaces the code after the filters have been ran.
*
* @param string $content - The content to replace the code lines
* @return string Content with code re-applied.
*/
private function _postFilter($content)
{
/* using dashes to prevent the old filtered tag being escaped */
return preg_replace_callback("/----\s*" . $this->_token . "(\d{2})\s*----/si", array($this, "_highlightSyntax"), $content);
}
}
?>
It looks to me like you already got the regex right. Your problem lies in the invocation, so I suggest making a wrapper function:
function geshi($src, $l) {
$geshi = new GeSHi($src, $l);
// parse_code() returns the highlighted HTML
return $geshi->parse_code();
}
Now this would normally suffice, but the source code is likely to contain single or double quotes itself. Therefore you cannot write preg_replace(".../e", "geshi('$2','$1')", ...) as you would need. (Note that '$1' and '$2' need quotes because preg_replace just substitutes the $1, $2 placeholders, and the result must be valid inline PHP code.)
That's why you need to use preg_replace_callback to avoid escaping issues in the /e exec replacement code.
So for example:
preg_replace_callback('#\[syntax=(.*?)\](.*?)\[/syntax\]#si' , 'geshi_replace', $text);
And I'd make a second wrapper, but you can combine it with the original code:
function geshi_replace($uu) {
return geshi($uu[2], $uu[1]);
}
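Putting the pieces together, here is a minimal runnable sketch of the callback approach. GeSHi itself is not loaded, so highlight() is a hypothetical stand-in that just wraps the escaped source in a pre tag. The sample $text is also an assumption for illustration.

```php
<?php
// Stand-in for GeSHi so the sketch runs without the library installed.
// With GeSHi loaded, the body would be: return (new GeSHi($src, $lang))->parse_code();
function highlight($src, $lang)
{
    return '<pre class="' . htmlspecialchars($lang) . '">' . htmlspecialchars($src) . '</pre>';
}

// Hypothetical note text containing one [syntax] block
$text = "Example: [syntax=cpp]int x = a < b;[/syntax] done";

$html = preg_replace_callback(
    '#\[syntax=(.*?)\](.*?)\[/syntax\]#si',
    function ($m) {
        // $m[1] is the language, $m[2] is the source code
        return highlight($m[2], $m[1]);
    },
    $text
);

echo $html; // Example: <pre class="cpp">int x = a &lt; b;</pre> done
```

Because the callback receives the captured groups as array elements, quotes inside the source code never need escaping.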
Use preg_match, which stores the captured groups in its third argument:
if (preg_match('#\[syntax=(.*?)\](.*?)\[/syntax\]#si', $text, $match)) {
$geshi = new GeSHi($match[2], $match[1]);
}