Parse WordPress-like shortcode - PHP

I want to parse a WordPress-style shortcode with attributes.
Input:
[include file="header.html"]
I need the output as an array containing the function name ("include") and the attributes with their values. Any help will be appreciated.
Thanks

Here's a utility class that we used on our project.
It matches all shortcodes in a string (including HTML) and returns an associative array with each shortcode's name, attributes and content:
final class Parser {
// Regex101 reference: https://regex101.com/r/pJ7lO1
const SHORTOCODE_REGEXP = "/(?P<shortcode>(?:(?:\\s?\\[))(?P<name>[\\w\\-]{3,})(?:\\s(?P<attrs>[\\w\\d,\\s=\\\"\\'\\-\\+\\#\\%\\!\\~\\`\\&\\.\\s\\:\\/\\?\\|]+))?(?:\\])(?:(?P<content>[\\w\\d\\,\\!\\#\\#\\$\\%\\^\\&\\*\\(\\\\)\\s\\=\\\"\\'\\-\\+\\&\\.\\s\\:\\/\\?\\|\\<\\>]+)(?:\\[\\/[\\w\\-\\_]+\\]))?)/u";
// Regex101 reference: https://regex101.com/r/sZ7wP0
const ATTRIBUTE_REGEXP = "/(?<name>\\S+)=[\"']?(?P<value>(?:.(?![\"']?\\s+(?:\\S+)=|[>\"']))+.)[\"']?/u";
public static function parse_shortcodes($text) {
preg_match_all(self::SHORTOCODE_REGEXP, $text, $matches, PREG_SET_ORDER);
$shortcodes = array();
foreach ($matches as $i => $value) {
$shortcodes[$i]['shortcode'] = $value['shortcode'];
$shortcodes[$i]['name'] = $value['name'];
if (isset($value['attrs'])) {
$attrs = self::parse_attrs($value['attrs']);
$shortcodes[$i]['attrs'] = $attrs;
}
if (isset($value['content'])) {
$shortcodes[$i]['content'] = $value['content'];
}
}
return $shortcodes;
}
private static function parse_attrs($attrs) {
preg_match_all(self::ATTRIBUTE_REGEXP, $attrs, $matches, PREG_SET_ORDER);
$attributes = array();
foreach ($matches as $i => $value) {
$key = $value['name'];
$attributes[$i][$key] = $value['value'];
}
return $attributes;
}
}
print_r(Parser::parse_shortcodes('[include file="header.html"]'));
Output:
Array
(
[0] => Array
(
[shortcode] => [include file="header.html"]
[name] => include
[attrs] => Array
(
[0] => Array
(
[file] => header.html
)
)
)
)

Using this function
$code = '[include file="header.html"]';
$innerCode = GetBetween($code, '[', ']');
$innerCodeParts = explode(' ', $innerCode);
$command = $innerCodeParts[0];
$attributeAndValue = $innerCodeParts[1];
$attributeParts = explode('=', $attributeAndValue);
$attribute = $attributeParts[0];
$attributeValue = str_replace('"', '', $attributeParts[1]);
echo $command . ' ' . $attribute . '=' . $attributeValue;
//this will result in include file=header.html
$command will be "include"
$attribute will be "file"
$attributeValue will be "header.html"

I also needed this functionality in my PHP framework. This is what I've written, it works pretty well. It works with anonymous functions, which I really like (it's a bit like the callback functions in JavaScript).
<?php
//The content which should be parsed
$content = '<p>Hello, my name is John an my age is [calc-age day="4" month="10" year="1991"].</p>';
$content .= '<p>Hello, my name is Carol an my age is [calc-age day="26" month="11" year="1996"].</p>';
//The array with all the shortcode handlers. This is just a regular associative array with anonymous functions as values. A very cool new feature in PHP, just like callbacks in JavaScript or delegates in C#.
$shortcodes = array(
"calc-age" => function($data){
$content = "";
//Calculate the age
if(isset($data["day"], $data["month"], $data["year"])){
$age = date("Y") - $data["year"];
if(date("m") < $data["month"]){
$age--;
}
if(date("m") == $data["month"] && date("d") < $data["day"]){
$age--;
}
$content = $age;
}
return $content;
}
);
//http://stackoverflow.com/questions/18196159/regex-extract-variables-from-shortcode
function handleShortcodes($content, $shortcodes){
//Loop through all shortcodes
foreach($shortcodes as $key => $function){
$dat = array();
preg_match_all("/\[".$key." (.+?)\]/", $content, $dat);
if(count($dat) > 0 && $dat[0] != array() && isset($dat[1])){
$i = 0;
$actual_string = $dat[0];
foreach($dat[1] as $temp){
$temp = explode(" ", $temp);
$params = array();
foreach ($temp as $d){
list($opt, $val) = explode("=", $d);
$params[$opt] = trim($val, '"');
}
$content = str_replace($actual_string[$i], $function($params), $content);
$i++;
}
}
}
return $content;
}
echo handleShortcodes($content, $shortcodes);
?>
The result:
Hello, my name is John an my age is 22.
Hello, my name is Carol an my age is 17.

This is actually tougher than it might appear on the surface. Andrew's answer works, but begins to break down if square brackets appear in the source text [like this, for example]. WordPress works by pre-registering a list of valid shortcodes, and only acting on text inside brackets if it matches one of these predefined values. That way it doesn't mangle any regular text that might just happen to have a set of square brackets in it.
The actual source code of the WordPress shortcode engine is fairly robust, and it doesn't look like it would be all that tough to modify the file to run by itself -- then you could use that in your application to handle the tough work. (If you're interested, take a look at get_shortcode_regex() in that file to see just how hairy the proper solution to this problem can actually get.)
A very rough implementation of your question using the WP shortcodes.php would look something like:
// Define the shortcode
function inlude_shortcode_func($attrs) {
$data = shortcode_atts(array(
'file' => 'default'
), $attrs);
return "Including File: {$data['file']}";
}
add_shortcode('include', 'inlude_shortcode_func');
// And then run your page content through the filter
echo do_shortcode('This is a document with [include file="header.html"] included!');
Again, not tested at all, but it's not a very hard API to use.
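The pre-registration idea described above can also be sketched without WordPress. The following is a minimal illustration, not the WordPress implementation: `register_shortcode`, `run_shortcodes` and `$registry` are hypothetical names, attributes are assumed to be written as name="value", and the arrow functions assume PHP 7.4 or later.

```php
<?php
// Hypothetical stand-alone registry; this is NOT the WordPress API.
$registry = [];

function register_shortcode(array &$registry, string $name, callable $handler): void
{
    $registry[$name] = $handler;
}

function run_shortcodes(array $registry, string $text): string
{
    if ($registry === []) {
        return $text;
    }
    // Only registered names go into the pattern, so other bracketed text is left alone.
    $names = implode('|', array_map(fn($n) => preg_quote($n, '/'), array_keys($registry)));
    $pattern = '/\[(' . $names . ')(\s[^\]]*)?\]/';

    return preg_replace_callback($pattern, function (array $m) use ($registry) {
        $attrs = [];
        // Attributes of the form name="value"; values may contain spaces.
        preg_match_all('/([\w-]+)\s*=\s*"([^"]*)"/', $m[2] ?? '', $pairs, PREG_SET_ORDER);
        foreach ($pairs as $pair) {
            $attrs[$pair[1]] = $pair[2];
        }
        return $registry[$m[1]]($attrs);
    }, $text);
}

register_shortcode($registry, 'include', fn(array $a) => 'Including File: ' . ($a['file'] ?? 'default'));

echo run_shortcodes($registry, 'A doc with [include file="header.html"] and [like this, for example].');
// A doc with Including File: header.html and [like this, for example].
```

Because the pattern is built from the registered names, unregistered bracketed text passes through unchanged. Enclosing shortcodes with a closing tag are not handled in this sketch.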

I have modified the function above to parse the attributes with the WordPress function shortcode_parse_atts():
function extractThis($short_code_string) {
$shortocode_regexp = "/(?P<shortcode>(?:(?:\\s?\\[))(?P<name>[\\w\\-]{3,})(?:\\s(?P<attrs>[\\w\\d,\\s=\\\"\\'\\-\\+\\#\\%\\!\\~\\`\\&\\.\\s\\:\\/\\?\\|]+))?(?:\\])(?:(?P<content>[\\w\\d\\,\\!\\#\\#\\$\\%\\^\\&\\*\\(\\\\)\\s\\=\\\"\\'\\-\\+\\&\\.\\s\\:\\/\\?\\|\\<\\>]+)(?:\\[\\/[\\w\\-\\_]+\\]))?)/u";
preg_match_all($shortocode_regexp, $short_code_string, $matches, PREG_SET_ORDER);
$shortcodes = array();
foreach ($matches as $i => $value) {
$shortcodes[$i]['shortcode'] = $value['shortcode'];
$shortcodes[$i]['name'] = $value['name'];
if (isset($value['attrs'])) {
$attrs = shortcode_parse_atts($value['attrs']);
$shortcodes[$i]['attrs'] = $attrs;
}
if (isset($value['content'])) {
$shortcodes[$i]['content'] = $value['content'];
}
}
return $shortcodes;
}
I think this will help everyone :)

Updating @Duco's snippet: it currently splits the attributes on spaces, which breaks when an attribute value contains a space, as in
[Image source="myimage.jpg" alt="My Image"]
The updated version:
// Defined at the top level: nesting it inside handleShortcodes() would cause a redeclaration error on the second call.
function read_attr($attr) {
$atList = [];
if (preg_match_all('/\s*(?:([a-z0-9-]+)\s*=\s*"([^"]*)")|(?:\s+([a-z0-9-]+)(?=\s*|>|\s+[a-z0-9]+))/i', $attr, $m)) {
for ($i = 0; $i < count($m[0]); $i++) {
if ($m[3][$i])
$atList[$m[3][$i]] = null;
else
$atList[$m[1][$i]] = $m[2][$i];
}
}
return $atList;
}
function handleShortcodes($content, $shortcodes){
//Loop through all shortcodes
foreach($shortcodes as $key => $function){
$dat = array();
preg_match_all("/\[".$key."(.*?)\]/", $content, $dat);
if(count($dat) > 0 && $dat[0] != array() && isset($dat[1])){
$i = 0;
$actual_string = $dat[0];
foreach($dat[1] as $temp){
$params = read_attr($temp);
$content = str_replace($actual_string[$i], $function($params), $content);
$i++;
}
}
}
return $content;
}
$content = '[image source="myimage.jpg" alt="My Image"]';
Attributes passed to the handler:
array(
[source] => myimage.jpg,
[alt] => My Image
)
Updated (Feb 11, 2020)
The following regex in handleShortcodes only matches shortcodes that have attributes:
preg_match_all("/\[".$key." (.+?)\]/", $content, $dat);
To make it also work with plain shortcodes such as [contact-form] or [mynotes], change it to:
preg_match_all("/\[".$key."(.*?)\]/", $content, $dat);
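To make the difference between the two patterns concrete, here is a small sketch that runs both against the same text. The sample string is made up for illustration, and preg_quote() is added here as a precaution; it is not part of the original snippet.

```php
<?php
$content = 'Call us: [contact-form] or see [contact-form id="7"].';
$key = 'contact-form';

// Requires a space and at least one character after the name.
preg_match_all('/\[' . preg_quote($key, '/') . ' (.+?)\]/', $content, $strict);

// Allows the closing bracket to follow the name directly.
preg_match_all('/\[' . preg_quote($key, '/') . '(.*?)\]/', $content, $loose);

print_r($strict[0]); // only [contact-form id="7"]
print_r($loose[0]);  // [contact-form] and [contact-form id="7"]
```

One caveat of the looser pattern: because anything may follow the name, it also matches a longer tag that merely starts with the same name, such as [contact-forms].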

I just had the same problem. For what I have to do, I am going to take advantage of existing XML parsers instead of writing my own regex. I am sure there are cases where it won't work.
example.php
<?php
$file_content = '[include file="header.html"]';
// convert the string into xml
$xml = str_replace("[", "<", str_replace("]", "/>", $file_content));
$doc = new SimpleXMLElement($xml);
echo "name: " . $doc->getName() . "\n";
foreach($doc->attributes() as $key => $value) {
echo "$key: $value\n";
}
$ php example.php
name: include
file: header.html
To make it work on Ubuntu, I think you have to install the XML extension:
sudo apt-get install php-xml
(thanks https://drupal.stackexchange.com/a/218271)
If you have lots of these strings in a file, then I think you can still do the find replace, and then just treat it all like xml.

Related

Get between every strings

Below is a function that can get a string between two other strings without a problem:
function GetBetween($content,$start,$end){
$r = explode($start, $content);
if (isset($r[1])){
$r = explode($end, $r[1]);
return $r[0];
}
return '';
}
Let's say I have code like this:
<code>Sample one</code>
<code>Sample two</code>
<code>Sample three</code>
When using GetBetween($content,'<code>','</code>'), instead of returning something like array("Sample one","Sample two","Sample three") it only returns the first one, "Sample one".
How can I get it to return EVERYTHING between the two things I specify? I would appreciate a solution that isn't hardcoded to the <code> tags, because I will need this for many different things.
Firstly, regex is not the correct tool for parsing HTML/XML; instead you can simply use DOMDocument, like this:
$xml = "<code>Sample one</code><code>Sample two</code><code>Sample three</code>";
$dom = new DOMDocument;
$dom->loadHTML($xml);
$root = $dom->documentElement;
$code_data = $root->getElementsByTagName('code');
$code_arr = array();
foreach ($code_data as $key => $value) {
$code_arr[] = $value->nodeValue;
}
print_r($code_arr);
Output:
Array
(
[0] => Sample one
[1] => Sample two
[2] => Sample three
)
I've had to use a function like this, so I keep it handy:
//where a = content, b = start, c = end
function getBetween($a, $b, $c) {
$y = explode($b, $a);
$len = sizeof($y);
$arr = [];
for ($i = 1; $i < $len; $i++)
$arr[] = explode($c, $y[$i])[0];
return $arr;
}
Anything beyond this, you'll need to start using DomDocument.
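As an illustration of how an explode-based helper like the one above behaves with different delimiters, here is a self-contained sketch. The name `get_all_between` is hypothetical, chosen so it does not clash with the functions already shown.

```php
<?php
// Same explode-based approach as above, under a hypothetical name.
function get_all_between(string $content, string $start, string $end): array
{
    $parts = explode($start, $content);
    $found = [];
    // $parts[0] is whatever precedes the first start delimiter, so skip it.
    for ($i = 1; $i < count($parts); $i++) {
        $found[] = explode($end, $parts[$i])[0];
    }
    return $found;
}

print_r(get_all_between('<code>Sample one</code><code>Sample two</code>', '<code>', '</code>'));
print_r(get_all_between('[a] and [b]', '[', ']'));
```

The first call yields "Sample one" and "Sample two"; the second yields "a" and "b". Note that if an end delimiter is missing, the rest of that piece is returned as-is, so this only suits input you know to be balanced.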
You could try something like this:
function GetBetween($content,$tagname){
$pattern = "#<\s*?$tagname\b[^>]*>(.*?)</$tagname\b[^>]*>#s";
// preg_match_all() collects every occurrence; group 1 holds the text between the tags
preg_match_all($pattern, $content, $matches);
return $matches[1];
}
$content= "<code>Sample one</code><code>Sample two</code><code>Sample three</code>";
//The matching items are:
print_r(GetBetween($content, 'code'));

Regex extract variables from [shortcode]

After migrating some content from WordPress to Drupal, I've got some shortcodes that I need to convert:
String content:
Irrelevant tekst...
[sublimevideo class="sublime"
poster="http://video.host.com/_previews/600x450/sbx-60025-00-da-ANA.png"
src1="http://video.host.com/_video/H.264/LO/sbx-60025-00-da-ANA.m4v"
src2="(hd)http://video.host.com/_video/H.264/HI/sbx-60025-00-da-ANA.m4v"
width="560" height="315"]
..more irrelevant text.
I need to find all variables within the shortcode [sublimevideo ...] and turn it into an array:
Array (
class => "sublime"
poster => "http://video.host.com/_previews/600x450/sbx-60025-00-da-FMT.png"
src1 => "http://video.host.com/_video/H.264/LO/sbx-60025-00-da-FMT.m4v"
src2 => "(hd)http://video.host.com/_video/H.264/HI/sbx-60025-00-da-FMT.m4v"
width => "560"
height => "315"
)
And preferably handle multiple instances of the shortcode.
I guess it can be done with preg_match_all() but I've had no luck.
This will give you what you want.
$data = 'Irrelevant tekst... [sublimevideo class="sublime" poster="http://video.host.com/_previews/600x450/sbx-60025-00-da-ANA.png" src1="http://video.host.com/_video/H.264/LO/sbx-60025-00-da-ANA.m4v" src2="(hd)http://video.host.com/_video/H.264/HI/sbx-60025-00-da-ANA.m4v" width="560" height="315"] ..more irrelevant text.';
$dat = array();
preg_match("/\[sublimevideo (.+?)\]/", $data, $dat);
$dat = array_pop($dat);
$dat= explode(" ", $dat);
$params = array();
foreach ($dat as $d){
list($opt, $val) = explode("=", $d);
$params[$opt] = trim($val, '"');
}
print_r($params);
In anticipation of the next challenge you will face when processing shortcodes, you can use preg_replace_callback to replace the shortcode with its resultant markup.
$data = 'Irrelevant tekst... [sublimevideo class="sublime" poster="http://video.host.com/_previews/600x450/sbx-60025-00-da-ANA.png" src1="http://video.host.com/_video/H.264/LO/sbx-60025-00-da-ANA.m4v" src2="(hd)http://video.host.com/_video/H.264/HI/sbx-60025-00-da-ANA.m4v" width="560" height="315"] ..more irrelevant text.';
function processShortCode($matches){
// parse out the arguments
$dat= explode(" ", $matches[2]);
$params = array();
foreach ($dat as $d){
list($opt, $val) = explode("=", $d);
$params[$opt] = trim($val, '"');
}
switch($matches[1]){
case "sublimevideo":
// here is where you would want to return the resultant markup from the shorttag call.
return print_r($params, true);
}
}
$data = preg_replace_callback("/\[(\w+) (.+?)]/", "processShortCode", $data);
echo $data;
You could use the following RegEx to match the variables:
$regex = '/(\w+)\s*=\s*"(.*?)"/';
I would suggest to first match the sublimevideo shortcode and get that into a string with the following RegEx:
$pattern = '/\[sublimevideo(.*?)\]/';
To get the correct array keys I used this code:
// $string is string content you specified
preg_match_all($regex, $string, $matches);
$sublimevideo = array();
for ($i = 0; $i < count($matches[1]); $i++)
$sublimevideo[$matches[1][$i]] = $matches[2][$i];
This returns the following array: (the one that you've requested)
Array
(
[class] => sublime
[poster] => http://video.host.com/_previews/600x450/sbx-60025-00-da-ANA.png
[src1] => http://video.host.com/_video/H.264/LO/sbx-60025-00-da-ANA.m4v
[src2] => (hd)http://video.host.com/_video/H.264/HI/sbx-60025-00-da-ANA.m4v
[width] => 560
[height] => 315
)
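Since the question also asks about multiple instances and shortcodes that span several lines, the two patterns above can be combined roughly as follows. This is only a sketch: `extract_sublimevideo` is a hypothetical helper name, the sample text is shortened, and the "s" modifier is an addition so that "." also matches line breaks.

```php
<?php
// Hypothetical helper combining the two patterns suggested above.
function extract_sublimevideo(string $text): array
{
    $result = [];
    // The "s" modifier lets "." span line breaks inside a shortcode.
    preg_match_all('/\[sublimevideo(.*?)\]/s', $text, $shortcodes);
    foreach ($shortcodes[1] as $attrString) {
        $attrs = [];
        preg_match_all('/(\w+)\s*=\s*"(.*?)"/', $attrString, $m, PREG_SET_ORDER);
        foreach ($m as $pair) {
            $attrs[$pair[1]] = $pair[2];
        }
        // One attribute array per shortcode occurrence.
        $result[] = $attrs;
    }
    return $result;
}

$text = "Intro [sublimevideo class=\"sublime\"\nwidth=\"560\" height=\"315\"] middle "
      . "[sublimevideo class=\"other\" width=\"320\"] end";
print_r(extract_sublimevideo($text));
```

Each occurrence gets its own array of attributes. An attribute value containing a closing square bracket would end the shortcode match early, so this assumes values never contain "]".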
This is my interpretation; I come from a WordPress background and tried to recreate the setup for a custom PHP project.
It handles things like [PHONE], [PHONE abc="123"], etc.
The only thing it falls flat on is the WordPress style [HERE] to [HERE]
Function to build a list of available shortcodes
// Setup the default global variable
function create_shortcode($tag, $function)
{
global $shortcodes;
$shortcodes[$tag] = $function;
}
Define shortcodes individually, e.g. [IFRAME url="https://www.bbc.co.uk"]:
/**
* iframe, allows the user to add an iframe to a page with responsive div wrapper
*/
create_shortcode('IFRAME', function($atts) {
// ... some validation goes here
// The parameters that can be set in the shortcode
if (empty($atts['url'])) {
return false;
}
return '
<div class="embed-responsive embed-responsive-4by3">
<iframe class="embed-responsive-item" src="' . $atts['url'] . '">
</iframe>
</div>';
});
Then, when you want to run a block of HTML through the shortcode handling, call handle_shortcodes($some_html_with_shortcodes):
function handle_shortcodes($content)
{
global $shortcodes;
// Loop through all shortcodes
foreach($shortcodes as $key => $function){
$matches = [];
// Look for shortcodes, returns an array of ALL matches
preg_match_all("/\[$key([^_^\]].+?)?\]/", $content, $matches, PREG_UNMATCHED_AS_NULL);
if (!empty($matches))
{
$i = 0;
$full_shortcode = $matches[0];
$attributes = $matches[1];
if (!empty($attributes))
{
foreach($attributes as $attribute_string) {
// Decode the values (e.g. &quot; to ")
$attribute_string = htmlspecialchars_decode($attribute_string);
// Find all the query args, looking for `arg="anything"`
preg_match_all('/\w+\=\"(.[^"]+)\"/', $attribute_string, $query_args);
$params = [];
foreach ($query_args[0] as $d) {
// Split the attribute into name and value
list($att, $val) = explode('=', $d, 2);
$params[$att] = trim($val, '"');
}
$content = str_replace($full_shortcode[$i], $function($params), $content);
$i++;
}
}
}
}
return $content;
}
I've plucked these examples from working code so hopefully it's readable and doesn't have any extra functions exclusive to our setup.
As described in this answer, I'd suggest letting WordPress do the work for you using the get_shortcode_regex() function.
$pattern = get_shortcode_regex();
preg_match_all("/$pattern/",$wp_content,$matches);
This will give you an array that is easy to work with and shows the various shortcodes and affiliated attributes in your content. It isn't the most obvious array format, so print it and take a look so you know how to manipulate the data you need.

PHP foreach overwrite value with array

I'm making a simple PHP template system, but I've hit a problem I cannot solve: the layout loads fine, but it is output several times. Here is my code:
Class Template {
private $var = array();
public function assign($key, $value) {
$this->vars[$key] = $value;
}
public function render($template_name) {
$path = $template_name.'.tpl';
if (file_exists($path)) {
$content = file_get_contents($path);
foreach($this->vars as $display) {
$newcontent = str_replace(array_keys($this->vars, $display), $display, $content);
echo $newcontent;
}
} else {
exit('<h1>Load error</h1>');
}
}
}
And the output is
Title is : Welcome to my template system
Credits to [credits]
Title is : [title]
Credits to Credits to Alvaritos
As you can see this is wrong, but don't know how to solve it.
You're better off with strtr:
$content = file_get_contents($path);
$new = strtr($content, $this->vars);
print $new;
str_replace() performs the replacements in the order the keys are defined, and each later replacement also acts on the output of the earlier ones. If you have variables like array('a' => 1, 'aa' => 2) and a string like aa, you will get 11 instead of 2. strtr() tries the longest keys first and does not replace text it has already substituted, so that won't happen.
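A quick way to see the difference described above is to run both functions on the same input. This is a minimal sketch; the sample array and the template string are made up for illustration.

```php
<?php
$vars = ['a' => '1', 'aa' => '2'];

// str_replace() works through the search array in order, so "a" is replaced first.
echo str_replace(array_keys($vars), array_values($vars), 'aa'), "\n"; // 11

// strtr() tries the longest key first and does not rescan replaced text.
echo strtr('aa', $vars), "\n"; // 2

// Applied to a template placeholder, all keys are substituted in a single pass.
$template = 'Title is : [title]';
echo strtr($template, ['[title]' => 'Welcome to my template system']), "\n";
```

With strtr() the whole template is processed once and echoed once, which also avoids printing the content on every loop iteration.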
Use this:
foreach($this->vars as $key => $value)
$content = str_replace($key,$value,$content);
echo $content;

an array of parameter values

function test()
{
$content = "lang=en]text en|lang=sp]text sp";
$atts = explode('|', $content);
}
What I'm trying to do is to allow myself to echo $param[en] to get "text en", $param[sp] to get "text sp". Is that possible?
the $content is actually from a database record.
$param = array();
$langs = explode('|', $content);
foreach ($langs as $lang) {
$arr = explode(']', $lang);
$key = substr($arr[0], 5);
$param[$key] = $arr[1];
}
This is if you are sure $content is well-formatted. Otherwise you will need to put in additional checks to make sure $langs and $arr are what they should be. Use the following to quickly check what's inside an array:
echo '<pre>'.print_r($array_to_be_inspected, true).'</pre>';
Hope this helps
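The additional checks mentioned above might look like the following. This is a sketch under the assumption that every valid segment has the form "lang=xx]text"; `parse_lang_segments` is a hypothetical name, and malformed segments are simply skipped.

```php
<?php
// Hypothetical helper; skips any segment that is not of the form "lang=xx]text".
function parse_lang_segments(string $content): array
{
    $param = [];
    foreach (explode('|', $content) as $segment) {
        // Limit of 2 keeps any later "]" characters inside the text part.
        $arr = explode(']', $segment, 2);
        // Require both halves, the "lang=" prefix, and a non-empty language code.
        if (count($arr) !== 2 || strpos($arr[0], 'lang=') !== 0 || strlen($arr[0]) === 5) {
            continue;
        }
        $param[substr($arr[0], 5)] = $arr[1];
    }
    return $param;
}

print_r(parse_lang_segments('lang=en]text en|broken|lang=sp]text sp'));
```

Here the "broken" segment is ignored and the result contains only the "en" and "sp" entries.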
If $content is not a hard-coded string:
function test()
{
$content = "lang=en]text en|lang=sp]text sp";
$atts = explode('|', $content);
$params = array();
foreach($atts as $att){
$tempLang = explode("]", $att);
// array_pop() takes its argument by reference, so store the exploded parts in a variable first
$langParts = explode("=", $tempLang[0]);
$params[array_pop($langParts)] = $tempLang[1];
}
var_dump($params);
}
I think in this case you could use regular expressions.
$atts = explode('|', $content);
foreach ($atts as $subtext) {
if (preg_match('/lang=(\w+)\](.+)/', $subtext, $regs)) {
$param[$regs[1]] = $regs[2];
}
}
Although, if that value comes from a database, it suggests a poor database structure; if you can change it, try to normalize the database.

php match string to multiple array of keywords

I'm writing a basic categorization tool that will take a title and then compare it to an array of keywords. Example:
$cat['dining'] = array('food','restaurant','brunch','meal','cand(y|ies)');
$cat['services'] = array('service','cleaners','framing','printing');
$string = 'Dinner at seafood restaurant';
Are there creative ways to loop through these categories or to see which category has the most matches? Note that in the 'dining' array, I have regex to match variations on the word candy. I tried the following, but with these category lists getting pretty long, I'm wondering if this is the best way:
$keywordRegex = implode("|",$cat['dining']);
preg_match_all("/\b({$keywordRegex})\b/i",$string,$matches);
Thanks,
Steve
EDIT:
Thanks to @jmathai, I was able to add ranking:
$matches = array();
foreach($keywords as $k => $v) {
str_replace($v, '#####', $masterString,$count);
if($count > 0){
$matches[$k] = $count;
}
}
arsort($matches);
This can be done with a single loop.
I would split candy and candies into separate entries for efficiency. A clever trick would be to replace matches with some token. Let's use 10 #'s.
$cat['dining'] = array('food','restaurant','brunch','meal','candy','candies');
$cat['services'] = array('service','cleaners','framing','printing');
$string = 'Dinner at seafood restaurant';
$max = array(null, 0); // category, occurrences
foreach($cat as $k => $v) {
$replaced = str_replace($v, '##########', $string);
preg_match_all('/##########/i', $replaced, $matches);
if(count($matches[0]) > $max[1]) {
$max[0] = $k;
$max[1] = count($matches[0]);
}
}
echo "Category {$max[0]} has the most ({$max[1]}) matches.\n";
$cat['dining'] = array('food','restaurant','brunch','meal');
$cat['services'] = array('service','cleaners','framing','printing');
$string = 'Dinner at seafood restaurant';
$string = explode(' ',$string);
foreach ($cat as $key => $val) {
$kwdMatches[$key] = count(array_intersect($string,$val));
}
arsort($kwdMatches);
echo "<pre>";
print_r($kwdMatches);
Provided the number of words is not too large, creating a reverse lookup table might be an idea; you then run the title against it.
// One-time reverse category creation
$reverseCat = array();
foreach ($cat as $cCategory => $cWordList) {
foreach ($cWordList as $cWord) {
if (!array_key_exists($cWord, $reverseCat)) {
$reverseCat[$cWord] = array($cCategory);
} else if (!in_array($cCategory, $reverseCat[$cWord])) {
$reverseCat[$cWord][] = $cCategory;
}
}
}
// Processing a title
$stringWords = preg_split("/\b/", $string);
$matchingCategories = array();
foreach ($stringWords as $cWord) {
if (array_key_exists($cWord, $reverseCat)) {
$matchingCategories = array_merge($matchingCategories, $reverseCat[$cWord]);
}
}
$matchingCategories = array_unique($matchingCategories);
You are performing an O(n*m) lookup, with n being the size of your categories and m being the size of a title. You could try organizing them like this:
const DINING = 0;
const SERVICES = 1;
$categories = array(
"food" => DINING,
"restaurant" => DINING,
"service" => SERVICES,
);
Then for each word in a title, check $categories[$word] to find the category - this gets you O(m).
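The per-word lookup described above could be sketched as follows. `count_categories` is a hypothetical helper, the constants use strings here (rather than integers) only to make the printed result readable, and the word list is a shortened version of the one in the question.

```php
<?php
const DINING = 'dining';
const SERVICES = 'services';

// Word => category map; each lookup is a single hash access.
$categories = [
    'food' => DINING,
    'restaurant' => DINING,
    'brunch' => DINING,
    'service' => SERVICES,
    'cleaners' => SERVICES,
];

function count_categories(string $title, array $categories): array
{
    $counts = [];
    // Split on anything that is not a letter or digit and compare in lowercase.
    $words = preg_split('/[^a-z0-9]+/', strtolower($title), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        if (isset($categories[$word])) {
            $cat = $categories[$word];
            $counts[$cat] = ($counts[$cat] ?? 0) + 1;
        }
    }
    // Highest count first.
    arsort($counts);
    return $counts;
}

print_r(count_categories('Dinner at seafood restaurant', $categories));
```

For this title only "restaurant" matches, giving one hit for dining. Unlike the substring-based approaches, whole-word lookup does not count "seafood" as "food", and regex entries such as cand(y|ies) would have to be listed as separate words.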
Okay here's my new answer that lets you use regex in $cat[n] values...there's only one caveat about this code that I can't figure out...for some reason, it fails if you have any kind of metacharacter or character class at the beginning of your $cat[n] value.
Example: .*food will not work. But s.afood or sea.* etc... or your example of cand(y|ies) will work. I sort of figured this would be good enough for you since I figured the point of the regex was to handle different tenses of words, and the beginnings of words rarely change in that case.
function rMatch ($a,$b) {
if (preg_match('~^'.$b.'$~i',$a)) return 0;
if ($a>$b) return 1;
return -1;
}
$string = explode(' ',$string);
foreach ($cat as $key => $val) {
$kwdMatches[$key] = count(array_uintersect($string,$val,'rMatch'));
}
arsort($kwdMatches);
echo "<pre>";
print_r($kwdMatches);
