I need help modifying a regular expression for PHP markdown


I'm modifying PHP Markdown (a PHP parser for the markup language used here on Stack Overflow), trying to implement points 1, 2 and 3 described by Jeff in this blog post. I've easily done the last two, but this one is proving very difficult:
Removed support for intra-word emphasis like_this_example
In fact, in the "normal" Markdown implementation, like_this_example would be rendered as likethisexample, with "this" in italics. This is very undesirable; I want only a standalone _example_ to become an italic example.
I looked in the source code and found the regex used to do the emphasis:
var $em_relist = array(
'' => '(?:(?<!\*)\*(?!\*)|(?<!_)_(?!_))(?=\S|$)(?![.,:;]\s)',
'*' => '(?<=\S|^)(?<!\*)\*(?!\*)',
'_' => '(?<=\S|^)(?<!_)_(?!_)',
);
var $strong_relist = array(
'' => '(?:(?<!\*)\*\*(?!\*)|(?<!_)__(?!_))(?=\S|$)(?![.,:;]\s)',
'**' => '(?<=\S|^)(?<!\*)\*\*(?!\*)',
'__' => '(?<=\S|^)(?<!_)__(?!_)',
);
var $em_strong_relist = array(
'' => '(?:(?<!\*)\*\*\*(?!\*)|(?<!_)___(?!_))(?=\S|$)(?![.,:;]\s)',
'***' => '(?<=\S|^)(?<!\*)\*\*\*(?!\*)',
'___' => '(?<=\S|^)(?<!_)___(?!_)',
);
I tried opening it in RegexBuddy, but that wasn't enough, and after spending half an hour on it I still don't know where to start. Any suggestions?
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

I use RegexBuddy too. :)
You may want to try the following code:
<?php
$line1 = "like_this_example";
$line2 = "I want only _example_ to become example";
$pattern = '/\b_(?P<word>.*?)_\b/si';
if (preg_match($pattern, $line1, $matches))
{
$result = $matches['word'];
var_dump($result);
}
if (preg_match($pattern, $line2, $matches))
{
$result = $matches['word'];
var_dump($result);
}
?>
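The pattern above only extracts the word. As a rough sketch of the replacement step, the same idea can be written with explicit lookarounds, so an underscore only opens or closes emphasis when it is not touching a letter, digit or another underscore. The helper name emphasizeUnderscores() is hypothetical, and this is a standalone preg_replace() call; wiring it into PHP Markdown's own $em_relist entries is not shown.

```php
<?php
// Hypothetical helper: turn _word_ into <em>word</em>, but skip
// intra-word underscores such as like_this_example.
function emphasizeUnderscores(string $text): string
{
    // (?<!\w)    opening "_" must not follow a letter, digit or underscore
    // (?![\s_])  emphasis must not start with whitespace or another "_"
    // (.+?)      lazily capture the emphasized text
    // (?<![\s_]) emphasis must not end with whitespace or another "_"
    // _(?!\w)    closing "_" must not be followed by a word character
    $pattern = '/(?<!\w)_(?![\s_])(.+?)(?<![\s_])_(?!\w)/';
    return preg_replace($pattern, '<em>$1</em>', $text);
}

echo emphasizeUnderscores('like_this_example'), "\n";
echo emphasizeUnderscores('I want only _example_ to become example'), "\n";
```

The first call leaves like_this_example untouched; the second wraps only the standalone word in <em> tags.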

I was able to grab only individual _enclosed_ words via:
$input = 'test of _this_ vs stuff_like_this...and here is _anothermatch_ and_another_fake_string';
$pattern = '#(?<=\s|^)(?<!_)(_[^_]*_)(?!_)#is';
preg_match_all($pattern, $input, $matches);
print_r($matches);
I'm not sure exactly how that would fit into the code above, though. You would probably need to pair it with the patterns below to account for the double- and triple-underscore cases:
$pattern = '#(?<=\s|^)(?<!_)(__[^_]*__)(?!_)#is';
$pattern = '#(?<=\s|^)(?<!_)(___[^_]*___)(?!_)#is';
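One way to combine these, sketched outside PHP Markdown with a hypothetical underscoreEmphasis() name, is to run them longest delimiter first through a single preg_replace() call with arrays, so ___ is consumed before __ and _ get a chance to match. As an assumption beyond the patterns above, the trailing (?!_) is tightened to (?!\w) so that something like _foo_bar is also left alone.

```php
<?php
// Hypothetical helper: apply triple, double and single underscore
// emphasis in that order (longest delimiter first).
function underscoreEmphasis(string $text): string
{
    $patterns = [
        '/(?<=\s|^)___([^_]+)___(?!\w)/', // ___text___ -> bold + italic
        '/(?<=\s|^)__([^_]+)__(?!\w)/',   // __text__   -> bold
        '/(?<=\s|^)_([^_]+)_(?!\w)/',     // _text_     -> italic
    ];
    $replacements = [
        '<strong><em>$1</em></strong>',
        '<strong>$1</strong>',
        '<em>$1</em>',
    ];
    // With arrays, preg_replace() applies each pattern in turn to the
    // result of the previous one, so the order above matters.
    return preg_replace($patterns, $replacements, $text);
}

echo underscoreEmphasis('plain __bold__ and ___both___ and snake_case_name'), "\n";
```

This prints "plain <strong>bold</strong> and <strong><em>both</em></strong> and snake_case_name"; the underscores inside snake_case_name are not preceded by whitespace, so they are never treated as delimiters.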

Related

Making a simple templating engine in PHP

I need to write a simple PHP function to replace text between {{ }} characters with their respective data.
Example:
String: "and with strange aeons even {{noun}} may {{verb}}"
$data = ['noun' => 'bird', 'verb' => 'fly'];
Result:
"and with strange aeons even bird may fly"
I have it almost working with the following code, based on preg_replace_callback:
function compile($str,$data){
foreach ($data as $k => $v) {
$pattern = '/\b(?<!\-)(' . $k . ')\b(?!-)/i';
$str = preg_replace_callback($pattern, function($m) use($v){
return $v;
}, $str);
}
return $str;
}
But I can't seem to account for the {{ }}.
The result looks like this:
"and with strange aeons even {{bird}} may {{fly}}"
How can I adjust the regex and/or code to account for the double curly brackets?
Also, before anyone asks why I'm trying to do this manually rather than use PHP itself or the Smarty plugin -- it's too narrow a use case to install a plugin, and I cannot use PHP itself because the input string comes in as raw text from a database. I need to compile that raw text with data stored in a PHP array.

Since you're looping anyway, keep it simple:
foreach ($data as $k => $v) {
$str = str_ireplace('{{'.$k.'}}', $v, $str);
}
You can add a space before {{ and after }} if needed.

You can use
$str = "and with strange aeons even {{noun}} may {{verb}}";
$data = ['noun' => 'bird', 'verb' => 'fly'];
$pattern = '/{{(' . implode('|', array_keys($data)) . ')}}/i';
echo preg_replace_callback($pattern, function($m) use($data){
return $data[strtolower($m[1])];
}, $str);
// => and with strange aeons even bird may fly
The $pattern will look like /{{(noun|verb)}}/i and will match noun or verb inside double braces while capturing the word itself. The replacement is the corresponding value from the $data array. Converting the Group 1 value to lowercase with strtolower($m[1]) is required because the keys in the $data array are all lowercase, while the $pattern can match uppercase variants too. The keys are inserted into the regex unescaped, so this assumes they contain no regex metacharacters; otherwise, pass each one through preg_quote().
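A variation on the same approach, sketched with a hypothetical compile() signature: instead of building an alternation from the keys, match any {{name}} placeholder and look the name up in $data, leaving unknown placeholders untouched. Restricting names to \w+ and looking them up case-insensitively via array_change_key_case() are assumptions about what the templates contain.

```php
<?php
// Hypothetical compile(): replace {{name}} placeholders with values from $data.
function compile(string $str, array $data): string
{
    // Lower-case the keys once so {{Noun}} and {{noun}} resolve the same way.
    $lookup = array_change_key_case($data, CASE_LOWER);

    // \{\{\s*(\w+)\s*\}\} : a name made of word characters inside double
    // braces, optionally padded with spaces, captured as group 1.
    return preg_replace_callback('/\{\{\s*(\w+)\s*\}\}/', function ($m) use ($lookup) {
        $key = strtolower($m[1]);
        // Leave unknown placeholders exactly as they were.
        return array_key_exists($key, $lookup) ? (string) $lookup[$key] : $m[0];
    }, $str);
}

$data = ['noun' => 'bird', 'verb' => 'fly'];
echo compile('and with strange aeons even {{noun}} may {{verb}}', $data), "\n";
echo compile('{{ Noun }} and {{missing}}', $data), "\n";
```

Because the keys never end up inside the pattern, no preg_quote() is needed, and a typo in a template stays visible as {{missing}} instead of silently disappearing.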

Make use of strtr() and call it a day:
$string = 'and with strange aeons even {{noun}} may {{verb}}';
$data = ['{{noun}}' => 'bird', '{{verb}}' => 'fly'];
echo strtr( $string, $data );
produces:
and with strange aeons even bird may fly
strtr() is nice because it never re-replaces text it has already substituted, so it won't mess up the string even in a case like:
$data = ['{{noun}}' => 'bi{{verb}}rd', '{{verb}}' => 'fly'];
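To make that last point concrete, here is a small comparison (not part of the original answer): strtr() with an array scans the original string once, while str_replace() with arrays applies each search/replace pair in turn to the output of the previous one.

```php
<?php
$string = 'and with strange aeons even {{noun}} may {{verb}}';
$data = ['{{noun}}' => 'bi{{verb}}rd', '{{verb}}' => 'fly'];

// strtr(): each part of the original string is replaced at most once,
// so the {{verb}} inserted by the first value is kept literally.
echo strtr($string, $data), "\n";
// prints "and with strange aeons even bi{{verb}}rd may fly"

// str_replace() with arrays: '{{noun}}' is replaced first, then '{{verb}}'
// is replaced everywhere, including inside the value just inserted.
echo str_replace(array_keys($data), array_values($data), $string), "\n";
// prints "and with strange aeons even biflyrd may fly"
```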

PHP get the <h[1-6]></h[1-6]> values from an html text

In my code I have the following regexp:
preg_match_all('/<title>([^>]*)<\/title>/si', $contents, $match );
That retrieves the <h>..</h> tags from a webpage. But sometimes they may contain HTML tags such as <strong>, <b>, etc., so it needs some modification. I tried this one:
preg_match_all('/<h[1-6]>(.*)<\/h[1-6]>/si', $contents, $match );
But something is wrong, and it does not retrieve the content inside the HTML <h> tags.
Can you help me modify the regexp correctly?

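// '<h\d>' would use < and > as the delimiters, so give the pattern explicit ones
$num = array();
preg_match_all('/<h[1-6]\b/i', $contents, $matches);
foreach ($matches[0] as $match) { // $matches[0] holds full matches like "<h3"
$num[] = substr($match, 2, 1); // the heading level digit, e.g. "3"
}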

When you use (.*), you match everything (and, with the s flag, across line breaks too). If you only want words, digits and spaces, you could use a character class with them and match one or more:
preg_match_all('/<h[1-6]>([\w\d\s]+)<\/h[1-6]>/si', $contents, $match);
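The pattern in the question fails for two reasons: <h[1-6]> does not allow attributes such as class="...", and the greedy (.*) with the s flag runs from the first heading to the last closing tag on the page. A sketch of a tighter regex, with a hypothetical extractHeadings() helper: it allows attributes, uses a lazy (.*?), and requires the closing tag to have the same level through the backreference \1. For arbitrary real-world HTML, a parser such as DOMDocument is more robust than any regex.

```php
<?php
// Hypothetical helper: capture the inner HTML of each <h1>..<h6> element,
// allowing attributes on the opening tag and nested inline tags.
function extractHeadings(string $html): array
{
    // <h([1-6])   opening tag, heading level captured in group 1
    // \b[^>]*>    optional attributes up to the end of the opening tag
    // (.*?)       lazy, so the match stops at the first matching close tag
    // <\/h\1>     closing tag with the same level (backreference to group 1)
    preg_match_all('/<h([1-6])\b[^>]*>(.*?)<\/h\1>/si', $html, $m);
    return $m[2];
}

$html = '<div><h2 class="x">Hello <strong>world</strong></h2><p>text</p><h3>Second</h3></div>';
print_r(extractHeadings($html));
// strip_tags() removes the nested markup if only the text is wanted.
print_r(array_map('strip_tags', extractHeadings($html)));
```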

Now, here's no regex expert, but should he be in your shoes, he'd do it like so:
<?php
// SIMULATED SAMPLE HTML CONTENT - WITH ATTRIBUTES:
$contents = '<section id="id-1">And even when darkness covers your path and no one is there to lend a hand;
<h3 class="class-1">Always remember that <em>There is always light at the end of the Tunnel <span class="class-2">if you can but hang on to your Faith!</span></em></h3>
<div>Now; let no one deceive you: <h2 class="class-2">You will be tried in ever ways - sometimes beyond your limits...</h2></div>
<article>But hang on because You are the Voice... You are the Light and you shall rule your Destiny because it is all about<h6 class="class4">YOU - THE REAL YOU!!!</h6></article>
</section>';
// SPLIT THE CONTENT AT THE END OF EACH <h[1-6]> TAGS
$parts = preg_split("%<\/h[1-6]>%si", $contents);
$matches = array();
// LOOP THROUGH $parts AND BUNDLE APPROPRIATE ELEMENTS TO THE $matches ARRAY.
foreach($parts as $part){
if(preg_match("%(.*|.?)(<h)([1-6])%si", $part)){
$matches[] = preg_replace("%(.*|.?)(<)(h[1-6])(.*)%si", "$2$3$4$2/$3>", $part);
}
}
var_dump($matches);
//DUMPS::::
array (size=3)
0 => string '<h3 class="class-1">Always remember that <em>There is always light at the end of the Tunnel <span class="class-2">if you can but hang on to your Faith!</span></em></h3>' (length=168)
1 => string '<h2 class="class-2">You will be tried in ever ways - sometimes beyond your limits...</h2>' (length=89)
2 => string '<h6 class="class4">YOU - THE REAL YOU!!!</h6>' (length=45)
As a function, this is what it boils down to:
<?php
function pseudoMatchHTags($htmlContentWithHTags){
$parts = preg_split("%<\/h[1-6]>%si", $htmlContentWithHTags);
$matches = array();
foreach($parts as $part){
if(preg_match("%(.*|.?)(<h)([1-6])%si", $part)){
$matches[] = preg_replace("%(.*|.?)(<)(h[1-6])(.*)%si", "$2$3$4$2/$3>", $part);
}
}
return $matches;
}
var_dump(pseudoMatchHTags($contents));
You can test it here: https://eval.in/571312. Perhaps it helps a bit... I hope. ;-)

PHP regex replace from string to string

I want to replace a section of a string that starts with one string and ends with another, and I want the section in between replaced as well. I think this is possible using regex, but I can't seem to find any decent examples showing this.
For example:
I have "http://www.website.com" and I want to replace everything from "www" to "com" with "123xyz".
So "http://www.website.com/something" becomes "http://123xyz/something".
I am assuming I have to use preg_replace(), and I think the regex should start with "^www" and end with "com$", but I can't get a good enough grasp of regex syntax to create the desired effect.
Please help.

Based on your example, you can try this:
$string = 'http://www.website.com/something';
$pattern = '/www(.*)com/';
$replacement = '123xyz';
echo preg_replace($pattern, $replacement, $string);

$phrase = "http://www.website.com";
$phraseWords = array("www", "com");
$replaceTo = array("123xyz", "something");
$result = str_replace($phraseWords, $replaceTo, $phrase);
echo $result;

Thanks so much to both #CodingAnt and #PHPWeblineindia for your great answers. Using #CodingAnt's answer (and some more research I did online) I wrote this function:
function replaceBetween(&$target, $from, $to, $with){
if(strpos($target, $from)===false)return false;
// preg_quote() escapes regex metacharacters (such as ".") in the markers
$regex = "'".preg_quote($from, "'")."(.*?)".preg_quote($to, "'")."'si";
preg_match_all($regex, $target, $match);
$match = $match[1];
foreach($match as $m) $target = str_replace($from.$m.$to, $with, $target);
return $target;
}
It seems to work pretty well. I hope someone finds this useful.
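Since preg_replace() already replaces every match, the match-then-loop step can be collapsed into a single call. A sketch with a hypothetical replaceSpan() name that returns the new string instead of modifying it by reference: preg_quote() escapes characters such as "." in the markers, and the lazy (.*?) stops at the nearest $to, unlike the greedy (.*) in the first answer. Note that $with is treated as a replacement template, so a literal $ or \ in it would need escaping.

```php
<?php
// Hypothetical helper: replace every "$from ... $to" span (inclusive) with $with.
function replaceSpan(string $target, string $from, string $to, string $with): string
{
    // preg_quote() escapes regex metacharacters in the markers;
    // (.*?) is lazy, so each span ends at the nearest following $to.
    $regex = '/' . preg_quote($from, '/') . '(.*?)' . preg_quote($to, '/') . '/s';
    return preg_replace($regex, $with, $target);
}

echo replaceSpan('http://www.website.com/something', 'www', '.com', '123xyz'), "\n";
// Lazy matching replaces each bracketed span separately:
echo replaceSpan('[a] keep [b]', '[', ']', 'X'), "\n";
```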

PHP - Keyword matching in text strings - How to enhance the accuracy of returned keywords?

I have a piece of PHP code as follows:
$words = array(
'Art' => '1',
'Sport' => '2',
'Big Animals' => '3',
'World Cup' => '4',
'David Fincher' => '5',
'Torrentino' => '6',
'Shakes' => '7',
'William Shakespeare' => '8'
);
$text = "I like artists, and I like sports. Can you call the name of a big animal? Brazil World Cup matchers are very good. William Shakespeare is very famous in the world.";
$all_keywords = $all_keys = array();
foreach ($words as $word => $key) {
if (strpos(strtolower($text), strtolower($word)) !== false) {
$all_keywords[] = $word;
$all_keys[] = $key;
}
}
echo $keywords_list = implode(',', $all_keywords) ."<br>";
echo $keys_list = implode(',', $all_keys) . "<br>";
The code echoes Art,Sport,World Cup,Shakes,William Shakespeare and 1,2,4,7,8; however, the code is very simple and is not accurate enough to return the right keywords. For example, it returns 'Shakes' => '7' because of the word Shakespeare in $text, but as you can see, "Shakes" cannot represent "Shakespeare" as a proper keyword. Basically, I want to return Art,Sport,World Cup,William Shakespeare and 1,2,4,8 instead of Art,Sport,World Cup,Shakes,William Shakespeare and 1,2,4,7,8. So, could you please help me develop better code to extract the keywords without similar problems? Thanks for your help.

You may want to look at regular expressions to weed out partial matches:
// create regular expression by using alternation
// of all given words
$re = '/\b(?:' . join('|', array_map(function($keyword) {
return preg_quote($keyword, '/');
}, array_keys($words))) . ')\b/i';
preg_match_all($re, $text, $matches);
foreach ($matches[0] as $keyword) {
echo $keyword, " ", $words[$keyword], "\n";
}
The expression uses the \b assertion to match word boundaries, i.e. the word must be on its own.
Output
World Cup 4
William Shakespeare 8
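One caveat with the /i flag: the matched text keeps the casing it has in $text, so $words[$keyword] finds nothing when, say, the text says "world cup". A sketch of a case-insensitive lookup, assuming the keys of $words are unique regardless of case (the sample $words and $text here are made up for illustration):

```php
<?php
$words = ['Art' => '1', 'Sport' => '2', 'World Cup' => '4', 'William Shakespeare' => '8', 'Shakes' => '7'];
$text = 'the world cup and WILLIAM SHAKESPEARE, but not Shakespeare alone';

// Same alternation of quoted keywords as above, anchored on word boundaries.
$re = '/\b(?:' . implode('|', array_map(function ($keyword) {
    return preg_quote($keyword, '/');
}, array_keys($words))) . ')\b/i';

// Look keywords up by their lower-cased form so "world cup" finds "World Cup".
$lookup = array_change_key_case($words, CASE_LOWER);

preg_match_all($re, $text, $matches);
foreach ($matches[0] as $keyword) {
    echo $keyword, " ", $lookup[strtolower($keyword)], "\n";
}
```

This prints "world cup 4" and "WILLIAM SHAKESPEARE 8"; the lone "Shakespeare" does not produce a "Shakes" match because \b fails before the "p".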

You're better off using regular expressions if you want accurate matches.
I modified your original code to use them instead of strpos(), which produces partial matches, as happened with your code.
There's room for improvement, but hopefully you get the basic gist of it.
Let me know if you have any questions.
The code is written as a shell script: save it as demo.php, then run chmod +x demo.php && ./demo.php
#!/usr/bin/php
<?php
//array of regular expressions to match your words/phrases
$words = array(
'/\b[Aa]rt\b/',
'/\bI\b/',
'/\bSport\b/',
'/\bBig Animals\b/' ,
'/\bWorld Cup\b/' ,
'/\bDavid Fincher\b/',
'/\bTorrentino\b/' ,
'/\bShakes\b/' ,
'/\b[sS]port[s]{0,1}\b/' ,
'/\bWilliam Shakespeare\b/',
);
$text = "I like artists and art, and I like sports. Can you call the name of a big animal? Brazil World Cup matchers are very good. William Shakespeare is very famous in the world.";
$all_keywords = array(); //changed formatting for clarity
$all_keys = array();
foreach ($words as $regex) {
$m = array();
if (preg_match_all($regex, $text, $m, PREG_OFFSET_CAPTURE) >= 1) {
// $m[0] lists every full match as array(matched text, byte offset)
foreach ($m[0] as $mm) {
$key = $mm[1]; //key is the offset in $text where the match begins
$word = $mm[0]; //the matched word/phrase
$all_keywords[] = $word;
$all_keys[] = $key;
}
}
}
echo "\$text = \"$text\"\n";
echo $keywords_list = implode(',', $all_keywords) ."<br>\n";
echo $keys_list = implode(',', $all_keys) . "<br>\n";

Replace
strpos(strtolower($text), strtolower($word))
With
preg_match('/\b'.preg_quote($word, '/').'\b/', $text)
Or, since you don't seem to care about capital letters:
preg_match('/\b'.preg_quote(strtolower($word), '/').'\b/', strtolower($text))
In that case, I suggest computing strtolower($text) once beforehand, for instance just before the foreach loop.

Off the top of my head, I think there are two additional steps to make this function a bit more robust.
If we sort the $words array by strlen (descending: longer words at the top, shorter ones at the bottom), there is a greater chance of getting the desired match.
In the loop, when strpos() finds a word, we can remove the matched word from the string to avoid further unnecessary matches (e.g. Shakes will always match wherever William Shakespeare matches).
P.S. The SO iOS app rocks! But it's still not easy to code on (bloody autocorrect!)
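The two steps above can be sketched as follows. extractKeywords() is a hypothetical name, the arrow function needs PHP 7.4+, and blanking matches out with a space is one choice among several. Note that this keeps the plain substring behaviour of strpos() (so Art still matches "artists"), which is what the question's desired output implies.

```php
<?php
// Sketch: try longer keywords first, and remove each match from the text
// so shorter keywords cannot match inside it.
function extractKeywords(array $words, string $text): array
{
    // Step 1: sort keywords by length, longest first.
    uksort($words, fn($a, $b) => strlen($b) <=> strlen($a));

    $found = [];
    foreach ($words as $word => $key) {
        if (stripos($text, $word) !== false) {
            $found[$word] = $key;
            // Step 2: blank out every occurrence (case-insensitively).
            $text = str_ireplace($word, ' ', $text);
        }
    }
    // Put the results back in order of their ids ('1', '2', ...).
    asort($found);
    return $found;
}

$words = [
    'Art' => '1', 'Sport' => '2', 'Big Animals' => '3', 'World Cup' => '4',
    'David Fincher' => '5', 'Torrentino' => '6', 'Shakes' => '7',
    'William Shakespeare' => '8',
];
$text = "I like artists, and I like sports. Can you call the name of a big animal? Brazil World Cup matchers are very good. William Shakespeare is very famous in the world.";

$found = extractKeywords($words, $text);
echo implode(',', array_keys($found)), "\n"; // Art,Sport,World Cup,William Shakespeare
echo implode(',', $found), "\n";             // 1,2,4,8
```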

Swear filter case sensitive

I have a little problem with my function:
function swear_filter($string){
$search = array(
'bad-word',
);
$replace = array(
'****',
);
return preg_replace($search , $replace, $string);
}
It should transform "bad-word" into "****", but the problem is case sensitivity:
e.g. if the user types "baD-word", it doesn't work.

The values in your $search array are not regular expressions.
First, fix that:
$search = array(
'/bad-word/',
);
Then, you can apply the i flag for case-insensitivity:
$search = array(
'/bad-word/i',
);
You don't need a g flag to match globally (i.e. more than once each), and PHP doesn't accept one anyway, because preg_replace replaces every match by default.
However, you could probably do with using the word boundary metacharacter \b to avoid matching your "bad-word" string inside another word. This may have consequences on how you form your list of "bad words".
$search = array(
'/\bbad-word\b/i',
);
If you don't want to pollute $search with these implementation details, then you can do the same thing a bit more programmatically:
$search = array_map(
function ($str) { return '/\b' . preg_quote($str, '/') . '\b/i'; },
$search
);
(create_function() was deprecated in PHP 7.2 and removed in PHP 8.0, so an anonymous function is the way to build these patterns in current PHP.)
Update: full code:
function swear_filter($string){
$search = array(
'bad-word',
);
$replace = array(
'****',
);
// regex-ise input: escaped text, word boundaries, case-insensitive
$search = array_map(
function ($str) { return '/\b' . preg_quote($str, '/') . '\b/i'; },
$search
);
return preg_replace($search, $replace, $string);
}
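With a longer list, a single pattern built from an alternation is often simpler than one pattern per word. A sketch with a hypothetical filterSwears() name and a single mask for every word; the empty-list guard matters because an empty alternation would match at every word boundary.

```php
<?php
// Hypothetical variant: one case-insensitive regex for the whole list,
// built from an alternation of quoted words, each replaced by $mask.
function filterSwears(array $badWords, string $text, string $mask = '****'): string
{
    if ($badWords === []) {
        return $text; // nothing to filter
    }
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $badWords);
    // \b...\b keeps "bad-word" from matching inside "bad-words".
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';
    return preg_replace($pattern, $mask, $text);
}

echo filterSwears(['bad-word'], 'That baD-word and BAD-WORD, but not bad-words'), "\n";
// prints "That **** and ****, but not bad-words"
```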

I think you mean
'/bad-word/i',

Do you even need to use regex?
function swear_filter($string){
$search = array(
'bad-word',
);
if (in_array(strtolower($string), $search)) {
return '****';
}
return $string;
}
This makes the following assumptions:
1) $string contains characters acceptable in the current locale
2) all contents of the $search array are lowercase
Edit: 3) the entire string consists of the bad word
I suppose this would only work if the string were split and evaluated on a per-word basis.
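The per-word idea can be sketched like this, with a hypothetical swear_filter_words() name: split on whitespace while keeping the separators, mask any token that is in the list, and join the pieces back together. Tokens with punctuation attached (e.g. "bad-word,") are not caught by this simple split.

```php
<?php
// Sketch: split on whitespace, keep the separators, and mask any token
// that is (case-insensitively) in the list.
function swear_filter_words(string $string, array $search): string
{
    // PREG_SPLIT_DELIM_CAPTURE keeps the whitespace runs so they can be re-joined.
    $tokens = preg_split('/(\s+)/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($tokens as $i => $token) {
        if (in_array(strtolower($token), $search, true)) {
            $tokens[$i] = '****';
        }
    }
    return implode('', $tokens);
}

echo swear_filter_words('This BAD-WORD here', ['bad-word']), "\n";
// prints "This **** here"
```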
