Replacing comments in php with preg_replace - php

I need to replace all block comments with preg_replace() in php.
For example:
/**asdfasdf
fasdfasdf*/
echo "hello World\n";
For this:
echo "hello World\n";
I tried some solutions from this site, but none of them works for me.
My code:
$file = file_get_contents($fileinput);
$file = preg_replace('/\/\*([^\\n]*[\\n]?)*\*\//', '', $file);
echo $file;
For example, my output is the same as the input.

Use token_get_all() (http://www.php.net/manual/en/function.token-get-all.php):
$file = file_get_contents($fileinput);
$tokens = token_get_all($file); // prepend an open tag if your file doesn't have one
$plain = '';
foreach ($tokens as $token) {
    if (is_array($token)) {
        list($number, $string) = $token;
        // list every token type you don't want; T_DOC_COMMENT covers /** ... */ doc blocks
        if (!in_array($number, [T_OPEN_TAG, T_COMMENT, T_DOC_COMMENT])) {
            $plain .= $string;
        }
    } else {
        $plain .= $token;
    }
}
print_r($plain);
Output:
echo "hello World\n";
Here is a list of all PHP tokens:
http://www.php.net/manual/en/tokens.php
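The main advantage of the tokenizer over a plain regex is that comment-like text inside string literals is left alone. A minimal sketch of that behaviour follows; the helper name strip_php_comments and the sample source are made up for illustration and are not part of the answer above.

```php
<?php
// Hypothetical helper: removes comments from PHP source using the tokenizer.
function strip_php_comments(string $source): string
{
    $plain = '';
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            [$id, $text] = $token;
            // Drop ordinary comments and doc blocks, keep every other token
            if ($id === T_COMMENT || $id === T_DOC_COMMENT) {
                continue;
            }
            $plain .= $text;
        } else {
            // Single-character tokens such as ';' arrive as plain strings
            $plain .= $token;
        }
    }
    return $plain;
}

// Sample input: one real block comment and one comment-like string literal
$source = "<?php\n/**asdfasdf\nfasdfasdf*/\necho \"hello /* not a comment */ World\\n\";\n";
$result = strip_php_comments($source);
echo $result;
```

The block comment disappears, while the text inside the double-quoted string survives because the tokenizer reports it as part of a string token.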

Try this (the s modifier lets . match newlines, so comments that span several lines are matched too):
$file = preg_replace('/^\s*?\/\*.*?\*\//ms', '', $file);

The best way to parse PHP code is to use the tokenizer.
However, it is not that difficult to do with a regex. You only need to skip all strings:
$pattern = <<<'EOD'
~
(?(DEFINE)
(?<sq> ' (?>[^'\\]++|\\{2}|\\.)* ' ) # single quotes
(?<dq> " (?>[^"\\]++|\\{2}|\\.)* " ) # double quotes
(?<hd> <<< \s* (["']?)(\w+)\g{-2} \R .*? (?<=\n) \g{-1} ;? (\R|$) ) # heredoc like
(?<string> \g<sq> | \g<dq> | \g<hd>)
)
\g<string> (*SKIP)(*FAIL) | /\* .*? \*/
~xs
EOD;
$result = preg_replace($pattern, '', $data);
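To see the skip-strings idea in action, here is a reduced sketch of the pattern above. It keeps only the single-quote and double-quote branches (the heredoc branch is left out to keep the example short), and the sample input in $data is invented for illustration.

```php
<?php
// Reduced variant of the pattern: quoted strings are matched first and then
// discarded with (*SKIP)(*FAIL), so only comments outside strings are replaced.
$pattern = <<<'EOD'
~
(?(DEFINE)
  (?<sq> ' (?>[^'\\]++|\\.)* ' )   # single-quoted string
  (?<dq> " (?>[^"\\]++|\\.)* " )   # double-quoted string
)
(?:\g<sq>|\g<dq>) (*SKIP)(*FAIL) | /\* .*? \*/
~xs
EOD;

// Sample input: two real comments and two comment-like string literals
$data = "/* a */echo \"keep /* this */\"; /* b\nc */ echo 'x /* y */';";
$result = preg_replace($pattern, '', $data);
echo $result;
```

Both real comments are removed, including the one that spans two lines, and the two string literals keep their contents.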

Related

PHP Preg Replace. Remove strings inside {~ string ~} pattern, but skip <pre>{~ string ~}</pre> [duplicate]

I am using a WordPress plugin named Acronyms (https://wordpress.org/plugins/acronyms/). This plugin replaces acronyms with their description. It uses PHP's preg_replace() function.
The issue is that it also replaces acronyms contained in a <pre> tag, which I use to present source code.
Could you modify this expression so that it won't replace acronyms contained inside <pre> tags (not only directly inside, but at any depth)? Is it possible?
The PHP code is:
$text = preg_replace(
"|(?!<[^<>]*?)(?<![?.&])\b$acronym\b(?!:)(?![^<>]*?>)|msU"
, "<acronym title=\"$fulltext\">$acronym</acronym>"
, $text
);
You can use a PCRE SKIP/FAIL regex trick (also works in PHP) to tell the regex engine to only match something if it is not inside some delimiters:
(?s)<pre[^<]*>.*?<\/pre>(*SKIP)(*F)|\b$acronym\b
This means: skip all substrings starting with <pre> and ending with </pre>, and only then match $acronym as a whole word.
See demo on regex101.com
Here is a sample PHP demo:
<?php
$acronym = "ASCII";
$fulltext = "American Standard Code for Information Interchange";
$re = "/(?s)<pre[^<]*>.*?<\\/pre>(*SKIP)(*F)|\\b$acronym\\b/";
$str = "<pre>ASCII\nSometext\nMoretext</pre>More text \nASCII\nMore text<pre>More\nlines\nASCII\nlines</pre>";
$subst = "<acronym title=\"$fulltext\">$acronym</acronym>";
$result = preg_replace($re, $subst, $str);
echo $result;
Output:
<pre>ASCII
Sometext
Moretext</pre>More text 
<acronym title="American Standard Code for Information Interchange">ASCII</acronym>
More text<pre>More
lines
ASCII
lines</pre>
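If an acronym can contain regex metacharacters or the pattern delimiter, interpolating it directly into the pattern breaks the regex. A minimal sketch of the same SKIP/FAIL pattern with preg_quote() applied first; the acronym "I/O" and its description are hypothetical values chosen for illustration.

```php
<?php
$acronym  = "I/O";           // hypothetical acronym that contains the delimiter
$fulltext = "Input/Output";  // hypothetical description

// Escape regex metacharacters and the '/' delimiter before interpolating
$quoted = preg_quote($acronym, '/');
$re = '/(?s)<pre[^<]*>.*?<\/pre>(*SKIP)(*F)|\b' . $quoted . '\b/';

$subst = '<acronym title="' . htmlspecialchars($fulltext) . '">'
       . htmlspecialchars($acronym) . '</acronym>';

$result = preg_replace($re, $subst, "<pre>I/O</pre> I/O bound");
echo $result;
```

Note that \b only works as intended when the acronym starts and ends with a word character, and that a replacement string containing $ or \ would need additional escaping.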
It is also possible to split the text with preg_split, keeping the tags as delimiters, apply the replacement only to the parts outside code blocks, and then join everything back into one string:
function replace($s) {
    return str_replace('"', '&quot;', $s); // do something with `$s`
}
$text = 'Your text goes here...';
$parts = preg_split('#(<\/?[-:\w]+(?:\s[^<>]+?)?>)#', $text, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
$text = "";
$x = 0; // 1 while inside a <code> or <pre> element
foreach ($parts as $v) {
    if (trim($v) === "") {
        $text .= $v;
        continue;
    }
    if ($v[0] === '<' && substr($v, -1) === '>') {
        if (preg_match('#^<(\/)?(?:code|pre)(?:\s[^<>]+?)?>$#', $v, $m)) {
            $x = isset($m[1]) && $m[1] === '/' ? 0 : 1;
        }
        $text .= $v; // this is a HTML tag…
    } else {
        $text .= !$x ? replace($v) : $v; // process or skip…
    }
}
echo $text;
Taken from here.

Matching string in 2 different patterns in PHP. The code must return TRUE

Objective: strings with ' should match the string without it.
Example:
$first_string = "alex ern o'brian";
$second_string = "alex-ern o brian";
$pattern = array("/(-|\.| )/", "/(')/");
$replace = array(' ', '(\s|)');
$first_string = preg_replace($pattern, $replace, $first_string);
$second_string = preg_replace($pattern, $replace, $second_string);
$first_string_split = preg_split("/(-|\.| )/", $first_string);
$first_string_split[] = $first_string;
$second_string_split = preg_split("/(-|\.| )/", $second_string);
$second_string_split[] = $second_string;
$first_string = array_slice($first_string_split, -1)[0];
$second_string = array_slice($second_string_split, -1)[0];
if(in_array($first_string, $second_string_split) || in_array($second_string, $first_string_split))
{
echo 'true';
} else {
echo 'false';
}
I think this is what you are expecting.
Solution 1:
Regex: (\s|) matches either a whitespace character or nothing.
<?php
ini_set('display_errors', 1);
$string = "o'brian";
$string=str_replace("'", "(\s|)",$string);
$list = array("o'neal", "o brian", "obrian");
$result=array();
foreach($list as $value)
{
if(preg_match("/$string/", $value))
{
$result[]=$value;
}
}
print_r($result);
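Solution 2:
Regex: [a-z]+ matches each run of lowercase letters, so the apostrophe acts as a separator.
$string = "o'brian"; // assumed input, as in Solution 1
preg_match_all("/[a-z]+/", $string, $matches); // $matches[0] is ["o", "brian"]
$string1 = "o brian";
$string2 = "obrian";
if (preg_match("/" . implode(" ", $matches[0]) . "/", $string1)) {
    echo "matched";
}
if (preg_match("/" . implode("", $matches[0]) . "/", $string2)) {
    echo "matched";
}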
I'm not sure if I got your question right, but this should do it:
(?<=\w)'(?=\w)
It matches every ' character that is both preceded and followed by a word character. The word character class \w is equivalent to [a-zA-Z0-9_].
Here is a live example to test the regex
Here is a live PHP example
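One way to use that lookaround pattern for the comparison in the question is to normalise both strings and then compare them. The sketch below is only an illustration: the helper normalize_name is hypothetical, and dropping all hyphens, dots and whitespace is an assumption about what should count as equal.

```php
<?php
// Hypothetical helper: reduces a name to a canonical form for comparison.
function normalize_name(string $name): string
{
    // Remove apostrophes that sit between two word characters: o'brian -> obrian
    $name = preg_replace("/(?<=\\w)'(?=\\w)/", '', $name);
    // Assumption: hyphens, dots and whitespace are interchangeable separators
    return preg_replace('/[-.\s]+/', '', strtolower($name));
}

$first_string  = "alex ern o'brian";
$second_string = "alex-ern o brian";

$same = normalize_name($first_string) === normalize_name($second_string);
echo $same ? 'true' : 'false';
```

Both strings reduce to the same canonical form, so the comparison succeeds without building a pattern from user input.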

Conditionally Replace a specific Character in a string

I am trying to remove the # sign from a block of text. The problem is that in certain cases (when it is at the beginning of a line), the # sign needs to stay.
I have partly succeeded with the regex pattern .\#; however, when the # sign is removed, the character preceding it is removed as well.
Goal: remove all # signs UNLESS the # sign is the first character in the line.
<?php
function cleanFile($text)
{
$pattern = '/.\#/';
$replacement = '%40';
$val = preg_replace($pattern, $replacement, $text);
$text = $val;
return $text;
};
$text = ' Test: test#test.com'."\n";
$text .= '#Test: Leave the leading at sign alone'."\n";
$text .= '#Test: test#test.com'."\n";
$valResult = cleanFile($text);
echo $valResult;
?>
Output:
Test: tes%40test.com
#Test: Leave the leading at sign alone
#Test: tes%40test.com
You can do this with a regex using a negative lookbehind: /(?<!^)#/m (a # sign not preceded by the start of a line, or by the start of the string if you omit the m modifier).
Regex 101 Demo
In code:
<?php
$string = "Test: test#test.com\n#Test: Leave the leading at sign alone\n#Test: test#test.com;";
$string = preg_replace("/(?<!^)#/m", "%40", $string);
var_dump($string);
?>
which outputs the following:
string(84) "Test: test%40test.com
#Test: Leave the leading at sign alone
#Test: test%40test.com;"
Codepad demo
There's no need for a regex in such a simple case.
function clean($source) {
$prefix = '';
$offset = 0;
if( $source[0] == '#' ) {
$prefix = '#';
$offset = 1;
}
return $prefix . str_replace('#', '', substr( $source, $offset ));
}
And a test case:
$test = array( '#foo#bar', 'foo#bar' );
foreach( $test as $src ) {
echo $src . ' => ' . clean($src) . "\n";
}
would give:
#foo#bar => #foobar
foo#bar => foobar
The syntax [^...] is a negated character class (it matches any single character that is not listed), but I don't think the following would work:
$pattern = '/[^]^#/';
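A negated character class can be made to work, but it consumes the character it matches, so that character has to be captured and written back. A minimal sketch comparing it with the lookbehind from the earlier answer; the variable names are made up for illustration.

```php
<?php
$text = " Test: test#test.com\n"
      . "#Test: Leave the leading at sign alone\n"
      . "#Test: test#test.com\n";

// Lookbehind: only asserts "not at a line start" and consumes just the '#'
$viaLookbehind = preg_replace('/(?<!^)#/m', '%40', $text);

// Negated class: consumes the preceding non-newline character and restores it
$viaCapture = preg_replace('/([^\n])#/', '${1}%40', $text);

var_dump($viaLookbehind === $viaCapture);

// The two differ on consecutive '#' signs, because the class consumes a '#'
$doubleLookbehind = preg_replace('/(?<!^)#/m', '%40', "a##b");
$doubleCapture    = preg_replace('/([^\n])#/', '${1}%40', "a##b");
```

The capture-group version also leaves a # at the very start of the string untouched, since no character precedes it.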

SyntaxHighlighter BBCode PHP

I'm having some problems with the BBCode I created for use with SyntaxHighlighter:
function bb_parse_code($str) {
while (preg_match_all('`\[(code)=?(.*?)\]([\s\S]*)\[/code\]`', $str, $matches)) foreach ($matches[0] as $key => $match) {
list($tag, $param, $innertext) = array($matches[1][$key], $matches[2][$key], $matches[3][$key]);
switch ($tag) {
case 'code': $replacement = '<pre class="brush: '.$param.'">'.str_replace(" ", " ", str_replace(array("<br>", "<br />"), "\n", $innertext))."</pre>"; break;
}
$str = str_replace($match, $replacement, $str);
}
return $str;
}
And I have the bbcode:
[b]bold[/b]
[u]underlined[/u]
[code=js]function (lol) {
alert(lol);
}[/code]
[b]bold2[/b]
[code=php]
<? echo 'lol' ?>
[/code]
Which returns this:
I know the problem is the ([\s\S]*) part of the regex, which allows any character, but how do I make the code work with line breaks?
You should use the following pattern:
`\[(code)=?(.*?)\](.*?)\[/code\]`s
A couple of changes:
Switching to .*? makes the quantifier lazy, so each match stops at the first [/code].
The s modifier at the end causes . to match newlines too.
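As a sketch of how the corrected pattern behaves, the following uses preg_replace_callback instead of the original while loop. That swap is a choice made for this example, and the function name bb_parse_code_sketch is hypothetical.

```php
<?php
// Sketch: lazy quantifier plus the s modifier, so each [code] block is
// matched separately even when it spans several lines.
function bb_parse_code_sketch(string $str): string
{
    return preg_replace_callback(
        '`\[code=?(.*?)\](.*?)\[/code\]`s',
        function (array $m): string {
            // $m[1] is the brush name, $m[2] the code between the tags
            $inner = str_replace(['<br>', '<br />'], "\n", $m[2]);
            return '<pre class="brush: ' . $m[1] . '">' . $inner . '</pre>';
        },
        $str
    );
}

$bbcode = "[code=js]function (lol) {\n  alert(lol);\n}[/code]\n"
        . "[b]bold2[/b]\n"
        . "[code=php]\n<? echo 'lol' ?>\n[/code]";
$html = bb_parse_code_sketch($bbcode);
echo $html;
```

With the greedy ([\s\S]*) the first match would run to the last [/code] and swallow the [b]bold2[/b] line; with the lazy version that line stays between the two <pre> elements.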
