When we create an HTML page, comments like
<!-- Comment 1 -->
or, inside PHP,
// Comment2
are visible to anyone who right-clicks the page and chooses "Show code" (view source).
How can I prevent that?
Hiding the comments inside the PHP tag is the answer. Thanks, everyone.
HTML comments will show up in the page source, but as long as you keep your PHP comments inside the <?php tag, they won't be sent to the user.
Leaving HTML comments in is normal. If you check amazon.com's code you will see plenty of HTML comments, but none from PHP or whatever server language they use. So don't worry about HTML comments; just don't put sensitive things in them, like your admin password or revealing database schema details.
If you still want to remove all the comments, even the HTML ones (in VS Code):
Easy way:
* Open extensions (Ctrl+Shift+X).
* Type "remove comments" in the search box.
* Install the top pick and read the instructions.
Hard way:
* Open search and replace (Ctrl+H).
* Toggle regex on (Alt+R).
* Learn some regular expressions! https://docs.rs/regex/0.2.5/regex/#syntax
A simple //.* will match all single-line comments (and more ;D).
#.* could be used to match Python comments. And /\*[\s\S]*?\*/
matches block comments (the ? makes the match stop at the first */ instead of the last one). And you can combine them as well:
//.*|/\*[\s\S]*?\*/ (| in regex means "or", . means any
character except a newline, * means "0 or more" and indicates how many characters to
match, therefore .* means all characters until the end of the line
(or until the next matching rule))
Of course there are caveats: URLs (https://...) have double
slashes and will match that first rule, and who knows where a
# appears in code that will match that Python rule. So some
reading/adjusting has to be done!
Once you start fiddling with your regexes it can take a lifetime to
get them perfect, so be careful and go the easy route if you are short
on time, but knowing some simple regex by heart will do you good,
since regular expressions are usable almost everywhere.
From https://stackoverflow.com/a/50575194/17239314
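The URL caveat above is easy to reproduce. The following is only a sketch in Python (not VS Code's own search engine), and the sample source text is made up for illustration:

```python
import re

# Naive rule from the answer above: line comments OR block comments.
NAIVE = re.compile(r'//.*|/\*[\s\S]*?\*/')

# Hypothetical sample input, invented for this example.
source = (
    'int a = 1; // counter\n'
    '/* block\n'
    '   comment */\n'
    'url = "https://example.com/page";\n'
)

stripped = NAIVE.sub('', source)
print(stripped)
# Both comments are gone, but so is everything after "https:",
# because the // inside the URL matched the first rule.
```

So before running a replace-all with a pattern like this, it is worth searching first and checking what it would hit.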
Related
I've got a database with a lot of user-made entries that has grown over about 10 years. The users had the option to put HTML code in their content, and some didn't do that well, so I have a lot of content where the quotes are missing. I need valid HTML for an export/import via XML.
I tried a replacement, but my regex doesn't work. Do you have an idea where my mistake is?
$out=preg_replace("/<a href=h(.)*>/","<a href=\"h$1\">",$out);
PS: If you have an idea how to automatically correct broken HTML source, that would be a great alternative.
I think you wanted to use "/<a href=h(.*)>/" (mind the star inside the parentheses) since you want to capture all characters after the h and before the > inside the capture group.
You can also use <a href=([^"].*)> since the href may not start with h. This regex captures all href values that do not start with ".
Yet, all of these assume that the href is the last attribute in your a tag, i.e., that it ends with >.
As a more general rule, I came up with (?<key>\w*)\s*=\s*(?<value>[^"][^\s>]*) that finds attribute-value pairs, separated by =. The values may not start with ", and they go until the next whitespace or >. Use this with caution, since it may fail in several circumstances: multi-line HTML, inline JavaScript, etc.
Whether it is a good idea to use RegEx for such a task is a different discussion.
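For what it's worth, here is a rough sketch of the attribute-quoting idea in Python rather than PHP. The names quote_attributes, TAG and ATTR are made up for this example. It differs from the pattern above in one respect: already-quoted values are matched first and left alone, so that an = inside a quoted value is not touched. It still assumes no > appears inside an attribute value.

```python
import re

# A tag is anything between < and > (assumption: no > inside attribute values).
TAG = re.compile(r'<[^<>]+>')
# key = "quoted" | 'quoted' | unquoted (only the unquoted form is captured in group 2).
ATTR = re.compile(r'''([\w-]+)\s*=\s*(?:"[^"]*"|'[^']*'|([^\s"'>]+))''')

def quote_attributes(html):
    """Wrap unquoted attribute values in double quotes, inside tags only."""
    def fix_attr(m):
        if m.group(2) is None:          # value was already quoted
            return m.group(0)
        return f'{m.group(1)}="{m.group(2)}"'
    return TAG.sub(lambda tag: ATTR.sub(fix_attr, tag.group(0)), html)

print(quote_attributes('<a href=http://example.com/x target=_blank>1=2</a>'))
# <a href="http://example.com/x" target="_blank">1=2</a>
```

Text outside of tags (the 1=2 above) is left alone because the attribute pattern is only applied to what the tag pattern matched. A real HTML repair tool is still the safer route for messy input.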
I'm trying to make a BBCode-ish engine for my website. The thing is, it is not clear which codes are available, because the codes are made by the users. On top of that, the whole thing has to be recursive.
For example:
Hello my name is [name user-id="1"]
I [bold]really[/bold] like cheeseburgers
These are the easy ones, and I managed to make them work.
Now the problem is what happens when two of those codes follow each other:
I [bold]really[/bold] like [bold]cheeseburgers[/bold]
Or inside each other
I [bold]really like [italic]cheeseburgers[/italic][/bold]
These codes can also have attributes
I [bold strengh="600"]really like [text font-size="24px"]cheeseburgers[/text][/bold]
The following one worked quite well, but lacks the recursive part (?R)
(?P<code>\[(?P<code_open>\w+)\s?(?P<attributes>[a-zA-Z-0-1-_=" .]*?)](?:(?P<content>.*?)\[\/(?P<code_close>\w+)\])?)
I just don't know where to put the (?R) recursion token.
Also the system has to know that in this string here
I [bold]really like [italic]cheeseburgers[/italic][/bold] and [bold]football[/bold]
are 2 "code-objects":
1. [bold]really like [italic]cheeseburgers[/italic][/bold]
and
2. [bold]football[/bold]
... and the content of the first one is
really like [italic]cheeseburgers[/italic]
which again has a code in it
[italic]cheeseburgers[/italic]
which content is
cheeseburgers
I have searched the web for two days now and I can't figure it out.
I thought of something like this:
Look for something like [**** attr="foo"], where the attributes are optional, and store it in a capturing group
Look up whether there is a closing tag somewhere (it can be optional too)
If a closing tag exists, everything between the two tags should be stored as a "content" capturing group, which then has to go through the same procedure again.
I hope there are some regex specialists who are willing to help me. :(
Thank you!
EDIT
As this might be difficult to understand, here is an input and an expected output:
Input:
[heading icon="rocket"]I'm a cool heading[/heading][textrow][text]<p>Hi!</p>[/text][/textrow]
I'd like to have an array like
array[0][name] = heading
array[0][attributes][icon] = rocket
array[0][content] = I'm a cool heading
array[1][name] = textrow
array[1][content] = [text]<p>Hi!</p>[/text]
array[1][0][name] = text
array[1][0][content] = <p>Hi!</p>
Having written multiple BBCode parsing systems, I can suggest NOT using regexes only. Instead, you should actually parse the text.
How you do this is up to you, but as a general idea you would want to use something like strpos to locate the first [ in your string, then check what comes after it to see if it looks like a BBCode tag and process it if so. Then, search for [ again starting from where you ended up.
This has certain advantages, such as being able to examine each code and skip it if it's invalid, as well as enforcing proper tag closing order ([bold][italic]Nesting![/bold][/italic] should be considered invalid) and being able to provide meaningful error messages to the user if something is wrong (invalid parameter, perhaps) because the parser knows exactly what is going on, whereas a regex would output something unexpected and potentially harmful.
It might be more work (or less, depending on your skill with regex), but it's worth it.
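To make the "actually parse it" idea concrete, here is a minimal sketch in Python (the thread is about PHP, so treat it as an illustration of the approach, not a drop-in). It uses a small regex only to recognise a single tag and a stack to track nesting. parse_bbcode and the node layout are made up for this example, and it assumes every tag is closed and attribute values are double-quoted.

```python
import re

# Recognises ONE tag: [name attr="v" ...] or [/name]. Nesting is handled by the stack.
TAG = re.compile(r'\[(/?)(\w+)((?:\s+[\w-]+="[^"]*")*)\s*\]')
ATTR = re.compile(r'([\w-]+)="([^"]*)"')

def parse_bbcode(text):
    """Return a list of nodes: plain strings, or dicts with name/attributes/children."""
    root = {"name": None, "attributes": {}, "children": []}
    stack = [root]
    pos = 0
    for m in TAG.finditer(text):
        if m.start() > pos:                      # plain text before this tag
            stack[-1]["children"].append(text[pos:m.start()])
        closing, name, attrs = m.group(1), m.group(2), m.group(3)
        if closing:
            # Enforce closing order: only the innermost open tag may be closed.
            if len(stack) == 1 or stack[-1]["name"] != name:
                raise ValueError(f"unexpected [/{name}] at offset {m.start()}")
            stack.pop()
        else:
            node = {"name": name, "attributes": dict(ATTR.findall(attrs)), "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        pos = m.end()
    if pos < len(text):                          # trailing plain text
        stack[-1]["children"].append(text[pos:])
    if len(stack) > 1:
        raise ValueError(f"unclosed [{stack[-1]['name']}]")
    return root["children"]

tree = parse_bbcode('[heading icon="rocket"]I\'m a cool heading[/heading]'
                    '[textrow][text]<p>Hi!</p>[/text][/textrow]')
print(tree[0]["name"], tree[0]["attributes"])    # heading {'icon': 'rocket'}
```

Because the parser knows which tag is currently open, input like [bold][italic]Nesting![/bold][/italic] raises an error with the offset of the offending tag instead of producing a surprising match. Supporting tags without a closing partner, such as [name user-id="1"], would need a list of known self-closing names, which this sketch leaves out.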
I want to match a PHP regex string.
From what I know, they are always in the format (correct me if I am wrong):
/ One opening forward slash
the expression Any regular expression
/ One closing forward slash
[imsxe] Any number of the modifiers NOT REPEATING
My expression for this was:
^/.+/[imsxe]{0,5}$
Written as a PHP string, (with the open/close forward slash and escaped inner forward slashes) it is this:
$regex = '/^\/.+\/[imsxe]{0,5}$/';
which is:
^ From the beginning
/ Literal forward slash
.+ Any character, one or more
/ Literal forward slash
[imsxe]{0,5} Any of the chars i,m,s,x,e, 0-5 times (only 5 to choose from)
$ Until the end
This works, however it allows repeating modifiers, i.e:
This: ^/.+/[imsxe]{0,5}$
Allows this: '/blah/ii'
Allows this: '/blah/eee'
Allows this: '/blah/eise'
etc...
When it should not.
I personally use RegexPal to test, because it's free and simple.
If (in order to help me) you would like to do the same, visit http://regexpal.com and paste my expression in the top text box
^/.+/[imsxe]{0,5}$
Then paste my tests in the bottom textbox
/^[0-9]+$/i
/^[0-9]+$/m
/^[0-9]+$/s
/^[0-9]+$/x
/^[0-9]+$/e
/^[0-9]+$/ii
/^[0-9]+$/mm
/^[0-9]+$/ss
/^[0-9]+$/xx
/^[0-9]+$/ee
/^[0-9]+$/iei
/^[0-9]+$/mim
/^[0-9]+$/sis
/^[0-9]+$/xix
/^[0-9]+$/eie
Make sure you tick the second checkbox at the top, where it says '^$ match at line breaks (m)', to enable multi-line testing.
Thanks for the help
Edit
After reading comments about Regex often having different delimiters i.e
/[0-9]+/ == #[0-9]+#
This is not a problem and can be factored in to my regex solution.
All I really need to know is how to prevent duplicate characters!
Edit
This bit isn't so important but it provides context
The need for such a feature is simple...
I'm using jQuery UI MultiSelect Widget written by Eric Hynds.
Simple demo found here
Now, in my application, I'm extending the plugin so that certain options pop up a little menu on the right when hovered. The menu that pops up can be ANY HTML element.
I wanted multiple options to be able to show the same element. So my API works like this:
$('#select_element_id')
// Erics MultiSelect API
.multiselect({
// MultiSelect options
})
// My API
.multiselect_side_pane({
menus: [
{
// This means, when an option with value 'MENU_1' is hovered,
// the element '#my_menu_1' will be shown. This makes attaching
// menus to options REALLY SIMPLE
menu_element: $('#my_menu_1'),
targets: ['MENU_1']
},
// However, lets say we have option value 'USER_ID_132', I need the
// target name to be dynamic. What better way to be dynamic than regex?
{
menu_element: $('#user_details_box'),
targets: ['USER_FORM', '/^USER_ID_[0-9]+$/'],
onOpen: function(target)
{
// here the TARGET can be interrogated, and the correct
// user info can be displayed
// Target will be 'USER_FORM' or 'USER_ID_3' or 'USER_ID_234'
// so if it is USER_FORM I can clear the form ready for input,
// and if the target starts with 'USER_ID_', I can parse out
// the user id, and display the correct user info!
}
}
]
});
So as you can see, the whole reason I need to know whether a string is a regex is so that, in the widget code, I can decide whether to treat the TARGET as a string (i.e. 'USER_FORM') or as an expression (i.e. '/^USER_ID_[0-9]+$/' for 'USER_ID_234').
Unfortunately, the regexp string can be "anything". The forward slashes you talk about can be many other characters, e.g. a hash (#) will also work.
Secondly, matching up to 5 characters without repeats could probably be done with lookahead / lookbehind etc., but that will create such a complex regexp that it's faster to post-process it.
It is possibly faster to search the code for the regular expression functions (preg_match, preg_replace etc.) to deduce where regular expressions are used.
$var = '#placeholder#';
Is a valid regular expression in PHP, but doesn't have to be one, where:
const ESCAPECHAR = '#';
$var = 'text';
$regexp = ESCAPECHAR . $var . ESCAPECHAR;
Is also valid, but might not be seen as such.
To prevent duplicates in the modifier section, I'd use a negative lookahead with a backreference, which rejects any modifier that appears twice:
^/.+/(?![imsxe]*([imsxe])[imsxe]*\1)[imsxe]*$
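One way to reject repeated modifiers is a negative lookahead containing a backreference: capture any one modifier and fail if the same letter shows up again later. Below is a sketch of that check in Python (the question is about PHP/JavaScript strings, so this is only an illustration). looks_like_regex is a hypothetical helper name, and only / is handled as a delimiter.

```python
import re

# /body/ followed by modifiers from [imsxe], none of which may repeat.
# (?! ... ([imsxe]) ... \1 ) fails as soon as some modifier occurs twice.
PHP_REGEX = re.compile(r'^/.+/(?![imsxe]*([imsxe])[imsxe]*\1)[imsxe]*$')

def looks_like_regex(s):
    """True if s has the /.../modifiers shape with no repeated modifier."""
    return PHP_REGEX.match(s) is not None

for candidate in ['/^[0-9]+$/i', '/^[0-9]+$/ii', '/^[0-9]+$/eie', 'USER_FORM']:
    print(candidate, looks_like_regex(candidate))
```

The same pattern can be pasted into a tester with the sample lines from the question, with the multi-line option enabled, to see which lines survive.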
I need to match all three types of comments that PHP might have:
# Single line comment
// Single line comment
/* Multi-line comments */
/**
* And all of its possible variations
*/
Something I should mention: I am doing this in order to recognize whether a PHP closing tag (?>) is inside a comment or not. If it is, ignore it; if not, count it as a closing tag. This is going to be used inside an XML document in order to improve Sublime Text's recognition of the closing tag (because it's driving me nuts!). I tried to achieve this for a couple of hours, but I wasn't able to. How can I translate it to work with XML?
So if you could also include the if-then-else logic I would really appreciate it. BTW, I really need it to be a pure regular expression, no language features or anything. :)
Like Eicon reminded me, I need all of them to be able to match at the start of the line, or at the end of a piece of code, so I also need the following with all of them:
<?php
echo 'something'; # this is a comment
?>
Parsing a programming language seems too much for regexes to do. You should probably look for a PHP parser.
But these would be the regexes you are looking for. I assume for all of them that you use the DOTALL or SINGLELINE option (although the first two would work without it as well):
~#[^\r\n]*~
~//[^\r\n]*~
~/\*.*?\*/~s
Note that any of these will cause problems, if the comment-delimiting characters appear in a string or somewhere else, where they do not actually open a comment.
You can also combine all of these into one regex:
~(?:#|//)[^\r\n]*|/\*.*?\*/~s
If you use some tool or language that does not require delimiters (like Java or C#), remove those ~. In this case you will also have to apply the DOTALL option differently. But without knowing where you are going to use this, I cannot tell you how.
If you cannot/do not want to set the DOTALL option, this would be equivalent (I also left out the delimiters to give an example):
(?:#|//)[^\r\n]*|/\*[\s\S]*?\*/
See here for a working demo.
Now if you also want to capture the contents of the comments in a group, then you could do this
(?|(?:#|//)([^\r\n]*)|/\*([\s\S]*?)\*/)
Regardless of the type of comment, the comments content (without the syntax delimiters) will be found in capture 1.
Another working demo.
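The branch-reset group (?|...) is a PCRE feature, and not every engine has it. As a sketch of the same idea for such engines, here in Python, you can use two ordinary groups and take whichever one matched. comment_bodies is a made-up helper name, and the caveat about comment markers inside strings still applies.

```python
import re

# Group 1 holds the body of a # or // comment, group 2 the body of a /* */ comment.
COMMENT = re.compile(r'(?:#|//)([^\r\n]*)|/\*([\s\S]*?)\*/')

def comment_bodies(source):
    """Return the text of every comment, without the comment delimiters."""
    bodies = []
    for m in COMMENT.finditer(source):
        bodies.append(m.group(1) if m.group(1) is not None else m.group(2))
    return bodies

php = """<?php
echo 'something'; # this is a comment
// another one
/* Multi-line
   comment */
?>"""
print(comment_bodies(php))
```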
Single-line comments
singleLineComment = /'[^']*'|"[^"]*"|((?:#|\/\/).*$)/gm
With this regex you have to replace (or remove) everything that was captured by ((?:#|\/\/).*$). This regex will ignore contents of strings that would look like comments (e.g. $x = "You are the #1"; or $y = "You can start comments with // or # in PHP, but I'm a code string";)
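The trick in that pattern is that string literals are matched first and thrown back unchanged, so only a "real" comment ever lands in the capture group. A rough Python version of the replace step follows. strip_line_comments is a hypothetical name, and like the original pattern it does not handle escaped quotes inside strings.

```python
import re

# Alternatives 1 and 2 swallow string literals; only alternative 3 is captured.
PATTERN = re.compile(r"""'[^']*'|"[^"]*"|((?:#|//).*$)""", re.MULTILINE)

def strip_line_comments(code):
    """Remove # and // comments, keeping comment-like text inside strings."""
    return PATTERN.sub(lambda m: "" if m.group(1) is not None else m.group(0), code)

code = '$x = "You are the #1"; // real comment\n$y = \'a // b\'; # another\n'
print(strip_line_comments(code))
```

The string contents survive and only the trailing comments are dropped.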
Multiline comments
multilineComment = /^\s*\/\*\*?[^!][.\s\t\S\n\r]*?\*\//gm
I've created a simple template 'engine' in PHP to substitute PHP-generated data into the HTML page. Here's how it works:
In my main template file, I have variables like so:
<title><!-- %{title}% --></title>
I then assign data into those variables for the main page load
$assign = array (
'title' => 'my website - '
);
I then have separate template blocks that get loaded for the content pages. The above really just handles the header and the footer. In one of these 'content template files', I have variables like so:
<!-- %{title=content page}% -->
Once this gets executed, the main template data is edited to include the content page variables resulting in:
<title>my website - content page</title>
It does this with the following code:
if (preg_match('/<!-- %{title=\s*(.*?)}% -->/s', $string, $matches)) {
// Find variable names in the form of %{varName=new data to append}%
// If found, remove the tag and append that new data to the existing data
$string = preg_replace('/<!-- %{title=\s*(.*?)}% -->/s', '', $string);
$varData[$i] .= $matches[1];
}
This basically removes the template variables and then assigns the variable data to the existing variable. Now, this all works fine. What I'm having issues with is nesting template variables. If I do something like:
<!-- %{title=content page (author: <!-- %{name}% -->) -->
The pattern, at times, messes up the opening and closing tags of each variable.
How can I fix my regular expression to prevent this?
Thank you.
The answer is you don't do this with regex. Regular expressions describe regular languages. When you start nesting things, it is no longer a regular language. It is, at a minimum, a context-free language ("CFL"). CFLs can only be processed (assuming they're unambiguous) with a stack.
Specifically, regular languages can be represented with a finite state machine ("FSM"). CFLs require a pushdown automaton ("PDA").
An example of the difference is nested tags in HTML:
<div>
<div>inner</div>
</div>
My advice is don't write your own template language. That's been done. Many times. Use Smarty or something in Zend, Kohana or whatever. If you do write your own, do it properly. Parse it.
Why are you rolling your own template engine? If you want this kind of complexity, there's a lot of places that have already come up with solutions for it. You should just plug in Smarty or something like that.
If you're asking what I think you're asking, it's literally impossible. If I read your question correctly, you want to match arbitrarily-nested <!-- ... --> sequences with particular things inside. Unfortunately, regular expressions can only match certain classes of strings; any regular expression can match only a regular language. One well-known language which is not regular is the language of balanced parentheses (also known as the Dyck language), which is exactly what you're trying to match. In order to match arbitrarily-nested comment strings, you need a more powerful tool. I'm fairly sure there are pre-existing PHP template engines; you might look into one of those.
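To illustrate what "a more powerful tool" means here: checking that <!-- and --> are balanced needs a counter (or a stack), which is state that a plain regular expression cannot keep. A minimal Python sketch, with max_comment_depth as a made-up name:

```python
import re

# The regex only finds the delimiters; the counter does the balancing.
TOKEN = re.compile(r'<!--|-->')

def max_comment_depth(text):
    """Return the deepest nesting level of <!-- ... -->, or raise if unbalanced."""
    depth = deepest = 0
    for m in TOKEN.finditer(text):
        if m.group(0) == '<!--':
            depth += 1
            deepest = max(deepest, depth)
        else:
            depth -= 1
            if depth < 0:
                raise ValueError(f"stray --> at offset {m.start()}")
    if depth:
        raise ValueError("unclosed <!--")
    return deepest

print(max_comment_depth('<!-- %{title=page (author: <!-- %{name}% -->)}% -->'))  # 2
```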
To resolve your problem you should
replace preg_match() with preg_match_all();
find the pattern, and replace them from the last one to the first one;
use a more restrictive pattern like '/<!-- %{title=\s*([^}]*?)}% -->/s'.
I've done something similar in the past, and I have encountered the same nesting issue you did. In your case, what I would do is repeatedly search your text for matches (rather than searching once and looping through the matches) and extract the strings you want by searching for anything that doesn't include your closing string.
In your case, it would probably look like this:
/(<!--([^(-->)]*?)-->)/
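Regexes like this are a nightmare to explain, but basically, ([^(-->)]*?) is meant to find any string that doesn't include your closing tag (let's call that AAA). It will be inside a matching group that is, itself, your template tag, (<!--AAA-->). Be aware that [^(-->)] is a character class, so strictly speaking it rejects the individual characters ( ) * + , - and > rather than the sequence -->; a negative lookahead such as (?:(?!-->).)* expresses the intent more precisely.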
I'm convinced this sort of templating method is the wrong way to do things, but I've never known enough to do it better. It's always bothered me in ASP and ColdFusion that you had to nest your scripting tags inside HTML and when I started to do it myself, I considered it a personal failure.
Most Regexes I do now are in JavaScript and so I may be missing some of the awesome nuances PHP has via Perl. I'd be happy if someone can write this more cleanly.
I too have run into this problem in the past, although I didn't use regular expressions.
If instead you search from right to left for the opening tag, <!-- %{ in your syntax, using strrpos (PHP5+), then search forwards for the first occurrence of the next closing tag, and then replace that chunk first, you will end up replacing the inner-most nested variables first. This should resolve your problem.
You can also do it the other way around and find the first occurrence of a closing tag, and work backwards to find its corresponding opening tag.
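The right-to-left search can be sketched like this, in Python instead of PHP: rfind plays the role of strrpos and find the role of strpos. replace_innermost_first, resolve, values and appended are all hypothetical names, and the sample assumes the outer variable is closed with }% --> like the others.

```python
OPEN, CLOSE = "<!-- %{", "}% -->"

def replace_innermost_first(text, resolve):
    """Repeatedly replace the LAST opening tag and its nearest closing tag."""
    while True:
        start = text.rfind(OPEN)                 # last opener = innermost variable
        if start == -1:
            return text
        end = text.find(CLOSE, start + len(OPEN))
        if end == -1:
            raise ValueError("unclosed template variable")
        body = text[start + len(OPEN):end]
        # Note: resolve() must not return text containing OPEN, or this never ends.
        text = text[:start] + resolve(body) + text[end + len(CLOSE):]

values = {"name": "Jane"}      # made-up data for the example
appended = {}                  # collects %{key=extra}% assignments

def resolve(body):
    if "=" in body:
        key, _, extra = body.partition("=")
        appended[key.strip()] = appended.get(key.strip(), "") + extra.strip()
        return ""
    return values.get(body.strip(), "")

page = "<p>x</p><!-- %{title=content page (author: <!-- %{name}% -->)}% -->"
print(replace_innermost_first(page, resolve))    # <p>x</p>
print(appended)
```

Because the inner variable is always resolved before the outer one is even looked at, the opening and closing tags can no longer be paired up wrongly.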