This is probably a very simple question, but I just can't figure it out. I want to define a string containing two forward slashes
$htmlcode="text//text";
From what I understand what follows after // are comments.
Question: How do create a string containing //?
Parsing of the language is a little bit tricky. Within a string literal, comments and other language features are not triggered (except for special characters which need to be escaped). Also, within block comments, line-comments are not valued.
$example1 = 'hello /* this is not a comment */ '; /* but this is */
$example2 = 'hello // this is not a comment '; //but this is
$example3 = "works the same with double quotes /* not a comment */ //not a comment ";
/* comment example
$thisIsAComment
//this does not escape the closing */
$htmlcode="text//text"; //this is comment.
Your string is already defined as you want it to be.
Check out docs: http://php.net/manual/en/language.basic-syntax.comments.php
You should use some IDE or syntax highlighter, you will understand code more clearly.
Notepad++ is free and lightweight http://notepad-plus-plus.org/download
Your code is fine.
I would highly suggest reading the Basic Syntax - Comments guide to get a better understanding.
Related
I need to remove the comment lines from my code.
preg_replace('!//(.*)!', '', $test);
It works fine. But it removes the website url also and left the url like http:
So to avoid this I put the same like preg_replace('![^:]//(.*)!', '', $test);
It's work fine. But the problem is if my code has the line like below
$code = 'something';// comment here
It will replace the comment line with the semicolon. that is after replace my above code would be
$code = 'something'
So it generates error.
I just need to delete the single line comments and the url should remain same.
Please help. Thanks in advance
try this
preg_replace('#(?<!http:)//.*#','',$test);
also read more about PCRE assertions http://cz.php.net/manual/en/regexp.reference.assertions.php
If you want to parse a PHP file, and manipulate the PHP code it contains, the best solution (even if a bit difficult) is to use the Tokenizer : it exists to allow manipulation of PHP code.
Working with regular expressions for such a thing is a bad idea...
For instance, you thought about http:// ; but what about strings that contain // ?
Like this one, for example :
$str = "this is // a test";
This can get complicated fast. There are more uses for // in strings. If you are parsing PHP code, I highly suggest you take a look at the PHP tokenizer. It's specifically designed to parse PHP code.
Question: Why are you trying to strip comments in the first place?
Edit: I see now you are trying to parse JavaScript, not PHP. So, why not use a javascript minifier instead? It will strip comments, whitespace and do a lot more to make your file as small as possible.
I have a string of a comment that looks like this:
**#First Last** rest of comment info.
What I need to do is replace the ** with <b> and </b> so that the name is bold.
I currently have this:
$result = preg_replace('/\*\*/i', '<b>$0</b>', $comment);
But that results in:
"<b>**</b>#First Last<b>**</b> rest of comment info"
Which is obviously not what I need. Does anyone have any suggestions on how to achieve this?
A better idea would be to use a markdown parser library which does this for you.
Improper use of the regular expressions can lead to security vulnerabilities with unclosed tags etc..
However, a simple approach could be this:
$result = preg_replace('/\*\*(.+)\*\*/sU', '<b>$1</b>', $comment);
The U means ungreedy and takes the first possible match.
The s modifies . to be any character, including newline.
(see modifiers page)
This also works with an example text like this:
**#First Last** rest of comment info **also bold**.
The next line also **has a bold example**. This spans **two
lines**.
I am sure there are counter examples which are broken with this. Again, please use a proper markdown library to do this for you (or copy their implementation if the LICENSE allows it).
I'm new to PHP and don't understand what the point of <<<_END is. Could someone please explain when this should be used? I've looked at various examples and they all seem to have HTML embedded within them. But I can use HTML without the <<<_END tags, so why should I use them? I tried searching the manual, but I keep finding the end() method for arrays.
It's the start of a heredoc. you can do:
$data = <<< _END
You can write anything you want in between the start and end
_END;
_END can be just about anything. You could put EOF or STUFF. as long as you use the same thing at the start and the finish.
This signifies the beginning of a heredoc (a multi-line string that allows you to use quotation marks in the middle, unescaped) that ends when you encounter the _END
It can be useful to define HTML in one of these if the goal is to assign it to a variable or pass it to a function rather than printing it to the web server immediately.
That syntax is called heredoc
<<<_END
some text
_END
Basically, it's a way of writing a string without worrying about escaping quotes and so on.
As you've mentioned, it doesn't really provide a lot of benefit over other string formats - although, it does mean you can write a block of HTML without escaping out of PHP with ?>
It also isn't too popular as its use generally goes against the practice of seperating content from logic by embedding the content in the middle of your script.
Does this help? http://www.php.net/manual/en/language.types.string.php#language.types.string.syntax.heredoc
It allows you to echo out a block of text (just the same as with echo "words";), but without using the beginning/ending quotes, and without having to escape contained double quotes. Read the manual link above for more detail.
It's a heredoc. It's just a way of defining a string.
Well if I comment something it's skipped in all languages, but how are they skipped and what is readed?
Example:
// This is commented out
Now does PHP reads the whole comment to go to next lines or just reads the //?
The script is parsed and split into tokens.
You can actually try this out yourself on any valid PHP source code using token_get_all(), it uses PHP's native tokenizer.
The example from the manual shows how a comment is dealt with:
<?php
$tokens = token_get_all('<?php echo; ?>'); /* => array(
array(T_OPEN_TAG, '<?php'),
array(T_ECHO, 'echo'),
';',
array(T_CLOSE_TAG, '?>') ); */
/* Note in the following example that the string is parsed as T_INLINE_HTML
rather than the otherwise expected T_COMMENT (T_ML_COMMENT in PHP <5).
This is because no open/close tags were used in the "code" provided.
This would be equivalent to putting a comment outside of <?php ?>
tags in a normal file. */
$tokens = token_get_all('/* comment */');
// => array(array(T_INLINE_HTML, '/* comment */'));
?>
There is a tokenization phase while compiling. During this phase, it see the // and then just ignores everything to the end of the line. Compilers CAN get complicated, but for the most part are pretty straight forward.
http://compilers.iecc.com/crenshaw/
Your question doesn't make sense. Having read the '//', it then has to keep reading to the newline to find it. There's no choice about this. There is no other way to find the newline.
Conceptually, compiling has several phases that are logically prior to parsing:
Scanning.
Screening.
Tokenization.
(1) basically means reading the file character by character from left to right.
(2) means throwing things away of no interest, e.g. collapsing multiple newline/whitespace sequences to a single space.
(3) means combining what's left into tokens, e.g. identifiers, keywords, literals, punctuation.
Comments are screened out during (2). In modern compilers this is all done at once by a deterministic automaton.
I am writing a comment-stripper and trying to accommodate for all needs here. I have the below stack of code which removes pretty much all comments, but it actually goes too far. A lot of time was spent trying and testing and researching the regex patterns to match, but I don't claim that they are the best at each.
My problem is that I also have situation where I have 'PHP comments' (that aren't really comments' in standard code, or even in PHP strings, that I don't actually want to have removed.
Example:
<?php $Var = "Blah blah //this must not comment"; // this must comment. ?>
What ends up happening is that it strips out religiously, which is fine, but it leaves certain problems:
<?php $Var = "Blah blah ?>
Also:
will also cause problems, as the comment removes the rest of the line, including the ending ?>
See the problem? So this is what I need...
Comment characters within '' or "" need to be ignored
PHP Comments on the same line, that use double-slashes, should remove perhaps only the comment itself, or should remove the entire php codeblock.
Here's the patterns I use at the moment, feel free to tell me if there's improvement I can make in my existing patterns? :)
$CompressedData = $OriginalData;
$CompressedData = preg_replace('!/\*.*?\*/!s', '', $CompressedData); // removes /* comments */
$CompressedData = preg_replace('!//.*?\n!', '', $CompressedData); // removes //comments
$CompressedData = preg_replace('!#.*?\n!', '', $CompressedData); // removes # comments
$CompressedData = preg_replace('/<!--(.*?)-->/', '', $CompressedData); // removes HTML comments
Any help that you can give me would be greatly appreciated! :)
If you want to parse PHP, you can use token_get_all to get the tokens of a given PHP code. Then you just need to iterate the tokens, remove the comment tokens and put the rest back together.
But you would need a separate procedure for the HTML comments, preferably a real parser too (like DOMDocument provides with DOMDocument::loadHTML).
You should first think carefully whether you actually want to do this. Though what you're doing may seem simple, in the worst case scenario, it becomes extremely complex problem (to solve with just few regular expressions). Let me just illustrate just of the few problems you would be facing when trying to strip both HTML and PHP comments from a file.
You can't straight out strip HTML comments, because you may have PHP inside the HTML comments, like:
<!-- HTML comment <?php echo 'Actual PHP'; ?> -->
You can't just simply separately deal with stuff inside the <?php and ?> tags either, since the ending thag ?> can be inside strings or even comments, like:
<?php /* ?> This is still a PHP comment <?php */ ?>
Let's not forget, that ?> actually ends the PHP, if it's preceded by one line comment. For example:
<?php // ?> This is not a PHP comment <?php ?>
Of course, like you already illustrated, there will be plenty of problems with comment indicators inside strings. Parsing out strings to ignore them isn't that simple either, since you have to remember that quotes can be escaped. Like:
<?php
$foo = ' /* // None of these start a comment ';
$bar = ' \' // Remember escaped quotes ';
$orz = " ' \" \' /* // Still not a comment ";
?>
Parsing order will also cause you headache. You can't just simply choose to parse either the one line comments first or the multi line comments first. They both have to be parsed at the same time (i.e. in the order they appear in the document). Otherwise you may end up with broken code. Let me illustrate:
<?php
/* // Multiline comment */
// /* Single Line comment
$omg = 'This is not in a comment */';
?>
If you parse multi line comments first, the second /* will eat up part of the string destroying the code. If you parse the single line comments first, you will end up eating the first */, which will also destroy the code.
As you can see, there are many complex scenarios you'd have to account, if you intend to solve your problem with regular expression. The only correct solution is to use some sort of PHP parser, like token_get_all(), to tokenize the entire source code and strip the comment tokens and rebuild the file. Which, I'm afraid, isn't entirely simple either. It also won't help with HTML comments, since the HTML is left untouched. You can't use XML parsers to get the HTML comments either, because the HTML is rarely well formed with PHP.
To put it short, the idea of what you're doing is simple, but the actual implementation is much harder than it seems. Thus, I would recommend trying to avoid doing this, unless you have a very good reason to do it.
One way to do this in REGEX is to use one compound expression and preg_replace_callback.
I was going to post a poor example but the best place to look is at the source code to the PHP port of Dean Edwards' JS packer script - you should see the general idea.
http://joliclic.free.fr/php/javascript-packer/en/
try this
private function removeComments( $content ){
$content = preg_replace( "!/\*.*?\*/!s" , '', $content );
$content = preg_replace( "/\n\s*\n/" , "\n", $content );
$content = preg_replace( '#^\s*//.+$#m' , "", $content );
$content = preg_replace( '![\s\t]//.*?\n!' , "\n", $content );
$content = preg_replace( '/<\!--.*-->/' , "\n", $content );
return $content;
}