PHP - Best way to get meta data from comments

PHP - Best way to get meta data from comments - php

PHP has a function called get_meta_tags which can read meta tags of HTML files. However, as far as I know there is no standard way to define meta tags for PHP files. The de facto solution seems to be to add comment to the top of the file like so:
<?php
# Author: Ood
# Description: Hello World
?>
Is there any way to read these "meta tags" with PHP similar to the way get_meta_tags works using the default PHP library? Preferably without parsing the entire file with file_get_contents followed by a regex for best performance. If not, maybe someone knows of a better solution to add meta data capabilities to PHP files. Thanks in advance!

In our project we are happy with the standard JavaDoc that was adopted by PHPDoc using the #field syntax as you might know it from any PHP function or class definition. This is pretty fine readable using the PHPDocumenter.
In our adoption we use the very first multi-line comment, ie anyting between /** and closing tag */, using the JavaDoc style to describe all the details about the current script.
So to adopt your example in our project we would have following syntax:
<?php
/**
* #author Ood
* #desc Hello World
*/
Of course you may end up with your custom function reading the beginning of the php file parsing just the very first multi-line comment to get the script description aka meta tags.

Related

How to convert HTML into XHTML [duplicate]

This question already has an answer here:
Closed 10 years ago.
Possible Duplicate:
PHP library for converting HTML4 to XHTML?
Is there any ready made function in PHP to achieve this? Basically I'm taking HTML data from Smarty template and want to convert it into XHTML through coding.

$filename = 'template.php'; // filepath to file
// All options : http://tidy.sourceforge.net/docs/quickref.html
$options = array('output-xhtml' => true, 'clean' => true, 'wrap-php' => true);
$tidy = new tidy(); // create new instance of Tidy
$tidy->parseFile($filename, $options); // open file
$tidy->cleanRepair(); // process with specified options
copy($filename, $filename . '.bak'); // backup current file
file_put_contents($filename, $tidy); // overwrite current file with XHTML version
I don't have a Smarty template file to test this on, but give it a try and see if it works correctly in converting one. Backup your files as always when running something of this nature. Test out on sample files first.

The problem is that you do not have an html file to work with. You have a php template written in the programming language "smarty" that is not markup, even though it contains blocks of markup. You're looking for a magic wand and no such wand exists.
If it was purely html, then you could probably use Domdocument to read the files into a Dom structure and generate xhtml, but that is simply not going to work with the pure source files, although you could potentially write a parser to read the smarty tpl files, look for the html snippets and try and load them into Domdocument objects.
With that said, I have to ask first -- why you really want to convert to xhtml when xhtml is basically a failed standard that is obsolete at this point in time, and secondarily, if you have some legitimate reason for wanting to forge ahead, why you can't use some regex search and replace snippets that change the doctypes and some regex based searches to look for tags that lack the end tags, and the other relatively minor tweaks needed. The differences between html and xhtml can be boiled down to a handful of rules that are pretty easy to understand.

In answer to your original question: sort of. Core PHP -> DOM, SimpleXML, SPL = templating engine. That's why (and how) templating engines such as Smarty exist.
Re: installing Tidy as suggested in comments,
Tidy has a prerequisite lib. If you don't already have it:
http://php.net/manual/en/tidy.installation.php
To use Tidy, you will need libtidy installed, available on the tidy homepage »
http://tidy.sourceforge.net/.
To enable, you will need to recompile PHP and include it in your config flags:
"This extension is bundled with PHP 5 and greater, and is installed
using the --with-tidy configure option."
So, get your existing config flags:
php -i | grep config
and add --with-tidy.
However, this is probably the wrong approach. It does not solve your actual problem (outputting XHTML instead of HTML) - it fixes Smarty's problem. Recompiling PHP to add an extension so you can use it to fix a templating engine's doctype shortcomings probably means you should consider using a different templating engine, if possible. That's sort of drastic (and adds a lot of overhead for what you get, which amounts to for a hacky non-solution bandaid workaround retroactively repairing broken output.)
PEAR's HTML_Template_PHPTAL is probably the best solution to your problem, and the closest answer to your original question.
And if PHPTAL doesn't quite cut it, there are at least 5 others available as PEAR libs to choose from.
pear install http://phptal.org/latest.tar.gz
Or it's been ported to Git:
git clone git://github.com/pornel/PHPTAL
A cursory google search: http://webification.com/best-php-template-engines
HTH

Can I use a hash sign (#) for commenting in PHP?

I have never, ever, seen a PHP file using hashes (#) for commenting. But today I realized that I actually can! I'm assuming there's a reason why everybody uses // instead though, so here I am.
Is there any reason, aside from personal preference, to use // rather than # for comments?

2021 UPDATE: As of PHP 8, the two characters are not the same. The sequence #[ is used for Attributes.(Thanks to i336 for the comment)
Original Answer:
The answer to the question Is there any difference between using "#" and "//" for single-line comments in PHP? is no.
There is no difference. By looking at the parsing part of PHP source code, both "#" and "//" are handled by the same code and therefore have the exact same behavior.

PHP's documentation describes the different possibilities of comments. See http://www.php.net/manual/en/language.basic-syntax.comments.php
But it does not say anything about differences between "//" and "#". So there should not be a technical difference. PHP uses C syntax, so I think that is the reason why most of the programmers are using the C-style comments '//'.

<?php
echo 'This is a test'; // This is a one-line C++ style comment
/* This is a multi-line comment.
Yet another line of comment. */
echo 'This is yet another test.';
echo 'One Final Test'; # This is a one-line shell-style comment
?>
RTM

Is there any reason, aside from personal preference, to use // rather than # for comments?
I think it is just a personal preference only. There is no difference between // and #. I personally use # for one-line comment, // for commenting out code and /** */ for block comment.
<?php
# This is a one-line comment
echo 'This is a test';
// echo 'This is yet another test'; // commenting code
/**
* This is a block comment
* with multi-lines
*/
echo 'One final test';
?>

One might think that the # form of commenting is primarily intended to make a shell script using the familiar "shebang" (#!) notation. In the following script, PHP should ignore the first line because it is also a comment. Example:
#!/usr/bin/php
<?php
echo "Hello PHP\n";
If you store it in an executable file you can then run it from a terminal like this
./hello
The output is
Hello PHP
However, this reasoning is incorrect, as the following counterexample shows:
#!/usr/bin/php
#A
<?php
#B
echo "Hello PHP\n";
The first line (the shebang line) is specially ignored by the interpreter. The comment line before the PHP tag is echoed to standard output because it is not inside a PHP tag. The comment after the opening PHP tag is interpreted as PHP code but it is ignored because it is a comment.
The output of the revised version is
#A
Hello PHP

If you establish some rule sets in your team / project... the 2 types of comments can be used to outline the purpose of the commented code.
For example I like to use # to mute / disable config settings, sub functions and in general a piece of code that is useful or important, but is just currently disabled.

There's no official PSR for that.
However, in all PSR example code, they use // for inline comments.
There's an PSR-2 extension proposal that aims to standardize it, but it's not official: https://github.com/php-fig-rectified/fig-rectified-standards/blob/master/PSR-2-R-coding-style-guide-additions.md#commenting-code
// is more commonly used in the PHP culture, but it's fine to use # too. I personally like it, for being shorter and saving bytes. It's personal taste and biased, there's no right answer for it, until, of course, it becomes a standard, which is something we should try to follow as much as possible.

Yes, however there are cross platform differences.
I use # all the time for commenting in PHP, but I have noticed an adoption difference.
On windows keyboard the # key is easy to use.
On mac keyboard # key mostly isn't present.
So for mac users, [Alt] + [3] or [⌥] + [3] is more difficult to type than //, so // has become a cross platform way of displaying code with comments.
This is my observation.

From https://php.net/manual/en/migration53.deprecated.php
"Deprecated features in PHP 5.3.x ...Comments starting with '#' are now deprecated in .INI files."
There you have it. Hash '#' appears to remain as a comment option by default by not being deprecated. I plan to use it to distinguish various layers of nested if/else statements and mark their closing brackets, or use to distinguish code comments from commented out code as others have suggested in related posts. (Note: Link was valid/working as of 4/23/19, although who knows if it'll still be working when you're reading this.)

Is there any reason, aside from personal preference, to use // rather
than # for comments?
I came here for the answer myself, and its good to know there is NO code difference.
However, preference-wise one could argue that you'd prefer the 'shell->perl->php' comment consistency vs the 'c->php' way.
Since I did approach php as a poor man's webby perl, I was using #.. and then I saw someone else's code and came straight to SO. ;)

OP Question: "Is there any reason, aside from personal preference, to use // rather than # for comments?"
One 2021 Answer, which is certainly not the only answer as we see in this thread:
If you're using Visual Studio Code and using regions to block your code, then you must use # rather than // to define the region. To the question, No, even for this use case : If you are commenting out a region, you can use # or // or /** */, the technique you use for this is personal preference.
Examples for block definition in VSCode :
#region this is a major block
/** DocBlock */
function one() {}
/** DocBlock */
function two() {
#region nested region based on indentation
// comments and code in here
# another nested region based on indentation
// foo
#endregion
#endregion
}
#endregion
On Fold of the inner block:
#region this is a major block
/** DocBlock */
function one() {}
/** DocBlock */
function two() {
> #region nested region based on indentation
}
#endregion
On Fold of the outer block:
> #region this is a major block
I cite the following specific usage which one might be tempted to try, but these do not work. In fact this is exactly how you DISable a #region block:
// #region
// #endregion
/** #region */
/** #endregion */
As to commenting out a region in VSCode:
/** You can now collapse this block
#region Test1
// foo
#endregion
// everything through to here is collapsed
*/
// #region Test1
// folding is disabled here
// #endregion
# #region Test1
// this also disables the fold
# #endregion
All of that said, "Is there any reason, aside from personal preference, to use // rather than # for comments?" I agree with comments in this thread and in the other thread: // is more commonly recognized and used, which is usually a good reason to use that comment style over #.
Final note, be careful about nesting based on indentation, as code formatting can remove your manual indentation and thus ruin your scheme of nested blocks based on comments. I've tested this with both # and // (which BTW, // nests on indentation too. Again, in context with the OP question, No, there is no reason to use // over # for nested indentation in this context in the current VSCode because both work exactly the same. However, this is a use case for using # over //.
Ref - no extension required, verified in 1.62.3. See notes on indentation there as well.

Comments with "#" are deprecated with PHP 5.3. So always use // or /.../

Why the '#' in the comments?

Sometimes I see php comments with '#' in front of some lines. Like #Author. Is there any particular reason for this? I cannot seem to find anything about this. I am assuming there is highly used parser that looks for '#'s.

This is most likely phpDocumentor notation, a program which parses source code (and those # comments) to auto-generate documentation. Many IDEs also provide intelligent lookup and autocomplete functionality based on these comments.
Example:
/**
* Echoes "example".
* #author Pekka
* #version 1.5
* #return void
*/
function example()
{
echo "example";
}

This is used by automatic documentation generators to create documentation from the comments. There are a few tools and formats out there:
phpDocumentor
Doxygen
HeaderDoc
To list a few.

There are lots of packages that parse source code for comments and create intricate help files in various formats (like HTML, Windows .chm files, etc.). Java, obviously, has javadoc, but there's also Doxygen and Doc-O-Matic, just to name a few.

Template extraction in python/php

Are there existing template extract libraries in either python or php? Perl has Template::Extract, but I haven't been able to find a similar implementation in either python or php.
The only thing close in python that I could find is TemplateMaker (http://code.google.com/p/templatemaker/), but that's not really a template extraction library.

After digging around some more I found a solution to exactly what I was looking for. filippo posted a list of python solutions for screen scraping in this post: Options for HTML scraping? among which is a package called scrapemark ( http://arshaw.com/scrapemark/ ).
Hope this helps anyone else who is looking for the same solution.

TmeplateMaker does seem to do what you need, at least according to its documentation. Instead of receiving a template as an input, it infers ("learns") if from a few documents. Then, it has the extract method to extract the data from other documents that were created with this template.
The example shows:
# Now that we have a template, let's extract some data.
>>> t.extract('<b>red and green</b>')
('red', 'green')
>>> t.extract('<b>django and stephane</b>')
('django', 'stephane')
# The extract() method is very literal. It doesn't magically trim
# whitespace, nor does it have any knowledge of markup languages such as
# HTML.
>>> t.extract('<b> spacy and <u>underlined</u></b>')
(' spacy ', '<u>underlined</u>')
# The extract() method will raise the NoMatch exception if the data
# doesn't match the template. In this example, the data doesn't have the
# leading and trailing "<b>" tags.
>>> t.extract('this and that')
Traceback (most recent call last):
...
So, to achieve the task you require, I think you should:
Give it a few documents rendered from your template - it will have no trouble inferring the template from them.
Use the inferred template to extract data from new documents.
Come to think about it, it's even more useful than Perl's Template::Extract as it doesn't expect you to provide it a clean template - it learns it on its own from sample text.

Here is an interesting discussion from Adrian the author of TemplateMaker http://www.holovaty.com/writing/templatemaker/
It seems to be a lot like what I would call a wrapper induction library.
If your looking for something else that is more configurable (less for scraping) take a look at lxml.html and BeautifulSoup, also for python.

"Safe" markdown processor for PHP?

Is there a PHP implementation of markdown suitable for using in public comments?
Basically it should only allow a subset of the markdown syntax (bold, italic, links, block-quotes, code-blocks and lists), and strip out all inline HTML (or possibly escape it?)
I guess one option is to use the normal markdown parser, and run the output through an HTML sanitiser, but is there a better way of doing this..?
We're using PHP markdown Extra for the rest of the site, so we'd already have to use a secondary parser (the non-"Extra" version, since things like footnote support is unnecessary).. It also seems nicer parsing only the *bold* text and having everything escaped to <a href="etc">, than generating <b>bold</b> text and trying to strip the bits we don't want..
Also, on a related note, we're using the WMD control for the "main" site, but for comments, what other options are there? WMD's javascript preview is nice, but it would need the same "neutering" as the PHP markdown processor (it can't display images and so on, otherwise someone will submit and their working markdown will "break")
Currently my plan is to use the PHP-markdown -> HTML santiser method, and edit WMD to remove the image/heading syntax from showdown.js - but it seems like this has been done countless times before..
Basically:
Is there a "safe" markdown implementation in PHP?
Is there a HTML/javascript markdown editor which could have the same options easily disabled?
Update: I ended up simply running the markdown() output through HTML Purifier.
This way the Markdown rendering was separate from output sanitisation, which is much simpler (two mostly-unmodified code bases) more secure (you're not trying to do both rendering and sanitisation at once), and more flexible (you can have multiple sanitisation levels, say a more lax configuration for trusted content, and a much more stringent version for public comments)

PHP Markdown has a sanitizer option, but it doesn't appear to be advertised anywhere. Take a look at the top of the Markdown_Parser class in markdown.php (starts on line 191 in version 1.0.1m). We're interested in lines 209-211:
# Change to `true` to disallow markup or entities.
var $no_markup = false;
var $no_entities = false;
If you change those to true, markup and entities, respectively, should be escaped rather than inserted verbatim. There doesn't appear to be any built-in way to change those (e.g., via the constructor), but you can always add one:
function do_markdown($text, $safe=false) {
$parser = new Markdown_Parser;
if ($safe) {
$parser->no_markup = true;
$parser->no_entities = true;
}
return $parser->transform($text);
}
Note that the above function creates a new parser on every run rather than caching it like the provided Markdown function (lines 43-56) does, so it might be a bit on the slow side.

JavaScript Markdown Editor Hypothesis:
Use a JavaScript-driven Markdown Editor, e.g., based on showdown
Remove all icons and visual clues from the Toolbar for unwanted items
Set up a JavaScript filter to clean-up unwanted markup on submission
Test and harden all JavaScript changes and filters locally on your computer
Mirror those filters in the PHP submission script, to catch same on the server-side.
Remove all references to unwanted items from Help/Tutorials
I've created a Markdown editor in JavaScript, but it has enhanced features. That took a big chunk of time and SVN revisions. But I don't think it would be that tough to alter a Markdown editor to limit the HTML allowed.

How about running htmlspecialchars on the user entered input, before processing it through markdown? It should escape anything dangerous, but leave everything that markdown understands.
I'm trying to think of a case where this wouldn't work but can't think of anything off hand.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.