"Pretty" code and PHP

"Pretty" code and PHP - php

I understand the concepts of \t and \n in most programming languages that have dynamic web content. What I use them for, as do most people, is shifting all the fancy HTML to make it all readable and "pretty" in view source. Currently, I am making a website that uses PHP include ("whateverfile.php") to construct the layout. You can probably tell where this is going.
How can I tab over a whole block of a PHP include so it aligns with the rest of the page's source?
If this question is worded incorrectly or doesn't make sense, English is my native language so I can't use any excuses.

You can do this by using a layer in your application that is taking care of the output, like a theme system. This will add the additional benefit to you that your code will be better separated into data-processing and output-processing.
A good example is given in the following article: When Flat PHP meets Symfony.
Next to that, there is another trick you can do: Install an output buffer and then run tidy on it to make it look just great: Tidying up your HTML with PHP 5.
On top of this you can always put tabs into your include's output, however you don't know always how much tabs this will need. There are some other tricks related to output buffering and intending html fragments when they return from includes, however this is very specific and most often of not much use. So the two articles linked above might give you two areas to look into which are of much more use for you in the end.

I think you can do it adding extra tabs in the included .php pages. But i wouldn't recommend that (obviously). Instead, use tools like firebug or chrome's inspect element to look at the code.

Related

Is it possible to prevent standard HTML comments from showing up in source code?

I'm going to assume the answer is 'no' here, but since I haven't actually found that answer, I'm asking.
Quite basically, all I want to do is leave some HTML commenting in my files for 'author eyes only', simply to make editing the file later a much more pleasant experience.
<!-- Doing it like this --> leaves nice clean comments but they show up when viewing the page source after output.
I am using PHP, so technically I could <?PHP /* wrap comments in PHP tags */ ?> which would prevent them from being output at all, but if possible I'd like to avoid all of the extra arbitrary PHP tagging that would be needed for commenting throughout the file. After all, the point of commenting is to make the document feel less cluttered and more organized.
Is there anything else I could try or are these my best options?

No, anything in html will show up.
You could, have a script that parses the code, and removes the comments, before it puts it up on the server, and then you would have the original, and the uncommented source.
A tool to accomplish this:
http://code.google.com/p/htmlcompressor/

I guess these are your best options, yes, unless you run the entire HTML output through some sort of cleanup module before being sent to the client.
Anything not wrapped in server side syntax will will be output to the client if not modified on its way out (through template engines, for example). This goes for most (probably all) server side languages).

You could definitely write a parser that uses regex to strip out HTML comments, but unless you're already dealing with a roll-your-own CMS, most likely the work involved in this far outweighs the benefits of not using PHP comments as you suggested.

What is the best way to parse Wikipedia markup in PHP?

I'm trying to parse specific Wikipedia content in a structured way. Here's an example page:
http://en.wikipedia.org/wiki/Polar_bear
I'm having some success. I can detect that this page is a "specie" page, and I can also parse the Taxobox (on the right) information into a structure. So far so good.
However, I'm also trying to parse the text paragraphs. These are returned by the API in Wiki format or HTML format, I'm currently working with the Wiki format.
I can read these paragraphs, but I'd like to "clean" them in a specific way, because ultimately I will have to display it in my app and it has no sense of Wiki markup. For example, I'd like to remove all images. That's fairly easy by filtering out [[Image:]] blocks. Yet there are also blocks that I simply cannot remove, such as:
{{convert|350|-|680|kg|abbr=on}}
Removing this entire block would break the sentence. And there are dozens of notations like this that have special meaning. I'd like to avoid writing a 100 regular expressions to process all of this and see how I can parse this in a smarter way.
My dilemma is as follow:
I could continue my current path of semi-structured parsing where I'd
have a lot of work deleting unwanted elements as well as "mimicing"
templates that do need to be rendered.
Or, I could start with the rendered HTML output and parse that, but my worry is that it's just as fragile and complex to parse in a structured way
Ideally, there's be a library to solve this problem, but I haven't found one yet that is up to this job. I also had a look at structured Wikipedia databases like DBPedia but those only have the same structured I already have, they don't provide any structure in the Wiki text itself.

There are too many templates in use to reimplement all of them by hand and they change all the time. So, you will need actual parser of the wiki syntax that can process all the templates.
And the wiki syxtax is quite complex, has lots of quirks and no formal specification. This means creating your own parser would be too much work, you should use the one in MediaWiki.
Because of this, I think getting the parsed HTML through the MediaWiki API is your best bet.
One thing that's probably easier to be parsed from wiki markup are the infoboxes, so maybe they should be a special case.

Text Parser with PHP, like Instapaper

I'm trying to write a text parser with PHP, like Instapaper did. What I want to do is; get a webpage and parse it in text-only mode.
It's simple to get the webpage with cURL and strip HTML tags. But every webpage have some common areas; like header, navigation, sidebar, footer, banners etc. I only want to get the article in text mode and exclude all other parts. It's also simple to exclude those parts if I know the "id" or "class" info. But I'm trying to automatize this process and apply for any page, like Instapaper.
I get all the content between but I don't know how to exclude header, sidebar or footer and get only the main article body. I have to develop a logic to get only the main article part.
It's not important for me to find the exact code. It would also be useful to understand how to exclude unnecessary parts as I can try to write my own code with PHP. It would also be useful if there any examples in other languages.
Thanks for helping.

You might try looking at the algorithms behind this bookmarklet, readability - It's got a decent success rate for extracting content among on all web page rubbish.
Friend of mine made it, that's why I'm recommending it - since I know it works, and I'm aware of the many techniques he's using to parse the data. You could apply these techniques for what your asking.

you can take a look at the source from Goose -> it already does alot of this like instapaper text extractions
https://github.com/jiminoc/goose/wiki

Have a look at the ExtractContent code from Shuyo Nakatani.
See original Ruby source http://rubyforge.org/projects/extractcontent/ or a port of it to Perl http://metacpan.org/pod/HTML::ExtractContent

You really should consider using a HTML parser for this. Gather similar pages and compare the DOM trees to find the differing nodes.

this article provides a comparison of different approaches. the java library boilerpipe was rated highly. at the boilerpipe site you find his scientific paper which compares to other algorithms.
not all algorithms suite all purposes. the biggest application of such tools is to just get the raw text to index as a search engine. the idea being that you don't want search results to be messed up by adverts. such extractions can be destructive; meaning that it wont give you "the best reading area" which is what people want with instapaper or readability.

separating php and html... why?

So I have seen some comments on various web sites, pages, and questions I have asked about separating php and html.
I assume this means doing this:
<?php
myPhpStuff();
?>
<html>
<?php
morePhpStuff();
?>
Rather than:
<?php
doPhpStuff();
echo '<html>';
?>
But why does this matter? Is it really important to do or is it a preference?
Also it seems like when I started using PHP doing something like breaking out of PHP in a while loop would cause errors. Perhaps this is not true anymore or never was.
I made a small example with this concept but to me it seems so messy:
<?php
$cookies = 100;
while($cookies > 0)
{
$cookies = $cookies -1;
?>
<b>Fatty has </b><?php echo $cookies; ?> <b>cookies left.</b><br>
<?php
}
?>
Are there instances when it is just better to have the HTML inside the PHP?
<?php
$cookies = 100;
while($cookies > 0)
{
$cookies = $cookies -1;
echo'<b>Fatty has </b> '.$cookies.' <b>cookies left.</b><br>';
}
?>

When people talk about separating PHP and HTML they are probably referring to the practice of separating a website's presentation from the code that is used to generate it.
For example, say you had a DVD rental website and on the homepage you showed a list of available DVDs. You need to do several things: get DVD data from a database, extract and/or format that data and maybe mix some data from several tables. format it for output, combine the DVD data with HTML to create the webpage the user is going to see in their browser.
It is good practice to separate the HTML generation from the rest of the code, this means you can easily change your HTML output (presentation) without having to change the business logic (the reading and manipulation of data). And the opposite is true, you can change your logic, or even your database, without having to change your HTML.
A common pattern for this is called MVC (model view controller).
You might also want to look at the Smarty library - it's a widely used PHP library for separating presentation and logic.

Let's make it clear what is not separation
you switch from php mode to html mode
you use print or echo statements to write out html code
you use small php snipplets inside html files
If you do this, there is no separation at all, no matter if you escape from php to html blocks or do it the other way and put php code into html.
Have a look at a good templating engine, there are a plenty of reasons in the "why use ...." parts of the manuals. I'd suggert www.smarty.net especially http://www.smarty.net/whyuse.php
It will answer all your questions now you have.

It is very important to separate application logic from presentation logic in projects. The benefits include:
Readability: Your code will be much easier to read if it does not mix PHP and HTML. Also, HTML can become difficult to read if its stored and escaped in PHP strings.
Reusability: If you hard-code HTML strings within PHP code, the code will be very specifc to your project and it won't be possible to reuse your code in later projects. On the other hand, if you write small functions that do one task at a time, and put HTML into separate template files, reusing your code in future projects will be possible and much easier.
Working in a team: If you are working in a team that contains developers and designers, separation of application logic and presentation templates will be advantageous to both. Developers will be able to work on the application without worrying about the presentation, and designers (who don't necessarily know PHP very will) will be able to create and update templates without messing with PHP code.

for pages that contain a lot of HTML, embedding PHP code into the page could be easier. this is one of the first intentions behind PHP. anyway when you are developing an application with lots and lots of logic, different types of connectivity, data manipulation, ... your PHP code gets too complicated if you want to just embed them in the same pages that are shown to users. and then the story of maintenance begins. how are you going to change something in the code, fix a bug, add a new feature?
the best way is to separate your logic (where most of the code is PHP) in different files (even directories) from your page files (where most of the code is HTML, XML, CSV, ...).
this has been a concern for developers for so many years and there are recommendations to handle these general problems, that are called design patterns.
since not everyone has the experience, and can apply these design patterns into his application, some experienced developers create Frameworks, that will help other developers to use all the knowledge and experience laying in the hear of that framework.
when you look at toady's most used PHP frameworks, you see that all of them put code into PHP Classes in special directories, make configurations, and .... in none of these files you see a line of HTML. but there are special files that are used to show the results to users, and they have a lot of HTML, so you can embed your PHP values inside those HTML pages to show to users. but remember that these values are not calculated on the same page, they are results of a lot of other PHP codes, written in other PHP files that have no HTML in them.

I find it preferable to separate application logic from the view file (done well with CodeIgniter framework with MVC) as it leaves code looking relatively tidy and understandable. I have also found that separating the two leaves less margin for PHP errors, if the HTML elements are separated from the PHP there is a smaller amount of PHP that can go wrong.
Ultimately I believe it is down to preference however I feel that separation has the following pros:
Tidier Code
Less of an Error Margin
Easy to Interpret
Easier to change HTML elements
Easier to changed Application Logic
Faster Loading (HTML is not going from Parser->Browser it goes straight to browser)
However some cons may be:
It only works in PHP5 (I Believe, could be wrong, correct if needed)
It may not be what one is used to
Untidy if done incorrectly (without indentation etc, however this is the same with anything)
But as you can see, the pros outweigh said cons. Try not to mix the two also, some separation and some intergration - this may get confusing for yourself and other developers that work with you.
I hope this helped.

Benefits of the first method (separating PHP and HTML):
You don't need to escape characters
It's also possible for code editors
to highlight/indent the markup.
It's arguably easier to read
There is no downside to this method,
compared to the second method.

Functionally: they both will work, so ultimately it is a preference.
Yet, you might consider that comments are a preference as well, your code would compile and run exactly the same without comments. However most people would agree comments are essential to writing and maintaining good code. I see this as being a similar subject matter. In the long run it will make it easier to read and maintain the code it if the two are separated.
So is it important? I would say Yes.

I kick off with: the first one you can open in a WYSIWYG editor, and still see some markup, which might makes it easier to maintain.

It says that what you put in echo '' it is first processed by the programming language and then sent to the browser, but if you directly put there html code without php, that code will load faster because there is no programming involved.
And the second reason as people above said is that you should have your 'large programming code' stored separately of the html code, and in the html code just put some calls to print results like 'echo $variable'. Or use a template engine like Smarty (like I do).
Best regards,
Alexandru.

Ouch!
All of the examples in your question are perfectly impossible to read. I'd say, you do yourself and those, who might read your code a great favour and use a template engine of sorts, say, Smarty. It is extremely easy to set up and use and it WILL separate your code from presentation. It doesn't require you to put everything in classes, it just makes sure, that your logic is in one file and presentation - in another one.

I don't know how in php but in asp.net separation has the following advantages.
1. separated code is easy to understand and develop
2. designer can work in html in the same time developer can write a code

PHP coding standards

I've been looking for some guidelines on how to layout PHP code. I've found some good references, such as the following:
http://www.dagbladet.no/development/phpcodingstandard/
and this question on SO.
However, none of that quite gets to what I'm specifically wondering about, which is the integration of HTML and PHP. For example:
Is it OK to have a PHP file that starts out with HTML tags and only has PHP inserted where needed? Or should you just have one section of PHP code that contains everything?
If you have a chunk of PHP code in the middle of which is a set of echo's that just output a fixed bit of HTML, is it better to break out of PHP and just put in the HTML directly?
Should functions all be defined in dedicated PHP files, or is it OK to define a bunch of functions at the top of a file and call them later on in that same file?
There are probably other questions I'd like to ask, but really I'm looking for someone to point me at some kind of resource online that offers guidance on the general idea of how HTML and PHP should be combined together.

Combining programming code and output data (including HTML) is IMHO a very bad idea. Most of the PHP gurus I know use a template engine such as Smarty to help keep the two things separate.

There's really not a single, common standard for these things. Most languages are more restrictive than PHP in this sense.
In the later years, a lot of so-called frameworks have emerged, and amongst other things they define a set of rules for everything from naming over where to place files and to which style your code should follow. There are several frameworks around, so you can't really pick one and call it the standard. However, most frameworks have a subset of commonality. For example, most follows some variant of the PEAR Coding Standards.

I usually try and follow the standards that are set by the language's core libraries.... oh wait.
Seriously through - you should try and follow the MVC pattern in any web application as it is pretty much standard practice regardless of language. In PHP this can be achieved in a quick-and-dirty way by treating index.php as your controller and separating data logic and presentation by file. This small step will at least let you move your code to a full featured framework when and if you choose.

Use a template engine when you can. If you haven't learned one, or don't want the overhead (which is minimal), use practices that cause you to have quick-and-dirty templating:
Functions that do not display anything have no place in a file that does display something.
Print variables, not HTML. Whenever outputting HTML, break out of the PHP and write HTML with print statements to handle any small details that are needed (actions for forms, IDs for controls, etc.).
Remember, when you include a file that breaks out of the PHP to print content, that will be treated the same as if you do it in the main file. So you can create simple templates that just included PHP files, and those files will print variables in the right places. Your index.php (or whatever) does all the real work, but all the display is done by the secondary "template".
Many PHP tutorials intermix logic and display code. It took me years to break the bad habits that encouraged in me. In my experience you can't separate things too much in PHP, and entropy will pull you towards intermixed code if you don't fight it.

Coding standards should be more than how to layout your syntax, but sadly that's what they tend to be in PHP.
FWIW, phc will pretty print your code in the Zend style.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.