This question already has answers here:
PHP function/class that formats/indents my HTML code? [duplicate]
(3 answers)
Closed 9 years ago.
Is there any PHP Tidy alternative to only tab-indent HTML output? I need the latter for development/debug purposes only to go through the generated output code. Though, as much as I tried to configure Tidy for this simple task, I couldn't without preventing other changes.
Two years later, there is still no library that indents HTML output without relying on a DOM API (i.e. Tidy and the like).
I've developed a library that tokenises HTML input using regular expressions. None of the HTML is changed beyond adding the spacing required for indentation.
https://github.com/gajus/dindent
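To illustrate the general idea of indenting by tokenising with a regular expression instead of building a DOM, here is a minimal sketch. It is not Dindent's code or API: `indent_html` is a hypothetical helper, and it assumes simple markup with no `<pre>`, `<script>` or `>` characters inside attribute values. Unlike a real library, it also trims whitespace around text nodes.

```php
<?php
// Hypothetical helper for illustration only; not part of Dindent or Tidy.
function indent_html(string $html, string $unit = "\t"): string
{
    // Elements that never take a closing tag.
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'source', 'track', 'wbr'];

    // Split the input into tag tokens and text tokens, keeping the tags.
    $tokens = preg_split('/(<[^>]+>)/', $html, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $depth = 0;
    $lines = [];
    foreach ($tokens as $token) {
        $token = trim($token);
        if ($token === '') {
            continue; // whitespace between tags
        }
        if (preg_match('/^<\/([a-z][a-z0-9]*)/i', $token)) {
            // Closing tag: step out one level before printing.
            $depth = max(0, $depth - 1);
            $lines[] = str_repeat($unit, $depth) . $token;
        } elseif (preg_match('/^<([a-z][a-z0-9]*)/i', $token, $m)) {
            // Opening tag: print, then step in unless it is void or self-closed.
            $lines[] = str_repeat($unit, $depth) . $token;
            $selfClosed = substr($token, -2) === '/>';
            if (!$selfClosed && !in_array(strtolower($m[1]), $void, true)) {
                $depth++;
            }
        } else {
            // Text, comments and doctype: print at the current depth.
            $lines[] = str_repeat($unit, $depth) . $token;
        }
    }
    return implode("\n", $lines);
}

echo indent_html('<div><p>Hello <br>world</p></div>'), "\n";
```

For debugging generated output this may be enough; for anything whitespace-sensitive, a purpose-built library is the safer choice.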
I always use jsbeautifier. Though it doesn't follow my standards for JavaScript, its HTML indentation is awesome.
EDIT: Before you downvote, notice that jsbeautifier is open source, and has ports in several languages, all serverside: https://github.com/einars/js-beautify
You can try the htmLawed library. It's a Tidy alternative for PHP. If you just need an indenting function, you can use the code for the hl_tidy function of the library.
// indent using one tab per indent, with all HTML being within an imaginary div
$out = hl_tidy($in, 't', 'div');
I use LogicHammers HTMLFormatter, which you need to pay for but is worth every penny. Use it to format the HTML before you look at it; it makes the output much easier to read.
Though this is not an exact answer, see if it helps you. I use NetBeans, and to indent code I simply right-click and choose Format. If you are using another IDE, search for similar functionality, or maybe you can add it with a third-party plugin.
PHP has no problem with outputting a different language than HTML but as it seems, VSCode doesn't understand this. I've searched a bit for solutions, but Google gives me nothing.
For example, I'm using PHP to generate dynamic Markdown files.
<?php
header("Content-Type: text/markdown");
# Some PHP code
?>
# Header
Some **markdown** code.
    This is a code block.
It is not much of a problem for me that the above example provides no syntax highlighting for Markdown. The real issue is with the HTML formatter. It removes leading spaces, so the "This is a code block." part would stop being a code block once its indentation is gone. There is a similar problem with lists and double spaces.
Is there any way I could stop the HTML formatter in VSCode from breaking my Markdown code?
The VS Code PHP language syntax is for PHP embedded in HTML documents, which (along with pure PHP, which is compatible) is the most common form of PHP.
If you want support for PHP embedded in Markdown, you'll need a syntax library for that. I'm not aware of any existing one, so you might have to write your own.
The relevant documentation can be found on the VS Code website.
Your simplest approach is likely to be finding an existing PHP grammar and an existing Markdown grammar and then combining them (while removing the HTML support).
Closed 10 years ago.
Possible Duplicate:
How to parse and process HTML with PHP?
I am learning PHP and when I have to extract (parse) some data from a webpage that does not have an available API, I use regular expressions or a function which takes the string that is between two strings.
I would like to know if there is a more "professional", easier way to do this, since regexp are resource consuming and not the easiest thing to write right now for me.
You should never try to parse XML (or HTML) using regular expressions; instead, get yourself a proper parser library and do it the correct way. It might sound like a harder task, but you'll thank yourself in the end.
Parsing could be done using one of the below, or similar resources.
php.net - PHP: DOM - Manual
simplehtmldom.sourceforge.net - PHP Simple HTML DOM Parser
The popular and legendary answer regarding html and regular-expressions, poetry worth reading:
stackoverflow.com - The legendary HTML+RegExp answer!
PHP comes with a default XML parsing library for you to use in this specific case. Use file_get_contents in order to retrieve the HTML page and parse accordingly.
XML: http://php.net/manual/en/book.xml.php
file_get_contents: http://php.net/manual/en/function.file-get-contents.php
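As a rough sketch of the parser-based approach, the example below uses the DOM extension (the first resource listed above) rather than the XML Parser extension. The sample markup and the `//a[@href]` query are made up for illustration; in practice the `$html` string could come from `file_get_contents($url)`.

```php
<?php
// Inline sample markup so the example runs without network access.
$html = '<html><body><h1>News</h1>'
      . '<a href="/a">First</a><a href="/b">Second</a></body></html>';

$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Map each link target to its link text.
$links = [];
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//a[@href]') as $a) {
    $links[$a->getAttribute('href')] = trim($a->textContent);
}
print_r($links);
```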
I want to implement a commenting system for my website. I looked around and found CKEditor to be the best WYSIWYG editor available. I tried its BBCode output and it works perfectly. However, if I use BBCode output, I need a reliable parser to convert the BBCode to HTML when I show the comments to users. If I use HTML output, I may need something to prevent XSS in the comments. Which way do you suggest for a simple commenting system? I have already integrated CKEditor into my system and would prefer a very lightweight and simple approach without much bloat (like PEAR). Also, StackOverflow seems pretty awesome. Is it possible to use something similar for my PHP site?
I should use a reliable parser to parse the bbcode to HTML.
PHP has a pecl BBCode extension.
Also, StackOverflow seems pretty awesome. Is it possible to use something similar for my php?
SO uses Markdown. A Markdown parser in PHP is also available.
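For a sense of what a lightweight BBCode path could look like, here is a minimal sketch. It is not the PECL extension's API: `bbcode_to_html` is a hypothetical helper that supports only `[b]`, `[i]` and `[url=...]`. The key point is that it escapes the whole comment first, so any raw HTML is rendered inert before tags are converted.

```php
<?php
// Hypothetical helper for illustration; a real parser handles nesting and many more tags.
function bbcode_to_html(string $text): string
{
    // Escape first, so markup typed by the user cannot reach the page as HTML.
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    $safe = preg_replace('/\[b\](.*?)\[\/b\]/is', '<strong>$1</strong>', $safe);
    $safe = preg_replace('/\[i\](.*?)\[\/i\]/is', '<em>$1</em>', $safe);
    // Accept only http(s) targets to keep javascript: links out.
    $safe = preg_replace(
        '/\[url=(https?:\/\/[^\]\s]+)\](.*?)\[\/url\]/is',
        '<a href="$1">$2</a>',
        $safe
    );
    // Preserve the commenter's line breaks.
    return nl2br($safe);
}

echo bbcode_to_html('[b]Hi[/b] <script>alert(1)</script>'), "\n";
```

Unsupported or malformed tags are simply left as escaped text, which is a conservative default for user comments.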
Closed 11 years ago.
Possible Duplicate:
Best methods to parse HTML with PHP
I'm trying to parse a webpage using RegEx, and I'm having some trouble making it work in a reliable manner.
Say I wanted to parse the code that creates a div element, and I want to extract everything between <div> and </div>. Now, this code could just be <div></div>, but it could also very well be something like:
<div class="thisIsMyDivClass"><p>This text is inside the div</p></div>
How can I make sure that, no matter how many characters there are between the greater-than/less-than signs of the initial div tag and the corresponding last div tag, I always get only the content in between them? If I specify that the number of characters following < can be anything from one to ten thousand, I will always be extracting up to the > after ten thousand characters, and thus (most likely, unless there is a lot of code or text in between) retrieve a bunch of code that I don't need.
This is my code so far (not reliable for the aforementioned reason):
/<.{1,10000}>/
Regular expressions describe so-called regular languages, or Type 3 in the Chomsky hierarchy. HTML, on the other hand, is a context-free language, which is Type 2 in the Chomsky hierarchy. So there is no way to reliably parse HTML with regular expressions in general. Use an HTML parser instead. For PHP you can find some suggestions in this question: How do you parse and process HTML/XML in PHP?
You will need a lexical analyser and grammar checker to parse HTML correctly. RegEx's main focus is searching strings for patterns.
I would suggest using something like DOM. I am building a large-scale site and using DOM like crazy on it. It works, it works well, and with a little work it can be extremely powerful.
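A small demonstration of why nesting defeats a pattern-based approach. The sample markup is made up for illustration: a lazy quantifier stops at the first closing tag, and a greedy one runs to the last closing tag in the input, so neither finds the matching one.

```php
<?php
$html = '<div class="outer"><div class="inner">a</div>b</div>';

// Lazy: stops at the first </div>, which belongs to the inner element.
preg_match('/<div[^>]*>(.*?)<\/div>/s', $html, $lazy);

// Greedy: runs to the last </div> in the input, swallowing a sibling div.
preg_match('/<div[^>]*>(.*)<\/div>/s', $html . '<div>c</div>', $greedy);

var_dump($lazy[1]);   // string '<div class="inner">a'
var_dump($greedy[1]); // string '<div class="inner">a</div>b</div><div>c'
```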
http://php.net/manual/en/book.dom.php
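Applied to the div from the question, a DOM-based version might look like the sketch below. It assumes the goal is the inner markup of every div in the input; the variable names are illustrative.

```php
<?php
$html = '<div class="thisIsMyDivClass"><p>This text is inside the div</p></div>';

$doc = new DOMDocument();
// Collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Gather the inner markup of each div, however long or deeply nested it is.
$contents = [];
foreach ($doc->getElementsByTagName('div') as $div) {
    $inner = '';
    foreach ($div->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    $contents[] = $inner;
}
print_r($contents);
```

Because the parser tracks which closing tag belongs to which opening tag, no character limit such as `{1,10000}` is needed.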
This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 9 years ago.
I've been doing some HTML scraping in PHP using regular expressions. This works, but the result is finicky and fragile. Has anyone used any packages that provide a more robust solution? A config driven solution would be ideal, but I'm not picky.
I would recommend PHP Simple HTML DOM Parser after you have scraped the HTML from the page. It supports invalid HTML and provides a very easy way to handle HTML elements.
If the page you're scraping is valid X(HT)ML, then any of PHP's built-in XML parsers will do.
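For the well-formed case, a minimal sketch with SimpleXML, one of the built-in XML parsers. The sample markup is made up and deliberately has no XML namespace; real XHTML usually declares one, in which case the namespace would need to be registered with `registerXPathNamespace` before querying.

```php
<?php
// Well-formed sample input without a namespace declaration.
$xhtml = '<html><body><ul>'
       . '<li class="item">One</li><li class="item">Two</li>'
       . '</ul></body></html>';

$xml = simplexml_load_string($xhtml);
if ($xml === false) {
    exit("Input is not well-formed XML\n");
}

// Collect the text of every matching list item.
$items = [];
foreach ($xml->xpath('//li[@class="item"]') as $li) {
    $items[] = (string) $li;
}
print_r($items);
```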
I haven't had much success with PHP libraries for scraping. If you're adventurous though, you can try simplehtmldom. I'd recommend Hpricot for Ruby or Beautiful Soup for Python, which are both excellent parsers for HTML.
I would also recommend 'Simple HTML DOM Parser.' It is a good option, particularly if you're familiar with jQuery or JavaScript selectors, in which case you will find yourself at home.
I have even blogged about it in the past.
I had some fun working with htmlSQL, which is not so much a high end solution, but really simple to work with.
For HTML scraping in PHP, I'd recommend cURL + regexp or cURL + a DOM parser, though I personally use cURL + regexp. If you are comfortable with regexp, it's sometimes actually more accurate.
I've had very good results with the Simple HTML DOM Parser mentioned above as well. And then there's the tidy extension for PHP, which works really well too.
I had to use curl on my host 1and1.
http://www.quickscrape.com/ is what I came up with using the Simple DOM class!