jBBCode - BBcode to html - php

What I'm trying to do
I'm trying to use jBBCode to convert BBCode to HTML and back again.
The problem
When I take the HTML and try to turn it back into BBCode, the output is still HTML.
Here is the code I'm using to try and switch html back into bbcode.
$parser = new JBBCode\Parser();
$parser->loadDefaultCodes();
$parser->parse($MYHTMLSTRING);
echo $parser->getAsBBCode();
Does anyone know what I'm doing wrong here? I'm sure it's something very simple that I haven't figured out. Any help is appreciated! :D

$parser->parse() takes as input BBCode, not HTML.
After studying the documentation, it is my understanding that this is a one-way parser:
BBCode -> HTML
I believe the design is for you to store the BBCode in your database, and then when it is time to render HTML to visitors, you parse the BBCode at that time.
This way, you are always storing the raw, editable BBCode in the database.
This is a pretty common design pattern. For example, applications that use Markdown (instead of BBCode) typically store the raw Markdown in the database and only render it to HTML at page-load time.
In summary:
Store the raw BBCode text in your database.
When you render a page for a visitor, convert it to HTML at that time:
$parser->parse($MyBBCode); echo $parser->getAsHTML();
If the user edits the BBCode, save it straight back to the database as BBCode.
Documentation Reference
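The store-raw, render-late flow described above can be sketched without any library. The `renderBbcode` function below is a hypothetical stand-in for a real parser such as jBBCode (it is not jBBCode's API and only understands `[b]` and `[i]`); the point is only that the stored value stays BBCode and HTML is produced at display time.

```php
<?php
// Hypothetical stand-in for a real BBCode parser such as jBBCode.
// It only handles [b] and [i], which is enough to show the flow.
function renderBbcode(string $bbcode): string
{
    // Escape first so HTML typed by the user is never emitted raw.
    $html = htmlspecialchars($bbcode, ENT_QUOTES, 'UTF-8');
    $html = preg_replace('#\[b\](.*?)\[/b\]#s', '<strong>$1</strong>', $html);
    $html = preg_replace('#\[i\](.*?)\[/i\]#s', '<em>$1</em>', $html);
    return $html;
}

// The "database" value is always the raw BBCode the user typed.
$storedPost = '[b]Hello[/b] <world> and [i]welcome[/i]';

// Convert only when the page is rendered; $storedPost is left untouched,
// so the edit form can show it again without any HTML-to-BBCode step.
$pageHtml = renderBbcode($storedPost);
echo $pageHtml, PHP_EOL;
```

Because the original BBCode is never overwritten, there is no need to convert HTML back into BBCode at all.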

Related

How to convert bbcode to html in usage of SCEditor

I use SCEditor in my forum, and I have managed to save the BBCode to the database. But when I try to display the posts on a page, the BBCode is shown raw: no HTML, no styling. I have searched its documentation pages for a long time but did not find any PHP parser. Could you please help me parse BBCode to HTML in PHP?
You can use SBBCodeParser. SCEditor and this class were written by the same person, so they should be more compatible with each other.

Parse HTML and replace content in DIV

I want to know how I can find a DIV tag in an HTML page, because I want to replace the links inside that DIV with different links. I don't know exactly what code I need.
First, note that PHP does nothing on the client side, though you probably know that already.
Use file_get_contents to read the web page into a string (or use whatever loader your HTML parsing library provides).
There is already a question that explains how to parse HTML in various ways: Robust and Mature HTML Parser for PHP
If that doesn't fit your needs, try searching Google for "php html parsing"; I found several libraries that way.
For example, this library lets you find all tags of a given kind: http://simplehtmldom.sourceforge.net/
That said, this is not a great approach. I suggest changing your HTML page into a PHP page and inserting some code in place of the A tags, which will make everything easier.
Finally, if the HTML page is static (it doesn't change), you can simply count lines: output the content from line X to line Y, insert your customized A tags, and then output from line J to the end of the file.
Good luck anyway.
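As a minimal sketch of the parsing approach, here is one way to do it with PHP's built-in DOM extension instead of the library linked above. The sample markup, the `links` id, and the `old.example` / `new.example` host names are assumptions made up for illustration.

```php
<?php
// Sample page; in practice this string might come from file_get_contents().
$html = '<html><body>'
      . '<div id="links"><a href="http://old.example/a">A</a> '
      . '<a href="http://old.example/b">B</a></div>'
      . '<p><a href="http://old.example/c">untouched</a></p>'
      . '</body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // real-world HTML is often not well-formed
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Select only the <a> elements inside the div with id="links".
foreach ($xpath->query('//div[@id="links"]//a[@href]') as $anchor) {
    $old = $anchor->getAttribute('href');
    $anchor->setAttribute('href', str_replace('old.example', 'new.example', $old));
}

// Serialize the modified document back to an HTML string.
$result = $doc->saveHTML();
echo $result;
```

Links outside the selected DIV are left alone because the XPath expression only matches anchors that are descendants of that DIV.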

Using PHP to retrieve information from a different site

I was wondering if there's a way to use PHP (or any other server-side or even client-side language, if possible) to obtain certain pieces of information from a different website (NOT a local file, as with include 'nav.php').
What I mean is that...Say I have a blog at www.blog.com and I have another website at www.mysite.com
Is there a way to gather ALL of the h2 links from www.blog.com and put them in a div in www.mysite.com?
Also, is there a way I could grab the entire information inside a DIV (with an ID of-course) from blog.com and insert it in mysite.com?
Thanks,
Amit
First of all, if you want to retrieve content from a blog, check whether the blog platform (e.g. Blogger, WordPress) has an API, so you don't have to reinvent the wheel. Good APIs usually come with good documentation (though probably only about 5% of all APIs are good ones), and that documentation should include code examples for popular languages such as PHP, JavaScript, and Java. Again, if the goal is to retrieve content from a blog, there should be plenty of frameworks to help you.
Check out the PHP Simple HTML DOM library
Can be as easy as:
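// Load the Simple HTML DOM library (adjust the path to where you saved it)
include 'simple_html_dom.php';
// Create DOM from URL or file
$html = file_get_html('http://www.otherwebsite.com/');
// Find all h2 headings and print their text
foreach ($html->find('h2') as $element) {
    echo $element->plaintext . "\n";
}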
This can be done by opening the remote website as a file, then taking the HTML and using the DOM parser to manipulate it.
$site_html = file_get_contents('http://www.example.com/');
$document = new DOMDocument();
$document->loadHTML($site_html);
$all_of_the_h2_tags = $document->getElementsByTagName('h2');
Read more about PHP's DOM functions for what to do from here, such as grabbing other tags, creating new HTML out of bits and pieces of the DOM, and displaying that on your own site.
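To illustrate where to go from there, the sketch below collects the link inside each h2 and pulls out one DIV by its id. It uses an inline sample page so it runs offline; the markup and the `about` id are assumptions, and a live page would be loaded with file_get_contents() as in the snippet above.

```php
<?php
// Inline sample so the sketch runs without network access.
$siteHtml = '<html><body>'
          . '<h2><a href="/post-1">First post</a></h2>'
          . '<h2><a href="/post-2">Second post</a></h2>'
          . '<div id="about"><p>About this blog</p></div>'
          . '</body></html>';

$document = new DOMDocument();
libxml_use_internal_errors(true);   // tolerate sloppy real-world markup
$document->loadHTML($siteHtml);
libxml_clear_errors();

$xpath = new DOMXPath($document);

// Collect the link inside each <h2> as href => link text.
$headlines = [];
foreach ($xpath->query('//h2/a') as $link) {
    $headlines[$link->getAttribute('href')] = trim($link->textContent);
}

// Grab one element by id and serialize just that node back to HTML.
$aboutNode = $xpath->query('//div[@id="about"]')->item(0);
$aboutHtml = $aboutNode ? $document->saveHTML($aboutNode) : '';

print_r($headlines);
echo $aboutHtml, PHP_EOL;
```

The resulting array and HTML fragment can then be echoed into a DIV on your own page.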
Your first step would be to use cURL to request the page you want from the other site and bring down its HTML. Then comes parsing the HTML to find the content you're looking for. You could use a bunch of regular expressions and probably get the job done, but the Stack Overflow crew might frown at you. Alternatively, you could pass the resulting HTML to a DOMDocument object via loadHTML and pull out the content you want.
Also, if you control both sites, you can set up a special page on the first site (www.blog.com) with exactly the information you need, properly formatted either in HTML you can output directly, or XML that you can manipulate more easily from www.mysite.com.

How to know if the website being scraped has changed?

I'm using PHP to scrape a website and collect some data. It's all done without regex; I'm using PHP's explode() function to find particular HTML tags instead.
If the structure of the website changes (CSS, HTML), the scraper may collect the wrong data. So the question is: how do I know whether the HTML structure has changed? How can I detect this before storing anything in my database, so that wrong data is never saved?
I don't think there is any clean solution when you are scraping a page whose content changes.
I have developed several Python scrapers, and I know how frustrating it can be when a site makes even a subtle change to its layout.
You could try a solution à la mechanize (I don't know the PHP counterpart), and if you are lucky you could isolate the content you need to extract (links?).
Another possible approach would be to code some constraints and check them before storing anything in the database.
For example, if you are scraping URLs, verify that what the scraper has parsed is formally a valid URL; do the same for integer IDs or anything else you scrape that can be recognized as valid.
If you are scraping plain text, it will be more difficult to check.
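A minimal sketch of such constraint checks, using PHP's built-in filter_var. The row shape (a `url` and an `id` per scraped record) and the sample values are assumptions made up for illustration.

```php
<?php
// Hypothetical record shape: each scraped row has a 'url' and an 'id'.
function isValidScrapedRow(array $row): bool
{
    $urlOk = isset($row['url'])
        && filter_var($row['url'], FILTER_VALIDATE_URL) !== false;

    // IDs must be positive integers.
    $idOk = isset($row['id'])
        && filter_var($row['id'], FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]) !== false;

    return $urlOk && $idOk;
}

$rows = [
    ['url' => 'http://example.com/item/1', 'id' => '1'],
    ['url' => '<div class="price">',       'id' => '2'],   // markup leaked in: layout probably changed
    ['url' => 'http://example.com/item/3', 'id' => 'n/a'], // ID column no longer numeric
];

// Keep only rows that pass every constraint; store nothing else.
$accepted = array_values(array_filter($rows, 'isValidScrapedRow'));
$rejected = count($rows) - count($accepted);

echo 'accepted: ', count($accepted), ', rejected: ', $rejected, PHP_EOL;
```

A sudden jump in the rejected count is itself a useful signal that the page layout has changed.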
It depends on the site, but you could count the number of page elements in the scraped page, such as div tags and class and style attributes, and then compare these totals against those of later scrapes to detect whether the page structure has changed.
A similar process could be used for the CSS file: extract the name of each class or id with a simple regex, store the list, and check it as needed. If this list has new additions, the page structure has almost certainly changed somewhere on the site being scraped.
Speaking out of my ass here, but it's possible you might want to look at some of PHP's Document Object Model methods.
http://php.net/manual/en/book.dom.php
If my very, very limited understanding of DOM is correct, a change in HTML site structure would change the Document Object Model, but a simple content change within a fixed structure wouldn't. So, if you could capture the DOM state, and then compare it at each scrape, couldn't you in theory determine that such a change has been made?
(By the way, the way I did this when I was trying to get an email notification when the bar exam results were posted on a particular page was just compare file_get_contents() values. Surprisingly, worked flawlessly: No false positives, and emailed me as soon as the site posted the content.)
If you want to detect changes in structure, I think the best way is to store the DOM structure of your first page and then compare it with the new one.
There are a lot of ways you can do it:
a SAX parser
a DOM parser, etc.
I have a small blog post that gives some pointers on what I mean:
http://let-them-c.blogspot.com/2009/04/xml-as-objects-in-oops.html
Or you can use http://en.wikipedia.org/wiki/Simple_API_for_XML or a DOM utility parser.
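One way to "store the DOM structure and compare it" is to reduce each page to a signature built from element names only, ignoring text and attributes. The sketch below assumes PHP's built-in DOM extension; the function names and sample pages are hypothetical.

```php
<?php
// Describe only the element tree: tag names and nesting, no text or attributes.
function structureSignature(DOMNode $node): string
{
    $parts = [];
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $parts[] = $child->nodeName . '(' . structureSignature($child) . ')';
        }
    }
    return implode(',', $parts);
}

function signatureOfHtml(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate sloppy real-world markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    return structureSignature($doc);
}

$baseline       = signatureOfHtml('<html><body><div><p>Price: 10</p></div></body></html>');
$contentChanged = signatureOfHtml('<html><body><div><p>Price: 12</p></div></body></html>');
$layoutChanged  = signatureOfHtml('<html><body><table><tr><td>Price: 12</td></tr></table></body></html>');

// Same tags with new text keeps the signature; a new layout changes it.
var_dump($baseline === $contentChanged);
var_dump($baseline === $layoutChanged);
```

Storing the baseline signature (or a hash of it) alongside the scraped data makes the comparison on the next run a single string check.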
First, in some cases you may want to compare hashes of the original to the new html. MD5 and SHA1 are two popular hashes. This may or may not be valid in all circumstances but is something you should be familiar with. This will tell you if something has changed - content, tags, or anything.
To understand if the structure has changed you would need to capture a histogram of the tag occurrences and then compare those. If you care about tags being out of order then you would have to capture a tree of the tags and do a comparison to see if the tags occur in the same order. This is going to be very specific to what you want to achieve.
PHP Simple HTML DOM Parser is a tool which will help you parse the HTML.
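The two checks described above can be sketched as follows, using PHP's built-in DOM extension rather than Simple HTML DOM. The `tagHistogram` helper and the sample pages are assumptions for illustration: the hash reports any change at all, while the histogram only reacts when the mix of tags changes.

```php
<?php
// Count how often each tag name occurs in the document.
function tagHistogram(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate sloppy real-world markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $counts = [];
    foreach ($doc->getElementsByTagName('*') as $element) {
        $counts[$element->nodeName] = ($counts[$element->nodeName] ?? 0) + 1;
    }
    ksort($counts);   // stable key order so two histograms compare cleanly
    return $counts;
}

$previous = '<html><body><div><a href="/a">A</a><a href="/b">B</a></div></body></html>';
$current  = '<html><body><div><a href="/a">A</a><a href="/z">Z</a></div></body></html>';

// Hash comparison: true if anything at all differs (content, tags, attributes).
$anythingChanged = sha1($previous) !== sha1($current);

// Histogram comparison: true only if the tag counts differ.
$structureChanged = tagHistogram($previous) !== tagHistogram($current);

var_dump($anythingChanged, $structureChanged);
```

A histogram ignores tag order, so if order matters a tree comparison is still needed, as noted above.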
Explode() is not an HTML parser, but you want to know about changes in the HTML structure. That's going to be tricky. Try using an HTML parser. Nothing else will be able to do this properly.

What's the best way to pass HTML embed code via an RSS feed to an RSS parser in PHP?

I'm trying to put the HTML embed code for a Flash video into an RSS feed, which will then be parsed by a parser (Magpie) on my other site. How should I encode the embed code on one side and decode it on the other, so that I can insert clean HTML into the DB on the receiving server?
Since RSS is XML, you might want to check out CDATA, which I believe is valid in the various RSS specs.
<summary><![CDATA[Data Here]]></summary>
Here's the w3schools entry on it: http://www.w3schools.com/XML/xml_cdata.asp
htmlencode/htmldecode should do the trick.
I've been using htmlentities/html_entity_decode, but for some reason it doesn't work with the parser. In a normal test it works, but the parser always returns the HTML code without the < > " characters.
RSS is XML. It has very specific rules for encoding HTML. If you're generating it, I'd recommend using an xml library to write the node containing HTML, to be sure you get the encoding right.
HTMLencode only performs the escaping needed to embed data within HTML; XML's rules are stricter.
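As a rough sketch of letting an XML library do the encoding, the example below writes the embed code into a CDATA section with PHP's DOMDocument and reads it back. The feed is trimmed down and is not a complete, valid RSS channel; the embed URL is made up, and the receiving side uses DOMDocument as a stand-in for Magpie.

```php
<?php
$embedHtml = '<embed src="http://example.com/video.swf" width="400" height="300"></embed>';

// Sending side: build the feed with DOM calls instead of string concatenation.
$feed    = new DOMDocument('1.0', 'UTF-8');
$rss     = $feed->appendChild($feed->createElement('rss'));
$rss->setAttribute('version', '2.0');
$channel = $rss->appendChild($feed->createElement('channel'));
$item    = $channel->appendChild($feed->createElement('item'));
$item->appendChild($feed->createElement('title', 'Sample video'));

// The CDATA section carries the markup verbatim; no manual escaping needed.
$description = $item->appendChild($feed->createElement('description'));
$description->appendChild($feed->createCDATASection($embedHtml));

$xml = $feed->saveXML();

// Receiving side: an XML parser hands the original markup back unchanged.
$parsed = new DOMDocument();
$parsed->loadXML($xml);
$received = $parsed->getElementsByTagName('description')->item(0)->textContent;

echo $received, PHP_EOL;
```

The received string is the original embed code, so it can go into the database without any entity decoding step.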
Instead of writing your own RSS XML feed, consider using the Django syndication framework from django.contrib.syndication:
https://docs.djangoproject.com/en/dev/ref/contrib/syndication/
It also supports enclosures, which is the RSS way for embedding images or video.
For custom tags, there is also a low-level API which allows you to change the XML:
https://docs.djangoproject.com/en/dev/ref/contrib/syndication/#the-low-level-framework
