Simple HTML Parser issue


Hi, I'm trying to parse the ratemyprofessor website for each professor's name and comments and convert each div into plaintext. Here is the div class structure I'm working with:
<div id="ratingTable">
<div class="ratingTableHeader"></div>
<div class="entry odd"><a name="18947089"></a>
<div class="date">
8/24/11 // the date which I want to parse
</div><div class="class"><p>
ENGL2323 // the class which I want to parse
</p></div><div class="rating"></div><div class="comment" style="width:350px;">
<!-- comment section -->
<p class="commentText"> // this is what I want to parse as plaintext for each entry
I have had Altimont for 4 classes. He is absolutely one of my favorite professors at St. Ed's. He's generous with his time, extremely knowledgeable, and such an all around great guy to know. Having class with him he would always have insightful comments on what we were reading, and he speaks with a lot of passion about literature. Just the best!
</p><div class="flagsIcons"></div></div>
<!-- closes comment -->
</div>
<!-- closes even or odd -->
<div class="entry even"></div> // these divs are the entries for each professor
<!-- closes even or odd -->
<div class="entry odd"></div>
<!-- closes even or odd -->
</div>
<!-- closes rating table -->
So every entry is wrapped in this "ratingTable" div, and each entry is either an "entry odd" or an "entry even" div.
Here is my attempt so far, but it just produces a huge garbled array with a lot of garbage.
<?php
header('Content-type: text/html; charset=utf-8'); // this just makes sure encoding is right
include('simple_html_dom.php'); // the parser library
$html = file_get_html('http://www.ratemyprofessors.com/SelectTeacher.jsp?sid=834'); // the url for the teacher rating profile
//first attempt, rendered nothing though
foreach($html->find("div[class=commentText]") as $content){
echo $content.'<hr />';
}
foreach($html->find("div[class=commentText]") as $content){
$content = <div class="commentText"> // first_child() should be the <p>
echo $content->first_child().'<hr />';
//Get the <p>'s following the <div class="commentText">
$next = $content->next_sibling();
while ($next->tag == 'p') {
echo $next.'<hr />';
$next = $next->next_sibling();
}
}
?>

Confusing HTML... Could you try and see if this works?
foreach (DOM($html, '//div[@class="commentText"]//div[contains(@class,"entry")]') as $comment)
{
echo strval($comment);
}
Oh, and yeah - I don't like simple_html_dom, use this instead:
function DOM($html, $xpath = null, $key = null, $default = false)
{
if (is_string($html) === true)
{
$dom = new \DOMDocument();
if (libxml_use_internal_errors(true) === true)
{
libxml_clear_errors();
}
if (@$dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8')) === true)
{
return DOM(simplexml_import_dom($dom), $xpath, $key, $default);
}
}
else if (is_object($html) === true)
{
if (isset($xpath) === true)
{
$html = $html->xpath($xpath);
}
if (isset($key) === true)
{
if (is_array($key) !== true)
{
$key = explode('.', $key);
}
foreach ((array) $key as $value)
{
$html = (is_object($html) === true) ? get_object_vars($html) : $html;
if ((is_array($html) !== true) || (array_key_exists($value, $html) !== true))
{
return $default;
}
$html = $html[$value];
}
}
return $html;
}
return false;
}
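For comparison, here is a minimal sketch of the same extraction with PHP's built-in DOMDocument and DOMXPath, run against a trimmed copy of the markup from the question. The function name extract_entries() and the sample HTML are assumptions made for illustration; the live page may be structured differently.

```php
<?php
// Trimmed sample of the structure shown in the question (not the live page).
$html = <<<'HTML'
<div id="ratingTable">
<div class="entry odd">
<div class="date">8/24/11</div>
<div class="class"><p>ENGL2323</p></div>
<div class="comment"><p class="commentText">Just the best!</p></div>
</div>
</div>
HTML;

// Hypothetical helper: returns one array per "entry" div.
function extract_entries(string $html): array
{
    $dom = new DOMDocument();
    // Collect parser warnings about sloppy HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $entries = [];
    // contains() matches both "entry odd" and "entry even".
    foreach ($xpath->query('//div[@id="ratingTable"]/div[contains(@class, "entry")]') as $entry) {
        $entries[] = [
            'date'    => trim($xpath->evaluate('string(.//div[@class="date"])', $entry)),
            'class'   => trim($xpath->evaluate('string(.//div[@class="class"])', $entry)),
            'comment' => trim($xpath->evaluate('string(.//p[@class="commentText"])', $entry)),
        ];
    }
    return $entries;
}

$entries = extract_entries($html);
print_r($entries);
```

Note that the comment text sits in a p element with class commentText, inside a div with class comment, so the XPath has to target p rather than div.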

If you still want to use simple_html_dom, see the code below for the mistakes in your code:
<?php
header('Content-type: text/html; charset=utf-8'); // this just makes sure encoding is right
include('simple_html_dom.php'); // the parser library
// You were trying to parse the wrong link: your previous link did not have a <div> tag with the commentText class. I chose a random link. Choose the link for whichever professor you like, or grab the professor links from the previous page, store them in an array and loop through them to get the comments.
$html = file_get_html('http://www.ratemyprofessors.com/ShowRatings.jsp?tid=1398302'); // the url for the teacher rating profile
//first attempt, rendered nothing though
//your div tag has class "comment" not "commentText"
foreach($html->find("div[class=comment]") as $content){
echo $content.'<hr />';
}
foreach($html->find("div[class=comment]") as $content){
// I am not sure what you are trying to do here, but whatever it is, it's wrong
//$content = <div class='commentText'>"; // first_child() should be the <p>
//echo $content->first_child().'<hr />';
//correct way to do it
echo $content->first_child().'<hr />';
//this whole block does not make much sense since you are already retrieving the comments in the loop above, but if you still want to use it, I can figure out what to do
//Get the <p>'s following the <div class="commentText">
// $next = $html->firstChild()->next_sibling();
// while ($next->tag == 'p') {
// echo $next.'<hr />';
// $next = $next->next_sibling();
// }
}
?>
Output
Comment
Dr.Alexander was the best. I would recommend him for American Experience or any class he teaches really. He is an amazing professor and one of the nicest most kind hearted people i've ever met.
Report this rating
Professor Alexander is SO great. I would recommend him to everyone for american experience. He has a huge heart and he's really interested in getting to know his students as actual people. The class isn't difficult and is super interesting. He's amazing.
Report this rating

Related

Fail to using if/else to hover image by dynamic url

I'm using a basic approach to highlight ("hover") the menu image because the CSS method doesn't work for me. Currently I use an if/else statement: if the request URL matches something like abc.com, the image is shown in its active state.
Right now only the exact groups URL is highlighted. If there are subcategories inside groups, the image is not highlighted. How can I keep the image highlighted for all activity inside the group?
How do I test whether the URL contains certain words or a path? For example, abc.com/groups/* should highlight the groups icon, similar to searching in MySQL with "%" as a wildcard.
<?php
$request_url = apache_getenv("HTTP_HOST") . apache_getenv("REQUEST_URI");
$e = 'abc.com/dev/';
$f = 'abc.com/dev/groups/';
$g = 'abc.com/dev/user/';
?>
<div class="submenu">
<?php
if ($request_url == $e) {
echo '<div class="icon-home active"></div>';
} else {
echo '<div class = "icon-home"></div>';
}
?>
<?php
if ($request_url == $f) {
echo '<div class="icon-groups active"></div>';
} else {
echo '<div class = "icon-groups"></div>';
}
?>
</div>

I propose a JavaScript way to do so, with jQuery:
$("a[href*='THE_URL_PATTERN_YOU_WANT_TO_MATCH']").children(".icon-home").addClass("active");
BTW, it is NOT a good idea to wrap a div in an a tag.
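If you would rather stay in PHP, a prefix test does what the "%" wildcard does in MySQL. The following is only a sketch: url_starts_with() and active_icon() are hypothetical helpers, and the icon-user class is an assumption (the question defines a user URL but shows no icon for it).

```php
<?php
// Hypothetical helper: true when $url begins with $prefix.
function url_starts_with(string $url, string $prefix): bool
{
    return strncmp($url, $prefix, strlen($prefix)) === 0;
}

// Hypothetical helper: decide which menu icon should get the "active" class.
function active_icon(string $requestUrl): string
{
    // Check the more specific sections first; anything below them still matches.
    if (url_starts_with($requestUrl, 'abc.com/dev/groups/')) {
        return 'icon-groups';
    }
    if (url_starts_with($requestUrl, 'abc.com/dev/user/')) {
        return 'icon-user'; // assumed class name, not in the question
    }
    // The home URL is a prefix of every other URL, so it needs an exact match.
    if ($requestUrl === 'abc.com/dev/') {
        return 'icon-home';
    }
    return '';
}

$active = active_icon('abc.com/dev/groups/42/activity');
foreach (['icon-home', 'icon-groups'] as $icon) {
    $class = $icon . ($icon === $active ? ' active' : '');
    echo '<div class="' . $class . '"></div>' . "\n";
}
```

The home check stays an exact comparison because abc.com/dev/ is itself the start of the groups and user URLs.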

PHP replacing 'editable' areas in html files

I'm working on a tool to replace tagged areas in a html document. I've had a look at a few php template systems, but they are not really what I am looking for, so here is what I am after as the "engine" of the system. The template itself has no php and I'm searching the file for the keyword 'editable' to set the areas that are updatable. I don't want to use a database to store anything, instead read everything from the html file itself.
It still has a few areas to fix, but most importantly, I need the part where it iterates over the array of 'editable' regions and updates the template file.
Here is test.html (template file for testing purposes):
<html>
<font class="editable">
This is editable section 1
</font>
<br><br><hr><br>
<font class="editable">
This is editable section 2
</font>
</html>
I'd like to be able to update the 'editable' sections via a set of form textareas. This still needs a bit of work, but here is as far as I've got:
<?php
function g($string,$start,$end){
preg_match_all('/' . preg_quote($start, '/') . '(.*?)'. preg_quote($end, '/').'/i', $string, $m);
$out = array();
foreach($m[1] as $key => $value){
$type = explode('::',$value);
if(sizeof($type)>1){
if(!is_array($out[$type[0]]))
$out[$type[0]] = array();
$out[$type[0]][] = $type[1];
} else {
$out[] = $value;
}
}
return $out;
};
// GET FILES IN DIR
$directory="Templates/";
// create a handler to the directory
$dirhandler = opendir($directory);
// read all the files from directory
$i=0;
while ($file = readdir($dirhandler)) {
// if $file isn't this directory or its parent
//add to the $files array
if ($file != '.' && $file != '..')
{
$files[$i]=$file;
//echo $files[$i]."<br>";
$i++;
}
};
//echo $files[0];
?>
<div style="float:left; width:300px; height:100%; background-color:#252525; color:#cccccc;">
<form method="post" id="Form">
Choose a template:
<select>
<?php
// Dropdown of files in directory
foreach ($files as $file) {
echo "<option>".$file."</option>"; // do something to make this $file the selection. Refresh page and populate fields below with the editable areas of the $file html
};
?>
</select>
<br>
<hr>
Update these editable areas:<br>
<?php
$file = 'test.html'; // make this fed from form dropdown (list of files in $folder directory)
$html = file_get_contents($file);
$start = 'class="editable">';
$end = '<';
$oldText = g($html,$start,$end);
$i = 0;
foreach($oldText as $value){
echo '<textarea value="" style="width: 60px; height:20px;">'.$oldText[$i].'</textarea>'; // create a <textarea> that will update the editable area with changes
// something here
$i++;
};
// On submit, update all $oldText values in test.html with new values.
?>
<br><hr>
<input type="submit" name="save" value="Save"/>
</form>
</div>
<div style="float:left; width:300px;">
<?php include $file; // preview the file from dropdown. The editable areas should update when <textareas> are updated ?>
</div>
<div style="clear:both;"></div>
I know this question is a little more involved, but I'd really appreciate any help.

Not sure if I correctly understand what you want to achieve, but it seems to me I would do that in jQuery.
You can get all HTML elements that have the "editable" class like this:
$(".editable")
You can iterate on them with :
$(".editable").each(function(index){
alert($(this).text()); // or .html()
// etc... do your stuff
});
If you have all your data in a PHP array, you just need to pass it to the client as JSON. Use a PHP print inside a JavaScript script tag.
<?php
print "var phparray = " . json_encode($myphparray);
?>
I think it would be better to put the work on the client side (JavaScript). It will lower the server workload (PHP).
But as I said, I don't think I've grasped everything you wanted to achieve.
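If the file itself has to be updated on the server, the step the question is missing (walk the editable regions in order and write the new text back) could be sketched in PHP like this. replace_editable() is a hypothetical helper, and it assumes, like the question's g() function, that each region is plain text ending at the next "<".

```php
<?php
// Hypothetical helper: swap the text of each class="editable" region, in
// document order, for the matching entry of $newTexts.
function replace_editable(string $html, array $newTexts): string
{
    $i = 0;
    return preg_replace_callback(
        // Group 1: region start, group 2: old text, group 3: the next "<".
        // The "s" flag lets "." cross line breaks inside a region.
        '/(class="editable">)(.*?)(<)/is',
        function (array $m) use (&$i, $newTexts) {
            // Keep the old text when no replacement was supplied for this region.
            $text = array_key_exists($i, $newTexts)
                ? htmlspecialchars($newTexts[$i])
                : $m[2];
            $i++;
            return $m[1] . $text . $m[3];
        },
        $html
    );
}

$template = '<font class="editable">Old 1</font><hr><font class="editable">Old 2</font>';
$updated  = replace_editable($template, ['New 1', 'New 2']);
echo $updated;
```

On submit you would pass the posted textarea values as $newTexts and write the result back with file_put_contents(); that needs a name attribute such as name="areas[]" on each textarea, which the form in the question does not have yet.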

How do I process XML string in order using PHP

I have an XML file which contains text with some very simple layout constructs:
<?xml version='1.0'?>
<page>
<section>
<header>Header</header>
<par>Some paragraph</par>
<par>Another paragraph with <emph>formatting</emph></par>
</section>
</page>
In PHP then I read this file using SimpleXML (Note that I intentionally strip other tags!):
$page = file_get_contents("page.xml");
if ($page) {
$stripped = strip_tags($page, "<?xml><page><section><header><par><emph>");
$xml = new SimpleXMLElement($stripped);
}
Now I would like to iterate over the XML elements and print them in order as HTML for my website. The final result should be the following snippet:
<h1>Header</h1>
<p>Some paragraph
<p>Another paragraph with <i>formatting</i>
I've noodled through SimpleXML and XPath and tried to figure out how I can iterate over the XML tree in order so that I can digest the original XML file into HTML output. I can produce a somewhat desired result, but the <emph></emph> is just gone; how do I descend further into the tree? My code so far:
foreach ($xml->section as $s) {
echo "<h1>" . $s->header . "</h1>";
foreach ($s->par as $p) {
echo "<p>" . $p;
// Do some magic here to ensure <emph> tags are recognized and responded to properly.
}
}
Any hints and pointers are appreciated! Thanks :-)

Well, without an answer I just had to noodle myself :-) So here is what I did and it worked out just fine.
Turned out that the SimpleXML thing didn't cut it, so I used the XMLReader:
$xml = new XMLReader();
Then I manually parsed the XML string, jumped from element to element and acted upon each of them:
if ($xml->xml($stripped)) { // $stripped here is a string that's been validated (see below).
while (false !== $xml->read()) {
$t = $xml->nodeType;
if ($t === XMLReader::ELEMENT) {
$n = $xml->name;
switch ($n) {
case "page":
case "section":
// Nothing to echo here.
break;
case "header":
// Handle attributes here
echo "<h1>";
break;
case "par":
echo "<p> ";
break;
case "emph":
echo "<i>"; // This can also open a <span> for more flexibility later.
break;
default:
// Nothing should arrive here.
echo "Gah!";
}
}
else if ($t === XMLReader::END_ELEMENT) {
... // Close the opened tags here.
}
else if ($t === XMLReader::TEXT) {
$s = $xml->readString();
echo $s;
}
else {
// Everything else are comments or white spaces.
}
}
}
You get the drift. I basically had to bounce through the XML structure myself and, dependent on the element type, handle attributes and nodes of elements manually.
In fact, this is a two-step process. What you see here assumes a valid XML document. I also have a validator that runs before the above code, and which makes sure that the correct elements are nested properly and that the given XML is "well formed" as per my own definitions of nesting, attributes, whatnot. The validator operates on the exact same principle.
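For readers who want something they can run, here is a condensed sketch of the same XMLReader loop that also closes the tags. The function name xml_to_html() and the tag mapping are assumptions based on the elements in the question, and the validation step is left out.

```php
<?php
// Hypothetical helper: stream through the XML and emit HTML in document order.
function xml_to_html(string $xmlString): string
{
    // Assumed mapping from the question's element names to HTML tags;
    // unmapped elements such as page and section produce no output.
    $map = ['header' => 'h1', 'par' => 'p', 'emph' => 'i'];

    $reader = new XMLReader();
    if (!$reader->XML($xmlString)) {
        return '';
    }

    $out = '';
    while ($reader->read()) {
        $tag = $map[$reader->name] ?? null;
        if ($reader->nodeType === XMLReader::ELEMENT && $tag !== null) {
            $out .= "<$tag>";
            // An empty element such as <par/> has no END_ELEMENT event.
            if ($reader->isEmptyElement) {
                $out .= "</$tag>";
            }
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT && $tag !== null) {
            $out .= "</$tag>";
        } elseif ($reader->nodeType === XMLReader::TEXT) {
            $out .= htmlspecialchars($reader->value);
        }
        // Whitespace-only nodes and comments are skipped.
    }
    $reader->close();
    return $out;
}

$xml = '<page><section><header>Header</header>'
     . '<par>Another paragraph with <emph>formatting</emph></par></section></page>';
$result = xml_to_html($xml);
echo $result;
```

Because the reader reports text and nested elements in the order they appear, the emph inside par comes out in the right place without any recursion.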
Hope this helps.

Creating Views in PHP - Best Practice [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
I am working on a website with 2 other developers. I am only responsible for creating the views.
The data is available in an object, and I have getters to read the data then create XHTML pages.
What is the best practice to do this, without using any template engine?
Thanks a lot.

If you don't want to use a templating engine, you can make use of PHP's basic templating capabilities.
Actually, you should just write the HTML, and whenever you need to output a variable's value, open a PHP part with <?php and close it with ?>. I will assume for the examples that $data is your data object.
For example:
<div id="fos"><?php echo $data->getWhatever(); ?></div>
Please note that all PHP control structures (like if, foreach, while, etc.) also have an alternative syntax that can be used for templating. You can look these up in their PHP manual pages.
For example:
<div id="fos2">
<?php if ($data->getAnother() > 0) : ?>
<span>X</span>
<?php else : ?>
<span>Y</span>
<?php endif; ?>
</div>
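If you know that short tags will be enabled on the server (the short_open_tag setting), you can use them for simplicity as well (not advised in XML and XHTML, where <? clashes with processing instructions such as <?xml). With short tags, you can open your PHP part with <? and close it with ?>. Also, <?=$var?> is a shorthand for echoing something; since PHP 5.4 it is always available, regardless of short_open_tag.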
First example with short tags:
<div id="fos"><?=$data->getWhatever()?></div>
You should be aware of where you use line breaks and spaces though. The browser will receive the same text you write (except the PHP parts). What I mean by this:
Writing this code:
<?php
echo '<img src="x.jpg" alt="" />';
echo '<img src="y.jpg" alt="" />';
?>
is not equivalent to this one:
<img src="x.jpg" alt="" />
<img src="y.jpg" alt="" />
Because in the second one you have an actual \n between the img elements, which will be translated by the browser as a space character and displayed as an actual space between the images if they are inline.
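Putting these pieces together, a view that loops over data could look like the sketch below. The Data class, its getItems() getter and render_list() are hypothetical stand-ins for the real data object and view file; output buffering is used here only so the result can be captured as a string.

```php
<?php
// Hypothetical data object standing in for the $data object from the question.
class Data
{
    public function getItems(): array
    {
        return ['Tom & Jerry', 'Alice'];
    }
}

// Hypothetical helper: render the list template and return the markup.
function render_list(Data $data): string
{
    ob_start();
    ?>
<ul>
<?php foreach ($data->getItems() as $item) : ?>
<li><?php echo htmlspecialchars($item); ?></li>
<?php endforeach; ?>
</ul>
<?php
    return ob_get_clean();
}

$markup = render_list(new Data());
echo $markup;
```

Passing each value through htmlspecialchars() keeps characters such as & and < from breaking the XHTML.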

Use a separate file to read the data:
<?php
if ($foo == False)
{
$bar = 1;
}
else
{
$bar = 0;
}
?>
Then reference the resulting state in the HTML file:
<?php require 'logic.php'; ?>
<html>
<!--...-->
<input type="text" value="<?php echo $bar; ?>" > <!-- Logic is separated from markup -->
<!--...-->
</html>

I'm not sure I fully understood your question, so if my answer misses the point I'm happy to delete it.
This class creates a simple view:
class View
{
public function render($filename, $render_without_header_and_footer = false)
{
// page without header and footer, for whatever reason
if ($render_without_header_and_footer == true) {
require VIEWS_PATH . $filename . '.php';
} else {
require VIEWS_PATH . '_templates/header.php';
require VIEWS_PATH . $filename . '.php';
require VIEWS_PATH . '_templates/footer.php';
}
}
private function checkForActiveController($filename, $navigation_controller)
{
$split_filename = explode("/", $filename);
$active_controller = $split_filename[0];
if ($active_controller == $navigation_controller) {
return true;
}
// default return
return false;
}
private function checkForActiveAction($filename, $navigation_action)
{
$split_filename = explode("/", $filename);
$active_action = $split_filename[1];
if ($active_action == $navigation_action) {
return true;
}
// default return of not true
return false;
}
private function checkForActiveControllerAndAction($filename, $navigation_controller_and_action)
{
$split_filename = explode("/", $filename);
$active_controller = $split_filename[0];
$active_action = $split_filename[1];
$split_filename = explode("/", $navigation_controller_and_action);
$navigation_controller = $split_filename[0];
$navigation_action = $split_filename[1];
if ($active_controller == $navigation_controller AND $active_action == $navigation_action) {
return true;
}
// default return of not true
return false;
}
}
So now you can create your templates and render them from anywhere, like this:
$this->view->my_data = "data";
$this->view->render('index/index');
In your index/index.php you can then read the data with $this->my_data;

An Ajax request is taking 6 seconds to complete, not sure why

I am working on a user interface, "dashboard" of sorts which has some div boxes on it, which contain information relevant to the current logged in user. Their calendar, a todo list, and some statistics dynamically pulled from a google spreadsheet.
I found here:
http://code.google.com/apis/spreadsheets/data/3.0/reference.html#CellFeed
that specific cells can be requested from the sheet with a url like this:
spreadsheets.google.com/feeds/cells/0AnhvV5acDaAvdDRvVmk1bi02WmJBeUtBak5xMmFTNEE/1/public/basic/R3C2
I briefly looked into Zend GData, but it seemed way more complex than what I was trying to do.
So instead I wrote two php functions: (in hours.php)
1.) does a file_get_contents() of the generated url, based on the parameters row, column, and sheet
2.) uses the first in a loop to find which column number is associated with the given name.
So basically I do an ajax request using jQuery that looks like this:
// begin js function
function ajaxStats(fullname)
{
$.ajax({
url: "lib/dashboard.stats.php?name=" + encodeURIComponent(fullname),
cache: false,
success: function(html){
document.getElementById("stats").innerHTML = html;
}
});
}
// end js function
// begin file hours.php
<?php
function getCol($name)
{
$r=1;
$c=2;
while(getCell($r,$c,1) != $name)
{ $c++; }
return $c;
}
function getCell($r, $c, $sheet)
{
$baseurl = "http://spreadsheets.google.com/feeds/cells/";
$spreadsheet = "0AnhvV5acDaAvdDRvVmk1bi02WmJBeUtBak5xMmFTNEE/";
$sheetID = $sheet . "/";
$vis = "public/";
$proj = "basic/";
$cell = "R".$r."C".$c;
$url = $baseurl . $spreadsheet . $sheetID . $vis . $proj . $cell . "";
$xml = file_get_contents($url);
//Sometimes the data is not xml formatted,
//so lets try to remove the url
$urlLen = strlen($url);
$xmlWOurl = substr($xml, $urlLen);
//then find the Z (in the datestamp, assuming its always there)
$posZ = strrpos($xmlWOurl, "Z");
//then substr from z2end
$data = substr($xmlWOurl, $posZ + 1);
//if the result has more than ten characters then something went wrong
//And most likely it is xml formatted
if(strlen($data) > 10)
{
//Asuming we have xml
$datapos = strrpos($xml,"<content type='text'>");
$datapos += 21;
$datawj = substr($xml, $datapos);
$endcont = strpos($datawj,"</content>");
return substr($datawj, 0,$endcont);
}
else
return $data;
}
?>
//End hours.php
//Begin dashboard.stats.php
<?php
session_start();
// This file is requested using ajax from the main dashboard because it takes so long to load,
// as to not slow down the usage of the rest of the page.
if (!empty($_GET['name']))
{
include "hours.php";
// Get the column whose row-1 cell holds the user's name
$col = getCol($_GET['name']);
// then get cell from each of the sheets for that user,
// assuming they are in the same column of each sheet
$s1 = getcell(3, $col, 1);
$s2 = getcell(3, $col, 2);
$s3 = getcell(3, $col, 3);
$s4 = getcell(3, $col, 4);
// Store my loot in the session variables,
// so next time I want this, I don't need to fetch it
$_SESSION['fhrs'] = $s1;
$_SESSION['fdol'] = $s2;
$_SESSION['chrs'] = $s3;
$_SESSION['bhrs'] = $s4;
}
//print_r($_SESSION);
?>
<!-- and finally output the information formated for the widget-->
<strong>You have:</strong><br/>
<ul style="padding-left: 10px;">
<li> <strong><?php echo $_SESSION['fhrs']; ?></strong> fundraising hours<br/></li>
<li>earned $<strong><?php echo $_SESSION['fdol']; ?></strong> fundraising<br/></li>
<li> <strong><?php echo $_SESSION['chrs']; ?></strong> community service hours<br/></li>
<li> <strong><?php echo $_SESSION['bhrs']; ?></strong> build hours <br/></li>
</ul>
//end dashboard.stats.php
I think that where I am losing my 4 secs is the while loop in getCol() [hours.php].
How can I improve this and reduce my loading time?
Should I just scrap this and go to Zend GData?
If it is that while loop, should I try to store each user's column number from the spreadsheet in the user database that also authenticates login?

I didn't have a proper break in the while loop, so it continued looping even after it found the right person.
Plus, each request takes time to reach the Google spreadsheet: about .025 seconds per request.
I also spoke with a user of Zend GData, and they said that the requests weren't much faster.
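Roughly, the fix amounts to bounding the column search and remembering the result so the lookup runs once per user. This is a sketch, not my exact code: find_column(), cached_column(), the $fetchCell callable and the 50-column limit are made up for illustration, and the fake fetcher stands in for getCell() so the example runs without touching Google.

```php
<?php
// Hypothetical bounded version of getCol(): $fetchCell is any
// callable($row, $col, $sheet), for example the getCell() function above.
function find_column(string $name, callable $fetchCell, int $maxCols = 50): ?int
{
    for ($c = 2; $c <= $maxCols; $c++) {
        if ($fetchCell(1, $c, 1) === $name) {
            return $c; // stop as soon as the name is found
        }
    }
    return null; // unknown name: give up instead of looping forever
}

// Hypothetical cache wrapper: $cache could be $_SESSION or a database row.
function cached_column(string $name, callable $fetchCell, array &$cache): ?int
{
    if (!array_key_exists($name, $cache)) {
        $cache[$name] = find_column($name, $fetchCell);
    }
    return $cache[$name];
}

// Fake fetcher that counts how many "requests" were made.
$header = [2 => 'Ann', 3 => 'Bob', 4 => 'Cy'];
$calls  = 0;
$fake   = function (int $r, int $c, int $sheet) use ($header, &$calls): string {
    $calls++;
    return $header[$c] ?? '';
};

$cache  = [];
$first  = cached_column('Bob', $fake, $cache);
$second = cached_column('Bob', $fake, $cache); // served from the cache
echo "column $first after $calls requests\n";
```

With the column number cached, each dashboard load only needs the four getCell() calls for the statistics themselves.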
