Using Simple Html Dom to remove some elements

Using Simple Html Dom to remove some elements - php

This is the page I'm trying to parse using Simple Html Dom. I've gotten 90% of the functionality done, but since I'm new to the library I'm not quite sure to do this.
I want to scrape the text of each news item, but since the text is inside the <p>element, using something like ->innertext bring everything inside, including the link.
Here's what I've tried:
<h1>Scraper Noticias</h1>
<?php
include('simple_html_dom.php');
class News {
var $image;
var $fechanoticia;
var $title;
var $description;
var $sourceurl;
function get_image( ) {
return $this->image;
}
function set_image ($new_image) {
$this->image = $new_image;
}
function get_fechanoticia( ) {
return $this->fechanoticia;
}
function set_fechanoticia ($new_fechanoticia) {
$this->fechanoticia = $new_fechanoticia;
}
function get_title( ) {
return $this->title;
}
function set_title ($new_title) {
$this->title = $new_title;
}
function get_description( ) {
return $this->description;
}
function set_description ($new_description) {
$this->description = $new_description;
}
function get_sourceurl( ) {
return $this->sourceurl;
}
function set_sourceurl ($new_sourceurl) {
$this->sourceurl = $new_sourceurl;
}
}
// Create DOM from URL or file
$html = file_get_html('http://www.uvm.cl/noticias_mas.shtml');
$parsedNews = array();
// Find all news items.
foreach($html->find('#cont2 p') as $element) {
$newItem = new News;
// Parse the news item's thumbnail image.
foreach ($element->find('img') as $image) {
$newItem->set_image($image->src);
//echo $newItem->get_image() . "<br />";
}
// Parse the news item's post date.
foreach ($element->find('span.fechanoticia') as $fecha) {
$newItem->set_fechanoticia($fecha->innertext);
//echo $newItem->get_fechanoticia() . "<br />";
}
// Parse the news item's title.
foreach ($element->find('a') as $title) {
$newItem->set_title($title->innertext);
//echo $newItem->get_title() . "<br />";
}
// Parse the news item's source URL link.
foreach ($element->find('a') as $sourceurl) {
$newItem->set_sourceurl("http://www.uvm.cl/" . $sourceurl->href);
}
// Parse the news items' description text.
echo $link; //This is the entire <p> tag. How can I get just the text. Not the link?
}
?>

Here's a solution I found. Although if I can improve the code, it would be appreciated.
<h1>Scraper Noticias</h1>
<?php
include('simple_html_dom.php');
class News {
var $image;
var $fechanoticia;
var $title;
var $description;
var $sourceurl;
function get_image( ) {
return $this->image;
}
function set_image ($new_image) {
$this->image = $new_image;
}
function get_fechanoticia( ) {
return $this->fechanoticia;
}
function set_fechanoticia ($new_fechanoticia) {
$this->fechanoticia = $new_fechanoticia;
}
function get_title( ) {
return $this->title;
}
function set_title ($new_title) {
$this->title = $new_title;
}
function get_description( ) {
return $this->description;
}
function set_description ($new_description) {
$this->description = $new_description;
}
function get_sourceurl( ) {
return $this->sourceurl;
}
function set_sourceurl ($new_sourceurl) {
$this->sourceurl = $new_sourceurl;
}
}
// Create DOM from URL or file
$html = file_get_html('http://www.uvm.cl/noticias_mas.shtml');
$parsedNews = array();
// Find all news items.
foreach($html->find('#cont2 p') as $element) {
$newItem = new News;
// Parse the news item's thumbnail image.
foreach ($element->find('img') as $image) {
$newItem->set_image($image->src);
//echo $newItem->get_image() . "<br />";
}
// Parse the news item's post date.
foreach ($element->find('span.fechanoticia') as $fecha) {
$newItem->set_fechanoticia($fecha->innertext);
//echo $newItem->get_fechanoticia() . "<br />";
}
// Parse the news item's title.
foreach ($element->find('a') as $title) {
$newItem->set_title($title->innertext);
//echo $newItem->get_title() . "<br />";
}
// Parse the news item's source URL link.
foreach ($element->find('a') as $sourceurl) {
$newItem->set_sourceurl("http://www.uvm.cl/" . $sourceurl->href);
}
// Parse the news items' description text.
foreach ($element->find('a') as $link) {
$link->outertext = '';
}
foreach ($element->find('span') as $link) {
$link->outertext = '';
}
foreach ($element->find('img') as $link) {
$link->outertext = '';
}
echo $element->innertext;
}
?>

Use the innertext instead of outertext
foreach ($element->find('a') as $sourceurl) {
echo $sourceurl->innertext . "<br />";
}

Related

Template engine {var} doesnt work

I have made a template system but the {var} doesnt output the worth.
It just output {var}.
Here is my template class:
<?php
class Template {
public $assignedValues = array();
public $tpl;
function __construct($_path = '')
{
if(!empty($_path))
{
if(file_exists($_path))
{
$this->tpl = file_get_contents($_path);
}
else
{
echo 'Error: No template found. (code 25)';
}
}
}
function assign($_searchString, $_replaceString)
{
if(!empty($_searchString))
{
$this->assignedValues[strtoupper($_searchString)] = $_replaceString;
}
}
function show()
{
if(count($this->assignedValues) > 0)
{
foreach ($this->assignedValues as $key => $value)
{
$this->tpl = str_replace('{'.$key.'}', $value, $this->tpl);
}
}
echo $this->tpl;
}
}
?>
And here is what I execute on the index:
<?php
require_once('inc/classes/class.template.php');
define('PATH', 'tpl');
//new object
$template = new Template(PATH.'/test.tpl.html');
//assign values
$template->assign('title', 'Yupa');
$template->assign('about', 'Hello!');
//show the page
$template->show();
?>
I really need some help, if you can help I'd would be very grateful.

Instead of line:
$this->assignedValues[strtoupper($_searchString)] = $_replaceString;
You should have:
$this->assignedValues[$_searchString] = $_replaceString;
and it will work.
Of course I assume that inside your template file you have content:
{title} {about}

You should change
$this->assignedValues[strtoupper($_searchString)] = $_replaceString;
to this:
$this->assignedValues["{".$_searchString . "}"] = $_replaceString ;
this will only replace your keywords with values.

getattribute and tag value using dom

There are similar threads available. I look on them and tried to get. Here i can get the address value but phone number and image url is not comming.
Where I make mistake here?
function getData($url){
$containers1 = $html->find('div.mapbox div.mapbox-text strong.street-address address');
foreach($containers1 as $container)
{
$comments = $container->find('span');
$item = new stdClass();
foreach($comments as $comment)
{
$address.= $comment->plaintext; //append the content of each span
}
echo $address; // this gives correct result
}
$containers2 = $html->find('div.mapbox div.mapbox-text span.biz-phone');
$phone = $containers2->innertext;
echo "<br/>".$phone."<br/>"; // no result
$Imgcontainers = $html->find('div.js-photo photo photo-1 div.showcase-photo-box img');
echo $Imgcontainers->getAttribute('src'); // Fatal error: Call to a member function getAttribute() on a non-object
}
}
$url = 'http://www.yelp.com/biz/locanda-san-francisco?start=40';
$root = getData($url);
UPDATE
I added this:
$Imgcontainers = $html->find('div.photo-1 img');
foreach($Imgcontainers as $cont){
$img[] = $cont->getAttribute('src');
}
echo $img[0];
http://s3-media3.ak.yelpcdn.com/bphoto/Pf2RTD8pNdy-oUnp-57m4Q/ls.jpghttp://s3-media3.ak.yelpcdn.com/bphoto/Pf2RTD8pNdy-oUnp-57m4Q/ls.jpg
Same url twice? why twice though we echo only 0th array value?

Use this code
foreach($containers1 as $container)
{
$comments = $container->find('span');
$item = new stdClass();
foreach($comments as $comment)
{
$address.= $comment->plaintext; //append the content of each span
}
echo $address; // this gives correct result
}
$containers2 = $html->find('div.mapbox div.mapbox-text span.biz-phone');
foreach($containers2 as $contact){
$phone = $contact->plaintext;
}
echo "<br/>".$phone."<br/>"; // no result
$Imgcontainers = $html->find('div.photo-1 img');
foreach($Imgcontainers as $cont){
echo $cont->getAttribute('src');
}

RSS Parser to include Categories

I recently inherited a RSS/XML parser, and while it seems to work really good, I'm finding some things are missing.
For instance, pulling in a RSS feed from a blog. It's missing all the categories in the items. It shows as each item having only one category when in reality it should show as having a multitude of categories.
Link to Demo: http://dev.o7t.in/rss/
Link to Actual Feed: http://o7thblog.com/feed/
You can see how the first item in the feed itself has 8 total categories in the first item. (may need to view source)
However, in the Demo you can see that it only shows 1 category
Here is my entire code for the class:
<?php
class o7thRssFeedPuller{
public $FeedUrl = ''; // URL of the feed to pull in
public $ReturnJson = false; // Return the array as a JSON encoded string instead?
public $MaxItems = 0; // 0 = unlimited (except by feed), only applicable to GetItems
// Internal holders
private $document;
private $channel;
private $items;
// Get the full RSS feed
public function GetRSS($includeAttributes = false) {
// Pull in our feed
$this->loadParser(file_get_contents($this->FeedUrl, false, $this->randomContext()));
if($includeAttributes) {
// only if we are including attributes
return ($this->ReturnJson) ? json_encode($this->document) : $this->document;
}
// Return either an array or a json encoded string
return ($this->ReturnJson) ? json_encode($this->valueReturner()) : $this->valueReturner();
}
// Get the channel data
public function GetChannel($includeAttributes = false) {
// Pull in our feed
$this->loadParser(file_get_contents($this->FeedUrl, false, $this->randomContext()));
if($includeAttributes) {
// only if we are including attributes
return ($this->ReturnJson) ? json_encode($this->channel) : $this->channel;
}
// Return either an array or a json encoded string
return ($this->ReturnJson) ? json_encode($this->valueReturner($this->channel)) : $this->valueReturner($this->channel);
}
// Get the items
public function GetItems($includeAttributes=false) {
// Pull in our feed
$this->loadParser(file_get_contents($this->FeedUrl, false, $this->randomContext()));
if($includeAttributes) {
// only if we are including attributes
$arr = ($this->MaxItems == 0) ? $this->items : array_slice($this->items, 0, $this->MaxItems);
return ($this->ReturnJson) ? json_encode($arr) : $arr;
}
// Return either an array or a json encoded string
$arr = ($this->MaxItems == 0) ? $this->valueReturner($this->items) : array_slice($this->valueReturner($this->items), 0, $this->MaxItems);
return ($this->ReturnJson) ? json_encode($arr) : $arr;
}
// -------------------------------------------------------------------------------------------------
// Internal Methods
private function loadParser($rss=false) {
if($rss) {
$this->document = array();
$this->channel = array();
$this->items = array();
$DOMDocument = new DOMDocument;
$DOMDocument->strictErrorChecking = false;
$DOMDocument->loadXML($rss);
$this->document = $this->extractDOM($DOMDocument->childNodes);
}
}
private function valueReturner($valueBlock=false) {
if(!$valueBlock) {
$valueBlock = $this->document;
}
foreach($valueBlock as $valueName => $values) {
if(isset($values['value'])) {
$values = $values['value'];
}
if(is_array($values)) {
$valueBlock[$valueName] = $this->valueReturner($values);
} else {
$valueBlock[$valueName] = $values;
}
}
return $valueBlock;
}
private function extractDOM($nodeList,$parentNodeName=false) {
$itemCounter = 0;
foreach($nodeList as $values) {
if(substr($values->nodeName,0,1) != '#') {
if($values->nodeName == 'item') {
$nodeName = $values->nodeName.':'.$itemCounter;
$itemCounter++;
} else {
$nodeName = $values->nodeName;
}
$tempNode[$nodeName] = array();
if($values->attributes) {
for($i=0;$values->attributes->item($i);$i++) {
$tempNode[$nodeName]['properties'][$values->attributes->item($i)->nodeName] = $values->attributes->item($i)->nodeValue;
}
}
if(!$values->firstChild) {
$tempNode[$nodeName]['value'] = $values->textContent;
} else {
$tempNode[$nodeName]['value'] = $this->extractDOM($values->childNodes, $values->nodeName);
}
if(in_array($parentNodeName, array('channel','rdf:RDF'))) {
if($values->nodeName == 'item') {
$this->items[] = $tempNode[$nodeName]['value'];
} elseif(!in_array($values->nodeName, array('rss','channel'))) {
$this->channel[$values->nodeName] = $tempNode[$nodeName];
}
}
} elseif(substr($values->nodeName,1) == 'text') {
$tempValue = trim(preg_replace('/\s\s+/',' ',str_replace("\n",' ', $values->textContent)));
if($tempValue) {
$tempNode = $tempValue;
}
} elseif(substr($values->nodeName,1) == 'cdata-section'){
$tempNode = $values->textContent;
}
}
return (!isset($tempNode)) ? null : $tempNode;
}
// Load in a random header to pass
private function randomContext() {
$headerstrings = array();
$headerstrings['User-Agent'] = 'Mozilla/5.0 (Windows; U; Windows NT 5.'.rand(0,2).'; en-US; rv:1.'.rand(2,9).'.'.rand(0,4).'.'.rand(1,9).') Gecko/2007'.rand(10,12).rand(10,30).' Firefox/2.0.'.rand(0,1).'.'.rand(1,9);
$headerstrings['Accept-Charset'] = rand(0,1) ? 'en-gb,en;q=0.'.rand(3,8) : 'en-us,en;q=0.'.rand(3,8);
$headerstrings['Accept-Language'] = 'en-us,en;q=0.'.rand(4,6);
$setHeaders = 'Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5'."\r\n".
'Accept-Charset: '.$headerstrings['Accept-Charset']."\r\n".
'Accept-Language: '.$headerstrings['Accept-Language']."\r\n".
'User-Agent: '.$headerstrings['User-Agent']."\r\n";
$contextOptions = array(
'http'=>array(
'method'=>"GET",
'header'=>$setHeaders
)
);
return stream_context_create($contextOptions);
}
}
?>
And for the demo page:
<?php
require_once($_SERVER['DOCUMENT_ROOT'] . '/rss/o7th.rss.feed.puller.php');
$fp = new o7thRssFeedPuller();
$fp->FeedUrl = 'http://o7thblog.com/feed';
$fp->MaxItems = 2;
echo '<table width="100%" cellpadding="0" cellspacing="0">';
echo ' <tr>';
echo ' <td>';
echo ' <textarea cols="120" rows="30">';
print_r($fp->GetItems());
echo ' </textarea>';
echo ' </td>';
echo ' </tr>';
echo '</table>';
?>
So, I assume that the issue lies somewhere in either the valueReturner method or the extractDOM method, but I am just not sure where, nor what I can do to get all the categories in the returned array.
Can you help?

I would suggest using SimpleXML to parse the feed.
Here is how you can do it:
$feed_url = 'http://o7thblog.com/feed/';
$feed = simplexml_load_file($feed_url, null, LIBXML_NOCDATA);
$channel = $feed->channel;
echo "<h1>{$channel->title}</h1>\n";
echo "{$channel->description}\n";
echo "<dl>\n";
foreach ($channel->item as $item) {
echo "<dt>{$item->title}</dt>\n"
. "<dd style=\"margin-bottom: 30px;\"><div style=\"font-size: small;\">{$item->pubDate}</div>\n"
. "<div>{$item->description}</div>\n"
. "Categories: <strong>".implode('</strong>, <strong>', (array) $item->category) . "</strong>\n</dd>";
}
echo "</dl>\n";
Above shows you all categories.

You have written a custom parser for what you can do simply with one line of code!
$feed = (array) simplexml_load_file('http://o7thblog.com/feed/', null, LIBXML_NOCDATA);

How to convert PHP to XML output

I have a php code. this code outputs an HTML. I need to modify this code to output an XML.
ANy ideas as to how shall I go about doing this. Is there any XML library available that directly does the job or do i have to manually create each node.?
My php code is:
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<style>
a {text-decoration:none; color:black;}
</style>
</head>
<body>
<?php
$a=$_POST["title"];
$b=$_POST["name"];
$c="http://www.imdb.com/search/title?title=".urlencode($a)."&title_type=".urlencode($b);
$d=file_get_contents($c);
preg_match_all('/<div id="main">\n(No results.)/', $d,$nore);
preg_match_all('#<img src="(.*)"#Us', $d, $img);//image
preg_match_all('/<a\s*href="\/title\/tt[0-9]*\/">((?:[a-z]*(?:&*[.]*)?\s*-*[a-z]*[0-9]*[^<])+)/i',$d,$tit); //title
preg_match_all('/<span\sclass="year_type">\s*\(([\d]*)/',$d,$ye); //movie year working fine
preg_match_all('#<span class="credit">\n Dir: (.*)\n(?: With:)?#Us',$d,$dir); //director
preg_match_all('/<span class="rating-rating"><span class="value">([\w]*.[\w]*)/i',$d,$rat); //rating
preg_match_all('/<a\shref="(\/title\/tt[0-9]*\/)"\s*[title]+/i',$d,$lin); //link
for($i=0;$i<5;$i++)
{
if (#$rat[1][$i]=="-")
$rat[1][$i]="N/A";
}
for($i=0;$i<5;$i++)
{
if(#$dir[1][$i]=="")
$dir[1][$i]="N/A";
}
if(count($tit[1])>5)
$cnt=5;
else
$cnt=count($tit[1]);
echo"<center><b>Search Result</b></center>";
echo "<br/>";
echo "<center><b>\"$a\"of type\"$b\":</b></center>";
echo"<br/>";
if(#$nore[1][0]=="No results.")
echo "<center><b>No movies found!</b></center>";
else
{
echo "<center><table border=1><tr><td><center>Image</center></td><td><center>Title</center></td><td><center>Year</center></td><td><center>Director</center></td><td><center>Rating(10)</center></td><td><center>Link to Movie</center></td></tr>";
for($j=0;$j<$cnt;$j++)
{
echo "<tr>";
echo "<td>".#$img[0][$j+2]."</td>";
echo "<td><center>".#$tit[1][$j]."</center></td>";
echo "<td><center>".#$ye[1][$j]."</center></td>";
echo "<td><center>".#$dir[1][$j]."</center></td>";
echo "<td><center>".#$rat[1][$j]."</center></td>";
echo '<td><center><a style="text-decoration:underline; color:blue;" href="http://www.imdb.com'.#$lin[1][$j].'">Details</a></center></td>';
echo "</tr>";
}
echo "</table></center>";
}
?>
</body>
</html>
Expected XML output:
<result cover="http://ia.mediaimdb.com/images
/M/MV5BMjMyOTM4MDMxNV5BMl5BanBnXkFtZTcwNjIyNzExOA##._V1._SX54_
CR0,0,54,74_.jpg" title="The Amazing Spider-Man(2012)"year="2012"
director="Marc Webb" rating="7.5"
details="http://www.imdb.com/title/tt0948470"/>
<result cover="http://ia.mediaimdb.
com/images/M/MV5BMzk3MTE5MDU5NV5BMl5BanBnXkFtZTYwMjY3NTY3._V1._SX54_CR0,
0,54,74_.jpg" title="Spider-Man(2002)" year="2002"director="Sam Raimi"
rating="7.3" details="http://www.imdb.com/title/tt0145487"/>
<result cover="http://ia.mediaimdb.
com/images/M/MV5BODUwMDc5Mzc5M15BMl5BanBnXkFtZTcwNDgzOTY0MQ##._V1._SX54_
CR0,0,54,74_.jpg" title="Spider-Man 3 (2007)" year="2007" director="Sam
Raimi" rating="6.3" details="http://www.imdb.com/title/tt0413300"/>
<result cover="http://i.mediaimdb.
com/images/SF1f0a42ee1aa08d477a576fbbf7562eed/realm/feature.gif" title="
The Amazing Spider-Man 2 (2014)" year="2014" director="Sam Raimi"
rating="6.3" details="http://www.imdb.com/title/tt1872181"/>
<result cover="http://ia.mediaimdb.
com/images/M/MV5BMjE1ODcyODYxMl5BMl5BanBnXkFtZTcwNjA1NDE3MQ##._V1._SX54_
CR0,0,54,74_.jpg" title="Spider-Man 2 (2004)" year="2004" director="Sam
Raimi" rating="7.5" details="http://www.imdb.com/title/tt0316654"/>
</results>

First thing, you're parsing your html result with regex which is inefficient, unnecessary, and... well, you're answering to the cthulhu call!
Second, parsing IMDB HTML to retrieve results, although valid, might be unnecessary. There are some neat 3rd party APIs that do the job for you, like http://imdbapi.org
If you don't want to use any 3rd party API though, IMHO, you should, instead, parse the HTML using a DOM parser/manipulator, like DOMDocument, for instance, which is safer, better and, at the same time, can solve your HTML to XML problem.
Here's the bit you asked (build XML and HTML from results):
function resultsToHTML($results)
{
$doc = new DOMDocumet();
$table = $doc->createElement('table');
foreach ($results as $r) {
$row = $doc->createElement('tr');
$doc->appendChild($row);
$title = $doc->createElement('td', $r['title']);
$row->appendChild($title);
$year = $doc->createElement('td', $r['year']);
$row->appendChild($year);
$rating = $doc->createElement('td', $r['rating']);
$row->appendChild($rating);
$imgTD = $doc->createElement('td');
//Creating a img tag (use only on)
$img = $doc->createElement('img');
$img->setAttribute('src', $r['img_src']);
$imgTD->appendChild($img);
$row->appendChild($imgTD);
$imgTD = $doc->createElement('td');
//Importing directly from the old document
$fauxDoc = new DOMDocument();
$fauxDoc->loadXML($r['img']);
$img = $fauxDoc->getElementsByTagName('img')->index(0);
$importedImg = $doc->importNode('$img', true);
$imgTD->appendChild($importedImg);
$row->appendChild($imgTD);
}
return $doc;
}
function resultsToXML($results)
{
$doc = new DOMDocumet();
$root = $doc->createElement('results');
foreach ($results as $r) {
$element = $root->createElement('result');
$element->setAttribute('cover', $r['img_src']);
$element->setAttribute('title', $r['title']);
$element->setAttribute('year', $r['year']);
$element->setAttribute('rating', $r['rating']);
$root->appendChild($element);
}
$doc->appendChild($root);
return $doc;
}
to print them you just need to
$xml = resultsToXML($results);
print $xml->saveXML();
Same thing with html
Here's a refactor of your code with DOMDocument, based on your post:
<?php
//Mock IMDB Link
$a = 'The Amazing Spider-Man';
$b = 'title';
$c = "http://www.imdb.com/search/title?title=".urlencode($a)."&title_type=".urlencode($b);
// HTML might be malformed so we want DOMDocument to be quiet
libxml_use_internal_errors(true);
//Initialize DOMDocument parser
$doc = new DOMDocument();
//Load previously downloaded document
$doc->loadHTMLFile($c);
//initialize array to store results
$results = array();
// get table of results and extract a list of rows
$listOfTables = $doc->getElementsByTagName('table');
$rows = getResultRows($listOfTables);
$i = 0;
//loop through all rows to retrieve information
foreach ($rows as $row) {
if ($title = getTitle($row)) {
$results[$i]['title'] = $title;
}
if (!is_null($year = getYear($row)) && $year) {
$results[$i]['year'] = $year;
}
if (!is_null($rating = getRating($row)) && $rating) {
$results[$i]['rating'] = $rating;
}
if ($img = getImage($row)) {
$results[$i]['img'] = $img;
}
if ($src = getImageSrc($row)) {
$results[$i]['img_src'] = $src;
}
++$i;
}
//the first result can be a false positive due to the
// results' table header, so we remove it
if (isset($results[0])) {
array_shift($results);
}
FUNCTIONS
function getResultRows($listOfTables)
{
foreach ($listOfTables as $table) {
if ($table->getAttribute('class') === 'results') {
return $table->getElementsByTagName('tr');
}
}
}
function getImageSrc($row)
{
$img = $row->getElementsByTagName('img')->item(0);
if (!is_null($img)) {
return $img->getAttribute('src');
} else {
return false;
}
}
function getImage($row, $doc)
{
$img = $row->getElementsByTagName('img')->item(0);
if (!is_null($img)) {
return $doc->saveHTML($img);
} else {
return false;
}
}
function getTitle($row)
{
$tdInfo = getTDInfo($row->getElementsByTagName('td'));
if (!is_null($tdInfo) && !is_null($as = $tdInfo->getElementsByTagName('a'))) {
return $as->item(0)->nodeValue;
} else {
return false;
}
}
function getYear($row)
{
$tdInfo = getTDInfo($row->getElementsByTagName('td'));
if (!is_null($tdInfo) && !is_null($spans = $tdInfo->getElementsByTagName('span'))) {
foreach ($spans as $span) {
if ($span->getAttribute('class') === 'year_type') {
return str_replace(')', '', str_replace('(', '', $span->nodeValue));
}
}
}
}
function getRating($row)
{
$tdInfo = getTDInfo($row->getElementsByTagName('td'));
if (!is_null($tdInfo) && !is_null($spans = $tdInfo->getElementsByTagName('span'))) {
foreach ($spans as $span) {
if ($span->getAttribute('class') === 'rating-rating') {
return $span->nodeValue;
}
}
}
}
function getTDInfo($tds)
{
foreach ($tds as $td) {
if ($td->getAttribute('class') == 'title') {
return $td;
}
}
}

Pass MySQL Variable to Joomla Module instead of using default field for Twitter Search

It seems I've hit a wall here and could use some help
Would like to pass a MySql variable to a Joomla Module
Im using Yootheme's Widget-kit to display tweets from a search term. Works great and all but you need to enter the twitter search term in the Module back end.
Instead I would like to use a variable ( already used on the page) and pass that variable to the Twitter module so it can display the tweets I want
Here are some lines of PHP with the variable I'd like to use
$document->setTitle(JText::sprintf('COVERAGE_DATA_PLACE', $this->country->country_name, $this->city->city_name));
$text = JString::str_ireplace('%city_name%',$this->city->city_name,$text);
$this->setBreadcrumbs(array('country','city'));
Is there any way to take the "City" variable and send it to the 'word" field found in the twitter module?
Here is the Code for the twitter module
<?php
Class: TwitterWidgetkitHelper
Twitter helper class
*/
class TwitterWidgetkitHelper extends WidgetkitHelper {
/* type */
public $type;
/* options */
public $options;
/*
Function: Constructor
Class Constructor.
*/
public function __construct($widgetkit) {
parent::__construct($widgetkit);
// init vars
$this->type = strtolower(str_replace('WidgetkitHelper', '', get_class($this)));
$this->options = $this['system']->options;
// create cache
$cache = $this['path']->path('cache:');
if ($cache && !file_exists($cache.'/twitter')) {
mkdir($cache.'/twitter', 0777, true);
}
// register path
$this['path']->register(dirname(__FILE__), $this->type);
}
/*
Function: site
Site init actions
Returns:
Void
*/
public function site() {
// add translations
foreach (array('LESS_THAN_A_MINUTE_AGO', 'ABOUT_A_MINUTE_AGO', 'X_MINUTES_AGO', 'ABOUT_AN_HOUR_AGO', 'X_HOURS_AGO', 'ONE_DAY_AGO', 'X_DAYS_AGO') as $key) {
$translations[$key] = $this['system']->__($key);
}
// add stylesheets/javascripts
$this['asset']->addFile('css', 'twitter:styles/style.css');
$this['asset']->addFile('js', 'twitter:twitter.js');
$this['asset']->addString('js', sprintf('jQuery.trans.addDic(%s);', json_encode($translations)));
// rtl
if ($this['system']->options->get('direction') == 'rtl') {
$this['asset']->addFile('css', 'twitter:styles/rtl.css');
}
}
/*
Function: render
Render widget on site
Returns:
String
*/
public function render($options) {
if ($tweets = $this->_getTweets($options)) {
// get options
extract($options);
return $this['template']->render("twitter:styles/$style/template", compact('tweets', 'show_image', 'show_author', 'show_date', 'image_size'));
}
return 'No tweets found.';
}
/*
Function: _getURL
Create Twitter Query URL
Returns:
String
*/
protected function _getURL($options) {
// get options
extract($options);
// clean options
foreach (array('from_user', 'to_user', 'ref_user', 'word', 'nots', 'hashtag') as $var) {
$$var = preg_replace('/[##]/', '', preg_replace('/\s+/', ' ', trim($$var)));
}
// build query
$query = array();
if ($from_user) {
$query[] = 'from:'.str_replace(' ', ' OR from:', $from_user);
}
if ($to_user) {
$query[] = 'to:'.str_replace(' ', ' OR to:', $to_user);
}
if ($ref_user) {
$query[] = '#'.str_replace(' ', ' #', $ref_user);
}
if ($word) {
$query[] = $word;
}
if ($nots) {
$query[] = '-'.str_replace(' ', ' -', $nots);
}
if ($hashtag) {
$query[] = '#'.str_replace(' ', ' #', $hashtag);
}
$limit = min($limit ? intval($limit) : 5, 100);
// build timeline url
if ($from_user && !strpos($from_user, ' ') && count($query) == 1) {
$url = 'http://twitter.com/statuses/user_timeline/'.strtolower($from_user).'.json';
if ($limit > 15) {
$url .= '?count='.$limit;
}
return $url;
}
// build search url
if (count($query)) {
$url = 'http://search.twitter.com/search.json?q='.urlencode(implode(' ', $query));
if ($limit > 15) {
$url .= '&rpp='.$limit;
}
return $url;
}
return null;
}
/*
Function: _getTweets
Get Tweet Object Array
Returns:
Array
*/
protected function _getTweets($options) {
// init vars
$tweets = array();
// query twitter
if ($url = $this->_getURL($options)) {
if ($path = $this['path']->path('cache:twitter')) {
$file = rtrim($path, '/').sprintf('/twitter-%s.php', md5($url));
// is cached ?
if (file_exists($file)) {
$response = file_get_contents($file);
}
// refresh cache ?
if (!file_exists($file) || (time() - filemtime($file)) > 300) {
// send query
$request = $this['http']->get($url);
if (isset($request['status']['code']) && $request['status']['code'] == 200) {
$response = $request['body'];
file_put_contents($file, $response);
}
}
}
}
// create tweets
if (isset($response)) {
$response = json_decode($response, true);
if (is_array($response)) {
if (isset($response['results'])) {
foreach ($response['results'] as $res) {
$tweet = new WidgetkitTweet();
$tweet->user = $res['from_user'];
$tweet->name = $res['from_user'];
$tweet->image = $res['profile_image_url'];
$tweet->text = $res['text'];
$tweet->created_at = $res['created_at'];
$tweets[] = $tweet;
}
} else {
foreach ($response as $res) {
$tweet = new WidgetkitTweet();
$tweet->user = $res['user']['screen_name'];
$tweet->name = $res['user']['name'];
$tweet->image = $res['user']['profile_image_url'];
$tweet->text = $res['text'];
$tweet->created_at = $res['created_at'];
$tweets[] = $tweet;
}
}
}
}
return array_slice($tweets, 0, $options['limit'] ? intval($options['limit']) : 5);
}
}
class WidgetkitTweet {
public $user;
public $name;
public $image;
public $text;
public $created_at;
public function getLink() {
return 'http://twitter.com/'.$this->user;
}
public function getText() {
// format text
$text = preg_replace('#(https?://([-\w\.]+)+(/([\w/_\.]*(\?\S+)?(#\S+)?)?)?)#', '$1', $this->text);
$text = preg_replace('/#(\w+)/', '#$1', $text);
$text = preg_replace('/\s+#(\w+)/', ' #$1', $text);
return $text;
}
}
// bind events
$widgetkit = Widgetkit::getInstance();
$widgetkit['event']->bind('site', array($widgetkit['twitter'], 'site'));

I believe you can do what you want with this:
http://www.j-plant.com/joomla-extensions/free-extensions/26-module-plant.html
This plugin supports module parameters overriding. Use the next code
for overriding parameters:
[moduleplant id="77" <param_name_1>="<param_value_1>" <param_name_N>="<param_value_N>"]
replace and with the necessary parameter name and value.
You can override as many parameters as you want.
Available module parameters can be found in module XML manifest
file. A manifest file is usually located in the next path:
modules/<module_type>/<module_type>.xml

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Using Simple Html Dom to remove some elements - php

Use the innertext instead of outertext foreach ($element->find('a') as $sourceurl) { echo $sourceurl->innertext . "<br />"; }

Related

Template engine {var} doesnt work

getattribute and tag value using dom

RSS Parser to include Categories

How to convert PHP to XML output

Pass MySQL Variable to Joomla Module instead of using default field for Twitter Search

Categories

Resources