I am trying to generate an RSS feed on my site using the code below. The RSS feed is appearing, but I am having two issues:
When the feed shows on my page, the images do not show up; instead, the img tag itself appears as text on the page, like this: <img src="http://graphics8.nytimes.com/images/2011/11/18/movies/18RDP_GARBO/18RDP_GARBO-thumbStandard.jpg" border="0" height="75" width="75" hspace="4" align="left">
How do I limit the amount of articles that appear on my site?
Here is the link to the RSS: Spy RSS FEED
Here is the code I am using:
<?php
$insideitem = false;
$tag = "";
$title = "";
$description = "";
$link = "";
$locations = array('http://topics.nytimes.com/topics/reference/timestopics/subjects/e/espionage/index.html?rss=1');
srand((float) microtime() * 10000000); // seed the random gen
$random_key = array_rand($locations);
function startElement($parser, $name, $attrs) {
global $insideitem, $tag, $title, $description, $link;
if ($insideitem) {
$tag = $name;
} elseif ($name == "ITEM") {
$insideitem = true;
}
}
function endElement($parser, $name) {
global $insideitem, $tag, $title, $description, $link;
if ($name == "ITEM") {
printf("<dt><b><a href='%s' target=new>%s</a></b></dt>",
trim($link),htmlspecialchars(trim($title)));
printf("<dt>%s</dt><br><br>",htmlspecialchars(trim($description)));
$title = "";
$description = "";
$link = "";
$insideitem = false;
}
}
function characterData($parser, $data) {
global $insideitem, $tag, $title, $description, $link;
if ($insideitem) {
switch ($tag) {
case "TITLE":
$title .= $data;
break;
case "DESCRIPTION":
$description .= $data;
break;
case "LINK":
$link .= $data;
break;
}
}
}
$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
$fp = fopen($locations[$random_key], 'r')
or die("Error reading RSS data.");
while ($data = fread($fp, 4096))
xml_parse($xml_parser, $data, feof($fp))
or die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser)));
fclose($fp);
xml_parser_free($xml_parser);
?>
In endElement(), the feed content is output with printf("<dt>%s</dt><br><br>",htmlspecialchars(trim($description)));
If you remove the htmlspecialchars() call there, images and other HTML should display properly, instead of < being converted to &lt; and so on.
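To see why the image shows up as text: htmlspecialchars() turns the markup into entities, which the browser then displays literally. A minimal sketch of both behaviors; escapedDescription() and renderDescription() are hypothetical helpers, and keeping only <img> through strip_tags()'s allowed-tags argument is an assumed policy, not something the original code does. strip_tags() keeps attributes (such as onerror), so printing feed HTML this way still means trusting the feed.

```php
<?php
// What the original printf() line does: "<img ...>" becomes "&lt;img ...&gt;",
// which the browser shows as literal text instead of an image.
function escapedDescription(string $description): string
{
    return htmlspecialchars(trim($description));
}

// Hypothetical alternative: drop every tag except <img>, so thumbnails render.
// strip_tags() does not remove attributes, so the feed is still trusted here.
function renderDescription(string $description): string
{
    return trim(strip_tags($description, '<img>'));
}
```

In endElement() this would be used as printf("<dt>%s</dt><br><br>", renderDescription($description));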
Given that code, there is no built-in way to limit the number of items. NYTimes may have a query-string option that restricts the number of results, but I am not sure about that.
A quick fix is to add a global variable, $numShown or similar. At the beginning of endElement(), increment it, then check whether it is above some limit; if so, return before the printf calls that output the feed item.
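<?php
$numShown = 0; // items shown so far; declare it next to $insideitem, $tag, etc.
function endElement($parser, $name) {
global $insideitem, $tag, $title, $description, $link, $numShown;
if ($name == "ITEM") {
$numShown++;
if ($numShown > 5) { // only the first 5 items are shown
return ;
}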
printf ( "<dt><b><a href='%s' target=new>%s</a></b></dt>", trim ( $link ), htmlspecialchars ( trim ( $title ) ) );
printf ( "<dt>%s</dt><br><br>", trim ( $description ) );
$title = "";
$description = "";
$link = "";
$insideitem = false;
}
}
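A fuller sketch of the same idea, which also resets the per-item variables for skipped items so they do not keep accumulating text. Assumptions: the feed XML is already in a string (the question streams it with fopen()/fread()), and $maxItems and renderFeed() are illustrative names, not part of the original code.

```php
<?php
// Minimal sketch: print at most $maxItems items of an RSS document.
// Assumptions: the XML is already in a string; $maxItems and renderFeed() are illustrative.
$maxItems   = 5;
$numShown   = 0;
$insideitem = false;
$tag = $title = $description = $link = "";

function startElement($parser, $name, $attrs) {
    global $insideitem, $tag;
    if ($insideitem) {
        $tag = $name;
    } elseif ($name == "ITEM") {
        $insideitem = true;
    }
}

function endElement($parser, $name) {
    global $insideitem, $tag, $title, $description, $link, $numShown, $maxItems;
    if ($name != "ITEM") {
        return;
    }
    if ($numShown < $maxItems) {
        printf("<dt><b><a href='%s' target=new>%s</a></b></dt>",
            htmlspecialchars(trim($link), ENT_QUOTES), htmlspecialchars(trim($title)));
        // Printed without htmlspecialchars() so the description's <img> renders.
        printf("<dt>%s</dt><br><br>", trim($description));
        $numShown++;
    }
    // Reset the per-item state for every item, including skipped ones.
    $tag = $title = $description = $link = "";
    $insideitem = false;
}

function characterData($parser, $data) {
    global $insideitem, $tag, $title, $description, $link;
    if (!$insideitem) {
        return;
    }
    switch ($tag) {
        case "TITLE":
            $title .= $data;
            break;
        case "DESCRIPTION":
            $description .= $data;
            break;
        case "LINK":
            $link .= $data;
            break;
    }
}

function renderFeed(string $xml): void {
    $GLOBALS['numShown']   = 0;
    $GLOBALS['insideitem'] = false;
    $parser = xml_parser_create(); // folds element names to upper case by default
    xml_set_element_handler($parser, "startElement", "endElement");
    xml_set_character_data_handler($parser, "characterData");
    if (!xml_parse($parser, $xml, true)) {
        printf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser));
    }
}
```

Checking the limit before printing (rather than returning early) keeps the reset in one place, so a skipped item cannot leak text into the next one.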
I am trying to generate a sitemap, but somehow an extra DIV tag appears at the start of the XML output. I need to remove this stray DIV tag.
I've tried to gather the logic first and keep the XML generation separate, at the bottom.
I set the content type to 'text/xml'.
I tried running strip_tags() on the whole XML string before output, but then it shows an empty document.
private function removeImageAndEmbeds ( $content )
{
// remove img tags
$re1='(<img).*?\\/.*?\\/.*?\\/.*?\\/.*?\\/.*?\\/.*?(\\/>)';
if ( $c=preg_replace("/".$re1."/is", "", $content) ) $content = $c;
// remove embedded tags
$re2='(<div).*?(data-oembed-url=)(".*?").*?<\\/div>.*?(<\\/div>)';
if ( $c=preg_replace("/".$re2."/is", "", $content) ) $content = $c;
return $content;
}
public function sitemaps ($tenantName="") {
if ( !empty($tenantName) ) {
$this->db->like( 't.name', str_replace('-', ' ', rawurldecode($tenantName)), 'none' );
$results = $this->db->get($this->TBL . ' t')->result_array();
foreach ( $results as $result ) {
$tenantId = $result['id'];
$tenantNameinURL = formatTenantNameinURL( $result['name'] );
$AllItems = $this->db->get_where($this->DIVIEW . ' di', 'di.account_id = '. $tenantId)->result_array();
$topics = [];
$itemIds = [];
$ddIds = [];
$urls = [];
foreach ( $AllItems as $k => $item ) {
$pieces = explode('_', $item['id']);
if ( $pieces[1] === $this->ITEMTBL ) {
if( !in_array($item['record_id'], $itemIds) ){
$itemIds[] = $item['record_id'];
$content = $this->removeImageAndEmbeds( $item['content'] );
$AllItems[$k]['content'] = $content;
$topics[$k][] = $AllItems[$k];
$urls[$k]['url'] = formatFrontEndURL( $this->current_class_name, $tenantName, 'show', $pieces[0] );
}
} else if ( $pieces[1] === 'dataDefinitions' ) {
if( !in_array($item['record_id'], $ddIds) ){
$ddIds[] = $item['record_id'];
$content = $this->removeImageAndEmbeds( $item['content'] );
$AllItems[$k]['content'] = $content;
$topics[$k][] = $AllItems[$k];
$urls[$k]['url'] = formatFrontEndURL( $this->current_class_name, $tenantName, 'data_definition', $pieces[0] );
}
}
}
$urlset = new SimpleXMLElement('<?xml version="1.0" encoding="UTF-8"?><urlset />');
$urlset->addAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');
foreach ($topics as $i => $itemsInTopic) {
$url = $urlset->addChild('url');
$url->loc = $urls[$i]['url'];
$pageMap = $url->addChild('PageMap');
$pageMap->addAttribute('xmlns', 'http://www.google.com/schemas/sitemap-pagemap/1.0');
foreach ( $itemsInTopic as $item ) {
$content = $item['content'];
$content = trim( str_replace([" ","\r","\n","\t", "
", "
"], ' ', strip_tags( utf8_decode( $content ) )) );
$dataObject = $pageMap->addChild('DataObject');
$dataObject->addAttribute('type', 'document');
$dataObject->addAttribute('id', $item['record_id']);
$dataObject->Attribute[0]['name'] = 'title';
$dataObject->Attribute[0] = $item['title'];
$dataObject->Attribute[1]['name'] = 'content';
$dataObject->Attribute[1] = $content;
}
}
$xmlContent = $urlset->asXML();
$this->output->set_content_type('text/xml')->set_output( $xmlContent );
}
}
}
Here are two errors generated by the seochat validator:
https://drive.google.com/file/d/1vacmuJL6hnMErzqZ5zZWkkObT74rKOmT/view?usp=sharing
https://drive.google.com/file/d/1y3z85D1WtJIT9GvOC-DeYwS-DtQCAxK5/view?usp=sharing
Here is the Google console error:
https://drive.google.com/file/d/1qMvifyjGILqAjJzdWdc90jyymvdUFV5A/view?usp=sharing
I want to collect every value of $link into an array, so that I can process all the links together outside the loop's curly braces.
include("simple_html_dom.php");
$html = file_get_html($url);
$i=0;
$linkObjs = $html->find('h3.r a');
foreach ($linkObjs as $linkObj)
{
$title = trim($linkObj->plaintext);
$link = trim($linkObj->href);
//if it is not a direct link but url reference found inside it, then extract
if (!preg_match('/^https?/', $link) && preg_match('/q=(.+)&sa=/U', $link, $matches) && preg_match('/^https?/', $matches[1]))
{
$link = $matches[1];
} else if (!preg_match('/^https?/', $link)) { // skip if it is not a valid link
continue;
}
$descr = $html->find('span.st',$i); // description is not a child element of H3, therefore we use a counter and recheck.
$i++;
}
Create an array and push matches.
include("simple_html_dom.php");
$html = file_get_html($url);
$links = array();
$i=0;
$linkObjs = $html->find('h3.r a');
foreach ($linkObjs as $linkObj)
{
$title = trim($linkObj->plaintext);
$link = trim($linkObj->href);
// if it is not a direct link but url reference found inside it, then extract
if (!preg_match('/^https?/', $link) && preg_match('/q=(.+)&sa=/U', $link, $matches) && preg_match('/^https?/', $matches[1]))
{
$link = $matches[1]; // use the URL extracted from the redirect link
} else if (!preg_match('/^https?/', $link)) { // skip if it is not a valid link
continue;
}
array_push($links, $link); // collect every valid link, whether extracted or direct
$descr = $html->find('span.st',$i); // description is not a child element of H3, therefore we use a counter and recheck.
$i++;
}
Just declare an array variable and add each link to it, so you can use it later.
Before the loop:
$myLinks = [];
And just after this line:
$link = $matches[1];
$myLinks[] = $link;
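Now you can use the array $myLinks. Note that, placed there, it only collects links extracted from redirect URLs; to collect direct links as well, put $myLinks[] = $link; after the whole if/else block instead. Hope this is what you needed.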
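The logic from both answers as a self-contained function that collects extracted and direct links alike. Assumption: collectLinks() is a hypothetical name, and it takes the raw href strings (for example each $linkObj->href from simple_html_dom) instead of fetching the page, so it runs on its own.

```php
<?php
// Minimal sketch: turn result hrefs into a flat array of usable links.
// Assumption: the hrefs were already extracted, e.g. with $html->find('h3.r a').
function collectLinks(array $hrefs): array
{
    $links = [];
    foreach ($hrefs as $href) {
        $link = trim($href);
        // Redirect-style href such as "/url?q=https://...&sa=...": pull out the target URL.
        if (!preg_match('/^https?/', $link)
            && preg_match('/q=(.+)&sa=/U', $link, $matches)
            && preg_match('/^https?/', $matches[1])) {
            $link = $matches[1];
        } elseif (!preg_match('/^https?/', $link)) {
            continue; // not a usable link
        }
        $links[] = $link;
    }
    return $links;
}
```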
The function below is designed to add rel="nofollow" to all external links, and to no internal links unless the path matches a predefined root URL, defined as $my_folder below.
So given the variables...
$my_folder = 'http://localhost/mytest/go/';
$blog_url = 'http://localhost/mytest';
And the content...
internal
internal cloaked link
external
The end result, after replacement should be...
internal
internal cloaked link
external
Notice that the first link is not altered, since it's an internal link.
The link on the second line is also an internal link, but since it matches our $my_folder string, it gets the nofollow too.
The third link is the easiest: since it does not match $blog_url, it's obviously an external link.
However, in the script below, ALL of my links are getting nofollow. How can I fix the script to do what I want?
function save_rseo_nofollow($content) {
$my_folder = $rseo['nofollow_folder'];
$blog_url = get_bloginfo('url');
preg_match_all('~<a.*>~isU',$content["post_content"],$matches);
for ( $i = 0; $i <= sizeof($matches[0]); $i++){
if ( !preg_match( '~nofollow~is',$matches[0][$i])
&& (preg_match('~' . $my_folder . '~', $matches[0][$i])
|| !preg_match( '~'.$blog_url.'~',$matches[0][$i]))){
$result = trim($matches[0][$i],">");
$result .= ' rel="nofollow">';
$content["post_content"] = str_replace($matches[0][$i], $result, $content["post_content"]);
}
}
return $content;
}
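Two things in this function make every link match. First, $rseo is not defined inside the function, so $my_folder is null and the pattern '~' . $my_folder . '~' becomes the empty pattern '~~', which matches any string. Second, the loop condition $i <= sizeof($matches[0]) reads one element past the end. The intended rule itself is small; a minimal sketch, assuming prefix matching on the href is good enough (needsNofollow() is a hypothetical helper):

```php
<?php
// Minimal sketch of the intended rule, separated from the regex plumbing.
// Assumptions: needsNofollow() is hypothetical; links are compared by URL prefix.
function needsNofollow(string $href, string $blogUrl, string $myFolder): bool
{
    // Internal: the href starts with the blog URL.
    $isInternal = $blogUrl !== '' && strpos($href, $blogUrl) === 0;
    // Cloaked: the href starts with the cloaking folder (itself under the blog URL).
    $isCloaked = $myFolder !== '' && strpos($href, $myFolder) === 0;
    // External links and cloaked internal links get nofollow; other internal links do not.
    return !$isInternal || $isCloaked;
}
```

Inside save_rseo_nofollow(), the href would first be pulled out of each matched <a> tag, and $my_folder read from wherever the plugin actually stores that option.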
Here is the DOMDocument solution...
$str = 'internal
internal cloaked link
external
external
external
external
';
$dom = new DOMDocument();
$dom->preserveWhiteSpace = FALSE;
$dom->loadHTML($str);
$a = $dom->getElementsByTagName('a');
$host = strtok($_SERVER['HTTP_HOST'], ':');
foreach($a as $anchor) {
$href = $anchor->getAttribute('href'); // empty string if the anchor has no href
if (preg_match('/^https?:\/\/' . preg_quote($host, '/') . '/', $href)) {
continue;
}
$noFollowRel = 'nofollow';
$oldRelAtt = $anchor->attributes->getNamedItem('rel');
if ($oldRelAtt == NULL) {
$newRel = $noFollowRel;
} else {
$oldRel = $oldRelAtt->nodeValue;
$oldRel = explode(' ', $oldRel);
if (in_array($noFollowRel, $oldRel)) {
continue;
}
$oldRel[] = $noFollowRel;
$newRel = implode(' ', $oldRel);
}
$newRelAtt = $dom->createAttribute('rel');
$noFollowNode = $dom->createTextNode($newRel);
$newRelAtt->appendChild($noFollowNode);
$anchor->appendChild($newRelAtt);
}
var_dump($dom->saveHTML());
Output
string(509) "<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
internal
internal cloaked link
external
external
external
external
</body></html>
"
Try to make it more readable first, and only afterwards make your if rules more complex:
function save_rseo_nofollow($content) {
$content["post_content"] =
preg_replace_callback('~<(a\s[^>]+)>~isU', "cb2", $content["post_content"]);
return $content;
}
function cb2($match) {
list($original, $tag) = $match; // regex match groups
$my_folder = "/hostgator"; // re-add quirky config here
$blog_url = "http://localhost/";
if (strpos($tag, "nofollow")) {
return $original;
}
elseif (strpos($tag, $blog_url) && (!$my_folder || !strpos($tag, $my_folder))) {
return $original;
}
else {
return "<$tag rel='nofollow'>";
}
}
Gives following output:
[post_content] =>
internal
<a href="http://localhost/mytest/go/hostgator" rel=nofollow>internal cloaked link</a>
<a href="http://cnn.com" rel=nofollow>external</a>
The problem in your original code might have been $rseo, which isn't declared anywhere in the function.
Try this one (PHP 5.3+), which can:
skip a selected address
allow a manually set rel parameter
Code:
function nofollow($html, $skip = null) {
return preg_replace_callback(
"#(<a[^>]+?)>#is", function ($mach) use ($skip) {
return (
!($skip && strpos($mach[1], $skip) !== false) &&
strpos($mach[1], 'rel=') === false
) ? $mach[1] . ' rel="nofollow">' : $mach[0];
},
$html
);
}
Examples:
echo nofollow('something');
// stays the same, because it already contains a rel parameter
echo nofollow('something');
// adds the rel="nofollow" parameter to the anchor
echo nofollow('something', 'localhost');
// skips this link as an internal link
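A self-contained run of the nofollow() function above on concrete anchors; the URLs are hypothetical, and the result of each call is shown in the comments.

```php
<?php
// The answer's nofollow() function, unchanged apart from layout.
function nofollow($html, $skip = null) {
    return preg_replace_callback(
        "#(<a[^>]+?)>#is",
        function ($mach) use ($skip) {
            return (
                !($skip && strpos($mach[1], $skip) !== false) &&
                strpos($mach[1], 'rel=') === false
            ) ? $mach[1] . ' rel="nofollow">' : $mach[0];
        },
        $html
    );
}

// Already has a rel attribute, so it is left unchanged:
// <a href="http://example.com/" rel="sponsored">something</a>
echo nofollow('<a href="http://example.com/" rel="sponsored">something</a>'), "\n";
// No rel attribute, so rel="nofollow" is added:
// <a href="http://example.com/" rel="nofollow">something</a>
echo nofollow('<a href="http://example.com/">something</a>'), "\n";
// Contains the skip string, so it is treated as internal and left unchanged:
// <a href="http://localhost/page">something</a>
echo nofollow('<a href="http://localhost/page">something</a>', 'localhost'), "\n";
```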
Using regular expressions to do this job properly would be quite complicated. It would be easier to use an actual parser, such as the one from the DOM extension. DOM isn't very beginner-friendly, so what you can do is load the HTML with DOM then run the modifications with SimpleXML. They're backed by the same library, so it's easy to use one with the other.
Here's what it can look like:
$my_folder = 'http://localhost/mytest/go/';
$blog_url = 'http://localhost/mytest';
$html = '<html><body>
internal
internal cloaked link
external
</body></html>';
$dom = new DOMDocument;
$dom->loadHTML($html);
$sxe = simplexml_import_dom($dom);
// grab all <a> nodes with an href attribute
foreach ($sxe->xpath('//a[@href]') as $a)
{
if (substr($a['href'], 0, strlen($blog_url)) === $blog_url
&& substr($a['href'], 0, strlen($my_folder)) !== $my_folder)
{
// skip all links that start with the URL in $blog_url, as long as they
// don't start with the URL from $my_folder;
continue;
}
if (empty($a['rel']))
{
$a['rel'] = 'nofollow';
}
else
{
$a['rel'] .= ' nofollow';
}
}
$new_html = $dom->saveHTML();
echo $new_html;
As you can see, it's really short and simple. Depending on your needs, you may want to use preg_match() in place of the substr() comparisons, for example:
// change the regexp to your own rules, here we match everything under
// "http://localhost/mytest/" as long as it's not followed by "go"
if (preg_match('#^http://localhost/mytest/(?!go)#', $a['href']))
{
continue;
}
Note
I missed the last code block in the OP when I first read the question. The code I posted (and basically any DOM-based solution) is better suited to processing a whole page than an HTML block. Otherwise, DOM will attempt to "fix" your HTML and may add a <body> tag, a DOCTYPE, etc.
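One way to use DOM on an HTML block without the DOCTYPE and <html><body> ending up in the result is to wrap the block in a container element and serialize only that container's children. A minimal sketch, assuming the block is UTF-8 and that links are compared by host name; addNofollowToFragment() and $internalHosts are hypothetical names, and this uses DOMDocument directly rather than the SimpleXML bridge above.

```php
<?php
// Minimal sketch: add rel="nofollow" to external links in an HTML block,
// returning only the block (nothing the parser adds around it).
// Assumptions: hypothetical function name; $internalHosts lists hosts to leave alone.
function addNofollowToFragment(string $fragment, array $internalHosts): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings about unknown tags
    // The meta tag tells libxml the input is UTF-8; the wrapper <div> marks the block.
    $dom->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div>' . $fragment . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $host = parse_url($anchor->getAttribute('href'), PHP_URL_HOST);
        if (!is_string($host) || in_array($host, $internalHosts, true)) {
            continue; // relative link, malformed URL, or whitelisted host
        }
        $rel = preg_split('/\s+/', trim($anchor->getAttribute('rel')), -1, PREG_SPLIT_NO_EMPTY);
        if (!in_array('nofollow', $rel, true)) {
            $rel[] = 'nofollow';
            $anchor->setAttribute('rel', implode(' ', $rel));
        }
    }

    // Serialize only the wrapper's children, so no DOCTYPE/<html>/<body> leaks out.
    $wrapper = $dom->getElementsByTagName('div')->item(0);
    $html = '';
    foreach ($wrapper->childNodes as $child) {
        $html .= $dom->saveHTML($child);
    }
    return $html;
}
```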
Thanks @alex for your nice solution. However, I was having a problem with Japanese text, which I fixed as follows. This code can also skip multiple domains via the $whiteList array.
public function addRelNoFollow($html, $whiteList = [])
{
$dom = new \DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8')); // note: 'HTML-ENTITIES' is deprecated as of PHP 8.2
$a = $dom->getElementsByTagName('a');
/** @var \DOMElement $anchor */
foreach ($a as $anchor) {
$href = $anchor->attributes->getNamedItem('href')->nodeValue;
$domain = parse_url($href, PHP_URL_HOST);
// Skip whiteList domains
if (in_array($domain, $whiteList, true)) {
continue;
}
// Check & get existing rel attribute values
$noFollow = 'nofollow';
$rel = $anchor->attributes->getNamedItem('rel');
if ($rel) {
$values = explode(' ', $rel->nodeValue);
if (in_array($noFollow, $values, true)) {
continue;
}
$values[] = $noFollow;
$newValue = implode(' ', $values);
} else {
$newValue = $noFollow;
}
// Create new rel attribute
$rel = $dom->createAttribute('rel');
$node = $dom->createTextNode($newValue);
$rel->appendChild($node);
$anchor->appendChild($rel);
}
// There is a problem with saveHTML() and saveXML(), both of them do not work correctly in Unix.
// They do not save UTF-8 characters correctly when used in Unix, but they work in Windows.
// So we need to do as follows. @see https://stackoverflow.com/a/20675396/1710782
return $dom->saveHTML($dom->documentElement);
}
<?php
$str='internal
internal cloaked link
external';
function test($x){
if (preg_match('#localhost/mytest/(?!go/)#i',$x[0])>0) return $x[0];
return 'rel="nofollow" '.$x[0];
}
echo preg_replace_callback('/href=[\'"][^\'"]+/i', 'test', $str);
?>
Here is another solution, which has a whitelist option and adds a target="_blank" attribute.
It also checks whether a rel attribute already exists before adding a new one.
function Add_Nofollow_Attr($Content, $Whitelist = [], $Add_Target_Blank = true)
{
$Whitelist[] = $_SERVER['HTTP_HOST'];
foreach ($Whitelist as $Key => $Link)
{
$Host = preg_replace('#^https?://#', '', $Link);
$Host = "https?://". preg_quote($Host, '/');
$Whitelist[$Key] = $Host;
}
if(preg_match_all("/<a .*?>/", $Content, $matches, PREG_SET_ORDER))
{
foreach ($matches as $Anchor_Tag)
{
$IS_Rel_Exist = $IS_Follow_Exist = $IS_Target_Blank_Exist = $Is_Valid_Tag = false;
if(preg_match_all("/(\w+)\s*=\s*['|\"](.*?)['|\"]/",$Anchor_Tag[0],$All_matches2))
{
foreach ($All_matches2[1] as $Key => $Attr_Name)
{
if($Attr_Name == 'href')
{
$Is_Valid_Tag = true;
$Url = $All_matches2[2][$Key];
// bypass #.. or internal links like "/"
if(preg_match('/^\s*[#|\/].*/', $Url))
{
continue 2;
}
foreach ($Whitelist as $Link)
{
if (preg_match("#$Link#", $Url)) {
continue 3;
}
}
}
else if($Attr_Name == 'rel')
{
$IS_Rel_Exist = true;
$Rel = $All_matches2[2][$Key];
preg_match("/[nd]ofollow/", $Rel, $match, PREG_OFFSET_CAPTURE);
if( count($match) > 0 )
{
$IS_Follow_Exist = true;
}
else
{
$New_Rel = 'rel="'. $Rel . ' nofollow"';
}
}
else if($Attr_Name == 'target')
{
$IS_Target_Blank_Exist = true;
}
}
}
$New_Anchor_Tag = $Anchor_Tag;
if(!$IS_Rel_Exist)
{
$New_Anchor_Tag = str_replace(">",' rel="nofollow">',$Anchor_Tag);
}
else if(!$IS_Follow_Exist)
{
$New_Anchor_Tag = preg_replace("/rel=[\"|'].*?[\"|']/",$New_Rel,$Anchor_Tag);
}
if($Add_Target_Blank && !$IS_Target_Blank_Exist)
{
$New_Anchor_Tag = str_replace(">",' target="_blank">',$New_Anchor_Tag);
}
$Content = str_replace($Anchor_Tag,$New_Anchor_Tag,$Content);
}
}
return $Content;
}
To use it:
$Page_Content = 'internal
internal
google
example
stackoverflow';
$Whitelist = ["http://yoursite.com","http://localhost"];
echo Add_Nofollow_Attr($Page_Content,$Whitelist,true);
WordPress solution:
function replace__method($match) {
list($original, $tag) = $match; // regex match groups
$my_folder = "/articles"; // re-add quirky config here
$blog_url = 'https://'.$_SERVER['SERVER_NAME'];
if (strpos($tag, "nofollow")) {
return $original;
}
elseif (strpos($tag, $blog_url) && (!$my_folder || !strpos($tag, $my_folder))) {
return $original;
}
else {
return "<$tag rel='nofollow'>";
}
}
add_filter( 'the_content', 'add_nofollow_to_external_links', 1 );
function add_nofollow_to_external_links( $content ) {
$content = preg_replace_callback('~<(a\s[^>]+)>~isU', "replace__method", $content);
return $content;
}
A good script that adds nofollow automatically while keeping the other attributes:
function nofollow(string $html, ?string $baseUrl = null) {
return preg_replace_callback(
'#<a([^>]*)>(.+)</a>#isU', function ($mach) use ($baseUrl) {
list ($a, $attr, $text) = $mach;
if (preg_match('#href=["\']([^"\']*)["\']#', $attr, $url)) {
$url = $url[1];
if (is_null($baseUrl) || !str_starts_with($url, $baseUrl)) {
if (preg_match('#rel=["\']([^"\']*)["\']#', $attr, $rel)) {
$relAttr = $rel[0];
$rel = $rel[1];
}
$rel = 'rel="' . ($rel ? (str_contains($rel, 'nofollow') ? $rel : $rel . ' nofollow') : 'nofollow') . '"';
$attr = isset($relAttr) ? str_replace($relAttr, $rel, $attr) : $attr . ' ' . $rel;
$a = '<a ' . trim($attr) . '>' . $text . '</a>';
}
}
return $a;
},
$html
);
}
For some reason I get the error below when using multiple require() calls in my PHP. Basically, I use a couple of require() calls to include a couple of XML parser pages.
Does anyone know how to fix this? If this isn't descriptive enough, please say so below and I will try to improve it. I'm just learning PHP, so please don't be too harsh on me. The code is below.
Here is the error:
Fatal error: Cannot redeclare startElement() (previously declared in /Applications/XAMPP/xamppfiles/htdocs/yournewsflow/news/sports.php:27) in /Applications/XAMPP/xamppfiles/htdocs/yournewsflow/news/political.php on line 34
Here are the require functions:
<?php
require("news/sports.php");
require("news/political.php");
?>
Here is the XML parser used by a couple of pages:
<?php
$tag = "";
$title = "";
$description = "";
$link = "";
$pubDate = "";
$show= 50;
$feedzero = "http://feeds.finance.yahoo.com/rss/2.0/category-stocks?region=US&lang=en-US"; $feedone = "http://feeds.finance.yahoo.com/rss/2.0/category-ideas-and-strategies?region=US&lang=en-US";
$feedtwo = "http://feeds.finance.yahoo.com/rss/2.0/category-earnings?region=US&lang=en-US"; $feedthree = "http://feeds.finance.yahoo.com/rss/2.0/category-bonds?region=US&lang=en-US";
$feedfour = "http://feeds.finance.yahoo.com/rss/2.0/category-economy-govt-and-policy?region=US&lang=en-US";
$insideitem = false;
$counter = 0;
$outerData;
function startElement($parser, $name, $attrs) {
global $insideitem, $tag, $title, $description, $link, $pubDate;
if ($insideitem) {
$tag = $name;
} elseif ($name == "ITEM") {
$insideitem = true;
} }
function endElement($parser, $name) {
global $insideitem, $tag, $counter, $show, $showHTML, $outerData;
global $title, $description, $link, $pubDate;
if ($name == "ITEM" && $counter < $show) {
echo "<table>
<tr>
<td>
".htmlspecialchars($description)."
</td>
</tr>";
// if you chose to show the HTML
if ($showHTML) {
$title = htmlspecialchars($title);
$description = htmlspecialchars($description);
$link = htmlspecialchars($link);
$pubDate = htmlspecialchars($pubDate);
// if you chose not to show the HTML
} else {
$title = strip_tags($title);
$description = strip_tags($description);
$link = strip_tags($link);
$pubDate = strip_tags($pubDate);
}
// fill the innerData array
$innerData["title"] = $title;
$innerData["description"] = $description;
$innerData["link"] = $link;
$innerData["pubDate"] = $pubDate;
// fill one index of the outerData array
$outerData["data".$counter] = $innerData;
// make all the variables blank for the next iteration of the loop
$title = "";
$description = "";
$link = "";
$pubDate = "";
$insideitem = false;
// add one to the counter
$counter++;
}
}
function characterData($parser, $data) {
global $insideitem, $tag, $title, $description, $link, $pubDate;
if ($insideitem) {
switch ($tag) {
case "TITLE":
$title .= $data;
break;
case "DESCRIPTION":
$description .= $data;
break;
case "LINK":
$link .= $data;
break;
case "PUBDATE":
$pubDate .= $data;
break;
}
}
}
// Create an XML parser
$xml_parser = xml_parser_create();
// Set the functions to handle opening and closing tags
xml_set_element_handler($xml_parser, "startElement", "endElement");
// Set the function to handle blocks of character data
xml_set_character_data_handler($xml_parser, "characterData");
// if you started with feed:// fix it to html://
// Open the XML file for reading
$feedzeroFp = fopen($feedzero, 'r') or die("Error reading RSS data.");
$feedoneFp = fopen($feedone, 'r') or die("Error reading RSS data.");
$feedtwoFp = fopen($feedtwo, 'r') or die("Error reading RSS data.");
$feedthreeFp = fopen($feedthree, 'r') or die("Error reading RSS data.");
$feedfourFp = fopen($feedfour, 'r') or die("Error reading RSS data.");
// Read the XML file 4KB at a time
while ($data = fread($feedoneFp, 4096))
//Parse each 4KB chunk with the XML parser created above
xml_parse($xml_parser,$data,feof($feedoneFp))
//Handle errors in parsing
or die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser)));
// Close the XML file
fclose($feedoneFp);
while ($data = fread($feedtwoFp, 4096))
//Parse each 4KB chunk with the XML parser created above
xml_parse($xml_parser,$data,feof($feedtwoFp))
//Handle errors in parsing
or die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser)));
// Close the XML file
fclose($feedtwoFp);
while ($data = fread($feedthreeFp, 4096))
//Parse each 4KB chunk with the XML parser created above
xml_parse($xml_parser,$data,feof($feedthreeFp))
//Handle errors in parsing
or die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser)));
// Close the XML file
fclose($feedthreeFp);
while ($data = fread($feedfourFp, 4096))
//Parse each 4KB chunk with the XML parser created above
xml_parse($xml_parser,$data,feof($feedfourFp))
//Handle errors in parsing
or die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser)));
// Close the XML file
fclose($feedfourFp);
// Free up memory used by the XML parser
xml_parser_free($xml_parser);
?>
You can't require the same "parser" file more than once, because you've already defined the functions in that file. You need to restructure your code:
In parser.functions.php:
function startElement($parser, $name, $attrs) {
global $insideitem, $tag, $title, $description, $link, $pubDate;
if ($insideitem) {
$tag = $name;
} elseif ($name == "ITEM") {
$insideitem = true;
} }
function endElement($parser, $name) {
global $insideitem, $tag, $counter, $show, $showHTML, $outerData;
global $title, $description, $link, $pubDate;
if ($name == "ITEM" && $counter < $show) {
echo "<table>
<tr>
<td>
".htmlspecialchars($description)."
</td>
</tr>";
// if you chose to show the HTML
if ($showHTML) {
$title = htmlspecialchars($title);
$description = htmlspecialchars($description);
$link = htmlspecialchars($link);
$pubDate = htmlspecialchars($pubDate);
// if you chose not to show the HTML
} else {
$title = strip_tags($title);
$description = strip_tags($description);
$link = strip_tags($link);
$pubDate = strip_tags($pubDate);
}
// fill the innerData array
$innerData["title"] = $title;
$innerData["description"] = $description;
$innerData["link"] = $link;
$innerData["pubDate"] = $pubDate;
// fill one index of the outerData array
$outerData["data".$counter] = $innerData;
// make all the variables blank for the next iteration of the loop
$title = "";
$description = "";
$link = "";
$pubDate = "";
$insideitem = false;
// add one to the counter
$counter++;
}
}
function characterData($parser, $data) {
global $insideitem, $tag, $title, $description, $link, $pubDate;
if ($insideitem) {
switch ($tag) {
case "TITLE":
$title .= $data;
break;
case "DESCRIPTION":
$description .= $data;
break;
case "LINK":
$link .= $data;
break;
case "PUBDATE":
$pubDate .= $data;
break;
}
}
}
In your actual page php files:
$tag = "";
$title = "";
$description = "";
$link = "";
$pubDate = "";
$show= 50;
$feedzero = "http://feeds.finance.yahoo.com/rss/2.0/category-stocks?region=US&lang=en-US"; $feedone = "http://feeds.finance.yahoo.com/rss/2.0/category-ideas-and-strategies?region=US&lang=en-US";
$feedtwo = "http://feeds.finance.yahoo.com/rss/2.0/category-earnings?region=US&lang=en-US"; $feedthree = "http://feeds.finance.yahoo.com/rss/2.0/category-bonds?region=US&lang=en-US";
$feedfour = "http://feeds.finance.yahoo.com/rss/2.0/category-economy-govt-and-policy?region=US&lang=en-US";
$insideitem = false;
$counter = 0;
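$outerData = array(); // filled by endElement()
$showHTML = false; // read by endElement(); false means tags are stripped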
require_once('path/to/parser.functions.php');
// The four feeds the original code parsed ($feedzero was opened but never parsed; add it here if you need it)
$feeds = array($feedone, $feedtwo, $feedthree, $feedfour);
foreach ($feeds as $feed) {
// An XML parser handles a single document, so create a fresh parser for each feed
$xml_parser = xml_parser_create();
// Set the functions to handle opening and closing tags
xml_set_element_handler($xml_parser, "startElement", "endElement");
// Set the function to handle blocks of character data
xml_set_character_data_handler($xml_parser, "characterData");
// Open the feed for reading
$fp = fopen($feed, 'r') or die("Error reading RSS data.");
// Read the feed 4KB at a time and parse each chunk
while ($data = fread($fp, 4096)) {
xml_parse($xml_parser, $data, feof($fp))
or die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser)));
}
// Close the feed and free up the memory used by the parser
fclose($fp);
xml_parser_free($xml_parser);
}
This means the function startElement was already defined. You cannot have more than one function with the same name.
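If the handler file really must be included from several places, each definition can also be guarded with function_exists(), so a second require() skips it instead of failing. require_once on a shared file, as in the answer above, is usually the cleaner fix. A minimal sketch showing startElement() only; the other handlers would be wrapped the same way.

```php
<?php
// Minimal sketch: a handler definition that is safe to include more than once.
// Assumption: this stands in for the shared handler file; only startElement() is shown.
if (!function_exists('startElement')) {
    function startElement($parser, $name, $attrs)
    {
        global $insideitem, $tag;
        if ($insideitem) {
            $tag = $name;
        } elseif ($name == "ITEM") {
            $insideitem = true;
        }
    }
}
```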
From a PHP script, I'm downloading an RSS feed like this:
$fp = fopen('http://news.google.es/news?cf=all&ned=es_ve&hl=es&output=rss','r')
or die('Error reading RSS data.');
The feed is a Spanish news feed. After downloading it, I parse the content of the <description> tag of every <item> into one variable. The issue is that when I echo the variable, all the text shows an HTML encoding like this:
echo($result); // this print: el ministerio pãºblico investigarã¡ la publicaciã³n en la primera pã¡gina
I could write a HUGE case statement that searches for every character and changes it to the corresponding one, like ã¡ for á, but is there no way to do this with a single function? Or even better, is there a way to download the content to $fp without the HTML encoding? Thanks!
Actual code:
<?php
$acumula="";
$insideitem = false;
$tag = '';
$title = '';
$description = '';
$link = '';
function startElement($parser, $name, $attrs) {
global $insideitem, $tag, $title, $description, $link;
if ($insideitem) {
$tag = $name;
} elseif ($name == 'ITEM') {
$insideitem = true;
}
}
function endElement($parser, $name) {
global $insideitem, $tag, $title, $description, $link, $acumula;
if ($name == 'ITEM') {
$acumula = $acumula . (trim($title)) . "<br>" . (trim($description));
$title = '';
$description = '';
$link = '';
$insideitem = false;
}
}
function characterData($parser, $data) {
global $insideitem, $tag, $title, $description, $link;
if ($insideitem) {
switch ($tag) {
case 'TITLE':
$title .= $data;
break;
case 'DESCRIPTION':
$description .= $data;
break;
case 'LINK':
$link .= $data;
break;
}
}
}
$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, 'startElement', 'endElement');
xml_set_character_data_handler($xml_parser, "characterData");
$fp = fopen('http://news.google.es/news?cf=all&ned=es_ve&hl=es&output=rss','r')
or die('Error reading RSS data.');
while ($data = fread($fp, 4096)) {
xml_parse($xml_parser, $data, feof($fp))
or die(sprintf('XML error: %s at line %d',
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser)));
}
//echo $acumula;
fclose($fp);
xml_parser_free($xml_parser);
echo($acumula); // THIS IS $RESULT!
?>
Since you're already using the XML parser, the text you get back is guaranteed to be UTF-8.
If your page is encoded in ISO-8859-1, or even ASCII, you can do this to convert:
$result = mb_convert_encoding($result, "HTML-ENTITIES", "UTF-8");
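Two further options, as a hedged sketch. Often the simplest fix is to serve the page as UTF-8, for example with header('Content-Type: text/html; charset=utf-8'), so no conversion is needed at all. Also note that the 'HTML-ENTITIES' target of mb_convert_encoding() is deprecated as of PHP 8.2. If the page must stay ISO-8859-1, the function below (a hypothetical name) turns accented characters into named entities with htmlentities() and then restores only <, > and &, so existing markup such as the <br> in $acumula survives. Characters without a named HTML 4.01 entity, such as CJK text, pass through unchanged.

```php
<?php
// Minimal sketch: make UTF-8 text displayable on a non-UTF-8 page while leaving markup alone.
// Assumption: $utf8 is valid UTF-8, as the XML parser's output is.
function utf8ToEntityHtml(string $utf8): string
{
    // htmlentities() encodes accented letters (ú -> &uacute;) but also < > &,
    // so htmlspecialchars_decode() turns just those back into markup.
    return htmlspecialchars_decode(htmlentities($utf8, ENT_NOQUOTES, 'UTF-8'), ENT_NOQUOTES);
}
```

With that, echo utf8ToEntityHtml($acumula); replaces the plain echo.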
Use a library that handles this for you, e.g. the DOM extension or SimpleXML. Example:
$d = new DOMDocument();
$d->load('http://news.google.es/news?cf=all&ned=es_ve&hl=es&output=rss');
//now all the data you get will be encoded in UTF-8
Example with SimpleXML:
$url = 'http://news.google.es/news?cf=all&ned=es_ve&hl=es&output=rss';
if ($sxml = simplexml_load_file($url)) {
echo htmlspecialchars($sxml->channel->title); //UTF-8
}
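Extending the SimpleXML example to the asker's case, collecting the title and description of every <item>. A minimal sketch; feedItems() is a hypothetical name, and it takes the XML as a string via simplexml_load_string() so it can run without network access (with a live feed, simplexml_load_file($url) as above works the same way).

```php
<?php
// Minimal sketch: collect title and description of every RSS <item> with SimpleXML.
// Assumption: the feed XML is passed in as a string; returned strings are UTF-8.
function feedItems(string $xml): array
{
    $rss = simplexml_load_string($xml);
    if ($rss === false) {
        return []; // not well-formed XML
    }
    $items = [];
    foreach ($rss->channel->item as $item) {
        $items[] = [
            'title'       => trim((string) $item->title),
            'description' => trim((string) $item->description),
        ];
    }
    return $items;
}
```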
You can use PHP's DOMDocument to parse the feed instead of handling the markup yourself.
And you can use PHP's encoding conversion functions (such as mb_convert_encoding() or iconv()) to change the encoding of the resulting string.