For some reason I get the error below when I use multiple require() calls in my PHP. Basically, I'm using a couple of require() calls to include a couple of XML parser pages.
Does anyone know how to fix this? If this isn't descriptive enough, please say so below and I will try to improve it. Thank you, I appreciate any constructive feedback. Also, I'm just learning PHP, so please don't be too harsh on me. I'm providing the code below.
Here is the error:
Fatal error: Cannot redeclare startElement() (previously declared in /Applications/XAMPP/xamppfiles/htdocs/yournewsflow/news/sports.php:27) in /Applications/XAMPP/xamppfiles/htdocs/yournewsflow/news/political.php on line 34
Here are the require() calls:
<?php
require("news/sports.php");
require("news/political.php");
?>
Here is the XML parser code used by a couple of pages:
<?php
$tag = "";
$title = "";
$description = "";
$link = "";
$pubDate = "";
$show= 50;
$feedzero = "http://feeds.finance.yahoo.com/rss/2.0/category-stocks?region=US&lang=en-US";
$feedone = "http://feeds.finance.yahoo.com/rss/2.0/category-ideas-and-strategies?region=US&lang=en-US";
$feedtwo = "http://feeds.finance.yahoo.com/rss/2.0/category-earnings?region=US&lang=en-US";
$feedthree = "http://feeds.finance.yahoo.com/rss/2.0/category-bonds?region=US&lang=en-US";
$feedfour = "http://feeds.finance.yahoo.com/rss/2.0/category-economy-govt-and-policy?region=US&lang=en-US";
$insideitem = false;
$counter = 0;
$outerData = array();
function startElement($parser, $name, $attrs) {
global $insideitem, $tag, $title, $description, $link, $pubDate;
if ($insideitem) {
$tag = $name;
} elseif ($name == "ITEM") {
$insideitem = true;
} }
function endElement($parser, $name) {
global $insideitem, $tag, $counter, $show, $showHTML, $outerData;
global $title, $description, $link, $pubDate;
if ($name == "ITEM" && $counter < $show) {
echo "<table>
<tr>
<td>
".htmlspecialchars($description)."
</td>
</tr>";
// if you chose to show the HTML
if ($showHTML) {
$title = htmlspecialchars($title);
$description = htmlspecialchars($description);
$link = htmlspecialchars($link);
$pubDate = htmlspecialchars($pubDate);
// if you chose not to show the HTML
} else {
$title = strip_tags($title);
$description = strip_tags($description);
$link = strip_tags($link);
$pubDate = strip_tags($pubDate);
}
// fill the innerData array
$innerData["title"] = $title;
$innerData["description"] = $description;
$innerData["link"] = $link;
$innerData["pubDate"] = $pubDate;
// fill one index of the outerData array
$outerData["data".$counter] = $innerData;
// make all the variables blank for the next iteration of the loop
$title = "";
$description = "";
$link = "";
$pubDate = "";
$insideitem = false;
// add one to the counter
$counter++;
}
}
function characterData($parser, $data) {
global $insideitem, $tag, $title, $description, $link, $pubDate;
if ($insideitem) {
switch ($tag) {
case "TITLE":
$title .= $data;
break;
case "DESCRIPTION":
$description .= $data;
break;
case "LINK":
$link .= $data;
break;
case "PUBDATE":
$pubDate .= $data;
break;
}
}
}
// Create an XML parser
$xml_parser = xml_parser_create();
// Set the functions to handle opening and closing tags
xml_set_element_handler($xml_parser, "startElement", "endElement");
// Set the function to handle blocks of character data
xml_set_character_data_handler($xml_parser, "characterData");
// if the feed URL starts with feed://, change it to http://
// Open the XML file for reading
$feedzeroFp = fopen($feedzero, 'r') or die("Error reading RSS data.");
$feedoneFp = fopen($feedone, 'r') or die("Error reading RSS data.");
$feedtwoFp = fopen($feedtwo, 'r') or die("Error reading RSS data.");
$feedthreeFp = fopen($feedthree, 'r') or die("Error reading RSS data.");
$feedfourFp = fopen($feedfour, 'r') or die("Error reading RSS data.");
// Read the XML file 4KB at a time
while ($data = fread($feedoneFp, 4096))
//Parse each 4KB chunk with the XML parser created above
xml_parse($xml_parser,$data,feof($feedoneFp))
//Handle errors in parsing
or die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser)));
// Close the XML file
fclose($feedoneFp);
while ($data = fread($feedtwoFp, 4096))
//Parse each 4KB chunk with the XML parser created above
xml_parse($xml_parser,$data,feof($feedtwoFp))
//Handle errors in parsing
or die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser)));
// Close the XML file
fclose($feedtwoFp);
while ($data = fread($feedthreeFp, 4096))
//Parse each 4KB chunk with the XML parser created above
xml_parse($xml_parser,$data,feof($feedthreeFp))
//Handle errors in parsing
or die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser)));
// Close the XML file
fclose($feedthreeFp);
while ($data = fread($feedfourFp, 4096))
//Parse each 4KB chunk with the XML parser created above
xml_parse($xml_parser,$data,feof($feedfourFp))
//Handle errors in parsing
or die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser)));
// Close the XML file
fclose($feedfourFp);
// Free up memory used by the XML parser
xml_parser_free($xml_parser);
?>
You can't require the same "parser" more than once, because you've already defined the functions in that file. You need to restructure your code:
In parser.functions.php:
<?php
function startElement($parser, $name, $attrs) {
global $insideitem, $tag, $title, $description, $link, $pubDate;
if ($insideitem) {
$tag = $name;
} elseif ($name == "ITEM") {
$insideitem = true;
} }
function endElement($parser, $name) {
global $insideitem, $tag, $counter, $show, $showHTML, $outerData;
global $title, $description, $link, $pubDate;
if ($name == "ITEM" && $counter < $show) {
echo "<table>
<tr>
<td>
".htmlspecialchars($description)."
</td>
</tr>";
// if you chose to show the HTML
if ($showHTML) {
$title = htmlspecialchars($title);
$description = htmlspecialchars($description);
$link = htmlspecialchars($link);
$pubDate = htmlspecialchars($pubDate);
// if you chose not to show the HTML
} else {
$title = strip_tags($title);
$description = strip_tags($description);
$link = strip_tags($link);
$pubDate = strip_tags($pubDate);
}
// fill the innerData array
$innerData["title"] = $title;
$innerData["description"] = $description;
$innerData["link"] = $link;
$innerData["pubDate"] = $pubDate;
// fill one index of the outerData array
$outerData["data".$counter] = $innerData;
// make all the variables blank for the next iteration of the loop
$title = "";
$description = "";
$link = "";
$pubDate = "";
$insideitem = false;
// add one to the counter
$counter++;
}
}
function characterData($parser, $data) {
global $insideitem, $tag, $title, $description, $link, $pubDate;
if ($insideitem) {
switch ($tag) {
case "TITLE":
$title .= $data;
break;
case "DESCRIPTION":
$description .= $data;
break;
case "LINK":
$link .= $data;
break;
case "PUBDATE":
$pubDate .= $data;
break;
}
}
}
In each of your actual page PHP files:
<?php
$tag = "";
$title = "";
$description = "";
$link = "";
$pubDate = "";
$show= 50;
$feedzero = "http://feeds.finance.yahoo.com/rss/2.0/category-stocks?region=US&lang=en-US";
$feedone = "http://feeds.finance.yahoo.com/rss/2.0/category-ideas-and-strategies?region=US&lang=en-US";
$feedtwo = "http://feeds.finance.yahoo.com/rss/2.0/category-earnings?region=US&lang=en-US";
$feedthree = "http://feeds.finance.yahoo.com/rss/2.0/category-bonds?region=US&lang=en-US";
$feedfour = "http://feeds.finance.yahoo.com/rss/2.0/category-economy-govt-and-policy?region=US&lang=en-US";
$insideitem = false;
$counter = 0;
$outerData = array();
require_once('path/to/parser.functions.php');
// Create an XML parser
$xml_parser = xml_parser_create();
// Set the functions to handle opening and closing tags
xml_set_element_handler($xml_parser, "startElement", "endElement");
// Set the function to handle blocks of character data
xml_set_character_data_handler($xml_parser, "characterData");
// if the feed URL starts with feed://, change it to http://
// Open the XML file for reading
$feedzeroFp = fopen($feedzero, 'r') or die("Error reading RSS data.");
$feedoneFp = fopen($feedone, 'r') or die("Error reading RSS data.");
$feedtwoFp = fopen($feedtwo, 'r') or die("Error reading RSS data.");
$feedthreeFp = fopen($feedthree, 'r') or die("Error reading RSS data.");
$feedfourFp = fopen($feedfour, 'r') or die("Error reading RSS data.");
// Read the XML file 4KB at a time
while ($data = fread($feedoneFp, 4096))
//Parse each 4KB chunk with the XML parser created above
xml_parse($xml_parser,$data,feof($feedoneFp))
//Handle errors in parsing
or die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser)));
// Close the XML file
fclose($feedoneFp);
while ($data = fread($feedtwoFp, 4096))
//Parse each 4KB chunk with the XML parser created above
xml_parse($xml_parser,$data,feof($feedtwoFp))
//Handle errors in parsing
or die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser)));
// Close the XML file
fclose($feedtwoFp);
while ($data = fread($feedthreeFp, 4096))
//Parse each 4KB chunk with the XML parser created above
xml_parse($xml_parser,$data,feof($feedthreeFp))
//Handle errors in parsing
or die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser)));
// Close the XML file
fclose($feedthreeFp);
while ($data = fread($feedfourFp, 4096))
//Parse each 4KB chunk with the XML parser created above
xml_parse($xml_parser,$data,feof($feedfourFp))
//Handle errors in parsing
or die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser)));
// Close the XML file
fclose($feedfourFp);
// Free up memory used by the XML parser
xml_parser_free($xml_parser);
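The page code above still shares one parser across several feeds, and it opens $feedzero but never reads it. Once the first feed's root element is closed, the same parser is likely to reject the next feed as extra content. A minimal sketch of one way around that, using a hypothetical helper parseFeed() (not part of the original answer) that creates a fresh parser for each URL and accepts any callables as handlers:

```php
<?php
// Hypothetical helper: parse one feed with a parser of its own, so every feed
// starts from a clean parser state. Returns null on success or an error message.
function parseFeed($url, $startHandler, $endHandler, $dataHandler)
{
    $fp = @fopen($url, 'r');
    if ($fp === false) {
        return "Error reading RSS data from $url";
    }

    $parser = xml_parser_create();
    xml_set_element_handler($parser, $startHandler, $endHandler);
    xml_set_character_data_handler($parser, $dataHandler);

    $error = null;
    // Read the feed 4KB at a time; feof() tells xml_parse() when the last chunk has been read
    while (!feof($fp)) {
        $data = fread($fp, 4096);
        if ($data === false) {
            $error = "Error reading RSS data from $url";
            break;
        }
        if (!xml_parse($parser, $data, feof($fp))) {
            $error = sprintf("XML error: %s at line %d",
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser));
            break;
        }
    }

    fclose($fp);
    xml_parser_free($parser);
    return $error;
}
```

With the handlers from parser.functions.php loaded, the page could then loop over all five URLs, for example `foreach (array($feedzero, $feedone, $feedtwo, $feedthree, $feedfour) as $feed) { if ($err = parseFeed($feed, 'startElement', 'endElement', 'characterData')) { die($err); } }`.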
This means the function startElement was already defined. You cannot have more than one function with the same name.
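A minimal, self-contained sketch of why moving the functions into one file included with require_once avoids that error: it writes a hypothetical helper file to the system temp directory and includes it twice. The second require_once is skipped because the file was already included, so the function is declared only once; with plain require, the second include would trigger the same "Cannot redeclare" fatal error.

```php
<?php
// Write a hypothetical helper file so the demo does not depend on your project layout
$helper = sys_get_temp_dir() . '/demo_parser_functions.php';
file_put_contents($helper, '<?php function demoStartElement() { return "declared once"; }');

require_once $helper; // first include declares demoStartElement()
require_once $helper; // already included: PHP skips it, so there is no redeclare error

echo demoStartElement(), "\n"; // prints "declared once"
```

An alternative is to wrap each declaration in `if (!function_exists('startElement')) { ... }`, but a single shared file included with require_once keeps one copy of the code.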
I'm trying to pre-sort and slice a big XML file for later processing with xml_parser:
function CreateXMLParser($CHARSET, $bareXML = false) {
$CURRXML = xml_parser_create($CHARSET);
xml_parser_set_option( $CURRXML, XML_OPTION_CASE_FOLDING, false);
xml_parser_set_option( $CURRXML, XML_OPTION_TARGET_ENCODING, $CHARSET);
xml_set_element_handler($CURRXML, 'startElement', 'endElement');
xml_set_character_data_handler($CURRXML, 'dataHandler');
xml_set_default_handler($CURRXML, 'defaultHandler');
if ($bareXML) {
xml_parse($CURRXML, '<?xml version="1.0"?>', 0);
}
return $CURRXML;
}
function ChunkXMLBigFile($file, $tag = 'item', $howmany = 1000) {
global $CHUNKON, $CHUNKS, $ITEMLIMIT;
$CHUNKON = $tag;
$ITEMLIMIT = $howmany;
$xml = CreateXMLParser('UTF-8', false);
$fp = fopen($file, "r");
$CHUNKS = 0;
while(!feof($fp)) {
$chunk = fgets($fp, 10240);
xml_parse($xml, $chunk, feof($fp));
}
xml_parser_free($xml);
processChunk();
}
function processChunk() {
global $CHUNKS, $PAYLOAD, $ITEMCOUNT;
if ('' == $PAYLOAD) {
return;
}
$xp = fopen($file = "xmlTemp/slices/slice_".$CHUNKS.".xml", "w");
fwrite($xp, '<?xml version="1.0" ?>'."\n");
fwrite($xp, "<producten>");
fwrite($xp, $PAYLOAD);
fwrite($xp, "</producten>");
fclose($xp);
print "Written ".$file."<br>";
$CHUNKS++;
$PAYLOAD = '';
$ITEMCOUNT = 0;
}
function startElement($xml, $tag, $attrs = array()) {
global $PAYLOAD, $CHUNKS, $ITEMCOUNT, $CHUNKON;
if (!($CHUNKS||$ITEMCOUNT)) {
if ($CHUNKON == strtolower($tag)) {
$PAYLOAD = '';
}
} else {
$PAYLOAD .= "<".$tag;
}
foreach($attrs as $k => $v) {
$PAYLOAD .= " $k=".'"'.htmlspecialchars($v, ENT_QUOTES).'"';
}
$PAYLOAD .= '>';
}
function endElement($xml, $tag) {
global $CHUNKON, $ITEMCOUNT, $ITEMLIMIT;
dataHandler(null, "</$tag>");
if ($CHUNKON == strtolower($tag)) {
if (++$ITEMCOUNT >= $ITEMLIMIT) {
processChunk();
}
}
}
function dataHandler($xml, $data) {
global $PAYLOAD;
$PAYLOAD .= $data;
}
But how can I access the node name?
I have to sort some items (each with n nodes) out before the slice file is saved. The XML is parsed line by line, right? So I have to store the nodes of a whole item temporarily and then decide whether the item is going to be written to the file. Is there a way to do this?
Your code is effectively reading the entire source file every time you call the ChunkXMLBigFile function.
After your while loop you have all the elements, which you can then manipulate any way you like.
See the following questions about how to approach this:
How to sort a xml file using DOM
Sort XML nodes with PHP
If you parse the chunks after that in batches of $howmany you are where you want to be.
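A minimal sketch of that idea, assuming the items have already been collected into an array of associative arrays after the while loop (the field names 'name' and 'price' and the filter rule are hypothetical): filter out the items you don't want, sort the rest with usort(), then split them into slices of $howmany with array_chunk().

```php
<?php
// Hypothetical item data, as it might look after collecting every <item>
$items = array(
    array('name' => 'Lamp',  'price' => 30),
    array('name' => 'Chair', 'price' => 55),
    array('name' => 'Desk',  'price' => 120),
    array('name' => 'Pen',   'price' => 2),
);

// Decide which items are written at all (hypothetical rule: price of at least 5)
$kept = array_values(array_filter($items, function ($item) {
    return $item['price'] >= 5;
}));

// Sort the remaining items by name
usort($kept, function ($a, $b) {
    return strcmp($a['name'], $b['name']);
});

// Split into slices of $howmany items; each slice could become one slice_N.xml file
$howmany = 2;
$slices = array_chunk($kept, $howmany);

foreach ($slices as $n => $slice) {
    echo "slice_$n: ", implode(', ', array_column($slice, 'name')), "\n";
}
// prints:
// slice_0: Chair, Desk
// slice_1: Lamp
```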
Tip: there are many examples online where this functionality is implemented in an object-oriented programming (OOP) style, with all the functions inside a class. This also eliminates the need for global variables, which can cause some (read: a lot of) frustration and confusion.
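A minimal sketch of that object-oriented approach with the same xml_parser extension. The class ItemCollector is hypothetical: it keeps the parser state in properties instead of globals and passes array($this, 'method') callables to the handler functions. The $name argument of the start handler is the node name, so each completed <item> is stored as an array of child-node name => text, and the items can be inspected, filtered, or sorted before anything is written to a slice file.

```php
<?php
// Hypothetical class: collects every <item> as an array of child-node name => text
class ItemCollector
{
    private $items = array();
    private $current = null; // fields of the <item> being read, or null outside an item
    private $field = null;   // name of the child node being read

    public function parse($xml)
    {
        $parser = xml_parser_create('UTF-8');
        // keep node names as they appear in the file instead of upper-casing them
        xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
        xml_set_element_handler($parser, array($this, 'startElement'), array($this, 'endElement'));
        xml_set_character_data_handler($parser, array($this, 'characterData'));
        $ok = xml_parse($parser, $xml, true);
        xml_parser_free($parser);
        return $ok ? $this->items : false;
    }

    public function startElement($parser, $name, $attrs)
    {
        // $name is the node name
        if ($name === 'item') {
            $this->current = array();
        } elseif ($this->current !== null) {
            $this->field = $name;
            $this->current[$name] = '';
        }
    }

    public function endElement($parser, $name)
    {
        if ($name === 'item') {
            $this->items[] = $this->current; // the whole item is now available
            $this->current = null;
        }
        $this->field = null;
    }

    public function characterData($parser, $data)
    {
        if ($this->current !== null && $this->field !== null) {
            $this->current[$this->field] .= $data;
        }
    }
}
```

For a big file you would feed the chunks to xml_parse() in a loop, as in the question, instead of passing the whole string at once.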
I've written a program that updates an XML file based on entries in an array.
I've used FILE_APPEND because there is more than one entry, and otherwise the file gets overwritten. The problem is that the XML version declaration is printed once for every entry.
So I want to remove this declaration.
Here's my program:
<?php
include 'array.php';
$xmlW = new XMLWriter();
$file = 'entry-'. date('M-D-Y') .'.xml';
/*$setting = new XMLWriterSettings();
$setting->OmitXmlDeclaration = true;*/
foreach($data as $d) {
if(in_array ($d['Mode'], array('ccAV','MB','Paypal','E2P'))) {
$recordType = 'receipt';
$xml_object = simplexml_load_file ('receipt.xml');
} else {
$xml_object = simplexml_load_file ('journal.xml');
$recordType = 'journal';
}
$xml_object->xpath("/ENVELOPE/BODY/IMPORTDATA/REQUESTDATA/TALLYMESSAGE/VOUCHER")[0]->DATE = $d['InvoiceDate'];
$xml_object->xpath("/ENVELOPE/BODY/IMPORTDATA/REQUESTDATA/TALLYMESSAGE/VOUCHER")[0]->NARRATION = 'Rahul';
$xml_object->xpath("/ENVELOPE/BODY/IMPORTDATA/REQUESTDATA/TALLYMESSAGE/VOUCHER")[0]->EFFECTIVEDATE = $d['InvoiceDate'];
$xml_object->xpath("/ENVELOPE/BODY/IMPORTDATA/REQUESTDATA/TALLYMESSAGE/VOUCHER/ALLLEDGERENTRIES.LIST")[0]->LEDGERNAME = $d['Mode'];
$xml_object->xpath("/ENVELOPE/BODY/IMPORTDATA/REQUESTDATA/TALLYMESSAGE/VOUCHER/ALLLEDGERENTRIES.LIST")[0]->AMOUNT = 'Rahul';
$xml_object->xpath("/ENVELOPE/BODY/IMPORTDATA/REQUESTDATA/TALLYMESSAGE/VOUCHER/ALLLEDGERENTRIES.LIST")[1]->AMOUNT = 'Rahul';
$xml_object->xpath("/ENVELOPE/BODY/IMPORTDATA/REQUESTDATA/TALLYMESSAGE/VOUCHER/ALLLEDGERENTRIES.LIST")[2]->AMOUNT = 'Rahul';
$xml_object->xpath("/ENVELOPE/BODY/IMPORTDATA/REQUESTDATA/TALLYMESSAGE/VOUCHER/ALLLEDGERENTRIES.LIST/BANKALLOCATIONS.LIST")[0]->DATE = 'Rahul';
$xml_object->xpath("/ENVELOPE/BODY/IMPORTDATA/REQUESTDATA/TALLYMESSAGE/VOUCHER/ALLLEDGERENTRIES.LIST/BANKALLOCATIONS.LIST")[0]->INSTRUMENTDATE = 'Rahul';
$xml_object->xpath("/ENVELOPE/BODY/IMPORTDATA/REQUESTDATA/TALLYMESSAGE/VOUCHER/ALLLEDGERENTRIES.LIST/BANKALLOCATIONS.LIST")[0]->AMOUNT = 'Rahul';
$xml = $xml_object->asXML();
file_put_contents($file, $xml, FILE_APPEND);
}
?>
Thanks for the help.
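A hedged sketch of one common approach: asXML() emits the XML declaration whenever it serialises a whole document, but DOMDocument::saveXML() called with a node argument serialises just that node. Converting the SimpleXML object with dom_import_simplexml() therefore yields the entry without a declaration, and the declaration can be written once at the start of the file. The helper name, sample entries, and output path below are hypothetical. Note that a file holding several root elements is still not well-formed XML, so the importing program may need the entries wrapped in a single root element.

```php
<?php
// Serialise a SimpleXML document without the leading XML declaration
function xmlWithoutDeclaration(SimpleXMLElement $element)
{
    $node = dom_import_simplexml($element);
    // saveXML() with a node argument outputs only that node, with no declaration
    return $node->ownerDocument->saveXML($node);
}

// Hypothetical entries standing in for the modified receipt.xml / journal.xml objects
$entries = array(
    '<VOUCHER><DATE>20240101</DATE></VOUCHER>',
    '<VOUCHER><DATE>20240102</DATE></VOUCHER>',
);
$file = sys_get_temp_dir() . '/entry-demo.xml'; // hypothetical output path

// Write the declaration once, then append each entry without its own declaration
file_put_contents($file, '<?xml version="1.0"?>' . "\n");
foreach ($entries as $entryXml) {
    $xml_object = simplexml_load_string($entryXml);
    file_put_contents($file, xmlWithoutDeclaration($xml_object) . "\n", FILE_APPEND);
}
echo file_get_contents($file);
```

In the question's loop, the file_put_contents() call would then append xmlWithoutDeclaration($xml_object) instead of $xml_object->asXML().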
The following code scrapes a list of links from a given web page and then passes them to another script that scrapes the text from those links and writes the data to a CSV document. The code runs perfectly on localhost (WampServer, PHP 5.5) but fails horribly when deployed on the domain.
You can check out the script at http://miskai.tk/ANOFM/csv.php .
Also, file_get_html and cURL are both enabled on the server.
<?php
header('Content-Type: application/excel');
header('Content-Disposition: attachment; filename="Mehedinti.csv"');
include_once 'simple_html_dom.php';
include_once 'csv.php';
$urls = scrape_main_page();
function scraping($url) {
// create HTML DOM
$html = file_get_html($url);
$ret = array(); // return an empty array, not null, if the page could not be loaded
// get article block
if ($html && is_object($html) && isset($html->nodes)) {
foreach ($html->find('/html/body/table') as $article) {
// get title
$item['titlu'] = trim($article->find('/tbody/tr[1]/td/div', 0)->plaintext);
// get body
$item['tr2'] = trim($article->find('/tbody/tr[2]/td[2]', 0)->plaintext);
$item['tr3'] = trim($article->find('/tbody/tr[3]/td[2]', 0)->plaintext);
$item['tr4'] = trim($article->find('/tbody/tr[4]/td[2]', 0)->plaintext);
$item['tr5'] = trim($article->find('/tbody/tr[5]/td[2]', 0)->plaintext);
$item['tr6'] = trim($article->find('/tbody/tr[6]/td[2]', 0)->plaintext);
$item['tr7'] = trim($article->find('/tbody/tr[7]/td[2]', 0)->plaintext);
$item['tr8'] = trim($article->find('/tbody/tr[8]/td[2]', 0)->plaintext);
$item['tr9'] = trim($article->find('/tbody/tr[9]/td[2]', 0)->plaintext);
$item['tr10'] = trim($article->find('/tbody/tr[10]/td[2]', 0)->plaintext);
$item['tr11'] = trim($article->find('/tbody/tr[11]/td[2]', 0)->plaintext);
$item['tr12'] = trim($article->find('/tbody/tr[12]/td/div', 0)->plaintext);
$ret[] = $item;
}
// clean up memory
$html->clear();
unset($html);
}
return $ret;
}
$output = fopen("php://output", "w");
foreach ($urls as $url) {
$ret = scraping($url);
foreach($ret as $v){
fputcsv($output, $v);}
}
fclose($output);
exit();
Second file:
<?php
function get_contents($url) {
// We could just use file_get_contents but using curl makes it more future-proof (setting a timeout for example)
$ch = curl_init($url);
curl_setopt_array($ch, array(CURLOPT_RETURNTRANSFER => true,));
$content = curl_exec($ch);
curl_close($ch);
return $content;
}
function scrape_main_page() {
set_time_limit(300);
libxml_use_internal_errors(true); // Prevent DOMDocument from spraying errors onto the page and hide those errors internally ;)
$html = get_contents("http://lmvz.anofm.ro:8080/lmv/index2.jsp?judet=26");
$dom = new DOMDocument();
$dom->loadHTML($html);
// var_dump($html); // debugging only: a die() here would stop the function before any URLs are returned
$xpath = new DOMXPath($dom);
$results = $xpath->query("//table[@width=\"645\"]/tr");
$all = array();
$urls = array(); // so an empty result still returns an array
//var_dump($results);
for($i = 1; $i < $results->length; $i++) {
$tr = $results->item($i);
$id = $tr->childNodes->item(0)->textContent;
$requesturl = "http://lmvz.anofm.ro:8080/lmv/detalii.jsp?UNIQUEJVID=" . urlencode($id) .
"&judet=26";
$details = scrape_detail_page($requesturl);
$all[] = $id;
}
foreach($all as $xtr) {
$urls[] = "http://lmvz.anofm.ro:8080/lmv/detalii.jsp?UNIQUEJVID=" . $xtr .
"&judet=26";
}
return $urls;
}
// scrape_main_page(); // not needed here: the first file already calls it after including this file
Yes, the problem here is probably your server's PHP configuration (php.ini). Make sure the server has the cURL extension enabled and allow_url_fopen switched on. If not, consider running your own Linux server.
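A hedged sketch for checking that on the live server: scraperEnvironmentReport() reports whether cURL and DOM are loaded and whether allow_url_fopen is on (as far as I know, simple_html_dom's file_get_html() uses file_get_contents() internally, so it depends on that setting). get_contents_verbose() is a hypothetical variant of the question's get_contents() that throws with curl_error() instead of silently returning false, which makes a blocked or timed-out request visible. The timeout values are arbitrary.

```php
<?php
// Report whether the features the scraper relies on are available on this server
function scraperEnvironmentReport()
{
    return array(
        'php_version'     => PHP_VERSION,
        'curl_loaded'     => extension_loaded('curl'),
        'allow_url_fopen' => (bool) ini_get('allow_url_fopen'),
        'dom_loaded'      => extension_loaded('dom'),
    );
}

// Hypothetical variant of get_contents() that reports why a cURL request failed
function get_contents_verbose($url)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => 10, // seconds to wait for the connection
        CURLOPT_TIMEOUT        => 30, // seconds for the whole request
    ));
    $content = curl_exec($ch);
    if ($content === false) {
        // e.g. a timeout or a refused connection if outgoing traffic is blocked
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL request to $url failed: $error");
    }
    curl_close($ch);
    return $content;
}

print_r(scraperEnvironmentReport());
```

Running this once on the server and once on localhost, then calling get_contents_verbose() with the lmvz.anofm.ro URL, should show which step differs.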
I am trying to generate an RSS feed on my site using the code below. The rss is appearing but I am having two issues:
When the feed shows on my page, the images do not show up; instead, the img tag itself appears as text on the page, like this: <img src="http://graphics8.nytimes.com/images/2011/11/18/movies/18RDP_GARBO/18RDP_GARBO-thumbStandard.jpg" border="0" height="75" width="75" hspace="4" align="left">
How do I limit the amount of articles that appear on my site?
Here is the link to the RSS: Spy RSS FEED
Here is the code I am using:
<?php
$insideitem = false;
$tag = "";
$title = "";
$description = "";
$link = "";
$locations = array('http://topics.nytimes.com/topics/reference/timestopics/subjects/e/espionage/index.html?rss=1');
srand((float) microtime() * 10000000); // seed the random gen
$random_key = array_rand($locations);
function startElement($parser, $name, $attrs) {
global $insideitem, $tag, $title, $description, $link;
if ($insideitem) {
$tag = $name;
} elseif ($name == "ITEM") {
$insideitem = true;
}
}
function endElement($parser, $name) {
global $insideitem, $tag, $title, $description, $link;
if ($name == "ITEM") {
printf("<dt><b><a href='%s' target=new>%s</a></b></dt>",
trim($link),htmlspecialchars(trim($title)));
printf("<dt>%s</dt><br><br>",htmlspecialchars(trim($description)));
$title = "";
$description = "";
$link = "";
$insideitem = false;
}
}
function characterData($parser, $data) {
global $insideitem, $tag, $title, $description, $link;
if ($insideitem) {
switch ($tag) {
case "TITLE":
$title .= $data;
break;
case "DESCRIPTION":
$description .= $data;
break;
case "LINK":
$link .= $data;
break;
}
}
}
$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
$fp = fopen($locations[$random_key], 'r')
or die("Error reading RSS data.");
while ($data = fread($fp, 4096))
xml_parse($xml_parser, $data, feof($fp))
or die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser)));
fclose($fp);
xml_parser_free($xml_parser);
?>
In endElement(), when outputting the feed content, it calls printf("<dt>%s</dt><br><br>",htmlspecialchars(trim($description)));
If you remove the htmlspecialchars() call, the images and other HTML should display properly, because < is no longer converted to &lt; and so on.
Given that code, there is no built-in way to limit the number of items. The New York Times may offer a query-string option that restricts the number of results, but I am not sure about that.
A quick fix would be to add a global variable called $numShown (or something like that). In the ITEM branch of endElement(), increment it, then check whether it is above some value, and if so return before any of the printf calls that output the feed item.
<?php
function endElement($parser, $name) {
global $insideitem, $tag, $title, $description, $link, $numShown;
if ($name == "ITEM") {
$numShown++;
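if ($numShown > 5) {
// more than 5 items seen: reset the item state and skip printing
$title = "";
$description = "";
$link = "";
$insideitem = false;
return;
}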
printf ( "<dt><b><a href='%s' target=new>%s</a></b></dt>", trim ( $link ), htmlspecialchars ( trim ( $title ) ) );
printf ( "<dt>%s</dt><br><br>", trim ( $description ) );
$title = "";
$description = "";
$link = "";
$insideitem = false;
}
}
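For comparison, a hedged sketch of the same limit using SimpleXML (the approach suggested further down for the encoding question) instead of the event-based parser. printFeedItems() is a hypothetical helper; $rss would be the feed text, for example from file_get_contents() on the NYT URL. It stops the loop once $limit items have been printed, and it echoes the description without htmlspecialchars() so the <img> markup renders, which means the page trusts whatever HTML the feed contains.

```php
<?php
// Print at most $limit items of an RSS feed and return how many were printed
function printFeedItems($rss, $limit)
{
    $sxml = simplexml_load_string($rss);
    if ($sxml === false) {
        return 0;
    }
    $shown = 0;
    foreach ($sxml->channel->item as $item) {
        if ($shown >= $limit) {
            break; // limit reached
        }
        printf("<dt><b><a href='%s' target=new>%s</a></b></dt>",
            htmlspecialchars(trim((string) $item->link), ENT_QUOTES),
            htmlspecialchars(trim((string) $item->title), ENT_QUOTES));
        // description printed as-is so embedded <img> tags render
        printf("<dt>%s</dt><br><br>", trim((string) $item->description));
        $shown++;
    }
    return $shown;
}
```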
From a PHP script I'm downloading an RSS feed like this:
$fp = fopen('http://news.google.es/news?cf=all&ned=es_ve&hl=es&output=rss','r')
or die('Error reading RSS data.');
The feed is a Spanish news feed. After downloading the file, I parse all the info into one variable that holds only the content of the <description> tag of every <item>. The issue is that when I echo the variable, all the information has an HTML encoding like:
echo($result); // this prints: el ministerio pãºblico investigarã¡ la publicaciã³n en la primera pã¡gina
I could write a HUGE switch statement that searches for every character and replaces it with the corresponding one, e.g. ã¡ with á, and so on, but isn't there a way to do this with a single function? Or, even better, is there a way to download the content into $fp without the HTML encoding? Thanks!
Actual code:
<?php
$acumula="";
$insideitem = false;
$tag = '';
$title = '';
$description = '';
$link = '';
function startElement($parser, $name, $attrs) {
global $insideitem, $tag, $title, $description, $link;
if ($insideitem) {
$tag = $name;
} elseif ($name == 'ITEM') {
$insideitem = true;
}
}
function endElement($parser, $name) {
global $insideitem, $tag, $title, $description, $link, $acumula;
if ($name == 'ITEM') {
$acumula = $acumula . (trim($title)) . "<br>" . (trim($description));
$title = '';
$description = '';
$link = '';
$insideitem = false;
}
}
function characterData($parser, $data) {
global $insideitem, $tag, $title, $description, $link;
if ($insideitem) {
switch ($tag) {
case 'TITLE':
$title .= $data;
break;
case 'DESCRIPTION':
$description .= $data;
break;
case 'LINK':
$link .= $data;
break;
}
}
}
$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, 'startElement', 'endElement');
xml_set_character_data_handler($xml_parser, "characterData");
$fp = fopen('http://news.google.es/news?cf=all&ned=es_ve&hl=es&output=rss','r')
or die('Error reading RSS data.');
while ($data = fread($fp, 4096)) {
xml_parse($xml_parser, $data, feof($fp))
or die(sprintf('XML error: %s at line %d',
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser)));
}
//echo $acumula;
fclose($fp);
xml_parser_free($xml_parser);
echo($acumula); // THIS IS $RESULT!
?>
EDIT
Since you're already using the XML parser, you're guaranteed the encoding is UTF-8.
If your page is encoded in ISO-8859-1, or even ASCII, you can do this to convert:
$result = mb_convert_encoding($result, "HTML-ENTITIES", "UTF-8");
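The HTML-ENTITIES pseudo-encoding used above is deprecated as of PHP 8.2. A sketch of an alternative, assuming $result holds UTF-8 text: mb_encode_numericentity() with the map below converts every non-ASCII character to a numeric entity and leaves ASCII text, including the <br> tags the question adds, untouched. The helper name is hypothetical. Often the simpler fix is to tell the browser the page is UTF-8, e.g. header('Content-Type: text/html; charset=UTF-8'), which avoids the "ã¡"-style output without changing the string at all.

```php
<?php
// Convert every non-ASCII character of a UTF-8 string to a numeric entity (&#NNN;),
// leaving ASCII text and tags such as <br> untouched
function utf8ToNumericEntities($text)
{
    // map: code points 0x80..0x10FFFF, offset 0, mask 0x1FFFFF
    $map = array(0x80, 0x10FFFF, 0, 0x1FFFFF);
    return mb_encode_numericentity($text, $map, 'UTF-8');
}

echo utf8ToNumericEntities("la publicación<br>el ministerio público"), "\n";
// prints: la publicaci&#243;n<br>el ministerio p&#250;blico
```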
Use a library that handles this for you, e.g. the DOM extension or SimpleXML. Example:
$d = new DOMDocument();
$d->load('http://news.google.es/news?cf=all&ned=es_ve&hl=es&output=rss');
//now all the data you get will be encoded in UTF-8
Example with SimpleXML:
$url = 'http://news.google.es/news?cf=all&ned=es_ve&hl=es&output=rss';
if ($sxml = simplexml_load_file($url)) {
echo htmlspecialchars($sxml->channel->title); //UTF-8
}
You can load the feed with PHP's DOMDocument, which decodes the HTML entities for you, and use PHP's encoding conversion functions, such as mb_convert_encoding() or iconv(), to change the encoding of the resulting string.