I succeeded in converting an XML file into an HTML one, but the output file is a mess. I would like an HTML file that mirrors the structure of the XML file (including clickable links).
My code:
public function SitemapHTML() {
$dom = new DomDocument();
$dom->formatOutput = TRUE;
$dom->load('sitemap.xml');
$data = $dom->getElementsByTagName('loc');
echo '<!DOCTYPE html>';
echo '<HTML>';
echo '<HEAD>';
echo '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />';
echo '</HEAD>';
echo '<BODY>';
echo ("<table>");
foreach ($data as $node)
{
echo '<tr>';
$thisurl = $node->textContent;
echo ("<td>" .
'' . $thisurl . '' .
"</td>");
echo '</tr>';
}
echo ("</table>");
echo '</BODY>';
echo '</HTML>';
$dom->saveHTMLFile('ok.html');
}
HTML output file :
I have looked all over Google and Stack Overflow and tried many things, but nothing worked.
I would like an HTML file that has the structure of a common XML sitemap file.
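One possible direction, as a minimal sketch rather than a definitive fix: build the HTML as a string from the `loc` entries and write that string to disk, instead of echoing markup and then calling saveHTMLFile() on the XML document (which serializes the loaded sitemap, not the echoed page). The function name `sitemapToHtml` is hypothetical, and the sketch assumes the sitemap uses `loc` elements as in the code above.

```php
<?php
// Hypothetical helper: turns sitemap XML into an HTML table of clickable links.
function sitemapToHtml(string $xml): string
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        throw new InvalidArgumentException('Could not parse sitemap XML');
    }

    $rows = '';
    foreach ($dom->getElementsByTagName('loc') as $node) {
        // textContent is decoded text, so escape it again before putting it in HTML.
        $url = htmlspecialchars(trim($node->textContent), ENT_QUOTES, 'UTF-8');
        $rows .= "<tr><td><a href=\"$url\">$url</a></td></tr>\n";
    }

    return "<!DOCTYPE html>\n<html>\n"
         . "<head><meta charset=\"UTF-8\"><title>Sitemap</title></head>\n"
         . "<body>\n<table>\n$rows</table>\n</body>\n</html>\n";
}
```

A call such as file_put_contents('ok.html', sitemapToHtml(file_get_contents('sitemap.xml'))) would then write the page, assuming both files live next to the script.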
I've been using simplexml_load_file to parse XML from a URL. However, the file is over 100 MB, and instead of loading only the nodes I need, the script loads the whole XML file before any nodes are extracted and parsed, which results in a page timeout.
I'm using the following code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Boutique</title>
</head>
<body>
<?php header ('Content-type: text/html; charset=UTF-8'); ?>
<!-- --><link rel="stylesheet" href="/va/artigos-complexos/afilio/afilio-vitrine.css" type="text/css" />
<div><p class="ofertasdotopovitrine">Conheça nossas super ofertas!</p>
</div>
<div class="mainproductebayfloatright-bottom">
<?php
function parse($url, $offset = 1, $limit = -1)
{
$xml = simplexml_load_file($url);
$limitCriteria = '';
if ($limit > 0) {
$limitCriteria = 'and position() <= ' . ((int)$offset + (int)$limit + 1);
}
$products = array();
$path = sprintf('//produto[position() >= %s %s]', (int)$offset, $limitCriteria);
foreach($xml->xpath($path) as $product) {
$products[] = array(
'nome' => $product->nome,
'preco_promocao' => $product->preco_promocao,
'description' => $product->descricao,
'link_imagem' => $product->link_imagem,
'link_produto' => $product->link_produto,
'preco_normal' => $product->preco_normal,
'parcelas' => $product->parcelas,
'vl_parcelas' => $product->vl_parcelas
);
}
return $products;
}
//XML GOES HERE
$products = parse('http://v2.afilio.com.br/aff/aff_get_boutique.php?boutiqueid=37930-895987&token=53e355b0a09ea0.74300807&progid=1010&format=XML', 5, 5);
?>
<?php
foreach ($products as $product) {
print '<div class="aroundebay"><div id="aroundebay2">';
/* */
print '<div class="titleebay"><a target="_blank" rel="nofollow" href="'. $product['link_produto'] . '">' . $product['nome'] . '"</a></div>';
print '<div class="mainproduct"><a target="_blank" rel="nofollow" href="' . $product['link_produto'] . '"><img style="height:120px" src="' . $product['link_imagem'] . '"/><br/>';
//print "De:; R$". $preco_normal . "<br/>";
print '<span>Apenas R$' . $product['preco_promocao'] . '<br/></a></span></div>';
//print "Em " . $parcelas . "x de : R$" . $vl_parcelas . "</a></span></div>";
print '</div></div>';
}
?>
</div>
</body>
</html>
The CSS is irrelevant.
The script works just fine when you use a smaller XML, such as this one:
http://v2.afilio.com.br/aff/aff_get_boutique.php?boutiqueid=37930-895835&token=53e355b0a09ea0.74300807&progid=1681&format=XML
Would it be possible to load only, for example, the first 10 nodes of the XML without having to load the whole file first?
I'm also open to suggestions that use other tools, such as jQuery.
Thanks in advance. You can also change the feed format to JSON or RSS: just change format=XML to format=JSON or format=RSS.
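One way to avoid building the whole tree is PHP's streaming XMLReader, which walks the document node by node and can stop early. The following is a minimal sketch under assumptions: the function name `parseFirstProducts` is hypothetical, only two of the fields are copied, and the feed is assumed to contain `produto` elements as in the code above.

```php
<?php
// Hypothetical helper: reads at most $limit <produto> elements, then stops.
function parseFirstProducts(string $source, int $limit = 10): array
{
    $reader = new XMLReader();
    if (!$reader->open($source)) {
        throw new RuntimeException("Could not open $source");
    }

    // Advance to the first <produto> element.
    $found = false;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'produto') {
            $found = true;
            break;
        }
    }

    $products = [];
    while ($found && count($products) < $limit) {
        // Only this one element is expanded into a SimpleXML object.
        $node = new SimpleXMLElement($reader->readOuterXml());
        $products[] = [
            'nome'         => (string) $node->nome,
            'link_produto' => (string) $node->link_produto,
        ];
        // Skip this element's subtree and jump to the next <produto>.
        $found = $reader->next('produto');
    }

    $reader->close();
    return $products;
}
```

Because reading stops once the limit is reached, the rest of the document is never parsed; how much of a remote file still gets downloaded depends on the stream and server, so this is worth measuring against the real feed.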
I am trying to parse a website's homepage and convert it into an XML file to be used as an API in my app.
So far this works. However, the parsed text contains the & (ampersand) character, which causes the XML parser to fail.
I am looking for a solution that doesn't use CDATA and doesn't output CDATA in the XML file.
I want to replace & with "and" at every occurrence. What phpQuery method should I use?
This causes an error in the browser because the text() method returns text with the & character in it.
<?php
require('phpQuery/phpQuery.php');
$all=phpQuery::newDocumentFileHTML('BPUT.htm', $charset = 'utf-8');
$links = $all['a.myblue'];
echo '<notice>';
foreach ($links as $link) {
echo '<text>';
echo pq($link)->text();
echo '</text>';
echo '<url>';
echo pq($link)->attr('href');
echo '</url>';
}
echo '</notice>';
?>
I do not want to use CDATA, as the CDATA tag is visible in the generated XML:
<?php
header('Content-type: text/xml');
require('phpQuery/phpQuery.php');
$all=phpQuery::newDocumentFileHTML('BPUT.htm', $charset = 'utf-8');
$links = $all['a.myblue'];
echo '<notice>';
foreach ($links as $link) {
echo '<text>';
echo "<![CDATA[";
echo pq($link)->text();
echo "]]>";
echo '</text>';
echo '<url>';
echo pq($link)->attr('href');
echo '</url>';
}
echo '</notice>';
?>
bumping for answers.
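A CDATA-free option is to escape the text before echoing it, so that & becomes the entity &amp;amp; and the XML stays well-formed. The sketch below uses PHP's built-in htmlspecialchars() with the ENT_XML1 flag; the helper names `xmlText` and `ampersandToAnd` are hypothetical. The second helper does the literal replacement with "and" that the question asks for, using plain str_replace() rather than any phpQuery method.

```php
<?php
// Hypothetical helper: escapes &, <, > and quotes for use inside XML.
function xmlText(string $text): string
{
    return htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper: replaces every ampersand with the word "and".
function ampersandToAnd(string $text): string
{
    return str_replace('&', 'and', $text);
}
```

In the loop above, that would mean echoing xmlText(pq($link)->text()) instead of the raw text. The href value can contain & as well (in query strings), so it likely needs the same escaping.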
I am working through using simple_html_dom.php to scrape and edit/manipulate the following:
<?php
include('simple_html_dom.php');
$_GET["name"];
$html_code="https://hwb.wales.gov.uk/Home/Pages/Home.aspx";
$html_code= $html_code.$name."/?lang=en";
echo $html_code;
$html = file_get_html($html_code);
echo "<html>";
echo "<head>";
echo "<meta charset='UTF-8'>";
echo "<title>PHP Test</title>";
echo " </head>";
echo " <body>";
foreach($html->find('#LatestNewsArts') as $e)
// Code here to prepend hwb.wales.gov.uk to <img src="/ >
echo $e->innertext . '<br>';
echo " </body>";
echo "</html>";
?>
I can extract the <div> that I'm looking for and echo it; that works fine.
Where I hit a wall (my PHP-fu is letting me down) is this: how do I intercept and edit the HTML inside the $e that I have scraped?
What I am looking to do is replace the <img src="/...."> tag with <img src="hwb.wales.gov.uk/....">
Assigning a new value to an attribute is easy: $elmt->attribute = NewValue
Here's working code that answers your question:
// includes Simple HTML DOM Parser
include "simple_html_dom.php";
$html_code="https://hwb.wales.gov.uk/Home/Pages/Home.aspx";
// => I don't know what $name stands for... It's up to you to change this code to suit your needs
//$html_code= $html_code.$name."/?lang=en";
echo $html_code;
$html = file_get_html($html_code);
echo "<html>";
echo "<head>";
echo "<meta charset='UTF-8'>";
echo "<title>PHP Test</title>";
echo " </head>";
echo " <body>";
// Loop through all divs with id="Article"
foreach($html->find('#LatestNewsArts #Article') as $e){
$url = "https://hwb.wales.gov.uk" . $e->find("img",0)->src;
// Set src to the new $url
$e->find("img",0)->src = $url;
// Print the outertext
echo $e->outertext . '<br>';
}
echo " </body>";
echo "</html>";
// Clear dom object
$html->clear();
unset($html);
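The code above prepends the host to every src it finds. As a slightly more defensive sketch, the prefixing can be moved into a small function that leaves absolute and protocol-relative URLs untouched. The name `absolutizeSrc` is hypothetical and not part of Simple HTML DOM.

```php
<?php
// Hypothetical helper: prefixes root-relative paths ("/img/a.png") with $base.
function absolutizeSrc(string $src, string $base = 'https://hwb.wales.gov.uk'): string
{
    $isRootRelative = strncmp($src, '/', 1) === 0 && strncmp($src, '//', 2) !== 0;
    if (!$isRootRelative) {
        // Already absolute, protocol-relative, or document-relative: leave as is.
        return $src;
    }
    return rtrim($base, '/') . $src;
}
```

Inside the loop it could be applied to each image, for example foreach ($e->find('img') as $img) { $img->src = absolutizeSrc($img->src); }, which also covers articles with no image or with several.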
I have the following code that opens a non-.txt file and reads it line by line. I want to create a text box (probably using HTML) to put the text I read into, but I have no idea how to do it.
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title></title>
</head>
<body>
<h2>testing</h2>
<?php
$currentFile = "pathtest.RET";
$fp = fopen($currentFile , 'r');
if (!$fp)
{
echo '<p> FILE NOT FOUND </p>';
exit;
}
else
{
echo '<p><strong> Arquivo:</strong> ['. $currentFile. '] </p>';
}
$numLinha = 0;
while (!feof($fp))
{
$linha = fgets($fp,300);
$numLinha = $numLinha + 1;
echo $linha;
}
fclose($fp);
$numLinha = $numLinha -1;
echo '<hr>linhas processadas: ' . $numLinha;
?>
</body>
</html>
I need the text box to be in a form so I can define the cols and rows, or is there a way to do it in PHP? Is there any way to send the content I read to another .php file, so I can freely restyle the HTML interface there?
Try echoing the lines inside a textarea element:
echo "<textarea>";
while (!feof($fp))
{
$linha = fgets($fp,300);
$numLinha = $numLinha + 1;
echo $linha;
}
echo "</textarea>";
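fgets() keeps the trailing newline of each line, so the lines already break inside the textarea. If you strip the newline (for example with rtrim()), you can add \n back to break lines: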
echo $linha . "\n";
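To cover the rows and cols part of the question, the textarea can be built as a string with those attributes set. This is a minimal sketch: the function name `fileToTextarea` and the default sizes are assumptions, and the content is passed through htmlspecialchars() so that characters such as < in the file do not break the markup.

```php
<?php
// Hypothetical helper: returns a <textarea> element containing the file's text.
function fileToTextarea(string $path, int $rows = 20, int $cols = 80): string
{
    $content = file_get_contents($path);
    if ($content === false) {
        throw new RuntimeException("Could not read $path");
    }
    $safe = htmlspecialchars($content, ENT_QUOTES, 'UTF-8');
    return "<textarea name=\"conteudo\" rows=\"$rows\" cols=\"$cols\">$safe</textarea>";
}
```

Echoing fileToTextarea('pathtest.RET') inside a form whose action points at another .php script would send the (possibly edited) text there as $_POST['conteudo']; the field name is likewise just an example.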
I'm using the following code to read an RSS feed. The problem is that the output has the wrong encoding. The code is in a file saved as UTF-8. Is there anything else I have to do to get it right?
$feed_url = "http://lujanenlinea.com.ar/noticias/feed";
$content = file_get_contents($feed_url);
$x = new SimpleXmlElement($content);
echo "<div class='rss-container'>";
echo "<ul class='rss-content'>";
foreach($x->channel->item as $entry) {
echo "<li><a href='$entry->link' title='$entry->title'>" . $entry->title . "</a></li>";
}
echo "</ul>";
echo "</div>";
Maybe try utf8_encode() or utf8_decode(). Note that both are deprecated as of PHP 8.2; mb_convert_encoding() is the usual replacement.
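SimpleXML hands strings back as UTF-8 regardless of the feed's declared encoding, so one plausible cause is that the page itself is served without a UTF-8 charset declaration. The sketch below is an assumption-laden illustration, not a diagnosis: the helper name `ensureUtf8` is hypothetical, ISO-8859-1 is only a guess for the fallback source encoding, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper: returns $text unchanged if it is valid UTF-8,
// otherwise converts it from $fallback (a guess) to UTF-8.
function ensureUtf8(string $text, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', $fallback);
}
```

Sending header('Content-Type: text/html; charset=UTF-8') before any output, or adding a meta charset tag to the page, is the other half of the check; ensureUtf8() only matters if the strings turn out not to be valid UTF-8 in the first place.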