PHP parses XML content without looping the hyperlink - php

I have an xml file structured like this:
<channel>
<title>abc</title>
<link>domain.com</link>
<description>Bla bla.</description>
<item>
<title>xyz </title>
<link>domain.com/</link>
<description>
<table border="1" width="100%"><tr><th colspan="2"></th><th>P</th><th>W</th><th>D</th><th>L</th><th>GF</th><th>GA</th><th>Dif</th><th>Pts</th></tr><tr><td width="7%">1</td><td width="27%"><a target="_blank" href="domain[dot]com/new-york/"/>New York</td><td width="7%"><center>12</center></td><td width="7%"><center>8</center></td><td width="7%"><center>2</center></td><td width="7%"><center>2</center></td><td width="7%"><center>17</center></td><td width="7%"><center>10</center></td><td width="7%"<center>+7</center></td><td width="7%"><center>26</center></td></tr><tr><td width="7%">2</td><td width="27%"><a target="_blank" href="domain[dot]com/lon-don/"/>London</td><td width="7%"><center>12</center></td><td width="7%"><center>6</center></td><td width="7%"><center>4</center></td><td width="7%"><center>2</center></td><td width="7%"><center>22</center></td><td width="7%"><center>12</center></td><td width="7%"><center>+10</center></td><td width="7%"><center>22</center></td></tr></table><br/>
</description>
I used this piece of code to parse the table data in PHP, and it worked:
$url = "link to the above xml file";
$xml = simplexml_load_file($url);
foreach($xml->channel->item as $item){
$desc = html_entity_decode((string)$item->description);
$descXML = simplexml_load_string('<desc>'.$desc.'</desc>');
$html = $descXML->table->asXML();
$html .= "<hr />";
echo $html;
}
However, the output also includes the hyperlinks from the table cells (domain[dot]com/new-york/ and domain[dot]com/lon-don/).
I would like to exclude the hyperlinks from the output, so that I get only the plain text, such as London or New York.
Thanks,

You are currently outputting the entire table XML with
$html = $descXML->table->asXML();
This contains all of the table's markup. If you only want some of the table data, you need to process it further to extract that data...
$xml = simplexml_load_file($url);
foreach($xml->item as $item){
$desc = html_entity_decode((string)$item->description);
$descXML = simplexml_load_string('<desc>'.$desc.'</desc>');
// Loop over each row of the table
foreach ( $descXML->table->tr as $row ) {
// If there are td elements
if ( isset($row->td) ) {
// Extract the value from the second td element, convert to a string and trim the result
$html = trim((string)($row->td[1]));
$html .= "<hr />";
echo $html;
}
}
}
If you want all of the <tr> XML except the <a> tag, you can just unset it (assuming it will always be there)...
foreach ( $descXML->table->tr as $row ) {
// If there are td elements
if ( isset($row->td) ) {
unset($row->td[1]->a);
$html = $row->asXML(). "<hr />";
echo $html;
}
}
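If the description markup is not well-formed XML (the sample row above has an unclosed `<td` tag, for instance), simplexml_load_string() may refuse to load it. A minimal sketch of an alternative, assuming the decoded description is available as a string: DOMDocument::loadHTML() is more forgiving, and textContent drops the link markup. The function name extractTableText and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper: returns each table row as a list of plain-text cell values.
function extractTableText(string $html): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    foreach ($dom->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $cell) {
            // Keep only <td>/<th> element nodes
            if ($cell instanceof DOMElement && in_array($cell->nodeName, ['td', 'th'], true)) {
                // textContent keeps the text and drops the <a> tag and its href
                $cells[] = trim($cell->textContent);
            }
        }
        $rows[] = $cells;
    }
    return $rows;
}

// Sample markup invented for this sketch
$sample = '<table><tr><th>Team</th><th>Pts</th></tr>'
        . '<tr><td><a href="http://example.com/new-york/">New York</a></td>'
        . '<td><center>26</center></td></tr></table>';
print_r(extractTableText($sample));
```

Each inner array can then be joined or echoed however the page needs it.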


Fetch content of all div with same class using PHP Simple HTML DOM Parser

I am new to HTML DOM parsing with PHP. There is a page with several divs that have different content but share the same class. When I try to fetch the content, I only get one of them. Is it possible to get the content of all the divs that have the same class? Please have a look at my code:
<?php
include(__DIR__."/simple_html_dom.php");
$html = file_get_html('http://campaignstudio.in/');
echo $x = $html->find('h2[class="section-heading"]',1)->outertext;
?>
In your example code, you have
echo $x = $html->find('h2[class="section-heading"]',1)->outertext;
Because you are calling find() with a second parameter of 1, it returns only a single element: the one at index 1 (indexes are zero-based). If you instead find all of them, you can do whatever you need with them...
$list = $html->find('h2[class="section-heading"]');
foreach ( $list as $item ) {
echo $item->outertext . PHP_EOL;
}
The full code I've just tested is...
include(__DIR__."/simple_html_dom.php");
$html = file_get_html('http://campaignstudio.in/');
$list = $html->find('h2[class="section-heading"]');
foreach ( $list as $item ) {
echo $item->outertext . PHP_EOL;
}
which gives the output...
<h2 class="section-heading text-white">We've got what you need!</h2>
<h2 class="section-heading">At Your Service</h2>
<h2 class="section-heading">Let's Get In Touch!</h2>
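For comparison, here is a rough sketch of the same idea using only PHP's built-in DOM extension instead of simple_html_dom. This is a swapped-in technique, not what the answer above tested; the function name findHeadingsByClass and the sample markup are invented for illustration, and the class name is assumed to contain no quote characters.

```php
<?php
// Hypothetical helper: outer HTML of every <h2> that carries the given class.
function findHeadingsByClass(string $html, string $class): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate imperfect markup
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Match the class as a whole word inside the space-separated class attribute
    $query = sprintf(
        '//h2[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $found = [];
    foreach ($xpath->query($query) as $node) {
        $found[] = $dom->saveHTML($node);
    }
    return $found;
}

// Sample markup invented for this sketch
$page = '<h2 class="section-heading text-white">One</h2>'
      . '<h2 class="other">Skip</h2>'
      . '<h2 class="section-heading">Two</h2>';
foreach (findHeadingsByClass($page, 'section-heading') as $heading) {
    echo $heading . PHP_EOL;
}
```

Like the find() call without an index, this returns every match, so the loop decides what to do with each one.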

Parsing HTML Table Data from XML with PHP

I am somewhat new with PHP, but can't really wrap my head around what I am doing wrong here given my situation.
Problem: I am trying to get the href of a certain HTML element within a string of characters inside an XML object/element via Reddit (if you visit this page, it would be the actual link of the video - not the reddit link but the external youtube link or whatever - nothing else).
Here is my code so far (code updated):
Update: Loop-mania! Got all of the hrefs, but am now trying to store them inside a global array to access a random one outside of this function.
function getXMLFeed() {
echo "<h2>Reddit Items</h2><hr><br><br>";
//$feedURL = file_get_contents('https://www.reddit.com/r/videos/.xml?limit=200');
$feedURL = 'https://www.reddit.com/r/videos/.xml?limit=200';
$xml = simplexml_load_file($feedURL);
//define each xml entry from reddit as an item
foreach ($xml -> entry as $item ) {
foreach ($item -> content as $content) {
$newContent = (string)$content;
$html = str_get_html($newContent);
foreach($html->find('table') as $table) {
$links = $table->find('span', '0');
//echo $links;
foreach($links->find('a') as $link) {
echo $link->href;
}
}
}
}
}
XML Code:
http://pasted.co/0bcf49e8
I've also included JSON if it can be done this way; I just preferred XML:
http://pasted.co/f02180db
That is pretty much all of the code. Though, here is another piece I tried to use with DOMDocument (scrapped it).
foreach ($item -> content as $content) {
$dom = new DOMDocument();
$dom -> loadHTML($content);
$xpath = new DOMXPath($dom);
$classname = "/html/body/table[1]/tbody/tr/td[2]/span[1]/a";
foreach ($dom->getElementsByTagName('table') as $node) {
echo $dom->saveHtml($node), PHP_EOL;
//$originalURL = $node->getAttribute('href');
}
//$html = $dom->saveHTML();
}
I can parse the table fine, but when it comes to getting certain element's values (nothing has an ID or class), I can only seem to get ALL anchor tags or ALL table rows, etc.
Can anyone point me in the right direction? Let me know if there is anything else I can add here. Thanks!
Added HTML:
I am specifically trying to extract <span>[link]</span> from each table/item.
http://pastebin.com/QXa2i6qz
The following code extracts the YouTube link from each entry's content.
function extract_youtube_link($xml) {
$entries = $xml['entry'];
$videos = [];
foreach($entries as $entry) {
$content = html_entity_decode($entry['content']);
preg_match_all('/<span><a href="([^"]*)">\[link\]/', $content, $matches);
if(!empty($matches[1][0])) {
$videos[] = array(
'entry_title' => $entry['title'],
'author' => preg_replace('/\/(.*)\//', '', $entry['author']['name']),
'author_reddit_url' => $entry['author']['uri'],
'video_url' => $matches[1][0]
);
}
}
return $videos;
}
$xml = simplexml_load_file('reddit.xml');
$xml = json_decode(json_encode($xml), true);
$videos = extract_youtube_link($xml);
foreach($videos as $video) {
echo "<p>Entry Title: {$video['entry_title']}</p>";
echo "<p>Author: {$video['author']}</p>";
echo "<p>Author URL: {$video['author_reddit_url']}</p>";
echo "<p>Video URL: {$video['video_url']}</p>";
echo "<br><br>";
}
The function returns a multidimensional array in which each element contains entry_title, author, author_reddit_url and video_url. Hope it helps you!
If you're looking for a specific element, you don't need to parse the whole thing. One way of doing it is to use the DOMXPath class and query the XML directly. The documentation should guide you through:
http://php.net/manual/es/class.domxpath.php
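As a rough illustration of the DOMXPath suggestion, the sketch below pulls the href of the anchor whose text is "[link]" out of one entry's HTML content. The function name extractLinkHref and the sample markup are assumptions made for this example; the real feed content may be structured differently.

```php
<?php
// Hypothetical helper: href of the anchor whose text is "[link]", or null if absent.
function extractLinkHref(string $html): ?string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate imperfect markup
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Select the href attribute of an <a> inside a <span> whose text is exactly [link]
    $nodes = $xpath->query('//span/a[normalize-space(.) = "[link]"]/@href');

    return $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;
}

// Sample content invented for this sketch
$content = '<table><tr><td><a href="https://www.reddit.com/user/someone">someone</a>'
         . ' <span><a href="https://example.com/video">[link]</a></span>'
         . ' <span><a href="https://www.reddit.com/r/videos/comments/x">[comments]</a></span>'
         . '</td></tr></table>';
echo extractLinkHref($content), PHP_EOL;
```

Collecting the return values into an array inside the entry loop would give a list to pick a random link from later.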

How can I export MySQL data to XML using PHP

The code below is meant to export data from a MySQL table as an XML file. I have tried several versions but am not getting the expected result. Please check and help me.
The result I currently get is
8sarathsarathernakulam423432washington9rahulrahulernakulam21212121newyork10aaaa3london11bbbb1newyork12cccc2washington13dddd3london
Code
<?php
require_once "classes/dbconnection-class.php";
if(isset($_POST['export'])){
header('Content-type: text/xml');
$xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
$root_element = "addressbook"; //fruits
$xml .= "<$root_element>";
$query = "SELECT AB.id, AB.name, AB.firstname, AB.street, AB.zipcode, AB.city_id, CI.city FROM address_book AS AB INNER JOIN city AS CI ON AB.city_id = CI.id";
$result = $mysqli->query($query);
if (!$result) {
die('Invalid query: ' . $mysqli->error);
}
while($result_array = $result->fetch_assoc()){
$xml .= "<address>";
foreach($result_array as $key => $value)
{
//$key holds the table column name
$xml .= "<$key>";
//embed the SQL data in a CDATA element to avoid XML entity issues
$xml .= "<![CDATA[$value]]>";
//and close the element
$xml .= "</$key>";
}
$xml.="</address>";
}
$xml .= "</$root_element>";
header ("Content-Type:text/xml");
//header('Content-Disposition: attachment; filename="downloaded.xml"');
echo $xml;
}
?>
Browser shows
<?xml version="1.0" encoding="UTF-8"?><addressbook><address><id><![CDATA[8]]></id><name><![CDATA[sarath]]></name><firstname><![CDATA[sarath]]></firstname><street><![CDATA[ernakulam]]></street><zipcode><![CDATA[42343]]></zipcode><city_id><![CDATA[2]]></city_id><city><![CDATA[washington]]></city></address><address><id><![CDATA[9]]></id><name><![CDATA[rahul]]></name><firstname><![CDATA[rahul]]></firstname><street><![CDATA[ernakulam]]></street><zipcode><![CDATA[2121212]]></zipcode><city_id><![CDATA[1]]></city_id><city><![CDATA[newyork]]></city></address><address><id><![CDATA[10]]></id><name><![CDATA[a]]></name><firstname><![CDATA[a]]></firstname><street><![CDATA[a]]></street><zipcode><![CDATA[a]]></zipcode><city_id><![CDATA[3]]></city_id><city><![CDATA[london]]></city></address><address><id><![CDATA[11]]></id><name><![CDATA[b]]></name><firstname><![CDATA[b]]></firstname><street><![CDATA[b]]></street><zipcode><![CDATA[b]]></zipcode><city_id><![CDATA[1]]></city_id><city><![CDATA[newyork]]></city></address><address><id><![CDATA[12]]></id><name><![CDATA[c]]></name><firstname><![CDATA[c]]></firstname><street><![CDATA[c]]></street><zipcode><![CDATA[c]]></zipcode><city_id><![CDATA[2]]></city_id><city><![CDATA[washington]]></city></address><address><id><![CDATA[13]]></id><name><![CDATA[d]]></name><firstname><![CDATA[d]]></firstname><street><![CDATA[d]]></street><zipcode><![CDATA[d]]></zipcode><city_id><![CDATA[3]]></city_id><city><![CDATA[london]]></city></address></addressbook>
When dealing with XML and HTML, the best approach is always to go through a parser.
In this particular situation, using a parser guarantees valid XML and clean, short code.
After defining the MySQL query, we create a new DOMDocument with version and encoding, then set its ->formatOutput to True so that the XML is printed in indented format:
$query = "SELECT AB.id, AB.name, AB.firstname, AB.street, AB.zipcode, AB.city_id, CI.city FROM address_book AS AB INNER JOIN city AS CI ON AB.city_id = CI.id";
$dom = new DOMDocument( '1.0', 'utf-8' );
$dom ->formatOutput = True;
Then, we create the root node and we append it to DOMDocument:
$root = $dom->createElement( 'addressbook' );
$dom ->appendChild( $root );
At this point, we execute the MySQL query and loop over the result rows with a while loop. For each row we create an empty <address> node, then loop over the row's fields with foreach. For each field we create an empty child node named after the field key, append the field value to it as CDATA, and append the child to the <address> node. At the end of each while iteration, the <address> node is appended to the root node:
$result = $mysqli->query( $query );
while( $row = $result->fetch_assoc() )
{
$node = $dom->createElement( 'address' );
foreach( $row as $key => $val )
{
$child = $dom->createElement( $key );
$child ->appendChild( $dom->createCDATASection( $val) );
$node ->appendChild( $child );
}
$root->appendChild( $node );
}
Now your XML is ready.
If you want to save it to a file, you can do so with:
$dom->save( '/Your/File/Path.xml' );
Otherwise, if you prefer to send it as XML, use this code:
header( 'Content-type: text/xml' );
echo $dom->saveXML();
exit;
If you instead want to output it in an HTML page, you can write this code:
echo '<pre>';
echo htmlentities( $dom->saveXML() );
echo '</pre>';
See more about DOMDocument
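To try the DOMDocument approach without a database, here is a self-contained sketch that builds the same structure from a plain array of rows. The function name rowsToXml and the sample rows are invented for illustration. It also swaps the CDATA sections for ordinary text nodes, which escape special characters automatically; that is a variation on the answer above, not the answer's own code.

```php
<?php
// Hypothetical helper: turns a list of associative rows into an XML string.
function rowsToXml(array $rows, string $rootName = 'addressbook', string $rowName = 'address'): string
{
    $dom = new DOMDocument('1.0', 'utf-8');
    $dom->formatOutput = true;
    $root = $dom->appendChild($dom->createElement($rootName));

    foreach ($rows as $row) {
        $node = $root->appendChild($dom->createElement($rowName));
        foreach ($row as $key => $value) {
            // Keys must be valid XML element names (true for typical column names)
            $child = $node->appendChild($dom->createElement($key));
            // A text node escapes &, < and > automatically
            $child->appendChild($dom->createTextNode((string) $value));
        }
    }
    return $dom->saveXML();
}

// Sample rows standing in for $result->fetch_assoc() output
$rows = [
    ['id' => 8, 'name' => 'sarath', 'city' => 'washington'],
    ['id' => 9, 'name' => 'Tom & Co', 'city' => 'newyork'],
];
echo rowsToXml($rows);
```

In the real script, the while loop over fetch_assoc() would take the place of the foreach over $rows.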
Go to the export page of your database in phpMyAdmin and select XML as the file format.
Replace
$xml .= "<![CDATA[$value]]>";
with
$xml .= $value;
If you want it formatted "nicely" in the browser, add an:
echo "<pre>";
before the:
echo $xml;
Please note this WILL BREAK the XML file, but it will look good in the browser.... if that is what you are after...
I would suggest using a library such as SimpleXMLElement to create XML documents.
$xml = new SimpleXMLElement("<?xml version=\"1.0\" encoding=\"UTF-8\" ?><{$root_element}></{$root_element}>");
while($result_array = $result->fetch_assoc()){
// Create one <address> element per row
$address = $xml->addChild("address");
foreach($result_array as $key => $value)
{
// Assigning through the property creates the child and escapes &, < and > in the value
$address->{$key} = (string)$value;
//No need to close the element
}
}
header('Content-type: text/xml');
print($xml->asXML());

Echo content between tags?

Is it possible to get and then echo the content in between tags using only PHP?
For instance. If this is the following HTML:
<td class="header subject">Text</td>
How can you get Text from inside the tags and then echo it?
I thought this would work:
<?
preg_match("'<td class=\"header subject\">(.*?)</td>'si", $source, $match);
if($match) echo "result=".$match[1];
?>
But the $source variable has to be the entire page.
Note: There is only one instance of the header subject class, so there shouldn't be a problem with multiple tags.
You should parse the text using the DOMDocument class, and grab the textContent of the element.
$html = '<td class="header subject">Text</td>';
$dom = new DOMDocument();
$dom->loadHTML( $html );
// Text
echo $dom->getElementsByTagName("td")->item(0)->textContent;
Or if you need to cycle through many td elements and only show the text of those that have the class value "header subject", you could do the following:
$tds = $dom->getElementsByTagName("td");
for ( $i = 0; $i < $tds->length; $i++ ) {
$currentTD = $tds->item($i);
$classAttr = $currentTD->attributes->getNamedItem("class");
if ( $classAttr && $classAttr->nodeValue === "header subject" ) {
echo $currentTD->textContent;
}
}
Demo: http://codepad.org/o1xqrnRS
Assuming your problem is because you don't know how to interpret the page, you might want to try this:
<?php
$lines = file("/path/to/file.html");
foreach($lines as $i => $line)
{
if (preg_match("'<td class=\"header subject\">(.*?)</td>'si", $line, $match))
{
echo "result=". $match[1];
}
}
?>

Removing wrapping HTML elements inside a RSS XML node

I have a fetch function that injects RSS content into a page for me. It returns XML containing the usual RSS elements such as title, link and description. The problem is that the returned description is a table with two tds, one containing an image and the other the text. I am not sure how to remove the table, the img and the tds so that I am left with only the text, using PHP and not JavaScript.
Any help is much appreciated.
<?php
require_once('rss_fetch.inc');
$url = 'http://www.domain.com/rss.aspx?typeid=0&imagesize=120&topcount=20';
if ( $url ) {
$rss = fetch_rss( $url );
//echo "Channel: " . $rss->channel['title'] . "<p>";
echo "<ul>";
foreach ($rss->items as $item) {
$href = $item['link'];
$title = $item['title'];
$description = $item['description'];
$pubdate = date('F dS, Y', strtotime($item['pubdate']));
echo "<li><h3>$title<em>$pubdate</em></h3>$description <p><a href='$href' target='_blank'>ادامه مطلب</a></p><br/></li>";
}
echo "</ul>";
}
?>
strip_tags() will do the job: it removes the table, td and img tags and leaves only the text.
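A minimal sketch of how that could look, assuming the description arrives as an HTML string. The function name descriptionToText and the sample markup are made up for illustration; the entity decoding and whitespace cleanup are optional extras on top of strip_tags().

```php
<?php
// Hypothetical helper: reduces an HTML description to plain text.
function descriptionToText(string $html): string
{
    // Remove every tag (table, tr, td, img, ...) but keep the text between them
    $text = strip_tags($html);
    // Turn entities such as &amp; back into characters
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse the whitespace left behind by the removed markup
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Sample description invented for this sketch
$description = '<table><tr><td><img src="http://example.com/a.jpg" /></td>'
             . '<td>Some news text &amp; more.</td></tr></table>';
echo descriptionToText($description), PHP_EOL;
```

In the loop from the question, this would be applied to $item['description'] before it is echoed.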
