I am trying to parse this feed with PHP. This is the structure of feed:
<item>
<title> ... TITLE ... </title>
<link> ... LINK .... </link>
<comments> .. COMMENTS .. </comments>
.... More tags here ....
<description><![CDATA[.. HTML ...]]></description>
</item>
This is my PHP code:
$rss = new DOMDocument();
$rss->loadHTML($feed_url);
foreach ($rss->getElementsByTagName('item') as $node) {
$description = $node->getElementsByTagName('description')->item(0)->nodeValue;
echo $description;
}
but it echoes nothing. I have tried using cURL but even then I can't echo the description tag.
What do I need to change in this code for it to work? Please let me know if I need to post the code of the alternate cURL method.
loadHTML() is meant for HTML content and expects the markup itself, not a URL. An RSS feed is XML, so to read it use one of the solutions below.
Method 1
$feed_url = 'http://thechive.com/feed/';
$rss = new DOMDocument();
$rss->load($feed_url);
foreach ($rss->getElementsByTagName('item') as $node) {
$description = $node->getElementsByTagName('description')->item(0)->nodeValue;
echo $description;
}
Method 2
$feed_url = 'http://thechive.com/feed/';
$content = file_get_contents($feed_url);
$x = new SimpleXmlElement($content);
foreach($x->channel->item as $entry) {
echo $entry->description;
}
Hope it will help you...
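If you want to check the parsing logic without hitting the network, here is a minimal sketch that runs Method 1 against an inline string. The sample feed and the $descriptions array are made up for illustration; loadXML() is simply the string counterpart of load().

```php
<?php
// Hypothetical sample feed; a real one would be fetched with $rss->load($feed_url).
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <item>
      <title>First post</title>
      <description><![CDATA[<p>Hello <b>world</b></p>]]></description>
    </item>
  </channel>
</rss>
XML;

$rss = new DOMDocument();
// loadXML() parses a string as XML; load() does the same for a file path or URL.
$rss->loadXML($xml);

$descriptions = [];
foreach ($rss->getElementsByTagName('item') as $node) {
    $found = $node->getElementsByTagName('description');
    // Guard against items that have no <description> element.
    if ($found->length > 0) {
        // nodeValue returns the CDATA content as a plain string.
        $descriptions[] = $found->item(0)->nodeValue;
    }
}

echo implode("\n", $descriptions), "\n";
```

The length check is optional, but it avoids calling nodeValue on null when an item has no description.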
I am trying to extract content from this feed. This is the code I am using:
$rss = new DOMDocument();
$rss->load($feed_url);
foreach ($rss->getElementsByTagName('entry') as $node) {
$description = $node->getElementsByTagName('content')->item(0)->nodeValue;
echo $description;
}
This, however, echoes plain text instead of HTML. Here is the structure of the feed:
<entry>
<title>.....</title>
<link rel=".." type="..." href="...." />
...... More tags ......
<content type="xhtml" xml:lang="en-US" xml:base="http://www.abeautifulmess.com/">
<div xmlns="http://www.w3.org/1999/xhtml"> HTML is all here.
</div>
</content>
It has not happened with any other feed. Is it because of type of content or something else?
Using DOMDocument::saveHTML will preserve the html formatting of the node. This will give you what you want:
$feed_url = 'http://feeds.feedburner.com/a_beautiful_mess?format=xml';
$rss = new DOMDocument();
$rss->load($feed_url);
foreach ($rss->getElementsByTagName('entry') as $node) {
$description = $node->getElementsByTagName('content')->item(0);
echo $rss->saveHTML($description);
}
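To see the difference between nodeValue and serializing the node, here is a small sketch on an inline Atom entry. The sample markup is invented, and it uses saveXML() on the child nodes instead of saveHTML() on the content element, which is only a variation that leaves out the wrapping content tag.

```php
<?php
// Hypothetical Atom fragment with XHTML content, modelled on the feed in the question.
$xml = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Hi <b>there</b></p></div></content>
  </entry>
</feed>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$content = $doc->getElementsByTagName('content')->item(0);

// nodeValue concatenates the text nodes only, so the tags are lost.
$text = $content->nodeValue;

// Serializing each child node keeps the markup.
$markup = '';
foreach ($content->childNodes as $child) {
    $markup .= $doc->saveXML($child);
}

echo $text, "\n", $markup, "\n";
```

With type="xhtml" the markup consists of real child elements, which is why nodeValue strips it. Feeds that put escaped HTML or CDATA inside the element return the HTML as text.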
I have XML in the following form that I want to parse with PHP (I can't change the format of the XML). Neither SimpleXML nor DOM seem to handle the different namespaces - can anyone give me sample code? The code below gives no results.
<atom:feed>
<atom:entry>
<atom:id />
<otherns:othervalue />
</atom:entry>
<atom:entry>
<atom:id />
<otherns:othervalue />
</atom:entry>
</atom:feed>
$doc = new DOMDocument();
$doc->load($url);
$entries = $doc->getElementsByTagName("atom:entry");
foreach($entries as $entry) {
$id = $entry->getElementsByTagName("atom:id");
echo $id;
$othervalue = $entry->getElementsByTagName("otherns:othervalue");
echo $othervalue;
}
I just want to follow up with an answer to my own awful question. Sorry.
Namespaces are irrelevant with DOM here (note the unprefixed tag names below) - I just wasn't getting the nodeValue from the element.
$doc = new DOMDocument();
$doc->load($url);
$feed = $doc->getElementsByTagName("entry");
foreach($feed as $entry) {
$id = $entry->getElementsByTagName("id")->item(0)->nodeValue;
echo $id;
$othervalue = $entry->getElementsByTagName("othervalue")->item(0)->nodeValue;
echo $othervalue;
}
You need to register your namespaces. Otherwise SimpleXML will ignore them.
I got this bit of code from the PHP manual and used it in my own project:
$xmlsimple = simplexml_load_string('YOUR XML');
$namespaces = $xmlsimple->getNamespaces(true);
$extensions = array_keys($namespaces);
foreach ($extensions as $extension )
{
$xmlsimple->registerXPathNamespace($extension,$namespaces[$extension]);
}
After that you can use xpath() on $xmlsimple with the registered prefixes.
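As a rough sketch of what that looks like end to end: the sample document, the namespace URIs and the short prefixes a and o are assumptions for illustration, since the question does not show the xmlns declarations.

```php
<?php
// Hypothetical document; the namespace URIs are assumptions, not from the question.
$xml = <<<XML
<atom:feed xmlns:atom="http://www.w3.org/2005/Atom" xmlns:otherns="http://example.com/otherns">
  <atom:entry>
    <atom:id>one</atom:id>
    <otherns:othervalue>A</otherns:othervalue>
  </atom:entry>
  <atom:entry>
    <atom:id>two</atom:id>
    <otherns:othervalue>B</otherns:othervalue>
  </atom:entry>
</atom:feed>
XML;

$feed = simplexml_load_string($xml);

// The prefixes used in XPath are local to this code; only the URIs must match the document.
$feed->registerXPathNamespace('a', 'http://www.w3.org/2005/Atom');

$pairs = [];
foreach ($feed->xpath('//a:entry') as $entry) {
    // children($uri) exposes the child elements that live in that namespace.
    $id = (string) $entry->children('http://www.w3.org/2005/Atom')->id;
    $other = (string) $entry->children('http://example.com/otherns')->othervalue;
    $pairs[$id] = $other;
}

print_r($pairs);
```

Using children() with the namespace URI avoids having to register prefixes again on every element returned by xpath().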
I have a little question ... this is my xml:
<?xml version="1.0" encoding="UTF-8"?>
<links>
<link>
<id>432423</id>
<href>http://www.google.ro</href>
</link>
<link>
<id>5432345</id>
<href>http://www.youtube.com</href>
</link>
<link>
<id>5443</id>
<href>http://www.yoursite.com</href>
</link>
</links>
How can I add another
<link>
<id>5443</id>
<href>http://www.yoursite.com</href>
</link>
??
I managed only to add a record to ROOT/LINKS -> LINK using xpath, and here is the code
<?php
$doc = new DOMDocument();
$doc->load( 'links.xml' );
$links= $doc->getElementsByTagName("links");
$xpath = new DOMXPath($doc);
$hrefs = $xpath->evaluate("/links");
$href = $hrefs->item(0);
$item = $doc->createElement("item");
/*HERE IS THE ISSUE...*/
$link = $doc->createElement("id","298312800");
$href->appendChild($link);
$link = $doc->createElement("link","www.anysite.com");
$href->appendChild($link);
$href->appendChild($item);
print $doc->save('links.xml');
echo "the link has been added!";
?>
Any help would be appreciated :D
$doc = new DOMDocument();
// Setting formatOutput to true turns on XML formatting so the output is nicely indented;
// however, when you load an existing XML file you need to strip blank nodes for this to work
$doc->load('links.xml', LIBXML_NOBLANKS);
$doc->formatOutput = true;
// Get the root element "links"
$root = $doc->documentElement;
// Create new link element
$link = $doc->createElement("link");
// Create and add id to new link element
$id = $doc->createElement("id","298312800");
$link->appendChild($id);
// Create and add href to new link element
$href = $doc->createElement("href","www.anysite.com");
$link->appendChild($href);
// Append new link to root element
$root->appendChild($link);
print $doc->save('links.xml');
echo "the link has been added!";
XPath is used to locate nodes in an XML document, not to manipulate the tree. Try $dom->appendChild($new_link).
Does anyone have a PHP function that can grab all links inside a specific DIV on a remote site? Usage might be:
$links = grab_links($url,$divname);
It should return an array I can use. Grabbing links I can figure out, but I'm not sure how to restrict it to a specific div.
Thanks!
Scott
Check out PHP XPath. It will let you query a document for the contents of specific tags and so on. The example on the php site is pretty straightforward:
http://php.net/manual/en/simplexmlelement.xpath.php
The following example will grab all of the links (a elements) inside any DIV in a document:
$xml = new SimpleXMLElement($docAsString);
$result = $xml->xpath('//div//a');
You can use this on well-formed HTML files, not just XML.
Good XPath reference: http://msdn.microsoft.com/en-us/library/ms256086.aspx
In the past I have used the PHP Simple DOM library with success:
http://simplehtmldom.sourceforge.net/
Samples:
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find all images
foreach($html->find('img') as $element)
echo $element->src . '<br>';
// Find all links
foreach($html->find('a') as $element)
echo $element->href . '<br>';
I found something that seems to do what I wanted.
http://www.earthinfo.org/xpaths-with-php-by-example/
<?php
$html = new DOMDocument();
@$html->loadHTMLFile('http://www.bbc.com');
$xpath = new DOMXPath( $html );
$nodelist = $xpath->query( "//div[@id='news_moreTopStories']//a/@href" );
foreach ($nodelist as $n){
echo $n->nodeValue."\n";
}
// for images
echo "<br><br>";
$html = new DOMDocument();
@$html->loadHTMLFile('http://www.bbc.com');
$xpath = new DOMXPath( $html );
$nodelist = $xpath->query( "//div[@id='promo_area']//img/@src" );
foreach ($nodelist as $n){
echo $n->nodeValue."\n";
}
?>
I also tried the PHP DOM method and it seems faster...
http://w-shadow.com/blog/2009/10/20/how-to-extract-html-tags-and-their-attributes-with-php/
$html = file_get_contents('http://www.bbc.com');
//Create a new DOM document
$dom = new DOMDocument;
//Parse the HTML. The @ is used to suppress any parsing errors
//that will be thrown if the $html string isn't valid XHTML.
@$dom->loadHTML($html);
//Get all links. You could also use any other tag name here,
//like 'img' or 'table', to extract other tags.
$links = $dom->getElementById('news_moreTopStories')->getElementsByTagName('a');
//Iterate over the extracted links and display their URLs
foreach ($links as $link){
//Extract and show the "href" attribute.
echo $link->getAttribute('href'), '<br>';
}
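Putting the pieces together, a grab_links() helper along the lines of what was asked for could look roughly like this. It is only a sketch: it takes the HTML as a string instead of a URL (fetch it first with file_get_contents() or cURL), the sample markup is made up, and it assumes the div id contains no double quote.

```php
<?php
/**
 * Hypothetical helper: returns the href of every <a> inside the div
 * whose id attribute equals $divId.
 */
function grab_links(string $html, string $divId): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them for sloppy markup.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = [];
    // Assumption: $divId contains no double quote, so it can be embedded directly.
    foreach ($xpath->query('//div[@id="' . $divId . '"]//a/@href') as $attr) {
        $links[] = $attr->nodeValue;
    }
    return $links;
}

// Made-up sample page with two divs; only the links in "news" are wanted.
$sample = '<html><body><div id="menu"><a href="/home">Home</a></div>'
        . '<div id="news"><a href="/a">A</a><p><a href="/b">B</a></p></div></body></html>';

$links = grab_links($sample, 'news');
print_r($links);
```

libxml_use_internal_errors() is used here in place of the @ operator so that only parser warnings are silenced.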
My question is best phrased as:
Remove a child with a specific attribute, in SimpleXML for PHP
except that I'm not using SimpleXML.
I'm new to XML in PHP, so I may not be doing this the best way.
I have an XML file created with $dom->save($xml) for each individual user (they are not all placed in one file, for undisclosed reasons).
It starts with the XML declaration <?xml version="1.0"?> (I have no idea how to change it to anything else, but hopefully that's not the point).
<?xml version="1.0"?>
<details>
<person>name</person>
<data1>some data</data1>
<data2>some data</data2>
<data3>some data</data3>
<category id="0">
<categoryName>Cat 1</categoryName>
<categorydata1>some data</categorydata1>
</category>
<category id="1">
<categoryName>Cat 2</categoryName>
<categorydata1>some data</categorydata1>
<categorydata2>some data</categorydata2>
<categorydata3>some data</categorydata3>
<categorydata4>some data</categorydata4>
</category>
</details>
I want to remove the category that has a specific value in its id attribute, using the DOM classes in PHP, when a function is triggered by a remove button.
The following is a debug version of the function I'm trying to get to work. What am I doing wrong?
function CatRemove($myXML){
$xmlDoc = new DOMDocument();
$xmlDoc->load( $myXML );
$categoryArray = array();
$main = $xmlDoc->getElementsByTagName( "details" )->item(0);
$mainElement = $xmlDoc->getElementsByTagName( "details" );
foreach($mainElement as $details){
$currentCategory = $details->getElementsByTagName( "category" );
foreach($currentCategory as $category){
$categoryID = $category->getAttribute('id');
array_push($categoryArray, $categoryID);
if($categoryID == $_POST['categorytoremoveValue']) {
return $categoryArray;
}
}
}
$xmlDoc->save( $myXML );
}
The above always gives me an array of [0] => 0 when I move the return outside the if.
Is there a better way? I've tried using getElementById as well, but I have no idea how to make that work.
I would prefer not to use an attribute, though, if that would make things easier.
Ok, let’s try this complete example of use:
function CatRemove($myXML, $id) {
$xmlDoc = new DOMDocument();
$xmlDoc->load($myXML);
$xpath = new DOMXpath($xmlDoc);
$nodeList = $xpath->query('//category[@id="'.(int)$id.'"]');
if ($nodeList->length) {
$node = $nodeList->item(0);
$node->parentNode->removeChild($node);
}
$xmlDoc->save($myXML);
}
// test data
$xml = <<<XML
<?xml version="1.0"?>
<details>
<person>name</person>
<data1>some data</data1>
<data2>some data</data2>
<data3>some data</data3>
<category id="0">
<categoryName>Cat 1</categoryName>
<categorydata1>some data</categorydata1>
</category>
<category id="1">
<categoryName>Cat 2</categoryName>
<categorydata1>some data</categorydata1>
<categorydata2>some data</categorydata2>
<categorydata3>some data</categorydata3>
<categorydata4>some data</categorydata4>
</category>
</details>
XML;
// write test data into file
file_put_contents('untitled.xml', $xml);
// remove category node with the id=1
CatRemove('untitled.xml', 1);
// dump file content
echo '<pre>', htmlspecialchars(file_get_contents('untitled.xml')), '</pre>';
So you want to remove the category node with a specific id?
$node = $xmlDoc->getElementById("12345");
if ($node) {
$node->parentNode->removeChild($node);
}
You could also use XPath to get the node, for example:
$xpath = new DOMXpath($xmlDoc);
$nodeList = $xpath->query('//category[@id="12345"]');
if ($nodeList->length) {
$node = $nodeList->item(0);
$node->parentNode->removeChild($node);
}
I haven’t tested it but it should work.
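One caveat with the getElementById() variant: as far as I know it only finds attributes that are declared as IDs (through a DTD, xml:id, or setIdAttribute()), so on a plain XML file like the one in the question it tends to return null. A small sketch of the workaround, with made-up id values:

```php
<?php
$doc = new DOMDocument();
// Made-up sample; the ids are hypothetical stand-ins for the ones in the question.
$doc->loadXML('<details><category id="c0"/><category id="c1"/></details>');

// Without a DTD, "id" is an ordinary attribute, so nothing is found here.
$before = $doc->getElementById('c1');

// Mark the "id" attribute of every category as an ID-type attribute.
foreach ($doc->getElementsByTagName('category') as $category) {
    $category->setIdAttribute('id', true);
}

// Now the lookup succeeds and the node can be removed.
$after = $doc->getElementById('c1');
if ($after !== null) {
    $after->parentNode->removeChild($after);
}

$remaining = $doc->getElementsByTagName('category')->length;
echo $doc->saveXML();
```

If that feels like too much ceremony, the XPath version needs no ID declaration at all.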
Can you try with this modified version:
function CatRemove($myXML, $id){
$doc = new DOMDocument();
$doc->loadXML($myXML);
$xpath = new DOMXpath($doc);
$nodeList = $xpath->query("//category[@id='$id']");
foreach ($nodeList as $element) {
$element->parentNode->removeChild($element);
}
echo htmlentities($doc->saveXML());
}
It's working for me. Just adapt it to your needs. It's not intended to use as-is, but just a proof of concept.
You also have to remove the xml declaration from the string.
The above function, modified to remove an email address from a mailing list:
function CatRemove($myXML, $id) {
$xmlDoc = new DOMDocument();
$xmlDoc->load($myXML);
$xpath = new DOMXpath($xmlDoc);
$nodeList = $xpath->query('//subscriber[@email="'.$id.'"]');
if ($nodeList->length) {
$node = $nodeList->item(0);
$node->parentNode->removeChild($node);
}
$xmlDoc->save($myXML);
}
$xml = 'list.xml';
$to = $_POST['email']; // the user has already submitted their email address using a form
CatRemove($xml,$to);
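Since the email address comes straight from $_POST, a quote character in it would break the XPath expression (or let the user change it). XPath 1.0 has no escape character inside string literals, so one common workaround is a small quoting helper. The xpath_literal() function and the sample list below are hypothetical, just a sketch of the idea:

```php
<?php
/**
 * Hypothetical helper: turns an arbitrary string into an XPath 1.0 string literal.
 */
function xpath_literal(string $value): string
{
    if (strpos($value, "'") === false) {
        return "'" . $value . "'";
    }
    if (strpos($value, '"') === false) {
        return '"' . $value . '"';
    }
    // Both quote types present: join single-quoted pieces with a quoted apostrophe.
    $parts = explode("'", $value);
    return "concat('" . implode("', \"'\", '", $parts) . "')";
}

// Made-up mailing list with an address that contains an apostrophe.
$doc = new DOMDocument();
$doc->loadXML('<list><subscriber email="o\'brien@example.com"/><subscriber email="ann@example.com"/></list>');

$xpath = new DOMXPath($doc);
$email = "o'brien@example.com";
$nodes = $xpath->query('//subscriber[@email=' . xpath_literal($email) . ']');

// Copy the node list to an array first so removing nodes cannot disturb the iteration.
foreach (iterator_to_array($nodes) as $node) {
    $node->parentNode->removeChild($node);
}

$left = $doc->getElementsByTagName('subscriber')->length;
echo $doc->saveXML();
```

In CatRemove() this would replace the hand-built '"'.$id.'"' part of the query.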