parse and process HTML/XML/plain text page [duplicate] - php

This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 3 years ago.
I am creating a small PHP app that pulls data from a remote website. It is working great, but I would like to make it more user-friendly now.
I need to get a few specific items from the page. As far as I can tell, the page looks like an XML file when you view the source code, but it has no styling and appears as plain text, so I don't really know what to do.
The page I am trying to get looks like this:
<channel>
<name>data</name>
<id>data</id>
<img>data</img>
<auther>data</auther>
<mp3>data</mp3>
<bio>data</bio>
</channel>
<channel>
<name>data</name>
<id>data</id>
<img>data</img>
<auther>data</auther>
<mp3>data</mp3>
<bio>data</bio>
</channel>
<channel>
<name>data</name>
<id>data</id>
<img>data</img>
<auther>data</auther>
<mp3>data</mp3>
<bio>data</bio>
</channel>
<channel>
<name>data</name>
<id>data</id>
<img>data</img>
<auther>data</auther>
<mp3>data</mp3>
<bio>data</bio>
</channel>
I need to get all the data from each tag under the channel tag and keep it in the same order, so I can echo it back out onto my own page in the same way.
How could I do this? I tried using a regex with the following pattern:
$pattern = '<channel>
<name>(.*)</name>
<id>(.*)</id>
<img>(.*)</img>
<auther>(.*)</auther>
<mp3>(.*)</mp3>
<bio>(.*)</bio>
</channel>';
but that doesn't work. I really need the best and simplest way to do this.

// $str must be a single well-formed XML document, e.g. the page wrapped in one root element
$SimpleXMLElement = new SimpleXMLElement($str);
foreach ($SimpleXMLElement->children() as $Channel) {
    foreach ($Channel->children() as $Child) {
        echo $Child->getName() . ' = ' . (string) $Child . PHP_EOL;
    }
}
You can use SimpleXMLElement this way; it is very easy.

I would "sanitize" the incoming data and turn it into a valid XML document by wrapping it in a single surrounding tag (I call it channels). After that, you can parse the data using DOM:
// Sanitize input data. Make an XML document out of it
$xml = '<channels>';
$xml .= file_get_contents($url);
$xml .= '</channels>';
// Create a document
$doc = new DOMDocument();
$doc->loadXML($xml);
// Iterate through channel elements
foreach ($doc->getElementsByTagName('channel') as $channel) {
    echo $channel->getElementsByTagName('name')->item(0)->nodeValue . PHP_EOL;
    echo $channel->getElementsByTagName('id')->item(0)->nodeValue . PHP_EOL;
    // And so on ...
}
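To keep every field of every channel in its original order, the same wrapping idea can be extended as in the sketch below. It is only an illustration: the sample body, the parse_channels() helper name, and the field values are assumptions, and a real app would fetch the body from the remote page instead.

```php
<?php
// Stand-in for the remote page body; a real app would fetch it instead.
$body = <<<XML
<channel>
<name>First</name>
<id>1</id>
<mp3>first.mp3</mp3>
</channel>
<channel>
<name>Second</name>
<id>2</id>
<mp3>second.mp3</mp3>
</channel>
XML;

// Hypothetical helper: wrap the fragments in one root element, then parse.
function parse_channels(string $body): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML('<channels>' . $body . '</channels>')) {
        return [];
    }
    $channels = [];
    foreach ($doc->getElementsByTagName('channel') as $channel) {
        $fields = [];
        foreach ($channel->childNodes as $node) {
            // Skip the whitespace text nodes between the tags.
            if ($node->nodeType === XML_ELEMENT_NODE) {
                $fields[$node->nodeName] = $node->textContent;
            }
        }
        $channels[] = $fields;
    }
    return $channels;
}

$channels = parse_channels($body);
foreach ($channels as $fields) {
    foreach ($fields as $tag => $value) {
        echo $tag . ' = ' . $value . PHP_EOL;
    }
}
```

Because PHP arrays preserve insertion order, each channel's fields come back in the order they appear in the source.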

Related

Fetching the tags and contents of an xml document with DOM parser in PHP

Consider an XML file like this :
<title>sometitle</title>
<a>
<abc>content1</abc>
<xyz>content2</xyz>
<metadata>
<b>
<c>content3</c>
<d><attribute></d>
</b>
</metadata>
</a>
I use the code below to parse my file, and I get output such as:
title : abc
a:content1 content2 content 3
That is, it only parses the first-level tags and fails to descend into the subtags and get their values. Any help is much appreciated, since I am a complete newbie at this. So far this is what I have tried:
$xmlDoc = new DOMDocument();
$xmlDoc->load("somedoc.xml");
$x = $xmlDoc->documentElement;
foreach ($x->childNodes as $item)
{
    print $item->nodeName . " = " . $item->nodeValue . "<br>";
}
Check the link below for the PHP documentation on the hasChildNodes() method.
The page includes usage samples.
http://php.net/manual/en/domnode.haschildnodes.php
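One way to reach the nested tags is to recurse into each element's children and print only the leaves. The sketch below assumes a well-formed stand-in for somedoc.xml with a single root element; the walk() helper and the path format are hypothetical, not part of the DOM API.

```php
<?php
// Hypothetical, well-formed stand-in for somedoc.xml.
$source = '<a><abc>content1</abc><xyz>content2</xyz>'
        . '<metadata><b><c>content3</c></b></metadata></a>';

// Hypothetical helper: collect "path = value" lines for every leaf element.
function walk(DOMNode $node, string $path = ''): array
{
    $lines = [];
    $path = $path === '' ? $node->nodeName : $path . '/' . $node->nodeName;
    $hasElementChild = false;
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            // Only recurse into elements; text nodes are read via textContent.
            if ($child->nodeType === XML_ELEMENT_NODE) {
                $hasElementChild = true;
                $lines = array_merge($lines, walk($child, $path));
            }
        }
    }
    if (!$hasElementChild) {
        $lines[] = $path . ' = ' . trim($node->textContent);
    }
    return $lines;
}

$xmlDoc = new DOMDocument();
$xmlDoc->loadXML($source);
$lines = walk($xmlDoc->documentElement);
echo implode(PHP_EOL, $lines) . PHP_EOL;
```

The original loop prints nodeValue for first-level nodes only, and nodeValue of an element is the concatenated text of all its descendants, which is why the nested values appeared merged on one line.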

Parsing xml feed with cdata PHP SimpleXML [duplicate]

This question already has answers here:
How to parse CDATA HTML-content of XML using SimpleXML?
(2 answers)
Closed 8 years ago.
I am parsing an RSS feed into JSON using PHP, with the code below.
My JSON output contains the description data from each item element, but the title and link data are not being extracted.
The problem is either incorrect CDATA in the feed or my code not parsing it correctly.
The XML is fetched here:
$blog_url = 'http://www.blogdogarotinho.com/rssfeedgenerator.ashx';
$rawFeed = file_get_contents($blog_url);
$xml = simplexml_load_string($rawFeed, 'SimpleXMLElement', LIBXML_NOCDATA);
// step 2: extract the channel metadata
$articles = array();
// step 3: extract the articles
foreach ($xml->channel->item as $item) {
    $article = array();
    $article['title'] = (string)trim($item->title);
    $article['link'] = $item->link;
    $article['pubDate'] = $item->pubDate;
    $article['timestamp'] = strtotime($item->pubDate);
    $article['description'] = (string)trim($item->description);
    $article['isPermaLink'] = $item->guid['isPermaLink'];
    $articles[$article['timestamp']] = $article;
}
echo json_encode($articles);
I think you are just the victim of the browser hiding the tags. Let me explain:
Your input feed doesn't really have <![CDATA[ ]]> sections in it. The < and > characters are actually entity-encoded in the raw source of the RSS stream. Hit Ctrl+U on the RSS link in your browser and you will see:
<?xml version="1.0" encoding="utf-16"?>
<rss xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" version="2.0">
<channel>
<description>Blog do Garotinho</description>
<item>
<description>&lt;![CDATA[&lt;br&gt;
Fico impressionado com a hipocrisia e a falsidade de certos políticos....]]&gt;
</description>
<link>&lt;![CDATA[http://www.blogdogarotinho.com.br/lartigo.aspx?id=16796]]&gt;</link>
...
<title>&lt;![CDATA[A bancada dos caras de pau]]&gt;</title>
</item>
As you can see, the content of <title>, for example, starts with &lt;, which turns into a literal < when SimpleXML returns it for your JSON data.
Now, if you look at the printed JSON data in a browser, the browser will see the following:
"title":"<![CDATA[A bancada dos caras de pau]]>"
This will not be rendered, because the browser treats everything between < and > as a tag. The description seems to show up because it contains a <br> tag at some point, which ends the first "tag", so you can see the rest of the output.
If you hit Ctrl+U you should see the output printed as expected (I myself used a command-line PHP script and did not notice this at first).
Try this demo. There seems to be an empty "" after the "title":
http://codepad.viper-7.com/ZYpaS1
However, if I put htmlspecialchars() around the json_encode():
http://codepad.viper-7.com/1nHqym
they become "visible".
You could get rid of these markers by stripping them after parsing with a simple preg_replace():
function clean_cdata($str) {
    return preg_replace('#(^\s*<!\[CDATA\[|\]\]>\s*$)#sim', '', (string)$str);
}
This should take care of the CDATA markers if they are at the start or the end of the individual tags. You can call this inside the foreach() loop like this:
// ....
$article['title'] = clean_cdata($item->title);
// ....
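The effect can be checked without the live feed. The sketch below uses a small inline stand-in for the RSS source (an assumption, with the CDATA markers entity-encoded as described above) and compares the raw title with the cleaned one.

```php
<?php
// Stand-in for the feed: the CDATA markers are entity-encoded in the source.
$rawFeed = '<rss version="2.0"><channel><item>'
         . '<title>&lt;![CDATA[A bancada dos caras de pau]]&gt;</title>'
         . '</item></channel></rss>';

// Same helper as in the answer: strip a leading/trailing CDATA marker.
function clean_cdata($str) {
    return preg_replace('#(^\s*<!\[CDATA\[|\]\]>\s*$)#sim', '', (string)$str);
}

$xml = simplexml_load_string($rawFeed);
$item = $xml->channel->item[0];

// SimpleXML decodes the entities, so the markers arrive as literal text.
$rawTitle = (string)$item->title;
$cleanTitle = clean_cdata($item->title);

// htmlspecialchars() makes the raw value visible when viewed in a browser.
echo htmlspecialchars($rawTitle) . PHP_EOL;
echo $cleanTitle . PHP_EOL;
```

Escaping with htmlspecialchars() is only useful for inspecting the value in a browser; the JSON itself should carry the cleaned string.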

Simple xml returns no values when trying to access nodes

Hi guys, I am very new to the PHP world.
I am listening for a POST request that contains XML. When the XML is received, I need to access individual nodes. I am able to echo the full XML document but not individual nodes.
Currently I am just sending the data using Postman, a Chrome extension. There is no front-end code. Here is my XML:
<?xml version="1.0" encoding="UTF-8"?>
<job>
<job_ref>abc123</job_ref>
<job_title>Test Engineer</job_title>
</job>
And here is my PHP:
if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    $xml = file_get_contents('php://input');
    echo $xml;
    $xml = simplexml_load_file($xml);
    echo $xml->job_ref . "<br>";
    echo $xml->job_title . "<br>";
} else {
    die();
}
Any help would be amazing, as I am very stuck.
Many thanks
simplexml_load_file expects the path to an XML file, not the XML content itself. You have to use simplexml_load_string instead:
$xml = simplexml_load_string($xml);
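Since simplexml_load_string returns false when the body is not valid XML, it can help to check the result before reading nodes. The sketch below is one possible shape for that; the read_job() helper is hypothetical and the inline string stands in for the php://input body.

```php
<?php
// Stand-in for file_get_contents('php://input').
$body = '<?xml version="1.0" encoding="UTF-8"?>'
      . '<job><job_ref>abc123</job_ref><job_title>Test Engineer</job_title></job>';

// Hypothetical helper: returns the two fields, or null if the body is not XML.
function read_job(string $body): ?array
{
    // Collect parse errors internally instead of emitting PHP warnings.
    libxml_use_internal_errors(true);
    $xml = simplexml_load_string($body);
    if ($xml === false) {
        libxml_clear_errors();
        return null;
    }
    // Cast to string to get plain values rather than SimpleXMLElement objects.
    return [
        'job_ref'   => (string)$xml->job_ref,
        'job_title' => (string)$xml->job_title,
    ];
}

$job = read_job($body);
$bad = read_job('not xml');

if ($job !== null) {
    echo $job['job_ref'] . PHP_EOL . $job['job_title'] . PHP_EOL;
}
```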

Linked in xml response to php variables [duplicate]

This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 9 years ago.
I am getting this result from my LinkedIn connect script:
<person>
<email-address>xzenia1#gmail.com</email-address>
<picture-url>http://m3.licdn.com/mpr/mprx/0_UiHHf6SiF4yuBerHUkfUfkshFpomUIrHMbpBf5Iy4sOYk7FecL4XTLxtdAEl42AXsho9hGzDtRBl</picture-url>
</person>
This is the PHP call:
$xml_response = $linkedin->getProfile("~:(email-address,picture-url)");
How can I assign these values to separate PHP variables?
You can load your XML as a string with simplexml_load_string and then loop over it to get all the data:
$xml = simplexml_load_string($xml_response);
foreach ($xml as $key => $val)
{
    echo "$key=>$val<br>" . "\n";
}
This will output:
email-address=>xzenia1#gmail.com
picture-url=>http://m3.licdn.com/mpr/mprx/0_UiHHf6SiF4yuBerHUkfUfkshFpomUIrHMbpBf5Iy4sOYk7FecL4XTLxtdAEl42AXsho9hGzDtRBl
Try,
$xml = (array)simplexml_load_string($xml_response);
echo $email=$xml['email-address'];
echo $picture=$xml['picture-url'];
$xml = simplexml_load_string($linkedin->getProfile("~:(email-address,picture-url)"));
echo $xml->{'email-address'}[0] . "<br />";
echo $xml->{'picture-url'}[0];
SimpleXML doesn't allow - in plain property names, so use $xml->{'email-address'} instead of $xml->email-address (which PHP parses as a subtraction).
The index [0] on both nodes explicitly selects the first matching element, just in case the response ever contains the element more than once.
see it working: http://codepad.viper-7.com/dQQ6sa
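As a self-contained check of the brace syntax, the sketch below parses an inline response and assigns each value to its own variable. The sample address and URL are placeholders, not real LinkedIn output.

```php
<?php
// Placeholder response; a real one would come from $linkedin->getProfile(...).
$xml_response = '<person>'
              . '<email-address>user@example.com</email-address>'
              . '<picture-url>http://example.com/pic.jpg</picture-url>'
              . '</person>';

$xml = simplexml_load_string($xml_response);

// Hyphenated element names need the {'...'} property syntax.
// The string cast turns the SimpleXMLElement into a plain PHP string.
$email = (string)$xml->{'email-address'};
$picture = (string)$xml->{'picture-url'};

echo $email . PHP_EOL . $picture . PHP_EOL;
```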

Parse RSS Feed With Unique Elements

I have used PHP with SimpleXML to parse RSS using standard elements such as <title> and <pubDate> before. But how would I parse something custom to the feed, like <xCal:location> or <xCal:dtstart>, that uses an xCal data element?
Something like $item->xCal:dtstart will error out. How would I collect this data element?
A sample of a feed like this: http://www.trumba.com/calendars/vd.rss?mixin=236393%2c236288
Try it like this:
$feedUrl = 'http://www.trumba.com/calendars/vd.rss?mixin=236393%2c236288';
$rawFeed = file_get_contents($feedUrl);
$xml = new SimpleXmlElement($rawFeed);
$ns = $xml->getNamespaces(true);
//print_r($ns);
$xCal = $xml->channel->children($ns['xCal']);
echo ($xCal->version)."<br />";
foreach ($xml->channel->item as $item)
{
    //print_r($item);
    $itemxTrumba = $item->children($ns['x-trumba']);
    echo $itemxTrumba->masterid."<br />";
}
//print_r($xCal);
The "something custom" is an XML namespace. Search for existing answers regarding SimpleXML and namespaces.
Basically, what you need is the ->children() method: $item->children('xCal', true)->dtstart
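A minimal sketch of the ->children() approach on an inline feed follows. The feed content, the namespace URI bound to the xCal prefix, and the field values are assumptions made for illustration; the real feed declares its own namespace URIs.

```php
<?php
// Inline stand-in for the feed; the namespace URI here is an assumption.
$rawFeed = '<rss version="2.0" xmlns:xCal="urn:ietf:params:xml:ns:xcal">'
         . '<channel><item><title>Demo</title>'
         . '<xCal:dtstart>2013-01-01T10:00:00Z</xCal:dtstart>'
         . '<xCal:location>Main Hall</xCal:location>'
         . '</item></channel></rss>';

$xml = simplexml_load_string($rawFeed);

$events = [];
foreach ($xml->channel->item as $item) {
    // Second argument true: the first argument is a prefix, not a URI.
    $xCal = $item->children('xCal', true);
    $events[] = [
        'title'    => (string)$item->title,
        'start'    => (string)$xCal->dtstart,
        'location' => (string)$xCal->location,
    ];
}

foreach ($events as $event) {
    echo $event['title'] . ' @ ' . $event['location'] . ' ' . $event['start'] . PHP_EOL;
}
```

Looking elements up by prefix is convenient, but the prefix is chosen by the feed author; passing the namespace URI (as the getNamespaces() answer does) keeps working if the prefix changes.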
