I'm trying to parse an RSS feed in PHP for the first time. It seems to go fine until I actually try to display anything! This example is me trying to pull out four random organization names from the feed (I actually want to display more, but am keeping it simple here...)
$xml = file_get_contents('https://rss.myinterfase.com/rss/oxford_RSS_Jobs_xml.xml');
foreach($xml->Row as $job) {
$item[] = array(
'OrganizationName' => (string)$job->OrganizationName,
'job_JobTitle' => (string)$job->job_JobTitle,
'job_expiredate' => strtotime($job->job_expiredate),
'ExternalLink' => $job->ExternalLink
);
}
$rand_job = array_rand($item, 4);
$i=0;
echo '<ul>';
while($i<=3) {
echo '<li>';
echo $item[$i]['OrganizationName'];
echo '</li>';
$i++;
}
echo '</ul>'
What do I need to do differently? Thanks!
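file_get_contents() returns the feed as a plain string, so $xml->Row has nothing to iterate over. Load the feed into a SimpleXMLElement with simplexml_load_file($url) (or pass the string to simplexml_load_string()) instead.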
$url = 'https://rss.myinterfase.com/rss/oxford_RSS_Jobs_xml.xml';
$xml = simplexml_load_file($url);
foreach($xml->row as $job) { // element names are case-sensitive: use the exact name and full path from the feed (e.g. Row rather than row)
//..... your code
}
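Once the feed loads as an object, note that the loop in the question prints the first four items rather than four random ones, because $rand_job is never used. A minimal sketch of using the keys that array_rand() returns; the inline sample feed, its <Rows> root element and the organization names are assumptions standing in for the real feed:

```php
<?php
// Inline sample standing in for the real feed; the structure is assumed.
$sample = <<<XML
<Rows>
  <Row><OrganizationName>Org A</OrganizationName></Row>
  <Row><OrganizationName>Org B</OrganizationName></Row>
  <Row><OrganizationName>Org C</OrganizationName></Row>
  <Row><OrganizationName>Org D</OrganizationName></Row>
  <Row><OrganizationName>Org E</OrganizationName></Row>
</Rows>
XML;

$xml = simplexml_load_string($sample);

$items = [];
foreach ($xml->Row as $job) {
    // Cast to string so the array holds plain values, not SimpleXMLElement objects.
    $items[] = ['OrganizationName' => (string) $job->OrganizationName];
}

// array_rand() returns random KEYS, not values; never ask for more than exist.
$count = min(4, count($items));
$randomKeys = (array) array_rand($items, $count);

echo "<ul>\n";
foreach ($randomKeys as $key) {
    echo '<li>' . htmlspecialchars($items[$key]['OrganizationName']) . "</li>\n";
}
echo "</ul>\n";
```

The (array) cast is there because array_rand() returns a single key rather than an array when only one item is requested.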
I am somewhat new with PHP, but can't really wrap my head around what I am doing wrong here given my situation.
Problem: I am trying to get the href of a particular HTML element that sits inside a string within an XML element of Reddit's feed. If you visit this page, it is the actual link to the video: the external YouTube (or other) link, not the Reddit link, and nothing else.
Here is my code so far (code updated):
Update: Loop-mania! Got all of the hrefs, but am now trying to store them inside a global array to access a random one outside of this function.
function getXMLFeed() {
echo "<h2>Reddit Items</h2><hr><br><br>";
//$feedURL = file_get_contents('https://www.reddit.com/r/videos/.xml?limit=200');
$feedURL = 'https://www.reddit.com/r/videos/.xml?limit=200';
$xml = simplexml_load_file($feedURL);
//define each xml entry from reddit as an item
foreach ($xml -> entry as $item ) {
foreach ($item -> content as $content) {
$newContent = (string)$content;
$html = str_get_html($newContent);
foreach($html->find('table') as $table) {
$links = $table->find('span', '0');
//echo $links;
foreach($links->find('a') as $link) {
echo $link->href;
}
}
}
}
}
XML Code:
http://pasted.co/0bcf49e8
I've also included JSON if it can be done this way; I just preferred XML:
http://pasted.co/f02180db
That is pretty much all of the code. Though, here is another piece I tried to use with DOMDocument (scrapped it).
foreach ($item -> content as $content) {
$dom = new DOMDocument();
$dom -> loadHTML($content);
$xpath = new DOMXPath($dom);
$classname = "/html/body/table[1]/tbody/tr/td[2]/span[1]/a";
foreach ($dom->getElementsByTagName('table') as $node) {
echo $dom->saveHtml($node), PHP_EOL;
//$originalURL = $node->getAttribute('href');
}
//$html = $dom->saveHTML();
}
I can parse the table fine, but when it comes to getting a specific element's value (nothing has an ID or class), I can only seem to get ALL anchor tags or ALL table rows, etc.
Can anyone point me in the right direction? Let me know if there is anything else I can add here. Thanks!
Added HTML:
I am specifically trying to extract <span>[link]</span> from each table/item.
http://pastebin.com/QXa2i6qz
The following code extracts the YouTube link from the content of each entry.
function extract_youtube_link($xml) {
$entries = $xml['entry'];
$videos = [];
foreach($entries as $entry) {
$content = html_entity_decode($entry['content']);
// [^"]* stops at the closing quote, so the match cannot run on into later attributes
preg_match_all('/<span><a href="([^"]*)">\[link\]/', $content, $matches);
if(!empty($matches[1][0])) {
$videos[] = array(
'entry_title' => $entry['title'],
'author' => preg_replace('/\/(.*)\//', '', $entry['author']['name']),
'author_reddit_url' => $entry['author']['uri'],
'video_url' => $matches[1][0]
);
}
}
return $videos;
}
$xml = simplexml_load_file('reddit.xml');
$xml = json_decode(json_encode($xml), true);
$videos = extract_youtube_link($xml);
foreach($videos as $video) {
echo "<p>Entry Title: {$video['entry_title']}</p>";
echo "<p>Author: {$video['author']}</p>";
echo "<p>Author URL: {$video['author_reddit_url']}</p>";
echo "<p>Video URL: {$video['video_url']}</p>";
echo "<br><br>";
}
The function returns an array of arrays; each inner array has the keys entry_title, author, author_reddit_url and video_url. Hope it helps you!
If you're looking for a specific element, you don't need to walk the whole document yourself. One way to do it is to use the DOMXPath class and query the XML directly. The documentation should guide you through.
http://php.net/manual/es/class.domxpath.php .
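As a rough sketch of the DOMXPath approach applied to the HTML inside one <content> element: the sample markup and URLs below are assumptions modelled on the description in the question, not the real feed.

```php
<?php
// Sample of the HTML found inside one <content> element; the markup is assumed.
$html = '<table><tr><td><a href="https://www.reddit.com/r/videos/x">thumb</a></td>'
      . '<td>submitted by <a href="https://www.reddit.com/user/someone">/u/someone</a> '
      . '<span><a href="https://www.youtube.com/watch?v=abc123">[link]</a></span> '
      . '<span><a href="https://www.reddit.com/r/videos/comments/x">[comments]</a></span>'
      . '</td></tr></table>';

$dom = new DOMDocument();
// Collect libxml warnings internally instead of printing them for imperfect markup.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Select the href attribute of the anchor whose text is exactly "[link]".
$nodes = $xpath->query('//span/a[normalize-space(.) = "[link]"]/@href');

$videoUrls = [];
foreach ($nodes as $attr) {
    $videoUrls[] = $attr->nodeValue;
}

print_r($videoUrls);
```

Matching on the anchor text avoids relying on an absolute path. A path copied from a browser inspector can include tbody elements that the browser inserted and that are not present in the source markup.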
I have never used XML before and am trying to loop through the XML and display all the display names in the 'A Team'. The code I am using is outputting 10 zeros and not the names.
The code I am using is attached below along with the feed.
Any assistance is much appreciated!
feed: https://apn.apcentiaservices.co.uk/ContactXML/agentfeed?organisation=se724de89ca150f
<?php
$url = 'https://apn.apcentiaservices.co.uk/ContactXML/agentfeed?organisation=se724de89ca150f';
$html = "";
$xml = simplexml_load_file($url);
for($i = 0; $i < 10; $i++){
$title = $xml->organisation['APN']->brand['AllStar Psychics']->pool['A Team']->agent[$i]->display-name;
echo $title;
}
echo $html;
?>
This might get you the basics you asked for. I'm not sure it's exactly what you want, and I'm not that good at XPath.
$mydata = $xml->xpath('/organisation/brand/pool[@name="A Team"]//display-name');
foreach($mydata as $key=>$value){
echo('Name:' . $value .'<br>');
}
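The zeros in the question most likely come from ->display-name: PHP reads that as ->display minus a constant called name, which on older PHP versions evaluates to 0 (PHP 8 raises an error for the undefined constant instead). Also, ['APN'] on a SimpleXMLElement reads an attribute rather than selecting a child by its name. A minimal sketch of both fixes, assuming the feed nests organisation, brand, pool and agent elements with name attributes as shown (the sample names are made up):

```php
<?php
// Sample feed; the nesting and the "name" attributes are assumptions.
$sample = <<<XML
<organisation name="APN">
  <brand name="AllStar Psychics">
    <pool name="A Team">
      <agent><display-name>Alice</display-name></agent>
      <agent><display-name>Bob</display-name></agent>
    </pool>
    <pool name="B Team">
      <agent><display-name>Carol</display-name></agent>
    </pool>
  </brand>
</organisation>
XML;

$xml = simplexml_load_string($sample);

$names = [];
foreach ($xml->xpath('//pool[@name="A Team"]/agent') as $agent) {
    // Curly-brace syntax is needed because "display-name" contains a hyphen.
    $names[] = (string) $agent->{'display-name'};
}

echo implode(', ', $names), "\n";
```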
I'm not really that good at PHP/XML. I got this far using Google etc., but can't get any further.
I use
<?php
$counter=0;
foreach($proxml->test->children() as $test1) {
$counter++;
$name=$test1->Name ;
$test[$counter]="http://url".$name."/test?xml=1" ;
}
$xml=simplexml_load_file("$test[1]");
$xml1=simplexml_load_file("$test[2]");
$xml2=simplexml_load_file("$test[3]");
$xml3=simplexml_load_file("$test[4]");
$xml4=simplexml_load_file("$test[5]");
$xmls = array( $xml, $xml1, $xml2, $xml3, $xml4 );
echo $xmls ;
?>
Each test is a different URL that has XML information like a UNIX time-stamp.
I want to put all the XML files into an array, extract all the < Time-stamps > and sort them, then output each one together with the other information that belongs to that time-stamp, like the < name > etc.
It works for 1 URL, but I can't get it to work for multiple URLs together.
Try this:
<?php
$counter=0;
foreach($proxml->test->children() as $test1) {
$counter++;
$name=$test1->Name ;
$test[ $counter ] = "http://url" . $name . "/test?xml=1";
}
foreach( $test as $url ) {
$xml = simplexml_load_file( $url );
// will display the xml, you can read it and get required values
echo $xml->asXML();
}
?>
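To go one step further and sort by time-stamp across all the files, one option is to collect a plain array per entry and sort it with usort(). This is only a sketch: the inline documents stand in for the remote URLs, and the element names test, name and timestamp are assumptions, since the real feed structure is not shown.

```php
<?php
// Two inline documents stand in for the remote URLs; element names are assumed.
$documents = [
    '<tests><test><name>beta</name><timestamp>1700000200</timestamp></test></tests>',
    '<tests><test><name>alpha</name><timestamp>1700000100</timestamp></test>'
  . '<test><name>gamma</name><timestamp>1700000300</timestamp></test></tests>',
];

$rows = [];
foreach ($documents as $source) {
    $xml = simplexml_load_string($source); // use simplexml_load_file($url) for URLs
    if ($xml === false) {
        continue; // skip anything that failed to parse
    }
    foreach ($xml->test as $test) {
        $rows[] = [
            'name'      => (string) $test->name,
            'timestamp' => (int) $test->timestamp,
        ];
    }
}

// Sort all collected rows by timestamp, oldest first.
usort($rows, function ($a, $b) {
    return $a['timestamp'] <=> $b['timestamp'];
});

foreach ($rows as $row) {
    echo date('Y-m-d H:i:s', $row['timestamp']), ' ', $row['name'], "\n";
}
```

Copying the values into plain arrays first keeps each name attached to its time-stamp, so the sort does not have to touch the SimpleXMLElement objects.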
Is it possible to use a foreach loop to scrape multiple URLs from an array? I've been trying, but for some reason it will only pull from the first URL in the array and then show the results.
include_once('../../simple_html_dom.php');
$link = array (
'http://www.amazon.com/dp/B0038JDEOO/',
'http://www.amazon.com/dp/B0038JDEM6/',
'http://www.amazon.com/dp/B004CYX17O/'
);
foreach ($link as $links) {
function scraping_IMDB($links) {
// create HTML DOM
$html = file_get_html($links);
$values = array();
foreach($html->find('input') as $element) {
$values[$element->id=='ASIN'] = $element->value; }
// get title
$ret['ASIN'] = end($values);
// get rating
$ret['Name'] = $html->find('h1[class="parseasinTitle"]', 0)->innertext;
$ret['Retail'] =$html->find('b[class="priceLarge"]', 0)->innertext;
// clean up memory
//$html->clear();
// unset($html);
return $ret;
}
// -----------------------------------------------------------------------------
// test it!
$ret = scraping_IMDB($links);
foreach($ret as $k=>$v)
echo '<strong>'.$k.'</strong>'.$v.'<br />';
}
Here is the code since the comment part didn't work. :) It's very dirty because I just edited one of the examples to play with it to see if I could get it to do what I wanted.
include_once('../../simple_html_dom.php');
function scraping_IMDB($links) {
// create HTML DOM
$html = file_get_html($links);
// What is this spaghetti code good for?
/*
$values = array();
foreach($html->find('input') as $element) {
$values[$element->id=='ASIN'] = $element->value;
}
// get title
$ret['ASIN'] = end($values);
*/
foreach($html->find('input') as $element) {
if($element->id == 'ASIN') {
$ret['ASIN'] = $element->value;
}
}
// Or you could use the following instead of the whole foreach loop above:
//
// $ret['ASIN'] = $html->find('input[id="ASIN"]', 0)->value;
//
// The 0 asks for the first match. I just had a look at Amazon's
// source code, and it contains 2 HTML tags with id='ASIN'. Valid HTML
// should have only ONE element with a given id.
// get name and retail price
$ret['Name'] = $html->find('h1[class="parseasinTitle"]', 0)->innertext;
$ret['Retail'] = $html->find('b[class="priceLarge"]', 0)->innertext;
// clean up memory
//$html->clear();
// unset($html);
return $ret;
}
// -----------------------------------------------------------------------------
// test it!
$links = array (
'http://www.amazon.com/dp/B0038JDEOO/',
'http://www.amazon.com/dp/B0038JDEM6/',
'http://www.amazon.com/dp/B004CYX17O/'
);
foreach ($links as $link) {
$ret = scraping_IMDB($link);
foreach($ret as $k=>$v) {
echo '<strong>'.$k.'</strong>'.$v.'<br />';
}
}
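This should do the trick. The function is now declared once, outside the loop; declaring it inside the foreach makes PHP try to redeclare it on the second iteration, which is a fatal error, so only the first URL was ever processed.
I have also renamed the array to 'links' instead of 'link'. It's an array of links, containing link(s), so foreach($link as $links) seemed wrong, and I changed it to foreach($links as $link).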
I really need to ask this question, as the answer will help many other people who read this thread. What if you used $articles, as in the examples on the Simple HTML DOM site?
$ret['Name'] = $html->find('h1[class="parseasinTitle"]', 0)->innertext;
$ret['Retail'] = $html->find('b[class="priceLarge"]', 0)->innertext;
return $ret;
}
$links = array (
'http://www.amazon.com/dp/B0038JDEOO/',
'http://www.amazon.com/dp/B0038JDEM6/',
'http://www.amazon.com/dp/B004CYX17O/'
);
foreach ($links as $link) {
$ret = scraping_IMDB($link);
foreach($ret as $k=>$v) {
echo '<strong>'.$k.'</strong>'.$v.'<br />';
}
}
What if it's $articles?
$articles[] = $item;
}
//print_r($articles);
$links = array (
'http://link1.com',
'http://link2.com',
'http://link3.com'
);
what would this area look like?
foreach ($links as $link) {
$ret = scraping_IMDB($link);
foreach($ret as $k=>$v) {
echo '<strong>'.$k.'</strong>'.$v.'<br />';
}
}
I've seen this multiple-links question all over Stack Overflow for the past 2 years, and I still cannot figure it out. It would be great to get a basic handle on it, along the lines of the Simple HTML DOM examples.
Thanks.
This is my first time posting, so I'm sure I broke a bunch of rules and didn't do the code section right. I just badly needed to ask this question.
I'm trying to parse the Last.fm feed of my last 10 tracks played onto my website.
This is what I have so far,
<?php
$doc = new DOMDocument();
$doc->load('http://ws.audioscrobbler.com/1.0/user/nathanjmassey/recenttracks.xml');
$arrFeeds = array();
foreach ($doc->getElementsByTagName('track') as $node) {
$itemRSS = array (
'artist' => $node->getElementsByTagName('artist')->item(0)->nodeValue,
'name' => $node->getElementsByTagName('name')->item(0)->nodeValue,
'url' => $node->getElementsByTagName('url')->item(0)->nodeValue,
);
array_push($arrFeeds, $itemRSS);
}
?>
<?php
foreach ($arrFeeds as $i => $values) {
foreach ($values as $key => $value) {
print "<p>$value\n</p>";
}
}
?>
This basically gives me all 10 tracks in the feed in the format,
Linkin Park
In Between
http://www.last.fm/music/Linkin+Park/_/In+Between
But I need to format the results as a list of links, such as:
$artist - $track
How would I extend my script to achieve this?
For your output, use this:
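<?php
foreach ($arrFeeds as $i => $values)
{
// Escape the values so special characters in the feed cannot break the HTML
print "<a href='" . htmlspecialchars($values['url'], ENT_QUOTES) . "'>" . htmlspecialchars($values['artist']) . " - " . htmlspecialchars($values['name']) . "</a>";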
}
?>
UPDATE: How to limit # of parsed items
(Responding to the comment via edit so I can use the code display tags.)
I'm at work at the moment, but I'd try changing your initial parsing code something like so:
array_push($arrFeeds, $itemRSS); // existing line
if (count($arrFeeds) >= 5) { break; } // add this line
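Putting the pieces together, here is a self-contained sketch of parsing, limiting and printing the tracks as a list of links. The inline sample stands in for the Last.fm feed: its recenttracks root, the second and third tracks, and the $limit variable are assumptions for illustration.

```php
<?php
// Inline sample standing in for the Last.fm feed; the structure is assumed.
$sample = <<<XML
<recenttracks>
  <track><artist>Linkin Park</artist><name>In Between</name><url>http://www.last.fm/music/Linkin+Park/_/In+Between</url></track>
  <track><artist>Artist B</artist><name>Song B</name><url>http://www.last.fm/music/b</url></track>
  <track><artist>Artist C</artist><name>Song C</name><url>http://www.last.fm/music/c</url></track>
</recenttracks>
XML;

$doc = new DOMDocument();
$doc->loadXML($sample);

$limit = 2; // hypothetical cap on the number of tracks shown
$links = [];
foreach ($doc->getElementsByTagName('track') as $node) {
    if (count($links) >= $limit) {
        break; // stop once enough tracks have been collected
    }
    $artist = $node->getElementsByTagName('artist')->item(0)->nodeValue;
    $name   = $node->getElementsByTagName('name')->item(0)->nodeValue;
    $url    = $node->getElementsByTagName('url')->item(0)->nodeValue;

    // Escape every value before placing it in the HTML.
    $links[] = sprintf(
        '<li><a href="%s">%s - %s</a></li>',
        htmlspecialchars($url),
        htmlspecialchars($artist),
        htmlspecialchars($name)
    );
}

echo "<ul>\n", implode("\n", $links), "\n</ul>\n";
```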