Parsing XML data with PHP

I am trying to pull text from a single element in an XML file. I was able to pull the entire file, but I really only need a couple of lines from it.
On this page I was able to pull the entire XML file (http://smyrnainlet.com/testing.php).
But when I try to single out just one line from that file:
http://smyrnainlet.com/current_data.php
This is the error I am receiving:
Fatal error: Call to a member function getbyElementID() on a non-object in /home/content/74/8620474/html/current_data.php on line 9
If anyone could help me, that would be amazing; I have been struggling with this for hours.
This is my code:
<?php
function startElemHandler($parser, $name, $attribs)
{
if (strcasecmp($name, "current_observation") ==0 ) {
echo "<div id='waves'>\n";
}
if (strcasecmp($name, "wave_height_ft") ==0) {
$waveHeight->getbyElementID("wave_height_ft");
echo $waveHeight->asXML();
}
}
function endElemHandler($parser, $name)
{
if (strcasecmp($name, "current_observation") ==0 ) {
echo "</div>";
}
}
$parser = xml_parser_create();
xml_set_element_handler($parser, startElemHandler, endElemHandler);
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
$strXML = implode("",file("http://www.weather.gov/xml/current_obs/41009.xml"));
xml_parse($parser, $strXML);
xml_parser_free($parser);
?>

You have a few fundamental flaws in your code, but you are essentially asking how to parse an XML file. I suggest using PHP's DOMDocument with DOMXPath to extract the data you need. Here is some example code:
$xml = file_get_contents('weather.xml');
$dom = new DOMDocument();
// Load the XML into the document; without this the XPath query finds nothing
$dom->loadXML($xml);
$domx = new DOMXPath($dom);
$entries = $domx->evaluate("//wave_height_ft");
$arr = array();
foreach ($entries as $entry) {
$arr[] = '<' . $entry->tagName . '>' . $entry->nodeValue . '</' . $entry->tagName . '>';
}
print_r($arr);
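For a single value like this, SimpleXML can be even shorter than DOMXPath. A minimal sketch, assuming the feed's root element is current_observation with wave_height_ft as a direct child; the sample string below is a hypothetical, trimmed stand-in for the real feed, which you would normally load with simplexml_load_file() instead.

```php
<?php
// Hypothetical, trimmed sample of the feed; the real document has many more elements.
$strXML = '<current_observation>'
        . '<station_id>41009</station_id>'
        . '<wave_height_ft>3.9</wave_height_ft>'
        . '</current_observation>';

// simplexml_load_string() returns false on malformed input.
$obs = simplexml_load_string($strXML);
if ($obs === false) {
    exit("Could not parse the XML\n");
}

// Child elements are exposed as properties; cast to string to get the text content.
$waveHeight = (string) $obs->wave_height_ft;
echo "<div id='waves'>" . htmlspecialchars($waveHeight) . "</div>\n";
```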

$waveHeight does not seem to be defined (it is not in scope) where you try to use it!
You can:
pass it as an argument
use the global statement
create a class
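The "create a class" option could look roughly like the sketch below, which keeps the question's event-based xml_parser approach but stores state on an object instead of an undefined $waveHeight. It also reads the value through a character data handler, since there is no getbyElementID() method to call on the parser's data. The class name WaveHeightExtractor and the sample XML are hypothetical.

```php
<?php
// Hypothetical sample data; the real weather.gov feed may differ.
$strXML = '<current_observation><station_id>41009</station_id>'
        . '<wave_height_ft>3.9</wave_height_ft></current_observation>';

// Holds parser state so the handlers do not depend on undefined variables.
class WaveHeightExtractor
{
    private bool $inWaveHeight = false;
    public string $waveHeight = '';

    public function start($parser, string $name, array $attribs): void
    {
        if (strcasecmp($name, 'wave_height_ft') === 0) {
            $this->inWaveHeight = true;
        }
    }

    public function end($parser, string $name): void
    {
        if (strcasecmp($name, 'wave_height_ft') === 0) {
            $this->inWaveHeight = false;
        }
    }

    public function text($parser, string $data): void
    {
        // Character data can arrive in several chunks, so append rather than assign.
        if ($this->inWaveHeight) {
            $this->waveHeight .= $data;
        }
    }
}

$extractor = new WaveHeightExtractor();
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
// Array callables bind the handlers to the object's methods.
xml_set_element_handler($parser, [$extractor, 'start'], [$extractor, 'end']);
xml_set_character_data_handler($parser, [$extractor, 'text']);
xml_parse($parser, $strXML, true);

echo "<div id='waves'>" . trim($extractor->waveHeight) . "</div>\n";
```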

Related

Parsing HTML Table Data from XML with PHP

I am somewhat new to PHP, and I can't really wrap my head around what I am doing wrong here given my situation.
Problem: I am trying to get the href of a certain HTML element within a string of characters inside an XML object/element from Reddit (if you visit the page, it is the actual link of the video: not the Reddit link, but the external YouTube link or similar, and nothing else).
Here is my code so far (code updated):
Update: Loop-mania! I got all of the hrefs, but am now trying to store them in a global array so I can access a random one outside of this function.
function getXMLFeed() {
echo "<h2>Reddit Items</h2><hr><br><br>";
//$feedURL = file_get_contents('https://www.reddit.com/r/videos/.xml?limit=200');
$feedURL = 'https://www.reddit.com/r/videos/.xml?limit=200';
$xml = simplexml_load_file($feedURL);
//define each xml entry from reddit as an item
foreach ($xml -> entry as $item ) {
foreach ($item -> content as $content) {
$newContent = (string)$content;
$html = str_get_html($newContent);
foreach($html->find('table') as $table) {
// With an index, find() returns a single element, or null if nothing matches
$span = $table->find('span', 0);
if ($span === null) {
continue;
}
foreach($span->find('a') as $link) {
echo $link->href;
}
}
}
}
}
XML Code:
http://pasted.co/0bcf49e8
I've also included JSON if it can be done this way; I just preferred XML:
http://pasted.co/f02180db
That is pretty much all of the code. Here is another piece I tried with DOMDocument (since scrapped):
foreach ($item -> content as $content) {
$dom = new DOMDocument();
$dom -> loadHTML($content);
$xpath = new DOMXPath($dom);
$classname = "/html/body/table[1]/tbody/tr/td[2]/span[1]/a";
foreach ($dom->getElementsByTagName('table') as $node) {
echo $dom->saveHtml($node), PHP_EOL;
//$originalURL = $node->getAttribute('href');
}
//$html = $dom->saveHTML();
}
I can parse the table fine, but when it comes to getting specific elements' values (nothing has an ID or class), I only seem to be able to get ALL anchor tags or ALL table rows, etc.
Can anyone point me in the right direction? Let me know if there is anything else I can add here. Thanks!
Added HTML:
I am specifically trying to extract <span>[link]</span> from each table/item.
http://pastebin.com/QXa2i6qz
The following code extracts all the YouTube links from each entry's content.
function extract_youtube_link($xml) {
$entries = $xml['entry'];
$videos = [];
foreach($entries as $entry) {
$content = html_entity_decode($entry['content']);
preg_match_all('/<span><a href="([^"]*)">\[link\]/', $content, $matches);
if(!empty($matches[1][0])) {
$videos[] = array(
'entry_title' => $entry['title'],
'author' => preg_replace('/\/(.*)\//', '', $entry['author']['name']),
'author_reddit_url' => $entry['author']['uri'],
'video_url' => $matches[1][0]
);
}
}
return $videos;
}
$xml = simplexml_load_file('reddit.xml');
$xml = json_decode(json_encode($xml), true);
$videos = extract_youtube_link($xml);
foreach($videos as $video) {
echo "<p>Entry Title: {$video['entry_title']}</p>";
echo "<p>Author: {$video['author']}</p>";
echo "<p>Author URL: {$video['author_reddit_url']}</p>";
echo "<p>Video URL: {$video['video_url']}</p>";
echo "<br><br>";
}
The function returns a multidimensional array in which each element contains entry_title, author, author_reddit_url and video_url. Hope it helps you!
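On the question's update (picking a random link outside the function): instead of a global array, the function can return the links and the caller can choose one with array_rand(). A minimal sketch, assuming the content strings have already been pulled from the feed; it swaps in PHP's built-in DOMDocument for simple_html_dom so it runs on its own, and collectHrefs() and the sample URLs are hypothetical.

```php
<?php
// Hypothetical stand-ins for the decoded <content> strings of two feed entries.
$contents = [
    '<table><tr><td><span><a href="https://example.com/a">[link]</a></span></td></tr></table>',
    '<table><tr><td><span><a href="https://example.com/b">[link]</a></span></td></tr></table>',
];

// Returns the hrefs instead of echoing them, so no global array is needed.
function collectHrefs(array $contents): array
{
    $hrefs = [];
    foreach ($contents as $html) {
        $dom = new DOMDocument();
        libxml_use_internal_errors(true); // feed HTML is rarely perfectly valid
        $dom->loadHTML($html);
        libxml_clear_errors();
        foreach ($dom->getElementsByTagName('a') as $a) {
            $hrefs[] = $a->getAttribute('href');
        }
    }
    return $hrefs;
}

$hrefs = collectHrefs($contents);
if ($hrefs !== []) {
    // array_rand() returns a random key; it fails on an empty array, hence the check.
    $randomHref = $hrefs[array_rand($hrefs)];
    echo $randomHref, "\n";
}
```

The same pattern works with the array returned by extract_youtube_link() above: pick $videos[array_rand($videos)].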
If you're looking for a specific element, you don't need to walk through the whole document yourself. One way of doing it is to use the DOMXPath class and query the XML directly. The documentation should guide you through it:
http://php.net/manual/es/class.domxpath.php
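Applied to this question, an XPath query can select only the anchor whose text is "[link]", which avoids getting ALL anchors. A sketch under the assumption that each entry's content has been HTML-decoded into markup like the hypothetical fragment below (a guess at the feed's shape, not copied from it).

```php
<?php
// Hypothetical fragment shaped like one entry's decoded <content>.
$content = '<table><tr><td><a href="https://www.reddit.com/r/videos/comments/abc">'
         . '<img src="thumb.jpg"></a></td><td>submitted by someone <br>'
         . '<span><a href="https://www.youtube.com/watch?v=example">[link]</a></span> '
         . '<span><a href="https://www.reddit.com/r/videos/comments/abc">[comments]</a></span>'
         . '</td></tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings from imperfect HTML
$dom->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Select only the <a> inside a <span> whose text is exactly "[link]".
$hrefs = [];
foreach ($xpath->query('//span/a[normalize-space(.) = "[link]"]') as $a) {
    $hrefs[] = $a->getAttribute('href');
}
print_r($hrefs);
```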

Nested selector fails when using Simple HTML DOM parser

I want to get the link and scrape its content, but I can't even get that far. What's wrong with my nested selector?
My PHP:
$dom = file_get_html('http://mojim.com/%E5%BF%83%E8%B7%B3.html?t3');
$tables = $dom->find('.iB');
$firstRow = $tables->find('tr',1)->find('td',4);
foreach ($firstRow as $value) {
echo $value;
}
?>
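Here is what the DOM looks like:
You just have a problem with pointing to (traversing to) the correct element. find() without an index returns an array of all matches, so $tables->find(...) is called on an array; pass an index, e.g. find('table.iB', 0), to get a single element before chaining another find().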
Example:
$dom = file_get_html('http://mojim.com/%E5%BF%83%E8%B7%B3.html?t3');
$firstRow = $dom->find('table.iB', 0)->find('tr', 1)->find('td', 3);
$link = $firstRow->find('a', 0);
echo $link->href . '<br/>' . $link->title;
Should output:
/twy100015x34x8.htm
心跳 歌詞 王力宏
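If you would rather avoid the third-party library, the same traversal can be written with PHP's built-in DOMDocument and DOMXPath. A sketch, assuming the page structure matches the answer's selectors; the HTML below is a hypothetical stand-in for the real page. Note that XPath positions are 1-based, while find()'s index is 0-based.

```php
<?php
// Hypothetical markup standing in for the real page; only the structure matters here.
$html = '<html><body><table class="iB">'
      . '<tr><td>header</td></tr>'
      . '<tr><td>1</td><td>2</td><td>3</td>'
      . '<td><a href="/twy100015x34x8.htm" title="example title">song</a></td></tr>'
      . '</table></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// find('tr', 1) becomes tr[2] and find('td', 3) becomes td[4] in XPath.
$query = '(//table[contains(concat(" ", normalize-space(@class), " "), " iB ")]//tr)[2]/td[4]//a';
$link = $xpath->query($query)->item(0);
if ($link === null) {
    // Check before calling methods, to avoid "member function on a non-object" errors.
    exit("Link not found\n");
}
echo $link->getAttribute('href') . '<br/>' . $link->getAttribute('title');
```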

Call to a member function find() on a non-object simpleHTMLDOM

I am trying to read a link from one page, print the URL, go to that page, and read the link on the next page in the same location, print the url, go to that page (and so on...).
All I'm doing is reading the URL and passing it as an argument to the get_links() function until there are no more links.
This is my code but it throws:
Fatal error: Call to a member function find() on a non-object.
Does anyone know how to fix this?
<?php
$mainPage = 'https://www.bu.edu/link/bin/uiscgi_studentlink.pl/1346752597?ModuleName=univschr.pl&SearchOptionDesc=Class+Subject&SearchOptionCd=C&KeySem=20133&ViewSem=Fall+2012&Subject=&MtgDay=&MtgTime=';
get_links($mainPage);
function get_links($url) {
$data = new simple_html_dom();
$data = file_get_html($url);
$nodes = $data->find("input[type=hidden]");
$fURL = $data->find("/html/body/form");
$firstPart = $fURL[0]->action . '<br>';
foreach ($nodes as $node) {
$val = $node->value;
$name = $node->name;
$name . '<br />';
$val . "<br />";
$str1 = $str1 . "&" . $name . "=" . $val;
}
$fixStr1 = str_replace('&College', '?College', $str1);
$fixStr2 = str_replace('Fall 2012', 'Fall+2012', $fixStr1);
$fixStr3 = str_replace('Class Subject', 'Class+Subject', $fixStr2);
$fixStr4 = $firstPart . $fixStr3;
echo $nextPageURL = chop($fixStr4);
get_links($nextPageURL);
}
?>
Alright, so I was using the load->file() function somewhere in my code and did not notice it until I really combed through it. I finally have a running script :) The key is to use file_get_html instead of loading the webpage as an object with the load->file() function.
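A related pitfall in get_links() is that it recurses with no stopping condition and never checks whether the fetch succeeded before calling find(). The sketch below shows one way to add those checks, as a loop rather than recursion. Assumptions, all hypothetical: it uses DOMDocument/DOMXPath instead of simple_html_dom, takes a $fetch callable (pass 'file_get_contents' for real use, which returns false on failure), and builds the query with http_build_query(), which handles the encoding the str_replace() calls were doing by hand.

```php
<?php
// Follows the form on each page to the next page, stopping on failure,
// on a repeated URL, or after $maxPages pages.
function followFormLinks(string $url, callable $fetch, int $maxPages = 50): array
{
    $visited = [];
    while (count($visited) < $maxPages && !in_array($url, $visited, true)) {
        $html = $fetch($url);
        if (!is_string($html) || $html === '') {
            break; // fetch failed: stop instead of calling methods on a non-object
        }
        $visited[] = $url;

        $dom = new DOMDocument();
        libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
        $dom->loadHTML($html);
        libxml_clear_errors();
        $xpath = new DOMXPath($dom);

        $form = $xpath->query('/html/body/form')->item(0);
        if (!$form instanceof DOMElement) {
            break; // no form means no next link
        }

        // Collect the hidden inputs as query parameters ("Fall 2012" -> "Fall+2012").
        $params = [];
        foreach ($xpath->query('.//input[@type="hidden"]', $form) as $input) {
            $params[$input->getAttribute('name')] = $input->getAttribute('value');
        }
        $url = $form->getAttribute('action') . '?' . http_build_query($params);
    }
    return $visited;
}

// Hypothetical in-memory pages standing in for the real site.
$pages = [
    'start' => '<html><body><form action="next">'
             . '<input type="hidden" name="ViewSem" value="Fall 2012">'
             . '</form></body></html>',
    'next?ViewSem=Fall+2012' => '<html><body><p>No form here</p></body></html>',
];
$fetch = fn(string $u) => $pages[$u] ?? false;
$visited = followFormLinks('start', $fetch);
print_r($visited);
```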

PHP DomParser: getting option name and value from an XML tag

I'm trying to parse an XML file with PHP.
I'm using this code, and it works well for getting the tag name and the value:
function getChildNodes($dom,$SearchKey){
foreach ($dom->getElementsByTagName($SearchKey) as $telefon) {
foreach($telefon->childNodes as $node) {
print_r(
$SearchKey . " : " . $node->nodeValue . "\n"
);
}
}
}
Example XML that works with this code:
<inputs>
<input>C:\Program Files\VideoLAN\VLC\airbag.mp3</input>
<input>C:\Program Files\VideoLAN\VLC\sunpark.mp3</input>
<input>C:\Program Files\VideoLAN\VLC\rapidarc.mp3</input>
</inputs>
Example XML that does not work:
<instances>
<instance name="default" state="playing" position="0.050015" time="9290569" length="186489519" rate="1.000000" title="0" chapter="0" can-seek="1" playlistindex="3"/>
</instances>
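Can someone help me figure out which options I need to use to get the option name and option value?
All responses are appreciated.
The values in your second example are attributes, not child nodes, so childNodes does not include them; read them through the element's attributes property instead. Here is a code sample for printing out the XML attributes: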
<?php
$source = '<instances><instance name="default" state="playing" position="0.050015" time="9290569" length="186489519" rate="1.000000" title="0" chapter="0" can-seek="1" playlistindex="3"/></instances>';
$doc = new DOMDocument();
$doc->loadXML($source);
$el = $doc->firstChild->firstChild;
for ($i = 0; $i < $el->attributes->length; $i++) {
$attr = $el->attributes->item($i);
echo 'Name: '.$attr->name, "\n";
echo 'Value: '.$attr->value, "\n";
}
Hope this helps.
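The same idea can be wrapped in a function shaped like the question's getChildNodes(), looking elements up by tag name and collecting each element's attributes as name => value pairs. A sketch; the function name getAttributes and the trimmed sample XML are hypothetical.

```php
<?php
// Returns one name => value array per matching element.
function getAttributes(DOMDocument $dom, string $tagName): array
{
    $result = [];
    foreach ($dom->getElementsByTagName($tagName) as $element) {
        $attrs = [];
        // The attributes map is iterable; each item is a DOMAttr.
        foreach ($element->attributes as $attr) {
            $attrs[$attr->name] = $attr->value;
        }
        $result[] = $attrs;
    }
    return $result;
}

$doc = new DOMDocument();
$doc->loadXML('<instances><instance name="default" state="playing" can-seek="1"/></instances>');
$instances = getAttributes($doc, 'instance');
echo $instances[0]['state'], "\n";
```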

How do I display the id and name data?

My XML file:
<pasaz:Envelope>
<pasaz:Body>
<loadOffe>
<offe>
<off>
<id>120023</id>
<name>my name John</name>
<name>Test</name>
</off>
</offe>
</loadOffe>
</pasaz:Body>
</pasaz:Envelope>
How can I display the id and name with PHP?
If you're just looking for a simple way to extract the contents of a tag, but don't want to go to all the trouble of parsing the XML properly, you could do something like this:
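$xml = ""; // your xml data as a string
function get_tag_contents($xml, $tagName) {
$openTag = "<" . $tagName . ">";
$startPosition = strpos($xml, $openTag);
if ($startPosition === false) {
return null; // tag not found
}
// Skip past the opening tag so only the contents are returned
$startPosition += strlen($openTag);
$endPosition = strpos($xml, "</" . $tagName . ">", $startPosition);
if ($endPosition === false) {
return null; // closing tag not found
}
return substr($xml, $startPosition, $endPosition - $startPosition);
}
$id = get_tag_contents($xml, "id");
$name = get_tag_contents($xml, "name");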
This assumes you haven't assigned any attributes to your tags, and that each tag is unique (in the example you gave us I noted two "name" tags, and if you want both you'll need to make this solution a bit more robust or do proper XML parsing).
How do I get all items?
Example (does not work):
$pliks = simplexml_load_file("file.xml");
foreach ($pliks->children('pasaz', true) as $body)
{
foreach ($body->children() as $loadOffe)
{
if ($loadOffe->offe->off) {
echo "<p>id: $loadOffe->id</p>";
echo "$id->id";
echo "<p>name: <b>$name->name</b></p>";
}
}
// echo $loadOffe->offe->off->id;
}
As Marc B suggested in his comment, you should use DOM: either getElementsByTagName() or DOMXPath. Here is an example with getElementsByTagName():
$dom = new DOMDocument;
$dom->loadXML($xml);
$ids = $dom->getElementsByTagName('id');
// A DOMNodeList is always truthy, so check its length instead
if (!$ids->length) {
throw new Exception('Id not found');
}
return $ids->item(0);
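For the follow-up ("How do I get all items?"), a DOMXPath version can loop over every off element and collect its id and all of its name children, including both names in the example. A sketch; the xmlns:pasaz URI is a placeholder assumption, since the question's snippet does not declare the prefix.

```php
<?php
// Sample from the question; the xmlns:pasaz URI is a placeholder assumption.
$xml = <<<XML
<pasaz:Envelope xmlns:pasaz="http://example.com/pasaz">
<pasaz:Body>
<loadOffe>
<offe>
<off>
<id>120023</id>
<name>my name John</name>
<name>Test</name>
</off>
</offe>
</loadOffe>
</pasaz:Body>
</pasaz:Envelope>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

$items = [];
// Unprefixed elements are in no namespace here, so plain names match.
foreach ($xpath->query('//off') as $off) {
    $names = [];
    foreach ($xpath->query('name', $off) as $name) {
        $names[] = $name->nodeValue;
    }
    $items[] = [
        'id' => $xpath->evaluate('string(id)', $off),
        'names' => $names,
    ];
}

foreach ($items as $item) {
    echo '<p>id: ' . htmlspecialchars($item['id']) . '</p>';
    echo '<p>name: <b>' . htmlspecialchars(implode(', ', $item['names'])) . '</b></p>';
}
```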
