Extracting XML Data with Foreach Loops, Results Inconsistent - php

I am extracting XML data using DOMDocument and foreach loops. I am pulling certain attributes and node values from the XML document and creating variables with that data. I am then echoing the variables.
I have successfully completed this for the first portion of the XML data, which lives in the <VehicleDescription> tag. However, using the same logic with data within the <style> tags, I have been having issues. Specifically, the created variables won't echo unless they are inside the foreach loop. See the code below for clarification.
My php:
<?php
$vehiclexml = $_POST['vehiclexml'];
$xml = file_get_contents($vehiclexml);
$dom = new DOMDocument();
$dom->loadXML($xml);
//This foreach loop works perfectly, the variables echo below:
foreach ($dom->getElementsByTagName('VehicleDescription') as $vehicleDescription){
$year = $vehicleDescription->getAttribute('modelYear');
$make = $vehicleDescription->getAttribute('MakeName');
$model = $vehicleDescription->getAttribute('ModelName');
$trim = $vehicleDescription->getAttribute('StyleName');
$id = $vehicleDescription->getAttribute('id');
$BodyType = $vehicleDescription->getAttribute('altBodyType');
$drivetrain = $vehicleDescription->getAttribute('drivetrain');
}
//This foreach loop works; however, the variables don't echo below, they will only echo inside the loop. How can I resolve this?
foreach ($dom->getElementsByTagName('style') as $style){
$displacement = $style->getElementsByTagName('displacement')->item(0)->nodeValue;
}
echo "<b>Year:</b> ".$year;
echo "<br>";
echo "<b>Make:</b> ".$make;
echo "<br>";
echo "<b>Model:</b> ".$model;
echo "<br>";
echo "<b>Trim:</b> ".$trim;
echo "<br>";
echo "<b>Drivetrain:</b> ".$drivetrain;
echo "<br>";
//Displacement will not echo
echo "<b>Displacement:</b> ".$displacement;
?>
Here is the XML file it is pulling from:
<VehicleDescription country="US" language="en" modelYear="2019" MakeName="Toyota" ModelName="RAV4" StyleName="LE" id="1111" altBodyType="SUV" drivetrain="AWD">
<style modelYear="2019" name="Toyota RAV4 LE" passDoors="4">
<make>Toyota</make>
<model>RAV4</model>
<style>LE</style>
<drivetrain>AWD</drivetrain>
<displacement>2.5 liter</displacement>
<cylinders>4-cylinder</cylinders>
<gears>8-speed</gears>
<transtype>automatic</transtype>
<horsepower>203</horsepower>
<torque>184</torque>
</style>
</VehicleDescription>
Any help or insight as to why variables from the first foreach loop echo but variables from the second don't would be greatly appreciated.
Thanks!

Just to post an alternative solution to the way you've fixed this.
As you've found, there are a couple of <style> tags, which means the foreach will process every <style> tag. But since you know you are only after the contents of the first one, you can drop the foreach loop and use the item() method...
$displacement = $dom->getElementsByTagName('style')->item(0)
->getElementsByTagName('displacement')->item(0)->nodeValue;
This also applies to how you fetch the data from the <VehicleDescription> tag. Drop the foreach and use
$vehicleDescription = $dom->getElementsByTagName('VehicleDescription')->item(0);
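Putting the two pieces together, here is a minimal sketch of the loop-free version. It assumes a trimmed-down copy of the question's XML held in an inline string (the $xml literal is only for illustration). The null checks are my addition, since item(0) returns null when nothing matches:

```php
<?php
// Trimmed-down copy of the XML from the question, inlined for illustration.
$xml = <<<XML
<VehicleDescription modelYear="2019" MakeName="Toyota" ModelName="RAV4">
  <style modelYear="2019" name="Toyota RAV4 LE">
    <style>LE</style>
    <displacement>2.5 liter</displacement>
  </style>
</VehicleDescription>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

// item(0) is the first match in document order, or null if there is none.
$vehicleDescription = $dom->getElementsByTagName('VehicleDescription')->item(0);
$year = $vehicleDescription !== null ? $vehicleDescription->getAttribute('modelYear') : '';
$make = $vehicleDescription !== null ? $vehicleDescription->getAttribute('MakeName') : '';

// First <style> element only, then its first <displacement> descendant.
$style = $dom->getElementsByTagName('style')->item(0);
$node = $style !== null ? $style->getElementsByTagName('displacement')->item(0) : null;
$displacement = $node !== null ? $node->nodeValue : '';

echo "Year: $year\n";
echo "Make: $make\n";
echo "Displacement: $displacement\n";
```

With the guards in place, a missing element leaves an empty string instead of triggering a warning on a null node.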

The error was within the XML document.
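Within the <style> tags was another set of <style> tags. The loop therefore ran a second time for the inner <style>, which has no <displacement> child, and that pass overwrote $displacement with an empty value. Changing the name of the second set solved this issue.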
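For anyone who cannot rename the inner tag, a rough sketch of what the loop sees on each pass, and one way to keep the first usable value. The inline $xml string and the $seen array are just illustrative names, not part of the original code:

```php
<?php
// Minimal reproduction: a <style> element nested inside another <style>.
$xml = '<VehicleDescription><style name="outer"><style>LE</style>'
     . '<displacement>2.5 liter</displacement></style></VehicleDescription>';

$dom = new DOMDocument();
$dom->loadXML($xml);

// Record what each pass of the loop finds (outer element first, then inner).
$seen = [];
foreach ($dom->getElementsByTagName('style') as $style) {
    $node = $style->getElementsByTagName('displacement')->item(0);
    $seen[] = $node !== null ? $node->nodeValue : null;
}
// $seen now holds '2.5 liter' for the outer element and null for the inner one.

// Keep the first non-null value instead of letting later passes overwrite it.
$displacement = null;
foreach ($seen as $value) {
    if ($value !== null) {
        $displacement = $value;
        break;
    }
}

echo "Displacement: $displacement\n";
```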

Related

Finding the first entry of XML and displaying it

I'm working on compiling a list using XML and PHP. I'm looking to find the first "destination tag" found in the RailVehicleStateClass for each train ID and echo it out. I tried a foreach loop to gather the destination tag, but it just loops the same data for each train ID until the end of the XML file. Below is a snippet of the XML file; the full version has well over 700 entries, and each train can have anywhere from 1 to 100+ rail vehicles associated with it.
XML
<ScnLoader xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<trainList>
<TrainLoader>
<trainID>99991</trainID>
<TrainWasAI>false</TrainWasAI>
<DispatchTrainDirection>0</DispatchTrainDirection>
<ManuallyAppliedSpeedLimitMPH>2147483647</ManuallyAppliedSpeedLimitMPH>
<PreviousSignalInstruction>Clear</PreviousSignalInstruction>
<unitLoaderList>
<RailVehicleStateClass>
<destinationTag>TRC SVMS</destinationTag>
</RailVehicleStateClass>
<RailVehicleStateClass>
<destinationTag>PRC</destinationTag>
</RailVehicleStateClass>
</unitLoaderList>
</TrainLoader>
</trainList>
</ScnLoader>
PHP
<?php
$trains = simplexml_load_file("Auto%20Save%20World.xml") or die("Error: Cannot create object");
$totalUnitCount=0;
echo "<table>";
echo "<th>#</th>";
echo "<th>Train ID</th>";
echo "<th>Symbol</th>";
echo "<th>Direction</th>";
echo "<th>AI</th>";
foreach ($trains->xpath("//TrainLoader") as $train) {
$totalUnitCount = $totalUnitCount + 1;
foreach (array_slice($trains->xpath("//RailVehicleStateClass"),0,1) as $unit){
echo $unit->destinationTag;
}
echo "</td>";
echo "<td>";
echo $train->DispatchTrainDirection;
echo "</td>";
echo "<td>";
echo $train->TrainWasAI;
echo "</td>";
}
?>
Your xpath query to get the RailVehicleStateClass elements needs to be made relative to the current $train. You can do that using:
$train->xpath(".//RailVehicleStateClass")
Note the use of . in front of // to make the path relative to the current $train element.
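As a rough sketch of how the relative query gives you the first destination tag per train, assuming the XML is in an inline string. The second train (99992) and its EXAMPLE tag are invented here purely so there is more than one train to loop over:

```php
<?php
// First train is from the question; the second one is made up for illustration.
$xml = <<<XML
<ScnLoader>
  <trainList>
    <TrainLoader>
      <trainID>99991</trainID>
      <unitLoaderList>
        <RailVehicleStateClass><destinationTag>TRC SVMS</destinationTag></RailVehicleStateClass>
        <RailVehicleStateClass><destinationTag>PRC</destinationTag></RailVehicleStateClass>
      </unitLoaderList>
    </TrainLoader>
    <TrainLoader>
      <trainID>99992</trainID>
      <unitLoaderList>
        <RailVehicleStateClass><destinationTag>EXAMPLE</destinationTag></RailVehicleStateClass>
      </unitLoaderList>
    </TrainLoader>
  </trainList>
</ScnLoader>
XML;

$trains = simplexml_load_string($xml);

$firstTags = [];
foreach ($trains->xpath('//TrainLoader') as $train) {
    // The leading "." restricts the search to descendants of this $train.
    $tags = $train->xpath('.//RailVehicleStateClass/destinationTag');
    // Keep only the first match; fall back to '' when a train has none.
    $firstTags[(string) $train->trainID] = $tags ? (string) $tags[0] : '';
}

print_r($firstTags);
```

Without the leading dot, every train would report the first tag in the whole document, which is the repetition described in the question.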
First you need to use the foreach on the unitLoaderList, then you use xpath on the object you get from the foreach, adding a . before the // in the query so that you only obtain what is inside that object.
I really don't understand what you want to accomplish with the HTML table, but here is some example code that works with your file and multiple trains, without the HTML.
$trains = simplexml_load_file("train.xml") or die("Error: Cannot create object");
$totalUnitCount=0;
foreach ($trains->xpath("//trainList") as $train) {
$totalUnitCount = $totalUnitCount + 1;
foreach ($train->xpath(".//unitLoaderList") as $unit){
foreach ($unit->xpath(".//RailVehicleStateClass") as $railV){
echo $railV->destinationTag[0]; // Assumes every RailVehicleStateClass has at least one destinationTag; if not, you need another foreach with an xpath for destinationTag that keeps only the first one
}
}
}

PHP Web Scraping And JSON or Array Output

I'm experimenting scraping Amazon with PHP but I don't know what I am doing wrong. The problem is that I can't access all the data I scraped. Here is my code:
<?php
$url = 'https://www.amazon.com/s/ref=nb_sb_ss_c_1_9?url=search-alias%3Daps&field-keywords=most+sold+items+on+amazon&sprefix=most+sold%2Caps%2C435&crid=348CE8G406XVG&rh=i%3Aaps%2Ck%3Amost+sold+items+on+amazon';
$html = file_get_html($url);
foreach ($html->find('h2[class=a-size-medium]') as $element) {
echo "<li>" .$element->plaintext."</li><br>";
}
?>
The foreach statement loops through and outputs the plain text, but I want to be able to pass the plain text to a variable or array. The problem is that if I do that and output the result, I only get the last string of the plain text array. I have done lots of research to find what I'm doing wrong, but I can't find it. Any help will be appreciated. Here is what I'm trying to achieve:
<?php
$url = 'https://www.amazon.com/s/ref=nb_sb_ss_c_1_9?url=search-alias%3Daps&field-keywords=most+sold+items+on+amazon&sprefix=most+sold%2Caps%2C435&crid=348CE8G406XVG&rh=i%3Aaps%2Ck%3Amost+sold+items+on+amazon';
$hold = array();
$html = file_get_html($url);
foreach ($html->find('h2[class=a-size-medium]') as $element) {
$hold = $element->plaintext;
}
print_r($hold);
?>
The second code will output the last string of the plain text which is: "Rubbermaid LunchBlox Side Container Kit, 2-Pack, 1806176". I also tried achieving this by encoding and decoding the plain text but nothing changed. What am I doing wrong?
Instead of assigning a string to $hold on each pass, which replaces the array, append new elements to it:
$hold[] = $element->plaintext;
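To see the difference without hitting Amazon, here is a small sketch where a hard-coded $titles array stands in for the scraped plaintext values (the three strings are placeholders, not real results):

```php
<?php
// Placeholder strings standing in for $element->plaintext from the scrape.
$titles = ['First product', 'Second product', 'Third product'];

// Plain assignment: each pass replaces the previous value.
$overwritten = array();
foreach ($titles as $title) {
    $overwritten = $title;
}

// Append with []: each pass adds a new element to the array.
$hold = array();
foreach ($titles as $title) {
    $hold[] = $title;
}

echo $overwritten, "\n";          // only the last string survives
print_r($hold);                   // all three strings
$json = json_encode($hold);       // JSON array, if that is the output you want
echo $json, "\n";
```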

Getting name spaces out of XML from simplexml_load_file in php

I am trying to parse this YouTube XML using simplexml_load_file in php.
The XML feed can be found here:
https://www.youtube.com/feeds/videos.xml?playlist_id=PL1mm1FfX5EHRjGyoBpEXBRIGAmCNt8pBT
Below in php I am trying to iterate through the media groups nested inside each entry node.
<?php
$xmlFeed=simplexml_load_file('https://www.youtube.com/feeds/videos.xml?playlist_id=PL1mm1FfX5EHRjGyoBpEXBRIGAmCNt8pBT')
or die("Cannot load YouTube video feed, please try again later.");
foreach ($xmlFeed->entry->children('media', true)->group as $video) {
echo $video->title;
echo $video->description;
echo $video->thumbnail->getNameSpaces(true);
}
?>
Title and description print just fine. But I'm trying to get at the thumbnail URL found in this namespace:
<media:thumbnail url="https://i1.ytimg.com/vi/HEYQXVGnwXc/hqdefault.jpg" width="480" height="360"/>
I've tried all 3 of the following:
echo $video->thumbnail->getNameSpaces(true);
echo $video->thumbnail->getNameSpaces(true)['url'];
echo $video->thumbnail->getNameSpaces(true)->url;
None return the url. The first returns Array and the last two are blank. What am I missing?
Several things: first, you have to use the attributes() function, since url is an attribute of thumbnail rather than a child element. Secondly, you don't need getNameSpaces(true), since the media namespace prefix is already selected by children('media', true) in the foreach loop. Finally, you do not iterate across all media:group elements. Right now, you will return only the first set of XML values rather than the values from every <entry> node. Therefore, you need to add an outer loop, one that iterates across all of the <entry> nodes.
$attr = 'url';
for($i = 0; $i < sizeof($xmlFeed->entry); $i++) {
foreach ($xmlFeed->entry[$i]->children('media', true)->group as $video) {
echo $video->title."\n";
echo $video->description."\n";
echo $video->thumbnail->attributes()->$attr."\n";
}
}
XPATH Alternative
Even further, you could have handled your needs in XPath by simply registering the media namespace and querying to exact locations, iterating of course across each set:
$xmlFeed->registerXPathNamespace('media', 'http://search.yahoo.com/mrss/');
// ARRAYS TO HOLD XML VALUES
$videos = $xmlFeed->xpath('//media:group');
$title = $xmlFeed->xpath('//media:group/media:title');
$description = $xmlFeed->xpath('//media:group/media:description');
$url = $xmlFeed->xpath('//media:group/media:thumbnail/@url');
// ITERATING THROUGH EACH ARRAY
for($i = 0; $i < sizeof($videos); $i++) {
echo $title[$i]."\n";
echo $description[$i]."\n";
echo $url[$i]."\n";
}
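Both routes can be tried offline. The sketch below assumes a tiny hand-written feed with the same namespaces as the YouTube one; the titles and example.com URLs are placeholders, not data from the real feed:

```php
<?php
// Hand-written stand-in for the YouTube feed (placeholder titles and URLs).
$xml = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
  <entry>
    <media:group>
      <media:title>First video</media:title>
      <media:thumbnail url="https://example.com/first.jpg" width="480" height="360"/>
    </media:group>
  </entry>
  <entry>
    <media:group>
      <media:title>Second video</media:title>
      <media:thumbnail url="https://example.com/second.jpg" width="480" height="360"/>
    </media:group>
  </entry>
</feed>
XML;

$feed = simplexml_load_string($xml);

// Route 1: walk every entry and read the un-namespaced url attribute.
$viaAttributes = [];
foreach ($feed->entry as $entry) {
    foreach ($entry->children('media', true)->group as $video) {
        $viaAttributes[] = (string) $video->thumbnail->attributes()->url;
    }
}

// Route 2: XPath, selecting the attribute with @url.
$feed->registerXPathNamespace('media', 'http://search.yahoo.com/mrss/');
$viaXPath = [];
foreach ($feed->xpath('//media:group/media:thumbnail/@url') as $attribute) {
    $viaXPath[] = (string) $attribute;
}

print_r($viaAttributes);
print_r($viaXPath);
```

Casting to string matters in both cases, because SimpleXML hands back objects rather than plain strings.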

Using PHP to get DOM Element

I'm struggling big time understanding how to use the DOMElement object in PHP. I found this code, but I'm not really sure it's applicable to me:
$dom = new DOMDocument();
$dom->loadHTML("index.php");
$div = $dom->getElementsByTagName('div');
foreach ($div->attributes as $attr) {
$name = $attr->nodeName;
$value = $attr->nodeValue;
echo "Attribute '$name' :: '$value'<br />";
}
Basically what I need is to search the DOM for an element with a particular id, after which point I need to extract a non-standard attribute (i.e. one that I made up and put on with JS) so I can see the value of that. The reason is I need one piece from the $_GET and one piece that is in the HTML based from a redirect. If someone could just explain how I use DOMDocument for this purpose, that would be helpful. I'm really struggling understanding what's going on and how to properly implement it, because I clearly am not doing it right.
EDIT (Where I'm at based on comment):
This is my code lines 4-26 for reference:
<div id="column_profile">
<?php
require_once($_SERVER["DOCUMENT_ROOT"] . "/peripheral/profile.php");
$searchResults = isset($_GET["s"]) ? performSearch($_GET["s"]) : "";
$dom = new DOMDocument();
$dom->load("index.php");
$divs = $dom->getElementsByTagName('div');
foreach ($divs as $div) {
foreach ($div->attributes as $attr) {
$name = $attr->nodeName;
$value = $attr->nodeValue;
echo "Attribute '$name' :: '$value'<br />";
}
}
$div = $dom->getElementById('currentLocation');
$attr = $div->getAttribute('srckey');
echo "<h1>{$attr}</a>";
?>
</div>
<div id="column_main">
Here is the error message I'm getting:
Warning: DOMDocument::load() [domdocument.load]: Extra content at the end of the document in ../public_html/index.php, line: 26 in ../public_html/index.php on line 10
Fatal error: Call to a member function getAttribute() on a non-object in ../public_html/index.php on line 21
getElementsByTagName returns you a list of elements, so first you need to loop through the elements, then through their attributes.
$divs = $dom->getElementsByTagName('div');
foreach ($divs as $div) {
foreach ($div->attributes as $attr) {
$name = $attr->nodeName;
$value = $attr->nodeValue;
echo "Attribute '$name' :: '$value'<br />";
}
}
In your case, you said you needed a specific ID. Those are supposed to be unique, so to do that, you can use (note getElementById might not work unless you call $dom->validate() first):
$div = $dom->getElementById('divID');
Then to get your attribute:
$attr = $div->getAttribute('customAttr');
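EDIT: Loading index.php from disk (for example with $dom->load or $dom->loadHTMLFile) only reads the file's contents; it doesn't execute them, so index.php won't be run this way. Note also that $dom->loadHTML expects a string of markup rather than a filename. You might have to do something like:
$dom->loadHTML(file_get_contents('http://localhost/index.php'));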
You won't have access to the HTML if the redirect is from an external server. Let me put it this way: the DOM does not exist at the point you are trying to parse it. What you can do is pass the text to a DOM parser and then manipulate the elements that way. Or the better way would be to add it as another GET variable.
EDIT: Are you also aware that the client can change the HTML and have it pass whatever they want? (Using a tool like Firebug)
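Once you do have the rendered HTML as a string, the lookup itself could look roughly like this. Note that this sketch swaps getElementById for a DOMXPath query on the id attribute, which sidesteps the validate() caveat; the $html string and the abc123 value are made up for illustration:

```php
<?php
// Stand-in for the HTML that index.php would send to the browser.
$html = '<!DOCTYPE html><html><body>'
      . '<div id="column_profile"></div>'
      . '<div id="currentLocation" srckey="abc123">Somewhere</div>'
      . '</body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Find the element by its id attribute, then read the custom attribute.
$xpath = new DOMXPath($dom);
$div = $xpath->query('//*[@id="currentLocation"]')->item(0);
$srckey = $div instanceof DOMElement ? $div->getAttribute('srckey') : null;

echo $srckey, "\n";
```

The instanceof check avoids the "Call to a member function getAttribute() on a non-object" error when the element is not found.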

Parsing HTML with Php

I can't get the data between the tags into the arrays:
// Load the HTML string from file and create a SimpleXMLElement
$html_string = file_get_contents("data/csr.html"); /*the string really is in $html_string*/
$root = new SimpleXMLElement($html_string);
The problem starts here, when I try to get the values between the div, h2 and span tags into an array:
// Fetch all div, h2 and span values
$divArray = $hdlsArray = $dtlsArray = array();
foreach ($root->div as $div) {
$divArray[] = $div;
echo "".$div."<br />";
}
foreach ($root->h2 as $h2) {
$hdlsArray[] = $h2;
echo "".$h2."<br />";
}
foreach ($root->span as $span) {
$dtlsArray[] = $span;
echo "".$span."<br />";
}
The result of this is a blank page instead of the actual tag data being printed.
As an alternative to SimpleXMLElement, I suggest Simple HTML DOM. I've used it before and was very satisfied with the results. It allows you to use jQuery-like selectors, so fetching all div, h2 and span values is fairly simple.
This page says (about SimpleXML) "the only problem with it is that it'll only load valid XML" but may provide a workaround for HTML.
The 'Related Questions' on StackOverflow include this one, but it describes HTML inside valid XML tags.
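If you would rather stay with what ships with PHP, a possible sketch is to use DOMDocument::loadHTML, which tolerates HTML that is not valid XML. This is a different technique from Simple HTML DOM, and the $html string below is a made-up stand-in for the contents of data/csr.html:

```php
<?php
// Made-up stand-in for data/csr.html; note the unclosed <br>, which is not valid XML.
$html = '<html><body><div>Intro</div><h2>Headline</h2>'
      . '<span>Details</span><br></body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Gather the text of every div, h2 and span, wherever they sit in the tree.
$result = [];
foreach (['div', 'h2', 'span'] as $tag) {
    $result[$tag] = [];
    foreach ($dom->getElementsByTagName($tag) as $node) {
        $result[$tag][] = trim($node->textContent);
    }
}

print_r($result);
```

Unlike $root->div in the question, getElementsByTagName is not limited to direct children of the root element.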
