php explode on xml file - php

Here is the XML file I have. (I cannot edit the parser that creates this file, hence my question below.)
<?xml version="1.0" encoding="UTF-8"?>
<JobSearchResults LookID="arkansas">
<!-- Served from qs-b-02.oc.careercast.com -->
<QueryString>clientid=arkansas&stringVar=xmlString&pageSize=200&searchType=featured&outFormat=xml</QueryString>
<channel>
<title>JobsArkansas Listings</title>
<items></items>
</channel>
<item>
<JobID>73451732</JobID>
<Title>Radiology</Title>
<Employer>Baptist-Health </Employer>
<Location>LITTLE ROCK, AR</Location>
<Description><![CDATA[IMMEDIATE OPENINGS for:Diabetes Patient Educator, RN Community Education Nurse-RN Baptist Health Community Outreach•Diabetes Patient Educator, RN: Full-time: 8am-5pm Minimum Requirements:•Requires graduation from a state approved school/college of Nursing•Current licensure by theAR State Board of Nursing. •2+ years bedside experience preferred. •Certified Diabetes Educator certificate preferred. •Community Education Nurse - RNMinimum Requirements (PRN: Varies):•Current RN license & 2 years clinical experience. •Current CPR certification. Apply online at: baptist-health.com/jobs]]></Description>
<LookID>arkansas</LookID>
<Url>http://jobs.arkansasonline.com/careers/jobsearch/detail/jobId/73451732/viewType/featured</Url>
</item>
<item>
<JobID>66703190</JobID>
<Title>Telemarketing Agents</Title>
<Employer>Arkansas Democrat Gazette </Employer>
<Location>Bryant, AR</Location>
<Description><![CDATA[Telemarketing Agents Needed Position is part-time Starting at $9.00/hour Plus Bonus! Looking for dependable and professional applicants. We are a drug and smoke free company located in Bryant. Hours: Mon-Fri 4:30pm to 8:30pm and Sat. 9am to 6pm. Send resumes to: clewis#wehco.com or P.O. Box 384 Bryant, AR 72089 Arkansas Democrat Gazette Arkansas' Largest NewspaperCLICK THE IMAGE TO VIEW THE AD]]></Description>
<LookID>arkansas</LookID>
<Url>http://jobs.arkansasonline.com/careers/jobsearch/detail/jobId/66703190/viewType/featured</Url>
</item>
</JobSearchResults>
sas</LookID>
<Url>http://jobs.arkansasonline.com/careers/jobsearch/detail/jobId/73004973/viewType/featured</Url>
</item>
</JobSearchResults>
I am using the following php code to open the above xml file, and take out the following:
sas
http://jobs.arkansasonline.com/careers/jobsearch/detail/jobId/73004973/viewType/featured
</JobSearchResults>
However the php code below:
<?php
error_reporting(E_ALL | E_STRICT);
ini_set('display_errors', 1);
// Load File
// $today = date('Ymd');
$file = '/Users/jasenburkett/Sites/jobsark/feed' . '.xml';
$newfile = '/Users/jasenburkett/Sites/jobsark/feed' . '.xml';
$file_contents = file_get_contents($file);
$data = $file_contents;
$parts = explode("</JobSearchResults>", $data);
// Save File
file_put_contents($newfile, $data);
?>
This works; however, it deletes everything after the first </JobSearchResults>, and I want to keep the very last one.
Any ideas where I am going wrong?

If what you are looking for is a way to clean up the corrupt XML file, you can just add back the closing tag that is lost when explode() splits the string on it. It is a bit hackish, but it works.
$file = '/Users/jasenburkett/Sites/jobsark/feed.xml';
$data = file_get_contents($file);
$split = "</JobSearchResults>"; // Split on this
$parts = explode($split, $data); // Make the split
$cleanedXml = $parts[0]; // Use first part
$cleanedXml .= $split; // Put back last ending tag
file_put_contents($file, $cleanedXml);
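Before overwriting the feed, it may be worth checking that the cleaned string really is well-formed. The following is a minimal sketch, not part of the original answer: the inline $data string is a shortened stand-in for the corrupt file, and it cuts the string with strpos/substr instead of explode, which keeps the first closing tag without having to re-append it.

```php
<?php
// Shortened, made-up stand-in for the corrupt feed: a complete document
// followed by a stray fragment and a second closing tag.
$data = "<JobSearchResults><item><JobID>1</JobID></item></JobSearchResults>\n"
      . "sas</LookID>\n</item>\n</JobSearchResults>";

$split = '</JobSearchResults>';

// Keep everything up to and including the first closing tag.
$pos = strpos($data, $split);
$cleanedXml = ($pos === false) ? $data : substr($data, 0, $pos + strlen($split));

// Collect parser errors instead of emitting warnings.
libxml_use_internal_errors(true);
$doc = simplexml_load_string($cleanedXml);
$isValid = ($doc !== false);
libxml_clear_errors();

if ($isValid) {
    // At this point it should be safe to write $cleanedXml back to disk.
    echo "Cleaned XML is well-formed\n";
} else {
    echo "Cleaned XML still does not parse; leaving the file untouched\n";
}
```

In the real script, $data would come from file_get_contents($file), and file_put_contents($file, $cleanedXml) would run only when $isValid is true.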

Related

Regular Expressions - PHP and XML

I'm in college and new to PHP regular expressions, but I think I have some idea of what I need to do. Basically, I need to create a PHP program that reads XML source containing several 'stories' and stores their details in a MySQL database. I've managed to create an expression that selects each story, but I need to break this expression down further in order to get each element within the story. Here's the XML:
XML
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="test.xsl"?>
<latestIssue>
<issue number="256" />
<date>
<day> 21 </day>
<month> 1 </month>
<year> 2011 </year>
</date>
<story>
<title> Is the earth flat? </title>
<author> A. N. Redneck </author>
<url> http://www.HotStuff.ie/stories/story123456.xml </url>
</story>
<story>
<title> What the actress said to the bishop </title>
<author> Brated Film Critic </author>
<url> http://www.HotStuff.ie/stories/story123457.xml </url>
</story>
<story>
<title> What the year has in store </title>
<author> Stargazer </author>
<url> http://www.HotStuff.ie/stories/story123458.xml </url>
</story>
</latestIssue>
So I need to get the title, author and url from each story and add them as a row in my database. Here's what I have so far:
PHP
<?php
$url = fopen("http://address/to/test.xml", "r");
$contents = fread($url,10000000);
$exp = preg_match_all("/<title>(.+?)<\/url>/s", $contents, $matches);
foreach($matches[1] as $match) {
// NO IDEA WHAT TO DO FROM HERE
// $exp2 = "/<title>(.+?)<\/title><author>(.+?)<\/author><url>(.+?)<\/url>/";
// This is what I had but I'm not sure if it's right or what to do after
}
?>
I'd really appreciate the help guys, I've been stuck on this all day and I can't wrap my head around regular expressions at all. Once I've managed to get each story's details I can easily update the database.
EDIT:
Thanks for replying but are you sure this can't be done with regular expressions? It's just the question says "Use regular expressions to analyse the XML and extract the relevant data that you need. Note that information about each story is spread across several lines of XML". Maybe he made a mistake but I don't see why he'd write it like that if it can't be done this way.
First of all, start using
file_get_contents("UrlHere");
to fetch the content of a page.
Then, if you want to parse the XML, use one of the XML parsers built into PHP.
You could also use a third-party XML parser.
Regular expressions are not the correct tool to use here. You want to use an XML parser. I like PHP's SimpleXML:
$sXML = new SimpleXMLElement('http://address/to/test.xml', 0, TRUE);
$stories = $sXML->story;
foreach($stories as $story){
$title = (string)$story->title;
$author = (string)$story->author;
$url = (string)$story->url;
}
You should never use a regexp to parse an XML document (OK, "never" is a big word; in some rare cases a regexp can be better, but not in your case).
Since you are only reading the document, I suggest using the SimpleXML class and XPath queries.
For example:
$ cat test.php
#!/usr/bin/php
<?php
function xpathValueToString(SimpleXMLElement $xml, $xpath){
$arrayXpath = $xml->xpath($xpath);
return ($arrayXpath) ? trim((string) $arrayXpath[0]) : null;
}
$xml = new SimpleXMLElement(file_get_contents("test.xml"));
$arrayXpathStories = $xml->xpath("/latestIssue/story");
foreach ($arrayXpathStories as $story){
echo "Title : " . xpathValueToString($story, 'title') . "\n";
echo "Author : " . xpathValueToString($story, 'author') . "\n";
echo "URL : " . xpathValueToString($story, 'url') . "\n\n";
}
?>
$ ./test.php
Title : Is the earth flat?
Author : A. N. Redneck
URL : http://www.HotStuff.ie/stories/story123456.xml
Title : What the actress said to the bishop
Author : Brated Film Critic
URL : http://www.HotStuff.ie/stories/story123457.xml
Title : What the year has in store
Author : Stargazer
URL : http://www.HotStuff.ie/stories/story123458.xml
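If the assignment really does require regular expressions, as the edit to the question says, something along these lines could work for this particular, very regular file. This is only a sketch under assumptions: the inline $contents string is a shortened copy of the XML from the question, and the pattern assumes that every story contains title, author and url in exactly that order with no attributes or nested tags. A parser remains the more robust choice.

```php
<?php
// Shortened copy of the XML from the question, inlined so the sketch is self-contained.
$contents = <<<XML
<latestIssue>
<story>
<title> Is the earth flat? </title>
<author> A. N. Redneck </author>
<url> http://www.HotStuff.ie/stories/story123456.xml </url>
</story>
<story>
<title> What the actress said to the bishop </title>
<author> Brated Film Critic </author>
<url> http://www.HotStuff.ie/stories/story123457.xml </url>
</story>
</latestIssue>
XML;

// \s* allows for the line breaks between elements; the s modifier lets "." match
// newlines; "~" is used as the delimiter so "/" needs no escaping.
$pattern = '~<story>\s*<title>(.*?)</title>\s*<author>(.*?)</author>\s*<url>(.*?)</url>\s*</story>~s';

// PREG_SET_ORDER yields one array per story: [full match, title, author, url].
preg_match_all($pattern, $contents, $matches, PREG_SET_ORDER);

$stories = [];
foreach ($matches as $m) {
    $stories[] = [
        'title'  => trim($m[1]),
        'author' => trim($m[2]),
        'url'    => trim($m[3]),
    ];
}

print_r($stories);
```

Each element of $stories then holds the three values for one database row.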

Get description item from XML with PHP and ignore br tags

I have this XML file:
http://www.dailymotion.com/rss/tag/house
I need to obtain the content of the first title and description items. I use this code:
$xml = simplexml_load_file('http://www.dailymotion.com/rss/tag/house/', 'SimpleXMLElement', LIBXML_NOCDATA);
$dom = new DOMDocument();
echo $xml->channel->item->title . "<br>";
@$dom->loadHtml($xml->channel->item->description);
$xpath = new DOMXPath($dom);
echo $description = $xpath->evaluate("string(//p[1]/text())");
This is OUTPUT
Fabio Lenzi - Beautiful Sorrow (Ivan Scratchin Funky Mix)
Download:
But if I read the XML, the description item contains this content within the p tag:
<p>Download:<br>https://itunes.apple.com/it/album/beautiful-sorrow/id661901853<br><br>©
Copyright protected work. ℗ Frutilla Records - All rights reserved. Only for watching,
listening and streaming. Downloading, copying, sharing and making available is strictly
prohibited.<br>
<br>frutillarecords#gmail.com<br>alex.voghi#dancetool.net<br>info#dancetool.net -
YourDancefloorTV – (Re)Discover your Dance greatest hits - YourDancefloorTV is your
channel for all the best Dance music. Find your favorite tracks and artists and
experience the best of Dance music. Subscribe for free to stay connected to our channel
and easily access our video updates! - YourDancefloorTV:
http://www.dailymotion.com/yourdancefloortv</p>
Then I noticed that the extraction stops at the first <br> tag. How can I ignore the br tags, and any other tags, inside the paragraph?
Any help is really appreciated.
Use strip_tags to remove HTML tags!
Example:
$stripped_string = strip_tags($unstripped_string, $allowable_tags);
In your case this may be:
$x = (string) $xml->channel->item->description; // your result
echo strip_tags($x, '<p>');
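The reason the original query stops at "Download:" is that text() selects the individual text nodes of the paragraph, and string() applied to a node set returns only the first node. Asking for the string value of the p element itself returns all of its text. A minimal sketch comparing the two, using a shortened, made-up stand-in for the feed's description:

```php
<?php
// Shortened stand-in for the description HTML (an assumption for illustration).
$description = '<p>Download:<br>https://itunes.apple.com/it/album/beautiful-sorrow/id661901853'
             . '<br><br>Copyright protected work.</p>';

// Collect libxml warnings about imperfect HTML instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($description);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Only the first text node of the paragraph.
$firstTextNode = $xpath->evaluate('string(//p[1]/text())');

// String value of the whole element: all descendant text, tags dropped.
$wholeParagraph = $xpath->evaluate('string(//p[1])');

// strip_tags on the raw markup gives the same text for this input.
$stripped = strip_tags($description);

echo $firstTextNode, "\n";
echo $wholeParagraph, "\n";
echo $stripped, "\n";
```

Note that in both cases the text on either side of a br is joined without a separator; replacing the br tags with a space or newline before stripping is one way around that.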

Parse XML namespaces with php SimpleXML

I know this has been asked many times, but I haven't been able to get any of the suggestions to work in my situation. I have searched the web and this site and tried everything I could find, and nothing works. I need to parse this XML, which uses the cap: namespace, and I only need four entries from it.
<?xml version="1.0" encoding="UTF-8"?>
<entry>
<id>http://alerts.weather.gov/cap/wwacapget.php?x=TX124EFFB832F0.SpecialWeatherStatement.124EFFB84164TX.LUBSPSLUB.ac20a1425c958f66dc159baea2f9e672</id>
<updated>2013-05-06T20:08:00-05:00</updated>
<published>2013-05-06T20:08:00-05:00</published>
<author>
<name>w-nws.webmaster#noaa.gov</name>
</author>
<title>Special Weather Statement issued May 06 at 8:08PM CDT by NWS</title>
<link href="http://alerts.weather.gov/cap/wwacapget.php?x=TX124EFFB832F0.SpecialWeatherStatement.124EFFB84164TX.LUBSPSLUB.ac20a1425c958f66dc159baea2f9e672"/>
<summary>...SIGNIFICANT WEATHER ADVISORY FOR COCHRAN AND BAILEY COUNTIES... AT 808 PM CDT...NATIONAL WEATHER SERVICE DOPPLER RADAR INDICATED A STRONG THUNDERSTORM 30 MILES NORTHWEST OF MORTON...MOVING SOUTHEAST AT 25 MPH. NICKEL SIZE HAIL...WINDS SPEEDS UP TO 40 MPH...CONTINUOUS CLOUD TO GROUND LIGHTNING...AND BRIEF MODERATE DOWNPOURS ARE POSSIBLE WITH</summary>
<cap:event>Special Weather Statement</cap:event>
<cap:effective>2013-05-06T20:08:00-05:00</cap:effective>
<cap:expires>2013-05-06T20:45:00-05:00</cap:expires>
<cap:status>Actual</cap:status>
<cap:msgType>Alert</cap:msgType>
<cap:category>Met</cap:category>
<cap:urgency>Expected</cap:urgency>
<cap:severity>Minor</cap:severity>
<cap:certainty>Observed</cap:certainty>
<cap:areaDesc>Bailey; Cochran</cap:areaDesc>
<cap:polygon>34.19,-103.04 34.19,-103.03 33.98,-102.61 33.71,-102.61 33.63,-102.75 33.64,-103.05 34.19,-103.04</cap:polygon>
<cap:geocode>
<valueName>FIPS6</valueName>
<value>048017 048079</value>
<valueName>UGC</valueName>
<value>TXZ027 TXZ033</value>
</cap:geocode>
<cap:parameter>
<valueName>VTEC</valueName>
<value>
</value>
</cap:parameter>
</entry>
I am using simpleXML and I have a small simple test script set up and it works great for parsing regular elements. I can't for the dickens of me find or get a way to parse the elements with the namespaces.
Here is a small sample test script with code I am using and works great for parsing simple elements. How do I use this to parse namespaces? Everything I've tried doesn't work. I need it to be able to create variables so I can be able to embed them in HTML for style.
<?php
$html = "";
// Get the XML Feed
$data = "http://alerts.weather.gov/cap/tx.php?x=1";
// load the xml into the object
$xml = simplexml_load_file($data);
for ($i = 0; $i < 10; $i++){
$title = $xml->entry[$i]->title;
$summary = $xml->entry[$i]->summary;
$html .= "<p><strong>$title</strong></p><p>$summary</p><hr/>";
}
echo $html;
?>
This works fine for parsing regular elements but what about the ones with the cap: namespace under the entry parent?
<?php
ini_set('display_errors','1');
$html = "";
$data = "http://alerts.weather.gov/cap/tx.php?x=1";
$entries = simplexml_load_file($data);
if(count($entries)):
//Registering NameSpace
$entries->registerXPathNamespace('prefix', 'http://www.w3.org/2005/Atom');
$result = $entries->xpath("//prefix:entry");
//echo count($asin);
//echo "<pre>";print_r($asin);
foreach ($result as $entry):
$title = $entry->title;
$summary = $entry->summary;
$html .= "<p><strong>$title</strong></p><p>$summary</p>$event<hr/>";
endforeach;
endif;
echo $html;
?>
Any help would be greatly appreciated.
-Thanks
I have given the same type of answer here: solution to your question
You just need to register the namespace; then you can work normally with SimpleXML and XPath, and read the cap: elements through children() with the namespace URI.
<?php
$data = "http://alerts.weather.gov/cap/tx.php?x=1";
$entries = file_get_contents($data);
$entries = new SimpleXmlElement($entries);
if(count($entries)):
//echo "<pre>";print_r($entries);die;
//alternate way other than registring NameSpace
//$asin = $asins->xpath("//*[local-name() = 'ASIN']");
$entries->registerXPathNamespace('prefix', 'http://www.w3.org/2005/Atom');
$result = $entries->xpath("//prefix:entry");
//echo count($asin);
//echo "<pre>";print_r($result);die;
foreach ($result as $entry):
//echo "<pre>";print_r($entry);die;
$dc = $entry->children('urn:oasis:names:tc:emergency:cap:1.1');
echo $dc->event."<br/>";
echo $dc->effective."<br/>";
echo "<hr>";
endforeach;
endif;
That's it.
Here's an alternative solution:
<?php
$xml = <<<XML
<?xml version = '1.0' encoding = 'UTF-8' standalone = 'yes'?>
<?xml-stylesheet href='http://alerts.weather.gov/cap/capatom.xsl' type='text/xsl'?>
<!--
This atom/xml feed is an index to active advisories, watches and warnings
issued by the National Weather Service. This index file is not the complete
Common Alerting Protocol (CAP) alert message. To obtain the complete CAP
alert, please follow the links for each entry in this index. Also note the
CAP message uses a style sheet to convey the information in a human readable
format. Please view the source of the CAP message to see the complete data
set. Not all information in the CAP message is contained in this index of
active alerts.
-->
<feed
xmlns = 'http://www.w3.org/2005/Atom'
xmlns:cap = 'urn:oasis:names:tc:emergency:cap:1.1'
xmlns:ha = 'http://www.alerting.net/namespace/index_1.0'
>
<!-- http-date = Tue, 07 May 2013 04:14:00 GMT -->
<id>http://alerts.weather.gov/cap/tx.atom</id>
<logo>http://alerts.weather.gov/images/xml_logo.gif</logo>
<generator>NWS CAP Server</generator>
<updated>2013-05-06T23:14:00-05:00</updated>
<author>
<name>w-nws.webmaster#noaa.gov</name>
</author>
<title>Current Watches, Warnings and Advisories for Texas Issued by the National Weather Service</title>
<link href='http://alerts.weather.gov/cap/tx.atom'/>
<entry>
<id>http://alerts.weather.gov/cap/wwacapget.php?x=TX124EFFB8AA78.FireWeatherWatch.124EFFD70270TX.EPZRFWEPZ.1716207877d94d15d43d410892b9f175</id>
<updated>2013-05-06T23:14:00-05:00</updated>
<published>2013-05-06T23:14:00-05:00</published>
<author>
<name>w-nws.webmaster#noaa.gov</name>
</author>
<title>Fire Weather Watch issued May 06 at 11:14PM CDT until May 08 at 10:00PM CDT by NWS</title>
<link href="http://alerts.weather.gov/cap/wwacapget.php?x=TX124EFFB8AA78.FireWeatherWatch.124EFFD70270TX.EPZRFWEPZ.1716207877d94d15d43d410892b9f175"/>
<summary>...CRITICAL FIRE CONDITIONS EXPECTED WEDNESDAY ACROSS FAR WEST TEXAS AND THE SOUTHWEST NEW MEXICO LOWLANDS... .WINDS ALOFT WILL STRENGTHEN OVER THE REGION EARLY THIS WEEK...AHEAD OF AN UPPER LEVEL TROUGH FORECAST TO MOVE THROUGH NEW MEXICO AND TEXAS ON WEDNESDAY. SURFACE LOW PRESSURE WILL ALSO DEVELOP TO OUR EAST AS THE TROUGH APPROACHES. THIS COMBINATION WILL RESULT</summary>
<cap:event>Fire Weather Watch</cap:event>
<cap:effective>2013-05-06T23:14:00-05:00</cap:effective>
<cap:expires>2013-05-08T22:00:00-05:00</cap:expires>
<cap:status>Actual</cap:status>
<cap:msgType>Alert</cap:msgType>
<cap:category>Met</cap:category>
<cap:urgency>Future</cap:urgency>
<cap:severity>Moderate</cap:severity>
<cap:certainty>Possible</cap:certainty>
<cap:areaDesc>El Paso; Hudspeth</cap:areaDesc>
<cap:polygon></cap:polygon>
<cap:geocode>
<valueName>FIPS6</valueName>
<value>048141 048229</value>
<valueName>UGC</valueName>
<value>TXZ055 TXZ056</value>
</cap:geocode>
<cap:parameter>
<valueName>VTEC</valueName>
<value>/O.NEW.KEPZ.FW.A.0018.130508T1900Z-130509T0300Z/</value>
</cap:parameter>
</entry>
<entry>
<id>http://alerts.weather.gov/cap/wwacapget.php?x=TX124EFFABB2F0.AirQualityAlert.124EFFC750DCTX.HGXAQAHGX.7f2cf548a67d403f0541492b2804d621</id>
<updated>2013-05-06T14:16:00-05:00</updated>
<published>2013-05-06T14:16:00-05:00</published>
<author>
<name>w-nws.webmaster#noaa.gov</name>
</author>
<title>Air Quality Alert issued May 06 at 2:16PM CDT by NWS</title>
<link href="http://alerts.weather.gov/cap/wwacapget.php?x=TX124EFFABB2F0.AirQualityAlert.124EFFC750DCTX.HGXAQAHGX.7f2cf548a67d403f0541492b2804d621"/>
<summary>...OZONE ACTION DAY FOR TUESDAY... THE TEXAS COMMISSION ON ENVIRONMENTAL QUALITY (TCEQ)...HAS ISSUED AN OZONE ACTION DAY FOR THE HOUSTON...GALVESTON...AND BRAZORIA AREAS FOR TUESDAY...MAY 7 2013. ATMOSPHERIC CONDITIONS ARE EXPECTED TO BE FAVORABLE FOR PRODUCING HIGH LEVELS OF OZONE POLLUTION IN THE HOUSTON...GALVESTON AND</summary>
<cap:event>Air Quality Alert</cap:event>
<cap:effective>2013-05-06T14:16:00-05:00</cap:effective>
<cap:expires>2013-05-07T19:15:00-05:00</cap:expires>
<cap:status>Actual</cap:status>
<cap:msgType>Alert</cap:msgType>
<cap:category>Met</cap:category>
<cap:urgency>Unknown</cap:urgency>
<cap:severity>Unknown</cap:severity>
<cap:certainty>Unknown</cap:certainty>
<cap:areaDesc>Brazoria; Galveston; Harris</cap:areaDesc>
<cap:polygon></cap:polygon>
<cap:geocode>
<valueName>FIPS6</valueName>
<value>048039 048167 048201</value>
<valueName>UGC</valueName>
<value>TXZ213 TXZ237 TXZ238</value>
</cap:geocode>
<cap:parameter>
<valueName>VTEC</valueName>
<value></value>
</cap:parameter>
</entry>
</feed>
XML;
$sxe = new SimpleXMLElement($xml);
$capFields = $sxe->entry->children('cap', true);
echo "Event: " . (string) $capFields->event . "\n";
echo "Effective: " . (string) $capFields->effective . "\n";
echo "Expires: " . (string) $capFields->expires . "\n";
echo "Severity: " . (string) $capFields->severity . "\n";
Output:
Event: Fire Weather Watch
Effective: 2013-05-06T23:14:00-05:00
Expires: 2013-05-08T22:00:00-05:00
Severity: Moderate
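The example above reads only the first entry. To build HTML for every entry, as the question asks, the same children() call can be made inside a loop. This is a minimal sketch, assuming a trimmed-down feed with the same two namespace URIs as the feed shown above; the HTML layout is only illustrative.

```php
<?php
// Trimmed stand-in for the feed; the namespace URIs match the ones in the feed above.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:cap="urn:oasis:names:tc:emergency:cap:1.1">
<entry>
<title>Fire Weather Watch issued May 06 at 11:14PM CDT until May 08 at 10:00PM CDT by NWS</title>
<cap:event>Fire Weather Watch</cap:event>
<cap:effective>2013-05-06T23:14:00-05:00</cap:effective>
<cap:expires>2013-05-08T22:00:00-05:00</cap:expires>
<cap:severity>Moderate</cap:severity>
</entry>
<entry>
<title>Air Quality Alert issued May 06 at 2:16PM CDT by NWS</title>
<cap:event>Air Quality Alert</cap:event>
<cap:effective>2013-05-06T14:16:00-05:00</cap:effective>
<cap:expires>2013-05-07T19:15:00-05:00</cap:expires>
<cap:severity>Unknown</cap:severity>
</entry>
</feed>
XML;

$feed = new SimpleXMLElement($xml);

$html = '';
$events = [];
foreach ($feed->entry as $entry) {
    // Select the children that live in the CAP namespace, by URI.
    $cap = $entry->children('urn:oasis:names:tc:emergency:cap:1.1');

    // Cast to string and escape before embedding in HTML.
    $title    = htmlspecialchars((string) $entry->title);
    $event    = htmlspecialchars((string) $cap->event);
    $expires  = htmlspecialchars((string) $cap->expires);
    $severity = htmlspecialchars((string) $cap->severity);

    $events[] = (string) $cap->event;
    $html .= "<p><strong>$title</strong></p>"
           . "<p>$event, severity $severity, expires $expires</p><hr/>";
}

echo $html;
```

Passing the namespace URI rather than the prefix keeps the code working even if the feed later declares the same namespace under a different prefix.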

read xml using file get contents

I have an XML file. How can I get the value of the Title field using file_get_contents? I just want to get the value "TomTom XXL 550M - US, Canada & Mexico Automotive GPS. (Brand New)"
<Title>
TomTom XXL 550M - US, Canada & Mexico Automotive GPS. (Brand New)
</Title>
my code
$xmlstr = file_get_contents($source);
$parseXML = new SimpleXMLElement($xmlstr);
print($parseXML);
// load as file
$parseXMLFile = new SimpleXMLElement($source,null,true);
If you feel comfortable with JavaScript, there is another solution: PHP's DOMDocument class, whose API resembles the JavaScript DOM.
You can load XML files and use functions like getElementsByTagName. For example, if you have a books.xml file like this:
<?xml version="1.0" encoding="utf-8"?>
<books>
<book><title>Patterns of Enterprise Application Architecture</title></book>
<book><title>Design Patterns: Elements of Reusable Software Design</title></book>
<book><title>Clean Code</title></book>
</books>
You can extract the titles like this:
$dom = new DOMDocument;
$dom->load('books.xml');
$books = $dom->getElementsByTagName('title');
foreach ($books as $book) {
echo $book->nodeValue.'<br>';
}
You just have to read your file with simplexml_load_file (see its documentation).
You will then get an object of class SimpleXMLElement,
which you can use to get what you want. Some examples are in the SimpleXML examples section of the PHP manual.
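As a minimal sketch of reading the Title value with SimpleXML: the question shows only the Title element, so the Item root element here is a hypothetical wrapper. Note also that a bare & is not valid in XML, so the sample escapes it as &amp;; the parser turns it back into & when the value is read.

```php
<?php
// Hypothetical document: only <Title> comes from the question, the <Item>
// root is an assumption. The ampersand is escaped as &amp;.
$xmlstr = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<Item>
<Title>
TomTom XXL 550M - US, Canada &amp; Mexico Automotive GPS. (Brand New)
</Title>
</Item>
XML;

// With a real file, file_get_contents($source) would supply this string.
$item = simplexml_load_string($xmlstr);

// Cast the element to string and trim the surrounding line breaks.
$title = trim((string) $item->Title);

echo $title, "\n";
```

The path to Title ($item->Title here) depends on where the element sits in the real document.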

Parsing xml data in an order using xquery

I have an XML file like this:
<?xml version="1.0" encoding="UTF-8"?>
<gallery >
<gallerydata>
<galleryname>
Home
</galleryname>
<createdat>
14/8/2010 4:53 pm
</createdat>
</gallerydata>
<gallerydata>
<galleryname>
School
</galleryname>
<createdat>
13/8/2010 5:19 pm
</createdat>
</gallerydata>
<gallerydata>
<galleryname>
Company
</galleryname>
<createdat>
15/8/2010 5:21 pm
</createdat>
</gallerydata>
</gallery>
I am using XPath and XQuery to parse this XML file:
$xml_str = file_get_contents($file);
$xml = simplexml_load_string($xml_str);
$nodes = $xml->xpath('//gallery/gallerydata');
How can I get the latest gallery data, that is, the entry with the most recent date?
Is there any way to do this? Please help me.
If you are looking for a way to do it within the initial XPath query, that could be rough: I don't think XPath 1.0 can parse dates, so you would have to do some convoluted string comparison. If you are OK with doing it in PHP after the fact, there are a multitude of ways. For example:
$xml = new SimpleXmlElement($xmlStr);
$gallerydata = $xml->xpath('//gallerydata');
$nodes = array();
foreach($gallerydata as $entry)
{
$utime = strtotime(str_replace('/', '-', trim((string) $entry->createdat)));
$nodes[$utime] = $entry;
}
krsort($nodes);
$newest = array_shift($nodes);
echo $newest->asXml(); // or do what you need to do with the entry data...
This loops over the returned node set and puts the nodes in an array indexed by Unix time. Then I run a reverse key sort and shift the first element off the array. Note that since I'm using strtotime for the conversion and you have UK date formats, we have to replace the slashes with dashes (strtotime reads slash-separated dates as month/day/year). Entries that share the same timestamp would overwrite each other in this array.
Since your file does not store gallery data in chronological order, you can get the latest gallery entry only by looping through all the entries.
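An alternative that avoids the string replacement is to parse the dates with an explicit format and keep track of the newest entry while looping. This is a minimal sketch, assuming every createdat value follows the day/month/year layout shown in the question; the inline XML is a compact copy of that file.

```php
<?php
// Compact copy of the gallery XML from the question.
$xmlStr = <<<XML
<gallery>
<gallerydata><galleryname> Home </galleryname><createdat> 14/8/2010 4:53 pm </createdat></gallerydata>
<gallerydata><galleryname> School </galleryname><createdat> 13/8/2010 5:19 pm </createdat></gallerydata>
<gallerydata><galleryname> Company </galleryname><createdat> 15/8/2010 5:21 pm </createdat></gallerydata>
</gallery>
XML;

$xml = new SimpleXMLElement($xmlStr);

$newest = null;
$newestDate = null;
foreach ($xml->gallerydata as $entry) {
    // "!" resets unspecified fields (such as seconds) so comparisons are stable;
    // j/n/Y is day/month/year without leading zeros, g:i a is 12-hour time with am/pm.
    $date = DateTime::createFromFormat('!j/n/Y g:i a', trim((string) $entry->createdat));
    if ($date === false) {
        continue; // skip entries whose date does not match the assumed format
    }
    if ($newestDate === null || $date > $newestDate) {
        $newestDate = $date;
        $newest = $entry;
    }
}

$newestName = ($newest !== null) ? trim((string) $newest->galleryname) : null;
echo $newestName, "\n";
```

Because nothing is keyed by timestamp, two galleries created in the same minute do not overwrite each other; the first one encountered is kept.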
