XPath query in PHP to get an attribute from an element with a specific attribute value

I'm really racking my brain over something that should be way too simple. I have an XML feed with 25 entries in the root element, and I'm already iterating over them as $entry in PHP.
Here is an example of one entry in the xml feed:
<entry>
<id>tag:blogger.com,1999:blog-7691515427771054332.post-4593968385603307594</id>
<published>2014-02-10T06:33:00.000-05:00</published>
<updated>2014-02-10T06:40:34.678-05:00</updated>
<category scheme="http://www.blogger.com/atom/ns#" term="Aurin" />
<category scheme="http://www.blogger.com/atom/ns#" term="fan art" />
<category scheme="http://www.blogger.com/atom/ns#" term="Fred-H" />
<category scheme="http://www.blogger.com/atom/ns#" term="spellslinger" />
<category scheme="http://www.blogger.com/atom/ns#" term="wildstar" />
<title type="text">Fan Art Showcase: She's gunnin' for trouble!</title>
<content type="html">Some random content</content>
<link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/7691515427771054332/posts/default/4593968385603307594" />
<link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/7691515427771054332/posts/default/4593968385603307594" />
<link rel="alternate" type="text/html" href="http://www.wildstarfans.net/2014/02/fan-art-showcase-shes-gunnin-for-trouble.html" title="Fan Art Showcase: She's gunnin' for trouble!" />
<author>
<name>Name Removed</name>
<uri>URL removed</uri>
<email>noreply@blogger.com</email>
<gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="32" height="32" src="//lh3.googleusercontent.com/-ow-dvUDbNxI/AAAAAAAAAAI/AAAAAAAABTY/MhrybgagMv0/s512-c/photo.jpg" />
</author>
<media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-Ifp6awhDJuU/UWQEUl8nhUI/AAAAAAAABss/BSZ_YYM1U38/s72-c/fan-art-header.png" height="72" width="72" />
</entry>
I want to get the href of the link whose rel is set to alternate. Here it is the third link, but the alternate link isn't always the third one. I know how to do this with SimpleXML, but I want to learn XPath for this, because with SimpleXML it's more complicated, and I hope this brings me one step closer to understanding complex XPath queries.
The PHP I got that makes the most sense to me is:
$href = $entry->xpath('link[@rel="alternate"]/@href');
I tried multiple queries based on the information I found, but they all resulted in nothing. Here is a list of the queries I tried:
$href = $entry->xpath('link[@rel="alternate"]/@href/text()');
$href = $entry->xpath('link[@rel="alternate"]')->getAttributes()->href;
$href = $entry->xpath('*[@rel="alternate"]'); $href = $href['href'];

As it turned out from the chat discussion on my original question, I had to register the namespace. In the end I used this website, and the code turned out like this:
$feed = new DOMDocument();
$feed->load("http://www.wildstarfans.net/feeds/posts/default");
$xpath = new DOMXPath($feed);
// Atom elements live in a default namespace, so give it a prefix for XPath
$xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom');
foreach ($xpath->evaluate('//atom:entry') as $entry) {
    // string(...) makes evaluate() return the href value itself ('' if missing)
    $href = $xpath->evaluate('string(atom:link[@rel="alternate"]/@href)', $entry);
    echo $href, "\n";
}
Credits go to ThW and Wrikken. Wish I could give you guys SO points for this.
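To try the namespace-aware query without fetching the live feed, here is a minimal self-contained sketch. It assumes, as with Blogger feeds, that the Atom namespace is declared as the default namespace on the <feed> root; the feed below is a trimmed copy of the entry from the question. The key point is that an unprefixed name such as link in an XPath 1.0 expression only matches elements in no namespace, which is why the original queries returned nothing.

```php
<?php
// Trimmed copy of the entry from the question, wrapped in an Atom <feed> root.
$atom = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/7691515427771054332/posts/default/4593968385603307594" />
    <link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/7691515427771054332/posts/default/4593968385603307594" />
    <link rel="alternate" type="text/html" href="http://www.wildstarfans.net/2014/02/fan-art-showcase-shes-gunnin-for-trouble.html" />
  </entry>
</feed>
XML;

$feed = new DOMDocument();
$feed->loadXML($atom);

$xpath = new DOMXPath($feed);
// The prefix name "atom" is our own choice; only the namespace URI must match.
$xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom');

$hrefs = [];
foreach ($xpath->evaluate('//atom:entry') as $entry) {
    // Position-independent: picks the link whose rel is "alternate", wherever it is.
    $hrefs[] = $xpath->evaluate('string(atom:link[@rel="alternate"]/@href)', $entry);
}
print_r($hrefs);
```

Because the predicate filters on @rel, the order of the <link> elements inside an entry does not matter.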

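// The feed uses the Atom default namespace, so register a prefix for it first
$entry->registerXPathNamespace('atom', 'http://www.w3.org/2005/Atom');
$href = $entry->xpath('atom:link[@rel="alternate"]');
$href = (string) $href[0]->attributes()->href;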

Related

How to parse XML's <media:text type="html"> with PHP

I would be grateful if someone could tell me how to read the following element from XML in PHP:
<media:text type="html">
<p>
<a href="foo.com">
<img src="foo.com/foo.jpg" align="left" alt="Foo title" title="Foo title" border="0" />
</a>
</p>
</media:text>
which is part of the following item:
<item>
<title>Foo title</title>
<description>Foo Description</description>
<link>foo.com</link>
<pubDate>Tue, 02 Feb 2021 18:23:51 EST</pubDate>
<media:content url="foo.com/foo.jpg" />
<media:text type="html">
<p>
<a href="foo.com">
<img src="foo.com/foo.jpg" align="left" alt="Foo title" title="Foo title" border="0" />
</a>
</p>
</media:text>
</item>
With the following code
$content = $xml->channel->item[$i]->children('media', true)->content->attributes();
I can only get the attributes of media:content, but I can't extract
<media:text type="html">
Thanks to those who can help me!
You can use the SimpleXMLElement class to parse your XML; it gives you an object tree (not a plain array) that is easy to traverse.
See https://www.php.net/manual/fr/simplexml.examples-basic.php.
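A minimal sketch of reaching media:text with SimpleXML, using the same children('media', true) call as in the question. It assumes the media prefix is declared on the document (as in typical Media RSS feeds) and that, as is usual for type="html", the HTML inside media:text is entity-escaped; the sample feed is a shortened, hypothetical version of the item above.

```php
<?php
$rss = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Foo title</title>
      <media:content url="foo.com/foo.jpg" />
      <media:text type="html">&lt;p&gt;&lt;a href="foo.com"&gt;&lt;img src="foo.com/foo.jpg" alt="Foo title" /&gt;&lt;/a&gt;&lt;/p&gt;</media:text>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($rss);
foreach ($xml->channel->item as $item) {
    // All media:* children of this item, selected by prefix
    $media = $item->children('media', true);
    // Unprefixed attributes are read with attributes() (no namespace argument)
    $imageUrl = (string) $media->content->attributes()->url;
    $textType = (string) $media->text->attributes()->type;
    // Casting to string returns the decoded HTML held in media:text
    $html = trim((string) $media->text);
    echo $imageUrl, PHP_EOL, $textType, PHP_EOL, $html, PHP_EOL;
}
```

If a feed instead puts raw child elements inside media:text, the string cast only returns their text; in that case you would serialize the children (for example with asXML()) rather than cast.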

Regex - Replacing content - eZ Publish XML field

I have XML content that I want to modify before creating it with the eZ Publish 5 API.
I am trying to write a regex to modify the content.
Here is the XML code that I have (with HTML entities):
Screenshot of the XML code: http://img15.hostingpics.net/pics/453268xmlcode.jpg
I want to match empty.jpg in:
<img alt="" src="http://www.asite.org/empty.jpg" />
and replace the whole line for each occurrence with:
<custom name="my_checkbox"></custom>
Problem:
The img tag can sometimes contain other attributes, such as height="15" and width="12":
<img height="15" alt="" width="12" src="http://www.asite.org/empty.jpg" />
And sometimes the attributes come after the src attribute, in a different order.
The desired result:
Screenshot of the desired XML: http://img15.hostingpics.net/pics/318980xmlcodeaim.jpg
I've tried many things so far, but nothing worked.
Thanks in advance for helping.
Cheers!
EDIT:
Here is an example of what I've tried so far:
/(<img [a-z = ""]* src="http:\/\/www\.asite\.org\/empty\.jpg" \/&gt)/g
Since we're dealing with XML, I've used an XML parser to reach the desired section.
Then we can apply a regex (~<img.*?>(?=</span)~) to select the image tag and replace it with your custom tag (note that in the object returned by the XML parser, the HTML entities are replaced with their equivalent characters).
This is a piece of code that emulates and handle your situation:
<?php
$xmlstr = <<<XML
<sections>
<section>
<paragraph>
<literal class="html">
&lt;img alt="" src="http://asite.org/empty.png" /&gt;&lt;/span&gt;&lt;/span&gt; Yes/no&amp;nbsp;&lt;br /&gt;
&lt;img alt="" src="http://asite.org/empty.png" /&gt;&lt;/span&gt;&lt;/span&gt; Other text/no&amp;nbsp;&lt;br /&gt;
</literal>
</paragraph>
</section>
</sections>
XML;
$sections = new SimpleXMLElement($xmlstr);
foreach ($sections->section->paragraph as $paragraph) {
    // The parser has already decoded &lt;img ... /&gt; into <img ... />
    $re = "~<img.*?>(?=</span)~";
    $subst = "<custom name=\"my_checkbox\"></custom>";
    $paragraph->literal = preg_replace($re, $subst, (string) $paragraph->literal);
}
echo $sections->asXML();
?>
The output is (asXML() escapes the markup inside <literal> again):
<?xml version="1.0"?>
<sections>
<section>
<paragraph>
<literal class="html">
&lt;custom name="my_checkbox"&gt;&lt;/custom&gt;&lt;/span&gt;&lt;/span&gt; Yes/no&amp;nbsp;&lt;br /&gt;
&lt;custom name="my_checkbox"&gt;&lt;/custom&gt;&lt;/span&gt;&lt;/span&gt; Other text/no&amp;nbsp;&lt;br /&gt;
</literal>
</paragraph>
</section>
</sections>
An online demo can be found HERE
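The regex above replaces any <img> that is directly followed by </span, regardless of its src. If you only want to target empty.jpg while allowing other attributes before or after src in any order, a pattern anchored on the src value is one option. This is a sketch, not the answer's method; it assumes double-quoted attributes and is meant to run on the decoded text, as in the loop above.

```php
<?php
// Match a whole <img ...> tag whose src is empty.jpg, whatever other
// attributes surround src and in whatever order they appear.
$pattern = '~<img\b[^>]*\bsrc="http://www\.asite\.org/empty\.jpg"[^>]*>~i';
$replacement = '<custom name="my_checkbox"></custom>';

$samples = [
    '<img alt="" src="http://www.asite.org/empty.jpg" />',
    '<img height="15" alt="" width="12" src="http://www.asite.org/empty.jpg" />',
    '<img src="http://www.asite.org/empty.jpg" alt="" height="15" />',
    '<img alt="" src="http://www.asite.org/photo.jpg" />', // different image: left alone
];

$results = [];
foreach ($samples as $sample) {
    $results[] = preg_replace($pattern, $replacement, $sample);
}
echo implode(PHP_EOL, $results), PHP_EOL;
```

The trailing [^>]*> consumes the optional " /" of a self-closing tag, so the whole tag is replaced.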

SimpleXMLElement Can't Find Node Attribute

One entry of the feed:
<entry>
<id>tag:blogger.com,1999:blog-8729980629780635785.post-7267854162055446813</id>
<published>2015-08-12T10:51:00.000-04:00</published>
<updated>2015-08-12T10:51:07.914-04:00</updated>
<category scheme="http://www.blogger.com/atom/ns#" term="Cancer Prevention" />
<category scheme="http://www.blogger.com/atom/ns#" term="Cervical Cancer" />
<category scheme="http://www.blogger.com/atom/ns#" term="curcumin" />
<category scheme="http://www.blogger.com/atom/ns#" term="HPV" />
<category scheme="http://www.blogger.com/atom/ns#" term="Mouth Cancer" />
<category scheme="http://www.blogger.com/atom/ns#" term="STD" />
<category scheme="http://www.blogger.com/atom/ns#" term="Throat Cancer" />
<category scheme="http://www.blogger.com/atom/ns#" term="Tonsil Cancer" />
<category scheme="http://www.blogger.com/atom/ns#" term="turmeric" />
<title type="text">Curcumin May Prevent HPV-Related Cancers</title>
<content type="html"><h2>Maylin Rodriguez-Paez RN</h2><a href="http://3.bp.blogspot.com/-dPF4cftq7l8/VctX0J56YyI/AAAAAAAAEXo/yHPgb0E_Ha8/s1600/trun.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="167" src="http://3.bp.blogspot.com/-dPF4cftq7l8/VctX0J56YyI/AAAAAAAAEXo/yHPgb0E_Ha8/s1600/trun.jpg" width="320" /></a>HPV is the most common sexually transmitted disease in the United States. It infects at least fourteen million annually and most are unware of their infection status.<sup>1</sup><br /><br />For the most part, HPV does not cause any harm to the infected individual. However, in a minority of cases, HPV can lead to the development of mouth, throat, tonsil, and <a href="http://blog.lifeextension.com/2014/04/mushroom-extract-treat-cervical-cancer.html" target="_blank">cervical cancers</a>.<br /><br />Interestingly, a new study shows curcumin may offer help in combating the HPV virus and preventing HPV-induced cancers. The results were published in the online journal, <i>Ecancermedicalscience</i>. <br /><br /><h3>Curcumin Silences Cancer-Causing Genes </h3>Previous studies show curcumin combats multiple types of <a href="http://www.lifeextension.com//Magazine/2011/3/How-Curcumin-Protects-Against-Cancer/Page-01" target="_blank">cancers</a>. Benefits have been seen for cancers of the breast, prostate, pancreas, and colon.<br /><br />For the current study, oral cancer cells infected with the HPV virus were cultivated along with curcumin extract. According to the results of the study, curcumin suppressed the activity of transcription factors (proteins that control the activity of genes) needed for the cancer cells to develop.<sup>2</sup><br /><br />One transcription factor in particular, NF-kB, is known for being involved in cancer development. 
Curcumin was also shown to silence cancer-promoting genes (oncogenes) and caused HPV infected cells to undergo apoptosis (cell- suicide).<sup>2</sup><br /><br />HPV-related cancers (especially those developing in the throat and mouth) are growing within the United States. Prior research shows curcumin helps to clear the HPV virus from cervical tissue.<sup>3</sup><br /><br />Despite the results of this current study, more research is needed to confirm curcumin’s benefits against HPV-related cancers. <br /><br /><h3>How to Get More Curcumin in Your System</h3>Curcumin is an antioxidant found in the herb turmeric. It gives turmeric its characteristic yellow color.&nbsp; <br /><br />While adding turmeric to food is an excellent idea, it’s actually not the best way to obtain curcumin in your diet. Unfortunately, curcumin is poorly absorbed into the bloodstream.<br /><br />Fortunately, supplementing with a curcumin extract offers a way to increase curcumin levels in the blood. Look for preparations that contain <a href="http://www.lifeextension.com//Magazine/2014/2/Bio-Enhanced-TURMERIC-Compounds-Block-Multiple-Inflammatory-Pathways/Page-01" target="_blank">phospholipids </a>for optimal absorption. <br /><br /><h2>References:</h2><ol><li>Available at: <a href="http://www.cdc.gov/std/hpv/stdfact-hpv.htm">http://www.cdc.gov/std/hpv/stdfact-hpv.htm</a>. Accessed May 4<sup>th</sup>, 2015.&nbsp;</li><li><i>Ecancermedicalscience</i>. 2015 Apr 23;9:525.&nbsp;</li><li><i>Asian Pac J Cancer Prev</i>. 2013;14(10):5753-9.</li></ol><img src="http://feeds.feedburner.com/~r/LifeExtensionBlog/~4/BhZ_UDkx0gc" height="1" width="1" alt=""/></content>
<link rel="replies" type="application/atom+xml" href="http://blog.lifeextension.com/feeds/7267854162055446813/comments/default" title="Post Comments" />
<link rel="replies" type="text/html" href="http://blog.lifeextension.com/2015/08/curcumin-may-prevent-hpv-related-cancers.html#comment-form" title="0 Comments" />
<link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/8729980629780635785/posts/default/7267854162055446813" />
<link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/8729980629780635785/posts/default/7267854162055446813" />
<link rel="alternate" type="text/html" href="http://blog.lifeextension.com/2015/08/curcumin-may-prevent-hpv-related-cancers.html" title="Curcumin May Prevent HPV-Related Cancers" />
<author>
<name>LifeExtension</name>
<uri>http://www.blogger.com/profile/00252359139805937161</uri>
<email>noreply@blogger.com</email>
<gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="http://img2.blogblog.com/img/b16-rounded.gif" />
</author>
<media:thumbnail
xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-dPF4cftq7l8/VctX0J56YyI/AAAAAAAAEXo/yHPgb0E_Ha8/s72-c/trun.jpg" height="72" width="72" />
<thr:total>0</thr:total>
</entry>
Code:
<?php
$url = 'http://feeds.feedburner.com/LifeExtensionBlog';
$XmlObject = new SimpleXmlElement( file_get_contents($url) );
foreach($XmlObject->entry as $entry) {
$content = (string) $entry->content[0];
$post_title = (string) $entry->title[0];
$tmp = $entry->xpath('media:thumbnail/@url');
var_dump($tmp);
$image_url = (string) $tmp;
echo 'THE IMAGE:' . $image_url . "<br><hr><br>";
echo $post_title . "<br>";
echo $content . "<br>";
}
Output:
Warning: SimpleXMLElement::xpath(): Undefined namespace prefix in C:\wamp\www\test-4.php on line 20
The title and the content print fine.
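The warning means the media prefix is unknown to XPath: a prefix must be registered with registerXPathNamespace() on the SimpleXMLElement you call xpath() on. Also, xpath() returns an array (or false), so casting the whole result to string does not give the URL. A minimal sketch of the loop with both points addressed, using a trimmed copy of the entry above wrapped in an Atom <feed> root:

```php
<?php
$atom = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title type="text">Curcumin May Prevent HPV-Related Cancers</title>
    <media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-dPF4cftq7l8/VctX0J56YyI/AAAAAAAAEXo/yHPgb0E_Ha8/s72-c/trun.jpg" height="72" width="72" />
  </entry>
</feed>
XML;

$XmlObject = new SimpleXMLElement($atom);
foreach ($XmlObject->entry as $entry) {
    $post_title = (string) $entry->title;
    // Register the prefix on this element before calling xpath() on it
    $entry->registerXPathNamespace('media', 'http://search.yahoo.com/mrss/');
    $tmp = $entry->xpath('media:thumbnail/@url');
    // xpath() returns an array of matches (or false): take the first one
    $image_url = $tmp ? (string) $tmp[0] : '';
    echo 'THE IMAGE: ' . $image_url . PHP_EOL;
    echo $post_title . PHP_EOL;
}
```

Without XPath, $entry->children('http://search.yahoo.com/mrss/')->thumbnail->attributes()->url reaches the same attribute.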

XML to MySQL when xml file has multiple matching fields

I've been doing some work on importing XML into MySQL using LOAD XML, and I have been successful with it in the past. The difference with the latest effort is that a field name occurs multiple times in the XML. A sample of this is below:
<row>
<pictures>
<picture name="Photo 1">
<filename>image1.jpg</filename>
</picture>
<picture name="Photo 2">
<filename>image2.jpg</filename>
</picture>
<picture name="Photo 4">
<filename>image3.jpg</filename>
</picture>
<picture name="Photo 3">
<filename>image4.jpg</filename>
</picture>
<picture name="Photo 7">
<filename>image5.jpg</filename>
</picture>
<picture name="Photo 6">
<filename>image6.jpg</filename>
</picture>
<picture name="Photo 5">
<filename>image7.jpg</filename>
</picture>
<picture name="Photo 8">
<filename>image8.jpg</filename>
</picture>
<picture name="Photo 9">
<filename>image9.jpg</filename>
</picture>
</pictures>
</row>
I need to import this into a MySQL table with the fields:
picture1
picture2
picture3
picture4
picture5
picture6
picture7
picture8
picture9
As you can see, the 'name' attributes don't necessarily occur in the correct order, so I need the pictures simply inserted in document order: the first <filename> goes to picture1, the second <filename> to picture2, and so on.
Currently I always end up with only the last <picture> entry in the table. I assume this is because the field is overwritten each time.
Any ideas how to achieve this? I have found similar questions but no answers yet, and have been looking for a good while. The rest of the file loads fine because those fields have unique names and map easily to MySQL columns, but I am struggling with this one.
As the XML does not match the format you are aiming for, you need to transform it first. Traditionally this is done with XSLT, but you can also do it with XMLReader and XMLWriter in PHP, which has the benefit of not requiring the whole XML document(s) to be kept in memory.
The XMLReaderIterator package has support for such operations, an example is already given with the library.
Adapting that example code to your specific case, with an example input file named pictures.xml and the output sent to standard output for demonstration purposes, gives the following excerpt:
[... starts like examples/read-write.php]
/** @var $iterator XMLWritingIteration|XMLReaderNode[] */
$iterator = new XMLWritingIteration($writer, $reader);
$writer->startDocument();
$rename = ['row' => 'resultset', 'pictures' => 'row'];
$trimLevel = null;
$pictureCount = null;
foreach ($iterator as $node) {
$name = $node->name;
$isElement = $node->nodeType === XMLReader::ELEMENT;
$isEndElement = $node->nodeType === XMLReader::END_ELEMENT;
$isWhitespace = $node->nodeType === XMLReader::SIGNIFICANT_WHITESPACE;
if (($isElement || $isEndElement) && $name === 'filename') {
// drop <filename> opening and closing tags
} elseif ($isElement && $name === 'picture') {
$writer->startElement('field');
$writer->writeAttribute('name', sprintf('picture%d', ++$pictureCount));
$trimLevel = $node->depth;
} elseif ($trimLevel && $isWhitespace && $node->depth > $trimLevel) {
// drop (trim) SIGNIFICANT_WHITESPACE
} elseif ($isElement && isset($rename[$name])) {
$writer->startElement($rename[$name]);
if ($rename[$name] === 'row') {
$pictureCount = 0;
}
} else {
$iterator->write();
}
}
This is an XMLWritingIteration composed of an XMLReader and an XMLWriter object. The iteration lets you take over everything from the input document (via $iterator->write()) and make the needed changes only where they apply:
drop the <filename> and </filename> tags
create <field> elements whose name attributes number the pictures in document order (MySQL XML nomenclature)
drop significant whitespace inside <picture>, as the <filename> tags are dropped as well
rename the document element from <row> to <resultset> (MySQL XML nomenclature)
rename the <pictures> element to <row> (again MySQL XML nomenclature)
reset the picture-field counter for each (output) row
keep everything else as-is
Such a transformation results in the following example output with the XML presented in your question:
<?xml version="1.0"?>
<resultset>
<row>
<field name="picture1">image1.jpg</field>
<field name="picture2">image2.jpg</field>
<field name="picture3">image3.jpg</field>
<field name="picture4">image4.jpg</field>
<field name="picture5">image5.jpg</field>
<field name="picture6">image6.jpg</field>
<field name="picture7">image7.jpg</field>
<field name="picture8">image8.jpg</field>
<field name="picture9">image9.jpg</field>
</row>
</resultset>
For more information about the XML format used by MySQL, see the MySQL documentation for the --xml command-line option, which describes the standard XML output format that LOAD XML can read.
For this small example you could just as well use XSLT, since there would be no problem doing the whole transformation in memory. But if memory is a concern (which can happen when you deal with XML database dumps), XMLWritingIteration allows iteration-based XML transformation with an XML pull parser (XMLReader) and forward-only XML output via XMLWriter.
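For readers who don't want the XMLReaderIterator dependency, here is a sketch of the same streaming idea using only PHP's built-in XMLReader and XMLWriter. It is not the library's example; it only handles the <pictures>/<filename> structure from the question and ignores everything else, so other columns would need their own branches. The function name picturesToMysqlXml is hypothetical.

```php
<?php
// Stream <pictures>/<picture>/<filename> into MySQL's LOAD XML format:
// <resultset><row><field name="pictureN">...</field></row></resultset>
function picturesToMysqlXml(string $xml): string
{
    $reader = new XMLReader();
    $reader->XML($xml); // use $reader->open('pictures.xml') for a file

    $writer = new XMLWriter();
    $writer->openMemory();
    $writer->setIndent(true);
    $writer->startDocument('1.0');
    $writer->startElement('resultset');

    $pictureCount = 0;
    while ($reader->read()) {
        $isElement = $reader->nodeType === XMLReader::ELEMENT;
        $isEnd = $reader->nodeType === XMLReader::END_ELEMENT;

        if ($isElement && $reader->name === 'pictures') {
            $writer->startElement('row');  // one output row per <pictures>
            $pictureCount = 0;             // restart numbering for each row
            if ($reader->isEmptyElement) { // <pictures/> produces no end event
                $writer->endElement();
            }
        } elseif ($isEnd && $reader->name === 'pictures') {
            $writer->endElement();         // </row>
        } elseif ($isElement && $reader->name === 'filename') {
            // Number fields in document order, ignoring the name attribute
            $writer->startElement('field');
            $writer->writeAttribute('name', 'picture' . ++$pictureCount);
            $writer->text($reader->readString());
            $writer->endElement();
        }
    }
    $reader->close();

    $writer->endElement(); // </resultset>
    $writer->endDocument();
    return $writer->outputMemory();
}

$sample = <<<XML
<row>
  <pictures>
    <picture name="Photo 1"><filename>image1.jpg</filename></picture>
    <picture name="Photo 2"><filename>image2.jpg</filename></picture>
    <picture name="Photo 4"><filename>image3.jpg</filename></picture>
  </pictures>
</row>
XML;

echo picturesToMysqlXml($sample);
```

Only the current node is held in memory, so the same loop works on large files when the reader is opened with open() instead of XML().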
And here is an XSLT solution. For reference, XSLT is a declarative, special-purpose language for transforming, re-styling, and restructuring XML documents into various formats for end use. PHP ships with an XSLT processor (the XSL extension); be sure to uncomment extension=php_xsl.dll in php.ini.
XSLT (handles multi-digit image numbers)
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8"/>
<xsl:template name="picturesort" match="pictures" >
<row>
<pictures>
<xsl:for-each select="picture">
<xsl:variable name="numkey"
select="substring-after(substring-before(filename, '.'), 'e')"/>
<picture name="{../picture[substring-after(@name, ' ') = $numkey]/@name}">
<xsl:copy-of select="filename"/>
</picture>
</xsl:for-each>
</pictures>
</row>
</xsl:template>
</xsl:stylesheet>
XML OUTPUT
<?xml version="1.0" encoding="UTF-8"?>
<row>
<pictures>
<picture name="Photo 1">
<filename>image1.jpg</filename>
</picture>
<picture name="Photo 2">
<filename>image2.jpg</filename>
</picture>
<picture name="Photo 3">
<filename>image3.jpg</filename>
</picture>
<picture name="Photo 4">
<filename>image4.jpg</filename>
</picture>
<picture name="Photo 5">
<filename>image5.jpg</filename>
</picture>
<picture name="Photo 6">
<filename>image6.jpg</filename>
</picture>
<picture name="Photo 7">
<filename>image7.jpg</filename>
</picture>
<picture name="Photo 8">
<filename>image8.jpg</filename>
</picture>
<picture name="Photo 9">
<filename>image9.jpg</filename>
</picture>
</pictures>
</row>
PHP
<?php
// Load the XML source
$xml = new DOMDocument;
$xml->load('C:/Path/To/XMLfile.xml');
$xsl = new DOMDocument;
$xsl->load('C:/Path/To/XSLfile.xsl');
// Configure the transformer
$proc = new XSLTProcessor;
$proc->importStyleSheet($xsl);
// Transform XML source
$newXml = $proc->transformToXML($xml);
echo $newXml;
// Save output to file
file_put_contents("C:/Path/To/NewXMLfile.xml", $newXml);
?>
Possible way:
iterate over all <picture> in a <row>
build an associative array with key = name and value = filename
sort array by keys
feed the array to your DB
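The steps above can be sketched as follows. This assumes SimpleXML for reading and a parameterised INSERT for the database step; the table name pictures_table is hypothetical, and the query is only built, not executed. Note that sorting by name orders the files by their "Photo N" label; if you want plain document order, as the question asks, skip the ksort() step.

```php
<?php
$xml = <<<XML
<row>
  <pictures>
    <picture name="Photo 1"><filename>image1.jpg</filename></picture>
    <picture name="Photo 2"><filename>image2.jpg</filename></picture>
    <picture name="Photo 4"><filename>image3.jpg</filename></picture>
    <picture name="Photo 3"><filename>image4.jpg</filename></picture>
  </pictures>
</row>
XML;

$row = simplexml_load_string($xml);

// 1. iterate over all <picture>, building name => filename
$byName = [];
foreach ($row->pictures->picture as $picture) {
    $byName[(string) $picture['name']] = (string) $picture->filename;
}

// 2. sort by key; SORT_NATURAL keeps "Photo 10" after "Photo 9"
ksort($byName, SORT_NATURAL);

// 3. map the sorted filenames onto picture1..pictureN columns
$columns = [];
$i = 0;
foreach ($byName as $filename) {
    $columns['picture' . ++$i] = $filename;
}

// 4. build a parameterised INSERT ("pictures_table" is a hypothetical name)
$sql = sprintf(
    'INSERT INTO pictures_table (%s) VALUES (%s)',
    implode(', ', array_keys($columns)),
    implode(', ', array_fill(0, count($columns), '?'))
);
echo $sql, PHP_EOL;
print_r($columns);
// With a PDO connection $pdo: $pdo->prepare($sql)->execute(array_values($columns));
```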

Storing the content of an xml child node based on a parent node's attribute in php

I'm trying to display the URL of the biggest image returned in an XML result. So far the largest one returned is 400 pixels high, so I hard-coded 400. If possible, I would like to select just the largest image, in case I later get results that don't include a 400-high image.
I've tried
$x = file_get_contents($url);
$xml = simplexml_load_string($x);
$imageURL=$xml->categories->category->items->product->images->image[@height='400']->sourceURL;
Which gives me "syntax error, unexpected '=', expecting ']'".
And I also tried:
$imageURL= $xml->xpath("/categories/category/items/producct/images/image[@height='400']/sourceURL");
But got a bad link.
Here is the XML:
<images>
<image available="true" height="100" width="100">
<sourceURL>
Someurl.com
</sourceURL>
</image>
<image available="true" height="200" width="200">
<sourceURL>
Someurl.com
</sourceURL>
</image>
<image available="true" height="300" width="300">
<sourceURL>
Someurl.com
</sourceURL>
</image>
<image available="true" height="400" width="400">
<sourceURL>
Someurl.com
</sourceURL>
</image>
<image available="true" height="399" width="400">
<sourceURL>
Someurl.com
</sourceURL>
</image>
</images>
Any ideas?
->image[@height='400'] is direct PHP array-access syntax, not XPath. PHP interprets it as suppressing errors (@) on a constant named height and then trying to assign it the value '400', which is why you get a syntax error.
For your XPath version, remember that DOMXPath::query() returns a DOMNodeList, not an actual DOMElement (SimpleXML's xpath() likewise returns an array of matches). To get the URLs you need from the query results, you have to iterate over the node list:
$nodes = $xpath->query(...);
foreach ($nodes as $node) {
    $url = $node->nodeValue;
}
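To avoid hard-coding 400 at all, XPath 1.0 can select the image for which no sibling image is taller. A minimal sketch with SimpleXML, assuming the <images> structure from the question; the sourceURL values are hypothetical placeholders so the result can be told apart.

```php
<?php
$source = <<<XML
<images>
  <image available="true" height="100" width="100"><sourceURL>https://example.com/100.jpg</sourceURL></image>
  <image available="true" height="400" width="400"><sourceURL>https://example.com/400.jpg</sourceURL></image>
  <image available="true" height="399" width="400"><sourceURL>https://example.com/399.jpg</sourceURL></image>
</images>
XML;

$xml = simplexml_load_string($source);

// Keep an <image> only if none of its sibling images has a greater height.
// The comparison is numeric, so "399" < "400" works as expected.
$matches = $xml->xpath('//image[not(../image/@height > @height)]/sourceURL');

// Ties (two images of the same maximum height) would yield several matches
$largestUrl = $matches ? trim((string) $matches[0]) : null;
echo $largestUrl, PHP_EOL;
```

Because the predicate compares siblings (../image), a document with several <images> blocks returns one largest image per block.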
The code below might help:
// $xmlSQLProcedures is assumed to be a DOMDocument already loaded with the XML below,
// and $sSQLProcedureID the ID to look up (e.g. '001070001')
$xpath = new DOMXPath($xmlSQLProcedures);
$strProcedureName = $xpath->query("//SQLProcedure[@ID='$sSQLProcedureID']/Name")->item(0)->nodeValue;
$nodeParameters = $xpath->query("//SQLProcedure[@ID='$sSQLProcedureID']/Parameters/Parameter");
// Print the Name attribute of each <Parameter>
foreach ($nodeParameters as $parameter) {
    echo $parameter->getAttribute("Name") . '<br>';
}
<?xml version="1.0" encoding="UTF-8"?>
<SQLProcedures>
<!-- ********** FOR KEYWORD IN LOCAL LANGUAGE ************* -->
<SQLProcedure ID="001070001">
<Name>P_ManipulateKeywordsInLL</Name>
<Parameters>
<Parameter Name="LanguageId"/>
<Parameter Name="KeywordId"/>
<Parameter Name="KeywordInLL"/>
<Parameter Name="ActionFor"/>
<Parameter Name="KeywordInLLId"/>
<Parameter Name="Keyword"/>
<Parameter Name="KeywordList"/>
<Parameter Name="SessionId"/>
<Parameter Name="WarehouseId"/>
</Parameters>
</SQLProcedure>
</SQLProcedures>
