I'm trying to create a database of new releases from the boomkat.com RSS feed. The feed is located here:
link
Now I'm having trouble selecting the content inside the paragraph tags.
One paragraph in RSS feed looks like this:
<p>GOAT<br/>World Music<br/>ROCKET RECORDINGS<br/>INDIE / ROCK / ALTERNATIVE<br/>MP3 Release</p>
What I did so far is this:
<?php
$dom = new DOMDocument;
$dom->validateOnParse = true;
$dom->load("http://feeds.boomkat.com/boomkat_downloads_just_arrived");
$content = $dom->getElementsByTagName('content');
foreach ($content as $result) {
echo $result->nodeValue, PHP_EOL;
}
?>
But that gives me the whole feed. Passing 'p' to getElementsByTagName doesn't work.
I would suggest using the DOMDocument::loadHTMLFile() method instead of DOMDocument::load() (load() is strictly for reading XML, not HTML).
The reason you're getting the whole document is that you are querying the entire document for an element called "content". There is no such HTML element. Instead you should be using
$dom->getElementsByTagName('p');
This will grab all the <p> tags in the HTML document, and then you can loop over them. The primary reason why querying for "p" doesn't work is that the document has to be loaded as HTML rather than as XML, which is what load() assumes.
OK, well I don't understand why you're having problems, but I just tried what I suggested with the URL you provided, and got a proper print out of all the text of each <p> tag.
Here's the code:
$doc = new DOMDocument();
$doc->loadHTMLFile("http://boomkat.com/downloads/601228-goat-world-music");
$content = $doc->getElementsByTagName("p");
foreach($content as $element) {
Util::debug($element->textContent); // helper method similar to PHP's var_dump()
}
Here are the results I was able to print to the screen:
string(91) "Residual Echoes have come up with a really rather lovely disc of psychedelic folk goodness."
string(8) "MAMMATUS"
string(8) "Mammatus"
string(17) "ROCKET RECORDINGS"
string(45) "MP3 Download // £2.95FLAC Download // £3.95"
string(0) ""
string(19) "SERPENTINA SATELITE"
string(16) "Mecanica Celeste"
string(17) "ROCKET RECORDINGS"
string(45) "MP3 Download // £3.95FLAC Download // £4.95"
string(0) ""
string(12) "SUNCOIL SECT"
string(25) "One Note Obscures Another"
string(17) "ROCKET RECORDINGS"
string(45) "MP3 Download // £6.99FLAC Download // £7.99"
string(0) ""
string(16) "TEETH OF THE SEA"
string(10) "Hypnoticon"
string(17) "ROCKET RECORDINGS"
string(45) "MP3 Download // £2.50FLAC Download // £3.50"
string(52) "Proggy kosmiche rock from London's Teeth Of The Sea."
string(16) "TEETH OF THE SEA"
string(21) "Orphaned By the Ocean"
string(17) "ROCKET RECORDINGS"
string(45) "MP3 Download // £5.99FLAC Download // £6.99"
Was this something you were doing in the code?
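If the feed is XML whose content elements carry the paragraph as escaped HTML (an assumption, since the feed itself is not shown here), another option is to keep load() for the feed and parse each content value as HTML in a second step. The sketch below uses a hypothetical inline stand-in for one feed entry, built from the paragraph quoted in the question, and splits the <p> on its <br/> tags:

```php
<?php
// Hypothetical stand-in for one feed entry; the real feed's structure is assumed.
$feed = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <content type="html">&lt;p&gt;GOAT&lt;br/&gt;World Music&lt;br/&gt;ROCKET RECORDINGS&lt;br/&gt;INDIE / ROCK / ALTERNATIVE&lt;br/&gt;MP3 Release&lt;/p&gt;</content>
  </entry>
</feed>
XML;

$dom = new DOMDocument();
$dom->loadXML($feed);

$releases = [];
foreach ($dom->getElementsByTagName('content') as $content) {
    // nodeValue holds the unescaped HTML string, so parse it as HTML.
    $fragment = new DOMDocument();
    $fragment->loadHTML($content->nodeValue);

    foreach ($fragment->getElementsByTagName('p') as $p) {
        // Collect the text nodes that sit between the <br/> elements.
        $fields = [];
        foreach ($p->childNodes as $child) {
            if ($child->nodeType === XML_TEXT_NODE) {
                $fields[] = trim($child->nodeValue);
            }
        }
        $releases[] = $fields;
    }
}

print_r($releases);
```

Each entry of $releases is then one list of fields per paragraph, which maps naturally onto a database row.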
Does anybody know why SimpleXMLElement is removing the attributes in my XML??
I have XML data that looks like this (note the translation "language" attribute):
<events>
<event id="d8f17143-0c67-48aa-a7f1-003a5ddbd28f">
<details>
<names>
<translation language="en">English title</translation>
<translation language="de">German title</translation>
</names>
</details>
</event>
</events>
I run it through SimpleXmlElement like so:
$xmlConvertedData = new \SimpleXMLElement($xml);
I dump out the data and it looks like so:
object(SimpleXMLElement)#958 (2) {
["@attributes"]=>
array(1) {
["Index"]=>
string(1) "1"
}
["Events"]=>
object(SimpleXMLElement)#956 (1) {
["Event"]=>
array(1) {
[0]=>
object(SimpleXMLElement)#959 (1) {
["Details"]=>
object(SimpleXMLElement)#826 (13) {
["Names"]=>
object(SimpleXMLElement)#834 (1) {
["Translation"]=>
array(2) {
[0]=>
string(32) "English title"
[1]=>
string(33) "German title"
}
}
}
}
}
}
}
...notice that "translation" no longer has a "language" attribute, just the numeric indexes 0 and 1. I need the attribute value because the XML does not always list the same language first.
(I edited and shortened the sample code to one record, so please ignore the #958 part.)
Do not rely on print_r() or var_dump() to inspect a SimpleXML object: their output is abbreviated and does not show everything in the document (here, the language attributes). If you want to check what was loaded, use asXML()...
echo $xmlConvertedData->asXML();
or, to output the language of a single element...
echo $xmlConvertedData->event[0]->details->names->translation['language'];
(You also need to correct the last element of the sample: </events>)
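Because the order of the languages can vary, it may be easier to index the titles by their language attribute than to rely on position. A minimal sketch using the sample XML from the question (the $titles variable name is only illustrative):

```php
<?php
$xml = <<<XML
<events>
  <event id="d8f17143-0c67-48aa-a7f1-003a5ddbd28f">
    <details>
      <names>
        <translation language="en">English title</translation>
        <translation language="de">German title</translation>
      </names>
    </details>
  </event>
</events>
XML;

$events = new SimpleXMLElement($xml);

// Build a language => title map; attributes are read with array syntax.
$titles = [];
foreach ($events->event[0]->details->names->translation as $translation) {
    $titles[(string) $translation['language']] = (string) $translation;
}

print_r($titles);
```

The (string) casts matter: without them the array would hold SimpleXMLElement objects rather than plain strings.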
I have an XML file similar to the text below:
<?xml version="1.0" standalone="yes"?>
<Calendar xmlns="urn:AvmInterchangeSchema-Calendario-1.0">
<Date>
<Day>12/04/2017</Day>
<TypesDay>
<Type>Test 1</Type>
<Type>Test 2</Type>
<Type>Test 3</Type>
</TypesDay>
</Date>
</Calendar>
And I am using this Xpath to select the nodes:
$xml = simplexml_load_file("file.xml");
$response = $xml->xpath('//*[text()="'.date("d/m/Y").'"]');
How can I take the "TypesDay" entries if that condition is met?
I hope this isn't a duplicate ... I've been going crazy over this for hours, and it will surely turn out to be something trivial.
There are a few approaches to do this. First of all, you should register the namespace:
$xml->registerXPathNamespace('x', 'urn:AvmInterchangeSchema-Calendario-1.0');
I'm assuming the name of the node holding the value 12/04/2017 could change, so the expressions below match it by value rather than by name.
First
Find the TypesDay node (in the namespace registered as x) whose parent has a child node with the value 12/04/2017:
$response = $xml->xpath('//*[*[text()="'.date("d/m/Y").'"]]/x:TypesDay');
Second
Find the TypesDay node (in the namespace registered as x) that is a following sibling of the node with the value 12/04/2017:
$response = $xml->xpath('//*[text()="'.date("d/m/Y").'"]/following-sibling::x:TypesDay');
The result for both is:
array(1) {
[0]=>
object(SimpleXMLElement)#2 (1) {
["Type"]=>
array(3) {
[0]=>
string(6) "Test 1"
[1]=>
string(6) "Test 2"
[2]=>
string(6) "Test 3"
}
}
}
Finally, if you want only the entries, add one more step, /x:Type:
$response = $xml->xpath('//*[*[text()="'.date("d/m/Y").'"]]/x:TypesDay/x:Type');
Or:
$response = $xml->xpath('//*[text()="'.date("d/m/Y").'"]/following-sibling::x:TypesDay/x:Type');
Result:
array(3) {
[0]=>
object(SimpleXMLElement)#3 (1) {
[0]=>
string(6) "Test 1"
}
[1]=>
object(SimpleXMLElement)#4 (1) {
[0]=>
string(6) "Test 2"
}
[2]=>
object(SimpleXMLElement)#5 (1) {
[0]=>
string(6) "Test 3"
}
}
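If the element names are in fact stable, the same lookup can be written against them directly. The sketch below is one way to do that, assuming the Date, Day, TypesDay and Type names from the sample; it uses a fixed date string instead of date("d/m/Y") so that it matches the sample on any day, and casts the results to plain strings:

```php
<?php
$xmlString = <<<XML
<Calendar xmlns="urn:AvmInterchangeSchema-Calendario-1.0">
  <Date>
    <Day>12/04/2017</Day>
    <TypesDay>
      <Type>Test 1</Type>
      <Type>Test 2</Type>
      <Type>Test 3</Type>
    </TypesDay>
  </Date>
</Calendar>
XML;

$xml = simplexml_load_string($xmlString);
$xml->registerXPathNamespace('x', 'urn:AvmInterchangeSchema-Calendario-1.0');

// Fixed date for the sketch; swap in date("d/m/Y") for "today".
$day = '12/04/2017';
$nodes = $xml->xpath('//x:Date[x:Day="' . $day . '"]/x:TypesDay/x:Type');

// Each result is a SimpleXMLElement; strval() turns it into its text.
$types = array_map('strval', $nodes);

print_r($types);
```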
I need some help reading an XML file that has namespaces.
I can read a file without any namespaces, but not one with namespaces.
XML sample:
<?xml version="1.0" encoding="utf-8"?>
<OrderResponse xmlns:cac="urn:basic:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:xsi="http://www.w3.org/" xmlns:cbc="urn:basic:names:specification:ubl:schema:xsd:BasicComponents-2" xmlns="urn:basic:names:specification:ubl:schema:xsd:OrderResponse-2">
<cbc:UBLVersionID>2.1</cbc:UBLVersionID>
<cbc:AccountingCostCode>TESTER TEST</cbc:AccountingCostCode>
<cac:OrderReference>
<cbc:ID>100067010</cbc:ID>
<cbc:IssueDate>2016-06-15</cbc:IssueDate>
<cbc:OrderTypeCode>EDI</cbc:OrderTypeCode>
</cac:OrderReference>
</OrderResponse>
I need to get the value of the cbc:ID element.
I'm trying to do it with DOMDocument.
Here is my code:
function SearchXMLID($xml){
var_dump($xml);
$doc = new DOMDocument();
$doc->load($xml);
$id = $doc->getElementsByTagNameNS('urn:basic:names:specification:ubl:schema:xsd:CommonAggregateComponents-2','cbc:ID');
foreach($id as $i){
echo "<pre>";var_dump('NS',$i->nodeValue,PHP_EOL);"</pre>";
}
}
$files = glob('dataXMl/*xml');
echo "<pre>";var_dump($files,PHP_EOL);"</pre>";
foreach($files as $f){
SearchXMLID($f);
}
This code runs, but it returns all elements with the 'cbc:' prefix and stores them in a single string:
array(1) {
[0]=>
string(17) "dataXMl/test1.xml"
}
string(1) "
"
string(17) "dataXMl/test1.xml"
string(2) "NS"
string(40) "
100000050
2016-06-15
EDI
"
string(1) "
"
It gets all tags with the namespace prefix 'cbc', but I only want the tag 'cbc:ID'.
What am I doing wrong?
I'm no expert with PHP coding, but my gut tells me that both of your parameters for getElementsByTagNameNS are wrong.
Try this:
$id = $doc->getElementsByTagNameNS('urn:basic:names:specification:ubl:schema:xsd:BasicComponents-2','ID');
i.e. use the namespace URI that the cbc prefix is actually bound to, "urn:basic:names:specification:ubl:schema:xsd:BasicComponents-2",
and drop the cbc prefix, because the second parameter is the local name only.
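To see the suggestion in isolation, here is a small self-contained sketch. It loads a trimmed copy of the sample document from a string with loadXML() instead of from a file, and collects the matches in an array; the $cbc and $ids variable names are only illustrative:

```php
<?php
$xml = <<<XML
<OrderResponse xmlns:cac="urn:basic:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:basic:names:specification:ubl:schema:xsd:BasicComponents-2" xmlns="urn:basic:names:specification:ubl:schema:xsd:OrderResponse-2">
  <cbc:UBLVersionID>2.1</cbc:UBLVersionID>
  <cac:OrderReference>
    <cbc:ID>100067010</cbc:ID>
    <cbc:IssueDate>2016-06-15</cbc:IssueDate>
    <cbc:OrderTypeCode>EDI</cbc:OrderTypeCode>
  </cac:OrderReference>
</OrderResponse>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

// Namespace URI bound to the cbc prefix, plus the local name without prefix.
$cbc = 'urn:basic:names:specification:ubl:schema:xsd:BasicComponents-2';

$ids = [];
foreach ($doc->getElementsByTagNameNS($cbc, 'ID') as $node) {
    $ids[] = $node->nodeValue;
}

print_r($ids);
```

Only the cbc:ID element matches; the other cbc elements share the namespace but have different local names.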
I have seen several similar questions, but not found the exact answer I'm looking for. It may be that the functionality I am looking for does not exist.
If I do an XPath query that results in an array of objects, but each object only holds one value, a string, I'd like to quickly convert that into an array of strings. Obviously I can do a foreach over the array and push each string value onto a new array, but if there is a built-in function I'm not thinking of, please let me know.
example:
array(3) {
[0]=>
object(SimpleXMLElement)#24 (1) {
[0]=>
string(20) "Network Media Player"
}
[1]=>
object(SimpleXMLElement)#25 (1) {
[0]=>
string(12) "Music Player"
}
[2]=>
object(SimpleXMLElement)#26 (1) {
[0]=>
string(8) "Juke Box"
}
}
I'd like that to become
array('Network Media Player','Music Player','Juke Box')
Here's my test :
<pre><?php
$xml = "<data>
<item>
<value>Network Media Player</value>
</item>
<item>
<value>Music Player</value>
</item>
<item>
<value>Jukebox Player</value>
</item>
</data>";
$sx = simplexml_load_string($xml);
print_r($sx);
print_r(explode("|",implode("|",$sx->xpath("//data/item/value"))));
?></pre>
and here's the result : http://codepad.org/ZkaWpzMc
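The implode()/explode() round trip breaks if any value happens to contain the "|" delimiter. An alternative sketch that avoids a delimiter altogether is to cast each element with array_map() and strval(); the $strings variable name is only illustrative:

```php
<?php
$xml = "<data>
<item>
<value>Network Media Player</value>
</item>
<item>
<value>Music Player</value>
</item>
<item>
<value>Jukebox Player</value>
</item>
</data>";

$sx = simplexml_load_string($xml);

// strval() converts each SimpleXMLElement to its text content.
$strings = array_map('strval', $sx->xpath('//data/item/value'));

print_r($strings);
```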
I'm working with XPath in PHP, trying to quickly pull certain links out of an HTML page.
The following will find all links with an href attribute on mypage.html:
$nodes = $x->query("//a[@href]");
Whereas the following will find all links where the description matches my needle:
$nodes = $x->query("//a[contains(@href,'click me')]");
What I am trying to achieve is matching on the href itself, more specifically finding URLs that contain certain parameters. Is that possible within an XPath query, or should I just start manipulating the output from the first XPath query?
Not sure I understand the question correctly, but the second XPath expression already does what you are describing. It does not match against the text node of the A element, but the href attribute:
$html = <<< HTML
<ul>
<li>
<a href="http://example.com/page?foo=bar">Description</a>
</li>
<li>
Description
</li>
</ul>
HTML;
$xml = simplexml_load_string($html);
$list = $xml->xpath("//a[contains(@href,'foo')]");
Outputs:
array(1) {
[0]=>
object(SimpleXMLElement)#2 (2) {
["@attributes"]=>
array(1) {
["href"]=>
string(31) "http://example.com/page?foo=bar"
}
[0]=>
string(11) "Description"
}
}
As you can see, the result contains only the A element whose href contains foo (which I understand is what you are looking for). It contains the entire element, because the XPath translates to "fetch all A elements with an href attribute containing foo". You would then access the attribute with
echo $list[0]['href']; // gives "http://example.com/page?foo=bar"
If you only want to return the attribute itself, you'd have to do
//a[contains(@href,'foo')]/@href
Note that in SimpleXML, this would still return a SimpleXMLElement:
array(1) {
[0]=>
object(SimpleXMLElement)#3 (1) {
["@attributes"]=>
array(1) {
["href"]=>
string(31) "http://example.com/page?foo=bar"
}
}
}
but you can now output the URL with
echo $list[0]; // gives "http://example.com/page?foo=bar"
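Since the question uses $x->query(), here is a sketch of the same idea with DOMDocument and DOMXPath, matching on a query parameter inside the href and returning only the attribute values. The second link and its URL are hypothetical, added so that there is something for the filter to exclude:

```php
<?php
$html = <<<HTML
<ul>
  <li><a href="http://example.com/page?foo=bar">Description</a></li>
  <li><a href="http://example.com/page?baz=qux">Description</a></li>
</ul>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$x = new DOMXPath($doc);

// Select the href attribute nodes of links whose URL contains "foo=".
$urls = [];
foreach ($x->query("//a[contains(@href, 'foo=')]/@href") as $attr) {
    $urls[] = $attr->nodeValue;
}

print_r($urls);
```

Matching on 'foo=' rather than 'foo' narrows the test to the parameter name, although contains() is still a plain substring check and would also match a parameter such as 'xfoo='.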