PHP DOMDocument generates namespace declarations on wrong elements


Here's the PHP code I'm using (from http://pastebin.com/7FBysx2X):
$doc = new DOMDocument('1.0', 'UTF-8');
$xns = 'http://www.w3.org/2000/xmlns/';
$mns = 'http://example.com/aBc/2/';
$ons = 'http://example.com/test/2005/something';
$ns = 'http://example.com/main/';
$firstChild = $doc->createElement('firstChild');
$firstChild->setAttributeNS($xns, 'xmlns:cns1', $mns);
$firstChild->setAttributeNS($xns, 'xmlns:i', $ons);
$elements = $doc->createElementNS($mns, 'cns1:elements');
for($i = 0; $i < 3; $i++) {
$e = $doc->createElementNS($mns, 'cns1:element');
for($k = 0; $k < 2; $k++) {
$r = rand(100, 999);
$value = round(($r*rand(1,9))/rand(1,9), 2);
$ce = $doc->createElementNS($mns, "cns1:elementValue$r", $value);
$e->appendChild($ce);
}
$elements->appendChild($e);
}
$firstChild->appendChild($elements);
$otherTag = $doc->createElementNS($mns, 'cns1:otherTag', 'some_value');
$emptyTag = $doc->createElementNS($mns, 'cns1:emptyTag');
$emptyTag->setAttributeNS($ons, 'i:nil', 'true');
$firstChild->appendChild($otherTag);
$firstChild->appendChild($emptyTag);
$main = $doc->createElementNS($ns, 'main');
$main->appendChild($firstChild);
$doc->appendChild($main);
header('Content-Type: text/xml');
echo $doc->saveXML();
The above code generates XML like this:
<?xml version="1.0" encoding="UTF-8"?>
<main xmlns:cns1="http://example.com/aBc/2/" xmlns:i="http://example.com/test/2005/something" xmlns="http://example.com/main/">
<firstChild xmlns:cns1="http://example.com/aBc/2/" xmlns:i="http://example.com/test/2005/something">
<cns1:elements>
<cns1:element>
<cns1:elementValue303>101</cns1:elementValue303>
<cns1:elementValue608>304</cns1:elementValue608>
</cns1:element>
<cns1:element>
<cns1:elementValue735>147</cns1:elementValue735>
<cns1:elementValue901>4505</cns1:elementValue901>
</cns1:element>
</cns1:elements>
<cns1:otherTag>some_value</cns1:otherTag>
<cns1:emptyTag i:nil="true"/>
</firstChild>
</main>
The document is expected to look like this:
<?xml version="1.0" encoding="UTF-8"?>
<main xmlns="http://example.com/main/">
<firstChild xmlns:cns1="http://example.com/aBc/2/" xmlns:i="http://example.com/test/2005/something">
<cns1:elements>
<cns1:element>
<cns1:elementValue303>101</cns1:elementValue303>
<cns1:elementValue608>304</cns1:elementValue608>
</cns1:element>
<cns1:element>
<cns1:elementValue735>147</cns1:elementValue735>
<cns1:elementValue901>4505</cns1:elementValue901>
</cns1:element>
</cns1:elements>
<cns1:otherTag>some_value</cns1:otherTag>
<cns1:emptyTag i:nil="true"/>
</firstChild>
</main>
The problem is in the <main> tag. Why does it have the cns1 and i namespace declarations? They should appear only on the firstChild element. What do I need to change to get the structure I need?

This is caused by appending child nodes to a node that has not yet been added to the document.
Changing the code to this:
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true;
$xns = 'http://www.w3.org/2000/xmlns/';
$mns = 'http://example.com/aBc/2/';
$ons = 'http://example.com/test/2005/something';
$ns = 'http://example.com/main/';
$main = $doc->createElementNS($ns, 'main');
$doc->appendChild($main);
$firstChild = $doc->createElement('firstChild');
$firstChild->setAttributeNS($xns, 'xmlns:cns1', $mns);
$firstChild->setAttributeNS($xns, 'xmlns:i', $ons);
$doc->getElementsByTagName('main')->item(0)->appendChild($firstChild);
$elements = $doc->createElementNS($mns, 'cns1:elements');
$doc->getElementsByTagName('firstChild')->item(0)->appendChild($elements);
for($i = 0; $i < 3; $i++) {
$e = $doc->createElementNS($mns, 'cns1:element');
$doc->getElementsByTagName('elements')->item(0)->appendChild($e);
for($k = 0; $k < 2; $k++) {
$r = rand(100, 999);
$value = round(($r*rand(1,9))/rand(1,9), 2);
$ce = $doc->createElementNS($mns, "cns1:elementValue$r", $value);
$doc->getElementsByTagName('element')->item($i)->appendChild($ce);
}
}
$otherTag = $doc->createElementNS($mns, 'cns1:otherTag', 'some_value');
$emptyTag = $doc->createElementNS($mns, 'cns1:emptyTag');
$emptyTag->setAttributeNS($ons, 'i:nil', 'true');
$doc->getElementsByTagName('firstChild')->item(0)->appendChild($otherTag);
$doc->getElementsByTagName('firstChild')->item(0)->appendChild($emptyTag);
echo $doc->saveXML();
Produces XML that looks exactly like what you expect. There may be a 'prettier' or more 'proper' way to do this, but this one definitely works.
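The repeated getElementsByTagName() lookups are not essential to this fix. As a sketch, assuming the only thing that matters is the order of the appendChild() calls (each parent is attached before it receives children), the same sequence of operations can be written with the node variables directly. The fixed sample values below are an assumption that stands in for the rand() calls in the question, and the output should match the answer's in structure.

```php
<?php
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true;

$xns = 'http://www.w3.org/2000/xmlns/';
$mns = 'http://example.com/aBc/2/';
$ons = 'http://example.com/test/2005/something';
$ns  = 'http://example.com/main/';

// Attach the root first, then attach every node before it gets children.
$main = $doc->createElementNS($ns, 'main');
$doc->appendChild($main);

$firstChild = $doc->createElement('firstChild');
$firstChild->setAttributeNS($xns, 'xmlns:cns1', $mns);
$firstChild->setAttributeNS($xns, 'xmlns:i', $ons);
$main->appendChild($firstChild);

$elements = $doc->createElementNS($mns, 'cns1:elements');
$firstChild->appendChild($elements);

// Fixed sample values (hypothetical) instead of rand().
$samples = [[303 => 101, 608 => 304], [735 => 147, 901 => 4505]];
foreach ($samples as $values) {
    $e = $doc->createElementNS($mns, 'cns1:element');
    $elements->appendChild($e);
    foreach ($values as $suffix => $value) {
        $e->appendChild(
            $doc->createElementNS($mns, "cns1:elementValue$suffix", (string) $value)
        );
    }
}

$otherTag = $doc->createElementNS($mns, 'cns1:otherTag', 'some_value');
$emptyTag = $doc->createElementNS($mns, 'cns1:emptyTag');
$emptyTag->setAttributeNS($ons, 'i:nil', 'true');
$firstChild->appendChild($otherTag);
$firstChild->appendChild($emptyTag);

$xmlOut = $doc->saveXML();
echo $xmlOut;
```

Keeping the node in a variable avoids searching the whole document on every append, and it does not depend on how getElementsByTagName() treats prefixed names.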

Related

Got stuck parsing tabular content from a website

I've written a PHP script to get the tabular data from a webpage. When I run it, I get all the values in a single column. However, I want to parse them into one list per row, matching how they appear on that webpage.
Website link
To be clearer:
My current output looks like this:
978
EMU
EUR
1
118.2078
36
Australija
AUD
1
73.1439
My expected output looks like this:
['978', 'EMU', 'EUR', '1', '118.2078']
['36', 'Australija', 'AUD', '1', '73.1439']
['124', 'Kanada', 'CAD', '1', '77.7325']
['156', 'Kina', 'CNY', '1', '14.6565']
['191', 'Hrvatska', 'HRK', '1', '15.9097']
This is my attempt so far:
<?php
$url = "http://www.nbs.rs/kursnaListaModul/srednjiKurs.faces?lang=lat";
$dom = new DomDocument;
$dom->loadHtmlFile($url);
$xpath = new DomXPath($dom);
$rowData = array();
foreach ($xpath->query('//tbody[@id="index:srednjiKursList:tbody_element"]//tr') as $node) {
foreach ($xpath->query('td', $node) as $cell) {
$rowData[] = $cell->nodeValue;
}
}
foreach($rowData as $rows){
echo $rows . "<br/>";
}
?>

You are adding each cell to the output array one at a time; you probably want to build up one row at a time and append that instead:
$rowData = array();
foreach ($xpath->query('//tbody[@id="index:srednjiKursList:tbody_element"]//tr') as $node) {
$row = array();
foreach ($xpath->query('td', $node) as $cell) {
$row[] = $cell->nodeValue;
}
$rowData[] = $row;
}
foreach($rowData as $rows){
print_r($rows); // Format the data as needed
}
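To get the bracketed, quoted layout from the question, each row can be joined with implode(). This is a minimal sketch: the inline HTML is a made-up stand-in for the remote page so that it runs offline, and only the tbody id is taken from the question.

```php
<?php
// Made-up stand-in for the remote page (assumption), reusing the tbody id from the question.
$html = '<table><tbody id="index:srednjiKursList:tbody_element">'
    . '<tr><td>978</td><td>EMU</td><td>EUR</td><td>1</td><td>118.2078</td></tr>'
    . '<tr><td>36</td><td>Australija</td><td>AUD</td><td>1</td><td>73.1439</td></tr>'
    . '</tbody></table>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$lines = [];
foreach ($xpath->query('//tbody[@id="index:srednjiKursList:tbody_element"]//tr') as $tr) {
    $row = [];
    foreach ($xpath->query('td', $tr) as $cell) {
        $row[] = trim($cell->nodeValue);
    }
    // Quote each cell and wrap the whole row in brackets.
    $lines[] = "['" . implode("', '", $row) . "']";
}

echo implode("\n", $lines), "\n";
```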

Try this.
$htmlContent = file_get_contents("http://www.nbs.rs/kursnaListaModul/srednjiKurs.faces?lang=lat");
$DOM = new DOMDocument();
$DOM->loadHTML($htmlContent);
$Header = $DOM->getElementsByTagName('th');
$Detail = $DOM->getElementsByTagName('td');
//#Get header name of the table
foreach($Header as $NodeHeader)
{
$aDataTableHeaderHTML[] = trim($NodeHeader->textContent);
}
//#Get row data/detail table without header name as key
$i = 0;
$j = 0;
foreach($Detail as $sNodeDetail)
{
$aDataTableDetailHTML[$j][] = trim($sNodeDetail->textContent);
$i = $i + 1;
$j = $i % count($aDataTableHeaderHTML) == 0 ? $j + 1 : $j;
}
//print_r($aDataTableDetailHTML)
//#Get row data/detail table with header name as key and outer array index as row number
for($i = 0; $i < count($aDataTableDetailHTML); $i++)
{
for($j = 0; $j < count($aDataTableHeaderHTML); $j++)
{
@$aTempData[$i][$aDataTableHeaderHTML[$j]] = $aDataTableDetailHTML[$i][$j];
}
}
$aDataTableDetailHTML = $aTempData; unset($aTempData);
print_r($aDataTableDetailHTML);

Array is bigger than my XML tags? (PHP)

I want to add a child with a specific value, taken from my array, to each product. My problem is that the array is bigger than the number of XML products...
This is why I get these error messages:
PHP Notice: Undefined offset: 1589 in C:\Users\Jan\PhpstormProjects\censored\test.php on line 63
PHP Fatal error: Uncaught Error: Call to a member function appendChild() on null in C:\Users\Jan\PhpstormProjects\censored\test.php:64
To check this, I wrote two loops, and both tell me that I have 1588 names and 1588 products. That's logical!
$markerProduct = $root->getElementsByTagName("product");
for($i = $markerProduct->length - 1; $i >= 0; $i--){
$productCounter = $productCounter + 1;
}
$markerTitle = $root->getElementsByTagName("name");
for($i = 0; $i < $markerTitle->length; $i++){
$nameCounter = $nameCounter + 1;
}
But the array in which I store the value for each product holds 1589 values (0 - 1588), and I don't know why.
Can anyone help me find the error?
$searches = ["Steam", "Uplay", "Xbox", "Nintendo", "PS3", "PS4", "PSN"];
function isContaining($searches, $titleTag, $urlTag, $productTag, $path){
$dom = new DOMDocument('1.0', 'utf-8');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->load($path);
$root = $dom->documentElement;
$markerTitle = $root->getElementsByTagName($titleTag);
$markerURL = $root->getElementsByTagName($urlTag);
$plat = array();
for($i = $markerTitle->length - 1; $i >= 0 ; $i--){
$title = $markerTitle->item($i)->textContent;
$url = $markerURL->item($i)->textContent;
$co = count($searches);
$productFound = false;
for($j = 0; $j < $co; $j++){
if(stripos($title, $searches[$j]) !== false){
if($j > 3){
array_push($plat, "PlayStation");
}else{
array_push($plat, $searches[$j]);
}
$productFound = true;
}elseif(stripos($url, $searches[$j]) !== false){
if($j > 3){
array_push($plat, "PlayStation");
}else{
array_push($plat, $searches[$j]);
}
$productFound = true;
}
}
if($productFound == false){
array_push($plat, "Nothing");
}
}
print_r($plat);
$c = count($plat);
echo $c;
for($i = $c - 1; $i >= 0; $i--){
$node = $dom->createElement('plat', $plat[$c]);
$dom->getElementsByTagName($productTag)->item($i)->appendChild($node);
}
$dom->saveXML();
$dom->save('data/gamesplanet2.xml');
}
The error is on line 63:
$node = $dom->createElement('plat', $plat[$c]);
Greetings and Thank You!
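Two things in the posted code could explain the messages. The inner loop has no break, so a product whose title or URL matches more than one search term pushes more than one entry onto $plat, and $plat[$c] reads one index past the end of the array, because $c is count($plat). A minimal sketch of one way around both, assuming one label per product is wanted: detectPlatform() is a hypothetical helper, and the sample XML is made up.

```php
<?php
// Hypothetical helper: returns exactly one platform label per product.
function detectPlatform(string $title, string $url, array $searches): string
{
    foreach ($searches as $j => $term) {
        if (stripos($title, $term) !== false || stripos($url, $term) !== false) {
            // Indexes 4 to 6 (PS3, PS4, PSN) are grouped, as in the question.
            return $j > 3 ? 'PlayStation' : $term;
        }
    }
    return 'Nothing';
}

$searches = ['Steam', 'Uplay', 'Xbox', 'Nintendo', 'PS3', 'PS4', 'PSN'];

// Made-up sample data; the first product matches two search terms.
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadXML(
    '<products>'
    . '<product><name>Game A Steam Uplay</name><url>http://example.com/a</url></product>'
    . '<product><name>Game B</name><url>http://example.com/ps4/b</url></product>'
    . '<product><name>Game C</name><url>http://example.com/c</url></product>'
    . '</products>'
);

$labels = [];
foreach ($dom->getElementsByTagName('product') as $product) {
    $title = $product->getElementsByTagName('name')->item(0)->textContent;
    $url   = $product->getElementsByTagName('url')->item(0)->textContent;
    $label = detectPlatform($title, $url, $searches);
    // Append straight to the product being inspected, so no index can drift.
    $product->appendChild($dom->createElement('plat', $label));
    $labels[] = $label;
}

echo $dom->saveXML();
```

Appending inside the same loop that reads the product removes the need for a second, index-based pass over $plat.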

how to write the simple short code for parsing

I'm working on a PHP script that gets a list of times. I'm using DOMDocument to read them, and I want to find a way to shorten the code, because there are 69 time tags in my get-listing.php script.
If I use this:
$time1 = $xpath->query("*/span[@id='time1']");
$time2 = $xpath->query("*/span[@id='time2']");
$time3 = $xpath->query("*/span[@id='time3']");
$time4 = $xpath->query("*/span[@id='time4']");
$time5 = $xpath->query("*/span[@id='time5']");
$time6 = $xpath->query("*/span[@id='time6']");
$time7 = $xpath->query("*/span[@id='time7']");
...and so on, up to time69
$time69 = $xpath->query("*/span[@id='time69']");
Writing out a separate query for every tag from time1 to time69 would make the code far too long.
<?php
ini_set('max_execution_time', 300);
$errmsg_arr = array();
$errflag = false;
function getState($string)
{
$ex = explode(" ",$string);
return $ex[1];
}
$xml .= '<?xml version="1.0" encoding="UTF-8" ?>';
$xml .= '
<tv generator-info-name="www.mysite.com/xmltv">';
$baseUrl = file_get_contents('http://www.mysite.com/get-listing.php');
$domdoc = new DOMDocument();
$domdoc->strictErrorChecking = false;
$domdoc->recover=true;
//@$domdoc->loadHTMLFile($baseUrl);
@$domdoc->loadHTML($baseUrl);
$links = $domdoc->getElementsByTagName('a');
$i = 0;
$count = 0;
$data = array();
foreach($links as $link)
{
//echo $domdoc->saveXML($link);
if($link->getAttribute('href'))
{
if(!$link->hasAttribute('id') || $link->getAttribute('id')!='streams')
{
$url = str_replace("rtmp://", "", $link->getAttribute('href'));
$url = str_replace(" ", "%20", $link->getAttribute('href'));
//echo $url;
//echo "<br>";
$sdoc = new DOMDocument();
$sdoc->strictErrorChecking = false;
$sdoc->recover=true;
@$sdoc->loadHTMLFile($url);
$time1_span = $sdoc->getElementById('time1');
//$spans = $sdoc->getElementsByTagName('time1');
$query = parse_url($url)['query'];
$channel_split = explode("&", $query)[0];
$channel = urldecode(explode("=",$channel_split)[1]);
$id_split = explode("&", $query)[1];
$my_id = urldecode(explode("=",$id_split)[1]);
$xpath = new DOMXpath($sdoc);
$time1 = $xpath->query("*/span[@id='time1']");
$time2 = $xpath->query("*/span[@id='time2']");
$time3 = $xpath->query("*/span[@id='time3']");
//$time4 = $xpath->query("*/span[@id='time4']");
$array = array(
$time1,$time2,$time3
);
// Save the output format
$DATE_FORMAT_STRING = "YmdHis";
// GET the current STAGE
$current_state = getState($array[0]->nodeValue);
$offset = 0;
$flag = 0;
foreach($array as $time)
{
echo $time->item(0)->nodeValue;
// Get the item state.
//$this_state = getState($time);
$this_state = getState($time->item(0)->nodeValue);
//echo $time->nodeValue;
// check if we past a day?
if($current_state == "PM" && $this_state == "AM")
{
$offset++;
}
$this_unix = strtotime($time->item(0)->nodeValue) + (60 * 60 * 24 * $offset);
$values[] = date($DATE_FORMAT_STRING, $this_unix);
//echo date($DATE_FORMAT_STRING, $this_unix);
echo $values[$count];
echo "<br></br>";
$starttime = $times->nodeValue;
//echo $starttime;
echo "<programme channel='".$my_id." ".$channel." start='".$starttime."' stop='".$stoptime."'>";
/*if($flag>0)
{
//echo "<programme channel='".$my_id." ".$channel." start='".$starttime."' stop='".$stoptime."'>";
$stoptime = $starttime;
$flag=1;
}
else
{
$stoptime = $starttime;
}*/
$current_state = $this_state;
$count++;
}
}
}
}
?>
My question is: how can I write this more simply, so that the elements with ids time1 to time69 are fetched in just a few lines of code?
Edit: I'm getting a fatal error when I try to print the list of strings.
The error points at this line:
$time{$i} = $xpath->query("*/span[@id='time".$i."']");
Or it could be this one:
echo $time->item(0)->nodeValue;
Fatal error: Cannot use object of type DOMNodeList as array in /home/myusername/public_html/work_on_this.php on line 57
Here is the updated code:
for ($i = 1; $i < 70; $i++)
{
$time{$i} = $xpath->query("*/span[@id='time".$i."']");
$array = array(
$time{$i}
);
foreach($array as $time)
{
echo $time->item(0)->nodeValue;
}
}

Update the code to the following:
$time_arr = array();
for ($i = 1; $i < 70; $i++){
$time_arr[] = $xpath->query("*/span[@id='time".$i."']");
}
foreach($time_arr as $time){
echo $time->item(0)->nodeValue;
}
This adds the query results to an array directly in the for loop instead of assigning each one to a separate variable.
The foreach should also run after the for loop that collects the time values, not inside it.

Use a for loop:
$time = array();
for ($i = 1; $i < 70; $i++) {
$time[$i] = $xpath->query("*/span[@id='time" . $i . "']");
}
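Another option is to let XPath do the matching, so that no counter is needed at all. A minimal sketch, assuming every wanted span has an id beginning with "time" and no unrelated element shares that prefix; the HTML is made up, and the query uses // rather than the */span path from the question.

```php
<?php
// Made-up markup standing in for one listing page (assumption).
$html = '<html><body>'
    . '<span id="time1">9:00 PM</span>'
    . '<span id="time2">11:30 PM</span>'
    . '<span id="time3">1:00 AM</span>'
    . '<span id="title1">Not a time</span>'
    . '</body></html>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// One query returns every span whose id starts with "time", in document order.
$times = [];
foreach ($xpath->query("//span[starts-with(@id, 'time')]") as $span) {
    $times[$span->getAttribute('id')] = trim($span->nodeValue);
}

print_r($times);
```

Each entry here is already a string keyed by its id, so there is no DOMNodeList left to index into by mistake.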

how to read only part of an xml file with php xmlreader

I have an RSS XML file that is fairly large, with more than 700 nodes.
I am using the XMLReader Iterator library to parse it and display the results 10 per page.
This is my sample code for parsing the XML:
<?php
require('xmlreader-iterators.php');
$xmlFile = 'http://www.example.com/rss.xml';
$reader = new XMLReader();
$reader->open($xmlFile);
$itemIterator = new XMLElementIterator($reader, 'item');
$items = array();
foreach ($itemIterator as $item) {
$xml = $item->asSimpleXML();
$items[] = array(
'title' => (string)$xml->title,
'link' => (string)$xml->link
);
}
// Logic for displaying the array values, based on the current page.
// page = 1 means $items[0] to $items[9]
for($i = 0; $i <= 9; $i++)
{
echo ''.$items[$i]['title'].'<br>';
}
?>
But the problem is that, for every page, I am parsing the entire XML file and then displaying only the corresponding page of results: if the page is 1, nodes 1 to 10; if the page is 5, nodes 41 to 50.
This delays the display of the data. Is it possible to read just the nodes corresponding to the requested page? For the first page, I would read only the nodes at positions 1 to 10, instead of parsing the whole XML file and then displaying the first 10 nodes. In other words, can I apply a limit while parsing an XML file?
I came across an answer by Gordon that addresses a similar question, but it uses SimpleXML, which is not recommended for parsing large XML files.

Use array_slice to extract the portion of the array you need:
require ('xmlreader-iterators.php');
$xmlFile = 'http://www.example.com/rss.xml';
$reader = new XMLReader();
$reader->open($xmlFile);
$itemIterator = new XMLElementIterator($reader, 'item');
$items = array();
$curr_page = isset($_GET['page']) ? max(1, (int) $_GET['page']) : 1;
$pages = 0;
$max = 10;
foreach ($itemIterator as $item) {
$xml = $item->asSimpleXML();
$items[] = array(
'title' => (string) $xml->title,
'link' => (string) $xml->link
);
}
// Take the length of the array
$len = count($items);
// Get the number of pages
$pages = ceil($len / $max);
// Calculate the starting point
$start = ceil(($curr_page - 1) * $max);
// return the portion of results
$arrayItem = array_slice($items, $start, $max);
foreach ($arrayItem as $entry) {
echo '' . $entry['title'] . '<br>';
}
// pagination
for ($i = 1; $i <= $pages; $i ++) {
if ($i === (int) $curr_page) {
// current page
$str[] = sprintf('<span style="color:red">%d</span>', $i);
} else {
$str[] = sprintf('%d', $i, $i);
}
}
echo implode('', $str);
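Slicing the array still means reading the whole feed first. As a rough sketch of applying a limit during parsing, plain XMLReader can skip ahead to the wanted items and stop after the last one on the page. readPage() is a hypothetical helper that takes the XML as a string; the reader cannot seek, so later pages still read everything before them, and the total page count is not known without a full pass.

```php
<?php
// Hypothetical helper: collects only the items for one page, then stops reading.
function readPage(string $xml, int $page, int $perPage = 10): array
{
    $first = ($page - 1) * $perPage; // zero-based index of the first wanted item
    $last  = $first + $perPage;      // index one past the last wanted item

    $reader = new XMLReader();
    $reader->XML($xml);

    // Advance to the first <item> element.
    while ($reader->read() && $reader->name !== 'item') {
    }

    $items = [];
    $index = 0;
    while ($reader->name === 'item' && $index < $last) {
        if ($index >= $first) {
            // Only items on the requested page are turned into SimpleXML.
            $node = simplexml_load_string($reader->readOuterXml());
            $items[] = [
                'title' => (string) $node->title,
                'link'  => (string) $node->link,
            ];
        }
        $index++;
        $reader->next('item'); // jump to the next <item>, skipping its subtree
    }
    $reader->close();

    return $items;
}

// Made-up feed with 25 items.
$xml = '<rss><channel>';
for ($n = 1; $n <= 25; $n++) {
    $xml .= "<item><title>Item $n</title><link>http://example.com/$n</link></item>";
}
$xml .= '</channel></rss>';

foreach (readPage($xml, 2) as $entry) {
    echo $entry['title'], "\n";
}
```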

Use a cache in this case, since you cannot parse only part of an XML file.

Check this
<?php
if($_GET['page']!=""){
$startPagenew = $_GET['page'];
$startPage = $startPagenew-1;
}
else{
$startPage = 0;
}
$perPage = 10;
$currentRecord = 0;
$xml = new SimpleXMLElement('http://sports.yahoo.com/mlb/teams/bos/rss.xml', 0, true);
echo $startPage * $perPage;
foreach($xml->channel->item as $key => $value)
{
$currentRecord += 1;
if($currentRecord > ($startPage * $perPage) && $currentRecord <= ($startPage * $perPage + $perPage)){
echo "$value->title";
echo "<br>";
}
}
//and the pagination:
//echo $currentRecord;
for ($i = 1; $i <= ceil($currentRecord / $perPage); $i++) {
echo("<a href='xmlpagination.php?page=".$i."'>".$i."</a>");
} ?>
Updated
Check this Link
http://www.phpclasses.org/package/5667-PHP-Parse-XML-documents-and-return-arrays-of-elements.html

You can use DOM and XPath. It should be much faster, since XPath allows you to select nodes by their position in a list.
<?php
$string = file_get_contents("http://oar.icrisat.org/cgi/exportview/subjects/s1=2E2/RSS2/s1=2E2.xml");
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadXML($string);
$string = "";
$xpath = new DOMXPath($dom);
$channel = $dom->getElementsByTagName('channel')->item(0);
$numItems = $xpath->evaluate("count(item)", $channel);
// get your paging logic
$start = 10;
$end = 20;
$items = $xpath->evaluate("item[position() >= $start and not(position() > $end)]", $channel);
$count = $start;
foreach($items as $item) {
print_r("\r\n_____Node number $count ");
print_r( $item->nodeName);
$childNodes = $item->childNodes;
foreach($childNodes as $childNode) {
print_r($childNode->nodeValue);
}
$count ++;
}
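The paging logic left open above can be derived from a page number. A minimal sketch, assuming 1-based page numbers and a fixed page size: pageItems() is a hypothetical helper, and the feed is made up.

```php
<?php
// Hypothetical helper: selects the <item> children of $channel for one page.
function pageItems(DOMXPath $xpath, DOMNode $channel, int $page, int $perPage = 10): DOMNodeList
{
    $offset = ($page - 1) * $perPage;
    $upper  = $offset + $perPage;
    // position() is 1-based, so page 1 covers positions 1 to $perPage.
    return $xpath->query("item[position() > $offset and position() <= $upper]", $channel);
}

// Made-up feed with 25 items.
$xml = '<rss><channel>';
for ($n = 1; $n <= 25; $n++) {
    $xml .= "<item><title>Item $n</title></item>";
}
$xml .= '</channel></rss>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
$channel = $dom->getElementsByTagName('channel')->item(0);

$titles = [];
foreach (pageItems($xpath, $channel, 2) as $item) {
    $titles[] = $item->getElementsByTagName('title')->item(0)->nodeValue;
}

print_r($titles);
```

Using a strict lower bound and an inclusive upper bound keeps every page at exactly $perPage items, with no overlap between neighbouring pages.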

Is it possible to query the first 5 images with DOMDocument?

Is it possible to query the first 5 images with DOMDocument?
$dom = new DOMDocument;
$list = $dom->query('img');

With XPath you can fetch all images like this:
$xpath = new DOMXPath($dom);
$list = $xpath->query('//img');
Then you limit the results by only iterating over the first five.
for ($i = 0, $n = min(5, $list->length); $i < $n; ++$i) {
$node = $list->item($i);
}
XPath is very versatile thanks to its expression language. However, in this particular case, you may not need all that power and a simple $list = $dom->getElementsByTagName('img') would yield the same result set.
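If the limit should be part of the query itself, XPath can apply it with a position() predicate. A minimal sketch, with made-up HTML containing eight images:

```php
<?php
// Made-up page with eight images, each inside its own paragraph.
$html = '<html><body>';
for ($n = 1; $n <= 8; $n++) {
    $html .= "<p><img src=\"pic$n.png\" alt=\"\"></p>";
}
$html .= '</body></html>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// The parentheses make position() count across the whole document
// instead of restarting inside each parent element.
$firstFive = $xpath->query('(//img)[position() <= 5]');

$sources = [];
foreach ($firstFive as $img) {
    $sources[] = $img->getAttribute('src');
}

print_r($sources);
```

Without the parentheses, //img[position() <= 5] would select up to five images per parent element, which here would return all eight.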

You can use getElementsByTagName to build an array of images:
$dom = new DOMDocument();
$dom->loadHTML($string);
$images = $dom->getElementsByTagName('img');
$result = array();
for ($i=0; $i<5; $i++){
$node = $images->item($i);
if (is_object( $node)){
$result[] = $node->ownerDocument->saveXML($node);
}
}
