How to pass special characters to xml file - php

I am editing a particular node of an XML file and then saving it, but the saved file contains escaped special characters because of line 7 of my code:
$xml = simplexml_load_file('demo.xml');
$i=2;
foreach($xml->Page as $myPage){
if($myPage['id']==$i) {
$da = "data";
$text = "helloworld";
$myPage->$da ="<![CDATA[{$text}]]>"; //line number
$xml->asXML('demo.xml');
}
}
How can I put the string into the XML file exactly as it is?

SimpleXML does not handle CDATA very well. If you want to write CDATA you need to use the DOM objects. For example:
$xml = new DOMDocument();
$xml->load('demo.xml');
$i = 2;
foreach ($xml->getElementsByTagName('Page') as $page) {
if ($page->attributes->getNamedItem('id')->value == $i) {
$da = 'data';
$text = 'helloworld';
$data = $xml->createElement($da);
$data->appendChild($xml->createCDATASection($text));
$page->appendChild($data);
}
}
$xml->save('demo.xml');
If you want to continue to use SimpleXML, you can load just the element you want to write the CDATA into as a DOM object.
$xml = simplexml_load_file('demo.xml');
$i = 2;
foreach ($xml->Page as $page) {
if ($page['id'] == $i) {
$da = 'data';
$text = 'helloworld';
$page->$da = '';
$node = dom_import_simplexml($page->$da);
$dom = $node->ownerDocument;
$node->appendChild($dom->createCDATASection($text));
}
}
$xml->asXML('demo.xml');
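To make the difference visible, here is a minimal sketch that runs both approaches against an in-memory document. The sample XML structure is an assumption, since the real demo.xml was not shown; only the two techniques come from the question and the answer above.

```php
<?php
// Sample document (assumed structure; the real demo.xml was not shown).
$source = '<Pages><Page id="1"><data/></Page><Page id="2"><data/></Page></Pages>';

// 1) Plain SimpleXML assignment: the markup characters are escaped as text.
$escaped = simplexml_load_string($source);
$escaped->Page[1]->data = '<![CDATA[helloworld]]>';
$escapedXml = $escaped->asXML();

// 2) Import the element into DOM and append a real CDATA node.
$cdata = simplexml_load_string($source);
$node = dom_import_simplexml($cdata->Page[1]->data);
$node->appendChild($node->ownerDocument->createCDATASection('helloworld'));
$cdataXml = $cdata->asXML();

echo $escapedXml, "\n", $cdataXml, "\n";
```

The first output contains the CDATA markers as escaped text, while the second contains an actual CDATA section inside the data element.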

Related

How to create looped XML file from HTML in PHP?

I would like to create an XML file from some of the content of an HTML page. I have tried hard but seem to be missing something.
I have created two arrays, set up a DOMDocument, and prepared to save an XML file on the server. I have tried many different foreach loops all over the place, but it won't work.
Here is my code:
<?php
$page = file_get_contents('http://www.halfmen.dk/!hmhb8/score.php');
$doc = new DOMDocument();
$doc->loadHTML($page);
$score = $doc->getElementsByTagName('div');
$keyarray = array();
$teamarray = array();
foreach ($score as $value) {
if ($value->getAttribute('class') == 'xml') {
$keyarray[] = $value->firstChild->nodeValue;
$teamarray[] = $value->firstChild->nextSibling->nodeValue;
}
}
print_r($keyarray);
print_r($teamarray);
$doc = new DOMDocument('1.0','utf-8');
$doc->formatOutput = true;
$droot = $doc->createElement('ROOT');
$droot = $doc->appendChild($droot);
$dsection = $doc->createElement('SECTION');
$dsection = $droot->appendChild($dsection);
$dkey = $doc->createElement('KEY');
$dkey = $dsection->appendChild($dkey);
$dteam = $doc->createElement('TEAM');
$dteam = $dsection->appendChild($dteam);
$dkeytext = $doc->createTextNode($keyarray);
$dkeytext = $dkey->appendChild($dkeytext);
$dteamtext = $doc->createTextNode($teamarray);
$dteamtext = $dteam->appendChild($dteamtext);
echo $doc->save('xml/test.xml');
?>
I really like simplicity, thank you.
You need to add each item in one at a time rather than as an array, which is why I build the XML for each div tag rather than as a second pass. I've had to assume that your XML is structured the way I've done it, but this may help you.
$page = file_get_contents('http://www.halfmen.dk/!hmhb8/score.php');
$doc = new DOMDocument();
$doc->loadHTML($page);
$score = $doc->getElementsByTagName('div');
$doc = new DOMDocument('1.0','utf-8');
$doc->formatOutput = true;
$droot = $doc->createElement('ROOT');
$droot = $doc->appendChild($droot);
foreach ($score as $value) {
if ($value->getAttribute('class') == 'xml') {
$dsection = $doc->createElement('SECTION');
$dsection = $droot->appendChild($dsection);
$dkey = $doc->createElement('KEY', $value->firstChild->nodeValue);
$dkey = $dsection->appendChild($dkey);
$dteam = $doc->createElement('TEAM', $value->firstChild->nextSibling->nodeValue);
$dteam = $dsection->appendChild($dteam);
}
}
$doc->save('xml/test.xml');
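One caveat with passing the text as the second argument of createElement(): that value is not escaped, so a name containing & triggers a warning. A minimal sketch of the same per-item loop using createTextNode() instead, with hypothetical sample rows standing in for the scraped div contents:

```php
<?php
// Hypothetical sample rows standing in for the scraped div contents.
$rows = array(
    array('key' => 'A1', 'team' => 'Tom & Jerry'),
    array('key' => 'B2', 'team' => 'Half Men'),
);

$doc = new DOMDocument('1.0', 'utf-8');
$doc->formatOutput = true;
$root = $doc->appendChild($doc->createElement('ROOT'));

foreach ($rows as $row) {
    // One SECTION per item, built inside the loop.
    $section = $root->appendChild($doc->createElement('SECTION'));
    // Text nodes are escaped (&, <, >) when the document is serialized.
    $key = $section->appendChild($doc->createElement('KEY'));
    $key->appendChild($doc->createTextNode($row['key']));
    $team = $section->appendChild($doc->createElement('TEAM'));
    $team->appendChild($doc->createTextNode($row['team']));
}

$xmlString = $doc->saveXML();
echo $xmlString;
```

To write the result to disk instead of printing it, $doc->save() with a target path can replace the saveXML() call.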

php DOMDocument createTextNode from blob field(text) not showing data

I am trying to create XML from a database table using DOMDocument. Every field type shows up in its XML node except the BLOB type.
Below is what I did:
$rs = ibase_query("SELECT * FROM mytable");
$coln = ibase_num_fields($rs);
$fieldnames = array();
for ($i = 0; $i < $coln; $i++) {
$col_info = ibase_field_info($rs, $i);
$fieldnames[] = array('name' => $col_info['name'], 'type' => $col_info['type']);
}
$doc = new DOMDocument('1.0');
$sth = ibase_query($dbh, $stmt);
$doc->formatOutput = true;
$root = $doc->createElement('FA_ARTIKEL');
$root = $doc->appendChild($root);
while ($row = ibase_fetch_object($sth, IBASE_TEXT)) {
$title = $doc->createElement('RECORD');
$title = $root->appendChild($title);
$text = $doc->createTextNode('');
$text = $title->appendChild($text);
foreach ($fieldnames as $value) {
switch ($value['type']) {
case 'VARCHAR':
$rtitle = $doc->createElement($value['name']);
$rtitle = $title->appendChild($rtitle);
$rtext = $doc->createTextNode($row->$value['name']);
$rtext = $rtitle->appendChild($rtext);
break;
case 'BLOB':
$rbtitle = $doc->createElement($value['name']);
$rbtitle = $title->appendChild($rbtitle);
$rbtext = $doc->createTextNode($row->$value['name']);
$rbtext = $rbtitle->appendChild($rbtext);
break;
default:
if ($row->$value['name']) {
$rtitle = $doc->createElement($value['name']);
$rtitle = $title->appendChild($rtitle);
$rtext = $doc->createTextNode($row->$value['name']);
$rtext = $rtitle->appendChild($rtext);
} else {
$rtitle = $doc->createElement($value['name']);
$rtitle = $title->appendChild($rtitle);
$rtext = $doc->createTextNode('0');
$rtext = $rtitle->appendChild($rtext);
}
break;
}
}
}
Header('Content-type: text/xml');
echo $doc->saveXML() . "\n";
ibase_free_result($sth);
ibase_close($dbh);
I also tried SimpleXMLElement, but that failed too. What am I missing?
My database is Firebird, and the BLOB fields are defined as
BLOB SUB_TYPE 1 SEGMENT SIZE 16384
PHP's DOMDocument expects UTF-8 strings. The blob may contain control characters or invalid Unicode sequences. Try putting the data that breaks the XML into a variable and reducing your problem to the absolute minimum.
$blobData = $record['blobField'];
$document = new DOMDocument();
$document
->appendChild($document->createElement('foo'))
->appendChild($document->createTextNode($blobData));
echo $document->saveXml();
This way you can see if the blob data is really the problem or merely a symptom.
If the BLOB contains binary data, you will need to convert it into a text format. Atom feeds, for example, Base64-encode binary content that they embed. In that case the reading program has to decode the value.
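As an illustration of converting binary data to text, here is a minimal sketch using Base64, which is one common choice and not necessarily the right one for your consumer. The payload is a made-up byte string standing in for a BLOB column value.

```php
<?php
// Hypothetical binary payload standing in for a BLOB column value.
$blobData = "\x00\x01\x02binary\xff\xfe";

$document = new DOMDocument('1.0', 'UTF-8');
$node = $document->appendChild($document->createElement('foo'));
// Base64 output only uses ASCII characters, so it is always valid XML text.
$node->appendChild($document->createTextNode(base64_encode($blobData)));
$xmlString = $document->saveXML();

// The reading program reverses the encoding.
$reader = new DOMDocument();
$reader->loadXML($xmlString);
$decoded = base64_decode($reader->documentElement->textContent);

var_dump($decoded === $blobData);
```

If the BLOB actually holds text in a non-UTF-8 character set, converting the character set would be the more fitting fix than Base64.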

Extracting certain portions of HTML from within PHP

Ok, so I'm writing an application in PHP to check my sites if all the links are valid, so I can update them if I have to.
And I ran into a problem. I've tried to use SimpleXml and DOMDocument objects to extract the tags but when I run the app with a sample site I usually get a ton of errors if I use the SimpleXml object type.
So is there a way to scan the html document for href attributes that's pretty much as simple as using SimpleXml?
<?php
// what I want to do is get a similar effect to the code described below:
foreach($html->html->body->a as $link)
{
// store the $link into a file
foreach($link->attributes() as $attribute=>$value)
{
//procedure to place the href value into a file
}
}
?>
So basically I'm looking for a way to perform the above operation. The thing is, I'm not sure how I should treat the string that contains the HTML code.
Just to be clear, I'm using the following primitive way of getting the HTML file:
<?php
$target = "http://www.targeturl.com";
$file_handle = fopen($target, "r");
$a = "";
while (!feof($file_handle)) $a .= fgets($file_handle, 4096);
fclose($file_handle);
?>
Any info would be useful as well as any other language alternatives where the above problem is more elegantly fixed (python, c or c++)
You can use DOMDocument::loadHTML
Here's a bunch of code we use for a HTML parsing tool we wrote.
$target = "http://www.targeturl.com";
$result = file_get_contents($target);
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
@$dom->loadHTML($result); // @ suppresses warnings about invalid markup
// getTags() returns one HTML string per <a> element; pull the href out of each
$links = array();
foreach (getTags($dom, 'a') as $tagHtml) {
$links[] = extractLink($tagHtml);
}
// Returns the first capture group $argument (1 = href value, 2 = link text)
function extractLink( $html, $argument = 1 ) {
$href_regex_pattern = '/<a[^>]*?href=[\'"](.*?)[\'"][^>]*?>(.*?)<\/a>/si';
preg_match_all($href_regex_pattern,$html,$matches);
if (count($matches)) {
if (is_array($matches[$argument]) && count($matches[$argument])) {
return $matches[$argument][0];
}
return $matches[1];
}
// The original else branch was cut off; returning false here is an assumption
return false;
}
// Returns the serialized HTML of every matching tag, or of one tag if $element is given
function getTags( $dom, $tagName, $element = false, $children = false ) {
$html = array();
$domxpath = new DOMXPath($dom);
$children = ($children) ? "/".$children : '';
$filtered = $domxpath->query("//$tagName" . $children);
$i = 0;
while( $myItem = $filtered->item($i++) ){
$newDom = new DOMDocument;
$newDom->formatOutput = true;
$node = $newDom->importNode( $myItem, true );
$newDom->appendChild($node);
$html[] = $newDom->saveHTML();
}
if ($element !== false && isset($html[$element])) {
return $html[$element];
}
return $html;
}
You could just use strpos($html, 'href=') and then parse the URL. You could also search for <a or .php
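If only the href values are needed, the attributes can also be read straight from the parsed document, with no regular expression or string search. A minimal sketch, assuming inline sample markup in place of the fetched page so it runs without a network call:

```php
<?php
// Inline sample markup (an assumption) so the sketch runs without a network call.
$html = '<html><body><p><a href="http://example.com/a">A</a></p>'
      . '<div><a href="/b.php">B</a><a name="no-href">C</a></div></body></html>';

libxml_use_internal_errors(true);   // collect markup warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$links = array();
// //a/@href selects the href attribute of every anchor that has one.
foreach ($xpath->query('//a/@href') as $attr) {
    $links[] = $attr->nodeValue;
}
print_r($links);
```

For a real page, the $html string would come from file_get_contents($target) as in the answer above.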

The characters in my HTML saved from DOMdocument become escaped

I have an irritating problem using PHP's DOMdocument. I have loaded HTML, and changed some of the element's attributes. I want to save the changed HTML, and output it.
The strange thing is, when I use ->saveHTML() or ->saveXML() my closing tags' slashes become escaped. I could remove the escaping with regex, but I would like to know if there is any cleaner way...
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML ($roosterHTML);
$dom->preserveWhiteSpace = false;
libxml_clear_errors();
libxml_use_internal_errors(false);
$tables = $dom->getElementsByTagName('table');
$cols = $tables->item(0)->getElementsByTagName('td');
$name = preg_replace("/(\\n|\\r| )/", "", $cols->item(3)->nodeValue);
$sirname = preg_replace("/(\\n|\\r| )/", "", $cols->item(2)->nodeValue);
$class = preg_replace("/(\\n|\\r| )/", "", $cols->item(1)->nodeValue);
$header = "Rooster van $name $sirname ($class)";
$rooster = $tables->item(1);
$firstRow = true;
foreach ($rooster->getElementsByTagName('tr') as $row) {
if ($firstRow) {
$firstRow = false;
continue;
}
$firstCol = true;
foreach ($row->getElementsByTagName('td') as $col) {
if ($firstCol) {
$firstCol = false;
continue;
}
$text = $col->nodeValue;
$col->setAttribute('style','background-color:#FF0');
//$return.= $text;
}
}
$rooster = $dom->saveXML($rooster);
Testing (just click submit, to send a POST value):
http://bit.ly/ymK3DA
No, the escaping is caused by the JSON encoding,
which means this page is not outputting HTML but JSON-like plain text.
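A minimal sketch of that effect, assuming the page passes the saved markup through json_encode() (the question does not show that step). The sample fragment is made up for illustration.

```php
<?php
// Made-up fragment standing in for the output of $dom->saveXML($rooster).
$fragment = '<td style="background-color:#FF0">text</td>';

// Default behaviour: "/" is escaped as "\/".
$escaped = json_encode($fragment);
// JSON_UNESCAPED_SLASHES (PHP 5.4+) leaves "/" alone.
$unescaped = json_encode($fragment, JSON_UNESCAPED_SLASHES);

echo $escaped, "\n", $unescaped, "\n";
// Either way, decoding restores the original markup.
var_dump(json_decode($escaped) === $fragment);
```

So the escaped slashes are harmless as long as the client decodes the JSON before using the markup.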

extracting anchor values hidden in div tags

From an HTML page I need to extract the value of v from every anchor link. Each anchor link is nested inside about 5 div tags:
<a href="/watch?v=value to be retrived&list=blabla&feature=plpp_play_all">
Each v value has 11 characters. At the moment I am trying to read it character by character, like this:
<?php
$file=fopen("xx.html","r") or exit("Unable to open file!");
$d='v';
$dd='=';
$vd=array();
while (!feof($file))
{
$f=fgetc($file);
if($f==$d)
{
$ff=fgetc($file);
if ($ff==$dd)
{
$id='';
for($i=0;$i<=10;$i++)
{
$sData = fgetc($file);
$id=$id.$sData;
}
array_push($vd, $id);
}
}
}
fclose($file);
?>
That is, I read each character of v into the sData variable and append it to id, so that I end up with those 11 characters as a string (id).
The problem is that searching the entire HTML file for 'v=' and, when it is found, reading the 11 characters and pushing them into an array is slow. It takes a considerable amount of time, so please help me find a better way.
<?php
function substring(&$string,$start,$end)
{
$pos = strpos(">".$string,$start);
if(! $pos) return "";
$pos--;
$string = substr($string,$pos+strlen($start));
$posend = strpos($string,$end);
$toret = substr($string,0,$posend);
$string = substr($string,$posend);
return $toret;
}
$contents = @file_get_contents("xx.html");
$old="";
$videosArray=array();
while ($old <> $contents)
{
$old = $contents;
$v = substring($contents,"?v=","&");
if($v) $videosArray[] = $v;
}
//$videosArray is array of v's
?>
I would rather parse the HTML with SimpleXML and XPath:
// Get your page HTML string
$html = file_get_contents('xx.html');
// As per comment by Gordon to suppress invalid markup warnings
libxml_use_internal_errors(true);
// Create SimpleXML object
$doc = new DOMDocument();
$doc->strictErrorChecking = false;
$doc->loadHTML($html);
$xml = simplexml_import_dom($doc);
// Find <a> nodes whose href contains "v="
$anchors = $xml->xpath('//a[contains(@href, "v=")]');
foreach ($anchors as $a)
{
$href = (string)$a['href'];
$url = parse_url($href);
parse_str($url['query'], $params);
// $params['v'] contains what we need
$vd[] = $params['v']; // push into array
}
// Clear invalid markup error buffer
libxml_clear_errors();
