My XML file:
<pasaz:Envelope>
<pasaz:Body>
<loadOffe>
<offe>
<off>
<id>120023</id>
<name>my name John</name>
<name>Test</name>
</off>
</offe>
</loadOffe>
</pasaz:Body>
</pasaz:Envelope>
How can I read the values (id and name) in PHP?
If you're just looking for a simple way to extract the contents of a tag, but don't want to go to all the trouble of parsing the XML properly, you could do something like this:
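$xml = ""; // your XML data as a string
function get_tag_contents($xml, $tagName) {
    $openTag = "<" . $tagName . ">";
    $startPosition = strpos($xml, $openTag);
    if ($startPosition === false) {
        return null; // tag not found
    }
    $startPosition += strlen($openTag); // skip past the opening tag itself
    $endPosition = strpos($xml, "</" . $tagName . ">", $startPosition);
    if ($endPosition === false) {
        return null; // no closing tag
    }
    return substr($xml, $startPosition, $endPosition - $startPosition);
}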
$id = get_tag_contents($xml, "id");
$name = get_tag_contents($xml, "name");
This assumes you haven't added any attributes to your tags and that each tag name appears only once, because the function returns only the first match. In the example you gave there are two "name" tags, so if you want both you'll need to make this solution a bit more robust or do proper XML parsing.
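If you do want every occurrence (both "name" tags, for instance), a minimal variation of the same string-matching idea could use preg_match_all(). This is only a sketch under the same assumptions (no attributes, no CDATA, no nesting of the same tag), and get_all_tag_contents() is a hypothetical name:

```php
<?php
// Hypothetical helper: returns the text between every <tag> and </tag> pair.
// Still plain string matching, not XML parsing: attributes, CDATA,
// entities and nested tags of the same name are not handled.
function get_all_tag_contents(string $xml, string $tagName): array
{
    $tag = preg_quote($tagName, '~');
    // (.*?) is non-greedy so each match stops at the nearest closing tag;
    // the "s" modifier lets a value span several lines.
    preg_match_all('~<' . $tag . '>(.*?)</' . $tag . '>~s', $xml, $matches);
    return $matches[1]; // capture group 1 of every match (empty array if none)
}

$xml = '<off><id>120023</id><name>my name John</name><name>Test</name></off>';
print_r(get_all_tag_contents($xml, 'name'));
```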
How do I get all the items?
Example (does not work):
$pliks = simplexml_load_file("file.xml");
foreach ($pliks->children('pasaz', true) as $body)
{
foreach ($body->children() as $loadOffe)
{
if ($loadOffe->offe->off) {
echo "<p>id: $loadOffe->id</p>";
echo "$id->id";
echo "<p>name: <b>$name->name</b></p>";
}
}
// echo $loadOffe->offe->off->id;
}
As Marc B suggested in his comment, you should use DOM, either with getElementsByTagName() or with DOMXPath. Here is an example using getElementsByTagName():
$dom = new DOMDocument;
$dom->loadXML($xml);
$ids = $dom->getElementsByTagName('id');
if (!$ids->length) { // no <id> element at all
    throw new Exception('Id not found');
}
return $ids->item(0)->nodeValue; // text of the first <id> element
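To get all the items, as the question asks, here is a sketch using DOMXPath (the other option mentioned above) that visits every <off> element and collects its id and every name. Assumption: the pasaz prefix is bound to a namespace; the question's snippet never declares it, so the URI below is a placeholder.

```php
<?php
// Sample from the question; xmlns:pasaz uses a placeholder URI (assumption),
// because the original snippet never declares the prefix.
$xml = <<<XML
<pasaz:Envelope xmlns:pasaz="http://example.com/pasaz">
<pasaz:Body>
<loadOffe>
<offe>
<off>
<id>120023</id>
<name>my name John</name>
<name>Test</name>
</off>
</offe>
</loadOffe>
</pasaz:Body>
</pasaz:Envelope>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

$items = [];
// Every <off> element, wherever it sits under the envelope.
foreach ($xpath->query('//off') as $off) {
    $names = [];
    // Relative query: all <name> children of this <off>, not just the first.
    foreach ($xpath->query('name', $off) as $nameNode) {
        $names[] = $nameNode->textContent;
    }
    $items[] = [
        'id'    => $xpath->evaluate('string(id)', $off), // text of the <id> child
        'names' => $names,
    ];
}

foreach ($items as $item) {
    echo 'id: ', $item['id'], ', names: ', implode(', ', $item['names']), "\n";
}
```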
I want to get the link and scrape its content, but I can't even reach it. What's wrong with my nested selector?
My PHP:
$dom = file_get_html('http://mojim.com/%E5%BF%83%E8%B7%B3.html?t3');
$tables = $dom->find('.iB');
$firstRow = $tables->find('tr',1)->find('td',4);
foreach ($firstRow as $value) {
echo $value;
}
?>
Here is what the DOM looks like:
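You just have a problem pointing to (traversing to) the correct element. In Simple HTML DOM, find() with only a selector returns an array of matches, so you can't chain another find() onto it; pass a zero-based index to get a single element first.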
Example:
$dom = file_get_html('http://mojim.com/%E5%BF%83%E8%B7%B3.html?t3');
$firstRow = $dom->find('table.iB', 0)->find('tr', 1)->find('td', 3);
$link = $firstRow->find('a', 0);
echo $link->href . '<br/>' . $link->title;
Should output:
/twy100015x34x8.htm
心跳 歌詞 王力宏
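For comparison, roughly the same lookup with PHP's built-in DOMDocument and DOMXPath instead of the Simple HTML DOM library. The markup below is a hypothetical stand-in for the real page (which isn't reproduced here), so the path may need adjusting. Note that XPath positions start at 1, so td[4] corresponds to find('td', 3).

```php
<?php
// Hypothetical stand-in for the page: a table with class "iB" whose second
// row holds the link in its fourth cell. The real page may differ.
$html = '<table class="iB">'
      . '<tr><td>h1</td><td>h2</td><td>h3</td><td>h4</td></tr>'
      . '<tr><td>1</td><td>2</td><td>3</td>'
      . '<td><a href="/song.htm" title="Song title">link</a></td></tr>'
      . '</table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML tends to trigger parser warnings
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// First table whose class list contains "iB", then row 2, cell 4, first <a>.
$query = "(//table[contains(concat(' ', normalize-space(@class), ' '), ' iB ')])[1]"
       . "//tr[2]/td[4]//a[1]";
$link = $xpath->query($query)->item(0);

if ($link !== null) {
    echo $link->getAttribute('href'), "\n", $link->getAttribute('title'), "\n";
}
```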
Hello, I'm new to DOMNode and I'm trying to check the values from an XML tree, which loads OK.
Here is my code, but I don't understand why it's not working:
private function createCSV($xml, $f)
{
foreach ($xml->getElementsByTagName('*') as $item)
{
$hasChild = $item->hasChildNodes() ? true : false;
if(!$hasChild)
{
//echo 'Doesn\'t have children';
echo 'Value: ' . $item->nodeValue;
}
else
{
//echo 'Has children';
$this->createCSV($item, $f);
}
}
}
$item->nodeValue doesn't print anything to the browser.
I read the documentation but I can't see any mistake.
P.S. $item->tagname doesn't work either.
UPDATE
When using this: echo $item->ownerDocument->saveHTML($item);
I get the tags listed, but I don't get the data inside them (between the tags), like innerHTML in JavaScript.
UPDATE
Sample XML data: http://pastebin.com/dkuUUC0Q
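Text nodes are also child nodes, but you're only iterating element nodes (getElementsByTagName). Because almost every element has at least a text child, hasChildNodes() is nearly always true, so you rarely reach the branch that prints, and when you do, the element is empty and nodeValue is an empty string.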
Try this:
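// Body of createCSV($xml, $f): recurse over all child nodes, text nodes included
if (!$xml->hasChildNodes()) {
    // Leaf node (usually a text node): print its value
    printf('Value: %s', $xml->nodeValue);
    return;
}
foreach ($xml->childNodes as $item) {
    $this->createCSV($item, $f);
}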
XPath version:
$xpath = new DOMXPath($xml);
$text = $xpath->query('//text()[normalize-space()]');
foreach($text as $node)
printf('Value: %s', $node->nodeValue);
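A self-contained sketch of the same recursive idea outside the class, using a hypothetical function name and a tiny inline sample (the pastebin data isn't reproduced here). It skips whitespace-only text nodes and records the parent element's name; note that the property is nodeName (or tagName on elements), which is case-sensitive, so $item->tagname is undefined.

```php
<?php
// Hypothetical helper: returns [elementName, text] pairs for every
// non-blank text node below $node, in document order.
function collectTextValues(DOMNode $node): array
{
    $rows = [];
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMText) {
            $text = trim($child->nodeValue);
            if ($text !== '') {
                // nodeName is camelCase; "tagname" would be an undefined property.
                $rows[] = [$child->parentNode->nodeName, $text];
            }
        } elseif ($child->hasChildNodes()) {
            $rows = array_merge($rows, collectTextValues($child));
        }
    }
    return $rows;
}

$doc = new DOMDocument();
$doc->loadXML('<root><a>1</a><b><c>two</c>  </b></root>');
foreach (collectTextValues($doc) as [$tag, $value]) {
    echo "$tag: $value\n";
}
```

Each pair could then be written out with fputcsv($f, $row) if the goal is a CSV file.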
We have the following code, which lists the XPaths where $value is found.
For a given URL (see the picture), we have detected a non-standard tag, td1, which moreover has no closing tag. The site's developers probably put it there intentionally, as you can see in the screenshot below.
This element causes problems when identifying the correct XPath for nodes.
A broken XPath example:
/html/body/div[2]/div[2]/table/tr[2]/td/table/tr[1]/td[2]/table/tr[2]/td[2]/table[3]/tr[2]/td1/td[2]/span/u[1]
(as you can see, td1 is identified and chained into the XPath)
We think removing this element would help us build the valid XPath we are after.
A valid example is:
/html/body/div[2]/div[2]/table/tr[2]/td/table/tr[1]/td[2]/table/tr[2]/td[2]/table[3]/tr[2]/td[2]/span/u[1]
How can we remove it before loading the document into DOMXPath? Or do you have some other approach?
We would like to remove all invalid tags, not only td1 but also others such as h8, diw, etc.
private function extract($url, $value) {
$dom = new DOMDocument();
$file = 'content.txt';
//$current = file_get_contents($url);
$current = CurlTool::downloadFile($url, $file);
//file_put_contents($file, $current);
#$dom->loadHTMLFile($current);
//use DOMXpath to navigate the html with the DOM
$dom_xpath = new DOMXpath($dom);
$elements = $dom_xpath->query("//*[text()[contains(., '" . $value . "')]]");
var_dump($elements);
if (!is_null($elements)) {
foreach ($elements as $element) {
var_dump($element);
echo "\n1.[" . $element->nodeName . "]\n";
$nodes = $element->childNodes;
foreach ($nodes as $node) {
if( ($node->nodeValue != null) && ($node->nodeValue === $value) ) {
echo '2.' . $node->nodeValue . "\n";
$xpath = preg_replace("/\/text\(\)/", "", $node->getNodePath());
echo '3.' . $xpath . "\n";
}
}
}
}
}
You could use XPath to find the offending nodes and remove them, while promoting their children into their place in the DOM. Then your paths will be correct.
$dom_xpath = new DOMXpath($dom);
$results = $dom_xpath->query('//td1'); // (or any offending element)
foreach ($results as $invalidNode)
{
$parentNode = $invalidNode->parentNode;
while ($invalidNode->firstChild) // childNodes is always truthy; loop until no child is left
{
$firstChild = $invalidNode->firstChild;
$parentNode->insertBefore($firstChild,$invalidNode);
}
$parentNode->removeChild($invalidNode);
}
EDIT:
You could also build a list of offending elements by using a list of valid elements and negating it.
// Build list manually from the HTML spec:
// See: http://www.w3.org/TR/html5/section-index.html#elements-1
$validTags = array();
// Convert list to XPath:
$validTagsStr = '';
foreach ($validTags as $tag)
{
if ($validTagsStr)
{ $validTagsStr .= ' or '; }
$validTagsStr .= 'self::'.$tag;
}
$results = $dom_xpath->query('//*[not(' . $validTagsStr . ')]'); // the predicate needs its closing ]
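Putting both parts together, a sketch of a function that unwraps every element not on an allow-list. The function name and the short allow-list are hypothetical; a real list would come from the HTML element index. The sample uses loadXML() so the structure is predictable; for real pages you would use loadHTML() with libxml_use_internal_errors(true), since libxml warns about unknown tags such as td1.

```php
<?php
// Hypothetical helper: replaces each element whose name is not in
// $allowedTags with its own children. Returns how many were unwrapped.
function unwrapUnknownElements(DOMDocument $dom, array $allowedTags): int
{
    if (!$allowedTags) {
        throw new InvalidArgumentException('Allow-list must not be empty');
    }
    $tests = array_map(function ($tag) { return 'self::' . $tag; }, $allowedTags);
    $xpath = new DOMXPath($dom);
    // XPath results are a snapshot, so modifying the tree while looping is safe.
    $nodes = $xpath->query('//*[not(' . implode(' or ', $tests) . ')]');
    foreach ($nodes as $node) {
        $parent = $node->parentNode;
        while ($node->firstChild) {
            $parent->insertBefore($node->firstChild, $node); // promote the child
        }
        $parent->removeChild($node);
    }
    return $nodes->length;
}

$dom = new DOMDocument();
$dom->loadXML('<html><body><table><tr><td1><td><span>x</span></td></td1></tr></table></body></html>');
$removed = unwrapUnknownElements($dom, ['html', 'body', 'table', 'tr', 'td', 'span']);
$span = $dom->getElementsByTagName('span')->item(0);
echo $removed, ' ', $span->getNodePath(), "\n";
```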
Sooo... perhaps $current = str_replace("<td1 va-laign=\"top\">", "", $current); could do the trick? (str_replace() takes the search string first, then the replacement, then the subject.)
I've been searching for a solution to this but haven't found quite the right thing yet.
The situation is this:
I need to find all links on a page with a given class (say class="tracker") and then append query string values on the end, so when a user loads a page, those certain links are updated with some dynamic information.
I know how this can be done with Javascript, but I'd really like to adapt it to run server side instead. I'm quite new to PHP, but from the looks of it, XPath might be what I'm looking for but I haven't found a suitable example to get started with. Is there anything like GetElementByClass?
Any help would be greatly appreciated!
Shadowise
Is there anything like GetElementByClass?
Here is an implementation I whipped up...
function getElementsByClassName(DOMDocument $domNode, $className) {
$elements = $domNode->getElementsByTagName('*');
$matches = array();
foreach($elements as $element) {
if ( ! $element->hasAttribute('class')) {
continue;
}
$classes = preg_split('/\s+/', $element->getAttribute('class'));
if ( ! in_array($className, $classes)) {
continue;
}
$matches[] = $element;
}
return $matches;
}
This version doesn't rely on the helper function above.
$str = '<body>
a
a
a
a
</body>
';
$dom = new DOMDocument;
$dom->loadHTML($str);
$anchors = $dom->getElementsByTagName('body')->item(0)->getElementsByTagName('a');
foreach($anchors as $anchor) {
if ( ! $anchor->hasAttribute('class')) {
continue;
}
$classes = preg_split('/\s+/', $anchor->getAttribute('class'));
if ( ! in_array('tracker', $classes)) {
continue;
}
$href = $anchor->getAttribute('href');
$url = parse_url($href);
$attach = 'stackoverflow=true';
if (isset($url['query'])) {
$href .= '&' . $attach;
} else {
$href .= '?' . $attach;
}
$anchor->setAttribute('href', $href);
}
echo $dom->saveHTML();
Output
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
a
a
a
a
</body></html>
I need to find all links on a page with a given class (say class="tracker") [...]
I'm quite new to PHP, but from the looks of it, XPath might be what I'm looking for but I haven't found a suitable example to get started with. Is there anything like GetElementByClass?
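This XPath 1.0 expression selects every a element whose class attribute contains the whole token tracker (padding with spaces keeps it from also matching classes such as trackers):
//a[contains(
concat(' ',normalize-space(@class),' '),
' tracker '
)
]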
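A sketch of using that expression from PHP to append a query-string parameter to each matching link. The function name and the parameter are placeholders; fragments (#...) in the href are not handled.

```php
<?php
// Hypothetical helper: appends $param to the href of every <a> whose
// class list contains the token "tracker".
function tagTrackerLinks(DOMDocument $dom, string $param): void
{
    $xpath = new DOMXPath($dom);
    $query = "//a[contains(concat(' ', normalize-space(@class), ' '), ' tracker ')]";
    foreach ($xpath->query($query) as $a) {
        $href = $a->getAttribute('href');
        // "&" if the URL already has a query string, "?" otherwise.
        $separator = (parse_url($href, PHP_URL_QUERY) === null) ? '?' : '&';
        $a->setAttribute('href', $href . $separator . $param);
    }
}

$dom = new DOMDocument();
$dom->loadXML('<body><a class="tracker" href="/a">1</a>'
    . '<a class="other tracker" href="/b?x=1">2</a>'
    . '<a class="trackers" href="/c">3</a></body>');
tagTrackerLinks($dom, 'ref=abc');
$links = $dom->getElementsByTagName('a');
foreach ($links as $a) {
    echo $a->getAttribute('href'), "\n";
}
```

The third link keeps its original href because "trackers" is a different class token.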
A bit shorter, using XPath:
$dom = new DomDocument();
$dom->loadXml('<?xml version="1.0" encoding="UTF-8" ?>
<root>
label
label
label
label
label
label
label
</root>');
$xpath = new DomXPath($dom);
foreach ($xpath->query('//a[contains(@class, "tracker")]') as $node) {
if (preg_match('/\btracker\b/', $node->getAttribute('class'))) {
$node->setAttribute(
'href',
$node->getAttribute('href') . '#some_extra'
);
}
}
header('Content-Type: text/xml; charset=UTF-8');
echo $dom->saveXml();
I'm using DOMDocument to generate a new XML file and I would like for the output of the file to be indented nicely so that it's easy to follow for a human reader.
For example, when DOMDocument outputs this data:
<?xml version="1.0"?>
<this attr="that"><foo>lkjalksjdlakjdlkasd</foo><foo>lkjlkasjlkajklajslk</foo></this>
I want the XML file to be:
<?xml version="1.0"?>
<this attr="that">
<foo>lkjalksjdlakjdlkasd</foo>
<foo>lkjlkasjlkajklajslk</foo>
</this>
I've been searching around looking for answers, and everything that I've found seems to say to try to control the white space this way:
$foo = new DOMDocument();
$foo->preserveWhiteSpace = false;
$foo->formatOutput = true;
But this does not seem to do anything. Perhaps this only works when reading XML? Keep in mind I'm trying to write new documents.
Is there anything built-in to DOMDocument to do this? Or a function that can accomplish this easily?
DOMDocument will do the trick. I personally spent a couple of hours Googling and trying to figure this out, and I noticed that if you use
$xmlDoc = new DOMDocument ();
$xmlDoc->loadXML ( $xml );
$xmlDoc->preserveWhiteSpace = false;
$xmlDoc->formatOutput = true;
$xmlDoc->save($xml_file);
in that order, it just doesn't work. But if you use the same code in this order:
$xmlDoc = new DOMDocument ();
$xmlDoc->preserveWhiteSpace = false;
$xmlDoc->formatOutput = true;
$xmlDoc->loadXML ( $xml );
$xmlDoc->save($xml_file);
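it works like a charm. The order matters because preserveWhiteSpace only takes effect while the document is parsed, so it must be set before loadXML(); formatOutput is read when the document is saved. Hope this helps.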
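A minimal self-contained sketch of that working order, using the question's sample and returning a string with saveXML() instead of writing a file. prettyPrintXml() is a hypothetical name. libxml indents with two spaces per level.

```php
<?php
// Hypothetical helper: re-indents an XML string with DOMDocument.
function prettyPrintXml(string $xml): string
{
    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = false; // must be set before parsing
    $doc->formatOutput = true;        // used when serializing
    $doc->loadXML($xml);
    return $doc->saveXML();
}

$out = prettyPrintXml('<?xml version="1.0"?><this attr="that">'
    . '<foo>lkjalksjdlakjdlkasd</foo><foo>lkjlkasjlkajklajslk</foo></this>');
echo $out;
```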
After some help from John and playing around with this on my own, it seems that even DOMDocument's inherent support for formatting didn't meet my needs. So, I decided to write my own indentation function.
This is a pretty crude function that I just threw together quickly, so if anyone has any optimization tips or anything to say about it in general, I'd be glad to hear it!
function indent($text)
{
    // Put every tag and every run of text on its own line
    $find = array('>', '</', "\n\n");
    $replace = array(">\n", "\n</", "\n");
    $text = trim(str_replace($find, $replace, $text)); // trim drops the \n added after the final tag
    $depth = 0;
    $new_array = array();
    foreach (explode("\n", $text) as $line)
    {
        $line = trim($line);
        if ($line === '')
            continue;
        $isClosing = (strpos($line, '</') === 0);
        // An opening tag starts with '<' and is not a closing tag,
        // a declaration or comment (<? or <!), or a self-closing tag (/>)
        $isOpening = !$isClosing && (strpos($line, '<') === 0)
            && !in_array(substr($line, 1, 1), array('?', '!'))
            && (substr($line, -2) !== '/>');
        if ($isClosing && $depth > 0)
            $depth--; // a closing tag lines up with its opening tag
        $new_array[] = str_repeat("\t", $depth) . $line;
        if ($isOpening)
            $depth++; // text and child tags go one level deeper
    }
    return implode("\n", $new_array);
}
I have tried running the code below setting formatOutput and preserveWhiteSpace in different ways, and the only member that has any effect on the output is formatOutput. Can you run the script below and see if it works?
<?php
echo "<pre>";
$foo = new DOMDocument();
//$foo->preserveWhiteSpace = false;
$foo->formatOutput = true;
$root = $foo->createElement("root");
$root->setAttribute("attr", "that");
$bar = $foo->createElement("bar", "some text in bar");
$baz = $foo->createElement("baz", "some text in baz");
$foo->appendChild($root);
$root->appendChild($bar);
$root->appendChild($baz);
echo htmlspecialchars($foo->saveXML());
echo "</pre>";
?>
Which method do you call when printing the xml?
I use this:
$doc = new DOMDocument('1.0', 'utf-8');
$root = $doc->createElement('root');
$doc->appendChild($root);
(...)
$doc->formatOutput = true;
echo $doc->saveXML($root);
It works perfectly but outputs only the element, so you must print the <?xml ... ?> declaration yourself.
Most answers in this thread treat the XML as a stream of text.
Here is another approach that uses the DOM functions themselves to do the indentation.
The loadXML() DOM method imports the indentation characters present in the XML source as text nodes. The idea is to remove those text nodes from the DOM and then recreate correctly formatted ones (see the comments in the code below for more details).
The xmlIndent() function is implemented as a method of the indentDomDocument class, which extends DOMDocument.
Below is a complete example of how to use it:
$dom = new indentDomDocument("1.0");
$xml = file_get_contents("books.xml");
$dom->loadXML($xml);
$dom->xmlIndent();
echo $dom->saveXML();
class indentDomDocument extends domDocument {
public function xmlIndent() {
// Retrieve all text nodes using XPath
$x = new DOMXPath($this);
$nodeList = $x->query("//text()");
foreach($nodeList as $node) {
// 1. "Trim" each text node by removing its leading and trailing spaces and newlines.
$node->nodeValue = preg_replace("/^[\s\r\n]+/", "", $node->nodeValue);
$node->nodeValue = preg_replace("/[\s\r\n]+$/", "", $node->nodeValue);
// 2. Resulting text node may have become "empty" (zero length nodeValue) after trim. If so, remove it from the dom.
if(strlen($node->nodeValue) == 0) $node->parentNode->removeChild($node);
}
// 3. Starting from root (documentElement), recursively indent each node.
$this->xmlIndentRecursive($this->documentElement, 0);
} // end function xmlIndent
private function xmlIndentRecursive($currentNode, $depth) {
$indentCurrent = true;
if(($currentNode->nodeType == XML_TEXT_NODE) && ($currentNode->parentNode->childNodes->length == 1)) {
// A text node being the unique child of its parent will not be indented.
// In this special case, we must tell the parent node not to indent its closing tag.
$indentCurrent = false;
}
if($indentCurrent && $depth > 0) {
// Indenting a node consists of inserting before it a new text node
// containing a newline followed by a number of tabs corresponding
// to the node depth.
$textNode = $this->createTextNode("\n" . str_repeat("\t", $depth));
$currentNode->parentNode->insertBefore($textNode, $currentNode);
}
if($currentNode->childNodes) {
$indentClosingTag = false;
foreach($currentNode->childNodes as $childNode) $indentClosingTag = $this->xmlIndentRecursive($childNode, $depth+1);
if($indentClosingTag) {
// If children have been indented, then the closing tag
// of the current node must also be indented.
$textNode = $this->createTextNode("\n" . str_repeat("\t", $depth));
$currentNode->appendChild($textNode);
}
}
return $indentCurrent;
} // end function xmlIndentRecursive
} // end class indentDomDocument
Yo peeps,
just found out that, apparently, formatOutput won't indent the children of any element that contains text nodes (mixed content), and that includes the root element. This is nonintuitive a. f. But apparently, this is the reason that, for instance,
$x = new \DOMDocument;
$x -> preserveWhiteSpace = false;
$x -> formatOutput = true;
$x -> loadXML('<root>a<b>c</b></root>');
echo $x -> saveXML();
will fail to indent.
https://bugs.php.net/bug.php?id=54972
So there you go, h. t. h. et c.
header("Content-Type: text/xml");
$str = "";
$str .= "<customer>";
$str .= "<offer>";
$str .= "<opened></opened>";
$str .= "<redeemed></redeemed>";
$str .= "</offer>";
echo $str .= "</customer>";
If the script is served with an extension other than .xml (e.g. .php), set the Content-Type header to the correct value first, as above.
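As an alternative to concatenating strings, a sketch that builds the same <customer> document with DOMDocument, so any text you later add is escaped automatically. buildCustomerXml() is a hypothetical name; empty elements are serialized in the short form (<opened/>).

```php
<?php
// Hypothetical helper: builds <customer><offer><opened/><redeemed/></offer></customer>.
function buildCustomerXml(): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $customer = $doc->appendChild($doc->createElement('customer'));
    $offer = $customer->appendChild($doc->createElement('offer'));
    $offer->appendChild($doc->createElement('opened'));
    $offer->appendChild($doc->createElement('redeemed'));
    return $doc->saveXML(); // includes the <?xml ... ?> declaration
}

header('Content-Type: text/xml; charset=UTF-8'); // must come before any output
echo buildCustomerXml();
```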