Get href value with DOMDocument in PHP - php

Following a file_get_contents, I receive this HTML:
<h1>
<a href="blablabla.html">Manhattan Skyline</a>
</h1>
I want to get only the blablabla.html part.
How can I parse it with DOMDocument in PHP?
Important: the HTML I receive contains more than one <a href="...">.
What I tried:
$page = file_get_contents('https://...');
$dom = new DOMDocument();
$dom->loadHTML($page);
$xp = new DOMXpath($dom);
$url = $xp->query('h1//a[@href=""]');
$url = $url->item(0)->getAttribute('href');
Thanks for your help.

h1//a[@href=""] looks for an a element whose href attribute is the empty string, whereas your href attribute has a non-empty value.
If that's the entire document, then you could use the expression //a.
Otherwise, h1//a should work as well.
If you require the a element to have an href attribute with any kind of value, you could use h1//a[@href].
If the h1 is not at the root of the document, you might want to use //h1 instead. So the last example would become //h1//a[@href].
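Putting the last expression to work, here is a minimal sketch. The $page string is sample markup made up for illustration (it stands in for the result of file_get_contents), and libxml_use_internal_errors is only there to keep parse warnings quiet.

```php
<?php
// Sample markup standing in for the downloaded page (an assumption for illustration).
$page = '<html><body>'
      . '<a href="other.html">Some other link</a>'
      . '<h1><a href="blablabla.html">Manhattan Skyline</a></h1>'
      . '</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$dom->loadHTML($page);
libxml_clear_errors();

$xp = new DOMXPath($dom);
// Only <a> elements that sit inside an <h1> and carry an href attribute.
$links = $xp->query('//h1//a[@href]');

// Guard against an empty result before calling item(0).
$url = $links->length > 0 ? $links->item(0)->getAttribute('href') : null;
echo $url, "\n";
```

Checking length first avoids a fatal error when the page has no matching link.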

Related

How to add text in between html tags

I want to add a string in between HTML tags in PHP. I'm using PHP's DOMDocument class, and it seems you can only add strings inside elements. Here is an example of what I am trying to accomplish:
<tag>example</tag><another>sometext</another>
I want to add a string in-between these two tags so it should look like
<tag>example</tag>STRING<another>sometext</another>
I want to be able to separate these tags so I can use the explode function to split every tag in the HTML page into an array, then traverse them for later use.
You can add a text node on its own; it does not have to be an element or be wrapped in one.
$doc = new DOMDocument();
$tag = $doc->createElement('tag');
$doc->appendChild($tag);
$tag->appendChild($doc->createTextNode('example'));
$node = $doc->createTextNode('STRING');
$doc->appendChild($node);
$another = $doc->createElement('another');
$doc->appendChild($another);
$another->appendChild($doc->createTextNode('sometext'));
echo $doc->saveHTML();
will give
<tag>example</tag>STRING<another>sometext</another>
You need to concatenate the PHP string and the HTML.
Example:
<?php echo "<tag>example</tag>" . 'STRING' . "<another>sometext</another>"; ?>

Prepend HTML text using DOMDocument without parent container

Let's say I have <p>Text</p>
I'd like to create a function using DOMDocument to be able to insert text, e.g.:
insertText('<p>Text</p>', '<strong>1.</strong> ')
So that the result is <p><strong>1.</strong> Text</p>
I'm already accessing this paragraph tag, so I think I'm almost there. I just cannot figure out how to prepend plain text that is read as HTML.
$dom = new DOMDocument();
$question_paragraphs = array();
$dom->loadHTML($str);
$par = $dom->getElementsByTagName('p');
if ($par->length == 1) {
    $par->item(0)->setAttribute("class", "first last");
    ###
    ### How do I do this here?
    ###
}
Is it possible to inject text this way?
You can use the insertBefore method (official documentation) as follows:
Create your strong element.
Insert this node before the text node.
$span = $dom->createElement('strong', '1.');
$par->item(0)->insertBefore($span, $par->item(0)->firstChild);
Please note that the second parameter of insertBefore is the existing child before which the new node is inserted. In this case you can use firstChild, as your <p> only contains the text.
This will finally output
<p class="first last"><strong>1.</strong>Text</p>
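If the text to prepend arrives as a markup string, as in the insertText call from the question, one option is to parse it into a DOMDocumentFragment with appendXML and insert that fragment. The following is only a sketch: the function body is hypothetical, the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD flags are my choice to keep the snippet from being wrapped in html/body, and appendXML requires the inserted markup to be well-formed XML.

```php
<?php
// Hypothetical helper named after the call in the question.
function insertText(string $html, string $prefix): string
{
    $dom = new DOMDocument();
    // These flags keep libxml from adding <html><body> wrappers and a doctype.
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    $par = $dom->getElementsByTagName('p');
    if ($par->length !== 1) {
        return $html; // leave the input untouched unless there is exactly one <p>
    }
    $p = $par->item(0);

    // appendXML() parses the string as XML, so $prefix must be well-formed.
    $fragment = $dom->createDocumentFragment();
    if (!$fragment->appendXML($prefix)) {
        return $html; // the prefix could not be parsed
    }

    // Inserting a fragment inserts its children, here before the existing text node.
    $p->insertBefore($fragment, $p->firstChild);
    return $dom->saveHTML($p);
}

$result = insertText('<p>Text</p>', '<strong>1.</strong> ');
echo $result, "\n";
```

Unlike createElement, this accepts arbitrary nested markup in the prefix, at the cost of requiring it to be valid XML.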

Is it possible to get an attribute's value and the text within a node at the same time in XPath 1.0?

I've tried the solution here: Getting attribute using XPath
but it gives me an error.
I have some XHTML like this:
<a href="link.php">Click me!</a>
I'm recursively parsing the XML and trying to get both the href attribute (link.php) and the link text (Click me!) at the same time.
<?php
$node = $xpath->query('string(self::a/@href) | self::a/text()', $nodes->item(0));
This code throws the following error:
Warning: DOMXPath::query(): Invalid type
If I do either of these two separately they work, but not together:
<?php
$node = $xpath->evaluate('string(self::a/@href)', $nodes->item(0));
$node = $xpath->query('self::a/text()', $nodes->item(0));
If I use the following I get the whole attribute (href="link.php"), not just its value:
<?php
$node = $xpath->query('self::a/@href | self::a/text()', $nodes->item(0));
Is there any way of getting both text values at the same time using XPath 1.0 in PHP?
As suggested by others, you can use concat() (PHP's XPath supports it, see the demo below) to combine the value of an attribute with the content of an element.
The problem with the XPath others suggested was probably this: judging from your use of self::a, the context node ($nodes->item(0)) is already the <a> element. Relative to that context node, a/@href means "the href attribute of a child element a of the current element", which is why you got no match. You were correct to use self::a in this case. Alternatively, you can use just . to reference the current context node:
$doc = new DOMDocument();
$xml = <<<XML
<root>
<a href="link.php">Click me!</a>
</root>
XML;
$doc->loadXML($xml);
$xpath = new DOMXpath($doc);
$nodes = $xpath->query('//a');
$node = $xpath->evaluate('concat(@href, "|", .)', $nodes->item(0));
echo $node;
Output:
link.php|Click me!
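To make the context-node point concrete, here is a small sketch comparing the three ways of addressing the attribute when the context node is the <a> itself. The XML string and variable names are made up for illustration.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<root><a href="link.php">Click me!</a></root>');
$xpath = new DOMXPath($doc);
$a = $xpath->query('//a')->item(0);   // the context node is the <a> itself

// a/@href looks for a child <a> of the context node; none exists, so string() gives ''.
$viaChild = $xpath->evaluate('string(a/@href)', $a);
// @href reads the attribute of the context node directly.
$viaSelf  = $xpath->evaluate('string(@href)', $a);
// self::a/@href is equivalent here because the context node is an <a>.
$viaAxis  = $xpath->evaluate('string(self::a/@href)', $a);

var_dump($viaChild, $viaSelf, $viaAxis);
```

Note that evaluate is used throughout, since query can only return node lists and rejects expressions that yield a string.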

PHP DOMXPath problem

$xpath = new DOMXpath($doc);
$res = $xpath->query(".//*[@id='post2679883']/tr[2]/td[2]/div[2]");
foreach ($res as $obj) {
    var_dump($obj->nodeValue);
}
I need to select all the elements whose id starts with the word "post".
Example:
<div id="post2242424">trarata</div>
<div id="post114525">trarata</div>
<div id="post8568686">trarata</div>
Question number two:
I need to get these elements with their HTML tags, but $obj->nodeValue returns the text without HTML tags.
You could use the XPath function starts-with to filter the nodes, provided all the nodes you want have an id that starts with "post". For example:
$xpath->query(".//*[starts-with(@id, 'post')]/tr[2]/td[2]/div[2]");
As for the second part, I think it has been answered already: PHP DOMDocument stripping HTML tags
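Both parts can be sketched together. The markup below is sample data made up for illustration, and the XPath is shortened to match any element whose id starts with "post". For the second part, passing a node to saveHTML serializes that node including its tags.

```php
<?php
// Sample markup, assumed for illustration.
$html = '<div id="post2242424"><b>trarata</b></div>'
      . '<div id="post114525">trarata</div>'
      . '<div id="sidebar">ignored</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep parse warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$res = $xpath->query("//*[starts-with(@id, 'post')]");

$fragments = [];
foreach ($res as $obj) {
    // saveHTML($node) returns the node's markup, tags included,
    // whereas nodeValue returns only the text content.
    $fragments[] = $doc->saveHTML($obj);
}
print_r($fragments);
```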

Add an attribute to an HTML element

I can't quite figure it out, I'm looking for some code that will add an attribute to an HTML element.
For example, let's say I have a string with an <a> in it, and that <a> needs an attribute added to it, so <a> gets style="xxxx:yyyy;" added. How would you go about doing this?
Ideally it would add any attribute to any tag.
It's been said a million times. Don't use regexes for HTML parsing.
$dom = new DOMDocument();
// @ suppresses the warnings libxml emits for malformed HTML
@$dom->loadHTML($html);
$x = new DOMXPath($dom);
foreach ($x->query("//a") as $node) {
    $node->setAttribute("style", "xxxx:yyyy;");
}
$newHtml = $dom->saveHTML();
Here is a version using regex:
$result = preg_replace('/(<a\b[^><]*)>/i', '$1 style="xxxx:yyyy;">', $str);
but a regex cannot reliably handle malformed HTML documents.
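Since the question asks for something that adds any attribute to any tag, the DOM approach can be generalized along these lines. This is only a sketch: addAttribute is a hypothetical helper, and the libxml flags assume the input is a fragment with a single root element.

```php
<?php
// Hypothetical helper: sets $name="$value" on every <$tag> element in $html.
function addAttribute(string $html, string $tag, string $name, string $value): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // collect warnings for malformed HTML
    // Assumes $html has a single root element; the flags avoid html/body wrappers.
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    foreach ($dom->getElementsByTagName($tag) as $node) {
        // setAttribute overwrites an existing attribute of the same name.
        $node->setAttribute($name, $value);
    }
    return trim($dom->saveHTML());
}

$out = addAttribute('<p>See <a href="x.html">this</a>.</p>', 'a', 'style', 'xxxx:yyyy;');
echo $out, "\n";
```

If the element may already carry a style attribute, you would need to read it with getAttribute and merge the values rather than overwrite them.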
