DOMDocument error on loadHTML(): "Empty string supplied as input"

When I remove the @ (error-suppression) sign from the $d and $x DOMDocument lines below, I get this error:
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Empty string supplied as input in C:\xampplite\htdocs\mysite\wp-content\plugins\myplugin\index.php on line 50
The error refers to the $content variable when I run the function below, even though I can echo $content and get a string. What am I missing?
add_filter('wp_insert_post_data', 'decorate_keyword');
function decorate_keyword($postarray) {
    global $post;
    $keyword = getKeyword($post);
    /*
    Even though I can echo $content, I'm getting the error referenced above.
    I have to explicitly set it to a string to overcome the error.
    */
    $content = $postarray['post_content'];
    //$content = "this is a test phrase";
    $d = new DOMDocument();
    $d->loadHTML($content);
    $x = new DOMXpath($d);
    $nodes = $x->query("//text()[contains(.,'$keyword') and not(ancestor::h1) and not(ancestor::h2) and not(ancestor::h3) and not(ancestor::h4) and not(ancestor::h5) and not(ancestor::h6)]");
    if ($nodes && $nodes->length) {
        $node = $nodes->item(0);
        // Split just before the keyword
        $keynode = $node->splitText(strpos($node->textContent, $keyword));
        // Split after the keyword
        $node->nextSibling->splitText(strlen($keyword));
        // Replace keyword with <b>keyword</b>
        $replacement = $d->createElement('b', $keynode->textContent);
        $keynode->parentNode->replaceChild($replacement, $keynode);
    }
    $postarray['post_content'] = $d;
    return $postarray;
}

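loadHTML() expects a string of HTML markup, not an array or a URL (to load a page from a URL such as http://www.example.com, use loadHTMLFile() instead). The warning itself means that $content was an empty string for that particular call: the wp_insert_post_data filter can also run for posts whose content is empty, for example when WordPress creates an auto-draft. Check that the content is non-empty before calling loadHTML().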
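A minimal sketch of a guarded version of the filter's work, kept free of WordPress so it runs on its own. bold_first_keyword() is a hypothetical name; a wp_insert_post_data callback could call it on $postarray['post_content'] and store the returned string. It returns empty content untouched, wraps the markup in a <div> so only the fragment is serialized back (rather than assigning the DOMDocument object itself, as the question's second-to-last line does), and assumes ASCII text, since strpos() counts bytes while splitText() counts characters.

```php
<?php
// Hypothetical helper: bolds the first occurrence of $keyword outside headings.
// Assumes ASCII content: strpos() counts bytes, splitText() counts characters.
function bold_first_keyword(string $content, string $keyword): string
{
    // loadHTML() warns "Empty string supplied as input" on '', so return early.
    if (trim($content) === '' || $keyword === '') {
        return $content;
    }

    $d = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // buffer warnings about sloppy markup
    // Wrap in a <div> so only the fragment is serialized afterwards.
    $d->loadHTML('<div>' . $content . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $x = new DOMXPath($d);
    $textNodes = $x->query('//text()[not(ancestor::h1 or ancestor::h2 or ancestor::h3'
        . ' or ancestor::h4 or ancestor::h5 or ancestor::h6)]');
    foreach ($textNodes as $node) {
        // The keyword is searched in PHP, so it is never spliced into the XPath string.
        $pos = strpos($node->textContent, $keyword);
        if ($pos === false) {
            continue;
        }
        $keynode = $node->splitText($pos);     // text node starting at the keyword
        $keynode->splitText(strlen($keyword)); // detach the text after the keyword
        // createTextNode() escapes '&'; createElement()'s value argument does not.
        $b = $d->createElement('b');
        $b->appendChild($d->createTextNode($keynode->textContent));
        $keynode->parentNode->replaceChild($b, $keynode);
        break; // first occurrence only, as in the question
    }

    // Serialize the children of the wrapper <div> back into a string.
    $out = '';
    foreach ($d->documentElement->childNodes as $child) {
        $out .= $d->saveHTML($child);
    }
    return $out;
}

echo bold_first_keyword('<h2>php</h2><p>I like php a lot</p>', 'php'), "\n";
```

The heading keeps its plain "php", and only the paragraph's occurrence is wrapped in <b>.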

Related

Is there a way to match a word against sentences inside HTML <b> tags in PHP?

So I have this code to extract the text inside <b> tags:
$source_url = "https://www.wordpress.com/";
$html = file_get_contents($source_url);
$dom = new DOMDocument;
@$dom->loadHTML($html);
$links = $dom->getElementsByTagName('b');
$words = "php";
echo "<pre>";
print_r($dom);
echo "</pre>";
I tried to put the text into an array using array_push and other approaches, but with in_array I have to pass the whole sentence for it to return true, not just a single word.
So what I want exactly is:
If a sentence contains 'php', return true.
Try this:
foreach ($links as $link) {
    $p = strtolower($link->nodeValue);
    if (strpos($p, 'php') !== false) {
        // do something
    }
}
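Wrapped into a function that returns true as soon as one <b> element contains the word, the same idea might look like this. bTagContains() is a hypothetical name, the sample HTML is made up, and stripos() is used as a case-insensitive shortcut for the strtolower() + strpos() pair above.

```php
<?php
// Hypothetical helper: true if any <b> element's text contains $word (case-insensitive).
function bTagContains(string $html, string $word): bool
{
    if ($html === '') {
        return false; // loadHTML() rejects empty input
    }
    $dom = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    foreach ($dom->getElementsByTagName('b') as $b) {
        // nodeValue is the full text of the <b> element, so a partial match is enough
        if (stripos($b->nodeValue, $word) !== false) {
            return true;
        }
    }
    return false;
}

var_dump(bTagContains('<p><b>I love PHP</b></p>', 'php'));
```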

set tags in html using domdocument and preg_replace_callback

I'm trying to replace words from my terminology dictionary with an HTML anchor so they get a tooltip. I have the replacement part working, but I just can't get the result back into the DOMDocument object.
I've written a recursive function that walks the DOM: it iterates over every child node, searches it for words from my dictionary, and replaces them with an anchor.
I used to do this with an ordinary preg_match on the HTML, but that runs into problems when the HTML gets complex.
The recursive function:
$terms = array(
    'example'=>'explanation about example'
);
function iterate_html($doc, $original_doc = null)
{
    global $terms;
    if(is_null($original_doc)) {
        self::iterate_html($doc, $doc);
    }
    foreach($doc->childNodes as $childnode)
    {
        $children = $childnode->childNodes;
        if($children) {
            self::iterate_html($childnode);
        } else {
            $regexes = '~\b' . implode('\b|\b',array_keys($terms)) . '\b~i';
            $new_nodevalue = preg_replace_callback($regexes, function($matches) {
                $doc = new DOMDocument();
                $anchor = $doc->createElement('a', $matches[0]);
                $anchor->setAttribute('class', 'text-info');
                $anchor->setAttribute('data-toggle', 'tooltip');
                $anchor->setAttribute('data-original-title', $terms[strtolower($matches[0])]);
                return $doc->saveXML($anchor);
            }, $childnode->nodeValue);
            $dom = new DOMDocument();
            $template = $dom->createDocumentFragment();
            $template->appendXML($new_nodevalue);
            $original_doc->importNode($template->childNodes, true);
            $childnode->parentNode->replaceChild($template, $childnode);
        }
    }
}
echo iterate_html('this is just some example text.');
I expect the result to be:
this is just some <a class="text-info" data-toggle="tooltip" data-original-title="explanation about example">example</a> text.
I don't think building a recursive function to walk the DOM is useful when you can use an XPath query. Also, I'm not sure preg_replace_callback is well suited to this case; I prefer to use preg_split. Here is an example:
$html = 'this is just some example text.';
$terms = array(
    'example'=>'explanation about example'
);
// sort by reverse order of key size
// (so that the longest term always wins instead of the first one in the pattern)
uksort($terms, function ($a, $b) {
    $diff = mb_strlen($b) - mb_strlen($a);
    return ($diff) ? $diff : strcmp($a, $b);
});
// build the pattern inside a capture group (to keep the delimiters in the results with the PREG_SPLIT_DELIM_CAPTURE option)
$pattern = '~\b(' . implode('|', array_map(function($i) { return preg_quote($i, '~'); }, array_keys($terms))) . ')\b~i';
// prevent possible HTML errors from being displayed
$libxmlInternalErrors = libxml_use_internal_errors(true);
// determine whether the HTML string already has a root html element; if not, add a fake root
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$fakeRootElement = false;
if ( $dom->documentElement->nodeName !== 'html' ) {
    $dom->loadHTML("<div>$html</div>", LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
    $fakeRootElement = true;
}
libxml_use_internal_errors($libxmlInternalErrors);
// find all text nodes (not already inside a link or between other unwanted tags)
$xp = new DOMXPath($dom);
$textNodes = $xp->query('//text()[not(ancestor::a)][not(ancestor::style)][not(ancestor::script)]');
// replacement
foreach ($textNodes as $textNode) {
    $parts = preg_split($pattern, $textNode->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
    $fragment = $dom->createDocumentFragment();
    foreach ($parts as $k=>$part) {
        if ($k&1) {
            // odd indexes hold the captured terms
            $anchor = $dom->createElement('a', $part);
            $anchor->setAttribute('class', 'text-info');
            $anchor->setAttribute('data-toggle', 'tooltip');
            $anchor->setAttribute('data-original-title', $terms[strtolower($part)]);
            $fragment->appendChild($anchor);
        } else {
            $fragment->appendChild($dom->createTextNode($part));
        }
    }
    $textNode->parentNode->replaceChild($fragment, $textNode);
}
// building of the result string
$result = '';
if ( $fakeRootElement ) {
    foreach ($dom->documentElement->childNodes as $childNode) {
        $result .= $dom->saveHTML($childNode);
    }
} else {
    $result = $dom->saveHTML();
}
echo $result;
Feel free to put that into one or more functions/methods, but keep in mind that this kind of editing has a non-negligible cost, so it should run once each time the HTML is edited (not each time the HTML is displayed).
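The `$k&1` test in that loop relies on how preg_split() behaves with a capture group and PREG_SPLIT_DELIM_CAPTURE: the plain text lands at even indexes and each matched term at the following odd index. Empty strings are kept as placeholders when a term sits at the very start or end of the text, so the parity never shifts. A small standalone check, using a made-up two-term pattern built the same way:

```php
<?php
// Made-up pattern with two terms, inside a capture group as in the answer.
$pattern = '~\b(example|text)\b~i';
$parts = preg_split($pattern, 'this is just some Example text.', -1, PREG_SPLIT_DELIM_CAPTURE);
foreach ($parts as $k => $part) {
    // Even index: surrounding text; odd index: a captured term.
    echo (($k & 1) ? 'term' : 'text'), ': "', $part, "\"\n";
}
// A term at the edge still keeps the parity, thanks to the empty pieces.
$edge = preg_split($pattern, 'example', -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($edge);
```

Adding PREG_SPLIT_NO_EMPTY would drop those placeholders and break the odd/even convention, which is why the answer leaves it out.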

PHP variable into an XML request string

I have the code below, which extracts the artist name from an XML file using the artist code as a reference.
<?php
$dom = new DOMDocument();
$dom->load('http://www.bookingassist.ro/test.xml');
$xpath = new DOMXPath($dom);
echo $xpath->evaluate('string(//Artist[ArtistCode = "COD Artist"] /ArtistName)');
?>
This is the code that outputs the artist code based on a search:
<?php echo $Artist->artistCode ?>
My question:
Can I insert the variable generated by the PHP code into the XML request string?
If so, could you please advise where I should start reading?
Thanks
You mean the XPath expression. Yes, you can: it is "just a string".
$expression = 'string(//Artist[ArtistCode = "'.$Artist->artistCode.'"]/ArtistName)';
echo $xpath->evaluate($expression);
But you have to make sure that the result is valid XPath and that your value does not break out of the string literal. Some time ago I wrote a function for a library that prepares a string this way.
The problem in XPath 1.0 is that there is no way to escape special characters inside a string literal. If your string contains the quote character you're using in the XPath expression, it breaks the expression. The function uses whichever quote character does not occur in the string or, if both occur, splits the string and puts the parts into a concat() call.
function quoteXPathLiteral($string) {
    // NUL bytes are not allowed in XML, so drop them
    $string = str_replace("\x00", '', $string);
    $hasSingleQuote = FALSE !== strpos($string, "'");
    if ($hasSingleQuote) {
        $hasDoubleQuote = FALSE !== strpos($string, '"');
        if ($hasDoubleQuote) {
            // both quote types present: split into chunks that each avoid one of them
            $result = '';
            preg_match_all('("[^\']*|[^"]+)', $string, $matches);
            foreach ($matches[0] as $part) {
                $quoteChar = (substr($part, 0, 1) == '"') ? "'" : '"';
                $result .= ", ".$quoteChar.$part.$quoteChar;
            }
            return 'concat('.substr($result, 2).')';
        } else {
            return '"'.$string.'"';
        }
    } else {
        return "'".$string."'";
    }
}
The function generates the needed XPath.
$expression = 'string(//Artist[ArtistCode = '.quoteXPathLiteral($Artist->artistCode).']/ArtistName)';
echo $xpath->evaluate($expression);
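If quoting feels fragile, one alternative (a different technique from the answer's, named plainly) is to never put the value into the expression at all: select every Artist element and compare its ArtistCode in PHP. A minimal sketch, assuming the element names from the question; artistNameByCode() and the inline sample XML are hypothetical stand-ins for the remote test.xml file.

```php
<?php
// Alternative sketch: the code is compared in PHP, so no value is spliced into the XPath string.
function artistNameByCode(DOMDocument $dom, string $code): ?string
{
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//Artist') as $artist) {
        // evaluate() relative to the <Artist> node returns the child's string value
        if ($xpath->evaluate('string(ArtistCode)', $artist) === $code) {
            return $xpath->evaluate('string(ArtistName)', $artist);
        }
    }
    return null; // no artist with that code
}

// Made-up sample data shaped like the question's file; one code contains both quote types.
$dom = new DOMDocument();
$dom->loadXML('<Artists>'
    . '<Artist><ArtistCode>A1</ArtistCode><ArtistName>First</ArtistName></Artist>'
    . '<Artist><ArtistCode>O\'Neil "2"</ArtistCode><ArtistName>Second</ArtistName></Artist>'
    . '</Artists>');
echo artistNameByCode($dom, 'O\'Neil "2"'), "\n";
```

The trade-off is that every Artist node is visited in PHP, which is fine for a small file but slower than a single XPath lookup on a large one.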

Using variable for tag in getElementsByTagName() for PHP and XML?

See my PHP:
$file = "routingConfig.xml";
global $doc;
$doc = new DOMDocument();
$doc->load( $file );
function traverseXML($ElTag, $attr = null, $arrayNum = 'all'){
    $tag = $doc->getElementsByTagName($ElTag);
    $arr = array();
    foreach($tag as $el){
        $arr[] = $el->getAttribute($attr);
    }
    if ($arrayNum == 'all'){
        return $arr;
    }else if(is_int($arrayNum)){
        return $arr[$arrayNum];
    }else{
        return "Invalid $arrayNum value: ". $arrayNum;
    };
}
echo traverseXML("Route", "type", 2);
The XML is:
<Routes>
    <Route type="source"></Route>
    <Route></Route>
</Routes>
The error returned is:
Fatal error: Call to a member function getElementsByTagName() on a non-object
I'm not sure how to fix this.
EDIT: Here is the actual code being used. I originally stripped it down a bit to make it easier to read, but I think my problem is related to using the function.
Your problem is that the global $doc; statement is outside the function, so the variable $doc is not defined inside the function.
This would fix it:
// ...
function traverseXML($ElTag, $attr = null, $arrayNum = 'all') {
    global $doc;
    // ...
...but
Global variables are bad news. They usually indicate poor design.
Really you should pass $doc in as an argument, like this:
function traverseXML($doc, $ElTag, $attr = null, $arrayNum = 'all'){
    $tag = $doc->getElementsByTagName($ElTag);
    $arr = array();
    foreach($tag as $el){
        $arr[] = $el->getAttribute($attr);
    }
    // strict comparison: in PHP 7, 0 == 'all' is true
    if ($arrayNum === 'all'){
        return $arr;
    }else if(is_int($arrayNum)){
        return $arr[$arrayNum];
    }else{
        // single quotes so the parameter name is printed literally
        return 'Invalid $arrayNum value: ' . $arrayNum;
    }
}
$file = "routingConfig.xml";
$doc = new DOMDocument();
$doc->load( $file );
echo traverseXML($doc, "Route", "type", 2);
Although you might consider whether you need the function at all: if you don't use it anywhere else in your application, you might as well just do this:
$file = "routingConfig.xml";
$ElTag = "Route";
$attr = "type";
$arrayNum = 2;
$doc = new DOMDocument();
$doc->load( $file );
$tag = $doc->getElementsByTagName($ElTag);
$arr = array();
foreach($tag as $el){
    $arr[] = $el->getAttribute($attr);
}
if ($arrayNum === 'all'){
    print_r($arr); // echo on an array would only print "Array"
}else if(is_int($arrayNum)){
    echo $arr[$arrayNum];
}else{
    echo 'Invalid $arrayNum value: ' . $arrayNum;
}
The $doc variable is not defined inside your function. You have two options:
Pass $doc as one of the function arguments, which is preferred.
Write global $doc; at the top of your function, though developers usually try to avoid globals.
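A self-contained sketch of the preferred option, with the document passed in as an argument. The inline XML stands in for routingConfig.xml (an assumption, so the example runs without the file), and the out-of-range guard returning null is an addition not present in the answers. The strict `===` matters on PHP 7, where `0 == 'all'` is true and index 0 would otherwise return the whole array.

```php
<?php
// Sketch: $doc is a parameter instead of a global.
function traverseXML(DOMDocument $doc, string $elTag, string $attr, $arrayNum = 'all')
{
    $values = [];
    foreach ($doc->getElementsByTagName($elTag) as $el) {
        $values[] = $el->getAttribute($attr); // '' when the attribute is missing
    }
    if ($arrayNum === 'all') {
        return $values;
    }
    if (is_int($arrayNum)) {
        return $values[$arrayNum] ?? null; // null instead of an undefined-index warning
    }
    return 'Invalid $arrayNum value: ' . $arrayNum;
}

// Inline stand-in for routingConfig.xml.
$doc = new DOMDocument();
$doc->loadXML('<Routes><Route type="source"></Route><Route></Route></Routes>');
echo traverseXML($doc, 'Route', 'type', 0), "\n";
```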

extracting anchor values hidden in div tags

From an HTML page I need to extract the value of v from all anchor links. Each anchor link is nested inside about five div tags, for example:
<a href="/watch?v=value to be retrived&list=blabla&feature=plpp_play_all">
Each v value is 11 characters long. For now I'm trying to read it character by character, like this:
<?php
$file=fopen("xx.html","r") or exit("Unable to open file!");
$d='v';
$dd='=';
$vd=array();
while (!feof($file))
{
    $f=fgetc($file);
    if($f==$d)
    {
        $ff=fgetc($file);
        if ($ff==$dd)
        {
            $id='';
            // read the next 11 characters after "v="
            for($i=0;$i<=10;$i++)
            {
                $sData = fgetc($file);
                $id=$id.$sData;
            }
            array_push($vd, $id);
        }
    }
}
fclose($file);
?>
That is, I read each character after v= into $sData and append it to $id, so that $id ends up holding those 11 characters as a string.
The problem is that scanning the entire HTML file for 'v=' and, when it is found, reading the next 11 characters and pushing them into the $vd array is very slow: it takes a considerable amount of time. Please help me find a better approach.
<?php
function substring(&$string,$start,$end)
{
    // prefix a character so a match at offset 0 is not mistaken for "not found"
    $pos = strpos(">".$string,$start);
    if(! $pos) return "";
    $pos--;
    $string = substr($string,$pos+strlen($start));
    $posend = strpos($string,$end);
    $toret = substr($string,0,$posend);
    $string = substr($string,$posend);
    return $toret;
}
$contents = @file_get_contents("xx.html");
$old="";
$videosArray=array();
while ($old <> $contents)
{
    $old = $contents;
    $v = substring($contents,"?v=","&");
    if($v) $videosArray[] = $v;
}
//$videosArray is array of v's
?>
I would rather parse the HTML with DOMDocument, SimpleXML and XPath:
// Get your page HTML string
$html = file_get_contents('xx.html');
// As per comment by Gordon, suppress invalid markup warnings
libxml_use_internal_errors(true);
// Parse with DOMDocument, then import into SimpleXML
$doc = new DOMDocument();
$doc->strictErrorChecking = false;
$doc->loadHTML($html);
$xml = simplexml_import_dom($doc);
// Find <a> nodes whose href contains "v="
$anchors = $xml->xpath('//a[contains(@href, "v=")]');
$vd = array();
foreach ($anchors as $a)
{
    $href = (string)$a['href'];
    $url = parse_url($href);
    parse_str($url['query'], $params);
    // $params['v'] contains what we need
    $vd[] = $params['v']; // push into array
}
// Clear invalid markup error buffer
libxml_clear_errors();
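The same approach also works with DOMXPath directly, skipping the SimpleXML import. A minimal sketch under that assumption: extractVideoIds() is a hypothetical name, and the sample markup is made up to mirror the question's nested divs. It also skips links with no query string, and links where "v=" only appears inside another parameter name such as "nav=", which the contains() test alone would let through.

```php
<?php
// Sketch: collect the v= query parameter from every <a href> in an HTML string.
function extractVideoIds(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() warns on empty input
    }
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // buffer warnings about invalid markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $ids = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//a[contains(@href, "v=")]') as $a) {
        $query = parse_url($a->getAttribute('href'), PHP_URL_QUERY);
        if (!is_string($query)) {
            continue; // no query string at all
        }
        parse_str($query, $params);
        if (isset($params['v'])) { // skips "nav=" style false positives
            $ids[] = $params['v'];
        }
    }
    return $ids;
}

$html = '<div><div><a href="/watch?v=abcdefghijk&amp;list=blabla&amp;feature=plpp_play_all">x</a></div>'
      . '<a href="/page?nav=1">y</a></div>';
print_r(extractVideoIds($html));
```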
