Consider this simple piece of code. It works normally with the PHP Simple HTML DOM Parser and outputs "current community".
<?php
//PHP Simple HTML DOM Parser from simplehtmldom.sourceforge.net
include_once('simple_html_dom.php');
//Target URL
$url = 'http://stackoverflow.com/questions/ask';
//Getting content of $url
$doo = file_get_html($url);
//Passing the variable $doo to $abd
$abd = $doo ;
//Trying to find the word "current community"
echo $abd->find('a', 0)->innertext; //Output: current community.
?>
Now consider this other piece of code. It is the same as above, except that I append a space to the parsed HTML content (later I will need to edit this string; I only add a space here to keep things simple).
<?php
//PHP Simple HTML DOM Parser from simplehtmldom.sourceforge.net
include_once('simple_html_dom.php');
//Target URL
$url = 'http://stackoverflow.com/questions/ask';
//Getting content of $url
$doo = file_get_html($url);
//Passing the variable $doo to $abd - and adding an empty space.
$abd = $doo . " ";
//Trying to find the word "current community"
echo $abd->find('a', 0)->innertext; //Outputs: nothing.
?>
The second code gives this error:
PHP Fatal error: Call to undefined function file_get_html() in /home/name/public_html/code.php on line 5
Why can't I edit the string returned by file_get_html? I need to edit it for several important reasons (such as removing some scripts before processing the HTML content of the page). I also do not understand why it reports that file_get_html() could not be found (the first piece of code shows that the correct parser is being included).
Additional note:
I have tried all of these variations:
include_once('simple_html_dom.php');
require_once('simple_html_dom.php');
include('simple_html_dom.php');
require('simple_html_dom.php');
file_get_html() returns an object, not a string. Attempting to concatenate a string to an object will call the object's __toString() method if it exists, and the operation returns a string. Strings do not have a find() method.
If you want to do what you have described, read the file contents and concatenate the extra string first:
$content = file_get_contents('someFile.html');
$content .= "someString";
$domObject = str_get_html($content);
Alternatively, read the file with file_get_html() and manipulate it with the DOM API.
$doo is not a string! It's an object, an instance of Simple HTML DOM. You can't call -> methods on strings, only on objects, and you cannot treat this object like a string. Trying to concatenate something to it makes no sense. $abd in your code is the result of an object concatenated with a string; this results in either a string or an error, depending on the details of the object. What it certainly does not do is result in a usable object, so you can't call $abd->find().
If you want to modify the content of the page, do it using the DOM API which the object gives you.
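As a rough illustration of editing the page through a DOM API instead of string concatenation, here is a minimal sketch. Note that it swaps in PHP's built-in DOMDocument rather than Simple HTML DOM so that it runs without extra files, and the sample markup in $html is an assumption standing in for the downloaded page.

```php
<?php
// Sample markup standing in for the downloaded page (hypothetical content).
$html = '<html><body><script>alert(1);</script><a href="/x">current community</a></body></html>';

$dom = new DOMDocument();
// Collect parser warnings internally; real-world markup tends to trigger many.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Copy the live node list into an array first: removing nodes while
// iterating the live list directly can skip elements.
foreach (iterator_to_array($dom->getElementsByTagName('script')) as $script) {
    $script->parentNode->removeChild($script);
}

// The document is still an object, so it can still be queried.
$firstLink = $dom->getElementsByTagName('a')->item(0);
echo $firstLink->textContent;
```

The point is the order of operations: the scripts are removed on the parsed object, and the object stays queryable afterwards.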
Related
I am trying to create a custom CMS. Every page has a unique ID, and every page contains a string (<--UNIQUEID-->) at the place where the CMS text should appear.
I am trying to replace that string with the text saved in a database for that page, but I can't get it to work. I am trying this with DOM documents.
I have this at the moment:
This is before the <html> tag:
ob_start();
And after the </html> tag:
if ((($html = ob_get_clean()) !== false) && (ob_start() === true))
{
$dom = new DOMDocument();
$dom->loadHTML($html); // load the output HTML
/* your specific search and replace logic goes here */
$StringToReplace = '<--754764-->';
$ReplacementString = 'test';
str_replace($StringToReplace, $ReplacementString, $html);
echo $dom->saveHTML(); // output the replaced HTML
}
It is showing the page, but it's not showing the replacement string text.
You're trying to do two things and getting confused in the process.
When you load your HTML buffered output into a DOMDocument object (via DOMDocument::loadHTML), the state of that object is now the parsed HTML. You then replace your string into $html itself, and then output the HTML from the DOMDocument.
By the time you reach your str_replace call, the internal state of the DOMDocument is independent of $html, so that call has no effect on what the DOMDocument outputs.
If you're certain that the comment will be of exactly that form, you can just echo $html; after the call to str_replace. This also saves you from having to worry about your output being compliant and parsing properly (DOMDocument is stricter than most browsers when it comes to that).
The code you posted doesn't use the DOMDocument object to do any transformation of the document. It just parses the HTML and then generates another document that is functionally identical to the original.
You just don't need the DOMDocument object.
The str_replace() does the expected transformation but the value it returns is completely ignored. You have to echo it in order to get the desired result.
The following code is enough:
if (($html = ob_get_clean()) !== false) {
/* your specific search and replace logic goes here */
$StringToReplace = '<--754764-->';
$ReplacementString = 'test';
echo str_replace($StringToReplace, $ReplacementString, $html);
}
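To make the "use the return value" point concrete, here is a small self-contained sketch. The helper name fillPlaceholders and the sample page are assumptions for illustration; in the CMS the replacement text would come from the database.

```php
<?php
// Hypothetical helper: replaces each placeholder (array key) with its text (array value).
function fillPlaceholders(string $html, array $replacements): string
{
    // strtr() returns the new string; the input string is left unchanged.
    return strtr($html, $replacements);
}

// Capture the page output in a buffer, as in the question.
ob_start();
echo '<html><body><--754764--></body></html>';
$html = ob_get_clean();

// The returned string is what must be echoed, not the original $html.
$output = fillPlaceholders($html, ['<--754764-->' => 'test']);
echo $output;
```

Passing an array lets one call handle several placeholders per page.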
This works
$feed = $xml->xpath('//room[#id="103"]');
I am trying to replace 103 by a variable namely $id.
I tried
$feed = $xml->xpath('//room[#id=$id]');
and
$feed = $xml->xpath('//room[#id=".$id."]');
None work.
What is the appropriate syntax for putting a variable in xpath?
When you write $id inside a single-quoted string, PHP treats it as literal text. Close the quote and concatenate the variable:
$feed = $xml->xpath('//room[@id="'.$id.'"]');
Or use a double-quoted string and write the variable in {}:
$feed = $xml->xpath("//room[@id='{$id}']");
Your question was already answered by Mohammad, but here is a fuller example with [0] at the end of the line, which returns only the first match:
$feed = $xml->xpath('//room[@id="'.$id.'"]')[0];
You can also select a child element under the node matched by @attribute:
$day = $_GET['Day'];
$feed = $xml->xpath('//room[@id="'.$id.'"]/Day'.$day)[0];
This means that if your XML file has hard-coded elements for each day, you can select a day dynamically.
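The following is a minimal runnable sketch of putting a variable into an XPath expression with SimpleXML. The sample XML and the value of $id are assumptions; casting to an integer is one simple way to keep stray quotes out of the expression when the ID is numeric.

```php
<?php
// Sample document (hypothetical data).
$xml = simplexml_load_string(
    '<rooms><room id="102"><name>A</name></room><room id="103"><name>B</name></room></rooms>'
);

$id = 103; // in practice this might come from user input

// Build the expression from the variable; the (int) cast keeps
// quotes and other XPath syntax out of the query.
$query = sprintf('//room[@id="%d"]', (int) $id);
$feed = $xml->xpath($query);

// xpath() returns an array of matches, so check it before taking [0].
$room = $feed ? $feed[0] : null;
echo $room !== null ? (string) $room->name : 'not found';
```

Checking the result before indexing avoids a notice when no room has the requested ID.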
I would like to find all URLs in a string (cURL results) and then encode any query strings in those results. For example, given this URL found in the page:
http://www.example.com/index.php?favoritecolor=blue&favoritefood=sharwarma
I want to replace every URL found with its encoded form (so far I can only do one of them):
http%3A%2F%2Fwww.example.com%2Findex.php%3Ffavoritecolor%3Dblue%26favoritefood%3Dsharwarma
This has to work on an HTML cURL response, finding all URLs in the HTML page.
Thank you in advance; I have searched for hours.
This will do what you want if your cURL result is an HTML page and you only want <a> links (and not images or other clickable elements).
$xml = new DOMDocument();
// $html should be your CURL result
$xml->loadHTML($html);
// or you can do that directly by providing the requested page's URL to loadHTMLFile
// $xml->loadHTMLFile("http://...");
// this array will contain all links
$links = array();
// loop through all "a" elements
foreach ($xml->getElementsByTagName("a") as $link) {
// URL-encodes the link's URL and adds it to the previous array
$links[] = urlencode($link->getAttribute("href"));
}
// now do whatever you want with that array
The $links array will contain all the links found in the page in URL-encoded format.
Edit: if you instead want to replace all links in the page while keeping everything else, it's better to use DOMDocument than regular expressions (related : why you shouldn't use regex to handle HTML), here's an edited version of my code that replaces every link with its URL-encoded equivalent and then saves the page into a variable :
$xml = new DOMDocument();
// $html should be your CURL result
$xml->loadHTML($html);
// loop through all "a" elements
foreach ($xml->getElementsByTagName("a") as $link) {
// gets original (non URL-encoded link)
$original = $link->getAttribute("href");
// sets new link to URL-encoded format
$link->setAttribute("href", urlencode($original));
}
// save modified page to a variable
$page = $xml->saveHTML();
// now do whatever you want with that modified page, for example you can "echo" it
echo $page;
Code based on this.
Don't use PHP's DOM directly; it will slow down your execution time. Use simplehtmldom instead, it's easy:
function decodes($data){
foreach($data->find('a') as $hres){
$bbs=$hres->href;
$hres->href = urlencode($bbs);
}
return $data;
}
I've written a script to process HTML files from URLs. However, because of a 30-second script runtime restriction at my cheap hosting provider, I've had to alter the script to store the HTML as txt files and run it from a local WAMP server.
I am trying to load each file up, extract what I need, then move onto the next file.
URL's as source file_get_html was doing the job perfectly (I could ->find the required elements)
Txt file as source file_get_html is returning a blank object.
Based on some advice in the below post I changed file_get_html for file_get_contents which created an array with a single large string containing the contents of the text file.
First, make sure that file_get_contents can get data. If it can, file_get_html will be able to load data to simplehtml Dom
If file_get_contents returns a string, which it does, how would I "load data to simplehtml Dom?"
File not getting read using file_get_html
I then tried to convert the string into an object str_get_html, however, this didn't work either.
include('simple_html_dom.php');
$html = file_get_html('file.txt');
var_dump($html);
Returns: object(simple_html_dom)[1] but with no other contents or arrays.
include('simple_html_dom.php');
$html = file_get_contents('file.txt');
var_dump($html);
Returns: string < ! DOCTYPE html PUBLIC.....
Questions:
Can anyone give me any advice? What's the best way to load a text file containing HTML markup into an object so that I can use the find method on its contents? I want to avoid loading the file into an array of strings and using regex to process the contents.
Are there any considerations I need to make if using a local WAMP server?
(Answered by the OP in a question. Converted to a community wiki answer. See Question with no answers, but issue solved in the comments (or extended in chat) )
The OP wrote:
I managed to solve this myself. I was sure I'd already tried extracting the HTML from a string, doh!
include('simple_html_dom.php');
$html = file_get_contents('file.txt');
$html = str_get_html($html);
var_dump($html);
Returns object(simple_html_dom)[1] including all expected arrays etc
Instead of trying to create the HTML object directly from the source file with file_get_html, I read the file contents with file_get_contents and then converted the string to an HTML object with str_get_html. That lets me use the Simple HTML DOM methods, such as find, on elements within the object, e.g.
$html->find('a');
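For comparison, the same read-then-parse pattern can be sketched with PHP's built-in DOMDocument in place of Simple HTML DOM, which keeps the example runnable without the library. The temporary file and its contents are assumptions standing in for the saved txt file.

```php
<?php
// Create a stand-in for the saved page (hypothetical file and content).
$path = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($path, '<html><body><a href="/a">first</a><a href="/b">second</a></body></html>');

// Step 1: read the file as a string and confirm the read succeeded.
$contents = file_get_contents($path);
if ($contents === false) {
    throw new RuntimeException("Could not read $path");
}

// Step 2: parse the string into a document object.
$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($contents);
libxml_clear_errors();

// Step 3: query the object, here collecting every link target.
$hrefs = [];
foreach ($dom->getElementsByTagName('a') as $a) {
    $hrefs[] = $a->getAttribute('href');
}

unlink($path); // remove the stand-in file
print_r($hrefs);
```

Checking the result of file_get_contents separately makes it clear whether a failure lies in reading the file or in parsing it.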
I am using PHP's DOM object to create HTML pages for my website. This works great for my head. However, since I will be entering a lot of HTML into the body (not via DOM), I assumed I would need to use DOM->createElement($bodyHTML) to add the HTML from my site to the DOM object.
However, DOM->createElement seems to escape all HTML entities, so my end result displayed the HTML source on the page instead of the rendered HTML.
I am currently using a hack to get this to work,
$body = $this->DOM
->createComment('DOM Glitch--><body>'.$bodyHTML."</body><!--Woot");
This puts all my site code in a comment, which I then break out of in order to add the <body> tags manually.
Currently this method works, but I believe there should be a more proper way of doing this. Ideally it would be something like DOM->createElement() that does not parse any of the string.
I also tried using DOM->createDocumentFragment(). However, it does not like some of the string, so it errors and does not work (and it also uses extra CPU time to re-parse the body's HTML).
So, my question is, is there a better way of doing this other than using DOM->createComment()?
You use the DOMDocumentFragment object to insert arbitrary HTML chunks into another document.
$dom = new DOMDocument();
@$dom->loadHTML($some_html_document); // @ to suppress a bajillion parse errors
$frag = $dom->createDocumentFragment(); // create fragment
$frag->appendXML($some_other_html_snippet); // insert arbitrary html into the fragment
$node = $dom->getElementsByTagName('body')->item(0); // for example; find whatever node you want to insert the fragment into
$node->appendChild($frag); // stuff the fragment into the original tree
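A self-contained version of the fragment approach might look like the sketch below. The sample document and snippet are assumptions. One caveat worth showing: appendXML() expects well-formed XML and returns false otherwise, which is likely why some HTML strings make it fail.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<html><body><h1>Title</h1></body></html>');

$frag = $dom->createDocumentFragment();
// appendXML() returns false when the snippet is not well-formed XML
// (for example unclosed tags or HTML-only entities such as &nbsp;).
if (!$frag->appendXML('<p>Hello <b>world</b></p>')) {
    throw new RuntimeException('Snippet is not well-formed XML');
}

// Append the fragment to the body; its children are moved into the tree.
$body = $dom->getElementsByTagName('body')->item(0);
$body->appendChild($frag);

$result = $dom->saveHTML();
echo $result;
```

The inserted markup is serialized as real elements rather than as escaped text.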
I FOUND THE SOLUTION but it's not a pure php solution, but works very well. A little hack for everybody who lost countless hours, like me, to fix this
$dom = new DomDocument;
// main object
$object = $dom->createElement('div');
// html attribute
$attr = $dom->createAttribute('value');
// ugly html string
$attr->value = "<div> this is a really html string ©</div><i></i> with all the © that XML hates!";
$object->appendChild($attr);
// jquery fix (or javascript as well)
$('div').html($(this).attr('value')); // and it works!
$('div').removeAttr('value'); // to clean-up
loadHTML works just fine.
<?php
$dom = new DOMDocument();
$dom->loadHTML("<font color='red'>Hey there mrlanrat!</font>");
echo $dom->saveHTML();
?>
which outputs Hey there mrlanrat! in red.
or
<?php
$dom = new DOMDocument();
$bodyHTML = "here is the body, a nice body I might add";
$dom->loadHTML("<body> " . $bodyHTML . " </body>");
// this would even work as well.
// $bodyHTML = "<body>here is the body, a nice body I might add</body>";
// $dom->loadHTML($bodyHTML);
echo $dom->saveHTML();
?>
Which outputs:
here is the body, a nice body I might add and inside of your HTML source code, it's wrapped inside body tags.
I spent a lot of time working on Anthony Forloney's answer, but I cannot seem to get the HTML to append to the body without it erroring.
@Mark B: I have tried doing that, but as I said in the comments, it errored on my HTML.
I forgot to add my solution, below:
I decided to make my HTML object much simpler by not using DOM at all and just using strings.