Replace a string before the user sees the page - php

I am trying to create a custom CMS. Every page has a unique ID, and every page contains a string (<--UNIQUEID-->) at the place where the CMS text should appear.
I am trying to replace that string with the text saved in the database for that page, but I can't get it to work. I am trying this with DOMDocument.
This is what I have at the moment.
Before the <html> tag:
ob_start();
After the </html> tag:
if ((($html = ob_get_clean()) !== false) && (ob_start() === true))
{
$dom = new DOMDocument();
$dom->loadHTML($html); // load the output HTML
/* your specific search and replace logic goes here */
$StringToReplace = '<--754764-->';
$ReplacementString = 'test';
str_replace($StringToReplace, $ReplacementString, $html);
echo $dom->saveHTML(); // output the replaced HTML
}
The page is displayed, but the replacement text does not appear.

You're trying to do two things and getting confused in the process.
When you load your buffered HTML output into a DOMDocument object (via DOMDocument::loadHTML), the state of that object is now the parsed HTML. You then run the replacement on $html itself, and then output the HTML from the DOMDocument.
Because the DOMDocument's internal state is independent of $html by the time you reach your str_replace call, that call does nothing to the document you output. (str_replace also does not modify $html in place; it returns a new string.)
If you're certain that the marker will always have exactly that form, you don't need DOMDocument at all: just echo the result of the replacement, e.g. echo str_replace($StringToReplace, $ReplacementString, $html);. This also saves you from having to worry about your output being compliant and parsing properly (DOMDocument is stricter than most browsers when it comes to that).

The code you posted doesn't use the DOMDocument object to transform the document at all. It just parses the HTML and then generates another document that is functionally identical to the original.
You simply don't need the DOMDocument object.
str_replace() performs the expected transformation, but its return value is ignored. You have to echo it to get the desired result.
The following code is enough:
if (($html = ob_get_clean()) !== false) {
/* your specific search and replace logic goes here */
$StringToReplace = '<--754764-->';
$ReplacementString = 'test';
echo str_replace($StringToReplace, $ReplacementString, $html);
}
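A variation on the same idea, sketched under assumptions: ob_start() can also take a callback, which PHP calls with the whole buffered page when the buffer is flushed, and whose return value is sent to the browser instead. The helper name replaceMarkers(), the <--ID--> regex and the $texts array (a stand-in for the database lookup) are hypothetical.

```php
<?php
// Hypothetical helper: replaces every "<--ID-->" marker with the text
// stored for that ID. $texts stands in for the CMS database lookup.
function replaceMarkers(string $html, array $texts): string
{
    return preg_replace_callback(
        '/<--(\d+)-->/',
        function (array $m) use ($texts): string {
            // The CMS text is inserted as-is (assumed to be trusted HTML);
            // unknown IDs are replaced with an empty string.
            return $texts[$m[1]] ?? '';
        },
        $html
    );
}

$texts = ['754764' => 'test'];

// PHP hands the buffered page to this callback when the buffer is
// flushed and sends the returned string instead of the original.
ob_start(function (string $buffer) use ($texts): string {
    return replaceMarkers($buffer, $texts);
});
echo '<html><body><p><--754764--></p></body></html>';
ob_end_flush();
```

Because the pattern matches any numeric ID, one callback can fill several CMS blocks on the same page.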

Related

String of file_get_html can't be edited?

Consider this simple piece of code, which uses the PHP Simple HTML DOM Parser and works as expected: it outputs "current community".
<?php
//PHP Simple HTML DOM Parser from simplehtmldom.sourceforge.net
include_once('simple_html_dom.php');
//Target URL
$url = 'http://stackoverflow.com/questions/ask';
//Getting content of $url
$doo = file_get_html($url);
//Passing the variable $doo to $abd
$abd = $doo ;
//Trying to find the word "current community"
echo $abd->find('a', 0)->innertext; //Output: current community.
?>
Consider this other piece of code, same as above but I add an empty space to the parsed html content (in the future, I need to edit this string, so I just added a space here to simplify things).
<?php
//PHP Simple HTML DOM Parser from simplehtmldom.sourceforge.net
include_once('simple_html_dom.php');
//Target URL
$url = 'http://stackoverflow.com/questions/ask';
//Getting content of $url
$doo = file_get_html($url);
//Concatenating an empty space onto $doo and storing the result in $abd.
$abd = $doo . " ";
//Trying to find the word "current community"
echo $abd->find('a', 0)->innertext; //Outputs: nothing.
?>
The second code gives this error:
PHP Fatal error: Call to undefined function file_get_html() in /home/name/public_html/code.php on line 5
Why can't I edit the string returned by file_get_html? I need to edit it for many important reasons (like removing some scripts before processing the HTML content of the page). I also do not understand why it is giving the error that file_get_html() could not be found (the first snippet makes it clear we're including the correct parser).
Additional note:
I have tried all those variations:
include_once('simple_html_dom.php');
require_once('simple_html_dom.php');
include('simple_html_dom.php');
require('simple_html_dom.php');
file_get_html() returns an object, not a string. Concatenating a string to an object calls the object's __toString() method if it exists, and the operation returns a string. Strings do not have a find() method.
If you want to do what you have described, read the file contents and concatenate the extra string first:
$content = file_get_contents('someFile.html');
$content .= "someString";
$domObject = str_get_html($content);
Alternatively, read the file with file_get_html() and manipulate it with the DOM API.
$doo is not a string! It's an object, an instance of Simple HTML DOM. You can only call -> methods on objects, not on strings, and you cannot treat this object like a string: concatenating something to it makes no sense. $abd in your code is the result of an object concatenated with a string; this results either in a string or in an error, depending on the details of the object. What it certainly does not produce is a usable object, so you certainly can't call $abd->find().
If you want to modify the content of the page, do it using the DOM API which the object gives you.
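For the asker's actual goal (removing scripts before processing the page), here is a minimal sketch that swaps in PHP's built-in DOM extension instead of Simple HTML DOM: the page is parsed once, the <script> elements are removed from the tree, and the cleaned tree is then queried. The function name loadWithoutScripts() and the sample markup are hypothetical.

```php
<?php
// Sketch using PHP's built-in DOM extension (not Simple HTML DOM):
// parse the page, remove every <script> element, then query the cleaned tree.
function loadWithoutScripts(string $html): DOMDocument
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getElementsByTagName() returns a live list, so copy it before removing nodes.
    foreach (iterator_to_array($doc->getElementsByTagName('script')) as $script) {
        $script->parentNode->removeChild($script);
    }
    return $doc;
}

$doc = loadWithoutScripts('<html><body><script>track();</script><a href="/">current community</a></body></html>');
echo $doc->getElementsByTagName('a')->item(0)->textContent, "\n";
```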

Find all URLS in a string and encode query string?

I would like to find all URLs in a string (cURL results) and then encode any query strings in those results. For example, for this URL found:
http://www.example.com/index.php?favoritecolor=blue&favoritefood=sharwarma
I want to replace every URL found with its encoded string (I can only do it for one of them):
http%3A%2F%2Fwww.example.com%2Findex.php%3Ffavoritecolor%3Dblue%26favoritefood%3Dsharwarma
But I need to do this on an HTML cURL response, finding all URLs in the page.
Thank you in advance; I have searched for hours.
This will do what you want if your cURL result is an HTML page and you only want <a> links (and not images or other clickable elements).
$xml = new DOMDocument();
// $html should be your CURL result
$xml->loadHTML($html);
// or you can do that directly by providing the requested page's URL to loadHTMLFile
// $xml->loadHTMLFile("http://...");
// this array will contain all links
$links = array();
// loop through all "a" elements
foreach ($xml->getElementsByTagName("a") as $link) {
// URL-encodes the link's URL and adds it to the previous array
$links[] = urlencode($link->getAttribute("href"));
}
// now do whatever you want with that array
The $links array will contain all the links found in the page in URL-encoded format.
Edit: if you instead want to replace all links in the page while keeping everything else, it's better to use DOMDocument than regular expressions (related: why you shouldn't use regex to handle HTML). Here's an edited version of my code that replaces every link with its URL-encoded equivalent and then saves the page into a variable:
$xml = new DOMDocument();
// $html should be your CURL result
$xml->loadHTML($html);
// loop through all "a" elements
foreach ($xml->getElementsByTagName("a") as $link) {
// gets original (non URL-encoded link)
$original = $link->getAttribute("href");
// sets new link to URL-encoded format
$link->setAttribute("href", urlencode($original));
}
// save modified page to a variable
$page = $xml->saveHTML();
// now do whatever you want with that modified page, for example you can "echo" it
echo $page;
Code based on this.
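The answer above only rewrites the href of <a> elements. If URLs in other attributes (for example <img src>) should be encoded too, one possible sketch uses DOMXPath to select the attributes directly. The function name encodeUrlAttributes() and the choice of href and src are assumptions, not part of the original answer.

```php
<?php
// Sketch: URL-encode every href and src attribute in a page, not only
// the href of <a> elements. The attribute list in the XPath query is an
// assumption; add others (e.g. @action) if needed.
function encodeUrlAttributes(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//@href | //@src') as $attr) {
        // Each result is a DOMAttr; assigning ->value rewrites it in place.
        $attr->value = urlencode($attr->value);
    }
    return $doc->saveHTML();
}

echo encodeUrlAttributes('<a href="http://www.example.com/index.php?favoritecolor=blue&amp;favoritefood=sharwarma">x</a><img src="http://www.example.com/i.png">');
```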
Do not use PHP's DOM directly; it will slow down your execution time. Use simplehtmldom instead, it's easy:
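// $data is a simple_html_dom object; despite its name, this function URL-encodes every link's href
function decodes($data){
foreach($data->find('a') as $hres){
// assigning the attribute goes through simple_html_dom's magic __set()
$hres->href = urlencode($hres->href);
}
return $data;
}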

Blog display code, keeping other content in post

Alright, I have some code that finds a <code></code> tag pair and cleans up any code inside it so that it is displayed instead of being interpreted as regular markup. Everything works, but my problem is this: how can I find one or more of these tag pairs inside, say, $content, clean the code, and still keep ALL of the other content? Here is my code. It checks for matches, and when it finds one, it cleans it. But after cleaning it, it has no way to put the result back in its original position in $content. ($content comes from a form.)
<?php
preg_match_all("'<code>(.*?)</code>'si", $html, $match);
if ($match) {
foreach ($match[1] as $snippet) {
$fixedCode = htmlspecialchars($snippet, ENT_QUOTES);
}
}
?>
What do I do with $fixedCode, now that it is clean?
Using regex for parsing HTML is bad. I'd suggest getting familiar with a DOM parser, such as PHP's DOM module.
The DOM extension allows you to operate on XML documents through the DOM API with PHP 5.
Using the DOM module, in order to get the HTML/data from <code> tags in the document, you'd want to do something like this:
<?php
//So many variables!
$html = "<div> Testing some <code>code</code></div><div>Nother div, nother <code>Code</code> tag</div>";
$dom_doc = new DOMDocument;
$dom_doc->loadHTML($html);
$code = $dom_doc->getElementsByTagName('code');
foreach ($code as $scrap) {
echo htmlspecialchars($scrap->nodeValue, ENT_QUOTES), "<br />";
}
?>
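The answer above prints the escaped snippets but does not put them back into the page, which is what the question asks. One way to do that, sketched here as a swapped-in alternative to the DOM approach, is preg_replace_callback(), which replaces each match at its original position. A DOM parser would also turn tags typed inside <code> into real elements, so their nodeValue would hold only the text; working on the raw string keeps the user's source intact. escapeCodeBlocks() is a hypothetical name, and nested or malformed <code> tags are not handled.

```php
<?php
// Escapes everything between <code> and </code> so it is displayed
// rather than rendered, leaving the rest of $content untouched.
// preg_replace_callback() substitutes each match where it was found.
function escapeCodeBlocks(string $content): string
{
    return preg_replace_callback(
        '#<code>(.*?)</code>#si', // s: match across lines, i: case-insensitive
        function (array $m): string {
            return '<code>' . htmlspecialchars($m[1], ENT_QUOTES) . '</code>';
        },
        $content
    );
}

$content = 'Intro <code><b>bold</b></code> outro';
echo escapeCodeBlocks($content), "\n";
```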

How do I insert HTML into a PHP DOM object? [duplicate]

This question already has answers here: How to insert HTML to PHP DOMNode?
I am using PHP's DOM object to create HTML pages for my website. This works great for my head; however, since I will be entering a lot of HTML into the body (not via DOM), I thought I would need to use DOM->createElement($bodyHTML) to add my site's HTML to the DOM object.
However, DOM->createElement seems to escape all HTML entities, so the page ended up displaying the HTML source instead of the rendered HTML.
I am currently using a hack to get this to work:
$body = $this->DOM
->createComment('DOM Glitch--><body>'.$bodyHTML."</body><!--Woot");
This closes the comment early, so all my site code sits between two comments, wrapped in <body> tags that I add manually.
This method currently works, but I believe there should be a more proper way of doing this. Ideally something like DOM->createElement() that will not parse the string at all.
I also tried using DOM->createDocumentFragment(). However, it does not like some of the string, so it errors and does not work (and it also takes extra CPU power to re-parse the body's HTML).
So, my question is, is there a better way of doing this other than using DOM->createComment()?
You use a DOMDocumentFragment object to insert arbitrary HTML chunks into another document.
$dom = new DOMDocument();
@$dom->loadHTML($some_html_document); // @ to suppress a bajillion parse errors
$frag = $dom->createDocumentFragment(); // create fragment
$frag->appendXML($some_other_html_snippet); // insert the snippet into the fragment (it must be well-formed XML)
$node = $dom->getElementsByTagName('body')->item(0); // or whatever node you want to insert the fragment into
$node->appendChild($frag); // stuff the fragment into the original tree
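appendXML() expects well-formed XML, which is a likely reason the asker's snippet errored. A commonly used alternative, sketched below under that assumption, parses the snippet with the more forgiving loadHTML() in a scratch document and copies the resulting nodes over with importNode(). The function name appendHtml() and the wrapper id are hypothetical.

```php
<?php
// Sketch: append an HTML snippet that is not well-formed XML (unclosed
// tags, named entities such as &copy;) by parsing it with loadHTML() in a
// scratch document and importing the resulting nodes.
function appendHtml(DOMElement $parent, string $html): void
{
    $scratch = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // The meta tag tells libxml the snippet is UTF-8; the wrapper div
    // lets us find the snippet's nodes again after parsing.
    $scratch->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div id="snippet-wrapper">' . $html . '</div>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($scratch);
    $wrapper = $xpath->query('//div[@id="snippet-wrapper"]')->item(0);
    if ($wrapper === null) {
        return; // the snippet broke out of the wrapper; nothing sensible to import
    }
    foreach ($wrapper->childNodes as $child) {
        // importNode() deep-copies the node into the target document.
        $parent->appendChild($parent->ownerDocument->importNode($child, true));
    }
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body></body></html>');
$body = $dom->getElementsByTagName('body')->item(0);
appendHtml($body, '<p>Hello <b>there</b><br> &copy; unclosed');
echo $dom->saveHTML();
```

saveHTML() then renders the snippet as real markup rather than as escaped text.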
I found a solution. It's not a pure PHP solution, but it works very well: a little hack for everybody who, like me, has lost countless hours trying to fix this.
$dom = new DomDocument;
// main object
$object = $dom->createElement('div');
// html attribute
$attr = $dom->createAttribute('value');
// ugly html string
$attr->value = "<div> this is a really html string ©</div><i></i> with all the © that XML hates!";
$object->setAttributeNode($attr);
Then, on the client side with jQuery (plain JavaScript works as well):
$('div[value]').each(function () {
$(this).html($(this).attr('value')); // and it works!
$(this).removeAttr('value'); // to clean up
});
loadHTML works just fine.
<?php
$dom = new DOMDocument();
$dom->loadHTML("<font color='red'>Hey there mrlanrat!</font>");
echo $dom->saveHTML();
?>
which outputs Hey there mrlanrat! in red.
or
<?php
$dom = new DOMDocument();
$bodyHTML = "here is the body, a nice body I might add";
$dom->loadHTML("<body> " . $bodyHTML . " </body>");
// this would even work as well.
// $bodyHTML = "<body>here is the body, a nice body I might add</body>";
// $dom->loadHTML($bodyHTML);
echo $dom->saveHTML();
?>
Which outputs:
here is the body, a nice body I might add
and in your HTML source code, it's wrapped inside <body> tags.
I spent a lot of time working on Anthony Forloney's answer, but I cannot seem to get the HTML to append to the body without an error.
@Mark B: I have tried that, but as I said in the comments, it errored on my HTML.
I forgot to add my solution:
I decided to make my HTML object much simpler, which let me do this without DOM, using plain strings.

Extract data from website via PHP

I am trying to create a simple alert app for some friends.
Basically, I want to extract the "price" and "stock availability" data from web pages like the following two:
http://www.sparkfun.com/commerce/product_info.php?products_id=5
http://www.sparkfun.com/commerce/product_info.php?products_id=9279
I have built the e-mail and SMS alert part, but now I want to get the quantity and price out of the web pages (those two or any others) so that I can compare the price and available quantity and alert us to place an order when a product falls within some thresholds.
I have tried some regexes (found in some tutorials, but I am way too much of a n00b for this) but haven't managed to get them working. Any good tips or examples?
$content = file_get_contents('http://www.sparkfun.com/commerce/product_info.php?products_id=9279');
preg_match('#<tr><th>(.*)</th> <td><b>price</b></td></tr>#', $content, $match);
$price = $match[1];
preg_match('#<input type="hidden" name="quantity_on_hand" value="(.*?)">#', $content, $match);
$in_stock = $match[1];
echo "Price: $price - Availability: $in_stock\n";
It's called screen scraping, in case you need to google for it.
I would suggest that you use a dom parser and xpath expressions instead. Feed the HTML through HtmlTidy first, to ensure that it's valid markup.
For example:
$html = file_get_contents("http://www.example.com");
$html = tidy_repair_string($html);
$doc = new DomDocument();
$doc->loadHtml($html);
$xpath = new DomXPath($doc);
// Now query the document:
foreach ($xpath->query('//table[@class="pricing"]//th') as $node) {
echo $node->textContent, "\n"; // DOM nodes cannot be echoed directly
}
Whatever you do, don't use regular expressions to parse HTML, or bad things will happen. Use a parser instead.
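To make the DOM + XPath suggestion concrete, here is a minimal sketch run against a made-up pricing table, skipping the Tidy step (loadHTML() already tolerates a lot of broken markup). The markup, the pricing class and the extractPricing() name are hypothetical; a real page needs its own XPath expressions, worked out by inspecting its HTML.

```php
<?php
// Sketch of the DOM + XPath approach on a made-up pricing table; the
// markup, class name and field labels are hypothetical, not SparkFun's.
function extractPricing(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = [];
    // Each row is assumed to look like <tr><th>label</th><td>value</td></tr>.
    foreach ($xpath->query('//table[@class="pricing"]//tr') as $row) {
        $label = $xpath->query('th', $row)->item(0);
        $value = $xpath->query('td', $row)->item(0);
        if ($label !== null && $value !== null) {
            $result[trim($label->textContent)] = trim($value->textContent);
        }
    }
    return $result;
}

$html = '<table class="pricing"><tr><th>price</th><td>$9.95</td></tr>'
      . '<tr><th>in stock</th><td>42</td></tr></table>';
print_r(extractPricing($html));
```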
First, this question goes into too much detail. Second, extracting data from a website might not be legitimate. However, I have some hints:
Use Firebug or the Chrome/Safari Inspector to explore the HTML content and find the pattern around the information you need
Test your regex to see if it matches. You may need to do this many times (multi-pass parsing/extraction)
Write a client with cURL or, much simpler, use file_get_contents (NOTE that some hosts disable loading URLs with file_get_contents through the allow_url_fopen setting)
Personally, I'd rather use Tidy to convert the page to valid XHTML and then use XPath to extract the data, instead of regex. Why? Because (X)HTML is not a regular language, and XPath is very flexible. You can also learn XSLT to transform it.
Good luck!
You are probably best off loading the HTML code into a DOM parser like this one and searching for the "pricing" table. However, any kind of scraping you do can break whenever they change their page layout, and is probably illegal without their consent.
The best way, though, would be to talk to the people who run the site, and see whether they have alternative, more reliable forms of data delivery (Web services, RSS, or database exports come to mind).
This is the simplest method I know to extract data from a website. I found that all the data I need is inside <h3> tags, so I prepared this:
<?php
include('simple_html_dom.php');
// Create DOM from URL: put the target web URL in $page
$page = 'http://facebook4free.com/category/facebookstatus/amazing-facebook-status/';
$html = new simple_html_dom();
//Within $html your webpage will be loaded for further operation
$html->load_file($page);
// Collect the matching elements
$links = array();
//In find() I have written h3, so it will only fetch the content of <h3> tags. Change as per your requirement.
foreach($html->find('h3') as $element)
{
$links[] = $element;
}
reset($links);
//$out will be having each of HTML element content you searching for, within that web page
foreach ($links as $out)
{
echo $out;
}
?>
