SimpleHtmlDom Edit and Save file - php

http://simplehtmldom.sourceforge.net/
Looking to parse through an HTML file, make some small changes, and overwrite the current file with the updates. Was wondering if this was possible through simplehtmldom
As of right now, I can access everything, but have no way of displaying the entirely of the HTML With all of the changes included. Is it only possible to grab specific values?
If so, what other methods could I use to accomplish this? I'm afraid the environment I have to work in is extremely limited due to security issues.

The simplehtmldom object can be used as a string, and will just contain the full contents of the modified document.
For example:
echo($html);
file_put_contents($filename, $html);
If you prefer an object-oriented method, you can use save():
$updated = $html->save();
or
$html->save($filename);

Related

Changing/deleting html from file_get_contents

I'm currently using this code:
$blog= file_get_contents("http://powback.tumblr.com/post/" . $post);
echo $blog;
And it works. But tumblr has added a script that activates each time you enter a password-field. So my question is:
Can i remove certain parts with file_get_contents? Or just remove everything above the <html> tag? could i possibly kill a whole div so it wont load at all? And if so; how?
edit:
I managed to do it the simple way. By skipping 766 characters. The script now work as intended!
$blog= file_get_contents("powback.tumblr.com/post/"; . $post, NULL, NULL, 766);
After file_get_contents returns, you have in your hands a string. You can do anything you want to it, including cutting out parts of it.
There are two ways to actually do the cutting:
Using string functions like str_replace, preg_replace and others; the exact recipe depends on what you need to do. This approach is kind of frowned upon because you are working at the wrong level of abstraction, but in some cases it has an unmatched performance to time spent ratio.
Parsing the HTML into a DOM tree, modifying it appropriately (this time working at the appropriate level of abstraction) and then turn it back into a string and echo it. This can be more convenient to work with if your requirements are not dead simple and is easier to maintain, but it typically requires more code to be written.
If you want to do something that's most naturally expressed in HTML document terms ("cutting out this <div>") then don't be tempted and go with the second approach.
At that point, $blog is just a string, so you can use normal PHP functions to alter it. Look into these 2:
http://php.net/manual/en/function.str-replace.php
http://us2.php.net/manual/en/function.preg-replace.php
You can parse your output using simple html dom parser and display olythe contents thatyou really want to display

Mediawiki tag extension - chained tags do not get processed

I'm trying to develop a simple Tag Extension for Mediawiki. So far I'm basically outputing the input as it comes. The problem arises when there are chained tags. For instance, for this example:
function efSampleParserInit( Parser &$parser ) {
$parser->setHook( 'sample', 'efSampleRender' );
return true;
}
function efSampleRender( $input, array $args, Parser $parser, PPFrame $frame ) {
return "hello ->" . $input . "<- hello";
}
If I write this in an article:
This is the text <sample type="1">hello my <sample type="2">brother</sample> John</sample>
Only the first sample tag gets processed. The other one isn't. I guess I should work with the $parser object I receive so I return the parsed input, but I don't know how to do it.
Furthermore, Mediawiki's reference is pretty much non existant, it would be great to have something like a Doxygen reference or something.
Use $parser->recursiveTagParse(), as shown at Manual:Tag_extensions#How do I render wikitext in my extension?.
It is kind of a clunky interface, and not very well documented. The underlying reason why such a seemingly natural thing to do is so tricky to accomplish is that it sort of goes against the original design intent of tag extensions — they were originally conceived as low-level filters that take in raw unparsed text and spit out HTML, completely bypassing normal parsing. So, for example, if you wanted to include some content written in Markdown (such as a StackOverflow post) on a wiki page, the idea was that you could install a suitable extension and then write
<markdown>
**Look,** here's some Markdown text!
</markdown>
on the page, and the MediaWiki parser would leave everything between the <markdown> tags alone and just hand it over to the extension for parsing.
Of course, it turned out that most people who wrote MediaWiki tag extensions didn't really want to replace the parser, but just to apply some tweaks to it. But the way the tag extension interface was set up, the only way to do that was to call the parser recursively. I've sometimes thought it would be nice to add a new parser extension type to MediaWiki, something that looked like tag extensions but didn't interrupt normal parsing in such a drastic manner. Alas, my motivation and copious free time hasn't so far been sufficient to actually do something about it.

Is there a php function for using the source code of another web page?

I want to create a PHP script that grabs the content of a website. So let's say it grabs all the source code for that website and I say which lines of code I need.
Is there a function in PHP that allows you too do this or is it impossible?
Disclaimer: I'm not going to use this for any illegal purposes at all and not asking you too write any code, just tell me if its possible and if you can how I'd go about doing it. Also I'm just asking in general, not for any specific reason. Thanks! :)
file('http://the.url.com') returns an array of lines from a url.
so for the 24th line do this:
$lines = file('http://www.whatever.com');
echo $lines[23];
This sounds like a horrible idea, but here we go:
Use file_get_contents() to get the file. You cannot get the source if the web server first processes it, so you may need to use an extension like .txt. Unless you password protect the file, obviously anybody can get it.
Use explode() with the \n delimiter to split the source code into lines.
Use array_slice() to get the lines you need.
eval() the code.
Note: if you just want the HTML output, then ignore the bit about the source in step 1 and obviously you can skip the whole eval() thing.

Why use an XML parser?

I'm a somewhat experienced PHP scripter, however I just dove into parsing XML and all that good stuff.
I just can't seem to wrap my head around why one would use a separate XML parser instead of just using the explode function, which seems to be just as simple. Here's what I've been doing (assuming there is a valid XML file at the path xml.php):
$contents = file_get_contents("xml.php");
$array1 = explode("<a_tag>", $contents);
$array2 = explode("</a_tag>", $array1[1]);
$data = $array2[0];
So my question is, what is the practical use for an XML parser if you can just separate the values into arrays and extract the data from that point?
Thanks in advance! :)
Excuse me for not going into details but for starters try parsing
$contents = '<a xmlns="urn:something">
<a_tag>
<b>..</b>
<related>
<a_tag>...</a_tag>
</related>
</a_tag>
<foo:a_tag xmlns:foo="urn:something">
<![CDATA[This is another <a_tag> element]]>
</foo:a_tag>
</a>';
with your explode-approach. When you're done we can continue with some trickier things ;-)
In a nutshell, its consistency. Before XML came into wide use there were numerous undocumented formats for keeping information in files. One of the motivators behind XML was to create a well defined, standard document format. With this well defined format in place, a general set of parsing tools could be developed that would work consistently on documents so long as the documents adhered to the aforementioned well defined format.
In some specific cases, your example code will work. However, if the document changes
...
<!-- adding an attribute -->
<a_tag foo="bar">Contents of the Tag</a_tag>
...
...
<!-- adding a comment to the contents -->
<a_tag>Contents <!-- foobar --> of the Tag</a_tag>
...
Your parsing code will probably break. Code written using a correctly defined XML parser will not.
XML parsers:
Handle encoding
May have xpath support
Allow you to easily modify and save the XML; append/remove child nodes, add/remove attributes, etc.
Don't need to load the whole file into memory (except from DOM parsers)
Know about namespaces
...
How would you explode the same file if a_tag had an attribute?
explode("<a_tag>" ... will work differently than explode("<a_tag attr='value'>" ..., after all.
XML Parsers understand the XML specification. Explode can only handle the simplest of cases, and will most likely fail in a lot of instances of that case.
Using a proven XML parsing method will make the code more maintainable and easy to read. It will also make it more easily adaptable should the schema change, and it can make it easier to determine error conditions. XPath and XSLT exist for a reason, they are proven ways to deal with XML data in a sensible, legible manner. I'd suggest you use whichever is applicable in your given situation. Remember, just because you think you're only writing code for one specific purpose, you never know what a piece of well-written code could evolve into.

Read in external links in PHP

I know how to do this in Ruby, but I want to do this in PHP. Grab a page and be able to parse stuff out of it.
Take a look at cURL. Knowing about cURL and how to use it will help in many ways as it's not specific to PHP. If you want something specific however, you can use file_get_contents which is the recommended way in PHP to get the contents of a file into a string.
$file = file_get_contents("http://google.com/");
How to parse it depends on what you are trying to do, but I'd recommend one of the XML libraries for PHP.
You could use fopen in read mode: fopen($url, 'r'); or more simply file_get_contents($url);. You could also use readfile(), but file_get_contents() is potentially more efficient and therefore recommended.
Note: these are dependent on config (see the linked manual page) but will work on most setups.
For parsing, simplexml is enabled by default in PHP.
$xmlObject = simplexml_load_string($string);
// If the string was valid, you now have a fully functional xml object.
echo $xmlObject->username;
Its funny, I had the opposite question when I started rails development

Categories