insert string on simple html dom output - php

I need to insert a string into the dom elements so that they display when simple_html_dom::__tostring is called, but so they do not affect the simple_html_dom api.
So, if I have a simple_html_dom node where $node->outertext is the following:
<div class="MyClass">
<div itemprop="myVar">
</div>
</div>
Then, I want to assign $string='INSERTED STRING' to auto-insert on display. The output would be like:
<div class="MyClass">
INSERTED STRING
<div itemprop="myVar">
</div>
</div>
So the idea is that when interacting with the elements using the simple_html_dom api, it's as if $string were not inserted. Then, when the html is output, $string is inserted immediately after (or before) the opening (or closing) tag.
For example, $node->innertext = $string.$node->innertext is not acceptable because it affects the parsing since $node would have a new child at the beginning.
Is there a built-in way to do that?
If not, would there be a way to accomplish it without editing the source of simple_html_dom?
EDIT: Performance is not a concern because the output will be cached.
AND: I just realized I could just do $node->setAttribute('insertOnDisplay',$string), then scrape the document again before displaying, remove the attribute, and put the attribute value into the innertext. I'll see if I get any better options (and test it out) before posting it as an answer.

I extended simple_html_dom and simple_html_dom_node. In simple_html_dom_node, I created a prepareOutput method that finds any attributes named jdom-[before|after]:[outertext|innertext] and places their text before or after the innertext or outertext. It feels like a hacky way to do it, but it works.
I also changed simple_html_dom a little to use new static::$domClass for instantiating a dom and new static::$nodeClass for instantiating a node instead of using new static, so that it instantiates my subclass when creating a new node internally.
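The marker-attribute idea can be sketched without touching any library source. The example below is only an illustration and swaps in PHP's built-in DOMDocument for simple_html_dom so that it runs on its own; the attribute name data-insert-on-display and the function renderWithInsertions are made up for this sketch. The tree keeps its original children, and the string only becomes a text node in a copy that is made at output time.

```php
<?php
// Hypothetical marker attribute; any name not otherwise used in the markup would do.
const INSERT_ATTR = 'data-insert-on-display';

/** Render the document, turning each marker attribute into leading text. */
function renderWithInsertions(DOMDocument $doc): string
{
    // Work on a copy so the tree the rest of the code queries stays untouched.
    $copy = $doc->cloneNode(true);
    $xpath = new DOMXPath($copy);

    foreach ($xpath->query('//*[@' . INSERT_ATTR . ']') as $el) {
        $text = $copy->createTextNode($el->getAttribute(INSERT_ATTR));
        $el->insertBefore($text, $el->firstChild); // right after the opening tag
        $el->removeAttribute(INSERT_ATTR);
    }

    // loadHTML() wraps fragments in <html><body>; serialize only the body's children.
    $html = '';
    foreach ($copy->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $html .= $copy->saveHTML($child);
    }
    return $html;
}

$doc = new DOMDocument();
$doc->loadHTML('<div class="MyClass"><div itemprop="myVar"></div></div>');

$node = $doc->getElementsByTagName('div')->item(0);
$node->setAttribute(INSERT_ATTR, 'INSERTED STRING');

// The marker is only an attribute, so the node still has a single child.
$childCountBefore = $node->childNodes->length;

$output = renderWithInsertions($doc);
echo $output, "\n";
```

The same shape should carry over to simple_html_dom: set the attribute during processing, then do the attribute-to-text pass once, just before the cached output is written.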


How to traverse DOM (children/siblings) using watir-webdriver?

I'm used to using PHP's Simple HTML DOM Parser (SHDP) to access elements, but I'm using Ruby now with watir-webdriver, and I'm wondering if this can replace the functionality of SHDP as far as accessing elements on pages goes.
So in SHDP I'd do this:
$ret = $html->find('div[id=foo]');
Which is an array of all instances of divs with id=foo. Oh, and $html is the HTML source of a specified URL. Anyway, so then I'd put it in a loop:
foreach ($ret as $element) {
    echo $element->first_child()->first_child()->first_child()->first_child()->first_child()->first_child()->first_child()->plaintext . '<br>';
}
Now, here, each ->first_child() is a child of the parent div with id=foo (notice I have seven) and then I print the plaintext of the 7th child. Something like this
<div id="foo">
<div ...>
<div ...>
<div ...>
<div ...>
<div ...>
<div ...>
<div ...>HAPPINESS</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
would get "HAPPINESS" printed. So, my question is, how can this be done using watir-webdriver (if at all possible)?
Also, and more generally, how can I get SHDP's DOM-traversing abilities in watir-webdriver? (The original post included a screenshot here.)
I ask because if watir-webdriver can't do this, I'm going to have to figure out a way to pipe source of a browser instance in watir-webdriver to a PHP script that uses SHDP and get it that way, and somehow get it back to ruby with the relevant information...
Watir implements an :index feature (zero-based):
browser.div(id: 'foo').divs # children
browser.div(id: 'foo').div(index: 6) # nth-child
browser.div(id: 'foo').parent # parent
browser.div(id: 'foo').div # first-child
browser.div(id: 'foo').div(index: -1) # last-child
next_sibling and previous_sibling are not currently implemented. Please comment here if you think they are necessary for your code: https://github.com/watir/watir/pull/270
Note that in general you should prefer using indexes to using collections, but these also work:
browser.div(id: 'foo').divs.first
browser.div(id: 'foo').divs.last
Paperback code example (are you looking to select by text or obtain the text?):
browser.li(text: /Paperback/)
browser.td(class: "bucket").li
browser.table(id: 'productDetailsTable').li
We've also had requests in the past to support things like direct children instead of parsing all of the descendants: https://github.com/watir/watir/issues/329
We're actively working on how we want to improve things in the upcoming versions of Watir, so if this solution does not work for you, please post a suggestion with your ideal syntax for accomplishing what you want here: https://github.com/watir/watir/issues and we'll see how we can support it.
I don't believe there's a .child method to do this for you. If you know it will always be seven child divs in that structure you could do the inelegant
require 'watir-webdriver'
@browser = Watir::Browser.new
puts @browser.div(id: 'foo').div.div.div.div.div.div.div.text
You can always grab a collection of them and then address the last one, assuming it is the last one, the deepest in the stack.
puts @browser.div(id: 'foo').divs.last.text
That would also work, but assumes something absolute about the structure of the page. It's also not equivalent to the iteration of elements you've got above. As I'm not clear on the value of doing it that way I'm not comfortable taking a stab at equivalent code.
Maybe I am not giving you exactly what you were doing in PHP. However, if you know that text of 7th child will be HAPPINESS then you could simply locate an element via XPath:
STEPS:
Given(/^I click the div "(.*?)" xpath$/) do |div_xpath|
  Watir::Wait.until { @browser.div(:xpath => div_xpath).exist? }
  @browser.div(:xpath => div_xpath).click
end
FEATURE:
Given I click the div "//div[@id='foo']//div[text()='HAPPINESS']" xpath

I want to get data from another website and display it on mine, but with my own style.css

So my school has this very annoying way to view my timetable.
You have to click through 5 links to get to it.
This is the link for my class (it updates weekly without the link changing):
https://webuntis.a12.nl/WebUntis/?school=roc%20a12#Timetable?type=1&departmentId=0&id=2147
I want to display the content from that page on my website, but with my own stylesheet.
I don't mean this:
<?php
$homepage = file_get_contents('http://www.example.com/');
echo $homepage;
?>
or an iframe....
I think this would be better done with jQuery and Ajax. You can have jQuery load the target page, use selectors to pull out what you need, then attach it to your document tree. You should then be able to style it any way you like.
I would recommend you to use the cURL library: http://www.php.net/manual/en/curl.examples.php
But you have to extract the part of the page you want to display, because you will get the whole HTML document.
You'd probably read the whole page into a string variable (using file_get_contents like you mentioned for example) and parse the content, here you have some possibilities:
Regular expressions
Walking the DOM tree (e.g. using PHP's DOMDocument classes)
After that, you'd most likely replace all the style="..." or class="..." information with your own.
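As a rough sketch of the DOM-walking option: the example below parses a page, pulls out one element, and strips the remote site's style and class attributes so your own stylesheet can take over. The HTML string stands in for what file_get_contents or cURL would return, and the ids and the class name my-timetable are invented for the illustration.

```php
<?php
// Stand-in for the fetched page; in practice this string would come from
// file_get_contents() or cURL.
$page = <<<HTML
<html><body>
  <div id="nav" class="menu">Navigation</div>
  <table id="timetable" class="grid" style="color:red">
    <tr><td class="cell" style="font-weight:bold">Monday</td><td>Math</td></tr>
  </table>
</body></html>
HTML;

libxml_use_internal_errors(true); // real-world pages are rarely valid HTML
$doc = new DOMDocument();
$doc->loadHTML($page);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$table = $xpath->query('//table[@id="timetable"]')->item(0);

// Drop the remote site's styling hooks, but only inside the extracted part.
foreach ($xpath->query('descendant-or-self::*[@style or @class]', $table) as $el) {
    $el->removeAttribute('style');
    $el->removeAttribute('class');
}

// Hypothetical class name from your own stylesheet.
$table->setAttribute('class', 'my-timetable');

// Serialize just that element, not the whole document.
$fragment = $doc->saveHTML($table);
echo $fragment, "\n";
```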

Getting information from ID's

First off, I'm brand new to PHP so I'm sorry if this is a stupid question, second of all sorry if this title is incorrect.
Now, what I'm trying to do is create an overlay for a game that I play. My code for the overlay works perfectly, and now I'm working on my HTML file which gets its information from a website and outputs it. The code on the website looks like this:
<span id="example1">Information I want</span>
<span id="example2">More Info I want</span>
...
<span id="example3">And some more</span>
Now what I want to do is create a PHP script which goes in and finds elements by their names and gives me the information in those span tags. Here's what I've tried so far, it's not working however (no surprise):
//Some HTML here
<?php
$doc = new DomDocument;
$doc->validateOnParse = true;
$doc->Load('www.website.com');
echo "Example1: " . $doc->getElementById('example1') . "\n";
?>
//More HTML
To be honest, I have no clue what I'm doing. If anyone could show me an example of how to do this properly, or to point me in the right direction I would appreciate it.
The text between open and close tags is a Text Node.
Just write $doc->getElementById('example1')->nodeValue
Your code seems along the right lines, but you're missing a few things.
First of all, your load call is literally looking for a file named "www.website.com". If it's a remote file, you must include the http:// prefix.
Then, you are attempting to echo out the node itself, whereas you want its value (ie. its contents).
Try $doc->getElementById("example1")->nodeValue instead.
That should do it. You may want to add libxml_use_internal_errors(true); so that any errors in the source file won't destroy your page with PHP errors. Also, I would suggest using loadHTMLFile instead of load, as this will be more lenient towards malformed documents.
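Putting those suggestions together, a minimal sketch could look like this. It parses an inline string so that it runs without network access; for the real page you would call loadHTMLFile with the full http:// URL instead of loadHTML. The id "missing" is only there to show the null case.

```php
<?php
// Stand-in for the remote page.
$html = '<html><body>'
      . '<span id="example1">Information I want</span>'
      . '<span id="example2">More Info I want</span>'
      . '</body></html>';

libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$values = [];
foreach (['example1', 'example2', 'missing'] as $id) {
    $el = $doc->getElementById($id);
    // getElementById() returns null for an unknown id, so guard before reading.
    $values[$id] = $el !== null ? $el->nodeValue : null;
}

echo 'Example1: ', $values['example1'], "\n";
echo 'Example2: ', $values['example2'], "\n";
```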
You can use getElementById:
$a = $doc->getElementById("example1");
var_dump($a); // shows what the node contains, so you can decide what to echo
You can also name all the elements in the HTML example[] and then loop over the example array with foreach, so you can read every element with just one line of code.

At certain points in the HTML, I need to close every open tag, insert a DIV, then open all tags again, in order

I am dealing with HTML that's been generated with FCKeditor. So it will look something like this:
<p>Paragraph</p>
<ul>
<li>List item</li>
</ul>
No head tag, no body tag, just a snippet of HTML.
I am trying to add support for certain variables that, when inserted into the HTML, will be replaced with dynamic content. So the HTML, variable inserted, might look like this:
<p>Here's a variable: {widget}</p>
I want to replace {widget} with this:
<div class="widget">Hi, I'm a widget.</div>
FCKeditor encapsulates content (rightly) into paragraphs when you insert a line break. So if I did a straight replace, the resulting HTML would be this:
<p>Here's a variable: <div class="widget">Hi, I'm a widget.</div></p>
That's not going to work because the div tag is inside of the p tag. So what I want to do is close the paragraph and insert the DIV after it:
<p>Here's a variable: </p>
<div class="widget">Hi, I'm a widget.</div>
Let's take this example:
<p class="someclass">Here's a <strong>variable: {widget} more</strong> content
after</p>
I would want this result:
<p class="someclass">Here's a <strong>variable: </strong></p>
<div class="widget">Hi, I'm a widget.</div>
<p class="someclass"><strong> more</strong> content after</p>
At every instance of {widget} in the HTML snippet, I need to make a "break" in the HTML: close every open tag, insert the widget code, then open them all again in order.
Is this possible using a PHP HTML parser? If so, how would I go about it?
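For what it's worth, the literal close-insert-reopen operation can be sketched with PHP's built-in DOMDocument. This is only an illustration under some assumptions: the function names splitAtMarker and charCount and the wrapper id split-root are made up, the marker must not contain double quotes, and the widget markup must be well-formed XML. Empty leftovers (for example an empty strong when the marker starts an element) are not cleaned up.

```php
<?php
/** Count characters (not bytes) so offsets match what DOMText expects. */
function charCount(string $s): int
{
    return (int) preg_match_all('/./su', $s);
}

/**
 * Replace every $marker in an HTML snippet with $widgetXml, closing all open
 * tags before the widget and reopening them after it.
 */
function splitAtMarker(string $html, string $marker, string $widgetXml): string
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // The wrapper div gives the snippet a single root; the XML prolog is a
    // common trick to make loadHTML() read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="split-root">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $root = $xpath->query('//div[@id="split-root"]')->item(0);
    $query = './/text()[contains(., "' . $marker . '")]';

    while (($text = $xpath->query($query, $root)->item(0)) !== null) {
        // Cut the text node in two at the marker and drop the marker itself.
        $before = strstr($text->data, $marker, true);
        $node = $text->splitText(charCount($before));
        $node->deleteData(0, charCount($marker));

        // Climb towards the root. At each level, clone the parent without its
        // children and move $node plus everything after it into the clone.
        while (!$node->parentNode->isSameNode($root)) {
            $parent = $node->parentNode;
            $clone = $parent->cloneNode(false);
            for ($cur = $node; $cur !== null; $cur = $next) {
                $next = $cur->nextSibling;
                $clone->appendChild($cur);
            }
            $parent->parentNode->insertBefore($clone, $parent->nextSibling);
            $node = $clone;
        }

        // $node is now a direct child of the root: the widget goes before it.
        $widget = $doc->createDocumentFragment();
        $widget->appendXML($widgetXml);
        $root->insertBefore($widget, $node);
    }

    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$result = splitAtMarker(
    '<p class="someclass">Here\'s a <strong>variable: {widget} more</strong> content after</p>',
    '{widget}',
    '<div class="widget">Hi, I\'m a widget.</div>'
);
echo $result, "\n";
```

Because each cloned ancestor is shallow, attributes such as class="someclass" are carried over to the reopened tags, which matches the result asked for above. The serializer may add line breaks between the top-level elements.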
I would suggest an entirely different approach. (F)CKEditor can already do what you want. Just try to add a table in the middle of a paragraph. It will close the inline tag stack, add the table, and reopen the stack again.
I suggest that, instead of having your users write {widget}, you write an (F)CKEditor plugin that adds the widgets for you. You can take a look at the code for the table button (or any other block-level element) to see how (F)CKEditor inserts them.
There are two things you can do when a user hits the "widget" button. Either you insert some custom tag such as <widget type="foo" />, or you insert an HTML tag that you can recognise later on, like <div class="widget foo"></div>.
With some extra elbow grease you can even make this fancier by actually loading the widget itself, wrapped in such tags. That way, the user would see exactly the same in the editor window as when it was stored. When the editor saves to the server, simply empty the tags wrapping the widget to get rid of it.
Example workflow (cursor marked by | sign):
User types text:
<p>foo bar| baz</p>
User hits "widget" button:
<p>foo bar</p>
<div class="widget foo"> ... contents of "foo" widget ... </div>
<p>|baz</p>
When saving, drop the widget contents:
<p>foo bar</p>
<div class="widget foo"></div>
<p>baz</p>
When displaying the saved contents, parse for div tags with a "widget" class and dynamically fill it:
<p>foo bar</p>
<div class="widget foo"> ... contents of "foo" widget ... </div>
<p>baz</p>
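The server-side half of that workflow (empty the widget divs on save, fill them on display) might be sketched in PHP as follows. This assumes the second class name identifies the widget type, as in "widget foo"; the function names and the registry array are invented for the example, and the widget content is inserted as plain text to keep it short.

```php
<?php
/** Parse an HTML snippet; loadHTML() wraps it in <html><body> for us. */
function loadSnippet(string $html): DOMDocument
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    return $doc;
}

/** Serialize only the snippet, without the <html><body> wrapper. */
function saveSnippet(DOMDocument $doc): string
{
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

/** All divs that carry "widget" as one of their class names. */
function widgetDivs(DOMDocument $doc): array
{
    $xpath = new DOMXPath($doc);
    $query = '//div[contains(concat(" ", normalize-space(@class), " "), " widget ")]';
    return iterator_to_array($xpath->query($query));
}

/** On save: drop whatever the editor rendered inside each widget div. */
function emptyWidgets(string $html): string
{
    $doc = loadSnippet($html);
    foreach (widgetDivs($doc) as $div) {
        while ($div->firstChild !== null) {
            $div->removeChild($div->firstChild);
        }
    }
    return saveSnippet($doc);
}

/** On display: fill each widget div from a registry keyed by its other class name. */
function fillWidgets(string $html, array $registry): string
{
    $doc = loadSnippet($html);
    foreach (widgetDivs($doc) as $div) {
        foreach (preg_split('/\s+/', trim($div->getAttribute('class'))) as $class) {
            if (isset($registry[$class])) {
                $div->appendChild($doc->createTextNode($registry[$class]));
                break;
            }
        }
    }
    return saveSnippet($doc);
}

$saved = emptyWidgets('<p>foo bar</p><div class="widget foo">old rendering</div><p>baz</p>');
$shown = fillWidgets($saved, ['foo' => 'contents of foo widget']);
echo $shown, "\n";
```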
This could be done post-process when saving with regex if you were pretty careful about what you allowed. Alternatively, I do a fair amount of juggling on the front end with my editor (CKEditor) output, combining the user-input content plus things that I jam in both between and around the string that I parse and regex.
Another option to be explored is the BBCode plugin that CKEditor has added. Having been a longtime user of FCK plus a current user of CK, I can tell you that it's well worth the time to make the upgrade. And, according to the CK Developer site, it claims to be built-in. I also found a plugin that will allow BBCode. Both could easily be adapted for your purpose.
Finally, if you're adventurous and confident with Javascript, the HTML Processor can be hacked to do quite a few things. For instance, CK now outputs with styles rather than traditional HTML, and my editor does strictly HTML Emails which don't support style declarations so well. So, I hacked the HTML Processor's code to output everything with height=, width=, align= etc rather than style="height:x; width:x" etc.

Getting and placing content within html tag by its class using php

Is it possible to get and place content within an html tag by its class name?
For Example:
<div class='edit'>
Wow! I'm the Content.
</div>
Is it possible to get that value, edit and place it back or a new value to that div etc? If it's possible... will it work if it has multiple classes? Like:
<div class='span-20 edit'>
Wow! I'm the Content.
</div>
If you can determine which specific HTML tag to manipulate, you have various tools at your disposal. You can use str_replace, preg_replace, DOMDocument, DOMXPath, and simplexml in this situation.
If in PHP, try this:
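$xhtml = simplexml_load_string("<div class='edit'>Wow! I'm the Content.</div>");
// Select every div whose class attribute is exactly "edit"
$divs = $xhtml->xpath("//div[@class='edit']");
if (!empty($divs)) {
    foreach ($divs as $div) {
        // Append a second class to the existing attribute value
        $div['class'] .= ' span-20';
    }
}
return $xhtml->asXML();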
With jQuery javascript library, do this:
$('.edit').addClass('span-20');
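On the multiple-classes part of the question: an exact attribute comparison such as @class='edit' would not match class='span-20 edit'. A sketch that matches "edit" as one class token among several, then reads and replaces the content, could look like this with DOMDocument and DOMXPath. The second div and the replacement text are invented for the example.

```php
<?php
$html = '<div class="span-20 edit">Wow! I\'m the Content.</div>'
      . '<div class="editor">Not me.</div>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Pad the class list with spaces so "edit" only matches as a whole token:
// "span-20 edit" matches, "editor" does not.
$query = '//*[contains(concat(" ", normalize-space(@class), " "), " edit ")]';

$old = [];
foreach ($xpath->query($query) as $el) {
    $old[] = $el->textContent;          // get the current content
    $el->textContent = 'New content.';  // put a new value back
}

// Serialize without the <html><body> wrapper that loadHTML() adds.
$result = '';
foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}
echo $result, "\n";
```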
