I'm used to using PHP's Simple HTML DOM Parser (SHDP) to access elements, but I'm using Ruby now with watir-webdriver, and I'm wondering whether it can replace the functionality of SHDP as far as accessing elements on pages goes.
So in SHDP I'd do this:
$ret = $html->find('div[id=foo]');
Which is an array of all instances of divs with id=foo. Oh, and $html is the HTML source of a specified URL. Anyway, so then I'd put it in a loop:
foreach($ret as $element)
echo $element->first_child()->first_child()->first_child()->first_child()->first_child()->first_child()->first_child()->plaintext . '<br>';
Here, each ->first_child() descends one level below the parent div with id=foo (notice I have seven), and then I print the plaintext of the seventh child. Something like this:
<div id="foo">
<div ...>
<div ...>
<div ...>
<div ...>
<div ...>
<div ...>
<div ...>HAPPINESS</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
would get "HAPPINESS" printed. So, my question is, how can this be done using watir-webdriver (if at all possible)?
Also, and more generally, how can I get SHDP's DOM-traversing abilities in watir-webdriver?
I ask because if watir-webdriver can't do this, I'm going to have to figure out a way to pipe source of a browser instance in watir-webdriver to a PHP script that uses SHDP and get it that way, and somehow get it back to ruby with the relevant information...
Watir implements an :index feature (zero-based):
browser.div(id: 'foo').divs # children
browser.div(id: 'foo').div(index: 6) # nth-child
browser.div(id: 'foo').parent # parent
browser.div(id: 'foo').div # first-child
browser.div(id: 'foo').div(index: -1) # last-child
next_sibling and previous_sibling are not currently implemented. If you think you need them for your code, please leave a comment here: https://github.com/watir/watir/pull/270
Note that in general you should prefer using indexes to using collections, but these also work:
browser.div(id: 'foo').divs.first
browser.div(id: 'foo').divs.last
Paperback code example (are you looking to select by text or obtain the text?):
browser.li(text: /Paperback/)
browser.td(class: "bucket").li
browser.table(id: 'productDetailsTable').li
We've also had requests in the past to support things like direct children instead of parsing all of the descendants: https://github.com/watir/watir/issues/329
We're actively working on how we want to improve things in the upcoming versions of Watir, so if this solution does not work for you, please post a suggestion with your ideal syntax for accomplishing what you want here: https://github.com/watir/watir/issues and we'll see how we can support it.
I don't believe there's a .child method to do this for you. If you know it will always be seven child divs in that structure you could do the inelegant
require 'watir-webdriver'
@browser = Watir::Browser.new
puts @browser.div(id: 'foo').div.div.div.div.div.div.div.text
You can always grab a collection of them and then address the last one, assuming it is the last one, the deepest in the stack.
puts @browser.div(id: 'foo').divs.last.text
That would also work, but assumes something absolute about the structure of the page. It's also not equivalent to the iteration of elements you've got above. As I'm not clear on the value of doing it that way I'm not comfortable taking a stab at equivalent code.
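For comparison, here is a minimal sketch of the "follow the first child seven times" loop from the question. It is written in plain Python with the standard library rather than Watir, because it needs no live browser to run. The helper names `nth_first_child` and `deepest_texts` are made up for this illustration, and the markup is assumed to be well-formed so that `xml.etree.ElementTree` can parse it.

```python
import xml.etree.ElementTree as ET

# Same shape as the question's markup: div#foo wrapping seven nested divs.
MARKUP = (
    '<body><div id="foo">'
    + "<div>" * 7 + "HAPPINESS" + "</div>" * 7
    + "</div></body>"
)


def nth_first_child(element, depth):
    """Follow the first child `depth` times; return None if the chain is shorter."""
    node = element
    for _ in range(depth):
        children = list(node)
        if not children:
            return None
        node = children[0]
    return node


def deepest_texts(markup, depth=7):
    """Collect the text found `depth` first-children below every div with id=foo."""
    root = ET.fromstring(markup)
    texts = []
    for div in root.iter("div"):
        if div.get("id") == "foo":
            target = nth_first_child(div, depth)
            if target is not None:
                texts.append((target.text or "").strip())
    return texts


print(deepest_texts(MARKUP))
```

Unlike the `divs.last` shortcut, this walks one first child at a time, so it stops cleanly when a branch is shallower than expected instead of returning some unrelated div.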
Maybe I am not giving you exactly what you were doing in PHP. However, if you know that text of 7th child will be HAPPINESS then you could simply locate an element via XPath:
STEPS:
Given(/^I click the div "(.*?)" xpath$/) do |div_xpath|
  Watir::Wait.until { @browser.div(:xpath => div_xpath).exist? }
  @browser.div(:xpath => div_xpath).click
end
FEATURE:
Given I click the div "//div[@id='foo']//div[text()='HAPPINESS']" xpath
I need to insert a string into the dom elements so that they display when simple_html_dom::__tostring is called, but so they do not affect the simple_html_dom api.
So, if I have a simple_html_dom node where $node->outertext is the following:
<div class="MyClass">
<div itemprop="myVar">
</div>
</div>
Then, I want to assign $string='INSERTED STRING' to auto-insert on display. The output would be like:
<div class="MyClass">
INSERTED STRING
<div itemprop="myVar">
</div>
</div>
So the idea is that when interacting with the elements using the simple_html_dom api, it's as if $string were not inserted. Then, when the HTML is output, $string is inserted immediately after (or before) the opening (or closing) tag.
For example, $node->innertext = $string.$node->innertext is not acceptable because it affects the parsing since $node would have a new child at the beginning.
Is there a built-in way to do that?
If not, would there be a way to accomplish it without editing the source of simple_html_dom?
EDIT: Performance is not a concern because the output will be cached.
AND: I just realized I could just do $node->setAttribute('insertOnDisplay',$string) then scrape the document again before displaying, remove the attribute, and put the attribute value to the innertext. I'll see if I get other better options (and test it out) before posting it as an answer.
I extended simple_html_dom and simple_html_dom_node and in simple_html_dom_node, I created a prepareOutput method that goes through and finds any attributes with jdom-[before|after]:[outertext|innertext] then set the text to be before or after innertext or outertext. It feels like a hacky way to do it, but it works.
I also changed simple_html_dom a little bit to use new static::$domClass for instantiating a dom and new static::$nodeClass for instantiating a node instead of using new static, so that I could make it instantiate my subclass when creating a new node internally
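The marker-attribute idea can be sketched outside simple_html_dom as well. The following is only an illustration in Python's standard library, not the subclass described above: the attribute name `insertOnDisplay` comes from the question's edit, while `render` is a hypothetical helper. The pending string lives in an attribute while the tree is being queried and is turned into leading text only on a copy made for output.

```python
import copy
import xml.etree.ElementTree as ET

MARKER = "insertOnDisplay"  # attribute name borrowed from the question's edit


def render(root):
    """Serialize a copy of the tree, turning marker attributes into leading text."""
    out = copy.deepcopy(root)  # the tree used for querying stays untouched
    for element in out.iter():
        pending = element.attrib.pop(MARKER, None)
        if pending is not None:
            element.text = pending + (element.text or "")
    return ET.tostring(out, encoding="unicode")


root = ET.fromstring('<div class="MyClass"><div itemprop="myVar"></div></div>')
root.set(MARKER, "INSERTED STRING")

print(len(list(root)))  # child count seen by traversal code is unchanged
print(render(root))
```

Because the string is stored as an attribute, child and sibling traversal sees exactly the same nodes as before; only the rendered copy contains the inserted text.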
Could you please help me?
I'm trying to scrape a website using the PHP Simple HTML DOM Parser from http://simplehtmldom.sourceforge.net/
The problem is that the tags I need to identify have IDs with the same beginning but different endings.
For example this is the structure:
<div id="postmenu_2861574">
<div id="post_message_2861574"> one posted message </div>
</div>
<div id="postmenu_2861617">
<div id="post_message_2861617"> another posted message </div>
</div>
All the IDs share the same beginning, "postmenu_" or "post_message_", but the endings differ.
Is it possible to gather all the posts without knowing every ID's ending?
Is there a way, like in SQL, to use a % sign at the end of the search phrase?
The simple approach below didn't work: the variable $postmenu came back empty.
foreach($html->find('div#postmenu_') as $postmenu)
$item['message'] = $article->find('div#post_message_', 0)->plaintext;
thank you for the help
According to the CSS2 selector specification (http://www.w3.org/TR/CSS2/selector.html), an ID selector cannot match on a prefix, so what you are asking is not possible with div#postmenu_.
I would make all divs with post messages the same class, e.g. class="post_message".
Then you can find all divs with this class using:
foreach($html->find('div.post_message') ...
Since you are scraping a website, performance is probably not an issue. In this case you can simply find all divs and check the ID, to see if it matches.
foreach($html->find('div') ...
// retrieve ID
if (0 === strpos($id, 'post_message_'))
...
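The "find every div, then check whether its ID starts with the prefix" idea can be sketched as follows. This is Python's standard `html.parser` rather than Simple HTML DOM, and the names `PrefixDivCollector` and `collect_by_id_prefix` are invented for the illustration; the sample markup is copied from the question.

```python
from html.parser import HTMLParser

SAMPLE = """
<div id="postmenu_2861574">
<div id="post_message_2861574"> one posted message </div>
</div>
<div id="postmenu_2861617">
<div id="post_message_2861617"> another posted message </div>
</div>
"""


class PrefixDivCollector(HTMLParser):
    """Collect the text of every <div> whose id starts with a given prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.messages = {}  # matching id -> text collected so far
        self._stack = []    # one entry per open <div>: matching id or None

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            div_id = dict(attrs).get("id") or ""
            matched = div_id if div_id.startswith(self.prefix) else None
            if matched:
                self.messages[matched] = ""
            self._stack.append(matched)

    def handle_endtag(self, tag):
        if tag == "div" and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Text belongs to every matching div that is currently open.
        for div_id in self._stack:
            if div_id:
                self.messages[div_id] += data


def collect_by_id_prefix(html, prefix):
    parser = PrefixDivCollector(prefix)
    parser.feed(html)
    parser.close()
    return {div_id: text.strip() for div_id, text in parser.messages.items()}


print(collect_by_id_prefix(SAMPLE, "post_message_"))
```

The prefix test is the same `strpos(...) === 0` check as above, just expressed with `str.startswith`.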
I am using Selenium WebDriver wrapped in PHPUnit and Sausage to test clicking a button in a specific row in a table that's laid out similar to:
<tr id="dynamically generated 1">
<td class="foo">
<div class="bar"></div>
</td>
<td class = "mybutton">
<span class = "icon clickable"></span>
</td>
</tr>
<tr id="dynamically generated 2">
<td class="foo">
<div class="baz"></div>
</td>
<td class = "mybutton">
<span class = "icon clickable"></span>
</td>
</tr>
In particular, I want to click a specific element .mybutton > span.icon.clickable whose parent cell has a sibling .foo with child .baz. The "sibling is .foo with child .baz" requirement is the only way I can currently identify the correct element, as other rows in the same table have an element .mybutton > span.icon.clickable, and the ids for those rows are dynamically generated.
At the moment I am using XPath, but as you might expect, performance on FF and IE is horrendous. Is there a method for retrieving the value of tr#id from the element tr#id div.bar? If I can get this, I can use the id to use CSS to find the element I am looking for. I am using PHP, but a solution in any language would be useful.
Alternatively, a more straightforward CSS3 solution would work, but after quite a bit of reading, I've all but concluded that using a standard CSS3 selector is not an option for this case. Just in case there is something I'm missing, is there a CSS3 solution for this? I know there is a CSS4 solution, but I need full browser support, so until all the browsers I am testing support CSS4, I'll have to rely on CSS3.
Thanks in advance.
EDIT: Until there is better cross-browser support for CSS4, I need to use CSS3
The only way I can think to do this is to find a List<WebElement> generated by By.cssSelector(".foo~span.icon.clickable") and on each element do a findElement(By.cssSelector(".baz")) surrounded by try/catch (catching NoSuchElementExceptions).
When there isn't an error thrown, then you know that you have found your element.
Note: The ~ combinator matches if the element has ANY preceding .foo sibling. If you want only the immediately preceding sibling, use +
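A variation on this filter-then-check idea is to work per row: keep the rows whose .foo cell contains a .baz, then read the row id and use it in an ordinary CSS selector. The sketch below shows only the filtering logic, on static markup, with Python's standard library instead of WebDriver. The ids `row-1` and `row-2` are stand-ins for the dynamically generated ones, and `find_button_row_ids` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

TABLE = """<table>
<tr id="row-1">
  <td class="foo"><div class="bar"></div></td>
  <td class="mybutton"><span class="icon clickable"></span></td>
</tr>
<tr id="row-2">
  <td class="foo"><div class="baz"></div></td>
  <td class="mybutton"><span class="icon clickable"></span></td>
</tr>
</table>"""


def find_button_row_ids(markup, marker_class="baz"):
    """Return ids of rows that have both the marker div and a clickable button."""
    root = ET.fromstring(markup)
    ids = []
    for row in root.iter("tr"):
        marker = row.find(f"./td[@class='foo']/div[@class='{marker_class}']")
        button = row.find("./td[@class='mybutton']/span[@class='icon clickable']")
        if marker is not None and button is not None:
            ids.append(row.get("id"))
    return ids


for row_id in find_button_row_ids(TABLE):
    # With the id known, a plain CSS3 selector is enough to reach the button.
    print(f"#{row_id} .mybutton > span.icon.clickable")
```

Note that ElementTree's attribute predicate compares the whole class string, so this sketch assumes the class attributes are written exactly as shown.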
There is no CSS way to select this, because it would require a selector that looks at previous elements, which is not available in CSS3.
However, this would be quite simple in jQuery. I made a selector for your situation
$(".foo").has(".baz").siblings(".mybutton").find("span.icon.clickable").addClass('red');
$(".foo") - selects elements with a class of foo
.has(".baz") - only keeps elements that have a descendant with a class of baz. In this case, .foo with a .baz element inside it
.siblings(".mybutton") - looks for an element with the class of mybutton on the same level as the previous element. Since we used has() instead of .foo .baz, this will still target the .foo element
.find("span.icon.clickable") - looks for a descendant span of the previous element with classes of icon and clickable
.addClass('red'); - just a function to finish the example
There is a sibling combinator in CSS, written ~, and there are some CSS4 selectors which might help.
Perhaps this would work, if I understood the requirement correctly:
#mybutton > .foo ~ span.icon.clickable! > .baz
This should work in Chrome, but it's CSS4, so it probably won't work in a lot of browsers.
Hint: It selects the span.icon.clickable which has a child .baz
First off, I'm brand new to PHP so I'm sorry if this is a stupid question, second of all sorry if this title is incorrect.
Now, what I'm trying to do is create an overlay for a game that I play. My code for the overlay works perfectly, and now I'm working on my HTML file which gets its information from a website and outputs it. The code on the website looks like this:
<span id="example1">Information I want</span>
<span id="example2">More Info I want</span>
...
<span id="example3">And some more</span>
Now what I want to do is create a PHP script which goes in and finds elements by their names and gives me the information in those span tags. Here's what I've tried so far, it's not working however (no surprise):
//Some HTML here
<?php
$doc = new DomDocument;
$doc->validateOnParse = true;
$doc->Load('www.website.com');
echo "Example1: " . $doc->getElementById('example1') . "\n";
?>
//More HTML
To be honest, I have no clue what I'm doing. If anyone could show me an example of how to do this properly, or to point me in the right direction I would appreciate it.
The text between open and close tags is a Text Node.
Just write $doc->getElementById('example1')->nodeValue
Your code seems along the right lines, but you're missing a few things.
First of all, your load call is literally looking for a file named "www.website.com". If it's a remote file, you must include the http:// prefix.
Then, you are attempting to echo out the node itself, whereas you want its value (i.e. its contents).
Try $doc->getElementById("example1")->nodeValue instead.
That should do it. You may want to add libxml_use_internal_errors(true); so that any errors in the source file won't destroy your page with PHP errors. Also, I would suggest using loadHTMLFile instead of load, as this will be more lenient towards malformed documents.
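To make the node-versus-value distinction concrete, here is a small sketch of "find the element by id, then read its text". It uses Python's standard `html.parser` on a string instead of PHP's DOMDocument on a URL, and `TextById` and `text_by_id` are hypothetical names for this illustration only.

```python
from html.parser import HTMLParser

PAGE = """
<span id="example1">Information I want</span>
<span id="example2">More Info I want</span>
<span id="example3">And some more</span>
"""

# Tags that never have a closing tag, so they must not change the depth count.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TextById(HTMLParser):
    """Collect the text inside the first element whose id matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # greater than 0 while inside the target element
        self.found = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def text_by_id(html, target_id):
    """Return the element's text, or None when no element has that id."""
    parser = TextById(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() if parser.found else None


print("Example1:", text_by_id(PAGE, "example1"))
```

The element lookup and the text extraction are separate steps here too, which is the same split as getElementById followed by nodeValue.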
You can use getElementById:
$a = $doc->getElementById("example1");
var_dump($a); // shows what the node contains, so you can decide what to echo
You can also name all the elements in the HTML as example[] and then foreach over the example array, so you can get each element from the array with just one line of code.
I'm trying to retrieve the game mode of a server.
This is the code:
<p>
<strong>Grand Bazaar</strong>
<span class="bullet">•</span>
Rush •
<img src="src.png">
</p>
I'm trying to find Rush. I tried this script:
foreach($html->find('p .bullet') as $e)
{
$mode = $e->nextSibling ();
}
But the script just skips "Rush" and continues over to the next tag.
I'm sure you guys know what you're doing better than me.
Could anyone help me out here?
You need to make your questions clearer mate... "I'm trying to retrieve the game mode of a server" <- This is irrelevant in relation with your problem for example.
The problem you're having is that "Rush" is nothing but text, it's not a sibling of .bullet as that would imply Rush being the content of a tag that's a sibling to .bullet, say
<span class="bullet">•</span>
<span>Rush •</span>
<img src="src.png">
If the structure you presented is identical all the time though, and you're using Simple HTML DOM by the looks of the code (http://simplehtmldom.sourceforge.net/), then you could maybe clear the contents of the tag first:
$strong = $html->find('strong', 0); // index 0 returns the first match rather than an array; I think you can use prev_sibling() in your example
$strong->innertext = '';
And then just strip_tags() on the whole paragraph and get the text?
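Another way to see why the sibling lookup skips "Rush": in a DOM tree the word is a bare text node sitting between two elements. Python's standard `xml.etree.ElementTree` exposes that text as the `tail` of the preceding element, which makes for a short sketch. This is an illustration outside Simple HTML DOM, it assumes the paragraph is well-formed (note the self-closed img), and `text_after_bullet` is a made-up helper.

```python
import xml.etree.ElementTree as ET

PARAGRAPH = """<p>
<strong>Grand Bazaar</strong>
<span class="bullet">\u2022</span>
Rush \u2022
<img src="src.png" />
</p>"""


def text_after_bullet(markup):
    """Return the bare text that follows span.bullet, without the trailing bullet."""
    paragraph = ET.fromstring(markup)
    bullet = paragraph.find("./span[@class='bullet']")
    if bullet is None or bullet.tail is None:
        return None
    # `tail` is the text between this element's end tag and the next element.
    return bullet.tail.replace("\u2022", "").strip()


print(text_after_bullet(PARAGRAPH))
```

The next sibling element of the span is still the img, exactly as in the question; the game mode only shows up when the text between the two elements is read.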