XPath issue using Behat for an AngularJS application - PHP

I've been trying to build a web crawler for the company I work for, using Behat combined with the Mink extension. I did something similar in the past, but the difference is that the page I am trying to crawl now is built in AngularJS.
This seems to cause a problem for my XPath selectors, which cannot locate the elements I request in the DOM.
To double-check that I have the right XPath, I also use the XPath Helper extension for Chrome.
The function I am trying to use is:
/**
 * @Then I login
 */
public function login()
{
    $page = $this->getSession()->getPage();
    $username = $page->find('xpath', "//*[@id='inputEmail']/self::INPUT");
    $password = $page->find('xpath', "//*[@id='inputPassword']/self::INPUT");
    if ($username == null && $password == null) {
        echo "nothing found";
    } else {
        $username->setValue('username');
        $password->setValue('password');
    }
}
In my .feature file I am able to reach the login page and verify that I am there by "seeing" some text in the DOM, but when I try to fill in the fields above I get an error.
I suspect it is an Angular issue: the way it creates the DOM elements leaves my XPath unable to locate the fields I need, but I can't think of a workaround.
Of course, a JS framework like Protractor could be a solution, but I would like to stay in PHP, as it is easier for my team to maintain.
Any ideas would be more than welcome.

A few ideas:
Use page objects to clean up your code
Make sure the element is present before using it by waiting for it (find() returns either the element or null)
If you don't need the UI, use Guzzle or a headless driver
If you are using the UI, try to create a wait condition specific to Angular
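The "wait for the element" idea can be sketched as a small polling helper. This is only a minimal sketch: waitFor() is a hypothetical helper name, not part of Behat or Mink, and the closure below merely stands in for a call such as $page->find('xpath', ...), which returns null until the element exists.

```php
<?php
/**
 * Hypothetical helper: polls $probe until it returns a non-null value
 * or until $timeoutSeconds have elapsed.
 */
function waitFor(callable $probe, float $timeoutSeconds = 5.0, int $intervalMicros = 100000)
{
    $deadline = microtime(true) + $timeoutSeconds;
    do {
        $result = $probe();
        if ($result !== null) {
            return $result;
        }
        usleep($intervalMicros); // pause briefly before the next attempt
    } while (microtime(true) < $deadline);

    throw new RuntimeException('Timed out waiting for condition');
}

// Stand-in for $page->find(...): returns null twice, then a value.
$calls = 0;
$found = waitFor(function () use (&$calls) {
    $calls++;
    return $calls < 3 ? null : 'element';
}, 2.0, 1000);

echo $found, " after ", $calls, " attempts\n";
```

Inside a Behat context the closure would presumably wrap the real find() call, so that setValue() is only reached once both fields have actually been rendered.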

Related

PHP reading script errors from console

I am trying to test whether my script is loaded on a given website and whether it actually works without any errors on load (later on I will have to do the same for onclick).
So far I have:
$testResult = array();
$homepage = 'http://www.example.dk/';
$data = file_get_contents($homepage);
// strpos() returns 0 for a match at offset 0, so compare against false explicitly
if (strpos($data, 'example_script.js') !== false) {
    $testResult['scriptLoaded'] = true;
    print_r("win");
} else {
    $testResult['scriptLoaded'] = false;
}
This loads the page and checks whether the JavaScript file is referenced on it. But how can I read from the console to check whether there are any errors while loading the script?
Also, is this the right way to check that the script is on the page? The only restriction I have is that I HAVE to use PHP.
The only thing your code can check is whether the string example_script.js appears somewhere in the contents returned from the given URL. If you were to use the URL of this page, you'd get true and "win" too, because the substring would be found.
The JS might be riddled with faults, but since PHP doesn't understand JS, you won't be able to see that.
If you want to test your site without a browser, the only thing I can think of is using PhantomJS.
Using PHP alone, you might be able to do a couple of checks using a scriptable browser, cURL, and the DOMDocument class (to parse and validate the markup).
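As a rough illustration of the DOMDocument suggestion, the sketch below checks for a script element whose src mentions the file, rather than searching the raw text. The function name hasScript() and both sample HTML strings are made up for the example; it still cannot tell whether the script runs without errors.

```php
<?php
/**
 * Hypothetical helper: true if the markup contains a <script> tag
 * whose src attribute mentions $scriptName.
 */
function hasScript(string $html, string $scriptName): bool
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid; silence parser warnings
    $dom->loadHTML($html);
    libxml_clear_errors();

    foreach ($dom->getElementsByTagName('script') as $script) {
        if (strpos($script->getAttribute('src'), $scriptName) !== false) {
            return true;
        }
    }
    return false;
}

// Sample inputs (assumptions): one page includes the script, one only mentions its name.
$withTag = '<html><head><script src="/js/example_script.js"></script></head><body></body></html>';
$mentionOnly = '<html><body><p>We mention example_script.js in text only.</p></body></html>';

var_dump(hasScript($withTag, 'example_script.js'));
var_dump(hasScript($mentionOnly, 'example_script.js'));
```

Unlike the plain strpos() check, the second page is not reported as a match, because the name only appears in body text.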

Pull data from web management interface

I'm looking to pull a status field from the web management interface of a UPS so that the data can be used in another web application I am writing. I was wondering if anyone knows a way to go about this, as I can't seem to find the information I'm looking for through web searches. I'd also need the value to refresh or be re-checked. An example of the UPS web interface is below; I'm looking at the online field first:
This is a very basic example that I haven't tested (PHP isn't installed here).
You need to look at the source of your control panel and find out how to identify the elements that contain the information you want.
The code below (hopefully) searches for an element with the id server-status; if that element exists, it then checks its class attribute to determine the state of the server.
You do not have to use the DOM classes; you could also do it with a regex or whatever, so long as you can accurately find the information you need.
You may also need to use cURL or something a little more advanced than file_get_contents(), as you will likely need login credentials to view the page in question.
<?php
$html = file_get_contents("http://path.to/your/control.panel");
// you may need to use cURL or something more advanced if you need to provide login credentials
$dom = new DOMDocument;
$dom->loadHTML($html);
$test = $dom->getElementById('server-status');
if ($test === null) {
    // unable to find element, something's up!
} else {
    if ($test->getAttribute('class') == "online") {
        // status element has "online" class, server is online
    } else {
        // status element does not have "online" class, something's up!
    }
}
?>
Update
I had a quick look at a demo of that management software, and it won't be quite as simple as my example because there don't appear to be any helpful element ids or class names. It's still doable, though.
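When there are no useful ids or class names, one option is to locate a value by the label text next to it. The following is only a sketch: the table markup and the "UPS Status" label are invented for illustration and will differ from the real management page.

```php
<?php
// Invented sample markup: a label cell followed by a value cell.
$html = '<table>'
    . '<tr><td>UPS Status</td><td>Online</td></tr>'
    . '<tr><td>Battery</td><td>100%</td></tr>'
    . '</table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // tolerate imperfect HTML
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Find the cell whose text is the label, then take the cell right after it.
$cells = $xpath->query("//td[normalize-space(.)='UPS Status']/following-sibling::td[1]");
$status = $cells->length > 0 ? trim($cells->item(0)->textContent) : null;

echo $status === null ? "status not found\n" : "status: $status\n";
```

To refresh the value, the same lookup could simply be re-run on a freshly fetched copy of the page.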

PHP Xpath Scrape Possible Namespace Issue

UPDATE: The source code is very much different from what Developer Tools shows.
Check out the source: view-source:http://www.machinerytrader.com/list/list.aspx?ETID=1&catid=1002
Is that JavaScript that needs to be rendered by a browser into HTML? If so, how can I have PHP do that so that I have HTML to parse? It's weird that you can use XPath Checker to return the items I'm looking for (see below), but you cannot access the full HTML!
(XPath: //table[contains(@id, 'ctl00_ContentPlaceHolder1') and (contains(@id,"tblContent") or contains(@id,"tblListingHeader"))])
END UPDATE
I need to scrape some information off of this site for work on a regular basis. I am attempting to write some PHP code to scrape this data. I think I have some namespace issues here, having read a number of other posts on SO. I have never encountered namespace problems before and used the approach shown on another SO post (to no avail :().
It appears the xpath query is just not happening for whatever reason. If you have any guesses or solutions as to how to handle this issue, I am open for suggestions.
Also here is the output from my code:
object(DOMXPath)#2 (0) {
}
Debug 1
array(0) {
}
array(0) {
}
I left out the bottom of the code where I var_dump testarray and create and var_dump otherarray. Their output is included above. Obviously the two arrays will be empty if the DOMXPath element has length 0 as well.
$string = 'http://www.machinerytrader.com/list/list.aspx?ETID=1&catid=1002';
$machine_trader = file_get_contents($string);
$xml = new DOMDocument();
$xml->loadHTML($machine_trader);
$xpath = new DOMXPath($xml);
$rootNamespace = $xml->lookupNamespaceUri($xml->namespaceURI);
$xpath->registerNamespace('x', $rootNamespace);
$tableRows = $xpath->query("//x:table[contains(@id, 'ctl00_ContentPlaceHolder1') and (contains(@id,'tblContent') or contains(@id,'tblListingHeader'))]");
var_dump($xpath);
$testarray = array();
$otherarray = array();
foreach ($tableRows as $row) {
    echo "Debug 1"."\n";
    $testarray[] = $row->nodeValue;
}
This is not an XPath issue, insofar as the actual content comes from a form post that your code never reaches. The JS source here does nothing more than authenticate a proper 'user' for the information request and then send the request via form submission.
At each request, the salt / encryption 'key' is randomized and changes, preventing simple scrapes.
You could port that JavaScript to PHP and then issue two requests, battling the authentication process along the way.
Or, rather than reverse-engineering this, you could switch your scraping to Node.js and use something like PhantomJS, which can evaluate JavaScript while still giving you programmatic access. Given the complexity of this task, it would be much simpler to use the right tool.
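To see that the query itself is fine once the tables are actually present, it can be run against a static snippet. This is a sketch with invented markup that only mimics the id pattern from the question. Note that documents parsed with DOMDocument::loadHTML() put elements in no namespace, so no prefix or registerNamespace() call is used here.

```php
<?php
// Invented markup imitating the ids from the question.
$html = '<html><body>'
    . '<table id="ctl00_ContentPlaceHolder1_tblListingHeader"><tr><td>Header</td></tr></table>'
    . '<table id="ctl00_ContentPlaceHolder1_tblContent"><tr><td>Listing</td></tr></table>'
    . '<table id="unrelated"><tr><td>Other</td></tr></table>'
    . '</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // tolerate imperfect HTML
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$tables = $xpath->query(
    "//table[contains(@id, 'ctl00_ContentPlaceHolder1')"
    . " and (contains(@id, 'tblContent') or contains(@id, 'tblListingHeader'))]"
);

$values = [];
foreach ($tables as $table) {
    $values[] = trim($table->textContent);
}
print_r($values);
```

If the same query returns nothing on the live page, that supports the explanation above: the tables are not in the HTML that file_get_contents() receives.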

Scraping Library for PHP - phpQuery?

I'm looking for a PHP library that lets me scrape web pages and takes care of all the cookies and of prefilling forms with their default values, which is what annoys me the most.
I'm tired of having to match every single input element with XPath, and I would love it if something better existed. I've come across phpQuery, but the manual isn't very clear and I can't find out how to make POST requests.
Can someone help me? Thanks.
@Jonathan Fingland:
In the example provided by the manual for browserGet() we have:
require_once('phpQuery/phpQuery.php');
phpQuery::browserGet('http://google.com/', 'success1');
function success1($browser)
{
    $browser->WebBrowser('success2')
        ->find('input[name=q]')->val('search phrase')
        ->parents('form')
        ->submit();
}
function success2($browser)
{
    echo $browser;
}
I suppose all the other fields are scraped and sent back in the GET request. I want to do the same with the phpQuery::browserPost() method, but I don't know how. The form I'm trying to scrape has an input token, and I would love it if phpQuery were smart enough to scrape the token and let me change only the other fields (in this case username and password), submitting everything via POST.
PS: Rest assured, this is not going to be used for spamming.
See http://code.google.com/p/phpquery/wiki/Ajax and in particular:
phpQuery::post($url, $data, $callback, $type)
The documentation describes the data parameter as "Object, String", meaning it can be either an Object or a String. POST requests should be possible using query string format, e.g.:
$data = "username=Jon&password=123456";
$url = "http://www.mysite.com/login.php";
phpQuery::post($url, $data, $callback, $type);
As phpQuery is a jQuery port, the method signature is the same (the docs link directly to the jQuery site: http://docs.jquery.com/Ajax/jQuery.post).
Edit
Two things:
There is also a phpQuery::browserPost function which might meet your needs better.
However, also note that the success2 callback is only called on the submit() or click() methods so you can fill in all of the form fields prior to that.
e.g.
require_once('phpQuery/phpQuery.php');
phpQuery::browserGet('http://www.mysite.com/login.php', 'success1');
function success1($browser) {
    $handle = $browser
        ->WebBrowser('success2');
    $handle
        ->find('input[name=username]')
        ->val('Jon');
    // no semicolon after val() here, so the chain can continue to the form
    $handle
        ->find('input[name=password]')
        ->val('123456')
        ->parents('form')
        ->submit();
}
function success2($browser) {
    print $browser;
}
(Note that this has not been tested, but should work)
I've used SimpleTest's ScriptableBrowser for such stuff in the past. It's part of the SimpleTest testing framework, but you can use it stand-alone.
I would use a dedicated library for parsing HTML files and a dedicated library for processing HTTP requests. Using the same library for both seems like a bad idea, IMO.
For processing HTTP requests, check out e.g. Httpful, Unirest, Requests or Guzzle. Guzzle is especially popular these days, but in the end, whichever library works best for you is still a matter of personal taste.
For parsing HTML files I would recommend a library that I wrote myself: DOM-Query. It allows you to (1) load an HTML file and then (2) select or change parts of your HTML pretty much the same way you would if you were using jQuery in a frontend app.
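The "scrape the token and only change the other fields" wish from the question can also be sketched with PHP's built-in DOM classes. This is an illustrative sketch, not phpQuery's behavior: formDefaults() is a hypothetical helper, the form markup and token value are invented, and it only reads input elements (no select, textarea, checkbox or radio handling).

```php
<?php
/**
 * Hypothetical helper: collects name => default value for every named <input>.
 */
function formDefaults(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect HTML
    $dom->loadHTML($html);
    libxml_clear_errors();

    $fields = [];
    foreach ($dom->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '') {
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return $fields;
}

// Invented login form with a hidden token field.
$html = '<form action="/login.php" method="post">'
    . '<input type="hidden" name="token" value="abc123">'
    . '<input type="text" name="username" value="">'
    . '<input type="password" name="password" value="">'
    . '</form>';

// Keep the scraped defaults (including the token) and override only the credentials.
$fields = array_merge(formDefaults($html), ['username' => 'Jon', 'password' => '123456']);
$body = http_build_query($fields);

echo $body, "\n";
```

The resulting string is in the query string format mentioned above, so it could be handed to whichever HTTP client sends the POST.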

How do I implement Direct Identity based OpenID authentication with Zend OpenID

I'm using the Zend Framework and the OpenID selector from http://code.google.com/p/openid-selector/. However, I find I can't log in using sites like Google and Yahoo, as they use a direct identity based login system whereby one is just redirected to a URL, as opposed to entering a unique URL of one's own for authentication.
I've checked out many options and hacks, but none of them seem to work. How can I get this to work? And, by the way, how is it implemented at Stack Overflow? I could really use all the help here, guys.
Edit
Well, from what I have noticed, the issue is that the Zend OpenID class doesn't support OpenID 2.0. A typical OpenID provider gives you a unique URL such as your-name.openid-provider.com or openid-provider.com/your-name; the Zend OpenID class parses that URL and redirects you to the provider's website, and upon authentication you are redirected back.
In the case of Yahoo and Google, you don't enter a unique URL. Instead you are redirected to the provider's login site, and upon login and authentication you are redirected back. So basically, when the Zend_OpenId object parses the URL to work out who the provider is, it cannot tell from the general URL alone. For example, when you click on the Google link it redirects you to https://www.google.com/accounts/o8/id
It's more an issue with the Zend OpenID object, and there isn't any help on Zend-related forums, so I was wondering if someone had already hacked the class or had an alteration I could make to accomplish this. Sorry if I'm missing something, but I'm new to programming with OpenID and have just started to get my feet wet.
Thanks for the follow-up. I did look into RPX a while back, and they do have a PHP class, but I wasn't able to check it out; besides, for now I really just want to get the selector code, as used on Stack Overflow, to work with Yahoo and Google authentication. There has to be some way to tweak the parsing the Zend OpenID class uses, as it runs a series of regular expression checks to make a discovery.
Little late to the game but I was able to get this working with some hacks I found around the interwebs.
First. Yahoo. To get Yahoo working all I had to do was change the JavaScript to use me.yahoo.com instead of just yahoo.com and it worked perfectly with the version of the Zend Framework I'm using. Unfortunately Google still wasn't, so some hacking was in order.
All of these changes go in Zend/OpenId/Consumer.php
First, in the _discovery method, add the following to the series of preg_match checks that starts at around line 740.
} else if (preg_match('/<URI>([^<]+)<\/URI>/i', $response, $r)) {
    $version = 2.0;
    $server = $r[1];
I added this right before the return false; statement that's in the else {} block.
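To show what that extra preg_match branch picks up, here is a standalone sketch. The XRDS document below is an invented, simplified stand-in for a provider's discovery response, not a captured one; only the regular expression and the two assignments come from the change above.

```php
<?php
// Invented, simplified XRDS-style discovery response (assumption for illustration).
$response = '<?xml version="1.0" encoding="UTF-8"?>'
    . '<xrds:XRDS xmlns:xrds="xri://$xrds" xmlns="xri://$xrd*($v*2.0)">'
    . '<XRD><Service priority="0">'
    . '<Type>http://specs.openid.net/auth/2.0/server</Type>'
    . '<URI>https://www.google.com/accounts/o8/ud</URI>'
    . '</Service></XRD></xrds:XRDS>';

$version = null;
$server = null;

// Same pattern as in the patch: take the first <URI> element as the server endpoint.
if (preg_match('/<URI>([^<]+)<\/URI>/i', $response, $r)) {
    $version = 2.0;
    $server = $r[1];
}

echo "version: ", var_export($version, true), "\n";
echo "server: ", var_export($server, true), "\n";
```

Because the pattern takes the first URI it sees, a response listing several services would need more careful parsing than this.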
Second, in the _checkId method you'll need to add 3 new blocks (I haven't dug around enough to know what causes each of these three cases to be called, so I covered all of them to be on the safe side).
Inside the $version <= 2.0 block, you'll find an if/else if/else block. In the first if statement ($this->_session !== null), add this to the end:
if ($server == 'https://www.google.com/accounts/o8/ud') {
    $this->_session->identity = 'http://specs.openid.net/auth/2.0/identifier_select';
    $this->_session->claimed_id = 'http://specs.openid.net/auth/2.0/identifier_select';
}
In the else if (defined('SID')) block, add this to the end:
if ($server == 'https://www.google.com/accounts/o8/ud') {
    $_SESSION['zend_openid']['identity'] = 'http://specs.openid.net/auth/2.0/identifier_select';
    $_SESSION['zend_openid']['claimed_id'] = 'http://specs.openid.net/auth/2.0/identifier_select';
}
And then after the else block (so outside the if/else if/else block altogether, but still inside the $version <= 2.0 block), add this:
if ($server == 'https://www.google.com/accounts/o8/ud') {
    $params['openid.identity'] = 'http://specs.openid.net/auth/2.0/identifier_select';
    $params['openid.claimed_id'] = 'http://specs.openid.net/auth/2.0/identifier_select';
}
Link to the bug in Zend Framework Issue Tracker
I need to use Google's OpenID stuff, and I tried Steven's code and couldn't get it to work as-is. I've made some modifications.
The change to the _discovery method is still the same:
Zend/OpenId/Consumer.php, line 765, add:
} else if (preg_match('/<URI>([^<]+)<\/URI>/i', $response, $r)) {
    $version = 2.0;
    $server = $r[1];
The rest is different, though:
Zend/OpenId/Consumer.php, line 859 (after making the above change), add:
if (stristr($server, 'https://www.google.com/') !== false) {
    $id = 'http://specs.openid.net/auth/2.0/identifier_select';
    $claimedId = 'http://specs.openid.net/auth/2.0/identifier_select';
}
This is right before:
$params['openid.identity'] = $id;
$params['openid.claimed_id'] = $claimedId;
And to get it to return the ID, once authorized:
Zend/Auth/Adapter/OpenId.php, line 278:
if (isset($_REQUEST['openid_identity'])) {
    $this->_id = $_REQUEST['openid_identity'];
    $id = $this->_id;
}
This is right before:
return new Zend_Auth_Result(
    Zend_Auth_Result::SUCCESS,
    $id,
    array("Authentication successful"));
Note that I have not thoroughly tested this code. The code below is even more shaky.
I have spent more time and I've gotten it to work with my Google Apps domain with the following changes, in addition to the above:
Zend/OpenId/Consumer.php, line 734
$discovery_url = $id;
if (strpos($discovery_url, '/', strpos($discovery_url, '//')+2) !== false) {
    $discovery_url = substr($discovery_url, 0, strpos($discovery_url, '/', strpos($discovery_url, '//')+2));
}
$discovery_url .= '/.well-known/host-meta';
$response = $this->_httpRequest($discovery_url, 'GET', array(), $status);
if ($status === 200 && is_string($response)) {
    if (preg_match('/Link: <([^><]+)>/i', $response, $r)) {
        $id = $r[1];
    }
}
This is right after:
/* TODO: OpenID 2.0 (7.3) XRI and Yadis discovery */
I believe that was the only change I had to make. I'm pretty sure there's supposed to be some checking involved with the above for security reasons, but I haven't looked far enough into it to see what they would be.
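The nested strpos()/substr() calls above reduce the identifier to its scheme and host before appending /.well-known/host-meta. A possibly more readable way to express the same step is sketched below with parse_url(); hostMetaUrl() is a hypothetical helper name and the sample URLs are made up, so treat it as an alternative illustration rather than a drop-in patch for Zend_OpenId_Consumer.

```php
<?php
/**
 * Hypothetical helper: builds the host-meta discovery URL for an identifier URL.
 */
function hostMetaUrl(string $id): string
{
    $parts = parse_url($id);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $id");
    }

    $origin = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $origin .= ':' . $parts['port']; // keep a non-default port if one was given
    }
    return $origin . '/.well-known/host-meta';
}

// Made-up identifiers for illustration.
echo hostMetaUrl('https://example.com/openid?id=123'), "\n";
echo hostMetaUrl('http://example.com:8080'), "\n";
```

Unlike the string-offset version, this rejects identifiers that lack a scheme or host instead of silently producing an odd URL.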
Going over all the advice provided, I've decided to ditch the Zend_OpenId class (sorry about that, Zend) and switch to JanRain's OpenID library instead. It took a few hours to get it up and running with my project, but at least it's working like a breeze. I had to do a lot of hacking, with a bit of code spill-over, to get it working, but it's worth it.
I couldn't use any of the Zend adapters with Zend_Auth to fit this new library in, as the library does the authentication on its own. So I hacked together a generic adapter that just returns a filled Zend_Auth_Result to the Auth object. That way I authenticate using my library and merely store the result in the Auth object, pulling a bit of a fast one on Zend_Auth rather than having to rewrite my code again.
The library is available at http://openidenabled.com/php-openid/
Thanks for all the help, guys.
I'm dealing with similar issues. I'm planning on using RPX now with Zend Framework. Maybe I'll write an adapter. Just to let you know.
Info: 'RPX Now' provides an all-in-one interface and UI for user registration with
facebook
Google
Yahoo
mySpaceID
Windows LiveID
OpenID
aol
I'm pretty sure that Yahoo only works with OpenID 2.0. If you want to support Yahoo users, you're going to have to upgrade to a library with 2.0 support. That's going to be a matter of more than tweaking some parsing.
Did you check out the manual -- Zend_OpenId_Consumer basics? Check out 38.2.2 on that page and let me know if this helps, because it should.
Specifically, I don't know if Google offers OpenID. I know that Yahoo worked because I've tried it a while back.
Thanks for the information. I started by using JanRain's library, but I have problems with getting Simple Registration to work: I have not succeeded in getting any data that way. And there is no documentation on using Attribute Exchange. :(
So, I found and was trying Zend/OpenId, but had the same problem as you: no Yahoo!, Google and who knows what else support. Reading this, it seems I'll have to get back to JanRain; RPX is not an option in my case as it's a third party service.
