Scraping Library for PHP - phpQuery?


I'm looking for a PHP library that allows me to scrape web pages and takes care of all the cookies and of pre-filling forms with their default values, which is what annoys me the most.
I'm tired of having to match every single input element with XPath, and I would love it if something better existed. I've come across phpQuery, but the manual isn't very clear and I can't figure out how to make POST requests.
Can someone help me? Thanks.
@Jonathan Fingland:
In the example provided by the manual for browserGet() we have:
require_once('phpQuery/phpQuery.php');
phpQuery::browserGet('http://google.com/', 'success1');
function success1($browser)
{
    $browser->WebBrowser('success2')
        ->find('input[name=q]')->val('search phrase')
        ->parents('form')
        ->submit();
}
function success2($browser)
{
    echo $browser;
}
I suppose all the other fields are scraped and sent back in the GET request. I want to do the same with the phpQuery::browserPost() method, but I don't know how. The form I'm trying to scrape has a token input, and I would love it if phpQuery were smart enough to scrape the token and let me change only the other fields (in this case username and password), submitting everything via POST.
PS: Rest assured, this is not going to be used for spamming.

See http://code.google.com/p/phpquery/wiki/Ajax and in particular:
phpQuery::post($url, $data, $callback, $type)
and
data (Object, String): the data parameter can be either an Object or a String. POST requests should be possible using query-string format, e.g.:
$data = "username=Jon&password=123456";
$url = "http://www.mysite.com/login.php";
phpQuery::post($url, $data, $callback, $type);
As phpQuery is a jQuery port, the method signature is the same (the docs link directly to the jQuery site: http://docs.jquery.com/Ajax/jQuery.post).
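Writing the query string by hand breaks as soon as a value contains `&`, `=` or similar characters. A small sketch, assuming you build `$data` before calling `phpQuery::post()`: PHP's built-in `http_build_query()` URL-encodes each key and value for you (the password below is just an example value).

```php
<?php
// Build the POST body from an array instead of concatenating strings by hand.
// http_build_query() URL-encodes keys and values, so reserved characters
// such as '&' or '@' inside a password cannot break the query-string format.
$fields = [
    'username' => 'Jon',
    'password' => 'p@ss&word', // example value containing reserved characters
];
$data = http_build_query($fields);

echo $data, PHP_EOL; // username=Jon&password=p%40ss%26word
```

The resulting string can be passed as the `$data` argument exactly like the hand-written one.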
Edit
Two things:
There is also a phpQuery::browserPost function which might meet your needs better.
However, also note that the success2 callback is only called on the submit() or click() methods, so you can fill in all of the form fields before that.
e.g.
require_once('phpQuery/phpQuery.php');
phpQuery::browserGet('http://www.mysite.com/login.php', 'success1');
function success1($browser) {
    // success2 is called once the form is submitted
    $handle = $browser
        ->WebBrowser('success2');
    $handle
        ->find('input[name=username]')
        ->val('Jon');
    // No semicolon after val() here: the chain continues up to the form and submits it
    $handle
        ->find('input[name=password]')
        ->val('123456')
        ->parents('form')
        ->submit();
}
function success2($browser) {
    print $browser;
}
(Note that this has not been tested, but should work)
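Independently of phpQuery, the "keep the hidden token and change only username and password" step can be done with PHP's bundled DOM extension. The following is a minimal sketch, assuming the login page's HTML has already been fetched into a string. `formDefaults()` is a hypothetical helper, and it only approximates what a browser submits: it ignores submit-button values and treats `name[]`-style fields as flat keys.

```php
<?php
// Collect the default values of every named field in a form, so that hidden
// inputs such as a login token are sent back unchanged.
function formDefaults(string $html, string $formXPath = '//form'): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $form = $xpath->query($formXPath)->item(0);
    if ($form === null) {
        return [];
    }

    $fields = [];
    $query = './/input[@name] | .//textarea[@name] | .//select[@name]';
    foreach ($xpath->query($query, $form) as $el) {
        $name = $el->getAttribute('name');
        if ($el->nodeName === 'input') {
            $type = strtolower($el->getAttribute('type'));
            // Browsers skip unchecked checkboxes/radios and unclicked buttons.
            if (in_array($type, ['checkbox', 'radio'], true) && !$el->hasAttribute('checked')) {
                continue;
            }
            if (in_array($type, ['submit', 'button', 'image', 'reset', 'file'], true)) {
                continue;
            }
            $fields[$name] = $el->getAttribute('value');
        } elseif ($el->nodeName === 'textarea') {
            $fields[$name] = $el->textContent;
        } else {
            // <select>: the selected option, otherwise the first one.
            $option = $xpath->query('.//option[@selected]', $el)->item(0)
                ?? $xpath->query('.//option', $el)->item(0);
            if ($option === null) {
                $fields[$name] = '';
            } else {
                $fields[$name] = $option->hasAttribute('value')
                    ? $option->getAttribute('value')
                    : $option->textContent;
            }
        }
    }
    return $fields;
}

$html = '<form action="/login.php" method="post">'
      . '<input type="hidden" name="token" value="abc123">'
      . '<input type="text" name="username" value="">'
      . '<input type="password" name="password">'
      . '<input type="submit" name="go" value="Log in">'
      . '</form>';

// Keep the scraped defaults (including the token), override only the credentials.
$post = array_merge(formDefaults($html), ['username' => 'Jon', 'password' => '123456']);
```

The merged `$post` array can then be encoded with `http_build_query()` and sent as the POST body by whichever HTTP client you use, together with the session cookie from the page the token was scraped from.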

I've used SimpleTest's ScriptableBrowser for such stuff in the past. It's part of the SimpleTest testing framework, but you can use it stand-alone.

I would use a dedicated library for parsing HTML files and a dedicated library for processing HTTP requests. Using the same library for both seems like a bad idea, IMO.
For processing HTTP requests, check out e.g. Httpful, Unirest, Requests or Guzzle. Guzzle is especially popular these days, but in the end, which library works best for you is a matter of personal taste.
For parsing HTML files I would recommend a library that I wrote myself: DOM-Query. It allows you to (1) load an HTML file and then (2) select or change parts of your HTML much the same way you would with jQuery in a frontend app.

Related

Creating an API that handles a form post

I have recently been asked to create an API that can process data using PHP. I am not that accustomed to PHP, so I am not quite sure how to proceed.
Basically, what I would like to achieve is to create an API that processes a form post, which the user can call like this:
<form METHOD="POST" ACTION="https://MyURL/index.php" id=aForm name=aForm>
<input type="hidden" id="Lite_Merchant_ApplicationID" name="Lite_Merchant_ApplicationID" value="Your Application Id">
(various other fields to be processed)
</form>
I might be wrong in calling this an API, because it's supposed to handle a form post. But I need to compile documentation for users to be able to integrate with our system and post the form to our URL which will then process the info in the form.
Are there any good tutorials that I can have a look at? I am not sure whether the ones I am looking at are applicable, as they mention nothing about using a form to call the API, e.g.
https://docs.phalconphp.com/en/latest/reference/tutorial-rest.html
and
http://coreymaynard.com/blog/creating-a-restful-api-with-php/
Or do I just process the form as normal in PHP and access the values using:
$_POST["name"];
If that is the case, will users be able to call the API using the language of their choice?
An additional question: is there anything I would need to look at or consider because it will be served over HTTPS?
Thanks in advance, and my apologies if this is not very specific; any advice/pointers will be appreciated.
Additional info:
The system needs to be able to perform redirects, and login credentials will be sent within the hidden form inputs.

Your question is a little wide-ranging, and you may be using words in a way that isn't consistent with my understanding.
An API typically is more than a single method, whereas handling a form POST event is just - well, a form handler. The difference is more than semantic - for an API, you probably need to consider versioning (how do you upgrade your API without breaking client applications), abstraction (how can you make your API easy to use), documentation, and security (how can you ensure that only authorised users/applications consume your API?). An API often has more than one user, and often needs to support the scalability requirements of the client applications.
REST is a great way to design an API - it's easy to understand for clients, and lots of smart people have solved problems like authentication/authorisation, versioning and abstraction.
It's important to note that REST uses existing HTTP concepts, so a RESTful API would expose POST requests to create new entities. That POST request can be called from a web page with a <form> element, or from a REST client.
If you write a RESTful API, clients can be written in any language that supports HTTP.
There are a bunch of frameworks which make building RESTful web APIs easier in PHP. I haven't used any, so can't make a recommendation.
If, however, all you have to do is handle a POST request, from a web page that won't change - well, I'd not build a RESTful API, I'd just write a PHP "POST" handler. In this case, the client can be anything that understands your POST parameters (in practice, pretty much any application that can make an HTTP request).
However, the difference between "POST Handler" and "API" in my view is that when you create an API, you make certain promises that your clients depend on. "I won't change the field names without telling you". "I won't change the location without telling you". "You can depend on what my documentation says". When you create a POST handler, you only promise the maker of the HTML form that it works, and that you will tell that team of any changes.
The only challenge with HTTPS is that you must make sure that the calling application can handle it, and that the certificates and keys are set up correctly.
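Since the question mentions credentials in hidden inputs, the handler may also want to refuse plain-HTTP requests. A minimal sketch, assuming a standard PHP/web-server setup; `isHttps()` is a hypothetical helper name. PHP sets `$_SERVER['HTTPS']` to a non-empty value for HTTPS requests, and some servers (e.g. IIS) set it to `'off'` for plain HTTP instead of leaving it unset.

```php
<?php
// Decide whether the current request arrived over HTTPS.
function isHttps(array $server): bool
{
    if (!empty($server['HTTPS']) && strtolower($server['HTTPS']) !== 'off') {
        return true;
    }
    // Fallback: a request on the default HTTPS port.
    return isset($server['SERVER_PORT']) && (int) $server['SERVER_PORT'] === 443;
}

// In the form handler, before reading any credentials:
// if (!isHttps($_SERVER)) { http_response_code(403); exit('HTTPS required'); }
```

Behind a load balancer or reverse proxy that terminates TLS, these values describe the proxy-to-PHP connection, so you would need to check whatever header your proxy sets instead.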

Just process the form as normal in PHP and access the values using:
$_POST["name"];
The API user just has to send a POST request, by html form, AJAX, or whatever.
You should add a field for the response format (html, xml, or json), then use it to format the response.
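A minimal sketch of that idea, assuming the caller sends a field named `format` (a hypothetical name) alongside the form fields. `Lite_Merchant_ApplicationID` comes from the question; `handleFormPost()` and `formatResponse()` are illustrative helpers, and XML keys are assumed to be valid element names with scalar values.

```php
<?php
// Validate the posted fields and return a plain array result.
function handleFormPost(array $post): array
{
    if (empty($post['Lite_Merchant_ApplicationID'])) {
        return ['status' => 'error', 'message' => 'Missing Lite_Merchant_ApplicationID'];
    }
    // ... process the remaining fields here ...
    return ['status' => 'ok'];
}

// Serialise a result in the requested format; JSON is the default.
// Returns [content type, body].
function formatResponse(array $data, string $format): array
{
    switch ($format) {
        case 'xml':
            $xml = new SimpleXMLElement('<response/>');
            foreach ($data as $key => $value) {
                // addChild() does not escape '&', so escape the value first.
                $xml->addChild((string) $key, htmlspecialchars((string) $value));
            }
            return ['text/xml', $xml->asXML()];
        case 'html':
            $rows = '';
            foreach ($data as $key => $value) {
                $rows .= '<dt>' . htmlspecialchars((string) $key) . '</dt>'
                       . '<dd>' . htmlspecialchars((string) $value) . '</dd>';
            }
            return ['text/html', "<dl>$rows</dl>"];
        default:
            return ['application/json', json_encode($data)];
    }
}

// Entry point (e.g. index.php):
// [$type, $body] = formatResponse(handleFormPost($_POST), $_POST['format'] ?? 'json');
// header("Content-Type: $type");
// echo $body;
```

Because the client only sends an ordinary POST request and reads an ordinary response, any language with an HTTP client can call it.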

Check the links below (RESTful services). It's very simple and meets your requirement.
http://rest.elkstein.org/2008/02/what-is-rest.html
http://www.9lessons.info/2012/05/create-restful-services-api-in-php.html

Going along with Neville K's answer, here is an example of how my company handles RESTful API calls.
First we have a PHP file that handles the calls with a switch statement, routing the different actions to the appropriate functions and classes.
session_start(); // login() below stores the user id in the session
/* Class file that is called on this page */
include_once "$_SERVER[DOCUMENT_ROOT]/classes/class.myclass.php";
/**
 * This function makes it simpler to stop it from working for debugging purposes.
 * All we have to do is comment out the one line of code apiCall($_REQUEST);
 * You could have this outside of the function and it would work just as well.
 * @param array $REQUEST
 */
function apiCall($REQUEST) {
    $con = new MyClass();
    switch ($REQUEST['action'] ?? '') {
        case 'getList':
            /* Setting the content type to JSON means that the developer can
             * expect a response in the form of parseable JSON.
             */
            header('Content-Type: application/json');
            echo json_encode($con->getList($REQUEST));
            break; // without break, execution falls through to the next case
        case 'setValue':
            header('Content-Type: application/json');
            echo json_encode($con->setValue($REQUEST));
            break;
        case 'login':
            if ($con->login($REQUEST)) {
                header('Location: /index.php');
            } else {
                header('Location: /login.php?status=Failed+Login');
            }
            exit; // stop after sending the redirect
        default:
            header('Content-Type: application/json');
            /* If an invalid action was sent in, then this error message will be sent
             * back to the user
             */
            echo json_encode(['status' => 'Invalid API Call']);
    }
}
/* Using $_REQUEST allows developers to access the API via GET or POST */
apiCall($_REQUEST);
Then we handle all the logic in the different classes we called.
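class MyClass {
    public function getList($REQUEST) {
        $id = $REQUEST['id'] ?? null;
        $results = [];
        /* code that fills $results for $id */
        return ['status' => 'ok', 'results' => $results];
    }
    public function setValue($REQUEST) {
        /* code */
        return ['status' => 'ok'];
    }
    public function login($REQUEST) {
        $login_successful = false;
        /* code that checks the credentials and sets $user_id and $login_successful */
        if ($login_successful) {
            $_SESSION['user_id'] = $user_id;
        }
        return $login_successful;
    }
}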
Using JSON is good for applications that send information via AJAX calls. Using header('Location: ...') is good for form submissions without AJAX.
You can then use JavaScript AJAX calls or regular form submissions, depending on how you handle the submission of data.
Example of using jQuery.getJSON:
$.getJSON('/switch.php', $.param({id: id, action: 'getList'}), function (json) {
    if (json) {
        /*code*/
    }
});
Then you would pass a hidden input with action in it to the switch page for regular form submissions.
<form action="/switch.php" method="post">
<!--hidden input named action to direct which switch to use-->
<input name="action" value="login" type="hidden"/>
<input name="username"/>
<input name="password" type="password"/>
<input type="submit"/>
</form>
These examples are for HTML/JavaScript web applications. If you are using Java, Python, .NET, or some other language, it is as simple as calling the REST API and parsing the JSON to decide how to handle your application logic.
You can even make a PHP-to-PHP API call using file_get_contents or cURL.
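$data = [
    'action' => 'setValue',
    'information' => 'More'
];
// file_get_contents() needs a full URL here; a bare '/switch.php' would be read
// as a local file path. example.com stands in for your own host.
$json = json_decode(file_get_contents('https://example.com/switch.php?' . http_build_query($data)), true);
if (!empty($json)) {
    /*code*/
}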
You could create a separate page for each call and not have to worry about passing an action with every request. But then your file tree starts to look like this:
/api/loginSubmit.php
/api/login.php
/api/getListFromId.php
/api/getList.php
/api/setValues.php
/api/getValues.php
It gets really tedious to traverse all these files to figure out where the problem is.
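A middle ground between one big switch and one file per call is a lookup table in the single entry point. This is a sketch of that alternative, not the approach described above; the handler closures are placeholders, and arrow functions need PHP 7.4 or later.

```php
<?php
// Single entry point that maps 'action' values to handler closures.
// With a lookup table there is no 'break' to forget, so one action can
// never fall through into the next.
function dispatch(array $request, array $handlers): array
{
    $action = $request['action'] ?? '';
    if (!is_string($action) || !isset($handlers[$action])) {
        return ['status' => 'Invalid API Call'];
    }
    return $handlers[$action]($request);
}

// Placeholder handlers; in practice these would call MyClass methods.
$handlers = [
    'getList'  => fn(array $r) => ['status' => 'ok', 'id' => $r['id'] ?? null, 'results' => []],
    'setValue' => fn(array $r) => ['status' => 'ok'],
];

// In switch.php (the login redirect would still be handled separately):
// header('Content-Type: application/json');
// echo json_encode(dispatch($_REQUEST, $handlers));
```

Adding an action means adding one array entry, and everything still lives in one file that is easy to search.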

I created an API framework; it's very lightweight, simple, and fast.
GitHub: https://github.com/mackraja/mackApi

Hand over "data/params" on reroute(); in "fat free framework"

I'm looking for an elegant way to hand over data/params when using $f3->reroute();
I have multiple routes configured in a routes.ini:
GET @sso: /sso/first [sync] = Controller\Ccp\Sso->first, 0
GET @map: /map [sync] = Controller\MapController->second, 3600
Now I reroute() to the @map route from first():
class Sso {
    public function first($f3){
        $msg = 'My message!';
        if( !empty($msg) ){
            $f3->reroute('@map');
        }
    }
}
Is there any "elegant" way to pass data (e.g. $msg) right into MapController->second()?
I don't want to use $SESSION or the global $f3->set('msg', $msg); for this.

This isn't an issue specific to fat-free-framework, but to the web in general. When you reroute, you tell the user's browser to load a different page, using a 303 redirect status code.
Take a minute to read the doc regarding re-routing: http://fatfreeframework.com/routing-engine#rerouting
There seems to be some contradicting information in your question, which leads me to question the purpose of what you are trying to achieve.
If you are rerouting, you can either use the session, cookies, or use part of the url to pass messages or references to a message.
If you do not need to redirect, but just want to call the function without changing the passed parameters, you could abstract the content of the function and call that function from both routes. You could also use the $f3 globals, which are a great way of passing data between functions in cases where you don't want to pass the data through the function call. Is there a reason why you don't want to use this? The data is global only to the current request, so there is no security concern, and it gets wiped at the end of the request, so there is very little extra footprint or effect on the server.

If you're all right with not using @map_name in reroutes, you can do something like this:
$f3->reroute('path/?foo=bar');
Not the prettiest, I'll admit. I wish $f3->reroute('@path_name?foo=bar') would work.
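A small refinement of the query-string approach, as a sketch: build the target with `http_build_query()` so that spaces and punctuation in the message survive, then read it back from the `GET` hive variable in the target controller. `rerouteTarget()` is a hypothetical helper, and the path `/map` is taken from the routes above. Keep in mind the value is visible in the URL and can be edited by the user.

```php
<?php
// Build a reroute target that carries data in the query string.
function rerouteTarget(string $path, array $params): string
{
    return $params ? $path . '?' . http_build_query($params) : $path;
}

$target = rerouteTarget('/map', ['msg' => 'My message!']);

// In Sso->first():             $f3->reroute($target);
// In MapController->second():  $msg = $f3->get('GET.msg');
```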

Sharing access restrictions between php and javascript

The actual questions
How to "map" access restrictions so they can be used from PHP and JavaScript?
What kind of method should I use to share access restrictions / rules between php and javascript?
Explanation
I have created a RESTful backend using PHP which will use context-aware access control to limit data access and modification. For example, a person can modify address information that belongs to them and can view (but not modify) the address information of all other persons who are in the same groups. And of course, a group admin can modify the address details of all persons in that group.
Now, the PHP side is quite "simple", as it is all just a bunch of checks. The JavaScript side is also quite "simple", as it too is just a bunch of checks. The real issue is: how do I make those checks come from the same place?
Javascript uses checks to show/hide edit/save buttons.
PHP uses checks to make the actual changes.
and yes,
I know this would be a much simpler situation if I ran JavaScript (Node.js or the like) on the server, but the backend has already been built and changing course at this point would cause major setbacks.
Maybe someone has already devised a method to model access checks in a "passive" way, and then use some sort of "compiler" to run the actual checks?
Edit:
In case it helps, the front-end (JS) part is built with AngularJS...
Edit2
This is some pseudo-code to clarify what I think I am searching for, but I am not at all certain that this is possible on a large scale. On the plus side, all access restrictions would be in a single place and easy to amend if needed. On the downside, I would have to write the AccessCheck and canAct functions in both languages, or come up with a way to JIT-compile some pseudo-code to JavaScript and PHP :)
var AccessRestrictions = {
    Address: {
        View: [
            'OWNER', 'MEMBER_OF_OWNER_PRIMARY_GROUP'
        ],
        Edit: [
            'OWNER', 'ADMIN_OF_OWNER_PRIMARY_GROUP'
        ]
    }
};
var AccessCheck = {
    OWNER: function(Owner) {
        return Session.Person.Id == Owner.Id;
    },
    MEMBER_OF_OWNER_PRIMARY_GROUP: function(Owner) {
        return Session.Person.inGroup(Owner.PrimaryGroup);
    }
    // ADMIN_OF_OWNER_PRIMARY_GROUP still needs a check here
};
// Allowed if at least one of the listed roles passes its check
function canAct(Owner, resource, action) {
    return AccessRestrictions[resource][action].some(function(role) {
        return AccessCheck[role] ? AccessCheck[role](Owner) : false;
    });
}
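For comparison, here is what the PHP half of that idea could look like if the rule table lives in one JSON file that is both served to AngularJS and read by PHP. This is a sketch under assumptions: the JSON is inlined here instead of loaded from a file, the `$user`/`$owner` array shapes (`id`, `groups`, `adminOf`, `primaryGroup`) are invented for illustration, and only the small per-role checks are written twice (once in PHP, once in JavaScript).

```php
<?php
// Shared rule table; in practice e.g. file_get_contents('access-rules.json').
$rulesJson = '{"Address":{"View":["OWNER","MEMBER_OF_OWNER_PRIMARY_GROUP"],'
           . '"Edit":["OWNER","ADMIN_OF_OWNER_PRIMARY_GROUP"]}}';
$rules = json_decode($rulesJson, true);

// One check per role name, mirroring the JavaScript AccessCheck object.
$checks = [
    'OWNER' => fn(array $user, array $owner) => $user['id'] === $owner['id'],
    'MEMBER_OF_OWNER_PRIMARY_GROUP' => fn(array $user, array $owner) =>
        in_array($owner['primaryGroup'], $user['groups'], true),
    'ADMIN_OF_OWNER_PRIMARY_GROUP' => fn(array $user, array $owner) =>
        in_array($owner['primaryGroup'], $user['adminOf'], true),
];

// Allowed if any role listed for this resource/action passes its check.
function canAct(array $rules, array $checks, array $user, array $owner,
                string $resource, string $action): bool
{
    foreach ($rules[$resource][$action] ?? [] as $role) {
        if (isset($checks[$role]) && $checks[$role]($user, $owner)) {
            return true;
        }
    }
    return false;
}
```

Changing who may do what then means editing only the JSON table; a new role name still needs a check on both sides.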

First things first.
In a setup like yours, you can't "run JavaScript on the server": the JavaScript runs on the client, in the same way that PHP always runs on the server and never on the client.
Next, here's my idea.
1. Define a small library of functions you need to perform the checks. This can be as simple as a single function that returns a boolean, or whatever format suits your permissions. Make sure that the returned value is meaningful to both PHP and JavaScript (which, more often than not, means returning JSON).
2. In your main PHP scripts, include the library when you need to check permissions, and use the function(s) you defined to determine whether the user is allowed.
3. Your front end is the part that requires the most updates: when you need to determine a user's permissions, fire an AJAX request to your server (you may need to write a new script similar to the one in step 2 to handle AJAX requests if your current script isn't flexible enough), which simply reuses your permissions library. Since the return values are in a format that's easily readable by JavaScript, when you get the response you'll be able to decide what to show the user.

There are several solutions to this problem. I assume you store session variables, like the name of the authorized user, in PHP's session. Let's assume all you need to share is the $authenticated_user variable. I assume it's just a string, but it can also be an array with permissions etc.
If $authenticated_user is known before the AngularJS app loads, you can prepare a small PHP file which mimics a JS file, like this:
config.js.php:
<?php
session_start();
header('Content-Type: application/javascript');
$authenticated_user = $_SESSION['authenticated_user'] ?? null;
// json_encode() produces a valid, safely escaped JavaScript literal
echo 'var authenticated_user = ' . json_encode($authenticated_user) . ';';
?>
If you include it in the header of your application, it will tell you who is logged in on the server side. The client side will just see this JS code:
var authenticated_user = "johndoe";
You may also load this file with AJAX, or even better with JSONP if you wrap it in a function:
<?php
session_start();
header('Content-Type: application/javascript');
$authenticated_user = json_encode($_SESSION['authenticated_user'] ?? null);
echo <<<EOD
function set_authenticated_user() {
    window.authenticated_user = $authenticated_user;
}
EOD;
?>

Keyboard input in PHP

I am trying to control stuff with PHP from keyboard input. The way I am currently detecting keystrokes is with:
function read() {
    $fp1 = fopen("/dev/stdin", "r");
    $input = fgets($fp1, 255);
    fclose($fp1);
    return $input;
}
print("What is your first name? ");
$first_name = read();
The problem is that it is not reading the keystrokes 'live'. I don't know if this is possible using this method, and I imagine this isn't the most effective way to do it either. My questions are: 1) if this is a good way to do it, how can I get it to capture the keystrokes as you type on the page; and 2) if this is a bad way of doing it, how can I implement it better (maybe using AJAX or something)?
Edit: I am using PHP for a web page, not from the command line.

I'm assuming that you're using PHP as a web-scripting language (not from the command line)...
From what I've seen, you'll want to use Javascript on the client side to read key inputs. Once the server delivers the page to the client, there's no PHP interaction. So using AJAX to read client key inputs and make calls back to the server is the way to go.
There's some more info on Javascript and detecting key presses here and some info on how to use AJAX here.
A neat option for jQuery is to use something like delayedObserver

If you are writing a CLI application (as opposed to a web application), you can use ncurses' getch() function to get a single key stroke. Good luck!
If you're not writing a CLI application, I would suggest following Andrew's answer.

Try readline:
http://us3.php.net/manual/en/function.readline.php
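For a command-line script, a sketch of the readline approach with a fallback for PHP builds where the readline extension is not compiled in. `ask()` is a hypothetical helper. Note that it still reads a whole line after Enter is pressed, not individual keystrokes, and none of this applies to a PHP page served to a browser.

```php
<?php
// Read one line of input in a CLI script.
// Uses readline() when available and no stream is given; otherwise falls
// back to fgets() on the given stream (STDIN by default).
function ask(string $question, $stream = null): string
{
    if ($stream === null && function_exists('readline')) {
        $line = readline($question);
        return $line === false ? '' : $line;
    }
    echo $question;
    $line = fgets($stream ?? STDIN);
    // fgets() keeps the trailing newline; readline() does not, so strip it.
    return $line === false ? '' : rtrim($line, "\r\n");
}

// Usage from the command line:
// $first_name = ask('What is your first name? ');
```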

How can I implement jquery in my Zend Framework application in a custom manner?

How can I implement jQuery in my Zend Framework application in a custom manner?
appending jquery.js: ok
appending script: ok
send POST data to controller: ok
process POSTed data: ok
send 'AjaxContext' response to client: now ok (thanks)
I'm using jQuery for the first time; what am I doing wrong?

Early on, the best practice for getting Zend to respond to AJAX requests without the full layout was to check a variable made available via request headers. According to the documentation, many client-side libraries, including jQuery, Prototype, Yahoo UI and MochiKit, send the right header for this to work.
if ($this->_request->isXmlHttpRequest())
{
    // The request was made via AJAX
}
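Outside Zend, the same check is a one-liner. As a sketch: the header these libraries send is `X-Requested-With: XMLHttpRequest`, which PHP exposes as `$_SERVER['HTTP_X_REQUESTED_WITH']`; `isXmlHttpRequest()` here is a standalone helper, not the Zend method itself. jQuery only adds this header to same-origin requests by default, so cross-domain AJAX calls will not be detected this way.

```php
<?php
// Plain-PHP equivalent of the header check behind isXmlHttpRequest().
function isXmlHttpRequest(array $server): bool
{
    return ($server['HTTP_X_REQUESTED_WITH'] ?? '') === 'XMLHttpRequest';
}

// e.g. if (isXmlHttpRequest($_SERVER)) { /* render without the layout */ }
```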
However, modern practice, and what you're likely looking for, is to use one of two newer helpers:
ContextSwitch
AjaxContext
which make the process considerably more elegant.
class CommentController extends Zend_Controller_Action
{
    public function init()
    {
        $ajaxContext = $this->_helper->getHelper('AjaxContext');
        $ajaxContext->addActionContext('view', 'html')
                    ->initContext();
    }
    public function viewAction()
    {
        // Pull a single comment to view.
        // When AjaxContext is detected, uses the comment/view.ajax.phtml
        // view script.
    }
}
Please note: this modern approach requires that you request a format in order for the context to be triggered. This is not made very obvious in the documentation, and it is somewhat confusing when you end up just getting strange results in the browser.
/url/path?format=html
Hopefully there's a workaround we can discover. Check out the full documentation for more details.

Make sure you're using $(document).ready() for any jQuery events that touch the DOM. Also, check the JavaScript/parser error console; in Firefox it's located under Tools -> Error Console. And if you don't already have it installed, I would highly recommend Firebug.

This should have been a comment, but I can't comment yet...
It has nothing to do with the ZF + jQuery combination.
First try a prototype of what you need with a simple PHP file. No framework, just jQuery and straightforward, dirty PHP.
Oh, and don't forget to track what happens with FireBug.
