Crawler + Guzzle: Accessing a form - php

I am using the PHP Guzzle client to fetch a website and then process it with the Symfony 2.1 Crawler.
I am trying to access a form, for example this test form:
http://de.selfhtml.org/javascript/objekte/anzeige/forms_method.htm
$url = 'http://de.selfhtml.org/javascript/objekte/anzeige/forms_method.htm';
$client = new Client($url);
$request = $client->get();
$request->getCurlOptions()->set(CURLOPT_SSL_VERIFYHOST, false);
$request->getCurlOptions()->set(CURLOPT_SSL_VERIFYPEER, false);
$response = $request->send();
$body = $response->getBody(true);
$crawler = new Crawler($body);
$filter = $crawler->selectButton('submit')->form();
var_dump($filter);die();
But I get the exception:
The current node list is empty.
So I am kind of lost on how to access the form.

Try using Goutte. It is a screen scraping and web crawling library built on top of the tools that you are already using (Guzzle, Symfony2 Crawler). See the GitHub repo for more info.
Your code would look like this using Goutte:
<?php
use Goutte\Client;
$url = 'http://de.selfhtml.org/javascript/objekte/anzeige/forms_method.htm';
$client = new Client();
$crawler = $client->request('GET', $url);
$form = $crawler->selectButton('submit')->form();
$crawler = $client->submit($form, array(
'username' => 'myuser', // assuming you are submitting a login form
'password' => 'P#S5'
));
var_dump($crawler->count());
echo $crawler->html();
echo $crawler->text();
If you really need to set up the cURL options, you can do it this way:
<?php
$url = 'http://de.selfhtml.org/javascript/objekte/anzeige/forms_method.htm';
$client = new Client();
$guzzle = $client->getClient();
$guzzle->setConfig(
array(
'curl.CURLOPT_SSL_VERIFYHOST' => false,
'curl.CURLOPT_SSL_VERIFYPEER' => false,
));
$client->setClient($guzzle);
// ...
UPDATE:
When using the DomCrawler I often get that same error. Most of the time it is because I'm not selecting the correct element in the page, or because the element doesn't exist. Instead of using:
$crawler->selectButton('submit')->form();
do the following:
$form = $crawler->filter('#signin_button')->form();
Here the filter method gets the element by id ('#signin_button') if it has one; you could also get it by class ('.signin_button').
The filter method requires the CssSelector component.
Also debug your form by printing out the HTML (echo $crawler->html();) and making sure that you are actually on the right page.
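If printing the whole page is too noisy, a small helper can list just the submit controls so you can see which id, name or value is actually there to select. This is only a sketch using PHP's built-in DOM extension rather than the Crawler itself; the function name listSubmitControls and the sample markup are made up for illustration.

```php
<?php
// Minimal sketch: list the submit-like controls found in an HTML string so you
// can see which id, name, or value a selector could match. Uses only ext-dom.

function listSubmitControls(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Note: the type comparison is case-sensitive (type="SUBMIT" would be missed).
    $nodes = $xpath->query(
        '//input[@type="submit" or @type="button" or @type="image"] | //button'
    );

    $controls = [];
    foreach ($nodes as $node) {
        $controls[] = [
            'tag'   => $node->nodeName,
            'id'    => $node->getAttribute('id'),
            'name'  => $node->getAttribute('name'),
            // <button> elements usually carry their label as text, not as a value attribute.
            'value' => $node->getAttribute('value') ?: trim($node->textContent),
        ];
    }

    return $controls;
}

// Hypothetical markup standing in for the fetched page body.
$html = '<form action="/login"><input type="text" name="user">'
      . '<input type="submit" id="signin_button" name="go" value="Sign in"></form>';

print_r(listSubmitControls($html));
```

If the list comes back empty, the button is probably generated by JavaScript or you are not on the page you think you are.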

Related

Get html content from auth protected route inside laravel app?

I want to use Guzzle (cURL) to get the HTML content of another page inside my Laravel app.
The classic way would be:
$client = new Client();
$client = $client->request('GET', route('print.page'))->getBody();
The problem is that all these routes are auth protected, so I only get the HTML of my login page.
I tried to log in through Guzzle again, but I don't think a double login is a good idea.
Is there a better way to get the HTML from this protected route?
If you are calling this inside a controller and there is a currently authenticated user, you have to get the session name and the real session id:
public function FooController()
{
$name = Session::getName();
$sessionId = $_COOKIE[$name];
$cookieJar = CookieJar::fromArray([
$name => $sessionId,
], 'example.com');
$client = new Client();
$body = $client->request( // changed the variable from $client to $body here
'GET',
route('print.page'),
['cookies' => $cookieJar]
)->getBody();
}

PHP goutte screenshot

I have a PHP Goutte script that submits a couple of forms on a web page. The next thing I want to do is take a screenshot of the page the crawler is on and save it to a folder.
<?php
require_once __DIR__.'/vendor/autoload.php';
use Goutte\Client;
use Symfony\Component\DomCrawler\Crawler;
$client = new Client();
$crawler = $client->request('GET', 'https://login.siat.sat.gob.mx/nidp/idff/sso?id=mat-ptsc-totp&sid=10&option=credential&sid=10');
$form = $crawler->selectButton('Enviar')->form();
$crawler = $client->submit($form, array('Ecom_User_ID' => 'xxx', 'Ecom_Password' => 'xxx'));
$crawler = $client->request('GET', 'https://www.siat.sat.gob.mx/PTSC/');
echo $crawler->html();
Any ideas?

How to add form data on Post requests for Buzz HTTP Client on Laravel?

I'm using Buzz HTTP Client for Laravel.
I have a problem adding form data to my POST requests, since this isn't covered in its wiki/documentation.
Listed below are the two ways of sending requests.
Example 1:
$response = Buzz::post('http://api.website.com/login');
//how do I add a "username", and "password" field in my POST request?
echo $response;
echo $response->getContent();
Example 2:
$request = new Buzz\Message\Request('POST', '/', 'http://google.com');
$response = new Buzz\Message\Response();
//how do I add a "username", and "password" field in my POST request?
$client = new Buzz\Client\FileGetContents();
$client->send($request, $response);
echo $request;
echo $response;
The answer here really depends on what the API expects. Let's assume the API expects the username and password sent as JSON in the body of the request. The example HTTP request would look something like:
POST /login HTTP/1.1
Content-Type: application/json
{
"username": "bugsBunny",
"password": "wh4tsUpD0c"
}
To do this with Buzz, this should work:
$jsonPayload = json_encode([
'username' => 'bugsBunny',
'password' => 'wh4tsUpD0c'
]);
// each header is passed as a single "Name: value" string
$headers = ['Content-Type: application/json'];
$response = Buzz::post('http://api.website.com/login', $headers, $jsonPayload);
If you're attempting to submit a form on a given website, you shouldn't use the above method. Instead, use Buzz's built-in form request, which will attach the correct headers.
use Buzz\Message\Form;
$request = new Form(Form::METHOD_POST, 'login', 'api.website.com');
$request->setFields([
'username' => 'bugsBunny',
'password' => 'wh4tsUpD0c'
]);
$response = new Buzz\Message\Response();
$client = new Buzz\Client\Curl();
$client->send($request, $response);
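To make the difference between the two approaches concrete, here is a small sketch in plain PHP (no Buzz involved) of what the request body looks like in each case. The field values are the ones from the examples above; the variable names are my own.

```php
<?php
// The same two fields, encoded the two ways discussed above.
$fields = ['username' => 'bugsBunny', 'password' => 'wh4tsUpD0c'];

// JSON body, to be sent with "Content-Type: application/json".
$jsonBody = json_encode($fields);

// URL-encoded form body, to be sent with
// "Content-Type: application/x-www-form-urlencoded".
$formBody = http_build_query($fields, '', '&');

echo $jsonBody, PHP_EOL;
echo $formBody, PHP_EOL;
```

Which one you need depends entirely on what the receiving side parses, so check the API docs or the form's enctype first.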
On a side note, I'd suggest not using this library. The library is, as you stated, a Laravel integration for Buzz. The issue is that the author should have listed Buzz as a dependency in Composer rather than including the Buzz source directly. This prevents updates to Buzz from making their way into this project. You can see on the actual Buzz repo that the last commit was 29 days ago. Also, if another package uses Buzz and includes it correctly through Composer, Composer would install both copies, and when an instance of Buzz is created you can't be certain which version is being loaded. You should just use Buzz itself, which can be found on Packagist.
// assuming $headers and $jsonPayload are the same as in previous example.
$browser = new Buzz\Browser();
$response = $browser->post('http://api.website.com/login', $headers, $jsonPayload);
It was foolish of me not to read the code before asking.
The form data is actually passed as the third parameter of the function. It only accepts strings, though, so don't forget to JSON-encode your data.
Buzz Class
public function post($url, $headers = array(), $content = '')
{
....
....
}
Buzz::post($url, array(), json_encode(array('Username'=>'usernamexx','Password'=>'p#$$w0rD')) );

Is it possible to parse JSON with Goutte?

I'm working on crawling web sites, and so far I have had no problem parsing HTML with Goutte. But I need to retrieve JSON from a web site, and because of the cookie management I can't do this with file_get_contents() - that doesn't work.
I could do it with pure cURL, but in this case I just want to use Goutte and no other library.
So is there any way to parse plain text via Goutte, or do I really have to do this with the good old methods?
/* Sample Code */
$client = new Client();
$crawler = $client->request('foo');
$crawler = $crawler->filter('bar'); // of course not working
Thank you.
After a very deep search inside the Goutte libraries I found a way, and I wanted to share it, because Goutte is a really powerful library but its documentation is complicated.
Parsing JSON via (Goutte > Guzzle)
Just request the page you need and store the JSON in an array.
$client = new Client(); // Goutte Client
$request = $client->getClient()->createRequest('GET', 'http://***.json');
/* getClient() for taking Guzzle Client */
$response = $request->send(); // Send created request to server
$data = $response->json(); // Returns PHP Array
Parsing JSON with Cookies via (Goutte + Guzzle) - For authentication
Send a request to one of the site's pages (the main page works well) to get cookies, and then use these cookies for authentication.
$client = new Client(); // Goutte Client
$crawler = $client->request("GET", "http://foo.bar");
/* Send the request and get the whole response. It includes the cookies from the
server, which are automatically stored in the Goutte Client object */
$request = $client->getClient()->createRequest('GET', 'http://foo.bar/baz.json');
/* getClient() for taking Guzzle Client */
$cookies = $client->getRequest()->getCookies();
foreach ($cookies as $key => $value) {
$request->addCookie($key, $value);
}
/* Get cookies from Goutte Client and add to cookies in Guzzle request */
$response = $request->send(); // Send created request to server
$data = $response->json(); // Returns PHP Array
I hope it helps, because I spent almost 3 days understanding Goutte and its components.
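For what it's worth, copying the cookies over in that loop boils down to sending one Cookie header with all the name=value pairs. A tiny sketch of that in plain PHP, just to show what ends up on the wire; buildCookieHeader and the cookie names are hypothetical, and it assumes the values are already cookie-safe (no semicolons or whitespace).

```php
<?php
/**
 * Build the value of a Cookie request header from name => value pairs.
 * Assumes the values are already cookie-safe (no ";" or whitespace).
 */
function buildCookieHeader(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }

    // Pairs are separated by a semicolon followed by a space.
    return implode('; ', $pairs);
}

// Hypothetical cookie names and values.
echo 'Cookie: ' . buildCookieHeader(['PHPSESSID' => 'abc123', 'lang' => 'en']), PHP_EOL;
```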
I figured this out after several hours of searching. Simply do this:
$client = new Client(); // Goutte Client
$crawler = $client->request("GET", "http://foo.bar");
$jsonData = $crawler->text();
mithataydogmus' solution didn't work for me. I created a new class "BetterClient":
use Goutte\Client as GoutteClient;
class BetterClient extends GoutteClient
{
private $guzzleResponse;
public function getGuzzleResponse() {
return $this->guzzleResponse;
}
protected function createResponse($response)
{
$this->guzzleResponse = $response;
return parent::createResponse($response);
}
}
Usage:
$client = new BetterClient();
$request = $client->request('GET', $url);
$data = $client->getGuzzleResponse()->json();
I also could get JSON with:
$client->getResponse()->getContent()->getContents()
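Whichever way you get the raw body out of the client, you end up with a string that still has to be decoded. A minimal sketch of that step in plain PHP; decodeJsonBody is a name I made up, and throwing on bad input is just one reasonable choice (login pages returned instead of JSON are a common cause of failures here).

```php
<?php
/**
 * Decode a raw JSON response body into a PHP array, failing loudly on bad input.
 */
function decodeJsonBody(string $raw): array
{
    $data = json_decode($raw, true); // true => associative arrays instead of objects
    if (json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
    }
    if (!is_array($data)) {
        // Valid JSON, but a bare scalar such as 42 or "text".
        throw new RuntimeException('Expected a JSON object or array.');
    }

    return $data;
}

// Hypothetical body; in practice this is the string returned by the client.
$data = decodeJsonBody('{"id": 42, "tags": ["a", "b"]}');
echo $data['id'], PHP_EOL;
```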

Symfony2 - How to perform an external Request

Using Symfony2, I need to access an external API based on HTTPS.
How can I call an external URI and manage the response to "play" with it. For example, to render a success or a failure message?
I am thinking of something like this (note that performRequest is a completely invented method):
$response = $this -> performRequest("www.someapi.com?param1=A&param2=B");
if ($response -> getError() == 0){
// Do something good
}else{
// Do something too bad
}
I have been reading about Buzz and other clients, but I guess that Symfony2 should be able to do it on its own.
I'd suggest using CURL:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'www.someapi.com?param1=A&param2=B');
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-type: application/json')); // Assuming you're requesting JSON
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec($ch);
// If using JSON...
$data = json_decode($response);
Note: PHP on your web server must have the cURL extension installed (for example the php5-curl package).
Assuming the API request is returning JSON data, this page may be useful.
This doesn't use any code that is specific to Symfony2. There may well be a bundle that can simplify this process for you, but if there is I don't know about it.
Symfony doesn't have a built-in service for this, but this is a perfect opportunity to create your own, using the dependency injection framework. What you can do here is write a service to manage the external call. Let's call the service "http".
First, write a class with a performRequest() method:
namespace MyBundle\Service;
class Http
{
public function performRequest($siteUrl)
{
// Code to make the external request goes here
// ...probably using cUrl
}
}
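A rough sketch of what the method body could look like with the cURL extension. The return shape (an array with status and body), the timeouts, and the choice to throw a RuntimeException are my assumptions, not something Symfony prescribes. The namespace declaration is left out so the snippet runs on its own.

```php
<?php
// Sketch of the service body using the cURL extension (ext-curl must be loaded).
class Http
{
    /**
     * Perform a GET request and return the status code and body.
     * Throws RuntimeException when the transfer itself fails.
     */
    public function performRequest(string $siteUrl): array
    {
        $ch = curl_init($siteUrl);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);    // seconds to wait for the connection
        curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // seconds for the whole transfer

        $body = curl_exec($ch);
        if ($body === false) {
            throw new RuntimeException('Request failed: ' . curl_error($ch));
        }

        $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);

        return ['status' => $status, 'body' => $body];
    }
}
```

The caller can then branch on the status entry, which is roughly what the invented performRequest()/getError() pair in the question was after.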
Register it as a service in app/config/config.yml:
services:
http:
class: MyBundle\Service\Http
Now your controller has access to a service called "http". Symfony manages a single instance of this class in the "container", and you can access it via $this->get("http"):
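class MyController extends Controller
{
public function indexAction()
{
// $this->get() is provided by Symfony's base Controller class
$response = $this->get("http")->performRequest("www.something.com");
// ...
}
}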
The best client that I know of is: http://docs.guzzlephp.org/en/latest/
There is already a bundle that integrates it into a Symfony2 project:
https://github.com/8p/GuzzleBundle
$client = $this->get('guzzle.client');
// send an asynchronous request.
$request = $client->createRequest('GET', 'http://httpbin.org', ['future' => true]);
// callback
$client->send($request)->then(function ($response) {
echo 'I completed! ' . $response;
});
// optional parameters
$response = $client->get('http://httpbin.org/get', [
'headers' => ['X-Foo-Header' => 'value'],
'query' => ['foo' => 'bar']
]);
$code = $response->getStatusCode();
$body = $response->getBody();
// json response
$response = $client->get('http://httpbin.org/get');
$json = $response->json();
// extra methods
$response = $client->delete('http://httpbin.org/delete');
$response = $client->head('http://httpbin.org/get');
$response = $client->options('http://httpbin.org/get');
$response = $client->patch('http://httpbin.org/patch');
$response = $client->post('http://httpbin.org/post');
$response = $client->put('http://httpbin.org/put');
More info can be found on: http://docs.guzzlephp.org/en/latest/index.html
https://github.com/sensio/SensioBuzzBundle seems to be what you are looking for.
It implements the Kris Wallsmith buzz library to perform HTTP requests.
I'll let you read the doc on the github page, usage is pretty basic:
$buzz = $this->container->get('buzz');
$response = $buzz->get('http://google.com');
echo $response->getContent();
Symfony does not have its own REST client, but as you already mentioned there are a couple of bundles. This one is my preferred one:
https://github.com/CircleOfNice/CiRestClientBundle
$restClient = $this->container->get('ci.restclient');
$restClient->get('http://www.someUrl.com');
$restClient->post('http://www.someUrl.com', 'somePayload');
$restClient->put('http://www.someUrl.com', 'somePayload');
$restClient->delete('http://www.someUrl.com');
$restClient->patch('http://www.someUrl.com', 'somePayload');
$restClient->head('http://www.someUrl.com');
$restClient->options('http://www.someUrl.com', 'somePayload');
$restClient->trace('http://www.someUrl.com');
$restClient->connect('http://www.someUrl.com');
You send the request via
$response = $restClient->get($url);
and get a Symfony response object.
Then you can get the status code via
$httpCode = $response->getStatusCode();
Your code would look like:
$restClient = $this->container->get('ci.restclient');
if ($restClient->get('http://www.yourUrl.com')->getStatusCode() === 200) {
// no error
} else {
// error
}
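One caveat: the check above looks only at 200, while APIs often answer with other 2xx codes such as 201 or 204. If that matters for your API, a small helper like the following could be used instead; isSuccessStatus is a hypothetical name, not part of the bundle.

```php
<?php
/** True for any 2xx status code, not only 200. */
function isSuccessStatus(int $statusCode): bool
{
    return $statusCode >= 200 && $statusCode < 300;
}

foreach ([200, 204, 404] as $code) {
    echo $code, ' => ', isSuccessStatus($code) ? 'ok' : 'error', PHP_EOL;
}
```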
Use the HttpClient class to create the low-level HTTP client that makes requests, like the following GET request:
use Symfony\Component\HttpClient\HttpClient;
$client = HttpClient::create();
$response = $client->request('GET', 'https://api.github.com/repos/symfony/symfony-docs');
$statusCode = $response->getStatusCode();
// $statusCode = 200
$contentType = $response->getHeaders()['content-type'][0];
// $contentType = 'application/json'
$content = $response->getContent();
// $content = '{"id":521583, "name":"symfony-docs", ...}'
$content = $response->toArray();
// $content = ['id' => 521583, 'name' => 'symfony-docs', ...]
This is compatible with Symfony 5. Symfony Manual on this topic: The HttpClient Component
