I'm crawling web sites and have had no problem parsing HTML with Goutte so far. But I need to retrieve JSON from a web site, and because of the cookie management I don't want to do this with file_get_contents(), which doesn't work.
I could do it with pure cURL, but in this case I want to use only Goutte and no other library.
So is there any way to parse plain text via Goutte, or do I really have to fall back on the good old methods?
/* Sample Code */
$client = new Client();
$crawler = $client->request('foo');
$crawler = $crawler->filter('bar'); // of course not working
Thank you.
After a very deep search inside the Goutte libraries I found a way, and I wanted to share it, because Goutte is a really powerful library but its documentation is complicated.
Parsing JSON via (Goutte > Guzzle)
Just request the page you need and store the JSON in an array.
$client = new Client(); // Goutte Client
$request = $client->getClient()->createRequest('GET', 'http://***.json');
/* getClient() returns the underlying Guzzle client */
$response = $request->send(); // Send created request to server
$data = $response->json(); // Returns PHP Array
Parsing JSON with Cookies via (Goutte + Guzzle) - For authentication
Send a request to one of the site's pages (the main page works well) to get the cookies, then use those cookies for authentication.
$client = new Client(); // Goutte Client
$crawler = $client->request("GET", "http://foo.bar");
/* Send the request directly and get the whole response. It includes the cookies
from the server, which are automatically stored in the Goutte Client object */
$request = $client->getClient()->createRequest('GET', 'http://foo.bar/baz.json');
/* getClient() returns the underlying Guzzle client */
$cookies = $client->getRequest()->getCookies();
foreach ($cookies as $key => $value) {
$request->addCookie($key, $value);
}
/* The loop above copies the cookies from the Goutte client onto the Guzzle request */
$response = $request->send(); // Send created request to server
$data = $response->json(); // Returns PHP Array
I hope it helps, because I spent almost 3 days understanding Goutte and its components.
I figured this out after several hours of searching. Simply do this:
$client = new Client(); // Goutte Client
$crawler = $client->request("GET", "http://foo.bar");
$jsonData = $crawler->text();
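Whichever call returns the body, the result is still a raw string, so it has to be decoded before it can be used as data. A minimal sketch, assuming PHP 7.3+ (for JSON_THROW_ON_ERROR); decodeJsonBody() is a hypothetical helper name, not part of Goutte, and the sample body is made up:

```php
<?php
/**
 * Hypothetical helper: decode a raw JSON response body into a PHP array.
 * Throws JsonException on malformed input instead of silently returning null.
 */
function decodeJsonBody(string $body): array
{
    $data = json_decode($body, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($data)) {
        // Valid JSON, but a bare scalar or null rather than an object/array.
        throw new UnexpectedValueException('Expected a JSON object or array.');
    }
    return $data;
}

// In the snippet above, $body would be the string returned by $crawler->text().
$body = '{"status": "ok", "items": [1, 2, 3]}';
$data = decodeJsonBody($body);
echo $data['status'], PHP_EOL;
```

Failing loudly on bad input makes it obvious when the crawler returned an HTML error page instead of JSON.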
mithataydogmus' solution didn't work for me. I created a new class "BetterClient":
use Goutte\Client as GoutteClient;
class BetterClient extends GoutteClient
{
private $guzzleResponse;
public function getGuzzleResponse() {
return $this->guzzleResponse;
}
protected function createResponse($response)
{
$this->guzzleResponse = $response;
return parent::createResponse($response);
}
}
Usage:
$client = new BetterClient();
$request = $client->request('GET', $url);
$data = $client->getGuzzleResponse()->json();
I also could get JSON with:
$client->getResponse()->getContent()->getContents()
Related
I'm using Buzz HTTP Client for Laravel.
I have a problem adding form data to my POST requests, since this isn't covered in its wiki/documentation.
Listed below are the two ways of sending requests.
Example 1:
$response = Buzz::post('http://api.website.com/login');
//how do I add a "username", and "password" field in my POST request?
echo $response;
echo $response->getContent();
Example 2:
$request = new Buzz\Message\Request('POST', '/', 'http://google.com');
$response = new Buzz\Message\Response();
//how do I add a "username", and "password" field in my POST request?
$client = new Buzz\Client\FileGetContents();
$client->send($request, $response);
echo $request;
echo $response;
The answer here really depends on what the API expects. Let's assume the API expects the username and password sent as JSON in the content of the request. The example HTTP request would look something like:
POST /login HTTP/1.1
Content-Type: application/json
{
"username": "bugsBunny",
"password": "wh4tsUpD0c"
}
To do this with Buzz, this should work:
$jsonPayload = json_encode([
'username' => 'bugsBunny',
'password' => 'wh4tsUpD0c'
]);
$headers = ['Content-Type: application/json'];
$response = Buzz::post('http://api.website.com/login', $headers, $jsonPayload);
If you're attempting to submit a form on a given website, you shouldn't use the above method. Instead use Buzz's built-in form class, which will attach the correct headers.
use Buzz\Message\Form;
$request = new Form(Form::METHOD_POST, 'login', 'api.website.com');
$request->setFields([
'username' => 'bugsBunny',
'password' => 'wh4tsUpD0c'
]);
$response = new Buzz\Message\Response();
$client = new Buzz\Client\Curl();
$client->send($request, $response);
On a side note, I'd suggest not using this library. The library is, as you stated, a Laravel integration for Buzz. The issue is that the author should have listed Buzz as a Composer dependency rather than including the Buzz source directly. This prevents updates to Buzz from making their way into this project. You can see on the actual Buzz repo that the last commit was 29 days ago. Also, if another package uses Buzz and includes it correctly via Composer, Composer would install both copies, and when an instance of Buzz is created you couldn't be certain which version was being loaded. You should just use Buzz itself, which can be found on Packagist.
// assuming $headers and $jsonPayload are the same as in previous example.
$browser = new Buzz\Browser();
$response = $browser->post('http://api.website.com/login', $headers, $jsonPayload);
It was foolish of me not to read the code before asking.
The form data is actually passed as the third parameter of the function. It accepts strings only, though, so don't forget to JSON-encode your data.
Buzz Class
public function post($url, $headers = array(), $content = '')
{
....
....
}
Buzz::post($url, array(), json_encode(array('Username'=>'usernamexx','Password'=>'p#$$w0rD')) );
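Since the third parameter is just a string, the encoding decides whether the server sees JSON or a classic form submission. A sketch of both encodings in plain PHP; the helper names are made up for illustration, and which encoding the login endpoint expects is an assumption you would need to check against the API:

```php
<?php
/** Hypothetical helper: JSON body plus the matching header line. */
function jsonBody(array $fields): array
{
    return [
        'headers' => ['Content-Type: application/json'],
        'content' => json_encode($fields),
    ];
}

/** Hypothetical helper: URL-encoded form body plus the matching header line. */
function formBody(array $fields): array
{
    return [
        'headers' => ['Content-Type: application/x-www-form-urlencoded'],
        'content' => http_build_query($fields, '', '&'),
    ];
}

$fields = ['username' => 'bugsBunny', 'password' => 'wh4tsUpD0c'];
$json = jsonBody($fields);
$form = formBody($fields);
echo $json['content'], PHP_EOL;
echo $form['content'], PHP_EOL;
// Either pair could then be handed to the call shown above, e.g.:
// Buzz::post($url, $form['headers'], $form['content']);
```

A server-side handler that reads $_POST will generally only see the fields in the URL-encoded variant.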
I asked a similar question earlier. In a nutshell, I have an API application that takes JSON requests and outputs a JSON response.
For instance, here is one of the requests that I need to test. How can I use this JSON object in my testing to emulate a 'real request'?
{
"request" : {
"model" : {
"code" : "PR92DK1Z"
}
}
}
The response is straightforward (this bit has been done).
According to other users on here, this is the optimised way to do this in Yii. I am just unsure how to emulate the JSON request, i.e. how to send a JSON HTTP request. Can anyone assist with this?
public function actionMyRequest() {
// somehow add my json request...
$requestBody = Yii::app()->request->getRawBody();
$parsedRequest = CJSON::decode($requestBody);
$code = $parsedRequest["request"]["model"]["code"];
}
I don't understand whether you want your app to send an HTTP request and get the result or, on the contrary, to receive an HTTP request.
I answered for the first assumption; I'll change my answer if you want the other.
For me the best way to send an HTTP request is to use Guzzle http client.
This is not a Yii extension, but you can use third-party libraries with Yii.
Here's an example from Guzzle page:
$client = new GuzzleHttp\Client();
$res = $client->get('https://api.github.com/user', [
'auth' => ['user', 'pass']
]);
echo $res->getStatusCode(); // 200
echo $res->getHeader('content-type'); // 'application/json; charset=utf8'
echo $res->getBody();
So in your case you could do something like:
public function actionMyRequest() {
$client = new GuzzleHttp\Client();
$res = $client->get('https://api.your-url.com/');
$requestBody = $res->getBody();
$parsedRequest = CJSON::decode($requestBody);
$code = $parsedRequest["request"]["model"]["code"];
}
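To emulate the incoming request itself, the JSON document would have to travel as the raw POST body, since that is what getRawBody() reads. A rough sketch using only PHP's built-in HTTP stream wrapper; buildJsonPostContext() is a hypothetical helper name and the URL in the commented-out call is a placeholder, not a real route:

```php
<?php
/**
 * Hypothetical helper: build a stream context that POSTs $payload as JSON.
 */
function buildJsonPostContext(array $payload)
{
    return stream_context_create([
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: application/json\r\n",
            'content' => json_encode($payload),
        ],
    ]);
}

$payload = ['request' => ['model' => ['code' => 'PR92DK1Z']]];
$context = buildJsonPostContext($payload);

// Placeholder URL; uncomment once it points at the real action.
// $responseBody = file_get_contents('http://localhost/index.php?r=site/myRequest', false, $context);

// What the action would find in the raw request body:
$options = stream_context_get_options($context);
echo $options['http']['content'], PHP_EOL;
```

The same body and header could equally be sent through Guzzle or cURL; the stream wrapper is used here only to keep the sketch dependency-free.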
I'm trying to access the Guzzle Response object from Goutte, because that object has nice methods that I want to use, getEffectiveUrl for example.
As far as I can see, there is no way of doing it without hacking the code.
Or, without accessing the response object, is there a way to get the last redirected URL from Goutte?
A little late, but:
If you are only interested in getting the URL you were last redirected to, you could simply do
$client = new Goutte\Client();
$crawler = $client->request('GET', 'http://www.example.com');
$url = $client->getHistory()->current()->getUri();
EDIT:
But extending Goutte to serve your needs is fairly easy. All you need to do is override the createResponse() method and store the GuzzleResponse:
namespace Your\Name\Space;
class Client extends \Goutte\Client
{
protected $guzzleResponse;
protected function createResponse(\Guzzle\Http\Message\Response $response)
{
$this->guzzleResponse = $response;
return parent::createResponse($response);
}
/**
* @return \Guzzle\Http\Message\Response
*/
public function getGuzzleResponse()
{
return $this->guzzleResponse;
}
}
Then you can access the response object as desired
$client = new Your\Name\Space\Client();
$crawler = $client->request('GET', 'http://localhost/redirect');
$response = $client->getGuzzleResponse();
echo $response->getEffectiveUrl();
I'm building a client app based on Guzzle, and I'm stuck on cookie handling. I'm trying to implement it using the Cookie plugin but I cannot get it to work. My client application is a standard web application, and it looks like it works as long as I'm using the same Guzzle object, but across requests it doesn't send the right cookies. I'm using FileCookieJar for storing cookies. How can I keep cookies across multiple Guzzle objects?
// first request with login works fine
$cookiePlugin = new CookiePlugin(new FileCookieJar('/tmp/cookie-file'));
$client->addSubscriber($cookiePlugin);
$client->post('/login');
$client->get('/test/123.php?a=b');
// second request where I expect it working, but it's not...
$cookiePlugin = new CookiePlugin(new FileCookieJar('/tmp/cookie-file'));
$client->addSubscriber($cookiePlugin);
$client->get('/another-test/456');
You are creating a new instance of the CookiePlugin for the second request. You have to reuse the first instance for the second (and subsequent) requests as well.
$cookiePlugin = new CookiePlugin(new FileCookieJar('/tmp/cookie-file'));
//First Request
$client = new Guzzle\Http\Client();
$client->addSubscriber($cookiePlugin);
$client->post('/login');
$client->get('/test/first');
//Second Request, same client
// No need for $cookiePlugin = new CookiePlugin(...
$client->get('/test/second');
//Third Request, new client, same cookies
$client2 = new Guzzle\Http\Client();
$client2->addSubscriber($cookiePlugin); //uses same instance
$client2->get('/test/third');
$cookiePlugin = new CookiePlugin(new FileCookieJar($cookie_file_name));
// Add the cookie plugin to a client
$client = new Client($domain);
$client->addSubscriber($cookiePlugin);
// Send the request with no cookies and parse the returned cookies
$client->get($domain)->send();
// Send the request again, noticing that cookies are being sent
$request = $client->get($domain);
$request->send();
print_r ($request->getCookies());
The current answers work if all requests are made within the same user request. But they won't work if the user first logs in, then navigates through the site and queries the "Domain" again later.
Here is my solution (with ArrayCookieJar()):
Login
$cookiePlugin = new CookiePlugin(new ArrayCookieJar());
//First Request
$client = new Client($domain);
$client->addSubscriber($cookiePlugin);
$request = $client->post('/login');
$response = $request->send();
// Retrieve the cookie to save it somehow
$cookiesArray = $cookiePlugin->getCookieJar()->all($domain);
$cookie = $cookiesArray[0]->toArray();
// Save in session or cache of your app.
// For example, in Laravel:
Cache::put('cookie', $cookie, 30);
Other request
// Create a new client object
$client = new Client($domain);
// Get the previously stored cookie
// Here is an example for Laravel
$cookie = Cache::get('cookie');
// Create the new CookiePlugin object
$cookie = new Cookie($cookie);
$cookieJar = new ArrayCookieJar();
$cookieJar->add($cookie);
$cookiePlugin = new CookiePlugin($cookieJar);
$client->addSubscriber($cookiePlugin);
// Then you can run other queries with this cookie
$request = $client->get('/getData');
$response = $request->send();
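Note that the login step above keeps only $cookiesArray[0], so a site that sets several cookies would lose all but the first. A storage-agnostic sketch that round-trips the whole list; plain arrays stand in for the toArray() output, the field names are illustrative only, and the helper names and file name are hypothetical:

```php
<?php
/** Hypothetical helper: persist every cookie array, not just the first. */
function saveCookies(string $file, array $cookies): void
{
    file_put_contents($file, json_encode($cookies), LOCK_EX);
}

/** Hypothetical helper: load the stored cookie arrays, or none if absent. */
function loadCookies(string $file): array
{
    if (!is_file($file)) {
        return [];
    }
    $cookies = json_decode(file_get_contents($file), true);
    return is_array($cookies) ? $cookies : [];
}

// Stand-in data shaped loosely like what the login step would produce.
$cookies = [
    ['name' => 'session', 'value' => 'abc123', 'domain' => 'example.com'],
    ['name' => 'csrf', 'value' => 'xyz789', 'domain' => 'example.com'],
];
$file = sys_get_temp_dir() . '/cookies-sketch.json';
saveCookies($file, $cookies);
foreach (loadCookies($file) as $cookie) {
    echo $cookie['name'], '=', $cookie['value'], PHP_EOL;
}
unlink($file);
```

Each loaded array would then go through new Cookie(...) and $cookieJar->add(...) in a loop, in the same way the single cookie is restored above. A session or cache entry keyed per user would replace the temp file in a real application.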
Using Symfony2, I need to access an external API based on HTTPS.
How can I call an external URI and manage the response to "play" with it, for example to render a success or failure message?
I am thinking of something like this (note that performRequest is a completely invented method):
$response = $this -> performRequest("www.someapi.com?param1=A&param2=B");
if ($response -> getError() == 0){
// Do something good
}else{
// Do something too bad
}
I have been reading about Buzz and other clients. But I guess that Symfony2 should be able to do it on its own.
I'd suggest using CURL:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'www.someapi.com?param1=A&param2=B');
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-type: application/json')); // Assuming you're requesting JSON
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec($ch);
// If using JSON...
$data = json_decode($response);
Note: The PHP installation on your web server must have the php5-curl extension installed.
Assuming the API request is returning JSON data, this page may be useful.
This doesn't use any code that is specific to Symfony2. There may well be a bundle that can simplify this process for you, but if there is I don't know about it.
Symfony doesn't have a built-in service for this, but this is a perfect opportunity to create your own, using the dependency injection framework. What you can do here is write a service to manage the external call. Let's call the service "http".
First, write a class with a performRequest() method:
namespace MyBundle\Service;
class Http
{
public function performRequest($siteUrl)
{
// Code to make the external request goes here
// ...probably using cURL
}
}
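One way the placeholder body could be filled in is sketched below, assuming the cURL extension is available. The namespace line is left out so the file runs standalone; the return shape, the timeout value and the isSuccess() helper are illustrative choices, not something Symfony prescribes:

```php
<?php
class Http
{
    /**
     * Performs a GET request and returns the status code and body.
     *
     * @return array with keys 'status' (int) and 'body' (string)
     * @throws RuntimeException when the transfer itself fails
     */
    public function performRequest(string $siteUrl): array
    {
        $ch = curl_init($siteUrl);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return body instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // seconds; arbitrary choice

        $body = curl_exec($ch);
        if ($body === false) {
            throw new RuntimeException('Request failed: ' . curl_error($ch));
        }

        $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);

        return ['status' => $status, 'body' => $body];
    }

    /** True for any 2xx status code. */
    public static function isSuccess(int $status): bool
    {
        return $status >= 200 && $status < 300;
    }
}
```

A controller could then branch on Http::isSuccess($result['status']) to render the success or failure message the question asks about.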
Register it as a service in app/config/config.yml:
services:
http:
class: MyBundle\Service\Http
Now your controller has access to a service called "http". Symfony manages a single instance of this class in the "container", and you can access it via $this->get("http"):
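class MyController extends Controller
{
// indexAction is a hypothetical action name; the call must live inside a method
public function indexAction()
{
$response = $this->get("http")->performRequest("www.something.com");
...
}
}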
The best client that I know of is: http://docs.guzzlephp.org/en/latest/
There is already a bundle that integrates it into a Symfony2 project:
https://github.com/8p/GuzzleBundle
$client = $this->get('guzzle.client');
// send an asynchronous request.
$request = $client->createRequest('GET', 'http://httpbin.org', ['future' => true]);
// callback
$client->send($request)->then(function ($response) {
echo 'I completed! ' . $response;
});
// optional parameters
$response = $client->get('http://httpbin.org/get', [
'headers' => ['X-Foo-Header' => 'value'],
'query' => ['foo' => 'bar']
]);
$code = $response->getStatusCode();
$body = $response->getBody();
// json response
$response = $client->get('http://httpbin.org/get');
$json = $response->json();
// extra methods
$response = $client->delete('http://httpbin.org/delete');
$response = $client->head('http://httpbin.org/get');
$response = $client->options('http://httpbin.org/get');
$response = $client->patch('http://httpbin.org/patch');
$response = $client->post('http://httpbin.org/post');
$response = $client->put('http://httpbin.org/put');
More info can be found on: http://docs.guzzlephp.org/en/latest/index.html
https://github.com/sensio/SensioBuzzBundle seems to be what you are looking for.
It implements the Kris Wallsmith buzz library to perform HTTP requests.
I'll let you read the doc on the github page, usage is pretty basic:
$buzz = $this->container->get('buzz');
$response = $buzz->get('http://google.com');
echo $response->getContent();
Symfony does not have its own REST client, but as you already mentioned there are a couple of bundles. This one is my preferred one:
https://github.com/CircleOfNice/CiRestClientBundle
$restClient = $this->container->get('ci.restclient');
$restClient->get('http://www.someUrl.com');
$restClient->post('http://www.someUrl.com', 'somePayload');
$restClient->put('http://www.someUrl.com', 'somePayload');
$restClient->delete('http://www.someUrl.com');
$restClient->patch('http://www.someUrl.com', 'somePayload');
$restClient->head('http://www.someUrl.com');
$restClient->options('http://www.someUrl.com', 'somePayload');
$restClient->trace('http://www.someUrl.com');
$restClient->connect('http://www.someUrl.com');
You send the request via
$response = $restClient->get($url);
and get a Symfony response object.
Then you can get the status code via
$httpCode = $response->getStatusCode();
Your code would look like:
$restClient = $this->container->get('ci.restclient');
if ($restClient->get('http://www.yourUrl.com')->getStatusCode() === 200) {
// no error
} else {
// error
}
Use the HttpClient class to create the low-level HTTP client that makes requests, like the following GET request:
use Symfony\Component\HttpClient\HttpClient;
$client = HttpClient::create();
$response = $client->request('GET', 'https://api.github.com/repos/symfony/symfony-docs');
$statusCode = $response->getStatusCode();
// $statusCode = 200
$contentType = $response->getHeaders()['content-type'][0];
// $contentType = 'application/json'
$content = $response->getContent();
// $content = '{"id":521583, "name":"symfony-docs", ...}'
$content = $response->toArray();
// $content = ['id' => 521583, 'name' => 'symfony-docs', ...]
This is compatible with Symfony 5. Symfony Manual on this topic: The HttpClient Component