php-html-parser: How to follow redirects

https://github.com/paquettg/php-html-parser
Does anybody know how to follow redirects in this library?
For example:
require "vendor/autoload.php";
use PHPHtmlParser\Dom;
$dom = new Dom;
$dom->loadFromUrl($html);

Versions:
guzzlehttp/guzzle: "7.2.0"
paquettg/php-html-parser: "3.1.1"
Why does the library not natively allow redirects?
The loadFromUrl method has the following implementation (as of version 3.1.1):
public function loadFromUrl(string $url, ?Options $options = null, ?ClientInterface $client = null, ?RequestInterface $request = null): Dom
{
    if ($client === null) {
        $client = new Client();
    }
    if ($request === null) {
        $request = new Request('GET', $url);
    }
    $response = $client->sendRequest($request);
    $content = $response->getBody()->getContents();
    return $this->loadStr($content, $options);
}
The line $response = $client->sendRequest($request); calls into Guzzle's Client (https://github.com/guzzle/guzzle/blob/master/src/Client.php#L131):
/**
 * The HttpClient PSR (PSR-18) specify this method.
 *
 * @inheritDoc
 */
public function sendRequest(RequestInterface $request): ResponseInterface
{
    $options[RequestOptions::SYNCHRONOUS] = true;
    $options[RequestOptions::ALLOW_REDIRECTS] = false;
    $options[RequestOptions::HTTP_ERRORS] = false;
    return $this->sendAsync($request, $options)->wait();
}
The line $options[RequestOptions::ALLOW_REDIRECTS] = false; turns redirects off unconditionally. Whatever you configure on the Client or the Request, sendRequest overrides it.
How to follow redirects with the library
loadFromUrl only makes the request, reads the response body and passes it to loadStr. We can do the same ourselves with Guzzle (it is already a dependency of the library), using a call that follows redirects.
<?php
use GuzzleHttp\Client;
use GuzzleHttp\Exception\GuzzleException;
use PHPHtmlParser\Dom;

// Include the autoloader
include_once("vendor/autoload.php");

$client = new Client();
try {
    // allow_redirects is shown for verbosity's sake. It is on by default with Guzzle clients.
    $response = $client->request('GET', 'http://theeasyapi.com', ['allow_redirects' => true]);
    // This would work exactly the same
    //$response = $client->request('GET', 'http://theeasyapi.com');
} catch (GuzzleException $e) {
    // Probably do something with $e
    var_dump($e->getMessage());
    exit;
}

$dom = new Dom();
// Parse the body of the final response (after any redirects)
$domExample = $dom->loadStr($response->getBody()->getContents());
foreach ($domExample->find('a') as $link) {
    var_dump($link->text);
}
The code above will instantiate a new Guzzle Client, and make a request to the URL allowing redirects. The website used in this example is a site that will 301 redirect from non-secure to secure.

Related

I'm trying PHP Amp client but it's not working, keeps returning error

I'm trying to use the Amp HTTP client to fetch a page's content, but it keeps failing. I've installed the package and am trying the example given in the docs, but I can't figure out why it's not working. Here's the code:
namespace App\Http\Controllers;
use Amp\Http\Client\HttpClientBuilder;
use Amp\Http\Client\Request;
use Amp\Http\Client\Response;
//use Illuminate\Http\Request;
class AmpConcurrentRequestsController extends Controller
{
    public function ampTest1()
    {
        $httpClient = HttpClientBuilder::buildDefault();
        $request = new Request('GET', 'http://example.com');
        $promise = $httpClient->request($request);
        /** @var Response $response */
        $response = Amp\wait($promise);
        $statusCode = $response->getStatus();
        $body = yield $response->getBody()->buffer();
    }
}
I get this error:
Symfony\Component\HttpFoundation\Response::setContent(): Argument #1
($content) must be of type ?string, Generator given, called in
C:\xampp\htdocs\laundarySaaS\vendor\laravel\framework\src\Illuminate\Http\Response.php
on line 72
What eventually worked for me is the code below.
public function ampTest1()
{
    // Create a new HTTP client
    $httpClient = HttpClientBuilder::buildDefault();
    // Send a GET request to the specified URL
    $request = new Request('https://example.com');
    $promise = $httpClient->request($request);
    // Promise\wait() is Amp\Promise\wait(), so the file presumably also needs: use Amp\Promise;
    /** @var Response $response */
    $response = Promise\wait($promise);
    // Get the response code
    $code = $response->getStatus();
    // Do something with the response code
    echo $code;
}
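A likely reason the original version failed, judging from the error message: any PHP function whose body contains `yield` is a generator function, so calling it returns a Generator object instead of running the body. The dependency-free sketch below shows just that behaviour. `withYield` and `withoutYield` are illustrative names, not part of Amp or Laravel.

```php
<?php
// Hypothetical stand-in for a controller action that contains `yield`.
// Calling it does not run the body; it returns a Generator.
function withYield()
{
    $body = yield 'pretend this is $response->getBody()->buffer()';
    return $body;
}

// The same idea without `yield`: the body runs and a string comes back.
function withoutYield(): string
{
    return 'page content';
}

$a = withYield();
$b = withoutYield();

var_dump($a instanceof Generator); // bool(true)
var_dump(is_string($b));           // bool(true)
```

That would explain why Laravel received a Generator as the response content. The working version above contains no `yield` and blocks on the promise instead.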

Are all existing endpoints listed in documentation really still working for v4.9? (i.e. those not replaced by v1 so far)

I have tried using old v4.9 endpoints that haven't been replaced by v1 so far such as:
https://developers.google.com/my-business/reference/rest/v4/accounts.locations/reportInsights
https://developers.google.com/my-business/reference/rest/v4/accounts.locations.reviews
However, none of these endpoints work anymore.
The PHP client I am using is missing these endpoints, but with the official v4.9 library listed here: https://developers.google.com/my-business/samples/previousVersions I have been able to reach some of the old endpoints, such as reviews.
However, they no longer return any data, or the data object is empty.
Has anyone experienced similar issues?
The v4.9 (not yet deprecated) endpoints such as reviews and insights are working, but the official library is broken.
I had to write a replacement that uses a Guzzle client to call the endpoints directly. In other words, for these v4.9 endpoints you need to write the API calls yourself, because the official library does not work.
How to fetch reviews:
public static function listReviews($client, $params, $account, $location)
{
    $response = $client->authorize()->get('https://mybusiness.googleapis.com/v4/' . $account . '/' . $location . '/reviews', ['query' => $params]);
    return json_decode((string) $response->getBody(), false);
}
How to fetch insights:
/** v4.9 working 02/2022 **/
public static function reportInsights($client, $params, $account)
{
    try {
        $response = $client->authorize()->post('https://mybusiness.googleapis.com/v4/' . $account . '/locations:reportInsights', [
            \GuzzleHttp\RequestOptions::JSON => $params,
        ]);
    } catch (\GuzzleHttp\Exception\RequestException $ex) {
        return $ex->getResponse()->getBody()->getContents();
    }
    return json_decode((string) $response->getBody(), false);
}
How to prepare payload for insights:
$params = new \stdClass();
$params->locationNames = $account->name . '/' . $location->name;
$time_range = new \stdClass();
$time_range->startTime = Carbon::parse('3 days ago 00:00:00')->toISOString();
$time_range->endTime = Carbon::parse('2 days ago 00:00:00')->toISOString();
if ($force == 'complete') {
    $time_range->startTime = Carbon::parse('17 months ago 00:00:00')->toIso8601ZuluString();
    $time_range->endTime = Carbon::parse('3 days ago 00:00:00')->toIso8601ZuluString();
}
$params->basicRequest = new \stdClass();
$params->basicRequest->timeRange = $time_range;
$params->basicRequest->metricRequests = new \stdClass();
$metric_request = new \stdClass();
$metric_request->metric = 'ALL';
$metric_request->options = ['AGGREGATED_DAILY'];
$params->basicRequest->metricRequests = [
    $metric_request,
];
Note: if you are getting an empty insights response, check the location's verification state using a new v1 API call such as:
$verifications = (new \Google_Service_MyBusinessVerifications($client))->locations_verifications->listLocationsVerifications($location->getName());
$verification = '0';
if ($verifications->getVerifications()) {
    $verification = $verifications->getVerifications()[0]->getState();
}
Using the official API client with an existing token (which needs to be fetched via OAuth2):
$provider = new GoogleClientServiceProvider(true);
$client = $provider->initializeClient($known_token, ['https://www.googleapis.com/auth/plus.business.manage', 'https://www.googleapis.com/auth/drive']);

Unable to get routing to work in php application using symfony routing

index.php
require "vendor/autoload.php";
require "routes.php";
routes.php
<?php
require "vendor/autoload.php";
use Symfony\Component\Routing\Matcher\UrlMatcher;
use Symfony\Component\Routing\RequestContext;
use Symfony\Component\Routing\RouteCollection;
use Symfony\Component\Routing\Route;
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\Routing\Generator\UrlGenerator;
use Symfony\Component\Routing\Exception\ResourceNotFoundException;
try {
    $form_add_route = new Route(
        '/blog/add',
        array(
            'controller' => '\HAPBlog\Controller\EntityAddController',
            'method' => 'load'
        )
    );
    $routes = new RouteCollection();
    $routes->add('blog_add', $form_add_route);
    // Init RequestContext object
    $context = new RequestContext();
    $context->fromRequest(Request::createFromGlobals());
    $matcher = new UrlMatcher($routes, $context);
    $parameters = $matcher->match($context->getPathInfo());
    // How to generate a SEO URL
    $generator = new UrlGenerator($routes, $context);
    $url = $generator->generate('blog_add');
    echo $url;
} catch (Exception $e) {
    echo '<pre>';
    print_r($e->getMessage());
}
src/Controller/EntityAddController.php
<?php
namespace HAPBlog\Controller;
use Symfony\Component\HttpFoundation\Response;
class EntityAddController {
    public function load() {
        return new Response('ENTERS');
    }
}
I am referring to the tutorial given below:
https://code.tutsplus.com/tutorials/set-up-routing-in-php-applications-using-the-symfony-routing-component--cms-31231
But when I try to access http://example.com/routes.php/blog/add, it gives a blank page.
Debugging via PHPStorm shows that it never enters the EntityAddController class.
What is incorrect in the above code?
There is no magic behind this process. The matcher only gives you the route information; once you have it, you have to call the configured controller yourself and send the response content.
Here is a complete example:
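// controllers.php
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;

class BlogController
{
    public static function add(Request $request)
    {
        return new Response('Add page!');
    }
}

// routes.php
use Symfony\Component\Routing\Route;
use Symfony\Component\Routing\RouteCollection;

$routes = new RouteCollection();
$routes->add('blog_add', new Route('/blog/add', [
    'controller' => 'BlogController::add',
]));

// index.php
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;
use Symfony\Component\Routing\Exception\ResourceNotFoundException;
use Symfony\Component\Routing\Matcher\UrlMatcher;
use Symfony\Component\Routing\RequestContext;

require "vendor/autoload.php";
require "controllers.php";
require "routes.php";

$request = Request::createFromGlobals();
$context = new RequestContext();
$context->fromRequest($request);
$matcher = new UrlMatcher($routes, $context);
try {
    // match() returns the route defaults; it does not call anything
    $attributes = $matcher->match($request->getPathInfo());
    // Call the configured controller ourselves
    $response = $attributes['controller']($request);
} catch (ResourceNotFoundException $exception) {
    $response = new Response('Not Found', 404);
} catch (Exception $exception) {
    $response = new Response('An error occurred', 500);
}
$response->send();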
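The question's route stores the class under 'controller' and the method name under 'method', rather than a single callable string. A minimal sketch of dispatching that shape is below. It is dependency-free on purpose: `dispatch` is a hypothetical helper (not part of Symfony), the controller is a stand-in that returns a plain string, and the `$parameters` array is an assumed example of what the matcher returns for that route (its defaults plus the route name under '_route').

```php
<?php
// Stand-in for \HAPBlog\Controller\EntityAddController from the question.
// It returns a plain string so the sketch needs no Symfony classes.
class EntityAddController
{
    public function load(): string
    {
        return 'ENTERS';
    }
}

// Hypothetical helper: instantiate the controller named in the matched
// parameters and call the configured method on it.
function dispatch(array $parameters): string
{
    $class = $parameters['controller'];
    $method = $parameters['method'];
    if (!class_exists($class) || !method_exists($class, $method)) {
        throw new RuntimeException("Cannot dispatch to {$class}::{$method}");
    }
    return (new $class())->$method();
}

// Assumed shape of the matcher's result for the question's route.
$parameters = [
    'controller' => 'EntityAddController',
    'method'     => 'load',
    '_route'     => 'blog_add',
];

$output = dispatch($parameters);
echo $output, PHP_EOL; // ENTERS
```

With the real components, the controller would return a Response object and the front controller would call send() on it, as in the example above.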

PHP - Guzzle Middleware

I'm using the Pole Emploi API, but I get a 401 error after about 25 minutes, when my token expires.
I looked for a way to get a new token and retry the request, but I can't work out how middlewares work, or whether a middleware is even the right tool for what I need.
The Guzzle docs say:
Middleware functions return a function that accepts the next handler to invoke. This returned function then returns another function that acts as a composed handler-- it accepts a request and options, and returns a promise that is fulfilled with a response. Your composed middleware can modify the request, add custom request options, and modify the promise returned by the downstream handler.
And this is the example code from the docs:
use Psr\Http\Message\RequestInterface;
function my_middleware()
{
    return function (callable $handler) {
        return function (RequestInterface $request, array $options) use ($handler) {
            return $handler($request, $options);
        };
    };
}
So I think I need to inspect the "promise" to see whether the HTTP code is 401, and then get a new token and retry the request?
I'm lost, so I would appreciate it if someone could explain the logic to me in different words :)
Thank you in advance.
It doesn't need to be that difficult. Add a handler that takes care of the job, combined with a cache entry that expires when the token does.
If you don't use a cache, you could probably save the token to a file along with an expiration timestamp that you check when fetching it.
use GuzzleHttp\Client;
use Psr\Http\Message\RequestInterface;

// Cache is presumably a cache facade such as Laravel's; swap in your own store.
class AuthenticationHandler
{
    private $username;
    private $password;
    private $token_name = 'access_token';
    // Never set in the original answer; assign your API key here or via the constructor.
    private $api_key;

    public function __construct($username, $password)
    {
        $this->username = $username;
        $this->password = $password;
    }

    public function __invoke(callable $handler)
    {
        return function (RequestInterface $request, array $options) use ($handler) {
            // Fetch a new token only when the cached one is missing or has expired
            if (is_null($token = Cache::get($this->token_name))) {
                $response = $this->getJWT();
                Cache::put($this->token_name, $token = $response->access_token, floor($response->expires_in));
            }
            return $handler(
                $request->withAddedHeader('Authorization', 'Bearer '.$token)
                    ->withAddedHeader('Api-Key', $this->api_key), $options
            );
        };
    }

    private function getJWT()
    {
        $response = (new Client)->request('POST', 'new/token/url', [
            'form_params' => [
                'grant_type' => 'client_credentials',
                'username' => $this->username,
                'password' => $this->password,
            ],
        ]);
        // Cast the body stream to a string before decoding
        return json_decode((string) $response->getBody());
    }
}
Then use it:
use GuzzleHttp\Handler\CurlHandler;
use GuzzleHttp\HandlerStack;

$stack = HandlerStack::create(new CurlHandler());
$stack->push(new AuthenticationHandler('username', 'password'));
$client = new GuzzleHttp\Client([
    'base_uri' => 'https://api.com',
    'handler' => $stack,
]);
Now you will always have a valid token, and you will never have to worry about it ever again.
I wouldn't recommend retrying from middleware, as it can make your application hell to debug, and as far as I am aware Guzzle doesn't really give middleware access to the client. That said, you can work around this with promises. If I were you, I would refresh the token before making other requests, or refresh it periodically. Retrying in middleware might be fine if you fire requests one by one, but in a Pool it becomes a nightmare: the script can end up fetching a token too often, and some requests can then end up with an outdated token.
Anyway here is a rough example:
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;
use GuzzleHttp\Client;
use GuzzleHttp\HandlerStack;
function my_middleware()
{
    return function (callable $handler) {
        return function (RequestInterface $request, array $options) use ($handler) {
            /**
             * @var $promise \GuzzleHttp\Promise\Promise
             */
            $promise = $handler($request, $options);
            return $promise->then(
                function (ResponseInterface $response) use ($request, $options) {
                    if ($response->getStatusCode() === 404) {
                        var_dump($response->getStatusCode());
                        var_dump(strlen($response->getBody()));
                        // Pretend we are getting new token key here
                        $client = new Client();
                        $key = $client->get('https://www.iana.org/domains/reserved');
                        // Then we modify the failed request. For your case you use ->withHeader() to change the
                        // Authorization header with your token.
                        $uri = $request->getUri();
                        $uri = $uri->withHost('google.com')->withPath('/');
                        // New instance of Request
                        $request = $request->withUri($uri);
                        // Send the request again with our new header/URL/whatever
                        return $client->sendAsync($request, $options);
                    }
                    return $response;
                }
            );
        };
    };
}
$handlerStack = HandlerStack::create();
$handlerStack->push(my_middleware());
$client = new Client([
    'base_uri' => 'https://example.org',
    'http_errors' => false,
    'handler' => $handlerStack
]);
$options = [];
$response = $client->request('GET', '/test', $options);
var_dump($response->getStatusCode());
var_dump(strlen($response->getBody()));
echo $response->getBody();
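To make the "function that returns a function that wraps the next handler" idea easier to follow, here is a dependency-free sketch of the same retry logic, checking for 401 rather than 404. It is not Guzzle code: plain arrays stand in for PSR-7 requests and responses, handlers return responses directly instead of promises, and `retryOn401`, `$fetchToken` and `$fakeApi` are hypothetical names made up for the illustration.

```php
<?php
// Hypothetical token source; $issued counts how many tokens were fetched.
$issued = 0;
$fetchToken = function () use (&$issued): string {
    $issued++;
    return 'token-' . $issued;
};

// Innermost handler: a fake API that only accepts 'token-2',
// as if 'token-1' had already expired.
$fakeApi = function (array $request): array {
    $ok = ($request['headers']['Authorization'] ?? '') === 'Bearer token-2';
    return ['status' => $ok ? 200 : 401];
};

// Middleware: takes the next handler and returns a new handler around it.
function retryOn401(callable $fetchToken): callable
{
    return function (callable $next) use ($fetchToken): callable {
        $token = $fetchToken(); // initial token, fetched when the stack is built
        return function (array $request) use ($next, $fetchToken, &$token): array {
            $request['headers']['Authorization'] = 'Bearer ' . $token;
            $response = $next($request);
            if ($response['status'] === 401) {
                // Token rejected: fetch a fresh one and retry exactly once.
                $token = $fetchToken();
                $request['headers']['Authorization'] = 'Bearer ' . $token;
                $response = $next($request);
            }
            return $response;
        };
    };
}

// Compose the middleware with the innermost handler, then send a request.
$handler = retryOn401($fetchToken)($fakeApi);
$response = $handler(['method' => 'GET', 'uri' => '/test', 'headers' => []]);

echo $response['status'], ' after ', $issued, ' token fetches', PHP_EOL;
// prints "200 after 2 token fetches"
```

In real Guzzle middleware the inner handler returns a promise, so the status check would sit inside a then() callback as in the example above, and the header would be replaced with withHeader() on the immutable request.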

Mock response and use history middleware at the same time in Guzzle

Is there any way to mock the response and the request in Guzzle?
I have a class that sends a request, and I want to test it.
In the Guzzle docs I found how to mock the response and how to record the request separately. But how can I combine them?
If I use the history stack, Guzzle tries to send a real request.
And vice versa: when I mock the response handler, I can't test the request.
class MyClass {
    public function __construct($guzzleClient) {
        $this->client = $guzzleClient;
    }
    public function registerUser($name, $lang)
    {
        $body = ['name' => $name, 'lang' => $lang, 'state' => 'online'];
        $response = $this->sendRequest('PUT', '/users', ['body' => $body]);
        return $response->getStatusCode() == 201;
    }
    protected function sendRequest($method, $resource, array $options = [])
    {
        try {
            $response = $this->client->request($method, $resource, $options);
        } catch (BadResponseException $e) {
            $response = $e->getResponse();
        }
        $this->response = $response;
        return $response;
    }
}
Test:
class MyClassTest {
    //....
    public function testRegisterUser()
    {
        $guzzleMock = new \GuzzleHttp\Handler\MockHandler([
            new \GuzzleHttp\Psr7\Response(201, [], 'user created response'),
        ]);
        $guzzleClient = new \GuzzleHttp\Client(['handler' => $guzzleMock]);
        $myClass = new MyClass($guzzleClient);
        /**
         * But how can I check that request contains all fields that I put in the body? Or if I add some extra header?
         */
        $this->assertTrue($myClass->registerUser('John Doe', 'en'));
    }
    //...
}
@Alex Blex was very close.
Solution:
$container = [];
$history = \GuzzleHttp\Middleware::history($container);
$guzzleMock = new \GuzzleHttp\Handler\MockHandler([
    new \GuzzleHttp\Psr7\Response(201, [], 'user created response'),
]);
$stack = \GuzzleHttp\HandlerStack::create($guzzleMock);
$stack->push($history);
$guzzleClient = new \GuzzleHttp\Client(['handler' => $stack]);
First of all, you don't mock requests. The requests are the real ones you are going to use in production. MockHandler::createWithMiddleware() returns a handler stack, so you can push middleware such as history onto it:
$container = [];
$history = \GuzzleHttp\Middleware::history($container);
$stack = \GuzzleHttp\Handler\MockHandler::createWithMiddleware([
    new \GuzzleHttp\Psr7\Response(201, [], 'user created response'),
]);
$stack->push($history);
$guzzleClient = new \GuzzleHttp\Client(['handler' => $stack]);
After you run your tests, $container will have all transactions for you to assert. In your particular test - a single transaction. You are interested in $container[0]['request'], since $container[0]['response'] will contain your canned response, so there is nothing to assert really.
