Using Goutte with Symfony2 in Controller - php

I'm trying to scrape a page and I'm not very familiar with PHP frameworks, so I've been trying to learn Symfony2. I have it up and running, and now I'm trying to use Goutte. It's installed in the vendor folder, and I have a bundle I'm using for my scraping project.
My question is: is it good practice to do scraping from a Controller? And if so, how? I have searched forever and cannot figure out how to use Goutte from a bundle, since it's buried deep within the file structure.
<?php
namespace ontf\scraperBundle\Controller;
use Symfony\Bundle\FrameworkBundle\Controller\Controller;
use Goutte\Client;
class ThingController extends Controller
{
public function somethingAction($something)
{
$client = new Client();
$crawler = $client->request('GET', 'http://www.symfony.com/blog/');
echo $crawler->text();
return $this->render('scraperBundle:Thing:index.html.twig');
// return $this->render('scraperBundle:Thing:index.html.twig', array(
// 'something' => $something
// ));
}
}

I'm not sure I have heard of "good practices" as far as scraping goes but you may be able to find some in the book PHP Architect's Guide to Web Scraping with PHP.
These are some guidelines I have used in my own projects:
Scraping is a slow process; consider delegating that task to a background process.
Background processes normally run either as a cron job that executes a CLI application, or as a worker that is constantly running.
Use a process control system to manage your workers. Take a look at supervisord.
Save every scraped file (the "raw" version), and log every error. This will enable you to detect problems. Use Rackspace Cloud Files or AWS S3 to archive these files.
Use the Symfony2 Console tool to create the commands to run your scraper. You can save the commands in your bundle under the Command directory.
Run your Symfony2 commands with the following flags to prevent running out of memory: php app/console scraper:run http://example.com --env=prod --no-debug, where app/console is where the Symfony2 console application lives, scraper:run is the name of your command, http://example.com is an argument indicating the page you want to scrape, and --env=prod --no-debug are the flags you should use to run in production. See the code below for an example.
Inject the Goutte Client into your command like this:
Ontf/ScraperBundle/Resources/config/services.yml
services:
    goutte_client:
        class: Goutte\Client
    scraperCommand:
        class: Ontf\ScraperBundle\Command\ScraperCommand
        arguments: ["@goutte_client"]
        tags:
            - { name: console.command }
And your command should look something like this:
<?php
// Ontf/ScraperBundle/Command/ScraperCommand.php
namespace Ontf\ScraperBundle\Command;

use Goutte\Client;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputArgument;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;

// Not abstract: the container must be able to instantiate it as a service.
class ScraperCommand extends Command
{
    private $client;

    // The Goutte client is injected by the container (see services.yml).
    public function __construct(Client $client)
    {
        $this->client = $client;
        parent::__construct();
    }

    protected function configure()
    {
        $this
            ->setName('scraper:run')
            ->setDescription('Run Goutte Scraper.')
            ->addArgument(
                'url',
                InputArgument::REQUIRED,
                'URL you want to scrape.'
            );
    }

    protected function execute(InputInterface $input, OutputInterface $output)
    {
        $url = $input->getArgument('url');
        $crawler = $this->client->request('GET', $url);
        // Write through the console output instead of echo.
        $output->writeln($crawler->text());
    }
}
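The "save every raw file, log every error" guideline could look something like the plain-PHP sketch below. The function names archiveRawPage() and logScrapeError(), the sha1-plus-timestamp file naming and the tab-separated log format are all assumptions; in production you would probably push the saved files to S3 or Cloud Files rather than keep them on local disk.

```php
<?php
// Hypothetical helpers for the "save every raw file, log every error" guideline.
// Directory layout, naming scheme and log format are assumptions.

/**
 * Store the raw HTML of a scraped page and return the file path.
 * A hash of the URL plus a UTC timestamp means repeated scrapes of the
 * same page never overwrite each other.
 */
function archiveRawPage(string $dir, string $url, string $html, ?DateTimeImmutable $now = null): string
{
    $now = $now ?? new DateTimeImmutable('now', new DateTimeZone('UTC'));
    if (!is_dir($dir) && !mkdir($dir, 0775, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create archive directory: $dir");
    }
    $path = sprintf('%s/%s-%s.html', rtrim($dir, '/'), sha1($url), $now->format('Ymd\THis\Z'));
    if (file_put_contents($path, $html) === false) {
        throw new RuntimeException("Cannot write $path");
    }
    return $path;
}

/**
 * Append one tab-separated line (timestamp, URL, error message) to a log file.
 */
function logScrapeError(string $logFile, string $url, Throwable $error): void
{
    $line = sprintf("%s\t%s\t%s\n", gmdate('c'), $url, $error->getMessage());
    file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX);
}
```

Inside a scraper command you would archive the page right after fetching it, and wrap the fetch in a try/catch that calls logScrapeError(), so a failed page is recorded and the run continues.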

You should use a Symfony Controller if you want to return a response, e.g. HTML output.
If you only need the functionality for calculating or storing stuff in the database,
you should create a Service class that represents the functionality of your crawler, e.g.
use Goutte\Client;
class CrawlerService
{
    public function getText($url)
    {
        $client = new Client();
        $crawler = $client->request('GET', $url);
        return $crawler->text();
    }
}
To execute it, I would use a Console Command.
If you want to return a Response, use a Controller.
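A variant of the CrawlerService above, sketched under one assumption: instead of creating a Goutte client internally, the service receives a hypothetical $fetch callable that returns the raw HTML for a URL. That keeps the same service reusable from a controller and from a console command, and lets it run without network access. Text extraction here uses PHP's DOM extension, which only approximates Crawler::text().

```php
<?php
// Sketch: crawling logic in a service, with the HTTP fetch injected.
// $fetch is a hypothetical callable(string $url): string returning raw HTML;
// in the real application it would wrap Goutte or another HTTP client.

final class CrawlerService
{
    /** @var callable */
    private $fetch;

    public function __construct(callable $fetch)
    {
        $this->fetch = $fetch;
    }

    /** Return the text content of the page's root element, trimmed. */
    public function getText(string $url): string
    {
        $html = ($this->fetch)($url);
        $doc = new DOMDocument();
        $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
        $doc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
        return trim($doc->documentElement->textContent);
    }
}
```

Registered in the container, one instance can be injected into both a controller (when you want to return a Response) and a command (when you only store or compute things).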

Related

Symfony 5 Generating a URL from Console Command

I'm new to Symfony and am using 5.x. I have created a Console command using Symfony\Component\Console\Command\Command and am trying to use Symfony\Component\HttpClient\HttpClient to POST to a URL. I need to generate the URL to a route running on the same machine (but in future this may possibly change to a different machine), so the host could be like localhost or example.com, and the port of the API is custom. I have searched on the web but the only possible solution I got involved the use of Symfony\Component\Routing\Generator\UrlGeneratorInterface, and the web is cluttered with code samples for old versions of Symfony, and I haven't yet managed to get this working.
My latest attempt was:
public function __construct(UrlGeneratorInterface $router)
{
parent::__construct();
$this->router = $router;
}
but I don't really understand how to inject the parameter UrlGeneratorInterface $router to the constructor. I get an error that the parameter was not supplied. Do I have to create an instance of UrlGenerator elsewhere and inject it over here, or is there a simpler way to just generate an absolute URL in Symfony from within a Command? I don't really understand containers yet.
$url = $context->generate('view', ['Param' => $message['Param']], UrlGeneratorInterface::ABSOLUTE_URL);
services.yaml:
App\Command\MyCommand:
    arguments: ['@router.default']
Is there a simpler way to generate a URL from a Console Command by explicitly specifying host, protocol, port, route, parameters etc.?
Why isn't UrlGeneratorInterface or RouterInterface autowiring?
Do I need to specify wiring manually as @router.default in services.yaml if I also have autowiring enabled?
I understand that the execute function implementation may be incorrect, but I couldn't get to fixing that without first getting the constructor working. This is still work in progress.
EDIT:
Updated gist: https://gist.github.com/tSixTM/86a29ee75dbd117c8f8571d458ed72db
EDIT 2: Made the problem statement clearer by adding question points: I slept on it :)
EDIT 3:
#!/usr/bin/env php
<?php
// application.php
require __DIR__.'/vendor/autoload.php';
use Symfony\Component\Console\Application;
$application = new Application();
$application->add(new App\Command\MyCommand());
$application->run();
I tinkered around with your gist and found the following to work:
https://gist.github.com/Matts/528c249a82e5844164039c4f6c0db046
The problem that you seemed to have, was not due to your service declaration, rather it was that you were missing the declaration of the private $router variable in MyCommand, see line 25.
So you can keep the services.yaml as you show in your gist, with no changes required to the autowire setting; also, you don't have to declare the command manually.
Further, you don't need to fetch $context from the router, you can also set the base URL in your framework.yaml, here you can find where I found this.
Please note that I removed some code from the execute method because I don't have access to your other files. You can just re-add it.
Well, it wasn't all that straightforward figuring this out. A lot of the docs are out of date or don't address this issue completely. This is what I got so far:
services.yaml:
Symfony\Component\Routing\RouterInterface:
    arguments: ['@router']
application.php:
#!/usr/bin/env php
<?php
// application.php
require __DIR__.'/vendor/autoload.php';
require __DIR__.'/src/Kernel.php';
use Symfony\Bundle\FrameworkBundle\Console\Application;
use Symfony\Component\Routing\RouterInterface;
use Symfony\Component\Dotenv\Dotenv;
$dotenv = new Dotenv();
$dotenv->load(__DIR__.'/.env', __DIR__.'/.env.local');
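// Dotenv populates $_SERVER/$_ENV; since Symfony 5.0 it no longer calls putenv() by default, so getenv() would return false
$kernel = new App\Kernel($_SERVER['APP_ENV'], (bool) $_SERVER['APP_DEBUG']);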
$kernel->boot();
$container = $kernel->getContainer();
$application = new Application($kernel);
$application->add(new App\Command\MyCommand($container->get('router')));
$application->run();
Note: I changed the Application import to Symfony\Bundle\FrameworkBundle\Console\Application
MyCommand.php:
<?php
// src/Command/MyCommand.php
namespace App\Command;

use App\SQSHelper;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Command\LockableTrait;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\HttpClient\HttpClient;
use Symfony\Component\Routing\Generator\UrlGeneratorInterface;
use Symfony\Component\Routing\RouterInterface;

class MyCommand extends Command
{
    use LockableTrait;

    // the name of the command (the part after "bin/console")
    protected static $defaultName = 'app:my-command';

    protected $router;

    public function __construct(RouterInterface $router)
    {
        parent::__construct();
        $this->router = $router;
    }

    protected function configure()
    {
    }

    protected function execute(InputInterface $input, OutputInterface $output)
    {
        if (!$this->lock()) { // Prevent running more than one instance
            $output->writeln('The command is already running in another process.');
            return Command::SUCCESS;
        }

        // Commands have no HTTP request, so set the request context explicitly.
        $context = $this->router->getContext();
        $context->setHost('localhost');
        $context->setHttpPort(49100);
        $context->setHttpsPort(49100);
        $context->setScheme('https');
        $context->setBaseUrl('');

        $queueName = 'Queue';
        $queue = new SQSHelper();
        $client = HttpClient::create();
        while ($queue->getApproxNumberOfMessages($queueName)) {
            $message = $queue->receiveMessage($queueName);
            if ($message) {
                if ($message['__EOQ__'] ?? false) { // End-of-Queue marker received
                    break;
                }
                $url = $this->router->generate('ep', ['MessageId' => $message['MessageId']], UrlGeneratorInterface::ABSOLUTE_URL);
                $response = $client->request('POST', $url, [
                    'headers' => ['Content-Type' => 'application/json'],
                    'body' => $message['Body'] // Already JSON encoded
                ]);
            }
        }

        $this->release(); // Release lock

        // execute() must return an integer exit status code:
        // Command::SUCCESS (0) if there was no problem,
        // Command::FAILURE (1) if some error happened during the execution.
        return Command::SUCCESS;
    }
}
If anything feels off, or if you can offer a better solution or improvements, please contribute.
Thanks to Matt Smeets for your invaluable help in figuring out that there is no problem with the command; if you can suggest a better alternative for application.php, I'll accept your answer.
Solution introduced in Symfony 5.1:
https://symfony.com/doc/current/routing.html#generating-urls-in-commands
Generating URLs in commands works the same as generating URLs in services. The only difference is that commands are not executed in the HTTP context. Therefore, if you generate absolute URLs, you’ll get http://localhost/ as the host name instead of your real host name.
The solution is to configure the default_uri option to define the “request context” used by commands when they generate URLs:
# config/packages/routing.yaml
framework:
    router:
        # ...
        default_uri: 'https://example.org/my/path/'
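To see why a command gets http://localhost/ by default, and what the context setters in the self-answer or default_uri change, here is a deliberately simplified model of turning a request context into an absolute URL. The function absoluteUrl() and its array keys are hypothetical and only loosely mirror RequestContext's scheme, host, httpPort, httpsPort and baseUrl; Symfony's real UrlGenerator does considerably more (route requirements, parameter encoding, host patterns).

```php
<?php
// Simplified, hypothetical model of building an absolute URL from a request context.
// NOT Symfony's UrlGenerator; route matching and encoding rules are left out.

function absoluteUrl(array $context, string $path, array $query = []): string
{
    // In a command nothing fills the context from a request, so these defaults win.
    $scheme = $context['scheme'] ?? 'http';
    $host = $context['host'] ?? 'localhost';

    // Which port applies depends on the scheme; the scheme's default port is omitted.
    $defaultPort = $scheme === 'https' ? 443 : 80;
    $port = (int) ($scheme === 'https'
        ? ($context['httpsPort'] ?? 443)
        : ($context['httpPort'] ?? 80));
    $authority = $port === $defaultPort ? $host : $host . ':' . $port;

    $baseUrl = rtrim($context['baseUrl'] ?? '', '/');
    $url = $scheme . '://' . $authority . $baseUrl . '/' . ltrim($path, '/');

    if ($query !== []) {
        $url .= '?' . http_build_query($query);
    }
    return $url;
}
```

With the values the self-answer sets (https, localhost, port 49100, empty base URL) this model yields https://localhost:49100/..., and a default_uri of https://example.org/my/path/ corresponds roughly to scheme https, host example.org and base URL /my/path.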

How to Implement Asynchronous Queue to run Method in Symfony 3

First off, some basic information about my project: I have a website built with Symfony 3. For some tasks I'm thinking about running PHP methods asynchronously. Some events take a lot of time, but their results need not be immediately visible.
For instance: in the method newOrder I call addUserLTV, which performs a few steps. The customer does not have to wait for all the steps to complete; they only need to get the confirmation immediately after the basic operation. newOrder should add addUserLTV to a queue and show the confirmation immediately.
The queued tasks will run when the server has time to do them.
public function addUserLTV( $userID, $addLTV )
{ //same code
}
How can I do this? Is it possible in Symfony 3?
This is something you can easily do with the enqueue bundle. Just a few words on why you should choose it:
It supports a lot of transports, from the simplest one (filesystem) to enterprise ones (RabbitMQ or Amazon SQS).
It comes with a very powerful bundle.
It has a top-level abstraction which can be used with the greatest of ease.
There is a lot more that might come in handy.
Regarding your question, here's how you can do this with the enqueue bundle. Follow the setup instructions from the docs.
Now the addUserLTV method will look like this:
<?php
namespace Acme;

use Enqueue\Client\ProducerInterface;

class AddUserLTVService
{
    /**
     * @var ProducerInterface
     */
    private $producer;

    /**
     * @param ProducerInterface $producer
     */
    public function __construct(ProducerInterface $producer)
    {
        $this->producer = $producer;
    }

    public function addUserLTV($userID, $addLTV)
    {
        // Only enqueues the work; the processor does the actual job later.
        $this->producer->sendCommand('add_user_ltv', [
            'userId' => $userID,
            'ltv' => $addLTV,
        ]);
    }
}
It sends the message to a message queue using the client (the top-level abstraction I mentioned before). The service has to be registered in the Symfony container:
services:
    Acme\AddUserLTVService:
        arguments: ['@enqueue.producer']
Now let's look at the consumption side. You need a command processor that does the job:
<?php
namespace Acme;

use Enqueue\Client\CommandSubscriberInterface;
use Enqueue\Psr\PsrContext;
use Enqueue\Psr\PsrMessage;
use Enqueue\Psr\PsrProcessor;
use Enqueue\Util\JSON;

class AddUserTVAProcessor implements PsrProcessor, CommandSubscriberInterface
{
    public function process(PsrMessage $message, PsrContext $context)
    {
        $data = JSON::decode($message->getBody());
        // Read the same keys the producer sent ('userId' and 'ltv').
        $userID = $data['userId'];
        $addLTV = $data['ltv'];
        // do job
        return self::ACK;
    }

    public static function getSubscribedCommand()
    {
        return 'add_user_ltv';
    }
}
Register it as a service with an enqueue.client.processor tag:
services:
    Acme\AddUserTVAProcessor:
        tags:
            - { name: 'enqueue.client.processor' }
That's it for coding. Run the consume command and you are done:
./bin/console enqueue:consume --setup-broker -vvv
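For readers new to the pattern, the in-memory sketch below shows the same producer/processor split without any library: sendCommand() only records a message and returns immediately, and a separate consumeAll() step later dispatches each message to the processor subscribed to its command name. ToyCommandQueue and its methods are hypothetical, not enqueue's API, and an SplQueue lives only for one PHP process, whereas a real transport persists messages so that the enqueue:consume worker can process them later.

```php
<?php
// Toy, in-memory illustration of the producer/processor flow.
// NOT enqueue's API: all names here are hypothetical.

final class ToyCommandQueue
{
    /** @var SplQueue JSON-encoded messages waiting to be processed */
    private $messages;
    /** @var array<string, callable> command name => processor */
    private $processors = [];

    public function __construct()
    {
        $this->messages = new SplQueue();
    }

    public function subscribe(string $command, callable $processor): void
    {
        $this->processors[$command] = $processor;
    }

    // Producer side: cheap and returns immediately (what newOrder would call).
    public function sendCommand(string $command, array $payload): void
    {
        $this->messages->enqueue(json_encode(['command' => $command, 'body' => $payload]));
    }

    // Consumer side: runs later, e.g. in a long-running worker.
    // Returns how many messages were handed to a processor; unknown commands are dropped.
    public function consumeAll(): int
    {
        $handled = 0;
        while (!$this->messages->isEmpty()) {
            $message = json_decode($this->messages->dequeue(), true);
            $processor = $this->processors[$message['command']] ?? null;
            if ($processor !== null) {
                $processor($message['body']);
                $handled++;
            }
        }
        return $handled;
    }
}
```

The processor has to read the payload with exactly the keys the producer sent; a mismatch such as userId versus userID only shows up at consume time, long after the order was confirmed.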

Accessing and using Symfony model layer from Outside the Symfony applications

What I have is a symfony application, which contains some entities along with some repositories. A second non-symfony application should interface with the first one for interacting with some logic written in it (in this very moment just using the entities and their proper repositories).
Keep in mind that the first application could have its own autoload register etc.
I thought of an API class for external applications, which stays in the app directory. To use that the application should require a script. Here is the idea:
app/authInterface.php that the external application should require:
$loader = require __DIR__.'/autoload.php';
require_once (__DIR__.'/APIAuth.php');
return new APIAuth();
and an example of a working APIAuth I wrote (the code is kind of messy: remember this is just a try but you can get the idea):
class APIAuth
{
    public function __construct()
    {
        //dev_local is a personal configuration I'm using.
        $kernel = new AppKernel('dev_local', false);
        $kernel->loadClassCache();
        $kernel->boot();
        $doctrine = $kernel->getContainer()->get('doctrine');
        $em = $doctrine->getManager();
        $users = $em->getRepository('BelkaTestBundle:User')->findUsersStartingWith('thisisatry');
    }
}
By calling it from the shell, everything works and I'm happy with it:
php app/authInterface.php
but I'm wondering if I'm doing it in the best way possible in terms of:
Resources: am I loading just the resources I really need to run my code? Do I really need the kernel? That way everything is properly loaded, including the DB connection, but I'm not sure whether there are lighter ways to do it.
Symfony logic: am I interacting with Symfony the right way? Are there better ways?
Symfony allows using its features from the command line. If you use a cron job or another application and want to call your Symfony application, you have two general options:
Generating HTTP endpoints in your Symfony application
Generating a command which executes code in your Symfony application
Both options will be discussed below.
HTTP endpoint (REST API)
Create a route in your routing configuration to route a HTTP request to a Controller/Action.
# app/config/routing.yml
test_api:
    path: /test/api/v1/{api_key}
    defaults: { _controller: AppBundle:Api:test }
which will call the ApiController::testAction method.
Then, implement testAction with the code you want to execute:
use Symfony\Component\HttpFoundation\Response;
public function testAction() {
return new Response('Successful!');
}
Command
Create a command line command which does something in your application. This can be used to execute code which can use any Symfony service you have defined in your (web)application.
It might look like:
<?php
// src/AppBundle/Command/TestCommand.php
namespace AppBundle\Command;

use Symfony\Bundle\FrameworkBundle\Command\ContainerAwareCommand;
use Symfony\Component\Console\Input\InputArgument;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;

// The class name must match the file name (TestCommand.php) for autoloading.
class TestCommand extends ContainerAwareCommand
{
    protected function configure()
    {
        $this
            ->setName('myapp:section:thecommand')
            ->setDescription('Test command')
            ->addArgument(
                'optionname',
                InputArgument::OPTIONAL,
                'Test option'
            )
        ;
    }

    protected function execute(InputInterface $input, OutputInterface $output)
    {
        $option = $input->getArgument('optionname');
        if ($option) {
            $text = 'Test '.$option;
        } else {
            $text = 'Test!';
        }
        $output->writeln($text);
    }
}
Look here for documentation.
Call your command using something like
bin/console myapp:section:thecommand optionvalue
(Use app/console for pre-3.0 Symfony installations.)
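If the second, non-Symfony application goes the command route instead of booting the kernel itself, it can run the command as a subprocess. A minimal sketch, assuming the console path and command name from the example above; the helper names buildConsoleCommand() and runConsoleCommand() are hypothetical. Every part is passed through escapeshellarg() so argument values cannot inject shell syntax.

```php
<?php
// Sketch for the external application: run a Symfony console command as a subprocess.
// Console path, command name and arguments are placeholders.

function buildConsoleCommand(string $console, string $commandName, array $arguments, string $env = 'prod'): string
{
    $parts = [PHP_BINARY, $console, $commandName];
    foreach ($arguments as $argument) {
        $parts[] = (string) $argument;
    }
    $parts[] = '--env=' . $env;
    // Escape every piece so user-supplied values cannot inject shell syntax.
    return implode(' ', array_map('escapeshellarg', $parts));
}

/** Run a prepared command line; returns [exit code, output lines incl. stderr]. */
function runConsoleCommand(string $commandLine): array
{
    $outputLines = [];
    $exitCode = 0;
    exec($commandLine . ' 2>&1', $outputLines, $exitCode);
    return [$exitCode, $outputLines];
}
```

If the second application can depend on symfony/process, its Process class accepts the command as an array of arguments and handles the escaping for you.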
Use whichever option you think is best.
One word of advice. Do not try to use parts of the Symfony framework when your application is using the full Symfony framework. Most likely you will walk into trouble along the way and you're making your own life hard.
Use the beautiful tools you have at your disposal when you are already using Symfony to build your application.

How can I use a non-default Output object when running a Symfony command?

Symfony provides other classes that implement OutputInterface. How can I provide instances of these classes - ideally from the command line or other config options - to a command?
My current workaround for using different Output objects is to immediately reassign $output to the preferred object like so:
<?php
namespace AppBundle\Console\Command;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputArgument;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Input\InputOption;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Console\Output\NullOutput;
class DebugCommand extends Command
{
protected function Configure()
{
$this->setName('AppBundle:DebugCommand');
}
protected function execute(InputInterface $input, OutputInterface $output)
{
$output = new NullOutput();
$output->writeln('Done!');
}
}
But this feels sloppy. It would make much more sense to simply provide the intended object as a parameter to DebugCommand::execute(). Plus, if I decided I did want the output, I would have to modify the code to get the intended behavior.
How can I achieve this properly?
EDIT:
My hope is that it will be possible to set a default for each command. This would be helpful because I could create a new class that implements OutputInterface that would post output to, say, my team's Slack channel. But a different command might need to post to a different team's Slack channel. Being able to customize the output object for each command would be helpful as each command might affect different teams.
When you are using the framework, you have similar code in your console script:
$application = new Application($kernel);
$application->setDefaultCommand('default');
$application->run($input);
What you can do is add a second argument to the run() call, like:
$application->run($input, new NullOutput());
EDIT:
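Doing it per command needs a new class that extends the Application class:
use Symfony\Bundle\FrameworkBundle\Console\Application;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;

class SlackOutputApplication extends Application
{
    protected function doRunCommand(Command $command, InputInterface $input, OutputInterface $output)
    {
        if ($command->getName() == 'foo') {
            // SlackOutput is your own OutputInterface implementation
            $output = new SlackOutput('channelname');
        }
        // Return the exit code so it is passed on by run()
        return parent::doRunCommand($command, $input, $output);
    }
}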
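The per-command default asked for in the question (one team's Slack channel for one command, another team's channel for another) boils down to a lookup from command name to destination plus an output object that buffers lines. The dependency-free sketch below illustrates only that idea: BufferedChannelOutput and outputFor() are hypothetical, and the $send callable stands in for whatever actually posts to Slack. In a real Symfony integration the class would instead extend Symfony\Component\Console\Output\Output and implement its doWrite() method, and the lookup would live in the doRunCommand() override.

```php
<?php
// Hypothetical, dependency-free sketch: route a command's output to a per-command channel.

final class BufferedChannelOutput
{
    /** @var string destination, e.g. a Slack channel name (assumption) */
    private $channel;
    /** @var string[] lines written so far */
    private $lines = [];

    public function __construct(string $channel)
    {
        $this->channel = $channel;
    }

    public function writeln(string $message): void
    {
        $this->lines[] = $message;
    }

    /** Hand all buffered lines to $send as one message, then clear the buffer. */
    public function flush(callable $send): void
    {
        if ($this->lines === []) {
            return;
        }
        $send($this->channel, implode("\n", $this->lines));
        $this->lines = [];
    }
}

/**
 * Per-command default: command name => channel.
 * Returns null for unmapped commands, meaning "keep the normal console output".
 */
function outputFor(string $commandName, array $channelByCommand): ?BufferedChannelOutput
{
    if (!isset($channelByCommand[$commandName])) {
        return null;
    }
    return new BufferedChannelOutput($channelByCommand[$commandName]);
}
```

Keeping the command-to-channel map in configuration rather than in an if-chain means a command's default destination can change without touching the command's code.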

Integrating CLI PHP with CakePHP

I have a nicely functioning CakePHP 1.3.11 site and I need a scheduled maintenance CLI script to run, so I'm writing it in PHP. Is there any way to make a Cake-friendly script? Ideally I could use Cake's functions and Cake's database models; the CLI script requires database access and not much else, however. I would ideally like to include my CLI code in a controller and the datasource in a model so I can call the function like any other Cake function, but only from the CLI as a scheduled task.
Searching for CakePHP CLI mostly brings up results about CakeBake and cron jobs; this article sounded very helpful, but it's for an old version of Cake and requires a modified version of index.php. I'm no longer sure how to change the file to make it work in the new version of CakePHP.
I'm on Windows if it matters, but I have complete access to the server. I'm currently planning to schedule a simple cmd "php run.php" style script.
Using CakePHP's shells, you should be able to access all of your CakePHP app's models and controllers.
As an example, I've set up a simple model, controller and shell script:
/app/models/post.php
<?php
class Post extends AppModel {
var $useTable = false;
}
?>
/app/controllers/posts_controller.php
<?php
class PostsController extends AppController {
var $name = 'Posts';
var $components = array('Security');
function index() {
return 'Index action';
}
}
?>
/app/vendors/shells/post.php
<?php
App::import('Component', 'Email'); // Import EmailComponent to make it available
App::import('Core', 'Controller'); // Import Controller class to base our App's controllers off of
App::import('Controller', 'Posts'); // Import PostsController to make it available
App::import('Sanitize'); // Import Sanitize class to make it available
class PostShell extends Shell {
var $uses = array('Post'); // Load Post model for access as $this->Post
function startup() {
$this->Email = new EmailComponent(); // Create EmailComponent object
$this->Posts = new PostsController(); // Create PostsController object
$this->Posts->constructClasses(); // Set up PostsController
$this->Posts->Security->initialize($this->Posts); // Initialize component that's attached to PostsController. This is needed if you want to call PostsController actions that use this component (objects are passed by handle, so no & is needed)
}
function main() {
$this->out($this->Email->delivery); // Should echo 'mail' on the command line
$this->out(Sanitize::html('<p>Hello</p>')); // Should echo &lt;p&gt;Hello&lt;/p&gt; on the command line (Sanitize::html() escapes the tags)
$this->out($this->Posts->index()); // Should echo 'Index action' on the command line
var_dump(is_object($this->Posts->Security)); // Should echo 'true'
}
}
?>
The whole shell script is there to demonstrate that you can have access to:
Components that you load directly and that are not loaded through a controller
Controllers (first import the Controller class, then import your own controller)
Components that are used by controllers (after creating a new controller, run the constructClasses() method and then the particular component's initialize() method, as shown above)
Core utility classes, like the Sanitize class shown above.
Models (just include in your shell's $uses property).
Your shell can have a startup method that is always run first, and the main method, which is your shell script's main process and which is run after startup.
To run this script, you would enter /path/to/cake/core/console/cake post on your command line (you might have to check the proper way to do this on Windows; the info is in the CakePHP book, http://book.cakephp.org).
The result of the above script should be:
mail
&lt;p&gt;Hello&lt;/p&gt;
Index action
bool(true)
This works for me, but maybe people who are more advanced in CakePHP shells could offer more advice, or possibly correct some of the above... However, I hope this is enough to get you started.
As of CakePHP 2, the shell scripts should now be saved to \Console\Command. There is good documentation at http://book.cakephp.org/2.0/en/console-and-shells.html
