Mediawiki: Getting jobs to run on edit on wiki farm/family - php

I've set up a Wiki family consisting of a small number of Wikis that have (and are expected to continue having) low to moderate traffic.
When you run a single MediaWiki, it runs a job on every page request which is nice for keeping links and categories up to date, but I can't get this behaviour to work for a wiki family.
I have a wiki setup with a branching localSettings (depending on the SERVER_NAME) and have despite searching (and asking on Mediawiki) found no way to keep this job behaviour, rather I get jobs queueing up, presumably because the maintenance scripts being automatically run do not know which Wiki they originate from.
Is there a way to fix/circumvent this? I have not found any kind of variable being supplied when the job queue is run that could be passed into the localSettings.php so that the correct settings are loaded and the jobs can run properly.

Generally jobs are run on each page load within context of current Wiki, this means in your case there should be no problems with queue, because your LocalSettings file is branched. Though, in certain circumstances job queue may be overloaded, in this case you will need to disable default queue behavior ( by setting $wgJobRunRate = 0;) and configure maintenance scripts runner in crontab. This can be tricky for branched farm, but i think it will work like that:
* * * * * php /path/to/your/wiki/maintenance/runJobs.php --wiki domainA.com
* * * * * php /path/to/your/wiki/maintenance/runJobs.php --wiki domainB.com
* * * * * php /path/to/your/wiki/maintenance/runJobs.php --wiki domainC.com
In this scenario during script execution two constants will be available in LocalSettings.php: MW_DB and MW_PREFIX (use only MW_DB), so you will need to modify your LocalSettings.php like that:
...
$activeWiki = 'defaultWiki';
$switchVar = $_SERVER['SERVER_NAME'];
if( defined('DO_MAINTENANCE') && defined('MW_DB') ) {
$switchVar = MW_DB;
}
switch( $switchVar ) {
...
}
...

Problem found - the issue was that the wikis were behind a permission gate (just a regular Apache one), and async jobs don't inherit the permissions, so I had to set async jobs to false to solve it.
In case anyone else gets this problem - $wgRunJobsAsync = false; should be added to localSettings.php

Related

Environment variables contain wrong values in jobs

When I try to access an environment variable in a job (queue is database), I get an incorrect value.
I am trying to retrieve the APP_DEBUG variable because during development a Windows program is to be used for execution, while in the production system a Linux binary is used.
I have already restarted the queue, worked with queue:listen, and cleared the cache.
I assume that the .env is not read by the queue.
What options do I have to work around this behavior?
Or is there a solution for the problem?
Here is a sample of my code
/**
* Execute the index job.
*/
public function handle()
{
// Some code ...
if (true == env('APP_DEBUG')) {
WindowsFileIndexer::dispatchNow();
} else {
LinuxFileIndexer::dispatchNow();
}
// Some more job calls ...
}
Of course I know that I could get to the operating system via other ways (PHP_OS, etc.) and thus also check it. However, it must be possible to use the environment variables in a job.

Laravel 5.4 MultiSite with Queues per site

My Laravel is setup with a MultiSite Middleware Provider that checks the subdomain of the address and based on this subdomain changes the connection on-the-fly to another database.
e.g.
Config::set('database.connections.mysql.host', $config['host'] );
Config::set('database.connections.mysql.database', $config['db_name'] );
Config::set('database.connections.mysql.username', $config['user']);
Config::set('database.connections.mysql.password', $config['password']);
Config::set('database.connections.mysql.prefix', $config['prefix']);
Config::set('database.connections.mysql.theme', $config['theme']);
// purge main to prevent issues (and potentially speed up connections??)
DB::disconnect('main');
DB::purge();
DB::reconnect();
return $next($request);
This all works fantastic, except that I now want to use Laravel Queues with the built-in Database driver (sync actually works fine but blocks the user experience for long report generations).
Except Artisan isn't sure which database to connect to so I'm guessing it connects to the default, which is a kind of supervisor database that stores all the subdomains and corresponding db names etc.
Note none of these databases are setup in my database conf as connections, they're stored in a singular management database as there's quite a lot of them.
I've tried cloning the built-in Queue listener and modifying it to swap to the different site connection as so:
/**
* Create a new queue listen command.
*
* #param \Illuminate\Queue\Listener $listener
* #return void
*/
public function __construct(Listener $listener)
{
// multisite swap
$site = MultiSites::where('machine_name', $this->argument('site'));
MultiSites::changeSite($site->id);
parent::__construct();
$this->setOutputHandler($this->listener = $listener);
}
But this fails with
$commandPath argument missing for the Listener class.
Trying a similar database/site swap in the fire() or handle() methods stops the $commandPath error however it simply does nothing, no feedback and doesn't begin to process any jobs from the database.
I'm at a loss how to get this working with a multisite environment, does anyone have any ideas or am I going the wrong way about this?
My ideal scenario would be being able to run a singular Queue command, have supervisor monitor that and it to skip through each database checking. But I am also willing to spawn a queue command per database/site if necessary.

HHVM - Running resource extensive php daemons

Hi i'm trying to use hhvm to run all of the background PHP workers that are currently there in my application. I don't want to run hhvm as a server as Apache is already taking care of it , all i want to do is to run my php codes with hhvm, instead of the regular Zend engine.
Ok here are the codes which i want to run.
This is the entry point of the computationally intensive modules that i want to run
-------------**RunRenderer.php**--------------
#!/usr/bin/php
<?php
require_once 'Config.php';
require_once 'Renderer.php';
Renderer::getInstance()->run();
?>
Here is just a small a portion of the main controller that controls/forks/manages thousands of php tasks/processes.
----------------------------Renderer.php---------------------
<?php
require 'Workers/BumpMapsCalc.php';
/**
* Main Entry class of the Map rendering module
*
* Create workers for all of the different maps calc sub routines
*
*
*
*/
class Renderer extends \Core_Daemon {
/**
* the interval at which the execute method will run
*
* Interval : 10 min
*
*/
protected $loop_interval = 600;
/**
* Set the chunk size
*/
protected $chunkSize = 500;
/**
* Loop counter
*/
protected $loopCounter;
/**
* Low limit and the high limit
*/
protected $lowLimit;
protected $highLimit;
/**
* set the plugins for lock file and settings ini files
*
*/
protected function setup_plugins() {
$this->plugin('Lock_File');
$this->plugin('settings', new \Core_Plugin_Ini());
$this->settings->filename = BASE_PATH . "/Config/settings.ini";
$this->settings->required_sections = array('geometry');
}
protected function setup() {
$this->log("Computing Bumps Maps");
}
/**
* Create multiple separate task that will run in parallel
* Provide the low limit and the high limit which should effectively partition
* the whole table into more manageable chunks , thus making importing and
* storing data much faster and finished within 10 min
*
*/
protected function execute() {
for ($this->loopCounter = 1 ; $this->loopCounter <= $this->settings['geometry']['number'] ; $this->loopCounter += $this->chunkSize) {
$this->lowLimit = $this->loopCounter;
$this->highLimit = $this->loopCounter + $this->chunkSize;
$this->task(new LocalBumpMaps($this->lowLimit, $this->highLimit));
}
}
protected function log_file() {
$dir = BASE_PATH . "/Logs";
if (#file_exists($dir) == false)
#mkdir($dir, 0777, true);
return $dir . '/log_' . date('Y-m-d');
}
}
?>
So normally i would run the program as
php RunRenderer.php -d -p ./pid/pid $1
which would invoke the default zend engine and Renderer.php would fork around thousands of instances of LocalBumpMaps ( along with 100 other map rendering classes ). Now with each of this subtasks taking around 20-30 mb all of the memory in the workstation gets exhausted pretty quickly thus causing the system to screech to a halt.
Of course the main rendering engine is written in C++, but due to some weird requirement the whole front end is in PHP. And the php modules needs to perform around billions of calculations per second. So the only options that was left was to use HHVM in hopes of some significant increase in performance and efficiency.
But the problem is i can't get this code to run with hhvm. This is what i'm trying
hhvm RunRenderer.php -p ./pid $1
This doesn't do anything at all. No processes are forked, no output, nothing happens. So can anyone please tell me how do i run the php scripts with hhvm instead of zend.
I hope my question makes sense, and i would really appreciate any help.
Thanks,
Maxx
Just run the following line first without forking a process:
hhvm RunRenderer.php
If you see console output, and that you can Ctrl+C to terminate the process, then you can demonize the process with an Upstart script. Create a file called /etc/init/renderer.conf:
start on startup
stop on shutdown
respawn
script
hhvm RunRenderer.php
end script
Then you can manually start and stop the process by running:
start renderer
and
stop renderer
If you are running Ubuntu 12.04LTS and above, a log file will be created for you automatically under the name /var/log/upstart/renderer.log. You can fetch live output by tailing the file:
tail -f /var/log/upstart/renderer.log

How should I create one-off scheduled tasks in PHP? Cron?

I'm creating a web app where users can specify a time and date to run 2 scheduled tasks (one at the start date and one at the end date). As these are only run once each I didn't know if a cron job would be appropriate.
The other option I thought of would be to save all of the task times to a DB and run a cron job every hour to check if $usertime == NOW(), etc. But I was worried about jobs overlapping, etc.
Thoughts?
Additional: Many users can create many tasks that run 2 scripts each.
cron is great for scripts run on a regular basis, but if you want a one-off (or two-off) script to run at a particular time you would use the unix 'at' command, and you can do it directly from php using code like this:
/****
* Schedule a command using the AT command
*
* To do this you need to ensure that the www-data user is allowed to
* use the 'at' command - check this in /etc/at.deny
*
*
* EXAMPLE USAGE ::
*
* scriptat( '/usr/bin/command-to-execute', 'time-to-run');
* The time-to-run shoud be in this format: strftime("%Y%m%d%H%M", $unixtime)
*
**/
function scriptat( $cmd = null, $time = null ) {
// Both parameters are required
if (!$cmd) {
error_log("******* ScriptAt: cmd not specified");
return false;
}
if (!$time) {
error_log("******* ScriptAt: time not specified");
return false;
}
// We need to locate php (executable)
if (!file_exists("/usr/bin/php")) {
error_log("~ ScriptAt: Could not locate /usr/bin/php");
return false;
}
$fullcmd = "/usr/bin/php -f $cmd";
$r = popen("/usr/bin/at $time", "w");
if (!$r) {
error_log("~ ScriptAt: unable to open pipe for AT command");
return false;
}
fwrite($r, $fullcmd);
pclose($r);
error_log("~ ScriptAt: cmd=${cmd} time=${time}");
return true;
}
I'd do it like that, save settings in a database and check when needed if the task should start.
You could run a checking/initiating cronjob every minute. Just make sure the checking code is not not too heavy (exits quickly). A database query for a couple of rows shouldn't be a problem to execute every minute.
If the "task" is really heavy, you should consider a daemon instead of a cronjob calling php. Here is a good & easy-to-read introduction: Create daemons in PHP
Edit: I took for granted that even if the tasks are only ran "once each", you have multiple users which are 1:1 to the "once each", thereby jobs for each user. If not, at (as the comments says) looks worthy of an experiment.
Whatever mechanism you chose (cron/at/daemon) I would only put the start task into the queue. Along with that start task is to place the end task. That part can either place it into the future or it the time has elapsed start it immediately. That way they will never overlap.
I would also favour the PHP/DB and cron option. Seems simpler and gives more flexibility - could chose multiple threads etc if performance dicttates.

Create cronjob with Zend Framework

I am trying to write a cronjob controller, so I can call one website and have all modules cronjob.php executed. Now my problem is how do I do that?
Would curl be an option, so I also can count the errors and successes?
[Update]
I guess I have not explained it enough.
What I want to do is have one file which I can call like from http://server/cronjob and then make it execute every /application/modules/*/controller/CronjobController.php or have another way of doing it so all the cronjobs aren't at one place but at the same place the module is located. This would offer me the advantage, that if a module does not exist it does not try to run its cronjob.
Now my question is how would you execute all the modules CronjobController or would you do it a completly different way so it still stays modular?
And I want to be able to giveout how many cronjobs ran successfully and how many didn't
After some research and a lot procrastination I came to the simple conclusion that a ZF-ized cron script should contain all the functionality of you zend framework app - without all the view stuff. I accomplished this by creating a new cronjobfoo.php file in my application directory. Then I took the bare minimum from:
-my front controller (index.php)
-my bootstrap.php
I took out all the view stuff and focused on keeping the environment setup, db setup, autoloader, & registry setup. I had to take a little time to correct the document root variable and remove some of the OO functionality copied from my bootstrap.
After that I just coded away.. in my case it was compiling and emailing out nightly reports. It was great to use Zend_Mail. When I was confident that my script was working the way I wanted, I just added it my crontab.
good luck!
For Zend Framework I am currently using the code outlined bellow. The script only includes the portal file index.php, where all the paths, environment and other Zendy code is bootstrapped. By defining a constant in the cron script we cancel the final step , where the application is run.
This means the application is only setup, not even bootstrapped. At this point we start bootstraping the resources we need and that is that
//public/index.php
if(!defined('DONT_RUN_APP') || DONT_RUN_APP == false) {
$application->bootstrap()->run();
}
// application/../cron/cronjob.php
define("DONT_RUN_APP",true);
require(realpath('/srv/www/project/public/index.php'));
$application->bootstrap('config');
$application->bootstrap('db');
//cron code follows
I would caution putting your cronjobs accessible to the public because they could be triggered outside their normal times and, depending on what they do, cause problems (I know that is not what you intend, but by putting them into an actual controller it becomes reachable from the browser). For example, I have one cron that sends e-mails. I would be spammed constantly if someone found the cron URL and just began hitting it.
What I did was make a cron folder and in there created a heartbeat.php which bootstraps Zend Framework (minus MVC) for me. It checks a database which has a list of all the installed cron jobs and, if it is time for them to run, generates an instances of the cron job's class and runs it.
The cron jobs are just child classes from an abstract cron class that has methods like install(), run(), deactivate(), etc.
To fire off my job I just have a simple crontab entry that runs every 5 minutes that hits heartbeat.php. So far it's worked wonderful on two different sites.
Someone mentioned this blog entry a couple days ago on fw-general (a mailinglist which I recommend reading when you use the Zend Framework).
There is also a proposal for Zend_Controller_Request_Cli, which should address this sooner or later.
I have access to a dedicated server and I initially had a different bootstrap for the cron jobs. I eventually hated the idea, just wishing I could do this within the existing MVC setup and not have to bother about moving things around.
I created a file cron.sh, saved is within my site root (not public) and in it I put a series of commands I would like to run. As I wanted to run many commands at once I wrote the PHP within my controllers as usual and added curl calls to those urls within cron.sh. for example curl http://www.mysite.com/cron_controller/action Then on the cron interface I ran bash /path/to/cron.sh.
As pointed out by others your crons can be fired by anyone who guesses the url so there's always that caveat. You can find a solution to that in many different ways.
Take a look at zf-cli:
scripts at master from padraic/ZFPlanet - GitHub
This handles well all cron jobs.
Why not just create a crontab.php, including, or requiring the index.php bootstrap file?
Considering that the bootstrap is executing Zend_Loader::registerAutoload(), you can start working directly with the modules, for instance, myModules_MyClass::doSomething();
That way you are skipping the controllers. The Controller job is to control the access via http. In this case, you don't need the controller approach because you are accessing locally.
Do you have filesystem access to the modules' directories? You could iterate over the directories and determine where a CronjobController.php is available. Then you could either use Zend_Http_Client to access the controller via HTTP or use an approach like Zend_Test_PHPUnit: simulate the actual dispatch process locally.
You could set up a database table to hold references to the cronjob scripts (in your modules), then use a exec command with a return value on pass/fail.
I extended gregor answer with this post. This is what came out:
//public/index.php
// Run application, only if not started from command line (cli)
if (php_sapi_name() != 'cli' || !empty($_SERVER['REMOTE_ADDR'])) {
$application->run();
}
Thanks gregor!
My solution:
curl /cron
Global cron method will include_once all controllers
Check whether each of the controllors has ->cron method
If they have, run those.
Public cron url (for curl) is not a problem, there are many ways to avoid abuse. As said, checking remote IP is the easiest.
This is my way to run Cron Jobs with Zend Framework
In Bootstrap I will keep environment setup as it is minus MVC:
public static function setupEnvironment()
{
...
self::setupFrontController();
self::setupDatabase();
self::setupRoutes();
...
if (PHP_SAPI !== 'cli') {
self::setupView();
self::setupDbCaches();
}
...
}
Also in Bootstrap, I will modify setupRoutes and add a custom route:
public function setupRoutes()
{
...
if (PHP_SAPI == 'cli') {
self::$frontController->setRouter(new App_Router_Cli());
self::$frontController->setRequest(new Zend_Controller_Request_Http());
}
}
App_Router_Cli is a new router type which determines the controller, action, and optional parameters based on this type of request: script.php controller=mail action=send. I found this new router here: Setting up Cron with Zend Framework 1.11
:
class App_Router_Cli extends Zend_Controller_Router_Abstract
{
public function route (Zend_Controller_Request_Abstract $dispatcher)
{
$getopt = new Zend_Console_Getopt (array());
$arguments = $getopt->getRemainingArgs();
$controller = "";
$action = "";
$params = array();
if ($arguments) {
foreach($arguments as $index => $command) {
$details = explode("=", $command);
if($details[0] == "controller") {
$controller = $details[1];
} else if($details[0] == "action") {
$action = $details[1];
} else {
$params[$details[0]] = $details[1];
}
}
if($action == "" || $controller == "") {
die("Missing Controller and Action Arguments == You should have:
php script.php controller=[controllername] action=[action]");
}
$dispatcher->setControllerName($controller);
$dispatcher->setActionName($action);
$dispatcher->setParams($params);
return $dispatcher;
}
echo "Invalid command.\n", exit;
echo "No command given.\n", exit;
}
public function assemble ($userParams, $name = null, $reset = false, $encode = true)
{
throw new Exception("Assemble isnt implemented ", print_r($userParams, true));
}
}
In CronController I do a simple check:
public function sendEmailCliAction()
{
if (PHP_SAPI != 'cli' || !empty($_SERVER['REMOTE_ADDR'])) {
echo "Program cannot be run manually\n";
exit(1);
}
// Each email sent has its status set to 0;
Crontab runs a command of this kind:
* * * * * php /var/www/projectname/public/index.php controller=name action=send-email-cli >> /var/www/projectname/application/data/logs/cron.log
It doesn't make sense to run the bootstrap in the same directory or in cron job folder. I've created a better and easy way to implement the cron job work. Please follow the below things to make your work easy and smart:
Create a cron job folder such as "cron" or "crobjob" etc. whatever you want.
Sometimes we need the cron job to run on a server with different interval like for 1 hr interval or 1-day interval that we can setup on the server.
Create a file in cron job folder like I created an "init.php", Now let's say you want to send a newsletter to users in once per day. You don't need to do the zend code in init.php.
So just set up the curl function in init.php and add the URL of your controller action in that curl function. Because our main purpose is that an action should be called on every day. for example, the URL should be like this:
https://www.example.com/cron/newsletters
So set up this URL in curl function and call this function in init.php in the same file.
In the above link, you can see "cron" is the controller and newsletters is the action where you can do your work, in the same way, don't need to run the bootstrap file etc.

Categories