How do I customize and use Phirehose functions? - php

I'm trying to put in a check for Phirehose to stop running after 10 seconds or 100 tweets...basically, I want to be able to stop the script.
I was told I could customize the statusUpdate() function or the heartBeat() function, but I'm uncertain how to do that. Right now, I'm just testing with the filter-track.php example.
How do I customize the functions, and where should I call them in the class?
class FilterTrackConsumer extends OauthPhirehose
{
/**
* Enqueue each status
*
* #param string $status
*/
public function enqueueStatus($status)
{
/*
* In this simple example, we will just display to STDOUT rather than enqueue.
* NOTE: You should NOT be processing tweets at this point in a real application, instead they should be being
* enqueued and processed asyncronously from the collection process.
*/
$data = json_decode($status, true);
if (is_array($data) && isset($data['user']['screen_name'])) {
print $data['user']['screen_name'] . ': ' . urldecode($data['text']) . "\n";
}
}
public function statusUpdate()
{
return "asdf";
}
}
// The OAuth credentials you received when registering your app at Twitter
define("TWITTER_CONSUMER_KEY", "");
define("TWITTER_CONSUMER_SECRET", "");
// The OAuth data for the twitter account
define("OAUTH_TOKEN", "");
define("OAUTH_SECRET", "");
// Start streaming
$sc = new FilterTrackConsumer(OAUTH_TOKEN, OAUTH_SECRET, Phirehose::METHOD_FILTER);
$sc->setLang('en');
$sc->setTrack(array('love'));
$sc->consume();

To stop after 100 tweets, have a counter in that function receiving the tweets, and call exit when done:
class FilterTrackConsumer extends OauthPhirehose
{
private $tweetCount = 0;
public function enqueueStatus($status)
{
//Process $status here
if(++$this->tweetCount >= 100)exit;
}
...
(Instead of exit you could throw an exception, and put a try/catch around your $sc->consume(); line.)
For shutdown "after 10 seconds", this is easy if it can be roughly 10 seconds (i.e. put a time check in enqueueStatus(), and exit if it has been more than 10 seconds since the program started), but hard if you want it to be exactly 10 seconds. This is because enqueueStatus() is only called when a tweet comes in. So, as an extreme example, if you get 200 tweets in the first 9 seconds, but then it goes quiet and the 201st tweet does not arrive for 80 more seconds, your program would not exit until the program has been running 89 seconds.
Taking a step back, wanting to stop Phirehose is normally a sign it is the wrong tool for the job. If you just want to poll 100 recent tweets, every now and again, then the REST API, doing a simple search, is better. The streaming API is more for applications that intend to run 24/7, and want to react to tweets as soon as they are, well, tweeted. (More critically, Twitter will rate-limit, or close, your account if you connect too frequently.)

Related

Symfony Lock does not lock when making two requests from the same browser

I want to prevent a user from making the same request two times by using the Symfony Lock component. Because now users can click on a link two times(by accident?) and duplicate entities are created. I want to use the Unique Entity Constraint which does not protect against race conditions itself.
The Symfony Lock component does not seem to work as expected. When I create a lock in the beginning of a page and open the page two times at the same time the lock can be acquired by both requests. When I open the test page in a standard and incognito browser window the second request doesn't acquire the lock. But I can't find anything in the docs about this being linked to a session. I have created a small test file in a fresh project to isolate the problem. This is using php 7.4 symfony 5.3 and the lock component
<?php
namespace App\Controller;
use Sensio\Bundle\FrameworkExtraBundle\Configuration\Template;
use Symfony\Bundle\FrameworkBundle\Controller\AbstractController;
use Symfony\Component\Lock\LockFactory;
use Symfony\Component\Routing\Annotation\Route;
class LockTest extends AbstractController
{
/**
* #Route("/test")
* #Template("lock/test.html.twig")
*/
public function test(LockFactory $factory): array
{
$lock = $factory->createLock("test");
$acquired = $lock->acquire();
dump($lock, $acquired);
sleep(2);
dump($lock->isAcquired());
return ["message" => "testing"];
}
}
I slightly rewrote your controller like this (with symfony 5.4 and php 8.1):
class LockTestController extends AbstractController
{
#[Route("/test")]
public function test(LockFactory $factory): JsonResponse
{
$lock = $factory->createLock("test");
$t0 = microtime(true);
$acquired = $lock->acquire(true);
$acquireTime = microtime(true) - $t0;
sleep(2);
return new JsonResponse(["acquired" => $acquired, "acquireTime" => $acquireTime]);
}
}
It waits for the lock to be released and it counts the time the controller waits for the lock to be acquired.
I ran two requests with curl against a caddy server.
curl -k 'https://localhost/test' & curl -k 'https://localhost/test'
The output confirms one request was delayed while the first one slept with the acquired lock.
{"acquired":true,"acquireTime":0.0006971359252929688}
{"acquired":true,"acquireTime":2.087146043777466}
So, the lock works to guard against concurrent requests.
If the lock is not blocking:
$acquired = $lock->acquire(false);
The output is:
{"acquired":true,"acquireTime":0.0007710456848144531}
{"acquired":false,"acquireTime":0.00048804283142089844}
Notice how the second lock is not acquired. You should use this flag to reject the user's request with an error instead of creating the duplicate entity.
If the two requests are sufficiently spaced apart to each get the lock in turn, you can check that the entity exists (because it had time to be fully committed to the db) and return an error.
Despite those encouraging results, the doc mentions this note:
Unlike other implementations, the Lock Component distinguishes lock instances even when they are created for the same resource. It means that for a given scope and resource one lock instance can be acquired multiple times. If a lock has to be used by several services, they should share the same Lock instance returned by the LockFactory::createLock method.
I understand two locks acquired by two distinct factories should not block each other. Unless the note is outdated or wrongly phrased, it seems possible to have non working locks under some circumstances. But not with the above test code.
StreamedResponse
A lock is released when it goes out of scope.
As a special case, when a StreamedResponse is returned, the lock goes out of scope when the response is returned by the controller. But the StreamedResponse has yet to return anything!
To keep the lock while the response is generated, it must be passed to the function executed by the StreamedResponse:
public function export(LockFactory $factory): Response
{
// create a lock with a TTL of 60s
$lock = $factory->createLock("test", 60);
if (!$lock->acquire(false)) {
return new Response("Too many downloads", Response::HTTP_TOO_MANY_REQUESTS);
}
$response = new StreamedResponse(function () use ($lock) {
// now $lock is still alive when this function is executed
$lockTime = time();
while (have_some_data_to_output()) {
if (time() - $lockTime > 50) {
// refresh the lock well before it expires to be on safe side
$lock->refresh();
$lockTime = time();
}
output_data();
}
$lock->release();
};
$response->headers->set('Content-Type', 'text/csv');
// lock would be released here if it wasn't passed to the StreamedResponse
return $response;
}
The above code refreshes the lock every 50s to cut down on communication time with the storage engine (such as redis).
The lock remains locked for at most 60s should the php process suddenly die.

LARAVEL: Is re queue a job is a bad idea?

I would like to know re queueing a laravel job is a bad idea or not. i had a scenario where i need to pull users post from facebook once they integrated there facebook account to my application. i want to pull {x} days historic data. facebook api like any other api limit there api request per minute. i keep track the request headers and once rate limit reached i saved those information in database and for each re queue i check whether i am eligible to make a call to facebook api
here is the code snippet for a better visualization
<?php
namespace App\Jobs;
class FacebookData implements ShouldQueue
{
/**
* The number of seconds the job can run before timing out.
*
* #var int
*/
public $timeout = 120;
public $userid;
public function __construct($id)
{
$this->userid=$id;
}
public function handle()
{
if($fbhelper->canPullData())
{
$res=$fbhelper->getData($user->id);
if($res['code']==429)
{
$fbhelper->storeRetryAfter($res);
self::dispatch($user->id);
}
}
}
}
The above snippet is a rough idea. is this a good idea? the reason why i post this question is the self::dispatch($user->id); looks like a recursion and it will try until $fbhelper->canPullData() returns true.that probably will take 6 minutes.i am worried about any impact would happen in my application.Thanks in advance
Retrying job is not a bad idea, it is just build into jobs design already. Laravel have retries for this matter, that jobs can do unreliable operations.
As an example in a project i have been working on, an external API we are working with has 1-5 http 500 errors per 100 requests we are sending. This is thou handled by the built in retry functionality of Laravel.
As of Laravel 5.4 you can set it in the class like so. This will do exactly what you want, without defining the logic. Finally for hitting the retry limit, you can define a function called retryAfter(), which specifies when the job should be retried.
class FacebookData {
public $tries = 5;
public function retryAfter() {
//wait 6 minutes
return 360;
}
}
If you want to keep your logic where you only retry 429 errors, i would use the inverse of that to delete the job, if its anything else than a 429.
if ($res['code'] !== 429) {
$this->delete();
}

Rate limiting PHP function

I have a php function which gets called when someone visits POST www.example.com/webhook. However, the external service which I cannot control, sometimes calls this url twice in rapid succession, messing with my logic since the webhook persists stuff in the database which takes a few ms to complete.
In other words, when the second request comes in (which can not be ignored), the first request is likely not completed yet however I need this to be completed in the order it came in.
So I've created a little hack in Laravel which should "throttle" the execution with 5 seconds in between. It seems to work most of the time. However an error in my code or some other oversight, does not make this solution work everytime.
function myWebhook() {
// Check if cache value (defaults to 0) and compare with current time.
while(Cache::get('g2a_webhook_timestamp', 0) + 5 > Carbon::now()->timestamp) {
// Postpone execution.
sleep(1);
}
// Create a cache value (file storage) that stores the current
Cache::put('g2a_webhook_timestamp', Carbon::now()->timestamp, 1);
// Execute rest of code ...
}
Anyone perhaps got a watertight solution for this issue?
You have essentially designed your own simplified queue system which is the right approach but you can make use of the native Laravel queue to have a more robust solution to your problem.
Define a job, e.g: ProcessWebhook
When a POST request is received to /webhook queue the job
The laravel queue worker will process one job at a time[1] in the order they're received, ensuring that no matter how many requests are received, they'll be processed one by one and in order.
The implementation of this would look something like this:
Create a new Job, e.g: php artisan make:job ProcessWebhook
Move your webhook processing code into the handle method of the job, e.g:
public function __construct($data)
{
$this->data = $data;
}
public function handle()
{
Model::where('x', 'y')->update([
'field' => $this->data->newValue
]);
}
Modify your Webhook controller to dispatch a new job when a POST request is received, e.g:
public function webhook(Request $request)
{
$data = $request->getContent();
ProcessWebhook::dispatch($data);
}
Start your queue worker, php artisan queue:work, which will run in the background processing jobs in the order they arrive, one at a time.
That's it, a maintainable solution to processing webhooks in order, one-by-one. You can read the Queue documentation to find out more about the functionality available, including retrying failed jobs which can be very useful.
[1] Laravel will process one job at a time per worker. You can add more workers to improve queue throughput for other use cases but in this situation you'd just want to use one worker.

How to handle deadlock in Doctrine?

I have a mobile application and server based on Symfony which gives API for the mobile app.
I have a situation, where users can like Post. When users like Post I add an entry in ManyToMany table that this particular user liked this particular Post (step 1). Then in Post table I increase likesCounter (step 2). Then in User table I increase gamification points for user (because he liked the Post) (step 3).
So there is a situation where many users likes particular Post at the same time and deadlock occurs (on Post table or on User table).
How to handle this? In Doctrine Docs I can see solution like this:
<?php
try {
// process stuff
} catch (\Doctrine\DBAL\Exception\RetryableException $e) {
// retry the processing
}
but what should I do in catch part? Retry the whole process of liking (steps 1 to 3) for instance 3 times and if failed return BadRequest to the mobile application? Or something else?
I don't know if this is a good example cause maybe I could try to rebuild the process so the deadlock won't happen but I would like to know what should I do if they actually happen?
I disagree with Stefan, deadlocks are normal as the MySQL documentation says:
Normally, you must write your applications so that they are always prepared to re-issue a transaction if it gets rolled back because of a deadlock.
See: MySQL documentation
However, the loop suggested by Stefan is the right solution. Except that it lacks an important point: after Doctrine has thrown an Exception, the EntityManager becomes unusable and you must create a new one in the catch clause with resetManager() from the ManagerRegistry instance.
When I had exactly the same concern as you, I searched the web but couldn't find any completely satisfactory answer. So I got my hands dirty and came back with an article where you'll find an implementation exemple of what I said above:
Thread-safe business logic with Doctrine
What I'd do is post all likes on a queue and consume them using a batch consumer so that you can group the updates on a single post.
If you insist on keeping you current implementation you could go down the road you yourself suggested like this:
<?php
for ($i = 0; $i < $retryCount; $i++) {
try {
// try updating
break;
} catch (\Doctrine\DBAL\Exception\RetryableException $e) {
// you could also add a delay here
continue;
}
}
if ($i === $retryCount) {
// throw BadRequest
}
This is an ugly solution and I wouldn't suggest it. Deadlocks shouldn't be "avoided" by retrying or using delays. Also have a look at named locks and use the same retry system, but don't wait for the deadlock to happen.
The problem is that after Symfony Entity Manager fails - it closes db connection and you can't continue you work with db even if you catch the ORMException.
First good solution is to process your 'likes' async, with rabbitmq or other queue implementation.
Step-by-step:
Create message like {type: 'like', user:123, post: 456}
Publish it in queue
Consume it and update 'likes' count.
You can have several consumers that try to obtain lock on based on postId. If two consumers try to update same post - one of them will fail obtaining the lock. But it's ok, you can consume failed message after.
Second solution is to have special table e.g. post_likes (userId, postId, timestamp). Your endpoint could create new rows in this table synchronously. And you can count 'likes' on some post with this table. Or you can write some cron script, which will update post likes count by this table.
I've made a special class to retry on deadlock (I'm on Symfony 4.4).
Here it is :
class AntiDeadlockService
{
/**
* #var EntityManagerInterface
*/
private $em;
public function __construct(EntityManagerInterface $em)
{
$this->em = $em;
}
public function safePush(): void
{
// to retry on deadlocks or other retryable exceptions
$connection = $this->em->getConnection();
$retry = 0;
$maxRetries = 3;
while ($retry < $maxRetries) {
try {
if (!$this->em->isOpen()) {
$this->em = $this->em->create(
$connection = $this->em->getConnection(),
$this->em->getConfiguration()
);
}
$connection->beginTransaction(); // suspend auto-commit
$this->em->flush();
$connection->commit();
break;
} catch (RetryableException $exception) {
$connection->rollBack();
$retry++;
if ($retry === $maxRetries) {
throw $exception;
}
}
}
}
}
Use this safePush() method instead of the $entityManager->push() one ;)

HHVM - Running resource extensive php daemons

Hi i'm trying to use hhvm to run all of the background PHP workers that are currently there in my application. I don't want to run hhvm as a server as Apache is already taking care of it , all i want to do is to run my php codes with hhvm, instead of the regular Zend engine.
Ok here are the codes which i want to run.
This is the entry point of the computationally intensive modules that i want to run
-------------**RunRenderer.php**--------------
#!/usr/bin/php
<?php
require_once 'Config.php';
require_once 'Renderer.php';
Renderer::getInstance()->run();
?>
Here is just a small a portion of the main controller that controls/forks/manages thousands of php tasks/processes.
----------------------------Renderer.php---------------------
<?php
require 'Workers/BumpMapsCalc.php';
/**
* Main Entry class of the Map rendering module
*
* Create workers for all of the different maps calc sub routines
*
*
*
*/
class Renderer extends \Core_Daemon {
/**
* the interval at which the execute method will run
*
* Interval : 10 min
*
*/
protected $loop_interval = 600;
/**
* Set the chunk size
*/
protected $chunkSize = 500;
/**
* Loop counter
*/
protected $loopCounter;
/**
* Low limit and the high limit
*/
protected $lowLimit;
protected $highLimit;
/**
* set the plugins for lock file and settings ini files
*
*/
protected function setup_plugins() {
$this->plugin('Lock_File');
$this->plugin('settings', new \Core_Plugin_Ini());
$this->settings->filename = BASE_PATH . "/Config/settings.ini";
$this->settings->required_sections = array('geometry');
}
protected function setup() {
$this->log("Computing Bumps Maps");
}
/**
* Create multiple separate task that will run in parallel
* Provide the low limit and the high limit which should effectively partition
* the whole table into more manageable chunks , thus making importing and
* storing data much faster and finished within 10 min
*
*/
protected function execute() {
for ($this->loopCounter = 1 ; $this->loopCounter <= $this->settings['geometry']['number'] ; $this->loopCounter += $this->chunkSize) {
$this->lowLimit = $this->loopCounter;
$this->highLimit = $this->loopCounter + $this->chunkSize;
$this->task(new LocalBumpMaps($this->lowLimit, $this->highLimit));
}
}
protected function log_file() {
$dir = BASE_PATH . "/Logs";
if (#file_exists($dir) == false)
#mkdir($dir, 0777, true);
return $dir . '/log_' . date('Y-m-d');
}
}
?>
So normally i would run the program as
php RunRenderer.php -d -p ./pid/pid $1
which would invoke the default zend engine and Renderer.php would fork around thousands of instances of LocalBumpMaps ( along with 100 other map rendering classes ). Now with each of this subtasks taking around 20-30 mb all of the memory in the workstation gets exhausted pretty quickly thus causing the system to screech to a halt.
Of course the main rendering engine is written in C++, but due to some weird requirement the whole front end is in PHP. And the php modules needs to perform around billions of calculations per second. So the only options that was left was to use HHVM in hopes of some significant increase in performance and efficiency.
But the problem is i can't get this code to run with hhvm. This is what i'm trying
hhvm RunRenderer.php -p ./pid $1
This doesn't do anything at all. No processes are forked, no output, nothing happens. So can anyone please tell me how do i run the php scripts with hhvm instead of zend.
I hope my question makes sense, and i would really appreciate any help.
Thanks,
Maxx
Just run the following line first without forking a process:
hhvm RunRenderer.php
If you see console output, and that you can Ctrl+C to terminate the process, then you can demonize the process with an Upstart script. Create a file called /etc/init/renderer.conf:
start on startup
stop on shutdown
respawn
script
hhvm RunRenderer.php
end script
Then you can manually start and stop the process by running:
start renderer
and
stop renderer
If you are running Ubuntu 12.04LTS and above, a log file will be created for you automatically under the name /var/log/upstart/renderer.log. You can fetch live output by tailing the file:
tail -f /var/log/upstart/renderer.log

Categories