I'm creating an application in Symfony, and I need to retrieve a large number of customer records (tens of thousands) from an external API endpoint, then store them in a Doctrine database. The API only returns 100 results at a time, so the data is paginated into a few hundred pages. Triggering this synchronously resulted in a very long wait before running out of memory (not surprising), so I pulled the code out into a message handler, which essentially looks like this:
public function __invoke(GetCustomersMessage $message)
{
    $response = $this->client->request('GET', $message->getQueryUrl() . '&page=' . $message->getPage(), [
        'auth_basic' => $this->authCreds,
    ]);

    $statusCode = $response->getStatusCode();
    if ($statusCode != 200) {
        throw new \Exception('Error: ' . $statusCode);
    }

    $content = $response->getContent();
    $data = json_decode($content, true);
    $result = $this->createCustomers($data, $message->getStoreId());

    // If we successfully added all new customers, get another batch.
    if ($result === true) {
        $this->bus->dispatch(new GetCustomersMessage($message->getQueryUrl(), $message->getPage() + 1, $message->getStoreId()));
    }
}
So essentially I query the API, then try to add those customers to my database, and if they all get added successfully, I dispatch a message to get the next batch. The message transport is the async Doctrine transport. I spun up a worker to consume the messages, and it synced way more customers than the previous attempt, but it still ran out of memory after about 250. I was surprised to see, though, that when the worker died it neither left the final message as incomplete nor moved it to the "failed" queue. The message was just gone, so when I started another worker it was unable to pick up where the other one left off.
This is my first time attempting a messenger/bus architecture. Am I approaching this wrong? I considered queueing up all the messages at once, but I believe I'd still lose the data contained in whichever message the worker died on. Also, the intention is that this would run whenever we need to sync customers, so it stops when it reaches a record we already have in the database. If I queued up a message for every batch, it would make a few hundred useless calls on every sync besides the first. Is there a way to monitor a worker and kill it before it reaches the memory limit?
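On the last point, one common pattern is for the worker to check its own memory use between messages and exit cleanly once it crosses a soft limit, so that a supervisor can start a fresh process. Symfony's messenger:consume command offers --memory-limit and --limit options for this purpose. The sketch below only illustrates the idea in plain PHP; shouldStop() and runWorker() are hypothetical names, not part of Messenger, and the job list and handler are stand-ins for a real queue.

```php
<?php

/**
 * Decide whether a worker should stop before taking the next message.
 * $usedBytes would normally come from memory_get_usage(true).
 */
function shouldStop(int $usedBytes, int $limitBytes): bool
{
    return $usedBytes >= $limitBytes;
}

/**
 * Handle jobs one at a time and stop cleanly once the soft limit is reached.
 * $memoryProbe is injectable so the behaviour can be exercised without
 * actually exhausting memory.
 *
 * @return int number of jobs handled before stopping
 */
function runWorker(array $jobs, callable $handler, int $limitBytes, ?callable $memoryProbe = null): int
{
    if ($memoryProbe === null) {
        $memoryProbe = function (): int {
            return memory_get_usage(true);
        };
    }

    $handled = 0;
    foreach ($jobs as $job) {
        if (shouldStop($memoryProbe(), $limitBytes)) {
            break; // leave the remaining jobs for the next worker process
        }
        $handler($job);
        $handled++;
    }

    return $handled;
}
```

Because the check happens between jobs rather than during one, the job in progress is always finished before the process exits, which is what keeps a message from being lost mid-flight.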
I'm trying to listen for subscription changes (new and existing) of my Google Play app on the server. Here's the code I'm using. This uses the google/cloud-pubsub composer package:
$projectId = 'app-name';
$keyFile = file_get_contents(storage_path('app/app-name.json'));
$pubsub = new PubSubClient([
    'projectId' => $projectId,
    'keyFile' => json_decode($keyFile, true)
]);

$httpPostRequestBody = file_get_contents('php://input');
$requestData = json_decode($httpPostRequestBody, true);
$message = $pubsub->consume($requestData);
The code above works, but the problem is that the data I get doesn't match what I'm getting on the app side. This is a sample of the data:
If you base64_decode() the data, you'll get something like this:
{
    version: "1.0",
    packageName: "com.dev.app",
    eventTimeMillis: "1607997631636",
    subscriptionNotification: {
        version: "1.0",
        notificationType: 4,
        purchaseToken: "kmloa....",
        subscriptionId: "app_subs1"
    }
}
This is where I'm expecting the purchaseToken to be the same as the one I'm getting from the client side.
Here's the code in the client-side. I'm using Expo in-app purchases to implement subscriptions:
setPurchaseListener(async ({ responseCode, results, errorCode }) => {
  if (responseCode === IAPResponseCode.OK) {
    const { orderId, purchaseToken, acknowledged } = results[0];
    if (!acknowledged) {
      await instance.post("/subscribe", {
        order_id: orderId,
        order_token: purchaseToken,
        data: JSON.stringify(results[0]),
      });
      finishTransactionAsync(results[0], true);
      "You're now subscribed! You can now use the full functionality of the app."
    }
  }
});
I'm expecting the purchaseToken I'm extracting from results[0] to be the same as the one the Google server returns when it pushes the notification to the endpoint. But it isn't.
I think my main problem is that I'm assuming all the data I need will be coming from Google Play, so I'm just relying on the data published by Google when a user subscribes in the app.
This isn't actually the one that publishes the message:
await instance.post("/subscribe")
It just updates the database with the purchase token. I can just use this to subscribe the user but there's no guarantee that the request is legitimate. Someone can just construct the necessary credentials based on an existing user and they can pretty much subscribe without paying anything. Plus this method can't be used to keep the user subscribed. So the data really has to come from Google.
Based on the answer below, I now realize that you're supposed to trigger the publish from your own server and then listen for that. So when I call this from the client:
await instance.post("/subscribe", {
I actually need to publish the message containing the purchase token like so:
$pubsub = new PubSubClient([
    'projectId' => $projectId,
]);
$topic = $pubsub->topic($topicName);
$message = [
    'purchaseToken' => request('purchaseToken')
];
$topic->publish(['data' => $message]);
Is that what you're saying? But the only problem with this approach is how to validate if the purchase token is legitimate, and how to renew the subscription in the server? I have a field that needs to be updated each month so the user stays "subscribed" in the eyes of the server.
Maybe, I'm just overcomplicating things by using pub/sub. If there's actually an API which I could pull out data from regularly (using cron) which allows me to keep the user subscription data updated then that will also be acceptable as an answer.
First of all, I have had a really bad experience with PHP and Pub/Sub because of the PHP PubSubClient. If your script is only waiting for a push and checking the messages, then remove the pubsub package and handle it with a few lines of code.
$message = file_get_contents('php://input');
$message = json_decode($message, true);
if (is_array($message)) {
    $message = (isset($message['message']) && isset($message['message']['data'])) ? base64_decode($message['message']['data']) : false;
    if (is_string($message)) {
        $message = json_decode($message, true);
        if (is_array($message)) {
            $type = (isset($message['type'])) ? $message['type'] : null;
            $data = (isset($message['data'])) ? $message['data'] : [];
        }
    }
}
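The same decoding steps can be wrapped in a function, which makes them easier to exercise without a live push. This is a minimal sketch: decodePushBody() is a hypothetical helper name, the sample body is built by hand purely for illustration, and its field names are taken from the notification shown in the question (with a made-up token value).

```php
<?php

/**
 * Decode the body of a Pub/Sub push request into the notification array.
 * Returns null when the body is not a well-formed push envelope.
 */
function decodePushBody(string $rawBody): ?array
{
    $envelope = json_decode($rawBody, true);
    if (!is_array($envelope)
        || !isset($envelope['message']['data'])
        || !is_string($envelope['message']['data'])
    ) {
        return null;
    }

    // Strict mode makes base64_decode() return false on invalid input.
    $json = base64_decode($envelope['message']['data'], true);
    if ($json === false) {
        return null;
    }

    $notification = json_decode($json, true);

    return is_array($notification) ? $notification : null;
}

// Hypothetical request body, built here only to exercise the function.
$sample = json_encode([
    'message' => [
        'data' => base64_encode(json_encode([
            'version' => '1.0',
            'packageName' => 'com.dev.app',
            'subscriptionNotification' => [
                'notificationType' => 4,
                'purchaseToken' => 'example-token',
                'subscriptionId' => 'app_subs1',
            ],
        ])),
    ],
    'subscription' => 'projects/example/subscriptions/example',
]);

$notification = decodePushBody($sample);
echo $notification['subscriptionNotification']['purchaseToken'], PHP_EOL;
```

In a real endpoint the argument would be file_get_contents('php://input') rather than a hand-built string.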
I'm not sure how everything works on your side, but if this part publishes the message:
await instance.post("/subscribe", {
  order_id: orderId,
  order_token: purchaseToken,
  data: JSON.stringify(results[0]),
});
then it looks like a proxy method for publishing your messages, because the payload sent with it does not follow the schema Pub/Sub describes, and the final message does not look like an IAPQueryResponse.
If I were in your situation, I would check a few things to debug the problem:
How I publish/read a message to/from PubSub (topic, subscription and message payload)
I would write the publish mechanism as it is described in the Google PubSub publish documentation
I would check my project, topic and subscription
If everything is set up correctly, then I would compare all other message data
If the problem persists, then I would try publishing a minimal amount of data to PubSub, just the purchaseToken at first, to check what breaks the messages
For easier debugging:
Create a pull subscription
When you publish a message, check the pull subscription's messages with "View messages"
For me the problem is not directly in PubSub but in your implementation of publishing/receiving messages.
UPDATE 21-12-2020:
Customer creates/renews a subscription
Publish to PubSub with authentication
PubSub transfers the message to the analysis application via "push" so you can run your analysis.
If you need information like:
New subscribers count
Renewals count
Active subscriptions count
You can create your own analysis application, but if you need something more complicated then you have to pick a tool that meets your needs.
You can also get the messages from PubSub with "pull", but there are a few issues I've run into:
The last time I used pull, PubSub returned a random number of messages: if my limit is 50 and there are more than 50 messages in the queue, I expect to get 50 messages, but sometimes PubSub gives me fewer.
PubSub returned messages in random order. There is now an option to use an ordering key, but it's something new.
To implement "pull" you have to run crons or something similar; with "push" you receive the message as soon as possible.
With "pull" you have to depend on a library/package (or whatever it's called in your language), but with "push" you can handle the message with just a few lines of code, as in my PHP example.
This actually follows on from a previous question I had that, unfortunately, did not receive any answers so I'm not exactly holding my breath for a response but I understand this can be a bit of a tricky issue to solve.
I am currently trying to implement rate limiting on outgoing requests to an external API to match the limit on their end. I have tried to implement a token bucket library (https://github.com/bandwidth-throttle/token-bucket) into the class we are using to manage Guzzle requests for this particular API.
Initially, this seemed to be working as intended but we have now started seeing 429 responses from the API as it no longer seems to be correctly rate limiting the requests.
I have a feeling what is happening is that the number of tokens in the bucket is now being reset every time the API is called due to how Symfony handles services.
I am currently setting up the bucket location, rate and starting amount in the service's constructor:
public function __construct()
{
    $storage = new FileStorage(__DIR__ . "/api.bucket");
    $rate = new Rate(50, Rate::MINUTE);
    $bucket = new TokenBucket(50, $rate, $storage);
    $bucket->bootstrap(50);
    $this->consumer = new BlockingConsumer($bucket);
}
I'm then attempting to consume a token before each request:
public function fetch(): array
{
    try {
        // Block until a token is available before sending the request.
        $this->consumer->consume(1);
        $response = $this->client->request(
            'GET', $this->buildQuery(), [
                'query' => array_merge($this->params, ['api_key' => $this->apiKey]),
                'headers' => [ 'Content-type' => 'application/json' ]
            ]
        );
    } catch (ServerException $e) {
        // Process Server Exception
    } catch (ClientException $e) {
        // Process Client Exception
    }

    return $this->checkResponse($response);
}
I can't see anything obvious in that, that would allow it to request more than 50 times per minute unless the amount of available tokens was being reset on each request.
This is being supplied to a set of repository services that handle converting the data from each endpoint into objects used within the system. Consumers use the appropriate repository to request the data needed to complete their process.
If the number of tokens is being reset because the bootstrap call sits in the service constructor, where should it be moved within the Symfony framework so that it still works with consumers?
I assume that it should work, but maybe try moving the ->bootstrap(50) call so that it no longer runs on every request? I'm not sure, but that could be the reason.
In any case, it's better to do that only once, as part of your deployment (every time you deploy a new version). It doesn't really have anything to do with Symfony, because the framework places no restrictions on the deployment procedure, so it depends on how you deploy.
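To make the suspected failure mode concrete, here is a toy token bucket whose state lives in shared storage, as it does with FileStorage. It is a sketch only, not the bandwidth-throttle/token-bucket implementation: ToyBucket and simulate() are hypothetical names, the shared array stands in for the bucket file, and time is passed in explicitly so the behaviour is deterministic.

```php
<?php

/** Toy token bucket whose state lives in $store (a stand-in for a file). */
final class ToyBucket
{
    private array $store;
    private int $capacity;
    private float $ratePerSecond;

    public function __construct(int $capacity, float $ratePerSecond, array &$store)
    {
        $this->capacity = $capacity;
        $this->ratePerSecond = $ratePerSecond;
        $this->store = &$store; // share state across instances
    }

    /** Fill the bucket; meant to run once, e.g. at deploy time. */
    public function bootstrap(int $tokens, float $now): void
    {
        $this->store = ['tokens' => (float) $tokens, 'updatedAt' => $now];
    }

    /** Refill according to elapsed time, then take one token if available. */
    public function tryConsume(float $now): bool
    {
        $elapsed = $now - $this->store['updatedAt'];
        $tokens = min((float) $this->capacity, $this->store['tokens'] + $elapsed * $this->ratePerSecond);
        $this->store['updatedAt'] = $now;

        if ($tokens < 1) {
            $this->store['tokens'] = $tokens;
            return false;
        }

        $this->store['tokens'] = $tokens - 1;
        return true;
    }
}

/** Count how many of $requests back-to-back requests are let through. */
function simulate(bool $bootstrapEveryRequest, int $requests): int
{
    $shared = [];
    (new ToyBucket(2, 1.0, $shared))->bootstrap(2, 0.0); // deploy-time fill

    $allowed = 0;
    for ($i = 0; $i < $requests; $i++) {
        $bucket = new ToyBucket(2, 1.0, $shared); // new service instance per request
        if ($bootstrapEveryRequest) {
            $bucket->bootstrap(2, 0.0); // what a constructor-time bootstrap does
        }
        if ($bucket->tryConsume(0.0)) {
            $allowed++;
        }
    }

    return $allowed;
}

echo simulate(false, 5), PHP_EOL; // bootstrapped once: limit is enforced
echo simulate(true, 5), PHP_EOL;  // bootstrapped per request: limit never bites
```

With a capacity of 2 and no time passing, the first call lets 2 of 5 requests through while the second lets all 5 through, which matches the symptom of 429 responses despite the limiter.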
P.S. Have you considered just handling 429 errors from the server? IMO you can simply wait when you receive a 429 error (that's what BlockingConsumer does internally). It's simpler and doesn't require an additional layer in your system.
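A rough sketch of that alternative, assuming the transport is wrapped in a callable: sendWithBackoff() is a hypothetical helper, and the array shape it expects from $send (status plus an optional retryAfter in seconds) is an assumption made for illustration, not Guzzle's response object. The sleep function is injected so the logic can be tried without actually waiting.

```php
<?php

/**
 * Call $send until it returns a non-429 status or the attempts run out.
 * $send must return ['status' => int, 'retryAfter' => int|null];
 * $sleep receives the number of seconds to wait (sleep() in real code).
 */
function sendWithBackoff(callable $send, int $maxAttempts, callable $sleep): array
{
    $response = ['status' => 0, 'retryAfter' => null];

    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $response = $send();
        if ($response['status'] !== 429) {
            return $response;
        }
        if ($attempt < $maxAttempts) {
            // Honour the server's hint when present, else wait one second.
            $sleep($response['retryAfter'] ?? 1);
        }
    }

    return $response; // still 429 after the last attempt
}
```

In real code $sleep would be 'sleep' and $send would perform the HTTP request and read the Retry-After header, if the API sends one.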
BTW, have you considered nginx's ngx_http_limit_req_module as an alternative solution? It usually ships with nginx by default, so nothing extra needs to be installed; only a small amount of configuration is required.
You can place an nginx proxy between your code and the target web service and enable limits on it. Your code then handles 429 as usual, but the requests are throttled by your local nginx proxy, not by the external web service, so the final destination only receives a limited number of requests.
I have found a trick using the Guzzle bundle for Symfony.
I had to improve a sequential program sending GET requests to a Google API. In the code example, it is a PageSpeed URL.
To enforce a rate limit, there is an option to delay the requests before they are sent asynchronously.
The PageSpeed rate limit is 200 requests per minute.
A quick calculation gives 60/200 = 0.3 s per request.
Here is the code I tested on 300 URLs, with the fantastic result of no errors, except when the URL passed as a parameter in the GET request gives a 400 HTTP error (Bad Request).
I put in a delay of 0.4 s and the average result time is less than 0.2 s, whereas it took more than a minute with a sequential program.
use GuzzleHttp;
use GuzzleHttp\Client;
use GuzzleHttp\Promise\EachPromise;
use GuzzleHttp\Exception\ClientException;

// ... Now inside class code ... //
$client = new GuzzleHttp\Client();
$promises = [];
foreach ($requetes as $i=>$google_request) {
    $promises[] = $client->requestAsync('GET', $google_request ,['delay'=>0.4*$i*1000]); // delay is the trick not to exceed rate limit (in ms)
}
GuzzleHttp\Promise\each_limit($promises, function(){ // function returning the number of concurrent requests
    return 100; // 1 or 100 concurrent request(s) don't really change execution time
}, // Fulfilled function
function ($response,$index)use($urls,$fp) { // $urls is used to get the url passed as a parameter in GET request and $fp a csv file pointer
    $feed = json_decode($response->getBody(), true); // Get array of results
    $this->write_to_csv($feed,$fp,$urls[$index]); // Write to csv
}, // Rejected function
function ($reason,$index) {
    if ($reason instanceof GuzzleHttp\Exception\ClientException) {
        $message = $reason->getMessage();
        var_dump(array("error"=>"error","id"=>$index,"message"=>$message)); // You could write the errors to a file or database too
    }
})->wait(); // block until every request has settled
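The delay arithmetic can be pulled out into a small helper: 60 000 ms divided by 200 requests per minute gives a 300 ms interval, and the answer pads that to 400 ms. This is a sketch only; staggeredDelaysMs() is a hypothetical name, and its return values are what would be passed as Guzzle's 'delay' request option (which is expressed in milliseconds).

```php
<?php

/**
 * Per-request start delays (in milliseconds) that keep $count requests
 * at or under $perMinute requests per minute, optionally padded by $marginMs.
 */
function staggeredDelaysMs(int $count, int $perMinute, int $marginMs = 0): array
{
    // Round up so a non-divisible limit never yields too short an interval.
    $intervalMs = (int) ceil(60000 / $perMinute) + $marginMs;

    $delays = [];
    for ($i = 0; $i < $count; $i++) {
        $delays[] = $i * $intervalMs;
    }

    return $delays;
}

print_r(staggeredDelaysMs(4, 200));      // 300 ms apart
print_r(staggeredDelaysMs(4, 200, 100)); // 400 ms apart, as in the answer
```

Note that these delays only space out when requests start; they do not react to how long each response takes.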
Fairly new to ZeroMQ. I have a simple REQ/REP queue like below. I am using PHP but that doesn't matter as any language binding would be fine for me.
This is the client that requests a task.
$ctx = new ZMQContext();
$req = new ZMQSocket($ctx, ZMQ::SOCKET_REQ);
$req->send("Export Data as Zip");
echo $i . ":" . $req->recv().PHP_EOL;
And this is a worker to actually perform the task.
$ctx = new ZMQContext();
$srvr = new ZMQSocket($ctx, ZMQ::SOCKET_REP);
echo "Server is started at port $port" . PHP_EOL;
$msg = $srvr->recv();
echo "Message = " . $msg . PHP_EOL;
// Do the work here, takes 10 min, knows the count of lines added and remaining
$srvr->send($msg . " is exported as zip file" . date('H:i:s'));
As the task of exporting data takes about 10 min, I want to connect to the server from a different client and get the progress/ percentage of the task done.
I am wondering if that's even a valid approach.
I tried this approach where REQ/REP part works but I get nothing in PUB/SUB part
Server part
$ctx = new ZMQContext();
$srvr = new ZMQSocket($ctx, ZMQ::SOCKET_REP);
// add PUB socket to publish progress
$c = new ZMQContext();
$p = new ZMQSocket($c, ZMQ::SOCKET_PUB);
echo "Server is started at port 5454" . PHP_EOL;
$prog = 0;
$p->send($prog++ . '%'); // this part doesn't get to the progress client
$msg = $srvr->recv();
echo "Message = " . $msg . PHP_EOL;
sleep(2);// some long task
$srvr->send($msg . " Done zipping " . date('H:i:s'));
Progress client
$ctx = new ZMQContext();
$stat = new ZMQSocket($ctx, ZMQ::SOCKET_SUB);
while (true){
    echo $stat->recv() . PHP_EOL; //nothing shows here
}
Request client
$ctx = new ZMQContext();
$req = new ZMQSocket($ctx, ZMQ::SOCKET_REQ);
$req->send("$i : Zip the file please");
echo $i . ":" . $req->recv().PHP_EOL; //works and get the output
The concept is feasible, some tuning needed:
Every SUB-side counterparty of the PUB socket has to set up a non-default subscription; at the least, an empty subscription via .setsockopt( ZMQ_SUBSCRIBE, "" ), meaning receive all TOPICs ( none "filter"-ed out ).
Next, both the PUB side and the SUB side ought to get .setsockopt( ZMQ_CONFLATE, 1 ) configured, as there is no value in feeding all intermediate values through the en-queue/de-queue pipeline when the only value of interest is in the "last", most recent message.
The non-blocking mode of the ZeroMQ calls ought always to be preferred ( .recv( ..., flags = ZMQ_NOBLOCK ) et al ), or Poller.poll() pre-tests ought to be used to sniff first for the (non-)presence of a message, before spending more effort on reading its content "from" the ZeroMQ context-manager. Simply put, there are not many cases where blocking-mode service calls serve well in a production-grade system.
Also, some further tweaking may help the PUB side in case a more massive "attack" comes from the unrestricted pool of SUB-side entities and the PUB has to create / manage / maintain resources for each of these ( unrestricted ) counterparties.
You need only use PUB/SUB if there is more than one client wanting to receive the same progress updates. Just use PUSH/PULL for a simple, point to point transfer that works over tcp.
Philosophical Discussion
With problems such as this to solve, there are two approaches.
Use additional sockets to convey additional message types,
Use just two sockets, but convey more than one message type through them
You're talking about doing 1). It might be worth contemplating 2), though I must emphasise that I know next to nothing of PHP and so don't know if there are language features that encourage one to have separate request and progress clients.
If you do, your original client needs a loop (after it has sent the request) to receive multiple messages, either progress update messages or the final result. Your server, whilst it is doing its 10 minute lookup, will send regular progress update messages, and the final result message at the end. You would probably use PUSH/PULL client to server, and the same again for the progress / result from the server back to the client.
It is architecturally more flexible to follow 2). Once you have a means of sending two or more message types through a single socket and of decoding them at the receiving end, you can send more. For example, you could decide to add a 'cancel' message from the client to the server, or a partial results message from the server back to the client. This is much easier to extend than to keep adding more sockets to your architecture simply because you want to add another message flow between the client and server. Again, I don't know enough about PHP to say that this would definitely be the right way of doing it in that language. It certainly makes a lot of sense in C, C++.
I find things like Google Protocol Buffers (I prefer ASN.1) very useful for this kind of thing. These allow you to define the types of messages you want to send, and (at least with GPB), combine them together inside a single 'oneof' (in ASN.1 one uses tagging to tell different messages apart). GPB and ASN.1 are handy because then you can use different languages, OSes and platforms in your system without really having to worry about what it is being sent. And being binary (not text) they're more efficient across network connections.
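A minimal sketch of approach 2 in PHP, showing progress updates and the final result travelling as typed messages through one channel. JSON is used here only to keep the example dependency-free (the answer suggests Protocol Buffers or ASN.1); the function and constant names are hypothetical, and in real use each frame would come from a socket's recv() call rather than an array.

```php
<?php

const MSG_PROGRESS = 'progress';
const MSG_RESULT = 'result';

/** Wrap a payload with its message type so one socket can carry several kinds. */
function encodeMessage(string $type, array $payload): string
{
    return json_encode(['type' => $type, 'payload' => $payload], JSON_THROW_ON_ERROR);
}

/** Decode a frame and verify that it carries a type and a payload. */
function decodeMessage(string $frame): array
{
    $message = json_decode($frame, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($message) || !isset($message['type'], $message['payload'])) {
        throw new InvalidArgumentException('Not a typed message');
    }

    return $message;
}

/**
 * Client-side loop: report progress frames until the final result arrives.
 * Returns the result payload, or null if the frames end without one.
 */
function collectResult(iterable $frames, callable $onProgress): ?array
{
    foreach ($frames as $frame) {
        $message = decodeMessage($frame);
        if ($message['type'] === MSG_PROGRESS) {
            $onProgress($message['payload']['percent']);
        } elseif ($message['type'] === MSG_RESULT) {
            return $message['payload'];
        }
    }

    return null;
}
```

Adding a further message type, such as the 'cancel' message mentioned above, then means adding one constant and one branch rather than another socket.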
I'm working on trace logger of sorts that pushes log message requests onto a Queue on a Service Bus, to later be picked off by a worker role which would insert them into the table store. While running on my machine, this works just fine (since I'm the only one using it), but once I put it up on a server to test, it produced the following error:
HTTP_Request2_MessageException: Malformed response: in D:\home\site\wwwroot\vendor\pear-pear.php.net\HTTP_Request2\HTTP\Request2\Adapter\Socket.php on line 1013
0 HTTP_Request2_Response->__construct('', true, Object(Net_URL2)) D:\home\site\wwwroot\vendor\pear-pear.php.net\HTTP_Request2\HTTP\Request2\Adapter\Socket.php:1013
1 HTTP_Request2_Adapter_Socket->readResponse() D:\home\site\wwwroot\vendor\pear-pear.php.net\HTTP_Request2\HTTP\Request2\Adapter\Socket.php:139
2 HTTP_Request2_Adapter_Socket->sendRequest(Object(HTTP_Request2)) D:\home\site\wwwroot\vendor\pear-pear.php.net\HTTP_Request2\HTTP\Request2.php:939
3 HTTP_Request2->send() D:\home\site\wwwroot\vendor\microsoft\windowsazure\WindowsAzure\Common\Internal\Http\HttpClient.php:262
4 WindowsAzure\Common\Internal\Http\HttpClient->send(Array, Object(WindowsAzure\Common\Internal\Http\Url)) D:\home\site\wwwroot\vendor\microsoft\windowsazure\WindowsAzure\Common\Internal\RestProxy.php:141
5 WindowsAzure\Common\Internal\RestProxy->sendContext(Object(WindowsAzure\Common\Internal\Http\HttpCallContext)) D:\home\site\wwwroot\vendor\microsoft\windowsazure\WindowsAzure\Common\Internal\ServiceRestProxy.php:86
6 WindowsAzure\Common\Internal\ServiceRestProxy->sendContext(Object(WindowsAzure\Common\Internal\Http\HttpCallContext)) D:\home\site\wwwroot\vendor\microsoft\windowsazure\WindowsAzure\ServiceBus\ServiceBusRestProxy.php:139
7 WindowsAzure\ServiceBus\ServiceBusRestProxy->sendMessage('<queuename>/mes…', Object(WindowsAzure\ServiceBus\Models\BrokeredMessage)) D:\home\site\wwwroot\vendor\microsoft\windowsazure\WindowsAzure\ServiceBus\ServiceBusRestProxy.php:155
I've seen previous posts that describe similar issues; Namely:
Windows Azure PHP Queue REST Proxy Limit (Stack Overflow)
Operations on HTTPS do not work correctly (GitHub)
These posts imply that this is a known issue with the PHP Azure Storage libraries, where only a limited number of HTTPS connections are allowed. Before the requirements changed, I was accessing the table store directly, ran into this same issue, and fixed it in the way the first link describes.
The problem is that the Service Bus endpoint in the connection string, unlike Table Store (etc.) connection string endpoints, MUST be 'HTTPS'. Trying to use it with 'HTTP' will return a 400 - Bad Request error.
I was wondering if anyone had any ideas on a potential workaround. Any advice would be greatly appreciated.
EDIT (After Gary Liu's Comment):
Here's the code I use to add items to the queue:
private function logToAzureSB($source, $msg, $severity, $machine)
{
    // Gather all relevant information
    $msgInfo = array(
        "Severity" => $severity,
        "Message" => $msg,
        "Machine" => $machine,
        "Source" => $source
    );
    // Encode it to a JSON string, and add it to a Brokered message.
    $encoded = json_encode($msgInfo);
    $message = new BrokeredMessage($encoded);
    // Attempt to push the message onto the Queue
    try
    {
        $this->sbRestProxy->sendQueueMessage($this->azureQueueName, $message);
    }
    catch(ServiceException $e)
    {
        throw new \DatabaseException($e->getMessage(), $e->getCode(), $e->getPrevious());
    }
}
Here, $this->sbRestProxy is a Service Bus REST Proxy, set up when the logging class initializes.
On the receiving end of things, here's the code on the Worker role side of this:
public override void Run()
{
    // Initiates the message pump and callback is invoked for each message that is received, calling close on the client will stop the pump.
    Client.OnMessage((receivedMessage) =>
    {
        try
        {
            // Pull the Message from the received object.
            Stream stream = receivedMessage.GetBody<Stream>();
            StreamReader reader = new StreamReader(stream);
            string message = reader.ReadToEnd();
            LoggingMessage mMsg = JsonConvert.DeserializeObject<LoggingMessage>(message);
            // Create an entry with the information given.
            LogEntry entry = new LogEntry(mMsg);
            // Set the Logger to the appropriate table store, and insert the entry into the table.
            Logger.InsertIntoLog(entry, mMsg.Service);
        }
        catch
        {
            // Handle any message processing specific exceptions here
        }
    });
}
Where Logging Message is a simple object that basically contains the same fields as the Message Logged in PHP (Used for JSON Deserialization), LogEntry is a TableEntity which contains these fields as well, and Logger is an instance of a Table Store Logger, set up during the worker role's OnStart method.
This was a known issue with the Windows Azure PHP libraries, which hasn't been looked at in a long time and has not been fixed. In the time between when I posted this and now, we ended up writing a separate API web service for logging and had our PHP code send JSON strings to it over cURL, which works well enough as a temporary workaround. We're moving off PHP now, so this won't be an issue for much longer anyway.
Using PHP Stomp library to send and receive message from ActiveMQ (v5.4.3).
Client sends a message with reply-to & correlation-id headers to request queue (say /queue/request)
Subscribe to the response queue (say /queue/response)
Read frame
The above steps work fine when there are no pending messages, or when the number of pending messages is below n. In my case, n = 200. When the number of pending messages exceeds 200, the message is not delivered. The process waits until the timeout and finally times out without a response. I can see the message (using the admin UI) after the timeout. Here is the code that I'm using for this case:
// make a connection
$con = new Stomp("tcp://localhost:61616");
// Set read timeout.
// Prepare request variables.
$correlation_id = rand();
$request_queue = '/queue/com.domain.service.request';
$response_queue = '/queue/com.domain.service.response';
$selector = "JMSCorrelationID='$correlation_id'";
$headers = array('correlation-id' => $correlation_id, 'reply-to' => $response_queue);
$message = '<RequestBody></RequestBody>';
// send a message to the queue.
$con->send($request_queue, $message, $headers);
// subscribe to the queue
$con->subscribe($response_queue, array('selector' => $selector, 'ack' => 'auto'));
// receive a message from the queue
$msg = $con->readFrame();
// do what you want with the message
if ( $msg != null) {
    echo "Received message with body\n";
    // mark the message as received in the queue
} else {
    echo "Failed to receive a message\n";
}
Other findings:
Sending messages from one script (say sender.php) and receiving them with another script (say receiver.php) works fine.
More than 1000 messages can be sent to the same request queue (and are eventually processed and placed in the response queue), so it doesn't look like a memory issue.
Funny enough, while waiting for timeout, if I browse the queue on admin UI, I get the response.
By default, the stomp broker that I use set the prefetch size to 1.
Without knowing more, my guess is that you have multiple consumers and one is hogging the messages in its prefetch buffer; the default size is 1000, I think. To debug situations like this, it's usually a good idea to look at the web console, or connect with jconsole and inspect the Queue MBean to see the stats for in-flight messages, number of consumers, etc.
Answering my own question.
The problem I'm facing is exactly what is described in http://trenaman.blogspot.co.uk/2009/01/message-selectors-and-activemq.html and the solution is to increase the maxPageSize as specified in ActiveMQ and maxPageSize
Note that 200 is not an arbitrary number: it is the default value of maxPageSize.