MySQL SELECT with pause - PHP

We are building a PHP emailer which works perfectly.
Selecting all the users from the database and sending them emails works fine.
But since we have a huge number of emails to send, we would like to send them in batches of 1000, pausing between batches, so we don't overload the server.
Example:
SELECT: 1000;
PAUSE MYSQL
SELECT ANOTHER 1000;
PAUSE MYSQL
ETC.
I read about START TRANSACTION, COMMIT and ROLLBACK, and I think I implemented them correctly.
Can someone help me include a pause of 100 seconds before rolling back the transaction?
I don't know what to do.
What I've got so far:
$max = 1000;
$send = 0;
$rollback = false;
mysql_query('START TRANSACTION;');
$query = mysql_query("SELECT DISTINCT mail_id, customers_email_address FROM newsletters ORDER BY mail_id ASC");
while ($result = mysql_fetch_array($query)) {
    if ($rollback == true) {
        $rollback = false;
        mysql_query("ROLLBACK;");
    }
    // [------script to send the emails-----]
    $send++;
    if ($max == $send) {
        mysql_query("COMMIT;");
        $rollback = true;
    }
}
Cheers Jay

There is no need for transactions here at all - you're not updating anything. In fact, the overhead of transactions is entirely pointless here, so I'd advise you to take that out.
You could simply do the following (in outline; the code is easy to write):
Select the first 1000 rows from the database: SELECT ... LIMIT 0, 1000
Increment your offset by 1000
Select the next 1000 rows: SELECT ... LIMIT 1000, 1000
Rinse and repeat, until you get less than 1000 rows back from your query.
Please note that in order for that method to work, you'll want to ORDER BY the primary key in ASC order or something, to be sure you don't get the same row twice.
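A minimal sketch of that loop, reusing the table and column names from the question and assuming the mysqli extension (with mysqlnd for get_result()); the pause between batches is a plain sleep():

<?php
// Paginated send: fetch 1000 rows at a time, ordered by the primary key
// so no row is returned twice. Table/column names are taken from the
// question; adjust to your schema.
$mysqli = new mysqli('localhost', 'user', 'pass', 'db');
$batchSize = 1000;
$offset = 0;

do {
    $stmt = $mysqli->prepare(
        "SELECT mail_id, customers_email_address
         FROM newsletters
         ORDER BY mail_id ASC
         LIMIT ?, ?"
    );
    $stmt->bind_param('ii', $offset, $batchSize);
    $stmt->execute();
    $result = $stmt->get_result();

    $rows = $result->num_rows;
    while ($row = $result->fetch_assoc()) {
        // send_email($row['customers_email_address']); // your sending code
    }
    $stmt->close();

    $offset += $batchSize;
    if ($rows === $batchSize) {
        sleep(100); // pause between full batches so the server can breathe
    }
} while ($rows === $batchSize); // fewer than 1000 rows means we're done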

All you need is to schedule your sender script with cron, for example, and send some fixed amount of emails per run (in SQL, use LIMIT).
It will then send N emails every M minutes and the server will be happy ;)
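A sketch of that cron-driven approach; the sent flag column is a hypothetical addition so already-processed rows are skipped on the next run:

<?php
// Run from cron, e.g.: */5 * * * * php /path/to/send_batch.php
// Sends up to 500 emails per run. Assumes a hypothetical `sent` column
// on the newsletters table from the question.
$mysqli = new mysqli('localhost', 'user', 'pass', 'db');
$result = $mysqli->query(
    "SELECT mail_id, customers_email_address
     FROM newsletters
     WHERE sent = 0
     ORDER BY mail_id ASC
     LIMIT 500"
);
while ($row = $result->fetch_assoc()) {
    // send_email($row['customers_email_address']); // your sending code
    $mysqli->query("UPDATE newsletters SET sent = 1
                    WHERE mail_id = " . (int)$row['mail_id']);
}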

A few options:
1) You can implement a cron job.
2) There is a small open-source PHP application, PHPList, which can be integrated in a few minutes. (I already use this one.)
3) You can use PHP's sleep function (I am not sure about this one; see the sketch below).
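A sketch of option 3, pausing inside the send loop ($query is the result handle from the question's code; note that a long-running script may hit max_execution_time unless run from the CLI):

// Option 3 sketch: sleep() after every $max sends to throttle output.
$send = 0;
$max = 1000;
while ($row = mysql_fetch_array($query)) {
    // [------script to send the emails-----]
    $send++;
    if ($send % $max === 0) {
        sleep(100); // pause 100 seconds after every 1000 emails
    }
}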

Related

What is the best way to send prioritized mails in an email queue?

Millions of mails are inserted into a MySQL table "EmailQueue" with the following fields:
id - BigInt(20)
email_address - Varchar(300)
message - text
status - enum('Pending','Prioritized','Processing','Processed')
created_datetime - datetime
sent_datetime - datetime
Generally the rows will be inserted with 'Pending' status, but some high-priority mails such as forgot/reset-password mails will be inserted with 'Prioritized' in the status column.
The cron job will run every hour and send the mails in a loop, in batches of 20000 mails per loop, until it finishes sending all the emails. Now I want to send the prioritized mails first, and they can be added to the email queue even while the cron job is running.
What is the best approach to achieve this? I'm not sure if Stack Overflow is the place to ask this question, but I'm not sure about a better place. Thanks for any help in advance.
If we ignore the "add while the cron is running" for a sec, this will select 'Prioritized' first:
ORDER BY FIELD(status, "Prioritized"), id ASC
This will sort all rows where status = 'Prioritized' first, then order by id (see the MySQL FIELD() function for more examples).
Adding them while the cron is running is more difficult; this becomes a logic challenge. If you do SELECT * FROM emails ORDER BY FIELD(status, "Prioritized"), id ASC, you select the data as it exists at the time of the query. If items are added after you've run the query, they won't be in the returned set.
To get what you want, you'll need to break your code into smaller selections:
$continueProcess = true; // must start as true or the loop never runs
$current = 0;
$itemsPerBatch = 25;
while ($continueProcess) {
    $query = "SELECT * FROM emails ORDER BY FIELD(status, 'Prioritized'), id ASC
              LIMIT $current, $itemsPerBatch";
    $result = yourQueryMethod($query);
    if ($result->num_rows === 0) {
        $continueProcess = false;
        break;
    } else {
        // ... send this batch of emails here ...
        $current += $itemsPerBatch; // next round, we skip another $itemsPerBatch rows
    }
}
It's a matter of using the right query. For example, if you know the job will keep running until the queue is exhausted:
SELECT *
FROM EmailQueue
WHERE status IN ('Pending','Prioritized')
ORDER BY status DESC, created_datetime ASC
LIMIT 0,100;
The result is that every time you run the query, you get the prioritized emails first, then the pending emails, both ordered by the oldest first.
You can run this batch any number of times you need until the queue is depleted or a max of 200 times every hour to match the 20,000 hourly limit.
I'm assuming that when you start processing you change the status to Processing and when finished you change the status to Processed. That's why you always start with 0 and not an incremental number.
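A sketch of that worker loop (PDO is assumed; the columns are the ones from the question, and the status marking is what makes LIMIT 0,100 safe to repeat):

<?php
// Worker sketch: claim a batch, send it, mark it Processed. Because
// claimed rows leave the Pending/Prioritized set, every iteration can
// use LIMIT 100 starting at offset 0.
$pdo = new PDO('mysql:host=localhost;dbname=db', 'user', 'pass');

for ($i = 0; $i < 200; $i++) { // 200 x 100 = 20,000 mails per hourly run
    $rows = $pdo->query(
        "SELECT id, email_address, message
         FROM EmailQueue
         WHERE status IN ('Pending','Prioritized')
         ORDER BY status DESC, created_datetime ASC
         LIMIT 0, 100"
    )->fetchAll(PDO::FETCH_ASSOC);

    if (!$rows) break; // queue depleted

    foreach ($rows as $row) {
        $pdo->prepare("UPDATE EmailQueue SET status = 'Processing' WHERE id = ?")
            ->execute([$row['id']]);
        // mail($row['email_address'], 'Subject', $row['message']); // your sender
        $pdo->prepare("UPDATE EmailQueue
                       SET status = 'Processed', sent_datetime = NOW()
                       WHERE id = ?")
            ->execute([$row['id']]);
    }
}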

SELECT+UPDATE to avoid returning the same result

I have a cron task running every x seconds on n servers. It will SELECT FROM table WHERE time_scheduled < CURRENT_TIME and then perform a lengthy task on this result set.
My problem now is: how do I avoid having two separate servers perform the same task at the same time?
The idea is to update time_scheduled with a set interval after selecting it. But if two servers happen to run the query at the same time, that will be too late, no?
All ideas are welcome. It doesn't have to be a strict MySQL solution.
Thanks!
I am guessing you have a single MySQL instance, and connections from your n servers to run this processing job. You're implementing a job queue here.
The table you mention needs to use the InnoDB storage engine (or one of the other transaction-friendly engines offered by Percona or MariaDB).
Do these items in your table need to be processed in batches? That is, are they somehow inter-related? Or is it possible for your server processes to handle them one-by-one? This is an important question, because you'll get better load balancing between your server processes if you can handle them individually or in small batches. Let's assume the small batches.
The idea is to prevent any server process from grabbing onto a row in your table if some other server process has that row. I've had to do this kind of thing a lot, and here is my suggestion; I know this works.
First, add an integer column to your table. Call it "working" or some such thing. Give it a default value of zero.
Second, assign a permanent id number to each server. The last part of the server's IP address (for example, if the server's IP address is 10.1.0.123, the id number is 123) is a good choice, because it's probably unique in your environment.
Then, when a server's grabbing work to do, use these two SQL queries.
UPDATE table
SET working = :this_server_id
WHERE working = 0
AND time_scheduled < CURRENT_TIME
ORDER BY time_scheduled
LIMIT 1
SELECT table_id, whatever, whatever
FROM table
WHERE working = :this_server_id
The first query will consistently grab a batch of rows to work on. If another server process comes in at the same time, it won't ever grab the same rows, because no process can grab rows unless working = 0. Notice that the LIMIT 1 will limit your batch size. You don't have to do this, but you can. I also threw in ORDER BY to process the rows first that have been waiting the longest. That's probably a useful way to do things.
The second query retrieves the information you need to do the work. Don't forget to retrieve the primary key values (I called them table_id) for the rows you're working on.
Then, your server process does whatever it needs to do.
When it's done, it needs to throw the row back into the queue for a later time. To do that, the server process needs to set the time_scheduled to whatever it needs to be, then to set working = 0. So, for example, you could run this query for each row you're processing.
UPDATE table
SET time_scheduled = CURRENT_TIME + INTERVAL 5 MINUTE,
working = 0
WHERE table_id = ?table_id_from_previous_query
That's it.
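Putting those pieces together in PHP, a sketch (PDO and the placeholder table/column names from above are assumed):

<?php
// Claim-then-select sketch. $serverId is this server's permanent id
// number, e.g. the last octet of its IP address.
$pdo = new PDO('mysql:host=dbhost;dbname=db', 'user', 'pass');
$serverId = 123;

// 1. Atomically claim the oldest due row. No other server can grab it,
//    because the claim only succeeds while working = 0.
$claim = $pdo->prepare(
    "UPDATE `table`
     SET working = :id
     WHERE working = 0 AND time_scheduled < CURRENT_TIME
     ORDER BY time_scheduled
     LIMIT 1"
);
$claim->execute([':id' => $serverId]);

// 2. Fetch the row(s) this server has claimed.
$rows = $pdo->prepare("SELECT table_id, whatever FROM `table` WHERE working = :id");
$rows->execute([':id' => $serverId]);

foreach ($rows->fetchAll(PDO::FETCH_ASSOC) as $row) {
    // ... do the lengthy task ...

    // 3. Reschedule the row and release it back into the queue.
    $pdo->prepare(
        "UPDATE `table`
         SET time_scheduled = CURRENT_TIME + INTERVAL 5 MINUTE, working = 0
         WHERE table_id = :tid"
    )->execute([':tid' => $row['table_id']]);
}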
Except for one thing. In the real world these queuing systems get fouled up sometimes. Server processes crash. Etc. Etc. See Murphy's Law. You need a monitoring query. That's easy in this system.
This query will give a list of all jobs that are more than five minutes overdue, along with the server that's supposed to be working on them.
SELECT working, COUNT(*) stale_jobs
FROM table
WHERE time_scheduled < CURRENT_TIME - INTERVAL 5 MINUTE
GROUP BY working
If this query comes up empty, all is well. If it comes up with lots of jobs with working set to zero, your servers aren't keeping up. If it comes up with jobs with working set to some server's id number, that server is taking a lunch break.
You can reset all the jobs assigned to the server that's gone to lunch with this query, if need be.
UPDATE table
SET working=0
WHERE working=?server_id_at_lunch
By the way, a compound index on (working, time_scheduled) will probably help this perform well.
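For instance (a sketch, using the placeholder table name from above):

ALTER TABLE `table` ADD INDEX idx_working_sched (working, time_scheduled);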

parallel cron jobs picking up the same SQL row

I've basically got a cron file which uses multi_curl to hit one file several times at once, thus running the requests in parallel.
My cron file looks like this (sends the parallel requests):
<?php
require "files/bootstrap.php";

$amount = array(
    "10", "11", "12", "13", "14"
);

$urls = array();
foreach ($amount as $cron_id) {
    $urls[] = Config::$site_url . "single_cron.php?cron_id=" . $cron_id;
}

$pg = new ParallelGet($urls);
?>
Then inside my single_cron.php I've got the following query
SELECT *
FROM accounts C JOIN proxies P
ON C.proxy_id = P.proxy_id
WHERE C.last_used < DATE_SUB(NOW(), INTERVAL 1 MINUTE)
AND C.status = 1
AND C.running = 0
AND P.proxy_status = 1
AND C.test_account = 0
ORDER BY uuid()
LIMIT 1
Even though I've got uuid() inside the query, they still appear to be picking up the same row somehow. What's the best way to prevent this? I've heard something about transactions.
The current framework I'm using is PHP, so any solution in PHP would work; I'm open to suggestions.
Check the SELECT ... FOR UPDATE command. This prevents other parallel queries from selecting the same row by blocking them until you commit. So your select should include some condition like C.last_used < DATE_SUB(NOW(), INTERVAL 1 MINUTE) (which you already have), and you should update the row after selecting it, setting last_used to the current time. Maybe you have a different mechanism to detect whether a row has been recently selected/processed; you can use that as well. The important thing is that SELECT ... FOR UPDATE places a lock on the row, so even if you run your queries in parallel, they will be serialized by the MySQL server.
This is the only way to be sure you don't have two queries selecting the same row - even if your ORDER BY uuid() worked correctly, you'd still select the same row in two parallel queries every now and then.
The correct way to do this with transactions is:
START TRANSACTION;

SELECT *
FROM accounts C JOIN proxies P
  ON C.proxy_id = P.proxy_id
WHERE C.last_used < DATE_SUB(NOW(), INTERVAL 1 MINUTE)
  AND C.status = 1
  AND C.running = 0
  AND P.proxy_status = 1
  AND C.test_account = 0
LIMIT 1
FOR UPDATE;
(assume you have a column 'ID' in your accounts table that identifies rows uniquely)
UPDATE accounts
set last_used=now(), .... whatever else ....
where id=<insert the id you selected here>;
COMMIT;
The query that reaches the server first will be executed, and the returned row locked. All the other queries will be blocked at that point. Now you update whatever you want to. After the commit, the other queries from other processes will be executed. They won't find the row you just changed, because the last_used < ... condition isn't true anymore. One of these queries will find a row, lock it, and the others will get blocked again, until the second process does the commit. This continues until everything is finished.
Instead of START TRANSACTION, you can set autocommit to 0 in your session as well. And don't forget this only works if you use InnoDB tables. Check the MySQL documentation on SELECT ... FOR UPDATE if you need more details.
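In PHP that flow looks roughly like this (a sketch; PDO is assumed, as is using running = 1 as the claim marker based on the question's schema):

<?php
// Lock-and-claim sketch with PDO. Requires InnoDB tables.
$pdo = new PDO('mysql:host=localhost;dbname=db', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->beginTransaction();
try {
    // FOR UPDATE locks the returned row until COMMIT, serializing
    // the parallel workers.
    $row = $pdo->query(
        "SELECT C.id
         FROM accounts C JOIN proxies P ON C.proxy_id = P.proxy_id
         WHERE C.last_used < DATE_SUB(NOW(), INTERVAL 1 MINUTE)
           AND C.status = 1 AND C.running = 0
           AND P.proxy_status = 1 AND C.test_account = 0
         LIMIT 1
         FOR UPDATE"
    )->fetch(PDO::FETCH_ASSOC);

    if ($row) {
        $pdo->prepare("UPDATE accounts SET last_used = NOW(), running = 1
                       WHERE id = ?")
            ->execute([$row['id']]);
    }
    $pdo->commit(); // releases the lock; the other workers proceed
} catch (Exception $e) {
    $pdo->rollBack();
    throw $e;
}

// ... work with the claimed account, then set running = 0 when done ...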

How to block a piece of code for multiple processes

I have a cron job that runs the same script every minute, so the script is executed several times concurrently.
It contains this part:
$query = mysql_query("select distinct `task_id` from tasks_pending where `checked`='0' and `taken`='0' limit 50");
Then the obtained values are updated to set taken = 1.
Since several processes execute at the same time, the query returns the same data to different processes. Is it possible to somehow lock this part so that only one process can run it at a time?
Sorry for bad English.
Use SELECT ... FOR UPDATE and it will block other processes from selecting the same rows before they are updated.
You might want to lock the table before doing anything with it, and unlock it afterwards using the [UN]LOCK TABLES statements.
Otherwise, you could use a SELECT FOR UPDATE query within a transaction scope.
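A sketch of the transactional variant for this exact query (InnoDB assumed; DISTINCT dropped for the locking read):

START TRANSACTION;

-- FOR UPDATE locks the matched rows; a parallel run blocks here
-- until this transaction commits.
SELECT task_id
FROM tasks_pending
WHERE checked = '0' AND taken = '0'
LIMIT 50
FOR UPDATE;

-- Mark the claimed tasks so no other process picks them up again.
UPDATE tasks_pending
SET taken = '1'
WHERE task_id IN ( /* the ids returned above */ );

COMMIT;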

php banner queue fifo

I'm just starting to put ads on my website and I would like to be able to give 1000 views to ad_a, 2000 to ad_b and, let's say, 10000 to ad_c.
If only one page were viewed at a time it would be easy to update a DB and work out how many views are left for each ad, but several pages can be accessed at the same time and this makes things more complicated.
I was thinking of writing a queue to manage it, so requests would be made against the database one by one. I'm not sure if this is the best idea; I've never done this kind of coding and I'm looking for a line of conduct, logical steps, and what kind of tables to create in the DB if anything special is needed.
Many thanks for your help!
You could use Memcached to store the current count of views. Memcached is fast and light, so you will not have performance problems. Something more complicated would be to have a "queue", as you say, with some parallel process updating it and deciding which banner to show, so you could mix them up.
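A sketch of the Memcached approach (the pecl/memcached extension is assumed, and the key names are made up for the example):

<?php
// Keep a remaining-view counter per ad and decrement it atomically on
// every page view, so concurrent requests can't oversell an ad.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

// Seed the counters once (e.g. from an admin script).
$mc->add('views_remaining_ad_a', 1000);
$mc->add('views_remaining_ad_b', 2000);
$mc->add('views_remaining_ad_c', 10000);

function pickAd(Memcached $mc) {
    foreach (array('ad_a', 'ad_b', 'ad_c') as $ad) {
        // decrement() is atomic and stops at 0 rather than going negative.
        $left = $mc->decrement('views_remaining_' . $ad);
        if ($left !== false && $left > 0) {
            return $ad;
        }
    }
    return null; // all quotas exhausted
}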
The fact that one page may be viewed many times at once doesn't have to be a problem; that's what LOCK TABLES is for.
LOCK TABLES ads WRITE;
SELECT ad_id FROM ads WHERE ad_views_remaining > 0 LIMIT 1;
UPDATE ads SET ad_views_remaining = ad_views_remaining - 1 WHERE ad_id = THAT_AD_ID_YOU_SELECTED_BEFORE;
UNLOCK TABLES;
This way no one can read the table until it's updated.
(this example is for MySQL, I'm sure that most other RDBMSs support locks as well)
What about rand?
// 1 + 2 + 10 = 13 slots, matching the 1000 : 2000 : 10000 ratio
$r = rand(1, 13);
if ($r == 1)
    echo 'ad_a'; // 1 slot in 13
if ($r > 1 and $r < 4)
    echo 'ad_b'; // 2 slots in 13
if ($r > 3)
    echo 'ad_c'; // 10 slots in 13
