Obtain a unique sequence order number concurrently from PostgreSQL - php

We are designing an order management system. The order id is a bigint in PostgreSQL, and its digit layout is as follows:
Take 2015072201000010001 as an example order id: the first eight digits are the date (20150722 here), the next seven digits are the region code (0100001 here), and the last four digits are the sequence number within that region and date.
So every time a new order is created, the PHP application layer queries PostgreSQL with an SQL statement like the following:
select id from orders where id between 2015072201000010000 and 2015072201000019999 order by id desc limit 1 offset 0
then increments the id and inserts the new order into the PostgreSQL database.
This works if only one order is being generated at a time. But with hundreds of concurrent order generation requests, there is a good chance that order ids will collide, given PostgreSQL's read/write locking behaviour.
Say there are two order requests A and B. A reads the latest order id from the database, then B reads the latest order id too, then A writes to the database; finally B's write to the database fails because the order id primary key collides.
Any thoughts on how to make this order generation work correctly under concurrency?

In the case of many concurrent operations your only option is to work with sequences. In this scenario you would need to create a sequence for every date and region. That sounds like a lot of work, but most of it can be automated.
Creating the sequences
You can name your sequences after the date and the region. So do something like:
CREATE SEQUENCE seq_201507220100001;
You should create a sequence for every combination of day and region. Do this in a function to avoid repetition. Run this function once for every day. You can do this ahead of time or - even better - do this in a scheduled job on a daily basis to create tomorrow's sequences. Assuming you do not need to back-date orders to previous days, you can drop yesterday's sequences in the same function.
CREATE FUNCTION make_and_drop_sequences() RETURNS void AS $$
DECLARE
    region_code text;
    tomorrow    text;
    yesterday   text;
BEGIN
    tomorrow  := to_char(CURRENT_DATE + 1, 'YYYYMMDD');
    yesterday := to_char(CURRENT_DATE - 1, 'YYYYMMDD');
    -- loop over all known regions (FOR, not FOREACH, iterates over query results)
    FOR region_code IN
        SELECT DISTINCT region FROM table_with_regions
    LOOP
        EXECUTE format('CREATE SEQUENCE %I', 'seq_' || tomorrow  || region_code);
        EXECUTE format('DROP SEQUENCE %I',   'seq_' || yesterday || region_code);
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;
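If you would rather schedule that inside the database, and assuming the pg_cron extension is available (an assumption; any external scheduler calling the function once a day via psql works just as well), the daily run could be registered like this:
-- run shortly after midnight so tomorrow's sequences always exist
SELECT cron.schedule('5 0 * * *', 'SELECT make_and_drop_sequences()');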
Using the sequences
In your PHP code you obviously know the date and the region you need to enter a new order id for. Make another function that generates a new value from the right sequence on the basis of the date and the region:
CREATE FUNCTION new_date_region_id(region text) RETURNS bigint AS $$
DECLARE
    dt_reg text;
    new_id bigint;
BEGIN
    dt_reg := to_char(CURRENT_DATE, 'YYYYMMDD') || region;
    -- date+region digits shifted left by four places, plus the next sequence value
    SELECT dt_reg::bigint * 10000 + nextval('seq_' || dt_reg) INTO new_id;
    RETURN new_id;
END;
$$ LANGUAGE plpgsql STRICT;
In PHP you then call:
SELECT new_date_region_id('0100001');
which will give the next available id for the specified region for today.
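That value can then be used directly when inserting the order. A minimal sketch, assuming nothing about the other columns of the orders table:
INSERT INTO orders (id) -- plus whatever other columns the order needs
VALUES (new_date_region_id('0100001'));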

The usual way to avoid locking contention on ids in Postgres is through sequences.
You could use a PostgreSQL sequence for each region. Something like
create sequence seq_0100001;
then you can get a number from that using:
select nextval('seq_'||regioncode) % 10000 as order_seq
That does mean the order numbers will not reset to 0001 each day, but you do have the same 0000 -> 9999 range for order numbers. It will wrap around.
So you may end up with:
2015072201000010001 -> 2015072201000017500
2015072301000017501 -> 2015072301000019983
2015072401000019984 -> 2015072401000010293
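A sketch of composing the full 19-digit id from that wrapped per-region sequence (the region code is hard-coded here purely for illustration):
select (to_char(CURRENT_DATE, 'YYYYMMDD') || '0100001')::bigint * 10000
       + nextval('seq_0100001') % 10000 as order_id;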
Alternatively you could just generate a sequence for each day/region combination, but you'd need to be on top of dropping the previous day's sequences at the start of the next day.

Try the UUIDv1 type, which is a combination of a timestamp and a MAC address. You can have it auto-generated on the server side if the order of inserts is important for you. Otherwise, the IDs can be generated by any of your clients before inserting (you might need their clocks synchronized). Just be aware that with UUIDv1 you can disclose the MAC address of the host where the UUID was generated. In this case, you may want to spoof the MAC address.
For your case, you can do something like
CREATE TABLE orders (
id uuid PRIMARY KEY DEFAULT uuid_generate_v1(),
created_at timestamp NOT NULL DEFAULT now(),
region_code text NOT NULL REFERENCES...
...
);
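Note that uuid_generate_v1() comes from the uuid-ossp extension (see the link below), so it has to be enabled once per database:
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";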
Read more at http://www.postgresql.org/docs/9.4/static/uuid-ossp.html

Related

Select query takes too long

These 2 queries take too long to produce a result (sometimes 1 minute, or they even end with an error) and put a really heavy load on the server:
("SELECT SUM(`rate`) AS `today_earned` FROM `".PREFIX."traffic_stats` WHERE `userid` = ?i AND from_unixtime(created) > CURRENT_DATE ORDER BY created DESC", $user->data->userid)
("SELECT COUNT(`userid`) AS `total_clicks` FROM `".PREFIX."traffic_stats` WHERE `userid` = ?i", $user->data->userid)
The table has about 4 million rows.
This is the table structure:
I have one index on traffic_id:
If you select anything from the traffic_stats table it takes forever; however, inserting into this table is fine.
Is it possible to reduce the time spent on executing this query? I use PDO and I am new to all this.
ORDER BY will take a lot of time, and since you only need aggregate data (adding numbers and counting numbers are commutative), the ORDER BY does a lot of useless sorting, costing you time and server power.
You will need to make sure that your indexing is right, you will probably need an index for user_id and for (user_id, created).
Is user_id numeric? If not, then you might consider converting it into numeric type, int for example.
These points improve your query and structure. But let's improve the concept as well. Are insertions and modifications very frequent? Do you absolutely need real-time data, or can you do with quasi-realtime data as well?
If insertions/modifications are not very frequent, or you can do with older data, or the problem is causing huge trouble, then you could periodically run a cron job which calculates these values and caches them. The application would read them from the cache.
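As a sketch of what such a cached-values table and its refresh could look like (table and column names are assumed from the queries in the question, minus the prefix):
CREATE TABLE IF NOT EXISTS traffic_stats_cache (
    userid       INT PRIMARY KEY,
    today_earned DECIMAL(12,2) NOT NULL DEFAULT 0,
    total_clicks INT NOT NULL DEFAULT 0,
    refreshed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- run this from the cron job; the application then reads traffic_stats_cache
REPLACE INTO traffic_stats_cache (userid, today_earned, total_clicks)
SELECT userid,
       SUM(CASE WHEN created > UNIX_TIMESTAMP(CURRENT_DATE) THEN rate ELSE 0 END),
       COUNT(*)
FROM traffic_stats
GROUP BY userid;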
I'm not sure why you accepted an answer, when you really didn't get to the heart of your problem.
I also want to clarify that this is a mysql question, and the fact that you are using PDO or PHP for that matter is not important.
People advised you to utilize EXPLAIN. I would go one further and tell you that you need to use EXPLAIN EXTENDED possibly with the format=json option to get a full picture of what is going on. Looking at your screen shot of the explain, what should jump out at you is that the query looked at over 1m rows to get an answer. This is why your queries are taking so long!
At the end of the day, if you have properly indexed your tables, your goal should be in a large table like this, to have number of rows examined be fairly close to the final result set.
So let's look at the 2nd query, which is quite simple:
("SELECT COUNT(`userid`) AS `total_clicks` FROM `".PREFIX."traffic_stats` WHERE `userid` = ?i", $user->data->userid)
In this case the only thing that is really important is that you have an index on traffic_stats.userid.
I would recommend that, if you are uncertain at this point, you drop all indexes other than the original primary key (traffic_id) index, and start with only an index on the userid column. Run your query. What is the result, and how long does it take? Look at the EXPLAIN EXTENDED. Given the simplicity of the query, you should see that only the index is being used and the rows should match the result.
Now to your first query:
("SELECT SUM(`rate`) AS `today_earned` FROM `".PREFIX."traffic_stats` WHERE `userid` = ?i AND from_unixtime(created) > CURRENT_DATE ORDER BY created DESC", $user->data->userid)
Looking at the WHERE clause there are these criteria:
userid =
from_unixtime(created) > CURRENT_DATE
You already have an index on userid. Despite the advice given previously, it is not necessarily correct to have an index on userid, created, and in your case it is of no value whatsoever.
The reason for this is that you are utilizing a mysql function from_unixtime(created) to transform the raw value of the created column.
Whenever you do this, an index can't be used. You would not have any concerns in doing a comparison with the CURRENT_DATE if you were using the native TIMESTAMP type but in this case, to handle the mismatch, you simply need to convert CURRENT_DATE rather than the created column.
You can do this by passing CURRENT_DATE as a parameter to UNIX_TIMESTAMP.
mysql> select UNIX_TIMESTAMP(), UNIX_TIMESTAMP(CURRENT_DATE);
+------------------+------------------------------+
| UNIX_TIMESTAMP() | UNIX_TIMESTAMP(CURRENT_DATE) |
+------------------+------------------------------+
|       1490059767 |                   1490054400 |
+------------------+------------------------------+
1 row in set (0.00 sec)
As you can see from this quick example, UNIX_TIMESTAMP by itself is going to be the current time, but CURRENT_DATE is essentially the start of day, which is apparently what you are looking for.
I'm willing to bet that the number of rows for the current date is going to be far smaller than the total rows for a user over the history of the system, which is why you would not want an index on (user, created) as previously advised in the accepted answer. You might benefit from an index on (created, userid).
My advice would be to start with an individual index on each of the columns separately.
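A sketch of those individual indexes (the table name is taken from the question, minus the prefix):
ALTER TABLE traffic_stats ADD INDEX idx_userid (userid);
ALTER TABLE traffic_stats ADD INDEX idx_created (created);
With an index on created available, the first query can then be rewritten so the comparison is done on the raw column and the index can actually be used: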
("SELECT SUM(`rate`) AS `today_earned` FROM `".PREFIX."traffic_stats` WHERE `userid` = ?i AND created > UNIX_TIMESTAMP(CURRENT_DATE)", $user->data->userid)
And with your re-written query, again assuming that the result set is relatively small, you should see a clean EXPLAIN with rows matching your final result set.
As for whether or not you should apply an ORDER BY, this shouldn't be something you eliminate for performance reasons, but rather because it isn't relevant to your desired result. If you need or want the results ordered by user, then leave it. Unless you are producing a large result set, it shouldn't be a major problem.
In the case of that particular query, since you are doing a SUM(), there is no value in ORDERING the data, because you are only going to get one row back, so in that case I agree with Lajos, but there are many times when you might be utilizing a GROUP BY, and in that case, you might want the final results ordered.

How to echo random rows from database?

I have a database table with about 160 million rows in it.
The table has two columns: id and listing.
I simply need to use PHP to display 1000 random rows from the listing column and put them into <span> tags. Like this:
<span>Row 1</span>
<span>Row 2</span>
<span>Row 3</span>
I've been trying to do it with ORDER BY RAND() but that takes so long to load on such a large database and I haven't been able to find any other solutions.
I'm hoping that there is a fast/easy way to do this. I can't imagine that it'd be impossible to simply echo 1000 random rows... Thanks!
Two solutions are presented here. Both of these proposed solutions are mysql-only and can be consumed from any programming language. PHP would be wildly too slow to do this itself, but it could be the consumer of it.
Faster Solution: I can bring 1000 random rows from a table of 19 million rows in about 2 tenths of a second with more advanced programming techniques.
Slower Solution: It takes about 15 seconds with non-power programming techniques.
By the way, both use the data generation seen HERE that I wrote, so that is my little schema. I start from that and continue with TWO more of the self-inserts shown over there until I have 19M rows, so I am not going to show that again here. To get those 19M rows, go see that answer and do 2 more of those inserts.
Slower version first
First, the slower method.
select id,thing from ratings order by rand() limit 1000;
That returns 1000 rows in 15 seconds.
For anyone new to mysql, don't even read the following.
Faster solution
This is a little more complicated to describe. The gist of it is that you pre-compute your random numbers and generate an in clause ending of random numbers, separated by commas, and wrapped with a pair of parentheses.
It will look like (1,2,3,4) but it will have 1000 numbers in it.
And you store them, and use them once. Like a one time pad for cryptography. Ok, not a great analogy, but you get the point I hope.
Think of it as an ending for an in clause, and stored in a TEXT column (like a blob).
Why in the world would one want to do this? Because RNGs (random number generators) are prohibitively slow. But a few machines generating them can crank out thousands relatively quickly. By the way (and you will see this in the structure of my so-called appendices), I capture how long it takes to generate one row: about 1 second with mysql. But C#, PHP, Java, anything can put that together. The point is not how you put it together, rather that you have it when you want it.
The long and short of this strategy is: fetch a row whose random list has not been used yet, mark it as used, and issue a call such as
select id,thing from ratings where id in (a,b,c,d,e, ... )
and the in clause has 1000 numbers in it, the results are available in less than half a second. This effectively employs the mysql CBO (cost based optimizer), which treats it like a join on a PK index.
I leave this in summary form, because it is a bit complicated in practice, but it potentially includes the following pieces:
a table holding the precomputed random numbers (Appendix A)
a mysql create event strategy (Appendix B)
a stored procedure that employs a Prepared Statement (Appendix C)
a mysql-only stored proc to demonstrate RNG in clause for kicks (Appendix D)
Appendix A
A table holding the precomputed random numbers
create table randomsToUse
( -- create a table of 1000 random numbers to use
-- format will be like a long "(a,b,c,d,e, ...)" string
-- pre-computed random numbers, fetched upon needed for use
id int auto_increment primary key,
used int not null, -- 0 = not used yet, 1= used
dtStartCreate datetime not null, -- next two lines to eyeball time spent generating this row
dtEndCreate datetime not null,
dtUsed datetime null, -- when was it used
txtInString text not null -- here is your in clause ending like (a,b,c,d,e, ... )
-- this may only have about 5000 rows and garbage cleaned
-- so maybe choose one or two more indexes, such as composites
);
Appendix B
In the interest of not turning this into a book, see my answer HERE for a mechanism for running a recurring mysql Event. It will drive the maintenance of the table seen in Appendix A using techniques seen in Appendix D and other thoughts you want to dream up. Such as re-use of rows, archiving, deleting, whatever.
Appendix C
stored procedure to simply get me 1000 random rows.
DROP PROCEDURE if exists showARandomChunk;
DELIMITER $$
CREATE PROCEDURE showARandomChunk()
BEGIN
    DECLARE i int;
    DECLARE txtInClause text;

    -- grab the oldest not-yet-used row of pre-computed random numbers
    SELECT id, txtInString INTO i, txtInClause
    FROM randomsToUse WHERE used = 0 ORDER BY id LIMIT 1;
    -- select txtInClause as sOut; -- used for debugging

    -- mark the row as used, per the strategy described above
    UPDATE randomsToUse SET used = 1, dtUsed = now() WHERE id = i;

    -- running "select * from ratings order by rand() limit 1000" takes about
    -- 19.9 seconds on my Dell laptop with 19M rows; the Prepared Statement
    -- below takes about 2 tenths of a second for 1000 rows
    SET @s1 = concat("select * from ratings where id in ", txtInClause);
    PREPARE stmt1 FROM @s1;
    EXECUTE stmt1; -- execute the puppy and give me 1000 rows
    DEALLOCATE PREPARE stmt1;
END
$$
DELIMITER ;
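Fetching a chunk is then simply (assuming an unused row already exists in randomsToUse):
CALL showARandomChunk();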
Appendix D
This can be intertwined with the Appendix B concept, however you want to do it. But it leaves you with something to see how mysql could do it all by itself on the RNG side of things. By the way, for parameters 1 and 2 being 1000 and 19M respectively, it takes 800 ms on my machine.
This routine could be written in any language as mentioned in the beginning.
drop procedure if exists createARandomInString;
DELIMITER $$
create procedure createARandomInString
( nHowMany int, -- how many numbers to you want
nMaxNum int -- max of any one number
)
BEGIN
DECLARE dtBegin datetime;
DECLARE dtEnd datetime;
DECLARE i int;
DECLARE txtInClause text;
select now() into dtBegin;
set i=1;
set txtInClause="(";
WHILE i<nHowMany DO
set txtInClause=concat(txtInClause,floor(rand()*nMaxNum)+1,", "); -- extra space good due to viewing in text editor
set i=i+1;
END WHILE;
set txtInClause=concat(txtInClause,floor(rand()*nMaxNum)+1,")");
-- select txtInClause as myOutput; -- used for debugging
select now() into dtEnd;
-- insert a row, that has not been used yet
insert randomsToUse(used,dtStartCreate,dtEndCreate,dtUsed,txtInString) values
(0,dtBegin,dtEnd,null,txtInClause);
END
$$
DELIMITER ;
How to call the above stored proc:
call createARandomInString(1000,18000000);
That generates and saves 1 row, of 1000 numbers wrapped as described above. Big numbers, 1 to 18M
As a quick illustration, if one were to modify the stored proc, un-rem the line near the bottom that says "used for debugging" so it is the last line that runs, and then run this:
call createARandomInString(4,18000000);
... to generate 4 random numbers up to 18M, the results might look like
+-------------------------------------+
| myOutput |
+-------------------------------------+
| (2857561,5076608,16810360,14821977) |
+-------------------------------------+
Appendix E
Reality check. These are somewhat advanced techniques and I can't tutor anyone on them. But I wanted to share them anyway. But I can't teach it. Over and out.
ORDER BY RAND() is a mysql function that works fine with small databases, but if you run it on anything larger than 10k rows you should build the functionality inside your program instead of using mysql's premade functions, or organise your data in a special manner.
My suggestion: keep your mysql data indexed by an auto increment id, or add another incremental and unique column.
Then build a select function:
<?php
//get total number of rows
$result = mysql_query('SELECT `id` FROM `table_name`', $link);
$num_rows = mysql_num_rows($result);

//pick 1000 random ids (this assumes the ids run from 1 to $num_rows without gaps)
$randomlySelected = [];
for ($a = 0; $a < 1000; $a++) {
    $randomlySelected[$a] = rand(1, $num_rows);
}

//then select data by random ids
$where = "";
$control = 0;
foreach ($randomlySelected as $selectedID) {
    if ($control == 0) {
        $where .= "`id` = '" . $selectedID . "'";
    } else {
        $where .= " OR `id` = '" . $selectedID . "'";
    }
    $control++;
}
$final_query = "SELECT * FROM `table_name` WHERE " . $where . ";";
$final_results = mysql_query($final_query);
?>
If some of the incremental IDs in that 160 million row database are missing, you can easily add a function (a while loop, probably) to pick additional random IDs if the array of randomly selected ids ends up with fewer than required.
Let me know if you need some further help.
If your RAND() function is too slow, and you only need quasi-random records (for a test sample) and not truly random ones, you can always make a fast, effectively-random group by sorting by middle characters (using SUBSTRING) in indexed fields. For example, sorting by the 7th digit of a phone number...in descending order...and then by the 6th digit...in ascending order...that's already quasi-random. You could do the same with character columns: the 6th character in a person's name is going to be meaningless/random, etc.
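A minimal sketch of that idea (phone is a hypothetical indexed column; listing comes from the question):
SELECT listing
FROM table_name
ORDER BY SUBSTRING(phone, 7, 1) DESC,
         SUBSTRING(phone, 6, 1) ASC
LIMIT 1000;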
You want to use the rand function in php. The signature is
rand(min, max);
so, get the number of rows in your table into a $var and set that as your max.
A way to do this with SQL is
SELECT COUNT(*) FROM table_name;
then simply run a loop to generate 1000 rands with the above function and use them to get specific rows.
If the IDs are not sequential but they are close together, you can simply test each random ID to see if there is a hit. If they are far apart, you could pull the entire ID space into PHP and then randomly sample from that distribution via something like
$random = rand(0, count($rows)-1);
for an array of IDs in $rows.
Use MySQL's RAND() in your SELECT statement. Your query will look like
SELECT * FROM `table` ORDER BY RAND() LIMIT 0,1;

unused number mysql

How can I get all of the records in a table that are out of
sequence, so I know which account numbers I can reuse? I have a range
of account numbers from 50100 to 70100. I need to know which account
numbers are not stored in the table (not currently used) so I can use them.
For instance say I have the following data in table:
Account Name
------ --------
50100 Test1
50105 Test2
50106 Test4
..
..
..
I should see the results:
50101
50102
50103
50104
because 50101-50104 are available account numbers, since they are not currently in the table.
copied from http://bytes.com/topic/sql-server/answers/78426-get-all-unused-numbers-range
With respect to MYSQL and PHP.
EDITED
My range is 10000000-99999999.
At present I am using this MySQL query:
'SELECT FLOOR(10000000 + RAND() * 89999999) AS random_number FROM contacts WHERE "random_number" NOT IN (SELECT uid FROM contacts) LIMIT 1';
Thanks.
Solution 1:
Generate a table with all possible account numbers in it. Then run a query similar to this:
SELECT id FROM allIDs WHERE id NOT IN (SELECT id FROM accounts)
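One way to fill such a table, as a sketch assuming MySQL 8.0+ (allIDs and the 50100-70100 range come from the question; older versions would need a loop or a cross join instead):
-- allow a recursion depth large enough for the whole range
SET SESSION cte_max_recursion_depth = 100000;
INSERT INTO allIDs (id)
SELECT n FROM (
    WITH RECURSIVE nums AS (
        SELECT 50100 AS n
        UNION ALL
        SELECT n + 1 FROM nums WHERE n < 70100
    )
    SELECT n FROM nums
) AS seq;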
Solution 2:
Get the whole id column into an array in PHP or Java or so, then run a for-loop to check whether each number in the range is missing from the array:
$ids = (array with all ids from the table)
for ($i = 50100; $i <= 70100; $i++) {
    if (array_search($i, $ids) === false) { // not in use, so it is available
        $availableids[] = $i;
    }
}
One way would be to create another table, fill it with all allowable numbers, then write a simple query to find the ones in the new table that are not in the original table.
Sort the accounts in the server, and find jumps in PHP while reading in the results. Any jump in the sorted sequence is "free for use", because they are ordered. You can sort with something like SELECT AccountNumber FROM Accounts ORDER BY AccountNumber ASC;.
To improve efficiency, store the free account numbers in another table, and use numbers from this second table until no more remain. This avoids making too many full reads (as in the first paragraph), which may be expensive. While you are at it, you may want to add a hook in the part of the code which deletes accounts, so they are immediately included in this second table, making the first step unnecessary.
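A minimal sketch of that deletion hook as a MySQL trigger (the free_account_numbers table and the Account column are assumed, based on the example above):
CREATE TABLE IF NOT EXISTS free_account_numbers (account INT PRIMARY KEY);

DELIMITER $$
CREATE TRIGGER trg_accounts_after_delete AFTER DELETE ON accounts
FOR EACH ROW
BEGIN
    -- the deleted account number becomes immediately reusable
    INSERT IGNORE INTO free_account_numbers (account) VALUES (OLD.Account);
END$$
DELIMITER ;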

Trending SQL Query

So what I am trying to do is make a trending algorithm. I need help with the SQL code as I can't get it to work.
There are three aspects to the algorithm (I am completely open to ideas on a better trend algorithm):
1. Plays during 24h / Total plays of the song
2. Plays during 7d / Total plays of the song
3. Plays during 24h / The value of plays of the most played item over 24h (whatever item leads the play count over 24h)
Each aspect is to be worth 0.33, for a maximum value of 1.0 being possible.
The third aspect is necessary, as newly uploaded items would automatically take the top place unless there was a way to drop them down.
The table is called aud_plays and the columns are:
PlayID: Just an auto-incrementing ID for the table
AID: The id of the song
IP: ip address of the user listening
time: UNIX time code
I have tried a few SQL queries but I'm pretty stuck, unable to get this to work.
In your aud_songs table (the one the AID points to) add the following columns
Last24hrPlays INT -- use BIGINT if you plan on getting billion+
Last7dPlays INT
TotalPlays INT
In your aud_plays table create an AFTER INSERT trigger that will increment aud_songs.TotalPlays.
UPDATE aud_songs SET TotalPlays = TotalPlays + 1 WHERE id = INSERTED.aid
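Since the question looks like MySQL (UNIX time codes, PHP), a MySQL-flavoured sketch of that trigger might be (table and column names are assumed from the question and the columns added above):
DELIMITER $$
CREATE TRIGGER trg_aud_plays_after_insert AFTER INSERT ON aud_plays
FOR EACH ROW
BEGIN
    -- count every new play against the song's running total
    UPDATE aud_songs SET TotalPlays = TotalPlays + 1 WHERE id = NEW.aid;
END$$
DELIMITER ;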
Calculating your trending in real time for every request would be taxing on your server, so it's best to just run a job to update the data every ~5 minutes. So create a SQL Agent Job to run every X minutes that updates Last7dPlays and Last24hrPlays.
UPDATE aud_songs SET Last7dPlays = (SELECT COUNT(*) FROM aud_plays WHERE aud_plays.aid = aud_songs.id AND aud_plays.time BETWEEN GetDate()-7 AND GetDate()),
Last24hrPlays = (SELECT COUNT(*) FROM aud_plays WHERE aud_plays.aid = aud_songs.id AND aud_plays.time BETWEEN GetDate()-1 AND GetDate())
I would also recommend removing old records from aud_plays (possibly those older than 7 days, since you will have the TotalPlays trigger).
It should be easy to figure out how to calculate your 1 and 2 (from the question). Here's the SQL for 3.
SELECT cast(Last24hrPlays as float) / (SELECT MAX(Last24hrPlays) FROM aud_songs) FROM aud_songs WHERE aud_songs.id = #ID
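And, as a sketch, the three aspects combined into the 0.33-weighted score described in the question (NULLIF guards against division by zero for songs with no plays):
SELECT s.id,
       0.33 * s.Last24hrPlays / NULLIF(s.TotalPlays, 0)
     + 0.33 * s.Last7dPlays   / NULLIF(s.TotalPlays, 0)
     + 0.33 * s.Last24hrPlays / NULLIF((SELECT MAX(Last24hrPlays) FROM aud_songs), 0) AS trend_score
FROM aud_songs s
ORDER BY trend_score DESC;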
NOTE I made the T-SQL pretty generic and unoptimized to illustrate how the process works.

Is a cursor the only way to do this?

I'm in the process of churning through some raw data that I have. The data are in a MySQL database. The data lists, in a millisecond-by-millisecond format, which of a number of possible 'events' are currently happening. It has only a few columns:
id - unique identifier for the row
event - indicates which event is currently occurring
What I would like to do is get some basic information regarding these data. Specifically, I'd like to create a table that has:
The id that an event starts
The id that an event ends
A new id indexing the events and their occurrence, as well as a column detailing which event is currently happening.
I know that this would be easy to deal with using PHP, just using a simple loop through all the records, but I'm trying to push the boundaries of my MySQL knowledge for a bit here (it may be dangerous, I know!!).
So, my question is this: would a cursor be the best thing to use for this? I ask because events can occur multiple times, so doing something like grouping by the event type won't work - or will it? I'm just wondering if there is a clever way of dealing with this I have missed, without needing to go through each row sequentially.
Thanks!
To demonstrate what I commented earlier about, say you have the following table:
event_log
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
start DATETIME
event VARCHAR(255) # or whatever you want for datatype
Gathering this information is as simple as:
SELECT el.*,
       (SELECT el_j.start           # -
          FROM event_log el_j       # |
         WHERE el_j.id > el.id      # |- grab the next row based on the next ID
         ORDER BY el_j.id           # |
         LIMIT 1) AS `end`          # -
FROM event_log el
ORDER BY el.start;
