PHP & MySQL select from big data

My database has 10,000,000 records.
I want to select from the database, but the query is heavy.
The query I have tried:
SELECT * FROM `table` USE INDEX (id) JOIN `new` AS p1
USE INDEX (pid) ON table.id = p1.pid
WHERE p1.`date` > '2015-02-01' AND p1.`date` < '2016-02-01'

You need an index on columns new.date and table.id.
You probably don't need the USE INDEX hints.
I am assuming that there are not too many rows in the date range. If a large proportion of your rows are in that range, it will obviously take a long time.
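A minimal sketch of the suggested indexing, assuming pid and date are columns of new (the index names here are made up):
-- composite index: filter on date and join on pid from the same index
CREATE INDEX idx_new_date_pid ON `new` (`date`, `pid`);
-- table.id is usually the primary key already; if not, index it:
CREATE INDEX idx_table_id ON `table` (id);
-- the query itself, without the USE INDEX hints:
SELECT *
FROM `table` t
JOIN `new` AS p1 ON t.id = p1.pid
WHERE p1.`date` > '2015-02-01' AND p1.`date` < '2016-02-01';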

Use "LIKE" instead of "=".

Related

count() takes lots of time when using a WHERE clause in MySQL

The table has approximately 100,000 records (tuples). Without a WHERE clause the query takes only a few milliseconds, whereas it takes 4-5 secs with the WHERE clause.
SELECT COUNT(DISTINCT id) FROM tablename WHERE shippable = '1'
I also tried this one, but it takes more time than the previous one.
SELECT count(rowsss) FROM (SELECT count(*) as rowsss FROM tablename WHERE shippable = '1' GROUP BY id) as T
This is the output when I use the EXPLAIN keyword before the MySQL query: [screenshot of EXPLAIN output]
If you need a filter you could use an index on shippable, e.g.:
create index shippable_idx on tablename (shippable);
This way the scan is limited to the values that match, avoiding a scan of the entire table.
Based on the fact that you also need the column id, you could alternatively try a composite index:
create index shippable_idx on tablename (shippable, id);
The SQL optimizer should retrieve the needed info directly from the index.
In this case the composite index (with id, which is redundant since the WHERE clause does not need it) is useful because the SQL engine retrieves all the data needed by the query just by scanning the index, avoiding access to the data in the table. This technique is used frequently for DB query tuning.
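A quick way to verify the covering index is actually used (a sketch; names follow the answer above):
create index shippable_idx on tablename (shippable, id);
EXPLAIN SELECT COUNT(DISTINCT id) FROM tablename WHERE shippable = '1';
-- "Using index" in the Extra column means the query was answered from
-- the index alone, without touching the table rows.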
When you check a condition, both values should be of the same type; then the query will execute faster.
SELECT count(rowsss) FROM (SELECT count(*) as rowsss FROM tablename WHERE CAST(shippable AS CHAR) = '1' GROUP BY id) as T

MySQL fetch from last to first - [many records]

I want to fetch records from MySQL starting from last to first, LIMIT 20. My database has over 1M records. I am aware of ORDER BY, but from my understanding, when using ORDER BY it takes forever to load 20 records, and I have no idea why. I think MySQL fetches all the records before ordering.
SELECT bookings.created_at, bookings.total_amount,
passengers.name, passengers.id_number, payments.amount,
passengers.ticket_no, bookings.phone, bookings.source,
bookings.destination, bookings.date_of_travel FROM bookings
INNER JOIN passengers ON bookings.booking_id = passengers.booking_id
INNER JOIN payments ON payments.booking_id = bookings.booking_id
ORDER BY bookings.booking_id DESC LIMIT 10
I suppose if you execute the query without the ORDER BY, the time is satisfactory?
You might try to create an index on the column you are ordering by:
create index idx_bookings_booking_id on bookings(booking_id)
You can find out the complexity of the query using EXPLAIN:
EXPLAIN SELECT bookings.created_at, bookings.total_amount,
passengers.name, passengers.id_number, payments.amount,
passengers.ticket_no, bookings.phone, bookings.source,
bookings.destination, bookings.date_of_travel FROM bookings
INNER JOIN passengers ON bookings.booking_id = passengers.booking_id
INNER JOIN payments ON payments.booking_id = bookings.booking_id
ORDER BY bookings.booking_id DESC LIMIT 10
Then check that the proper index has been created on the table:
SHOW INDEX FROM `db_name`.`table_name`;
If the index is not there, create the proper indexes on all the tables involved.
Please add if anything is missing.
The index lookup table needs to be able to reside in memory, if I'm not mistaken (filesort is much slower than an in-memory lookup).
Use a small index / column size.
For double the capacity, use UNSIGNED columns if you don't need negative values.
Tune sort_buffer_size and read_rnd_buffer_size (maybe better at the connection level, not globally).
See https://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html , particularly regarding using EXPLAIN and maybe trying another execution-plan strategy.
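A sketch of the session-level tuning mentioned above (the sizes are illustrative only, not recommendations):
SET SESSION sort_buffer_size = 4 * 1024 * 1024;      -- 4 MB, this connection only
SET SESSION read_rnd_buffer_size = 2 * 1024 * 1024;  -- likewise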
You seem to need a workaround like materialized views.
Tell me if this sounds like it:
Create another table like the booking table, e.g. CREATE TABLE booking_short LIKE booking, though you only need the booking_id column.
Then check your code for where exactly you create booking orders, i.e. where you first insert into booking. There, run SELECT COUNT(*) FROM booking_short; if the count is > 20, delete the oldest record, then insert the new booking_id.
You can select the IDs from there first, then join the rest of the tables for the details.
You won't need LIMIT or sorting.
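A minimal sketch of that helper table, assuming booking_id is an increasing integer (the literal id is a placeholder):
CREATE TABLE booking_short (
  booking_id INT UNSIGNED NOT NULL PRIMARY KEY
);
-- at booking-creation time, right after INSERT INTO booking:
INSERT INTO booking_short (booking_id) VALUES (12345);
DELETE FROM booking_short
WHERE booking_id NOT IN (
  SELECT booking_id FROM (
    SELECT booking_id FROM booking_short
    ORDER BY booking_id DESC LIMIT 20
  ) AS keep_newest  -- derived table avoids MySQL error 1093
);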
Of course, this needs heavy documentation to avoid maintenance problems.
Either that or https://stackoverflow.com/a/5912827/6288442

How to reduce subquery execution time...?

I want a per-day sales item count. I have already created a query for that, but it takes too much time, around 55.585s.
Query:
SELECT
td.db_date,
(
select count(*) from `order` where DATE(`order`.created_on) = td.db_date
) as day_contribute
FROM time_dimension as td
So can anyone please let me know how I may optimize this query and reduce the execution time?
You can modify your query to a join, like:
SELECT
td.db_date, count(`order`.id) as day_contribute
FROM time_dimension as td
LEFT JOIN `order` ON DATE(`order`.created_on) = td.db_date
GROUP BY td.db_date;
I do not know your primary key for the order table, so I used just "order.id". Replace it with yours.
Also, it is very important to check that you have an index on the td.db_date field.
One more important thing: it is better to avoid DATE(order.created_on), because it means the DATE() function will be called each time the DB compares dates. If possible, convert order.created_on to the same format as td.db_date, or join on other fields. That will add speed too.
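One way to remove the DATE() call from the comparison is a stored generated column with its own index (a sketch, assuming MySQL 5.7+):
ALTER TABLE `order`
  ADD COLUMN created_on_date DATE AS (DATE(created_on)) STORED,
  ADD INDEX idx_created_on_date (created_on_date);
-- the join condition then becomes index-friendly:
-- LEFT JOIN `order` ON `order`.created_on_date = td.db_date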
First, you should make sure you have an index on the created_on column in the order table.
However, if you have many records in time_dimension and many records in the order table, it might be hard to optimize the query, because for each record from time_dimension you need to search the order table.
You can also change count(*) into count(order_id) (assuming the primary key in the order table is order_id), or add an extra column with the date only in the order table (created_on_date, with date only and an index on this column), so your query could look like this:
SELECT
td.db_date,
(
select count(order_id) from `order` where `order`.created_on_date = td.db_date
) as day_contribute
FROM time_dimension as td
However, it's possible the execution time will still be too high if you have many records in both tables, so it might be necessary to create one extra table where you hold the number of orders for each day, and update it in cron or when adding/updating/deleting records in the order table.
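A sketch of that extra pre-aggregated table (all names are hypothetical):
CREATE TABLE daily_order_counts (
  db_date DATE NOT NULL PRIMARY KEY,
  order_count INT UNSIGNED NOT NULL DEFAULT 0
);
-- rebuild it from the order table, e.g. in a nightly cron job:
REPLACE INTO daily_order_counts (db_date, order_count)
SELECT DATE(created_on), COUNT(*)
FROM `order`
GROUP BY DATE(created_on);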

Insert Records All At Once

I have a table that has been functional, and I added a column to it. After adding the column, I want to put the result of a query (the query is the same for all rows, but with different results) into that column all at once instead of one row at a time, which would be time-consuming. How can I achieve that? After updating, I have just one result in the whole column; I cannot use a WHERE clause because that would require doing it one row after the other.
$stmt = $pdo->prepare("UPDATE table SET my_value = '$myValue' ");
$stmt->execute();
UPDATE table
SET my_value = (select col from some_table where ...)
If the value is the same for all rows, I would advise using cross join:
update table t cross join
(select newval . . .) x
set t.col = x.newval;
Note: this is better than a correlated subquery, because the derived table is guaranteed to be evaluated only once.
If you are trying to say that the value is the same for groups of rows, then extend this to a join:
update table t join
(select grp, newval . . .) x
on t.grp = x.grp
set t.col = x.newval;
After adding the column I want to add the result of a query (query result is same for all) into that column all at once instead of one at a time which will be time consuming.
The solution depends on what you mean by "is the same for all the rows."
If you have one value that is exactly the same for all rows, you can just query for it once and then update. This is usually faster (and allows you to debug more easily) than using pure SQL to achieve everything.
If, on the other hand, you mean the values of that column are retrieved by the same query, but will be different for different rows, then a subquery or a cross join as Gordon suggested will do the trick.
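Both cases as concrete sketches (the table and column names are made up for illustration):
-- case 1: one value for every row
UPDATE mytable
SET my_value = (SELECT MAX(created_at) FROM audit_log);
-- case 2: a different value per group, computed once in a derived table
UPDATE mytable t
JOIN (SELECT group_id, COUNT(*) AS newval
      FROM audit_log
      GROUP BY group_id) x
  ON t.group_id = x.group_id
SET t.my_value = x.newval;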

Returning random rows from a MySQL database without using rand()

I would like to be able to pull back 15 or so records from a database. I've seen that using WHERE id = rand() can cause performance issues as my database gets larger. All solutions I've seen are geared towards selecting a single random record. I would like to get multiples.
Does anyone know of an efficient way to do this for large databases?
Edit: further testing:
I made a fairly simple table on a new database using MyISAM. I gave it 3 fields: autokey (an unsigned auto-increment key), bigdata (a large BLOB), and somemore (a MEDIUMINT). I then filled the table with random data and ran a series of queries using Navicat. Here are the results:
Query 1: select * from test order by rand() limit 15
Query 2: select *
from
test
join
(select round(rand()*(select max(autokey) from test)) as val from test limit 15) as rnd
on
rnd.val=test.autokey;
(I tried both select and select distinct and it made no discernible difference)
and:
Query 3 (I only ran this on the second test):
SELECT *
FROM (
SELECT @cnt := COUNT(*) + 1,
@lim := 10
FROM test
) vars
STRAIGHT_JOIN
(
SELECT r.*,
@lim := @lim - 1
FROM test r
WHERE (@cnt := @cnt - 1)
AND RAND(20090301) < @lim / @cnt
) i
ROWS        QUERY 1   QUERY 2   QUERY 3
2,060,922   2.977s    0.002s    N/A
3,043,406   5.334s    0.001s    1.260s
I would like to do more rows so I can see how query 3 scales, but at the moment, it seems as though the clear winner is query 2.
Before I wrap up this testing and declare an answer, and while I have all this data and the test environment set up, can anyone recommend any further testing?
Try:
select * from table order by rand() limit 15
Another (and possibly more efficient) way would be to join against a set of random values. This should work if there's some contiguous integer key in the table. Here is how I would do it in Postgres (my MySQL is a bit rusty):
select * from table join
(select (random()*maxid)::integer as val from generate_series(1,15)) as rnd
on rnd.val=table.id;
where maxid is the highest id in the table. If id has an index, then this means only 15 index lookups, so it's very fast.
UPDATE:
Looks like there's no such thing as generate_series in MySQL. My fault. We don't actually need it:
select *
from
table
join
-- this just returns 15 random numbers.
-- I need `table` here only to produce rows for rand()
(select round(rand()*(select max(id) from table)) as val from table limit 15) as rnd
on
rnd.val=table.id;
P.S. If I don't want duplicates returned, I can use (select distinct [...]) in the random generator expression.
Update: check out the accepted answer in this question. It's pure MySQL and even deals with even distribution.
The problem with id = rand() or anything comparable in PHP is that you can't be sure whether that particular ID still exists. Therefore, you need to work with LIMIT, and that can become slow for large amounts of data.
As an alternative to that, you could try using a loop in PHP.
What the loop does is:
Create a random integer with rand(), in the range between 0 and the number of records in the database
Query the database whether a record with that ID exists (see the probe sketch after this list)
If it exists, add the number to an array
If it doesn't, go back to step 1
End the loop when the array of random numbers contains the desired number of elements
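The per-iteration probe could look like this (table and column names follow the test table above; the literal id stands in for the PHP-generated random number):
SELECT 1 FROM test WHERE autokey = 123456 LIMIT 1;
-- one row back: keep 123456; no rows: roll a new number and retry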
This method could cause a lot of queries on a fragmented table, but they should be pretty fast to execute. It may be faster than the LIMIT rand() approach in certain situations.
The LIMIT method, as outlined by @Luther, is certainly the simplest code-wise.
You could do a query for all the results (or however many, limited), then use mysqli_fetch_all followed by:
shuffle($a);                  // randomize row order in PHP
$a = array_slice($a, 0, 15);  // keep the first 15 rows
For a large dataset, doing
select * from table order by rand() limit 15
can be quite time and memory consuming.
If your data records happen to be numbered, you can put an index on the numbering column and do a
select * from table where no >= floor(rand() * (select max(no) from table)) limit 15
Or, even better, do the random number generation in your application and do
select * from table where no >= $rand and no <= $rand+15
If your data doesn't change too often, it might be worth adding such a numbering column to make the selection efficient.
Assuming MySQL supports nested queries and that operations on the primary key are fast, I'd try something like
select * from table where id in (select id from table order by rand() limit 15)
