Insert/update data in chunks in MySQL - php

I have a query that computes an aggregate sum and counts from a table, and I need to update these details in another table. If I run the query without any condition limiting the number of rows fetched at a time, it fails (times out), so I am planning to fetch 10,000 records at a time and insert them into the database. Now I have a few questions about this.
If I use a database transaction and include all the operations, from fetching the data to inserting/updating it in the DB, is that a good idea? I would like to know whether this will lock the table I am fetching from until the transaction completes. If so, I can't use that approach, as there are other API queries acting on this data in real time.
Is looping the better way to do this, i.e. fetching 1,000 records and updating those 1,000 records in the DB at a time? Or do I need one more layer in between, like Redis or file storage, which would hold the entire data set so the update could be performed at once?
My operation flow at the application level is as follows:
while ($i <= $maxCount) {
    $limit = 1000;
    $results = fetchResults($i, $limit);
    $formattedResults = formatResults($results);
    updateRecords($formattedResults);
    $i = $i + 1000;
}
Is there any downside to doing something like this? One thing I noted is that it hits the database multiple times: if I have 100,000 entries to process, it hits the database 100000/1000 = 100 times. Is there any other way I can do this? I am using this in a background worker which runs on a daily basis.
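One way to cut down those round trips would be to let MySQL do both the aggregation and the upsert in a single statement per chunk, walking the source table by primary-key range. The sketch below only illustrates that idea: source_table, summary_table, item_id and amount are assumed names, not from the post above, and it assumes the summary table is reset (or the job is incremental) before a full recompute.
<?php
// Hedged sketch: chunked, server-side aggregation + upsert with PDO.
// Assumed schema: source_table(id, item_id, amount),
//                 summary_table(item_id PRIMARY KEY, total, cnt).
$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8', 'user', 'pass');

$chunk = 10000;
$maxId = (int) $pdo->query("SELECT COALESCE(MAX(id), 0) FROM source_table")->fetchColumn();

$sql = "INSERT INTO summary_table (item_id, total, cnt)
        SELECT item_id, SUM(amount), COUNT(*)
        FROM source_table
        WHERE id BETWEEN :from AND :to
        GROUP BY item_id
        ON DUPLICATE KEY UPDATE
            total = total + VALUES(total),
            cnt   = cnt   + VALUES(cnt)";
$stmt = $pdo->prepare($sql);

for ($from = 1; $from <= $maxId; $from += $chunk) {
    // Each execute() runs as its own short statement, so other readers of
    // these tables are never blocked for longer than one chunk.
    $stmt->execute([':from' => $from, ':to' => $from + $chunk - 1]);
}
With this approach the rows never travel to PHP at all, and the number of database calls drops to one per chunk.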

Related

Get online users from big record table

I have a big onlines table (more than 40 million records).
Now I want to show which users were online at any given time, but executing this on the server fails.
For example, when I request the users who were online in the last week, it does not work (because the table has a very large number of records).
This is my example PHP code:
$d = $_GET['date'];
$time = time() - 60*60 * 24 * $d;
$phql = "SELECT DISTINCT aid FROM onlines WHERE time > '$time'";
So, do you have any better tips?
Thanks.
Use EXPLAIN SELECT ... to see which indexes your query actually uses and whether it has to scan the whole table. Particularly for big tables, make sure the columns you filter on are indexed; in this case, time.
You can create an index like this:
CREATE INDEX time_index ON onlines (time);
This should speed up the query. If you do not care about potential data loss or persistence, you might look into using an in-memory (MEMORY engine) table to avoid I/O. That speeds up queries significantly, but the table is emptied whenever the server restarts or MySQL is shut down.
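For example, a sketch with mysqli (the onlines table and its aid and time columns come from the question; the connection details are placeholders) that checks the plan and binds the timestamp instead of interpolating request input into the SQL string:
<?php
$mysqli = new mysqli('localhost', 'user', 'pass', 'database');

$days  = isset($_GET['date']) ? (int) $_GET['date'] : 1;
$since = time() - 60 * 60 * 24 * $days;

// EXPLAIN shows whether time_index is actually used for the range condition.
$plan = $mysqli->query("EXPLAIN SELECT DISTINCT aid FROM onlines WHERE time > $since");
print_r($plan->fetch_all(MYSQLI_ASSOC));

// Same query with a bound parameter instead of string interpolation.
$stmt = $mysqli->prepare("SELECT DISTINCT aid FROM onlines WHERE time > ?");
$stmt->bind_param('i', $since);
$stmt->execute();
$onlineUsers = $stmt->get_result()->fetch_all(MYSQLI_ASSOC);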

Correct way to pass between 5,000 to 100,000 values in mysql WHERE clause

I am getting 45,000 values from one query result and need to use these values in a second query, but because of the large size of the array it takes more than 30 seconds to execute, so I get this error:
Error: Maximum execution time of 30 seconds exceeded
Is there any other way to do this database calculation quickly, or should I calculate the data and save it in another table so it can be shown at any time?
Queries:
$query = $em->createQuery('SELECT DISTINCT(u.username) as mobile FROM sessionTable u WHERE u.accesspoint IN (?1) ');
$query->setParameter(1, $accesspoint);
$result = $query->getResult();
$count = count($result);
$i = 0;
$numbers = array();
foreach ($result as $value) {
    $numbers[$i] = $value['mobile'];
    $i++;
}
dump(count($numbers)); //----> Output is 48567 --successful
$Users = $this->getDoctrine()
    ->getRepository('AcmeDataBundle:User')
    ->findByNumber($numbers);
// ---- Error occurs here ----
dump(count($Users));
die();
I am using the Symfony 2.0 framework and Doctrine 2.0.
UPDATE:
Consider that I have 5 tables in the same database,
viz. 1) user, 2) googleData, 3) facebook data, 4) yahooData, 5) sessions.
When users log in to my application, I collect their gender info from one or more of the social profile tables and save it in the relevant table.
Now I want to calculate the male:female ratio across all users who have used multiple sessions.
In this scenario it is getting too difficult to calculate the male:female ratio from multiple tables.
I feel one easy solution would be to add a gender column directly to the sessions table, but is there a better way, using a foreign key or anything else?
If you have to pass 1,000 values, never mind 100,000, that probably means there is a problem in the design of your queries. The data has to come from somewhere: if it comes from the database, it is a simple matter to use a join or a subquery; if it comes from an external source, it can go into a temporary table.
So in short: yes, a temporary or permanent table is better.
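As an illustration of the join/subquery approach, here is a sketch with plain PDO rather than Doctrine; the sessionTable.username/accesspoint and number names are taken from the question's code, while the connection details and the User table name are assumptions:
<?php
// The 48,000 distinct usernames never leave the database: the derived table
// replaces the huge IN (...) list that was built in PHP.
$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8', 'user', 'pass');

$accesspoint  = array('ap1', 'ap2');   // whatever list you currently bind to ?1
$placeholders = implode(',', array_fill(0, count($accesspoint), '?'));

$sql = "SELECT u.*
        FROM User u
        JOIN (
            SELECT DISTINCT s.username
            FROM sessionTable s
            WHERE s.accesspoint IN ($placeholders)
        ) m ON m.username = u.number";

$stmt = $pdo->prepare($sql);
$stmt->execute(array_values($accesspoint));
$users = $stmt->fetchAll(PDO::FETCH_ASSOC);
The same thing can be expressed as a single DQL query with a join, or the session usernames can be dumped into a temporary table that the second query joins against.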

Concurrent PHP and MySQL request

I'm developing a web application with PHP and MySQL, and I have a situation where I have to limit the number of records inserted into a table.
...
const MAX = 10;
if (/* record count query */ < $this::MAX) {
    /* insert query */
}
...
For testing purposes I'm triggering this code with a GET request from the browser.
When I press the F5 (refresh) key continuously for about 5 seconds, the count exceeds MAX.
But when I go one by one, the count stays within the limit.
This shows that when I press F5 continuously, the count query is executed while an insert query is still running. I have no idea how to solve this problem; some guidance would be helpful.
You have to LOCK the table so that no other process writes to it while you are getting the current count. Otherwise you always run the risk that another process inserts data at the same moment.
For performance reasons, you may use a separate table that serves only as a counter and lock that table during these operations.
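A minimal sketch of the table-lock approach with mysqli; the items table, its name column and the connection details are placeholders, not part of the question:
<?php
const MAX = 10;
$mysqli = new mysqli('localhost', 'user', 'pass', 'database');

// WRITE-lock the table so no other request can count or insert concurrently.
$mysqli->query("LOCK TABLES items WRITE");

$count = (int) $mysqli->query("SELECT COUNT(*) FROM items")->fetch_row()[0];
if ($count < MAX) {
    $stmt = $mysqli->prepare("INSERT INTO items (name) VALUES (?)");
    $name = 'example';
    $stmt->bind_param('s', $name);
    $stmt->execute();
}

$mysqli->query("UNLOCK TABLES");
With InnoDB you can get a similar effect with a transaction and locking reads (SELECT ... FOR UPDATE), but the explicit table lock is the most direct translation of the advice above.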

Prevent same row being selected in mySQL

I have been given the task of creating a "Mass Crawler" which relies entirely on proxies stored in a database. Here's a simple overview of what I'm attempting to achieve:
1 x CronJob Bootstrap file - This is the file which sends 50 parallel curl requests to the individual crawler file
1 x Individual Crawler file - This is supposed to grab a UNIQUE row (proxy) from the database which another process hasn't selected.
I've had a look at TRANSACTIONS in MySQL, but I still believe they alone wouldn't help, as the query would be executed at exactly the same time by each individual crawler process.
Here's roughly the idea I had in mind for the individual crawler file:
$db = new MysqliDb("localhost", "username", "password", "database");
$db->connect();
$db->startTransaction();
$db->where("last_used", array("<" => "DATE_SUB(NOW(),INTERVAL 30 SECOND)"));
$proxies = $db->get("proxies", 1);
if (count($proxies) == 1) {
    // complete any scraping that needs to be done
    // update the database to say the proxy has just been used
    $db->where("id", $proxies[0]['id']);
    $db->update("proxies", array("last_used" => date("Y-m-d H:i:s")));
    // commit the complete transaction
    $db->commit();
}
$db->disconnect();
Would the above example be the correct way to use the MySQL TRANSACTION feature and ensure that all parallel queries select different rows?
You need a column in the table that indicates that the row is in use by one of the crawler processes. Your first SELECT should look for WHERE in_use = 0; it needs to use the FOR UPDATE clause to lock the rows it reads, though.
SELECT *
FROM proxies
WHERE in_use = 0
LIMIT 1
FOR UPDATE;
I don't know how to write that query with the DB API you're using; you may need to use its function for performing raw queries.
Then update that row with SET in_use = 1. By doing both operations in one transaction, you ensure that no other process will get that row.
When it's done processing the row, it can SET in_use = 0.
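Put together, the pattern looks roughly like this; a sketch with PDO, where the in_use flag is the column suggested above and the connection details are placeholders:
<?php
$pdo = new PDO('mysql:host=localhost;dbname=database;charset=utf8', 'user', 'pass');

$pdo->beginTransaction();

// Lock one free proxy row; any other transaction running the same SELECT
// blocks until this one commits, so it cannot claim the same row.
$row = $pdo->query("SELECT id FROM proxies WHERE in_use = 0 LIMIT 1 FOR UPDATE")
           ->fetch(PDO::FETCH_ASSOC);

if ($row) {
    $claim = $pdo->prepare("UPDATE proxies SET in_use = 1, last_used = NOW() WHERE id = ?");
    $claim->execute(array($row['id']));
}
$pdo->commit();

if ($row) {
    // ... crawl through the claimed proxy, then release it:
    $release = $pdo->prepare("UPDATE proxies SET in_use = 0 WHERE id = ?");
    $release->execute(array($row['id']));
}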

Insert automatically on new table?

I will create 5 tables, namely data1, data2, data3, data4 and data5. Each table may only store 1000 records.
When a new entry comes in, i.e. when I want to insert new data, I must do a check:
<?php
$data1 = mysql_query("SELECT * FROM data1");
if (mysql_num_rows($data1) > 1000) {
    $data2 = mysql_query("SELECT * FROM data2");
    if (mysql_num_rows($data2) > 1000) {
        // and so on...
    }
}
I don't think this is the right way, is it? I mean, if I am user 4500, it would take some time to do all the checks. Is there any better way to solve this problem?
I haven't decided on the numbers; it could be 5000 or 10000 records. The reason is flexibility and portability. Well, one of my SQL gurus suggested I do it this way.
Unless your guru was talking about something like partitioning, I'd seriously doubt his advice. If your database couldn't handle more than 1000, 5000 or 10000 rows, you would look for another database. Unless you have a really specific example of how a record limit will help you, it probably won't; with the amount of overhead it adds, it probably only complicates things for no gain.
A properly set up database table can easily handle millions of records. Splitting it into separate tables will most likely increase neither flexibility nor portability. If you accumulate enough records to run into performance problems, congratulate yourself on a job well done and worry about it then.
Read up on how to count rows in MySQL.
Depending on which storage engine you are using: COUNT(*) operations on InnoDB tables are quite expensive, so such counts should be maintained by triggers and tracked in an adjacent information table.
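For illustration, a trigger-maintained counter table might look like this; a sketch only, where the data1 and row_counts names are assumptions and the pair of triggers would be repeated for each dataN table:
<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// Counter table plus a one-time seed of the current count.
$pdo->exec("CREATE TABLE IF NOT EXISTS row_counts (
                table_name VARCHAR(64) PRIMARY KEY,
                row_count  INT NOT NULL DEFAULT 0
            )");
$pdo->exec("INSERT IGNORE INTO row_counts (table_name, row_count)
            SELECT 'data1', COUNT(*) FROM data1");

// Keep the counter in sync on every insert/delete.
$pdo->exec("CREATE TRIGGER data1_count_ins AFTER INSERT ON data1
            FOR EACH ROW UPDATE row_counts
            SET row_count = row_count + 1 WHERE table_name = 'data1'");
$pdo->exec("CREATE TRIGGER data1_count_del AFTER DELETE ON data1
            FOR EACH ROW UPDATE row_counts
            SET row_count = row_count - 1 WHERE table_name = 'data1'");
Checking whether a table is "full" is then a primary-key lookup on row_counts instead of a COUNT(*) scan.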
The structure you describe is often designed around a mapping table first. One queries the mapping table to find the destination table associated with a primary key.
You can keep a "tracking" table to keep track of the current table between requests.
Also be alert for race conditions (use transactions, or ensure only one process is running at a time).
Also, don't do $data1 = mysql_query("SELECT * FROM data1"); with nested ifs; do something like:
$i = 1;
do {
    $rowCount = mysql_result(mysql_query("SELECT COUNT(*) FROM data$i"), 0);
    $i++;
} while ($rowCount >= 1000);
I'd be surprised if MySQL doesn't have some fancy-pants way to manage this automatically (or at least, better than what I'm about to propose), but here's one way to do it.
1. Insert the record into data.
2. Check the length of data.
3. If it is >= 1000:
- CREATE TABLE dataX LIKE data; (X will be the number of tables you have + 1)
- INSERT INTO dataX SELECT * FROM data;
- TRUNCATE data;
This means you will always be inserting into the data table, and data1, data2, data3, etc. are your archived versions of that table.
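A rough PHP sketch of those three steps; the data/dataN names follow the question, the payload column is made up, and in practice the whole block should run under LOCK TABLES or a transaction to avoid the race conditions mentioned in another answer:
<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// 1. Always insert into the live table.
$pdo->exec("INSERT INTO data (payload) VALUES ('example')");

// 2. Check its length.
$count = (int) $pdo->query("SELECT COUNT(*) FROM data")->fetchColumn();

// 3. Archive and truncate once the threshold is reached.
if ($count >= 1000) {
    // Next archive index = number of existing data/dataN tables
    // (data alone -> data1, data + data1 -> data2, and so on).
    $next = (int) $pdo->query(
        "SELECT COUNT(*) FROM information_schema.tables
         WHERE table_schema = DATABASE() AND table_name REGEXP '^data[0-9]*$'"
    )->fetchColumn();
    $pdo->exec("CREATE TABLE data$next LIKE data");
    $pdo->exec("INSERT INTO data$next SELECT * FROM data");
    $pdo->exec("TRUNCATE data");
}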
You can create a MERGE table like this:
CREATE TABLE all_data ([col_definitions]) ENGINE=MERGE UNION=(data1,data2,data3,data4,data5);
Then you would be able to count the total rows with a query like SELECT COUNT(*) FROM all_data. Note that the MERGE engine only works on top of identical MyISAM tables.
If you're using MySQL 5.1 or above, you can let the database handle this (nearly) automatically using partitioning:
See the official MySQL documentation on partitioning.
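For example, a range-partitioned replacement for the manual data1...data5 tables might be declared like this; a sketch in which the column names and partition boundaries are purely illustrative:
<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// One logical table; MySQL routes each row to a partition by id range,
// and queries keep targeting data as usual.
$pdo->exec("
    CREATE TABLE data (
        id      INT NOT NULL AUTO_INCREMENT,
        payload VARCHAR(255),
        PRIMARY KEY (id)
    )
    PARTITION BY RANGE (id) (
        PARTITION p0   VALUES LESS THAN (1000),
        PARTITION p1   VALUES LESS THAN (2000),
        PARTITION p2   VALUES LESS THAN (3000),
        PARTITION p3   VALUES LESS THAN (4000),
        PARTITION pmax VALUES LESS THAN MAXVALUE
    )
");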
