I've got a server hosting a live traffic log DB that holds a big stats table. Now I need to create a smaller table from it, covering, say, the last 30 days.
This server also has a slave server that replicates the data and runs about 5 seconds behind the master.
I created this slave in order to offload the SELECT queries, so the master only handles the INSERTs/UPDATEs for the traffic log.
Now I need to copy the last day into the smaller table, and still not touch the "real" DB,
so I need to select from the slave and insert into the smaller table on the master (the slave only allows read operations).
I am working with PHP and I can't solve this with one query spanning the two different databases... if it's possible, please let me know how.
When using two queries I need to hold the whole last day as a PHP MySQL result. For 300K-650K rows, that starts to become a memory problem. I would select in chunks by ID (putting the IDs in the WHERE clause), but I don't have an auto-increment ID field and there's no ID on the rows (when storing traffic data an ID would take a lot of space).
So I am trying this idea and I would like to get a second opinion.
If I take the whole last day at once (300K+ rows) it will overload PHP's memory.
I can use LIMIT chunks, or a new idea: selecting one column at a time and copying it to the new real table. But I don't know if the second method is possible. Does INSERT look at the first open space at the column level or the row level?
The main idea is reducing the size of the SELECT, so is it possible to build a SELECT by columns and then insert them as columns in MySQL?
If this is simply a memory problem in PHP, you could try using PDO and fetching one result row at a time instead of all of them at once.
From PHP.net for PDO:
<?php
function getFruit($conn) {
    $sql = 'SELECT name, color, calories FROM fruit ORDER BY name';
    foreach ($conn->query($sql) as $row) {
        print $row['name'] . "\t";
        print $row['color'] . "\t";
        print $row['calories'] . "\n";
    }
}
?>
Well, here is where PHP starts to be weird. I took your advice and started to use chunks for the data: I used a loop that advances a LIMIT offset in jumps of 2000 rows. What was interesting is that when I started to check memory_get_usage() and memory_get_peak_usage(), I found out why the chunked looping doesn't work at large scale: assigning a new value to a variable doesn't release the memory held by whatever was there before, so you must unset the variable or set it to null in order to keep your PHP memory under control.
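A minimal sketch of that chunked loop, assuming a PDO connection to the slave and made-up table/column names; the explicit unset() is what keeps the peak memory flat between iterations:

// Sketch only: pull yesterday's rows from the slave in 2000-row chunks and free
// each chunk before fetching the next. $slave is an assumed PDO connection and
// traffic_log / log_time are made-up names.
$chunkSize = 2000;
$offset = 0;

do {
    $sql = sprintf(
        'SELECT * FROM traffic_log
         WHERE log_time >= NOW() - INTERVAL 1 DAY
         ORDER BY log_time
         LIMIT %d, %d',
        $offset,
        $chunkSize
    );
    $rows = $slave->query($sql)->fetchAll(PDO::FETCH_ASSOC);
    $count = count($rows);

    // ... insert $rows into the smaller table on the master here ...

    $offset += $count;

    // release the chunk explicitly so the next assignment doesn't double the footprint
    unset($rows);
} while ($count === $chunkSize);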
It's kinda hard to understand my need from the title.
CodeIgniter is performing a SELECT query on a table of 800,000+ rows in one shot.
It takes a lot of memory, and on one specific server I get an "Out of memory" fatal error.
For performance purposes, I would like to separate the SELECT into 2 SELECTs: more specifically, the first 50% of the rows, and then the remaining 50%.
I reuse this set of data to perform an INSERT afterwards.
How do I do that without losing/forgetting a single row?
Besides the fact that operations like that are highly connected to performance issues, you can use unbuffered_row.
Basically, if you have a job with data that large, you should use unbuffered_row, which is provided and integrated in the built-in query builder.
It's very well documented in the Result Rows section of the CodeIgniter docs.
For example:
$query = $this->db->select('*')->from('your_table')->get();

while ($row = $query->unbuffered_row())
{
    // do your job
}
This will avoid your memory problem.
Is there a way that I can update 100k records in a query so the MySQL database keeps working smoothly?
Suppose there is a table users containing a hundred thousand records, and I have to update approximately fifty thousand of them. For the update I have the IDs of those records (around fifty thousand of them) stored in a CSV file.
1 - Will the query be OK, since the size of the query would be very large? Or if there is any way to do it in smaller chunks, let me know.
2 - Considering the Laravel framework, is there any option to read part of the file rather than the whole file, to avoid memory exhaustion? I do not want to read the whole file at once, so please suggest.
Any suggestions are welcome!
If you're thinking of building a query like UPDATE users SET column = 'value' WHERE id = 1 OR id = 2 OR id = 3 ... OR id = 50000 or WHERE id IN (1, 2, 3, ..., 50000) then that will probably be too big. If you can make some logic to summarize that, it would shorten the query and speed things up on MySQL's end significantly. Maybe you could make it WHERE id >= 1 AND id <= 50000.
If that's not an option, you could do it in bursts. You're probably going to loop through the rows of the CSV file, build the query as a big WHERE id = 1 OR id = 2... query and every 100 rows or so (or 50 if that's still too big), run the query and start a new one for the next 50 IDs.
Or you could just run 50,000 single UPDATE queries on your database. Honestly, if the table makes proper use of indexes, running 50,000 queries should only take a few seconds on most modern web servers. Even the busiest servers should be able to handle that in under a minute.
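As a rough sketch of that burst approach, assuming a PDO connection and a one-column CSV of IDs (the file path, batch size and the column being updated are all made up), you could collect the IDs into batches and run one UPDATE per batch:

$batchSize = 500;
$batch = [];

$file = fopen('/path/to/ids.csv', 'r');
while (($row = fgetcsv($file)) !== false) {
    $batch[] = (int) $row[0];

    // every $batchSize IDs, run one UPDATE ... WHERE id IN (...) query
    if (count($batch) === $batchSize) {
        runBatch($pdo, $batch);
        $batch = [];
    }
}
fclose($file);

// flush the last partial batch
if ($batch) {
    runBatch($pdo, $batch);
}

function runBatch(PDO $pdo, array $ids)
{
    $placeholders = implode(',', array_fill(0, count($ids), '?'));
    // the column/value being set here is just an illustration
    $stmt = $pdo->prepare("UPDATE users SET status = 'processed' WHERE id IN ($placeholders)");
    $stmt->execute($ids);
}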
As for reading a file in chunks, you can use PHP's basic file access functions for that:
$file = fopen('/path/to/file.csv', 'r');

// read one line at a time from the file (fgets reads up to the
// next newline character if you don't provide a number of bytes)
while (!feof($file)) {
    $line = fgets($file);

    // or, since it's a CSV file:
    $row = fgetcsv($file);
    // $row is now an array with all the CSV columns

    // do stuff with the line/row
}

// set the file pointer to 60 kb into the file
fseek($file, 60 * 1024);

// close the file
fclose($file);
This will not read the full file into memory. Not sure if Laravel has its own way of dealing with files, but this is how to do that in basic PHP.
Depending on the data you have to update, I would suggest a few ways:
If all users are to be updated with the same value, then, as @rickdenhaan said, you can build a batched query every X rows from the CSV.
If every individual user has to be updated with a unique value, you have to run single queries.
If any of the updated columns are indexed, you should disable autocommit and run the updates in a manual transaction to avoid committing the index updates after every single UPDATE (a sketch follows below).
To avoid memory exhaustion, my opinion is the same as @rickdenhaan's: you should read the CSV line by line using fgetcsv.
To avoid possible timeouts, you can, for example, put the script processing into Laravel queues.
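Regarding the manual transaction, here is a minimal sketch assuming a plain PDO connection (the column names and the $updates array are illustrative); the point is that the work is committed once at the end instead of after every statement:

$pdo->beginTransaction();

try {
    $stmt = $pdo->prepare('UPDATE users SET score = :score WHERE id = :id');

    // $updates is assumed to map id => new value, e.g. built from the CSV
    foreach ($updates as $id => $score) {
        $stmt->execute([':id' => $id, ':score' => $score]);
    }

    // one commit at the end instead of an implicit commit per statement
    $pdo->commit();
} catch (Exception $e) {
    // roll back everything if any single update fails
    $pdo->rollBack();
    throw $e;
}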
I have a PHP application showing 3 tables of data, each drawn from the same MySQL table. Each record has an integer field named status which can have the values 1, 2 or 3. Table 1 shows all records with status = 1, table 2 shows status = 2 and table 3 shows status = 3.
To achieve this, three MySQL queries could be run using WHERE to filter by status, iterating through each result set once to populate the three tables.
Another approach would be to select everything from the table and then iterate through the same result set once for each table, using PHP to test the value of status each time.
Would one of these approaches be significantly more efficient than the other? Or would one of them be considered better practice than the other?
Generally, it's better to filter on the RDBMS side so you can reduce the amount of data you need to transfer.
Transferring data from the RDBMS server over the network to the PHP client is not free. Networks have a capacity, and you can generate so much traffic that it becomes a constraint on your application performance.
For example, recently I helped a user who was running queries many times per second, each generating 13MB of result set data. The queries executed quickly on the server, but he couldn't get the data to his app because he was simply exhausting his network bandwidth. This was a performance problem that didn't happen during his testing, because when he ran one query at a time, it was within the network capacity.
If you use the second method you hit the database only once, thus it's more efficient.
And even if it weren't, it's more elegant that way IMO.
Of course there are some situations where it would be better to query three times (e.g. when getting the info from a single query would be complicated), but for most cases I would do it the second way.
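A minimal sketch of that single-query approach, assuming a PDO connection and a made-up table name (the status column is from the question); the rows are split into the three tables' data sets in one pass:

// one SELECT, then one pass in PHP to split the rows by status
$buckets = [1 => [], 2 => [], 3 => []];

$stmt = $pdo->query('SELECT * FROM records WHERE status IN (1, 2, 3)');
foreach ($stmt as $row) {
    $buckets[$row['status']][] = $row;
}

// $buckets[1], $buckets[2] and $buckets[3] now hold the rows for the three tables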
I would create a stored procedure that returns all the fields you need pre-formatted, no more, no less.
And then just loop over the result in PHP without querying any other table.
This way you run only 1 query, and you only get the bytes you need. So same bandwidth, fewer round trips = more performance.
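A sketch of what calling such a procedure could look like from PHP, assuming a PDO connection; the procedure name is made up:

// call a (hypothetical) stored procedure and loop over its rows once
$stmt = $pdo->query('CALL get_records_by_status()');

while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // build the right HTML table row depending on $row['status']
}

// with MySQL, close the cursor before issuing further queries after a CALL
$stmt->closeCursor();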
I have a large table of about 14 million rows. Each row contains a block of text. I also have another table with about 6000 rows, and each row has a word and six numerical values for that word. I need to take each block of text from the first table, find the number of times each word from the second table appears, then calculate the mean of the six values for each block of text and store it.
I have a Debian machine with an i7 and 8 GB of memory which should be able to handle it. At the moment I am using PHP's substr_count() function. However, PHP just doesn't feel like it's the right solution for this problem. Other than working around time-out and memory limit problems, does anyone have a better way of doing this? Is it possible to do it in just SQL? If not, what would be the best way to execute my PHP without overloading the server?
Do each record from the 'big' table one at a time. Load that single block of text into your program (PHP or whatever), do the searching and calculation, then save the appropriate values wherever you need them.
Do each record as its own transaction, in isolation from the rest. If you are interrupted, use the saved values to determine where to start again.
Once you are done with the existing records, you only need to do this in the future when you enter or update a record, so it's much easier. You just need to take the big bite right now to get the existing data updated.
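A rough sketch of that one-record-at-a-time loop, assuming two separate PDO connections (so the big SELECT can be streamed while the UPDATEs run) and made-up table/column names; the weighting of the six values is just one possible interpretation of the scoring:

// $reader and $writer are assumed to be two separate PDO connections
$reader->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

// ~6000 rows, small enough to keep in memory
$words = $writer->query('SELECT word, v1, v2, v3, v4, v5, v6 FROM word_scores')
                ->fetchAll(PDO::FETCH_ASSOC);

$texts = $reader->query('SELECT id, body FROM text_blocks');
$save  = $writer->prepare('UPDATE text_blocks SET mean_score = :score WHERE id = :id');

while ($row = $texts->fetch(PDO::FETCH_ASSOC)) {
    $total = 0;
    $hits  = 0;

    foreach ($words as $w) {
        $count = substr_count($row['body'], $w['word']);
        if ($count > 0) {
            // mean of the six values for this word, weighted by how often it appears
            $total += $count * ($w['v1'] + $w['v2'] + $w['v3'] + $w['v4'] + $w['v5'] + $w['v6']) / 6;
            $hits  += $count;
        }
    }

    $save->execute([
        ':score' => $hits ? $total / $hits : 0,
        ':id'    => $row['id'],
    ]);
}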
What are you trying to do exactly? If you are trying to create something like a search engine with a weighting function, you should maybe drop that and instead use the MySQL fulltext search functions and indexes that are already there. If you still need this specific solution, you can of course do it completely in SQL. You can do this in one query or with a trigger that runs each time after a row is inserted or updated. You won't be able to get this done properly with PHP without jumping through a lot of hoops.
To give you a specific answer, we indeed would need more information about the queries, data structures and what you are trying to do.
Redesign it:
If size on disk is not important, just join the tables into one.
Put the 6000-row table into memory (a MEMORY table) and make a backup every hour:
INSERT IGNORE into back.table SELECT * FROM my.table;
Create your "own" index in the big table, e.g. add a column to the big table holding the id of the matching row.
More info about the query is needed to find a solution.
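A sketch of the MEMORY-table part of that idea, assuming a PDO connection and made-up table names (word_scores for the 6000-row table, word_scores_backup for the on-disk copy); MEMORY tables are lost on restart, hence the hourly copy:

// keep a MEMORY-engine copy of the small word table for fast lookups
$pdo->exec('CREATE TABLE IF NOT EXISTS word_scores_mem ENGINE=MEMORY
            AS SELECT * FROM word_scores');

// run this every hour (e.g. from cron) so a server restart does not lose changes
$pdo->exec('INSERT IGNORE INTO word_scores_backup SELECT * FROM word_scores_mem');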
I have written a tool for database replication in PHP. It's working fine but there's one issue:
I'm using PDO to connect to the different databases to keep it independent of any specific RDBMS, which is crucial to this application.
The tool does some analysis on the tables to decide how to convert certain types and some other stuff. Then it pretty much does a "SELECT * FROM <tablename>" to get the rows that need to be copied. The result sets are fairly large (about 50k rows in some tables).
After that it iterates over the result set in a while loop with PDOStatement::fetch(), does some type conversion and escaping, builds an INSERT statement and feeds that to the target database.
All this is working nicely, with one exception. While fetching the rows, one at a time, from the result set, the PHP process keeps eating up more and more memory. My assumption is that PDO keeps the already processed rows in memory until the whole result set is processed.
I also observed that when my tool is finished with one table and proceeds to the next, memory consumption drops instantly, which supports my theory.
I'm NOT keeping the data in PHP variables! I hold just one single row at any given moment for processing, so that's not the problem.
Now to the question: Is there a way to force PDO not to keep all the data in memory? I only process one row at a time, so there's absolutely no need to keep all that garbage. I'd really like to use less memory on this thing.
I believe the problem comes from PHP's garbage collector, as it does not garbage collect soon enough.
I would try to fetch the results in chunks of row_count size, like "SELECT ... LIMIT offset, row_count" in MySQL, or "SELECT * FROM (SELECT ...) WHERE ROWNUM BETWEEN offset AND (offset + row_count)" in Oracle.
Using Zend_Db_Select one can generate DB-independent queries:
$select = $db->select()
             ->from(array('t' => 'table_name'),
                    array('column_1', 'column_2'))
             ->limit($row_count, $offset);

$select->__toString();
# on MySQL renders: SELECT column_1, column_2 FROM table_name AS t LIMIT 10, 20
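A minimal sketch of that chunked loop in plain PDO, assuming a MySQL source connection and a hypothetical copyRow() helper that stands in for the tool's existing conversion and INSERT logic; each chunk is released before the next one is fetched:

$chunkSize = 5000;
$offset = 0;

do {
    // fetch only one chunk of the table instead of the whole result set
    $stmt = $source->query(sprintf(
        'SELECT * FROM %s LIMIT %d, %d', $table, $offset, $chunkSize
    ));

    $fetched = 0;
    while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
        copyRow($row); // hypothetical: convert, escape and INSERT into the target DB
        $fetched++;
    }

    $offset += $fetched;

    // free the statement (and its buffered rows) before the next chunk
    $stmt->closeCursor();
    unset($stmt);
} while ($fetched === $chunkSize);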