MySQL - Optimising an update function for performance reasons - php

I want to go through the database one row at a time. The aim is to visit every entry in a database table and update one of its columns.
Simplified DB table:

id | data        | ratings
1  | jsonstring* | ratingstring*
Note: There are over 200,000 entries.
(*)jsonstring
{"ID":78,"post_author":1,"active":1,"a_rating_users":71,"a_rating_score":278,"a_rating_avarage":"3.92","t_rating_users":6,"t_rating_score":19,"t_rating_avarage":"3.17","r_rating_users":5,"r_rating_score":22,"r_rating_avarage":"4.40"}
(*)ratingstring
{"d":{"r":"1","c":1,"a":1},"t":{"r":5,"c":1,"a":5},"p":{"r":5,"c":1,"a":5}}
data is CHARACTER SET utf8, TYPE TEXT
ratings is CHARACTER SET utf8, TYPE VARCHAR
My current approach:
I fetch all the rows at once and iterate over them with a foreach loop. Because the string has to be JSON-decoded for every entry, this takes a long time.
How can I optimise this? One consideration was to read each row individually from the database, modify it, and save it back, but I don't know whether that would work or how to advance the database pointer to the next entry.
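One way to process row by row without loading everything into PHP memory is an unbuffered result set for reading plus a second connection for writing. This is only a minimal sketch: the table name entries, the credentials, and the transform_rating() helper are assumptions standing in for the real schema and update logic.

<?php
// Minimal sketch: `entries`, the credentials and transform_rating() are
// placeholders for the real table, connection details and update logic.
$read  = new mysqli('localhost', 'user', 'pass', 'mydb');
$write = new mysqli('localhost', 'user', 'pass', 'mydb');

$update = $write->prepare('UPDATE entries SET ratings = ? WHERE id = ?');

// Unbuffered query: rows are streamed one at a time instead of all
// 200,000+ rows being loaded into PHP memory at once.
$result = $read->query('SELECT id, data FROM entries', MYSQLI_USE_RESULT);

while ($row = $result->fetch_assoc()) {
    $json = json_decode($row['data'], true);
    if ($json === null) {
        continue;                                        // skip malformed JSON
    }
    $ratings = json_encode(transform_rating($json));     // your existing logic
    $update->bind_param('si', $ratings, $row['id']);
    $update->execute();
}
$result->free();

If the writes themselves turn out to be the bottleneck rather than the decoding, wrapping batches of a few thousand updates in a single transaction usually helps as well.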

Related

Query is taking 8 hours for checking and inserting against 3 millions of data

I receive data from about 10,000 new XML files every day, and I always run a query to check whether there is any new data in those XML files; if a record doesn't exist in our database yet, I insert it into our table.
Here is the code:
if (!Dictionary::where('word_code', '=', $dic->word_code)->exists()) {
    // then insert into the database
}
where $dic->word_code comes from those thousands of XML files. The script opens each XML file in turn, checks for every record whether it already exists, inserts it if it doesn't, then moves on to the next file and repeats the same procedure across all 10,000 XML files.
Each XML file is about 40 to 80 MB and contains a lot of data.
I already have 2,981,293 rows so far, and checking my XML files against those 2,981,293 rows before inserting is a really time-consuming and resource-hungry task.
word_code is already indexed.
The current method takes about 8 hours to finish up the procedure.
By the way, I should mention that after running this huge 8-hour procedure, it only ends up adding about 1,500 to 2,000 rows of data per day.
Comparing the files to the database one record at a time is the core issue. Both the filesystem and the database can compare millions of rows very quickly when the comparison is done in bulk.
You have two options.
Option 1:
Keep a backup of the file from the previous run and do a filesystem compare against it to find the differences in the new file.
Option 2:
Load the XML file into a MySQL table using LOAD DATA INFILE. Then run a query on all rows to find both new and changed rows. Be sure to index the table with a well defined unique key to keep this query efficient.
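A rough sketch of option 2, assuming the PHP script first flattens each XML file into a tab-separated text file; the staging table name, the file path, and the column layout are placeholders, not part of the original answer:

CREATE TABLE staging LIKE dictionary;

LOAD DATA INFILE '/tmp/words.tsv'
INTO TABLE staging
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';

-- New rows are the ones whose word_code has no match in the main table.
INSERT INTO dictionary
SELECT s.*
FROM staging s
LEFT JOIN dictionary d ON d.word_code = s.word_code
WHERE d.word_code IS NULL;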
I would split this job into two tasks:
Use your PHP script to load the XML data unconditionally into a temporary table that has no constraints, no primary key, and no indexes. Make sure to truncate that table before loading the data.
Perform a single INSERT statement to merge records from that temporary table into your main table, possibly with an ON DUPLICATE KEY UPDATE or IGNORE option, or otherwise with a negative-join clause. See INSERT IGNORE vs INSERT … ON DUPLICATE KEY UPDATE
For the second point, you could for instance do this:
INSERT IGNORE
INTO main
SELECT *
FROM temp;
If the field to compare is not a primary key in the main table, or is not uniquely indexed, then you might need a statement like this:
INSERT INTO main
SELECT temp.*
FROM temp
LEFT JOIN main m2
ON m2.word_code = temp.word_code
WHERE m2.word_code is NULL;
But this will be a bit slower than a primary-key based solution.

MariaDB Update Table Contents From Changing File

I have the following problem:
I have a dataset in a text file (not XML- or CSV-encoded or anything, just field values separated by \t and \n) which is updated every 2 minutes. I need to put the data from the file into a MariaDB database, which in itself is not very difficult to do.
What I am unsure about, however, is how to go about updating the table when the file's contents change. I thought about truncating the table and then filling it again, but doing that every 2 minutes with about 1,000 records would mean some nasty problems with the database being incomplete during those updates, which makes it not a usable solution (and it wouldn't have been one with fewer records either :D).
Another solution I thought about was to append the new data to the existing table and use a delimiter on the unique column (e.g. use IDs 1-1000 before the update, append the new data, then use IDs 1001-2000 after the update and remove 1-1000; after 2 or so updates, start at ID 1 again).
Updating only the changed fields is not an option, because the raw data format makes it really difficult to keep track of which columns have changed (and which haven't).
I am, however, unsure about best practices, as I am relatively new to SQL and such, and would like to hear your opinions; maybe I am just overlooking something obvious...
Even better...
CREATE TABLE `new` LIKE `real`;   -- permanent, not TEMPORARY
load `new` from the incoming data
RENAME TABLE `real` TO `old`, `new` TO `real`;
DROP TABLE `old`;
Advantages:
The table real is never invisible, nor empty, to the application.
The RENAME is "instantaneous" and "atomic".
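Put together, and assuming the incoming file lives at /path/to/data.txt with tab-separated fields and newline-terminated records (adjust the names and path to your setup), the whole cycle might look like this:

CREATE TABLE `new` LIKE `real`;          -- permanent, not TEMPORARY

LOAD DATA INFILE '/path/to/data.txt'
INTO TABLE `new`
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';

RENAME TABLE `real` TO `old`, `new` TO `real`;
DROP TABLE `old`;

Since NEW and REAL are reserved words in MySQL/MariaDB, the placeholder names need the backticks shown here; in practice you would simply pick non-reserved table names.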
As suggested by Alex, I will create a temporary table, insert my data into the temporary table, truncate the production table and then insert from the temporary table. Works like a charm!

Combine Multiple Rows in MySQL into JSON or Serialize

I currently have a database structure for dynamic forms as such:
grants_app_id | user_id | field_name | field_value
5             | 42434   | full_name  | John Doe
5             | 42434   | title      | Programmer
5             | 42434   | email      | example#example.com
I found this very difficult to manage, and it filled up the rows in the database very quickly. The field_names vary and can take up to 78 rows per form, so updating the field_values or simply searching them proved very costly. I would like to combine the rows and use either JSON or PHP serialize to greatly reduce the impact on the database. Does anyone have any advice on how I should approach this? Thank you!
This would be the expected output:
grants_app_id | user_id | data
5             | 42434   | {"full_name":"John Doe", "title":"Programmer", "email":"example#example.com"}
It seems you don't have a simple primary key in those rows.
Speeding up the current solution:
create an index for (grants_app_id, user_id)
add an auto-incrementing primary key
switch from field_name to field_id
The index will make retrieving full forms a lot more fun (while taking a bit of extra time on insert).
The primary key allows you to update a row by specifying a single value backed by a unique index, which should generally be really fast.
You probably already have some definition of the fields. Add integer IDs and use them to speed things up, since less data has to be stored, compared, and indexed (see the DDL sketch below).
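A possible DDL sketch for these three changes; the table and index names are assumptions, and the switch to field_id would additionally need a small lookup table for the field definitions:

ALTER TABLE form_fields
    ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    ADD INDEX idx_form (grants_app_id, user_id);

-- Hypothetical lookup table so each row can store a small field_id
-- instead of the repeated field_name string.
CREATE TABLE form_field_defs (
    field_id   SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    field_name VARCHAR(64) NOT NULL UNIQUE
);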
Switching to a JSON-Encoded variant
Converting arrays to JSON and back can be done by using json_encode and json_decode since PHP 5.2.
How can you switch to JSON?
Possibly the current best way would be to use a PHP script (or similar) to retrieve all data from the old table, group it correctly, and insert it into a fresh table. Afterwards you can switch the table names. This is an offline approach; a sketch of it follows below.
An alternative would be to add a new column and indicate by field_name=NULL that the new column contains the data. Afterwards you are free to convert data at any time or store only new data as JSON.
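A minimal sketch of the offline approach, assuming the old table is called form_fields and the new one form_data; both names, the credentials, and the column list are placeholders:

<?php
// Sketch only: table names and connection details are assumptions.
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'pass');

$rows = $pdo->query(
    'SELECT grants_app_id, user_id, field_name, field_value
     FROM form_fields
     ORDER BY grants_app_id, user_id'
);

$forms = [];
foreach ($rows as $row) {
    // Group all field name/value pairs belonging to one form submission.
    $key = $row['grants_app_id'] . ':' . $row['user_id'];
    $forms[$key]['grants_app_id'] = $row['grants_app_id'];
    $forms[$key]['user_id']       = $row['user_id'];
    $forms[$key]['fields'][$row['field_name']] = $row['field_value'];
}

$insert = $pdo->prepare(
    'INSERT INTO form_data (grants_app_id, user_id, data) VALUES (?, ?, ?)'
);
foreach ($forms as $form) {
    $insert->execute([
        $form['grants_app_id'],
        $form['user_id'],
        json_encode($form['fields']),
    ]);
}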
Use JSON?
While it is certainly tempting to have all the data in one row, there are some things to remember:
With all fields preserved in a single text field, searching for a value inside one field may become a two-phase approach, as a % inside any LIKE can bleed into other fields' values. Also, LIKE '%field:value%' is not easily optimized by indexing the column.
Changing a single field means rewriting all the stored fields. As long as you are sure only one process changes the data at any given time this is OK; otherwise there tend to be more problems.
The JSON column needs to be big enough to hold field names + values + separators, which can be a lot. Also, if you miscalculate and a field holds a longer value than expected, the column gets truncated, with the risk of losing all information on all fields after the long value.
So in your case, even with 78 different fields, it may still be better to have one row per form, user, and field. (It may even turn out that JSON is more practical for forms with few fields.)
As explained in this question, you have to remember that JSON is just more text as far as MySQL is concerned.

Check for existing entries in Database or recreate table?

I've got a PHP script that pulls a file from a server and plugs its values into a database every 4 hours.
This file can and most likely will change within those 4 hours (or whatever timeframe I finally choose). It's a list of properties and their owners.
Would it be better to check the file and compare it to each DB entry and update any if they need it, or create a temp table and then compare the two using an SQL query?
Neither.
What I'd personally do is run the INSERT command using ON DUPLICATE KEY UPDATE (assuming your table is properly designed and that you are using at least one piece of information from your file as a UNIQUE key, which you should be, based on your comment).
Reasons
Creating a temp table is a hassle.
Comparing is a hassle too. You need to select a record, compare it, update it if it differs, and so on; it's just a giant waste of time to compare piece by piece when there's a better way to do it.
It's so much easier to just insert everything you find; if a key clash occurs, that means the record already exists and most likely needs updating.
That way you take care of everything with one query, your data integrity is preserved, and you can just keep filling your table or updating it with new records (see the sketch below).
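A small sketch of what that looks like from PHP, assuming a properties table with a UNIQUE key on property_id and a hypothetical parse_file() helper that yields [property_id, owner] pairs from the downloaded file; none of these names come from the question itself:

<?php
// Sketch only: table, columns and parse_file() are assumptions.
$pdo  = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'pass');
$stmt = $pdo->prepare(
    'INSERT INTO properties (property_id, owner) VALUES (?, ?)
     ON DUPLICATE KEY UPDATE owner = VALUES(owner)'
);

foreach (parse_file('/path/to/properties.txt') as [$propertyId, $owner]) {
    $stmt->execute([$propertyId, $owner]);   // insert new rows, update existing ones
}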
I think it would be best to download the file and update the existing table, maybe using REPLACE or REPLACE INTO. "REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is deleted before the new row is inserted." http://dev.mysql.com/doc/refman/5.0/en/replace.html
Presumably you have a list of columns that will have to match in order for you to decide that the two things match.
If you create a UNIQUE index over those columns then you can use either INSERT ... ON DUPLICATE KEY UPDATE (manual) or REPLACE INTO ... (manual).

Possible to have a mixed overwrite / append batch write into MySQL?

I am setting up an uploader (using PHP) for my client where they can select a CSV (in a pre-determined format) on their machine to upload. The CSV will likely have 4000-5000 rows. PHP will process the file by reading each line of the CSV and inserting it directly into the DB table. That part is easy.
However, ideally before appending this data to the database table, I'd like to review 3 of the columns (A, B, and C) and check to see if I already have a matching combo of those 3 fields in the table AND IF SO I would rather UPDATE that row than append a new one. If I DO NOT have a matching combo of those 3 columns, I want to go ahead and INSERT the row, appending the data to the table.
My first thought is that I could make columns A, B, and C a unique index in my table and then just INSERT every row, detect a 'failed' INSERT (due to the restriction of my unique index) somehow and then make the update. Seems that this method could be more efficient than having to make a separate SELECT query for each row just to see if I have a matching combo already in my table.
A third approach may be to simply append EVERYTHING, using no MySQL unique index and then only grab the latest unique combo when the client later queries that table. However I am trying to avoid having a ton of useless data in that table.
Thoughts on best practices or clever approaches?
If you make the 3 columns a unique key, you can do an INSERT with ON DUPLICATE KEY UPDATE.
INSERT INTO `table` (a, b, c, d, e, f) VALUES (1, 2, 3, 5, 6, 7)
ON DUPLICATE KEY UPDATE d = 5, e = 6, f = 7;
You can read more about this handy technique here in the MySQL manual.
If you add a unique index on the ( A, B, C ) columns, then you can use REPLACE to do this in one statement:
REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is deleted before the new row is inserted...
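For comparison with the ON DUPLICATE KEY example above, the REPLACE form of the same placeholder statement would be:

REPLACE INTO `table` (a, b, c, d, e, f) VALUES (1, 2, 3, 5, 6, 7);

Note that REPLACE deletes and re-inserts the row, so any columns not listed are reset to their defaults and an AUTO_INCREMENT id changes, whereas ON DUPLICATE KEY UPDATE modifies the existing row in place.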
