I have some issues with sugarcrm.
As you may know, SugarCRM tables use an ID that is a unique string (a GUID) rather than a sequential number, e.g.
4bab37e4-798a-e01c-75de-4e4397f358b7
For example, I would like to copy the table sugarcrm.accounts to something.accounts; in something.accounts I have added some custom fields for another PHP process to use.
Now the problem is that my sugarcrm table has a huge number of records, so I plan to run the copy batch by batch, copying 10,000 records to something.accounts each time.
However, the IDs in sugarcrm.accounts are not sequential, so how do I know what offset to use?
I do not want to alter the sugarcrm tables or add a temporary table in sugarcrm (e.g. sugarcrm.account_index), as that might cause problems when I upgrade.
So does anyone have an idea how I can get an index number? Does MySQL have a hidden row index?
Or does anyone have a better way to copy one database table into another database?
One way is the following:
- Select all rows from sugarcrm.accounts and order by date_created ascending.
- Use limit to only select a subset of the rows (store the offset from batch to batch)
- Copy the subset of rows to something.accounts
If new records are added later they will still be copied, since they will appear last in the set. However, if you delete records from sugarcrm.accounts while the batch jobs are running, you will need to adjust the offset as well, since you might otherwise skip some rows.
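A minimal sketch of that approach, assuming both schemas live on the same MySQL server and that you list the columns explicitly (something.accounts has extra custom fields, and the column names here are only illustrative):
INSERT INTO something.accounts (id, name, date_created)
SELECT id, name, date_created
FROM sugarcrm.accounts
ORDER BY date_created ASC
LIMIT 20000, 10000; -- offset 20000 = the third batch of 10,000; store the offset between runs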
Another way, if the two databases/tables are in the same MySQL instance, is to join the two tables and select the next 10,000 rows that don't yet exist in something.accounts.
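A sketch of that join variant, again with illustrative column names; it picks the next 10,000 rows whose id is not yet in the target table, so you don't need to track an offset at all:
INSERT INTO something.accounts (id, name, date_created)
SELECT s.id, s.name, s.date_created
FROM sugarcrm.accounts s
LEFT JOIN something.accounts t ON t.id = s.id
WHERE t.id IS NULL
LIMIT 10000;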
I have a MySQL database that is becoming really large. I can feel the site becoming slower because of this.
Now, on a lot of pages I only need a certain part of the data. For example, I store information about users every 5 minutes for history purposes. But on one page I only need the information that is the newest (not the whole history of data). I achieve this by a simple MAX(date) in my query.
Now I'm wondering if it wouldn't be better to make a separate table that just stores the latest data, so that the query doesn't have to search for a specific user's latest data among millions of rows but can instead read from a table that holds only the latest row for every user.
The con here would be that I have to run 2 queries to insert the latest history in my database every 5 minutes, i.e. insert the new data in the history table and update the data in the latest history table.
The pro would be that MySQL has a lot less data to go through.
What are common ways to handle this kind of issue?
There are a number of ways to handle slow queries in large tables. The three most basic ways are:
1: Use indexes, and use them correctly. It is important to avoid table scans on large tables; this is almost always your most significant performance hit with single queries.
For example, if you're querying something like: select max(active_date) from activity where user_id=?, then create an index on the activity table for the user_id column. You can have multiple columns in an index, and multiple indexes on a table.
CREATE INDEX idx_user ON activity (user_id)
2: Use summary/"cache" tables. This is what you have suggested. In your case, you could apply an insert trigger to your activity table, which will update your summary table whenever a new row is inserted. This means your code won't need to execute two queries. For example:
CREATE TRIGGER update_summary
AFTER INSERT ON activity
FOR EACH ROW
UPDATE activity_summary SET last_active_date=new.active_date WHERE user_id=new.user_id
You can change that to check whether a row already exists for the user and do an insert if it is their first activity. Or you can insert a row into the summary table when a user registers... or whatever fits your application.
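For example, a sketch of that variant as a single upsert, assuming activity_summary has a PRIMARY or UNIQUE key on user_id (you would create this instead of the UPDATE-only trigger above):
-- drop the earlier UPDATE-only trigger before creating this one
CREATE TRIGGER update_summary
AFTER INSERT ON activity
FOR EACH ROW
INSERT INTO activity_summary (user_id, last_active_date)
VALUES (new.user_id, new.active_date)
ON DUPLICATE KEY UPDATE last_active_date = new.active_date;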
3: Review the query! Use MySQL's EXPLAIN command to get the query plan and see what the optimizer does with your query. Use it to ensure that the optimizer is avoiding table scans on large tables (and either create or force an index if necessary).
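For instance, a quick check on the earlier query (the user_id value is just a placeholder):
EXPLAIN SELECT MAX(active_date) FROM activity WHERE user_id = 123;
-- check the "key" column to confirm idx_user is used, and the "rows" column for the estimated scan size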
I am going to set up a cron job to update some data via an API. I want it to update the database with the new feeds.
i.e. I would have an existing set of entries, and a script would go through the new feed: if an entry is already there, don't update it; if it is not in the DB, add it; and all other entries need to be deleted.
One way I was considering is to have a column called "updated". This would be 0 by default. When a new entry is added, or an existing one is checked, the column's value becomes 1. Once the cron job has finished updating, it would then delete all rows whose value is still 0 and reset the remainder to 0.
Is this the right way to do such a job? If it helps, there are over 10 million rows.
First of all, there is no right or wrong answer; it always depends.
That being said, with your approach you'll be updating all 10M+ rows in your main (target) table twice on each sync, which, depending on how busy this table is, may or may not be acceptable.
You may consider a different approach that is widely used in ETL:
- Load your feed data into a staging table first; do batch inserts or, if possible, use LOAD DATA INFILE - the fastest way of ingesting data into MySQL.
- Optionally build indexes to help with lookups.
- "Massage" your data if necessary (clean up, transform, augment, etc.).
- Insert into the main table all new rows that are present in staging but not in the main table.
- Delete all rows from the main table that are not present in the staging table.
- Truncate the staging table.
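A rough sketch of those steps, with made-up table and column names (staging_feed, main_feed, keyed by entry_id; the file path is only an example):
LOAD DATA INFILE '/tmp/feed.csv' INTO TABLE staging_feed
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
CREATE INDEX idx_staging_entry ON staging_feed (entry_id);
INSERT INTO main_feed (entry_id, title)
  SELECT s.entry_id, s.title FROM staging_feed s
  LEFT JOIN main_feed m ON m.entry_id = s.entry_id
  WHERE m.entry_id IS NULL;
DELETE m FROM main_feed m
  LEFT JOIN staging_feed s ON s.entry_id = m.entry_id
  WHERE s.entry_id IS NULL;
TRUNCATE TABLE staging_feed;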
Hello, I have a MySQL database, and all I want is basically to get a value from a second table based on a first-table query.
I came up with something like this, but it is not working:
select src, dst_number, state, duration
from cdrs, area_code_infos
where SUBSTRING(cdrs.src,2,3) = area_code_infos.`npa`;
Please help me figure this out. I tried running multiple queries one after another in PHP, but after waiting 45 minutes for the page to load I gave up.
Thanks,
I assume the tables are fairly big, and you are also doing an unindexed query: the SUBSTRING has to be calculated for every row.
Whenever you do a join, you want to make sure both of the joined fields are indexed.
An option would be to create another column containing the substring calculation and then create an index on that.
However, a better option would be to have an areaCodeInfosID column and set it as a foreign key to the area_code_infos table.
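For example, a sketch of the extra-column route using the names from the question (the new src_npa column and index names are assumptions):
ALTER TABLE cdrs ADD COLUMN src_npa CHAR(3);
UPDATE cdrs SET src_npa = SUBSTRING(src, 2, 3);
CREATE INDEX idx_cdrs_src_npa ON cdrs (src_npa);
CREATE INDEX idx_area_npa ON area_code_infos (npa);
SELECT c.src, c.dst_number, c.state, c.duration
FROM cdrs c
JOIN area_code_infos a ON a.npa = c.src_npa;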
I've got a PHP script pulling a file from a server and plugging the values in it into a Database every 4 hours.
This file can, and most likely will, change within the 4 hours (or whatever timeframe I finally choose). It's a list of properties and their owners.
Would it be better to check the file and compare it to each DB entry and update any if they need it, or create a temp table and then compare the two using an SQL query?
Neither.
What I'd personally do is run the INSERT command using ON DUPLICATE KEY UPDATE (assuming your table is properly designed and that you are using at least one piece of information from your file as a UNIQUE key, which you should, based on your comment).
Reasons
Creating a temp table is a hassle.
Comparing is a hassle too. You need to select a record, compare it, update it if the values differ, and so on - it's a giant waste of time to compare each piece of info when there's a better way to do it.
It is much easier to just insert everything you find; if a clash occurs, that means the record already exists and most likely needs updating.
That way you take care of everything with one query, your data integrity is preserved, and you can just keep inserting new records or updating existing ones.
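A minimal sketch, assuming a properties table with a UNIQUE key on property_ref (the table, columns, and values here are made up):
INSERT INTO properties (property_ref, owner_name)
VALUES ('P-1001', 'Jane Doe')
ON DUPLICATE KEY UPDATE owner_name = VALUES(owner_name);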
I think it would be best to download the file and update the existing table, maybe using REPLACE or REPLACE INTO. "REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is deleted before the new row is inserted." http://dev.mysql.com/doc/refman/5.0/en/replace.html
Presumably you have a list of columns that will have to match in order for you to decide that the two things match.
If you create a UNIQUE index over those columns, then you can use either INSERT ... ON DUPLICATE KEY UPDATE or REPLACE INTO ... (see the manual for each).
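A sketch with the same assumed table names as above; note that REPLACE deletes the old row and inserts a new one, so any columns you don't supply are reset to their defaults:
ALTER TABLE properties ADD UNIQUE KEY uq_property_ref (property_ref);
REPLACE INTO properties (property_ref, owner_name)
VALUES ('P-1001', 'Jane Doe');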
My site has lots of incoming searches, which are stored in a database to show recent queries on my website. Due to the high volume of searches my database is getting bigger, so what I want is to keep only the most recent queries in the database, say 10 records. That keeps my database small and queries will be faster.
I am able to store incoming queries in the database, but I don't know how to restrict the table's size or delete the excess/old data.
Any help? I am using PHP and MySQL.
Hopefully you have a timestamp column in your table (or have the freedom to add one). AFAIK, you have to add the timestamp explicitly when you add data to the table. Then you can do something along the lines of:
DELETE FROM tablename WHERE timestamp < '<a date two days in the past, or whatever>';
You'd probably want to just do this periodically, rather than every time you add to the table.
I suppose you could also just limit the size to the most recent ten records by checking the size of the table every time you are about to add a line, and deleting the oldest record (again, using the timestamp column you added) if adding the new record will make it too large.
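A sketch of that trim, assuming a searches table with a primary key id and a timestamp column called created_at (the derived-table wrapper works around MySQL's restrictions on LIMIT inside an IN subquery and on selecting from the table being deleted from):
DELETE FROM searches
WHERE id NOT IN (
  SELECT id FROM (
    SELECT id FROM searches ORDER BY created_at DESC LIMIT 10
  ) AS newest
);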
Falkon's answer is good - though you might not want to have your archive in a table, depending on your needs for that forensic data. You could also set up a cron job that just uses mysqldump to make a backup of the database (with the date in the filename), and then delete the excess records. This way you can easily make backups of your old data, or search it with whatever tool, and your database stays small.
You should write a PHP script that will be started by cron (e.g. once a day) and move some data from the main table TableName to an archive table TableNameArchive with exactly the same structure.
The SQL inside the script should look like:
INSERT INTO TableNameArchive
SELECT * FROM TableName WHERE data < '2010-06-01'; -- of course, you should provide your own condition here
Next, you should DELETE the old records from TableName.
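For example, mirroring the same condition as the INSERT above (substitute your own column and cutoff):
DELETE FROM TableName WHERE data < '2010-06-01';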