MySQL table creation & index for huge table - PHP

I have a peculiar situation that brings me to you for advice, as this is the first time in quite a while where I don't have a clue where to start.
I have a large database with a single table that contains about 700k rows. The table is mostly text strings with a lot of indexes, since there are many ways to interface with the data on our website.
I am attempting to build a single tool that sounds very simple:
In this table there is a column called business_codes.
This column contains a series of two-digit strings separated by a tilde "~".
This column cannot be NULL, so it always contains at least one two-digit string for each record.
We have a job that updates these records directly from our vendor each hour via cron & some PHP.
The tool I am trying to create will pull the strings for each record, then cross-reference each one with all the other records in the database and return the number of times it is used.
For instance: you log in to the site and check this tool.
Your own business_codes in the database are AB~CD~EF
I want the tool to sort through each of your business_codes and output suggested business_codes to add to your profile based on the use of other records with the same business code.
The tool would look up each of your own codes and report something like:
84% of other records with AB in their profile also use HI
77% of other records with CD in their profile also use LM
This is absolute madness. My first stab at it was to create a metadata table and import the codes into the new table individually without the ~.
I can write the PHP to display all of this; my question is about the structure of the table (if a table even needs to be created). I am at a loss on how to even approach this issue.
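For illustration, here is roughly what I had in mind for that metadata table, plus the kind of co-occurrence query I imagine running against it (all of the table and column names here are placeholders I made up, not our real schema):

-- One row per (record, code) pair, populated by splitting business_codes on "~"
CREATE TABLE business_code_map (
    record_id     INT UNSIGNED NOT NULL,  -- FK to the main 700k-row table
    business_code CHAR(2)      NOT NULL,
    PRIMARY KEY (record_id, business_code),
    KEY idx_code (business_code)
) ENGINE=InnoDB;

-- For one of your codes (e.g. 'AB'), find the other codes that co-occur with it
-- and the percentage of 'AB' records that also carry each of them.
SELECT other.business_code,
       COUNT(*) * 100.0 /
           (SELECT COUNT(DISTINCT record_id)
              FROM business_code_map
             WHERE business_code = 'AB') AS pct_also_used
  FROM business_code_map AS mine
  JOIN business_code_map AS other
    ON other.record_id = mine.record_id
   AND other.business_code <> mine.business_code
 WHERE mine.business_code = 'AB'
 GROUP BY other.business_code
 ORDER BY pct_also_used DESC;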
Any help would be appreciated.

Related

MySQL table from phpMyAdmin's tracking reports

Using phpMyAdmin I can track a certain table's transactions (new inserts, deletes, etc.), but is it possible to convert or export that tracking report to a SQL table that can be imported into my site using PHP?
While it is not exactly what I was looking for, I found a way to do it for the time being. One of the tables in phpMyAdmin's own database is called pma__tracking, and it contains a record of all tables being tracked. One of its columns is data_sql, a LONGTEXT column that stores each report (in ascending order, which is a bit annoying) in the following format:
# log date username
data definition statement
Just adding it here for future reference.
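If it helps anyone later, the raw report text can be pulled straight out of that table with a query along these lines (the phpmyadmin schema name and the filter values depend on your configuration storage setup, so treat them as placeholders):

SELECT db_name, table_name, data_sql
  FROM phpmyadmin.pma__tracking
 WHERE db_name = 'my_database'      -- placeholder database name
   AND table_name = 'my_table';     -- placeholder table name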

How can I compare a mysql table between two databases and update the differences efficiently?

Here is the setup: I have multiple online stores that I would like to have share the same product database. Currently they are all separate, so updating anything requires going through and copying products over; it is a giant pain. What I would like to do is create a master product database that each site will compare its own database against every night and make updates accordingly.
The idea is one master database of products that will be updated a few times a day, and then say at 2:00 AM, a cron job will run pulling the updates to the individual websites.
Just a few more details on the database: there is one table, 'products', that needs to be compared, but it also needs to look at the table 'products_site_status' to determine the value of each product's status for each given site, so I can't simply dump the master table and re-import it into the site databases.
Creating a PHP script to go row by row to compare and update would be easy enough, but I was hoping there was a more elegant/efficient solution in MySQL. Any suggestions?
Thanks!
To sum up, you could try three different methods (a sketch of the first follows the list):
use SELECT ... INTO OUTFILE and then LOAD DATA INFILE, as described in MySQL Cross Server Select Query
use the replication approach described here Perl: How to copy/mirror remote MYSQL table(s) to another database? Possibly different structure too?
use a FEDERATED storage engine to join tables from different servers http://dev.mysql.com/doc/refman/5.0/en/federated-storage-engine.html
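A minimal sketch of the first approach, assuming a staging table on each site server (the file path and the products_staging table are made-up names):

-- On the master server: dump the products table to a flat file
SELECT *
  FROM products
  INTO OUTFILE '/tmp/products_master.csv'
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\n';

-- On each site server: load the file into a staging table, then reconcile
-- it against the live products table (joining products_site_status as
-- needed for the per-site status values)
LOAD DATA INFILE '/tmp/products_master.csv'
  INTO TABLE products_staging
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\n';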

Which database table schema for storing survey data?

I'm developing software for conducting online surveys. When a lot of users are filling in a survey simultaneously, I'm experiencing trouble handling the high database write load. My current table (MySQL, InnoDB) for storing survey data has the following columns: dataID, userID, item_1 .. item_n. The item_* columns have different data types corresponding to the type of data acquired with the specific items. Most item columns are TINYINT(1), but there are also some TEXT item columns. Large surveys can have more than a hundred items, leading to a table with more than a hundred columns. The user answers around 20 items in one HTTP post, and the corresponding row has to be updated accordingly. The user may skip a lot of items, leading to a lot of NULL values in the row.
I'm considering the following solution to my write load problem. Instead of having a single table with many columns, I set up several tables corresponding to the data types used, e.g.: data_tinyint_1, data_smallint_6, data_text. Each of these tables would have only the following columns: userID, itemID, value (the value column has the data type corresponding to its table). For one HTTP post with e.g. 20 items, I then might have to create 19 rows in data_tinyint_1 and one row in data_text (instead of updating one large row with many columns). However, for every item, I need to determine its data type (via two table joins) so I know which table to create the new row in. My Zend Framework-based application code will get more complicated with this approach.
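For reference, a rough sketch of what the per-datatype tables described above might look like; the table and column names mirror the description, and the key types would need adjusting to the real schema:

CREATE TABLE data_tinyint_1 (
    userID INT UNSIGNED NOT NULL,
    itemID INT UNSIGNED NOT NULL,
    value  TINYINT(1),
    PRIMARY KEY (userID, itemID)
) ENGINE=InnoDB;

CREATE TABLE data_text (
    userID INT UNSIGNED NOT NULL,
    itemID INT UNSIGNED NOT NULL,
    value  TEXT,
    PRIMARY KEY (userID, itemID)
) ENGINE=InnoDB;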
My questions:
Will my solution be better for heavy write load?
Do you have a better solution?
Since you're getting to the point of abstracting this schema to mimic actual data types, it might stand to reason that you should simply create a new table set per survey instead. The benefit is that locking will lessen, and you could isolate heavy loads onto separate machines if the load becomes unbearable.
The single-survey database structure then can more accurately reflect your real world conditions and data input handlers. It ought to make your abstraction headaches go away.
There's nothing wrong with creating tables on the fly. In some configurations, soft sharding is preferable.
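As a rough sketch of the per-survey idea, each published survey would get its own response table, shaped to exactly the items that survey contains (the naming scheme and columns below are just an example):

-- Hypothetical table created when survey 42 is published
CREATE TABLE survey_42_responses (
    dataID INT UNSIGNED NOT NULL AUTO_INCREMENT,
    userID INT UNSIGNED NOT NULL,
    item_1 TINYINT(1),
    item_2 TINYINT(1),
    item_3 TEXT,
    PRIMARY KEY (dataID),
    KEY idx_user (userID)
) ENGINE=InnoDB;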
The obvious solution here looks like using a document database for fast writes, then bulk-inserting the answers into MySQL asynchronously using cron or something like that. You can create a view in the document database for quick statistics, but allow filtering and other complicated queries only in MySQL if you're not a fan of document DBMSs.

Archive MySQL data using PHP every week

I have a MySQL DB that receives a lot of data from a source once every week on a certain day of the week at a given time (about 1.2 million rows) and stores it in, let's call it, the "live" table.
I want to copy all the data from "live" table into an archive and truncate the live table to make space for the next "current data" that will come in the following week.
Can anyone suggest an efficient way of doing this? I am really trying to avoid INSERT INTO archive_table SELECT * FROM live. I would like the ability to run this archiver using PHP, so I can't use Maatkit. Any suggestions?
EDIT: Also, the archived data needs to be readily accessible. Since every insert is timestamped, if I want to look for the data from last month, I can just search for it in the archives.
The sneaky way:
Don't copy records over. That takes too long.
Instead, just rename the live table out of the way, and recreate:
RENAME TABLE live_table TO archive_table;
CREATE TABLE live_table (...);
It should be quite fast and painless.
EDIT: The method I described works best if you want an archive table per rotation period. If you want to maintain a single archive table, you might need to get trickier. However, if you just want to do ad-hoc queries on historical data, you can probably just use UNION.
If you only wanted to save a few periods worth of data, you could do the rename thing a few times, in a manner similar to log rotation. You could then define a view that UNIONs the archive tables into one big honkin' table.
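For example, something along these lines, assuming a few rotated archive tables with identical structure already exist (the table names are placeholders):

CREATE OR REPLACE VIEW archive_all AS
    SELECT * FROM archive_week_1
    UNION ALL
    SELECT * FROM archive_week_2
    UNION ALL
    SELECT * FROM archive_week_3;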
EDIT2: If you want to maintain auto-increment stuff, you might hope to try:
RENAME TABLE live TO archive1;
CREATE TABLE live (...);
ALTER TABLE live AUTO_INCREMENT = (SELECT MAX(id) FROM archive1);
but sadly, that won't work. However, if you're driving the process with PHP, that's pretty easy to work around.
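The workaround is just to read MAX(id) first and then issue the ALTER with a literal value. If you'd rather keep it in SQL than do the two steps in PHP, a prepared statement built from a user variable does the same thing:

SELECT MAX(id) + 1 INTO @next_id FROM archive1;   -- assumes archive1 is not empty
SET @ddl = CONCAT('ALTER TABLE live AUTO_INCREMENT = ', @next_id);
PREPARE stmt FROM @ddl;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;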
Write a script to run as a cron job to:
Dump the archive data from the "live" table (this is probably more efficient using mysqldump from a shell script)
Truncate the live table
Modify the INSERT statements in the dump file so that the table name references the archive table instead of the live table
Append the archive data to the archive table (again, could just import from dump file via shell script, e.g. mysql dbname < dumpfile.sql)
This would depend on what you're doing with the data once you've archived it, but have you considered using MySQL replication?
You could set up another server as a replication slave, and once all the data gets replicated, run your delete or truncate with SET sql_log_bin = 0 before it to avoid that statement also being replicated.
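Concretely, on the master that would look something like this (session-scoped, so only the statements from this connection skip the binary log; live_table is a placeholder name):

SET SESSION sql_log_bin = 0;   -- this session's statements are not written to the binlog
TRUNCATE TABLE live_table;     -- the slave never sees this, so it keeps the full history
SET SESSION sql_log_bin = 1;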

How can I search all of the databases on my MySQL server for a single string of information

I have around 150 different databases, with dozens of tables each, on one of my servers. I am looking to see which database contains a specific person's name. Right now, I'm using phpMyAdmin to search each database individually, but I would really like to be able to search all databases and all tables at once. Is this possible? How would I go about doing this?
A solution would be to use the information_schema database to list all databases, all tables, and all fields, and loop over all of that...
There is this script that could help with at least part of the work: anywhereindb (quoting):
This code searches all the tables and all the rows and columns in a MySQL database. The code is written in PHP. For faster results, we are only searching in the varchar fields.
But, as Harmen noted, this only works with one database, which means you'd have to wrap something around it to loop over each database on your server.
For more information about that, take a look at Chapter 19. INFORMATION_SCHEMA Tables; especially the SCHEMATA table, which contains the names of all databases on the server.
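For instance, getting the list of databases to loop over is a one-liner against that table:

SELECT SCHEMA_NAME
  FROM information_schema.SCHEMATA
 WHERE SCHEMA_NAME NOT IN ('information_schema', 'mysql');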
Here's another solution, based on a stored procedure, which means fewer client/server calls and might make it faster: http://kedar.nitty-witty.com/miscpages/mysql-search-through-all-database-tables-columns-stored-procedure.php
The right way to go about it would be to NORMALIZE your data in the first place!!!
You say "name", but most people have at least two names (a surname and a forename). Are these split up or in the same field? If they are in the same field, what order do they appear in? How are they capitalized?
The most efficient way to try to identify where the data might be would be to write a program in C that sifts through the raw data files (while the DBMS is shut down) looking for the data, but that will only tell you which table they appear in.
Failing that, you need to write some PHP that iterates through each database ('SHOW DATABASES' works much like a SELECT statement), then iterates through each table in the database, then generates a SELECT statement filtering on each CHAR or VARCHAR column large enough to hold the name you are looking for (try running 'DESC $table').
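As a starting point, the candidate columns can be pulled straight from information_schema instead of running DESC per table; something like this (the 30-character threshold is an arbitrary guess at "large enough to hold a name"):

SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME
  FROM information_schema.COLUMNS
 WHERE DATA_TYPE IN ('char', 'varchar')
   AND CHARACTER_MAXIMUM_LENGTH >= 30
   AND TABLE_SCHEMA NOT IN ('information_schema', 'mysql');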
Good luck.
C.
The best answer probably depends on how often you want to do this. If it is ad-hoc once a week type stuff then the above answers are good.
If you want to do this kind of search once a second, maybe create a "data warehouse" database that contains just the tables/columns you want to search (heavily indexed, with a reference back to the source database if that is needed), populated by a cron job or by stored procedures driven by changes in the 150 databases...
