Convert strings to ids as MySQL composite key - php

I send many requests per second to the database, following this pattern:
name1 | name2 | time
------|-------|-----
aaa   | bbb   |    5
aaa   | bbb   |    2
ccc   | ddd   |    3
name1-name2 is a composite key; there are maybe 10 combinations of those, and more than a million records in the table.
Now, because these are strings, selecting from that table is really slow. I've been thinking about converting this composite key to some integer to speed it up, and making it one column instead of two.
Will this really speed up my database?
How can I convert this composite key to achieve this?
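One common approach (a sketch, not from the original question; every table and column name here is hypothetical) is a small lookup table that assigns each (name1, name2) pair a tiny integer, which the large table then stores and indexes instead of the two strings:

CREATE TABLE name_pair (
  pair_id TINYINT UNSIGNED NOT NULL AUTO_INCREMENT, -- ~10 pairs fit easily
  name1   VARCHAR(50) NOT NULL,
  name2   VARCHAR(50) NOT NULL,
  PRIMARY KEY (pair_id),
  UNIQUE KEY (name1, name2)            -- still lets you look a pair up by name
);

CREATE TABLE measurement (
  pair_id TINYINT UNSIGNED NOT NULL,   -- references name_pair.pair_id
  time    INT NOT NULL,
  KEY (pair_id),                       -- narrow integer index instead of two string columns
  FOREIGN KEY (pair_id) REFERENCES name_pair (pair_id)
);

Whether this actually helps depends on row size and index usage; with only ~10 distinct pairs, the main win is the much smaller index.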

Related

Convert string of words to unique number

I'm building my own custom speller that should correct a word or a number of words to a custom correction.
For that I created a SQL table with the following structure:
| id (int 11) | keyword (varchar 255) | correction (varchar 255) |
|-------------|-----------------------|--------------------------|
| 1           | Facebooc              | Facebook                 |
| 2           | I lovi you            | I love you               |
| 3           | This is a tsst        | This is a test           |
The keyword column is marked as unique and has an ascending index on it.
A keyword can be more than one word (a batch of words).
When I get a request with a new keyword, my code runs a SELECT to check whether this specific keyword has a correction (if the keyword does not exist, it inserts the new keyword into the table without a correction).
Now I expect this table to get very large (about 10 million rows or even more), so I thought that maybe placing a unique flag and an index on the keyword column is not such a good idea.
Is this structure good for my needs?
I thought maybe to add another int column to the table, and to check whether there is a way to convert each keyword to a unique number, so it would be easier to search and select data. Is that a good idea?
You can add a column with a short checksum as provided by PHP's crc32() function.
However, crc32() does not generate unique values: there is a probability greater than 0 that 2 strings produce the same checksum.
If the checksum of a new keyword is not found, the keyword is certainly not yet in the database.
If matching checksums are found, then the keywords themselves have to be compared.
Whether this method brings advantages in speed also depends heavily on the performance of the database system.
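A minimal sketch of that lookup pattern in PHP (assuming a mysqli connection in $mysqli, a hypothetical table name corrections, and an indexed keyword_crc INT UNSIGNED column next to keyword):

// The checksum narrows the search via the integer index; comparing the
// keyword itself afterwards handles crc32() collisions.
$crc = crc32($keyword);
$stmt = $mysqli->prepare(
    'SELECT correction FROM corrections WHERE keyword_crc = ? AND keyword = ?');
$stmt->bind_param('is', $crc, $keyword);
$stmt->execute();
$row = $stmt->get_result()->fetch_assoc();
if ($row === null) {
    // Keyword not seen before: insert it, without a correction yet.
    $ins = $mysqli->prepare(
        'INSERT INTO corrections (keyword_crc, keyword) VALUES (?, ?)');
    $ins->bind_param('is', $crc, $keyword);
    $ins->execute();
}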

What's the most efficient way to store date-based availabilities

I'm new here and have a project with a performance problem that seems hard to fix. I have created a search for objects that have availabilities, meaning a very simple structure:
ObjectID | Date       | Number of available objects
---------------------------------------------------
Object1  | 01.01.2019 | 1
Object1  | 02.01.2019 | 1
Object1  | 03.01.2019 | 0
Object1  | 04.01.2019 | 1
Object1  | 05.01.2019 | 1
Object2  | 01.01.2019 | 1
Object2  | 02.01.2019 | 1
Object2  | 03.01.2019 | 0
Object2  | 04.01.2019 | 1
Object2  | 05.01.2019 | 1
I'm working with MySQL and PHP.
A typical query would be: which objects are available between 01.01.2019 and 28.02.2019 for 10 days in a row?
It's not really hard to make this work with MySQL, but once more than 10 users are using the search function, the server load becomes extremely high, even though the table is optimised (indexes etc.). The server has 2 cores and 4 GB of RAM.
I also tried storing the dates comma-separated per object in a table and letting the application do the searching, but that creates extremely high traffic between the application and the database, which is not a real solution either.
In total we have around 20,000 objects with availabilities stored for at most 500 days, so my first solution holds around 10,000,000 rows.
Does anybody have an idea what the most efficient way to do this is? (How should I store it to make the search fast?)
For this project I sadly cannot cache the searches.
Thanks for your help and kind regards, Christoph
Don't store dates in 28.02.2019 format. Flip it over (2019-02-28), then use a DATE datatype in the table. Please provide SHOW CREATE TABLE.
What is your algorithm for searching?
The header says "number of available objects", yet the values seem to be only 0 or 1, as if it were a boolean flag?
What is the maximum timespan? (If under 64, there are bit-oriented tricks we could play.)
By looking at adjacent rows (cf LAG(), if using MySQL 8.0), decide when an object changes state. Save those dates.
From that, it is one more hop to get "how many consecutive days" starting at one of those dates. This will be a simple query, and very fast if you have a suitable composite index.
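A sketch of that approach (assuming MySQL 8.0 and a hypothetical table availability(object_id, avail_date DATE, available TINYINT) with one row per object per day):

-- Classic gaps-and-islands: mark each state change, number the runs,
-- then keep runs of available days that are long enough.
WITH runs AS (
  SELECT object_id, avail_date, available,
         SUM(state_change) OVER (PARTITION BY object_id
                                 ORDER BY avail_date) AS grp
  FROM (
    SELECT object_id, avail_date, available,
           CASE WHEN available <> LAG(available, 1, -1)
                     OVER (PARTITION BY object_id ORDER BY avail_date)
                THEN 1 ELSE 0 END AS state_change
    FROM availability
    WHERE avail_date BETWEEN '2019-01-01' AND '2019-02-28'
  ) AS flagged
)
SELECT object_id, MIN(avail_date) AS run_start, COUNT(*) AS run_days
FROM runs
WHERE available = 1
GROUP BY object_id, grp
HAVING COUNT(*) >= 10;  -- at least 10 available days in a row

A composite index on (object_id, avail_date) keeps the window scans cheap.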

How to use a unique value for the primary key of two tables in phpMyAdmin / SQL? (used in PHP)

I have 2 tables like this:
1. private_messages table:
messageId | message
----------|--------
1         | text1
4         | text4
2. public_messages table:
messageId | message
----------|--------
2         | text2
3         | text3
5         | text5
In both tables, the messageId column is the primary key.
Now I want these two columns to be auto-increment and to hold IDs that are unique across both tables, as shown above.
Currently, when I want to insert a row into one of the tables, I have to find the max ID of each table, compare them to find the overall max, then increase that and insert the new row.
I want to know: is there any better or automatic way, so that when I insert a new row, the database does this automatically?
thanks
You can obtain unique numbers in MySQL with a programming pattern like the following.
First create a table for the sequence. It has an auto-increment field and nothing else.
CREATE TABLE sequence (
  sequence_id INT NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (sequence_id)
);
Then when you need to insert a unique number into one of your tables, use something like these queries:
INSERT INTO sequence () VALUES ();
DELETE FROM sequence WHERE sequence_id < LAST_INSERT_ID();
INSERT INTO private_messages (messageId, message)
VALUES (LAST_INSERT_ID(), 'the message');
The second INSERT is guaranteed to use a unique sequence number. This guarantee holds even if you have dozens of different client programs connected to your database. That's the beauty of AUTO_INCREMENT.
The second query (DELETE) keeps the table from getting big and wasting space. We don't care about any rows in the table except for the most recent one.
Edit: If you're using PHP, simply issue the three queries one after the other using three calls to mysqli_query() or the equivalent method in the MySQL interface you have chosen for your program.
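For instance, a minimal sketch with mysqli ($mysqli and $message are assumed to exist):

// 1. Grab a fresh number from the sequence table.
$mysqli->query("INSERT INTO sequence () VALUES ()");
$id = $mysqli->insert_id; // LAST_INSERT_ID() is per-connection, so this is race-free

// 2. Trim old rows; only the most recent one matters.
$mysqli->query("DELETE FROM sequence WHERE sequence_id < $id");

// 3. Use the number as the new message's primary key.
$stmt = $mysqli->prepare(
    "INSERT INTO private_messages (messageId, message) VALUES (?, ?)");
$stmt->bind_param('is', $id, $message);
$stmt->execute();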
All that being said, beware of false economy. Don't forget that storage on Amazon S3 costs USD 0.36 per year per gigabyte. And that's the most expensive storage. The "wasted" storage cost for putting your two kinds of tables into a single table will likely amount to a few dollars. Troubleshooting a broken database app in production will cost thousands of dollars. Keep it simple!
Use a flag in a single table, like 1 for private messages and 0 for public, so it is easy to insert and easy to fetch and compare:
messageId | message | flag
----------|---------|-----
1         | text1   | 1
2         | text2   | 0
3         | text3   | 0
4         | text4   | 1
5         | text5   | 0
There is no way to do this automatically that I'm aware of.
You might be able to write a function in the DB to make it happen, but I don't recommend it.
Mark Baker's suggestion to have a single messages table with a public/private flag sounds like the best way to go if you absolutely need IDs to be unique across both types of messages.

Followers/following database structure

My website has a followers/following system (like Twitter's). My dilemma is creating the database structure to handle who's following whom.
What I came up with was creating a table like this:
id  | user_id | followers | following
----|---------|-----------|-----------
1   | 20      | 23,58,84  | 11,156,27
2   | 21      | 72,35,14  | 6,98,44,12
... | ...     | ...       | ...
Basically, I was thinking that each user would have a row with columns for their followers and the users they're following, with the user IDs separated by commas.
Is this an effective way of handling it? If not, what's the best alternative?
That's the worst way to do it. It's against normalization. Have 2 separate tables: Users and User_Followers. Users will store the user information. User_Followers will look like this:
id | user_id | follower_id
---|---------|------------
1  | 20      | 45
2  | 20      | 53
3  | 32      | 20
user_id and follower_id will be foreign keys referring to the id column in the Users table.
There is a better physical structure than proposed by other answers so far:
CREATE TABLE follower (
  user_id     INT, -- References user.
  follower_id INT, -- References user.
  PRIMARY KEY (user_id, follower_id),
  UNIQUE INDEX (follower_id, user_id)
);
InnoDB tables are clustered, so the secondary indexes behave differently than in heap-based tables and can have unexpected overheads if you are not cognizant of that. Having a surrogate primary key id just adds another index for no good reason [1] and makes indexes on {user_id, follower_id} and {follower_id, user_id} fatter than they need to be (because secondary indexes in a clustered table implicitly include a copy of the PK).
The table above has no surrogate key id and (assuming InnoDB) is physically represented by two B-trees (one for the primary/clustering key and one for the secondary index), which is about as efficient as it gets for searching in both directions [2]. If you only need one direction, you can abandon the secondary index and go down to just one B-tree.
BTW what you did was a violation of the principle of atomicity, and therefore of 1NF.
[1] And every additional index takes space, lowers cache effectiveness, and impacts INSERT/UPDATE/DELETE performance.
[2] From followee to follower and vice versa.
One weakness of the comma-separated representation is that each relationship is encoded twice: once in the row of the follower and once in the row of the followed user, which makes data integrity harder to maintain and updates tedious.
I would make one table for users and one table for relationships. The relationship table would look like:
id  | follower | following
----|----------|----------
1   | 23       | 20
2   | 58       | 20
3   | 84       | 20
4   | 20       | 11
...
This way adding new relationships is simply an insert, and removing relationships is a delete. It's also much easier to roll up the counts to determine how many followers a given user has.
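For example (a sketch; the table name relationships is assumed):

-- User 23 starts following user 20: a single insert.
INSERT INTO relationships (follower, following) VALUES (23, 20);

-- User 23 unfollows user 20: a single delete.
DELETE FROM relationships WHERE follower = 23 AND following = 20;

-- How many followers does user 20 have?
SELECT COUNT(*) FROM relationships WHERE following = 20;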
No, the approach you describe has a few problems.
First, storing multiple data points as comma-separated strings has a number of issues. It's difficult to join on (and while you can join using LIKE, it will hurt performance), it's difficult and slow to search on, and it can't be indexed the way you would want.
Second, if you store both a list of followers and a list of people being followed, you have redundant data (the fact that A is following B shows up in two places). That is both a waste of space and a chance for the data to get out of sync (if the database shows A on B's list of followers, but doesn't show B on A's list of following, the data is inconsistent in a way that's very hard to recover from).
Instead, use a join table: a separate table where each row holds a user id and a follower id. This stores each fact in one place, allows indexing and joining, and also lets you add additional columns to that row, for example to record when the following relationship started.

Storing huge arrays in the database

I am working on an image processing project. I am using PHP for the GUI, MATLAB for the algorithm, and MySQL as my database. I have 30,000 images stored in the database for now. My MATLAB program will generate 3 arrays of 300 elements each for every image. So, my questions are:
Should I save the arrays generated by MATLAB for all images in a single txt file, or create a txt file per image? Which method makes it easier to retrieve the data and store it in the database?
How hard is it to copy an array from a txt file and save it into the database? Is there any standard process for this?
The elements of the array must be retrieved for further computation. Can we use serialize and unserialize for this purpose?
I have to compare 2 arrays at a time, element by element, and obtain a third array with the minimum values from both. E.g. for A = [1 2 3 4] and B = [6 1 4 2], comparing each element of A with the corresponding element of B and keeping the smaller one gives C = [1 1 3 2]. This process is repeated with thousands of arrays, each compared against 1 fixed array. Is there any PHP function to do this?
Any suggestions and help will be highly appreciated. Thank you.
The best solution in this case is to create a separate meta table that stores the data (arrays) related to your images.
The combination of image_id (a referencing foreign key), array, and index makes up the primary key for the meta table.
I'm assuming you can safely represent the keys of your arrays as 0-indexed, and that all values of your arrays are numbers too (which is why all fields are of type INT; if not, adjust the datatypes accordingly).
image_id identifies the linked image.
array identifies the specific array (you said up to 3 arrays per image, so values in this column would be anywhere from 1 to 3).
index is the numerical index in the array.
value is the value in the array paired with that index.
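A CREATE TABLE sketch of that layout (names are illustrative; note the backticks: index is a reserved word in MySQL, and array is reserved from MySQL 8.0.17):

CREATE TABLE image_meta (
  image_id INT NOT NULL,      -- references the images table
  `array`  TINYINT NOT NULL,  -- which of the up-to-3 arrays (1-3)
  `index`  SMALLINT NOT NULL, -- 0-based position within the array
  `value`  INT NOT NULL,      -- the element at that position
  PRIMARY KEY (image_id, `array`, `index`)
);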
Example data in your meta table might look like:
image_id | array | index | value
---------|-------|-------|------
256      | 1     | 0     | 5
256      | 1     | 1     | 9
256      | 1     | 2     | 4
256      | 1     | 3     | 23
256      | 1     | 4     | 1
256      | 2     | 0     | 9
256      | 2     | 1     | 15
256      | 2     | 2     | 8
256      | 2     | 3     | 19
256      | 2     | 4     | 11
In the above example data, we have two arrays (represented by 1 and 2 in the array column) with 5 elements each (the key in the index column and the value in the value column).
You can store however many arrays you want, with however many indexes you want.
You can also perform your needed array comparison right in MySQL using GROUP BY. Here is how you can find the minimum value for each index across all arrays for image_id 256 (say there are 3 arrays); the backticks are required because index is a reserved word:
SELECT `index`, MIN(`value`) AS minvalue
FROM image_meta
WHERE image_id = 256
GROUP BY `index`;
Since a composite index is set up on (image_id, array, index), this should be extremely quick.
This design also allows you to have a variable number of indexes and arrays per image.
Reading through your problems, and regardless of my comment, a relational database would serve you better, and if performance is not a big issue, then SQL, not PHP, is the right level at which to solve these problems.
And yes, PHP also has easy ways to solve them, albeit not as easily as SQL.
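For the element-wise minimum (point 4 of the question), a one-line sketch in PHP:

// array_map() with more than one array passes the corresponding elements
// of each array to the callback, so 'min' is applied pairwise.
$A = [1, 2, 3, 4];
$B = [6, 1, 4, 2];
$C = array_map('min', $A, $B); // $C is [1, 1, 3, 2]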
