I have a search page written in PHP, and it needs to search the MySQL database, and the results need to be sortable. This search page will be accessed by many users (>1000 at any time).
However, it is not feasible to sort the search results in MySQL, as it would be very slow.
I'm thinking of storing each search result in a temporary table (not a MySQL temporary table), with the table name stored in another table for reference, like this:
| id | table_name | timeout |
|----|------------|---------|
| 1  | result_1   | 10000   |
| 2  | result_2   | 10000   |
Then I can use the temporary tables to sort any search result whenever needed, without having to reconstruct (and slightly modify) the query.
Each table will be dropped after the specified timeout.
Assuming I cannot modify the structure of the existing tables used in the query, would this be a good solution, or are there better ways? Please advise.
Thanks
There's no need to go to the trouble of storing the results in a persistent database when you just want to cache search results in memory. Do you need indexed access to relational data? If the answer is no, don't store it in a MySQL database.
I know that phpbb (an open source web forum which supports MySQL backends) uses a key-value store to back its search results. If the forum is configured to give you a link to the specific results page (with the search id hash in the URL's query string), then that link will be valid for a while but eventually flushed out of the cache, just like you want. It may be overkill to implement a full database abstraction layer if you're set on MySQL, though. Anyway:
http://wiki.phpbb.com/Cache
You should just use memcached or something to store the results data, and you can easily retrieve the data and sort it in PHP. Also there are some PHP-specific cache frameworks that minimize the cost of loading and offloading data from the interpreter:
https://en.wikipedia.org/wiki/List_of_PHP_accelerators
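To illustrate the memcached idea, here is a minimal sketch, assuming a Memcached server on localhost and a hypothetical runSearch() function that returns the result rows as associative arrays:

    <?php
    // Cache the raw search results once, then sort them in PHP on each request.
    $memcached = new Memcached();
    $memcached->addServer('127.0.0.1', 11211);

    $cacheKey = 'search_' . md5($searchTerms);      // $searchTerms comes from the form
    $results  = $memcached->get($cacheKey);

    if ($results === false) {                       // cache miss: hit MySQL once
        $results = runSearch($searchTerms);         // hypothetical function returning rows
        $memcached->set($cacheKey, $results, 600);  // keep for 10 minutes (your "timeout")
    }

    // Sort by whichever column the user clicked, without touching MySQL again.
    // $sortColumn comes from the user's sort choice (whitelist it!).
    usort($results, function ($a, $b) use ($sortColumn) {
        return $a[$sortColumn] <=> $b[$sortColumn];
    });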
I am planning on using MongoDB for my comments feature, but all the user data is in MySQL. That is, I am trying to store the comments in MongoDB and the other, fixed information in MySQL. But when I started to think about retrieving comments from MongoDB, I ran into the question of how to relate the MongoDB data to the MySQL data.
For example,
user name and profile_url are stored in MySQL
comments are stored in MongoDB with user_id
So how can I retrieve data like this:
| name | profile_url | comments |
|-------|--------------|-----------------------|
| xyz | image.jpg | That was nice comment |
| abc | image.jpg | I agree |
Is it possible to do so? Or is there any other way?
I am using Laravel 5 with the jenssegers/laravel-mongodb package.
MongoDB and MySQL are completely separate applications. They have no way to communicate with each other except through your application. That means if a request needs data from both sources, it needs to query both separately.
But what you could do is keep redundant data in both databases. When you store a comment in MongoDB, also put the relevant user information into the Comment document. Such a duplication of information is a deadly sin in relational databases but is common practice in MongoDB. Until recently (3.2) MongoDB had no support for JOINs whatsoever, and even now it's still quite rudimentary. That means you should usually avoid storing the data you need to fulfill a request in more than one collection, even if that means that you have redundancies.
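For example, a minimal sketch of that denormalization, assuming an illustrative Comment model built on jenssegers/laravel-mongodb (the exact model namespace varies between package versions, and the field names are just the ones from the question):

    <?php
    // app/Comment.php - an Eloquent model stored in MongoDB
    use Jenssegers\Mongodb\Eloquent\Model;

    class Comment extends Model
    {
        protected $connection = 'mongodb';
        protected $fillable = ['user_id', 'user_name', 'profile_url', 'text'];
    }

    // Elsewhere, when saving a comment, copy the user fields you need for display,
    // so reads never have to touch MySQL at all.
    Comment::create([
        'user_id'     => $user->id,          // MySQL primary key, kept for reference
        'user_name'   => $user->name,        // duplicated from MySQL
        'profile_url' => $user->profile_url, // duplicated from MySQL
        'text'        => 'That was nice comment',
    ]);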
You can't retrieve the user and comment data at the same time. You have to:
get the user ID
query MongoDB, using the user ID as a parameter, to get the comments you need
If you need data for several users at once, you could consider batching the queries:
get all the users' data from MySQL
query MongoDB for all the comments of those users
merge the user data with the comment data in PHP (see the sketch after this list)
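A rough sketch of that batching, again assuming illustrative User (MySQL) and Comment (MongoDB) Eloquent models:

    <?php
    // 1. Get the users' data from MySQL, keyed by ID for fast lookup.
    $users = User::whereIn('id', $userIds)->get()->keyBy('id');

    // 2. Ask MongoDB for all comments belonging to those users in one query.
    $comments = Comment::whereIn('user_id', $userIds)->get();

    // 3. Merge the two result sets in PHP.
    $rows = [];
    foreach ($comments as $comment) {
        $user = $users[$comment->user_id];
        $rows[] = [
            'name'        => $user->name,
            'profile_url' => $user->profile_url,
            'comments'    => $comment->text,
        ];
    }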
Finally: are you sure storing the comments in MongoDB is the right solution in your case? Are you planning to have such a huge amount of comments that they need to be stored in a separate database?
If you choose this route, you could also consider storing the user data in MongoDB. But before doing that, plan carefully whether it's the right choice for you (i.e. consider the queries you'll need to run, and check whether the data, stored this way, would fit those queries).
I have one varchar and two BLOB columns of data for recipes. I don't need relations between the data; for example, I don't need to know which meals need potato, etc.
I'll get a meal's materials from the database, edit them and save them again as a BLOB. Then I will create a binary text file (~100KB) on the fly and save it in another column named binary data.
So my question is: does splitting the table into two make sense? Does putting one BLOB in one table and the other BLOB in another table change performance (theoretically)? Or does it change nothing except backup concerns?
+-id--+-meal name (varchar)----+-materials (BLOB)------------+-binary data (BLOB)---+
| 1 | meatball | (meat, potato, bread etc.) | (some binary files) |
| 2 | omelette | (potato, egg, etc.) | (other binary files) |
+-----+------------------------+-----------------------------+----------------------+
If you will be using an ORM, the split-table approach is better.
Otherwise, when you ask for the materials, the ORM will usually fetch all available fields, so it ends up reading the big and unnecessary "binary" objects as well.
On the other side of things: if you'll be serving the binary results, a better approach would be to save them as files and serve those directly.
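As a rough illustration of the split (the table and column names here are hypothetical, loosely based on the question), the large blob moves to its own table keyed by the meal id, so ordinary reads never touch it:

    <?php
    // Hypothetical split: `meal` keeps the small columns, `meal_binary` holds the big blob.
    $pdo = new PDO('mysql:host=localhost;dbname=recipes', 'user', 'pass');

    // Editing materials only touches the small table...
    $stmt = $pdo->prepare('SELECT id, meal_name, materials FROM meal WHERE id = ?');
    $stmt->execute([$mealId]);
    $meal = $stmt->fetch(PDO::FETCH_ASSOC);

    // ...and the ~100KB blob is fetched only when it is actually needed.
    $stmt = $pdo->prepare('SELECT binary_data FROM meal_binary WHERE meal_id = ?');
    $stmt->execute([$mealId]);
    $binary = $stmt->fetchColumn();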
It's more a design choice than a specific performance improvement. This assumes your query is not doing a catch-all "SELECT *". Your queries should always target the specific columns you are interested in for a given purpose.
If you do not anticipate the BLOB data for a specific meal growing past your current expectation, then keeping it all in one table is an appropriate choice. This assumes there is a one-to-one relationship between them.
However, if there is any chance you might need more BLOB objects per meal, then yes, I would consider splitting them out into a new table with cross-references. Sometimes it is better to be safe than sorry.
I have a scenario in which I am not sure about what to do.
I have a website where a user can update their status. I am allowing the use of hash tags so a possible user post might look like:
Went for a great hike today!! #hiking
Now, I intend to store the post in a table appropriately named "POSTS" which is structured like this:
post_id | user_id | text | date
Now, when a user submits the form that holds the post text, I run a script that collects all of the hashtag terms the user used into an array.
I can then loop through that array and insert the tags into the aptly named "TAGS" table, which is structured like this:
tag_id | post_id | user_id | tag
The only problem with this is that I do not know the post_id of the post until after I insert the data into the "POSTS" table (post_id is the primary key and auto-increments).
Now, I was thinking I could just SELECT the last row of data from the "POSTS" table for that user (after I insert the post), and then use the returned post_id in the query that inserts the tag data into the "TAGS" table. This doesn't seem like the best way, though. My question is:
Is this the best solution or is there a better way to go about this scenario?
I am brand new to Stack Overflow, so please don't downvote me. Comment and tell me what I am doing wrong, and I will learn and ask better questions.
Thanks
You can get the last inserted ID very simply:
mysql_insert_id() if you don't use PDO, or lastInsertId() if you do.
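For example, a minimal sketch with PDO (the table and column names are taken from the question; the connection details are placeholders):

    <?php
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

    // Insert the post first...
    $stmt = $pdo->prepare('INSERT INTO POSTS (user_id, text, date) VALUES (?, ?, NOW())');
    $stmt->execute([$userId, $postText]);

    // ...then grab its auto-increment ID without a second SELECT.
    $postId = $pdo->lastInsertId();

    // Now each tag can reference the post directly.
    $stmt = $pdo->prepare('INSERT INTO TAGS (post_id, user_id, tag) VALUES (?, ?, ?)');
    foreach ($tags as $tag) {
        $stmt->execute([$postId, $userId, $tag]);
    }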
Have a new column in both tables - unique_id - which holds a string you generate in code before querying the database. That way you have an id to tie posts and tags together before submission. I use this method all the time for similar applications.
The only issue is uniqueness, but there are a variety of ways to generate unique IDs (I normally use a mixture of timestamps and hashing).
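For instance, one way to generate such an ID (purely illustrative; any sufficiently unique scheme will do):

    <?php
    // Combine a high-resolution timestamp with some randomness and hash the result.
    $uniqueId = hash('sha256', microtime(true) . bin2hex(random_bytes(16)));

    // Store $uniqueId in both POSTS.unique_id and TAGS.unique_id when inserting.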
This sort of depends on which version of MySQL you're using and how you want to organize your code.
Option 1. Do exactly what you've said. PHP would contain the code to manage the database and how data is stored in it. The only drawback I see in what you've outlined is that if there's an issue with handling the hashtags, you could end up with a post that is inserted into the database while the hashtag part did not successfully complete. For certain applications (like a bank account), this may not be acceptable, and this is what database transactions are for (a sketch follows below).
Option 2. Another way to handle this would be to write a MySQL stored procedure that does both the insert and the hashtag handling. The stored procedure could also wrap the whole thing in a transaction so that your database stays consistent. Note that this requires a version of MySQL that supports stored procedures. The downside is that you would have to write it in MySQL, which is different from PHP.
Both MySQL and PHP can handle this application logic/datastore logic. It is a matter of how you want to organize the code. I would prefer keeping the layers distinct. Even if you do this in PHP, at least have a separate class that deals with the database and nothing else. When your code gets bigger, having a separate class, module or namespace that manages this type of code makes it much easier to change and to test.
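Here is a hedged sketch of Option 1 with a PDO transaction, so the post and its tags are committed together or not at all (this assumes the tables use a transactional engine such as InnoDB; the variables are illustrative):

    <?php
    try {
        $pdo->beginTransaction();

        $stmt = $pdo->prepare('INSERT INTO POSTS (user_id, text, date) VALUES (?, ?, NOW())');
        $stmt->execute([$userId, $postText]);
        $postId = $pdo->lastInsertId();

        $stmt = $pdo->prepare('INSERT INTO TAGS (post_id, user_id, tag) VALUES (?, ?, ?)');
        foreach ($tags as $tag) {
            $stmt->execute([$postId, $userId, $tag]);
        }

        $pdo->commit();      // the post and its tags are saved together
    } catch (Exception $e) {
        $pdo->rollBack();    // nothing is saved if any step failed
        throw $e;
    }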
I need to insert data into the DB in two languages, and I am having a bit of a dilemma (the data needs to exist in both languages). Is it better to make the user insert the data in both languages at once, or is it better for the user to first insert it in one language and then in the second one? If the latter is better, what is the most efficient way to do this, and how can I present all articles that are not yet inserted in both languages?
DB structure for the articles:
Common table for all articles (same data):
**article -> id_article | image | date_created | category_id | subcategory_id**
Table where data is different:
article_info -> article_id | name | text | lang_id
If the data must exist in both languages - i.e., the application assumes that if an item exists in one language, then it must exist in the other - then you should design your application so that the user must add them both at once.
When you perform the database writes, you should also be using transactions. This will ensure that either all of your writes succeed, or none of them do. It prevents the database from being left in an indeterminate state with a record for one language but not the other.
Have a look at this CodeIgniter manual page on transactions to get an idea on how they work.
You can also use the insert_batch method in the database class to insert both records at once. I don't know how it works with all database drivers, but the mysqli driver will generate a single query when you use insert_batch, so the entire insert will succeed or the entire insert will fail, similar to what happens with transactions. That said, I would still wrap the call to insert_batch in a transaction block just to be a bit paranoid and future-proof.
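A rough sketch of both ideas combined, using CodeIgniter's Query Builder and the table names from the question (error handling, validation and the actual lang_id values are left as illustrative placeholders):

    <?php
    $this->db->trans_start();

    // One row in the common `article` table...
    $this->db->insert('article', [
        'image'          => $image,
        'date_created'   => date('Y-m-d H:i:s'),
        'category_id'    => $categoryId,
        'subcategory_id' => $subcategoryId,
    ]);
    $articleId = $this->db->insert_id();

    // ...and both language rows in `article_info` in a single batched insert.
    $this->db->insert_batch('article_info', [
        ['article_id' => $articleId, 'name' => $nameLang1, 'text' => $textLang1, 'lang_id' => 1],
        ['article_id' => $articleId, 'name' => $nameLang2, 'text' => $textLang2, 'lang_id' => 2],
    ]);

    $this->db->trans_complete();   // rolls everything back if any query failed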
I'm in the process of rebuilding an application (lone developer here) using PHP and PostgreSQL. For most of the data, I'm storing it using a table with a column for each attribute. However, I'm now starting to build some of the tables for the content storage. The content in this case is multiple sections that each contain different data sets; some of the data is common and shared (and foreign key'd), and other data is very unique. In the current iteration of the application we have a table structure like this:
id | project_name | project_owner | site | customer_name | last_updated
-----------------------------------------------------------------------
1 | test1 | some guy | 12 | some company | 1/2/2012
2 | test2 | another guy | 04 | another co | 2/22/2012
Now, this works - but it gets hard to maintain for a few reasons. Adding new columns (happens rarely) requires modifying the database table. Audit/history tracking requires a separate table that mirrors the main table with additional information - which also requires modification if the main table is changed. Finally, there are a lot of columns - over 100 in some tables.
I've been brainstorming alternative approaches, including breaking out one large table into a number of smaller tables. That introduces other issues that I feel also cause problems.
The approach I am currently considering seems to be called the EAV model. I have a table that looks like this:
id | project_name | col_name | data_varchar | data_int | data_timestamp | update_time
--------------------------------------------------------------------------------------------------
1 | test1 | site | | 12 | | 1/2/2012
2 | test1 | customer_name | some company | | | 1/2/2012
3 | test1 | project_owner | some guy | | | 1/2/2012
...and so on. This has the advantage that I'm never updating, always inserting. Data is never overwritten, only added. Of course, the table will eventually grow rather large. I have an 'index' table that lists the projects and is used to reference the 'data' table. However, I feel I am missing something large with this approach. Will it scale? I originally wanted to do a simple key -> value type table, but realized I need to be able to have different data types within the table. This seems manageable because the database abstraction layer I'm using will include a type that selects data from the proper column.
Am I making too much work for myself? Should I stick with a simple table with a ton of columns?
My advice is that if you can avoid using an EAV table, do so. They tend to be performance killers. They are also difficult to query properly, especially for reporting ("yes, let me join to this table an unknown number of times to get all of the data I need out of it and, oh by the way, I don't know what columns I have available, so I have no idea what columns the report will need to contain"). It is hard to enforce the kind of database constraints you need to ensure data integrity (how do you ensure that required fields are filled in, for instance?), and it can push you into using bad datatypes. It is far better in the long run to define tables that store the data you need.
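To make the reporting pain concrete, here is a hedged sketch of what reading projects back out of an EAV table looks like: one conditional aggregate per attribute, and you have to know every attribute name up front (the table name eav_data and the $pdo connection are hypothetical; the columns are the ones from the question, and this still ignores picking only the latest version of each attribute):

    <?php
    // Every attribute you want back requires its own CASE branch, and adding a new
    // attribute means editing every report query like this one.
    $sql = "SELECT project_name,
                   MAX(CASE WHEN col_name = 'site'          THEN data_int     END) AS site,
                   MAX(CASE WHEN col_name = 'customer_name' THEN data_varchar END) AS customer_name,
                   MAX(CASE WHEN col_name = 'project_owner' THEN data_varchar END) AS project_owner
            FROM eav_data
            GROUP BY project_name";

    $rows = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);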
If you really need the functionality, then at least look into NoSQL databases, which are more optimized for this sort of undefined data.
Moving your entire structure to EAV can lead to a lot of problems down the line, but it might be acceptable for the audit-trail portion of your problem, since foreign key relationships and strict datatyping may disappear over time anyway. You can probably even generate your audit tables automatically with triggers and stored procedures.
Note, however, that reconstructing old versions of records is non-trivial with an EAV audit trail and will require a fair amount of application code. The database will not be able to do it by itself.
An alternative you could consider is to store all your data (new and old records) in the same table. You can either include the audit fields in that table and leave them NULL when they don't apply, or mark some rows in the table as "current" and keep the audit-related fields in another table. To simplify your application, you can create a view that only shows current rows and issue queries against the view.
You can accomplish this with a joined table inheritance pattern. With joined table inheritance, you put the common attributes into a base table along with a "type" column, and you join to additional tables (which have the same primary key, which is also a foreign key) based on type. Many Data-Mapper-pattern ORMs have native support for this pattern, often called "polymorphism".
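A minimal DDL sketch of the joined-table-inheritance idea (the table and column names are hypothetical, loosely based on the question's project data; $pdo is assumed to be a PDO connection to the PostgreSQL database):

    <?php
    // `content` holds the shared columns plus a type discriminator;
    // each subtype table shares the same primary key, which is also a foreign key.
    $pdo->exec("CREATE TABLE content (
                    id           serial PRIMARY KEY,
                    type         text      NOT NULL,          -- e.g. 'project'
                    project_name text,
                    last_updated timestamp NOT NULL DEFAULT now()
                )");

    $pdo->exec("CREATE TABLE content_project (
                    id            integer PRIMARY KEY REFERENCES content(id),
                    project_owner text,
                    site          integer,
                    customer_name text
                )");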
You could also use PostgreSQL's native table inheritance mechanism, but note the caveats carefully!