Compressing text before storing it in the database - php

I need to store a very large amount of text in a MySQL database. There will be millions of records with a LONGTEXT field, and the database size will be huge.
So I want to ask: is there a safe way to compress the text before storing it in the TEXT field to save space, with the ability to extract it back if needed?
Something like:
$archived_text = compress_text($huge_text);
// saving $archived_text to database here
// ...
// ...
// getting compressed text from database
$archived_text = get_text_from_db();
$huge_text = uncompress_text($archived_text);
Is there a way to do this with PHP or MySQL? All the texts are UTF-8 encoded.
UPDATE
My application is a large literature website where users can add their texts. Here is the table I have:
CREATE TABLE `book_parts` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`book_id` int(11) NOT NULL,
`title` varchar(200) DEFAULT NULL,
`content` longtext,
`order_num` int(11) DEFAULT NULL,
`views` int(10) unsigned DEFAULT '0',
`add_date` datetime DEFAULT NULL,
`is_public` tinyint(3) unsigned NOT NULL DEFAULT '1',
`published_as_draft` tinyint(3) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `key_order_num` (`order_num`),
KEY `add_date` (`add_date`),
KEY `key_book_id` (`book_id`,`is_public`,`order_num`),
CONSTRAINT FOREIGN KEY (`book_id`) REFERENCES `books` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Currently it has about 800k records and weighs 4 GB; 99% of the queries are SELECTs. I have every reason to think these numbers will increase dramatically. I wouldn't like to store the texts in files because there is quite heavy logic around them and my website gets quite a lot of traffic.

Are you going to index these texts? How big is the read load on these texts? The insert load?
You can use InnoDB data compression - a transparent and modern approach. See the docs for more info.
If you have really huge texts (say, each text is above 10 MB), then it is a good idea not to store them in MySQL. Store the gzip-compressed texts in the file system and keep only pointers and metadata in MySQL. You can easily expand your storage in the future and move it, e.g., to a DFS.
Update: another plus of storing texts outside MySQL: the DB stays small and fast. Minus: a higher probability of data inconsistency.
Update 2: if you have plenty of programming resources, take a look at projects like this one: http://code.google.com/p/mysql-filesystem-engine/.
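A minimal sketch of the filesystem approach described above (the path layout and the use of gzencode/gzdecode are assumptions, not from the question):
// write: compress to a file, keep only the path plus metadata in MySQL
$path = '/storage/books/' . (int)$book_id . '/' . (int)$part_id . '.gz'; // assumed layout
file_put_contents($path, gzencode($huge_text, 6)); // gzip, compression level 6
// read: fetch the path from MySQL, then decompress the file
$huge_text = gzdecode(file_get_contents($path));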
Final Update: according to your info, you can just use InnoDB compression - it uses zlib, the same algorithm as in ZIP. You can start with these params:
CREATE TABLE book_parts
(...)
ENGINE=InnoDB
ROW_FORMAT=COMPRESSED
KEY_BLOCK_SIZE=8;
Later you will need to play with KEY_BLOCK_SIZE. Check SHOW STATUS LIKE 'COMPRESS_OPS_OK' and SHOW STATUS LIKE 'COMPRESS_OPS'; the ratio of these two values should stay close to 1.0. See the docs.
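For example, you can compute that ratio from PHP; on current MySQL versions the counters are also exposed per page size in INFORMATION_SCHEMA.INNODB_CMP (a sketch, assuming a mysqli connection $db):
$row = $db->query("SELECT SUM(compress_ops) AS ops, SUM(compress_ops_ok) AS ok FROM information_schema.INNODB_CMP")->fetch_assoc();
printf("compression success ratio: %.3f\n", $row['ok'] / max(1, $row['ops']));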

If you're compressing (e.g. with gzip), then don't use TEXT fields of any sort. They're not binary-safe. Data going into/coming out of TEXT fields is subject to character set translation, which will probably (though not necessarily) mangle the compressed data and give you a corrupted result when you retrieve/uncompress the text.
Use BLOB fields instead, which are binary-transparent and do not do any translation of the data.

It might be better to define the content field as a BLOB and compress the data in PHP to save communication costs.
CREATE TABLE book_parts (
......
content blob default NULL,
......
)
In PHP, use gzcompress and gzuncompress:
$content = '......';
// the compressed data is binary, so it must be escaped before being embedded in SQL
$query = sprintf("REPLACE INTO book_parts (content) VALUES ('%s')",
    mysql_real_escape_string(gzcompress($content)));
mysql_query($query);
// read it back and decompress
$query = "SELECT * FROM book_parts WHERE id = 111";
$result = mysql_query($query);
if ($result && $row = mysql_fetch_assoc($result)) {
    $content = gzuncompress($row['content']);
}
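Note that the mysql_* functions are deprecated (and removed in PHP 7). A rough mysqli equivalent using prepared statements, which also spares you from escaping the binary data by hand (credentials are placeholders; get_result() requires the mysqlnd driver):
$db = new mysqli('localhost', 'user', 'pass', 'mydb');
$blob = gzcompress($content);
$stmt = $db->prepare('UPDATE book_parts SET content = ? WHERE id = ?');
$stmt->bind_param('si', $blob, $id); // binary data is bound, never interpolated
$stmt->execute();
$stmt = $db->prepare('SELECT content FROM book_parts WHERE id = ?');
$stmt->bind_param('i', $id);
$stmt->execute();
$content = gzuncompress($stmt->get_result()->fetch_assoc()['content']);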

You may also want to use the COMPRESS option to enable compression of the packets exchanged with the server.
Read some information about this option:
Use Compression in MySQL Connector/Net
Compress Property in dotConnect for MySQL
For PHP, I have found MYSQLI_CLIENT_COMPRESS for the mysqli_real_connect() function.
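For example (a minimal sketch; the credentials are placeholders):
$db = mysqli_init();
// the last argument turns on compression of the client/server protocol
mysqli_real_connect($db, 'localhost', 'user', 'pass', 'mydb', 3306, null, MYSQLI_CLIENT_COMPRESS);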

You could use the PHP functions gzdeflate and gzinflate for text.
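For example (gzdeflate emits a raw DEFLATE stream, slightly smaller than gzcompress's zlib format):
$compressed = gzdeflate($huge_text, 9); // 9 = maximum compression
$huge_text = gzinflate($compressed);   // round-trips back to the original string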

There is no benefit in compressing large texts and storing them in a database.
Here are the problems you might face in the long run:
If the server crashes the data may be hard to recover.
Not ideal for search.
It takes additional time to transfer the data between the mysql server and the browser.
Backups become time-consuming (unless you use replication).
I think storing these large texts in disk files will make things easier for:
Distributed backup (rsync).
PHP handling file uploads.

Related

How to store CodeIgniter session data if data length likely to exceed BLOB size?

I have a fairly elaborate multi-page query form. Actually, my site has several, for querying different data sets. As these query parameters span multiple page requests, I rely on sessions to store the accumulated query parameters. I'm concerned that the data stored in the session, when serialized, might exceed the capacity of the MySQL BLOB column (65,535 bytes) specified by the CodeIgniter session documentation:
CREATE TABLE IF NOT EXISTS `ci_sessions` (
`id` varchar(128) NOT NULL,
`ip_address` varchar(45) NOT NULL,
`timestamp` int(10) unsigned DEFAULT 0 NOT NULL,
`data` blob NOT NULL,
KEY `ci_sessions_timestamp` (`timestamp`)
);
How can I store my user-entered query parameters and be sure that they will be preserved for a given user?
I considered using file-based caching to store this data, with a key generated from the session ID:
// controller method
public function my_page() {
// blah blah check POST for incoming query params and validate them
$validated_query_params = $this->input->post();
// session library is auto-loaded
// but apparently new session id generated every five mins by default?
$cache_key = "query_params_for_sess_id" . $this->session->session_id;
$this->load->driver('cache');
// cache for an hour
$this->cache->file->save($cache_key, $validated_query_params, 3600);
}
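Reading the parameters back out would be the mirror image (a sketch; the cache driver's get() returns FALSE on a miss):
$this->load->driver('cache');
$params = $this->cache->file->get("query_params_for_sess_id" . $this->session->session_id);
if ($params === FALSE) {
    // cache miss (expired or evicted): re-collect the query params
}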
However, I worry that the session ID might change when a new session ID gets generated for a given user. Apparently this happens by default every five minutes as CodeIgniter generates new session IDs to enhance security.
Can anyone suggest a tried-and-true (and efficient!) means of storing session data that exceeds the 64K BLOB size?
You could use MEDIUMBLOB, which supports up to 16 MB, or LONGBLOB, which supports up to 4 GB.
See https://dev.mysql.com/doc/refman/8.0/en/string-type-overview.html
Also, if you declare your BLOB with a length, like BLOB(2000000) (whatever length you need), MySQL will automatically promote it to a data type that can hold that much data. For example, BLOB(2000000) implicitly becomes MEDIUMBLOB:
mysql> create table t ( b blob(2000000) );
mysql> show create table t\G
*************************** 1. row ***************************
Table: t
Create Table: CREATE TABLE `t` (
  `b` mediumblob
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
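Applied to the ci_sessions table from the question, the change could be (a minimal sketch):
ALTER TABLE `ci_sessions` MODIFY `data` MEDIUMBLOB NOT NULL;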

Storing images in a MySQL database

I want to store images in a MySQL database. The images are sent through the Android app. The problem is that I really don't know which method to use to store images in a MySQL database:
1. Access the image from a file
2. Store the image in the filesystem and store its URL in the database
3. Store the image in the database
After storing these images, I want to retrieve them in a PHP file.
I want to know which method is best and how to do it.
Yes, you can store images in the database, but it's not advisable and is bad practice.
Do not store images in the database. Store images in directories and store references to the images in the database. Like store the path to the image in the database or the image name in the database.
Images can get quite large (1 MB or more). Even if an image is small, it's still bad practice: you're putting extra load on your database transactions that you can completely avoid. No big website stores images in its database. Facebook, for example, doesn't do it. It's not a good idea.
Avoid it. Use directories.
On a mobile platform, database calls are expensive. It takes many more cycles to fetch from the database, which may degrade the performance of your application. Obviously you don't want that. Still, if you want to do it, BLOB is your answer.
The efficient way: add an absolute-path field to your database and fetch the image from local storage. This is the best practice.
First, create a photos table to store the reference to the image:
CREATE TABLE IF NOT EXISTS `photos` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`img` varchar(255) NOT NULL,
`sound` varchar(255) NOT NULL,
`about` text NOT NULL,
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`user_id` varchar(255) NOT NULL,
`likes` int(255) NOT NULL,
`down` int(255) NOT NULL,
`seen` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ;
Then, in the upload process, after saving the image to the folder, get the image's filename and store it in the table like below (NULL lets id auto-increment; the values should be escaped with mysql_real_escape_string first):
$save_image = mysql_query("INSERT INTO photos VALUES (NULL, '$originalFile', '', '$about', '$data', '$uid', 0, 0, 0)") or die(mysql_error());
and then, to fetch the image:
$p = mysql_query("select img from photos");
$p_f = mysql_fetch_assoc($p);
$img = $p_f['img'];
and to display it, simply:
<img src="myFolder/<?php echo $img; ?>">
and you're done.
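For completeness, the "saving the image to the folder" step mentioned above might look like this (the form field name photo and the folder myFolder are assumptions):
$originalFile = uniqid() . '_' . basename($_FILES['photo']['name']); // prefix avoids name collisions
move_uploaded_file($_FILES['photo']['tmp_name'], 'myFolder/' . $originalFile) or die('upload failed');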

Change MySQL spatial data into GeoJSON using PHP

I have a shapefile, and I want to show it on the web using Leaflet (http://leaflet.cloudmade.com/). Since Leaflet only supports GeoJSON, I have to change the shapefile into GeoJSON. That is easy, since I can use the "save as" capability in Quantum GIS.
Although I could use the GeoJSON file as the database (by reading, editing, and writing the file programmatically), I think it is better to use a "real" database. MySQL is the most popular one, and it supports spatial data, so I decided to use MySQL.
The scenario is:
Change shp into MySQL (I use ogr2ogr and just simply run this command: ogr2ogr -f "MySQL" MySQL:"geo,user=root,host=localhost,password=toor" -lco engine=MYISAM airports.shp)
Fetch the MySQL data as GeoJSON <-- here is the problem
Use AJAX to get the GeoJSON and render it <-- this should be easy; I'm good with jQuery
There is a column in my MySQL table whose type is GEOMETRY; see the table definition below:
CREATE TABLE IF NOT EXISTS `airports` (
`OGR_FID` int(11) NOT NULL AUTO_INCREMENT,
`SHAPE` geometry NOT NULL,
`cat` decimal(10,0) DEFAULT NULL,
`na3` varchar(80) DEFAULT NULL,
`elev` double(32,3) DEFAULT NULL,
`f_code` varchar(80) DEFAULT NULL,
`iko` varchar(80) DEFAULT NULL,
`name` varchar(80) DEFAULT NULL,
`use` varchar(80) DEFAULT NULL,
UNIQUE KEY `OGR_FID` (`OGR_FID`),
SPATIAL KEY `SHAPE` (`SHAPE`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=77 ;
Is there any way to convert such a table into GeoJSON format?
(I'd prefer an easy way, but if there is none, converting the column into something array-like is acceptable.)
EDIT:
I used geoPHP, written by phayes:
https://github.com/phayes/geoPHP/wiki/Example-format-converter
This solves the main problem; I only needed to fiddle a bit with adding features etc.
Is there any easier solution?
While there may not be a direct method to convert a MySQL spatial entity to GeoJSON, you can try the following:
get the WKT (Well Known Text) of the entity. (MySQL Reference)
convert from WKT to GeoJSON (done there in Perl, although you should be able to find converters in other languages or write your own in JavaScript).
Note that just calling jsonEncode() on the entity, as others have suggested, will not yield GeoJSON.
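Since the asker already has geoPHP, a minimal end-to-end sketch of those two steps in PHP (AsText() returns the WKT; the column names come from the airports table above):
require_once 'geoPHP/geoPHP.inc';
$result = mysql_query("SELECT `name`, AsText(`SHAPE`) AS wkt FROM `airports`");
$features = array();
while ($row = mysql_fetch_assoc($result)) {
    $geometry = geoPHP::load($row['wkt'], 'wkt'); // parse the WKT
    $features[] = array(
        'type'       => 'Feature',
        'geometry'   => json_decode($geometry->out('json')), // GeoJSON geometry
        'properties' => array('name' => $row['name']),
    );
}
echo json_encode(array('type' => 'FeatureCollection', 'features' => $features));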
My personal suggestion, which does not directly answer your question, would be to store the data in the format you need it retrieved in. It will reduce the overhead required to process the data every time you need it.
The easiest way to do this is to store the geojson in plain text as you suggested. If, for whatever reason, you also need the geometry stored in native format, you can store it in another column. The only downside is keeping the two columns in sync.

MySQL: which is more efficient - longtext, text, or blob? Improving insert efficiency

I am in the process of migrating a large amount of data from several databases into one. As an intermediate step, I am copying the data to a file for each data type and source DB, and then copying it into a large table in my new database.
The structure of the new table, called migrate_data, is simple. It consists of an id (primary key), a type_id (incremented within each data type set), data (a field containing a serialized PHP object holding the data I am migrating), source_db (refers to the source database, obviously), and data_type (identifies what type of data we are looking at).
I have created keys and key combinations for everything but the data field. Currently the data field is a longtext column. Inserts are taking about 4.8 seconds each on average. I was able to trim that down to 4.3 seconds using DELAY_KEY_WRITE=1 on the table.
What I want to know about is whether or not there is a way to improve the performance even more. Possibly by changing to a different data column type. That is why I ask about the longtext vs text vs blob. Are any of those more efficient for this sort of insert?
Before you answer, let me give you a little more information. I send all of the data to an insert function that takes the object, runs it through serialize, then runs the data insert. It is also being done using Drupal 6 (and its db_query function).
Any efficiency improvements would be awesome.
Current table structure:
CREATE TABLE IF NOT EXISTS `migrate_data` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`type_id` int(10) unsigned NOT NULL DEFAULT '0',
`data` longtext NOT NULL,
`source_db` varchar(128) NOT NULL DEFAULT '',
`data_type` varchar(128) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `migrated_data_source` (`source_db`),
KEY `migrated_data_type_id` (`type_id`),
KEY `migrated_data_data_type` (`data_type`),
KEY `migrated_data_id__source` (`id`,`source_db`),
KEY `migrated_data_type_id__source` (`type_id`,`source_db`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 DELAY_KEY_WRITE=1;
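For reference, the insert function described in the question might look roughly like this in Drupal 6 (the function and variable names are assumptions):
function migrate_data_insert($object, $type_id, $source_db, $data_type) {
  // Drupal 6's db_query() escapes each %-placeholder itself
  db_query("INSERT INTO {migrate_data} (type_id, data, source_db, data_type) VALUES (%d, '%s', '%s', '%s')",
    $type_id, serialize($object), $source_db, $data_type);
}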
The various TEXT/BLOB types are all identical in storage requirements and perform exactly the same way, except that TEXT fields are subject to character set conversion; BLOB fields are not. In other words, BLOBs are for storing binary data that MUST come out exactly the same as it went in. TEXT fields are for storing text data that may/can/will be converted from one charset to another.
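Since serialized objects must come back byte-for-byte identical (or unserialize() will fail), a BLOB type is the safer choice here; for example (a minimal sketch):
ALTER TABLE `migrate_data` MODIFY `data` LONGBLOB NOT NULL;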

How to unset BLOB with php/mysql?

Because I did it unintentionally. After reading Wikipedia, I understand that a "binary large object" is for large media files, and I'm not saving a media file.
So how did the data get stored this way? What's wrong with this setup that makes phpMyAdmin display the text as BLOB?
The MySQL field, from phpMyAdmin:
Field = 'first_name'
Type = text
Collation = latin1_bin
Null = No
Default = None
The PHP code:
$insertName = "INSERT INTO name(first_name,last_name)VALUES('$firstName','$lastName')";
$dbSuccess_1 = mysql_query($insertName,$connectID) or die ("ERROR_1 - Unable to save
to MySQL".error_get_last().mysql_error($connectID));
If you're asking how to change a BLOB column to TEXT, you would use a query similar to this (phpMyAdmin typically displays a column as BLOB when it has a binary collation such as latin1_bin, which is likely what happened here):
ALTER TABLE `name`
CHANGE COLUMN `first_name` `first_name` TEXT NULL FIRST
,CHANGE COLUMN `last_name` `last_name` TEXT NULL AFTER `first_name`;
You can use phpMyAdmin to make the change even more easily.
TEXT and BLOB are essentially identical, except that TEXT fields are subject to character set handling (the character set is taken into account when sorting/grouping), while BLOBs are stored verbatim as a sequence of bytes and are never transformed.
Relevant docs: http://dev.mysql.com/doc/refman/5.0/en/blob.html
