I cleaned the question a little bit because it was getting very big and unreadable.
Running on my localhost.
As you can see in the image below, the query takes 755.15 ms when selecting from the table Job, which contains 15,000 rows (the WHERE conditions match 6,650 of them).
The table Company contains 1,000 rows.
The table geo__name contains roughly 84,300 rows and is not giving me any problems, so I believe the issue is the database structure or something along those lines.
The structure of these 2 tables is the following:
Table Job is:
CREATE TABLE `job` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
`company_id` int(11) NOT NULL,
`activity_sector_id` int(11) DEFAULT NULL,
`status` int(11) NOT NULL,
`active` datetime NOT NULL,
`contract_type_id` int(11) NOT NULL,
`salary_type_id` int(11) NOT NULL,
`workday_id` int(11) NOT NULL,
`geoname_id` int(11) NOT NULL,
`title` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`minimum_experience` int(11) DEFAULT NULL,
`min_salary` decimal(7,2) DEFAULT NULL,
`max_salary` decimal(7,2) DEFAULT NULL,
`zip_code` int(11) DEFAULT NULL,
`vacancies` int(11) DEFAULT NULL,
`show_salary` tinyint(1) NOT NULL,
PRIMARY KEY (`id`),
KEY `created_at` (`created_at`,`active`,`status`) USING BTREE,
CONSTRAINT `FK_FBD8E0F823F5422B` FOREIGN KEY (`geoname_id`) REFERENCES `geo__name` (`id`),
CONSTRAINT `FK_FBD8E0F8398DEFD0` FOREIGN KEY (`activity_sector_id`) REFERENCES `activity_sector` (`id`),
CONSTRAINT `FK_FBD8E0F85248165F` FOREIGN KEY (`salary_type_id`) REFERENCES `job_salary_type` (`id`),
CONSTRAINT `FK_FBD8E0F8979B1AD6` FOREIGN KEY (`company_id`) REFERENCES `company` (`id`),
CONSTRAINT `FK_FBD8E0F8AB01D695` FOREIGN KEY (`workday_id`) REFERENCES `workday` (`id`),
CONSTRAINT `FK_FBD8E0F8CD1DF15B` FOREIGN KEY (`contract_type_id`) REFERENCES `job_contract_type` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=15001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
The table company is:
CREATE TABLE `company` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`logo` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
`website` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`user_id` int(11) NOT NULL,
`phone` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`cifnif` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`type` int(11) NOT NULL,
`subscription_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `UNIQ_4FBF094FA76ED395` (`user_id`),
KEY `IDX_4FBF094F9A1887DC` (`subscription_id`),
KEY `name` (`name`(191)),
CONSTRAINT `FK_4FBF094F9A1887DC` FOREIGN KEY (`subscription_id`) REFERENCES `subscription` (`id`),
CONSTRAINT `FK_4FBF094FA76ED395` FOREIGN KEY (`user_id`) REFERENCES `user` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
The query is the following:
SELECT
j0_.id AS id_0,
j0_.status AS status_1,
j0_.title AS title_2,
j0_.min_salary AS min_salary_3,
j0_.max_salary AS max_salary_4,
c1_.id AS id_5,
c1_.name AS name_6,
c1_.logo AS logo_7,
a2_.id AS id_8,
a2_.name AS name_9,
g3_.id AS id_10,
g3_.name AS name_11,
j4_.id AS id_12,
j4_.name AS name_13,
j5_.id AS id_14,
j5_.name AS name_15,
w6_.id AS id_16,
w6_.name AS name_17
FROM
job j0_
INNER JOIN company c1_ ON j0_.company_id = c1_.id
INNER JOIN activity_sector a2_ ON j0_.activity_sector_id = a2_.id
INNER JOIN geo__name g3_ ON j0_.geoname_id = g3_.id
INNER JOIN job_salary_type j4_ ON j0_.salary_type_id = j4_.id
INNER JOIN job_contract_type j5_ ON j0_.contract_type_id = j5_.id
INNER JOIN workday w6_ ON j0_.workday_id = w6_.id
WHERE
j0_.active >= CURRENT_TIMESTAMP
AND j0_.status = 1
ORDER BY
j0_.created_at DESC
When executing the above query I have these results:
In MySQL Workbench: 0.578 sec / 0.016 sec
In Symfony profiler: 755.15 ms
The question is: is this query duration normal? If not, how can I improve the speed of the query? It seems like too much.
The Symfony debug toolbar if it helps:
As you can see in the below image, I'm only getting the data I really need:
The explain query:
The timeline:
The MySQL server can't handle the load being placed on it. This could be due to resource contention, because it has not been appropriately tuned, or it could be a problem with your hard drive.
First, I would start by adding the MySQL keyword STRAIGHT_JOIN, which tells MySQL to join the tables in the order you wrote them instead of letting the optimizer work the relationships out for you. With your dataset being this small, and the query already at about half a second, I don't know if it will help as much here, but on larger datasets I have seen it SIGNIFICANTLY improve performance.
Next, you appear to be fetching lookup descriptions via the PK/FK relationships. Since I can't see the indexes on those tables, I would suggest creating covering indexes that contain both the key and the description, so each join can read the description straight from the index pages it is already using for the JOIN, instead of using the index page, then jumping to the actual data pages to get the description, and carrying on.
Last, your job table has an index on (created_at, active, status); it might perform better if the index were (status, active, created_at) instead.
With your existing index, think of it this way: each CREATED day of data is put into its own box. Within each day box, the rows are sorted by the active timestamp (even if you simplify that to the active date), and only THEN by status.
So, for each CREATED day, you open a box. Inside it you look at secondary boxes, one per active timestamp (say, per day). Only inside each of those can you finally check whether status = 1. So you open each active box, check for status = 1, close the created-day box, move on to the next created-day box, and repeat. Think how labour-intensive it is to open every day box and every active box inside it.
Now, under the suggested index starting with status, you have a very finite number of boxes, one per status. You open only the single box for status = 1; those are the only rows you care about, and all the others are irrelevant. Inside that box, the records are sub-sorted by the ACTIVE timestamp, so you can jump directly to those at or after the current timestamp; from the first matching record onward, everything in the box qualifies. Done. And since this index ALSO contains created_at, MySQL can use it to optimize the descending sort as well.
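As a concrete sketch of that change (the index name is my own choice):

ALTER TABLE `job`
    ADD KEY `idx_status_active_created` (`status`, `active`, `created_at`);

-- Once this exists, the old (created_at, active, status) key is likely redundant:
-- ALTER TABLE `job` DROP KEY `created_at`;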
To ensure covering indexes for the other lookup tables, if they do not already exist, I suggest the following:
table                index
company              (id, name, logo)
activity_sector      (id, name)
geo__name            (id, name)
job_salary_type      (id, name)
job_contract_type    (id, name)
workday              (id, name)
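A hedged sketch of the corresponding statements (index names are mine; note that with InnoDB the primary key already stores the full row, so measure whether these secondary covering indexes actually help before keeping them):

-- company.name and company.logo are utf8mb4 varchar(255); depending on your row format
-- this key may exceed MySQL's index-length limit and need prefixes (which defeats covering)
ALTER TABLE `company`           ADD KEY `idx_company_cover`  (`id`, `name`, `logo`);
ALTER TABLE `activity_sector`   ADD KEY `idx_sector_cover`   (`id`, `name`);
ALTER TABLE `geo__name`         ADD KEY `idx_geoname_cover`  (`id`, `name`);
ALTER TABLE `job_salary_type`   ADD KEY `idx_salary_cover`   (`id`, `name`);
ALTER TABLE `job_contract_type` ADD KEY `idx_contract_cover` (`id`, `name`);
ALTER TABLE `workday`           ADD KEY `idx_workday_cover`  (`id`, `name`);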
And the MySQL Keyword...
SELECT STRAIGHT_JOIN (rest of query...)
There are several reasons why Symfony can be slow.
1. Server fault
First, it could be the server's fault. Poor server performance can hinder your query time.
2. Data size and deferred rendering
Then comes the data size. As you can see in the image below, the query on one of my projects has a 50 MB data size (currently about 20k rows).
Rendering 50 MB of data into HTML can take some time, mostly because of loops.
Still, there are solutions for this, like deferred rendering.
Deferred rendering is quite simple: instead of rendering the data in your Twig template, you send all the data to a JavaScript variable and use JavaScript to parse/render the data once the DOM is loaded.
3. Query optimisation
As I wrote in a comment, you can check the following question, in which I explained why custom queries are important.
Are Doctrine relations affecting application performance?
In that question, you will read that order matters... It's in fact the most important thing.
While the static data in your database is often inserted in the right order,
that's rarely the case for dynamic data (data provided by users during the website's life).
This is why using ORDER BY in your query will often speed up page rendering,
as Doctrine won't be doing extra queries on its own.
As an example, one of my sites has about 700 entries displayed on the index page.
First, here is the query count while using findAll():
It shows 254 queries (253 duplicates) in 144 ms, plus 39 ms of render time.
Next, using the second parameter of findBy(), the ORDER BY clause, I get this result:
You can see the full query here (the screenshot is big).
Much better: only 1 query, in 8 ms, and about the same render time.
But, here, I don't use any fields from associations.
The moment I do, Doctrine will run extra queries of its own, and the query count and time will skyrocket.
In the end, it turns back into something like findAll().
And last, this is the custom query:
In this custom query, the query time went from 8 ms to 38 ms.
But unlike the previous query, I get far more data in my result,
which prevents Doctrine from running extra queries.
Again, ORDER BY matters in this query. Without it, I skyrocket back to 84 queries.
4. Partials
When you write a custom query, you can load partial objects instead of full entities.
As you said in your question, the description field seems to slow down your loading speed;
with partials, you can avoid loading some fields from the table, which will speed up the query.
First, instead of your regular syntax, this is how you will create the query builder:
$em=$this->getEntityManager();
$qb=$em->createQueryBuilder();
Just in case, I prefer to keep $em as a separate variable (if I want to fetch some class repository, for example).
Then you can start your partial select. Careful: the first select can't include any association fields:
$qb->select("partial job.{id, status, title, minimum_experience, min_salary, max_salary, zip_code, vacancies}")
->from(Job::class, "job");
Then you can add your associations :
$qb->addSelect("company")
->join("job.company", "company");
Or even add a partial association in case you don't need all the data from the association:
$qb->addSelect("partial activitySector.{id}")
->join("job.activitySector", "activitySector");
$qb->addSelect("partial job.{id, company_id, activity_sector_id, status, active, contract_type_id, salary_type_id, workday_id, geoname_id, title, minimum_experience, min_salary, max_salary, zip_code, vacancies, show_salary");
5. Caches
You could also use various caches, like Zend OPcache for PHP; you will find some advice about it in this question: Why Symfony3 so slow?
There is also Varnish, an HTTP cache.
That rounds up about everything I can share to lower your loading time.
I hope it proves useful and that you will be able to solve your problem.
That's a lot of keys; try to minimize the number of keys.
I am using phpMyAdmin to manage my database. In one of my tables, when I click to see the last page (30 records) out of 60,000 records, I get this alert:
"This operation could take a long time. Proceed anyway?", which in fact does not happen; it shows the records in a very short time.
By the way, my table structure is as follows:
CREATE TABLE `documents` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) DEFAULT NULL,
`type` char(50) NOT NULL,
`comment` varchar(512) DEFAULT NULL,
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;
So why am I getting this alert?
phpMyAdmin, as you know, is a PHP-based database management panel. It has to scan through the table to reach the last 30 rows, which means it processes every record between 0 and 60,000 to retrieve records 59,970 to 60,000.
Depending on how fast your web and SQL servers are, this can take a long time. The warning message is simply there to say that it could take a while to fetch these records because of the size of your table.
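Roughly speaking (the exact SQL phpMyAdmin generates may differ), jumping to the last page runs a large-offset LIMIT, and MySQL still has to walk past every skipped row before returning yours:

SELECT *
FROM `documents`
ORDER BY `id`
LIMIT 59970, 30;  -- skips 59,970 rows just to return the 30 you see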
In my application, whenever a user uploads a wallpaper, I need to crop that wallpaper into
3 different sizes and store all those paths (3 paths for the cropped images and 1 for the originally uploaded wallpaper) in my database.
I also need to store the tinyurl of the original wallpaper (the one uploaded by the user).
While solving the problem described above, I came up with the following table structure.
CREATE TABLE `wallpapermaster` (
`wallpaperid` int(10) unsigned NOT NULL AUTO_INCREMENT,
`userid` bigint(20) NOT NULL,
`wallpaperloc` varchar(100) NOT NULL,
`wallpapertitle` varchar(50) NOT NULL,
`wallpaperstatus` tinyint(4) DEFAULT '0' COMMENT '0-Waiting,1-approved,2-disapproved',
`tinyurl` varchar(40) NOT NULL
) ENGINE=MyISAM
wallpaperloc is a comma-separated field consisting of the original wallpaper location plus the locations of all cropped instances.
I know a comma-separated field is considered bad design in the relational database world, so would you suggest some other neat and efficient approach?
Use a 1:n relationship between the wallpapermaster and a location table.
Something like this:
CREATE TABLE wallpapermaster (
wallpaperid int unsigned NOT NULL AUTO_INCREMENT,
userid bigint NOT NULL,
wallpaperloc varchar(100) NOT NULL,
wallpapertitle varchar(50) NOT NULL,
wallpaperstatus tinyint DEFAULT '0' COMMENT '0-Waiting,1-approved,2-disapproved',
primary key (wallpaperid)
) ENGINE=InnoDB;
CREATE TABLE wallpaperlocation (
wallpaperid int unsigned NOT NULL,
location varchar(100) NOT NULL,
tinyurl varchar(40),
constraint fk_loc_wp
foreign key (wallpaperid)
references wallpapermaster (wallpaperid),
primary key (wallpaperid, location)
) ENGINE=InnoDB;
The primary key in wallpaperlocation ensures that the same location cannot be inserted twice.
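For illustration, a hedged sketch of how the two tables work together (the paths, URL and IDs are made up):

-- One master row per upload...
INSERT INTO wallpapermaster (userid, wallpaperloc, wallpapertitle)
VALUES (42, '/uploads/original/sunset.jpg', 'Sunset');

SET @wid = LAST_INSERT_ID();

-- ...and one location row per stored file (original plus each cropped size)
INSERT INTO wallpaperlocation (wallpaperid, location, tinyurl) VALUES
    (@wid, '/uploads/original/sunset.jpg',        'http://tinyurl.example/abc1'),
    (@wid, '/uploads/small/sunset-640x480.jpg',   NULL),
    (@wid, '/uploads/medium/sunset-1024x768.jpg', NULL);

-- All locations for one wallpaper:
SELECT m.wallpapertitle, l.location
FROM wallpapermaster m
JOIN wallpaperlocation l ON l.wallpaperid = m.wallpaperid
WHERE m.wallpaperid = @wid;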
Note that int(10) does not define any datatype constraint. It is merely a display-width hint telling client applications how many digits the number has.
Usually you use a fixed location (maybe from a config), a fixed extension (usually jpg) and a specific filename format like [name]-1024x768.jpg. This way you only need to store the name.
In my opinion, using ; or , separators in a simple application is quite a good solution, even in relational databases.
You should probably think about how many split images there will be. If there will be fewer than 5 per wallpaper, I would not take on the overhead of a complex solution.
It's easy to maintain in both the database and the application; you just use string splitting/joining methods.
There is no need to add extra tables that you then have to join to retrieve the values.
Using a simple varchar rather than XML is better because you don't have to rely on the application's database access layer. When you use an ORM or JDBC, you have extra work to do to handle more complex datatypes.
In more complex systems I would use an XML column.
Since the thumbnails are generated automatically from the single uploaded file, you don't need to store paths to the cropped/resized files at all.
Instead you can just use normalized filenames for the thumbnails and then find them in the filesystem, which is what KingCrunch suggested: photo1.jpg, photo1-medium.jpg, etc.
Anyway, my 2 cents: to keep harvesters from traversing your image library (and the generated thumbnails), it's a good idea to hash the name of each thumbnail, even just with MD5 plus some secret key, so that only your program, which knows the key, can build the proper thumbnail path from the original name/path. To other clients, the naming will simply look random.
CREATE TABLE `wallpapermaster` (
`wallpaperid` int(10) unsigned NOT NULL AUTO_INCREMENT,
`userid` bigint(20) NOT NULL,
`wallpapertitle` varchar(50) NOT NULL,
`wallpaperstatus` tinyint(4) DEFAULT '0' COMMENT '0-Waiting,1-approved,2-disapproved',
`tinyurl` varchar(40) NOT NULL
) ENGINE=MyISAM
Create a new table that has a relationship with the wallpapermaster table:
CREATE TABLE `wallpapermaster_mapper` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `wallpapermaster_id` int(10) unsigned NOT NULL, -- foreign key referencing wallpapermaster.wallpaperid
  `wallpaper_path1` varchar(100) NOT NULL,
  `wallpaper_path2` varchar(100) NOT NULL,
  `wallpaper_path3` varchar(100) NOT NULL,
  PRIMARY KEY (`id`)
);
I have ~38 columns for a table.
ID, name, and the other 36 are bit-sized settings for the user.
The 36 other columns are grouped into 6 "settings", e.g. Setting1_on, Setting1_colored, etc.
Is this the best way to do this?
Thanks.
If it must all be in one table and they're all toggle-type settings like yes/no, true/false, etc., use TINYINT to save space.
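A minimal sketch of that layout, reusing the setting names mentioned in the question (the table name is made up):

CREATE TABLE `user_settings` (
    `id`               INT UNSIGNED NOT NULL AUTO_INCREMENT,
    `name`             VARCHAR(64) NOT NULL,
    `setting1_on`      TINYINT(1) NOT NULL DEFAULT 0,
    `setting1_colored` TINYINT(1) NOT NULL DEFAULT 0,
    -- ...the remaining flag columns follow the same pattern
    PRIMARY KEY (`id`)
) ENGINE=InnoDB;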
I'd recommend creating a separate 'settings' table with 36 records, one for each option, then creating a linking table between it and the user table, with a value column to record each user's settings. This creates a many-to-many link for the user settings. It also makes it easy to add a new setting: just add a new row to the 'settings' table. Here is an example schema. I use varchar for the setting value to allow for later settings that might not be bits, but feel free to use TINYINT if size is an issue. This solution will not use as much space as the single-table approach, which carries the danger of a large, sparsely populated set of columns.
CREATE TABLE `user` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(64) DEFAULT NULL,
`address` varchar(64) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `setting` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(64) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `setting_user` (
`user_id` int(11) NOT NULL DEFAULT '0',
`setting_id` int(11) unsigned NOT NULL,
`value` varchar(32) DEFAULT NULL,
PRIMARY KEY (`user_id`,`setting_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
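As a hedged usage sketch (the IDs and value are made up), reading and writing one user's settings then looks like this:

-- All settings for user 123
SELECT s.name, su.value
FROM setting_user su
JOIN setting s ON s.id = su.setting_id
WHERE su.user_id = 123;

-- Turn setting 7 on for user 123 (insert or update, thanks to the composite primary key)
INSERT INTO setting_user (user_id, setting_id, value)
VALUES (123, 7, '1')
ON DUPLICATE KEY UPDATE value = '1';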
It all depends on how you want to access them. If you want to (or must) select just one of them, then go with @Ray's solution. If they can be functionally grouped (really grouped, not some pretend grouping of all the ones whose names start with F), i.e. you'll always need several of them together for a function and reading or writing them as individual flags doesn't make sense, then storing them as ints and using logical operators on them might be the way to go.
That said, unless you are doing a lot of reads and writes to the DB during a session, bundling them up into ints gains you very little performance-wise. It would save some space in the DB if all the options had to exist; if "doesn't exist" means false, it could be a toss-up.
So all things being unequal, I'd go with Mr Ray.
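For completeness, a hedged sketch of the bitmask idea (the table, column and bit assignments are made up):

-- One INT per functional group; here bit 0 = "on", bit 1 = "colored"
ALTER TABLE `user_settings` ADD COLUMN `setting1_flags` INT UNSIGNED NOT NULL DEFAULT 0;

UPDATE `user_settings` SET `setting1_flags` = `setting1_flags` | 1 WHERE `id` = 123;  -- set the "on" bit

SELECT `id` FROM `user_settings` WHERE `setting1_flags` & 3 = 3;  -- both "on" and "colored" set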
MySQL has a SET type that could be useful here. Everything would fit into a single SET, but six SETs might make more sense.
http://dev.mysql.com/doc/refman/5.5/en/set.html
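A minimal sketch of that idea (table and member names are made up):

CREATE TABLE `user_prefs` (
    `user_id`  INT UNSIGNED NOT NULL,
    `setting1` SET('on','colored','bold') NOT NULL DEFAULT '',  -- one SET per settings group
    PRIMARY KEY (`user_id`)
) ENGINE=InnoDB;

-- Rows where the "colored" flag of setting1 is set:
SELECT `user_id` FROM `user_prefs` WHERE FIND_IN_SET('colored', `setting1`);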
I am working on a social-network-type site in PHP. I have done this once before and the site outgrew my coding ability to keep up; that was a couple of years back, and now I want to tackle this project again.
Basically, on my network there is a friend_friend MySQL table that keeps track of who is whose friend; for every confirmed friendship there are 2 entries in the DB.
Here is that table:
CREATE TABLE IF NOT EXISTS `friend_friend` (
`autoid` int(11) NOT NULL AUTO_INCREMENT,
`userid` int(10) DEFAULT NULL,
`friendid` int(10) DEFAULT NULL,
`status` enum('1','0','3') NOT NULL DEFAULT '0',
`submit_date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`alert_message` enum('yes','no') NOT NULL DEFAULT 'yes',
PRIMARY KEY (`autoid`),
KEY `userid` (`userid`),
KEY `friendid` (`friendid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1657259 ;
I then have a user table called friend_reg_user with all the user info.
Then there is a table for the bulletins that users post; the goal is to show only bulletins from users you are friends with.
Here is the bulletins table:
CREATE TABLE IF NOT EXISTS `friend_bulletin` (
`auto_id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(10) NOT NULL DEFAULT '0',
`bulletin` text NOT NULL,
`subject` varchar(255) NOT NULL DEFAULT '',
`color` varchar(6) NOT NULL DEFAULT '000000',
`submit_date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`status` enum('Active','In Active') NOT NULL DEFAULT 'Active',
`spam` enum('0','1') NOT NULL DEFAULT '1',
PRIMARY KEY (`auto_id`),
KEY `user_id` (`user_id`),
KEY `submit_date` (`submit_date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=455144 ;
OK, so to do this I would either run a query on the friend_friend table to get all of a user's friends, build them into a string like 1,2,3,4,5,6 (those being friend ID numbers), and then select from the bulletin table where the bulletin author's ID is in that friend ID list.
The second method is to use JOINS to get all this data at once.
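For reference, a hedged sketch of that JOIN approach against the tables above (assuming the logged-in user's ID is 123):

SELECT b.*
FROM friend_friend f
JOIN friend_bulletin b ON b.user_id = f.friendid
WHERE f.userid = 123
  AND f.status = '1'            -- confirmed friendships only
  AND b.status = 'Active'
ORDER BY b.submit_date DESC
LIMIT 20;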
Now my actual question: once the site gets very large, when there are millions of friend records and bulletins in the DB, this all slows down. What are my options to speed things up? Is there a better way to do this? I am also planning to expand bulletins to cover more than just bulletins and include more user actions, like the big sites do now, so it will show status updates, blogs, bulletins and so on.
What you are looking to do can likely be done in a number of ways. You can have a summary rollup table that combines all of the associated data (friends in this instance) for a given member.
That is a pretty basic approach but it can become much more sophisticated.
Summary rollups act as a persistent caching mechanism. You'll have to keep them up to date by some method: a cron job, MapReduce, etc. You don't want to compute all that data every time you need it; instead, compute it at regular intervals so that it is ready quickly.
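A minimal sketch of what such a rollup might look like here, assuming a hypothetical friend_feed table that a cron job rebuilds or appends to:

-- One row per (viewer, bulletin); refreshed periodically, not on every page view
CREATE TABLE friend_feed (
    viewer_id   INT NOT NULL,
    bulletin_id INT NOT NULL,
    submit_date DATETIME NOT NULL,
    PRIMARY KEY (viewer_id, bulletin_id),
    KEY idx_viewer_date (viewer_id, submit_date)
) ENGINE=MyISAM;

-- Refresh step: copy each user's friends' active bulletins into the rollup
INSERT IGNORE INTO friend_feed (viewer_id, bulletin_id, submit_date)
SELECT f.userid, b.auto_id, b.submit_date
FROM friend_friend f
JOIN friend_bulletin b ON b.user_id = f.friendid
WHERE f.status = '1' AND b.status = 'Active';

-- The read path then becomes a single indexed range scan
SELECT bulletin_id
FROM friend_feed
WHERE viewer_id = 123
ORDER BY submit_date DESC
LIMIT 20;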
Memcache is a great tool for caching, but it caches data that has to be computed at some point anyway. Unfortunately, Memcache is not persistent, which means that if the memcached server or service dies, so does your data.
You can explore some advanced, cutting-edge technologies such as MongoDB, CouchDB, Project Voldemort and Neo4j for even more efficient tools.
I'd also recommend looking at the source code of the open-source, PHP-based social network Elgg at http://www.elgg.org/
Facebook uses memcached as a distributed hash table in front of its SQL databases. That's probably your best bet.