I am using PHP 5.3 and PostgreSQL 9.1.
At present I do the database work "outside" the database: PHP fetches data from the DB, processes it, and finally inserts/updates/deletes it. But as I am getting comfortable working with PostgreSQL functions, I have started coding in PL/pgSQL.
Now I would like to know: is there any speed difference between the two, or can I use whichever I am comfortable with?
Also, will the answer be the same for higher versions, i.e. PHP 5.5 and PostgreSQL 9.3?
It depends on what you do. PL/pgSQL is optimized for data manipulation; PHP is optimized for producing HTML pages. Some of the underlying technology is similar, and the speed of basic constructs is similar. PHP is significantly faster at string manipulation, but PL/pgSQL runs in the same address space as the PostgreSQL database engine and uses the same data types, so there is zero overhead from data type conversions and interprocess communication.
Stored procedures have strong opponents and strong defenders. As with any other technology, if you can use it well, it can serve small and large projects perfectly. It is good for decomposition: it naturally divides an application into a presentation (interactive) layer and a data manipulation layer. That matters for data-centric applications and less for presentation-centric applications. And even the opponents agree that stored procedures are sometimes necessary for performance reasons.
I disagree with kafsoksilo: debugging, unit testing, and maintenance are not an issue once you have some knowledge of this technology; you can use almost all the tools you already know. And PL/pgSQL is a pretty powerful language (for the data manipulation area): well documented, with good diagnostics, clean and readable error messages, and minimal issues.
PL/pgSQL is faster because you don't have to fetch the data, process it, and then submit a new query. The whole process is done internally, and it is also precompiled, which further boosts performance.
Moreover, when the database is on a remote server rather than local, you have the network round-trip delay. Sometimes the round-trip delay is longer than the time your whole script needs to run.
For example, if you need to execute 10 queries over a slow network, using PL/pgSQL to execute only one would be a great improvement.
If the processing you are going to perform fetches large chunks of data and outputs just a true or false, the PL/pgSQL gain will be even greater.
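As an illustration, here is a minimal sketch of that idea, assuming a hypothetical orders table with status and created_at columns; the function executes entirely inside the server, so one call replaces a whole fetch-process-update loop in PHP:

<?php
// Hypothetical sketch: consolidate a fetch/process/update loop into a
// single PL/pgSQL function call, paying one network round trip instead
// of many. Table and column names are invented for illustration.
$pdo = new PDO('pgsql:host=localhost;dbname=shop', 'user', 'pass');

// One-time setup: the function runs inside the database engine.
$pdo->exec(<<<'SQL'
CREATE OR REPLACE FUNCTION archive_old_orders(cutoff date)
RETURNS integer AS $$
DECLARE
    moved integer;
BEGIN
    UPDATE orders SET status = 'archived'
    WHERE created_at < cutoff AND status = 'open';
    GET DIAGNOSTICS moved = ROW_COUNT;
    RETURN moved;
END;
$$ LANGUAGE plpgsql;
SQL
);

// A single round trip does all the work server-side.
$stmt = $pdo->prepare('SELECT archive_old_orders(?)');
$stmt->execute(array('2013-01-01'));
echo $stmt->fetchColumn() . " orders archived\n";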
On the other hand, using PL/pgSQL and putting logic in the database makes your project a lot more difficult to debug, to fix, and to unit test. It also makes it much harder to change the RDBMS in the future.
My suggestion would be to manipulate the data in PHP, and use a little PL/pgSQL only when you want to isolate some logic for security or data-integrity reasons, or when you want to tune the project for the best possible performance (which should be a concern after the first release).
Just curious how other people feel about this. I'll appreciate opinions or facts, whatever you've got :)
I am working on an application where a lot of info is pulled from MySQL and needed on multiple pages.
Would it make more sense to...
Pull all data ONCE and store it in SESSION variables to use on other pages
Pull the data from the database on each new page that needs it
I assume the preferred method is #1, but maybe there is some downside to using SESSION variables "too much"?
A somewhat related side question: as far as URLs go, is it preferable to have data stored in them (e.g. domain.com/somepage.php?somedata=something&otherdata=thisdata) or to use SESSION variables to store that data so the URLs can stay general/clean (e.g. domain.com/somepage.php)?
Both are probably loaded questions but any possible insight would be appreciated.
Thanks!
Your question can't be answered in a way that applies everywhere.
Here's why: many web server architectures have the HTTP server (Apache, Nginx), the server-side language (PHP, Ruby, Python), and the RDBMS (MySQL, PostgreSQL) on one and the same machine.
That's one of the most common setups you can find.
Now, this is what happens in your scenario:
You connect to MySQL: you establish a connection from PHP to MySQL, and that "costs" a little
You request the data, so MySQL reads it from the hard drive (unless it's cached in RAM)
PHP gets the data and allocates some memory to hold it
Now you save that to a session. But by default sessions are disk-based, so you just issued a write operation and spent at least one I/O operation on your hard drive
But look at what happened: you moved some data from disk (MySQL) to RAM (a PHP variable), which then gets saved to disk again.
You really didn't help yourself or your system in that case; what happened is that you made things slower.
On the other hand, PHP (and other languages) can maintain persistent connections to MySQL (and other databases), minimizing the cost of opening a new connection (which is already inexpensive in the grand scheme of things).
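For instance (a minimal sketch; DSN and credentials are placeholders), PDO can be asked to reuse connections across requests:

<?php
// Sketch: a persistent connection lets PHP reuse an already-open
// MySQL connection across requests instead of paying the setup cost
// every time. Host, database, and credentials are placeholders.
$pdo = new PDO(
    'mysql:host=localhost;dbname=app',
    'user',
    'pass',
    array(PDO::ATTR_PERSISTENT => true)  // reuse the connection
);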
As you can see, this is one scenario. There's another scenario where you have your HTTP server on a dedicated machine, PHP on a dedicated machine, and MySQL on a dedicated machine. The question is, again, whether it is cheaper to move data from MySQL to a PHP session. Is that session disk-based, Redis-based, Memcached-based, database-based? What's the cost of establishing the connection to MySQL?
What you need to ask, in any scenario you can imagine, is: what are you trading off, and for what?
So if you are running the most common setup (PHP and your database on the same machine), the answer is no, it's not better to store MySQL data in a session.
If you use InnoDB (and you probably are) and it's optimized properly, saving data to a session to avoid the apparent overhead of read queries won't yield benefits. Most likely it will be quite the opposite.
Putting it into the session is almost always a terrible idea. It's not even worth considering unless you've exhausted all other options.
Here's how you tackle these problems:
Evaluate whether there's anything you can do to simplify the query you're running, like trimming down the columns you fetch. Instead of SELECT * try SELECT x, y where those are the only columns you need.
Use EXPLAIN to find out why the query is taking so long. Look for any easy wins like adding indexes.
Check that your MySQL server is properly tuned. The default configuration is terrible and some simple one-line fixes can boost performance dramatically.
If, and only if, you've tried all these things and you can't squeeze out any more performance, you want to try and cache the results.
Caching comes last because it is one of the hardest things to get right.
You can use something like Memcached or Redis to act as a faster store for pre-fetched results. They're designed to automatically expire cached data that's no longer used.
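A minimal read-through cache sketch with the Memcached extension (the key name, query, and TTL are invented for illustration): serve from memory when possible, fall back to MySQL, and store the result with an expiry so stale entries age out on their own.

<?php
// Sketch: check the cache first, query MySQL only on a miss, and
// cache the result with a TTL. Names and values are illustrative.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$key  = 'user_profile_42';
$data = $mc->get($key);

if ($data === false && $mc->getResultCode() === Memcached::RES_NOTFOUND) {
    $stmt = $pdo->prepare('SELECT id, name, email FROM users WHERE id = ?');
    $stmt->execute(array(42));
    $data = $stmt->fetch(PDO::FETCH_ASSOC);
    $mc->set($key, $data, 300);  // expire after five minutes
}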
The reason using $_SESSION is a bad idea is that once data is put in there, very few people take the time to properly expunge it later, leading to an ever-growing session. If you're concerned about performance, keep your sessions as small as possible.
Also think about your users (the client PC): sessions take up some space, and a session can get lost, for example after closing the page or when copying the link and pasting it into another browser. Good practice, I think, is to just use queries; but note one thing: reduce the number of queries per page as much as possible, since too many will slow down your site.
I am about to create a PHP web project with a large database. The database will be MySQL and will store more than 30,000 records per day. To optimize the DB I thought of using the Memcached library with it. Am I going the correct way, or can some other alternative be used to overcome the data optimization problem? I just want to provide faster retrieval and insertion. Can somebody advise me which tool I should use, and how, as the data will gradually increase at a higher rate? Should I use an object-relational mapping concept too?
You can use the master/slave technique for this purpose. Basically it is a combination of two databases: one handling write operations (the master) and the other handling reads (the slave).
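A hedged sketch of what that split looks like from PHP (hostnames and the table are placeholders): writes go to the master, reads to the replica.

<?php
// Illustrative read/write split for MySQL replication. Writes must hit
// the master; reads may be served by a slave (mind replication lag).
$master = new PDO('mysql:host=db-master;dbname=app', 'user', 'pass');
$slave  = new PDO('mysql:host=db-slave;dbname=app', 'user', 'pass');

$write = $master->prepare('INSERT INTO events (payload) VALUES (?)');
$write->execute(array('signup'));

$count = $slave->query('SELECT COUNT(*) FROM events')->fetchColumn();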
I'd side with @halfer and say he's right about the test data. At least you'll know that you're not trying to optimize something that doesn't need optimizing.
On top of test data you'll also need test scenarios that mimic the traffic patterns of your production environment. That's the hard part, and it really depends on the exact application patterns: how many reads versus writes versus updates per second.
Given your number (30k records per day), you'd average out at well under one insert per second (30,000 / 86,400 ≈ 0.35), which even the cheapest machines could handle with ease. As for reads, a year's worth of data would be just under 11M records. You may want to partition the data (at the MySQL level or the application level) if lookups become slow, but I doubt you'd need to with such relatively small volumes. The real difference-maker would be if the number of reads were 1000x the number of inserts; then you could look into what @ram sharma suggested and set up a replicated master-slave model where the master takes all the writes and the slaves are read-only.
Memcached is a powerful beast when used correctly and can turn a slow DB disk read into a blazing-fast memory read. I'd still only suggest you look into it if the DB is too slow. Adding moving parts to any application also adds potential failure points and increases the overall complexity.
EDIT: as for using an ORM, that's your choice, and it really won't change anything concerning the DB's speed, although it may add fractions of milliseconds for the end user; usually worth it, in my experience.
Cheers --
Recently I started an eCommerce project and I need to use data mining. Simply put, my question is which solution I should use in development:
MySQL with PHP
SQL Server with ASP
Actually MySQL is a good solution and suitable for my project for many reasons, but is it good and optimal for data mining? I'm a beginner in data mining and I'll develop this as part of my project. Are there good support tools for it?
SQL databases play little role in data mining. (That is, unless you consider computing various business reports involving averages to be "data mining"; IMHO those should at most be called "business analytics".)
The reason is that the advanced statistics performed for data mining can't be accelerated by database indexes. And they usually take much longer than interactive users would be willing to wait.
So in the end, most actual data mining happens "offline", outside of a database. The database may serve as the initial data store, but the actual data mining process is then usually: 1. load data from the database, 2. preprocess the data, 3. analyze the data, 4. present the results.
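For step 1, a minimal sketch of getting the data out of MySQL into a flat file that an offline tool such as R or SciPy can consume (table and column names are invented):

<?php
// Sketch: export a table to CSV for offline analysis. The mining
// itself then happens outside the database, as described above.
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');
$out = fopen('orders.csv', 'w');
fputcsv($out, array('customer_id', 'total', 'created_at'));

$stmt = $pdo->query('SELECT customer_id, total, created_at FROM orders');
while ($row = $stmt->fetch(PDO::FETCH_NUM)) {
    fputcsv($out, $row);
}
fclose($out);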
I know there are some SQL extensions such as DMX ("Data Mining eXtensions"). But seriously, that isn't really data mining. It is an interface for invoking some basic prediction functionality, but nothing general. Any good data mining will require customizing the process, and you can't do that with a DMX one-liner.
The fact is, the most important tools for data mining are R and SciPy, followed by specialized tools such as RapidMiner, Weka, and ELKI. Why? Because R and Python are best for scripting; it's all about customizing the process. Forget any push-button solution; they just don't work reasonably well yet.
You just can't reasonably train, for example, a support vector machine "inside" a SQL database (and even less inside a NoSQL database, which is usually not much more than a key-value store). Also, don't underestimate the need to preprocess your data: in fact, you will be training on a copy of the data set anyway. You might as well put that copy into the data format that is most efficient for your actual data mining process, instead of keeping it in a random-access general-purpose database store.
I would say pick the language you and your team feel more comfortable with. There are goods and not-so-goods on both sides; I reckon you should do a bit of research before you pick a path, keeping your business needs in mind.
I'm currently building a web application that needs to access a (really) big database of strings and compare them to a given 'request string'. This has to be done over and over again using different comparison methods (number of identical words, order of words...) and should be scalable and, more importantly, fast.
I thought about implementing the comparison method itself in C, because it's obviously much faster than interpreted, more 'webby' languages like PHP.
This brought me to three questions:
1) How am I supposed to 'connect' the C application to the web server (currently Apache)? I thought about the usual CGI way, but because of its need to create a new process per request it would be less scalable and fast; at least that's what I've read about it.
2) Which database technology is best to use with C for this use-case?
3) Last but not least: do you think it's worth the struggle, or would it be enough to go the usual way and build a PHP script that connects to a MySQL database? How big is the speed difference?
Thanks in advance,
David
Bad application architecture, bad database design, and bad code will always run slowly and won't scale.
If you get that out of the way, most "very high demand" purposes can be served by any of the interpreted languages. Remember, they're optimized at what they do (wasteful with memory, for example, but usually pretty fast even for high-demand use).
Having said that, we get to the real answer:
In database design there is no perfect approach for all use cases. You may need to structure your database one way to achieve the best read speed, another way to achieve the best write speed, and yet another to achieve the best flexibility (sacrificing both read and write speed). One section may need high read speed, another high write speed, and yet another high flexibility.
Think of the way you designed the database and ask yourself: "Do I need to connect to the database 300 times in a session to gather all of the data, or could I write one big statement that reads it all at once?" If that is not easily achievable, consider whether an SQL stored procedure could do it; and if you come up empty again, consider whether you could change the structure (sacrificing flexibility, for example) to allow for a one-liner read or a stored-procedure read.
In any case, many, many connections from PHP to MySQL, each sending one query, means you spend a hefty amount of time waiting for PHP to connect to the MySQL server over the network (even if it's local), for MySQL to process the request and supply an answer, and so on and so forth. If you can batch-generate all of the statements you intend to send into one string, that's great; if not, grouping them into smaller batches is OK as well.
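As a small illustration of the batching idea (table and column names are invented), one multi-row INSERT can replace several single-row queries:

<?php
// Sketch: collapse N single-row INSERTs into one statement so only
// one network round trip is paid. Table and columns are placeholders.
$pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$rows = array(array('alice', 10), array('bob', 20), array('carol', 30));

$placeholders = implode(',', array_fill(0, count($rows), '(?, ?)'));
$params = array();
foreach ($rows as $r) {
    $params[] = $r[0];
    $params[] = $r[1];
}

// One "INSERT ... VALUES (...),(...),(...)" instead of three queries.
$stmt = $pdo->prepare("INSERT INTO scores (name, points) VALUES $placeholders");
$stmt->execute($params);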
If all of the above is unacceptable, or if you really have a knack for writing a C program today (a laudable intent by all means):
You should consider writing a C MySQL module defining some UDFs (user-defined functions). They have access to row-level data at the moment MySQL reads it, and they can process it for output, aggregate it, and all that jazz.
Writing an Apache module or a PHP module will deliver data in much the same format as the MySQL module, but the processing won't run inside the MySQL server; it will be requesting data from MySQL, processing it, and requesting more data.
Apache-2.4.x has mod_dbd, which can talk to various database back-ends (using precompiled queries for speed) straight out of Apache config-files. This can be used in, for example, mod_rewrite's rules to alter incoming requests depending on results of the queries.
In other words, depending on the details of your application, you may be able to avoid coding altogether.
If you do have to write code, however, I'd suggest you first implement the prototype in the language you know best. There is a good chance it will be "fast enough", or that the bulk of the time spent per request will be on the database side, making the choice of language irrelevant.
Connecting to a database back-end from C is easy -- each database vendor has (at least one) client-library implementation for C-programs.
If your app does not require multiple HTTP servers talking to the same (single) database, then your best DB would be a local one: SleepyCat's (now Oracle's) "db", gdbm, or SQLite3.
Also, if updates (changes to the database) are infrequent, you can use flat files and build the database in memory. This limits the database's maximum size, but it will be the fastest option and lets you avoid handling (re)connections, etc.
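A minimal sketch of that flat-file-to-memory idea, shown in PHP for brevity (the C SQLite API follows the same open/prepare/bind pattern); the file name and schema are invented:

<?php
// Sketch: build an in-memory SQLite database once from a flat file.
// Suitable when updates are rare and the data set fits in RAM.
$db = new SQLite3(':memory:');
$db->exec('CREATE TABLE phrases (id INTEGER PRIMARY KEY, text TEXT)');

$stmt = $db->prepare('INSERT INTO phrases (text) VALUES (:t)');
foreach (file('phrases.txt', FILE_IGNORE_NEW_LINES) as $line) {
    $stmt->bindValue(':t', $line, SQLITE3_TEXT);
    $stmt->execute();
    $stmt->reset();
}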
EDIT: memcache is an external database; whether it runs on the same system or a remote one, you have to connect to it, talk to it, and copy data from its memory into yours. If database changes aren't frequent, you are better off keeping the data in your own memory and using it straight from there.
I've been writing asp.net apps with SQL Server back ends for the past 10 years. During that time, I have also written some PHP apps, but not many.
I'm going to be porting some of my asp.net apps to PHP and have run into a bit of an issue. In the Asp.net world, it's generally understood that when accessing any databases, using views or stored procedures is the preferred way of doing so.
I've been reading some PHP/MySQL books and I'm beginning to get the impression that utilizing stored procedures in MySQL is not advisable. I hesitate to use that word, advisable, but that's just the feeling I get.
So, the advice I'm looking for is basically, am I right or wrong? Do PHP developers use stored procedures at all? Or, is it something that is shunned?
Whether to use stored procedures or not is more a religious or political discussion for the bar than a technical one.
What you need to do is clearly define your application layers and not step over those boundaries. Stored procedures have several advantages and disadvantages compared to running queries outside the database.
Advantage 1: Stored procedures are modular. This is a good thing from a maintenance standpoint. When query trouble arises in your application, you would likely agree that it is much easier to troubleshoot a stored procedure than an embedded query buried within many lines of GUI code.
Advantage 2: Stored procedures are tunable. By having procedures that handle the database work for your interface, you eliminate the need to modify the GUI source code to improve a query's performance. Changes can be made to the stored procedures--in terms of join methods, differing tables, etc.--that are transparent to the front-end interface.
Advantage 3: Stored procedures abstract or separate server-side functions from the client-side. It is much easier to code a GUI application to call a procedure than to build a query through the GUI code.
Advantage 4: Stored procedures are usually written by database developers/administrators. Persons holding these roles are usually more experienced in writing efficient queries and SQL statements. This frees the GUI application developers to utilize their skills on the functional and graphical presentation pieces of the application. If you have your people performing the tasks to which they are best suited, then you will ultimately produce a better overall application.
With all that in mind, there are several disadvantages.
Disadvantage 1:
Applications that involve extensive business logic and processing could place an excessive load on the server if the logic was implemented entirely in stored procedures. Examples of this type of processing include data transfers, data traversals, data transformations and intensive computational operations. You should move this type of processing to business process or data access logic components, which are a more scalable resource than your database server.
Disadvantage 2:
Do not put all of your business logic into stored procedures. Maintenance and the agility of your application become an issue when you must modify business logic in the stored-procedure language. For example, ISV applications that support multiple RDBMSs should not need to maintain separate stored procedures for each system.
Disadvantage 3:
Writing and maintaining stored procedures is most often a specialized skill set that not all developers possess. This situation may introduce bottlenecks in the project development schedule.
I have probably missed some advantages and disadvantages; feel free to comment.
It could also be because MySQL didn't get stored procedures until version 5. If you use prepared statements you should be okay; just don't use inline SQL.
A couple of years ago I ended up writing a fair amount (~3K lines) of stored procedure code for a PHP/MySQL project. In my experience:
MySQL stored procedures probably aren't going to help you performance-wise.
Executing SPs through prepared statements with MySQLi can cause headaches (a sketch of the calling pattern follows this answer).
It can be hard to abstract out common patterns; I found myself repeating myself more than I liked.
Depending on the MySQL version and configuration, you might need SUPER privileges to create SPs.
If you're porting code that uses stored procedures, it might be easiest to keep them. It's certainly possible to use them with PHP and MySQL, and I wouldn't personally call it inadvisable, exactly. I just probably wouldn't choose to use them again if I were starting a new PHP project from scratch.
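For reference, a hedged sketch of calling a stored procedure through a MySQLi prepared statement (the procedure name and parameter are invented); note the trailing result set that CALL produces, which is one of the headaches mentioned above:

<?php
// Sketch: invoke a stored procedure with a bound parameter, read the
// rows, then drain the extra result set that CALL returns.
$db = new mysqli('localhost', 'user', 'pass', 'app');

$userId = 42;
$stmt = $db->prepare('CALL get_user_orders(?)');
$stmt->bind_param('i', $userId);
$stmt->execute();

$result = $stmt->get_result();  // requires the mysqlnd driver
while ($row = $result->fetch_assoc()) {
    // ... use $row ...
}
while ($stmt->more_results()) {  // CALL leaves a trailing result set
    $stmt->next_result();
}
$stmt->close();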
In general, I very much dislike stored procedures because:
It's too easy to slip in business logic that shouldn't be there.
Updates to the application that require updates to your stored procedures are a pain to synchronize, especially if you have to revert to a previous build.
For any database manipulation, I recommend going with a PHP ORM framework like http://www.doctrine-project.org or a framework that includes ORM like CakePHP. You'll have an added bonus of being able to more easily switch between SQL Server and MySQL.
Stored procedures are -- often -- a complete waste of effort.
When in doubt, actually measure the performance. You'll often find that stored procedures add complexity for no recognizable benefit, and you may see no performance enhancement from your SPs at all.
Some folks think they're very "important". It's essential to actually measure performance rather than quibble or debate.
Many (most?) web apps use a database abstraction layer to take care of injection vulnerabilities, etc.
If you want one for your own app, take a look at PDO. Here is a big tutorial about how to use it:
http://www.devshed.com/c/a/PHP/Using-PDO-Objects-in-PHP-5/
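A minimal sketch of the pattern the tutorial teaches (connection details are placeholders): the SQL text and the user-supplied value travel separately, which is what defuses injection.

<?php
// Sketch: a PDO prepared statement; the parameter is never spliced
// into the SQL string, so it cannot change the query's structure.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$email = isset($_GET['email']) ? $_GET['email'] : '';
$stmt  = $pdo->prepare('SELECT id, name FROM users WHERE email = :email');
$stmt->execute(array(':email' => $email));
$user = $stmt->fetch(PDO::FETCH_ASSOC);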
What about SQL injection?
Procedures let you pass parameters into the WHERE clause, reducing injection hazards.
Here is a balanced and informed article on stored procedures in MySQL: http://www.linuxjournal.com/article/9652?page=0,0
To dismiss them out of hand as a "waste of time", "hard to maintain", or "providing no real benefit" in any database and application of significant size would be very unwise.