I recently started an eCommerce project and I need to use data mining. Simply put, my question is which solution I should use for development:
MySQL with PHP
SQL Server with ASP
Actually, MySQL is a good solution and suitable for my project for many reasons, but is it also a good and optimal choice for data mining? I'm a beginner in data mining and I'll develop this as part of my project. Are there good supporting tools for it?
SQL databases play little role in data mining. (That is, unless you consider computing various business reports involving averages as "data mining", IMHO these should at most be called "business analytics").
The reason is that the advanced statistics performed for data mining can't be accelerated by the database indexes. And usually, they also take much longer than interactive users would be willing to wait.
So in the end, most actual data mining happens "offline", outside of a database. The database may serve as initial data storage, but the actual data mining process then usually is 1. load data from database, 2. preprocess data, 3. analyze data, 4. present results.
I know that there exist some SQL extensions such as DMX ("Data Mining Extensions"). But seriously, that isn't really data mining. That is an interface for invoking some basic prediction functionality, but nothing general. Any good data mining will require customization of the process, and you can't do that with a DMX one-liner.
Fact is, the most important tools for data mining are R and SciPy. Followed by the specialized tools such as RapidMiner, Weka and ELKI. Why? Because R and Python are best for scripting. It's ALL about customization of the process. Forget any push-button solution, they just don't work reasonably well yet.
You just can't reasonably train e.g. a support vector machine "inside" an SQL database (and even less so inside a NoSQL database, which usually is not much more than a key-value store). Also, don't underestimate the need to preprocess your data. So in practice, you will be training on a copy of the data set. You might as well put that copy into whatever data format is most efficient for your actual data mining process, instead of keeping it in a random-access, general-purpose database store.
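To make that last point concrete, here is a minimal PHP sketch of the "work on a copy" step, assuming a hypothetical orders table and PDO credentials; the resulting CSV can then be fed to R, SciPy, RapidMiner, Weka or ELKI:

<?php
// Minimal sketch: export a working copy of the data to CSV for offline analysis.
// Table name, columns and credentials are hypothetical.
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

$out = fopen('orders_export.csv', 'w');
fputcsv($out, ['order_id', 'customer_id', 'total', 'created_at']);   // header row

$stmt = $pdo->query('SELECT order_id, customer_id, total, created_at FROM orders');
while ($row = $stmt->fetch(PDO::FETCH_NUM)) {
    fputcsv($out, $row);                                              // one record per line
}
fclose($out);

From there, preprocessing and model training happen entirely outside the database, exactly as described above.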
I would say pick the language you and your team feel more comfortable with; there are pros and cons on both sides. I reckon you should do a bit of research before you pick a path, keeping your business needs in mind.
Currently, I am working on a website and have just started studying backend development. I wonder why nobody uses JSON as a database. Also, I don't quite get the utility of PHP and SQL. Since I could easily get data from a JSON file and use it, why do I need PHP and SQL?
OK! Let's assume you put the data in a JSON variable and store it in a file for all your projects.
Obviously, you need a subsystem for taking backups, so you will write one.
You must improve performance for handling very large amounts of data, with things like indexing, hash algorithms, and so on; let's assume you handle that too.
If you need APIs for working with and connecting from a variety of programming languages, you will need to write them.
What about functionality? What if you need to add triggers, stored procedures, views, full-text search and so on? OK, you will spend your time and add them.
Good job, but your system will grow and you will need to scale it. Can you do that? You will have to build support for clustering across servers, sharding, and more.
Now you need to guarantee that your system complies with the ACID rules: Atomicity, Consistency, Isolation and Durability.
Can you always handle all querying techniques (Map/Reduce) and respond quickly with a standard result structure?
Now it's time to offer very fast write speeds, which brings serious issues of its own.
Then work out your solutions for race conditions, isolation levels, locking, relations and so on.
After you do all this work, plus thousands of other things, you will probably end up with a DBMS a little bit like MongoDB or one of the other relational and non-relational databases!
So it's better to use them. Obviously, you can still choose not to; I admit that sometimes saving data in a single file has better performance, but only sometimes, in some cases, with some data, for some purposes. If you know exactly what you are doing, then it is OK to save data in a JSON file.
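To make the trade-off concrete, here is a minimal PHP sketch (the file name and record shape are made up) of treating a JSON file as a "database". Even this toy version already has to lock the file and rewrite it completely on every change; indexing, partial updates, concurrency beyond a single lock, and everything else listed above would still be up to you:

<?php
// Naive JSON-file "database": read everything, modify in memory, write everything back.
// File name and record shape are hypothetical.
function addUser(string $path, array $user): void
{
    $fp = fopen($path, 'c+');            // create if missing, open for read/write
    flock($fp, LOCK_EX);                 // exclusive lock so concurrent writes don't clobber each other

    $raw   = stream_get_contents($fp);
    $users = $raw ? json_decode($raw, true) : [];

    $users[] = $user;                    // no indexes, no constraints, no partial update

    ftruncate($fp, 0);                   // rewrite the whole file every time
    rewind($fp);
    fwrite($fp, json_encode($users, JSON_PRETTY_PRINT));

    flock($fp, LOCK_UN);
    fclose($fp);
}

addUser('users.json', ['id' => 1, 'name' => 'Alice']);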
I finished creating an accounting web application for an organization using CodeIgniter and a MySQL database, and I have just submitted it to them. They liked the work, but they asked me how they would transfer their old manual data to the new online system, so that their members would be able to see their account balances and contribution history.
This is a major problem for me because most of my tables make use of 'referential integrity' to ensure data synchronization and would not support the style of their manual accounting.
I know a lot of people here have faced cases like this, and I would love to know the best way to collect users' history. I also know this might be flagged as not a real question, but I really have to ask people with experience.
I would appreciate all answers. Thanks (and downvotes too).
No matter what the case is, data conversions are very challenging and almost always time consuming. Depending on the consistency of the data in question, it may be the case that about 80% of the data will transfer over neatly if you create a conversion program in PHP. That conversion code in and of itself may be more time consuming than it is worth. If you are talking hundreds of thousands of records and beyond, it is probably a good idea to make that conversion program work. Anyone who might suggest there is a silver bullet is certainly not correct.
Here are a couple of suggested steps:
(Optional) Export your Excel spreadsheets to Access. Access can help you to standardize data and has tools in place to help you locate records which have failed in some way. You can also create filters in Access if you need to. The benefit of taking this step, if you are familiar with Access, is that you have already begun the conversion process to a database. As a matter of fact, if you so desire, you can import your MySQL database information into Access as well. The benefit of this is pretty obvious: You can create a query and merge your two separate tables together to form one table, which could save you a great deal of coding.
Export your Access table/query into a CSV file (note, if you find it is overkill or if you don't have Access, you can skip step 1 and simply save your .xls or .xlsx file to type .csv. This may require more legwork for your PHP conversion code but that is probably a matter of preference. Some people prefer to avoid Access as much as possible, and if you don't normally use it you will be wasting time trying to learn it just to save yourself a little bit of time).
Utilize PHP's built-in str_getcsv function. It parses a single CSV line into a PHP array, so combined with file() it will convert a CSV file into an array of rows (see the sketch after this list).
Create your automated program to parse through each record. Based on the column and its requirements, you can either accept or reject records. You can then export your data, such as was done in this SO answer, back to CSV. You can save two different CSV files, one with accepted records, and one with rejected records.
With rejected records, which are all but inevitable when transferring from a spreadsheet, you will need to have a course of action. The simplest way for your clients is probably to give them a procedure to either manually import records into the database, if you've given them an interface to do so, or - probably simpler but requiring more back-and-forth - to update the records in Excel to be compliant with the new system.
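A minimal sketch of the last two steps, assuming a made-up column layout and a single illustrative validation rule (the third column must be a numeric amount):

<?php
// Load the CSV into an array of rows, then split into accepted/rejected files.
// Column layout and the validation rule are hypothetical.
$rows   = array_map('str_getcsv', file('legacy_export.csv', FILE_IGNORE_NEW_LINES));
$header = array_shift($rows);

$accepted = fopen('accepted.csv', 'w');
$rejected = fopen('rejected.csv', 'w');
fputcsv($accepted, $header);
fputcsv($rejected, $header);

foreach ($rows as $row) {
    // Example rule: the third column holds the amount and must be numeric.
    $isValid = isset($row[2]) && is_numeric($row[2]);
    fputcsv($isValid ? $accepted : $rejected, $row);
}

fclose($accepted);
fclose($rejected);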
Edit
Based on the thread under your question which sheds more light on what you are trying to do (i.e., have a parent for each transaction that is an accepted loan), you should be able to contrive a parent field, even if it is not complete, by creating a parent record for each set of transactions based around an account. You can do this via Access, PHP, or, more likely, a combination.
Conclusion
Long story short, data conversions take time. If you put the time in up front, it will be far easier to maintain a standardized series of information in the long run. If you find something which takes less time in the beginning, it will mean additional work for you in the long run in order to make this "simple" fix work over time.
Similarly, the closer you can get legacy data to conform to your new data, the easier it will be for your clients to perform queries etc. While this may mean that some manual entry will be required on the part of you or your client, it is better to inform the client of the pros and cons of each method fully and let them decide. My recommendation would always be to put extra work in at the front-end because it almost always ends up cheaper than having to deal with a quick fix in the long run, but that is not always practical given real world constraints.
I am using PHP 5.3 and PostgreSQL 9.1.
Presently, I am doing DB work "outside" the DB in PHP: fetching data from the DB, processing it, and finally inserting/updating/deleting in the DB. But as I am getting comfortable working with PostgreSQL functions, I have started coding in PL/pgSQL.
Now, I would like to know whether there is any speed difference between the two, or whether I can use whichever I am comfortable with.
Also, will the answer be the same for higher versions, i.e. PHP 5.5 and PostgreSQL 9.3?
It depends on what you do. PL/pgSQL is optimized for data manipulation, while PHP is optimized for producing HTML pages. Some of the underlying technology is similar, and the speed of basic constructs is similar. PHP is significantly faster at string manipulation, but PL/pgSQL runs in the same address space as the PostgreSQL database engine and uses the same data types, so there is zero overhead from data type conversion and interprocess communication.
Stored procedures have strong opponents and strong defenders. Like any other technology, if you can use it well, it can serve small and large projects perfectly. It is good for decomposition: it naturally divides the application into a presentation (interactive) layer and a data manipulation layer. This is important for data-centric applications and less important for presentation-centric applications. Even opponents agree that stored procedures are sometimes necessary for performance reasons.
I disagree with kafsoksilo: debugging, unit testing and maintenance are not really an issue. When you have knowledge of this technology, you can use almost all the tools you already know. And PL/pgSQL is a pretty powerful language (for the data manipulation area): well documented, with good diagnostics, clean and readable error messages, and minimal issues.
PL/pgSQL is faster, as you don't have to fetch the data, process it and then submit a new query. The whole process is done inside the database, and the function is also precompiled, which further boosts performance.
Moreover, when the database is on a remote server rather than local, you will have the network round-trip delay. Sometimes that delay is higher than the time your whole script needs to run.
For example, if you need to execute 10 queries over a slow network, using PL/pgSQL to execute only one would be a great improvement.
If the processing you are going to perform fetches large chunks of data and outputs only a true or false, then the PL/pgSQL gain will be even greater.
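As a contrived illustration of the round-trip point, assuming a hypothetical accounts table and a hypothetical apply_interest() PL/pgSQL function (shown in the comment), compare many small queries from PHP with a single call that does the work server-side:

<?php
// Hypothetical example of the network round-trip argument.
$pdo = new PDO('pgsql:host=dbhost;dbname=app', 'user', 'pass');
$accountIds = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

// Variant A: one network round trip per account.
$stmt = $pdo->prepare('UPDATE accounts SET balance = balance * 1.01 WHERE id = :id');
foreach ($accountIds as $id) {
    $stmt->execute([':id' => $id]);
}

// Variant B: one round trip, with the loop running inside the database.
// The function would be created once, e.g.:
//   CREATE FUNCTION apply_interest(ids int[]) RETURNS void AS $$
//   BEGIN UPDATE accounts SET balance = balance * 1.01 WHERE id = ANY(ids); END;
//   $$ LANGUAGE plpgsql;
$stmt = $pdo->prepare('SELECT apply_interest(:ids)');
$stmt->execute([':ids' => '{' . implode(',', $accountIds) . '}']);   // array literal passed as text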
On the other hand, using PL/pgSQL and putting logic in the database makes your project more difficult to debug, fix and unit test. It also makes it a lot harder to change the RDBMS in the future.
My suggestion would be to manipulate the data in PHP, and use a little PL/pgSQL only when you want to isolate some logic for security or data integrity reasons, or when you want to tune your project for the best possible performance (which should be a concern after the first release).
I recently got an internship involving writing an SQL database and don't really know where to start.
The database will hold the information for a ton of electrical products, including but not limited to watt usage, brand, year, color, size and country of origin. The database's main task is to apply some formulas to the above information and output several energy-related figures. I will also be building a simple GUI for it. The database will be accessed solely on Windows computers by no more than 100 people, probably around 20.
The organization does not normally give internships, and especially not to programmers (they're all basically electrical engineers); I got it via really insisting to a relative that works there who was looking into how to organize some of the products they overlook. In other words, I can't really ask THEM for guidance on the matter since they're not programmers, which is the reason I headed here to get a feel of where I'm starting.
I digress -- my main concerns are:
What is some recommended reading or viewing for this? Tips, tricks?
How do I set up a server for this? What hardware/bandwidth/etc. do I require?
How do I set up the Database?
For the GUI client, I decided to take a look at having it be a window showing a webpage built with the SQL embedded into PHP. Is this a good idea? Recommended reading for doing that? What are alternatives?
What security measures would you recommend? Recommended reading?
I have: several versions of Microsoft's MySQL servers, classroom experience with MySQL and PHP, several versions of Visual Studio, access to old PCs for testing (up to and including switching operating systems, hardware, etc.), access to a fairly powerful PC (non-modifiable), and unlimited bandwidth.
Any help would be appreciated, thanks guys!
What is some recommended reading or viewing for this? Tips, tricks?
I'd recommend spending quite a bit of time in the design stage, before you even touch a computer. Just grab some scrap paper and a pencil and start sketching out various "screens" that your UI might expose at various stages (from menus to inputs and outputs); show them to your target users and see if your understanding of the application fits with the functionality they expect/require; consider when, where, how and why they will access and use the application; refine your design.
You will then have a (vague) functional specification, from which you will be able to answer some of the further questions posed below so that you can start researching and identifying the technical specification: at this stage you may settle upon a particular architecture (web-based?), or certain tools and technologies (PHP and MySQL?). You can then identify further resources (tutorials?) to help progress toward implementation.
How do I set up a server for this? What hardware/bandwidth/etc. do I require?
Other than the number of users, your post gives very little indication of likely server load from which this question can be answered.
How much data will the database store ("a ton of electrical products" is pretty vague)? What operations will need to be performed ("use some formulas ... to output several energy-related things" is pretty vague)? What different classes of user will there be and what activities will they perform? How often will those activities write data to and read data from the database (e.g. write 10KiB, once a month; and read 10GiB, thousands of times per second)? Whilst you anticipate 20 users, will they all be active simultaneously, or will there typically only be one or two at any given time? How critical is the application (in terms of the reliability/performance required)?
Perhaps, for now, just install MySQL and see how you fare?
How do I set up the Database?
As in, how should you design the schema? This will depend upon the operations that you intend to perform. However, a good starting point might be a table of products:
CREATE TABLE products (
product_id SERIAL,
power INT UNSIGNED COMMENT 'watt usage',
brand VARCHAR(255),
year INT UNSIGNED,
color VARCHAR(15),
size INT UNSIGNED,
origin CHAR(2) COMMENT 'ISO 3166-1 country code'
);
Depending upon your requirements, you may then wish to create further tables and establish relationships between them.
For the GUI client, I decided to take a look at having it be a window showing a webpage built with the SQL embedded into PHP. Is this a good idea? Recommended reading for doing that? What are alternatives?
A web-based PHP application is certainly one option, for which you will find a ton of helpful free resources (tutorials, tools, libraries, frameworks, etc.) online. It also is highly portable (as virtually every device has a browser which will be able to interact with your application, albeit that ensuring smooth cross-browser compatibility and good cross-device user experience can be a bit painful).
There are countless alternatives, using virtually any/every combination of languages, runtime environments and architectures that you could care to mention: from Java servlets to native Windows applications, from iOS apps to everything in between, the choice is limitless. However, the best advice is probably to stick to that with which you are already most comfortable/familiar (provided that it can meet the functional requirements of the application).
What security measures would you recommend? Recommended reading?
This is another pretty open-ended question. If you are developing a web app, I'd at the very least make yourself aware of (how to defend against) SQL injection, XSS attacks and techniques for managing user account security. Each of these areas alone is quite a large topic, and that's before one even begins to consider the security of the hosting platform or physical environment.
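As a starting point, the two most basic defences in a PHP web app look roughly like this sketch (credentials and the request parameter are made up; the products table mirrors the example above):

<?php
// SQL injection: never interpolate user input into SQL; bind parameters instead.
$pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$stmt = $pdo->prepare('SELECT product_id, brand FROM products WHERE brand = :brand');
$stmt->execute([':brand' => isset($_GET['brand']) ? $_GET['brand'] : '']);

// XSS: escape anything user-supplied before echoing it back into HTML.
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $product) {
    echo '<li>' . htmlspecialchars($product['brand'], ENT_QUOTES, 'UTF-8') . '</li>';
}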
I am about to create a PHP web project that will consist of a large database. The database will be MySQL and will store more than 30,000 records per day. To optimize the DB, I thought of using the Memcached library with it. Am I going the correct way, or can some other alternative be used to overcome the data optimization problem? I just want to provide faster retrieval and insertion. Can somebody advise me on which tool I should use and how, as the data will gradually increase at a higher rate? Should I use an object-relational mapping concept too?
You can use the master & slave technique for this purpose. Basically it would be a combination of two databases: one for read operations and the other for write operations.
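A minimal application-level sketch of that idea, with hypothetical hostnames and table (writes go to the master, reads to the replica; note that replication lag means a replica read may briefly be stale):

<?php
// Read/write splitting sketch; hostnames, table and columns are hypothetical.
$master  = new PDO('mysql:host=db-master;dbname=app', 'user', 'pass');
$replica = new PDO('mysql:host=db-replica;dbname=app', 'user', 'pass');

// Write path: inserts/updates always go to the master.
$stmt = $master->prepare('INSERT INTO readings (payload) VALUES (:payload)');
$stmt->execute([':payload' => 'example']);

// Read path: queries go to the replica to keep load off the master.
$rows = $replica->query('SELECT id, payload FROM readings ORDER BY id DESC LIMIT 10')
                ->fetchAll(PDO::FETCH_ASSOC);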
I'd side with @halfer and say he's right about the test data. At least you'll know that you're not trying to optimize something that doesn't need optimizing.
On top of test data you'll also need some test scenarios to mimic the traffic patterns of your production environment, that's the hard part and really depends on the exact application patterns: how many reads versus writes versus updates / per second.
Given your number (30k per day), you'd average out at well under one insert per second (30,000 over 86,400 seconds is roughly 0.35 inserts/second), which I'd assume even the cheapest machines could handle with ease. As for reads, a year's worth of data would be just under 11M records. You may want to partition the data (at the MySQL level or the application level) if lookups become slow, but I doubt you'd need to with such relatively small volumes. The real difference maker would be if the number of reads is 1000x the number of inserts; then you could look into what @ram sharma suggested and set up a replicated master-slave model where the master takes all the writes and the slaves are read-only.
Memcached is a powerful beast when used correctly and can turn a slow DB disk read into a blazing fast memory read. I'd still only suggest you look into it IF the DB is too slow. Adding moving parts to any application also adds potential failure points and increases the overall complexity.
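If you do go that route, the usual cache-aside pattern with PHP's Memcached extension looks roughly like this (key, TTL, query and credentials are hypothetical):

<?php
// Cache-aside sketch using the PHP Memcached extension.
$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

function getRecentRecords(PDO $pdo, Memcached $cache): array
{
    $key  = 'recent_records';
    $rows = $cache->get($key);
    if ($rows !== false) {
        return $rows;                    // cache hit: slow disk read avoided entirely
    }

    // Cache miss: fall back to the database, then populate the cache.
    $rows = $pdo->query('SELECT id, payload FROM readings ORDER BY id DESC LIMIT 100')
                ->fetchAll(PDO::FETCH_ASSOC);
    $cache->set($key, $rows, 60);        // expire after 60 seconds
    return $rows;
}

$recent = getRecentRecords($pdo, $cache);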
EDIT: as for the use of an ORM, that's your choice and it really won't change a thing concerning the DB's speed, although it may add fractions of milliseconds for the end user... usually worth it in my experience.
Cheers --