Writing a mySQL database from the ground up [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I recently got an internship involving the writing of an sql database and don't really know where to start.
The database will hold the information of a ton of electrical products: including but not limited to watt usage, brand, year, color, size, country of origin. The database's main task is to use some formulas using the above information to output several energy-related things. I will also be building a simple gui for it. The database will be accessed solely on Windows computers by no more than 100 people, possibly around the area of 20.
The organization does not normally give internships, and especially not to programmers (they're all basically electrical engineers); I got it via really insisting to a relative that works there who was looking into how to organize some of the products they overlook. In other words, I can't really ask THEM for guidance on the matter since they're not programmers, which is the reason I headed here to get a feel of where I'm starting.
I digress -- my main concerns are:
What is some recommended reading or viewing for this? Tips, tricks?
How do I set up a server for this? What hardware/bandwidth/etc. do I require?
How do I set up the Database?
For the GUI client, I decided to take a look at having it be a window showing a webpage built with the SQL embedded into PHP. Is this a good idea? Recommended reading for doing that? What are alternatives?
What security measures would you recommend? Recommended reading?
I have: several versions of Microsoft's MySQL servers, classroom experience with MySQL and PHP, several versions of Visual Studio, access to old PCs for testing (up to and including switching operating systems, hardware, etc.), access to a fairly powerful PC (non-modifiable), unlimited bandwidth.
Any help would be appreciated, thanks guys!

What is some recommended reading or viewing for this? Tips, tricks?
I'd recommend spending quite a bit of time in the design stage, before you even touch a computer. Just grab some scrap paper and a pencil and start sketching out various "screens" that your UI might expose at various stages (from menus to inputs and outputs); show them to your target users and see if your understanding of the application fits with the functionality they expect/require; consider when, where, how and why they will access and use the application; refine your design.
You will then have a (vague) functional specification, from which you will be able to answer some of the further questions posed below so that you can start researching and identifying the technical specification: at this stage you may settle upon a particular architecture (web-based?), or certain tools and technologies (PHP and MySQL?). You can then identify further resources (tutorials?) to help progress toward implementation.
How do I set up a server for this? What hardware/bandwidth/etc. do I require?
Other than the number of users, your post gives very little indication of likely server load from which this question can be answered.
How much data will the database store ("a ton of electrical products" is pretty vague)? What operations will need to be performed ("use some formulas ... to output several energy-related things" is pretty vague)? What different classes of user will there be and what activities will they perform? How often will those activities write data to and read data from the database (e.g. write 10KiB, once a month; and read 10GiB, thousands of times per second)? Whilst you anticipate 20 users, will they all be active simultaneously, or will there typically only be one or two at any given time? How critical is the application (in terms of the reliability/performance required)?
Perhaps, for now, just install MySQL and see how you fare?
How do I set up the Database?
As in, how should you design the schema? This will depend upon the operations that you intend to perform. However, a good starting point might be a table of products:
CREATE TABLE products (
    product_id SERIAL,
    power      INT UNSIGNED COMMENT 'watt usage',
    brand      VARCHAR(255),
    year       INT UNSIGNED,
    color      VARCHAR(15),
    size       INT UNSIGNED,
    origin     CHAR(2) COMMENT 'ISO 3166-1 country code'
);
Depending upon your requirements, you may then wish to create further tables and establish relationships between them.
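To get a feel for how "formulas" might sit on top of such a table, here is a minimal sketch. It uses Python with an in-memory SQLite database purely so that it runs anywhere; the sample rows and the annual-consumption formula (watts × hours per day × 365 / 1000) are illustrative assumptions, not something taken from the question.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        power      INTEGER,   -- watt usage
        brand      TEXT,
        year       INTEGER
    )
    """
)
# Made-up sample data, for illustration only.
conn.executemany(
    "INSERT INTO products (power, brand, year) VALUES (?, ?, ?)",
    [(60, "Acme", 2010), (1500, "Globex", 2012)],
)


def annual_kwh(hours_per_day):
    """Return (brand, kWh per year) per product, assuming constant daily use."""
    rows = conn.execute(
        "SELECT brand, power * ? * 365 / 1000.0 FROM products ORDER BY product_id",
        (hours_per_day,),
    )
    return rows.fetchall()


usage = annual_kwh(4)
for brand, kwh in usage:
    print(brand, kwh)
```

Keeping the formula in a query (or a view) means every client sees the same calculation; whether that is preferable to computing it in application code depends on how complicated the real formulas turn out to be.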
For the GUI client, I decided to take a look at having it be a window showing a webpage built with the SQL embedded into PHP. Is this a good idea? Recommended reading for doing that? What are alternatives?
A web-based PHP application is certainly one option, for which you will find a ton of helpful free resources (tutorials, tools, libraries, frameworks, etc.) online. It also is highly portable (as virtually every device has a browser which will be able to interact with your application, albeit that ensuring smooth cross-browser compatibility and good cross-device user experience can be a bit painful).
There are countless alternatives, using virtually any/every combination of languages, runtime environments and architectures that you could care to mention: from Java servlets to native Windows applications, from iOS apps to everything in between, the choice is limitless. However, the best advice is probably to stick to that with which you are already most comfortable/familiar (provided that it can meet the functional requirements of the application).
What security measures would you recommend? Recommended reading?
This is another pretty open-ended question. If you are developing a web app, I'd at the very least make yourself aware of (how to defend against) SQL injection, XSS attacks and techniques for managing user account security. Each of these areas alone is quite a large topic, and that's before one even begins to consider the security of the hosting platform or physical environment.
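As a rough illustration of the first two items, the sketch below shows a parameterized query (the usual defence against SQL injection) and output escaping (a basic defence against XSS). It is written in Python with SQLite so that it is self-contained; in PHP the corresponding tools would be prepared statements (e.g. via PDO) and htmlspecialchars(). The table, the data and the function names are hypothetical.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, brand TEXT)")
conn.executemany("INSERT INTO products (brand) VALUES (?)", [("Acme",), ("Globex",)])


def find_by_brand(brand):
    # The ? placeholder sends the value separately from the SQL text,
    # so quotes in user input cannot change the structure of the query.
    return conn.execute(
        "SELECT brand FROM products WHERE brand = ?", (brand,)
    ).fetchall()


def render_brand(brand):
    # Escape user-supplied text before writing it into HTML.
    return "<td>" + html.escape(brand) + "</td>"


# A classic injection attempt: with string concatenation this would match
# every row; with a bound parameter it is just an odd brand name.
malicious = "x' OR '1'='1"
print(find_by_brand(malicious))
print(render_brand("<script>alert(1)</script>"))
```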

Related

Business logic in oracle database or in PHP application [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I have a question for people who have experience working with Oracle and PHP. Please this is not to open a meaningless debate. I just want to make sure everything is considered correctly in my system.
I am working on a project in which there are thousands of users, divided into groups and sub groups. Each group has its different access rights and each subgroup has its own privileges.
I need to have your opinion about these two approaches:
1) Implementing access rights and privileges in PHP with one big application user (a single Oracle account). I am clueless as to the advantages and disadvantages of this approach.
2) Implementing access rights and privileges in the Oracle database (each user would be an Oracle account) and using virtual private database, caching, secure roles, etc. From a performance standpoint this is the best approach. Security-wise I know it is good, but I am afraid I am missing out on good things by not implementing it in PHP.
I did some research on the net but in vain(I scratched my head a lot). I am new to PHP but I have good knowledge about Oracle.
Any suggestions, Ideas?

As you say you're going to have 1000s of users, I assume your software is going to be used in a big company, which probably means there's not one IT department, but several of them - one providing managed basic hardware (OS level but no applications), another managing databases, and a third one putting it all together (hardware+os, database, application) and providing the managed application as a service for the end user. My decision might be heavily influenced by working for such a department for over 10 years now.
I've seen one application that used the "one database user per OS user" approach (VPM by Dassault Systems, an attachment to the Catia V4 CAD system), and it's a horror to maintain. You want to have your permissions in ONE place, not several. Possibly you have a nice GUI to change them, but when your user calls you saying "I can't do X", and your GUI says he should be able to do X, it's simply too tedious to search everywhere. We've had cases where views didn't have the access roles they should have, views were wrongly defined, users had permissions on some tables but not all of them, all that stuff.
Additionally, our database department has - at the moment - about 600 "databases" that are used by different departments. So they are running about 20 real "databases" on several clusters, and they have implemented quite a rigid scheme of database names and corresponding user names. Each database has a base name XXX, with XXX_A the user name for DDL statements, and XXX_U for DML. Applications may ONLY use XXX_U, which also means applications may not do DDL. This allows the database admins, in case of load issues on the cluster, to move an entire schema, including all users, roles and tables, to a different instance on a different cluster easily, without knowing too much about the individual databases. Of course, our VPM database doesn't fit into that scheme, so we had to argue with the DB people a lot - and our monthly charge by the DB department is much higher than normal, because they have much more trouble administering it.
Bottom line: Don't expect you can do whatever you want within your database. In a large company, with a department managing the databases, you will have restrictions on what your application is allowed to do and what it isn't.
Also, your management might, at one time, decide they want to move to a different database system like DB2 for political reasons. This has much less to do with technical advantages than with who's invited whom to golf. You might, at one time, be asked if your system could be moved to a different database, by people you don't want to say "no" to. I wouldn't want to be dependent on too specific oracle features in this case.
Also keep in mind that requirements change over time, and there might be new, more granular, requirements in a while. This strongly favours doing the permission stuff in software, because it's much easier to add another column to a permission table that specifies something new, instead of having to implement something new in a database permission structure that just isn't meant to handle this kind of thing.
If you were developing a native application that runs on individual users' PCs, using only one Oracle account might be a big security hole. But as you're using PHP, it's only the server that's communicating with the DB, so no one can extract login information from userspace anyway.
In my opinion, create an API for permission management first. Do not use Oracle users, groups and roles for that; instead, manage your permissions in some SQL tables. Create an API (a collection of check_if_user_X_may_do_Y functions), possibly in PL/SQL if you feel more comfortable there, better in PHP if you want to be portable. Build your application on this API. It might be more dev work at the start, but will result (imho) in much less administration work later.
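A minimal sketch of what such a table-backed permission check could look like follows. It is written in Python against an in-memory SQLite database only so that it is runnable as-is; the table layout, the sample data and the user_may() name are assumptions made for illustration, and a real system with groups and sub-groups would need a richer schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema: each user belongs to one group,
    -- and permissions are granted per group.
    CREATE TABLE users       (user_id INTEGER PRIMARY KEY, name TEXT, group_id INTEGER);
    CREATE TABLE permissions (group_id INTEGER, action TEXT,
                              PRIMARY KEY (group_id, action));
    INSERT INTO users VALUES (1, 'alice', 10), (2, 'bob', 20);
    INSERT INTO permissions VALUES
        (10, 'edit_report'), (10, 'view_report'), (20, 'view_report');
    """
)


def user_may(user_id, action):
    """Return True if the user's group has been granted the action."""
    row = conn.execute(
        """
        SELECT 1
        FROM users u
        JOIN permissions p ON p.group_id = u.group_id
        WHERE u.user_id = ? AND p.action = ?
        """,
        (user_id, action),
    ).fetchone()
    return row is not None


print(user_may(1, "edit_report"))
print(user_may(2, "edit_report"))
```

Because every check goes through one function, adding a more granular rule later means changing a table and this function rather than reworking database accounts and roles.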

Although Guntram makes some very salient points, he has missed what I consider to be fairly essential considerations:
1) you describe a 3-tier model of authorization which the Oracle permissions model does not accommodate (although it is possible to represent this as a 2-tier model, but at the cost of creating work and complexity).
2) using the user supplied credentials to authenticate against the database pretty much precludes the use of persistent database connections - which is rather important when your connection setup time is as expensive as it is with Oracle.
By all means store the representation of the groups/sub-groups/users/permissions in the database, but not as Oracle users. Use a single Oracle account (or a small number of them) for the access from PHP to the DBMS.
If you want to implement strong security, then you should also store mappings between session IDs and users (not necessarily the session data itself) in the database, allow no direct access from the designated Oracle accounts to the data - only via stored procedures - and pass the session ID as an authentication token to the stored procedure and validate it there (but note that this will be rather expensive if you are using a trivial ORM).

Logistics and transportation planning techniques [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
I'm currently developing a system for bus ticket reservation. The provider has many routes and different trips. I've set up a rather comprehensive database that maps all this together, but I'm having trouble getting the pathing algorithm working when it comes to cross-route reservation.
For example, the user wants to go from Montreal to Sherbrooke, he'll only take what we call here Route #47. But in the event he goes to Sutton instead of Sherbrooke, he now has to transfer into route #53 at some point.
Now, it isn't too hard to detect one and only one transfer. But when it comes to detecting the options he has for crossing multiple routes, I'm kinda scared. I've devised a cute and relatively efficient way to do so for 1-3 hops using only SQL, but I'm wondering how I should organize all this on a much broader scale, as the client will probably not stay with 2 routes for the rest of his life.
Example of what i've thought of so far:
StartingStop
joins to Route
joins to StopsOfTheRoute
joins to TransfersOnThatStop
joins to TargetStopOfThatTransfer
joins to RouteOfThatStop
joins to StopsOfThatNewRoute
[wash, rinse, repeat for more hops]
where StopsOfThatNewRoute = EndingStop
Problem is, if I have more than 3 hops, I'm sure my SQL server will choke rather fast under the pressure. Even if I index my database correctly, I can easily predict a major failure eventually...
Thank you

My understanding of your problem: You are looking for an algorithm that will help you identify a suitable path (covering segments of one or more routes).
This is, as Jason correctly points out, a pathfinding problem. For this kind of problem, you should probably start by having a look at Wikipedia's article on pathfinding, and then dig into the details of Dijkstra's algorithm. That will probably get you started.
You will however most likely soon realise that your data model might pose a problem, both from a structure and performance perspective. Typical example would be if you need to manage time constraints: which path has the shortest travel time, assuming you find several? One path might be shortest, but only provide one ride per day, while another path might be longer but provide several rides per day.
A possible way of handling this is to create a graph where each node corresponds to a particular stop at a particular time. Edges would connect this node in space-time both to other geographical stops (taking a ride) and to the same stop at the next point in time (waiting).
My suggestion would be to start by reading up on the pathfinding algorithms, then revisit your data model with regard to any constraints you might have. Then, focus on the structure for storing the data, and the structure for searching for paths.
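One way to make the stop-at-a-time idea concrete is a Dijkstra-style earliest-arrival search, where the state is (arrival time, stop) and waiting is implicit: from a stop you may board any trip that departs at or after the time you got there. The sketch below is in Python; the timetable, the minute-of-day time encoding and the function name are all invented for illustration, and it ignores minimum transfer times.

```python
import heapq

# Hypothetical timetable rows:
# (from_stop, depart_minute, to_stop, arrive_minute, route)
TRIPS = [
    ("Montreal", 480, "Sherbrooke", 600, "47"),
    ("Montreal", 480, "Granby", 540, "47"),
    ("Granby", 555, "Sutton", 615, "53"),
    ("Granby", 530, "Sutton", 590, "53"),  # departs before we can be in Granby
]


def earliest_arrival(trips, origin, destination, start_time):
    """Return (arrival_minute, legs) for the earliest arrival, or None."""
    departures = {}
    for trip in trips:
        departures.setdefault(trip[0], []).append(trip)

    best = {origin: start_time}            # earliest known arrival per stop
    queue = [(start_time, origin, [])]     # priority queue ordered by time
    while queue:
        time, stop, legs = heapq.heappop(queue)
        if stop == destination:
            return time, legs
        if time > best.get(stop, float("inf")):
            continue                       # stale queue entry
        for _, depart, nxt, arrive, route in departures.get(stop, []):
            # Only trips we can still catch, and only if they improve on
            # the best arrival time known for the next stop.
            if depart >= time and arrive < best.get(nxt, float("inf")):
                best[nxt] = arrive
                heapq.heappush(queue, (arrive, nxt, legs + [(route, stop, nxt)]))
    return None


print(earliest_arrival(TRIPS, "Montreal", "Sutton", 450))
```

With the sample data, leaving Montreal at 07:30 (minute 450) reaches Sutton at minute 615 via route 47 to Granby and then route 53; the 530 departure from Granby is skipped because it leaves before the first leg arrives.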
Suggestion (not efficient, but could work if you have a sufficient amount of RAM to spare). Use the relational database server for storing the basics: stops, which routes are connected to which stops and so on. Seems you have this covered already. Then build an in-memory representation of a graph given the constraints that you have. You could probably build your own library for this pretty fast (I am not aware of any such libraries for PHP).
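As a rough sketch of the in-memory approach, the Python below takes (route, stop) rows, as they might come back from a simple SELECT, builds two lookup dictionaries and runs a breadth-first search over routes to find a sequence with the fewest transfers. The rows and names are made up, and it assumes a transfer is possible wherever two routes share a stop, ignoring stop order, direction and timetables; the explicit transfers table from the question could replace that assumption.

```python
from collections import deque

# Hypothetical rows from a route/stop table: (route, stop)
ROUTE_STOPS = [
    ("47", "Montreal"), ("47", "Granby"), ("47", "Sherbrooke"),
    ("53", "Granby"), ("53", "Sutton"),
    ("60", "Sutton"), ("60", "Burlington"),
]


def build_index(rows):
    """Build stop sets per route and route sets per stop."""
    stops_of, routes_at = {}, {}
    for route, stop in rows:
        stops_of.setdefault(route, set()).add(stop)
        routes_at.setdefault(stop, set()).add(route)
    return stops_of, routes_at


def fewest_transfers(rows, origin, destination):
    """Return the list of routes to ride, in order, or None if unreachable."""
    stops_of, routes_at = build_index(rows)
    # Start from every route that serves the origin stop.
    queue = deque((route, [route]) for route in sorted(routes_at.get(origin, ())))
    seen = {route for route, _ in queue}
    while queue:
        route, path = queue.popleft()
        if destination in stops_of[route]:
            return path
        # Any route sharing a stop with the current one is one transfer away.
        for stop in sorted(stops_of[route]):
            for nxt in sorted(routes_at[stop]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [nxt]))
    return None


print(fewest_transfers(ROUTE_STOPS, "Montreal", "Sutton"))
```

Because each route is visited at most once, the cost grows with the number of routes and stops rather than with the number of hops, which avoids the ever-longer chain of joins.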
Another alternative could be to use a graph database such as Neo4j and its REST interface. I guess this will require some significant redesign of your application.
Hope this gives you some helpful pointers.

When introducing licensing to a web-based system, how should multiple instances be handled? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
We're taking our functional web-based software that's previously been exclusive to our business and introducing licensing options so that other businesses may use it.
What considerations should be taken into account when choosing between the two approaches:
Modify the code to permit multiple users
Install multiple instances of the code; one for each new user. E.G. completely duplicated, separate databases & PHP.
The software is PHP-based. We intend to offer multiple packages. Server load grows quadratically with increased use per license, due to large amounts of processing that occurs through scheduled cron jobs.
Update: despite the only answer suggesting we should not do this, we are still leaning toward modifying the code to permit multiple users. Does anyone else have any input?
Update 2: for security reasons, we are again changing our position, this time to the multiple-instances solution.

Having done this myself in the last several months, my advice is: don't do what we did, which was to modify the code to permit multiple users. Turns out that's a rabbit hole that will introduce:
code complexity (adding new simple features will often become difficult)
bugs (due to increased complexity)
security problems (a huge amount of time was spent ensuring clients cannot access each other's data)
performance issues (tables with ~5,000 rows will suddenly grow to ~5,000,000 rows. Performance issues that weren't even noticeable suddenly created ~20 second page load times)
If we could do it again our approach would be something like:
Put each client on a subdomain (maybe even allow them to supply their own full domain name), allowing you to have a separate apache virtual host for each one. Buying a license to something like cPanel is worth serious consideration, and investigate how to automate or semi-automate creating new accounts.
Have a separate database for each client. Each with a different database password. This will provide excellent security and excellent performance (all the databases (and their tables) will be small).
It's up to you whether the actual php source code should be shared between all of these clients, or have a separate copy for each one. A global directory for the files is perfectly reasonable, and will make updates easy, while a separate copy will make customisations easier. Perhaps a hybrid is the right approach here.
One day we might even end up tearing out most of the work done in the last six months, to start again with this approach.
At first glance it seems like this will increase server load, but in reality if you have enough clients for load to even be a consideration, then you will want to be able to spread clients across multiple servers. And that's a piece of cake if it's well segregated.
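To illustrate the subdomain-per-client, database-per-client layout described above, here is a small Python sketch that maps an incoming host name to that client's own database settings. The domain, the client names and the configuration keys are hypothetical, and passwords are deliberately left out: they would come from per-client configuration or a secrets store, not from source code.

```python
# Hypothetical per-client settings; one database and one DB user per client.
TENANTS = {
    "acme": {"database": "app_acme", "user": "acme_app"},
    "globex": {"database": "app_globex", "user": "globex_app"},
}
BASE_DOMAIN = "example.com"  # placeholder domain


def tenant_for_host(host):
    """Map 'client.example.com[:port]' to that client's database settings."""
    host = host.split(":")[0].lower()      # drop any port, normalise case
    suffix = "." + BASE_DOMAIN
    if not host.endswith(suffix):
        raise LookupError(f"unexpected host: {host}")
    subdomain = host[: -len(suffix)]
    if subdomain not in TENANTS:
        raise LookupError(f"unknown client: {subdomain}")
    return TENANTS[subdomain]


print(tenant_for_host("acme.example.com"))
```

Since the application only ever connects with the credentials of the client it is serving, one client's requests have no route to another client's tables, and moving a client to another server is a matter of changing one entry.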

Learning about MySql for large database/tables?

I've been working on a new site of mine for a couple of days now, which will be retrieving almost all of its most-used content from a MySQL database. Seeing as the database and website are still under development, the tables are really small at the moment and speed is of no concern yet.
But you know what they say, a little bit of hard work now saves you a headache later on.
Now I'm only 17, the only database I've ever been taught was through Microsoft Access, and we were practically given the database completed - we learned up to 3NF, but that was about it.
I remember reading once when I was looking to pull data (randomly) out of a database how large databases were taking several seconds/minutes to complete a single query, so this just got me thinking. In a fraction of a second I can submit a search to google, google processes the query and returns the result, and then my browser renders it - all done in the blink of an eye. And google has billions of records to search through. And they're also doing this for millions of users simultaneously.
I'm thinking, how do they do it? I know that they have huge data centers, but still.
I realize that it probably comes down to the design of the database, how it's been optimized, and obviously the configuration. And I guess that's my question really. Could someone please tell me how to design high performance databases for millions/billions of rows (yes, I'm being optimistic), and possibly point me towards some good reading material to help me learn further?
Also, all my queries are done via PHP, if that's at all relevant to any answers.

The blog http://highscalability.com/ has some good articles and pointers to how companies handle large problems.
Specifically related to MySQL, you can Google for craigslist.org's use of MySQL.
http://www.slideshare.net/jzawodn/mysql-and-search-at-craigslist

First the good news... MySQL scales well (depending on the hardware) to at least hundreds of millions of rows.
Once you get to a certain point, a single database server will have trouble managing the load. That's when you get into the realm of partitioning or sharding... spreading the load across multiple database servers using any one of a number of different schemes (e.g. putting unrelated tables on different servers, spreading a single table across multiple servers e.g. by using the ID or date range as a partitioning key).
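The two partitioning keys mentioned above can be sketched in a few lines of Python. The server names, the modulo scheme and the cut-off date are illustrative assumptions; real deployments often use more elaborate schemes (consistent hashing, lookup tables) so that servers can be added without moving most of the data.

```python
from datetime import date

# Hypothetical database servers.
SHARDS = ["db0.internal", "db1.internal", "db2.internal"]


def shard_for_id(record_id):
    """ID-based partitioning: spread rows evenly by ID modulo shard count."""
    return SHARDS[record_id % len(SHARDS)]


def shard_for_date(day):
    """Date-range partitioning: old rows on one server, recent rows on another."""
    if day < date(2010, 1, 1):
        return "archive.internal"
    return "current.internal"


print(shard_for_id(7))
print(shard_for_date(date(2009, 12, 31)))
```

The application (or a proxy in front of the databases) calls a function like this before every query to decide which server to talk to; queries that span shards have to be merged in application code.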
SQL does shard, but is not fundamentally designed to shard well. There's a whole category of storage alternatives collectively referred to as NoSQL that are designed to solve that very problem (MongoDB, Cassandra, HBase are a few).
When you use SQL at very large scale, you run into any number of issues such as making data model changes across a DB server farm, trouble keeping up with data backups, etc. That's a very complex topic, and people that solve it well are rare. For a glimpse at the issues, have a look at http://gigaom.com/cloud/facebook-trapped-in-mysql-fate-worse-than-death/
When selecting a database platform for a specific project, benchmark the solution early and often to understand whether or not it will meet the performance requirements that you envision. Having a framework to do that will help you learn about scalability, and will help you decide whether to invest effort in improving the data storage part of your solution, and will help you know where best to invest your time.
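A benchmarking framework does not need to be elaborate to be useful. The sketch below, in Python with an in-memory SQLite database, times the same query before and after adding an index. The table, the row count and the query are invented for illustration; numbers from SQLite on a laptop say nothing about MySQL on production hardware, so the point is the shape of the harness rather than the results.

```python
import sqlite3
import time


def build_db(rows):
    """Create a throwaway table filled with synthetic data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, category INTEGER, payload TEXT)"
    )
    conn.executemany(
        "INSERT INTO items (category, payload) VALUES (?, ?)",
        ((i % 1000, "x" * 20) for i in range(rows)),
    )
    return conn


def time_query(conn, repeats=20):
    """Return (average seconds per query, row count) for a fixed lookup."""
    start = time.perf_counter()
    for _ in range(repeats):
        count = conn.execute(
            "SELECT COUNT(*) FROM items WHERE category = ?", (42,)
        ).fetchone()[0]
    return (time.perf_counter() - start) / repeats, count


conn = build_db(50_000)
before, count_before = time_query(conn)
conn.execute("CREATE INDEX idx_items_category ON items (category)")
after, count_after = time_query(conn)
print(f"without index: {before * 1000:.3f} ms, with index: {after * 1000:.3f} ms")
```

Rerunning the same harness with ten or a hundred times as many rows gives an early indication of how a query will behave as the tables grow.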

No one can tell you how to design databases. It comes after much reading and many hours working on them; a good design is the product of many years of doing them. As you've only seen Access, you have little knowledge of databases yet. Search through Amazon.com and you'll get tons of titles. For someone who's just starting, any one of them will do.
I mean no disrespect. I've been there and I'm also tutor of some people learning programming/database design. I do know that there's no silver bullet or shortcuts for the work you have ahead.
If you intend to work with high-performance databases, you should keep something in mind: their design is per application. A good design depends on learning more and more about how the app's users interact with the system, the usage patterns, etc. The things you'll learn from books will give you options; using them will depend heavily on the scenario.
Good luck!

It doesn't all come down to the design of the database, though that is indeed a big part of it. The guys who made Google are geniuses, and if I'm not completely wrong about Google you won't be able to find out exactly how they do what they do. Also, I know that years back they had more than 10,000 computers processing queries, and today they probably have many more. I also suspect them of caching most of the recent/popular keywords. And all the websites have been indexed and analyzed using an unknown algorithm which will make sure the computers don't have to look through all the words on every page.
In fact, Google crawls the entire internet around every 14 days, so when you do a search you do not search the entire internet. Your search gets broken down into keywords and then these keywords are used to narrow the number of relevant pages - and I'm pretty sure all pages have already been analyzed for important and/or relevant keywords before you even thought of visiting google.com.
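The general technique of analysing pages for keywords ahead of time is usually called an inverted index: a map from each word to the set of pages that contain it. The toy Python sketch below shows the idea with made-up pages; it is a textbook illustration, not a description of how Google actually works.

```python
# Made-up "crawled" pages.
PAGES = {
    "page1": "cheap bus tickets to Montreal",
    "page2": "Montreal electrical products catalogue",
    "page3": "bus timetable for Sherbrooke",
}


def build_index(pages):
    """Map every lower-cased word to the set of pages containing it."""
    index = {}
    for page_id, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(page_id)
    return index


def search(index, query):
    """Return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())   # narrow down with each keyword
    return result


INDEX = build_index(PAGES)  # done once, ahead of time
print(search(INDEX, "Montreal bus"))
```

The expensive work (reading every page) happens once when the index is built; each search then only touches the handful of entries for its keywords, however many pages there are.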
Have a look at this question.

Have a look into Sphinx server.
http://sphinxsearch.com/
Craigslist uses that for their search engine. Basically, you give it a source and it indexes whatever you want (mysql database/table, text files, etc.). If it works for craigslist, it should work for you.

data system design

Need some ideas/help on best way to approach a new data system design. Basically, the way this will work is there will be a bunch of different database/tables that will need to be updated on a regular (daily/weekly/monthly) basis with new records.
The people that will be imputing the data will be proficient in excel. The input process will be done via a simple upload form. Then the system needs to add what was imported to the existing data in the databases. There needs to be a "rollback" process that'll reset the database to any day within the last week.
There will be approximately 30 to 50 different data sources. The primary interface will be an online search area, so all of the records need to be indexed and searchable.
Ideas/thoughts on how to best approach this? It needs to be built mostly out of php/mysql.

imputing the data
Typo?
What you are asking takes people with several years' formal training to do. Conventionally, the approach would be to draw up a set of requirements, then a set of formal specifications, then the architecture of the system would be designed, then the data design, then the code implementation. There are other approaches which tend to shortcut this. However, even in the case of a single table (although it does not necessarily follow that one "simple upload form" corresponds to one table), with a single developer there's a couple of days' work before any part of the design could be finalised, the majority of which is finding out what the system is supposed to do. But you've given no indication of the usage or the data complexity of the system.
Also what do you mean by upload? That implies they'll be manipulating the data elsewhere and uploading files rather than inputting values directly.
You can't adequately describe the functionality of a complete system in a 9 line SO post.
You're unlikely to find people here to do your work for free.
You're not going to get the information you're asking for in a S.O. answer.
You seem to be struggling to use the right language to describe the facts you know.
Your question is very vague.
